[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16636 **[Test build #71795 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71795/testReport)** for PR 16636 at commit [`f99e078`](https://github.com/apache/spark/commit/f99e078dd677798c8d9674ea5e08e9a95b43c065). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16228 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71794/
[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16228 Merged build finished. Test PASSed.
[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16228 **[Test build #71794 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71794/testReport)** for PR 16228 at commit [`7e52a83`](https://github.com/apache/spark/commit/7e52a837a984716ea8a0747c73f44b81bf592ff6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16671 So far, the best workaround is the predicate-based JDBC API; otherwise, as I mentioned above, we need to use sampling to find the boundary of each block. > In one embodiment, a user may specify a block size, via an interface. Blocks may be generated at the time of table partitioning. For example, according to a sampling technique described below, a user may select a particular block size and then the utility can determine the average number of table rows per block based on the number of storage bytes per row. Block-by boundary values for that range of rows of that block are determined based on the selected amount of rows, and provided in a query statement generated to obtain the statistical value for the block. That is, select rows from each table may be sampled or range-based. The select rows (or columns) are aggregated to form one "block" from the database table. The "block" may include the whole table, but is typically select rows of the whole table. A few years ago, I did implement sampling-based table logical partitioning. See the link: https://www.google.com/patents/US20160275150. It works pretty well.
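For readers unfamiliar with the predicate-based JDBC API mentioned above, the idea is to hand Spark one WHERE-clause fragment per partition. A minimal sketch of turning boundary values into such predicates follows; the column and boundary values are hypothetical, and this is not the implementation discussed in the PR.

```scala
// Sketch: build non-overlapping range predicates from sorted boundary values,
// one predicate per JDBC partition. Assumes at least two boundaries.
object JdbcPredicates {
  def rangePredicates(col: String, bounds: Seq[Long]): Seq[String] = {
    require(bounds.size >= 2, "need at least two boundary values")
    // Interior ranges: [b(i), b(i+1))
    val inner = bounds.sliding(2).map {
      case Seq(lo, hi) => s"$col >= $lo AND $col < $hi"
    }.toSeq
    // Open-ended ranges on both ends cover the full value space.
    (s"$col < ${bounds.head}" +: inner) :+ s"$col >= ${bounds.last}"
  }
}
```

The resulting strings would then be passed to the `predicates: Array[String]` overload of `DataFrameReader.jdbc`, e.g. `spark.read.jdbc(url, table, JdbcPredicates.rangePredicates("id", bounds).toArray, props)`, so each partition issues its own bounded query.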
[GitHub] spark issue #16657: [SPARK-19306][Core] Fix inconsistent state in DiskBlockO...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16657 **[Test build #71800 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71800/testReport)** for PR 16657 at commit [`b0fe795`](https://github.com/apache/spark/commit/b0fe795157a41925ba38bba02ee10a79518c8e42).
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71798/
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 re-ping @jkbradley
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #71798 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71798/testReport)** for PR 15314 at commit [`1d41615`](https://github.com/apache/spark/commit/1d41615863e7d4a0cc225a9a32cc1b175af22a49). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15314 Merged build finished. Test PASSed.
[GitHub] spark pull request #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16670#discussion_r97214965 --- Diff: R/pkg/inst/tests/testthat/test_Windows.R --- @@ -20,7 +20,7 @@ test_that("sparkJars tag in SparkContext", { if (.Platform$OS.type != "windows") { skip("This test is only for Windows, skipped") } - testOutput <- launchScript("ECHO", "a/b/c", capture = TRUE) + testOutput <- launchScript("ECHO", "a/b/c", wait = TRUE) --- End diff -- Hmm, I've tried, I don't think it would work. When calling `system2(.., wait = FALSE, capture = "")` the output to stdout is actually from the child process, so I don't think we would be able to see it from the R process. We could redirect it, but then it would be the same as `system2(..., wait = FALSE, capture = TRUE)` but again it wouldn't be what we are normally calling. I think we would need to dig deeper on this.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Hi, @gatorsmile . This is the original PR which has two fixes together now.
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 Hi, @gatorsmile . I tested here and applied to #16579 . PR #16579 has two fixes. After merging #16579 , I'm going to close this one.
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97214415 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -172,11 +172,23 @@ class Dataset[T] private[sql]( this(sqlContext.sparkSession, logicalPlan, encoder) } - /** A friendly name for this Dataset */ + /** +* A friendly name for this Dataset. +* +* @group basic +* @since 2.2.0 +*/ @Since("2.2.0") var name: String = null - /** Assign a name to this Dataset */ + /** +* Assign a name to this Dataset to display in the UI storage tab when cached. +* +* @param name A friendly name for this Dataset --- End diff -- `_name`?
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97214402 --- Diff: python/pyspark/sql/dataframe.py --- @@ -85,17 +85,20 @@ def rdd(self): self._lazy_rdd = RDD(jrdd, self.sql_ctx._sc, BatchedSerializer(PickleSerializer())) return self._lazy_rdd -@since(2.1) +@since(2.2) def name(self): """ Return the name of this Dataset. """ return self._jdf.name() @ignore_unicode_prefix -@since(2.1) +@since(2.2) def setName(self, name): -""" +"""Sets the name of this Dataset. + +The name wil be displayed on the storage tab of the UI if the Dataset is cached. --- End diff -- `wil` -> `will`?
[GitHub] spark pull request #16609: [SPARK-8480] [CORE] [PYSPARK] [SPARKR] Add setNam...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16609#discussion_r97214411 --- Diff: python/pyspark/sql/dataframe.py --- @@ -85,17 +85,20 @@ def rdd(self): self._lazy_rdd = RDD(jrdd, self.sql_ctx._sc, BatchedSerializer(PickleSerializer())) return self._lazy_rdd -@since(2.1) +@since(2.2) def name(self): """ Return the name of this Dataset. """ return self._jdf.name() @ignore_unicode_prefix -@since(2.1) +@since(2.2) def setName(self, name): -""" +"""Sets the name of this Dataset. + +The name wil be displayed on the storage tab of the UI if the Dataset is cached. --- End diff -- I'm not sure but maybe this should say "DataFrame" instead
[GitHub] spark pull request #16657: [SPARK-19306][Core] Fix inconsistent state in Dis...
Github user jerryshao commented on a diff in the pull request: https://github.com/apache/spark/pull/16657#discussion_r97214376 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala --- @@ -206,18 +209,22 @@ private[spark] class DiskBlockObjectWriter( streamOpen = false closeResources() } + } catch { + case e: Exception => +logError("Uncaught exception while closing file " + file, e) +} - val truncateStream = new FileOutputStream(file, true) - try { -truncateStream.getChannel.truncate(committedPosition) -file - } finally { -truncateStream.close() - } +var truncateStream: FileOutputStream = null +try { + truncateStream = new FileOutputStream(file, true) + truncateStream.getChannel.truncate(committedPosition) + file } catch { case e: Exception => logError("Uncaught exception while reverting partial writes to file " + file, e) file +} finally { + truncateStream.close() --- End diff -- Sorry about it. I will fix it.
[GitHub] spark pull request #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16670#discussion_r97214373 --- Diff: R/pkg/inst/tests/testthat/test_Windows.R --- @@ -20,7 +20,7 @@ test_that("sparkJars tag in SparkContext", { if (.Platform$OS.type != "windows") { skip("This test is only for Windows, skipped") } - testOutput <- launchScript("ECHO", "a/b/c", capture = TRUE) + testOutput <- launchScript("ECHO", "a/b/c", wait = TRUE) --- End diff -- We could, but unfortunately we don't actually call `launchScript` with wait/capture = TRUE; we call it with wait/capture = FALSE and expect the console/stdout output to leak through, returning NULL. I'll try to add a test for that.
[GitHub] spark pull request #15040: [WIP] [SPARK-17487] [SQL] Configurable bucketing ...
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/15040
[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16654 **[Test build #71799 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71799/testReport)** for PR 16654 at commit [`5937ce7`](https://github.com/apache/spark/commit/5937ce703df857b109982f49bca96b9c3c325587).
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16671 Using the *predicates* parameter to split the table seems reasonable, but in my opinion it just shifts work that should be done by Spark onto users. Users need to know how to split the table uniformly in the first place, so they may need an extra `count(*)` to explore the distribution of the table.
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71792/
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15505 Merged build finished. Test PASSed.
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16642 Merged build finished. Test PASSed.
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16642 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71793/
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #71792 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71792/testReport)** for PR 15505 at commit [`0e2dec5`](https://github.com/apache/spark/commit/0e2dec532780f7e3a5c31582732e10e85e80f1d9). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16642 **[Test build #71793 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71793/testReport)** for PR 16642 at commit [`f76f75b`](https://github.com/apache/spark/commit/f76f75b8e8ec804307c2b80ab4a7ceb02dcae716). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class SparkListenerExecutorBlacklisted(` * `case class SparkListenerExecutorUnblacklisted(time: Long, executorId: String)` * `case class SparkListenerNodeBlacklisted(` * `case class SparkListenerNodeUnblacklisted(time: Long, hostId: String)` * `case class QualifiedTableName(database: String, name: String)` * ` class MaintenanceTask(periodMs: Long, task: => Unit, onError: => Unit) ` * `class FindHiveSerdeTable(session: SparkSession) extends Rule[LogicalPlan] `
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15314 **[Test build #71798 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71798/testReport)** for PR 15314 at commit [`1d41615`](https://github.com/apache/spark/commit/1d41615863e7d4a0cc225a9a32cc1b175af22a49).
[GitHub] spark issue #15314: [SPARK-17747][ML] WeightCol support non-double numeric d...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/15314 jenkins, retest this please
[GitHub] spark issue #16671: [SPARK-19327][SparkSQL] a better balance partition metho...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16671 Yes, this solution is not suitable for large tables, but I cannot find a better one; this is the best optimisation I can come up with. So just add it as a choice, let users know what they are doing, and require an explicit enable. From my experience, the original equal-step method can cause problems with real data. This conclusion can be drawn from the spark-user mailing list and our real scenario. For example, users will partition the table by `id` because `id` is unique and indexed, but after many inserts and deletes the `id` range becomes very large, and the data ends up skewed by `id`. Very large tables are not so common, and if a large table is sharded, this method may be acceptable. My personal opinion is: > Giving users another choice may be valuable, as long as we do not enable it by default.
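The skew problem described above is exactly what sampling-based boundaries address: instead of cutting the `id` range into equal steps, cut a random sample at equal ranks. The sketch below is purely illustrative (not the PR's implementation) and assumes the partition column fits in a `Long`:

```scala
// Sketch: estimate partition boundaries from a sample of the partition column
// so each range holds roughly the same number of rows, even when the raw
// value range is skewed by inserts and deletes.
object SampleBounds {
  def quantileBounds(sample: Seq[Long], numPartitions: Int): Seq[Long] = {
    require(numPartitions >= 2 && sample.nonEmpty)
    val sorted = sample.sorted
    // numPartitions - 1 interior cut points at equal sample ranks.
    (1 until numPartitions).map { i =>
      sorted((i * sorted.length) / numPartitions)
    }
  }
}
```

With an equal-step split of a skewed `id` column, a few partitions get almost all rows; rank-based cut points keep partition sizes balanced at the cost of one sampling pass over the table.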
[GitHub] spark issue #16654: [SPARK-19303][ML][WIP] Add evaluate method in clustering...
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/16654 I think the current clustering metrics are not as general as the classification/regression metrics: WSSSE only applies to `KMeans` and `BiKMeans`, and log-likelihood only applies to `GMM`. I had opened a JIRA about a ClusteringEvaluator, https://issues.apache.org/jira/browse/SPARK-14516, which may add the metrics included in scikit-learn: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics.cluster @yanboliang @jkbradley What's your opinion?
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97213829 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala --- @@ -168,6 +168,43 @@ case class AlterTableRenameCommand( } /** + * A command that add columns to a table + * The syntax of using this command in SQL is: + * {{{ + * ALTER TABLE table_identifier + * ADD COLUMNS (col_name data_type [COMMENT col_comment], ...); + * }}} +*/ +case class AlterTableAddColumnsCommand( +table: TableIdentifier, +columns: Seq[StructField]) extends RunnableCommand { + override def run(sparkSession: SparkSession): Seq[Row] = { +val catalog = sparkSession.sessionState.catalog +val catalogTable = DDLUtils.verifyAlterTableAddColumn(catalog, table) + +// If an exception is thrown here we can just assume the table is uncached; +// this can happen with Hive tables when the underlying catalog is in-memory. +val wasCached = Try(sparkSession.catalog.isCached(table.unquotedString)).getOrElse(false) --- End diff -- `AlterTableRenameCommand` has a similar way of doing the uncaching. I thought there might be a reason it exists there, so I did the same. But looking at the code, it seems you are right. Thanks!
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16624 Please update the PR description.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97213702 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -584,14 +593,18 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
 // Sets the `schema`, `partitionColumnNames` and `bucketSpec` from the old table definition,
 // to retain the spark specific format if it is. Also add old data source properties to table
 // properties, to retain the data source table format.
- val oldDataSourceProps = oldTableDef.properties.filter(_._1.startsWith(DATASOURCE_PREFIX))
--- End diff --
I think the variable name needs to change, since now both Hive tables and data source tables populate the table properties with the schema, and both cases go through this path. I temporarily block ALTER TABLE ADD COLUMNS for data source tables because I am not yet confident the implementation has no holes. But according to @gatorsmile it may be safe to support data source tables too, so I am adding more test cases to confirm. I may remove the condition in this PR.
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16636 **[Test build #71796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71796/testReport)** for PR 16636 at commit [`e3cc423`](https://github.com/apache/spark/commit/e3cc423e2ecf3e9128b8036905c044a3f658cd25).
[GitHub] spark issue #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source Tables...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16587 **[Test build #71797 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71797/testReport)** for PR 16587 at commit [`c6d6a24`](https://github.com/apache/spark/commit/c6d6a2448d51633c22d730c60d219aa16ac81bb1).
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user xwu0226 commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97213578 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java ---
@@ -107,7 +107,13 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext taskAttemptCont
 footer = readFooter(configuration, file, range(split.getStart(), split.getEnd()));
 MessageType fileSchema = footer.getFileMetaData().getSchema();
 FilterCompat.Filter filter = getFilter(configuration);
- blocks = filterRowGroups(filter, footer.getBlocks(), fileSchema);
+ try {
+   blocks = filterRowGroups(filter, footer.getBlocks(), fileSchema);
+ } catch (IllegalArgumentException e) {
+   // In the case where a particular parquet files does not contain
--- End diff --
Yes, we can. Thanks!
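The diff above wraps row-group filtering in a try/catch so that a file missing a filtered column does not fail the whole read. A minimal sketch of the same fall-back pattern — the filter and block types here are simplified stand-ins, not the real parquet-mr API:

```scala
object RowGroupFilterFallback {
  final case class Block(id: Int, columns: Set[String])

  // Stand-in for filterRowGroups: throws when the filtered column is absent
  // from some row group, mimicking the IllegalArgumentException in parquet-mr.
  def filterRowGroups(column: String, blocks: Seq[Block]): Seq[Block] = {
    if (blocks.exists(!_.columns.contains(column)))
      throw new IllegalArgumentException(s"Column $column not found")
    blocks.filter(_.columns.contains(column))
  }

  // On failure, fall back to reading every row group unfiltered.
  def safeFilter(column: String, blocks: Seq[Block]): Seq[Block] =
    try filterRowGroups(column, blocks)
    catch { case _: IllegalArgumentException => blocks }
}
```

The trade-off is correctness over speed: a file with a mismatched schema is scanned without row-group pruning instead of raising an error to the caller.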
[GitHub] spark pull request #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture s...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16646
[GitHub] spark issue #16636: [SPARK-19279] [SQL] Block Creating a Hive Table With an ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16636 **[Test build #71795 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71795/testReport)** for PR 16636 at commit [`f99e078`](https://github.com/apache/spark/commit/f99e078dd677798c8d9674ea5e08e9a95b43c065).
[GitHub] spark issue #16587: [SPARK-19229] [SQL] Disallow Creating Hive Source Tables...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16587 retest this please
[GitHub] spark issue #16646: [SPARK-19291][SPARKR][ML] spark.gaussianMixture supports...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16646 Merged into master. If there are new comments about the model persistence compatibility issue, we can address them in follow-up work. Thanks for all your reviews.
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 For `SET -v` without sorting, please refer to #16624, too.
[GitHub] spark pull request #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegressi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16516
[GitHub] spark issue #16516: [SPARK-19155][ML] MLlib GeneralizedLinearRegression fami...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16516 Merged into master, branch-2.1 and branch-2.0. Thanks for all your reviews.
[GitHub] spark issue #16671: [SparkSQL] a better balance partition method for jdbc AP...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16671 @gatorsmile can you take a look?
[GitHub] spark issue #16671: [SparkSQL] a better balance partition method for jdbc AP...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16671 Table 2, with about 5M rows, split into 200 partitions by Spark SQL. (The table uses MySQL sharding, and every partition returns at most 10K rows.)

Old partition result (elements in each partition):
>1,49,54,53,60,59,48,61,52,57,60,69,58,57,50,52,51,66,58,45,59,52,61,56,67,51,45,49,70,49,58,59,61,53,50,53,47,50,46,53,55,53,62,55,48,58,52,62,62,37,65,59,58,55,61,59,46,53,49,49,61,72,60,46,50,51,45,47,55,63,64,63,55,47,65,57,60,60,51,45,48,77,58,57,59,39,50,62,55,57,49,63,51,38,49,66,62,58,53,54,50,54,52,69,51,49,61,60,64,49,52,50,54,58,48,51,50,49,41,68,54,45,65,62,44,52,64,58,47,51,65,47,37,42,39,44,51,65,56,54,69,51,61,63,51,52,47,55,58,66,47,54,53,53,60,66,66,68,64,66,55,58,64,55,50,57,46,56,39,60,57,63,40,51,56,58,44,46,46,44,42,52,52,44,53,46,55,57,68,57,62,48,47,52,59,58,49,44,52,47

(Most of the data is in partition 0, but each partition returns at most 10K rows because of our sharding limit.)

New partition result (elements in each partition):
>2083,1,1,6932,9799,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,8150,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,7,9,70,2,1,1,1,655,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,40,76,145,38,86,176,369,696,1338,2776,5381

count cost time: 0.8ms
[GitHub] spark issue #16661: [SPARK-19313][ML][MLLIB] GaussianMixture should limit th...
Github user sethah commented on the issue: https://github.com/apache/spark/pull/16661 ping @yanboliang
[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16228 **[Test build #71794 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71794/testReport)** for PR 16228 at commit [`7e52a83`](https://github.com/apache/spark/commit/7e52a837a984716ea8a0747c73f44b81bf592ff6).
[GitHub] spark issue #16671: [SparkSQL] a better balance partition method for jdbc AP...
Github user djvulee commented on the issue: https://github.com/apache/spark/pull/16671 Here is the real-data test result: a table with 1.2 million rows, split into 50 partitions by Spark SQL.

Old partition result (elements in each partition):
>100061,100064,100059,100066,100065,100065,100066,100066,100063,100061,100066,100065,70747,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

New partition result (elements in each partition):
>19543,19544,39083,39088,19544,19545,39085,19544,19542,19543,19545,39086,39087,19544,19545,39088,19544,19544,39088,19543,19545,39088,19544,19545,39088,19544,19544,39088,19544,19545,19543,19544,39086,19543,19545,39086,39086,19544,19545,39088,19544,19545,39088,19544,19544,39088,19544,19545,20701,0

count cost time: 1.27s
[GitHub] spark issue #16671: [SparkSQL] a better balance partition method for jdbc AP...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16671 Can one of the admins verify this patch?
[GitHub] spark pull request #16671: [SparkSQL] a better balance partition method for ...
GitHub user djvulee opened a pull request: https://github.com/apache/spark/pull/16671 [SparkSQL] a better balance partition method for jdbc API

## What changes were proposed in this pull request?
The partition method in `jdbc` uses equal-width steps, which can lead to skew between partitions. The new method introduces balanced partitioning based on the element distribution when splitting, which relieves the skew problem at a small query cost.

## How was this patch tested?
Unit tests and real data.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/djvulee/spark balancePartition
Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16671.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16671

commit 88cdf294aa579f65b8272870d762548cf54349ce
Author: DjvuLee
Date: 2017-01-20T09:53:57Z

[SparkSQL] a better balance partition method for jdbc API

The partition method in jdbc, when a partition column is specified, uses equal-width steps, which can lead to skew between partitions. The new method introduces partitioning based on the element distribution when splitting, which keeps the elements balanced between partitions.
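The idea described above can be sketched as follows: instead of splitting the [lower, upper] range into equal-width steps, pick boundaries so that each partition holds roughly the same number of elements. This is an illustrative sketch, not the actual patch; it assumes the sorted column values are available in memory, whereas the PR would derive boundaries with extra queries against the database:

```scala
object BalancedPartitions {
  // Given sorted column values, return n - 1 split points such that each of
  // the n resulting partitions holds roughly values.length / n elements.
  def splitPoints(values: Seq[Long], n: Int): Seq[Long] = {
    require(values.nonEmpty && n > 0)
    val step = values.length.toDouble / n
    // Take the value at each quantile position as a partition boundary.
    (1 until n).map(i => values((i * step).toInt.min(values.length - 1)))
  }
}
```

With a heavily skewed column (say, 97 rows with value 1 and three outliers at 10, 20, 30), equal-width steps over [1, 30] put almost everything in the first partition, while the quantile cuts above keep per-partition counts even.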
[GitHub] spark pull request #16594: [SPARK-17078] [SQL] Show stats when explain
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/16594#discussion_r97212822 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -649,6 +649,14 @@ object SQLConf {
 .doubleConf
 .createWithDefault(0.05)
+ val SHOW_STATS_IN_EXPLAIN =
--- End diff --
Then, when the stats are not accurate, will they be the cause of an inefficient plan? If so, why not show users the numbers?
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71788/ Test PASSed.
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16611 Merged build finished. Test PASSed.
[GitHub] spark issue #16611: [SPARK-17967][SPARK-17878][SQL][PYTHON] Support for arra...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16611 **[Test build #71788 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71788/testReport)** for PR 16611 at commit [`28abf86`](https://github.com/apache/spark/commit/28abf86f5543996c55910f8c097dc6ede10a7d86). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is ...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/16670#discussion_r97212748 --- Diff: R/pkg/inst/tests/testthat/test_Windows.R ---
@@ -20,7 +20,7 @@ test_that("sparkJars tag in SparkContext", {
 if (.Platform$OS.type != "windows") {
 skip("This test is only for Windows, skipped")
 }
- testOutput <- launchScript("ECHO", "a/b/c", capture = TRUE)
+ testOutput <- launchScript("ECHO", "a/b/c", wait = TRUE)
--- End diff --
Can we add a similar test with something getting printed on `stdout` from the JVM?
[GitHub] spark issue #16669: [SPARK-16101][SQL] Refactoring CSV read path to be consi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16669 Merged build finished. Test PASSed.
[GitHub] spark issue #16669: [SPARK-16101][SQL] Refactoring CSV read path to be consi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16669 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71789/ Test PASSed.
[GitHub] spark issue #16669: [SPARK-16101][SQL] Refactoring CSV read path to be consi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16669 **[Test build #71789 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71789/testReport)** for PR 16669 at commit [`b2938ae`](https://github.com/apache/spark/commit/b2938ae080ee7c36ef751b0bca57c2bfbdf99b43). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16624 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71791/ Test PASSed.
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16624 Merged build finished. Test PASSed.
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16624 **[Test build #71791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71791/testReport)** for PR 16624 at commit [`075f466`](https://github.com/apache/spark/commit/075f4667020438a650659197ac8212c785775e75). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #16626: [SPARK-19261][SQL] Alter add columns for Hive tab...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16626#discussion_r97212553 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -736,6 +736,22 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
 }

 /**
  * Create a [[AlterTableAddColumnsCommand]] command.
  *
  * For example:
  * {{{
  *   ALTER TABLE table1
  *   ADD COLUMNS (col_name data_type [COMMENT col_comment], ...);
  * }}}
  */
 override def visitAddTableColumns(ctx: AddTableColumnsContext): LogicalPlan = withOrigin(ctx) {
   AlterTableAddColumnsCommand(
     visitTableIdentifier(ctx.tableIdentifier),
     Option(ctx.columns).map(visitColTypeList).getOrElse(Nil)
--- End diff --
`columns` is not optional for this case.
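The `Option(ctx.columns).map(visitColTypeList).getOrElse(Nil)` idiom under discussion wraps a possibly-null ANTLR context: `Option(x)` yields `None` when `x` is `null`, so the visit only runs if the parser actually matched the clause. A minimal sketch of the pattern — the context class here is a hypothetical stand-in for the generated parser type:

```scala
object NullSafeVisit {
  // Hypothetical stand-in for an ANTLR context; the reference may be null
  // when the optional grammar clause was not matched.
  final case class ColTypeListContext(raw: Seq[String])

  def visitColTypeList(ctx: ColTypeListContext): Seq[String] =
    ctx.raw.map(_.toLowerCase)

  // Option(null) == None, so a missing clause maps to Nil instead of an NPE.
  def columnsOf(ctx: ColTypeListContext): Seq[String] =
    Option(ctx).map(visitColTypeList).getOrElse(Nil)
}
```

The reviewer's point is that when the grammar makes the clause mandatory, the context can never be null, so the `Option` wrapper is dead defensiveness.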
[GitHub] spark pull request #16659: [SPARK-19309][SQL] disable common subexpression e...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16659#discussion_r97212331 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/EquivalentExpressions.scala ---
@@ -67,28 +67,33 @@ class EquivalentExpressions {
 /**
  * Adds the expression to this data structure recursively. Stops if a matching expression
  * is found. That is, if `expr` has already been added, its children are not added.
- * If ignoreLeaf is true, leaf nodes are ignored.
  */
- def addExprTree(
-     root: Expression,
-     ignoreLeaf: Boolean = true,
-     skipReferenceToExpressions: Boolean = true): Unit = {
-   val skip = (root.isInstanceOf[LeafExpression] && ignoreLeaf) ||
+ def addExprTree(expr: Expression): Unit = {
+   val skip = expr.isInstanceOf[LeafExpression] ||
     // `LambdaVariable` is usually used as a loop variable, which can't be evaluated ahead of the
     // loop. So we can't evaluate sub-expressions containing `LambdaVariable` at the beginning.
-     root.find(_.isInstanceOf[LambdaVariable]).isDefined
+     expr.find(_.isInstanceOf[LambdaVariable]).isDefined
+
-   // There are some special expressions that we should not recurse into children.
+   // There are some special expressions that we should not recurse into all of its children.
    // 1. CodegenFallback: it's children will not be used to generate code (call eval() instead)
-   // 2. ReferenceToExpressions: it's kind of an explicit sub-expression elimination.
-   val shouldRecurse = root match {
-     // TODO: some expressions implements `CodegenFallback` but can still do codegen,
-     // e.g. `CaseWhen`, we should support them.
-     case _: CodegenFallback => false
-     case _: ReferenceToExpressions if skipReferenceToExpressions => false
-     case _ => true
+   // 2. If: common subexpressions will always be evaluated at the beginning, but the true and
--- End diff --
I just found that not all the children of `AtLeastNNonNulls` get accessed during evaluation either. Do we need to add it here too?
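The point of the `If` case in the diff above is that a subexpression appearing only inside one branch must not be hoisted and evaluated up front, because the other branch may be the one taken, or the eager evaluation may throw. A toy Scala illustration of the hazard (not the Catalyst implementation, just the underlying semantics):

```scala
object BranchHoisting {
  // A "subexpression" that throws for bad input, like a division.
  def divide(a: Int, b: Int): Int = a / b

  // Correct: the branch expression is evaluated only when its branch is taken.
  def lazyBranch(b: Int): Int = if (b != 0) divide(10, b) else 0

  // Hoisted: evaluating the subexpression before the condition, as a naive
  // common-subexpression elimination would do, throws ArithmeticException
  // for b == 0 and changes program behavior.
  def hoistedBranch(b: Int): Int = {
    val sub = divide(10, b)
    if (b != 0) sub else 0
  }
}
```

The same reasoning applies to short-circuiting expressions such as `AtLeastNNonNulls` mentioned in the comment: children past the short-circuit point may never be evaluated, so hoisting them is unsound.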
[GitHub] spark pull request #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable wor...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16593
[GitHub] spark issue #16642: [SPARK-19284][SQL]append to partitioned datasource table...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16642 **[Test build #71793 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71793/testReport)** for PR 16642 at commit [`f76f75b`](https://github.com/apache/spark/commit/f76f75b8e8ec804307c2b80ab4a7ceb02dcae716).
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16593 LGTM, merging to master!
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15505 **[Test build #71792 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71792/testReport)** for PR 15505 at commit [`0e2dec5`](https://github.com/apache/spark/commit/0e2dec532780f7e3a5c31582732e10e85e80f1d9).
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Merged build finished. Test PASSed.
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16552 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71785/
[GitHub] spark issue #16552: [SPARK-19152][SQL]DataFrameWriter.saveAsTable support hi...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16552 **[Test build #71785 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71785/testReport)** for PR 16552 at commit [`cb7a1be`](https://github.com/apache/spark/commit/cb7a1bed92a111f03dd1d7464c494be5b8fed502).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class SparkListenerExecutorBlacklisted(`
  * `case class SparkListenerExecutorUnblacklisted(time: Long, executorId: String)`
  * `case class SparkListenerNodeBlacklisted(`
  * `case class SparkListenerNodeUnblacklisted(time: Long, hostId: String)`
  * `class MaintenanceTask(periodMs: Long, task: => Unit, onError: => Unit)`
[GitHub] spark issue #15505: [SPARK-18890][CORE] Move task serialization from the Tas...
Github user witgo commented on the issue: https://github.com/apache/spark/pull/15505 @squito My understanding is that the `TaskSchedulerImpl` class contains many `synchronized` statements (synchronized methods). If one synchronized block's execution takes a very long time, it blocks all the other synchronized blocks, which reduces the throughput of the `TaskSchedulerImpl` instance.
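The contention pattern described above can be sketched in a few lines. This is an illustrative Python analogue (not Spark's actual `TaskSchedulerImpl`; all names are hypothetical): when every method of a scheduler-like object shares one lock, the way JVM `synchronized` methods share the instance monitor, a slow operation serializes fast, unrelated ones behind it:

```python
import threading
import time

class CoarseLockedScheduler:
    """Toy scheduler where one lock guards every method,
    mimicking per-instance `synchronized` methods on the JVM."""
    def __init__(self):
        self._lock = threading.Lock()  # single shared lock
        self.events = []

    def slow_update(self):
        with self._lock:               # holds the shared lock for a long time
            time.sleep(0.2)
            self.events.append("slow")

    def fast_lookup(self):
        with self._lock:               # must wait for slow_update to release
            self.events.append("fast")

sched = CoarseLockedScheduler()
t = threading.Thread(target=sched.slow_update)
t.start()
time.sleep(0.05)                       # let slow_update grab the lock first
sched.fast_lookup()                    # blocks until slow_update completes
t.join()
```

Because the fast call cannot overtake the slow one, `events` always ends up `["slow", "fast"]`. Moving expensive work (such as task serialization, the subject of this PR) out of the locked region shortens the critical section and relieves this bottleneck.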
[GitHub] spark pull request #15505: [SPARK-18890][CORE] Move task serialization from ...
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15505#discussion_r97211797

--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
@@ -602,6 +619,20 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
     Future.successful(false)
   }
-private[spark] object CoarseGrainedSchedulerBackend {
+private[spark] object CoarseGrainedSchedulerBackend extends Logging {
   val ENDPOINT_NAME = "CoarseGrainedScheduler"
+  // abort TaskSetManager without exception
+  def abortTaskSetManager(
+      scheduler: TaskSchedulerImpl,
+      taskId: Long,
+      msg: => String,
+      exception: Option[Throwable] = None): Unit = {
+    scheduler.taskIdToTaskSetManager.get(taskId).foreach { taskSetMgr =>
+      try {
+        taskSetMgr.abort(msg, exception)
--- End diff --

`taskSetMgr.abort` is thread-safe; it looks fine from the calling code.
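The shape of the "abort without exception" helper in the diff above can be sketched outside Spark. This is a hypothetical Python stand-in (the dictionary, fake manager, and function names are illustrative, not Spark's API): look up the manager for a task id and abort it, swallowing any error so the caller's scheduling loop is never brought down:

```python
import logging

def abort_task_set_manager(task_id_to_mgr, task_id, msg, exc=None):
    """Abort the manager owning task_id, if any; never raise."""
    mgr = task_id_to_mgr.get(task_id)
    if mgr is None:
        return False                    # unknown task id: nothing to abort
    try:
        mgr.abort(msg, exc)             # assumed thread-safe, per the review
        return True
    except Exception:
        logging.exception("abort failed for task %s", task_id)
        return False

class FakeMgr:
    """Minimal stand-in recording the abort it received."""
    def __init__(self):
        self.aborted = None
    def abort(self, msg, exc):
        self.aborted = (msg, exc)

mgrs = {7: FakeMgr()}
ok = abort_task_set_manager(mgrs, 7, "serialization failed")
missing = abort_task_set_manager(mgrs, 99, "no such task")
```

The `foreach`-over-`Option` in the Scala diff corresponds to the `None` check here: a missing task id is silently a no-op rather than an error.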
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16245 It is true, of course, that you can construct a combination of complex string operations and compare it with a simple Scala UDF. But as you said, the previous claim holds most of the time. I also think a Scala UDF is usually used to write complex logic that can't be achieved with built-in expressions.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211716

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala ---
@@ -514,6 +514,34 @@ case class OptimizeCodegen(conf: CatalystConf) extends Rule[LogicalPlan] {
 /**
+ * Reorders the predicates in `Filter` so more expensive expressions like UDF can evaluate later.
+ */
+object ReorderPredicatesInFilter extends Rule[LogicalPlan] with PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+    case f @ Filter(pred, child) =>
+      // Extracts deterministic suffix expressions from Filter predicate.
+      val expressions = splitConjunctivePredicates(pred)
+      // The beginning index of the deterministic suffix expressions.
+      var splitIndex = -1
+      (expressions.length - 1 to 0 by -1).foreach { idx =>
+        if (splitIndex == -1 && !expressions(idx).deterministic) {
+          splitIndex = idx + 1
+        }
+      }
+      if (splitIndex == expressions.length) {
+        // All expressions are non-deterministic, no reordering.
+        f
+      } else {
+        val (nonDeterminstics, deterministicExprs) = expressions.splitAt(splitIndex)
--- End diff --

Hmm, actually that's what I meant; perhaps some confusion between `non-deterministic` and `non-foldable`? I think we can skip both of them in short-circuit evaluation. Since those expressions are not `stateful` (unfortunately, Spark SQL expressions don't have the concept of `stateful`), skipping their evaluation is harmless, and that is exactly the short-circuit logic of the `AND` expression.
[GitHub] spark issue #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is getting...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16670 Merged build finished. Test PASSed.
[GitHub] spark issue #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is getting...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16670 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71790/
[GitHub] spark issue #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is getting...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16670 **[Test build #71790 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71790/testReport)** for PR 16670 at commit [`294ce99`](https://github.com/apache/spark/commit/294ce991d2e1c8d7a38b526ccf4f35a7ac41fbc1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/16245 I think `Scala UDF needs extra conversion between internal format and external format on input and output` is true most of the time, but not all of the time. For example, some built-in string-based operations and their combinations are also quite heavy to evaluate, and most likely this would concern an experienced SQL developer who writes optimal (business-related short-circuiting) SQL expressions.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211599

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (same `ReorderPredicatesInFilter` hunk as quoted above) ---

Yes. However, if the first expression in the `AND` is non-deterministic, skipping it might change its next evaluation, so we can only reorder the deterministic expressions that come after the non-deterministic ones.
[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15219 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71787/
[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15219 **[Test build #71787 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71787/testReport)** for PR 15219 at commit [`b15d9d5`](https://github.com/apache/spark/commit/b15d9d5724936f5946d99acc40b75754e8583aa6).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15219: [SPARK-14098][SQL] Generate Java code to build CachedCol...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15219 Merged build finished. Test FAILed.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211489

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (same `ReorderPredicatesInFilter` hunk as quoted above) ---

I mean `(rand() > 0) && b` should be equal to `b && (rand() > 0)`, and the latter probably even performs better, thanks to the short-circuit evaluation of `AND`, doesn't it?
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/71784/
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16593 Merged build finished. Test PASSed.
[GitHub] spark issue #16593: [SPARK-19153][SQL]DataFrameWriter.saveAsTable work with ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16593 **[Test build #71784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71784/testReport)** for PR 16593 at commit [`7bdc265`](https://github.com/apache/spark/commit/7bdc265500cbfd6b4dc16ec6a6ce7c321e7dd3dc).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16245 I think most of the time it should be, as a Scala UDF needs extra conversion between the internal and external formats on input and output.
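The conversion cost mentioned above can be made concrete with a toy model. This Python sketch is purely illustrative (it is not Spark code; the byte string stands in for Spark's internal `UTF8String`-style representation): a built-in operator works on the internal format directly, while a UDF call forces a round trip through the external type the user function expects:

```python
# Toy model of built-in expression vs. UDF evaluation cost.

def builtin_upper(internal_bytes):
    # A "built-in" can operate on the engine's internal format directly.
    return internal_bytes.upper()

def run_udf(udf, internal_bytes):
    external = internal_bytes.decode("utf-8")  # internal -> external conversion
    result = udf(external)                      # user code on the external type
    return result.encode("utf-8")               # external -> internal conversion

row = "spark".encode("utf-8")
via_builtin = builtin_upper(row)
via_udf = run_udf(lambda s: s.upper(), row)
```

Both paths compute the same value, but the UDF path pays two extra conversions per row, which is why the optimizer rule in this PR prefers to evaluate UDF predicates last.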
[GitHub] spark issue #16245: [SPARK-18824][SQL] Add optimizer rule to reorder Filter ...
Github user chenghao-intel commented on the issue: https://github.com/apache/spark/pull/16245 Actually I doubt this is really an optimization, as the assumption that a Scala UDF is slower than non-Scala-UDF expressions is probably not always true.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211371

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (same `ReorderPredicatesInFilter` hunk as quoted above) ---

Reordering non-deterministic expressions might change the evaluation results. I think `foldable` expressions are already handled by another rule. And I remember we don't have an explicit `stateful` kind of expression; it is classified as `non-deterministic` too.
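The split being debated above is simple to state: keep every predicate up to and including the last non-deterministic one in its original order, and treat only the trailing deterministic suffix as safe to reorder. A language-neutral Python sketch of that `splitIndex` logic (a simplified stand-in, not the Catalyst rule itself; predicates are modeled as `(label, deterministic)` pairs):

```python
def split_reorderable(preds):
    """preds: list of (label, deterministic) pairs.
    Returns (fixed_prefix, reorderable_suffix): the prefix ends at the
    last non-deterministic predicate, mirroring the splitIndex loop in
    the diff. If no predicate is non-deterministic, everything is
    reorderable."""
    split = 0
    for i, (_, deterministic) in enumerate(preds):
        if not deterministic:
            split = i + 1  # suffix can only start after this point
    return preds[:split], preds[split:]

preds = [
    ("a > 0", True),
    ("rand() > 0.5", False),  # non-deterministic: pins everything before it
    ("udf(b)", True),
    ("c = 1", True),
]
prefix, suffix = split_reorderable(preds)
```

Only `suffix` (here the expensive `udf(b)` and the cheap `c = 1`) may then be sorted cheap-first; moving `udf(b)` ahead of `rand() > 0.5` could change how often the random generator is consulted, which is exactly viirya's point.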
[GitHub] spark issue #16596: [SPARK-19237][SPARKR][WIP] R should check for java when ...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/16596 I've found the root cause from my investigation, but I need to test the fix cross-platform.
[GitHub] spark pull request #16245: [SPARK-18824][SQL] Add optimizer rule to reorder ...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/16245#discussion_r97211330

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala (same `ReorderPredicatesInFilter` hunk as quoted above) ---

I am a little confused about why we need to separate out the `non-deterministic` expressions. Should it be `stateful` or `foldable` instead?
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 The final failure, `HiveSparkSubmitSuite.dir`, is unrelated to this PR.
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16624 **[Test build #71791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71791/testReport)** for PR 16624 at commit [`075f466`](https://github.com/apache/spark/commit/075f4667020438a650659197ac8212c785775e75).
[GitHub] spark issue #16579: [SPARK-19218][SQL] Fix SET command to show a result corr...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16579 Hi, @srowen and @gatorsmile. Finally, this PR resolved all issues. Could you review this again?
[GitHub] spark pull request #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is ...
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/16670#discussion_r97211236

--- Diff: R/pkg/R/utils.R ---
@@ -756,12 +756,12 @@ varargsToJProperties <- function(...) {
   props
 }
-launchScript <- function(script, combinedArgs, capture = FALSE) {
+launchScript <- function(script, combinedArgs, wait = FALSE) {
   if (.Platform$OS.type == "windows") {
     scriptWithArgs <- paste(script, combinedArgs, sep = " ")
-    shell(scriptWithArgs, translate = TRUE, wait = capture, intern = capture) # nolint
+    shell(scriptWithArgs, translate = TRUE, wait = wait, intern = wait) # nolint
   } else {
-    system2(script, combinedArgs, wait = capture, stdout = capture)
+    system2(script, combinedArgs, wait = wait)
--- End diff --

http://www.astrostatistics.psu.edu/datasets/R/html/base/html/shell.html On Windows, `intern = FALSE` seems to mean the output goes to the console. (The doc page is missing on stat.ethz.ch.)
[GitHub] spark issue #16624: [WIP] Fix `SET -v` not to raise exceptions for configs w...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/16624 Retest this please
[GitHub] spark issue #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is getting...
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16670

**[Test build #71790 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/71790/testReport)** for PR 16670 at commit [`294ce99`](https://github.com/apache/spark/commit/294ce991d2e1c8d7a38b526ccf4f35a7ac41fbc1).
[GitHub] spark pull request #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is ...
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/16670#discussion_r97211194

--- Diff: R/pkg/R/utils.R ---

```diff
@@ -756,12 +756,12 @@ varargsToJProperties <- function(...) {
   props
 }

-launchScript <- function(script, combinedArgs, capture = FALSE) {
+launchScript <- function(script, combinedArgs, wait = FALSE) {
   if (.Platform$OS.type == "windows") {
     scriptWithArgs <- paste(script, combinedArgs, sep = " ")
-    shell(scriptWithArgs, translate = TRUE, wait = capture, intern = capture) # nolint
+    shell(scriptWithArgs, translate = TRUE, wait = wait, intern = wait) # nolint
   } else {
-    system2(script, combinedArgs, wait = capture, stdout = capture)
+    system2(script, combinedArgs, wait = wait)
```

--- End diff --

http://stat.ethz.ch/R-manual/R-devel/library/base/html/system2.html

`stdout = FALSE` means "discard output"; `stdout = ""` (the default) means output goes to the console.
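For reference, a minimal sketch (not part of the PR) of how `system2()` treats its `stdout` argument, assuming an `echo` binary is on the PATH:

```r
# stdout = "" (the default): the child's output goes to the R console;
# the return value is the command's exit status.
status <- system2("echo", "hello")

# stdout = FALSE: the child's output is discarded entirely.
system2("echo", "hello", stdout = FALSE)

# stdout = TRUE: the output is captured and returned as a character vector.
captured <- system2("echo", "hello", stdout = TRUE)
```

Dropping `stdout = capture` in the non-Windows branch therefore falls back to the default `stdout = ""`, which is exactly what keeps JVM output visible on the console.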
[GitHub] spark pull request #16670: [SPARK-19324][SPARKR] Spark VJM stdout output is ...
GitHub user felixcheung opened a pull request:

https://github.com/apache/spark/pull/16670

[SPARK-19324][SPARKR] Spark VJM stdout output is getting dropped in SparkR

## What changes were proposed in this pull request?

This mostly affects running a job from the driver in client mode when results are expected on stdout (which should be somewhat rare, but possible).

Before:
```
> a <- as.DataFrame(cars)
> b <- group_by(a, "dist")
> c <- count(b)
> sparkR.callJMethod(c$count@jc, "explain", TRUE)
NULL
```

After:
```
> a <- as.DataFrame(cars)
> b <- group_by(a, "dist")
> c <- count(b)
> sparkR.callJMethod(c$count@jc, "explain", TRUE)
count#11L
NULL
```

Now, `column.explain()` doesn't seem very useful (we can get more extensive output with `DataFrame.explain()`), but there are other, more complex examples with calls to `println` on the Scala/JVM side.

## How was this patch tested?

manual

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/felixcheung/spark rjvmstdout

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16670.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #16670

commit 294ce991d2e1c8d7a38b526ccf4f35a7ac41fbc1
Author: Felix Cheung
Date: 2017-01-22T02:14:06Z

    do not drop stdout