[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18174
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79123/
Test PASSed.





[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18174
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18174
  
**[Test build #79123 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79123/testReport)** for PR 18174 at commit [`3efc743`](https://github.com/apache/spark/commit/3efc7433802155c957e78d23abf4847cde8e0d07).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18520
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18520
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79124/
Test PASSed.





[GitHub] spark issue #18523: [SPARK-21285][ML] VectorAssembler reports the column nam...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18523
  
Can one of the admins verify this patch?





[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18520
  
**[Test build #79124 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79124/testReport)** for PR 18520 at commit [`0b9f860`](https://github.com/apache/spark/commit/0b9f860cee44bb06feeb291b566243e139cbaf28).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18523: [SPARK-21285][ML] VectorAssembler reports the col...

2017-07-03 Thread facaiy
GitHub user facaiy opened a pull request:

https://github.com/apache/spark/pull/18523

[SPARK-21285][ML] VectorAssembler reports the column name of unsupported data type

## What changes were proposed in this pull request?
Add the column name to the exception that is raised for an unsupported data type.
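A minimal sketch of the intended behavior (the method name and the supported-type list here are illustrative assumptions, not VectorAssembler's actual code):

```scala
import org.apache.spark.sql.types.{BooleanType, DataType, NumericType}

object AssemblerCheckSketch {
  // Validate one input column and name the offending column on failure.
  def validateInputColumn(name: String, dataType: DataType): Unit = dataType match {
    case _: NumericType => () // numeric columns: supported
    case BooleanType    => () // boolean columns: supported
    case other =>
      // The error now carries the column name, not just the type.
      throw new IllegalArgumentException(
        s"Data type $other of column $name is not supported.")
  }
}
```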

## How was this patch tested?
- [ ] pass all tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/facaiy/spark ENH/vectorassembler_add_col

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18523.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18523


commit 95dbf6c7b287d0010af9de377ff6b93dec760808
Author: Yan Facai (颜发才) 
Date:   2017-07-04T05:42:07Z

ENH: report the name of missing column







[GitHub] spark issue #18519: [SPARK-16742] kerberos

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18519
  
Not a big deal, but could we make the PR title a bit more descriptive?





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18511
  
Not a big deal, but I would like to suggest making the title more descriptive.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #79130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79130/testReport)** for PR 17848 at commit [`0aa6475`](https://github.com/apache/spark/commit/0aa64755009701c1d37de27c48926b4f46373fa8).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18511
  
Not a big deal, but I would like to suggest making the title more descriptive.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17848
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79130/
Test FAILed.





[GitHub] spark issue #17848: [SPARK-20586] [SQL] Add deterministic and distinctLike t...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17848
  
**[Test build #79130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79130/testReport)** for PR 17848 at commit [`0aa6475`](https://github.com/apache/spark/commit/0aa64755009701c1d37de27c48926b4f46373fa8).





[GitHub] spark pull request #18522: [MINOR]Closes stream and releases any system reso...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18522#discussion_r125392441
  
--- Diff: core/src/test/scala/org/apache/spark/util/UtilsSuite.scala ---
@@ -488,7 +488,7 @@ class UtilsSuite extends SparkFunSuite with ResetSystemProperties with Logging {
 
   test("resolveURIs with multiple paths") {
 def assertResolves(before: String, after: String): Unit = {
-  assume(before.split(",").length > 1)
+  assume(before.split(",").length >= 1)
--- End diff --

BTW, why do we fix this?
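
For context, `split` on input that lacks the delimiter still returns one element, so the relaxed `>= 1` assumption holds for every input and the guard becomes vacuous. A runnable check:

```scala
object SplitAssumption extends App {
  // A single path has no comma, so split still yields one element.
  println("only-one-path".split(",").length) // 1: satisfies `>= 1` but not `> 1`
  println("path1,path2".split(",").length)   // 2: satisfies both assumptions
}
```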





[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18521
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18521
  
**[Test build #79128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79128/testReport)** for PR 18521 at commit [`5b80a8b`](https://github.com/apache/spark/commit/5b80a8b92273e9abf6ce8b28dcd70fbb32d4613c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18521
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79128/
Test PASSed.





[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18511
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18511
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79122/
Test PASSed.





[GitHub] spark issue #18511: [SPARK-21286][Test] Modified a unit test

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18511
  
**[Test build #79122 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79122/testReport)** for PR 18511 at commit [`1d098ab`](https://github.com/apache/spark/commit/1d098abb7c087fa26c3cae1eb8c8dd8ffbe8530b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17985: Add "full_outer" name to join types

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17985
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79126/
Test FAILed.





[GitHub] spark issue #17985: Add "full_outer" name to join types

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17985
  
Merged build finished. Test FAILed.





[GitHub] spark issue #17985: Add "full_outer" name to join types

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17985
  
**[Test build #79126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79126/testReport)** for PR 17985 at commit [`9fc9a0a`](https://github.com/apache/spark/commit/9fc9a0ad567dfb28d22d94321fcef0ea3b1ae73b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18469
  
**[Test build #79129 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79129/testReport)** for PR 18469 at commit [`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).





[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18469
  
retest this please





[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18159#discussion_r125390211
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
@@ -47,10 +56,73 @@ trait RunnableCommand extends logical.Command {
 }
 
 /**
+ * A special `RunnableCommand` which writes data out and updates metrics.
+ */
+trait DataWritingCommand extends RunnableCommand {
+
+  override lazy val metrics: Map[String, SQLMetric] = {
+val sparkContext = SparkContext.getActive.get
+Map(
+  "avgTime" -> SQLMetrics.createMetric(sparkContext, "average writing 
time (ms)"),
+  "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of 
written files"),
+  "numOutputBytes" -> SQLMetrics.createMetric(sparkContext, "bytes of 
written output"),
+  "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of 
output rows"),
+  "numParts" -> SQLMetrics.createMetric(sparkContext, "number of 
dynamic part")
+)
+  }
+
+  /**
+   * Callback function that updates metrics collected from the writing operation.
+   */
+  protected def updateWritingMetrics(writeSummaries: Seq[ExecutedWriteSummary]): Unit = {
+val sparkContext = SparkContext.getActive.get
+var numPartitions = 0
+var numFiles = 0
+var totalNumBytes: Long = 0L
+var totalNumOutput: Long = 0L
+var totalWritingTime: Long = 0L
+var numFilesNonZeroWritingTime = 0
+
+writeSummaries.foreach { summary =>
+  numPartitions += summary.updatedPartitions.size
+  numFiles += summary.numOutputFile
+  totalNumBytes += summary.numOutputBytes
+  totalNumOutput += summary.numOutputRows
+  totalWritingTime += summary.totalWritingTime
+  numFilesNonZeroWritingTime += summary.numFilesWithNonZeroWritingTime
+}
+
+// We only count non-zero writing time when averaging total writing time.
+// The time for writing an individual file can be zero if it's less than 1 ms. Zero values can
--- End diff --

I guess this should be rare?
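
For reference, a hedged sketch of the averaging step under discussion, using the counter names accumulated above (the zero-divisor guard is an assumption about intent, not the PR's exact code):

```scala
object AvgWritingTime {
  // Mean per-file writing time in ms; files measured as 0 ms are excluded
  // from the denominator so they do not drag the average toward zero.
  def avgMs(totalWritingTime: Long, numFilesNonZeroWritingTime: Int): Long =
    if (numFilesNonZeroWritingTime == 0) 0L
    else totalWritingTime / numFilesNonZeroWritingTime
}
```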





[GitHub] spark issue #18159: [SPARK-20703][SQL] Associate metrics with data writes on...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18159
  
LGTM except some minor comments, thanks for working on it!





[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18469
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18469
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79125/
Test FAILed.





[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18159#discussion_r125389877
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala ---
@@ -47,10 +56,73 @@ trait RunnableCommand extends logical.Command {
 }
 
 /**
+ * A special `RunnableCommand` which writes data out and updates metrics.
+ */
+trait DataWritingCommand extends RunnableCommand {
+
+  override lazy val metrics: Map[String, SQLMetric] = {
+val sparkContext = SparkContext.getActive.get
+Map(
+  "avgTime" -> SQLMetrics.createMetric(sparkContext, "average writing 
time (ms)"),
+  "numFiles" -> SQLMetrics.createMetric(sparkContext, "number of 
written files"),
+  "numOutputBytes" -> SQLMetrics.createMetric(sparkContext, "bytes of 
written output"),
+  "numOutputRows" -> SQLMetrics.createMetric(sparkContext, "number of 
output rows"),
+  "numParts" -> SQLMetrics.createMetric(sparkContext, "number of 
dynamic part")
+)
+  }
+
+  /**
+   * Callback function that updates metrics collected from the writing operation.
+   */
+  protected def updateWritingMetrics(writeSummaries: Seq[ExecutedWriteSummary]): Unit = {
+val sparkContext = SparkContext.getActive.get
+var numPartitions = 0
+var numFiles = 0
+var totalNumBytes: Long = 0L
+var totalNumOutput: Long = 0L
+var totalWritingTime: Long = 0L
+var numFilesNonZeroWritingTime = 0
+
+writeSummaries.foreach { summary =>
+  numPartitions += summary.updatedPartitions.size
+  numFiles += summary.numOutputFile
+  totalNumBytes += summary.numOutputBytes
+  totalNumOutput += summary.numOutputRows
+  totalWritingTime += summary.totalWritingTime
+  numFilesNonZeroWritingTime += summary.numFilesWithNonZeroWritingTime
+}
+
+// We only count non-zero writing time when averaging total writing time.
+// The time for writing an individual file can be zero if it's less than 1 ms. Zero values can
--- End diff --

This only happens if a partition is very small, right?





[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18469
  
**[Test build #79125 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79125/testReport)** for PR 18469 at commit [`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18159#discussion_r125389753
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala ---
@@ -314,21 +339,40 @@ object FileFormatWriter extends Logging {
 
   recordsInFile = 0
   releaseResources()
+  numOutputRows += recordsInFile
   newOutputWriter(fileCounter)
 }
 
 val internalRow = iter.next()
+val startTime = System.nanoTime()
 currentWriter.write(internalRow)
+timeOnCurrentFile += (System.nanoTime() - startTime)
--- End diff --

instead of tracking the time here, how about we do it in `newOutputWriter`?
```
var startTime = -1L
def newOutputWriter(): Unit = {
  if (startTime == -1L) {
    startTime = System.nanoTime()
  } else {
    val currentTime = System.nanoTime()
    totalWritingTime += currentTime - startTime
    startTime = currentTime
  }
}
```





[GitHub] spark issue #18501: [SPARK-20256][SQL] SessionState should be created more l...

2017-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18501
  
Hi, @cloud-fan and @gatorsmile .
I'm back to this PR. 
Although this introduces a new concept, it could be a solution given the current relationship between `SparkContext` and `SparkSession`.
What do you think about this approach?





[GitHub] spark issue #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-record type...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18521
  
**[Test build #79128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79128/testReport)** for PR 18521 at commit [`5b80a8b`](https://github.com/apache/spark/commit/5b80a8b92273e9abf6ce8b28dcd70fbb32d4613c).





[GitHub] spark pull request #18159: [SPARK-20703][SQL] Associate metrics with data wr...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18159#discussion_r125389285
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/commands.scala 
---
@@ -47,10 +56,73 @@ trait RunnableCommand extends logical.Command {
 }
 
 /**
+ * A special `RunnableCommand` which writes data out and updates metrics.
+ */
+trait DataWritingCommand extends RunnableCommand {
--- End diff --

let's move it to a new file





[GitHub] spark issue #18501: [SPARK-20256][SQL] SessionState should be created more l...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18501
  
**[Test build #79127 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79127/testReport)** for PR 18501 at commit [`137f252`](https://github.com/apache/spark/commit/137f252c79f3f044507a453320a66ac6d0cb6334).





[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic and distinc...

2017-07-03 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/17848#discussion_r125388937
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -85,8 +94,9 @@ case class UserDefinedFunction protected[sql] (
* @since 2.3.0
*/
   def withName(name: String): this.type = {
-    this._nameOption = Option(name)
-    this
+    val udf = copyAll()
+    udf._nameOption = Option(name)
--- End diff --

Yeah, I know. I just meant we'd add an interface `newInstance(name, nullable, determinism)` there.





[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17865
  
Build finished. Test PASSed.





[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17865
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79118/
Test PASSed.





[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...

2017-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18468#discussion_r125388507
  
--- Diff: core/src/main/java/org/apache/spark/memory/MemoryMode.java ---
@@ -22,5 +22,6 @@
 @Private
 public enum MemoryMode {
   ON_HEAP,
-  OFF_HEAP
+  OFF_HEAP,
+  ON_HEAP_CACHEDBATCH
--- End diff --

The current implementation relies on the memory mode to decide which kind of `ColumnVector` to allocate.
If we do not add a new memory mode, I think we have to introduce additional conditional branches in the getters/setters.
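
A rough sketch of that trade-off (types and names are illustrative, not the actual `ColumnVector` API):

```scala
sealed trait Storage
case object PlainArray extends Storage
case object CompressedBatch extends Storage

final class IntColumn(storage: Storage, plain: Array[Int], compressed: Array[Byte]) {
  // Without a dedicated memory mode, every accessor pays this branch.
  def getInt(rowId: Int): Int = storage match {
    case PlainArray      => plain(rowId)                 // direct array read
    case CompressedBatch => decodeInt(compressed, rowId) // decompress-then-read
  }
  private def decodeInt(buf: Array[Byte], rowId: Int): Int = ??? // decoding elided
}
```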






[GitHub] spark issue #17865: [SPARK-20456][Docs] Add examples for functions collectio...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17865
  
**[Test build #79118 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79118/testReport)** for PR 17865 at commit [`f17f332`](https://github.com/apache/spark/commit/f17f332dd97b948f8dd31eb2b18c1e11dc7fead0).
 * This patch passes all tests.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...

2017-07-03 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18468#discussion_r125388308
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapCachedBatch.java ---
@@ -0,0 +1,403 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.vectorized;
+
+import java.nio.ByteBuffer;
+
+import org.apache.spark.memory.MemoryMode;
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
+import org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder;
+import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter;
+import org.apache.spark.sql.execution.columnar.*;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A column backed by an in memory JVM array.
+ */
+public final class OnHeapCachedBatch extends ColumnVector implements java.io.Serializable {
+
+  // keep compressed data
+  private byte[] buffer;
+
+  // whether a row is already extracted or not. If extractTo() is called, set true
+  // e.g. when isNullAt() and getInt() are called, extractTo() must be called only once
+  private boolean[] calledExtractTo;
+
+  // a row where the compressed data is extracted
+  private transient UnsafeRow unsafeRow;
+  private transient BufferHolder bufferHolder;
+  private transient UnsafeRowWriter rowWriter;
+  private transient MutableUnsafeRow mutableRow;
+
+  // accessor for a column
+  private transient ColumnAccessor columnAccessor;
+
+  // an accessor uses only column 0
+  private final int ORDINAL = 0;
+
+  protected OnHeapCachedBatch(int capacity, DataType type) {
+super(capacity, type, MemoryMode.ON_HEAP_CACHEDBATCH);
+reserveInternal(capacity);
+reset();
+  }
+
+  @Override
+  public long valuesNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+  @Override
+  public long nullsNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+
+  @Override
+  public void close() {
+  }
+
+  private void initialize() {
+if (columnAccessor == null) {
+  setColumnAccessor();
+}
+if (mutableRow == null) {
+  setRowSetter();
+}
+  }
+
+  private void setColumnAccessor() {
+ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
+columnAccessor = ColumnAccessor$.MODULE$.apply(type, byteBuffer);
+calledExtractTo = new boolean[capacity];
+  }
+
+  private void setRowSetter() {
+unsafeRow = new UnsafeRow(1);
+bufferHolder = new BufferHolder(unsafeRow);
+rowWriter = new UnsafeRowWriter(bufferHolder, 1);
+mutableRow = new MutableUnsafeRow(rowWriter);
+  }
+
+  // call extractTo() before getting actual data
+  private void prepareRowAccess(int rowId) {
--- End diff --

I agree with you. We can optimize these accesses by enhancing the existing APIs.
Should we address these extensions in this PR?





[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5

2017-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18502
  
I'm going to close this PR for a while. Thank you, and sorry for this PR, @srowen.





[GitHub] spark pull request #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10....

2017-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/18502





[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic and distinc...

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17848#discussion_r125387952
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
@@ -85,8 +94,9 @@ case class UserDefinedFunction protected[sql] (
* @since 2.3.0
*/
   def withName(name: String): this.type = {
-    this._nameOption = Option(name)
-    this
+    val udf = copyAll()
+    udf._nameOption = Option(name)
--- End diff --

@maropu We should make a copy when calling `withName`, instead of returning this object.
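
A minimal copy-on-write sketch of that suggestion (a simplified stand-in, not the actual `UserDefinedFunction`; field names are assumptions):

```scala
object WithNameSketch extends App {
  case class Udf(f: Int => Int, name: Option[String] = None) {
    // Return a fresh copy instead of mutating and returning `this`.
    def withName(n: String): Udf = copy(name = Option(n))
  }

  val base  = Udf(f = _ + 1)
  val named = base.withName("plusOne")
  assert(base.name.isEmpty && named.name.contains("plusOne")) // original untouched
}
```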





[GitHub] spark issue #18522: [MINOR]Closes stream and releases any system resources a...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18522
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79120/
Test PASSed.





[GitHub] spark issue #18522: [MINOR]Closes stream and releases any system resources a...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18522
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18522: [MINOR]Closes stream and releases any system resources a...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18522
  
**[Test build #79120 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79120/testReport)** for PR 18522 at commit [`c0cf41d`](https://github.com/apache/spark/commit/c0cf41d2d7aeb0d02ed3593464072f3b083f3f6f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5

2017-07-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/18502
  
Actually, the Spark failure is due to a flaky test.

However, for the PySpark failures, we are hitting https://github.com/bartdag/py4j/issues/278.

We need to wait for Py4J 0.10.6.





[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125387151
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }
 
 
-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
 """
 Verify the type of obj against dataType, raise a TypeError if they do 
not match.
 
 Also verify the value of obj against datatype, raise a ValueError if 
it's not within the allowed
 range, e.g. using 128 as ByteType will overflow. Note that, Python 
float is not checked, so it
 will become infinity when cast to Java float if it overflows.
 
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType())) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([])) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1]) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
>>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
->>> _verify_type(None, ByteType(), False) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+... ArrayType(ShortType(), False))([1, None]) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 1})
Traceback (most recent call last):
...
ValueError:...
>>> schema = StructType().add("a", IntegerType()).add("b", StringType(), False)
->>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None)) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
 """
-    if obj is None:
-        if nullable:
-            return
+
+    if name is None:
+        new_msg = lambda msg: msg
+        new_name = lambda n: "field %s" % n
+    else:
+        new_msg = lambda msg: "%s: %s" % (name, msg)
+        new_name = lambda n: "field %s in %s" % (n, name)
+
+    def verify_nullability(obj):
+        if obj is None:
+            if nullable:
+                return True
+            else:
+                raise ValueError(new_msg("This field is not nullable, but got None"))
        else:
-            raise ValueError("This field is not nullable, but got None")
+            return False
 
--- End diff --

sounds good. Will give a shot.
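
For reference, the build-once/verify-many pattern being discussed, sketched here in Scala with toy types (the actual change lives in `python/pyspark/sql/types.py`):

```scala
object VerifierFactorySketch extends App {
  sealed trait DataType
  case object StringType extends DataType
  case object LongType   extends DataType

  // Resolve the per-type check once, when the verifier is built...
  def makeVerifier(dt: DataType, nullable: Boolean = true): Any => Unit = {
    val verifyValue: Any => Unit = dt match {
      case StringType => {
        case _: String => ()
        case v         => sys.error(s"not a string: $v")
      }
      case LongType => {
        case _: Long => ()
        case v       => sys.error(s"not a long: $v")
      }
    }
    // ...so the closure applied per record only checks null and the value.
    obj =>
      if (obj == null) {
        if (!nullable) sys.error("This field is not nullable, but got None")
      } else verifyValue(obj)
  }

  val verify = makeVerifier(LongType)
  Seq[Any](1L, 2L).foreach(verify) // built once, applied to many records
}
```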





[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125386646
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }
 
 
-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
 """
 Verify the type of obj against dataType, raise a TypeError if they do 
not match.
 
 Also verify the value of obj against datatype, raise a ValueError if 
it's not within the allowed
 range, e.g. using 128 as ByteType will overflow. Note that, Python 
float is not checked, so it
 will become infinity when cast to Java float if it overflows.
 
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType())) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([])) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1]) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
>>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
->>> _verify_type(None, ByteType(), False) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+... ArrayType(ShortType(), False))([1, None]) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 1})
Traceback (most recent call last):
...
ValueError:...
>>> schema = StructType().add("a", IntegerType()).add("b", StringType(), False)
->>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None)) # doctest: +IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:...
 """
-if obj is None:
-if nullable:
-return
+
+if name is None:
+new_msg = lambda msg: msg
+new_name = lambda n: "field %s" % n
+else:
+new_msg = lambda msg: "%s: %s" % (name, msg)
+new_name = lambda n: "field %s in %s" % (n, name)
+
+def verify_nullability(obj):
+if obj is None:
+if nullable:
+return True
+else:
+raise ValueError(new_msg("This field is not nullable, but 
got None"))
 else:
-raise ValueError("This field is not nullable, but got None")
+return False
 
--- End diff --

how about:
```
def verify_nullability(obj): ...

if isinstance(dataType, StringType):
  def verify_string(obj): ...
  verify_value = verify_string
elif ...

def verify(obj):
  if (verify_nullability(obj)):
return None
  verify_value(obj)
return verify
```
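
A minimal runnable sketch of the suggested factory pattern, using simplified 
stand-in types rather than pyspark's real classes (the point being that the 
`isinstance` dispatch runs once per schema, while the returned closure runs 
per record):

```
# Stand-in types for illustration only; not pyspark's classes.
class StringType(object):
    pass

class LongType(object):
    pass

def make_verifier(dataType, nullable=True):
    def verify_nullability(obj):
        if obj is None:
            if nullable:
                return True   # None is acceptable; skip the value check
            raise ValueError("This field is not nullable, but got None")
        return False

    # Dispatch once, when the verifier is built, not once per record.
    if isinstance(dataType, StringType):
        def verify_value(obj):
            pass              # StringType can work with any type
    elif isinstance(dataType, LongType):
        def verify_value(obj):
            if not isinstance(obj, int):
                raise TypeError("LongType can not accept %r" % (obj,))
    else:
        raise TypeError("unsupported type in this sketch: %r" % (dataType,))

    def verify(obj):
        if verify_nullability(obj):
            return
        verify_value(obj)

    return verify

verify_long = make_verifier(LongType())  # dispatch cost is paid here, once
verify_long(0)                           # ok
verify_long(None)                        # ok, nullable by default
```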


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125386547
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
 import functools
 import time
 import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):
--- End diff --

Ah, hmm... it should specifically check Python <= 2.6, since unittest2 is a 
backport of the unittest from Python 2.7. To check minor versions, I think 
we would have to "extract" them or compare against the raw string `'2.6'`. 

I will clean up the inconsistency in another PR later, if you are okay with 
this as is.
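
For reference, the guarded import conventionally looks something like the 
sketch below (the exact fallback message is illustrative):

```
import sys

if sys.version_info[:2] <= (2, 6):
    try:
        import unittest2 as unittest  # unittest2 backports 2.7's unittest
    except ImportError:
        sys.stderr.write("unittest2 is required on Python <= 2.6\n")
        sys.exit(1)
else:
    import unittest
```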


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18174: [SPARK-20950][CORE]add a new config to diskWriteB...

2017-07-03 Thread manku-timma
Github user manku-timma commented on a diff in the pull request:

https://github.com/apache/spark/pull/18174#discussion_r125386319
  
--- Diff: 
core/src/main/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriter.java ---
@@ -360,12 +368,10 @@ void forceSorterToSpill() throws IOException {
 
 final OutputStream bos = new BufferedOutputStream(
 new FileOutputStream(outputFile),
-(int) 
sparkConf.getSizeAsKb("spark.shuffle.unsafe.file.output.buffer", "32k") * 1024);
+outputBufferSizeInBytes);
--- End diff --

Just to understand what is happening:

1. Shuffle records are written to a serialisation buffer (1M) after 
serialisation.
2. The serialised buffer is written to the in-memory sorter’s buffer.
3. Once the in-memory sorter’s buffer is full, the data is copied to the 
sorter’s disk buffer (1M).
4. The sorter’s disk buffer is written out to a buffered output stream 
(buffer = 32k).

I am guessing that reducing the sorter’s disk buffer (in step 3) helps 
because each call at step 4 then writes/allocates less at a time, allowing 
more parallelism between writing back to disk and copying the data; see the 
toy model below.
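
As a toy model of step 4 (plain Python `io`, not Spark's actual write path), 
varying the chunk size fed into a 32k buffered stream shows the trade-off 
between a few large raw writes and many small interleaved ones:

```
import io

class CountingSink(io.RawIOBase):
    """Counts how many raw write() calls reach the 'disk'."""
    def __init__(self):
        io.RawIOBase.__init__(self)
        self.calls = 0
    def writable(self):
        return True
    def write(self, b):
        self.calls += 1
        return len(b)

def spill(disk_buffer_size, total=1 << 20, output_buffer=32 * 1024):
    sink = CountingSink()
    out = io.BufferedWriter(sink, buffer_size=output_buffer)
    chunk = b"x" * disk_buffer_size   # stand-in for the sorter's disk buffer
    written = 0
    while written < total:
        out.write(chunk)              # stand-in for step 4
        written += len(chunk)
    out.flush()
    return sink.calls

# Bigger chunks -> fewer, larger raw writes; smaller chunks interleave more.
print(spill(1 << 20), spill(8 * 1024))
```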


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...

2017-07-03 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18516#discussion_r125386326
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -168,7 +173,9 @@ case class CreateMap(children: Seq[Expression]) extends 
Expression {
   override def foldable: Boolean = children.forall(_.foldable)
 
   override def checkInputDataTypes(): TypeCheckResult = {
-if (children.size % 2 != 0) {
+if (children == Nil) {
+  TypeCheckResult.TypeCheckFailure("input to function coalesce cannot 
be empty")
--- End diff --

oh, my bad.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...

2017-07-03 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/18516#discussion_r125386253
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
   rand(Random.nextLong()), randn(Random.nextLong())
 ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
+
+  test("SPARK-21281 fails if functions have no argument") {
--- End diff --

ok.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18502
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18502
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79121/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18502: [SPARK-21278][PYSPARK][WIP] Upgrade to Py4J 0.10.5

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18502
  
**[Test build #79121 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79121/testReport)**
 for PR 18502 at commit 
[`5ba7b11`](https://github.com/apache/spark/commit/5ba7b112ae110acc5e9908c47d1df67b7be3a58b).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125386150
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }
 
 
-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
 """
 Verify the type of obj against dataType, raise a TypeError if they do 
not match.
 
 Also verify the value of obj against datatype, raise a ValueError if 
it's not within the allowed
 range, e.g. using 128 as ByteType will overflow. Note that, Python 
float is not checked, so it
 will become infinity when cast to Java float if it overflows.
 
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType())) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([])) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type(None, ByteType(), False) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+... ArrayType(ShortType(), False))([1, None]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 
1})
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> schema = StructType().add("a", IntegerType()).add("b", 
StringType(), False)
->>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 """
-if obj is None:
-if nullable:
-return
+
+if name is None:
+new_msg = lambda msg: msg
+new_name = lambda n: "field %s" % n
+else:
+new_msg = lambda msg: "%s: %s" % (name, msg)
+new_name = lambda n: "field %s in %s" % (n, name)
+
+def verify_nullability(obj):
+if obj is None:
+if nullable:
+return True
+else:
+raise ValueError(new_msg("This field is not nullable, but 
got None"))
 else:
-raise ValueError("This field is not nullable, but got None")
+return False
 
 # StringType can work with any types
 if isinstance(dataType, StringType):
-return
+def verify_string(obj):
+if verify_nullability(obj):
+return None
--- End diff --

Ah, that makes sense, but at least here we are not returning early, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18445: [Spark-19726][SQL] Faild to insert null timestamp value ...

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18445
  
yes. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125385778
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
 import functools
 import time
 import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):
--- End diff --

I will keep that in mind.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125385638
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
 import functools
 import time
 import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):
--- End diff --

can we follow the existing style here? You can send a PR to update all of 
them later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125385500
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }
 
 
-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
 """
 Verify the type of obj against dataType, raise a TypeError if they do 
not match.
 
 Also verify the value of obj against datatype, raise a ValueError if 
it's not within the allowed
 range, e.g. using 128 as ByteType will overflow. Note that, Python 
float is not checked, so it
 will become infinity when cast to Java float if it overflows.
 
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType())) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([])) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type(None, ByteType(), False) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+... ArrayType(ShortType(), False))([1, None]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 
1})
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> schema = StructType().add("a", IntegerType()).add("b", 
StringType(), False)
->>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 """
-if obj is None:
-if nullable:
-return
+
+if name is None:
--- End diff --

This looks a bit odd, but I could not figure out a shorter and cleaner way 
...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #17832: [SPARK-20557][SQL] Support for db column type TIM...

2017-07-03 Thread atrigent
Github user atrigent commented on a diff in the pull request:

https://github.com/apache/spark/pull/17832#discussion_r125385324
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -223,6 +223,9 @@ object JdbcUtils extends Logging {
   case java.sql.Types.STRUCT=> StringType
   case java.sql.Types.TIME  => TimestampType
   case java.sql.Types.TIMESTAMP => TimestampType
+  case java.sql.Types.TIMESTAMP_WITH_TIMEZONE
+=> TimestampType
+  case -101 => TimestampType
--- End diff --

Why was this `-101` thing put here instead of in the Oracle dialect?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18118: [SPARK-20199][ML] : Provided featureSubsetStrategy to GB...

2017-07-03 Thread pralabhkumar
Github user pralabhkumar commented on the issue:

https://github.com/apache/spark/pull/18118
  
ping @sethah @MLnick 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18516#discussion_r125384875
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
   rand(Random.nextLong()), randn(Random.nextLong())
 ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
+
+  test("SPARK-21281 fails if functions have no argument") {
--- End diff --

Could you create a helper function to remove this duplicated code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125384834
  
--- Diff: python/pyspark/sql/session.py ---
@@ -514,17 +514,21 @@ def createDataFrame(self, data, schema=None, 
samplingRatio=None, verifySchema=Tr
 schema = [str(x) for x in data.columns]
 data = [r.tolist() for r in data.to_records(index=False)]
 
-verify_func = _verify_type if verifySchema else lambda _, t: True
 if isinstance(schema, StructType):
+verify_func = _make_type_verifier(schema) if verifySchema else 
lambda _: True
+
 def prepare(obj):
-verify_func(obj, schema)
+verify_func(obj)
 return obj
 elif isinstance(schema, DataType):
 dataType = schema
 schema = StructType().add("value", schema)
 
+verify_func = _make_type_verifier(
+dataType, name="field value") if verifySchema else lambda 
_: True
--- End diff --

Oh, wait, you mean `field value`. Yes, this is printed as-is.
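
Concretely, with the lambdas in this diff, the name threading produces:

```
name = "field value"
new_msg = lambda msg: "%s: %s" % (name, msg)
print(new_msg("This field is not nullable, but got None"))
# -> field value: This field is not nullable, but got None
```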


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18516#discussion_r125384785
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -41,8 +41,13 @@ case class CreateArray(children: Seq[Expression]) 
extends Expression {
 
   override def foldable: Boolean = children.forall(_.foldable)
 
-  override def checkInputDataTypes(): TypeCheckResult =
-TypeUtils.checkForSameTypeInputExpr(children.map(_.dataType), 
"function array")
+  override def checkInputDataTypes(): TypeCheckResult = {
+if (children == Nil) {
+  TypeCheckResult.TypeCheckFailure("input to function coalesce cannot 
be empty")
--- End diff --

`coalesce `?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18516#discussion_r125384801
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
 ---
@@ -168,7 +173,9 @@ case class CreateMap(children: Seq[Expression]) extends 
Expression {
   override def foldable: Boolean = children.forall(_.foldable)
 
   override def checkInputDataTypes(): TypeCheckResult = {
-if (children.size % 2 != 0) {
+if (children == Nil) {
+  TypeCheckResult.TypeCheckFailure("input to function coalesce cannot 
be empty")
--- End diff --

`coalesce ` ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18516#discussion_r125384578
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
   rand(Random.nextLong()), randn(Random.nextLong())
 ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
+
+  test("SPARK-21281 fails if functions have no argument") {
+var errMsg = intercept[AnalysisException] {
+  spark.range(1).select(array())
+}.getMessage
+assert(errMsg.contains("due to data type mismatch: input to function 
coalesce cannot be empty"))
+
+errMsg = intercept[AnalysisException] {
+  spark.range(1).select(map())
+}.getMessage
+assert(errMsg.contains("due to data type mismatch: input to function 
coalesce cannot be empty"))
+
+// spark.range(1).select(coalesce())
+errMsg = intercept[AnalysisException] {
+  spark.range(1).select(coalesce())
+}.getMessage
+assert(errMsg.contains("due to data type mismatch: input to function 
coalesce cannot be empty"))
+
+// This hits java.lang.AssertionError
+// spark.range(1).select(struct())
+
+errMsg = intercept[IllegalArgumentException] {
+  spark.range(1).select(greatest())
+}.getMessage
+assert(errMsg.contains("requirement failed: greatest requires at least 
2 arguments"))
--- End diff --

uh. I see.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17985: Add "full_outer" name to join types

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17985
  
**[Test build #79126 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79126/testReport)**
 for PR 17985 at commit 
[`9fc9a0a`](https://github.com/apache/spark/commit/9fc9a0ad567dfb28d22d94321fcef0ea3b1ae73b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18516: [SPARK-21281][SQL] Throw AnalysisException if arr...

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18516#discussion_r125384379
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala ---
@@ -448,6 +448,43 @@ class DataFrameFunctionsSuite extends QueryTest with 
SharedSQLContext {
   rand(Random.nextLong()), randn(Random.nextLong())
 ).foreach(assertValuesDoNotChangeAfterCoalesceOrUnion(_))
   }
+
+  test("SPARK-21281 fails if functions have no argument") {
--- End diff --

Could you move these functions to `.sql`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT

2017-07-03 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/18228
  
`left` and `right` are among the most common uses of `substring`, and I 
think supporting these forms is more friendly for users.
Also, MySQL and SQL Server support these two functions as well as 
`substring`. @viirya



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17985: Add "full_outer" name to join types

2017-07-03 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/17985
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125384150
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }
 
 
-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
 """
 Verify the type of obj against dataType, raise a TypeError if they do 
not match.
 
 Also verify the value of obj against datatype, raise a ValueError if 
it's not within the allowed
 range, e.g. using 128 as ByteType will overflow. Note that, Python 
float is not checked, so it
 will become infinity when cast to Java float if it overflows.
 
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType())) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([])) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type(None, ByteType(), False) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+... ArrayType(ShortType(), False))([1, None]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 
1})
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> schema = StructType().add("a", IntegerType()).add("b", 
StringType(), False)
->>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 """
-if obj is None:
-if nullable:
-return
+
+if name is None:
+new_msg = lambda msg: msg
+new_name = lambda n: "field %s" % n
+else:
+new_msg = lambda msg: "%s: %s" % (name, msg)
+new_name = lambda n: "field %s in %s" % (n, name)
+
+def verify_nullability(obj):
+if obj is None:
+if nullable:
+return True
+else:
+raise ValueError(new_msg("This field is not nullable, but 
got None"))
 else:
-raise ValueError("This field is not nullable, but got None")
+return False
 
 # StringType can work with any types
 if isinstance(dataType, StringType):
-return
+def verify_string(obj):
+if verify_nullability(obj):
+return None
--- End diff --

```python
def A():
print "a"

def B():
print "a"
return None

def C():
print "a"
return

print(A() is None)
print(A() == B() == C())
```

These are synonyms. I believe this is also about preference - `return` vs 
`return None` if we should 

[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18469
  
**[Test build #79125 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79125/testReport)**
 for PR 18469 at commit 
[`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125383919
  
--- Diff: python/pyspark/sql/session.py ---
@@ -514,17 +514,21 @@ def createDataFrame(self, data, schema=None, 
samplingRatio=None, verifySchema=Tr
 schema = [str(x) for x in data.columns]
 data = [r.tolist() for r in data.to_records(index=False)]
 
-verify_func = _verify_type if verifySchema else lambda _, t: True
 if isinstance(schema, StructType):
+verify_func = _make_type_verifier(schema) if verifySchema else 
lambda _: True
+
 def prepare(obj):
-verify_func(obj, schema)
+verify_func(obj)
 return obj
 elif isinstance(schema, DataType):
 dataType = schema
 schema = StructType().add("value", schema)
 
+verify_func = _make_type_verifier(
+dataType, name="field value") if verifySchema else lambda 
_: True
--- End diff --

I don't think so. It should return `None`, I think, but I just wanted to 
avoid unrelated changes to keep the review focused.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125383823
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
 import functools
 import time
 import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):
--- End diff --

I think it is a matter of preference; I don't think either is particularly 
better. Per the documentation, it sounds like `version_info` is preferred - 
https://docs.python.org/2/library/sys.html#sys.version

> Do not extract version information out of it, rather, use `version_info` 
and the functions 

although we are not "extract"ing here anyway ...
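
A quick illustration of why the tuple `sys.version_info` is the safer thing 
to compare:

```
# String comparison of versions is lexicographic, which can mislead:
assert "2.10" < "2.7"     # True as strings, but wrong as versions
assert (2, 10) > (2, 7)   # sys.version_info compares as a tuple, correctly
```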


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18469
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18460: [SPARK-21247][SQL] Allow case-insensitive type eq...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18460#discussion_r125383817
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -79,8 +80,12 @@ abstract class DataType extends AbstractDataType {
* Check if `this` and `other` are the same data type when ignoring 
nullability
* (`StructField.nullable`, `ArrayType.containsNull`, and 
`MapType.valueContainsNull`).
*/
-  private[spark] def sameType(other: DataType): Boolean =
-DataType.equalsIgnoreNullability(this, other)
+  private[spark] def sameType(other: DataType, isCaseSensitive: Boolean = 
true): Boolean =
--- End diff --

Maybe we should not consider field names in `sameType`. @gatorsmile, what do 
you think?
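
For concreteness, a small sketch of the question in Python, with 
hypothetical `Field`/`Struct` stand-ins rather than Spark's Scala types:

```
from collections import namedtuple

# Hypothetical minimal types for illustration only.
Field = namedtuple("Field", "name dtype nullable")
Struct = namedtuple("Struct", "fields")

def same_type(a, b, case_sensitive=True):
    """Structural equality that ignores nullability; field-name case is
    the knob under discussion."""
    if isinstance(a, Struct) and isinstance(b, Struct):
        if len(a.fields) != len(b.fields):
            return False
        for fa, fb in zip(a.fields, b.fields):
            na, nb = (fa.name, fb.name) if case_sensitive \
                else (fa.name.lower(), fb.name.lower())
            if na != nb or not same_type(fa.dtype, fb.dtype, case_sensitive):
                return False
        return True
    return a == b  # primitives: plain equality

a = Struct([Field("A", "int", True)])
b = Struct([Field("a", "int", False)])
print(same_type(a, b), same_type(a, b, case_sensitive=False))  # False True
```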


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18520: [SPARK-21295] [SQL] Use qualified names in error message...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18520
  
**[Test build #79124 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79124/testReport)**
 for PR 18520 at commit 
[`0b9f860`](https://github.com/apache/spark/commit/0b9f860cee44bb06feeb291b566243e139cbaf28).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18228: [SPARK-21007][SQL]Add SQL function - RIGHT && LEFT

2017-07-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18228
  
As we already have `substring`, do we still need them?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18469
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18469
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79115/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18469: [SPARK-21256] [SQL] Add withSQLConf to Catalyst Test

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18469
  
**[Test build #79115 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79115/testReport)**
 for PR 18469 at commit 
[`7431a8d`](https://github.com/apache/spark/commit/7431a8df09fada093d47abb49079de81cdbd1d9e).
 * This patch **fails PySpark pip packaging tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18444: [SPARK-16542][SQL][PYSPARK] Fix bugs about types that re...

2017-07-03 Thread ueshin
Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/18444
  
I guess you can define the source code encoding at the top of the file like:

```
# -*- coding: utf-8 -*-
```
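
A minimal example of what that declaration (PEP 263) enables in a Python 2 
source file:

```
# -*- coding: utf-8 -*-
# With the declaration above, Python 2 accepts non-ASCII characters in
# string literals and comments in this file.
s = u"café"
print(s)
```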



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79117/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17633
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18174
  
**[Test build #79123 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79123/testReport)**
 for PR 18174 at commit 
[`3efc743`](https://github.com/apache/spark/commit/3efc7433802155c957e78d23abf4847cde8e0d07).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17633
  
**[Test build #79117 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79117/testReport)**
 for PR 17633 at commit 
[`7965ef3`](https://github.com/apache/spark/commit/7965ef35ce45dbaabb7ba525a1b41625365b9da6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18174: [SPARK-20950][CORE]add a new config to diskWriteBufferSi...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18174
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125382241
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1249,121 +1249,201 @@ def _infer_schema_type(obj, dataType):
 }
 
 
-def _verify_type(obj, dataType, nullable=True):
+def _make_type_verifier(dataType, nullable=True, name=None):
 """
 Verify the type of obj against dataType, raise a TypeError if they do 
not match.
 
 Also verify the value of obj against datatype, raise a ValueError if 
it's not within the allowed
 range, e.g. using 128 as ByteType will overflow. Note that, Python 
float is not checked, so it
 will become infinity when cast to Java float if it overflows.
 
->>> _verify_type(None, StructType([]))
->>> _verify_type("", StringType())
->>> _verify_type(0, LongType())
->>> _verify_type(list(range(3)), ArrayType(ShortType()))
->>> _verify_type(set(), ArrayType(StringType())) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(StructType([]))(None)
+>>> _make_type_verifier(StringType())("")
+>>> _make_type_verifier(LongType())(0)
+>>> _make_type_verifier(ArrayType(ShortType()))(list(range(3)))
+>>> _make_type_verifier(ArrayType(StringType()))(set()) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 TypeError:...
->>> _verify_type({}, MapType(StringType(), IntegerType()))
->>> _verify_type((), StructType([]))
->>> _verify_type([], StructType([]))
->>> _verify_type([1], StructType([])) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({})
+>>> _make_type_verifier(StructType([]))(())
+>>> _make_type_verifier(StructType([]))([])
+>>> _make_type_verifier(StructType([]))([1]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> # Check if numeric values are within the allowed range.
->>> _verify_type(12, ByteType())
->>> _verify_type(1234, ByteType()) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType())(12)
+>>> _make_type_verifier(ByteType())(1234) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type(None, ByteType(), False) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(ByteType(), False)(None) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type([1, None], ArrayType(ShortType(), False)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(
+... ArrayType(ShortType(), False))([1, None]) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
->>> _verify_type({None: 1}, MapType(StringType(), IntegerType()))
+>>> _make_type_verifier(MapType(StringType(), IntegerType()))({None: 
1})
 Traceback (most recent call last):
 ...
 ValueError:...
 >>> schema = StructType().add("a", IntegerType()).add("b", 
StringType(), False)
->>> _verify_type((1, None), schema) # doctest: +IGNORE_EXCEPTION_DETAIL
+>>> _make_type_verifier(schema)((1, None)) # doctest: 
+IGNORE_EXCEPTION_DETAIL
 Traceback (most recent call last):
 ...
 ValueError:...
 """
-if obj is None:
-if nullable:
-return
+
+if name is None:
+new_msg = lambda msg: msg
+new_name = lambda n: "field %s" % n
+else:
+new_msg = lambda msg: "%s: %s" % (name, msg)
+new_name = lambda n: "field %s in %s" % (n, name)
+
+def verify_nullability(obj):
+if obj is None:
+if nullable:
+return True
+else:
+raise ValueError(new_msg("This field is not nullable, but 
got None"))
 else:
-raise ValueError("This field is not nullable, but got None")
+return False
 
 # StringType can work with any types
 if isinstance(dataType, StringType):
-return
+def verify_string(obj):
+if verify_nullability(obj):
+return None
--- End diff --

Why does a verify method need to return something?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125382032
  
--- Diff: python/pyspark/sql/session.py ---
@@ -514,17 +514,21 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr
             schema = [str(x) for x in data.columns]
             data = [r.tolist() for r in data.to_records(index=False)]
 
-        verify_func = _verify_type if verifySchema else lambda _, t: True
         if isinstance(schema, StructType):
+            verify_func = _make_type_verifier(schema) if verifySchema else lambda _: True
+
             def prepare(obj):
-                verify_func(obj, schema)
+                verify_func(obj)
                 return obj
         elif isinstance(schema, DataType):
             dataType = schema
             schema = StructType().add("value", schema)
 
+            verify_func = _make_type_verifier(
+                dataType, name="field value") if verifySchema else lambda _: True
--- End diff --

is "field value" useful in the error message?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18522: [MINOR]Closes stream and releases any system reso...

2017-07-03 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18522#discussion_r125380867
  
--- Diff: core/src/main/scala/org/apache/spark/util/logging/FileAppender.scala ---
@@ -76,7 +76,11 @@ private[spark] class FileAppender(inputStream: InputStream, file: File, bufferSi
           }
         }
       } {
-        closeFile()
+        try {
+          inputStream.close()
--- End diff --

Another reason is that this function runs in another thread.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18521: [SPARK-19507][SPARK-21296][PYTHON] Avoid per-reco...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18521#discussion_r125380790
  
--- Diff: python/pyspark/sql/tests.py ---
@@ -30,6 +30,19 @@
 import functools
 import time
 import datetime
+import traceback
+
+if sys.version_info[:2] <= (2, 6):
--- End diff --

This is different from how we check the Python version in other files; is this 
better than `sys.version >= '3'`?
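
One concrete difference between the two idioms: sys.version_info[:2] <= (2, 6)
compares tuples of ints component by component, while sys.version >= '3'
compares strings lexicographically, which mis-orders multi-digit components.
A quick illustration:

    import sys

    # Tuple comparison is numeric per component, so it stays correct
    # even for multi-digit minor versions.
    print((2, 6) < (2, 10))   # True  -- 6 < 10 as integers
    print('2.6' < '2.10')     # False -- '6' > '1' as characters

    # The two idioms happen to agree on the Python 2 vs. 3 split:
    print(sys.version_info[0] >= 3, sys.version >= '3')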


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18468#discussion_r125380209
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapCachedBatch.java ---
@@ -0,0 +1,403 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.vectorized;
+
+import java.nio.ByteBuffer;
+
+import org.apache.spark.memory.MemoryMode;
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow;
+import org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder;
+import org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter;
+import org.apache.spark.sql.execution.columnar.*;
+import org.apache.spark.sql.types.*;
+import org.apache.spark.unsafe.types.UTF8String;
+
+/**
+ * A column backed by an in memory JVM array.
+ */
+public final class OnHeapCachedBatch extends ColumnVector implements java.io.Serializable {
+
+  // keeps the compressed data
+  private byte[] buffer;
+
+  // whether a row has already been extracted; set to true once extractTo() is called,
+  // e.g. when isNullAt() and getInt() are called, extractTo() must be called only once
+  private boolean[] calledExtractTo;
+
+  // a row where the compressed data is extracted
+  private transient UnsafeRow unsafeRow;
+  private transient BufferHolder bufferHolder;
+  private transient UnsafeRowWriter rowWriter;
+  private transient MutableUnsafeRow mutableRow;
+
+  // accessor for a column
+  private transient ColumnAccessor columnAccessor;
+
+  // an accessor uses only column 0
+  private final int ORDINAL = 0;
+
+  protected OnHeapCachedBatch(int capacity, DataType type) {
+    super(capacity, type, MemoryMode.ON_HEAP_CACHEDBATCH);
+    reserveInternal(capacity);
+    reset();
+  }
+
+  @Override
+  public long valuesNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+  @Override
+  public long nullsNativeAddress() {
+    throw new RuntimeException("Cannot get native address for on heap column");
+  }
+
+  @Override
+  public void close() {
+  }
+
+  private void initialize() {
+    if (columnAccessor == null) {
+      setColumnAccessor();
+    }
+    if (mutableRow == null) {
+      setRowSetter();
+    }
+  }
+
+  private void setColumnAccessor() {
+    ByteBuffer byteBuffer = ByteBuffer.wrap(buffer);
+    columnAccessor = ColumnAccessor$.MODULE$.apply(type, byteBuffer);
+    calledExtractTo = new boolean[capacity];
+  }
+
+  private void setRowSetter() {
+    unsafeRow = new UnsafeRow(1);
+    bufferHolder = new BufferHolder(unsafeRow);
+    rowWriter = new UnsafeRowWriter(bufferHolder, 1);
+    mutableRow = new MutableUnsafeRow(rowWriter);
+  }
+
+  // call extractTo() before getting actual data
+  private void prepareRowAccess(int rowId) {
--- End diff --

It looks weird that we put the value into a row and then read that value back 
from the row; can we return the value directly? E.g. 
`columnAccessor.extractTo` should be able to take a `ColumnVector` as input and 
set the value on it.
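
Abstractly, the objection is to the extra round trip: each value is serialized
into a temporary UnsafeRow and then immediately read back out. Sketched in
Python-style pseudocode (extract_to taking a vector is the suggested shape,
not an existing Spark API):

    # Current shape: accessor writes into a temporary row, getter reads it back.
    def get_int_indirect(accessor, tmp_row, row_id):
        accessor.extract_to(tmp_row, row_id)   # value -> temporary row
        return tmp_row.get_int(0)              # temporary row -> value

    # Suggested shape: accessor sets the value on the column vector directly.
    def get_int_direct(accessor, vector, row_id):
        accessor.extract_to(vector, row_id)    # value -> vector, no temp row
        return vector.get_int(row_id)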


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18522: [MINOR]Closes stream and releases any system reso...

2017-07-03 Thread 10110346
Github user 10110346 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18522#discussion_r125379907
  
--- Diff: core/src/main/scala/org/apache/spark/util/logging/FileAppender.scala ---
@@ -76,7 +76,11 @@ private[spark] class FileAppender(inputStream: InputStream, file: File, bufferSi
           }
         }
       } {
-        closeFile()
+        try {
+          inputStream.close()
--- End diff --

Yes, you are right.
But this function is only used in `ExecutorRunner`; also, if an exception 
occurs within this function, this will ensure the inputStream is closed.
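
The shape being argued for here is the classic nested-finally pattern: release
the stream even if the copy loop fails, and still run the file cleanup even if
close() itself throws. In Python terms (a sketch of the idea, not the Scala in
the PR):

    def append_stream_to_file(input_stream, close_file):
        try:
            pass  # ... copy loop: read from input_stream, append to the file ...
        finally:
            try:
                input_stream.close()  # always release the OS handle
            finally:
                close_file()          # runs even if close() raised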


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18468: [SPARK-20873][SQL] Enhance ColumnVector to suppor...

2017-07-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18468#discussion_r125379816
  
--- Diff: core/src/main/java/org/apache/spark/memory/MemoryMode.java ---
@@ -22,5 +22,6 @@
 @Private
 public enum MemoryMode {
   ON_HEAP,
-  OFF_HEAP
+  OFF_HEAP,
+  ON_HEAP_CACHEDBATCH
--- End diff --

hmm, I don't think this can be a new memory mode...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
