date:20170717

[GitHub] spark issue #18468: [SPARK-20873][SQL] Creat CachedBatchColumnVector to abst...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18468
  
`ArrowColumnVector` is also a wrapper around an Arrow vector, and it doesn't 
introduce a vector type concept.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...

2017-07-17 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18468
  
@cloud-fan Thank you for your comments. Based on [this 
discussion](https://github.com/apache/spark/pull/18468#discussion_r125395003), 
I introduced `VectorType`.
I have just seen @ueshin 's `ArrowColumnVector` implementation. I will 
update `CachedBatchColumnVector` based on your comments and @ueshin 's 
implementation.



[GitHub] spark pull request #18487: [SPARK-21243][Core] Limit no. of map outputs in a...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18487#discussion_r127885748
  
--- Diff: 
core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 
---
@@ -277,11 +290,13 @@ final class ShuffleBlockFetcherIterator(
   } else if (size < 0) {
 throw new BlockException(blockId, "Negative block size " + 
size)
   }
-  if (curRequestSize >= targetRequestSize) {
+  if (curRequestSize >= targetRequestSize ||
+  curBlocks.size >= maxBlocksInFlightPerAddress) {
--- End diff --

We may have a lot of adjacent fetch requests in the queue. Shall we shuffle 
the request queue before fetching?
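
For illustration, a minimal sketch of the batching rule in the diff above plus the proposed shuffle. `Block`, `FetchRequest`, `splitRequests` and `shuffled` are hypothetical stand-ins written for this note, not Spark's actual classes; only the closing condition (`curRequestSize >= targetRequestSize || curBlocks.size >= maxBlocksInFlightPerAddress`) is taken from the diff.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

object FetchRequestSketch {
  // Hypothetical stand-ins for Spark's block and fetch request types.
  final case class Block(id: String, size: Long)
  final case class FetchRequest(address: String, blocks: Seq[Block])

  /** Cuts one address's blocks into requests, closing a request when either limit is hit. */
  def splitRequests(
      address: String,
      blocks: Seq[Block],
      targetRequestSize: Long,
      maxBlocksInFlightPerAddress: Int): Seq[FetchRequest] = {
    val requests = ArrayBuffer.empty[FetchRequest]
    var curBlocks = ArrayBuffer.empty[Block]
    var curRequestSize = 0L
    for (block <- blocks) {
      require(block.size >= 0, s"Negative block size ${block.size}")
      curBlocks += block
      curRequestSize += block.size
      // Same condition as in the diff: byte budget reached OR too many blocks.
      if (curRequestSize >= targetRequestSize ||
          curBlocks.size >= maxBlocksInFlightPerAddress) {
        requests += FetchRequest(address, curBlocks.toList)
        curBlocks = ArrayBuffer.empty[Block]
        curRequestSize = 0L
      }
    }
    // Whatever is left over forms the final, smaller request.
    if (curBlocks.nonEmpty) requests += FetchRequest(address, curBlocks.toList)
    requests.toList
  }

  /** Randomizes request order so requests to one address are less likely to be adjacent. */
  def shuffled(requests: Seq[FetchRequest], seed: Long): Seq[FetchRequest] =
    new Random(seed).shuffle(requests)
}
```

With a low block limit, one address produces several consecutive requests, which is the adjacency the comment is concerned about; shuffling the combined queue across addresses spreads them out.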



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18654
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79694/
Test PASSed.



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18654
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18654
  
**[Test build #79694 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)**
 for PR 18654 at commit 
[`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18649: [SPARK-21395][SQL] Spark SQL hive-thriftserver doesn't r...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18649
  
cc @jerryshao 



[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18468
  
I don't think this PR abstracts the problem well. For the table cache, our 
goal is not to make the compressed data a `ColumnVector`, but to have an 
efficient way to convert the compressed data (a byte array) to a 
`ColumnVector`. I think the most efficient way is to do no conversion at all 
and use a wrapper instead, i.e. a `class CachedBatchColumnVector(data: 
Array[Byte])` that implements the various `getXXX` methods by decompressing. 
Then we don't need to introduce the `VectorType` concept or change 
`ColumnVector`.

@kiszk what do you think?
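
For illustration, a minimal sketch of the wrapper idea described above. The run-length format in `encode` is a toy assumption made up for this note (Spark's cached batches use their own compression schemes), and this class only mirrors the proposed name and constructor; it does not extend Spark's `ColumnVector`.

```scala
import java.nio.ByteBuffer

object CachedBatchSketch {
  /** Toy run-length encoding: a sequence of (runLength, value) Int pairs. Hypothetical format. */
  def encode(values: Seq[Int]): Array[Byte] = {
    val runs = values.foldLeft(List.empty[(Int, Int)]) {
      case ((len, v) :: rest, x) if v == x => (len + 1, v) :: rest
      case (acc, x) => (1, x) :: acc
    }.reverse
    val buf = ByteBuffer.allocate(runs.size * 8)
    runs.foreach { case (len, v) => buf.putInt(len).putInt(v) }
    buf.array()
  }

  /** Read-only wrapper: keeps the compressed bytes and decodes them on first access. */
  final class CachedBatchColumnVector(data: Array[Byte]) {
    // Decompression is deferred until a getter is called, not done at construction.
    private lazy val decoded: Array[Int] = {
      val buf = ByteBuffer.wrap(data)
      val out = Array.newBuilder[Int]
      while (buf.remaining() >= 8) {
        val len = buf.getInt()
        val v = buf.getInt()
        var i = 0
        while (i < len) { out += v; i += 1 }
      }
      out.result()
    }

    def numRows: Int = decoded.length
    def getInt(rowId: Int): Int = decoded(rowId)
  }
}
```

The point of the design is that callers only see `getInt`-style accessors, so no extra vector-type tag is needed to tell compressed and uncompressed columns apart.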



[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...

2017-07-17 Thread jinxing64

Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/18634
  
@cloud-fan @jiangxb1987 
Thanks for the help! I will refine the patch and post the result of the 
manual test later today :)



[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...

2017-07-17 Thread jiangxb1987

Github user jiangxb1987 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18634#discussion_r127882623
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala
 ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with 
SharedSQLContext {
 spark.catalog.dropTempView("nums")
   }
 
+  test("window function: mutiple window expressions specified by range in 
a single expression") {
+val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 
2)).toDF("x", "y")
+nums.createOrReplaceTempView("nums")
--- End diff --

This test case also doesn't cover the case where CurrentRow is outside the 
window frame. We'd better add that scenario.
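
For illustration, a plain-Scala model of what such a scenario computes, using the same `nums` data as the test (`x` from 1 to 10, `y = x % 2`). The frame bounds (1 FOLLOWING to 3 FOLLOWING) and the `rangeSum` helper are assumptions chosen for this note, not taken from the PR.

```scala
object RangeFrameSketch {
  /**
   * Models sum(x) OVER (PARTITION BY y ORDER BY x
   *                     RANGE BETWEEN lo FOLLOWING AND hi FOLLOWING)
   * for rows of (x, y). With lo > 0 the current row is never inside its own frame.
   */
  def rangeSum(rows: Seq[(Int, Int)], lo: Int, hi: Int): Map[Int, Option[Int]] =
    rows.map { case (x, y) =>
      val frame = rows.collect {
        case (x2, y2) if y2 == y && x2 >= x + lo && x2 <= x + hi => x2
      }
      // SQL's sum over an empty frame is NULL, modeled here as None.
      x -> (if (frame.isEmpty) None else Some(frame.sum))
    }.toMap
}
```

Rows near the end of a partition get an empty frame, which is exactly the edge a frame without the current row exposes.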



[GitHub] spark issue #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame to avoid...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18634
  
@jinxing64 I think this patch is straightforward. Can you do a manual test 
that hits an OOM before this PR and works after it? We can put the test in the 
PR description so that other people can try it out.



[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18634#discussion_r127882430
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala
 ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with 
SharedSQLContext {
 spark.catalog.dropTempView("nums")
   }
 
+  test("window function: mutiple window expressions specified by range in 
a single expression") {
+val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 
2)).toDF("x", "y")
+nums.createOrReplaceTempView("nums")
--- End diff --

BTW this test is not closely related to this PR; it just adds test coverage 
for range window frames.



[GitHub] spark pull request #18634: [SPARK-21414] Refine SlidingWindowFunctionFrame t...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18634#discussion_r127882358
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLWindowFunctionSuite.scala
 ---
@@ -356,6 +356,42 @@ class SQLWindowFunctionSuite extends QueryTest with 
SharedSQLContext {
 spark.catalog.dropTempView("nums")
   }
 
+  test("window function: mutiple window expressions specified by range in 
a single expression") {
+val nums = sparkContext.parallelize(1 to 10).map(x => (x, x % 
2)).toDF("x", "y")
+nums.createOrReplaceTempView("nums")
--- End diff --

Wrap your test in `withTempView`, which drops the view automatically.
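
For illustration, a minimal sketch of the loan pattern behind a helper like `withTempView`. The `views` set is a hypothetical in-memory stand-in for the session catalog, and this is not Spark's implementation; it only shows why the cleanup is automatic.

```scala
import scala.collection.mutable

object TempViewSketch {
  // Hypothetical in-memory stand-in for the session catalog's temp views.
  val views: mutable.Set[String] = mutable.Set.empty

  /** Runs `body`, then drops the named views even if `body` throws. */
  def withTempView(viewNames: String*)(body: => Unit): Unit =
    try body finally viewNames.foreach(name => views -= name)
}
```

Because the drop sits in a `finally` block, a failing assertion inside the test body cannot leak the view into later tests, which a trailing `dropTempView` call would.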



[GitHub] spark issue #18649: [SPARK-21395][SQL] Spark SQL hive-thriftserver doesn't r...

2017-07-17 Thread debugger87

Github user debugger87 commented on the issue:

https://github.com/apache/spark/pull/18649
  
@cloud-fan Any suggestions?



[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-17 Thread ueshin

Github user ueshin commented on the issue:

https://github.com/apache/spark/pull/18655
  
Thank you for your comments.
I agree that we should split this into smaller PRs. I'll push another 
commit to remove `ArrowColumnVector` from this as soon as possible.



[GitHub] spark issue #18468: [SPARK-20873][SQL] Enhance ColumnVector to support compr...

2017-07-17 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/18468
  
ping @ueshin @cloud-fan



[GitHub] spark issue #18655: [SPARK-21440][SQL][PYSPARK] Refactor ArrowConverters and...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18655
  
Yes, let's put `ArrowColumnVector` and its tests in a new PR and merge that 
first.

`ArrowWriter` will also be used for pandas UDFs (see 
https://issues.apache.org/jira/browse/SPARK-21190 for more details), so it 
makes sense to move it to a separate file.



[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread brkyvz

Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/18660
  
Also merged to branch-2.2



[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread brkyvz

Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/18660
  
thanks @cloud-fan 



[GitHub] spark issue #18667: Fix the simpleString used in error messages

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18667
  
Can one of the admins verify this patch?



[GitHub] spark pull request #18667: Fix the simpleString used in error messages

2017-07-17 Thread fxbonnet

GitHub user fxbonnet opened a pull request:

https://github.com/apache/spark/pull/18667

Fix the simpleString used in error messages

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/fxbonnet/spark patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18667


commit e31555ba0b297054c504d3e2eaac20befb10738d
Author: Francois-Xavier Bonnet 
Date:   2017-07-18T04:19:17Z

Fix the simpleString used in error messages





[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-07-17 Thread ueshin

Github user ueshin commented on a diff in the pull request:

https://github.com/apache/spark/pull/18664#discussion_r127879502
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala
 ---
@@ -792,6 +793,76 @@ class ArrowConvertersSuite extends SharedSQLContext 
with BeforeAndAfterAll {
 collectAndValidate(df, json, "binaryData.json")
   }
 
+  test("date type conversion") {
+val json =
+  s"""
+ |{
+ |  "schema" : {
+ |"fields" : [ {
+ |  "name" : "date",
+ |  "type" : {
+ |"name" : "date",
+ |"unit" : "DAY"
+ |  },
+ |  "nullable" : true,
+ |  "children" : [ ],
+ |  "typeLayout" : {
+ |"vectors" : [ {
+ |  "type" : "VALIDITY",
+ |  "typeBitWidth" : 1
+ |}, {
+ |  "type" : "DATA",
+ |  "typeBitWidth" : 32
+ |} ]
+ |  }
+ |} ]
+ |  },
+ |  "batches" : [ {
+ |"count" : 4,
+ |"columns" : [ {
+ |  "name" : "date",
+ |  "count" : 4,
+ |  "VALIDITY" : [ 1, 1, 1, 1 ],
+ |  "DATA" : [ -1, 0, 16533, 16930 ]
+ |} ]
+ |  } ]
+ |}
+   """.stripMargin
+
+val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z", Locale.US)
+val d1 = new Date(-1)  // "1969-12-31 13:10:15.000 UTC"
+val d2 = new Date(0)  // "1970-01-01 13:10:15.000 UTC"
+val d3 = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime)
+val d4 = new Date(sdf.parse("2016-05-09 12:01:01.000 UTC").getTime)
+
+// Date is created unaware of timezone, but DateTimeUtils force 
defaultTimeZone()
+
assert(DateTimeUtils.toJavaDate(DateTimeUtils.fromJavaDate(d2)).getTime == 
d2.getTime)
--- End diff --

We handle a `DateType` value internally as the number of days since 
`1970-01-01`.

When converting between `Date` and the internal value, we assume the `Date` 
instance holds the timestamp of `00:00:00` on that day in the 
`TimeZone.getDefault()` timezone, so for `1970-01-01` its `getTime()` differs 
from zero by the timezone offset. E.g. in JST (GMT+09:00):

```
scala> TimeZone.setDefault(TimeZone.getTimeZone("JST"))

scala> Date.valueOf("1970-01-01").getTime()
res6: Long = -32400000
```

whereas in PST (GMT-08:00):

```
scala> TimeZone.setDefault(TimeZone.getTimeZone("PST"))

scala> Date.valueOf("1970-01-01").getTime()
res8: Long = 28800000
```

We use `DateTimeUtils.defaultTimeZone()` to adjust the offset.
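
For illustration, a minimal sketch of that offset adjustment using only the JDK. `toDays` and `withDefaultTimeZone` are hypothetical helpers written for this note, not the `DateTimeUtils` implementation, and the region IDs `Asia/Tokyo` and `America/Los_Angeles` are used here in place of the `JST` and `PST` abbreviations.

```scala
import java.sql.Date
import java.util.TimeZone

object DateOffsetSketch {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  /** Days since 1970-01-01 for a Date that holds local midnight in `tz`. */
  def toDays(date: Date, tz: TimeZone): Int = {
    val millisUtc = date.getTime
    // Shift by the zone offset so local midnight lands on a day boundary.
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    java.lang.Math.floorDiv(millisLocal, MillisPerDay).toInt
  }

  /** Evaluates `body` with `tz` as the JVM default, then restores the previous default. */
  def withDefaultTimeZone[T](tz: TimeZone)(body: => T): T = {
    val saved = TimeZone.getDefault
    TimeZone.setDefault(tz)
    try body finally TimeZone.setDefault(saved)
  }
}
```

With the offset applied, `Date.valueOf("1970-01-01")` maps to day 0 in either timezone even though its `getTime()` value differs, which is why a test comparing raw `getTime` values has to account for the default timezone.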



[GitHub] spark pull request #18660: [SPARK-21445] Make IntWrapper and LongWrapper in ...

2017-07-17 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18660



[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18660
  
thanks, merging to master!

@brkyvz I think it's fine, this bug is very obvious.



[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread brkyvz

Github user brkyvz commented on the issue:

https://github.com/apache/spark/pull/18660
  
I couldn't write an easy reproduction for the bug :(



[GitHub] spark pull request #18583: [SPARK-21332][SQL] Incorrect result type inferred...

2017-07-17 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18583



[GitHub] spark issue #18583: [SPARK-21332][SQL] Incorrect result type inferred for so...

2017-07-17 Thread gatorsmile

Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/18583
  
Thanks! Merging to master/2.2/2.1/2.0



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127876852
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempPath { dir =>
--- End diff --

Could we maybe just do as below?

```scala
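withTempPath { path =>
  // parquet() takes a String path, so pass the directory's path rather than the File.
  spark.range(100).repartition(10).where("id = 50").write.parquet(path.getCanonicalPath)
  // Count only data files: skip hidden files and metadata such as _SUCCESS.
  val partFiles = path.listFiles()
    .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
  assert(partFiles.length === 2)
}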
```



[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...

2017-07-17 Thread JoshRosen

Github user JoshRosen commented on the issue:

https://github.com/apache/spark/pull/18662
  
Merged to master. Thanks for the quick reviews.



[GitHub] spark pull request #18662: [SPARK-21444] Be more defensive when removing bro...

2017-07-17 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18662



[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-17 Thread viirya

Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18652#discussion_r127875986
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1912,6 +1913,26 @@ class Analyzer(
   nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
 }.copy(child = newChild)
 
+  case j: Join if j.condition.isDefined && 
!j.condition.get.deterministic =>
+j match {
+  // We can push down non-deterministic joining keys.
--- End diff --

I meant joining keys. I am not sure whether `a = c && rand(b) < 0` counts as 
a joining key.
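
For illustration, a minimal sketch of one way to separate joining keys from the rest of a join condition. The tiny expression ADT and `extractKeys` are hypothetical, written for this note and not Catalyst's classes; the assumed rule is that a joining key is an equality whose two sides each reference exactly one, different, side of the join.

```scala
object JoinKeySketch {
  sealed trait Expr
  final case class Attr(name: String, side: String) extends Expr // side: "left" or "right"
  final case class Rand(seedFrom: Attr) extends Expr             // non-deterministic
  final case class Lit(value: Double) extends Expr
  final case class EqualTo(l: Expr, r: Expr) extends Expr
  final case class LessThan(l: Expr, r: Expr) extends Expr
  final case class And(l: Expr, r: Expr) extends Expr

  /** Flattens nested ANDs into a list of conjuncts. */
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other => Seq(other)
  }

  /** Which join sides an expression references. */
  private def sides(e: Expr): Set[String] = e match {
    case Attr(_, s) => Set(s)
    case Rand(a) => sides(a)
    case Lit(_) => Set.empty
    case EqualTo(l, r) => sides(l) ++ sides(r)
    case LessThan(l, r) => sides(l) ++ sides(r)
    case And(l, r) => sides(l) ++ sides(r)
  }

  /** Returns (joining key pairs, remaining predicates). */
  def extractKeys(cond: Expr): (Seq[(Expr, Expr)], Seq[Expr]) = {
    val (keys, rest) = splitConjuncts(cond).partition {
      case EqualTo(l, r) =>
        sides(l).size == 1 && sides(r).size == 1 && sides(l) != sides(r)
      case _ => false
    }
    (keys.collect { case EqualTo(l, r) => (l, r) }, rest)
  }
}
```

Under that rule, `a = c && rand(b) < 0` yields one deterministic key pair (`a`, `c`) plus a non-deterministic residual predicate, whereas `rand(a) = c` would be a non-deterministic joining key.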



[GitHub] spark issue #18666: [SPARK-21449][SQL][Hive]Close HiveClient's SessionState ...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18666
  
Can one of the admins verify this patch?



[GitHub] spark pull request #18666: [SPARK-21449][SQL][Hive]Close HiveClient's Sessio...

2017-07-17 Thread yaooqinn

GitHub user yaooqinn opened a pull request:

https://github.com/apache/spark/pull/18666

[SPARK-21449][SQL][Hive]Close HiveClient's SessionState to delete residual 
dirs


## What changes were proposed in this pull request?

When sparkSession.stop() is called, close the hive client too.

## How was this patch tested?

Manually.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yaooqinn/spark SPARK-21449

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18666.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18666


commit cac9fe7a627911079e55d5704fcf1b49228c5147
Author: Kent Yao 
Date:   2017-07-18T03:22:17Z

Hive client's SessionState was not closed properly in HiveExternalCatalog





[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18663
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18663
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79692/
Test PASSed.



[GitHub] spark issue #18663: [SPARK-20079][yarn] Fix client AM not allocating executo...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18663
  
**[Test build #79692 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79692/testReport)**
 for PR 18663 at commit 
[`1496b78`](https://github.com/apache/spark/commit/1496b78d2bcd2003b23307f767c57c0dc2818e16).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-17 Thread facaiy

Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18305#discussion_r127874833
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
 ---
@@ -32,40 +34,45 @@ private[ml] trait DifferentiableRegularization[T] 
extends DiffFunction[T] {
 }
 
 /**
- * A Breeze diff function for computing the L2 regularized loss and 
gradient of an array of
+ * A Breeze diff function for computing the L2 regularized loss and 
gradient of a vector of
  * coefficients.
  *
  * @param regParam The magnitude of the regularization.
  * @param shouldApply A function (Int => Boolean) indicating whether a 
given index should have
  *regularization applied to it.
- * @param featuresStd Option indicating whether the regularization should 
be scaled by the standard
- *deviation of the features.
+ * @param applyFeaturesStd Option for a function which maps coefficient 
index (column major) to the
+ * feature standard deviation. If `None`, no 
standardization is applied.
  */
 private[ml] class L2Regularization(
-val regParam: Double,
+override val regParam: Double,
 shouldApply: Int => Boolean,
-featuresStd: Option[Array[Double]]) extends 
DifferentiableRegularization[Array[Double]] {
+applyFeaturesStd: Option[Int => Double]) extends 
DifferentiableRegularization[Vector] {
 
-  override def calculate(coefficients: Array[Double]): (Double, 
Array[Double]) = {
-var sum = 0.0
-val gradient = new Array[Double](coefficients.length)
-coefficients.indices.filter(shouldApply).foreach { j =>
-  val coef = coefficients(j)
-  featuresStd match {
-case Some(stds) =>
-  val std = stds(j)
-  if (std != 0.0) {
-val temp = coef / (std * std)
-sum += coef * temp
-gradient(j) = regParam * temp
-  } else {
-0.0
+  override def calculate(coefficients: Vector): (Double, Vector) = {
+coefficients match {
+  case dv: DenseVector =>
+var sum = 0.0
+val gradient = new Array[Double](dv.size)
+dv.values.indices.filter(shouldApply).foreach { j =>
+  val coef = coefficients(j)
+  applyFeaturesStd match {
+case Some(getStd) =>
+  val std = getStd(j)
+  if (std != 0.0) {
+val temp = coef / (std * std)
+sum += coef * temp
+gradient(j) = regParam * temp
+  } else {
+0.0
+  }
+case None =>
+  sum += coef * coef
+  gradient(j) = coef * regParam
--- End diff --

Trivial: to match `regParam * temp` above, how about using `regParam * 
coef` here?
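
For readers following the diff, here is a minimal standalone sketch of the L2 penalty and its gradient as the quoted hunk computes them. It is not Spark's `L2Regularization` class: `L2Sketch` is a hypothetical name, plain arrays stand in for `Vector`, and the `0.5 * sum * regParam` return value is an assumption, since the quoted diff ends before the return statement.

```scala
// Sketch only: a standalone L2 penalty, not Spark's L2Regularization class.
object L2Sketch {
  /**
   * Returns (loss, gradient).
   * The 0.5 factor in the loss is assumed; the quoted diff does not show it.
   */
  def calculate(
      coefficients: Array[Double],
      regParam: Double,
      shouldApply: Int => Boolean,
      getStd: Option[Int => Double]): (Double, Array[Double]) = {
    var sum = 0.0
    val gradient = new Array[Double](coefficients.length)
    coefficients.indices.filter(shouldApply).foreach { j =>
      val coef = coefficients(j)
      getStd match {
        case Some(stdOf) =>
          val std = stdOf(j)
          // Features with zero standard deviation contribute nothing.
          if (std != 0.0) {
            val temp = coef / (std * std)
            sum += coef * temp
            gradient(j) = regParam * temp
          }
        case None =>
          sum += coef * coef
          // Same operand order as the standardized branch above.
          gradient(j) = regParam * coef
      }
    }
    (0.5 * sum * regParam, gradient)
  }
}
```

Writing both branches as `regParam * <term>` makes it easier to see that they differ only in the scaling by `std * std`.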



[GitHub] spark issue #18665: [SPARK-21446] [SQL] Fix setAutoCommit never executed

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18665
  
Can one of the admins verify this patch?



[GitHub] spark pull request #18665: [SPARK-21446] [SQL] Fix setAutoCommit never execu...

2017-07-17 Thread DFFuture

GitHub user DFFuture opened a pull request:

https://github.com/apache/spark/pull/18665

[SPARK-21446] [SQL] Fix setAutoCommit never executed

## What changes were proposed in this pull request?
JIRA Issue: https://issues.apache.org/jira/browse/SPARK-21446
`options.asConnectionProperties` cannot contain `fetchsize`, because 
`fetchsize` is a Spark-only option and Spark-only options are excluded from 
the connection properties.
This change therefore passes `options.asProperties.asScala.toMap` to 
`beforeFetch` instead of `options.asConnectionProperties.asScala.toMap`.
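
The difference between the two views can be sketched with plain maps. This is an illustration only, not Spark's `JDBCOptions` code: `JdbcOptionsSketch` and the contents of `sparkOnlyOptions` are assumptions made for the example.

```scala
// Sketch only: models the two option views with plain Scala maps.
object JdbcOptionsSketch {
  // Hypothetical subset of Spark-only option names, for illustration.
  val sparkOnlyOptions: Set[String] = Set("fetchsize", "batchsize", "numpartitions")

  // All user-supplied options, including Spark-only ones.
  def asProperties(options: Map[String, String]): Map[String, String] = options

  // Only the options that would be handed to the JDBC driver.
  def asConnectionProperties(options: Map[String, String]): Map[String, String] =
    options.filter { case (key, _) => !sparkOnlyOptions.contains(key.toLowerCase) }
}
```

Under this model, any code that looks up `fetchsize` in the connection-properties view never finds it, which matches the behaviour described in the JIRA issue.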

## How was this patch tested?



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/DFFuture/spark sparksql_pg

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18665.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18665


commit 9ba431a838a16a8371b3d3f6ef028158576f85d2
Author: DFFuture 
Date:   2017-07-18T00:36:06Z

asConnectionProperties can not have fetchsize, change it to asProperties





[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-17 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18652#discussion_r127874260
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1912,6 +1913,26 @@ class Analyzer(
   nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
 }.copy(child = newChild)
 
+  case j: Join if j.condition.isDefined && 
!j.condition.get.deterministic =>
+j match {
+  // We can push down non-deterministic joining keys.
--- End diff --

The join type also matters. For example, are we able to push it to the left 
side for the right outer join?



[GitHub] spark pull request #18652: [WIP] Pull non-deterministic joining keys from Jo...

2017-07-17 Thread gatorsmile

Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/18652#discussion_r127874213
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1912,6 +1913,26 @@ class Analyzer(
   nondeterToAttr.get(e).map(_.toAttribute).getOrElse(e)
 }.copy(child = newChild)
 
+  case j: Join if j.condition.isDefined && 
!j.condition.get.deterministic =>
+j match {
+  // We can push down non-deterministic joining keys.
--- End diff --

For `a = c && rand(3) * b < 0`, are we able to push down the second predicate?
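
To make the question concrete, the condition can be split into its conjuncts and each one checked for determinism. The sketch below uses a tiny hypothetical expression tree (`PredicateSketch` and all of its types are invented for this example and are not Catalyst classes). It shows only how the condition decomposes; it does not settle whether pushing the second conjunct down is safe.

```scala
// Sketch only: a toy expression tree, not Catalyst's Expression hierarchy.
object PredicateSketch {
  sealed trait Expr { def deterministic: Boolean }
  case class Attr(name: String) extends Expr { val deterministic = true }
  case class Lit(value: Double) extends Expr { val deterministic = true }
  case class Rand(seed: Long) extends Expr { val deterministic = false }
  case class BinOp(op: String, left: Expr, right: Expr) extends Expr {
    // An expression is deterministic only if all of its children are.
    val deterministic: Boolean = left.deterministic && right.deterministic
  }
  case class And(left: Expr, right: Expr) extends Expr {
    val deterministic: Boolean = left.deterministic && right.deterministic
  }

  // Flattens nested ANDs into a list of independent conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other => Seq(other)
  }
}
```

For the condition above, the first conjunct `a = c` is deterministic and references both sides of the join, while the second is non-deterministic and references only `b`.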



[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18662
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79691/
Test PASSed.



[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18662
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18662: [SPARK-21444] Be more defensive when removing broadcasts...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18662
  
**[Test build #79691 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79691/testReport)**
 for PR 18662 at commit 
[`a5ebcac`](https://github.com/apache/spark/commit/a5ebcac4ceb14eb8342ce085965b370186b4aba9).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18305: [SPARK-20988][ML] Logistic regression uses aggreg...

2017-07-17 Thread facaiy

Github user facaiy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18305#discussion_r127873828
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -598,8 +598,23 @@ class LogisticRegression @Since("1.2.0") (
 val regParamL2 = (1.0 - $(elasticNetParam)) * $(regParam)
 
 val bcFeaturesStd = instances.context.broadcast(featuresStd)
-val costFun = new LogisticCostFun(instances, numClasses, 
$(fitIntercept),
-  $(standardization), bcFeaturesStd, regParamL2, multinomial = 
isMultinomial,
+val getAggregatorFunc = new LogisticAggregator(bcFeaturesStd, 
numClasses, $(fitIntercept),
+  multinomial = isMultinomial)(_)
+val getFeaturesStd = (j: Int) => if (j >= 0 && j < 
numCoefficientSets * numFeatures) {
+  featuresStd(j / numCoefficientSets)
+} else {
+  0.0
+}
+
+val regularization = if (regParamL2 != 0.0) {
+  val shouldApply = (idx: Int) => idx >= 0 && idx < numFeatures * 
numCoefficientSets
--- End diff --

It seems that the `regularization` contains `intercept`, right?

However, the comment in [LogisticRegression.scala: 
1903L](https://github.com/apache/spark/pull/18305/files#diff-3734f1689cb8a80b07974eb93de0795dL1903)
 is:
> // We do not apply regularization to the intercepts
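
The index arithmetic in the quoted hunk can be tried out in isolation. The sketch below copies the two expressions from the diff into hypothetical helper functions (`CoefficientLayoutSketch` is an invented name). It assumes, without confirmation from the quoted code, that any intercepts are stored after the `numFeatures * numCoefficientSets` coefficient entries. Under that assumption the predicate would exclude them.

```scala
// Sketch only: the index predicates from the diff, lifted into helpers.
object CoefficientLayoutSketch {
  // True for indices inside the coefficient block [0, numFeatures * numCoefficientSets).
  def shouldApply(numFeatures: Int, numCoefficientSets: Int)(idx: Int): Boolean =
    idx >= 0 && idx < numFeatures * numCoefficientSets

  // Column-major layout: consecutive indices cycle through the coefficient sets.
  def featureIndex(numCoefficientSets: Int)(j: Int): Int = j / numCoefficientSets
}
```

With 3 features and 2 coefficient sets, indices 0 to 5 fall inside the regularized block, and an index of 6 or above (where intercepts would sit under the stated assumption) does not.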




[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18654
  
**[Test build #79694 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79694/testReport)**
 for PR 18654 at commit 
[`f7d7c09`](https://github.com/apache/spark/commit/f7d7c091fbf11dde9e1dde0dae574d477406f5ed).



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127872988
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
+override def accept(dir: File, name: String): Boolean = {
+  !name.startsWith(".") && !name.startsWith("_")
+}
+  })
+  assert(allFiles.length == 10)
+
+  withTempDir { dst_dir =>
+dst_dir.delete()
+df.where("id = 50").write.parquet(dst_dir.toString)
+val allFiles = dst_dir.listFiles(new FilenameFilter {
+  override def accept(dir: File, name: String): Boolean = {
+!name.startsWith(".") && !name.startsWith("_")
+  }
+})
+// First partition file and the data file
--- End diff --

Couldn't agree more. I first tried to implement it this way, but 
`FileFormatWriter.write` can only see the iterator of its own task.



[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18660
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18660
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79689/
Test PASSed.



[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18660
  
**[Test build #79689 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79689/testReport)**
 for PR 18660 at commit 
[`d220290`](https://github.com/apache/spark/commit/d2202903518b3dfa0f4a719a0b9cb5431088ed66).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  public static class LongWrapper implements Serializable `
  * `  public static class IntWrapper implements Serializable `



[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18664
  
**[Test build #79693 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79693/testReport)**
 for PR 18664 at commit 
[`69e1e21`](https://github.com/apache/spark/commit/69e1e21bf4bebc7bea6bd9322e4300df71a90b18).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18664
  
Merged build finished. Test FAILed.



[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18664
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79693/
Test FAILed.



[GitHub] spark pull request #18661: [SPARK-21409][SS] Follow up PR to allow different...

2017-07-17 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18661



[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...

2017-07-17 Thread tdas

Github user tdas commented on the issue:

https://github.com/apache/spark/pull/18661
  
Merging to master.



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127869754
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
+override def accept(dir: File, name: String): Boolean = {
+  !name.startsWith(".") && !name.startsWith("_")
+}
+  })
+  assert(allFiles.length == 10)
+
+  withTempDir { dst_dir =>
+dst_dir.delete()
+df.where("id = 50").write.parquet(dst_dir.toString)
--- End diff --

I mean, for example, if `df` ever ends up with a single partition, I guess 
this test will become invalid.



[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18661
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79690/
Test PASSed.



[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18661
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18661: [SPARK-21409][SS] Follow up PR to allow different types ...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18661
  
**[Test build #79690 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79690/testReport)**
 for PR 18661 at commit 
[`351c207`](https://github.com/apache/spark/commit/351c20704e5ba2577bd18a5a9dd2f577141c453a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait StateStoreCustomMetric `
  * `case class StateStoreCustomSizeMetric(name: String, desc: String) 
extends StateStoreCustomMetric`
  * `case class StateStoreCustomTimingMetric(name: String, desc: String) 
extends StateStoreCustomMetric`



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18654
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79687/
Test PASSed.



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127868378
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
+override def accept(dir: File, name: String): Boolean = {
+  !name.startsWith(".") && !name.startsWith("_")
+}
+  })
+  assert(allFiles.length == 10)
+
+  withTempDir { dst_dir =>
+dst_dir.delete()
+df.where("id = 50").write.parquet(dst_dir.toString)
--- End diff --

I was thinking of it just as a way to check the (previous) number of files written out.



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18654
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18654
  
**[Test build #79687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79687/testReport)** for PR 18654 at commit [`6153001`](https://github.com/apache/spark/commit/6153001bc42deee197030ad91fbb4f72bd1aa5d3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18631: [SPARK-21410][CORE] Create less partitions for Ra...

2017-07-17 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18631



[GitHub] spark issue #18631: [SPARK-21410][CORE] Create less partitions for RangePart...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18631
  
thanks, merging to master!



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127867549
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
+override def accept(dir: File, name: String): Boolean = {
+  !name.startsWith(".") && !name.startsWith("_")
+}
+  })
+  assert(allFiles.length == 10)
+
+  withTempDir { dst_dir =>
+dst_dir.delete()
+df.where("id = 50").write.parquet(dst_dir.toString)
+val allFiles = dst_dir.listFiles(new FilenameFilter {
+  override def accept(dir: File, name: String): Boolean = {
+!name.startsWith(".") && !name.startsWith("_")
+  }
+})
+// First partition file and the data file
--- End diff --

Ideally we only need the first partition file if all other partitions are 
empty, but this is hard to do right now.



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127867486
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
+override def accept(dir: File, name: String): Boolean = {
+  !name.startsWith(".") && !name.startsWith("_")
+}
+  })
+  assert(allFiles.length == 10)
+
+  withTempDir { dst_dir =>
+dst_dir.delete()
+df.where("id = 50").write.parquet(dst_dir.toString)
--- End diff --

Why do we need the repartition?



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127867380
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
--- End diff --

+1 for the shorter one
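
The shorter alternative being endorsed is not quoted in this message. Purely as an illustration of what a more compact file listing could look like, here is a minimal sketch in plain Scala; the object and method names (`ListDataFilesSketch`, `dataFiles`) are hypothetical and not part of the PR.

```scala
import java.io.File

// Hypothetical helper, not taken from the PR under review.
object ListDataFilesSketch {
  // Keeps only data files, dropping hidden (".") and metadata ("_") entries.
  // listFiles() returns null for a missing directory, hence the Option wrapper.
  def dataFiles(dir: File): Seq[File] =
    Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty[File])
      .filter(f => !f.getName.startsWith(".") && !f.getName.startsWith("_"))
}
```

This filters on the same two prefixes as the anonymous `FilenameFilter` in the diff, just without the explicit `accept` override.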



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127867341
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
--- End diff --

+1



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127867290
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 ---
@@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
 committer.setupTask(taskAttemptContext)
 
 val writeTask =
-  if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
+  if (sparkPartitionId != 0 && !iterator.hasNext) {
--- End diff --

cc @hvanhovell 



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127867254
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 ---
@@ -236,7 +236,10 @@ object FileFormatWriter extends Logging {
 committer.setupTask(taskAttemptContext)
 
 val writeTask =
-  if (description.partitionColumns.isEmpty && description.bucketIdExpression.isEmpty) {
+  if (sparkPartitionId != 0 && !iterator.hasNext) {
--- End diff --

This is a little hacky, but I think it is the simplest fix.
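
To make the effect of that condition concrete, here is a toy model in plain Scala. It is only a sketch: the names `EmptyPartitionSkipSketch`, `shouldSkip`, and `partitionsWritten` are hypothetical, each inner `Seq` stands in for the rows of one Spark partition, and no real writer or committer is involved.

```scala
// Toy model, not Spark code: each inner Seq stands for the rows of one partition.
object EmptyPartitionSkipSketch {
  // Mirrors the condition in the diff: skip only empty partitions other than partition 0.
  def shouldSkip(sparkPartitionId: Int, rows: Iterator[Any]): Boolean =
    sparkPartitionId != 0 && !rows.hasNext

  // Returns the ids of the partitions that would still produce an output file.
  def partitionsWritten(partitions: Seq[Seq[Any]]): Seq[Int] =
    partitions.zipWithIndex.collect {
      case (rows, id) if !shouldSkip(id, rows.iterator) => id
    }
}
```

Under this model an all-empty result still yields one file from partition 0, and a result whose only row sits in another partition yields two files, which lines up with the "First partition file and the data file" comment in the test.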



[GitHub] spark pull request #18632: [SPARK-21412][SQL] Reset BufferHolder while initi...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/18632#discussion_r127866899
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
 ---
@@ -51,6 +51,7 @@ public UnsafeRowWriter(BufferHolder holder, int numFields) {
 this.nullBitsSize = UnsafeRow.calculateBitSetWidthInBytes(numFields);
 this.fixedSize = nullBitsSize + 8 * numFields;
 this.startingOffset = holder.cursor;
+holder.reset();
--- End diff --

I'm not very sure about this, but what if this writer is for an inner struct? 
Then the buffer holder is shared between many writers and we should only reset 
it once.
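
A toy sketch of the concern, assuming a drastically simplified holder whose `reset()` just rewinds a cursor to 0. `ToyHolder` and `ToyWriter` are hypothetical stand-ins, not the real `BufferHolder` or `UnsafeRowWriter`.

```scala
// Toy stand-ins, NOT the real BufferHolder / UnsafeRowWriter classes.
object SharedHolderSketch {
  final class ToyHolder {
    var cursor: Int = 0
    def reset(): Unit = { cursor = 0 }
  }

  // resetOnInit = true models the proposed `holder.reset()` in the constructor.
  final class ToyWriter(holder: ToyHolder, resetOnInit: Boolean) {
    val startingOffset: Int = holder.cursor
    if (resetOnInit) holder.reset()
    def write(numBytes: Int): Unit = { holder.cursor += numBytes }
  }

  // The outer row writes 16 bytes, then a writer for an inner struct
  // is created on the same holder. Returns the cursor afterwards.
  def cursorAfterNestedInit(resetOnInit: Boolean): Int = {
    val holder = new ToyHolder
    val outer = new ToyWriter(holder, resetOnInit)
    outer.write(16)
    val inner = new ToyWriter(holder, resetOnInit)
    holder.cursor
  }
}
```

With `resetOnInit = true` the cursor is back at 0 once the inner writer is constructed, so in this model the inner struct would overwrite what the outer writer already wrote; with `false` it stays at 16.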



[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18633
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79684/
Test PASSed.



[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18633
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18632
  
OK to test



[GitHub] spark issue #18633: [SPARK-21411][YARN] Lazily create FS within kerberized U...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18633
  
**[Test build #79684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79684/testReport)** for PR 18633 at commit [`95988c1`](https://github.com/apache/spark/commit/95988c112905018d20c6d78a2ab688164735ede6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17848#discussion_r127866465
  
--- Diff: 
sql/core/src/test/java/test/org/apache/spark/sql/JavaUDFSuite.java ---
@@ -121,4 +122,29 @@ public void udf6Test() {
 Row result = spark.sql("SELECT returnOne()").head();
 Assert.assertEquals(1, result.getInt(0));
   }
+
+  public static class randUDFTest implements UDF1 {
--- End diff --

`RandUDFTest`?



[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17848#discussion_r127866406
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -103,4 +110,19 @@ case class UserDefinedFunction protected[sql] (
   udf
 }
   }
+
+  /**
+   * Updates UserDefinedFunction to non-deterministic.
+   *
+   * @since 2.3.0
+   */
+  def nonDeterministic(): UserDefinedFunction = {
--- End diff --

not a big deal, let's keep it.



[GitHub] spark pull request #17848: [SPARK-20586] [SQL] Add deterministic to ScalaUDF...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/17848#discussion_r127866355
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/SQLContextSuite.scala ---
@@ -69,7 +69,7 @@ class SQLContextSuite extends SparkFunSuite with SharedSparkContext {
 
 // UDF should not be shared
 def myadd(a: Int, b: Int): Int = a + b
-session1.udf.register[Int, Int, Int]("myadd", myadd)
+session1.udf.register[Int, Int, Int]("myadd", myadd _)
--- End diff --

this sounds like a source code compatibility issue, can we look into it?
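
For context, a minimal plain-Scala sketch of the two spellings in the diff. The `register` method here is a hypothetical stand-in with a single function-typed parameter; it does not model the real `UDFRegistration` API, and the message does not say why the original call stopped compiling.

```scala
object EtaExpansionSketch {
  def myadd(a: Int, b: Int): Int = a + b

  // Hypothetical stand-in for a register method; applies f once so the result is observable.
  def register(name: String, f: (Int, Int) => Int): (String, Int) = (name, f(1, 2))

  // The compiler eta-expands the method because a function type is expected.
  val implicitForm: (String, Int) = register("myadd", myadd)
  // Explicit method value, as in the changed test line.
  val explicitForm: (String, Int) = register("myadd", myadd _)
}
```

When a plain function type is expected, both forms compile and behave the same, so a change that forces existing callers to add `_` is the kind of source compatibility break the comment asks about.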



[GitHub] spark issue #18632: [SPARK-21412][SQL] Reset BufferHolder while initialize a...

2017-07-17 Thread gczsjdy

Github user gczsjdy commented on the issue:

https://github.com/apache/spark/pull/18632
  
@cloud-fan @viirya @gatorsmile Could you please help me review this?



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread xuanyuanking

Github user xuanyuanking commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127865091
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
+override def accept(dir: File, name: String): Boolean = {
+  !name.startsWith(".") && !name.startsWith("_")
+}
+  })
+  assert(allFiles.length == 10)
--- End diff --

OK, I'll remove this assert and leave a note.



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread xuanyuanking

Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/18654
  
Yep, an empty result dir needs this metadata; otherwise reading it back throws this exception:
```
org.apache.spark.sql.AnalysisException: Unable to infer schema for Parquet. It must be specified manually.;
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$9.apply(DataSource.scala:188)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:187)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:381)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:190)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:571)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:555)
  ... 48 elided
```



[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-07-17 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18664#discussion_r127864419
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala
 ---
@@ -792,6 +793,76 @@ class ArrowConvertersSuite extends SharedSQLContext with BeforeAndAfterAll {
 collectAndValidate(df, json, "binaryData.json")
   }
 
+  test("date type conversion") {
+val json =
+  s"""
+ |{
+ |  "schema" : {
+ |"fields" : [ {
+ |  "name" : "date",
+ |  "type" : {
+ |"name" : "date",
+ |"unit" : "DAY"
+ |  },
+ |  "nullable" : true,
+ |  "children" : [ ],
+ |  "typeLayout" : {
+ |"vectors" : [ {
+ |  "type" : "VALIDITY",
+ |  "typeBitWidth" : 1
+ |}, {
+ |  "type" : "DATA",
+ |  "typeBitWidth" : 32
+ |} ]
+ |  }
+ |} ]
+ |  },
+ |  "batches" : [ {
+ |"count" : 4,
+ |"columns" : [ {
+ |  "name" : "date",
+ |  "count" : 4,
+ |  "VALIDITY" : [ 1, 1, 1, 1 ],
+ |  "DATA" : [ -1, 0, 16533, 16930 ]
+ |} ]
+ |  } ]
+ |}
+   """.stripMargin
+
+val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z", Locale.US)
+val d1 = new Date(-1)  // "1969-12-31 13:10:15.000 UTC"
+val d2 = new Date(0)  // "1970-01-01 13:10:15.000 UTC"
+val d3 = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime)
+val d4 = new Date(sdf.parse("2016-05-09 12:01:01.000 UTC").getTime)
+
+// Date is created unaware of timezone, but DateTimeUtils force defaultTimeZone()
+assert(DateTimeUtils.toJavaDate(DateTimeUtils.fromJavaDate(d2)).getTime == d2.getTime)
--- End diff --

cc @ueshin



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread HyukjinKwon

Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18654
  
schema and the footer in case of Parquet. There is more context here - 
https://github.com/apache/spark/pull/17395#discussion_r107611325.

For example, if we don't write out the empty files, it breaks:

```scala
spark.range(100).filter("id > 100").write.parquet("/tmp/abc")
spark.read.parquet("/tmp/abc").show()
```



[GitHub] spark issue #18627: [BACKPORT-2.1][SPARK-19104][SQL] Lambda variables in Ext...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18627
  
thanks, merging to 2.1!



[GitHub] spark pull request #18657: [MINOR] Improve SQLConf messages

2017-07-17 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18657



[GitHub] spark issue #18657: [MINOR] Improve SQLConf messages

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18657
  
LGTM, merging to master!



[GitHub] spark issue #18654: [SPARK-21435][SQL] Empty files should be skipped while w...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18654
  
> leaving the first partition for meta writing

What is the meta we need to write?



[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-07-17 Thread BryanCutler

Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/18664#discussion_r127861741
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/arrow/ArrowConvertersSuite.scala
 ---
@@ -792,6 +793,76 @@ class ArrowConvertersSuite extends SharedSQLContext 
with BeforeAndAfterAll {
 collectAndValidate(df, json, "binaryData.json")
   }
 
+  test("date type conversion") {
+val json =
+  s"""
+ |{
+ |  "schema" : {
+ |"fields" : [ {
+ |  "name" : "date",
+ |  "type" : {
+ |"name" : "date",
+ |"unit" : "DAY"
+ |  },
+ |  "nullable" : true,
+ |  "children" : [ ],
+ |  "typeLayout" : {
+ |"vectors" : [ {
+ |  "type" : "VALIDITY",
+ |  "typeBitWidth" : 1
+ |}, {
+ |  "type" : "DATA",
+ |  "typeBitWidth" : 32
+ |} ]
+ |  }
+ |} ]
+ |  },
+ |  "batches" : [ {
+ |"count" : 4,
+ |"columns" : [ {
+ |  "name" : "date",
+ |  "count" : 4,
+ |  "VALIDITY" : [ 1, 1, 1, 1 ],
+ |  "DATA" : [ -1, 0, 16533, 16930 ]
+ |} ]
+ |  } ]
+ |}
+   """.stripMargin
+
+val sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS z", Locale.US)
+val d1 = new Date(-1)  // "1969-12-31 13:10:15.000 UTC"
+val d2 = new Date(0)  // "1970-01-01 13:10:15.000 UTC"
+val d3 = new Date(sdf.parse("2015-04-08 13:10:15.000 UTC").getTime)
+val d4 = new Date(sdf.parse("2016-05-09 12:01:01.000 UTC").getTime)
+
+// Date is created unaware of timezone, but DateTimeUtils force defaultTimeZone()
+assert(DateTimeUtils.toJavaDate(DateTimeUtils.fromJavaDate(d2)).getTime == d2.getTime)
--- End diff --

@holdenk @cloud-fan I'm trying out the DateType conversion and ran into
this problem. The Dataset encoder uses `DateTimeUtils.toJavaDate` and
`fromJavaDate`, as in the assertion above, and these apply
`defaultTimeZone()` when working with the data. So the value `new Date(0)`
should be the epoch, but in my timezone it is shifted to the day before, and
the test above does not pass.

What are your thoughts on this? Should the conversion to Arrow assume
`defaultTimeZone()`, or is this something that should be fixed in Spark
first? Thanks!
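
To make the round-trip failure concrete, here is a minimal sketch that needs only the JDK. It is not Spark's code: the object `DateRoundTrip` is hypothetical, and its two helpers are simplified stand-ins (an assumption about the general approach) for the timezone-adjusted day arithmetic that `fromJavaDate` and `toJavaDate` rely on.

```scala
import java.util.TimeZone

// Hypothetical helper, written only to illustrate the timezone shift.
object DateRoundTrip {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Days since the epoch for a UTC instant, as seen on the wall clock of `tz`.
  def millisToDays(millisUtc: Long, tz: TimeZone): Int = {
    val millisLocal = millisUtc + tz.getOffset(millisUtc)
    java.lang.Math.floorDiv(millisLocal, MillisPerDay).toInt
  }

  // Inverse direction: the UTC instant of local midnight for the given day.
  // Simplification: the offset is looked up at the local millis value.
  def daysToMillis(days: Int, tz: TimeZone): Long = {
    val millisLocal = days.toLong * MillisPerDay
    millisLocal - tz.getOffset(millisLocal)
  }

  // Convert an instant to a day count and back again in the same zone.
  def roundTrip(millisUtc: Long, tz: TimeZone): Long =
    daysToMillis(millisToDays(millisUtc, tz), tz)
}
```

Under these assumptions, `DateRoundTrip.roundTrip(0L, TimeZone.getTimeZone("UTC"))` gives back 0, while the same call with `America/Los_Angeles` (8 hours behind UTC at the epoch) first maps the instant to day -1 and then returns local midnight of that day, so the value no longer equals `new Date(0).getTime`.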



[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18664
  
**[Test build #79693 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79693/testReport)**
 for PR 18664 at commit 
[`69e1e21`](https://github.com/apache/spark/commit/69e1e21bf4bebc7bea6bd9322e4300df71a90b18).



[GitHub] spark issue #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18658
  
Merged build finished. Test PASSed.



[GitHub] spark issue #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18658
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79683/
Test PASSed.



[GitHub] spark issue #18658: [SPARK-20871][SQL] limit logging of Janino code

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18658
  
**[Test build #79683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79683/testReport)**
 for PR 18658 at commit 
[`52b20f3`](https://github.com/apache/spark/commit/52b20f38f550dacc4896d061c5ac7f69ad56f875).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Tim...

2017-07-17 Thread BryanCutler

GitHub user BryanCutler opened a pull request:

https://github.com/apache/spark/pull/18664

[SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp support to 
ArrowConverters for toPandas() Conversion

## What changes were proposed in this pull request?

WIP started with DateType

## How was this patch tested?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/BryanCutler/spark arrow-date-timestamp-SPARK-21375

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18664.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18664


commit 5aa8b9e72aee17ffa51f4cb1048f5a3f93a5a380
Author: Bryan Cutler 
Date:   2017-07-13T16:53:31Z

added date type and started test, still some issue with time difference

commit 20313f92758e5639b309ba810945a8415941ef86
Author: Bryan Cutler 
Date:   2017-07-18T00:42:15Z

DateTimeUtils forces defaultTimeZone

commit 69e1e21bf4bebc7bea6bd9322e4300df71a90b18
Author: Bryan Cutler 
Date:   2017-07-18T00:48:47Z

fix style checks





[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-07-17 Thread SparkQA

Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16630
  
**[Test build #79688 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79688/testReport)**
 for PR 16630 at commit 
[`57f1e5c`](https://github.com/apache/spark/commit/57f1e5c259d7f237324dd1b3b481b7e82952b53e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16630
  
Merged build finished. Test PASSed.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16630
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79688/
Test PASSed.



[GitHub] spark pull request #18654: [SPARK-21435][SQL] Empty files should be skipped ...

2017-07-17 Thread HyukjinKwon

Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18654#discussion_r127860788
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileFormatWriterSuite.scala
 ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.{File, FilenameFilter}
+
+import org.apache.spark.sql.QueryTest
+import org.apache.spark.sql.test.SharedSQLContext
+
+class FileFormatWriterSuite extends QueryTest with SharedSQLContext {
+
+  test("empty file should be skipped while write to file") {
+withTempDir { dir =>
+  dir.delete()
+  spark.range(1).repartition(10).write.parquet(dir.toString)
+  val df = spark.read.parquet(dir.toString)
+  val allFiles = dir.listFiles(new FilenameFilter {
+override def accept(dir: File, name: String): Boolean = {
+  !name.startsWith(".") && !name.startsWith("_")
+}
+  })
+  assert(allFiles.length == 10)
--- End diff --

But I guess this one (the latter) does not test this change? If this test
passes regardless of the change in this PR, I would rather remove it.



[GitHub] spark issue #16630: [SPARK-19270][ML] Add summary table to GLM summary

2017-07-17 Thread AmplabJenkins

Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16630
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79686/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18660: [SPARK-21445] Make IntWrapper and LongWrapper in UTF8Str...

2017-07-17 Thread cloud-fan

Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/18660
  
good catch! LGTM


