[GitHub] spark issue #21298: [SPARK-24198][SparkR][SQL] Adding slice function to Spar...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21298
  
LGTM too, except for the nit.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21288: [SPARK-24206][SQL] Improve FilterPushdownBenchmar...

2018-05-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21288#discussion_r187764083
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/FilterPushdownBenchmark.scala ---
@@ -32,14 +32,14 @@ import org.apache.spark.util.{Benchmark, Utils}
  */
 object FilterPushdownBenchmark {
   val conf = new SparkConf()
-  conf.set("orc.compression", "snappy")
-  conf.set("spark.sql.parquet.compression.codec", "snappy")
+    .setMaster("local[1]")
+    .setAppName("FilterPushdownBenchmark")
+    .set("spark.driver.memory", "3g")
--- End diff --

Should these and the master be changed to setIfMissing()? It would be great if
these could be set via config.
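A minimal sketch of why setIfMissing() makes these values configurable, assuming the only behaviour that matters is "keep any value that is already present". SparkConf picks up `spark.*` Java system properties when it is constructed, so values supplied from outside (for example via spark-submit --conf) are already set before the defaults are applied. `ConfSketch` and `BenchmarkConfSketch` are hypothetical stand-ins, not SparkConf itself; `spark.master` and `spark.app.name` are the keys written by setMaster() and setAppName().

```scala
import scala.collection.mutable

// Hypothetical stand-in for SparkConf that models only set vs. setIfMissing.
final class ConfSketch(initial: Map[String, String] = Map.empty) {
  private val settings = mutable.HashMap.empty[String, String] ++= initial

  // Always overwrites, like SparkConf.set(key, value).
  def set(key: String, value: String): ConfSketch = { settings(key) = value; this }

  // Writes only when the key has no value yet, like SparkConf.setIfMissing(key, value).
  def setIfMissing(key: String, value: String): ConfSketch = {
    if (!settings.contains(key)) settings(key) = value
    this
  }

  def get(key: String): Option[String] = settings.get(key)
}

object BenchmarkConfSketch {
  // `external` plays the role of values supplied from outside the program.
  def build(external: Map[String, String]): ConfSketch =
    new ConfSketch(external)
      .setIfMissing("spark.master", "local[1]")
      .setIfMissing("spark.app.name", "FilterPushdownBenchmark")
      .setIfMissing("spark.driver.memory", "3g")
}
```

With plain set() in place of setIfMissing(), an externally supplied `spark.driver.memory` of 8g would be silently replaced by 3g.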





[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21301
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90538/
Test PASSed.





[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21301
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21301
  
**[Test build #90538 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90538/testReport)**
 for PR 21301 at commit 
[`bdbd409`](https://github.com/apache/spark/commit/bdbd409bb85793edd8eb8e5ff58c1d094ef12555).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #21298: [SPARK-24198][SparkR][SQL] Adding slice function ...

2018-05-11 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/21298#discussion_r187763906
  
--- Diff: R/pkg/R/functions.R ---
@@ -3138,6 +3139,23 @@ setMethod("size",
 column(jc)
   })
 
+#' @details
+#' \code{slice}: Returns an array containing all the elements in x from the index start
+#' (or starting from the end if start is negative) with the specified length.
+#'
+#' @rdname column_collection_functions
+#' @param start an index indicating the first element occuring in the result.
+#' @param length a number of consecutive elements choosen to the result.
+#'
--- End diff --

nit: remove unneeded line
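The `slice` semantics described in the roxygen text above can be modelled in plain Scala. This is a hypothetical sketch, not the SparkR or Catalyst implementation; it assumes 1-based indices, that a negative `start` counts back from the end, that an out-of-range `start` yields an empty result, and that `start = 0` or a negative `length` is rejected.

```scala
object SliceSketch {
  // Hypothetical model of the documented behaviour of slice(x, start, length).
  def slice[A](xs: IndexedSeq[A], start: Int, length: Int): IndexedSeq[A] = {
    require(start != 0, "start must not be 0: indices are 1-based")
    require(length >= 0, "length must be non-negative")
    // Convert the 1-based (or negative, from-the-end) start into a 0-based offset.
    val from = if (start > 0) start - 1 else xs.length + start
    if (from < 0 || from >= xs.length) IndexedSeq.empty
    // Clamp the length to what remains, without risking Int overflow in from + length.
    else xs.slice(from, from + math.min(length, xs.length - from))
  }
}
```

For example, `SliceSketch.slice(Vector(1, 2, 3, 4), -2, 5)` starts two elements from the end and returns only the two elements that remain.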





[GitHub] spark pull request #21309: [SPARK-23907] Removes regr_* functions in functio...

2018-05-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21309





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21309
  
Merged to master.





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21309
  
Just had a short talk with Reynold. LGTM too. Thanks for bearing with me.





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21309
  
Better compile-time errors. Plus, a lot of people are already using these.

On Fri, May 11, 2018 at 7:35 PM Hyukjin Kwon 
wrote:

> Yup, then why not just deprecate other functions in other APIs for 3.0.0,
> and promote the usage of expr?






[GitHub] spark issue #21296: [SPARK-24244][SQL] Passing only required columns to the ...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21296
  
Can we update the migration guide then? I want to see if the note makes 
sense.





[GitHub] spark pull request #21296: [SPARK-24244][SQL] Passing only required columns ...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21296#discussion_r187762615
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/UnivocityParser.scala ---
@@ -73,11 +64,24 @@ class UnivocityParser(
   // Each input token is placed in each output row's position by mapping these. In this case,
   //
   //   output row - ["A", 2]
-  private val valueConverters: Array[ValueConverter] =
-    schema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
+  private val valueConverters: Array[ValueConverter] = {
+    requiredSchema.map(f => makeConverter(f.name, f.dataType, f.nullable, options)).toArray
+  }
 
-  private val tokenIndexArr: Array[Int] = {
-    requiredSchema.map(f => schema.indexOf(f)).toArray
+  private val tokenizer = {
+    val parserSetting = options.asParserSettings
+    if (requiredSchema.length < schema.length) {
+      val tokenIndexArr = requiredSchema.map(f => java.lang.Integer.valueOf(schema.indexOf(f)))
+      parserSetting.selectIndexes(tokenIndexArr: _*)
+    }
+    new CsvParser(parserSetting)
+  }
+
+  private val row = new GenericInternalRow(requiredSchema.length)
--- End diff --

Seems we don't need to move this down.
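The `tokenizer` change above asks the CSV parser to emit only the tokens at the required column positions. A rough sketch of that index computation in plain Scala, with column names standing in for `StructField`s and assuming the parser returns the selected tokens in the requested order (`tokenIndexes` and `project` are hypothetical names):

```scala
object RequiredColumnsSketch {
  // Positions of the required columns in the full schema, mirroring
  // `requiredSchema.map(f => schema.indexOf(f))`; None when nothing is pruned.
  def tokenIndexes(schema: Seq[String], required: Seq[String]): Option[Seq[Int]] =
    if (required.length < schema.length) Some(required.map(schema.indexOf(_)))
    else None

  // What selecting those indexes amounts to for one already-tokenized line.
  def project(tokens: IndexedSeq[String], indexes: Option[Seq[Int]]): IndexedSeq[String] =
    indexes match {
      case Some(idx) => idx.map(i => tokens(i)).toIndexedSeq
      case None      => tokens
    }
}
```

Skipping the selection when every column is required keeps the parser on its default path, which matches the `requiredSchema.length < schema.length` guard in the diff.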





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21309
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90537/
Test PASSed.





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21309
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21309
  
**[Test build #90537 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90537/testReport)**
 for PR 21309 at commit 
[`ce2c305`](https://github.com/apache/spark/commit/ce2c305169d90c4d7803338d85d2d4c92a8e1d3c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/21302#discussion_r187762385
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
   }
 }
   }
+
+  test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
+    // parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
+    // The row-group statistics include null counts, but not min and max values, which
+    // triggers PARQUET-1217.
+    val df = readResourceParquetFile("test-data/parquet-1217.parquet")
--- End diff --

+1
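The new test targets row-group statistics that carry a null count but no min/max values. A hedged sketch of the conservative rule a filter should follow in that case, using hypothetical `Stats` and `canDropEq` names rather than Parquet's actual filter classes: a row group may be skipped for `col = v` only when both bounds are present and `v` falls outside them.

```scala
object RowGroupFilterSketch {
  // Hypothetical row-group statistics: the null count may be written even when min/max are not.
  final case class Stats(nullCount: Long, min: Option[Int], max: Option[Int])

  // True only if no row in the group can satisfy `col = v`.
  def canDropEq(stats: Stats, v: Int): Boolean = (stats.min, stats.max) match {
    case (Some(lo), Some(hi)) => v < lo || v > hi
    case _                    => false // bounds missing: nothing is known, so read the group
  }
}
```

With the test file's shape (values -1, 0, 1, 2 and null, no bounds), `canDropEq` returns false for every `v`, so the row group is always read.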





[GitHub] spark issue #21304: Fix typo in UDF type match error message

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21304
  
It's okay, but would you mind quickly checking for other typos around here
while we're at it?





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21309
  
Yup, then why not just deprecate other functions in other APIs for 3.0.0, 
and promote the usage of expr?





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21309
  
Adding it to SQL would make it available everywhere (through expr), right?

On Fri, May 11, 2018 at 7:30 PM Hyukjin Kwon 
wrote:

> Thing is, I am a bit confused about when to add it to other APIs. I thought
> that if a function is expected to be less commonly used, it shouldn't be added
> in the first place. We have UDFs.
>
> I have been a bit confused by some functions specifically not being added to
> other APIs. If a function is worth adding to one API, I thought it makes sense
> to add it to the other APIs too. Is there a reason to add them to the SQL side
> specifically?






[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21309
  
I am asking this so that we can apply the same judgement about when to add it.





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21309
  
Thing is, I am a bit confused about when to add it to other APIs. I thought that
if a function is expected to be less commonly used, it shouldn't be added in the
first place. We have UDFs.

I have been a bit confused by some functions specifically not being added to
other APIs. If a function is worth adding to one API, I thought it makes sense
to add it to the other APIs too. Is there a reason to add them to the SQL side
specifically?





[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21309
  
Btw, it’s always been the case that the less commonly used functions are not
part of this file. There is just a lot of overhead to maintaining all of them.

I’m not even sure if the regr_* expressions should be added in the first
place.

On Fri, May 11, 2018 at 7:20 PM Hyukjin Kwon 
wrote:

> @rxin, how about splitting up this file by group or something, or deprecating
> all the functions that can be called via expr for 3.0.0? To me, it looks a bit
> odd that some functions exist and others don't. It was an actual use case, and
> I had to check whether each function exists every time.






[GitHub] spark issue #21054: [SPARK-23907][SQL] Add regr_* functions

2018-05-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21054
  
There is not a single function that can’t be called via expr. It mainly adds
some type safety.

On Fri, May 11, 2018 at 7:18 PM Hyukjin Kwon 
wrote:

> *@HyukjinKwon* commented on this pull request.
> --
>
> In sql/core/src/main/scala/org/apache/spark/sql/functions.scala:
>
> > @@ -775,6 +775,178 @@ object functions {
> */
>def var_pop(columnName: String): Column = var_pop(Column(columnName))
>
> +  /**
> +   * Aggregate function: returns the number of non-null pairs.
> +   *
> +   * @group agg_funcs
> +   * @since 2.4.0
> +   */
> +  def regr_count(y: Column, x: Column): Column = withAggregateFunction {
>
> @rxin, how about splitting up this file by group or something, or deprecating
> all the functions that can be called via expr for 3.0.0? To me, it looks a bit
> odd that some functions exist and others don't. It was an actual use case, and
> I had to check whether each function exists every time.
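rxin's point that the functions in functions.scala mostly add type safety over expr can be illustrated with a toy model. Everything below is hypothetical and far simpler than Spark's `Column`; it only shows that both routes can describe the same expression, while only the typed one lets the compiler check the function name and arity.

```scala
object TypedVsExprSketch {
  // Hypothetical, heavily simplified Column: just the SQL text of an expression.
  final case class Column(sql: String)

  def col(name: String): Column = Column(name)

  // String route: any text compiles; a typo such as "regr_cout(y, x)" or a missing
  // argument would only be reported when the query is analyzed at runtime.
  def expr(sql: String): Column = Column(sql)

  // Typed route: the name and the two-argument arity are fixed by this signature,
  // so misspelling the method or omitting an argument fails at compile time.
  def regr_count(y: Column, x: Column): Column = Column(s"regr_count(${y.sql}, ${x.sql})")
}
```

The maintenance cost rxin mentions is the flip side: every typed wrapper like `regr_count` is one more signature to document and keep in sync with the SQL function.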






[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21309
  
@rxin, how about splitting up this file by group or something, or deprecating
all the functions that can be called via expr for 3.0.0? To me, it looks a bit
odd that some functions exist and others don't. It was an actual use case, and I
had to check whether each function exists every time.





[GitHub] spark pull request #21054: [SPARK-23907][SQL] Add regr_* functions

2018-05-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21054#discussion_r187761743
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -775,6 +775,178 @@ object functions {
*/
   def var_pop(columnName: String): Column = var_pop(Column(columnName))
 
+  /**
+   * Aggregate function: returns the number of non-null pairs.
+   *
+   * @group agg_funcs
+   * @since 2.4.0
+   */
+  def regr_count(y: Column, x: Column): Column = withAggregateFunction {
--- End diff --

@rxin, how about splitting up this file by group or something, or deprecating
all the functions that can be called via expr for 3.0.0? To me, it looks a bit
odd that some functions exist and others don't. It was an actual use case, and I
had to check whether each function exists every time.
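The scaladoc in the diff defines regr_count as returning "the number of non-null pairs". A minimal sketch of that counting rule, assuming a pair counts only when both y and x are non-null, with Option[Double] standing in for a nullable column value (hypothetical, not Spark's aggregate implementation):

```scala
object RegrCountSketch {
  // Counts the (y, x) pairs in which both values are present.
  def regrCount(rows: Seq[(Option[Double], Option[Double])]): Long =
    rows.count { case (y, x) => y.isDefined && x.isDefined }.toLong
}
```

A row with only one side null is ignored entirely, which is what distinguishes this from counting the non-null values of either column alone.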





[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90536/
Test FAILed.





[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Merged build finished. Test FAILed.





[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21302
  
**[Test build #90536 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90536/testReport)**
 for PR 21302 at commit 
[`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90535/
Test PASSed.





[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21305
  
**[Test build #90535 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90535/testReport)**
 for PR 21305 at commit 
[`d2e4c41`](https://github.com/apache/spark/commit/d2e4c41c3dc139cc602270d2ed9bdbbb02fd50be).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21308
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21308
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90534/
Test PASSed.





[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21308
  
**[Test build #90534 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90534/testReport)**
 for PR 21308 at commit 
[`ffbd3cb`](https://github.com/apache/spark/commit/ffbd3cb320ba260e6ecbda58a1c90b164aa0a97e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21301
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90532/
Test PASSed.





[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21301
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3165/
Test PASSed.





[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21306
  
**[Test build #90532 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90532/testReport)**
 for PR 21306 at commit 
[`7130d13`](https://github.com/apache/spark/commit/7130d13de27c99480189cdce3b7f00749a801e9c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21301
  
**[Test build #90538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90538/testReport)**
 for PR 21301 at commit 
[`bdbd409`](https://github.com/apache/spark/commit/bdbd409bb85793edd8eb8e5ff58c1d094ef12555).





[GitHub] spark pull request #21246: [SPARK-23901][SQL] Add masking functions

2018-05-11 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21246#discussion_r187759378
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/maskExpressions.scala ---
@@ -0,0 +1,569 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.commons.codec.digest.DigestUtils
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.catalyst.expressions.MaskExpressionsUtils._
+import org.apache.spark.sql.catalyst.expressions.MaskLike._
+import org.apache.spark.sql.catalyst.expressions.codegen.{CodegenContext, CodeGenerator, ExprCode}
+import org.apache.spark.sql.types._
+import org.apache.spark.unsafe.types.UTF8String
+
+
+trait MaskLike {
+  def upper: String
+  def lower: String
+  def digit: String
+
+  protected lazy val upperReplacement: Int = getReplacementChar(upper, defaultMaskedUppercase)
+  protected lazy val lowerReplacement: Int = getReplacementChar(lower, defaultMaskedLowercase)
+  protected lazy val digitReplacement: Int = getReplacementChar(digit, defaultMaskedDigit)
+
+  protected val maskUtilsClassName: String = classOf[MaskExpressionsUtils].getName
+
+  def inputStringLengthCode(inputString: String, length: String): String = {
+    s"${CodeGenerator.JAVA_INT} $length = $inputString.codePointCount(0, $inputString.length());"
+  }
+
+  def appendMaskedToStringBuilderCode(
+      ctx: CodegenContext,
+      sb: String,
+      inputString: String,
+      offset: String,
+      numChars: String): String = {
+    val i = ctx.freshName("i")
+    val codePoint = ctx.freshName("codePoint")
+    s"""
+       |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+       |  ${CodeGenerator.JAVA_INT} $codePoint = $inputString.codePointAt($offset);
+       |  $sb.appendCodePoint($maskUtilsClassName.transformChar($codePoint,
+       |    $upperReplacement, $lowerReplacement,
+       |    $digitReplacement, $defaultMaskedOther));
+       |  $offset += Character.charCount($codePoint);
+       |}
+     """.stripMargin
+  }
+
+  def appendUnchangedToStringBuilderCode(
+      ctx: CodegenContext,
+      sb: String,
+      inputString: String,
+      offset: String,
+      numChars: String): String = {
+    val i = ctx.freshName("i")
+    val codePoint = ctx.freshName("codePoint")
+    s"""
+       |for (${CodeGenerator.JAVA_INT} $i = 0; $i < $numChars; $i++) {
+       |  ${CodeGenerator.JAVA_INT} $codePoint = $inputString.codePointAt($offset);
+       |  $sb.appendCodePoint($codePoint);
+       |  $offset += Character.charCount($codePoint);
+       |}
+     """.stripMargin
+  }
+
+  def appendMaskedToStringBuffer(
+      sb: StringBuffer,
+      inputString: String,
+      startOffset: Int,
+      numChars: Int): Int = {
+    var offset = startOffset
+    (1 to numChars) foreach { _ =>
+      val codePoint = inputString.codePointAt(offset)
+      sb.appendCodePoint(transformChar(
+        codePoint,
+        upperReplacement,
+        lowerReplacement,
+        digitReplacement,
+        defaultMaskedOther))
+      offset += Character.charCount(codePoint)
+    }
+    offset
+  }
+
+  def appendUnchangedToStringBuffer(
+      sb: StringBuffer,
+      inputString: String,
+      startOffset: Int,
+      numChars: Int): Int = {
+    var offset = startOffset
+    (1 to numChars) foreach { _ =>
+      val codePoint = inputString.codePointAt(offset)
+      sb.appendCodePoint(codePoint)
+      offset += Character.charCount(codePoint)
+    }
+    offset
+  }
+}
+
+trait MaskLikeWithN extends MaskLike {
+  def n: Int
+  protected lazy val charCount: Int = if (n < 0) 0 else n
+}
+
+/**
+ * Utils for mask 
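The `appendMaskedToStringBuffer` helper above walks the input by code point (`codePointAt` plus `Character.charCount`) and replaces each one via `transformChar`. A self-contained sketch of that loop, assuming the defaults are 'X' for uppercase, 'x' for lowercase and 'n' for digits (as in Hive's mask UDF) and that other characters are left unchanged; `MaskSketch.mask` is a hypothetical name and not part of the patch:

```scala
object MaskSketch {
  // Assumed defaults: 'X' for uppercase, 'x' for lowercase, 'n' for digits;
  // every other code point is copied unchanged.
  def mask(input: String, upper: Int = 'X', lower: Int = 'x', digit: Int = 'n'): String = {
    val sb = new java.lang.StringBuilder(input.length)
    var offset = 0
    while (offset < input.length) {
      val codePoint = input.codePointAt(offset)
      val replacement =
        if (Character.isUpperCase(codePoint)) upper
        else if (Character.isLowerCase(codePoint)) lower
        else if (Character.isDigit(codePoint)) digit
        else codePoint
      sb.appendCodePoint(replacement)
      // Advance by 1 or 2 chars so a surrogate pair is treated as one character.
      offset += Character.charCount(codePoint)
    }
    sb.toString
  }
}
```

Iterating by code point rather than by `char` is what keeps characters outside the Basic Multilingual Plane (such as emoji) intact instead of splitting their surrogate pairs.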

[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21301
  
@dongjoon-hyun Thanks, done.





[GitHub] spark issue #21303: [BUILD] Close stale PRs

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21303
  
Merged build finished. Test PASSed.





[GitHub] spark issue #21303: [BUILD] Close stale PRs

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21303
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90526/
Test PASSed.





[GitHub] spark issue #21303: [BUILD] Close stale PRs

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21303
  
**[Test build #90526 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90526/testReport)**
 for PR 21303 at commit 
[`8f80f75`](https://github.com/apache/spark/commit/8f80f754bd5642433f6d1fee31c0c1dcd28e6d33).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #21278: [SPARKR] Require Java 8 for SparkR

2018-05-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21278





[GitHub] spark issue #21301: [SPARK-24228][SQL] Fix Java lint errors

2018-05-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21301
  
@kiszk, could you run it for everything?
```
~/PR-21301:PR-21301$ dev/lint-java
exec: curl --progress-bar -L https://downloads.typesafe.com/zinc/0.3.15/zinc-0.3.15.tgz
100.0%
exec: curl --progress-bar -L https://downloads.typesafe.com/scala/2.11.8/scala-2.11.8.tgz
100.0%
exec: curl --progress-bar -L https://www.apache.org/dyn/closer.lua?action=download=/maven/maven-3/3.3.9/binaries/apache-maven-3.3.9-bin.tar.gz
100.0%
Using `mvn` from path: /home/dongjoon/PR-21301/build/apache-maven-3.3.9/bin/mvn
Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/sql/sources/v2/reader/partitioning/Distribution.java:[25] (sizes) LineLength: Line is longer than 100 characters (found 109).
[ERROR] src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReader.java:[38] (sizes) LineLength: Line is longer than 100 characters (found 102).
[ERROR] src/test/java/test/org/apache/spark/sql/sources/v2/JavaAdvancedDataSourceV2.java:[110] (sizes) LineLength: Line is longer than 100 characters (found 101).
```
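
All three failures come from Checkstyle's `LineLength` rule with a 100-character limit. To pre-screen files before rerunning `dev/lint-java`, something like the following plain-Java sketch could help. It is a hypothetical helper, not part of Spark's build. It counts UTF-16 characters per line and ignores tab expansion and any suppression rules the real Checkstyle configuration may apply, so treat `dev/lint-java` as the authority:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-check that roughly mimics Checkstyle's LineLength rule. */
final class LineLengthCheck {
    static final int MAX_LENGTH = 100; // assumed limit, matching the errors quoted above

    /** Returns the 1-based numbers of lines whose length exceeds maxLength. */
    static List<Integer> longLines(List<String> lines, int maxLength) {
        List<Integer> offenders = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).length() > maxLength) {
                offenders.add(i + 1);
            }
        }
        return offenders;
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path path = Paths.get(arg);
            List<String> lines = Files.readAllLines(path, StandardCharsets.UTF_8);
            for (int lineNo : longLines(lines, MAX_LENGTH)) {
                System.out.printf("%s:[%d] LineLength: Line is longer than %d characters (found %d).%n",
                        path, lineNo, MAX_LENGTH, lines.get(lineNo - 1).length());
            }
        }
    }
}
```

Compile it with `javac` and pass the `.java` files to check as arguments.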


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90527/
Test PASSed.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21302
  
**[Test build #90527 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90527/testReport)**
 for PR 21302 at commit 
[`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21278: [SPARKR] Require Java 8 for SparkR

2018-05-11 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/21278
  
Merging this to master and branch-2.3


---




[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21309
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3164/
Test PASSed.


---




[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21309
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21276
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21276
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90525/
Test PASSed.


---




[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21276
  
**[Test build #90525 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90525/testReport)**
 for PR 21276 at commit 
[`3d067b8`](https://github.com/apache/spark/commit/3d067b883a947b0b3b3dd200c7767125001aefef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21309
  
**[Test build #90537 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90537/testReport)**
 for PR 21309 at commit 
[`ce2c305`](https://github.com/apache/spark/commit/ce2c305169d90c4d7803338d85d2d4c92a8e1d3c).


---




[GitHub] spark issue #21309: [SPARK-23907] Removes regr_* functions in functions.scal...

2018-05-11 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/21309
  
cc @gatorsmile @mgaido91 


---




[GitHub] spark pull request #21309: [SPARK-23907] Removes regr_* functions in functio...

2018-05-11 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/21309

[SPARK-23907] Removes regr_* functions in functions.scala

## What changes were proposed in this pull request?
This patch removes the various regr_* functions in functions.scala. They 
are so uncommon that I don't think they deserve real estate in functions.scala. 
We can consider adding them later if more users need them.

## How was this patch tested?
Removed the associated test case as well.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark SPARK-23907

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21309


commit ce2c305169d90c4d7803338d85d2d4c92a8e1d3c
Author: Reynold Xin 
Date:   2018-05-11T23:24:15Z

[SPARK-23907] Removes regr_ functions in functions.scala




---




[GitHub] spark pull request #21054: [SPARK-23907][SQL] Add regr_* functions

2018-05-11 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/21054#discussion_r187751801
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -775,6 +775,178 @@ object functions {
*/
   def var_pop(columnName: String): Column = var_pop(Column(columnName))
 
+  /**
+   * Aggregate function: returns the number of non-null pairs.
+   *
+   * @group agg_funcs
+   * @since 2.4.0
+   */
+  def regr_count(y: Column, x: Column): Column = withAggregateFunction {
--- End diff --

do we need all of these? seems like users can just invoke expr to do them. 
this file is getting very long.
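
As documented in the diff, `regr_count(y, x)` counts the rows where both `y` and `x` are non-null. rxin's point is that such aggregates stay reachable through `expr` (for example `expr("regr_count(y, x)")`, assuming the SQL function remains registered) without a dedicated Scala wrapper. The plain-Java sketch below shows only that counting rule. It is independent of Spark, and it models SQL nulls with boxed `Double` values purely for illustration:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of regr_count semantics: count rows where both values are non-null. */
final class RegrCount {
    static long regrCount(List<Double> ys, List<Double> xs) {
        if (ys.size() != xs.size()) {
            throw new IllegalArgumentException("columns must have the same number of rows");
        }
        long count = 0;
        for (int i = 0; i < ys.size(); i++) {
            // A row contributes only when both y and x are present.
            if (ys.get(i) != null && xs.get(i) != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Double> y = Arrays.asList(1.0, null, 3.0, 4.0);
        List<Double> x = Arrays.asList(2.0, 5.0, null, 8.0);
        System.out.println(regrCount(y, x)); // prints 2 (the first and last rows)
    }
}
```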



---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3163/
Test PASSed.


---




[GitHub] spark issue #21303: [BUILD] Close stale PRs

2018-05-11 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/21303
  
I'll leave #18229 just to err on the side of caution, but it does look 
stale.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21302
  
**[Test build #90536 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90536/testReport)**
 for PR 21302 at commit 
[`8566ba1`](https://github.com/apache/spark/commit/8566ba19cd194330002d40efccbae40388d6b0b3).


---




[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21276
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21276
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90524/
Test FAILed.


---




[GitHub] spark issue #21276: [SPARK-24216][SQL] Spark TypedAggregateExpression uses g...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21276
  
**[Test build #90524 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90524/testReport)**
 for PR 21276 at commit 
[`9ff048a`](https://github.com/apache/spark/commit/9ff048ab08b55e48fbcf7acd7672e3205b46b9aa).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/21302
  
retest this please


---




[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21307
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90533/
Test PASSed.


---




[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21307
  
**[Test build #90533 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90533/testReport)**
 for PR 21307 at commit 
[`b3dd256`](https://github.com/apache/spark/commit/b3dd256957004621c39838b4f9eaddc431cb7bc6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21302
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90523/
Test FAILed.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21302
  
**[Test build #90523 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90523/testReport)**
 for PR 21302 at commit 
[`c681819`](https://github.com/apache/spark/commit/c681819ae4af46b685b4dcca0039b0be13ce1bb0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21302
  
cc @liancheng @michal-databricks @cloud-fan Please double-check and confirm that the risk of these two Parquet PRs is low.


---




[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21305
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3162/
Test PASSed.


---




[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21305
  
**[Test build #90535 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90535/testReport)**
 for PR 21305 at commit 
[`d2e4c41`](https://github.com/apache/spark/commit/d2e4c41c3dc139cc602270d2ed9bdbbb02fd50be).


---




[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21308
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21308
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3161/
Test PASSed.


---




[GitHub] spark pull request #21302: [SPARK-23852][SQL] Upgrade to Parquet 1.8.3

2018-05-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/21302#discussion_r187745471
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -602,6 +602,16 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContex
   }
 }
   }
+
+  test("SPARK-23852: Broken Parquet push-down for partially-written stats") {
+// parquet-1217.parquet contains a single column with values -1, 0, 1, 2 and null.
+// The row-group statistics include null counts, but not min and max values, which
+// triggers PARQUET-1217.
+val df = readResourceParquetFile("test-data/parquet-1217.parquet")
--- End diff --

Since this test case assumes `spark.sql.parquet.filterPushdown=true`, let's use the following:
```scala
withSQLConf(SQLConf.PARQUET_FILTER_PUSHDOWN_ENABLED.key -> "true",
```
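
`withSQLConf` comes from Spark's SQL test helpers. As we understand it, it sets the given entries for the duration of the block and then restores the earlier values, unsetting keys that had none. That way the new test no longer depends on the session default for `spark.sql.parquet.filterPushdown`. The plain-Java sketch below shows only that set, run, restore pattern on a `Map`. `ScopedConf` and `withConf` are hypothetical names, not Spark's implementation:

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a scoped configuration override in the style of withSQLConf. */
final class ScopedConf {
    private final Map<String, String> conf = new HashMap<>();

    String get(String key) {
        return conf.get(key);
    }

    void set(String key, String value) {
        conf.put(key, value);
    }

    /** Applies overrides, runs body, then restores prior values (removing keys that were unset). */
    void withConf(Map<String, String> overrides, Runnable body) {
        Map<String, String> previous = new HashMap<>();
        for (String key : overrides.keySet()) {
            previous.put(key, conf.get(key)); // a null value records "was not set"
        }
        conf.putAll(overrides);
        try {
            body.run();
        } finally {
            for (Map.Entry<String, String> e : previous.entrySet()) {
                if (e.getValue() == null) {
                    conf.remove(e.getKey());
                } else {
                    conf.put(e.getKey(), e.getValue());
                }
            }
        }
    }

    public static void main(String[] args) {
        ScopedConf c = new ScopedConf();
        c.set("spark.sql.parquet.filterPushdown", "false");
        Map<String, String> overrides = new HashMap<>();
        overrides.put("spark.sql.parquet.filterPushdown", "true");
        c.withConf(overrides, () ->
                System.out.println("inside: " + c.get("spark.sql.parquet.filterPushdown"))); // inside: true
        System.out.println("after: " + c.get("spark.sql.parquet.filterPushdown")); // after: false
    }
}
```

Restoring in a `finally` block keeps the override from leaking into later tests even when the body throws.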


---




[GitHub] spark issue #21308: SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21308
  
**[Test build #90534 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90534/testReport)**
 for PR 21308 at commit 
[`ffbd3cb`](https://github.com/apache/spark/commit/ffbd3cb320ba260e6ecbda58a1c90b164aa0a97e).


---




[GitHub] spark pull request #19802: [SPARK-22594][CORE] Handling spark-submit and mas...

2018-05-11 Thread Jiri-Kremser
Github user Jiri-Kremser closed the pull request at:

https://github.com/apache/spark/pull/19802


---




[GitHub] spark pull request #21308: SPARK-24253: Add DeleteSupport mix-in for DataSou...

2018-05-11 Thread rdblue
GitHub user rdblue opened a pull request:

https://github.com/apache/spark/pull/21308

SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.

## What changes were proposed in this pull request?

Adds a `DeleteSupport` mix-in for `DataSourceV2`. This mix-in provides a method to delete data using Catalyst expressions, in support of `delete from` and overwrite logical operations.

## How was this patch tested?

No tests; this only adds an interface.
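
Here is a rough illustration of the mix-in idea. It does not reproduce the interface in this PR, whose exact signature is not shown here. A source opts into deletes by also implementing an extra interface, and the caller checks for it before running a `delete from`. All names below (`ToySource`, `ToyDeleteSupport`, `deleteWhere`, `DeletePlanner`) are hypothetical, and a `Predicate` stands in for the Catalyst expressions the PR uses:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical base interface standing in for a data source. */
interface ToySource {
    String name();
}

/** Hypothetical mix-in: sources that can delete matching rows also implement this. */
interface ToyDeleteSupport {
    void deleteWhere(Predicate<Integer> filter);
}

/** A toy in-memory table that opts into deletes through the mix-in. */
final class InMemoryTable implements ToySource, ToyDeleteSupport {
    final List<Integer> rows = new ArrayList<>();

    @Override
    public String name() {
        return "in-memory";
    }

    @Override
    public void deleteWhere(Predicate<Integer> filter) {
        rows.removeIf(filter);
    }
}

final class DeletePlanner {
    /** Runs the delete if the source supports it; returns false otherwise. */
    static boolean tryDelete(ToySource source, Predicate<Integer> filter) {
        if (source instanceof ToyDeleteSupport) {
            ((ToyDeleteSupport) source).deleteWhere(filter);
            return true;
        }
        return false; // a real planner would likely report an analysis error here (assumption)
    }

    public static void main(String[] args) {
        InMemoryTable table = new InMemoryTable();
        table.rows.add(1);
        table.rows.add(5);
        table.rows.add(9);
        boolean deleted = tryDelete(table, v -> v > 4);
        System.out.println(deleted + " " + table.rows); // prints "true [1]"
    }
}
```

Keeping delete support in a separate interface means read-only sources need no change. The capability check is a simple `instanceof`.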

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rdblue/spark SPARK-24253-add-v2-delete-support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21308


commit c0243cd5807142a3c61e5615406842d9d97bf7de
Author: Ryan Blue 
Date:   2018-05-11T22:04:15Z

SPARK-24253: Add DeleteSupport mix-in for DataSourceV2.




---




[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21307
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3160/
Test PASSed.


---




[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21307
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21307: [SPARK-24186][R][SQL]change reverse and concat to collec...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21307
  
**[Test build #90533 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90533/testReport)**
 for PR 21307 at commit 
[`b3dd256`](https://github.com/apache/spark/commit/b3dd256957004621c39838b4f9eaddc431cb7bc6).


---




[GitHub] spark pull request #21307: [SPARK-24186][R][SQL]change reverse and concat to...

2018-05-11 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/21307

[SPARK-24186][R][SQL]change reverse and concat to collection functions in R



## What changes were proposed in this pull request?

`reverse` and `concat` are already in functions.R as column string functions. Since these two functions are now categorized as collection functions in Scala and Python, we will do the same in R.
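
These functions count as collection functions because, in Scala and Python, `reverse` and `concat` accept arrays as well as strings. The plain-Java sketch below only mirrors that dual behavior on `String` and `List` values. It is an illustration, not SparkR or Spark code; `CollectionOps` is a hypothetical name, and Spark's null handling (for example `concat` returning null when any input is null) is not modeled:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Illustration only: the same operation names applied to strings and to arrays (lists). */
final class CollectionOps {
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    static <T> List<T> reverse(List<T> xs) {
        List<T> copy = new ArrayList<>(xs); // leave the input untouched
        Collections.reverse(copy);
        return copy;
    }

    static String concat(String... parts) {
        return String.join("", parts);
    }

    @SafeVarargs
    static <T> List<T> concat(List<T>... parts) {
        List<T> out = new ArrayList<>();
        for (List<T> p : parts) {
            out.addAll(p);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(reverse("Spark"));                              // krapS
        System.out.println(reverse(Arrays.asList(1, 2, 3)));               // [3, 2, 1]
        System.out.println(concat("Spark", "R"));                          // SparkR
        System.out.println(concat(Arrays.asList(1, 2), Arrays.asList(3))); // [1, 2, 3]
    }
}
```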

## How was this patch tested?

Added a test in test_sparkSQL.R.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark_24186

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21307.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21307


commit b3dd256957004621c39838b4f9eaddc431cb7bc6
Author: Huaxin Gao 
Date:   2018-05-11T21:40:06Z

[SPARK-24186][R][SQL]change reverse and concat to collection functions in R




---




[GitHub] spark issue #21218: [SPARK-24155][ML] Instrumentation improvements for clust...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21218
  
**[Test build #90529 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90529/testReport)**
 for PR 21218 at commit 
[`4e2cb81`](https://github.com/apache/spark/commit/4e2cb8141c5a0389fb619f3d423021768de91904).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21218: [SPARK-24155][ML] Instrumentation improvements for clust...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21218
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21218: [SPARK-24155][ML] Instrumentation improvements for clust...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21218
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90529/
Test PASSed.


---




[GitHub] spark issue #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to include...

2018-05-11 Thread BryanCutler
Github user BryanCutler commented on the issue:

https://github.com/apache/spark/pull/20280
  
closing now, will revisit for Spark 3.0


---




[GitHub] spark pull request #20280: [SPARK-22232][PYTHON][SQL] Fixed Row pickling to ...

2018-05-11 Thread BryanCutler
Github user BryanCutler closed the pull request at:

https://github.com/apache/spark/pull/20280


---




[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21299
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90520/
Test FAILed.


---




[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21299
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21299: [SPARK-24250][SQL] support accessing SQLConf inside task...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21299
  
**[Test build #90520 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90520/testReport)**
 for PR 21299 at commit 
[`2ecabe4`](https://github.com/apache/spark/commit/2ecabe4fd984bb6a3f909364dcee27490c7a5d0a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21183
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21183
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/90528/
Test PASSed.


---




[GitHub] spark issue #21183: [SPARK-22210][ML] Add seed for LDA variationalTopicInfer...

2018-05-11 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21183
  
**[Test build #90528 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/90528/testReport)**
 for PR 21183 at commit 
[`a846937`](https://github.com/apache/spark/commit/a8469374b00c0466d480367992799ca77a7afa06).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21306: [SPARK-24252][SQL] Add DataSourceV2 mix-in for catalog s...

2018-05-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21306
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3159/
Test PASSed.


---



