[GitHub] spark pull request #21757: [SPARK-24797] [SQL] respect spark.sql.hive.conver...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21757#discussion_r202248225
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala
 ---
@@ -254,13 +254,15 @@ class FindDataSourceTable(sparkSession: SparkSession) 
extends Rule[LogicalPlan]
 
   override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
 case i @ InsertIntoTable(UnresolvedCatalogRelation(tableMeta), _, _, 
_, _)
-if DDLUtils.isDatasourceTable(tableMeta) =>
+if DDLUtils.isDatasourceTable(tableMeta) &&
+  DDLUtils.convertSchema(tableMeta, sparkSession) =>
--- End diff --

I do not think this is the right fix. If the original table is a native 
data source table, we always use our Parquet/ORC reader instead of the Hive 
serde. 
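
To make the distinction in this comment concrete, here is a minimal sketch in plain Scala that does not use Spark's classes. `TableKind`, `Reader`, and `chooseReader` are hypothetical names introduced for illustration only. The sketch assumes that the `convertMetastore` flag stands in for `spark.sql.hive.convertMetastoreParquet`/`Orc` and that the flag is consulted only for Hive serde tables.

```scala
object ReaderChoiceSketch {
  // Hypothetical model of the two table kinds discussed in the review.
  sealed trait TableKind
  case object DataSourceTable extends TableKind // native data source table
  case object HiveSerdeTable extends TableKind  // table backed by a Hive serde

  sealed trait Reader
  case object NativeReader extends Reader    // Spark's own Parquet/ORC reader
  case object HiveSerdeReader extends Reader // delegate to the Hive serde

  // Native data source tables ignore the flag; only Hive serde tables consult it.
  def chooseReader(kind: TableKind, convertMetastore: Boolean): Reader = kind match {
    case DataSourceTable => NativeReader
    case HiveSerdeTable  => if (convertMetastore) NativeReader else HiveSerdeReader
  }
}
```

Under this model, adding a configuration check to the data source branch would change nothing for native data source tables, which seems to be the reviewer's objection.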


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21758
  
**[Test build #92962 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92962/testReport)**
 for PR 21758 at commit 
[`c25ec47`](https://github.com/apache/spark/commit/c25ec473ff078c071aec513953f56c64e6a228a4).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `trait BarrierTaskContext extends TaskContext `
  * `class BarrierTaskContextImpl(`
  * `class RDDBarrier[T: ClassTag](rdd: RDD[T]) `
  * `case class WorkerOffer(`


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92962/
Test FAILed.


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21652: [SPARK-24551][K8S] Add integration tests for secrets

2018-07-12 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/21652
  
Looks like the integration tests have been failing for the past few runs. 
Otherwise LGTM.


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21758
  
**[Test build #92962 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92962/testReport)**
 for PR 21758 at commit 
[`c25ec47`](https://github.com/apache/spark/commit/c25ec473ff078c071aec513953f56c64e6a228a4).


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21758
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/919/
Test PASSed.


---




[GitHub] spark issue #21758: [SPARK-24795][CORE] Implement barrier execution mode

2018-07-12 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21758
  
cc @mengxr @gatorsmile @cloud-fan 


---




[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the ext...

2018-07-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20795


---




[GitHub] spark pull request #21758: [SPARK-24795][CORE] Implement barrier execution m...

2018-07-12 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/21758

[SPARK-24795][CORE] Implement barrier execution mode

## What changes were proposed in this pull request?

Propose new APIs and modify job/task scheduling to support barrier 
execution mode, which requires all tasks in the same barrier stage to start at 
the same time and retries all tasks if any of them fails midway. The 
barrier execution mode is useful for some ML/DL workloads.

The proposed API changes include:
`RDDBarrier`, which marks an RDD as barrier (Spark must launch all the tasks 
together for the current stage).
`BarrierTaskContext`, which supports global sync of all tasks in a barrier 
stage and provides extra `BarrierTaskInfo`s.

In DAGScheduler, we retry all tasks of a barrier stage if any task fails 
midway. This is achieved by unregistering the map outputs for a shuffleId 
(for a ShuffleMapStage) or by clearing the finished partitions in an active 
job (for a ResultStage).
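
The all-or-nothing retry described above can be illustrated with a toy model in plain Scala. This is only a sketch under stated assumptions, not the DAGScheduler change in this PR: `BarrierRetrySketch`, `Task`, and `runBarrierStage` are hypothetical names, tasks run sequentially rather than being launched together, and a failure is modelled as a thrown exception.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object BarrierRetrySketch {
  // A task receives the attempt number and either returns a result or throws.
  type Task[T] = Int => T

  // Runs every task of the stage. If any task fails, the partial results of
  // that attempt are discarded and ALL tasks are rerun in the next attempt.
  @tailrec
  def runBarrierStage[T](
      tasks: Vector[Task[T]],
      maxAttempts: Int,
      attempt: Int = 0): Option[Vector[T]] =
    if (attempt >= maxAttempts) None
    else Try(tasks.map(task => task(attempt))) match {
      case Success(results) => Some(results)
      case Failure(_)       => runBarrierStage(tasks, maxAttempts, attempt + 1)
    }
}
```

The point of the model is that a task which already succeeded is executed again when a sibling fails, unlike ordinary stages where only the failed task is retried.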

## How was this patch tested?

Add `RDDBarrierSuite` to ensure we convert RDDs correctly;
Add new test cases in `DAGSchedulerSuite` to ensure we do task scheduling 
correctly;
Add new test cases in `SparkContextSuite` to ensure the barrier execution 
mode actually works (both under local mode and local cluster mode).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark barrier-execution-mode

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21758.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21758


commit c25ec473ff078c071aec513953f56c64e6a228a4
Author: Xingbo Jiang 
Date:   2018-07-12T17:38:58Z

implement barrier execution mode.




---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20795
  
Thanks! Merged to master


---




[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-07-12 Thread jiangxb1987
Github user jiangxb1987 commented on the issue:

https://github.com/apache/spark/pull/21698
  
IIUC the output produced by `rdd1.zip(rdd2).map(v => (computeKey(v._1, 
v._2), computeValue(v._1, v._2)))` shall always have the same cardinality, no 
matter how many tasks are retried, so where is the data loss issue?


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21745
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21745
  
**[Test build #92961 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92961/testReport)**
 for PR 21745 at commit 
[`9e00db9`](https://github.com/apache/spark/commit/9e00db938ddc6293899170e19b41530b22fb525a).


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21745
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/918/
Test PASSed.


---




[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21757
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21757
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92960/
Test FAILed.


---




[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21757
  
**[Test build #92960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92960/testReport)**
 for PR 21757 at commit 
[`a5d72cc`](https://github.com/apache/spark/commit/a5d72cc2cc77da7d8fab0cfc4a48959b774c).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...

2018-07-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21745#discussion_r202242420
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -2387,4 +2387,25 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 val mapWithBinaryKey = map(lit(Array[Byte](1.toByte)), lit(1))
 
checkAnswer(spark.range(1).select(mapWithBinaryKey.getItem(Array[Byte](1.toByte))),
 Row(1))
   }
+
+  test("SPARK-24781: Using a reference from Dataset in Filter/Sort might 
not work") {
+val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
+val filter1 = df.select(df("name")).filter(df("id") === 0)
+val filter2 = df.select(col("name")).filter(col("id") === 0)
+checkAnswer(filter1, filter2.collect())
+
+val sort1 = df.select(df("name")).orderBy(df("id"))
+val sort2 = df.select(col("name")).orderBy(col("id"))
+checkAnswer(sort1, sort2.collect())
+
+withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
--- End diff --

Will update it in the next commit.


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21745
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92958/
Test PASSed.


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21745
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21745
  
**[Test build #92958 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92958/testReport)**
 for PR 21745 at commit 
[`a98f416`](https://github.com/apache/spark/commit/a98f4161c682b90755e9599a437241dcaeb388b5).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21556#discussion_r202240358
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -225,12 +316,44 @@ private[parquet] class ParquetFilters(pushDownDate: 
Boolean, pushDownStartWith:
   def createFilter(schema: MessageType, predicate: sources.Filter): 
Option[FilterPredicate] = {
 val nameToType = getFieldMap(schema)
 
+def isDecimalMatched(value: Any, decimalMeta: DecimalMetadata): 
Boolean = value match {
+  case decimal: JBigDecimal =>
+decimal.scale == decimalMeta.getScale
+  case _ => false
+}
+
+// Decimal type must make sure that filter value's scale matched the 
file.
+// If doesn't matched, which would cause data corruption.
+// Other types must make sure that filter value's type matched the 
file.
--- End diff --

I would say something like: the Parquet type in the given file must match 
the type of the value in the pushed filter for the filter to be pushed down 
to Parquet.
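
A small sketch of why the scale check in the diff matters, written in plain Scala without the Parquet classes. It assumes that a decimal column stores unscaled integers and keeps the scale in the file's schema metadata. The `fileScale: Int` parameter is a stand-in for `decimalMeta.getScale`, and `unscaled` is a hypothetical helper added for illustration.

```scala
import java.math.{BigDecimal => JBigDecimal}

object DecimalScaleSketch {
  // Mirrors the check in the diff, with the file's scale passed as a plain Int.
  def isDecimalMatched(value: Any, fileScale: Int): Boolean = value match {
    case decimal: JBigDecimal => decimal.scale == fileScale
    case _                    => false
  }

  // The unscaled integer that would be compared against the stored values.
  def unscaled(value: JBigDecimal): Long = value.unscaledValue.longValueExact()
}
```

For example, `1.5` and `1.50` are numerically equal, but their unscaled values are 15 and 150. Comparing the unscaled 15 against a column written with scale 2 would therefore match the wrong rows, so the filter should only be pushed down when the scales agree.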


---




[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21556#discussion_r202239380
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala
 ---
@@ -225,12 +316,44 @@ private[parquet] class ParquetFilters(pushDownDate: 
Boolean, pushDownStartWith:
   def createFilter(schema: MessageType, predicate: sources.Filter): 
Option[FilterPredicate] = {
 val nameToType = getFieldMap(schema)
 
+def isDecimalMatched(value: Any, decimalMeta: DecimalMetadata): 
Boolean = value match {
+  case decimal: JBigDecimal =>
+decimal.scale == decimalMeta.getScale
+  case _ => false
+}
+
+// Decimal type must make sure that filter value's scale matched the 
file.
--- End diff --

Shall we leave this comment around the decimal `case`s below or around 
`isDecimalMatched`?


---




[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21556
  
@rdblue, so basically you mean that equality comparison and null-safe 
equality comparison are currently pushed down identically, that they should be 
distinguished, and that otherwise there could be a potential problem? If so, 
yup. I agree with it.

I think the optimizer won't actually give us a chance to push down an 
equality comparison or a null-safe equality comparison with an actual `null` 
value. However, sure, I think we shouldn't rely on it. I think we should 
actually disallow one of the two, either null-safe equality comparison or 
equality comparison with `null`, in `ParquetFilters`.

The thing is, I remember checking a long time ago, a few years back, that 
Parquet's equality comparison API itself is actually null-safe internally. 
This of course should be double-checked.

Since this PR doesn't change the existing behaviour here and this looks like 
it needs some more investigation (e.g., checking whether what I remember 
about Parquet's equality comparison is still, or ever was, true), it is 
probably okay to leave it as is.
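
For readers unfamiliar with the two comparisons being discussed, here is a minimal model in plain Scala. It uses `Option` to represent a nullable SQL value, which is an assumption of this sketch and not how Spark or Parquet represent nulls. `sqlEquals` and `nullSafeEquals` are hypothetical names.

```scala
object NullSafeEqualitySketch {
  // SQL `=`: the result is unknown (None) as soon as either side is NULL.
  def sqlEquals[A](left: Option[A], right: Option[A]): Option[Boolean] =
    for (l <- left; r <- right) yield l == r

  // SQL `<=>`: always true or false; NULL <=> NULL is true.
  def nullSafeEquals[A](left: Option[A], right: Option[A]): Boolean =
    left == right
}
```

The two agree whenever both sides are non-null and differ only when a `null` is involved, which is why pushing both down through the same filter is only a concern for comparisons against an actual `null` value.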


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21745
  
LGTM


---




[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21745#discussion_r202236189
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -2387,4 +2387,25 @@ class DataFrameSuite extends QueryTest with 
SharedSQLContext {
 val mapWithBinaryKey = map(lit(Array[Byte](1.toByte)), lit(1))
 
checkAnswer(spark.range(1).select(mapWithBinaryKey.getItem(Array[Byte](1.toByte))),
 Row(1))
   }
+
+  test("SPARK-24781: Using a reference from Dataset in Filter/Sort might 
not work") {
+val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
+val filter1 = df.select(df("name")).filter(df("id") === 0)
+val filter2 = df.select(col("name")).filter(col("id") === 0)
+checkAnswer(filter1, filter2.collect())
+
+val sort1 = df.select(df("name")).orderBy(df("id"))
+val sort2 = df.select(col("name")).orderBy(col("id"))
+checkAnswer(sort1, sort2.collect())
+
+withSQLConf(SQLConf.DATAFRAME_RETAIN_GROUP_COLUMNS.key -> "false") {
--- End diff --

This test case should be split into two. 


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20795
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92954/
Test PASSed.


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20795
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20795
  
**[Test build #92954 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92954/testReport)**
 for PR 20795 at commit 
[`26f2f54`](https://github.com/apache/spark/commit/26f2f540d30f2e87405489513220468e7708742b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21645: [SPARK-24537][R]Add array_remove / array_zip / map_from_...

2018-07-12 Thread huaxingao
Github user huaxingao commented on the issue:

https://github.com/apache/spark/pull/21645
  
Thanks! @HyukjinKwon @felixcheung 


---




[GitHub] spark issue #21690: [SPARK-24713]AppMatser of spark streaming kafka OOM if t...

2018-07-12 Thread yuanboliu
Github user yuanboliu commented on the issue:

https://github.com/apache/spark/pull/21690
  
After applying this patch, my application runs successfully. This 
issue can happen when many topics (hundreds) are consumed.


---




[GitHub] spark issue #21447: [SPARK-24339][SQL]Add project for transform/map/reduce s...

2018-07-12 Thread xdcjie
Github user xdcjie commented on the issue:

https://github.com/apache/spark/pull/21447
  
retest this please


---




[GitHub] spark issue #20795: [SPARK-23486]cache the function name from the external c...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/20795
  
LGTM


---




[GitHub] spark pull request #20795: [SPARK-23486]cache the function name from the ext...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20795#discussion_r202231590
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -1204,16 +1207,46 @@ class Analyzer(
* only performs simple existence check according to the function 
identifier to quickly identify
* undefined functions without triggering relation resolution, which may 
incur potentially
* expensive partition/schema discovery process in some cases.
-   *
+   * In order to avoid duplicate external functions lookup, the external 
function identifier will
+   * store in the local hash set externalFunctionNameSet.
* @see [[ResolveFunctions]]
* @see https://issues.apache.org/jira/browse/SPARK-19737
*/
   object LookupFunctions extends Rule[LogicalPlan] {
-override def apply(plan: LogicalPlan): LogicalPlan = 
plan.transformAllExpressions {
-  case f: UnresolvedFunction if !catalog.functionExists(f.name) =>
-withPosition(f) {
-  throw new 
NoSuchFunctionException(f.name.database.getOrElse("default"), f.name.funcName)
-}
+override def apply(plan: LogicalPlan): LogicalPlan = {
+  val externalFunctionNameSet = new 
mutable.HashSet[FunctionIdentifier]()
+  plan.transformAllExpressions {
+case f: UnresolvedFunction
+  if externalFunctionNameSet.contains(normalizeFuncName(f.name)) 
=> f
+case f: UnresolvedFunction if catalog.isRegisteredFunction(f.name) 
=> f
+case f: UnresolvedFunction if catalog.isPersistentFunction(f.name) 
=>
+  externalFunctionNameSet.add(normalizeFuncName(f.name))
+  f
+case f: UnresolvedFunction =>
+  withPosition(f) {
+throw new 
NoSuchFunctionException(f.name.database.getOrElse(catalog.getCurrentDatabase),
+  f.name.funcName)
+  }
+  }
+}
+
+def normalizeFuncName(name: FunctionIdentifier): FunctionIdentifier = {
--- End diff --

This is a common utility function. We can refactor the code later. 
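
The caching idea in the diff above can be sketched in plain Scala as follows. This is a simplified model and not the Analyzer rule itself: `CountingCatalog`, `normalize`, and `unknownFunctions` are hypothetical names, function identifiers are plain strings, and unknown names are collected into a result instead of raising `NoSuchFunctionException`.

```scala
import scala.collection.mutable

object LookupCacheSketch {
  // Hypothetical stand-in for an external catalog; counts how often it is queried.
  final class CountingCatalog(persistent: Set[String]) {
    var lookups: Int = 0
    def isPersistentFunction(name: String): Boolean = {
      lookups += 1
      persistent.contains(name)
    }
  }

  // Case-insensitive normalization so that equal names share one cache entry.
  def normalize(name: String): String = name.toLowerCase(java.util.Locale.ROOT)

  // Returns the names that do not exist. A name found once in the catalog is
  // remembered in a local set, so later occurrences skip the external lookup.
  def unknownFunctions(names: Seq[String], catalog: CountingCatalog): Seq[String] = {
    val seen = mutable.HashSet[String]()
    names.filterNot { raw =>
      val name = normalize(raw)
      seen.contains(name) || {
        val exists = catalog.isPersistentFunction(name)
        if (exists) seen += name
        exists
      }
    }
  }
}
```

As in the diff, the set lives only for the duration of one pass, so it avoids repeated external lookups within a single plan without introducing a long-lived cache that could go stale.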


---




[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21757
  
**[Test build #92960 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92960/testReport)**
 for PR 21757 at commit 
[`a5d72cc`](https://github.com/apache/spark/commit/a5d72cc2cc77da7d8fab0cfc4a48959b774c).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21757
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/917/
Test PASSed.


---




[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21757
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #21757: [SQL][SPARK-24797] respect spark.sql.hive.convert...

2018-07-12 Thread CodingCat
GitHub user CodingCat opened a pull request:

https://github.com/apache/spark/pull/21757

[SQL][SPARK-24797] respect spark.sql.hive.convertMetastoreOrc/Parquet when 
build…


## What changes were proposed in this pull request?

The current code path ignores the value of 
spark.sql.hive.convertMetastoreParquet when building the data source table:

https://github.com/apache/spark/blob/e0559f238009e02c40f65678fec691c07904e8c0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L263

As a result, even when I turn off spark.sql.hive.convertMetastoreParquet, 
Spark SQL still uses its own Parquet reader to access the table instead of 
delegating to the Hive serde.

This PR checks the value of the configuration when building the data source 
table.

## How was this patch tested?

existing test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/CodingCat/spark SPARK-24797

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21757.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21757


commit a5d72cc2cc77da7d8fab0cfc4a48959b774c
Author: Nan Zhu 
Date:   2018-07-13T02:44:25Z

respect respect spark.sql.hive.convertMetastoreOrc/Parquet when build the 
data source table




---




[GitHub] spark issue #21757: [SPARK-24797] [SQL] respect spark.sql.hive.convertMetast...

2018-07-12 Thread CodingCat
Github user CodingCat commented on the issue:

https://github.com/apache/spark/pull/21757
  
@felixcheung 


---




[GitHub] spark pull request #21645: [SPARK-24537][R]Add array_remove / array_zip / ma...

2018-07-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21645


---




[GitHub] spark issue #21645: [SPARK-24537][R]Add array_remove / array_zip / map_from_...

2018-07-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/21645
  
Merged to master.


---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21748
  
**[Test build #92959 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92959/testReport)**
 for PR 21748 at commit 
[`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21748
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92959/
Test PASSed.


---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21748
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21748
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/916/
Test FAILed.


---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21748
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/916/



---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21748
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21748
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/916/



---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21748
  
**[Test build #92959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92959/testReport)**
 for PR 21748 at commit 
[`88a9d7f`](https://github.com/apache/spark/commit/88a9d7fa94e17e55f8e28d8922cff759625b1e42).


---




[GitHub] spark issue #21748: [SPARK-23146][K8S] Support client mode.

2018-07-12 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21748
  
test this please


---




[GitHub] spark pull request #21263: [SPARK-24084][ThriftServer] Add job group id for ...

2018-07-12 Thread caneGuy
Github user caneGuy closed the pull request at:

https://github.com/apache/spark/pull/21263


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21745
  
LGTM


---




[GitHub] spark issue #21698: [SPARK-23243][Core] Fix RDD.repartition() data correctne...

2018-07-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/21698
  
> Given this, there is no ambiguity in cardinality of zip().map() ... which 
two tuples from rdd1 and rdd2 get zip'ed together can be arbitrary : and I 
agree about that.

Yes, but the subsequent `.groupByKey().map()` does have ambiguity in 
cardinality, because which tuples get zipped together can be arbitrary, doesn't it?
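
A minimal sketch of the point above, using plain Scala collections rather than RDDs. The key function (`a + b`) and the object and method names are assumptions made for illustration only; they are not taken from the PR. The zipped output always has the same number of elements, but when the grouping key depends on which elements were paired, the number of groups can change with the pairing order.

```scala
object ZipCardinalitySketch {
  // Stand-in for rdd1.zip(rdd2).map(...).groupByKey(): key each zipped pair
  // by a hypothetical function of both sides (a + b), then count the groups.
  def groupCount(left: Seq[Int], right: Seq[Int]): Int =
    left.zip(right)
      .map { case (a, b) => (a + b, (a, b)) }
      .groupBy(_._1)
      .size

  val left: Seq[Int] = Seq(1, 2)
  // Two possible orderings of the same right-hand data, as might result from
  // a non-deterministic upstream shuffle.
  val orderA: Seq[Int] = Seq(1, 2)
  val orderB: Seq[Int] = Seq(2, 1)
}
```

Both orderings zip to two pairs, yet `groupCount(left, orderA)` sees keys 2 and 4 while `groupCount(left, orderB)` sees key 3 twice, so the grouped result has a different cardinality.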


---




[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21608
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92949/
Test PASSed.


---




[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21608
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21608: [SPARK-24626] [SQL] Improve location size calculation in...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21608
  
**[Test build #92949 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92949/testReport)**
 for PR 21608 at commit 
[`98ee81b`](https://github.com/apache/spark/commit/98ee81ba8581e57ff0bc098d0b05254cf72adada).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21745
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/915/
Test PASSed.


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21745
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21745: [SPARK-24781][SQL] Using a reference from Dataset in Fil...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21745
  
**[Test build #92958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92958/testReport)**
 for PR 21745 at commit 
[`a98f416`](https://github.com/apache/spark/commit/a98f4161c682b90755e9599a437241dcaeb388b5).


---




[GitHub] spark pull request #21745: [SPARK-24781][SQL] Using a reference from Dataset...

2018-07-12 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21745#discussion_r202217276
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -1165,15 +1173,19 @@ class Analyzer(
 (newExprs, AnalysisBarrier(newChild))
 
   case p: Project =>
+// Resolving expressions against current plan.
 val maybeResolvedExprs = exprs.map(resolveExpression(_, p))
+// Recursively resolving expressions on the child of current plan.
 val (newExprs, newChild) = resolveExprsAndAddMissingAttrs(maybeResolvedExprs, p.child)
-val missingAttrs = AttributeSet(newExprs) -- AttributeSet(maybeResolvedExprs)
+// If some attributes used by expressions are resolvable only on the rewritten child
+// plan, we need to add them into original projection.
+val missingAttrs = (AttributeSet(newExprs) -- p.outputSet).intersect(newChild.outputSet)
--- End diff --

Without this `intersect`, some tests fail, e.g.,  `group-analytics.sql` in 
`SQLQueryTestSuite`. Some attributes are resolved on parent plans, not on child 
plans. We can't add them as missing attributes here.
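
The effect of the `intersect` can be sketched with plain Scala sets standing in for `AttributeSet`. The attribute names (`a`, `b`, `g`) and the object name are hypothetical, chosen only to illustrate the case described above where an attribute is resolved on a parent plan and is therefore absent from the child's output.

```scala
object MissingAttrsSketch {
  // Mirrors (AttributeSet(newExprs) -- p.outputSet).intersect(newChild.outputSet)
  // with string names in place of attributes.
  def missingAttrs(
      used: Set[String],
      projectOutput: Set[String],
      childOutput: Set[String]): Set[String] =
    (used -- projectOutput).intersect(childOutput)

  // The same computation without the final intersect, for comparison.
  def withoutIntersect(used: Set[String], projectOutput: Set[String]): Set[String] =
    used -- projectOutput

  val used: Set[String] = Set("a", "b", "g")     // attributes referenced by the expressions
  val projectOutput: Set[String] = Set("a")      // already produced by the Project
  val childOutput: Set[String] = Set("a", "b")   // "g" is only resolvable on a parent plan
}
```

With these inputs, `missingAttrs` yields only `b`, whereas the variant without the intersect would also try to add `g`, which the child plan cannot supply.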


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92957/
Test PASSed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
**[Test build #92957 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92957/testReport)**
 for PR 21583 at commit 
[`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/914/
Test FAILed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/914/



---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/914/



---




[GitHub] spark pull request #21556: [SPARK-24549][SQL] Support Decimal type push down...

2018-07-12 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/21556#discussion_r202214356
  
--- Diff: sql/core/benchmarks/FilterPushdownBenchmark-results.txt ---
@@ -292,120 +292,120 @@ Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
 
 Select 1 decimal(9, 2) row (value = 7864320): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-Parquet Vectorized                       3785 / 3867          4.2         240.6       1.0X
-Parquet Vectorized (Pushdown)            3820 / 3928          4.1         242.9       1.0X
-Native ORC Vectorized                    3981 / 4049          4.0         253.1       1.0X
-Native ORC Vectorized (Pushdown)          702 /  735         22.4          44.6       5.4X
+Parquet Vectorized                       4407 / 4852          3.6         280.2       1.0X
+Parquet Vectorized (Pushdown)            1602 / 1634          9.8         101.8       2.8X
--- End diff --

Here is a test:
```scala
// The maximum value of decimal(9, 2) is 9999999.99.
// 1024 * 1024 * 15 = 15728640
val path = "/tmp/spark/parquet"
spark.range(1024 * 1024 * 15).selectExpr("cast((id) as decimal(9, 2)) as id").orderBy("id").write.mode("overwrite").parquet(path)
```
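
A quick arithmetic check on the test above, as a sketch rather than part of the original comment. It assumes that casting an out-of-range `id` to `decimal(9, 2)` yields null (Spark's default, non-ANSI behavior); the object and helper names are hypothetical.

```scala
object DecimalRangeSketch {
  // Largest whole number representable in decimal(precision, scale):
  // precision - scale integer digits, all nines.
  def maxWholeValue(precision: Int, scale: Int): Long =
    BigInt(10).pow(precision - scale).toLong - 1

  val totalRows: Long = 1024L * 1024L * 15L        // rows produced by spark.range
  val maxId: Long = maxWholeValue(9, 2)            // largest id that still fits
  val fittingRows: Long = math.min(totalRows, maxId + 1)
  val overflowRows: Long = totalRows - fittingRows // ids assumed to become null
}
```

Under that assumption, 10000000 ids fit and 5728640 overflow, which is consistent with the row group of 5728640 values that carries no min/max statistics in the metadata.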
The generated parquet metadata:
```shell
$ java -jar ./parquet-tools/target/parquet-tools-1.10.1-SNAPSHOT.jar meta  
/tmp/spark/parquet
file:
file:/tmp/spark/parquet/part-0-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
 
creator: parquet-mr version 1.10.0 (build 
031a6654009e3b82020012a18434c582bd74c73a) 
extra:   org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
 

file schema: spark_schema 


id:  OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:5728640 TS:36 OFFSET:4 


id:   INT32 SNAPPY DO:0 FPO:4 SZ:38/36/0.95 VC:5728640 
ENC:PLAIN,BIT_PACKED,RLE ST:[no stats for this column]
file:
file:/tmp/spark/parquet/part-1-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
 
creator: parquet-mr version 1.10.0 (build 
031a6654009e3b82020012a18434c582bd74c73a) 
extra:   org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
 

file schema: spark_schema 


id:  OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:651016 TS:2604209 OFFSET:4 


id:   INT32 SNAPPY DO:0 FPO:4 SZ:2604325/2604209/1.00 VC:651016 
ENC:PLAIN,BIT_PACKED,RLE ST:[min: 0.00, max: 651015.00, num_nulls: 0]
file:
file:/tmp/spark/parquet/part-2-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
 
creator: parquet-mr version 1.10.0 (build 
031a6654009e3b82020012a18434c582bd74c73a) 
extra:   org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
 

file schema: spark_schema 


id:  OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:3231146 TS:12925219 OFFSET:4 


id:   INT32 SNAPPY DO:0 FPO:4 SZ:12925864/12925219/1.00 VC:3231146 
ENC:PLAIN,BIT_PACKED,RLE ST:[min: 651016.00, max: 3882161.00, num_nulls: 0]
file:
file:/tmp/spark/parquet/part-3-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
 
creator: parquet-mr version 1.10.0 (build 
031a6654009e3b82020012a18434c582bd74c73a) 
extra:   org.apache.spark.sql.parquet.row.metadata = 
{"type":"struct","fields":[{"name":"id","type":"decimal(9,2)","nullable":true,"metadata":{}}]}
 

file schema: spark_schema 


id:  OPTIONAL INT32 O:DECIMAL R:0 D:1

row group 1: RC:2887956 TS:11552408 OFFSET:4 


id:   INT32 SNAPPY DO:0 FPO:4 SZ:11552986/11552408/1.00 VC:2887956 
ENC:PLAIN,BIT_PACKED,RLE ST:[min: 3882162.00, max: 6770117.00, num_nulls: 0]
file:
file:/tmp/spark/parquet/part-4-26b38556-494a-4b89-923e-69ea73365488-c000.snappy.parquet
 
creator: 

[GitHub] spark issue #21750: [SPARK-24754][ML] Minhash integer overflow

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21750
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21750: [SPARK-24754][ML] Minhash integer overflow

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21750
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92953/
Test PASSed.


---




[GitHub] spark issue #21750: [SPARK-24754][ML] Minhash integer overflow

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21750
  
**[Test build #92953 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92953/testReport)**
 for PR 21750 at commit 
[`55f70ee`](https://github.com/apache/spark/commit/55f70ee3ee146a41c6f89121c2544959302cd79d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
**[Test build #92956 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92956/testReport)**
 for PR 21583 at commit 
[`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
**[Test build #92957 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92957/testReport)**
 for PR 21583 at commit 
[`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92956/
Test PASSed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21583
  
test this please


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/913/
Test FAILed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/913/



---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/913/



---




[GitHub] spark pull request #21753: [SPARK-24790][SQL] Allow complex aggregate expres...

2018-07-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/21753


---




[GitHub] spark pull request #21753: [SPARK-24790][SQL] Allow complex aggregate expres...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/21753#discussion_r202211733
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -586,12 +581,17 @@ class Analyzer(
 }
 }
 
-private def isAggregateExpression(expr: Expression): Boolean = {
-  expr match {
-case Alias(e, _) => isAggregateExpression(e)
-case AggregateExpression(_, _, _, _) => true
-case _ => false
-  }
+// Support any aggregate expression that can appear in an Aggregate plan except Pandas UDF.
+// TODO: Support Pandas UDF.
+private def checkValidAggregateExpression(expr: Expression): Unit = expr match {
+  case _: AggregateExpression => // OK and leave the argument check to CheckAnalysis.
+  case expr: PythonUDF if PythonUDF.isGroupedAggPandasUDF(expr) =>

I created a JIRA to track this support: 
https://issues.apache.org/jira/browse/SPARK-24796 


---




[GitHub] spark issue #21753: [SPARK-24790][SQL] Allow complex aggregate expressions i...

2018-07-12 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21753
  
LGTM

Thanks! Merged to master.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
**[Test build #92956 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92956/testReport)**
 for PR 21583 at commit 
[`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread shaneknapp
Github user shaneknapp commented on the issue:

https://github.com/apache/spark/pull/21583
  
test this please


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
**[Test build #92955 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92955/testReport)**
 for PR 21583 at commit 
[`c0b5927`](https://github.com/apache/spark/commit/c0b5927ec80853403a129e15fded372a9170a0db).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92955/
Test PASSed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
Kubernetes integration test status failure
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/912/



---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/912/
Test FAILed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21583
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21583: [SPARK-23984][K8S][Test] Added Integration Tests for PyS...

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21583
  
Kubernetes integration test starting
URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/912/



---




[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21102
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21102
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92947/
Test FAILed.


---




[GitHub] spark issue #21102: [SPARK-23913][SQL] Add array_intersect function

2018-07-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21102
  
**[Test build #92947 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92947/testReport)**
 for PR 21102 at commit 
[`7d789e2`](https://github.com/apache/spark/commit/7d789e221dd6c6d4d7176dcec87a867ec5386a60).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21163
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21556: [SPARK-24549][SQL] Support Decimal type push down to the...

2018-07-12 Thread rdblue
Github user rdblue commented on the issue:

https://github.com/apache/spark/pull/21556
  
@wangyum, can you explain what was happening with the `decimal(9,2)` 
benchmark more clearly? I asked additional questions, but the thread is on a 
line that changed so it's collapsed by default.

Also, `valueCanMakeFilterOn` returns true for all null values, so I think 
we still have a problem there. Conversion from EqualNullSafe needs to support 
null filter values.
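
To make the null-handling concern concrete, here is a minimal sketch of the semantic difference between ordinary and null-safe equality, with `None` standing in for SQL NULL. It is plain Scala with hypothetical names, not Spark's or Parquet's API, and only illustrates why a null filter value is meaningful for `EqualNullSafe`.

```scala
object NullSafeEqualitySketch {
  // Ordinary SQL equality: the result is unknown (None) if either side is NULL.
  def equalTo(column: Option[Int], literal: Option[Int]): Option[Boolean] =
    for (c <- column; l <- literal) yield c == l

  // Null-safe equality (<=>): always true or false; NULL <=> NULL is true.
  def equalNullSafe(column: Option[Int], literal: Option[Int]): Boolean =
    column == literal
}
```

A filter such as `col <=> null` therefore selects exactly the null rows, so a pushed-down conversion has to accept a null literal instead of rejecting it.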


---




[GitHub] spark issue #21163: [SPARK-24097][ML] Instrumentation improvements - RandomF...

2018-07-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21163
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92950/
Test PASSed.


---



