[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99883/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23142
  
**[Test build #99883 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99883/testReport)**
 for PR 23142 at commit 
[`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240026727
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
 ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with 
SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-// Assume the execution plan is
-// WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-val ds = spark.range(10).sort('id)
-testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+// Assume the execution plan with node id is
+// Sort(nodeId = 0)
+//   Exchange(nodeId = 1)
+// LocalTableScan(nodeId = 2)
+val df = Seq(1, 3, 2).toDF("id").sort('id)
+testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

can we check the metrics of `SortExec` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23249: [SPARK-26297][SQL] improve the doc of Distributio...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23249#discussion_r240026485
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
 ---
@@ -118,10 +115,12 @@ case class HashClusteredDistribution(
 
 /**
  * Represents data where tuples have been ordered according to the 
`ordering`
- * [[Expression Expressions]].  This is a strictly stronger guarantee than
- * [[ClusteredDistribution]] as an ordering will ensure that tuples that 
share the
- * same value for the ordering expressions are contiguous and will never 
be split across
- * partitions.
+ * [[Expression Expressions]]. Its requirement is defined as the following:
+ *   - Given any 2 adjacent partitions, all the rows of the second 
partition must be larger than or
+ * equal to any row in the first partition, according to the 
`ordering` expressions.
--- End diff --

Note that, only sort requires `OrderedDistribution`, and global sort 
doesn't care if there are equal-rows across partitions.

Here is a definition of the requirement. When designing protocols, it's 
important to make the requirement as weak as possible, and make guarantees as 
strong as possible.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partitioned t...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23255
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23255: [SPARK-26307] [SQL] Fix CTAS when INSERT a partit...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23255#discussion_r240026441
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/InsertSuite.scala ---
@@ -752,6 +752,17 @@ class InsertSuite extends QueryTest with 
TestHiveSingleton with BeforeAndAfter
 }
   }
 
+  test("CTAS: INSERT a partitioned table using Hive serde") {
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23262: [SPARK-26312][SQL]Converting converters in RDDCon...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23262#discussion_r240026394
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -53,7 +53,7 @@ object RDDConversions {
 data.mapPartitions { iterator =>
   val numColumns = outputTypes.length
   val mutableRow = new GenericInternalRow(numColumns)
-  val converters = 
outputTypes.map(CatalystTypeConverters.createToCatalystConverter)
+  val converters = 
outputTypes.map(CatalystTypeConverters.createToCatalystConverter).toArray
--- End diff --

shall we use `RowEncoder` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23262: [SPARK-26312][SQL]Converting converters in RDDCon...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23262#discussion_r240026388
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -33,7 +33,7 @@ object RDDConversions {
 data.mapPartitions { iterator =>
   val numColumns = outputTypes.length
   val mutableRow = new GenericInternalRow(numColumns)
-  val converters = 
outputTypes.map(CatalystTypeConverters.createToCatalystConverter)
+  val converters = 
outputTypes.map(CatalystTypeConverters.createToCatalystConverter).toArray
--- End diff --

shall we use `ExpressionEncoder` here?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23248: [SPARK-26293][SQL] Cast exception when having pyt...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23248#discussion_r240026330
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala
 ---
@@ -131,8 +131,20 @@ object ExtractPythonUDFs extends Rule[LogicalPlan] 
with PredicateHelper {
 expressions.flatMap(collectEvaluableUDFs)
   }
 
-  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-case plan: LogicalPlan => extract(plan)
+  def apply(plan: LogicalPlan): LogicalPlan = plan match {
+// SPARK-26293: A subquery will be rewritten into join later, and will 
go through this rule
+// eventually. Here we skip subquery, as Python UDF only needs to be 
extracted once.
+case _: Subquery => plan
--- End diff --

I think you have a point here. If subquery will be converted to join, why 
do we need to optimize subquery ahead?

Anyway, that's something we need to discuss later. cc @dilipbiswal for the 
subquery question.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23253#discussion_r240026245
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -35,7 +35,9 @@ displayTitle: Spark SQL Upgrading Guide
 
   - Since Spark 3.0, CSV datasource uses java.time API for parsing and 
generating CSV content. New formatting implementation supports date/timestamp 
patterns conformed to ISO 8601. To switch back to the implementation used in 
Spark 2.4 and earlier, set `spark.sql.legacy.timeParser.enabled` to `true`.
 
-  - In Spark version 2.4 and earlier, CSV datasource converts a malformed 
CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, 
returned row can contain non-`null` fields if some of CSV column values were 
parsed and converted to desired types successfully.
+  - In Spark version 2.4 and earlier, CSV datasource converts a malformed 
CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, 
the returned row can contain non-`null` fields if some of CSV column values 
were parsed and converted to desired types successfully.
+
+  - In Spark version 2.4 and earlier, JSON datasource and JSON functions 
like `from_json` convert a bad JSON record to a row with all `null`s in the 
PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, the 
returned row can contain non-`null` fields if some of JSON column values were 
parsed and converted to desired types successfully.
--- End diff --

does `from_csv` support it?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23253#discussion_r240026237
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala
 ---
@@ -229,6 +229,11 @@ private[json] trait TestJsonData {
   """{"date": "27/10/2014 18:30"}""" ::
   """{"date": "28/01/2016 20:00"}""" :: Nil))(Encoders.STRING)
 
+  def badRecords: Dataset[String] =
--- End diff --

if it's only used in one test, let's move it to that test


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23265
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5900/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23265
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23204
  
+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23265
  
**[Test build #99884 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99884/testReport)**
 for PR 23265 at commit 
[`6a837c0`](https://github.com/apache/spark/commit/6a837c019eaf7bc9907715a54778bfbb339f3342).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with ...

2018-12-08 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/23265

[2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and -0.0 in 
UnsafeWriter

backport https://github.com/apache/spark/pull/23239 to 2.4

-

## What changes were proposed in this pull request?

A followup of https://github.com/apache/spark/pull/23043

There are 4 places we need to deal with NaN and -0.0:
1. comparison expressions. `-0.0` and `0.0` should be treated as same. 
Different NaNs should be treated as same.
2. Join keys. `-0.0` and `0.0` should be treated as same. Different NaNs 
should be treated as same.
3. grouping keys. `-0.0` and `0.0` should be assigned to the same group. 
Different NaNs should be assigned to the same group.
4. window partition keys. `-0.0` and `0.0` should be treated as same. 
Different NaNs should be treated as same.

The case 1 is OK. Our comparison already handles NaN and -0.0, and for 
struct/array/map, we will recursively compare the fields/elements.

Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary 
directly, and different NaNs have different binary representation, and the same 
thing happens for -0.0 and 0.0.

To fix it, a simple solution is: normalize float/double when building 
unsafe data (`UnsafeRow`, `UnsafeArrayData`, `UnsafeMapData`). Then we don't 
need to worry about it anymore.

Following this direction, this PR moves the handling of NaN and -0.0 from 
`Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not 
handle them, which reduces the perf overhead. It's also easier to add comments 
explaining why we do it in `UnsafeWriter`.

## How was this patch tested?

existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark minor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23265


commit 6a837c019eaf7bc9907715a54778bfbb339f3342
Author: Wenchen Fan 
Date:   2018-12-08T19:18:09Z

[SPARK-26021][SQL][FOLLOWUP] only deal with NaN and -0.0 in UnsafeWriter

A followup of https://github.com/apache/spark/pull/23043

There are 4 places we need to deal with NaN and -0.0:
1. comparison expressions. `-0.0` and `0.0` should be treated as same. 
Different NaNs should be treated as same.
2. Join keys. `-0.0` and `0.0` should be treated as same. Different NaNs 
should be treated as same.
3. grouping keys. `-0.0` and `0.0` should be assigned to the same group. 
Different NaNs should be assigned to the same group.
4. window partition keys. `-0.0` and `0.0` should be treated as same. 
Different NaNs should be treated as same.

The case 1 is OK. Our comparison already handles NaN and -0.0, and for 
struct/array/map, we will recursively compare the fields/elements.

Case 2, 3 and 4 are problematic, as they compare `UnsafeRow` binary 
directly, and different NaNs have different binary representation, and the same 
thing happens for -0.0 and 0.0.

To fix it, a simple solution is: normalize float/double when building 
unsafe data (`UnsafeRow`, `UnsafeArrayData`, `UnsafeMapData`). Then we don't 
need to worry about it anymore.

Following this direction, this PR moves the handling of NaN and -0.0 from 
`Platform` to `UnsafeWriter`, so that places like `UnsafeRow.setFloat` will not 
handle them, which reduces the perf overhead. It's also easier to add comments 
explaining why we do it in `UnsafeWriter`.

existing tests

Closes #23239 from cloud-fan/minor.

Authored-by: Wenchen Fan 
Signed-off-by: Dongjoon Hyun 




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23265: [2.4][SPARK-26021][SQL][FOLLOWUP] only deal with NaN and...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23265
  
cc @dongjoon-hyun 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23258: [SPARK-23375][SQL][FOLLOWUP][TEST] Test Sort metr...

2018-12-08 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/23258#discussion_r240024723
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala
 ---
@@ -182,10 +182,13 @@ class SQLMetricsSuite extends SparkFunSuite with 
SQLMetricsTestUtils with Shared
   }
 
   test("Sort metrics") {
-// Assume the execution plan is
-// WholeStageCodegen(nodeId = 0, Range(nodeId = 2) -> Sort(nodeId = 1))
-val ds = spark.range(10).sort('id)
-testSparkPlanMetrics(ds.toDF(), 2, Map.empty)
+// Assume the execution plan with node id is
+// Sort(nodeId = 0)
+//   Exchange(nodeId = 1)
+// LocalTableScan(nodeId = 2)
+val df = Seq(1, 3, 2).toDF("id").sort('id)
+testSparkPlanMetrics(df, 2, Map.empty)
--- End diff --

@mgaido91 This case tries to check `Sort` (nodeId=0) metrics, rather than 
`LocalTableScan`. The second parameter (`2`) of `testSparkPlanMetrics(df, 2, 
Map.empty)` means `expectedNumOfJobs` rather than `nodeId`. The third parameter 
`expectedMetrics` will pass `nodeId` together with corresponding expected 
metrics. Because metrics of Sort node (including `sortTime`, `peakMemory`, 
`spillSize`) may change during each execution, unlike metrics like 
`numOutputRows`, we have no way to check these values.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5899/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23204
  
@cloud-fan and @JkSelf .
For the partial revert, we had better create a new Apache JIRA issue. That 
will be a more cleaner way to backport.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23142
  
**[Test build #99883 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99883/testReport)**
 for PR 23142 at commit 
[`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23142
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23142
  
**[Test build #99882 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99882/testReport)**
 for PR 23142 at commit 
[`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99882/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration doc for ...

2018-12-08 Thread seancxmao
Github user seancxmao commented on the issue:

https://github.com/apache/spark/pull/23238
  
Thank you! @dongjoon-hyun 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22683: [SPARK-25696] The storage memory displayed on spark Appl...

2018-12-08 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22683
  
Fortunately the syntax is "100m", which has always meant "100 * 1024 * 
1024" or "100 MiB"


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread JkSelf
Github user JkSelf commented on the issue:

https://github.com/apache/spark/pull/23204
  
@cloud-fan  ok, i will revert as your comments later.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread JkSelf
Github user JkSelf commented on the issue:

https://github.com/apache/spark/pull/23204
  
The result of all queries in tpcds with 1TB data scale is in [tpcds 
result](https://docs.google.com/spreadsheets/d/18a5BdOlmm8euTaRodyeWum9yu92mbWWu6JbhGXtr7yE/edit#gid=0)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23252: [SPARK-26239] File-based secret key loading for S...

2018-12-08 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/23252#discussion_r240022921
  
--- Diff: core/src/test/scala/org/apache/spark/SecurityManagerSuite.scala 
---
@@ -440,12 +473,27 @@ class SecurityManagerSuite extends SparkFunSuite with 
ResetSystemProperties {
 intercept[IllegalArgumentException] {
   mgr.getSecretKey()
 }
+  case FILE =>
+val secretFile = createTempSecretFile()
+conf.set(AUTH_SECRET_FILE, secretFile.getAbsolutePath)
+mgr.initializeAuth()
+assert(encodeFileAsBase64(secretFile) === 
mgr.getSecretKey())
 }
   }
 }
   )
 }
   }
 
+  private def encodeFileAsBase64(secretFile: File) = {
+Base64.getEncoder.encodeToString(Files.readAllBytes(secretFile.toPath))
+  }
+
+  private def createTempSecretFile(contents: String = "test-secret"): File 
= {
+val secretDir = Utils.createTempDir("temp-secrets")
+val secretFile = new File(secretDir, "temp-secret.txt")
+Files.write(secretFile.toPath, 
contents.getBytes(StandardCharsets.UTF_8))
+secretFile
--- End diff --

ah it's fine


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23224: [SPARK-26277][SQL][TEST] WholeStageCodegen metrics shoul...

2018-12-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/23224
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23256: [SPARK-24207][R] follow-up PR for SPARK-24207 to fix cod...

2018-12-08 Thread felixcheung
Github user felixcheung commented on the issue:

https://github.com/apache/spark/pull/23256
  
ideally, but really not for this PR


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23201: [SPARK-26246][SQL] Infer date and timestamp types...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23201#discussion_r240022552
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala
 ---
@@ -121,7 +122,26 @@ private[sql] class JsonInferSchema(options: 
JSONOptions) extends Serializable {
 DecimalType(bigDecimal.precision, bigDecimal.scale)
 }
 decimalTry.getOrElse(StringType)
-  case VALUE_STRING => StringType
+  case VALUE_STRING =>
+val stringValue = parser.getText
--- End diff --

If we switch the order here, we don't need the length check 
[here](https://github.com/apache/spark/pull/23201/files#diff-e925de14239f40430d05f9ffd0360f10R130),
 right?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22088: [SPARK-24931][CORE]CoarseGrainedExecutorBackend send wro...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22088
  
Can one of the admins verify this patch?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23207: [SPARK-26193][SQL] Implement shuffle write metric...

2018-12-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23207


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r240022467
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala 
---
@@ -222,4 +222,61 @@ class HiveParquetSourceSuite extends 
ParquetPartitioningTest {
   assert(df4.columns === Array("str", "max_int"))
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
--- End diff --

Please fix this first for the first and second review comments.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23207: [SPARK-26193][SQL] Implement shuffle write metrics in SQ...

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23207
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/23204
  
according to 
https://github.com/apache/spark/pull/23214#issuecomment-443999282 , the hash 
join metrics is wrongly implemented. I think it's fine to revert it and 
re-implement it later.

@JkSelf can you address the comments and only revert the hash join part? 
thanks!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23142
  
**[Test build #99882 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99882/testReport)**
 for PR 23142 at commit 
[`56f39cc`](https://github.com/apache/spark/commit/56f39cc5838c3f609c8657639ac3a45991fde99f).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5898/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23142
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23142: [SPARK-26170][SS] Add missing metrics in FlatMapGroupsWi...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23142
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23204: Revert "[SPARK-21052][SQL] Add hash map metrics to join"

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23204
  
Hi, @LuciferYang . If we are not going to revert this, could you close this 
PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration d...

2018-12-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23238


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration doc for ...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23238
  
Thanks, @seancxmao .


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23238: [SPARK-25132][SQL][FOLLOWUP][DOC] Add migration doc for ...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23238
  
Merged to master and branch-2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23253
  
cc @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23257
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99881/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23257
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23257
  
**[Test build #99881 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99881/testReport)**
 for PR 23257 at commit 
[`9ee62ac`](https://github.com/apache/spark/commit/9ee62acc50bd5fb72a84a620e0fe50e27f7df515).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23259
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23259
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99880/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23259
  
**[Test build #99880 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99880/testReport)**
 for PR 23259 at commit 
[`01bc383`](https://github.com/apache/spark/commit/01bc38347496a6194b46ace0feb7d2cd1adb614e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23253
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99879/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23253
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23253
  
**[Test build #99879 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99879/testReport)**
 for PR 23253 at commit 
[`fc088df`](https://github.com/apache/spark/commit/fc088df2e98133e11d8fe60f899d3212e4cbee35).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23242
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99877/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23242
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23242
  
**[Test build #99877 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99877/testReport)**
 for PR 23242 at commit 
[`de531c8`](https://github.com/apache/spark/commit/de531c81f3274145a1de839c80d93e3c37264da0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99878/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23169
  
**[Test build #99878 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99878/testReport)**
 for PR 23169 at commit 
[`855f540`](https://github.com/apache/spark/commit/855f5404c53eba51ed373fa9e7be4eaafd60bb30).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23257
  
**[Test build #99881 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99881/testReport)**
 for PR 23257 at commit 
[`9ee62ac`](https://github.com/apache/spark/commit/9ee62acc50bd5fb72a84a620e0fe50e27f7df515).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23257
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options

2018-12-08 Thread MaxGekk
Github user MaxGekk commented on the issue:

https://github.com/apache/spark/pull/23257
  
@cloud-fan What do you think of the PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23257: [SPARK-26310][SQL] Verify applicability of JSON options

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23257
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5897/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23259
  
cc @hvanhovell , @mgaido91 , too.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23259
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5896/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23259
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23259
  
**[Test build #99880 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99880/testReport)**
 for PR 23259 at commit 
[`01bc383`](https://github.com/apache/spark/commit/01bc38347496a6194b46ace0feb7d2cd1adb614e).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23259: [SPARK-26215][SQL][WIP] Define reserved/non-reserved key...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23259
  
Retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23253
  
**[Test build #99879 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99879/testReport)**
 for PR 23253 at commit 
[`fc088df`](https://github.com/apache/spark/commit/fc088df2e98133e11d8fe60f899d3212e4cbee35).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23253
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23253
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5895/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-08 Thread MaxGekk
Github user MaxGekk commented on a diff in the pull request:

https://github.com/apache/spark/pull/23253#discussion_r240014440
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -37,6 +37,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.4 and earlier, CSV datasource converts a malformed 
CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, 
returned row can contain non-`null` fields if some of CSV column values were 
parsed and converted to desired types successfully.
 
+  - In Spark version 2.4 and earlier, JSON datasource and JSON functions 
like `from_json` convert a bad JSON record to a row with all `null`s in the 
PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, 
returned row can contain non-`null` fields if some of JSON column values were 
parsed and converted to desired types successfully.
--- End diff --

Fixed for CSV too


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23108: [Spark-25993][SQL][TEST]Add test cases for CREATE...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23108#discussion_r240014393
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/HiveOrcSourceSuite.scala 
---
@@ -190,4 +192,103 @@ class HiveOrcSourceSuite extends OrcSuite with 
TestHiveSingleton {
   }
 }
   }
+
+  test("SPARK-25993 CREATE EXTERNAL TABLE with subdirectories") {
+Seq(true, false).foreach { convertMetastore =>
+  withSQLConf(HiveUtils.CONVERT_METASTORE_ORC.key -> 
s"$convertMetastore") {
+withTempDir { dir =>
+  val dataDir = new 
File(s"${dir.getCanonicalPath}/l3/l2/l1/").toURI
+  val parentDir = s"${dir.getCanonicalPath}/l3/l2/"
+  val l3Dir = s"${dir.getCanonicalPath}/l3/"
+  val wildcardParentDir = new File(s"${dir}/l3/l2/*").toURI
+  val wildcardL3Dir = new File(s"${dir}/l3/*").toURI
+
+  try {
+hiveClient.runSqlHive("USE default")
+hiveClient.runSqlHive(
+  """
+|CREATE EXTERNAL TABLE hive_orc(
+|  C1 INT,
+|  C2 INT,
+|  C3 STRING)
+|STORED AS orc""".stripMargin)
+// Hive throws an exception if I assign the location in the 
create table statement.
+hiveClient.runSqlHive(
+  s"ALTER TABLE hive_orc SET LOCATION '$dataDir'")
+hiveClient.runSqlHive(
+  """
+|INSERT INTO TABLE hive_orc
+|VALUES (1, 1, 'orc1'), (2, 2, 'orc2')""".stripMargin)
+
+withTable("tbl1", "tbl2", "tbl3", "tbl4") {
+  val parentDirStatement =
+s"""
+   |CREATE EXTERNAL TABLE tbl1(
+   |  c1 int,
+   |  c2 int,
+   |  c3 string)
+   |STORED AS orc
+   |LOCATION '${parentDir}'""".stripMargin
+  sql(parentDirStatement)
+  val parentDirSqlStatement = s"select * from tbl1"
+  if (convertMetastore) {
+checkAnswer(sql(parentDirSqlStatement), Nil)
+  } else {
+checkAnswer(sql(parentDirSqlStatement),
+  (1 to 2).map(i => Row(i, i, s"orc$i")))
+  }
+
+  val l3DirStatement =
+s"""
+   |CREATE EXTERNAL TABLE tbl2(
+   |  c1 int,
+   |  c2 int,
+   |  c3 string)
+   |STORED AS orc
+   |LOCATION '${l3Dir}'""".stripMargin
+  sql(l3DirStatement)
+  val l3DirSqlStatement = s"select * from tbl2"
+  if (convertMetastore) {
+checkAnswer(sql(l3DirSqlStatement), Nil)
+  } else {
+checkAnswer(sql(l3DirSqlStatement),
+  (1 to 2).map(i => Row(i, i, s"orc$i")))
+  }
+
+  val wildcardStatement =
+s"""
+   |CREATE EXTERNAL TABLE tbl3(
+   |  c1 int,
+   |  c2 int,
+   |  c3 string)
+   |STORED AS orc
+   |LOCATION '$wildcardParentDir'""".stripMargin
+  sql(wildcardStatement)
+  val wildcardSqlStatement = s"select * from tbl3"
+  if (convertMetastore) {
+checkAnswer(sql(wildcardSqlStatement),
+  (1 to 2).map(i => Row(i, i, s"orc$i")))
+  } else {
+checkAnswer(sql(wildcardSqlStatement), Nil)
+  }
+
+  val wildcardL3Statement =
+s"""
+   |CREATE EXTERNAL TABLE tbl4(
+   |  c1 int,
+   |  c2 int,
+   |  c3 string)
+   |STORED AS orc
+   |LOCATION '$wildcardL3Dir'""".stripMargin
+  sql(wildcardL3Statement)
+  val wildcardL3SqlStatement = s"select * from tbl4"
+  checkAnswer(sql(wildcardL3SqlStatement), Nil)
+}
+  } finally {
+hiveClient.runSqlHive("DROP TABLE IF EXISTS hive_orc")
--- End diff --

Got it. I missed that.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23264: [SPARK-26266][BUILD] Update to Scala 2.12.8 (bran...

2018-12-08 Thread srowen
Github user srowen closed the pull request at:

https://github.com/apache/spark/pull/23264


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23225: [SPARK-26287][CORE]Don't need to create an empty ...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23225#discussion_r240014126
  
--- Diff: 
core/src/test/java/org/apache/spark/shuffle/sort/UnsafeShuffleWriterSuite.java 
---
@@ -235,11 +235,8 @@ public void writeEmptyIterator() throws Exception {
 final Option mapStatus = writer.stop(true);
 assertTrue(mapStatus.isDefined());
 assertTrue(mergedOutputFile.exists());
+assertEquals(0, spillFilesCreated.size());
 assertArrayEquals(new long[NUM_PARTITITONS], 
partitionSizesInMergedFile);
-assertEquals(0, taskMetrics.shuffleWriteMetrics().recordsWritten());
-assertEquals(0, taskMetrics.shuffleWriteMetrics().bytesWritten());
-assertEquals(0, taskMetrics.diskBytesSpilled());
-assertEquals(0, taskMetrics.memoryBytesSpilled());
--- End diff --

We need to keep these test coverage guaranteeing that the task metrics 
remained an untouched state.
Please revert this removal.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5894/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23239: [SPARK-26021][SQL][followup] only deal with NaN a...

2018-12-08 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23239


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23169
  
**[Test build #99878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99878/testReport)**
 for PR 23169 at commit 
[`855f540`](https://github.com/apache/spark/commit/855f5404c53eba51ed373fa9e7be4eaafd60bb30).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23239: [SPARK-26021][SQL][followup] only deal with NaN and -0.0...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23239
  
Hi, @cloud-fan . Please make another PR for `branch-2.4`. There is a 
conflict on `branch-2.4`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23253: [SPARK-26303][SQL] Return partial results for bad...

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/23253#discussion_r240013729
  
--- Diff: docs/sql-migration-guide-upgrade.md ---
@@ -37,6 +37,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.4 and earlier, CSV datasource converts a malformed 
CSV string to a row with all `null`s in the PERMISSIVE mode. Since Spark 3.0, 
returned row can contain non-`null` fields if some of CSV column values were 
parsed and converted to desired types successfully.
 
+  - In Spark version 2.4 and earlier, JSON datasource and JSON functions 
like `from_json` convert a bad JSON record to a row with all `null`s in the 
PERMISSIVE mode when specified schema is `StructType`. Since Spark 3.0, 
returned row can contain non-`null` fields if some of JSON column values were 
parsed and converted to desired types successfully.
--- End diff --

`returned row` -> `the returned row`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23264: [SPARK-26266][BUILD] Update to Scala 2.12.8 (branch-2.4)

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23264
  
This is merged now. Please close the PR, @srowen . :)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23264: [SPARK-26266][BUILD] Update to Scala 2.12.8 (branch-2.4)

2018-12-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/23264
  
Merged to branch-2.4.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23242
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5893/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23242
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23242: [SPARK-26285][CORE] accumulator metrics sources for Long...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23242
  
**[Test build #99877 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99877/testReport)**
 for PR 23242 at commit 
[`de531c8`](https://github.com/apache/spark/commit/de531c81f3274145a1de839c80d93e3c37264da0).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23242: [SPARK-26285][CORE] accumulator metrics sources f...

2018-12-08 Thread abellina
Github user abellina commented on a diff in the pull request:

https://github.com/apache/spark/pull/23242#discussion_r240013209
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/AccumulatorMetricsTest.scala 
---
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+// scalastyle:off println
+package org.apache.spark.examples
+
+import org.apache.spark.metrics.source.{DoubleAccumulatorSource, 
LongAccumulatorSource}
+import org.apache.spark.sql.SparkSession
+
+/**
+ * Usage: AccumulatorMetricsTest [partitions] [numElem] [blockSize]
+ */
+object AccumulatorMetricsTest {
--- End diff --

@redsanket yes at first I thought that, but there are other many other 
"examples" here with the suffix Test.  


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23253
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23253
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99875/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23253: [SPARK-26303][SQL] Return partial results for bad JSON r...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23253
  
**[Test build #99875 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99875/testReport)**
 for PR 23253 at commit 
[`b19b3e1`](https://github.com/apache/spark/commit/b19b3e1ce25836a419b056a63bc320f1b82dc1b1).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99876/
Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/23169
  
**[Test build #99876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99876/testReport)**
 for PR 23169 at commit 
[`f6d0efc`](https://github.com/apache/spark/commit/f6d0efc7c0e1d461e5854c6e04f3347f174bf13a).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class GBTClassifierParams(GBTParams, HasVarianceImpurity):`
  * `class GBTClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, 
HasPredictionCol,`
  * `class HasDistanceMeasure(Params):`
  * `class HasValidationIndicatorCol(Params):`
  * `class HasVarianceImpurity(Params):`
  * `class TreeRegressorParams(HasVarianceImpurity):`
  * `class GBTParams(TreeEnsembleParams, HasMaxIter, HasStepSize, 
HasValidationIndicatorCol):`
  * `class GBTRegressorParams(GBTParams, TreeRegressorParams):`
  * `class GBTRegressor(JavaEstimator, HasFeaturesCol, HasLabelCol, 
HasPredictionCol,`
  * `class ArrowCollectSerializer(Serializer):`
  * `class CSVInferSchema(val options: CSVOptions) extends Serializable `
  * `class InterpretedSafeProjection(expressions: Seq[Expression]) extends 
Projection `
  * `sealed trait DateTimeFormatter `
  * `class Iso8601DateTimeFormatter(`
  * `class LegacyDateTimeFormatter(`
  * `class LegacyFallbackDateTimeFormatter(`
  * `sealed trait DateFormatter `
  * `class Iso8601DateFormatter(`
  * `class LegacyDateFormatter(`
  * `class LegacyFallbackDateFormatter(`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Merged build finished. Test FAILed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5892/
Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for...

2018-12-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/23169
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   >