[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22484
  
**[Test build #96500 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96500/testReport)**
 for PR 22484 at commit 
[`2d778a4`](https://github.com/apache/spark/commit/2d778a4c8fb5d3b373856837496882c05ff1d42d).


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22484
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3399/
Test PASSed.


---




[GitHub] spark issue #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark to use m...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22484
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22526: [SPARK-25502][WEBUI]Empty Page when page number e...

2018-09-23 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22526#discussion_r219732766
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -685,7 +685,15 @@ private[ui] class TaskDataSource(
 
   private var _tasksToShow: Seq[TaskData] = null
 
-  override def dataSize: Int = taskCount(stage)
+  override def dataSize: Int = {
+val storedTasks = store.taskCount(stage.stageId, stage.attemptId).toInt
+val totalTasks = taskCount(stage)
+if (totalTasks > storedTasks) {
--- End diff --

Yes, totalTasks will always be greater than or equal to storedTasks, so we 
can simply return storedTasks. But for better readability I have put it in 
the if/else condition.

I have modified the code based on your suggestion.
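The logic under discussion can be sketched standalone (the `store`/`stage` plumbing is omitted; names follow the diff above):

```scala
// The stage may report more total tasks than the status store has retained,
// so the data source should size itself by the stored count, i.e. the
// smaller of the two values.
def dataSize(totalTasks: Int, storedTasks: Int): Int =
  if (totalTasks > storedTasks) storedTasks else totalTasks

assert(dataSize(100, 80) == 80) // some tasks evicted from the store
assert(dataSize(50, 50) == 50)  // everything retained
```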


---




[GitHub] spark pull request #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use...

2018-09-23 Thread yucai
Github user yucai commented on a diff in the pull request:

https://github.com/apache/spark/pull/22495#discussion_r219731873
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SortBenchmark.scala
 ---
@@ -28,12 +28,15 @@ import org.apache.spark.util.random.XORShiftRandom
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.SortBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt: bin/spark-submit --class  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to "benchmarks/-results.txt".
+ * }}}
  */
-class SortBenchmark extends BenchmarkWithCodegen {
+object SortBenchmark extends BenchmarkBase {
--- End diff --

@dongjoon-hyun `SortBenchmark` does not use any function provided in 
`BenchmarkWithCodegen`, so I removed it.
Another option is what #22484 did: make `BenchmarkWithCodegen` extend 
`BenchmarkBase`, and then have `SortBenchmark` extend `BenchmarkWithCodegen`.
Do you prefer the 2nd way?

BTW, congratulations! :)


---




[GitHub] spark issue #22524: [WIP][SPARK-25497][SQL] Limit operation within whole sta...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22524
  
**[Test build #96499 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96499/testReport)**
 for PR 22524 at commit 
[`a09e60f`](https://github.com/apache/spark/commit/a09e60f1e026504657f3de7669eb79cc0b4c2c8c).


---




[GitHub] spark issue #22524: [WIP][SPARK-25497][SQL] Limit operation within whole sta...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22524
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22524: [WIP][SPARK-25497][SQL] Limit operation within whole sta...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22524
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3398/
Test PASSed.


---




[GitHub] spark pull request #22524: [WIP][SPARK-25497][SQL] Limit operation within wh...

2018-09-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/22524#discussion_r219731695
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala ---
@@ -556,7 +556,7 @@ class DataFrameAggregateSuite extends QueryTest with 
SharedSQLContext {
   Seq(Row(1, 2, Seq("a", "b")), Row(3, 2, Seq("c", "c", "d"
   }
 
-  test("SPARK-18004 limit + aggregates") {
+  test("SPARK-18528 limit + aggregates") {
--- End diff --

This JIRA number is wrong.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-23 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
> See JIRA, I don't think this should be merged.

I referred to the Databricks doc 
https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-database.html
 and implemented it accordingly. Let me know if you have any suggestions.


---




[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/22511
  
Also cc @zsxwing @JoshRosen 


---




[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-09-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r219729921
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -250,5 +265,22 @@ class QueryExecution(val sparkSession: SparkSession, 
val logical: LogicalPlan) {
 def codegenToSeq(): Seq[(String, String)] = {
   org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
 }
+
+/**
+ * Dumps debug information about query execution into the specified 
file.
+ */
+def toFile(path: String): Unit = {
+  val filePath = new Path(path)
+  val fs = FileSystem.get(filePath.toUri, 
sparkSession.sessionState.newHadoopConf())
+  val writer = new OutputStreamWriter(fs.create(filePath))
--- End diff --

cc @zsxwing Could you help review this function?


---




[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-09-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r219729889
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -250,5 +265,22 @@ class QueryExecution(val sparkSession: SparkSession, 
val logical: LogicalPlan) {
 def codegenToSeq(): Seq[(String, String)] = {
   org.apache.spark.sql.execution.debug.codegenStringSeq(executedPlan)
 }
+
+/**
+ * Dumps debug information about query execution into the specified 
file.
+ */
+def toFile(path: String): Unit = {
+  val filePath = new Path(path)
+  val fs = FileSystem.get(filePath.toUri, 
sparkSession.sessionState.newHadoopConf())
--- End diff --

val fs = filePath.getFileSystem(spark.sessionState.newHadoopConf())


---




[GitHub] spark issue #22525: [SPARK-25503][CORE][WEBUI]Total task message in stage pa...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22525
  
ok to test.


---




[GitHub] spark pull request #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to ...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22488#discussion_r219729655
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/DatasetBenchmark.scala ---
@@ -242,75 +248,20 @@ object DatasetBenchmark {
 benchmark
   }
 
-  def main(args: Array[String]): Unit = {
-val spark = SparkSession.builder
-  .master("local[*]")
-  .appName("Dataset benchmark")
-  .getOrCreate()
+  val spark = SparkSession.builder
+.master("local[*]")
+.appName("Dataset benchmark")
+.getOrCreate()
--- End diff --

Can we move this SparkSession building part into `benchmark()` function and 
before `runBenchmark("Dataset Benchmark")`?


---




[GitHub] spark issue #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.writeLega...

2018-09-23 Thread seancxmao
Github user seancxmao commented on the issue:

https://github.com/apache/spark/pull/22453
  
FYI, I did a brief survey of Parquet decimal support across computing 
engines at the time of writing.

Hive
* [HIVE-19069](https://jira.apache.org/jira/browse/HIVE-19069) Hive can't 
read int32 and int64 Parquet decimal. Not resolved yet.

Impala:
* [IMPALA-5628](https://issues.apache.org/jira/browse/IMPALA-5628) Parquet 
support for additional valid decimal representations. This is an umbrella JIRA.
* [IMPALA-2494](https://issues.apache.org/jira/browse/IMPALA-2494) Impala 
Unable to scan a Decimal column stored as Bytes. Fix Version/s: Impala 2.11.0.
* [IMPALA-5542](https://issues.apache.org/jira/browse/IMPALA-5542) Impala 
cannot scan Parquet decimal stored as int64_t/int32_t. Fix Version/s: Impala 
3.1.0, not released yet.

Presto

* [issues/7232](https://github.com/prestodb/presto/issues/7232). Can't read 
decimal type in parquet files written by spark and referenced as external in 
the hive metastore
* [issues/7533](https://github.com/prestodb/presto/issues/7533). Improve 
decimal type support in the new Parquet reader. Fixed Version: 
[0.182](https://prestodb.io/docs/current/release/release-0.182.html)


---




[GitHub] spark pull request #22429: [SPARK-25440][SQL] Dumping query execution info t...

2018-09-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22429#discussion_r219729210
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala
 ---
@@ -668,11 +670,19 @@ case class WholeStageCodegenExec(child: 
SparkPlan)(val codegenStageId: Int)
   override def generateTreeString(
   depth: Int,
   lastChildren: Seq[Boolean],
-  builder: StringBuilder,
+  writer: Writer,
   verbose: Boolean,
   prefix: String = "",
-  addSuffix: Boolean = false): StringBuilder = {
-child.generateTreeString(depth, lastChildren, builder, verbose, 
s"*($codegenStageId) ")
+  addSuffix: Boolean = false,
+  maxFields: Option[Int]): Unit = {
+child.generateTreeString(
+  depth,
+  lastChildren,
+  writer,
+  verbose,
+  s"*($codegenStageId) ",
+  false,
--- End diff --

Use a named boolean: `addSuffix = false`.
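A minimal illustration of the suggestion (the signature here is illustrative, not the one from the diff): passing literal booleans as named arguments keeps call sites readable.

```scala
// With a named argument, the meaning of the literal `false` is obvious
// at the call site instead of requiring a jump to the definition.
def render(verbose: Boolean, addSuffix: Boolean = false): String =
  s"verbose=$verbose, addSuffix=$addSuffix"

val out = render(verbose = true, addSuffix = false)
```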


---




[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

2018-09-23 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22453#discussion_r219729166
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
+  spark.sql.parquet.writeLegacyFormat
--- End diff --

OK, I will update the doc and describe scenarios and reasons why we need 
this flag.


---




[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/22521
  
Jenkins, retest this please.



---




[GitHub] spark pull request #22433: [SPARK-25442][SQL][K8S] Support STS to run in k8s...

2018-09-23 Thread liyinan926
Github user liyinan926 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22433#discussion_r219728772
  
--- Diff: docs/running-on-kubernetes.md ---
@@ -340,6 +340,43 @@ RBAC authorization and how to configure Kubernetes 
service accounts for pods, pl
 [Using RBAC 
Authorization](https://kubernetes.io/docs/admin/authorization/rbac/) and
 [Configure Service Accounts for 
Pods](https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/).
 
+## Running Spark Thrift Server
+
+Thrift JDBC/ODBC Server (aka Spark Thrift Server or STS) is Spark SQL’s 
port of Apache Hive’s HiveServer2 that allows
+JDBC/ODBC clients to execute SQL queries over JDBC and ODBC protocols on 
Apache Spark.
+
+### Client Deployment Mode
+
To start STS in client mode, execute the following command:
+
+```bash
+$ sbin/start-thriftserver.sh \
+--master k8s://https://:
+```
+
+### Cluster Deployment Mode
+
To start STS in cluster mode, execute the following command:
+
+```bash
+$ sbin/start-thriftserver.sh \
+--master k8s://https://: \
+--deploy-mode cluster
+```
+
The most basic workflow is to use the pod name (driver pod name in case of 
cluster mode and self pod name (pod/container from 
--- End diff --

The script may be run from a client machine outside a k8s cluster. In this 
case, there's not even a pod. I would suggest separating the explanation of the 
user flow details by the deploy mode (client vs cluster).


---




[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22521
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22521
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96495/
Test FAILed.


---




[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22521
  
**[Test build #96495 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96495/testReport)**
 for PR 22521 at commit 
[`4af98e7`](https://github.com/apache/spark/commit/4af98e76319cbb363b5646f3cde85a3eca12a6ef).
 * This patch **fails SparkR unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22466: [SPARK-25464][SQL]On dropping the Database it will drop ...

2018-09-23 Thread sandeep-katta
Github user sandeep-katta commented on the issue:

https://github.com/apache/spark/pull/22466
  
Yes, I agree that two databases should not point to the same path; 
**currently this is a loophole in Spark that needs to be fixed**. If this 
solution is not okay, then we can append `dbname.db` to the location given 
by the user.
For example, for `create database db1 location /user/hive/warehouse`, 
the location of the DB should be `/user/hive/warehouse/db1.db`.
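A minimal sketch of that fallback (the helper name is hypothetical):

```scala
// Append "<dbName>.db" to a user-supplied warehouse location so that two
// databases created at the same base path still get distinct directories.
def qualifiedDbLocation(userLocation: String, dbName: String): String =
  s"${userLocation.stripSuffix("/")}/$dbName.db"

assert(qualifiedDbLocation("/user/hive/warehouse", "db1") == "/user/hive/warehouse/db1.db")
```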




---




[GitHub] spark pull request #22458: [SPARK-25459] Add viewOriginalText back to Catalo...

2018-09-23 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/22458#discussion_r219727730
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala 
---
@@ -2348,4 +2348,17 @@ class HiveDDLSuite
   }
 }
   }
+
+  test("desc formatted table should also show viewOriginalText for views") 
{
+withView("v1") {
+  sql("CREATE VIEW v1 AS SELECT 1 AS value")
+  assert(sql("DESC FORMATTED v1").collect().containsSlice(
+Seq(
+  Row("Type", "VIEW", ""),
+  Row("View Text", "SELECT 1 AS value", ""),
+  Row("View Original Text:", "SELECT 1 AS value", "")
--- End diff --

To do that, maybe use the Hive client to create the view instead of Spark.


---




[GitHub] spark issue #22524: [WIP][SPARK-25497][SQL] Limit operation within whole sta...

2018-09-23 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/22524
  
@xuanyuanking Thanks.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22326
  
**[Test build #96498 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96498/testReport)**
 for PR 22326 at commit 
[`caf6f94`](https://github.com/apache/spark/commit/caf6f94b980e877f02c57b9647bae7df5d4e16ae).


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3397/
Test PASSed.


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22326
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21999: [WIP][SQL] Flattening nested structures

2018-09-23 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/21999
  
Thanks! 


---




[GitHub] spark issue #22326: [SPARK-25314][SQL] Fix Python UDF accessing attributes f...

2018-09-23 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22326
  
retest this please.


---




[GitHub] spark issue #22524: [WIP][SPARK-25497][SQL] Limit operation within whole sta...

2018-09-23 Thread xuanyuanking
Github user xuanyuanking commented on the issue:

https://github.com/apache/spark/pull/22524
  
@viirya As @shaneknapp replied on the mailing list, you can try 
https://hadrian.ist.berkeley.edu/jenkins/. Thanks @shaneknapp :)


---




[GitHub] spark pull request #22525: [SPARK-25503][CORE][WEBUI]Total task message in s...

2018-09-23 Thread shahidki31
Github user shahidki31 commented on a diff in the pull request:

https://github.com/apache/spark/pull/22525#discussion_r219726139
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -132,7 +132,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 val totalTasksNumStr = if (totalTasks == storedTasks) {
   s"$totalTasks"
 } else {
-  s"$storedTasks, showing ${totalTasks}"
+  s"$totalTasks, showing $storedTasks"
--- End diff --

Done. Thanks


---




[GitHub] spark pull request #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSche...

2018-09-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22486


---




[GitHub] spark issue #22488: [SPARK-25479][TEST] Refactor DatasetBenchmark to use mai...

2018-09-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22488
  
@dongjoon-hyun I think this refactor is ready to go. Thanks.


---




[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

2018-09-23 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r219725643
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala
 ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Common base trait for micro benchmarks that are supposed to run 
standalone (i.e. not together
+ * with other test suites).
+ */
+trait RunBenchmarkWithCodegen extends BenchmarkBase {
+
+  val spark: SparkSession = getSparkSession
+
+  /** Subclass can override this function to build their own SparkSession 
*/
+  def getSparkSession: SparkSession = {
+SparkSession.builder()
+  .master("local[1]")
+  .appName(this.getClass.getCanonicalName)
+  .config(SQLConf.SHUFFLE_PARTITIONS.key, 1)
+  .config(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key, 1)
+  .getOrCreate()
+  }
+
+  /** Runs function `f` with whole stage codegen on and off. */
+  def runBenchmark(name: String, cardinality: Long)(f: => Unit): Unit = {
--- End diff --

How about `runBenchmark` -> `runSqlBaseBenchmark`?
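The "with whole stage codegen on and off" pattern that the trait's `runBenchmark` describes can be sketched standalone; the real version toggles `spark.sql.codegen.wholeStage` through `SQLConf`, which is mocked here with a plain map.

```scala
// Run the same benchmark body once per codegen setting, recording which
// setting was active for each run. The config map stands in for SQLConf.
val conf = scala.collection.mutable.Map[String, String]()

def withCodegenOnAndOff(f: => Unit): Seq[String] =
  Seq("true", "false").map { enabled =>
    conf("spark.sql.codegen.wholeStage") = enabled
    f // benchmark body runs under the current setting
    s"ran with codegen=$enabled"
  }

val results = withCodegenOnAndOff { () }
```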


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-09-23 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r219725654
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala
 ---
@@ -17,22 +17,19 @@
 
 package org.apache.spark.sql
 
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
 
-import org.scalatest.BeforeAndAfterEach
-
-import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.functions._
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => 
FileBenchmarkBase, Utils}
 
 /**
  * Benchmark for performance with very wide and nested DataFrames.
- * To run this:
- *  build/sbt "sql/test-only *WideSchemaBenchmark"
- *
- * Results will be written to 
"sql/core/benchmarks/WideSchemaBenchmark-results.txt".
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class  
+ * 2. build/sbt "sql/test:runMain "
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *Results will be written to 
"benchmarks/WideSchemaBenchmark-results.txt".
--- End diff --

Thanks @dongjoon-hyun. Actually I'm waiting for 
https://github.com/apache/spark/pull/22484. I want to move  `withTempDir()` to  
`RunBenchmarkWithCodegen.scala`.


---




[GitHub] spark pull request #22484: [SPARK-25476][TEST] Refactor AggregateBenchmark t...

2018-09-23 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/22484#discussion_r219725606
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/RunBenchmarkWithCodegen.scala
 ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Common base trait for micro benchmarks that are supposed to run 
standalone (i.e. not together
+ * with other test suites).
+ */
+trait RunBenchmarkWithCodegen extends BenchmarkBase {
--- End diff --

How about `RunBenchmarkWithCodegen` -> `SqlBaseBenchmarkBase`?


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22486
  
Merged to master.


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22486
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22486
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96494/
Test PASSed.


---




[GitHub] spark pull request #22495: [SPARK-25486][TEST] Refactor SortBenchmark to use...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22495#discussion_r219725464
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/SortBenchmark.scala
 ---
@@ -28,12 +28,15 @@ import org.apache.spark.util.random.XORShiftRandom
 
 /**
  * Benchmark to measure performance for aggregate primitives.
- * To run this:
- *  build/sbt "sql/test-only *benchmark.SortBenchmark"
- *
- * Benchmarks in this file are skipped in normal builds.
+ * {{{
+ *   To run this benchmark:
+ *   1. without sbt: bin/spark-submit --class  
+ *   2. build/sbt "sql/test:runMain "
+ *   3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *  Results will be written to "benchmarks/-results.txt".
+ * }}}
  */
-class SortBenchmark extends BenchmarkWithCodegen {
+object SortBenchmark extends BenchmarkBase {
--- End diff --

@yucai . `BenchmarkWithCodegen` is different from `BenchmarkBase`. Can we 
keep `BenchmarkWithCodegen`?


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22486
  
**[Test build #96494 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96494/testReport)**
 for PR 22486 at commit 
[`9494afd`](https://github.com/apache/spark/commit/9494afd9e649751188b52fee5ac30d745985a03c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22511
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22501: [SPARK-25492][TEST] Refactor WideSchemaBenchmark ...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22501#discussion_r219724989
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/WideSchemaBenchmark.scala
 ---
@@ -17,22 +17,19 @@
 
 package org.apache.spark.sql
 
-import java.io.{File, FileOutputStream, OutputStream}
+import java.io.File
 
-import org.scalatest.BeforeAndAfterEach
-
-import org.apache.spark.SparkFunSuite
-import org.apache.spark.sql.functions._
-import org.apache.spark.util.{Benchmark, Utils}
+import org.apache.spark.util.{Benchmark, BenchmarkBase => 
FileBenchmarkBase, Utils}
 
 /**
  * Benchmark for performance with very wide and nested DataFrames.
- * To run this:
- *  build/sbt "sql/test-only *WideSchemaBenchmark"
- *
- * Results will be written to 
"sql/core/benchmarks/WideSchemaBenchmark-results.txt".
+ * To run this benchmark:
+ * 1. without sbt: bin/spark-submit --class  
+ * 2. build/sbt "sql/test:runMain "
+ * 3. generate result: SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt 
"sql/test:runMain "
+ *Results will be written to 
"benchmarks/WideSchemaBenchmark-results.txt".
--- End diff --

Could you fix doc generation failure?


---




[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22511
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3396/
Test PASSed.


---




[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22511
  
**[Test build #96497 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96497/testReport)**
 for PR 22511 at commit 
[`aee82ab`](https://github.com/apache/spark/commit/aee82abe4cd9fbefa14fb280644276fe491bcf9a).


---




[GitHub] spark issue #22511: [SPARK-25422][CORE] Don't memory map blocks streamed to ...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22511
  
Retest this please.


---




[GitHub] spark issue #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.writeLega...

2018-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22453
  
cc @jaceklaskowski FYI


---




[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

2018-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22453#discussion_r219722950
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
+  spark.sql.parquet.writeLegacyFormat
--- End diff --

++1 for more information actually.


---




[GitHub] spark issue #21524: [SPARK-24212][ML][doc] Add the example and user guide fo...

2018-09-23 Thread tengpeng
Github user tengpeng commented on the issue:

https://github.com/apache/spark/pull/21524
  
Yes, but maybe not soon. Is there a "deadline" (e.g. branch cut)
coming?

On Tue, Sep 18, 2018 at 4:23 PM Sean Owen  wrote:

> @tengpeng  would you like to update this?



---




[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

2018-09-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22453#discussion_r219722694
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
+  spark.sql.parquet.writeLegacyFormat
--- End diff --

OK, that sounds important to document. But the reasoning in this thread is 
also useful information, I think. Instead of describing it as a legacy 
format (implying it's not valid Parquet or something) and saying that it's required 
for Hive and Impala, can we mention or point to the specific reason that would 
cause you to need this? The value of the documentation here is in whether it 
helps the user know when to set it one way or the other.


---




[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22491
  
Thank you for pinging me, @wangyum .


---




[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22530
  
**[Test build #96496 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96496/testReport)**
 for PR 22530 at commit 
[`9b1cc1d`](https://github.com/apache/spark/commit/9b1cc1d826cb89f0ed6021ae6c8cddc978c0173e).


---




[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22530
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3395/
Test PASSed.


---




[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22530
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22491: [SPARK-25483][TEST] Refactor UnsafeArrayDataBenchmark to...

2018-09-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22491
  
@dongjoon-hyun This refactor is ready to go. Thanks.


---




[GitHub] spark issue #22530: [SPARK-24869][SQL] Fix SaveIntoDataSourceCommand's input...

2018-09-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22530
  
retest this please


---




[GitHub] spark pull request #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS so...

2018-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22529#discussion_r219722208
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -173,12 +173,16 @@ final class DataStreamReader 
private[sql](sparkSession: SparkSession) extends Lo
 }
 ds match {
   case s: MicroBatchReadSupport =>
+val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+  ds = s, conf = sparkSession.sessionState.conf)
+val options = sessionOptions ++ extraOptions
+val dataSourceOptions = new DataSourceOptions(options.asJava)
 var tempReader: MicroBatchReader = null
 val schema = try {
   tempReader = s.createMicroBatchReader(
 Optional.ofNullable(userSpecifiedSchema.orNull),
 Utils.createTempDir(namePrefix = 
s"temporaryReader").getCanonicalPath,
-options)
+dataSourceOptions)
--- End diff --

yup. the conflict looks mainly because of renaming.
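
To make the backported hunk concrete: `++` on Scala's immutable `Map` is 
right-biased, which is why options passed directly to the reader override 
session-derived configs in the code above. A minimal sketch (the key names 
below are made up for illustration):

```scala
object OptionMerge {
  // Right-biased merge: on a key collision, entries from `extra` win,
  // mirroring `sessionOptions ++ extraOptions` in the diff.
  def merge(session: Map[String, String], extra: Map[String, String]): Map[String, String] =
    session ++ extra
}
```

For example, `OptionMerge.merge(Map("path" -> "/from-session"), Map("path" -> "/from-reader"))` 
yields `Map("path" -> "/from-reader")`: the explicitly passed option wins, and 
keys present only in the session map survive the merge.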


---




[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

2018-09-23 Thread seancxmao
Github user seancxmao commented on a diff in the pull request:

https://github.com/apache/spark/pull/22453#discussion_r219721110
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
+  spark.sql.parquet.writeLegacyFormat
--- End diff --

I'd like to add my 2 cents. We use both Spark and Hive in our Hadoop/Spark 
clusters, and we have two types of tables: working tables and target tables. 
Working tables are only used by Spark jobs, while target tables are populated 
by Spark and exposed to downstream jobs, including Hive jobs. Our data engineers 
frequently run into this issue when they use Hive to read target tables. 
Finally we decided to set spark.sql.parquet.writeLegacyFormat=true as the 
default value for target tables and explicitly describe this in our internal 
developer guide.


---




[GitHub] spark pull request #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT whe...

2018-09-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22531


---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/22531
  
Merged to master.


---




[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

2018-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22453#discussion_r219719299
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
+  spark.sql.parquet.writeLegacyFormat
--- End diff --

This is, of course, something we should remove in the long term, but my 
impression is that it's better to expose it now, explicitly mention its 
deprecation later, and then remove it.

I have already argued a bit (for instance in SPARK-20297) to explain how to 
work around this and why it behaves this way. I was thinking it's better to 
document this and at least reduce such overhead.


---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22531
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96493/
Test PASSed.


---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22531
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

2018-09-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22453#discussion_r219719166
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
+  spark.sql.parquet.writeLegacyFormat
--- End diff --

@srowen, actually, this configuration is specifically related to 
compatibility with other systems like Impala (not only old Spark versions), where 
decimals are written in a fixed-length binary format (nowadays Spark writes them 
int-based). If this configuration is not enabled, those systems are unable to 
read what Spark wrote.

Given 
https://stackoverflow.com/questions/44279870/why-cant-impala-read-parquet-files-after-spark-sqls-write
 and JIRAs like 
[SPARK-20297](https://issues.apache.org/jira/browse/SPARK-20297), I think this 
configuration is kind of important. I even expected more documentation about 
this specific configuration in the first place.

Personally, I have been thinking it would be better to keep this configuration 
after 3.0 as well, for better compatibility. 


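For readers coming to this thread from the docs page: the flag under 
discussion is an ordinary SQL conf. A minimal `spark-defaults.conf` fragment 
enabling the legacy, Hive/Impala-readable writer would look like this 
(`true` is the non-default, legacy mode):

```properties
spark.sql.parquet.writeLegacyFormat  true
```

The same value can also be set per session, e.g. via 
`SET spark.sql.parquet.writeLegacyFormat=true` in Spark SQL.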

---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22531
  
**[Test build #96493 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96493/testReport)**
 for PR 22531 at commit 
[`d138427`](https://github.com/apache/spark/commit/d138427f35d4980b263dfef21b8810fe455443ca).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #22453: [SPARK-20937][DOCS] Describe spark.sql.parquet.wr...

2018-09-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22453#discussion_r219717918
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1002,6 +1002,15 @@ Configuration of Parquet can be done using the 
`setConf` method on `SparkSession
 
   
 
+
+  spark.sql.parquet.writeLegacyFormat
--- End diff --

This should go with the other Parquet properties if anything, but this one 
is so old I don't think it's worth documenting. It shouldn't be used today.


---




[GitHub] spark issue #22463: remove annotation @Experimental

2018-09-23 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22463
  
Yeah, I think Experimental is over-used in the APIs. They just never get 
un-marked, and lots of pretty old stuff that de facto just isn't changeable 
now is still labeled this way. This seems to be more "DeveloperAPI" than 
"Experimental". Still, I also don't know the right answer for this code. Ping 
the author?

I figured we'd remove just about all current Experimental tags when, say, 
Spark 3 rolls around.


---




[GitHub] spark pull request #22473: [SPARK-25449][CORE] Heartbeat shouldn't include a...

2018-09-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22473#discussion_r219717779
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -83,6 +83,17 @@ package object config {
   private[spark] val EXECUTOR_CLASS_PATH =
 
ConfigBuilder(SparkLauncher.EXECUTOR_EXTRA_CLASSPATH).stringConf.createOptional
 
+  private[spark] val EXECUTOR_HEARTBEAT_DROP_ZERO_METRICS =
+
ConfigBuilder("spark.executor.heartbeat.dropZeroMetrics").booleanConf.createWithDefault(true)
--- End diff --

Question -- when would you not want this to be true? It's already changing 
behavior here, but what's the case where you need a safety valve to go back? 
It's just not broadcasting changes that can't matter because they're zero?

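A minimal sketch of what such a flag would gate, assuming (names hypothetical, 
not the actual `Heartbeater` code) that a heartbeat carries per-metric 
accumulator updates:

```scala
object HeartbeatPrune {
  // Hypothetical stand-in for an accumulator update carried by a heartbeat.
  final case class Metric(name: String, value: Long)

  // When dropZeroMetrics is on, updates whose value is zero are filtered out
  // before sending, since a zero delta carries no information.
  def prune(updates: Seq[Metric], dropZeroMetrics: Boolean): Seq[Metric] =
    if (dropZeroMetrics) updates.filter(_.value != 0L) else updates
}
```

e.g. pruning `Seq(Metric("bytesRead", 0L), Metric("recordsWritten", 42L))` with 
`dropZeroMetrics = true` keeps only `recordsWritten`; with the flag off, the 
sequence is returned unchanged.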

---




[GitHub] spark pull request #22520: [SPARK-25509][Core]Windows doesn't support POSIX ...

2018-09-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22520#discussion_r219717720
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -133,9 +133,15 @@ private[history] class FsHistoryProvider(conf: 
SparkConf, clock: Clock)
 
   // Visible for testing.
   private[history] val listing: KVStore = storePath.map { path =>
-val perms = PosixFilePermissions.fromString("rwx--")
-val dbPath = Files.createDirectories(new File(path, 
"listing.ldb").toPath(),
-  PosixFilePermissions.asFileAttribute(perms)).toFile()
+var dbPath : File = null
--- End diff --

Nit: no space before colon. Rather than make a var and assign to null, just 
assign `val dbPath = ...` to the result of the if statement.

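The suggestion — a single `val` bound to the result of the `if` expression — 
might look like the sketch below. The Windows check and the `"rwx------"` 
permission string are assumptions mirroring what the surrounding 
`FsHistoryProvider` code appears to do, not a verbatim patch:

```scala
import java.io.File
import java.nio.file.Files
import java.nio.file.attribute.PosixFilePermissions

object ListingDirSketch {
  def createListingDir(parent: File, isWindows: Boolean): File = {
    val target = new File(parent, "listing.ldb").toPath()
    // dbPath is bound once, to the value of the if expression --
    // no null-initialized var needed.
    val dbPath = if (isWindows) {
      // POSIX permissions are unsupported on Windows
      Files.createDirectories(target).toFile()
    } else {
      val perms = PosixFilePermissions.fromString("rwx------")
      Files.createDirectories(target,
        PosixFilePermissions.asFileAttribute(perms)).toFile()
    }
    dbPath
  }
}
```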

---




[GitHub] spark issue #22516: [SPARK-25468]Highlight current page index in the history...

2018-09-23 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22516
  
Rather than modify jQuery, can we override this in Spark-specific CSS? 
Otherwise we might lose customizations when updating jQuery.


---




[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22521
  
**[Test build #96495 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96495/testReport)**
 for PR 22521 at commit 
[`4af98e7`](https://github.com/apache/spark/commit/4af98e76319cbb363b5646f3cde85a3eca12a6ef).


---




[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22521
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22521: [SPARK-24519] Compute SHUFFLE_MIN_NUM_PARTS_TO_HIGHLY_CO...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22521
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3394/
Test PASSed.


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22486
  
**[Test build #96494 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96494/testReport)**
 for PR 22486 at commit 
[`9494afd`](https://github.com/apache/spark/commit/9494afd9e649751188b52fee5ac30d745985a03c).


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22486
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22486
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3393/
Test PASSed.


---




[GitHub] spark issue #22486: [SPARK-25478][SQL][TEST] Refactor CompressionSchemeBench...

2018-09-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22486
  
@dongjoon-hyun Thanks a lot.


---




[GitHub] spark pull request #22526: [SPARK-25502][WEBUI]Empty Page when page number e...

2018-09-23 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/22526#discussion_r219715489
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -685,7 +685,15 @@ private[ui] class TaskDataSource(
 
   private var _tasksToShow: Seq[TaskData] = null
 
-  override def dataSize: Int = taskCount(stage)
+  override def dataSize: Int = {
+val storedTasks = store.taskCount(stage.stageId, stage.attemptId).toInt
+val totalTasks = taskCount(stage)
+if (totalTasks > storedTasks) {
--- End diff --

Just write `math.min(storedTasks, totalTasks)`

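The reviewer's one-liner, pulled out as a standalone sketch: the page can 
never show more rows than the store actually holds, so the size is just the 
minimum of the two counts.

```scala
object DataSizeSketch {
  // Equivalent to the if/else in the diff: clamp the total task count
  // to the number of tasks the store actually holds.
  def dataSize(storedTasks: Int, totalTasks: Int): Int =
    math.min(storedTasks, totalTasks)
}
```

e.g. `DataSizeSketch.dataSize(storedTasks = 80, totalTasks = 100)` yields `80`.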

---




[GitHub] spark issue #22525: [SPARK-25503][WEBUI] Total task message in stage page is...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22525
  
Retest this please.


---




[GitHub] spark pull request #22525: [SPARK-25503][WEBUI] Total task message in stage ...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22525#discussion_r219715130
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala ---
@@ -132,7 +132,7 @@ private[ui] class StagePage(parent: StagesTab, store: 
AppStatusStore) extends We
 val totalTasksNumStr = if (totalTasks == storedTasks) {
   s"$totalTasks"
 } else {
-  s"$storedTasks, showing ${totalTasks}"
+  s"$totalTasks, showing $storedTasks"
--- End diff --

Could you update the title to `[SPARK-25503][CORE][WEBUI] ...`?

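A sketch of the corrected summary string from the diff, with the surrounding 
UI code stripped away:

```scala
object TaskSummarySketch {
  // Total first, stored (displayable) count second -- the order the
  // patch fixes.
  def totalTasksNumStr(totalTasks: Int, storedTasks: Int): String =
    if (totalTasks == storedTasks) s"$totalTasks"
    else s"$totalTasks, showing $storedTasks"
}
```

e.g. `TaskSummarySketch.totalTasksNumStr(100, 80)` yields `"100, showing 80"`.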

---




[GitHub] spark issue #21747: [SPARK-24165][SQL][branch-2.3] Fixing conditional expres...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/21747
  
Hi, @mn-mikke and @cloud-fan and @maropu .
The 2.3.2 vote passed today, and 2.4.0-rc1 doesn't have this issue. Given that 
Spark 2.4.0 will come sooner than Spark 2.3.3, are we heading for (1) or (2)?
1. Mark this as resolved in 2.4.0 and close this PR?
2. Proceed with this PR for 2.3.3?




---




[GitHub] spark pull request #22227: [SPARK-25202] [SQL] Implements split with limit s...

2018-09-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/7#discussion_r219714851
  
--- Diff: R/pkg/R/functions.R ---
@@ -3404,19 +3404,27 @@ setMethod("collect_set",
 #' Equivalent to \code{split} SQL function.
 #'
 #' @rdname column_string_functions
+#' @param limit determines the length of the returned array.
+#'  \itemize{
+#'  \item \code{limit > 0}: length of the array will be at 
most \code{limit}
+#'  \item \code{limit <= 0}: the returned array can have any 
length
+#'  }
+#'
 #' @aliases split_string split_string,Column-method
 #' @examples
 #'
 #' \dontrun{
 #' head(select(df, split_string(df$Sex, "a")))
 #' head(select(df, split_string(df$Class, "\\d")))
+#' head(select(df, split_string(df$Class, "\\d", 2)))
 #' # This is equivalent to the following SQL expression
 #' head(selectExpr(df, "split(Class, 'd')"))}
--- End diff --

good point - also the example should run in the order documented.

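The limit semantics being documented for R appear to follow Java's 
`String.split(regex, limit)` contract (which Spark's `split` delegates to), so 
they can be sketched without Spark at all (the input string and pattern here 
are illustrative):

```scala
object SplitLimitSketch {
  val s = "oneAtwoBthree"
  // limit > 0: the result has at most `limit` elements; the last element
  // keeps the unsplit remainder.
  val atMostTwo: List[String] = s.split("[AB]", 2).toList   // List(one, twoBthree)
  // limit < 0: the pattern is applied as many times as possible, so the
  // result can have any length.
  val unlimited: List[String] = s.split("[AB]", -1).toList  // List(one, two, three)
}
```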


---




[GitHub] spark issue #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS sources d...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22529
  
cc @cloud-fan and @gatorsmile 


---




[GitHub] spark pull request #22529: [SPARK-25460][BRANCH-2.4][SS] DataSourceV2: SS so...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22529#discussion_r219714244
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamReader.scala 
---
@@ -173,12 +173,16 @@ final class DataStreamReader 
private[sql](sparkSession: SparkSession) extends Lo
 }
 ds match {
   case s: MicroBatchReadSupport =>
+val sessionOptions = DataSourceV2Utils.extractSessionConfigs(
+  ds = s, conf = sparkSession.sessionState.conf)
+val options = sessionOptions ++ extraOptions
+val dataSourceOptions = new DataSourceOptions(options.asJava)
 var tempReader: MicroBatchReader = null
 val schema = try {
   tempReader = s.createMicroBatchReader(
 Optional.ofNullable(userSpecifiedSchema.orNull),
 Utils.createTempDir(namePrefix = 
s"temporaryReader").getCanonicalPath,
-options)
+dataSourceOptions)
--- End diff --

So, this part is the difference, isn't it?


---




[GitHub] spark issue #22486: [SPARK-25478][TEST] Refactor CompressionSchemeBenchmark ...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22486
  
Hi, @wangyum. If you don't mind, could you review and merge [my 
PR](https://github.com/wangyum/spark/pull/10) into your branch? The benchmark 
title is updated and the latest OpenJDK on AWS is used.


---




[GitHub] spark issue #22486: [SPARK-25478][TEST] Refactor CompressionSchemeBenchmark ...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22486
  
Could you add `[SQL]` to the title?


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/96492/
Test FAILed.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/21688
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22531
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3392/
Test PASSed.


---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22531
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #21688: [SPARK-21809] : Change Stage Page to use datatables to s...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/21688
  
**[Test build #96492 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96492/testReport)** for PR 21688 at commit [`cbfbd07`](https://github.com/apache/spark/commit/cbfbd07a31960f6d49eb38c66d73ff776ecd0ffb).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22531
  
**[Test build #96493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/96493/testReport)** for PR 22531 at commit [`d138427`](https://github.com/apache/spark/commit/d138427f35d4980b263dfef21b8810fe455443ca).


---




[GitHub] spark issue #22531: [SPARK-25415][SQL][FOLLOW-UP] Add Locale.ROOT when toUpp...

2018-09-23 Thread wangyum
Github user wangyum commented on the issue:

https://github.com/apache/spark/pull/22531
  
retest this please


---




[GitHub] spark issue #22532: [SPARK-20845][SQL] Support specification of column names...

2018-09-23 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22532
  
Can one of the admins verify this patch?


---







[GitHub] spark pull request #22532: [SPARK-20845][SQL] Support specification of colum...

2018-09-23 Thread misutoth
GitHub user misutoth opened a pull request:

https://github.com/apache/spark/pull/22532

[SPARK-20845][SQL] Support specification of column names in INSERT INTO 
command.

## What changes were proposed in this pull request?

One can specify a list of columns for an INSERT INTO command. The columns 
shall be listed in parentheses immediately following the table name. Query columns 
are then matched positionally to that column list.

```
scala> sql("CREATE TABLE t (s string, i int)")
scala> sql("INSERT INTO t values ('first', 1)")
scala> sql("INSERT INTO t (i, s) values (2, 'second')")
scala> sql("SELECT * FROM t").show
+--+---+
| s|  i|
+--+---+
| first|  1|
|second|  2|
+--+---+


scala>
```

In the above example the _second_ insertion uses the new functionality. 
The integer and its associated string are given in reverse order, `(2, 'second')`, 
matching the column list specified for the table, `(i, s)`. The result can 
be seen at the end of the command list. Intermediate output of the commands is 
omitted for the sake of brevity.
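The same column-list semantics can be exercised outside Spark as well. A minimal, self-contained sketch using SQLite (a stand-in chosen here only because it ships with Python; it illustrates the standard SQL behavior the patch brings to Spark, with table and column names mirroring the example above):

```python
import sqlite3

# Illustration of standard SQL column-list INSERT semantics using SQLite,
# not Spark; mirrors the `t (s, i)` example from the PR description.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (s TEXT, i INTEGER)")
cur.execute("INSERT INTO t VALUES ('first', 1)")          # positional order (s, i)
cur.execute("INSERT INTO t (i, s) VALUES (2, 'second')")  # explicit column list, reversed
rows = cur.execute("SELECT s, i FROM t ORDER BY i").fetchall()
print(rows)  # [('first', 1), ('second', 2)]
```

Note how the second INSERT lists its values in `(i, s)` order, yet both rows come back with the string in `s` and the number in `i` — the values are matched to the named columns, not to the table's declared column order.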

## How was this patch tested?

InsertSuite (in both the source and hive sub-packages) was extended with 
tests exercising the specification of column-name lists in INSERT INTO commands.

Also ran the above sample, and ran tests in `sql`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/misutoth/spark insert-into-columns

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22532


commit 1dda672d336b906ecc133f468435b4cf38859e2d
Author: Mihaly Toth 
Date:   2018-03-20T06:13:01Z

[SPARK-20845][SQL] Support specification of column names in INSERT INTO 
command.




---




[GitHub] spark pull request #22486: [SPARK-25478][TEST] Refactor CompressionSchemeBen...

2018-09-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22486#discussion_r219711018
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/compression/CompressionSchemeBenchmark.scala ---
@@ -318,28 +229,17 @@ object CompressionSchemeBenchmark extends AllCompressionSchemes {
     }
     testData.rewind()

-    // Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
-    // STRING Encode:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    // ---
-    // PassThrough(1.000)            56 /   57     1197.9      0.8   1.0X
-    // RunLengthEncoding(0.893)    4892 / 4937       13.7     72.9   0.0X
-    // DictionaryEncoding(0.167)   2968 / 2992       22.6     44.2   0.0X
     runEncodeBenchmark("STRING Encode", iters, count, STRING, testData)
-
-    // Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
-    // STRING Decode:  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-    // ---
-    // PassThrough                 2422 / 2449       27.7     36.1   1.0X
-    // RunLengthEncoding           2885 / 3018       23.3     43.0   0.8X
-    // DictionaryEncoding          2716 / 2752       24.7     40.5   0.9X
     runDecodeBenchmark("STRING Decode", iters, count, STRING, testData)
   }

-  def main(args: Array[String]): Unit = {
-    bitEncodingBenchmark(1024)
-    shortEncodingBenchmark(1024)
-    intEncodingBenchmark(1024)
-    longEncodingBenchmark(1024)
-    stringEncodingBenchmark(1024)
+  override def benchmark(): Unit = {
+    runBenchmark("encoding benchmark") {
--- End diff --

How about `Compression Scheme Benchmark`?
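The overall shape of this refactor — individual benchmarks grouped under a named runner invoked from a single shared main method, instead of ad-hoc per-benchmark `main` functions — can be sketched generically. The sketch below is in Python for brevity; all class and method names are illustrative assumptions, not Spark's actual benchmark API:

```python
# Generic sketch of the "benchmark behind a shared main-method runner"
# pattern this refactor applies; names are illustrative, not Spark's API.
class BenchmarkBase:
    def benchmark(self):
        # Each concrete benchmark overrides this to register its groups.
        raise NotImplementedError

    def run_benchmark(self, name, body):
        # Print a header for the named group, then run the measured body.
        print(f"=== {name} ===")
        body()

    def main(self):
        # The single shared entry point every benchmark inherits.
        self.benchmark()


class CompressionSchemeBenchmarkSketch(BenchmarkBase):
    def benchmark(self):
        # One named group replaces the old list of per-scheme main() calls.
        self.run_benchmark("Compression Scheme Benchmark",
                           lambda: print("encode/decode benchmarks run here"))


CompressionSchemeBenchmarkSketch().main()
# prints:
# === Compression Scheme Benchmark ===
# encode/decode benchmarks run here
```

The benefit of the pattern is that the runner, not each benchmark, owns the entry point and output formatting, so results can be regenerated uniformly rather than maintained as hand-pasted comments like the ones removed in the diff above.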


---



