spark git commit: [SPARK-20614][PROJECT INFRA] Use the same log4j configuration with Jenkins in AppVeyor
Repository: spark Updated Branches: refs/heads/master 5d75b14bf -> b433acae7 [SPARK-20614][PROJECT INFRA] Use the same log4j configuration with Jenkins in AppVeyor ## What changes were proposed in this pull request? Currently, there are flooding logs in AppVeyor (in the console). This has been fine because we can download all the logs. However, (given my observations so far), logs are truncated when there are too many. It has been grown recently and it started to get truncated. For example, see https://ci.appveyor.com/project/ApacheSoftwareFoundation/spark/build/1209-master Even after the log is downloaded, it looks truncated as below: ``` [00:44:21] 17/05/04 18:56:18 INFO TaskSetManager: Finished task 197.0 in stage 601.0 (TID 9211) in 0 ms on localhost (executor driver) (194/200) [00:44:21] 17/05/04 18:56:18 INFO Executor: Running task 199.0 in stage 601.0 (TID 9213) [00:44:21] 17/05/04 18:56:18 INFO Executor: Finished task 198.0 in stage 601.0 (TID 9212). 2473 bytes result sent to driver ... ``` Probably, it looks better to use the same log4j configuration that we are using for SparkR tests in Jenkins(please see https://github.com/apache/spark/blob/fc472bddd1d9c6a28e57e31496c0166777af597e/R/run-tests.sh#L26 and https://github.com/apache/spark/blob/fc472bddd1d9c6a28e57e31496c0166777af597e/R/log4j.properties) ``` # Set everything to be logged to the file target/unit-tests.log log4j.rootCategory=INFO, file log4j.appender.file=org.apache.log4j.FileAppender log4j.appender.file.append=true log4j.appender.file.file=R/target/unit-tests.log log4j.appender.file.layout=org.apache.log4j.PatternLayout log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss.SSS} %t %p %c{1}: %m%n # Ignore messages below warning level from Jetty, because it's a bit verbose log4j.logger.org.eclipse.jetty=WARN org.eclipse.jetty.LEVEL=WARN ``` ## How was this patch tested? Manually tested with spark-test account - https://ci.appveyor.com/project/spark-test/spark/build/672-r-log4j (there is an example for flaky test here) - https://ci.appveyor.com/project/spark-test/spark/build/673-r-log4j (I re-ran the build). Author: hyukjinkwon Closes #17873 from HyukjinKwon/appveyor-reduce-logs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b433acae Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b433acae Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b433acae Branch: refs/heads/master Commit: b433acae74887e59f2e237a6284a4ae04fbbe854 Parents: 5d75b14 Author: hyukjinkwon Authored: Fri May 5 21:26:55 2017 -0700 Committer: Felix Cheung Committed: Fri May 5 21:26:55 2017 -0700 -- appveyor.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b433acae/appveyor.yml -- diff --git a/appveyor.yml b/appveyor.yml index bbb2758..4d31af7 100644 --- a/appveyor.yml +++ b/appveyor.yml @@ -49,7 +49,7 @@ build_script: - cmd: mvn -DskipTests -Psparkr -Phive -Phive-thriftserver package test_script: - - cmd: .\bin\spark-submit2.cmd --conf spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R + - cmd: .\bin\spark-submit2.cmd --driver-java-options "-Dlog4j.configuration=file:///%CD:\=/%/R/log4j.properties" --conf spark.hadoop.fs.defaultFS="file:///" R\pkg\tests\run-all.R notifications: - provider: Email - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20208][DOCS][FOLLOW-UP] Add FP-Growth to SparkR programming guide
Repository: spark Updated Branches: refs/heads/branch-2.2 1d9b7a74a -> 423a78625 [SPARK-20208][DOCS][FOLLOW-UP] Add FP-Growth to SparkR programming guide ## What changes were proposed in this pull request? Add `spark.fpGrowth` to SparkR programming guide. ## How was this patch tested? Manual tests. Author: zero323 Closes #17775 from zero323/SPARK-20208-FOLLOW-UP. (cherry picked from commit ba7666274e71f1903e5050a5e53fbdcd21debde5) Signed-off-by: Felix Cheung Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/423a7862 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/423a7862 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/423a7862 Branch: refs/heads/branch-2.2 Commit: 423a78625620523ab6a51b2274548a985fc18ed0 Parents: 1d9b7a7 Author: zero323 Authored: Thu Apr 27 00:34:20 2017 -0700 Committer: Felix Cheung Committed: Fri May 5 20:55:44 2017 -0700 -- docs/sparkr.md | 4 1 file changed, 4 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/423a7862/docs/sparkr.md -- diff --git a/docs/sparkr.md b/docs/sparkr.md index 4039520..18db4cb 100644 --- a/docs/sparkr.md +++ b/docs/sparkr.md @@ -476,6 +476,10 @@ SparkR supports the following machine learning algorithms currently: * [`spark.als`](api/R/spark.als.html): [`Alternating Least Squares (ALS)`](ml-collaborative-filtering.html#collaborative-filtering) + Frequent Pattern Mining + +* [`spark.fpGrowth`](api/R/spark.fpGrowth.html) : [`FP-growth`](ml-frequent-pattern-mining.html#fp-growth) + Statistics * [`spark.kstest`](api/R/spark.kstest.html): `Kolmogorov-Smirnov Test` - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
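The `spark.fpGrowth` wrapper documented above fronts the Scala estimator `org.apache.spark.ml.fpm.FPGrowth`. A minimal Scala sketch of the underlying API, using toy transactions and illustrative thresholds (neither is taken from the guide):

```scala
import org.apache.spark.ml.fpm.FPGrowth
import org.apache.spark.sql.SparkSession

object FPGrowthSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fpgrowth-sketch").getOrCreate()
    import spark.implicits._

    // Toy transactions; each row is one basket of items.
    val transactions = Seq(
      Seq("a", "b", "c"),
      Seq("a", "b"),
      Seq("a", "c"),
      Seq("b", "c")
    ).toDF("items")

    // Support and confidence thresholds are illustrative only.
    val model = new FPGrowth()
      .setItemsCol("items")
      .setMinSupport(0.5)
      .setMinConfidence(0.6)
      .fit(transactions)

    model.freqItemsets.show()      // frequent itemsets and their frequencies
    model.associationRules.show()  // antecedent, consequent, confidence

    spark.stop()
  }
}
```

The fitted model exposes `freqItemsets` and `associationRules` as DataFrames, which is what the SparkR wrapper builds on.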
spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch
Repository: spark Updated Branches: refs/heads/master b31648c08 -> 5d75b14bf [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch ## What changes were proposed in this pull request? Due to a likely typo, the logDebug msg printing the diff of query plans shows a diff to the initial plan, not diff to the start of batch. ## How was this patch tested? Now the debug message prints the diff between start and end of batch. Author: Juliusz Sompolski Closes #17875 from juliuszsompolski/SPARK-20616. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5d75b14b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5d75b14b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5d75b14b Branch: refs/heads/master Commit: 5d75b14bf0f4c1f0813287efaabf49797908ed55 Parents: b31648c Author: Juliusz Sompolski Authored: Fri May 5 15:31:06 2017 -0700 Committer: Reynold Xin Committed: Fri May 5 15:31:06 2017 -0700 -- .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5d75b14b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala index 6fc828f..85b368c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala @@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { logDebug( s""" |=== Result of Batch ${batch.name} === - |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")} + |${sideBySide(batchStartPlan.treeString, curPlan.treeString).mkString("\n")} """.stripMargin) } else { logTrace(s"Batch ${batch.name} has no effect.") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch
Repository: spark Updated Branches: refs/heads/branch-2.2 f59c74a94 -> 1d9b7a74a [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch ## What changes were proposed in this pull request? Due to a likely typo, the logDebug msg printing the diff of query plans shows a diff to the initial plan, not diff to the start of batch. ## How was this patch tested? Now the debug message prints the diff between start and end of batch. Author: Juliusz Sompolski Closes #17875 from juliuszsompolski/SPARK-20616. (cherry picked from commit 5d75b14bf0f4c1f0813287efaabf49797908ed55) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1d9b7a74 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1d9b7a74 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1d9b7a74 Branch: refs/heads/branch-2.2 Commit: 1d9b7a74a839021814ab28d3eba3636c64483130 Parents: f59c74a Author: Juliusz Sompolski Authored: Fri May 5 15:31:06 2017 -0700 Committer: Reynold Xin Committed: Fri May 5 15:31:13 2017 -0700 -- .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1d9b7a74/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala index 6fc828f..85b368c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala @@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { logDebug( s""" |=== Result of Batch ${batch.name} === - |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")} + |${sideBySide(batchStartPlan.treeString, curPlan.treeString).mkString("\n")} """.stripMargin) } else { logTrace(s"Batch ${batch.name} has no effect.") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch
Repository: spark Updated Branches: refs/heads/branch-2.1 704b249b6 -> a1112c615 [SPARK-20616] RuleExecutor logDebug of batch results should show diff to start of batch ## What changes were proposed in this pull request? Due to a likely typo, the logDebug msg printing the diff of query plans shows a diff to the initial plan, not diff to the start of batch. ## How was this patch tested? Now the debug message prints the diff between start and end of batch. Author: Juliusz Sompolski Closes #17875 from juliuszsompolski/SPARK-20616. (cherry picked from commit 5d75b14bf0f4c1f0813287efaabf49797908ed55) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a1112c61 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a1112c61 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a1112c61 Branch: refs/heads/branch-2.1 Commit: a1112c615b05d615048159c9d324aa10a4391d4e Parents: 704b249 Author: Juliusz Sompolski Authored: Fri May 5 15:31:06 2017 -0700 Committer: Reynold Xin Committed: Fri May 5 15:31:23 2017 -0700 -- .../scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a1112c61/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala index 6fc828f..85b368c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/rules/RuleExecutor.scala @@ -122,7 +122,7 @@ abstract class RuleExecutor[TreeType <: TreeNode[_]] extends Logging { logDebug( s""" |=== Result of Batch ${batch.name} === - |${sideBySide(plan.treeString, curPlan.treeString).mkString("\n")} + |${sideBySide(batchStartPlan.treeString, curPlan.treeString).mkString("\n")} """.stripMargin) } else { logTrace(s"Batch ${batch.name} has no effect.") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
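The `sideBySide` helper used in the diff renders two plan strings next to each other so changed lines stand out. A self-contained sketch of the idea (not Spark's actual implementation) showing why diffing against the plan captured at the start of the batch isolates that batch's effect, rather than replaying every change made by earlier batches:

```scala
// Minimal stand-in for catalyst's sideBySide helper: pads the left column
// and flags lines that differ with '!'. Illustration only, not Spark's code.
object SideBySideSketch {
  def sideBySide(left: String, right: String): Seq[String] = {
    val leftLines  = left.split("\n").toSeq
    val rightLines = right.split("\n").toSeq
    val width = leftLines.map(_.length).max
    leftLines.zipAll(rightLines, "", "").map { case (l, r) =>
      val marker = if (l == r) " " else "!"
      marker + l.padTo(width, ' ') + "   " + r
    }
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical plan strings: the batch's only effect is removing the Filter.
    val batchStartPlan = "Project [a]\n+- Filter (a > 1)\n   +- Relation t"
    val curPlan        = "Project [a]\n+- Relation t"

    println("=== Result of Batch Operator Optimizations ===")
    sideBySide(batchStartPlan, curPlan).foreach(println)
  }
}
```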
spark git commit: [SPARK-20132][DOCS] Add documentation for column string functions
Repository: spark Updated Branches: refs/heads/branch-2.2 24fffacad -> f59c74a94 [SPARK-20132][DOCS] Add documentation for column string functions ## What changes were proposed in this pull request? Add docstrings to column.py for the Column functions `rlike`, `like`, `startswith`, and `endswith`. Pass these docstrings through `_bin_op` There may be a better place to put the docstrings. I put them immediately above the Column class. ## How was this patch tested? I ran `make html` on my local computer to remake the documentation, and verified that the html pages were displaying the docstrings correctly. I tried running `dev-tests`, and the formatting tests passed. However, my mvn build didn't work I think due to issues on my computer. These docstrings are my original work and free license. davies has done the most recent work reorganizing `_bin_op` Author: Michael Patterson Closes #17469 from map222/patterson-documentation. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f59c74a9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f59c74a9 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f59c74a9 Branch: refs/heads/branch-2.2 Commit: f59c74a9460b0db4e6c3ecbe872e2eaadc43e2cc Parents: 24fffac Author: Michael Patterson Authored: Sat Apr 22 19:58:54 2017 -0700 Committer: Cheng Lian Committed: Fri May 5 13:26:49 2017 -0700 -- python/pyspark/sql/column.py | 70 +++ 1 file changed, 64 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f59c74a9/python/pyspark/sql/column.py -- diff --git a/python/pyspark/sql/column.py b/python/pyspark/sql/column.py index ec05c18..46c1707 100644 --- a/python/pyspark/sql/column.py +++ b/python/pyspark/sql/column.py @@ -250,11 +250,50 @@ class Column(object): raise TypeError("Column is not iterable") # string methods +_rlike_doc = """ +Return a Boolean :class:`Column` based on a regex match. + +:param other: an extended regex expression + +>>> df.filter(df.name.rlike('ice$')).collect() +[Row(age=2, name=u'Alice')] +""" +_like_doc = """ +Return a Boolean :class:`Column` based on a SQL LIKE match. + +:param other: a SQL LIKE pattern + +See :func:`rlike` for a regex version + +>>> df.filter(df.name.like('Al%')).collect() +[Row(age=2, name=u'Alice')] +""" +_startswith_doc = """ +Return a Boolean :class:`Column` based on a string match. + +:param other: string at end of line (do not use a regex `^`) + +>>> df.filter(df.name.startswith('Al')).collect() +[Row(age=2, name=u'Alice')] +>>> df.filter(df.name.startswith('^Al')).collect() +[] +""" +_endswith_doc = """ +Return a Boolean :class:`Column` based on matching end of string. 
+ +:param other: string at end of line (do not use a regex `$`) + +>>> df.filter(df.name.endswith('ice')).collect() +[Row(age=2, name=u'Alice')] +>>> df.filter(df.name.endswith('ice$')).collect() +[] +""" + contains = _bin_op("contains") -rlike = _bin_op("rlike") -like = _bin_op("like") -startswith = _bin_op("startsWith") -endswith = _bin_op("endsWith") +rlike = ignore_unicode_prefix(_bin_op("rlike", _rlike_doc)) +like = ignore_unicode_prefix(_bin_op("like", _like_doc)) +startswith = ignore_unicode_prefix(_bin_op("startsWith", _startswith_doc)) +endswith = ignore_unicode_prefix(_bin_op("endsWith", _endswith_doc)) @ignore_unicode_prefix @since(1.3) @@ -303,8 +342,27 @@ class Column(object): desc = _unary_op("desc", "Returns a sort expression based on the" " descending order of the given column name.") -isNull = _unary_op("isNull", "True if the current expression is null.") -isNotNull = _unary_op("isNotNull", "True if the current expression is not null.") +_isNull_doc = """ +True if the current expression is null. Often combined with +:func:`DataFrame.filter` to select rows with null values. + +>>> from pyspark.sql import Row +>>> df2 = sc.parallelize([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]).toDF() +>>> df2.filter(df2.height.isNull()).collect() +[Row(height=None, name=u'Alice')] +""" +_isNotNull_doc = """ +True if the current expression is null. Often combined with +:func:`DataFrame.filter` to select rows with non-null values. + +>>> from pyspark.sql import Row +>>> df2 = sc.parallelize([Row(name=u'Tom', height=80), Row(name=u'Alice', height=None)]).toDF() +>>> df2.filter(df2.height.isNotNull()).collect() +[Row(height=80, name=u'Tom')] +""" + +isNull = ignore_unicode_prefix(_unary_op
spark git commit: [SPARK-20557][SQL] Support for db column type TIMESTAMP WITH TIME ZONE
Repository: spark Updated Branches: refs/heads/master bd5788287 -> b31648c08 [SPARK-20557][SQL] Support for db column type TIMESTAMP WITH TIME ZONE ## What changes were proposed in this pull request? SparkSQL can now read from a database table with column type [TIMESTAMP WITH TIME ZONE](https://docs.oracle.com/javase/8/docs/api/java/sql/Types.html#TIMESTAMP_WITH_TIMEZONE). ## How was this patch tested? Tested against Oracle database. JoshRosen, you seem to know the class, would you look at this? Thanks! Author: Jannik Arndt Closes #17832 from JannikArndt/spark-20557-timestamp-with-timezone. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b31648c0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b31648c0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b31648c0 Branch: refs/heads/master Commit: b31648c081e8db34e0d6c71875318f7b0b047c8b Parents: bd57882 Author: Jannik Arndt Authored: Fri May 5 11:42:55 2017 -0700 Committer: Xiao Li Committed: Fri May 5 11:42:55 2017 -0700 -- .../apache/spark/sql/jdbc/OracleIntegrationSuite.scala | 13 + .../sql/execution/datasources/jdbc/JdbcUtils.scala | 3 +++ 2 files changed, 16 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b31648c0/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala -- diff --git a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala index 1bb89a3..85d4a4a 100644 --- a/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala +++ b/external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala @@ -70,6 +70,12 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo """.stripMargin.replaceAll("\n", " ")).executeUpdate() conn.commit() +conn.prepareStatement("CREATE TABLE ts_with_timezone (id NUMBER(10), t TIMESTAMP WITH TIME ZONE)") +.executeUpdate() +conn.prepareStatement("INSERT INTO ts_with_timezone VALUES (1, to_timestamp_tz('1999-12-01 11:00:00 UTC','-MM-DD HH:MI:SS TZR'))") +.executeUpdate() +conn.commit() + sql( s""" |CREATE TEMPORARY VIEW datetime @@ -185,4 +191,11 @@ class OracleIntegrationSuite extends DockerJDBCIntegrationSuite with SharedSQLCo sql("INSERT INTO TABLE datetime1 SELECT * FROM datetime where id = 1") checkRow(sql("SELECT * FROM datetime1 where id = 1").head()) } + + test("SPARK-20557: column type TIMEZONE with TIME STAMP should be recognized") { +val dfRead = sqlContext.read.jdbc(jdbcUrl, "ts_with_timezone", new Properties) +val rows = dfRead.collect() +val types = rows(0).toSeq.map(x => x.getClass.toString) +assert(types(1).equals("class java.sql.Timestamp")) + } } http://git-wip-us.apache.org/repos/asf/spark/blob/b31648c0/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala index 0183805..fb877d1 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala @@ -223,6 +223,9 @@ object JdbcUtils extends Logging { case java.sql.Types.STRUCT=> 
StringType case java.sql.Types.TIME => TimestampType case java.sql.Types.TIMESTAMP => TimestampType + case java.sql.Types.TIMESTAMP_WITH_TIMEZONE +=> TimestampType + case -101 => TimestampType // Value for Timestamp with Time Zone in Oracle case java.sql.Types.TINYINT => IntegerType case java.sql.Types.VARBINARY => BinaryType case java.sql.Types.VARCHAR => StringType - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
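With the new mapping, a TIMESTAMP WITH TIME ZONE column is read back as Spark's TimestampType like any other timestamp. A sketch of reading the integration test's `ts_with_timezone` table over JDBC; the connection URL and credentials are placeholders, and an Oracle JDBC driver is assumed to be on the classpath:

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

object ReadTstzSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tstz-read").getOrCreate()

    // Placeholder connection details; substitute a real Oracle URL and credentials.
    val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/service"
    val props = new Properties()
    props.put("user", "scott")
    props.put("password", "tiger")

    // ts_with_timezone is the table created in the integration test above:
    //   (id NUMBER(10), t TIMESTAMP WITH TIME ZONE)
    val df = spark.read.jdbc(jdbcUrl, "ts_with_timezone", props)

    df.printSchema()          // column t is inferred as TimestampType after SPARK-20557
    df.show(truncate = false)

    spark.stop()
  }
}
```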
spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load
Repository: spark Updated Branches: refs/heads/branch-2.1 2a7f5dae5 -> 704b249b6 [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load ## What changes were proposed in this pull request? I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the topic creation happens in the first test `deserialization of initial offset with Spark 2.1.0` and causes it timeout. This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #17863 from zsxwing/fix-kafka-flaky-test. (cherry picked from commit bd5788287957d8610a6d19c273b75bd4cdd2d166) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/704b249b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/704b249b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/704b249b Branch: refs/heads/branch-2.1 Commit: 704b249b6a3ea956086d6c6ef50da18e8228eeb4 Parents: 2a7f5da Author: Shixiong Zhu Authored: Fri May 5 11:08:26 2017 -0700 Committer: Shixiong Zhu Committed: Fri May 5 11:08:43 2017 -0700 -- .../test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/704b249b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala -- diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala index c2cbd86..4345f88 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala @@ -281,6 +281,7 @@ class KafkaTestUtils(withBrokerProps: Map[String, Object] = Map.empty) extends L props.put("log.flush.interval.messages", "1") props.put("replica.socket.timeout.ms", "1500") props.put("delete.topic.enable", "true") +props.put("offsets.topic.num.partitions", "1") props.putAll(withBrokerProps.asJava) props } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load
Repository: spark Updated Branches: refs/heads/branch-2.2 f71aea6a0 -> 24fffacad [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load ## What changes were proposed in this pull request? I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the topic creation happens in the first test `deserialization of initial offset with Spark 2.1.0` and causes it timeout. This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #17863 from zsxwing/fix-kafka-flaky-test. (cherry picked from commit bd5788287957d8610a6d19c273b75bd4cdd2d166) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/24fffaca Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/24fffaca Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/24fffaca Branch: refs/heads/branch-2.2 Commit: 24fffacad709c553e0f24ae12a8cca3ab980af3c Parents: f71aea6 Author: Shixiong Zhu Authored: Fri May 5 11:08:26 2017 -0700 Committer: Shixiong Zhu Committed: Fri May 5 11:08:32 2017 -0700 -- .../test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/24fffaca/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala -- diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala index 2ce2760..f86b8f5 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala @@ -292,6 +292,7 @@ class KafkaTestUtils(withBrokerProps: Map[String, Object] = Map.empty) extends L props.put("log.flush.interval.messages", "1") props.put("replica.socket.timeout.ms", "1500") props.put("delete.topic.enable", "true") +props.put("offsets.topic.num.partitions", "1") props.putAll(withBrokerProps.asJava) props } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load
Repository: spark Updated Branches: refs/heads/master 41439fd52 -> bd5788287 [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load ## What changes were proposed in this pull request? I checked the logs of https://amplab.cs.berkeley.edu/jenkins/job/spark-branch-2.2-test-maven-hadoop-2.7/47/ and found it took several seconds to create Kafka internal topic `__consumer_offsets`. As Kafka creates this topic lazily, the topic creation happens in the first test `deserialization of initial offset with Spark 2.1.0` and causes it timeout. This PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #17863 from zsxwing/fix-kafka-flaky-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bd578828 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bd578828 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bd578828 Branch: refs/heads/master Commit: bd5788287957d8610a6d19c273b75bd4cdd2d166 Parents: 41439fd Author: Shixiong Zhu Authored: Fri May 5 11:08:26 2017 -0700 Committer: Shixiong Zhu Committed: Fri May 5 11:08:26 2017 -0700 -- .../test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bd578828/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala -- diff --git a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala index 2ce2760..f86b8f5 100644 --- a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala +++ b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala @@ -292,6 +292,7 @@ class KafkaTestUtils(withBrokerProps: Map[String, Object] = Map.empty) extends L props.put("log.flush.interval.messages", "1") props.put("replica.socket.timeout.ms", "1500") props.put("delete.topic.enable", "true") +props.put("offsets.topic.num.partitions", "1") props.putAll(withBrokerProps.asJava) props } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
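The `withBrokerProps` constructor parameter shown in the diff merges extra properties into the embedded broker's configuration, so a suite can also pass overrides explicitly instead of relying on the defaults. A sketch only: `KafkaTestUtils` is an internal helper in the kafka-0-10-sql test module rather than a public API, and the `setup()`/`teardown()` lifecycle calls are assumed from how the existing suites use it:

```scala
import org.apache.spark.sql.kafka010.KafkaTestUtils

object KafkaBrokerPropsSketch {
  def main(args: Array[String]): Unit = {
    // These properties are merged into the embedded broker's config on top of
    // the defaults set in the diff above.
    val brokerProps: Map[String, Object] = Map(
      "offsets.topic.num.partitions" -> "1",      // keep __consumer_offsets to one partition
      "offsets.topic.replication.factor" -> "1"   // single-broker test cluster
    )

    val testUtils = new KafkaTestUtils(brokerProps)
    testUtils.setup()   // assumed lifecycle: starts embedded ZooKeeper and Kafka
    try {
      // ... create topics, produce messages, and run the streaming query under test ...
    } finally {
      testUtils.teardown()
    }
  }
}
```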
spark git commit: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ObjectHashAggregateExec
Repository: spark Updated Branches: refs/heads/master b9ad2d191 -> 41439fd52 [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ObjectHashAggregateExec ## What changes were proposed in this pull request? ObjectHashAggregateExec is missing numOutputRows, add this metrics for it. ## How was this patch tested? Added unit tests for the new metrics. Author: Yucai Closes #17678 from yucai/objectAgg_numOutputRows. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/41439fd5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/41439fd5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/41439fd5 Branch: refs/heads/master Commit: 41439fd52dd263b9f7d92e608f027f193f461777 Parents: b9ad2d1 Author: Yucai Authored: Fri May 5 09:51:57 2017 -0700 Committer: Xiao Li Committed: Fri May 5 09:51:57 2017 -0700 -- .../aggregate/ObjectAggregationIterator.scala | 8 ++-- .../aggregate/ObjectHashAggregateExec.scala | 3 ++- .../sql/execution/metric/SQLMetricsSuite.scala| 18 ++ 3 files changed, 26 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/41439fd5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala index 3a7fcf1..6e47f9d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate._ import org.apache.spark.sql.catalyst.expressions.codegen.{BaseOrdering, GenerateOrdering} import org.apache.spark.sql.execution.UnsafeKVExternalSorter +import org.apache.spark.sql.execution.metric.SQLMetric import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.StructType import org.apache.spark.unsafe.KVIterator @@ -39,7 +40,8 @@ class ObjectAggregationIterator( newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection, originalInputAttributes: Seq[Attribute], inputRows: Iterator[InternalRow], -fallbackCountThreshold: Int) +fallbackCountThreshold: Int, +numOutputRows: SQLMetric) extends AggregationIterator( groupingExpressions, originalInputAttributes, @@ -83,7 +85,9 @@ class ObjectAggregationIterator( override final def next(): UnsafeRow = { val entry = aggBufferIterator.next() -generateOutput(entry.groupingKey, entry.aggregationBuffer) +val res = generateOutput(entry.groupingKey, entry.aggregationBuffer) +numOutputRows += 1 +res } /** http://git-wip-us.apache.org/repos/asf/spark/blob/41439fd5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala index 3fcb7ec..b53521b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala @@ -117,7 +117,8 @@ case class ObjectHashAggregateExec( newMutableProjection(expressions, inputSchema, 
subexpressionEliminationEnabled), child.output, iter, -fallbackCountThreshold) +fallbackCountThreshold, +numOutputRows) if (!hasInput && groupingExpressions.isEmpty) { numOutputRows += 1 Iterator.single[UnsafeRow](aggregationIterator.outputForEmptyGroupingKeyWithoutInput()) http://git-wip-us.apache.org/repos/asf/spark/blob/41439fd5/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala index 2ce7db6..e5442455 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala @@ -143,6 +143,24 @@ class SQLMetricsSuite extends Sp
spark git commit: [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ObjectHashAggregateExec
Repository: spark Updated Branches: refs/heads/branch-2.2 1fa3c86a7 -> f71aea6a0 [SPARK-20381][SQL] Add SQL metrics of numOutputRows for ObjectHashAggregateExec ## What changes were proposed in this pull request? ObjectHashAggregateExec is missing numOutputRows, add this metrics for it. ## How was this patch tested? Added unit tests for the new metrics. Author: Yucai Closes #17678 from yucai/objectAgg_numOutputRows. (cherry picked from commit 41439fd52dd263b9f7d92e608f027f193f461777) Signed-off-by: Xiao Li Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f71aea6a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f71aea6a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f71aea6a Branch: refs/heads/branch-2.2 Commit: f71aea6a0be6eda24623d8563d971687ecd04caf Parents: 1fa3c86 Author: Yucai Authored: Fri May 5 09:51:57 2017 -0700 Committer: Xiao Li Committed: Fri May 5 09:52:10 2017 -0700 -- .../aggregate/ObjectAggregationIterator.scala | 8 ++-- .../aggregate/ObjectHashAggregateExec.scala | 3 ++- .../sql/execution/metric/SQLMetricsSuite.scala| 18 ++ 3 files changed, 26 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f71aea6a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala index 3a7fcf1..6e47f9d 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala @@ -24,6 +24,7 @@ import org.apache.spark.sql.catalyst.expressions._ import org.apache.spark.sql.catalyst.expressions.aggregate._ import org.apache.spark.sql.catalyst.expressions.codegen.{BaseOrdering, GenerateOrdering} import org.apache.spark.sql.execution.UnsafeKVExternalSorter +import org.apache.spark.sql.execution.metric.SQLMetric import org.apache.spark.sql.internal.SQLConf import org.apache.spark.sql.types.StructType import org.apache.spark.unsafe.KVIterator @@ -39,7 +40,8 @@ class ObjectAggregationIterator( newMutableProjection: (Seq[Expression], Seq[Attribute]) => MutableProjection, originalInputAttributes: Seq[Attribute], inputRows: Iterator[InternalRow], -fallbackCountThreshold: Int) +fallbackCountThreshold: Int, +numOutputRows: SQLMetric) extends AggregationIterator( groupingExpressions, originalInputAttributes, @@ -83,7 +85,9 @@ class ObjectAggregationIterator( override final def next(): UnsafeRow = { val entry = aggBufferIterator.next() -generateOutput(entry.groupingKey, entry.aggregationBuffer) +val res = generateOutput(entry.groupingKey, entry.aggregationBuffer) +numOutputRows += 1 +res } /** http://git-wip-us.apache.org/repos/asf/spark/blob/f71aea6a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala index 3fcb7ec..b53521b 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala @@ -117,7 +117,8 @@ case class 
ObjectHashAggregateExec( newMutableProjection(expressions, inputSchema, subexpressionEliminationEnabled), child.output, iter, -fallbackCountThreshold) +fallbackCountThreshold, +numOutputRows) if (!hasInput && groupingExpressions.isEmpty) { numOutputRows += 1 Iterator.single[UnsafeRow](aggregationIterator.outputForEmptyGroupingKeyWithoutInput()) http://git-wip-us.apache.org/repos/asf/spark/blob/f71aea6a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala index 2ce7db6..e5442455 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsSuite.scala +++ b/sql/core/src/test/scala/org/apache/
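A sketch of how the new metric can be observed: run a typed (object-based) aggregate such as `collect_list`, which is planned as `ObjectHashAggregateExec` when object hash aggregation is enabled (the default in 2.2 via `spark.sql.execution.useObjectHashAggregateExec`), then read `numOutputRows` back through internal plan APIs. The plan traversal below is for illustration only:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec
import org.apache.spark.sql.functions.collect_list

object ObjectAggMetricSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("object-agg-metric").getOrCreate()
    import spark.implicits._

    // collect_list is a typed aggregate, so the plan contains ObjectHashAggregateExec.
    val df = Seq((1, "a"), (1, "b"), (2, "c")).toDF("k", "v")
      .groupBy("k")
      .agg(collect_list("v"))

    df.collect()   // run the query so the SQL metrics get populated

    // Internal plan APIs, used here only to read the new metric back.
    val numOutputRows = df.queryExecution.executedPlan.collectFirst {
      case agg: ObjectHashAggregateExec => agg.metrics("numOutputRows").value
    }
    println(s"ObjectHashAggregateExec numOutputRows = $numOutputRows")   // expect Some(2): two groups

    spark.stop()
  }
}
```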
spark git commit: [SPARK-20613] Remove excess quotes in Windows executable
Repository: spark Updated Branches: refs/heads/branch-2.1 179f5370e -> 2a7f5dae5 [SPARK-20613] Remove excess quotes in Windows executable ## What changes were proposed in this pull request? Quotes are already added to the RUNNER variable on line 54. There is no need to put quotes on line 67. If you do, you will get an error when launching Spark. '""C:\Program' is not recognized as an internal or external command, operable program or batch file. ## How was this patch tested? Tested manually on Windows 10. Author: Jarrett Meyer Closes #17861 from jarrettmeyer/fix-windows-cmd. (cherry picked from commit b9ad2d1916af5091c8585d06ccad8219e437e2bc) Signed-off-by: Felix Cheung Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2a7f5dae Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2a7f5dae Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2a7f5dae Branch: refs/heads/branch-2.1 Commit: 2a7f5dae5906f01973d2744ab7b59b54e42f302b Parents: 179f537 Author: Jarrett Meyer Authored: Fri May 5 08:30:42 2017 -0700 Committer: Felix Cheung Committed: Fri May 5 08:35:19 2017 -0700 -- bin/spark-class2.cmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2a7f5dae/bin/spark-class2.cmd -- diff --git a/bin/spark-class2.cmd b/bin/spark-class2.cmd index 9faa7d6..f6157f4 100644 --- a/bin/spark-class2.cmd +++ b/bin/spark-class2.cmd @@ -51,7 +51,7 @@ if not "x%SPARK_PREPEND_CLASSES%"=="x" ( rem Figure out where java is. set RUNNER=java if not "x%JAVA_HOME%"=="x" ( - set RUNNER="%JAVA_HOME%\bin\java" + set RUNNER=%JAVA_HOME%\bin\java ) else ( where /q "%RUNNER%" if ERRORLEVEL 1 ( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20613] Remove excess quotes in Windows executable
Repository: spark Updated Branches: refs/heads/branch-2.2 dbb54a7b3 -> 1fa3c86a7 [SPARK-20613] Remove excess quotes in Windows executable ## What changes were proposed in this pull request? Quotes are already added to the RUNNER variable on line 54. There is no need to put quotes on line 67. If you do, you will get an error when launching Spark. '""C:\Program' is not recognized as an internal or external command, operable program or batch file. ## How was this patch tested? Tested manually on Windows 10. Author: Jarrett Meyer Closes #17861 from jarrettmeyer/fix-windows-cmd. (cherry picked from commit b9ad2d1916af5091c8585d06ccad8219e437e2bc) Signed-off-by: Felix Cheung Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1fa3c86a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1fa3c86a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1fa3c86a Branch: refs/heads/branch-2.2 Commit: 1fa3c86a740e072957a2104dbd02ca3c158c508d Parents: dbb54a7 Author: Jarrett Meyer Authored: Fri May 5 08:30:42 2017 -0700 Committer: Felix Cheung Committed: Fri May 5 08:30:56 2017 -0700 -- bin/spark-class2.cmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1fa3c86a/bin/spark-class2.cmd -- diff --git a/bin/spark-class2.cmd b/bin/spark-class2.cmd index 9faa7d6..f6157f4 100644 --- a/bin/spark-class2.cmd +++ b/bin/spark-class2.cmd @@ -51,7 +51,7 @@ if not "x%SPARK_PREPEND_CLASSES%"=="x" ( rem Figure out where java is. set RUNNER=java if not "x%JAVA_HOME%"=="x" ( - set RUNNER="%JAVA_HOME%\bin\java" + set RUNNER=%JAVA_HOME%\bin\java ) else ( where /q "%RUNNER%" if ERRORLEVEL 1 ( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20613] Remove excess quotes in Windows executable
Repository: spark Updated Branches: refs/heads/master 9064f1b04 -> b9ad2d191 [SPARK-20613] Remove excess quotes in Windows executable ## What changes were proposed in this pull request? Quotes are already added to the RUNNER variable on line 54. There is no need to put quotes on line 67. If you do, you will get an error when launching Spark. '""C:\Program' is not recognized as an internal or external command, operable program or batch file. ## How was this patch tested? Tested manually on Windows 10. Author: Jarrett Meyer Closes #17861 from jarrettmeyer/fix-windows-cmd. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b9ad2d19 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b9ad2d19 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b9ad2d19 Branch: refs/heads/master Commit: b9ad2d1916af5091c8585d06ccad8219e437e2bc Parents: 9064f1b Author: Jarrett Meyer Authored: Fri May 5 08:30:42 2017 -0700 Committer: Felix Cheung Committed: Fri May 5 08:30:42 2017 -0700 -- bin/spark-class2.cmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b9ad2d19/bin/spark-class2.cmd -- diff --git a/bin/spark-class2.cmd b/bin/spark-class2.cmd index 9faa7d6..f6157f4 100644 --- a/bin/spark-class2.cmd +++ b/bin/spark-class2.cmd @@ -51,7 +51,7 @@ if not "x%SPARK_PREPEND_CLASSES%"=="x" ( rem Figure out where java is. set RUNNER=java if not "x%JAVA_HOME%"=="x" ( - set RUNNER="%JAVA_HOME%\bin\java" + set RUNNER=%JAVA_HOME%\bin\java ) else ( where /q "%RUNNER%" if ERRORLEVEL 1 ( - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20495][SQL][CORE] Add StorageLevel to cacheTable API
Repository: spark Updated Branches: refs/heads/master 5773ab121 -> 9064f1b04 [SPARK-20495][SQL][CORE] Add StorageLevel to cacheTable API ## What changes were proposed in this pull request? Currently cacheTable API only supports MEMORY_AND_DISK. This PR adds additional API to take different storage levels. ## How was this patch tested? unit tests Author: madhu Closes #17802 from phatak-dev/cacheTableAPI. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9064f1b0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9064f1b0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9064f1b0 Branch: refs/heads/master Commit: 9064f1b04461513a147aeb8179471b05595ddbc4 Parents: 5773ab1 Author: madhu Authored: Fri May 5 22:44:03 2017 +0800 Committer: Wenchen Fan Committed: Fri May 5 22:44:03 2017 +0800 -- project/MimaExcludes.scala| 2 ++ .../scala/org/apache/spark/sql/catalog/Catalog.scala | 14 +- .../org/apache/spark/sql/internal/CatalogImpl.scala | 13 + .../org/apache/spark/sql/internal/CatalogSuite.scala | 8 4 files changed, 36 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9064f1b0/project/MimaExcludes.scala -- diff --git a/project/MimaExcludes.scala b/project/MimaExcludes.scala index dbf933f..d50882c 100644 --- a/project/MimaExcludes.scala +++ b/project/MimaExcludes.scala @@ -36,6 +36,8 @@ object MimaExcludes { // Exclude rules for 2.3.x lazy val v23excludes = v22excludes ++ Seq( +// [SPARK-20495][SQL] Add StorageLevel to cacheTable API + ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.sql.catalog.Catalog.cacheTable") ) // Exclude rules for 2.2.x http://git-wip-us.apache.org/repos/asf/spark/blob/9064f1b0/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala b/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala index 7e5da01..ab81725 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala @@ -22,7 +22,7 @@ import scala.collection.JavaConverters._ import org.apache.spark.annotation.{Experimental, InterfaceStability} import org.apache.spark.sql.{AnalysisException, DataFrame, Dataset} import org.apache.spark.sql.types.StructType - +import org.apache.spark.storage.StorageLevel /** * Catalog interface for Spark. To access this, use `SparkSession.catalog`. @@ -477,6 +477,18 @@ abstract class Catalog { def cacheTable(tableName: String): Unit /** + * Caches the specified table with the given storage level. + * + * @param tableName is either a qualified or unqualified name that designates a table/view. + * If no database identifier is provided, it refers to a temporary view or + * a table/view in the current database. + * @param storageLevel storage level to cache table. + * @since 2.3.0 + */ + def cacheTable(tableName: String, storageLevel: StorageLevel): Unit + + + /** * Removes the specified table from the in-memory cache. * * @param tableName is either a qualified or unqualified name that designates a table/view. 
http://git-wip-us.apache.org/repos/asf/spark/blob/9064f1b0/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala index 0b8e538..e1049c6 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/internal/CatalogImpl.scala @@ -30,6 +30,8 @@ import org.apache.spark.sql.catalyst.plans.logical.LocalRelation import org.apache.spark.sql.execution.command.AlterTableRecoverPartitionsCommand import org.apache.spark.sql.execution.datasources.{CreateTable, DataSource} import org.apache.spark.sql.types.StructType +import org.apache.spark.storage.StorageLevel + /** @@ -420,6 +422,17 @@ class CatalogImpl(sparkSession: SparkSession) extends Catalog { } /** + * Caches the specified table or view with the given storage level. + * + * @group cachemgmt + * @since 2.3.0 + */ + override def cacheTable(tableName: String, storageLevel: StorageLevel): Unit = { +sparkSession.sharedState.cacheManager.cacheQuery( + sparkSession.table(tableName), Some(tableName), storageLevel) + } +
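A short usage sketch of the new overload, available once the `@since 2.3.0` API ships; the table name and storage level below are illustrative:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CacheTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("cache-table").getOrCreate()
    import spark.implicits._

    Seq((1, "a"), (2, "b")).toDF("id", "name").createOrReplaceTempView("people")

    // Existing API: caches with the default storage level (MEMORY_AND_DISK).
    spark.catalog.cacheTable("people")
    spark.catalog.uncacheTable("people")

    // New overload added by SPARK-20495: cache with an explicit storage level.
    spark.catalog.cacheTable("people", StorageLevel.MEMORY_ONLY)

    spark.sql("SELECT * FROM people").show()
    spark.stop()
  }
}
```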
spark git commit: [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode
Repository: spark Updated Branches: refs/heads/branch-2.1 d10b0f654 -> 179f5370e [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode ## What changes were proposed in this pull request? Updated spark-class to turn off posix mode so the process substitution doesn't cause a syntax error. ## How was this patch tested? Existing unit tests, manual spark-shell testing with posix mode on Author: jyu00 Closes #17852 from jyu00/master. (cherry picked from commit 5773ab121d5d7cbefeef17ff4ac6f8af36cc1251) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/179f5370 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/179f5370 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/179f5370 Branch: refs/heads/branch-2.1 Commit: 179f5370e68aa3c1f035f8ac400129c3935e96f8 Parents: d10b0f6 Author: jyu00 Authored: Fri May 5 11:36:51 2017 +0100 Committer: Sean Owen Committed: Fri May 5 11:37:10 2017 +0100 -- bin/spark-class | 2 ++ 1 file changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/179f5370/bin/spark-class -- diff --git a/bin/spark-class b/bin/spark-class index 77ea40c..65d3b96 100755 --- a/bin/spark-class +++ b/bin/spark-class @@ -72,6 +72,8 @@ build_command() { printf "%d\0" $? } +# Turn off posix mode since it does not allow process substitution +set +o posix CMD=() while IFS= read -d '' -r ARG; do CMD+=("$ARG") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode
Repository: spark Updated Branches: refs/heads/master 37cdf077c -> 5773ab121 [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode ## What changes were proposed in this pull request? Updated spark-class to turn off posix mode so the process substitution doesn't cause a syntax error. ## How was this patch tested? Existing unit tests, manual spark-shell testing with posix mode on Author: jyu00 Closes #17852 from jyu00/master. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5773ab12 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5773ab12 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5773ab12 Branch: refs/heads/master Commit: 5773ab121d5d7cbefeef17ff4ac6f8af36cc1251 Parents: 37cdf07 Author: jyu00 Authored: Fri May 5 11:36:51 2017 +0100 Committer: Sean Owen Committed: Fri May 5 11:36:51 2017 +0100 -- bin/spark-class | 2 ++ 1 file changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5773ab12/bin/spark-class -- diff --git a/bin/spark-class b/bin/spark-class index 77ea40c..65d3b96 100755 --- a/bin/spark-class +++ b/bin/spark-class @@ -72,6 +72,8 @@ build_command() { printf "%d\0" $? } +# Turn off posix mode since it does not allow process substitution +set +o posix CMD=() while IFS= read -d '' -r ARG; do CMD+=("$ARG") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode
Repository: spark Updated Branches: refs/heads/branch-2.2 7cb566abc -> dbb54a7b3 [SPARK-20546][DEPLOY] spark-class gets syntax error in posix mode ## What changes were proposed in this pull request? Updated spark-class to turn off posix mode so the process substitution doesn't cause a syntax error. ## How was this patch tested? Existing unit tests, manual spark-shell testing with posix mode on Author: jyu00 Closes #17852 from jyu00/master. (cherry picked from commit 5773ab121d5d7cbefeef17ff4ac6f8af36cc1251) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dbb54a7b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dbb54a7b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dbb54a7b Branch: refs/heads/branch-2.2 Commit: dbb54a7b39568cc9e8046a86113b98c3c69b7d11 Parents: 7cb566a Author: jyu00 Authored: Fri May 5 11:36:51 2017 +0100 Committer: Sean Owen Committed: Fri May 5 11:36:58 2017 +0100 -- bin/spark-class | 2 ++ 1 file changed, 2 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dbb54a7b/bin/spark-class -- diff --git a/bin/spark-class b/bin/spark-class index 77ea40c..65d3b96 100755 --- a/bin/spark-class +++ b/bin/spark-class @@ -72,6 +72,8 @@ build_command() { printf "%d\0" $? } +# Turn off posix mode since it does not allow process substitution +set +o posix CMD=() while IFS= read -d '' -r ARG; do CMD+=("$ARG") - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-19660][SQL] Replace the deprecated property name fs.default.name to fs.defaultFS that newly introduced
Repository: spark Updated Branches: refs/heads/branch-2.2 c8756288d -> 7cb566abc [SPARK-19660][SQL] Replace the deprecated property name fs.default.name to fs.defaultFS that newly introduced ## What changes were proposed in this pull request? Replace the deprecated property name `fs.default.name` to `fs.defaultFS` that newly introduced. ## How was this patch tested? Existing tests Author: Yuming Wang Closes #17856 from wangyum/SPARK-19660. (cherry picked from commit 37cdf077cd3f436f777562df311e3827b0727ce7) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7cb566ab Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7cb566ab Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7cb566ab Branch: refs/heads/branch-2.2 Commit: 7cb566abc27d41d5816dee16c6ecb749da2adf46 Parents: c875628 Author: Yuming Wang Authored: Fri May 5 11:31:59 2017 +0100 Committer: Sean Owen Committed: Fri May 5 11:32:07 2017 +0100 -- .../spark/sql/execution/streaming/state/StateStoreSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7cb566ab/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala index ebb7422..cc09b2d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala @@ -314,7 +314,7 @@ class StateStoreSuite extends SparkFunSuite with BeforeAndAfter with PrivateMeth test("SPARK-19677: Committing a delta file atop an existing one should not fail on HDFS") { val conf = new Configuration() conf.set("fs.fake.impl", classOf[RenameLikeHDFSFileSystem].getName) -conf.set("fs.default.name", "fake:///") +conf.set("fs.defaultFS", "fake:///") val provider = newStoreProvider(hadoopConf = conf) provider.getStore(0).commit() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-19660][SQL] Replace the deprecated property name fs.default.name to fs.defaultFS that newly introduced
Repository: spark Updated Branches: refs/heads/master 4411ac705 -> 37cdf077c [SPARK-19660][SQL] Replace the deprecated property name fs.default.name to fs.defaultFS that newly introduced ## What changes were proposed in this pull request? Replace the deprecated property name `fs.default.name` to `fs.defaultFS` that newly introduced. ## How was this patch tested? Existing tests Author: Yuming Wang Closes #17856 from wangyum/SPARK-19660. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37cdf077 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37cdf077 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37cdf077 Branch: refs/heads/master Commit: 37cdf077cd3f436f777562df311e3827b0727ce7 Parents: 4411ac7 Author: Yuming Wang Authored: Fri May 5 11:31:59 2017 +0100 Committer: Sean Owen Committed: Fri May 5 11:31:59 2017 +0100 -- .../spark/sql/execution/streaming/state/StateStoreSuite.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/37cdf077/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala -- diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala index ebb7422..cc09b2d 100644 --- a/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala +++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala @@ -314,7 +314,7 @@ class StateStoreSuite extends SparkFunSuite with BeforeAndAfter with PrivateMeth test("SPARK-19677: Committing a delta file atop an existing one should not fail on HDFS") { val conf = new Configuration() conf.set("fs.fake.impl", classOf[RenameLikeHDFSFileSystem].getName) -conf.set("fs.default.name", "fake:///") +conf.set("fs.defaultFS", "fake:///") val provider = newStoreProvider(hadoopConf = conf) provider.getStore(0).commit() - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
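For reference, `fs.default.name` is the deprecated Hadoop 1.x name for this setting; `fs.defaultFS` is the current one and applies both when configuring a Hadoop `Configuration` directly (as in the updated test) and when configuring it through a `SparkSession`. A small sketch, with the filesystem URIs as placeholders:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession

object DefaultFsSketch {
  def main(args: Array[String]): Unit = {
    // Standalone Hadoop Configuration, as in the updated test.
    val conf = new Configuration()
    conf.set("fs.defaultFS", "fake:///")

    // The same property can be pushed into the Hadoop conf a SparkSession uses:
    // spark.hadoop.* prefixed options are copied into hadoopConfiguration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("default-fs")
      .config("spark.hadoop.fs.defaultFS", "file:///")
      .getOrCreate()

    println(spark.sparkContext.hadoopConfiguration.get("fs.defaultFS"))
    spark.stop()
  }
}
```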
spark git commit: [INFRA] Close stale PRs
Repository: spark
Updated Branches:
  refs/heads/master 0d16faab9 -> 4411ac705

[INFRA] Close stale PRs

## What changes were proposed in this pull request?

This PR proposes to close a stale PR, several PRs suggested to be closed by a committer, and obviously inappropriate PRs.

Closes #9
Closes #17853
Closes #17732
Closes #17456
Closes #17410
Closes #17314
Closes #17362
Closes #17542

## How was this patch tested?

N/A

Author: hyukjinkwon

Closes #17855 from HyukjinKwon/close-pr.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4411ac70
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4411ac70
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4411ac70
Branch: refs/heads/master
Commit: 4411ac70524ced901f7807d492fb0ad2480a8841
Parents: 0d16faa
Author: hyukjinkwon
Authored: Fri May 5 09:50:40 2017 +0100
Committer: Sean Owen
Committed: Fri May 5 09:50:40 2017 +0100

--

--

- To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org