[spark] branch master updated (4d770db -> eee3467)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4d770db  [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row
     add eee3467  [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  9 -
 .../org/apache/spark/sql/DataFrameWriter.scala     |  9 +++--
 .../sql/test/DataFrameReaderWriterSuite.scala      | 22 +++---
 3 files changed, 10 insertions(+), 30 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4d770db  [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row
4d770db is described below

commit 4d770db0eb252c56072f093eae318bad3d20b8d7
Author: Xiangrui Meng
AuthorDate: Thu Jun 6 15:45:44 2019 -0700

    [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row

    ## What changes were proposed in this pull request?

    Issue fixed in https://github.com/apache/spark/pull/24734, but that PR might take longer to merge.

    ## How was this patch tested?

    It should pass existing unit tests.

    Closes #24816 from mengxr/SPARK-27968.

    Authored-by: Xiangrui Meng
    Signed-off-by: Xiangrui Meng
---
 .../sql/execution/python/ArrowEvalPythonExec.scala | 27 --
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
index 000ae97..73a43af 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
@@ -86,28 +86,11 @@ case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], resultAttrs: Seq[Attribute]
       sessionLocalTimeZone, pythonRunnerConf).compute(batchIter, context.partitionId(), context)

-    new Iterator[InternalRow] {
-
-      private var currentIter = if (columnarBatchIter.hasNext) {
-        val batch = columnarBatchIter.next()
-        val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
-        assert(outputTypes == actualDataTypes, "Invalid schema from pandas_udf: " +
-          s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
-        batch.rowIterator.asScala
-      } else {
-        Iterator.empty
-      }
-
-      override def hasNext: Boolean = currentIter.hasNext || {
-        if (columnarBatchIter.hasNext) {
-          currentIter = columnarBatchIter.next().rowIterator.asScala
-          hasNext
-        } else {
-          false
-        }
-      }
-
-      override def next(): InternalRow = currentIter.next()
+    columnarBatchIter.flatMap { batch =>
+      val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
+      assert(outputTypes == actualDataTypes, "Invalid schema from pandas_udf: " +
+        s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
+      batch.rowIterator.asScala
    }
  }
}
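The gist of the patch above: the old hand-rolled iterator pulled (and schema-checked) the first Arrow batch as soon as the operator was constructed, while `flatMap` over the batch iterator is fully lazy. A minimal plain-Python sketch of the same laziness pattern (no Spark involved; all names here are illustrative):

```python
from itertools import chain

batches_read = 0

def batch_iter():
    """Stand-in for columnarBatchIter: yields 'batches' (lists of rows)."""
    global batches_read
    for batch in ([1, 2], [3, 4]):
        batches_read += 1
        yield batch

# Mirrors columnarBatchIter.flatMap: nothing is read until a downstream
# consumer actually asks for a row, unlike the old eager first-batch read.
rows = chain.from_iterable(batch_iter())
assert batches_read == 0          # no batch pulled at construction time

first = next(rows)                # first batch pulled only on demand
assert first == 1 and batches_read == 1

rest = list(rows)                 # draining pulls the remaining batch
assert rest == [2, 3, 4] and batches_read == 2
```

The same property holds for Scala's `Iterator.flatMap`, which is why the replacement fixes the eager read without changing the rows produced.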
[spark] branch master updated: [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d30284b  [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount
d30284b is described below

commit d30284b5a51dd784f663eb4eea37087b35a54d00
Author: Thomas Graves
AuthorDate: Thu Jun 6 14:16:05 2019 -0500

    [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount

    ## What changes were proposed in this pull request?

    Change the resource config spark.{executor/driver}.resource.{resourceName}.count to .amount, to allow it to contain both a count and a unit in the future. Right now we only support counts (e.g. the number of GPUs), but in the future we may want to support units for things like memory (25G). Making the user specify a single .amount config is better than making them specify two separate configs, .count and .unit. Change it now since it's a user-facing config. "Amount" also matches how the Spark on YARN configs are set up.

    ## How was this patch tested?

    Unit tests, and manually verified on YARN and in local cluster mode.

    Closes #24810 from tgravescs/SPARK-27760-amount.
    Authored-by: Thomas Graves
    Signed-off-by: Thomas Graves
---
 .../main/scala/org/apache/spark/SparkConf.scala    |  4 +--
 .../main/scala/org/apache/spark/SparkContext.scala | 12
 .../main/scala/org/apache/spark/TestUtils.scala    |  2 +-
 .../executor/CoarseGrainedExecutorBackend.scala    |  2 +-
 .../org/apache/spark/internal/config/package.scala |  2 +-
 .../org/apache/spark/ResourceDiscovererSuite.scala |  2 +-
 .../scala/org/apache/spark/SparkConfSuite.scala    |  8 ++---
 .../scala/org/apache/spark/SparkContextSuite.scala | 24 +++
 .../CoarseGrainedExecutorBackendSuite.scala        | 26
 .../CoarseGrainedSchedulerBackendSuite.scala       |  2 +-
 .../spark/scheduler/TaskSchedulerImplSuite.scala   |  4 +--
 docs/configuration.md                              | 14 -
 .../apache/spark/deploy/k8s/KubernetesUtils.scala  |  4 +--
 .../k8s/features/BasicDriverFeatureStepSuite.scala |  2 +-
 .../features/BasicExecutorFeatureStepSuite.scala   |  4 +--
 .../spark/deploy/yarn/ResourceRequestHelper.scala  |  8 ++---
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala    |  2 +-
 .../org/apache/spark/deploy/yarn/ClientSuite.scala | 36 ++
 .../spark/deploy/yarn/YarnAllocatorSuite.scala     |  4 +--
 19 files changed, 93 insertions(+), 69 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala
index 227f4a5..e231a40 100644
--- a/core/src/main/scala/org/apache/spark/SparkConf.scala
+++ b/core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -512,8 +512,8 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
    */
   private[spark] def getTaskResourceRequirements(): Map[String, Int] = {
     getAllWithPrefix(SPARK_TASK_RESOURCE_PREFIX)
-      .withFilter { case (k, v) => k.endsWith(SPARK_RESOURCE_COUNT_SUFFIX)}
-      .map { case (k, v) => (k.dropRight(SPARK_RESOURCE_COUNT_SUFFIX.length), v.toInt)}.toMap
+      .withFilter { case (k, v) => k.endsWith(SPARK_RESOURCE_AMOUNT_SUFFIX)}
+      .map { case (k, v) => (k.dropRight(SPARK_RESOURCE_AMOUNT_SUFFIX.length), v.toInt)}.toMap
   }

   /**
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 66f8f41..c169842 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -391,7 +391,7 @@ class SparkContext(config: SparkConf) extends Logging {
     }
     // verify the resources we discovered are what the user requested
     val driverReqResourcesAndCounts =
-      SparkConf.getConfigsWithSuffix(allDriverResourceConfs, SPARK_RESOURCE_COUNT_SUFFIX).toMap
+      SparkConf.getConfigsWithSuffix(allDriverResourceConfs, SPARK_RESOURCE_AMOUNT_SUFFIX).toMap
     ResourceDiscoverer.checkActualResourcesMeetRequirements(driverReqResourcesAndCounts, _resources)
     logInfo("===")
@@ -2725,7 +2725,7 @@ object SparkContext extends Logging {
     // executor and resources required by each task.
     val taskResourcesAndCount = sc.conf.getTaskResourceRequirements()
     val executorResourcesAndCounts = sc.conf.getAllWithPrefixAndSuffix(
-      SPARK_EXECUTOR_RESOURCE_PREFIX, SPARK_RESOURCE_COUNT_SUFFIX).toMap
+      SPARK_EXECUTOR_RESOURCE_PREFIX, SPARK_RESOURCE_AMOUNT_SUFFIX).toMap
     var numSlots = execCores / taskCores
     var limitingResourceName = "CPU"
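The rename is mechanical but user-facing: a request moves from, say, `spark.executor.resource.gpu.count=2` to `spark.executor.resource.gpu.amount=2`. A rough plain-Python sketch of the prefix/suffix parsing that `getTaskResourceRequirements` performs in the diff above (illustrative only; the real logic lives in `SparkConf.scala`, and the function/variable names here are made up):

```python
SPARK_TASK_RESOURCE_PREFIX = "spark.task.resource."
SPARK_RESOURCE_AMOUNT_SUFFIX = ".amount"

def task_resource_requirements(conf: dict) -> dict:
    """Map resource name -> requested amount: keep keys matching
    spark.task.resource.<name>.amount and strip prefix/suffix."""
    out = {}
    for key, value in conf.items():
        if (key.startswith(SPARK_TASK_RESOURCE_PREFIX)
                and key.endswith(SPARK_RESOURCE_AMOUNT_SUFFIX)):
            name = key[len(SPARK_TASK_RESOURCE_PREFIX):-len(SPARK_RESOURCE_AMOUNT_SUFFIX)]
            out[name] = int(value)
    return out

conf = {
    "spark.task.resource.gpu.amount": "2",
    "spark.task.resource.fpga.amount": "1",
    "spark.task.resource.gpu.count": "9",  # old-style key, ignored after this commit
    "spark.task.cpus": "1",                # unrelated config, also ignored
}
print(task_resource_requirements(conf))  # {'gpu': 2, 'fpga': 1}
```

Keeping a single `.amount` key means a later change can extend the value syntax (e.g. `25G`) without introducing a second `.unit` config.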
[spark] branch master updated: [SPARK-27918][SQL] Port boolean.sql
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new eadb538  [SPARK-27918][SQL] Port boolean.sql
eadb538 is described below

commit eadb53824d08480131498c7eb5bd7674f48b62c7
Author: Yuming Wang
AuthorDate: Thu Jun 6 10:57:10 2019 -0700

    [SPARK-27918][SQL] Port boolean.sql

    ## What changes were proposed in this pull request?

    This PR ports boolean.sql from the PostgreSQL regression tests:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql

    The expected results can be found here:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out

    While porting the test cases, two PostgreSQL-specific features were found that do not exist in Spark SQL:

    - [SPARK-27931](https://issues.apache.org/jira/browse/SPARK-27931): Accept 'on' and 'off' as input for the boolean data type / trim the string when casting to boolean type / accept unique prefixes thereof
    - [SPARK-27924](https://issues.apache.org/jira/browse/SPARK-27924): Support E061-14: Search Conditions

    Also found one inconsistent behavior:

    - [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Unsupported input throws an exception in PostgreSQL, but Spark accepts it and sets the value to `NULL`, for example:

    ```sql
    SELECT bool 'test' AS error; -- SELECT boolean('test') AS error;
    ```

    ## How was this patch tested?

    N/A

    Closes #24767 from wangyum/SPARK-27918.
    Authored-by: Yuming Wang
    Signed-off-by: gatorsmile
---
 .../resources/sql-tests/inputs/pgSQL/boolean.sql   | 284
 .../sql-tests/results/pgSQL/boolean.sql.out        | 741 +
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  10 +
 3 files changed, 1035 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql
new file mode 100644
index 000..8ba6f97
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql
@@ -0,0 +1,284 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- BOOLEAN
+-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql
+
+--
+-- sanity check - if this fails go insane!
+--
+SELECT 1 AS one;
+
+
+-- ** testing built-in type bool
+
+-- check bool input syntax
+
+SELECT true AS true;
+
+SELECT false AS false;
+
+SELECT boolean('t') AS true;
+
+-- [SPARK-27931] Trim the string when cast string type to boolean type
+SELECT boolean(' f ') AS false;
+
+SELECT boolean('true') AS true;
+
+-- [SPARK-27923] PostgreSQL does not accept 'test' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('test') AS error;
+
+SELECT boolean('false') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept 'foo' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('foo') AS error;
+
+SELECT boolean('y') AS true;
+
+SELECT boolean('yes') AS true;
+
+-- [SPARK-27923] PostgreSQL does not accept 'yeah' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('yeah') AS error;
+
+SELECT boolean('n') AS false;
+
+SELECT boolean('no') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept 'nay' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('nay') AS error;
+
+-- [SPARK-27931] Accept 'on' and 'off' as input for boolean data type
+SELECT boolean('on') AS true;
+
+SELECT boolean('off') AS false;
+
+-- [SPARK-27931] Accept unique prefixes thereof
+SELECT boolean('of') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept 'o' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('o') AS error;
+
+-- [SPARK-27923] PostgreSQL does not accept 'on_' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('on_') AS error;
+
+-- [SPARK-27923] PostgreSQL does not accept 'off_' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('off_') AS error;
+
+SELECT boolean('1') AS true;
+
+-- [SPARK-27923] PostgreSQL does not accept '11' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('11') AS error;
+
+SELECT boolean('0') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept '000' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('000') AS error;
+
+-- [SPARK-27923] PostgreSQL does not accept '' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('') AS error;
+
+-- and, or, not in qualifications
+
+SELECT boolean('t') or boolean('f') AS true;
+
+SELECT boolean('t') and boolean('f') AS false;
+
+SELECT not boolean('f') AS true;
+
+SELECT boolean('t') = boolean('f') AS false;
+
+SELECT boolean('t') <> boolean('f') AS true;
+
+SELECT boolean('t') > boolean('f') AS true;
+
+SELECT
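The SPARK-27923 divergence flagged throughout the test file above is that PostgreSQL raises an error for unparseable boolean input, while Spark's string-to-boolean cast yields NULL. A small Python model of the behavior, based on the literals this test file exercises (a sketch only; the authoritative list lives in Spark's cast implementation):

```python
TRUE_LITERALS = {"t", "true", "y", "yes", "1"}
FALSE_LITERALS = {"f", "false", "n", "no", "0"}

def spark_cast_to_boolean(s: str):
    """Approximates Spark SQL's CAST(string AS boolean) at the time of
    this commit: unrecognised input becomes None (SQL NULL) instead of
    raising. Note that trimming whitespace and accepting 'on'/'off' were
    still open as SPARK-27931, so they are *not* modeled here."""
    v = s.lower()
    if v in TRUE_LITERALS:
        return True
    if v in FALSE_LITERALS:
        return False
    return None  # PostgreSQL: ERROR: invalid input syntax for type boolean

print(spark_cast_to_boolean("yes"))   # True
print(spark_cast_to_boolean("off"))   # None (PostgreSQL: false)
print(spark_cast_to_boolean("test"))  # None (PostgreSQL: error)
```

This is why queries like `SELECT boolean('test') AS error;` produce a row with NULL in the Spark golden file rather than a failure.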
[spark] branch master updated: [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4de9649  [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]
4de9649 is described below

commit 4de96493ae1595bf6a80596c99df0e003ef0cf7d
Author: Yuming Wang
AuthorDate: Thu Jun 6 09:28:59 2019 -0700

    [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]

    ## What changes were proposed in this pull request?

    This PR ports AGGREGATES.sql [Part 2] from the PostgreSQL regression tests:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350

    The expected results can be found here:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/aggregates.out#L499-L984

    While porting the test cases, four PostgreSQL-specific features were found that do not exist in Spark SQL:

    - [SPARK-27877](https://issues.apache.org/jira/browse/SPARK-27877): Implement SQL-standard LATERAL subqueries
    - [SPARK-27878](https://issues.apache.org/jira/browse/SPARK-27878): Support ARRAY(sub-SELECT) expressions
    - [SPARK-27879](https://issues.apache.org/jira/browse/SPARK-27879): Implement bitwise integer aggregates (BIT_AND and BIT_OR)
    - [SPARK-27880](https://issues.apache.org/jira/browse/SPARK-27880): Implement boolean aggregates (BOOL_AND, BOOL_OR and EVERY)

    ## How was this patch tested?

    N/A

    Closes #24743 from wangyum/SPARK-27883.
    Authored-by: Yuming Wang
    Signed-off-by: gatorsmile
---
 .../sql-tests/inputs/pgSQL/aggregates_part1.sql    |   2 +-
 .../sql-tests/inputs/pgSQL/aggregates_part2.sql    | 228 +
 .../results/pgSQL/aggregates_part2.sql.out         | 162 +++
 3 files changed, 391 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql
index de7bbda..a81eca2 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql
@@ -3,7 +3,7 @@
 --
 --
 -- AGGREGATES [Part 1]
--- https://github.com/postgres/postgres/blob/02ddd499322ab6f2f0d58692955dc9633c2150fc/src/test/regress/sql/aggregates.sql#L1-L143
+-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L1-L143

 -- avoid bit-exact output here because operations may not be bit-exact.
 -- SET extra_float_digits = 0;
diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql
new file mode 100644
index 000..c461370
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql
@@ -0,0 +1,228 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- AGGREGATES [Part 2]
+-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350
+
+create temporary view int4_tbl as select * from values
+  (0),
+  (123456),
+  (-123456),
+  (2147483647),
+  (-2147483647)
+  as int4_tbl(f1);
+
+-- Test handling of Params within aggregate arguments in hashed aggregation.
+-- Per bug report from Jeevan Chalke.
+-- [SPARK-27877] Implement SQL-standard LATERAL subqueries
+-- explain (verbose, costs off)
+-- select s1, s2, sm
+--   from generate_series(1, 3) s1,
+--        lateral (select s2, sum(s1 + s2) sm
+--                 from generate_series(1, 3) s2 group by s2) ss
+--   order by 1, 2;
+-- select s1, s2, sm
+--   from generate_series(1, 3) s1,
+--        lateral (select s2, sum(s1 + s2) sm
+--                 from generate_series(1, 3) s2 group by s2) ss
+--   order by 1, 2;
+
+-- [SPARK-27878] Support ARRAY(sub-SELECT) expressions
+-- explain (verbose, costs off)
+-- select array(select sum(x+y) s
+--              from generate_series(1,3) y group by y order by s)
+--   from generate_series(1,3) x;
+-- select array(select sum(x+y) s
+--              from generate_series(1,3) y group by y order by s)
+--   from generate_series(1,3) x;
+
+-- [SPARK-27879] Implement bitwise integer aggregates(BIT_AND and BIT_OR)
+--
+-- test for bitwise integer aggregates
+--
+-- CREATE TEMPORARY TABLE bitwise_test(
+--   i2 INT2,
+--   i4 INT4,
+--   i8 INT8,
+--   i INTEGER,
+--   x INT2,
+--   y BIT(4)
+-- );
+
+-- empty case
+-- SELECT
+--   BIT_AND(i2) AS "?",
+--   BIT_OR(i4) AS "?"
+-- FROM bitwise_test;
+
+-- COPY bitwise_test FROM STDIN NULL 'null';
+-- 1 1 1 1 1 B0101
+-- 3 3 3 null 2 B0100
+-- 7 7 7 3 4 B1100
+-- \.
+
+-- SELECT
+--   BIT_AND(i2) AS "1",
+--   BIT_AND(i4) AS "1",
+--   BIT_AND(i8) AS "1",
+--   BIT_AND(i) AS "?",
+--   BIT_AND(x)
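The SPARK-27879 aggregates left commented out above reduce a column with bitwise AND/OR. Using the three sample rows from the ported COPY block (i2 values 1, 3, 7), a plain-Python sketch of the expected semantics, skipping NULLs the way SQL aggregates do (the NULL-bearing column below is illustrative, not taken verbatim from the test data):

```python
from functools import reduce
import operator

def bit_and(values):
    """BIT_AND over a column: bitwise AND of non-NULL values, NULL if none."""
    vals = [v for v in values if v is not None]
    return reduce(operator.and_, vals) if vals else None

def bit_or(values):
    """BIT_OR over a column: bitwise OR of non-NULL values, NULL if none."""
    vals = [v for v in values if v is not None]
    return reduce(operator.or_, vals) if vals else None

i2 = [1, 3, 7]                 # sample i2 column from the COPY block above
with_null = [1, None, 7]       # hypothetical column containing a NULL

print(bit_and(i2))         # 1  (0b001 & 0b011 & 0b111)
print(bit_or(i2))          # 7  (0b001 | 0b011 | 0b111)
print(bit_and(with_null))  # 1  -- NULLs are ignored, matching SQL semantics
print(bit_and([]))         # None -- the "empty case" query yields NULL
```

These are the results the commented-out queries would be expected to produce once SPARK-27879 lands and the tests can be uncommented.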