[spark] branch master updated (4d770db -> eee3467)
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 4d770db  [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row
     add eee3467  [SPARK-27938][SQL] Remove feature flag LEGACY_PASS_PARTITION_BY_AS_OPTIONS

No new revisions were added by this update.

Summary of changes:
 .../org/apache/spark/sql/internal/SQLConf.scala    |  9 -
 .../org/apache/spark/sql/DataFrameWriter.scala     |  9 +++--
 .../sql/test/DataFrameReaderWriterSuite.scala      | 22 +++---
 3 files changed, 10 insertions(+), 30 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated: [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4d770db  [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row
4d770db is described below

commit 4d770db0eb252c56072f093eae318bad3d20b8d7
Author: Xiangrui Meng
AuthorDate: Thu Jun 6 15:45:44 2019 -0700

    [SPARK-27968] ArrowEvalPythonExec.evaluate shouldn't eagerly read the first row

    ## What changes were proposed in this pull request?

    Issue fixed in https://github.com/apache/spark/pull/24734, but that PR might take longer to merge.

    ## How was this patch tested?

    It should pass existing unit tests.

    Closes #24816 from mengxr/SPARK-27968.

    Authored-by: Xiangrui Meng
    Signed-off-by: Xiangrui Meng
---
 .../sql/execution/python/ArrowEvalPythonExec.scala | 27 --
 1 file changed, 5 insertions(+), 22 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
index 000ae97..73a43af 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala
@@ -86,28 +86,11 @@ case class ArrowEvalPythonExec(udfs: Seq[PythonUDF], resultAttrs: Seq[Attribute]
       sessionLocalTimeZone, pythonRunnerConf).compute(batchIter, context.partitionId(), context)

-    new Iterator[InternalRow] {
-
-      private var currentIter = if (columnarBatchIter.hasNext) {
-        val batch = columnarBatchIter.next()
-        val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
-        assert(outputTypes == actualDataTypes, "Invalid schema from pandas_udf: " +
-          s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
-        batch.rowIterator.asScala
-      } else {
-        Iterator.empty
-      }
-
-      override def hasNext: Boolean = currentIter.hasNext || {
-        if (columnarBatchIter.hasNext) {
-          currentIter = columnarBatchIter.next().rowIterator.asScala
-          hasNext
-        } else {
-          false
-        }
-      }
-
-      override def next(): InternalRow = currentIter.next()
+    columnarBatchIter.flatMap { batch =>
+      val actualDataTypes = (0 until batch.numCols()).map(i => batch.column(i).dataType())
+      assert(outputTypes == actualDataTypes, "Invalid schema from pandas_udf: " +
+        s"expected ${outputTypes.mkString(", ")}, got ${actualDataTypes.mkString(", ")}")
+      batch.rowIterator.asScala
    }
  }
}
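The gist of the patch above: the old hand-rolled iterator pulled (and schema-checked) the first Arrow batch as soon as the operator was constructed, while `flatMap` over the batch iterator is fully lazy. A minimal plain-Python sketch of the same laziness pattern (no Spark involved; all names here are illustrative):

```python
from itertools import chain

batches_read = 0

def batch_iter():
    """Stand-in for columnarBatchIter: yields 'batches' (lists of rows)."""
    global batches_read
    for batch in ([1, 2], [3, 4]):
        batches_read += 1
        yield batch

# Mirrors columnarBatchIter.flatMap: nothing is read until a downstream
# consumer actually asks for a row, unlike the old eager first-batch read.
rows = chain.from_iterable(batch_iter())
assert batches_read == 0          # no batch pulled at construction time

first = next(rows)                # first batch pulled only on demand
assert first == 1 and batches_read == 1

rest = list(rows)                 # draining pulls the remaining batch
assert rest == [2, 3, 4] and batches_read == 2
```

The same property holds for Scala's `Iterator.flatMap`, which is why the replacement fixes the eager read without changing the rows produced.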
[spark] branch master updated: [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount
This is an automated email from the ASF dual-hosted git repository.

tgraves pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new d30284b  [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount
d30284b is described below

commit d30284b5a51dd784f663eb4eea37087b35a54d00
Author: Thomas Graves
AuthorDate: Thu Jun 6 14:16:05 2019 -0500

    [SPARK-27760][CORE] Spark resources - change user resource config from .count to .amount

    ## What changes were proposed in this pull request?

    Change the resource config spark.{executor/driver}.resource.{resourceName}.count to .amount, to allow it to contain both a count and a unit in the future. Right now we only support counts (e.g. the number of GPUs), but in the future we may want to support units for things like memory (25G). Making the user specify a single .amount config is better than making them specify two separate configs, .count and .unit. Change it now since it's a user-facing config. "Amount" also matches how the Spark on YARN configs are set up.

    ## How was this patch tested?

    Unit tests, and manually verified on YARN and in local cluster mode.

    Closes #24810 from tgravescs/SPARK-27760-amount.
    Authored-by: Thomas Graves
    Signed-off-by: Thomas Graves
---
 .../main/scala/org/apache/spark/SparkConf.scala    |  4 +--
 .../main/scala/org/apache/spark/SparkContext.scala | 12
 .../main/scala/org/apache/spark/TestUtils.scala    |  2 +-
 .../executor/CoarseGrainedExecutorBackend.scala    |  2 +-
 .../org/apache/spark/internal/config/package.scala |  2 +-
 .../org/apache/spark/ResourceDiscovererSuite.scala |  2 +-
 .../scala/org/apache/spark/SparkConfSuite.scala    |  8 ++---
 .../scala/org/apache/spark/SparkContextSuite.scala | 24 +++
 .../CoarseGrainedExecutorBackendSuite.scala        | 26
 .../CoarseGrainedSchedulerBackendSuite.scala       |  2 +-
 .../spark/scheduler/TaskSchedulerImplSuite.scala   |  4 +--
 docs/configuration.md                              | 14 -
 .../apache/spark/deploy/k8s/KubernetesUtils.scala  |  4 +--
 .../k8s/features/BasicDriverFeatureStepSuite.scala |  2 +-
 .../features/BasicExecutorFeatureStepSuite.scala   |  4 +--
 .../spark/deploy/yarn/ResourceRequestHelper.scala  |  8 ++---
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala    |  2 +-
 .../org/apache/spark/deploy/yarn/ClientSuite.scala | 36 ++
 .../spark/deploy/yarn/YarnAllocatorSuite.scala     |  4 +--
 19 files changed, 93 insertions(+), 69 deletions(-)

diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala b/core/src/main/scala/org/apache/spark/SparkConf.scala
index 227f4a5..e231a40 100644
--- a/core/src/main/scala/org/apache/spark/SparkConf.scala
+++ b/core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -512,8 +512,8 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging with Seria
    */
   private[spark] def getTaskResourceRequirements(): Map[String, Int] = {
     getAllWithPrefix(SPARK_TASK_RESOURCE_PREFIX)
-      .withFilter { case (k, v) => k.endsWith(SPARK_RESOURCE_COUNT_SUFFIX)}
-      .map { case (k, v) => (k.dropRight(SPARK_RESOURCE_COUNT_SUFFIX.length), v.toInt)}.toMap
+      .withFilter { case (k, v) => k.endsWith(SPARK_RESOURCE_AMOUNT_SUFFIX)}
+      .map { case (k, v) => (k.dropRight(SPARK_RESOURCE_AMOUNT_SUFFIX.length), v.toInt)}.toMap
   }

   /**
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 66f8f41..c169842 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -391,7 +391,7 @@ class SparkContext(config: SparkConf) extends Logging {
     }
     // verify the resources we discovered are what the user requested
     val driverReqResourcesAndCounts =
-      SparkConf.getConfigsWithSuffix(allDriverResourceConfs, SPARK_RESOURCE_COUNT_SUFFIX).toMap
+      SparkConf.getConfigsWithSuffix(allDriverResourceConfs, SPARK_RESOURCE_AMOUNT_SUFFIX).toMap
     ResourceDiscoverer.checkActualResourcesMeetRequirements(driverReqResourcesAndCounts, _resources)
     logInfo("===")
@@ -2725,7 +2725,7 @@ object SparkContext extends Logging {
     // executor and resources required by each task.
     val taskResourcesAndCount = sc.conf.getTaskResourceRequirements()
     val executorResourcesAndCounts = sc.conf.getAllWithPrefixAndSuffix(
-      SPARK_EXECUTOR_RESOURCE_PREFIX, SPARK_RESOURCE_COUNT_SUFFIX).toMap
+      SPARK_EXECUTOR_RESOURCE_PREFIX, SPARK_RESOURCE_AMOUNT_SUFFIX).toMap
     var numSlots = execCores / taskCores
     var limitingResourceName = "CPU"
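The rename is mechanical but user-facing: a request moves from, say, `spark.executor.resource.gpu.count=2` to `spark.executor.resource.gpu.amount=2`. A rough plain-Python sketch of the prefix/suffix parsing that `getTaskResourceRequirements` performs in the diff above (illustrative only; the real logic lives in `SparkConf.scala`, and the function/variable names here are made up):

```python
SPARK_TASK_RESOURCE_PREFIX = "spark.task.resource."
SPARK_RESOURCE_AMOUNT_SUFFIX = ".amount"

def task_resource_requirements(conf: dict) -> dict:
    """Map resource name -> requested amount: keep keys matching
    spark.task.resource.<name>.amount and strip prefix/suffix."""
    out = {}
    for key, value in conf.items():
        if (key.startswith(SPARK_TASK_RESOURCE_PREFIX)
                and key.endswith(SPARK_RESOURCE_AMOUNT_SUFFIX)):
            name = key[len(SPARK_TASK_RESOURCE_PREFIX):-len(SPARK_RESOURCE_AMOUNT_SUFFIX)]
            out[name] = int(value)
    return out

conf = {
    "spark.task.resource.gpu.amount": "2",
    "spark.task.resource.fpga.amount": "1",
    "spark.task.resource.gpu.count": "9",  # old-style key, ignored after this commit
    "spark.task.cpus": "1",                # unrelated config, also ignored
}
print(task_resource_requirements(conf))  # {'gpu': 2, 'fpga': 1}
```

Keeping a single `.amount` key means a later change can extend the value syntax (e.g. `25G`) without introducing a second `.unit` config.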
[spark] branch master updated: [SPARK-27918][SQL] Port boolean.sql
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new eadb538  [SPARK-27918][SQL] Port boolean.sql
eadb538 is described below

commit eadb53824d08480131498c7eb5bd7674f48b62c7
Author: Yuming Wang
AuthorDate: Thu Jun 6 10:57:10 2019 -0700

    [SPARK-27918][SQL] Port boolean.sql

    ## What changes were proposed in this pull request?

    This PR ports boolean.sql from the PostgreSQL regression tests:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql

    The expected results can be found here:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out

    While porting the test cases, two PostgreSQL-specific features were found that do not exist in Spark SQL:

    - [SPARK-27931](https://issues.apache.org/jira/browse/SPARK-27931): Accept 'on' and 'off' as input for the boolean data type / trim the string when casting to boolean type / accept unique prefixes thereof
    - [SPARK-27924](https://issues.apache.org/jira/browse/SPARK-27924): Support E061-14: Search Conditions

    Also found one inconsistent behavior:

    - [SPARK-27923](https://issues.apache.org/jira/browse/SPARK-27923): Unsupported input throws an exception in PostgreSQL, but Spark accepts it and sets the value to `NULL`, for example:

    ```sql
    SELECT bool 'test' AS error; -- SELECT boolean('test') AS error;
    ```

    ## How was this patch tested?

    N/A

    Closes #24767 from wangyum/SPARK-27918.
    Authored-by: Yuming Wang
    Signed-off-by: gatorsmile
---
 .../resources/sql-tests/inputs/pgSQL/boolean.sql   | 284
 .../sql-tests/results/pgSQL/boolean.sql.out        | 741 +
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   |  10 +
 3 files changed, 1035 insertions(+)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql
new file mode 100644
index 000..8ba6f97
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/boolean.sql
@@ -0,0 +1,284 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- BOOLEAN
+-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql
+
+--
+-- sanity check - if this fails go insane!
+--
+SELECT 1 AS one;
+
+
+-- ** testing built-in type bool
+
+-- check bool input syntax
+
+SELECT true AS true;
+
+SELECT false AS false;
+
+SELECT boolean('t') AS true;
+
+-- [SPARK-27931] Trim the string when cast string type to boolean type
+SELECT boolean(' f ') AS false;
+
+SELECT boolean('true') AS true;
+
+-- [SPARK-27923] PostgreSQL does not accept 'test' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('test') AS error;
+
+SELECT boolean('false') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept 'foo' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('foo') AS error;
+
+SELECT boolean('y') AS true;
+
+SELECT boolean('yes') AS true;
+
+-- [SPARK-27923] PostgreSQL does not accept 'yeah' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('yeah') AS error;
+
+SELECT boolean('n') AS false;
+
+SELECT boolean('no') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept 'nay' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('nay') AS error;
+
+-- [SPARK-27931] Accept 'on' and 'off' as input for boolean data type
+SELECT boolean('on') AS true;
+
+SELECT boolean('off') AS false;
+
+-- [SPARK-27931] Accept unique prefixes thereof
+SELECT boolean('of') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept 'o' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('o') AS error;
+
+-- [SPARK-27923] PostgreSQL does not accept 'on_' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('on_') AS error;
+
+-- [SPARK-27923] PostgreSQL does not accept 'off_' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('off_') AS error;
+
+SELECT boolean('1') AS true;
+
+-- [SPARK-27923] PostgreSQL does not accept '11' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('11') AS error;
+
+SELECT boolean('0') AS false;
+
+-- [SPARK-27923] PostgreSQL does not accept '000' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('000') AS error;
+
+-- [SPARK-27923] PostgreSQL does not accept '' but Spark SQL accepts it and sets it to NULL
+SELECT boolean('') AS error;
+
+-- and, or, not in qualifications
+
+SELECT boolean('t') or boolean('f') AS true;
+
+SELECT boolean('t') and boolean('f') AS false;
+
+SELECT not boolean('f') AS true;
+
+SELECT boolean('t') = boolean('f') AS false;
+
+SELECT boolean('t') <> boolean('f') AS true;
+
+SELECT boolean('t') > boolean('f') AS true;
+
+SELECT
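The SPARK-27923 divergence flagged throughout the test file above is that PostgreSQL raises an error for unparseable boolean input, while Spark's string-to-boolean cast yields NULL. A small Python model of the behavior, based on the literals this test file exercises (a sketch only; the authoritative list lives in Spark's cast implementation):

```python
TRUE_LITERALS = {"t", "true", "y", "yes", "1"}
FALSE_LITERALS = {"f", "false", "n", "no", "0"}

def spark_cast_to_boolean(s: str):
    """Approximates Spark SQL's CAST(string AS boolean) at the time of
    this commit: unrecognised input becomes None (SQL NULL) instead of
    raising. Note that trimming whitespace and accepting 'on'/'off' were
    still open as SPARK-27931, so they are *not* modeled here."""
    v = s.lower()
    if v in TRUE_LITERALS:
        return True
    if v in FALSE_LITERALS:
        return False
    return None  # PostgreSQL: ERROR: invalid input syntax for type boolean

print(spark_cast_to_boolean("yes"))   # True
print(spark_cast_to_boolean("off"))   # None (PostgreSQL: false)
print(spark_cast_to_boolean("test"))  # None (PostgreSQL: error)
```

This is why queries like `SELECT boolean('test') AS error;` produce a row with NULL in the Spark golden file rather than a failure.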
[spark] branch master updated: [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 4de9649  [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]
4de9649 is described below

commit 4de96493ae1595bf6a80596c99df0e003ef0cf7d
Author: Yuming Wang
AuthorDate: Thu Jun 6 09:28:59 2019 -0700

    [SPARK-27883][SQL] Port AGGREGATES.sql [Part 2]

    ## What changes were proposed in this pull request?

    This PR ports AGGREGATES.sql [Part 2] from the PostgreSQL regression tests:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350

    The expected results can be found here:
    https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/aggregates.out#L499-L984

    While porting the test cases, four PostgreSQL-specific features were found that do not exist in Spark SQL:

    - [SPARK-27877](https://issues.apache.org/jira/browse/SPARK-27877): Implement SQL-standard LATERAL subqueries
    - [SPARK-27878](https://issues.apache.org/jira/browse/SPARK-27878): Support ARRAY(sub-SELECT) expressions
    - [SPARK-27879](https://issues.apache.org/jira/browse/SPARK-27879): Implement bitwise integer aggregates (BIT_AND and BIT_OR)
    - [SPARK-27880](https://issues.apache.org/jira/browse/SPARK-27880): Implement boolean aggregates (BOOL_AND, BOOL_OR and EVERY)

    ## How was this patch tested?

    N/A

    Closes #24743 from wangyum/SPARK-27883.
    Authored-by: Yuming Wang
    Signed-off-by: gatorsmile
---
 .../sql-tests/inputs/pgSQL/aggregates_part1.sql    |   2 +-
 .../sql-tests/inputs/pgSQL/aggregates_part2.sql    | 228 +
 .../results/pgSQL/aggregates_part2.sql.out         | 162 +++
 3 files changed, 391 insertions(+), 1 deletion(-)

diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql
index de7bbda..a81eca2 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part1.sql
@@ -3,7 +3,7 @@
 --
 --
 -- AGGREGATES [Part 1]
--- https://github.com/postgres/postgres/blob/02ddd499322ab6f2f0d58692955dc9633c2150fc/src/test/regress/sql/aggregates.sql#L1-L143
+-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L1-L143

 -- avoid bit-exact output here because operations may not be bit-exact.
 -- SET extra_float_digits = 0;
diff --git a/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql
new file mode 100644
index 000..c461370
--- /dev/null
+++ b/sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part2.sql
@@ -0,0 +1,228 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- AGGREGATES [Part 2]
+-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L145-L350
+
+create temporary view int4_tbl as select * from values
+  (0),
+  (123456),
+  (-123456),
+  (2147483647),
+  (-2147483647)
+  as int4_tbl(f1);
+
+-- Test handling of Params within aggregate arguments in hashed aggregation.
+-- Per bug report from Jeevan Chalke.
+-- [SPARK-27877] Implement SQL-standard LATERAL subqueries
+-- explain (verbose, costs off)
+-- select s1, s2, sm
+--   from generate_series(1, 3) s1,
+--        lateral (select s2, sum(s1 + s2) sm
+--                 from generate_series(1, 3) s2 group by s2) ss
+--   order by 1, 2;
+-- select s1, s2, sm
+--   from generate_series(1, 3) s1,
+--        lateral (select s2, sum(s1 + s2) sm
+--                 from generate_series(1, 3) s2 group by s2) ss
+--   order by 1, 2;
+
+-- [SPARK-27878] Support ARRAY(sub-SELECT) expressions
+-- explain (verbose, costs off)
+-- select array(select sum(x+y) s
+--              from generate_series(1,3) y group by y order by s)
+--   from generate_series(1,3) x;
+-- select array(select sum(x+y) s
+--              from generate_series(1,3) y group by y order by s)
+--   from generate_series(1,3) x;
+
+-- [SPARK-27879] Implement bitwise integer aggregates(BIT_AND and BIT_OR)
+--
+-- test for bitwise integer aggregates
+--
+-- CREATE TEMPORARY TABLE bitwise_test(
+--   i2 INT2,
+--   i4 INT4,
+--   i8 INT8,
+--   i INTEGER,
+--   x INT2,
+--   y BIT(4)
+-- );
+
+-- empty case
+-- SELECT
+--   BIT_AND(i2) AS "?",
+--   BIT_OR(i4) AS "?"
+-- FROM bitwise_test;
+
+-- COPY bitwise_test FROM STDIN NULL 'null';
+-- 1 1 1 1 1 B0101
+-- 3 3 3 null 2 B0100
+-- 7 7 7 3 4 B1100
+-- \.
+
+-- SELECT
+--   BIT_AND(i2) AS "1",
+--   BIT_AND(i4) AS "1",
+--   BIT_AND(i8) AS "1",
+--   BIT_AND(i) AS "?",
+--   BIT_AND(x)
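The SPARK-27879 aggregates left commented out above reduce a column with bitwise AND/OR. Using the three sample rows from the ported COPY block (i2 values 1, 3, 7), a plain-Python sketch of the expected semantics, skipping NULLs the way SQL aggregates do (the NULL-bearing column below is illustrative, not taken verbatim from the test data):

```python
from functools import reduce
import operator

def bit_and(values):
    """BIT_AND over a column: bitwise AND of non-NULL values, NULL if none."""
    vals = [v for v in values if v is not None]
    return reduce(operator.and_, vals) if vals else None

def bit_or(values):
    """BIT_OR over a column: bitwise OR of non-NULL values, NULL if none."""
    vals = [v for v in values if v is not None]
    return reduce(operator.or_, vals) if vals else None

i2 = [1, 3, 7]                 # sample i2 column from the COPY block above
with_null = [1, None, 7]       # hypothetical column containing a NULL

print(bit_and(i2))         # 1  (0b001 & 0b011 & 0b111)
print(bit_or(i2))          # 7  (0b001 | 0b011 | 0b111)
print(bit_and(with_null))  # 1  -- NULLs are ignored, matching SQL semantics
print(bit_and([]))         # None -- the "empty case" query yields NULL
```

These are the results the commented-out queries would be expected to produce once SPARK-27879 lands and the tests can be uncommented.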