[jira] [Assigned] (SPARK-28281) Convert and port 'having.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28281:


Assignee: Apache Spark

> Convert and port 'having.sql' into UDF test base
> 
>
> Key: SPARK-28281
> URL: https://issues.apache.org/jira/browse/SPARK-28281
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28281) Convert and port 'having.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28281:


Assignee: (was: Apache Spark)

> Convert and port 'having.sql' into UDF test base
> 
>
> Key: SPARK-28281
> URL: https://issues.apache.org/jira/browse/SPARK-28281
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28252) local/global temp view should not accept duplicate column names

2019-07-09 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881766#comment-16881766
 ] 

Yuming Wang commented on SPARK-28252:
-

PostgreSQL also does not support it.
{code:sql}
postgres=# CREATE TEMPORARY VIEW spark_28252 as select 1 as c1, 2 as c1;
ERROR:  column "c1" specified more than once
{code}


> local/global temp view should not accept duplicate column names
> ---
>
> Key: SPARK-28252
> URL: https://issues.apache.org/jira/browse/SPARK-28252
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> {noformat}
> scala> spark.sql("create temp view v1 as select 1 as col1, 2 as col1")
> res0: org.apache.spark.sql.DataFrame = []
> scala> spark.sql("select col1 from v1").show
> 19/07/04 22:27:19 WARN ObjectStore: Failed to get database global_temp, 
> returning NoSuchObjectException
> org.apache.spark.sql.AnalysisException: Reference 'col1' is ambiguous, could 
> be: v1.col1, v1.col1.; line 1 pos 7
>   at 
> org.apache.spark.sql.catalyst.expressions.package$AttributeSeq.resolve(package.scala:259)
>   at 
> org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveChildren(LogicalPlan.scala:101)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$40.apply(Analyzer.scala:890)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$40.apply(Analyzer.scala:892)
>   at 
> org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:53)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28289) Convert and port 'union.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881763#comment-16881763
 ] 

Hyukjin Kwon commented on SPARK-28289:
--

Please go ahead.

> Convert and port 'union.sql' into UDF test base
> ---
>
> Key: SPARK-28289
> URL: https://issues.apache.org/jira/browse/SPARK-28289
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28289) Convert and port 'union.sql' into UDF test base

2019-07-09 Thread Yiheng Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881750#comment-16881750
 ] 

Yiheng Wang commented on SPARK-28289:
-

Hi [~hyukjin.kwon],

I'll be working on this.

Thanks.

> Convert and port 'union.sql' into UDF test base
> ---
>
> Key: SPARK-28289
> URL: https://issues.apache.org/jira/browse/SPARK-28289
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28324) The LOG function using 10 as the base, but Spark using E

2019-07-09 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881748#comment-16881748
 ] 

Yuming Wang commented on SPARK-28324:
-

PostgreSQL, Vertica, and Teradata use 10 as the base. 
DB2, SQL Server, Hive, and MySQL use E as the base.
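
For reference, both behaviors are already available in Spark SQL (a minimal 
sketch; {{log10}} and the two-argument {{log}} are existing built-ins, so only 
the default base of one-argument {{log}} is in question):
{code:sql}
-- today's one-argument log() is the natural logarithm (base E)
SELECT log(10);       -- 2.302585092994046
-- base-10 and explicit-base variants
SELECT log10(10);     -- 1.0
SELECT log(10, 100);  -- 2.0
{code}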



> The LOG function using 10 as the base, but Spark using E
> 
>
> Key: SPARK-28324
> URL: https://issues.apache.org/jira/browse/SPARK-28324
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Spark SQL:
> {code:sql}
> spark-sql> select log(10);
> 2.302585092994046
> {code}
> PostgreSQL:
> {code:sql}
> postgres=# select log(10);
>  log
> -
>1
> (1 row)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28312) Add numeric.sql

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28312:


Assignee: Apache Spark

> Add numeric.sql
> ---
>
> Key: SPARK-28312
> URL: https://issues.apache.org/jira/browse/SPARK-28312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28312) Add numeric.sql

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28312:


Assignee: (was: Apache Spark)

> Add numeric.sql
> ---
>
> Key: SPARK-28312
> URL: https://issues.apache.org/jira/browse/SPARK-28312
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28283) Convert and port 'intersect-all.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881739#comment-16881739
 ] 

Hyukjin Kwon commented on SPARK-28283:
--

Thanks. [~imback82]

> Convert and port 'intersect-all.sql' into UDF test base
> ---
>
> Key: SPARK-28283
> URL: https://issues.apache.org/jira/browse/SPARK-28283
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28284) Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881738#comment-16881738
 ] 

Hyukjin Kwon commented on SPARK-28284:
--

Yea, or we can add some conditions on {{ON}} that return {{true}}.
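
For instance, one statement could be converted along these lines (a 
hypothetical sketch; {{t1}} and {{empty_table}} come from 
join-empty-relation.sql, and wrapping a trivially-true comparison in {{udf}} 
keeps the join semantics):
{code:sql}
-- original
SELECT * FROM t1 INNER JOIN empty_table;

-- converted: route an always-true condition through udf()
SELECT * FROM t1 INNER JOIN empty_table ON udf(1) = udf(1);
{code}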

> Convert and port 'join-empty-relation.sql' into UDF test base
> -
>
> Key: SPARK-28284
> URL: https://issues.apache.org/jira/browse/SPARK-28284
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Description: 
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions or issues such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert *.sql test cases that can be affected by this rule 
{{ExtractPythonUDFs}} like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 Namely, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas.__version__
'0.23.4'
>>> import pyarrow
>>> pyarrow.__version__
'0.13.0'
>>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
pyarrow.Table
a: int64
metadata

OrderedDict([(b'pandas',
  b'{"index_columns": [{"kind": "range", "name": null, "start": '
  b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
  b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
  b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
  b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
  b'mpy_type": "int64", "metadata": null}], "creator": {"library'
  b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
{code}
 
 1. Copy the {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} file 
to {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}

2. Keep the comments and state that this file was copied from 
{{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}}, for now.
 For instance, add a comment like this at the top:
{code:java}
-- This test file was converted from xxx.sql.
{code}
3. Run it as below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git add .
{code}
4. Insert one or more {{udf(...)}} calls into each statement, as in the example 
below. It is not required to add more combinations, and the exact insertion 
points are not strict. Ideally, we should try to place the udf differently in 
each statement.
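
For instance, a statement could be converted like this (a hypothetical example; 
the real statements depend on the file being ported):
{code:sql}
-- original statement in xxx.sql
SELECT k, COUNT(v) FROM t GROUP BY k;

-- converted statement in udf/udf-xxx.sql
SELECT udf(k), udf(COUNT(v)) FROM t GROUP BY k;
{code}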

5. Run it again:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git diff
# or git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out
{code}
6. Compare the results with the original file, 
{{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}

7. If there is a diff, analyze it, file or find the corresponding JIRA, and 
skip the tests with comments, as sketched below. Please see [this 
comment|https://github.com/apache/spark/pull/25090#discussion_r301880585] when 
you file a JIRA.
 It's even better if you are able to fix an issue you find, but this can be 
done separately. There is a great example to check and follow at SPARK-28323, 
done by [~viirya].
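
For instance, a skipped statement could look like the following (a hypothetical 
sketch; the JIRA ID and the query are placeholders):
{code:sql}
-- [SPARK-X] Spark returns a different result here when udf() is involved.
-- Skipped for now; re-enable once the JIRA above is resolved.
-- SELECT udf(a) FROM t1 LEFT SEMI JOIN t2 ON udf(t1.a) = udf(t2.a);
{code}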

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}
9. When you open a PR, please attach {{git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out}} in the PR 
description with the template below:
{code:java}
Diff comparing to 'xxx.sql'


```diff
...  # here you put 'git diff' results
```



{code}
10. You're ready. Please go for a PR! If the PR contains other minor fixes, use 
the {{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is purely 
about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
 See [https://github.com/apache/spark/pull/25069] as an example.

Note that the registered UDFs all return strings, so some differences are 
expected.
 Note that this JIRA targets plan-specific cases in general.
 Note that one {{output.sql.out}} file is shared by three UDF test cases 
(Scala UDF, Python UDF, and Pandas UDF). Keep this in mind when you fix the 
tests.
 Note that this guide is expected to be updated continuously as the work 
progresses.
 Note that this test case uses the integrated UDF test base. See 
[https://github.com/apache/spark/pull/24752] if you're interested in it or find 
an issue.

  was:
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions or issues such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert *.sql test cases that can be affected by this rule 
{{ExtractPythonUDFs}} like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/r

[jira] [Commented] (SPARK-28283) Convert and port 'intersect-all.sql' into UDF test base

2019-07-09 Thread Terry Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881732#comment-16881732
 ] 

Terry Kim commented on SPARK-28283:
---

I will work on this.

> Convert and port 'intersect-all.sql' into UDF test base
> ---
>
> Key: SPARK-28283
> URL: https://issues.apache.org/jira/browse/SPARK-28283
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Description: 
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions or issues such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert *.sql test cases that can be affected by this rule 
{{ExtractPythonUDFs}} like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 Namely, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas.__version__
'0.23.4'
>>> import pyarrow
>>> pyarrow.__version__
'0.13.0'
>>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
pyarrow.Table
a: int64
metadata

OrderedDict([(b'pandas',
  b'{"index_columns": [{"kind": "range", "name": null, "start": '
  b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
  b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
  b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
  b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
  b'mpy_type": "int64", "metadata": null}], "creator": {"library'
  b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
{code}
 
 1. Copy the {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} file 
to {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}

2. Keep the comments and state that this file was copied from 
{{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}}, for now.
 For instance, add a comment like this at the top:
{code:java}
-- This test file was converted from xxx.sql.
{code}
3. Run it as below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git add .
{code}
4. Insert one or more {{udf(...)}} calls into each statement, as in the example 
below. It is not required to add more combinations, and the exact insertion 
points are not strict. Ideally, we should try to place the udf differently in 
each statement.
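
For instance, a statement could be converted like this (a hypothetical example; 
the real statements depend on the file being ported):
{code:sql}
-- original statement in xxx.sql
SELECT k, COUNT(v) FROM t GROUP BY k;

-- converted statement in udf/udf-xxx.sql
SELECT udf(k), udf(COUNT(v)) FROM t GROUP BY k;
{code}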

5. Run it again:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git diff
# or git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out
{code}
6. Compare the results with the original file, 
{{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}

7. If there is a diff, analyze it, file or find the corresponding JIRA, and 
skip the tests with comments, as sketched below. Please see [this 
comment|https://github.com/apache/spark/pull/25090#discussion_r301880585] when 
you file a JIRA.
It's even better if you are able to fix it, but this can be done separately. 
There is a great example to check and follow at SPARK-28323, done by [~viirya].
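
For instance, a skipped statement could look like the following (a hypothetical 
sketch; the JIRA ID and the query are placeholders):
{code:sql}
-- [SPARK-X] Spark returns a different result here when udf() is involved.
-- Skipped for now; re-enable once the JIRA above is resolved.
-- SELECT udf(a) FROM t1 LEFT SEMI JOIN t2 ON udf(t1.a) = udf(t2.a);
{code}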

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}
9. When you open a PR, please attach {{git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out}} in the PR 
description with the template below:
{code:java}
Diff comparing to 'xxx.sql'


```diff
...  # here you put 'git diff' results
```



{code}
10. You're ready. Please go for a PR! If the PR contains other minor fixes, use 
the {{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is purely 
about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
 See [https://github.com/apache/spark/pull/25069] as an example.

Note that the registered UDFs all return strings, so some differences are 
expected.
 Note that this JIRA targets plan-specific cases in general.
 Note that one {{output.sql.out}} file is shared by three UDF test cases 
(Scala UDF, Python UDF, and Pandas UDF). Keep this in mind when you fix the 
tests.
 Note that this guide is expected to be updated continuously as the work 
progresses.
 Note that this test case uses the integrated UDF test base. See 
[https://github.com/apache/spark/pull/24752] if you're interested in it or find 
an issue.

  was:
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions or issues such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert *.sql test cases that can be affected by this rule 
{{ExtractPythonUDFs}} like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-

[jira] [Commented] (SPARK-28284) Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-09 Thread Terry Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881731#comment-16881731
 ] 

Terry Kim commented on SPARK-28284:
---

join-empty-relation.sql has the following:

{code:java}
SELECT * FROM t1 INNER JOIN empty_table;
SELECT * FROM t1 CROSS JOIN empty_table;
SELECT * FROM t1 LEFT OUTER JOIN empty_table;
SELECT * FROM t1 RIGHT OUTER JOIN empty_table;
SELECT * FROM t1 FULL OUTER JOIN empty_table;
SELECT * FROM t1 LEFT SEMI JOIN empty_table;
SELECT * FROM t1 LEFT ANTI JOIN empty_table;

SELECT * FROM empty_table INNER JOIN t1;
SELECT * FROM empty_table CROSS JOIN t1;
SELECT * FROM empty_table LEFT OUTER JOIN t1;
SELECT * FROM empty_table RIGHT OUTER JOIN t1;
SELECT * FROM empty_table FULL OUTER JOIN t1;
SELECT * FROM empty_table LEFT SEMI JOIN t1;
SELECT * FROM empty_table LEFT ANTI JOIN t1;

SELECT * FROM empty_table INNER JOIN empty_table;
SELECT * FROM empty_table CROSS JOIN empty_table;
SELECT * FROM empty_table LEFT OUTER JOIN empty_table;
SELECT * FROM empty_table RIGHT OUTER JOIN empty_table;
SELECT * FROM empty_table FULL OUTER JOIN empty_table;
SELECT * FROM empty_table LEFT SEMI JOIN empty_table;
SELECT * FROM empty_table LEFT ANTI JOIN empty_table;
{code}

Where can I put `udf`? Do you want to modify the SELECT clause?

> Convert and port 'join-empty-relation.sql' into UDF test base
> -
>
> Key: SPARK-28284
> URL: https://issues.apache.org/jira/browse/SPARK-28284
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27923) List all cases that PostgreSQL throws an exception but Spark SQL is NULL

2019-07-09 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27923:

Description: 
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql]
 file, we found a case:
 # Invalid short [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int2.out#L9-L10].

When porting the 
[float4.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql]
 file, we found three cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L43-L74].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L107-L118].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L239-L241].

When porting the 
[float8.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float8.sql]
 file, we found five cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L34-L65].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L38-L41].
 # Cannot take logarithm of zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L439-L440].
 # Cannot take logarithm of a negative number [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L441-L442].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L445-L446].

When porting the 
[numeric.sql|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql]
 file, we found six cases:
 # Invalid decimal [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L689-L696].
 # The decimal type cannot accept [Infinity and 
-Infinity|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L718-L731].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1429-L1460].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1883-L1887].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1940-L1945].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1987-L1998].






  was:
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2

[jira] [Created] (SPARK-28324) The LOG function using 10 as the base, but Spark using E

2019-07-09 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28324:
---

 Summary: The LOG function using 10 as the base, but Spark using E
 Key: SPARK-28324
 URL: https://issues.apache.org/jira/browse/SPARK-28324
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


Spark SQL:
{code:sql}
spark-sql> select log(10);
2.302585092994046
{code}
PostgreSQL:

{code:sql}
postgres=# select log(10);
 log
-
   1
(1 row)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27923) List all cases that PostgreSQL throws an exception but Spark SQL is NULL

2019-07-09 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27923:

Description: 
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql]
 file, we found a case:
 # Invalid short [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int2.out#L9-L10].

When porting the 
[float4.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql]
 file, we found three cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L43-L74].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L107-L118].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L239-L241].

When porting the 
[float8.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float8.sql]
 file, we found five cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L34-L65].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L38-L41].
 # Cannot take logarithm of zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L439-L440].
 # Cannot take logarithm of a negative number [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L441-L442].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L445-L446].

When porting the 
[numeric.sql|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql]
 file, we found five cases:
 # Invalid decimal [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L689-L696].
 # The decimal type cannot accept [Infinity and 
-Infinity|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L718-L731].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1429-L1460].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1883-L1887].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1940-L1945].





  was:
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql]
 file, we found a case:
 # Invalid short [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int2.

[jira] [Updated] (SPARK-27923) List all cases that PostgreSQL throws an exception but Spark SQL is NULL

2019-07-09 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27923:

Description: 
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql]
 file, we found a case:
 # Invalid short [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int2.out#L9-L10].

When porting the 
[float4.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql]
 file, we found three cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L43-L74].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L107-L118].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L239-L241].

When porting the 
[float8.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float8.sql]
 file, we found five cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L34-L65].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L38-L41].
 # Cannot take logarithm of zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L439-L440].
 # Cannot take logarithm of a negative number [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L441-L442].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L445-L446].

When porting the 
[numeric.sql|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql]
 file, we found four cases:
 # Invalid decimal [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L689-L696].
 # The decimal type cannot accept [Infinity and 
-Infinity|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L718-L731].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1429-L1460].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1883-L1887].





  was:
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql]
 file, we found a case:
 # Invalid short [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int2.out#L9-L10].

When porting the 
[float4.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql]
 found three 

[jira] [Updated] (SPARK-27923) List all cases that PostgreSQL throws an exception but Spark SQL is NULL

2019-07-09 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27923:

Description: 
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql]
 file, we found a case:
 # Invalid short [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int2.out#L9-L10].

When porting the 
[float4.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql]
 file, we found three cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L43-L74].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L107-L118].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L239-L241].

When porting the 
[float8.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float8.sql]
 file, we found five cases:
 # Bad input [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L34-L65].
 # Bad special inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L38-L41].
 # Cannot take logarithm of zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L439-L440].
 # Cannot take logarithm of a negative number [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float8.out#L441-L442].
 # Divide by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/float4.out#L445-L446].

When porting the 
[numeric.sql|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/numeric.sql]
 file, we found four cases:
 # Invalid decimal [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L689-L696].
 # The decimal type cannot accept [Infinity and 
-Infinity|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L718-L731].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1429-L1460].
# Invalid inputs [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1883-L1887].





  was:
In this ticket, we plan to list all cases where PostgreSQL throws an exception 
but Spark SQL returns NULL.

When porting the 
[boolean.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/boolean.sql]
 file, we found a case:
 # Cast unaccepted value to boolean type throws [invalid input 
syntax|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/boolean.out#L45-L47].

When porting the 
[case.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/case.sql]
 file, we found a case:
 # Division by zero [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/case.out#L96-L99].

When porting the 
[date.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/date.sql]
 file, we found a case:
 # Invalid date [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/date.out#L13-L14].

When porting the 
[int2.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int2.sql]
 file, we found a case:
 # Invalid short [throws an 
exception|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/expected/int2.out#L9-L10].

When porting the 
[float4.sql|https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/float4.sql]
 found three c

[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Description: 
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions or issues such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert *.sql test cases that can be affected by this rule 
{{ExtractPythonUDFs}} like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 Namely, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas.__version__
'0.23.4'
>>> import pyarrow
>>> pyarrow.__version__
'0.13.0'
>>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
pyarrow.Table
a: int64
metadata

OrderedDict([(b'pandas',
  b'{"index_columns": [{"kind": "range", "name": null, "start": '
  b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
  b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
  b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
  b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
  b'mpy_type": "int64", "metadata": null}], "creator": {"library'
  b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
{code}
 
 1. Copy the {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} file 
to {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}

2. Keep the comments and state that this file was copied from 
{{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}}, for now.
 For instance, add a comment like this at the top:
{code:java}
-- This test file was converted from xxx.sql.
{code}
3. Run it as below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git add .
{code}
4. Insert one or more {{udf(...)}} calls into each statement, as in the example 
below. It is not required to add more combinations, and the exact insertion 
points are not strict. Ideally, we should try to place the udf differently in 
each statement.
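
For instance, a statement could be converted like this (a hypothetical example; 
the real statements depend on the file being ported):
{code:sql}
-- original statement in xxx.sql
SELECT k, COUNT(v) FROM t GROUP BY k;

-- converted statement in udf/udf-xxx.sql
SELECT udf(k), udf(COUNT(v)) FROM t GROUP BY k;
{code}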

5. Run it again:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git diff
# or git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out
{code}
6. Compare the results with the original file, 
{{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}

7. If there is a diff, analyze it, file or find the corresponding JIRA, and 
skip the tests with comments, as sketched below. Please see [this 
comment|https://github.com/apache/spark/pull/25090#discussion_r301880585] when 
you file a JIRA.
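
For instance, a skipped statement could look like the following (a hypothetical 
sketch; the JIRA ID and the query are placeholders):
{code:sql}
-- [SPARK-X] Spark returns a different result here when udf() is involved.
-- Skipped for now; re-enable once the JIRA above is resolved.
-- SELECT udf(a) FROM t1 LEFT SEMI JOIN t2 ON udf(t1.a) = udf(t2.a);
{code}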

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}
9. When you open a PR, please attach {{git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out}} in the PR 
description with the template below:
{code:java}
Diff comparing to 'xxx.sql'


```diff
...  # here you put 'git diff' results
```



{code}
10. You're ready. Please go for a PR! If the PR contains other minor fixes, use 
the {{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is purely 
about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
 See [https://github.com/apache/spark/pull/25069] as an example.

Note that the registered UDFs all return strings, so some differences are 
expected.
 Note that this JIRA targets plan-specific cases in general.
 Note that one {{output.sql.out}} file is shared by three UDF test cases 
(Scala UDF, Python UDF, and Pandas UDF). Keep this in mind when you fix the 
tests.
 Note that this guide is expected to be updated continuously as the work 
progresses.
 Note that this test case uses the integrated UDF test base. See 
[https://github.com/apache/spark/pull/24752] if you're interested in it or find 
an issue.

  was:
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions or issues such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert *.sql test cases that can be affected by this rule 
{{ExtractPythonUDFs}} like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 Namely, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure y

[jira] [Commented] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881717#comment-16881717
 ] 

Liang-Chi Hsieh commented on SPARK-28323:
-

I found this bug when doing SPARK-28276.

> PythonUDF should be able to use in join condition
> -
>
> Key: SPARK-28323
> URL: https://issues.apache.org/jira/browse/SPARK-28323
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> There is a bug in {{ExtractPythonUDFs}} that produces wrong result 
> attributes. It causes a failure when PythonUDFs are used across multiple 
> child plans, e.g., in a join. An example is using a PythonUDF in a join 
> condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28323:


Assignee: (was: Apache Spark)

> PythonUDF should be able to use in join condition
> -
>
> Key: SPARK-28323
> URL: https://issues.apache.org/jira/browse/SPARK-28323
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> There is a bug in {{ExtractPythonUDFs}} that produces wrong result 
> attributes. It causes a failure when PythonUDFs are used across multiple 
> child plans, e.g., in a join. An example is using a PythonUDF in a join 
> condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28323:


Assignee: Apache Spark

> PythonUDF should be able to use in join condition
> -
>
> Key: SPARK-28323
> URL: https://issues.apache.org/jira/browse/SPARK-28323
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> There is a bug in {{ExtractPythonUDFs}} that produces wrong result 
> attributes. It causes a failure when PythonUDFs are used across multiple 
> child plans, e.g., in a join. An example is using a PythonUDF in a join 
> condition.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28323) PythonUDF should be able to use in join condition

2019-07-09 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28323:
---

 Summary: PythonUDF should be able to use in join condition
 Key: SPARK-28323
 URL: https://issues.apache.org/jira/browse/SPARK-28323
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 3.0.0
Reporter: Liang-Chi Hsieh


There is a bug in {{ExtractPythonUDFs}} that produces wrong result attributes. 
It causes a failure when PythonUDFs are used across multiple child plans, e.g., 
in a join. An example is using a PythonUDF in a join condition.
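
In the integrated UDF test base, a failing shape looks roughly like the 
following (a hypothetical illustration; the table and column names are 
placeholders, and {{udf}} can be backed by a Python UDF there):
{code:sql}
-- a PythonUDF-backed udf() on both sides of the join condition hits this bug
SELECT * FROM t1 JOIN t2 ON udf(t1.id) = udf(t2.id);
{code}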



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-28073) ANSI SQL: Character literals

2019-07-09 Thread jiaan.geng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-28073:
---
Comment: was deleted

(was: I'm working on.)

> ANSI SQL: Character literals
> 
>
> Key: SPARK-28073
> URL: https://issues.apache.org/jira/browse/SPARK-28073
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> ||Feature ID||Feature Name||Feature Description||
> |E021-03|Character literals|— Subclause 5.3, “<literal>”: <quote> [ 
> <character representation>... ] <quote> |
> Example:
> {code:sql}
> SELECT 'first line'
> ' - next line'
>   ' - third line'
>   AS "Three lines to one";
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28136) Add int8.sql

2019-07-09 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28136:
-

Assignee: Yuming Wang

> Add int8.sql
> 
>
> Key: SPARK-28136
> URL: https://issues.apache.org/jira/browse/SPARK-28136
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int8.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28136) Port int8.sql

2019-07-09 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28136:
--
Summary: Port int8.sql  (was: Add int8.sql)

> Port int8.sql
> -
>
> Key: SPARK-28136
> URL: https://issues.apache.org/jira/browse/SPARK-28136
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int8.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28136) Add int8.sql

2019-07-09 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28136.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24933
[https://github.com/apache/spark/pull/24933]

> Add int8.sql
> 
>
> Key: SPARK-28136
> URL: https://issues.apache.org/jira/browse/SPARK-28136
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In this ticket, we plan to add the regression test cases of 
> https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/int8.sql.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28278) Convert and port 'except-all.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28278:


Assignee: Apache Spark

> Convert and port 'except-all.sql' into UDF test base
> 
>
> Key: SPARK-28278
> URL: https://issues.apache.org/jira/browse/SPARK-28278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28278) Convert and port 'except-all.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28278:


Assignee: (was: Apache Spark)

> Convert and port 'except-all.sql' into UDF test base
> 
>
> Key: SPARK-28278
> URL: https://issues.apache.org/jira/browse/SPARK-28278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28322) DIV support decimal type

2019-07-09 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881694#comment-16881694
 ] 

Yuming Wang commented on SPARK-28322:
-

{{DIV}} and {{/}} are a little different:
{code:sql}
select 12345678901234567890 / 123;
  ?column?  

 100371373180768845
(1 row)

select div(12345678901234567890, 123);
div 

 100371373180768844
(1 row)
{code}
[https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/expected/numeric.out#L1564-L1574]
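For reference, the gap is rounding semantics rather than precision: {{/}} rounds 
the exact quotient, while {{div}} truncates it toward zero. A minimal 
plain-Python sketch of the same arithmetic (an illustration, not Spark code):
{code:python}
from decimal import Decimal, ROUND_HALF_UP

a, b = Decimal("12345678901234567890"), Decimal("123")

# '/' computes the exact quotient and rounds it: ...844.63... -> ...845
print((a / b).to_integral_value(rounding=ROUND_HALF_UP))  # 100371373180768845

# div truncates the quotient toward zero: ...844.63... -> ...844
print(a // b)  # Decimal's divide-integer truncates: 100371373180768844
{code}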

> DIV support decimal type
> 
>
> Key: SPARK-28322
> URL: https://issues.apache.org/jira/browse/SPARK-28322
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Spark SQL:
> {code:sql}
> spark-sql> SELECT DIV(CAST(10 AS DECIMAL), CAST(3 AS DECIMAL));
> Error in query: cannot resolve '(CAST(10 AS DECIMAL(10,0)) div CAST(3 AS 
> DECIMAL(10,0)))' due to data type mismatch: '(CAST(10 AS DECIMAL(10,0)) div 
> CAST(3 AS DECIMAL(10,0)))' requires integral type, not decimal(10,0); line 1 
> pos 7;
> 'Project [unresolvedalias((cast(10 as decimal(10,0)) div cast(3 as 
> decimal(10,0))), None)]
> +- OneRowRelation
> {code}
> PostgreSQL:
> {code:sql}
> postgres=# SELECT DIV(CAST(10 AS DECIMAL), CAST(3 AS DECIMAL));
>  div
> -
>3
> (1 row)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28275) Convert and port 'count.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28275:


Assignee: (was: Apache Spark)

> Convert and port 'count.sql' into UDF test base
> ---
>
> Key: SPARK-28275
> URL: https://issues.apache.org/jira/browse/SPARK-28275
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28275) Convert and port 'count.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28275:


Assignee: Apache Spark

> Convert and port 'count.sql' into UDF test base
> ---
>
> Key: SPARK-28275
> URL: https://issues.apache.org/jira/browse/SPARK-28275
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28322) DIV support decimal type

2019-07-09 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881691#comment-16881691
 ] 

Yuming Wang commented on SPARK-28322:
-

cc [~mgaido]

> DIV support decimal type
> 
>
> Key: SPARK-28322
> URL: https://issues.apache.org/jira/browse/SPARK-28322
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Spark SQL:
> {code:sql}
> spark-sql> SELECT DIV(CAST(10 AS DECIMAL), CAST(3 AS DECIMAL));
> Error in query: cannot resolve '(CAST(10 AS DECIMAL(10,0)) div CAST(3 AS 
> DECIMAL(10,0)))' due to data type mismatch: '(CAST(10 AS DECIMAL(10,0)) div 
> CAST(3 AS DECIMAL(10,0)))' requires integral type, not decimal(10,0); line 1 
> pos 7;
> 'Project [unresolvedalias((cast(10 as decimal(10,0)) div cast(3 as 
> decimal(10,0))), None)]
> +- OneRowRelation
> {code}
> PostgreSQL:
> {code:sql}
> postgres=# SELECT DIV(CAST(10 AS DECIMAL), CAST(3 AS DECIMAL));
>  div
> -
>3
> (1 row)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28322) DIV support decimal type

2019-07-09 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28322:
---

 Summary: DIV support decimal type
 Key: SPARK-28322
 URL: https://issues.apache.org/jira/browse/SPARK-28322
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


Spark SQL:
{code:sql}
spark-sql> SELECT DIV(CAST(10 AS DECIMAL), CAST(3 AS DECIMAL));
Error in query: cannot resolve '(CAST(10 AS DECIMAL(10,0)) div CAST(3 AS 
DECIMAL(10,0)))' due to data type mismatch: '(CAST(10 AS DECIMAL(10,0)) div 
CAST(3 AS DECIMAL(10,0)))' requires integral type, not decimal(10,0); line 1 
pos 7;
'Project [unresolvedalias((cast(10 as decimal(10,0)) div cast(3 as 
decimal(10,0))), None)]
+- OneRowRelation
{code}

PostgreSQL:
{code:sql}
postgres=# SELECT DIV(CAST(10 AS DECIMAL), CAST(3 AS DECIMAL));
 div
-
   3
(1 row)
{code}
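Until {{div}} accepts decimals, a possible workaround is to divide with {{/}} 
and truncate the quotient yourself, e.g. via a cast to a long. A hedged PySpark 
sketch (an illustration only; it assumes the truncated quotient fits in a 
bigint, and relies on decimal-to-long casts truncating toward zero, which 
matches div semantics):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Emulate an integral div on decimals: divide, then truncate toward zero
# by casting the decimal quotient to LONG.
spark.sql("""
    SELECT CAST(CAST(10 AS DECIMAL(10, 0)) / CAST(3 AS DECIMAL(10, 0)) AS LONG) AS q
""").show()
# prints a single-row DataFrame whose value is 3, matching PostgreSQL's div
{code}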




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28288) Convert and port 'window.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881689#comment-16881689
 ] 

Hyukjin Kwon commented on SPARK-28288:
--

Please go ahead.

> Convert and port 'window.sql' into UDF test base
> 
>
> Key: SPARK-28288
> URL: https://issues.apache.org/jira/browse/SPARK-28288
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28288) Convert and port 'window.sql' into UDF test base

2019-07-09 Thread YoungGyu Chun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881675#comment-16881675
 ] 

YoungGyu Chun commented on SPARK-28288:
---

Hello [~hyukjin.kwon],

I'll be working on this. Thank you.

> Convert and port 'window.sql' into UDF test base
> 
>
> Key: SPARK-28288
> URL: https://issues.apache.org/jira/browse/SPARK-28288
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28274) Convert and port 'pgSQL/window.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881671#comment-16881671
 ] 

Hyukjin Kwon commented on SPARK-28274:
--

We should wait for SPARK-23160

> Convert and port 'pgSQL/window.sql' into UDF test base
> --
>
> Key: SPARK-28274
> URL: https://issues.apache.org/jira/browse/SPARK-28274
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See SPARK-23160



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28274) Convert and port 'pgSQL/window.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881670#comment-16881670
 ] 

Hyukjin Kwon commented on SPARK-28274:
--

Oops, seems like it was my mistake.

> Convert and port 'pgSQL/window.sql' into UDF test base
> --
>
> Key: SPARK-28274
> URL: https://issues.apache.org/jira/browse/SPARK-28274
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See SPARK-23160



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28274) Convert and port 'pgSQL/window.sql' into UDF test base

2019-07-09 Thread Terry Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881664#comment-16881664
 ] 

Terry Kim commented on SPARK-28274:
---

[~hyukjin.kwon] I don't see the window.sql file under 
sql/core/src/test/resources/sql-tests/inputs/pgSQL/. 

> Convert and port 'pgSQL/window.sql' into UDF test base
> --
>
> Key: SPARK-28274
> URL: https://issues.apache.org/jira/browse/SPARK-28274
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See SPARK-23160



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27889) Make development scripts under dev/ support Python 3

2019-07-09 Thread Weichen Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881663#comment-16881663
 ] 

Weichen Xu commented on SPARK-27889:


Discussed with [~mengxr] offline. I will work on this.

> Make development scripts under dev/ support Python 3
> 
>
> Key: SPARK-27889
> URL: https://issues.apache.org/jira/browse/SPARK-27889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, Deploy
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiao Li
>Priority: Major
>
> Some of our internal Python scripts under dev/ only support Python 2. With 
> the deprecation of Python 2, we should make those scripts support Python 3, so 
> developers have a way to avoid seeing the deprecation warning.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25382) Remove ImageSchema.readImages in 3.0

2019-07-09 Thread Weichen Xu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881662#comment-16881662
 ] 

Weichen Xu commented on SPARK-25382:


I will work on this. Thanks!

> Remove ImageSchema.readImages in 3.0
> 
>
> Key: SPARK-25382
> URL: https://issues.apache.org/jira/browse/SPARK-25382
> Project: Spark
>  Issue Type: Task
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> A follow-up task from SPARK-25345. We might need to support sampling 
> (SPARK-25383) in order to remove readImages.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28316) Decimal precision issue

2019-07-09 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881648#comment-16881648
 ] 

Yuming Wang commented on SPARK-28316:
-

cc [~joshrosen] [~cloud_fan] [~Gengliang.Wang]

> Decimal precision issue
> ---
>
> Key: SPARK-28316
> URL: https://issues.apache.org/jira/browse/SPARK-28316
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Multiply check:
> {code:sql}
> -- Spark SQL
> spark-sql> select cast(-34338492.215397047 as decimal(38, 10)) * 
> cast(-34338492.215397047 as decimal(38, 10));
> 1179132047626883.596862
> -- PostgreSQL
> postgres=# select cast(-34338492.215397047 as numeric(38, 10)) * 
> cast(-34338492.215397047 as numeric(38, 10));
>?column?
> ---
>  1179132047626883.59686213585632020900
> (1 row)
> {code}
> Division check:
> {code:sql}
> -- Spark SQL
> spark-sql> select cast(93901.57763026 as decimal(38, 10)) / cast(4.31 as 
> decimal(38, 10));
> 21786.908963
> -- PostgreSQL
> postgres=# select cast(93901.57763026 as numeric(38, 10)) / cast(4.31 as 
> numeric(38, 10));
>   ?column?
> 
>  21786.908962937355
> (1 row)
> {code}
> POWER(10, LN(value)) check:
> {code:sql}
> -- Spark SQL
> spark-sql> SELECT CAST(POWER(cast('10' as decimal(38, 18)), 
> LN(ABS(round(cast(-24926804.04504742 as decimal(38, 10)), 200)))) AS 
> decimal(38, 10));
> 107511333880051856
> -- PostgreSQL
> postgres=# SELECT CAST(POWER(cast('10' as numeric(38, 18)), 
> LN(ABS(round(cast(-24926804.04504742 as numeric(38, 10)), 200)))) AS 
> numeric(38, 10));
>  power
> ---
>  107511333880052007.0414112467
> (1 row)
> {code}
> AVG, STDDEV and VARIANCE returns double type:
> {code:sql}
> -- Spark SQL
> spark-sql> create temporary view t1 as select * from values
>  >   (cast(-24926804.04504742 as decimal(38, 10))),
>  >   (cast(16397.038491 as decimal(38, 10))),
>  >   (cast(7799461.4119 as decimal(38, 10)))
>  >   as t1(t);
> spark-sql> SELECT AVG(t), STDDEV(t), VARIANCE(t) FROM t1;
> -5703648.53155214	1.7096528995154984E7	2.922913036821751E14
> -- PostgreSQL
> postgres=# SELECT AVG(t), STDDEV(t), VARIANCE(t)  from (values 
> (cast(-24926804.04504742 as decimal(38, 10))), (cast(16397.038491 as 
> decimal(38, 10))), (cast(7799461.4119 as decimal(38, 10)))) t1(t);
>          avg          |            stddev             |           variance
> -----------------------+-------------------------------+------------------------------
>      -5703648.53155214 | 17096528.99515498420743029415 | 292291303682175.094017569588
> (1 row)
> {code}
> EXP returns double type:
> {code:sql}
> -- Spark SQL
> spark-sql> select exp(cast(1.0 as decimal(31,30)));
> 2.718281828459045
> -- PostgreSQL
> postgres=# select exp(cast(1.0 as decimal(31,30)));
>exp
> --
>  2.718281828459045235360287471353
> (1 row)
> {code}
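For comparison, the digits Spark drops are recoverable with arbitrary-precision 
decimal arithmetic. A small plain-Python sketch for the multiply case (using the 
{{decimal}} module as a rough stand-in for PostgreSQL's {{numeric}}; not Spark 
code):
{code:python}
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough precision to keep the 34-digit product exact

x = Decimal("-34338492.215397047")
print(x * x)
# 1179132047626883.596862135856320209
# i.e. PostgreSQL's 1179132047626883.59686213585632020900 without the trailing
# zero padding, while Spark keeps only 1179132047626883.596862
{code}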



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28278) Convert and port 'except-all.sql' into UDF test base

2019-07-09 Thread Terry Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881645#comment-16881645
 ] 

Terry Kim commented on SPARK-28278:
---

I will work on this.

> Convert and port 'except-all.sql' into UDF test base
> 
>
> Key: SPARK-28278
> URL: https://issues.apache.org/jira/browse/SPARK-28278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Description: 
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by the 
{{ExtractPythonUDFs}} rule, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 In other words, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas.__version__
'0.23.4'
>>> import pyarrow
>>> pyarrow.__version__
'0.13.0'
>>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
pyarrow.Table
a: int64
metadata

OrderedDict([(b'pandas',
  b'{"index_columns": [{"kind": "range", "name": null, "start": '
  b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
  b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
  b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
  b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
  b'mpy_type": "int64", "metadata": null}], "creator": {"library'
  b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
{code}
 
 1. Copy and paste {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} 
file into {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}

2. Keep the comments and state that this file was copied from 
{{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}}, for now.
 For instance, let's add a comment like the following at the top:
{code:java}
-- This test file was converted from xxx.sql.
{code}
3. Run it below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git add .
{code}
4. Insert one or multiple {{udf(...)}}s into each statement, as sketched below. 
It is not required to add more combinations, and the exact placement is not 
strict.
 Ideally, we should try to place the udf differently in each statement.
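For instance, a hypothetical before/after for one statement (the table and 
column names here are made up; the exact placement of {{udf}} is up to the 
author):
{code:python}
# Step 4 illustration: wrap expressions from xxx.sql in udf(...) when
# porting the statement into udf/udf-xxx.sql.
original = "SELECT a, COUNT(b) FROM testData GROUP BY a"
converted = "SELECT udf(a), udf(COUNT(b)) FROM testData GROUP BY udf(a)"
print(converted)
{code}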

5. Run it below again:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git diff
# or git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out
{code}
6. Compare results with original file, 
{{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}

7. If there is a diff, analyze it, file a new JIRA or find the existing one, and 
skip the affected tests with comments.

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}
9. When you open a PR, please attach {{git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out}} in the PR 
description with the template below:
{code:java}
Diff comparing to 'xxx.sql'


```diff
...  # here you put 'git diff' results
```



{code}
10. You're ready. Please go for a PR! If the PR contains other minor fixes, use 
{{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is purely 
about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
 See [https://github.com/apache/spark/pull/25069] as an example.

Note that the registered UDFs all return strings, so some differences are 
expected.
 Note that this JIRA targets plan-specific cases in general.
 Note that one {{output.sql.out}} file is shared for three UDF test cases 
(Scala UDF, Python UDF, and Pandas UDF). Keep this in mind when you fix the tests.
 Note that this guide is expected to be updated continuously as the work progresses.
 Note that these test cases use the integrated UDF test base. See 
[https://github.com/apache/spark/pull/24752] if you're interested in it or find 
an issue.

  was:
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by the 
{{ExtractPythonUDFs}} rule, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 In other words, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import p

[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Description: 
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by the 
{{ExtractPythonUDFs}} rule, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 In other words, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas.__version__
'0.23.4'
>>> import pyarrow
>>> pyarrow.__version__
'0.13.0'
>>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
pyarrow.Table
a: int64
metadata

OrderedDict([(b'pandas',
  b'{"index_columns": [{"kind": "range", "name": null, "start": '
  b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
  b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
  b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
  b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
  b'mpy_type": "int64", "metadata": null}], "creator": {"library'
  b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
{code}
 
 1. Copy and paste {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} 
file into {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}

2. Keep the comments and state that this file was copied from 
{{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}}, for now.
 For instance, let's add a comment like the following at the top:
{code:java}
-- This test file was converted from xxx.sql.
{code}
3. Run it below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git add .
{code}
4. Insert one or multiple {{udf(...)}}s into each statement. It is not required 
to add more combinations, and the exact placement is not strict.
 Ideally, we should try to place the udf differently in each statement.

5. Run it below again:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git diff
# or git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out
{code}
6. Compare results with original file, 
{{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}

7. If there is a diff, analyze it, file a new JIRA or find the existing one, and 
skip the affected tests with comments.

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}
9. When you open a PR, please attach {{git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out}} in the PR 
description with the template below:
{code:java}
Diff comparing to 'xxx.sql'


```diff
...  # here you put 'git diff' results
```



{code}
10. You're ready. Please go for a PR! If the PR contains other minor fixes, use 
{{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is purely 
about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
 See [https://github.com/apache/spark/pull/25069] as an example.

Note that the registered UDFs all return strings, so some differences are 
expected.
 Note that this JIRA targets plan-specific cases in general.
 Note that one {{output.sql.out}} file is shared for three UDF test cases 
(Scala UDF, Python UDF, and Pandas UDF). Keep this in mind when you fix the tests.
 Note that this guide is expected to be updated continuously as the work progresses.
 Note that these test cases use the integrated UDF test base. See 
[https://github.com/apache/spark/pull/24752] if you're interested in it or find 
an issue.

  was:
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by the 
{{ExtractPythonUDFs}} rule, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 In other words, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import p

[jira] [Updated] (SPARK-28281) Convert and port 'having.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28281:
-
Component/s: Tests
 PySpark

> Convert and port 'having.sql' into UDF test base
> 
>
> Key: SPARK-28281
> URL: https://issues.apache.org/jira/browse/SPARK-28281
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28286) Convert and port 'pivot.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28286:
-
Component/s: Tests
 PySpark

> Convert and port 'pivot.sql' into UDF test base
> ---
>
> Key: SPARK-28286
> URL: https://issues.apache.org/jira/browse/SPARK-28286
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28280) Convert and port 'group-by.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28280:
-
Component/s: Tests
 PySpark

> Convert and port 'group-by.sql' into UDF test base
> --
>
> Key: SPARK-28280
> URL: https://issues.apache.org/jira/browse/SPARK-28280
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28278) Convert and port 'except-all.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28278:
-
Component/s: Tests
 PySpark

> Convert and port 'except-all.sql' into UDF test base
> 
>
> Key: SPARK-28278
> URL: https://issues.apache.org/jira/browse/SPARK-28278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28282) Convert and port 'inline-table.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28282:
-
Component/s: Tests
 PySpark

> Convert and port 'inline-table.sql' into UDF test base
> --
>
> Key: SPARK-28282
> URL: https://issues.apache.org/jira/browse/SPARK-28282
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28287) Convert and port 'udaf.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28287:
-
Component/s: Tests
 PySpark

> Convert and port 'udaf.sql' into UDF test base
> --
>
> Key: SPARK-28287
> URL: https://issues.apache.org/jira/browse/SPARK-28287
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28275) Convert and port 'count.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28275:
-
Component/s: Tests
 PySpark

> Convert and port 'count.sql' into UDF test base
> ---
>
> Key: SPARK-28275
> URL: https://issues.apache.org/jira/browse/SPARK-28275
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28279) Convert and port 'group-analysis.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28279:
-
Component/s: Tests
 PySpark

> Convert and port 'group-analysis.sql' into UDF test base
> 
>
> Key: SPARK-28279
> URL: https://issues.apache.org/jira/browse/SPARK-28279
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28289) Convert and port 'union.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28289:
-
Component/s: Tests
 PySpark

> Convert and port 'union.sql' into UDF test base
> ---
>
> Key: SPARK-28289
> URL: https://issues.apache.org/jira/browse/SPARK-28289
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28288) Convert and port 'window.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28288:
-
Component/s: Tests
 PySpark

> Convert and port 'window.sql' into UDF test base
> 
>
> Key: SPARK-28288
> URL: https://issues.apache.org/jira/browse/SPARK-28288
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28277) Convert and port 'except.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28277:
-
Component/s: Tests
 PySpark

> Convert and port 'except.sql' into UDF test base
> 
>
> Key: SPARK-28277
> URL: https://issues.apache.org/jira/browse/SPARK-28277
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28273) Convert and port 'pgSQL/case.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28273:
-
Component/s: Tests
 PySpark

> Convert and port 'pgSQL/case.sql' into UDF test base
> 
>
> Key: SPARK-28273
> URL: https://issues.apache.org/jira/browse/SPARK-28273
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> See SPARK-27934



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28277) Convert and port 'except.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881629#comment-16881629
 ] 

Hyukjin Kwon commented on SPARK-28277:
--

Please go ahead!

> Convert and port 'except.sql' into UDF test base
> 
>
> Key: SPARK-28277
> URL: https://issues.apache.org/jira/browse/SPARK-28277
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28283) Convert and port 'intersect-all.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28283:
-
Component/s: Tests
 PySpark

> Convert and port 'intersect-all.sql' into UDF test base
> ---
>
> Key: SPARK-28283
> URL: https://issues.apache.org/jira/browse/SPARK-28283
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28285) Convert and port 'outer-join.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28285:
-
Component/s: Tests
 PySpark

> Convert and port 'outer-join.sql' into UDF test base
> 
>
> Key: SPARK-28285
> URL: https://issues.apache.org/jira/browse/SPARK-28285
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27922) Convert and port 'natural-join.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27922:
-
Component/s: Tests

> Convert and port 'natural-join.sql' into UDF test base
> --
>
> Key: SPARK-27922
> URL: https://issues.apache.org/jira/browse/SPARK-27922
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28276) Convert and port 'cross-join.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28276:
-
Component/s: Tests
 PySpark

> Convert and port 'cross-join.sql' into UDF test base
> 
>
> Key: SPARK-28276
> URL: https://issues.apache.org/jira/browse/SPARK-28276
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28285) Convert and port 'outer-join.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881630#comment-16881630
 ] 

Hyukjin Kwon commented on SPARK-28285:
--

Thanks.

> Convert and port 'outer-join.sql' into UDF test base
> 
>
> Key: SPARK-28285
> URL: https://issues.apache.org/jira/browse/SPARK-28285
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28272) Convert and port 'pgSQL/aggregates_part3.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28272:
-
Component/s: Tests
 PySpark

> Convert and port 'pgSQL/aggregates_part3.sql' into UDF test base
> 
>
> Key: SPARK-28272
> URL: https://issues.apache.org/jira/browse/SPARK-28272
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> see SPARK-27988



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28274) Convert and port 'pgSQL/window.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28274:
-
Component/s: Tests
 PySpark

> Convert and port 'pgSQL/window.sql' into UDF test base
> --
>
> Key: SPARK-28274
> URL: https://issues.apache.org/jira/browse/SPARK-28274
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> See SPARK-23160



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28284) Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28284:
-
Component/s: Tests
 PySpark

> Convert and port 'join-empty-relation.sql' into UDF test base
> -
>
> Key: SPARK-28284
> URL: https://issues.apache.org/jira/browse/SPARK-28284
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28271) Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28271:
-
Component/s: PySpark

> Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base
> 
>
> Key: SPARK-28271
> URL: https://issues.apache.org/jira/browse/SPARK-28271
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> see SPARK-27883



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28270) Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28270:
-
Component/s: Tests
 PySpark

> Convert and port 'pgSQL/aggregates_part1.sql' into UDF test base
> 
>
> Key: SPARK-28270
> URL: https://issues.apache.org/jira/browse/SPARK-28270
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> see SPARK-27770



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Component/s: Tests

> Convert applicable *.sql tests into UDF integrated test base
> 
>
> Key: SPARK-27921
> URL: https://issues.apache.org/jira/browse/SPARK-27921
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This JIRA aims to improve Python test coverage, in particular around 
> {{ExtractPythonUDFs}}.
>  This rule has caused many regressions and issues, such as SPARK-27803, 
> SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
>  We should convert the *.sql test cases that can be affected by the 
> {{ExtractPythonUDFs}} rule, like 
> [https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
>  In other words, most plan-related test cases might have to be converted.
> *Here is the rough contribution guide to follow:*
> Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
> you're able to do this:
> {code:java}
> >>> import pandas
> >>> pandas.__version__
> '0.23.4'
> >>> import pyarrow
> >>> pyarrow.__version__
> '0.13.0'
> >>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
> pyarrow.Table
> a: int64
> metadata
> 
> OrderedDict([(b'pandas',
>   b'{"index_columns": [{"kind": "range", "name": null, "start": '
>   b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
>   b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
>   b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
>   b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
>   b'mpy_type": "int64", "metadata": null}], "creator": {"library'
>   b'": "pyarrow", "version": "0.13.0"}, "pandas_version": 
> null}')])
> {code}
>  
>  1. Copy and paste {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} 
> file into {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}
> 2. Keep the comments and state that this file was copied from 
> {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}}, for now.
> For instance, let's add a comment like the following at the top:
> {code}
> -- This test file was converted from xxx.sql.
> {code}
> 3. Run it below:
> {code:java}
> SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- 
> -z udf/udf-xxx.sql"
> git add .
> {code}
> 4. Insert one or multiple {{udf(...)}}s into each statement. It is not 
> required to add more combinations, and the exact placement is not strict.
>  Ideally, we should try to place the udf differently in each statement.
> 5. Run it below again:
> {code:java}
> SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- 
> -z udf/udf-xxx.sql"
> git diff
> # or git diff --no-index 
> sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
> sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out
> {code}
> 6. Compare results with original file, 
> {{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}
> 7. If there is a diff, analyze it, file a new JIRA or find the existing one, 
> and skip the affected tests with comments.
> 8. Run without generating golden files and check:
> {code:java}
> build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
> {code}
> 9. When you open a PR, please attach {{git diff --no-index 
> sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
> sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out}} in the PR 
> description with the template below:
> {code:java}
> Diff comparing to 'xxx.sql'
> 
> ```diff
> ...  # here you put 'git diff' results
> ```
> 
> 
> {code}
> 10. You're ready. Please go for a PR! If the PR contains other minor fixes, 
> use {{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is 
> purely about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
> See https://github.com/apache/spark/pull/25069 as an example.
> Note that the registered UDFs all return strings, so some differences are 
> expected.
> Note that this JIRA targets plan-specific cases in general.
> Note that one {{output.sql.out}} file is shared for three UDF test cases 
> (Scala UDF, Python UDF, and Pandas UDF). Keep this in mind when you fix the 
> tests.
> Note that this guide is expected to be updated continuously as the work 
> progresses.
> Note that these test cases use the integrated UDF test base. See 
> https://github.com/apache/spark/pull/24752 if you're interested in it or find 
> an issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

--

[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Description: 
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by the 
{{ExtractPythonUDFs}} rule, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 In other words, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas.__version__
'0.23.4'
>>> import pyarrow
>>> pyarrow.__version__
'0.13.0'
>>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
pyarrow.Table
a: int64
metadata

OrderedDict([(b'pandas',
  b'{"index_columns": [{"kind": "range", "name": null, "start": '
  b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
  b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
  b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
  b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
  b'mpy_type": "int64", "metadata": null}], "creator": {"library'
  b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
{code}
 
 1. Copy and paste {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} 
file into {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}

2. Keep the comments and state that this file was copied from 
{{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}}, for now.
For instance, let's add a comment like the following at the top:

{code}
-- This test file was converted from xxx.sql.
{code}

3. Run it below:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git add .
{code}
4. Insert one or multiple {{udf(...)}}s into each statement. It is not required 
to add more combinations, and the exact placement is not strict.
 Ideally, we should try to place the udf differently in each statement.

5. Run it below again:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git diff
# or git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out
{code}
6. Compare results with original file, 
{{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}

7. If there is a diff, analyze it, file a new JIRA or find the existing one, and 
skip the affected tests with comments.

8. Run without generating golden files and check:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}
9. When you open a PR, please attach {{git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/xxx.sql.out}} in the PR 
description with the template below:
{code:java}
Diff comparing to 'xxx.sql'


```diff
...  # here you put 'git diff' results
```



{code}
10. You're ready. Please go for a PR! If the PR contains other minor fixes, use 
{{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is purely 
about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
See https://github.com/apache/spark/pull/25069 as an example.


Note that the registered UDFs all return strings, so some differences are 
expected.
Note that this JIRA targets plan-specific cases in general.
Note that one {{output.sql.out}} file is shared for three UDF test cases (Scala 
UDF, Python UDF, and Pandas UDF). Keep this in mind when you fix the tests.
Note that this guide is expected to be updated continuously as the work 
progresses.
Note that these test cases use the integrated UDF test base. See 
https://github.com/apache/spark/pull/24752 if you're interested in it or find 
an issue.

  was:
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by the 
{{ExtractPythonUDFs}} rule, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 In other words, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pa

[jira] [Updated] (SPARK-27921) Convert applicable *.sql tests into UDF integrated test base

2019-07-09 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-27921:
-
Description: 
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by the 
{{ExtractPythonUDFs}} rule, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql]
 In other words, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas.__version__
'0.23.4'
>>> import pyarrow
>>> pyarrow.__version__
'0.13.0'
>>> pyarrow.Table.from_pandas(pandas.DataFrame({'a': [1,2,3]}))
pyarrow.Table
a: int64
metadata

OrderedDict([(b'pandas',
  b'{"index_columns": [{"kind": "range", "name": null, "start": '
  b'0, "stop": 3, "step": 1}], "column_indexes": [{"name": null,'
  b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
  b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
  b'{"name": "a", "field_name": "a", "pandas_type": "int64", "nu'
  b'mpy_type": "int64", "metadata": null}], "creator": {"library'
  b'": "pyarrow", "version": "0.13.0"}, "pandas_version": null}')])
{code}
 
 1. Copy and paste {{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} 
file into {{sql/core/src/test/resources/sql-tests/inputs/udf/udf-xxx.sql}}

2. Keep the comments and state that this file was copied from 
{{sql/core/src/test/resources/sql-tests/inputs/xxx.sql}} for now.
For instance, add a comment like the one below at the top:

{code}
-- This test file was converted from xxx.sql.
{code}

3. Run it as below to generate the golden files:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git add .
{code}
4. Insert one or more {{udf(...)}} calls into each statement. It is not required 
to add more combinations, and it is not strict about where to insert. Ideally, 
we should try to place the udf differently in each statement; see the sketch 
right below for the rough idea.
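
For intuition, here is a minimal, self-contained Scala sketch of what such a 
pass-through, string-returning UDF does to result types (names here are 
illustrative; this is not the actual integrated-test-base code):
{code:java}
// Illustrative sketch only. A pass-through UDF that takes and returns a
// string: inputs get cast to string, so result types (and therefore plans
// and golden files) can differ from the original query.
import org.apache.spark.sql.SparkSession

object UdfInsertionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    spark.udf.register("udf", (s: String) => s)
    Seq(1, 2, 3).toDF("a").createOrReplaceTempView("t")

    spark.sql("SELECT a + 1 FROM t").printSchema()      // integer result
    spark.sql("SELECT udf(a) + 1 FROM t").printSchema() // numeric type differs
    spark.stop()
  }
}
{code}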

5. Run it again:
{code:java}
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt "sql/test-only *SQLQueryTestSuite -- -z 
udf/udf-xxx.sql"
git diff
# or git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/udf-xxx.sql.out
{code}
6. Compare the results with the original file, 
{{sql/core/src/test/resources/sql-tests/results/xxx.sql.out}}.

7. If there is a diff, analyze it, file or find the corresponding JIRA, and 
skip the tests with comments.

8. Run it without generating golden files and check that it passes:
{code:java}
build/sbt "sql/test-only *SQLQueryTestSuite -- -z udf/udf-xxx.sql"
{code}
9. When you open a PR, please attach {{git diff --no-index 
sql/core/src/test/resources/sql-tests/results/xxx.sql.out 
sql/core/src/test/resources/sql-tests/results/udf/udf-xxx.sql.out}} in the PR 
description with the template below:
{code:java}
Diff comparing to 'xxx.sql'


```diff
...  # here you put 'git diff' results
```



{code}
10. You're ready. Please go for a PR! If the PR contains minor fixes, use 
the {{[SPARK-X][SQL][PYTHON]}} prefix in the PR title. If the PR is purely 
about tests, use {{[SPARK-X][SQL][PYTHON][TESTS]}}.
See https://github.com/apache/spark/pull/25069 as an example.


Note that the registered UDFs all return strings, so some differences are 
expected.
Note that this JIRA targets plan-specific cases in general.
Note that one {{output.sql.out}} file is shared by three UDF test cases (Scala 
UDF, Python UDF, and Pandas UDF). Keep that in mind when you fix the tests.
Note that this guide is supposed to be updated continuously as the work goes on.
Note that this test case uses the integrated UDF test base. See 
https://github.com/apache/spark/pull/24752 if you're interested in it or find 
an issue.

  was:
This JIRA aims to improve Python test coverage, in particular around 
{{ExtractPythonUDFs}}.
 This rule has caused many regressions and issues, such as SPARK-27803, 
SPARK-26147, SPARK-26864, SPARK-26293, SPARK-25314 and SPARK-24721.
 We should convert the *.sql test cases that can be affected by this rule, 
{{ExtractPythonUDFs}}, like 
[https://github.com/apache/spark/blob/f5317f10b25bd193cf5026a8f4fd1cd1ded8f5b4/sql/core/src/test/resources/sql-tests/inputs/udf/udf-inner-join.sql].
 Namely, most plan-related test cases might have to be converted.

*Here is the rough contribution guide to follow:*

Make sure you have Python with Pandas 0.23.2+ and PyArrow 0.12.1+. Check if 
you're able to do this:
{code:java}
>>> import pandas
>>> pandas._

[jira] [Comment Edited] (SPARK-27570) java.io.EOFException Reached the end of stream - Reading Parquet from Swift

2019-07-09 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881623#comment-16881623
 ] 

Josh Rosen edited comment on SPARK-27570 at 7/10/19 12:22 AM:
--

I ran into a very similar issue, except I was reading from S3 instead of 
OpenStack Swift. In my reproduction, the addition or removal of filters or 
projections affected whether I hit the error. In my case, I think the problem 
was https://issues.apache.org/jira/browse/HADOOP-16109, an issue where Parquet 
could sometimes use access patterns that hit a bug in seek() in S3AInputStream 
(/cc [~ste...@apache.org]). I confirmed this by re-running my failing job 
against an exact copy of the data stored on HDFS (which succeeded).


was (Author: joshrosen):
I ran into a very similar issue, except I was reading from S3 instead of 
OpenStack Swift. In my reproduction, the addition or removal of filters or 
projections affected whether I hit the error. In my case, I think the problem 
was https://issues.apache.org/jira/browse/HADOOP-16109, an issue where Parquet 
could sometimes use access patterns that hit a bug in seek() in S3AInputStream 
(/cc [~ste...@apache.org]).

> java.io.EOFException Reached the end of stream - Reading Parquet from Swift
> ---
>
> Key: SPARK-27570
> URL: https://issues.apache.org/jira/browse/SPARK-27570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Harry Hough
>Priority: Major
>
> I did see issue SPARK-25966, but it seems there are some differences, as his 
> problem was resolved after rebuilding the Parquet files on write. This is 
> 100% reproducible for me across many different days of data.
> I get exceptions such as "Reached the end of stream with 750477 bytes left to 
> read" during some read operations on Parquet files. I am reading these files 
> from OpenStack Swift using openstack-hadoop 2.7.7 on Spark 2.4.
> The issue seems to happen with the where statement. I have also tried filter, 
> combining the statements into one, and the Dataset method with a Column, 
> without any luck. Which column is used, or what the actual filter on the 
> where is, also doesn't seem to make a difference to whether the error occurs.
>  
> {code:java}
> val engagementDS = spark
>   .read
>   .parquet(createSwiftAddr("engagements", folder))
>   .where("engtype != 0")
>   .where("engtype != 1000")
>   .groupBy($"accid", $"sessionkey")
>   .agg(collect_list(struct($"time", $"pid", $"engtype", $"pageid", 
> $"testid")).as("engagements"))
> // Exiting paste mode, now interpreting.
> [Stage 53:> (0 + 32) / 32]2019-04-25 19:02:12 ERROR Executor:91 - Exception 
> in task 24.0 in stage 53.0 (TID 688)
> java.io.EOFException: Reached the end of stream with 1323959 bytes left to 
> read
> at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:104)
> at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
> at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
> at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:301)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:256)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:159)
> at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
> at 
> org.apache.spark.sql.execution.aggregate.ObjectHas

[jira] [Commented] (SPARK-28277) Convert and port 'except.sql' into UDF test base

2019-07-09 Thread Huaxin Gao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881624#comment-16881624
 ] 

Huaxin Gao commented on SPARK-28277:


I will work on this. Thanks. 

> Convert and port 'except.sql' into UDF test base
> 
>
> Key: SPARK-28277
> URL: https://issues.apache.org/jira/browse/SPARK-28277
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25966) "EOF Reached the end of stream with bytes left to read" while reading/writing to Parquets

2019-07-09 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881625#comment-16881625
 ] 

Josh Rosen commented on SPARK-25966:


Cross-post: there's discussion of a similar issue at 
https://issues.apache.org/jira/browse/SPARK-27570. Based on that, I suspect 
that https://issues.apache.org/jira/browse/HADOOP-16109 may fix this problem.

> "EOF Reached the end of stream with bytes left to read" while reading/writing 
> to Parquets
> -
>
> Key: SPARK-25966
> URL: https://issues.apache.org/jira/browse/SPARK-25966
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
> Environment: Spark 2.4.0 (built from RC5 tag) running Hadoop 3.1.1 on 
> top of a Mesos cluster. Both input and output Parquet files are on S3.
>Reporter: Alessandro Andrioni
>Priority: Major
>
> I was persistently getting the following exception while trying to run one of 
> our Spark jobs using Spark 2.4.0. It went away after I regenerated all the 
> input Parquet files from scratch (generated by another Spark job, also using 
> Spark 2.4.0).
> Is there a chance that Spark is writing (quite rarely) corrupted Parquet 
> files?
> {code:java}
> org.apache.spark.SparkException: Job aborted.
>   at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:196)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
>   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>   at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
>   at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:668)
>   at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>   at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:668)
>   at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:276)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
>   at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
>   at 
> org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:557)
>   (...)
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
> Task 312 in stage 682.0 failed 4 times, most recent failure: Lost task 312.3 
> in stage 682.0 (TID 235229, 10.130.29.78, executor 77): java.io.EOFException: 
> Reached the end of stream with 996 bytes left to read
>   at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:104)
>   at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
>   at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:301)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.jav

[jira] [Commented] (SPARK-28285) Convert and port 'outer-join.sql' into UDF test base

2019-07-09 Thread Huaxin Gao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881622#comment-16881622
 ] 

Huaxin Gao commented on SPARK-28285:


I will work on this. Thanks. 

> Convert and port 'outer-join.sql' into UDF test base
> 
>
> Key: SPARK-28285
> URL: https://issues.apache.org/jira/browse/SPARK-28285
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27570) java.io.EOFException Reached the end of stream - Reading Parquet from Swift

2019-07-09 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881623#comment-16881623
 ] 

Josh Rosen commented on SPARK-27570:


I ran into a very similar issue, except I was reading from S3 instead of 
OpenStack Swift. In my reproduction, the addition or removal of filters or 
projections affected whether I hit the error. In my case, I think the problem 
was https://issues.apache.org/jira/browse/HADOOP-16109, an issue where Parquet 
could sometimes use access patterns that hit a bug in seek() in S3AInputStream 
(/cc [~ste...@apache.org]).

> java.io.EOFException Reached the end of stream - Reading Parquet from Swift
> ---
>
> Key: SPARK-27570
> URL: https://issues.apache.org/jira/browse/SPARK-27570
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Harry Hough
>Priority: Major
>
> I did see issue SPARK-25966, but it seems there are some differences, as his 
> problem was resolved after rebuilding the Parquet files on write. This is 
> 100% reproducible for me across many different days of data.
> I get exceptions such as "Reached the end of stream with 750477 bytes left to 
> read" during some read operations on Parquet files. I am reading these files 
> from OpenStack Swift using openstack-hadoop 2.7.7 on Spark 2.4.
> The issue seems to happen with the where statement. I have also tried filter, 
> combining the statements into one, and the Dataset method with a Column, 
> without any luck. Which column is used, or what the actual filter on the 
> where is, also doesn't seem to make a difference to whether the error occurs.
>  
> {code:java}
> val engagementDS = spark
>   .read
>   .parquet(createSwiftAddr("engagements", folder))
>   .where("engtype != 0")
>   .where("engtype != 1000")
>   .groupBy($"accid", $"sessionkey")
>   .agg(collect_list(struct($"time", $"pid", $"engtype", $"pageid", 
> $"testid")).as("engagements"))
> // Exiting paste mode, now interpreting.
> [Stage 53:> (0 + 32) / 32]2019-04-25 19:02:12 ERROR Executor:91 - Exception 
> in task 24.0 in stage 53.0 (TID 688)
> java.io.EOFException: Reached the end of stream with 1323959 bytes left to 
> read
> at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:104)
> at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)
> at 
> org.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)
> at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:1174)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:805)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:301)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:256)
> at 
> org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:159)
> at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
> at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
>  Source)
> at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
> at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:619)
> at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:107)
> at 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apac

[jira] [Assigned] (SPARK-27922) Convert and port 'natural-join.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27922:


Assignee: Apache Spark

> Convert and port 'natural-join.sql' into UDF test base
> --
>
> Key: SPARK-27922
> URL: https://issues.apache.org/jira/browse/SPARK-27922
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27922) Convert and port 'natural-join.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27922:


Assignee: (was: Apache Spark)

> Convert and port 'natural-join.sql' into UDF test base
> --
>
> Key: SPARK-27922
> URL: https://issues.apache.org/jira/browse/SPARK-27922
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22158) convertMetastore should not ignore storage properties

2019-07-09 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881602#comment-16881602
 ] 

Ruslan Dautkhanov commented on SPARK-22158:
---

[~dongjoon] I may have misreported it - sorry. 

[~waleedfateem] ran some tests; I thought 2.2.0 was affected as well, but 
you're probably right that 2.2.1 is the first version affected.
Cloudera had pointed us to this Jira.

Thank you. 

> convertMetastore should not ignore storage properties
> -
>
> Key: SPARK-22158
> URL: https://issues.apache.org/jira/browse/SPARK-22158
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>
> From the beginning, convertMetastoreOrc ignores table properties and uses an 
> empty map instead. It's the same with convertMetastoreParquet.
> {code}
> val options = Map[String, String]()
> {code}
> - SPARK-14070: 
> https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650
> - master: 
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22158) convertMetastore should not ignore storage properties

2019-07-09 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881592#comment-16881592
 ] 

Dongjoon Hyun commented on SPARK-22158:
---

[~Tagar], this is not related to that, because that issue was reported against 
2.2.0 while this fix was merged into 2.2.1. :)

> convertMetastore should not ignore storage properties
> -
>
> Key: SPARK-22158
> URL: https://issues.apache.org/jira/browse/SPARK-22158
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>
> From the beginning, convertMetastoreOrc ignores table properties and uses an 
> empty map instead. It's the same with convertMetastoreParquet.
> {code}
> val options = Map[String, String]()
> {code}
> - SPARK-14070: 
> https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650
> - master: 
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28140) Pyspark API to create spark.mllib RowMatrix from DataFrame

2019-07-09 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-28140.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24953
[https://github.com/apache/spark/pull/24953]

> Pyspark API to create spark.mllib RowMatrix from DataFrame
> --
>
> Key: SPARK-28140
> URL: https://issues.apache.org/jira/browse/SPARK-28140
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: Henry Davidge
>Assignee: Henry Davidge
>Priority: Minor
> Fix For: 3.0.0
>
>
> Since many functions are only implemented in spark.mllib, it is often 
> necessary to convert DataFrames of spark.ml vectors to spark.mllib 
> distributed matrix formats. The first step, converting the spark.ml vectors 
> to the spark.mllib equivalent, is straightforward. However, to the best of my 
> knowledge it's not possible to convert the resulting DataFrame to a RowMatrix 
> without using a python lambda function, which can have a significant 
> performance hit. In my recent use case, SVD took 3.5m using the Scala API, 
> but 12m using Python.
> To get around this performance hit, I propose adding a constructor to the 
> Pyspark RowMatrix class that accepts a DataFrame with a single column of 
> spark.mllib vectors. I'd be happy to add an equivalent API for 
> IndexedRowMatrix if there is demand.
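
For context, a rough Scala-side sketch of the kind of conversion the timings 
above compare against (assumptions: a DataFrame holding a single column of 
spark.mllib vectors, and the column name "features" is illustrative; paste 
into spark-shell):
{code:java}
// Rough sketch, not the proposed PySpark API: build a spark.mllib RowMatrix
// from a DataFrame with one column of spark.mllib vectors, no Python lambda.
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.linalg.distributed.RowMatrix
import org.apache.spark.sql.DataFrame

def toRowMatrix(df: DataFrame): RowMatrix =
  new RowMatrix(df.rdd.map(_.getAs[Vector]("features")))
{code}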



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28140) Pyspark API to create spark.mllib RowMatrix from DataFrame

2019-07-09 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-28140:
--
Priority: Minor  (was: Major)

> Pyspark API to create spark.mllib RowMatrix from DataFrame
> --
>
> Key: SPARK-28140
> URL: https://issues.apache.org/jira/browse/SPARK-28140
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: Henry Davidge
>Priority: Minor
>
> Since many functions are only implemented in spark.mllib, it is often 
> necessary to convert DataFrames of spark.ml vectors to spark.mllib 
> distributed matrix formats. The first step, converting the spark.ml vectors 
> to the spark.mllib equivalent, is straightforward. However, to the best of my 
> knowledge it's not possible to convert the resulting DataFrame to a RowMatrix 
> without using a python lambda function, which can have a significant 
> performance hit. In my recent use case, SVD took 3.5m using the Scala API, 
> but 12m using Python.
> To get around this performance hit, I propose adding a constructor to the 
> Pyspark RowMatrix class that accepts a DataFrame with a single column of 
> spark.mllib vectors. I'd be happy to add an equivalent API for 
> IndexedRowMatrix if there is demand.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28140) Pyspark API to create spark.mllib RowMatrix from DataFrame

2019-07-09 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-28140:
-

Assignee: Henry Davidge

> Pyspark API to create spark.mllib RowMatrix from DataFrame
> --
>
> Key: SPARK-28140
> URL: https://issues.apache.org/jira/browse/SPARK-28140
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 3.0.0
>Reporter: Henry Davidge
>Assignee: Henry Davidge
>Priority: Minor
>
> Since many functions are only implemented in spark.mllib, it is often 
> necessary to convert DataFrames of spark.ml vectors to spark.mllib 
> distributed matrix formats. The first step, converting the spark.ml vectors 
> to the spark.mllib equivalent, is straightforward. However, to the best of my 
> knowledge it's not possible to convert the resulting DataFrame to a RowMatrix 
> without using a python lambda function, which can have a significant 
> performance hit. In my recent use case, SVD took 3.5m using the Scala API, 
> but 12m using Python.
> To get around this performance hit, I propose adding a constructor to the 
> Pyspark RowMatrix class that accepts a DataFrame with a single column of 
> spark.mllib vectors. I'd be happy to add an equivalent API for 
> IndexedRowMatrix if there is demand.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28321) functions.udf(UDF0, DataType) produces unexpected results

2019-07-09 Thread Vladimir Matveev (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vladimir Matveev updated SPARK-28321:
-
Description: 
It looks like the `f.udf(UDF0, DataType)` variant of the UDF 
Column-creating methods is wrong 
([https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4061]):

 
{code:java}
def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction = {
  val func = f.asInstanceOf[UDF0[Any]].call()
  SparkUserDefinedFunction.create(() => func, returnType, inputSchemas = 
Seq.fill(0)(None))
}
{code}
Here the UDF passed as the first argument will be called *right inside the 
`udf` method* on the driver, rather than at dataframe computation time on 
the executors. One of the major issues here is that non-deterministic UDFs 
(e.g. generating a random value) will produce unexpected results:

 

 
{code:java}
val scalaudf = f.udf { () => scala.util.Random.nextInt() }.asNondeterministic()
val javaudf = f.udf(new UDF0[Int] { override def call(): Int = 
scala.util.Random.nextInt() }, IntegerType).asNondeterministic()

(1 to 100).toDF().select(scalaudf().as("scala"), javaudf().as("java")).show()

// prints

+-----------+---------+
|      scala|     java|
+-----------+---------+
|  934190385|478543809|
|-1082102515|478543809|
|  774466710|478543809|
| 1883582103|478543809|
|-1959743031|478543809|
| 1534685218|478543809|
| 1158899264|478543809|
|-1572590653|478543809|
| -309451364|478543809|
| -906574467|478543809|
| -436584308|478543809|
| 1598340674|478543809|
|-1331343156|478543809|
|-1804177830|478543809|
|-1682906106|478543809|
| -197444289|478543809|
|  260603049|478543809|
|-1993515667|478543809|
|-1304685845|478543809|
|  481017016|478543809|
+-----------+---------+
{code}
Note that the version which relies on a different overload of the 
`functions.udf` method works correctly.
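
For reference, a minimal sketch of the direction a fix could take - deferring 
{{call()}} until execution time - assuming everything else stays as shown 
above (a sketch, not the actual patch):
{code:java}
// Sketch of a possible fix: keep call() inside the function literal instead
// of invoking it eagerly, so the UDF runs per row on the executors rather
// than once on the driver at udf() definition time.
def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction = {
  val func = () => f.asInstanceOf[UDF0[Any]].call()
  SparkUserDefinedFunction.create(func, returnType, inputSchemas = Seq.fill(0)(None))
}
{code}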

 

  was:
It looks like the `f.udf(UDF0, DataType)` variant of the UDF 
Column-creating methods is wrong 
([https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4061]):

 
{code:java}
def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction = {
  val func = f.asInstanceOf[UDF0[Any]].call()
  SparkUserDefinedFunction.create(() => func, returnType, inputSchemas = 
Seq.fill(0)(None))
}
{code}
Here the UDF passed as the first argument will be called *right inside the 
`udf` method* on the driver, rather than at dataframe computation time on 
the executors. One of the major issues here is that non-deterministic UDFs 
(e.g. generating a random value) will produce unexpected results:

 

 
{code:java}
val scalaudf = f.udf { () => scala.util.Random.nextInt() }.asNondeterministic()
val javaudf = f.udf(new UDF0[Int] { override def call(): Int = 
scala.util.Random.nextInt() }, IntegerType).asNondeterministic()

(1 to 100).toDF().select(scalaudf().as("scala"), javaudf().as("java")).show()

// prints

+-----------+---------+
|      scala|     java|
+-----------+---------+
|  934190385|478543809|
|-1082102515|478543809|
|  774466710|478543809|
| 1883582103|478543809|
|-1959743031|478543809|
| 1534685218|478543809|
| 1158899264|478543809|
|-1572590653|478543809|
| -309451364|478543809|
| -906574467|478543809|
| -436584308|478543809|
| 1598340674|478543809|
|-1331343156|478543809|
|-1804177830|478543809|
|-1682906106|478543809|
| -197444289|478543809|
|  260603049|478543809|
|-1993515667|478543809|
|-1304685845|478543809|
|  481017016|478543809|
+-----------+---------+
{code}
Note that the version which relies on a different overload of the 
`functions.udf` method works correctly.

 


> functions.udf(UDF0, DataType) produces unexpected results
> -
>
> Key: SPARK-28321
> URL: https://issues.apache.org/jira/browse/SPARK-28321
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 2.4.3
>Reporter: Vladimir Matveev
>Priority: Major
>
> It looks like the `f.udf(UDF0, DataType)` variant of the UDF 
> Column-creating methods is wrong 
> ([https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4061]):
>  
> {code:java}
> def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction = {
>   val func = f.asInstanceOf[UDF0[Any]].call()
>   SparkUserDefinedFunction.create(() => func, returnType, inputSchemas = 
> Seq.fill(0)(Non

[jira] [Created] (SPARK-28321) functions.udf(UDF0, DataType) produces unexpected results

2019-07-09 Thread Vladimir Matveev (JIRA)
Vladimir Matveev created SPARK-28321:


 Summary: functions.udf(UDF0, DataType) produces unexpected results
 Key: SPARK-28321
 URL: https://issues.apache.org/jira/browse/SPARK-28321
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3, 2.3.2
Reporter: Vladimir Matveev


It looks like the `f.udf(UDF0, DataType)` variant of the UDF 
Column-creating methods is wrong 
([https://github.com/apache/spark/blob/c3e32bf06c35ba2580d46150923abfa795b4446a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L4061]):

 
{code:java}
def udf(f: UDF0[_], returnType: DataType): UserDefinedFunction = {
  val func = f.asInstanceOf[UDF0[Any]].call()
  SparkUserDefinedFunction.create(() => func, returnType, inputSchemas = 
Seq.fill(0)(None))
}
{code}
Here the UDF passed as the first argument will be called *right inside the 
`udf` method* on the driver, rather than at dataframe computation time on 
the executors. One of the major issues here is that non-deterministic UDFs 
(e.g. generating a random value) will produce unexpected results:

 

 
{code:java}
val scalaudf = f.udf { () => scala.util.Random.nextInt() }.asNondeterministic()
val javaudf = f.udf(new UDF0[Int] { override def call(): Int = 
scala.util.Random.nextInt() }, IntegerType).asNondeterministic()

(1 to 100).toDF().select(scalaudf().as("scala"), javaudf().as("java")).show()

// prints

+-----------+---------+
|      scala|     java|
+-----------+---------+
|  934190385|478543809|
|-1082102515|478543809|
|  774466710|478543809|
| 1883582103|478543809|
|-1959743031|478543809|
| 1534685218|478543809|
| 1158899264|478543809|
|-1572590653|478543809|
| -309451364|478543809|
| -906574467|478543809|
| -436584308|478543809|
| 1598340674|478543809|
|-1331343156|478543809|
|-1804177830|478543809|
|-1682906106|478543809|
| -197444289|478543809|
|  260603049|478543809|
|-1993515667|478543809|
|-1304685845|478543809|
|  481017016|478543809|
+-----------+---------+
{code}
Note that the version which relies on a different overload of the 
`functions.udf` method works correctly.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28271) Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base

2019-07-09 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28271:
--
Component/s: Tests

> Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base
> 
>
> Key: SPARK-28271
> URL: https://issues.apache.org/jira/browse/SPARK-28271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> see SPARK-27883



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-22158) convertMetastore should not ignore storage properties

2019-07-09 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881437#comment-16881437
 ] 

Ruslan Dautkhanov edited comment on SPARK-22158 at 7/9/19 6:57 PM:
---

[~dongjoon] can you please check if PR-20522 causes the SPARK-28266 data 
correctness regression?

Thank you.


was (Author: tagar):
[~dongjoon] can you please check if this causes the SPARK-28266 data 
correctness regression? 

Thank you.

> convertMetastore should not ignore storage properties
> -
>
> Key: SPARK-22158
> URL: https://issues.apache.org/jira/browse/SPARK-22158
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>
> From the beginning, convertMetastoreOrc ignores table properties and uses an 
> empty map instead. It's the same with convertMetastoreParquet.
> {code}
> val options = Map[String, String]()
> {code}
> - SPARK-14070: 
> https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650
> - master: 
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28310) ANSI SQL grammar support: first_value/last_value(expression, [RESPECT NULLS | IGNORE NULLS])

2019-07-09 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881462#comment-16881462
 ] 

Dongjoon Hyun commented on SPARK-28310:
---

I marked this as `Minor` because it is just a syntax-acceptance issue.

> ANSI SQL grammar support: first_value/last_value(expression, [RESPECT NULLS | 
> IGNORE NULLS])
> 
>
> Key: SPARK-28310
> URL: https://issues.apache.org/jira/browse/SPARK-28310
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zhu, Lipeng
>Priority: Minor
>
> According to the ANSI SQL 2011:
> {code:sql}
> <first or last value function> ::=
>   <first or last value> <left paren> <value expression> <right paren>
>   [ <null treatment> ]
> <null treatment> ::= RESPECT NULLS | IGNORE NULLS
> <first or last value> ::=
>   FIRST_VALUE | LAST_VALUE
> {code}
> Teradata - 
> [https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/SUwCpTupqmlBJvi2mipOaA]
>  
> Oracle - 
> [https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC]
> Redshift – 
> [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_first_value.html]
>  
> PostgreSQL didn't implement IGNORE/RESPECT NULLS. 
> [https://www.postgresql.org/docs/devel/functions-window.html]
> h3. Note
> The SQL standard defines a {{RESPECT NULLS}} or {{IGNORE NULLS}} option for 
> {{lead}}, {{lag}}, {{first_value}}, {{last_value}}, and {{nth_value}}. This 
> is not implemented in PostgreSQL: the behavior is always the same as the 
> standard's default, namely {{RESPECT NULLS}}.
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28310) ANSI SQL grammar support: first_value/last_value(expression, [RESPECT NULLS | IGNORE NULLS])

2019-07-09 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28310:
--
Priority: Minor  (was: Major)

> ANSI SQL grammar support: first_value/last_value(expression, [RESPECT NULLS | 
> IGNORE NULLS])
> 
>
> Key: SPARK-28310
> URL: https://issues.apache.org/jira/browse/SPARK-28310
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Zhu, Lipeng
>Priority: Minor
>
> According to the ANSI SQL 2011:
> {code:sql}
> <first or last value function> ::=
>   <first or last value> <left paren> <value expression> <right paren>
>   [ <null treatment> ]
> <null treatment> ::= RESPECT NULLS | IGNORE NULLS
> <first or last value> ::=
>   FIRST_VALUE | LAST_VALUE
> {code}
> Teradata - 
> [https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/SUwCpTupqmlBJvi2mipOaA]
>  
> Oracle - 
> [https://docs.oracle.com/en/database/oracle/oracle-database/18/sqlrf/FIRST_VALUE.html#GUID-D454EC3F-370C-4C64-9B11-33FCB10D95EC]
> Redshift – 
> [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_first_value.html]
>  
> PostgreSQL didn't implement IGNORE/RESPECT NULLS. 
> [https://www.postgresql.org/docs/devel/functions-window.html]
> h3. Note
> The SQL standard defines a {{RESPECT NULLS}} or {{IGNORE NULLS}} option for 
> {{lead}}, {{lag}}, {{first_value}}, {{last_value}}, and {{nth_value}}. This 
> is not implemented in PostgreSQL: the behavior is always the same as the 
> standard's default, namely {{RESPECT NULLS}}.
>   



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28234) Spark Resources - add python support to get resources

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28234:


Assignee: (was: Apache Spark)

> Spark Resources - add python support to get resources
> -
>
> Key: SPARK-28234
> URL: https://issues.apache.org/jira/browse/SPARK-28234
> Project: Spark
>  Issue Type: Story
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Priority: Major
>
> Add the equivalent python api for sc.resources and TaskContext.resources



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28234) Spark Resources - add python support to get resources

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28234:


Assignee: Apache Spark

> Spark Resources - add python support to get resources
> -
>
> Key: SPARK-28234
> URL: https://issues.apache.org/jira/browse/SPARK-28234
> Project: Spark
>  Issue Type: Story
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Apache Spark
>Priority: Major
>
> Add the equivalent python api for sc.resources and TaskContext.resources



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28234) Spark Resources - add python support to get resources

2019-07-09 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-28234:
-

Assignee: Thomas Graves

> Spark Resources - add python support to get resources
> -
>
> Key: SPARK-28234
> URL: https://issues.apache.org/jira/browse/SPARK-28234
> Project: Spark
>  Issue Type: Story
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
>
> Add the equivalent python api for sc.resources and TaskContext.resources



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28271) Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28271:


Assignee: Apache Spark

> Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base
> 
>
> Key: SPARK-28271
> URL: https://issues.apache.org/jira/browse/SPARK-28271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> see SPARK-27883



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28271) Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base

2019-07-09 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28271:


Assignee: (was: Apache Spark)

> Convert and port 'pgSQL/aggregates_part2.sql' into UDF test base
> 
>
> Key: SPARK-28271
> URL: https://issues.apache.org/jira/browse/SPARK-28271
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> see SPARK-27883



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28320) Spark job eventually fails after several "attempted to access non-existent accumulator" in DAGScheduler

2019-07-09 Thread Martin Studer (JIRA)
Martin Studer created SPARK-28320:
-

 Summary: Spark job eventually fails after several "attempted to 
access non-existent accumulator" in DAGScheduler
 Key: SPARK-28320
 URL: https://issues.apache.org/jira/browse/SPARK-28320
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Martin Studer


I'm running into an issue where a Spark 2.3.0 (Hortonworks HDP 2.6.5) job 
eventually fails with
{noformat}
ERROR ApplicationMaster: User application exited with status 1
INFO ApplicationMaster: Final app status: FAILED, exitCode: 1, (reason: User 
application exited with status 1)
INFO SparkContext: Invoking stop() from shutdown hook
{noformat}
after receiving several exceptions of the form
{noformat}
ERROR DAGScheduler: Failed to update accumulators for task 0
org.apache.spark.SparkException: attempted to access non-existent accumulator 
39052
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1130)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1124)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1124)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1207)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1817)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)

{noformat}
In addition to "attempted to access non-existent accumulator" I have also 
noticed some (but far fewer) instances of "Attempted to access garbage 
collected accumulator":
{noformat}
ERROR DAGScheduler: Failed to update accumulators for task 0
java.lang.IllegalStateException: Attempted to access garbage collected 
accumulator 38352
at 
org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:265)
at 
org.apache.spark.util.AccumulatorContext$$anonfun$get$1.apply(AccumulatorV2.scala:261)
at scala.Option.map(Option.scala:146)
at 
org.apache.spark.util.AccumulatorContext$.get(AccumulatorV2.scala:261)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1127)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$updateAccumulators$1.apply(DAGScheduler.scala:1124)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:1124)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1207)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1817)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1769)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1758)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{noformat}
To provide some more context: this happens in a recursive algorithm implemented 
in PySpark where I leverage DataFrame checkpointing to truncate the lineage 
graph. Checkpointing is done asynchronously by invoking the count action on a 
different thread when recursing (using Python thread pools); a rough sketch of 
the pattern follows below.
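
A rough Scala sketch of that pattern, for illustration only (the actual code 
is PySpark; names and the depth parameter are made up, and a checkpoint 
directory must already be set via {{SparkContext.setCheckpointDir}}):
{code:java}
import scala.concurrent.{ExecutionContext, Future}
import org.apache.spark.sql.DataFrame

// Illustrative sketch of the reported pattern: truncate lineage while
// recursing by checkpointing lazily, then materialize the checkpoint
// asynchronously with count() on another thread.
object AsyncCheckpointSketch {
  implicit val ec: ExecutionContext = ExecutionContext.global

  def recurse(df: DataFrame, depth: Int): DataFrame = {
    if (depth == 0) df
    else {
      val checkpointed = df.checkpoint(eager = false) // lazy checkpoint
      Future { checkpointed.count() }                 // materialize off-thread
      recurse(checkpointed, depth - 1)
    }
  }
}
{code}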

While "attempted to access garbage collected accumulator" seems to be an 
unexpected (illegal state) exception, it's unclear to me whether "attempted to 
access non-existent accumulator" is an expected exception in some 
circumstances, specifically related to checkpointing.

The issue looks somewhat related to 
https://issues.apache.org/jira/browse/SPARK-22371 but that issue does not 
mention "attempted to access non-existent accumulator".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22158) convertMetastore should not ignore storage properties

2019-07-09 Thread Ruslan Dautkhanov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881437#comment-16881437
 ] 

Ruslan Dautkhanov commented on SPARK-22158:
---

[~dongjoon] can you please check if this causes SPARK-28266 data correctness 
regression? 

Thank you.

> convertMetastore should not ignore storage properties
> -
>
> Key: SPARK-22158
> URL: https://issues.apache.org/jira/browse/SPARK-22158
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.0, 2.2.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>
> From the beginning, convertMetastoreOrc ignores table properties and uses an 
> empty map instead. It's the same with convertMetastoreParquet.
> {code}
> val options = Map[String, String]()
> {code}
> - SPARK-14070: 
> https://github.com/apache/spark/pull/11891/files#diff-ee66e11b56c21364760a5ed2b783f863R650
> - master: 
> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L197



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28319) DataSourceV2: Support SHOW TABLES

2019-07-09 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-28319:
-

 Summary: DataSourceV2: Support SHOW TABLES
 Key: SPARK-28319
 URL: https://issues.apache.org/jira/browse/SPARK-28319
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Ryan Blue


SHOW TABLES needs to support v2 catalogs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28157) Make SHS clear KVStore LogInfo for the blacklisted entries

2019-07-09 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881352#comment-16881352
 ] 

Dongjoon Hyun commented on SPARK-28157:
---

I raised it as a blocker because it causes missing information in the event 
log listing, which is a core feature of the Spark History Server.

> Make SHS clear KVStore LogInfo for the blacklisted entries
> --
>
> Key: SPARK-28157
> URL: https://issues.apache.org/jira/browse/SPARK-28157
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 3.0.0, 2.4.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> As of Spark 2.4.0/2.3.2/2.2.3, SPARK-24948 delegated access permission checks 
> to the file system, and maintains a blacklist of all event log files that 
> failed once at reading. The blacklisted log files are released back after 
> CLEAN_INTERVAL_S.
> However, files whose size doesn't change are ignored forever, because 
> shouldReloadLog always returns false when the size is the same as the value 
> in the KVStore. This is recovered only by an SHS restart.
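
For context, the gist of the size-based check described above, as a rough 
sketch (illustrative only; the real history provider logic also handles 
in-progress logs and other cases):
{code:java}
// Rough sketch: a log entry is re-read only when the file grew. A blacklisted
// entry whose size never changes therefore never reloads until a restart.
def shouldReloadLog(knownFileSize: Long, currentFileSize: Long): Boolean =
  knownFileSize < currentFileSize
{code}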



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28157) Make SHS clear KVStore LogInfo for the blacklisted entries

2019-07-09 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28157:
--
Priority: Blocker  (was: Major)

> Make SHS clear KVStore LogInfo for the blacklisted entries
> --
>
> Key: SPARK-28157
> URL: https://issues.apache.org/jira/browse/SPARK-28157
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 2.3.3, 2.4.0, 2.4.1, 2.4.2, 3.0.0, 2.4.3
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 2.3.4, 2.4.4, 3.0.0
>
>
> As of Spark 2.4.0/2.3.2/2.2.3, SPARK-24948 delegated access permission checks 
> to the file system, and maintains a blacklist of all event log files that 
> failed once at reading. The blacklisted log files are released back after 
> CLEAN_INTERVAL_S.
> However, files whose size doesn't change are ignored forever, because 
> shouldReloadLog always returns false when the size is the same as the value 
> in the KVStore. This is recovered only by an SHS restart.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


