[jira] [Updated] (SPARK-30485) Remove SQL configs deprecated before v2.4

2020-01-11 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30485:
---
Description: 
Remove the following SQL configs:
 * spark.sql.variable.substitute.depth
 * spark.sql.execution.pandas.respectSessionTimeZone
 * spark.sql.parquet.int64AsTimestampMillis

Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs map:
 
[https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189]

  was:
Remove the following SQL configs:
* spark.sql.variable.substitute.depth
* spark.sql.execution.pandas.respectSessionTimeZone
* spark.sql.parquet.int64AsTimestampMillis
* Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName which 
was deprecated in v2.4

Recently all deprecated SQL configs were gathered to the deprecatedSQLConfigs 
map:
https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189


> Remove SQL configs deprecated before v2.4
> -
>
> Key: SPARK-30485
> URL: https://issues.apache.org/jira/browse/SPARK-30485
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Remove the following SQL configs:
>  * spark.sql.variable.substitute.depth
>  * spark.sql.execution.pandas.respectSessionTimeZone
>  * spark.sql.parquet.int64AsTimestampMillis
> Recently all deprecated SQL configs were gathered to the deprecatedSQLConfigs 
> map:
>  
> [https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189]






[jira] [Commented] (SPARK-30485) Remove SQL configs deprecated before v2.4

2020-01-10 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012678#comment-17012678
 ] 

Maxim Gekk commented on SPARK-30485:


[~dongjoon] [~srowen] [~cloud_fan] [~hyukjin.kwon] WDYT of removing these configs?

> Remove SQL configs deprecated before v2.4
> -
>
> Key: SPARK-30485
> URL: https://issues.apache.org/jira/browse/SPARK-30485
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Remove the following SQL configs:
> * spark.sql.variable.substitute.depth
> * spark.sql.execution.pandas.respectSessionTimeZone
> * spark.sql.parquet.int64AsTimestampMillis
> * Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName 
> which was deprecated in v2.4
> Recently all deprecated SQL configs were gathered to the deprecatedSQLConfigs 
> map:
> https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189






[jira] [Created] (SPARK-30485) Remove SQL configs deprecated before v2.4

2020-01-10 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30485:
--

 Summary: Remove SQL configs deprecated before v2.4
 Key: SPARK-30485
 URL: https://issues.apache.org/jira/browse/SPARK-30485
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Remove the following SQL configs:
* spark.sql.variable.substitute.depth
* spark.sql.execution.pandas.respectSessionTimeZone
* spark.sql.parquet.int64AsTimestampMillis
* Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName which 
was deprecated in v2.4

Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs map:
https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189
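
For context, here is a minimal sketch of how deprecated configs are typically registered in such a map. This is a hypothetical illustration, not the actual SQLConf.scala code; the entry layout, versions and reasons are assumptions.
{code:scala}
// Hypothetical sketch of a deprecated-config registry; names and comments are
// illustrative, not copied from SQLConf.scala.
object DeprecatedConfigs {
  case class DeprecatedConfig(key: String, version: String, comment: String)

  val deprecatedSQLConfigs: Map[String, DeprecatedConfig] = Seq(
    DeprecatedConfig("spark.sql.variable.substitute.depth", "2.1",
      "The config is no longer taken into account."),
    DeprecatedConfig("spark.sql.parquet.int64AsTimestampMillis", "2.3",
      "Use spark.sql.parquet.outputTimestampType instead.")
  ).map(cfg => cfg.key -> cfg).toMap

  // Removing a deprecated config would then mean deleting both its ConfigEntry
  // definition and its entry in this map.
}
{code}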






[jira] [Updated] (SPARK-30482) Add sub-class of AppenderSkeleton reusable in tests

2020-01-10 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30482:
---
Component/s: (was: SQL)

> Add sub-class of AppenderSkeleton reusable in tests
> ---
>
> Key: SPARK-30482
> URL: https://issues.apache.org/jira/browse/SPARK-30482
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Minor
>
> Some tests define similar sub-class of AppenderSkeleton. The code duplication 
> can be eliminated by defining common class in 
> [SparkFunSuite.scala|https://github.com/apache/spark/compare/master...MaxGekk:dedup-appender-skeleton?expand=1#diff-d521001af1af1a2aace870feb25ae0b0]






[jira] [Created] (SPARK-30482) Add sub-class of AppenderSkeleton reusable in tests

2020-01-10 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30482:
--

 Summary: Add sub-class of AppenderSkeleton reusable in tests
 Key: SPARK-30482
 URL: https://issues.apache.org/jira/browse/SPARK-30482
 Project: Spark
  Issue Type: Test
  Components: SQL, Tests
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Some tests define similar sub-class of AppenderSkeleton. The code duplication 
can be eliminated by defining common class in 
[SparkFunSuite.scala|https://github.com/apache/spark/compare/master...MaxGekk:dedup-appender-skeleton?expand=1#diff-d521001af1af1a2aace870feb25ae0b0]
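
A minimal sketch of such a reusable appender, assuming the log4j 1.x API that Spark currently uses; the class name and placement are illustrative, not the final implementation.
{code:scala}
import scala.collection.mutable.ArrayBuffer

import org.apache.log4j.AppenderSkeleton
import org.apache.log4j.spi.LoggingEvent

// Collects logging events in memory so tests can assert on the emitted messages.
class LogAppender extends AppenderSkeleton {
  val loggingEvents = new ArrayBuffer[LoggingEvent]()

  override def append(event: LoggingEvent): Unit = loggingEvents.append(event)
  override def close(): Unit = {}
  override def requiresLayout(): Boolean = false
}
{code}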






[jira] [Commented] (SPARK-30442) Write mode ignored when using CodecStreams

2020-01-07 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009963#comment-17009963
 ] 

Maxim Gekk commented on SPARK-30442:


> This can cause issues, particularly with aws tools, that make it impossible
> to retry.

Could you clarify how it makes retry impossible? When the mode is set to 
overwrite, Spark deletes the entire folder and writes new files, so there should 
be no clashes. In append mode, new files are added - Spark does not append to 
existing files. In what situation should files be overwritten?
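
For reference, a small illustration of the two modes described above (paths and data here are made up, not taken from the report):
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("write-modes").getOrCreate()
val df = spark.range(10).toDF("id")

// "overwrite": the target directory is deleted first, then new files are written.
df.write.mode("overwrite").parquet("/tmp/write-modes-example")

// "append": only new part files are added; existing files are never modified.
df.write.mode("append").parquet("/tmp/write-modes-example")
{code}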

> Write mode ignored when using CodecStreams
> --
>
> Key: SPARK-30442
> URL: https://issues.apache.org/jira/browse/SPARK-30442
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 2.4.4
>Reporter: Jesse Collins
>Priority: Major
>
> Overwrite is hardcoded to false in the codec stream. This can cause issues, 
> particularly with aws tools, that make it impossible to retry.
> Ideally, this should be read from the write mode set for the DataWriter that 
> is writing through this codec class.
> [https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala#L81]






[jira] [Commented] (SPARK-30429) WideSchemaBenchmark fails with OOM

2020-01-06 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009398#comment-17009398
 ] 

Maxim Gekk commented on SPARK-30429:


Bisect has found the first bad commit. I specified recent master as the bad 
commit and 62551cceebf6aca8b6bd8164cd2ed85564726f6c as the good commit.
{code}
cb5ea201df5fae8aacb653ffb4147b9288bca1e9 is the first bad commit
commit cb5ea201df5fae8aacb653ffb4147b9288bca1e9
Author: Liang-Chi Hsieh 
Date:   Thu Oct 25 19:27:45 2018 +0800

[SPARK-25746][SQL] Refactoring ExpressionEncoder to get rid of flat flag
...
   Closes #22749 from viirya/SPARK-24762-refactor.

Authored-by: Liang-Chi Hsieh 
Signed-off-by: Wenchen Fan 

:040000 040000 11961d7665e9097c682cdf6d51163ad4b3ffdf90 cb82a04e8a2fa1505c2db36c9c6578544e502601 M  sql
bisect run success
{code}
/cc [~cloud_fan] [~viirya]

> WideSchemaBenchmark fails with OOM
> --
>
> Key: SPARK-30429
> URL: https://issues.apache.org/jira/browse/SPARK-30429
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
> Attachments: WideSchemaBenchmark_console.txt
>
>
> Run WideSchemaBenchmark on the master (commit 
> bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
> {code}
> SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain 
> org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
> {code}
> This fails with:
> {code}
> Caused by: java.lang.reflect.InvocationTargetException
> [error]   at 
> sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
> [error]   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> [error]   at 
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
> [error]   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> [error]   ... 132 more
> [error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> [error]   at java.util.Arrays.copyOfRange(Arrays.java:3664)
> [error]   at java.lang.String.<init>(String.java:207)
> [error]   at java.lang.StringBuilder.toString(StringBuilder.java:407)
> [error]   at 
> org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
> [error]   at 
> org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
> [error]   at 
> org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown 
> Source)
> {code}
> Full stack dump is attached.






[jira] [Commented] (SPARK-30429) WideSchemaBenchmark fails with OOM

2020-01-06 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009189#comment-17009189
 ] 

Maxim Gekk commented on SPARK-30429:


[~dongjoon] I ran git bisect. Let's see what it finds overnight.

> WideSchemaBenchmark fails with OOM
> --
>
> Key: SPARK-30429
> URL: https://issues.apache.org/jira/browse/SPARK-30429
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
> Attachments: WideSchemaBenchmark_console.txt
>
>
> Run WideSchemaBenchmark on the master (commit 
> bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
> {code}
> SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain 
> org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
> {code}
> This fails with:
> {code}
> Caused by: java.lang.reflect.InvocationTargetException
> [error]   at 
> sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
> [error]   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> [error]   at 
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
> [error]   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> [error]   ... 132 more
> [error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> [error]   at java.util.Arrays.copyOfRange(Arrays.java:3664)
> [error]   at java.lang.String.<init>(String.java:207)
> [error]   at java.lang.StringBuilder.toString(StringBuilder.java:407)
> [error]   at 
> org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
> [error]   at 
> org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
> [error]   at 
> org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown 
> Source)
> {code}
> Full stack dump is attached.






[jira] [Updated] (SPARK-30429) WideSchemaBenchmark fails with OOM

2020-01-05 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30429:
---
Attachment: WideSchemaBenchmark_console.txt

> WideSchemaBenchmark fails with OOM
> --
>
> Key: SPARK-30429
> URL: https://issues.apache.org/jira/browse/SPARK-30429
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
> Attachments: WideSchemaBenchmark_console.txt
>
>
> Run WideSchemaBenchmark on the master (commit 
> bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
> {code}
> SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain 
> org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
> {code}
> This fails with:
> {code}
> Caused by: java.lang.reflect.InvocationTargetException
> [error]   at 
> sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
> [error]   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> [error]   at 
> java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
> [error]   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
> [error]   at 
> org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> [error]   ... 132 more
> [error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> [error]   at java.util.Arrays.copyOfRange(Arrays.java:3664)
> [error]   at java.lang.String.<init>(String.java:207)
> [error]   at java.lang.StringBuilder.toString(StringBuilder.java:407)
> [error]   at 
> org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
> [error]   at 
> org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
> [error]   at 
> org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown 
> Source)
> {code}
> Full stack dump is attached.






[jira] [Created] (SPARK-30429) WideSchemaBenchmark fails with OOM

2020-01-05 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30429:
--

 Summary: WideSchemaBenchmark fails with OOM
 Key: SPARK-30429
 URL: https://issues.apache.org/jira/browse/SPARK-30429
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Run WideSchemaBenchmark on the master (commit 
bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
{code}
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain 
org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
{code}
This fails with:
{code}
Caused by: java.lang.reflect.InvocationTargetException
[error] at 
sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
[error] at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[error] at 
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
[error] at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
[error] at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
[error] at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
[error] at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
[error] ... 132 more
[error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
[error] at java.util.Arrays.copyOfRange(Arrays.java:3664)
[error] at java.lang.String.<init>(String.java:207)
[error] at java.lang.StringBuilder.toString(StringBuilder.java:407)
[error] at 
org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
[error] at 
org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
[error] at 
org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown 
Source)
{code}
Full stack dump is attached.






[jira] [Created] (SPARK-30416) Log a warning for deprecated SQL config in `set()` and `unset()`

2020-01-03 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30416:
--

 Summary: Log a warning for deprecated SQL config in `set()` and 
`unset()`
 Key: SPARK-30416
 URL: https://issues.apache.org/jira/browse/SPARK-30416
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


- Gather deprecated SQL configs and add extra info: when a config was deprecated and why
- Output a warning about deprecated SQL configs in set() and unset()
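
A speculative sketch of the intended behaviour follows; the helper and field names below are assumptions, not the actual RuntimeConfig code.
{code:scala}
import org.slf4j.LoggerFactory

object DeprecatedConfigWarning {
  private val log = LoggerFactory.getLogger(getClass)

  case class DeprecatedConfig(key: String, version: String, comment: String)

  private val deprecated: Map[String, DeprecatedConfig] = Map(
    "spark.sql.parquet.int64AsTimestampMillis" -> DeprecatedConfig(
      "spark.sql.parquet.int64AsTimestampMillis", "2.3",
      "Use spark.sql.parquet.outputTimestampType instead."))

  // Would be called from set()/unset() before the value is actually changed.
  def logDeprecationWarning(key: String): Unit = {
    deprecated.get(key).foreach { cfg =>
      log.warn(s"The SQL config '${cfg.key}' has been deprecated in Spark v${cfg.version} " +
        s"and may be removed in the future. ${cfg.comment}")
    }
  }
}
{code}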






[jira] [Created] (SPARK-30412) Eliminate warnings in Java tests regarding deprecated API

2020-01-02 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30412:
--

 Summary: Eliminate warnings in Java tests regarding deprecated API
 Key: SPARK-30412
 URL: https://issues.apache.org/jira/browse/SPARK-30412
 Project: Spark
  Issue Type: Sub-task
  Components: Java API, SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Suppress warnings about deprecated Spark API in Java test suites:
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java
Warning:Warning:line (32)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (91)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (100)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (109)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (118)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
{code}
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/Java8DatasetAggregatorSuite.java
Warning:Warning:line (28)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (37)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (46)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (55)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (64)java: 
org.apache.spark.sql.expressions.javalang.typed in 
org.apache.spark.sql.expressions.javalang has been deprecated
{code}
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java
Warning:Warning:line (478)java: 
json(org.apache.spark.api.java.JavaRDD) in 
org.apache.spark.sql.DataFrameReader has been deprecated
{code}






[jira] [Commented] (SPARK-30174) Eliminate warnings :part 4

2020-01-02 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006953#comment-17006953
 ] 

Maxim Gekk commented on SPARK-30174:


[~shivuson...@gmail.com] Are you still working on this? If so, could you please 
write in the ticket how you are going to fix the warnings.

> Eliminate warnings :part 4
> --
>
> Key: SPARK-30174
> URL: https://issues.apache.org/jira/browse/SPARK-30174
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: jobit mathew
>Priority: Minor
>
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
> {code:java}
> Warning:Warning:line (127)value ENABLE_JOB_SUMMARY in class 
> ParquetOutputFormat is deprecated: see corresponding Javadoc for more 
> information.
>   && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) {
> Warning:Warning:line (261)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
> new org.apache.parquet.hadoop.ParquetInputSplit(
> Warning:Warning:line (272)method readFooter in class ParquetFileReader is 
> deprecated: see corresponding Javadoc for more information.
> ParquetFileReader.readFooter(sharedConf, filePath, 
> SKIP_ROW_GROUPS).getFileMetaData
> Warning:Warning:line (442)method readFooter in class ParquetFileReader is 
> deprecated: see corresponding Javadoc for more information.
>   ParquetFileReader.readFooter(
> {code}
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala
> {code:java}
>  Warning:Warning:line (91)value ENABLE_JOB_SUMMARY in class 
> ParquetOutputFormat is deprecated: see corresponding Javadoc for more 
> information.
>   && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) {
> {code}






[jira] [Commented] (SPARK-30172) Eliminate warnings: part3

2020-01-02 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006952#comment-17006952
 ] 

Maxim Gekk commented on SPARK-30172:


[~Ankitraj] Are you still working on this?

> Eliminate warnings: part3
> -
>
> Key: SPARK-30172
> URL: https://issues.apache.org/jira/browse/SPARK-30172
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> /sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
> Warning:Warning:line (422)method initialize in class AbstractSerDe is 
> deprecated: see corresponding Javadoc for more information.
> serde.initialize(null, properties)
> /sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
> Warning:Warning:line (216)method initialize in class GenericUDTF is 
> deprecated: see corresponding Javadoc for more information.
>   protected lazy val outputInspector = 
> function.initialize(inputInspectors.toArray)
> Warning:Warning:line (342)class UDAF in package exec is deprecated: see 
> corresponding Javadoc for more information.
>   new GenericUDAFBridge(funcWrapper.createFunction[UDAF]())
> Warning:Warning:line (503)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> def serialize(buffer: AggregationBuffer): Array[Byte] = {
> Warning:Warning:line (523)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> def deserialize(bytes: Array[Byte]): AggregationBuffer = {
> Warning:Warning:line (538)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean)
> Warning:Warning:line (538)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
> case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean)
> /sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
> Warning:Warning:line (44)java: getTypes() in org.apache.orc.Reader has 
> been deprecated
> Warning:Warning:line (47)java: getTypes() in org.apache.orc.Reader has 
> been deprecated
> /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
> Warning:Warning:line (2,368)method readFooter in class ParquetFileReader 
> is deprecated: see corresponding Javadoc for more information.
> val footer = ParquetFileReader.readFooter(
> /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala
> Warning:Warning:line (202)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def getNewAggregationBuffer: AggregationBuffer = new 
> MockUDAFBuffer(0L, 0L)
> Warning:Warning:line (204)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def reset(agg: AggregationBuffer): Unit = {
> Warning:Warning:line (212)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def iterate(agg: AggregationBuffer, parameters: Array[AnyRef]): 
> Unit = {
> Warning:Warning:line (221)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def merge(agg: AggregationBuffer, partial: Object): Unit = {
> Warning:Warning:line (231)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def terminatePartial(agg: AggregationBuffer): AnyRef = {
> Warning:Warning:line (236)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def terminate(agg: AggregationBuffer): AnyRef = 
> terminatePartial(agg)
> Warning:Warning:line (257)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def getNewAggregationBuffer: AggregationBuffer = {
> Warning:Warning:line (266)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def reset(agg: AggregationBuffer): Unit = {
> Warning:Warning:line (277)trait AggregationBuffer in class 
> GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more 
> information.
>   override def iterate(agg: AggregationBuffer, parameters: Arr

[jira] [Commented] (SPARK-30171) Eliminate warnings: part2

2020-01-02 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006949#comment-17006949
 ] 

Maxim Gekk commented on SPARK-30171:


[~srowen] SPARK-30258 fixes the warnings in AvroFunctionsSuite.scala but not the 
ones for parsedOptions.ignoreExtension. I am not sure how we can avoid the 
warnings related to ignoreExtension.

> Eliminate warnings: part2
> -
>
> Key: SPARK-30171
> URL: https://issues.apache.org/jira/browse/SPARK-30171
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>
> AvroFunctionsSuite.scala
> Warning:Warning:line (41)method to_avro in package avro is deprecated (since 
> 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead.
> val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b"))
> Warning:Warning:line (41)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b"))
> Warning:Warning:line (54)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
> checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, 
> avroTypeStr)), df)
> Warning:Warning:line (54)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
> checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, 
> avroTypeStr)), df)
> Warning:Warning:line (59)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val avroStructDF = df.select(to_avro('struct).as("avro"))
> Warning:Warning:line (70)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
> checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), df)
> Warning:Warning:line (76)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val avroStructDF = df.select(to_avro('struct).as("avro"))
> Warning:Warning:line (118)method to_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' 
> instead.
> val readBackOne = dfOne.select(to_avro($"array").as("avro"))
> Warning:Warning:line (119)method from_avro in package avro is deprecated 
> (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' 
> instead.
>   .select(from_avro($"avro", avroTypeArrStruct).as("array"))
> AvroPartitionReaderFactory.scala
> Warning:Warning:line (64)value ignoreExtension in class AvroOptions is 
> deprecated (since 3.0): Use the general data source option pathGlobFilter for 
> filtering file names
> if (parsedOptions.ignoreExtension || 
> partitionedFile.filePath.endsWith(".avro")) {
> AvroFileFormat.scala
> Warning:Warning:line (98)value ignoreExtension in class AvroOptions is 
> deprecated (since 3.0): Use the general data source option pathGlobFilter for 
> filtering file names
>   if (parsedOptions.ignoreExtension || file.filePath.endsWith(".avro")) {
> AvroUtils.scala
> Warning:Warning:line (55)value ignoreExtension in class AvroOptions is 
> deprecated (since 3.0): Use the general data source option pathGlobFilter for 
> filtering file names
> inferAvroSchemaFromFiles(files, conf, parsedOptions.ignoreExtension,






[jira] [Created] (SPARK-30409) Use `NoOp` datasource in SQL benchmarks

2020-01-02 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30409:
--

 Summary: Use `NoOp` datasource in SQL benchmarks
 Key: SPARK-30409
 URL: https://issues.apache.org/jira/browse/SPARK-30409
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Currently, SQL benchmarks use the `count()`, `collect()` and `foreach(_ => ())` 
actions. These actions have additional overhead. For example, `collect()` 
converts column values to external types and pulls the data to the driver. 
The benchmarks should be unified to use the `NoOp` datasource, except the 
benchmarks for `count()` and `collect()` themselves.
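
A minimal example of the intended pattern, assuming the built-in `noop` sink available in Spark 3.0; the query itself is made up for illustration.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("noop-benchmark").getOrCreate()

// Forces full execution of the query without collect()/count() overhead:
// the noop sink triggers all transformations but discards every row.
spark.range(100000000L)
  .selectExpr("id", "id % 10 AS key")
  .write
  .format("noop")
  .mode("overwrite")
  .save()
{code}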






[jira] [Commented] (SPARK-30401) Call requireNonStaticConf() only once

2020-01-01 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006392#comment-17006392
 ] 

Maxim Gekk commented on SPARK-30401:


I am working on it

> Call requireNonStaticConf() only once
> -
>
> Key: SPARK-30401
> URL: https://issues.apache.org/jira/browse/SPARK-30401
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Trivial
>
> The RuntimeConfig.requireNonStaticConf() method can be called twice for the same input:
> 1. Inside of set(key, true)
> 2. set() converts the second argument to a string and calls set(key, "true"), where requireNonStaticConf() is invoked one more time






[jira] [Created] (SPARK-30401) Call requireNonStaticConf() only once

2020-01-01 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30401:
--

 Summary: Call requireNonStaticConf() only once
 Key: SPARK-30401
 URL: https://issues.apache.org/jira/browse/SPARK-30401
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


The RuntimeConfig.requireNonStaticConf() method can be called twice for the same input:
1. Inside of set(key, true)
2. set() converts the second argument to a string and calls set(key, "true"), where requireNonStaticConf() is invoked one more time






[jira] [Created] (SPARK-30323) Support filters pushdown in CSV datasource

2019-12-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30323:
--

 Summary: Support filters pushdown in CSV datasource
 Key: SPARK-30323
 URL: https://issues.apache.org/jira/browse/SPARK-30323
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


- Implement the `SupportsPushDownFilters` interface in `CSVScanBuilder` (a sketch is given below)
- Apply filters in UnivocityParser
- Change the UnivocityParser API: return Seq[InternalRow] from `convert()`
- Update CSVBenchmark
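
A rough sketch of the first item above. The interface names are the DSv2 API; the class name and everything else is illustrative, not the final implementation.
{code:scala}
import org.apache.spark.sql.connector.read.{Scan, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter

// SupportsPushDownFilters extends ScanBuilder, so build() still has to be provided.
class CSVScanBuilderSketch extends SupportsPushDownFilters {
  private var _pushedFilters: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    _pushedFilters = filters
    // Return the filters Spark still has to re-evaluate after the scan;
    // returning them all keeps the behaviour correct even if a filter is skipped.
    filters
  }

  override def pushedFilters(): Array[Filter] = _pushedFilters

  override def build(): Scan = ??? // construct the CSV scan using _pushedFilters
}
{code}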






[jira] [Created] (SPARK-30309) Mark `Filter` as a `sealed` class

2019-12-19 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30309:
--

 Summary: Mark `Filter` as a `sealed` class
 Key: SPARK-30309
 URL: https://issues.apache.org/jira/browse/SPARK-30309
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Add the `sealed` keyword to the `Filter` class in the 
`org.apache.spark.sql.sources` package, so that the compiler outputs a warning 
if handling of a filter is missed in a datasource:
{code}
Warning:(154, 65) match may not be exhaustive.
It would fail on the following inputs: AlwaysFalse(), AlwaysTrue()
def translate(filter: sources.Filter): Option[Expression] = filter match {
{code}
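
A small self-contained illustration of the effect; these classes are toy stand-ins, not the real Filter hierarchy.
{code:scala}
object SealedFilterExample {
  sealed abstract class Filter
  case class EqualTo(attribute: String, value: Any) extends Filter
  case class IsNull(attribute: String) extends Filter

  // Because Filter is sealed, the compiler knows every subclass and warns that
  // IsNull is not handled by this match.
  def translate(filter: Filter): Option[String] = filter match {
    case EqualTo(a, v) => Some(s"$a = $v")
  }
}
{code}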






[jira] [Updated] (SPARK-30258) Eliminate warnings of deprecated Spark APIs in tests

2019-12-13 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30258:
---
Summary: Eliminate warnings of deprecated Spark APIs in tests  (was: 
Eliminate warnings of depracted Spark APIs in tests)

> Eliminate warnings of deprecated Spark APIs in tests
> 
>
> Key: SPARK-30258
> URL: https://issues.apache.org/jira/browse/SPARK-30258
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Suppress deprecation warnings in tests that check deprecated Spark APIs.






[jira] [Created] (SPARK-30258) Eliminate warnings of depracted Spark APIs in tests

2019-12-13 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30258:
--

 Summary: Eliminate warnings of depracted Spark APIs in tests
 Key: SPARK-30258
 URL: https://issues.apache.org/jira/browse/SPARK-30258
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Suppress deprecation warnings in tests that check deprecated Spark APIs.






[jira] [Commented] (SPARK-30168) Eliminate warnings in Parquet datasource

2019-12-13 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995754#comment-16995754
 ] 

Maxim Gekk commented on SPARK-30168:


[~Ankitraj] Go ahead.

> Eliminate warnings in Parquet datasource
> 
>
> Key: SPARK-30168
> URL: https://issues.apache.org/jira/browse/SPARK-30168
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> # 
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala
> {code}
> Warning:Warning:line (120)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] 
> = {
> Warning:Warning:line (125)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   new org.apache.parquet.hadoop.ParquetInputSplit(
> Warning:Warning:line (134)method readFooter in class ParquetFileReader is 
> deprecated: see corresponding Javadoc for more information.
>   ParquetFileReader.readFooter(conf, filePath, 
> SKIP_ROW_GROUPS).getFileMetaData
> Warning:Warning:line (183)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   split: ParquetInputSplit,
> Warning:Warning:line (212)class ParquetInputSplit in package hadoop is 
> deprecated: see corresponding Javadoc for more information.
>   split: ParquetInputSplit,
> {code}
> # 
> sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
> {code}
> Warning:Warning:line (55)java: org.apache.parquet.hadoop.ParquetInputSplit in 
> org.apache.parquet.hadoop has been deprecated
> Warning:Warning:line (95)java: 
> org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has 
> been deprecated
> Warning:Warning:line (95)java: 
> org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has 
> been deprecated
> Warning:Warning:line (97)java: getRowGroupOffsets() in 
> org.apache.parquet.hadoop.ParquetInputSplit has been deprecated
> Warning:Warning:line (105)java: 
> readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (108)java: 
> filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType)
>  in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated
> Warning:Warning:line (111)java: 
> readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (147)java: 
> ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (203)java: 
> readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (226)java: 
> ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
>  in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> {code}
> # 
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala
> # 
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
> # 
> sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala
> # 
> sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala






[jira] [Commented] (SPARK-30165) Eliminate compilation warnings

2019-12-12 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994806#comment-16994806
 ] 

Maxim Gekk commented on SPARK-30165:


> Are you sure on these?

I am almost sure we can fix the Parquet- and Kafka-related warnings. I am not 
sure about the warnings coming from deprecated Spark API. Maybe it is possible 
to suppress such warnings in tests. In any case, we know in advance that we 
test deprecated API, so such warnings don't guard us from mistakes.

I quickly googled and found this 
[https://github.com/scala/bug/issues/7934#issuecomment-292425679] . Maybe we 
can use the approach in tests.
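
For illustration, a minimal self-contained sketch of that workaround (nothing here is from the Spark code base): references to deprecated API made from inside a definition that is itself marked @deprecated do not produce deprecation warnings.
{code:scala}
object LegacyApi {
  @deprecated("use newMethod instead", "1.0")
  def oldMethod(): Int = 42
}

// The wrapper is marked @deprecated only to silence the warning for the call inside it.
@deprecated("test-only wrapper around deprecated API", "1.0")
object LegacyApiCaller {
  def callOld(): Int = LegacyApi.oldMethod() // compiles without a deprecation warning
}
{code}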

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.






[jira] [Updated] (SPARK-30170) Eliminate warnings: part 1

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30170:
---
Description: 
Eliminate compilation warnings in:
 # StopWordsRemoverSuite
{code:java}
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
{code}

 # MLTest.scala
{code:java}
Warning:Warning:line (88)match may not be exhaustive.
It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
val n = Attribute.fromStructField(dataframe.schema(colName)) match {
{code}

 # FloatType.scala
{code:java}
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 
2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
{code}

 # AnalysisExternalCatalogSuite.scala
{code:java}
Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is 
deprecated: see corresponding Javadoc for more information.
  verifyZeroInteractions(catalog)
{code}

 # CSVExprUtilsSuite.scala
{code:java}
Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 instead.
("\0", Some("\u0000"), None)
{code}

 # CollectionExpressionsSuite.scala, HashExpressionsSuite.scala, 
ExpressionParserSuite.scala
{code:java}
Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be 
enabled
by making the implicit value scala.language.implicitConversions visible.
This can be achieved by adding the import clause 'import 
scala.language.implicitConversions'
or by setting the compiler option -language:implicitConversions.
See the Scaladoc for value scala.language.implicitConversions for a discussion
why the feature should be explicitly enabled.

[jira] [Commented] (SPARK-30170) Eliminate warnings: part 1

2019-12-08 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990989#comment-16990989
 ] 

Maxim Gekk commented on SPARK-30170:


I am working on this

> Eliminate warnings: part 1
> --
>
> Key: SPARK-30170
> URL: https://issues.apache.org/jira/browse/SPARK-30170
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Eliminate compilation warnings in:
> # StopWordsRemoverSuite
> {code}
> Warning:Warning:line (245)non-variable type argument String in type pattern 
> Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type 
> pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is 
> eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
> Seq[String]) =>
> {code}
> # MLTest.scala
> {code}
> Warning:Warning:line (88)match may not be exhaustive.
> It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
> val n = Attribute.fromStructField(dataframe.schema(colName)) match {
> {code}
> # FloatType.scala
> {code}
> Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
> BigDecimal(y)).floatValue
> Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
> BigDecimal(y)).floatValue
> Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
> BigDecimal(y)).floatValue
> Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
> (since 2.11.0): The default conversion from Float may not do what you want. 
> Use BigDecimal.decimal for a String representation, or explicitly convert the 
> Float with .toDouble.
> def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
> BigDecimal(y)).floatValue
> {code}
> # AnalysisExternalCatalogSuite.scala
> {code}
> Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is 
> deprecated: see corresponding Javadoc for more information.
>   verifyZeroInteractions(catalog)
> {code}
> # CSVExprUtilsSuite.scala
> {code}
> Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 instead.
> ("\0", Some("\u0000"), None)
> {c

[jira] [Created] (SPARK-30170) Eliminate warnings: part 1

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30170:
--

 Summary: Eliminate warnings: part 1
 Key: SPARK-30170
 URL: https://issues.apache.org/jira/browse/SPARK-30170
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Eliminate compilation warnings in:
# StopWordsRemoverSuite
{code}
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern 
Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated 
by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: 
Seq[String]) =>
{code}
# MLTest.scala
{code}
Warning:Warning:line (88)match may not be exhaustive.
It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
val n = Attribute.fromStructField(dataframe.schema(colName)) match {
{code}
# FloatType.scala
{code}
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 
2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (81)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated 
(since 2.11.0): The default conversion from Float may not do what you want. Use 
BigDecimal.decimal for a String representation, or explicitly convert the Float 
with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder 
BigDecimal(y)).floatValue
{code}
# AnalysisExternalCatalogSuite.scala
{code}
Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is 
deprecated: see corresponding Javadoc for more information.
  verifyZeroInteractions(catalog)
{code}
# CSVExprUtilsSuite.scala
{code}
Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 
instead.
("\0", Some("\u0000"), None)
{code}
# CollectionExpressionsSuite.scala, ashExpressionsSuite.scala, 
ExpressionParserSuite.scala 
{code}
Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be 
enabled
by making the implicit value scala.language.implicitConversions visible.
This can be achieved by adding the import clause 'import 
scala.language.implicitConversions'
or by setting the compiler option -language:implicitConversions.
See the

[jira] [Created] (SPARK-30169) Eliminate warnings in Kafka connector

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30169:
--

 Summary: Eliminate warnings in Kafka connector
 Key: SPARK-30169
 URL: https://issues.apache.org/jira/browse/SPARK-30169
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Eliminate compilation warnings in the files:
{code}
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala
external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala
external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala
external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala
{code}
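
One frequent deprecation in the streaming Kafka code of that era was the 
KafkaConsumer.poll(long) overload. Whether that is the exact warning in each file 
above is an assumption, but a minimal sketch of the usual replacement (kafka-clients 
2.x, hypothetical broker, group and topic names) looks like this:
{code}
import java.time.Duration
import java.util.{Collections, Properties}

import org.apache.kafka.clients.consumer.KafkaConsumer

object PollWithDuration {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")  // assumed local broker
    props.put("group.id", "warning-demo")              // hypothetical group id
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("demo-topic"))  // hypothetical topic

    // consumer.poll(1000L) is deprecated since Kafka 2.0 and triggers the warning;
    // the replacement takes an explicit java.time.Duration.
    val records = consumer.poll(Duration.ofMillis(1000))
    println(s"fetched ${records.count()} records")
    consumer.close()
  }
}
{code}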



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30166) Eliminate warnings in JSONOptions

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30166:
---
Summary: Eliminate warnings in JSONOptions  (was: Eliminate compilation 
warnings in JSONOptions)

> Eliminate warnings in JSONOptions
> -
>
> Key: SPARK-30166
> URL: https://issues.apache.org/jira/browse/SPARK-30166
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Scala 2.12 outputs the following warnings for JSONOptions:
> {code}
> sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
> Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java 
> enum Feature is deprecated: see corresponding Javadoc for more information.
> factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, 
> allowNumericLeadingZeros)
> Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java 
> enum Feature is deprecated: see corresponding Javadoc for more information.
> factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, 
> allowNonNumericNumbers)
> Warning:Warning:line (139)Java enum 
> ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: 
> see corresponding Javadoc for more information.
> 
> factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER,
> Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java 
> enum Feature is deprecated: see corresponding Javadoc for more information.
> factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, 
> allowUnquotedControlChars)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30168) Eliminate warnings in Parquet datasource

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30168:
--

 Summary: Eliminate warnings in Parquet datasource
 Key: SPARK-30168
 URL: https://issues.apache.org/jira/browse/SPARK-30168
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


# 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala
{code}
Warning:Warning:line (120)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] = {
Warning:Warning:line (125)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  new org.apache.parquet.hadoop.ParquetInputSplit(
Warning:Warning:line (134)method readFooter in class ParquetFileReader is 
deprecated: see corresponding Javadoc for more information.
  ParquetFileReader.readFooter(conf, filePath, 
SKIP_ROW_GROUPS).getFileMetaData
Warning:Warning:line (183)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  split: ParquetInputSplit,
Warning:Warning:line (212)class ParquetInputSplit in package hadoop is 
deprecated: see corresponding Javadoc for more information.
  split: ParquetInputSplit,
{code}
# 
sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
{code}
Warning:Warning:line (55)java: org.apache.parquet.hadoop.ParquetInputSplit in 
org.apache.parquet.hadoop has been deprecated
Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit 
in org.apache.parquet.hadoop has been deprecated
Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit 
in org.apache.parquet.hadoop has been deprecated
Warning:Warning:line (97)java: getRowGroupOffsets() in 
org.apache.parquet.hadoop.ParquetInputSplit has been deprecated
Warning:Warning:line (105)java: 
readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (108)java: 
filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType)
 in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated
Warning:Warning:line (111)java: 
readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (147)java: 
ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (203)java: 
readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
Warning:Warning:line (226)java: 
ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List)
 in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
{code}
# 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala
# 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
# 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala
# sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
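
These warnings center on the deprecated ParquetInputSplit class and the old 
readFooter entry points. As one illustration only (not the change actually made in 
Spark), the footer can be read through the newer InputFile-based API; the file path 
below is hypothetical:
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

object ReadFooterWithoutDeprecation {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val file = new Path("/tmp/example.parquet")  // hypothetical input file

    // Deprecated (and warned about):
    //   ParquetFileReader.readFooter(conf, file, SKIP_ROW_GROUPS).getFileMetaData
    // Non-deprecated alternative based on InputFile:
    val reader = ParquetFileReader.open(HadoopInputFile.fromPath(file, conf))
    try {
      println(reader.getFooter.getFileMetaData.getSchema)
    } finally {
      reader.close()
    }
  }
}
{code}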



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990925#comment-16990925
 ] 

Maxim Gekk commented on SPARK-30165:


[~aman_omer] Feel free to take a sub-set of warnings and create a sub-task to 
fix them.

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30165:
---
Component/s: (was: Build)
 SQL

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30166) Eliminate compilation warnings in JSONOptions

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30166:
--

 Summary: Eliminate compilation warnings in JSONOptions
 Key: SPARK-30166
 URL: https://issues.apache.org/jira/browse/SPARK-30166
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Scala 2.12 outputs the following warnings for JSONOptions:

{code}
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala
Warning:Warning:line (137)Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java 
enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, 
allowNumericLeadingZeros)
Warning:Warning:line (138)Java enum ALLOW_NON_NUMERIC_NUMBERS in Java enum 
Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, 
allowNonNumericNumbers)
Warning:Warning:line (139)Java enum ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER 
in Java enum Feature is deprecated: see corresponding Javadoc for more 
information.
factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER,
Warning:Warning:line (141)Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java 
enum Feature is deprecated: see corresponding Javadoc for more information.
factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, 
allowUnquotedControlChars)
{code}
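
Jackson 2.10 moves these parser features to JsonReadFeature, configured on the 
factory builder. The sketch below only illustrates that style; the feature names are 
taken from the Jackson 2.10 API as an assumption, and the snippet is not the final 
Spark change:
{code}
import com.fasterxml.jackson.core.{JsonFactory, JsonFactoryBuilder}
import com.fasterxml.jackson.core.json.JsonReadFeature

object JsonReadFeatureExample {
  def main(args: Array[String]): Unit = {
    // The deprecated style configures JsonParser.Feature flags on an existing
    // factory; the builder style sets the equivalent JsonReadFeature flags up front.
    val factory: JsonFactory = new JsonFactoryBuilder()
      .configure(JsonReadFeature.ALLOW_LEADING_ZEROS_FOR_NUMBERS, true)
      .configure(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS, true)
      .configure(JsonReadFeature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, true)
      .configure(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS, true)
      .build()

    val parser = factory.createParser("""{"x": 007}""")
    while (parser.nextToken() != null) ()  // leading zeros accepted thanks to the flags
    parser.close()
  }
}
{code}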




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30165:
---
Description: This is an umbrella ticket for sub-tasks for eliminating 
compilation warnings.  I dumped all warnings to the spark_warnings.txt file 
attached to the ticket.  (was: This is an umbrella ticket for sub-tasks for 
eliminating compilation warnings. )

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings.  I dumped all warnings to the spark_warnings.txt file attached to 
> the ticket.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-30165:
---
Attachment: spark_warnings.txt

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: spark_warnings.txt
>
>
> This is an umbrella ticket for sub-tasks for eliminating compilation 
> warnings. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-30165) Eliminate compilation warnings

2019-12-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30165:
--

 Summary: Eliminate compilation warnings
 Key: SPARK-30165
 URL: https://issues.apache.org/jira/browse/SPARK-30165
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 3.0.0
Reporter: Maxim Gekk


This is an umbrella ticket for sub-tasks for eliminating compilation warnings. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29963) Check formatting timestamps up to microsecond precision by JSON/CSV datasource

2019-11-19 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29963:
--

 Summary: Check formatting timestamps up to microsecond precision 
by JSON/CSV datasource
 Key: SPARK-29963
 URL: https://issues.apache.org/jira/browse/SPARK-29963
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Port tests added for 2.4 by the commit: 
https://github.com/apache/spark/commit/47cb1f359af62383e24198dbbaa0b4503348cd04



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29949) JSON/CSV formats timestamps incorrectly

2019-11-18 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29949:
--

 Summary: JSON/CSV formats timestamps incorrectly
 Key: SPARK-29949
 URL: https://issues.apache.org/jira/browse/SPARK-29949
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


For example:
{code}
scala> val t = java.sql.Timestamp.valueOf("2019-11-18 11:56:00.123456")
t: java.sql.Timestamp = 2019-11-18 11:56:00.123456
scala> Seq(t).toDF("t").select(to_json(struct($"t"), Map("timestampFormat" -> 
"yyyy-MM-dd HH:mm:ss.SSSSSS"))).show(false)
+-------------------------------------------------+
|structstojson(named_struct(NamePlaceholder(), t))|
+-------------------------------------------------+
|{"t":"2019-11-18 11:56:00.000123"}               |
+-------------------------------------------------+
{code}
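
For comparison, a small standalone check (outside Spark) shows that java.time 
formats the same value with full microsecond precision using the same pattern 
letters, which suggests the loss happens in how Spark 2.4 drives its formatter 
rather than in the pattern itself:
{code}
import java.time.format.DateTimeFormatter

object MicrosFormatCheck {
  def main(args: Array[String]): Unit = {
    val t = java.sql.Timestamp.valueOf("2019-11-18 11:56:00.123456")
    val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS")
    // Prints 2019-11-18 11:56:00.123456, i.e. the microseconds survive,
    // unlike the .000123 value in the to_json output above.
    println(fmt.format(t.toLocalDateTime))
  }
}
{code}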



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-29758) json_tuple truncates fields

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106
 ] 

Maxim Gekk edited comment on SPARK-29758 at 11/17/19 6:17 PM:
--

Another solution is to disable this optimization: 
[https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478]


was (Author: maxgekk):
Another solution is to remove this optimization: 
https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478

> json_tuple truncates fields
> ---
>
> Key: SPARK-29758
> URL: https://issues.apache.org/jira/browse/SPARK-29758
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.4
> Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave 
> 10.14.3, Spark 2.4.4)
> Jdk 8, Scala 2.11.12
>Reporter: Stanislav
>Priority: Major
>
> `json_tuple` has inconsistent behaviour with `from_json` - but only if json 
> string is longer than 2700 characters or so.
> This can be reproduced in spark-shell and on cluster, but not in scalatest, 
> for some reason.
> {code}
> import org.apache.spark.sql.functions.{from_json, json_tuple}
> import org.apache.spark.sql.types._
> val counterstring = 
> "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*"
> val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", 
> StringType)
>   .withColumn("result", $"parsed.test")
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result
> res62: Int = 2791
> scala> from_json_result
> res63: Int = 2800
> {code}
> Result is influenced by the total length of the json string at the moment of 
> parsing:
> {

[jira] [Commented] (SPARK-29758) json_tuple truncates fields

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106
 ] 

Maxim Gekk commented on SPARK-29758:


Another solution is to remove this optimization: 
https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478

> json_tuple truncates fields
> ---
>
> Key: SPARK-29758
> URL: https://issues.apache.org/jira/browse/SPARK-29758
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.4
> Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave 
> 10.14.3, Spark 2.4.4)
> Jdk 8, Scala 2.11.12
>Reporter: Stanislav
>Priority: Major
>
> `json_tuple` has inconsistent behaviour with `from_json` - but only if json 
> string is longer than 2700 characters or so.
> This can be reproduced in spark-shell and on cluster, but not in scalatest, 
> for some reason.
> {code}
> import org.apache.spark.sql.functions.{from_json, json_tuple}
> import org.apache.spark.sql.types._
> val counterstring = 
> "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*"
> val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", 
> StringType)
>   .withColumn("result", $"parsed.test")
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result
> res62: Int = 2791
> scala> from_json_result
> res63: Int = 2800
> {code}
> Result is influenced by the total length of the json string at the moment of 
> parsing:
> {code}
> val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", 
> "test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result_with_prefix
> res64: Int = 27

[jira] [Commented] (SPARK-29575) from_json can produce nulls for fields which are marked as non-nullable

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976102#comment-16976102
 ] 

Maxim Gekk commented on SPARK-29575:


This is intentional behavior. The user's schema is forcibly set to nullable. See 
SPARK-23173.

> from_json can produce nulls for fields which are marked as non-nullable
> ---
>
> Key: SPARK-29575
> URL: https://issues.apache.org/jira/browse/SPARK-29575
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.4
>Reporter: Victor Lopez
>Priority: Major
>
> I believe this issue was resolved elsewhere 
> (https://issues.apache.org/jira/browse/SPARK-23173), though for Pyspark this 
> bug seems to still be there.
> The issue appears when using {{from_json}} to parse a column in a Spark 
> dataframe. It seems like {{from_json}} ignores whether the schema provided 
> has any {{nullable:False}} property.
> {code:java}
> schema = T.StructType().add(T.StructField('id', T.LongType(), 
> nullable=False)).add(T.StructField('name', T.StringType(), nullable=False))
> data = [{'user': str({'name': 'joe', 'id':1})}, {'user': str({'name': 
> 'jane'})}]
> df = spark.read.json(sc.parallelize(data))
> df.withColumn("details", F.from_json("user", 
> schema)).select("details.*").show()
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29758) json_tuple truncates fields

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976099#comment-16976099
 ] 

Maxim Gekk commented on SPARK-29758:


I have reproduced the issue on 2.4. The problem is in Jackson core 2.6.7. It 
was fixed by 
https://github.com/FasterXML/jackson-core/commit/554f8db0f940b2a53f974852a2af194739d65200#diff-7990edc67621822770cdc62e12d933d4R647-R650
 in version 2.7.7. We could try to backport 
https://github.com/apache/spark/pull/21596 to 2.4. [~hyukjin.kwon] WDYT? 

> json_tuple truncates fields
> ---
>
> Key: SPARK-29758
> URL: https://issues.apache.org/jira/browse/SPARK-29758
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.4
> Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave 
> 10.14.3, Spark 2.4.4)
> Jdk 8, Scala 2.11.12
>Reporter: Stanislav
>Priority: Major
>
> `json_tuple` has inconsistent behaviour with `from_json` - but only if json 
> string is longer than 2700 characters or so.
> This can be reproduced in spark-shell and on cluster, but not in scalatest, 
> for some reason.
> {code}
> import org.apache.spark.sql.functions.{from_json, json_tuple}
> import org.apache.spark.sql.types._
> val counterstring = 
> "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*"
> val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("result", json_tuple('json, "test"))
>   .select('result)
>   .as[String].head.length
> val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json")
>   .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", 
> StringType)
>   .withColumn("result", $"parsed.test")
>   .select('result)
>   .as[String].head.length
> scala> json_tuple_result
> res62: Int = 2791
> scala> from_json_result
> res63: Int = 2800
> {code}
> Result is influenced by the total length of the json string at the moment of 
> parsing:
> {code}
> val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", 
> "test":"$counterstring"}""").toDF("jso

[jira] [Updated] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings

2019-11-17 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29933:
---
Attachment: filter_tests.patch

> ThriftServerQueryTestSuite runs tests with wrong settings
> -
>
> Key: SPARK-29933
> URL: https://issues.apache.org/jira/browse/SPARK-29933
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
> Attachments: filter_tests.patch
>
>
> ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect but it 
> keeps settings from previous runs. In fact, it runs `ansi/interval.sql` in 
> the PostgreSQL dialect. See 
> https://github.com/apache/spark/pull/26473#issuecomment-554510643



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings

2019-11-17 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29933:
--

 Summary: ThriftServerQueryTestSuite runs tests with wrong settings
 Key: SPARK-29933
 URL: https://issues.apache.org/jira/browse/SPARK-29933
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect but it 
keeps settings from previous runs. In fact, it runs `ansi/interval.sql` in 
the PostgreSQL dialect. See 
https://github.com/apache/spark/pull/26473#issuecomment-554510643



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0

2019-11-17 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975944#comment-16975944
 ] 

Maxim Gekk commented on SPARK-29931:


> It's conceivable there could a reason to do it later, or sooner.

Later is not a problem, but what about sooner? Most of the configs were added for 
Spark 3.0. If you decide to remove one of them in a minor release between 3.0 
and 4.0, you can break user apps, which I believe is unacceptable for minor 
releases.

> Declare all SQL legacy configs as will be removed in Spark 4.0
> --
>
> Key: SPARK-29931
> URL: https://issues.apache.org/jira/browse/SPARK-29931
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Add the sentence to the descriptions of all legacy SQL configs that existed 
> before Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of 
> such configs:
> * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
> * spark.sql.legacy.literal.pickMinimumPrecision
> * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation
> * spark.sql.legacy.sizeOfNull
> * spark.sql.legacy.replaceDatabricksSparkAvro.enabled
> * spark.sql.legacy.setopsPrecedence.enabled
> * spark.sql.legacy.integralDivide.returnBigint
> * spark.sql.legacy.bucketedTableScan.outputOrdering
> * spark.sql.legacy.parser.havingWithoutGroupByAsWhere
> * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue
> * spark.sql.legacy.setCommandRejectsSparkCoreConfs
> * spark.sql.legacy.utcTimestampFunc.enabled
> * spark.sql.legacy.typeCoercion.datetimeToString
> * spark.sql.legacy.looseUpcast
> * spark.sql.legacy.ctePrecedence.enabled
> * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0

2019-11-16 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975813#comment-16975813
 ] 

Maxim Gekk commented on SPARK-29931:


[~rxin] [~lixiao] [~srowen] [~dongjoon] [~cloud_fan] [~hyukjin.kwon] Does this 
make sense for you?

> Declare all SQL legacy configs as will be removed in Spark 4.0
> --
>
> Key: SPARK-29931
> URL: https://issues.apache.org/jira/browse/SPARK-29931
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> Add the sentence to the descriptions of all legacy SQL configs that existed 
> before Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of 
> such configs:
> * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
> * spark.sql.legacy.literal.pickMinimumPrecision
> * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation
> * spark.sql.legacy.sizeOfNull
> * spark.sql.legacy.replaceDatabricksSparkAvro.enabled
> * spark.sql.legacy.setopsPrecedence.enabled
> * spark.sql.legacy.integralDivide.returnBigint
> * spark.sql.legacy.bucketedTableScan.outputOrdering
> * spark.sql.legacy.parser.havingWithoutGroupByAsWhere
> * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue
> * spark.sql.legacy.setCommandRejectsSparkCoreConfs
> * spark.sql.legacy.utcTimestampFunc.enabled
> * spark.sql.legacy.typeCoercion.datetimeToString
> * spark.sql.legacy.looseUpcast
> * spark.sql.legacy.ctePrecedence.enabled
> * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29931:
--

 Summary: Declare all SQL legacy configs as will be removed in 
Spark 4.0
 Key: SPARK-29931
 URL: https://issues.apache.org/jira/browse/SPARK-29931
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Add the sentence to the descriptions of all legacy SQL configs that existed 
before Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of such 
configs:
* spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName
* spark.sql.legacy.literal.pickMinimumPrecision
* spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation
* spark.sql.legacy.sizeOfNull
* spark.sql.legacy.replaceDatabricksSparkAvro.enabled
* spark.sql.legacy.setopsPrecedence.enabled
* spark.sql.legacy.integralDivide.returnBigint
* spark.sql.legacy.bucketedTableScan.outputOrdering
* spark.sql.legacy.parser.havingWithoutGroupByAsWhere
* spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue
* spark.sql.legacy.setCommandRejectsSparkCoreConfs
* spark.sql.legacy.utcTimestampFunc.enabled
* spark.sql.legacy.typeCoercion.datetimeToString
* spark.sql.legacy.looseUpcast
* spark.sql.legacy.ctePrecedence.enabled
* spark.sql.legacy.arrayExistsFollowsThreeValuedLogic



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29930) Remove SQL configs declared to be removed in Spark 3.0

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29930:
--

 Summary: Remove SQL configs declared to be removed in Spark 3.0
 Key: SPARK-29930
 URL: https://issues.apache.org/jira/browse/SPARK-29930
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Need to remove the following SQL configs:
* spark.sql.fromJsonForceNullableSchema
* spark.sql.legacy.compareDateTimestampInTimestamp



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29928) Check parsing timestamps up to microsecond precision by JSON/CSV datasource

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29928:
--

 Summary: Check parsing timestamps up to microsecond precision by 
JSON/CSV datasource
 Key: SPARK-29928
 URL: https://issues.apache.org/jira/browse/SPARK-29928
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Port tests added for 2.4 by the commit: 
https://github.com/apache/spark/commit/9c7e8be1dca8285296f3052c41f35043699d7d10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29904) Parse timestamps in microsecond precision by JSON/CSV datasources

2019-11-16 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29904:
---
Affects Version/s: 2.4.0
   2.4.1
   2.4.2
   2.4.3

> Parse timestamps in microsecond precision by JSON/CSV datasources
> -
>
> Key: SPARK-29904
> URL: https://issues.apache.org/jira/browse/SPARK-29904
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 2.4.5
>
>
> Currently, Spark can parse strings with timestamps from JSON/CSV in 
> millisecond precision. Internally, timestamps have microsecond precision. The 
> ticket aims to modify the parsing logic in Spark 2.4 to support microsecond 
> precision. Porting DateFormatter/TimestampFormatter from Spark 3.0-preview is 
> risky, so we need to find another, lighter solution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`

2019-11-16 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975697#comment-16975697
 ] 

Maxim Gekk commented on SPARK-29927:


[~cloud_fan] WDYT, does it make sense to change the functions as well?

> Parse timestamps in microsecond precision by `to_timestamp`, 
> `to_unix_timestamp`, `unix_timestamp`
> --
>
> Key: SPARK-29927
> URL: https://issues.apache.org/jira/browse/SPARK-29927
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Major
>
> Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` 
> functions use SimpleDateFormat to parse strings to timestamps. 
> SimpleDateFormat can parse only in millisecond precision if a user specifies 
> `SSS` in a pattern. The ticket aims to support parsing up to 
> microsecond precision.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`

2019-11-16 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29927:
--

 Summary: Parse timestamps in microsecond precision by 
`to_timestamp`, `to_unix_timestamp`, `unix_timestamp`
 Key: SPARK-29927
 URL: https://issues.apache.org/jira/browse/SPARK-29927
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` functions 
use SimpleDateFormat to parse strings to timestamps. SimpleDateFormat can parse 
only in millisecond precision if a user specifies `SSS` in a pattern. The ticket 
aims to support parsing up to microsecond precision.
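
The limitation is easy to see outside Spark: SimpleDateFormat reads the fractional 
field as whole milliseconds, so a six-digit fraction shifts the parsed time, while 
java.time keeps it as a fraction of a second. A minimal illustration (plain JVM 
code, not the proposed fix):
{code}
import java.text.SimpleDateFormat
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object FractionParsingCheck {
  def main(args: Array[String]): Unit = {
    val input = "2019-11-16 10:23:54.123456"

    // SimpleDateFormat interprets "123456" as 123456 whole milliseconds,
    // so the parsed time drifts by roughly two minutes.
    val legacy = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS")
    println(legacy.parse(input))

    // java.time keeps the field as a fraction of a second.
    val modern = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS")
    println(LocalDateTime.parse(input, modern))  // 2019-11-16T10:23:54.123456
  }
}
{code}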



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29920) Parsing failure on interval '20 15' day to hour

2019-11-15 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29920:
--

 Summary: Parsing failure on interval '20 15' day to hour
 Key: SPARK-29920
 URL: https://issues.apache.org/jira/browse/SPARK-29920
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk



{code:sql}
spark-sql> select interval '20 15' day to hour;
Error in query:
requirement failed: Interval string must match day-time format of 'd h:m:s.n': 
20 15(line 1, pos 16)

== SQL ==
select interval '20 15' day to hour
^^^
{code}
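
For reference, the literal above denotes 20 days and 15 hours; a tiny java.time 
sketch of that expected value:
{code}
import java.time.Duration

object DayToHourMeaning {
  def main(args: Array[String]): Unit = {
    // interval '20 15' day to hour should denote 20 days and 15 hours.
    val expected = Duration.ofDays(20).plusHours(15)
    println(expected)  // PT495H
  }
}
{code}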




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29904) Parse timestamps in microsecond precision by JSON/CSV datasources

2019-11-14 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29904:
--

 Summary: Parse timestamps in microsecond precision by JSON/CSV 
datasources
 Key: SPARK-29904
 URL: https://issues.apache.org/jira/browse/SPARK-29904
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Currently, Spark can parse strings with timestamps from JSON/CSV in millisecond 
precision. Internally, timestamps have microsecond precision. The ticket aims 
to modify the parsing logic in Spark 2.4 to support microsecond precision. 
Porting DateFormatter/TimestampFormatter from Spark 3.0-preview is risky, so we 
need to find another, lighter solution.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29866) Upper case enum values

2019-11-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29866:
--

 Summary: Upper case enum values
 Key: SPARK-29866
 URL: https://issues.apache.org/jira/browse/SPARK-29866
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Unify naming of enum values and upper case their names.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29864) Strict parsing of day-time strings to intervals

2019-11-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29864:
--

 Summary: Strict parsing of day-time strings to intervals
 Key: SPARK-29864
 URL: https://issues.apache.org/jira/browse/SPARK-29864
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Currently, the IntervalUtils.fromDayTimeString() method does not take into 
account the left bound `from`; it truncates the result using only the right 
bound `to`. The method should respect the bounds specified by the user.

Oracle and MySQL respect the user's bounds, see 
https://github.com/apache/spark/pull/26358#issuecomment-551942719 and 
https://github.com/apache/spark/pull/26358#issuecomment-549272475 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29819) Introduce an enum for interval units

2019-11-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29819:
--

 Summary: Introduce an enum for interval units
 Key: SPARK-29819
 URL: https://issues.apache.org/jira/browse/SPARK-29819
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Add an enum for interval units. This will allow type checking of inputs and avoid 
typos in interval unit names.
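
A hedged sketch of what such an enum could look like (names are purely 
illustrative and not the actual Spark implementation):
{code}
object IntervalUnits {
  // Illustrative only: a sealed hierarchy makes pattern matches exhaustive,
  // so a typo such as "milisecond" fails at compile time or in fromString
  // instead of silently producing a wrong unit.
  sealed trait IntervalUnit
  case object Year   extends IntervalUnit
  case object Month  extends IntervalUnit
  case object Week   extends IntervalUnit
  case object Day    extends IntervalUnit
  case object Hour   extends IntervalUnit
  case object Minute extends IntervalUnit
  case object Second extends IntervalUnit

  def fromString(s: String): IntervalUnit = s.toLowerCase match {
    case "year"   => Year
    case "month"  => Month
    case "week"   => Week
    case "day"    => Day
    case "hour"   => Hour
    case "minute" => Minute
    case "second" => Second
    case other    => throw new IllegalArgumentException(s"Unknown interval unit: $other")
  }

  def main(args: Array[String]): Unit = {
    println(fromString("day"))  // Day
  }
}
{code}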



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-29385) Make `INTERVAL` values comparable

2019-11-09 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk resolved SPARK-29385.

Fix Version/s: 3.0.0
   Resolution: Fixed

Resolved by the PR: https://github.com/apache/spark/pull/26337

> Make `INTERVAL` values comparable
> -
>
> Key: SPARK-29385
> URL: https://issues.apache.org/jira/browse/SPARK-29385
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> PostgreSQL allows comparing intervals with `=`, `<>`, `<`, `<=`, `>`, `>=`. For 
> example:
> {code}
> maxim=# select interval '1 month' > interval '29 days';
>  ?column? 
> --
>  t
> {code}
> but the same fails in Spark:
> {code}
> spark-sql> select interval 1 month > interval 29 days;
> Error in query: cannot resolve '(interval 1 months > interval 4 weeks 1 
> days)' due to data type mismatch: GreaterThan does not support ordering on 
> type interval; line 1 pos 7;
> 'Project [unresolvedalias((interval 1 months > interval 4 weeks 1 days), 
> None)]
> +- OneRowRelation
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29408) Support interval literal with negative sign `-`

2019-11-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29408:
---
Description: 
For example:
{code}
maxim=# select -interval '1 day -1 hour';
 ?column?
---
 -1 days +01:00:00
(1 row)

maxim=# select - interval '1-2' AS "negative year-month";
 negative year-month 
-
 -1 years -2 mons
(1 row)
{code}

  was:
For example:
{code}
maxim=# select - interval '1-2' AS "negative year-month";
 negative year-month 
-
 -1 years -2 mons
(1 row)
{code}


> Support interval literal with negative sign `-`
> ---
>
> Key: SPARK-29408
> URL: https://issues.apache.org/jira/browse/SPARK-29408
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code}
> maxim=# select -interval '1 day -1 hour';
>  ?column?
> ---
>  -1 days +01:00:00
> (1 row)
> maxim=# select - interval '1-2' AS "negative year-month";
>  negative year-month 
> -
>  -1 years -2 mons
> (1 row)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29750) Avoid dependency from joda-time

2019-11-04 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29750:
--

 Summary: Avoid dependency from joda-time
 Key: SPARK-29750
 URL: https://issues.apache.org/jira/browse/SPARK-29750
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 2.4.4
Reporter: Maxim Gekk


* Remove the direct dependency on joda-time
* If it is used somewhere in Spark, use the Java 8 time API instead.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29736) Improve stability of tests for special datetime values

2019-11-03 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29736:
--

 Summary: Improve stability of tests for special datetime values
 Key: SPARK-29736
 URL: https://issues.apache.org/jira/browse/SPARK-29736
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


The test can fail around midnight if reference values are taken before midnight 
and tested code resolves special values after midnight.
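
One generic way to harden such a test is to capture the current date before and 
after the check and retry when it changed in between; a sketch of that pattern, 
illustrative only and not the actual change:
{code}
import java.time.LocalDate

object MidnightSafeCheck {
  // Re-runs the check if the calendar date flipped while it was executing, so a
  // reference value computed just before midnight is never compared against a
  // result computed just after it.
  def assertStableAcrossMidnight(check: () => Boolean, maxAttempts: Int = 3): Unit = {
    var attempt = 0
    while (attempt < maxAttempts) {
      attempt += 1
      val before = LocalDate.now()
      val result = check()
      val after = LocalDate.now()
      if (before == after) {
        assert(result, "check failed within a single calendar day")
        return
      }
      // The date changed mid-check; loop and recompute both sides.
    }
    throw new IllegalStateException(s"date kept changing during $maxAttempts attempts")
  }

  def main(args: Array[String]): Unit = {
    assertStableAcrossMidnight(() => LocalDate.now() == LocalDate.now())
  }
}
{code}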



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29733) Wrong order of assertEquals parameters

2019-11-03 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29733:
--

 Summary: Wrong order of assertEquals parameters
 Key: SPARK-29733
 URL: https://issues.apache.org/jira/browse/SPARK-29733
 Project: Spark
  Issue Type: Test
  Components: ML, Spark Core, SQL, Structured Streaming
Affects Versions: 2.4.4
Reporter: Maxim Gekk


The assertEquals() method requires the expected value as the first parameter, for 
instance: 
https://junit.org/junit4/javadoc/4.12/org/junit/Assert.html#assertEquals(long,%20long)
but in some places the expected value is passed as the second parameter, which 
is confusing when such an assert fails.
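
For reference, the expected value goes first; a small illustration (plain JUnit 4, 
assumed to be on the test classpath):
{code}
import org.junit.Assert.assertEquals

object AssertEqualsOrder {
  def main(args: Array[String]): Unit = {
    val actual = 1 + 1
    // Correct order: expected first, actual second, so a failure reads
    // "expected:<2> but was:<...>".
    assertEquals(2L, actual.toLong)
    // Swapping the arguments still passes while the values are equal, but on a
    // failure the message reports the wrong value as "expected".
  }
}
{code}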




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29723) Get date and time parts of an interval as java classes

2019-11-02 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29723:
--

 Summary: Get date and time parts of an interval as java classes
 Key: SPARK-29723
 URL: https://issues.apache.org/jira/browse/SPARK-29723
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Taking into account that instances of CalendarInterval can be returned to 
users as the result of collect() or in a UDF, it could be convenient for users 
to get the parts of an interval as Java classes. 
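
One possible shape for this, sketched with java.time types. The interval fields 
below are a hypothetical stand-in for CalendarInterval, not the API that was 
actually added:
{code}
import java.time.{Duration, Period}

object IntervalParts {
  // Hypothetical stand-in for org.apache.spark.unsafe.types.CalendarInterval,
  // assuming a months part and a microseconds part as in Spark 2.4.
  final case class SimpleInterval(months: Int, microseconds: Long)

  // Date part of the interval as java.time.Period.
  def datePart(i: SimpleInterval): Period = Period.ofMonths(i.months)

  // Time part of the interval as java.time.Duration.
  def timePart(i: SimpleInterval): Duration =
    Duration.ofNanos(Math.multiplyExact(i.microseconds, 1000L))

  def main(args: Array[String]): Unit = {
    val i = SimpleInterval(months = 14, microseconds = 3661000000L)  // 1h 1m 1s
    println(datePart(i))  // P14M
    println(timePart(i))  // PT1H1M1S
  }
}
{code}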



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29712) fromDayTimeString() does not take into account the left bound

2019-11-01 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29712:
--

 Summary: fromDayTimeString() does not take into account the left 
bound
 Key: SPARK-29712
 URL: https://issues.apache.org/jira/browse/SPARK-29712
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0, 3.0.0
Reporter: Maxim Gekk


Currently, fromDayTimeString() takes into account the right bound but not the 
left one. For example:
{code}
spark-sql> SELECT interval '1 2:03:04' hour to minute;
interval 1 days 2 hours 3 minutes
{code}
The result should be *interval 2 hours 3 minutes*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29636) Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp

2019-10-30 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963680#comment-16963680
 ] 

Maxim Gekk commented on SPARK-29636:


1. The output is different because Spark uses the session local time zone while 
converting timestamps to strings.
2. It seems this format is not supported, see 
https://github.com/apache/spark/blob/4cfce3e5d03b0badb4e9685499be2ab0fca5747a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L204-L211
 . The seconds field, as well as the hour and minute fields, is mandatory.
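A small sketch illustrating point 1; `spark` refers to an active SparkSession and the timestamp is illustrative:
{code:scala}
import java.sql.Timestamp
import spark.implicits._

// One fixed instant, stored internally as microseconds since the epoch.
val df = Seq(Timestamp.from(java.time.Instant.parse("2000-10-19T09:23:54Z"))).toDF("ts")

spark.conf.set("spark.sql.session.timeZone", "UTC")
df.selectExpr("CAST(ts AS STRING)").show(false)          // 2000-10-19 09:23:54

spark.conf.set("spark.sql.session.timeZone", "Europe/Berlin")
df.selectExpr("CAST(ts AS STRING)").show(false)          // 2000-10-19 11:23:54, same instant, different rendering
{code}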

> Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp
> ---
>
> Key: SPARK-29636
> URL: https://issues.apache.org/jira/browse/SPARK-29636
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dylan Guedes
>Priority: Major
>
> Currently, Spark can't parse a string such as '11:00 BST' or '2000-10-19 
> 10:23:54+01' to timestamp:
> {code:sql}
> spark-sql> select cast ('11:00 BST' as timestamp);
> NULL
> Time taken: 2.248 seconds, Fetched 1 row(s)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29671) Change format of interval string

2019-10-30 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963375#comment-16963375
 ] 

Maxim Gekk commented on SPARK-29671:


For example, PostgreSQL displays intervals like:
{code}
maxim=# select interval '1010 year 9 month 8 day 7 hour 6 minute -5 second 4 
millisecond -3 microseconds';
 interval
--
 1010 years 9 mons 8 days 07:05:55.003997
(1 row)
{code}
but this requires "normalization" because time fields cannot be negative.
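The "normalization" boils down to summing the time fields in microseconds and re-splitting them, so a negative seconds field is borrowed from the larger units. A sketch of that arithmetic for the example above:
{code:scala}
// 7 hours 6 minutes -5 seconds 4 milliseconds -3 microseconds, summed in microseconds:
val micros = 7L * 3600 * 1000000 + 6L * 60 * 1000000 - 5L * 1000000 + 4L * 1000 - 3L
// micros == 25555003997, i.e. 07:05:55.003997, which matches the PostgreSQL output.
val seconds  = micros / 1000000                         // 25555
val fraction = micros % 1000000                         // 3997 microseconds
val (h, m, s) = (seconds / 3600, seconds % 3600 / 60, seconds % 60)  // (7, 5, 55)
{code}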

> Change format of interval string
> 
>
> Key: SPARK-29671
> URL: https://issues.apache.org/jira/browse/SPARK-29671
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The ticket aims to improve format of interval representation as a string. See 
> https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29671) Change format of interval string

2019-10-30 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963373#comment-16963373
 ] 

Maxim Gekk commented on SPARK-29671:


[~cloud_fan][~dongjoon] Let's discuss here how to improve the string 
representation of intervals.

> Change format of interval string
> 
>
> Key: SPARK-29671
> URL: https://issues.apache.org/jira/browse/SPARK-29671
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Minor
>
> The ticket aims to improve format of interval representation as a string. See 
> https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29671) Change format of interval string

2019-10-30 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29671:
--

 Summary: Change format of interval string
 Key: SPARK-29671
 URL: https://issues.apache.org/jira/browse/SPARK-29671
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


The ticket aims to improve the format of the string representation of intervals. See 
https://github.com/apache/spark/pull/26313#issuecomment-547820035



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29669) Refactor IntervalUtils.fromDayTimeString()

2019-10-30 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29669:
--

 Summary: Refactor IntervalUtils.fromDayTimeString()
 Key: SPARK-29669
 URL: https://issues.apache.org/jira/browse/SPARK-29669
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


* Add a UnitName enumeration and use it in AstBuilder and IntervalUtils.
* Make fromDayTimeString() more generic and avoid ad-hoc code.
* Introduce per-unit value properties such as min/max values and a function that 
converts a parsed value to microseconds (see the sketch below).
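A minimal sketch, with hypothetical names, of what such an enumeration with per-unit properties could look like; it is illustrative, not the actual implementation:
{code:scala}
// Each unit knows its valid range and how to convert a parsed value to microseconds.
sealed abstract class UnitName(val min: Long, val max: Long, val toMicros: Long => Long)
case object HOUR   extends UnitName(0, 23, _ * 3600L * 1000000L)
case object MINUTE extends UnitName(0, 59, _ * 60L * 1000000L)
case object SECOND extends UnitName(0, 59, _ * 1000000L)

def parse(unit: UnitName, value: Long): Long = {
  require(value >= unit.min && value <= unit.max, s"$value is out of range for $unit")
  unit.toMicros(value)
}
{code}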



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29651) Incorrect parsing of interval seconds fraction

2019-10-30 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29651:
--

 Summary: Incorrect parsing of interval seconds fraction
 Key: SPARK-29651
 URL: https://issues.apache.org/jira/browse/SPARK-29651
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0, 2.3.0, 2.2.0, 2.1.0, 2.0.0
Reporter: Maxim Gekk


* The fractional part of the interval seconds unit is incorrectly parsed if the 
number of digits is less than 9, for example:
{code}
spark-sql> select interval '10.123456 seconds';
interval 10 seconds 123 microseconds
{code}
The result must be *interval 10 seconds 123 milliseconds 456 microseconds*

* If the seconds unit of an interval is negative, it is incorrectly converted 
to `CalendarInterval`, for example:
{code}
spark-sql> select interval '-10.123456789 seconds';
interval -9 seconds -876 milliseconds -544 microseconds
{code}
Taking truncation to microseconds into account, the result must be *interval 
-10 seconds -123 milliseconds -456 microseconds* (a sketch of the expected 
fraction scaling follows below).
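A sketch of the expected fraction scaling, assuming the parser receives the digits after the decimal point as a string; this is illustrative code, not the actual parser:
{code:scala}
// '.123456' -> 123456 us, '.123' -> 123000 us, '.123456789' -> 123456 us (truncated).
def fractionToMicros(digitsAfterDot: String): Long =
  digitsAfterDot.take(6).padTo(6, '0').toLong

assert(fractionToMicros("123456") == 123456L)   // 123 ms 456 us
assert(fractionToMicros("123")    == 123000L)   // 123 ms, not 123 us
{code}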



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29614) Failure of DateTimeUtilsSuite and TimestampFormatterSuite

2019-10-27 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29614:
--

 Summary: Failure of DateTimeUtilsSuite and TimestampFormatterSuite
 Key: SPARK-29614
 URL: https://issues.apache.org/jira/browse/SPARK-29614
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


* 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/653/
* 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112721/testReport/



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29607) Move static methods from CalendarInterval to IntervalUtils

2019-10-25 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29607:
--

 Summary: Move static methods from CalendarInterval to IntervalUtils
 Key: SPARK-29607
 URL: https://issues.apache.org/jira/browse/SPARK-29607
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Move the static methods from the CalendarInterval class to the helper object 
IntervalUtils. This requires rewriting the Java code in Scala.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29605) Optimize string to interval casting

2019-10-25 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29605:
--

 Summary: Optimize string to interval casting
 Key: SPARK-29605
 URL: https://issues.apache.org/jira/browse/SPARK-29605
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Implement a new function stringToInterval in IntervalUtils that casts a value of 
UTF8String to an instance of CalendarInterval and should be faster than the 
existing implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29533) Benchmark casting strings to intervals

2019-10-21 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29533:
--

 Summary: Benchmark casting strings to intervals
 Key: SPARK-29533
 URL: https://issues.apache.org/jira/browse/SPARK-29533
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Add a benchmark for casting interval strings to intervals with different 
numbers of interval units.
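A rough sketch of what such a benchmark could measure; it is illustrative only (the real benchmark would use Spark's benchmark harness) and assumes a string column can be cast to the interval type:
{code:scala}
import spark.implicits._

// Very rough timing of casting a string column with a growing number of interval units.
val units = Seq("1 year", "2 months", "3 days", "4 hours", "5 minutes", "6 seconds")
for (n <- 1 to units.size) {
  val df = Seq.fill(100000)(units.take(n).mkString(" ")).toDF("s")
  val start = System.nanoTime()
  df.selectExpr("CAST(s AS INTERVAL)").collect()
  println(s"$n unit(s): ${(System.nanoTime() - start) / 1e6} ms")
}
{code}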



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29524) Unordered interval units

2019-10-20 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29524:
--

 Summary: Unordered interval units
 Key: SPARK-29524
 URL: https://issues.apache.org/jira/browse/SPARK-29524
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Currently, Spark requires a particular order of interval units when casting from 
strings: `YEAR` .. `MICROSECOND`. PostgreSQL allows any order:
{code}
maxim=# select interval '1 second 2 hours';
 interval
--
 02:00:01
(1 row)
{code}
but Spark fails while parsing:
{code}
spark-sql> select interval '1 second 2 hours';
NULL
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29520) Incorrect checking of negative intervals

2019-10-19 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29520:
--

 Summary: Incorrect checking of negative intervals
 Key: SPARK-29520
 URL: https://issues.apache.org/jira/browse/SPARK-29520
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.4.4
Reporter: Maxim Gekk


An interval is negative when its total duration is negative. The following code 
checks intervals incorrectly:
* 
https://github.com/apache/spark/blob/f302c2ee6203de36e966fcc58917af4847dff7f2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/GroupStateImpl.scala#L163
* 
https://github.com/apache/spark/blob/d841b33ba3a9b0504597dbccd4b0d11fa810abf3/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L734
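A sketch of a duration-based check, assuming the months, days and microseconds fields of CalendarInterval from the 3.0 code base; the month length used for the estimate is an assumption of this sketch, not an actual Spark constant:
{code:scala}
import org.apache.spark.unsafe.types.CalendarInterval

// Assumed conversion factors for this sketch only.
val MICROS_PER_DAY   = 24L * 3600 * 1000000
val MICROS_PER_MONTH = 31L * MICROS_PER_DAY   // assumption: a month is counted as 31 days

def isNegative(i: CalendarInterval): Boolean =
  i.months * MICROS_PER_MONTH + i.days * MICROS_PER_DAY + i.microseconds < 0
{code}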



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29518) Benchmark `date_part` for `INTERVAL`

2019-10-19 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29518:
--

 Summary: Benchmark `date_part` for `INTERVAL`
 Key: SPARK-29518
 URL: https://issues.apache.org/jira/browse/SPARK-29518
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


SPARK-28420 added support for `INTERVAL` columns in `date_part()`. Benchmarks 
need to be added for the new type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations

2019-10-18 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954579#comment-16954579
 ] 

Maxim Gekk commented on SPARK-29508:


I am working on it

> Implicitly cast strings in datetime arithmetic operations
> -
>
> Key: SPARK-29508
> URL: https://issues.apache.org/jira/browse/SPARK-29508
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Maxim Gekk
>Priority: Minor
>
> To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` 
> types in the cases:
>  # Cast string to interval in interval - string
>  # Cast string to interval in datetime + string or string + datetime
>  # Cast string to timestamp in datetime - string or string - datetime



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations

2019-10-18 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29508:
--

 Summary: Implicitly cast strings in datetime arithmetic operations
 Key: SPARK-29508
 URL: https://issues.apache.org/jira/browse/SPARK-29508
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` 
types in the following cases:
 # Cast string to interval in interval - string
 # Cast string to interval in datetime + string or string + datetime
 # Cast string to timestamp in datetime - string or string - datetime



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29387) Support `*` and `/` operators for intervals

2019-10-15 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29387:
---
Summary: Support `*` and `/` operators for intervals  (was: Support `*` and 
`\` operators for intervals)

> Support `*` and `/` operators for intervals
> ---
>
> Key: SPARK-29387
> URL: https://issues.apache.org/jira/browse/SPARK-29387
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Support `*` by numeric, `/` by numeric. See 
> [https://www.postgresql.org/docs/12/functions-datetime.html]
> ||Operator||Example||Result||
> |*|900 * interval '1 second'|interval '00:15:00'|
> |*|21 * interval '1 day'|interval '21 days'|
> |/|interval '1 hour' / double precision '1.5'|interval '00:40:00'|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10848) Applied JSON Schema Works for json RDD but not when loading json file

2019-10-13 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950461#comment-16950461
 ] 

Maxim Gekk commented on SPARK-10848:


Nullable = false in a user's schema cannot guarantee that nulls don't appear in 
the loaded data. That can lead to weird errors, such as the corruption of saved 
parquet files described in SPARK-23173.

> Applied JSON Schema Works for json RDD but not when loading json file
> -
>
> Key: SPARK-10848
> URL: https://issues.apache.org/jira/browse/SPARK-10848
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Miklos Christine
>Priority: Minor
>
> Using a defined schema to load a json rdd works as expected. Loading the json 
> records from a file does not apply the supplied schema. Mainly the nullable 
> field isn't applied correctly. Loading from a file uses nullable=true on all 
> fields regardless of applied schema. 
> Code to reproduce:
> {code}
> import  org.apache.spark.sql.types._
> val jsonRdd = sc.parallelize(List(
>   """{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", 
> "ProductCode": "WQT648", "Qty": 5}""",
>   """{"OrderID": 2, "CustomerID":16  , "OrderDate": "2015-07-11", 
> "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, 
> "expressDelivery":true}"""))
> val mySchema = StructType(Array(
>   StructField(name="OrderID"   , dataType=LongType, nullable=false),
>   StructField("CustomerID", IntegerType, false),
>   StructField("OrderDate", DateType, false),
>   StructField("ProductCode", StringType, false),
>   StructField("Qty", IntegerType, false),
>   StructField("Discount", FloatType, true),
>   StructField("expressDelivery", BooleanType, true)
> ))
> val myDF = sqlContext.read.schema(mySchema).json(jsonRdd)
> val schema1 = myDF.printSchema
> val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json")
> val schema2 = dfDFfromFile.printSchema
> {code}
> Orders.json
> {code}
> {"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": 
> "WQT648", "Qty": 5}
> {"OrderID": 2, "CustomerID":16  , "OrderDate": "2015-07-11", "ProductCode": 
> "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true}
> {code}
> The behavior should be consistent. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29448) Support the `INTERVAL` type by Parquet datasource

2019-10-12 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29448:
--

 Summary: Support the `INTERVAL` type by Parquet datasource
 Key: SPARK-29448
 URL: https://issues.apache.org/jira/browse/SPARK-29448
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


The Parquet format allows storing intervals as a triple of (milliseconds, days, 
months), see 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval . 
The `INTERVAL` logical type is used for an interval of time. _It must annotate 
a fixed_len_byte_array of length 12. This array stores three little-endian 
unsigned integers that represent durations at different granularities of time. 
The first stores a number in months, the second stores a number in days, and 
the third stores a number in milliseconds. This representation is independent 
of any particular timezone or date._

Need to support writing and reading values of Catalyst's CalendarIntervalType.
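A sketch of the 12-byte layout described above, for hypothetical (months, days, milliseconds) values; it is illustrative, not the actual datasource code:
{code:scala}
import java.nio.{ByteBuffer, ByteOrder}

// Pack (months, days, milliseconds) into the fixed 12-byte little-endian layout of the
// Parquet INTERVAL logical type, and read it back. The format treats the values as
// unsigned; this sketch ignores that detail.
def encode(months: Int, days: Int, millis: Int): Array[Byte] =
  ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN)
    .putInt(months).putInt(days).putInt(millis).array()

def decode(bytes: Array[Byte]): (Int, Int, Int) = {
  val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
  (buf.getInt, buf.getInt, buf.getInt)
}
{code}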



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29440) Support java.time.Duration as an external type of CalendarIntervalType

2019-10-11 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29440:
--

 Summary: Support java.time.Duration as an external type of 
CalendarIntervalType
 Key: SPARK-29440
 URL: https://issues.apache.org/jira/browse/SPARK-29440
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk


Currently, Spark SQL doesn't have any external type for Catalyst's 
CalendarIntervalType. The internal CalendarInterval is partially exposed but it 
cannot be used in UDFs, for example. This ticket aims to provide 
`java.time.Duration` as one of the external types of Spark `INTERVAL`.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29382) Support writing `INTERVAL` type to datasource table

2019-10-11 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29382:
---
Description: 
Creating a table with `INTERVAL` column for writing failed with the error:
{code:java}
spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval);
Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
'interval' but 'interval' is found.;
{code}

This is needed for SPARK-29368

  was:
Spark cannot create a table using parquet if a column has the `INTERVAL` type:
{code}
spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval) USING PARQUET;
Error in query: Parquet data source does not support interval data type.;
{code}
This is needed for SPARK-29368


> Support writing `INTERVAL` type to datasource table
> ---
>
> Key: SPARK-29382
> URL: https://issues.apache.org/jira/browse/SPARK-29382
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Creating a table with `INTERVAL` column for writing failed with the error:
> {code:java}
> spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval);
> Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
> 'interval' but 'interval' is found.;
> {code}
> This is needed for SPARK-29368



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29382) Support writing `INTERVAL` type to datasource table

2019-10-11 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29382:
---
Summary: Support writing `INTERVAL` type to datasource table  (was: Support 
the `INTERVAL` type by Parquet datasource)

> Support writing `INTERVAL` type to datasource table
> ---
>
> Key: SPARK-29382
> URL: https://issues.apache.org/jira/browse/SPARK-29382
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Spark cannot create a table using parquet if a column has the `INTERVAL` type:
> {code}
> spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval) USING PARQUET;
> Error in query: Parquet data source does not support interval data type.;
> {code}
> This is needed for SPARK-29368



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26651) Use Proleptic Gregorian calendar

2019-10-10 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948957#comment-16948957
 ] 

Maxim Gekk commented on SPARK-26651:


[~jiangxb] Could you consider including this in the list of major changes for 
Spark 3.0?

> Use Proleptic Gregorian calendar
> 
>
> Key: SPARK-26651
> URL: https://issues.apache.org/jira/browse/SPARK-26651
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>  Labels: ReleaseNote
>
> Spark 2.4 and previous versions use a hybrid calendar - Julian + Gregorian in 
> date/timestamp parsing, functions and expressions. The ticket aims to switch 
> Spark on Proleptic Gregorian calendar, and use java.time classes introduced 
> in Java 8 for timestamp/date manipulations. One of the purpose of switching 
> on Proleptic Gregorian calendar is to conform to SQL standard which supposes 
> such calendar.
> *Release note:*
> Spark 3.0 has switched on Proleptic Gregorian calendar in parsing, 
> formatting, and converting dates and timestamps as well as in extracting 
> sub-components like years, days and etc. It uses Java 8 API classes from the 
> java.time packages that based on [ISO chronology 
> |https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html].
>  Previous versions of Spark performed those operations by using [the hybrid 
> calendar|https://docs.oracle.com/javase/7/docs/api/java/util/GregorianCalendar.html]
>  (Julian + Gregorian). The changes might impact on the results for dates and 
> timestamps before October 15, 1582 (Gregorian).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29408) Support interval literal with negative sign `-`

2019-10-09 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29408:
---
Summary: Support interval literal with negative sign `-`  (was: Support 
interval literal with negative sign `-`.)

> Support interval literal with negative sign `-`
> ---
>
> Key: SPARK-29408
> URL: https://issues.apache.org/jira/browse/SPARK-29408
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For example:
> {code}
> maxim=# select - interval '1-2' AS "negative year-month";
>  negative year-month 
> -
>  -1 years -2 mons
> (1 row)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29408) Support interval literal with negative sign `-`.

2019-10-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29408:
--

 Summary: Support interval literal with negative sign `-`.
 Key: SPARK-29408
 URL: https://issues.apache.org/jira/browse/SPARK-29408
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


For example:
{code}
maxim=# select - interval '1-2' AS "negative year-month";
 negative year-month 
-
 -1 years -2 mons
(1 row)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29407) Support syntax for zero interval

2019-10-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29407:
--

 Summary: Support syntax for zero interval
 Key: SPARK-29407
 URL: https://issues.apache.org/jira/browse/SPARK-29407
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Support special syntax for zero interval like PostgreSQL does:
{code}
maxim=# SELECT  interval '0';
 interval 
--
 00:00:00
(1 row)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29406) Interval output styles

2019-10-09 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29406:
--

 Summary: Interval output styles
 Key: SPARK-29406
 URL: https://issues.apache.org/jira/browse/SPARK-29406
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


In PostgreSQL, the output format of the interval type can be set to one of the 
four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the 
command SET intervalstyle; see
 
[https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-INTERVAL-OUTPUT]
||Style Specification||Year-Month Interval||Day-Time Interval||Mixed Interval||
|{{sql_standard}}|1-2|3 4:05:06|-1-2 +3 -4:05:06|
|{{postgres}}|1 year 2 mons|3 days 04:05:06|-1 year -2 mons +3 days -04:05:06|
|{{postgres_verbose}}|@ 1 year 2 mons|@ 3 days 4 hours 5 mins 6 secs|@ 1 year 2 
mons -3 days 4 hours 5 mins 6 secs ago|
|{{iso_8601}}|P1Y2M|P3DT4H5M6S|P-1Y-2M3DT-4H-5M-6S|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29370) Interval strings without explicit unit markings

2019-10-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29370:
---
Description: 
In PostgreSQL, quantities of days, hours, minutes, and seconds can be specified 
without explicit unit markings: '1 12:59:10' is read the same as 
'1 day 12 hours 59 min 10 sec'. For example:
{code:java}
maxim=# select interval '1 12:59:10';
interval

 1 day 12:59:10
(1 row)
{code}
It should also be possible to specify a sign:
{code}
maxim=# SELECT interval '1 +2:03:04' minute to second;
interval

 1 day 02:03:04
maxim=# SELECT interval '1 -2:03:04' minute to second;
interval 
-
 1 day -02:03:04
{code}
 

  was:
In PostgreSQL, Quantities of days, hours, minutes, and seconds can be specified 
without explicit unit markings. For example, '1 12:59:10' is read the same as 
'1 day 12 hours 59 min 10 sec'. For example:
{code}
maxim=# select interval '1 12:59:10';
interval

 1 day 12:59:10
(1 row)
{code}


> Interval strings without explicit unit markings
> ---
>
> Key: SPARK-29370
> URL: https://issues.apache.org/jira/browse/SPARK-29370
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> In PostgreSQL, Quantities of days, hours, minutes, and seconds can be 
> specified without explicit unit markings. For example, '1 12:59:10' is read 
> the same as '1 day 12 hours 59 min 10 sec'. For example:
> {code:java}
> maxim=# select interval '1 12:59:10';
> interval
> 
>  1 day 12:59:10
> (1 row)
> {code}
> It should allow to specify the sign:
> {code}
> maxim=# SELECT interval '1 +2:03:04' minute to second;
> interval
> 
>  1 day 02:03:04
> maxim=# SELECT interval '1 -2:03:04' minute to second;
> interval 
> -
>  1 day -02:03:04
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29395) Precision of the interval type

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29395:
--

 Summary: Precision of the interval type
 Key: SPARK-29395
 URL: https://issues.apache.org/jira/browse/SPARK-29395
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


PostgreSQL allows specifying the interval precision, see 
[https://www.postgresql.org/docs/12/datatype-datetime.html]
|{{interval [ _{{fields}}_ ] [ (_{{p}}_) ]}}|16 bytes|time interval|-17800 
years|17800 years|1 microsecond|

For example:
{code}
maxim=# SELECT interval '1 2:03.4567' day to second(2);
 interval  
---
 1 day 00:02:03.46
(1 row)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29394) Support ISO 8601 format for intervals

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29394:
--

 Summary: Support ISO 8601 format for intervals
 Key: SPARK-29394
 URL: https://issues.apache.org/jira/browse/SPARK-29394
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Interval values can also be written as ISO 8601 time intervals, using either 
the “format with designators” of the standard's section 4.4.3.2 or the 
“alternative format” of section 4.4.3.3. 
 For example:
|P1Y2M3DT4H5M6S|ISO 8601 “format with designators”|
|P0001-02-03T04:05:06|ISO 8601 “alternative format”: same meaning as above|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29393) Add the make_interval() function

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29393:
--

 Summary: Add the make_interval() function
 Key: SPARK-29393
 URL: https://issues.apache.org/jira/browse/SPARK-29393
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


PostgreSQL allows making an interval with the make_interval() function, see 
https://www.postgresql.org/docs/12/functions-datetime.html:
||Function||Return Type||Description||Example||Result||
|{{make_interval(years int DEFAULT 0, months int DEFAULT 0, weeks int DEFAULT 0, days int DEFAULT 0, hours int DEFAULT 0, mins int DEFAULT 0, secs double precision DEFAULT 0.0)}}|{{interval}}|Create interval from years, months, weeks, days, hours, minutes and seconds fields|{{make_interval(days => 10)}}|{{10 days}}|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29391) Default year-month units

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29391:
--

 Summary: Default year-month units
 Key: SPARK-29391
 URL: https://issues.apache.org/jira/browse/SPARK-29391
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


PostgreSQL can assume default year-month units:
{code}
maxim=# SELECT interval '1-2'; 
   interval
---
 1 year 2 mons
{code}
but the same query produces NULL in Spark.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29390) Add the justify_days(), justify_hours() and justify_interval() functions

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29390:
--

 Summary: Add the justify_days(), justify_hours() and justify_interval() 
functions
 Key: SPARK-29390
 URL: https://issues.apache.org/jira/browse/SPARK-29390
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


See *Table 9.31. Date/Time Functions* 
([https://www.postgresql.org/docs/12/functions-datetime.html]):
||Function||Return Type||Description||Example||Result||
|{{justify_days(interval)}}|{{interval}}|Adjust interval so 30-day time periods are represented as months|{{justify_days(interval '35 days')}}|{{1 mon 5 days}}|
|{{justify_hours(interval)}}|{{interval}}|Adjust interval so 24-hour time periods are represented as days|{{justify_hours(interval '27 hours')}}|{{1 day 03:00:00}}|
|{{justify_interval(interval)}}|{{interval}}|Adjust interval using {{justify_days}} and {{justify_hours}}, with additional sign adjustments|{{justify_interval(interval '1 mon -1 hour')}}|{{29 days 23:00:00}}|
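A sketch of the underlying arithmetic for justify_days (30-day chunks become months) and justify_hours (24-hour chunks become days); justify_interval additionally adjusts signs, which is not covered here:
{code:scala}
// justify_days: 35 days -> 1 mon 5 days; justify_hours: 27 hours -> 1 day 03:00:00.
// Negative values would need the extra sign adjustments of justify_interval.
def justifyDays(months: Int, days: Int): (Int, Int)  = (months + days / 30, days % 30)
def justifyHours(days: Int, hours: Int): (Int, Int)  = (days + hours / 24, hours % 24)

assert(justifyDays(0, 35)  == (1, 5))
assert(justifyHours(0, 27) == (1, 3))
{code}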



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29389) Short synonyms of interval units

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29389:
--

 Summary: Short synonyms of interval units
 Key: SPARK-29389
 URL: https://issues.apache.org/jira/browse/SPARK-29389
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


The following synonyms should be supported:
{code}
 ["MILLENNIUM", ("MILLENNIA", "MIL", "MILS"),
   "CENTURY", ("CENTURIES", "C", "CENT"),
   "DECADE", ("DECADES", "DEC", "DECS"),
   "YEAR", ("Y", "YEARS", "YR", "YRS"),
   "QUARTER", ("QTR"),
   "MONTH", ("MON", "MONS", "MONTHS"),
   "DAY", ("D", "DAYS"),
   "HOUR", ("H", "HOURS", "HR", "HRS"),
   "MINUTE", ("M", "MIN", "MINS", "MINUTES"),
   "SECOND", ("S", "SEC", "SECONDS", "SECS"),
   "MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", "MSECONDS", 
"MS"),
   "MICROSECONDS", ("USEC", "USECS", "USECONDS", "MICROSECON", 
"US"),
   "EPOCH"]
{code}

For example:
{code}
maxim=# select '1y 10mon -10d -10h -10min -10.01s 
ago'::interval;
interval

 -1 years -10 mons +10 days 10:10:10.01
(1 row)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-29389) Support synonyms for interval units

2019-10-08 Thread Maxim Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maxim Gekk updated SPARK-29389:
---
Summary: Support synonyms for interval units  (was: Short synonyms of 
interval units)

> Support synonyms for interval units
> ---
>
> Key: SPARK-29389
> URL: https://issues.apache.org/jira/browse/SPARK-29389
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Priority: Major
>
> Should be supported the following synonyms:
> {code}
>  ["MILLENNIUM", ("MILLENNIA", "MIL", "MILS"),
>"CENTURY", ("CENTURIES", "C", "CENT"),
>"DECADE", ("DECADES", "DEC", "DECS"),
>"YEAR", ("Y", "YEARS", "YR", "YRS"),
>"QUARTER", ("QTR"),
>"MONTH", ("MON", "MONS", "MONTHS"),
>"DAY", ("D", "DAYS"),
>"HOUR", ("H", "HOURS", "HR", "HRS"),
>"MINUTE", ("M", "MIN", "MINS", "MINUTES"),
>"SECOND", ("S", "SEC", "SECONDS", "SECS"),
>"MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", 
> "MSECONDS", "MS"),
>"MICROSECONDS", ("USEC", "USECS", "USECONDS", 
> "MICROSECON", "US"),
>"EPOCH"]
> {code}
> For example:
> {code}
> maxim=# select '1y 10mon -10d -10h -10min -10.01s 
> ago'::interval;
> interval
> 
>  -1 years -10 mons +10 days 10:10:10.01
> (1 row)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29388) Construct intervals from the `millenniums`, `centuries` or `decades` units

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29388:
--

 Summary: Construct intervals from the `millenniums`, `centuries` 
or `decades` units
 Key: SPARK-29388
 URL: https://issues.apache.org/jira/browse/SPARK-29388
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


PostgreSQL supports the `millenniums`, `centuries` and `decades` interval units, for example:
{code}
maxim=# select '4 millenniums 5 centuries 4 decades 1 year 4 months 4 days 17 
minutes 31 seconds'::interval;
 interval  
---
 4541 years 4 mons 4 days 00:17:31
(1 row)
{code}
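A sketch of the unit-to-months arithmetic behind the example above; the constants are ordinary calendar facts (a millennium is 1000 years, a century 100, a decade 10):
{code:scala}
// Months contributed by each coarse unit.
val monthsPerUnit = Map(
  "millennium" -> 12000L,   // 1000 years
  "century"    -> 1200L,    // 100 years
  "decade"     -> 120L,     // 10 years
  "year"       -> 12L,
  "month"      -> 1L)

val totalMonths = 4 * monthsPerUnit("millennium") + 5 * monthsPerUnit("century") +
  4 * monthsPerUnit("decade") + 1 * monthsPerUnit("year") + 4 * monthsPerUnit("month")

// totalMonths == 54496, i.e. 4541 years 4 months, matching the PostgreSQL output above.
assert(totalMonths == 4541L * 12 + 4)
{code}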



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-29387) Support `*` and `\` operators for intervals

2019-10-08 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-29387:
--

 Summary: Support `*` and `\` operators for intervals
 Key: SPARK-29387
 URL: https://issues.apache.org/jira/browse/SPARK-29387
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


Support `*` by numeric, `/` by numeric. See 
[https://www.postgresql.org/docs/12/functions-datetime.html]
||Operator||Example||Result||
|*|900 * interval '1 second'|interval '00:15:00'|
|*|21 * interval '1 day'|interval '21 days'|
|/|interval '1 hour' / double precision '1.5'|interval '00:40:00'|
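A sketch of what multiplying an interval by a numeric could mean at the value level, assuming the months, days and microseconds fields and constructor of CalendarInterval from the 3.0 code base; rounding and overflow handling are ignored:
{code:scala}
import org.apache.spark.unsafe.types.CalendarInterval

// Scale every component of the interval by the numeric factor.
def multiply(i: CalendarInterval, num: Double): CalendarInterval =
  new CalendarInterval(
    (i.months * num).toInt,
    (i.days * num).toInt,
    (i.microseconds * num).toLong)

// 900 * interval '1 second' gives interval '15 minutes' (900,000,000 microseconds).
val fifteenMinutes = multiply(new CalendarInterval(0, 0, 1000000L), 900)
{code}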



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


