[jira] [Updated] (SPARK-30485) Remove SQL configs deprecated before v2.4
[ https://issues.apache.org/jira/browse/SPARK-30485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-30485:
---

Description:
Remove the following SQL configs:
* spark.sql.variable.substitute.depth
* spark.sql.execution.pandas.respectSessionTimeZone
* spark.sql.parquet.int64AsTimestampMillis

Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs map:
[https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189]

was:
Remove the following SQL configs:
* spark.sql.variable.substitute.depth
* spark.sql.execution.pandas.respectSessionTimeZone
* spark.sql.parquet.int64AsTimestampMillis
* Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName, which was deprecated in v2.4

Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs map:
https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189

> Remove SQL configs deprecated before v2.4
> -
>
> Key: SPARK-30485
> URL: https://issues.apache.org/jira/browse/SPARK-30485
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> Remove the following SQL configs:
> * spark.sql.variable.substitute.depth
> * spark.sql.execution.pandas.respectSessionTimeZone
> * spark.sql.parquet.int64AsTimestampMillis
>
> Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs map:
> [https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189]
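For context, a minimal sketch of how deprecated configs can be gathered into a single lookup map, in the spirit of the SQLConf.deprecatedSQLConfigs lines linked above. The DeprecatedConfig shape, the version strings, and the comments below are illustrative assumptions, not Spark's exact code.
{code:scala}
// Illustrative only: field names, versions and comments are assumptions.
case class DeprecatedConfig(key: String, version: String, comment: String)

val deprecatedSQLConfigs: Map[String, DeprecatedConfig] = Seq(
  DeprecatedConfig("spark.sql.variable.substitute.depth", "x.y",
    "The config is no longer used by the variable substitutor."),
  DeprecatedConfig("spark.sql.execution.pandas.respectSessionTimeZone", "x.y",
    "The non-default behavior was deprecated; the session time zone is respected."),
  DeprecatedConfig("spark.sql.parquet.int64AsTimestampMillis", "x.y",
    "Use spark.sql.parquet.outputTimestampType instead.")
).map(cfg => cfg.key -> cfg).toMap

// Removing a config then reduces to deleting its map entry and its definition.
assert(deprecatedSQLConfigs.contains("spark.sql.variable.substitute.depth"))
{code}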
[jira] [Commented] (SPARK-30485) Remove SQL configs deprecated before v2.4
[ https://issues.apache.org/jira/browse/SPARK-30485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012678#comment-17012678 ]

Maxim Gekk commented on SPARK-30485:

[~dongjoon] [~srowen] [~cloud_fan] [~hyukjin.kwon] WDYT of removing them?

> Remove SQL configs deprecated before v2.4
> -
>
> Key: SPARK-30485
> URL: https://issues.apache.org/jira/browse/SPARK-30485
> Project: Spark
> Issue Type: Test
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> Remove the following SQL configs:
> * spark.sql.variable.substitute.depth
> * spark.sql.execution.pandas.respectSessionTimeZone
> * spark.sql.parquet.int64AsTimestampMillis
> * Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName, which was deprecated in v2.4
>
> Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs map:
> https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189
[jira] [Created] (SPARK-30485) Remove SQL configs deprecated before v2.4
Maxim Gekk created SPARK-30485:
--
Summary: Remove SQL configs deprecated before v2.4
Key: SPARK-30485
URL: https://issues.apache.org/jira/browse/SPARK-30485
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk

Remove the following SQL configs:
* spark.sql.variable.substitute.depth
* spark.sql.execution.pandas.respectSessionTimeZone
* spark.sql.parquet.int64AsTimestampMillis
* Maybe spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName, which was deprecated in v2.4

Recently, all deprecated SQL configs were gathered into the deprecatedSQLConfigs map:
https://github.com/apache/spark/blob/1ffa627ffb93dc1027cb4b72f36ec9b7319f48e4/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L2160-L2189
[jira] [Updated] (SPARK-30482) Add sub-class of AppenderSkeleton reusable in tests
[ https://issues.apache.org/jira/browse/SPARK-30482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-30482:
---
Component/s: (was: SQL)

> Add sub-class of AppenderSkeleton reusable in tests
> ---
>
> Key: SPARK-30482
> URL: https://issues.apache.org/jira/browse/SPARK-30482
> Project: Spark
> Issue Type: Test
> Components: Tests
> Affects Versions: 2.4.4
> Reporter: Maxim Gekk
> Priority: Minor
>
> Several tests define a similar sub-class of AppenderSkeleton. The code duplication can be eliminated by defining a common class in [SparkFunSuite.scala|https://github.com/apache/spark/compare/master...MaxGekk:dedup-appender-skeleton?expand=1#diff-d521001af1af1a2aace870feb25ae0b0]
[jira] [Created] (SPARK-30482) Add sub-class of AppenderSkeleton reusable in tests
Maxim Gekk created SPARK-30482:
--
Summary: Add sub-class of AppenderSkeleton reusable in tests
Key: SPARK-30482
URL: https://issues.apache.org/jira/browse/SPARK-30482
Project: Spark
Issue Type: Test
Components: SQL, Tests
Affects Versions: 2.4.4
Reporter: Maxim Gekk

Several tests define a similar sub-class of AppenderSkeleton. The code duplication can be eliminated by defining a common class in [SparkFunSuite.scala|https://github.com/apache/spark/compare/master...MaxGekk:dedup-appender-skeleton?expand=1#diff-d521001af1af1a2aace870feb25ae0b0]
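For illustration, a minimal sketch of such a reusable appender, assuming log4j 1.x (which Spark used at the time); the class and member names are hypothetical, not necessarily what SparkFunSuite ended up with.
{code:scala}
import scala.collection.mutable.ArrayBuffer
import org.apache.log4j.AppenderSkeleton
import org.apache.log4j.spi.LoggingEvent

// Collects every log event so a test can assert on the emitted messages.
class LogAppender extends AppenderSkeleton {
  val loggingEvents = new ArrayBuffer[LoggingEvent]()

  override protected def append(event: LoggingEvent): Unit = loggingEvents += event
  override def close(): Unit = {}
  override def requiresLayout(): Boolean = false
}

// Typical use in a test body:
// val appender = new LogAppender
// org.apache.log4j.Logger.getRootLogger.addAppender(appender)
// ... run the code under test ...
// assert(appender.loggingEvents.exists(_.getRenderedMessage.contains("deprecated")))
{code}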
[jira] [Commented] (SPARK-30442) Write mode ignored when using CodecStreams
[ https://issues.apache.org/jira/browse/SPARK-30442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009963#comment-17009963 ]

Maxim Gekk commented on SPARK-30442:

> This can cause issues, particularly with aws tools, that make it impossible to retry.

Could you clarify how it makes a retry impossible? When the mode is set to overwrite, Spark deletes the entire folder and writes new files, so there should be no clashes. In the append mode, new files are added - Spark does not append to existing files. In what situation should files be overwritten?

> Write mode ignored when using CodecStreams
> --
>
> Key: SPARK-30442
> URL: https://issues.apache.org/jira/browse/SPARK-30442
> Project: Spark
> Issue Type: Bug
> Components: Input/Output
> Affects Versions: 2.4.4
> Reporter: Jesse Collins
> Priority: Major
>
> Overwrite is hardcoded to false in the codec stream. This can cause issues, particularly with AWS tools, that make it impossible to retry.
> Ideally, this should be read from the write mode set for the DataWriter that is writing through this codec class.
> [https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CodecStreams.scala#L81]
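For context, a hedged sketch of the pattern the report points at: in Hadoop's FileSystem.create(), the second argument is the overwrite flag, and the linked CodecStreams line passes false unconditionally. The helper below is a stand-in, not Spark's actual method.
{code:scala}
import java.io.OutputStream
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Stand-in for the CodecStreams helper: overwrite is hardcoded to false,
// so a retry that finds a leftover file fails instead of replacing it.
def createOutputStream(conf: Configuration, file: Path): OutputStream = {
  val fs = file.getFileSystem(conf)
  fs.create(file, /* overwrite = */ false)
}
{code}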
[jira] [Commented] (SPARK-30429) WideSchemaBenchmark fails with OOM
[ https://issues.apache.org/jira/browse/SPARK-30429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009398#comment-17009398 ]

Maxim Gekk commented on SPARK-30429:

git bisect has found the first bad commit. I specified the recent master as the bad commit and 62551cceebf6aca8b6bd8164cd2ed85564726f6c as the good commit.
{code}
cb5ea201df5fae8aacb653ffb4147b9288bca1e9 is the first bad commit
commit cb5ea201df5fae8aacb653ffb4147b9288bca1e9
Author: Liang-Chi Hsieh
Date: Thu Oct 25 19:27:45 2018 +0800

    [SPARK-25746][SQL] Refactoring ExpressionEncoder to get rid of flat flag
    ...
    Closes #22749 from viirya/SPARK-24762-refactor.

    Authored-by: Liang-Chi Hsieh
    Signed-off-by: Wenchen Fan

:04 04 11961d7665e9097c682cdf6d51163ad4b3ffdf90 cb82a04e8a2fa1505c2db36c9c6578544e502601 M sql
bisect run success
{code}
/cc [~cloud_fan] [~viirya]

> WideSchemaBenchmark fails with OOM
> --
>
> Key: SPARK-30429
> URL: https://issues.apache.org/jira/browse/SPARK-30429
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Major
> Attachments: WideSchemaBenchmark_console.txt
>
> Run WideSchemaBenchmark on the master (commit bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
> {code}
> SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
> {code}
> This fails with:
> {code}
> Caused by: java.lang.reflect.InvocationTargetException
> [error] at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
> [error] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> [error] at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> [error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
> [error] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
> [error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
> [error] at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> [error] ... 132 more
> [error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> [error] at java.util.Arrays.copyOfRange(Arrays.java:3664)
> [error] at java.lang.String.<init>(String.java:207)
> [error] at java.lang.StringBuilder.toString(StringBuilder.java:407)
> [error] at org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
> [error] at org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
> [error] at org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown Source)
> {code}
> Full stack dump is attached.
[jira] [Commented] (SPARK-30429) WideSchemaBenchmark fails with OOM
[ https://issues.apache.org/jira/browse/SPARK-30429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009189#comment-17009189 ]

Maxim Gekk commented on SPARK-30429:

[~dongjoon] I ran git bisect. Let's see what it finds overnight.

> WideSchemaBenchmark fails with OOM
> --
>
> Key: SPARK-30429
> URL: https://issues.apache.org/jira/browse/SPARK-30429
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Major
> Attachments: WideSchemaBenchmark_console.txt
>
> Run WideSchemaBenchmark on the master (commit bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
> {code}
> SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
> {code}
> This fails with:
> {code}
> Caused by: java.lang.reflect.InvocationTargetException
> [error] at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
> [error] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> [error] at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> [error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
> [error] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
> [error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
> [error] at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> [error] ... 132 more
> [error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> [error] at java.util.Arrays.copyOfRange(Arrays.java:3664)
> [error] at java.lang.String.<init>(String.java:207)
> [error] at java.lang.StringBuilder.toString(StringBuilder.java:407)
> [error] at org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
> [error] at org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
> [error] at org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown Source)
> {code}
> Full stack dump is attached.
[jira] [Updated] (SPARK-30429) WideSchemaBenchmark fails with OOM
[ https://issues.apache.org/jira/browse/SPARK-30429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-30429:
---
Attachment: WideSchemaBenchmark_console.txt

> WideSchemaBenchmark fails with OOM
> --
>
> Key: SPARK-30429
> URL: https://issues.apache.org/jira/browse/SPARK-30429
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Major
> Attachments: WideSchemaBenchmark_console.txt
>
> Run WideSchemaBenchmark on the master (commit bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
> {code}
> SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
> {code}
> This fails with:
> {code}
> Caused by: java.lang.reflect.InvocationTargetException
> [error] at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
> [error] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> [error] at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> [error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
> [error] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
> [error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
> [error] at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
> [error] ... 132 more
> [error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> [error] at java.util.Arrays.copyOfRange(Arrays.java:3664)
> [error] at java.lang.String.<init>(String.java:207)
> [error] at java.lang.StringBuilder.toString(StringBuilder.java:407)
> [error] at org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
> [error] at org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
> [error] at org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown Source)
> {code}
> Full stack dump is attached.
[jira] [Created] (SPARK-30429) WideSchemaBenchmark fails with OOM
Maxim Gekk created SPARK-30429:
--
Summary: WideSchemaBenchmark fails with OOM
Key: SPARK-30429
URL: https://issues.apache.org/jira/browse/SPARK-30429
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk

Run WideSchemaBenchmark on the master (commit bc16bb1dd095c9e1c8deabf6ac0d528441a81d88) via:
{code}
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideSchemaBenchmark"
{code}
This fails with:
{code}
Caused by: java.lang.reflect.InvocationTargetException
[error] at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
[error] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[error] at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
[error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$7(TreeNode.scala:468)
[error] at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
[error] at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$makeCopy$1(TreeNode.scala:467)
[error] at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
[error] ... 132 more
[error] Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
[error] at java.util.Arrays.copyOfRange(Arrays.java:3664)
[error] at java.lang.String.<init>(String.java:207)
[error] at java.lang.StringBuilder.toString(StringBuilder.java:407)
[error] at org.apache.spark.sql.types.StructType.catalogString(StructType.scala:411)
[error] at org.apache.spark.sql.types.StructType.$anonfun$catalogString$1(StructType.scala:410)
[error] at org.apache.spark.sql.types.StructType$$Lambda$2441/1040526643.apply(Unknown Source)
{code}
Full stack dump is attached.
[jira] [Created] (SPARK-30416) Log a warning for deprecated SQL config in `set()` and `unset()`
Maxim Gekk created SPARK-30416:
--
Summary: Log a warning for deprecated SQL config in `set()` and `unset()`
Key: SPARK-30416
URL: https://issues.apache.org/jira/browse/SPARK-30416
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk

- Gather deprecated SQL configs and add extra info - when a config was deprecated and why
- Output a warning about the deprecated SQL config in set() and unset()
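A minimal, self-contained sketch of the proposed behavior, not Spark's actual implementation; the object, map, and message text below are illustrative assumptions.
{code:scala}
import org.slf4j.LoggerFactory

object RuntimeConfSketch {
  private val logger = LoggerFactory.getLogger(getClass)

  case class DeprecatedConfig(key: String, version: String, comment: String)

  // One illustrative entry; a real map would cover all deprecated keys.
  private val deprecated = Map(
    "spark.sql.parquet.int64AsTimestampMillis" ->
      DeprecatedConfig("spark.sql.parquet.int64AsTimestampMillis", "2.3",
        "Use spark.sql.parquet.outputTimestampType instead."))

  private def logDeprecationWarning(key: String): Unit =
    deprecated.get(key).foreach { c =>
      logger.warn(s"The SQL config '${c.key}' has been deprecated in Spark v${c.version} " +
        s"and may be removed in the future. ${c.comment}")
    }

  def set(key: String, value: String): Unit = {
    logDeprecationWarning(key) // warn before applying the setting
    // ... apply the setting ...
  }

  def unset(key: String): Unit = {
    logDeprecationWarning(key) // warn before removing the setting
    // ... remove the setting ...
  }
}
{code}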
[jira] [Created] (SPARK-30412) Eliminate warnings in Java tests regarding deprecated API
Maxim Gekk created SPARK-30412:
--
Summary: Eliminate warnings in Java tests regarding deprecated API
Key: SPARK-30412
URL: https://issues.apache.org/jira/browse/SPARK-30412
Project: Spark
Issue Type: Sub-task
Components: Java API, SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk

Suppress warnings about deprecated Spark API in Java test suites:
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetAggregatorSuite.java
Warning:Warning:line (32)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (91)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (100)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (109)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (118)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
{code}
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/Java8DatasetAggregatorSuite.java
Warning:Warning:line (28)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (37)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (46)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (55)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
Warning:Warning:line (64)java: org.apache.spark.sql.expressions.javalang.typed in org.apache.spark.sql.expressions.javalang has been deprecated
{code}
{code}
/Users/maxim/proj/eliminate-warnings-part2/sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java
Warning:Warning:line (478)java: json(org.apache.spark.api.java.JavaRDD<java.lang.String>) in org.apache.spark.sql.DataFrameReader has been deprecated
{code}
[jira] [Commented] (SPARK-30174) Eliminate warnings: part 4
[ https://issues.apache.org/jira/browse/SPARK-30174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006953#comment-17006953 ]

Maxim Gekk commented on SPARK-30174:

[~shivuson...@gmail.com] Are you still working on this? If so, could you describe in the ticket how you are going to fix the warnings, please.

> Eliminate warnings: part 4
> --
>
> Key: SPARK-30174
> URL: https://issues.apache.org/jira/browse/SPARK-30174
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: jobit mathew
> Priority: Minor
>
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala
> {code:java}
> Warning:Warning:line (127)value ENABLE_JOB_SUMMARY in class ParquetOutputFormat is deprecated: see corresponding Javadoc for more information.
> && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) {
> Warning:Warning:line (261)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information.
> new org.apache.parquet.hadoop.ParquetInputSplit(
> Warning:Warning:line (272)method readFooter in class ParquetFileReader is deprecated: see corresponding Javadoc for more information.
> ParquetFileReader.readFooter(sharedConf, filePath, SKIP_ROW_GROUPS).getFileMetaData
> Warning:Warning:line (442)method readFooter in class ParquetFileReader is deprecated: see corresponding Javadoc for more information.
> ParquetFileReader.readFooter(
> {code}
> sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala
> {code:java}
> Warning:Warning:line (91)value ENABLE_JOB_SUMMARY in class ParquetOutputFormat is deprecated: see corresponding Javadoc for more information.
> && conf.get(ParquetOutputFormat.ENABLE_JOB_SUMMARY) == null) {
> {code}
[jira] [Commented] (SPARK-30172) Eliminate warnings: part3
[ https://issues.apache.org/jira/browse/SPARK-30172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006952#comment-17006952 ]

Maxim Gekk commented on SPARK-30172:

[~Ankitraj] Are you still working on this?

> Eliminate warnings: part3
> -
>
> Key: SPARK-30172
> URL: https://issues.apache.org/jira/browse/SPARK-30172
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> /sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala
> Warning:Warning:line (422)method initialize in class AbstractSerDe is deprecated: see corresponding Javadoc for more information.
> serde.initialize(null, properties)
> /sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveUDFs.scala
> Warning:Warning:line (216)method initialize in class GenericUDTF is deprecated: see corresponding Javadoc for more information.
> protected lazy val outputInspector = function.initialize(inputInspectors.toArray)
> Warning:Warning:line (342)class UDAF in package exec is deprecated: see corresponding Javadoc for more information.
> new GenericUDAFBridge(funcWrapper.createFunction[UDAF]())
> Warning:Warning:line (503)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> def serialize(buffer: AggregationBuffer): Array[Byte] = {
> Warning:Warning:line (523)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> def deserialize(bytes: Array[Byte]): AggregationBuffer = {
> Warning:Warning:line (538)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean)
> Warning:Warning:line (538)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> case class HiveUDAFBuffer(buf: AggregationBuffer, canDoMerge: Boolean)
> /sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkOrcNewRecordReader.java
> Warning:Warning:line (44)java: getTypes() in org.apache.orc.Reader has been deprecated
> Warning:Warning:line (47)java: getTypes() in org.apache.orc.Reader has been deprecated
> /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
> Warning:Warning:line (2,368)method readFooter in class ParquetFileReader is deprecated: see corresponding Javadoc for more information.
> val footer = ParquetFileReader.readFooter(
> /sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDAFSuite.scala
> Warning:Warning:line (202)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def getNewAggregationBuffer: AggregationBuffer = new MockUDAFBuffer(0L, 0L)
> Warning:Warning:line (204)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def reset(agg: AggregationBuffer): Unit = {
> Warning:Warning:line (212)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def iterate(agg: AggregationBuffer, parameters: Array[AnyRef]): Unit = {
> Warning:Warning:line (221)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def merge(agg: AggregationBuffer, partial: Object): Unit = {
> Warning:Warning:line (231)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def terminatePartial(agg: AggregationBuffer): AnyRef = {
> Warning:Warning:line (236)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def terminate(agg: AggregationBuffer): AnyRef = terminatePartial(agg)
> Warning:Warning:line (257)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def getNewAggregationBuffer: AggregationBuffer = {
> Warning:Warning:line (266)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def reset(agg: AggregationBuffer): Unit = {
> Warning:Warning:line (277)trait AggregationBuffer in class GenericUDAFEvaluator is deprecated: see corresponding Javadoc for more information.
> override def iterate(agg: AggregationBuffer, parameters: Arr
[jira] [Commented] (SPARK-30171) Eliminate warnings: part2
[ https://issues.apache.org/jira/browse/SPARK-30171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006949#comment-17006949 ]

Maxim Gekk commented on SPARK-30171:

[~srowen] SPARK-30258 fixes the warnings in AvroFunctionsSuite.scala but not the ones about parsedOptions.ignoreExtension. I am not sure how we can avoid the warnings related to ignoreExtension.

> Eliminate warnings: part2
> -
>
> Key: SPARK-30171
> URL: https://issues.apache.org/jira/browse/SPARK-30171
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> AvroFunctionsSuite.scala
> Warning:Warning:line (41)method to_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead.
> val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b"))
> Warning:Warning:line (41)method to_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead.
> val avroDF = df.select(to_avro('id).as("a"), to_avro('str).as("b"))
> Warning:Warning:line (54)method from_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' instead.
> checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, avroTypeStr)), df)
> Warning:Warning:line (54)method from_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' instead.
> checkAnswer(avroDF.select(from_avro('a, avroTypeLong), from_avro('b, avroTypeStr)), df)
> Warning:Warning:line (59)method to_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead.
> val avroStructDF = df.select(to_avro('struct).as("avro"))
> Warning:Warning:line (70)method from_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' instead.
> checkAnswer(avroStructDF.select(from_avro('avro, avroTypeStruct)), df)
> Warning:Warning:line (76)method to_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead.
> val avroStructDF = df.select(to_avro('struct).as("avro"))
> Warning:Warning:line (118)method to_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.to_avro' instead.
> val readBackOne = dfOne.select(to_avro($"array").as("avro"))
> Warning:Warning:line (119)method from_avro in package avro is deprecated (since 3.0.0): Please use 'org.apache.spark.sql.avro.functions.from_avro' instead.
> .select(from_avro($"avro", avroTypeArrStruct).as("array"))
> AvroPartitionReaderFactory.scala
> Warning:Warning:line (64)value ignoreExtension in class AvroOptions is deprecated (since 3.0): Use the general data source option pathGlobFilter for filtering file names
> if (parsedOptions.ignoreExtension || partitionedFile.filePath.endsWith(".avro")) {
> AvroFileFormat.scala
> Warning:Warning:line (98)value ignoreExtension in class AvroOptions is deprecated (since 3.0): Use the general data source option pathGlobFilter for filtering file names
> if (parsedOptions.ignoreExtension || file.filePath.endsWith(".avro")) {
> AvroUtils.scala
> Warning:Warning:line (55)value ignoreExtension in class AvroOptions is deprecated (since 3.0): Use the general data source option pathGlobFilter for filtering file names
> inferAvroSchemaFromFiles(files, conf, parsedOptions.ignoreExtension,
[jira] [Created] (SPARK-30409) Use `NoOp` datasource in SQL benchmarks
Maxim Gekk created SPARK-30409:
--
Summary: Use `NoOp` datasource in SQL benchmarks
Key: SPARK-30409
URL: https://issues.apache.org/jira/browse/SPARK-30409
Project: Spark
Issue Type: Test
Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk

Currently, SQL benchmarks use the `count()`, `collect()` and `foreach(_ => ())` actions. These actions have additional overhead. For example, `collect()` converts column values to external types and pulls the data onto the driver. The benchmarks should be switched to the `NoOp` datasource, except those that specifically measure `count()` or `collect()`.
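A short sketch of the intended pattern: writing to the built-in "noop" sink materializes every row without the conversion and driver-side collection overhead of collect(). The query below is illustrative.
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("noop-benchmark").getOrCreate()

// Rows are fully computed but discarded by the sink, so the benchmark
// measures query execution rather than result collection.
spark.range(0, 10000000L)
  .selectExpr("id", "id * 2 AS doubled")
  .write
  .format("noop")
  .mode("overwrite")
  .save()
{code}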
[jira] [Commented] (SPARK-30401) Call requireNonStaticConf() only once
[ https://issues.apache.org/jira/browse/SPARK-30401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17006392#comment-17006392 ]

Maxim Gekk commented on SPARK-30401:

I am working on it

> Call requireNonStaticConf() only once
> -
>
> Key: SPARK-30401
> URL: https://issues.apache.org/jira/browse/SPARK-30401
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Maxim Gekk
> Priority: Trivial
>
> The RuntimeConfig.requireNonStaticConf() method can be called 2 times for the same input:
> 1. Inside of set(key, true)
> 2. set() converts the second argument to a string and calls set(key, "true"), where requireNonStaticConf() is invoked one more time
[jira] [Created] (SPARK-30401) Call requireNonStaticConf() only once
Maxim Gekk created SPARK-30401:
--
Summary: Call requireNonStaticConf() only once
Key: SPARK-30401
URL: https://issues.apache.org/jira/browse/SPARK-30401
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.4
Reporter: Maxim Gekk

The RuntimeConfig.requireNonStaticConf() method can be called 2 times for the same input:
1. Inside of set(key, true)
2. set() converts the second argument to a string and calls set(key, "true"), where requireNonStaticConf() is invoked one more time
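A minimal sketch (not Spark's exact code) of the double-check pattern described above: the typed overload validates the key, then delegates to the string overload, which validates it again. The names below mirror the ticket but the bodies are illustrative.
{code:scala}
class RuntimeConfigSketch {
  def set(key: String, value: Boolean): Unit = {
    requireNonStaticConf(key)  // first check
    set(key, value.toString)   // delegates to the String overload...
  }

  def set(key: String, value: String): Unit = {
    requireNonStaticConf(key)  // ...which checks the same key a second time
    // ... store the value ...
  }

  private def requireNonStaticConf(key: String): Unit = {
    // Illustrative predicate standing in for the real static-config check.
    require(!key.startsWith("spark.sql.static."), s"Cannot modify static config: $key")
  }
}
{code}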
[jira] [Created] (SPARK-30323) Support filters pushdown in CSV datasource
Maxim Gekk created SPARK-30323:
--
Summary: Support filters pushdown in CSV datasource
Key: SPARK-30323
URL: https://issues.apache.org/jira/browse/SPARK-30323
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk

- Implement the `SupportsPushDownFilters` interface in `CSVScanBuilder`
- Apply the filters in UnivocityParser
- Change the UnivocityParser API - return Seq[InternalRow] from `convert()`
- Update CSVBenchmark
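For context, a hedged sketch of the DSv2 hook named in the first item: a ScanBuilder that records the pushed filters. The class body is illustrative, not Spark's CSVScanBuilder, which would hand the accepted filters to UnivocityParser.
{code:scala}
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

// Toy ScanBuilder showing the SupportsPushDownFilters contract.
class CsvScanBuilderSketch(schema: StructType) extends ScanBuilder with SupportsPushDownFilters {
  private var filters: Array[Filter] = Array.empty

  override def pushFilters(pushed: Array[Filter]): Array[Filter] = {
    filters = pushed
    // Return the filters Spark must still evaluate itself; returning all of
    // them stays correct even when the source also applies them during parsing.
    pushed
  }

  override def pushedFilters(): Array[Filter] = filters

  override def build(): Scan = new Scan {
    override def readSchema(): StructType = schema
  }
}
{code}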
[jira] [Created] (SPARK-30309) Mark `Filter` as a `sealed` class
Maxim Gekk created SPARK-30309:
--
Summary: Mark `Filter` as a `sealed` class
Key: SPARK-30309
URL: https://issues.apache.org/jira/browse/SPARK-30309
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk

Add the `sealed` keyword to the `Filter` class in the `org.apache.spark.sql.sources` package. So, the compiler should output a warning if handling of a filter is missed in a datasource:
{code}
Warning:(154, 65) match may not be exhaustive.
It would fail on the following inputs: AlwaysFalse(), AlwaysTrue()
def translate(filter: sources.Filter): Option[Expression] = filter match {
{code}
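To make the benefit concrete, a self-contained toy version of the hierarchy (not Spark's actual Filter classes) showing how `sealed` turns a missed case into the compile-time warning quoted above.
{code:scala}
// Toy stand-in for org.apache.spark.sql.sources.Filter.
sealed abstract class Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class AlwaysTrue() extends Filter
case class AlwaysFalse() extends Filter

object TranslateSketch {
  // AlwaysTrue()/AlwaysFalse() cases are missing: with `sealed`, compiling
  // this emits "match may not be exhaustive" instead of leaving a silent
  // runtime MatchError in the datasource.
  def translate(filter: Filter): Option[String] = filter match {
    case EqualTo(a, v) => Some(s"$a = $v")
  }
}
{code}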
[jira] [Updated] (SPARK-30258) Eliminate warnings of deprecated Spark APIs in tests
[ https://issues.apache.org/jira/browse/SPARK-30258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-30258:
---
Summary: Eliminate warnings of deprecated Spark APIs in tests (was: Eliminate warnings of depracted Spark APIs in tests)

> Eliminate warnings of deprecated Spark APIs in tests
>
>
> Key: SPARK-30258
> URL: https://issues.apache.org/jira/browse/SPARK-30258
> Project: Spark
> Issue Type: Sub-task
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> Suppress deprecation warnings in tests that check deprecated Spark APIs.
[jira] [Created] (SPARK-30258) Eliminate warnings of depracted Spark APIs in tests
Maxim Gekk created SPARK-30258:
--
Summary: Eliminate warnings of depracted Spark APIs in tests
Key: SPARK-30258
URL: https://issues.apache.org/jira/browse/SPARK-30258
Project: Spark
Issue Type: Sub-task
Components: Tests
Affects Versions: 3.0.0
Reporter: Maxim Gekk

Suppress deprecation warnings in tests that check deprecated Spark APIs.
[jira] [Commented] (SPARK-30168) Eliminate warnings in Parquet datasource
[ https://issues.apache.org/jira/browse/SPARK-30168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995754#comment-16995754 ]

Maxim Gekk commented on SPARK-30168:

[~Ankitraj] Go ahead.

> Eliminate warnings in Parquet datasource
>
>
> Key: SPARK-30168
> URL: https://issues.apache.org/jira/browse/SPARK-30168
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> # sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala
> {code}
> Warning:Warning:line (120)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information.
> Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] = {
> Warning:Warning:line (125)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information.
> new org.apache.parquet.hadoop.ParquetInputSplit(
> Warning:Warning:line (134)method readFooter in class ParquetFileReader is deprecated: see corresponding Javadoc for more information.
> ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData
> Warning:Warning:line (183)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information.
> split: ParquetInputSplit,
> Warning:Warning:line (212)class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information.
> split: ParquetInputSplit,
> {code}
> # sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java
> {code}
> Warning:Warning:line (55)java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated
> Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated
> Warning:Warning:line (95)java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated
> Warning:Warning:line (97)java: getRowGroupOffsets() in org.apache.parquet.hadoop.ParquetInputSplit has been deprecated
> Warning:Warning:line (105)java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (108)java: filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType) in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated
> Warning:Warning:line (111)java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (147)java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (203)java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> Warning:Warning:line (226)java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated
> {code}
> # sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala
> # sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala
> # sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala
> # sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala
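For illustration, a hedged sketch of the usual replacement for the deprecated static ParquetFileReader.readFooter(): open the file through HadoopInputFile and ask the reader for its footer. The path is made up; this is the common parquet-mr pattern, not necessarily the exact change Spark made.
{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

val conf = new Configuration()
val path = new Path("/tmp/example.parquet") // illustrative path

// Non-deprecated footer access: open via InputFile, then read the footer.
val reader = ParquetFileReader.open(HadoopInputFile.fromPath(path, conf))
try {
  val footer = reader.getFooter
  println(footer.getFileMetaData.getSchema)
} finally {
  reader.close()
}
{code}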
[jira] [Commented] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994806#comment-16994806 ]

Maxim Gekk commented on SPARK-30165:

> Are you sure on these?

I am almost sure we can fix the Parquet- and Kafka-related warnings. I am not sure about the warnings coming from deprecated Spark API. Maybe it is possible to suppress such warnings in tests; in any case, we know in advance that we test deprecated API, so such warnings don't guard us from mistakes. I quickly googled and found this: [https://github.com/scala/bug/issues/7934#issuecomment-292425679]. Maybe we can use that approach in tests.

> Eliminate compilation warnings
> --
>
> Key: SPARK-30165
> URL: https://issues.apache.org/jira/browse/SPARK-30165
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Minor
> Attachments: spark_warnings.txt
>
> This is an umbrella ticket for sub-tasks for eliminating compilation warnings. I dumped all warnings to the spark_warnings.txt file attached to the ticket.
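A sketch of the workaround referenced in the comment (scala/bug#7934): the Scala compiler does not report deprecation warnings for calls made inside a @deprecated definition, so a test helper can itself be marked deprecated to silence them. The names below are hypothetical.
{code:scala}
object LegacyApi {
  @deprecated("Use newCount() instead", "1.0")
  def oldCount(): Int = 42
}

// Marking the wrapper deprecated suppresses warnings for the calls inside it.
@deprecated("Wrapper that intentionally exercises deprecated APIs in tests", "")
object DeprecatedApiCalls {
  def oldCount(): Int = LegacyApi.oldCount() // no deprecation warning emitted here
}
{code}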
[jira] [Updated] (SPARK-30170) Eliminate warnings: part 1
[ https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maxim Gekk updated SPARK-30170:
---

Description:
Eliminate compilation warnings in:
# StopWordsRemoverSuite
{code:java}
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
{code}
# MLTest.scala
{code:java}
Warning:Warning:line (88)match may not be exhaustive.
It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
val n = Attribute.fromStructField(dataframe.schema(colName)) match {
{code}
# FloatType.scala
{code:java}
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue
{code}
# AnalysisExternalCatalogSuite.scala
{code:java}
Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is deprecated: see corresponding Javadoc for more information.
verifyZeroInteractions(catalog)
{code}
# CSVExprUtilsSuite.scala
{code:java}
Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 instead.
("\0", Some("\u0000"), None)
{code}
# CollectionExpressionsSuite.scala, HashExpressionsSuite.scala, ExpressionParserSuite.scala
{code:java}
Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be enabled by making the implicit value scala.language.implicitConversions visible. This can be achieved by adding the import clause 'import scala.language.implicitConversions' or by setting the compiler option -language:implicitConversions. See the Scaladoc for value scala.language.implicitConversions for a discussion why the feature should be explicitly enabled.
{code}
[jira] [Commented] (SPARK-30170) Eliminate warnings: part 1
[ https://issues.apache.org/jira/browse/SPARK-30170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990989#comment-16990989 ]

Maxim Gekk commented on SPARK-30170:

I am working on this

> Eliminate warnings: part 1
> --
>
> Key: SPARK-30170
> URL: https://issues.apache.org/jira/browse/SPARK-30170
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Maxim Gekk
> Priority: Minor
>
> Eliminate compilation warnings in:
> # StopWordsRemoverSuite
> {code}
> Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
> case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
> {code}
> # MLTest.scala
> {code}
> Warning:Warning:line (88)match may not be exhaustive.
> It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
> val n = Attribute.fromStructField(dataframe.schema(colName)) match {
> {code}
> # FloatType.scala
> {code}
> Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
> def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue
> Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
> def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue
> Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
> def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue
> Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
> def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue
> {code}
> # AnalysisExternalCatalogSuite.scala
> {code}
> Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is deprecated: see corresponding Javadoc for more information.
> verifyZeroInteractions(catalog)
> {code}
> # CSVExprUtilsSuite.scala
> {code}
> Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 instead.
> ("\0", Some("\u0000"), None)
> {code}
[jira] [Created] (SPARK-30170) Eliminate warnings: part 1
Maxim Gekk created SPARK-30170:
--
Summary: Eliminate warnings: part 1
Key: SPARK-30170
URL: https://issues.apache.org/jira/browse/SPARK-30170
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk

Eliminate compilation warnings in:
# StopWordsRemoverSuite
{code}
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (245)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
Warning:Warning:line (271)non-variable type argument String in type pattern Seq[String] (the underlying of Seq[String]) is unchecked since it is eliminated by erasure
case Row(r1: Seq[String], e1: Seq[String], r2: Seq[String], e2: Seq[String]) =>
{code}
# MLTest.scala
{code}
Warning:Warning:line (88)match may not be exhaustive.
It would fail on the following inputs: NumericAttribute(), UnresolvedAttribute
val n = Attribute.fromStructField(dataframe.schema(colName)) match {
{code}
# FloatType.scala
{code}
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue
Warning:Warning:line (81)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def quot(x: Float, y: Float): Float = (BigDecimal(x) quot BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue
Warning:Warning:line (82)method apply in object BigDecimal is deprecated (since 2.11.0): The default conversion from Float may not do what you want. Use BigDecimal.decimal for a String representation, or explicitly convert the Float with .toDouble.
def rem(x: Float, y: Float): Float = (BigDecimal(x) remainder BigDecimal(y)).floatValue
{code}
# AnalysisExternalCatalogSuite.scala
{code}
Warning:Warning:line (62)method verifyZeroInteractions in class Mockito is deprecated: see corresponding Javadoc for more information.
verifyZeroInteractions(catalog)
{code}
# CSVExprUtilsSuite.scala
{code}
Warning:Warning:line (81)Octal escape literals are deprecated, use \u0000 instead.
("\0", Some("\u0000"), None)
{code}
# CollectionExpressionsSuite.scala, HashExpressionsSuite.scala, ExpressionParserSuite.scala
{code}
Warning:Warning:line (39)implicit conversion method stringToUTF8Str should be enabled by making the implicit value scala.language.implicitConversions visible. This can be achieved by adding the import clause 'import scala.language.implicitConversions' or by setting the compiler option -language:implicitConversions. See the
[jira] [Created] (SPARK-30169) Eliminate warnings in Kafka connector
Maxim Gekk created SPARK-30169: -- Summary: Eliminate warnings in Kafka connector Key: SPARK-30169 URL: https://issues.apache.org/jira/browse/SPARK-30169 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Eliminate compilation warnings in the files: {code} external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/ConsumerStrategy.scala external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/DirectKafkaInputDStream.scala external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/DirectKafkaStreamSuite.scala external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaTestUtils.scala external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaOffsetReader.scala external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30166) Eliminate warnings in JSONOptions
[ https://issues.apache.org/jira/browse/SPARK-30166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30166: --- Summary: Eliminate warnings in JSONOptions (was: Eliminate compilation warnings in JSONOptions) > Eliminate warnings in JSONOptions > - > > Key: SPARK-30166 > URL: https://issues.apache.org/jira/browse/SPARK-30166 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Scala 2.12 outputs the following warnings for JSONOptions: > {code} > sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala > Warning: line (137) Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java > enum Feature is deprecated: see corresponding Javadoc for more information. > factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, > allowNumericLeadingZeros) > Warning: line (138) Java enum ALLOW_NON_NUMERIC_NUMBERS in Java > enum Feature is deprecated: see corresponding Javadoc for more information. > factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, > allowNonNumericNumbers) > Warning: line (139) Java enum > ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: > see corresponding Javadoc for more information. > > factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, > Warning: line (141) Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java > enum Feature is deprecated: see corresponding Javadoc for more information. > factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, > allowUnquotedControlChars) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
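Jackson 2.10 offers non-deprecated JsonReadFeature counterparts for these enums, so one plausible fix is to build the factory through JsonFactoryBuilder; this is a sketch under that assumption, not necessarily the patch that lands:
{code}
import com.fasterxml.jackson.core.JsonFactoryBuilder
import com.fasterxml.jackson.core.json.JsonReadFeature

// Stand-ins for the JSONOptions flags named in the warnings above.
val (leadingZeros, nonNumeric, backslashAny, unquotedCtrl) = (true, true, true, true)

// Configures the same behaviors without the deprecated JsonParser.Feature enums.
val factory = new JsonFactoryBuilder()
  .configure(JsonReadFeature.ALLOW_LEADING_ZEROS_FOR_NUMBERS, leadingZeros)
  .configure(JsonReadFeature.ALLOW_NON_NUMERIC_NUMBERS, nonNumeric)
  .configure(JsonReadFeature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, backslashAny)
  .configure(JsonReadFeature.ALLOW_UNESCAPED_CONTROL_CHARS, unquotedCtrl)
  .build()
{code}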
[jira] [Created] (SPARK-30168) Eliminate warnings in Parquet datasource
Maxim Gekk created SPARK-30168: -- Summary: Eliminate warnings in Parquet datasource Key: SPARK-30168 URL: https://issues.apache.org/jira/browse/SPARK-30168 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk # sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala {code} Warning: line (120) class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. Option[TimeZone]) => RecordReader[Void, T]): RecordReader[Void, T] = { Warning: line (125) class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. new org.apache.parquet.hadoop.ParquetInputSplit( Warning: line (134) method readFooter in class ParquetFileReader is deprecated: see corresponding Javadoc for more information. ParquetFileReader.readFooter(conf, filePath, SKIP_ROW_GROUPS).getFileMetaData Warning: line (183) class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. split: ParquetInputSplit, Warning: line (212) class ParquetInputSplit in package hadoop is deprecated: see corresponding Javadoc for more information. split: ParquetInputSplit, {code} # sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java {code} Warning: line (55) java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated Warning: line (95) java: org.apache.parquet.hadoop.ParquetInputSplit in org.apache.parquet.hadoop has been deprecated (reported twice) Warning: line (97) java: getRowGroupOffsets() in org.apache.parquet.hadoop.ParquetInputSplit has been deprecated Warning: line (105) java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning: line (108) java: filterRowGroups(org.apache.parquet.filter2.compat.FilterCompat.Filter,java.util.List,org.apache.parquet.schema.MessageType) in org.apache.parquet.filter2.compat.RowGroupFilter has been deprecated Warning: line (111) java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning: line (147) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning: line (203) java: readFooter(org.apache.hadoop.conf.Configuration,org.apache.hadoop.fs.Path,org.apache.parquet.format.converter.ParquetMetadataConverter.MetadataFilter) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated Warning: line (226) java: ParquetFileReader(org.apache.hadoop.conf.Configuration,org.apache.parquet.hadoop.metadata.FileMetaData,org.apache.hadoop.fs.Path,java.util.List,java.util.List) in org.apache.parquet.hadoop.ParquetFileReader has been deprecated {code} # sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCompatibilityTest.scala #
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetInteroperabilitySuite.scala # sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetTest.scala # sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveDDLSuite.scala -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
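For the readFooter deprecations, parquet-mr 1.10+ exposes an InputFile-based reader; a hedged sketch of the replacement (assuming that API version is available; the actual patch may differ):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.ParquetReadOptions
import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS
import org.apache.parquet.hadoop.ParquetFileReader
import org.apache.parquet.hadoop.util.HadoopInputFile

// Reads the footer metadata without the deprecated
// ParquetFileReader.readFooter(conf, path, filter) overload.
def footerMetaData(conf: Configuration, filePath: Path) = {
  val options = ParquetReadOptions.builder().withMetadataFilter(SKIP_ROW_GROUPS).build()
  val reader = ParquetFileReader.open(HadoopInputFile.fromPath(filePath, conf), options)
  try reader.getFooter.getFileMetaData finally reader.close()
}
{code}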
[jira] [Commented] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990925#comment-16990925 ] Maxim Gekk commented on SPARK-30165: [~aman_omer] Feel free to take a sub-set of warnings and create a sub-task to fix them. > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. I dumped all warnings to the spark_warnings.txt file attached to > the ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30165: --- Component/s: (was: Build) SQL > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. I dumped all warnings to the spark_warnings.txt file attached to > the ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30166) Eliminate compilation warnings in JSONOptions
Maxim Gekk created SPARK-30166: -- Summary: Eliminate compilation warnings in JSONOptions Key: SPARK-30166 URL: https://issues.apache.org/jira/browse/SPARK-30166 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Scala 2.12 outputs the following warnings for JSONOptions: {code} sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JSONOptions.scala Warning: line (137) Java enum ALLOW_NUMERIC_LEADING_ZEROS in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_NUMERIC_LEADING_ZEROS, allowNumericLeadingZeros) Warning: line (138) Java enum ALLOW_NON_NUMERIC_NUMBERS in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_NON_NUMERIC_NUMBERS, allowNonNumericNumbers) Warning: line (139) Java enum ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_BACKSLASH_ESCAPING_ANY_CHARACTER, Warning: line (141) Java enum ALLOW_UNQUOTED_CONTROL_CHARS in Java enum Feature is deprecated: see corresponding Javadoc for more information. factory.configure(JsonParser.Feature.ALLOW_UNQUOTED_CONTROL_CHARS, allowUnquotedControlChars) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30165: --- Description: This is an umbrella ticket for sub-tasks for eliminating compilation warnings. I dumped all warnings to the spark_warnings.txt file attached to the ticket. (was: This is an umbrella ticket for sub-tasks for eliminating compilation warnings. ) > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. I dumped all warnings to the spark_warnings.txt file attached to > the ticket. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30165) Eliminate compilation warnings
[ https://issues.apache.org/jira/browse/SPARK-30165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-30165: --- Attachment: spark_warnings.txt > Eliminate compilation warnings > -- > > Key: SPARK-30165 > URL: https://issues.apache.org/jira/browse/SPARK-30165 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: spark_warnings.txt > > > This is an umbrella ticket for sub-tasks for eliminating compilation > warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30165) Eliminate compilation warnings
Maxim Gekk created SPARK-30165: -- Summary: Eliminate compilation warnings Key: SPARK-30165 URL: https://issues.apache.org/jira/browse/SPARK-30165 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 3.0.0 Reporter: Maxim Gekk This is an umbrella ticket for sub-tasks for eliminating compilation warnings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29963) Check formatting timestamps up to microsecond precision by JSON/CSV datasource
Maxim Gekk created SPARK-29963: -- Summary: Check formatting timestamps up to microsecond precision by JSON/CSV datasource Key: SPARK-29963 URL: https://issues.apache.org/jira/browse/SPARK-29963 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Port tests added for 2.4 by the commit: https://github.com/apache/spark/commit/47cb1f359af62383e24198dbbaa0b4503348cd04 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29949) JSON/CSV formats timestamps incorrectly
Maxim Gekk created SPARK-29949: -- Summary: JSON/CSV formats timestamps incorrectly Key: SPARK-29949 URL: https://issues.apache.org/jira/browse/SPARK-29949 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk For example:
{code}
scala> val t = java.sql.Timestamp.valueOf("2019-11-18 11:56:00.123456")
t: java.sql.Timestamp = 2019-11-18 11:56:00.123456

scala> Seq(t).toDF("t").select(to_json(struct($"t"), Map("timestampFormat" -> "yyyy-MM-dd HH:mm:ss.SSSSSS"))).show(false)
+--------------------------------------------------+
|structstojson(named_struct(NamePlaceholder(), t)) |
+--------------------------------------------------+
|{"t":"2019-11-18 11:56:00.000123"}                |
+--------------------------------------------------+
{code}
The expected output is {"t":"2019-11-18 11:56:00.123456"}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
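The ".000123" suffix suggests the formatter holds only milliseconds (123) and zero-pads them to the six-digit pattern. For comparison, java.time keeps the full fraction; this snippet illustrates the contrast and is not the Spark fix:
{code}
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Round-trips the microseconds: prints "2019-11-18 11:56:00.123456".
val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSS")
val ts = LocalDateTime.parse("2019-11-18 11:56:00.123456", fmt)
println(ts.format(fmt))
{code}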
[jira] [Comment Edited] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106 ] Maxim Gekk edited comment on SPARK-29758 at 11/17/19 6:17 PM: -- Another solution is to disable this optimization: [https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478] was (Author: maxgekk): Another solution is to remove this optimization: https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478 > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200
*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {
[jira] [Commented] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976106#comment-16976106 ] Maxim Gekk commented on SPARK-29758: Another solution is to remove this optimization: https://github.com/apache/spark/blob/v2.4.4/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala#L475-L478 > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*245
0*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {code} > val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", > "test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > scala> json_tuple_result_with_prefix > res64: Int = 27
[jira] [Commented] (SPARK-29575) from_json can produce nulls for fields which are marked as non-nullable
[ https://issues.apache.org/jira/browse/SPARK-29575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976102#comment-16976102 ] Maxim Gekk commented on SPARK-29575: This is intentional behavior. User's schema is forcibly set as nullable. See SPARK-23173 > from_json can produce nulls for fields which are marked as non-nullable > --- > > Key: SPARK-29575 > URL: https://issues.apache.org/jira/browse/SPARK-29575 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.4 >Reporter: Victor Lopez >Priority: Major > > I believe this issue was resolved elsewhere > (https://issues.apache.org/jira/browse/SPARK-23173), though for Pyspark this > bug seems to still be there. > The issue appears when using {{from_json}} to parse a column in a Spark > dataframe. It seems like {{from_json}} ignores whether the schema provided > has any {{nullable:False}} property. > {code:java} > schema = T.StructType().add(T.StructField('id', T.LongType(), > nullable=False)).add(T.StructField('name', T.StringType(), nullable=False)) > data = [{'user': str({'name': 'joe', 'id':1})}, {'user': str({'name': > 'jane'})}] > df = spark.read.json(sc.parallelize(data)) > df.withColumn("details", F.from_json("user", > schema)).select("details.*").show() > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
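A Scala rendering of the same observation (a minimal sketch assuming a SparkSession with spark.implicits._ in scope): nullable = false is silently relaxed, so the record without "id" still parses to a null field rather than failing:
{code}
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val schema = new StructType()
  .add("id", LongType, nullable = false)
  .add("name", StringType, nullable = false)

// The second record has no "id"; from_json returns null for it
// because Spark forces the user schema to nullable (SPARK-23173).
val df = Seq("""{"name": "joe", "id": 1}""", """{"name": "jane"}""").toDF("user")
df.select(from_json(col("user"), schema).as("details")).select("details.*").show()
{code}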
[jira] [Commented] (SPARK-29758) json_tuple truncates fields
[ https://issues.apache.org/jira/browse/SPARK-29758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16976099#comment-16976099 ] Maxim Gekk commented on SPARK-29758: I have reproduced the issue on 2.4. The problem is in Jackson core 2.6.7. It was fixed by https://github.com/FasterXML/jackson-core/commit/554f8db0f940b2a53f974852a2af194739d65200#diff-7990edc67621822770cdc62e12d933d4R647-R650 in the version 2.7.7. We could try to back port this https://github.com/apache/spark/pull/21596 on 2.4. [~hyukjin.kwon] WDYT? > json_tuple truncates fields > --- > > Key: SPARK-29758 > URL: https://issues.apache.org/jira/browse/SPARK-29758 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.4 > Environment: EMR 5.15.0 (Spark 2.3.0) And MacBook Pro (Mojave > 10.14.3, Spark 2.4.4) > Jdk 8, Scala 2.11.12 >Reporter: Stanislav >Priority: Major > > `json_tuple` has inconsistent behaviour with `from_json` - but only if json > string is longer than 2700 characters or so. > This can be reproduced in spark-shell and on cluster, but not in scalatest, > for some reason. > {code} > import org.apache.spark.sql.functions.{from_json, json_tuple} > import org.apache.spark.sql.types._ > val counterstring = > "*3*5*7*9*12*15*18*21*24*27*30*33*36*39*42*45*48*51*54*57*60*63*66*69*72*75*78*81*84*87*90*93*96*99*103*107*111*115*119*123*127*131*135*139*143*147*151*155*159*163*167*171*175*179*183*187*191*195*199*203*207*211*215*219*223*227*231*235*239*243*247*251*255*259*263*267*271*275*279*283*287*291*295*299*303*307*311*315*319*323*327*331*335*339*343*347*351*355*359*363*367*371*375*379*383*387*391*395*399*403*407*411*415*419*423*427*431*435*439*443*447*451*455*459*463*467*471*475*479*483*487*491*495*499*503*507*511*515*519*523*527*531*535*539*543*547*551*555*559*563*567*571*575*579*583*587*591*595*599*603*607*611*615*619*623*627*631*635*639*643*647*651*655*659*663*667*671*675*679*683*687*691*695*699*703*707*711*715*719*723*727*731*735*739*743*747*751*755*759*763*767*771*775*779*783*787*791*795*799*803*807*811*815*819*823*827*831*835*839*843*847*851*855*859*863*867*871*875*879*883*887*891*895*899*903*907*911*915*919*923*927*931*935*939*943*947*951*955*959*963*967*971*975*979*983*987*991*995*1000*1005*1010*1015*1020*1025*1030*1035*1040*1045*1050*1055*1060*1065*1070*1075*1080*1085*1090*1095*1100*1105*1110*1115*1120*1125*1130*1135*1140*1145*1150*1155*1160*1165*1170*1175*1180*1185*1190*1195*1200*1205*1210*1215*1220*1225*1230*1235*1240*1245*1250*1255*1260*1265*1270*1275*1280*1285*1290*1295*1300*1305*1310*1315*1320*1325*1330*1335*1340*1345*1350*1355*1360*1365*1370*1375*1380*1385*1390*1395*1400*1405*1410*1415*1420*1425*1430*1435*1440*1445*1450*1455*1460*1465*1470*1475*1480*1485*1490*1495*1500*1505*1510*1515*1520*1525*1530*1535*1540*1545*1550*1555*1560*1565*1570*1575*1580*1585*1590*1595*1600*1605*1610*1615*1620*1625*1630*1635*1640*1645*1650*1655*1660*1665*1670*1675*1680*1685*1690*1695*1700*1705*1710*1715*1720*1725*1730*1735*1740*1745*1750*1755*1760*1765*1770*1775*1780*1785*1790*1795*1800*1805*1810*1815*1820*1825*1830*1835*1840*1845*1850*1855*1860*1865*1870*1875*1880*1885*1890*1895*1900*1905*1910*1915*1920*1925*1930*1935*1940*1945*1950*1955*1960*1965*1970*1975*1980*1985*1990*1995*2000*2005*2010*2015*2020*2025*2030*2035*2040*2045*2050*2055*2060*2065*2070*2075*2080*2085*2090*2095*2100*2105*2110*2115*2120*2125*2130*2135*2140*2145*2150*2155*2160*2165*2170*2175*2180*2185*2190*2195*2200*2205*2210*2215*2220*2225*2230*2235*2240*2245*2250*2255*2260*2265*2270*2275*2280*2285*2290*
2295*2300*2305*2310*2315*2320*2325*2330*2335*2340*2345*2350*2355*2360*2365*2370*2375*2380*2385*2390*2395*2400*2405*2410*2415*2420*2425*2430*2435*2440*2445*2450*2455*2460*2465*2470*2475*2480*2485*2490*2495*2500*2505*2510*2515*2520*2525*2530*2535*2540*2545*2550*2555*2560*2565*2570*2575*2580*2585*2590*2595*2600*2605*2610*2615*2620*2625*2630*2635*2640*2645*2650*2655*2660*2665*2670*2675*2680*2685*2690*2695*2700*2705*2710*2715*2720*2725*2730*2735*2740*2745*2750*2755*2760*2765*2770*2775*2780*2785*2790*2795*2800*" > val json_tuple_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("result", json_tuple('json, "test")) > .select('result) > .as[String].head.length > val from_json_result = Seq(s"""{"test":"$counterstring"}""").toDF("json") > .withColumn("parsed", from_json('json, StructType(Seq(StructField("test", > StringType) > .withColumn("result", $"parsed.test") > .select('result) > .as[String].head.length > scala> json_tuple_result > res62: Int = 2791 > scala> from_json_result > res63: Int = 2800 > {code} > Result is influenced by the total length of the json string at the moment of > parsing: > {code} > val json_tuple_result_with_prefix = Seq(s"""{"prefix": "dummy", > "test":"$counterstring"}""").toDF("jso
[jira] [Updated] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings
[ https://issues.apache.org/jira/browse/SPARK-29933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29933: --- Attachment: filter_tests.patch > ThriftServerQueryTestSuite runs tests with wrong settings > - > > Key: SPARK-29933 > URL: https://issues.apache.org/jira/browse/SPARK-29933 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > Attachments: filter_tests.patch > > > ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect but it > keeps settings from previous runs. In fact, it runs `ansi/interval.sql` in > the PostgreSQL dialect. See > https://github.com/apache/spark/pull/26473#issuecomment-554510643 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29933) ThriftServerQueryTestSuite runs tests with wrong settings
Maxim Gekk created SPARK-29933: -- Summary: ThriftServerQueryTestSuite runs tests with wrong settings Key: SPARK-29933 URL: https://issues.apache.org/jira/browse/SPARK-29933 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk ThriftServerQueryTestSuite must run ANSI tests in the Spark dialect but it keeps settings from previous runs. In fact, it runs `ansi/interval.sql` in the PostgreSQL dialect. See https://github.com/apache/spark/pull/26473#issuecomment-554510643 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
[ https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975944#comment-16975944 ] Maxim Gekk commented on SPARK-29931: > It's conceivable there could be a reason to do it later, or sooner. Later is not a problem, but what about sooner? Most of the configs were added for Spark 3.0. If you decide to remove one of them in a minor release between 3.0 and 4.0, you can break user apps, which I believe is unacceptable for minor releases. > Declare all SQL legacy configs as will be removed in Spark 4.0 > -- > > Key: SPARK-29931 > URL: https://issues.apache.org/jira/browse/SPARK-29931 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Add the sentence to descriptions of all legacy SQL configs that existed before > Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of > such configs: > * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName > * spark.sql.legacy.literal.pickMinimumPrecision > * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation > * spark.sql.legacy.sizeOfNull > * spark.sql.legacy.replaceDatabricksSparkAvro.enabled > * spark.sql.legacy.setopsPrecedence.enabled > * spark.sql.legacy.integralDivide.returnBigint > * spark.sql.legacy.bucketedTableScan.outputOrdering > * spark.sql.legacy.parser.havingWithoutGroupByAsWhere > * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue > * spark.sql.legacy.setCommandRejectsSparkCoreConfs > * spark.sql.legacy.utcTimestampFunc.enabled > * spark.sql.legacy.typeCoercion.datetimeToString > * spark.sql.legacy.looseUpcast > * spark.sql.legacy.ctePrecedence.enabled > * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
[ https://issues.apache.org/jira/browse/SPARK-29931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975813#comment-16975813 ] Maxim Gekk commented on SPARK-29931: [~rxin] [~lixiao] [~srowen] [~dongjoon] [~cloud_fan] [~hyukjin.kwon] Does this make sense for you? > Declare all SQL legacy configs as will be removed in Spark 4.0 > -- > > Key: SPARK-29931 > URL: https://issues.apache.org/jira/browse/SPARK-29931 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > Add the sentence to descriptions of all legacy SQL configs existed before > Spark 3.0: "This config will be removed in Spark 4.0.". Here is the list of > such configs: > * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName > * spark.sql.legacy.literal.pickMinimumPrecision > * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation > * spark.sql.legacy.sizeOfNull > * spark.sql.legacy.replaceDatabricksSparkAvro.enabled > * spark.sql.legacy.setopsPrecedence.enabled > * spark.sql.legacy.integralDivide.returnBigint > * spark.sql.legacy.bucketedTableScan.outputOrdering > * spark.sql.legacy.parser.havingWithoutGroupByAsWhere > * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue > * spark.sql.legacy.setCommandRejectsSparkCoreConfs > * spark.sql.legacy.utcTimestampFunc.enabled > * spark.sql.legacy.typeCoercion.datetimeToString > * spark.sql.legacy.looseUpcast > * spark.sql.legacy.ctePrecedence.enabled > * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29931) Declare all SQL legacy configs as will be removed in Spark 4.0
Maxim Gekk created SPARK-29931: -- Summary: Declare all SQL legacy configs as will be removed in Spark 4.0 Key: SPARK-29931 URL: https://issues.apache.org/jira/browse/SPARK-29931 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Add the sentence to descriptions of all legacy SQL configs that existed before Spark 3.0: "This config will be removed in Spark 4.0." Here is the list of such configs: * spark.sql.legacy.execution.pandas.groupedMap.assignColumnsByName * spark.sql.legacy.literal.pickMinimumPrecision * spark.sql.legacy.allowCreatingManagedTableUsingNonemptyLocation * spark.sql.legacy.sizeOfNull * spark.sql.legacy.replaceDatabricksSparkAvro.enabled * spark.sql.legacy.setopsPrecedence.enabled * spark.sql.legacy.integralDivide.returnBigint * spark.sql.legacy.bucketedTableScan.outputOrdering * spark.sql.legacy.parser.havingWithoutGroupByAsWhere * spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue * spark.sql.legacy.setCommandRejectsSparkCoreConfs * spark.sql.legacy.utcTimestampFunc.enabled * spark.sql.legacy.typeCoercion.datetimeToString * spark.sql.legacy.looseUpcast * spark.sql.legacy.ctePrecedence.enabled * spark.sql.legacy.arrayExistsFollowsThreeValuedLogic -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
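As an illustration only (buildConf and the doc text stand in for the SQLConf internals, which differ), the proposal amounts to appending one sentence to each legacy config's doc string:
{code}
// Hypothetical sketch of the wording change on a single config.
val LEGACY_SIZE_OF_NULL = buildConf("spark.sql.legacy.sizeOfNull")
  .doc("If it is set to true, size of null returns -1. " +
    "This config will be removed in Spark 4.0.")
  .booleanConf
  .createWithDefault(true)
{code}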
[jira] [Created] (SPARK-29930) Remove SQL configs declared to be removed in Spark 3.0
Maxim Gekk created SPARK-29930: -- Summary: Remove SQL configs declared to be removed in Spark 3.0 Key: SPARK-29930 URL: https://issues.apache.org/jira/browse/SPARK-29930 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Need to remove the following SQL configs: * spark.sql.fromJsonForceNullableSchema * spark.sql.legacy.compareDateTimestampInTimestamp -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29928) Check parsing timestamps up to microsecond precision by JSON/CSV datasource
Maxim Gekk created SPARK-29928: -- Summary: Check parsing timestamps up to microsecond precision by JSON/CSV datasource Key: SPARK-29928 URL: https://issues.apache.org/jira/browse/SPARK-29928 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Port tests added for 2.4 by the commit: https://github.com/apache/spark/commit/9c7e8be1dca8285296f3052c41f35043699d7d10 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29904) Parse timestamps in microsecond precision by JSON/CSV datasources
[ https://issues.apache.org/jira/browse/SPARK-29904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29904: --- Affects Version/s: 2.4.0 2.4.1 2.4.2 2.4.3 > Parse timestamps in microsecond precision by JSON/CSV datasources > - > > Key: SPARK-29904 > URL: https://issues.apache.org/jira/browse/SPARK-29904 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.5 > > > Currently, Spark can parse strings with timestamps from JSON/CSV in > millisecond precision. Internally, timestamps have microsecond precision. The > ticket aims to modify parsing logic in Spark 2.4 to support the microsecond > precision. Porting of DateFormatter/TimestampFormatter from Spark 3.0-preview > is risky, so, need to find another lighter solution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`
[ https://issues.apache.org/jira/browse/SPARK-29927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16975697#comment-16975697 ] Maxim Gekk commented on SPARK-29927: [~cloud_fan] WDYT, does it make sense to change the functions as well? > Parse timestamps in microsecond precision by `to_timestamp`, > `to_unix_timestamp`, `unix_timestamp` > -- > > Key: SPARK-29927 > URL: https://issues.apache.org/jira/browse/SPARK-29927 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Maxim Gekk >Priority: Major > > Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` > functions use SimpleDateFormat to parse strings to timestamps. > SimpleDateFormat is able to parse only in millisecond precision if a user > specifies `SSS` in a pattern. The ticket aims to support parsing up to > microsecond precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29927) Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp`
Maxim Gekk created SPARK-29927: -- Summary: Parse timestamps in microsecond precision by `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` Key: SPARK-29927 URL: https://issues.apache.org/jira/browse/SPARK-29927 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Currently, the `to_timestamp`, `to_unix_timestamp`, `unix_timestamp` functions use SimpleDateFormat to parse strings to timestamps. SimpleDateFormat is able to parse only in millisecond precision if a user specifies `SSS` in a pattern. The ticket aims to support parsing up to microsecond precision. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
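The millisecond ceiling is easy to demonstrate (a standalone illustration; the variable names are ours): SimpleDateFormat reads the whole fraction as a count of milliseconds, so six digits overshoot by a factor of 1000:
{code}
import java.text.SimpleDateFormat

// "123456" under SSSSSS is taken as 123456 ms (~2 min 3.456 s),
// so the parsed instant drifts from 11:56:00 to 11:58:03.456.
val in = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSSSSS")
val d = in.parse("2019-11-18 11:56:00.123456")
println(new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS").format(d))
{code}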
[jira] [Created] (SPARK-29920) Parsing failure on interval '20 15' day to hour
Maxim Gekk created SPARK-29920: -- Summary: Parsing failure on interval '20 15' day to hour Key: SPARK-29920 URL: https://issues.apache.org/jira/browse/SPARK-29920 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk {code:sql} spark-sql> select interval '20 15' day to hour; Error in query: requirement failed: Interval string must match day-time format of 'd h:m:s.n': 20 15(line 1, pos 16) == SQL == select interval '20 15' day to hour ^^^ {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29904) Parse timestamps in microsecond precision by JSON/CSV datasources
Maxim Gekk created SPARK-29904: -- Summary: Parse timestamps in microsecond precision by JSON/CSV datasources Key: SPARK-29904 URL: https://issues.apache.org/jira/browse/SPARK-29904 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Currently, Spark can parse strings with timestamps from JSON/CSV in millisecond precision. Internally, timestamps have microsecond precision. The ticket aims to modify parsing logic in Spark 2.4 to support the microsecond precision. Porting of DateFormatter/TimestampFormatter from Spark 3.0-preview is risky, so, need to find another lighter solution. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
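A usage sketch of what the change should enable for CSV (assuming a SparkSession with spark.implicits._ in scope; the column name is ours):
{code}
import org.apache.spark.sql.types._

// After the fix, the full microsecond fraction should survive parsing.
val ds = Seq("2019-11-18 11:56:00.123456").toDS()
val df = spark.read
  .option("timestampFormat", "yyyy-MM-dd HH:mm:ss.SSSSSS")
  .schema(new StructType().add("t", TimestampType))
  .csv(ds)
df.show(false) // expected: 2019-11-18 11:56:00.123456
{code}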
[jira] [Created] (SPARK-29866) Upper case enum values
Maxim Gekk created SPARK-29866: -- Summary: Upper case enum values Key: SPARK-29866 URL: https://issues.apache.org/jira/browse/SPARK-29866 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Unify naming of enum values and upper case their names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29864) Strict parsing of day-time strings to intervals
Maxim Gekk created SPARK-29864: -- Summary: Strict parsing of day-time strings to intervals Key: SPARK-29864 URL: https://issues.apache.org/jira/browse/SPARK-29864 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Currently, the IntervalUtils.fromDayTimeString() method does not take into account the left bound `from` and truncates the result using the right bound `to`. The method should respect the bounds specified by the user. Oracle and MySQL respect the user's bounds, see https://github.com/apache/spark/pull/26358#issuecomment-551942719 and https://github.com/apache/spark/pull/26358#issuecomment-549272475 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29819) Introduce an enum for interval units
Maxim Gekk created SPARK-29819: -- Summary: Introduce an enum for interval units Key: SPARK-29819 URL: https://issues.apache.org/jira/browse/SPARK-29819 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Add an enum for interval units. This will allow type-checking inputs and avoiding typos in interval unit names. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
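A minimal sketch of what such an enumeration might look like (the names are guesses, not the merged API):
{code}
// Typed unit values instead of raw strings let the compiler catch
// typos in interval unit names at the call sites.
object IntervalUnit extends Enumeration {
  type IntervalUnit = Value
  val YEAR, MONTH, WEEK, DAY, HOUR, MINUTE, SECOND,
      MILLISECOND, MICROSECOND = Value
}
{code}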
[jira] [Resolved] (SPARK-29385) Make `INTERVAL` values comparable
[ https://issues.apache.org/jira/browse/SPARK-29385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk resolved SPARK-29385. Fix Version/s: 3.0.0 Resolution: Fixed Resolved by the PR: https://github.com/apache/spark/pull/26337 > Make `INTERVAL` values comparable > - > > Key: SPARK-29385 > URL: https://issues.apache.org/jira/browse/SPARK-29385 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > PostgreSQL allows to compare interval by `=`, `<>`, `<`, `<=`, `>`, `>=`. For > example: > {code} > maxim=# select interval '1 month' > interval '29 days'; > ?column? > -- > t > {code} > but the same fails in Spark: > {code} > spark-sql> select interval 1 month > interval 29 days; > Error in query: cannot resolve '(interval 1 months > interval 4 weeks 1 > days)' due to data type mismatch: GreaterThan does not support ordering on > type interval; line 1 pos 7; > 'Project [unresolvedalias((interval 1 months > interval 4 weeks 1 days), > None)] > +- OneRowRelation > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29408) Support interval literal with negative sign `-`
[ https://issues.apache.org/jira/browse/SPARK-29408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29408: --- Description: For example: {code} maxim=# select -interval '1 day -1 hour'; ?column? --- -1 days +01:00:00 (1 row) maxim=# select - interval '1-2' AS "negative year-month"; negative year-month - -1 years -2 mons (1 row) {code} was: For example: {code} maxim=# select - interval '1-2' AS "negative year-month"; negative year-month - -1 years -2 mons (1 row) {code} > Support interval literal with negative sign `-` > --- > > Key: SPARK-29408 > URL: https://issues.apache.org/jira/browse/SPARK-29408 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > For example: > {code} > maxim=# select -interval '1 day -1 hour'; > ?column? > --- > -1 days +01:00:00 > (1 row) > maxim=# select - interval '1-2' AS "negative year-month"; > negative year-month > - > -1 years -2 mons > (1 row) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29750) Avoid dependency from joda-time
Maxim Gekk created SPARK-29750: -- Summary: Avoid dependency from joda-time Key: SPARK-29750 URL: https://issues.apache.org/jira/browse/SPARK-29750 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 2.4.4 Reporter: Maxim Gekk * Remove the direct dependency on joda-time * If it is still used somewhere in Spark, use the Java 8 time API instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29736) Improve stability of tests for special datetime values
Maxim Gekk created SPARK-29736: -- Summary: Improve stability of tests for special datetime values Key: SPARK-29736 URL: https://issues.apache.org/jira/browse/SPARK-29736 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk The test can fail around midnight if reference values are taken before midnight and tested code resolves special values after midnight. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29733) Wrong order of assertEquals parameters
Maxim Gekk created SPARK-29733: -- Summary: Wrong order of assertEquals parameters Key: SPARK-29733 URL: https://issues.apache.org/jira/browse/SPARK-29733 Project: Spark Issue Type: Test Components: ML, Spark Core, SQL, Structured Streaming Affects Versions: 2.4.4 Reporter: Maxim Gekk assertEquals() requires the expected value as the first parameter, for instance: https://junit.org/junit4/javadoc/4.12/org/junit/Assert.html#assertEquals(long,%20long) but in some places the expected value is passed as the second parameter, which is confusing when such an assert fails. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
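For illustration (compute() is a hypothetical function under test): JUnit takes the expected value first, and swapping the arguments inverts the failure message:
{code}
import org.junit.Assert.assertEquals

def compute(): Long = 42L // hypothetical code under test

// Correct order: on failure JUnit prints "expected:<42> but was:<...>".
assertEquals(42L, compute())
// The swapped order assertEquals(compute(), 42L) would report the
// actual value as the expectation, pointing the reader the wrong way.
{code}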
[jira] [Created] (SPARK-29723) Get date and time parts of an interval as java classes
Maxim Gekk created SPARK-29723: -- Summary: Get date and time parts of an interval as java classes Key: SPARK-29723 URL: https://issues.apache.org/jira/browse/SPARK-29723 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Taking into account that instances of CalendarInterval can be returned to users as the result of collect() or in a UDF, it could be convenient for users to get the parts of an interval as Java classes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
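A sketch of the kind of helpers this could mean (the method names are assumptions; CalendarInterval at this point carries months and microseconds fields):
{code}
import java.time.{Duration, Period}
import java.time.temporal.ChronoUnit
import org.apache.spark.unsafe.types.CalendarInterval

// Date part as java.time.Period, time part as java.time.Duration.
def periodOf(i: CalendarInterval): Period = Period.ofMonths(i.months)
def durationOf(i: CalendarInterval): Duration =
  Duration.of(i.microseconds, ChronoUnit.MICROS)
{code}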
[jira] [Created] (SPARK-29712) fromDayTimeString() does not take into account the left bound
Maxim Gekk created SPARK-29712: -- Summary: fromDayTimeString() does not take into account the left bound Key: SPARK-29712 URL: https://issues.apache.org/jira/browse/SPARK-29712 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0, 3.0.0 Reporter: Maxim Gekk Currently, fromDayTimeString() takes into account the right bound but not the left one. For example: {code} spark-sql> SELECT interval '1 2:03:04' hour to minute; interval 1 days 2 hours 3 minutes {code} The result should be *interval 2 hours 3 minutes* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29636) Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp
[ https://issues.apache.org/jira/browse/SPARK-29636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963680#comment-16963680 ] Maxim Gekk commented on SPARK-29636: 1. The output is different because Spark uses the session local time zone while converting timestamps to strings 2. It seems this format is not supported, see https://github.com/apache/spark/blob/4cfce3e5d03b0badb4e9685499be2ab0fca5747a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L204-L211 . The seconds field, as well as the hour and minute fields, is mandatory. > Can't parse '11:00 BST' or '2000-10-19 10:23:54+01' signatures to timestamp > --- > > Key: SPARK-29636 > URL: https://issues.apache.org/jira/browse/SPARK-29636 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Dylan Guedes >Priority: Major > > Currently, Spark can't parse a string such as '11:00 BST' or '2000-10-19 > 10:23:54+01' to timestamp: > {code:sql} > spark-sql> select cast ('11:00 BST' as timestamp); > NULL > Time taken: 2.248 seconds, Fetched 1 row(s) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29671) Change format of interval string
[ https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963375#comment-16963375 ] Maxim Gekk commented on SPARK-29671: For example, PostgreSQL displays intervals like: {code} maxim=# select interval '1010 year 9 month 8 day 7 hour 6 minute -5 second 4 millisecond -3 microseconds'; interval -- 1010 years 9 mons 8 days 07:05:55.003997 (1 row) {code} but this requires "normalization" because time fields cannot be negative. > Change format of interval string > > > Key: SPARK-29671 > URL: https://issues.apache.org/jira/browse/SPARK-29671 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to improve format of interval representation as a string. See > https://github.com/apache/spark/pull/26313#issuecomment-547820035 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29671) Change format of interval string
[ https://issues.apache.org/jira/browse/SPARK-29671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963373#comment-16963373 ] Maxim Gekk commented on SPARK-29671: [~cloud_fan][~dongjoon] Let's discuss here how to improve the string representation of intervals. > Change format of interval string > > > Key: SPARK-29671 > URL: https://issues.apache.org/jira/browse/SPARK-29671 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Minor > > The ticket aims to improve format of interval representation as a string. See > https://github.com/apache/spark/pull/26313#issuecomment-547820035 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29671) Change format of interval string
Maxim Gekk created SPARK-29671: -- Summary: Change format of interval string Key: SPARK-29671 URL: https://issues.apache.org/jira/browse/SPARK-29671 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk The ticket aims to improve format of interval representation as a string. See https://github.com/apache/spark/pull/26313#issuecomment-547820035 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29669) Refactor IntervalUtils.fromDayTimeString()
Maxim Gekk created SPARK-29669: -- Summary: Refactor IntervalUtils.fromDayTimeString() Key: SPARK-29669 URL: https://issues.apache.org/jira/browse/SPARK-29669 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk * Add a UnitName enumeration and use it in AstBuilder and in IntervalUtils * Make fromDayTimeString more generic and avoid ad-hoc code * Introduce unit value properties, such as min/max values, and a function to convert a parsed value to microseconds -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
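One possible shape for the UnitName enumeration mentioned above, carrying per-unit bounds and a conversion to microseconds; the names and bounds here are illustrative assumptions, not the committed code:
{code:scala}
// Each unit knows its valid range and how to turn a parsed value into micros.
sealed abstract class UnitName(val min: Long, val max: Long, val toMicros: Long => Long)

object UnitName {
  private val MICROS_PER_SECOND = 1000000L
  case object Hour   extends UnitName(0, 23, _ * 3600 * MICROS_PER_SECOND)
  case object Minute extends UnitName(0, 59, _ * 60 * MICROS_PER_SECOND)
  case object Second extends UnitName(0, 59, _ * MICROS_PER_SECOND)

  /** Validates a parsed value against the unit's range before converting. */
  def convert(unit: UnitName, value: Long): Long = {
    require(value >= unit.min && value <= unit.max, s"$unit value $value is out of range")
    unit.toMicros(value)
  }
}
{code}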
[jira] [Created] (SPARK-29651) Incorrect parsing of interval seconds fraction
Maxim Gekk created SPARK-29651: -- Summary: Incorrect parsing of interval seconds fraction Key: SPARK-29651 URL: https://issues.apache.org/jira/browse/SPARK-29651 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0, 2.3.0, 2.2.0, 2.1.0, 2.0.0 Reporter: Maxim Gekk * The fractional part of the interval seconds unit is incorrectly parsed if the number of digits is less than 9, for example: {code} spark-sql> select interval '10.123456 seconds'; interval 10 seconds 123 microseconds {code} The result must be *interval 10 seconds 123 milliseconds 456 microseconds* * If the seconds unit of an interval is negative, it is incorrectly converted to `CalendarInterval`, for example: {code} spark-sql> select interval '-10.123456789 seconds'; interval -9 seconds -876 milliseconds -544 microseconds {code} Taking into account truncation to microseconds, the result must be *interval -10 seconds -123 milliseconds -456 microseconds* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
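The fix amounts to scaling the fraction by its own digit count rather than assuming nine digits, and applying the sign to the whole seconds value. A minimal sketch of the idea (the helper name is hypothetical):
{code:scala}
// Parses the fractional digits of a seconds field into microseconds.
// "123456" -> 123456 us; "1" -> 100000 us; digits beyond six are truncated.
def parseSecondsFraction(fraction: String, negative: Boolean): Long = {
  require(fraction.nonEmpty && fraction.forall(_.isDigit), s"bad fraction: $fraction")
  // Right-pad to exactly six digits so "1" means 100000 us, then drop the rest.
  val micros = (fraction + "000000").take(6).toLong
  if (negative) -micros else micros
}

// parseSecondsFraction("123456", negative = false) == 123456L
// parseSecondsFraction("123456789", negative = true) == -123456L  // truncated
{code}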
[jira] [Created] (SPARK-29614) Failure of DateTimeUtilsSuite and TimestampFormatterSuite
Maxim Gekk created SPARK-29614: -- Summary: Failure of DateTimeUtilsSuite and TimestampFormatterSuite Key: SPARK-29614 URL: https://issues.apache.org/jira/browse/SPARK-29614 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk * https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-3.2/653/ * https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/112721/testReport/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29607) Move static methods from CalendarInterval to IntervalUtils
Maxim Gekk created SPARK-29607: -- Summary: Move static methods from CalendarInterval to IntervalUtils Key: SPARK-29607 URL: https://issues.apache.org/jira/browse/SPARK-29607 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Move static methods from the CalendarInterval class to the helper object IntervalUtils. This requires rewriting the Java code in Scala. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29605) Optimize string to interval casting
Maxim Gekk created SPARK-29605: -- Summary: Optimize string to interval casting Key: SPARK-29605 URL: https://issues.apache.org/jira/browse/SPARK-29605 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Implement a new function, stringToInterval, in IntervalUtils that casts a UTF8String value to a CalendarInterval instance and should be faster than the existing implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
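A simplified illustration of the approach: accumulate months and microseconds in one scan over the unit tokens instead of regex-matching per unit. This is a sketch under those assumptions, not the actual implementation, which works directly on UTF8String bytes and supports many more units, signs, and fractions:
{code:scala}
// Minimal "number unit" pair parser; returns (months, microseconds).
def stringToIntervalSketch(s: String): Option[(Int, Long)] = {
  val unitToMicros = Map("hour" -> 3600000000L, "minute" -> 60000000L, "second" -> 1000000L)
  var months = 0
  var micros = 0L
  val tokens = s.trim.toLowerCase.split("\\s+")
  if (tokens.isEmpty || tokens.length % 2 != 0) return None
  tokens.grouped(2).foreach { case Array(num, unit) =>
    val n = try num.toLong catch { case _: NumberFormatException => return None }
    unit.stripSuffix("s") match {
      case "year"  => months += (n * 12).toInt
      case "month" => months += n.toInt
      case u if unitToMicros.contains(u) => micros += n * unitToMicros(u)
      case _ => return None
    }
  }
  Some((months, micros))
}

// stringToIntervalSketch("1 year 2 hours") == Some((12, 7200000000L))
{code}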
[jira] [Created] (SPARK-29533) Benchmark casting strings to intervals
Maxim Gekk created SPARK-29533: -- Summary: Benchmark casting strings to intervals Key: SPARK-29533 URL: https://issues.apache.org/jira/browse/SPARK-29533 Project: Spark Issue Type: Test Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Add a benchmark for casting interval strings to intervals for different numbers of interval units. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
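A minimal timing harness for such a measurement could look like the following. This is a plain nanoTime sketch (a real benchmark would live in Spark's benchmark suites), and the commented query line assumes a string-to-interval cast is available:
{code:scala}
object CastIntervalBench {
  def time[A](label: String, iters: Int)(body: => A): Unit = {
    (1 to 3).foreach(_ => body) // warm-up
    val start = System.nanoTime()
    (1 to iters).foreach(_ => body)
    val avgNs = (System.nanoTime() - start) / iters
    println(f"$label%-12s ${avgNs / 1e6}%.3f ms/iter")
  }

  def main(args: Array[String]): Unit = {
    // Inputs with an increasing number of interval units.
    val inputs = Seq(
      "1 day",
      "1 day 2 hours 3 minutes",
      "1 year 2 months 3 days 4 hours 5 minutes 6 seconds"
    )
    inputs.zipWithIndex.foreach { case (in, i) =>
      // Replace the body with the cast under test, e.g.:
      // spark.sql(s"select cast('$in' as interval)").collect()
      time(s"${i + 1} unit(s)", 1000) { in.length }
    }
  }
}
{code}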
[jira] [Created] (SPARK-29524) Unordered interval units
Maxim Gekk created SPARK-29524: -- Summary: Unordered interval units Key: SPARK-29524 URL: https://issues.apache.org/jira/browse/SPARK-29524 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Currently, Spark requires a particular order of interval units in casting from strings - `YEAR` .. `MICROSECOND`. PostgreSQL allows any order: {code} maxim=# select interval '1 second 2 hours'; interval -- 02:00:01 (1 row) {code} but Spark fails while parsing: {code} spark-sql> select interval '1 second 2 hours'; NULL {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29520) Incorrect checking of negative intervals
Maxim Gekk created SPARK-29520: -- Summary: Incorrect checking of negative intervals Key: SPARK-29520 URL: https://issues.apache.org/jira/browse/SPARK-29520 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.4.4 Reporter: Maxim Gekk An interval is negative when its duration is negative. The following code checks this incorrectly: * https://github.com/apache/spark/blob/f302c2ee6203de36e966fcc58917af4847dff7f2/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/GroupStateImpl.scala#L163 * https://github.com/apache/spark/blob/d841b33ba3a9b0504597dbccd4b0d11fa810abf3/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L734 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
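The point is that negativity must be decided from the interval's total duration, not from any single field. A sketch under the assumption of 31 days per month (an illustrative conversion factor, not necessarily the one Spark settled on):
{code:scala}
import java.util.concurrent.TimeUnit

// Mirror of the (months, microseconds) pair that CalendarInterval carried in 2.4.
final case class Interval(months: Int, microseconds: Long)

// Negative iff the total duration is below zero, assuming 31-day months.
def isNegative(i: Interval): Boolean = {
  val total = i.months.toLong * 31 * TimeUnit.DAYS.toMicros(1) + i.microseconds
  total < 0
}

// A per-field check like `months < 0 || microseconds < 0` misclassifies
// Interval(1, -1): one month minus one microsecond is still positive.
assert(!isNegative(Interval(1, -1)))
{code}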
[jira] [Created] (SPARK-29518) Benchmark `date_part` for `INTERVAL`
Maxim Gekk created SPARK-29518: -- Summary: Benchmark `date_part` for `INTERVAL` Key: SPARK-29518 URL: https://issues.apache.org/jira/browse/SPARK-29518 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk SPARK-28420 added support for `INTERVAL` columns in `date_part()`. Benchmarks need to be added for the new type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations
[ https://issues.apache.org/jira/browse/SPARK-29508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954579#comment-16954579 ] Maxim Gekk commented on SPARK-29508: I am working on it > Implicitly cast strings in datetime arithmetic operations > - > > Key: SPARK-29508 > URL: https://issues.apache.org/jira/browse/SPARK-29508 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.4 >Reporter: Maxim Gekk >Priority: Minor > > To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` > types in the cases: > # Cast string to interval in interval - string > # Cast string to interval in datetime + string or string + datetime > # Cast string to timestamp in datetime - string or string - datetime -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29508) Implicitly cast strings in datetime arithmetic operations
Maxim Gekk created SPARK-29508: -- Summary: Implicitly cast strings in datetime arithmetic operations Key: SPARK-29508 URL: https://issues.apache.org/jira/browse/SPARK-29508 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk To improve Spark SQL UX, strings can be cast to the `INTERVAL` or `TIMESTAMP` types in the following cases: # Cast a string to an interval in interval - string # Cast a string to an interval in datetime + string or string + datetime # Cast a string to a timestamp in datetime - string or string - datetime -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29387) Support `*` and `/` operators for intervals
[ https://issues.apache.org/jira/browse/SPARK-29387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29387: --- Summary: Support `*` and `/` operators for intervals (was: Support `*` and `\` operators for intervals) > Support `*` and `/` operators for intervals > --- > > Key: SPARK-29387 > URL: https://issues.apache.org/jira/browse/SPARK-29387 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > Support `*` by numeric, `/` by numeric. See > [https://www.postgresql.org/docs/12/functions-datetime.html] > ||Operator||Example||Result|| > |*|900 * interval '1 second'|interval '00:15:00'| > |*|21 * interval '1 day'|interval '21 days'| > |/|interval '1 hour' / double precision '1.5'|interval '00:40:00'| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10848) Applied JSON Schema Works for json RDD but not when loading json file
[ https://issues.apache.org/jira/browse/SPARK-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950461#comment-16950461 ] Maxim Gekk commented on SPARK-10848: Nullable = false in a user's schema cannot guarantee that nulls don't appear in the loaded data. That can lead to weird errors, such as the corruption of saved parquet files described in SPARK-23173. > Applied JSON Schema Works for json RDD but not when loading json file > - > > Key: SPARK-10848 > URL: https://issues.apache.org/jira/browse/SPARK-10848 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Miklos Christine >Priority: Minor > > Using a defined schema to load a json rdd works as expected. Loading the json > records from a file does not apply the supplied schema. Mainly the nullable > field isn't applied correctly. Loading from a file uses nullable=true on all > fields regardless of applied schema. > Code to reproduce: > {code} > import org.apache.spark.sql.types._ > val jsonRdd = sc.parallelize(List( > """{"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", > "ProductCode": "WQT648", "Qty": 5}""", > """{"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", > "ProductCode": "LG4-Z5", "Qty": 10, "Discount":0.25, > "expressDelivery":true}""")) > val mySchema = StructType(Array( > StructField(name="OrderID" , dataType=LongType, nullable=false), > StructField("CustomerID", IntegerType, false), > StructField("OrderDate", DateType, false), > StructField("ProductCode", StringType, false), > StructField("Qty", IntegerType, false), > StructField("Discount", FloatType, true), > StructField("expressDelivery", BooleanType, true) > )) > val myDF = sqlContext.read.schema(mySchema).json(jsonRdd) > val schema1 = myDF.printSchema > val dfDFfromFile = sqlContext.read.schema(mySchema).json("Orders.json") > val schema2 = dfDFfromFile.printSchema > {code} > Orders.json > {code} > {"OrderID": 1, "CustomerID":452 , "OrderDate": "2015-05-16", "ProductCode": > "WQT648", "Qty": 5} > {"OrderID": 2, "CustomerID":16 , "OrderDate": "2015-07-11", "ProductCode": > "LG4-Z5", "Qty": 10, "Discount":0.25, "expressDelivery":true} > {code} > The behavior should be consistent. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29448) Support the `INTERVAL` type by Parquet datasource
Maxim Gekk created SPARK-29448: -- Summary: Support the `INTERVAL` type by Parquet datasource Key: SPARK-29448 URL: https://issues.apache.org/jira/browse/SPARK-29448 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk The Parquet format allows storing intervals as a triple of (months, days, milliseconds); see https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#interval . The `INTERVAL` logical type is used for an interval of time. _It must annotate a fixed_len_byte_array of length 12. This array stores three little-endian unsigned integers that represent durations at different granularities of time. The first stores a number in months, the second stores a number in days, and the third stores a number in milliseconds. This representation is independent of any particular timezone or date._ Need to support writing and reading values of Catalyst's CalendarIntervalType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
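The 12-byte layout described by the spec can be produced with a little-endian ByteBuffer; a minimal writer-side sketch (the function name is illustrative):
{code:scala}
import java.nio.{ByteBuffer, ByteOrder}

// Encodes (months, days, milliseconds) into Parquet's 12-byte INTERVAL layout:
// three little-endian 32-bit integers, months first.
def toParquetInterval(months: Int, days: Int, millis: Int): Array[Byte] =
  ByteBuffer.allocate(12)
    .order(ByteOrder.LITTLE_ENDIAN)
    .putInt(months)
    .putInt(days)
    .putInt(millis)
    .array()

// Note: Catalyst's CalendarInterval tracks microseconds, so a writer has to
// truncate micros to millis, losing sub-millisecond precision.
{code}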
[jira] [Created] (SPARK-29440) Support java.time.Duration as an external type of CalendarIntervalType
Maxim Gekk created SPARK-29440: -- Summary: Support java.time.Duration as an external type of CalendarIntervalType Key: SPARK-29440 URL: https://issues.apache.org/jira/browse/SPARK-29440 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.4 Reporter: Maxim Gekk Currently, Spark SQL doesn't have any external type for Catalyst's CalendarIntervalType. The internal CalendarInterval is partially exposed, but it cannot be used in UDFs, for example. This ticket aims to provide `java.time.Duration` as one of the external types of Spark's `INTERVAL`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
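The mapping is straightforward for the time part; a sketch of the conversions this implies (not the committed API). Note that java.time.Duration carries no months component, so it can only cover the microseconds part of a CalendarInterval:
{code:scala}
import java.time.Duration

// Duration -> microseconds; Duration normalizes to (seconds, nanos in [0, 1e9)).
def durationToMicros(d: Duration): Long =
  Math.addExact(Math.multiplyExact(d.getSeconds, 1000000L), d.getNano / 1000L)

// Microseconds -> Duration.
def microsToDuration(micros: Long): Duration =
  Duration.ofNanos(Math.multiplyExact(micros, 1000L))

// durationToMicros(Duration.parse("PT1H2M3.5S")) == 3723500000L
{code}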
[jira] [Updated] (SPARK-29382) Support writing `INTERVAL` type to datasource table
[ https://issues.apache.org/jira/browse/SPARK-29382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29382: --- Description: Creating a table with an `INTERVAL` column for writing fails with the error: {code:java} spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval); Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IllegalArgumentException: Error: type expected at the position 0 of 'interval' but 'interval' is found.; {code} This is needed for SPARK-29368 was: Spark cannot create a table using parquet if a column has the `INTERVAL` type: {code} spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval) USING PARQUET; Error in query: Parquet data source does not support interval data type.; {code} This is needed for SPARK-29368 > Support writing `INTERVAL` type to datasource table > --- > > Key: SPARK-29382 > URL: https://issues.apache.org/jira/browse/SPARK-29382 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > Creating a table with an `INTERVAL` column for writing fails with the error: > {code:java} > spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval); > Error in query: org.apache.hadoop.hive.ql.metadata.HiveException: > java.lang.IllegalArgumentException: Error: type expected at the position 0 of > 'interval' but 'interval' is found.; > {code} > This is needed for SPARK-29368 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29382) Support writing `INTERVAL` type to datasource table
[ https://issues.apache.org/jira/browse/SPARK-29382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29382: --- Summary: Support writing `INTERVAL` type to datasource table (was: Support the `INTERVAL` type by Parquet datasource) > Support writing `INTERVAL` type to datasource table > --- > > Key: SPARK-29382 > URL: https://issues.apache.org/jira/browse/SPARK-29382 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > Spark cannot create a table using parquet if a column has the `INTERVAL` type: > {code} > spark-sql> CREATE TABLE INTERVAL_TBL (f1 interval) USING PARQUET; > Error in query: Parquet data source does not support interval data type.; > {code} > This is needed for SPARK-29368 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26651) Use Proleptic Gregorian calendar
[ https://issues.apache.org/jira/browse/SPARK-26651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948957#comment-16948957 ] Maxim Gekk commented on SPARK-26651: [~jiangxb] Could you consider including this in the list of major changes for Spark 3.0? > Use Proleptic Gregorian calendar > > > Key: SPARK-26651 > URL: https://issues.apache.org/jira/browse/SPARK-26651 > Project: Spark > Issue Type: Umbrella > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Labels: ReleaseNote > > Spark 2.4 and previous versions use a hybrid calendar (Julian + Gregorian) in > date/timestamp parsing, functions and expressions. The ticket aims to switch > Spark to the Proleptic Gregorian calendar and to use java.time classes introduced > in Java 8 for timestamp/date manipulations. One of the purposes of switching > to the Proleptic Gregorian calendar is to conform to the SQL standard, which assumes > such a calendar. > *Release note:* > Spark 3.0 has switched to the Proleptic Gregorian calendar in parsing, > formatting, and converting dates and timestamps, as well as in extracting > sub-components like years, days, etc. It uses Java 8 API classes from the > java.time packages that are based on [ISO chronology |https://docs.oracle.com/javase/8/docs/api/java/time/chrono/IsoChronology.html]. > Previous versions of Spark performed those operations by using [the hybrid > calendar|https://docs.oracle.com/javase/7/docs/api/java/util/GregorianCalendar.html] > (Julian + Gregorian). The changes might impact the results for dates and > timestamps before October 15, 1582 (Gregorian). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29408) Support interval literal with negative sign `-`
[ https://issues.apache.org/jira/browse/SPARK-29408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29408: --- Summary: Support interval literal with negative sign `-` (was: Support interval literal with negative sign `-`.) > Support interval literal with negative sign `-` > --- > > Key: SPARK-29408 > URL: https://issues.apache.org/jira/browse/SPARK-29408 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > For example: > {code} > maxim=# select - interval '1-2' AS "negative year-month"; > negative year-month > - > -1 years -2 mons > (1 row) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29408) Support interval literal with negative sign `-`.
Maxim Gekk created SPARK-29408: -- Summary: Support interval literal with negative sign `-`. Key: SPARK-29408 URL: https://issues.apache.org/jira/browse/SPARK-29408 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk For example: {code} maxim=# select - interval '1-2' AS "negative year-month"; negative year-month - -1 years -2 mons (1 row) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29407) Support syntax for zero interval
Maxim Gekk created SPARK-29407: -- Summary: Support syntax for zero interval Key: SPARK-29407 URL: https://issues.apache.org/jira/browse/SPARK-29407 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Support special syntax for a zero interval, as PostgreSQL does: {code} maxim=# SELECT interval '0'; interval -- 00:00:00 (1 row) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29406) Interval output styles
Maxim Gekk created SPARK-29406: -- Summary: Interval output styles Key: SPARK-29406 URL: https://issues.apache.org/jira/browse/SPARK-29406 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk The output format of the interval type can be set to one of the four styles sql_standard, postgres, postgres_verbose, or iso_8601, using the command SET intervalstyle; see [https://www.postgresql.org/docs/11/datatype-datetime.html#DATATYPE-INTERVAL-OUTPUT] ||Style Specification||Year-Month Interval||Day-Time Interval||Mixed Interval|| |{{sql_standard}}|1-2|3 4:05:06|-1-2 +3 -4:05:06| |{{postgres}}|1 year 2 mons|3 days 04:05:06|-1 year -2 mons +3 days -04:05:06| |{{postgres_verbose}}|@ 1 year 2 mons|@ 3 days 4 hours 5 mins 6 secs|@ 1 year 2 mons -3 days 4 hours 5 mins 6 secs ago| |{{iso_8601}}|P1Y2M|P3DT4H5M6S|P-1Y-2M3DT-4H-5M-6S| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
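For instance, the iso_8601 style can be derived from an internal (months, days, micros) triple roughly as follows; this is a sketch of the formatting rules, not a committed formatter:
{code:scala}
// Formats an interval as an ISO 8601 duration such as P1Y2M3DT4H5M6S.
// Negative fields come out as P-1Y-2M..., matching the table above.
def toIso8601(months: Int, days: Int, micros: Long): String = {
  val sb = new StringBuilder("P")
  if (months / 12 != 0) sb.append(months / 12).append('Y')
  if (months % 12 != 0) sb.append(months % 12).append('M')
  if (days != 0) sb.append(days).append('D')
  val h = micros / 3600000000L
  val m = micros % 3600000000L / 60000000L
  val s = micros % 60000000L / 1000000.0
  if (h != 0 || m != 0 || s != 0) {
    sb.append('T')
    if (h != 0) sb.append(h).append('H')
    if (m != 0) sb.append(m).append('M')
    if (s != 0) sb.append(if (s == s.toLong) s.toLong.toString else s.toString).append('S')
  }
  if (sb.length == 1) "PT0S" else sb.toString
}

// toIso8601(14, 3, 14706000000L) == "P1Y2M3DT4H5M6S"
{code}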
[jira] [Updated] (SPARK-29370) Interval strings without explicit unit markings
[ https://issues.apache.org/jira/browse/SPARK-29370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29370: --- Description: In PostgreSQL, quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec': {code:java} maxim=# select interval '1 12:59:10'; interval 1 day 12:59:10 (1 row) {code} It should also be possible to specify a sign: {code} maxim=# SELECT interval '1 +2:03:04' minute to second; interval 1 day 02:03:04 maxim=# SELECT interval '1 -2:03:04' minute to second; interval - 1 day -02:03:04 {code} was: In PostgreSQL, quantities of days, hours, minutes, and seconds can be specified without explicit unit markings. For example, '1 12:59:10' is read the same as '1 day 12 hours 59 min 10 sec': {code} maxim=# select interval '1 12:59:10'; interval 1 day 12:59:10 (1 row) {code} > Interval strings without explicit unit markings > --- > > Key: SPARK-29370 > URL: https://issues.apache.org/jira/browse/SPARK-29370 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > In PostgreSQL, quantities of days, hours, minutes, and seconds can be > specified without explicit unit markings. For example, '1 12:59:10' is read > the same as '1 day 12 hours 59 min 10 sec': > {code:java} > maxim=# select interval '1 12:59:10'; > interval > > 1 day 12:59:10 > (1 row) > {code} > It should also be possible to specify a sign: > {code} > maxim=# SELECT interval '1 +2:03:04' minute to second; > interval > > 1 day 02:03:04 > maxim=# SELECT interval '1 -2:03:04' minute to second; > interval > - > 1 day -02:03:04 > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29395) Precision of the interval type
Maxim Gekk created SPARK-29395: -- Summary: Precision of the interval type Key: SPARK-29395 URL: https://issues.apache.org/jira/browse/SPARK-29395 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk PostgreSQL allows specifying the interval precision; see [https://www.postgresql.org/docs/12/datatype-datetime.html] |{{interval [ _{{fields}}_ ] [ (_{{p}}_) ]}}|16 bytes|time interval|-178000000 years|178000000 years|1 microsecond| For example: {code} maxim=# SELECT interval '1 2:03.4567' day to second(2); interval --- 1 day 00:02:03.46 (1 row) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
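Supporting a precision p amounts to rounding the seconds fraction to a multiple of 10^(6 - p) microseconds; a sketch of that rounding (illustrative only, rounding half away from zero):
{code:scala}
// Rounds interval microseconds to p fractional second digits (0 <= p <= 6);
// p = 2 keeps centiseconds, as in the example above.
def roundToPrecision(micros: Long, p: Int): Long = {
  require(p >= 0 && p <= 6, s"precision out of range: $p")
  val unit = math.pow(10, 6 - p).toLong
  val half = if (micros >= 0) unit / 2 else -unit / 2
  (micros + half) / unit * unit
}

// 3.4567 s = 3456700 us; roundToPrecision(3456700L, 2) == 3460000L, i.e. 03.46
{code}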
[jira] [Created] (SPARK-29394) Support ISO 8601 format for intervals
Maxim Gekk created SPARK-29394: -- Summary: Support ISO 8601 format for intervals Key: SPARK-29394 URL: https://issues.apache.org/jira/browse/SPARK-29394 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Interval values can also be written as ISO 8601 time intervals, using either the “format with designators” of the standard's section 4.4.3.2 or the “alternative format” of section 4.4.3.3. For example: |P1Y2M3DT4H5M6S|ISO 8601 “format with designators”| |P0001-02-03T04:05:06|ISO 8601 “alternative format”: same meaning as above| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
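java.time can already parse both halves of the designator format; a sketch that splits a combined P...T... string into a date Period and a time Duration (an assumed helper, and the "alternative format" P0001-02-03T04:05:06 would need separate handling):
{code:scala}
import java.time.{Duration, Period}

// "P1Y2M3DT4H5M6S" -> (P1Y2M3D, PT4H5M6S).
def parseIso8601Interval(s: String): (Period, Duration) = {
  val tIdx = s.indexOf('T')
  if (tIdx < 0) (Period.parse(s), Duration.ZERO)                 // e.g. "P1Y2M"
  else {
    val date = if (tIdx == 1) Period.ZERO else Period.parse(s.substring(0, tIdx))
    (date, Duration.parse("P" + s.substring(tIdx)))              // e.g. "PT4H5M6S"
  }
}
{code}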
[jira] [Created] (SPARK-29393) Add the make_interval() function
Maxim Gekk created SPARK-29393: -- Summary: Add the make_interval() function Key: SPARK-29393 URL: https://issues.apache.org/jira/browse/SPARK-29393 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk PostgreSQL allows making an interval with the make_interval() function: |{{make_interval(years int DEFAULT 0, months int DEFAULT 0, weeks int DEFAULT 0, days int DEFAULT 0, hours int DEFAULT 0, mins int DEFAULT 0, secs double precision DEFAULT 0.0)}}|{{interval}}|Create interval from years, months, weeks, days, hours, minutes and seconds fields|{{make_interval(days => 10)}}|{{10 days}}| See https://www.postgresql.org/docs/12/functions-datetime.html -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29391) Default year-month units
Maxim Gekk created SPARK-29391: -- Summary: Default year-month units Key: SPARK-29391 URL: https://issues.apache.org/jira/browse/SPARK-29391 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk PostgreSQL assumes year-month units by default: {code} maxim=# SELECT interval '1-2'; interval --- 1 year 2 mons {code} but the same literal produces NULL in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
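Parsing the SQL standard year-month form is mostly sign handling around a single dash; a sketch of a fromYearMonthString-style helper (the name is illustrative):
{code:scala}
// Parses the SQL standard year-month literal "[+|-]Y-M" into total months,
// e.g. "1-2" -> 14 and "-1-2" -> -14.
def yearMonthToMonths(s: String): Int = {
  val trimmed = s.trim
  val (sign, body) =
    if (trimmed.startsWith("-")) (-1, trimmed.substring(1))
    else if (trimmed.startsWith("+")) (1, trimmed.substring(1))
    else (1, trimmed)
  body.split('-') match {
    case Array(y, m) => sign * (y.toInt * 12 + m.toInt)
    case _ => throw new IllegalArgumentException(s"Cannot parse '$s' as year-month")
  }
}
{code}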
[jira] [Created] (SPARK-29390) Add the justify_days(), justify_hours() and justify_interval() functions
Maxim Gekk created SPARK-29390: -- Summary: Add the justify_days(), justify_hours() and justify_interval() functions Key: SPARK-29390 URL: https://issues.apache.org/jira/browse/SPARK-29390 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk See *Table 9.31. Date/Time Functions* ([https://www.postgresql.org/docs/12/functions-datetime.html]) |{{justify_days(interval)}}|{{interval}}|Adjust interval so 30-day time periods are represented as months|{{justify_days(interval '35 days')}}|{{1 mon 5 days}}| |{{justify_hours(interval)}}|{{interval}}|Adjust interval so 24-hour time periods are represented as days|{{justify_hours(interval '27 hours')}}|{{1 day 03:00:00}}| |{{justify_interval(interval)}}|{{interval}}|Adjust interval using {{justify_days}} and {{justify_hours}}, with additional sign adjustments|{{justify_interval(interval '1 mon -1 hour')}}|{{29 days 23:00:00}}| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
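The justify functions are carry operations between fields; a sketch with 30-day months and 24-hour days, per the definitions above (helper names are hypothetical, and justify_interval's extra sign adjustment is omitted):
{code:scala}
// justify_days: fold every full 30 days into months.
def justifyDays(months: Int, days: Int): (Int, Int) =
  (months + days / 30, days % 30)

// justify_hours: fold every full 24 hours (as micros) into days.
def justifyHours(days: Int, micros: Long): (Int, Long) = {
  val microsPerDay = 24L * 3600 * 1000000
  ((days + micros / microsPerDay).toInt, micros % microsPerDay)
}

// justifyDays(0, 35) == (1, 5)          // '35 days' -> '1 mon 5 days'
// justifyHours(0, 27L * 3600 * 1000000)
//   == (1, 3L * 3600 * 1000000)         // '27 hours' -> '1 day 03:00:00'
{code}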
[jira] [Created] (SPARK-29389) Short synonyms of interval units
Maxim Gekk created SPARK-29389: -- Summary: Short synonyms of interval units Key: SPARK-29389 URL: https://issues.apache.org/jira/browse/SPARK-29389 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk The following synonyms should be supported: {code} ["MILLENNIUM", ("MILLENNIA", "MIL", "MILS"), "CENTURY", ("CENTURIES", "C", "CENT"), "DECADE", ("DECADES", "DEC", "DECS"), "YEAR", ("Y", "YEARS", "YR", "YRS"), "QUARTER", ("QTR"), "MONTH", ("MON", "MONS", "MONTHS"), "DAY", ("D", "DAYS"), "HOUR", ("H", "HOURS", "HR", "HRS"), "MINUTE", ("M", "MIN", "MINS", "MINUTES"), "SECOND", ("S", "SEC", "SECONDS", "SECS"), "MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", "MSECONDS", "MS"), "MICROSECONDS", ("USEC", "USECS", "USECONDS", "MICROSECON", "US"), "EPOCH"] {code} For example: {code} maxim=# select '1y 10mon -10d -10h -10min -10.01s ago'::interval; interval -1 years -10 mons +10 days 10:10:10.01 (1 row) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29389) Support synonyms for interval units
[ https://issues.apache.org/jira/browse/SPARK-29389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-29389: --- Summary: Support synonyms for interval units (was: Short synonyms of interval units) > Support synonyms for interval units > --- > > Key: SPARK-29389 > URL: https://issues.apache.org/jira/browse/SPARK-29389 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Priority: Major > > The following synonyms should be supported: > {code} > ["MILLENNIUM", ("MILLENNIA", "MIL", "MILS"), >"CENTURY", ("CENTURIES", "C", "CENT"), >"DECADE", ("DECADES", "DEC", "DECS"), >"YEAR", ("Y", "YEARS", "YR", "YRS"), >"QUARTER", ("QTR"), >"MONTH", ("MON", "MONS", "MONTHS"), >"DAY", ("D", "DAYS"), >"HOUR", ("H", "HOURS", "HR", "HRS"), >"MINUTE", ("M", "MIN", "MINS", "MINUTES"), >"SECOND", ("S", "SEC", "SECONDS", "SECS"), >"MILLISECONDS", ("MSEC", "MSECS", "MILLISECON", > "MSECONDS", "MS"), >"MICROSECONDS", ("USEC", "USECS", "USECONDS", > "MICROSECON", "US"), >"EPOCH"] > {code} > For example: > {code} > maxim=# select '1y 10mon -10d -10h -10min -10.01s > ago'::interval; > interval > > -1 years -10 mons +10 days 10:10:10.01 > (1 row) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-29388) Construct intervals from the `millenniums`, `centuries` or `decades` units
Maxim Gekk created SPARK-29388: -- Summary: Construct intervals from the `millenniums`, `centuries` or `decades` units Key: SPARK-29388 URL: https://issues.apache.org/jira/browse/SPARK-29388 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk PostgreSQL supports the `millenniums`, `centuries` and `decades` interval units, for example: {code} maxim=# select '4 millenniums 5 centuries 4 decades 1 year 4 months 4 days 17 minutes 31 seconds'::interval; interval --- 4541 years 4 mons 4 days 00:17:31 (1 row) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
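These units reduce to months by fixed factors; a sketch of the conversion (the constants follow the PostgreSQL definitions: a decade is 10 years, a century 100, a millennium 1000):
{code:scala}
// Months contributed by each year-like unit.
val monthsPerUnit: Map[String, Long] = Map(
  "millennium" -> 12000L,
  "century"    -> 1200L,
  "decade"     -> 120L,
  "year"       -> 12L,
  "month"      -> 1L
)

def toMonths(value: Long, unit: String): Long =
  monthsPerUnit.get(unit)
    .map(Math.multiplyExact(value, _))
    .getOrElse(throw new IllegalArgumentException(s"Unknown unit: $unit"))

// 4 millenniums + 5 centuries + 4 decades + 1 year
// = 48000 + 6000 + 480 + 12 = 54492 months = 4541 years, matching the output above.
{code}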
[jira] [Created] (SPARK-29387) Support `*` and `\` operators for intervals
Maxim Gekk created SPARK-29387: -- Summary: Support `*` and `\` operators for intervals Key: SPARK-29387 URL: https://issues.apache.org/jira/browse/SPARK-29387 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Maxim Gekk Support `*` by numeric, `/` by numeric. See [https://www.postgresql.org/docs/12/functions-datetime.html] ||Operator||Example||Result|| |*|900 * interval '1 second'|interval '00:15:00'| |*|21 * interval '1 day'|interval '21 days'| |/|interval '1 hour' / double precision '1.5'|interval '00:40:00'| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org