[jira] [Created] (SPARK-28672) [UDF] Duplicate function creation should not allow
ABHISHEK KUMAR GUPTA created SPARK-28672:
---
Summary: [UDF] Duplicate function creation should not allow
Key: SPARK-28672
URL: https://issues.apache.org/jira/browse/SPARK-28672
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: ABHISHEK KUMAR GUPTA

{code}
0: jdbc:hive2://10.18.18.214:23040/default> create function addm_3 AS 'com.huawei.bigdata.hive.example.udf.multiply' using jar 'hdfs://hacluster/user/Multiply.jar';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.084 seconds)

0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm_3 AS 'com.huawei.bigdata.hive.example.udf.multiply' using jar 'hdfs://hacluster/user/Multiply.jar';
INFO  : converting to local hdfs://hacluster/user/Multiply.jar
INFO  : Added [/tmp/8a396308-41f8-4335-9de4-8268ce5c70fe_resources/Multiply.jar] to class path
INFO  : Added resources: [hdfs://hacluster/user/Multiply.jar]
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.134 seconds)

0: jdbc:hive2://10.18.18.214:23040/default> show functions like addm_3;
+-----------------+--+
|    function     |
+-----------------+--+
| addm_3          |
| default.addm_3  |
+-----------------+--+
2 rows selected (0.047 seconds)
{code}

When SHOW FUNCTIONS is executed it lists both functions, but which database does the permanent function belong to when the user has not specified one? Creating a temporary function with the same name as an existing permanent function should not be allowed.

--
This message was sent by Atlassian JIRA (v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
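The double listing above can be pictured with a small registry model. This is an illustrative sketch, not Spark's actual SessionCatalog: permanent functions are stored qualified by the current database while temporary functions are stored under their bare name, so after the two CREATE statements both `addm_3` and `default.addm_3` show up. A hypothetical `strict` flag shows the duplicate check the ticket asks for.

```python
# Sketch only: Spark's real catalog is more involved; names here are illustrative.
class FunctionCatalog:
    def __init__(self, current_db="default"):
        self.current_db = current_db
        self.permanent = {}   # keyed by "db.name"
        self.temporary = {}   # keyed by bare "name"

    def create_permanent(self, name, class_name):
        self.permanent[f"{self.current_db}.{name}"] = class_name

    def create_temporary(self, name, class_name, strict=False):
        # strict=True rejects a temporary function whose name collides with
        # an existing permanent one -- the behaviour the ticket argues for.
        if strict and f"{self.current_db}.{name}" in self.permanent:
            raise ValueError(f"function {name} already exists")
        self.temporary[name] = class_name

    def show_functions(self, name):
        matches = [n for n in self.temporary if n == name]
        matches += [q for q in self.permanent if q.endswith("." + name)]
        return matches

cat = FunctionCatalog()
cat.create_permanent("addm_3", "com.huawei.bigdata.hive.example.udf.multiply")
cat.create_temporary("addm_3", "com.huawei.bigdata.hive.example.udf.multiply")
print(cat.show_functions("addm_3"))  # both names listed, as in the report
```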
[jira] [Assigned] (SPARK-28565) DataSourceV2: DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-28565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Burak Yavuz reassigned SPARK-28565:
---
Assignee: Burak Yavuz  (was: John Zhuge)

> DataSourceV2: DataFrameWriter.saveAsTable
>
> Key: SPARK-28565
> URL: https://issues.apache.org/jira/browse/SPARK-28565
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: John Zhuge
> Assignee: Burak Yavuz
> Priority: Major
> Fix For: 3.0.0
>
> Support multiple catalogs in the following use cases:
> * DataFrameWriter.saveAsTable("catalog.db.tbl")
[jira] [Resolved] (SPARK-28565) DataSourceV2: DataFrameWriter.saveAsTable
[ https://issues.apache.org/jira/browse/SPARK-28565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Burak Yavuz resolved SPARK-28565.
---
Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/25330

> DataSourceV2: DataFrameWriter.saveAsTable
>
> Key: SPARK-28565
> URL: https://issues.apache.org/jira/browse/SPARK-28565
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: John Zhuge
> Assignee: Burak Yavuz
> Priority: Major
> Fix For: 3.0.0
>
> Support multiple catalogs in the following use cases:
> * DataFrameWriter.saveAsTable("catalog.db.tbl")
[jira] [Commented] (SPARK-28671) [UDF] dropping permanent function when a temporary function with the same name already exists giving wrong msg on dropping it again
[ https://issues.apache.org/jira/browse/SPARK-28671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903570#comment-16903570 ]

pavithra ramachandran commented on SPARK-28671:
---
I will work on this.

> [UDF] dropping permanent function when a temporary function with the same
> name already exists giving wrong msg on dropping it again
>
> Key: SPARK-28671
> URL: https://issues.apache.org/jira/browse/SPARK-28671
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> Created a jar and uploaded it to an HDFS path:
> 1. ./hdfs dfs -put /opt/trash1/AddDoublesUDF.jar /user/user1/
> 2. Launch beeline and create a permanent function:
> CREATE FUNCTION addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
> 3. Perform a select:
> jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
> +------------------------------+--+
> | default.addDoubles(1, 2, 3)  |
> +------------------------------+--+
> | 6.0                          |
> +------------------------------+--+
> 1 row selected (0.111 seconds)
> 4. Create a temporary function with the same name:
> jdbc:hive2://100.100.208.125:23040/default> CREATE temporary FUNCTION addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
> 5. jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
> +----------------------+--+
> | addDoubles(1, 2, 3)  |
> +----------------------+--+
> | 6.0                  |
> +----------------------+--+
> 1 row selected (0.088 seconds)
> 6. Drop the function:
> jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> 7. jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3); -- succeeds
> 8. Drop again; an error is thrown:
> jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
> Error: org.apache.spark.sql.catalyst.analysis.NoSuchFunctionException: Undefined function: 'default.addDoubles'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; (state=,code=0)
> 9. Perform the select again:
> jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
> +----------------------+--+
> | addDoubles(1, 2, 3)  |
> +----------------------+--+
> | 6.0                  |
>
> The issue: the error message in step 8 says the function is registered neither as a permanent nor as a temporary function, even though it was registered as a temporary function in step 4, which is why the select in step 9 still returns a result.
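The asymmetry reported above can be sketched with a toy catalog. This is a hypothetical model, not Spark's code: function lookup (used by SELECT) consults the temporary registry first, while a plain DROP FUNCTION resolves the name against the permanent catalog only, so the second drop fails with a misleading message even though the temporary function still resolves.

```python
# Hypothetical sketch of the lookup/drop asymmetry; not Spark's SessionCatalog.
class SessionCatalog:
    def __init__(self):
        self.permanent = {"default.addDoubles": "AddDoublesUDF"}
        self.temporary = {"addDoubles": "AddDoublesUDF"}

    def lookup(self, name):                      # used by SELECT
        if name in self.temporary:
            return self.temporary[name]
        return self.permanent.get("default." + name)

    def drop_function(self, name):               # plain DROP FUNCTION
        qualified = "default." + name
        if qualified not in self.permanent:
            raise LookupError(
                f"Undefined function: '{qualified}'. This function is neither "
                "a registered temporary function nor a permanent function.")
        del self.permanent[qualified]

cat = SessionCatalog()
cat.drop_function("addDoubles")          # step 6: drops the permanent one
assert cat.lookup("addDoubles")          # step 7: temp function still resolves
try:
    cat.drop_function("addDoubles")      # step 8: misleading error
except LookupError as e:
    print(e)
assert cat.lookup("addDoubles")          # step 9: SELECT still works
```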
[jira] [Created] (SPARK-28671) [UDF] dropping permanent function when a temporary function with the same name already exists giving wrong msg on dropping it again
ABHISHEK KUMAR GUPTA created SPARK-28671:
---
Summary: [UDF] dropping permanent function when a temporary function with the same name already exists giving wrong msg on dropping it again
Key: SPARK-28671
URL: https://issues.apache.org/jira/browse/SPARK-28671
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: ABHISHEK KUMAR GUPTA

Created a jar and uploaded it to an HDFS path:

1. ./hdfs dfs -put /opt/trash1/AddDoublesUDF.jar /user/user1/
2. Launch beeline and create a permanent function:
CREATE FUNCTION addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
3. Perform a select:
jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
+------------------------------+--+
| default.addDoubles(1, 2, 3)  |
+------------------------------+--+
| 6.0                          |
+------------------------------+--+
1 row selected (0.111 seconds)
4. Create a temporary function with the same name:
jdbc:hive2://100.100.208.125:23040/default> CREATE temporary FUNCTION addDoubles AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/user1/AddDoublesUDF.jar';
5. jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
+----------------------+--+
| addDoubles(1, 2, 3)  |
+----------------------+--+
| 6.0                  |
+----------------------+--+
1 row selected (0.088 seconds)
6. Drop the function:
jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
+---------+--+
| Result  |
+---------+--+
+---------+--+
7. jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3); -- succeeds
8. Drop again; an error is thrown:
jdbc:hive2://100.100.208.125:23040/default> drop function addDoubles;
Error: org.apache.spark.sql.catalyst.analysis.NoSuchFunctionException: Undefined function: 'default.addDoubles'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; (state=,code=0)
9. Perform the select again:
jdbc:hive2://100.100.208.125:23040/default> SELECT addDoubles(1,2,3);
+----------------------+--+
| addDoubles(1, 2, 3)  |
+----------------------+--+
| 6.0                  |

The issue: the error message in step 8 says the function is registered neither as a permanent nor as a temporary function, even though it was registered as a temporary function in step 4, which is why the select in step 9 still returns a result.
[jira] [Commented] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local
[ https://issues.apache.org/jira/browse/SPARK-28670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903565#comment-16903565 ]

Sandeep Katta commented on SPARK-28670:
---
[~dongjoon] [~hyukjin.kwon], as per this Jira there is a difference between *temporary* and *permanent* function creation. IMHO this behaviour should be consistent: whenever the user creates a function, it should fail if the resource does not exist.

> [UDF] create permanent UDF does not throw Exception if jar does not exist in
> HDFS path or Local
>
> Key: SPARK-28670
> URL: https://issues.apache.org/jira/browse/SPARK-28670
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> jdbc:hive2://10.18.18.214:23040/default> create function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.241 seconds)
> 0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> INFO  : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar
> ERROR : Failed to read external resource hdfs://hacluster/user/AddDoublesUDF1.jar
> java.lang.RuntimeException: Failed to read external resource hdfs://hacluster/user/AddDoublesUDF1.jar
> 	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
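The consistent behaviour proposed in the comment above can be sketched as an eager resource check at CREATE FUNCTION time for both kinds of function. Everything here is hypothetical illustration: `resource_exists` stands in for an HDFS or local filesystem probe, and the function names mirror the ticket's example.

```python
# Sketch: validate the jar eagerly for both permanent and temporary functions.
# `resource_exists` is a stand-in for an HDFS/local check (hypothetical).
def create_function(name, class_name, jar_uri, resource_exists, temporary=False):
    if not resource_exists(jar_uri):
        raise FileNotFoundError(f"resource not found: {jar_uri}")
    kind = "temporary" if temporary else "permanent"
    return (kind, name, class_name, jar_uri)

hdfs = {"hdfs://hacluster/user/AddDoublesUDF.jar"}   # jars that do exist
exists = lambda uri: uri in hdfs

# With the eager check, the missing jar fails for a *permanent* function too,
# instead of succeeding silently as in the report:
try:
    create_function("addm", "com.huawei.bigdata.hive.example.udf.AddDoublesUDF",
                    "hdfs://hacluster/user/AddDoublesUDF1.jar", exists)
except FileNotFoundError as e:
    print(e)
```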
[jira] [Assigned] (SPARK-28017) Enhance DATE_TRUNC
[ https://issues.apache.org/jira/browse/SPARK-28017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-28017:
---
Assignee: Maxim Gekk

> Enhance DATE_TRUNC
>
> Key: SPARK-28017
> URL: https://issues.apache.org/jira/browse/SPARK-28017
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Maxim Gekk
> Priority: Major
>
> For DATE_TRUNC, we need to support: microseconds, milliseconds, decade, century, millennium.
> https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
[jira] [Resolved] (SPARK-28017) Enhance DATE_TRUNC
[ https://issues.apache.org/jira/browse/SPARK-28017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-28017.
---
Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25336
https://github.com/apache/spark/pull/25336

> Enhance DATE_TRUNC
>
> Key: SPARK-28017
> URL: https://issues.apache.org/jira/browse/SPARK-28017
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Maxim Gekk
> Priority: Major
> Fix For: 3.0.0
>
> For DATE_TRUNC, we need to support: microseconds, milliseconds, decade, century, millennium.
> https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC
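The new DATE_TRUNC granularities the ticket adds can be illustrated in plain Python, following PostgreSQL's semantics (where, per its docs, truncating to century yields the century's first year, e.g. 2019 truncates to 2001, and decade truncates to 2010). This is a sketch of the semantics, not Spark's implementation.

```python
# Plain-Python sketch of PostgreSQL-style date_trunc for the new fields.
from datetime import datetime

def date_trunc(field, ts):
    if field == "millennium":   # 2019 -> 2001 (millennia start at year N*1000+1)
        return ts.replace(year=(ts.year - 1) // 1000 * 1000 + 1, month=1, day=1,
                          hour=0, minute=0, second=0, microsecond=0)
    if field == "century":      # 2019 -> 2001 (centuries start at year N*100+1)
        return ts.replace(year=(ts.year - 1) // 100 * 100 + 1, month=1, day=1,
                          hour=0, minute=0, second=0, microsecond=0)
    if field == "decade":       # 2019 -> 2010
        return ts.replace(year=ts.year // 10 * 10, month=1, day=1,
                          hour=0, minute=0, second=0, microsecond=0)
    if field == "milliseconds":
        return ts.replace(microsecond=ts.microsecond // 1000 * 1000)
    if field == "microseconds":
        return ts  # already at microsecond resolution
    raise ValueError(f"unsupported field: {field}")

t = datetime(2019, 8, 9, 12, 34, 56, 789123)
print(date_trunc("century", t))   # 2001-01-01 00:00:00
print(date_trunc("decade", t))    # 2010-01-01 00:00:00
```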
[jira] [Commented] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local
[ https://issues.apache.org/jira/browse/SPARK-28670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903559#comment-16903559 ]

Sandeep Katta commented on SPARK-28670:
---
[~abhishek.akg] thanks for raising this issue, I will post the PR ASAP.

> [UDF] create permanent UDF does not throw Exception if jar does not exist in
> HDFS path or Local
>
> Key: SPARK-28670
> URL: https://issues.apache.org/jira/browse/SPARK-28670
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0
> Reporter: ABHISHEK KUMAR GUPTA
> Priority: Minor
>
> jdbc:hive2://10.18.18.214:23040/default> create function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (0.241 seconds)
> 0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
> INFO  : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar
> ERROR : Failed to read external resource hdfs://hacluster/user/AddDoublesUDF1.jar
> java.lang.RuntimeException: Failed to read external resource hdfs://hacluster/user/AddDoublesUDF1.jar
> 	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
[jira] [Created] (SPARK-28670) [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local
ABHISHEK KUMAR GUPTA created SPARK-28670:
---
Summary: [UDF] create permanent UDF does not throw Exception if jar does not exist in HDFS path or Local
Key: SPARK-28670
URL: https://issues.apache.org/jira/browse/SPARK-28670
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.4.0
Reporter: ABHISHEK KUMAR GUPTA

jdbc:hive2://10.18.18.214:23040/default> create function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.241 seconds)

0: jdbc:hive2://10.18.18.214:23040/default> create temporary function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
INFO  : converting to local hdfs://hacluster/user/AddDoublesUDF1.jar
ERROR : Failed to read external resource hdfs://hacluster/user/AddDoublesUDF1.jar
java.lang.RuntimeException: Failed to read external resource hdfs://hacluster/user/AddDoublesUDF1.jar
	at org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
[jira] [Resolved] (SPARK-28572) Add simple analysis checks to the V2 Create Table code paths
[ https://issues.apache.org/jira/browse/SPARK-28572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-28572.
---
Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25305
https://github.com/apache/spark/pull/25305

> Add simple analysis checks to the V2 Create Table code paths
>
> Key: SPARK-28572
> URL: https://issues.apache.org/jira/browse/SPARK-28572
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
> Priority: Major
> Fix For: 3.0.0
>
> Currently, the V2 Create Table code paths don't have any checks around:
> # The existence of transforms in the table schema
> # Duplications of transforms
> # Case sensitivity checks around column names
> Having these rudimentary checks would simplify V2 Catalog development.
> Note that the goal of this JIRA is to not make V2 Create Table Hive Compatible.
[jira] [Assigned] (SPARK-28572) Add simple analysis checks to the V2 Create Table code paths
[ https://issues.apache.org/jira/browse/SPARK-28572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-28572:
---
Assignee: Burak Yavuz

> Add simple analysis checks to the V2 Create Table code paths
>
> Key: SPARK-28572
> URL: https://issues.apache.org/jira/browse/SPARK-28572
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Burak Yavuz
> Assignee: Burak Yavuz
> Priority: Major
>
> Currently, the V2 Create Table code paths don't have any checks around:
> # The existence of transforms in the table schema
> # Duplications of transforms
> # Case sensitivity checks around column names
> Having these rudimentary checks would simplify V2 Catalog development.
> Note that the goal of this JIRA is to not make V2 Create Table Hive Compatible.
[jira] [Commented] (SPARK-28650) Fix the guarantee of ForeachWriter
[ https://issues.apache.org/jira/browse/SPARK-28650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903510#comment-16903510 ]

Jungtaek Lim commented on SPARK-28650:
---
It sounds like either a correctness or a data-loss issue, and in most cases end users should change their implementation of the open method to always return true for safety. Are you planning to work on this? If you don't plan to address this soon, I'd like to take it up.

Given we've changed the guarantee, do we want to keep the signature of "open" as it is? By leaving it as it is, we still give a chance to skip writing, but according to the guarantee that only makes sense when skipping the whole batch.

> Fix the guarantee of ForeachWriter
>
> Key: SPARK-28650
> URL: https://issues.apache.org/jira/browse/SPARK-28650
> Project: Spark
> Issue Type: Documentation
> Components: Structured Streaming
> Affects Versions: 2.4.3
> Reporter: Shixiong Zhu
> Priority: Major
>
> Right now ForeachWriter has the following guarantee:
> {code}
> If the streaming query is being executed in the micro-batch mode, then every partition
> represented by a unique tuple (partitionId, epochId) is guaranteed to have the same data.
> Hence, (partitionId, epochId) can be used to deduplicate and/or transactionally commit data
> and achieve exactly-once guarantees.
> {code}
>
> But we can easily break this when restarting a query and a batch is re-run (e.g., after upgrading Spark):
> * Source returns a different DataFrame that has a different partition number (e.g., we start to not create empty partitions in Kafka Source V2).
> * A newly added optimization rule may change the number of partitions in the new run.
> * Change the file split size in the new run.
>
> Since we cannot guarantee that the same (partitionId, epochId) has the same data, we should update the documentation for "ForeachWriter".
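Why the (partitionId, epochId) guarantee breaks can be shown with a toy sink. This is a hypothetical illustration, not Spark's ForeachWriter API: a sink that deduplicates on (partitionId, epochId) silently duplicates or drops rows when a restarted query re-runs the same epoch with a different partitioning of the same data.

```python
# Sketch of an "exactly-once" sink that dedups on (partition_id, epoch_id).
class DedupSink:
    def __init__(self):
        self.committed = {}          # (partition_id, epoch_id) -> rows

    def write_partition(self, partition_id, epoch_id, rows):
        key = (partition_id, epoch_id)
        if key in self.committed:    # assumes a re-seen key carries the same data
            return
        self.committed[key] = rows

sink = DedupSink()
# First run of epoch 5: the data is split across two partitions.
sink.write_partition(0, 5, [1, 2, 3])
sink.write_partition(1, 5, [4, 5, 6])
# Restarted query re-runs epoch 5, but the source now splits the *same* data
# into three partitions (e.g. a new optimization rule or split size changed):
sink.write_partition(0, 5, [1, 2])   # skipped: key (0, 5) already committed
sink.write_partition(1, 5, [3, 4])   # skipped: key (1, 5) already committed
sink.write_partition(2, 5, [5, 6])   # committed
all_rows = sorted(r for rows in sink.committed.values() for r in rows)
print(all_rows)   # [1, 2, 3, 4, 5, 5, 6, 6] -- rows 5 and 6 are duplicated
```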
[jira] [Commented] (SPARK-28587) JDBC data source's partition whereClause should support jdbc dialect
[ https://issues.apache.org/jira/browse/SPARK-28587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903447#comment-16903447 ]

Takeshi Yamamuro commented on SPARK-28587:
---
Hi, [~397090770]. Could you give me more info about your environment to reproduce the failure, e.g. the query, the schema, the Phoenix version, the Calcite version, and so on? I wrote some tests (https://github.com/apache/spark/compare/master...maropu:SPARK-28587) and investigated, but I couldn't find a root cause.

> JDBC data source's partition whereClause should support jdbc dialect
>
> Key: SPARK-28587
> URL: https://issues.apache.org/jira/browse/SPARK-28587
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.3
> Reporter: wyp
> Priority: Minor
>
> When we use the JDBC data source to read data from Phoenix and use a timestamp column as the partitionColumn, e.g.
> {code:java}
> val url = "jdbc:phoenix:thin:url=localhost:8765;serialization=PROTOBUF"
> val driver = "org.apache.phoenix.queryserver.client.Driver"
> val df = spark.read.format("jdbc")
>   .option("url", url)
>   .option("driver", driver)
>   .option("fetchsize", "1000")
>   .option("numPartitions", "6")
>   .option("partitionColumn", "times")
>   .option("lowerBound", "2019-07-31 00:00:00")
>   .option("upperBound", "2019-08-01 00:00:00")
>   .option("dbtable", "test")
>   .load().select("id")
> println(df.count())
> {code}
> Phoenix throws an AvaticaSqlException:
> {code:java}
> org.apache.calcite.avatica.AvaticaSqlException: Error -1 (0) : while preparing SQL: SELECT 1 FROM search_info_test WHERE "TIMES" < '2019-07-31 04:00:00' or "TIMES" is null
> 	at org.apache.calcite.avatica.Helper.createException(Helper.java:54)
> 	at org.apache.calcite.avatica.Helper.createException(Helper.java:41)
> 	at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:368)
> 	at org.apache.calcite.avatica.AvaticaConnection.prepareStatement(AvaticaConnection.java:299)
> 	at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:300)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> 	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
> 	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> 	at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
> 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:55)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:121)
> 	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
> 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> java.lang.RuntimeException: org.apache.phoenix.schema.TypeMismatchException: ERROR 203 (22005): Type mismatch. TIMESTAMP and VARCHAR for "TIMES" < '2019-07-31 04:00:00'
> 	at org.apache.calcite.avatica.jdbc.JdbcMeta.propagate(JdbcMeta.java:700)
> 	at org.apache.calcite.avatica.jdbc.PhoenixJdbcMeta.prepare(PhoenixJdbcMeta.java:67)
> 	at org.apache.calcite.avatica.remote.LocalService.apply(LocalService.java:195)
> 	at org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1215)
> 	at org.apache.calcite.avatica.remote.Service$PrepareRequest.accept(Service.java:1186)
> 	at org.apache.calcite.avatica.remote.AbstractHandler.apply(AbstractHandler.java:94)
> 	at org.apache.calcite.avatica.remote.ProtobufHandler.apply(ProtobufHandler.java:46)
> 	at org.apache.calcite.avatica.server.AvaticaProtobufHandler.handle(AvaticaProtobufHandler.java:127)
> 	at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
> 	at org.eclipse.jetty.server.Server.handle(Server.java:534)
> 	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
> 	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
> 	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
> {code}
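The fix the ticket asks for, letting the JDBC dialect decide how partition bounds are rendered, can be sketched as a clause generator. This is a hypothetical helper, not Spark's JDBCRelation code: for a dialect like Phoenix the timestamp bound is wrapped in a typed conversion (Phoenix's `TO_TIMESTAMP` is assumed here) instead of being emitted as a bare string, which is what triggers the TIMESTAMP vs VARCHAR mismatch above.

```python
# Hypothetical dialect-aware rendering of JDBC partition WHERE clauses.
def timestamp_literal(dialect, value):
    if dialect == "phoenix":
        return f"TO_TIMESTAMP('{value}')"   # assumption: Phoenix-style conversion
    return f"'{value}'"                      # default: bare quoted string

def partition_where_clauses(column, bounds, dialect="default"):
    """One clause per stride between consecutive bounds; the first clause
    also catches NULLs, mirroring Spark's column-partitioning scheme."""
    clauses = []
    for i in range(len(bounds) - 1):
        lo = timestamp_literal(dialect, bounds[i])
        hi = timestamp_literal(dialect, bounds[i + 1])
        if i == 0:
            clauses.append(f'"{column}" < {hi} OR "{column}" IS NULL')
        elif i == len(bounds) - 2:
            clauses.append(f'"{column}" >= {lo}')
        else:
            clauses.append(f'"{column}" >= {lo} AND "{column}" < {hi}')
    return clauses

bounds = ["2019-07-31 00:00:00", "2019-07-31 12:00:00", "2019-08-01 00:00:00"]
for clause in partition_where_clauses("TIMES", bounds, dialect="phoenix"):
    print(clause)
```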
[jira] [Assigned] (SPARK-28660) Add aggregates.sql - Part4
[ https://issues.apache.org/jira/browse/SPARK-28660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-28660:
---
Assignee: Yuming Wang

> Add aggregates.sql - Part4
>
> Key: SPARK-28660
> URL: https://issues.apache.org/jira/browse/SPARK-28660
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
> In this ticket, we plan to add the regression test cases of https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L607-L997
[jira] [Resolved] (SPARK-28660) Add aggregates.sql - Part4
[ https://issues.apache.org/jira/browse/SPARK-28660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-28660.
---
Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25392
https://github.com/apache/spark/pull/25392

> Add aggregates.sql - Part4
>
> Key: SPARK-28660
> URL: https://issues.apache.org/jira/browse/SPARK-28660
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.0.0
>
> In this ticket, we plan to add the regression test cases of https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L607-L997
[jira] [Resolved] (SPARK-28642) Hide credentials in show create table
[ https://issues.apache.org/jira/browse/SPARK-28642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-28642.
---
Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25375
https://github.com/apache/spark/pull/25375

> Hide credentials in show create table
>
> Key: SPARK-28642
> URL: https://issues.apache.org/jira/browse/SPARK-28642
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.0.0
>
> {code:sql}
> spark-sql> show create table mysql_federated_sample;
> CREATE TABLE `mysql_federated_sample` (`TBL_ID` BIGINT, `CREATE_TIME` INT, `DB_ID` BIGINT, `LAST_ACCESS_TIME` INT, `OWNER` STRING, `RETENTION` INT, `SD_ID` BIGINT, `TBL_NAME` STRING, `TBL_TYPE` STRING, `VIEW_EXPANDED_TEXT` STRING, `VIEW_ORIGINAL_TEXT` STRING, `IS_REWRITE_ENABLED` BOOLEAN)
> USING org.apache.spark.sql.jdbc
> OPTIONS (
>   `url` 'jdbc:mysql://localhost/hive?user=root&password=mypasswd',
>   `driver` 'com.mysql.jdbc.Driver',
>   `dbtable` 'TBLS'
> )
> {code}
[jira] [Assigned] (SPARK-28642) Hide credentials in show create table
[ https://issues.apache.org/jira/browse/SPARK-28642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-28642:
---
Assignee: Yuming Wang

> Hide credentials in show create table
>
> Key: SPARK-28642
> URL: https://issues.apache.org/jira/browse/SPARK-28642
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
>
> {code:sql}
> spark-sql> show create table mysql_federated_sample;
> CREATE TABLE `mysql_federated_sample` (`TBL_ID` BIGINT, `CREATE_TIME` INT, `DB_ID` BIGINT, `LAST_ACCESS_TIME` INT, `OWNER` STRING, `RETENTION` INT, `SD_ID` BIGINT, `TBL_NAME` STRING, `TBL_TYPE` STRING, `VIEW_EXPANDED_TEXT` STRING, `VIEW_ORIGINAL_TEXT` STRING, `IS_REWRITE_ENABLED` BOOLEAN)
> USING org.apache.spark.sql.jdbc
> OPTIONS (
>   `url` 'jdbc:mysql://localhost/hive?user=root&password=mypasswd',
>   `driver` 'com.mysql.jdbc.Driver',
>   `dbtable` 'TBLS'
> )
> {code}
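The kind of masking SPARK-28642 applies before SHOW CREATE TABLE prints its options can be sketched as a redaction pass over the table properties. The key names and the regex below are illustrative, not Spark's exact redaction configuration (Spark drives this via a configurable regex).

```python
# Sketch of credential redaction for SHOW CREATE TABLE output (illustrative).
import re

# Hypothetical list of credential-bearing query-string keys to mask inside values.
SECRET_PARAMS = re.compile(r"(password|user|secret|token)=([^&']*)", re.IGNORECASE)

def redact_option(key, value):
    # Whole-value redaction for options that *are* credentials...
    if key.lower() in {"password", "user"}:
        return "*********(redacted)"
    # ...and in-place redaction for credentials embedded in a JDBC URL.
    return SECRET_PARAMS.sub(r"\1=*********(redacted)", value)

url = "jdbc:mysql://localhost/hive?user=root&password=mypasswd"
print(redact_option("url", url))
# jdbc:mysql://localhost/hive?user=*********(redacted)&password=*********(redacted)
```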
[jira] [Created] (SPARK-28669) System Information Functions
Yuming Wang created SPARK-28669:
---
Summary: System Information Functions
Key: SPARK-28669
URL: https://issues.apache.org/jira/browse/SPARK-28669
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang

||Name||Return Type||Description||
|{{current_catalog}}|{{name}}|name of current database (called "catalog" in the SQL standard)|
|{{current_database()}}|{{name}}|name of current database|
|{{current_query()}}|{{text}}|text of the currently executing query, as submitted by the client (might contain more than one statement)|
|{{current_role}}|{{name}}|equivalent to {{current_user}}|
|{{current_schema}}{{[()]}}|{{name}}|name of current schema|
|{{current_schemas(}}{{boolean}}{{)}}|{{name[]}}|names of schemas in search path, optionally including implicit schemas|
|{{current_user}}|{{name}}|user name of current execution context|
|{{inet_client_addr()}}|{{inet}}|address of the remote connection|
|{{inet_client_port()}}|{{int}}|port of the remote connection|
|{{inet_server_addr()}}|{{inet}}|address of the local connection|
|{{inet_server_port()}}|{{int}}|port of the local connection|
|{{pg_backend_pid()}}|{{int}}|Process ID of the server process attached to the current session|
|{{pg_blocking_pids(}}{{int}}{{)}}|{{int[]}}|Process ID(s) that are blocking specified server process ID from acquiring a lock|
|{{pg_conf_load_time()}}|{{timestamp with time zone}}|configuration load time|
|{{pg_current_logfile([{{text}}])}}|{{text}}|Primary log file name, or log in the requested format, currently in use by the logging collector|
|{{pg_my_temp_schema()}}|{{oid}}|OID of session's temporary schema, or 0 if none|
|{{pg_is_other_temp_schema(}}{{oid}}{{)}}|{{boolean}}|is schema another session's temporary schema?|
|{{pg_listening_channels()}}|{{setof text}}|channel names that the session is currently listening on|
|{{pg_notification_queue_usage()}}|{{double}}|fraction of the asynchronous notification queue currently occupied (0-1)|
|{{pg_postmaster_start_time()}}|{{timestamp with time zone}}|server start time|
|{{pg_safe_snapshot_blocking_pids(}}{{int}}{{)}}|{{int[]}}|Process ID(s) that are blocking specified server process ID from acquiring a safe snapshot|
|{{pg_trigger_depth()}}|{{int}}|current nesting level of PostgreSQL triggers (0 if not called, directly or indirectly, from inside a trigger)|
|{{session_user}}|{{name}}|session user name|
|{{user}}|{{name}}|equivalent to {{current_user}}|

Example:
{code:sql}
postgres=# SELECT pg_collation_for(description) FROM pg_description LIMIT 1;
 pg_collation_for
------------------
 "default"
(1 row)
{code}

https://www.postgresql.org/docs/10/functions-info.html
[jira] [Commented] (SPARK-28085) Spark Scala API documentation URLs not working properly in Chrome
[ https://issues.apache.org/jira/browse/SPARK-28085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903377#comment-16903377 ] Andrew Leverentz commented on SPARK-28085: -- In Chrome 76, this issue appears to be resolved. Thanks to anyone out there who submitted bug reports :) > Spark Scala API documentation URLs not working properly in Chrome > - > > Key: SPARK-28085 > URL: https://issues.apache.org/jira/browse/SPARK-28085 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.3 >Reporter: Andrew Leverentz >Priority: Minor > > In Chrome version 75, URLs in the Scala API documentation are not working > properly, which makes them difficult to bookmark. > For example, URLs like the following get redirected to a generic "root" > package page: > [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/Dataset.html] > [https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset] > Here's the URL that I get redirected to: > [https://spark.apache.org/docs/latest/api/scala/index.html#package] > This issue seems to have appeared between versions 74 and 75 of Chrome, but > the documentation URLs still work in Safari. I suspect that this has > something to do with security-related changes to how Chrome 75 handles frames > and/or redirects. I've reported this issue to the Chrome team via the > in-browser help menu, but I don't have any visibility into their response, so > it's not clear whether they'll consider this a bug or "working as intended". -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-26859) Fix field writer index bug in non-vectorized ORC deserializer
[ https://issues.apache.org/jira/browse/SPARK-26859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26859: -- Fix Version/s: 2.3.4 > Fix field writer index bug in non-vectorized ORC deserializer > - > > Key: SPARK-26859 > URL: https://issues.apache.org/jira/browse/SPARK-26859 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0 >Reporter: Ivan Vergiliev >Assignee: Ivan Vergiliev >Priority: Major > Labels: correctness > Fix For: 2.3.4, 2.4.1, 3.0.0 > > > There is a bug in the ORC deserialization code that, when triggered, results > in completely wrong data being read. I've marked this as a Blocker as per the > docs in https://spark.apache.org/contributing.html as it's a data correctness > issue. > The bug is triggered when the following set of conditions are all met: > - the non-vectorized ORC reader is being used; > - a schema is explicitly specified when reading the ORC file > - the provided schema has columns not present in the ORC file, and these > columns are in the middle of the schema > - the ORC file being read contains null values in the columns after the ones > added by the schema. > When all of these are met: > - the internal state of the ORC deserializer gets messed up, and, as a result > - the null values from the ORC file end up being set on wrong columns, not > the one they're in, and > - the old values from the null columns don't get cleared from the previous > record. > Here's a concrete example. Let's consider the following DataFrame: > {code:scala} > val rdd = sparkContext.parallelize(Seq((1, 2, "abc"), (4, 5, "def"), > (8, 9, null))) > val df = rdd.toDF("col1", "col2", "col3") > {code} > and the following schema: > {code:scala} > col1 int, col4 int, col2 int, col3 string > {code} > Notice the `col4 int` added in the middle that doesn't exist in the dataframe. 
> Saving this dataframe to ORC and then reading it back with the specified > schema should result in reading the same values, with nulls for `col4`. > Instead, we get the following back: > {code:java} > [1,null,2,abc] > [4,null,5,def] > [8,null,null,def] > {code} > Notice how the `def` from the second record doesn't get properly cleared and > ends up in the third record as well; also, instead of `col2 = 9` in the last > record as expected, we get the null that should've been in column 3 instead. > *Impact* > When this issue is triggered, it results in completely wrong results being > read from the ORC file. The set of conditions under which it gets triggered > is somewhat narrow so the set of affected users is probably limited. There > are possibly also people that are affected but haven't realized it because > the conditions are so obscure. > *Bug details* > The issue is caused by calling `setNullAt` with a wrong index in > `OrcDeserializer.scala:deserialize()`. I have a fix that I'll send out for > review shortly. > *Workaround* > This bug is currently only triggered when new columns are added to the middle > of the schema. This means that it can be worked around by only adding new > columns at the end. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28668) Support the V2SessionCatalog with AlterTable commands
Burak Yavuz created SPARK-28668: --- Summary: Support the V2SessionCatalog with AlterTable commands Key: SPARK-28668 URL: https://issues.apache.org/jira/browse/SPARK-28668 Project: Spark Issue Type: Planned Work Components: SQL Affects Versions: 3.0.0 Reporter: Burak Yavuz We need to support the V2SessionCatalog with AlterTable commands so that V2 DataSources can leverage DDL through SQL ALTER TABLE commands. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28667) Support the V2SessionCatalog in insertInto
Burak Yavuz created SPARK-28667: --- Summary: Support the V2SessionCatalog in insertInto Key: SPARK-28667 URL: https://issues.apache.org/jira/browse/SPARK-28667 Project: Spark Issue Type: Planned Work Components: SQL Affects Versions: 3.0.0 Reporter: Burak Yavuz We need to support the V2SessionCatalog in the insert into SQL code paths as well as the DataFrameWriter code paths. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28666) Support the V2SessionCatalog in saveAsTable
Burak Yavuz created SPARK-28666: --- Summary: Support the V2SessionCatalog in saveAsTable Key: SPARK-28666 URL: https://issues.apache.org/jira/browse/SPARK-28666 Project: Spark Issue Type: Planned Work Components: SQL Affects Versions: 3.0.0 Reporter: Burak Yavuz We need to support the V2SessionCatalog in the old saveAsTable code paths so that V2 DataSources can leverage the old DataFrameWriter code path. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode not equal to 0
[ https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitrii Shakshin updated SPARK-28665: - Description: Violate state machine FAILED after FINISHED org.apache.spark.launcher.ChildProcAppHandle#monitorChild https://issues.apache.org/jira/browse/SPARK-17742 was: Violate state machine FAILED after FINISH org.apache.spark.launcher.ChildProcAppHandle#monitorChild https://issues.apache.org/jira/browse/SPARK-17742 > State FINISHED is not final if exitCode not equal to 0 > -- > > Key: SPARK-28665 > URL: https://issues.apache.org/jira/browse/SPARK-28665 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.1, 2.4.3 >Reporter: Dmitrii Shakshin >Priority: Minor > Labels: bug > > Violate state machine > FAILED after FINISHED > org.apache.spark.launcher.ChildProcAppHandle#monitorChild > https://issues.apache.org/jira/browse/SPARK-17742 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode not equal to 0
[ https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitrii Shakshin updated SPARK-28665: - Summary: State FINISHED is not final if exitCode not equal to 0 (was: State FINISHED is not final if exitCode not equal 0) > State FINISHED is not final if exitCode not equal to 0 > -- > > Key: SPARK-28665 > URL: https://issues.apache.org/jira/browse/SPARK-28665 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.1, 2.4.3 >Reporter: Dmitrii Shakshin >Priority: Minor > Labels: bug > > Violate state machine > FAILED after FINISH > org.apache.spark.launcher.ChildProcAppHandle#monitorChild > https://issues.apache.org/jira/browse/SPARK-17742 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode not equal 0
[ https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitrii Shakshin updated SPARK-28665: - Summary: State FINISHED is not final if exitCode not equal 0 (was: State FINISHED is not final if exitCode <> 0) > State FINISHED is not final if exitCode not equal 0 > --- > > Key: SPARK-28665 > URL: https://issues.apache.org/jira/browse/SPARK-28665 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.1, 2.4.3 >Reporter: Dmitrii Shakshin >Priority: Minor > Labels: bug > > Violate state machine > FAILED after FINISH > org.apache.spark.launcher.ChildProcAppHandle#monitorChild > https://issues.apache.org/jira/browse/SPARK-17742 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28665) State FINISHED is not final if exitCode <> 0
[ https://issues.apache.org/jira/browse/SPARK-28665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitrii Shakshin updated SPARK-28665: - Summary: State FINISHED is not final if exitCode <> 0 (was: State FINISHED is not final if exitCode <> -1) > State FINISHED is not final if exitCode <> 0 > > > Key: SPARK-28665 > URL: https://issues.apache.org/jira/browse/SPARK-28665 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 2.3.1, 2.4.3 >Reporter: Dmitrii Shakshin >Priority: Minor > Labels: bug > > Violate state machine > FAILED after FINISH > org.apache.spark.launcher.ChildProcAppHandle#monitorChild > https://issues.apache.org/jira/browse/SPARK-17742 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28665) State FINISHED is not final if exitCode <> -1
Dmitrii Shakshin created SPARK-28665: Summary: State FINISHED is not final if exitCode <> -1 Key: SPARK-28665 URL: https://issues.apache.org/jira/browse/SPARK-28665 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 2.4.3, 2.3.1 Reporter: Dmitrii Shakshin Violate state machine FAILED after FINISH org.apache.spark.launcher.ChildProcAppHandle#monitorChild https://issues.apache.org/jira/browse/SPARK-17742 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27492) GPU scheduling - High level user documentation
[ https://issues.apache.org/jira/browse/SPARK-27492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903081#comment-16903081 ] wuyi commented on SPARK-27492: -- I'm wondering whether it would be possible or better to use "accelerator" instead of "resource" for the whole module. The word "resource" overlaps with traditional resources (e.g. memory, cores) and can be ambiguous at times. That said, this would require a lot of renaming. > GPU scheduling - High level user documentation > -- > > Key: SPARK-27492 > URL: https://issues.apache.org/jira/browse/SPARK-27492 > Project: Spark > Issue Type: Story > Components: Documentation >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Priority: Major > > For the SPIP - Accelerator-aware task scheduling for Spark, > https://issues.apache.org/jira/browse/SPARK-24615 Add some high level user > documentation about how this feature works together and point to things like > the example discovery script, etc. > > - make sure to document the discovery script and what permissions are needed > and any security implications > - Document standalone - local-cluster mode limitation of only a single > resource file or discovery script so you have to have coordination on for it > to work right. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28664) ORDER BY in aggregate function
Yuming Wang created SPARK-28664: --- Summary: ORDER BY in aggregate function Key: SPARK-28664 URL: https://issues.apache.org/jira/browse/SPARK-28664 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang {code:sql} SELECT min(x ORDER BY y) FROM (VALUES(1, NULL)) AS d(x,y); SELECT min(x ORDER BY y) FROM (VALUES(1, 2)) AS d(x,y); {code} https://github.com/postgres/postgres/blob/44e95b5728a4569c494fa4ea4317f8a2f50a206b/src/test/regress/sql/aggregates.sql#L978-L982 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
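For context, the semantics being requested can be modeled in a few lines of pure Python (an illustrative sketch of what ORDER BY inside an aggregate call means, not Spark's implementation): the group's rows are sorted by the ORDER BY key before the values are fed to the aggregate. The ordering is irrelevant for min, but it matters for order-sensitive aggregates.

```python
# Illustrative model of "agg(x ORDER BY y)": sort the group's rows by the
# ORDER BY key, then feed the x values to the aggregate in that order.
def aggregate_with_order_by(rows, agg):
    # rows are (x, y) pairs; NULL (None) sort keys go last, mirroring
    # PostgreSQL's default ASC NULLS LAST ordering in this model
    ordered = sorted(rows, key=lambda r: (r[1] is None, r[1] if r[1] is not None else 0))
    return agg(x for x, _ in ordered)

rows = [(1, None), (3, 0), (2, 5)]
print(aggregate_with_order_by(rows, min))    # unaffected by ordering: 1
print(aggregate_with_order_by(rows, list))   # order-sensitive: [3, 2, 1]
```

This is why the linked PostgreSQL regression tests exercise min with both a NULL and a non-NULL sort key: the result must be the same either way.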
[jira] [Updated] (SPARK-28653) Create table using DDL statement should not auto create the destination folder
[ https://issues.apache.org/jira/browse/SPARK-28653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thanida updated SPARK-28653: Issue Type: Bug (was: Question) > Create table using DDL statement should not auto create the destination folder > -- > > Key: SPARK-28653 > URL: https://issues.apache.org/jira/browse/SPARK-28653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.3 >Reporter: Thanida >Priority: Minor > > I created an external table using the following DDL statement, and the > destination path was auto-created. > {code:java} > CREATE TABLE ${tableName} USING parquet LOCATION ${path} > {code} > But if I specify the file format as CSV or JSON, the destination path is not > created. > {code:java} > CREATE TABLE ${tableName} USING CSV LOCATION ${path} > {code} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28663) Aggregate Functions for Statistics
[ https://issues.apache.org/jira/browse/SPARK-28663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-28663: Description: ||Function||Argument Type||Return Type||Partial Mode||Description|| |{{corr(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|correlation coefficient| |{{covar_pop(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|population covariance| |{{covar_samp(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|sample covariance| |{{regr_avgx(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|average of the independent variable ({{sum(_{{X_}})/_{{N}}_}})| |{{regr_avgy(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|average of the dependent variable ({{sum(_{{Y_}})/_{{N}}_}})| |{{regr_count(_Y_}}, _{{X}}_)|{{double precision}}|{{bigint}}|Yes|number of input rows in which both expressions are nonnull| |{{regr_intercept(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|y-intercept of the least-squares-fit linear equation determined by the (_{{X}}_, _{{Y}}_) pairs| |{{regr_r2(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|square of the correlation coefficient| |{{regr_slope(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|slope of the least-squares-fit linear equation determined by the (_{{X}}_, _{{Y}}_) pairs| |{{regr_sxx(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|{{sum(_{{X_}}^2) - sum(_{{X}}_)^2/_{{N}}_}} (“sum of squares” of the independent variable)| |{{regr_sxy(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|{{sum(_{{X_}}*_{{Y}}_) - sum(_{{X}}_) * sum(_{{Y}}_)/_{{N}}_}} (“sum of products”of independent times dependent variable)| |{{regr_syy(_Y_}}, _{{X}}_)|{{double precision}}|{{double precision}}|Yes|{{sum(_{{Y_}}^2) - sum(_{{Y}}_)^2/_{{N}}_}} (“sum of squares” of the dependent variable)| 
[https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE] was: ||Function||Argument Type||Return Type||Partial Mode||Description|| |{{corr(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|correlation coefficient| |{{covar_pop(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|population covariance| |{{covar_samp(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|sample covariance| |{{regr_avgx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|average of the independent variable ({{sum(_{{X}}_)/_{{N}}_}})| |{{regr_avgy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|average of the dependent variable ({{sum(_{{Y}}_)/_{{N}}_}})| |{{regr_count(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{bigint}}|Yes|number of input rows in which both expressions are nonnull| |{{regr_intercept(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|y-intercept of the least-squares-fit linear equation determined by the (_{{X}}_, _{{Y}}_) pairs| |{{regr_r2(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|square of the correlation coefficient| |{{regr_slope(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|slope of the least-squares-fit linear equation determined by the (_{{X}}_, _{{Y}}_) pairs| |{{regr_sxx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|{{sum(_{{X}}_^2) - sum(_{{X}}_)^2/_{{N}}_}} (“sum of squares” of the independent variable)| |{{regr_sxy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|{{sum(_{{X}}_*_{{Y}}_) - sum(_{{X}}_) * sum(_{{Y}}_)/_{{N}}_}} (“sum of products”of independent times dependent variable)| |{{regr_syy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|{{sum(_{{Y}}_^2) - sum(_{{Y}}_)^2/_{{N}}_}} (“sum of squares” of the dependent variable)| https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE > 
Aggregate Functions for Statistics > -- > > Key: SPARK-28663 > URL: https://issues.apache.org/jira/browse/SPARK-28663 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Argument Type||Return Type||Partial Mode||Description|| > |{{corr(_Y_}}, _{{X}}_)|{{double precision}}|{{double > precision}}|Yes|correlation coefficient| > |{{covar_pop(_Y_}}, _{{X}}_)|{{double precision}}|{{double > precision}}|Yes|population covariance| > |{{covar_samp(_Y_}}, _{{X}}_)|{{double precision}}|{{double > precision}}|Yes|sample covariance| > |{{regr_avgx(_Y_}}, _{{X}}_)|{{double precision}}|{{double > precision}}|Yes|average of the independent variable > ({{sum(_{{X_}})/_{{N}}_}})| > |{{regr_avgy(_Y_}}, _{{X}}_)|{{double precision}}|{{double > precision}}|Yes|average of the dependent variable
[jira] [Created] (SPARK-28663) Aggregate Functions for Statistics
Yuming Wang created SPARK-28663: --- Summary: Aggregate Functions for Statistics Key: SPARK-28663 URL: https://issues.apache.org/jira/browse/SPARK-28663 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang ||Function||Argument Type||Return Type||Partial Mode||Description|| |{{corr(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|correlation coefficient| |{{covar_pop(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|population covariance| |{{covar_samp(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|sample covariance| |{{regr_avgx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|average of the independent variable ({{sum(_{{X}}_)/_{{N}}_}})| |{{regr_avgy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|average of the dependent variable ({{sum(_{{Y}}_)/_{{N}}_}})| |{{regr_count(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{bigint}}|Yes|number of input rows in which both expressions are nonnull| |{{regr_intercept(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|y-intercept of the least-squares-fit linear equation determined by the (_{{X}}_, _{{Y}}_) pairs| |{{regr_r2(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|square of the correlation coefficient| |{{regr_slope(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|slope of the least-squares-fit linear equation determined by the (_{{X}}_, _{{Y}}_) pairs| |{{regr_sxx(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|{{sum(_{{X}}_^2) - sum(_{{X}}_)^2/_{{N}}_}} (“sum of squares” of the independent variable)| |{{regr_sxy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|{{sum(_{{X}}_*_{{Y}}_) - sum(_{{X}}_) * sum(_{{Y}}_)/_{{N}}_}} (“sum of products”of independent times dependent variable)| |{{regr_syy(_{{Y}}_, _{{X}}_)}}|{{double precision}}|{{double precision}}|Yes|{{sum(_{{Y}}_^2) - sum(_{{Y}}_)^2/_{{N}}_}} (“sum 
of squares” of the dependent variable)| https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-AGGREGATE-STATISTICS-TABLE -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
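The sum-of-squares formulas in the table translate directly into code. Below is a minimal pure-Python sketch of a few of these aggregates, written from the definitions above; it is a semantic model only, not Spark's implementation.

```python
import math

def regr_stats(pairs):
    # pairs are (Y, X); rows with a NULL (None) in either column are skipped,
    # matching the "both expressions are nonnull" rule of regr_count
    data = [(y, x) for y, x in pairs if y is not None and x is not None]
    n = len(data)
    sx = sum(x for _, x in data)
    sy = sum(y for y, _ in data)
    sxx = sum(x * x for _, x in data) - sx * sx / n  # regr_sxx
    syy = sum(y * y for y, _ in data) - sy * sy / n  # regr_syy
    sxy = sum(x * y for y, x in data) - sx * sy / n  # regr_sxy
    slope = sxy / sxx
    return {
        "regr_count": n,
        "regr_avgx": sx / n,
        "regr_avgy": sy / n,
        "regr_slope": slope,
        "regr_intercept": sy / n - slope * (sx / n),
        "corr": sxy / math.sqrt(sxx * syy),
    }

# Y = 2 * X exactly, so slope 2, intercept 0, correlation 1;
# the (None, 4.0) row is excluded from the count
stats = regr_stats([(2.0, 1.0), (4.0, 2.0), (6.0, 3.0), (None, 4.0)])
print(stats["regr_count"], stats["regr_slope"], stats["corr"])
```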
[jira] [Created] (SPARK-28662) Create Hive Partitioned Table without specifying data type for partition columns will succeed unexpectedly
Greg Lee created SPARK-28662: Summary: Create Hive Partitioned Table without specifying data type for partition columns will succeed unexpectedly Key: SPARK-28662 URL: https://issues.apache.org/jira/browse/SPARK-28662 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Greg Lee Fix For: 3.0.0 *Case:* Creating a Hive partitioned table without specifying the data type for the partition column will succeed unexpectedly.
{code:java}
// create a hive table partitioned by b, but the data type of b isn't specified
CREATE TABLE tbl(a int) PARTITIONED BY (b) STORED AS parquet
{code}
*Root Cause:* In https://issues.apache.org/jira/browse/SPARK-26435, the PARTITIONED BY clause was extended to support Hive CTAS as follows:
{code:java}
// Before
PARTITIONED BY '(' partitionColumns=colTypeList ')'
// After
PARTITIONED BY '(' partitionColumns=colTypeList ')' | PARTITIONED BY partitionColumnNames=identifierList
{code}
A CREATE TABLE statement like the case above passes the syntax check and is recognized as (PARTITIONED BY partitionColumnNames=identifierList). We should check this case in visitCreateHiveTable and give an explicit error message to the user. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-28661) Hypothetical-Set Aggregate Functions
Yuming Wang created SPARK-28661: --- Summary: Hypothetical-Set Aggregate Functions Key: SPARK-28661 URL: https://issues.apache.org/jira/browse/SPARK-28661 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return Type||Partial Mode||Description|| |{{rank(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} {{"any"}}|{{VARIADIC}} {{"any"}}|{{bigint}}|No|rank of the hypothetical row, with gaps for duplicate rows| |{{dense_rank(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} {{"any"}}|{{VARIADIC}} {{"any"}}|{{bigint}}|No|rank of the hypothetical row, without gaps| |{{percent_rank(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} {{"any"}}|{{VARIADIC}} {{"any"}}|{{double precision}}|No|relative rank of the hypothetical row, ranging from 0 to 1| |{{cume_dist(_args_}}) WITHIN GROUP (ORDER BY {{sorted_args}})|{{VARIADIC}} {{"any"}}|{{VARIADIC}} {{"any"}}|{{double precision}}|No|relative rank of the hypothetical row, ranging from 1/_{{N}}_ to 1| [https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-HYPOTHETICAL-TABLE] -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
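As a semantic illustration (a pure-Python model of the definitions in the table, not Spark code): each hypothetical-set aggregate answers "what would the corresponding window-function value be for a hypothetical row inserted into the group?".

```python
# Model of "rank(v) WITHIN GROUP (ORDER BY x)" and friends for a single
# ordering column, under the usual SQL-standard definitions.
def hypo_rank(values, v):
    return 1 + sum(1 for x in values if x < v)      # gaps for duplicate rows

def hypo_dense_rank(values, v):
    return 1 + len({x for x in values if x < v})    # without gaps

def hypo_percent_rank(values, v):
    n = len(values) + 1                             # group incl. hypothetical row
    return (hypo_rank(values, v) - 1) / (n - 1)     # ranges from 0 to 1

def hypo_cume_dist(values, v):
    n = len(values) + 1
    return (1 + sum(1 for x in values if x <= v)) / n  # ranges from 1/N to 1

group = [10, 20, 30, 40]
# A hypothetical value 25 would land third in this group
print(hypo_rank(group, 25), hypo_percent_rank(group, 25), hypo_cume_dist(group, 25))
```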
[jira] [Updated] (SPARK-27980) Ordered-Set Aggregate Functions
[ https://issues.apache.org/jira/browse/SPARK-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27980: Description: ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return Type||Partial Mode||Description|| |{{mode() WITHIN GROUP (ORDER BY sort_expression)}}| |any sortable type|same as sort expression|No|returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)| |{{percentile_cont(_fraction_}}) WITHIN GROUP (ORDER BY {{sort_expression}})|{{double precision}}|{{double precision}} or {{interval}}|same as sort expression|No|continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed| |{{percentile_cont(_fractions_}}) WITHIN GROUP (ORDER BY {{sort_expression}})|{{double precision[]}}|{{double precision}} or {{interval}}|array of sort expression's type|No|multiple continuous percentile: returns an array of results matching the shape of the _{{fractions}}_ parameter, with each non-null element replaced by the value corresponding to that percentile| |{{percentile_disc(_fraction_}}) WITHIN GROUP (ORDER BY {{sort_expression}})|{{double precision}}|any sortable type|same as sort expression|No|discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction| |{{percentile_disc(_fractions_}}) WITHIN GROUP (ORDER BY {{sort_expression}})|{{double precision[]}}|any sortable type|array of sort expression's type|No|multiple discrete percentile: returns an array of results matching the shape of the _{{fractions}}_ parameter, with each non-null element replaced by the input value corresponding to that percentile| [https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE] was: ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return Type||Partial Mode||Description|| 
|{{mode() WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}| |any sortable type|same as sort expression|No|returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)| |{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or {{interval}}|same as sort expression|No|continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed| |{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or {{interval}}|array of sort expression's type|No|multiple continuous percentile: returns an array of results matching the shape of the _{{fractions}}_ parameter, with each non-null element replaced by the value corresponding to that percentile| |{{percentile_disc(_{{fraction}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision}}|any sortable type|same as sort expression|No|discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction| |{{percentile_disc(_{{fractions}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision[]}}|any sortable type|array of sort expression's type|No|multiple discrete percentile: returns an array of results matching the shape of the _{{fractions}}_ parameter, with each non-null element replaced by the input value corresponding to that percentile| https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE > Ordered-Set Aggregate Functions > --- > > Key: SPARK-27980 > URL: https://issues.apache.org/jira/browse/SPARK-27980 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return > Type||Partial Mode||Description|| > 
|{{mode() WITHIN GROUP (ORDER BY sort_expression)}}| |any sortable type|same > as sort expression|No|returns the most frequent input value (arbitrarily > choosing the first one if there are multiple equally-frequent results)| > |{{percentile_cont(_fraction_}}) WITHIN GROUP (ORDER BY > {{sort_expression}})|{{double precision}}|{{double precision}} or > {{interval}}|same as sort expression|No|continuous percentile: returns a > value corresponding to the specified fraction in the ordering, interpolating > between adjacent input items if needed| > |{{percentile_cont(_fractions_}}) WITHIN GROUP (ORDER BY > {{sort_expression}})|{{double precision[]}}|{{double precision}} or > {{interval}}|array of sort expression's type|No|multiple continuous > percentile: returns an array of
[jira] [Updated] (SPARK-27980) Ordered-Set Aggregate Functions
[ https://issues.apache.org/jira/browse/SPARK-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27980: Description: ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return Type||Partial Mode||Description|| |{{mode() WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}| |any sortable type|same as sort expression|No|returns the most frequent input value (arbitrarily choosing the first one if there are multiple equally-frequent results)| |{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or {{interval}}|same as sort expression|No|continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed| |{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or {{interval}}|array of sort expression's type|No|multiple continuous percentile: returns an array of results matching the shape of the _{{fractions}}_ parameter, with each non-null element replaced by the value corresponding to that percentile| |{{percentile_disc(_{{fraction}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision}}|any sortable type|same as sort expression|No|discrete percentile: returns the first input value whose position in the ordering equals or exceeds the specified fraction| |{{percentile_disc(_{{fractions}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision[]}}|any sortable type|array of sort expression's type|No|multiple discrete percentile: returns an array of results matching the shape of the _{{fractions}}_ parameter, with each non-null element replaced by the input value corresponding to that percentile| https://www.postgresql.org/docs/11/functions-aggregate.html#FUNCTIONS-ORDEREDSET-TABLE was: ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return 
Type||Partial Mode||Description|| |{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER BY _{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or {{interval}}|same as sort expression|No|continuous percentile: returns a value corresponding to the specified fraction in the ordering, interpolating between adjacent input items if needed| |{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or {{interval}}|array of sort expression's type|No|multiple continuous percentile: returns an array of results matching the shape of the _{{fractions}}_ parameter, with each non-null element replaced by the value corresponding to that percentile| Currently, the following DBMSs support the syntax: https://www.postgresql.org/docs/current/functions-aggregate.html https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/RgAqeSpr93jpuGAvDTud3w https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Analytic/PERCENTILE_CONTAnalytic.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAnalytic%20Functions%7C_25 > Ordered-Set Aggregate Functions > --- > > Key: SPARK-27980 > URL: https://issues.apache.org/jira/browse/SPARK-27980 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return > Type||Partial Mode||Description|| > |{{mode() WITHIN GROUP (ORDER BY_{{sort_expression}}_)}}| |any sortable > type|same as sort expression|No|returns the most frequent input value > (arbitrarily choosing the first one if there are multiple equally-frequent > results)| > |{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER > BY_{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or > {{interval}}|same as sort expression|No|continuous percentile: returns a > value 
corresponding to the specified fraction in the ordering, interpolating > between adjacent input items if needed| > |{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER > BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or > {{interval}}|array of sort expression's type|No|multiple continuous > percentile: returns an array of results matching the shape of the > _{{fractions}}_ parameter, with each non-null element replaced by the value > corresponding to that percentile| > |{{percentile_disc(_{{fraction}}_) WITHIN GROUP (ORDER > BY_{{sort_expression}}_)}}|{{double precision}}|any sortable type|same as > sort expression|No|discrete percentile: returns the first input value whose > position in the ordering equals or exceeds the specified fraction| >
[jira] [Updated] (SPARK-27980) Ordered-Set Aggregate Functions
[ https://issues.apache.org/jira/browse/SPARK-27980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27980: Summary: Ordered-Set Aggregate Functions (was: Add built-in Ordered-Set Aggregate Functions: percentile_cont) > Ordered-Set Aggregate Functions > --- > > Key: SPARK-27980 > URL: https://issues.apache.org/jira/browse/SPARK-27980 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > ||Function||Direct Argument Type(s)||Aggregated Argument Type(s)||Return > Type||Partial Mode||Description|| > |{{percentile_cont(_{{fraction}}_) WITHIN GROUP (ORDER BY > _{{sort_expression}}_)}}|{{double precision}}|{{double precision}} or > {{interval}}|same as sort expression|No|continuous percentile: returns a > value corresponding to the specified fraction in the ordering, interpolating > between adjacent input items if needed| > |{{percentile_cont(_{{fractions}}_) WITHIN GROUP (ORDER > BY_{{sort_expression}}_)}}|{{double precision[]}}|{{double precision}} or > {{interval}}|array of sort expression's type|No|multiple continuous > percentile: returns an array of results matching the shape of the > _{{fractions}}_ parameter, with each non-null element replaced by the value > corresponding to that percentile| > Currently, the following DBMSs support the syntax: > https://www.postgresql.org/docs/current/functions-aggregate.html > https://docs.aws.amazon.com/redshift/latest/dg/r_PERCENTILE_CONT.html > https://docs.teradata.com/reader/756LNiPSFdY~4JcCCcR5Cw/RgAqeSpr93jpuGAvDTud3w > https://www.vertica.com/docs/9.2.x/HTML/Content/Authoring/SQLReferenceManual/Functions/Analytic/PERCENTILE_CONTAnalytic.htm?tocpath=SQL%20Reference%20Manual%7CSQL%20Functions%7CAnalytic%20Functions%7C_25 -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
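The continuous-percentile behavior described in the tables above (interpolating between adjacent ordered inputs) can be sketched in standalone Python. This is an illustration of the semantics only, not Spark's or PostgreSQL's implementation:

```python
def percentile_cont(fraction, values):
    """Continuous percentile over an unordered collection of numbers.

    Sorts the inputs and returns the value at the given fraction of the
    ordering, interpolating linearly between the two adjacent inputs when
    the fraction falls between rows -- the behavior described above for
    percentile_cont(fraction) WITHIN GROUP (ORDER BY sort_expression).
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    xs = sorted(values)
    pos = fraction * (len(xs) - 1)   # fractional row position, 0-based
    lo = int(pos)                    # row at or below the position
    frac = pos - lo                  # distance toward the next row
    if frac == 0.0:
        return xs[lo]
    # Linear interpolation between the two adjacent input values.
    return xs[lo] + frac * (xs[lo + 1] - xs[lo])
```

For example, `percentile_cont(0.5, [1, 2, 3, 4])` falls halfway between rows 2 and 3 and interpolates to 2.5, the median.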
[jira] [Created] (SPARK-28660) Add aggregates.sql - Part4
Yuming Wang created SPARK-28660: --- Summary: Add aggregates.sql - Part4 Key: SPARK-28660 URL: https://issues.apache.org/jira/browse/SPARK-28660 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang In this ticket, we plan to add the regression test cases of https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/aggregates.sql#L607-L997
[jira] [Commented] (SPARK-28659) insert overwrite directory using stored as parquet does not create snappy.parquet data file at HDFS side
[ https://issues.apache.org/jira/browse/SPARK-28659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902929#comment-16902929 ] Udbhav Agrawal commented on SPARK-28659: I will work on this > insert overwrite directory using stored as parquet does not create > snappy.parquet data file at HDFS side > > > Key: SPARK-28659 > URL: https://issues.apache.org/jira/browse/SPARK-28659 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Major > > 1. insert overwrite directory '/opt/trash_u/' using parquet select 2; > 2. Check at hdfs side ./hdfs dfs -ls /opt/trash_u > data file created with snappy.parquet as below > /opt/trash_u/part-0-6de61796-4ebd-40b9-a303-d53182c89332-c000.snappy.parquet > 3. insert overwrite directory '/opt/trash_u/' stored as parquet select 2; > 4. Check at hdfs side ./hdfs dfs -ls /opt/trash_u, data file created > without the snappy.parquet suffix, as below > /opt/trash_u/part-0-50d5d863-0389-4cba-ae5f-ea3f89cd2eab-c000
[jira] [Created] (SPARK-28659) insert overwrite directory using stored as parquet does not create snappy.parquet data file at HDFS side
ABHISHEK KUMAR GUPTA created SPARK-28659: Summary: insert overwrite directory using stored as parquet does not create snappy.parquet data file at HDFS side Key: SPARK-28659 URL: https://issues.apache.org/jira/browse/SPARK-28659 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Reporter: ABHISHEK KUMAR GUPTA 1. insert overwrite directory '/opt/trash_u/' using parquet select 2; 2. Check at hdfs side ./hdfs dfs -ls /opt/trash_u data file created with snappy.parquet as below /opt/trash_u/part-0-6de61796-4ebd-40b9-a303-d53182c89332-c000.snappy.parquet 3. insert overwrite directory '/opt/trash_u/' stored as parquet select 2; 4. Check at hdfs side ./hdfs dfs -ls /opt/trash_u, data file created without the snappy.parquet suffix, as below /opt/trash_u/part-0-50d5d863-0389-4cba-ae5f-ea3f89cd2eab-c000
[jira] [Resolved] (SPARK-28654) Move "Extract Python UDFs" to the last in optimizer
[ https://issues.apache.org/jira/browse/SPARK-28654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-28654. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25386 [https://github.com/apache/spark/pull/25386] > Move "Extract Python UDFs" to the last in optimizer > --- > > Key: SPARK-28654 > URL: https://issues.apache.org/jira/browse/SPARK-28654 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.0.0 > > > Plans after "Extract Python UDFs" are very flaky and error-prone to other > plans. For instance, > if we add some rules, for instance, [{PushDownPredicates}}, > The optimization is rolled back as below: > {code} > === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates > === > !Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18)) Join Cross, > (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18)) > !+- Join Cross :- Project [_1#2 AS > a#7, _2#3 AS b#8] > ! :- Project [_1#2 AS a#7, _2#3 AS b#8] : +- LocalRelation > [_1#2, _2#3] > ! : +- LocalRelation [_1#2, _2#3] +- Project [_1#13 AS > c#18, _2#14 AS d#19] > ! +- Project [_1#13 AS c#18, _2#14 AS d#19] +- LocalRelation > [_1#13, _2#14] > ! +- LocalRelation [_1#13, _2#14] > {code} > Seems we should do Python UDFs cases at the last even after post hoc rules. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28654) Move "Extract Python UDFs" to the last in optimizer
[ https://issues.apache.org/jira/browse/SPARK-28654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-28654: --- Assignee: Hyukjin Kwon > Move "Extract Python UDFs" to the last in optimizer > --- > > Key: SPARK-28654 > URL: https://issues.apache.org/jira/browse/SPARK-28654 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Plans after "Extract Python UDFs" are very flaky and error-prone to other > plans. For instance, > if we add some rules, for instance, [{PushDownPredicates}}, > The optimization is rolled back as below: > {code} > === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates > === > !Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18)) Join Cross, > (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18)) > !+- Join Cross :- Project [_1#2 AS > a#7, _2#3 AS b#8] > ! :- Project [_1#2 AS a#7, _2#3 AS b#8] : +- LocalRelation > [_1#2, _2#3] > ! : +- LocalRelation [_1#2, _2#3] +- Project [_1#13 AS > c#18, _2#14 AS d#19] > ! +- Project [_1#13 AS c#18, _2#14 AS d#19] +- LocalRelation > [_1#13, _2#14] > ! +- LocalRelation [_1#13, _2#14] > {code} > Seems we should do Python UDFs cases at the last even after post hoc rules. -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Description: When running Spark on YARN, I got {code:java} // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ {code} {{Utils.classForName returns Class[Nothing], I think it should be defined as Class[_] to resolve this issue}} was: When running Spark on YARN, I got ``` {{java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ }} {{```}} {{```Utils.classForName``` returns Class[Nothing], I think it should be defined as Class[_] to resolve this issue}} > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor > Attachments: warn.jpg > > > When running Spark on YARN, I got > {code:java} > // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder > cannot be cast to scala.runtime.Nothing$ > {code} > > {{Utils.classForName returns Class[Nothing], I think it should be defined as > Class[_] to resolve this issue}}
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Description: When running Spark on YARN, I got ``` {{java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ }} {{```}} {{```Utils.classForName``` returns Class[Nothing], I think it should be defined as Class[_] to resolve this issue}} was:When running Spark on YARN, I got > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor > Attachments: warn.jpg > > > When running Spark on YARN, I got > ``` > {{java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder > cannot be cast to scala.runtime.Nothing$ }} > {{```}} > {{```Utils.classForName``` returns Class[Nothing], I think it should be defined > as Class[_] to resolve this issue}}
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Description: When running Spark on YARN, I got {code:java} // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ {code} !warn.jpg! {{Utils.classForName returns Class[Nothing], I think it should be defined as Class[_] to resolve this issue}} was: When running Spark on YARN, I got {code:java} // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ {code} {{Utils.classForName returns Class[Nothing], I think it should be defined as Class[_] to resolve this issue}} > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor > Attachments: warn.jpg > > > When running Spark on YARN, I got > {code:java} > // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder > cannot be cast to scala.runtime.Nothing$ > {code} > !warn.jpg! > {{Utils.classForName returns Class[Nothing], I think it should be defined as > Class[_] to resolve this issue}}
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Attachment: warn.jpg > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor > Attachments: warn.jpg > > > When running Spark on YARN, I got > {code:java} > // java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder > cannot be cast to scala.runtime.Nothing$ > {code} > > {{Utils.classForName returns Class[Nothing], I think it should be defined as > Class[_] to resolve this issue}}
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Description: When running Spark on YARN, I got java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ !image-2019-08-08-17-39-19-552.png! was: When running Spark on YARN, I got """ java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ """ !image-2019-08-08-17-39-19-552.png! > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor > > When running Spark on YARN, I got > java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder > cannot be cast to scala.runtime.Nothing$ > !image-2019-08-08-17-39-19-552.png!
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Description: When running Spark on YARN, I got (was: When running Spark on YARN, I got java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ !image-2019-08-08-17-39-19-552.png!) > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor > > When running Spark on YARN, I got
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Description: When running Spark on YARN, I got """ java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ """ !image-2019-08-08-17-39-19-552.png! > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor > > When running Spark on YARN, I got > """ > java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder > cannot be cast to scala.runtime.Nothing$ > """ > > !image-2019-08-08-17-39-19-552.png!
[jira] [Updated] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode
[ https://issues.apache.org/jira/browse/SPARK-28658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] deshanxiao updated SPARK-28658: --- Description: In yarn-client mode, the finalStatus of the application will always be success because the ApplicationMaster returns success when the driver disconnects. A simple example is that: {code:java} sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect {code} When we run the code in yarn-client mode, the finalStatus will be success. It misleads us. Maybe we can use a clearer state, not "success". was: In yarn-client mode, the finalStatus of the application will always be success because the ApplicationMaster returns success when the driver disconnects. A simple example is that: {code:java} sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect {code} When we run the code in yarn-client mode, the finalStatus will be success. It misleads us. > Yarn FinalStatus is always "success" in yarn-client mode > -- > > Key: SPARK-28658 > URL: https://issues.apache.org/jira/browse/SPARK-28658 > Project: Spark > Issue Type: Improvement > Components: YARN >Affects Versions: 3.0.0 >Reporter: deshanxiao >Priority: Major > > In yarn-client mode, the finalStatus of the application will always be success > because the ApplicationMaster returns success when the driver disconnects. > A simple example is that: > {code:java} > sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect > {code} > When we run the code in yarn-client mode, the finalStatus will be success. It > misleads us. Maybe we can use a clearer state, not "success".
[jira] [Updated] (SPARK-28657) Fix currentContext Instance failed sometimes
[ https://issues.apache.org/jira/browse/SPARK-28657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hong dongdong updated SPARK-28657: -- Environment: was: When running Spark on YARN, I got """ java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ """ !image-2019-08-08-17-39-19-552.png! > Fix currentContext Instance failed sometimes > > > Key: SPARK-28657 > URL: https://issues.apache.org/jira/browse/SPARK-28657 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 > Environment: > >Reporter: hong dongdong >Priority: Minor >
[jira] [Created] (SPARK-28658) Yarn FinalStatus is always "success" in yarn-client mode
deshanxiao created SPARK-28658: -- Summary: Yarn FinalStatus is always "success" in yarn-client mode Key: SPARK-28658 URL: https://issues.apache.org/jira/browse/SPARK-28658 Project: Spark Issue Type: Improvement Components: YARN Affects Versions: 3.0.0 Reporter: deshanxiao In yarn-client mode, the finalStatus of the application will always be success because the ApplicationMaster returns success when the driver disconnects. A simple example is that: {code:java} sc.parallelize(Seq(1, 3, 4, 5)).map(x => x / 0).collect {code} When we run the code in yarn-client mode, the finalStatus will be success. It misleads us.
[jira] [Created] (SPARK-28657) Fix currentContext Instance failed sometimes
hong dongdong created SPARK-28657: - Summary: Fix currentContext Instance failed sometimes Key: SPARK-28657 URL: https://issues.apache.org/jira/browse/SPARK-28657 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0 Environment: When running Spark on YARN, I got """ java.lang.ClassCastException: org.apache.hadoop.ipc.CallerContext$Builder cannot be cast to scala.runtime.Nothing$ """ !image-2019-08-08-17-39-19-552.png! Reporter: hong dongdong
[jira] [Resolved] (SPARK-28644) Port HIVE-10646: ColumnValue does not handle NULL_TYPE
[ https://issues.apache.org/jira/browse/SPARK-28644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-28644. -- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 25378 [https://github.com/apache/spark/pull/25378] > Port HIVE-10646: ColumnValue does not handle NULL_TYPE > -- > > Key: SPARK-28644 > URL: https://issues.apache.org/jira/browse/SPARK-28644 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.0.0 > > > Port HIVE-10646 to fix Hive 0.12's JDBC client can not handle NULL_TYPE: > {code:sql} > Connected to: Hive (version 3.0.0-SNAPSHOT) > Driver: Hive (version 0.12.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 0.12.0 by Apache Hive > 0: jdbc:hive2://localhost:1> select null; > org.apache.thrift.transport.TTransportException > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346) > at > org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423) > at > org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405) > {code} > Server log: > {noformat} > 19/08/07 09:34:07 ERROR TThreadPoolServer: Error occurred during processing > of message. 
> java.lang.NullPointerException > at > org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388) > at > org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338) > at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288) > at > org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605) > at > org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525) > at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455) > at > org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550) > at > org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486) > at > org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412) > at > org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13192) > at > org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13156) > at > org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13107) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:58) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-28644) Port HIVE-10646: ColumnValue does not handle NULL_TYPE
[ https://issues.apache.org/jira/browse/SPARK-28644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-28644: Assignee: Yuming Wang > Port HIVE-10646: ColumnValue does not handle NULL_TYPE > -- > > Key: SPARK-28644 > URL: https://issues.apache.org/jira/browse/SPARK-28644 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > Port HIVE-10646 to fix Hive 0.12's JDBC client can not handle NULL_TYPE: > {code:sql} > Connected to: Hive (version 3.0.0-SNAPSHOT) > Driver: Hive (version 0.12.0) > Transaction isolation: TRANSACTION_REPEATABLE_READ > Beeline version 0.12.0 by Apache Hive > 0: jdbc:hive2://localhost:1> select null; > org.apache.thrift.transport.TTransportException > at > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) > at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) > at > org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346) > at > org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423) > at > org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405) > {code} > Server log: > {noformat} > 19/08/07 09:34:07 ERROR TThreadPoolServer: Error occurred during processing > of message. 
> java.lang.NullPointerException > at > org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:388) > at > org.apache.hive.service.cli.thrift.TRow$TRowStandardScheme.write(TRow.java:338) > at org.apache.hive.service.cli.thrift.TRow.write(TRow.java:288) > at > org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:605) > at > org.apache.hive.service.cli.thrift.TRowSet$TRowSetStandardScheme.write(TRowSet.java:525) > at org.apache.hive.service.cli.thrift.TRowSet.write(TRowSet.java:455) > at > org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:550) > at > org.apache.hive.service.cli.thrift.TFetchResultsResp$TFetchResultsRespStandardScheme.write(TFetchResultsResp.java:486) > at > org.apache.hive.service.cli.thrift.TFetchResultsResp.write(TFetchResultsResp.java:412) > at > org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13192) > at > org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result$FetchResults_resultStandardScheme.write(TCLIService.java:13156) > at > org.apache.hive.service.cli.thrift.TCLIService$FetchResults_result.write(TCLIService.java:13107) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:58) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:310) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:819) > {noformat} -- This message was sent by Atlassian JIRA (v7.6.14#76016) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Updated] (SPARK-28656) Support `millennium`, `century` and `decade` at `extract()`
[ https://issues.apache.org/jira/browse/SPARK-28656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-28656: --- Summary: Support `millennium`, `century` and `decade` at `extract()` (was: Support `millenium`, `century` and `decade` at `extract()`) > Support `millennium`, `century` and `decade` at `extract()` > --- > > Key: SPARK-28656 > URL: https://issues.apache.org/jira/browse/SPARK-28656 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, we support these fields for EXTRACT: YEAR, QUARTER, MONTH, WEEK, > DAY, DAYOFWEEK, HOUR, MINUTE, SECOND. > We also need to support: EPOCH, CENTURY, MILLENNIUM, DECADE, MICROSECONDS, > MILLISECONDS, DOW, ISODOW, DOY, TIMEZONE, TIMEZONE_M, TIMEZONE_H, JULIAN, > ISOYEAR. > https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
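As background for the fields discussed above, here is a minimal Python sketch of the PostgreSQL semantics for DECADE, CENTURY, and MILLENNIUM over positive (AD) years. It is an illustration only, not Spark's implementation; BC years are out of scope here:

```python
def extract_fields(year):
    """Decade, century, and millennium of a positive (AD) year.

    Follows PostgreSQL's extract() semantics: the decade is simply the
    year divided by 10 (floor), while centuries and millennia start in
    years ending in 1, so 2000 still belongs to the 20th century and
    2nd millennium, and 2001 opens the 21st century / 3rd millennium.
    """
    decade = year // 10
    century = (year + 99) // 100
    millennium = (year + 999) // 1000
    return decade, century, millennium
```

For example, `extract_fields(2019)` yields `(201, 21, 3)`, matching `extract(decade|century|millennium from date '2019-01-01')` in PostgreSQL.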
[jira] [Resolved] (SPARK-28474) Lower JDBC client cannot read binary type
[ https://issues.apache.org/jira/browse/SPARK-28474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-28474.
----------------------------------
    Resolution: Fixed
 Fix Version/s: 3.0.0

Issue resolved by pull request 25379
[https://github.com/apache/spark/pull/25379]

> Lower JDBC client cannot read binary type
> -----------------------------------------
>
>                 Key: SPARK-28474
>                 URL: https://issues.apache.org/jira/browse/SPARK-28474
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Assignee: Yuming Wang
>            Priority: Major
>             Fix For: 3.0.0
>
> Logs:
> {noformat}
> java.lang.RuntimeException: java.lang.ClassCastException: [B incompatible
> with java.lang.String
>         at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:83)
>         at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
>         at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
>         at java.security.AccessController.doPrivileged(AccessController.java:770)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>         at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
>         at com.sun.proxy.$Proxy26.fetchResults(Unknown Source)
>         at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:455)
>         at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:621)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
>         at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
>         at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
>         at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
>         at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:53)
>         at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:819)
> Caused by: java.lang.ClassCastException: [B incompatible with java.lang.String
>         at org.apache.hive.service.cli.ColumnValue.toTColumnValue(ColumnValue.java:198)
>         at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:60)
>         at org.apache.hive.service.cli.RowBasedSet.addRow(RowBasedSet.java:32)
>         at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.getNextRowSet(SparkExecuteStatementOperation.scala:148)
>         at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:220)
>         at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:785)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
>         ... 18 more
> {noformat}
[jira] [Assigned] (SPARK-28474) Lower JDBC client cannot read binary type
[ https://issues.apache.org/jira/browse/SPARK-28474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-28474:
------------------------------------

    Assignee: Yuming Wang
[jira] [Commented] (SPARK-28655) Support to cut the event log, and solve the history server was too slow when event log is too large.
[ https://issues.apache.org/jira/browse/SPARK-28655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902774#comment-16902774 ]

Shao commented on SPARK-28655:
------------------------------

[https://github.com/apache/spark/pull/25387]

> Support to cut the event log, and solve the history server was too slow when
> event log is too large.
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-28655
>                 URL: https://issues.apache.org/jira/browse/SPARK-28655
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.3
>            Reporter: Shao
>            Priority: Major
[jira] [Created] (SPARK-28656) Support `millenium`, `century` and `decade` at `extract()`
Maxim Gekk created SPARK-28656:
----------------------------------

             Summary: Support `millenium`, `century` and `decade` at `extract()`
                 Key: SPARK-28656
                 URL: https://issues.apache.org/jira/browse/SPARK-28656
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Maxim Gekk
            Assignee: Maxim Gekk
             Fix For: 3.0.0


Currently, we support these fields for EXTRACT: YEAR, QUARTER, MONTH, WEEK, DAY, DAYOFWEEK, HOUR, MINUTE, SECOND.

We also need to support: EPOCH, CENTURY, MILLENNIUM, DECADE, MICROSECONDS, MILLISECONDS, DOW, ISODOW, DOY, TIMEZONE, TIMEZONE_M, TIMEZONE_H, JULIAN, ISOYEAR.

https://www.postgresql.org/docs/11/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT
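For reference, the PostgreSQL semantics the new fields would need to match can be sketched in Python. This is an illustrative sketch only (positive/AD years; BC years need separate handling), not Spark's implementation:

```python
def decade(year: int) -> int:
    # DECADE is simply the year divided by 10: 2019 -> 201.
    return year // 10

def century(year: int) -> int:
    # In PostgreSQL the 21st century starts on 2001-01-01,
    # so the year 2000 is still the 20th century.
    return (year + 99) // 100

def millennium(year: int) -> int:
    # Likewise, the 3rd millennium starts on 2001-01-01.
    return (year + 999) // 1000

assert decade(2019) == 201
assert century(2000) == 20 and century(2001) == 21
assert millennium(2000) == 2 and millennium(2001) == 3
```

The off-by-one behavior at century and millennium boundaries is the part most easily gotten wrong when implementing the EXTRACT fields.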
[jira] [Created] (SPARK-28655) Support to cut the event log, and solve the history server was too slow when event log is too large.
Shao created SPARK-28655:
----------------------------

             Summary: Support to cut the event log, and solve the history server was too slow when event log is too large.
                 Key: SPARK-28655
                 URL: https://issues.apache.org/jira/browse/SPARK-28655
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 2.4.3
            Reporter: Shao
[jira] [Created] (SPARK-28654) Move "Extract Python UDFs" to the last in optimizer
Hyukjin Kwon created SPARK-28654:
------------------------------------

             Summary: Move "Extract Python UDFs" to the last in optimizer
                 Key: SPARK-28654
                 URL: https://issues.apache.org/jira/browse/SPARK-28654
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.0.0
            Reporter: Hyukjin Kwon


Plans produced after "Extract Python UDFs" are fragile and error-prone with respect to other rules. For instance, if we add a rule such as {{PushDownPredicates}}, the optimization is rolled back as below:

{code}
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.PushDownPredicates ===
!Filter (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))   Join Cross, (dummyUDF(a#7, c#18) = dummyUDF(d#19, c#18))
!+- Join Cross                                         :- Project [_1#2 AS a#7, _2#3 AS b#8]
!   :- Project [_1#2 AS a#7, _2#3 AS b#8]              :  +- LocalRelation [_1#2, _2#3]
!   :  +- LocalRelation [_1#2, _2#3]                   +- Project [_1#13 AS c#18, _2#14 AS d#19]
!   +- Project [_1#13 AS c#18, _2#14 AS d#19]             +- LocalRelation [_1#13, _2#14]
!      +- LocalRelation [_1#13, _2#14]
{code}

It seems we should handle the Python UDF cases last, even after post-hoc rules.
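The ordering problem described above can be illustrated with a toy Python sketch. The string rewrites and rule names here are purely illustrative (this is not Spark's plan representation or optimizer API); they only demonstrate why a later rule batch can undo an earlier extraction, and why the extraction rule must therefore run last:

```python
def extract_udf_filter(plan: str) -> str:
    # "Extraction": pull the UDF-bearing condition out of the Join
    # into a separate Filter node above it.
    return plan.replace("Join[udf_cond]", "Filter[udf_cond](Join[])")

def push_down_predicates(plan: str) -> str:
    # A predicate-pushdown rule: pushes any Filter back into the Join
    # it sits on -- which undoes the extraction if run afterwards.
    return plan.replace("Filter[udf_cond](Join[])", "Join[udf_cond]")

plan = "Join[udf_cond]"

# Order 1: extraction first, then the other batch -> extraction rolled back.
bad = push_down_predicates(extract_udf_filter(plan))
assert bad == "Join[udf_cond]"

# Order 2: extraction last -> the extracted Filter survives.
good = extract_udf_filter(push_down_predicates(plan))
assert good == "Filter[udf_cond](Join[])"
```

The same reasoning is why the JIRA proposes moving "Extract Python UDFs" after all other optimizer batches, including post-hoc rules.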
[jira] [Created] (SPARK-28653) Create table using DDL statement should not auto create the destination folder
Thanida created SPARK-28653:
-------------------------------

             Summary: Create table using DDL statement should not auto create the destination folder
                 Key: SPARK-28653
                 URL: https://issues.apache.org/jira/browse/SPARK-28653
             Project: Spark
          Issue Type: Question
          Components: Spark Core
    Affects Versions: 2.4.3
            Reporter: Thanida


When I create an external table using the following DDL statement, the destination path is auto-created:

{code:java}
CREATE TABLE ${tableName} USING parquet LOCATION ${path}
{code}

But if I specify the file format as csv or json, the destination path is not created:

{code:java}
CREATE TABLE ${tableName} USING CSV LOCATION ${path}
{code}
[jira] [Commented] (SPARK-28330) ANSI SQL: Top-level in
[ https://issues.apache.org/jira/browse/SPARK-28330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16902711#comment-16902711 ]

jiaan.geng commented on SPARK-28330:
------------------------------------

I'm working on it.

> ANSI SQL: Top-level in
> ----------------------
>
>                 Key: SPARK-28330
>                 URL: https://issues.apache.org/jira/browse/SPARK-28330
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Priority: Major
>
> h2. {{LIMIT}} and {{OFFSET}}
> LIMIT and OFFSET allow you to retrieve just a portion of the rows that are
> generated by the rest of the query:
> {noformat}
> SELECT select_list
>     FROM table_expression
>     [ ORDER BY ... ]
>     [ LIMIT { number | ALL } ] [ OFFSET number ]
> {noformat}
> If a limit count is given, no more than that many rows will be returned (but
> possibly fewer, if the query itself yields fewer rows). LIMIT ALL is the same
> as omitting the LIMIT clause, as is LIMIT with a NULL argument.
> OFFSET says to skip that many rows before beginning to return rows. OFFSET 0
> is the same as omitting the OFFSET clause, as is OFFSET with a NULL argument.
> If both OFFSET and LIMIT appear, then OFFSET rows are skipped before starting
> to count the LIMIT rows that are returned.
> https://www.postgresql.org/docs/11/queries-limit.html
> *Feature ID*: F861
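The LIMIT/OFFSET semantics quoted from the PostgreSQL docs can be sketched in a few lines of Python. This is an illustrative model of the clause semantics, not Spark's execution:

```python
def limit_offset(rows, limit=None, offset=None):
    # Omitted/NULL OFFSET behaves like OFFSET 0;
    # omitted/NULL LIMIT behaves like LIMIT ALL.
    start = offset or 0
    if limit is None:
        return rows[start:]
    # OFFSET rows are skipped before counting the LIMIT rows returned.
    return rows[start:start + limit]

rows = list(range(1, 11))  # rows 1..10
assert limit_offset(rows, limit=3, offset=2) == [3, 4, 5]  # skip 2, take 3
assert limit_offset(rows, offset=8) == [9, 10]
assert limit_offset(rows, limit=3, offset=20) == []        # offset past the end
assert limit_offset(rows) == rows                          # LIMIT ALL, OFFSET 0
```

Note that, as the quoted docs say, a query may return fewer than LIMIT rows when the input itself yields fewer rows after the offset is applied.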