[jira] [Resolved] (SPARK-25817) Dataset encoder should support combination of map and product type
[ https://issues.apache.org/jira/browse/SPARK-25817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-25817. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22812 [https://github.com/apache/spark/pull/22812] > Dataset encoder should support combination of map and product type > -- > > Key: SPARK-25817 > URL: https://issues.apache.org/jira/browse/SPARK-25817 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.4.0 > Reporter: Wenchen Fan > Assignee: Wenchen Fan > Priority: Major > Fix For: 3.0.0
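A minimal sketch of the kind of encoder the summary describes — a product (case class) type combined with a map — assuming a local SparkSession; the case class shapes are illustrative, not taken from the pull request:

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative product types: a case class nested as a map value.
case class Inner(a: Int, b: String)
case class Outer(id: Long, props: Map[String, Inner])

object EncoderMapProductDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("encoder-demo").getOrCreate()
    import spark.implicits._

    // With the fix, the implicit encoder for a map-of-product type can be
    // derived and the value round-trips through a Dataset.
    val ds = Seq(Outer(1L, Map("x" -> Inner(1, "one")))).toDS()
    ds.show(false)
    spark.stop()
  }
}
{code}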
[jira] [Commented] (SPARK-25833) Views without column names created by Hive are not readable by Spark
[ https://issues.apache.org/jira/browse/SPARK-25833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1291#comment-1291 ] Chenxiao Mao commented on SPARK-25833: -- [~dkbiswal] Thanks for your comments. I think you are right that this is a duplicate. Does it make sense to describe this compatibility issue explicitly in the user guide to help users troubleshoot it?
> Views without column names created by Hive are not readable by Spark
>
> Key: SPARK-25833
> URL: https://issues.apache.org/jira/browse/SPARK-25833
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2
> Reporter: Chenxiao Mao
> Priority: Major
>
> A simple example to reproduce this issue.
> Create a view via the Hive CLI:
> {code:sql}
> hive> CREATE VIEW v1 AS SELECT * FROM (SELECT 1) t1
> {code}
> Query that view via Spark:
> {code:sql}
> spark-sql> select * from v1;
> Error in query: cannot resolve '`t1._c0`' given input columns: [1]; line 1 pos 7;
> 'Project [*]
> +- 'SubqueryAlias v1, `default`.`v1`
>    +- 'Project ['t1._c0]
>       +- SubqueryAlias t1
>          +- Project [1 AS 1#41]
>             +- OneRowRelation$
> {code}
> Check the view definition:
> {code:sql}
> hive> desc extended v1;
> OK
> _c0 int
> ...
> viewOriginalText:SELECT * FROM (SELECT 1) t1,
> viewExpandedText:SELECT `t1`.`_c0` FROM (SELECT 1) `t1`
> ...
> {code}
> The _c0 in the above view definition is automatically generated by Hive and is not recognizable by Spark.
> See the [Hive LanguageManual|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=30746446&navigatingVersions=true#LanguageManualDDL-CreateView] for more details:
> {quote}If no column names are supplied, the names of the view's columns will be derived automatically from the defining SELECT expression. (If the SELECT contains unaliased scalar expressions such as x+y, the resulting view column names will be generated in the form _C0, _C1, etc.)
> {quote}
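A sketch of the aliasing workaround implied above: giving the scalar expression an explicit alias keeps Hive from generating the `_c0` name, so the view resolves from Spark. The session setup is illustrative and assumes Hive support is enabled:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()
  .getOrCreate()

// Aliasing the scalar expression avoids Hive's auto-generated _c0 column name.
spark.sql("CREATE VIEW v1_aliased AS SELECT * FROM (SELECT 1 AS c1) t1")
spark.sql("SELECT * FROM v1_aliased").show()
{code}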
[jira] [Updated] (SPARK-25778) WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS
[ https://issues.apache.org/jira/browse/SPARK-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Senia updated SPARK-25778: --- Attachment: SPARK-25778.patch
> WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS
> -
>
> Key: SPARK-25778
> URL: https://issues.apache.org/jira/browse/SPARK-25778
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming, YARN
> Affects Versions: 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 2.2.1, 2.2.2, 2.3.1, 2.3.2
> Reporter: Greg Senia
> Priority: Major
> Attachments: SPARK-25778.patch
>
> WriteAheadLogBackedBlockRDD in YARN cluster mode fails due to lack of access to an HDFS path, because the temp directory it derives shares a name with the $PWD folder of the YARN AM container.
> While attempting to use Spark Streaming with WriteAheadLogs, I noticed the following errors after the driver attempted to recover the already-read data that was being written to HDFS in the checkpoint folder. After spending many hours looking at the cause of the error below, which occurs because the parent folder /hadoop exists in our HDFS filesystem, I wonder if it is possible to make this configurable, so that an alternate bogus directory that will never be used can be chosen.
> hadoop fs -ls /
> drwx-- - dsadm dsadm 0 2017-06-20 13:20 /hadoop
> hadoop fs -ls /hadoop/apps
> drwx-- - dsadm dsadm 0 2017-06-20 13:20 /hadoop/apps
> streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala
> val nonExistentDirectory = new File(
>   System.getProperty("java.io.tmpdir"),
>   UUID.randomUUID().toString).getAbsolutePath
> writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
>   SparkEnv.get.conf, nonExistentDirectory, hadoopConf)
> dataRead = writeAheadLog.read(partition.walRecordHandle)
> 18/10/19 00:03:03 DEBUG YarnSchedulerBackend$YarnDriverEndpoint: Launching task 72 on executor id: 1 hostname: ha20t5002dn.tech.hdp.example.com.
> 18/10/19 00:03:03 DEBUG BlockManager: Getting local block broadcast_4_piece0 as bytes
> 18/10/19 00:03:03 DEBUG BlockManager: Level for block broadcast_4_piece0 is StorageLevel(disk, memory, 1 replicas)
> 18/10/19 00:03:03 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on ha20t5002dn.tech.hdp.example.com:32768 (size: 33.7 KB, free: 912.2 MB)
> 18/10/19 00:03:03 WARN TaskSetManager: Lost task 0.0 in stage 3.0 (TID 71, ha20t5002dn.tech.hdp.example.com, executor 1): org.apache.spark.SparkException: Could not read data from write ahead log record FileBasedWriteAheadLogSegment(hdfs://tech/user/hdpdevspark/sparkstreaming/Spark_Streaming_MQ_IDMS/receivedData/0/log-1539921695606-1539921755606,0,1017)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.org$apache$spark$streaming$rdd$WriteAheadLogBackedBlockRDD$$getBlockFromWriteAheadLog$1(WriteAheadLogBackedBlockRDD.scala:145)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:173)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD$$anonfun$compute$1.apply(WriteAheadLogBackedBlockRDD.scala:173)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.streaming.rdd.WriteAheadLogBackedBlockRDD.compute(WriteAheadLogBackedBlockRDD.scala:173)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:108)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.security.AccessControlException: Permission denied: user=hdpdevspark, access=EXECUTE, inode="/hadoop/diskc/hadoop/yarn/local/usercache/hdpdevspark/appcache/application_1539554105597_0338/container_e322_1539554105597_0338_01_02/tmp/170f36b8-9202-4556-89a4-64587c7136b6":dsadm:dsadm:drwx--
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
> at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)
> at org.apache.ranger.authorizati
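A rough sketch of the configurability being requested, adapted from the snippet quoted above (this is not an existing Spark setting; the config key is a placeholder):

{code:scala}
// Inside WriteAheadLogBackedBlockRDD, hypothetically read the base directory
// from SparkConf instead of java.io.tmpdir, which in a YARN container resolves
// under the container's $PWD and can collide with a restricted HDFS path.
val conf = SparkEnv.get.conf
val tmpBase = conf.get(
  "spark.streaming.receiver.wal.tmpDir",  // placeholder key, not a real Spark conf
  System.getProperty("java.io.tmpdir"))
val nonExistentDirectory = new java.io.File(
  tmpBase, java.util.UUID.randomUUID().toString).getAbsolutePath
writeAheadLog = WriteAheadLogUtils.createLogForReceiver(
  conf, nonExistentDirectory, hadoopConf)
dataRead = writeAheadLog.read(partition.walRecordHandle)
{code}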
[jira] [Commented] (SPARK-25778) WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS
[ https://issues.apache.org/jira/browse/SPARK-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1285#comment-1285 ] Apache Spark commented on SPARK-25778: -- User 'gss2002' has created a pull request for this issue: https://github.com/apache/spark/pull/22867
[jira] [Assigned] (SPARK-25778) WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS
[ https://issues.apache.org/jira/browse/SPARK-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25778: Assignee: Apache Spark
[jira] [Assigned] (SPARK-25778) WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS
[ https://issues.apache.org/jira/browse/SPARK-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25778: Assignee: (was: Apache Spark)
[jira] [Updated] (SPARK-25778) WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS
[ https://issues.apache.org/jira/browse/SPARK-25778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Senia updated SPARK-25778: --- Summary: WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access to tmpDir from $PWD to HDFS (was: WriteAheadLogBackedBlockRDD in YARN Cluster Mode Fails due to lack of access)
[jira] [Assigned] (SPARK-19851) Add support for EVERY and ANY (SOME) aggregates
[ https://issues.apache.org/jira/browse/SPARK-19851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-19851: --- Assignee: Dilip Biswal > Add support for EVERY and ANY (SOME) aggregates > --- > > Key: SPARK-19851 > URL: https://issues.apache.org/jira/browse/SPARK-19851 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL > Affects Versions: 2.1.0 > Reporter: Michael Styles > Assignee: Dilip Biswal > Priority: Major > Fix For: 3.0.0 > > > Add support for EVERY and ANY (SOME) aggregates. > - EVERY returns true if all input values are true. > - ANY returns true if at least one input value is true. > - SOME is equivalent to ANY. > Both aggregates are part of the SQL standard.
[jira] [Resolved] (SPARK-19851) Add support for EVERY and ANY (SOME) aggregates
[ https://issues.apache.org/jira/browse/SPARK-19851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-19851. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22809 [https://github.com/apache/spark/pull/22809]
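A small usage sketch of the semantics listed in the issue above, assuming a local SparkSession; the expected results in the comments follow directly from the definitions:

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("every-any-demo").getOrCreate()

// even is true for 2 and 4, false for 1 and 3.
spark.range(1, 5).selectExpr("id % 2 = 0 AS even").createOrReplaceTempView("t")

// every(even) -> false (not all inputs are true)
// any(even) / some(even) -> true (at least one input is true)
spark.sql("SELECT every(even), any(even), some(even) FROM t").show()
{code}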
[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs
[ https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1252#comment-1252 ] Apache Spark commented on SPARK-12172: -- User 'felixcheung' has created a pull request for this issue: https://github.com/apache/spark/pull/22866 > Consider removing SparkR internal RDD APIs > -- > > Key: SPARK-12172 > URL: https://issues.apache.org/jira/browse/SPARK-12172 > Project: Spark > Issue Type: Task > Components: SparkR > Reporter: Felix Cheung > Priority: Major
[jira] [Assigned] (SPARK-12172) Consider removing SparkR internal RDD APIs
[ https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12172: Assignee: (was: Apache Spark) > Consider removing SparkR internal RDD APIs > -- > > Key: SPARK-12172 > URL: https://issues.apache.org/jira/browse/SPARK-12172 > Project: Spark > Issue Type: Task > Components: SparkR > Reporter: Felix Cheung > Priority: Major
[jira] [Assigned] (SPARK-12172) Consider removing SparkR internal RDD APIs
[ https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12172: Assignee: Apache Spark > Consider removing SparkR internal RDD APIs > -- > > Key: SPARK-12172 > URL: https://issues.apache.org/jira/browse/SPARK-12172 > Project: Spark > Issue Type: Task > Components: SparkR > Reporter: Felix Cheung > Assignee: Apache Spark > Priority: Major
[jira] [Resolved] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-25859. -- Resolution: Fixed Assignee: Huaxin Gao Fix Version/s: 2.4.0 Target Version/s: 2.4.0 > add scala/java/python example and doc for PrefixSpan > > > Key: SPARK-25859 > URL: https://issues.apache.org/jira/browse/SPARK-25859 > Project: Spark > Issue Type: Improvement > Components: ML > Affects Versions: 2.4.0 > Reporter: Huaxin Gao > Assignee: Huaxin Gao > Priority: Major > Fix For: 2.4.0 > > > Scala/Java/Python examples and doc for PrefixSpan were added for 3.0 in https://issues.apache.org/jira/browse/SPARK-24207. This JIRA is to add the examples and doc in 2.4.
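For reference, a usage sketch along the lines of the example being backported; the data and thresholds are illustrative, not copied from the pull request:

{code:scala}
import org.apache.spark.ml.fpm.PrefixSpan
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("prefixspan-demo").getOrCreate()
import spark.implicits._

// Each row is a sequence of itemsets.
val df = Seq(
  Seq(Seq(1, 2), Seq(3)),
  Seq(Seq(1), Seq(3, 2), Seq(1, 2)),
  Seq(Seq(1, 2), Seq(5)),
  Seq(Seq(6))
).toDF("sequence")

new PrefixSpan()
  .setMinSupport(0.5)
  .setMaxPatternLength(5)
  .setSequenceCol("sequence")
  .findFrequentSequentialPatterns(df)
  .show()
{code}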
[jira] [Resolved] (SPARK-16693) Remove R deprecated methods
[ https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-16693. -- Resolution: Fixed Assignee: Felix Cheung Fix Version/s: 3.0.0 > Remove R deprecated methods > --- > > Key: SPARK-16693 > URL: https://issues.apache.org/jira/browse/SPARK-16693 > Project: Spark > Issue Type: Bug > Components: SparkR > Affects Versions: 2.0.0 > Reporter: Felix Cheung > Assignee: Felix Cheung > Priority: Major > Fix For: 3.0.0 > > > For methods deprecated in Spark 2.0.0, we should remove them in 2.1.0 -> 3.0.0
[jira] [Updated] (SPARK-25823) map_filter can generate incorrect data
[ https://issues.apache.org/jira/browse/SPARK-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25823: -- Priority: Critical (was: Blocker)
> map_filter can generate incorrect data
> --
>
> Key: SPARK-25823
> URL: https://issues.apache.org/jira/browse/SPARK-25823
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Critical
> Labels: correctness
>
> This is not a regression, because it occurs in new higher-order functions like `map_filter` and `map_concat`. The root cause is that Spark's `CreateMap` allows duplicate keys. If we want to allow this difference in the new higher-order functions, we had better add a warning about it on these functions, at least after the RC4 vote passes. Otherwise, this will surprise Presto-based users.
> *Spark 2.4*
> {code:java}
> spark-sql> CREATE TABLE t AS SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT map_concat(map(1,2), map(1,3)) m);
> spark-sql> SELECT * FROM t;
> {1:3} {1:2}
> {code}
> *Presto 0.212*
> {code:java}
> presto> SELECT a, map_filter(a, (k,v) -> v = 2) FROM (SELECT map_concat(map(array[1],array[2]), map(array[1],array[3])) a);
>    a | _col1
> ---+---
> {1=3} | {}
> {code}
[jira] [Updated] (SPARK-25823) map_filter can generate incorrect data
[ https://issues.apache.org/jira/browse/SPARK-25823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25823: -- Affects Version/s: (was: 2.4.0) 3.0.0
> map_filter can generate incorrect data
> --
>
> Key: SPARK-25823
> URL: https://issues.apache.org/jira/browse/SPARK-25823
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Dongjoon Hyun
> Priority: Blocker
> Labels: correctness
>
> This is not a regression, because it occurs in new higher-order functions like `map_filter` and `map_concat`. The root cause is that Spark's `CreateMap` allows duplicate keys. If we want to allow this difference in the new higher-order functions, we had better add a warning about it on these functions, at least after the RC4 vote passes. Otherwise, this will surprise Presto-based users.
> *Spark 2.4*
> {code:java}
> spark-sql> CREATE TABLE t AS SELECT m, map_filter(m, (k,v) -> v=2) c FROM (SELECT map_concat(map(1,2), map(1,3)) m);
> spark-sql> SELECT * FROM t;
> {1:3} {1:2}
> {code}
> *Presto 0.212*
> {code:java}
> presto> SELECT a, map_filter(a, (k,v) -> v = 2) FROM (SELECT map_concat(map(array[1],array[2]), map(array[1],array[3])) a);
>    a | _col1
> ---+---
> {1=3} | {}
> {code}
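The reproduction above, restated as a self-contained sketch (the outputs noted in the comments are the Spark 2.4 behavior quoted in the description, not a claim about the eventual fix):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("map-filter-demo").getOrCreate()

// map_concat keeps the duplicate key 1, so the map carries both 1 -> 2 and 1 -> 3;
// map_filter then matches an entry that lookup semantics would hide.
spark.sql("SELECT map_concat(map(1, 2), map(1, 3)) AS m").createOrReplaceTempView("dup")

// Per the description: m displays as {1:3} while c displays as {1:2}.
spark.sql("SELECT m, map_filter(m, (k, v) -> v = 2) AS c FROM dup").show(false)
{code}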
[jira] [Comment Edited] (SPARK-25833) Views without column names created by Hive are not readable by Spark
[ https://issues.apache.org/jira/browse/SPARK-25833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1178#comment-1178 ] Dilip Biswal edited comment on SPARK-25833 at 10/27/18 8:39 PM: This looks like a duplicate of https://issues.apache.org/jira/browse/SPARK-24864. Please see the discussion there. Basically Hive and Spark are two different systems and follow different schemes to compute auto-generated column names. We should be using aliases in the view definition to make it runnable from Spark. cc [~smilegator] [~srowen] Thank you.
was (Author: dkbiswal): This looks like a duplicate of https://issues.apache.org/jira/browse/SPARK-24864. Please see the discussion there. Basically Hive and Spark are two different systems and follow different schemes to compute auto-generated column names. We should be using aliases in the view definition to make it runnable from Spark. Thank you.
[jira] [Resolved] (SPARK-25858) Passing Field Metadata to Parquet
[ https://issues.apache.org/jira/browse/SPARK-25858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinli Shang resolved SPARK-25858. - Resolution: Later It is a little early to open this issue. I will re-open it after the dependency issues are designed.
> Passing Field Metadata to Parquet
> -
>
> Key: SPARK-25858
> URL: https://issues.apache.org/jira/browse/SPARK-25858
> Project: Spark
> Issue Type: New Feature
> Components: Input/Output
> Affects Versions: 2.3.2
> Reporter: Xinli Shang
> Priority: Major
>
> h1. Problem Statement
> The Spark WriteSupport class for Parquet is hardcoded to use org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport, which is not configurable. Currently, this class doesn't carry over the field metadata in StructType to MessageType. However, Parquet column encryption (PARQUET-1396, PARQUET-1178) requires the field metadata inside Parquet's MessageType, so that the metadata can be used to control column encryption.
> h1. Technical Solution
> # Extend the SparkToParquetSchemaConverter class and override the convert() method to carry over the field metadata.
> # Extend ParquetWriteSupport and use the extended converter from #1. The extension avoids changing the built-in WriteSupport, to mitigate risk.
> # Change Spark code to make the WriteSupport class configurable, letting the user opt in to the extended WriteSupport from #2. The default WriteSupport remains org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport.
> h1. Technical Details
> h2. Extend SparkToParquetSchemaConverter class
> {code:scala}
> class SparkToParquetMetadataSchemaConverter(conf: Configuration)
>     extends SparkToParquetSchemaConverter(conf) {
>
>   override def convert(catalystSchema: StructType): MessageType = {
>     Types.buildMessage()
>       .addFields(catalystSchema.map(convertFieldWithMetadata): _*)
>       .named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
>   }
>
>   private def convertFieldWithMetadata(field: StructField): Type = {
>     val extField = new ExtType[Any](convertField(field))
>     val metaBuilder = new MetadataBuilder().withMetadata(field.metadata)
>     val metaData = metaBuilder.getMap
>     extField.setMetadata(metaData)
>     extField
>   }
> }
> {code}
> h2. Extend ParquetWriteSupport
> {code:scala}
> class CryptoParquetWriteSupport extends ParquetWriteSupport {
>
>   override def init(configuration: Configuration): WriteContext = {
>     val converter = new SparkToParquetMetadataSchemaConverter(configuration)
>     createContext(configuration, converter)
>   }
> }
> {code}
> h2. Make WriteSupport configurable
> {code:scala}
> class ParquetFileFormat {
>
>   override def prepareWrite(...) = {
>     ...
>     if (conf.get(ParquetOutputFormat.WRITE_SUPPORT_CLASS) == null) {
>       ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
>     }
>     ...
>   }
> }
> {code}
> h1. Verification
> The [ParquetHelloWorld.java|https://github.com/shangxinli/parquet-writesupport-extensions/blob/master/src/main/java/com/uber/ParquetHelloWorld.java] in the github repository [parquet-writesupport-extensions|https://github.com/shangxinli/parquet-writesupport-extensions] has a sample verification of passing down the field metadata and performing column encryption.
> h1. Dependency
> * PARQUET-1178
> * PARQUET-1396
> * PARQUET-1397
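If the configurability in step 3 of the proposal lands, registration might look like the sketch below; CryptoParquetWriteSupport is the class from the proposal above, an active SparkSession named spark is assumed, and the whole snippet depends on the proposed (not merged) change:

{code:scala}
import org.apache.parquet.hadoop.ParquetOutputFormat

// Point Parquet's write-support config at the extended class, so the guarded
// prepareWrite() above skips installing the built-in ParquetWriteSupport.
spark.sparkContext.hadoopConfiguration.set(
  ParquetOutputFormat.WRITE_SUPPORT_CLASS,
  classOf[CryptoParquetWriteSupport].getName)
{code}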
[jira] [Commented] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1211#comment-1211 ] Huaxin Gao commented on SPARK-25859: PowerIterationClustering is not in the doc either. Do I need to add it too?
[jira] [Assigned] (SPARK-25861) Remove unused refreshInterval parameter from the headerSparkPage method.
[ https://issues.apache.org/jira/browse/SPARK-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25861: Assignee: (was: Apache Spark) > Remove unused refreshInterval parameter from the headerSparkPage method. > > > Key: SPARK-25861 > URL: https://issues.apache.org/jira/browse/SPARK-25861 > Project: Spark > Issue Type: Bug > Components: Web UI > Affects Versions: 2.3.2 > Reporter: shahid > Priority: Minor > > https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L221 > > refreshInterval is not used anywhere in the headerSparkPage method.
[jira] [Assigned] (SPARK-25861) Remove unused refreshInterval parameter from the headerSparkPage method.
[ https://issues.apache.org/jira/browse/SPARK-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25861: Assignee: Apache Spark
[jira] [Commented] (SPARK-25861) Remove unused refreshInterval parameter from the headerSparkPage method.
[ https://issues.apache.org/jira/browse/SPARK-25861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1196#comment-1196 ] Apache Spark commented on SPARK-25861: -- User 'shahidki31' has created a pull request for this issue: https://github.com/apache/spark/pull/22864
[jira] [Created] (SPARK-25861) Remove unused refreshInterval parameter from the headerSparkPage method.
shahid created SPARK-25861: -- Summary: Remove unused refreshInterval parameter from the headerSparkPage method. Key: SPARK-25861 URL: https://issues.apache.org/jira/browse/SPARK-25861 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.3.2 Reporter: shahid https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L221 refreshInterval is not used anywhere in the headerSparkPage method.
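Schematically, the change amounts to dropping the dead parameter; the surrounding signature below is an approximation for illustration, not the exact declaration:

{code:scala}
// Before (abbreviated): refreshInterval is accepted but never read.
// def headerSparkPage(request: HttpServletRequest, title: String, content: => Seq[Node],
//     activeTab: SparkUITab, refreshInterval: Option[Int] = None, ...): Seq[Node]

// After: the unused parameter is removed and call sites drop the argument.
// def headerSparkPage(request: HttpServletRequest, title: String, content: => Seq[Node],
//     activeTab: SparkUITab, ...): Seq[Node]
{code}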
[jira] [Assigned] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25859: Assignee: Apache Spark
[jira] [Assigned] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25859: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1189#comment-1189 ] Apache Spark commented on SPARK-25859: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/22863
[jira] [Commented] (SPARK-25833) Views without column names created by Hive are not readable by Spark
[ https://issues.apache.org/jira/browse/SPARK-25833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1178#comment-1178 ] Dilip Biswal commented on SPARK-25833: -- This looks like a duplicate of https://issues.apache.org/jira/browse/SPARK-24864. Please see the discussion there. Basically Hive and Spark are two different systems and follow different schemes to compute auto-generated column names. We should use explicit aliases in the view definition to make it runnable from Spark. Thank you. > Views without column names created by Hive are not readable by Spark > > > Key: SPARK-25833 > URL: https://issues.apache.org/jira/browse/SPARK-25833 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.2 >Reporter: Chenxiao Mao >Priority: Major > > A simple example to reproduce this issue. > create a view via Hive CLI: > {code:sql} > hive> CREATE VIEW v1 AS SELECT * FROM (SELECT 1) t1 > {code} > query that view via Spark > {code:sql} > spark-sql> select * from v1; > Error in query: cannot resolve '`t1._c0`' given input columns: [1]; line 1 > pos 7; > 'Project [*] > +- 'SubqueryAlias v1, `default`.`v1` >+- 'Project ['t1._c0] > +- SubqueryAlias t1 > +- Project [1 AS 1#41] > +- OneRowRelation$ > {code} > Check the view definition: > {code:sql} > hive> desc extended v1; > OK > _c0 int > ... > viewOriginalText:SELECT * FROM (SELECT 1) t1, > viewExpandedText:SELECT `t1`.`_c0` FROM (SELECT 1) `t1` > ... > {code} > _c0 in above view definition is automatically generated by Hive, which is not > recognizable by Spark. > see [Hive > LanguageManual|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=30746446&navigatingVersions=true#LanguageManualDDL-CreateView] > for more details: > {quote}If no column names are supplied, the names of the view's columns will > be derived automatically from the defining SELECT expression. (If the SELECT > contains unaliased scalar expressions such as x+y, the resulting view column > names will be generated in the form _C0, _C1, etc.) > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
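Editor's note: to make the suggested workaround concrete, a minimal sketch, assuming a Hive-enabled SparkSession named {{spark}} (view and alias names are illustrative). Declaring column aliases up front means neither system has to invent names like _c0, so the view resolves in both.

{code:scala}
// Recreate the view with an explicit column list; both Hive and Spark
// can then resolve the column without relying on generated names.
spark.sql("DROP VIEW IF EXISTS v1")
spark.sql("CREATE VIEW v1 (c1) AS SELECT * FROM (SELECT 1) t1")
// Equivalent alternative: CREATE VIEW v1 AS SELECT 1 AS c1
spark.sql("SELECT * FROM v1").show()
{code}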
[jira] [Commented] (SPARK-25661) Refactor AvroWriteBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1147#comment-1147 ] Apache Spark commented on SPARK-25661: -- User 'yucai' has created a pull request for this issue: https://github.com/apache/spark/pull/22861 > Refactor AvroWriteBenchmark to use main method > -- > > Key: SPARK-25661 > URL: https://issues.apache.org/jira/browse/SPARK-25661 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25663) Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1148#comment-1148 ] yucai commented on SPARK-25663: --- [~Gengliang.Wang] I made an improvement on this; could you help review? https://github.com/apache/spark/pull/22861 > Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use > main method > > > Key: SPARK-25663 > URL: https://issues.apache.org/jira/browse/SPARK-25663 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
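Editor's note: these benchmark subtasks move execution from the test runner to a plain {{main}} entry point, so results can be regenerated directly and committed as result files. A generic sketch of the main-method pattern; this is an illustration only, not the actual benchmark code from the PR.

{code:scala}
// Illustrative only: a benchmark driven by a main method, so it can be
// launched directly rather than through the test framework.
object ExampleWriteBenchmark {
  // Time a single case and print a stable, committable result line.
  private def runCase(name: String)(body: => Unit): Unit = {
    val start = System.nanoTime()
    body
    val elapsedMs = (System.nanoTime() - start) / 1000000
    println(s"$name: $elapsedMs ms")
  }

  def main(args: Array[String]): Unit = {
    runCase("sum over 100M longs") {
      var acc = 0L
      var i = 0L
      while (i < 100000000L) { acc += i; i += 1 }
    }
  }
}
{code}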
[jira] [Assigned] (SPARK-25663) Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25663: Assignee: (was: Apache Spark) > Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use > main method > > > Key: SPARK-25663 > URL: https://issues.apache.org/jira/browse/SPARK-25663 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25661) Refactor AvroWriteBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25661: Assignee: (was: Apache Spark) > Refactor AvroWriteBenchmark to use main method > -- > > Key: SPARK-25661 > URL: https://issues.apache.org/jira/browse/SPARK-25661 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25663) Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1144#comment-1144 ] Apache Spark commented on SPARK-25663: -- User 'yucai' has created a pull request for this issue: https://github.com/apache/spark/pull/22861 > Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use > main method > > > Key: SPARK-25663 > URL: https://issues.apache.org/jira/browse/SPARK-25663 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25661) Refactor AvroWriteBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25661: Assignee: Apache Spark > Refactor AvroWriteBenchmark to use main method > -- > > Key: SPARK-25661 > URL: https://issues.apache.org/jira/browse/SPARK-25661 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25663) Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use main method
[ https://issues.apache.org/jira/browse/SPARK-25663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25663: Assignee: Apache Spark > Refactor BuiltInDataSourceWriteBenchmark and DataSourceWriteBenchmark to use > main method > > > Key: SPARK-25663 > URL: https://issues.apache.org/jira/browse/SPARK-25663 > Project: Spark > Issue Type: Sub-task > Components: Tests >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-23367) Include python document style checking
[ https://issues.apache.org/jira/browse/SPARK-23367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-23367. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 22425 [https://github.com/apache/spark/pull/22425] > Include python document style checking > -- > > Key: SPARK-23367 > URL: https://issues.apache.org/jira/browse/SPARK-23367 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.2.1 >Reporter: Rekha Joshi >Assignee: Rekha Joshi >Priority: Minor > Fix For: 3.0.0 > > > As per the discussions in [PR#20378 |https://github.com/apache/spark/pull/20378], > this jira is to include Python doc style checking in Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-23367) Include python document style checking
[ https://issues.apache.org/jira/browse/SPARK-23367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-23367: - Assignee: Rekha Joshi > Include python document style checking > -- > > Key: SPARK-23367 > URL: https://issues.apache.org/jira/browse/SPARK-23367 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.2.1 >Reporter: Rekha Joshi >Assignee: Rekha Joshi >Priority: Minor > Fix For: 3.0.0 > > > As per the discussions in [PR#20378 |https://github.com/apache/spark/pull/20378], > this jira is to include Python doc style checking in Spark. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25816) Functions does not resolve Columns correctly
[ https://issues.apache.org/jira/browse/SPARK-25816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1058#comment-1058 ] Peter Toth commented on SPARK-25816: Thanks [~bzhang], It seems both are regressions from 2.2 to 2.3 for the same reason. My submitted PR fixes them. > Functions does not resolve Columns correctly > > > Key: SPARK-25816 > URL: https://issues.apache.org/jira/browse/SPARK-25816 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1 >Reporter: Brian Zhang >Priority: Critical > Attachments: final_allDatatypes_Spark.avro, source.snappy.parquet > > > When there is a duplicate column name in the current Dataframe and orginal > Dataframe where current df is selected from, Spark in 2.3.0 and 2.3.1 does > not resolve the column correctly when using it in the expression, hence > causing casting issue. The same code is working in Spark 2.2.1 > Please see below code to reproduce the issue > import org.apache.spark._ > import org.apache.spark.rdd._ > import org.apache.spark.storage.StorageLevel._ > import org.apache.spark.sql._ > import org.apache.spark.sql.DataFrame > import org.apache.spark.sql.types._ > import org.apache.spark.sql.functions._ > import org.apache.spark.sql.catalyst.expressions._ > import org.apache.spark.sql.Column > val v0 = spark.read.parquet("/data/home/bzinfa/bz/source.snappy.parquet") > val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _):_*) > val v5 = v00.select($"13".as("0"),$"14".as("1"),$"15".as("2")) > val v5_2 = $"2" > v5.where(lit(500).<(v5_2(new Column(new MapKeys(v5_2.expr))(lit(0) > //v00's 3rdcolumn is binary and 16th is map > Error: > org.apache.spark.sql.AnalysisException: cannot resolve 'map_keys(`2`)' due to > data type mismatch: argument 1 requires map type, however, '`2`' is of binary > type.; > > 'Project [0#1591, 1#1592, 2#1593] +- 'Filter (500 < > {color:#FF}2#1593{color}[map_keys({color:#FF}2#1561{color})[0]]) +- > Project [13#1572 AS 0#1591, 14#1573 AS 1#1592, 15#1574 AS 2#1593, 2#1561] +- > Project [c_bytes#1527 AS 0#1559, c_union#1528 AS 1#1560, c_fixed#1529 AS > 2#1561, c_boolean#1530 AS 3#1562, c_float#1531 AS 4#1563, c_double#1532 AS > 5#1564, c_int#1533 AS 6#1565, c_long#1534L AS 7#1566L, c_string#1535 AS > 8#1567, c_decimal_18_2#1536 AS 9#1568, c_decimal_28_2#1537 AS 10#1569, > c_decimal_38_2#1538 AS 11#1570, c_date#1539 AS 12#1571, simple_struct#1540 AS > 13#1572, simple_array#1541 AS 14#1573, simple_map#1542 AS 15#1574] +- > Relation[c_bytes#1527,c_union#1528,c_fixed#1529,c_boolean#1530,c_float#1531,c_double#1532,c_int#1533,c_long#1534L,c_string#1535,c_decimal_18_2#1536,c_decimal_28_2#1537,c_decimal_38_2#1538,c_date#1539,simple_struct#1540,simple_array#1541,simple_map#1542] > parquet -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
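Editor's note: the last line of the reproduction above appears to have lost its closing parentheses in the mail archive. A balanced reading of the same snippet follows; this is an assumption about the intended nesting (with {{spark.implicits._}} in scope), not a verified repro.

{code:scala}
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.MapKeys
import org.apache.spark.sql.functions._

val v0 = spark.read.parquet("/data/home/bzinfa/bz/source.snappy.parquet")
// Rename every column to its positional index: "0", "1", "2", ...
val v00 = v0.toDF(v0.schema.fields.indices.view.map("" + _): _*)
val v5 = v00.select($"13".as("0"), $"14".as("1"), $"15".as("2"))
val v5_2 = $"2"
// Look up the map value at the first key of column "2"; the reported bug
// is that `2` resolves to the original binary column, not the new alias.
v5.where(lit(500) < v5_2(new Column(new MapKeys(v5_2.expr))(lit(0))))
{code}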
[jira] [Commented] (SPARK-24709) Inferring schema from JSON string literal
[ https://issues.apache.org/jira/browse/SPARK-24709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1033#comment-1033 ] Apache Spark commented on SPARK-24709: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/22858 > Inferring schema from JSON string literal > - > > Key: SPARK-24709 > URL: https://issues.apache.org/jira/browse/SPARK-24709 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Fix For: 2.4.0 > > > Need to add a new function - *schema_of_json()*. The function should infer the > schema of a JSON string literal. The result of the function is a schema in DDL > format. > One of the use cases is passing the output of _schema_of_json()_ to > *from_json()*. Currently, the _from_json()_ function requires a schema as a > mandatory argument. A user has to pass the schema as a string literal in SQL. > The new function should allow inferring the schema from an example. Let's say > json_col is a column containing JSON strings with the same schema. It should > be possible to pass a JSON string with that schema to _schema_of_json()_, > which infers the schema from the particular example. > {code:sql} > select from_json(json_col, schema_of_json('{"f1": 0, "f2": [0], "f2": "a"}')) > from json_table; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
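Editor's note: a short sketch of the intended round trip through the Scala API, assuming a Spark 2.4+ session with {{spark.implicits._}} imported ({{jsonTable}} and {{json_col}} are illustrative names, not from the issue).

{code:scala}
import org.apache.spark.sql.functions.{from_json, lit, schema_of_json}

// Infer a DDL-format schema string from one example document...
val example = """{"f1": 0, "f2": [0], "f3": "a"}"""
val ddl = spark.range(1)
  .select(schema_of_json(lit(example)))
  .head().getString(0)
// ddl is now something like: struct<f1:bigint,f2:array<bigint>,f3:string>

// ...then reuse it to parse a whole column of JSON strings.
val parsed = jsonTable.select(from_json($"json_col", ddl, Map.empty[String, String]))
{code}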
[jira] [Commented] (SPARK-25860) Replace Literal(null, _) with FalseLiteral whenever possible
[ https://issues.apache.org/jira/browse/SPARK-25860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1032#comment-1032 ] Apache Spark commented on SPARK-25860: -- User 'aokolnychyi' has created a pull request for this issue: https://github.com/apache/spark/pull/22857 > Replace Literal(null, _) with FalseLiteral whenever possible > > > Key: SPARK-25860 > URL: https://issues.apache.org/jira/browse/SPARK-25860 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL >Affects Versions: 3.0.0 >Reporter: Anton Okolnychyi >Priority: Major > > We should have a new optimization rule that replaces {{Literal(null, _)}} > with {{FalseLiteral}} in conditions in {{Join}} and {{Filter}}, predicates in > {{If}}, conditions in {{CaseWhen}}. > The underlying idea is that those expressions evaluate to {{false}} if the > underlying expression is {{null}} (as an example see > {{GeneratePredicate$create}} or {{doGenCode}} and {{eval}} methods in {{If}} > and {{CaseWhen}}). Therefore, we can replace {{Literal(null, _)}} with > {{FalseLiteral}}, which can lead to more optimizations later on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25860) Replace Literal(null, _) with FalseLiteral whenever possible
[ https://issues.apache.org/jira/browse/SPARK-25860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25860: Assignee: (was: Apache Spark) > Replace Literal(null, _) with FalseLiteral whenever possible > > > Key: SPARK-25860 > URL: https://issues.apache.org/jira/browse/SPARK-25860 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL >Affects Versions: 3.0.0 >Reporter: Anton Okolnychyi >Priority: Major > > We should have a new optimization rule that replaces {{Literal(null, _)}} > with {{FalseLiteral}} in conditions in {{Join}} and {{Filter}}, predicates in > {{If}}, conditions in {{CaseWhen}}. > The underlying idea is that those expressions evaluate to {{false}} if the > underlying expression is {{null}} (as an example see > {{GeneratePredicate$create}} or {{doGenCode}} and {{eval}} methods in {{If}} > and {{CaseWhen}}). Therefore, we can replace {{Literal(null, _)}} with > {{FalseLiteral}}, which can lead to more optimizations later on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25860) Replace Literal(null, _) with FalseLiteral whenever possible
[ https://issues.apache.org/jira/browse/SPARK-25860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25860: Assignee: Apache Spark > Replace Literal(null, _) with FalseLiteral whenever possible > > > Key: SPARK-25860 > URL: https://issues.apache.org/jira/browse/SPARK-25860 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL >Affects Versions: 3.0.0 >Reporter: Anton Okolnychyi >Assignee: Apache Spark >Priority: Major > > We should have a new optimization rule that replaces {{Literal(null, _)}} > with {{FalseLiteral}} in conditions in {{Join}} and {{Filter}}, predicates in > {{If}}, conditions in {{CaseWhen}}. > The underlying idea is that those expressions evaluate to {{false}} if the > underlying expression is {{null}} (as an example see > {{GeneratePredicate$create}} or {{doGenCode}} and {{eval}} methods in {{If}} > and {{CaseWhen}}). Therefore, we can replace {{Literal(null, _)}} with > {{FalseLiteral}}, which can lead to more optimizations later on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25860) Replace Literal(null, _) with FalseLiteral whenever possible
Anton Okolnychyi created SPARK-25860: Summary: Replace Literal(null, _) with FalseLiteral whenever possible Key: SPARK-25860 URL: https://issues.apache.org/jira/browse/SPARK-25860 Project: Spark Issue Type: Improvement Components: Optimizer, SQL Affects Versions: 3.0.0 Reporter: Anton Okolnychyi We should have a new optimization rule that replaces {{Literal(null, _)}} with {{FalseLiteral}} in conditions in {{Join}} and {{Filter}}, predicates in {{If}}, conditions in {{CaseWhen}}. The underlying idea is that those expressions evaluate to {{false}} if the underlying expression is {{null}} (as an example see {{GeneratePredicate$create}} or {{doGenCode}} and {{eval}} methods in {{If}} and {{CaseWhen}}). Therefore, we can replace {{Literal(null, _)}} with {{FalseLiteral}}, which can lead to more optimizations later on. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
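Editor's note: to illustrate the proposed rule, a simplified sketch of the rewrite over catalyst expressions. This is an outline of the idea rather than the rule as merged; it covers only the {{If}} and {{CaseWhen}} cases named above, and the same substitution would apply to {{Filter}} and {{Join}} conditions.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{CaseWhen, Expression, If, Literal}
import org.apache.spark.sql.types.BooleanType

// In condition position a null literal behaves exactly like false, so
// replacing it is safe and enables later rules (e.g. a Filter whose
// condition folds to false can be pruned entirely).
def replaceNullWithFalse(e: Expression): Expression = e match {
  case Literal(null, BooleanType) => Literal(false, BooleanType)
  case i @ If(pred, _, _) => i.copy(predicate = replaceNullWithFalse(pred))
  case cw @ CaseWhen(branches, _) =>
    cw.copy(branches = branches.map { case (cond, value) =>
      replaceNullWithFalse(cond) -> value
    })
  case other => other
}
{code}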
[jira] [Updated] (SPARK-25259) Left/Right join support push down during-join predicates
[ https://issues.apache.org/jira/browse/SPARK-25259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-25259: Description: For example: {code:sql} create temporary view EMPLOYEE as select * from values ("10", "HAAS", "A00"), ("10", "THOMPSON", "B01"), ("30", "KWAN", "C01"), ("000110", "LUCCHESSI", "A00"), ("000120", "O'CONNELL", "A))"), ("000130", "QUINTANA", "C01") as EMPLOYEE(EMPNO, LASTNAME, WORKDEPT); create temporary view DEPARTMENT as select * from values ("A00", "SPIFFY COMPUTER SERVICE DIV.", "10"), ("B01", "PLANNING", "20"), ("C01", "INFORMATION CENTER", "30"), ("D01", "DEVELOPMENT CENTER", null) as DEPARTMENT(DEPTNO, DEPTNAME, MGRNO); create temporary view PROJECT as select * from values ("AD3100", "ADMIN SERVICES", "D01"), ("IF1000", "QUERY SERVICES", "C01"), ("IF2000", "USER EDUCATION", "E01"), ("MA2100", "WELD LINE AUDOMATION", "D01"), ("PL2100", "WELD LINE PLANNING", "01") as PROJECT(PROJNO, PROJNAME, DEPTNO); {code} the SQL below: {code:sql} SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01'; {code} can be optimized to: {code:sql} SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO='E01') D ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01'; {code} was: For example: {code:sql} create temporary view EMPLOYEE as select * from values ("10", "HAAS", "A00"), ("10", "THOMPSON", "B01"), ("30", "KWAN", "C01"), ("000110", "LUCCHESSI", "A00"), ("000120", "O'CONNELL", "A))"), ("000130", "QUINTANA", "C01") as EMPLOYEE(EMPNO, LASTNAME, WORKDEPT); create temporary view DEPARTMENT as select * from values ("A00", "SPIFFY COMPUTER SERVICE DIV.", "10"), ("B01", "PLANNING", "20"), ("C01", "INFORMATION CENTER", "30"), ("D01", "DEVELOPMENT CENTER", null) as EMPLOYEE(DEPTNO, DEPTNAME, MGRNO); create temporary view PROJECT as select * from values ("AD3100", "ADMIN SERVICES", "D01"), ("IF1000", "QUERY SERVICES", "C01"), ("IF2000", "USER EDUCATION", "E01"), ("MA2100", "WELD LINE AUDOMATION", "D01"), ("PL2100", "WELD LINE PLANNING", "01") as EMPLOYEE(PROJNO, PROJNAME, DEPTNO); {code} the SQL below: {code:sql} SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01'; {code} can be optimized to: {code:sql} SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO='E01') D ON P.DEPTNO = D.DEPTNO AND P.DEPTNO='E01'; {code} > Left/Right join support push down during-join predicates > > > Key: SPARK-25259 > URL: https://issues.apache.org/jira/browse/SPARK-25259 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Yuming Wang >Priority: Major > > For example: > {code:sql} > create temporary view EMPLOYEE as select * from values > ("10", "HAAS", "A00"), > ("10", "THOMPSON", "B01"), > ("30", "KWAN", "C01"), > ("000110", "LUCCHESSI", "A00"), > ("000120", "O'CONNELL", "A))"), > ("000130", "QUINTANA", "C01") > as EMPLOYEE(EMPNO, LASTNAME, WORKDEPT); > create temporary view DEPARTMENT as select * from values > ("A00", "SPIFFY COMPUTER SERVICE DIV.", "10"), > ("B01", "PLANNING", "20"), > ("C01", "INFORMATION CENTER", "30"), > ("D01", "DEVELOPMENT CENTER", null) > as DEPARTMENT(DEPTNO, DEPTNAME, MGRNO); > create temporary view PROJECT as select * from values > ("AD3100", "ADMIN SERVICES", "D01"), > ("IF1000", "QUERY SERVICES", "C01"), > ("IF2000", "USER EDUCATION", "E01"), > ("MA2100",
"WELD LINE AUDOMATION", "D01"), > ("PL2100", "WELD LINE PLANNING", "01") > as PROJECT(PROJNO, PROJNAME, DEPTNO); > {code} > the SQL below: > {code:sql} > SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME > FROM PROJECT P LEFT OUTER JOIN DEPARTMENT D > ON P.DEPTNO = D.DEPTNO > AND P.DEPTNO='E01'; > {code} > can be optimized to: > {code:sql} > SELECT PROJNO, PROJNAME, P.DEPTNO, DEPTNAME > FROM PROJECT P LEFT OUTER JOIN (SELECT * FROM DEPARTMENT WHERE DEPTNO='E01') D > ON P.DEPTNO = D.DEPTNO > AND P.DEPTNO='E01'; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-25859: --- Description: scala/java/python examples and doc for PrefixSpan are added in 3.0 in https://issues.apache.org/jira/browse/SPARK-24207. This jira is to add the examples and doc in 2.4. (was: scala/java/python examples and doc for PrefixSpan are added 3.0 in https://issues.apache.org/jira/browse/SPARK-24207. This jira is to add the examples and doc in 2.4.) > add scala/java/python example and doc for PrefixSpan > > > Key: SPARK-25859 > URL: https://issues.apache.org/jira/browse/SPARK-25859 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.4.0 >Reporter: Huaxin Gao >Priority: Major > > scala/java/python examples and doc for PrefixSpan are added in 3.0 in > https://issues.apache.org/jira/browse/SPARK-24207. This jira is to add the > examples and doc in 2.4. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
[ https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16665973#comment-16665973 ] Huaxin Gao commented on SPARK-25859: [~felixcheung] I had some problems submitting a PR for v2.4.0-rc5. I will try again tomorrow. > add scala/java/python example and doc for PrefixSpan > > > Key: SPARK-25859 > URL: https://issues.apache.org/jira/browse/SPARK-25859 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.4.0 >Reporter: Huaxin Gao >Priority: Major > > scala/java/python examples and doc for PrefixSpan are added in 3.0 in > https://issues.apache.org/jira/browse/SPARK-24207. This jira is to add the > examples and doc in 2.4. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25859) add scala/java/python example and doc for PrefixSpan
Huaxin Gao created SPARK-25859: -- Summary: add scala/java/python example and doc for PrefixSpan Key: SPARK-25859 URL: https://issues.apache.org/jira/browse/SPARK-25859 Project: Spark Issue Type: Improvement Components: ML Affects Versions: 2.4.0 Reporter: Huaxin Gao scala/java/python examples and doc for PrefixSpan are added in 3.0 in https://issues.apache.org/jira/browse/SPARK-24207. This jira is to add the examples and doc in 2.4. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org