[jira] [Resolved] (SPARK-30093) Improve error message for creating views
[ https://issues.apache.org/jira/browse/SPARK-30093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-30093.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26731
[https://github.com/apache/spark/pull/26731]

> Improve error message for creating views
> ----------------------------------------
>
> Key: SPARK-30093
> URL: https://issues.apache.org/jira/browse/SPARK-30093
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Aman Omer
> Assignee: Aman Omer
> Priority: Major
> Fix For: 3.0.0
>
> Improve error message for creating views.
> https://github.com/apache/spark/pull/26317#discussion_r352377363

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30093) Improve error message for creating views
[ https://issues.apache.org/jira/browse/SPARK-30093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-30093:
-----------------------------------
    Assignee: Aman Omer

> Improve error message for creating views
> ----------------------------------------
>
> Key: SPARK-30093
> URL: https://issues.apache.org/jira/browse/SPARK-30093
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Aman Omer
> Assignee: Aman Omer
> Priority: Major
>
> Improve error message for creating views.
> https://github.com/apache/spark/pull/26317#discussion_r352377363
[jira] [Updated] (SPARK-30135) Add documentation for DELETE JAR and DELETE File command
[ https://issues.apache.org/jira/browse/SPARK-30135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Katta updated SPARK-30135:
----------------------------------
    Summary: Add documentation for DELETE JAR and DELETE File command  (was: Add documentation for DELETE JAR command)

> Add documentation for DELETE JAR and DELETE File command
> --------------------------------------------------------
>
> Key: SPARK-30135
> URL: https://issues.apache.org/jira/browse/SPARK-30135
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Commented] (SPARK-30137) Support DELETE file
[ https://issues.apache.org/jira/browse/SPARK-30137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988508#comment-16988508 ]

Sandeep Katta commented on SPARK-30137:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> Support DELETE file
> -------------------
>
> Key: SPARK-30137
> URL: https://issues.apache.org/jira/browse/SPARK-30137
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Created] (SPARK-30137) Support DELETE file
Sandeep Katta created SPARK-30137:
-------------------------------------

Summary: Support DELETE file
Key: SPARK-30137
URL: https://issues.apache.org/jira/browse/SPARK-30137
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Updated] (SPARK-30133) Support DELETE Jar and DELETE File functionality in spark
[ https://issues.apache.org/jira/browse/SPARK-30133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandeep Katta updated SPARK-30133:
----------------------------------
    Summary: Support DELETE Jar and DELETE File functionality in spark  (was: Support DELETE Jar functionality in spark)

> Support DELETE Jar and DELETE File functionality in spark
> ---------------------------------------------------------
>
> Key: SPARK-30133
> URL: https://issues.apache.org/jira/browse/SPARK-30133
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
> Labels: Umbrella
>
> Spark should support a DELETE JAR feature.
[jira] [Created] (SPARK-30136) DELETE JAR should also remove the jar from executor classPath
Sandeep Katta created SPARK-30136:
-------------------------------------

Summary: DELETE JAR should also remove the jar from executor classPath
Key: SPARK-30136
URL: https://issues.apache.org/jira/browse/SPARK-30136
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Commented] (SPARK-30136) DELETE JAR should also remove the jar from executor classPath
[ https://issues.apache.org/jira/browse/SPARK-30136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988504#comment-16988504 ]

Sandeep Katta commented on SPARK-30136:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> DELETE JAR should also remove the jar from executor classPath
> -------------------------------------------------------------
>
> Key: SPARK-30136
> URL: https://issues.apache.org/jira/browse/SPARK-30136
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Commented] (SPARK-30134) DELETE JAR should remove from addedJars list and from classpath
[ https://issues.apache.org/jira/browse/SPARK-30134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988502#comment-16988502 ]

Sandeep Katta commented on SPARK-30134:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> DELETE JAR should remove from addedJars list and from classpath
> ---------------------------------------------------------------
>
> Key: SPARK-30134
> URL: https://issues.apache.org/jira/browse/SPARK-30134
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Commented] (SPARK-30135) Add documentation for DELETE JAR command
[ https://issues.apache.org/jira/browse/SPARK-30135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988503#comment-16988503 ]

Sandeep Katta commented on SPARK-30135:
---------------------------------------

I have started working on this feature and will raise the PR soon.

> Add documentation for DELETE JAR command
> ----------------------------------------
>
> Key: SPARK-30135
> URL: https://issues.apache.org/jira/browse/SPARK-30135
> Project: Spark
> Issue Type: Sub-task
> Components: Documentation
> Affects Versions: 3.0.0
> Reporter: Sandeep Katta
> Priority: Major
[jira] [Created] (SPARK-30134) DELETE JAR should remove from addedJars list and from classpath
Sandeep Katta created SPARK-30134:
-------------------------------------

Summary: DELETE JAR should remove from addedJars list and from classpath
Key: SPARK-30134
URL: https://issues.apache.org/jira/browse/SPARK-30134
Project: Spark
Issue Type: Sub-task
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Created] (SPARK-30135) Add documentation for DELETE JAR command
Sandeep Katta created SPARK-30135:
-------------------------------------

Summary: Add documentation for DELETE JAR command
Key: SPARK-30135
URL: https://issues.apache.org/jira/browse/SPARK-30135
Project: Spark
Issue Type: Sub-task
Components: Documentation
Affects Versions: 3.0.0
Reporter: Sandeep Katta
[jira] [Created] (SPARK-30133) Support DELETE Jar functionality in spark
Sandeep Katta created SPARK-30133:
-------------------------------------

Summary: Support DELETE Jar functionality in spark
Key: SPARK-30133
URL: https://issues.apache.org/jira/browse/SPARK-30133
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 3.0.0
Reporter: Sandeep Katta

Spark should support a DELETE JAR feature.
[jira] [Commented] (SPARK-29915) spark-py and spark-r images are not created with docker-image-tool.sh
[ https://issues.apache.org/jira/browse/SPARK-29915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988492#comment-16988492 ]

Veera commented on SPARK-29915:
-------------------------------

Yes, this option is not included, but you can add it manually in the Dockerfiles and build your own custom Docker image.

> spark-py and spark-r images are not created with docker-image-tool.sh
> ---------------------------------------------------------------------
>
> Key: SPARK-29915
> URL: https://issues.apache.org/jira/browse/SPARK-29915
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 3.0.0
> Reporter: Michał Wesołowski
> Priority: Major
>
> Currently, at version 3.0.0-preview, the docker-image-tool.sh script has the
> [following lines|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh#L173]
> defined:
> {code:java}
> local PYDOCKERFILE=${PYDOCKERFILE:-false}
> local RDOCKERFILE=${RDOCKERFILE:-false}
> {code}
> Because of this change, neither the spark-py nor the spark-r images get created.
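The effect of the shell defaulting quoted above (`${PYDOCKERFILE:-false}`: fall back to the string "false" when the variable is unset) can be sketched in plain Python. This is an illustration of the reported behavior only, not Spark's actual script; the function names and the Dockerfile path are hypothetical.

```python
# Sketch of the defaulting logic in the quoted docker-image-tool.sh lines:
# when no Python/R Dockerfile is supplied, the variable defaults to the
# string "false" and the corresponding image build is skipped.

def resolve_dockerfile(cli_value):
    # Mirrors ${VAR:-false}: use the fallback when the value is unset/empty.
    return cli_value if cli_value else "false"

def images_to_build(pydockerfile=None, rdockerfile=None):
    images = ["spark"]  # the base image is always built
    if resolve_dockerfile(pydockerfile) != "false":
        images.append("spark-py")
    if resolve_dockerfile(rdockerfile) != "false":
        images.append("spark-r")
    return images
```

So unless a Python or R Dockerfile is passed explicitly, `images_to_build()` yields only the base image, which matches the symptom in the report.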
[jira] [Commented] (SPARK-29021) NoSuchElementException: key not found: hostPath.spark-local-dir-5.options.path
[ https://issues.apache.org/jira/browse/SPARK-29021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988487#comment-16988487 ]

Veera commented on SPARK-29021:
-------------------------------

Hi, I am also facing this issue while submitting the Spark job:

/opt/spark/bin/spark-submit --master XXX/ \
--deploy-mode cluster \
--name spark-wordcount \
--conf spark.kubernetes.namespace=spark \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spar-sa \
--conf spark.executor.instances=3 \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.container.image=container image latest version \
--conf spark.kubernetes.pyspark.pythonVersion=3 \
--conf spark.kubernetes.authenticate.submission.caCertFile=/etc/kubernetes/pki/ca.crt \
--conf spark.kubernetes.driver.volumes.hostPath.sparkwordcount.mount.path=/root \
--conf spark.kubernetes.driver.volumes.hostPath.sparkwordcount.mount.readOnly=false \
--conf spark.kubernetes.driver.volumes.hostPath.sparkwordcount.options.claimName=spark-wordcount-claim \
/root/pyscripts/wordcount.py

Exception in thread "main" java.util.NoSuchElementException: hostPath.sparkwordcount.options.path
    at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$MapOps$$anonfun$getTry$1.apply(KubernetesVolumeUtils.scala:107)
    at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$MapOps$$anonfun$getTry$1.apply(KubernetesVolumeUtils.scala:107)
    at scala.Option.fold(Option.scala:158)
    at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$MapOps.getTry(KubernetesVolumeUtils.scala:107)

> NoSuchElementException: key not found: hostPath.spark-local-dir-5.options.path
> ------------------------------------------------------------------------------
>
> Key: SPARK-29021
> URL: https://issues.apache.org/jira/browse/SPARK-29021
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.4
> Reporter: Kent Yao
> Priority: Major
>
> Mounting a hostPath volume has an issue:
> {code:java}
> Exception in thread "main" java.util.NoSuchElementException: key not found: hostPath.spark-local-dir-5.options.path
>     at scala.collection.MapLike.default(MapLike.scala:235)
>     at scala.collection.MapLike.default$(MapLike.scala:234)
>     at scala.collection.AbstractMap.default(Map.scala:63)
>     at scala.collection.MapLike.apply(MapLike.scala:144)
>     at scala.collection.MapLike.apply$(MapLike.scala:143)
>     at scala.collection.AbstractMap.apply(Map.scala:63)
>     at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$.parseVolumeSpecificConf(KubernetesVolumeUtils.scala:70)
>     at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$.$anonfun$parseVolumesWithPrefix$1(KubernetesVolumeUtils.scala:43)
>     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
>     at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:321)
>     at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:977)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:237)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
>     at scala.collection.AbstractSet.scala$collection$SetLike$$super$map(Set.scala:51)
>     at scala.collection.SetLike.map(SetLike.scala:104)
>     at scala.collection.SetLike.map$(SetLike.scala:104)
>     at scala.collection.AbstractSet.map(Set.scala:51)
>     at org.apache.spark.deploy.k8s.KubernetesVolumeUtils$.parseVolumesWithPrefix(KubernetesVolumeUtils.scala:33)
>     at org.apache.spark.deploy.k8s.KubernetesConf$.createDriverConf(KubernetesConf.scala:179)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:214)
>     at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:198)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:920)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:179)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:202)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:89)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:999)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1008)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
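The shape of the failure in the comment above can be sketched in plain Python: the submit command sets `mount.path`, `mount.readOnly`, and `options.claimName` for a `hostPath` volume, but the parser looks up `options.path` for that volume type, so the key is missing. This is an illustrative sketch only, not Spark's actual `KubernetesVolumeUtils` code; the function and the `/data` path are hypothetical.

```python
# Sketch of the key lookup that fails above: a hostPath volume requires an
# options.path conf key; options.claimName belongs to persistentVolumeClaim
# volumes, so setting it does not help.

PREFIX = "spark.kubernetes.driver.volumes."

def parse_host_path_volume(conf, name):
    key = f"{PREFIX}hostPath.{name}.options.path"
    if key not in conf:
        raise KeyError(f"key not found: hostPath.{name}.options.path")
    return {
        "hostPath": conf[key],
        "mountPath": conf[f"{PREFIX}hostPath.{name}.mount.path"],
    }

# Confs as in the comment above: options.path is absent.
broken = {
    f"{PREFIX}hostPath.sparkwordcount.mount.path": "/root",
    f"{PREFIX}hostPath.sparkwordcount.options.claimName": "spark-wordcount-claim",
}

# Adding the missing key (hypothetical host directory) makes the lookup succeed.
fixed = dict(broken)
fixed[f"{PREFIX}hostPath.sparkwordcount.options.path"] = "/data"
```

With `broken`, `parse_host_path_volume` raises the same "key not found" error; with `fixed`, parsing succeeds.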
[jira] [Updated] (SPARK-29152) Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled
[ https://issues.apache.org/jira/browse/SPARK-29152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rakesh Raushan updated SPARK-29152:
-----------------------------------
    Affects Version/s: 3.0.0

> Spark Executor Plugin API shutdown is not proper when dynamic allocation enabled
> --------------------------------------------------------------------------------
>
> Key: SPARK-29152
> URL: https://issues.apache.org/jira/browse/SPARK-29152
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
> Reporter: jobit mathew
> Priority: Major
>
> *Issue Description*
> The Spark Executor Plugin API's *shutdown handling is not proper* when dynamic allocation is enabled: the plugin's shutdown method is not invoked when *executors become dead* after the inactive time.
> *Test Precondition*
> 1. Create a plugin and package it into a jar named SparkExecutorPlugin.jar:
> {code:java}
> import org.apache.spark.ExecutorPlugin;
>
> public class ExecutoTest1 implements ExecutorPlugin {
>     public void init() {
>         System.out.println("Executor Plugin Initialised.");
>     }
>     public void shutdown() {
>         System.out.println("Executor plugin closed successfully.");
>     }
> }
> {code}
> 2. Put the jar in the folder /spark/examples/jars.
> *Test Steps*
> 1. Launch bin/spark-sql with dynamic allocation enabled:
> {code}
> ./spark-sql --master yarn --conf spark.executor.plugins=ExecutoTest1 --jars /opt/HA/C10/install/spark/spark/examples/jars/SparkExecutorPlugin.jar --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.initialExecutors=2 --conf spark.dynamicAllocation.minExecutors=1
> {code}
> 2. Create a table, insert data, and run select * from tablename.
> 3. Check the Spark UI Jobs tab/SQL tab.
> 4. Check every executor's application log file (the Executors tab lists all executor details) for the plugin initialization and shutdown messages, for example:
> /yarn/logdir/application_1567156749079_0025/container_e02_1567156749079_0025_01_05/stdout
> 5. Wait for the executor to become dead after the inactive time and check the same container log.
> 6. Kill the spark-sql session and check the container log for the executor plugin shutdown message.
> *Expected Output*
> 1. The job should succeed; the create table, insert, and select queries should succeed.
> 2. While the query runs, every executor's log should contain the plugin init message: "Executor Plugin Initialised."
> 3. Once an executor is dead, the shutdown message should appear in its log: "Executor plugin closed successfully."
> 4. Once the SQL application is closed, the shutdown message should appear in the log: "Executor plugin closed successfully."
> *Actual Output*
> The shutdown message is not logged when an executor becomes dead after the inactive time.
> *Observation*
> Without dynamic allocation the executor plugin works fine, but after enabling dynamic allocation the executor shutdown is not processed.
[jira] [Updated] (SPARK-30121) Fix memory usage in sbt build script
[ https://issues.apache.org/jira/browse/SPARK-30121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-30121:
-----------------------------
    Description:
1. The default memory setting is missing from the usage instructions:
{code:java}
build/sbt -h
{code}
{code:java}
-mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
{code}
2. The Perm space setting is not needed anymore, since Java 7 support was removed.

was:
{code:java}
-mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
{code}

> Fix memory usage in sbt build script
> ------------------------------------
>
> Key: SPARK-30121
> URL: https://issues.apache.org/jira/browse/SPARK-30121
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Kent Yao
> Priority: Minor
>
> 1. The default memory setting is missing from the usage instructions:
> {code:java}
> build/sbt -h
> {code}
> {code:java}
> -mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
> {code}
> 2. The Perm space setting is not needed anymore, since Java 7 support was removed.
[jira] [Updated] (SPARK-30121) Fix memory usage in sbt build script
[ https://issues.apache.org/jira/browse/SPARK-30121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-30121:
-----------------------------
    Summary: Fix memory usage in sbt build script  (was: Miss default memory setting for sbt usage)

> Fix memory usage in sbt build script
> ------------------------------------
>
> Key: SPARK-30121
> URL: https://issues.apache.org/jira/browse/SPARK-30121
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Kent Yao
> Priority: Minor
>
> {code:java}
> -mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
> {code}
[jira] [Resolved] (SPARK-30129) New auth engine does not keep client ID in TransportClient after auth
[ https://issues.apache.org/jira/browse/SPARK-30129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-30129.
-----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/26760

> New auth engine does not keep client ID in TransportClient after auth
> ---------------------------------------------------------------------
>
> Key: SPARK-30129
> URL: https://issues.apache.org/jira/browse/SPARK-30129
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Marcelo Masiero Vanzin
> Assignee: Marcelo Masiero Vanzin
> Priority: Major
> Fix For: 3.0.0
>
> Found a little bug when working on a feature; when auth is on, it's expected
> that the {{TransportClient}} provides the authenticated ID of the client
> (generally the app ID), but the new auth engine is not setting that
> information.
[jira] [Assigned] (SPARK-30129) New auth engine does not keep client ID in TransportClient after auth
[ https://issues.apache.org/jira/browse/SPARK-30129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-30129:
-------------------------------------
    Assignee: Marcelo Masiero Vanzin

> New auth engine does not keep client ID in TransportClient after auth
> ---------------------------------------------------------------------
>
> Key: SPARK-30129
> URL: https://issues.apache.org/jira/browse/SPARK-30129
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.4, 3.0.0
> Reporter: Marcelo Masiero Vanzin
> Assignee: Marcelo Masiero Vanzin
> Priority: Major
>
> Found a little bug when working on a feature; when auth is on, it's expected
> that the {{TransportClient}} provides the authenticated ID of the client
> (generally the app ID), but the new auth engine is not setting that
> information.
[jira] [Created] (SPARK-30132) Scala 2.13 compile errors from Hadoop LocalFileSystem subclasses
Sean R. Owen created SPARK-30132:
------------------------------------

Summary: Scala 2.13 compile errors from Hadoop LocalFileSystem subclasses
Key: SPARK-30132
URL: https://issues.apache.org/jira/browse/SPARK-30132
Project: Spark
Issue Type: Sub-task
Components: Spark Core
Affects Versions: 3.0.0
Reporter: Sean R. Owen

A few classes in our test code extend Hadoop's LocalFileSystem. Scala 2.13 reports a compile error here - not for the Spark code, but because (it says) the Hadoop code illegally overrides appendFile() with slightly different generic types in its return value. This code is valid Java, evidently, and the code actually doesn't define any generic types, so I even wonder if it's a scalac bug. So far I don't see a workaround for this.

This only affects the Hadoop 3.2 build, in that it comes up with respect to a method new in Hadoop 3. (There is actually another instance of a similar problem that affects Hadoop 2, but I can see a tiny hack workaround for it.)
[jira] [Commented] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988301#comment-16988301 ]

Yuanjian Li commented on SPARK-30125:
-------------------------------------

[~aman_omer] Sorry, my PR was a little delayed by fixing related UTs; you can still help by reviewing.

> Remove PostgreSQL dialect
> -------------------------
>
> Key: SPARK-30125
> URL: https://issues.apache.org/jira/browse/SPARK-30125
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuanjian Li
> Priority: Major
>
> As discussed in [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], we need to remove the PostgreSQL dialect from the code base for several reasons:
> 1. The current approach makes the codebase complicated and hard to maintain.
> 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now.
>
> Currently we have 3 features under the PostgreSQL dialect:
> 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, ... are also allowed as true strings.
> 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard behavior), but returns an int in PostgreSQL.
> 3. SPARK-28395: `int / int` returns a double in Spark, but returns an int in PostgreSQL (there is no standard).
[jira] [Assigned] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs
[ https://issues.apache.org/jira/browse/SPARK-30084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen reassigned SPARK-30084:
------------------------------------
    Assignee: Nicholas Chammas

> Add docs showing how to automatically rebuild Python API docs
> -------------------------------------------------------------
>
> Key: SPARK-30084
> URL: https://issues.apache.org/jira/browse/SPARK-30084
> Project: Spark
> Issue Type: Improvement
> Components: Build, Documentation
> Affects Versions: 3.0.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
>
> `jekyll serve --watch` doesn't watch the API docs. That means you have to
> kill and restart jekyll every time you update your API docs, just to see the
> effect.
[jira] [Resolved] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs
[ https://issues.apache.org/jira/browse/SPARK-30084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean R. Owen resolved SPARK-30084.
----------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26719
[https://github.com/apache/spark/pull/26719]

> Add docs showing how to automatically rebuild Python API docs
> -------------------------------------------------------------
>
> Key: SPARK-30084
> URL: https://issues.apache.org/jira/browse/SPARK-30084
> Project: Spark
> Issue Type: Improvement
> Components: Build, Documentation
> Affects Versions: 3.0.0
> Reporter: Nicholas Chammas
> Assignee: Nicholas Chammas
> Priority: Minor
> Fix For: 3.0.0
>
> `jekyll serve --watch` doesn't watch the API docs. That means you have to
> kill and restart jekyll every time you update your API docs, just to see the
> effect.
[jira] [Commented] (SPARK-30131) Add array_median function
[ https://issues.apache.org/jira/browse/SPARK-30131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988289#comment-16988289 ]

Alexander Hagerf commented on SPARK-30131:
------------------------------------------

Created a PR for this issue: [https://github.com/apache/spark/pull/26762]

> Add array_median function
> -------------------------
>
> Key: SPARK-30131
> URL: https://issues.apache.org/jira/browse/SPARK-30131
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Alexander Hagerf
> Priority: Minor
> Fix For: 3.0.0
>
> It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, finding the median of an array should be a simple task, and something that users can utilize when collecting numeric values into a list or set.
>
> This can already be achieved by sorting the array and choosing the middle element, but that can get cumbersome, and if a fully tested function is provided in the API, I think it can save some headache for many.
[jira] [Updated] (SPARK-30131) Add array_median function
[ https://issues.apache.org/jira/browse/SPARK-30131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alexander Hagerf updated SPARK-30131:
-------------------------------------
    Description:
It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.

This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.

was:
It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.

This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.

> Add array_median function
> -------------------------
>
> Key: SPARK-30131
> URL: https://issues.apache.org/jira/browse/SPARK-30131
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 2.4.4
> Reporter: Alexander Hagerf
> Priority: Minor
> Fix For: 3.0.0
>
> It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.
> This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.
[jira] [Created] (SPARK-30131) Add array_median function
Alexander Hagerf created SPARK-30131:
----------------------------------------

Summary: Add array_median function
Key: SPARK-30131
URL: https://issues.apache.org/jira/browse/SPARK-30131
Project: Spark
Issue Type: New Feature
Components: SQL
Affects Versions: 2.4.4
Reporter: Alexander Hagerf
Fix For: 3.0.0

It is known that there isn't any exact median function in Spark SQL, and this might be a difficult problem to solve efficiently. However, to find the median for an array should be a simple task, and something that users can utilize when collecting numeric values to a list or set.

This can already be achieved by using sorting and choosing element, but can get cumbersome and if a fully tested function is provided in the API, I think it can solve some headache for many.
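The workaround the issue describes - sort the collected array, then pick the middle element - can be sketched in plain Python. This is an illustration of the logic only, not the proposed Spark function or its API; the function name `array_median` is borrowed from the issue title.

```python
# Sketch of the sort-and-select workaround for an exact array median.
# Plain Python, not Spark code.

def array_median(values):
    """Exact median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty array is undefined")
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd length: the middle element
    return (s[mid - 1] + s[mid]) / 2     # even length: mean of the two middle elements
```

In Spark SQL the same odd-length selection can be expressed with the existing `sort_array` and `element_at` functions over a collected list, though the even-length averaging needs an extra expression - which is the "cumbersome" part the issue wants a built-in to replace.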
[jira] [Reopened] (SPARK-29640) [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver
[ https://issues.apache.org/jira/browse/SPARK-29640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Grove reopened SPARK-29640:
--------------------------------

Re-opening this as the issue came back and is ongoing. I am exploring other solutions and will post here if/when I find a solution that works.

> [K8S] Intermittent "java.net.UnknownHostException: kubernetes.default.svc" in Spark driver
> ------------------------------------------------------------------------------------------
>
> Key: SPARK-29640
> URL: https://issues.apache.org/jira/browse/SPARK-29640
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes
> Affects Versions: 2.4.4
> Reporter: Andy Grove
> Priority: Major
>
> We are running into intermittent DNS issues where the Spark driver fails to resolve "kubernetes.default.svc" when trying to create executors. We are running Spark 2.4.4 (with the patch for SPARK-28921) in cluster mode in EKS. This happens approximately 10% of the time.
> Here is the stack trace:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2794)
>     at org.apache.spark.SparkContext.<init>(SparkContext.scala:493)
>     at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
>     at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
>     at scala.Option.getOrElse(Option.scala:121)
>     at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
>     at com.rms.execution.test.SparkPiTask$.main(SparkPiTask.scala:36)
>     at com.rms.execution.test.SparkPiTask.main(SparkPiTask.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>     at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
>     at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>     at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>     at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>     at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>     at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>     at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [wf-5-69674f15d0fc45-1571354060179-driver] in namespace: [tenant-8-workflows] failed.
>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
>     at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:229)
>     at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:162)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:57)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator$$anonfun$1.apply(ExecutorPodsAllocator.scala:55)
>     at scala.Option.map(Option.scala:146)
>     at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:55)
>     at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:89)
>     at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2788)
>     ... 20 more
> Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again
>     at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>     at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
>     at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
>     at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1193)
>     at java.net.InetAddress.getAllByName(InetAddress.java:1127)
>     at okhttp3.Dns$1.lookup(Dns.java:39)
>     at
[jira] [Commented] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988168#comment-16988168 ] Ankit Raj Boudh commented on SPARK-30130: - I will check this issue > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. > {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
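The behavior reported above is presumably driven by the `spark.sql.groupByOrdinal` configuration (true by default), under which Spark resolves integer literals in GROUP BY as column positions. A minimal workaround sketch, not an official fix: the helper name and the table name `t` are made up for illustration (the original snippet omits a FROM clause), and it assumes an active SparkSession.

```python
def run_cte_with_constant_group_key(spark):
    """Workaround sketch for SPARK-30130 (hypothetical helper, untested on a
    live cluster). With spark.sql.groupByOrdinal at its default (true), the
    literal 0 in GROUP BY is resolved as an ordinal position inside the CTE.
    Disabling the flag makes Spark treat it as a plain constant instead.
    The table name `t` is a placeholder."""
    spark.conf.set("spark.sql.groupByOrdinal", "false")
    return spark.sql(
        "with a as (select 0 as test, count(*) as cnt from t group by test) "
        "select * from a"
    )
```

This trades away ordinal GROUP BY for the whole session, so it is only appropriate when queries do not rely on positional grouping.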
[jira] [Resolved] (SPARK-29453) Improve tooltip information for SQL tab
[ https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-29453. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 26641 [https://github.com/apache/spark/pull/26641] > Improve tooltip information for SQL tab > --- > > Key: SPARK-29453 > URL: https://issues.apache.org/jira/browse/SPARK-29453 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Assignee: Ankit Raj Boudh >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-29453) Improve tooltip information for SQL tab
[ https://issues.apache.org/jira/browse/SPARK-29453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-29453: Assignee: Ankit Raj Boudh > Improve tooltip information for SQL tab > --- > > Key: SPARK-29453 > URL: https://issues.apache.org/jira/browse/SPARK-29453 > Project: Spark > Issue Type: Sub-task > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Sandeep Katta >Assignee: Ankit Raj Boudh >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30118) ALTER VIEW QUERY does not work
[ https://issues.apache.org/jira/browse/SPARK-30118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988078#comment-16988078 ] John Zhuge commented on SPARK-30118: [~cltlfcjin] Thanks for the comment. Do you know which commit fixed the issue? > ALTER VIEW QUERY does not work > -- > > Key: SPARK-30118 > URL: https://issues.apache.org/jira/browse/SPARK-30118 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: John Zhuge >Priority: Major > > `ALTER VIEW AS` does not change view query. It leaves the view in a corrupted > state. > {code:sql} > spark-sql> CREATE VIEW jzhuge.v1 AS SELECT 'foo' foo1; > spark-sql> SHOW CREATE TABLE jzhuge.v1; > CREATE VIEW `jzhuge`.`v1`(foo1) AS > SELECT 'foo' foo1 > spark-sql> ALTER VIEW jzhuge.v1 AS SELECT 'foo' foo2; > spark-sql> SHOW CREATE TABLE jzhuge.v1; > CREATE VIEW `jzhuge`.`v1`(foo1) AS > SELECT 'foo' foo1 > spark-sql> TABLE jzhuge.v1; > Error in query: Attribute with name 'foo2' is not found in '(foo1)';; > SubqueryAlias `jzhuge`.`v1` > +- View (`jzhuge`.`v1`, [foo1#33]) >+- Project [foo AS foo1#34] > +- OneRowRelation > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
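Until the bug is fixed, one possible client-side workaround is to recreate the view rather than alter it, since `CREATE OR REPLACE VIEW` rewrites both the stored query and the column list in one step. A hedged sketch (the helper is hypothetical, not a Spark API; the names mirror the repro above):

```python
def replace_view(spark, view_name, query):
    """Workaround sketch for SPARK-30118 (an assumption, not the project's fix):
    recreate the view with CREATE OR REPLACE VIEW instead of ALTER VIEW ... AS,
    so the stored column list cannot drift out of sync with the new query."""
    spark.sql("CREATE OR REPLACE VIEW {} AS {}".format(view_name, query))
```

For the repro above, that would be `replace_view(spark, "jzhuge.v1", "SELECT 'foo' foo2")` in place of the failing ALTER VIEW statement.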
[jira] [Updated] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
[ https://issues.apache.org/jira/browse/SPARK-30130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Boegner updated SPARK-30130: - Description: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. {code:java} val df = spark.sql(""" with a as (select 0 as test, count(*) group by test) select * from a """) df.show(){code} This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: {code:java} val df = spark.sql(""" select * from (select 0 as test, count(*) group by test) a """) df.show(){code} was: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. val df = spark.sql(""" with a as (select 0 as test, count(*) group by test) select * from a """) df.show() This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: val df = spark.sql(""" select * from (select 0 as test, count(*) group by test) a """) df.show() > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions > > > Key: SPARK-30130 > URL: https://issues.apache.org/jira/browse/SPARK-30130 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.4 >Reporter: Matt Boegner >Priority: Minor > > Hardcoded numeric values in common table expressions which utilize GROUP BY > are interpreted as ordinal positions. 
> {code:java} > val df = spark.sql(""" > with a as (select 0 as test, count(*) group by test) > select * from a > """) > df.show(){code} > This results in an error message like {color:#e01e5a}GROUP BY position 0 is > not in select list (valid range is [1, 2]){color} . > > However, this error does not appear in a traditional subselect format. For > example, this query executes correctly: > {code:java} > val df = spark.sql(""" > select * from (select 0 as test, count(*) group by test) a > """) > df.show(){code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30130) Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions
Matt Boegner created SPARK-30130: Summary: Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions Key: SPARK-30130 URL: https://issues.apache.org/jira/browse/SPARK-30130 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.4 Reporter: Matt Boegner Hardcoded numeric values in common table expressions which utilize GROUP BY are interpreted as ordinal positions. val df = spark.sql(""" with a as (select 0 as test, count(*) group by test) select * from a """) df.show() This results in an error message like {color:#e01e5a}GROUP BY position 0 is not in select list (valid range is [1, 2]){color} . However, this error does not appear in a traditional subselect format. For example, this query executes correctly: val df = spark.sql(""" select * from (select 0 as test, count(*) group by test) a """) df.show() -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30129) New auth engine does not keep client ID in TransportClient after auth
Marcelo Masiero Vanzin created SPARK-30129: -- Summary: New auth engine does not keep client ID in TransportClient after auth Key: SPARK-30129 URL: https://issues.apache.org/jira/browse/SPARK-30129 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.4, 3.0.0 Reporter: Marcelo Masiero Vanzin Found a little bug when working on a feature; when auth is on, it's expected that the {{TransportClient}} provides the authenticated ID of the client (generally the app ID), but the new auth engine is not setting that information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988032#comment-16988032 ] Aman Omer commented on SPARK-30125: --- I would like to take this work. > Remove PostgreSQL dialect > - > > Key: SPARK-30125 > URL: https://issues.apache.org/jira/browse/SPARK-30125 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > As discussed in > [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], > we need to remove the PostgreSQL dialect from the code base for several reasons: > 1. The current approach makes the codebase complicated and hard to maintain. > 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. > > Currently we have 3 features under the PostgreSQL dialect: > 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. > are also allowed as true strings. > 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard > behavior), but returns int in PostgreSQL > 3. SPARK-28395: `int / int` returns double in Spark, but returns int in > PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-21488) Make saveAsTable() and createOrReplaceTempView() return dataframe of created table/ created view
[ https://issues.apache.org/jira/browse/SPARK-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruslan Dautkhanov reopened SPARK-21488: --- > Make saveAsTable() and createOrReplaceTempView() return dataframe of created > table/ created view > > > Key: SPARK-21488 > URL: https://issues.apache.org/jira/browse/SPARK-21488 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Ruslan Dautkhanov >Priority: Minor > Labels: bulk-closed > > It would be great to make saveAsTable() return the dataframe of the created table, > so you could pipe the result further, for example > {code} > mv_table_df = (sqlc.sql(''' > SELECT ... > FROM > ''') > .write.format("parquet").mode("overwrite") > .saveAsTable('test.parquet_table') > .createOrReplaceTempView('mv_table') > ) > {code} > As of now, the above code expectedly fails with: > {noformat} > AttributeError: 'NoneType' object has no attribute 'createOrReplaceTempView' > {noformat} > If this is implemented, we can skip a step like > {code} > sqlc.sql('SELECT * FROM > test.parquet_table').createOrReplaceTempView('mv_table') > {code} > We have this pattern very frequently. > A further improvement can be made if createOrReplaceTempView also returned a > dataframe object, so that in one pipeline of functions > we can > - create an external table > - create a dataframe reference to this newly created table for Spark SQL and as a > Spark variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-21488) Make saveAsTable() and createOrReplaceTempView() return dataframe of created table/ created view
[ https://issues.apache.org/jira/browse/SPARK-21488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988011#comment-16988011 ] Ruslan Dautkhanov commented on SPARK-21488: --- [~zsxwing] Any chance this can be added to Spark 3.0? I can try to create a PR for this; many of our users still report this as relevant, since it would streamline their code in many places. We always have a good mix of Spark SQL and Spark API calls, and this would be a huge win for code readability. > Make saveAsTable() and createOrReplaceTempView() return dataframe of created > table/ created view > > > Key: SPARK-21488 > URL: https://issues.apache.org/jira/browse/SPARK-21488 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Ruslan Dautkhanov >Priority: Minor > Labels: bulk-closed > > It would be great to make saveAsTable() return the dataframe of the created table, > so you could pipe the result further, for example > {code} > mv_table_df = (sqlc.sql(''' > SELECT ... > FROM > ''') > .write.format("parquet").mode("overwrite") > .saveAsTable('test.parquet_table') > .createOrReplaceTempView('mv_table') > ) > {code} > As of now, the above code expectedly fails with: > {noformat} > AttributeError: 'NoneType' object has no attribute 'createOrReplaceTempView' > {noformat} > If this is implemented, we can skip a step like > {code} > sqlc.sql('SELECT * FROM > test.parquet_table').createOrReplaceTempView('mv_table') > {code} > We have this pattern very frequently. > A further improvement can be made if createOrReplaceTempView also returned a > dataframe object, so that in one pipeline of functions > we can > - create an external table > - create a dataframe reference to this newly created table for Spark SQL and as a > Spark variable. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
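While saveAsTable() returns None today, the chaining the reporter wants can be approximated with a small helper. This is a sketch under assumptions, not a Spark API: the helper name is made up, and it assumes the DataFrame exposes its session as `df.sparkSession` (PySpark 3.x; older versions may need `df.sql_ctx` instead).

```python
def save_as_table_and_view(df, table_name, view_name):
    """Sketch of the requested chaining (hypothetical helper, not a Spark API).

    saveAsTable() returns None, so we re-read the freshly written table
    through the session, register the temp view on that new DataFrame, and
    return it for further piping."""
    df.write.format("parquet").mode("overwrite").saveAsTable(table_name)
    new_df = df.sparkSession.table(table_name)  # re-read the created table
    new_df.createOrReplaceTempView(view_name)
    return new_df
```

This is essentially the two-step workaround the issue already describes, packaged so call sites keep the one-pipeline shape the reporter asks for.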
[jira] [Commented] (SPARK-30126) sparkContext.addFile fails when file path contains spaces
[ https://issues.apache.org/jira/browse/SPARK-30126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988001#comment-16988001 ] Ankit Raj Boudh commented on SPARK-30126: - I will check this issue > sparkContext.addFile fails when file path contains spaces > - > > Key: SPARK-30126 > URL: https://issues.apache.org/jira/browse/SPARK-30126 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: Jan >Priority: Minor > > When uploading a file to the spark context via the addFile function, an > exception is thrown when the file path contains a space character. Escaping the > space with %20 or \\ or + doesn't change the result. > > to reproduce: > file_path = "/home/user/test dir/config.conf" > sparkContext.addFile(file_path) > > results in: > py4j.protocol.Py4JJavaError: An error occurred while calling o131.addFile. > : java.io.FileNotFoundException: File file:/home/user/test%20dir/config.conf > does not exist -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark
[ https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987999#comment-16987999 ] Ruslan Dautkhanov commented on SPARK-19842: --- From the design document """ This alternative proposes to use the KEY_CONSTRAINTS catalog table when Spark upgrades to Hive 2.1. Therefore, this proposal will introduce a dependency on Hive metastore 2.1. """ It seems Spark 3.0 is moving towards Hive 2.1, which has FK support. Would it be possible to add FKs and related optimizations to Spark 3.0 too? Thanks! > Informational Referential Integrity Constraints Support in Spark > > > Key: SPARK-19842 > URL: https://issues.apache.org/jira/browse/SPARK-19842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.2.0 >Reporter: Ioana Delaney >Priority: Major > Attachments: InformationalRIConstraints.doc > > > *Informational Referential Integrity Constraints Support in Spark* > This work proposes support for _informational primary key_ and _foreign key > (referential integrity) constraints_ in Spark. The main purpose is to open up > an area of query optimization techniques that rely on referential integrity > constraints semantics. > An _informational_ or _statistical constraint_ is a constraint such as a > _unique_, _primary key_, _foreign key_, or _check constraint_, that can be > used by Spark to improve query performance. Informational constraints are not > enforced by the Spark SQL engine; rather, they are used by Catalyst to > optimize the query processing. They provide semantic information that allows > Catalyst to rewrite queries to eliminate joins, push down aggregates, remove > unnecessary Distinct operations, and perform a number of other optimizations. > Informational constraints are primarily targeted to applications that load > and analyze data that originated from a data warehouse. 
For such > applications, the conditions for a given constraint are known to be true, so > the constraint does not need to be enforced during data load operations. > The attached document covers constraint definition, metastore storage, > constraint validation, and maintenance. The document shows many examples of > query performance improvements that utilize referential integrity constraints > and can be implemented in Spark. > Link to the google doc: > [InformationalRIConstraints|https://docs.google.com/document/d/17r-cOqbKF7Px0xb9L7krKg2-RQB_gD2pxOmklm-ehsw/edit] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs
[ https://issues.apache.org/jira/browse/SPARK-30128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-30128: - Description: Following on to SPARK-29903 and similar issues (linked), there are options available to the DataFrameReader for certain source formats, but which are not exposed properly in the relevant APIs. These options include `timeZone` and `pathGlobFilter`. Instead of being noted under [the option() method|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.option], they should be implemented directly into load APIs that support them. was: Following on to SPARK-27990 and similar issues (linked), there are options available to the DataFrameReader for certain source formats, but which are not exposed properly in the relevant APIs. These options include `timeZone` and `pathGlobFilter`. > Promote remaining "hidden" PySpark DataFrameReader options to load APIs > --- > > Key: SPARK-30128 > URL: https://issues.apache.org/jira/browse/SPARK-30128 > Project: Spark > Issue Type: Improvement > Components: PySpark, SQL >Affects Versions: 2.4.4, 3.0.0 >Reporter: Nicholas Chammas >Priority: Minor > > Following on to SPARK-29903 and similar issues (linked), there are options > available to the DataFrameReader for certain source formats, but which are > not exposed properly in the relevant APIs. > These options include `timeZone` and `pathGlobFilter`. Instead of being noted > under [the option() > method|https://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.option], > they should be implemented directly into load APIs that support them. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30128) Promote remaining "hidden" PySpark DataFrameReader options to load APIs
Nicholas Chammas created SPARK-30128: Summary: Promote remaining "hidden" PySpark DataFrameReader options to load APIs Key: SPARK-30128 URL: https://issues.apache.org/jira/browse/SPARK-30128 Project: Spark Issue Type: Improvement Components: PySpark, SQL Affects Versions: 2.4.4, 3.0.0 Reporter: Nicholas Chammas Following on to SPARK-27990 and similar issues (linked), there are options available to the DataFrameReader for certain source formats, but which are not exposed properly in the relevant APIs. These options include `timeZone` and `pathGlobFilter`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30127) UDF should work for case class like Dataset operations
Wenchen Fan created SPARK-30127: --- Summary: UDF should work for case class like Dataset operations Key: SPARK-30127 URL: https://issues.apache.org/jira/browse/SPARK-30127 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.0.0 Reporter: Wenchen Fan Currently, Spark UDF can only work on data types like java.lang.String, o.a.s.sql.Row, Seq[_], etc. This is inconvenient if you want to apply an operation on one column, and the column is struct type. You must access data from a Row object, instead of your domain object like Dataset operations. It will be great if UDF can work on types that are supported by Dataset, e.g. case classes. Note that, there are multiple ways to register a UDF, and it's only possible to support this feature if the UDF is registered using Scala API that provides type tag, e.g. `def udf[RT: TypeTag, A1: TypeTag](f: Function1[A1, RT])` -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30126) sparkContext.addFile fails when file path contains spaces
Jan created SPARK-30126: --- Summary: sparkContext.addFile fails when file path contains spaces Key: SPARK-30126 URL: https://issues.apache.org/jira/browse/SPARK-30126 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 2.4.3 Reporter: Jan When uploading a file to the spark context via the addFile function, an exception is thrown when file path contains a space character. Escaping the space with %20 or \\ or + doesn't change the result. to reproduce: file_path = "/home/user/test dir/config.conf" sparkContext.addFile(file_path) results in: py4j.protocol.Py4JJavaError: An error occurred while calling o131.addFile. : java.io.FileNotFoundException: File file:/home/user/test%20dir/config.conf does not exist -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
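Until addFile handles spaces correctly, a possible client-side workaround is to stage such files at a space-free path before handing them to Spark. A hedged sketch: the helper is hypothetical, and it assumes copying is acceptable (reasonable for small config files like the one in the report).

```python
import os
import shutil
import tempfile

def add_file_safe(sc, file_path):
    """Hypothetical workaround sketch for SPARK-30126: copy files whose paths
    contain spaces into a fresh, space-free temp directory before calling
    sc.addFile, sidestepping the %20 mis-encoding described above."""
    if " " in file_path:
        staging_dir = tempfile.mkdtemp(prefix="spark_addfile_")
        safe_name = os.path.basename(file_path).replace(" ", "_")
        safe_path = os.path.join(staging_dir, safe_name)
        shutil.copy(file_path, safe_path)
        file_path = safe_path
    sc.addFile(file_path)
    return file_path
```

For the repro above, `add_file_safe(sparkContext, "/home/user/test dir/config.conf")` would upload a staged copy instead of the original spaced path.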
[jira] [Updated] (SPARK-30125) Remove PostgreSQL dialect
[ https://issues.apache.org/jira/browse/SPARK-30125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanjian Li updated SPARK-30125: Summary: Remove PostgreSQL dialect (was: Remove PostgreSQL dialect to reduce codebase maintenance cost) > Remove PostgreSQL dialect > - > > Key: SPARK-30125 > URL: https://issues.apache.org/jira/browse/SPARK-30125 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuanjian Li >Priority: Major > > As discussed in > [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], > we need to remove the PostgreSQL dialect from the code base for several reasons: > 1. The current approach makes the codebase complicated and hard to maintain. > 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. > > Currently we have 3 features under the PostgreSQL dialect: > 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. > are also allowed as true strings. > 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard > behavior), but returns int in PostgreSQL > 3. SPARK-28395: `int / int` returns double in Spark, but returns int in > PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30125) Remove PostgreSQL dialect to reduce codebase maintenance cost
Yuanjian Li created SPARK-30125: --- Summary: Remove PostgreSQL dialect to reduce codebase maintenance cost Key: SPARK-30125 URL: https://issues.apache.org/jira/browse/SPARK-30125 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: Yuanjian Li As discussed in [http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-PostgreSQL-dialect-td28417.html], we need to remove the PostgreSQL dialect from the code base for several reasons: 1. The current approach makes the codebase complicated and hard to maintain. 2. Fully migrating PostgreSQL workloads to Spark SQL is not our focus for now. Currently we have 3 features under the PostgreSQL dialect: 1. SPARK-27931: when casting string to boolean, `t`, `tr`, `tru`, `yes`, .. are also allowed as true strings. 2. SPARK-29364: `date - date` returns an interval in Spark (SQL standard behavior), but returns int in PostgreSQL 3. SPARK-28395: `int / int` returns double in Spark, but returns int in PostgreSQL. (there is no standard) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30124) unnecessary persist in PythonMLLibAPI.scala
[ https://issues.apache.org/jira/browse/SPARK-30124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Omer updated SPARK-30124: -- Summary: unnecessary persist in PythonMLLibAPI.scala (was: Improper persist in PythonMLLibAPI.scala) > unnecessary persist in PythonMLLibAPI.scala > --- > > Key: SPARK-30124 > URL: https://issues.apache.org/jira/browse/SPARK-30124 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 3.0.0 >Reporter: Aman Omer >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30124) Improper persist in PythonMLLibAPI.scala
Aman Omer created SPARK-30124: - Summary: Improper persist in PythonMLLibAPI.scala Key: SPARK-30124 URL: https://issues.apache.org/jira/browse/SPARK-30124 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 3.0.0 Reporter: Aman Omer -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-30123) PartitionPruning should consider more cases
deshanxiao created SPARK-30123: -- Summary: PartitionPruning should consider more cases Key: SPARK-30123 URL: https://issues.apache.org/jira/browse/SPARK-30123 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.0.0 Reporter: deshanxiao If the left side has a partition table scan and the right side has a partition-pruning filter, but hasBenefit is false, then the right side will never have a subquery inserted. {code:java} var partScan = getPartitionTableScan(l, left) if (partScan.isDefined && canPruneLeft(joinType) && hasPartitionPruningFilter(right)) { val hasBenefit = pruningHasBenefit(l, partScan.get, r, right) newLeft = insertPredicate(l, newLeft, r, right, rightKeys, hasBenefit) } else { partScan = getPartitionTableScan(r, right) if (partScan.isDefined && canPruneRight(joinType) && hasPartitionPruningFilter(left) ) { val hasBenefit = pruningHasBenefit(r, partScan.get, l, left) newRight = insertPredicate(r, newRight, l, left, leftKeys, hasBenefit) } } case _ => } {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30092) Number of active tasks is negative in Live UI Executors page
[ https://issues.apache.org/jira/browse/SPARK-30092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987799#comment-16987799 ] shahid commented on SPARK-30092: [~zhongyu09] Do you have any steps for reproducing the issue? > Number of active tasks is negative in Live UI Executors page > > > Key: SPARK-30092 > URL: https://issues.apache.org/jira/browse/SPARK-30092 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.4.1 > Environment: Hadoop version: 2.7.3 > ResourceManager version: 2.7.3 >Reporter: ZhongYu >Priority: Major > Attachments: wx20191202-102...@2x.png > > > The number of active tasks is negative in the Live UI Executors page when there > are executor losses and task failures. I am using Spark on YARN, running on > AWS spot instances. When a YARN worker is lost, the active task count in the > Spark Live UI frequently goes negative. > The related tickets below were resolved in earlier versions of Spark, but > the same thing happened again in Spark 2.4.1. See attachment. > https://issues.apache.org/jira/browse/SPARK-8560 > https://issues.apache.org/jira/browse/SPARK-10141 > https://issues.apache.org/jira/browse/SPARK-19356 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-30041) Add Codegen Stage Id to Stage DAG visualization in Web UI
[ https://issues.apache.org/jira/browse/SPARK-30041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-30041: Attachment: (was: Snippet_StagesDags_with_CodegenId.png) > Add Codegen Stage Id to Stage DAG visualization in Web UI > - > > Key: SPARK-30041 > URL: https://issues.apache.org/jira/browse/SPARK-30041 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > Attachments: Snippet_StagesDags_with_CodegenId _annotated.png > > > SPARK-29894 provides information on the Codegen Stage Id in WEBUI for SQL > Plan graphs. Similarly, this proposes to add Codegen Stage Id in the DAG > visualization for Stage execution. DAGs for Stage execution are available in > the WEBUI under the Jobs and Stages tabs. > This is proposed as an aid for drill-down analysis of complex SQL statement > execution, as it is not always easy to match parts of the SQL Plan graph with > the corresponding Stage DAG execution graph. Adding Codegen Stage Id for > WholeStageCodegen operations makes this task easier. >
[jira] [Updated] (SPARK-30041) Add Codegen Stage Id to Stage DAG visualization in Web UI
[ https://issues.apache.org/jira/browse/SPARK-30041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luca Canali updated SPARK-30041: Attachment: Snippet_StagesDags_with_CodegenId _annotated.png > Add Codegen Stage Id to Stage DAG visualization in Web UI > - > > Key: SPARK-30041 > URL: https://issues.apache.org/jira/browse/SPARK-30041 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Luca Canali >Priority: Minor > Attachments: Snippet_StagesDags_with_CodegenId _annotated.png, > Snippet_StagesDags_with_CodegenId.png > > > SPARK-29894 provides information on the Codegen Stage Id in WEBUI for SQL > Plan graphs. Similarly, this proposes to add Codegen Stage Id in the DAG > visualization for Stage execution. DAGs for Stage execution are available in > the WEBUI under the Jobs and Stages tabs. > This is proposed as an aid for drill-down analysis of complex SQL statement > execution, as it is not always easy to match parts of the SQL Plan graph with > the corresponding Stage DAG execution graph. Adding Codegen Stage Id for > WholeStageCodegen operations makes this task easier. >
[jira] [Created] (SPARK-30122) Allow setting serviceAccountName for executor pods
Juho Mäkinen created SPARK-30122: Summary: Allow setting serviceAccountName for executor pods Key: SPARK-30122 URL: https://issues.apache.org/jira/browse/SPARK-30122 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 2.4.4 Reporter: Juho Mäkinen Currently it doesn't seem to be possible to have the Spark driver set the serviceAccountName for the executor pods it launches. There is a "spark.kubernetes.authenticate.driver.serviceAccountName" property, so naturally one would expect a similar "spark.kubernetes.authenticate.executor.serviceAccountName" property, but no such property exists.
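For context, a hedged sketch of what the request would look like on the command line. The driver-side property below is real; the executor-side property is precisely the one this report asks for and does not exist as of 2.4.4. The master URL and jar path are placeholders.
{code}
spark-submit \
  --master k8s://https://kube-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark \
  local:///opt/spark/examples/jars/spark-examples.jar
{code}
Without such a property, executor pods fall back to the namespace's default service account, which may lack the RBAC permissions the workload needs.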
[jira] [Issue Comment Deleted] (SPARK-29591) Support data insertion in a different order if you wish or even omit some columns in spark sql also like postgresql
[ https://issues.apache.org/jira/browse/SPARK-29591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Omer updated SPARK-29591: -- Comment: was deleted (was: Thanks for confirming. I will start working on this.) > Support data insertion in a different order if you wish or even omit some > columns in spark sql also like postgresql > --- > > Key: SPARK-29591 > URL: https://issues.apache.org/jira/browse/SPARK-29591 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.4 >Reporter: jobit mathew >Priority: Major > > Support data insertion in a different order, or even omitting some columns, in Spark SQL as PostgreSQL does.
> *In PostgreSQL:*
> {code:java}
> CREATE TABLE weather (
>   city    varchar(80),
>   temp_lo int,   -- low temperature
>   temp_hi int,   -- high temperature
>   prcp    real,  -- precipitation
>   date    date
> );
> {code}
> *You can list the columns in a different order if you wish, or even omit some columns:*
> {code:java}
> INSERT INTO weather (date, city, temp_hi, temp_lo)
> VALUES ('1994-11-29', 'Hayward', 54, 37);
> {code}
> *Spark SQL:*
> Spark SQL does not allow inserting data in a different order or omitting any column. It would be better to support this, since it saves effort when a column's value cannot be predicted or when some value is always fixed.
> {code:java}
> create table jobit(id int,name string);
> insert into jobit values(1,"Ankit");
> Time taken: 0.548 seconds
> spark-sql> insert into jobit (id) values(1);
> Error in query:
> mismatched input 'id' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
>
> == SQL ==
> insert into jobit (id) values(1)
> ---^^^
>
> spark-sql> insert into jobit (name,id) values("Ankit",1);
> Error in query:
> mismatched input 'name' expecting \{'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 19)
>
> == SQL ==
> insert into jobit (name,id) values("Ankit",1)
> ---^^^
>
> spark-sql>
> {code}
>
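Until column lists are supported, one hedged workaround in Spark SQL (2.4.x) is to supply every column positionally and fill the omitted ones with NULL, or to route reordered values through a SELECT; the jobit table is the one from the report above.
{code:java}
-- Workaround sketch: list all columns in table order, NULL for omitted ones.
-- Positional equivalent of PostgreSQL's "INSERT INTO jobit (id) VALUES (1)":
INSERT INTO jobit VALUES (1, CAST(NULL AS STRING));

-- Reordered data can go through a SELECT instead of a column list:
INSERT INTO jobit SELECT 1 AS id, 'Ankit' AS name;
{code}
This puts the burden of matching the table's column order on the query author, which is exactly the friction the report asks to remove.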
[jira] [Created] (SPARK-30121) Miss default memory setting for sbt usage
Kent Yao created SPARK-30121: Summary: Miss default memory setting for sbt usage Key: SPARK-30121 URL: https://issues.apache.org/jira/browse/SPARK-30121 Project: Spark Issue Type: Bug Components: Build Affects Versions: 2.4.4, 3.0.0 Reporter: Kent Yao The sbt usage text prints an empty default for the -mem option:
{code:java}
-mem    set memory options (default: , which is -Xms2048m -Xmx2048m -XX:ReservedCodeCacheSize=256m)
{code}
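As a hedged workaround while the printed default is empty, the memory size can be passed explicitly; the sbt launcher's -mem flag takes a value in megabytes, and the target name here is only an example.
{code}
# Pass the JVM heap explicitly instead of relying on the (empty) default.
./build/sbt -mem 4096 package
{code}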
[jira] [Commented] (SPARK-29667) implicitly convert mismatched datatypes on right side of "IN" operator
[ https://issues.apache.org/jira/browse/SPARK-29667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987712#comment-16987712 ] Marco Gaido commented on SPARK-29667: - I can't agree more with you [~hyukjin.kwon]. I think that having different coercion rules for the two types of IN is very confusing. It'd be great for such things to be consistent across the whole framework in order to avoid "surprises" for users, IMHO. > implicitly convert mismatched datatypes on right side of "IN" operator > -- > > Key: SPARK-29667 > URL: https://issues.apache.org/jira/browse/SPARK-29667 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.3 >Reporter: Jessie Lin >Priority: Minor > > Ran into an error on this SQL. Mismatched columns: > {code} > [(a.`id`:decimal(28,0), db1.table1.`id`:decimal(18,0))] > {code} > The AND clause: > {code} > AND a.id in (select id from db1.table1 where col1 = 1 group by id) > {code} > Once I cast {{decimal(18,0)}} to {{decimal(28,0)}} explicitly above, the SQL > ran just fine. Can the SQL engine cast implicitly in this case?
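The explicit-cast workaround described above looks roughly like this; table and column names come from the report, but the exact rewritten query is an assumption.
{code}
-- Widen the narrower side so both sides of the IN are decimal(28,0).
AND a.id IN (SELECT CAST(id AS DECIMAL(28,0)) FROM db1.table1
             WHERE col1 = 1 GROUP BY id)
{code}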
[jira] [Resolved] (SPARK-29914) ML models append metadata in `transform`/`transformSchema`
[ https://issues.apache.org/jira/browse/SPARK-29914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-29914. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 26547 [https://github.com/apache/spark/pull/26547] > ML models append metadata in `transform`/`transformSchema` > -- > > Key: SPARK-29914 > URL: https://issues.apache.org/jira/browse/SPARK-29914 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Fix For: 3.1.0 > > > There are many impls (like > `Binarizer`/`Bucketizer`/`VectorAssembler`/`OneHotEncoder`/`FeatureHasher`/`HashingTF`/`VectorSlicer`/...) > in `.ml` that append appropriate metadata in the `transform`/`transformSchema` > methods. > However, there are also many impls that return no metadata during transformation, even > though some metadata like `vector.size`/`numAttrs`/`attrs` can be easily inferred.
[jira] [Assigned] (SPARK-29914) ML models append metadata in `transform`/`transformSchema`
[ https://issues.apache.org/jira/browse/SPARK-29914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng reassigned SPARK-29914: Assignee: zhengruifeng > ML models append metadata in `transform`/`transformSchema` > -- > > Key: SPARK-29914 > URL: https://issues.apache.org/jira/browse/SPARK-29914 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.0.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > > There are many impls (like > `Binarizer`/`Bucketizer`/`VectorAssembler`/`OneHotEncoder`/`FeatureHasher`/`HashingTF`/`VectorSlicer`/...) > in `.ml` that append appropriate metadata in the `transform`/`transformSchema` > methods. > However, there are also many impls that return no metadata during transformation, even > though some metadata like `vector.size`/`numAttrs`/`attrs` can be easily inferred.
[jira] [Commented] (SPARK-18886) Delay scheduling should not delay some executors indefinitely if one task is scheduled before delay timeout
[ https://issues.apache.org/jira/browse/SPARK-18886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16987633#comment-16987633 ] Nicholas Brett Marcott commented on SPARK-18886: Thanks for mentioning the PRs here. My proposed solution in the second [PR mentioned above|https://github.com/apache/spark/pull/26696] is what I believe Kay said was ideal in the comments of this [PR|https://github.com/apache/spark/pull/9433], but seemed to think was impractical. *The proposed solution:* Currently the time window that locality wait times measure is the time since the last task launched for a TSM. The proposed change is to instead measure the time since this TSM's available slots were fully utilized. The number of available slots for a TSM can be determined by dividing all slots among the TSMs according to the scheduling policy (FIFO vs FAIR). *Other possible solutions and their issues:* # Never reset timer: delay scheduling would likely only work on the first wave* # Per slot timer: delay scheduling should apply per task/taskset; otherwise, timers started by one taskset could cause delay scheduling to be ignored for the next taskset, which might lead you to try approach #3 # Per slot per stage timer: tasks can be starved by being offered unique slots over a period of time. Possibly a taskset or other job that doesn't care about locality would use those resources. Also, too many timers/bookkeeping # Per task timer: you still need a way to distinguish between a task waiting for a slot to become available vs having slots available but not utilizing them (which is what this PR does). Doing this right seems to require this PR plus more timers. *wave = one round of running as many tasks as there are available slots for a taskset. Imagine you have 2 slots and 10 tasks. 
It would take 10 / 2 = 5 waves to complete the taskset > Delay scheduling should not delay some executors indefinitely if one task is > scheduled before delay timeout > --- > > Key: SPARK-18886 > URL: https://issues.apache.org/jira/browse/SPARK-18886 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.1.0 >Reporter: Imran Rashid >Priority: Major > > Delay scheduling can introduce an unbounded delay and underutilization of > cluster resources under the following circumstances: > 1. Tasks have locality preferences for a subset of available resources > 2. Tasks finish in less time than the delay scheduling timeout. > Instead of having *one* delay to wait for resources with better locality, > Spark waits indefinitely. > As an example, consider a cluster with 100 executors, and a taskset with 500 > tasks. Say all tasks have a preference for one executor, which is by itself > on one host. Given the default locality wait of 3s per level, we end up with > a 6s delay till we schedule on other hosts (process wait + host wait). > If each task takes 5 seconds (under the 6 second delay), then _all 500_ tasks > get scheduled on _only one_ executor. This means you're only using 1% of > your cluster, and you get a ~100x slowdown. You'd actually be better off if > tasks took 7 seconds. > *WORKAROUNDS*: > (1) You can change the locality wait times so that they are shorter than the > task execution time. You need to take into account the sum of all wait times > to use all the resources on your cluster. For example, if you have resources > on different racks, this will include the sum of > "spark.locality.wait.process" + "spark.locality.wait.node" + > "spark.locality.wait.rack". Those each default to "3s". The simplest way would be > to set "spark.locality.wait.process" to your desired wait interval, and > set both "spark.locality.wait.node" and "spark.locality.wait.rack" to "0". 
> For example, if your tasks take ~3 seconds on average, you might set > "spark.locality.wait.process" to "1s". *NOTE*: due to SPARK-18967, avoid > setting the {{spark.locality.wait=0}} -- instead, use > {{spark.locality.wait=1ms}}. > Note that this workaround isn't perfect -- with less delay scheduling, you may > not get as good resource locality. After this issue is fixed, you'd most > likely want to undo these configuration changes. > (2) The worst case here will only happen if your tasks have extreme skew in > their locality preferences. Users may be able to modify their job to > control the distribution of the original input data. > (2a) A shuffle may end up with very skewed locality preferences, especially > if you do a repartition starting from a small number of partitions. (Shuffle > locality preference is assigned if any node has more than 20% of the shuffle > input data -- by chance, you may have one node just above that
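The ~100x slowdown in the issue description above is plain arithmetic; a quick sketch with the numbers taken from that description (100 executors, 500 tasks preferring one executor, 5 s tasks, 3 s locality wait per level):
```python
# Back-of-envelope model of the SPARK-18886 pathology. All numbers come from
# the issue description; this is an illustration, not the scheduler's code.
executors = 100
tasks = 500
task_time_s = 5.0
delay_s = 3.0 + 3.0  # process wait + node/host wait, 3 s each by default

if task_time_s < delay_s:
    # Each task launch resets the delay timer, so the taskset never falls
    # back to other hosts: all 500 tasks run serially on the one executor.
    actual_s = tasks * task_time_s
else:
    # Otherwise the delay expires once and the whole cluster is used.
    actual_s = (tasks / executors) * task_time_s + delay_s

ideal_s = (tasks / executors) * task_time_s  # 5 waves of 5 s on a full cluster
print(actual_s, ideal_s, actual_s / ideal_s)  # 2500.0 25.0 100.0
```
This is why the description notes you would "be better off if tasks took 7 seconds": at 7 s > 6 s the delay expires once and the else-branch applies.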