[jira] [Commented] (SPARK-27055) Update Structured Streaming documentation because of DSv2 changes
[ https://issues.apache.org/jira/browse/SPARK-27055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786451#comment-16786451 ]

Sandeep Katta commented on SPARK-27055:
---------------------------------------
[~gsomogyi] [~cloud_fan] From the code changes of [SPARK-26956|https://issues.apache.org/jira/browse/SPARK-26956], append and complete modes are still supported; only update mode was removed. So in the programming guide only update mode should be marked as unsupported, shouldn't it?

> Update Structured Streaming documentation because of DSv2 changes
> -----------------------------------------------------------------
>
> Key: SPARK-27055
> URL: https://issues.apache.org/jira/browse/SPARK-27055
> Project: Spark
> Issue Type: Task
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Gabor Somogyi
> Priority: Minor
>
> Since SPARK-26956 has been merged, the Structured Streaming documentation
> has to be updated as well to reflect the changes.
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
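If the comment's reading of SPARK-26956 is right, the documentation change reduces to a check like the following. This is an illustrative Python sketch of the claimed behavior, not Spark's actual validation code; the function name and error text are assumptions:

```python
# Illustrative sketch of the output-mode support the comment describes for
# the sinks touched by SPARK-26956: append and complete remain supported,
# and only update should be documented as unsupported.
SUPPORTED_OUTPUT_MODES = {"append", "complete"}

def validate_output_mode(mode: str) -> str:
    """Reject output modes the affected sinks would no longer accept."""
    if mode not in SUPPORTED_OUTPUT_MODES:
        raise ValueError(f"Output mode '{mode}' is not supported")
    return mode
```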
[jira] [Updated] (SPARK-27069) Spark(2.3.1) LDA transformation memory error (java.lang.OutOfMemoryError at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:1232)
[ https://issues.apache.org/jira/browse/SPARK-27069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

TAESUK KIM updated SPARK-27069:
-------------------------------
Summary: Spark(2.3.1) LDA transformation memory error (java.lang.OutOfMemoryError at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:1232)  (was: Spark(2.3.1) LDA transformation memory error (java.lang.OutOfMemoryError at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123))

> Spark(2.3.1) LDA transformation memory error (java.lang.OutOfMemoryError at
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:1232)
> ----------------------------------------------------------------------------
>
> Key: SPARK-27069
> URL: https://issues.apache.org/jira/browse/SPARK-27069
> Project: Spark
> Issue Type: Bug
> Components: ML
> Affects Versions: 2.3.2
> Environment: Below is my environment.
> DataSet
> # Documents: about 100,000,000 --> 10,000,000 --> 1,000,000 (all fail)
> # Words: about 3,553,918 (can't change)
> Spark environment
> # executor-memory, driver-memory: 18G --> 32G --> 64G --> 128G (all fail)
> # executor-core, driver-core: 3
> # spark.serializer: default and org.apache.spark.serializer.KryoSerializer (both fail)
> # spark.executor.memoryOverhead: 18G --> 36G (fail)
> Java version: 1.8.0_191 (Oracle Corporation)
>
> Reporter: TAESUK KIM
> Priority: Major
>
> I trained an LDA model (feature dimension: 100, iterations: 100 or 50, distributed version, ml) using Spark 2.3.2 (emr-5.18.0).
> After that I want to transform a new DataSet with that model, but whenever I transform new data I always get a memory-related error.
> I changed the data size from x 0.1 to x 0.01, but I always get the memory error (java.lang.OutOfMemoryError at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)).
>
> That hugeCapacity error (overflow) happens when the size of an array exceeds Integer.MAX_VALUE - 8. But I reduced the data to a small size, and I can't find why this error still happens.
> I also wanted to switch the serializer to KryoSerializer, but I found that org.apache.spark.util.ClosureCleaner$.ensureSerializable always calls org.apache.spark.serializer.JavaSerializationStream even though I register Kryo classes.
>
> Is there anything I can do?
>
> Below is the code:
>
> {code}
> val countvModel = CountVectorizerModel.load("s3://~/")
> val ldaModel = DistributedLDAModel.load("s3://~/")
> val transformeddata = countvModel.transform(inputData).select("productid", "itemid", "ptkString", "features")
> var featureldaDF = ldaModel.transform(transformeddata).select("productid", "itemid", "topicDistribution", "ptkString").toDF("productid", "itemid", "features", "ptkString")
> featureldaDF = featureldaDF.persist // this is line 328
> {code}
>
> Other testing:
> # Java options: UseParallelGC, UseG1GC (all fail)
>
> Below is the log:
>
> {code}
> 19/03/05 20:59:03 ERROR ApplicationMaster: User class threw exception: java.lang.OutOfMemoryError
> java.lang.OutOfMemoryError
> at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
> at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
> at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
> at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
> at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:342)
> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:335)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:850)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:849)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
> at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:84
> {code}
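For context on the reported stack trace: the OutOfMemoryError at ByteArrayOutputStream.hugeCapacity is not heap exhaustion but the JVM's roughly 2 GB byte-array limit being hit while the closure is Java-serialized, which is why raising executor memory never helps. Below is a rough Python re-creation of the JDK 8 growth logic (32-bit int wraparound emulated by hand; a sketch for illustration, not the JDK source):

```python
# Rough Python re-creation of java.io.ByteArrayOutputStream's growth logic
# (JDK 8), with Java's 32-bit int wraparound emulated by hand. It shows the
# OOM here is the ~2 GB array cap hit during serialization, not heap size.
MAX_ARRAY_SIZE = 2**31 - 1 - 8  # Integer.MAX_VALUE - 8

def to_int32(n):
    """Emulate Java int overflow/wraparound."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n

def huge_capacity(min_capacity):
    # A negative request means the required size overflowed past
    # Integer.MAX_VALUE -> OutOfMemoryError (the line in the stack trace).
    if min_capacity < 0:
        raise MemoryError("java.lang.OutOfMemoryError")
    return (2**31 - 1) if min_capacity > MAX_ARRAY_SIZE else MAX_ARRAY_SIZE

def grow(buf_len, min_capacity):
    """Mirror of grow(): double the buffer, capped at the array limit."""
    new_capacity = to_int32(buf_len << 1)
    if to_int32(new_capacity - min_capacity) < 0:
        new_capacity = min_capacity
    if to_int32(new_capacity - MAX_ARRAY_SIZE) > 0:
        new_capacity = huge_capacity(min_capacity)
    return new_capacity
```

Once the serialized bytes pass Integer.MAX_VALUE, the requested capacity wraps negative and `huge_capacity` raises, regardless of how much heap is configured.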
[jira] [Commented] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions
[ https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786454#comment-16786454 ]

Udbhav Agrawal commented on SPARK-23872:
----------------------------------------
Hi [~chelsa], you have to stop the spark_1 session to clear the active SparkContext; otherwise the new configuration won't be loaded. So call spark_1.stop().

> Can not connect to another metastore uri using two Spark sessions
> -----------------------------------------------------------------
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: OS: CentOS release 6.8 (Final)
> JAVA: build 1.8.0_101-b13
> SPARK: 2.3.0
> Reporter: Chan Min Park
> Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's
> metastore information is used when the second session is run.
>
> ## Run Source Code ##
> {code}
> val spark_1 = SparkSession.builder()
>   .enableHiveSupport()
>   .config("hive.metastore.uris", "thrift://HOST_A:9083")
>   .getOrCreate()
> spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()
>
> SparkSession.clearActiveSession()
> SparkSession.clearDefaultSession()
>
> val spark_2 = SparkSession.builder()
>   .enableHiveSupport()
>   .config("hive.metastore.uris", "thrift://HOST_B:9083")
>   .getOrCreate()
> spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()
> {code}
>
> ## Run info result in Spark 2.1.0 ##
> ...
> INFO metastore: Trying to connect to metastore with URI thrift://*{color:#d04437}HOST_A{color}*:9083
> ...
> INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 0.807905 s
> +-------+
> |A_FIELD|
> +-------+
> |      A|
> +-------+
> ...
> INFO metastore: Trying to connect to metastore with URI thrift://*{color:#d04437}HOST_B{color}*:9083
> INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 0.807905 s
> +-------+
> |B_FIELD|
> +-------+
> |      B|
> +-------+
> ...
>
> ## Run info result in Spark 2.3.0 ##
> ...
> INFO metastore: Trying to connect to metastore with URI thrift://*{color:#d04437}HOST_A{color}*:9083
> ...
> INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 0.807905 s
> +-------+
> |A_FIELD|
> +-------+
> |      A|
> +-------+
> ...
> INFO metastore: Trying to connect to metastore with URI thrift://*{color:#d04437}HOST_A{color}*:9083
> ...
> Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or view not found: `default`.`TABLE_B`; line 1 pos 19;
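The comment's suggestion can be seen in a toy model. This is a deliberately simplified Python sketch (not Spark's implementation; the class and method names are invented for illustration) of why the second builder picks up the first session's metastore, and why stopping the first session helps:

```python
# Toy model (NOT Spark's code) of SparkSession.builder.getOrCreate():
# an already-active session is returned as-is, so Hive/static options set
# on a later builder (like hive.metastore.uris) are silently ignored.
class SparkSessionSketch:
    _active = None  # stands in for the active session / shared context

    def __init__(self, conf):
        self.conf = dict(conf)

    @classmethod
    def get_or_create(cls, conf):
        if cls._active is not None:
            return cls._active  # existing session wins; new conf is ignored
        cls._active = cls(conf)
        return cls._active

    def stop(self):
        SparkSessionSketch._active = None  # what spark_1.stop() clears

spark_1 = SparkSessionSketch.get_or_create({"hive.metastore.uris": "thrift://HOST_A:9083"})
spark_2 = SparkSessionSketch.get_or_create({"hive.metastore.uris": "thrift://HOST_B:9083"})
assert spark_2.conf["hive.metastore.uris"].endswith("HOST_A:9083")  # stale metastore

spark_1.stop()  # the fix suggested in the comment
spark_3 = SparkSessionSketch.get_or_create({"hive.metastore.uris": "thrift://HOST_B:9083"})
assert spark_3.conf["hive.metastore.uris"].endswith("HOST_B:9083")
```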
[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs
[ https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786456#comment-16786456 ]

peay commented on SPARK-24624:
------------------------------
I mean regular aggregation functions and Pandas UDF aggregation functions (i.e., expressions of the form {{.groupBy(key).agg(F.avg("col"), pd_agg_udf("col2"))}}). {{master}} still seems to require aggregation expressions to either all be regular aggregate functions or all be Pandas UDFs: [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L467], right?

> Can not mix vectorized and non-vectorized UDFs
> ----------------------------------------------
>
> Key: SPARK-24624
> URL: https://issues.apache.org/jira/browse/SPARK-24624
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 2.3.1
> Reporter: Xiao Li
> Assignee: Li Jin
> Priority: Major
> Fix For: 2.4.0
>
> In the current implementation we have a limitation: users are unable to mix vectorized and non-vectorized UDFs in the same Project. This becomes worse because our optimizer can combine consecutive Projects into a single one. For example,
> {code}
> applied_df = df.withColumn('regular', my_regular_udf('total', 'qty')).withColumn('pandas', my_pandas_udf('total', 'qty'))
> {code}
> returns the following error:
> {code}
> IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs
> java.lang.IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs
> at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:170)
> at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:146)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.immutable.List.foreach(List.scala:381)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
> at scala.collection.immutable.List.map(List.scala:285)
> at org.apache.spark.sql.execution.python.ExtractPythonUDFs$.org$apache$spark$sql$execution$python$ExtractPythonUDFs$$extract(ExtractPythonUDFs.scala:146)
> at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:118)
> at org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:114)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:311)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
> at org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:114)
> at org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:94)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
> at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
> at scala.collection.immutable.List.foldLeft(List.scala:84)
> at org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:113)
> at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100)
> at org.apache.spark.sql.execution.QueryExecution.executedPla
> {code}
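The restriction the comment points at in SparkStrategies can be sketched in plain Python. This is illustrative pseudologic only, not Spark's planner; the function, the `kind` tags, and the exact error text are assumptions made for the sketch:

```python
# Illustrative pseudologic for the planner restriction: an Aggregate's
# expressions must be either all regular SQL aggregate functions or all
# grouped-agg Pandas UDFs; a mixture fails planning instead of choosing
# between the two execution paths.
def plan_aggregation(agg_exprs):
    """agg_exprs: list of (expression_text, kind), kind in {'sql', 'pandas'}."""
    kinds = {kind for _, kind in agg_exprs}
    if kinds == {"sql", "pandas"}:
        raise ValueError(
            "Cannot use a mixture of aggregate function and group aggregate pandas UDF"
        )
    return "hash-aggregate" if kinds == {"sql"} else "arrow-aggregate"
```

A practical workaround under this restriction is to run the two kinds of aggregation as separate `.agg(...)` calls and join the results on the grouping key.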
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786418#comment-16786418 ]

Jiaxin Shan edited comment on SPARK-26742 at 3/7/19 6:05 AM:
-------------------------------------------------------------
I got some failures. I think they are not related to the Kubernetes cluster version but to some other configuration. Do you have an idea? [~shaneknapp] [~skonto]

{code:java}
[INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @ spark-kubernetes-integration-tests_2.12 ---
Must specify a Spark tarball to build Docker images against with --spark-tgz.
[INFO]
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ................................ SUCCESS [  4.754 s]
[INFO] Spark Project Tags ...................................... SUCCESS [  3.560 s]
[INFO] Spark Project Local DB .................................. SUCCESS [  3.040 s]
[INFO] Spark Project Networking ................................ SUCCESS [  4.559 s]
[INFO] Spark Project Shuffle Streaming Service ................. SUCCESS [  2.559 s]
[INFO] Spark Project Unsafe .................................... SUCCESS [  3.040 s]
[INFO] Spark Project Launcher .................................. SUCCESS [  3.807 s]
[INFO] Spark Project Core ...................................... SUCCESS [ 32.979 s]
[INFO] Spark Project Kubernetes Integration Tests .............. FAILURE [  2.045 s]
[INFO]
[INFO] BUILD FAILURE
[INFO]
[INFO] Total time: 01:00 min
[INFO] Finished at: 2019-03-06T21:58:26-08:00
[INFO]
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (setup-integration-test-env) on project spark-kubernetes-integration-tests_2.12: Command execution failed.: Process exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (setup-integration-test-env) on project spark-kubernetes-integration-tests_2.12: Command execution failed.
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:215)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
    at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:498)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution failed.
    at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:276)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:137)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:210)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:156)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:148)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:56)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
    at org.apache.
{code}
[jira] [Created] (SPARK-27081) Support launching executors in existing Pods
Klaus Ma created SPARK-27081:
--------------------------------
Summary: Support launching executors in existing Pods
Key: SPARK-27081
URL: https://issues.apache.org/jira/browse/SPARK-27081
Project: Spark
Issue Type: Improvement
Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Klaus Ma

Currently, spark-submit (Kubernetes) creates Pods on demand to launch executors. But in our case/enhancement, those Pods, including their Volumes, are already created and ready. So we'd like an option for spark-submit (Kubernetes) to launch executors in existing Pods. /cc @liyinan926
[jira] [Assigned] (SPARK-27080) Read parquet file with merging metastore schema should compare schema field in uniform case.
[ https://issues.apache.org/jira/browse/SPARK-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27080:
------------------------------------
Assignee: Apache Spark

> Read parquet file with merging metastore schema should compare schema field
> in uniform case.
> ----------------------------------------------------------------------------
>
> Key: SPARK-27080
> URL: https://issues.apache.org/jira/browse/SPARK-27080
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.2, 2.3.3, 2.4.0
> Reporter: BoMeng
> Assignee: Apache Spark
> Priority: Major
>
> In our production environment, when we upgraded Spark from version 2.1 to 2.3, a job failed with the exception below:
>
> ---ERROR stack trace---
> Exception occurred when running the job:
> org.apache.spark.SparkException: Detected conflicting schemas when merging the schema obtained from the Hive Metastore with the one inferred from the file format. Metastore schema:
> {
>   "type" : "struct",
>   "fields" : [
>   ..
> }
> Inferred schema:
> {
>   "type" : "struct",
>   "fields" : [
>   ..
> }
> at org.apache.spark.sql.hive.HiveMetastoreCatalog$.mergeWithMetastoreSchema(HiveMetastoreCatalog.scala:295)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)
> at scala.Option.map(Option.scala:146)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:243)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:167)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:156)
> at scala.Option.getOrElse(Option.scala:121)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:156)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:148)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54)
> at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:148)
> at org.apache.spark.sql.hive.RelationConversions.org$apache$spark$sql$hive$RelationConversions$$convert(HiveStrategies.scala:195)
> at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:226)
> at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:215)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:215)
> at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:180)
>
> The followin
[jira] [Assigned] (SPARK-27080) Read parquet file with merging metastore schema should compare schema field in uniform case.
[ https://issues.apache.org/jira/browse/SPARK-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27080:
------------------------------------
Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-27080) Read parquet file with merging metastore schema should compare schema field in uniform case.
BoMeng created SPARK-27080: -- Summary: Read parquet file with merging metastore schema should compare schema field in uniform case. Key: SPARK-27080 URL: https://issues.apache.org/jira/browse/SPARK-27080 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0, 2.3.3, 2.3.2 Reporter: BoMeng In our product environment, when we upgrade spark from version 2.1 to 2.3, the job failed with an exception as below: ---ERROR stack trace – Exception occur when running Job, org.apache.spark.SparkException: Detected conflicting schemas when merging the schema obtained from the Hive Metastore with the one inferred from the file format. Metastore schema: { "type" : "struct", "fields" : [ .. } Inferred schema: { "type" : "struct", "fields" : [ .. } at org.apache.spark.sql.hive.HiveMetastoreCatalog$.mergeWithMetastoreSchema(HiveMetastoreCatalog.scala:295) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:243) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:167) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:156) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:156) at org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:148) at org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54) at org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:148) at 
org.apache.spark.sql.hive.RelationConversions.org$apache$spark$sql$hive$RelationConversions$$convert(HiveStrategies.scala:195) at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:226) at org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:215) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286) at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306) at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187) at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304) at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286) at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:215) at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:180) The following case can trigger the exception, so we think it's a bug in spark2.3 {code:java} // Parquet schema is subset of metaStore schema and has uppercase field name assertResult( StructType(Seq( StructField("UPPERCase", DoubleType, nullable = true), StructField("lowerCase", BinaryType, nullable = true { HiveMetastoreCatalog.mergeWithMetastoreSchema( StructType(Seq( StructField("UPPERCase", DoubleType, nullable = true), StructField("lowerCa
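The underlying mismatch is that the Hive metastore stores field names in lower case while Parquet preserves the original casing, so a name-sensitive comparison reports a conflict for the same column. A minimal, self-contained sketch of matching fields in a uniform case (using a toy Field type, not Spark's actual StructField or HiveMetastoreCatalog API):

```scala
// Simplified stand-in for Catalyst's StructField; illustration only.
case class Field(name: String, dataType: String)

// Match metastore fields (lowercased by Hive) against inferred Parquet
// fields by comparing names in a uniform (lower) case, keeping the
// Parquet-cased name but the metastore's type.
def mergeWithMetastoreSchema(metastore: Seq[Field], inferred: Seq[Field]): Seq[Field] = {
  val inferredByLowerName = inferred.map(f => f.name.toLowerCase -> f).toMap
  metastore.map { m =>
    inferredByLowerName.get(m.name.toLowerCase) match {
      case Some(p) => Field(p.name, m.dataType) // same column, differing case
      case None    => m                         // metastore-only field
    }
  }
}
```

With a case-sensitive comparison, `UPPERCase` (Parquet) vs `uppercase` (metastore) would be flagged as conflicting; the uniform-case lookup above treats them as the same column.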
[jira] [Commented] (SPARK-26879) Inconsistency in default column names for functions like inline and stack
[ https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786401#comment-16786401 ] Chakravarthi commented on SPARK-26879: -- Thanks for reporting. [~jashgala] I would like to work on this issue. > Inconsistency in default column names for functions like inline and stack > - > > Key: SPARK-26879 > URL: https://issues.apache.org/jira/browse/SPARK-26879 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Jash Gala >Priority: Minor > > In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. > 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed > columns). > {code:title=spark-shell|borderStyle=solid} > scala> spark.sql("SELECT stack(2, 1, 2, 3)").show > +++ > |col0|col1| > +++ > | 1| 2| > | 3|null| > +++ > scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, > 'b')))").show > +++ > |col1|col2| > +++ > | 1| a| > | 2| b| > +++ > {code} > This feels like an issue with consistency. As discussed on [PR > #23748|https://github.com/apache/spark/pull/23748], it might be a good idea > to standardize this to something specific (like zero-based indexing) for > these and other similar functions. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26604) Register channel for stream request
[ https://issues.apache.org/jira/browse/SPARK-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786397#comment-16786397 ] Felix Cheung commented on SPARK-26604: -- Could we backport this to branch-2.4? > Register channel for stream request > --- > > Key: SPARK-26604 > URL: https://issues.apache.org/jira/browse/SPARK-26604 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Liang-Chi Hsieh >Assignee: Liang-Chi Hsieh >Priority: Minor > Fix For: 3.0.0 > > > Currently, in {{TransportRequestHandler.processStreamRequest}}, when a stream > request is processed, the stream id is not registered with the current > channel in the stream manager. It should be, so that if the channel is > terminated we can also remove the streams associated with it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
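The idea behind the fix can be sketched with a toy stream manager that tracks which channel owns each stream, so channel termination can clean up the associated streams (plain-Scala stand-in, not Spark's actual OneForOneStreamManager API):

```scala
import scala.collection.mutable

// Toy stream manager: each stream id is registered against the channel
// that requested it, so terminating the channel also cleans up the
// streams it owned (the real logic lives in Spark's network-common module).
class ToyStreamManager {
  private val streamsByChannel = mutable.Map.empty[String, mutable.Set[Long]]

  def registerStream(channelId: String, streamId: Long): Unit =
    streamsByChannel.getOrElseUpdate(channelId, mutable.Set.empty) += streamId

  // Called when a channel terminates: remove and return its streams.
  def connectionTerminated(channelId: String): Set[Long] =
    streamsByChannel.remove(channelId).map(_.toSet).getOrElse(Set.empty)
}
```

Without the registration step, streams requested via stream requests would be left behind in the manager after their channel dies, which is the leak the ticket describes.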
[jira] [Commented] (SPARK-26868) Duplicate error message for implicit cartesian product in verbose explain
[ https://issues.apache.org/jira/browse/SPARK-26868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786381#comment-16786381 ] Takeshi Yamamuro commented on SPARK-26868: -- Yea, you can do. > Duplicate error message for implicit cartesian product in verbose explain > - > > Key: SPARK-26868 > URL: https://issues.apache.org/jira/browse/SPARK-26868 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > Super trivial though, I just report this just in case (I think it would be > nice if we could print this error message in a cleaner way): > {code:java} > scala> Seq(1).toDF("id").write.saveAsTable("t1") > scala> Seq(1).toDF("id").write.saveAsTable("t2") > scala> sql("SELECT * FROM t1 JOIN t2").explain(true) > == Parsed Logical Plan == > 'Project [*] > +- 'Join Inner >:- 'UnresolvedRelation `t1` >+- 'UnresolvedRelation `t2` > == Analyzed Logical Plan == > id: int, id: int > Project [id#14, id#15] > +- Join Inner >:- SubqueryAlias `default`.`t1` >: +- Relation[id#14] parquet >+- SubqueryAlias `default`.`t2` > +- Relation[id#15] parquet > == Optimized Logical Plan == > org.apache.spark.sql.AnalysisException: Detected implicit cartesian product > for INNER join between logical plans > Relation[id#14] parquet > and > Relation[id#15] parquet > Join condition is missing or trivial. > Either: use the CROSS JOIN syntax to allow cartesian products between these > relations, or: enable implicit cartesian products by setting the configuration > variable spark.sql.crossJoin.enabled=true; > == Physical Plan == > org.apache.spark.sql.AnalysisException: Detected implicit cartesian product > for INNER join between logical plans > Relation[id#14] parquet > and > Relation[id#15] parquet > Join condition is missing or trivial. 
> Either: use the CROSS JOIN syntax to allow cartesian products between these > relations, or: enable implicit cartesian products by setting the configuration > variable spark.sql.crossJoin.enabled=true; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26868) Duplicate error message for implicit cartesian product in verbose explain
[ https://issues.apache.org/jira/browse/SPARK-26868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786362#comment-16786362 ] nivedita singh commented on SPARK-26868: If no one is working on it, can I work on this? > Duplicate error message for implicit cartesian product in verbose explain > - > > Key: SPARK-26868 > URL: https://issues.apache.org/jira/browse/SPARK-26868 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Takeshi Yamamuro >Priority: Trivial > > Super trivial though, I just report this just in case (I think it would be > nice if we could print this error message in a cleaner way): > {code:java} > scala> Seq(1).toDF("id").write.saveAsTable("t1") > scala> Seq(1).toDF("id").write.saveAsTable("t2") > scala> sql("SELECT * FROM t1 JOIN t2").explain(true) > == Parsed Logical Plan == > 'Project [*] > +- 'Join Inner >:- 'UnresolvedRelation `t1` >+- 'UnresolvedRelation `t2` > == Analyzed Logical Plan == > id: int, id: int > Project [id#14, id#15] > +- Join Inner >:- SubqueryAlias `default`.`t1` >: +- Relation[id#14] parquet >+- SubqueryAlias `default`.`t2` > +- Relation[id#15] parquet > == Optimized Logical Plan == > org.apache.spark.sql.AnalysisException: Detected implicit cartesian product > for INNER join between logical plans > Relation[id#14] parquet > and > Relation[id#15] parquet > Join condition is missing or trivial. > Either: use the CROSS JOIN syntax to allow cartesian products between these > relations, or: enable implicit cartesian products by setting the configuration > variable spark.sql.crossJoin.enabled=true; > == Physical Plan == > org.apache.spark.sql.AnalysisException: Detected implicit cartesian product > for INNER join between logical plans > Relation[id#14] parquet > and > Relation[id#15] parquet > Join condition is missing or trivial. 
> Either: use the CROSS JOIN syntax to allow cartesian products between these > relations, or: enable implicit cartesian products by setting the configuration > variable spark.sql.crossJoin.enabled=true; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27079) Fix typo & Remove useless imports
[ https://issues.apache.org/jira/browse/SPARK-27079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27079: Assignee: (was: Apache Spark) > Fix typo & Remove useless imports > - > > Key: SPARK-27079 > URL: https://issues.apache.org/jira/browse/SPARK-27079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: EdisonWang >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27079) Fix typo & Remove useless imports
[ https://issues.apache.org/jira/browse/SPARK-27079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27079: Assignee: Apache Spark > Fix typo & Remove useless imports > - > > Key: SPARK-27079 > URL: https://issues.apache.org/jira/browse/SPARK-27079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: EdisonWang >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27079) Fix typo & Remove useless imports
[ https://issues.apache.org/jira/browse/SPARK-27079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786358#comment-16786358 ] Sharanabasappa G Keriwaddi commented on SPARK-27079: Could you please share more details on this issue, with the specific instances you observed? > Fix typo & Remove useless imports > - > > Key: SPARK-27079 > URL: https://issues.apache.org/jira/browse/SPARK-27079 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: EdisonWang >Priority: Minor > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27079) Fix typo & Remove useless imports
EdisonWang created SPARK-27079: -- Summary: Fix typo & Remove useless imports Key: SPARK-27079 URL: https://issues.apache.org/jira/browse/SPARK-27079 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.0 Reporter: EdisonWang -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27049) Support handling partition values in the abstraction of file source V2
[ https://issues.apache.org/jira/browse/SPARK-27049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-27049: --- Assignee: Gengliang Wang > Support handling partition values in the abstraction of file source V2 > -- > > Key: SPARK-27049 > URL: https://issues.apache.org/jira/browse/SPARK-27049 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > > In FileFormat, the method buildReaderWithPartitionValues appends the > partition values to the end of the result of buildReader, so that data > sources like CSV/JSON/AVRO only need to implement buildReader to read a > single file, without taking care of partition values. > This PR proposes to support handling partition values in the file source v2 > abstraction by: > 1. Having two methods, `buildReader` and `buildReaderWithPartitionValues`, in > FilePartitionReaderFactory, with exactly the same meaning as they have > in `FileFormat`. > 2. Renaming `buildColumnarReader` to `buildColumnarReaderWithPartitionValues` > to make the naming consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27049) Support handling partition values in the abstraction of file source V2
[ https://issues.apache.org/jira/browse/SPARK-27049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27049. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23987 [https://github.com/apache/spark/pull/23987] > Support handling partition values in the abstraction of file source V2 > -- > > Key: SPARK-27049 > URL: https://issues.apache.org/jira/browse/SPARK-27049 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.0.0 > > > In FileFormat, the method buildReaderWithPartitionValues appends the > partition values to the end of the result of buildReader, so that data > sources like CSV/JSON/AVRO only need to implement buildReader to read a > single file, without taking care of partition values. > This PR proposes to support handling partition values in the file source v2 > abstraction by: > 1. Having two methods, `buildReader` and `buildReaderWithPartitionValues`, in > FilePartitionReaderFactory, with exactly the same meaning as they have > in `FileFormat`. > 2. Renaming `buildColumnarReader` to `buildColumnarReaderWithPartitionValues` > to make the naming consistent. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
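The proposed shape of point 1 can be sketched with plain-Scala stand-ins (the real interfaces use Spark's PartitionedFile and PartitionReader types; the names below are illustrative only):

```scala
// Plain-Scala stand-ins: the real file source v2 API reads Spark's
// PartitionedFile into PartitionReader[InternalRow]; names here are
// illustrative only.
case class FilePart(path: String, partitionValues: Seq[Any])

trait FilePartitionReaderFactory {
  // Read only the columns physically stored in the file.
  def buildReader(file: FilePart): Iterator[Seq[Any]]

  // Default implementation appends the partition values to every row,
  // so simple sources (CSV/JSON/Avro-style) implement only buildReader.
  def buildReaderWithPartitionValues(file: FilePart): Iterator[Seq[Any]] =
    buildReader(file).map(row => row ++ file.partitionValues)
}
```

This mirrors the division of labor described for `FileFormat`: the base method reads a single file, and the wrapper handles partition values once, centrally.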
[jira] [Resolved] (SPARK-27057) Common trait for limit exec operators
[ https://issues.apache.org/jira/browse/SPARK-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-27057. - Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23976 [https://github.com/apache/spark/pull/23976] > Common trait for limit exec operators > - > > Key: SPARK-27057 > URL: https://issues.apache.org/jira/browse/SPARK-27057 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > Fix For: 3.0.0 > > > Currently, CollectLimitExec, LocalLimitExec and GlobalLimitExec have the > UnaryExecNode trait as their only common trait. It is slightly inconvenient to > distinguish those operators from others. The ticket aims to introduce a new > trait for all 3 operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27057) Common trait for limit exec operators
[ https://issues.apache.org/jira/browse/SPARK-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-27057: --- Assignee: Maxim Gekk > Common trait for limit exec operators > - > > Key: SPARK-27057 > URL: https://issues.apache.org/jira/browse/SPARK-27057 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Trivial > > Currently, CollectLimitExec, LocalLimitExec and GlobalLimitExec have the > UnaryExecNode trait as their only common trait. It is slightly inconvenient to > distinguish those operators from others. The ticket aims to introduce a new > trait for all 3 operators. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
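The kind of trait being proposed can be sketched as follows (toy stand-ins for the SparkPlan hierarchy; the actual trait name chosen in the PR may differ):

```scala
// Toy stand-ins for the SparkPlan hierarchy; illustration only.
trait ExecNode
case object ProjectExec extends ExecNode

// A shared trait lets all three limit operators be matched uniformly,
// e.g. by planner rules that look for any limit node.
trait BaseLimitExec extends ExecNode { def limit: Int }

case class CollectLimitExec(limit: Int) extends BaseLimitExec
case class LocalLimitExec(limit: Int)   extends BaseLimitExec
case class GlobalLimitExec(limit: Int)  extends BaseLimitExec

// One pattern instead of three separate cases per call site.
def isLimit(plan: ExecNode): Boolean = plan match {
  case _: BaseLimitExec => true
  case _                => false
}
```

The design payoff is exactly the "distinguish those operators from others" point in the ticket: a single `case _: BaseLimitExec` replaces matching each operator class individually.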
[jira] [Resolved] (SPARK-27078) Read Hive materialized view throw MatchError
[ https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27078. --- Resolution: Fixed Assignee: Yuming Wang Fix Version/s: 3.0.0 2.4.2 This is resolved via https://github.com/apache/spark/pull/23984 > Read Hive materialized view throw MatchError > > > Key: SPARK-27078 > URL: https://issues.apache.org/jira/browse/SPARK-27078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 2.4.2, 3.0.0 > > > How to reproduce: > Hive side: > {code:sql} > CREATE TABLE materialized_view_tbl (key INT); > CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl; -- > Hive 3.x > CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM > materialized_view_tbl; -- Hive 2.3.x > {code} > Spark side(read from Hive 2.3.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1] > scala.MatchError: MATERIALIZED_VIEW (of class > org.apache.hadoop.hive.metastore.TableType) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434) > at scala.Option.map(Option.scala:163) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > {code} > Spark side(read from Hive 3.1.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1] > java.lang.NoSuchFieldError: INDEX_TABLE > at > 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438) > at scala.Option.map(Option.scala:163) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
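The MatchError arises because the match over Hive's TableType in HiveClientImpl predates the MATERIALIZED_VIEW value. A minimal sketch of the defensive pattern, using a toy enumeration rather than the Hive metastore classes (how the real fix ultimately treats materialized views is a decision for the PR; this only shows handling the new value explicitly):

```scala
// Toy stand-in for org.apache.hadoop.hive.metastore.TableType.
object TableType extends Enumeration {
  val MANAGED_TABLE, EXTERNAL_TABLE, VIRTUAL_VIEW, MATERIALIZED_VIEW = Value
}

sealed trait CatalogTableType
case object ManagedTable extends CatalogTableType
case object ExternalTable extends CatalogTableType
case object View extends CatalogTableType

// Map every known table type explicitly so an unhandled value raises a
// clear error instead of scala.MatchError; materialized views are
// rejected here with a descriptive message.
def toCatalogType(t: TableType.Value): CatalogTableType = t match {
  case TableType.MANAGED_TABLE     => ManagedTable
  case TableType.EXTERNAL_TABLE    => ExternalTable
  case TableType.VIRTUAL_VIEW      => View
  case TableType.MATERIALIZED_VIEW =>
    throw new UnsupportedOperationException("Hive materialized views are not supported")
  case other =>
    throw new UnsupportedOperationException(s"Unknown table type: $other")
}
```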
[jira] [Assigned] (SPARK-27054) Remove Calcite dependency
[ https://issues.apache.org/jira/browse/SPARK-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-27054: - Assignee: Yuming Wang > Remove Calcite dependency > - > > Key: SPARK-27054 > URL: https://issues.apache.org/jira/browse/SPARK-27054 > Project: Spark > Issue Type: Improvement > Components: Build, SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > > Calcite is only used for > [runSqlHive|https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L699-L705] > when > {{hive.cbo.enable=true}}([SemanticAnalyzer|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280]). > So we can disable {{hive.cbo.enable}} and remove Calcite dependency. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs
[ https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786246#comment-16786246 ] Hyukjin Kwon commented on SPARK-24624: -- If you mean Pandas UDF aggregate function, it's already fixed in upstream master. > Can not mix vectorized and non-vectorized UDFs > -- > > Key: SPARK-24624 > URL: https://issues.apache.org/jira/browse/SPARK-24624 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.1 >Reporter: Xiao Li >Assignee: Li Jin >Priority: Major > Fix For: 2.4.0 > > > In the current impl, we have the limitation: users are unable to mix > vectorized and non-vectorized UDFs in same Project. This becomes worse since > our optimizer could combine continuous Projects into a single one. For > example, > {code} > applied_df = df.withColumn('regular', my_regular_udf('total', > 'qty')).withColumn('pandas', my_pandas_udf('total', 'qty')) > {code} > Returns the following error. > {code} > IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs > java.lang.IllegalArgumentException: Can not mix vectorized and non-vectorized > UDFs > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:170) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:146) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$.org$apache$spark$sql$execution$python$ExtractPythonUDFs$$extract(ExtractPythonUDFs.scala:146) > at > 
org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:118) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:114) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:311) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:114) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:94) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:99) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3312) > at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:2750) > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe,
[jira] [Updated] (SPARK-27077) DataFrameReader and Number of Connection Limitation
[ https://issues.apache.org/jira/browse/SPARK-27077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Wu updated SPARK-27077: Description: I am not sure whether this is a Spark core issue or a Vertica issue, but I am inclined to think it is Spark's. The problem is that when we try to read with sparkSession.read.load from some data source, in my case Vertica DB, the DataFrameReader makes a 'large' number of initial JDBC connection requests. My account allows only 16 connections (and I can see at least 6 of them are available for my load), and when the "large" number of requests is issued, I get the exception below. In fact, I can see it eventually settles on a smaller number of connections (in my case, 2 simultaneous DataFrameReaders). So I think we should have a parameter that prevents the reader from sending out more initial connection requests than the user's limit. Without such an option, my app can fail randomly due to the connection limit on my Vertica account. 
java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 16 on database already met for M21176 at com.vertica.util.ServerErrorData.buildException(Unknown Source) at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source) at com.vertica.io.ProtocolStream.initSession(Unknown Source) at com.vertica.core.VConnection.tryConnect(Unknown Source) at com.vertica.core.VConnection.connect(Unknown Source) at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source) at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:208) at com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105) at com.vertica.spark.datasource.VerticaRelation.(VerticaRelation.scala:34) at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164) at com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156) Caused by: com.vertica.support.exceptions.NonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 16 on database already met for 
[jira] [Commented] (SPARK-27078) Read Hive materialized view throw MatchError
[ https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786233#comment-16786233 ] Apache Spark commented on SPARK-27078: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/23984 > Read Hive materialized view throw MatchError > > > Key: SPARK-27078 > URL: https://issues.apache.org/jira/browse/SPARK-27078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > Hive side: > {code:sql} > CREATE TABLE materialized_view_tbl (key INT); > CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl; -- > Hive 3.x > CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM > materialized_view_tbl; -- Hive 2.3.x > {code} > Spark side(read from Hive 2.3.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1] > scala.MatchError: MATERIALIZED_VIEW (of class > org.apache.hadoop.hive.metastore.TableType) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434) > at scala.Option.map(Option.scala:163) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > {code} > Spark side(read from Hive 3.1.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1] > java.lang.NoSuchFieldError: INDEX_TABLE > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438) 
> at scala.Option.map(Option.scala:163) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27078) Read Hive materialized view throw MatchError
[ https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27078: Assignee: (was: Apache Spark) > Read Hive materialized view throw MatchError > > > Key: SPARK-27078 > URL: https://issues.apache.org/jira/browse/SPARK-27078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > How to reproduce: > Hive side: > {code:sql} > CREATE TABLE materialized_view_tbl (key INT); > CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl; -- > Hive 3.x > CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM > materialized_view_tbl; -- Hive 2.3.x > {code} > Spark side(read from Hive 2.3.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1] > scala.MatchError: MATERIALIZED_VIEW (of class > org.apache.hadoop.hive.metastore.TableType) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434) > at scala.Option.map(Option.scala:163) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > {code} > Spark side(read from Hive 3.1.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1] > java.lang.NoSuchFieldError: INDEX_TABLE > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438) > at scala.Option.map(Option.scala:163) > at > 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-27078) Read Hive materialized view throw MatchError
[ https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27078: Assignee: Apache Spark > Read Hive materialized view throw MatchError > > > Key: SPARK-27078 > URL: https://issues.apache.org/jira/browse/SPARK-27078 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > How to reproduce: > Hive side: > {code:sql} > CREATE TABLE materialized_view_tbl (key INT); > CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl; -- > Hive 3.x > CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM > materialized_view_tbl; -- Hive 2.3.x > {code} > Spark side(read from Hive 2.3.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1] > scala.MatchError: MATERIALIZED_VIEW (of class > org.apache.hadoop.hive.metastore.TableType) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434) > at scala.Option.map(Option.scala:163) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > {code} > Spark side(read from Hive 3.1.x): > {code:java} > bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf > spark.sql.hive.metastore.jars=maven > spark-sql> select * from view_1; > 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1] > java.lang.NoSuchFieldError: INDEX_TABLE > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438) > at scala.Option.map(Option.scala:163) > at > 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27078) Read Hive materialized view throw MatchError
Yuming Wang created SPARK-27078: --- Summary: Read Hive materialized view throw MatchError Key: SPARK-27078 URL: https://issues.apache.org/jira/browse/SPARK-27078 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.0.0 Reporter: Yuming Wang How to reproduce: Hive side: {code:sql} CREATE TABLE materialized_view_tbl (key INT); CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl; -- Hive 3.x CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM materialized_view_tbl; -- Hive 2.3.x {code} Spark side(read from Hive 2.3.x): {code:java} bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf spark.sql.hive.metastore.jars=maven spark-sql> select * from view_1; 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1] scala.MatchError: MATERIALIZED_VIEW (of class org.apache.hadoop.hive.metastore.TableType) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434) at scala.Option.map(Option.scala:163) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) {code} Spark side(read from Hive 3.1.x): {code:java} bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf spark.sql.hive.metastore.jars=maven spark-sql> select * from view_1; 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1] java.lang.NoSuchFieldError: INDEX_TABLE at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438) at scala.Option.map(Option.scala:163) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370) at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277) at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215) at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214) at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260) at org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368) {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-27077) DataFrameReader and Number of Connection Limitation
[ https://issues.apache.org/jira/browse/SPARK-27077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786228#comment-16786228 ] Yuming Wang commented on SPARK-27077: - Could you try to set {{numPartitions}} please? The maximum number of partitions that can be used for parallelism in table reading and writing. This also determines the maximum number of concurrent JDBC connections. If the number of partitions to write exceeds this limit, we decrease it to this limit by calling coalesce(numPartitions) before writing. http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html > DataFrameReader and Number of Connection Limitation > --- > > Key: SPARK-27077 > URL: https://issues.apache.org/jira/browse/SPARK-27077 > Project: Spark > Issue Type: Bug > Components: Spark Core, SQL >Affects Versions: 2.3.2 >Reporter: Paul Wu >Priority: Major > > I am not very sure this is a Spark core issue or a Vertica issue, however I > intended to think this is Spark's issue. The problem is that when we try to > read with sparkSession.read.load from some datasource, in my case, Vertica > DB, the DataFrameReader needs to make some 'large' number of initial jdbc > connection requests. My account limits I can only use 16 (and I can see at > least 6 of them can be used for my loading), and when the "large" number of > the requests issued, I got exception below. In fact, I can see eventually it > could settle with fewer numbers of connections (in my case 2 simultaneous > DataFrameReader). So I think we should have a parameter that prevents the > reader to send out initial "bigger" number of connection requests than user's > limit. If we don't have this option parameter, my app could fail randomly due > to my Vertica account's number of connections allowed. 
> > java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: > New session rejected because connection limit of 16 on database already met > for M21176 > at com.vertica.util.ServerErrorData.buildException(Unknown Source) > at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source) > at com.vertica.io.ProtocolStream.initSession(Unknown Source) > at com.vertica.core.VConnection.tryConnect(Unknown Source) > at com.vertica.core.VConnection.connect(Unknown Source) > at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown > Source) > at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source) > at java.sql.DriverManager.getConnection(DriverManager.java:664) > at java.sql.DriverManager.getConnection(DriverManager.java:208) > at > com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105) > at > com.vertica.spark.datasource.VerticaRelation.(VerticaRelation.scala:34) > at > com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47) > at > org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341) > at > org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239) > at > org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227) > at > org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164) > at > com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156) > Caused by: com.vertica.support.exceptions.NonTransientConnectionException: > [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit > of 16 on databas e already met for > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
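Yuming's {{numPartitions}} suggestion amounts to capping read parallelism on the reader itself. A minimal sketch with Spark's generic JDBC source follows; the URL, table, column, bounds, and credentials are all placeholder values, and the Vertica connector used in the report may expose its own equivalent option rather than this one:

{code:java}
// Sketch (assumed values): cap concurrent JDBC connections at 6 for this read.
// partitionColumn/lowerBound/upperBound are required when numPartitions > 1.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:vertica://host:5433/db") // placeholder URL
  .option("dbtable", "some_table")              // placeholder table
  .option("user", "user")                       // placeholder credentials
  .option("password", "password")
  .option("partitionColumn", "id")              // placeholder numeric column
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "6")                 // at most 6 simultaneous connections
  .load()
{code}

With this cap in place, a read should never request more connections than the account's limit, regardless of the source table's size.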
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786217#comment-16786217 ] Jiaxin Shan edited comment on SPARK-26742 at 3/6/19 11:28 PM: -- I am willing to do that. I will make sure the local integration tests pass and then check it in. [~shaneknapp] I am a new contributor and not very familiar with the integration settings, so it may take some time. I will sync with you later today. was (Author: seedjeffwan): I am willing to do that. I will make sure local integration test pass and then check in. [~shaneknapp] > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786217#comment-16786217 ] Jiaxin Shan commented on SPARK-26742: - I am willing to do that. I will make sure local integration test pass and then check in. [~shaneknapp] > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25863) java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala
[ https://issues.apache.org/jira/browse/SPARK-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786202#comment-16786202 ] Val Feldsher commented on SPARK-25863: -- I recently experienced a similar issue while upgrading from Spark 2.1.0 to 2.3.2, and it appears that this happens whenever I use spark.driver.userClassPathFirst or spark.executor.userClassPathFirst. Similar to [SPARK-20241|https://issues.apache.org/jira/browse/SPARK-20241], I see two different class loaders when I set this property to true: org.apache.spark.util.ChildFirstURLClassLoader and sun.misc.Launcher$AppClassLoader. This is accompanied by the warning: Error calculating stats of compiled class. java.lang.IllegalArgumentException: Can not set final [B field org.codehaus.janino.util.ClassFile$CodeAttribute.code to org.codehaus.janino.util.ClassFile$CodeAttribute, followed by the empty.max error. > java.lang.UnsupportedOperationException: empty.max at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475) > - > > Key: SPARK-25863 > URL: https://issues.apache.org/jira/browse/SPARK-25863 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 2.3.1, 2.3.2 >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: cache, catalyst, code-generation > > Failing task : > {noformat} > An error occurred while calling o2875.collectToPython. 
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 58 > in stage 21413.0 failed 4 times, most recent failure: Lost task 58.3 in stage > 21413.0 (TID 4057314, pc1udatahad117, executor 431): > java.lang.UnsupportedOperationException: empty.max > at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229) > at scala.collection.AbstractTraversable.max(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1418) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:81) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:40) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1321) > at > 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318) > at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401) > at > org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:263) > at > org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:262) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:818) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:818) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:288) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadC
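Based on Val's observation above, one hedged workaround (a sketch, not a confirmed fix) is to leave the child-first classloader settings at their defaults when building the session, since the failure correlates with enabling them; the app name below is a placeholder:

{code:java}
import org.apache.spark.sql.SparkSession

// Sketch: keep userClassPathFirst at its default (false), which Val's comment
// correlates with avoiding the empty.max codegen failure. App name is a placeholder.
val spark = SparkSession.builder()
  .appName("example")
  .config("spark.driver.userClassPathFirst", "false")
  .config("spark.executor.userClassPathFirst", "false")
  .getOrCreate()
{code}

Jobs that genuinely need user-first classloading for dependency isolation would instead have to shade the conflicting jars, since disabling the flag changes which copy of a class wins.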
[jira] [Comment Edited] (SPARK-25863) java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.
[ https://issues.apache.org/jira/browse/SPARK-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786202#comment-16786202 ] Val Feldsher edited comment on SPARK-25863 at 3/6/19 11:14 PM: --- I recently experienced similar issue while upgrading from spark 2.1.0 to 2.3.2, and it appears that this happens whenever I use spark.driver.userClassPathFirst or spark.executor.userClassPathFirst. Similarly to SPARK-20241 I see 2 different class loaders when I set this property to true: org.apache.spark.util.ChildFirstURLClassLoader and sun.misc.Launcher$AppClassLoader. This is accompanied by warning: Error calculating stats of compiled class. java.lang.IllegalArgumentException: Can not set final [B field org.codehaus.janino.util.ClassFile$CodeAttribute.code to org.codehaus.janino.util.ClassFile$CodeAttribute Followed by the empty.max error. was (Author: vfeldsher): I recently experienced similar issue while upgrading from spark 2.1.0 to 2.3.2, and it appears that this happens whenever I use spark.driver.userClassPathFirst or spark.executor.userClassPathFirst. Similar to [SPARK-20241|https://issues.apache.org/jira/browse/SPARK-20241] , I see 2 different class loaders when I set this property to true: org.apache.spark.util.ChildFirstURLClassLoader and sun.misc.Launcher$AppClassLoader. This is accompanied by warning: Error calculating stats of compiled class. java.lang.IllegalArgumentException: Can not set final [B field org.codehaus.janino.util.ClassFile$CodeAttribute.code to org.codehaus.janino.util.ClassFile$CodeAttribute Followed by the empty.max error. 
> java.lang.UnsupportedOperationException: empty.max at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475) > - > > Key: SPARK-25863 > URL: https://issues.apache.org/jira/browse/SPARK-25863 > Project: Spark > Issue Type: Bug > Components: Optimizer, Spark Core >Affects Versions: 2.3.1, 2.3.2 >Reporter: Ruslan Dautkhanov >Priority: Major > Labels: cache, catalyst, code-generation > > Failing task : > {noformat} > An error occurred while calling o2875.collectToPython. > : org.apache.spark.SparkException: Job aborted due to stage failure: Task 58 > in stage 21413.0 failed 4 times, most recent failure: Lost task 58.3 in stage > 21413.0 (TID 4057314, pc1udatahad117, executor 431): > java.lang.UnsupportedOperationException: empty.max > at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229) > at scala.collection.AbstractTraversable.max(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1418) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490) > at > org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599) > at > org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379) > at > org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342) > at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257) > at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000) > at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004) > at > 
org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:81) > at > org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:40) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1321) > at > org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318) > at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401) > at > org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:263) > at > org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:262) > at > org.apa
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786187#comment-16786187 ] shane knapp commented on SPARK-26742: - [~seedjeffwan] are you going to open a new PR for the master branch? > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-27077) DataFrameReader and Number of Connection Limitation
Paul Wu created SPARK-27077: --- Summary: DataFrameReader and Number of Connection Limitation Key: SPARK-27077 URL: https://issues.apache.org/jira/browse/SPARK-27077 Project: Spark Issue Type: Bug Components: Spark Core, SQL Affects Versions: 2.3.2 Reporter: Paul Wu I am not very sure this is a Spark core issue or a Vertica issue, however I intended to think this is Spark's issue. The problem is that when we try to read with sparkSession.read.load from some datasource, in my case, Vertica DB, the DataFrameReader needs to make some 'large' number of initial jdbc connection requests. My account limits I can only use 16 (and I can see at least 6 of them can be used for my loading), and when the "large" number of the requests issued, I got exception below. In fact, I can see eventually it could settle with fewer numbers of connections (in my case 2 simultaneous DataFrameReader). So I think we should have a parameter that prevents the reader to send out initial "bigger" number of connection requests than user's limit. If we don't have this option parameter, my app could fail randomly due to my Vertica account's number of connections allowed. 
java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 16 on database already met for M21176 at com.vertica.util.ServerErrorData.buildException(Unknown Source) at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source) at com.vertica.io.ProtocolStream.initSession(Unknown Source) at com.vertica.core.VConnection.tryConnect(Unknown Source) at com.vertica.core.VConnection.connect(Unknown Source) at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown Source) at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source) at java.sql.DriverManager.getConnection(DriverManager.java:664) at java.sql.DriverManager.getConnection(DriverManager.java:208) at com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105) at com.vertica.spark.datasource.VerticaRelation.(VerticaRelation.scala:34) at com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47) at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341) at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227) at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164) at com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156) Caused by: com.vertica.support.exceptions.NonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 16 on databas e already met for -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27076) Getting the timeout error while writing parquet/csv files to s3
[ https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] srinivas rao gajjala updated SPARK-27076: - Description: Hi, I'm trying to write parquet/csv files to s3 using Amazon EMR clusters with the label (emr-5.9.0) and below is the error I'm facing. {code:java} org.apache.spark.SparkException: Job aborted. at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at Migration.migrate(Migration.scala:211) at DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at 
DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351) at scala.util.control.Breaks.breakable(Breaks.scala:38) at DataMigrationFramework$.main(DataMigrationFramework.scala:350) at DataMigrationFramework.main(DataMigrationFramework.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1120 in stage 5.0 failed 16 times, most recent failure: Lost task 1120.15 in stage 5.0 (TID 8886, ip-10-120-60-82.ec2.internal, executor 4): com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at com.amazonaws.
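On the error above: `Timeout waiting for connection from pool` is raised when the AWS SDK's HTTP connection pool is exhausted, typically because many concurrent tasks are writing to S3 at once. A common mitigation is to enlarge the filesystem's connection pool. A minimal sketch of the relevant Hadoop configuration overrides (the property names `fs.s3.maxConnections` for EMRFS and `fs.s3a.connection.maximum` for S3A are assumptions to verify against the documentation for your EMR release):

```python
def s3_pool_configs(max_connections=200):
    """Hadoop configuration overrides that enlarge the S3 HTTP connection
    pool. Pass them as --conf spark.hadoop.<key>=<value> on spark-submit,
    or set them on sparkContext.hadoopConfiguration before writing."""
    return {
        # EMRFS (the default s3:// filesystem on EMR clusters)
        "fs.s3.maxConnections": str(max_connections),
        # S3A, if the job writes through s3a:// paths instead
        "fs.s3a.connection.maximum": str(max_connections),
    }

for key, value in s3_pool_configs().items():
    print(f"--conf spark.hadoop.{key}={value}")
```

Reducing write parallelism (e.g. coalescing to fewer output partitions, or running fewer executor cores) attacks the same pool exhaustion from the other side.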
[jira] [Updated] (SPARK-27076) Getting the timeout error while writing parquet/csv files to s3
[ https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] srinivas rao gajjala updated SPARK-27076: - Summary: Getting the timeout error while writing parquet/csv files to s3 (was: Getting the timeout error while reading parquet/csv files from s3) > Getting the timeout error while writing parquet/csv files to s3 > --- > > Key: SPARK-27076 > URL: https://issues.apache.org/jira/browse/SPARK-27076 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: srinivas rao gajjala >Priority: Major > > Hi, > I'm trying to write parquet files to s3 using Amazon EMR clusters with the > label (emr-5.9.0) and below is the error I'm facing. > {code:java} > org.apache.spark.SparkException: Job aborted. at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at > 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at > org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at > DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at > Migration.migrate(Migration.scala:211) at > DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at > DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351) > at scala.util.control.Breaks.breakable(Breaks.scala:38) at > DataMigrationFramework$.main(DataMigrationFramework.scala:350) at > DataMigrationFramework.main(DataMigrationFramework.scala) at > sun.reflect.Nati
[jira] [Updated] (SPARK-27076) Getting the timeout error while reading parquet/csv files from s3
[ https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] srinivas rao gajjala updated SPARK-27076: - Summary: Getting the timeout error while reading parquet/csv files from s3 (was: Getting the timeout error while reading parquet files from s3) > Getting the timeout error while reading parquet/csv files from s3 > - > > Key: SPARK-27076 > URL: https://issues.apache.org/jira/browse/SPARK-27076 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.2.0 >Reporter: srinivas rao gajjala >Priority: Major > > Hi, > I'm trying to read parquet files from s3 using Amazon EMR clusters with the > label (emr-5.9.0) and below is the error I'm facing. > {code:java} > org.apache.spark.SparkException: Job aborted. at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) > at > org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) > at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at > 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at > org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at > DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at > Migration.migrate(Migration.scala:211) at > DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at > DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351) > at scala.util.control.Breaks.breakable(Breaks.scala:38) at > DataMigrationFramework$.main(DataMigrationFramework.scala:350) at > DataMigrationFramework.main(DataMigrationFramework.scala) at > sun.reflect.Nat
[jira] [Updated] (SPARK-27076) Getting the timeout error while reading parquet/csv files from s3
[ https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] srinivas rao gajjala updated SPARK-27076: - Description: Hi, I'm trying to write parquet files to s3 using Amazon EMR clusters with the label (emr-5.9.0) and below is the error I'm facing. {code:java} org.apache.spark.SparkException: Job aborted. at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at Migration.migrate(Migration.scala:211) at DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at 
DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351) at scala.util.control.Breaks.breakable(Breaks.scala:38) at DataMigrationFramework$.main(DataMigrationFramework.scala:350) at DataMigrationFramework.main(DataMigrationFramework.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1120 in stage 5.0 failed 16 times, most recent failure: Lost task 1120.15 in stage 5.0 (TID 8886, ip-10-120-60-82.ec2.internal, executor 4): com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at com.amazonaws.http.A
[jira] [Created] (SPARK-27076) Getting the timeout error while reading parquet files from s3
srinivas rao gajjala created SPARK-27076: Summary: Getting the timeout error while reading parquet files from s3 Key: SPARK-27076 URL: https://issues.apache.org/jira/browse/SPARK-27076 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.2.0 Reporter: srinivas rao gajjala Hi, I'm trying to read parquet files from s3 using Amazon EMR clusters with the label (emr-5.9.0) and below is the error I'm facing. {code:java} org.apache.spark.SparkException: Job aborted. at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65) at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173) at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438) at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474) at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117) at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92) at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at Migration.migrate(Migration.scala:211) at DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353) at 
scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351) at scala.util.control.Breaks.breakable(Breaks.scala:38) at DataMigrationFramework$.main(DataMigrationFramework.scala:350) at DataMigrationFramework.main(DataMigrationFramework.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635) Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1120 in stage 5.0 failed 16 times, most recent failure: Lost task 112
[jira] [Resolved] (SPARK-27019) Spark UI's SQL tab shows inconsistent values
[ https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-27019. Resolution: Fixed Fix Version/s: 2.4.2 3.0.0 Issue resolved by pull request 23939 [https://github.com/apache/spark/pull/23939] > Spark UI's SQL tab shows inconsistent values > > > Key: SPARK-27019 > URL: https://issues.apache.org/jira/browse/SPARK-27019 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 2.4.0 >Reporter: peay >Assignee: Shahid K I >Priority: Major > Fix For: 3.0.0, 2.4.2 > > Attachments: Screenshot from 2019-03-01 21-31-48.png, > application_1550040445209_4748, query-1-details.png, query-1-list.png, > query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png > > > Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the > Spark UI, where submitted/duration make no sense, description has the ID > instead of the actual description. > Clicking on the link to open a query, the SQL plan is missing as well. > I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to > very large values like 30k out of paranoia that we may have too many events, > but to no avail. I have not identified anything particular that leads to > that: it doesn't occur in all my jobs, but it does occur in a lot of them > still.
[jira] [Assigned] (SPARK-27019) Spark UI's SQL tab shows inconsistent values
[ https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-27019: -- Assignee: Shahid K I > Spark UI's SQL tab shows inconsistent values > > > Key: SPARK-27019 > URL: https://issues.apache.org/jira/browse/SPARK-27019 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 2.4.0 >Reporter: peay >Assignee: Shahid K I >Priority: Major > Attachments: Screenshot from 2019-03-01 21-31-48.png, > application_1550040445209_4748, query-1-details.png, query-1-list.png, > query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png > > > Since 2.4.0, I am frequently seeing broken outputs in the SQL tab of the > Spark UI, where submitted/duration make no sense, description has the ID > instead of the actual description. > Clicking on the link to open a query, the SQL plan is missing as well. > I have tried to increase `spark.scheduler.listenerbus.eventqueue.capacity` to > very large values like 30k out of paranoia that we may have too many events, > but to no avail. I have not identified anything particular that leads to > that: it doesn't occur in all my jobs, but it does occur in a lot of them > still.
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786068#comment-16786068 ] shane knapp edited comment on SPARK-26742 at 3/6/19 9:18 PM: - 1.12.6 passes: {noformat} [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 --- Discovery starting. Discovery completed in 175 milliseconds. Run starting. Expected test count is: 14 KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 6 minutes, 32 seconds. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM 2.4.2-SNAPSHOT SUCCESS [ 3.009 s] [INFO] Spark Project Tags . SUCCESS [ 2.767 s] [INFO] Spark Project Local DB . SUCCESS [ 1.973 s] [INFO] Spark Project Networking ... SUCCESS [ 3.491 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 1.878 s] [INFO] Spark Project Unsafe ... SUCCESS [ 1.948 s] [INFO] Spark Project Launcher . SUCCESS [ 3.866 s] [INFO] Spark Project Core . 
SUCCESS [ 23.852 s] [INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [07:32 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 08:15 min [INFO] Finished at: 2019-03-06T11:40:06-08:00 [INFO] jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 --kubernetes-version=v1.12.6 {noformat} also k8s v1.10.13: {noformat} [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 --- Discovery starting. Discovery completed in 184 milliseconds. Run starting. Expected test count is: 14 KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 7 minutes, 52 seconds. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM 2.4.2-SNAPSHOT SUCCESS [ 2.793 s] [INFO] Spark Project Tags . SUCCESS [ 2.848 s] [INFO] Spark Project Local DB . SUCCESS [ 2.024 s] [INFO] Spark Project Networking ... SUCCESS [ 3.462 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 1.907 s] [INFO] Spark Project Unsafe ... SUCCESS [ 1.929 s] [INFO] Spark Project Launcher . SUCCESS [ 3.939 s] [INFO] Spark Project Core . 
SUCCESS [ 24.078 s] [INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:57 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 09:40 min [INFO] Finished at: 2019-03-06T13:13:10-08:00 [INFO] jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh minikube --vm-driver=kvm2 start
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786146#comment-16786146 ] Stavros Kontopoulos commented on SPARK-26742: - That is a good point [~seedjeffwan] I expect that the fabric8io project will catch up, it seems very active. If not we will deal with it differently I guess. Release testing is a one-shot thing, which makes it easier. > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786146#comment-16786146 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 9:27 PM: - That is a good point, [~seedjeffwan]. I expect that the fabric8io client project will catch up; it seems very active. If not, we will deal with it differently, I guess. Release testing is a one-shot thing, which makes it easier. was (Author: skonto): That is a good point, [~seedjeffwan]. I expect that the fabric8io project will catch up; it seems very active. If not, we will deal with it differently, I guess. Release testing is a one-shot thing, which makes it easier.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786142#comment-16786142 ] shane knapp commented on SPARK-26742: - my testing shows that 1.13.x passes our integration tests w/o issue. testing against multiple versions of k8s will require some more work, both in the build configs *and* in the spark repo, so that depending on the spark branch we can know which version(s) to test against. however, testing against multiple versions of k8s w/minikube will be problematic (see discussion: https://issues.apache.org/jira/browse/SPARK-26973)
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786140#comment-16786140 ] Jiaxin Shan commented on SPARK-26742: - Agreed on targeting v1.13.x, even though 4.1.2 may not pass the compatibility test. Here's a feature list for v1.13.0; we need to make sure the APIs Spark uses are not affected: [https://sysdig.com/blog/whats-new-in-kubernetes-1-13/]
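One rough way to spot-check that concern is to enumerate which fabric8 client packages the Spark Kubernetes backend actually imports and compare them against the 4.1.2 release notes. A sketch, assuming a Spark source checkout with the usual `resource-managers/kubernetes` layout (the path is an assumption, not taken from this thread):

```shell
# Hypothetical spot-check, run from the root of a Spark source checkout:
# list every fabric8 Kubernetes-client package the K8s backend imports,
# for comparison against the client's 4.1.2 changelog. The source path is
# assumed from the standard Spark repo layout; errors from a missing
# checkout are silenced so the pipeline stays usable in scripts.
grep -rho 'io\.fabric8\.kubernetes\.[A-Za-z0-9.]*' \
  resource-managers/kubernetes/core/src/main/scala 2>/dev/null | sort -u
```

The `-h` flag drops filenames and `-o` keeps only the matched package strings, so `sort -u` yields one line per distinct fabric8 API reference.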
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786132#comment-16786132 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 9:11 PM: - My take is: latest for PRs, and for releases, the currently supported ones. That means, though, that we will need to keep up with client upgrades even on master and monitor the client project. Hopefully that will minimize the chance of something being broken at release time. If people want to use an old k8s version, they should use an old Spark release. was (Author: skonto): My take is: latest for PRs, and for releases, the currently supported ones. That means, though, that we will need to keep up with client upgrades even on master and monitor the client project. Hopefully that will minimize the chance of something being broken at release time.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786132#comment-16786132 ] Stavros Kontopoulos commented on SPARK-26742: - My take is: latest for PRs, and for releases, the currently supported ones. That means, though, that we will need to keep up with client upgrades even on master and monitor the client project. Hopefully that will minimize the chance of something being broken at release time.
[jira] [Updated] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
[ https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shahid K I updated SPARK-27075: --- Description: Test steps: 1) bin/spark-sql 2) run some queries 3) Open SQL page in the webui 4) Try to sort any column in the execution table. !image-2019-03-07-02-37-20-453.png! was: Test steps: 1) bin/spark-sql 2) run some queries 3) Open SQL page in the webui 4) Try to sort any column in the execution table. file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png > Sorting table column in SQL WEBUI page throws 'IllegalArgumentException' > > > Key: SPARK-27075 > URL: https://issues.apache.org/jira/browse/SPARK-27075 > Project: Spark > Issue Type: Bug > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Shahid K I >Priority: Major > Attachments: image-2019-03-07-02-37-20-453.png > > > Test steps: > 1) bin/spark-sql > 2) run some queries > 3) Open SQL page in the webui > 4) Try to sort any column in the execution table. > !image-2019-03-07-02-37-20-453.png!
[jira] [Updated] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
[ https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shahid K I updated SPARK-27075: --- Attachment: image-2019-03-07-02-37-20-453.png
[jira] [Commented] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
[ https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786122#comment-16786122 ] Shahid K I commented on SPARK-27075: I will raise a PR
[jira] [Assigned] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
[ https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27075: Assignee: Apache Spark
[jira] [Assigned] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
[ https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-27075: Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786129#comment-16786129 ] shane knapp commented on SPARK-26742: - alright, last thing before these two PRs are mergeable: *which version of k8s do we want to test against?*
[jira] [Created] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
Shahid K I created SPARK-27075: -- Summary: Sorting table column in SQL WEBUI page throws 'IllegalArgumentException' Key: SPARK-27075 URL: https://issues.apache.org/jira/browse/SPARK-27075 Project: Spark Issue Type: Bug Components: SQL, Web UI Affects Versions: 3.0.0 Reporter: Shahid K I Test steps: 1) bin/spark-sql 2) run some queries 3) Open SQL page in the webui 4) Try to sort any column in the execution table. file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786102#comment-16786102 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 8:29 PM: - Yeah, I was finally able to do so, thanks. I was also hitting this one: https://github.com/kubernetes/kubeadm/issues/992 If anyone hits that, they just need to install `crictl` under `/usr/bin` after building it from source. was (Author: skonto): Yeah, I was able to do so, thanks. I was also hitting this one: https://github.com/kubernetes/kubeadm/issues/992 If anyone hits that, they just need to install `crictl` under `/usr/bin` after building it from source.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786102#comment-16786102 ] Stavros Kontopoulos commented on SPARK-26742: - Yeah, I was able to do so, thanks. I was also hitting this one: https://github.com/kubernetes/kubeadm/issues/992 If anyone hits that, they just need to install `crictl` under `/usr/bin` after building it from source.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786097#comment-16786097 ] shane knapp commented on SPARK-26742: - [~skonto] -- i was able to downgrade k8s successfully w/o deleting the .minikube and .kube dirs: {noformat} minikube stop; minikube delete; minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 --kubernetes-version=v1.12.6 {noformat}
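The stop/delete/start sequence above generalizes to testing several Kubernetes versions in a row. A sketch only, assuming the kvm2 driver and the same resource settings used elsewhere in this thread; the version list and the inner test step are illustrative placeholders:

```shell
# Sketch: cycle a local minikube through several Kubernetes versions,
# doing a full stop/delete between versions per the sequence above.
# The version list is a placeholder, as is the commented-out test step.
for v in v1.11.7 v1.12.6 v1.13.4; do
  minikube stop
  minikube delete
  minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 \
    --kubernetes-version="$v"
  # ...run the integration suite against this cluster before moving on...
done
```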
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786074#comment-16786074 ] shane knapp commented on SPARK-26742: - [~skonto] the only problems i've had are when i'm going backwards w/versions, not forwards. also, k8s 1.14.x will be released in ~2-3 weeks (from my friend on the k8s team)... this means that v1.11.x will no longer be officially supported. i feel that we should target 1.13.x right now for our testing infra.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786069#comment-16786069 ] Stavros Kontopoulos commented on SPARK-26742: - That is awesome!
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786068#comment-16786068 ] shane knapp commented on SPARK-26742: - 1.12.6 passes for me as well: {noformat} [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 --- Discovery starting. Discovery completed in 175 milliseconds. Run starting. Expected test count is: 14 KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 6 minutes, 32 seconds. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM 2.4.2-SNAPSHOT SUCCESS [ 3.009 s] [INFO] Spark Project Tags . SUCCESS [ 2.767 s] [INFO] Spark Project Local DB . SUCCESS [ 1.973 s] [INFO] Spark Project Networking ... SUCCESS [ 3.491 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 1.878 s] [INFO] Spark Project Unsafe ... SUCCESS [ 1.948 s] [INFO] Spark Project Launcher . SUCCESS [ 3.866 s] [INFO] Spark Project Core . 
SUCCESS [ 23.852 s] [INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [07:32 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 08:15 min [INFO] Finished at: 2019-03-06T11:40:06-08:00 [INFO] jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 --kubernetes-version=v1.12.6 {noformat}
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786067#comment-16786067 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 7:40 PM: - I did all that plus rm /etc/kubernetes, I will try once more (maybe missed .kube). I actually followed this: https://github.com/kubernetes/minikube/issues/1043#issuecomment-354453842 was (Author: skonto): I did all that plus rm /etc/kubernetes, I will try once more. I actually followed this: https://github.com/kubernetes/minikube/issues/1043#issuecomment-354453842
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786067#comment-16786067 ] Stavros Kontopoulos commented on SPARK-26742: - I did all that plus rm /etc/kubernetes, I will try once more. I actually followed this: https://github.com/kubernetes/minikube/issues/1043#issuecomment-354453842
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786060#comment-16786060 ] shane knapp edited comment on SPARK-26742 at 3/6/19 7:33 PM: - i was unable to get 1.11.7 to work until i did a {noformat}minikube stop && minikube delete; rm -rf ~/.minikube ~/.kube{noformat}. testing against k8s 1.12.6 now. was (Author: shaneknapp): i was unable to get 1.11.7 to work until i did a `minikube stop && minikube delete; rm -rf ~/.minikube ~/.kube`. testing against k8s 1.12.6 now.
[jira] [Updated] (SPARK-27023) Kubernetes client timeouts should be configurable
[ https://issues.apache.org/jira/browse/SPARK-27023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-27023: -- Issue Type: Improvement (was: New Feature) > Kubernetes client timeouts should be configurable > - > > Key: SPARK-27023 > URL: https://issues.apache.org/jira/browse/SPARK-27023 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Onur Satici >Priority: Major > > Kubernetes clients used in driver submission, in client mode and in > requesting executors should have configurable read and connect timeouts
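If those timeouts were exposed as ordinary Spark conf keys, they could be passed at submission time like any other `--conf`. The key names below are hypothetical placeholders illustrating the request, not confirmed Spark configuration properties, and the master URL and jar path are generic examples:

```shell
# Hypothetical invocation: the spark.kubernetes.submission.* key names are
# placeholders sketching what configurable connect/read timeouts (in ms)
# might look like; they are not confirmed configuration properties.
spark-submit \
  --master k8s://https://example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.submission.connectionTimeout=30000 \
  --conf spark.kubernetes.submission.requestTimeout=30000 \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```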
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786060#comment-16786060 ] shane knapp commented on SPARK-26742: - i was unable to get 1.11.7 to work until i did a `minikube stop && minikube delete; rm -rf ~/.minikube ~/.kube`. testing against k8s 1.12.6 now.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786056#comment-16786056 ] Stavros Kontopoulos commented on SPARK-26742: - I mean the `minikube` binary, because that flag does not work when I pass `--kubernetes-version=v1.11.7` on my AWS instance. Minikube never starts, but that is with the `none` driver; good to know kvm works.
[jira] [Commented] (SPARK-27063) Spark on K8S Integration Tests timeouts are too short for some test clusters
[ https://issues.apache.org/jira/browse/SPARK-27063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786051#comment-16786051 ] Rob Vesse commented on SPARK-27063: --- [~skonto] Yes, we have experienced the same problem. I think my next PR for this will look to make that overall timeout user-configurable. > Spark on K8S Integration Tests timeouts are too short for some test clusters > > > Key: SPARK-27063 > URL: https://issues.apache.org/jira/browse/SPARK-27063 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Rob Vesse >Priority: Minor > > As noted during development for SPARK-26729, there are a couple of integration > test timeouts that are too short when running on slower clusters, e.g. > developers' laptops, small CI clusters, etc. > [~skonto] confirmed that he has also experienced this behaviour in the > discussion on [PR > 23846|https://github.com/apache/spark/pull/23846#discussion_r262564938] > We should up the defaults of these timeouts as an initial step and longer term > consider making the timeouts themselves configurable
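One shape such a user-configurable timeout could take is a system property read by the integration suite and raised per invocation on slower clusters. A sketch only; the property name below is invented for illustration, not an existing Spark setting:

```shell
# Hypothetical: if the integration suite read its overall patience timeout
# from a system property, slower clusters could raise it at invocation
# time. The property name is invented for illustration.
mvn -pl resource-managers/kubernetes/integration-tests integration-test \
  -Dspark.kubernetes.test.defaultTimeout=600
```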
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786047#comment-16786047 ] shane knapp commented on SPARK-26742: - works for me against k8s v1.11.7: {noformat} [INFO] [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 --- Discovery starting. Discovery completed in 181 milliseconds. Run starting. Expected test count is: 14 KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 7 minutes, 1 second. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM 2.4.2-SNAPSHOT SUCCESS [ 3.010 s] [INFO] Spark Project Tags . SUCCESS [ 2.829 s] [INFO] Spark Project Local DB . SUCCESS [ 2.144 s] [INFO] Spark Project Networking ... SUCCESS [ 3.455 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 1.902 s] [INFO] Spark Project Unsafe ... SUCCESS [ 1.977 s] [INFO] Spark Project Launcher . SUCCESS [ 4.040 s] [INFO] Spark Project Core . 
SUCCESS [ 24.034 s] [INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:03 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 08:46 min [INFO] Finished at: 2019-03-06T11:15:46-08:00 [INFO] jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 --kubernetes-version=v1.11.7 {noformat}
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786047#comment-16786047 ] shane knapp edited comment on SPARK-26742 at 3/6/19 7:17 PM: - your PR works for me against k8s v1.11.7: {noformat} [INFO] [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 --- Discovery starting. Discovery completed in 181 milliseconds. Run starting. Expected test count is: 14 KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 7 minutes, 1 second. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM 2.4.2-SNAPSHOT SUCCESS [ 3.010 s] [INFO] Spark Project Tags . SUCCESS [ 2.829 s] [INFO] Spark Project Local DB . SUCCESS [ 2.144 s] [INFO] Spark Project Networking ... SUCCESS [ 3.455 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 1.902 s] [INFO] Spark Project Unsafe ... SUCCESS [ 1.977 s] [INFO] Spark Project Launcher . SUCCESS [ 4.040 s] [INFO] Spark Project Core . 
SUCCESS [ 24.034 s] [INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:03 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 08:46 min [INFO] Finished at: 2019-03-06T11:15:46-08:00 [INFO] jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 --kubernetes-version=v1.11.7 {noformat} was (Author: shaneknapp): works for me against k8s v1.11.7: {noformat} [INFO] [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 --- Discovery starting. Discovery completed in 181 milliseconds. Run starting. Expected test count is: 14 KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 7 minutes, 1 second. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM 2.4.2-SNAPSHOT SUCCESS [ 3.010 s] [INFO] Spark Project Tags . SUCCESS [ 2.829 s] [INFO] Spark Project Local DB . SUCCESS [ 2.144 s] [INFO] Spark Project Networking ... SUCCESS [ 3.455 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 1.902 s] [INFO] Spark Project Unsafe ... SUCCESS [ 1.977 s] [INFO] Spark Project Launcher . SUCCESS [ 4.040 s] [INFO] Spark Project Core . 
SUCCESS [ 24.034 s] [INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:03 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 08:46 min [INFO] Finished at: 2019-03-06T11:15:46-08:00 [INFO] jenkins@ubuntu-
[jira] [Resolved] (SPARK-27023) Kubernetes client timeouts should be configurable
[ https://issues.apache.org/jira/browse/SPARK-27023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-27023. --- Resolution: Fixed Assignee: Onur Satici Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/23928 > Kubernetes client timeouts should be configurable > - > > Key: SPARK-27023 > URL: https://issues.apache.org/jira/browse/SPARK-27023 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 3.0.0 >Reporter: Onur Satici >Assignee: Onur Satici >Priority: Major > Fix For: 3.0.0 > > > Kubernetes clients used in driver submission, in client mode and in > requesting executors should have configurable read and connect timeouts
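With the fix above in place, the Kubernetes client timeouts can be overridden at submit time. A minimal sketch follows; the property names are the ones added by the linked PR and should be verified against the configuration docs for your Spark version, and the master URL, application class, and jar path are placeholders:

```shell
# Hedged example: override the Kubernetes client read/connect timeouts
# introduced by SPARK-27023 (values are in milliseconds; verify the
# property names against your Spark version's docs before relying on them).
spark-submit \
  --master k8s://https://my-cluster:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.submission.connectionTimeout=60000 \
  --conf spark.kubernetes.submission.requestTimeout=60000 \
  --conf spark.kubernetes.driver.connectionTimeout=60000 \
  --conf spark.kubernetes.driver.requestTimeout=60000 \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
```

The `submission.*` properties cover the client used by `spark-submit` itself, while the `driver.*` properties cover the client the driver uses to request executors.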
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786019#comment-16786019 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:59 PM: - Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version=v1.11.7 has not worked so far for me using the latest minikube binary. So I am planning to test with older minikube versions which have as default versions other than 1.13. was (Author: skonto): Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version has not worked so far for me using the latest minikube binary. So I am planning to test with older minikube versions which have as default versions other than 1.13.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786023#comment-16786023 ] shane knapp commented on SPARK-26742: - i think you mean: s/minikube/k8s anyways, exactly which version of k8s do we want to be testing against?
[jira] [Commented] (SPARK-27006) SPIP: .NET bindings for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-27006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786030#comment-16786030 ] Sean Owen commented on SPARK-27006: --- You can use the Apache license, and follow Apache processes, without this code going into Spark. Heck, it's even possible that you propose this as a top-level ASF project, though I don't think that ultimately makes sense. You can also of course solicit feedback here. It's not possible to give you a dev branch or any access to ASF repos unless you're a committer. But, you can of course fork the GitHub repo and do whatever you want. Anyone you allow can contribute. You can sync your fork's master and integrate as often as you like. This kind of prejudges that it's going to be merged into Spark, and I think that's highly unlikely. I don't think you need permission or oversight from anyone on Spark as a result. You can announce the work on ASF lists, ask for feedback, publish your packages as you like. The problem is that the (positive) idea of making sure your bindings stay up to date with Spark has a cost: now the whole project bears responsibility for not breaking it, updating it, releasing it. You may contribute a lot of that work, or intend to. But a change of this scope is going to inevitably put a lot of work on others. My opinion is it won't be worth it -- not because this isn't valuable, but because it's equally valuable in the form it is now as a separate project. You bear the burden of keeping it up to date, sure, but that's intended. > SPIP: .NET bindings for Apache Spark > > > Key: SPARK-27006 > URL: https://issues.apache.org/jira/browse/SPARK-27006 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Terry Kim >Priority: Minor > Original Estimate: 4,032h > Remaining Estimate: 4,032h > > h4. 
Background and Motivation: > Apache Spark provides programming language support for Scala/Java (native), > and extensions for Python and R. While a variety of other language extensions > are possible to include in Apache Spark, .NET would bring one of the largest > developer communities to the table. Presently, no good Big Data solution exists > for .NET developers in open source. This SPIP aims at discussing how we can > bring Apache Spark goodness to the .NET development platform. > .NET is a free, cross-platform, open source developer platform for building > many different types of applications. With .NET, you can use multiple > languages, editors, and libraries to build for web, mobile, desktop, gaming, > and IoT types of applications. Even with .NET serving millions of developers, > there is no good Big Data solution that exists today, which this SPIP aims to > address. > The .NET developer community is one of the largest programming language > communities in the world. Its flagship programming language C# is listed as > one of the most popular programming languages in a variety of articles and > statistics: > * Most popular Technologies on Stack Overflow: > [https://insights.stackoverflow.com/survey/2018/#most-popular-technologies|https://insights.stackoverflow.com/survey/2018/] > > * Most popular languages on GitHub 2018: > [https://www.businessinsider.com/the-10-most-popular-programming-languages-according-to-github-2018-10#2-java-9|https://www.businessinsider.com/the-10-most-popular-programming-languages-according-to-github-2018-10] > > * 1M+ new developers in the last year > * Second most demanded technology on LinkedIn > * Top 30 high-velocity OSS projects on GitHub > Including a C# language extension in Apache Spark will enable millions of > .NET developers to author Big Data applications in their preferred > programming language, developer environment, and tooling support. 
We aim to > promote the .NET bindings for Spark through engagements with the Spark > community (e.g., we are scheduled to present an early prototype at the SF > Spark Summit 2019) and the .NET developer community (e.g., similar > presentations will be held at .NET developer conferences this year). As > such, we believe that our efforts will help grow the Spark community by > making it accessible to the millions of .NET developers. > Furthermore, our early discussions with some large .NET development teams got > an enthusiastic reception. > We recognize that earlier attempts at this goal (specifically Mobius > [https://github.com/Microsoft/Mobius]) were unsuccessful primarily due to the > lack of communication with the Spark community. Therefore, another goal of > this proposal is to not only develop .NET bindings for Spark in open source, > but also continuously seek feedback from the Spark community via posted > Jira’s (like this one) and t
[jira] [Commented] (SPARK-18748) UDF multiple evaluations causes very poor performance
[ https://issues.apache.org/jira/browse/SPARK-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786029#comment-16786029 ] Qingbo Hu commented on SPARK-18748: --- We have the same problem when using Spark Structured Streaming. This is a critical problem for us, since our UDF includes a counter that increases every time it gets called, and the output of the UDF depends on this count. If the UDF gets executed multiple times when a field is referenced, it will make the output of our UDF incorrect. We cannot use cache() in this case, since we are in structured streaming. > UDF multiple evaluations causes very poor performance > - > > Key: SPARK-18748 > URL: https://issues.apache.org/jira/browse/SPARK-18748 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.4.0 >Reporter: Ohad Raviv >Priority: Major > > We have a use case where we have a relatively expensive UDF that needs to be > calculated. The problem is that instead of being calculated once, it gets > calculated over and over again. > for example: > {quote} > def veryExpensiveCalc(str:String) = \{println("blahblah1"); "nothing"\} > hiveContext.udf.register("veryExpensiveCalc", veryExpensiveCalc _) > hiveContext.sql("select * from (select veryExpensiveCalc('a') c)z where c is > not null and c<>''").show > {quote} > with the output: > {quote} > blahblah1 > blahblah1 > blahblah1 > +---+ > | c| > +---+ > |nothing| > +---+ > {quote} > You can see that for each reference of column "c" you get the println. > That causes very poor performance in our real use case. > This also came out on StackOverflow: > http://stackoverflow.com/questions/40320563/spark-udf-called-more-than-once-per-record-when-df-has-too-many-columns > http://stackoverflow.com/questions/34587596/trying-to-turn-a-blob-into-multiple-columns-in-spark/ > with two problematic work-arounds: > 1. cache() after the first time. e.g. 
> {quote} > hiveContext.sql("select veryExpensiveCalc('a') as c").cache().where("c is not > null and c<>''").show > {quote} > While it works, in our case we can't do that because the table is too big to > cache. > 2. move back and forth to rdd: > {quote} > val df = hiveContext.sql("select veryExpensiveCalc('a') as c") > hiveContext.createDataFrame(df.rdd, df.schema).where("c is not null and > c<>''").show > {quote} > This works, but then we lose some of the optimizations like predicate > pushdown, etc., and it's very ugly. > Any ideas on how we can make the UDF get calculated just once in a reasonable > way?
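The stateful-counter scenario described in the comment above can be illustrated without Spark at all. This toy sketch (hypothetical names, no Spark APIs involved) shows why a UDF that keeps internal state produces different results when the engine evaluates it once per column reference instead of once per row:

```python
# Pure-Python illustration of why a stateful UDF is fragile under
# re-evaluation: a counter inside the function advances once per *call*,
# so evaluating the UDF once per column reference (instead of once per
# row) changes the output.

call_count = 0

def counting_udf(value):
    """Hypothetical stateful UDF whose output depends on call order."""
    global call_count
    call_count += 1
    return f"{value}-{call_count}"

rows = ["a", "b", "c"]

# What the author expects: one evaluation per row.
call_count = 0
expected = [counting_udf(r) for r in rows]

# What happens when the plan references the result column three times
# (e.g. `c`, `c is not null`, `c <> ''`): three evaluations per row,
# and the value the query finally sees comes from the last call.
call_count = 0
observed = [[counting_udf(r) for _ in range(3)][-1] for r in rows]

print(expected)  # ['a-1', 'b-2', 'c-3']
print(observed)  # ['a-3', 'b-6', 'c-9']
```

The counter drifting from `3` to `9` total calls for three rows is exactly the behavior the repro in the issue description shows with the repeated `blahblah1` prints.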
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786012#comment-16786012 ] shane knapp commented on SPARK-26742: - 2.4 k8s integration tests pass w/the client upgrade to 4.1.2: {noformat} [INFO] [INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ spark-kubernetes-integration-tests_2.11 --- Discovery starting. Discovery completed in 184 milliseconds. Run starting. Expected test count is: 14 KubernetesSuite: - Run SparkPi with no resources - Run SparkPi with a very long application name. - Use SparkLauncher.NO_RESOURCE - Run SparkPi with a master URL without a scheme. - Run SparkPi with an argument. - Run SparkPi with custom labels, annotations, and environment variables. - Run extraJVMOptions check on driver - Run SparkRemoteFileTest using a remote data file - Run SparkPi with env and mount secrets. - Run PySpark on simple pi.py example - Run PySpark with Python2 to test a pyfiles example - Run PySpark with Python3 to test a pyfiles example - Run PySpark with memory customization - Run in client mode. Run completed in 7 minutes, 24 seconds. Total number of tests run: 14 Suites: completed 2, aborted 0 Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0 All tests passed. [INFO] [INFO] Reactor Summary: [INFO] [INFO] Spark Project Parent POM 2.4.2-SNAPSHOT SUCCESS [ 3.211 s] [INFO] Spark Project Tags . SUCCESS [ 3.515 s] [INFO] Spark Project Local DB . SUCCESS [ 2.181 s] [INFO] Spark Project Networking ... SUCCESS [ 3.738 s] [INFO] Spark Project Shuffle Streaming Service SUCCESS [ 1.891 s] [INFO] Spark Project Unsafe ... SUCCESS [ 1.999 s] [INFO] Spark Project Launcher . SUCCESS [ 3.926 s] [INFO] Spark Project Core . 
SUCCESS [ 23.593 s] [INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:35 min] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 09:19 min [INFO] Finished at: 2019-03-06T10:48:55-08:00 [INFO] {noformat}
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786019#comment-16786019 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:59 PM: - Cool! The question is does it pass with minikube 1.11 and 1.12 as well? Passing --kubernetes-version=v1.11.7 has not worked so far for me using the latest minikube binary. So I am planning to test with older minikube versions which have as default versions other than 1.13. was (Author: skonto): Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version=v1.11.7 has not worked so far for me using the latest minikube binary. So I am planning to test with older minikube versions which have as default versions other than 1.13.
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786019#comment-16786019 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:58 PM: - Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version has not worked so far for me using the latest minikube binary. So I am planning to test with older minikube versions which have as default not 1.13. was (Author: skonto): Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version ahs not worked so far for me. So I am planning to test with older minikube versions which have as default not 1.13.
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786019#comment-16786019 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:58 PM: - Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version has not worked so far for me using the latest minikube binary. So I am planning to test with older minikube versions which have as default versions other than 1.13. was (Author: skonto): Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version has not worked so far for me using the latest minikube binary. So I am planning to test with older minikube versions which have as default not 1.13.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786019#comment-16786019 ] Stavros Kontopoulos commented on SPARK-26742: - Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing --kubernetes-version has not worked so far for me. So I am planning to test with older minikube versions which default to something other than 1.13.
[jira] [Commented] (SPARK-27006) SPIP: .NET bindings for Apache Spark
[ https://issues.apache.org/jira/browse/SPARK-27006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786001#comment-16786001 ] Terry Kim commented on SPARK-27006: --- Thanks for the feedback. We are happy to work on addressing some of the voiced concerns. For example, we are already working on contributing the interop layer work ([SPARK-26257|https://issues.apache.org/jira/browse/SPARK-26257]), which should alleviate the code duplication problem, and we are planning on contributing it first. We are really looking forward to having early community engagement and feedback on this SPIP, and want to avoid a long discussion with the community once the PR is ready. Thus, having open development in full view of the Apache Spark community is important to us. Furthermore, we think it is legally beneficial if we can do the work under the established rules and guidelines of the Apache Software Foundation (including aspects such as the legal contribution framework, naming, etc.), instead of having to build it under a different legal framework (which, for example, may require a different contribution agreement and would be painful to transfer into the ASF). Therefore, we are interested in finding a way that allows us to mitigate [~srowen]'s concern about maintaining a whole other language and copy of the APIs and to be under the Apache Foundation umbrella. Towards addressing this, we’d like to propose the following: use a development branch on the Apache Spark master repo, shepherded by an Apache Spark PMC member of the community’s choice, to ensure (a) community visibility and (b) alignment with the broader community on a continuous basis. With a shepherd, the core community can ensure that the feature branch does not go off the rails and can be merged back at the appropriate time. The SPIP proposers and other interested community members will explicitly undertake the majority of the code vetting and QA work. 
In short, we can have two ASF branches: • master - the main branch intended for release - does not include the in-progress work for [SPARK-27006|https://issues.apache.org/jira/browse/SPARK-27006] • SPARK-27006 feature branch - a branch for shared development of the changes needed for this SPIP, shepherded by a PMC member of the community’s choice Subsequently, the development process would be: • We and anyone who would like to contribute will work in our/their fork of the SPARK-27006 branch, issuing regular and frequent PRs against that branch for review by the broader community • We periodically (weekly or at any other appropriate interval) merge master into the SPARK-27006 branch to ensure alignment with ongoing work in master • After the work is deemed complete, the shepherd makes the final call on whether the work meets expectations in a way that does not affect the project’s core guiding principles Eventually, the SPARK-27006 branch is merged by the shepherd into master once we obtain agreement from the broader community to bring the project in. [~srowen], [~dongjoon] (and other community members), would this proposal address your concerns? Or are there other established patterns within the Apache Foundation that have worked in the past? [~dongjoon] Thanks for pointing out the guide.
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785987#comment-16785987 ] shane knapp commented on SPARK-26742: - ok, i found the 2.4 PR: https://github.com/apache/spark/pull/23993 testing now. > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-27045) SQL tab in UI shows actual SQL instead of callsite
[ https://issues.apache.org/jira/browse/SPARK-27045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27045: Summary: SQL tab in UI shows actual SQL instead of callsite (was: SQL tab in UI shows callsite instead of actual SQL) > SQL tab in UI shows actual SQL instead of callsite > -- > > Key: SPARK-27045 > URL: https://issues.apache.org/jira/browse/SPARK-27045 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.3.3, 3.0.0 >Reporter: Ajith S >Priority: Major > Attachments: image-2019-03-04-18-24-27-469.png, > image-2019-03-04-18-24-54-053.png > > > When we run SQL in Spark (for example via the Thrift server), the Spark UI SQL > tab should show the SQL text rather than the stack trace, which is more useful to the > end user. Instead, the description column currently shows the callsite short form, > which is less useful. > Actual: > !image-2019-03-04-18-24-27-469.png! > > Expected: > !image-2019-03-04-18-24-54-053.png!
[jira] [Updated] (SPARK-27045) SQL tab in UI shows callsite instead of actual SQL
[ https://issues.apache.org/jira/browse/SPARK-27045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajith S updated SPARK-27045: Issue Type: Improvement (was: Bug) > SQL tab in UI shows callsite instead of actual SQL > -- > > Key: SPARK-27045 > URL: https://issues.apache.org/jira/browse/SPARK-27045 > Project: Spark > Issue Type: Improvement > Components: SQL, Web UI >Affects Versions: 2.3.2, 2.3.3, 3.0.0 >Reporter: Ajith S >Priority: Major > Attachments: image-2019-03-04-18-24-27-469.png, > image-2019-03-04-18-24-54-053.png > > > When we run SQL in Spark (for example via the Thrift server), the Spark UI SQL > tab should show the SQL text rather than the stack trace, which is more useful to the > end user. Instead, the description column currently shows the callsite short form, > which is less useful. > Actual: > !image-2019-03-04-18-24-27-469.png! > > Expected: > !image-2019-03-04-18-24-54-053.png!
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785983#comment-16785983 ] shane knapp commented on SPARK-26742: - please link to the PRs (master + 2.4) here and i can test them independently. > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785962#comment-16785962 ] Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:14 PM: - I created one for the 2.4 branch just in case we want to test and upgrade. was (Author: skonto): I created one for the 2.4 branch. > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785962#comment-16785962 ] Stavros Kontopoulos commented on SPARK-26742: - I created one for the 2.4 branch. > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2
[ https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785970#comment-16785970 ] shane knapp commented on SPARK-26742: - you can submit a PR now. it will fail on the existing ubuntu workers, but i can test it manually on the staging system that has the upgraded/deployed minikube + k8s. just throw the link to the PR in here and it'll take me ~30 mins to confirm that the changes work. if my build passes, i will temporarily take the ubuntu workers out of jenkins, update one to use the new minikube/k8s and re-trigger the test on that PR. if *that* build passes, i'll update the remaining production ubuntu workers and put them back in to rotation. then we can merge the client upgrade PR. sound good? i have the time to get this done right now. > Bump Kubernetes Client Version to 4.1.2 > --- > > Key: SPARK-26742 > URL: https://issues.apache.org/jira/browse/SPARK-26742 > Project: Spark > Issue Type: Dependency upgrade > Components: Kubernetes >Affects Versions: 2.4.0, 3.0.0 >Reporter: Steve Davids >Priority: Major > Labels: easyfix > Fix For: 3.0.0 > > > Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master > branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest > Kubernetes compatibility support for newer clusters: > https://github.com/fabric8io/kubernetes-client#compatibility-matrix -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-27065) avoid more than one active task set managers for a stage
[ https://issues.apache.org/jira/browse/SPARK-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-27065. -- Resolution: Fixed Fix Version/s: 2.3.4 2.4.1 3.0.0 Issue resolved by pull request 23927 [https://github.com/apache/spark/pull/23927] > avoid more than one active task set managers for a stage > > > Key: SPARK-27065 > URL: https://issues.apache.org/jira/browse/SPARK-27065 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 2.3.3, 2.4.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Fix For: 3.0.0, 2.4.1, 2.3.4 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-25250) Race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple times
[ https://issues.apache.org/jira/browse/SPARK-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-25250. -- Resolution: Fixed > Race condition with tasks running when new attempt for same stage is created > leads to other task in the next attempt running on the same partition id > retry multiple times > -- > > Key: SPARK-25250 > URL: https://issues.apache.org/jira/browse/SPARK-25250 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 2.3.1 >Reporter: Parth Gandhi >Assignee: Parth Gandhi >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > We recently had a scenario where a race condition occurred when a task from > previous stage attempt just finished before new attempt for the same stage > was created due to fetch failure, so the new task created in the second > attempt on the same partition id was retrying multiple times due to > TaskCommitDenied Exception without realizing that the task in earlier attempt > was already successful. > For example, consider a task with partition id 9000 and index 9000 running in > stage 4.0. We see a fetch failure so thus, we spawn a new stage attempt 4.1. > Just within this timespan, the above task completes successfully, thus, > marking the partition id 9000 as complete for 4.0. However, as stage 4.1 has > not yet been created, the taskset info for that stage is not available to the > TaskScheduler so, naturally, the partition id 9000 has not been marked > completed for 4.1. Stage 4.1 now spawns task with index 2000 on the same > partition id 9000. This task fails due to CommitDeniedException and since, it > does not see the corresponding partition id as been marked successful, it > keeps retrying multiple times until the job finally succeeds. It doesn't > cause any job failures because the DAG scheduler is tracking the partitions > separate from the task set managers. > > Steps to Reproduce: > # Run any large job involving shuffle operation. 
> # When the ShuffleMap stage finishes and the ResultStage begins running, > cause this stage to throw a fetch failure exception (try deleting certain > shuffle files on any host). > # Observe the task attempt numbers for the next stage attempt. Please note > that this issue is an intermittent one, so it might not happen all the time.
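The race described above can be sketched as a toy model in plain Python (illustrative names only, not Spark's real scheduler classes): completion is recorded against a specific stage attempt, so a partition finished under attempt 4.0 is invisible to attempt 4.1 even though the DAG scheduler's per-stage view already considers it done.

```python
# Toy model of the SPARK-25250 race: each stage attempt (a "task set
# manager") tracks finished partitions independently, while the DAG
# scheduler tracks them per stage. Names are illustrative, not Spark's.

class TaskSetManager:
    def __init__(self, stage, attempt):
        self.stage, self.attempt = stage, attempt
        self.completed = set()          # partitions this attempt knows are done

    def should_retry(self, partition):
        # An attempt keeps retrying any partition it has not itself recorded.
        return partition not in self.completed


dag_completed = set()                   # DAG-scheduler view, keyed by stage only

# Attempt 4.0 finishes partition 9000 just before the fetch failure.
attempt_0 = TaskSetManager(stage=4, attempt=0)
attempt_0.completed.add(9000)
dag_completed.add((4, 9000))

# Attempt 4.1 is created afterwards with an empty view, so it re-runs
# partition 9000 even though the DAG scheduler says it is finished.
attempt_1 = TaskSetManager(stage=4, attempt=1)
print(attempt_1.should_retry(9000))     # the new attempt retries the partition
print((4, 9000) in dag_completed)       # the DAG scheduler considers it done
```

This is why the job still succeeds overall: the DAG scheduler's per-stage bookkeeping is what ultimately decides completion, while the stale task set manager only wastes retries.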
[jira] [Updated] (SPARK-25250) Race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple times
[ https://issues.apache.org/jira/browse/SPARK-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-25250: - Fix Version/s: 3.0.0 2.4.1 > Race condition with tasks running when new attempt for same stage is created > leads to other task in the next attempt running on the same partition id > retry multiple times > -- > > Key: SPARK-25250 > URL: https://issues.apache.org/jira/browse/SPARK-25250 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 2.3.1 >Reporter: Parth Gandhi >Assignee: Parth Gandhi >Priority: Major > Fix For: 2.4.1, 3.0.0 > > > We recently had a scenario where a race condition occurred when a task from > previous stage attempt just finished before new attempt for the same stage > was created due to fetch failure, so the new task created in the second > attempt on the same partition id was retrying multiple times due to > TaskCommitDenied Exception without realizing that the task in earlier attempt > was already successful. > For example, consider a task with partition id 9000 and index 9000 running in > stage 4.0. We see a fetch failure so thus, we spawn a new stage attempt 4.1. > Just within this timespan, the above task completes successfully, thus, > marking the partition id 9000 as complete for 4.0. However, as stage 4.1 has > not yet been created, the taskset info for that stage is not available to the > TaskScheduler so, naturally, the partition id 9000 has not been marked > completed for 4.1. Stage 4.1 now spawns task with index 2000 on the same > partition id 9000. This task fails due to CommitDeniedException and since, it > does not see the corresponding partition id as been marked successful, it > keeps retrying multiple times until the job finally succeeds. It doesn't > cause any job failures because the DAG scheduler is tracking the partitions > separate from the task set managers. > > Steps to Reproduce: > # Run any large job involving shuffle operation. 
> # When the ShuffleMap stage finishes and the ResultStage begins running, > cause this stage to throw a fetch failure exception (try deleting certain > shuffle files on any host). > # Observe the task attempt numbers for the next stage attempt. Please note > that this issue is an intermittent one, so it might not happen all the time.
[jira] [Assigned] (SPARK-25250) Race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple times
[ https://issues.apache.org/jira/browse/SPARK-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-25250: Assignee: Parth Gandhi > Race condition with tasks running when new attempt for same stage is created > leads to other task in the next attempt running on the same partition id > retry multiple times > -- > > Key: SPARK-25250 > URL: https://issues.apache.org/jira/browse/SPARK-25250 > Project: Spark > Issue Type: Bug > Components: Scheduler, Spark Core >Affects Versions: 2.3.1 >Reporter: Parth Gandhi >Assignee: Parth Gandhi >Priority: Major > > We recently had a scenario where a race condition occurred when a task from > previous stage attempt just finished before new attempt for the same stage > was created due to fetch failure, so the new task created in the second > attempt on the same partition id was retrying multiple times due to > TaskCommitDenied Exception without realizing that the task in earlier attempt > was already successful. > For example, consider a task with partition id 9000 and index 9000 running in > stage 4.0. We see a fetch failure so thus, we spawn a new stage attempt 4.1. > Just within this timespan, the above task completes successfully, thus, > marking the partition id 9000 as complete for 4.0. However, as stage 4.1 has > not yet been created, the taskset info for that stage is not available to the > TaskScheduler so, naturally, the partition id 9000 has not been marked > completed for 4.1. Stage 4.1 now spawns task with index 2000 on the same > partition id 9000. This task fails due to CommitDeniedException and since, it > does not see the corresponding partition id as been marked successful, it > keeps retrying multiple times until the job finally succeeds. It doesn't > cause any job failures because the DAG scheduler is tracking the partitions > separate from the task set managers. > > Steps to Reproduce: > # Run any large job involving shuffle operation. 
> # When the ShuffleMap stage finishes and the ResultStage begins running, > cause this stage to throw a fetch failure exception (try deleting certain > shuffle files on any host). > # Observe the task attempt numbers for the next stage attempt. Please note > that this issue is an intermittent one, so it might not happen all the time.
[jira] [Resolved] (SPARK-24669) Managed table was not cleared of path after drop database cascade
[ https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-24669. --- Resolution: Fixed Assignee: Udbhav Agrawal Fix Version/s: 3.0.0 2.4.2 2.3.4 This is resolved via https://github.com/apache/spark/pull/23905 . Thank you, [~Udbhav Agrawal]. I added you to Apache Spark Contributor group. > Managed table was not cleared of path after drop database cascade > - > > Key: SPARK-24669 > URL: https://issues.apache.org/jira/browse/SPARK-24669 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1 >Reporter: Dong Jiang >Assignee: Udbhav Agrawal >Priority: Major > Fix For: 2.3.4, 2.4.2, 3.0.0 > > > I can do the following in sequence > # Create a managed table using path options > # Drop the table via dropping the parent database cascade > # Re-create the database and table with a different path > # The new table shows data from the old path, not the new path > {code} > echo "first" > /tmp/first.csv > echo "second" > /tmp/second.csv > spark-shell > spark.version > res0: String = 2.3.0 > spark.sql("create database foo") > spark.sql("create table foo.first (id string) using csv options > (path='/tmp/first.csv')") > spark.table("foo.first").show() > +-----+ > |   id| > +-----+ > |first| > +-----+ > spark.sql("drop database foo cascade") > spark.sql("create database foo") > spark.sql("create table foo.first (id string) using csv options > (path='/tmp/second.csv')") > "note, the path is different now, pointing to second.csv, but still showing > data from first file" > spark.table("foo.first").show() > +-----+ > |   id| > +-----+ > |first| > +-----+ > "now, if I drop the table explicitly, instead of via dropping database > cascade, then it will be the correct result" > spark.sql("drop table foo.first") > spark.sql("create table foo.first (id string) using csv options > (path='/tmp/second.csv')") > spark.table("foo.first").show() > +------+ > |    id| > +------+ > |second| > +------+ > {code} > Same sequence 
failed in 2.3.1 as well.
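One way to picture a bug of this shape (a toy sketch only; this is not Spark's actual catalog code, and the class and method names are hypothetical) is a table-metadata cache that a per-table drop invalidates but a cascade drop does not, so the recreated table resolves to the stale path:

```python
# Illustrative sketch of a stale-cache bug like SPARK-24669 (hypothetical
# names, not Spark's real catalog): the cache survives a cascade drop.

class Catalog:
    def __init__(self):
        self.tables = {}   # (db, table) -> path
        self.cache = {}    # (db, table) -> cached resolution (here: the path)

    def create_table(self, db, table, path):
        self.tables[(db, table)] = path

    def read(self, db, table):
        key = (db, table)
        if key not in self.cache:              # populate cache lazily
            self.cache[key] = self.tables[key]
        return self.cache[key]

    def drop_table(self, db, table):
        self.tables.pop((db, table), None)
        self.cache.pop((db, table), None)      # per-table drop invalidates

    def drop_database_cascade(self, db, *, buggy=True):
        for key in [k for k in self.tables if k[0] == db]:
            del self.tables[key]
            if not buggy:
                self.cache.pop(key, None)      # the missing invalidation

c = Catalog()
c.create_table("foo", "first", "/tmp/first.csv")
c.read("foo", "first")                         # warms the cache
c.drop_database_cascade("foo")                 # buggy path: cache survives
c.create_table("foo", "first", "/tmp/second.csv")
print(c.read("foo", "first"))                  # stale path: /tmp/first.csv
```

With `buggy=False` the cascade drop also clears the cache entry, and the second read returns the new path, matching the behavior the reporter sees after an explicit `drop table`.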
[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs
[ https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785920#comment-16785920 ] peay commented on SPARK-24624: -- Are there plans to support something similar for aggregation functions? > Can not mix vectorized and non-vectorized UDFs > -- > > Key: SPARK-24624 > URL: https://issues.apache.org/jira/browse/SPARK-24624 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.1 >Reporter: Xiao Li >Assignee: Li Jin >Priority: Major > Fix For: 2.4.0 > > > In the current impl, we have the limitation: users are unable to mix > vectorized and non-vectorized UDFs in same Project. This becomes worse since > our optimizer could combine continuous Projects into a single one. For > example, > {code} > applied_df = df.withColumn('regular', my_regular_udf('total', > 'qty')).withColumn('pandas', my_pandas_udf('total', 'qty')) > {code} > Returns the following error. > {code} > IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs > java.lang.IllegalArgumentException: Can not mix vectorized and non-vectorized > UDFs > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:170) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:146) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234) > at scala.collection.immutable.List.foreach(List.scala:381) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:234) > at scala.collection.immutable.List.map(List.scala:285) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$.org$apache$spark$sql$execution$python$ExtractPythonUDFs$$extract(ExtractPythonUDFs.scala:146) > at > 
org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:118) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:114) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:311) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:114) > at > org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:94) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113) > at > scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124) > at scala.collection.immutable.List.foldLeft(List.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:113) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:99) > at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3312) > at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:2750) > ... > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...
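The limitation above can be modeled in a few lines of plain Python (an illustration of the check only, not Spark's actual ExtractPythonUDFs code): a single projection that mixes vectorized and non-vectorized UDFs fails planning, while each kind alone plans fine.

```python
# Pure-Python sketch of the "can not mix" check; the classes stand in
# for Spark's vectorized (pandas) and regular Python UDFs.

class RegularUDF:
    vectorized = False

class PandasUDF:
    vectorized = True

def plan_project(udfs):
    # Mirrors the planning-time check: one Project must be all-vectorized
    # or all-scalar, never a mix of both.
    kinds = {u.vectorized for u in udfs}
    if len(kinds) > 1:
        raise ValueError("Can not mix vectorized and non-vectorized UDFs")
    return "planned"

plan_project([RegularUDF()])                   # fine on its own
plan_project([PandasUDF()])                    # fine on its own
try:
    plan_project([RegularUDF(), PandasUDF()])  # mixing in one Project fails
except ValueError as e:
    print(e)
```

This also explains why the error can appear even when the two `withColumn` calls look separate in user code: the optimizer collapses consecutive Projects into one before this check runs.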
[jira] [Commented] (SPARK-24669) Managed table was not cleared of path after drop database cascade
[ https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785900#comment-16785900 ] Udbhav Agrawal commented on SPARK-24669: Thank you [~dongjoon] > Managed table was not cleared of path after drop database cascade > - > > Key: SPARK-24669 > URL: https://issues.apache.org/jira/browse/SPARK-24669 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.0, 2.3.1 >Reporter: Dong Jiang >Assignee: Udbhav Agrawal >Priority: Major > Fix For: 2.3.4, 2.4.2, 3.0.0 > > > I can do the following in sequence > # Create a managed table using path options > # Drop the table via dropping the parent database cascade > # Re-create the database and table with a different path > # The new table shows data from the old path, not the new path > {code} > echo "first" > /tmp/first.csv > echo "second" > /tmp/second.csv > spark-shell > spark.version > res0: String = 2.3.0 > spark.sql("create database foo") > spark.sql("create table foo.first (id string) using csv options > (path='/tmp/first.csv')") > spark.table("foo.first").show() > +-----+ > |   id| > +-----+ > |first| > +-----+ > spark.sql("drop database foo cascade") > spark.sql("create database foo") > spark.sql("create table foo.first (id string) using csv options > (path='/tmp/second.csv')") > "note, the path is different now, pointing to second.csv, but still showing > data from first file" > spark.table("foo.first").show() > +-----+ > |   id| > +-----+ > |first| > +-----+ > "now, if I drop the table explicitly, instead of via dropping database > cascade, then it will be the correct result" > spark.sql("drop table foo.first") > spark.sql("create table foo.first (id string) using csv options > (path='/tmp/second.csv')") > spark.table("foo.first").show() > +------+ > |    id| > +------+ > |second| > +------+ > {code} > Same sequence failed in 2.3.1 as well. 
[jira] [Commented] (SPARK-27072) Changing the parameter value of completedJob.sort to X prints stacktrace in sparkWebUI
[ https://issues.apache.org/jira/browse/SPARK-27072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16785889#comment-16785889 ] Marcelo Vanzin commented on SPARK-27072: I'd expect an error 401 here, but you get a status 200. But in any case, this is only a bug if there's a link in the UI somewhere with the wrong column name. If you're changing the URL manually, well, it's your fault. > Changing the parameter value of completedJob.sort to X prints stacktrace in > sparkWebUI > -- > > Key: SPARK-27072 > URL: https://issues.apache.org/jira/browse/SPARK-27072 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 2.4.0 >Reporter: Haripriya >Priority: Major > > Manipulating the value of completedJob.sort parameter > From > x.x.x.x:4040/jobs/?&completedJob.sort=Description&completedJob.pageSize=100#completed > To > x.x.x.x:4040/jobs/job/?id=1&completedStage.sort=x > is printing Stacktrace in webUI > > java.lang.IllegalArgumentException: Unknown column: x at > org.apache.spark.ui.jobs.JobDataSource.ordering(AllJobsPage.scala:493) at > org.apache.spark.ui.jobs.JobDataSource.(AllJobsPage.scala:441) at > org.apache.spark.ui.jobs.JobPagedTable.(AllJobsPage.scala:533) at > org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:248) at > org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:297) at > org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at > org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84) at > org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:687) at > javax.servlet.http.HttpServlet.service(HttpServlet.java:790) at > org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848) > at > org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > 
org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493) > at > org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.spark_project.jetty.server.Server.handle(Server.java:539) at > org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333) at > org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
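The stack trace originates in JobDataSource.ordering, which throws IllegalArgumentException for an unknown column. A defensive alternative, sketched here in plain Python with hypothetical column names (this is not the actual Spark fix), validates the requested sort column and falls back to a default instead of rendering an error page:

```python
# Sketch of validating a user-supplied sort parameter instead of raising;
# the column names are illustrative, not the UI's exact set.

KNOWN_COLUMNS = {"Job Id", "Description", "Submitted", "Duration"}
DEFAULT_SORT = "Job Id"

def resolve_sort_column(requested):
    # Fall back to the default rather than letting an unknown value
    # surface as an IllegalArgumentException stack trace in the page.
    return requested if requested in KNOWN_COLUMNS else DEFAULT_SORT

print(resolve_sort_column("Description"))  # valid column passes through
print(resolve_sort_column("x"))            # unknown column falls back
```

Either fallback or a clean 400-style error page would address the complaint; the key point is that a hand-edited query parameter should not produce a raw servlet stack trace.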
[jira] [Updated] (SPARK-27074) Refactor HiveClientImpl runHive
[ https://issues.apache.org/jira/browse/SPARK-27074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-27074: Description: Hive 3.1.1's {{CommandProcessor}} has two changes: # HIVE-17626 (Hive 3.0.0) added ReExecDriver. So the current code path is: https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L736-L742. This is incorrect. # HIVE-18238 (Hive 3.0.0) changed the {{Driver.close()}} function return type. This change is not compatible with the built-in Hive. was: Hive 3.1.1's {{CommandProcessor}} has two changes: # HIVE-17626 (Hive 3.0.0) added ReExecDriver. So the current code path is: [spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala|https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L736-L742] Lines 736 to 742 in [02bbe97|https://github.com/apache/spark/commit/02bbe977abaf7006b845a7e99d612b0235aa0025]
{code}
case _ =>
  if (state.out != null) {
    // scalastyle:off println
    state.out.println(tokens(0) + " " + cmd_1)
    // scalastyle:on println
  }
  Seq(proc.run(cmd_1).getResponseCode.toString)
{code}
This is incorrect. # [HIVE-18238|http://issues.apache.org/jira/browse/HIVE-18238] (Hive 3.0.0) changed the {{Driver.close()}} function return type. This change is not compatible with the built-in Hive. > Refactor HiveClientImpl runHive > --- > > Key: SPARK-27074 > URL: https://issues.apache.org/jira/browse/SPARK-27074 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yuming Wang >Priority: Major > > Hive 3.1.1's {{CommandProcessor}} has two changes: > # HIVE-17626 (Hive 3.0.0) added ReExecDriver. 
So the current code path is: > https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L736-L742. > This is incorrect. > # HIVE-18238 (Hive 3.0.0) changed the {{Driver.close()}} function return > type. This change is not compatible with the built-in Hive.
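The first point can be illustrated with a small dispatch sketch (plain Python; the class names mirror Hive's but the code is hypothetical): logic that branches on the concrete Driver class falls through to the generic-processor branch once a wrapper such as ReExecDriver is introduced, which is the incorrect code path the description refers to.

```python
# Toy illustration of the dispatch problem described above; the names
# mirror Hive's classes but this is not Hive or Spark code.

class CommandProcessor:
    pass

class Driver(CommandProcessor):
    pass

class ReExecDriver(CommandProcessor):
    # Re-execution wrapper introduced by HIVE-17626; it delegates to a
    # Driver internally but is not a Driver subclass itself.
    pass

def run_hive(proc):
    # Matching on the concrete class misroutes ReExecDriver to the
    # generic branch; dispatching on a shared interface would not.
    if isinstance(proc, Driver):
        return "driver path"
    return "generic processor path"

print(run_hive(Driver()))        # driver path
print(run_hive(ReExecDriver()))  # generic processor path (misrouted)
```

The refactor's job is essentially to make this dispatch robust to Hive's new processor hierarchy while staying compatible with the built-in Hive version.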