[jira] [Commented] (SPARK-27055) Update Structured Streaming documentation because of DSv2 changes

2019-03-06 Thread Sandeep Katta (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786451#comment-16786451
 ] 

Sandeep Katta commented on SPARK-27055:
---

[~gsomogyi] [~cloud_fan] from the code changes of 
[SPARK-26956|https://issues.apache.org/jira/browse/SPARK-26956], Append and 
Complete modes are still supported; only Update mode was removed. So in the 
programming guide only that mode should be marked as unsupported, shouldn't it?
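
For context, a minimal PySpark sketch (not from the ticket; the rate source and 
aggregation are placeholders) of where the output mode is chosen, since the 
programming guide section in question documents the append/complete/update sink modes:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("output-mode-sketch").getOrCreate()

# Placeholder streaming source; any streaming DataFrame behaves the same way here.
events = spark.readStream.format("rate").load()
counts = events.groupBy("value").agg(F.count("*").alias("cnt"))

# The programming guide documents three output modes for a streaming sink:
#   "append"   - only new result rows are written
#   "complete" - the whole updated result table is written on every trigger
#   "update"   - only rows changed since the last trigger are written
# "update" is the mode whose sink support is in question in this ticket.
query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
query.awaitTermination()
{code}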

 

> Update Structured Streaming documentation because of DSv2 changes
> -
>
> Key: SPARK-27055
> URL: https://issues.apache.org/jira/browse/SPARK-27055
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> Since SPARK-26956 has been merged, the Structured Streaming documentation also 
> has to be updated to reflect the changes.
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-modes






[jira] [Updated] (SPARK-27069) Spark(2.3.1) LDA transfomation memory error(java.lang.OutOfMemoryError at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:1232

2019-03-06 Thread TAESUK KIM (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

TAESUK KIM updated SPARK-27069:
---
Summary: Spark(2.3.1) LDA transfomation memory 
error(java.lang.OutOfMemoryError at 
java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:1232  
(was: Spark(2.3.1) LDA transfomation memory error(java.lang.OutOfMemoryError at 
java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123))

> Spark(2.3.1) LDA transfomation memory error(java.lang.OutOfMemoryError at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:1232
> 
>
> Key: SPARK-27069
> URL: https://issues.apache.org/jira/browse/SPARK-27069
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.3.2
> Environment: Below is my environment
> DataSet
>  # Document : about 100,000,000 --> 10,000,000 --> 1,000,000(All fail)
>  # Word : about 3553918(can't change)
> Spark environment
>  # executor-memory,driver-memory : 18G --> 32g --> 64 --> 128g(all fail)
>  # executor-core,driver-core : 3
>  # spark.serializer : default and 
> org.apache.spark.serializer.KryoSerializer(both fail)
>  # spark.executor.memoryOverhead : 18G --> 36G fail
> Java version : 1.8.0_191 (Oracle Corporation)
>  
>Reporter: TAESUK KIM
>Priority: Major
>
> I trained an LDA model (feature dimension: 100, iterations: 100 or 50, 
> distributed version, ml) using Spark 2.3.2 (emr-5.18.0).
> After that I want to transform a new DataSet using that model, but when I 
> transform new data I always get a memory-related error.
> I reduced the data size to 0.1x and then 0.01x, but I still get the memory 
> error (java.lang.OutOfMemoryError at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)).
>  
> That hugeCapacity error (overflow) happens when the size of an array exceeds 
> Integer.MAX_VALUE - 8, but I already reduced the data size, so I can't find 
> why this error happens.
> I also want to change the serializer to KryoSerializer, but I found that 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable always calls 
> org.apache.spark.serializer.JavaSerializationStream even though I register 
> Kryo classes.
>  
> Is there anything I can do?
>  
> Below is the code:
>  
> {code:java}
> val countvModel = CountVectorizerModel.load("s3://~/")
> val ldaModel = DistributedLDAModel.load("s3://~/")
> val transformeddata = countvModel.transform(inputData)
>   .select("productid", "itemid", "ptkString", "features")
> var featureldaDF = ldaModel.transform(transformeddata)
>   .select("productid", "itemid", "topicDistribution", "ptkString")
>   .toDF("productid", "itemid", "features", "ptkString")
> featureldaDF = featureldaDF.persist // this is line 328
> {code}
>  
> Other testing
>  # Java option : UseParallelGC , UseG1GC (all fail)
> Below is log
> {{19/03/05 20:59:03 ERROR ApplicationMaster: User class threw exception: 
> java.lang.OutOfMemoryError java.lang.OutOfMemoryError at 
> java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123) at 
> java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117) at 
> java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93) 
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153) at 
> org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
>  at 
> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
>  at 
> java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
>  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189) at 
> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
>  at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
>  at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:342)
>  at 
> org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:335)
>  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159) at 
> org.apache.spark.SparkContext.clean(SparkContext.scala:2299) at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:850)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1.apply(RDD.scala:849)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363) at 
> 
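
On the Kryo point in the report above, a minimal PySpark-style sketch of how the 
serializer is usually configured (the class names listed are illustrative only; 
and, as the reporter observed, closure cleaning still goes through Java 
serialization regardless of this setting, so this alone may not avoid the OOM):

{code:python}
from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (SparkConf()
        # Use Kryo for data serialization (shuffle, caching, broadcast).
        .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        # Comma-separated, fully qualified class names to register with Kryo;
        # the classes below are only examples, not a recommendation.
        .set("spark.kryo.classesToRegister",
             "org.apache.spark.ml.linalg.SparseVector,"
             "org.apache.spark.ml.linalg.DenseVector"))

spark = SparkSession.builder.config(conf=conf).getOrCreate()
{code}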

[jira] [Commented] (SPARK-23872) Can not connect to another metastore uri using two Spark sessions

2019-03-06 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786454#comment-16786454
 ] 

Udbhav Agrawal commented on SPARK-23872:


Hi [~chelsa], you have to stop the spark_1 session to clear the active 
SparkContext; otherwise the new configuration won't be loaded.

So call spark_1.stop()
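
A minimal PySpark sketch of that suggestion (host and table names are 
placeholders taken from the report; whether a freshly built session actually 
picks up the new hive.metastore.uris in 2.3.0 is exactly what this ticket is about):

{code:python}
from pyspark.sql import SparkSession

# First session against metastore A
spark_1 = (SparkSession.builder
           .enableHiveSupport()
           .config("hive.metastore.uris", "thrift://HOST_A:9083")
           .getOrCreate())
spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()

# Stop the first session so its SparkContext is cleared before a new one is built
spark_1.stop()

# Second session against metastore B, built from scratch
spark_2 = (SparkSession.builder
           .enableHiveSupport()
           .config("hive.metastore.uris", "thrift://HOST_B:9083")
           .getOrCreate())
spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()
{code}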

 

> Can not connect to another metastore uri using two Spark sessions
> -
>
> Key: SPARK-23872
> URL: https://issues.apache.org/jira/browse/SPARK-23872
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS  :CentOS release 6.8 (Final)
> JAVA : build 1.8.0_101-b13
> SPARK : 2.3.0
>  
>Reporter: Chan Min Park 
>Priority: Major
>
> In Spark 2.1.0, two sessions worked normally. In 2.3.0, the first session's 
> metastore information is used when the second session is run.
>   ## Run Source Code ##
>  val spark_1 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_A:9083")
>  .getOrCreate()
> spark_1.sql("SELECT A_FIELD FROM TABLE_A").show()
> SparkSession.clearActiveSession()
>  SparkSession.clearDefaultSession()
> val spark_2 = SparkSession.builder()
>  .enableHiveSupport()
>  .config("hive.metastore.uris", "thrift://HOST_B:9083")
>  .getOrCreate()
> spark_2.sql("SELECT B_FIELD FROM TABLE_B").show()
>   ## Run info result in spark 2.1.0 ##
>  ..
>  INFO metastore: Trying to connect to metastore with URI 
> thrift://*{color:#d04437}HOST_A{color}*:9083
>  ..
>  INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
> 0.807905 s
>  +-------+
>  |A_FIELD|
>  +-------+
>  |      A|
>  +-------+
>  ..
>  INFO metastore: Trying to connect to metastore with URI 
> thrift://*{color:#d04437}HOST_B{color}*:9083
>  INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
> 0.807905 s
>  +-------+
>  |B_FIELD|
>  +-------+
>  |      B|
>  +-------+
>  ..
>   ## Run info result in spark 2.3.0 ##
>  ..
>  INFO metastore: Trying to connect to metastore with URI 
> thrift://*{color:#d04437}HOST_A{color}*:9083
>  ..
>  INFO DAGScheduler: Job 3 finished: show at SparkHDFSTest.scala:20, took 
> 0.807905 s
>  +-------+
>  |A_FIELD|
>  +-------+
>  |      A|
>  +-------+
>  ..
>  INFO metastore: Trying to connect to metastore with URI 
> thrift://*{color:#d04437}HOST_A{color}*:9083
>  ..
>  Exception in thread "main" org.apache.spark.sql.AnalysisException: Table or 
> view not found: `default`.`TABLE_B`; line 1 pos 19;






[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs

2019-03-06 Thread peay (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786456#comment-16786456
 ] 

peay commented on SPARK-24624:
--

I mean regular aggregation functions and Pandas UDF aggregation functions 
(i.e., expressions of the form {{.groupBy(key).agg(F.avg("col"), 
pd_agg_udf("col2"))}}).

{{master}} seems to still require the aggregation expressions to either all be 
regular aggregate functions or all be Pandas UDFs: 
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala#L467],
 right?
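
A minimal PySpark sketch of the kind of mixed aggregation being asked about 
(the DataFrame and the grouped-aggregate UDF are placeholders, written against 
the Spark 2.4-era pandas_udf API; pyarrow must be installed):

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("mixed-agg-sketch").getOrCreate()

df = spark.createDataFrame(
    [("a", 1.0, 10.0), ("a", 2.0, 20.0), ("b", 3.0, 30.0)],
    ["key", "col", "col2"])

# A grouped-aggregate Pandas UDF: receives a pandas Series per group, returns a scalar.
@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def pd_agg_udf(v):
    return v.mean()

# Mixing a built-in aggregate (F.avg) with a Pandas UDF aggregate in one .agg(...)
# is the combination whose support is being discussed here.
df.groupBy("key").agg(F.avg("col"), pd_agg_udf("col2")).show()
{code}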

> Can not mix vectorized and non-vectorized UDFs
> --
>
> Key: SPARK-24624
> URL: https://issues.apache.org/jira/browse/SPARK-24624
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Assignee: Li Jin
>Priority: Major
> Fix For: 2.4.0
>
>
> In the current impl, we have the limitation: users are unable to mix 
> vectorized and non-vectorized UDFs in the same Project. This becomes worse since 
> our optimizer could combine consecutive Projects into a single one. For 
> example, 
> {code}
> applied_df = df.withColumn('regular', my_regular_udf('total', 
> 'qty')).withColumn('pandas', my_pandas_udf('total', 'qty'))
> {code}
> Returns the following error. 
> {code}
> IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs
> java.lang.IllegalArgumentException: Can not mix vectorized and non-vectorized 
> UDFs
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:170)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:146)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.immutable.List.map(List.scala:285)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.org$apache$spark$sql$execution$python$ExtractPythonUDFs$$extract(ExtractPythonUDFs.scala:146)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:118)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:114)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:311)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:114)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:94)
>  at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
>  at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
>  at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>  at scala.collection.immutable.List.foldLeft(List.scala:84)
>  at 
> org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:113)
>  at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100)
>  at 
> 

[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Jiaxin Shan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786418#comment-16786418
 ] 

Jiaxin Shan edited comment on SPARK-26742 at 3/7/19 6:05 AM:
-

Got some failures. I think it's not related to the Kubernetes cluster version 
but to some other configuration. Do you have an idea? [~shaneknapp] [~skonto]

 
{code:java}
[INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @ 
spark-kubernetes-integration-tests_2.12 ---
Must specify a Spark tarball to build Docker images against with --spark-tgz.
[INFO] 
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [ 4.754 s]
[INFO] Spark Project Tags . SUCCESS [ 3.560 s]
[INFO] Spark Project Local DB . SUCCESS [ 3.040 s]
[INFO] Spark Project Networking ... SUCCESS [ 4.559 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [ 2.559 s]
[INFO] Spark Project Unsafe ... SUCCESS [ 3.040 s]
[INFO] Spark Project Launcher . SUCCESS [ 3.807 s]
[INFO] Spark Project Core . SUCCESS [ 32.979 s]
[INFO] Spark Project Kubernetes Integration Tests . FAILURE [ 2.045 s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:00 min
[INFO] Finished at: 2019-03-06T21:58:26-08:00
[INFO] 
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec 
(setup-integration-test-env) on project 
spark-kubernetes-integration-tests_2.12: Command execution failed.: Process 
exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (setup-integration-test-env) on 
project spark-kubernetes-integration-tests_2.12: Command execution failed.
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:215)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:81)
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
 (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
(LifecycleStarter.java:128)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke (Method.java:498)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution 
failed.
at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:276)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
(DefaultBuildPluginManager.java:137)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:210)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:156)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:148)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:117)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:81)
at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
 (SingleThreadedBuilder.java:56)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
(LifecycleStarter.java:128)
at 

[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Jiaxin Shan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786418#comment-16786418
 ] 

Jiaxin Shan commented on SPARK-26742:
-

Got some failures. I think it's not related to the Kubernetes cluster version 
but to some other configuration. Do you have an idea? [~shaneknapp] [~skonto]

 

```

[INFO] --- exec-maven-plugin:1.4.0:exec (setup-integration-test-env) @ 
spark-kubernetes-integration-tests_2.12 ---
Must specify a Spark tarball to build Docker images against with --spark-tgz.
[INFO] 
[INFO] Reactor Summary for Spark Project Parent POM 3.0.0-SNAPSHOT:
[INFO]
[INFO] Spark Project Parent POM ... SUCCESS [ 4.754 s]
[INFO] Spark Project Tags . SUCCESS [ 3.560 s]
[INFO] Spark Project Local DB . SUCCESS [ 3.040 s]
[INFO] Spark Project Networking ... SUCCESS [ 4.559 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [ 2.559 s]
[INFO] Spark Project Unsafe ... SUCCESS [ 3.040 s]
[INFO] Spark Project Launcher . SUCCESS [ 3.807 s]
[INFO] Spark Project Core . SUCCESS [ 32.979 s]
[INFO] Spark Project Kubernetes Integration Tests . FAILURE [ 2.045 s]
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 01:00 min
[INFO] Finished at: 2019-03-06T21:58:26-08:00
[INFO] 
[ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec 
(setup-integration-test-env) on project 
spark-kubernetes-integration-tests_2.12: Command execution failed.: Process 
exited with an error: 1 (Exit value: 1) -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal 
org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (setup-integration-test-env) on 
project spark-kubernetes-integration-tests_2.12: Command execution failed.
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:215)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:156)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:148)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:117)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:81)
 at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
 (SingleThreadedBuilder.java:56)
 at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
(LifecycleStarter.java:128)
 at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:305)
 at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:192)
 at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:105)
 at org.apache.maven.cli.MavenCli.execute (MavenCli.java:956)
 at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:288)
 at org.apache.maven.cli.MavenCli.main (MavenCli.java:192)
 at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke 
(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke 
(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke (Method.java:498)
 at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced 
(Launcher.java:289)
 at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
 at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode 
(Launcher.java:415)
 at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)
Caused by: org.apache.maven.plugin.MojoExecutionException: Command execution 
failed.
 at org.codehaus.mojo.exec.ExecMojo.execute (ExecMojo.java:276)
 at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo 
(DefaultBuildPluginManager.java:137)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:210)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:156)
 at org.apache.maven.lifecycle.internal.MojoExecutor.execute 
(MojoExecutor.java:148)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:117)
 at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject 
(LifecycleModuleBuilder.java:81)
 at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build
 (SingleThreadedBuilder.java:56)
 at org.apache.maven.lifecycle.internal.LifecycleStarter.execute 
(LifecycleStarter.java:128)
 at org.apache.maven.DefaultMaven.doExecute 

[jira] [Created] (SPARK-27081) Support launching executors in existed Pods

2019-03-06 Thread Klaus Ma (JIRA)
Klaus Ma created SPARK-27081:


 Summary: Support launching executors in existed Pods
 Key: SPARK-27081
 URL: https://issues.apache.org/jira/browse/SPARK-27081
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Klaus Ma


Currently, spark-submit (Kubernetes) creates Pods on demand to launch 
executors. But in our case/enhancement, those Pods, including their Volumes, 
are already there. So we'd like to have an option for spark-submit (Kubernetes) 
to launch executors in existing Pods.

 

/cc @liyinan926






[jira] [Assigned] (SPARK-27080) Read parquet file with merging metastore schema should compare schema field in uniform case.

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27080:


Assignee: Apache Spark

> Read parquet file with merging metastore schema should compare schema field 
> in uniform case.
> 
>
> Key: SPARK-27080
> URL: https://issues.apache.org/jira/browse/SPARK-27080
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 2.3.3, 2.4.0
>Reporter: BoMeng
>Assignee: Apache Spark
>Priority: Major
>
> In our production environment, when we upgraded Spark from version 2.1 to 2.3, 
> the job failed with an exception as below:
> --- ERROR stack trace ---
> Exception occurred when running the job:
> org.apache.spark.SparkException: Detected conflicting schemas when merging 
> the schema obtained from the Hive
>  Metastore with the one inferred from the file format. Metastore schema:
> {
>   "type" : "struct",
>   "fields" : [
> ..
> }
> Inferred schema:
> {
>   "type" : "struct",
>   "fields" : [
> ..
> }
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$.mergeWithMetastoreSchema(HiveMetastoreCatalog.scala:295)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:243)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:167)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:156)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:156)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:148)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:148)
> at 
> org.apache.spark.sql.hive.RelationConversions.org$apache$spark$sql$hive$RelationConversions$$convert(HiveStrategies.scala:195)
> at 
> org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:226)
> at 
> org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:215)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:215)
> at 
> org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:180)
>  
> The 

[jira] [Assigned] (SPARK-27080) Read parquet file with merging metastore schema should compare schema field in uniform case.

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27080:


Assignee: (was: Apache Spark)

> Read parquet file with merging metastore schema should compare schema field 
> in uniform case.
> 
>
> Key: SPARK-27080
> URL: https://issues.apache.org/jira/browse/SPARK-27080
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2, 2.3.3, 2.4.0
>Reporter: BoMeng
>Priority: Major
>
> In our production environment, when we upgraded Spark from version 2.1 to 2.3, 
> the job failed with an exception as below:
> --- ERROR stack trace ---
> Exception occurred when running the job:
> org.apache.spark.SparkException: Detected conflicting schemas when merging 
> the schema obtained from the Hive
>  Metastore with the one inferred from the file format. Metastore schema:
> {
>   "type" : "struct",
>   "fields" : [
> ..
> }
> Inferred schema:
> {
>   "type" : "struct",
>   "fields" : [
> ..
> }
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$.mergeWithMetastoreSchema(HiveMetastoreCatalog.scala:295)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)
> at scala.Option.map(Option.scala:146)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:243)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:167)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:156)
> at scala.Option.getOrElse(Option.scala:121)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:156)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:148)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54)
> at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:148)
> at 
> org.apache.spark.sql.hive.RelationConversions.org$apache$spark$sql$hive$RelationConversions$$convert(HiveStrategies.scala:195)
> at 
> org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:226)
> at 
> org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:215)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
> at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
> at 
> org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:215)
> at 
> org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:180)
>  
> The following case can trigger the 

[jira] [Created] (SPARK-27080) Read parquet file with merging metastore schema should compare schema field in uniform case.

2019-03-06 Thread BoMeng (JIRA)
BoMeng created SPARK-27080:
--

 Summary: Read parquet file with merging metastore schema should 
compare schema field in uniform case.
 Key: SPARK-27080
 URL: https://issues.apache.org/jira/browse/SPARK-27080
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0, 2.3.3, 2.3.2
Reporter: BoMeng


In our production environment, when we upgraded Spark from version 2.1 to 2.3, the 
job failed with an exception as below:

--- ERROR stack trace ---

Exception occurred when running the job:

org.apache.spark.SparkException: Detected conflicting schemas when merging the 
schema obtained from the Hive

 Metastore with the one inferred from the file format. Metastore schema:

{

  "type" : "struct",

  "fields" : [

..

}

Inferred schema:

{

  "type" : "struct",

  "fields" : [

..

}

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$.mergeWithMetastoreSchema(HiveMetastoreCatalog.scala:295)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$11.apply(HiveMetastoreCatalog.scala:243)

at scala.Option.map(Option.scala:146)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.org$apache$spark$sql$hive$HiveMetastoreCatalog$$inferIfNeeded(HiveMetastoreCatalog.scala:243)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:167)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4$$anonfun$5.apply(HiveMetastoreCatalog.scala:156)

at scala.Option.getOrElse(Option.scala:121)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:156)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog$$anonfun$4.apply(HiveMetastoreCatalog.scala:148)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.withTableCreationLock(HiveMetastoreCatalog.scala:54)

at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.convertToLogicalRelation(HiveMetastoreCatalog.scala:148)

at 
org.apache.spark.sql.hive.RelationConversions.org$apache$spark$sql$hive$RelationConversions$$convert(HiveStrategies.scala:195)

at 
org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:226)

at 
org.apache.spark.sql.hive.RelationConversions$$anonfun$apply$4.applyOrElse(HiveStrategies.scala:215)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)

at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)

at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)

at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)

at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)

at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)

at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)

at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)

at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)

at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:215)

at org.apache.spark.sql.hive.RelationConversions.apply(HiveStrategies.scala:180)

 

The following case can trigger the exception, so we think it's a bug in Spark 2.3:
{code:java}
// Parquet schema is subset of metaStore schema and has uppercase field name
assertResult(
  StructType(Seq(
StructField("UPPERCase", DoubleType, nullable = true),
StructField("lowerCase", BinaryType, nullable = true {

  HiveMetastoreCatalog.mergeWithMetastoreSchema(
StructType(Seq(
  StructField("UPPERCase", DoubleType, nullable = true),
  

[jira] [Commented] (SPARK-26879) Inconsistency in default column names for functions like inline and stack

2019-03-06 Thread Chakravarthi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786401#comment-16786401
 ] 

Chakravarthi commented on SPARK-26879:
--

Thanks for reporting, [~jashgala]. I would like to work on this issue.

> Inconsistency in default column names for functions like inline and stack
> -
>
> Key: SPARK-26879
> URL: https://issues.apache.org/jira/browse/SPARK-26879
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Jash Gala
>Priority: Minor
>
> In the Spark SQL functions definitions, `inline` uses col1, col2, etc. (i.e. 
> 1-indexed columns), while `stack` uses col0, col1, col2, etc. (i.e. 0-indexed 
> columns).
> {code:title=spark-shell|borderStyle=solid}
> scala> spark.sql("SELECT stack(2, 1, 2, 3)").show
> +----+----+
> |col0|col1|
> +----+----+
> |   1|   2|
> |   3|null|
> +----+----+
> scala> spark.sql("SELECT inline_outer(array(struct(1, 'a'), struct(2, 
> 'b')))").show
> +----+----+
> |col1|col2|
> +----+----+
> |   1|   a|
> |   2|   b|
> +----+----+
> {code}
> This feels like an issue with consistency. As discussed on [PR 
> #23748|https://github.com/apache/spark/pull/23748], it might be a good idea 
> to standardize this to something specific (like zero-based indexing) for 
> these and other similar functions.






[jira] [Commented] (SPARK-26604) Register channel for stream request

2019-03-06 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786397#comment-16786397
 ] 

Felix Cheung commented on SPARK-26604:
--

could we backport this to branch-2.4?

> Register channel for stream request
> ---
>
> Key: SPARK-26604
> URL: https://issues.apache.org/jira/browse/SPARK-26604
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 3.0.0
>
>
> Now in {{TransportRequestHandler.processStreamRequest}}, when a stream 
> request is processed, the stream id is not registered with the current 
> channel in the stream manager. It should do that so that, in case the channel 
> gets terminated, we can remove the associated streams from the stream requests too.






[jira] [Commented] (SPARK-26868) Duplicate error message for implicit cartesian product in verbose explain

2019-03-06 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786381#comment-16786381
 ] 

Takeshi Yamamuro commented on SPARK-26868:
--

Yes, you can.

> Duplicate error message for implicit cartesian product in verbose explain
> -
>
> Key: SPARK-26868
> URL: https://issues.apache.org/jira/browse/SPARK-26868
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> Super trivial, but I'm reporting it just in case (I think it would be nice if 
> we could print this error message in a cleaner way):
> {code:java}
> scala> Seq(1).toDF("id").write.saveAsTable("t1")
> scala> Seq(1).toDF("id").write.saveAsTable("t2")
> scala> sql("SELECT * FROM t1 JOIN t2").explain(true)
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Join Inner
>:- 'UnresolvedRelation `t1`
>+- 'UnresolvedRelation `t2`
> == Analyzed Logical Plan ==
> id: int, id: int
> Project [id#14, id#15]
> +- Join Inner
>:- SubqueryAlias `default`.`t1`
>:  +- Relation[id#14] parquet
>+- SubqueryAlias `default`.`t2`
>   +- Relation[id#15] parquet
> == Optimized Logical Plan ==
> org.apache.spark.sql.AnalysisException: Detected implicit cartesian product 
> for INNER join between logical plans
> Relation[id#14] parquet
> and
> Relation[id#15] parquet
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Detected implicit cartesian product 
> for INNER join between logical plans
> Relation[id#14] parquet
> and
> Relation[id#15] parquet
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}






[jira] [Commented] (SPARK-26868) Duplicate error message for implicit cartesian product in verbose explain

2019-03-06 Thread nivedita singh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786362#comment-16786362
 ] 

nivedita singh commented on SPARK-26868:


If no one is working on it, can I work on this?

> Duplicate error message for implicit cartesian product in verbose explain
> -
>
> Key: SPARK-26868
> URL: https://issues.apache.org/jira/browse/SPARK-26868
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Takeshi Yamamuro
>Priority: Trivial
>
> Super trivial, but I'm reporting it just in case (I think it would be nice if 
> we could print this error message in a cleaner way):
> {code:java}
> scala> Seq(1).toDF("id").write.saveAsTable("t1")
> scala> Seq(1).toDF("id").write.saveAsTable("t2")
> scala> sql("SELECT * FROM t1 JOIN t2").explain(true)
> == Parsed Logical Plan ==
> 'Project [*]
> +- 'Join Inner
>:- 'UnresolvedRelation `t1`
>+- 'UnresolvedRelation `t2`
> == Analyzed Logical Plan ==
> id: int, id: int
> Project [id#14, id#15]
> +- Join Inner
>:- SubqueryAlias `default`.`t1`
>:  +- Relation[id#14] parquet
>+- SubqueryAlias `default`.`t2`
>   +- Relation[id#15] parquet
> == Optimized Logical Plan ==
> org.apache.spark.sql.AnalysisException: Detected implicit cartesian product 
> for INNER join between logical plans
> Relation[id#14] parquet
> and
> Relation[id#15] parquet
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> == Physical Plan ==
> org.apache.spark.sql.AnalysisException: Detected implicit cartesian product 
> for INNER join between logical plans
> Relation[id#14] parquet
> and
> Relation[id#15] parquet
> Join condition is missing or trivial.
> Either: use the CROSS JOIN syntax to allow cartesian products between these
> relations, or: enable implicit cartesian products by setting the configuration
> variable spark.sql.crossJoin.enabled=true;
> {code}






[jira] [Assigned] (SPARK-27079) Fix typo & Remove useless imports

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27079:


Assignee: (was: Apache Spark)

> Fix typo & Remove useless imports
> -
>
> Key: SPARK-27079
> URL: https://issues.apache.org/jira/browse/SPARK-27079
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: EdisonWang
>Priority: Minor
>







[jira] [Assigned] (SPARK-27079) Fix typo & Remove useless imports

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27079:


Assignee: Apache Spark

> Fix typo & Remove useless imports
> -
>
> Key: SPARK-27079
> URL: https://issues.apache.org/jira/browse/SPARK-27079
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: EdisonWang
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-27079) Fix typo & Remove useless imports

2019-03-06 Thread Sharanabasappa G Keriwaddi (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786358#comment-16786358
 ] 

Sharanabasappa G Keriwaddi commented on SPARK-27079:


Could you please share more details on this issue, with the specific instances 
you observed?

> Fix typo & Remove useless imports
> -
>
> Key: SPARK-27079
> URL: https://issues.apache.org/jira/browse/SPARK-27079
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: EdisonWang
>Priority: Minor
>







[jira] [Created] (SPARK-27079) Fix typo & Remove useless imports

2019-03-06 Thread EdisonWang (JIRA)
EdisonWang created SPARK-27079:
--

 Summary: Fix typo & Remove useless imports
 Key: SPARK-27079
 URL: https://issues.apache.org/jira/browse/SPARK-27079
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: EdisonWang









[jira] [Assigned] (SPARK-27049) Support handling partition values in the abstraction of file source V2

2019-03-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-27049:
---

Assignee: Gengliang Wang

> Support handling partition values in the abstraction of file source V2
> --
>
> Key: SPARK-27049
> URL: https://issues.apache.org/jira/browse/SPARK-27049
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In FileFormat, the method buildReaderWithPartitionValues appends the 
> partition values to the end of the result of buildReader, so that data 
> sources like CSV/JSON/AVRO only need to implement buildReader to read a 
> single file without taking care of partition values.
> This PR proposes to support handling partition values in the file source v2 
> abstraction by:
> 1. Having two methods `buildReader` and `buildReaderWithPartitionValues` in 
> FilePartitionReaderFactory, which have exactly the same meaning as they do 
> in `FileFormat`
> 2. Renaming `buildColumnarReader` to `buildColumnarReaderWithPartitionValues` 
> to make the naming consistent.






[jira] [Resolved] (SPARK-27049) Support handling partition values in the abstraction of file source V2

2019-03-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27049.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23987
[https://github.com/apache/spark/pull/23987]

> Support handling partition values in the abstraction of file source V2
> --
>
> Key: SPARK-27049
> URL: https://issues.apache.org/jira/browse/SPARK-27049
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.0.0
>
>
> In FileFormat, the method buildReaderWithPartitionValues appends the 
> partition values to the end of the result of buildReader, so that data 
> sources like CSV/JSON/AVRO only need to implement buildReader to read a 
> single file without taking care of partition values.
> This PR proposes to support handling partition values in the file source v2 
> abstraction by:
> 1. Having two methods `buildReader` and `buildReaderWithPartitionValues` in 
> FilePartitionReaderFactory, which have exactly the same meaning as they do 
> in `FileFormat`
> 2. Renaming `buildColumnarReader` to `buildColumnarReaderWithPartitionValues` 
> to make the naming consistent.






[jira] [Resolved] (SPARK-27057) Common trait for limit exec operators

2019-03-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-27057.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23976
[https://github.com/apache/spark/pull/23976]

> Common trait for limit exec operators
> -
>
> Key: SPARK-27057
> URL: https://issues.apache.org/jira/browse/SPARK-27057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Trivial
> Fix For: 3.0.0
>
>
> Currently, CollectLimitExec, LocalLimitExec and GlobalLimitExec have only the 
> UnaryExecNode trait as their common trait. It is slightly inconvenient to 
> distinguish those operators from others. This ticket aims to introduce a new 
> trait for all 3 operators.






[jira] [Assigned] (SPARK-27057) Common trait for limit exec operators

2019-03-06 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-27057:
---

Assignee: Maxim Gekk

> Common trait for limit exec operators
> -
>
> Key: SPARK-27057
> URL: https://issues.apache.org/jira/browse/SPARK-27057
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Trivial
>
> Currently, CollectLimitExec, LocalLimitExec and GlobalLimitExec have only the 
> UnaryExecNode trait as their common trait. It is slightly inconvenient to 
> distinguish those operators from others. This ticket aims to introduce a new 
> trait for all 3 operators.






[jira] [Resolved] (SPARK-27078) Read Hive materialized view throw MatchError

2019-03-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27078.
---
   Resolution: Fixed
 Assignee: Yuming Wang
Fix Version/s: 3.0.0
   2.4.2

This is resolved via https://github.com/apache/spark/pull/23984

> Read Hive materialized view throw MatchError
> 
>
> Key: SPARK-27078
> URL: https://issues.apache.org/jira/browse/SPARK-27078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 2.4.2, 3.0.0
>
>
> How to reproduce:
> Hive side:
> {code:sql}
> CREATE TABLE materialized_view_tbl (key INT);
> CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl;  -- 
> Hive 3.x
> CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM 
> materialized_view_tbl;  -- Hive 2.3.x
> {code}
> Spark side(read from Hive 2.3.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1]
> scala.MatchError: MATERIALIZED_VIEW (of class 
> org.apache.hadoop.hive.metastore.TableType)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
> {code}
> Spark side(read from Hive 3.1.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1]
> java.lang.NoSuchFieldError: INDEX_TABLE
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368)
> {code}
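For context, the MatchError comes from a pattern match over org.apache.hadoop.hive.metastore.TableType that has no case for MATERIALIZED_VIEW. A hedged sketch of the kind of handling that avoids the crash (names and error handling below are illustrative, not the fix merged in the linked PR):

{code:scala}
// Illustrative sketch only -- not the actual patch.
// Mapping a Hive metastore TableType to Spark's CatalogTableType must cover
// every enum value, otherwise scala.MatchError is thrown at runtime.
import org.apache.hadoop.hive.metastore.TableType
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

def toCatalogTableType(t: TableType): CatalogTableType = t match {
  case TableType.EXTERNAL_TABLE => CatalogTableType.EXTERNAL
  case TableType.MANAGED_TABLE  => CatalogTableType.MANAGED
  case TableType.VIRTUAL_VIEW   => CatalogTableType.VIEW
  // Hive 2.3+/3.x can also report MATERIALIZED_VIEW; a catch-all (rather than a
  // direct reference to the enum field, which older Hive versions lack) surfaces
  // it as a clear error instead of letting the match fall through.
  case other =>
    throw new UnsupportedOperationException(s"Unsupported Hive table type: $other")
}
{code}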



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27054) Remove Calcite dependency

2019-03-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-27054:
-

Assignee: Yuming Wang

> Remove Calcite dependency
> -
>
> Key: SPARK-27054
> URL: https://issues.apache.org/jira/browse/SPARK-27054
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
>
> Calcite is only used for 
> [runSqlHive|https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L699-L705]
>  when 
> {{hive.cbo.enable=true}}([SemanticAnalyzer|https://github.com/apache/hive/blob/release-1.2.1/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java#L278-L280]).
> So we can disable {{hive.cbo.enable}} and remove the Calcite dependency.
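As a rough illustration of the idea (not the actual change), the HiveConf backing the embedded Hive client used by runSqlHive could simply pin the flag off before queries are compiled:

{code:scala}
// Hedged sketch: assumes access to the HiveConf of the embedded Hive client;
// the method and variable names below are illustrative.
import org.apache.hadoop.hive.conf.HiveConf

def disableCbo(hiveConf: HiveConf): Unit = {
  // With CBO off, SemanticAnalyzerFactory never takes the Calcite-based
  // planning path, so the Calcite jars are not needed at runtime.
  hiveConf.setBoolean("hive.cbo.enable", false)
}
{code}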



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs

2019-03-06 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786246#comment-16786246
 ] 

Hyukjin Kwon commented on SPARK-24624:
--

If you mean the Pandas UDF aggregate function, it's already fixed in the 
upstream master branch.

> Can not mix vectorized and non-vectorized UDFs
> --
>
> Key: SPARK-24624
> URL: https://issues.apache.org/jira/browse/SPARK-24624
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Assignee: Li Jin
>Priority: Major
> Fix For: 2.4.0
>
>
> In the current implementation, we have a limitation: users are unable to mix 
> vectorized and non-vectorized UDFs in the same Project. This becomes worse 
> because our optimizer can combine consecutive Projects into a single one. For 
> example, 
> {code}
> applied_df = df.withColumn('regular', my_regular_udf('total', 
> 'qty')).withColumn('pandas', my_pandas_udf('total', 'qty'))
> {code}
> Returns the following error. 
> {code}
> IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs
> java.lang.IllegalArgumentException: Can not mix vectorized and non-vectorized 
> UDFs
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:170)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:146)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.immutable.List.map(List.scala:285)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.org$apache$spark$sql$execution$python$ExtractPythonUDFs$$extract(ExtractPythonUDFs.scala:146)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:118)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:114)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:311)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:114)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:94)
>  at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
>  at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
>  at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>  at scala.collection.immutable.List.foldLeft(List.scala:84)
>  at 
> org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:113)
>  at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100)
>  at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:99)
>  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3312)
>  at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:2750)
>  ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Updated] (SPARK-27077) DataFrameReader and Number of Connection Limitation

2019-03-06 Thread Paul Wu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Wu updated SPARK-27077:

Description: 
I am not sure whether this is a Spark core issue or a Vertica issue, but I am 
inclined to think it is Spark's issue.  The problem is that when we try to read 
with sparkSession.read.load from some datasource, in my case Vertica DB, the 
DataFrameReader makes a 'large' number of initial JDBC connection requests. My 
account limits me to 16 connections (and I can see at least 6 of them can be 
used for my loading), and when the "large" number of requests is issued, I get 
the exception below.  In fact, I can see that it eventually settles on fewer 
connections (in my case 2 simultaneous DataFrameReaders). So I think we should 
have a parameter that prevents the reader from sending out more initial 
connection requests than the user's limit. If we don't have this option 
parameter, my app could fail randomly due to my Vertica account's connection 
limit.

 

java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New 
session rejected because connection limit of 16 on database already met for 
M21176
     at com.vertica.util.ServerErrorData.buildException(Unknown Source)
     at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
     at com.vertica.io.ProtocolStream.initSession(Unknown Source)
     at com.vertica.core.VConnection.tryConnect(Unknown Source)
     at com.vertica.core.VConnection.connect(Unknown Source)
     at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown 
Source)
     at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
     at java.sql.DriverManager.getConnection(DriverManager.java:664)
     at java.sql.DriverManager.getConnection(DriverManager.java:208)
     at 
com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
     at 
com.vertica.spark.datasource.VerticaRelation.<init>(VerticaRelation.scala:34)
     at 
com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
     at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
     at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
     at 
com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156)
 Caused by: com.vertica.support.exceptions.NonTransientConnectionException: 
[Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 
16 on database already met for

 

  was:
I am not very sure this is a Spark core issue or a Vertica issue, however I 
intended to think this is Spark's issue.  The problem is that when we try to 
read with sparkSession.read.load from some datasource, in my case, Vertica DB, 
the DataFrameReader needs to make some 'large' number of  initial  jdbc 
connection requests. My account limits I can only use 16 (and I can see at 
least 6 of them can be used for my loading), and when the "large" number of the 
requests issued, I got exception below.  In fact, I can see eventually it could 
settle with fewer numbers of connections (in my case 2 simultaneous 
DataFrameReader). So I think we should have a parameter that prevents the 
reader to send out initial "bigger" number of connection requests than user's 
limit. If we don't have this option parameter, my app could fail randomly due 
to my Vertica account's number of connections allowed.

 

java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New 
session rejected because connection limit of 16 on database already met for 
M21176
    at com.vertica.util.ServerErrorData.buildException(Unknown Source)
    at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
    at com.vertica.io.ProtocolStream.initSession(Unknown Source)
    at com.vertica.core.VConnection.tryConnect(Unknown Source)
    at com.vertica.core.VConnection.connect(Unknown Source)
    at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown 
Source)
    at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at 
com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
    at 
com.vertica.spark.datasource.VerticaRelation.<init>(VerticaRelation.scala:34)
    at 
com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
    at 

[jira] [Commented] (SPARK-27078) Read Hive materialized view throw MatchError

2019-03-06 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786233#comment-16786233
 ] 

Apache Spark commented on SPARK-27078:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/23984

> Read Hive materialized view throw MatchError
> 
>
> Key: SPARK-27078
> URL: https://issues.apache.org/jira/browse/SPARK-27078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> Hive side:
> {code:sql}
> CREATE TABLE materialized_view_tbl (key INT);
> CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl;  -- 
> Hive 3.x
> CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM 
> materialized_view_tbl;  -- Hive 2.3.x
> {code}
> Spark side(read from Hive 2.3.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1]
> scala.MatchError: MATERIALIZED_VIEW (of class 
> org.apache.hadoop.hive.metastore.TableType)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
> {code}
> Spark side(read from Hive 3.1.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1]
> java.lang.NoSuchFieldError: INDEX_TABLE
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27078) Read Hive materialized view throw MatchError

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27078:


Assignee: (was: Apache Spark)

> Read Hive materialized view throw MatchError
> 
>
> Key: SPARK-27078
> URL: https://issues.apache.org/jira/browse/SPARK-27078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> How to reproduce:
> Hive side:
> {code:sql}
> CREATE TABLE materialized_view_tbl (key INT);
> CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl;  -- 
> Hive 3.x
> CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM 
> materialized_view_tbl;  -- Hive 2.3.x
> {code}
> Spark side(read from Hive 2.3.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1]
> scala.MatchError: MATERIALIZED_VIEW (of class 
> org.apache.hadoop.hive.metastore.TableType)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
> {code}
> Spark side(read from Hive 3.1.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1]
> java.lang.NoSuchFieldError: INDEX_TABLE
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27078) Read Hive materialized view throw MatchError

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27078:


Assignee: Apache Spark

> Read Hive materialized view throw MatchError
> 
>
> Key: SPARK-27078
> URL: https://issues.apache.org/jira/browse/SPARK-27078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>
> How to reproduce:
> Hive side:
> {code:sql}
> CREATE TABLE materialized_view_tbl (key INT);
> CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl;  -- 
> Hive 3.x
> CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM 
> materialized_view_tbl;  -- Hive 2.3.x
> {code}
> Spark side(read from Hive 2.3.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1]
> scala.MatchError: MATERIALIZED_VIEW (of class 
> org.apache.hadoop.hive.metastore.TableType)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
> {code}
> Spark side(read from Hive 3.1.x):
> {code:java}
> bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf 
> spark.sql.hive.metastore.jars=maven
> spark-sql> select * from view_1;
> 19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1]
> java.lang.NoSuchFieldError: INDEX_TABLE
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438)
>   at scala.Option.map(Option.scala:163)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27078) Read Hive materialized view throw MatchError

2019-03-06 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-27078:
---

 Summary: Read Hive materialized view throw MatchError
 Key: SPARK-27078
 URL: https://issues.apache.org/jira/browse/SPARK-27078
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


How to reproduce:

Hive side:
{code:sql}
CREATE TABLE materialized_view_tbl (key INT);
CREATE MATERIALIZED VIEW view_1 AS SELECT * FROM materialized_view_tbl;  -- 
Hive 3.x
CREATE MATERIALIZED VIEW view_1 DISABLE REWRITE AS SELECT * FROM 
materialized_view_tbl;  -- Hive 2.3.x
{code}

Spark side(read from Hive 2.3.x):
{code:java}
bin/spark-sql --conf spark.sql.hive.metastore.version=2.3.4 --conf 
spark.sql.hive.metastore.jars=maven

spark-sql> select * from view_1;
19/03/06 16:33:44 ERROR SparkSQLDriver: Failed in [select * from view_1]
scala.MatchError: MATERIALIZED_VIEW (of class 
org.apache.hadoop.hive.metastore.TableType)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:434)
at scala.Option.map(Option.scala:163)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
{code}

Spark side(read from Hive 3.1.x):

{code:java}
bin/spark-sql --conf spark.sql.hive.metastore.version=3.1.1 --conf 
spark.sql.hive.metastore.jars=maven

spark-sql> select * from view_1;
19/03/05 19:55:37 ERROR SparkSQLDriver: Failed in [select * from view_1]
java.lang.NoSuchFieldError: INDEX_TABLE
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$3(HiveClientImpl.scala:438)
at scala.Option.map(Option.scala:163)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getTableOption$1(HiveClientImpl.scala:370)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:277)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:215)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:214)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:260)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getTableOption(HiveClientImpl.scala:368)
{code}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27077) DataFrameReader and Number of Connection Limitation

2019-03-06 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786228#comment-16786228
 ] 

Yuming Wang commented on SPARK-27077:
-

Could you try to set {{numPartitions}} please?
The maximum number of partitions that can be used for parallelism in table 
reading and writing. This also determines the maximum number of concurrent JDBC 
connections. If the number of partitions to write exceeds this limit, we 
decrease it to this limit by calling coalesce(numPartitions) before writing.

http://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
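For example (a sketch using Spark's built-in JDBC source; the URL, table, credentials and bounds are placeholders, and an existing SparkSession named {{spark}} is assumed), capping the read at a handful of partitions bounds the number of simultaneous JDBC connections:

{code:scala}
// Illustrative sketch -- URL, table and credentials are placeholders.
// numPartitions (together with partitionColumn/lowerBound/upperBound) caps how
// many parallel JDBC connections Spark opens for this read.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:vertica://host:5433/db")   // placeholder
  .option("dbtable", "my_schema.my_table")        // placeholder
  .option("user", "user")
  .option("password", "password")
  .option("partitionColumn", "id")                // assumed numeric column
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "4")                   // at most 4 concurrent connections
  .load()
{code}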

> DataFrameReader and Number of Connection Limitation
> ---
>
> Key: SPARK-27077
> URL: https://issues.apache.org/jira/browse/SPARK-27077
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.2
>Reporter: Paul Wu
>Priority: Major
>
> I am not very sure this is a Spark core issue or a Vertica issue, however I 
> intended to think this is Spark's issue.  The problem is that when we try to 
> read with sparkSession.read.load from some datasource, in my case, Vertica 
> DB, the DataFrameReader needs to make some 'large' number of  initial  jdbc 
> connection requests. My account limits I can only use 16 (and I can see at 
> least 6 of them can be used for my loading), and when the "large" number of 
> the requests issued, I got exception below.  In fact, I can see eventually it 
> could settle with fewer numbers of connections (in my case 2 simultaneous 
> DataFrameReader). So I think we should have a parameter that prevents the 
> reader to send out initial "bigger" number of connection requests than user's 
> limit. If we don't have this option parameter, my app could fail randomly due 
> to my Vertica account's number of connections allowed.
>  
> java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: 
> New session rejected because connection limit of 16 on database already met 
> for M21176
>     at com.vertica.util.ServerErrorData.buildException(Unknown Source)
>     at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
>     at com.vertica.io.ProtocolStream.initSession(Unknown Source)
>     at com.vertica.core.VConnection.tryConnect(Unknown Source)
>     at com.vertica.core.VConnection.connect(Unknown Source)
>     at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown 
> Source)
>     at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
>     at java.sql.DriverManager.getConnection(DriverManager.java:664)
>     at java.sql.DriverManager.getConnection(DriverManager.java:208)
>     at 
> com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
>     at 
> com.vertica.spark.datasource.VerticaRelation.<init>(VerticaRelation.scala:34)
>     at 
> com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
>     at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
>     at 
> org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
>     at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
>     at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
>     at 
> com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156)
> Caused by: com.vertica.support.exceptions.NonTransientConnectionException: 
> [Vertica][VJDBC](7470) FATAL: New session rejected because connection limit 
> of 16 on database already met for
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Jiaxin Shan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786217#comment-16786217
 ] 

Jiaxin Shan edited comment on SPARK-26742 at 3/6/19 11:28 PM:
--

I am willing to do that. I will make sure the local integration tests pass and 
then check it in. [~shaneknapp]

I am a new contributor and not very familiar with the integration test setup, so 
it may take some time. I will sync with you later today. 


was (Author: seedjeffwan):
I am willing to do that. I will make sure local integration test pass and then 
check in. [~shaneknapp]

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix
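For reference, the dependency in question is io.fabric8:kubernetes-client. Spark pins its version in the Maven poms, so the sbt coordinate below is only an illustration of the proposed bump, not how the project actually declares it:

{code:scala}
// Illustration only: the coordinate being bumped is the fabric8 Kubernetes client.
libraryDependencies += "io.fabric8" % "kubernetes-client" % "4.1.2"
{code}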



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Jiaxin Shan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786217#comment-16786217
 ] 

Jiaxin Shan commented on SPARK-26742:
-

I am willing to do that. I will make sure the local integration tests pass and 
then check it in. [~shaneknapp]

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25863) java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala

2019-03-06 Thread Val Feldsher (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786202#comment-16786202
 ] 

Val Feldsher commented on SPARK-25863:
--

I recently experienced a similar issue while upgrading from Spark 2.1.0 to 2.3.2, 
and it appears that this happens whenever I use spark.driver.userClassPathFirst 
or spark.executor.userClassPathFirst. Similar to 
[SPARK-20241|https://issues.apache.org/jira/browse/SPARK-20241], I see two 
different class loaders when I set this property to true: 
org.apache.spark.util.ChildFirstURLClassLoader and 
sun.misc.Launcher$AppClassLoader. This is accompanied by the warning:

Error calculating stats of compiled class.
java.lang.IllegalArgumentException: Can not set final [B field 
org.codehaus.janino.util.ClassFile$CodeAttribute.code to 
org.codehaus.janino.util.ClassFile$CodeAttribute

followed by the empty.max error.
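For context, these are the two settings involved, shown as a minimal sketch (property keys and values are taken from the report above; everything else is illustrative):

{code:scala}
// Minimal sketch showing the two properties reportedly involved. In practice
// spark.driver.userClassPathFirst must be supplied at submit time (e.g. via
// spark-submit --conf or spark-defaults.conf) to affect the driver's own class
// loader; the SparkConf here only documents the keys and values.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.userClassPathFirst", "true")
  .set("spark.executor.userClassPathFirst", "true")
{code}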

> java.lang.UnsupportedOperationException: empty.max at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
> -
>
> Key: SPARK-25863
> URL: https://issues.apache.org/jira/browse/SPARK-25863
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Ruslan Dautkhanov
>Priority: Major
>  Labels: cache, catalyst, code-generation
>
> Failing task : 
> {noformat}
> An error occurred while calling o2875.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 58 
> in stage 21413.0 failed 4 times, most recent failure: Lost task 58.3 in stage 
> 21413.0 (TID 4057314, pc1udatahad117, executor 431): 
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.AbstractTraversable.max(Traversable.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1418)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490)
> at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
> at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
> at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:81)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:40)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1321)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318)
> at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401)
> at 
> org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:263)
> at 
> org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:262)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:818)
> at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$apply$24.apply(RDD.scala:818)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> at 

[jira] [Comment Edited] (SPARK-25863) java.lang.UnsupportedOperationException: empty.max at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.

2019-03-06 Thread Val Feldsher (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786202#comment-16786202
 ] 

Val Feldsher edited comment on SPARK-25863 at 3/6/19 11:14 PM:
---

I recently experienced a similar issue while upgrading from Spark 2.1.0 to 2.3.2, 
and it appears that this happens whenever I use spark.driver.userClassPathFirst 
or spark.executor.userClassPathFirst. Similarly to SPARK-20241, I see two 
different class loaders when I set this property to true: 
org.apache.spark.util.ChildFirstURLClassLoader and 
sun.misc.Launcher$AppClassLoader. This is accompanied by the warning:

Error calculating stats of compiled class.
java.lang.IllegalArgumentException: Can not set final [B field 
org.codehaus.janino.util.ClassFile$CodeAttribute.code to 
org.codehaus.janino.util.ClassFile$CodeAttribute

followed by the empty.max error.


was (Author: vfeldsher):
I recently experienced similar issue while upgrading from spark 2.1.0 to 2.3.2, 
and it appears that this happens whenever I use spark.driver.userClassPathFirst 
or spark.executor.userClassPathFirst. Similar to 
[SPARK-20241|https://issues.apache.org/jira/browse/SPARK-20241] , I see 2 
different class loaders when I set this property to true: 
org.apache.spark.util.ChildFirstURLClassLoader and 
sun.misc.Launcher$AppClassLoader. This is accompanied by warning:

Error calculating stats of compiled class.
 java.lang.IllegalArgumentException: Can not set final [B field 
org.codehaus.janino.util.ClassFile$CodeAttribute.code to 
org.codehaus.janino.util.ClassFile$CodeAttribute
 
Followed by the empty.max error.

> java.lang.UnsupportedOperationException: empty.max at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
> -
>
> Key: SPARK-25863
> URL: https://issues.apache.org/jira/browse/SPARK-25863
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer, Spark Core
>Affects Versions: 2.3.1, 2.3.2
>Reporter: Ruslan Dautkhanov
>Priority: Major
>  Labels: cache, catalyst, code-generation
>
> Failing task : 
> {noformat}
> An error occurred while calling o2875.collectToPython.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 58 
> in stage 21413.0 failed 4 times, most recent failure: Lost task 58.3 in stage 
> 21413.0 (TID 4057314, pc1udatahad117, executor 431): 
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.AbstractTraversable.max(Traversable.scala:104)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.updateAndGetCompilationStats(CodeGenerator.scala:1475)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:1418)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1493)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:1490)
> at 
> org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> at 
> org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> at 
> org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> at org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
> at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
> at org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
> at 
> org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1365)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:81)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.GeneratePredicate$.create(GeneratePredicate.scala:40)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1321)
> at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator.generate(CodeGenerator.scala:1318)
> at org.apache.spark.sql.execution.SparkPlan.newPredicate(SparkPlan.scala:401)
> at 
> org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:263)
> at 
> org.apache.spark.sql.execution.columnar.InMemoryTableScanExec$$anonfun$filteredCachedBatches$1.apply(InMemoryTableScanExec.scala:262)
> at 
> 

[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786187#comment-16786187
 ] 

shane knapp commented on SPARK-26742:
-

[~seedjeffwan] are you going to open a new PR for the master branch?

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27077) DataFrameReader and Number of Connection Limitation

2019-03-06 Thread Paul Wu (JIRA)
Paul Wu created SPARK-27077:
---

 Summary: DataFrameReader and Number of Connection Limitation
 Key: SPARK-27077
 URL: https://issues.apache.org/jira/browse/SPARK-27077
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 2.3.2
Reporter: Paul Wu


I am not very sure this is a Spark core issue or a Vertica issue, however I 
intended to think this is Spark's issue.  The problem is that when we try to 
read with sparkSession.read.load from some datasource, in my case, Vertica DB, 
the DataFrameReader needs to make some 'large' number of  initial  jdbc 
connection requests. My account limits I can only use 16 (and I can see at 
least 6 of them can be used for my loading), and when the "large" number of the 
requests issued, I got exception below.  In fact, I can see eventually it could 
settle with fewer numbers of connections (in my case 2 simultaneous 
DataFrameReader). So I think we should have a parameter that prevents the 
reader to send out initial "bigger" number of connection requests than user's 
limit. If we don't have this option parameter, my app could fail randomly due 
to my Vertica account's number of connections allowed.

 

java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](7470) FATAL: New 
session rejected because connection limit of 16 on database already met for 
M21176
    at com.vertica.util.ServerErrorData.buildException(Unknown Source)
    at com.vertica.io.ProtocolStream.readStartupMessages(Unknown Source)
    at com.vertica.io.ProtocolStream.initSession(Unknown Source)
    at com.vertica.core.VConnection.tryConnect(Unknown Source)
    at com.vertica.core.VConnection.connect(Unknown Source)
    at com.vertica.jdbc.common.BaseConnectionFactory.doConnect(Unknown 
Source)
    at com.vertica.jdbc.common.AbstractDriver.connect(Unknown Source)
    at java.sql.DriverManager.getConnection(DriverManager.java:664)
    at java.sql.DriverManager.getConnection(DriverManager.java:208)
    at 
com.vertica.spark.datasource.VerticaDataSourceRDD$.resolveTable(VerticaRDD.scala:105)
    at 
com.vertica.spark.datasource.VerticaRelation.<init>(VerticaRelation.scala:34)
    at 
com.vertica.spark.datasource.DefaultSource.createRelation(VerticaSource.scala:47)
    at 
org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:341)
    at 
org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
    at 
com.att.iqi.data.ConnectorPrepareHourlyDataRT$1.run(ConnectorPrepareHourlyDataRT.java:156)
Caused by: com.vertica.support.exceptions.NonTransientConnectionException: 
[Vertica][VJDBC](7470) FATAL: New session rejected because connection limit of 
16 on database already met for

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27076) Getting the timeout error while writing parquet/csv files to s3

2019-03-06 Thread srinivas rao gajjala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

srinivas rao gajjala updated SPARK-27076:
-
Description: 
Hi,

I'm trying to write parquet/csv files to s3 using Amazon EMR clusters with the 
label (emr-5.9.0), and below is the error I'm facing.
{code:java}
org.apache.spark.SparkException: Job aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at 
org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
 at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at 
org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at 
DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at 
Migration.migrate(Migration.scala:211) at 
DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at 
DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351)
 at scala.util.control.Breaks.breakable(Breaks.scala:38) at 
DataMigrationFramework$.main(DataMigrationFramework.scala:350) at 
DataMigrationFramework.main(DataMigrationFramework.scala) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)

 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1120 in stage 5.0 failed 16 times, most recent failure: Lost task 1120.15 
in stage 5.0 (TID 8886, ip-10-120-60-82.ec2.internal, executor 4): 
com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
waiting for connection from pool at 

[jira] [Updated] (SPARK-27076) Getting the timeout error while writing parquet/csv files to s3

2019-03-06 Thread srinivas rao gajjala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

srinivas rao gajjala updated SPARK-27076:
-
Summary: Getting the timeout error while writing parquet/csv files to s3  
(was: Getting the timeout error while reading parquet/csv files from s3)

> Getting the timeout error while writing parquet/csv files to s3
> ---
>
> Key: SPARK-27076
> URL: https://issues.apache.org/jira/browse/SPARK-27076
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: srinivas rao gajjala
>Priority: Major
>
> Hi,
> I'm trying to write parquet files from s3 using Amazon EMR clusters with the 
> lable(emr-5.9.0) and below is the error I'm facing.
> {code:java}
> org.apache.spark.SparkException: Job aborted. at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) 
> at 
> org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) 
> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) 
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at 
> org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at 
> DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at 
> Migration.migrate(Migration.scala:211) at 
> DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at 
> DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351)
>  at scala.util.control.Breaks.breakable(Breaks.scala:38) at 
> DataMigrationFramework$.main(DataMigrationFramework.scala:350) at 
> DataMigrationFramework.main(DataMigrationFramework.scala) at 
> 

[jira] [Updated] (SPARK-27076) Getting the timeout error while reading parquet/csv files from s3

2019-03-06 Thread srinivas rao gajjala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

srinivas rao gajjala updated SPARK-27076:
-
Summary: Getting the timeout error while reading parquet/csv files from s3  
(was: Getting the timeout error while reading parquet files from s3)

> Getting the timeout error while reading parquet/csv files from s3
> -
>
> Key: SPARK-27076
> URL: https://issues.apache.org/jira/browse/SPARK-27076
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.0
>Reporter: srinivas rao gajjala
>Priority: Major
>
> Hi,
> I'm trying to read parquet files from s3 using Amazon EMR clusters with the 
> lable(emr-5.9.0) and below is the error I'm facing.
> {code:java}
> org.apache.spark.SparkException: Job aborted. at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) 
> at 
> org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
>  at 
> org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) 
> at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) 
> at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at 
> org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at 
> DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at 
> Migration.migrate(Migration.scala:211) at 
> DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353)
>  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at 
> DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351)
>  at scala.util.control.Breaks.breakable(Breaks.scala:38) at 
> DataMigrationFramework$.main(DataMigrationFramework.scala:350) at 
> DataMigrationFramework.main(DataMigrationFramework.scala) at 
> 

[jira] [Updated] (SPARK-27076) Getting the timeout error while reading parquet/csv files from s3

2019-03-06 Thread srinivas rao gajjala (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

srinivas rao gajjala updated SPARK-27076:
-
Description: 
Hi,

I'm trying to write parquet files from s3 using Amazon EMR clusters with the 
label (emr-5.9.0), and below is the error I'm facing.
{code:java}
org.apache.spark.SparkException: Job aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at 
org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
 at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at 
org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at 
DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at 
Migration.migrate(Migration.scala:211) at 
DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at 
DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351)
 at scala.util.control.Breaks.breakable(Breaks.scala:38) at 
DataMigrationFramework$.main(DataMigrationFramework.scala:350) at 
DataMigrationFramework.main(DataMigrationFramework.scala) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)

 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1120 in stage 5.0 failed 16 times, most recent failure: Lost task 1120.15 
in stage 5.0 (TID 8886, ip-10-120-60-82.ec2.internal, executor 4): 
com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout 
waiting for connection from pool at 
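One setting that is often implicated when this specific "Timeout waiting for 
connection from pool" error shows up is the size of the S3 client's HTTP 
connection pool. A minimal sketch of raising it at submit time is below; 
whether this job goes through EMRFS (s3:// paths) or the Hadoop S3A connector 
(s3a:// paths) is not stated in the report, so both properties are shown, the 
value 200 is purely illustrative, and the class/jar names are placeholders 
taken from the stack trace.

{noformat}
# Sketch only: raise the S3 connection-pool size before retrying the job.
# fs.s3.maxConnections applies to EMRFS (s3://); fs.s3a.connection.maximum
# applies to the Hadoop S3A connector (s3a://). The value 200 is illustrative
# and the application jar name is a placeholder.
spark-submit \
  --conf spark.hadoop.fs.s3.maxConnections=200 \
  --conf spark.hadoop.fs.s3a.connection.maximum=200 \
  --class DataMigrationFramework \
  data-migration-framework.jar
{noformat}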

[jira] [Created] (SPARK-27076) Getting the timeout error while reading parquet files from s3

2019-03-06 Thread srinivas rao gajjala (JIRA)
srinivas rao gajjala created SPARK-27076:


 Summary: Getting the timeout error while reading parquet files 
from s3
 Key: SPARK-27076
 URL: https://issues.apache.org/jira/browse/SPARK-27076
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.0
Reporter: srinivas rao gajjala


Hi,

I'm trying to read parquet files from s3 using Amazon EMR clusters with the 
label (emr-5.9.0), and below is the error I'm facing.
{code:java}
org.apache.spark.SparkException: Job aborted. at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:215)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
 at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:173)
 at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:145)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at 
org.apache.spark.sql.execution.datasources.DataSource.writeInFileFormat(DataSource.scala:438)
 at 
org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:474)
 at 
org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:48)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
 at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:117)
 at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:138)
 at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) 
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:135) 
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:116) at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
 at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92) at 
org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:610) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:233) at 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:217) at 
org.apache.spark.sql.DataFrameWriter.csv(DataFrameWriter.scala:598) at 
DataFrameFromTo.dataFrameToFile(DataFrameFromTo.scala:120) at 
Migration.migrate(Migration.scala:211) at 
DataMigrationFramework$$anonfun$main$3$$anonfun$apply$mcV$sp$1.apply$mcVI$sp(DataMigrationFramework.scala:353)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at 
DataMigrationFramework$$anonfun$main$3.apply$mcV$sp(DataMigrationFramework.scala:351)
 at scala.util.control.Breaks.breakable(Breaks.scala:38) at 
DataMigrationFramework$.main(DataMigrationFramework.scala:350) at 
DataMigrationFramework.main(DataMigrationFramework.scala) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498) at 
org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:635)

 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1120 in stage 5.0 failed 16 times, most recent failure: Lost task 

[jira] [Resolved] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-06 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin resolved SPARK-27019.

   Resolution: Fixed
Fix Version/s: 2.4.2
   3.0.0

Issue resolved by pull request 23939
[https://github.com/apache/spark/pull/23939]

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Assignee: Shahid K I
>Priority: Major
> Fix For: 3.0.0, 2.4.2
>
> Attachments: Screenshot from 2019-03-01 21-31-48.png, 
> application_1550040445209_4748, query-1-details.png, query-1-list.png, 
> query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken output in the SQL tab of the 
> Spark UI, where submitted/duration make no sense and the description shows 
> the ID instead of the actual description.
> When clicking the link to open a query, the SQL plan is missing as well.
> I have tried increasing `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k, out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything in particular that leads to 
> this: it doesn't occur in all my jobs, but it still occurs in a lot of them.
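For reference, the capacity experiment mentioned above is just an ordinary 
Spark conf; a minimal sketch of setting it when starting a shell is below 
(the 30000 value simply mirrors the 30k figure in the description).

{noformat}
# Sketch: the listener-bus capacity experiment above, set like any other
# Spark conf when launching a session; 30000 mirrors the value in the report.
bin/spark-shell --conf spark.scheduler.listenerbus.eventqueue.capacity=30000
{noformat}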



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27019) Spark UI's SQL tab shows inconsistent values

2019-03-06 Thread Marcelo Vanzin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin reassigned SPARK-27019:
--

Assignee: Shahid K I

> Spark UI's SQL tab shows inconsistent values
> 
>
> Key: SPARK-27019
> URL: https://issues.apache.org/jira/browse/SPARK-27019
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 2.4.0
>Reporter: peay
>Assignee: Shahid K I
>Priority: Major
> Attachments: Screenshot from 2019-03-01 21-31-48.png, 
> application_1550040445209_4748, query-1-details.png, query-1-list.png, 
> query-job-1.png, screenshot-spark-ui-details.png, screenshot-spark-ui-list.png
>
>
> Since 2.4.0, I am frequently seeing broken output in the SQL tab of the 
> Spark UI, where submitted/duration make no sense and the description shows 
> the ID instead of the actual description.
> When clicking the link to open a query, the SQL plan is missing as well.
> I have tried increasing `spark.scheduler.listenerbus.eventqueue.capacity` to 
> very large values like 30k, out of paranoia that we may have too many events, 
> but to no avail. I have not identified anything in particular that leads to 
> this: it doesn't occur in all my jobs, but it still occurs in a lot of them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786068#comment-16786068
 ] 

shane knapp edited comment on SPARK-26742 at 3/6/19 9:18 PM:
-

1.12.6 passes:

{noformat}
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 175 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 6 minutes, 32 seconds.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.2-SNAPSHOT  SUCCESS [  3.009 s]
[INFO] Spark Project Tags . SUCCESS [  2.767 s]
[INFO] Spark Project Local DB . SUCCESS [  1.973 s]
[INFO] Spark Project Networking ... SUCCESS [  3.491 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  1.878 s]
[INFO] Spark Project Unsafe ... SUCCESS [  1.948 s]
[INFO] Spark Project Launcher . SUCCESS [  3.866 s]
[INFO] Spark Project Core . SUCCESS [ 23.852 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [07:32 
min]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 08:15 min
[INFO] Finished at: 2019-03-06T11:40:06-08:00
[INFO] 
jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh
minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 
--kubernetes-version=v1.12.6
{noformat}

also k8s v1.10.13:

{noformat}
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 184 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 7 minutes, 52 seconds.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.2-SNAPSHOT  SUCCESS [  2.793 s]
[INFO] Spark Project Tags . SUCCESS [  2.848 s]
[INFO] Spark Project Local DB . SUCCESS [  2.024 s]
[INFO] Spark Project Networking ... SUCCESS [  3.462 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  1.907 s]
[INFO] Spark Project Unsafe ... SUCCESS [  1.929 s]
[INFO] Spark Project Launcher . SUCCESS [  3.939 s]
[INFO] Spark Project Core . SUCCESS [ 24.078 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:57 
min]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 09:40 min
[INFO] Finished at: 2019-03-06T13:13:10-08:00
[INFO] 
jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh
minikube --vm-driver=kvm2 start --memory 6000 

[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786146#comment-16786146
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

That is a good point [~seedjeffwan]. I expect that the fabric8io project will 
catch up; it seems very active.
If not, we will deal with it differently, I guess. Release testing is a 
one-shot thing, which makes it easier. 

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786146#comment-16786146
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 9:27 PM:
-

That is a good point [~seedjeffwan]. I expect that the fabric8io client project 
will catch up; it seems very active.
If not, we will deal with it differently, I guess. Release testing is a 
one-shot thing, which makes it easier. 


was (Author: skonto):
That is a good point [~seedjeffwan] I expect that the fabric8io project will 
catch up, it seems very active.
If not we will deal with it differently I guess. Release testing is a one shot 
thing which make it easier. 

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786142#comment-16786142
 ] 

shane knapp commented on SPARK-26742:
-

my testing shows that 1.13.x passes our integration tests w/o issue.

testing against multiple versions of k8s will require some more work, both in 
the build configs *and* in the spark repo, so that depending on the spark 
branch we can know which version(s) to test against.

however, testing against multiple versions of k8s w/minikube will be 
problematic (see discussion:  https://issues.apache.org/jira/browse/SPARK-26973)

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Jiaxin Shan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786140#comment-16786140
 ] 

Jiaxin Shan commented on SPARK-26742:
-

Agree to target v1.13.x even though 4.1.2 may not pass the compatibility test. 

Here's a feature list for v1.13.0, and we need to make sure the APIs Spark is 
using are not affected.

[https://sysdig.com/blog/whats-new-in-kubernetes-1-13/]

 

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786132#comment-16786132
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 9:11 PM:
-

My take is: latest for PRs and for releases the currently supported ones. That 
means though that we will need to keep up with client upgrades even on master 
and monitor the client project. Hopefully that will minimize the chance of 
something being broken at release time.
If people want to use an old k8s version they should use an old Spark release.


was (Author: skonto):
My take is: latest for PRs and for releases the currently supported ones. That 
means though that we will need to keep up with client upgrades even on master 
and monitor the client project. Hopefully that will minimize the chance of 
something being broken at release time.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786132#comment-16786132
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

My take is: latest for PRs and for releases the currently supported ones. That 
means though that we will need to keep up with client upgrades even on master 
and monitor the client project. Hopefully that will minimize the chance of 
something being broken at release time.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'

2019-03-06 Thread Shahid K I (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shahid K I updated SPARK-27075:
---
Attachment: image-2019-03-07-02-37-20-453.png

> Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
> 
>
> Key: SPARK-27075
> URL: https://issues.apache.org/jira/browse/SPARK-27075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Major
> Attachments: image-2019-03-07-02-37-20-453.png
>
>
> Test steps:
> 1) bin/spark-sql
> 2) run some queries
> 3) Open SQL page in the webui
> 4) Try to sort any column in the execution table.
> file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'

2019-03-06 Thread Shahid K I (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shahid K I updated SPARK-27075:
---
Description: 
Test steps:
 1) bin/spark-sql
 2) run some queries
 3) Open SQL page in the webui
 4) Try to sort any column in the execution table.

!image-2019-03-07-02-37-20-453.png!

  was:
Test steps:
1) bin/spark-sql
2) run some queries
3) Open SQL page in the webui
4) Try to sort any column in the execution table.

file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png






> Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
> 
>
> Key: SPARK-27075
> URL: https://issues.apache.org/jira/browse/SPARK-27075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Major
> Attachments: image-2019-03-07-02-37-20-453.png
>
>
> Test steps:
>  1) bin/spark-sql
>  2) run some queries
>  3) Open SQL page in the webui
>  4) Try to sort any column in the execution table.
> !image-2019-03-07-02-37-20-453.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'

2019-03-06 Thread Shahid K I (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786122#comment-16786122
 ] 

Shahid K I commented on SPARK-27075:


I will raise a PR

> Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
> 
>
> Key: SPARK-27075
> URL: https://issues.apache.org/jira/browse/SPARK-27075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Major
>
> Test steps:
> 1) bin/spark-sql
> 2) run some queries
> 3) Open SQL page in the webui
> 4) Try to sort any column in the execution table.
> file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27075:


Assignee: Apache Spark

> Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
> 
>
> Key: SPARK-27075
> URL: https://issues.apache.org/jira/browse/SPARK-27075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Assignee: Apache Spark
>Priority: Major
>
> Test steps:
> 1) bin/spark-sql
> 2) run some queries
> 3) Open SQL page in the webui
> 4) Try to sort any column in the execution table.
> file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'

2019-03-06 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27075:


Assignee: (was: Apache Spark)

> Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'
> 
>
> Key: SPARK-27075
> URL: https://issues.apache.org/jira/browse/SPARK-27075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Web UI
>Affects Versions: 3.0.0
>Reporter: Shahid K I
>Priority: Major
>
> Test steps:
> 1) bin/spark-sql
> 2) run some queries
> 3) Open SQL page in the webui
> 4) Try to sort any column in the execution table.
> file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786129#comment-16786129
 ] 

shane knapp commented on SPARK-26742:
-

alright, last thing before these two PRs are mergeable:

*which version of k8s do we want to test against?*

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-27075) Sorting table column in SQL WEBUI page throws 'IllegalArgumentException'

2019-03-06 Thread Shahid K I (JIRA)
Shahid K I created SPARK-27075:
--

 Summary: Sorting table column in SQL WEBUI page throws 
'IllegalArgumentException'
 Key: SPARK-27075
 URL: https://issues.apache.org/jira/browse/SPARK-27075
 Project: Spark
  Issue Type: Bug
  Components: SQL, Web UI
Affects Versions: 3.0.0
Reporter: Shahid K I


Test steps:
1) bin/spark-sql
2) run some queries
3) Open SQL page in the webui
4) Try to sort any column in the execution table.

file:///home/root1/Documents/Screenshot%20from%202019-03-07%2001-38-17.png
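A minimal sketch of those steps from the command line is below (assuming a 
local build with the default UI port 4040; the table name and query are 
arbitrary placeholders).

{noformat}
# Sketch of the steps above, assuming a local build and the default UI port 4040.
bin/spark-sql
spark-sql> CREATE TABLE t AS SELECT id FROM range(10);
spark-sql> SELECT count(*) FROM t;
# Keep the session open, browse to http://localhost:4040/SQL/, and click any
# column header of the execution table to trigger the sort (and the exception).
{noformat}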







--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786102#comment-16786102
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 8:29 PM:
-

Yeah, I was finally able to do so, thanks. I was also hitting this one: 
https://github.com/kubernetes/kubeadm/issues/992
If anyone faces that, they just need to install `crictl` under `/usr/bin` after 
building it from source.


was (Author: skonto):
Yeah I was able to do so thanks. I was also hitting this one: 
https://github.com/kubernetes/kubeadm/issues/992
If anyone faces that he just needs to install `crictl` under `/usr/bin` after 
he builds it from source.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786102#comment-16786102
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

Yeah, I was able to do so, thanks. I was also hitting this one: 
https://github.com/kubernetes/kubeadm/issues/992
If anyone faces that, they just need to install `crictl` under `/usr/bin` after 
building it from source.
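A rough sketch of that workaround (assuming a Go toolchain and the 
kubernetes-sigs/cri-tools repository; the make target and the build output 
path may differ between cri-tools releases):

{noformat}
# Sketch of the crictl workaround above. Assumes a Go toolchain is installed;
# the build output path below is an assumption and may differ per release.
git clone https://github.com/kubernetes-sigs/cri-tools.git
cd cri-tools
make crictl
sudo install -m 0755 build/bin/crictl /usr/bin/crictl
{noformat}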

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786097#comment-16786097
 ] 

shane knapp commented on SPARK-26742:
-

[~skonto] -- i was able to downgrade k8s successfully w/o deleting the 
.minikube and .kube dirs:


{noformat}
minikube stop; minikube delete; minikube --vm-driver=kvm2 start --memory 6000 
--cpus 8 --kubernetes-version=v1.12.6
{noformat}


> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786074#comment-16786074
 ] 

shane knapp commented on SPARK-26742:
-

[~skonto] the only problems i've had are when i'm going backwards w/versions, 
not forwards.

also, k8s 1.14.x will be released in ~2-3 weeks (from my friend on the k8s 
team)...  this means that v1.11.x will no longer be officially supported.

i feel that we should target 1.13.x right now for our testing infra.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786069#comment-16786069
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

That is awesome!

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786068#comment-16786068
 ] 

shane knapp commented on SPARK-26742:
-

1.12.6 passes for me as well:


{noformat}
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 175 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 6 minutes, 32 seconds.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.2-SNAPSHOT  SUCCESS [  3.009 s]
[INFO] Spark Project Tags . SUCCESS [  2.767 s]
[INFO] Spark Project Local DB . SUCCESS [  1.973 s]
[INFO] Spark Project Networking ... SUCCESS [  3.491 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  1.878 s]
[INFO] Spark Project Unsafe ... SUCCESS [  1.948 s]
[INFO] Spark Project Launcher . SUCCESS [  3.866 s]
[INFO] Spark Project Core . SUCCESS [ 23.852 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [07:32 
min]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 08:15 min
[INFO] Finished at: 2019-03-06T11:40:06-08:00
[INFO] 
jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh
minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 
--kubernetes-version=v1.12.6
{noformat}


> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786067#comment-16786067
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 7:40 PM:
-

I did all that plus rm /etc/kubernetes; I will try once more (maybe I missed 
.kube). I actually followed this: 
https://github.com/kubernetes/minikube/issues/1043#issuecomment-354453842


was (Author: skonto):
I did all that plus rm /etc/kubernetes, I will try once more. I actually 
followed this: 
https://github.com/kubernetes/minikube/issues/1043#issuecomment-354453842

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786067#comment-16786067
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

I did all that plus rm /etc/kubernetes; I will try once more. I actually 
followed this: 
https://github.com/kubernetes/minikube/issues/1043#issuecomment-354453842

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27023) Kubernetes client timeouts should be configurable

2019-03-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27023:
--
Issue Type: Improvement  (was: New Feature)

> Kubernetes client timeouts should be configurable
> -
>
> Key: SPARK-27023
> URL: https://issues.apache.org/jira/browse/SPARK-27023
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Onur Satici
>Priority: Major
>
> Kubernetes clients used in driver submission, in client mode, and in 
> requesting executors should have configurable read and connect timeouts.
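If this lands, usage could look roughly like the sketch below; the timeout 
property names are purely illustrative placeholders (they do not exist in 
Spark yet), and the master URL, image and jar path are placeholders as well.

{noformat}
# Illustrative only: hypothetical property names for the proposed timeouts
# (values in milliseconds); the master URL, image and jar path are placeholders.
spark-submit \
  --master k8s://https://kubernetes.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=spark:testing \
  --conf spark.kubernetes.submission.connectionTimeout=60000 \
  --conf spark.kubernetes.submission.requestTimeout=60000 \
  --class org.apache.spark.examples.SparkPi \
  local:///opt/spark/examples/jars/spark-examples.jar
{noformat}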



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786060#comment-16786060
 ] 

shane knapp edited comment on SPARK-26742 at 3/6/19 7:33 PM:
-

i was unable to get 1.11.7 to work until i did a {noformat}minikube stop && 
minikube delete; rm -rf ~/.minikube ~/.kube{noformat}.

testing against k8s 1.12.6 now.


was (Author: shaneknapp):
i was unable to get 1.11.7 to work until i did a `minikube stop && minikube 
delete; rm -rf ~/.minikube ~/.kube`.

testing against k8s 1.12.6 now.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786060#comment-16786060
 ] 

shane knapp commented on SPARK-26742:
-

i was unable to get 1.11.7 to work until i did a `minikube stop && minikube 
delete; rm -rf ~/.minikube ~/.kube`.

testing against k8s 1.12.6 now.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786056#comment-16786056
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

I mean the `minikube` binary, because that flag does not work when passing 
`--kubernetes-version=v1.11.7` on my AWS instance. Minikube never starts, but 
that is with the `none` driver; good to know kvm works.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27063) Spark on K8S Integration Tests timeouts are too short for some test clusters

2019-03-06 Thread Rob Vesse (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786051#comment-16786051
 ] 

Rob Vesse commented on SPARK-27063:
---

[~skonto] Yes, we have experienced the same problem. I think my next PR for this 
will look to make that overall timeout user-configurable.

> Spark on K8S Integration Tests timeouts are too short for some test clusters
> 
>
> Key: SPARK-27063
> URL: https://issues.apache.org/jira/browse/SPARK-27063
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Rob Vesse
>Priority: Minor
>
> As noted during development for SPARK-26729, there are a couple of integration 
> test timeouts that are too short when running on slower clusters, e.g. 
> developers' laptops, small CI clusters, etc.
> [~skonto] confirmed that he has also experienced this behaviour in the 
> discussion on [PR 
> 23846|https://github.com/apache/spark/pull/23846#discussion_r262564938].
> We should raise the defaults of these timeouts as an initial step and, longer 
> term, consider making the timeouts themselves configurable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786047#comment-16786047
 ] 

shane knapp commented on SPARK-26742:
-

works for me against k8s v1.11.7:


{noformat}
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 181 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 7 minutes, 1 second.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.2-SNAPSHOT  SUCCESS [  3.010 s]
[INFO] Spark Project Tags . SUCCESS [  2.829 s]
[INFO] Spark Project Local DB . SUCCESS [  2.144 s]
[INFO] Spark Project Networking ... SUCCESS [  3.455 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  1.902 s]
[INFO] Spark Project Unsafe ... SUCCESS [  1.977 s]
[INFO] Spark Project Launcher . SUCCESS [  4.040 s]
[INFO] Spark Project Core . SUCCESS [ 24.034 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:03 
min]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 08:46 min
[INFO] Finished at: 2019-03-06T11:15:46-08:00
[INFO] 
jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh
minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 
--kubernetes-version=v1.11.7
{noformat}


> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786047#comment-16786047
 ] 

shane knapp edited comment on SPARK-26742 at 3/6/19 7:17 PM:
-

your PR works for me against k8s v1.11.7:


{noformat}
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 181 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 7 minutes, 1 second.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.2-SNAPSHOT  SUCCESS [  3.010 s]
[INFO] Spark Project Tags . SUCCESS [  2.829 s]
[INFO] Spark Project Local DB . SUCCESS [  2.144 s]
[INFO] Spark Project Networking ... SUCCESS [  3.455 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  1.902 s]
[INFO] Spark Project Unsafe ... SUCCESS [  1.977 s]
[INFO] Spark Project Launcher . SUCCESS [  4.040 s]
[INFO] Spark Project Core . SUCCESS [ 24.034 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:03 
min]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 08:46 min
[INFO] Finished at: 2019-03-06T11:15:46-08:00
[INFO] 
jenkins@ubuntu-testing:~/src/spark$ grep version run-int-tests.sh
minikube --vm-driver=kvm2 start --memory 6000 --cpus 8 
--kubernetes-version=v1.11.7
{noformat}



was (Author: shaneknapp):
works for me against k8s v1.11.7:


{noformat}
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 181 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 7 minutes, 1 second.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.2-SNAPSHOT  SUCCESS [  3.010 s]
[INFO] Spark Project Tags . SUCCESS [  2.829 s]
[INFO] Spark Project Local DB . SUCCESS [  2.144 s]
[INFO] Spark Project Networking ... SUCCESS [  3.455 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  1.902 s]
[INFO] Spark Project Unsafe ... SUCCESS [  1.977 s]
[INFO] Spark Project Launcher . SUCCESS [  4.040 s]
[INFO] Spark Project Core . SUCCESS [ 24.034 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:03 
min]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 08:46 min
[INFO] Finished at: 2019-03-06T11:15:46-08:00
[INFO] 

[jira] [Resolved] (SPARK-27023) Kubernetes client timeouts should be configurable

2019-03-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27023.
---
   Resolution: Fixed
 Assignee: Onur Satici
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23928
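
A hedged usage sketch: the configuration key names below are assumptions based on 
the pattern this change appears to introduce (driver- and submission-side 
connection/request timeouts); check the merged PR for the exact names and units 
before relying on them.

{code}
// Hedged sketch, not authoritative: the key names and the millisecond unit are assumptions.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.kubernetes.driver.connectionTimeout", "30000") // assumed connect timeout
  .set("spark.kubernetes.driver.requestTimeout", "30000")    // assumed read/request timeout
// Submission-side equivalents would typically be passed to spark-submit via --conf.
{code}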

> Kubernetes client timeouts should be configurable
> -
>
> Key: SPARK-27023
> URL: https://issues.apache.org/jira/browse/SPARK-27023
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Onur Satici
>Assignee: Onur Satici
>Priority: Major
> Fix For: 3.0.0
>
>
> Kubernetes clients used in driver submission, in client mode and in 
> requesting executors should have configurable read and connect timeouts



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786019#comment-16786019
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:59 PM:
-

Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version=v1.11.7  has not worked so far for me using the latest 
minikube binary. So I am planning to test with older minikube versions which 
have as default versions other than 1.13.


was (Author: skonto):
Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version has not worked so far for me using the latest minikube 
binary. So I am planning to test with older minikube versions which have as 
default versions other than 1.13.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786023#comment-16786023
 ] 

shane knapp commented on SPARK-26742:
-

i think you mean:  s/minikube/k8s

anyways, exactly which version of k8s do we want to be testing against?

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27006) SPIP: .NET bindings for Apache Spark

2019-03-06 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786030#comment-16786030
 ] 

Sean Owen commented on SPARK-27006:
---

You can use the Apache license, and follow Apache processes, without this code 
going into Spark. Heck it's even possible that you propose this as a top-level 
ASF project, though I don't think that ultimately makes sense. You can also of 
course solicit feedback here.

It's not possible to give you a dev branch or any access to ASF repos unless 
you're a committer. But, you can of course fork the Github repo and do whatever 
you want. Anyone you allow can contribute. You can sync your fork's master and 
integrate as often as you like.

This kind of prejudges that it's going to be merged into Spark, and I think 
that's highly unlikely. I don't think you need permission or oversight from 
anyone on Spark as a result.

You can announce the work on ASF lists, ask for feedback, publish your packages 
as you like.

The problem is that the (positive) idea of making sure your bindings stay up to 
date with Spark has a cost: now the whole project bears responsibility for not 
breaking it, updating it, releasing it. You may contribute a lot of that work, 
or intend to. But a change of this scope is going to inevitably put a lot of 
work on others. My opinion is it won't be worth it -- not because this isn't 
valuable, but because it's equally valuable in the form it is now as a separate 
project. You bear the burden of keeping it up to date, sure, but that's 
intended.

> SPIP: .NET bindings for Apache Spark
> 
>
> Key: SPARK-27006
> URL: https://issues.apache.org/jira/browse/SPARK-27006
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Minor
>   Original Estimate: 4,032h
>  Remaining Estimate: 4,032h
>
> h4. Background and Motivation: 
> Apache Spark provides programming language support for Scala/Java (native), 
> and extensions for Python and R. While a variety of other language extensions 
> are possible to include in Apache Spark, .NET would bring one of the largest 
> developer communities to the table. Presently, no good Big Data solution exists 
> for .NET developers in open source.  This SPIP aims at discussing how we can 
> bring Apache Spark goodness to the .NET development platform.  
> .NET is a free, cross-platform, open source developer platform for building 
> many different types of applications. With .NET, you can use multiple 
> languages, editors, and libraries to build for web, mobile, desktop, gaming, 
> and IoT types of applications. Even with .NET serving millions of developers, 
> there is no good Big Data solution that exists today, which this SPIP aims to 
> address.  
> The .NET developer community is one of the largest programming language 
> communities in the world. Its flagship programming language C# is listed as 
> one of the most popular programming languages in a variety of articles and 
> statistics: 
>  * Most popular Technologies on Stack Overflow: 
> [https://insights.stackoverflow.com/survey/2018/#most-popular-technologies|https://insights.stackoverflow.com/survey/2018/]
>   
>  * Most popular languages on GitHub 2018: 
> [https://www.businessinsider.com/the-10-most-popular-programming-languages-according-to-github-2018-10#2-java-9|https://www.businessinsider.com/the-10-most-popular-programming-languages-according-to-github-2018-10]
>  
>  * 1M+ new developers last 1 year  
>  * Second most demanded technology on LinkedIn 
>  * Top 30 High velocity OSS projects on GitHub 
> Including a C# language extension in Apache Spark will enable millions of 
> .NET developers to author Big Data applications in their preferred 
> programming language, developer environment, and tooling support. We aim to 
> promote the .NET bindings for Spark through engagements with the Spark 
> community (e.g., we are scheduled to present an early prototype at the SF 
> Spark Summit 2019) and the .NET developer community (e.g., similar 
> presentations will be held at .NET developer conferences this year).  As 
> such, we believe that our efforts will help grow the Spark community by 
> making it accessible to the millions of .NET developers. 
> Furthermore, our early discussions with some large .NET development teams got 
> an enthusiastic reception. 
> We recognize that earlier attempts at this goal (specifically Mobius 
> [https://github.com/Microsoft/Mobius]) were unsuccessful primarily due to the 
> lack of communication with the Spark community. Therefore, another goal of 
> this proposal is to not only develop .NET bindings for Spark in open source, 
> but also continuously seek feedback from the Spark community via posted 
> Jira’s (like this one) and the Spark 

[jira] [Commented] (SPARK-18748) UDF multiple evaluations causes very poor performance

2019-03-06 Thread Qingbo Hu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786029#comment-16786029
 ] 

Qingbo Hu commented on SPARK-18748:
---

We have the same problem when using Spark Structured Streaming. This is a 
critical problem for us, since our UDF includes a counter that increases every 
time it gets called, and the output of the UDF depends on this count. If the 
UDF gets executed multiple times when a field is referenced, the output of our 
UDF will be incorrect.

We cannot use cache() in this case, since we are in structured streaming.
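
A mitigation sometimes suggested for this class of problem (a sketch only, and it 
does not guarantee a single evaluation in every plan): mark the UDF as 
non-deterministic so the optimizer does not assume it can freely duplicate or 
collapse its evaluation. The DataFrame {{df}} and the column name below are 
placeholders, not from the original report.

{code}
// Sketch: register an expensive/stateful UDF as non-deterministic.
// "df" and "someCol" are illustrative placeholders.
import org.apache.spark.sql.functions.{col, udf}

val veryExpensiveCalc = udf((s: String) => { println("blahblah1"); "nothing" })
  .asNondeterministic()

val result = df.withColumn("c", veryExpensiveCalc(col("someCol")))
  .where(col("c").isNotNull && col("c") =!= "")
{code}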

> UDF multiple evaluations causes very poor performance
> -
>
> Key: SPARK-18748
> URL: https://issues.apache.org/jira/browse/SPARK-18748
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.0
>Reporter: Ohad Raviv
>Priority: Major
>
> We have a use case where we have a relatively expensive UDF that needs to be 
> calculated. The problem is that instead of being calculated once, it gets 
> calculated over and over again.
> for example:
> {quote}
> def veryExpensiveCalc(str:String) = \{println("blahblah1"); "nothing"\}
> hiveContext.udf.register("veryExpensiveCalc", veryExpensiveCalc _)
> hiveContext.sql("select * from (select veryExpensiveCalc('a') c)z where c is 
> not null and c<>''").show
> {quote}
> with the output:
> {quote}
> blahblah1
> blahblah1
> blahblah1
> +---+
> |  c|
> +---+
> |nothing|
> +---+
> {quote}
> You can see that for each reference of column "c" you will get the println.
> that causes very poor performance for our real use case.
> This also came out on StackOverflow:
> http://stackoverflow.com/questions/40320563/spark-udf-called-more-than-once-per-record-when-df-has-too-many-columns
> http://stackoverflow.com/questions/34587596/trying-to-turn-a-blob-into-multiple-columns-in-spark/
> with two problematic work-arounds:
> 1. cache() after the first time. e.g.
> {quote}
> hiveContext.sql("select veryExpensiveCalc('a') as c").cache().where("c is not 
> null and c<>''").show
> {quote}
> while it works, in our case we can't do that because the table is too big to 
> cache.
> 2. move back and forth to rdd:
> {quote}
> val df = hiveContext.sql("select veryExpensiveCalc('a') as c")
> hiveContext.createDataFrame(df.rdd, df.schema).where("c is not null and 
> c<>''").show
> {quote}
> which works, but then we lose some of the optimizations like predicate 
> pushdown, etc., and it's very ugly.
> Any ideas on how we can make the UDF get calculated just once in a reasonable 
> way?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786012#comment-16786012
 ] 

shane knapp commented on SPARK-26742:
-

2.4 k8s integration tests pass w/the client upgrade to 4.1.2:


{noformat}
[INFO]
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ 
spark-kubernetes-integration-tests_2.11 ---
Discovery starting.
Discovery completed in 184 milliseconds.
Run starting. Expected test count is: 14
KubernetesSuite:
- Run SparkPi with no resources
- Run SparkPi with a very long application name.
- Use SparkLauncher.NO_RESOURCE
- Run SparkPi with a master URL without a scheme.
- Run SparkPi with an argument.
- Run SparkPi with custom labels, annotations, and environment variables.
- Run extraJVMOptions check on driver
- Run SparkRemoteFileTest using a remote data file
- Run SparkPi with env and mount secrets.
- Run PySpark on simple pi.py example
- Run PySpark with Python2 to test a pyfiles example
- Run PySpark with Python3 to test a pyfiles example
- Run PySpark with memory customization
- Run in client mode.
Run completed in 7 minutes, 24 seconds.
Total number of tests run: 14
Suites: completed 2, aborted 0
Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
All tests passed.
[INFO] 
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM 2.4.2-SNAPSHOT  SUCCESS [  3.211 s]
[INFO] Spark Project Tags . SUCCESS [  3.515 s]
[INFO] Spark Project Local DB . SUCCESS [  2.181 s]
[INFO] Spark Project Networking ... SUCCESS [  3.738 s]
[INFO] Spark Project Shuffle Streaming Service  SUCCESS [  1.891 s]
[INFO] Spark Project Unsafe ... SUCCESS [  1.999 s]
[INFO] Spark Project Launcher . SUCCESS [  3.926 s]
[INFO] Spark Project Core . SUCCESS [ 23.593 s]
[INFO] Spark Project Kubernetes Integration Tests 2.4.2-SNAPSHOT SUCCESS [08:35 
min]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 09:19 min
[INFO] Finished at: 2019-03-06T10:48:55-08:00
[INFO] 
{noformat}


> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786019#comment-16786019
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:59 PM:
-

Cool! The question is does it pass with minikube 1.11 and 1.12 as well? Passing 
--kubernetes-version=v1.11.7  has not worked so far for me using the latest 
minikube binary. So I am planning to test with older minikube versions which 
have as default versions other than 1.13.


was (Author: skonto):
Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version=v1.11.7  has not worked so far for me using the latest 
minikube binary. So I am planning to test with older minikube versions which 
have as default versions other than 1.13.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786019#comment-16786019
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:58 PM:
-

Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version has not worked so far for me using the latest minikube 
binary. So I am planning to test with older minikube versions which have as 
default not 1.13.


was (Author: skonto):
Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version ahs not worked so far for me. So I am planning to test 
with older minikube versions which have as default not 1.13.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786019#comment-16786019
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:58 PM:
-

Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version has not worked so far for me using the latest minikube 
binary. So I am planning to test with older minikube versions which have as 
default versions other than 1.13.


was (Author: skonto):
Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version has not worked so far for me using the latest minikube 
binary. So I am planning to test with older minikube versions which have as 
default not 1.13.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786019#comment-16786019
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

Cool! The question is does it pass with minikube 1.11 and 1.10 as well? Passing 
--kubernetes-version ahs not worked so far for me. So I am planning to test 
with older minikube versions which have as default not 1.13.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27006) SPIP: .NET bindings for Apache Spark

2019-03-06 Thread Terry Kim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786001#comment-16786001
 ] 

Terry Kim commented on SPARK-27006:
---

Thanks for the feedback. We are happy to work on addressing some of the voiced 
concerns. For example, we are already working on contributing the interop layer 
work ([SPARK-26257|https://issues.apache.org/jira/browse/SPARK-26257]), that 
should alleviate the code duplication problem, and we are planning on 
contributing it first.

We are really looking forward to having early community engagement and feedback 
on this SPIP, and want to avoid a long discussion with the community once the 
PR is ready. Thus, having an open development under full view of the Apache 
Spark community is important to us. Furthermore, we think it is legally 
beneficial if we can do the work under the established rules and guidelines of 
the Apache Software Foundation (including aspects such as legal contribution 
framework, naming etc.), instead of having to build it under a different legal 
framework (that for example may require a different contribution agreement and 
will be painful to transfer into the ASF).

Therefore, we are interested in finding a way that allows us to mitigate 
[~srowen]'s concern about maintaining a whole other language and copy of the 
APIs and to be under the Apache Foundation umbrella. Towards addressing this, 
we’d like to propose the following: Use a development branch on Apache Spark 
master repo shepherded by an Apache Spark PMC member of the community’s choice 
to ensure (a) community visibility and (b) obtain alignment from the broader 
community on a continuous basis. With a shepherd, the core community can ensure 
that the feature branch does not go off the rails and can be merged back at the 
appropriate time. The SPIP proposers and other interested community members 
will explicitly undertake the majority of the code vetting and QA work.
In short, we can have two ASF branches: 
• master - main branch intended for release - does not include the in-progress 
work for [SPARK-27006|https://issues.apache.org/jira/browse/SPARK-27006]
• SPARK-27006 feature branch - branch for shared development work to make the 
changes needed for this SPIP shepherded by a PMC member of the community’s 
choice
 
Subsequently, the development process would be:
• We and anyone who would like to contribute will work in our/their fork of the 
SPARK-27006 branch, issuing regular and frequent PRs against that branch for 
review by the broader community
• We periodically (weekly or any other appropriate interval) merge master into 
the SPARK-27006 branch to ensure alignment with ongoing work in master
• After the work is deemed complete, the shepherd makes the final call on 
whether the work meets the expectations in a way that does not affect the 
project’s core guiding principles 
 
Eventually, the SPARK-27006 branch is merged by the shepherd into master once 
we obtain agreement from the broader community to bring the project in.
 
[~srowen], [~dongjoon] (and other community members), would this proposal 
address your concerns? Or are there other established patterns within the 
Apache Foundation that have worked in the past?
 
[~dongjoon] Thanks for pointing it out about the guide.

> SPIP: .NET bindings for Apache Spark
> 
>
> Key: SPARK-27006
> URL: https://issues.apache.org/jira/browse/SPARK-27006
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Minor
>   Original Estimate: 4,032h
>  Remaining Estimate: 4,032h
>
> h4. Background and Motivation: 
> Apache Spark provides programming language support for Scala/Java (native), 
> and extensions for Python and R. While a variety of other language extensions 
> are possible to include in Apache Spark, .NET would bring one of the largest 
> developer communities to the table. Presently, no good Big Data solution exists 
> for .NET developers in open source.  This SPIP aims at discussing how we can 
> bring Apache Spark goodness to the .NET development platform.  
> .NET is a free, cross-platform, open source developer platform for building 
> many different types of applications. With .NET, you can use multiple 
> languages, editors, and libraries to build for web, mobile, desktop, gaming, 
> and IoT types of applications. Even with .NET serving millions of developers, 
> there is no good Big Data solution that exists today, which this SPIP aims to 
> address.  
> The .NET developer community is one of the largest programming language 
> communities in the world. Its flagship programming language C# is listed as 
> one of the most popular programming languages in a variety of articles and 
> statistics: 
>  * Most popular Technologies on Stack 

[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785987#comment-16785987
 ] 

shane knapp commented on SPARK-26742:
-

ok, i found the 2.4 PR:  https://github.com/apache/spark/pull/23993

testing now.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27045) SQL tab in UI shows actual SQL instead of callsite

2019-03-06 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27045:

Summary: SQL tab in UI shows actual SQL instead of callsite  (was: SQL tab 
in UI shows callsite instead of actual SQL)

> SQL tab in UI shows actual SQL instead of callsite
> --
>
> Key: SPARK-27045
> URL: https://issues.apache.org/jira/browse/SPARK-27045
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.3.3, 3.0.0
>Reporter: Ajith S
>Priority: Major
> Attachments: image-2019-03-04-18-24-27-469.png, 
> image-2019-03-04-18-24-54-053.png
>
>
> When we run SQL in Spark (for example via the Thrift server), the Spark UI SQL 
> tab should show the SQL instead of the stacktrace, which is more useful to the 
> end user. Instead, the description column currently shows the callsite short 
> form, which is less useful
>  Actual:
> !image-2019-03-04-18-24-27-469.png!
>  
> Expected:
> !image-2019-03-04-18-24-54-053.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27045) SQL tab in UI shows callsite instead of actual SQL

2019-03-06 Thread Ajith S (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ajith S updated SPARK-27045:

Issue Type: Improvement  (was: Bug)

> SQL tab in UI shows callsite instead of actual SQL
> --
>
> Key: SPARK-27045
> URL: https://issues.apache.org/jira/browse/SPARK-27045
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Web UI
>Affects Versions: 2.3.2, 2.3.3, 3.0.0
>Reporter: Ajith S
>Priority: Major
> Attachments: image-2019-03-04-18-24-27-469.png, 
> image-2019-03-04-18-24-54-053.png
>
>
> When we run SQL in Spark (for example via the Thrift server), the Spark UI SQL 
> tab should show the SQL instead of the stacktrace, which is more useful to the 
> end user. Instead, the description column currently shows the callsite short 
> form, which is less useful
>  Actual:
> !image-2019-03-04-18-24-27-469.png!
>  
> Expected:
> !image-2019-03-04-18-24-54-053.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785983#comment-16785983
 ] 

shane knapp commented on SPARK-26742:
-

please link to the PRs (master + 2.4) here and i can test them independently.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785962#comment-16785962
 ] 

Stavros Kontopoulos edited comment on SPARK-26742 at 3/6/19 6:14 PM:
-

I created one for the 2.4 branch just in case we want to test and upgrade.


was (Author: skonto):
I created one for the 2.4 branch.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785962#comment-16785962
 ] 

Stavros Kontopoulos commented on SPARK-26742:
-

I created one for the 2.4 branch.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26742) Bump Kubernetes Client Version to 4.1.2

2019-03-06 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785970#comment-16785970
 ] 

shane knapp commented on SPARK-26742:
-

you can submit a PR now.  it will fail on the existing ubuntu workers, but i 
can test it manually on the staging system that has the upgraded/deployed 
minikube + k8s.

just throw the link to the PR in here and it'll take me ~30 mins to confirm 
that the changes work.

if my build passes, i will temporarily take the ubuntu workers out of jenkins, 
update one to use the new minikube/k8s and re-trigger the test on that PR.

if *that* build passes, i'll update the remaining production ubuntu workers and 
put them back in to rotation.

then we can merge the client upgrade PR.

sound good?  i have the time to get this done right now.

> Bump Kubernetes Client Version to 4.1.2
> ---
>
> Key: SPARK-26742
> URL: https://issues.apache.org/jira/browse/SPARK-26742
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Steve Davids
>Priority: Major
>  Labels: easyfix
> Fix For: 3.0.0
>
>
> Spark 2.x is using Kubernetes Client 3.x which is pretty old, the master 
> branch has 4.0, the client should be upgraded to 4.1.1 to have the broadest 
> Kubernetes compatibility support for newer clusters: 
> https://github.com/fabric8io/kubernetes-client#compatibility-matrix



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27065) avoid more than one active task set managers for a stage

2019-03-06 Thread Imran Rashid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid resolved SPARK-27065.
--
   Resolution: Fixed
Fix Version/s: 2.3.4
   2.4.1
   3.0.0

Issue resolved by pull request 23927
[https://github.com/apache/spark/pull/23927]

> avoid more than one active task set managers for a stage
> 
>
> Key: SPARK-27065
> URL: https://issues.apache.org/jira/browse/SPARK-27065
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.3.3, 2.4.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0, 2.4.1, 2.3.4
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25250) Race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple tim

2019-03-06 Thread Imran Rashid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid resolved SPARK-25250.
--
Resolution: Fixed

> Race condition with tasks running when new attempt for same stage is created 
> leads to other task in the next attempt running on the same partition id 
> retry multiple times
> --
>
> Key: SPARK-25250
> URL: https://issues.apache.org/jira/browse/SPARK-25250
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.3.1
>Reporter: Parth Gandhi
>Assignee: Parth Gandhi
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> We recently had a scenario where a race condition occurred when a task from 
> previous stage attempt just finished before new attempt for the same stage 
> was created due to fetch failure, so the new task created in the second 
> attempt on the same partition id was retrying multiple times due to 
> TaskCommitDenied Exception without realizing that the task in earlier attempt 
> was already successful.  
> For example, consider a task with partition id 9000 and index 9000 running in 
> stage 4.0. We see a fetch failure so thus, we spawn a new stage attempt 4.1. 
> Just within this timespan, the above task completes successfully, thus, 
> marking the partition id 9000 as complete for 4.0. However, as stage 4.1 has 
> not yet been created, the taskset info for that stage is not available to the 
> TaskScheduler so, naturally, the partition id 9000 has not been marked 
> completed for 4.1. Stage 4.1 now spawns task with index 2000 on the same 
> partition id 9000. This task fails due to CommitDeniedException and since, it 
> does not see the corresponding partition id as been marked successful, it 
> keeps retrying multiple times until the job finally succeeds. It doesn't 
> cause any job failures because the DAG scheduler is tracking the partitions 
> separate from the task set managers.
>  
> Steps to Reproduce:
>  # Run any large job involving shuffle operation.
>  # When the ShuffleMap stage finishes and the ResultStage begins running, 
> cause this stage to throw a fetch failure exception(Try deleting certain 
> shuffle files on any host).
>  # Observe the task attempt numbers for the next stage attempt. Please note 
> that this issue is an intermittent one, so it might not happen all the time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25250) Race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple time

2019-03-06 Thread Imran Rashid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid updated SPARK-25250:
-
Fix Version/s: 3.0.0
   2.4.1

> Race condition with tasks running when new attempt for same stage is created 
> leads to other task in the next attempt running on the same partition id 
> retry multiple times
> --
>
> Key: SPARK-25250
> URL: https://issues.apache.org/jira/browse/SPARK-25250
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.3.1
>Reporter: Parth Gandhi
>Assignee: Parth Gandhi
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> We recently had a scenario where a race condition occurred when a task from 
> previous stage attempt just finished before new attempt for the same stage 
> was created due to fetch failure, so the new task created in the second 
> attempt on the same partition id was retrying multiple times due to 
> TaskCommitDenied Exception without realizing that the task in earlier attempt 
> was already successful.  
> For example, consider a task with partition id 9000 and index 9000 running in 
> stage 4.0. We see a fetch failure so thus, we spawn a new stage attempt 4.1. 
> Just within this timespan, the above task completes successfully, thus, 
> marking the partition id 9000 as complete for 4.0. However, as stage 4.1 has 
> not yet been created, the taskset info for that stage is not available to the 
> TaskScheduler so, naturally, the partition id 9000 has not been marked 
> completed for 4.1. Stage 4.1 now spawns task with index 2000 on the same 
> partition id 9000. This task fails due to CommitDeniedException and since, it 
> does not see the corresponding partition id as been marked successful, it 
> keeps retrying multiple times until the job finally succeeds. It doesn't 
> cause any job failures because the DAG scheduler is tracking the partitions 
> separate from the task set managers.
>  
> Steps to Reproduce:
>  # Run any large job involving shuffle operation.
>  # When the ShuffleMap stage finishes and the ResultStage begins running, 
> cause this stage to throw a fetch failure exception(Try deleting certain 
> shuffle files on any host).
>  # Observe the task attempt numbers for the next stage attempt. Please note 
> that this issue is an intermittent one, so it might not happen all the time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25250) Race condition with tasks running when new attempt for same stage is created leads to other task in the next attempt running on the same partition id retry multiple tim

2019-03-06 Thread Imran Rashid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid reassigned SPARK-25250:


Assignee: Parth Gandhi

> Race condition with tasks running when new attempt for same stage is created 
> leads to other task in the next attempt running on the same partition id 
> retry multiple times
> --
>
> Key: SPARK-25250
> URL: https://issues.apache.org/jira/browse/SPARK-25250
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core
>Affects Versions: 2.3.1
>Reporter: Parth Gandhi
>Assignee: Parth Gandhi
>Priority: Major
>
> We recently had a scenario where a race condition occurred when a task from 
> previous stage attempt just finished before new attempt for the same stage 
> was created due to fetch failure, so the new task created in the second 
> attempt on the same partition id was retrying multiple times due to 
> TaskCommitDenied Exception without realizing that the task in earlier attempt 
> was already successful.  
> For example, consider a task with partition id 9000 and index 9000 running in 
> stage 4.0. We see a fetch failure so thus, we spawn a new stage attempt 4.1. 
> Just within this timespan, the above task completes successfully, thus, 
> marking the partition id 9000 as complete for 4.0. However, as stage 4.1 has 
> not yet been created, the taskset info for that stage is not available to the 
> TaskScheduler so, naturally, the partition id 9000 has not been marked 
> completed for 4.1. Stage 4.1 now spawns task with index 2000 on the same 
> partition id 9000. This task fails due to CommitDeniedException and since, it 
> does not see the corresponding partition id as been marked successful, it 
> keeps retrying multiple times until the job finally succeeds. It doesn't 
> cause any job failures because the DAG scheduler is tracking the partitions 
> separate from the task set managers.
>  
> Steps to Reproduce:
>  # Run any large job involving shuffle operation.
>  # When the ShuffleMap stage finishes and the ResultStage begins running, 
> cause this stage to throw a fetch failure exception(Try deleting certain 
> shuffle files on any host).
>  # Observe the task attempt numbers for the next stage attempt. Please note 
> that this issue is an intermittent one, so it might not happen all the time.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2019-03-06 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-24669.
---
   Resolution: Fixed
 Assignee: Udbhav Agrawal
Fix Version/s: 3.0.0
   2.4.2
   2.3.4

This is resolved via https://github.com/apache/spark/pull/23905 . Thank you, 
[~Udbhav Agrawal]. I added you to the Apache Spark Contributor group.

> Managed table was not cleared of path after drop database cascade
> -
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Dong Jiang
>Assignee: Udbhav Agrawal
>Priority: Major
> Fix For: 2.3.4, 2.4.2, 3.0.0
>
>
> I can do the following in sequence
> # Create a managed table using path options
> # Drop the table via dropping the parent database cascade
> # Re-create the database and table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-+
> |   id|
> +-+
> |first|
> +-+
> "now, if I drop the table explicitly, instead of via dropping database 
> cascade, then it will be the correct result"
> spark.sql("drop table foo.first")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> spark.table("foo.first").show()
> +--+
> |id|
> +--+
> |second|
> +--+
> {code}
> Same sequence failed in 2.3.1 as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24624) Can not mix vectorized and non-vectorized UDFs

2019-03-06 Thread peay (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785920#comment-16785920
 ] 

peay commented on SPARK-24624:
--

Are there plans to support something similar for aggregation functions?

> Can not mix vectorized and non-vectorized UDFs
> --
>
> Key: SPARK-24624
> URL: https://issues.apache.org/jira/browse/SPARK-24624
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Xiao Li
>Assignee: Li Jin
>Priority: Major
> Fix For: 2.4.0
>
>
> In the current impl, we have the limitation: users are unable to mix 
> vectorized and non-vectorized UDFs in same Project. This becomes worse since 
> our optimizer could combine continuous Projects into a single one. For 
> example, 
> {code}
> applied_df = df.withColumn('regular', my_regular_udf('total', 
> 'qty')).withColumn('pandas', my_pandas_udf('total', 'qty'))
> {code}
> Returns the following error. 
> {code}
> IllegalArgumentException: Can not mix vectorized and non-vectorized UDFs
> java.lang.IllegalArgumentException: Can not mix vectorized and non-vectorized 
> UDFs
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:170)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$6.apply(ExtractPythonUDFs.scala:146)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at scala.collection.immutable.List.foreach(List.scala:381)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.immutable.List.map(List.scala:285)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.org$apache$spark$sql$execution$python$ExtractPythonUDFs$$extract(ExtractPythonUDFs.scala:146)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:118)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$$anonfun$apply$2.applyOrElse(ExtractPythonUDFs.scala:114)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:312)
>  at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:77)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:311)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$8.apply(TreeNode.scala:331)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:208)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:329)
>  at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:309)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:114)
>  at 
> org.apache.spark.sql.execution.python.ExtractPythonUDFs$.apply(ExtractPythonUDFs.scala:94)
>  at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
>  at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$prepareForExecution$1.apply(QueryExecution.scala:113)
>  at 
> scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
>  at scala.collection.immutable.List.foldLeft(List.scala:84)
>  at 
> org.apache.spark.sql.execution.QueryExecution.prepareForExecution(QueryExecution.scala:113)
>  at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:100)
>  at 
> org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:99)
>  at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3312)
>  at org.apache.spark.sql.Dataset.collectResult(Dataset.scala:2750)
>  ...
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: 

[jira] [Commented] (SPARK-24669) Managed table was not cleared of path after drop database cascade

2019-03-06 Thread Udbhav Agrawal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785900#comment-16785900
 ] 

Udbhav Agrawal commented on SPARK-24669:


Thank you, [~dongjoon]

> Managed table was not cleared of path after drop database cascade
> -
>
> Key: SPARK-24669
> URL: https://issues.apache.org/jira/browse/SPARK-24669
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Dong Jiang
>Assignee: Udbhav Agrawal
>Priority: Major
> Fix For: 2.3.4, 2.4.2, 3.0.0
>
>
> I can reproduce the issue with the following sequence:
> # Create a managed table using the path option
> # Drop the table by dropping the parent database with cascade
> # Re-create the database and the table with a different path
> # The new table shows data from the old path, not the new path
> {code}
> echo "first" > /tmp/first.csv
> echo "second" > /tmp/second.csv
> spark-shell
> spark.version
> res0: String = 2.3.0
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/first.csv')")
> spark.table("foo.first").show()
> +-----+
> |   id|
> +-----+
> |first|
> +-----+
> spark.sql("drop database foo cascade")
> spark.sql("create database foo")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> "note, the path is different now, pointing to second.csv, but still showing 
> data from first file"
> spark.table("foo.first").show()
> +-----+
> |   id|
> +-----+
> |first|
> +-----+
> "now, if I drop the table explicitly, instead of via dropping database 
> cascade, then it will be the correct result"
> spark.sql("drop table foo.first")
> spark.sql("create table foo.first (id string) using csv options 
> (path='/tmp/second.csv')")
> spark.table("foo.first").show()
> +------+
> |    id|
> +------+
> |second|
> +------+
> {code}
> Same sequence failed in 2.3.1 as well.
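
For anyone hitting this on an unpatched version, a possible workaround is to explicitly invalidate Spark's cached metadata for the table after re-creating it. This is only an untested sketch based on the symptom described above (a stale cached relation), not the verified root cause or the actual fix:

{code}
// Untested workaround sketch: after re-creating the database and table,
// force Spark to refresh any cached relation/file listing before reading.
spark.sql("drop database if exists foo cascade")
spark.sql("create database foo")
spark.sql("create table foo.first (id string) using csv options (path='/tmp/second.csv')")

// Invalidate cached metadata and cached data for the table, then re-read.
spark.catalog.refreshTable("foo.first")   // or: spark.sql("REFRESH TABLE foo.first")
spark.table("foo.first").show()
{code}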



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27072) Changing the parameter value of completedJob.sort to X prints stacktrace in sparkWebUI

2019-03-06 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16785889#comment-16785889
 ] 

Marcelo Vanzin commented on SPARK-27072:


I'd expect a 401 error here, but you get a 200 status.

But in any case, this is only a bug if there's a link in the UI somewhere with 
the wrong column name. If you're changing the URL manually, well, it's your 
fault.

> Changing the parameter value of completedJob.sort to X prints stacktrace in 
> sparkWebUI
> --
>
> Key: SPARK-27072
> URL: https://issues.apache.org/jira/browse/SPARK-27072
> Project: Spark
>  Issue Type: Question
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Haripriya
>Priority: Major
>
> Manipulating the value of the completedJob.sort parameter,
> from
> x.x.x.x:4040/jobs/?=Description=100#completed
> to
> x.x.x.x:4040/jobs/job/?id=1=x
> prints a stack trace in the web UI:
>  
> java.lang.IllegalArgumentException: Unknown column: x
>  at org.apache.spark.ui.jobs.JobDataSource.ordering(AllJobsPage.scala:493)
>  at org.apache.spark.ui.jobs.JobDataSource.<init>(AllJobsPage.scala:441)
>  at org.apache.spark.ui.jobs.JobPagedTable.<init>(AllJobsPage.scala:533)
>  at org.apache.spark.ui.jobs.AllJobsPage.jobsTable(AllJobsPage.scala:248)
>  at org.apache.spark.ui.jobs.AllJobsPage.render(AllJobsPage.scala:297)
>  at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84)
>  at org.apache.spark.ui.WebUI$$anonfun$2.apply(WebUI.scala:84)
>  at org.apache.spark.ui.JettyUtils$$anon$3.doGet(JettyUtils.scala:90)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:687)
>  at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>  at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
>  at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
>  at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
>  at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
>  at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
>  at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>  at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
>  at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
>  at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
>  at org.spark_project.jetty.server.Server.handle(Server.java:539)
>  at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:333)
>  at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
>  at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
>  at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
>  at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
>  at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
>  at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
>  at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
>  at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
>  at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
>  at java.lang.Thread.run(Thread.java:748)
>  
>  
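
For context, the exception above comes from the sort-column lookup in AllJobsPage (JobDataSource.ordering). The snippet below is a minimal sketch with hypothetical names, not the actual Spark code; it only illustrates why an unrecognised column name surfaces as an IllegalArgumentException, and one defensive alternative (falling back to a default column) that would avoid rendering a stack trace for a hand-edited URL:

{code}
// Minimal sketch with hypothetical names (JobRow); the real implementation
// lives in org.apache.spark.ui.jobs.AllJobsPage.
case class JobRow(jobId: Int, description: String)

def ordering(sortColumn: String): Ordering[JobRow] = sortColumn match {
  case "Job Id"      => Ordering.by((j: JobRow) => j.jobId)
  case "Description" => Ordering.by((j: JobRow) => j.description)
  case unknown =>
    // Reported behaviour: an unknown column raises an exception that
    // propagates up to the servlet and is rendered in the browser, e.g.
    //   throw new IllegalArgumentException(s"Unknown column: $unknown")
    // A defensive alternative is to fall back to a default column:
    Ordering.by((j: JobRow) => j.jobId)
}
{code}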



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27074) Refactor HiveClientImpl runHive

2019-03-06 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-27074:

Description: 
Hive 3.1.1's {{CommandProcessor}} has 2 changes:
 # HIVE-17626 (Hive 3.0.0) added ReExecDriver, so the current code path at
https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L736-L742
 is incorrect.
 # HIVE-18238 (Hive 3.0.0) changed the return type of the {{Driver.close()}} function.
This change is not compatible with the built-in Hive.

  was:
Hive 3.1.1's {{CommandProcessor}} have 2 changes:
 # HIVE-17626(Hive 3.0.0) add ReExecDriver. So the current code path is:
[spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala|https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L736-L742]

Lines 736 to 742 in 
[02bbe97|https://github.com/apache/spark/commit/02bbe977abaf7006b845a7e99d612b0235aa0025]
case _ =>
  if (state.out != null) {
    // scalastyle:off println
    state.out.println(tokens(0) + " " + cmd_1)
    // scalastyle:on println
  }
  Seq(proc.run(cmd_1).getResponseCode.toString)

This is incorrect.
 # [HIVE-18238|http://issues.apache.org/jira/browse/HIVE-18238](Hive 3.0.0) 
changed the {{Driver.close()}} function return type. This change is not 
compatible with the built-in Hive.
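
To illustrate the second point: because {{getMethod}} resolves a method by name and parameter types only, a reflective call keeps the same bytecode linking against both the Hive 2.x signature ({{int close()}}) and the Hive 3.x signature ({{void close()}}). The snippet below is only a sketch of that general pattern, with a hypothetical helper name, not the actual Spark patch:

{code}
// Sketch only (not the actual Spark change): invoke a method whose return
// type differs across library versions via reflection, so the caller stays
// binary compatible with both. "close" mirrors Hive's Driver.close(), whose
// return type changed from int to void in HIVE-18238.
def closeCompat(driver: AnyRef): Unit = {
  // getMethod resolves by name and parameter types, ignoring the return type.
  driver.getClass.getMethod("close").invoke(driver)
}
{code}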


> Refactor HiveClientImpl runHive
> ---
>
> Key: SPARK-27074
> URL: https://issues.apache.org/jira/browse/SPARK-27074
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> Hive 3.1.1's {{CommandProcessor}} has 2 changes:
>  # HIVE-17626 (Hive 3.0.0) added ReExecDriver, so the current code path at
> https://github.com/apache/spark/blob/02bbe977abaf7006b845a7e99d612b0235aa0025/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveClientImpl.scala#L736-L742
>  is incorrect.
>  # HIVE-18238 (Hive 3.0.0) changed the return type of the {{Driver.close()}} function.
> This change is not compatible with the built-in Hive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


