[jira] [Updated] (SPARK-31790) cast scenarios may generate different results between Hive and Spark
[ https://issues.apache.org/jira/browse/SPARK-31790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] philipse updated SPARK-31790: - Description: `CAST(n AS TIMESTAMP)`: if n is of Byte/Short/Int/Long type, Hive treats n as milliseconds since the epoch, while Spark SQL treats it as seconds, so the cast results differ; please be careful when you use it. For example: {code:java} In Spark spark-sql> select cast(1586318188000 as timestamp); 52238-06-04 13:06:400.0 spark-sql> select cast(1586318188 as timestamp); 2020-04-08 11:56:28 In Hive hive> select cast(1586318188000 as timestamp); 2020-04-08 11:56:28 hive> select cast(1586318188 as timestamp); 1970-01-19 16:38:38.188{code} was: `CAST(n AS TIMESTAMP)`: if n is of Byte/Short/Int/Long type, Hive treats n as milliseconds since the epoch, while Spark SQL treats it as seconds, so the cast results differ; please be careful when you use it. > cast scenarios may generate different results between Hive and Spark > - > > Key: SPARK-31790 > URL: https://issues.apache.org/jira/browse/SPARK-31790 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.5 >Reporter: philipse >Priority: Minor > > `CAST(n AS TIMESTAMP)`: if n is of Byte/Short/Int/Long type, Hive treats n as > milliseconds since the epoch, while Spark SQL treats it as seconds, so the > cast results differ; please be careful when you use it. > For example: > {code:java} > In Spark > spark-sql> select cast(1586318188000 as timestamp); > 52238-06-04 13:06:400.0 > spark-sql> select cast(1586318188 as timestamp); > 2020-04-08 11:56:28 > In Hive > hive> select cast(1586318188000 as timestamp); > 2020-04-08 11:56:28 > hive> select cast(1586318188 as timestamp); > 1970-01-19 16:38:38.188{code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
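To make the unit mismatch concrete on the Spark side, here is a minimal PySpark sketch, assuming a local session; the expected values are the ones quoted in the ticket, and dividing the millisecond value by 1000 before the cast is one portable way to reproduce Hive's interpretation in Spark.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("cast-units").getOrCreate()

# Spark SQL interprets an integral value as *seconds* since the epoch:
spark.sql("SELECT CAST(1586318188 AS TIMESTAMP)").show(truncate=False)
# expected (per the ticket): 2020-04-08 11:56:28

# One portable way to get Hive's *milliseconds* interpretation in Spark
# is to divide by 1000 before casting:
spark.sql("SELECT CAST(1586318188000 / 1000 AS TIMESTAMP)").show(truncate=False)
{code}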
[jira] [Created] (SPARK-31791) Improve cache block migration test reliability
Holden Karau created SPARK-31791: Summary: Improve cache block migration test reliability Key: SPARK-31791 URL: https://issues.apache.org/jira/browse/SPARK-31791 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 3.1.0 Reporter: Holden Karau Consider using TestUtils.waitUntilExecutorsUp and also pick a timeout with more leeway. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31555) Improve cache block migration
[ https://issues.apache.org/jira/browse/SPARK-31555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113736#comment-17113736 ] Holden Karau commented on SPARK-31555: -- Rocking. We're running into an issue in master with the tests so I'm going to take #7 as a separate issue (the testutils) but otherwise have at it and let us know if you get stuck. > Improve cache block migration > - > > Key: SPARK-31555 > URL: https://issues.apache.org/jira/browse/SPARK-31555 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Holden Karau >Priority: Major > > We should explore the following improvements to cache block migration: > 1) Peer selection (right now may overbalance on certain peers) > 2) Do we need to configure the number of blocks to be migrated at the same > time > 3) Are there any blocks we don't need to replicate (e.g. they are already > stored on the desired number of executors even once we remove the executors > slated for decommissioning). > 4) Do we want to prioritize migrating blocks with no replicas > 5) Log the attempt number for debugging > 6) Clarify the logic for determining the number of replicas > 7) Consider using TestUtils.waitUntilExecutorsUp in tests rather than count > to wait for the executors to come up. imho this is the least important. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves updated SPARK-31788: -- Priority: Blocker (was: Major) > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Blocker > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
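Until a fix lands, one possible user-side workaround is to push each zipped RDD through an identity map() before the union; this is a hedged sketch based on the symptoms above, not the project's fix. The map() call changes which Java-side handle is passed to py4j, avoiding the JavaPairRDD the error message complains about.
{code:python}
from pyspark import SparkContext

sc = SparkContext.getOrCreate()
rdd1 = sc.parallelize([1, 2, 3, 4, 5])
rdd2 = sc.parallelize([6, 7, 8, 9, 10])
pairRDD1 = rdd1.zip(rdd2)

# The identity map() yields a PipelinedRDD whose underlying _jrdd is a plain
# JavaRDD, sidestepping the JavaPairRDD -> JavaRDD conversion that py4j
# rejects. (The ticket's third example likely succeeds for a related reason:
# the mismatched serializer forces a reserialization pass.)
unionRDD1 = sc.union([pairRDD1.map(lambda x: x), pairRDD1.map(lambda x: x)])
print(unionRDD1.collect())
{code}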
[jira] [Updated] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id
[ https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31387: -- Fix Version/s: (was: 3.0.0) 3.1.0 > HiveThriftServer2Listener update methods fail with unknown operation/session > id > --- > > Key: SPARK-31387 > URL: https://issues.apache.org/jira/browse/SPARK-31387 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Ali Smesseim >Assignee: Ali Smesseim >Priority: Major > Fix For: 3.1.0 > > > HiveThriftServer2Listener update methods, such as onSessionClosed and > onOperationError throw a NullPointerException (in Spark 3) or a > NoSuchElementException (in Spark 2) when the input session/operation id is > unknown. In Spark 2, this can cause control flow issues with the caller of > the listener. In Spark 3, the listener is called by a ListenerBus which > catches the exception, but it would still be nicer if an invalid update is > logged and does not throw an exception. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31696) Support spark.kubernetes.driver.service.annotation
[ https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113729#comment-17113729 ] Canbin Zheng commented on SPARK-31696: -- Hi [~dongjoon]! Are there scenarios where users would like to set annotations on the headless service? > Support spark.kubernetes.driver.service.annotation > -- > > Key: SPARK-31696 > URL: https://issues.apache.org/jira/browse/SPARK-31696 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-31761: Priority: Blocker (was: Major) > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Blocker > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113728#comment-17113728 ] Wenchen Fan commented on SPARK-31761: - I've set it as blocker. > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Blocker > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
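For readers who hit this before a fix: -2147483648 div -1 is 2147483648, which does not fit in 32 bits, so one sidestep is to keep the operands at 64 bits. A hedged PySpark sketch follows; the ticket reads the values from CSV, the inline DataFrame here is just a stand-in, and the behavior of the INT form is quoted from the report rather than re-verified.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(-2147483648, -1)], "_c0 long, _c1 long")

# The INT form is the one the ticket reports wrapping around to -2147483648:
df.selectExpr("CAST(_c0 AS INT) div CAST(_c1 AS INT)").show()

# Keeping 64-bit operands yields the mathematically expected 2147483648:
df.selectExpr("_c0 div _c1").show()
{code}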
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113724#comment-17113724 ] Canbin Zheng commented on SPARK-31786: -- It seems the same issue as https://github.com/fabric8io/kubernetes-client/issues/2212. I have tried out v4.9.2 in Flink and it works as expected. JIRA: https://issues.apache.org/jira/browse/FLINK-17565 > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Priority: Blocker > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at >
[jira] [Assigned] (SPARK-31790) cast scenarios may generate different results between Hive and Spark
[ https://issues.apache.org/jira/browse/SPARK-31790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31790: Assignee: Apache Spark > cast scenarios may generate different results between Hive and Spark > - > > Key: SPARK-31790 > URL: https://issues.apache.org/jira/browse/SPARK-31790 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.5 >Reporter: philipse >Assignee: Apache Spark >Priority: Minor > > `CAST(n AS TIMESTAMP)`: if n is of Byte/Short/Int/Long type, Hive treats n as > milliseconds since the epoch, while Spark SQL treats it as seconds, so the > cast results differ; please be careful when you use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31790) cast scenarios may generate different results between Hive and Spark
[ https://issues.apache.org/jira/browse/SPARK-31790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31790: Assignee: (was: Apache Spark) > cast scenarios may generate different results between Hive and Spark > - > > Key: SPARK-31790 > URL: https://issues.apache.org/jira/browse/SPARK-31790 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.5 >Reporter: philipse >Priority: Minor > > `CAST(n AS TIMESTAMP)`: if n is of Byte/Short/Int/Long type, Hive treats n as > milliseconds since the epoch, while Spark SQL treats it as seconds, so the > cast results differ; please be careful when you use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31790) cast scenarios may generate different results between Hive and Spark
[ https://issues.apache.org/jira/browse/SPARK-31790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113715#comment-17113715 ] Apache Spark commented on SPARK-31790: -- User 'GuoPhilipse' has created a pull request for this issue: https://github.com/apache/spark/pull/28605 > cast scenarios may generate different results between Hive and Spark > - > > Key: SPARK-31790 > URL: https://issues.apache.org/jira/browse/SPARK-31790 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.4.5 >Reporter: philipse >Priority: Minor > > `CAST(n AS TIMESTAMP)`: if n is of Byte/Short/Int/Long type, Hive treats n as > milliseconds since the epoch, while Spark SQL treats it as seconds, so the > cast results differ; please be careful when you use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31789) SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
[ https://issues.apache.org/jira/browse/SPARK-31789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113708#comment-17113708 ] Jungtaek Lim commented on SPARK-31789: -- Critical/Blocker priorities tend to be reserved for committers. Please elaborate on why you think it's a blocker, e.g. how it breaks your cluster, workload, etc. > SparkSubmitOperator could not get Exit Code after log stream interrupted by > k8s old resource version exception > -- > > Key: SPARK-31789 > URL: https://issues.apache.org/jira/browse/SPARK-31789 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.4 >Reporter: Dylan Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31789) SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version exception
[ https://issues.apache.org/jira/browse/SPARK-31789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim updated SPARK-31789: - Priority: Major (was: Blocker) > SparkSubmitOperator could not get Exit Code after log stream interrupted by > k8s old resource version exception > -- > > Key: SPARK-31789 > URL: https://issues.apache.org/jira/browse/SPARK-31789 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.4 >Reporter: Dylan Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31790) cast scenarios may generate different results between Hive and Spark
philipse created SPARK-31790: Summary: cast scenarios may generate different results between Hive and Spark Key: SPARK-31790 URL: https://issues.apache.org/jira/browse/SPARK-31790 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.4.5 Reporter: philipse `CAST(n AS TIMESTAMP)`: if n is of Byte/Short/Int/Long type, Hive treats n as milliseconds since the epoch, while Spark SQL treats it as seconds, so the cast results differ; please be careful when you use it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113706#comment-17113706 ] Jungtaek Lim edited comment on SPARK-31761 at 5/22/20, 2:58 AM: Let's make sure priority is marked properly so that RC3 cannot be initiated without this - sounds like it's a blocker because it's a regression and correctness issue. was (Author: kabhwan): Let's make sure priority is marked properly - sounds like it's a blocker because it's a regression and correctness issue. > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113706#comment-17113706 ] Jungtaek Lim commented on SPARK-31761: -- Let's make sure priority is marked properly - sounds like it's a blocker because it's a regression and correctness issue. > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31785) Add a helper function to test all parquet readers
[ https://issues.apache.org/jira/browse/SPARK-31785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31785. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28598 [https://github.com/apache/spark/pull/28598] > Add a helper function to test all parquet readers > - > > Key: SPARK-31785 > URL: https://issues.apache.org/jira/browse/SPARK-31785 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Add a withAllParquetReaders {} method that runs a block of code for all > supported parquet readers, and re-use it in test suites. This should > de-duplicate code and allow OSS Spark-based projects that have their own > parquet readers to re-use existing tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31785) Add a helper function to test all parquet readers
[ https://issues.apache.org/jira/browse/SPARK-31785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31785: Assignee: Maxim Gekk > Add a helper function to test all parquet readers > - > > Key: SPARK-31785 > URL: https://issues.apache.org/jira/browse/SPARK-31785 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Add a withAllParquetReaders {} method that runs a block of code for all > supported parquet readers, and re-use it in test suites. This should > de-duplicate code and allow OSS Spark-based projects that have their own > parquet readers to re-use existing tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
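The helper itself lives in Spark's Scala test code; purely to illustrate the pattern being described, here is a hedged PySpark analogue. The config key spark.sql.parquet.enableVectorizedReader is the real switch between the vectorized and parquet-mr readers; the helper name and the parquet path are hypothetical.
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def with_all_parquet_readers(body):
    """Run `body` once per supported parquet reader (vectorized on, then off)."""
    key = "spark.sql.parquet.enableVectorizedReader"
    original = spark.conf.get(key, None)
    try:
        for enabled in ("true", "false"):
            spark.conf.set(key, enabled)
            body()
    finally:
        # Restore whatever was configured before the loop ran.
        if original is not None:
            spark.conf.set(key, original)
        else:
            spark.conf.unset(key)

# Hypothetical usage: the same read (and any assertions) runs under both readers.
with_all_parquet_readers(lambda: spark.read.parquet("/tmp/example.parquet").count())
{code}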
[jira] [Created] (SPARK-31789) SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version execption
Dylan Yao created SPARK-31789: - Summary: SparkSubmitOperator could not get Exit Code after log stream interrupted by k8s old resource version execption Key: SPARK-31789 URL: https://issues.apache.org/jira/browse/SPARK-31789 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.4.4 Reporter: Dylan Yao -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31049) Support nested adjacent generators, e.g., explode(explode(v))
[ https://issues.apache.org/jira/browse/SPARK-31049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-31049. -- Resolution: Won't Fix > Support nested adjacent generators, e.g., explode(explode(v)) > - > > Key: SPARK-31049 > URL: https://issues.apache.org/jira/browse/SPARK-31049 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Takeshi Yamamuro >Priority: Major > > In the master, we currently don't support any nested generators, but I think > supporting limited nested cases is somewhat useful for users, e.g., > explode(explode(v)). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
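Until something like this is supported, the usual rewrite is to apply the generators in two steps, one select per explode; a short PySpark sketch with illustrative column names:
{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([([[1, 2], [3]],)], ["v"])  # v: array<array<bigint>>

# explode(explode(v)) is rejected by the analyzer today; chaining two
# selects performs the same two-level flattening:
flat = df.select(explode("v").alias("inner")).select(explode("inner").alias("x"))
flat.show()  # rows: 1, 2, 3
{code}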
[jira] [Commented] (SPARK-29854) lpad and rpad built-in functions do not throw an exception for an invalid len value
[ https://issues.apache.org/jira/browse/SPARK-29854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113644#comment-17113644 ] Apache Spark commented on SPARK-29854: -- User 'maropu' has created a pull request for this issue: https://github.com/apache/spark/pull/28604 > lpad and rpad built-in functions do not throw an exception for an invalid len value > - > > Key: SPARK-29854 > URL: https://issues.apache.org/jira/browse/SPARK-29854 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: ABHISHEK KUMAR GUPTA >Priority: Minor > > Spark returns an empty string: > {code} > 0: jdbc:hive2://10.18.19.208:23040/default> SELECT > lpad('hihhh', 5000, ''); > ++ > |lpad(hihhh, CAST(5000 AS INT), > )| > ++ > ++ > Hive: > SELECT lpad('hihhh', 5000, > ''); > Error: Error while compiling statement: FAILED: SemanticException [Error > 10016]: Line 1:67 Argument type mismatch '''': lpad only takes > INT/SHORT/BYTE types as 2-ths argument, got DECIMAL (state=42000,code=10016) > PostgreSQL > function lpad(unknown, numeric, unknown) does not exist > > Expected output: > Spark should also throw an exception, like Hive. > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
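As a quick probe of the Spark-side behavior described above (a hedged sketch: outputs are deliberately not asserted, since the point of the report is that the engines disagree, and the implicit-cast behavior is taken from the ticket's CAST(... AS INT) plan rather than re-verified):
{code:python}
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Ordinary integral length argument:
spark.sql("SELECT lpad('hi', 5, '?')").show()

# Non-integral length: per the report, Spark inserts an implicit
# CAST(... AS INT) and returns a value, where Hive and PostgreSQL
# raise an error instead:
spark.sql("SELECT lpad('hi', 5.5, '?')").show()
{code}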
[jira] [Commented] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113626#comment-17113626 ] Apache Spark commented on SPARK-31788: -- User 'redsanket' has created a pull request for this issue: https://github.com/apache/spark/pull/28603 > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31788: Assignee: (was: Apache Spark) > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31788: Assignee: Apache Spark > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Assignee: Apache Spark >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113625#comment-17113625 ] Apache Spark commented on SPARK-31788: -- User 'redsanket' has created a pull request for this issue: https://github.com/apache/spark/pull/28603 > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113623#comment-17113623 ] Sanket Reddy commented on SPARK-31788: -- Took a naive dig at it [https://github.com/apache/spark/pull/28603] seems to work, looking for reviews and improvement suggestions. > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113566#comment-17113566 ] Sanket Reddy edited comment on SPARK-31788 at 5/21/20, 11:37 PM: - [https://github.com/apache/spark/commit/f83fedc9f20869ab4c62bb07bac50113d921207f] looks like it does not check for PairRDD type in pyspark was (Author: sanket991): [https://git.ouroath.com/hadoop/spark/commit/f83fedc9f20869ab4c62bb07bac50113d921207f] looks like it does not check for PairRDD type in pyspark > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. 
Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113611#comment-17113611 ] Dongjoon Hyun edited comment on SPARK-31786 at 5/21/20, 11:13 PM: -- I also verified that Apache Spark 3.0.0-RC2 and 2.4.6-RC3 fails, too. {code} Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed. {code} was (Author: dongjoon): I also verified that Apache Spark 3.0.0-RC2 fails, too. {code} Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed. {code} > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Priority: Blocker > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. 
> at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113613#comment-17113613 ] Dongjoon Hyun commented on SPARK-31786: --- [~holden] and [~dbtsai]. I raised this issue as a blocker for Apache Spark 3.0.0 and 2.4.6. > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Priority: Blocker > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe 
(Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at >
[jira] [Updated] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31786: -- Target Version/s: 2.4.6, 3.0.0 > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Priority: Blocker > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > 
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) > at > okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298) >
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113611#comment-17113611 ] Dongjoon Hyun commented on SPARK-31786: --- I also verified that Apache Spark 3.0.0-RC2 fails, too. {code} Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed. {code} > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Priority: Blocker > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at 
org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at >
[jira] [Updated] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31786: -- Priority: Blocker (was: Major) > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Priority: Blocker > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > 
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) > at > okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298)
[jira] [Updated] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31786: -- Affects Version/s: 3.0.0 > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5, 3.0.0 >Reporter: Maciej Bryński >Priority: Major > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > 
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) > at > okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298) >
[jira] [Assigned] (SPARK-31765) Upgrade HtmlUnit >= 2.37.0
[ https://issues.apache.org/jira/browse/SPARK-31765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31765: Assignee: Apache Spark (was: Kousuke Saruta) > Upgrade HtmlUnit >= 2.37.0 > -- > > Key: SPARK-31765 > URL: https://issues.apache.org/jira/browse/SPARK-31765 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Apache Spark >Priority: Major > > Recently, a security issue which affects HtmlUnit is reported. > [https://nvd.nist.gov/vuln/detail/CVE-2020-5529] > According to the report, arbitrary code can be run by malicious users. > HtmlUnit is used for test so the impact might not be large but it's better to > upgrade it just in case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31765) Upgrade HtmlUnit >= 2.37.0
[ https://issues.apache.org/jira/browse/SPARK-31765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31765: Assignee: Kousuke Saruta (was: Apache Spark) > Upgrade HtmlUnit >= 2.37.0 > -- > > Key: SPARK-31765 > URL: https://issues.apache.org/jira/browse/SPARK-31765 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Recently, a security issue which affects HtmlUnit is reported. > [https://nvd.nist.gov/vuln/detail/CVE-2020-5529] > According to the report, arbitrary code can be run by malicious users. > HtmlUnit is used for test so the impact might not be large but it's better to > upgrade it just in case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-31765) Upgrade HtmlUnit >= 2.37.0
[ https://issues.apache.org/jira/browse/SPARK-31765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang reopened SPARK-31765: > Upgrade HtmlUnit >= 2.37.0 > -- > > Key: SPARK-31765 > URL: https://issues.apache.org/jira/browse/SPARK-31765 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Recently, a security issue which affects HtmlUnit is reported. > [https://nvd.nist.gov/vuln/detail/CVE-2020-5529] > According to the report, arbitrary code can be run by malicious users. > HtmlUnit is used for test so the impact might not be large but it's better to > upgrade it just in case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31765) Upgrade HtmlUnit >= 2.37.0
[ https://issues.apache.org/jira/browse/SPARK-31765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113609#comment-17113609 ] Apache Spark commented on SPARK-31765: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/28602 > Upgrade HtmlUnit >= 2.37.0 > -- > > Key: SPARK-31765 > URL: https://issues.apache.org/jira/browse/SPARK-31765 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Recently, a security issue which affects HtmlUnit is reported. > [https://nvd.nist.gov/vuln/detail/CVE-2020-5529] > According to the report, arbitrary code can be run by malicious users. > HtmlUnit is used for test so the impact might not be large but it's better to > upgrade it just in case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113595#comment-17113595 ] Dongjoon Hyun commented on SPARK-31786: --- BTW, is there any chance for you to test Apache Spark 3.0 RC2 which is the latest binary? - https://dist.apache.org/repos/dist/dev/spark/v3.0.0-rc2-bin/ > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Maciej Bryński >Priority: Major > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > 
Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at >
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113588#comment-17113588 ] Apache Spark commented on SPARK-31786: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28601 > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Maciej Bryński >Priority: Major > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe 
(Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at >
[jira] [Assigned] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31786: Assignee: Apache Spark > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Maciej Bryński >Assignee: Apache Spark >Priority: Major > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > 
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) > at >
[jira] [Assigned] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31786: Assignee: (was: Apache Spark) > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Maciej Bryński >Priority: Major > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native Method) > at > 
java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) > at > okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298) >
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113575#comment-17113575 ] Dongjoon Hyun commented on SPARK-31786: --- Thank you for reporting, [~maver1ck]. > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Maciej Bryński >Priority: Major > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at java.net.SocketOutputStream.socketWrite0(Native 
Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) > at >
[jira] [Commented] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113574#comment-17113574 ] Thomas Graves commented on SPARK-31788: --- [~sanket991], the link you provided does not point to the public Apache Spark repository; can you change the reference? > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113566#comment-17113566 ] Sanket Reddy commented on SPARK-31788: -- [https://git.ouroath.com/hadoop/spark/commit/f83fedc9f20869ab4c62bb07bac50113d921207f] It looks like sc.union does not check for the PairRDD type in PySpark. > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
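A minimal sketch of a possible stopgap for the failure quoted above, assuming the root cause is that a zipped pair RDD is backed by a JavaPairRDD on the JVM side, which sc.union cannot place into its JavaRDD array: re-serializing each pair RDD through a no-op map() on the Python side yields an RDD backed by a plain JavaRDD. This is an illustration inferred from the traceback, not the fix that eventually shipped, and the helper name as_plain_rdd is hypothetical.
{code:python}
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

rdd1 = sc.parallelize([1, 2, 3, 4, 5])
rdd2 = sc.parallelize([6, 7, 8, 9, 10])
pairRDD1 = rdd1.zip(rdd2)  # backed by a JavaPairRDD on the JVM side

def as_plain_rdd(rdd):
    # Hypothetical helper: an identity map() pushes the elements through a
    # Python pipeline, so the resulting RDD's underlying Java RDD is a plain
    # serialized JavaRDD rather than a JavaPairRDD.
    return rdd.map(lambda x: x)

# sc.union([pairRDD1, pairRDD1]) raises the Py4JError shown above;
# unioning the re-serialized copies should succeed.
unionRDD = sc.union([as_plain_rdd(pairRDD1), as_plain_rdd(pairRDD1)])
print(unionRDD.collect())
# expected: [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10),
#            (1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
{code}
Calling pairRDD1.union(pairRDD2) directly (the RDD method rather than sc.union) may also sidestep the JavaRDD array conversion, though that is untested here.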
[jira] [Updated] (SPARK-31788) Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanket Reddy updated SPARK-31788: - Description: Union RDD of Pair RDD's seems to have issues SparkSession available as 'spark'. >>> rdd1 = sc.parallelize([1,2,3,4,5]) >>> rdd2 = sc.parallelize([6,7,8,9,10]) >>> pairRDD1 = rdd1.zip(rdd2) >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd3 = sc.parallelize([11,12,13,14,15]) >>> pairRDD2 = rdd3.zip(rdd3) >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd4 = sc.parallelize(range(5)) >>> pairRDD3 = rdd4.zip(rdd4) >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, >>> 1), (2, 2), (3, 3), (4, 4)] 2.4.5 does not have this regression was: Pair RDD conversion seems to have issues SparkSession available as 'spark'. >>> rdd1 = sc.parallelize([1,2,3,4,5]) >>> rdd2 = sc.parallelize([6,7,8,9,10]) >>> pairRDD1 = rdd1.zip(rdd2) >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. 
Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd3 = sc.parallelize([11,12,13,14,15]) >>> pairRDD2 = rdd3.zip(rdd3) >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at
[jira] [Updated] (SPARK-31788) Error when creating UnionRDD of PairRDDs
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanket Reddy updated SPARK-31788: - Summary: Error when creating UnionRDD of PairRDDs (was: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD) > Error when creating UnionRDD of PairRDDs > > > Key: SPARK-31788 > URL: https://issues.apache.org/jira/browse/SPARK-31788 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0, 3.0.1 >Reporter: Sanket Reddy >Priority: Major > > Union RDD of Pair RDD's seems to have issues > SparkSession available as 'spark'. > >>> rdd1 = sc.parallelize([1,2,3,4,5]) > >>> rdd2 = sc.parallelize([6,7,8,9,10]) > >>> pairRDD1 = rdd1.zip(rdd2) > >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, > in union jrdds[i] = rdds[i]._jrdd > File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, > in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd3 = sc.parallelize([11,12,13,14,15]) > >>> pairRDD2 = rdd3.zip(rdd3) > >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) > Traceback (most recent call last): File "", line 1, in File > "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union > jrdds[i] = rdds[i]._jrdd File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 238, in _setitem_ File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", > line 221, in __set_item File > "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line > 332, in get_return_value py4j.protocol.Py4JError: An error occurred while > calling None.None. Trace: py4j.Py4JException: Cannot convert > org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at > py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at > py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at > py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748) > >>> rdd4 = sc.parallelize(range(5)) > >>> pairRDD3 = rdd4.zip(rdd4) > >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) > >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, > >>> 1), (2, 2), (3, 3), (4, 4)] > > 2.4.5 does not have this regression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31787) Fix Minikube.getIfNewMinikubeStatus to understand 1.5+
[ https://issues.apache.org/jira/browse/SPARK-31787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31787. --- Fix Version/s: 2.4.6 Assignee: Marcelo Masiero Vanzin Resolution: Fixed > Fix Minikube.getIfNewMinikubeStatus to understand 1.5+ > -- > > Key: SPARK-31787 > URL: https://issues.apache.org/jira/browse/SPARK-31787 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 2.4.6 >Reporter: Dongjoon Hyun >Assignee: Marcelo Masiero Vanzin >Priority: Minor > Fix For: 2.4.6 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31787) Fix Minikube.getIfNewMinikubeStatus to understand 1.5+
[ https://issues.apache.org/jira/browse/SPARK-31787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113537#comment-17113537 ] Dongjoon Hyun commented on SPARK-31787: --- This is merged to `branch-2.4` with the original authorship. > Fix Minikube.getIfNewMinikubeStatus to understand 1.5+ > -- > > Key: SPARK-31787 > URL: https://issues.apache.org/jira/browse/SPARK-31787 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 2.4.6 >Reporter: Dongjoon Hyun >Assignee: Marcelo Masiero Vanzin >Priority: Minor > Fix For: 2.4.6 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113523#comment-17113523 ] Apache Spark commented on SPARK-31761: -- User 'sandeep-katta' has created a pull request for this issue: https://github.com/apache/spark/pull/28600 > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113525#comment-17113525 ] Sandeep Katta commented on SPARK-31761: --- I have raised the Pull request [https://github.com/apache/spark/pull/28600] > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31761: Assignee: (was: Apache Spark) > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113524#comment-17113524 ] Apache Spark commented on SPARK-31761: -- User 'sandeep-katta' has created a pull request for this issue: https://github.com/apache/spark/pull/28600 > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31761: Assignee: Apache Spark > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Assignee: Apache Spark >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16323) Avoid unnecessary cast when doing integral divide
[ https://issues.apache.org/jira/browse/SPARK-16323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113522#comment-17113522 ] Apache Spark commented on SPARK-16323: -- User 'sandeep-katta' has created a pull request for this issue: https://github.com/apache/spark/pull/28600 > Avoid unnecessary cast when doing integral divide > - > > Key: SPARK-16323 > URL: https://issues.apache.org/jira/browse/SPARK-16323 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Sean Zhong >Assignee: Marco Gaido >Priority: Minor > Fix For: 3.0.0 > > > This is a follow up of issue SPARK-15776 > *Problem:* > For Integer divide operator div: > {code} > scala> spark.sql("select 6 div 3").explain(true) > ... > == Analyzed Logical Plan == > CAST((6 / 3) AS BIGINT): bigint > Project [cast((cast(6 as double) / cast(3 as double)) as bigint) AS CAST((6 / > 3) AS BIGINT)#5L] > +- OneRowRelation$ > ... > {code} > For performance reason, we should not do unnecessary cast {{cast(xx as > double)}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31788) Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD
[ https://issues.apache.org/jira/browse/SPARK-31788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sanket Reddy updated SPARK-31788: - Description: Pair RDD conversion seems to have issues SparkSession available as 'spark'. >>> rdd1 = sc.parallelize([1,2,3,4,5]) >>> rdd2 = sc.parallelize([6,7,8,9,10]) >>> pairRDD1 = rdd1.zip(rdd2) >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd3 = sc.parallelize([11,12,13,14,15]) >>> pairRDD2 = rdd3.zip(rdd3) >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd4 = sc.parallelize(range(5)) >>> pairRDD3 = rdd4.zip(rdd4) >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, >>> 1), (2, 2), (3, 3), (4, 4)] 2.4.5 does not have this regression was: Pair RDD conversion seems to have issues SparkSession available as 'spark'. >>> rdd1 = sc.parallelize([1,2,3,4,5]) >>> rdd2 = sc.parallelize([6,7,8,9,10]) >>> pairRDD1 = rdd1.zip(rdd2) >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. 
Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd3 = sc.parallelize([11,12,13,14,15]) >>> pairRDD2 = rdd3.zip(rdd3) >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at
[jira] [Created] (SPARK-31788) Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD
Sanket Reddy created SPARK-31788: Summary: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD Key: SPARK-31788 URL: https://issues.apache.org/jira/browse/SPARK-31788 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.0.0, 3.0.1 Reporter: Sanket Reddy Pair RDD conversion seems to have issues SparkSession available as 'spark'. >>> rdd1 = sc.parallelize([1,2,3,4,5]) >>> rdd2 = sc.parallelize([6,7,8,9,10]) >>> pairRDD1 = rdd1.zip(rdd2) >>> unionRDD1 = sc.union([pairRDD1, pairRDD1]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd3 = sc.parallelize([11,12,13,14,15]) >>> pairRDD2 = rdd3.zip(rdd3) >>> unionRDD2 = sc.union([pairRDD1, pairRDD2]) Traceback (most recent call last): File "", line 1, in File "/home/gs/spark/latest/python/pyspark/context.py", line 870, in union jrdds[i] = rdds[i]._jrdd File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 238, in _setitem_ File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/java_collections.py", line 221, in __set_item File "/home/gs/spark/latest/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 332, in get_return_value py4j.protocol.Py4JError: An error occurred while calling None.None. Trace: py4j.Py4JException: Cannot convert org.apache.spark.api.java.JavaPairRDD to org.apache.spark.api.java.JavaRDD at py4j.commands.ArrayCommand.convertArgument(ArrayCommand.java:166) at py4j.commands.ArrayCommand.setArray(ArrayCommand.java:144) at py4j.commands.ArrayCommand.execute(ArrayCommand.java:97) at py4j.GatewayConnection.run(GatewayConnection.java:238) at java.lang.Thread.run(Thread.java:748) >>> rdd4 = sc.parallelize(range(5)) >>> pairRDD3 = rdd4.zip(rdd4) >>> unionRDD3 = sc.union([pairRDD1, pairRDD3]) >>> unionRDD3.collect() [(1, 6), (2, 7), (3, 8), (4, 9), (5, 10), (0, 0), (1, >>> 1), (2, 2), (3, 3), (4, 4)] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31787) Fix Minikube.getIfNewMinikubeStatus to understand 1.5+
[ https://issues.apache.org/jira/browse/SPARK-31787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31787: -- Summary: Fix Minikube.getIfNewMinikubeStatus to understand 1.5+ (was: Support Minikube 1.5.x in K8s IT) > Fix Minikube.getIfNewMinikubeStatus to understand 1.5+ > -- > > Key: SPARK-31787 > URL: https://issues.apache.org/jira/browse/SPARK-31787 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 2.4.6 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31787) Support Minikube 1.5.x in K8s IT
[ https://issues.apache.org/jira/browse/SPARK-31787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31787: Assignee: Apache Spark > Support Minikube 1.5.x in K8s IT > > > Key: SPARK-31787 > URL: https://issues.apache.org/jira/browse/SPARK-31787 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 2.4.6 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31787) Support Minikube 1.5.x in K8s IT
[ https://issues.apache.org/jira/browse/SPARK-31787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31787: Assignee: (was: Apache Spark) > Support Minikube 1.5.x in K8s IT > > > Key: SPARK-31787 > URL: https://issues.apache.org/jira/browse/SPARK-31787 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 2.4.6 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31787) Support Minikube 1.5.x in K8s IT
[ https://issues.apache.org/jira/browse/SPARK-31787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113503#comment-17113503 ] Apache Spark commented on SPARK-31787: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28599 > Support Minikube 1.5.x in K8s IT > > > Key: SPARK-31787 > URL: https://issues.apache.org/jira/browse/SPARK-31787 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 2.4.6 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31787) Support Minikube 1.5.x in K8s IT
Dongjoon Hyun created SPARK-31787: - Summary: Support Minikube 1.5.x in K8s IT Key: SPARK-31787 URL: https://issues.apache.org/jira/browse/SPARK-31787 Project: Spark Issue Type: Improvement Components: Kubernetes, Tests Affects Versions: 2.4.6 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31787) Support Minikube 1.5.x in K8s IT
[ https://issues.apache.org/jira/browse/SPARK-31787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31787: -- Priority: Minor (was: Major) > Support Minikube 1.5.x in K8s IT > > > Key: SPARK-31787 > URL: https://issues.apache.org/jira/browse/SPARK-31787 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Tests >Affects Versions: 2.4.6 >Reporter: Dongjoon Hyun >Priority: Minor > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31765) Upgrade HtmlUnit >= 2.37.0
[ https://issues.apache.org/jira/browse/SPARK-31765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang resolved SPARK-31765. Resolution: Fixed > Upgrade HtmlUnit >= 2.37.0 > -- > > Key: SPARK-31765 > URL: https://issues.apache.org/jira/browse/SPARK-31765 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Recently, a security issue which affects HtmlUnit is reported. > [https://nvd.nist.gov/vuln/detail/CVE-2020-5529] > According to the report, arbitrary code can be run by malicious users. > HtmlUnit is used for test so the impact might not be large but it's better to > upgrade it just in case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31765) Upgrade HtmlUnit >= 2.37.0
[ https://issues.apache.org/jira/browse/SPARK-31765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113460#comment-17113460 ] Gengliang Wang commented on SPARK-31765: This issue is resolved in https://github.com/apache/spark/pull/28585 > Upgrade HtmlUnit >= 2.37.0 > -- > > Key: SPARK-31765 > URL: https://issues.apache.org/jira/browse/SPARK-31765 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Kousuke Saruta >Assignee: Kousuke Saruta >Priority: Major > > Recently, a security issue which affects HtmlUnit is reported. > [https://nvd.nist.gov/vuln/detail/CVE-2020-5529] > According to the report, arbitrary code can be run by malicious users. > HtmlUnit is used for test so the impact might not be large but it's better to > upgrade it just in case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113453#comment-17113453 ] Sandeep Katta commented on SPARK-31761: --- Okay, I will try to fix it. > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113451#comment-17113451 ] Dongjoon Hyun commented on SPARK-31761: --- If possible, could you try to find a way not to revert the existing commit, [~sandeep.katta2007]? > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-29303) UI updates for stage level scheduling
[ https://issues.apache.org/jira/browse/SPARK-29303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-29303. --- Fix Version/s: 3.1.0 Assignee: Thomas Graves Resolution: Fixed > UI updates for stage level scheduling > - > > Key: SPARK-29303 > URL: https://issues.apache.org/jira/browse/SPARK-29303 > Project: Spark > Issue Type: Story > Components: Web UI >Affects Versions: 3.0.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Major > Fix For: 3.1.0 > > > Update the UI to show information about stage level scheduling. > The stage pages should have what resources were required for instance. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31688) Refactor pagination framework for spark web UI pages
[ https://issues.apache.org/jira/browse/SPARK-31688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31688. -- Fix Version/s: 3.1.0 Assignee: Rakesh Raushan Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28512 > Refactor pagination framework for spark web UI pages > > > Key: SPARK-31688 > URL: https://issues.apache.org/jira/browse/SPARK-31688 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Rakesh Raushan >Assignee: Rakesh Raushan >Priority: Minor > Fix For: 3.1.0 > > > Currently, a large chunk of code is copied when we implement pagination using > the current pagination framework. We also embed HTML a lot, this decreases > code readability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31688) Refactor pagination framework for spark web UI pages
[ https://issues.apache.org/jira/browse/SPARK-31688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31688: - Issue Type: Improvement (was: Bug) > Refactor pagination framework for spark web UI pages > > > Key: SPARK-31688 > URL: https://issues.apache.org/jira/browse/SPARK-31688 > Project: Spark > Issue Type: Improvement > Components: Web UI >Affects Versions: 3.1.0 >Reporter: Rakesh Raushan >Priority: Minor > > Currently, a large chunk of code is copied when we implement pagination using > the current pagination framework. We also embed HTML a lot, this decreases > code readability. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
[ https://issues.apache.org/jira/browse/SPARK-31786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113429#comment-17113429 ] Maciej Bryński commented on SPARK-31786: I think this is related to: [https://github.com/fabric8io/kubernetes-client/issues/2145] > Exception on submitting Spark-Pi to Kubernetes 1.17.3 > - > > Key: SPARK-31786 > URL: https://issues.apache.org/jira/browse/SPARK-31786 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Maciej Bryński >Priority: Major > > Hi, > I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. > Kubernetes version: 1.17.3 > JDK version: openjdk version "1.8.0_252" > Exception: > {code} > ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode > cluster --name spark-pi --conf > spark.kubernetes.container.image=spark-py:2.4.5 --conf > spark.kubernetes.executor.request.cores=0.1 --conf > spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf > spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py > log4j:WARN No appenders could be found for logger > (io.fabric8.kubernetes.client.Config). > log4j:WARN Please initialize the log4j system properly. > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more > info. > Using Spark's default log4j profile: > org/apache/spark/log4j-defaults.properties > Exception in thread "main" > io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] > for kind: [Pod] with name: [null] in namespace: [default] failed. > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) > at > io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) > at > io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) > at > org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) > at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) > at > org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > Caused by: java.net.SocketException: Broken pipe (Write failed) > at 
java.net.SocketOutputStream.socketWrite0(Native Method) > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) > at java.net.SocketOutputStream.write(SocketOutputStream.java:155) > at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) > at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) > at > sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) > at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) > at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) > at okio.Okio$1.write(Okio.java:79) > at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) > at okio.RealBufferedSink.flush(RealBufferedSink.java:224) > at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) > at > okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) >
[jira] [Updated] (SPARK-31763) DataFrame.inputFiles() not Available
[ https://issues.apache.org/jira/browse/SPARK-31763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Kizhakkel Jose updated SPARK-31763: - Issue Type: Bug (was: New Feature) > DataFrame.inputFiles() not Available > > > Key: SPARK-31763 > URL: https://issues.apache.org/jira/browse/SPARK-31763 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.5 >Reporter: Felix Kizhakkel Jose >Priority: Major > > I have been trying to list the input files that compose my DataSet by using > *PySpark* > spark_session.read > .format(sourceFileFormat) > .load(S3A_FILESYSTEM_PREFIX + bucket + File.separator + sourceFolderPrefix) > *.inputFiles();* > but I get an exception saying the inputFiles attribute is not present. I was > able to get this functionality with Spark Java. > *So is this something missing in PySpark?* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
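For PySpark releases that lack the Python wrapper, a hedged stopgap is to call the JVM-side method through the DataFrame's internal _jdf handle; _jdf is not a public API, and the format and path below are illustrative.
{code:python}
# Stopgap sketch: inputFiles() exists on the JVM DataFrame, so reach through
# the internal (non-public) _jdf handle. The source format and s3a path are
# illustrative placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.format("parquet").load("s3a://some-bucket/some-prefix")

# py4j exposes the returned Java String[] as an iterable sequence.
input_files = list(df._jdf.inputFiles())
print(input_files)
{code}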
[jira] [Assigned] (SPARK-31354) SparkSession Lifecycle methods to fix memory leak
[ https://issues.apache.org/jira/browse/SPARK-31354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31354: --- Assignee: Vinoo Ganesh > SparkSession Lifecycle methods to fix memory leak > - > > Key: SPARK-31354 > URL: https://issues.apache.org/jira/browse/SPARK-31354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Vinoo Ganesh >Assignee: Vinoo Ganesh >Priority: Major > Fix For: 3.0.0 > > > Follow up to https://issues.apache.org/jira/browse/SPARK-27958 after > discussion on [https://github.com/apache/spark/pull/24807]. > > Let's instead expose methods that allow the user to manually clean up > (terminate) a SparkSession, that also remove the listenerState from the > context. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113344#comment-17113344 ] Sandeep Katta commented on SPARK-31761: --- But to fix this, do we need to revert https://issues.apache.org/jira/browse/SPARK-16323, or do we just cast the input to long and divide? > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113337#comment-17113337 ] Sandeep Katta commented on SPARK-31761: --- I can fix this; I will raise a PR. > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31354) SparkSession Lifecycle methods to fix memory leak
[ https://issues.apache.org/jira/browse/SPARK-31354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31354. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28128 [https://github.com/apache/spark/pull/28128] > SparkSession Lifecycle methods to fix memory leak > - > > Key: SPARK-31354 > URL: https://issues.apache.org/jira/browse/SPARK-31354 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Vinoo Ganesh >Priority: Major > Fix For: 3.0.0 > > > Follow up to https://issues.apache.org/jira/browse/SPARK-27958 after > discussion on [https://github.com/apache/spark/pull/24807]. > > Let's instead expose methods that allow the user to manually clean up > (terminate) a SparkSession, that also remove the listenerState from the > context. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113309#comment-17113309 ] Wenchen Fan commented on SPARK-31761: - [~sandeep.katta2007] are you interested in fixing it? > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113308#comment-17113308 ] Wenchen Fan commented on SPARK-31761: - I think this is a breaking change and we should fix it. The `div` operator always returns long type, so this should not overflow. The `IntegralDivide` should cast input to long and divide. > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
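A short sketch of the semantics described above, reusing the report's _c0/_c1 column names; the explicit bigint casts model the widen-then-divide behaviour and are an illustration rather than the eventual patch.
{code:python}
# Sketch of the widen-then-divide idea: `div` yields BIGINT, so
# INT_MIN div -1 should be 2147483648 instead of overflowing.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(-2147483648, -1)], "_c0 int, _c1 int")

# Affected versions return -2147483648 here: the 32-bit division overflows
# before the result is widened to long.
df.selectExpr("_c0 div _c1").show()

# Widening the inputs first gives the expected result, 2147483648.
df.selectExpr("cast(_c0 as bigint) div cast(_c1 as bigint)").show()
{code}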
[jira] [Commented] (SPARK-31774) getting the Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, at org.apache.spark.sql.catalyst.errors.package$.attachTree(p
[ https://issues.apache.org/jira/browse/SPARK-31774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113297#comment-17113297 ] Pankaj Tiwari commented on SPARK-31774: --- Hi [~jobitmathew], I have a column name like "name -> valie1 value2 value2 @cost order", and the failure occurs only for this kind of column name. What is strange is that it fails when the data volume is large but passes for the same column when the data volume is small. It fails when I call the count() method; I suspect count() internally applies some group-by logic, though I am not sure. Any suggestions? > getting the Caused by: > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > --- > > Key: SPARK-31774 > URL: https://issues.apache.org/jira/browse/SPARK-31774 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 > Environment: spark 2.2 >Reporter: Pankaj Tiwari >Priority: Major > > Actually I am loading the excel which has some 90 columns and the some > columns name contains special character as well like @ % -> . etc etc so > while I am doing one use case like : > sourceDataSet.select(columnSeq).except(targetDataset.select(columnSeq))); > this is working fine but as soon as I am running > sourceDataSet.select(columnSeq).except(targetDataset.select(columnSeq)).count() > it is failing with error like : > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: > Exchange SinglePartition > +- *HashAggregate(keys=[], functions=[partial_count(1)], > output=[count#26596L]) > +- *HashAggregate(keys=columns name > > > Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Binding attribute, tree:column namet#14050 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$40.apply(HashAggregateExec.scala:703) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$40.apply(HashAggregateExec.scala:703) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233) > at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233) > at 
scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223) > at scala.collection.immutable.Stream.foreach(Stream.scala:595) > at > scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) > at scala.collection.AbstractTraversable.count(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.createCode(GenerateUnsafeProjection.scala:312) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithKeys(HashAggregateExec.scala:702) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsume(HashAggregateExec.scala:156) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:155) > at > org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:36) > > > > > Caused by: java.lang.RuntimeException:
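If the trigger really is the special characters in the column names, as this thread suggests, one defensive workaround sketch is to rename the columns to plain identifiers before the except/count step; the sanitize_columns helper and the source_df/target_df names below are hypothetical.
{code:python}
# Hypothetical workaround sketch: normalize special-character column names
# (e.g. "name -> value @cost order") to plain identifiers before running
# the except/count step. sanitize_columns, source_df and target_df are
# illustrative names, not Spark API.
import re

def sanitize_columns(df):
    """Rename every column to a letters/digits/underscores-only identifier."""
    for c in df.columns:
        df = df.withColumnRenamed(c, re.sub(r"[^0-9A-Za-z_]", "_", c))
    return df

# subtract() is PySpark's EXCEPT DISTINCT; count() then runs the aggregate
# that this report says fails on the unsanitized names.
row_diff_count = (sanitize_columns(source_df)
                  .subtract(sanitize_columns(target_df))
                  .count())
{code}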
[jira] [Created] (SPARK-31786) Exception on submitting Spark-Pi to Kubernetes 1.17.3
Maciej Bryński created SPARK-31786: -- Summary: Exception on submitting Spark-Pi to Kubernetes 1.17.3 Key: SPARK-31786 URL: https://issues.apache.org/jira/browse/SPARK-31786 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 2.4.5 Reporter: Maciej Bryński Hi, I'm getting exception when submitting Spark-Pi app to Kubernetes cluster. Kubernetes version: 1.17.3 JDK version: openjdk version "1.8.0_252" Exception: {code} ./bin/spark-submit --master k8s://https://172.31.23.60:8443 --deploy-mode cluster --name spark-pi --conf spark.kubernetes.container.image=spark-py:2.4.5 --conf spark.kubernetes.executor.request.cores=0.1 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark --conf spark.executor.instances=1 local:///opt/spark/examples/src/main/python/pi.py log4j:WARN No appenders could be found for logger (io.fabric8.kubernetes.client.Config). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Operation: [create] for kind: [Pod] with name: [null] in namespace: [default] failed. at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64) at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:337) at io.fabric8.kubernetes.client.dsl.base.BaseOperation.create(BaseOperation.java:330) at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:141) at org.apache.spark.deploy.k8s.submit.Client$$anonfun$run$2.apply(KubernetesClientApplication.scala:140) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at org.apache.spark.deploy.k8s.submit.Client.run(KubernetesClientApplication.scala:140) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:250) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication$$anonfun$run$5.apply(KubernetesClientApplication.scala:241) at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2543) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.run(KubernetesClientApplication.scala:241) at org.apache.spark.deploy.k8s.submit.KubernetesClientApplication.start(KubernetesClientApplication.scala:204) at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845) at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.net.SocketException: Broken pipe (Write failed) at java.net.SocketOutputStream.socketWrite0(Native Method) at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111) at java.net.SocketOutputStream.write(SocketOutputStream.java:155) at sun.security.ssl.OutputRecord.writeBuffer(OutputRecord.java:431) at sun.security.ssl.OutputRecord.write(OutputRecord.java:417) at 
sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:894) at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:865) at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123) at okio.Okio$1.write(Okio.java:79) at okio.AsyncTimeout$1.write(AsyncTimeout.java:180) at okio.RealBufferedSink.flush(RealBufferedSink.java:224) at okhttp3.internal.http2.Http2Writer.settings(Http2Writer.java:203) at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:515) at okhttp3.internal.http2.Http2Connection.start(Http2Connection.java:505) at okhttp3.internal.connection.RealConnection.startHttp2(RealConnection.java:298) at okhttp3.internal.connection.RealConnection.establishProtocol(RealConnection.java:287) at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:168) at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257) at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
[jira] [Commented] (SPARK-31774) getting the Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, at org.apache.spark.sql.catalyst.errors.package$.attachTree(p
[ https://issues.apache.org/jira/browse/SPARK-31774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113184#comment-17113184 ] jobit mathew commented on SPARK-31774: -- [~pankaj24] Does the issue occur only in Spark 2.2? Maybe you can try the latest Spark 2.4.5 or the 3.0 preview. > getting the Caused by: > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > --- > > Key: SPARK-31774 > URL: https://issues.apache.org/jira/browse/SPARK-31774 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.0 > Environment: spark 2.2 >Reporter: Pankaj Tiwari >Priority: Major > > I am loading an Excel file that has about 90 columns, and some of the > column names contain special characters such as @, %, -> and '.'. While > running this use case: > sourceDataSet.select(columnSeq).except(targetDataset.select(columnSeq))); > this works fine, but as soon as I run > sourceDataSet.select(columnSeq).except(targetDataset.select(columnSeq)).count() > it fails with an error like: > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree: > Exchange SinglePartition > +- *HashAggregate(keys=[], functions=[partial_count(1)], > output=[count#26596L]) > +- *HashAggregate(keys=columns name > > > Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: > Binding attribute, tree:column namet#14050 > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:88) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:87) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:267) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:266) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:256) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:87) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$40.apply(HashAggregateExec.scala:703) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec$$anonfun$40.apply(HashAggregateExec.scala:703) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233) > at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at > scala.collection.immutable.Stream$$anonfun$map$1.apply(Stream.scala:418) > at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1233) > at scala.collection.immutable.Stream$Cons.tail(Stream.scala:1223) > at scala.collection.immutable.Stream.foreach(Stream.scala:595) > at > scala.collection.TraversableOnce$class.count(TraversableOnce.scala:115) > at scala.collection.AbstractTraversable.count(Traversable.scala:104) > at > org.apache.spark.sql.catalyst.expressions.codegen.GenerateUnsafeProjection$.createCode(GenerateUnsafeProjection.scala:312) > 
at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithKeys(HashAggregateExec.scala:702) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsume(HashAggregateExec.scala:156) > at > org.apache.spark.sql.execution.CodegenSupport$class.consume(WholeStageCodegenExec.scala:155) > at > org.apache.spark.sql.execution.ProjectExec.consume(basicPhysicalOperators.scala:36) > > > > > Caused by: java.lang.RuntimeException: Couldn't find here one name of column > following with > at scala.sys.package$.error(package.scala:27) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1$$anonfun$applyOrElse$1.apply(BoundAttribute.scala:94) > at >
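A common workaround on Spark 2.2 for special characters in column names is to rename the columns before running the except/count. This is not from the ticket, only a minimal sketch assuming the sourceDataSet and targetDataset DataFrames from the report; the sanitize helper is hypothetical:

{code:scala}
import org.apache.spark.sql.DataFrame

// Hypothetical helper: map every character outside [a-zA-Z0-9_] to '_',
// so analysis and codegen never see names like "rate%" or "a->b".
// Note: distinct names that sanitize to the same string would collide.
def sanitize(name: String): String = name.replaceAll("[^a-zA-Z0-9_]", "_")

def withCleanColumns(df: DataFrame): DataFrame =
  df.toDF(df.columns.map(sanitize): _*)

// except(...).count() now binds against the sanitized attribute names.
val diffCount = withCleanColumns(sourceDataSet)
  .except(withCleanColumns(targetDataset))
  .count()
{code}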
[jira] [Commented] (SPARK-31785) Add a helper function to test all parquet readers
[ https://issues.apache.org/jira/browse/SPARK-31785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113113#comment-17113113 ] Apache Spark commented on SPARK-31785: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/28598 > Add a helper function to test all parquet readers > - > > Key: SPARK-31785 > URL: https://issues.apache.org/jira/browse/SPARK-31785 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Add a method, withAllParquetReaders {}, that runs a block of code for all > supported Parquet readers, and re-use it in the test suites. This should > de-duplicate code and allow OSS Spark-based projects that have their own > Parquet readers to re-use the existing tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
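The pull request itself is not quoted above; as a rough illustration, such a helper could look like the sketch below inside a Spark SQL test suite. It assumes the suite mixes in SQLTestUtils for withSQLConf, and it assumes the two supported readers are toggled via the vectorized-reader flag:

{code:scala}
import org.apache.spark.sql.internal.SQLConf

// Run `code` once per supported Parquet reader: the vectorized reader
// and the non-vectorized (parquet-mr) reader.
protected def withAllParquetReaders(code: => Unit): Unit = {
  Seq("true", "false").foreach { vectorized =>
    withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> vectorized) {
      code
    }
  }
}

// Typical usage in a suite:
// withAllParquetReaders {
//   checkAnswer(spark.read.parquet(path), expectedRows)
// }
{code}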
[jira] [Assigned] (SPARK-31785) Add a helper function to test all parquet readers
[ https://issues.apache.org/jira/browse/SPARK-31785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31785: Assignee: Apache Spark > Add a helper function to test all parquet readers > - > > Key: SPARK-31785 > URL: https://issues.apache.org/jira/browse/SPARK-31785 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Add a method, withAllParquetReaders {}, that runs a block of code for all > supported Parquet readers, and re-use it in the test suites. This should > de-duplicate code and allow OSS Spark-based projects that have their own > Parquet readers to re-use the existing tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31785) Add a helper function to test all parquet readers
[ https://issues.apache.org/jira/browse/SPARK-31785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113110#comment-17113110 ] Apache Spark commented on SPARK-31785: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/28598 > Add a helper function to test all parquet readers > - > > Key: SPARK-31785 > URL: https://issues.apache.org/jira/browse/SPARK-31785 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Add a method, withAllParquetReaders {}, that runs a block of code for all > supported Parquet readers, and re-use it in the test suites. This should > de-duplicate code and allow OSS Spark-based projects that have their own > Parquet readers to re-use the existing tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31785) Add a helper function to test all parquet readers
[ https://issues.apache.org/jira/browse/SPARK-31785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31785: Assignee: (was: Apache Spark) > Add a helper function to test all parquet readers > - > > Key: SPARK-31785 > URL: https://issues.apache.org/jira/browse/SPARK-31785 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Add a method, withAllParquetReaders {}, that runs a block of code for all > supported Parquet readers, and re-use it in the test suites. This should > de-duplicate code and allow OSS Spark-based projects that have their own > Parquet readers to re-use the existing tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31785) Add a helper function to test all parquet readers
Maxim Gekk created SPARK-31785: -- Summary: Add a helper function to test all parquet readers Key: SPARK-31785 URL: https://issues.apache.org/jira/browse/SPARK-31785 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Add a method, withAllParquetReaders {}, that runs a block of code for all supported Parquet readers, and re-use it in the test suites. This should de-duplicate code and allow OSS Spark-based projects that have their own Parquet readers to re-use the existing tests. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31448) Difference in Storage Levels used in cache() and persist() for pyspark dataframes
[ https://issues.apache.org/jira/browse/SPARK-31448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17113035#comment-17113035 ] Abhishek Dixit commented on SPARK-31448: [~tianshi] [~hyukjin.kwon] Any update on this? > Difference in Storage Levels used in cache() and persist() for pyspark > dataframes > - > > Key: SPARK-31448 > URL: https://issues.apache.org/jira/browse/SPARK-31448 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.3 >Reporter: Abhishek Dixit >Priority: Major > > There is a difference in the default storage level *MEMORY_AND_DISK* between > pyspark and scala. > *Scala*: StorageLevel(true, true, false, true) > *Pyspark:* StorageLevel(True, True, False, False) > > *Problem Description:* > Calling *df.cache()* on a pyspark dataframe directly invokes the Scala method > cache(), and the storage level used is StorageLevel(true, true, false, true). > But calling *df.persist()* on a pyspark dataframe sets > newStorageLevel=StorageLevel(true, true, false, false) inside pyspark and > then invokes the Scala function persist(newStorageLevel). > *Possible Fix:* > Invoke the pyspark function persist inside the pyspark function cache instead > of calling the Scala function directly. > I can raise a PR for this fix if someone can confirm that this is a bug and > the possible fix is the correct approach. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
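For reference, the Scala-side default described in the report can be checked directly in a spark-shell; a minimal sketch, not taken from the ticket:

{code:scala}
import org.apache.spark.storage.StorageLevel

// Scala's MEMORY_AND_DISK is StorageLevel(useDisk = true, useMemory = true,
// useOffHeap = false, deserialized = true), so blocks stay deserialized in
// memory, unlike the pyspark default described above.
println(StorageLevel.MEMORY_AND_DISK.deserialized)  // true
{code}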
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112924#comment-17112924 ] Hyukjin Kwon commented on SPARK-31761: -- [~sandeep.katta2007] feel free to open a followup PR against SPARK-16323 in order to document the overflow in the migration guide. cc [~cloud_fan] and [~mgaido] fyi > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31761) Sql Div operator can result in incorrect output for int_min
[ https://issues.apache.org/jira/browse/SPARK-31761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112918#comment-17112918 ] Hyukjin Kwon commented on SPARK-31761: -- If it overflows, it looks like it should just be guarded by {{spark.sql.ansi.enabled}} and throw an exception on the overflow. > Sql Div operator can result in incorrect output for int_min > --- > > Key: SPARK-31761 > URL: https://issues.apache.org/jira/browse/SPARK-31761 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Kuhu Shukla >Priority: Major > > Input in csv : -2147483648,-1 --> (_c0, _c1) > {code} > val res = df.selectExpr("_c0 div _c1") > res.collect > res1: Array[org.apache.spark.sql.Row] = Array([-2147483648]) > {code} > The result should be 2147483648 instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
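The wrap-around itself is plain two's-complement integer semantics on the JVM and can be reproduced outside Spark; a minimal sketch, not taken from the ticket:

{code:scala}
// JLS 15.17.2: dividing the most negative int by -1 overflows, and the
// result equals the dividend, because 2147483648 does not fit in 32 bits.
val wrapped = Int.MinValue / -1        // -2147483648, silently wrong
// Widening to Long first yields the mathematically correct quotient,
// which is what an ANSI-style overflow check would need to detect.
val exact = Int.MinValue.toLong / -1L  // 2147483648
{code}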
[jira] [Assigned] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31784: Assignee: Apache Spark > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > {code:java} > test("share messages with allGather() call") { > val conf = new SparkConf() > .setMaster("local-cluster[4, 1, 1024]") > .setAppName("test-cluster") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 4) > val rdd2 = rdd.barrier().mapPartitions { it => > val context = BarrierTaskContext.get() > // Sleep for a random time before global sync. > Thread.sleep(Random.nextInt(1000)) > // Pass partitionId message in > val message: String = context.partitionId().toString > val messages: Array[String] = context.allGather(message) > messages.toList.iterator >} >// Take a sorted list of all the partitionId messages >val messages = rdd2.collect().head >// All the task partitionIds are shared >for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) > } > {code} > In this test, the desired `messages`(a.k.a rdd2.collect().head) should be > ["0", "1", "2", "3"], but is "0" in reality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31784: Assignee: (was: Apache Spark) > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Priority: Major > > {code:java} > test("share messages with allGather() call") { > val conf = new SparkConf() > .setMaster("local-cluster[4, 1, 1024]") > .setAppName("test-cluster") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 4) > val rdd2 = rdd.barrier().mapPartitions { it => > val context = BarrierTaskContext.get() > // Sleep for a random time before global sync. > Thread.sleep(Random.nextInt(1000)) > // Pass partitionId message in > val message: String = context.partitionId().toString > val messages: Array[String] = context.allGather(message) > messages.toList.iterator >} >// Take a sorted list of all the partitionId messages >val messages = rdd2.collect().head >// All the task partitionIds are shared >for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) > } > {code} > In this test, the desired `messages`(a.k.a rdd2.collect().head) should be > ["0", "1", "2", "3"], but is "0" in reality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17112913#comment-17112913 ] Apache Spark commented on SPARK-31784: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28596 > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Priority: Major > > {code:java} > test("share messages with allGather() call") { > val conf = new SparkConf() > .setMaster("local-cluster[4, 1, 1024]") > .setAppName("test-cluster") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 4) > val rdd2 = rdd.barrier().mapPartitions { it => > val context = BarrierTaskContext.get() > // Sleep for a random time before global sync. > Thread.sleep(Random.nextInt(1000)) > // Pass partitionId message in > val message: String = context.partitionId().toString > val messages: Array[String] = context.allGather(message) > messages.toList.iterator >} >// Take a sorted list of all the partitionId messages >val messages = rdd2.collect().head >// All the task partitionIds are shared >for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) > } > {code} > In this test, the desired `messages`(a.k.a rdd2.collect().head) should be > ["0", "1", "2", "3"], but is "0" in reality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
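The linked pull request is not quoted here; one plausible way to tighten the flaky assertion is to validate the gathered list from every partition instead of only collect().head. A sketch, assuming the suite's SparkContext sc is set up as in the quoted test:

{code:scala}
import scala.util.Random
import org.apache.spark.BarrierTaskContext

val rdd = sc.makeRDD(1 to 10, 4)
val gathered = rdd.barrier().mapPartitions { _ =>
  val context = BarrierTaskContext.get()
  // Sleep for a random time before the global sync, as in the test.
  Thread.sleep(Random.nextInt(1000))
  // Emit exactly one element per partition: the full gathered list.
  Iterator.single(context.allGather(context.partitionId().toString).toList)
}.collect()

// Every partition should observe all four partition ids, in order.
assert(gathered.forall(_ == List("0", "1", "2", "3")))
{code}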
[jira] [Updated] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31784: - Description: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages`(a.k.a rdd2.collect().head) should be ["0", "1", "2", "3"], but is "0" in reality. was: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages` should be ["0", "1", "2", "3"], but is "0" in reality. > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Priority: Major > > {code:java} > test("share messages with allGather() call") { > val conf = new SparkConf() > .setMaster("local-cluster[4, 1, 1024]") > .setAppName("test-cluster") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 4) > val rdd2 = rdd.barrier().mapPartitions { it => > val context = BarrierTaskContext.get() > // Sleep for a random time before global sync. > Thread.sleep(Random.nextInt(1000)) > // Pass partitionId message in > val message: String = context.partitionId().toString > val messages: Array[String] = context.allGather(message) > messages.toList.iterator >} >// Take a sorted list of all the partitionId messages >val messages = rdd2.collect().head >// All the task partitionIds are shared >for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) > } > {code} > In this test, the desired `messages`(a.k.a rdd2.collect().head) should be > ["0", "1", "2", "3"], but is "0" in reality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31784: - Description: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages` should be ["0", "1", "2", "3"], but is "0" in reality. was: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages` should be ["0", "1", "2", "3"], but is "0" in reality. > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Priority: Major > > {code:java} > test("share messages with allGather() call") { > val conf = new SparkConf() > .setMaster("local-cluster[4, 1, 1024]") > .setAppName("test-cluster") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 4) > val rdd2 = rdd.barrier().mapPartitions { it => > val context = BarrierTaskContext.get() > // Sleep for a random time before global sync. > Thread.sleep(Random.nextInt(1000)) > // Pass partitionId message in > val message: String = context.partitionId().toString > val messages: Array[String] = context.allGather(message) > messages.toList.iterator >} >// Take a sorted list of all the partitionId messages >val messages = rdd2.collect().head >// All the task partitionIds are shared >for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) > } > {code} > In this test, the desired `messages` should be ["0", "1", "2", "3"], but is > "0" in reality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31784: - Description: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages` should be ["0", "1", "2", "3"], but is "0" in reality. was: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages` should be ["0", "1", "2", "3"], but only "0" in reality. > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Priority: Major > > {code:java} > test("share messages with allGather() call") { > val conf = new SparkConf() > .setMaster("local-cluster[4, 1, 1024]") > .setAppName("test-cluster") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 4) > val rdd2 = rdd.barrier().mapPartitions { it => > val context = BarrierTaskContext.get() > // Sleep for a random time before global sync. > Thread.sleep(Random.nextInt(1000)) > // Pass partitionId message in > val message: String = context.partitionId().toString > val messages: Array[String] = context.allGather(message) > messages.toList.iterator >} >// Take a sorted list of all the partitionId messages >val messages = rdd2.collect().head >// All the task partitionIds are shared >for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) > } > {code} > In this test, the desired `messages` should be ["0", "1", "2", "3"], but is > "0" in reality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31784: - Description: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages` should be ["0", "1", "2", "3"], but only "0" in reality. > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Priority: Major > > {code:java} > test("share messages with allGather() call") { > val conf = new SparkConf() > .setMaster("local-cluster[4, 1, 1024]") > .setAppName("test-cluster") > sc = new SparkContext(conf) > val rdd = sc.makeRDD(1 to 10, 4) > val rdd2 = rdd.barrier().mapPartitions { it => > val context = BarrierTaskContext.get() > // Sleep for a random time before global sync. > Thread.sleep(Random.nextInt(1000)) > // Pass partitionId message in > val message: String = context.partitionId().toString > val messages: Array[String] = context.allGather(message) > messages.toList.iterator >} >// Take a sorted list of all the partitionId messages >val messages = rdd2.collect().head >// All the task partitionIds are shared >for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) > } > {code} > In this test, the desired `messages` should be ["0", "1", "2", "3"], but only > "0" in reality. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31784) Fix test BarrierTaskContextSuite."share messages with allGather() call"
[ https://issues.apache.org/jira/browse/SPARK-31784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31784: - Environment: {code:java} {code} was: {code:java} test("share messages with allGather() call") { val conf = new SparkConf() .setMaster("local-cluster[4, 1, 1024]") .setAppName("test-cluster") sc = new SparkContext(conf) val rdd = sc.makeRDD(1 to 10, 4) val rdd2 = rdd.barrier().mapPartitions { it => val context = BarrierTaskContext.get() // Sleep for a random time before global sync. Thread.sleep(Random.nextInt(1000)) // Pass partitionId message in val message: String = context.partitionId().toString val messages: Array[String] = context.allGather(message) messages.toList.iterator } // Take a sorted list of all the partitionId messages val messages = rdd2.collect().head // All the task partitionIds are shared for((x, i) <- messages.view.zipWithIndex) assert(x.toString == i.toString) } {code} In this test, the desired `messages` should be ["0", "1", "2", "3"], but only "0" in reality. > Fix test BarrierTaskContextSuite."share messages with allGather() call" > --- > > Key: SPARK-31784 > URL: https://issues.apache.org/jira/browse/SPARK-31784 > Project: Spark > Issue Type: Test > Components: Spark Core >Affects Versions: 3.1.0 > Environment: > {code:java} > {code} > >Reporter: wuyi >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org