[jira] [Commented] (SPARK-26466) Use ConfigEntry for hardcoded configs for submit categories.
[ https://issues.apache.org/jira/browse/SPARK-26466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741480#comment-16741480 ] Jungtaek Lim commented on SPARK-26466: -- I'm working on this. > Use ConfigEntry for hardcoded configs for submit categories. > > > Key: SPARK-26466 > URL: https://issues.apache.org/jira/browse/SPARK-26466 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Takuya Ueshin >Priority: Major > > Make the following hardcoded configs use {{ConfigEntry}}. > {code} > spark.kryo > spark.kryoserializer > spark.jars > spark.submit > spark.serializer > spark.deploy > spark.worker > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
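The sub-task above replaces raw hardcoded key strings with typed config entries. Spark's real {{ConfigEntry}}/{{ConfigBuilder}} API is Scala-internal; purely as an illustration of the pattern (all names below are hypothetical, not Spark's API), a minimal sketch looks like:

```python
# Hypothetical sketch of the ConfigEntry pattern (NOT Spark's real API):
# a config key is declared once, with its type and default, and looked up
# through the entry object instead of a scattered hardcoded string.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ConfigEntry:
    key: str
    default: Any
    converter: Callable[[str], Any]

    def read_from(self, conf: dict) -> Any:
        raw = conf.get(self.key)
        return self.default if raw is None else self.converter(raw)


# Declared once, reused everywhere -- no repeated "spark.kryo..." literals.
KRYO_REGISTRATION_REQUIRED = ConfigEntry(
    "spark.kryo.registrationRequired", False, lambda s: s.lower() == "true")

conf = {"spark.kryo.registrationRequired": "true"}
print(KRYO_REGISTRATION_REQUIRED.read_from(conf))  # -> True
print(KRYO_REGISTRATION_REQUIRED.read_from({}))    # -> False
```

The benefit the sub-task is after: typos in key strings become impossible at use sites, and the type and default live next to the key.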
[jira] [Updated] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests
[ https://issues.apache.org/jira/browse/SPARK-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-26120: - Fix Version/s: 2.3.3 > Fix a streaming query leak in Structured Streaming R tests > -- > > Key: SPARK-26120 > URL: https://issues.apache.org/jira/browse/SPARK-26120 > Project: Spark > Issue Type: Test > Components: SparkR, Structured Streaming, Tests >Affects Versions: 2.4.0 >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu >Priority: Minor > Fix For: 2.3.3, 2.4.1, 3.0.0 > > > "Specify a schema by using a DDL-formatted string when reading" doesn't stop > the streaming query before stopping Spark. It causes the following annoying > logs. > {code} > Exception in thread "stream execution thread for [id = > 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = > 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution > thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = > 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: > Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. > at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 7 more > org.apache.spark.SparkException: Exception thrown in awaitResult: > at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76) > at > org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108) > at > org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342) > at > org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77) > at > org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323) > at > org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204) > Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already > stopped. 
> at > org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158) > at > org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135) > at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229) > at > org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523) > at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91) > ... 7 more > {code}
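The underlying fix is an ordering one: every active streaming query must be stopped before the Spark session itself, otherwise the query's termination callback hits an already-stopped RpcEnv, as in the traces above. A stand-in sketch of that shutdown ordering (the classes below are hypothetical stubs; in real PySpark this would be roughly `for q in spark.streams.active: q.stop()` followed by `spark.stop()`):

```python
# Stand-in classes only -- illustrating the shutdown ordering the fix
# enforces, not Spark's implementation.
class FakeQuery:
    def __init__(self, name):
        self.name, self.active = name, True

    def stop(self):
        self.active = False


class FakeSession:
    def __init__(self, queries):
        self.active_queries = queries
        self.stopped = False

    def stop(self):
        # If any query were still running here, its termination callback
        # would hit a dead RpcEnv -- the RpcEnvStoppedException in the logs.
        assert not any(q.active for q in self.active_queries), \
            "stop queries before stopping the session"
        self.stopped = True


spark = FakeSession([FakeQuery("people3")])
try:
    pass  # ... test body runs the streaming query here ...
finally:
    for q in spark.active_queries:  # stop queries first
        q.stop()
    spark.stop()                    # then tear down the session
print(spark.stopped)  # -> True
```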
[jira] [Commented] (SPARK-26565) modify dev/create-release/release-build.sh to let jenkins build packages w/o publishing
[ https://issues.apache.org/jira/browse/SPARK-26565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741453#comment-16741453 ] Felix Cheung commented on SPARK-26565: -- Yeah, my point wasn't to allow access to unsigned releases, but to help the RM check out the built packages before kicking off the RC process for a release. For example, often the build completes successfully but there is some issue with the content. > modify dev/create-release/release-build.sh to let jenkins build packages w/o > publishing > --- > > Key: SPARK-26565 > URL: https://issues.apache.org/jira/browse/SPARK-26565 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 2.2.3, 2.3.3, 2.4.1, 3.0.0 >Reporter: shane knapp >Assignee: shane knapp >Priority: Major > Attachments: fine.png, no-idea.jpg > > > about a year+ ago, we stopped publishing releases directly from jenkins... > this means that the spark-\{branch}-packaging builds are failing due to gpg > signing failures, and i would like to update these builds to *just* perform > packaging. > example: > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-package/2183/console] > i propose to change dev/create-release/release-build.sh...
> when the script is called w/the 'package' option, add an {{if}} statement to > skip the following sections when run on jenkins: > 1) gpg signing of the source tarball (lines 184-187) > 2) gpg signing of the sparkR dist (lines 243-248) > 3) gpg signing of the python dist (lines 256-261) > 4) gpg signing of the regular binary dist (lines 264-271) > 5) the svn push of the signed dists (lines 317-332) > > -another, and probably much better, option is to nuke the > spark-\{branch}-packaging builds and create new ones that just build things > w/o touching this incredibly fragile shell scripting nightmare.-
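The proposed change boils down to gating the signing and publishing steps behind a "running on Jenkins" check. As a rough illustration only — the real change is to the release-build.sh shell script, and treating `JENKINS_URL` as the switch is an assumption of this sketch, not what the script does:

```python
# Illustrative sketch (NOT the actual release-build.sh): gate each signing /
# publish step behind a "running on Jenkins" check, so the packaging build
# still produces artifacts but never tries to gpg-sign or svn-push them.
import os


def on_jenkins(env):
    # Jenkins sets JENKINS_URL in job environments; using its presence as the
    # switch is an assumption for this sketch.
    return "JENKINS_URL" in env


def run_packaging(env):
    steps = ["build-source-tarball", "build-binary-dists"]
    if not on_jenkins(env):
        # signing and publishing only happen off-Jenkins (i.e. for the RM)
        steps += ["gpg-sign-dists", "svn-push-signed-dists"]
    return steps


print(run_packaging({"JENKINS_URL": "https://amplab.cs.berkeley.edu/jenkins"}))
# -> ['build-source-tarball', 'build-binary-dists']
```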
[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10
[ https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-25572: - Fix Version/s: 2.3.3 > SparkR tests failed on CRAN on Java 10 > -- > > Key: SPARK-25572 > URL: https://issues.apache.org/jira/browse/SPARK-25572 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Major > Fix For: 2.3.3, 2.4.0 > > > follow-up to SPARK-24255 > from the 2.3.2 release we can see that CRAN doesn't seem to respect the system > requirements when running tests - we have seen cases where SparkR is run on > Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt > to skip all tests
[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11
[ https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-26010: - Fix Version/s: 2.3.3 > SparkR vignette fails on CRAN on Java 11 > > > Key: SPARK-26010 > URL: https://issues.apache.org/jira/browse/SPARK-26010 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.4.0, 3.0.0 >Reporter: Felix Cheung >Assignee: Felix Cheung >Priority: Major > Fix For: 2.3.3, 2.4.1, 3.0.0 > > > follow up to SPARK-25572 > but for vignettes >
[jira] [Commented] (SPARK-26608) Remove Jenkins jobs for `branch-2.2`
[ https://issues.apache.org/jira/browse/SPARK-26608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741428#comment-16741428 ] Dongjoon Hyun commented on SPARK-26608: --- Thank you! > Remove Jenkins jobs for `branch-2.2` > > > Key: SPARK-26608 > URL: https://issues.apache.org/jira/browse/SPARK-26608 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 2.2.3 >Reporter: Dongjoon Hyun >Assignee: shane knapp >Priority: Major > Attachments: Screen Shot 2019-01-11 at 8.47.27 PM.png > > > This issue aims to remove the following Jenkins jobs for `branch-2.2` because > of EOL. > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.6/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.6/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.7/] > As of today, the branch is healthy. > !Screen Shot 2019-01-11 at 8.47.27 PM.png!
[jira] [Updated] (SPARK-23182) Allow enabling of TCP keep alive for RPC connections
[ https://issues.apache.org/jira/browse/SPARK-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23182: -- Summary: Allow enabling of TCP keep alive for RPC connections (was: Allow enabling of TCP keep alive for master RPC connections) > Allow enabling of TCP keep alive for RPC connections > > > Key: SPARK-23182 > URL: https://issues.apache.org/jira/browse/SPARK-23182 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.2, 2.4.0 >Reporter: Petar Petrov >Priority: Major > > We rely heavily on preemptible worker machines in GCP/GCE. These machines > disappear without closing the TCP connections to the master, which increases > the number of established connections, and new workers cannot connect because > of "Too many open files" on the master. > To solve the problem we need to enable TCP keep alive for the RPC connections > to the master, but it's not possible to do so via configuration.
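For reference, TCP keep alive is a per-socket option: what the requested configuration knob would ultimately do is set SO_KEEPALIVE on the RPC connections, so the master's kernel periodically probes idle peers and eventually reaps connections to preempted workers instead of holding them open until "Too many open files". A minimal stdlib sketch of the socket-level setting (in Spark's Netty transport the equivalent would be Netty's SO_KEEPALIVE channel option):

```python
# Enable TCP keep alive on a socket and verify the option took effect.
# The keep-alive probe interval/count are kernel-level settings; this only
# flips the per-socket switch that the proposed Spark config would control.
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
print(enabled != 0)  # -> True
sock.close()
```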
[jira] [Assigned] (SPARK-26608) Remove Jenkins jobs for `branch-2.2`
[ https://issues.apache.org/jira/browse/SPARK-26608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shane knapp reassigned SPARK-26608: --- Assignee: shane knapp > Remove Jenkins jobs for `branch-2.2` > > > Key: SPARK-26608 > URL: https://issues.apache.org/jira/browse/SPARK-26608 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 2.2.3 >Reporter: Dongjoon Hyun >Assignee: shane knapp >Priority: Major > Attachments: Screen Shot 2019-01-11 at 8.47.27 PM.png > > > This issue aims to remove the following Jenkins jobs for `branch-2.2` because > of EOL. > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.6/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.6/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.7/] > As of today, the branch is healthy. > !Screen Shot 2019-01-11 at 8.47.27 PM.png!
[jira] [Commented] (SPARK-26608) Remove Jenkins jobs for `branch-2.2`
[ https://issues.apache.org/jira/browse/SPARK-26608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741412#comment-16741412 ] shane knapp commented on SPARK-26608: - sure, i'll take care of this next week. > Remove Jenkins jobs for `branch-2.2` > > > Key: SPARK-26608 > URL: https://issues.apache.org/jira/browse/SPARK-26608 > Project: Spark > Issue Type: Task > Components: Tests >Affects Versions: 2.2.3 >Reporter: Dongjoon Hyun >Assignee: shane knapp >Priority: Major > Attachments: Screen Shot 2019-01-11 at 8.47.27 PM.png > > > This issue aims to remove the following Jenkins jobs for `branch-2.2` because > of EOL. > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.6/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.6/] > - > [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.7/] > As of today, the branch is healthy. > !Screen Shot 2019-01-11 at 8.47.27 PM.png!
[jira] [Updated] (SPARK-23182) Allow enabling of TCP keep alive for RPC connections
[ https://issues.apache.org/jira/browse/SPARK-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23182: -- Priority: Minor (was: Major) > Allow enabling of TCP keep alive for RPC connections > > > Key: SPARK-23182 > URL: https://issues.apache.org/jira/browse/SPARK-23182 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.2.2, 2.4.0 >Reporter: Petar Petrov >Priority: Minor > > We rely heavily on preemptible worker machines in GCP/GCE. These machines > disappear without closing the TCP connections to the master, which increases > the number of established connections, and new workers cannot connect because > of "Too many open files" on the master. > To solve the problem we need to enable TCP keep alive for the RPC connections > to the master, but it's not possible to do so via configuration.
[jira] [Resolved] (SPARK-26564) Fix wrong assertions and error messages for parameter checking
[ https://issues.apache.org/jira/browse/SPARK-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-26564. --- Resolution: Fixed Fix Version/s: 3.0.0 Issue resolved by pull request 23488 [https://github.com/apache/spark/pull/23488] > Fix wrong assertions and error messages for parameter checking > -- > > Key: SPARK-26564 > URL: https://issues.apache.org/jira/browse/SPARK-26564 > Project: Spark > Issue Type: Bug > Components: MLlib, Spark Core, SQL >Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0 >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Minor > Labels: starter > Fix For: 3.0.0 > > > I mistakenly set spark.executor.heartbeatInterval to the same value as > spark.network.timeout and got the following error: > {code} > java.lang.IllegalArgumentException: requirement failed: The value of > spark.network.timeout=120s must be no less than the value of > spark.executor.heartbeatInterval=120s. > {code} > But it can be read as though they could be equal. "Greater than" is more precise > than "no less than". > > In addition, the following assertions are inconsistent with their messages. > {code:title=mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala} > 91 require(maxIter >= 0, s"maxIter must be a positive integer: $maxIter") > {code} > {code:title=sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala} > 416 require(capacity < 51200, "Cannot broadcast more than 512 > millions rows") > {code}
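The report's point, restated: the check is strict ("must be greater than") but the message says "no less than", which wrongly suggests equality is acceptable. A small require()-style sketch (a hypothetical helper, not Spark's code) where the predicate and the message agree:

```python
# require()-style check where the condition and the error message match:
# the comparison is strictly greater-than, and the message says so.
def require(condition, message):
    if not condition:
        raise ValueError(f"requirement failed: {message}")


def check_intervals(network_timeout_s, heartbeat_interval_s):
    require(
        network_timeout_s > heartbeat_interval_s,  # strictly greater
        f"The value of spark.network.timeout={network_timeout_s}s must be "
        f"greater than the value of "
        f"spark.executor.heartbeatInterval={heartbeat_interval_s}s.")


check_intervals(120, 10)       # fine
try:
    check_intervals(120, 120)  # equal values rejected, message now matches
except ValueError as e:
    print("greater than" in str(e))  # -> True
```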
[jira] [Comment Edited] (SPARK-26591) illegal hardware instruction
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741387#comment-16741387 ] Elchin edited comment on SPARK-26591 at 1/12/19 8:00 PM: - [~bryanc] I installed it through pip, and I tested it in a clean virtual environment, and it also crashed. The PyArrow version is 0.11.1. I also attached the core dump; maybe it can help you. was (Author: elch10): [~bryanc] I installed it through pip, and I tested it in a clean virtual environment, and it also doesn't work. The PyArrow version is 0.11.1. I also attached the core dump; maybe it can help you. > illegal hardware instruction > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Critical > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code}
[jira] [Updated] (SPARK-26591) illegal hardware instruction
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Elchin updated SPARK-26591: --- Attachment: core > illegal hardware instruction > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Critical > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code}
[jira] [Commented] (SPARK-26591) illegal hardware instruction
[ https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741387#comment-16741387 ] Elchin commented on SPARK-26591: [~bryanc] I installed it through pip, and I tested it in a clean virtual environment, and it also doesn't work. The PyArrow version is 0.11.1. I also attached the core dump; maybe it can help you. > illegal hardware instruction > > > Key: SPARK-26591 > URL: https://issues.apache.org/jira/browse/SPARK-26591 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.0 > Environment: Python 3.6.7 > Pyspark 2.4.0 > OS: > {noformat} > Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 > x86_64 x86_64 GNU/Linux{noformat} > CPU: > > {code:java} > Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB > clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz > {code} > > >Reporter: Elchin >Priority: Critical > Attachments: core > > > When I try to use pandas_udf from examples in > [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]: > {code:java} > from pyspark.sql.functions import pandas_udf, PandasUDFType > from pyspark.sql.types import IntegerType, StringType > slen = pandas_udf(lambda s: s.str.len(), IntegerType()) #here it is > crashed{code} > I get the error: > {code:java} > [1] 17969 illegal hardware instruction (core dumped) python3{code}
[jira] [Resolved] (SPARK-26538) Postgres numeric array support
[ https://issues.apache.org/jira/browse/SPARK-26538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-26538. --- Resolution: Fixed Fix Version/s: 3.0.0 2.4.1 2.3.3 This is resolved via https://github.com/apache/spark/pull/23456 > Postgres numeric array support > -- > > Key: SPARK-26538 > URL: https://issues.apache.org/jira/browse/SPARK-26538 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.2, 2.3.2, 2.4.1 > Environment: PostgreSQL 10.4, 9.6.9. >Reporter: Oleksii >Priority: Minor > Fix For: 2.3.3, 2.4.1, 3.0.0 > > > Consider the following table definition: > {code:sql} > create table test1 > ( > v numeric[], > d numeric > ); > insert into test1 values('{.222,.332}', 222.4555); > {code} > When reading the table into a Dataframe, I get the following schema: > {noformat} > root > |-- v: array (nullable = true) > | |-- element: decimal(0,0) (containsNull = true) > |-- d: decimal(38,18) (nullable = true){noformat} > Notice that for both columns precision and scale were not specified, but in > the case of the array element I got both set to 0, while in the other case the > defaults were applied. > Later, when I try to read the Dataframe, I get the following error: > {noformat} > java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 > exceeds max precision 0 > at scala.Predef$.require(Predef.scala:224) > at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114) > at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:453) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$16$$anonfun$apply$6$$anonfun$apply$7.apply(JdbcUtils.scala:474) > ...{noformat} > I would expect to get array elements of type decimal(38,18) and no error when > reading in this case.
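To see why the inferred decimal(0,0) element type must fail at read time: a value like 222.4555 needs nonzero precision and a scale of 4, neither of which fits in a declared precision/scale of 0, while the decimal(38,18) default the scalar column received is ample. A small illustration of that check (the `fits` helper is illustrative, not Spark's implementation):

```python
# Check whether a decimal value fits a declared (precision, scale),
# roughly mirroring the kind of requirement Decimal.set enforces.
from decimal import Decimal


def fits(value: Decimal, precision: int, scale: int) -> bool:
    sign, digits, exponent = value.as_tuple()
    value_scale = max(0, -exponent)                 # fractional digits
    value_precision = max(len(digits), value_scale)  # total digits needed
    return value_precision <= precision and value_scale <= scale


v = Decimal("222.4555")
print(fits(v, 0, 0))    # -> False (the type inferred for the array element)
print(fits(v, 38, 18))  # -> True  (the default the plain numeric column got)
```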
[jira] [Resolved] (SPARK-25430) Add map parameter for withColumnRenamed
[ https://issues.apache.org/jira/browse/SPARK-25430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25430. -- Resolution: Won't Fix > Add map parameter for withColumnRenamed > --- > > Key: SPARK-25430 > URL: https://issues.apache.org/jira/browse/SPARK-25430 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.3.1 >Reporter: Goun Na >Priority: Major > > The withColumnRenamed method should accept a map parameter. It removes code > redundancy. > {code:java} > // example > df.withColumnRenamed(Map( "c1" -> "first_column", "c2" -> "second_column" > )){code} > {code:java} > // from abbr columns to desc columns > val m = Map( "c1" -> "first_column", "c2" -> "second_column" ) > df1.withColumnRenamed(m) > df2.withColumnRenamed(m) > {code} > It is useful for CJK users when they are working on analysis in notebook > environments such as Zeppelin, Databricks, Apache Toree. > {code:java} > // for CJK users: define the dictionary once as a map, then reuse it to > translate columns whenever report visualization is required > val m = Map( "c1" -> "컬럼_1", "c2" -> "컬럼_2") > df1.withColumnRenamed(m) > df2.withColumnRenamed(m) > {code}
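Since this was resolved Won't Fix, the requested behavior can be emulated by folding the map over the existing single-pair withColumnRenamed. The DataFrame class below is a stand-in so the sketch is self-contained; with real PySpark the same `reduce()` line works on an actual DataFrame:

```python
# Fold a rename map over the existing one-pair-at-a-time withColumnRenamed.
# StubDataFrame is a hypothetical stand-in, used only to keep this runnable
# without a Spark session.
from functools import reduce


class StubDataFrame:
    def __init__(self, columns):
        self.columns = list(columns)

    def withColumnRenamed(self, old, new):
        # mirrors PySpark semantics: unknown names are silently ignored
        return StubDataFrame([new if c == old else c for c in self.columns])


def with_columns_renamed(df, mapping):
    return reduce(lambda d, kv: d.withColumnRenamed(*kv), mapping.items(), df)


df = StubDataFrame(["c1", "c2", "c3"])
renamed = with_columns_renamed(
    df, {"c1": "first_column", "c2": "second_column"})
print(renamed.columns)  # -> ['first_column', 'second_column', 'c3']
```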
[jira] [Commented] (SPARK-26225) Scan: track decoding time for row-based data sources
[ https://issues.apache.org/jira/browse/SPARK-26225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741315#comment-16741315 ] Yuanjian Li commented on SPARK-26225: - Thanks for your reply, Wenchen. Per our discussion, the decoding-time metric for file formats should be on hold until the data source v2 implementation is done, so I just closed [GitHub Pull Request #23378|https://github.com/apache/spark/pull/23378]. For `RowDataSourceScanExec`, I put up a preview PR here: [GitHub Pull Request #23528|https://github.com/apache/spark/pull/23528], but during the work I found it does not take too much time; please take a look at whether it's necessary to add this metric for `RowDataSourceScanExec`. Thanks. > Scan: track decoding time for row-based data sources > > > Key: SPARK-26225 > URL: https://issues.apache.org/jira/browse/SPARK-26225 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Reynold Xin >Priority: Major > > Scan node should report decoding time for each record, if it is not too much > overhead. >
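As a rough sketch of the metric under discussion (names here are illustrative, not Spark's): wrap the row-decoding step in the scan's iterator and accumulate the time it takes, the way a per-node SQL metric on the scan would:

```python
# Wrap a row-decoding iterator and accumulate the time spent decoding.
# Illustrative only -- Spark would use a SQLMetric on the scan node, not a
# dict, and System.nanoTime() rather than perf_counter_ns().
import time


def timed_decode(raw_rows, decode, metric):
    for raw in raw_rows:
        start = time.perf_counter_ns()
        row = decode(raw)
        metric["decode_time_ns"] += time.perf_counter_ns() - start
        yield row


metric = {"decode_time_ns": 0}
rows = list(timed_decode(["1", "2", "3"], int, metric))
print(rows)  # -> [1, 2, 3]
print(metric["decode_time_ns"] >= 0)  # -> True
```

The comment's concern is visible in the sketch: two extra clock reads per record, which is why the file-format case was deferred and why the `RowDataSourceScanExec` version was measured before committing to it.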
[jira] [Updated] (SPARK-26609) Kinesis-Spark Stream unable to process records
[ https://issues.apache.org/jira/browse/SPARK-26609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aman Mundra updated SPARK-26609: Attachment: 2.PNG 1.PNG > Kinesis-Spark Stream unable to process records > -- > > Key: SPARK-26609 > URL: https://issues.apache.org/jira/browse/SPARK-26609 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.2.0 > Environment: > {code:java} > 2.2.0 > > > org.apache.spark > spark-core_${scala.binary.version} > ${spark.version} > > > > org.apache.spark > spark-sql_${scala.binary.version} > ${spark.version} > > > > org.apache.spark > spark-hive_${scala.binary.version} > ${spark.version} > > > > org.apache.spark > spark-mllib_${scala.binary.version} > ${spark.version} > > > > org.apache.spark > spark-streaming_${scala.binary.version} > ${spark.version} > > > > org.apache.spark > spark-streaming-kinesis-asl_2.11 > ${spark.version} > > > > com.databricks > spark-redshift_2.11 > 3.0.0-preview1 > > > > com.amazon.redshift > redshift-jdbc42 > 1.2.18.1036 > > {code} > > > spark.driver.cores=6 > spark.driver.memory=12g > spark.yarn.driver.memoryOverhead=1g > spark.driver.maxResultSize=4g > spark.executor.memory=8g > spark.executor.cores=4 > spark.yarn.executor.memoryOverhead=1g > spark.executor.instances=4 > spark.shuffle.service.enabled=true > spark.shuffle.registration.timeout=600 > spark.sql.shuffle.partitions=8 > spark.scheduler.mode=FIFO > maximizeResourceAllocation=true > spark.dynamicAllocation.enabled=true > spark.dynamicAllocation.executorIdleTimeout=60s > >Reporter: Aman Mundra >Priority: Major > Attachments: 1.PNG, 2.PNG > > > I'm trying to consume a kinesis stream via spark streaming and the amazon KCL lib. > The streaming job gets stuck at processing as soon as it gets the first batch of non-zero records.
> I'm getting json data in my kinesis stream and here's what I'm trying to > achieve: > Get Dstream[ArrayByte] > convert to Dstream[String] > RDD > load as json to > create dataframe and perform transformations. > > Similar error links: > [https://stackoverflow.com/questions/40225135/spark-streaming-kafka-job-stuck-in-processing-stage] > I'm running the job in emr-5.8.0 with a sufficient number of cores and executors, > but the job still gets stuck in the processing stage and builds a huge pile of > queued batches over time. > Not able to process even a single record. > > Here's the code I'm using: > > > {code:java} > val numStreams=2 > val sparkStreamingBatchInterval=10 > val kinesisCheckpointInterval=5 > > val kinesisStreams = (0 until kinesisConfig("numStreams").toInt).map { i => > KinesisInputDStream.builder > .streamingContext(ssc) > .endpointUrl(kinesisConfig("endpointUrl")) > .regionName(kinesisConfig("regionName")) > .streamName(kinesisConfig("streamName")) > .initialPositionInStream(InitialPositionInStream.LATEST) > .checkpointAppName(kinesisConfig("appName")) > > .checkpointInterval(Seconds(kinesisConfig("kinesisCheckpointInterval").toInt)) > .storageLevel(StorageLevel.MEMORY_AND_DISK_2) > .kinesisCredentials(awsCredentials.build()) > .build() > } > val unionStreams = ssc.union(kinesisStreams) > val lines = unionStreams.flatMap(byteArray => new String(byteArray).split(" > ")) > lines.print(2) > lines.foreachRDD(rdd => { > if > (!rdd.partitions.isEmpty){ > println("New records found\nmetrics count in the batch: > %s".format(rdd.count())) //works > println("performing transformations") > rdd.saveAsTextFile("path")//works > import sparkSession.implicits._ > println(rdd.toString()) //not working > val records = rdd.toDF("records") //not working > println(records.take(2)) //not working > println(records.count()) //not working > } > else > println("No new record found") > }) > > {code} > > Attaching Thread dump: > h3. 
Thread dump for executor 2 Updated at 2019/01/12 10:22:52 ||Thread ID||Thread Name||Thread State||Thread Locks|| |65|Executor task launch worker for task > 70|WAITING|Lock(java.util.concurrent.ThreadPoolExecutor$Worker@1560902703})| |sun.misc.Unsafe.park(Native Method) java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) org.apache.spark.streaming.receiver.ReceiverSupervisor.awaitTermination(ReceiverSupervisor.scala:219) >
[jira] [Created] (SPARK-26609) Kinesis-Spark Stream unable to process records
Aman Mundra created SPARK-26609:
---
Summary: Kinesis-Spark Stream unable to process records
Key: SPARK-26609
URL: https://issues.apache.org/jira/browse/SPARK-26609
Project: Spark
Issue Type: Bug
Components: DStreams
Affects Versions: 2.2.0
Environment:
{code:java}
spark.version = 2.2.0
Maven dependencies (groupId:artifactId:version):
org.apache.spark:spark-core_${scala.binary.version}:${spark.version}
org.apache.spark:spark-sql_${scala.binary.version}:${spark.version}
org.apache.spark:spark-hive_${scala.binary.version}:${spark.version}
org.apache.spark:spark-mllib_${scala.binary.version}:${spark.version}
org.apache.spark:spark-streaming_${scala.binary.version}:${spark.version}
org.apache.spark:spark-streaming-kinesis-asl_2.11:${spark.version}
com.databricks:spark-redshift_2.11:3.0.0-preview1
com.amazon.redshift:redshift-jdbc42:1.2.18.1036
{code}
spark.driver.cores=6
spark.driver.memory=12g
spark.yarn.driver.memoryOverhead=1g
spark.driver.maxResultSize=4g
spark.executor.memory=8g
spark.executor.cores=4
spark.yarn.executor.memoryOverhead=1g
spark.executor.instances=4
spark.shuffle.service.enabled=true
spark.shuffle.registration.timeout=600
spark.sql.shuffle.partitions=8
spark.scheduler.mode=FIFO
maximizeResourceAllocation=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60s
Reporter: Aman Mundra

I'm trying to consume a Kinesis stream via Spark Streaming and the Amazon KCL library. The streaming job gets stuck at processing as soon as it receives its first batch of non-zero records.
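One environment note relevant to the symptom above: with receiver-based DStreams such as the Kinesis one, each receiver permanently pins one executor core, and the Spark Streaming guide requires the number of allocated cores to exceed the number of receivers, otherwise batches queue without ever being processed. The fragment below is an illustrative sketch of that constraint, not a verified fix for this issue; note in particular that executor dynamic allocation can shrink the cluster underneath long-running receivers:

{code}
# Illustrative only: keep total executor cores well above the number of
# Kinesis receivers (numStreams = 2 in this report)
spark.dynamicAllocation.enabled=false
spark.executor.instances=4
spark.executor.cores=4
# 2 of the 16 cores stay pinned by the 2 receivers; the rest run batch tasks
{code}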
Here's the code I'm using:

{code:java}
val numStreams = 2
val sparkStreamingBatchInterval = 10
val kinesisCheckpointInterval = 5

val kinesisStreams = (0 until kinesisConfig("numStreams").toInt).map { i =>
  KinesisInputDStream.builder
    .streamingContext(ssc)
    .endpointUrl(kinesisConfig("endpointUrl"))
    .regionName(kinesisConfig("regionName"))
    .streamName(kinesisConfig("streamName"))
    .initialPositionInStream(InitialPositionInStream.LATEST)
    .checkpointAppName(kinesisConfig("appName"))
    .checkpointInterval(Seconds(kinesisConfig("kinesisCheckpointInterval").toInt))
    .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
    .kinesisCredentials(awsCredentials.build())
    .build()
}
val unionStreams = ssc.union(kinesisStreams)
val lines = unionStreams.flatMap(byteArray => new String(byteArray).split(" "))
lines.print(2)
lines.foreachRDD(rdd => {
  if (!rdd.partitions.isEmpty) {
    println("New records found\nmetrics count in the batch: %s".format(rdd.count())) // works
    println("performing transformations")
    rdd.saveAsTextFile("path") // works
    import sparkSession.implicits._
    println(rdd.toString()) // not working
    val records = rdd.toDF("records") // not working
    println(records.take(2)) // not working
    println(records.count()) // not working
  }
  else
    println("No new record found")
})
{code}

Attaching Thread dump:
h3. Thread dump for executor 2
Updated at 2019/01/12 10:22:52

||Thread ID||Thread Name||Thread State||Thread Locks||
|65|Executor task launch worker for task 70|WAITING|Lock(java.util.concurrent.ThreadPoolExecutor$Worker@1560902703)|
|sun.misc.Unsafe.park(Native Method)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
org.apache.spark.streaming.receiver.ReceiverSupervisor.awaitTermination(ReceiverSupervisor.scala:219)
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:608)
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:597)
org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2173)
org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2173)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
org.apache.spark.scheduler.Task.run(Task.scala:108)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.lang.Thread.run(Thread.java:748)|
|123|Attach Listener|RUNNABLE| | |
|75|cw-metrics-publisher|TIMED_WAITING| |
|java.lang.Object.wait(Native Method)
com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.runOnce(CWPublisherRunnable.java:136)
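For reference, the JSON-to-DataFrame step the report describes (bytes -> String -> DataFrame) could be sketched as below. This is a hedged illustration, not the reporter's code: it reuses the names `unionStreams` and `sparkSession` from the snippet above and assumes each Kinesis record payload is a single JSON document. Note that the report's `flatMap` splits the decoded payload on spaces, which would corrupt any JSON document containing spaces; the sketch decodes one record as one JSON string instead:

{code:java}
import java.nio.charset.StandardCharsets

// Decode each Kinesis record payload (Array[Byte]) to a UTF-8 string,
// with an explicit charset rather than the platform default.
val jsonLines = unionStreams.map(bytes => new String(bytes, StandardCharsets.UTF_8))

jsonLines.foreachRDD { rdd =>
  if (!rdd.isEmpty()) {
    import sparkSession.implicits._
    // Spark 2.2+: DataFrameReader.json accepts a Dataset[String]
    val records = sparkSession.read.json(rdd.toDS())
    records.show(2)
    println(s"metrics count in the batch: ${records.count()}")
  } else {
    println("No new record found")
  }
}
{code}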