[jira] [Commented] (SPARK-26466) Use ConfigEntry for hardcoded configs for submit categories.

2019-01-12 Thread Jungtaek Lim (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741480#comment-16741480
 ] 

Jungtaek Lim commented on SPARK-26466:
--

I'm working on this.

> Use ConfigEntry for hardcoded configs for submit categories.
> 
>
> Key: SPARK-26466
> URL: https://issues.apache.org/jira/browse/SPARK-26466
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> Make the following hardcoded configs to use {{ConfigEntry}}.
> {code}
> spark.kryo
> spark.kryoserializer
> spark.jars
> spark.submit
> spark.serializer
> spark.deploy
> spark.worker
> {code}
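
As a rough illustration of the pattern this sub-task asks for (not the actual patch), each hardcoded key would be declared once with {{ConfigBuilder}} and read through the resulting {{ConfigEntry}}. The entry below mirrors how a key such as {{spark.jars}} is typically defined inside Spark; the exact name, type, and default here are assumptions for the sketch:

{code:scala}
// Minimal sketch of the ConfigEntry pattern in org.apache.spark.internal.config.
// The key, type, and default are illustrative, not the definitions added by the patch.
import org.apache.spark.internal.config.ConfigBuilder

private[spark] val JARS = ConfigBuilder("spark.jars")
  .stringConf
  .toSequence
  .createWithDefault(Nil)

// Call sites then read the typed entry instead of a raw string:
//   val jars: Seq[String] = sparkConf.get(JARS)
{code}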



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26120:
-
Fix Version/s: 2.3.3

> Fix a streaming query leak in Structured Streaming R tests
> --
>
> Key: SPARK-26120
> URL: https://issues.apache.org/jira/browse/SPARK-26120
> Project: Spark
>  Issue Type: Test
>  Components: SparkR, Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> "Specify a schema by using a DDL-formatted string when reading" doesn't stop 
> the streaming query before stopping Spark. It causes the following annoying 
> logs.
> {code}
> Exception in thread "stream execution thread for [id = 
> 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = 
> 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution 
> thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = 
> 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: 
> Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>   at 
> org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342)
>   at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204)
> Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already 
> stopped.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
>   ... 7 more
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>   at 
> org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342)
>   at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204)
> Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already 
> stopped.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
>   ... 7 more
> {code}
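
The fix itself is in the SparkR test ("Specify a schema by using a DDL-formatted string when reading"), but the underlying pattern is the same in any language binding: stop every active streaming query before stopping the session, so no stream execution thread is left calling into an already-stopped RpcEnv. A minimal Scala sketch of that teardown order (illustrative only, not the R change that was merged):

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative teardown order; the actual fix stops the specific
// leaked query in the SparkR test before stopping Spark.
def shutdown(spark: SparkSession): Unit = {
  spark.streams.active.foreach(_.stop())  // stop streaming queries first
  spark.stop()                            // then tear down the session
}
{code}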



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26565) modify dev/create-release/release-build.sh to let jenkins build packages w/o publishing

2019-01-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741453#comment-16741453
 ] 

Felix Cheung commented on SPARK-26565:
--

Yeah, my point wasn't to allow access to unsigned releases, but to help the RM 
check out the built packages before kicking off the RC process.

For example, oftentimes the build completes successfully but there is some 
issue with the content.


> modify dev/create-release/release-build.sh to let jenkins build packages w/o 
> publishing
> ---
>
> Key: SPARK-26565
> URL: https://issues.apache.org/jira/browse/SPARK-26565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.3, 2.3.3, 2.4.1, 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
> Attachments: fine.png, no-idea.jpg
>
>
> about a year+ ago, we stopped publishing releases directly from jenkins...
> this means that the spark-\{branch}-packaging builds are failing due to gpg 
> signing failures, and i would like to update these builds to *just* perform 
> packaging.
> example:
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-package/2183/console]
> i propose to change dev/create-release/release-build.sh...
> when the script is called w/the 'package' option, add an {{if}} statement to 
> skip the following sections when run on jenkins:
> 1) gpg signing of the source tarball (lines 184-187)
> 2) gpg signing of the sparkR dist (lines 243-248)
> 3) gpg signing of the python dist (lines 256-261)
> 4) gpg signing of the regular binary dist (lines 264-271)
> 5) the svn push of the signed dists (lines 317-332)
>  
> -another, and probably much better, option is to nuke the 
> spark-\{branch}-packaging builds and create new ones that just build things 
> w/o touching this incredibly fragile shell-scripting nightmare.-



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Fix Version/s: 2.3.3

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.0
>
>
> Follow-up to SPARK-24255.
> From the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt 
> to skip all tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Fix Version/s: 2.3.3

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> Follow-up to SPARK-25572, but for vignettes.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26608) Remove Jenkins jobs for `branch-2.2`

2019-01-12 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741428#comment-16741428
 ] 

Dongjoon Hyun commented on SPARK-26608:
---

Thank you!

> Remove Jenkins jobs for `branch-2.2`
> 
>
> Key: SPARK-26608
> URL: https://issues.apache.org/jira/browse/SPARK-26608
> Project: Spark
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 2.2.3
>Reporter: Dongjoon Hyun
>Assignee: shane knapp
>Priority: Major
> Attachments: Screen Shot 2019-01-11 at 8.47.27 PM.png
>
>
> This issue aims to remove the following Jenkins jobs for `branch-2.2` because 
> of EOL.
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.6/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.6/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.7/]
> As of today, the branch is healthy.
> !Screen Shot 2019-01-11 at 8.47.27 PM.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23182) Allow enabling of TCP keep alive for RPC connections

2019-01-12 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-23182:
--
Summary: Allow enabling of TCP keep alive for RPC connections  (was: Allow 
enabling of TCP keep alive for master RPC connections)

> Allow enabling of TCP keep alive for RPC connections
> 
>
> Key: SPARK-23182
> URL: https://issues.apache.org/jira/browse/SPARK-23182
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.4.0
>Reporter: Petar Petrov
>Priority: Major
>
> We rely heavily on preemptible worker machines in GCP/GCE. These machines 
> disappear without closing their TCP connections to the master, which increases 
> the number of established connections, and new workers cannot connect because 
> of "Too many open files" on the master.
> To solve the problem we need to enable TCP keep-alive for the RPC connections 
> to the master, but it's not possible to do so via configuration.
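
The actual change wires a configuration flag through Spark's Netty-based RPC transport; purely to illustrate what enabling TCP keep-alive means at the socket level (plain Netty API, not Spark's configuration plumbing), a server bootstrap would set the standard channel option:

{code:scala}
import io.netty.bootstrap.ServerBootstrap
import io.netty.channel.ChannelOption

// Illustration only: turn on SO_KEEPALIVE for connections accepted by a
// Netty server bootstrap. Spark's real patch exposes this behind a config key.
def enableKeepAlive(bootstrap: ServerBootstrap): ServerBootstrap =
  bootstrap.childOption(ChannelOption.SO_KEEPALIVE, java.lang.Boolean.TRUE)
{code}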



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26608) Remove Jenkins jobs for `branch-2.2`

2019-01-12 Thread shane knapp (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shane knapp reassigned SPARK-26608:
---

Assignee: shane knapp

> Remove Jenkins jobs for `branch-2.2`
> 
>
> Key: SPARK-26608
> URL: https://issues.apache.org/jira/browse/SPARK-26608
> Project: Spark
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 2.2.3
>Reporter: Dongjoon Hyun
>Assignee: shane knapp
>Priority: Major
> Attachments: Screen Shot 2019-01-11 at 8.47.27 PM.png
>
>
> This issue aims to remove the following Jenkins jobs for `branch-2.2` because 
> of EOL.
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.6/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.6/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.7/]
> As of today, the branch is healthy.
> !Screen Shot 2019-01-11 at 8.47.27 PM.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26608) Remove Jenkins jobs for `branch-2.2`

2019-01-12 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741412#comment-16741412
 ] 

shane knapp commented on SPARK-26608:
-

sure, i'll take care of this next week.

> Remove Jenkins jobs for `branch-2.2`
> 
>
> Key: SPARK-26608
> URL: https://issues.apache.org/jira/browse/SPARK-26608
> Project: Spark
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 2.2.3
>Reporter: Dongjoon Hyun
>Assignee: shane knapp
>Priority: Major
> Attachments: Screen Shot 2019-01-11 at 8.47.27 PM.png
>
>
> This issue aims to remove the following Jenkins jobs for `branch-2.2` because 
> of EOL.
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.6/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-maven-hadoop-2.7/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.6/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.2-test-sbt-hadoop-2.7/]
> As of today, the branch is healthy.
> !Screen Shot 2019-01-11 at 8.47.27 PM.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23182) Allow enabling of TCP keep alive for RPC connections

2019-01-12 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-23182:
--
Priority: Minor  (was: Major)

> Allow enabling of TCP keep alive for RPC connections
> 
>
> Key: SPARK-23182
> URL: https://issues.apache.org/jira/browse/SPARK-23182
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.2.2, 2.4.0
>Reporter: Petar Petrov
>Priority: Minor
>
> We rely heavily on preemptible worker machines in GCP/GCE. These machines 
> disappear without closing their TCP connections to the master, which increases 
> the number of established connections, and new workers cannot connect because 
> of "Too many open files" on the master.
> To solve the problem we need to enable TCP keep-alive for the RPC connections 
> to the master, but it's not possible to do so via configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26564) Fix wrong assertions and error messages for parameter checking

2019-01-12 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26564.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23488
[https://github.com/apache/spark/pull/23488]

> Fix wrong assertions and error messages for parameter checking
> --
>
> Key: SPARK-26564
> URL: https://issues.apache.org/jira/browse/SPARK-26564
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib, Spark Core, SQL
>Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.4.0
>Reporter: Kengo Seki
>Assignee: Kengo Seki
>Priority: Minor
>  Labels: starter
> Fix For: 3.0.0
>
>
> I mistakenly set spark.executor.heartbeatInterval to the same value as 
> spark.network.timeout and got the following error:
> {code}
> java.lang.IllegalArgumentException: requirement failed: The value of 
> spark.network.timeout=120s must be no less than the value of 
> spark.executor.heartbeatInterval=120s.
> {code}
> But the message reads as if the two values were allowed to be equal; "greater 
> than" is more precise than "no less than".
> 
> In addition, the following assertions are inconsistent with their messages.
> {code:title=mllib/src/main/scala/org/apache/spark/ml/optim/WeightedLeastSquares.scala}
>  91   require(maxIter >= 0, s"maxIter must be a positive integer: $maxIter")
> {code}
> {code:title=sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashedRelation.scala}
> 416   require(capacity < 512000000, "Cannot broadcast more than 512 
> millions rows")
> {code}
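
For reference, one way the checks could be made self-consistent - a hedged sketch, since the exact wording merged in pull request 23488 may differ, and the variable names below (networkTimeoutMs, heartbeatIntervalMs) are placeholders:

{code:scala}
// Sketch only: make each condition agree with its message.
// Variable names are placeholders, not the actual Spark identifiers.
require(
  networkTimeoutMs > heartbeatIntervalMs,
  s"The value of spark.network.timeout=${networkTimeoutMs}ms must be greater than " +
    s"the value of spark.executor.heartbeatInterval=${heartbeatIntervalMs}ms.")

// Either tighten the condition or soften the message; one consistent option:
require(maxIter > 0, s"maxIter must be a positive integer (got $maxIter)")

require(capacity < 512000000, "Cannot broadcast more than 512 million rows")
{code}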



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26591) illegal hardware instruction

2019-01-12 Thread Elchin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741387#comment-16741387
 ] 

Elchin edited comment on SPARK-26591 at 1/12/19 8:00 PM:
-

[~bryanc] I installed it through pip, and I tested it in a clean virtual 
environment. It also crashed there.

The PyArrow version is 0.11.1. I also attached a core dump; maybe it can help you.


was (Author: elch10):
[~bryanc] I installed it through pip, and I tested it in a clean virtual 
environment. It also doesn't work there.

The PyArrow version is 0.11.1. I also attached a core dump; maybe it can help you.

> illegal hardware instruction
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Critical
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # it crashes here{code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26591) illegal hardware instruction

2019-01-12 Thread Elchin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Elchin updated SPARK-26591:
---
Attachment: core

> illegal hardware instruction
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Critical
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # it crashes here{code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26591) illegal hardware instruction

2019-01-12 Thread Elchin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741387#comment-16741387
 ] 

Elchin commented on SPARK-26591:


[~bryanc] I installed it through pip, and I tested it in a clean virtual 
environment. It also doesn't work there.

The PyArrow version is 0.11.1. I also attached a core dump; maybe it can help you.

> illegal hardware instruction
> 
>
> Key: SPARK-26591
> URL: https://issues.apache.org/jira/browse/SPARK-26591
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
> Environment: Python 3.6.7
> Pyspark 2.4.0
> OS:
> {noformat}
> Linux 4.15.0-43-generic #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018 x86_64 
> x86_64 x86_64 GNU/Linux{noformat}
> CPU:
>  
> {code:java}
> Dual core AMD Athlon II P360 (-MCP-) cache: 1024 KB
> clock speeds: max: 2300 MHz 1: 1700 MHz 2: 1700 MHz
> {code}
>  
>  
>Reporter: Elchin
>Priority: Critical
> Attachments: core
>
>
> When I try to use pandas_udf from examples in 
> [documentation|https://spark.apache.org/docs/2.4.0/api/python/pyspark.sql.html#pyspark.sql.functions.pandas_udf]:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> from pyspark.sql.types import IntegerType, StringType
> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # it crashes here{code}
> I get the error:
> {code:java}
> [1]    17969 illegal hardware instruction (core dumped)  python3{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26538) Postgres numeric array support

2019-01-12 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26538.
---
   Resolution: Fixed
Fix Version/s: 3.0.0
   2.4.1
   2.3.3

This is resolved via https://github.com/apache/spark/pull/23456

> Postgres numeric array support
> --
>
> Key: SPARK-26538
> URL: https://issues.apache.org/jira/browse/SPARK-26538
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.2, 2.3.2, 2.4.1
> Environment: PostgreSQL 10.4, 9.6.9.
>Reporter: Oleksii
>Priority: Minor
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> Consider the following table definition:
> {code:sql}
> create table test1
> (
>    v  numeric[],
>    d  numeric
> );
> insert into test1 values('{.222,.332}', 222.4555);
> {code}
> When reading the table into a Dataframe, I get the following schema:
> {noformat}
> root
>  |-- v: array (nullable = true)
>  |    |-- element: decimal(0,0) (containsNull = true)
>  |-- d: decimal(38,18) (nullable = true){noformat}
> Notice that for both columns precision and scale were not specified, but in 
> the case of the array element I got both set to 0, while in the other case the 
> defaults were used.
> Later, when I try to read the Dataframe, I get the following error:
> {noformat}
> java.lang.IllegalArgumentException: requirement failed: Decimal precision 4 
> exceeds max precision 0
>         at scala.Predef$.require(Predef.scala:224)
>         at org.apache.spark.sql.types.Decimal.set(Decimal.scala:114)
>         at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:453)
>         at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$16$$anonfun$apply$6$$anonfun$apply$7.apply(JdbcUtils.scala:474)
>         ...{noformat}
> I would expect to get array elements of type decimal(38,18) and no error when 
> reading in this case.
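
For reference, a minimal way to reproduce the read side with the standard JDBC data source - the URL, table name, and credentials below are placeholders, and `spark` is assumed to be an existing SparkSession:

{code:scala}
// Reproduction sketch; connection details are placeholders.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost:5432/testdb")
  .option("dbtable", "test1")
  .option("user", "postgres")
  .option("password", "secret")
  .load()

df.printSchema()  // v is inferred as array<decimal(0,0)> before the fix
df.show()         // fails with "Decimal precision 4 exceeds max precision 0"
{code}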



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25430) Add map parameter for withColumnRenamed

2019-01-12 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25430.
--
Resolution: Won't Fix

> Add map parameter for withColumnRenamed
> ---
>
> Key: SPARK-25430
> URL: https://issues.apache.org/jira/browse/SPARK-25430
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Goun Na
>Priority: Major
>
> The withColumnRenamed method should accept a Map parameter; it would remove 
> code redundancy.
> {code:java}
> // example
> df.withColumnRenamed(Map( "c1" -> "first_column", "c2" -> "second_column" 
> )){code}
> {code:java}
> // from abbr columns to desc columns
> val m = Map( "c1" -> "first_column", "c2" -> "second_column" )
> df1.withColumnRenamed(m) 
> df2.withColumnRenamed(m)
> {code}
> It is useful for CJK users when they are working on analysis in notebook 
> environments such as Zeppelin, Databricks, or Apache Toree. 
> {code:java}
> // for CJK users: once the dictionary is defined as a map, reuse it to 
> translate columns whenever report visualization is required
> val m = Map( "c1" -> "컬럼_1", "c2" -> "컬럼_2") 
> df1.withColumnRenamed(m) 
> df2.withColumnRenamed(m)
> {code}
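
Since this was resolved as Won't Fix, the same effect can be achieved with the existing single-column API by folding the rename map over the DataFrame - a small sketch reusing the issue's own example names (df1, m):

{code:scala}
// Workaround with the existing API: apply the renames one at a time.
val m = Map("c1" -> "first_column", "c2" -> "second_column")
val renamed = m.foldLeft(df1) { case (acc, (from, to)) =>
  acc.withColumnRenamed(from, to)
}
{code}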



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26225) Scan: track decoding time for row-based data sources

2019-01-12 Thread Yuanjian Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741315#comment-16741315
 ] 

Yuanjian Li commented on SPARK-26225:
-

Thanks for your reply, Wenchen. Per our discussion, the decoding-time metric for 
file formats should be put on hold until the data source v2 implementation is 
done, so I just closed [GitHub Pull Request 
#23378|https://github.com/apache/spark/pull/23378].

For `RowDataSourceScanExec`, I opened a preview PR here [GitHub Pull Request 
#23528|https://github.com/apache/spark/pull/23528], but while working on it I 
found the decoding does not take much time; please take a look at whether it's 
still worth adding this metric for `RowDataSourceScanExec`. Thanks.

> Scan: track decoding time for row-based data sources
> 
>
> Key: SPARK-26225
> URL: https://issues.apache.org/jira/browse/SPARK-26225
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Reynold Xin
>Priority: Major
>
> Scan node should report decoding time for each record, if it is not too much 
> overhead.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26609) Kinesis-Spark Stream unable to process records

2019-01-12 Thread Aman Mundra (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aman Mundra updated SPARK-26609:

Attachment: 2.PNG
1.PNG

> Kinesis-Spark Stream unable to process records
> --
>
> Key: SPARK-26609
> URL: https://issues.apache.org/jira/browse/SPARK-26609
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.2.0
> Environment:  
> {code:xml}
> <spark.version>2.2.0</spark.version>
>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_${scala.binary.version}</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql_${scala.binary.version}</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-hive_${scala.binary.version}</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-mllib_${scala.binary.version}</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-streaming_${scala.binary.version}</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-streaming-kinesis-asl_2.11</artifactId>
>   <version>${spark.version}</version>
> </dependency>
> <dependency>
>   <groupId>com.databricks</groupId>
>   <artifactId>spark-redshift_2.11</artifactId>
>   <version>3.0.0-preview1</version>
> </dependency>
> <dependency>
>   <groupId>com.amazon.redshift</groupId>
>   <artifactId>redshift-jdbc42</artifactId>
>   <version>1.2.18.1036</version>
> </dependency>
> {code}
>  
>  
> spark.driver.cores=6
> spark.driver.memory=12g
> spark.yarn.driver.memoryOverhead=1g
> spark.driver.maxResultSize=4g
> spark.executor.memory=8g
> spark.executor.cores=4
> spark.yarn.executor.memoryOverhead=1g
> spark.executor.instances=4
> spark.shuffle.service.enabled=true
> spark.shuffle.registration.timeout=600
> spark.sql.shuffle.partitions=8
> spark.scheduler.mode=FIFO
> maximizeResourceAllocation=true
> spark.dynamicAllocation.enabled=true
> spark.dynamicAllocation.executorIdleTimeout=60s
>  
>Reporter: Aman Mundra
>Priority: Major
> Attachments: 1.PNG, 2.PNG
>
>
> I'm trying to consume a Kinesis stream via Spark Streaming and the Amazon KCL lib.
> The streaming job gets stuck at processing as soon as it gets the first batch 
> of non-zero records.
> I'm getting JSON data in my Kinesis stream and here's what I'm trying to 
> achieve:
> get a DStream[Array[Byte]] -> convert to DStream[String] -> RDD -> load as JSON to 
> create a DataFrame and perform transformations.
>  
> Similar error links:
> [https://stackoverflow.com/questions/40225135/spark-streaming-kafka-job-stuck-in-processing-stage]
> I'm running the job in emr-5.8.0 with a sufficient number of cores and executors, 
> but still the job gets stuck in the processing stage and builds a huge pile of 
> queued batches over time.
> It is not able to process even a single record.
>  
> Here's the code I'm using:
>  
>  
> {code:java}
> val numStreams=2
> val sparkStreamingBatchInterval=10
> val kinesisCheckpointInterval=5
>  
> val kinesisStreams = (0 until kinesisConfig("numStreams").toInt).map { i =>
>  KinesisInputDStream.builder
>  .streamingContext(ssc)
>  .endpointUrl(kinesisConfig("endpointUrl"))
>  .regionName(kinesisConfig("regionName"))
>  .streamName(kinesisConfig("streamName"))
>  .initialPositionInStream(InitialPositionInStream.LATEST)
>  .checkpointAppName(kinesisConfig("appName"))
>  
> .checkpointInterval(Seconds(kinesisConfig("kinesisCheckpointInterval").toInt))
>  .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
>  .kinesisCredentials(awsCredentials.build())
>  .build()
> }
> val unionStreams = ssc.union(kinesisStreams)
> val lines = unionStreams.flatMap(byteArray => new String(byteArray).split(" 
> "))
> lines.print(2)
> lines.foreachRDD(rdd => {
>  if
>  (!rdd.partitions.isEmpty){
>  println("New records found\nmetrics count in the batch: 
> %s".format(rdd.count())) //works
>  println("performing transformations")
>  rdd.saveAsTextFile("path")//works
>  import sparkSession.implicits._
>  println(rdd.toString()) //not working
>  val records = rdd.toDF("records") //not working
>  println(records.take(2)) //not working
>  println(records.count()) //not working
>  }
>  else
>  println("No new record found")
> })
>  
> {code}
>  
> Attaching Thread dump:
> h3. Thread dump for executor 2
> Updated at 2019/01/12 10:22:52
>  
> ||Thread ID||Thread Name||Thread State||Thread Locks||
> |65|Executor task launch worker for task 
> 70|WAITING|Lock(java.util.concurrent.ThreadPoolExecutor$Worker@1560902703})|
> |sun.misc.Unsafe.park(Native Method) 
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
>  
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
>  
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
>  java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) 
> org.apache.spark.streaming.receiver.ReceiverSupervisor.awaitTermination(ReceiverSupervisor.scala:219)
>  
> 

[jira] [Created] (SPARK-26609) Kinesis-Spark Stream unable to process records

2019-01-12 Thread Aman Mundra (JIRA)
Aman Mundra created SPARK-26609:
---

 Summary: Kinesis-Spark Stream unable to process records
 Key: SPARK-26609
 URL: https://issues.apache.org/jira/browse/SPARK-26609
 Project: Spark
  Issue Type: Bug
  Components: DStreams
Affects Versions: 2.2.0
 Environment:  
{code:xml}
<spark.version>2.2.0</spark.version>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-hive_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-mllib_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_${scala.binary.version}</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kinesis-asl_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-redshift_2.11</artifactId>
  <version>3.0.0-preview1</version>
</dependency>
<dependency>
  <groupId>com.amazon.redshift</groupId>
  <artifactId>redshift-jdbc42</artifactId>
  <version>1.2.18.1036</version>
</dependency>
{code}
 

 

spark.driver.cores=6
spark.driver.memory=12g
spark.yarn.driver.memoryOverhead=1g
spark.driver.maxResultSize=4g

spark.executor.memory=8g
spark.executor.cores=4
spark.yarn.executor.memoryOverhead=1g
spark.executor.instances=4

spark.shuffle.service.enabled=true
spark.shuffle.registration.timeout=600
spark.sql.shuffle.partitions=8

spark.scheduler.mode=FIFO

maximizeResourceAllocation=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.executorIdleTimeout=60s

 
Reporter: Aman Mundra


I'm trying to consume a Kinesis stream via Spark Streaming and the Amazon KCL lib.

The streaming job gets stuck at processing as soon as it gets the first batch of 
non-zero records.

 

Here's the code I'm using:

 

 
{code:java}
val numStreams=2
val sparkStreamingBatchInterval=10
val kinesisCheckpointInterval=5
 
val kinesisStreams = (0 until kinesisConfig("numStreams").toInt).map { i =>
 KinesisInputDStream.builder
 .streamingContext(ssc)
 .endpointUrl(kinesisConfig("endpointUrl"))
 .regionName(kinesisConfig("regionName"))
 .streamName(kinesisConfig("streamName"))
 .initialPositionInStream(InitialPositionInStream.LATEST)
 .checkpointAppName(kinesisConfig("appName"))
 .checkpointInterval(Seconds(kinesisConfig("kinesisCheckpointInterval").toInt))
 .storageLevel(StorageLevel.MEMORY_AND_DISK_2)
 .kinesisCredentials(awsCredentials.build())
 .build()
}
val unionStreams = ssc.union(kinesisStreams)

val lines = unionStreams.flatMap(byteArray => new String(byteArray).split(" "))
lines.print(2)

lines.foreachRDD(rdd => {
 if
 (!rdd.partitions.isEmpty){
 println("New records found\nmetrics count in the batch: 
%s".format(rdd.count())) //works
 println("performing transformations")
 rdd.saveAsTextFile("path")//works
 import sparkSession.implicits._
 println(rdd.toString()) //not working
 val records = rdd.toDF("records") //not working
 println(records.take(2)) //not working
 println(records.count()) //not working
 }
 else
 println("No new record found")
})
 
{code}
 

Attaching Thread dump:
h3. Thread dump for executor 2
Updated at 2019/01/12 10:22:52

 

||Thread ID||Thread Name||Thread State||Thread Locks||
|65|Executor task launch worker for task 
70|WAITING|Lock(java.util.concurrent.ThreadPoolExecutor$Worker@1560902703})|
|sun.misc.Unsafe.park(Native Method) 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
 java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) 
org.apache.spark.streaming.receiver.ReceiverSupervisor.awaitTermination(ReceiverSupervisor.scala:219)
 
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:608)
 
org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:597)
 org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2173) 
org.apache.spark.SparkContext$$anonfun$34.apply(SparkContext.scala:2173) 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) 
org.apache.spark.scheduler.Task.run(Task.scala:108) 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335) 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
java.lang.Thread.run(Thread.java:748)|
|123|Attach Listener|RUNNABLE| |
| |
|75|cw-metrics-publisher|TIMED_WAITING| |
|java.lang.Object.wait(Native Method) 
com.amazonaws.services.kinesis.metrics.impl.CWPublisherRunnable.runOnce(CWPublisherRunnable.java:136)