[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15751124#comment-15751124 ] Hardik Gupta commented on SPARK-13525:
--------------------------------------

I am also facing this issue, running on a cluster of 5 nodes. printSchema works, but head fails.

My code:

{code}
library(SparkR, lib.loc = "/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/R/lib")
sc <- sparkR.init(master = "spark://10.103.25.39:7077",
                  appName = "demo",
                  sparkHome = "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark",
                  sparkEnvir = list(spark.executor.memory = '16m'))
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, iris)
head(df)
printSchema(df)
sparkR.stop()
{code}

I get the following logs:

{code}
# spark-submit --master spark://10.103.25.39:7077 /opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/sample2.R
Loading required package: methods

Attaching package: ‘SparkR’

The following objects are masked from ‘package:stats’:

    filter, na.omit

The following objects are masked from ‘package:base’:

    intersect, rbind, sample, subset, summary, table, transform

Warning message:
package ‘SparkR’ was built under R version 3.2.1
16/12/15 16:55:12 INFO SparkContext: Running Spark version 1.5.0-cdh5.5.1
16/12/15 16:55:13 INFO SecurityManager: Changing view acls to: root
16/12/15 16:55:13 INFO SecurityManager: Changing modify acls to: root
16/12/15 16:55:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
16/12/15 16:55:13 INFO Slf4jLogger: Slf4jLogger started
16/12/15 16:55:14 INFO Remoting: Starting remoting
16/12/15 16:55:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.103.40.40:38177]
16/12/15 16:55:14 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@10.103.40.40:38177]
16/12/15 16:55:14 INFO Utils: Successfully started service 'sparkDriver' on port 38177.
16/12/15 16:55:14 INFO SparkEnv: Registering MapOutputTracker
16/12/15 16:55:14 INFO SparkEnv: Registering BlockManagerMaster
16/12/15 16:55:14 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-54d59b34-eb65-4af4-9f02-07590932fc4f
16/12/15 16:55:14 INFO MemoryStore: MemoryStore started with capacity 530.0 MB
16/12/15 16:55:14 INFO HttpFileServer: HTTP File server directory is /tmp/spark-38ff69cb-a858-430e-9104-7ea7b7a4b3be/httpd-7d4b481b-c3b2-43e4-b2bc-bf499798b1fa
16/12/15 16:55:14 INFO HttpServer: Starting HTTP Server
16/12/15 16:55:14 INFO Server: jetty-8.y.z-SNAPSHOT
16/12/15 16:55:14 INFO AbstractConnector: Started SocketConnector@0.0.0.0:33746
16/12/15 16:55:14 INFO Utils: Successfully started service 'HTTP file server' on port 33746.
16/12/15 16:55:14 INFO SparkEnv: Registering OutputCommitCoordinator
16/12/15 16:55:14 INFO Server: jetty-8.y.z-SNAPSHOT
16/12/15 16:55:14 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
16/12/15 16:55:14 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/12/15 16:55:14 INFO SparkUI: Started SparkUI at http://10.103.40.40:4040
16/12/15 16:55:14 INFO Utils: Copying /opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/sample2.R to /tmp/spark-38ff69cb-a858-430e-9104-7ea7b7a4b3be/userFiles-57695a67-ea9b-484c-af69-94298e02997b/sample2.R
16/12/15 16:55:14 INFO SparkContext: Added file file:/opt/BIG-DATA/spark-1.5.0-bin-hadoop2.6/sample2.R at http://10.103.40.40:33746/files/sample2.R with timestamp 1481801114815
16/12/15 16:55:14 WARN MetricsSystem: Using default name DAGScheduler for source because spark.app.id is not set.
16/12/15 16:55:14 INFO AppClient$ClientEndpoint: Connecting to master spark://10.103.25.39:7077...
16/12/15 16:55:15 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20161215165352-0123
16/12/15 16:55:15 INFO AppClient$ClientEndpoint: Executor added: app-20161215165352-0123/0 on worker-20161208195327-10.103.40.207-7078 (10.103.40.207:7078) with 4 cores
16/12/15 16:55:15 INFO SparkDeploySchedulerBackend: Granted executor ID app-20161215165352-0123/0 on hostPort 10.103.40.207:7078 with 4 cores, 16.0 MB RAM
16/12/15 16:55:15 INFO AppClient$ClientEndpoint: Executor added: app-20161215165352-0123/1 on worker-20161208184218-10.103.25.39-7078 (10.103.25.39:7078) with 4 cores
16/12/15 16:55:15 INFO SparkDeploySchedulerBackend: Granted executor ID app-20161215165352-0123/1 on hostPort 10.103.25.39:7078 with 4 cores, 16.0 MB RAM
16/12/15 16:55:15 INFO AppClient$ClientEndpoint: Executor added: app-20161215165352-0123/2 on worker-20161208195437-10.103.40.186-7078 (10.103.40.186:7078) with 4 cores
16/12/15 16:55:15 INFO SparkDeploySchedulerBackend: Granted executor ID app-20161215165352-0123/2 on hostPort 10.103.40.186:7078 with 4 cores, 16.0 MB RAM
16/12/15 16:55:15 INFO Utils: Successfully
{code}
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15454700#comment-15454700 ] Arihanth Jain commented on SPARK-13525:
---------------------------------------

Sorry, I am not sure how to achieve this. I only got as far as the following:

{code}
#!/bin/bash
setsid Rscript my_SparkR_script.R > logfile.txt &
tail -f logfile.txt
{code}

cat logfile.txt:

{code}
Launching java with spark-submit command /usr/hdp/2.4.0.0-169/spark//bin/spark-submit --driver-memory "12g" sparkr-shell /tmp/Rtmpkyis3T/backend_port415016dcc029
spark.yarn.driver.memoryOverhead is set but does not apply in client mode.
{code}

Could you help me create the right wrapper?

> SparkR: java.net.SocketTimeoutException: Accept timed out when running any
> dataframe function
> --------------------------------------------------------------------------
>
>                 Key: SPARK-13525
>                 URL: https://issues.apache.org/jira/browse/SPARK-13525
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>            Reporter: Shubhanshu Mishra
>              Labels: sparkr
>
> I am following the code steps from this example:
> https://spark.apache.org/docs/1.6.0/sparkr.html
> There are multiple issues:
> 1. The head and summary and filter methods are not overridden by spark. Hence
> I need to call them using `SparkR::` namespace.
> 2. When I try to execute the following, I get errors:
> {code}
> $> $R_HOME/bin/R
> R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
> Copyright (C) 2015 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
>
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
>
> Natural language support but running in an English locale
>
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
>
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
> Welcome at Fri Feb 26 16:19:35 2016
>
> Attaching package: ‘SparkR’
>
> The following objects are masked from ‘package:base’:
>
>     colnames, colnames<-, drop, intersect, rank, rbind, sample, subset,
>     summary, transform
>
> Launching java with spark-submit command /content/smishra8/SOFTWARE/spark/bin/spark-submit --driver-memory "50g" sparkr-shell /tmp/RtmpfBQRg6/backend_portc3bc16f09b1b
> > df <- createDataFrame(sqlContext, iris)
> Warning messages:
> 1: In FUN(X[[i]], ...) :
>   Use Sepal_Length instead of Sepal.Length as column name
> 2: In FUN(X[[i]], ...) :
>   Use Sepal_Width instead of Sepal.Width as column name
> 3: In FUN(X[[i]], ...) :
>   Use Petal_Length instead of Petal.Length as column name
> 4: In FUN(X[[i]], ...) :
>   Use Petal_Width instead of Petal.Width as column name
> > training <- filter(df, df$Species != "setosa")
> Error in filter(df, df$Species != "setosa") :
>   no method for coercing this S4 class to a vector
> > training <- SparkR::filter(df, df$Species != "setosa")
> > model <- SparkR::glm(Species ~ Sepal_Length + Sepal_Width, data = training, family = "binomial")
> 16/02/26 16:26:46 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.net.SocketTimeoutException: Accept timed out
>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>         at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>         at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>         at java.net.ServerSocket.accept(ServerSocket.java:498)
>         at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431)
>         at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15447791#comment-15447791 ] Sun Rui commented on SPARK-13525:
---------------------------------

Yes, if Spark fails to launch the R worker as a process, it should throw an IOException. Throwing SocketTimeoutException therefore implies that the R worker process was launched successfully. But your experiment shows that daemon.R never seems to get a chance to be executed by Rscript. So I guess something goes wrong after Rscript is launched but before it begins to interpret the R script in daemon.R.

Could you write a shell wrapper for Rscript and set the configuration option to point to it? The wrapper writes some debug information to a temp file and in turn launches Rscript. Remember to redirect the stdout and stderr of Rscript for debugging.
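A wrapper along the lines Sun Rui describes might look like the sketch below. Every path in it is an assumption, not something from this thread, and /bin/echo stands in for the real Rscript binary so the demo runs anywhere; on a cluster you would bake in the actual Rscript path and point the configuration option at the wrapper instead.

```shell
#!/bin/bash
# Sketch: generate a debug wrapper for Rscript. The wrapper records how the
# R worker was launched, then runs the interpreter with stdout/stderr
# redirected into the same log for later inspection.
set -e
wrapper=$(mktemp)
log=$(mktemp)
cat > "$wrapper" <<EOF
#!/bin/bash
{
    echo "wrapper invoked at \$(date)"
    echo "args: \$*"
    echo "user: \$(id -un)  cwd: \$(pwd)"
} >> "$log"
# /bin/echo stands in for the real Rscript binary in this demo.
exec /bin/echo "\$@" >> "$log" 2>&1
EOF
chmod +x "$wrapper"
"$wrapper" --vanilla worker.R   # simulate Spark launching the R worker
grep "args:" "$log"
```

If the log file never appears on a worker node, the executor never even attempted to launch the wrapper, which narrows the problem down considerably.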
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445686 ] Arihanth Jain commented on SPARK-13525:
---------------------------------------

I followed this and it fails to create the debug file, so it does not seem to have reached daemon.R, as reported earlier by the other user. On the other hand, I repeated the run using 'spark.sparkr.r.command=/path to/Rscript' and see a different trace:

{code}
16/08/29 13:38:49 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, mastertuconti.jiffybox.net, partition 0,PROCESS_LOCAL, 16535 bytes)
16/08/29 13:38:49 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on mastertuconti.jiffybox.net:44974 (size: 5.7 KB, free: 1247.2 MB)
16/08/29 13:38:50 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, mastertuconti.jiffybox.net): java.io.IOException: Cannot run program "/usr/lib64/R/bin/": error=13, Permission denied
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
        at org.apache.spark.api.r.RRDD$.createRProcess(RRDD.scala:413)
        at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:429)
        at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:63)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=13, Permission denied
        at java.lang.UNIXProcess.forkAndExec(Native Method)
        at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
        at java.lang.ProcessImpl.start(ProcessImpl.java:134)
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
        ... 33 more
16/08/29 13:38:50 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, mastert.jiffybox.net, partition 0,PROCESS_LOCAL, 16535 bytes)
16/08/29 13:38:50 INFO TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2) on executor mastert.jiffybox.net: java.io.IOException (Cannot run program "/usr/lib64/R/bin/": error=13, Permission denied) [duplicate 1]
16/08/29 13:38:50 INFO TaskSetManager: Starting task 0.2 in stage 1.0 (TID 3, mastert.jiffybox.net, partition 0,PROCESS_LOCAL, 16535 bytes)
16/08/29 13:38:50 INFO TaskSetManager: Lost task 0.2 in stage 1.0 (TID 3) on executor mastert.jiffybox.net: java.io.IOException (Cannot run program "/usr/lib64/R/bin/": error=13, Permission denied) [duplicate 2]
16/08/29 13:38:50 INFO TaskSetManager: Starting task 0.3 in stage 1.0 (TID 4, mastert.jiffybox.net, partition 0,PROCESS_LOCAL, 16535 bytes)
16/08/29 13:38:50 INFO TaskSetManager: Lost task 0.3 in stage 1.0 (TID 4) on executor mastert.jiffybox.net: java.io.IOException (Cannot run program "/usr/lib64/R/bin/": error=13, Permission denied) [duplicate 3]
16/08/29 13:38:50 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
16/08/29 13:38:50 INFO
{code}
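The error=13 trace above reads as if the configured command resolves to a directory ("/usr/lib64/R/bin/", note the trailing slash) rather than the Rscript binary inside it. A quick sanity check along these lines might help; the helper function and the paths passed to it are illustrative, not from the thread:

```shell
#!/bin/bash
# Check that a configured R command is an executable regular file; handing
# ProcessBuilder a directory fails with exactly "error=13, Permission denied".
check_r_command() {
    local cmd="$1"
    if [ -d "$cmd" ]; then
        echo "NOT OK: $cmd is a directory (did you mean ${cmd%/}/Rscript?)"
        return 1
    elif [ -f "$cmd" ] && [ -x "$cmd" ]; then
        echo "OK: $cmd is executable"
    else
        echo "NOT OK: $cmd is not an executable file"
        return 1
    fi
}
check_r_command /bin/sh                    # example of a valid executable
check_r_command /usr/lib64/R/bin/ || true  # reproduces the failing shape
```

Running this on each worker node against the value of spark.sparkr.r.command would quickly confirm or rule out a bad path.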
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444579 ] Sun Rui commented on SPARK-13525:
---------------------------------

Could you help do some debugging by modifying the R code? The steps are:

1. Modify /R/lib/SparkR/worker/daemon.R, adding some checkpoints like 'cat("xxx", file="/tmp/sparkr-debug")' around key points:
{code}
p <- parallel:::mcfork()
...
source(script)
...
{code}
2. Zip the /R/lib/SparkR directory to overwrite the existing zip file /R/lib/sparkr.zip.
3. Run your Spark application, and check the local file "/tmp/sparkr-debug" on all your YARN worker nodes (you can limit the number of executors for debugging).

Hope this helps find where the code is broken.
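Step 1 above can also be scripted. The sketch below demonstrates the checkpoint insertion on a stand-in daemon.R in a temp directory; the stand-in file contents are invented for illustration, and in a real run the target would be the worker/daemon.R under the SparkR library directory, followed by re-zipping the SparkR directory over sparkr.zip so executors pick up the modified worker code.

```shell
#!/bin/bash
# Demonstrate inserting a cat(...) checkpoint before source(script) in a
# stand-in daemon.R; the real file lives under .../R/lib/SparkR/worker/.
set -e
work=$(mktemp -d)
daemon="$work/daemon.R"
# Minimal stand-in for the relevant part of daemon.R (invented contents).
printf 'script <- Sys.getenv("SPARKR_WORKER_SCRIPT")\nsource(script)\n' > "$daemon"
# Insert a checkpoint line immediately before source(script); the doubled
# backslash keeps a literal \n inside the generated R string.
sed -i.bak '/source(script)/i cat("about to source worker\\n", file="/tmp/sparkr-debug", append=TRUE)' "$daemon"
cat "$daemon"
```

The same sed edit, pointed at the real daemon.R, keeps a .bak copy so the original worker script can be restored after debugging.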
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441107 ] Arihanth Jain commented on SPARK-13525:
---------------------------------------

I checked localhost and it works. The Spark cluster deployment is managed through YARN.
[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440505 ] Sun Rui commented on SPARK-13525:
---------------------------------

What is your Spark cluster deployment mode: YARN or standalone?
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440476#comment-15440476 ] Sun Rui commented on SPARK-13525:

Another guess: could you check whether "localhost" works for local TCP connections?

> SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
>
> Key: SPARK-13525
> URL: https://issues.apache.org/jira/browse/SPARK-13525
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Reporter: Shubhanshu Mishra
> Labels: sparkr
>
> I am following the code steps from this example:
> https://spark.apache.org/docs/1.6.0/sparkr.html
> There are multiple issues:
> 1. The head, summary, and filter methods are not overridden by Spark, so I need to call them using the `SparkR::` namespace.
> 2. When I try to execute the following, I get errors:
> {code}
> $> $R_HOME/bin/R
> R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
> Copyright (C) 2015 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
> Natural language support but running in an English locale
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
> Welcome at Fri Feb 26 16:19:35 2016
> Attaching package: ‘SparkR’
> The following objects are masked from ‘package:base’:
>     colnames, colnames<-, drop, intersect, rank, rbind, sample, subset, summary, transform
> Launching java with spark-submit command /content/smishra8/SOFTWARE/spark/bin/spark-submit --driver-memory "50g" sparkr-shell /tmp/RtmpfBQRg6/backend_portc3bc16f09b1b
> > df <- createDataFrame(sqlContext, iris)
> Warning messages:
> 1: In FUN(X[[i]], ...) : Use Sepal_Length instead of Sepal.Length as column name
> 2: In FUN(X[[i]], ...) : Use Sepal_Width instead of Sepal.Width as column name
> 3: In FUN(X[[i]], ...) : Use Petal_Length instead of Petal.Length as column name
> 4: In FUN(X[[i]], ...) : Use Petal_Width instead of Petal.Width as column name
> > training <- filter(df, df$Species != "setosa")
> Error in filter(df, df$Species != "setosa") :
>   no method for coercing this S4 class to a vector
> > training <- SparkR::filter(df, df$Species != "setosa")
> > model <- SparkR::glm(Species ~ Sepal_Length + Sepal_Width, data = training, family = "binomial")
> 16/02/26 16:26:46 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
> java.net.SocketTimeoutException: Accept timed out
>     at java.net.PlainSocketImpl.socketAccept(Native Method)
>     at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>     at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>     at java.net.ServerSocket.accept(ServerSocket.java:498)
>     at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431)
>     at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
>     at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
>     at
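The "localhost" question raised above can be tested without Spark at all. The sketch below is purely illustrative (Python is used only for convenience; the timeout value is arbitrary): it binds a loopback server, connects to it by the name "localhost", and performs the same accept() round-trip that the JVM side waits on in the traces above.

```python
# Illustrative loopback check (not SparkR code): verify that "localhost"
# resolves and that a local TCP connect/accept round-trip completes.
import socket

def localhost_roundtrip(timeout=5.0):
    """Return True if a connection to 'localhost' can be accepted locally."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))   # let the OS pick a free port
    server.listen(1)
    server.settimeout(timeout)
    port = server.getsockname()[1]
    # Connect by name, the way a worker process would:
    client = socket.create_connection(("localhost", port), timeout=timeout)
    conn, _ = server.accept()       # the accept() step that times out in RRDD
    conn.close()
    client.close()
    server.close()
    return True

print(localhost_roundtrip())
```

If this raises or times out, the node has a name-resolution or loopback problem independent of Spark.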
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15440090#comment-15440090 ] Arihanth Jain commented on SPARK-13525:

[~sunrui] I have tried setting "spark.sparkr.use.daemon" to false, with no luck. For now I am working around this by creating an R cluster with the base package "parallel" and its makePSOCKcluster function. I believe this gets closer to a finding: when passing the nodes' hostnames, the workers fail with the following error and hang on it forever.

@@@
@ WARNING: POSSIBLE DNS SPOOFING DETECTED! @
@@@
The RSA host key for test02.servers.jiffybox.net has changed, and the key for the corresponding IP address 134.xxx.xx.xxx is unchanged. This could either mean that DNS SPOOFING is happening or the IP address for the host and its host key have changed at the same time. Offending key for IP in /root/.ssh/known_hosts:10
@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! Someone could be eavesdropping on you right now (man-in-the-middle attack)! It is also possible that the RSA host key has just been changed. The fingerprint for the RSA key sent by the remote host is . Please contact your system administrator. Add correct host key in /root/.ssh/known_hosts to get rid of this message. Offending key in /root/.ssh/known_hosts:9 RSA host key for test02.servers.jiffybox.net has changed and you have requested strict checking. Host key verification failed.

The same works fine, and all workers are started, when passing the nodes' IP addresses instead of hostnames.
starting worker pid=32407 on master.jiffybox.net:11575 at 22:18:50.050
starting worker pid=3523 on master.jiffybox.net:11575 at 22:18:50.464
starting worker pid=2583 on master.jiffybox.net:11575 at 22:18:50.885
starting worker pid=5227 on master.jiffybox.net:11575 at 22:18:51.294

The "DNS SPOOFING" issue above was resolved simply by removing the matching entries from .ssh/known_hosts and recreating them for all nodes with "ssh root@hostname". This fixed the previous issue, and I was able to create a socket cluster with 4 nodes (now at port 11977).

starting worker pid=6804 on master.jiffybox.net:11977 at 23:59:23.245
starting worker pid=10257 on master.jiffybox.net:11977 at 23:59:23.668
starting worker pid=9776 on master.jiffybox.net:11977 at 23:59:24.107
starting worker pid=12073 on master.jiffybox.net:11977 at 23:59:24.540

Note: neither the path to Rscript nor any port number was specified.

Unfortunately, this did not resolve the problem with SparkR. It still fails with the existing "java.net.SocketTimeoutException: Accept timed out" issue.
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438331#comment-15438331 ] Sun Rui commented on SPARK-13525:

Sorry, due to missing log information at that point, it is hard to determine the root cause. It seems that you are using an old SparkR version. I am wondering whether it would be possible for you to modify the R code to collect more information? You can also try setting "spark.sparkr.use.daemon" to false to see whether the issue goes away.
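The "spark.sparkr.use.daemon" suggestion above has to be set before SparkR starts. One place to put it, assuming the stock Spark configuration layout (note this is an internal property, so its effect may vary by version), is the defaults file:

```
# conf/spark-defaults.conf -- assumed location; spark.sparkr.use.daemon is
# the internal property named in the comment above
spark.sparkr.use.daemon    false
```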
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15438316#comment-15438316 ] Sun Rui commented on SPARK-13525:

I checked the code and realized that a SocketTimeoutException means the R worker process should have been started; otherwise, a different exception would be thrown from ProcessBuilder.start() if there were any problem starting the R worker process. We don't know the root cause of this kind of issue; it may be due to a bug, or just to problems in the system runtime environment. But at least we can:
1. Update the SparkR setup documentation to state clearly that R must be installed on each node, and that the R worker executable can be configured to a proper path.
2. Update RRunner to enlarge the scope of the try block to cover connection establishment, so that the stderr output of the R process is printed when a SocketTimeoutException is thrown. That will help us find the root cause.
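The failure mode described above can be made concrete with a small sketch. This is not the actual Scala from RRunner/createRWorker; it is an illustrative Python analogue (timeout value arbitrary) of the handshake: the parent opens a listening socket and waits a bounded time for the worker to connect back, and if nothing ever connects, the bounded accept() raises a timeout, the counterpart of the java.net.SocketTimeoutException in the traces above.

```python
# Simplified analogue of the JVM-side handshake: listen, then wait a bounded
# time for the (R) worker process to connect back.
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server.settimeout(0.5)   # Spark bounds the accept() in a similar way

try:
    # No worker process is launched in this sketch, so nothing connects back:
    conn, addr = server.accept()
    outcome = "worker connected"
except socket.timeout:
    # This branch corresponds to the failure reported in this issue:
    # the worker process started, but never connected back to the parent.
    outcome = "accept timed out: worker never connected back"
finally:
    server.close()

print(outcome)
```

The point of Sun Rui's item 2 is that when this branch is taken, the worker's stderr is currently lost, which is exactly the information needed to diagnose why it never connected.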
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437856#comment-15437856 ] Shivaram Venkataraman commented on SPARK-13525:

Yeah, this is related but a slightly different error: it means the R daemons were started, but the workers they forked didn't connect back to the JVM. I think this could happen if the machine runs out of memory, file descriptors, etc., causing a fork to fail.
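The fork-failure hypothesis above can be partially checked on a worker node by inspecting process resource limits. A small illustrative sketch (Python assumed to be available on the node; the 1024 threshold is an arbitrary example, not a Spark requirement):

```python
# Inspect the limits that, per the comment above, can make a forked worker
# die before it connects back: file descriptors and address space.
import resource

soft_fd, hard_fd = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open-file limit: soft={soft_fd} hard={hard_fd}")

soft_as, _ = resource.getrlimit(resource.RLIMIT_AS)
print("address-space limit:",
      "unlimited" if soft_as == resource.RLIM_INFINITY else soft_as)

# A very low descriptor limit is one plausible reason forked workers fail:
if soft_fd != resource.RLIM_INFINITY and soft_fd < 1024:
    print("warning: low fd limit; forked workers may fail to start")
```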
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437605#comment-15437605 ] Arihanth Jain commented on SPARK-13525:

[~shivaram] I have a similar trace, please see below:

ERROR RBackendHandler: fitRModelFormula on org.apache.spark.ml.api.r.SparkRWrappers failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, test.jiffybox.net): java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:404)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:71)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(Ma

The only difference I observed compared to [~vmenda]'s trace involves the line "at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426)", which does not appear here. Does this indicate that R workers were actually started on the worker machines?
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316908#comment-15316908 ] Shivaram Venkataraman commented on SPARK-13525:

[~vmenda] Your stack trace indicates that the error occurred because the R worker could not be started correctly on the worker machines. This is mostly due to R not being installed on the workers and/or Rscript not being on the PATH.
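The cause described above is easy to test per node. A minimal sketch (Python used only for illustration; "Rscript" is the executable SparkR expects, and the rest of the snippet is a generic PATH lookup):

```python
# Check whether an executable resolves via PATH lookup on this node.
import shutil

def on_path(exe: str) -> bool:
    """Return True if `exe` resolves to an executable via PATH."""
    return shutil.which(exe) is not None

print("Rscript on PATH:", on_path("Rscript"))
```

Run the check on every worker node (for example via a cluster shell), not just the driver, since the error in this issue is raised on the worker side.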
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15316187#comment-15316187 ] menda venugopal commented on SPARK-13525:

Hi, we have a similar kind of issue: Spark ML does not work from SparkR in yarn-client mode, although it works in local mode. Spark Core and Spark SQL work without problems from R in both local and yarn-client mode.

# > head(filter(df01, df01$C3 < 014001))
            C0 C1                         C2    C3                       C4  C5        C6       C7
1 01001A008W-1  1       Lotissement Bellevue 01400 L'Abergement-Clémenciat CAD 46.134565 4.924122
2 01001A008W-2  2       Lotissement Bellevue 01400 L'Abergement-Clémenciat CAD 46.134633 4.924168
3 01001A008W-4  4       Lotissement Bellevue 01400 L'Abergement-Clémenciat CAD 46.134637 4.924353
4 01001A008W-5  5       Lotissement Bellevue 01400 L'Abergement-Clémenciat CAD 46.134518 4.924333
5 01001A008W-6  6       Lotissement Bellevue 01400 L'Abergement-Clémenciat CAD 46.134345 4.924229
6 01001A015D-1  1 Lotissement Les Charmilles 01400 L'Abergement-Clémenciat C+O 46.151573 4.921591
> groupByCode_Postal <- groupBy(df01, df01$C3)
> code_Postal_G <- summarize(groupByCode_Postal, count = count(df01$C3))
> df <- createDataFrame(sqlContext, iris)
Warning messages:
1: In FUN(X[[5L]], ...) : Use Sepal_Length instead of Sepal.Length as column name
2: In FUN(X[[5L]], ...) : Use Sepal_Width instead of Sepal.Width as column name
3: In FUN(X[[5L]], ...) : Use Petal_Length instead of Petal.Length as column name
4: In FUN(X[[5L]], ...) : Use Petal_Width instead of Petal.Width as column name
> model <- glm(Sepal_Length ~ Sepal_Width + Species, data = df, family = "gaussian")
16/05/27 12:33:59 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 5, host-172-30-125-248.openstacklocal): java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
    at java.net.ServerSocket.implAccept(ServerSocket.java:530)
    at java.net.ServerSocket.accept(ServerSocket.java:498)
16/05/27 12:34:29 ERROR TaskSetManager: Task 0 in stage 5.0 failed 4 times; aborting job
16/05/27 12:34:29 ERROR RBackendHandler: fitRModelFormula on org.apache.spark.ml.api.r.SparkRWrappers failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 4 times, most recent failure: Lost task 0.3 in stage 5.0 (TID 8, host-172-30-125-248.openstacklocal): java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
    at java.net.ServerSocket.implAccept(ServerSocket.java:530)
    at java.net.ServerSocket.accept(ServerSocket.java:498)
    at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:426)
    at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(Ma

> SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function
> ---
> Key: SPARK-13525
> URL: https://issues.apache.org/jira/browse/SPARK-13525
> Project: Spark
> Issue Type: Bug
> Components: SparkR
> Reporter: Shubhanshu Mishra
> Labels: sparkr
>
> I am following the code steps from this example:
> https://spark.apache.org/docs/1.6.0/sparkr.html
> There are multiple issues:
> 1. The head, summary and filter methods are not overridden by Spark, so I need to call them using the `SparkR::` namespace.
> 2. When I try to execute the following, I get errors:
> {code}
> $> $R_HOME/bin/R
> R version 3.2.3 (2015-12-10) -- "Wooden Christmas-Tree"
> Copyright (C) 2015 The R Foundation for Statistical Computing
> Platform: x86_64-pc-linux-gnu (64-bit)
> R is free software and comes with ABSOLUTELY NO WARRANTY.
> You are welcome to redistribute it under certain conditions.
> Type 'license()' or 'licence()' for distribution details.
> Natural language support but running in an English locale
> R is a collaborative project with many contributors.
> Type 'contributors()' for more information and
> 'citation()' on how to cite R or R packages in publications.
> Type 'demo()' for some demos, 'help()' for on-line help, or
> 'help.start()' for an HTML browser interface to help.
> Type 'q()' to quit R.
> Welcome at Fri
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297952#comment-15297952 ] Sumit commented on SPARK-13525:

[~shivaram] [~Narine] If you do not see logs from daemon.R, the most likely cause is that Rscript on the worker node is not found or is not executable, so the daemon.R script is never run at all. Try setting the location of Rscript on the worker explicitly using the spark.r.command config option.
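Sumit's spark.r.command suggestion can be applied at submission time. A minimal sketch, where the master URL, the Rscript path, and the script path are all example values to be replaced with your own:

```shell
# Sketch: tell Spark explicitly which R interpreter the workers should use,
# via the spark.r.command option mentioned above. All paths are examples.
spark-submit \
  --master spark://master-host:7077 \
  --conf spark.r.command=/usr/bin/Rscript \
  /path/to/your_script.R
```

Running `which Rscript` and `Rscript --version` on each worker node first helps rule out the missing-or-non-executable case before reaching for this option.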
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297813#comment-15297813 ] Narine Kokhlikyan commented on SPARK-13525:

Thanks for the hint [~shivaram]! It does not seem to reach daemon.R at all; I do not see any of the print-outs:

16/05/24 00:08:14 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.r.RRunner$.createRWorker(RRunner.scala:354)
    at org.apache.spark.api.r.RRunner.compute(RRunner.scala:68)
    at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:49)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
16/05/24 00:08:14 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.net.SocketTimeoutException: Accept timed out
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:4
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15297159#comment-15297159 ] Shivaram Venkataraman commented on SPARK-13525:

If the problem is on line 353, it means that the forked R worker process is not connecting back to the JVM process. The way to debug this is to add some print statements to daemon.R (specifically around https://github.com/apache/spark/blob/37c617e4f580482b59e1abbe3c0c27c7125cf605/R/pkg/inst/worker/daemon.R#L29) and check whether the port number matches what is expected and/or whether the connection is failing for some other reason.
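Shivaram's suggestion amounts to adding print-outs like the following around the worker's connect step. This is an illustrative sketch only: the variable names and exact call shape here are stand-ins, not the actual identifiers in daemon.R at the linked line, so adapt them to the real code.

```r
# Illustrative sketch of the suggested daemon.R print-out debugging.
# SPARKR_WORKER_PORT is assumed to be the env var the JVM passes to the
# R worker; everything else is a hypothetical stand-in for the real code.
port <- as.integer(Sys.getenv("SPARKR_WORKER_PORT"))
write(paste0("daemon.R: attempting to connect to port ", port), stderr())
con <- tryCatch(
  socketConnection(port = port, blocking = TRUE, open = "wrb", timeout = 3600),
  error = function(e) {
    write(paste0("daemon.R: connection failed: ", conditionMessage(e)), stderr())
    stop(e)
  })
write("daemon.R: connected", stderr())
```

If none of these lines show up in the executor logs, that would suggest daemon.R is never launched at all, pointing back at the Rscript location on the workers rather than at the socket itself.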
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296630#comment-15296630 ] Narine Kokhlikyan commented on SPARK-13525:

Hi guys, I'm afraid I'm seeing this issue on my freshly installed Ubuntu 16.04; I saw no issues on Mac OS. It fails here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/r/RRunner.scala#L353 The timeout is already set to 1. [~sunrui], [~shivaram], [~felixcheung], do you have any idea how I could debug this?
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251955#comment-15251955 ] Sumit commented on SPARK-13525:

Trying to figure out how to do that.
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251888#comment-15251888 ] Sun Rui commented on SPARK-13525:

It is expected that read.df() has no such issue, because no R worker is involved in that case. Have you tried setting a larger timeout value?
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251807#comment-15251807 ] Sumit commented on SPARK-13525: --- I doubt it is a lag in starting the R workers, because the operations fail only when I use head, filter, etc. on SparkR DataFrames created from R data frames. If I create a SparkR DataFrame directly from a file, the operations succeed. For example:
{code}
DF <- createDataFrame(sqlContext, faithful)
SparkR::head(DF)   # fails
{code}
But:
{code}
DF <- read.df(sqlContext, , 'json')
SparkR::head(DF)   # succeeds
{code}
I set spark.sparkr.use.daemon = 'false', but that didn't help either.
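The contrast Sumit describes can be written out as a single session (a sketch against the thread's Spark 1.x SparkR API; the JSON path is a hypothetical stand-in for the elided one, and running it requires a working Spark installation):

```r
library(SparkR)
sc <- sparkR.init(master = "local[2]")
sqlContext <- sparkRSQL.init(sc)

# Built from a local R data frame: computing head() must launch an R worker
# on the executor to deserialize the parallelized R data, which is where
# the Accept timed out exception surfaces.
df_local <- createDataFrame(sqlContext, faithful)
SparkR::head(df_local)          # fails in the reports above

# Read directly from a file: head() stays entirely on the JVM side, no R
# worker is needed, so the same call succeeds.
df_file <- read.df(sqlContext, "/tmp/people.json", "json")  # hypothetical path
SparkR::head(df_file)

sparkR.stop()
```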
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15251144#comment-15251144 ] Sun Rui commented on SPARK-13525: - We need to find the root cause before we can fix this. The default timeout for waiting for an R worker's connection is 10 seconds, and it is unlikely that launching an R worker takes more than 10 seconds, so we need to figure out what goes wrong during the launch.
1. Since it takes 10 seconds to fail, could you use the ps tool to check whether the R worker process has actually been launched before the timeout exception happens?
2. Since you are using SparkR built from the master branch, I assume you can modify the source code. Could you try changing line 430 in RRDD.scala:
{code}
serverSocket.setSoTimeout(10000)
{code}
to a much larger timeout value, to see whether the issue goes away?
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174720#comment-15174720 ] Sun Rui commented on SPARK-13525: - The interactive R session is only your driver; Rscript is needed for launching the R workers.
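Sun Rui's point above (the interactive R session is only the driver; executors spawn workers through the `Rscript` binary) can be checked on each node from plain R (a sketch; `Sys.which` returns an empty string when the binary does not resolve):

```r
# Executors launch R workers via the Rscript binary, not via the
# interactive R session, so Rscript must resolve on every worker node.
rscript <- Sys.which("Rscript")
if (nchar(rscript) == 0) {
  message("Rscript not found on PATH; workers cannot be launched")
} else {
  message("Rscript found at: ", rscript)
}
```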
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174193#comment-15174193 ] Shubhanshu Mishra commented on SPARK-13525: --- I am running my code from the interactive R session, so I believe Rscript is not needed.
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174191#comment-15174191 ] Shubhanshu Mishra commented on SPARK-13525: --- Hi [~felixcheung], I used the steps from the link and was able to get the head method exported correctly. However, I still get the following error when I call head:
{code}
16/03/01 12:40:24 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
16/03/01 12:40:24 ERROR RBackendHandler: dfToCols on org.apache.spark.sql.api.r.SQLUtils failed
Error in invokeJava(isStatic = TRUE, className, methodName, ...) :
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost): java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:431)
        at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:62)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> q()
Save workspace image? [y/n/c]: n
{code}
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171294#comment-15171294 ] Sun Rui commented on SPARK-13525: - Is the path to your Rscript in $PATH? If not, you can set "spark.r.command" to point to it.
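If `Rscript` is not on `$PATH`, the worker command can be pointed at it explicitly at initialization, along the lines suggested above (a sketch; the exact config key may vary across 1.x releases — some builds read `spark.sparkr.r.command` — and the path shown is hypothetical):

```r
library(SparkR)

# Tell executors where to find the R worker binary instead of relying on
# Rscript being resolvable from each worker node's PATH.
sc <- sparkR.init(
  master = "spark://10.103.25.39:7077",
  sparkEnvir = list(
    "spark.r.command" = "/usr/local/bin/Rscript"  # hypothetical path
  )
)
```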
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15170860#comment-15170860 ] Felix Cheung commented on SPARK-13525: -- It's odd that `head` is not exported. Could you try running it as described here: https://spark.apache.org/docs/1.6.0/sparkr.html#starting-up-from-rstudio