How does Spark calculate partition size automatically?
Hi,

When I am running a job that loads data from Cassandra, Spark creates almost 9 million partitions. How does Spark decide the partition count? I have read in one of the presentations that it is good to have 1,000 to 10,000 partitions.

Regards,
Raj

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-Spark-Calculate-partition-size-automatically-tp21109.html
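If the connector hands you far more partitions than you want, you can inspect and collapse them after loading. A rough sketch, assuming the DataStax spark-cassandra-connector (sc.cassandraTable); the keyspace/table names and the 1000 target are placeholders for illustration:

    import com.datastax.spark.connector._

    val rows = sc.cassandraTable("my_keyspace", "my_table")  // placeholder keyspace/table
    println(rows.partitions.length)                          // how many partitions the connector actually created
    val compacted = rows.coalesce(1000)                      // collapse to ~1000 partitions without a full shuffle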
Finding S3 file attributes from Spark
Hi,

We have a file in an AWS S3 bucket that is loaded frequently. When accessing that file from Spark, can we get the file properties through some method in Spark?

Regards,
Raj

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Find-S3-file-attributes-by-Spark-tp21039.html
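Spark itself does not expose file metadata, but since it reads S3 through the Hadoop FileSystem API you can query that directly with Spark's Hadoop configuration. A minimal sketch; the bucket/key is hypothetical and it assumes your S3 credentials are already set in the Hadoop configuration:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val path = new Path("s3n://my-bucket/my-file.csv")       // hypothetical bucket and key
    val fs = path.getFileSystem(sc.hadoopConfiguration)      // reuse Spark's Hadoop/S3 settings
    val status = fs.getFileStatus(path)
    println(s"size=${status.getLen} bytes, modified=${status.getModificationTime}")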
Re: API to get the status of Spark workers
You can use port 4040; it gives information for the currently running application, including a detailed summary of the currently running executors.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Api-to-get-the-status-of-spark-workers-tp20967p20980.html
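If you need similar information programmatically rather than through the web UI on 4040, one option is the SparkContext's executor memory status. A small sketch; note it reports block-manager memory per executor, not full worker health:

    // Each key is the "host:port" of an executor's block manager;
    // the value is (maximum memory, remaining memory) in bytes.
    sc.getExecutorMemoryStatus.foreach { case (executor, (max, remaining)) =>
      println(s"$executor: max=$max, remaining=$remaining")
    }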
Timeout Exception in standalone cluster
Hi,

I am getting the following exception in a Spark (1.1.0) job that is running on a standalone cluster. My cluster configuration is 5 machines, each with an Intel(R) 2.50GHz 4-core CPU and 16 GB RAM.

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException: Unknown exception in doAs
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1134)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.security.PrivilegedActionException: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    ... 4 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Timeout-Exception-in-standalone-cluster-tp20979.html
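One workaround people try for this class of 30-second "Futures timed out" errors in Spark 1.x is raising the Akka ask/lookup timeouts. A hedged sketch only; the exact setting that applies depends on where the timeout is hit, and the property names below should be double-checked against your version's configuration docs:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .set("spark.akka.askTimeout", "120")     // seconds; assumed to default to 30 in Spark 1.x
      .set("spark.akka.lookupTimeout", "120")  // seconds
    val sc = new SparkContext(conf)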
Re: Failed to read chunk exception
I am facing the same issue on spark-1.1.0:

14/12/29 20:44:31 INFO scheduler.TaskSetManager: Starting task 5.0 in stage 1.1 (TID 1373, X.X.X.X, ANY, 2185 bytes)
14/12/29 20:44:31 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 3.0 (TID 1367, X.X.X.X): java.io.IOException: failed to read chunk
    org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:348)
    org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:384)
    java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2293)
    java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2586)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Failed-to-read-chunk-exception-tp20374p20891.html
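Since the failure happens inside Snappy decompression of shuffle data, one way to narrow it down is to switch the I/O compression codec and see whether the error persists. A sketch of that workaround; the codec choice is an assumption for diagnosis, not a confirmed fix for the underlying cause:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .set("spark.io.compression.codec", "lzf")  // use LZF instead of the default Snappy for shuffle/broadcast compression
    val sc = new SparkContext(conf)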