Re: Kerberos setup in Apache Spark connecting to remote HDFS/YARN
A little more progress... I added a few environment variables; now I get the following error message:

    InvocationTargetException: Can't get Master Kerberos principal for use as renewer -> [Help 1]
Re: Kerberos setup in Apache Spark connecting to remote HDFS/YARN
Rest of the stack trace:

    [WARNING] java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:294)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.lang.IllegalArgumentException: Can't get Kerberos realm
        at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:65)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:214)
        at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:275)
        at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:269)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:820)
        at org.apache.spark.examples.SparkYarn$.launchClient(SparkYarn.scala:57)
        at org.apache.spark.examples.SparkYarn$.main(SparkYarn.scala:84)
        at org.apache.spark.examples.SparkYarn.main(SparkYarn.scala)
        ... 6 more
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:75)
        at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:63)
        ... 14 more
    Caused by: KrbException: Cannot locate default realm
        at sun.security.krb5.Config.getDefaultRealm(Config.java:1029)

I did add krb5.conf to the classpath, as well as define KRB5_CONFIG.
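One thing worth checking, as a hedged suggestion: the JVM's built-in Kerberos code does not read the KRB5_CONFIG environment variable (that is honored by the MIT krb5 command-line tools); it resolves the realm from the java.security.krb5.conf system property or from platform defaults such as /etc/krb5.conf. A minimal sketch, assuming the config file lives at /etc/krb5.conf:

    // Point the JVM at the Kerberos config explicitly before touching UGI.
    System.setProperty("java.security.krb5.conf", "/etc/krb5.conf")
    // Alternatively, name the realm and KDC directly (values are placeholders):
    // System.setProperty("java.security.krb5.realm", "FULLY.QUALIFIED.DOMAIN")
    // System.setProperty("java.security.krb5.kdc", "kdc.fully.qualified.domain")

Either form should let sun.security.krb5.Config locate the default realm.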
Kerberos setup in Apache Spark connecting to remote HDFS/YARN
I am trying to set up my IDE to run a Scala Spark application. I want to access HDFS files on a remote Hadoop server that has Kerberos enabled. My understanding is that I should be able to do that from Spark. Here is my code so far:

    val sparkConf = new SparkConf().setAppName(appName).setMaster(master)
    if (jars.length > 0) {
      sparkConf.setJars(jars)
    }
    if (!properties.isEmpty) {
      for ((k, v) <- properties) sparkConf.set(k, v)
    } else {
      sparkConf
        .set("spark.executor.memory", "1024m")
        .set("spark.cores.max", "1")
        .set("spark.default.parallelism", "4")
    }
    try {
      if (!StringUtils.isBlank(principal) && !StringUtils.isBlank(keytab)) {
        //UserGroupInformation.setConfiguration(config)
        UserGroupInformation.loginUserFromKeytab(principal, keytab)
      }
    } catch {
      case ioe: IOException =>
        println("Failed to login to Hadoop [principal = " + principal + ", keytab = " + keytab + "]")
        ioe.printStackTrace()
    }
    val sc = new SparkContext(sparkConf)
    val MY_FILE: String = "hdfs://remoteserver:port/file.out"
    val rDD = sc.textFile(MY_FILE, 10)
    println("Lines " + rDD.count)

I have core-site.xml on my classpath. I changed hadoop.ssl.enabled to false, as it was expecting a secret key. The principal I am using is correct; I tried both username/_HOST@fully.qualified.domain and username@fully.qualified.domain with no success. I tried running Spark in local mode and in yarn-client mode. I am hoping someone has a recipe for this or has solved this problem; any pointers to help set up or debug it will be helpful. I am getting the following error message:

    Caused by: java.lang.IllegalArgumentException: Can't get Kerberos realm
        at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:65)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:227)
        at org.apache.hadoop.security.UserGroupInformation.setConfiguration(UserGroupInformation.java:249)
        at org.apache.spark.examples.SparkYarn$.launchClient(SparkYarn.scala:55)
        at org.apache.spark.examples.SparkYarn$.main(SparkYarn.scala:83)
        at org.apache.spark.examples.SparkYarn.main(SparkYarn.scala)
        ... 6 more
    Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:75)
        at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:63)
        ... 11 more
    Caused by: KrbException: Cannot locate default realm
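For reference, the usual pattern is to hand UserGroupInformation an explicit Hadoop Configuration (the setConfiguration call commented out above) before logging in. A hedged sketch, with the file paths as assumptions:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.security.UserGroupInformation

    // Load the remote cluster's client configs; paths are placeholders.
    val hadoopConf = new Configuration()
    hadoopConf.addResource(new Path("/path/to/core-site.xml"))
    hadoopConf.addResource(new Path("/path/to/hdfs-site.xml"))
    hadoopConf.set("hadoop.security.authentication", "kerberos")

    UserGroupInformation.setConfiguration(hadoopConf)
    UserGroupInformation.loginUserFromKeytab(principal, keytab)

Note that even with this in place, the login still depends on the JVM being able to resolve the Kerberos realm (krb5.conf), per the stack trace above.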
This post has NOT been accepted by the mailing list yet.
I seem to see this for many of my posts... does anyone have a solution?
Re: SparkR Error in sparkR.init(master="local") in RStudio
I couldn't get this working... I have JAVA_HOME set, and I have defined SPARK_HOME:

    Sys.setenv(SPARK_HOME="c:\\DevTools\\spark-1.5.1")
    .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
    library("SparkR", lib.loc="c:\\DevTools\\spark-1.5.1\\lib")
    library(SparkR)
    sc <- sparkR.init(master="local")

I get:

    Error in sparkR.init(master = "local") : JVM is not ready after 10 seconds

What am I missing??
Re: SparkR Error in sparkR.init(master="local") in RStudio
It seems it is failing at:

    path <- tempfile(pattern = "backend_port")

I do not see a backend_port file being created...
Loading status
I am not sure what the Loading status means, followed by Running. In the application UI, I see:

    Executor Summary
    ExecutorID  Worker                                                         Cores  Memory    State    Logs
    1           worker-20150202144112-hadoop-w-1.c.fi-mdd-poc.internal-38874   16     83971 MB  LOADING  stdout stderr
    0           worker-20150202144112-hadoop-w-2.c.fi-mdd-poc.internal-58685   16     83971 MB  RUNNING  stdout stderr

Looking at the executor on hadoop-w-2, I see the status is LOADING. Why the different statuses, and what do they mean? Please see below for details:

    ID: worker-20150202144112-hadoop-w-2.c.fi-mdd-poc.internal-58685
    Master URL: spark://hadoop-m:7077
    Cores: 16 (16 Used)
    Memory: 82.0 GB (82.0 GB Used)

    Running Executors (1)
    ExecutorID  Cores  State    Memory   Job Details                                                                      Logs
    0           16     LOADING  82.0 GB  ID: app-20150202152154-0001, Name: Simple File Merge Application, User: hadoop  stdout stderr

Thank you,
Ami
ExternalSorter - spilling in-memory map
I am using Spark 1.2, and I see a lot of messages like:

    ExternalSorter: Thread 66 spilling in-memory map of 5.0 MB to disk (13160 times so far)

I seem to have a lot of memory:

    URL: spark://hadoop-m:7077
    Workers: 4
    Cores: 64 Total, 64 Used
    Memory: 328.0 GB Total, 327.0 GB Used

    Executors (4)
    Memory: 3.5 GB Used (114.8 GB Total)
    Executor ID  Address     RDD Blocks  Memory Used
    0            hadoop-w-1  176         2.8 GB / 28.8 GB
    1            hadoop-w-0  42          680.9 MB / 28.8 GB
    2            hadoop-w-2  0           0.0 B / 28.8 GB
    driver       hadoop-w-3  0           0.0 B / 28.4 GB

Also, I am not sure why hadoop-w-2 is in the Loading state?

Thanks,
Ami
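For context, a hedged note: in Spark 1.x the spill threshold is not driven by total cluster memory but by the shuffle memory pool on each executor, sized by spark.shuffle.memoryFraction (default 0.2 of the heap) and shared across the tasks running on that executor, so frequent small spills can happen even while storage memory sits mostly idle. A minimal sketch of the knobs involved (the values are illustrative, not recommendations):

    val conf = new org.apache.spark.SparkConf()
      // Give aggregation buffers a larger slice of the heap before they spill.
      .set("spark.shuffle.memoryFraction", "0.4")
      // Correspondingly shrink the cache region if little is persisted.
      .set("spark.storage.memoryFraction", "0.4")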
Re: Any ideas why a few tasks would stall
This did not work for me; that is, rdd.coalesce(200, forceShuffle). Does anyone have ideas on how to distribute your data evenly and co-locate partitions of interest?
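One hedged approach for the even-distribution half of the question, assuming the data can be keyed: explicitly hash-partition the pair RDD so rows are spread by key rather than by input-split boundaries. Names and counts below are illustrative:

    import org.apache.spark.HashPartitioner

    // Spread records across 200 partitions by key; persisting keeps the
    // partitioning, so later joins on the same key stay co-located.
    val spread = pairRdd.partitionBy(new HashPartitioner(200)).persist()

Co-locating two datasets then amounts to giving both the same partitioner before joining them.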
Re: Help understanding - Not enough space to cache rdd
Hmm... 33.6 GB is the sum of the memory used by the two RDDs that are cached. You're right: when I put serialized RDDs in the cache, the memory footprint for these RDDs becomes a lot smaller. The serialized memory footprint is shown below:

    RDD Name  Storage Level                    Cached Partitions  Fraction Cached  Size in Memory  Size in Tachyon  Size on Disk
    2         Memory Serialized 1x Replicated  239                100%             3.1 GB          0.0 B            0.0 B
    5         Memory Serialized 1x Replicated  100                100%             1254.9 MB       0.0 B            0.0 B

I don't know what 73.7 is reflective of. In the application UI, I can verify 4.3 GB Used out of (73.7 GB Total) by the cached RDDs, but I am not sure how that 73.7 is calculated. I have the following configuration:

    conf.set("spark.storage.memoryFraction", "0.9");
    conf.set("spark.shuffle.memoryFraction", "0.1");

Based on my understanding, 0.9 * 95g (memory allocated to the driver) = 85.5g should be the available memory, correct? Out of which 10% is taken out for shuffle (85.5 - 8.55 = 76.95), which would lead to 76.95 GB of usable memory. Is that right? The two RDDs that are cached are not using nearly that much. The two systemic problems I am trying to avoid are the Integer.MAX_VALUE limit and "Requested array size exceeds VM limit". No matter how much I tweak the parallelism/memory configuration, there seems to be little or no impact. Is there someone who can help me understand the internals, so that I can get this working? I know this platform is a great, viable solution for the use case we have in mind, if I can get it running successfully. At this point, the data size is not that huge compared to some white papers that are published, so I am thinking it boils down to the configuration and validating what I have with an expert. We can take this offline if need be; please feel free to email me directly.

Thank you,
Ami
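A hedged guess at where the 73.7 GB figure comes from: in Spark 1.x the storage region is sized as maxMemory * spark.storage.memoryFraction * spark.storage.safetyFraction, where the safety fraction defaults to 0.9 and maxMemory is what the JVM actually reports (Runtime.getRuntime.maxMemory), which is noticeably less than the -Xmx value. Assuming the JVM reports roughly 91 GB of the 95 GB heap:

    91 GB * 0.9 (memoryFraction) * 0.9 (safetyFraction) ≈ 73.7 GB

which matches the UI total. The 91 GB figure is an assumption made to fit; the formula itself is the Spark 1.x static memory model.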
Re: Help understanding - Not enough space to cache rdd
I think the memory calculation is correct; what I didn't account for is the memory already used. I am still puzzled as to how I can successfully process the RDD in Spark.
Help understanding - Not enough space to cache rdd
I am running in local mode, on a Google n1-highmem-16 (16 vCPU, 104 GB memory) machine. I have allocated SPARK_DRIVER_MEMORY=95g. I see "Memory: 33.6 GB Used (73.7 GB Total)" for the executor. In the log output below, 33.6 GB of blocks are used by the 2 RDDs I have cached, so I should still have 40.1 GB left. However, I see messages like:

    14/12/02 18:15:04 WARN storage.MemoryStore: Not enough space to cache rdd_15_9 in memory! (computed 8.1 GB so far)
    14/12/02 18:15:04 INFO storage.MemoryStore: Memory use = 33.6 GB (blocks) + 40.1 GB (scratch space shared across 14 thread(s)) = 73.7 GB. Storage limit = 73.7 GB.
    14/12/02 18:15:04 WARN spark.CacheManager: Persisting partition rdd_15_9 to disk instead.

Further down I see:

    14/12/02 18:30:08 INFO storage.BlockManagerInfo: Added rdd_15_9 on disk on localhost:41889 (size: 6.9 GB)
    14/12/02 18:30:08 INFO storage.BlockManagerMaster: Updated info of block rdd_15_9
    14/12/02 18:30:08 ERROR executor.Executor: Exception in task 9.0 in stage 2.0 (TID 348)
    java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE

There are a couple of things I don't understand:

1) In this case, I am joining 2 RDDs (16.3 GB and 17.2 GB), both created by reading HDFS files. The size of each .part file is 24.87 MB, and I am reading these files into 250 partitions, so no individual input partition should be over 25 MB. How could rdd_15_9 reach 8.1 GB?

2) Even if the data is 8.1 GB, Spark should have enough memory to write it; I would instead expect to hit the Integer.MAX_VALUE 2 GB limitation. However, I don't get that error at first, and a partial dataset (6.9 GB) is written to disk. I don't understand how and why only a partial dataset is written.

3) Why do I get java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE after writing the partial dataset?

I would love to hear from anyone who can shed some light on this...
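A hedged note on the 2 GB ceiling, in case it helps frame the question: Spark 1.x keeps each cached or disk-read block behind a single byte array/ByteBuffer, so any one partition whose serialized size approaches Integer.MAX_VALUE bytes will fail regardless of free memory. If join skew is concentrating data into a few partitions, one workaround is to raise the partition count of the join output itself, not just the inputs. A sketch with illustrative names and counts:

    import org.apache.spark.storage.StorageLevel

    // "joined" stands in for the RDD whose partition rdd_15_9 reached 8.1 GB.
    val smaller = joined.repartition(2000)   // keep each partition well under 2 GB
    smaller.persist(StorageLevel.MEMORY_AND_DISK_SER)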
packaging from source gives protobuf compatibility issues.
    scala> textFile.count()
    java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$CompleteRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;

I tried:

    ./make-distribution.sh -Dhadoop.version=2.5.0

and:

    /usr/local/apache-maven-3.2.3/bin/mvn -Dhadoop.version=2.5.0 -DskipTests clean package

Both give the same errors. I am connecting to HDFS hosted on Hadoop version 2.5.0. I will appreciate any help anyone can provide!

Thanks,
Ami
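For what it's worth, this VerifyError is the classic signature of a protobuf 2.4 vs. 2.5 mismatch: Hadoop 2.x is built against protobuf 2.5, while a default Spark 1.x build can pull in 2.4. A hedged suggestion, assuming the Spark 1.x Maven profiles: add the hadoop-2.4 profile (documented for Hadoop 2.4.x and later), which also aligns the protobuf version, e.g. `mvn -Phadoop-2.4 -Dhadoop.version=2.5.0 -DskipTests clean package`.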
Re: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
> only option is to split you problem further by increasing parallelism

My understanding is that this means increasing the number of partitions; is that right? That didn't seem to help, because it seems the partitions are not uniformly sized. My observation is that when I increase the number of partitions, it creates many empty partitions, and my larger partitions are not broken down into smaller ones. Any hints on how I can get uniform partitions? I noticed many threads on this, but was not able to do anything effective from the Java API. I will appreciate any help/insight you can provide.
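One hedged technique for skew like this is key salting: append a small random component to hot keys so the hash partitioner can split them, then strip the salt after the heavy operation. A minimal Scala sketch (names, salt width, and partition count are illustrative):

    import org.apache.spark.HashPartitioner
    import scala.util.Random

    // Widen each key with a salt so one hot key maps to up to 16 partitions.
    val salted = pairRdd.map { case (k, v) => ((k, Random.nextInt(16)), v) }
    val spread = salted.partitionBy(new HashPartitioner(200))
    // Drop the salt once the skew-sensitive work is done.
    val unsalted = spread.map { case ((k, _), v) => (k, v) }

The same idea is expressible from the Java API with Tuple2 keys and a custom Partitioner.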
SparkSubmitDriverBootstrapper and JVM parameters
When I execute /usr/local/spark-1.1.0/bin/spark-submit local[32] for my app, I see two processes get spun off:

    /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java org.apache.spark.deploy.SparkSubmitDriverBootstrapper
    org.apache.spark.deploy.SparkSubmit

My understanding is that the first one is the driver and the latter is the executor; can you confirm? If that is true, my application defaults don't seem to be picked up from the following parameters; SparkSubmit picks up its JVM parameters from here.

spark-defaults.conf:

    spark.daemon.memory=45g
    spark.driver.memory=45g
    spark.executor.memory=45g

spark-env.sh:

    SPARK_DAEMON_MEMORY=30g
    SPARK_EXECUTOR_MEMORY=30g
    SPARK_DRIVER_MEMORY=30g

It is not clear to me when Spark uses spark-defaults and when it uses spark-env; can someone help me understand? I am running into GC/OOM issues, and I am wondering whether tweaking the SparkSubmitDriverBootstrapper or SparkSubmit JVM parameters will help. I did look at the configuration page on Spark's site and tried many different approaches suggested there.

Thanks,
Ami
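A hedged note on precedence, per the Spark 1.x configuration docs: properties set on the SparkConf inside the application take the highest precedence, then flags passed to spark-submit, then entries in spark-defaults.conf; the SPARK_*_MEMORY variables in spark-env.sh act as fallback defaults for the launch scripts and daemons. So one way to take the guesswork out is to pass memory explicitly on the command line (the class and jar names below are placeholders):

    /usr/local/spark-1.1.0/bin/spark-submit --master local[32] --driver-memory 45g \
      --class main.java.MyAppMainProcess MyApp.jar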
Re: CANNOT FIND ADDRESS
No luck :(! Still observing the same behavior!
OOM - Requested array size exceeds VM limit
I am running local (client) mode. My VM has 16 CPUs / 108 GB RAM. My configuration is as follows:

    spark.executor.extraJavaOptions  -XX:+PrintGCDetails -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC -XX:MaxPermSize=1024m
    spark.daemon.memory=20g
    spark.driver.memory=20g
    spark.executor.memory=20g

    export SPARK_DAEMON_JAVA_OPTS="-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC -XX:MaxPermSize=1024m"

    /usr/local/spark-1.1.0/bin/spark-submit --class main.java.MyAppMainProcess --master local[32] MyApp.jar myapp.out

    14/11/03 20:45:43 INFO BlockManager: Removing block broadcast_4
    14/11/03 20:45:43 INFO MemoryStore: Block broadcast_4 of size 3872 dropped from memory (free 16669590422)
    14/11/03 20:45:43 INFO ContextCleaner: Cleaned broadcast 4
    14/11/03 20:46:00 WARN BlockManager: Putting block rdd_19_5 failed
    14/11/03 20:46:00 ERROR Executor: Exception in task 5.0 in stage 3.0 (TID 70)
    java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
        at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
        at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:110)
        at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1047)
        at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1056)
        at org.apache.spark.storage.TachyonStore.putIterator(TachyonStore.scala:60)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:743)
        at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:594)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:145)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

It is hard to see from this output what stage it fails in, but the output step is saving a text file. Individual records (key, value, or key and value) are relatively small, but the number of records in the collection is large. There seems to be a bottleneck I have run into that I can't seem to get past.
Any pointers in the right direction will be helpful!

Thanks,
Ami
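A hedged observation on the stack trace: the failure is in BlockManager.dataSerialize buffering a whole partition into one ByteArrayOutputStream (here on the TachyonStore path), and Java arrays cap out near Integer.MAX_VALUE bytes, so a single partition whose serialized form approaches 2 GB will always die here. The usual mitigation is more, smaller partitions at the shuffle. A sketch (the counts are illustrative):

    // Either globally, before building the RDDs...
    val conf = new org.apache.spark.SparkConf().set("spark.default.parallelism", "256")
    // ...or per shuffle operation:
    val grouped = pairRdd.groupByKey(256)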
Re: CANNOT FIND ADDRESS
Thanks for the pointers! I did try them, but they didn't seem to help... In my latest try, I am running spark-submit in local mode but see the same message in the Spark app UI (port 4040):

    localhost  CANNOT FIND ADDRESS

In the logs, I see a lot of spilling of in-memory maps to disk. I don't understand why that is the case: there should be over 35 GB of RAM available for input that is not significantly large. It may be linked to the performance issues I am seeing (I have another post seeking advice on that). It seems I am not able to tune Spark sufficiently to execute my process successfully.

    14/10/31 13:45:12 INFO ExternalAppendOnlyMap: Thread 206 spilling in-memory map of 1777 MB to disk (2 times so far)
    14/10/31 13:45:12 INFO ExternalAppendOnlyMap: Thread 228 spilling in-memory map of 393 MB to disk (1 time so far)
    14/10/31 13:45:12 INFO ExternalAppendOnlyMap: Thread 259 spilling in-memory map of 392 MB to disk (2 times so far)
    14/10/31 13:45:14 INFO ExternalAppendOnlyMap: Thread 230 spilling in-memory map of 554 MB to disk (2 times so far)
    14/10/31 13:45:15 INFO ExternalAppendOnlyMap: Thread 235 spilling in-memory map of 3990 MB to disk (1 time so far)
    14/10/31 13:45:15 INFO ExternalAppendOnlyMap: Thread 236 spilling in-memory map of 2667 MB to disk (1 time so far)
    14/10/31 13:45:17 INFO ExternalAppendOnlyMap: Thread 259 spilling in-memory map of 825 MB to disk (3 times so far)
    14/10/31 13:45:24 INFO ExternalAppendOnlyMap: Thread 228 spilling in-memory map of 4618 MB to disk (2 times so far)
    14/10/31 13:45:26 INFO ExternalAppendOnlyMap: Thread 233 spilling in-memory map of 15869 MB to disk (1 time so far)
    14/10/31 13:45:37 INFO ExternalAppendOnlyMap: Thread 206 spilling in-memory map of 3026 MB to disk (3 times so far)
    14/10/31 13:45:48 INFO ExternalAppendOnlyMap: Thread 228 spilling in-memory map of 401 MB to disk (3 times so far)
    14/10/31 13:45:48 INFO ExternalAppendOnlyMap: Thread 259 spilling in-memory map of 392 MB to disk (4 times so far)

My Spark properties are:

    Name                                    Value
    spark.akka.frameSize                    50
    spark.akka.timeout                      900
    spark.app.name                          Simple File Merge Application
    spark.core.connection.ack.wait.timeout  900
    spark.default.parallelism               10
    spark.driver.host                       spark-single.c.fi-mdd-poc.internal
    spark.driver.memory                     35g
    spark.driver.port                       40255
    spark.eventLog.enabled                  true
    spark.fileserver.uri                    http://10.240.106.135:59255
    spark.jars                              file:/home/ami_khandeshi_gmail_com/SparkExample-1.0.jar
    spark.master                            local[16]
    spark.scheduler.mode                    FIFO
    spark.shuffle.consolidateFiles          true
    spark.storage.memoryFraction            0.3
    spark.tachyonStore.baseDir              /tmp
    spark.tachyonStore.folderName           spark-21ad0fd2-2177-48ce-9242-8dbb33f2a1f1
    spark.tachyonStore.url                  tachyon://mdd:19998
CANNOT FIND ADDRESS
The Spark application UI shows that one of the executors Cannot Find Address:

    Aggregated Metrics by Executor
    Executor ID  Address                                 Task Time  Total Tasks  Failed Tasks  Succeeded Tasks  Input  Shuffle Read  Shuffle Write  Shuffle Spill (Memory)  Shuffle Spill (Disk)
    0            mddworker1.c.fi-mdd-poc.internal:42197  0 ms       0            0             0                0.0 B  136.1 MB      184.9 MB       146.8 GB                135.4 MB
    1            CANNOT FIND ADDRESS                     0 ms       0            0             0                0.0 B  87.4 MB       142.0 MB       61.4 GB                 81.4 MB

I also see the following in the logs of one of the executors, with which the driver may have lost communication:

    14/10/29 13:18:33 WARN : Master_Client Heartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 5000
    14/10/29 13:18:33 WARN : WorkerClientToWorkerHeartbeat last execution took 90859 ms. Longer than the FIXED_EXECUTION_INTERVAL_MS 1000
    14/10/29 13:18:33 WARN AkkaUtils: Error sending message in 1 attempts
    java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:362)

I have also seen other variations of timeouts:

    14/10/29 06:21:05 WARN SendingConnection: Error finishing connection to mddworker1.c.fi-mdd-poc.internal/10.240.179.241:40442
    java.net.ConnectException: Connection refused
    14/10/29 06:21:05 ERROR BlockManager: Failed to report broadcast_6_piece0 to master; giving up.

or:

    14/10/29 07:23:40 WARN AkkaUtils: Error sending message in 1 attempts
    java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.util.AkkaUtils$.askWithReply(AkkaUtils.scala:176)
        at org.apache.spark.storage.BlockManagerMaster.askDriverWithReply(BlockManagerMaster.scala:218)
        at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:58)
        at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:310)
        at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:190)
        at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:188)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at org.apache.spark.util.TimeStampedHashMap.foreach(TimeStampedHashMap.scala:107)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.storage.BlockManager.reportAllBlocks(BlockManager.scala:188)
        at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:207)
        at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:366)

How do I track down what is causing this problem? Any suggestions on a solution, debugging, or a workaround will be helpful!
Spark Performance
I am relatively new to Spark processing. I am using the Spark Java API to process data, and I am having trouble processing a data set that I don't think is significantly large: it is joining datasets that are around 3-4 GB each (around 12 GB of data in total). The workflow is:

    x = RDD1.keyBy(x).partitionBy(new HashPartitioner(10)).cache()
    y = RDD2.keyBy(x).partitionBy(new HashPartitioner(10)).cache()
    z = RDD3.keyBy(x).partitionBy(new HashPartitioner(10)).cache()
    o = RDD4.keyBy(y).partitionBy(new HashPartitioner(10)).cache()
    out = x.join(y).join(z).keyBy(y).partitionBy(new HashPartitioner(10)).cache().join(o)
    out.saveAsObjectFile(Out)

The Spark processor seems to be hung at the out= step indefinitely. I am using Kryo for serialization, running local with SPARK_MEM=90g; I have 16 CPUs and 108 GB RAM, and I am saving output to Hadoop. I have also tried a standalone cluster with 2 workers, 8 CPUs, and 52 GB RAM. My VMs are on Google Cloud. Below is the table from the completed stages:

    Stage Id  Description                   Submitted         Duration  Tasks: Succeeded/Total  Input     Shuffle Read  Shuffle Write
    8         keyBy at ProcessA.java:1094   10/27/2014 12:40  2.0 min   10/10
    3         filter at ProcessA.java:1079  10/27/2014 12:40  2.0 min   10/10
    2         keyBy at ProcessA.java:1071   10/27/2014 12:39  39 s      11/11                   268.4 MB                25.7 MB
    1         filter at ProcessA.java:1103  10/27/2014 12:39  16 s      9/9                     58.8 MB                 30.4 MB
    7         keyBy at ProcessA.java:1045   10/27/2014 12:39  32 s      24/24                   2.8 GB                  573.8 MB
    6         keyBy at ProcessA.java:1045   10/27/2014 12:39  40 s      11/11                   268.4 MB                24.5 MB

Some things I don't understand: I see output in the log files indicating it is spilling the in-memory map to disk, and the spill size is greater than the input. I am not sure how to avoid or reduce that. I also tried cluster mode, where I observed the same behavior, so I question whether the processes are running in parallel or serially.

    14/10/27 14:11:33 INFO collection.ExternalAppendOnlyMap: Thread 94 spilling in-memory map of 1000 MB to disk (15 times so far)
    14/10/27 14:11:34 INFO collection.ExternalAppendOnlyMap: Thread 107 spilling in-memory map of 2351 MB to disk (2 times so far)
    14/10/27 14:11:36 INFO collection.ExternalAppendOnlyMap: Thread 94 spilling in-memory map of 1000 MB to disk (16 times so far)
    14/10/27 14:11:37 INFO collection.ExternalAppendOnlyMap: Thread 91 spilling in-memory map of 4781 MB to disk (10 times so far)
    14/10/27 14:11:38 INFO collection.ExternalAppendOnlyMap: Thread 112 spilling in-memory map of 1243 MB to disk (10 times so far)
    14/10/27 14:11:39 INFO collection.ExternalAppendOnlyMap: Thread 94 spilling in-memory map of 983 MB to disk (17 times so far)
    14/10/27 14:11:39 INFO collection.ExternalAppendOnlyMap: Thread 96 spilling in-memory map of 75546 MB to disk (11 times so far)
    14/10/27 14:11:56 INFO collection.ExternalAppendOnlyMap: Thread 106 spilling in-memory map of 2324 MB to disk (7 times so far)
    14/10/27 14:11:56 INFO collection.ExternalAppendOnlyMap: Thread 112 spilling in-memory map of 1729 MB to disk (11 times so far)
    14/10/27 14:11:58 INFO collection.ExternalAppendOnlyMap: Thread 96 spilling in-memory map of 2410 MB to disk (12 times so far)
    14/10/27 14:11:58 INFO collection.ExternalAppendOnlyMap: Thread 91 spilling in-memory map of 1211 MB to disk

I would appreciate any pointers in the right direction!

By the way, I also see error messages like "Not enough space to cache partition rdd_21_4", indicating perhaps nothing is getting cached,
per http://mail-archives.apache.org/mod_mbox/spark-issues/201409.mbox/%3cjira.12744773.141202099.148323.1412021014...@atlassian.jira%3E
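One structural observation, offered as a hedged sketch rather than a diagnosis: chained joins only avoid re-shuffling if every side is partitioned with the same partitioner, and the intermediate keyBy(...).partitionBy(...) re-shuffles the join output again. In Scala terms (the names and keying function are illustrative):

    import org.apache.spark.HashPartitioner

    val p = new HashPartitioner(10)
    // Co-partition every input on the same key with the same partitioner...
    val x = rdd1.keyBy(keyFn).partitionBy(p).cache()
    val y = rdd2.keyBy(keyFn).partitionBy(p).cache()
    // ...so this join can proceed without another full shuffle.
    val joined = x.join(y)

Separately, with only 10 partitions over ~12 GB, each partition averages over a gigabyte; a higher partition count would shrink the per-task in-memory maps that are spilling above.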
Re: CANNOT FIND ADDRESS
Thanks... hmm, it seems to be a timeout issue, perhaps? Not sure what is causing it, or how to debug it. I see the following error message:

    14/10/29 13:26:04 ERROR ContextCleaner: Error cleaning broadcast 9
    akka.pattern.AskTimeoutException: Timed out
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
        at akka.actor.Scheduler$$anon$11.run(Scheduler.scala:118)
        at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:455)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.executeBucket$1(Scheduler.scala:407)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.nextTick(Scheduler.scala:411)
        at akka.actor.LightArrayRevolverScheduler$$anon$12.run(Scheduler.scala:363)
        at java.lang.Thread.run(Thread.java:745)
    14/10/29 13:26:04 WARN BlockManagerMaster: Failed to remove broadcast 9 with removeFromMaster = true - Timed out
spark-submit job Killed
I recently started seeing a new problem where spark-submit is terminated with a "Killed" message, but there is no error message indicating what happened. I have enabled logging in the Spark configuration. Has anyone seen this, or does anyone know how to troubleshoot it?
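A hedged troubleshooting note, not specific to Spark: a bare "Killed" with no exception usually means the operating system terminated the JVM, most often the Linux out-of-memory killer when the driver's heap plus off-heap overhead exceeds the machine's RAM. Checking the kernel log right after the death, e.g. with `dmesg | tail` or `grep -i 'killed process' /var/log/syslog` (log location varies by distro), should show an oom-killer entry if that is the cause; if so, lowering spark.driver.memory to leave headroom under physical RAM is the usual remedy.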
Re: spark-submit job Killed
I did have it as:

    rdd.saveAsTextFile(RDD);

and now I have it as:

    Log.info("RDD Counts " + rdd.persist(StorageLevel.MEMORY_AND_DISK_SER()).count());