RE: spark with breeze error of NoClassDefFoundError
If I tried to change “provided” to “compile”.. then the error changed to : Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 15/11/19 10:28:29 INFO util.Utils: Shutdown hook called Meanwhile, I will prefer to use maven to compile the jar file rather than sbt, although it is indeed another option. Best regards, Jack From: Fengdong Yu [mailto:fengdo...@everstring.com] Sent: Wednesday, 18 November 2015 7:30 PM To: Jack Yang Cc: Ted Yu; user@spark.apache.org Subject: Re: spark with breeze error of NoClassDefFoundError The simplest way is remove all “provided” in your pom. then ‘sbt assembly” to build your final package. then get rid of ‘—jars’ because assembly already includes all dependencies. On Nov 18, 2015, at 2:15 PM, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: So weird. Is there anything wrong with the way I made the pom file (I labelled them as provided)? Is there missing jar I forget to add in “--jar”? 
See the trace below: Exception in thread "main" java.lang.NoClassDefFoundError: breeze/storage/DefaultArrayValue at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: breeze.storage.DefaultArrayValue at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 10 more 15/11/18 17:15:15 INFO util.Utils: Shutdown hook called From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, 18 November 2015 4:01 PM To: Jack Yang Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: spark with breeze error of NoClassDefFoundError Looking in local maven repo, breeze_2.10-0.7.jar contains DefaultArrayValue : jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep !$ jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep DefaultArrayValue 369 Wed Mar 19 11:18:32 PDT 2014 breeze/storage/DefaultArrayValue$mcZ$sp$class.class 309 Wed Mar 19 11:18:32 PDT 2014 breeze/storage/DefaultArrayValue$mcJ$sp.class 2233 Wed Mar 19 11:18:32 PDT 2014 breeze/storage/DefaultArrayValue$DoubleDefaultArrayValue$.class Can you show the complete stack trace ? FYI On Tue, Nov 17, 2015 at 8:33 PM, Jack Yang <j...@uow.edu.au<
RE: Do windowing functions require hive support?
SQLContext implements only a subset of the SQL functionality and does not include window functions. In HiveContext it works fine, though.

From: Stephen Boesch [mailto:java...@gmail.com] Sent: Thursday, 19 November 2015 3:01 PM To: Michael Armbrust Cc: Jack Yang; user Subject: Re: Do windowing functions require hive support?

Why is the same query (and actually i tried several variations) working against a hivecontext and not against the sql context?

2015-11-18 19:57 GMT-08:00 Michael Armbrust <mich...@databricks.com<mailto:mich...@databricks.com>>: Yes they do.

On Wed, Nov 18, 2015 at 7:49 PM, Stephen Boesch <java...@gmail.com<mailto:java...@gmail.com>> wrote: But to focus the attention properly: I had already tried out 1.5.2.

2015-11-18 19:46 GMT-08:00 Stephen Boesch <java...@gmail.com<mailto:java...@gmail.com>>: Checked out 1.6.0-SNAPSHOT 60 minutes ago

2015-11-18 19:19 GMT-08:00 Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>>: Which version of spark are you using?

From: Stephen Boesch [mailto:java...@gmail.com<mailto:java...@gmail.com>] Sent: Thursday, 19 November 2015 2:12 PM To: user Subject: Do windowing functions require hive support?

The following works against a hive table from spark sql:
hc.sql("select id,r from (select id, name, rank() over (order by name) as r from tt2) v where v.r >= 1 and v.r <= 12")
But when using a standard sql context against a temporary table the following occurs:
Exception in thread "main" java.lang.RuntimeException: [3.25] failure: ``)'' expected but `(' found
rank() over (order by name) as r
^
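For reference, here is a minimal, self-contained sketch of the working route described in this thread: running the same rank() query through a HiveContext (Spark 1.4/1.5 era). The object name and app name are made up; the table tt2 and its columns come from the query above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object WindowRankSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("window-rank"))
    // Window functions such as rank() over (...) are parsed by the Hive-backed
    // context; the plain SQLContext parser rejects them in these releases.
    val hc = new HiveContext(sc) // needs a Hive-enabled Spark build on the classpath

    val ranked = hc.sql(
      """select id, r
        |from (select id, name, rank() over (order by name) as r from tt2) v
        |where v.r >= 1 and v.r <= 12""".stripMargin)
    ranked.show()
  }
}
```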
RE: Do windowing functions require hive support?
Which version of spark are you using? From: Stephen Boesch [mailto:java...@gmail.com] Sent: Thursday, 19 November 2015 2:12 PM To: user Subject: Do windowing functions require hive support? The following works against a hive table from spark sql hc.sql("select id,r from (select id, name, rank() over (order by name) as r from tt2) v where v.r >= 1 and v.r <= 12") But when using a standard sql context against a temporary table the following occurs: Exception in thread "main" java.lang.RuntimeException: [3.25] failure: ``)'' expected but `(' found rank() over (order by name) as r ^
RE: spark with breeze error of NoClassDefFoundError
Back to my question. If I use “provided”, the jar file will expect some libraries are provided by the system. However, the “ compiled ” is the default setting, which means the third-party library will be included inside jar file after compiling. So when I use “provided”, the error is they cannot find the Class, but with “compiled” the error is IncompatibleClassChangeError. Ok, so can someone tell me which version of breeze and breeze-math are used in spark 1.4? From: Zhiliang Zhu [mailto:zchl.j...@yahoo.com] Sent: Thursday, 19 November 2015 5:10 PM To: Ted Yu Cc: Jack Yang; Fengdong Yu; user@spark.apache.org Subject: Re: spark with breeze error of NoClassDefFoundError Dear Ted, I just looked at the link you provided, it is great! For my understanding, I could also directly use other Breeze part (except spark mllib package linalg ) in spark (scala or java ) program after importing Breeze package, it is right? Thanks a lot in advance again! Zhiliang On Thursday, November 19, 2015 1:46 PM, Ted Yu <yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote: Have you looked at https://github.com/scalanlp/breeze/wiki Cheers On Nov 18, 2015, at 9:34 PM, Zhiliang Zhu <zchl.j...@yahoo.com<mailto:zchl.j...@yahoo.com>> wrote: Dear Jack, As is known, Breeze is numerical calculation package wrote by scala , spark mllib also use it as underlying package for algebra usage. Here I am also preparing to use Breeze for nonlinear equation optimization, however, it seemed that I could not find the exact doc or API for Breeze except spark linalg package... Could you help some to provide me the official doc or API website for Breeze ? Thank you in advance! Zhiliang On Thursday, November 19, 2015 7:32 AM, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: If I tried to change “provided” to “compile”.. then the error changed to : Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing class at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:800) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) at java.net.URLClassLoader.defineClass(URLClassLoader.java:449) at java.net.URLClassLoader.access$100(URLClassLoader.java:71) at java.net.URLClassLoader$1.run(URLClassLoader.java:361) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 15/11/19 10:28:29 INFO util.Utils: Shutdown hook called Meanwhile, I will prefer to use maven to compile the jar file rather than sbt, although it is indeed another option. 
Best regards, Jack From: Fengdong Yu [mailto:fengdo...@everstring.com] Sent: Wednesday, 18 November 2015 7:30 PM To: Jack Yang Cc: Ted Yu; user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: spark with breeze error of NoClassDefFoundError The simplest way is remove all “provided” in your pom. then ‘sbt assembly” to build your final package. then get rid of ‘—jars’ because assembly already includes all dependencies. On Nov 18, 2015, at 2:15 PM, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: So weird. Is there anything wrong with the way I made the pom file (I labelled them as provided)? Is there missing jar I forget to add in “--jar”? See the trace below: Exception in thread "main" java.lang.NoClassDefFoundError: breeze/storage/DefaultArrayValue at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at
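As a reference point for the scope discussion above, here is a minimal sbt build sketch (sbt build files are plain Scala). It shows one common arrangement: Spark itself marked "provided" (supplied by the cluster at runtime) and breeze left at the default compile scope, so that sbt-assembly bundles it into the fat jar and --jars becomes unnecessary. All names and versions are illustrative assumptions, and this does not answer which breeze version Spark 1.4 itself depends on.

```scala
// build.sbt -- sketch only; project name and versions are assumptions.
name := "textMingApp"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Provided by the cluster at runtime, so not bundled into the assembly jar:
  "org.apache.spark" %% "spark-core"  % "1.4.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.4.0" % "provided",
  // Default (compile) scope: sbt-assembly packs it into the fat jar:
  "org.scalanlp" %% "breeze" % "0.11.2"
)
```

The Maven equivalent is <scope>provided</scope> on the Spark artifacts, the default scope on breeze, and a shaded or assembled jar handed to spark-submit.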
RE: spark with breeze error of NoClassDefFoundError
So weird. Is there anything wrong with the way I made the pom file (I labelled them as provided)? Is there missing jar I forget to add in “--jar”? See the trace below: Exception in thread "main" java.lang.NoClassDefFoundError: breeze/storage/DefaultArrayValue at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.ClassNotFoundException: breeze.storage.DefaultArrayValue at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 10 more 15/11/18 17:15:15 INFO util.Utils: Shutdown hook called From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, 18 November 2015 4:01 PM To: Jack Yang Cc: user@spark.apache.org Subject: Re: spark with breeze error of NoClassDefFoundError Looking in local maven repo, breeze_2.10-0.7.jar contains DefaultArrayValue : jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep !$ jar tvf /Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | grep DefaultArrayValue 369 Wed Mar 19 11:18:32 PDT 2014 breeze/storage/DefaultArrayValue$mcZ$sp$class.class 309 Wed Mar 19 11:18:32 PDT 2014 breeze/storage/DefaultArrayValue$mcJ$sp.class 2233 Wed Mar 19 11:18:32 PDT 2014 breeze/storage/DefaultArrayValue$DoubleDefaultArrayValue$.class Can you show the complete stack trace ? FYI On Tue, Nov 17, 2015 at 8:33 PM, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: Hi all, I am using spark 1.4.0, and building my codes using maven. So in one of my scala, I used: import breeze.linalg._ val v1 = new breeze.linalg.SparseVector(commonVector.indices, commonVector.values, commonVector.size) val v2 = new breeze.linalg.SparseVector(commonVector2.indices, commonVector2.values, commonVector2.size) println (v1.dot(v2) / (norm(v1) * norm(v2)) ) in my pom.xml file, I used: org.scalanlp breeze-math_2.10 0.4 provided org.scalanlp breeze_2.10 0.11.2 provided When submit, I included breeze jars (breeze_2.10-0.11.2.jar breeze-math_2.10-0.4.jar breeze-natives_2.10-0.11.2.jar breeze-process_2.10-0.3.jar) using “--jar” arguments, although I doubt it is necessary to do that. however, the error is Exception in thread "main" java.lang.NoClassDefFoundError: breeze/storage/DefaultArrayValue Any thoughts? Best regards, Jack
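For readability, here is the breeze snippet from the post above as a self-contained sketch. It assumes commonVector and commonVector2 are org.apache.spark.mllib.linalg.SparseVector values (the names come from the original message); the function name is made up.

```scala
import breeze.linalg.{norm, SparseVector => BSV}
import org.apache.spark.mllib.linalg.{SparseVector => MLSparseVector}

// Cosine similarity between two MLlib sparse vectors via breeze, as in the post.
def cosineSimilarity(commonVector: MLSparseVector, commonVector2: MLSparseVector): Double = {
  val v1 = new BSV(commonVector.indices, commonVector.values, commonVector.size)
  val v2 = new BSV(commonVector2.indices, commonVector2.values, commonVector2.size)
  v1.dot(v2) / (norm(v1) * norm(v2))
}
```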
error with saveAsTextFile in local directory
Hi all, I am saving some hive- query results into the local directory: val hdfsFilePath = "hdfs://master:ip/ tempFile "; val localFilePath = "file:///home/hduser/tempFile"; hiveContext.sql(s"""my hql codes here""") res.printSchema() --working res.show() --working res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath) --still working res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath) --wrong! then at last, I get the correct results in hdfsFilePath, but nothing in localFilePath. Btw, the localFilePath was created, but the folder was only with a _SUCCESS file, no part file. See the track: (any thougt?) 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Got job 4 (saveAsTextFile at myApp.scala:112) with 1 output partitions (allowLocal=false) // the 112 line is the place I am using saveAsTextFile function to save the results locally. 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Final stage: ResultStage 42(saveAsTextFile at MyApp.scala:112) 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 41) 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Missing parents: List() 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112), which has no missing parents 15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(160632) called with curMem=3889533, maxMem=280248975 15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28 stored as values in memory (estimated size 156.9 KB, free 263.4 MB) 15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(56065) called with curMem=4050165, maxMem=280248975 15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28_piece0 stored as bytes in memory (estimated size 54.8 KB, free 263.4 MB) 15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in memory on 192.168.70.135:32836 (size: 54.8 KB, free: 266.8 MB) 15/11/04 09:57:41 INFO spark.SparkContext: Created broadcast 28 from broadcast at DAGScheduler.scala:874 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112) 15/11/04 09:57:41 INFO scheduler.TaskSchedulerImpl: Adding task set 42.0 with 1 tasks 15/11/04 09:57:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 42.0 (TID 2018, 192.168.70.129, PROCESS_LOCAL, 5097 bytes) 15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in memory on 192.168.70.129:54062 (size: 54.8 KB, free: 1068.8 MB) 15/11/04 09:57:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 42.0 (TID 2018) in 6362 ms on 192.168.70.129 (1/1) 15/11/04 09:57:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 42.0, whose tasks have all completed, from pool 15/11/04 09:57:47 INFO scheduler.DAGScheduler: ResultStage 42 (saveAsTextFile at MyApp.scala:112) finished in 6.360 s 15/11/04 09:57:47 INFO scheduler.DAGScheduler: Job 4 finished: saveAsTextFile at MyApp.scala:112, took 6.588821 s 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 
15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/rdd,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/storage,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/pool,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped
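For reference, the save pattern from the post sketched as a function. The DataFrame and the tranRow2Str helper are the poster's own; the paths and port are placeholders. One hedged note on the symptom: with a file:// path on a cluster, each task writes its part files to the local disk of the executor that ran it, so the directory on the driver host can end up holding only the _SUCCESS marker.

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Sketch of the two writes described above; tranRow2Str is the poster's own helper.
def saveBoth(res: DataFrame, tranRow2Str: Row => String): Unit = {
  val hdfsFilePath  = "hdfs://master:9000/tempFile"   // placeholder host/port/path
  val localFilePath = "file:///home/hduser/tempFile"

  // To HDFS: all executors write their part files into the shared filesystem.
  res.map(x => tranRow2Str(x)).coalesce(1).saveAsTextFile(hdfsFilePath)

  // To file://: each task writes on the executor that ran it, so on a cluster the
  // driver host may only see _SUCCESS. Running with --master local[*] avoids this.
  res.map(x => tranRow2Str(x)).coalesce(1).saveAsTextFile(localFilePath)
}
```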
RE: error with saveAsTextFile in local directory
Yes. My one is 1.4.0. Then is this problem to do with the version? I doubt that. Any comments please? From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Wednesday, 4 November 2015 11:52 AM To: Jack Yang Cc: user@spark.apache.org Subject: Re: error with saveAsTextFile in local directory Looks like you were running 1.4.x or earlier release because the allowLocal flag is deprecated as of Spark 1.5.0+. Cheers On Tue, Nov 3, 2015 at 3:07 PM, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: Hi all, I am saving some hive- query results into the local directory: val hdfsFilePath = "hdfs://master:ip/ tempFile "; val localFilePath = "file:///home/hduser/tempFile"; hiveContext.sql(s"""my hql codes here""") res.printSchema() --working res.show() --working res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath) --still working res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath) --wrong! then at last, I get the correct results in hdfsFilePath, but nothing in localFilePath. Btw, the localFilePath was created, but the folder was only with a _SUCCESS file, no part file. See the track: (any thougt?) 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Got job 4 (saveAsTextFile at myApp.scala:112) with 1 output partitions (allowLocal=false) // the 112 line is the place I am using saveAsTextFile function to save the results locally. 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Final stage: ResultStage 42(saveAsTextFile at MyApp.scala:112) 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 41) 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Missing parents: List() 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112), which has no missing parents 15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(160632) called with curMem=3889533, maxMem=280248975 15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28 stored as values in memory (estimated size 156.9 KB, free 263.4 MB) 15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(56065) called with curMem=4050165, maxMem=280248975 15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28_piece0 stored as bytes in memory (estimated size 54.8 KB, free 263.4 MB) 15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in memory on 192.168.70.135:32836<http://192.168.70.135:32836> (size: 54.8 KB, free: 266.8 MB) 15/11/04 09:57:41 INFO spark.SparkContext: Created broadcast 28 from broadcast at DAGScheduler.scala:874 15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112) 15/11/04 09:57:41 INFO scheduler.TaskSchedulerImpl: Adding task set 42.0 with 1 tasks 15/11/04 09:57:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 42.0 (TID 2018, 192.168.70.129, PROCESS_LOCAL, 5097 bytes) 15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in memory on 192.168.70.129:54062<http://192.168.70.129:54062> (size: 54.8 KB, free: 1068.8 MB) 15/11/04 09:57:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 42.0 (TID 2018) in 6362 ms on 192.168.70.129 (1/1) 15/11/04 09:57:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 42.0, whose tasks have all completed, from pool 15/11/04 09:57:47 INFO scheduler.DAGScheduler: ResultStage 42 (saveAsTextFile at MyApp.scala:112) finished in 6.360 s 15/11/04 09:57:47 INFO scheduler.DAGScheduler: Job 4 finished: 
saveAsTextFile at MyApp.scala:112, took 6.588821 s 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/stages/stage/kill,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/api,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/static,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/threadDump,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/executors,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment/json,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandler{/environment,null} 15/11/04 09:57:47 INFO handler.ContextHandler: stopped o.s.j.s.ServletContextHandle
RE: No space left on device when running graphx job
Just something usual as below: 1. Check the physical disk volume (particularly /tmp folder) 2. Use spark.local.dir to check the size of the temp files 3. Add more workers 4. Decrease partitions (in code) From: Robin East [mailto:robin.e...@xense.co.uk] Sent: Saturday, 26 September 2015 12:27 AM To: Jack Yang Cc: Ted Yu; Andy Huang; user@spark.apache.org Subject: Re: No space left on device when running graphx job Would you mind sharing what your solution was? It would help those on the forum who might run into the same problem. Even it it’s a silly ‘gotcha’ it would help to know what it was and how you spotted the source of the issue. Robin On 25 Sep 2015, at 05:34, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: Hi all, I resolved the problems. Thanks folk. Jack From: Jack Yang [mailto:j...@uow.edu.au] Sent: Friday, 25 September 2015 9:57 AM To: Ted Yu; Andy Huang Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: RE: No space left on device when running graphx job Also, please see the screenshot below from spark web ui: This is the snapshot just 5 seconds (I guess) before the job crashed. From: Jack Yang [mailto:j...@uow.edu.au] Sent: Friday, 25 September 2015 9:55 AM To: Ted Yu; Andy Huang Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: RE: No space left on device when running graphx job Hi, here is the full stack trace: 15/09/25 09:50:14 WARN scheduler.TaskSetManager: Lost task 21088.0 in stage 6.0 (TID 62230, 192.168.70.129): java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) at java.io.DataOutputStream.writeLong(DataOutputStream.java:224) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply$mcVJ$sp(IndexShuffleBlockResolver.scala:86) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:168) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply$mcV$sp(IndexShuffleBlockResolver.scala:84) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285) at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFile(IndexShuffleBlockResolver.scala:88) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:71) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at 
java.lang.Thread.run(Thread.java:744) I am using df –i command to monitor the inode usage, which shows the below all the time: Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sda1 1245184 275424 969760 23% / udev382148484 3816641% /dev tmpfs 384505366 3841391% /run none384505 3 3845021% /run/lock none384505 1 3845041% /run/shm From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Thursday, 24 September 2015 9:12 PM To: Andy Huang Cc: Jack Yang; user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: No space left on device when running graphx job Andy: Can you show complete stack trace ? Have you checked there are enough free inode on the .129 machine ? Cheers On Sep 23, 2015, at 11:43 PM, Andy Huang <andy.hu...@servian.com.au<mailto:andy.hu...@servian.com.au>> wrote: Hi Jack, Are you writing out to disk? Or it sounds like Spark is spilling to disk (RAM filled up) and it's running out of disk space. Cheers Andy On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: Hi folk, I ha
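Point 2 in the checklist above refers to spark.local.dir, the scratch directory Spark uses for shuffle files and spills (it defaults to /tmp). A minimal sketch of pointing it at a volume with more room; the path and app name are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("graphx-job")
  // Scratch space for shuffle/spill files; a comma-separated list of dirs also works.
  .set("spark.local.dir", "/data/spark-tmp")
val sc = new SparkContext(conf)
```

The same setting can be passed at submit time with --conf spark.local.dir=..., and on standalone workers the SPARK_LOCAL_DIRS environment variable, if set, takes precedence.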
No space left on device when running graphx job
Hi folk, I have an issue with graphx (spark: 1.4.0 + 4 machines + 4G memory + 4 CPU cores). Basically, I load data using the GraphLoader.edgeListFile method and then count the number of nodes using the graph.vertices.count() method. The problem is:
Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
When I try a small amount of data, the code works, so I guess the error comes from the amount of data. This is how I submit the job:
spark-submit --class "myclass" --master spark://hadoopmaster:7077 (I am using standalone) --executor-memory 2048M --driver-java-options "-XX:MaxPermSize=2G" --total-executor-cores 4 my.jar
Any thoughts? Best regards, Jack
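For reference, the job described above boils down to the following sketch; the edge-list path and app name are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.GraphLoader

val sc = new SparkContext(new SparkConf().setAppName("graphx-count"))
// Load the edge list and count the vertices, as described in the post.
val graph = GraphLoader.edgeListFile(sc, "hdfs://hadoopmaster:9000/edges.txt")
println(s"vertex count: ${graph.vertices.count()}")
```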
RE: No space left on device when running graphx job
Hi all, I resolved the problems. Thanks folk. Jack From: Jack Yang [mailto:j...@uow.edu.au] Sent: Friday, 25 September 2015 9:57 AM To: Ted Yu; Andy Huang Cc: user@spark.apache.org Subject: RE: No space left on device when running graphx job Also, please see the screenshot below from spark web ui: This is the snapshot just 5 seconds (I guess) before the job crashed. [cid:image001.png@01D0F79F.44F6CC70] From: Jack Yang [mailto:j...@uow.edu.au] Sent: Friday, 25 September 2015 9:55 AM To: Ted Yu; Andy Huang Cc: user@spark.apache.org<mailto:user@spark.apache.org> Subject: RE: No space left on device when running graphx job Hi, here is the full stack trace: 15/09/25 09:50:14 WARN scheduler.TaskSetManager: Lost task 21088.0 in stage 6.0 (TID 62230, 192.168.70.129): java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126) at java.io.DataOutputStream.writeLong(DataOutputStream.java:224) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply$mcVJ$sp(IndexShuffleBlockResolver.scala:86) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:168) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply$mcV$sp(IndexShuffleBlockResolver.scala:84) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80) at org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285) at org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFile(IndexShuffleBlockResolver.scala:88) at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:71) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:70) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) I am using df –i command to monitor the inode usage, which shows the below all the time: Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sda1 1245184 275424 969760 23% / udev382148484 3816641% /dev tmpfs 384505366 3841391% /run none384505 3 3845021% /run/lock none384505 1 3845041% /run/shm From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Thursday, 24 September 2015 9:12 PM To: Andy Huang Cc: Jack Yang; user@spark.apache.org<mailto:user@spark.apache.org> Subject: Re: No space left on device when running graphx job Andy: Can you show complete stack trace ? Have you checked there are enough free inode on the .129 machine ? 
Cheers On Sep 23, 2015, at 11:43 PM, Andy Huang <andy.hu...@servian.com.au<mailto:andy.hu...@servian.com.au>> wrote: Hi Jack, Are you writing out to disk? Or it sounds like Spark is spilling to disk (RAM filled up) and it's running out of disk space. Cheers Andy On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote: Hi folk, I have an issue of graphx. (spark: 1.4.0 + 4 machines + 4G memory + 4 CPU cores) Basically, I load data using GraphLoader.edgeListFile mthod and then count number of nodes using: graph.vertices.count() method. The problem is : Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129): java.io.IOException: No space left on device at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOutputStream.write(FileOutputStream.java:345) when I try a small amount of data, the code is working. So I guess the error comes from the amount of data. This is how I submit the job: spark-submit --class "myclass" --master spark://hadoopmaster:7077 (I am using standalone) --executor-memory 2048M --driver-java-options "-XX:MaxPermSize=2G&quo
log file directory
Hi all, I have a question regarding the log file directory. Say I run spark-submit --master local[4]: where is the log file? And how about if I run standalone with spark-submit --master spark://mymaster:7077? Best regards, Jack
RE: standalone to connect mysql
That works! Thanks. Can I ask you one further question? How does Spark SQL support insertion? That is to say, if I do:
sqlContext.sql("insert into newStu values ('10', 'a', 1)")
the error is: failure: ``table'' expected but identifier newStu found
insert into newStu values ('10', aa, 1)
but if I do:
sqlContext.sql(s"insert into Table newStu select * from otherStu")
that works. Is there any document addressing that? Best regards, Jack

From: Terry Hole [mailto:hujie.ea...@gmail.com] Sent: Tuesday, 21 July 2015 4:17 PM To: Jack Yang; user@spark.apache.org Subject: Re: standalone to connect mysql

Maybe you can try: spark-submit --class sparkwithscala.SqlApp --jars /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar Thanks! -Terry

Hi there, I would like to use spark to access the data in mysql. So firstly I tried to run the program using: spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar that returns me the correct results. Then I tried the standalone version using: spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar (the mysql-connector-java-5.1.34.jar i have them on all worker nodes.) and the error is: Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No suitable driver found for jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=rootpassword=root I also found the similar problem before in https://jira.talendforge.org/browse/TBD-2244. Is this a bug to be fixed later? Or do I miss anything? Best regards, Jack
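A compile-only sketch of the two INSERT forms discussed above, assuming newStu and otherStu are already registered tables (in this thread they sit in MySQL behind JDBC). Which form the Spark 1.4 parser accepts is taken from the thread's report, not re-verified here.

```scala
import org.apache.spark.sql.SQLContext

def insertDemo(sqlContext: SQLContext): Unit = {
  // Rejected by the plain SQL parser per the thread:
  //   failure: ``table'' expected but identifier newStu found
  // sqlContext.sql("insert into newStu values ('10', 'a', 1)")

  // The INSERT INTO TABLE ... SELECT form is the one reported to work:
  sqlContext.sql("insert into table newStu select * from otherStu")
}
```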
Re: standalone to connect mysql
I maybe find the answer from the sqlparser.scala file. Looks like the syntax spark used for insert is different from what we normally used for MySQL. I hope if someone can confirm this. Also I will appreciate if there is a SQL reference list available. Sent from my iPhone On 21 Jul 2015, at 9:21 pm, Jack Yang j...@uow.edu.aumailto:j...@uow.edu.au wrote: No. I did not use hiveContext at this stage. I am talking the embedded SQL syntax for pure spark sql. Thanks, mate. On 21 Jul 2015, at 6:13 pm, Terry Hole hujie.ea...@gmail.commailto:hujie.ea...@gmail.com wrote: Jack, You can refer the hive sql syntax if you use HiveContext: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML Thanks! -Terry That works! Thanks. Can I ask you one further question? How did spark sql support insertion? That is say, if I did: sqlContext.sql(insert into newStu values (10,a,1) the error is: failure: ``table'' expected but identifier newStu found insert into newStu values ('10', aa, 1) but if I did: sqlContext.sql(sinsert into Table newStu select * from otherStu) that works. Is there any document addressing that? Best regards, Jack From: Terry Hole [mailto:hujie.ea...@gmail.commailto:hujie.ea...@gmail.com] Sent: Tuesday, 21 July 2015 4:17 PM To: Jack Yang; user@spark.apache.orgmailto:user@spark.apache.org Subject: Re: standalone to connect mysql Maybe you can try: spark-submit --class sparkwithscala.SqlApp --jars /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar Thanks! -Terry Hi there, I would like to use spark to access the data in mysql. So firstly I tried to run the program using: spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar that returns me the correct results. Then I tried the standalone version using: spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar (the mysql-connector-java-5.1.34.jar i have them on all worker nodes.) and the error is: Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No suitable driver found for jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=rootpassword=root I also found the similar problem before in https://jira.talendforge.org/browse/TBD-2244. Is this a bug to be fixed later? Or do I miss anything? Best regards, Jack
Re: standalone to connect mysql
No. I did not use hiveContext at this stage. I am talking the embedded SQL syntax for pure spark sql. Thanks, mate. On 21 Jul 2015, at 6:13 pm, Terry Hole hujie.ea...@gmail.commailto:hujie.ea...@gmail.com wrote: Jack, You can refer the hive sql syntax if you use HiveContext: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML Thanks! -Terry That works! Thanks. Can I ask you one further question? How did spark sql support insertion? That is say, if I did: sqlContext.sql(insert into newStu values (10,a,1) the error is: failure: ``table'' expected but identifier newStu found insert into newStu values ('10', aa, 1) but if I did: sqlContext.sql(sinsert into Table newStu select * from otherStu) that works. Is there any document addressing that? Best regards, Jack From: Terry Hole [mailto:hujie.ea...@gmail.commailto:hujie.ea...@gmail.com] Sent: Tuesday, 21 July 2015 4:17 PM To: Jack Yang; user@spark.apache.orgmailto:user@spark.apache.org Subject: Re: standalone to connect mysql Maybe you can try: spark-submit --class sparkwithscala.SqlApp --jars /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar Thanks! -Terry Hi there, I would like to use spark to access the data in mysql. So firstly I tried to run the program using: spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar that returns me the correct results. Then I tried the standalone version using: spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar (the mysql-connector-java-5.1.34.jar i have them on all worker nodes.) and the error is: Exception in thread main org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No suitable driver found for jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=rootpassword=root I also found the similar problem before in https://jira.talendforge.org/browse/TBD-2244. Is this a bug to be fixed later? Or do I miss anything? Best regards, Jack
standalone to connect mysql
Hi there, I would like to use Spark to access the data in MySQL. So first I tried to run the program using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar
which returns the correct results. Then I tried the standalone version using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path /home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 /home/myjar.jar
(I have mysql-connector-java-5.1.34.jar on all worker nodes.) And the error is:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No suitable driver found for jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=root&password=root
I also found a similar problem reported before in https://jira.talendforge.org/browse/TBD-2244. Is this a bug to be fixed later, or did I miss anything? Best regards, Jack
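For reference, a sketch of reading the MySQL table through the JDBC data source in Spark 1.4. The URL and credentials mirror the thread, the table name is an assumption, and the connector jar still has to reach the executors (the --jars approach suggested elsewhere in the thread).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object MySqlReadSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mysql-read"))
    val sqlContext = new SQLContext(sc)

    // Submit with e.g.: spark-submit --jars /home/lib/mysql-connector-java-5.1.34.jar ...
    val df = sqlContext.read.format("jdbc").options(Map(
      "url"     -> "jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=root&password=root",
      "dbtable" -> "newStu",                // assumed table name
      "driver"  -> "com.mysql.jdbc.Driver"
    )).load()

    df.show()
  }
}
```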
assertion failed error with GraphX
Hi there, I got an error when running one simple GraphX program. My setup is: Spark 1.4.0, Hadoop YARN 2.5, Scala 2.10, with four virtual machines. If I construct one small graph (6 nodes, 4 edges) and run:
println("triangleCount: %s".format(hdfs_graph.triangleCount().vertices.count()))
it returns the correct results. But when I import a much larger graph (with 85 nodes, 500 edges), the error is:
15/07/20 12:03:36 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 11.0 (TID 32, 192.168.157.131): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:90)
at org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:87)
at org.apache.spark.graphx.impl.VertexPartitionBaseOps.leftJoin(VertexPartitionBaseOps.scala:140)
at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:159)
at org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:156)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
I run the above two graphs using the same submit command:
spark-submit --class sparkUI.GraphApp --master spark://master:7077 --executor-memory 2G --total-executor-cores 4 myjar.jar
Any thoughts? Is anything wrong with my machines or configuration? Best regards, Jack
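For context, a sketch of the triangle-count pipeline with the two preconditions that the GraphX TriangleCount of this era documents: edges in canonical direction (srcId < dstId) and an explicit partitionBy. Whether the assertion above comes from a missing precondition is only a guess; the path, app name, and partition strategy are assumptions.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

val sc = new SparkContext(new SparkConf().setAppName("triangle-count"))
// canonicalOrientation = true rewrites edges so that srcId < dstId; the
// TriangleCount docs of this era also ask for a partitioned graph.
val graph = GraphLoader
  .edgeListFile(sc, "hdfs://master:9000/graph-edges.txt", canonicalOrientation = true)
  .partitionBy(PartitionStrategy.RandomVertexCut)

println("triangleCount: %s".format(graph.triangleCount().vertices.count()))
```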
RE: error from DecisonTree Training:
So is this bug still unsolved (for Java)?

From: Jack Yang [mailto:j...@uow.edu.au] Sent: Friday, 18 July 2014 4:52 PM To: user@spark.apache.org Subject: error from DecisonTree Training:

Hi All, I got an error while using DecisionTreeModel (my program is written in Java, Spark 1.0.1, Scala 2.10.1). I read a local file, loaded it as an RDD, and then sent it to DecisionTree for training. See below for details:
JavaRDD<LabeledPoint> Points = lines.map(new ParsePoint()).cache();
LogisticRegressionModel model = LogisticRegressionWithSGD.train(Points.rdd(), iterations, stepSize); // until here it is working
Strategy strategy = new Strategy( );
DecisionTree decisionTree = new DecisionTree(strategy);
DecisionTreeModel decisionTreeModel = decisionTree.train(Points.rdd());
The error is: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.regression.LabeledPoint;
Any thoughts? Best regards, Jack
RE: error from DecisonTree Training:
That is nice. Thanks Xiangrui.

-----Original Message----- From: Xiangrui Meng [mailto:men...@gmail.com] Sent: Tuesday, 22 July 2014 9:31 AM To: user@spark.apache.org Subject: Re: error from DecisonTree Training:

This is a known issue: https://issues.apache.org/jira/browse/SPARK-2197 . Joseph is working on it. -Xiangrui

On Mon, Jul 21, 2014 at 4:20 PM, Jack Yang j...@uow.edu.au wrote: So is this bug still unsolved (for Java)?

From: Jack Yang [mailto:j...@uow.edu.au] Sent: Friday, 18 July 2014 4:52 PM To: user@spark.apache.org Subject: error from DecisonTree Training:

Hi All, I got an error while using DecisionTreeModel (my program is written in Java, Spark 1.0.1, Scala 2.10.1). I read a local file, loaded it as an RDD, and then sent it to DecisionTree for training. See below for details:
JavaRDD<LabeledPoint> Points = lines.map(new ParsePoint()).cache();
LogisticRegressionModel model = LogisticRegressionWithSGD.train(Points.rdd(), iterations, stepSize); // until here it is working
Strategy strategy = new Strategy( …. );
DecisionTree decisionTree = new DecisionTree(strategy);
DecisionTreeModel decisionTreeModel = decisionTree.train(Points.rdd());
The error is: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.regression.LabeledPoint;
Any thoughts? Best regards, Jack
error from DecisonTree Training:
Hi All, I got an error while using DecisionTreeModel (my program is written in Java, Spark 1.0.1, Scala 2.10.1). I read a local file, loaded it as an RDD, and then sent it to DecisionTree for training. See below for details:
JavaRDD<LabeledPoint> Points = lines.map(new ParsePoint()).cache();
LogisticRegressionModel model = LogisticRegressionWithSGD.train(Points.rdd(), iterations, stepSize); // until here it is working
Strategy strategy = new Strategy( );
DecisionTree decisionTree = new DecisionTree(strategy);
DecisionTreeModel decisionTreeModel = decisionTree.train(Points.rdd());
The error is: java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.regression.LabeledPoint;
Any thoughts? Best regards, Jack