RE: Spark UI Storage Memory

2020-12-04 Thread Jack Yang
unsubscribe


RE: spark with breeze error of NoClassDefFoundError

2015-11-18 Thread Jack Yang
I tried to change "provided" to "compile"; then the error changed to:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing 
class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/11/19 10:28:29 INFO util.Utils: Shutdown hook called

Meanwhile, I would prefer to use Maven to build the jar file rather than sbt, 
although sbt is indeed another option.

Best regards,
Jack



From: Fengdong Yu [mailto:fengdo...@everstring.com]
Sent: Wednesday, 18 November 2015 7:30 PM
To: Jack Yang
Cc: Ted Yu; user@spark.apache.org
Subject: Re: spark with breeze error of NoClassDefFoundError

The simplest way is to remove all "provided" scopes in your pom.

Then run 'sbt assembly' to build your final package, and get rid of '--jars' 
because the assembly already includes all dependencies.
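For reference, a minimal build.sbt sketch of the assembly route (the coordinates and versions below are assumptions, not taken from the thread; it keeps Spark itself as "provided" since spark-submit supplies it at runtime, while breeze stays at the default compile scope so that sbt assembly bundles it):

name := "sparkwithscala"

scalaVersion := "2.10.4"

// Spark is supplied by spark-submit at runtime; breeze gets bundled into the fat jar.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.4.0" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.4.0" % "provided",
  "org.scalanlp"     %% "breeze"      % "0.11.2"
)

// Assumes the sbt-assembly plugin is added in project/plugins.sbt; build with: sbt assembly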






On Nov 18, 2015, at 2:15 PM, Jack Yang 
<j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote:

So weird. Is there anything wrong with the way I made the pom file (I labelled 
them as provided)?

Is there a missing jar I forgot to add in "--jars"?

See the trace below:



Exception in thread "main" java.lang.NoClassDefFoundError: 
breeze/storage/DefaultArrayValue
at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: breeze.storage.DefaultArrayValue
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 10 more
15/11/18 17:15:15 INFO util.Utils: Shutdown hook called


From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 18 November 2015 4:01 PM
To: Jack Yang
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: spark with breeze error of NoClassDefFoundError

Looking in local maven repo, breeze_2.10-0.7.jar contains DefaultArrayValue :

jar tvf 
/Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | 
grep !$
jar tvf 
/Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | 
grep DefaultArrayValue
   369 Wed Mar 19 11:18:32 PDT 2014 
breeze/storage/DefaultArrayValue$mcZ$sp$class.class
   309 Wed Mar 19 11:18:32 PDT 2014 
breeze/storage/DefaultArrayValue$mcJ$sp.class
  2233 Wed Mar 19 11:18:32 PDT 2014 
breeze/storage/DefaultArrayValue$DoubleDefaultArrayValue$.class

Can you show the complete stack trace ?

FYI

On Tue, Nov 17, 2015 at 8:33 PM, Jack Yang 
<j...@uow.edu.au<

RE: Do windowing functions require hive support?

2015-11-18 Thread Jack Yang
SQLContext only implements a subset of SQL functionality and does not include 
window functions.
In HiveContext it works fine, though.
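For illustration, a minimal sketch of the HiveContext route (the SparkContext setup is assumed; the query and table tt2 are the ones quoted further down in the thread):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

// In Spark 1.x, window functions such as rank() over (...) parse and run
// through HiveContext, while the plain SQLContext parser rejects them.
val sc = new SparkContext(new SparkConf().setAppName("window-fn-example"))
val hc = new HiveContext(sc)

hc.sql(
  """select id, r from
    |  (select id, name, rank() over (order by name) as r from tt2) v
    |where v.r >= 1 and v.r <= 12""".stripMargin).show()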

From: Stephen Boesch [mailto:java...@gmail.com]
Sent: Thursday, 19 November 2015 3:01 PM
To: Michael Armbrust
Cc: Jack Yang; user
Subject: Re: Do windowing functions require hive support?

Why is the same query (and actually I tried several variations) working against 
a HiveContext and not against the SQLContext?

2015-11-18 19:57 GMT-08:00 Michael Armbrust 
<mich...@databricks.com<mailto:mich...@databricks.com>>:
Yes they do.

On Wed, Nov 18, 2015 at 7:49 PM, Stephen Boesch 
<java...@gmail.com<mailto:java...@gmail.com>> wrote:
But to focus the attention properly: I had already tried out 1.5.2.

2015-11-18 19:46 GMT-08:00 Stephen Boesch 
<java...@gmail.com<mailto:java...@gmail.com>>:
Checked out 1.6.0-SNAPSHOT 60 minutes ago

2015-11-18 19:19 GMT-08:00 Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>>:
Which version of spark are you using?

From: Stephen Boesch [mailto:java...@gmail.com<mailto:java...@gmail.com>]
Sent: Thursday, 19 November 2015 2:12 PM
To: user
Subject: Do windowing functions require hive support?


The following works against a hive table from spark sql

hc.sql("select id,r from (select id, name, rank()  over (order by name) as r 
from tt2) v where v.r >= 1 and v.r <= 12")

But when using  a standard sql context against a temporary table the following 
occurs:



Exception in thread "main" java.lang.RuntimeException: [3.25]

  failure: ``)'' expected but `(' found



rank() over (order by name) as r

^






RE: Do windowing functions require hive support?

2015-11-18 Thread Jack Yang
Which version of spark are you using?

From: Stephen Boesch [mailto:java...@gmail.com]
Sent: Thursday, 19 November 2015 2:12 PM
To: user
Subject: Do windowing functions require hive support?


The following works against a hive table from spark sql

hc.sql("select id,r from (select id, name, rank()  over (order by name) as r 
from tt2) v where v.r >= 1 and v.r <= 12")

But when using  a standard sql context against a temporary table the following 
occurs:



Exception in thread "main" java.lang.RuntimeException: [3.25]

  failure: ``)'' expected but `(' found



rank() over (order by name) as r

^


RE: spark with breeze error of NoClassDefFoundError

2015-11-18 Thread Jack Yang
Back to my question. If I use "provided", the jar file expects some libraries 
to be provided by the system.
However, "compile" is the default scope, which means the third-party library 
will be included inside the jar file after building.
So when I use "provided", the error is that the class cannot be found, but 
with "compile" the error is IncompatibleClassChangeError.

OK, so can someone tell me which versions of breeze and breeze-math are used in 
Spark 1.4?

From: Zhiliang Zhu [mailto:zchl.j...@yahoo.com]
Sent: Thursday, 19 November 2015 5:10 PM
To: Ted Yu
Cc: Jack Yang; Fengdong Yu; user@spark.apache.org
Subject: Re: spark with breeze error of NoClassDefFoundError

Dear Ted,
I just looked at the link you provided, it is great!

As I understand it, I could also directly use other parts of Breeze (besides the 
Spark MLlib linalg package) in a Spark (Scala or Java) program after importing the 
Breeze package,
is that right?

Thanks a lot in advance again!
Zhiliang



On Thursday, November 19, 2015 1:46 PM, Ted Yu 
<yuzhih...@gmail.com<mailto:yuzhih...@gmail.com>> wrote:

Have you looked at
https://github.com/scalanlp/breeze/wiki

Cheers

On Nov 18, 2015, at 9:34 PM, Zhiliang Zhu 
<zchl.j...@yahoo.com<mailto:zchl.j...@yahoo.com>> wrote:
Dear Jack,

As is known, Breeze is a numerical computation package written in Scala, and Spark 
MLlib also uses it as the underlying package for linear algebra.
Here I am also preparing to use Breeze for nonlinear equation optimization; 
however, it seems that I could not find the exact docs or API for Breeze apart from 
the Spark linalg package...

Could you help by pointing me to the official doc or API website for Breeze?
Thank you in advance!

Zhiliang



On Thursday, November 19, 2015 7:32 AM, Jack Yang 
<j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote:

I tried to change "provided" to "compile"; then the error changed to:

Exception in thread "main" java.lang.IncompatibleClassChangeError: Implementing 
class
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
15/11/19 10:28:29 INFO util.Utils: Shutdown hook called

Meanwhile, I would prefer to use Maven to build the jar file rather than sbt, 
although sbt is indeed another option.

Best regards,
Jack



From: Fengdong Yu [mailto:fengdo...@everstring.com]
Sent: Wednesday, 18 November 2015 7:30 PM
To: Jack Yang
Cc: Ted Yu; user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: spark with breeze error of NoClassDefFoundError

The simplest way is to remove all "provided" scopes in your pom.

Then run 'sbt assembly' to build your final package, and get rid of '--jars' 
because the assembly already includes all dependencies.






On Nov 18, 2015, at 2:15 PM, Jack Yang 
<j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote:

So weird. Is there anything wrong with the way I made the pom file (I labelled 
them as provided)?

Is there a missing jar I forgot to add in "--jars"?

See the trace below:



Exception in thread "main" java.lang.NoClassDefFoundError: 
breeze/storage/DefaultArrayValue
at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at

RE: spark with breeze error of NoClassDefFoundError

2015-11-17 Thread Jack Yang
So weird. Is there anything wrong with the way I made the pom file (I labelled 
them as provided)?

Is there a missing jar I forgot to add in "--jars"?

See the trace below:



Exception in thread "main" java.lang.NoClassDefFoundError: 
breeze/storage/DefaultArrayValue
at smartapp.smart.sparkwithscala.textMingApp.main(textMingApp.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: breeze.storage.DefaultArrayValue
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 10 more
15/11/18 17:15:15 INFO util.Utils: Shutdown hook called


From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 18 November 2015 4:01 PM
To: Jack Yang
Cc: user@spark.apache.org
Subject: Re: spark with breeze error of NoClassDefFoundError

Looking in local maven repo, breeze_2.10-0.7.jar contains DefaultArrayValue :

jar tvf 
/Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | 
grep !$
jar tvf 
/Users/tyu/.m2/repository//org/scalanlp/breeze_2.10/0.7/breeze_2.10-0.7.jar | 
grep DefaultArrayValue
   369 Wed Mar 19 11:18:32 PDT 2014 
breeze/storage/DefaultArrayValue$mcZ$sp$class.class
   309 Wed Mar 19 11:18:32 PDT 2014 
breeze/storage/DefaultArrayValue$mcJ$sp.class
  2233 Wed Mar 19 11:18:32 PDT 2014 
breeze/storage/DefaultArrayValue$DoubleDefaultArrayValue$.class

Can you show the complete stack trace ?

FYI

On Tue, Nov 17, 2015 at 8:33 PM, Jack Yang 
<j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote:
Hi all,
I am using Spark 1.4.0 and building my code using Maven.
In one of my Scala files, I used:

import breeze.linalg._
val v1 = new breeze.linalg.SparseVector(commonVector.indices, 
commonVector.values, commonVector.size)
val v2 = new breeze.linalg.SparseVector(commonVector2.indices, 
commonVector2.values, commonVector2.size)
println (v1.dot(v2) / (norm(v1) * norm(v2)) )



in my pom.xml file, I used:

<dependency>
  <groupId>org.scalanlp</groupId>
  <artifactId>breeze-math_2.10</artifactId>
  <version>0.4</version>
  <scope>provided</scope>
</dependency>

<dependency>
  <groupId>org.scalanlp</groupId>
  <artifactId>breeze_2.10</artifactId>
  <version>0.11.2</version>
  <scope>provided</scope>
</dependency>


When submitting, I included the breeze jars (breeze_2.10-0.11.2.jar, 
breeze-math_2.10-0.4.jar, breeze-natives_2.10-0.11.2.jar, 
breeze-process_2.10-0.3.jar) using the "--jars" argument, although I doubt it is 
necessary to do that.

However, the error is:

Exception in thread "main" java.lang.NoClassDefFoundError: 
breeze/storage/DefaultArrayValue

Any thoughts?



Best regards,
Jack




error with saveAsTextFile in local directory

2015-11-03 Thread Jack Yang
Hi all,

I am saving some Hive query results into a local directory:

val hdfsFilePath = "hdfs://master:ip/ tempFile ";
val localFilePath = "file:///home/hduser/tempFile";
hiveContext.sql(s"""my hql codes here""")
res.printSchema()  --working
res.show()   --working
res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath)  
--still working
res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath)  
--wrong!

In the end, I get the correct results in hdfsFilePath, but nothing in 
localFilePath.
By the way, the localFilePath folder was created, but it contained only a _SUCCESS 
file, no part files.
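A likely explanation is that, on a cluster, each task writes its part files to the local filesystem of the executor it ran on, so the driver's directory only ends up with the _SUCCESS marker. A sketch of one workaround, assuming the result is small enough to collect to the driver (tranRow2Str is the function used above; the output path is illustrative):

import java.io.PrintWriter

// Pull the (small) result back to the driver and write a single local file
// there, instead of letting each executor write to its own local disk.
val localLines = res.map{ x => tranRow2Str(x) }.collect()
val writer = new PrintWriter("/home/hduser/tempFile.txt")
try {
  localLines.foreach(line => writer.println(line))
} finally {
  writer.close()
}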

See the trace below (any thoughts?):

15/11/04 09:57:41 INFO scheduler.DAGScheduler: Got job 4 (saveAsTextFile at 
myApp.scala:112) with 1 output partitions (allowLocal=false)
// line 112 is where I use the saveAsTextFile function to save the results locally.

15/11/04 09:57:41 INFO scheduler.DAGScheduler: Final stage: ResultStage 
42(saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 41)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Missing parents: List()
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting ResultStage 42 
(MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112), which has no 
missing parents
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(160632) called with 
curMem=3889533, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28 stored as values 
in memory (estimated size 156.9 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(56065) called with 
curMem=4050165, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28_piece0 stored as 
bytes in memory (estimated size 54.8 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in 
memory on 192.168.70.135:32836 (size: 54.8 KB, free: 266.8 MB)
15/11/04 09:57:41 INFO spark.SparkContext: Created broadcast 28 from broadcast 
at DAGScheduler.scala:874
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from 
ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.TaskSchedulerImpl: Adding task set 42.0 with 1 
tasks
15/11/04 09:57:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
42.0 (TID 2018, 192.168.70.129, PROCESS_LOCAL, 5097 bytes)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in 
memory on 192.168.70.129:54062 (size: 54.8 KB, free: 1068.8 MB)
15/11/04 09:57:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
42.0 (TID 2018) in 6362 ms on 192.168.70.129 (1/1)
15/11/04 09:57:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 42.0, whose 
tasks have all completed, from pool
15/11/04 09:57:47 INFO scheduler.DAGScheduler: ResultStage 42 (saveAsTextFile 
at MyApp.scala:112) finished in 6.360 s
15/11/04 09:57:47 INFO scheduler.DAGScheduler: Job 4 finished: saveAsTextFile 
at MyApp.scala:112, took 6.588821 s
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/api,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/static,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 

RE: error with saveAsTextFile in local directory

2015-11-03 Thread Jack Yang
Yes, mine is 1.4.0.

Is this problem then related to the version?

I doubt that. Any comments, please?

From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 4 November 2015 11:52 AM
To: Jack Yang
Cc: user@spark.apache.org
Subject: Re: error with saveAsTextFile in local directory

Looks like you were running 1.4.x or earlier release because the allowLocal 
flag is deprecated as of Spark 1.5.0+.

Cheers

On Tue, Nov 3, 2015 at 3:07 PM, Jack Yang 
<j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote:
Hi all,

I am saving some Hive query results into a local directory:

val hdfsFilePath = "hdfs://master:ip/ tempFile ";
val localFilePath = 
"file:///home/hduser/tempFile";
hiveContext.sql(s"""my hql codes here""")
res.printSchema()  --working
res.show()   --working
res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(hdfsFilePath)  
--still working
res.map{ x => tranRow2Str(x) }.coalesce(1).saveAsTextFile(localFilePath)  
--wrong!

In the end, I get the correct results in hdfsFilePath, but nothing in 
localFilePath.
By the way, the localFilePath folder was created, but it contained only a _SUCCESS 
file, no part files.

See the trace below (any thoughts?):

15/11/04 09:57:41 INFO scheduler.DAGScheduler: Got job 4 (saveAsTextFile at 
myApp.scala:112) with 1 output partitions (allowLocal=false)
// line 112 is where I use the saveAsTextFile function to save the results locally.

15/11/04 09:57:41 INFO scheduler.DAGScheduler: Final stage: ResultStage 
42(saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Parents of final stage: 
List(ShuffleMapStage 41)
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Missing parents: List()
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting ResultStage 42 
(MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112), which has no 
missing parents
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(160632) called with 
curMem=3889533, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28 stored as values 
in memory (estimated size 156.9 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.MemoryStore: ensureFreeSpace(56065) called with 
curMem=4050165, maxMem=280248975
15/11/04 09:57:41 INFO storage.MemoryStore: Block broadcast_28_piece0 stored as 
bytes in memory (estimated size 54.8 KB, free 263.4 MB)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in 
memory on 192.168.70.135:32836<http://192.168.70.135:32836> (size: 54.8 KB, 
free: 266.8 MB)
15/11/04 09:57:41 INFO spark.SparkContext: Created broadcast 28 from broadcast 
at DAGScheduler.scala:874
15/11/04 09:57:41 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from 
ResultStage 42 (MapPartitionsRDD[106] at saveAsTextFile at MyApp.scala:112)
15/11/04 09:57:41 INFO scheduler.TaskSchedulerImpl: Adding task set 42.0 with 1 
tasks
15/11/04 09:57:41 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 
42.0 (TID 2018, 192.168.70.129, PROCESS_LOCAL, 5097 bytes)
15/11/04 09:57:41 INFO storage.BlockManagerInfo: Added broadcast_28_piece0 in 
memory on 192.168.70.129:54062<http://192.168.70.129:54062> (size: 54.8 KB, 
free: 1068.8 MB)
15/11/04 09:57:47 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 
42.0 (TID 2018) in 6362 ms on 192.168.70.129 (1/1)
15/11/04 09:57:47 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 42.0, whose 
tasks have all completed, from pool
15/11/04 09:57:47 INFO scheduler.DAGScheduler: ResultStage 42 (saveAsTextFile 
at MyApp.scala:112) finished in 6.360 s
15/11/04 09:57:47 INFO scheduler.DAGScheduler: Job 4 finished: saveAsTextFile 
at MyApp.scala:112, took 6.588821 s
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/metrics/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/api,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/static,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment/json,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment,null}
15/11/04 09:57:47 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandle

RE: No space left on device when running graphx job

2015-10-05 Thread Jack Yang
Just the usual things, as below:

1.  Check the physical disk volume (particularly the /tmp folder)

2.  Check the size of the temp files under spark.local.dir, and point it at a larger volume if needed (see the sketch below)

3.  Add more workers

4.  Decrease the number of partitions (in code)
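A sketch of points 2 and 4 (the local directory path, input path, and partition count are illustrative, not from the thread):

import org.apache.spark.{SparkConf, SparkContext}

// Point shuffle/spill files at a volume with more free space than /tmp,
// and reduce the number of partitions (and hence intermediate shuffle files).
val conf = new SparkConf()
  .setAppName("graphx-job")
  .set("spark.local.dir", "/data/spark-tmp")   // volume with more free space
val sc = new SparkContext(conf)

val edges = sc.textFile("hdfs://master:9000/edge-list.txt").coalesce(16)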

From: Robin East [mailto:robin.e...@xense.co.uk]
Sent: Saturday, 26 September 2015 12:27 AM
To: Jack Yang
Cc: Ted Yu; Andy Huang; user@spark.apache.org
Subject: Re: No space left on device when running graphx job

Would you mind sharing what your solution was? It would help those on the forum 
who might run into the same problem. Even if it's a silly 'gotcha' it would 
help to know what it was and how you spotted the source of the issue.

Robin



On 25 Sep 2015, at 05:34, Jack Yang <j...@uow.edu.au<mailto:j...@uow.edu.au>> 
wrote:

Hi all,
I resolved the problems.
Thanks folk.
Jack

From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 25 September 2015 9:57 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: RE: No space left on device when running graphx job

Also, please see the screenshot below from spark web ui:
This is the snapshot just 5 seconds (I guess) before the job crashed.



From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 25 September 2015 9:55 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: RE: No space left on device when running graphx job

Hi, here is the full stack trace:

15/09/25 09:50:14 WARN scheduler.TaskSetManager: Lost task 21088.0 in stage 6.0 
(TID 62230, 192.168.70.129): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply$mcVJ$sp(IndexShuffleBlockResolver.scala:86)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:168)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply$mcV$sp(IndexShuffleBlockResolver.scala:84)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFile(IndexShuffleBlockResolver.scala:88)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:71)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)


I am using the df -i command to monitor inode usage, which shows the following all 
the time:

Filesystem     Inodes   IUsed   IFree  IUse%  Mounted on
/dev/sda1     1245184  275424  969760    23%  /
udev           382148     484  381664     1%  /dev
tmpfs          384505     366  384139     1%  /run
none           384505       3  384502     1%  /run/lock
none           384505       1  384504     1%  /run/shm



From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, 24 September 2015 9:12 PM
To: Andy Huang
Cc: Jack Yang; user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: No space left on device when running graphx job

Andy:
Can you show complete stack trace ?

Have you checked that there are enough free inodes on the .129 machine?

Cheers

On Sep 23, 2015, at 11:43 PM, Andy Huang 
<andy.hu...@servian.com.au<mailto:andy.hu...@servian.com.au>> wrote:
Hi Jack,

Are you writing out to disk? Or it sounds like Spark is spilling to disk (RAM 
filled up) and it's running out of disk space.

Cheers
Andy

On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang 
<j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote:
Hi folk,

I ha

No space left on device when running graphx job

2015-09-24 Thread Jack Yang
Hi folk,

I have an issue with GraphX (Spark 1.4.0 + 4 machines + 4 GB memory + 4 CPU cores).
Basically, I load data using the GraphLoader.edgeListFile method and then count the 
number of nodes using the graph.vertices.count() method.
The problem is:

Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129): 
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)

When I try a small amount of data, the code works, so I guess the error 
comes from the amount of data.
This is how I submit the job:

spark-submit --class "myclass"
--master spark://hadoopmaster:7077  (I am using standalone)
--executor-memory 2048M
--driver-java-options "-XX:MaxPermSize=2G"
--total-executor-cores 4  my.jar


Any thoughts?
Best regards,
Jack



RE: No space left on device when running graphx job

2015-09-24 Thread Jack Yang
Hi all,
I resolved the problems.
Thanks folk.
Jack

From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 25 September 2015 9:57 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org
Subject: RE: No space left on device when running graphx job

Also, please see the screenshot below from spark web ui:
This is the snapshot just 5 seconds (I guess) before the job crashed.

[screenshot attachment omitted]

From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 25 September 2015 9:55 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: RE: No space left on device when running graphx job

Hi, here is the full stack trace:

15/09/25 09:50:14 WARN scheduler.TaskSetManager: Lost task 21088.0 in stage 6.0 
(TID 62230, 192.168.70.129): java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at java.io.DataOutputStream.writeLong(DataOutputStream.java:224)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply$mcVJ$sp(IndexShuffleBlockResolver.scala:86)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1$$anonfun$apply$mcV$sp$1.apply(IndexShuffleBlockResolver.scala:84)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofLong.foreach(ArrayOps.scala:168)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply$mcV$sp(IndexShuffleBlockResolver.scala:84)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver$$anonfun$writeIndexFile$1.apply(IndexShuffleBlockResolver.scala:80)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
at 
org.apache.spark.shuffle.IndexShuffleBlockResolver.writeIndexFile(IndexShuffleBlockResolver.scala:88)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:71)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)


I am using the df -i command to monitor inode usage, which shows the following all 
the time:

Filesystem     Inodes   IUsed   IFree  IUse%  Mounted on
/dev/sda1     1245184  275424  969760    23%  /
udev           382148     484  381664     1%  /dev
tmpfs          384505     366  384139     1%  /run
none           384505       3  384502     1%  /run/lock
none           384505       1  384504     1%  /run/shm



From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Thursday, 24 September 2015 9:12 PM
To: Andy Huang
Cc: Jack Yang; user@spark.apache.org<mailto:user@spark.apache.org>
Subject: Re: No space left on device when running graphx job

Andy:
Can you show complete stack trace ?

Have you checked that there are enough free inodes on the .129 machine?

Cheers

On Sep 23, 2015, at 11:43 PM, Andy Huang 
<andy.hu...@servian.com.au<mailto:andy.hu...@servian.com.au>> wrote:
Hi Jack,

Are you writing out to disk? Or it sounds like Spark is spilling to disk (RAM 
filled up) and it's running out of disk space.

Cheers
Andy

On Thu, Sep 24, 2015 at 4:29 PM, Jack Yang 
<j...@uow.edu.au<mailto:j...@uow.edu.au>> wrote:
Hi folk,

I have an issue with GraphX (Spark 1.4.0 + 4 machines + 4 GB memory + 4 CPU cores).
Basically, I load data using the GraphLoader.edgeListFile method and then count the 
number of nodes using the graph.vertices.count() method.
The problem is:

Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129): 
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:345)

When I try a small amount of data, the code works, so I guess the error 
comes from the amount of data.
This is how I submit the job:

spark-submit --class "myclass"
--master spark://hadoopmaster:7077  (I am using standalone)
--executor-memory 2048M
--driver-java-options "-XX:MaxPermSize=2G"

log file directory

2015-07-28 Thread Jack Yang
Hi all,

I have a question regarding the log file directory.

Say I run spark-submit   --master local[4]: where is the log file?
Then how about if I run standalone: spark-submit   --master 
spark://mymaster:7077?



Best regards,
Jack



RE: standalone to connect mysql

2015-07-21 Thread Jack Yang
That works! Thanks.
Can I ask you one further question?
How does Spark SQL support insertion?

That is to say, if I did:
sqlContext.sql("insert into newStu values ('10', 'a', 1)")

the error is:
failure: ``table'' expected but identifier newStu found
insert into newStu values ('10', aa, 1)

but if I did:
sqlContext.sql(s"insert into Table newStu select * from otherStu")
that works.
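For illustration, a sketch of the pattern the parser does accept, staging the literal row as a temporary table first (the column names and the sc/sqlContext setup are assumptions):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// The 1.x SQL parser rejects INSERT ... VALUES but accepts
// INSERT INTO TABLE <t> SELECT ..., so register the new row as a temp table.
val newRow = Seq(("10", "a", 1)).toDF("sid", "sname", "sgrade")
newRow.registerTempTable("newRowTmp")

sqlContext.sql(s"insert into Table newStu select * from newRowTmp")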

Is there any document addressing that?


Best regards,
Jack


From: Terry Hole [mailto:hujie.ea...@gmail.com]
Sent: Tuesday, 21 July 2015 4:17 PM
To: Jack Yang; user@spark.apache.org
Subject: Re: standalone to connect mysql

Maybe you can try: spark-submit --class sparkwithscala.SqlApp  --jars 
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 
/home/myjar.jar

Thanks!
-Terry
Hi there,

I would like to use Spark to access data in MySQL. So first I tried to 
run the program using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar

that returns me the correct results. Then I tried the standalone version using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 
/home/myjar.jar
(I have the mysql-connector-java-5.1.34.jar on all worker nodes.)
and the error is:

Exception in thread main org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost 
task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No 
suitable driver found for 
jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=root&password=root

I also found the similar problem before in 
https://jira.talendforge.org/browse/TBD-2244.

Is this a bug to be fixed later? Or do I miss anything?



Best regards,
Jack



Re: standalone to connect mysql

2015-07-21 Thread Jack Yang
I may have found the answer in the SqlParser.scala file.


Looks like the syntax Spark uses for insert is different from what we normally 
use for MySQL.

I hope someone can confirm this. I would also appreciate it if there is a SQL 
reference list available.

Sent from my iPhone

On 21 Jul 2015, at 9:21 pm, Jack Yang 
<j...@uow.edu.au> wrote:

No. I did not use HiveContext at this stage.

I am talking about the embedded SQL syntax for pure Spark SQL.

Thanks, mate.

On 21 Jul 2015, at 6:13 pm, Terry Hole 
<hujie.ea...@gmail.com> wrote:

Jack,

You can refer to the Hive SQL syntax if you use HiveContext: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

Thanks!
-Terry

That works! Thanks.
Can I ask you one further question?
How does Spark SQL support insertion?

That is to say, if I did:
sqlContext.sql("insert into newStu values ('10', 'a', 1)")

the error is:
failure: ``table'' expected but identifier newStu found
insert into newStu values ('10', aa, 1)

but if I did:
sqlContext.sql(s"insert into Table newStu select * from otherStu")
that works.

Is there any document addressing that?


Best regards,
Jack


From: Terry Hole [mailto:hujie.ea...@gmail.com]
Sent: Tuesday, 21 July 2015 4:17 PM
To: Jack Yang; user@spark.apache.org
Subject: Re: standalone to connect mysql

Maybe you can try: spark-submit --class sparkwithscala.SqlApp  --jars 
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 
/home/myjar.jar

Thanks!
-Terry
Hi there,

I would like to use Spark to access data in MySQL. So first I tried to 
run the program using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar

that returns me the correct results. Then I tried the standalone version using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 
/home/myjar.jar
(I have the mysql-connector-java-5.1.34.jar on all worker nodes.)
and the error is:

Exception in thread main org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost 
task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No 
suitable driver found for 
jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=root&password=root

I also found the similar problem before in 
https://jira.talendforge.org/browse/TBD-2244.

Is this a bug to be fixed later? Or do I miss anything?



Best regards,
Jack



Re: standalone to connect mysql

2015-07-21 Thread Jack Yang
No. I did not use HiveContext at this stage.

I am talking about the embedded SQL syntax for pure Spark SQL.

Thanks, mate.

On 21 Jul 2015, at 6:13 pm, Terry Hole 
<hujie.ea...@gmail.com> wrote:

Jack,

You can refer to the Hive SQL syntax if you use HiveContext: 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML

Thanks!
-Terry

That works! Thanks.
Can I ask you one further question?
How does Spark SQL support insertion?

That is to say, if I did:
sqlContext.sql("insert into newStu values ('10', 'a', 1)")

the error is:
failure: ``table'' expected but identifier newStu found
insert into newStu values ('10', aa, 1)

but if I did:
sqlContext.sql(s"insert into Table newStu select * from otherStu")
that works.

Is there any document addressing that?


Best regards,
Jack


From: Terry Hole [mailto:hujie.ea...@gmail.com]
Sent: Tuesday, 21 July 2015 4:17 PM
To: Jack Yang; user@spark.apache.org
Subject: Re: standalone to connect mysql

Maybe you can try: spark-submit --class sparkwithscala.SqlApp  --jars 
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 
/home/myjar.jar

Thanks!
-Terry
Hi there,

I would like to use Spark to access data in MySQL. So first I tried to 
run the program using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar

that returns me the correct results. Then I tried the standalone version using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 
/home/myjar.jar
(I have the mysql-connector-java-5.1.34.jar on all worker nodes.)
and the error is:

Exception in thread main org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost 
task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No 
suitable driver found for 
jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=root&password=root

I also found the similar problem before in 
https://jira.talendforge.org/browse/TBD-2244.

Is this a bug to be fixed later? Or do I miss anything?



Best regards,
Jack



standalone to connect mysql

2015-07-20 Thread Jack Yang
Hi there,

I would like to use Spark to access data in MySQL. So first I tried to 
run the program using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar

that returns me the correct results. Then I tried the standalone version using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path 
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077 
/home/myjar.jar
(I have the mysql-connector-java-5.1.34.jar on all worker nodes.)
and the error is:

Exception in thread main org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost 
task 0.3 in stage 0.0 (TID 3, 192.168.157.129): java.sql.SQLException: No 
suitable driver found for 
jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=root&password=root

I also found the similar problem before in 
https://jira.talendforge.org/browse/TBD-2244.

Is this a bug to be fixed later? Or do I miss anything?
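For reference, a sketch using the Spark 1.4 DataFrame JDBC reader, naming the driver class explicitly and shipping the connector with --jars as suggested earlier in the thread (the table name is illustrative, and the sc/sqlContext setup is assumed):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Naming the driver class in the options lets executors register it even when
// the connector jar arrives via --jars rather than the driver classpath.
val stuDF = sqlContext.read
  .format("jdbc")
  .options(Map(
    "url"     -> "jdbc:mysql://hadoop1:3306/sparkMysqlDB?user=root&password=root",
    "dbtable" -> "newStu",
    "driver"  -> "com.mysql.jdbc.Driver"))
  .load()

stuDF.show()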



Best regards,
Jack



assertion failed error with GraphX

2015-07-19 Thread Jack Yang
Hi there,

I got an error when running one simple GraphX program.
My setup is: Spark 1.4.0, Hadoop YARN 2.5, Scala 2.10, with four virtual 
machines.

If I construct one small graph (6 nodes, 4 edges) and run:
println("triangleCount: %s".format( hdfs_graph.triangleCount().vertices.count() ))
it returns the correct results.

But when I import a much larger graph (with 85 nodes, 500 edges), the error 
is
15/07/20 12:03:36 WARN scheduler.TaskSetManager: Lost task 2.0 in stage 11.0 
(TID 32, 192.168.157.131): java.lang.AssertionError: assertion failed
at scala.Predef$.assert(Predef.scala:165)
at 
org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:90)
at 
org.apache.spark.graphx.lib.TriangleCount$$anonfun$7.apply(TriangleCount.scala:87)
at 
org.apache.spark.graphx.impl.VertexPartitionBaseOps.leftJoin(VertexPartitionBaseOps.scala:140)
at 
org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:159)
at 
org.apache.spark.graphx.impl.VertexRDDImpl$$anonfun$3.apply(VertexRDDImpl.scala:156)
at 
org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)


I run the above two graphs using the same submit command:
spark-submit --class sparkUI.GraphApp --master spark://master:7077 
--executor-memory 2G  --total-executor-cores 4 myjar.jar

Any thoughts? Anything wrong with my machine or configuration?
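For what it's worth, a sketch of one thing worth checking: the GraphX TriangleCount implementation in 1.x expects edges in canonical direction (srcId < dstId) with the graph partitioned, which a large raw edge list may not satisfy (the path and partition count below are illustrative, and sc is assumed):

import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}

// Load the edge list in canonical orientation and partition the edges
// before calling triangleCount().
val graph = GraphLoader
  .edgeListFile(sc, "hdfs://master:9000/edge-list.txt",
    canonicalOrientation = true, numEdgePartitions = 4)
  .partitionBy(PartitionStrategy.RandomVertexCut)

println("triangleCount: %s".format(graph.triangleCount().vertices.count()))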




Best regards,
Jack



RE: error from DecisonTree Training:

2014-07-21 Thread Jack Yang
So this is a bug that is still unsolved (for Java)?

From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 18 July 2014 4:52 PM
To: user@spark.apache.org
Subject: error from DecisonTree Training:

Hi All,
I got an error while using DecisionTreeModel (my program is written in Java, 
Spark 1.0.1, Scala 2.10.1).
I read a local file, loaded it as an RDD, and then sent it to DecisionTree for 
training. See below for details:

JavaRDD<LabeledPoint> Points = lines.map(new ParsePoint()).cache();
LogisticRegressionModel model = 
LogisticRegressionWithSGD.train(Points.rdd(),iterations, stepSize);   // until 
here it is working
Strategy strategy = new Strategy( ... );
DecisionTree decisionTree = new DecisionTree(strategy);
DecisionTreeModel decisionTreeModel = decisionTree.train(Points.rdd());


The error is : java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast 
to [Lorg.apache.spark.mllib.regression.LabeledPoint;

Any thoughts?

Best regards,
Jack



RE: error from DecisonTree Training:

2014-07-21 Thread Jack Yang
That is nice.
Thanks Xiangrui.

-Original Message-
From: Xiangrui Meng [mailto:men...@gmail.com] 
Sent: Tuesday, 22 July 2014 9:31 AM
To: user@spark.apache.org
Subject: Re: error from DecisonTree Training:

This is a known issue:
https://issues.apache.org/jira/browse/SPARK-2197 . Joseph is working on it. 
-Xiangrui

On Mon, Jul 21, 2014 at 4:20 PM, Jack Yang j...@uow.edu.au wrote:
 So this is a bug that is still unsolved (for Java)?



 From: Jack Yang [mailto:j...@uow.edu.au]
 Sent: Friday, 18 July 2014 4:52 PM
 To: user@spark.apache.org
 Subject: error from DecisonTree Training:



 Hi All,

 I got an error while using DecisionTreeModel (my program is written in 
 Java, Spark 1.0.1, Scala 2.10.1).

 I read a local file, loaded it as an RDD, and then sent it to 
 DecisionTree for training. See below for details:



 JavaRDD<LabeledPoint> Points = lines.map(new ParsePoint()).cache();

 LogisticRegressionModel model =
 LogisticRegressionWithSGD.train(Points.rdd(),iterations, stepSize);   //
 until here it is working

 Strategy strategy = new Strategy( ….);

 DecisionTree decisionTree = new DecisionTree(strategy);

 DecisionTreeModel decisionTreeModel = 
 decisionTree.train(Points.rdd());





 The error is : java.lang.ClassCastException: [Ljava.lang.Object; 
 cannot be cast to [Lorg.apache.spark.mllib.regression.LabeledPoint;



 Any thoughts?



 Best regards,

 Jack




error from DecisonTree Training:

2014-07-18 Thread Jack Yang
Hi All,
I got an error while using DecisionTreeModel (my program is written in Java, 
Spark 1.0.1, Scala 2.10.1).
I read a local file, loaded it as an RDD, and then sent it to DecisionTree for 
training. See below for details:

JavaRDD<LabeledPoint> Points = lines.map(new ParsePoint()).cache();
LogisticRegressionModel model = 
LogisticRegressionWithSGD.train(Points.rdd(),iterations, stepSize);   // until 
here it is working
Strategy strategy = new Strategy( ... );
DecisionTree decisionTree = new DecisionTree(strategy);
DecisionTreeModel decisionTreeModel = decisionTree.train(Points.rdd());


The error is : java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast 
to [Lorg.apache.spark.mllib.regression.LabeledPoint;

Any thoughts?

Best regards,
Jack