I got this error when trying to perform PCA on a sparse matrix. Each row
has a nominal length of 8000, and there are 36k rows; each row has on
average 3 non-zero elements.
I guess the total size is not that big.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at
This is the stack trace I got with yarn logs -applicationId.
I really have no idea where to dig further.
thanks!
yang
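For reference, a minimal sketch of this kind of sparse-PCA setup, assuming MLlib's
RowMatrix API (the indices and values below are illustrative, not from the post):
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// one sparse row of nominal length 8000 with a few non-zero entries
val rows = sc.parallelize(Seq(
  Vectors.sparse(8000, Seq((3, 1.0), (150, 2.0), (7999, 0.5)))))
val mat = new RowMatrix(rows)
// computePrincipalComponents materializes a dense 8000 x 8000 covariance matrix
// on the driver, so memory use is driven by the column count, not by input sparsity
val pcs = mat.computePrincipalComponents(10)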
14/10/21 14:36:43 INFO ConnectionManager: Accepted connection from [
phxaishdc9dn1262.stratus.phx.ebay.com/10.115.58.21]
14/10/21 14:36:47 ERROR Executor: Exception in task ID 98
<artifactId>scala-library</artifactId>
<version>2.10.4</version>
</dependency>
Thanks a lot
Yang
During tests, I often modify my code a little bit and want to see the
result, but spark-submit requires the full fat jar, which takes quite a lot of time
to build.
I just need to run in --master local mode. Is there a way to run it without
rebuilding the fat jar?
thanks
Yang
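One workaround sketch, assuming an sbt project with Spark as a compile-scope
dependency: set the master programmatically and run the driver with sbt run,
skipping the assembly step entirely (names below are illustrative):
import org.apache.spark.{SparkConf, SparkContext}

object QuickLocalTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("quick-test").setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())  // replace with the code under test
    sc.stop()
  }
}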
.
thanks!
yang
: %s.format(numAs, numBs))
}
}
then I debug through this and it became fairly clear
On Sun, Jul 19, 2015 at 10:13 PM, Yang tedd...@gmail.com wrote:
thanks, my point is that earlier versions are normally much simpler, so
it's easier to follow, and the basic structure should at least bear great
(Task[] ) through serialization.
On Mon, Jul 20, 2015 at 12:38 AM, Yang tedd...@gmail.com wrote:
ok, got a head start:
pull the git source to 14719b93ff4ea7c3234a9389621be3c97fa278b9 (the first
release, so that I could at least build it)
then build it according to README.md,
then get
why you started with such an early commit.
The Spark project has evolved quite fast.
I suggest you clone the Spark project from github.com/apache/spark/ and start
with core/src/main/scala/org/apache/spark/rdd/RDD.scala
Cheers
On Sun, Jul 19, 2015 at 7:44 PM, Yang tedd...@gmail.com wrote:
I'm
I tried df.options(Map(prop_name -> prop_value)).saveAsTable(tb_name), but it
doesn't seem to work.
thanks a lot!
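A minimal sketch of the DataFrameWriter path, assuming df is a DataFrame
(the property and table names are illustrative, not the poster's actual ones):
df.write
  .options(Map("prop_name" -> "prop_value"))
  .saveAsTable("tb_name")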
the new Dataset API is supposed to provide type safety and type checks at
compile time
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#join-operations
It does this indeed in a lot of places, but I found it still doesn't have
a type-safe join:
val ds1 =
2|
+-++
On Tue, Oct 18, 2016 at 11:30 PM, Yang <tedd...@gmail.com> wrote:
> scala> val a = sc.parallelize(Array((1,2),(3,4)))
> a: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[243] at
> parallelize at :38
>
> scala> val a_ds = hc.di.createDa
scala> val a = sc.parallelize(Array((1,2),(3,4)))
a: org.apache.spark.rdd.RDD[(Int, Int)] = ParallelCollectionRDD[243] at
parallelize at :38
scala> val a_ds = hc.di.createDataFrame(a).as[(Long,Long)]
a_ds: org.apache.spark.sql.Dataset[(Long, Long)] = [_1: int, _2: int]
scala>
>> > per-iteration time).
>>
>> >
>>
>> > Note that the current impl forces dense arrays for intermediate data
>>
>> > structures, increasing the communication cost significantly. See this
>> PR for
>>
>> > info: https://githu
ame wrapping your RDD, and $"id" % 10 with the key
> to group by, then you can get the RDD from shuffled and do the following
> operations you want.
>
> Cheng
>
>
>
> On 10/20/16 10:53 AM, Yang wrote:
>
>> in my application, I group by same training sa
uld avoid it for large groups.
>
> The key is to never materialize the grouped and shuffled data.
>
> To see one approach to do this take a look at
> https://github.com/tresata/spark-sorted
>
> It's basically a combination of smart partitioning and secondary sort.
>
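A minimal sketch of that smart-partitioning-plus-secondary-sort idea, using
repartitionAndSortWithinPartitions rather than spark-sorted itself; the key and
value types are illustrative assumptions:
import org.apache.spark.{HashPartitioner, Partitioner}
import org.apache.spark.rdd.RDD

// partition by modelId only, so all samples of a model land in one partition,
// while the full (modelId, sampleIndex) key drives the in-partition sort
class ModelPartitioner(partitions: Int) extends Partitioner {
  private val delegate = new HashPartitioner(partitions)
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = key match {
    case (modelId: String, _) => delegate.getPartition(modelId)
  }
}

def sortWithinModels(samples: RDD[((String, Long), Array[Double])]) =
  samples.repartitionAndSortWithinPartitions(new ModelPartitioner(200))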
while making small changes
to the code.
Any idea what part of the Spark framework might have caused this?
thanks
Yang
in my application, I group the training samples by their model_id's
(the input table contains training samples for 100k different models),
then each group ends up having about 1 million training samples,
then I feed that group of samples to a little Logistic Regression solver
(SGD), but SGD
with the following simple code
val a =
sc.createDataFrame(sc.parallelize(Seq((1,2),(3,4)))).as[(Int,Int)]
val grouped = a.groupByKey({x:(Int,Int)=>x._1})
val mappedGroups = grouped.mapGroups((k,x)=>{(k,1)})
val yyy = sc.broadcast(1)
val last = mappedGroups.rdd.map(xx=>{
I'm trying to use the joinWith() method instead of join(), since the former
provides a type-checked result while the latter is a straight DataFrame.
The signature is Dataset[(T,U)] joinWith(other: Dataset[U], col: Column);
here the second arg, col: Column, is normally provided by
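A minimal sketch of how joinWith is usually called, assuming two Datasets of
case classes (the class and column names are illustrative):
case class Click(userId: Long, url: String)
case class User(userId: Long, name: String)

// clicks: Dataset[Click], users: Dataset[User]
val joined: Dataset[(Click, User)] =
  clicks.joinWith(users, clicks("userId") === users("userId"))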
does mllib support this?
I do see a Lasso impl here
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/regression/Lasso.scala
If it supports LR, could you please show me a link? What algorithm does it
use?
thanks
regression.html#logistic-regression
>
> You'd set elasticnetparam = 1 for Lasso
>
> On Wed, Jan 4, 2017 at 7:13 PM, Yang <tedd...@gmail.com> wrote:
>
>> does mllib support this?
>>
>> I do see Lasso impl here https://github.com/apache
>> /spark/blob/maste
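A minimal sketch of that elasticNetParam = 1 suggestion with the spark.ml API
(regParam, maxIter, and the training DataFrame name are illustrative):
import org.apache.spark.ml.classification.LogisticRegression

val lr = new LogisticRegression()
  .setMaxIter(100)
  .setRegParam(0.01)
  .setElasticNetParam(1.0)   // 1.0 selects a pure L1 (Lasso-style) penalty
// trainingDf: DataFrame with "label" and "features" columns
val model = lr.fit(trainingDf)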
summary: Spark-shell fails to redefine values in some cases. This is at
least found in a case where "implicit" is involved, but not limited to such
cases.
Run the following in spark-shell; you can see that the last redefinition does
not take effect. The same code runs in the plain Scala REPL without
. This way when I encode the wrapper, the bean encoder simply encodes
the getContent() output, I think. encoding a list of tuples is very fast.
Yang
On Tue, May 9, 2017 at 11:19 AM, Michael Armbrust <mich...@databricks.com>
wrote:
> I think you are supposed to set BeanProperty on a var a
> <https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala#L71-L83>.
> If you are using scala though I'd consider using the case class encoders.
>
> On Tue, May 9, 2017 at 12:21 AM, Yang
/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala#L71-L83>.
> If you are using scala though I'd consider using the case class encoders.
>
> On Tue, May 9, 2017 at 12:21 AM, Yang <tedd...@gmail.com> wrote:
>
>> I'm trying to use Encoders.bean() t
2.0.2 with scala 2.11
On Tue, May 9, 2017 at 11:30 AM, Michael Armbrust <mich...@databricks.com>
wrote:
> Which version of Spark?
>
> On Tue, May 9, 2017 at 11:28 AM, Yang <tedd...@gmail.com> wrote:
>
>> actually with var it's the same:
I'm trying to use Encoders.bean() to create an encoder for my custom class,
but it fails, complaining that it can't find the schema:
class Person4 {
  @scala.beans.BeanProperty def setX(x: Int): Unit = {}
  @scala.beans.BeanProperty def getX(): Int = { 1 }
}
val personEncoder = Encoders.bean[
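A minimal sketch of the case-class alternative suggested in the replies,
assuming a SparkSession named spark (the field is illustrative):
case class Person4(x: Int)

import spark.implicits._
val ds = Seq(Person4(1), Person4(2)).toDS()   // product encoder derived automatically
ds.show()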
s.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/908554720841389/2840265927289860/latest.html>
> in
> Spark 2.1.
>
> On Tue, May 9, 2017 at 12:10 PM, Yang <tedd...@gmail.com> wrote:
>
>> somehow the schema check is here
>>
>> https://g
Hi
The 512MB is the default memory size that each executor needs, and
actually, your job does not need as much as the default memory size. You
can create a SparkContext with
sc = new SparkContext("local-cluster[2,1,512]", "test") // suppose you use
the local-cluster mode.
Here the 512 is the
Hi
I am studying the structure of Spark Streaming (my Spark version is
0.9.0). I have a question about the SocketReceiver. In the onStart function:
---
protected def onStart() {
  logInfo("Connecting to " + host + ":" + port)
  val socket
Hi
I am also curious about this question.
The textFile function is supposed to read an HDFS file? In this case,
the file was taken from the local filesystem. Is there any
way to recognize whether the path refers to the local filesystem or to HDFS
in the textFile function?
Besides, the OOM
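A minimal sketch of how the URI scheme on the path selects the filesystem
(the paths are illustrative):
val localRdd = sc.textFile("file:///home/user/data.txt")           // local filesystem
val hdfsRdd  = sc.textFile("hdfs://namenode:8020/user/data.txt")   // HDFS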
Hi
I just learned that Akka is under a commercial license; however, Spark is under the
Apache
license.
Is there any problem?
Regards
Hi,
Just wondering if you can try this:
val obj = sql("select manufacturer, count(*) as examcount from pft
group by manufacturer order by examcount desc")
obj.collect()
obj.queryExecution.executedPlan.executeCollect()
and time the third line alone. It could be that Spark SQL is taking some
time to
I may be wrong, but I think RDDs must be created inside a
SparkContext. To somehow preserve the order of the list, perhaps you
could try something like:
sc.parallelize((1 to xs.size).zip(xs))
On Fri, Jun 13, 2014 at 6:08 PM, SK skrishna...@gmail.com wrote:
Hi,
I have a List[ (String, Int,
Sorry I wasn't being clear. The idea off the top of my head was that
you could append an original position index to each element (using the
line above), and modify whatever processing functions you have in
mind to make them aware of these indices. And I think you are right
that RDD collections
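A minimal sketch of that append-an-index idea (the list contents are
illustrative):
val xs = List("a", "b", "c", "d")
val indexed = sc.parallelize(xs).zipWithIndex()         // (element, originalPosition)
val restored = indexed.map(_.swap).sortByKey().values   // recover the original order later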
If your input data is JSON, you can also try out the recently merged
in initial JSON support:
https://github.com/apache/spark/commit/d2f4f30b12f99358953e2781957468e2cfe3c916
On Wed, Jun 18, 2014 at 5:27 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
That’s pretty neat! So I guess if you
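For context, a minimal sketch of how that initial JSON support was typically
used, assuming the 1.x SQLContext API (the path is illustrative):
val people = sqlContext.jsonFile("hdfs:///data/people.json")
people.printSchema()
people.registerTempTable("people")   // registerAsTable in the very earliest releases
sqlContext.sql("SELECT name FROM people").collect()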
Hi Stuti,
Yes, you do need to install R on all nodes. Furthermore the rJava
library is also required, which can be installed simply using
'install.packages("rJava")' in the R shell. Some more installation
instructions after that step can be found in the README here:
Hi durin,
I just tried this example (nice data, by the way!), *with each JSON
object on one line*, and it worked fine:
scala> rdd.printSchema()
root
 |-- entities: org.apache.spark.sql.catalyst.types.StructType$@13b6cdef
 |    |-- friends:
Hi Haoming,
For your spark-submit question: can you try using an assembly jar
(sbt/sbt assembly will build it for you)? Another thing to check is
if there is any package structure that contains your SimpleApp; if so
you should include the hierarchical name.
Zongheng
On Thu, Jul 10, 2014 at 11:33
Hey Jerry,
When you ran these queries using different methods, did you see any
discrepancy in the returned results (i.e. the counts)?
On Thu, Jul 10, 2014 at 5:55 PM, Michael Armbrust
mich...@databricks.com wrote:
Yeah, sorry. I think you are seeing some weirdness with partitioned tables
that
Sounds like a job for Spark SQL:
http://spark.apache.org/docs/latest/sql-programming-guide.html !
On Tue, Jul 15, 2014 at 11:25 AM, Nick Pentreath
nick.pentre...@gmail.com wrote:
You can use .distinct.count on your user RDD.
What are you trying to achieve with the time group by?
—
Sent from
FWIW, I am unable to reproduce this using the example program locally.
On Tue, Jul 15, 2014 at 11:56 AM, Keith Simmons keith.simm...@gmail.com wrote:
Nope. All of them are registered from the driver program.
However, I think we've found the culprit. If the join column between two
tables is
- user@incubator
Hi Keith,
I did reproduce this using local-cluster[2,2,1024], and the errors
look almost the same. Just wondering, despite the errors did your
program output any result for the join? On my machine, I could see the
correct output.
Zongheng
On Tue, Jul 15, 2014 at 1:46 PM,
Hi Keith gorenuru,
This patch (https://github.com/apache/spark/pull/1423) solves the
errors for me in my local tests. If possible, can you guys test this
out to see if it solves your test programs?
Thanks,
Zongheng
On Tue, Jul 15, 2014 at 3:08 PM, Zongheng Yang zonghen...@gmail.com wrote
One way is to set this in your conf/spark-defaults.conf:
spark.executor.extraLibraryPath /path/to/native/lib
The key is documented here:
http://spark.apache.org/docs/latest/configuration.html
On Thu, Jul 17, 2014 at 1:25 PM, Eric Friedman
eric.d.fried...@gmail.com wrote:
I used to use
Hi All,
I got an error while using DecisionTreeModel (my program is written in Java,
spark 1.0.1, scala 2.10.1).
I read a local file, loaded it as an RDD, and then sent it to DecisionTree for
training. See below for details:
JavaRDD<LabeledPoint> Points = lines.map(new ParsePoint()).cache();
So this is a bug unsolved (for java) yet?
From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 18 July 2014 4:52 PM
To: user@spark.apache.org
Subject: error from DecisonTree Training:
Hi All,
I got an error while using DecisionTreeModel (my program is written in Java,
spark 1.0.1, scala
is working on it.
-Xiangrui
On Mon, Jul 21, 2014 at 4:20 PM, Jack Yang j...@uow.edu.au wrote:
So this is a bug unsolved (for java) yet?
From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 18 July 2014 4:52 PM
To: user@spark.apache.org
Subject: error from DecisonTree Training:
Hi All
Do you mean that the texts of the SQL queries are hardcoded in the
code? What do you mean by not being able to share the SQL with all workers?
On Tue, Jul 22, 2014 at 4:03 PM, hsy...@gmail.com hsy...@gmail.com wrote:
Hi guys,
I'm able to run some Spark SQL example but the sql is static in the code. I
,
Siyuan
On Tue, Jul 22, 2014 at 4:15 PM, Zongheng Yang zonghen...@gmail.com wrote:
Do you mean that the texts of the SQL queries are hardcoded in the
code? What do you mean by not being able to share the SQL with all workers?
On Tue, Jul 22, 2014 at 4:03 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi guys
As Hao already mentioned, using 'hive' (the HiveContext) throughout would
work.
On Monday, July 28, 2014, Cheng, Hao hao.ch...@intel.com wrote:
In your code snippet, sample is actually a SchemaRDD, and a SchemaRDD
binds to a certain SQLContext at runtime; I don't think we can
To add to this: for this many (= 20) machines I usually use at least
--wait 600.
On Wed, Jul 30, 2014 at 9:10 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
William,
The error you are seeing is misleading. There is no need to terminate the
cluster and start over.
Just re-run your
countDistinct was recently added and is in 1.0.2. If you are using that
or the master branch, you could try something like:
r.select('keyword, countDistinct('userId)).groupBy('keyword)
On Thu, Jul 31, 2014 at 12:27 PM, buntu buntu...@gmail.com wrote:
I'm looking to write a select statement
, Buntu Dev buntu...@gmail.com wrote:
Thanks Zongheng for the pointer. Is there a way to achieve the same in 1.0.0
?
On Thu, Jul 31, 2014 at 1:43 PM, Zongheng Yang zonghen...@gmail.com wrote:
countDistinct was recently added and is in 1.0.2. If you are using that
or the master branch, you could
I agree that this is definitely useful.
One related project I know of is Sparkling [1] (also see talk at Spark
Summit 2014 [2]), but it'd be great (and I imagine somewhat
challenging) to visualize the *physical execution* graph of a Spark
job.
[1] http://pr01.uml.edu/
[2]
Hi Pranay,
If this data format is to be assumed, then I believe the issue starts at
lines <- textFile(sc, "/sparkdev/datafiles/covariance.txt")
totals <- lapply(lines, function(lines)
After the first line, `lines` becomes an RDD of strings, each of which
is a line of the form 1,1.
Hi List,
We've recently been trying to run Spark on Mesos; however, we encountered a fatal
error where the mesos-master process continuously consumes memory and is finally
killed by the OOM Killer. This situation only happens when a Spark job
(fine-grained mode) is running.
We finally root-caused the
Hi,
I've got a huge list of key-value pairs, where the key is an integer and
the value is a long string (around 1Kb). I want to concatenate the strings
with the same keys.
Initially I did something like: pairs.reduceByKey((a, b) => a + " " + b)
Then I tried to save the result to HDFS. But it was
Guys,
Recently we have been migrating our backend pipeline to Spark.
In our pipeline, we have an MPI-based HAC implementation; to ensure the
result consistency of the migration, we also want to migrate this
MPI-implemented code to Spark.
However, during the migration process, I found that there are
Guys,
As to the questions about pre-processing, you could just migrate your logic to
Spark before using K-means.
I have only used Scala on Spark, and haven't used the Python binding on Spark, but I
think the basic steps must be the same.
BTW, if your data set is big with a huge sparse feature dimension
Hi Daniel,
Thanks for your email! We don't have a book (yet?) specifically on SparkR,
but here's a list of helpful tutorials / links you can check out (I am
listing them in roughly basic - advanced order):
- AMPCamp5 SparkR exercises
http://ampcamp.berkeley.edu/5/exercises/sparkr.html. This
the connection to the port 8080. I could not figure out how to
solve it.
Any suggestion is appreciated. Thanks a lot.
--
Sincerely Yours
Xingwei Yang
https://sites.google.com/site/xingweiyang1223/
-2.compute.amazonaws.com:7070:
akka.remote.EndpointAssociationException: Association failed with
[akka.tcp://
sparkmas...@ec2-54-149-92-187.us-west-2.compute.amazonaws.com:7070]
Please let me know if you any any clue about it. Thanks a lot.
--
Sincerely Yours
Xingwei Yang
https://sites.google.com
, it shows an error like this:
The method fromRDD(RDD<T>, ClassTag<T>) in the type JavaRDD is not
applicable for the arguments (RDD<Vector>, ClassTag<Object>)
Is there anything wrong with the method? Thanks a lot.
--
Sincerely Yours
Xingwei Yang
https://sites.google.com/site/xingweiyang1223/
Hi Guys:
I want to kill an application, but I could not find the driver id of
the application from the web UI. Is there any way to get it from the command line?
Thanks
--
Sincerely Yours
Xingwei Yang
https://sites.google.com/site/xingweiyang1223/
suggestion is to build Spark by yourself.
Anyway, would like to see your update once you figure out the solution.
Best wishes!
Bo
On Wed, Feb 4, 2015 at 4:47 AM, Corey Nolet cjno...@gmail.com wrote:
Bo yang-
I am using Spark 1.2.0 and undoubtedly there are older Guava classes which
are being
Corey,
Which version of Spark do you use? I am using Spark 1.2.0, and guava 15.0.
It seems fine.
Best,
Bo
On Tue, Feb 3, 2015 at 8:56 PM, M. Dale medal...@yahoo.com.invalid wrote:
Try spark.yarn.user.classpath.first (see
https://issues.apache.org/jira/browse/SPARK-2996 - only works for
Hi folks,
I am new to Spark. I just got Spark 1.2 to run on EMR AMI 3.3.1 (Hadoop 2.4).
I ssh to the EMR master node and submit the job or start the shell. Everything runs
well except the web UI.
In order to see the UI, I used an ssh tunnel which forwards my dev machine port to
the EMR master node web UI
Hi,
I'm quite interested in how Spark's fault tolerance works and I'd like to
ask a question here.
According to the paper, there are two kinds of dependencies--the wide
dependency and the narrow dependency. My understanding is, if the
operations I use are all narrow, then when one machine
Check spark/mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala
It can be used through sliding(windowSize: Int) in
spark/mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala
Yuhao
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Thursday, February 12, 2015
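A minimal sketch of that sliding-window helper (the data is illustrative):
import org.apache.spark.mllib.rdd.RDDFunctions._

val windows = sc.parallelize(1 to 10).sliding(3).collect()
// Array(Array(1, 2, 3), Array(2, 3, 4), ..., Array(8, 9, 10))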
Guys,
I have a question regarding the Spark 1.1 broadcast implementation.
In our pipeline, we have a large multi-class LR model, which is about 1GiB
in size.
To employ the benefit of Spark parallelism, a natural thought is to
broadcast this model file to the worker nodes.
However, it looks like
Thanks Cheng for the clarification.
Looking forward to this new API mentioned below.
Yang
Sent from my iPad
On Mar 17, 2015, at 8:05 PM, Cheng Lian lian.cs@gmail.com wrote:
Hey Yang,
My comments are in-lined below.
Cheng
On 3/18/15 6:53 AM, Yang Lei wrote:
Hello,
I am
Spark which filters are handled already, so that there is
no redundant filtering.
Appreciate comments and links to any existing documentation or discussion.
Yang
Check this out: https://github.com/cloudant/spark-cloudant. It supports
both the DataFrame and SQL approaches for reading data from Cloudant and saving
it.
Looking forward to your feedback on the project.
Yang
of operations, then there will be a lot of shuffle data. So you need
to check the worker logs and see what happened (whether the disk is full, etc.).
We have streaming pipelines running for weeks without having any issues.
Thanks
Best Regards
On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com
Guys,
We have a project which builds upon Spark streaming.
We use Kafka as the input stream, and create 5 receivers.
When this application had run for around 90 hours, all 5 receivers failed
for some unknown reason.
In my understanding, it is not guaranteed that a Spark Streaming receiver
will
On Mon, Mar 16, 2015 at 12:40 PM, Jun Yang yangjun...@gmail.com wrote:
Guys,
We have a project which builds upon Spark streaming.
We use Kafka as the input stream, and create 5 receivers.
When this application had run for around 90 hours, all 5 receivers failed
for some unknown reason
spawn another receiver on another machine or on the same machine.
Thanks
Best Regards
On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang yangjun...@gmail.com wrote:
Dibyendu,
Thanks for the reply.
I am reading your project homepage now.
One quick question I care about is:
If the receivers
Hi Kelvin,
Thank you. That works for me. I wrote my own joins that produced Scala
collections, instead of using rdd.join.
Regards,
Yang
On Thu, Mar 26, 2015 at 5:51 PM, Kelvin Chu 2dot7kel...@gmail.com wrote:
Hi, I used union() before and yes it may be slow sometimes. I _guess_ your
variable
Hi Mark,
That's true, but in neither way can I combine the RDDs, so I have to avoid
unions.
Thanks,
Yang
On Thu, Mar 26, 2015 at 5:31 PM, Mark Hamstra m...@clearstorydata.com
wrote:
RDD#union is not the same thing as SparkContext#union
On Thu, Mar 26, 2015 at 2:27 PM, Yang Chen y...@yang
?
Thanks in advance for any suggestions on how to resolve this.
Yang
I have no problem running the socket text stream sample in the same
environment.
Thanks
Yang
Sent from my iPhone
On Apr 25, 2015, at 1:30 PM, Akhil Das ak...@sigmoidanalytics.com wrote:
Make sure you have >= 2 cores for your streaming application.
Thanks
Best Regards
On Sat
I hit the same issue, as if the directory has no files at all, when running
the sample examples/src/main/python/streaming/hdfs_wordcount.py with a
local directory and adding files into that directory. I'd appreciate comments
on how to resolve this.
--
View this message in context:
is using ip
addresses for all communication by
defining spark.driver.host, SPARK_PUBLIC_DNS, SPARK_LOCAL_IP, SPARK_LOCAL_HOST
in the right place.
Hope this helps.
Yang.
On Fri, Apr 24, 2015 at 5:15 PM, Stephen Carman scar...@coldlight.com
wrote:
So I can’t for the life of me get something even
Hi Cui,
Try to read the scala version of LDAExample,
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala
The matrix you're referring to is the corpus after vectorization.
One example, given a dict, [apple, orange, banana]
3
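A minimal illustration of that vectorization step; the three-word dictionary
comes from the example above, while the document and counts are made up:
import org.apache.spark.mllib.linalg.Vectors

// dictionary: Array("apple", "orange", "banana")
// document "apple banana apple" -> term counts over the dictionary
val docVector = Vectors.dense(2.0, 0.0, 1.0)   // apple = 2, orange = 0, banana = 1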
Hi Samsudhin,
If possible, can you please provide part of the code? Or perhaps try the
unit tests in RandomForestSuite to see if the issue repros.
Regards,
yuhao
-Original Message-
From: samsudhin [mailto:samsud...@pigstick.com]
Sent: Tuesday, June 23, 2015 2:14 PM
To:
Hi all,
I have questions with regarding to the log file directory.
Say I run spark-submit --master local[4]; where is the log file?
Then how about if I run standalone: spark-submit --master
spark://mymaster:7077?
Best regards,
Jack
through Spark SQL:
https://www.linkedin.com/pulse/light-weight-self-service-data-query-through-spark-sql-bo-yang
Take a look and feel free to let me know for any question.
Best,
Bo
On Sat, Aug 8, 2015 at 1:42 PM, unk1102 umesh.ka...@gmail.com wrote:
Hi how do we create DataFrame from a binary
yang bobyan...@gmail.com wrote:
You can create your own data schema (StructType in Spark), and use the
following method to create a data frame with your own data schema:
sqlContext.createDataFrame(yourRDD, structType);
I wrote a post on how to do it. You can also get the sample code there:
Light
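A minimal sketch of building an explicit StructType and applying it with
createDataFrame (the field names and data are illustrative):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age",  IntegerType, nullable = true)))
val rowRdd = sc.parallelize(Seq(Row("alice", 30), Row("bob", 25)))
val df = sqlContext.createDataFrame(rowRdd, schema)
df.printSchema()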
Hi Akshat,
I found an open source library which implements an S3 InputFormat for Hadoop.
Then I use Spark's newAPIHadoopRDD to load data via that S3 InputFormat.
The open source library is https://github.com/ATLANTBH/emr-s3-io. It is a
little old. I looked inside it and made some changes. Then it
Hi there,
I got an error when running one simple GraphX program.
My setup is: Spark 1.4.0, Hadoop YARN 2.5, Scala 2.10, with four virtual
machines.
If I construct one small graph (6 nodes, 4 edges), I run:
println("triangleCount: %s".format(
hdfs_graph.triangleCount().vertices.count() ))
Hi there,
I would like to use spark to access the data in mysql. So firstly I tried to
run the program using:
spark-submit --class sparkwithscala.SqlApp --driver-class-path
/home/lib/mysql-connector-java-5.1.34.jar --master local[4] /home/myjar.jar
that returns me the correct results. Then I
:
sqlContext.sql(s"insert into Table newStu select * from otherStu")
that works.
Is there any document addressing that?
Best regards,
Jack
From: Terry Hole [mailto:hujie.ea...@gmail.com]
Sent: Tuesday, 21 July 2015 4:17 PM
To: Jack Yang; user@spark.apache.org
Subject: Re: standalone to connect mysql
, at 9:21 pm, Jack Yang
j...@uow.edu.aumailto:j...@uow.edu.au wrote:
No, I did not use hiveContext at this stage.
I am talking about the embedded SQL syntax for pure Spark SQL.
Thanks, mate.
On 21 Jul 2015, at 6:13 pm, Terry Hole
hujie.ea...@gmail.commailto:hujie.ea...@gmail.com wrote:
Jack,
You can
July 2015 4:17 PM
To: Jack Yang; user@spark.apache.orgmailto:user@spark.apache.org
Subject: Re: standalone to connect mysql
Maybe you can try: spark-submit --class sparkwithscala.SqlApp --jars
/home/lib/mysql-connector-java-5.1.34.jar --master spark://hadoop1:7077
/home/myjar.jar
Thanks!
-Terry
Hi
Hi all,
I am saving some Hive query results into the local directory:
val hdfsFilePath = "hdfs://master:ip/ tempFile ";
val localFilePath = "file:///home/hduser/tempFile";
hiveContext.sql(s"""my hql codes here""")
res.printSchema() --working
res.show() --working
res.map{ x => tranRow2Str(x)
Yes, mine is 1.4.0.
Is this problem to do with the version, then?
I doubt that. Any comments, please?
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Wednesday, 4 November 2015 11:52 AM
To: Jack Yang
Cc: user@spark.apache.org
Subject: Re: error with saveAsTextFile in local directory
Looks
September 2015 12:27 AM
To: Jack Yang
Cc: Ted Yu; Andy Huang; user@spark.apache.org
Subject: Re: No space left on device when running graphx job
Would you mind sharing what your solution was? It would help those on the forum
who might run into the same problem. Even if it’s a silly ‘gotcha
Hi folk,
I have an issue with GraphX. (Spark 1.4.0 + 4 machines + 4G memory + 4 CPU cores)
Basically, I load data using the GraphLoader.edgeListFile method and then count
the number of nodes using the graph.vertices.count() method.
The problem is:
Lost task 11972.0 in stage 6.0 (TID 54585, 192.168.70.129):
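A minimal sketch of the load-and-count pattern described above (the edge-list
path is illustrative):
import org.apache.spark.graphx.GraphLoader

val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")
println(graph.vertices.count())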
Hi all,
I resolved the problems.
Thanks folk.
Jack
From: Jack Yang [mailto:j...@uow.edu.au]
Sent: Friday, 25 September 2015 9:57 AM
To: Ted Yu; Andy Huang
Cc: user@spark.apache.org
Subject: RE: No space left on device when running graphx job
Also, please see the screenshot below from spark web