Thanks
-Mo
2014-03-31 13:16 GMT-05:00 Evgeny Shishkin itparan...@gmail.com:
On 31 Mar 2014, at 21:05, Dong Mo monted...@gmail.com wrote:
Dear list,
I was wondering how Spark handles congestion when the upstream is
generating dstreams faster than downstream workers can handle?
It
Nicholas, I'm in Boston and would be interested in a Spark group. Not
sure if you know this -- there was a meetup that never got off the
ground. Anyway, I'd be +1 for attending. Not sure what is involved in
organizing. Seems a shame that a city like Boston doesn't have one.
On Mon, Mar 31, 2014
My fellow Bostonians and New Englanders,
We cannot allow New York to beat us to having a banging Spark meetup.
Respond to me (and I guess also Andy?) if you are interested.
Yana,
I'm not sure either what is involved in organizing, but we can figure it
out. I didn't know about the meetup that
I would offer to host one in Cape Town but we're almost certainly the only
Spark users in the country apart from perhaps one in Johannesburg :)
Sent from Mailbox for iPhone
On Mon, Mar 31, 2014 at 8:53 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
My fellow Bostonians and New
Happy to help with an NYC meet up (just emailed Andy). I recently moved to VA,
but am back in NYC quite often, and have been turning several computational
people at Columbia / NYU / Simons Foundation onto Spark; there'd definitely be
interest in those communities.
-- Jeremy
Also in NYC, definitely interested in a spark meetup!
Sent from my iPhone
On Mar 31, 2014, at 3:07 PM, Jeremy Freeman freeman.jer...@gmail.com wrote:
Happy to help with an NYC meet up (just emailed Andy). I recently moved to
VA, but am back in NYC quite often, and have been turning several
If you have any questions on helping to get a Spark Meetup off the ground,
please do not hesitate to ping me (denny.g@gmail.com). I helped jump start
the one here in Seattle (and tangentially have been helping the Vancouver and
Denver ones as well). HTH!
On March 31, 2014 at 12:35:38
Your suggestion took me past the ClassNotFoundException. I then hit an
akka.actor.ActorNotFound exception. I patched PR 568 into my 0.9.0 Spark
codebase and everything worked.
So thanks a lot, Tim. Is there a JIRA/PR for the protobuf issue? Why is it not
fixed in the latest git tree?
Thanks.
Spark now shades its own protobuf dependency so protobuf 2.4.1 shouldn't be
getting pulled in unless you are directly using akka yourself. Are you?
Does your project have other dependencies that might be indirectly pulling
in protobuf 2.4.1? It would be helpful if you could list all of your
In the spirit of everything being bigger and better in TX ;) - if
anyone is in Austin and interested in meeting up over Spark - contact
me! There seems to be a Spark meetup group in Austin that has never met
and my initial email to organize the first gathering was never acknowledged.
Ognen
On
@eric-
i saw this exact issue recently while working on the KinesisWordCount.
are you passing local[2] to your example as the MASTER arg versus just
local or local[1]?
you need at least 2. it's documented as n>1 in the scala source docs -
which is easy to mistake for n=1.
i just ran the
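For illustration, here is a minimal Scala sketch of the point above (0.9-era streaming API; the host, port and app name are placeholders): a local master needs at least two threads so the receiver does not starve batch processing.
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._
// "local[2]" or more: one thread runs the receiver, the rest process batches.
// With plain "local" or "local[1]" the receiver takes the only thread and no
// batches are ever processed, which looks like foreach/print never firing.
val ssc = new StreamingContext("local[2]", "NetworkWordCount", Seconds(1))
val lines = ssc.socketTextStream("localhost", 9999)
lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).print()
ssc.start()
ssc.awaitTermination()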
I was referring to the protobuf version issue as the one not yet fixed. I could not
find any reference to the problem or the fix.
Reg. SPARK-1052, I could pull in the fix into my 0.9.0 tree (from the tar ball
on the website) and I see the fix in the latest git.
Thanks
On 01-Apr-2014, at 3:28 am, deric
Hi Andy,
I would be interested in setting up a meetup in Delhi/NCR, India. Can you
please let me know how to go about organizing it?
Best Regards,
Sonal
Nube Technologies http://www.nubetech.co
http://in.linkedin.com/in/sonalgoyal
On Tue, Apr 1, 2014 at 10:04 AM, giive chen
Another problem I noticed is that the current 1.0.0 git tree still gives me the
ClassNotFoundException. I see that the SPARK-1052 is already fixed there. I
then modified the pom.xml for mesos and protobuf and that still gave the
ClassNotFoundException. I also tried modifying pom.xml only for
Hi Li-Ming,
This binary logistic regression using SGD is in
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
We're working on multinomial logistic regression using Newton and L-BFGS
optimizer now. Will be released
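As a rough usage sketch of the trainer linked above (assuming the 0.9-era MLlib API with Array[Double] features; the data below is made up for illustration):
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint

val sc = new SparkContext("local[2]", "lr-sketch")
// two toy training points: label 1.0 vs 0.0, each with two features
val training = sc.parallelize(Seq(
  LabeledPoint(1.0, Array(1.0, 0.5)),
  LabeledPoint(0.0, Array(-1.0, -0.3))))
val model = LogisticRegressionWithSGD.train(training, 100)  // 100 SGD iterations
model.predict(Array(0.8, 0.1))   // returns 0.0 or 1.0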
Thanks.
What would be the equivalent code in Hadoop for which Spark published the 110s/0.9s
comparison?
On 1 Apr, 2014, at 2:44 pm, DB Tsai dbt...@alpinenow.com wrote:
Hi Li-Ming,
This binary logistic regression using SGD is in
I think with addJar() there is no 'caching', in the sense that files will be
copied every time per job.
Whereas with the Hadoop distributed cache, files will be copied only once, and a
symlink will be created to the cache file for subsequent runs:
Spark now shades its own protobuf dependency so protobuf 2.4.1 shouldn't be
getting pulled in unless you are directly using akka yourself. Are you?
No, I'm not. Although I see that protobuf libraries are directly pulled into the
0.9.0 assembly jar - I do see the shaded version as well.
e.g.
Hi All,
I have a five node spark cluster, Master, s1,s2,s3,s4.
I have passwordless ssh to all slaves from master and vice-versa.
But on one machine, s2, what happens is that after 2-3 minutes of my
connection from master to slave, the write pipe is broken. So if I try to
connect again from master I
Hello, I would like to have a kind of sub windows. The idea is to have 3
windows in the following way:
[ASCII sketch: three consecutive windows w1, w2, w3 laid out along the time axis, from past to future]
So I can do some processing with the
hello..
i am on my second day with spark.. and i'm having trouble getting the foreach
function to work with the network wordcount example.. i can see that the
flatMap and map methods are being invoked.. but i don't seem to be getting
into the foreach method... not sure if what i am doing even
How do you remove the validation blocker from the compilation?
Thank you
i would like to write a custom receiver to receive data from a Tibco RV subject
i found this scala example..
http://spark.incubator.apache.org/docs/0.8.0/streaming-custom-receivers.html
but i can't seem to find a java example
does anybody know of a good java example for creating a custom receiver?
Hi all;
Can someone give me some tips to compute the mean of an RDD by key, maybe with
combineByKey and StatCounter.
Cheers,
Jaonary
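One possible answer, sketched in spark-shell (StatCounter tracks count/mean/variance and can merge partial results, which is exactly the shape combineByKey needs):
import org.apache.spark.SparkContext._
import org.apache.spark.util.StatCounter

val data = sc.parallelize(Seq(("a", 1.0), ("a", 3.0), ("b", 10.0)))
val statsByKey = data.combineByKey(
  (v: Double) => new StatCounter(Seq(v)),              // first value of a key in a partition
  (s: StatCounter, v: Double) => s.merge(v),           // fold further values into the partial stats
  (s1: StatCounter, s2: StatCounter) => s1.merge(s2))  // merge partials across partitions
val meanByKey = statsByKey.mapValues(_.mean)
meanByKey.collect()   // Array((a,2.0), (b,10.0))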
Some related discussion: https://github.com/apache/spark/pull/246
On Tue, Apr 1, 2014 at 8:43 AM, Philip Ogren philip.og...@oracle.comwrote:
Hi DB,
Just wondering if you ever got an answer to your question about monitoring
progress - either offline or through your own investigation. Any
You could probably port it back, but it required some changes on the Java side
as well (a new PythonMLUtils class). It might be easier to fix the Mesos issues
with 0.9.
Matei
On Apr 1, 2014, at 8:53 AM, Ian Ferreira ianferre...@hotmail.com wrote:
Hi there,
For some reason the
Yes I'm using akka as well. But if that is the problem then I should have
been facing this issue in my local setup as well. I'm only running into this
error on using the spark standalone cluster.
But will try out your suggestion and let you know.
Thanks
Kanwal
The discussion there hits on the distinction between jobs and stages.
When looking at one application, there are hundreds of stages,
sometimes thousands, depending on the data and the task. And the UI
seems to track stages. And one could independently track them for
such a job.
I've removed the dependency on akka in a separate project but am still running
into the same error. In the POM Dependency Hierarchy I do see 2.4.1 - shaded
and 2.5.0 being included. If there is a conflict with a project dependency I
would think I should be getting the same error in my local setup as
SPARK_HADOOP_VERSION=2.0.0-cdh4.2.1 sbt/sbt assembly
That's all I do.
On Apr 1, 2014, at 11:41 AM, Patrick Wendell pwend...@gmail.com wrote:
Vidal - could you show exactly what flags/commands you are using when you
build spark to produce this assembly?
On Tue, Apr 1, 2014 at 12:53 AM,
Alright, so I've upped the minSplits parameter on my call to textFile, but
the resulting RDD still has only 1 partition, which I assume means it was
read in on a single process. I am checking the number of partitions in
pyspark by using the rdd._jrdd.splits().size() trick I picked up on this
list.
When my tuple type includes a generic type parameter, the pair RDD
functions aren't available. Take for example the following (a join on two
RDDs, taking the sum of the values):
def joinTest(rddA: RDD[(String, Int)], rddB: RDD[(String, Int)]) :
RDD[(String, Int)] = {
rddA.join(rddB).map {
Looks like you're right that gzip files are not easily splittable [1], and
also about everything else you said.
[1]
http://mail-archives.apache.org/mod_mbox/spark-user/201310.mbox/%3CCANDWdjY2hN-=jXTSNZ8JHZ=G-S+ZKLNze=rgkjacjaw3tto...@mail.gmail.com%3E
On Tue, Apr 1, 2014 at 1:51 PM, Nicholas
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag
def joinTest[K: ClassTag](rddA: RDD[(K, Int)], rddB: RDD[(K, Int)]) :
RDD[(K, Int)] = {
rddA.join(rddB).map { case (k, (a, b)) => (k, a+b) }
}
On Tue, Apr 1, 2014 at 4:55 PM, Daniel
Koert's answer is very likely correct. This implicit definition, which
converts an RDD[(K, V)] to provide PairRDDFunctions, requires that a ClassTag be
available for K:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/SparkContext.scala#L1124
To fully understand what's
Just an FYI, it's not obvious from the docs
(http://spark.incubator.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#partitionBy)
that the following code should fail:
a = sc.parallelize([1,2,3,4,5,6,7,8,9,10], 2)
a._jrdd.splits().size()
a.count()
b = a.partitionBy(5)
http://spark.incubator.apache.org/docs/latest/spark-standalone.html#monitoring-and-logging
As the above shows:
Monitoring and Logging
Spark’s standalone mode offers a web-based user interface to monitor the
cluster. The master and each worker has its own web UI that shows cluster
and job
Are you trying to access the UI from another machine? If so, first confirm
that you don't have a network issue by opening the UI from the master node
itself.
For example:
yum -y install lynx
lynx ip_address:8080
If this succeeds, then you likely have something blocking you from
accessing the
Do you get the same problem if you build with maven?
On Tue, Apr 1, 2014 at 12:23 PM, Vipul Pandey vipan...@gmail.com wrote:
SPARK_HADOOP_VERSION=2.0.0-cdh4.2.1 sbt/sbt assembly
That's all I do.
On Apr 1, 2014, at 11:41 AM, Patrick Wendell pwend...@gmail.com wrote:
Vidal - could you show
Hm, yeah, the docs are not clear on this one. The function you're looking
for to change the number of partitions on any ol' RDD is repartition(),
which is available in master but for some reason doesn't seem to show up in
the latest docs. Sorry about that, I also didn't realize partitionBy() had
Alright!
Thanks for that link. I did a little research based on it and it looks like
Snappy or LZO + some container would be better alternatives to gzip.
I confirmed that gzip was cramping my style by trying sc.textFile() on an
uncompressed version of the text file. With the uncompressed file,
Hmm, doing help(rdd) in PySpark doesn't show a method called repartition().
Trying rdd.repartition() or rdd.repartition(10) also fails. I'm on 0.9.0.
The approach I'm going with to partition my MappedRDD is to key it by a
random int, and then partition it.
So something like:
rdd =
I got an exception can't zip RDDs with unequal numbers of partitions when I
apply any action (reduce, collect) on a dataset created by zipping two datasets of
10 million entries each. The problem occurs independently of the number of
partitions or when I let
You can get detailed information through the Spark listener interface regarding
each stage. Multiple jobs may be compressed into a single stage, so job-wise
information would be the same as what Spark shows.
Regards
Mayur
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi
MLlib has been part of the Spark distribution (under the mllib directory); also check
http://spark.apache.org/docs/latest/mllib-guide.html
and for JIRA, because of the recent migration to the Apache JIRA, I think all
MLlib-related issues should be under the Spark umbrella,
Hi Nan,
I was actually referring to MLI/MLBase (http://www.mlbase.org); is this
being actively developed?
I'm familiar with mllib and have been looking at its documentation.
Thanks!
On Tue, Apr 1, 2014 at 10:44 PM, Nan Zhu [via Apache Spark User List]
ml-node+s1001560n3611...@n3.nabble.com
Ah, I see, I’m sorry, I didn’t read your email carefully
then I have no idea about the progress on MLBase
Best,
--
Nan Zhu
On Tuesday, April 1, 2014 at 11:05 PM, Krakna H wrote:
Hi Nan,
I was actually referring to MLI/MLBase (http://www.mlbase.org); is this being
actively
Hi there,
MLlib is the first component of MLbase - MLI and the higher levels of the
stack are still being developed. Look for updates in terms of our progress
on the hyperparameter tuning/model selection problem in the next month or
so!
- Evan
On Tue, Apr 1, 2014 at 8:05 PM, Krakna H
From API docs: Zips this RDD with another one, returning key-value
pairs with the first element in each RDD, second element in each RDD,
etc. Assumes that the two RDDs have the *same number of partitions*
and the *same number of elements in each partition* (e.g. one was made
through a map on the
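A small illustration of that constraint (my own spark-shell sketch, not from the original thread):
val a = sc.parallelize(1 to 6, 3)
val b = a.map(_ * 10)        // derived via map, so same partition layout and sizes as `a`
a.zip(b).collect()           // Array((1,10), (2,20), ..., (6,60))
// Zipping two independently built RDDs whose partition counts or per-partition
// sizes differ is what triggers the "can't zip RDDs with unequal numbers of
// partitions" error reported above.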
It's this: mvn -Dhadoop.version=2.0.0-cdh4.2.1 -DskipTests clean package
On Tue, Apr 1, 2014 at 5:15 PM, Vipul Pandey vipan...@gmail.com wrote:
how do you recommend building that - it says
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-assembly-plugin:2.2-beta-5:assembly
Hi Thierry,
Your code does not work if @yh18190 wants a global counter. An RDD may have
more than one partition. For each partition, cnt will be reset to -1. You
can try the following code:
scala> val rdd = sc.parallelize( (1, 'a') :: (2, 'b') :: (3, 'c') :: (4,
'd') :: Nil)
rdd:
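For reference, one standard way to build a truly global, per-element counter (my own sketch, not necessarily the code the reply above was quoting) is to count elements per partition first and then offset a per-partition index by the cumulative sums:
val rdd = sc.parallelize(Seq('a', 'b', 'c', 'd'), 2)
// 1) count the elements in each partition
val counts = rdd.mapPartitions(it => Iterator(it.size)).collect()
// 2) cumulative sums give the starting index of each partition
val offsets = counts.scanLeft(0)(_ + _)
// 3) add the partition offset to a local per-partition index
val indexed = rdd.mapPartitionsWithIndex { (pid, it) =>
  it.zipWithIndex.map { case (x, i) => (offsets(pid) + i, x) }
}
indexed.collect()   // Array((0,a), (1,b), (2,c), (3,d))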
I think multiplying by ratings is a heuristic that worked on rating-related
problems like the Netflix dataset or other ratings datasets, but the scope
of NMF is much broader than that.
@Sean please correct me in case you don't agree...
Definitely it's good to add all the rating dataset related
I downloaded 0.9.0 fresh and ran the mvn command - the assembly jar thus
generated also has both shaded and real version of protobuf classes
Vipuls-MacBook-Pro-3:spark-0.9.0-incubating vipul$ jar -ftv
./assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.0.0-cdh4.2.1.jar
Hi Thierry,
Thanks for the above responses. I implemented it using RangePartitioner. We
need to use one of the custom partitioners in order to perform this
task. Normally you can't maintain a counter because count operations should
be performed on each partitioned block of data...
I’ve been able to get CDH5 up and running on EC2 and according to Cloudera
Manager, Spark is running healthy.
But when I try to run spark-shell, I eventually get the error:
14/04/02 07:18:18 INFO client.AppClient$ClientActor: Connecting to master
spark://ip-172-xxx-xxx-xxx:7077...
14/04/02
Thanks for the update Evan! In terms of using MLI, I see that the Github
code is linked to Spark 0.8; will it not work with 0.9 (which is what I
have set up) or higher versions?
On Wed, Apr 2, 2014 at 1:44 AM, Evan R. Sparks [via Apache Spark User List]
ml-node+s1001560n3615...@n3.nabble.com
It should be kept in mind that different implementations are rarely
strictly better, and that what works well on one type of data might
not on another. It also bears keeping in mind that several of these
differences just amount to different amounts of regularization, which
need not be a
Hi, Spark Devs:
I encountered a problem which shows the error message akka.actor.ActorNotFound
on our Mesos mini-cluster.
mesos : 0.17.0
spark : spark-0.9.0-incubating
spark-env.sh:
#!/usr/bin/env bash
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=hdfs://
Heya,
Yep, this is a problem in the Mesos scheduler implementation that has been
fixed after 0.9.0 (https://spark-project.atlassian.net/browse/SPARK-1052 =>
MesosSchedulerBackend).
So there are several options, like applying the patch or upgrading to 0.9.1 :-/
Cheers,
Andy
On Wed, Apr 2, 2014 at 5:30 PM,
Aha, thank you for your kind reply.
Upgrading to 0.9.1 is a good choice. :)
On Wed, Apr 2, 2014 at 11:35 PM, andy petrella andy.petre...@gmail.comwrote:
Heya,
Yep this is a problem in the Mesos scheduler implementation that has been
fixed after 0.9.0
np ;-)
On Wed, Apr 2, 2014 at 5:50 PM, Leon Zhang leonca...@gmail.com wrote:
Aha, thank you for your kind reply.
Upgrading to 0.9.1 is a good choice. :)
On Wed, Apr 2, 2014 at 11:35 PM, andy petrella andy.petre...@gmail.comwrote:
Heya,
Yep this is a problem in the Mesos scheduler
Can someone explain how an RDD is resilient? If one of the partitions is lost,
who is responsible for recreating that partition - is it the driver program?
Hi Guys
I would like to print the content inside of lines in:
JavaDStream<String> lines = ssc.socketTextStream(args[1],
Integer.parseInt(args[2]));
JavaDStream<String> words = lines.flatMap(new
FlatMapFunction<String, String>() {
@Override
public Iterable<String> call(String x) {
TL;DR
Your classes are missing on the workers, pass the jar containing the
class main.scala.Utils to the SparkContext
Longer:
I'm missing some information, like how the SparkContext is configured, but my
best guess is that you didn't provide the jars (addJars on SparkConf or
use the SC's constructor
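For completeness, here is a sketch of both ways of shipping the application jar, using SparkConf.setJars and SparkContext.addJar (jar paths, master URL and app name are placeholders):
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("spark://master:7077")
  .setAppName("MyApp")
  .setJars(Seq("/path/to/my-app-assembly.jar"))   // copied to every executor at startup
val sc = new SparkContext(conf)
// ...or after the context exists:
sc.addJar("/path/to/extra-dependency.jar")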
Update: I'm now using this ghetto function to partition the RDD I get back
when I call textFile() on a gzipped file:
# Python 2.6
def partitionRDD(rdd, numPartitions):
counter = {'a': 0}
def count_up(x):
counter['a'] += 1
return counter['a']
return (rdd.keyBy(count_up)
Sorry, perhaps I was not clear. Anyway, could you try making the path in the
*List* an absolute one; e.g.
List(/home/yh/src/pj/spark-stuffs/target/scala-2.10/simple-project_2.10-1.0.jar)
In order to provide a relative path, you need first to figure out your CWD,
so you can do (to be really
For textFile I believe we overload it and let you set a codec directly:
https://github.com/apache/spark/blob/master/core/src/test/scala/org/apache/spark/FileSuite.scala#L59
For saveAsSequenceFile yep, I think Mark is right, you need an option.
On Wed, Apr 2, 2014 at 12:36 PM, Mark Hamstra
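In Scala that overload looks roughly like this (a sketch; the output path is a placeholder, and it is worth checking that the overload exists in the Spark version you run):
import org.apache.hadoop.io.compress.GzipCodec

val data = sc.parallelize(Seq("a", "b", "c"))
data.saveAsTextFile("/tmp/compressed-out", classOf[GzipCodec])   // each part file is gzipped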
Hi Folks
I'm looking to buy some gear to run Spark. I'm quite well versed in Hadoop
server design but there does not seem to be much Spark-related collateral
around infrastructure guidelines (or at least I haven't been able to find
them). My current thinking for server design is something
Is this a Scala-only
(http://spark.incubator.apache.org/docs/latest/api/pyspark/pyspark.rdd.RDD-class.html#saveAsTextFile)
feature?
On Wed, Apr 2, 2014 at 5:55 PM, Patrick Wendell pwend...@gmail.com wrote:
For textFile I believe we overload it and let you set a codec directly:
There is a repartition method in pyspark master:
https://github.com/apache/spark/blob/master/python/pyspark/rdd.py#L1128
On Wed, Apr 2, 2014 at 2:44 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Update: I'm now using this ghetto function to partition the RDD I get back
when I call
Thanks for pointing that out.
On Wed, Apr 2, 2014 at 6:11 PM, Mark Hamstra m...@clearstorydata.comwrote:
First, you shouldn't be using spark.incubator.apache.org anymore, just
spark.apache.org. Second, saveAsSequenceFile doesn't appear to exist in
the Python API at this point.
On Wed,
Ah, now I see what Aaron was referring to. So I'm guessing we will get this
in the next release or two. Thank you.
On Wed, Apr 2, 2014 at 6:09 PM, Mark Hamstra m...@clearstorydata.comwrote:
There is a repartition method in pyspark master:
Will be in 1.0.0
On Wed, Apr 2, 2014 at 3:22 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Ah, now I see what Aaron was referring to. So I'm guessing we will get
this in the next release or two. Thank you.
On Wed, Apr 2, 2014 at 6:09 PM, Mark Hamstra
What I'd like is a way to capture the information provided on the stages
page (i.e. cluster:4040/stages via IndexPage). Looking through the
Spark code, it doesn't seem like it is possible to directly query for
specific facts such as how many tasks have succeeded or how many total
tasks there
Hi All,
I am interested in measuring the total network I/O, CPU and memory
consumed by a Spark job. I tried to find the related information in the logs and
the Web UI, but there seems to be no sufficient information. Could anyone give me
any suggestion?
Thanks very much in advance.
Hi,
I want to aggregate (time-stamped) event data at daily, weekly and monthly
levels, stored in a directory in data/yyyy/mm/dd/dat.gz format. For example:
Each dat.gz file contains tuples in (datetime, id, value) format. I can
perform aggregation as follows:
but this code doesn't seem to be
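A sketch of the kind of aggregation the question describes (Scala; the glob pattern, CSV layout and ISO datetime prefix are assumptions for illustration):
import org.apache.spark.SparkContext._

// read a whole month of gzipped files with a glob; Spark decompresses .gz transparently
val lines = sc.textFile("data/2014/03/*/dat.gz")
val events = lines.map(_.split(",")).map(a => (a(0), a(1), a(2).toDouble)) // (datetime, id, value)
// daily aggregation: key by (yyyy-mm-dd, id) and sum the values
val daily = events
  .map { case (ts, id, v) => ((ts.substring(0, 10), id), v) }
  .reduceByKey(_ + _)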
The driver stores the meta-data associated with the partition, but the
re-computation will occur on an executor. So if several partitions are
lost, e.g. due to a few machines failing, the re-computation can be striped
across the cluster making it fast.
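As a small illustration of that lineage idea (my own sketch; the path is a placeholder), each transformed RDD remembers how to recompute itself from its parent, and that recorded chain is what an executor replays for a lost partition:
val base = sc.textFile("hdfs:///data/events.log")
val derived = base.map(_.split("\t")).filter(_.length > 2)
// toDebugString prints the chain of parent RDDs, i.e. the recorded lineage
println(derived.toDebugString)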
On Wed, Apr 2, 2014 at 11:27 AM, David
Hey Phillip,
Right now there is no mechanism for this. You have to go in through the low
level listener interface.
We could consider exposing the JobProgressListener directly - I think it's
been factored nicely so it's fairly decoupled from the UI. The concern is
this is a semi-internal piece of
Hi Philip,
In the upcoming release of Spark 1.0 there will be a feature that provides
for exactly what you describe: capturing the information displayed on the
UI in JSON. More details will be provided in the documentation, but for
now, anything before 0.9.1 can only go through JobLogger.scala,
Watch out when loading data from gzipped files. Spark cannot parallelize
the load of gzipped files, and if you do not explicitly repartition your
RDD created from such a file, everything you do on that RDD will run on a
single core.
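Concretely, something like this (a sketch; the file path and partition count are placeholders):
val raw = sc.textFile("data/big-file.gz")   // arrives as a single partition, regardless of minSplits
val spread = raw.repartition(32)            // one shuffle, after which later stages use 32 partitions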
On Wed, Apr 2, 2014 at 8:22 PM, K Koh den...@gmail.com wrote:
Targeting 0.9.0 should work out of the box (just a change to the build.sbt)
- I'll push some changes I've been sitting on to the public repo in the
next couple of days.
On Wed, Apr 2, 2014 at 4:05 AM, Krakna H shankark+...@gmail.com wrote:
Thanks for the update Evan! In terms of using MLI, I
I would suggest starting with cloud hosting if you can; depending on your
use case, memory requirements may vary a lot.
Regards
Mayur
On Apr 2, 2014 3:59 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hey Steve,
This configuration sounds pretty good. The one thing I would consider is
having
Hi Matei,
How can I run multiple Spark workers per node? I am running an 8-core, 10-node
cluster but I do have 8 more cores on each node. So having 2 workers per
node will definitely help my use case.
Thanks.
Deb
On Wed, Apr 2, 2014 at 3:58 PM, Matei Zaharia matei.zaha...@gmail.comwrote:
Hey
Hi,
Shall I send my questions to this Email address?
Sorry for bothering, and thanks a lot!
Yes, please do. :)
On Wed, Apr 2, 2014 at 7:36 PM, weida xu xwd0...@gmail.com wrote:
Hi,
Shall I send my questions to this Email address?
Sorry for bothering, and thanks a lot!
Hi,
We are applying business logic to an incoming data stream using Spark
Streaming. Here I want to point a Shark table at the data coming from Spark
Streaming. Instead of storing the Spark Streaming output to HDFS or another
area, is there a way I can directly point a Shark in-memory table to take data
from Spark
Hi,
I'm trying to run a script in Shark (0.8.1): insert into emp (id,name)
values (212,Abhi), but it doesn't work.
I urgently need direct insert as it is a show stopper.
I know that we can do insert into emp select * from xyz.
The requirement here is direct insert.
Has anyone tried it? Or is there
Hi,
I have a small program but I cannot seem to make it connect using the right
properties of the cluster.
I have the SPARK_YARN_APP_JAR, SPARK_JAR and SPARK_HOME set properly.
If I run this scala file, I am seeing that this is never using the
yarn.resourcemanager.address property that I set
I deployed Mesos and tested it using the example/test-framework script; Mesos
seems OK. But when running Spark on the Mesos cluster, the Mesos slave nodes
report the following exception. Can anyone help me fix this? Thanks in
advance:
14/04/03 11:24:39 INFO Slf4jLogger: Slf4jLogger started
14/04/03
any advice ?
2014-04-03 11:35 GMT+08:00 felix cnwe...@gmail.com:
I deployed Mesos and tested it using the example/test-framework script,
Mesos seems OK. But when running Spark on the Mesos cluster, the Mesos slave
nodes report the following exception, can anyone help me fix this?
thanks
I think this is related to a known issue (regression) in 0.9.0. Try using an
explicit IP other than loopback.
Sent from a mobile device
On Apr 2, 2014, at 8:53 PM, panfei cnwe...@gmail.com wrote:
any advice ?
2014-04-03 11:35 GMT+08:00 felix cnwe...@gmail.com:
I deployed mesos and test
For various SchemaRDD functions like select, where, orderBy, groupBy, etc., I
would like to create expression objects and pass these to the methods for
execution.
Can someone show some examples of how to create expressions for a case class
and execute them? E.g., how to create expressions for select,
After upgrading to 0.9.1, everything works well now. Thanks for the reply.
2014-04-03 13:47 GMT+08:00 andy petrella andy.petre...@gmail.com:
Hello,
It's indeed due to a known bug, but using another IP for the driver won't
be enough (other problems will pop up).
An easy solution would be to
Hi, all
When I start Spark in the shell, it automatically outputs some system info
every minute, see below. Can I stop or block the output of this info? I
tried the :silent command, but the automatic output remains.
14/04/03 19:34:30 INFO MetadataCleaner: Ran metadata cleaner for
Hi,
I know if we call persist with the right options, we can have Spark persist
an RDD's data on disk.
I am wondering what happens in intermediate operations that could
conceivably create large collections/Sequences, like GroupBy and shuffling.
Basically, one part of the question is when is
You can find here a gist that illustrates this issue
https://gist.github.com/jrabary/9953562
I got this with spark from master branch.
On Sat, Mar 29, 2014 at 7:12 PM, Andrew Ash and...@andrewash.com wrote:
Is this Spark 0.9.0? Try setting spark.shuffle.spill=false. There was a
hash collision
We use avro objects in our project, and have a Kryo serializer for generic
Avro SpecificRecords. Take a look at:
https://github.com/bigdatagenomics/adam/blob/master/adam-core/src/main/scala/edu/berkeley/cs/amplab/adam/serialization/ADAMKryoRegistrator.scala
Also, Matt Massie has a good blog post
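The general wiring for a custom Kryo registrator looks roughly like this (a generic sketch, not the ADAM code linked above; MyRecord, MyRegistrator and the package name are placeholders):
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

case class MyRecord(id: Long, name: String)   // stand-in for your Avro/record classes

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyRecord])          // register each class Kryo should handle
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "my.package.MyRegistrator")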
This is great news, thanks for the update! I will either wait for the
1.0 release or go and test it ahead of time from git rather than trying
to pull it out of JobLogger or creating my own SparkListener.
On 04/02/2014 06:48 PM, Andrew Or wrote:
Hi Philip,
In the upcoming release of Spark
I can appreciate the reluctance to expose something like the
JobProgressListener as a public interface. It's exactly the sort of
thing that you want to deprecate as soon as something better comes along
and can be a real pain when trying to maintain the level of backwards
compatibility that
Indeed, it's how mesos works actually. So the tarball just has to be
somewhere accessible by the mesos slaves. That's why it is often put in
hdfs.
On 3 Apr 2014 18:46, felix cnwe...@gmail.com wrote:
So, if I set this parameter, there is no need to copy the spark tarball to
every mesos