Hey Matt,
This setting shouldn’t really affect groupBy operations, because they don’t go
through Akka. The frame size setting applies to messages from the master to the
workers (specifically, sending out tasks) and to results that go directly from
the workers to the application (e.g. collect()).
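For reference, a sketch of raising that limit in the 0.8-era API, where settings travel as Java system properties set before the SparkContext is created (the value and app name here are arbitrary; the unit is MB, and the default at the time was 10):

```scala
import org.apache.spark.SparkContext

// Must be set before the SparkContext is constructed, or it has no effect.
System.setProperty("spark.akka.frameSize", "64") // in MB

val sc = new SparkContext("local", "FrameSizeExample")
```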
I’m not sure you can have a star inside that quoted classpath argument (the
double quotes may cancel the *). Try referencing the JAR by its full name, or
link to Spark through Maven
(http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-java).
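For the Maven route, the dependency from the quick-start guide looks like the following (the Scala suffix and version here match the 0.8.1-incubating release this thread discusses; check the guide for the current ones):

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.9.3</artifactId>
  <version>0.8.1-incubating</version>
</dependency>
```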
Matei
On Dec 6, 2013,
Hi Kenneth,
1. Is Spark suited for online learning algorithms? From what I’ve read
so far (mainly from this slide), it seems not, but I could be wrong.
You can probably use Spark Streaming
(http://spark.incubator.apache.org/docs/latest/streaming-programming-guide.html)
to implement
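A minimal sketch against the 0.8-era Spark Streaming API: the socket source, checkpoint path, and toy per-key running count (standing in for a real online model update) are my assumptions, not from the thread.

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.StreamingContext._ // pair-DStream operations

val ssc = new StreamingContext("local[2]", "OnlineUpdates", Seconds(5))
ssc.checkpoint("/tmp/streaming-checkpoint") // required by updateStateByKey

// One observation per line from a hypothetical socket source.
val lines = ssc.socketTextStream("localhost", 9999)

// Toy "online model": a running count per feature, folded in batch by batch.
val model = lines.flatMap(_.split(" ")).map(x => (x, 1))
  .updateStateByKey[Int] { (batch: Seq[Int], state: Option[Int]) =>
    Some(state.getOrElse(0) + batch.sum)
  }
model.print()

ssc.start()
```

The same updateStateByKey pattern generalizes: the state can be model parameters instead of a count, updated from each batch's observations.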
As I said, it should not affect performance of transformations on RDDs, only of
sending tasks to the workers and getting results back. In general, you want the
Akka frame size to be as small as possible while still holding your largest
task or result; as long as your application isn’t throwing
OK, that’s clear.
But what about collect() and collectAsMap()? Is it possible that Spark
throws a 'java heap space' error or a 'communication error' because
spark.akka.frameSize is too small? Currently I set it to 1024.
Thank you!
Best,
Shangyu
2013/12/8 Matei Zaharia matei.zaha...@gmail.com
Hi all,
Sorry for posting this again, but I am interested in finding out what different
on-disk data formats people use for storing timeline event and analytics
aggregate data. Currently I am just using newline-delimited JSON in gzipped
files. I was wondering if there were any recommendations.
-- Ankur
LZO compression at a minimum, and Parquet as a second step, seems like
the way to go, though I haven't tried either personally yet.
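As a data point on the current approach: sc.textFile reads gzipped files transparently, but gzip is not splittable, so each file becomes a single task, which is one reason indexed LZO gets recommended. A sketch (the path and the parseJson helper are hypothetical):

```scala
// .gz files are decompressed transparently, but each one maps to a
// single, unsplittable task -- one reason to prefer indexed LZO or Parquet.
val events = sc.textFile("hdfs:///events/*.json.gz")

// parseJson stands in for whatever JSON parser you use per line.
val parsed = events.map(line => parseJson(line))
```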
Sent from my mobile phone
On Dec 8, 2013, at 16:54, Ankur Chauhan achau...@brightcove.com wrote:
Any thoughts here? I still cannot compile Spark using Maven; thanks for any
input.
On 2013-12-07 2:31 PM, Azuryy Yu azury...@gmail.com wrote:
Hey dears,
Can you give me a Maven repo so I can compile Spark with Maven?
I'm currently using http://repo1.maven.org/maven2/,
but it complains
Yeah, maybe you have weird versions of something published locally. Try
deleting your ~/.m2 and ~/.ivy2 directories and redoing the build.
Unfortunately this will take a while to re-download stuff, but it should work
out.
Matei
On Dec 8, 2013, at 5:21 PM, Mark Hamstra m...@clearstorydata.com
I did not check out from the repository; I downloaded the source package and built it.
On 2013-12-09 9:22 AM, Mark Hamstra m...@clearstorydata.com wrote:
I don't believe that is true of the Spark 0.8.1 code. I just got done
building Spark from the v0.8.1-incubating tag after first removing anything
to do
Hi Mark,
I built the current release candidate,
and it complained during the build:
Downloading:
http://repo1.maven.org/maven2/com/typesafe/akka/akka-actor/2.0.5/akka-actor-2.0.5.pom
[WARNING] The POM for com.typesafe.akka:akka-actor:jar:2.0.5 is missing, no
dependency information available
Downloading:
@Mark,
It works now after I changed settings.xml, but it would be better to improve
the Spark documentation a little in the section on Building Spark with Maven
(http://spark.incubator.apache.org/docs/latest/building-with-maven.html).
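For reference, the change that usually fixes the missing akka-actor 2.0.5 POM is adding the Typesafe repository, since I believe Akka 2.0.x was not published to Maven Central. A hypothetical ~/.m2/settings.xml fragment (the profile and repository ids are my own naming):

```xml
<settings>
  <profiles>
    <profile>
      <id>typesafe</id>
      <repositories>
        <repository>
          <id>typesafe-releases</id>
          <url>http://repo.typesafe.com/typesafe/releases/</url>
        </repository>
      </repositories>
    </profile>
  </profiles>
  <activeProfiles>
    <activeProfile>typesafe</activeProfile>
  </activeProfiles>
</settings>
```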
On Mon, Dec 9, 2013 at 10:45 AM, Azuryy Yu azury...@gmail.com wrote:
Hi Patrick,
I agree this is a very open-ended question, but I was trying to get a general
answer anyway, and I think you did hint at some nuances.
1. My workload is definitely bottlenecked by disk I/O, just because even with a
projection onto a single column (mostly 2-3 out of 20) there is a lot of
Try to see if that dependency comes in transitively by running
mvn dependency:tree.
Rajika
On Sat, Dec 7, 2013 at 1:31 AM, Azuryy Yu azury...@gmail.com wrote:
Parquet might be a good fit for you then... it's pretty new and I
don't have a lot of direct experience working with it, but I've seen
examples of people using Spark with Parquet. You might want to
check out Matt Massie's post here:
http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/
Thanks for sharing.
On 2013-12-09 11:50 AM, Patrick Wendell pwend...@gmail.com wrote:
Hi,
When I did
sc.sequenceFile(file, classOf[Text],
classOf[Text]).flatMap(map_func).count()
It gave me result of 365.
However, when I did
sc.sequenceFile(file, classOf[Text],
classOf[Text]).flatMap(map_func).sortByKey().count(),
It threw java.io.NotSerializableException for Key Class returned
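One common fix is to convert the Hadoop Writables into plain Scala types before the shuffle that sortByKey triggers. A sketch, with a plain map standing in for the unseen map_func:

```scala
import org.apache.hadoop.io.Text
import org.apache.spark.SparkContext._ // implicits that add sortByKey to pair RDDs

// Text is not java.io.Serializable, and Hadoop also reuses Writable
// instances across records, so copying the contents out into Strings
// solves both problems before the shuffle.
val sorted = sc.sequenceFile(file, classOf[Text], classOf[Text])
  .map { case (k, v) => (k.toString, v.toString) }
  .sortByKey()
println(sorted.count())
```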
Also note that when you add entries to the -cp flag on the JVM and want
to include multiple JARs, the only way to do that is to include an entire
directory with dir/* -- you can't use dir/*.jar or dir/spark*.jar or
anything else like that.
I did make the classes serializable, but now running the same command,
sc.sequenceFile(file, classOf[Text], classOf[Text]).flatMap(map_func).sortByKey().count(),
gives me java.lang.NoSuchMethodError.
The Collection class which I made serializable accesses one static
variable that
And since sortByKey serializes the classes, I guess it has something to do
with serialization.
On Mon, Dec 9, 2013 at 11:19 AM, Archit Thakur archit279tha...@gmail.comwrote:
Hi Nick,
Yeah, I saw that. I actually used sc.sequenceFile to load the data from
HDFS into an RDD. Also, both my key class and value class implement
Hadoop's WritableComparable. Still I got java.io.NotSerializableException
when I used sortByKey.
Hierarchy of my classes:
Collection