Hey,
In
https://spark.incubator.apache.org/docs/0.8.1/configuration.html#configuring-logging
It states:
Spark uses log4j (http://logging.apache.org/log4j/) for logging. You can
configure it by adding a log4j.properties file in the conf directory. One
way to start is to copy the existing
Hi.
I'm using Spark 0.8.0 and launching an AWS cluster using the spark-ec2 script.
Typically I can launch spark-shell with no problem. However if I add
protobuf-java-2.5.0.jar to my spark-classpath, I am not able to launch
Spark - the workers fail to connect and spark-shell quits.
If I do not add
On Fri, Jan 3, 2014 at 12:11 PM, Shay Seng s...@1618labs.com wrote:
Hi.
I'm using Spark 0.8.0 and launching an AWS cluster using the spark-ec2 script.
Typically I can launch spark-shell with no problem. However if I add
protobuf-java-2.5.0.jar to my spark-classpath, I am not able to launch
I've been using JRI to communicate with R from Spark, with some utils to
convert from Scala data types into R datatypes/dataframes etc.
http://www.rforge.net/JRI/
I've been using mapPartitions to push R closures through JRI and collect
the results back in Spark. This works reasonably well, though
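For anyone curious, a minimal sketch of that pattern (the RDD contents and
the R expression are illustrative; it assumes R, JRI, and its native library
are installed on every worker, and respects JRI's one-Rengine-per-JVM rule):

import org.apache.spark.SparkContext
import org.rosuda.JRI.Rengine

object JriSketch {
  // JRI allows only a single Rengine per JVM, so create it once per
  // executor and reuse it across partitions.
  def rEngine(): Rengine = {
    val existing = Rengine.getMainEngine()
    if (existing != null) existing
    else new Rengine(Array("--vanilla"), false, null)
  }

  def main(args: Array[String]) {
    val sc = new SparkContext("local", "jri-sketch")
    val xs = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), 2)
    val doubled = xs.mapPartitions { it =>
      val r = rEngine()
      it.map { x =>
        r.assign("x", Array(x))     // push the Scala value into R
        r.eval("x * 2").asDouble()  // run an R expression, pull the result back
      }
    }
    println(doubled.collect().mkString(", "))
    sc.stop()
  }
}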
Hi,
Is there some way to get R-style data.frame structures into RDDs? I've
been using RDD[Seq[]] but this is getting quite error-prone, and the code
gets pretty hard to read, especially after a few joins, maps, etc.
Rather than access columns by index, I would prefer to access them by name.
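In the meantime, one workaround is to model rows as a case class so columns
are accessed by name and checked by the compiler; a sketch with made-up
field names:

import org.apache.spark.SparkContext

// Illustrative schema; the point is writing t.userId instead of row(0).
case class Trip(userId: Int, day: Int, distance: Double)

object NamedColumns {
  def main(args: Array[String]) {
    val sc = new SparkContext("local", "named-columns")
    val raw = sc.parallelize(Seq("1,5,12.3", "2,6,8.0"))
    val trips = raw.map { line =>
      val f = line.split(",")
      Trip(f(0).toInt, f(1).toInt, f(2).toDouble)
    }
    // Keying on a named field stays readable after a few joins/maps.
    val distByUser = trips.map(t => (t.userId, t.distance))
    println(distByUser.collect().mkString(", "))
    sc.stop()
  }
}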
on that in-memory table structure.
We're planning to harmonize that with the MLBase work in the near future.
Just a matter of prioritization on limited resources. If there's enough
interest we'll accelerate that.
Sent while mobile. Pls excuse typos etc.
On Nov 16, 2013 1:11 AM, Shay Seng s
Hi,
Just wondering what people suggest for joining 2 RDDs of very different
sizes
I have a sequence of map reduce that will in the end yield me an RDD ~ 500MB
- 800MB that typically has a couple hundred partitions.
After that I want to join that RDD with 2 smaller RDDs, 1 will be 50MB
on a
cellphone so I'm not sure why RDDs are involved.
On Thu, Nov 14, 2013 at 11:14 AM, Shay Seng s...@1618labs.com wrote:
Hi,
Just wondering what people suggest for joining 2 RDDs of very
different
sizes
I have a sequence of map reduce that will in the end yield me an RDD ~
500MB
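One commonly suggested pattern for this kind of size imbalance is a map-side
join: collect the small RDD on the driver, broadcast it, and look keys up
directly, so the ~500MB-800MB side is never shuffled. A sketch, assuming
hypothetical pair RDDs bigRdd: RDD[(K, V)] and smallRdd: RDD[(K, W)], and
that the small side (the ~50MB one) fits comfortably in memory:

val smallMap = sc.broadcast(smallRdd.collect().toMap)

val joined = bigRdd.flatMap { case (k, v) =>
  // Emit a pair only when the key exists on the small side,
  // i.e. an inner join without a shuffle.
  smallMap.value.get(k).map(w => (k, (v, w)))
}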
It seems that I need to have the log4j.properties file in the current
directory.
So if I launch spark-shell in spark/conf I see that INFO is not displayed.
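For reference, a minimal log4j.properties along the lines of the template
Spark ships (conf/log4j.properties.template); setting the root level to
WARN is what suppresses the INFO lines:

# Log everything at WARN and above to the console.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n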
On Thu, Nov 7, 2013 at 2:16 PM, Shay Seng s...@1618labs.com wrote:
When is the log4j.properties file read... and how can I verify
available as sc.
Type in expressions to have them evaluated.
Type :help for more information.
scala> sc.parallelize(1 to 10, 2).count
res0: Long = 10
On Tue, Nov 5, 2013 at 2:36 PM, Shay Seng s...@1618labs.com wrote:
Hi,
I added a log4j.properties file in spark/conf
more ./spark/conf
Hi,
I'm having a problem getting a piece of code that runs in the REPL
to compile:
val aDay = day.map( n =>
...
((aInt,bInt),(cInt,dInt,eDbl,fInt,gDbl))
)
val seg = segments.map( n =>
...
((aInt,bInt), (..))
)
val allSegs = aDay.join(seg)
error: value join is not a member of
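A likely culprit, worth checking first: spark-shell imports the pair-RDD
implicits automatically, but compiled code has to do it explicitly,
otherwise join (which lives in PairRDDFunctions) isn't visible on an RDD
of tuples:

import org.apache.spark.SparkContext._  // implicit conversion to PairRDDFunctions

// With the implicits in scope, join compiles on RDDs keyed by (aInt, bInt).
val allSegs = aDay.join(seg)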
What's the recommended way to save an RDD as a CSV on, say, HDFS?
Do I have to collect the RDD and save it from the master, or is there
some way I can write out the CSV file in parallel to HDFS?
tks
shay
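For what it's worth, the usual answer is that no collect is needed: map
each row to a comma-joined string and call saveAsTextFile with an HDFS
path, which writes one part-file per partition in parallel. A sketch with
made-up fields (note it does no quoting, so fields containing commas need
more care):

val rows = sc.parallelize(Seq((1, "a", 2.0), (2, "b", 3.5)))
rows
  .map { case (id, name, score) => List(id, name, score).mkString(",") }
  .saveAsTextFile("hdfs:///tmp/rows-csv")  // one part-NNNNN file per partition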
Hey,
I'm seeing a funny situation where a piece of code executes in a pure Scala
REPL but not in spark-shell.
I'm using Scala 2.9.3 with Spark 0.8.0
In Spark I see:
class Animal() {
def says():String = ???
}
val animal = new Animal
animal: this.Animal = Animal@df27cd5
class Zoo[A :
Hi,
I would like to store some data as a seq of protobuf objects. I would of
course need to be able to read that into an RDD and write the RDD back out
in some binary format.
First of all, is this supported natively (or through some download)?
If not, are there examples on how I might write my
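I don't believe protobuf is supported out of the box; one hedged sketch is
to serialize each message to bytes yourself and keep them in a Hadoop
SequenceFile (MyProto below stands in for a protoc-generated message class,
and the path is made up):

import java.util.Arrays
import org.apache.hadoop.io.{BytesWritable, NullWritable}
import org.apache.spark.SparkContext._

// Write: one BytesWritable per serialized message.
records  // records: RDD[MyProto] (hypothetical)
  .map(m => (NullWritable.get(), new BytesWritable(m.toByteArray)))
  .saveAsSequenceFile("hdfs:///data/protos")

// Read: re-parse the bytes, trimming the BytesWritable's padded buffer.
val restored = sc.sequenceFile("hdfs:///data/protos",
    classOf[NullWritable], classOf[BytesWritable])
  .map { case (_, bw) => MyProto.parseFrom(Arrays.copyOf(bw.getBytes, bw.getLength)) }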
Hi,
I've been trying to use the spark-ec2 launch scripts and have some
comments on them, not sure if this is the best place to post ...
(1) On the AMI image, most of the modules' init.sh files have the following
idiom:
if [ -d spark ]; then
  echo "Spark seems to be installed. Exiting."
  exit 0
fi
Inlined.
On Wed, Oct 2, 2013 at 1:00 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi Shangyu,
(1) When we read in a local file by SparkContext.textFile and do some
map/reduce job on it, how will Spark decide which worker node to send the
data to? Will the data be divided/partitioned equally
called on a local file will magically turn that local
file into a distributed file and allow more than just the node where the
file is local to process that file.
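The practical upshot (0.8-era API): the path handed to textFile must be
readable at the same location from every worker (HDFS or a shared
filesystem), and the optional second argument (minSplits) only controls how
many partitions the file is sliced into, not where the data lives:

// The file must be visible at this path on every node;
// the second argument asks for at least 8 partitions.
val lines = sc.textFile("hdfs:///data/input.txt", 8)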
On Thu, Oct 3, 2013 at 11:05 AM, Shay Seng s...@1618labs.com wrote:
Inlined.
On Wed, Oct 2, 2013 at 1:00 PM, Matei Zaharia
PM, Mark Hamstra m...@clearstorydata.com wrote:
But the worker has to be on a node that has local access to the file.
On Thu, Oct 3, 2013 at 12:30 PM, Shay Seng s...@1618labs.com wrote:
Ok, even if my understanding of allowLocal is incorrect, nevertheless
(1) I'm loading a local file
(2
can ssh to the worker machine, and look at the work folder in Spark.
--
Reynold Xin, AMPLab, UC Berkeley
http://rxin.org
On Sat, Sep 21, 2013 at 12:30 PM, Shay Seng s...@1618labs.com wrote:
Hey,
I've been struggling to set up a workflow with Spark. I'm basically
using the AMI
Hey all.
I've been getting OutOfMemory Java errors.
13/09/20 18:54:37 ERROR actor.ActorSystemImpl: Uncaught error from thread
[spark-akka.actor.default-dispatcher-5] shutting down JVM since
'akka.jvm-exit-on-fatal-error' is enabled
java.lang.OutOfMemoryError: Java heap space
at
Please ignore this, I was being dumb - a mixture of a typo and
mis(not)reading the docs.
On Fri, Sep 20, 2013 at 12:04 PM, Shay Seng s...@1618labs.com wrote:
Hey all.
I've been getting OutOfMemory Java errors.
13/09/20 18:54:37 ERROR actor.ActorSystemImpl: Uncaught error from thread
[spark