Re: CDH 5.0 and Spark 0.9.0
Hello Sean,

Thanks a bunch. I am not currently running in HA mode. The configuration is
identical to our CDH4 setup, which works perfectly fine. It's really strange
that only Spark breaks with this enabled.

On Thu, May 1, 2014 at 3:06 AM, Sean Owen wrote:
> This codec does require native libraries to be installed, IIRC, but
> they are installed with CDH 5.
>
> The error you show does not look related, though. Are you sure your HA
> setup is working and that you have configured it correctly in whatever
> config Spark is seeing?
> --
> Sean Owen | Director, Data Science | London
>
>
> On Thu, May 1, 2014 at 12:44 AM, Paul Schooss wrote:
> > Hello,
> >
> > So I was unable to run the following commands from the spark shell with
> > CDH 5.0 and Spark 0.9.0; see below.
> >
> > Once I removed the property
> >
> >   <property>
> >     <name>io.compression.codec.lzo.class</name>
> >     <value>com.hadoop.compression.lzo.LzoCodec</value>
> >     <final>true</final>
> >   </property>
> >
> > from the core-site.xml on the node, the Spark commands worked. Is there a
> > specific setup I am missing?
> >
> > scala> var log = sc.textFile("hdfs://jobs-ab-hnn1//input/core-site.xml")
> > 14/04/30 22:43:16 INFO MemoryStore: ensureFreeSpace(78800) called with
> > curMem=150115, maxMem=308713881
> > 14/04/30 22:43:16 INFO MemoryStore: Block broadcast_1 stored as values to
> > memory (estimated size 77.0 KB, free 294.2 MB)
> > 14/04/30 22:43:16 WARN Configuration: mapred-default.xml:an attempt to
> > override final parameter: mapreduce.tasktracker.cache.local.size; Ignoring.
> > 14/04/30 22:43:16 WARN Configuration: yarn-site.xml:an attempt to override
> > final parameter: mapreduce.output.fileoutputformat.compress.type; Ignoring.
> > 14/04/30 22:43:16 WARN Configuration: hdfs-site.xml:an attempt to override
> > final parameter: mapreduce.map.output.compress.codec; Ignoring.
> > log: org.apache.spark.rdd.RDD[String] = MappedRDD[3] at textFile at
> > <console>:12
> >
> > scala> log.count()
> > 14/04/30 22:43:03 WARN JobConf: The variable mapred.child.ulimit is no
> > longer used.
> > 14/04/30 22:43:04 WARN Configuration: mapred-default.xml:an attempt to
> > override final parameter: mapreduce.tasktracker.cache.local.size; Ignoring.
> > 14/04/30 22:43:04 WARN Configuration: yarn-site.xml:an attempt to override
> > final parameter: mapreduce.output.fileoutputformat.compress.type; Ignoring.
> > 14/04/30 22:43:04 WARN Configuration: hdfs-site.xml:an attempt to override
> > final parameter: mapreduce.map.output.compress.codec; Ignoring.
> > java.lang.IllegalArgumentException: java.net.UnknownHostException: jobs-a-hnn1
> >     at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
> >     at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
> >     at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:576)
> >     at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:521)
> >     at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
> >     at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
> >     at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
> >     at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
> >     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
> >     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
> >     at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
> >     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
> >     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
> >     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> >     at scala.Option.getOrElse(Option.scala:120)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
> >     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
> >     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
> >     at scala.Option.getOrElse(Option.scala:120)
> >     at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
> >     at org.apache.spark.SparkContext.runJob(SparkContext.scala:902)
> >     at org.apache.spark.rdd.RDD.count(RDD.scala:720)
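[Editor's note] The fix in this thread was removing a `final` property from core-site.xml — it is the `<final>true</final>` marking that makes the "attempt to override final parameter … Ignoring" warnings appear. As a minimal illustrative sketch (not part of any Hadoop tooling), a named property and its final flag can be pulled out of a Hadoop `*-site.xml` document like this:

```python
import xml.etree.ElementTree as ET

def get_hadoop_property(xml_text, name):
    """Return (value, is_final) for a named property in a Hadoop *-site.xml
    document, or None if the property is absent. A final=true property is the
    kind that later configs cannot override."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value = prop.findtext("value")
            is_final = prop.findtext("final") == "true"
            return value, is_final
    return None

# The property quoted in the email above.
core_site = """<configuration>
  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <final>true</final>
  </property>
</configuration>"""

print(get_hadoop_property(core_site, "io.compression.codec.lzo.class"))
```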
CDH 5.0 and Spark 0.9.0
Hello,

So I was unable to run the following commands from the spark shell with CDH
5.0 and Spark 0.9.0; see below.

Once I removed the property

  <property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
    <final>true</final>
  </property>

from the core-site.xml on the node, the Spark commands worked. Is there a
specific setup I am missing?

scala> var log = sc.textFile("hdfs://jobs-ab-hnn1//input/core-site.xml")
14/04/30 22:43:16 INFO MemoryStore: ensureFreeSpace(78800) called with curMem=150115, maxMem=308713881
14/04/30 22:43:16 INFO MemoryStore: Block broadcast_1 stored as values to memory (estimated size 77.0 KB, free 294.2 MB)
14/04/30 22:43:16 WARN Configuration: mapred-default.xml:an attempt to override final parameter: mapreduce.tasktracker.cache.local.size; Ignoring.
14/04/30 22:43:16 WARN Configuration: yarn-site.xml:an attempt to override final parameter: mapreduce.output.fileoutputformat.compress.type; Ignoring.
14/04/30 22:43:16 WARN Configuration: hdfs-site.xml:an attempt to override final parameter: mapreduce.map.output.compress.codec; Ignoring.
log: org.apache.spark.rdd.RDD[String] = MappedRDD[3] at textFile at <console>:12

scala> log.count()
14/04/30 22:43:03 WARN JobConf: The variable mapred.child.ulimit is no longer used.
14/04/30 22:43:04 WARN Configuration: mapred-default.xml:an attempt to override final parameter: mapreduce.tasktracker.cache.local.size; Ignoring.
14/04/30 22:43:04 WARN Configuration: yarn-site.xml:an attempt to override final parameter: mapreduce.output.fileoutputformat.compress.type; Ignoring.
14/04/30 22:43:04 WARN Configuration: hdfs-site.xml:an attempt to override final parameter: mapreduce.map.output.compress.codec; Ignoring.
java.lang.IllegalArgumentException: java.net.UnknownHostException: jobs-a-hnn1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:377)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:237)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:141)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:576)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:521)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:146)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2397)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:902)
    at org.apache.spark.rdd.RDD.count(RDD.scala:720)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
    at $iwC$$iwC$$iwC.<init>(<console>:20)
    at $iwC$$iwC.<init>(<console>:22)
    at $iwC.<init>(<console>:24)
    at <init>(<console>:26)
    at .<init>(<console>:30)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:795)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:840)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:752)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:600)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:607)
    at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:610)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:935)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
    at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:883)
    at scala.tools.nsc.util.ScalaC
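[Editor's note] The `UnknownHostException: jobs-a-hnn1` means the HDFS client could not resolve that name — note it differs from the `jobs-ab-hnn1` used in the `textFile` URI, which is consistent with Sean's suggestion that the HA/nameservice config Spark sees is wrong (an HA nameservice ID will also fail this way unless hdfs-site.xml defines it). A quick, hedged sketch for checking plain DNS/hosts resolution from Python; the failing hostname here is a placeholder:

```python
import socket

def check_resolvable(hostnames):
    """Map each hostname to True if it resolves via DNS or /etc/hosts,
    False otherwise. A JVM UnknownHostException usually means the same
    lookup fails at this level, or the name is an HA nameservice ID that
    only hdfs-site.xml (dfs.nameservices et al.) can resolve."""
    results = {}
    for host in hostnames:
        try:
            socket.gethostbyname(host)
            results[host] = True
        except socket.gaierror:
            results[host] = False
    return results

# "localhost" should resolve anywhere; the reserved .invalid TLD should not.
print(check_resolvable(["localhost", "no-such-host.invalid"]))
```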
Re: JMX with Spark
Hello Folks,

Sorry for the delay; these emails got missed due to the volume. Here is my
metrics.conf:

root@jobs-ab-hdn4:~# cat /opt/klout/spark/conf/metrics.conf

# syntax: [instance].sink|source.[name].[options]=[value]

# This file configures Spark's internal metrics system. The metrics system is
# divided into instances which correspond to internal components.
# Each instance can be configured to report its metrics to one or more sinks.
# Accepted values for [instance] are "master", "worker", "executor", "driver",
# and "applications". A wild card "*" can be used as an instance name, in
# which case all instances will inherit the supplied property.
#
# Within an instance, a "source" specifies a particular set of grouped metrics.
# There are two kinds of sources:
#   1. Spark internal sources, like MasterSource, WorkerSource, etc, which will
#      collect a Spark component's internal state. Each instance is paired with a
#      Spark source that is added automatically.
#   2. Common sources, like JvmSource, which will collect low level state.
#      These can be added through configuration options and are then loaded
#      using reflection.
#
# A "sink" specifies where metrics are delivered to. Each instance can be
# assigned one or more sinks.
#
# The sink|source field specifies whether the property relates to a sink or
# source.
#
# The [name] field specifies the name of source or sink.
#
# The [options] field is the specific property of this source or sink. The
# source or sink is responsible for parsing this property.
#
# Notes:
#   1. To add a new sink, set the "class" option to a fully qualified class
#      name (see examples below).
#   2. Some sinks involve a polling period. The minimum allowed polling period
#      is 1 second.
#   3. Wild card properties can be overridden by more specific properties.
#      For example, master.sink.console.period takes precedence over
#      *.sink.console.period.
#   4. A metrics specific configuration
#      "spark.metrics.conf=${SPARK_HOME}/conf/metrics.properties" should be
#      added to Java properties using -Dspark.metrics.conf=xxx if you want to
#      customize the metrics system. You can also put the file in
#      ${SPARK_HOME}/conf and it will be loaded automatically.
#   5. MetricsServlet is added by default as a sink in master, worker and
#      client driver; you can send an http request to "/metrics/json" to get a
#      snapshot of all the registered metrics in json format. For the master,
#      requests to "/metrics/master/json" and "/metrics/applications/json" can
#      be sent separately to get metrics snapshots of the master and
#      applications instances. MetricsServlet may not be configured by itself.

## List of available sinks and their properties.

# org.apache.spark.metrics.sink.ConsoleSink
#   Name:     Default:   Description:
#   period    10         Poll period
#   unit      seconds    Units of poll period

# org.apache.spark.metrics.sink.CSVSink
#   Name:      Default:   Description:
#   period     10         Poll period
#   unit       seconds    Units of poll period
#   directory  /tmp       Where to store CSV files

# org.apache.spark.metrics.sink.GangliaSink
#   Name:    Default:    Description:
#   host     NONE        Hostname or multicast group of Ganglia server
#   port     NONE        Port of Ganglia server(s)
#   period   10          Poll period
#   unit     seconds     Units of poll period
#   ttl      1           TTL of messages sent by Ganglia
#   mode     multicast   Ganglia network mode ('unicast' or 'multicast')

# org.apache.spark.metrics.sink.JmxSink

# org.apache.spark.metrics.sink.MetricsServlet
#   Name:    Default:   Description:
#   path     VARIES*    Path prefix from the web server root
#   sample   false      Whether to show entire set of samples for histograms
#                       ('false' or 'true')
#
#   * Default path is /metrics/json for all instances except the master. The
#     master has two paths:
#       /metrics/applications/json  # App information
#       /metrics/master/json        # Master information

# org.apache.spark.metrics.sink.GraphiteSink
#   Name:    Default:      Description:
#   host     NONE          Hostname of Graphite server
#   port     NONE          Port of Graphite server
#   period   10            Poll period
#   unit     seconds       Units of poll period
#   prefix   EMPTY STRING  Prefix to prepend to metric name

## Examples
# Enable JmxSink for all instances by class name
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

# Enable ConsoleSink for all instances by class name
#*.sink.console.class=org.apache.spark.metrics.sink.ConsoleSink

# Polling period for ConsoleSink
#*.sink.console.period=10
#*.sink.console.unit=seconds

# Master instance overlap polling period
#master.sink.console.period=15
#master.sink.console.unit=seconds

# Enable CsvSink for all instances
#*.sink.csv.class=org.apache.spark.metrics.sink.CsvSink

# Polling period for CsvSink
#*.sink.csv.period=1
#*.sink.csv.unit=minutes

# Polling directory for CsvSink
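[Editor's note] Note 3 in the file above — specific properties take precedence over wildcard ones — can be illustrated with a toy resolver. This is a sketch of the precedence rule only, not Spark's actual metrics-config implementation:

```python
def resolve_metric_property(props, instance, key):
    """Look up a sink/source property for an instance, letting a specific
    'instance.key' entry take precedence over a wildcard '*.key' entry,
    as note 3 of the metrics.conf comments describes."""
    specific = "%s.%s" % (instance, key)
    wildcard = "*.%s" % key
    if specific in props:
        return props[specific]
    return props.get(wildcard)

props = {
    "*.sink.console.period": "10",
    "master.sink.console.period": "15",
}

# The master has its own override; the worker falls back to the wildcard.
print(resolve_metric_property(props, "master", "sink.console.period"))  # 15
print(resolve_metric_property(props, "worker", "sink.console.period"))  # 10
```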
JMX with Spark
Has anyone got this working? I have enabled the properties for it in the
metrics.conf file and ensured that the file is placed under Spark's home
directory. Any ideas why I don't see Spark beans?
Re: Can't run a simple spark application with 0.9.1
I am a dork, please disregard this issue. I did not have the slaves correctly
configured. This error is very misleading.

On Tue, Apr 15, 2014 at 11:21 AM, Paul Schooss wrote:
> Hello,
>
> Currently I deploy Spark 0.9.1 using a new way of starting it up:
>
> exec start-stop-daemon --start --pidfile /var/run/spark.pid \
>   --make-pidfile --chuid ${SPARK_USER}:${SPARK_GROUP} --chdir ${SPARK_HOME} \
>   --exec /usr/bin/java -- -cp ${CLASSPATH} \
>   -Dcom.sun.management.jmxremote.authenticate=false \
>   -Dcom.sun.management.jmxremote.ssl=false \
>   -Dcom.sun.management.jmxremote.port=10111 \
>   -Dspark.akka.logLifecycleEvents=true -Djava.library.path= \
>   -XX:ReservedCodeCacheSize=512M -XX:+UseCodeCacheFlushing \
>   -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC \
>   -Dspark.executor.memory=10G -Xmx10g ${MAIN_CLASS} ${MAIN_CLASS_ARGS}
>
> where the classpath points to the Spark jar that we compile with sbt. When I
> try to run a job I receive the following warning:
>
> WARN TaskSchedulerImpl: Initial job has not accepted any resources; check
> your cluster UI to ensure that workers are registered and have sufficient
> memory
>
> My first question is: do I need the entire Spark project on disk in order
> to run jobs? Or what else am I doing wrong?
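[Editor's note] Since the root cause here was a misconfigured slaves file, it can help to sanity-check conf/slaves before starting the cluster. A small illustrative Python sketch, assuming the usual one-hostname-per-line format with '#' comments and blank lines ignored (the hostnames below are hypothetical):

```python
def parse_slaves(text):
    """Parse the contents of a conf/slaves file: one hostname per line,
    with blank lines and '#' comments ignored -- the convention the
    standard cluster launch scripts follow."""
    hosts = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            hosts.append(line)
    return hosts

example = """
# workers for the jobs cluster (hypothetical hostnames)
jobs-ab-hdn1
jobs-ab-hdn2

jobs-ab-hdn3  # rack 2
"""

print(parse_slaves(example))
```

Feeding each parsed hostname through a resolution check before launch would catch the kind of silent misconfiguration that produced the misleading "Initial job has not accepted any resources" warning.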
Can't run a simple spark application with 0.9.1
Hello,

Currently I deploy Spark 0.9.1 using a new way of starting it up:

exec start-stop-daemon --start --pidfile /var/run/spark.pid \
  --make-pidfile --chuid ${SPARK_USER}:${SPARK_GROUP} --chdir ${SPARK_HOME} \
  --exec /usr/bin/java -- -cp ${CLASSPATH} \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.port=10111 \
  -Dspark.akka.logLifecycleEvents=true -Djava.library.path= \
  -XX:ReservedCodeCacheSize=512M -XX:+UseCodeCacheFlushing \
  -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC \
  -Dspark.executor.memory=10G -Xmx10g ${MAIN_CLASS} ${MAIN_CLASS_ARGS}

where the classpath points to the Spark jar that we compile with sbt. When I
try to run a job I receive the following warning:

WARN TaskSchedulerImpl: Initial job has not accepted any resources; check
your cluster UI to ensure that workers are registered and have sufficient
memory

My first question is: do I need the entire Spark project on disk in order to
run jobs? Or what else am I doing wrong?
Re: Spark 0.9.1 - How to run bin/spark-class with my own hadoop jar files?
Andrew,

I ran into the same problem and eventually settled on just running the jars
directly with java. Since we use sbt to build our jars, we had all the
dependencies built into the jar itself, so no need for random classpaths.

On Tue, Mar 25, 2014 at 1:47 PM, Andrew Lee wrote:
> Hi All,
>
> I'm getting the following error when I execute start-master.sh, which also
> invokes spark-class at the end.
>
> Failed to find Spark assembly in /root/spark/assembly/target/scala-2.10/
> You need to build Spark with 'sbt/sbt assembly' before running this program.
>
> After digging into the code, I see the CLASSPATH is hardcoded with
> "spark-assembly.*hadoop.*.jar".
>
> In bin/spark-class:
>
> if [ ! -f "$FWDIR/RELEASE" ]; then
>   # Exit if the user hasn't compiled Spark
>   num_jars=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar" | wc -l)
>   jars_list=$(ls "$FWDIR"/assembly/target/scala-$SCALA_VERSION/ | grep "spark-assembly.*hadoop.*.jar")
>   if [ "$num_jars" -eq "0" ]; then
>     echo "Failed to find Spark assembly in $FWDIR/assembly/target/scala-$SCALA_VERSION/" >&2
>     echo "You need to build Spark with 'sbt/sbt assembly' before running this program." >&2
>     exit 1
>   fi
>   if [ "$num_jars" -gt "1" ]; then
>     echo "Found multiple Spark assembly jars in $FWDIR/assembly/target/scala-$SCALA_VERSION:" >&2
>     echo "$jars_list"
>     echo "Please remove all but one jar."
>     exit 1
>   fi
> fi
>
> Is there any reason why this is only grabbing spark-assembly.*hadoop.*.jar?
> I am trying to run Spark that links to my own version of Hadoop under
> /opt/hadoop23/, and I use 'sbt/sbt clean package' to build the package
> without the Hadoop jar. What is the correct way to link to my own Hadoop
> jar?
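[Editor's note] The shell check Andrew quotes can be mirrored in a few lines for experimentation. This is a hedged Python sketch of the same "exactly one assembly jar" logic, not Spark's code — the glob pattern only approximates the grep regex, and the demo directory and jar name are fabricated:

```python
import fnmatch
import os
import tempfile

def find_spark_assembly(assembly_dir):
    """Mimic bin/spark-class's check: exactly one spark-assembly*hadoop*.jar
    must be present in the directory, otherwise raise an error with the same
    kind of message the script prints."""
    try:
        names = os.listdir(assembly_dir)
    except OSError:
        names = []
    jars = [n for n in names if fnmatch.fnmatch(n, "spark-assembly*hadoop*.jar")]
    if len(jars) == 0:
        raise RuntimeError("Failed to find Spark assembly in %s" % assembly_dir)
    if len(jars) > 1:
        raise RuntimeError("Found multiple Spark assembly jars: %s" % ", ".join(jars))
    return os.path.join(assembly_dir, jars[0])

# Demo against a throwaway directory holding one fake assembly jar.
demo_dir = tempfile.mkdtemp()
open(os.path.join(demo_dir, "spark-assembly-0.9.1-hadoop2.2.0.jar"), "w").close()
print(find_spark_assembly(demo_dir))
```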
NoClassFound Errors with Streaming Twitter
Hello Folks,

We have a strange issue going on with a Spark standalone cluster in which a
simple test application is having a hard time using external classes. Here
are the details.

The application is located here: https://github.com/prantik/spark-example

We use classes such as Spark's streaming twitter and twitter4j to stream the
twitter hose. We use sbt to build the jar that we execute against the
cluster. We have verified that the jar contains these classes:

jar tf /opt/SimpleProject-assembly-1.1.jar | grep twitter4j/Status
twitter4j/Status.class

jar tf /opt/SimpleProject-assembly-1.1.jar | grep twitter/TwitterRec
org/apache/spark/streaming/twitter/TwitterReceiver$$anon$1.class
org/apache/spark/streaming/twitter/TwitterReceiver$$anonfun$onStart$1.class
org/apache/spark/streaming/twitter/TwitterReceiver$$anonfun$onStop$1.class
org/apache/spark/streaming/twitter/TwitterReceiver.class

Also, we ensure that even the slaves have classpaths set to reference these
libraries. ps auxww | grep spark on a slave shows:

/usr/bin/java -cp /opt/spark/external/twitter/target/spark-streaming-twitter_2.10-0.9.0-incubating.jar:/opt/spark/tools/target/spark-tools_2.10-0.9.0-incubating.jar:/opt/spark/assembly/target/scala-2.10/spark-assembly_2.10-0.9.0-incubating-hadoop2.2.0.jar

However, we encounter the following error when running the application on
the slaves:

14/03/13 21:20:42 INFO Executor: Running task ID 74
14/03/13 21:20:42 ERROR Executor: Exception in task ID 74
java.lang.ClassNotFoundException: twitter4j.Status

We don't know how to address the class not being found. Any ideas?

Regards,

Paul
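[Editor's note] Since a jar is just a zip archive, the `jar tf … | grep` verification above can also be scripted. A minimal illustrative Python sketch — the in-memory "jar" and its entry bytes are fabricated purely for the demo; note that a class being present in the jar does not guarantee the executor's classloader sees that jar, which is the usual cause of this kind of ClassNotFoundException:

```python
import io
import zipfile

def jar_contains_class(jar_bytes, class_name):
    """Check whether a jar (a zip archive) contains the .class entry for a
    fully qualified class name, e.g. 'twitter4j.Status' maps to the entry
    'twitter4j/Status.class'. Equivalent to `jar tf foo.jar | grep ...`."""
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        return entry in jar.namelist()

# Build a tiny in-memory "jar" with one dummy entry to demonstrate.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("twitter4j/Status.class", b"\xca\xfe\xba\xbe")

print(jar_contains_class(buf.getvalue(), "twitter4j.Status"))
print(jar_contains_class(buf.getvalue(), "twitter4j.StatusListener"))
```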
Re: Applications for Spark on HDFS
Thanks Sandy,

I have not taken advantage of that yet but will research how to invoke that
option when submitting the application to the Spark master. Currently I am
running a standalone Spark master and using the run-class script to invoke
the application we crafted as a test.

On Tue, Mar 11, 2014 at 5:09 PM, Sandy Ryza wrote:
> Hi Paul,
>
> What do you mean by distributing the jars manually? If you register jars
> that are local to the client with SparkContext.addJars, Spark should handle
> distributing them to the workers. Are you taking advantage of this?
>
> -Sandy
>
> On Tue, Mar 11, 2014 at 3:09 PM, Paul Schooss wrote:
>> Hello Folks,
>>
>> I was wondering if anyone had experience placing application jars for
>> Spark onto HDFS. Currently I am distributing the jars manually and would
>> love to source the jar via HDFS a la distributed caching with MR. Any
>> ideas?
>>
>> Regards,
>>
>> Paul
Applications for Spark on HDFS
Hello Folks,

I was wondering if anyone had experience placing application jars for Spark
onto HDFS. Currently I am distributing the jars manually and would love to
source the jar via HDFS a la distributed caching with MR. Any ideas?

Regards,

Paul