This is likely because hdfs's core-site.xml (or something similar) provides
an "fs.default.name" which changes the default FileSystem and Spark uses
the Hadoop FileSystem API to resolve paths. Anyway, your solution is
definitely a good one -- another would be to remove hdfs from Spark's
classpath i
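Roughly what happens under the hood: Spark asks Hadoop's Configuration
for the default FileSystem, and any bare path gets qualified against it.
A quick way to see which default was picked up, from the Spark shell
(just a sketch, assuming your version exposes sc.hadoopConfiguration;
the printed values depend entirely on your setup):

  import org.apache.hadoop.fs.{FileSystem, Path}

  // Spark's copy of the Hadoop configuration used for path resolution
  val hadoopConf = sc.hadoopConfiguration

  // With /etc/hadoop/conf on the classpath this is usually the NameNode
  // URI; without it, the local filesystem.
  println(hadoopConf.get("fs.default.name"))  // old property name
  println(hadoopConf.get("fs.defaultFS"))     // newer property name

  // How a bare path gets qualified against that default:
  val fs = FileSystem.get(hadoopConf)
  println(fs.makeQualified(new Path("/path/to/some/file")))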
Hi:
I believe I figured out the behavior here:
A file specified to SparkContext like this, '/path/to/some/file':
* Will be interpreted as 'hdfs://path/to/some/file', when settings for
HDFS are present in '/etc/hadoop/conf/*-site.xml'.
* Will be interpreted as 'file:///pa
Hi Alton:
Thanks for the reply. I just wanted to build/use it from scratch to get
a better intuition of what's happening.
Btw, using the binaries provided by Cloudera/CDH5 yielded the same issue
as my compiled version (i.e. it, too, tried to access the HDFS NameNode;
same exact error).
I am doing the exact same thing for the purpose of learning. I also
don't have a hadoop cluster and plan to scale on ec2 as soon as I get
it working locally.
I am having good success just using the binaries and not compiling
from source... Is there a reason why you aren't just using the
binaries?
Hello friends:
I recently compiled and installed Spark v0.9 from the Apache distribution.
Note: I have the Cloudera/CDH5 Spark RPMs co-installed as well (actually,
the entire big-data suite from CDH is installed), but for the moment I'm
using my manually built Apache Spark for 'ground-up' learning.