Jey,

On Mon, Jan 20, 2014 at 10:59 PM, Jey Kottalam <[email protected]> wrote:
> >> This sounds like either a bug or somehow the S3 library requiring lots of
> >> memory to read a block. There isn’t a separate way to run HDFS over S3.
> >> Hadoop just has different implementations of “file systems”, one of which
> >> is S3. There’s a pointer to these versions at the bottom of
> >> http://spark.incubator.apache.org/docs/latest/ec2-scripts.html#accessing-data-in-s3
> >> but it is indeed pretty hidden in the docs.
> >
> > Hmmm. Maybe a bug then. If I read a small 600 byte file via the s3n:// uri,
> > it works on a Spark cluster. If I try a 20GB file it just sits and sits and
> > sits, frozen. Is there anything I can do to instrument this and figure out
> > what is going on?
>
> Try taking a look at the stderr log of the executor that failed. You
> should hopefully see a more detailed error message there. The stderr
> logs can be found by browsing to http://mymaster:8080, where
> `mymaster` is the hostname of your Spark master.

Thanks. I will try that, but your assumption is that something is failing in
an obvious way, with a message. Judging by the spark-shell, which just sits
frozen, I would say something is "stuck". Will report back.

Thanks,
Ognen
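
For reference, this is roughly the kind of s3n:// read being discussed. It is
only a minimal spark-shell sketch for the Spark 0.8/0.9 era: the bucket name,
object path, and credential values below are placeholders, and it assumes the
`sc` SparkContext the shell provides, with S3 credentials supplied through the
standard Hadoop configuration keys.

  // Placeholder AWS credentials for the s3n:// filesystem (hypothetical values).
  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")

  // A small (600 byte) object reads fine; the 20GB object is where the job
  // appears to hang. Bucket and key names here are hypothetical.
  val lines = sc.textFile("s3n://my-bucket/path/to/large-file.txt")
  println(lines.count())

If the count on the large file never returns, the executor stderr logs reached
from http://mymaster:8080 (as suggested above) are the place to look for what
the read is actually doing.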
