Ah okay, kind of weird that it worked with a small file. Maybe that read was 
being handled locally since the file was small.

If you do run into further issues with S3, one other idea is to build Spark 
against a newer version of the Hadoop client library (Spark uses Hadoop’s data 
source classes to read data, so its S3 support comes from that library). You 
can do this by rebuilding Spark with

SPARK_HADOOP_VERSION=2.2.0 sbt/sbt clean assembly
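
For example, once the rebuilt assembly is deployed, you could sanity-check S3 
access from spark-shell with something along these lines (the bucket path and 
credential values here are just placeholders):

  // in spark-shell; sc is the SparkContext the shell creates for you
  sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY")
  sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY")
  val lines = sc.textFile("s3n://your-bucket/some/file.txt")
  lines.count()  // forces an actual read, so any S3 problem surfaces here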

Matei

On Jan 21, 2014, at 3:04 AM, Ognen Duzlevski <[email protected]> wrote:

> On Mon, Jan 20, 2014 at 11:05 PM, Ognen Duzlevski <[email protected]> 
> wrote:
> 
> Thanks. I will try that, but your assumption is that something is failing in 
> an obvious way, with a message. From the look of spark-shell - it is just 
> frozen - I would say something is "stuck". Will report back.
> 
> Given the suspicious nature of the shell's "freezing", it looked to me like a 
> timeout or some kind of wait.
> 
> I whipped out tcpdump on a node in the cluster and noticed that the nodes try 
> to connect back to the master on some (random?) port. I realized that my VPC 
> security group was too restrictive. As soon as I allowed all TCP and UDP 
> traffic within the VPC, it magically worked ;)
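> 
> For anyone retracing this, the tcpdump invocation was something along these 
> lines (eth0 being a guess at the instance's interface name):
> 
>   # watch for inbound connection attempts, ignoring ssh
>   sudo tcpdump -n -i eth0 'tcp[tcpflags] & tcp-syn != 0 and not port 22'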
> 
> So, problem solved. It is not a bug after all, just traffic being blocked.
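> 
> For the record, the fix amounts to self-referencing rules on the VPC security 
> group. With the AWS CLI it would look something like this (the group id is a 
> placeholder):
> 
>   # allow all TCP and UDP between instances in the same security group
>   aws ec2 authorize-security-group-ingress --group-id sg-12345678 \
>       --protocol tcp --port 0-65535 --source-group sg-12345678
>   aws ec2 authorize-security-group-ingress --group-id sg-12345678 \
>       --protocol udp --port 0-65535 --source-group sg-12345678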
> 
> In any case, I am documenting this as I go. As soon as I have a viable "data 
> pipeline" in the VPC, I will publish something for everyone to read; I figure 
> another experience report wouldn't hurt.
> 
> Cheers,
> Ognen 
