Python 2.7 + numpy break sortByKey()

2014-03-01 Thread nicholas.chammas
Unexpected behavior. Here's the repro:
1. Launch an EC2 cluster with spark-ec2. 1 slave; default instance type.
2. Upgrade the cluster to Python 2.7 using the instructions here
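
The preview above is truncated; as a hedged sketch of the kind of job the subject line describes (a toy example of mine, not the thread's actual code — the app name and data are hypothetical), a sortByKey() over numpy-produced keys on the upgraded Python 2.7 cluster would look roughly like this:

    # Hypothetical minimal sketch, not from the thread: exercise sortByKey()
    # with numpy scalar keys, the combination the subject line says breaks.
    import numpy as np
    from pyspark import SparkContext

    sc = SparkContext("local", "sortbykey-numpy-check")
    # np.int64 keys instead of plain Python ints
    pairs = sc.parallelize(range(10)).map(lambda i: (np.int64(i), i * 2))
    print(pairs.sortByKey().collect())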

spark-ec2 login expects at least 1 slave

2014-03-01 Thread nicholas.chammas
I successfully launched a Spark EC2 "cluster" with 0 slaves using spark-ec2. When trying to log in to the master node with spark-ec2 login, I get the following:
Searching for existing cluster test-blah...
Found 1 master(s), 0 slaves
ERROR: Could not find slaves in group test-blah-slaves
Is this
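
The preview cuts off here; as a sketch of the repro being described (the key-pair name and identity file below are placeholders, not taken from the thread), the launch and login steps with the spark-ec2 script would be roughly:

    # Placeholder key pair and identity file; only the -s 0 launch and the
    # subsequent login matter for the reported error.
    ./spark-ec2 -k my-keypair -i my-keypair.pem -s 0 launch test-blah
    ./spark-ec2 -k my-keypair -i my-keypair.pem login test-blah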

java.net.SocketException on reduceByKey() in pyspark

2014-02-28 Thread nicholas.chammas
I've done a whole bunch of things to this RDD, and now when I try to sortByKey(), this is what I get:
>>> flattened_po.flatMap(lambda x: map_to_database_types(x)).sortByKey()
14/02/28 23:18:41 INFO spark.SparkContext: Starting job: sortByKey at :1
14/02/28 23:18:41 INFO scheduler.DAGScheduler: Got j
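
The log is cut off above; one hedged way to narrow such failures down (my own sketch, unrelated to the thread's RDD and pipeline) is a minimal reduceByKey() job on toy data, which shows whether the SocketException is tied to the original pipeline or reproduces on any shuffle:

    # Hypothetical sanity check, not from the thread: a trivial shuffle.
    from pyspark import SparkContext

    sc = SparkContext("local", "reducebykey-check")
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
    # If even this raises java.net.SocketException, the problem is environmental
    # (e.g. the Python/numpy setup) rather than in the original pipeline.
    print(pairs.reduceByKey(lambda x, y: x + y).collect())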

Running Spark with Python 2.7.5+

2014-02-27 Thread nicholas.chammas
The provided Spark EC2 scripts and default AMI ship with Python 2.6.8. I would like to use Python 2.7.5 or later. I believe that among the 2.x versions, 2.7 is the most popular. What's the easiest way to get my Spark cluster on Python
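
A hedged follow-up sketch (mine, not from the thread): after any interpreter upgrade it is worth confirming which Python the workers actually run, since the driver and the executors can silently end up on different versions. The master URL and app name below are placeholders:

    # Hypothetical version check; replace the master URL with the cluster's own.
    import sys
    from pyspark import SparkContext

    sc = SparkContext("spark://<master-hostname>:7077", "python-version-check")
    # Each task reports the interpreter version of the worker it ran on.
    worker_versions = (sc.parallelize(range(4), 4)
                         .map(lambda _: sys.version.split()[0])
                         .distinct()
                         .collect())
    print("driver : %s" % sys.version.split()[0])
    print("workers: %s" % worker_versions)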