Hi
After switching away from the bad GitHub repository, I've succeeded in running a command.
So now I would like to work through the tutorial, and I've created a new notebook.
First I enter this code:
val bankText = sc.textFile("s3n://inneractive-parquet/root/bank.zip")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt
  )
)

bank.toDF().registerTempTable("bank")
All is OK, I get a result like:
bankText: org.apache.spark.rdd.RDD[String] = s3n://inneractive-parquet/root/bank.zip MapPartitionsRDD[1] at textFile at <console>:24
defined class Bank
bank: org.apache.spark.rdd.RDD[Bank] = MapPartitionsRDD[4] at map at <console>:28
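As far as I understand, textFile and map are lazy, so at this point nothing has actually been read from S3; the output above only shows the RDD definitions. Just as a sketch, a minimal check I could add in the same notebook to force an action before going through SQL would be something like:

// force an action so the S3 read really happens (everything above is lazy)
val rowCount = bank.count()
println(s"parsed rows: $rowCount")
// peek at the schema of the DataFrame behind the temp table
bank.toDF().printSchema()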
Now I'm trying to run the SQL query:
%sql select age, count(1) from bank where age < 30 group by age order by age
The operation never ends, it just keeps running. In the log I can see:
Initial job has not accepted any resources; check your cluster UI to ensure
that workers are registered and have sufficient resources
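If I understand it right, the %sql paragraph is simply the first real Spark action here, so I assume running the same query from a Scala paragraph would block in the same way, roughly:

// same query through the Scala interpreter (sqlContext is the one Zeppelin binds);
// I assume this hangs on the same resources issue
val young = sqlContext.sql(
  "select age, count(1) from bank where age < 30 group by age order by age")
young.show()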
I can't debug the worker, as the application link in the Spark UI gives an error on port 4040:
ID: app-20150518165912-0000
    <http://ec2-54-91-146-31.compute-1.amazonaws.com:8080/app?appId=app-20150518165912-0000>
Name: Zeppelin <http://ip-10-123-128-51.ec2.internal:4040/>
Memory: 166.0 GB
Submitted: 2015/05/18 16:59:12
User: ubuntu
State: RUNNING
Duration: 5.5 min
The Zeppelin link is on port 4040.
Could you help me understand what is going on?
Thanks