Hi
After switching away from the bad GitHub repository, I've succeeded in running a command.
So now I would like to work through the tutorial, and I've created a new notebook.
First I enter this code:
val bankText = sc.textFile("s3n://inneractive-parquet/root/bank.zip")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

val bank = bankText.map(s => s.split(";")).filter(s => s(0) != "\"age\"").map(
  s => Bank(s(0).toInt,
    s(1).replaceAll("\"", ""),
    s(2).replaceAll("\"", ""),
    s(3).replaceAll("\"", ""),
    s(5).replaceAll("\"", "").toInt
  )
)

bank.toDF().registerTempTable("bank")
All is OK, I get a result like:
bankText: org.apache.spark.rdd.RDD[String] = s3n://inneractive-parquet/root/bank.zip MapPartitionsRDD[1] at textFile at <console>:24
defined class Bank
bank: org.apache.spark.rdd.RDD[Bank] = MapPartitionsRDD[4] at map at <console>:28
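As far as I understand, textFile and map are lazy, so at this point nothing has actually been read from S3; the output above only shows the RDD definitions. Just as a sketch, a minimal check I could add in the same notebook to force an action before going through SQL would be something like:

// force an action so the S3 read really happens (everything above is lazy)
val rowCount = bank.count()
println(s"parsed rows: $rowCount")
// peek at the schema of the DataFrame behind the temp table
bank.toDF().printSchema()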
Now I'm trying to run the SQL query:
%sql select age, count(1) from bank where age < 30 group by age order by age
The operation never ends, it just keeps running. In the log I can see:
Initial job has not accepted any resources; check your cluster UI to ensure
that workers are registered and have sufficient resources
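If I understand it right, the %sql paragraph is simply the first real Spark action here, so I assume running the same query from a Scala paragraph would block in the same way, roughly:

// same query through the Scala interpreter (sqlContext is the one Zeppelin binds);
// I assume this hangs on the same resources issue
val young = sqlContext.sql(
  "select age, count(1) from bank where age < 30 group by age order by age")
young.show()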
I can't debug the worker, as the application link in the Spark UI gives an error on port 4040:
ID: app-20150518165912-0000
    <http://ec2-54-91-146-31.compute-1.amazonaws.com:8080/app?appId=app-20150518165912-0000>
Name: Zeppelin <http://ip-10-123-128-51.ec2.internal:4040/>
Memory: 166.0 GB
Submitted: 2015/05/18 16:59:12
User: ubuntu
State: RUNNING
Duration: 5.5 min
The Zeppelin link is on port 4040.
Could you help me understand what is going on?
Thanks