Hi,

You can type `sqlCtx.sql("select * .... ").explain` to show the execution plan. Also, you can kill jobs from the web UI.
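For example, with one of the queries from your mail (a sketch, assuming your HiveContext is bound to `sqlCtx`, as in your code):

```scala
// Run the query through the HiveContext and keep the resulting DataFrame.
val df = sqlCtx.sql(
  "select d.id, avg(d.avg) from v_points d where id in (90, 2) group by id")

// Print the physical plan; explain(true) additionally shows the parsed,
// analyzed, and optimized logical plans.
df.explain(true)
```

Comparing the plans of the `or` and `in (...)` variants should show why the number of tasks differs so much between them.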
// maropu

On Thu, Aug 4, 2016 at 4:58 PM, Marco Colombo <ing.marco.colo...@gmail.com> wrote:
> Hi all, I have a question on how Hive + Spark handle data.
>
> I've started a new HiveContext and I'm extracting data from Cassandra.
> I've configured spark.sql.shuffle.partitions=10.
> Now, I have the following query:
>
> select d.id, avg(d.avg) from v_points d where id=90 group by id;
>
> I see that 10 tasks are submitted and execution is fast. Every id in that
> table has 2000 samples.
>
> But if I just add a new id, as in:
>
> select d.id, avg(d.avg) from v_points d where id=90 or id=2 group by id;
>
> it adds 663 tasks and the query does not finish.
>
> If I write the query with in (), like
>
> select d.id, avg(d.avg) from v_points d where id in (90,2) group by id;
>
> the query is again fast.
>
> How can I get the 'execution plan' of the query?
>
> And also, how can I kill the long-running submitted tasks?
>
> Thanks all!

--
---
Takeshi Yamamuro