@ xqflying, There were a few shuffle issues fixed post 0.5.3 which you might be hitting.

TEZ-2214. FetcherOrderedGrouped can get stuck indefinitely when MergeManager misses memToDiskMerging
TEZ-1923. FetcherOrderedGrouped gets into infinite loop due to memory pressure

You can probably try the 0.5.4 release (once it comes out within the next week or so) or try applying the patches from the JIRAs above.

thanks
-- Hitesh

On May 10, 2015, at 1:12 AM, [email protected] wrote:

> I have encountered a similar problem before; as a workaround, I used MR instead.
> I was using Tez 0.5.3 at that time, and the shuffle kept running forever.
> Which version of Tez are you using?
>
> On 2015-05-06 06:02, Hitesh Shah wrote:
>
> This might be a mail that is better suited for the user@hive mailing list to
> start with.
>
> thanks
> -- Hitesh
>
> On May 5, 2015, at 12:58 AM, [email protected] wrote:
>
> > I changed the SQL WHERE condition to (where t.update_time >= '2015-05-04'),
> > and the query returns a result after a while, because
> > t.update_time >= '2015-05-04' filters out many rows during the table scan.
> > But why, when I change the WHERE condition to
> > (where t.update_time >= '2015-05-04' or length(t8.end_user_id)>0),
> > does the query run forever, as follows:
> >
> > Status: Running (Executing on YARN cluster with App id
> > application_1419300485749_1419769)
> >
> > --------------------------------------------------------------------------------
> > VERTICES          STATUS     TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
> > --------------------------------------------------------------------------------
> > Map 1 ..........  SUCCEEDED      1          1        0        0       0       0
> > Map 10 .........  SUCCEEDED      3          3        0        0       0       0
> > Map 11 .........  SUCCEEDED    151        151        0        0       0       0
> > Map 12 .........  SUCCEEDED      1          1        0        0       0       0
> > Map 13 .........  SUCCEEDED     76         76        0        0       0       0
> > Map 5 ..........  SUCCEEDED     11         11        0        0       0       0
> > Map 7 ..........  SUCCEEDED    156        156        0        0       0       0
> > Map 9 ..........  SUCCEEDED     10         10        0        0       0       0
> > Reducer 2 ......  SUCCEEDED      1          1        0        0       0       0
> > Reducer 3 .....   RUNNING      642        641        1        0       0       0
> > Reducer 4         RUNNING     1009          0       89      920       0       0
> > Reducer 6 ......  SUCCEEDED      3          3        0        0       0       0
> > Reducer 8 ......  SUCCEEDED    203        203        0        0       0       0
> > --------------------------------------------------------------------------------
> > VERTICES: 11/13 [==============>>------------] 55% ELAPSED TIME: 307.54 s
> >
> > What is the root cause?
> >
> > [email protected]
> > <sql.txt><queryplan.TXT>
