I am using hive-0.12.0, with ZKRMStateStore as the RM state-store class. Another step would be to compare the ZK data for the MR job and the Hive job.
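One way to do the configuration comparison suggested in this thread is to diff the two jobs' job.xml files. The following is only a minimal sketch and not something anyone in the thread ran: it assumes the usual Hadoop `<configuration>/<property>/<name>/<value>` XML layout, the helper names `load_props` and `diff_props` are hypothetical, and the sample property values are made up for illustration (they are not taken from the actual jobs).

```python
import xml.etree.ElementTree as ET


def load_props(xml_text):
    """Parse Hadoop-style <configuration> XML into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_props(left, right):
    """Return {name: (left_value, right_value)} for every key whose value differs.

    None marks a key that is missing on that side.
    """
    diff = {}
    for name in sorted(set(left) | set(right)):
        a, b = left.get(name), right.get(name)
        if a != b:
            diff[name] = (a, b)
    return diff


# Made-up sample data standing in for the two real job.xml files.
MR_XML = """<configuration>
  <property><name>mapreduce.job.reduces</name><value>1</value></property>
  <property><name>mapreduce.am.max-attempts</name><value>2</value></property>
</configuration>"""

HIVE_XML = """<configuration>
  <property><name>mapreduce.job.reduces</name><value>1</value></property>
  <property><name>mapreduce.am.max-attempts</name><value>1</value></property>
  <property><name>hive.exec.parallel</name><value>false</value></property>
</configuration>"""

# Print one line per differing property: name, MR value, Hive value.
for name, (mr_value, hive_value) in diff_props(load_props(MR_XML), load_props(HIVE_XML)).items():
    print(f"{name}: MR={mr_value!r} Hive={hive_value!r}")
```

To use it on real jobs, read the text of each job.xml from disk and pass it to `load_props` in place of the sample strings.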

On Tue, Apr 1, 2014 at 12:51 PM, Karthik Kambatla <[email protected]> wrote:

> It might be a good first step to compare the configurations for the vanilla
> MR job and Hive MR job.
>
>
> On Mon, Mar 31, 2014 at 7:06 PM, Azuryy Yu <[email protected]> wrote:
>
> > Hi Karthik,
> > I ran a common MR job, and it does work well during RM failover.
> >
> > job progress:
> > (the failover is shown in red font)
> >
> > 14/04/01 10:01:38 INFO mapreduce.Job: map 61% reduce 8%
> > 14/04/01 10:01:40 INFO mapreduce.Job: map 61% reduce 10%
> > 14/04/01 10:01:41 INFO mapreduce.Job: map 62% reduce 10%
> > 14/04/01 10:01:44 INFO mapreduce.Job: map 63% reduce 10%
> > 14/04/01 10:01:47 INFO mapreduce.Job: map 64% reduce 10%
> > 14/04/01 10:02:36 INFO mapreduce.Job: map 60% reduce 0%
> > 14/04/01 10:02:40 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> > 14/04/01 10:03:00 INFO mapreduce.Job: map 63% reduce 0%
> > 14/04/01 10:03:02 INFO mapreduce.Job: map 66% reduce 2%
> > 14/04/01 10:03:04 INFO mapreduce.Job: map 67% reduce 2%
> > 14/04/01 10:03:06 INFO mapreduce.Job: map 69% reduce 2%
> > 14/04/01 10:03:08 INFO mapreduce.Job: map 71% reduce 2%
> > 14/04/01 10:03:10 INFO mapreduce.Job: map 72% reduce 2%
> >
> > So the Hive job's tasks all restart during failover; please take a look.
> >
> >
> > On Tue, Apr 1, 2014 at 7:20 AM, Azuryy <[email protected]> wrote:
> >
> > > I will run an MR job to verify it.
> > >
> > > "Stop RM" means yarn-daemon.sh stop resourcemanager
> > >
> > > Thanks
> > > Sent from my iPhone5s
> > >
> > > > On Apr 1, 2014, at 0:38, Karthik Kambatla <[email protected]> wrote:
> > > >
> > > > Thanks for reporting this, Azuryy. Indeed, this is surprising.
> > > >
> > > > I don't quite understand how Hive works; do you mind running a vanilla
> > > > MR job and verifying if this is indeed the case? Also, when you say you
> > > > stopped the Active RM, you mean only the RM process - correct?
> > > >
> > > >
> > > >> On Mon, Mar 31, 2014 at 3:46 AM, Azuryy Yu <[email protected]> wrote:
> > > >>
> > > >> Hi,
> > > >>
> > > >> I built from trunk and configured RM HA, then submitted a Hive job
> > > >> with 11 maps in total. I stopped the active RM when 6 maps had finished,
> > > >>
> > > >> but Hive shows that all map tasks restarted. This conflicts with the
> > > >> design description.
> > > >>
> > > >> job progress:
> > > >> 2014-03-31 18:44:14,088 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 713.84 sec
> > > >> 2014-03-31 18:44:15,128 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 722.83 sec
> > > >> 2014-03-31 18:44:16,160 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 731.95 sec
> > > >> 2014-03-31 18:44:17,191 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 744.17 sec
> > > >> 2014-03-31 18:44:18,220 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 756.22 sec
> > > >> 2014-03-31 18:44:19,250 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 762.4 sec
> > > >> 2014-03-31 18:44:20,281 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 774.64 sec
> > > >> 2014-03-31 18:44:21,306 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 786.49 sec
> > > >> 2014-03-31 18:44:22,334 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 792.59 sec
> > > >> 2014-03-31 18:44:23,363 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 807.58 sec
> > > >> 2014-03-31 18:44:24,392 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 815.96 sec
> > > >> 2014-03-31 18:44:25,416 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 823.83 sec
> > > >> 2014-03-31 18:44:26,443 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 826.84 sec
> > > >> 2014-03-31 18:44:27,472 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 832.16 sec
> > > >> 2014-03-31 18:44:28,501 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 839.73 sec
> > > >> 2014-03-31 18:44:29,531 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 844.45 sec
> > > >> 2014-03-31 18:44:30,564 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 760.34 sec
> > > >> 2014-03-31 18:44:31,728 Stage-1 map = 0%, reduce = 0%
> > > >> 2014-03-31 18:45:06,918 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 213.81 sec
> > > >> 2014-03-31 18:45:07,952 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 216.83 sec
> > > >> 2014-03-31 18:45:08,979 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 229.15 sec
> > > >> 2014-03-31 18:45:10,007 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 244.42 sec
> > > >> 2014-03-31 18:45:11,040 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 247.31 sec
> > > >> 2014-03-31 18:45:12,072 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 259.5 sec
> > > >> 2014-03-31 18:45:13,105 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 274.72 sec
> > > >> 2014-03-31 18:45:14,135 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 280.76 sec
> > > >> 2014-03-31 18:45:15,170 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 292.9 sec
> > > >> 2014-03-31 18:45:16,202 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 305.16 sec
> > > >> 2014-03-31 18:45:17,233 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 314.21 sec
> > > >> 2014-03-31 18:45:18,264 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 323.34 sec
> > > >> 2014-03-31 18:45:19,294 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 335.6 sec
> > > >> 2014-03-31 18:45:20,325 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 344.71 sec
> > > >> 2014-03-31 18:45:21,355 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 353.8 sec
> > > >> 2014-03-31 18:45:22,385 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 366.06 sec
> > > >> 2014-03-31 18:45:23,415 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 375.2 sec
> > > >> 2014-03-31 18:45:24,449 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 384.28 sec
> > > >> 2014-03-31 18:45:25,481 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 396.54 sec
> > > >> 2014-03-31 18:45:26,512 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 408.72 sec
> > > >> 2014-03-31 18:45:27,549 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 414.69 sec
> > > >> 2014-03-31 18:45:28,582 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 426.99 sec
> > > >> 2014-03-31 18:45:29,614 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 439.25 sec
> > > >> 2014-03-31 18:45:30,653 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 448.25 sec
> > > >> 2014-03-31 18:45:31,683 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 460.5 sec
> > > >> 2014-03-31 18:45:32,723 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 469.63 sec
> > > >> 2014-03-31 18:45:33,754 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 478.67 sec
