On Dec. 15, 2016, 4:49 p.m., Jaimin Jetly wrote: > > One thing I don't quite see here (and it could be due to the size of the > > patch) is what happens in these two cases: > > - Something goes wrong when trying to store a task's status. How does the > > system recover and mark it completed? > > - What about waiting until a request is HOLDING and then restarting Ambari > > - will the relevent maps get re-populated? > > Jaimin Jetly wrote: > >> - Something goes wrong when trying to store a task's status. How does > the system recover and mark it completed? > > This work only adds logic to add/update stage and request status. The way > task status is being updated or the logic for system to recover from anything > that goes wrong when storing task status is not changed. > This work ensures that task status update, respective stage status update > and respective request status update happens inside same transactional > boundary. Thus all three entities remains consistent in the status they show. > This work does not add any recovery logic and piggybacks on existing > failure recovery mechanism for updating task status. Thus if something goes > wrong storing task status then stage/request status will also be not store > and vice-versa. Next time when ambari-agent sends command reports again then > task update and respective stage/request status update should also also get > updated successfully. > > >> - What about waiting until a request is HOLDING and then restarting > Ambari - will the relevent maps get re-populated? > > Yes, everytime ambari-server starts, we check for all stages in > HostRoleStatus.IN_PROGRESS_STATUSES and publish an event with the tasks. > these will repopulate the maps. The patch adds that logic with > "publishInProgressTasks(stages)" method in ActionScheduler.java > > I have tested that scenario of restarting ambari-server when a request is > ongoing with in progress tasks and validating that stages and requests status > is correctly updated > > Sid Wagle wrote: > The agent re-sending command reports are we generally fault tolerant in > that area? Sicne you are probably mostly up-to-date on that part of the code > can you shed some light on loss of task status scenario. Do we recover the > correct status?
Yes. Largely. Although There are following specific cases: 1) If persisting of "request, it's stages and it's host role commands" for the first time does not happen then the request will be completely missed and ambari-server will never log in the database or track it in anyways. 2) If "request, it's stages and it's host role commands" are persisted but subsequent updates and DB merge keeps failing then request will be eventually declared timeout and so will all hrcs As far as DB merge from subsequent agent command reports is successfully committed to DB, earlier failures will be recovered. - Jaimin ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/53686/#review159315 ----------------------------------------------------------- On Jan. 18, 2017, 10:33 p.m., Jaimin Jetly wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/53686/ > ----------------------------------------------------------- > > (Updated Jan. 18, 2017, 10:33 p.m.) > > > Review request for Ambari, Jonathan Hurley, Nate Cole, Sumit Mohanty, and Sid > Wagle. > > > Bugs: AMBARI-18868 > https://issues.apache.org/jira/browse/AMBARI-18868 > > > Repository: ambari > > > Description > ------- > > Stage and Request status should be persisted in the database. > > upgrading to ambari-3.0.0 should add status for all present stages and > request for the cluster. > > > Diffs > ----- > > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionDBAccessorImpl.java > 7837a7b > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/ActionScheduler.java > dabcb98 > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/HostRoleStatus.java > 3656bfe > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Request.java > 31e11c1 > > ambari-server/src/main/java/org/apache/ambari/server/actionmanager/Stage.java > 4a05b32 > > ambari-server/src/main/java/org/apache/ambari/server/controller/internal/CalculatedStatus.java > 3c415df > > ambari-server/src/main/java/org/apache/ambari/server/events/TaskCreateEvent.java > PRE-CREATION > ambari-server/src/main/java/org/apache/ambari/server/events/TaskEvent.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/events/TaskUpdateEvent.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/events/listeners/tasks/TaskStatusListener.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/events/publishers/TaskEventPublisher.java > PRE-CREATION > > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/HostRoleCommandDAO.java > 02c4091 > > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/RequestDAO.java > 1c4d0a3 > ambari-server/src/main/java/org/apache/ambari/server/orm/dao/StageDAO.java > d2f899f > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/HostRoleCommandEntity.java > 74271b9 > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/RequestEntity.java > 7944d21 > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntity.java > f9c8810 > > ambari-server/src/main/java/org/apache/ambari/server/orm/entities/StageEntityPK.java > 9ca0470 > > ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog300.java > 4f90ef3 > ambari-server/src/main/resources/Ambari-DDL-Derby-CREATE.sql b79c945 > ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 1c502bc > ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql c6d4ad0 > ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 1be87bb > ambari-server/src/main/resources/Ambari-DDL-SQLAnywhere-CREATE.sql abe48e8 > ambari-server/src/main/resources/Ambari-DDL-SQLServer-CREATE.sql 169a464 > > ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionDBAccessorImpl.java > 1ca777d > > ambari-server/src/test/java/org/apache/ambari/server/actionmanager/TestActionScheduler.java > 6cc511e > > ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeResourceProviderTest.java > a702e6f > > ambari-server/src/test/java/org/apache/ambari/server/controller/internal/UpgradeSummaryResourceProviderTest.java > e398a54 > > ambari-server/src/test/java/org/apache/ambari/server/events/listeners/tasks/TaskStatusListenerTest.java > PRE-CREATION > > ambari-server/src/test/java/org/apache/ambari/server/orm/dao/UpgradeDAOTest.java > ae85241 > > ambari-server/src/test/java/org/apache/ambari/server/state/services/RetryUpgradeActionServiceTest.java > 7dd9932 > > ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog300Test.java > d7979e8 > > Diff: https://reviews.apache.org/r/53686/diff/ > > > Testing > ------- > > Verified manually on a cluster by making api requests and upgrading ambari. > Add unit tests. > Verified that the patch does not break any existing unit tests on dev box. > Jenkins job overall unit test result pending.. > Verified on 1000 node cluster that the patch does not regress big operations. > Executed Stop Services and Start Services API call which gernerated around > 9000 tasks and compared request completion time for these operations. There > was a minor performance gain with the patch. As part of > https://issues.apache.org/jira/browse/AMBARI-18889, I will look if we can use > the request status and stage status to further enhance performance. > > > Thanks, > > Jaimin Jetly > >
