> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, line 129 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line129> > > > > Keep the name? > > > > Shared persistent volume is a kind of persistent volume.
Yes, agreed. This happened since to start off, we had a separate test framework for shared volumes, and later decided to merge the test frameworks for shared and regular persistent volumes into a single test. > On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, lines 188-189 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line188> > > > > Why is "shared-only" a possible value? A shard has only one persistent > > volume so it's either "shared" or not. How about just a `bool isShared`. > > (As suggested below, we can put it in the volume). The original thinking for `flags.mode` was not scoped to a shard but to the framework, i.e. if the framework launches shards with tasks where every shard used a shared volume (`shared-only` case) or alternates between shared and regular volumes (`mixed` case). However, I think I liked separating that intention with `num_shards` (for regular volumes) and `num_shared_shards` (for shared volumes). So, removed `flags.mode` and added `flags.num_shared_shards`. > On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, lines 215-222 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line215> > > > > No need to act differently based on the shard type. > > > > We can just have > > > > - The first task (`launched == 0`) writes to the file: `echo hello > > > volume/persisted` > > - Later tasks read from the file: `cat volume/persisted` > > > > For regular persistent volumes the two tasks will be sequential, for > > shared ones they'll be simulatenous. The fact that 1st task could finish > > before the 2nd is launched is not important: the example framework mainly > > serves the purpose of demostrating the usage of the feature and we don't > > want the tasks to block the CI for too long. > > > > As a potentially followup we can take a read/consumer command and a > > write/producer command from flags (with default values). So CI would use > > the default values to complete quickly while a human could try out > > different write and reads commands which could represent more > > interesting/edge cases. I think the issue is when the writer task has not done an actual write yet, but the reader task has done a read already (this is possible since these are 2 separate tasks). This would not happen in shards using regular volumes (since we guarantee the order of task launches and hence order of read/write), but can happen in rare case in shared volumes. So, this is what I did to avoid that race: Writer task: `echo hello > volume/persisted` for both regular and shared volumes. Reader task(s): Regular volumes: `cat volume/persisted` Shared volumes: We try for 5 iterations to read `volume/persisted` and we exit when we read successfully (exits with success) or we fail to read in 5 iterations (exits with a failure). ```COUNTER=0; while [ $COUNTER -lt 5 ]; do cat volume/persisted; if [ $? -eq 0 ]; then exit 0; fi; COUNTER=$[COUNTER+1]; sleep 1; done; exit 1");``` > On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, lines 208-209 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line208> > > > > Any need for `string task_id`? If so it needs to be named `taskId` but > > I don't see the need? Once it's assigned to the task we can get it out by > > `task.task_id()`. I was doing a `echo` on the `task id` so I used a temporary variable, but I agree we can certainly avoid it. > On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, lines 470-476 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line470> > > > > I think it's sufficient to have the following states. (We should use a > > minimun number of state, we can always add more when more features are > > added to persistent volumes which demand more state but starting with a > > large number of states makes it difficult to evolve the test). > > > > ``` > > STAGING = 0, // The shard is awaiting offers to launch more tasks. > > RUNNING, // The shard is fully running (all its tasks are > > launched). > > TERMINATING, // The shard is terminating and needs to clean up > > its persistent volume. > > DONE // The shard is terminated. > > ``` > > > > This translates to: > > > > ``` > > STAGING = 0, // In resourceOffers: launch tasks > > RUNNING, // In resourceOffers: noop; In statusUpdate: check > > shard TERMINATING condition. > > TERMINATING, // In resourceOffers: DESTROY > > DONE // Test terminal condition. > > ``` To accomplish this state, I changed shard's state to `Option<State> state` so we start the state machine if `shared.state == None()`. Also, if for some reason `DESTROY` fails continuously, the test would not exit. We can fail the test if it fails on n attempts of `DESTROY` by keeping track of number of `DESTROY` attempts but maybe not required. Comments? > On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, line 473 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line473> > > > > `WAITING` state removed. > On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, line 474 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line474> > > > > To represent a state, use a `-ing` word? > > > > See the overall comment on the enum. Changed this to `TERMINATING` state. > On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote: > > src/examples/persistent_volume_framework.cpp, lines 475-476 > > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line475> > > > > Why both `WAIT_DONE` and `DONE`? > > > > Doesn't look like we need to treat shared pv and regular pv differently. > > > > See the overall comment on the enum. Removed `WAIT_DONE`. I wanted to keep track of re-attempts of `DESTROY` in `WAIT_DONE` but maybe it is not needed (and if needed, it can be done without a new state). - Anindya ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/45962/#review154380 ----------------------------------------------------------- On Nov. 7, 2016, 5:26 a.m., Anindya Sinha wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/45962/ > ----------------------------------------------------------- > > (Updated Nov. 7, 2016, 5:26 a.m.) > > > Review request for mesos, Greg Mann, Jie Yu, and Jiang Yan Xu. > > > Bugs: MESOS-4431 > https://issues.apache.org/jira/browse/MESOS-4431 > > > Repository: mesos > > > Description > ------- > > Updated a persistent volume test framework to include shared volumes. > > > Diffs > ----- > > src/examples/persistent_volume_framework.cpp > 9d45bb496c4cf378af429b5aa970bf81450e482a > > Diff: https://reviews.apache.org/r/45962/diff/ > > > Testing > ------- > > New test framework for shared resources added. > Tests successful. > > > Thanks, > > Anindya Sinha > >
