> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, line 129
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line129>
> >
> >     Keep the name?
> >     
> >     Shared persistent volume is a kind of persistent volume.

Yes, agreed. This happened since to start off, we had a separate test framework 
for shared volumes, and later decided to merge the test frameworks for shared 
and regular persistent volumes into a single test.


> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, lines 188-189
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line188>
> >
> >     Why is "shared-only" a possible value? A shard has only one persistent 
> > volume so it's either "shared" or not. How about just a `bool isShared`. 
> > (As suggested below, we can put it in the volume).

The original thinking for `flags.mode` was not scoped to a shard but to the 
framework, i.e. if the framework launches shards with tasks where every shard 
used a shared volume (`shared-only` case) or alternates between shared and 
regular volumes (`mixed` case). However, I think I liked separating that 
intention with `num_shards` (for regular volumes) and `num_shared_shards` (for 
shared volumes). So, removed `flags.mode` and added `flags.num_shared_shards`.


> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, lines 215-222
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line215>
> >
> >     No need to act differently based on the shard type.
> >     
> >     We can just have 
> >     
> >     - The first task (`launched == 0`) writes to the file: `echo hello > 
> > volume/persisted`
> >     - Later tasks read from the file: `cat volume/persisted`
> >     
> >     For regular persistent volumes the two tasks will be sequential, for 
> > shared ones they'll be simulatenous. The fact that 1st task could finish 
> > before the 2nd is launched is not important: the example framework mainly 
> > serves the purpose of demostrating the usage of the feature and we don't 
> > want the tasks to block the CI for too long.
> >     
> >     As a potentially followup we can take a read/consumer command and a 
> > write/producer command from flags (with default values). So CI would use 
> > the default values to complete quickly while a human could try out 
> > different write and reads commands which could represent more 
> > interesting/edge cases.

I think the issue is when the writer task has not done an actual write yet, but 
the reader task has done a read already (this is possible since these are 2 
separate tasks). This would not happen in shards using regular volumes (since 
we guarantee the order of task launches and hence order of read/write), but can 
happen in rare case in shared volumes. So, this is what I did to avoid that 
race:

Writer task:
`echo hello > volume/persisted` for both regular and shared volumes.

Reader task(s):
Regular volumes: `cat volume/persisted`
Shared volumes: We try for 5 iterations to read `volume/persisted` and we exit 
when we read successfully (exits with success) or we fail to read in 5 
iterations (exits with a failure).
```COUNTER=0; while [ $COUNTER -lt 5 ]; 
   do cat volume/persisted;
   if [ $? -eq 0 ]; then
     exit 0;
   fi;
   COUNTER=$[COUNTER+1]; sleep 1; done; exit 1");```


> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, lines 208-209
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line208>
> >
> >     Any need for `string task_id`? If so it needs to be named `taskId` but 
> > I don't see the need? Once it's assigned to the task we can get it out by 
> > `task.task_id()`.

I was doing a `echo` on the `task id` so I used a temporary variable, but I 
agree we can certainly avoid it.


> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, lines 470-476
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line470>
> >
> >     I think it's sufficient to have the following states. (We should use a 
> > minimun number of state, we can always add more when more features are 
> > added to persistent volumes which demand more state but starting with a 
> > large number of states makes it difficult to evolve the test).
> >     
> >     ```
> >           STAGING = 0, // The shard is awaiting offers to launch more tasks.
> >           RUNNING,     // The shard is fully running (all its tasks are 
> > launched).
> >           TERMINATING, // The shard is terminating and needs to clean up 
> > its persistent volume.
> >           DONE         // The shard is terminated.
> >     ```
> >     
> >     This translates to:
> >     
> >     ```
> >           STAGING = 0, // In resourceOffers: launch tasks
> >           RUNNING,     // In resourceOffers: noop; In statusUpdate: check 
> > shard TERMINATING condition.
> >           TERMINATING, // In resourceOffers: DESTROY
> >           DONE         // Test terminal condition.
> >     ```

To accomplish this state, I changed shard's state to `Option<State> state` so 
we start the state machine if `shared.state == None()`. Also, if for some 
reason `DESTROY` fails continuously, the test would not exit. We can fail the 
test if it fails on n attempts of `DESTROY` by keeping track of number of 
`DESTROY` attempts but maybe not required. Comments?


> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, line 473
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line473>
> >
> >

`WAITING` state removed.


> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, line 474
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line474>
> >
> >     To represent a state, use a `-ing` word?
> >     
> >     See the overall comment on the enum.

Changed this to `TERMINATING` state.


> On Nov. 7, 2016, 5:25 a.m., Jiang Yan Xu wrote:
> > src/examples/persistent_volume_framework.cpp, lines 475-476
> > <https://reviews.apache.org/r/45962/diff/19/?file=1543197#file1543197line475>
> >
> >     Why both `WAIT_DONE` and `DONE`?
> >     
> >     Doesn't look like we need to treat shared pv and regular pv differently.
> >     
> >     See the overall comment on the enum.

Removed `WAIT_DONE`. I wanted to keep track of re-attempts of `DESTROY` in 
`WAIT_DONE` but maybe it is not needed (and if needed, it can be done without a 
new state).


- Anindya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45962/#review154380
-----------------------------------------------------------


On Nov. 7, 2016, 5:26 a.m., Anindya Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45962/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2016, 5:26 a.m.)
> 
> 
> Review request for mesos, Greg Mann, Jie Yu, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-4431
>     https://issues.apache.org/jira/browse/MESOS-4431
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Updated a persistent volume test framework to include shared volumes.
> 
> 
> Diffs
> -----
> 
>   src/examples/persistent_volume_framework.cpp 
> 9d45bb496c4cf378af429b5aa970bf81450e482a 
> 
> Diff: https://reviews.apache.org/r/45962/diff/
> 
> 
> Testing
> -------
> 
> New test framework for shared resources added.
> Tests successful.
> 
> 
> Thanks,
> 
> Anindya Sinha
> 
>

Reply via email to