Dear colleagues,

Apologies for cross-posting; I just wanted to ensure that we share these 
insights broadly.

We recently passed the eleven-year anniversary of the first upload to the 
international CKM: the body temperature archetype. As Europe readies itself 
for summer holidays and the clinical review season slows down, it is a good 
time to review the progress of the openEHR clinical modelling program.

Roughly 6 weeks ago I created and downloaded a number of reports from CKM. I've 
spent some time analysing the data and thought I'd share what I learned with 
you all.

This exploration was triggered by a tweet from Ewan Davis last December asking:
"How many person hours do you think has gone in to creating the openEHR 
archetypes available via CKM - I think it must be in excess of 100,000 hours 
(40 person years)"

It took a while to gather the data and propose reasonable assumptions so that 
we could make time and effort estimates, but here goes...

CKM stats
(As of July 5, 2019):

  1.  Community
     *   Registered users - 2239
     *   Countries represented - 95
  2.  Archetype library
     *   Total archetypes - 785
     *   Active archetypes
           i.    Published - 93
           ii.   Published as v1, needing reassessment - 6
           iii.  In review - 31, with at least 7 about to be published
           iv.   Draft - 351
           v.    Initial (in incubators) - 110
     *   Proposed archetypes - 10
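
As a quick cross-check, the sketch below tallies the active archetype counts listed above. It is a minimal illustration only; the dictionary keys are hypothetical labels, not CKM field names.

```python
# Active archetype counts by lifecycle status, as listed in the CKM stats
# above (5 July 2019). The keys are illustrative labels, not CKM identifiers.
active_by_status = {
    "published": 93,
    "published as v1, needing reassessment": 6,
    "in review": 31,
    "draft": 351,
    "initial (in incubators)": 110,
}
proposed = 10
total_archetypes = 785

active_total = sum(active_by_status.values())

# Archetypes that are published, published as v1, or currently in review.
reviewed = (active_by_status["published"]
            + active_by_status["published as v1, needing reassessment"]
            + active_by_status["in review"])

# Archetypes in the total that fall outside the active and proposed categories.
unlisted = total_archetypes - active_total - proposed

print(f"Active archetypes: {active_total}")
print(f"Published, v1 or in review: {reviewed}")
print(f"Not in the listed categories: {unlisted}")
```

The active categories sum to 591, and published + v1 + in review gives 130, the same figure as the number of archetypes that have completed or are undergoing review in the CKM reports. The lists do not say which status the remaining 184 archetypes have.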

Behind the scenes
(from CKM reports, May 2019)

  1.  Number of archetypes which have completed or are undergoing a review 
process - 130
  2.  Number of review rounds completed - 295
  3.  Number of archetype reviews completed by all reviewers - 2995
  4.  Number of unique reviewers - 272
  5.  Reviews completed per review round - 10.15
  6.  Average number of reviews per archetype - 23.04
  7.  Average number of reviews per reviewer - 11.01
  8.  Over the past 12 months, approximately 100 unique reviewers have logged in 
to CKM about 900 times per month, on average.
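
The averages in items 5 to 7 follow directly from the raw counts in items 1 to 4. A minimal Python sketch, assuming each average is simply total reviews divided by the relevant count and rounded to two decimal places:

```python
# Raw counts from the CKM reports (May 2019), as listed above.
archetypes_reviewed = 130
review_rounds = 295
reviews_completed = 2995
unique_reviewers = 272


def avg(numerator: int, denominator: int) -> float:
    """Mean rounded to two decimal places, matching the precision quoted above."""
    return round(numerator / denominator, 2)


reviews_per_round = avg(reviews_completed, review_rounds)
reviews_per_archetype = avg(reviews_completed, archetypes_reviewed)
reviews_per_reviewer = avg(reviews_completed, unique_reviewers)

print(reviews_per_round, reviews_per_archetype, reviews_per_reviewer)
```

This reproduces the quoted 10.15, 23.04 and 11.01.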

Time estimates
This is where things become interesting...

  1.  Design time - 10437 hours
  2.  Reviewer time (assumes 30 minutes per review) - 1498 hours
  3.  Editorial time
  4.  Clinical Knowledge Administrator (CKA)

In total, this comes to 16289 hours, which equates to roughly 8.5 person years.
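
The arithmetic behind these estimates can be sketched as below. The review count, the 30-minute assumption, the design hours and the 16289-hour total all come from this post. The 1920 hours per person year (40 hours/week x 48 weeks) is an assumption of this sketch, not a figure stated here, and the editorial + CKA figure is only implied, assuming the four categories above make up the whole total.

```python
# Figures quoted in this post.
REVIEWS_COMPLETED = 2995
HOURS_PER_REVIEW = 0.5        # stated assumption: 30 minutes per review
DESIGN_HOURS = 10437
TOTAL_HOURS = 16289

# ASSUMPTION (not stated in the post): 40 h/week x 48 working weeks.
HOURS_PER_PERSON_YEAR = 1920

review_hours = REVIEWS_COMPLETED * HOURS_PER_REVIEW   # 1497.5, quoted as 1498

# Implied remainder, assuming design + review + editorial + CKA = total.
editorial_and_cka_hours = TOTAL_HOURS - DESIGN_HOURS - round(review_hours)

person_years = TOTAL_HOURS / HOURS_PER_PERSON_YEAR

print(f"Review: {review_hours:.1f} h")
print(f"Editorial + CKA (implied): {editorial_and_cka_hours} h")
print(f"Total: {person_years:.1f} person years")
```

With a different working-year convention the person-year figure shifts accordingly. At 2500 hours per year, the ratio implied by Ewan's 100,000 hours / 40 person years, the same total would be about 6.5 person years.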

Obviously, I have made some assumptions about the average time for many 
activities. If we factor in incidental conversations, time spent pondering 
modelling conundrums, and cross-pollination between CKMs, we could reasonably 
increase the estimate to 10 person years. However, try as I might, there was no 
way I could justify bumping the figures up to 20, much less 40, person years. 
These numbers reflect the work on archetypes that are owned and managed in the 
international CKM. They include an estimate of the work done by reviewers and 
editors from the Apperta and Norwegian CKMs where their archetypes now reside in 
the international CKM or in multiple CKMs. They do not reflect the work done on 
reviews in the now-retired Australian CKM, although estimates of design time 
have been included in the assumptions.

I interpret Ewan's estimate as reflecting his impression that the effort to 
achieve what we have done so far was huge. I too believed that the effort was 
epic, but in my head it was still only in the ballpark of about half of his 
initial estimate. That the actual effort appears to be only 8-10 person years 
totally surprised me. My initial figures were considerably lower; I went back 
and tried to massage them upward, because this is obviously a rather inexact 
science, more like an educated guesstimate, but this is as far as I feel 
comfortable going.

In addition, Thomas Beale estimates, using the ADL Workbench, that there are on 
average 14 clinically significant data elements per archetype. These are the 
relevant data points that we design, review, etc. So 785 archetypes x 14 data 
points/archetype suggests that we have a library of approximately 10,990 data 
points, none of which are duplicates or overlapping in the governed archetypes. 
And if we accept my estimate of a total of 16289 hours, the time per data 
element is 16289/10990, only 1.48 hours each, covering design, review, 
maintenance and governance.
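
The per-element figure is a two-step calculation. A minimal sketch using only the numbers in the paragraph above (the 14-element average is Thomas Beale's ADL Workbench estimate):

```python
ARCHETYPES = 785
ELEMENTS_PER_ARCHETYPE = 14   # Thomas Beale's ADL Workbench average
TOTAL_HOURS = 16289           # total effort estimate from this post

data_elements = ARCHETYPES * ELEMENTS_PER_ARCHETYPE
hours_per_element = TOTAL_HOURS / data_elements
minutes_per_element = hours_per_element * 60

print(f"Data elements: {data_elements}")
print(f"Hours per element: {hours_per_element:.2f}")
print(f"Minutes per element: {minutes_per_element:.0f}")
```

Expressed in minutes, 1.48 hours is roughly an hour and a half of combined design, review, maintenance and governance effort per data element.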

What conclusions can we draw?

  *   Firstly, modelling 'openEHR style' seems to be quite efficient, 
surprising even those of us who are involved daily, and this unique 
collaborative and crowdsourced approach to standardisation of clinical data is 
working well. On top of that, if you remember that more than 95% of the 
editorial and reviewer time has been volunteered, this truly has been an 
extraordinary community endeavour.
  *   Secondly, the ratio of reviewer time to design time is noteworthy: 1498 
hours of review compared to 10437 hours of design, or roughly 1 hour of review 
for every 7 hours of design. In effect, we have minimised reviewer effort by 
making each 30-minute review count for as much as possible, and we have 
achieved that through attention to detail and by spending time investigating 
and developing strong design patterns before we send archetypes out for review. 
Over the years we have made some bad design choices and had to rethink our 
approach. Gradually we have been developing some good patterns and, before you 
ask where we have documented them, I will point you to the published 
archetypes: each of them functions as a potential pattern for the next 
archetype we intend to develop, and we reference and reuse the patterns as much 
as possible. In this way our library is growing and our modelling is improving. 
As an example, a current area of serious rework is the Physical examination 
archetypes, which are being 'renovated' at present. It does make me think that 
every hour spent in design is a good investment of time and effort. That may 
not have seemed apparent in the early days, but I think we are finding that it 
is paying off for the archetypes that we are designing years later, based on 
the (good and bad) learnings from those earliest archetypes.
  *   Thirdly, we have some insights into the modelling community, and for the 
first time we have some idea of the level of activity by people in various 
roles. We also have an estimate of the size of the data library at data 
element level, so that we can compare it with similar modelling efforts 
elsewhere in the world.
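
The design-to-review ratio in the second point can be checked directly. A minimal sketch using the two hour totals quoted there:

```python
DESIGN_HOURS = 10437   # design time quoted in the post
REVIEW_HOURS = 1498    # reviewer time quoted in the post (2995 reviews x 30 min)

design_per_review_hour = DESIGN_HOURS / REVIEW_HOURS
review_share = REVIEW_HOURS / (DESIGN_HOURS + REVIEW_HOURS)

print(f"Design hours per review hour: {design_per_review_hour:.2f}")
print(f"Review share of design + review time: {review_share:.1%}")
```

That works out to about 7 design hours per review hour, with review accounting for roughly an eighth of the combined design and review effort.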

I would particularly like to thank my co-lead, Silje Ljosland Bakke, and Ian 
McNicoll for their dedicated efforts and, of course, all of the other Editors, 
Reviewers and Translators who have so generously volunteered their time and 
expertise to create a strong, free and public foundation for digital health 
data standards.

We should all be very proud of this work. This will be our legacy, and it will 
live on well after we've all retired.

Kind regards

Heather Leslie

Dr Heather Leslie
M +61 418 966 670
Skype: heatherleslie
Twitter: @atomicainfo, @clinicalmodels & @omowizard

openEHR-technical mailing list
