Hi Carl,
I am not really a user, but I know the code behind those features.
As you know, the TCH graphs store the change records that make up each
working copy/workflow. This graph-based approach means that anyone (with
admin permissions) can manipulate these graphs using SPARQL or scripts
such as SWP. In SPARQL, someone could potentially issue an UPDATE
request to delete all change records related to certain workflows.
Assuming the .tch graph is the default graph, one low-level maintenance
operation would be
DELETE {
  ?change ?p ?o .
  ?triple ?tp ?to .
}
WHERE {
  ?change a teamwork:Change .
  ?change teamwork:tag ?tag .
  ?change teamwork:added|teamwork:deleted ?triple .
  ?change ?p ?o .
  ?triple ?tp ?to .
}
which would delete all workflow-related teamwork:Change entries plus
their linked triples (which make up the bulk of the data). This keeps the
label and comment of the workflows themselves (instances of teamwork:Tag),
and you may want to get rid of those too, e.g.
DELETE {
  ?tag ?p ?o .
}
WHERE {
  ?tag a teamwork:Tag .
  ?tag teamwork:status teamwork:Committed .
  ?tag ?p ?o .
}
to delete all committed workflows completely. This should significantly
reduce the size of the TCH graphs.
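Before running the first DELETE, it may help to see how much data it would touch. Here is a sketch of a read-only SELECT that reuses the same patterns to count the workflow-related change records and their linked triples. The original queries omit PREFIX declarations, so the teamwork: namespace URI below is an assumption; check it against the prefixes declared in your TCH graph.

```sparql
# Assumed namespace URI for teamwork:; verify it against your TCH graph.
PREFIX teamwork: <http://topbraid.org/teamwork#>

# Read-only preview: counts what the first DELETE above would match.
SELECT (COUNT(DISTINCT ?change) AS ?changeCount)
       (COUNT(DISTINCT ?triple) AS ?tripleCount)
WHERE {
  ?change a teamwork:Change .
  ?change teamwork:tag ?tag .
  ?change teamwork:added|teamwork:deleted ?triple .
}
```

Changes without any added or deleted triples are not counted, mirroring the DELETE, whose WHERE clause does not match them either.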
In any of these queries you can add a FILTER, e.g. to restrict them to
workflows committed before a certain date:
WHERE {
  ?tag a teamwork:Tag .
  ...
  # ?statusChange is kept distinct from ?change in the first query
  ?tag teamwork:statusChange ?statusChange .
  ?statusChange teamwork:newStatus teamwork:Committed .
  ?statusChange dcterms:created ?date .
  FILTER (?date < "2019-01-01"^^xsd:date)
}
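As a sketch of how the pieces fit together, here is the first DELETE with the date restriction merged in, so that only change records belonging to workflows committed before 2019-01-01 are removed. The status-change resource gets its own variable, ?statusChange, so it cannot collide with ?change. The dcterms: and xsd: URIs are the standard ones; the teamwork: URI is an assumption. If dcterms:created holds xsd:dateTime values rather than xsd:date, the comparison may need a dateTime literal such as "2019-01-01T00:00:00"^^xsd:dateTime, so check your data first.

```sparql
# teamwork: namespace URI is an assumption; dcterms: and xsd: are standard.
PREFIX teamwork: <http://topbraid.org/teamwork#>
PREFIX dcterms:  <http://purl.org/dc/terms/>
PREFIX xsd:      <http://www.w3.org/2001/XMLSchema#>

DELETE {
  ?change ?p ?o .
  ?triple ?tp ?to .
}
WHERE {
  # Only workflows whose status changed to Committed before the cutoff.
  ?tag a teamwork:Tag .
  ?tag teamwork:statusChange ?statusChange .
  ?statusChange teamwork:newStatus teamwork:Committed .
  ?statusChange dcterms:created ?date .
  FILTER (?date < "2019-01-01"^^xsd:date)
  # The change records of those workflows and the triples they reference.
  ?change a teamwork:Change .
  ?change teamwork:tag ?tag .
  ?change teamwork:added|teamwork:deleted ?triple .
  ?change ?p ?o .
  ?triple ?tp ?to .
}
```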
I have also shown the SPARQL as a way to illustrate how workflows are
represented in the RDF data model.
When you open teamwork.ui.ttlx in TBC and look at
teamwork:ArchiveChangesToFile, you will also see a script that can be called
from the outside and probably does some of what you are interested in.
If you need a variation of this, create a clone and call that.
In any case, before making any such calls, try them in a safe
environment, e.g. in TBC-ME, not on the actual data!
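In that spirit, a read-only query can list which committed workflows the second DELETE would remove, together with their commit dates, before anything is deleted. This is a sketch reusing the patterns above; the teamwork: namespace URI is an assumption, and so is rdfs:label as the property holding a workflow's label.

```sparql
# teamwork: namespace URI and the use of rdfs:label are assumptions.
PREFIX teamwork: <http://topbraid.org/teamwork#>
PREFIX dcterms:  <http://purl.org/dc/terms/>
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>

# Read-only preview: committed workflows and when they were committed.
SELECT ?tag ?label ?date
WHERE {
  ?tag a teamwork:Tag .
  ?tag teamwork:status teamwork:Committed .
  OPTIONAL { ?tag rdfs:label ?label }
  OPTIONAL {
    ?tag teamwork:statusChange ?statusChange .
    ?statusChange teamwork:newStatus teamwork:Committed .
    ?statusChange dcterms:created ?date .
  }
}
ORDER BY ?date
```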
Just activating the "Archive Working Copies on Commit" option will not
retroactively archive existing committed workflows.
Holger
On 10/09/2019 06:46, [email protected] wrote:
I hope anyone reading this can share their experiences with managing
similar issues...
We have several enterprise EDG projects (taxonomies and ontologies)
whose ".tch" change history graphs are getting very large, since they
have been actively edited for several years now. Separately, we have a
large number (1000+ workflows per project, for two of our projects) of
committed but un-archived workflows. These two conditions seem to be
slowing down performance in several areas:
* Loading the list of workflows
* Loading the change history for an individual resource (this is
painfully slow now)
* Sending these projects to another server (if we do choose to also
send the .tch files)
* Updating the projects available to Explorer users. (EDG seems to
automatically send the .tch file when you do this, but if it were
up to me I would choose not to, since the change history is not
relevant to our Explorer users.)
I am considering different methods of archiving working copies and
reducing the size of our change history graphs, including:
* Activating the "Archive Working Copies on Commit" option... but I
have several questions around what this entails:
o Will doing this automatically archive all our
already-committed workflows? If not, is there a way to do that
in a batch? (I can't find one.)
o Will doing this reduce the size of the .tch graph in any way?
Or does it just remove the workflows from the UI?
* Manually editing the .tch graphs to remove all changes of
particular types, or made before a certain date.
o Is there any easy way of doing this? (I can only think of
exporting the graphs, editing them in TBC or a text editor,
and then re-importing them.)
It would be nice if there were a built-in setting to "only keep the
last x years of changes in the change history" or something similar.
It would also be nice if there were a way to "keep forever" only a
particular type of change data: who made which edits /to a project/ and
when. Over the long term, other types of change data (e.g. details
/about workflows/ and which changes they contained) are not useful to
us and could be deleted after a certain period (say, a year).
For example, suppose I start a new workflow for a taxonomy, add 30
altLabels to concepts, send the workflow on to a colleague for review,
and then commit the changes to production. The only data I'd want to
know /forever/ by consulting the change history is the date, time, and
creator of those 30 edits to altLabels -- not the fact that they were
part of a certain workflow, or the facts that the workflow was
created/transitioned/committed/archived by certain people at certain
dates and times.
Thanks for any insight that either TQ users or team members can provide.
-Carl
--
You received this message because you are subscribed to the Google
Groups "TopBraid Suite Users" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/topbraid-users/56722c17-4adc-4816-808f-c1f5869f61e5%40googlegroups.com.