Dear Tim

Short answer, forensics & chain of trust: $wonder-drug binding to 
$interesting-target is published and carried downstream; some time later it 
is discovered that the binding mode in vivo differs from the published 
structure, and you want to be able to verify (or otherwise) all of the steps 
that were taken to arrive at that structure. For this you need the original 
data. You also need other things, but without the original diffraction images 
all you have is an easily faked table of numbers.
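To make the chain of trust concrete: one way to tie a published structure to its raw data is to quote a single cryptographic digest of the diffraction images. Anyone who later obtains the images can then confirm they are the same frames. What follows is a minimal Python sketch, not an established community convention. The helper names `file_sha256` and `dataset_fingerprint`, the choice of SHA-256, and the default `*.cbf` file pattern are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of one file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def dataset_fingerprint(image_dir: str, pattern: str = "*.cbf") -> str:
    """Combine per-image digests into one SHA-256 fingerprint for a dataset.

    Hypothetical helper; the default pattern is only an example. Files are
    ordered by relative path so the result does not depend on the order in
    which the filesystem happens to list them.
    """
    root = Path(image_dir)
    images = sorted(
        (p for p in root.rglob(pattern) if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    combined = hashlib.sha256()
    for path in images:
        rel = path.relative_to(root).as_posix()
        # Hash each file name together with its content digest, so renaming
        # or swapping frames also changes the fingerprint.
        combined.update(f"{rel}\0{file_sha256(path)}\n".encode("utf-8"))
    return combined.hexdigest()
```

A call such as `dataset_fingerprint("/path/to/images")` (hypothetical path) returns a 64-character hex string that could be quoted alongside the deposited structure. Note that a digest only shows the images have not changed since it was computed. It says nothing about whether they were genuine to begin with. That is where the difficulty of faking realistic detector artefacts comes in.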

I’m not saying this happens frequently, but there have been cases where it 
has. Making the raw data available is a useful check, as properly simulating 
raw images *including detector artefacts* is hard.

One opinion, clearly others are equally valid.

Another comment I will make is that people are perfectly happy to pay large 
sums for lab equipment & consumables. Surely storing the data that are the 
basis of your science is just another consumable? You could draw a parallel 
with buying screens: you test all of the conditions in case some work. Here 
we’re talking about storing all of your data in case you need *some* later. As 
with crystallisation conditions, you don’t usually know a priori which you need.

Cheerio Graeme

On 23 Oct 2015, at 10:16, Tim Gruene 
<tim.gru...@psi.ch> wrote:

Dear all,

I have wondered whether central long-term storage of diffraction images is
really worth the effort (and disk space). What fraction of such data will ever
be looked at again once the respective project has been published?
Even if some revolutionary new technology were developed, I guess it
would mostly be applied to current rather than old projects.
Given the substantial energy consumption of long-term storage (including DVDs
and tape, as these have to be produced), the overall benefit might be greater
if old data were deleted at some point, saving energy and effort for more
current things.

I have been through a few disk crashes. Often I was annoyed because I had to
set up a new computer, and sometimes I could not recover data I would have
liked to keep. But in fact it often cleaned up my computer, and life went on
even without access to whatever got lost.

So what is the scientific argument behind long-term storage of diffraction
images other than academic interest in re-processing the data? As mentioned
above, I guess that the benefit of re-processing the data may only be minor
and effort might be better spent on concurrent projects.

Best wishes,
Tim

On Wednesday, October 21, 2015 06:03:21 PM Allister Crow wrote:
On the last point about storing diffraction images, I wonder what the
community's opinion is of uploading images to the Zenodo archive for
safe-keeping and sharing?

The Zenodo project is being run by the folks at CERN, and is EU funded to
support scientific data sharing (http://zenodo.org).

Until the PDB does this, perhaps this is one of the better ways through
which we can ensure preservation (or at least another backup) of our most
important diffraction images?

- Ally

ps I should also say that I originally learned of Zenodo from Graeme Winter
at Diamond.

-----------------
Allister Crow
Department of Pathology
University of Cambridge
Google Scholar Profile <http://bit.ly/11ga7Sq>
Research Gate Profile <http://bit.ly/137Ytt4>
Departmental Page <http://www.path.cam.ac.uk/directory/allister-crow>

On 21 Oct 2015, at 17:03, William G. Scott 
<wgsc...@ucsc.edu> wrote:

Dear CCP4 Citizenry:

I’m worried about medium- to long-term data storage and integrity.  At the
moment, our lab uses mostly HFS+ formatted filesystems on our disks,
which is the OS X default.  HFS+ always struck me as somewhat fragile,
and resource forks at best are a (seemingly needless) headache, at least
as far as crystallography datasets go.  (True, you can do HFS-compression
and losslessly shrink your images by a factor of 2, or shrink your ccp4
installation, but these are fairly minor advantages).

I read the CCP4 wiki page
http://strucbio.biologie.uni-konstanz.de/ccp4wiki/index.php/Filesystems
that summarizes some of the other options. From what I have read, there
and elsewhere, it seems like zfs and btrfs might be significantly better
alternatives to HFS+, but I really would like to get a sense of what
others have experienced with these, or other, equally or more robust
options. I don’t feel like I know enough to critically evaluate the
information.

Anyone know what the NSA uses?

I recently created a de novo backup of some personal data on an external
HFS+ drive (photos, movies, music, etc).  I was very unpleasantly
surprised to find several files had been silently corrupted.  (In the
case of a movie file, for example, the file would play but could not be
copied. In another case, a music file would not copy, yet it had
identical md5sum and sha1 checksums when compared to an uncorrupted
redundant backup I had.  I’m still puzzled by this, but it suggests the
resource fork might be the source of the corruption, and, more worrisome
still, that conventional checksums aren’t detecting some silently
corrupted data, so I am not even sure if zfs self-healing would be the
answer.)
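A checksum manifest is one way to detect silent corruption. You record a digest for every file once, store that record somewhere other than the data disk, and re-check it periodically. Below is a minimal Python sketch of that idea, assuming SHA-256 via `hashlib` in place of md5/sha1. The function names and the JSON manifest format are hypothetical, and this is not a feature of any particular filesystem. Like `md5sum`, it reads only a file’s ordinary contents (on HFS+, the data fork). It therefore would not see damage confined to a resource fork or extended attributes, which may be relevant to the puzzle above.

```python
from __future__ import annotations  # allows dict[str, str] hints on older Python 3

import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file's ordinary contents (the data fork on HFS+)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Map every regular file under root (relative POSIX path) to its digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_of(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def save_manifest(manifest: dict[str, str], path: str | Path) -> None:
    """Write the manifest as JSON; keep it outside the data directory."""
    Path(path).write_text(json.dumps(manifest, indent=1, sort_keys=True))


def load_manifest(path: str | Path) -> dict[str, str]:
    return json.loads(Path(path).read_text())


def verify(root: str, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files currently under root with a previously saved manifest."""
    current = build_manifest(root)
    return {
        # Present in both, but the content no longer matches: possible corruption.
        "changed": sorted(
            k for k in manifest.keys() & current.keys() if manifest[k] != current[k]
        ),
        "missing": sorted(manifest.keys() - current.keys()),
        "added": sorted(current.keys() - manifest.keys()),
    }
```

This only detects damage; it cannot repair it. Recovery still needs a second good copy, or a self-healing filesystem such as ZFS configured with redundancy.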

Since we as a community are now encouraging primary X-ray diffraction
images to be stored, I can only imagine the problem could be ubiquitous,
and a discussion might be worth having.  (I apologize if this has been
addressed previously; I did search the archive.)

All the best,

Bill



William G. Scott
Director, Program in Biochemistry and Molecular Biology
Professor, Department of Chemistry and Biochemistry
and The Center for the Molecular Biology of RNA
University of California at Santa Cruz
Santa Cruz, California 95064
USA

--
Paul Scherrer Institut
Dr. Tim Gruene
- persoenlich -
OFLC/102
CH-5232 Villigen PSI
phone: +41 (0)56 310 5297

GPG Key ID = A46BEE1A

