Re: [ccp4bb] database-assisted data archive

2010-08-23 Thread Chris Morris
The PiMS team intends that the CCP4 records link not only with the
synchrotron, but further back to crystallogenesis records in xtalPiMS,
and protein production records in PiMS. The benefits this will provide
include:
- if you find an unexpected piece of electron density, navigating to
records that show what substances were in the sample
- designing crystallogenesis screens in the light of data not only about
crystals obtained, but also about diffraction.

Paul Paukstelis rightly points out that convincing anyone to actually
use such a system is hard, even though the cost of lost work is
significant. To address this, we need to ensure that:
- data entry is as automatic as possible
- everything joins up, so that one act of data entry has multiple
payoffs.

The aim must be seamless data transfer and consistent user interfaces,
all the way from target selection to structure interpretation, delivered
in a way that is extensible as methods evolve, and which supports not
only PX but also other methods. This is a large challenge, but it is
achievable.

Andreas, in the short term I suggest you look at keeping your files in a
Subversion repository. This provides a central backup, and the repository
can easily be mapped as a folder on Linux, OS X, and Windows, because it
implements the WebDAV standard. Each project can have a sub-folder.
 
regards,
Chris

Chris Morris   
chris.mor...@stfc.ac.uk
Tel: +44 (0)1925 603689  Fax: +44 (0)1925 603634
Mobile: 07921-717915
https://www.pims-lims.org/
Daresbury Lab,  Daresbury,  Warrington,  UK,  WA4 4AD 
 
Date: Wed, 18 Aug 2010 12:19:36 +0100
From: Georgios Pelios
Subject: Re: database-assisted data archive

Dear all

At CCP4, we are currently developing the new CCP4i, which will include a
database application that will store project and job data. The database
schema has already been designed, but its design is not final and can be
modified depending on user feedback. We are now in the process of
writing the database API. Any suggestions and ideas regarding data
storage and retrieval are welcome.

George Pelios
CCP4


Re: [ccp4bb] database-assisted data archive

2010-08-19 Thread Steve Androulakis
TARDIS and MyTARDIS (for public and private data respectively) are currently in 
production at the Australian Synchrotron and have just received funding to 
expand to cover data being produced from all beamlines (not just 
macromolecular), and also all instruments at the Australian nuclear 
facility ANSTO.

One of the major drawcards of this system is that for users of the Australian 
Synchrotron there is zero barrier to entry as far as data cataloguing and 
access are concerned. Once the frames come off the beamline, their headers are 
extracted and catalogued in a database. All of this is accessible for download 
anywhere today via the web portal http://tardis.synchrotron.org.au under one's 
synchrotron user account. Information is gathered from the proposal and 
scheduling systems at the facility and fed to this MyTARDIS node, so there is 
literally nothing a user *has* to enter to have their data described and 
accounted for in the system.
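[Note added in archiving: the header-harvesting step can be pictured as follows. This is only a sketch: `extract_header` is a stand-in for a real format-specific parser, and the field names are invented, not MyTARDIS's actual schema.]

```python
import sqlite3

def extract_header(text):
    """Stand-in parser: reads KEY=VALUE pairs from a mock frame header."""
    return dict(line.split("=", 1) for line in text.splitlines() if "=" in line)

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE frames
              (filename TEXT PRIMARY KEY,
               detector TEXT, wavelength REAL, distance REAL)""")

# A minimal mock header; real beamline frames carry this information in
# format-specific binary or text headers.
mock = "DETECTOR=ADSC Q315\nWAVELENGTH=0.9795\nDISTANCE=250.0"
h = extract_header(mock)
db.execute("INSERT INTO frames VALUES (?, ?, ?, ?)",
           ("xtal1_0001.img", h["DETECTOR"],
            float(h["WAVELENGTH"]), float(h["DISTANCE"])))

row = db.execute("SELECT detector, wavelength FROM frames").fetchone()
print(row)  # ('ADSC Q315', 0.9795)
```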

Furthermore, an instance of MyTARDIS can be set up at the lab or institution to 
receive a local copy of the data and metadata. For instance, if a 
crystallographer from Melbourne University has a MyTARDIS set up in their lab, 
the MyTARDIS node at the Australian Synchrotron detects whether new data off a 
beamline is owned by this crystallographer and sends a copy of all data and its 
associated metadata for download through a local web portal - under their 
regular university login system. A sharing interface allows crystallographers 
to grant access to fellow researchers so that they can also download data and 
browse/search through metadata.

Later on, a user will be able to add datasets with results and log files to 
these catalogued raw diffraction datasets and publish them. Published data 
appears in the central index TARDIS.edu.au and contains a persistent handle for 
citation. No data is actually stored at this central index: TARDIS.edu.au 
simply provides rich metadata and download links to federated MyTARDIS nodes 
and their stored data.

There are plans to have (at least) the first diffraction image converted to JPG 
or PNG and stored/displayed by the web portal (as Andreas mentioned), as well 
as crystal quality ranking and other processing (e.g. with XDS).

As a final note, while the preferred method of data storage in TARDIS is the 
zero-effort one via synchrotrons, there's a method of manually depositing 
diffraction datasets, irrespective of date or origin. See: 
http://tardis.edu.au/deposit for more details.

A mailing list (Google Group) has just been set up for discussion of 
TARDIS/MyTARDIS. Feel free to join in to keep abreast of changes and discuss 
finer points of the solution:

http://groups.google.com/group/tardis-users


Re: [ccp4bb] database-assisted data archive

2010-08-19 Thread Georgios Pelios
Hi everyone

Thanks for your emails.

Apparently, there is a wide range of suggestions and ideas about what
(and how) can be stored in a database alongside X-ray crystallography
data. We would all like to be able to store all our research data in a
database, with as little effort as possible and in a very simple way,
and we should also be able to easily copy it to our laptops. How many
of you think this is possible?

As I mentioned in an earlier email, the new CCP4i will include a small
database. The primary function of the database will be to store data on
projects, jobs, files and users.

One of the objectives is to allow other programs (such as Coot, CCP4mg
and iMosflm) to be more integrated with the main CCP4 suite interface,
allowing data from those programs to be accessible from CCP4i and
vice versa.

The database will be tested thoroughly before it is released as part of
the new CCP4i. We welcome user feedback.

George Pelios

CCP4



Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Mark Brooks
Dear Andreas,
If you really want to do this, and want to define what the data is,
it's not _so_ difficult to do it yourself with Ruby on Rails
(http://rubyonrails.org/).

You have to know how to script a bit, and know a bit about
Model/View/Controller frameworks. http://www.youtube.com/watch?v=Gzj723LkRJY

That's not what you asked, but if you want to define what data is to be
input, you will end up unhappy with someone else's implementation.

Mark

2010/8/18 Andreas Förster 

> Dear all,
>
> going through some previous lab member's data and trying to make sense of
> it, I was wondering what kind of solutions exist to simplify the archiving and
> retrieval process.
>
> In particular, what I have in mind is a web interface that allows a user
> who has just returned from the synchrotron or the in-house detector to fill
> in a few boxes (user, name of protein, mutant, light source, quality of
> data, number of frames, status of project, etc) and then upload his data
> from the USB stick, portable hard drive or remote storage.
>
> The database application would put the data in a safe place (some file
> server that's periodically backed up) and let users browse through all the
> collected data of the lab with minimal effort later.
>
> It doesn't seem too hard to implement this, which is why I'm asking if
> anyone has done so already.
>
> Thanks.
>
>
> Andreas
>
> --
>Andreas Förster, Research Associate
>Paul Freemont & Xiaodong Zhang Labs
> Department of Biochemistry, Imperial College London
>http://www.msf.bio.ic.ac.uk
>



-- 
Skype: markabrooks


Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Ethan Merritt
On Wednesday 18 August 2010 11:25:19 am Andreas Förster wrote:
> Thanks to everyone for the good ideas and suggestions.  Let me clarify 
> what I want.  A simple system that does one task.  I'm with James Holton 
> on complexity and with several others on wikis and databases.  They're 
> simple to set up and easy to use, but no one uses them besides the one who 
> implemented them.  I've seen this with a lab wiki and a plasmid 
> database.  If the boss just approves of the project but doesn't enforce 
> usage, it won't be used.
> 
> That's why what I really want is an unavoidable system.  

Our protocol makes use of a FileMaker database (the one Juergen Bosch
mentioned earlier) that tracks all mounted crystals.  It is both handy
and, as you say you want (but be careful what you wish for), unavoidable.
Juergen was largely responsible for setting it up in the first place,
but it has remained in continuous use since then.

This works for us because the great bulk of our data collection is done
using the BluIce interface to the SSRL beamlines.  As a requirement for
data collection, users must provide a spreadsheet that indexes
each crystal and its location in the  SSRL sample cassette.  
We create this spreadsheet directly as an export from our lab database.
The database itself assigns a unique systematic directory name for each 
crystal. The spreadsheet is then used by the beamline software to screen 
and collect data from all the crystals.  
The beamline software fills in screening information as it goes,
including the cell dimensions, etc, as determined by the automated
software.  The data images for each crystal are put into a uniquely
named directory as specified in the spreadsheet. After the run,
the updated spreadsheet is merged back into our lab database and
the data images are archived keeping their systematic uniquely
determined directory names.
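[Note added in archiving: as a hedged illustration of the export step described above. The field names, the CSV layout, and the directory-naming rule are invented for the sketch, not the actual SSRL/FileMaker formats.]

```python
import csv, io

# Rows as they might come out of the lab database; field names and the
# naming convention are illustrative assumptions.
crystals = [
    {"xtal_id": "AB0041", "port": "A1", "protein": "xyl_syn"},
    {"xtal_id": "AB0042", "port": "A2", "protein": "xyl_syn"},
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["Port", "CrystalID", "Directory"])
for c in crystals:
    # The database assigns a unique systematic directory per crystal,
    # which the beamline software then uses for the collected images.
    writer.writerow([c["port"], c["xtal_id"],
                     f"{c['protein']}/{c['xtal_id']}"])

lines = buf.getvalue().splitlines()
print(lines[1])  # A1,AB0041,xyl_syn/AB0041
```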

Yes, if you work hard at it you can manage to mess up, say, the
human-interpretable meaning of the assigned systematic name.
But you cannot avoid the system altogether, because the only way
to reserve a slot for your crystal in the cassette being sent for
data collection is to enter its identifying information in the lab
database. 

There is still room to lose track of archived data at a larger scale.
Last I asked, TARDIS and the like cannot really help much with this.
If your 600 Gigabytes of archived data from 2008 are indexed as being
stored on disk XD_2008_2 in Room K407 of building HSB, it can tell
you exactly what directory on that disk corresponds to the data
from which crystal.  Unfortunately, it doesn't tell you that in fact 
that disk was moved to a room down the hall 6 months ago when the lab
was reorganized :-)

The drawbacks of this system are

- I wish I knew of an open-source Linux-compatible equivalent 
  to FileMaker.  Nothing else I have looked at offered this level of 
  easy yet controlled access via a web browser from remote locations.

- Compliance with the protocol drops to less than 100% for datasets
  collected at home rather than at a beamline.  

- One is still faced with the issue of how to deal with archiving
  terabytes of data


- Ethan



> I'm thinking of 
> an uploader that sits on the file server.  Only the uploader has write 
> permission.  The user calls the uploader because data is only backed up 
> on the file server, puts the data directory name into a box and fills in 
> a few other boxes (four or five) because otherwise the uploader won't 
> work.  The uploader interface could then be used to query the file 
> server and find datasets.  But the key is that the system MUST be used 
> to archive data - basically like flickr, but with the tag boxes 
> mandatory.  It looks like TARDIS (http://tardis.edu.au/) might have 
> such capabilities.
> 
> The discussion regarding LIMS and ISPyB and other fancy tracking systems 
> was fascinating, but I don't see those as relevant for my archiving 
> task.  For the same reason, xTrack doesn't fit my bill.  I want to bury 
> data, but not so deep that I don't find them should I ever need to.  I 
> don't care about space group or crystallization conditions or processing 
> information - the CCP4_DATABASE breaks with time anyway, either because 
> a user renamed directories or because the user's home directory has been 
> moved to /oldhome to make space for new users.  I just want to be able 
> to always find old data.
> 
> Going off on a tangent, associating a jpg of the first image (with 
> resolution rings) with each dataset is great.  Can the generation of such 
> images be automated, i.e. with a script for the whole directory tree?
> 
> All best.
> 
> 
> Andreas
> 
> 
> 
> On 18/08/2010 11:44, Eleanor Dodson wrote:
> > I would contact Johan Turkenburg here - he and Sam Hart have organised
> > the York data archive brilliantly - it is now pretty straightforward to
> > access any data back to ~ 1998 I think..
> >
> > Eleanor
> > j...@ysbl.york.ac.uk
> >
> > Andreas Förster wrote:
> >>

Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Andreas Förster
Thanks to everyone for the good ideas and suggestions.  Let me clarify 
what I want.  A simple system that does one task.  I'm with James Holton 
on complexity and with several others on wikis and databases.  They're 
simple to set up and easy to use, but no one uses them besides the one who 
implemented them.  I've seen this with a lab wiki and a plasmid 
database.  If the boss just approves of the project but doesn't enforce 
usage, it won't be used.


That's why what I really want is an unavoidable system.  I'm thinking of 
an uploader that sits on the file server.  Only the uploader has write 
permission.  The user calls the uploader because data is only backed up 
on the file server, puts the data directory name into a box and fills in 
a few other boxes (four or five) because otherwise the uploader won't 
work.  The uploader interface could then be used to query the file 
server and find datasets.  But the key is that the system MUST be used 
to archive data - basically like flickr, but with the tag boxes 
mandatory.  It looks like TARDIS (http://tardis.edu.au/) might have 
such capabilities.
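[Note added in archiving: the mandatory-tag gate described above can be sketched as follows. The tag names are invented examples, and the actual copy to the file server is elided.]

```python
# Sketch of the "unavoidable" uploader: archiving fails unless every
# mandatory tag is filled in.  Tag names are invented examples, and the
# copy-to-server step is left out.
MANDATORY = ("user", "protein", "light_source", "quality", "status")

def archive(data_dir, **tags):
    missing = [t for t in MANDATORY if not tags.get(t)]
    if missing:
        raise ValueError(f"refusing to archive {data_dir}: missing {missing}")
    # ...copy data_dir to the backed-up file server and record the tags...
    return {"directory": data_dir, **tags}

record = archive("2010-08-12_synchrotron/xtal7",
                 user="af", protein="polymerase", light_source="ID14-4",
                 quality="2.1 A", status="native data collected")
print(record["protein"])  # polymerase

try:
    archive("2010-08-12_synchrotron/xtal8", user="af")  # tags missing
except ValueError as err:
    print("rejected:", err)
```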


The discussion regarding LIMS and ISPyB and other fancy tracking systems 
was fascinating, but I don't see those as relevant for my archiving 
task.  For the same reason, xTrack doesn't fit my bill.  I want to bury 
data, but not so deep that I don't find them should I ever need to.  I 
don't care about space group or crystallization conditions or processing 
information - the CCP4_DATABASE breaks with time anyway, either because 
a user renamed directories or because the user's home directory has been 
moved to /oldhome to make space for new users.  I just want to be able 
to always find old data.


Going off on a tangent, associating a jpg of the first image (with 
resolution rings) with each dataset is great.  Can the generation of such 
images be automated, i.e. with a script for the whole directory tree?
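[Note added in archiving: the directory walk itself is easy to automate; a minimal sketch follows. The actual frame-to-JPG conversion needs an external, installation-dependent converter, so that call is deliberately left as a placeholder here, and the file extensions are assumptions.]

```python
import os, tempfile

# Assumed frame extensions; adjust for your detectors.
FRAME_EXTS = (".img", ".cbf", ".mccd", ".osc")

def first_frames(root):
    """Yield the alphabetically first diffraction frame in each directory."""
    for dirpath, _dirs, files in os.walk(root):
        frames = sorted(f for f in files if f.lower().endswith(FRAME_EXTS))
        if frames:
            yield os.path.join(dirpath, frames[0])

# Demo on a throwaway tree.  In real use, point root at the archive and
# pass each yielded path to your frame-to-JPG converter of choice.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "xtal1"))
for name in ("frame_0002.img", "frame_0001.img", "notes.txt"):
    open(os.path.join(root, "xtal1", name), "w").close()

picked = [os.path.basename(p) for p in first_frames(root)]
print(picked)  # ['frame_0001.img']
```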


All best.


Andreas



On 18/08/2010 11:44, Eleanor Dodson wrote:

I would contact Johan Turkenburg here - he and Sam Hart have organised
the York data archive brilliantly - it is now pretty straightforward to
access any data back to ~ 1998 I think..

Eleanor
j...@ysbl.york.ac.uk

Andreas Förster wrote:

Dear all,

going through some previous lab member's data and trying to make sense
of it, I was wondering what kind of solutions exist to simplify the
archiving and retrieval process.

In particular, what I have in mind is a web interface that allows a
user who has just returned from the synchrotron or the in-house
detector to fill in a few boxes (user, name of protein, mutant, light
source, quality of data, number of frames, status of project, etc) and
then upload his data from the USB stick, portable hard drive or remote
storage.

The database application would put the data in a safe place (some file
server that's periodically backed up) and let users browse through all
the collected data of the lab with minimal effort later.

It doesn't seem too hard to implement this, which is why I'm asking if
anyone has done so already.

Thanks.


Andreas






--
Andreas Förster, Research Associate
Paul Freemont & Xiaodong Zhang Labs
Department of Biochemistry, Imperial College London
http://www.msf.bio.ic.ac.uk


Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Berry, Ian
What about XTrack?
http://xray.bmc.uu.se/xtrack/



-Original Message-
From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of James 
Holton
Sent: 18 August 2010 16:54
To: CCP4BB@JISCMAIL.AC.UK
Subject: Re: [ccp4bb] database-assisted data archive

There is an image archiving system called TARDIS (http://tardis.edu.au/) 
that sounds more-or-less exactly like what you describe. 

I agree that it would be "nice" if you can get your synchrotron to do it 
for you, but since every single beamline and home-source setup in the 
world has already been providing you with a "database" that is more 
commonly called the "image header", I don't think it is too hard to 
imagine how accurate the data in your "database" is going to be.

If I may interject my two cents, I have found that when a user is asked 
to fill out a form, compliance is inversely proportional to the number 
of fields on the form.  But far more important than that: if you ask 
them to answer a question that they simply don't know the answer to, 
they will likely skip the whole thing.  An excellent example (I think) 
is asking for the space group BEFORE they have even taken their first 
snapshot of a brand new crystal.  This datum is simply not known until 
AFTER the structure is solved!  For example, is it P41 or P43?  You 
don't "really" know that until after you see a helix in the map.  What 
is the molecular weight?  That depends on whether or not it is a 
complex. (if I had a nickel for every user who was certain they had a 
protein-DNA complex with a "very low solvent content", I would be quite 
rich).

All that said, I don't think it is unreasonable to expect an image 
header (or any other database) to contain motor positions, detector 
type, wavelength, beam center etc. Clearly this is not always the case, 
and this problem still needs a lot of work, but my point is that we 
should try to write down things that we "really know" (observations) and 
not try to muddle the database with derived quantities (interpretations).

When it comes to what you "really know" about the sample, all you can 
realistically hope to be sure of is the list of chemicals that went into 
the drop: macromolecule sequence, salts, PEGs, and their respective 
concentrations.  Sometimes you don't even know that! (i.e. proteolysis).  
However, the macromolecule sequence is INCREDIBLY useful for deriving 
(or at least guessing) a great many other things (such as the molecular 
weight, solvent content, number of heavy atom sites).  The list of salts 
is also absolutely critical for doing radiation damage predictions. 

So, as my rant comes to an end, I would strongly suggest focusing on 
trying to capture the important things that we actually do know, rather 
than confusing our poor users further by asking them to write down a lot 
of things that they don't.

-James Holton
MAD Scientist

Andreas Förster wrote:
> Dear all,
>
> going through some previous lab member's data and trying to make sense 
> of it, I was wondering what kind of solutions exist to simplify the 
> archiving and retrieval process.
>
> In particular, what I have in mind is a web interface that allows a 
> user who has just returned from the synchrotron or the in-house 
> detector to fill in a few boxes (user, name of protein, mutant, light 
> source, quality of data, number of frames, status of project, etc) and 
> then upload his data from the USB stick, portable hard drive or remote 
> storage.
>
> The database application would put the data in a safe place (some file 
> server that's periodically backed up) and let users browse through all 
> the collected data of the lab with minimal effort later.
>
> It doesn't seem too hard to implement this, which is why I'm asking if 
> anyone has done so already.
>
> Thanks.
>
>
> Andreas
>
Evotec (UK) Ltd is a limited company registered in England and Wales. 
Registration number:2674265. Registered office: 114 Milton Park, Abingdon, 
Oxfordshire, OX14 4SA, United Kingdom.


Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread James Holton
There is an image archiving system called TARDIS (http://tardis.edu.au/) 
that sounds more-or-less exactly like what you describe. 

I agree that it would be "nice" if you can get your synchrotron to do it 
for you, but since every single beamline and home-source setup in the 
world has already been providing you with a "database" that is more 
commonly called the "image header", I don't think it is too hard to 
imagine how accurate the data in your "database" is going to be.


If I may interject my two cents, I have found that when a user is asked 
to fill out a form, compliance is inversely proportional to the number 
of fields on the form.  But far more important than that: if you ask 
them to answer a question that they simply don't know the answer to, 
they will likely skip the whole thing.  An excellent example (I think) 
is asking for the space group BEFORE they have even taken their first 
snapshot of a brand new crystal.  This datum is simply not known until 
AFTER the structure is solved!  For example, is it P41 or P43?  You 
don't "really" know that until after you see a helix in the map.  What 
is the molecular weight?  That depends on whether or not it is a 
complex. (if I had a nickel for every user who was certain they had a 
protein-DNA complex with a "very low solvent content", I would be quite 
rich).


All that said, I don't think it is unreasonable to expect an image 
header (or any other database) to contain motor positions, detector 
type, wavelength, beam center etc. Clearly this is not always the case, 
and this problem still needs a lot of work, but my point is that we 
should try to write down things that we "really know" (observations) and 
not try to muddle the database with derived quantities (interpretations).


When it comes to what you "really know" about the sample, all you can 
realistically hope to be sure of is the list of chemicals that went into 
the drop: macromolecule sequence, salts, PEGs, and their respective 
concentrations.  Sometimes you don't even know that! (i.e. proteolysis).  
However, the macromolecule sequence is INCREDIBLY useful for deriving 
(or at least guessing) a great many other things (such as the molecular 
weight, solvent content, number of heavy atom sites).  The list of salts 
is also absolutely critical for doing radiation damage predictions. 

So, as my rant comes to an end, I would strongly suggest focusing on 
trying to capture the important things that we actually do know, rather 
than confusing our poor users further by asking them to write down a lot 
of things that they don't.


-James Holton
MAD Scientist

Andreas Förster wrote:

Dear all,

going through some previous lab member's data and trying to make sense 
of it, I was wondering what kind of solutions exist to simplify the 
archiving and retrieval process.


In particular, what I have in mind is a web interface that allows a 
user who has just returned from the synchrotron or the in-house 
detector to fill in a few boxes (user, name of protein, mutant, light 
source, quality of data, number of frames, status of project, etc) and 
then upload his data from the USB stick, portable hard drive or remote 
storage.


The database application would put the data in a safe place (some file 
server that's periodically backed up) and let users browse through all 
the collected data of the lab with minimal effort later.


It doesn't seem too hard to implement this, which is why I'm asking if 
anyone has done so already.


Thanks.


Andreas



Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Matthew BOWLER

Dear All,
   I would just like to add to Enrico's mention of ISPyB.  This LIMS 
system will log all the data you collect at the beamline (experimental 
parameters, screening images, data sets, edge scans, XRF spectra, 
crystal snapshots, etc.) automatically, and everything is stored 
indefinitely.  Your colleagues can also follow data collections in real 
time by logging on from their home labs.  In addition, you can upload 
large amounts of information on your samples (acronym, space group, pin 
barcode, etc.) to the database, which can be recovered at the beamline 
through MXCuBE and the sample changer, tying all data collections to 
this information.  You can also track your dewars to and from the ESRF 
using it, even receiving an email when a dewar reaches the beamline.  
It has recently delved into the world of data analysis, as you can rank 
crystals against each other using a number of criteria.  For those not 
in an exclusive relationship with the ESRF, you will be glad to hear it 
is also available at Diamond and, I believe, will be at PETRA III.


Cheers, Matt


Some links:

ISPyB: 
http://www.esrf.eu/UsersAndScience/Experiments/MX/How_to_use_our_beamlines/ISPYB


Sample tracking: 
http://www.esrf.eu/UsersAndScience/Experiments/MX/How_to_use_our_beamlines/ISPYB/ispyb-dewar-tracking


Ranking:  
http://www.esrf.eu/UsersAndScience/Experiments/MX/How_to_use_our_beamlines/ISPYB/ispyb-sample-ranking





Enrico Stura wrote:
Knowing where all the important files are is really all that is 
needed. Sophistication can come later.

I would welcome a CCP4 database-assisted data archive system.

Here is my contribution to the discussion:

I agree with Paul Paukstelis that getting users to use any 
database-assisted data archive system is the biggest obstacle. I have 
had problems with compliance with my system, where all the student has 
to do is provide file and directory names each Friday to keep the 
database up to date.


It is a simple HTML-based access system in which, through hyperlinks, 
one can access the data wherever it is stored. Users need only provide 
the directory names of where the various pieces of data are stored 
within the accessible network, and the data manager (any HTML-competent 
individual) can then set up the links to the main control platform (the 
start-up HTML page).
The advantage of such a system is that it is platform independent and 
needs only a well-configured browser.

It is backward compatible with any old data.

George Pelios may want to consider an automated system where Mosflm, 
Scala and all subsequent programs contribute to creating and updating a 
raw-data retrieval file on the basis of the files they have used. When 
the project is finished, a backup program should be able to retrieve 
all such files to be stored in a consolidated manner for transfer to a 
long-term storage server.


A brief description of the system I use for synchrotron data collection:

Prior to the synchrotron trip, each sample taken to the synchrotron is 
entered in a table that represents its position in the puck, with 
hyperlinks to a file describing its position in the crystallization 
tray (this file will have hyperlinks to crystallization and all prior 
preparation steps).
As data is collected, a short comment is added (resolution and number 
of frames are included if data has been collected); as the data is 
transferred to the home lab, a link to the directory where the data is 
stored is then added.
To give an idea of data quality, Mosflm and a GIMP screen capture are 
used to create a JPG of the first data image (with the frame filename 
added), which is stored in the same directory as the raw data frames. 
This image is accessed when clicking on the comment.
Compliance with the system can be checked by clicking on comments other 
than "not tested".

It is all manual but not very time-consuming once the initial HTML 
templates have been set up. Still, I am looking forward to a simple 
CCP4-designed system that can do something similar automatically.

I would also recommend looking at ISPyB, implemented at the ESRF, 
which is also web based:

www.esrf.eu/UsersAndScience/Experiments/MX/Software/ispyb

Enrico.



--
Matthew Bowler
Structural Biology Group
European Synchrotron Radiation Facility
B.P. 220, 6 rue Jules Horowitz
F-38043 GRENOBLE CEDEX
FRANCE
===
Tel: +33 (0) 4.76.88.29.28
Fax: +33 (0) 4.76.88.29.04

http://www.esrf.fr/UsersAndScience/Experiments/MX/
=== 


Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Enrico Stura
Knowing where all the important files are is really all that is needed.  
Sophistication can come later.

I would welcome a CCP4 database-assisted data archive system.

Here is my contribution to the discussion:

I agree with Paul Paukstelis that getting users to use any  
database-assisted data archive system is the biggest obstacle. I have  
had problems with compliance with my system, where all the student has  
to do is provide file and directory names each Friday to keep the  
database up to date.


It is a simple HTML-based access system in which, through hyperlinks,  
one can access the data wherever it is stored. Users need only provide  
the directory names of where the various pieces of data are stored  
within the accessible network, and the data manager (any HTML-competent  
individual) can then set up the links to the main control platform (the  
start-up HTML page).
The advantage of such a system is that it is platform independent and  
needs only a well-configured browser.

It is backward compatible with any old data.
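[Note added in archiving: the start-up page can be regenerated from the directory list; a minimal sketch follows. The comment labels and paths are invented examples, not Enrico's actual templates.]

```python
import html

# Invented example entries: a human-readable comment mapped to the
# directory where that dataset lives on the lab network.
datasets = {
    "xtal 41, 1.8 A, 360 frames": "/data/2010/esrf_aug/xtal41",
    "xtal 42, not tested":        "/data/2010/esrf_aug/xtal42",
}

items = "\n".join(
    f'<li><a href="file://{html.escape(path)}">{html.escape(label)}</a></li>'
    for label, path in datasets.items())
page = f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>"

# Any well-configured browser can render the result; compliance shows
# up as comments other than "not tested".
print(page.count("<li>"))  # 2
```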

George Pelios may want to consider an automated system where Mosflm,  
Scala and all subsequent programs contribute to creating and updating a  
raw-data retrieval file on the basis of the files they have used. When  
the project is finished, a backup program should be able to retrieve  
all such files to be stored in a consolidated manner for transfer to a  
long-term storage server.


A brief description of the system I use for synchrotron data collection:

Prior to the synchrotron trip, each sample taken to the synchrotron is  
entered in a table that represents its position in the puck, with  
hyperlinks to a file describing its position in the crystallization  
tray (this file will have hyperlinks to crystallization and all prior  
preparation steps).
As data is collected, a short comment is added (resolution and number  
of frames are included if data has been collected); as the data is  
transferred to the home lab, a link to the directory where the data is  
stored is then added.
To give an idea of data quality, Mosflm and a GIMP screen capture are  
used to create a JPG of the first data image (with the frame filename  
added), which is stored in the same directory as the raw data frames.  
This image is accessed when clicking on the comment.
Compliance with the system can be checked by clicking on comments other  
than "not tested".

It is all manual but not very time-consuming once the initial HTML  
templates have been set up. Still, I am looking forward to a simple  
CCP4-designed system that can do something similar automatically.

I would also recommend looking at ISPyB, implemented at the ESRF,  
which is also web based:

www.esrf.eu/UsersAndScience/Experiments/MX/Software/ispyb

Enrico.

--
Enrico A. Stura D.Phil. (Oxon) ,Tel: 33 (0)1 69 08 4302 Office
Room 19, Bat.152,   Tel: 33 (0)1 69 08 9449Lab
LTMB, SIMOPRO, IBiTec-S, CE Saclay, 91191 Gif-sur-Yvette,   FRANCE
http://www-dsv.cea.fr/en/institutes/institute-of-biology-and-technology-saclay-ibitec-s/unites-de-recherche/department-of-molecular-engineering-of-proteins-simopro/molecular-toxinology-and-biotechnology-laboratory-ltmb/crystallogenesis-e.-stura
http://www.chem.gla.ac.uk/protein/mirror/stura/index2.html
e-mail: est...@cea.fr Fax: 33 (0)1 69 08 90 71


Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Georgios Pelios
Dear all

At CCP4, we are currently developing the new CCP4i, which will include a 
database application that will store project and job data. The database 
schema has already been designed, but its design is not final and can be 
modified depending on user feedback. We are now in the process of writing 
the database API. Any suggestions and ideas regarding data storage and 
retrieval are welcome.

George Pelios
CCP4



-Original Message-
From: CCP4 bulletin board [mailto:ccp...@jiscmail.ac.uk] On Behalf Of Andreas 
Förster
Sent: 18 August 2010 10:53
To: CCP4BB@JISCMAIL.AC.UK
Subject: [ccp4bb] database-assisted data archive

Dear all,

going through a previous lab member's data and trying to make sense 
of it, I was wondering what kind of solutions exist to simplify the 
archiving and retrieval process.

In particular, what I have in mind is a web interface that allows a user 
who has just returned from the synchrotron or the in-house detector to 
fill in a few boxes (user, name of protein, mutant, light source, 
quality of data, number of frames, status of project, etc) and then 
upload his data from the USB stick, portable hard drive or remote storage.

The database application would put the data in a safe place (some file 
server that's periodically backed up) and let users browse through all 
the collected data of the lab with minimal effort later.
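The core of what is being proposed (capture a few form fields, copy the frames to a backed-up server, record where they went) can be sketched in a few lines. The field names, directory convention, and database layout here are all assumptions, not an existing tool.

```python
import shutil
import sqlite3
import tempfile
from pathlib import Path

# In-memory metadata store for the sketch; the form fields map to columns.
DB = sqlite3.connect(":memory:")
DB.execute("""CREATE TABLE dataset (
    id       INTEGER PRIMARY KEY,
    user     TEXT, protein TEXT, mutant TEXT,
    source   TEXT,      -- synchrotron beamline or 'in-house'
    n_frames INTEGER,
    quality  TEXT,
    location TEXT       -- where the frames ended up on the file server
)""")

def archive_dataset(src: Path, dest_root: Path, **meta) -> Path:
    """Copy frames from removable media to the file server and
    record the form fields, returning the archived location."""
    dest = dest_root / f"{meta['user']}_{meta['protein']}_{src.name}"
    shutil.copytree(src, dest)
    DB.execute(
        "INSERT INTO dataset (user, protein, mutant, source, n_frames,"
        " quality, location) VALUES (:user, :protein, :mutant, :source,"
        " :n_frames, :quality, :location)",
        {**meta, "location": str(dest)},
    )
    return dest

# Demo: fake a USB stick holding two frames, then archive it
usb = Path(tempfile.mkdtemp()) / "collect01"
usb.mkdir()
for i in (1, 2):
    (usb / f"frame_{i:04d}.img").write_bytes(b"\0")
server = Path(tempfile.mkdtemp())
dest = archive_dataset(usb, server, user="af", protein="XyzA", mutant="wt",
                       source="synchrotron", n_frames=2, quality="2.0 A")
```

Browsing later is then a single SQL query over the `dataset` table, with `location` pointing back at the frames.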

It doesn't seem too hard to implement this, which is why I'm asking 
whether anyone has done so already.

Thanks.


Andreas

-- 
 Andreas Förster, Research Associate
 Paul Freemont & Xiaodong Zhang Labs
Department of Biochemistry, Imperial College London
 http://www.msf.bio.ic.ac.uk


Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Paul Paukstelis
I did something like that for plasmids by putting together a web 
interface with PHP and MySQL. It was simple, maybe a little ugly, but 
it worked nicely. The problem was that convincing anyone to actually 
use it was virtually impossible.


--paul

On 08/18/2010 04:52 AM, Andreas Förster wrote:

[...]



--
Paul Paukstelis, Ph.D
Assistant Professor
University of Maryland
Chemistry & Biochemistry Dept.
Center for Biomolecular Structure & Organization
pauks...@umd.edu
301-405-9933


Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Eleanor Dodson
I would contact Johan Turkenburg here - he and Sam Hart have organised 
the York data archive brilliantly - it is now pretty straightforward to 
access any data back to ~1998, I think.


 Eleanor
j...@ysbl.york.ac.uk

Andreas Förster wrote:

[...]



Re: [ccp4bb] database-assisted data archive

2010-08-18 Thread Jürgen Bosch
Do you want the frames to be accessible too?
If not, then a wiki would be an easy solution.
Alternatively, a FileMaker database would do the trick too.

Jürgen 

..
Jürgen Bosch
Johns Hopkins Bloomberg School of Public Health
Department of Biochemistry & Molecular Biology
Johns Hopkins Malaria Research Institute
615 North Wolfe Street, W8708
Baltimore, MD 21205
Phone: +1-410-614-4742
Lab:  +1-410-614-4894
Fax:  +1-410-955-3655
http://web.mac.com/bosch_lab/

On Aug 18, 2010, at 5:52, Andreas Förster  wrote:

> [...]


[ccp4bb] database-assisted data archive

2010-08-18 Thread Andreas Förster

Dear all,

going through a previous lab member's data and trying to make sense 
of it, I was wondering what kind of solutions exist to simplify the 
archiving and retrieval process.


In particular, what I have in mind is a web interface that allows a user 
who has just returned from the synchrotron or the in-house detector to 
fill in a few boxes (user, name of protein, mutant, light source, 
quality of data, number of frames, status of project, etc) and then 
upload his data from the USB stick, portable hard drive or remote storage.


The database application would put the data in a safe place (some file 
server that's periodically backed up) and let users browse through all 
the collected data of the lab with minimal effort later.


It doesn't seem too hard to implement this, which is why I'm asking 
whether anyone has done so already.


Thanks.


Andreas

--
Andreas Förster, Research Associate
Paul Freemont & Xiaodong Zhang Labs
Department of Biochemistry, Imperial College London
http://www.msf.bio.ic.ac.uk