Re: BMRB NMR-STAR v3.1 file format or STAR format reader/writer (maybe using CCPN?).

gary thompson Thu, 31 Jul 2008 05:38:45 -0700

On Thu, Jul 31, 2008 at 10:04 AM, Edward d'Auvergne <[EMAIL PROTECTED]> wrote:


> Hi,
>
> Thank you for answering all my questions.  I think that clearly covers
> most things for me to start thinking about how this can be implemented
> (although I'm still unsure about how to input the bond lengths used in
> the calculation of the dipolar constants).  For adding BMRB NMR-STAR
> v3.1 file format reading and writing capabilities, I've now created a
> branch of the relax 1.3 development line which is viewable at
> http://svn.gna.org/viewcvs/relax/.  I think that it would be
> beneficial to add, in addition to the creation of STAR files for BMRB,
> reading capabilities simultaneously so that data from the BMRB can
> easily be read by relax and then a new or extended analysis performed
> (relax can also create input for Modelfree4 and Dasha, as well as
> control these programs).
>
> So the major difficulty in implementing this, as I see it, is the
> support for generic STAR formatted files or the specific NMR-STAR v3.1
> file format.  I have done extensive searches and although Python
> perfectly supports XML reading and writing, I haven't been able to
> find any Python packages for generic STAR format support.  Would
> anyone know of a STAR or NMR-STAR 3.1 Dictionary reader/writer for
> Python?  I could write a STAR format parser and writer, but that would
> take a lot of time.  It would be easier if a Python package for this
> could be found or recycled.  However, the major issue with using a
> preexisting package would be its copyright licence.  Ideally the STAR
> format parser and writer would be appropriately licenced, for example
> under the Python licence
> (http://www.python.org/download/releases/2.4.2/license/), to allow
> incorporation into the standard Python modules (sitting alongside the
> XML reader/writer) so that all NMR programs with a Python interface,
> which is quite a few nowadays, could have much easier access to the
> BMRB data.
>
> I have found PyCIFRW (http://anbf2.kek.jp/CIF/) and this also includes
> PySTARRW which could be useful.  However, these have licencing issues
> which clash with the open source GPL licence of relax.  So
> unfortunately I can't use these files.  The only other Python STAR
> reader/writer I've found is the one used in the CCPN data model
> (http://www.ccpn.ac.uk/).  This has the ability to convert NMR-STAR
> format to the CCPN data model format through the file
> 'ccpnmr/ccpnmr1.0/python/ccpnmr/format/converters/NmrStarFormat.py'.
> The copyright licensing should be ok, but unfortunately this is not a
> generic reader/writer but something which is tightly integrated into
> CCPN.  Hence it would be too difficult to incorporate this file into
> relax.  I would like to have relax interface with the CCPN data model
> (https://mail.gna.org/public/relax-devel/2007-11/msg00037.html), but
> this would be far into the future and a model-free analysis may not
> be fully supported by CCPN yet
> (https://mail.gna.org/public/relax-devel/2007-12/msg00002.html).  One
> other thing I noticed at CCPN was a comment that a STAR reader/writer
> written in Python by Jurgen Doreleijers
> (http://tang.bmrb.wisc.edu/~jurgen/) was incorporated into their
> software.  Do you know anything about this Python module?
>
> Once a usable STAR reader/writer is accessible by relax, then creating
> and reading BMRB deposition files should be relatively
> straightforward.
>
> Regards,
>
> Edward
>
>
> On Tue, Jul 29, 2008 at 7:16 PM, Eldon Ulrich <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > Thank you for the quick response and feedback. I will try to answer as
> > many of your comments and questions below. We are converting all of our
> > data from NMR-STAR v2.1 to NMR-STAR v3.1. Examples of the v3.1 files can
> > be found on the BMRB ftp site at
> > the BMRB ftp site at
> >
> > ftp://ftp.bmrb.wisc.edu/pub/data/nmr-star-v3/
> >
> > These are early beta files and may have serious problems.
> >
> > For the purposes of this discussion, I will be referring to v3.1 tags.
> > Descriptions for these tags can be found at this URL:
> >
> > http://www.bmrb.wisc.edu/formats.html
> >
> > Files containing a fake NMR-STAR v3.1 file (nmrstar3_fake.txt) and other
> > information on the dictionary in its 'working' form are available from
> > the BMRB ftp site:
> >
> > ftp://ftp.bmrb.wisc.edu/pub/data/nmr-star_dict/dictionary_files
> >
> > We are very open to suggestions from the community on how to model and
> > capture relaxation data and are quite excited about this discussion. I am
> > sure I have not addressed all of your questions, but I hope this is a
> > start.
> >
> > Cheers,
> > Eldon
> >
> >
> > Edward d'Auvergne wrote:
> >>
> >> Hi,
> >>
> >> I've had a look at the fields and have a few questions as to how these
> >> should be implemented.  I'm assuming that these are the fields for
> >> simply depositing R1 relaxation data into the BMRB, is this correct?
> >
> > The Excel file contains the tags for the mandatory fields in the ADIT-NMR
> > deposition system. For the most part these fields hold the meta
> > information about the molecule, sample, sample conditions, spectrometers,
> > etc. The T1 fields were included as an example of one kind of relaxation
> > data and of the mandatory fields that would need to be entered in
> > ADIT-NMR. The actual tables of data would be uploaded at the time of
> > deposition.
> >>
> >> So the first question I have has to do with Rx versus Tx.  Almost all
> >> theories for the interpretation of the T1 relaxation times are
> >> dependent upon this being in the R1 rate form (with units of
> >> rad.s^-1).  relax (http://nmr-relax.com), Art Palmer's curvefit
> >> (http://cpmcnet.columbia.edu/dept/gsas/biochem/labs/palmer/software.html),
> >> David Fushman's RELAXFIT
> >> (http://gandalf.umd.edu/FushmanLab/pdsw.html), and almost all other
> >> programs calculate the Rx relaxation rate errors and not relaxation
> >> time errors via Monte Carlo simulation.  Then the programs relax
> >> (http://nmr-relax.com), modelfree4
> >> (http://cpmcnet.columbia.edu/dept/gsas/biochem/labs/palmer/software.html),
> >> dasha (http://www.nmr.ru/dasha.html), DYNAMICS
> >> (http://gandalf.umd.edu/FushmanLab/pdsw.html), Tensor2
> >> (http://www.ibs.fr/ext/labos/LRMN/softs/welcome.htm), etc. all work
> >> with the rates and not the times.  So the storage of relaxation times
> >> and their errors may not be very useful.  Is it possible to deposit
> >> rates and their errors rather than the antiquated relaxation times and
> >> their errors?
> >
> > Yes, you can deposit rates and the appropriate errors rather than the
> > times. The T1.Val and T1.Val_err tags can have units appropriate for
> > either times or rates (i.e., s or s-1). In the header to the table of T1
> > values is a tag _Heteronucl_T1_list.T1_val_units. The value of this tag
> > defines whether the T1 data have been expressed as times or rates.
> >
> > The terminology used for relaxation studies in NMR has been quite
> > diverse. At the time these tags were constructed, the term 'T1' still
> > seemed to be the most commonly used. But we realized that capturing the
> > data as rates was extremely important, and so we allowed the units of
> > the values to determine whether the values were times or rates.
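Because of this, a reader cannot assume that a T1 column holds times: the value of `_Heteronucl_T1_list.T1_val_units` has to be checked first. Below is a minimal Python sketch of that normalisation step. It assumes the tag value has already been read and is literally `s` or `s-1` as written above (the exact strings in the dictionary may differ). The helper name `to_rate()` is hypothetical. The error conversion uses simple first-order propagation, which, as Edward argues next, is not accurate in general.

```python
def to_rate(value, error, units):
    """Return (R1, R1 error) for one T1 entry, using the list's units tag.

    Assumes `units` is the value of _Heteronucl_T1_list.T1_val_units and is
    either "s" (relaxation times) or "s-1" (relaxation rates); both exact
    strings are assumptions for this sketch.
    """
    if units == "s-1":
        # Already a rate: pass the value and error through unchanged.
        return value, error
    if units == "s":
        # R = 1/T; first-order propagation gives sigma_R = sigma_T / T**2.
        return 1.0 / value, error / value**2
    raise ValueError(f"unrecognised T1 units: {units!r}")


# A time of 0.5 s +/- 0.01 s becomes a rate of 2.0 s^-1 +/- 0.04 s^-1.
print(to_rate(0.5, 0.01, "s"))
```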
> >
> >> Also, conversion of the Rx relaxation rate errors to the
> >> Tx time errors would require full Monte Carlo simulation to be
> >> accurate, and I'm not sure if anyone would have done this properly.  I
> >> could be wrong (anyone on this list who knows otherwise, please
> >> correct me), but I don't think there are any programs that use the Tx
> >> times or that properly convert Rx errors to Tx errors and vice versa.
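Edward's point about error conversion can be illustrated with a small Monte Carlo sketch in Python. It draws simulated rates from a normal distribution N(R, sigma_R), inverts each one, and takes the standard deviation of the simulated times as sigma_T. It then compares that result with the first-order estimate sigma_T = sigma_R / R**2. The Gaussian error model, the number of simulations and the function names are assumptions for illustration only. The code is not taken from relax or from any of the programs listed above.

```python
import random
import statistics


def rate_to_time_mc(rate, rate_err, n_sims=10000, seed=0):
    """Convert R +/- sigma_R into T +/- sigma_T by Monte Carlo simulation.

    Assumes Gaussian rate errors and rate >> rate_err, so that simulated
    rates stay positive and can be inverted safely.
    """
    rng = random.Random(seed)
    # Invert every simulated rate; the spread of the times is the time error.
    times = [1.0 / rng.gauss(rate, rate_err) for _ in range(n_sims)]
    return 1.0 / rate, statistics.stdev(times)


def rate_to_time_linear(rate, rate_err):
    """First-order (linear) error propagation: sigma_T = sigma_R / R**2."""
    return 1.0 / rate, rate_err / rate**2


t, mc_err = rate_to_time_mc(2.0, 0.05)
_, lin_err = rate_to_time_linear(2.0, 0.05)
print(t, mc_err, lin_err)
```

For small relative errors the two estimates agree closely. The simulation becomes important when sigma_R is a sizeable fraction of R, where 1/R is strongly non-linear.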
> >>
> >> The second question I have has to do with the integration of relax
> >> with the BMRB deposition and automating the process.  Can all data for
> >> a model-free analysis be deposited at once?  For example if relax was
> >> to create a STAR formatted file with the ADIT-NMR fields with the R1,
> >> R2, and NOE values and errors at multiple fields, with the S2, S2f,
> >> S2s, te, ts, tf, and Rex parameters and errors, the selected model
> >> information (model name or parameters of the model), parameters such
> >> as the CSA value used and bond length, and global parameters such as
> >> the diffusion tensor, could this file be accepted?  Or will this
> >> require multiple small files for multiple depositions?
> >>
> > All of the data can be uploaded as one file. The NMR-STAR format is
> > modular, and a single file can contain any number of modules (saveframes)
> > of the same or different types, with a few exceptions. A module or
> > saveframe begins with the key term 'save_somestring' and ends with the
> > key term 'save_'. A file can contain as many R1, R2, and NOE modules as
> > needed. Within each of the modules there are header tags that take as
> > values the field strength of the spectrometer used to collect the data
> > in that module and the NMR experiment. It is important that the
> > experiment used for the data be defined uniquely.
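The saveframe delimiters described above are enough for a first pass over an NMR-STAR file. The Python sketch below groups the lines of each saveframe under its name. It is not a full STAR parser. It tracks semicolon-delimited multi-line values, so that a `save_` at the start of a line inside such a value is not mistaken for a delimiter. It ignores comments, global blocks and the rest of the STAR grammar. The function name `split_saveframes()`, the sample saveframe names and the `Details` tag in the sample are hypothetical.

```python
def split_saveframes(text):
    """Group NMR-STAR lines into {saveframe_name: [lines]}.

    A saveframe opens with 'save_<name>' and closes with a bare 'save_'.
    A ';' in column 1 opens or closes a multi-line text value, inside which
    'save_' is ordinary text.  Lines outside any saveframe are dropped.
    """
    frames = {}
    current = None
    in_text = False
    for line in text.splitlines():
        stripped = line.strip()
        if line.startswith(";"):
            in_text = not in_text
        elif not in_text and stripped.startswith("save_"):
            if stripped == "save_":
                current = None  # end of the current saveframe
                continue
            if current is None:
                current = stripped[len("save_"):]
                frames[current] = []
                continue
        if current is not None:
            frames[current].append(line)
    return frames


# Illustrative input; saveframe names and the Details tag are made up.
SAMPLE = """data_example
save_T1_list_1
   _Heteronucl_T1_list.T1_val_units   s-1
save_
save_T1_list_2
   _Heteronucl_T1_list.Details
;
save_ at the start of a text line is not a delimiter
;
save_
"""

frames = split_saveframes(SAMPLE)
print(list(frames))
```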
> >
> > The following list of tags contains most of the values you mention (S2,
> > S2f, S2s, te, ts, and Rex, all with errors) and the type of model fit.
> > It is missing tf, but this can be easily added. The units for te and ts
> > are provided in the header tags
> > _Order_parameter_list.Tau_e_val_units and
> > _Order_parameter_list.Tau_s_val_units. For the order parameter data, it
> > is important to include the experiments used to collect the underlying
> > data. In this way the order parameters are linked to the R1, R2, etc.
> > data used in doing the fitting. It is possible to include in the file a
> > description of the software used and the 'method' or parameter file.
> >
> >
> >    _Order_param.Order_param_val
> >    _Order_param.Order_param_val_fit_err
> >    _Order_param.Tau_e_val
> >    _Order_param.Tau_e_val_fit_err
> >    _Order_param.Rex_val
> >    _Order_param.Rex_val_fit_err
> >    _Order_param.Model_free_sum_squared_errs
> >    _Order_param.Model_fit
> >    _Order_param.Sf2_val
> >    _Order_param.Sf2_val_fit_err
> >    _Order_param.Ss2_val
> >    _Order_param.Ss2_val_fit_err
> >    _Order_param.Tau_s_val
> >    _Order_param.Tau_s_val_fit_err
> >    _Order_param.SH2_val
> >    _Order_param.SH2_val_fit_err
> >    _Order_param.SN2_val
> >    _Order_param.SN2_val_fit_err
> >
> > The CSA data would be included in a separate module, but the same file.
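Going the other way, a writer only needs to emit the tags of a loop once, followed by one row of values per spin. Below is a minimal Python sketch for an `_Order_param` loop. `_Order_param.Seq_ID` is not in Eldon's list; it is assumed by analogy with the `_T1.Seq_ID` tag quoted below. The `'S2, te'` model name, the quoting rule, the values and the function names are also assumptions. A real writer would also need the surrounding saveframe and its header tags.

```python
def star_value(value):
    """Render one value as a STAR token.

    None becomes '.', which NMR-STAR files use as a null value; strings
    containing whitespace are single-quoted.  Values containing both a
    single quote and whitespace are not handled in this sketch.
    """
    if value is None:
        return "."
    text = str(value)
    if not text or any(c.isspace() for c in text):
        return f"'{text}'"
    return text


def format_loop(tags, rows):
    """Format a loop_ block: each tag on its own line, then one line per row."""
    lines = ["   loop_"]
    lines += [f"      {tag}" for tag in tags]
    lines.append("")
    for row in rows:
        if len(row) != len(tags):
            raise ValueError("each row needs exactly one value per tag")
        lines.append("      " + " ".join(star_value(v) for v in row))
    lines.append("   stop_")
    return "\n".join(lines)


tags = [
    "_Order_param.Seq_ID",  # assumed, by analogy with _T1.Seq_ID
    "_Order_param.Order_param_val",
    "_Order_param.Order_param_val_fit_err",
    "_Order_param.Rex_val",
    "_Order_param.Model_fit",
]
rows = [(5, 0.85, 0.02, None, "S2, te")]  # made-up values for illustration
print(format_loop(tags, rows))
```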
> >
> >
> >> I've also noticed from some of the deposited data (e.g.
> >> http://www.bmrb.wisc.edu/data_library/gen_saveframe.php?accNum=6470&saveframe=T1_relaxation)
> >> that all the data is identified by residue number.  For supporting
> >> analyses using nucleic acids, small biomolecules, or proteins where
> >> more than just the backbone NH relaxation has been studied, would it
> >> be possible to additionally have an atom or spin numerical code and
> >> textual label?  If an analysis is done on a molecular complex, is the
> >> deposition of data for multiple molecules supported as well?
> >
> > The header tag of the type '_Heteronucl_T1_list.T1_coherence_type' is
> > intended to provide an idea of the coherence being measured. In addition,
> > the following set of tags, or a similar set for other kinds of data, is
> > provided for every row in a data value table. The values of these tags
> > allow an atom within a molecular assembly of almost any complexity
> > (including ones that are undergoing chemical or conformational exchange)
> > to be defined.
> >
> >     _T1.Entity_assembly_ID
> >     _T1.Entity_ID
> >     _T1.Comp_index_ID
> >     _T1.Seq_ID
> >     _T1.Comp_ID
> >     _T1.Atom_ID
> >     _T1.Atom_type
> >     _T1.Atom_isotope_number
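These per-row tags matter for relax, and the following Python sketch shows why. It keys each loop row by all eight identification tags listed above rather than by residue number alone. For example, residue 10 in two different molecules of a complex then remains two distinct spins. The sketch assumes the loop has already been tokenised into tag names and string values. The class and function names and the data values are hypothetical.

```python
from typing import NamedTuple

# The eight atom-identification tags listed above, in order.
ATOM_TAGS = (
    "_T1.Entity_assembly_ID",
    "_T1.Entity_ID",
    "_T1.Comp_index_ID",
    "_T1.Seq_ID",
    "_T1.Comp_ID",
    "_T1.Atom_ID",
    "_T1.Atom_type",
    "_T1.Atom_isotope_number",
)


class AtomKey(NamedTuple):
    entity_assembly_id: str
    entity_id: str
    comp_index_id: str
    seq_id: str
    comp_id: str
    atom_id: str
    atom_type: str
    atom_isotope_number: str


def atom_key(tags, row):
    """Build an AtomKey from one loop row; tags absent from the loop give '.'."""
    values = dict(zip(tags, row))
    return AtomKey(*(values.get(tag, ".") for tag in ATOM_TAGS))


tags = list(ATOM_TAGS) + ["_T1.Val"]
rows = [  # made-up data: residue 10 in two different molecules
    ["1", "1", "10", "10", "GLY", "N", "N", "15", "1.52"],
    ["2", "2", "10", "10", "ALA", "N", "N", "15", "1.61"],
]
r1 = {atom_key(tags, row): float(row[-1]) for row in rows}
print(len(r1))
```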
> >
> > The data available from BMRB have for the most part been supplied by
> > authors, and the quality of the data and how well they are described are
> > variable and in all cases out of our control, as authors do not respond
> > to our requests for better descriptions and more complete data sets.
> >
> >>
> >> I still have many questions about the fields, their format in the STAR
> >> file to deposit, which are compulsory, and which fields do not yet
> >> exist for deposition of all model-free data (much of this data can be
> >> seen in the relax results file
> >> http://svn.gna.org/viewcvs/relax/1.3/test_suite/shared_data/model_free/OMP/final_results_trunc_1.3.bz2).
> >> For example, most of the STAR tags in
> >> http://www.bmrb.wisc.edu/data_library/gen_saveframe.php?accNum=5841&saveframe=S2_parameters
> >> are not in the Excel spreadsheet.  And why are order parameters and
> >> their errors input using the STAR format tags '_S2_value' and
> >> '_S2_error' whereas the T1 fields are called '_T1_value' and
> >> '_T1_value_error' and the effective model-free internal correlation
> >> time te filed under '_Tau_e_value' and '_Tau_e_value_fit_error'?
> >
> > When working on an almost 5000-tag dictionary over many years,
> > inconsistencies creep into the tag names. We have tried to eliminate
> > these inconsistencies as much as possible in the NMR-STAR v3 dictionary,
> > but I would guess there are still at least a few.
> >
> >> Would you have an example deposition text file formatted correctly
> >> using the ADIT-NMR tags in the Excel file?  Or is this unmodified, for
> >> example is
> >> http://www.bmrb.wisc.edu/cgi-bin/explore.cgi?format=raw&bmrbId=5841
> >> the same file as the one that the authors deposited?
> >
> > I do not have a full relaxation example file. For example files you
> > should look in the directory on the ftp site listed above. We are working
> > to clean up these files as quickly as possible.
> >
> >> And how is the
> >> field strength dependent data handled, e.g. in
> >> http://www.bmrb.wisc.edu/cgi-bin/explore.cgi?format=raw&bmrbId=4970
> >> there are 2 spectrometers declared to be a 600 and 750, yet there is
> >> relaxation data at 500, 600 and 750 present in the file?
> >
> > As mentioned above, for each module containing field-strength-dependent
> > data there should be a tag that takes as a value the field strength of
> > the spectrometer used to collect the data. For data like order
> > parameters that are derived from different sets of data, currently the
> > experiment list is used to trace back to the input data and spectrometer
> > field strength.
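Edward's example of a file that declares 600 and 750 MHz spectrometers but contains data at 500 MHz suggests a simple consistency check when reading such files. Every field strength found in a data saveframe should match a declared spectrometer. The field-strength tag is not named here, so the Python sketch below assumes the values have already been extracted as plain numbers in MHz. It uses a tolerance because nominal and exact 1H frequencies differ slightly. The function name, saveframe names and numbers are hypothetical.

```python
def undeclared_fields(declared_mhz, frame_fields_mhz, tol=1.0):
    """Return {saveframe: field} for data fields with no declared spectrometer.

    declared_mhz: 1H frequencies (MHz) of the spectrometers declared in the file.
    frame_fields_mhz: saveframe name -> field-strength value (MHz) of its data.
    tol: tolerance in MHz, since e.g. 599.9 and 600 denote the same magnet.
    """
    missing = {}
    for frame, field in frame_fields_mhz.items():
        if not any(abs(field - d) <= tol for d in declared_mhz):
            missing[frame] = field
    return missing


declared = [600.0, 750.0]
frames = {"T1_500": 500.1, "T1_600": 599.9, "T1_750": 750.0}  # made-up values
print(undeclared_fields(declared, frames))
```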
> >
> >>
> >> Cheers,
> >>
> >> Edward
> >>
> >>
> >> P.S.  For reference, this message will soon appear at
> >> https://mail.gna.org/public/relax-devel/.
> >>
> >>
> >>
> >> On Mon, Jul 28, 2008 at 6:01 PM, Eldon Ulrich <[EMAIL PROTECTED]> wrote:
> >>>
> >>> Hi Edward,
> >>>
> >>> Sorry for the delay in providing a list of the required ADIT-NMR
> >>> fields. An Excel file with the information, compiled by one of our
> >>> students, is attached. The table provides a fairly complete description
> >>> of the fields and, where appropriate, their dependencies on other
> >>> fields. In terms of the experimental data, only the fields required for
> >>> T1 relaxation data were included. The required fields may vary slightly
> >>> depending on the kinds of data being deposited.
> >>>
> >>> I hope this information helps. If you have any questions or need
> >>> additional information, please let me know.
> >>>
> >>> All the best,
> >>> Eldon
> >>>
> >>> _______________________________________________
> >>> relax (http://nmr-relax.com)
> >>>
> >>> This is the relax-devel mailing list
> >>> [email protected]
> >>>
> >>> To unsubscribe from this list, get a password
> >>> reminder, or change your subscription options,
> >>> visit the list information page at
> >>> https://mail.gna.org/listinfo/relax-devel
> >>>
> >>>
> >
> >
>



Hi Ed

Some alternatives:

1. stardom (GPL; ignore what it says on the first web page and just look at
the licence) converts STAR files to an XML format
http://www.pasteur.fr/recherche/unites/Binfs/stardom
2. CCPN format converters come in two parts (I have helped write one for
import of data from xplor-marvin). I would have a look at
ccpnmr1.0/python/ccp/format/nmrStar, which is a basic STAR file reader
framework...
3. I can assist here (my structure calculation stuff is now [mostly] done,
so I am heading back to dynamics) ;-)
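The STAR-to-XML route in the first suggestion can be sketched in Python with the standard library's `xml.etree.ElementTree`. The element and attribute names below are invented for illustration and are not stardom's actual XML schema. Each tag is split at the first '.' into its category and item name, following the convention of the NMR-STAR v3.1 tags quoted earlier in this thread.

```python
import xml.etree.ElementTree as ET


def saveframe_to_xml(name, tag_values):
    """Convert one saveframe's (tag, value) pairs into an XML element.

    The <saveframe>/<item> layout is an illustrative assumption only.
    """
    frame = ET.Element("saveframe", name=name)
    for tag, value in tag_values:
        # "_Category.Item" -> ("Category", "Item")
        category, _, item = tag.lstrip("_").partition(".")
        ET.SubElement(frame, "item", category=category, name=item).text = value
    return frame


element = saveframe_to_xml(
    "T1_list_1", [("_Heteronucl_T1_list.T1_val_units", "s-1")]
)
print(ET.tostring(element, encoding="unicode"))
```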

regards
gary

