John Kuszewski wrote:

Hi John

firstly many thanks for all the useful input! The good news was that I 
had most of it right already ;-) but it was really nice to have your 
comments to read and assure me that i had everything right especially 
the deleting of the reference pdb. It would be rerally nice to have some 
of the things in thia note in the eginput directory and possibley a 
skeleton calculation that was easy to pickup and use

> Hi,
>
> The best way to get started with marvin is to build off the eginput  
> example.
>
> Stuff you'll need to assemble:
>
> 1.  a PSF file for your protein, of course.
> 2.  a shift table for your protein.
> 3.  a NOESY peak location table for your protein.
>
> Regarding the shift table, it doesn't have to be in PIPP format.  I  
> can also read
> NMR-STAR, although you have to be careful to get the degeneracy codes  
> correct.


unfortunatley it turned out that analysis can't output nmrstra with 
shifts and peaks  (yet) so our friends at the ebi (wim vranken) whipped 
up a very basic pipp shift output filter and this was enough along with 
the nmrdraw peakoutput filter they have. I did have to write a short 
script to get the chemical shift tranges from the nmrdraw peak table as 
they have a different format to pipp:

#
# reads the parameters in a nmrDraw peak table and
# returns the center of the selected axis in PPM
# and its width in PPM [clearly this is _NOT_ true!]
#

proc readSpectralRangeFromnmrDrawPeakTable args {

    set fname [requiredFlagVal $args -fileName]
    set axis  [requiredFlagVal $args -axis]

    global errorInfo

    # open file

    if {[catch {set inUnit [open $fname r]}]} {
        error "Error opening input file $fname"
    }

    # eat text until we hit a line that starts with DATA <column name>
    set found 0
    while {! $found} {

        if {[catch {set l [nextLine $inUnit]}]} {
            set errorInfo ""
            break
        }

        if {([lindex $l 0] == "DATA") && ([lindex $l 1] == $axis)} {
            set found 1
        }
    }

    close $inUnit

    if {! $found} {
        error "DATA line for axis named $axis not found in file $fname"
    }


    #
    # nmrDraw header format is
    # DATA <axisDirection> <axisType> <unknown> <unknown> <maxShift ppm> 
<minimum ppm>
    #
    # obvious the two fields <unknown> <unknown> contain something, but 
thats one for frank delaglio!
    #
    # also i don't seem to have the number of data points in the spectrum...

    set specStartPPM     [lindex $l 5]
    set specEndPPM       [lindex $l 6]

    if {[string first "ppm" $specStartPPM] == -1} {
        error "start shift ($specStartPPM) for axis $axis doesn't appear 
to have ppm units... (shift should finish with ppm eg 121.3ppm)"
    }

    if {[string first "ppm" $specEndPPM] == -1} {
        error "end shift ($specEndPPM) for axis $axis doesn't appear to 
have ppm units... (shift should finish with ppm eg 15.2ppm)"
    }

    set specStartPPM     [string trimright $specStartPPM "pm"]
    set specEndPPM       [string trimright $specEndPPM "pm"]

    #
    # i am not quite clear if these are shifts for the edge of the 
spectrum or the first
    # and last point so i am not sure if we have a picket frnce error 
another one for frank delaglio!
    # so maybe: the spectrum actually begins 1/2 of a point before the 
first point's frequency.
    #

    return [list $specStartPPM $specEndPPM]
}


n.b. there seems to be an error in the comments on the function 
readSpectralRangeFromPippPeakTable
#
# reads the parameters in a PIPP .PCK table and
# returns the center of the selected axis in PPM
# and its width in PPM
#

it seems to be returning the start and finish of the spectrum instead: 
return [list $specStartPPM $specEndPPM]


I have also started to do a couple of other things

1. annotate the tcl files in xplor/tcl with autodoc comments as an aid 
to understanding them
2. annotating copys of the initMatch3dC.tcl etc to get my head round 
what they do
3. writing a more generic version of initMatch3dC.tcl to try and make it 
simpler to plugin most of a new project:

I hope these are of use. Should I post them back for examination and 
possible distribution (I am currently just doing these things to help 
myself)

package require aeneas
package require pipp
package require marvin


# modified from eginput/marvin/cvn gst Thu May 31 11:18:18 BST 2007

set quickMode [flagExists $argv -quick]

requiredFlagVal $argv -shifts
set shiftList [flagVal $argv -shifts]

requiredFlagVal $argv -peaks
set peakList [flagVal $argv -peaks]

requiredFlagVal $argv -psf
set  psfFile [flagVal $argv -psf]

requiredFlagVal $argv -root
set fileRoot [flagVal $argv -root]

set inD2O 1

set peakFromProtonColumn  "X"
set peakFromCarbonColumn  "Z"
set peakToProtonColumn    "Y"

set specFromProtonAxis "X_AXIS"
set specFromHeavyAxis  "Z_AXIS"
set specToProtonAxis   "Y_AXIS"

set heavyToleranceCoarse   0.75
set protonToleranceCoarse  0.075
set heavyToleranceFine   0.2
set protonToleranceFine  0.02

set solventTolerance 0.05
set diagonalTolerance 0.001

set stripeToleranceProton $protonToleranceFine
set stripeToleranceHeavy  $heavyToleranceFine

set foldingFlags {\
-signChangesUponFoldingFromHeavyatomDimension \
-shiftsAliasedAlongFromHeavyatomDimension \
-shiftsAliasedAlongFromProtonDimension \
-shiftsAliasedAlongToProtonDimension \
-nonFoldedPeaksAreNegative
}

puts "parameters"
puts "----------"
puts ""
puts "shifts:           $shiftList"
puts "peaks:            $peakList"
puts "psf file:         $psfFile"
puts "output file root: $fileRoot"
puts "in d2o:           $inD2O"
puts ""
puts "peak table column:"
puts "from proton axis: $peakFromProtonColumn"
puts "from carbon axis: $peakFromCarbonColumn"
puts "to proton axis:   $peakToProtonColumn"
puts ""
puts "peak table spectrum axes:"
puts "from proton axis: $specFromProtonAxis"
puts "from carbon axis: $specFromHeavyAxis"
puts "to proton axis:   $specToProtonAxis"
puts ""
puts "peak width tolerances:"
puts "proton coarse: $protonToleranceCoarse"
puts "heavy coarse:  $heavyToleranceCoarse"
puts "proton fine:   $protonToleranceFine"
puts "heavy  fine:   $heavyToleranceFine"
puts ""
puts "stripe tolerances"
puts "proton: $stripeToleranceProton"
puts "heavy:  $stripeToleranceHeavy"
puts ""
puts "artifact tolerances"
puts "solvent:  $solventTolerance"
puts "diagonal: $diagonalTolerance"
puts ""
puts "folding flags"
puts $foldingFlags

initParamPsfPdb \
    -psfFileName   $psfFile \
    -randomCoords

#
# quick mode will only calculate the structure of the first 36 residues 
of cyanovirin
#

if {$quickMode} {
    XplorCommand "delete sele (not (resid 1:36)) end"
}



set noe [create_MarvinNOEPotential marv]

set peakRemarks [list]
set saRemarks [list]

#
# read the chemical shift table
#

set shifts [readPippShiftTable -fileName $shiftList -remarksVariableName 
saRemarks] ; noOutput

puts "using flexible!!!"
reportUnassignedAtoms \
    -shiftList $shifts \
    -flexibleRegionSelection [AtomSel -args "(resid 1:2)"] \
    -remarksVariableName saRemarks

#removed -flexibleRegionSelection [AtomSel -args "(resid 1:2)"]



#
# this spectrum was taken in D2O.  Therefore, don't create to 
shiftAssignments
# for exchangeable protons
#

if $inD2O  {
    puts "in d2o!!!"
    set tempString "(name h* and not (name hn or name ht* or (resn thr 
and name hg1) or
                                 (resn ser and name hg) or (resn lys and 
name hz*) or
                                 (resn tyr and name hh) or (resn arg and 
name hh*) or
                                 (resn arg and name he) or (resn asn and 
name hd*) or
                                 (resn gln and name he*)))"
}

createShiftAssignments \
    -shiftList $shifts \
    -pot $noe \
    -fromProtonSelectionString "(name h* and bondedto (name c*))" \
    -fromHeavyatomSelectionString "(name c*)" \
    -toProtonSelectionString $tempString \
    -fromProtonDimensionSolventRange [list 4.6 4.8] \
    -namePrefix "3dc" \
    -remarksVariableName saRemarks

expandStereospecificShiftAssignments \
    -shiftAssignments [$noe shiftAssignments] \
    -remarksVariableName saRemarks

recordToFromPartnersForShiftAssignments \
    -pot $noe \
    -remarksVariableName saRemarks

markMethylShiftAssignments \
    -shiftAssignments [$noe shiftAssignments] \
    -remarksVariableName saRemarks


#
# read the NOE peak table.  Make sure that the
# columns are identified correctly
#

    process3dCPippPeakTable -fileName   $peakList \
        -peakIDcolumnName PkID \
        -fromProtonColumnName $peakFromProtonColumn \
        -fromCarbonColumnName $peakFromCarbonColumn \
        -toProtonColumnName   $peakToProtonColumn   \
        -intensityColumnName Intensity \
        -pot $noe \
        -remarksVariableName peakRemarks



#
# First matching stage, to detect any shift referencing differences
# between the from and to proton dimensions.
#
# Enter the valid spectral range (in PPM) along each dimension.
# Make sure that the correct sign for a non-folded peak is given.
#

puts $errorInfo

puts "shift range [readSpectralRangeFromPippPeakTable -fileName 
$peakList -axis $specFromProtonAxis]"

#note use of list command to protect string from commands such as $noe peaks
eval  match3d \
    -peakList [list [$noe peaks ]]  \
    -fromProtonTolerancePPM $protonToleranceCoarse \
    -fromProtonSpectralRangePPM    [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specFromProtonAxis]] \
    -fromHeavyatomTolerancePPM $heavyToleranceCoarse \
    -fromHeavyatomSpectralRangePPM [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specFromHeavyAxis]]  \
    -toProtonTolerancePPM $protonToleranceCoarse \
    -toProtonSpectralRangePPM      [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specToProtonAxis]]   \
    -pot $noe \
    -remarksVariableName peakRemarks \
    $foldingFlags

# correct the shift referencing between the proton dimensions,
# correct the shift referencing between the proton dimensions,
# yielding updated peak locations
#

correctShiftReferencing \
    -pot $noe  \
    -correctToProtonDimension \
    -remarksVariableName peakRemarks

puts $peakRemarks

puts $errorInfo


#
# clean up the peak list based on global spectral features
#

removeDiagonalPeaks \
    -peakList [$noe peaks] \
    -pot $noe \
    -tolerance $diagonalTolerance \
    -remarksVariableName peakRemarks

removeSolventPeaks \
    -peakList [$noe peaks] \
    -pot $noe \
    -autoDetect \
    -fromProtonDimension \
    -tolerance $solventTolerance \
    -remarksVariableName peakRemarks

#
# Second matching stage, to detect differences between the NOE peak 
locations
# and the shift table entries, based on peak stripes.
#
# Enter the valid spectral range (in PPM) along each dimension.
# Make sure that the correct sign for a non-folded peak is given.
#


#note use of list command to protect string from commands such as $noe peaks
eval match3d \
    -peakList [list [$noe peaks]] \
    -fromProtonTolerancePPM $protonToleranceCoarse \
    -fromProtonSpectralRangePPM    [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specFromProtonAxis]] \
    -fromHeavyatomTolerancePPM $heavyToleranceCoarse \
    -fromHeavyatomSpectralRangePPM [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specFromHeavyAxis]] \
    -toProtonTolerancePPM $protonToleranceCoarse \
    -toProtonSpectralRangePPM      [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specToProtonAxis]] \
    -pot $noe \
    -remarksVariableName peakRemarks \
    $foldingFlags

puts $errorInfo

#
# correct the shift assignments' chemical shifts by detecting stripes,
# yielding updated shift assignment values
#

stripeCorrection\
    -pot $noe \
    -stripeGenerationProtonTolerance $stripeToleranceProton \
    -stripeGenerationHeavyatomTolerance $stripeToleranceHeavy \
    -estimateMissingTargets \
    -remarksVariableName peakRemarks \
    -geminalFilter \
    -toFromFilter \
    -toFromShare \
    -useBackboneSequential

puts $errorInfo
puts $peakRemarks

#
# Third matching stage, done with tight tolerances
#
# Enter the valid spectral range (in PPM) along each dimension.
# Make sure that the correct sign for a non-folded peak is given.
# rematch
#

#note use of list command to protect string from commands such as $noe peaks
eval match3d \
    -peakList [list [$noe peaks]] \
    -fromProtonTolerancePPM $protonToleranceFine \
    -fromProtonSpectralRangePPM    [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specFromProtonAxis]] \
    -fromHeavyatomTolerancePPM $heavyToleranceFine \
    -fromHeavyatomSpectralRangePPM [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specFromHeavyAxis]] \
    -toProtonTolerancePPM $protonToleranceFine \
    -toProtonSpectralRangePPM      [list 
[readSpectralRangeFromPippPeakTable -fileName $peakList -axis 
$specToProtonAxis]] \
    -pot $noe \
    -remarksVariableName peakRemarks \
    $foldingFlags

#
# generate the distance bounds
#

generateDistanceBounds -peakList [$noe peaks]
correctUpBoundsForMethyls -peakList [$noe peaks]

#
# mark good PAs
#


markGoodPeakAssignments \
    -peakList [$noe peaks] \
    -remarksVariableName peakRemarks \
    -violCutoff 0.5

initializePeakAssignmentNumFiltersFailed -pot $noe

currentFilterSummary $noe

netFilter \
    -pot $noe \
    -intraresidueNeighborhoods \
    -extendNeighborhoodsWithHighScoringPeakAssignments \
    -failedFiltersCutoff 0 \
    -numIterations 5 \
    -expectedContactsPerShiftAssignment 10 \
    -createInverseExceptions \
    -preferIntra \
    -resetNumFailedFilters \
    -remarksVariableName peakRemarks

puts $errorInfo

currentFilterSummary $noe

primarySequenceDistanceFilter \
    -pot $noe \
    -intraresidue \
    -likelihoodMode \
    -remarksVariableName peakRemarks

currentFilterSummary $noe



initializeLikelihoodsFromFilters -pot $noe


createExplicitInverseExceptions -pot $noe -remarksVariableName saRemarks 
-failedFiltersCutoff 0
writeExplicitInverseExceptions -pot $noe -filename 
${fileRoot}_3dc.exceptions


#
# report statistics.  If there is a known structure, you can
# enter it here to generate accuracy statistics in the starting
# NOE file.
#

initialShiftAssignmentAnalysis \
    -pot $noe \
    -minLikelihood 0.9 \
    -remarksVariableName saRemarks

puts $errorInfo

writeShiftAssignments \
    -fileName  ${fileRoot}_3dc_pass1.shiftAssignments \
    -shiftAssignments [$noe shiftAssignments] \
    -remarks $saRemarks

puts $saRemarks

initialPeakAnalysis \
    -pot $noe\
    -violCutoff 0.5 \
    -minLikelihood 0.9 \
    -remarksVariableName peakRemarks

puts $errorInfo

writeMarvinPeaks \
    -fileName  ${fileRoot}_3dc_pass1.peaks \
    -pot $noe \
    -remarks $peakRemarks

puts $peakRemarks

puts $errorInfo

exit

(note this isn't complete and debugged...)

>
> If you want to read NMR-STAR shift tables, add the line "package  
> require nmrstar" to the
> top of the initMatch* scripts.  Then change the call to  
> readPippShiftTable to readNmrStarShiftTable.
>
> In the call to createShiftAssignments, you can set the various  
> selections to exclude
> atoms that aren't observable in your particular spectrum (eg.,  
> backbone amides, if
> your 3dC was collected in D2O).
>

> Regarding the NOESY peak location table, it also doesn't have to be  
> in PIPP format.
> I can read Xeasy and nmrDraw's formats as well.  If you want to use  
> one of those formats,
> add a "package require xeasy" or "package require nmrdraw" line to  
> the top, and change
> the call to process3dCPippPeakTable to process3dCXeasyPeakTable or  
> process3dCNmrDrawPeakTable.
>
> You'll have to identify each column in the peak table, using the  
> flags to your process*PeakTable call,
> as illustrated in the CVN example.
>
> Then you'll have to set the peak folding/aliasing flags to match your  
> spectrum.  These flags are repeated
> for each call to match3d.  The relevant ones are
>
> -nonFoldedPeaksArePositive or -nonFoldedPeaksAreNegative
> -signChangesUponFoldingFromHeavyatomDimension
> -signChangesUponFoldingFromProtonDimension
> -signChangesUponFoldingToProtonDimension
> -shiftsAliasedAlongFromHeavyatomDimension or - 
> shiftsFoldedAlongFromHeavyatomDimension
> -shiftsAliasedAlongFromProtonDimension or - 
> shiftsFoldedAlongFromProtonDimension
> -shiftsAliasedAlongToProtonDimension or - 
> shiftsFoldedAlongToProtonDimension
>
> The meaning of these flags should be obvious.  The last three flags  
> are a bit more complex:
>
> -fromProtonSpectralRangePPM   [list -1.1 9.5]
> -fromHeavyatomSpectralRangePPM [list 60 120]
> -toProtonSpectralRangePPM [list -1.2 6.0]
>
> (The numbers in the above lines are just examples. )  The list of two  
> numbers after each flag define the chemical
> shifts around which the actual aliasings/foldings take place.  For  
> now, getting them right is left in your hands, unless
> you're using PIPP peak location tables, which include the necessary  
> information in their headers.  I'm working with
> Frank Delaglio to see if we can build something general-purpose that  
> could read that information out of nmrPipe
> spectra, but it's not there yet.
>
> If you don't get the spectral range flag values correct, marvin will  
> have a difficult time assigning peaks with folded/aliased
> positions.
>
I think I have this correct! but as explained in the comments to my code 
above I don't know if I need to allow for a 'picket fence error of 1/2 a 
digital resolution (I hope I don't)

> You'll need to change the names of the  output .peaks, 
> .shiftAssignments, and .exceptions files to something  you like.
>
> And if you have a reference structure, replace the name of  
> cvn_reference.pdb with the name of the PDB file for your
> reference structure.  If you don't have one (as is usually the case,  
> of course), just delete those lines.
>
> Finally, you'll need to update the sa_pass* and summarize_pass*  
> files.  The necessary changes are all just filename
> changes, which should be obvious.

indeed and I have produced generic versions of these as well

>
> I'm sure you'll run into more questions as you continue.  Ask away.
>
> --JK


I am afraid there are more! I am quite intereested in what the graphcs 
that are on the top of the cvn_3dc_pass1.peaks and 
cvn_3dc_pass1.shiftAssignments are what they tell me and what I should 
be looking out for

so here are some questions

1. what exceptions and explicitinverseexceptions
2. for the shiftAssignments Number of peak assignments per 
shiftAssignment  i take it this is the number of peaks assigned to each 
shift
3. for the peak assignments what are 'completeness of targets'
4. for peak are 'Differences between target from proton shifts and 
shiftAssignment values' the difference between the shift found in the 
peaks for assignments and the shift in the shiftLists
5. what are the good peak assignments at this stage?
6. what are 'Passing SA pair scores'
7. what is the network filter is it network anchoring ala petere guntert?



many thanks
gary

>
>
> On Jun 11, 2007, at 6:19 PM, gary thompson wrote:
>
>> some questions about marvin
>>
>> where should I start? do i need to alter much other than
>>
>> 1. alters the input peak and shifts  lists psf  files and file name  
>> roots
>> 2.  remove opening of reference pdb files e.g . delete
>>
>> readPDB -fileName ./cvn_reference.pdb
>>
>> and
>>
>> initialShiftAssignmentAnalysis \
>>     -pot $noe \
>>     -referenceStructureFile ./cvn_reference.pdb \
>>     -minLikelihood 0.9 \
>>
>> etc...
>>
>>
>> also
>> is there any more documentation on whats going on other than the  
>> source code, paper, slides on th web  and the cvn example?
>>
>>
>> regards and many thanks
>> gary
>>
>> n.b. what sort of hardware were the timings you reported (60  
>> processors 12 hours) made on?
>> _______________________________________________
>> Xplor-nih mailing list
>> Xplor-nih at nmr.cit.nih.gov
>> http://dcb.cit.nih.gov/mailman/listinfo/xplor-nih
>
>
> .
>


-- 
-------------------------------------------------------------------
Dr Gary Thompson
Astbury Centre for Structural Molecular Biology,
University of Leeds, Astbury Building,
Leeds, LS2 9JT, West-Yorkshire, UK             Tel. +44-113-3433024
email: garyt at bmb.leeds.ac.uk                   Fax  +44-113-2331407
-------------------------------------------------------------------


Reply via email to