[spctools-discuss] spectrast2tsv.py from Aebersold 2015 doesn't generate an output .csv or most of the programmed print messages

Steven Shuken Thu, 19 Oct 2017 16:07:47 -0700

Hello,

I'm a grad student following the Aebersold protocol on SWATH library 
building: 
https://www.nature.com/nprot/journal/v10/n3/full/nprot.2015.015.html
I'm on Step 37(B)(i) where the Anaconda script spectrast2rsv.py is run to 
extract the 6 most intense fragment ion peaks from each consensus spectrum 
to generate a SWATH library.


When I run the script with the recommended options, the script runs without 
any error messages, but doesn't produce any print messages after the intro 
(ending with "Output file name (default: appends _peakview.txt)"). The next 
print message is supposed to be "Labeling file :" just a few lines later, 
but that never appears.

"C:\TPPdata>python C:\ProgramData\Anaconda2\Scripts\spectrast2tsv.py -1 
350,2000
-s b,y -x 1,2 -o 6 -n 6 -p 0.05 -d -e -w swaths.txt -k openswath -a 
SpecLib_cons
_openswath.csv SpecLib_cons.sptxt

spectrast2tsv.py
---------------
This script is used as filter from spectraST files to swath input files.

Usage:
python spectrast2tsv.py [options] spectrast_file(s)
-h                  Display this help
-d                  Remove duplicate masses from labeling
-e                  Use theoretical mass
-f    fasta_file    Fasta file to relate peptides to their proteins (this 
is opt
ional).
-g    mass_modifs   List of allowed fragment mass modifications. Useful for 
phos
phorylation and neutral losses. Example: -g -79.97,-97.98,-17.03,-18.01
-i    labeling_file File containing the amino acid isotopic labeling mass 
shifts
. If this option is used, heavy transitions will be generated.
-k    output_key    Select the output provided. Keys available: openswath, 
peakv
iew. Default: peakview
-l    mass_limits   Lower and upper mass limits of fragment ions. Example: 
-l 40
0,2000
-m    mods_file     File with the modifications delta mass
-n    int           Max number of reported ions per peptide/z. Default: 20
-o    int           Min number of reported ions per peptide/z. Default: 3
-p    float         Maximum error allowed at the annotation of a fragment 
ion. D
efault: 0.05
-q    int            Number of processors to use (only for isoforms!). 
Default:
1
-s    ion_series    List of ion series to be used. Example: -s y,b
-t    time-scale    Options: minutes, seconds. Default: seconds.
-u     unimod-code    Use this unimod code as a switching modification. 
Useful f
or phosphorylations. Example: -u 21
-v                  Verbose mode.
-w    swaths_file   File containing the swath ranges. This is used to 
remove tra
nsitions with Q3 falling in the swath mass range. (line breaks in 
windows/unix f
ormat)
-x    allowed_frg_z Fragment ion charge states allowed. Default: 1,2
-y    UIS-order     When using a switching modification, this determines 
the UIS
 order to be calculated. If -1 is set, all transitions for each isoform 
will be
reported. Default : 2
-a    outfile       Output file name (default: appends _peakview.txt)


C:\TPPdata>"

I'm not very knowledgeable with Python so I hope this is an easy fix; maybe 
the issue arises from the age of the script?

Thanks,
Steven

-- 
You received this message because you are subscribed to the Google Groups 
"spctools-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/spctools-discuss.
For more options, visit https://groups.google.com/d/optout.

#!C:\ProgramData\Anaconda2\python.exe
# -*- coding: utf-8  -*-
"""
=========================================================================
        msproteomicstools -- Mass Spectrometry Proteomics Tools
=========================================================================

Copyright (c) 2013, ETH Zurich
For a full list of authors, refer to the file AUTHORS.

This software is released under a three-clause BSD license:
 * Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
 * Neither the name of any author or any participating institution
   may be used to endorse or promote products derived from this software
   without specific prior written permission.
--------------------------------------------------------------------------
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL ANY OF THE AUTHORS OR THE CONTRIBUTING
INSTITUTIONS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
--------------------------------------------------------------------------
$Maintainer: Pedro Navarro$
$Authors: Pedro Navarro$
--------------------------------------------------------------------------
"""

from __future__ import print_function
from __future__ import division
import sys
import os
import csv
import getopt
import multiprocessing
import time
import math
import sys
import numbers
from configobj import ConfigObj

from msproteomicstoolslib.data_structures.aminoacides     import Aminoacides
from msproteomicstoolslib.data_structures.modifications    import Modifications
from msproteomicstoolslib.format.ProteinDB                import ProteinDB  
import msproteomicstoolslib.format.speclib_db_lib            as speclib_db_lib  
#import msproteomicstoolslib.utils.logs.MultiProcessingLog     as MPlog 


def usage() :
    print("")
    print("spectrast2tsv.py")
    print("-" * 15)
    print("This script is used as filter from spectraST files to swath input files.")
    print("")
    print("Usage: ")
    print("python spectrast2tsv.py [options] spectrast_file(s)")
    print("-h                  Display this help")
    print("-d                  Remove duplicate masses from labeling")
    print("-e                  Use theoretical mass")
    print("-f    fasta_file    Fasta file to relate peptides to their proteins (this is optional).")
    print("-g    mass_modifs   List of allowed fragment mass modifications. Useful for phosphorylation and neutral losses. Example: -g -79.97,-97.98,-17.03,-18.01")
    print("-i    labeling_file File containing the amino acid isotopic labeling mass shifts. If this option is used, heavy transitions will be generated.")
    print("-k    output_key    Select the output provided. Keys available: openswath, peakview. Default: peakview")
    print("-l    mass_limits   Lower and upper mass limits of fragment ions. Example: -l 400,2000")
    print("-m    mods_file     File with the modifications delta mass")
    print("-n    int           Max number of reported ions per peptide/z. Default: 20")
    print("-o    int           Min number of reported ions per peptide/z. Default: 3")
    print("-p    float         Maximum error allowed at the annotation of a fragment ion. Default: 0.05")
    print("-q    int            Number of processors to use (only for isoforms!). Default: 1")
    print("-s    ion_series    List of ion series to be used. Example: -s y,b")
    print("-t    time-scale    Options: minutes, seconds. Default: seconds.")
    print("-u     unimod-code    Use this unimod code as a switching modification. Useful for phosphorylations. Example: -u 21")
    print("-v                  Verbose mode.")
    print("-w    swaths_file   File containing the swath ranges. This is used to remove transitions with Q3 falling in the swath mass range. (line breaks in windows/unix format)")
    print("-x    allowed_frg_z Fragment ion charge states allowed. Default: 1,2")
    print("-y    UIS-order     When using a switching modification, this determines the UIS order to be calculated. If -1 is set, all transitions for each isoform will be reported. Default : 2")
    print("-a    outfile       Output file name (default: appends _peakview.txt)")
    print("")

def writeStandardConfigFile(filename):

    config = ConfigObj()

    config.filename = filename

    # {'M': {'fv': ('-1.3094', '0'), 'sr': ('-13.0000', '0'), 'is': ('25.6100', '0'), 'pb': ('0.9500', '0'), 'ex': ('0.3476', '0'), 'hs': ('30.1900', '0'), 'sc': ('17.1900', '0')}}
    # {'S': {'pb': ('0.9774', '0'), 'xc': ('4.2610', '0'), 'dc': ('1.0000', '0'), 'fv': ('1.3848', '0')}}


    #Section SEQUEST
    config['SEQUEST'] = {}
    config['SEQUEST']['id'] = 'S'
    config['SEQUEST']['pb'] = 0.5
    config['SEQUEST']['xc'] = 2.5
    config['SEQUEST']['dc'] = 0
    config['SEQUEST']['fv'] = 0


    #Section MASCOT
    config['MASCOT'] = {}
    config['MASCOT']['id'] = 'M'
    config['MASCOT']['fv'] = 0
    config['MASCOT']['sr'] = 0
    config['MASCOT']['is'] = 0
    config['MASCOT']['pb'] = 0
    config['MASCOT']['ex'] = 0
    config['MASCOT']['hs'] = 0
    config['MASCOT']['sc'] = 0

    config.write()

class Label(object):

    def __init__(self, AA, deltamass, name):
        self.AA = AA
        self.deltamass = deltamass
        self.name = name

def readLabelingFile(labeling_file) :
    #Returns a dictionary of amino-acides (including also C-Term and N-Term) with the mass shifts due to an isotope labeling experiment.
    labeling = {}

    cur_file = open (labeling_file,"r")
    for line in cur_file:
        # Skip empty lines or comments
        if len(line.strip()) == 0 : continue
        if line[0] == '#' : continue

        labelname = ''
        aminoacid = ''
        mass_shift = 0.0

        sline = line.strip().split('\t')
        if len(sline) >= 2 :
            aminoacid  = sline[0]
            mass_shift = float(sline[1])
            if len(sline) >= 3:
                labelname = sline[2]

        labeling[aminoacid] = Label(aminoacid, mass_shift, labelname)

    cur_file.close()

    print("Labeling file :" , labeling)
    return labeling

def read_swathsfile(swathsfile) :
    swaths = []
    with open(swathsfile) as infile:
        for count, row in enumerate(infile):
            if not row:
                print("Swaths file contains %s swaths" % count)
                break

            if row[0] == '#':
                continue

            srow = row.split("\t")
            if len(srow) != 2:
                print("Error when reading swaths file. Are there more than two values in the same row?")
                sys.exit(2)

            try:
                swaths.append((float(srow[0]), float(srow[1])))
            except ValueError as e:
                print("Error while reading swaths file: Some value(s) "
                      "are not numbers: " + str(e))
                sys.exit(2)

    return swaths
    

def is_Q3_in_swath_range(q1 , q3 , swaths):
    #determine which is the swath
    swath = []
    for sw in swaths :
        if q1 >= sw[0] and q1 <= sw[1] :
            swath = sw

    #If there is no swath --> return True, so that peptides out of Q1 experiment limits are removed
    if len(swath) == 0 : return True

    if ( q3 >= swath[0] and q3<=swath[1] ) : return True
    else : return False

def removeDuplicates(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x

    seen = {}
    result = []

    for item in seq:
        marker = idfun(item)
        if marker in seen: continue

        seen[marker] = 1
        result.append(item)

    return result


def removeSimilarDuplicates(seq, tolerance, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x

    seen = set()
    result = []

    for item in seq:
        marker = idfun(item)
        if not isinstance(marker, numbers.Real):
            raise ValueError("comparison values are supposed to be numbers!")

        if any(abs(marker - cp) <= tolerance for cp in seen):
            continue

        seen.add(marker)
        result.append(item)

    return result


def filterBySearchEngineParams(searchEngineInfo, parameter_thresholds) :
    spectrumOK  = True
    #print parameter_thresholds
    #print searchEngineInfo

    if not 'id' in parameter_thresholds :
        return True

    if not parameter_thresholds['id'] in searchEngineInfo :
        return True

    for parameter,threshold in parameter_thresholds.items() :
        if parameter in ('id') : continue
        if parameter in searchEngineInfo[parameter_thresholds['id']] :
            if float( searchEngineInfo[parameter_thresholds['id']][parameter][0] ) < float(threshold) :
                spectrumOK = False

    return spectrumOK


def get_iso_species(sptxtfile, switchingModification, modificationsLib, aaLib = None) :
    '''It reads a spectraST file, and gets a dictionary of peptides containing a dictionary of isoforms of the peptide. 
    The values of the latter dictionary are the file offset in which the related species are located at the spectraST file. If
    the specie is not located (it has only been theoretically defined), the offset value is -1. 
    Example : iso_species = { 'PEPTSISE_(UniMod:21)' : {'PEPTS(UniMod:21)ISE' : 335542366 , 'PEPTSIS(UniMod:21)E' : -1 } },
    where PEPTSISE_(UniMod:21) means that the peptide contains only one phosphorilation. Two phosphorilations are reported as 
    PEPTSISE_(UniMod:21)(UniMod:21)
    '''
    iso_species = {}
    library_key = 99
    spectrastlib = speclib_db_lib.Library(library_key)
    num_spectrum = 0
    offset = spectrastlib.get_first_offset(sptxtfile)
    last_offset = -100
    unimodcode = ''
    switchingMod = modificationsLib.mods_unimods[switchingModification]
    unimodcode = switchingMod.getcode('unimod')[1:]

    while ( offset - last_offset > 10) :
        last_offset = offset
        offset , spectrum = spectrastlib.read_sptxt_with_offset(sptxtfile,offset)
        sequence = spectrum.name.split('/')[0]
        z_parent = float(spectrum.name.split('/')[1])
        #Get the possible modification switching sites. If there is only one possibility, discard the peptide for the dictionary
        pep = modificationsLib.translateModificationsFromSequence(sequence, 'TPP', aaLib = aaLib)
        isobaric_peptides = []
        if len(pep.modifications) == 0 : continue
        #Count the number of switching mods
        num_of_swmods = 0
        for pos,mod in pep.modifications.items() :
            if mod.unimodAccession == switchingModification : num_of_swmods += 1
        #Count the number of modification sites in the peptide
        aaTargets = [ m.aminoacid for m in modificationsLib.list if m.unimodAccession == switchingModification ]
        aaList = pep._getAminoacidList()
        num_of_sites = 0
        for aa in aaTargets :
            if aa in aaList : num_of_sites += aaList[aa]
        if num_of_sites < 2 : continue 
        #If the peptide+#switchingMods is not present in the dictionary, initialize its key with all the possible isobaric species.
        #Otherwise, find the corresponding specie in the dictionary, and change its offset value.
        specie_family = pep.sequence + "_%s" % (num_of_swmods * unimodcode)
        specie = pep.getSequenceWithMods('unimod')
        if specie_family in iso_species : 
            if specie in iso_species[specie_family] : iso_species[specie_family][specie] = last_offset
        else :
            iso_species[specie_family] = {}
            #Calculate here all the possible isobaric peptides, and initialize the value in the dictionary
            isobaric_peptides = pep.calIsoforms(switchingMod,modificationsLib)    
            for isopep in isobaric_peptides : 
                iso_species[specie_family][isopep.getSequenceWithMods('unimod')] = -1
                if isopep.getSequenceWithMods('unimod') == specie : iso_species[specie_family][specie] = last_offset 
    return iso_species

def isoform_writer(isobaric_species, lock, sptxtfile, modLibrary, aaLib, searchEngineconfig, masslimits, switchingModification_, 
                writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, key, useMinutes, precision):
    filteredtransitions = []
    library_key = 99
    spectrastlib = speclib_db_lib.Library(library_key)
    switchingModification = modLibrary.mods_unimods[switchingModification_]
    
    pepfamily_cnt = 0
    precursor_cnt = 0   # To-Do : This maybe I should pass to the worker the indexes, so that we don't repeat indexes. 
    transition_cnt = 0  # To-Do : This maybe I should pass to the worker the indexes, so that we don't repeat indexes.
    fut_progress = 0.05
    for pepfamily , isoforms in isobaric_species.items() :
        pepfamily_cnt += 1
        progress = pepfamily_cnt / len(isobaric_species)
        if progress >= fut_progress and progress > 0.005 : 
            print("process id:", os.getpid() , ", ", fut_progress*100 ,  "% of peptide families written.")
            fut_progress += 0.05 
        
        #print pepfamily, isoforms
        precursor_cnt  += 1
        protein_code1 = pepfamily
        protein_desc  = pepfamily
        #for isoform, offset in isoforms.iteritems() :
        #    print "%s : %s"  % (isoform , offset)
        for isoform_,offset in isoforms.items() :
            spectrum = None
            isoform  = modLibrary.translateModificationsFromSequence(isoform_, 'unimod', aaLib = aaLib)
            
            if offset > 0 :  
                if lock : lock.acquire()
                _ , spectrum = spectrastlib.read_sptxt_with_offset(sptxtfile,offset)
                time.sleep(0.001)
                if lock : lock.release()
            #Get the shared and unshared ions of this isoform to all its family
            otherIsoforms = []
            for isof in isoforms:
                if isof != isoform_:
                    otherIsoforms.append(modLibrary.translateModificationsFromSequence(isof, 'unimod', aaLib=aaLib))
            
            shared, unshared = isoform.comparePeptideFragments(otherIsoforms, ['y','b'], precision = 1e-5)
        
            #if there is a spectrum for the isoform, use it. Otherwise, use dummy data
            z_parent = 2  #To-Do: Better if we'd take the most common parental charge among the family
            sequence = isoform.getSequenceWithMods('TPP') # spectrum.name.split('/')[0]
            iRT_experimental = -100000.0
            RT_experimental = -100000.0
            searchenginefiltered = False
            peaks = []
            if spectrum : 
                z_parent = float(spectrum.name.split('/')[1])
                if spectrum.RetTime_detected:
                    RT_experimental = spectrum.RetTime / 60.0   #PeakView expect minutes, and spectraST reports seconds.
                if spectrum.iRT_detected:
                    iRT_experimental = spectrum.iRT / 60.0   #PeakView expect minutes, and spectraST reports seconds.
                else:
                    iRT_experimental = RT_experimental
                if not useMinutes : iRT_experimental = iRT_experimental * 60
                if not useMinutes : RT_experimental = RT_experimental * 60
                try :
                    for searchengine in searchEngineconfig :
                        if not filterBySearchEngineParams(spectrum.searchEngineInfo, searchEngineconfig[searchengine] ) :
                            searchenginefiltered = True
                            continue
                except AttributeError :
                    pass
                peaks = spectrum.get_peaks()
            precursorMZ = isoform.getMZ(z_parent, label ='')
            

            if searchenginefiltered : peaks = []
            
            
            for (frg_serie,frg_nr,frg_z,gainloss,fragment_mz) in unshared :
                #Check whether we have the fragment in the spectrum    
                rel_intensity = 1 #Dummy value in case there's no spectrum
                is_in_spectrum = False
                for peak in peaks :
                    ''' These filters are not useful in this use case, since they will be filtered in a previous step
                    if peak.is_unknown : continue
                    if peak.frg_is_isotope    : continue
                    if peak.frg_z not in frgchargestate : continue
                    if hasattr(peak, 'mass_error') :
                        if abs(peak.mass_error) > precision : continue
                    '''
                    if abs(float(peak.peak) - fragment_mz) < precision : 
                        is_in_spectrum = True 
                        rel_intensity = float(peak.intensity)
                        break
                #print isoform.getSequenceWithMods('unimod') , frg_serie, frg_nr, frg_z, fragment_mz, is_in_spectrum, rel_intensity
                #Filter by mass range
                if fragment_mz < masslimits[0] : continue
                if fragment_mz > masslimits[1] : continue
                    
                code = 'ProteinPilot'
                if key == 'openswath'     : code = 'unimod'
                if key == 'peakview'     : code = 'ProteinPilot'

                transition = []
                transition_cnt += 1
                if key == 'peakview' :
                    transition = [ precursorMZ , fragment_mz , iRT_experimental , protein_desc , 'light' ,
                                    rel_intensity , isoform.sequence , isoform.getSequenceWithMods(code) , int(z_parent) ,
                                    frg_serie , frg_z , frg_nr , iRT_experimental , protein_code1 , 'FALSE']
                if key == 'openswath' :
                    transition = [precursorMZ, fragment_mz, iRT_experimental, "%s_%s_%s" % (transition_cnt, isoform.getSequenceWithMods(code), int(z_parent)), '-1',
                            rel_intensity, "%s_%s_%s" % (precursor_cnt, isoform.getSequenceWithMods(code), int(z_parent)), 0, isoform.sequence, protein_desc, 
                            "%s_%s_%s" %(frg_serie, frg_z, frg_nr), isoform.getSequenceWithMods(code), int(z_parent), 'light', protein_code1, frg_serie, frg_z, frg_nr ]
                filteredtransitions.append(transition)
            
            #For each isoform, filtering and write
            do_filtering_and_write(filteredtransitions, writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, 0.02, lock = lock)
            filteredtransitions = []
            


def mp_isoform_writer(isobaric_species, sptxtfile, modLibrary, aaLib, searchEngineconfig, masslimits, switchingModification_, 
                writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, key,useMinutes, precision, nprocs):
    def worker(pepfamilies, lock, sptxtfile, modLibrary, aaLib, searchEngineconfig, masslimits, switchingModification_, 
                writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, key, useMinutes, precision):
        """ The worker function, invoked in a process. 'nums' is a
            list of pep families to process.
        """
        #outdict = {}
        #To-Do: This worker is absolutely absurd --> REDEFINE!!!!
        isoform_writer(pepfamilies, lock, sptxtfile, modLibrary, aaLib, searchEngineconfig, masslimits, switchingModification_, 
                writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, key, useMinutes, precision)
        #out_q.put(outdict)

    # Each process will get 'chunksize' nums and a queue to put his out
    # dict into
    #out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(isobaric_species) / float(nprocs)))
    print("%s peptide families divided into %s chunks of %s" % (len(isobaric_species) , nprocs, chunksize) )
    procs = []
    lock = multiprocessing.Lock()
    #lock = None
    
    dict_chunks = {}
    last_chunk = 0
    dict_chunks[last_chunk] = {}
    for idx, (pepf, value) in enumerate(isobaric_species.items()) : #divide the dictionary into nprocs different dictionaries
        if idx // chunksize > last_chunk:
            last_chunk += 1
            dict_chunks[last_chunk] = {}
        dict_chunks[last_chunk][pepf] = value

    for i in dict_chunks.values():
        p = multiprocessing.Process(
                target=worker,
                args=(i, lock, sptxtfile, modLibrary, aaLib, 
                    searchEngineconfig, masslimits, switchingModification_, 
                writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, key, useMinutes, precision))
        procs.append(p)
        p.start()
        #out_q.put(p)

    # Collect all results into a single result dict. We know how many dicts
    # with results to expect.
    #resultdict = {}
    #for i in range(nprocs):
    #    resultdict.update(out_q.get())

    # Wait for all worker processes to finish
    for p in procs:
        p.join()

    return 0


def transitions_isobaric_peptides(isobaric_species , sptxtfile, switchingModification_, modLibrary,UISorder, ionseries, fragmentlossgains, frg_z_list,  
                        useMinutes, searchEngineconfig, masslimits, key, precision,  
                        writer,  labeling, removeDuplicatesInHeavy, swaths, mintransitions, maxtransitions, massTolerance, aaLib = None, nprocs = 0, verbose = False) :
    '''
    This returns the (specific!) transitions of all the isoforms of the peptides present in the spectraST library. If there is no
    spectraST reference for an isoform, transitions are chosen randomly, respecting the established filter rules, and a dummy 
    relative intensity value is given.
    '''
    library_key = 99
    spectrastlib = speclib_db_lib.Library(library_key)
    switchingModification = modLibrary.mods_unimods[switchingModification_]
    
    filteredtransitions = []
    transition_cnt = 0
    precursor_cnt  = 0
    
    print("A total of %s peptide families have been found."  % len(isobaric_species) )
    
    pepfamily_cnt = 0
    fut_progress = 0.01

    #Multiprocess this part of the code (only if multiprocess option has been specified)
    if nprocs > 0 :
        mp_isoform_writer(isobaric_species, sptxtfile, modLibrary, aaLib, searchEngineconfig, masslimits, switchingModification_, 
                    writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, key, useMinutes, precision, nprocs)
        print("done!")
        sys.exit()
    
    for pepfamily , isoforms in isobaric_species.items() :
        #print pepfamily, [ isof for isof in isoforms]
        pepfamily_cnt += 1
        progress = pepfamily_cnt / len(isobaric_species)
        if progress >= fut_progress and progress > 0.005 : 
            print(fut_progress*100 ,  "% of peptide families written.")
            fut_progress += 0.01 
        
        #print pepfamily, isoforms
        precursor_cnt  += 1
        protein_code1 = pepfamily
        protein_desc  = pepfamily
        #for isoform, offset in isoforms.iteritems() :
        #    print "%s : %s"  % (isoform , offset)
        for isoform_,offset in isoforms.items() :
            spectrum = None
            isoform  = modLibrary.translateModificationsFromSequence(isoform_, 'unimod', aaLib = aaLib)
            
            if offset > 0:
                _, spectrum = spectrastlib.read_sptxt_with_offset(sptxtfile,offset)
            #Get the shared and unshared ions of this isoform to all its family
            otherIsoforms = []
            for isof in isoforms:
                if isof != isoform_:
                    otherIsoforms.append(modLibrary.translateModificationsFromSequence(isof, 'unimod', aaLib = aaLib))
            
            uis_list , uis_annotated_list = isoform.cal_UIS(otherIsoforms, UISorder = UISorder,  ionseries = ionseries, 
                                            fragmentlossgains = fragmentlossgains, precision = massTolerance, frg_z_list = frg_z_list, mass_limits = masslimits)
            
            #if there is a spectrum for the isoform, use it. Otherwise, use dummy data
            z_parent = 2  #To-Do: Better if we'd take the most common parental charge among the family
            sequence = isoform.getSequenceWithMods('TPP') # spectrum.name.split('/')[0]
            iRT_experimental = -100000.0
            RT_experimental = -100000.0
            searchenginefiltered = False
            peaks = []
            if spectrum : 
                z_parent = float(spectrum.name.split('/')[1])
                if spectrum.RetTime_detected:
                    RT_experimental = spectrum.RetTime / 60.0   #PeakView expect minutes, and spectraST reports seconds.
                if spectrum.iRT_detected:
                    iRT_experimental = spectrum.iRT / 60.0   #PeakView expect minutes, and spectraST reports seconds.
                else:
                    iRT_experimental = RT_experimental
                if not useMinutes : iRT_experimental = iRT_experimental * 60
                if not useMinutes : RT_experimental = RT_experimental * 60
                try :
                    for searchengine in searchEngineconfig :
                        if not filterBySearchEngineParams(spectrum.searchEngineInfo, searchEngineconfig[searchengine] ) :
                            searchenginefiltered = True
                            continue
                except AttributeError :
                    pass
                peaks = spectrum.get_peaks()
            precursorMZ = isoform.getMZ(z_parent, label ='')
            

            if searchenginefiltered : peaks = []
            #print uis_annotated_list
            for uis_masses, uis in uis_annotated_list.items() :
                for uis_unit in uis :
                    # uis_annotated_list = [('y', 6, 1, 0, 646.3406318939999), ('y', 8, 1, 0, 928.365933736)]
                    #print uis_unit
                    uis_order   = len(uis)
                    uis_serie   = uis_unit[0]
                    uis_nr      = uis_unit[1]
                    uis_frg_z   = uis_unit[2]
                    uis_lossgain= uis_unit[3]
                    uis_mass    = uis_unit[4]
                    rel_intensity = 1 #Dummy value in case there's no spectrum
                    is_in_spectrum = False
                    for peak in peaks : 
                        if abs(float(peak.peak) - uis_mass) < precision:
                            is_in_spectrum = True 
                            rel_intensity = float(peak.intensity)
                            break
                        
                    
                    if uis_mass < masslimits[0]:
                        continue   # I don't think this is necessary,
                    if uis_mass > masslimits[1]:
                        continue   # since UIS are already mass limited, but...
               
                    code = 'ProteinPilot'
                    if key == 'openswath':
                        code = 'unimod'
                    if key == 'peakview':
                        code = 'ProteinPilot'
    
                    transition = []
                    transition_cnt += 1
                    if key == 'peakview':
                        if abs(uis_lossgain) > 0.05:
                            uis_serie = uis_serie + str(int(round(uis_lossgain)))
                        transition = [
                            precursorMZ, uis_mass, iRT_experimental,
                            protein_desc, 'light', rel_intensity,
                            isoform.sequence, isoform.getSequenceWithMods(code),
                            int(z_parent), uis_serie, uis_frg_z, uis_nr,
                            iRT_experimental, protein_code1, 'FALSE',
                            uis_order, uis_masses
                        ]
                    if key == 'openswath' :
                        transition = [precursorMZ, uis_mass, iRT_experimental, "%s_%s_%s" % (transition_cnt, isoform.getSequenceWithMods(code), int(z_parent)), '-1',
                                rel_intensity, "%s_%s_%s" % (precursor_cnt, isoform.getSequenceWithMods(code), int(z_parent)), 0, isoform.sequence, protein_desc, 
                                "%s_%s_%s" %(uis_serie, uis_frg_z, uis_nr), isoform.getSequenceWithMods(code), int(z_parent), 'light', protein_code1, uis_serie, uis_frg_z, uis_nr, uis_order, uis_masses ]
                    filteredtransitions.append(transition)
            
            #For each isoform, filtering and write
            if verbose and ( len(filteredtransitions) > maxtransitions or len(filteredtransitions) ) < mintransitions : print(mintransitions, maxtransitions, len(filteredtransitions))
            do_filtering_and_write(filteredtransitions, writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, 0.02, verbose = verbose)
            filteredtransitions = []

    #print  filteredtransitions
    return 0

def main(argv) :

    fastafile        = ''
    swathsfile        = ''
    masslimits        = [0,30000]
    ionseries        = []
    useexactmass    = False
    modificationsfile = ''
    maxtransitions    = 20
    mintransitions    = 3
    frgchargestate    = [1,2]
    precision        = 0.05
    searchEngineconfig = []
    gain_or_loss_mz_txt = []
    gain_or_loss_mz = []
    labeling = {}
    codes = ["openswath", "peakview"]
    removeDuplicatesInHeavy = False
    useMinutes = False
    keys            = ['openswath','peakview']
    key                = 'peakview'
    outputfile = None
    switchingModification = None
    aaLib = Aminoacides()
    nprocs = 0
    verbose = False
    UISorder = 2
    
    csv_headers_peakview =     [    'Q1', 'Q3', 'RT_detected', 'protein_name', 'isotype',
                     'relative_intensity', 'stripped_sequence', 'modification_sequence', 'prec_z',
                     'frg_type', 'frg_z', 'frg_nr', 'iRT', 'uniprot_id', 'decoy' , 'N', 'confidence', 'shared'
                     ]
    csv_headers_openswath = ['PrecursorMz', 'ProductMz', 'Tr_recalibrated', 'transition_name', 'CE',
                            'LibraryIntensity', 'transition_group_id', 'decoy', 'PeptideSequence', 'ProteinName', 
                            'Annotation', 'FullUniModPeptideName', 
                            'PrecursorCharge', 'PeptideGroupLabel',   'UniprotID', 'FragmentType', 'FragmentCharge',
                            'FragmentSeriesNumber', 'LabelType']
    
    csv_headers = csv_headers_peakview

    swaths = []
    #swaths =[(400,425),(424,450),(449,475),(474,500),(499,525),
    #         (524,550),(549,575),(574,600),(599,625),(624,650),
    #         (649,675),(674,700),(699,725),(724,750),(749,775),
    #         (774,800),(799,825),(824,850),(849,875),(874,900),
    #         (899,925),(924,950),(949,975),(974,1000),(999,1025),
    #         (1024,1050),(1049,1075),(1074,1100),(1099,1125),(1124,1150),
    #         (1149,1175),(1174,1200)]

    #Get options
    try:
        opts, _ = getopt.getopt(argv, "hf:l:s:en:m:o:w:c:z:g:i:dx:p:t:k:a:u:q:vy:",["help","fasta","limits","series","exact","max","modifications","min","swaths","config","writeconfig","gain","isot-labeling","remove-duplicates","charge","precision","timescale","key","output","switchingmod","nprocs","verbose","UISorder"])

    except getopt.GetoptError:
        usage()
        sys.exit(2)

    argsUsed = 0
    for opt,arg in opts:
        if opt in ("-h","--help") :
            usage()
            sys.exit()
        if opt in ("-a", "--output") :
            outputfile = arg
            argsUsed += 2
        if opt in ("-f","--fasta") :
            fastafile = arg
            argsUsed += 2
        if opt in ("-m","--modifications") :
            modificationsfile = arg
            argsUsed += 2
        if opt in ("-w","--swaths") :
            print("swathsfile : " , arg)
            swathsfile = arg
            argsUsed += 2
        if opt in ("-l","--limits") :
            masslimits = []
            masslimits_txt = arg.split(',')
            try :
                for val in masslimits_txt : masslimits.append( float(val) )
            except :
                print("Mass range limits are not a number! Please, try again.")
                sys.exit(2)
            argsUsed += 2
        if opt in ("-s","--series") :
            ionseries = arg
            argsUsed += 2
        if opt in ("-e","--exact") :
            useexactmass = True
            argsUsed += 1
        if opt in ("-n","--max") :
            try :
                maxtransitions = int(arg)
            except :
                print("Max number of transitions is not an integer! Please, try again.")
                sys.exit(2)
            argsUsed += 2
        if opt in ("-o","--min") :
            try :
                mintransitions = int(arg)
            except :
                print("Min number of transitions is not an integer! Please, try again.")
            argsUsed += 2
        if opt in ("-c","--config") :
            searchEngineconfig = ConfigObj(arg)
            argsUsed+=2
        if opt in ("-z","--writeconfig") :
            writeStandardConfigFile(arg)
            sys.exit()
        if opt in ("-g","--gain") :
            gain_or_loss_mz_txt = arg.split(',')
            for obj in gain_or_loss_mz_txt :
                gain_or_loss_mz.append(float(obj))
            argsUsed += 2
        if opt in("-i","--isot-labeling") :
            labeling = readLabelingFile(arg)
            argsUsed += 2
        if opt in("-d","--remove-duplicates") :
            removeDuplicatesInHeavy = True
            argsUsed += 1
        if opt in("-x","--charge") :
            frgchargestate = []
            frgchargestate_txt = arg.split(',')
            for obj in frgchargestate_txt :
                frgchargestate.append(int(obj))
            argsUsed += 2
        if opt in("-p","--precision") :
            precision = float(arg)
            argsUsed += 2
        if opt in("-t","--timescale") :
            argsUsed += 2
            if arg in ["minutes","seconds"] :
                if arg in ["minutes"] : useMinutes = True
            else :
                print("Choose a right time-scale. Options are: minutes, seconds")
                sys.exit(10) 
        if opt in ('-k', 'key') :
            if arg not in codes :
                print("Error: key option is not valid! key : " , arg)
                print("Valid options are : " , keys)
                sys.exit(2)
            key = arg
            if key == 'openswath'     : csv_headers = csv_headers_openswath
            if key == 'peakview'    : csv_headers = csv_headers_peakview
            argsUsed += 2
        if opt in ('-u', 'switchingmod') :
            argsUsed += 2
            switchingModification = int(arg)
        if opt in ('-q', '--nprocs') :
            argsUsed += 2
            nprocs = int(arg)
        if opt in ('-v','--verbose') :
            argsUsed += 1
            verbose = True
        if opt in ('-y','--UISorder') :
            argsUsed += 2
            UISorder = int(arg)
            

    print("Masslimits:",masslimits)

    if mintransitions > maxtransitions :
        print("This might seem a bit fool, but... You can't select a minimum number of transitions higher than the maximum!! ")
        print("Min : " , mintransitions , " Max :" , maxtransitions)
        sys.exit(2)


    sptxtfiles = argv[argsUsed:]
    if len(sptxtfiles) == 0:
        print("No input files given")
        sys.exit(2)
    
    #If a modifications file is provided, update the Modifications
    modificationsLib = Modifications()     #None
    if len(modificationsfile) > 0 :
        modificationsLib.readModificationsFile(modificationsfile)
    print("Modifications used : ", list(modificationsLib.mods_TPPcode.keys()))
    
    
    #If a fasta file is provided, read and store it into a dictionary
    
    proteins = None
    if len(fastafile) > 0 :
        proteins = ProteinDB()
        print("Reading fasta file :" , fastafile)
        proteins.readFasta(fastafile)

    protein_cnt = 1
    protein_index = {}

    #Read swaths file (if provided)
    if swathsfile != '' :
        swaths = read_swathsfile(swathsfile)
    else:
        #print "Using default swath windows",swaths
        print("No swath windows set.")

    for sptxtfile in sptxtfiles :
        print("Reading : " , sptxtfile)
        if not os.path.exists(sptxtfile):
            print("The file: %s does not exist!" % sptxtfile)
            sys.exit(2)

        if outputfile is None:
            peakviewfilename = sptxtfile[:-6] + "_peakview.txt"
        else:
            peakviewfilename = outputfile
        try :
            writer = csv.writer(open(peakviewfilename,'w'), dialect='excel-tab')
        except :
            print("something went wrong while trying to write the file :" , peakviewfilename)
            sys.exit(1)

        #write the headers
        if switchingModification : 
            csv_headers.extend(["UIS_order", "UIS_mass_list"])
        writer.writerow( csv_headers )

        #If a switching modification has been defined, a dictionary containing isobaric species must be created
        if switchingModification :
            #Check whether the switching modification is actually present in the modifications lib.
            if switchingModification not in modificationsLib.mods_unimods :
                raise Exception("Error: the switching modification given by the user is not in the modifications library!")
            switchingMod = modificationsLib.mods_unimods[switchingModification]
            isobaric_species = get_iso_species(sptxtfile, switchingModification, modificationsLib, aaLib = aaLib)
            if 0 not in gain_or_loss_mz : gain_or_loss_mz.append(0.0)
            transitions_isobaric_peptides(isobaric_species , sptxtfile, switchingModification, modificationsLib, UISorder, ionseries, gain_or_loss_mz, frgchargestate,
                        useMinutes, searchEngineconfig, masslimits, key, precision, writer,  labeling, removeDuplicatesInHeavy, swaths,mintransitions, maxtransitions, 0.02, aaLib = aaLib, nprocs = nprocs, verbose = verbose)
            
            print("done!")
            sys.exit()

        filteredtransitions = []

        library_key = 99
        spectrastlib = speclib_db_lib.Library(library_key)
        num_spectrum = 0
        offset = spectrastlib.get_first_offset(sptxtfile)
        last_offset = -100
        modification_code = 'TPP'
        
        transition_cnt = 0
        precursor_cnt = 0
        
        while ( offset - last_offset > 10) :
            last_offset = offset
            offset , spectrum = spectrastlib.read_sptxt_with_offset(sptxtfile,offset)

            #for property, value in vars(spectrum).iteritems():
            #    if property    in ['compress_spectra' ] : continue
            #    print property, ": ", value
            #sys.exit()

            sequence = spectrum.name.split('/')[0]
            z_parent = float(spectrum.name.split('/')[1])

            #print sequence, z_parent

            #Declare the peptide
            pep = modificationsLib.translateModificationsFromSequence(sequence, modification_code, aaLib = aaLib)
            
            iRT_experimental = 0.0
            RT_experimental = 0.0
            if spectrum.RetTime_detected:
                RT_experimental = spectrum.RetTime / 60.0   #PeakView expect minutes, and spectraST reports seconds.
            if spectrum.iRT_detected:
                iRT_experimental = spectrum.iRT / 60.0   #PeakView expect minutes, and spectraST reports seconds.
            else:
                iRT_experimental = RT_experimental

            if not useMinutes : iRT_experimental = iRT_experimental * 60
            if not useMinutes : RT_experimental = RT_experimental * 60


            ###only if fasta file set
            spec_proteins = []
            if proteins : spec_proteins = proteins.get_proteins_containing_peptide(pep.sequence)

            protein_code1 = ''
            protein_desc  = ''

            for prot in spec_proteins :
                protein_code1    += prot.code1
                protein_code1    += ','
                protein_desc    += prot.description
                protein_desc    += ','

            if len(protein_code1) > 0 : protein_code1 = protein_code1[:-1]
            if len(protein_desc) > 0 :  protein_desc  = protein_desc[:-1]

            if len(protein_code1) == 0 :
                if hasattr(spectrum, 'protein_ac') : protein_code1 = spectrum.protein_ac
                else : protein_code1 = 'unknown'
            if len(protein_desc)  == 0 :
                if hasattr(spectrum, 'protein_ac') : protein_desc = spectrum.protein_ac
                else : protein_desc  = 'unknown'
            ###endfasta
            
            
            num_spectrum = num_spectrum +1
            if (num_spectrum % 1000 == 0) : print("spectra processed: %s" % num_spectrum)

            precursorMZ = spectrum.precursorMZ
            if useexactmass : #calculate Q1 and Q3 mass/charge values
                precursorMZ = pep.getMZ(z_parent , label = '')

            searchenginefiltered = False
            try :
                for searchengine in searchEngineconfig :
                    if not filterBySearchEngineParams(spectrum.searchEngineInfo, searchEngineconfig[searchengine] ) :
                        searchenginefiltered = True
                        continue
            except AttributeError :
                pass

            peaks = spectrum.get_peaks()
            if searchenginefiltered : peaks = []

            filteredtransitions = []
            precursor_cnt += 1
            for peak in peaks :
                if peak.is_unknown : continue
                if peak.frg_is_isotope    : continue
                if peak.frg_z not in frgchargestate : continue
                if hasattr(peak, 'mass_error') :
                    if abs(peak.mass_error) > precision : continue
                #If exact mass selected, calculate the fragment mass, otherwise keep "experimental" mass
                fragment_mz = float(peak.peak)
                if useexactmass :
                    fragment_mz  = pep.getMZfragment(peak.frg_serie , peak.frg_nr , peak.frg_z , label = '')

                #If ion series were specified by the user, filter those not matching user preferences.
                if len(ionseries) > 0 :
                    if peak.frg_serie not in ionseries : continue

                #Filter by mass range
                if fragment_mz < masslimits[0] : continue
                if fragment_mz > masslimits[1] : continue

                #Filter fragment losses and gains, isotopes...
                if peak.is_frg_loss :
                    objfound = False
                    for mz in gain_or_loss_mz :
                        #print mz, peak.frg_loss, abs(mz + peak.frg_loss[0] )
                        if abs(mz + peak.frg_loss[0]) < 0.05 :
                            objfound = True
                            if useexactmass : fragment_mz += mz / peak.frg_z
                            peak.frg_serie = peak.frg_serie + str(int(round(mz)))
                    if not objfound : continue
                if peak.is_frg_gain :
                    objfound = False
                    for mz in gain_or_loss_mz :
                        if abs(mz - peak.frg_gain[0]) < 0.05 :
                            objfound = True
                            if useexactmass : fragment_mz += mz / peak.frg_z
                            peak.frg_serie = peak.frg_serie + '+' + str(int(round(mz)))
                    if not objfound : continue


                #Write the data into the data matrix (transitions)

                code = 'ProteinPilot'
                if key == 'openswath'     : code = 'unimod'
                if key == 'peakview'    : code = 'ProteinPilot'

                if protein_desc not in protein_index:
                	protein_index[protein_desc] = protein_cnt
                	protein_cnt+=1

                if protein_desc[:2] == "1/":
                	protein_shared = "FALSE"
                else:
                	protein_shared = "TRUE"

                transition = []
                transition_cnt += 1
                if key == 'peakview' :
                    transition = [ precursorMZ , fragment_mz , iRT_experimental , protein_desc , 'light' ,
                                    peak.intensity , spectrum.sequence , pep.getSequenceWithMods(code) , int(z_parent) ,
                                    peak.frg_serie , peak.frg_z , peak.frg_nr , iRT_experimental , protein_code1 , 'FALSE', protein_index[protein_desc] , 1 , protein_shared ]
                if key == 'openswath' :
                    transition_group_id = "%s_%s_%s" % (precursor_cnt, pep.getSequenceWithMods(code), int(z_parent))
                    transition = [precursorMZ, fragment_mz, iRT_experimental, "%s_%s%s_%s_%s_%s" % (transition_cnt, peak.frg_serie, peak.frg_nr, peak.frg_z, pep.getSequenceWithMods(code), int(z_parent)), '-1',
                            peak.intensity, transition_group_id, 0, spectrum.sequence, protein_desc, 
                            peak.peak_annotation, pep.getSequenceWithMods(code),
                            int(z_parent), transition_group_id, protein_code1, peak.frg_serie, peak.frg_z,
                            peak.frg_nr , 'light']
                
                filteredtransitions.append(transition)
            

            #Sort transitions by frg_serie, frg_nr, frg_z and MINUS intensity, then remove duplicates
            #this means of every frag_serie/no/chg only the highest(!) intensity peak is stored
            if key == 'peakview':
                filteredtransitions = sorted(filteredtransitions, key=lambda x: (x[9], x[10], x[11], -x[5]))
                filteredtransitions = removeDuplicates(filteredtransitions, lambda x: (x[9], x[10], x[11]))
            elif key == 'openswath':
                filteredtransitions = sorted(filteredtransitions, key=lambda x: (x[15], x[16], x[17], -x[5]))
                filteredtransitions = removeDuplicates(filteredtransitions, lambda x: (x[15], x[16], x[17]))

            #REVERSE sort the transitions by intensity
            #this means the highest(!) intensity peaks of this transition are on top
            filteredtransitions = sorted ( filteredtransitions    , key=lambda transition : transition[5], reverse=True )

            #Remove transitions with very similar Q3 masses
            massTolerance = 0.02 #This should be configured by user --> TO-DO
            filteredtransitions = removeSimilarDuplicates (filteredtransitions , massTolerance , lambda x : x[1])

            #If a swaths file was provided, remove transitions falling into the swath
            filteredtransitions_tmp = []
            if len(swaths) > 0 :
                for tr in filteredtransitions :
                    if not is_Q3_in_swath_range(tr[0] , tr[1] , swaths) : filteredtransitions_tmp.append(tr)
                filteredtransitions = filteredtransitions_tmp

            #if less transitions than the minimum --> continue to next spectrum
            #print filteredtransitions
            if len(filteredtransitions) < mintransitions :
                filteredtransitions = [] #I don't think this is really necessary, just in case.
                continue

            #Write in the peakview input file (until the max number of transitions per peptide/z (the most intense N transitions)
            for index in range(0,min(len(filteredtransitions),maxtransitions)) :
                writer.writerow(filteredtransitions[index])

                #Isotopic labeling 
                #To-Do : All the labeling calculations must be reviewed and adapted to the Peptide class
                if len(labeling) > 0 :
                    heavy_transition       = filteredtransitions[index]
                    if key == 'peakview' :
                        heavy_transition[4] = 'heavy'
                        precursorMZ_heavy      = heavy_transition[0]
                        fragment_mz_heavy      = heavy_transition[1]
                        sequence_heavy         = heavy_transition[6]
                        z_parent               = heavy_transition[8]
                        frg_serie              = heavy_transition[9]
                        frg_z                   = heavy_transition[10]
                        frg_number               = heavy_transition[11]
                    if key == 'openswath' :
                        # heavy_transition[13] = 'heavy'
                        heavy_transition[18] = 'heavy'
                        heavy_transition[13] = heavy_transition[6]
                        precursorMZ_heavy      = heavy_transition[0]
                        fragment_mz_heavy      = heavy_transition[1]
                        sequence_heavy         = heavy_transition[8]
                        z_parent               = heavy_transition[12]
                        frg_serie              = heavy_transition[15]
                        frg_z                   = heavy_transition[16]
                        frg_number               = heavy_transition[17]

                        # Modify full sequence with name of label
                        fullseq = heavy_transition[11]
                        fullseq_n = ""
                        for aa in fullseq:
                            fullseq_n += aa
                            if aa in labeling and labeling[aa].name:
                                fullseq_n += "(%s)" % labeling[aa].name

                        # Store modified sequence and make transition id unique again
                        heavy_transition[11] = fullseq_n
                        heavy_transition[3] += 'heavy'
                        heavy_transition[6] += 'heavy'




                                        
                    # TODO also change full unimod peptide name     


                    #NOTE: This only works for y- and b- ions (AND their neutral losses). Other fragment series are ignored, and no heavy transition will be generated for them.
                    if frg_serie[0] not in ['y','b'] : continue


                    for aa in sequence_heavy :
                        if aa in labeling : precursorMZ_heavy += labeling[aa].deltamass / z_parent

                    frg_seq = sequence_heavy
                    #b series
                    if frg_serie == 'b': frg_seq = sequence_heavy[:frg_number]
                    #y series
                    if frg_serie == 'y': frg_seq = sequence_heavy[-frg_number:]

                    for aa in frg_seq :
                        if aa in labeling : fragment_mz_heavy += labeling[aa].deltamass / frg_z

                    #Check for C- and N-terminal labelings
                    if 'C-term' in labeling:
                        precursorMZ_heavy += labeling['C-term'].deltamass / z_parent
                        if frg_serie == 'y':
                            fragment_mz_heavy += labeling['C-term'].deltamass / frg_z
                    if 'N-term' in labeling:
                        precursorMZ_heavy += labeling['N-term'].deltamass / z_parent
                        if frg_serie == 'b':
                            fragment_mz_heavy += labeling['N-term'].deltamass / frg_z


                    #NOTE : if for any reason the Q3 mass has not changed (the labeling is not present in this fragment ion), it is not reported.
                    #        if Q3 remains the same, then the mass shift in Q1 should be enough to fall in a different SWATH window.
                    #        (In case a swaths mass ranges file is present, otherwise, we remove the not changing Q3 masses anyway).
                    if removeDuplicatesInHeavy and fragment_mz_heavy == filteredtransitions[index][1] :
                        if len(swaths) > 0 :
                            if is_Q3_in_swath_range(precursorMZ_heavy , filteredtransitions[index][0] , swaths) : continue
                        else : continue

                    #Write the heavy transition into the file
                    heavy_transition[0] = precursorMZ_heavy
                    heavy_transition[1] = fragment_mz_heavy
                    writer.writerow(heavy_transition)

            filteredtransitions = []

        print("file written : " , peakviewfilename)


    print("Done.")






def do_filtering_and_write(filteredtransitions, writer, labeling, removeDuplicatesInHeavy, swaths, mintransitions, maxtransitions, massTolerance = 0.02, verbose = False , lock = None) :
    #Sort transitions by frg_serie, frg_nr, frg_z and MINUS intensity, then remove duplicates
    #this means of every frag_serie/no/chg only the highest(!) intensity peak is stored
    if verbose : print("initial number transitions " , len(filteredtransitions))
    if key == 'peakview':
        filteredtransitions = sorted(filteredtransitions, key=lambda x: (x[9], x[10], x[11], -x[5]))
        filteredtransitions = removeDuplicates(filteredtransitions, lambda x: (x[9], x[10], x[11]))
    elif key == 'openswath':
        filteredtransitions = sorted(filteredtransitions, key=lambda x: (x[15], x[16], x[17], -x[5]))
        filteredtransitions = removeDuplicates(filteredtransitions, lambda x: (x[15], x[16], x[17]))
    if verbose : print("after removing duplicates " , len(filteredtransitions))

    #REVERSE sort the transitions by intensity
    #this means the highest(!) intensity peaks of this transition are on top
    filteredtransitions = sorted ( filteredtransitions    , key=lambda transition : transition[5], reverse=True )

    #Remove transitions with very similar Q3 masses
    massTolerance = 0.02 #This should be configured by user --> TO-DO
    filteredtransitions = removeSimilarDuplicates (filteredtransitions , massTolerance , lambda x : x[1])
    if verbose : print("after removing similar duplicates " , len(filteredtransitions))

    #If a swaths file was provided, remove transitions falling into the swath
    filteredtransitions_tmp = []
    if len(swaths) > 0 :
        for tr in filteredtransitions :
            if not is_Q3_in_swath_range(tr[0] , tr[1] , swaths) : filteredtransitions_tmp.append(tr)
        filteredtransitions = filteredtransitions_tmp
    if verbose : print("after removing transitions within the swath isolation window " , len(filteredtransitions))

    #if less transitions than the minimum --> continue to next spectrum
    #print filteredtransitions
    if len(filteredtransitions) < mintransitions :
        filteredtransitions = [] #I don't think this is really necessary, just in case.
    if verbose : print("after removing num of transitions below the required minimum " , len(filteredtransitions))

    adssfsfadsf
    
    #Write in the peakview input file (until the max number of transitions per peptide/z (the most intense N transitions)
    for index in range(0,min(len(filteredtransitions),maxtransitions)) :
        if lock : lock.acquire()
        writer.writerow(filteredtransitions[index])
        time.sleep(0.001)
        if lock : lock.release()
        
        #Isotopic labeling 
        #To-Do : All the labeling calculations must be reviewed and adapted to the Peptide class
        if len(labeling) > 0 :
            heavy_transition       = filteredtransitions[index]
            if key == 'peakview' :
                heavy_transition[4] = 'heavy'
                precursorMZ_heavy      = heavy_transition[0]
                fragment_mz_heavy      = heavy_transition[1]
                sequence_heavy         = heavy_transition[6]
                z_parent               = heavy_transition[8]
                frg_serie              = heavy_transition[9]
                frg_z                   = heavy_transition[10]
                frg_number               = heavy_transition[11]
            if key == 'openswath' :
                print(heavy_transition)
                heavy_transition[4] = 'heavy'
                print(heavy_transition)
                precursorMZ_heavy      = heavy_transition[0]
                fragment_mz_heavy      = heavy_transition[1]
                sequence_heavy         = heavy_transition[8]
                z_parent               = heavy_transition[12]
                frg_serie              = heavy_transition[15]
                frg_z                   = heavy_transition[16]
                frg_number               = heavy_transition[17]
                                
            #NOTE: This only works for y- and b- ions (AND their neutral losses). Other fragment series are ignored, and no heavy transition will be generated for them.
            if frg_serie[0] not in ['y','b'] : continue


            for aa in sequence_heavy :
                if aa in labeling:
                    precursorMZ_heavy += labeling[aa].deltamass / z_parent

            frg_seq = sequence_heavy
            #b series
            if frg_serie == 'b':
                frg_seq = sequence_heavy[:frg_number]
            #y series
            if frg_serie == 'y':
                frg_seq = sequence_heavy[-frg_number:]

            for aa in frg_seq:
                if aa in labeling:
                    fragment_mz_heavy += labeling[aa].deltamass / frg_z

            #Check for C- and N-terminal labelings
            if 'C-term' in labeling:
                precursorMZ_heavy += labeling['C-term'].deltamass / z_parent
                if frg_serie == 'y':
                    fragment_mz_heavy += labeling['C-term'].deltamass / frg_z
            if 'N-term' in labeling:
                precursorMZ_heavy += labeling['N-term'].deltamass / z_parent
                if frg_serie == 'b':
                    fragment_mz_heavy += labeling['N-term'].deltamass / frg_z


            #NOTE : if for any reason the Q3 mass has not changed (the labeling is not present in this fragment ion), it is not reported.
            #        if Q3 remains the same, then the mass shift in Q1 should be enough to fall in a different SWATH window.
            #        (In case a swaths mass ranges file is present, otherwise, we remove the not changing Q3 masses anyway).
            if removeDuplicatesInHeavy and fragment_mz_heavy == filteredtransitions[index][1] :
                if len(swaths) > 0 :
                    if is_Q3_in_swath_range(precursorMZ_heavy , filteredtransitions[index][0] , swaths) : continue
                else : continue

            #Write the heavy transition into the file
            heavy_transition[0] = precursorMZ_heavy
            heavy_transition[1] = fragment_mz_heavy
            if lock : lock.acquire()
            writer.writerow(heavy_transition)
            time.sleep(0.001)
            if lock : lock.release()




if __name__ == '__main__':
    main(sys.argv[1:])

[spctools-discuss] spectrast2tsv.py from Aebersold 2015 doesn't generate an output .csv or most of the programmed print messages

Reply via email to