Thanks for the explanation. If I understand correctly, the protein_group probability is computed using all the peptides from its subgroups, while each subgroup's probability uses only the peptides from that subgroup (both unique and non-unique).
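
A minimal sketch of how such groups could be listed from a prot.xml file, to make the comparison concrete. It assumes only the element and attribute names visible in the snippet quoted below (protein_group, protein, probability, group_sibling_id, total_number_peptides). The function name, the thresholds, and the tiny SAMPLE document are made up for illustration and are not part of ProteinProphet or the TPP.

```python
import xml.etree.ElementTree as ET

# Invented two-group document, NOT real ProteinProphet output.
SAMPLE = """<protein_summary>
  <protein_group group_number="1" probability="1.0000">
    <protein protein_name="P1" probability="1.0000"
             group_sibling_id="a" total_number_peptides="8"/>
  </protein_group>
  <protein_group group_number="2" probability="1.0000">
    <protein protein_name="P2" probability="0.0000"
             group_sibling_id="a" total_number_peptides="0"/>
    <protein protein_name="P3" probability="0.0000"
             group_sibling_id="b" total_number_peptides="0"/>
  </protein_group>
</protein_summary>"""


def local_name(tag):
    """Drop any '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def mismatched_groups(xml_text, group_min=0.9, sibling_max=0.0):
    """Return groups whose own probability is >= group_min while every
    sibling (protein) entry has probability <= sibling_max."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        if local_name(elem.tag) != "protein_group":
            continue
        group_p = float(elem.get("probability", "0"))
        siblings = [
            (
                child.get("group_sibling_id"),
                child.get("protein_name"),
                float(child.get("probability", "0")),
                int(child.get("total_number_peptides", "0")),
            )
            for child in elem
            if local_name(child.tag) == "protein"
        ]
        if (
            siblings
            and group_p >= group_min
            and all(p <= sibling_max for _, _, p, _ in siblings)
        ):
            hits.append((elem.get("group_number"), group_p, siblings))
    return hits


if __name__ == "__main__":
    for number, group_p, siblings in mismatched_groups(SAMPLE):
        print(f"group {number}: group probability {group_p}")
        for sibling_id, name, p, n_peptides in siblings:
            print(f"  {sibling_id} {name}: probability {p}, peptides {n_peptides}")
```

For a real file, the text could be read with open(path).read() and passed to mismatched_groups; the two thresholds are the knobs for deciding which level to filter on.
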
However, what about the case where the protein_group probability is 1.0 even though all the individual subgroup probabilities are 0? In this case the sibling groups have several entries in the "unique_stripped_peptides" attribute but "total_number_peptides=0". Any explanation?

In general, which is better: filtering by the protein_group probability or by the probability of each subgroup?

Thanks.
LW

On Aug 13, 10:02 am, GATTACA <[email protected]> wrote:
> So in these cases, all the proteins in the protein group share at
> least one peptide.
> The different sub groups occur because certain "clusters" of proteins
> share peptides that are specific to the cluster.
>
> As an example, imagine a group that consists of 3 sibling groups: a,b,
> and c. All of the protein identifiers in the group correspond to
> Histones. Sibling group 'a' contains peptides that are unique to
> Histone2A. While sibling group 'b' contains Histone3 and sibling group
> 'c' has Histone4A.
>
> All 3 sibling groups share at least some peptides in common, but each
> sibling group also has some peptides, unique to itself.
>
> Because peptide probabilities in ProteinProphet are adjusted based
> upon the number of sibling peptides (nsp) and how the peptides are
> shared among various proteins (wt) the probability for a sibling group
> can be different from the probability of the group as a whole.
>
> I don't know how clear that is, but that's my attempt at explaining
> it.
>
> On Aug 12, 7:40 pm, LW <[email protected]> wrote:
>
> > Hi,
> >
> > I have a question on the prot.xml. It seems like each protein group is
> > a probability and each
> > subgroup (those with group_sibling_id="a", "b", etc) has a
> > probability.
> >
> > <protein_group group_number="1" probability="1.0000">
> > <protein protein_name="DECOY_40330" n_indistinguishable_proteins="1"
> > probability="1.0000" percent_coverage="2.9"
> > unique_stripped_peptides="LMVSNQFK+NMMTIETNSSTSVVSPRASTAR"
> > group_sibling_id="a" total_number_peptides="8"
> > pct_spectrum_ids="0.019" confidence="0.004">
> >
> > How is the probability for the protein_group determined? I came across
> > cases where all the
> > subgroup probabilities are 0 but the protein_group probability is
> > 1.0.
> > How do I explain this?
> >
> > Thanks,
> > LW

--
You received this message because you are subscribed to the Google Groups "spctools-discuss" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/spctools-discuss?hl=en.
