I believe that in the XML file, total_number_peptides refers to the total number of SPECTRA contributing to the sibling group, not the actual number of distinct peptides. The individual subgroups can all have a probability of zero because their peptides are all shared with one another, so the weight factor is close to zero for every peptide. However, when all of the peptides are taken together for the group as a whole, the weight factor is not applied and the group probability reaches 1.0.
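To make the paragraph above a bit more concrete, here is a toy sketch in Python. It is not ProteinProphet's actual code: it assumes the simplified combination rule P = 1 - prod(1 - w_i * p_i), ignores the NSP adjustment and the iterative weight estimation, and the peptide probabilities and weights are made-up numbers chosen only for illustration.

```python
from math import prod


def protein_probability(peptide_probs, weights=None):
    """Combine peptide probabilities as 1 - prod(1 - w_i * p_i).

    Simplified, assumed form of the combination rule; with weights=None
    every peptide counts in full (w_i = 1.0).
    """
    if weights is None:
        weights = [1.0] * len(peptide_probs)
    return 1.0 - prod(1.0 - w * p for w, p in zip(weights, peptide_probs))


# Hypothetical peptide probabilities for one protein group.
peptide_probs = [0.99, 0.99, 0.95]

# Assumption: every peptide is shared between the sibling groups, so each
# subgroup gets a weight of (here exactly) zero for every peptide.
shared_weights = [0.0, 0.0, 0.0]

subgroup_prob = protein_probability(peptide_probs, shared_weights)
group_prob = protein_probability(peptide_probs)  # no weighting at group level

print(f"subgroup probability: {subgroup_prob:.4f}")
print(f"group probability:    {group_prob:.4f}")
```

With these made-up inputs each subgroup comes out at 0 while the group as a whole comes out very close to 1, which is the pattern LW describes.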
If you look at the protein group in question, I think you'll see that all of its peptides are shared among the sibling groups and with no other protein groups.

On Aug 15, 1:25 pm, LW wrote:
> Thanks for the explanation.
> If I understand correctly, the protein_group probability is done using
> all the peptides from the subgroups, while protein subgroup probability
> is only for peptides from that subgroup (both unique and non-unique).
>
> However, what about the case where protein_group probability is 1.0
> even though all the individual subgroup probabilities are 0.
> In this case the sibling groups have several entries in the
> "unique_stripped_peptides" tag but "total_number_peptides=0"
> Any explanation?
> In general, which is better filtering by the protein_group
> probability or the probability in each subgroup?
>
> Thanks.
> LW
>
> On Aug 13, 10:02 am, GATTACA wrote:
> > So in these cases, all the proteins in the protein group share at
> > least one peptide.
> > The different sub groups occur because certain "clusters" of proteins
> > share peptides that are specific to the cluster.
> >
> > As an example, imagine a group that consists of 3 sibling groups: a,b,
> > and c. All of the protein identifiers in the group correspond to
> > Histones. Sibling group 'a' contains peptides that are unique to
> > Histone2A. While sibling group 'b' contains Histone3 and sibling group
> > 'c' has Histone4A.
> >
> > All 3 sibling groups share at least some peptides in common, but each
> > sibling group also has some peptides, unique to itself.
> >
> > Because peptide probabilities in ProteinProphet are adjusted based
> > upon the number of sibling peptides (nsp) and how the peptides are
> > shared among various proteins (wt) the probability for a sibling group
> > can be different from the probability of the group as a whole.
> >
> > I don't know how clear that is, but that's my attempt at explaining
> > it.
> >
> > On Aug 12, 7:40 pm, LW wrote:
> > > Hi,
> > >
> > > I have a question on the prot.xml. It seems like each protein group is
> > > a probability and each
> > > subgroup (those with group_sibling_id="a", "b", etc) has a
> > > probability.
> > >
> > > <protein_group group_number="1" probability="1.0000">
> > > <protein protein_name="DECOY_40330" n_indistinguishable_proteins="1"
> > > probability="1.0000" percent_coverage="2.9"
> > > unique_stripped_peptides="LMVSNQFK+NMMTIETNSSTSVVSPRASTAR"
> > > group_sibling_id="a" total_number_peptides="8"
> > > pct_spectrum_ids="0.019" confidence="0.004">
> > >
> > > How is the probability for the protein_group determined? I came across
> > > cases where all the
> > > subgroup probabilities are 0 but the protein_group probability is
> > > 1.0.
> > > How do I explain this?
> > >
> > > Thanks,
> > > LW
