Package: mc
Version: 3:4.8.3-10
Severity: normal

Hi,

the attached manpage contains errors (I'm in the middle of editing and
just wanted to have a quick look) and when you press F3 on it it shows
some problems.  After pressing <ESC> mc will show the manpage (at least
the non-broken part) properly.  However, there is no way (like <ESC> or
<F10> to leave the viewer again - I needed to kill the xterm to get
rid of this view.

Kind regards

       Andreas.


-- System Information:
Debian Release: 7.0
  APT prefers testing
  APT policy: (501, 'testing'), (50, 'unstable'), (5, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages mc depends on:
ii  e2fslibs      1.42.5-1
ii  libc6         2.13-37
ii  libcomerr2    1.42.5-1
ii  libglib2.0-0  2.33.12+really2.32.4-3
ii  libgpm2       1.20.4-6
ii  libslang2     2.2.4-15
ii  mc-data       3:4.8.3-10

Versions of packages mc recommends:
ii  mime-support  3.52-1
ii  perl          5.14.2-17
ii  unzip         6.0-8

Versions of packages mc suggests:
ii  arj                  3.10.22-10
ii  bzip2                1.0.6-4
ii  catdvi               0.14-12.1
ii  dbview               1.0.4-1
ii  djvulibre-bin        3.5.25.3-1
ii  evince [pdf-viewer]  3.4.0-3.1
ii  file                 5.11-2
pn  gv                   <none>
ii  imagemagick          8:6.7.7.10-5
ii  links                2.7-1
ii  lynx                 2.8.8dev.12-2
ii  odt2txt              0.4+git20100620-1+b1
ii  python               2.7.3~rc2-1
pn  python-boto          <none>
ii  python-tz            2012c-1
ii  w3m                  0.5.3-8
ii  zip                  3.0-6

-- no debconf information
.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.40.10.
.TH MAST: "1" "February 2013" "Motif Alignment and Search Tool" "User Commands"
.SH NAME
MAST \- Motif Alignment and Search Tool
.SH SYNOPSIS
.B mast <motif file> <sequence file>
[\fIoptions\fR]
.SH DESCRIPTION
MAST: Motif Alignment and Search Tool
.PP
Inputs
.TP
\fB<motif file>\fR
file containing motifs to use; normally a MEME output file
.TP
\fB<sequence file>\fR
search sequences in FASTA\-formatted database with motifs
.TP
\fB\-bfile <file>\fR
read background frequencies from <file>
.TP
\fB\-dblist\fR
read the <sequence file> as a list of FASTA\-formatted databases
.PP
Outputs
.TP
\fB\-o <dir>\fR
directory to output mast results; directory must not exist
.TP
\fB\-oc <dir>\fR
directory to output mast results with overwriting allowed
.TP
\fB\-hit_list\fR
print a machine\-readable list of all hits only; outputs to standard out and overrides \fB\-seqp\fR
.PP
Which Motifs To Use
.TP
\fB\-remcorr\fR
remove highly correlated motifs from query
.TP
\fB\-m <m>+\fR
use only motif number \fB<m>\fR (overrides \fB\-mev\fR); this can be
repeated to select multiple motifs
.TP
\fB\-c <count>\fR
only use the first \fB<count>\fR motifs or all motifs when \fB<count>\fR is zero (default: 0)
.TP
\fB\-mev <mev>\fR
use only motifs with E\-values less than \fB<mev>\fR
.TP
\fB\-diag <diag>\fR
nominal order and spacing of motifs is specified by \fB<diag>\fR which is a block diagram
.PP
DNA\-Only Options
.TP
\fB\-norc\fR
do not score reverse complement DNA strand
.TP
\fB\-sep\fR
score reverse complement DNA strand as a separate sequence
.TP
\fB\-dna\fR
translate DNA sequences to protein; motifs must be protein; sequences must be DNA
.TP
\fB\-comp\fR
adjust p\-values and E\-values for sequence composition
.PP
Which Results To Print
.TP
\fB\-ev <ev>\fR
print results for sequences with E\-value < \fB<ev>\fR (default: 10)
.PP
Appearance Of Block Diagrams
.TP
\fB\-mt <mt>\fR
show motif matches with p\-value < \fB<mt>\fR (default: 0.0001)
.TP\fB\-w\fR show weak matches (<mt> < p\-value < <mt>*10) in angle brackets in
the hit list or when the xml is converted to text
.TP\fB\-best\fR include only the best motif hits in \fB\-hit_list\fR diagrams
.TP\fB\-seqp\fR use SEQUENCE p\-values for motif thresholds (default: use
POSITION p\-values)
Miscellaneous
.TP\fB\-mf <mf>\fR in results use <mf> as motif file name
.TP\fB\-df <df>\fR in results use <df> as database name (ignored when \fB\-dblist\fR)
.TP\fB\-dl <dl>\fR in results use <dl> as link to search sequence names; token
SEQUENCEID is replaced with the FASTA sequence ID; ignored when
\fB\-dblist\fR;
.TP\fB\-minseqs <ms>\fR lower bound on number of sequences in db
.TP\fB\-nostatus\fR do not print progress report
.TP\fB\-notext\fR do not create text output
.TP\fB\-nohtml\fR do not create html output
.IP
MAST is a tool for searching biological sequence databases for
sequences that contain one or more of a group of known motifs.
.IP
A motif is a sequence pattern that occurs repeatedly in a group of
related protein or DNA sequences. Motifs are represented as
position\-dependent scoring matrices that describe the score of each
possible letter at each position in the pattern. Individual motifs may
not contain gaps. Patterns with variable\-length gaps must be split into
two or more separate motifs before being submitted as input to MAST.
.IP
MAST takes as input a file containing the descriptions of one or more
motifs and searches a sequence database that you select for sequences
that match the motifs. The motif file can be the output of the MEME
motif discovery tool or any file in the appropriate format.
.IP
MAST outputs an xml file which can then be converted into html or text
format. The xml file is designed for machine processing and the html
file is designed for human viewing. The text format is available for
backwards compatibility though due to design decisions made to optimise
the xml for html generation the output for separate scoring mode is not
identical and some options were removed. The text format will be
unsupported in future releases and so we recommend you migrate any
programs reading mast output to the xml format.
.IP
MAST outputs three things:
.IP
1. The names of the high\-scoring sequences sorted by the strength of
.IP
the combined match of the sequence to all of the motifs in the
group.
.IP
2. Motif diagrams showing the order and spacing of the motifs within
.IP
each matching sequence.
.IP
3. Detailed annotation of each matching sequence showing the sequence
.IP
and the locations and strengths of matches to the motifs.
.IP
MAST works by calculating match scores for each sequence in the
database compared with each of the motifs in the group of motifs you
provide. For each sequence, the match scores are converted into various
types of p\-values and these are used to determine the overall match of
the sequence to the group of motifs and the probable order and spacing
of occurrences of the motifs in the sequence.
.IP
MAST generates a human readable file from the xml output containing:
.IP
* the version of MAST and the date it was built,
* the reference to cite if you use MAST in your research,
* a description of the databases and motifs used in the search,
* an explanation of the result,
* the sequences identifier and score sorted by score matching the
.IP
group of motifs above a stated level of statistical significance,
.IP
* motif diagrams showing the order and spacing of occurrences of the
.IP
motifs in the significant sequences and,
.IP
* annotated sequences showing the positions and p\-values of all motif
.IP
occurrences in each of the high\-scoring sequences.
.IP
The html version is the recommended version for human reading and has
all sections documented however the text version has no documentation
for the first section. That section lists each motif along with the
sequence that would achieve the best possible match score. In order to
avoid biased scores when multiple motif scores are combined, MAST also
computes the pairwise correlations between each pair of motifs. The
correlation between two motifs is the maximum sum of Pearson's
correlation coefficients for aligned columns divided by the width of
the shorter motif. The maximum is found by trying all alignments of the
two motifs. Motifs with correlations below 0.60 have little effect on
the accuracy of the combined scores. Pairs of motifs with higher
correlations should be removed from the query.
.PP
Match Scores
.IP
The match score of a motif to a position in a sequence is the sum of
the score from each column of the position\-dependent scoring matrix
corresponding to the letter at that position in the sequence. For
example, if the sequence is
.IP
TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC
.IP
========
.IP
and the motif is represented by the position\-dependent scoring matrix
(where each row of the matrix corresponds to a position in the motif)
.TP
Position
A      C      G      T
.TP
1
1.447  0.188  \fB\-4\fR.025 \fB\-4\fR.095
.TP
2
0.739  1.339  \fB\-3\fR.945 \fB\-2\fR.325
.TP
3
1.764  \fB\-3\fR.562 \fB\-4\fR.197 \fB\-3\fR.895
.TP
4
1.574  \fB\-3\fR.784 \fB\-1\fR.594 \fB\-1\fR.994
.TP
5
1.602  \fB\-3\fR.935 \fB\-4\fR.054 \fB\-1\fR.370
.TP
6
0.797  \fB\-3\fR.647 \fB\-0\fR.814 0.215
.TP
7
\fB\-1\fR.280 1.873  \fB\-0\fR.607 \fB\-1\fR.993
.TP
8
\fB\-3\fR.076 1.035  1.414  \fB\-3\fR.913
.IP
then the match score of the fourth position in the sequence
(underlined) would be found by summing the score for T in position 1, G
in position 2 and so on until G in position 8. So the match score would
be
.IP
score = \fB\-4\fR.095 + \fB\-3\fR.945 + \fB\-3\fR.895 + \fB\-1\fR.994
.IP
+ \fB\-4\fR.054 + \fB\-0\fR.814 + \fB\-1\fR.933 + 1.414
.IP
= \fB\-19\fR.316
.IP
The match scores for other positions in the sequence are calculated in
the same way. Match scores are only calculated if the match completely
fits within the sequence. Match scores are not calculated if the motif
would overhang either end of the sequence.
.PP
P\-values
.IP
MAST reports all matches of a sequence to a motif or group of motifs in
terms of the p\-value of the match. MAST considers the p\-values of four
types of events:
.IP
* position p\-value: the match of a single position within a sequence
.IP
to a given motif,
.IP
* sequence p\-value: the best match of any position within a sequence
.IP
to a given motif,
.IP
* combined p\-value: the combined best matches of a sequence to a
.IP
group of motifs, and
.IP
* E\-value: observing a combined p\-value at least as small in a random
.IP
database of the same size.
.IP
All p\-values are based on a random sequence model that assumes each
position in a random sequence is generated according to the average
letter frequencies of all sequences in the appropriate (peptide or
nucleotide) non\-redundant database (ftp://ncbi.nlm.nih.gov/blast/db/)
on September 22, 1996. This can be overridden by specifying the \fB\-bfile\fR
or \fB\-comp\fR options (see below). For DNA sequences, unless \fB\-norc\fR is given,
the positive and reverse complement strand frequencies are averaged
together.
.IP
1. \fB\-bfile\fR <bfile> The random model uses the letter frequencies given
.IP
in <bfile> instead of the non\-redundant database frequencies. The
format of <bfile> is the same as that for the MEME \fB\-bfile\fR option;
see the MEME documentation for details. You can create files in the
appropriate format based on the base/residue composition of your
own FASTA sequence files using the command "fasta\-get\-markov"
included in the MEME distribution. Type fasta\-get\-markov on the
command line for documentation. (Sample files are also given in
directory tests: tests/nt.freq and tests/na.freq.)
.IP
2. \fB\-comp\fR The random model uses the letter frequencies in the current
.IP
target sequence instead of the non\-redundant database frequencies.
This causes p\-values and E\-values to be compensated individually
for the actual composition of each sequence in the database. This
option can increase search time substantially due to the need to
compute a different score distribution for each high\-scoring
sequence. With this option and DNA sequences, the positive and
reverse complement strand frequencies are not averaged together.
.IP
Position p\-value
.IP
The p\-value of a match of a given position within a sequence to a motif
is defined as the probability of a randomly selected position in a
randomly generated sequence having a match score at least as large as
that of the given position. Note:If MAST is combining reverse
complement DNA strands, the position p\-value is not corrected for
multiple tests.
.IP
Sequence p\-value
.IP
The p\-value of a match of a sequence to a motif is defined as the
probability of a randomly generated sequence of the same length having
a match score at least as large as the largest match score of any
position in the sequence.
.IP
Combined p\-value
.IP
The p\-value of a match of a sequence to a group of motifs is defined as
the probability of a randomly generated sequence of the same length
having sequence p\-values whose product is at least as small as the
product of the sequence p\-values of the matches of the motifs to the
given sequence.
.IP
E\-value
.IP
The E\-value of the match of a sequence in a database to a a group of
motifs is defined as the expected number of sequences in a random
database of the same size that would match the motifs as well as the
sequence does and is equal to the combined p\-value of the sequence
times the number of sequences in the database.
.PP
High\-scoring Sequences
.IP
MAST lists the names and part of the descriptive text of all sequences
whose E\-value is less than E. Sequences shorter than one or more of the
motifs are skipped. The sequences are sorted by increasing E\-value. The
value of E is set to 10 for the WEB server but is user\-selectable in
the down\-loadable version of MAST.
.PP
Motif Diagrams
.IP
Motif diagrams show the order and spacing of non\-overlapping matches to
the motifs in each high\-scoring sequence. Motif occurrences are
determined based on the position p\-value of matches to the motif.
Strong matches (p\-value < M) are shown in square brackets (`[ ]'), weak
matches (M < p\-value < M x 10) are shown in angle brackets (`< >') and
the length of non\-motif sequence ("spacer") is shown between
underscores (`_'). For example,
.IP
27_[3]_44_<4>_99_[1]_7
.IP
shows an initial spacer of length 27, followed by a strong match to
motif 3, a spacer of length 44, a weak match to motif 4, a spacer of
length 99, a strong match to motif 1 and a final non\-motif sequence of
length 7. The value of M is 0.0001 for the WEB server but is
user\-selectable in the downloadable version of MAST.
.PP
Annotated Sequences
.IP
MAST annotates each high\-scoring sequence by printing the sequence
along with the position and strength of all the non\-overlapping motif
occurrences. The four lines above each motif occurrence contain,
respectively,
.IP
* the motif number of the occurrence,
* the position p\-value of the occurrence,
* the best possible match to the motif, and
* a plus sign (`+') above each letter in the occurrence that has a
.IP
positive match score to the motif.
.IP
The best possible match to a motif is the sequence of letters which
would achieve the highest match score.
.PP
Hit List
.IP
If you specify the \fB\-hit_list\fR switch to MAST, MAST outputs ONLY a list
of "hits" in easily machine\-readable format. Each line corresponds to
one motif occurrence in one sequence. The format of the hit lines is
.IP
[<sequence_name> <strand><motif> <start> <end> <score> <p\-value>]+
.IP
where
.IP
<sequence_name> is the name of the sequence containing the hit
<strand>        is the strand (+ or \- for DNA, blank for protein),
<motif>         is the motif number,
<start>         is the starting position of the hit,
<end>           is the ending position of the hit, and
<score>         is the score the hit,
<p\-value>       is the position p\-value of the hit.
.IP
Two comment lines (starting with "#") are written above the list of
hits, and the MAST command line is printed as a comment line after the
list. An example of the output using the \fB\-hit_list\fR switch to MAST is:
.IP
# All non\-overlapping hits in all sequences.
# sequence_name motif hit_start hit_end score hit_p\-value
ce1cg \fB\-2\fR 8 22  1459.90 1.67e\-06
ara +2 2 16  1661.18 5.04e\-08
bglr1 +2 1 15  1274.97 1.42e\-05
cya \fB\-2\fR 19 33  1101.37 6.64e\-05
gale +2 5 19  1076.21 8.11e\-05
ilv \fB\-2\fR 6 20  1098.85 6.78e\-05
malk +2 37 51  1085.02 7.56e\-05
ompa +2 5 19  1583.18 2.43e\-07
# mast tests/meme/meme.crp0.oops tests/common/crp0.s \fB\-hit_list\fR \fB\-m\fR 2
.PP
Loading Multiple Sequence Databases
.IP
Multiple sequence databases can be loaded by MAST by putting the file
names into a file and specifying that file instead of the sequence
database with the option \fB\-dblist\fR.
.IP
The file list has one file name on each line with the optional name and
link as follows:
.IP
<file> [<name> <link>]
\&...
\&...
.IP
If it is specified then the name will be used instead of the file name
in the output. If the link is specified then all sequences for that
database in the html output will have a hyperlink to the URL specified
with the text SEQUENCEID replaced with the FASTA sequence id.
.PP
EXAMPLES:
.IP
The following examples assume that file "meme.results" is the output of
a MEME run containing at least 3 motifs which was created on the
trainingset "training.fasta" and file SwissProt is a copy of the
Swiss\-Prot database on your local disk. DNA_DB is a copy of a DNA
database on your local disk.
.IP
1. Annotate the training set:
.IP
mast meme.results training.fasta
.IP
2. Find sequences matching the motif and annotate them in the
.IP
SwissProt database:
.IP
mast meme.results SwissProt
.IP
3. Show sequences with weaker combined matches to motifs.
.IP
mast meme.results SwissProt \fB\-ev\fR 200
.IP
4. Include a nominal order and spacing of the first three motifs in
.IP
the calculation of the sequence p\-values to increase the
sensitivity of the search for matching sequences:
.IP
mast meme.results SwissProt \fB\-diag\fR "9\-[2]\-61\-[1]\-62\-[3]\-91"
.IP
5. Use only the first and third motifs in the search:
.IP
mast meme.results SwissProt \fB\-m\fR 1 \fB\-m\fR 3
.IP
6. Use only the first two motifs in the search:
.IP
mast meme.results SwissProt \fB\-c\fR 2
.IP
7. Search DNA sequences using protein motifs, adjusting p\-values and
.IP
E\-values for each sequence by that sequence's composition:
.IP
mast meme.results DNA_DB \fB\-dna\fR \fB\-comp\fR
.PP
References
.IP
1. file://localhost/home/tillea/debian\-maintain/repack/meme/meme_4.9.0/doc/meme\-format.html
2. file://localhost/home/tillea/debian\-maintain/repack/meme/meme_4.9.0/doc/mast.html#dblist
.PP
Usage
.IP
mast <motif file> <sequence file> [options]
.IP
Inputs
<motif file> file containing motifs to use; normally a MEME output
file
<sequence file> search sequences in FASTA\-formatted database with
motifs;
.TP\fB\-bfile <file>\fR read background frequencies from <file>
.TP\fB\-dblist\fR read the <sequence file> as a list of FASTA\-formatted
databases
Outputs
.TP\fB\-o <dir>\fR directory to output mast results; directory must not exist
.TP\fB\-oc <dir>\fR directory to output mast results with overwriting allowed
.TP\fB\-hit_list\fR print a machine\-readable list of all hits only; outputs to
standard out and overrides \fB\-seqp\fR
Which Motifs To Use
.TP\fB\-remcorr\fR remove highly correlated motifs from query
.TP\fB\-m <m>\fR+ use only motif number <m> (overrides \fB\-mev\fR); this can be
repeated to select multiple motifs
.TP\fB\-c <count>\fR only use the first <count> motifs or all motifs when
<count> is zero (default: 0)
.TP\fB\-mev <mev>\fR use only motifs with E\-values less than <mev>
.TP\fB\-diag <diag>\fR nominal order and spacing of motifs is specified by
<diag> which is a block diagram
DNA\-Only Options
.TP\fB\-norc\fR do not score reverse complement DNA strand
.TP\fB\-sep\fR score reverse complement DNA strand as a separate sequence
.TP\fB\-dna\fR translate DNA sequences to protein; motifs must be protein;
sequences must be DNA
.TP\fB\-comp\fR adjust p\-values and E\-values for sequence composition
Which Results To Print
.TP\fB\-ev <ev>\fR print results for sequences with E\-value < <ev> (default:
10)
Appearance Of Block Diagrams
.TP\fB\-mt <mt>\fR show motif matches with p\-value < <mt> (default: 0.0001)
.TP\fB\-w\fR show weak matches (<mt> < p\-value < <mt>*10) in angle brackets in
the hit list or when the xml is converted to text
.TP\fB\-best\fR include only the best motif hits in \fB\-hit_list\fR diagrams
.TP\fB\-seqp\fR use SEQUENCE p\-values for motif thresholds (default: use
POSITION p\-values)
Miscellaneous
.TP\fB\-mf <mf>\fR in results use <mf> as motif file name
.TP\fB\-df <df>\fR in results use <df> as database name (ignored when \fB\-dblist\fR)
.TP\fB\-dl <dl>\fR in results use <dl> as link to search sequence names; token
SEQUENCEID is replaced with the FASTA sequence ID; ignored when
\fB\-dblist\fR;
.TP\fB\-minseqs <ms>\fR lower bound on number of sequences in db
.TP\fB\-nostatus\fR do not print progress report
.TP\fB\-notext\fR do not create text output
.TP\fB\-nohtml\fR do not create html output
.IP
MAST is a tool for searching biological sequence databases for
sequences that contain one or more of a group of known motifs.
.IP
A motif is a sequence pattern that occurs repeatedly in a group of
related protein or DNA sequences. Motifs are represented as
position\-dependent scoring matrices that describe the score of each
possible letter at each position in the pattern. Individual motifs may
not contain gaps. Patterns with variable\-length gaps must be split into
two or more separate motifs before being submitted as input to MAST.
.IP
MAST takes as input a file containing the descriptions of one or more
motifs and searches a sequence database that you select for sequences
that match the motifs. The motif file can be the output of the MEME
motif discovery tool or any file in the appropriate format.
.IP
MAST outputs an xml file which can then be converted into html or text
format. The xml file is designed for machine processing and the html
file is designed for human viewing. The text format is available for
backwards compatibility though due to design decisions made to optimise
the xml for html generation the output for separate scoring mode is not
identical and some options were removed. The text format will be
unsupported in future releases and so we recommend you migrate any
programs reading mast output to the xml format.
.IP
MAST outputs three things:
.IP
1. The names of the high\-scoring sequences sorted by the strength of
.IP
the combined match of the sequence to all of the motifs in the
group.
.IP
2. Motif diagrams showing the order and spacing of the motifs within
.IP
each matching sequence.
.IP
3. Detailed annotation of each matching sequence showing the sequence
.IP
and the locations and strengths of matches to the motifs.
.IP
MAST works by calculating match scores for each sequence in the
database compared with each of the motifs in the group of motifs you
provide. For each sequence, the match scores are converted into various
types of p\-values and these are used to determine the overall match of
the sequence to the group of motifs and the probable order and spacing
of occurrences of the motifs in the sequence.
.IP
MAST generates a human readable file from the xml output containing:
.IP
* the version of MAST and the date it was built,
* the reference to cite if you use MAST in your research,
* a description of the databases and motifs used in the search,
* an explanation of the result,
* the sequences identifier and score sorted by score matching the
.IP
group of motifs above a stated level of statistical significance,
.IP
* motif diagrams showing the order and spacing of occurrences of the
.IP
motifs in the significant sequences and,
.IP
* annotated sequences showing the positions and p\-values of all motif
.IP
occurrences in each of the high\-scoring sequences.
.IP
The html version is the recommended version for human reading and has
all sections documented however the text version has no documentation
for the first section. That section lists each motif along with the
sequence that would achieve the best possible match score. In order to
avoid biased scores when multiple motif scores are combined, MAST also
computes the pairwise correlations between each pair of motifs. The
correlation between two motifs is the maximum sum of Pearson's
correlation coefficients for aligned columns divided by the width of
the shorter motif. The maximum is found by trying all alignments of the
two motifs. Motifs with correlations below 0.60 have little effect on
the accuracy of the combined scores. Pairs of motifs with higher
correlations should be removed from the query.
.PP
Match Scores
.IP
The match score of a motif to a position in a sequence is the sum of
the score from each column of the position\-dependent scoring matrix
corresponding to the letter at that position in the sequence. For
example, if the sequence is
.IP
TAATGTTGGTGCTGGTTTTTGTGGCATCGGGCGAGAATAGCGC
.IP
========
.IP
and the motif is represented by the position\-dependent scoring matrix
(where each row of the matrix corresponds to a position in the motif)
.TP
Position
A      C      G      T
.TP
1
1.447  0.188  \fB\-4\fR.025 \fB\-4\fR.095
.TP
2
0.739  1.339  \fB\-3\fR.945 \fB\-2\fR.325
.TP
3
1.764  \fB\-3\fR.562 \fB\-4\fR.197 \fB\-3\fR.895
.TP
4
1.574  \fB\-3\fR.784 \fB\-1\fR.594 \fB\-1\fR.994
.TP
5
1.602  \fB\-3\fR.935 \fB\-4\fR.054 \fB\-1\fR.370
.TP
6
0.797  \fB\-3\fR.647 \fB\-0\fR.814 0.215
.TP
7
\fB\-1\fR.280 1.873  \fB\-0\fR.607 \fB\-1\fR.993
.TP
8
\fB\-3\fR.076 1.035  1.414  \fB\-3\fR.913
.IP
then the match score of the fourth position in the sequence
(underlined) would be found by summing the score for T in position 1, G
in position 2 and so on until G in position 8. So the match score would
be
.IP
score = \fB\-4\fR.095 + \fB\-3\fR.945 + \fB\-3\fR.895 + \fB\-1\fR.994
.IP
+ \fB\-4\fR.054 + \fB\-0\fR.814 + \fB\-1\fR.933 + 1.414
.IP
= \fB\-19\fR.316
.IP
The match scores for other positions in the sequence are calculated in
the same way. Match scores are only calculated if the match completely
fits within the sequence. Match scores are not calculated if the motif
would overhang either end of the sequence.
.PP
P\-values
.IP
MAST reports all matches of a sequence to a motif or group of motifs in
terms of the p\-value of the match. MAST considers the p\-values of four
types of events:
.IP
* position p\-value: the match of a single position within a sequence
.IP
to a given motif,
.IP
* sequence p\-value: the best match of any position within a sequence
.IP
to a given motif,
.IP
* combined p\-value: the combined best matches of a sequence to a
.IP
group of motifs, and
.IP
* E\-value: observing a combined p\-value at least as small in a random
.IP
database of the same size.
.IP
All p\-values are based on a random sequence model that assumes each
position in a random sequence is generated according to the average
letter frequencies of all sequences in the appropriate (peptide or
nucleotide) non\-redundant database (ftp://ncbi.nlm.nih.gov/blast/db/)
on September 22, 1996. This can be overridden by specifying the \fB\-bfile\fR
or \fB\-comp\fR options (see below). For DNA sequences, unless \fB\-norc\fR is given,
the positive and reverse complement strand frequencies are averaged
together.
.IP
1. \fB\-bfile\fR <bfile> The random model uses the letter frequencies given
.IP
in <bfile> instead of the non\-redundant database frequencies. The
format of <bfile> is the same as that for the MEME \fB\-bfile\fR option;
see the MEME documentation for details. You can create files in the
appropriate format based on the base/residue composition of your
own FASTA sequence files using the command "fasta\-get\-markov"
included in the MEME distribution. Type fasta\-get\-markov on the
command line for documentation. (Sample files are also given in
directory tests: tests/nt.freq and tests/na.freq.)
.IP
2. \fB\-comp\fR The random model uses the letter frequencies in the current
.IP
target sequence instead of the non\-redundant database frequencies.
This causes p\-values and E\-values to be compensated individually
for the actual composition of each sequence in the database. This
option can increase search time substantially due to the need to
compute a different score distribution for each high\-scoring
sequence. With this option and DNA sequences, the positive and
reverse complement strand frequencies are not averaged together.
.IP
Position p\-value
.IP
The p\-value of a match of a given position within a sequence to a motif
is defined as the probability of a randomly selected position in a
randomly generated sequence having a match score at least as large as
that of the given position. Note:If MAST is combining reverse
complement DNA strands, the position p\-value is not corrected for
multiple tests.
.IP
Sequence p\-value
.IP
The p\-value of a match of a sequence to a motif is defined as the
probability of a randomly generated sequence of the same length having
a match score at least as large as the largest match score of any
position in the sequence.
.IP
Combined p\-value
.IP
The p\-value of a match of a sequence to a group of motifs is defined as
the probability of a randomly generated sequence of the same length
having sequence p\-values whose product is at least as small as the
product of the sequence p\-values of the matches of the motifs to the
given sequence.
.IP
E\-value
.IP
The E\-value of the match of a sequence in a database to a a group of
motifs is defined as the expected number of sequences in a random
database of the same size that would match the motifs as well as the
sequence does and is equal to the combined p\-value of the sequence
times the number of sequences in the database.
.PP
High\-scoring Sequences
.IP
MAST lists the names and part of the descriptive text of all sequences
whose E\-value is less than E. Sequences shorter than one or more of the
motifs are skipped. The sequences are sorted by increasing E\-value. The
value of E is set to 10 for the WEB server but is user\-selectable in
the down\-loadable version of MAST.
.PP
Motif Diagrams
.IP
Motif diagrams show the order and spacing of non\-overlapping matches to
the motifs in each high\-scoring sequence. Motif occurrences are
determined based on the position p\-value of matches to the motif.
Strong matches (p\-value < M) are shown in square brackets (`[ ]'), weak
matches (M < p\-value < M x 10) are shown in angle brackets (`< >') and
the length of non\-motif sequence ("spacer") is shown between
underscores (`_'). For example,
.IP
27_[3]_44_<4>_99_[1]_7
.IP
shows an initial spacer of length 27, followed by a strong match to
motif 3, a spacer of length 44, a weak match to motif 4, a spacer of
length 99, a strong match to motif 1 and a final non\-motif sequence of
length 7. The value of M is 0.0001 for the WEB server but is
user\-selectable in the downloadable version of MAST.
.PP
Annotated Sequences
.IP
MAST annotates each high\-scoring sequence by printing the sequence
along with the position and strength of all the non\-overlapping motif
occurrences. The four lines above each motif occurrence contain,
respectively,
.IP
* the motif number of the occurrence,
* the position p\-value of the occurrence,
* the best possible match to the motif, and
* a plus sign (`+') above each letter in the occurrence that has a
.IP
positive match score to the motif.
.IP
The best possible match to a motif is the sequence of letters which
would achieve the highest match score.
.PP
Hit List
.IP
If you specify the \fB\-hit_list\fR switch to MAST, MAST outputs ONLY a list
of "hits" in easily machine\-readable format. Each line corresponds to
one motif occurrence in one sequence. The format of the hit lines is
.IP
[<sequence_name> <strand><motif> <start> <end> <score> <p\-value>]+
.IP
where
.IP
<sequence_name> is the name of the sequence containing the hit
<strand>        is the strand (+ or \- for DNA, blank for protein),
<motif>         is the motif number,
<start>         is the starting position of the hit,
<end>           is the ending position of the hit, and
<score>         is the score the hit,
<p\-value>       is the position p\-value of the hit.
.IP
Two comment lines (starting with "#") are written above the list of
hits, and the MAST command line is printed as a comment line after the
list. An example of the output using the \fB\-hit_list\fR switch to MAST is:
.IP
# All non\-overlapping hits in all sequences.
# sequence_name motif hit_start hit_end score hit_p\-value
ce1cg \fB\-2\fR 8 22  1459.90 1.67e\-06
ara +2 2 16  1661.18 5.04e\-08
bglr1 +2 1 15  1274.97 1.42e\-05
cya \fB\-2\fR 19 33  1101.37 6.64e\-05
gale +2 5 19  1076.21 8.11e\-05
ilv \fB\-2\fR 6 20  1098.85 6.78e\-05
malk +2 37 51  1085.02 7.56e\-05
ompa +2 5 19  1583.18 2.43e\-07
# mast tests/meme/meme.crp0.oops tests/common/crp0.s \fB\-hit_list\fR \fB\-m\fR 2
.PP
Loading Multiple Sequence Databases
.IP
Multiple sequence databases can be loaded by MAST by putting the file
names into a file and specifying that file instead of the sequence
database with the option \fB\-dblist\fR.
.IP
The file list has one file name on each line with the optional name and
link as follows:
.IP
<file> [<name> <link>]
\&...
\&...
.IP
If it is specified then the name will be used instead of the file name
in the output. If the link is specified then all sequences for that
database in the html output will have a hyperlink to the URL specified
with the text SEQUENCEID replaced with the FASTA sequence id.
.PP
EXAMPLES:
.IP
The following examples assume that file "meme.results" is the output of
a MEME run containing at least 3 motifs which was created on the
trainingset "training.fasta" and file SwissProt is a copy of the
Swiss\-Prot database on your local disk. DNA_DB is a copy of a DNA
database on your local disk.
.IP
1. Annotate the training set:
.IP
mast meme.results training.fasta
.IP
2. Find sequences matching the motif and annotate them in the
.IP
SwissProt database:
.IP
mast meme.results SwissProt
.IP
3. Show sequences with weaker combined matches to motifs.
.IP
mast meme.results SwissProt \fB\-ev\fR 200
.IP
4. Include a nominal order and spacing of the first three motifs in
.IP
the calculation of the sequence p\-values to increase the
sensitivity of the search for matching sequences:
.IP
mast meme.results SwissProt \fB\-diag\fR "9\-[2]\-61\-[1]\-62\-[3]\-91"
.IP
5. Use only the first and third motifs in the search:
.IP
mast meme.results SwissProt \fB\-m\fR 1 \fB\-m\fR 3
.IP
6. Use only the first two motifs in the search:
.IP
mast meme.results SwissProt \fB\-c\fR 2
.IP
7. Search DNA sequences using protein motifs, adjusting p\-values and
.IP
E\-values for each sequence by that sequence's composition:
.IP
mast meme.results DNA_DB \fB\-dna\fR \fB\-comp\fR
.PP

Reply via email to