Hi Cliff, thanks for your questions. Many of the ideas I'm throwing out
there borrow heavily from standard Perl regexp ideas (and some math
ideas, too), so if I use one that doesn't make sense, let me know.
Also, I've put a bit more work into the notation and changed a few
things, particularly the notation for mean and standard deviation.
> Instead of <0> being just 0 crossing, could it be extended to mean
> the value for which you are searching? Kind of making it like a context
> grep.
I agree exactly. <NUMBER> should match the crossing of whatever number
is specified within the angle brackets. In regex terms, the <> notation
is meant to work like a zero-width assertion, much like the \b assertion
in Perl regexes for word-boundaries. If you're searching for something
that crosses 2, you could use <2>. Ideally, you would be able to use
any scalar expression inside the angle brackets, so you could write
<$crossing_value> if you set that variable already. I've given some
thought to a two-argument form for this notation, but I don't have my
notes with me and I'm fuzzy on the details at the moment.
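To make the zero-width idea concrete, here is a minimal sketch in plain Perl (arrays rather than piddles). The helper name crossings() is made up for illustration, and "crossing" is assumed to mean a strict sign change of (value - level) between two neighboring elements; neither is part of the proposed notation.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL or of the proposed notation:
# return every index i where the data crosses $level between element i
# and element i+1. Like \b, a crossing sits *between* elements and
# consumes none of them.
sub crossings {
    my ($level, @data) = @_;
    my @found;
    for my $i (0 .. $#data - 1) {
        my $before = $data[$i]     - $level;
        my $after  = $data[$i + 1] - $level;
        # Assumption: a crossing is a strict sign change.
        push @found, $i if $before * $after < 0;
    }
    return @found;
}

my @data = (1, 3, 2.5, 1.5, 4);
my @at   = crossings(2, @data);    # rough analog of <2>
print "data crosses 2 after indices: @at\n";
```

Because the level is an ordinary argument, a variable such as $crossing_value could be passed in the same way.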
> In the slope example, what is the purpose of the S attribute along
> with the {1,3}. Is the S an operator such that it is calculating the
> slope for the data points in the {range}? Will there be a minimum and
> maximum to the number of points in the {range} that can be used for the
> slope calculation? Could the range operator also be used to capture a
> set of points around the <X> crossing - versus zero crossing.
S is not an operator or attribute. S is a metacharacter (ALL characters
are metacharacters in numerical regexes -- no need to escape them) and
it stands for 'positively sloped number'. So for example, S+ means
'match one or more positively sloped numbers', and S* means 'match zero
or more positively sloped numbers'. Thus, S{1,3} means 'match between
one and three positively sloped numbers'. However, I am certainly open
to different ideas. If you think S{1,3} should mean "match a number
whose slope is between 1 and 3", I'll consider it. I've thought about
that kind of thing before and come up with a solution of my own, which
is to allow a regex to be applied to multiple datasets simultaneously.
I'll explain that in more detail if you like. Admittedly, I did escape
one of the metacharacters in one of the examples (it was \G), just to
keep the notation consistent. Sorry if this muddied the waters even
more.
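One way to picture S with quantifiers is to label every number and then run an ordinary Perl regex over the labels. The sketch below does that with plain arrays. It assumes a number counts as "positively sloped" when it is greater than its predecessor, which is my guess at a workable definition and not something fixed by the proposal; slope_string() and match_runs() are hypothetical names.

```perl
use strict;
use warnings;

# Assumption (not from the proposal): a number is "positively sloped"
# when it is greater than its predecessor. The first element has no
# predecessor, so it is labeled '.' here.
sub slope_string {
    my @data = @_;
    my $labels = '.';
    for my $i (1 .. $#data) {
        $labels .= $data[$i] > $data[$i - 1] ? 'S' : '.';
    }
    return $labels;
}

# Hypothetical helper: apply an ordinary regex to the label string and
# return [start, length] for every match.
sub match_runs {
    my ($regex, @data) = @_;
    my $labels = slope_string(@data);
    my @runs;
    while ($labels =~ /$regex/g) {
        push @runs, [ $-[0], $+[0] - $-[0] ];
    }
    return @runs;
}

my @data = (5, 6, 7, 8, 9, 3, 4, 2);
# labels: . S S S S . S .
for my $run (match_runs(qr/S{1,3}/, @data)) {
    my ($start, $len) = @$run;
    my @matched = @data[ $start .. $start + $len - 1 ];
    print "S{1,3} matched: @matched\n";
}
```

Under this reading the quantifier counts elements, so S{1,3} never looks at the numerical size of the slope.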
> What is the purpose of the + in the example for selecting the peak -
> ($peak, $left_of_peak, $max, $right_of_peak) = $fft_of_data =~ n/
> (([[email protected],]+) (MM) ([[email protected],]+)) /;
The + is a quantifier, just like in standard regexes.
> I don't know standard Bracket Notation, but it seems that the "["
> indicates the inclusive side of the < or > part of the range. However,
> when I look at the examples I don't see the same nomenclature. Am I
> missing it?
You're right about the interpretation of square brackets in this
example. I've decided to give parentheses and brackets two uses each,
depending on context, with each use borrowed from standard practice in
the relevant field. First, the Bracket Notation from math uses
parentheses and brackets to specify ranges, so that x is in [3,5) if 3
<= x < 5. If you change the opening bracket to a parenthesis, you
replace the less-than-or-equal-to with just less-than, so (3,5) means 3
< x < 5. However, regexes use (matched) parentheses to indicate
captures and matched brackets to indicate 'character classes'; both of
these concepts have sensible analogs in numerical regexes. The key
disambiguation is the presence (or lack) of a comma. Ranges must always
have a comma in them. Matched parentheses without a comma, such as the
OUTER parentheses in ([[email protected],]+), are captures.
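The range reading of brackets and parentheses can be sketched as a small membership test in plain Perl. The helper in_range() is hypothetical, and treating an empty bound (as in "[3,]") as unbounded is an assumption on my part. A spec without a comma is rejected, in line with the comma rule described above.

```perl
use strict;
use warnings;

# Hypothetical helper: test a number against interval notation such as
# "[3,5)". '[' and ']' include the endpoint; '(' and ')' exclude it.
# Assumption: an empty bound, as in "[3,]", means "unbounded".
sub in_range {
    my ($spec, $x) = @_;
    my ($open, $lo, $hi, $close) =
        $spec =~ /^\s*([\[(])\s*([^,]*?)\s*,\s*([^,]*?)\s*([\])])\s*$/
        or die "not a range (no comma?): $spec";
    if ($lo ne '') {
        return 0 if $open eq '[' ? $x < $lo : $x <= $lo;
    }
    if ($hi ne '') {
        return 0 if $close eq ']' ? $x > $hi : $x >= $hi;
    }
    return 1;
}

for my $x (3, 4, 5) {
    printf "%d in [3,5): %s   %d in (3,5): %s\n",
        $x, (in_range('[3,5)', $x) ? 'yes' : 'no'),
        $x, (in_range('(3,5)', $x) ? 'yes' : 'no');
}
```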
> In the "skew example" I interpreted $peak to be a piddle,
> $left_of_peak to be a single value that is 0.1 standard deviations away.
> But then later you used $left_of_peak->dim(0) seemingly to capture the
> first data item. So the [...@s,] returns a piddle of values from the S
> value to the peak - correct? I don't understand the reasoning for the
> dim(0)*2 portion for the skew calculation either.
I guess the confusion here arises from the behavior of the captures.
I'm going to assume you're familiar with captures in standard Perl
regexes, so I won't explain them. In numerical regexes as I envision
them, all captures return piddles (just like all captures in Perl return
strings). They may be single-element piddles, but they are piddles.
Thus ([[email protected],]+), which occurs twice, captures one or more numbers (the +
is a quantifier) whose values are all greater than (mean + 0.1 * std
dev.) and stores the result in $2 or $4 depending on which one you're
talking about. These are later stored in $left_of_peak and
$right_of_peak, respectively. The expression (MM) captures the global
maximum and stores it in the single-element piddle $3, and later, $max.
The operation is nearly identical to this string-parsing regex:
$name = "My name is David C. Mertens";
($full_name, $first_name, $MI, $last_name) = $name =~
/((\w+)\s(\w)\.\s(\w+))/;
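For comparison, the numerical capture described above can be imitated by hand with plain arrays. This is only a sketch of the behavior I have in mind, not an implementation of the regex engine: split_at_peak() is a hypothetical name, the standard deviation is assumed to be the population form, ties for the maximum go to the first occurrence, and (unlike the + quantifier) an empty side is allowed.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Sketch with plain arrays instead of piddles. Returns three array refs
# (left, max, right): left and right are the contiguous runs on either
# side of the global maximum whose values exceed mean + $k * std dev.
# Assumptions: population standard deviation, first maximum wins ties,
# non-empty input, and an empty side is allowed (the + quantifier in
# the regex would instead require at least one element).
sub split_at_peak {
    my ($k, @data) = @_;
    my $mean = sum(@data) / @data;
    my $std  = sqrt(sum(map { ($_ - $mean) ** 2 } @data) / @data);
    my $cut  = $mean + $k * $std;

    # Locate the global maximum.
    my $imax = 0;
    for my $i (1 .. $#data) {
        $imax = $i if $data[$i] > $data[$imax];
    }

    # Grow outward from the peak while values stay above the cutoff.
    my ($l, $r) = ($imax, $imax);
    $l-- while $l > 0      && $data[$l - 1] > $cut;
    $r++ while $r < $#data && $data[$r + 1] > $cut;

    return ([ @data[$l .. $imax - 1] ],
            [ $data[$imax] ],
            [ @data[$imax + 1 .. $r] ]);
}

my ($left, $max, $right) = split_at_peak(0.1, 0, 1, 5, 9, 6, 4, 3, 0);
print "left: @$left  max: @$max  right: @$right\n";
```

Every return value is an array ref, even the single-element maximum, which mirrors the rule that all captures are piddles.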
As for the dim(0)*2 business, dim(0) gives the number of elements in a
one-dimensional piddle, so the test checks whether the right side is
more than twice as long as the left, or the left side is more than
twice as long as the right. You get that kind of lopsided peak with,
for example, the distribution from Planck's law.
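Spelled out with plain arrays, the length comparison looks roughly like the sketch below. The name is_skewed() is hypothetical, and scalar(@$side) stands in for the ->dim(0) call you would make on a piddle.

```perl
use strict;
use warnings;

# Hypothetical check: call the peak skewed when one side of it is more
# than twice as long as the other. With piddles the lengths would come
# from ->dim(0); here scalar(@$side) plays that role for array refs.
sub is_skewed {
    my ($left, $right) = @_;
    my ($nl, $nr) = (scalar @$left, scalar @$right);
    return ($nr > 2 * $nl || $nl > 2 * $nr) ? 1 : 0;
}

print "1 vs 3 points: ", is_skewed([5], [6, 4, 3]), "\n";
print "2 vs 3 points: ", is_skewed([5, 7], [6, 4, 3]), "\n";
```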
David
_______________________________________________
Perldl mailing list
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl