On Mon, 13 Aug 2001, Justin Erenkrantz wrote:

> [ Maybe some Perl hacker will read this and have thoughts... ]
>
> Anyway, I encountered the following situation when running flood today
> and the only way I can think of resolving this is with full blown
> regular expressions.
>
> Here's the scenario I have:
>
> We want to extract some information from the response returned by
> the server. So, let's say we want to get an ID back that is embedded
> in a URL. An example:
>
> ...blah...<A HREF="http://www.example.com/test.jsp?id=123"
> class="bar">Justin's test</A>...blah
>
> So what I currently have in CVS will work like ($ is a really bad
> delimiter, but it's what I chose and is easily changeable if I could
> come up with something better):
>
> <A HREF="http://www.example.com/test.jsp?id=$$" class="bar">Justin's test</A>
>
> And, $$ will now take on the value 123 by some rudimentary pattern
> matching.
>
> However, that all gets shot to hell when faced with:
>
> <A HREF="http://www.example.com/test.jsp?id=$$" tabindex="50"
> class="bar">Justin's test</A>
>
> Now, the tabindex value is keyed off of its position within the document
> (and we can't move the tabindex value around due to limitations in JSP
> land). I definitely don't want to hardcode 50 in the response
> "template" (i.e. what flood will look for). So, the alternative seems
> to be bite the bullet and use regex. So, the above example could be
> coded in regex as:
>
> <A HREF="http://www.example.com/test.jsp?id=([^"]*)" ([^>]*)>Justin's test</A>
>
> Is this correct (Roy says so)? Then, $1 (variable one in the regex) is
> 123 in my example. $2 is the rest of the junk I don't care much about.

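On the literal question of whether the pattern is correct: it is close, but the "?" in the URL needs to be escaped. Unescaped, it makes the preceding "p" optional instead of matching a literal question mark, so the pattern does not match the sample. Here is a minimal sketch to check this in Perl. The sample string is pieced together from your message, and escaping the dots is my own addition (optional, since an unescaped dot still matches a dot):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample response fragment, assembled from the quoted message.
my $response = q{...blah...<A HREF="http://www.example.com/test.jsp?id=123" tabindex="50" class="bar">Justin's test</A>...blah};

# The pattern exactly as posted: "p?" means "an optional p", so the
# literal "?" in the URL is never matched.
my $as_posted = qr{<A HREF="http://www.example.com/test.jsp?id=([^"]*)" ([^>]*)>Justin's test</A>};
my $posted_ok = $response =~ $as_posted ? 1 : 0;
print "as posted: ", ($posted_ok ? "match" : "no match"), "\n";

# Same pattern with the "?" (and, optionally, the dots) escaped.
my $escaped = qr{<A HREF="http://www\.example\.com/test\.jsp\?id=([^"]*)" ([^>]*)>Justin's test</A>};

# In list context a successful match returns the captured groups ($1, $2).
my ($id, $rest) = $response =~ $escaped;
print "id   = $id\n";
print "rest = $rest\n";
```

With the escaped pattern, $1 is 123 and $2 is the leftover attribute text (tabindex="50" class="bar"). POSIX extended regular expressions treat "?" as a quantifier in the same way, so the escaping applies there as well.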
I'm not sure I understand what you want to accomplish. Do you want some
templating feature where you can say:

  <A HREF="http://www.example.com/test.jsp?id=[% id %]" tabindex="[% tabindex %]">Justin's test</A>

and then substitute the values? Or are you looking for some elaborate
regex?

The best way to code complex regexes is to build regex tokens and then put
them together. But I think what you really need is to use HTML::Parser to
tokenize the elements and then apply simple regexes to each of them. Here
is an example:

  #!/usr/bin/perl -w

  use strict;
  use HTML::Parser ();

  my $input = << "__INPUT__";
  <A HREF="http://www.example.com/test.jsp?id=123"
     tabindex="50" class="bar">test
  </A>
  __INPUT__

  # Map each attribute to a handler that returns a (label, value) pair,
  # or an empty list when there is nothing to report.
  my %map = (
      href     => sub { $_[0] =~ /\?id=(.*)$/ ? ("id", $1) : () },
      tabindex => sub { $_[0] ? ("tab",   $_[0]) : () },
      class    => sub { $_[0] ? ("class", $_[0]) : () },
  );

  # Create parser object
  my $p = HTML::Parser->new(api_version => 3);
  $p->handler(start => \&a_start_handler, "self,tagname,attr,text");

  sub a_start_handler {
      my ($self, $tag, $attr, $text) = @_;
      if ($tag eq "a" and exists $attr->{href}) {
          for my $label (qw(href tabindex class)) {
              # do something with $attr->{$label}
              if (exists $map{$label} and exists $attr->{$label}) {
                  # try to apply the match
                  my @match = $map{$label}->($attr->{$label});
                  printf "%10s = %s\n", @match if @match;
              }
          }
      }
  }

  $p->parse($input);
  $p->eof;
  # or grab some file as an input.
  #$p->parse_file("test.html") or die $!;

When you run this code it prints:

          id = 123
         tab = 50
       class = bar

Once you have the tag tokenized, you can do pretty much whatever you want.
I just gave you an example where you do simple pattern matching on simple
tokens.

> This also leads to a problem with how do I tell flood that I want to
> retrieve $1 and place it in my "state" table? I don't know exactly how
> to do that. I'm just thinking to hardcode $1 as what it should grab.
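
If you do stay with a plain regex, here is a rough sketch of what I mean by building it from tokens, and of one way to say "store capture number N under this name" without hardcoding $1. Everything named here (%state, @store_as, the token variables) is made up for illustration and is not anything flood actually provides:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = q{<A HREF="http://www.example.com/test.jsp?id=123" tabindex="50" class="bar">Justin's test</A>};

# Small tokens, each compiled once with qr//.
my $url_base = qr{http://www\.example\.com/test\.jsp};
my $id_value = qr{([^"]*)};       # capture 1: the value we want to keep
my $attrs    = qr{[^>]*};         # any remaining attributes, not captured
my $label    = qr{Justin's test}; # compiled without /x, so its space is literal

# Put the tokens together; /x allows whitespace between the pieces.
my $link = qr{<A \s+ HREF="$url_base\?id=$id_value" $attrs > $label </A>}x;

# Hypothetical mapping: capture 1 is stored under the key "id".
my @store_as = ('id');

# Hypothetical stand-in for the "state" table.
my %state;
if (my @captures = $input =~ $link) {
    @state{@store_as} = @captures;   # hash slice: names paired with captures
}

print "$_ = $state{$_}\n" for sort keys %state;
```

The point of @store_as is that the pattern and the list of names can both come from configuration, so the code that runs the match never needs to know which capture matters.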
> Maybe I could add a responsetemplatevalue in XML which says, "Use
> this number parameter from the regex and store its value in your state
> table." Is there some common semantic for doing this?
>
> Also, does anyone know anything about the POSIX regex functions (in
> regex.h)? Is there a reason to use PCRE even when the POSIX regex
> functions are available? I've coded up a quick proof-of-concept using
> the POSIX regex functions, but I'm not sure why httpd doesn't use the
> POSIX library (unless it isn't very common). I haven't come across a
> system that didn't have POSIX regex, but I'll bet there is one.
> However, both of the "target" platforms (Solaris and Linux) both have
> the POSIX regex libraries. So, I'm tempted not to use PCRE unless
> there is a good reason to. -- justin
>

_____________________________________________________________________
Stas Bekman              JAm_pH  --  Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]  http://localhost/  http://eXtropia.com/
http://singlesheaven.com  http://perl.apache.org  http://perlmonth.com/