
RE: [PHP] Need help with RegEx

2006-12-15 Thread Richard Lynch


preg_match_all('|<status>([^<]*)</status>|msU', $xml, $matches);
var_dump($matches);

YMMV

Download and play with The Regex Coach
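
For illustration, here is a small sketch of the preg_match_all() call above run against a response holding two status elements. It assumes the pattern was meant to include the angle brackets of the status tags; the sample $xml string is made up for the example.

```php
<?php
// Hypothetical sample response; the real XML comes from the remote server.
$xml = "<someXMLtags>\n"
     . "  <status>16664 Rejected: Invalid LTV</status>\n"
     . "  <status>Unable to Post, Invalid Information</status>\n"
     . "</someXMLtags>";

// '|' is the delimiter, so the '/' in </status> needs no escaping.
// [^<]* matches everything up to the next '<'.
$count = preg_match_all('|<status>([^<]*)</status>|msU', $xml, $matches);

// $matches[0] holds the full matches, $matches[1] the captured text.
foreach ($matches[1] as $status) {
    echo $status, "\n";
}
```

With preg_match_all() every status element is collected, not just the first one.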

On Mon, December 11, 2006 9:29 am, Brad Fuller wrote:

 The example provided didn't work for me.  It gave me the same string
 without anything modified.

 I am also looking for this solution to strip out text from some XML
 response I get from posting data to a remote server.  I can do it using
 substring functions but I'd like something more compact and portable.
 (A one-liner that I could modify for other uses as well)

 Example 1:
 <someXMLtags>
   <status>16664 Rejected: Invalid LTV</status>
 </someXMLtags>

 Example 2:
 <someXMLtags>
   <status>Unable to Post, Invalid Information</status>
 </someXMLtags>

 I want what is inside the status tags.

 Does anyone have a working solution how we can get the text from
 inside these tags using regex?

 Much appreciated,

 B


RE: [PHP] Need help with RegEx

2006-12-12 Thread Ford, Mike

On 11 December 2006 19:43, Michael wrote:

 At 08:29 AM 12/11/2006 , Brad Fuller wrote:
  
  The example provided didn't work for me.  It gave me the same
  string without anything modified.
 
 You are absolutely correct, this is what I get for not
 testing it explicitly :( My most sincere apologies to the OP
 and the list, there is an error in my example (see below for
 correction) 
 
  I have cut and pasted from further down in the quoted
 message, for convenience 
   Using the tags you describe here, and assuming the source html is
   in the variable $source_html, try this:
   
   $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);
 
 The End of string symbol ^ should not be included.

That's because ^ is not the end-of-string symbol -- it's the START-of-string 
symbol.  $ is the END-of-string symbol.  But the OP doesn't need either of 
these symbols as he's not trying to match at the start or end of the string, 
and nor does he need your suggested leading and trailing (.*?) for the same 
reason.  Unless anchored with ^ and/or $, preg is perfectly happy to match in 
the middle of the subject string.
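
A small sketch of the anchor behaviour described above. The subject string is made up for the example.

```php
<?php
$subject = "abc def";

// Unanchored: the pattern may match anywhere in the subject.
$anywhere = preg_match('/def/', $subject);

// ^ anchors the match at the START of the subject ...
$atStart = preg_match('/^def/', $subject);

// ... and $ anchors it at the END.
$atEnd = preg_match('/def$/', $subject);

var_dump($anywhere, $atStart, $atEnd);
```

Here only the ^ version fails, because "def" is not at the start of the subject.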

@Anthony: your pattern is fine -- it's what you're doing with it that's wrong.

On 11 December 2006 08:03, Anthony Papillion wrote:

 $trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
 
 The problem is that when I echo the value of $trans_text variable, I
 end up with the entire HTML of the page.

I don't see how this is possible, since preg_match returns an integer telling 
you how many times the pattern matched -- which will be 0 or 1, since 
preg_match doesn't do multiple matches!  You also clearly haven't given us your 
actual call, since you've only included the pattern and not the subject string.

What you're after is the third argument to preg_match, which returns an array 
of matched text; so for:

preg_match("/<div id=result_box dir=ltr>(.+?)<\\/div>/", $orig, $matches);

$matches[0]  will contain the entire match (everything from <div ...> to </div>)
$matches[1]  will contain the first parenthesized expression, which is what 
you're looking for.

Note also the doubled backslash, since you need to pass a single backslash 
through to escape the / for preg_match.  As an alternative, I would strongly 
advise using a different delimiter, so that no escaping is needed; for instance:

preg_match("#<div id=result_box dir=ltr>(.+?)</div>#", $orig, $matches);
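
As a rough, self-contained sketch of the above: the return value says whether the pattern matched, and the text itself arrives through the third argument. The contents of $orig are made up here, and the tags are assumed to carry their angle brackets.

```php
<?php
// Hypothetical page fragment standing in for the cURL result.
$orig = "<html><body><div id=result_box dir=ltr>THIS IS A TEST</div></body></html>";

// '#' as the delimiter means the '/' in </div> needs no escaping.
$found = preg_match("#<div id=result_box dir=ltr>(.+?)</div>#", $orig, $matches);

if ($found === 1) {
    echo $matches[0], "\n"; // the whole match, tags included
    echo $matches[1], "\n"; // only the text between the tags
} else {
    // 0 means no match; false signals an error in the pattern.
    echo "no match\n";
}
```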

Cheers!

Mike

-
Mike Ford,  Electronic Information Services Adviser,
Learning Support Services, Learning  Information Services,
JG125, James Graham Building, Leeds Metropolitan University,
Headingley Campus, LEEDS,  LS6 3QS,  United Kingdom
Email: [EMAIL PROTECTED]
Tel: +44 113 283 2600 extn 4730  Fax:  +44 113 283 3211 


To view the terms under which this email is distributed, please go to 
http://disclaimer.leedsmet.ac.uk/email.htm

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

RE: [PHP] Need help with RegEx

2006-12-12 Thread Michael

At 04:56 AM 12/12/2006 , Ford, Mike wrote:
 The End of string symbol ^ should not be included.

That's because ^ is not the end-of-string symbol -- it's the START-of-string 
symbol.  $ is the END-of-string symbol.  But the OP doesn't need either of 
these symbols as he's not trying to match at the start or end of the string, 
and nor does he need your suggested leading and trailing (.*?) for the same 
reason.  Unless anchored with ^ and/or $, preg is perfectly happy to match in 
the middle of the subject string.

Well, DOH, leave it to me to bugger something up like that heh, got the $ and ^ 
reversed. Thanks for correcting me :) 




[PHP] Need help with RegEx

2006-12-11 Thread Anthony Papillion

Hello Everyone,

I am having a bit of a problem wrapping my head around regular expressions. I 
thought I had a good grip on them but, for some reason, the expression I've 
created below simply doesn't work! Basically, I need to retrieve all of the 
text between two unique and specific tags but I don't need the tag text. So 
let's say that the tag is

<tag lang='ttt'>THIS IS A TEST</tag>

I would need to retrieve THIS IS A TEST only and nothing else.

Now, a bit more information: I am using cURL to retrieve the entire contents 
of a webpage into a variable. I am then trying to perform the following 
regular expression on the retrieved text:

$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");

The problem is that when I echo the value of the $trans_text variable, I end up 
with the entire HTML of the page.

Can anyone clue me in to what I am doing wrong?

Thanks,
Anthony 


Re: [PHP] Need help with RegEx

2006-12-11 Thread Børge Holen

explode it

I have quite a hard time comprehending regexps myself, but as a training 
exercise, go ahead.


-- 
---
Børge
Kennel Arivene 
http://www.arivene.net
---


Re: [PHP] Need help with RegEx

2006-12-11 Thread T . Lensselink

I'm no regex guru, but something goes wrong here.

First off, you are missing the second parameter to preg_match:

int preg_match ( string pattern, string subject [, array matches [, int flags 
[, int offset]]] )

If you need the text between two unique tags it should not be too hard:

$test = "<tag lang='ttt'>THIS IS A TEST</tag>";
preg_match("/<tag lang='ttt'>(.+?)<\/tag>/", $test, $matches);
print_r($matches);

Thijs


Re: [PHP] Need help with RegEx

2006-12-11 Thread Roman Neuhauser

# [EMAIL PROTECTED] / 2006-12-11 02:02:46 -0600:
 I am having a bit of a problem wrapping my head around regular expressions. I 
 thought I had a good grip on them but, for some reason, the expression I've 
 created below simply doesn't work! Basically, I need to retrieve all of the 
 text between two unique and specific tags but I don't need the tag text. So 
 let's say that the tag is
 
 <tag lang='ttt'>THIS IS A TEST</tag>
 
 I would need to retrieve THIS IS A TEST only and nothing else.
 
 Now, a bit more information: I am using cURL to retrieve the entire contents 
 of a webpage into a variable. I am then trying to perform the following 
 regular expression on the retrieved text:
 
 $trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
 
 The problem is that when I echo the value of the $trans_text variable, I end up 
 with the entire HTML of the page.

This is hardly the code you're actually using[1], can you please
provide a piece of real code?

[1] int preg_match ( string pattern, string subject [, array matches [, 
int flags [, int offset]]] )

-- 
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991


Re: [PHP] Need help with RegEx

2006-12-11 Thread Michael

At 01:02 AM 12/11/2006 , Anthony Papillion wrote:
Hello Everyone,

I am having a bit of a problem wrapping my head around regular expressions. I 
thought I had a good grip on them but, for some reason, the expression I've 
created below simply doesn't work! Basically, I need to retrieve all of the 
text between two unique and specific tags but I don't need the tag text. So 
let's say that the tag is

<tag lang='ttt'>THIS IS A TEST</tag>

I would need to retrieve THIS IS A TEST only and nothing else.

Now, a bit more information: I am using cURL to retrieve the entire contents 
of a webpage into a variable. I am then trying to perform the following 
regular expression on the retrieved text:

$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");

Using the tags you describe here, and assuming the source html is in the
variable $source_html, try this:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

How this breaks down:

opening quote for first parameter (your MATCH pattern).

open regex match pattern = /

first atom (.*?) = any or no leading text before <div id=result_box dir=ltr>;
the ? makes it non-greedy so that it stops after finding the first match.

second atom (<div id=result_box dir=ltr>) = the opening tag you are looking for.

third atom (.*?) = the text you want to extract, all text (even if nothing is
there) between the 2nd and 4th atoms.

fourth atom (<\/div>) = the closing tag of the div tag pair.

fifth atom (.*?) = all of the rest of the source html after the closing tag up
to the end of the line ^, even if there is nothing there.

close regex match pattern = /s

In order for this to work on html that may contain newlines, you must specify
that the . can represent newline characters. This is done by adding the letter
's' after your regex closing /, so the last thing in your regex match pattern
would be /s.

end of string ^ (this matches the end of the string you are matching/replacing,
$source_html)

closing quote for first parameter.

The second parameter of preg_replace is the replacement: the atom # which
contains the text that should replace whatever the match pattern in the first
parameter matched. In this case the text we want is in the third atom, so this
parameter would be "$3". (This is the PHP way of back-referencing: if we wanted
the text before the tag we would use atom 1, or $1; if we wanted the tag itself
we would use $2, etc. Basically a $ followed by the # of the atom that holds
what we want to end up in $trans_text.)

The third parameter of preg_replace is the source you wish to match and
replace from, in this case your source html in $source_html.

After this executes, $trans_text should contain the innerText of the
<div id=result_box dir=ltr></div> tag pair from $source_html. If there is
nothing between the opening and closing tags, $trans_text will == ""; if there
is only a newline between the tags, $trans_text will == "\n". IMPORTANT: if the
text between the tags contains a newline, $trans_text will also contain that
newline character because we told . to match newlines.

I am no regex expert by far, but this worked for me (assuming I copied it
correctly here heh).
There are doubtless many other ways to do this, and I am sure others on the
list here will correct me if my way is wrong or inefficient.

I hope this works for you and that I haven't horribly embarrassed myself here.
Good luck :)
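
As replies elsewhere in this thread point out, ^ anchors the START of the string, so the pattern as posted cannot match. Below is a minimal sketch of the same replace-everything-with-the-capture idea, simplified to a single capture group and anchored with ^ at the front and $ at the end. The angle brackets around the tags and the contents of $source_html are assumptions made for the example.

```php
<?php
// Hypothetical page source standing in for the cURL result.
$source_html = "<html>\n<div id=result_box dir=ltr>THIS IS\nA TEST</div>\n</html>\n";

// Match the whole string, capture only the div contents, and replace
// the whole string with that capture. The s modifier lets . match newlines.
$pattern = '/^.*?<div id=result_box dir=ltr>(.*?)<\/div>.*$/s';
$trans_text = preg_replace($pattern, '$1', $source_html);

echo $trans_text, "\n";
```

The newline inside the div is kept in $trans_text, as described above.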


The problem is that when I echo the value of $trans_text variable, I end up 
with the entire HTML of the page.

Can anyone clue me in to what I am doing wrong?

Thanks,
Anthony 


Re: [PHP] Need help with RegEx

2006-12-11 Thread Michael

I just realized I neglected to explain a couple of things here, sorry...

My method will only work for the FIRST occurrence of the div tag pair in 
$source_html.

The reason this method works is that you are telling preg_replace to replace 
everything that matches the match pattern, with just what is contained in the 
third atom of the match pattern. Since we are matching everything between the 
start of $source_html and the end of $source_html (the (.*?) atom at the 
beginning, and the (.*?)^ atom at the end) your return value ends up being $3, 
or the contents of the third atom of the match pattern, which represents the 
text between the opening tag and closing tag of your div element.

hope this makes sense, I'm writing this at 5am heh

Cheers,
Michael
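
One consequence worth illustrating: when the pattern does not match at all, preg_replace() hands back the subject unchanged. The sketch below compares the posted pattern (ending in ^) with a variant ending in $, the end-of-string anchor. The subject string is made up, and the angle brackets around the tags are assumed.

```php
<?php
$source_html = "<p>intro</p><div id=result_box dir=ltr>hello</div><p>outro</p>";

// ^ only matches at the start of the subject, so it can never follow
// other text: this pattern matches nothing and nothing is replaced.
$broken = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s', '$3', $source_html);

// With $ the five groups cover the whole subject, which is then
// replaced by the contents of the third group.
$fixed = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)$/s', '$3', $source_html);

var_dump($broken === $source_html);
echo $fixed, "\n";
```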


RE: [PHP] Need help with RegEx

2006-12-11 Thread Brad Fuller


The example provided didn't work for me.  It gave me the same string without
anything modified.

I am also looking for this solution to strip out text from some XML response
I get from posting data to a remote server.  I can do it using substring
functions but I'd like something more compact and portable. (A one-liner
that I could modify for other uses as well)

Example 1:
<someXMLtags>
  <status>16664 Rejected: Invalid LTV</status>
</someXMLtags>

Example 2:
<someXMLtags>
  <status>Unable to Post, Invalid Information</status>
</someXMLtags>

I want what is inside the status tags.

Does anyone have a working solution how we can get the text from inside
these tags using regex?

Much appreciated,

B


RE: [PHP] Need help with RegEx

2006-12-11 Thread tg-php

If you didn't say "using regex" this is how I'd do it  (untested, forgive typos 
and such..ripped from some code I actively use and stripped down):

<?php

  // Flag set when a <status> element opens; read by the character data handler.
  $FoundStatusTag = false;

  function xml_response_open_element_function($p, $element, $attributes) {
    global $FoundStatusTag;

    // Element names arrive upper-cased because case folding is on by default.
    if (strtoupper($element) == "STATUS") $FoundStatusTag = true;
  }

  function xml_response_close_element_function($p, $element) {
    // do nothing special for now
  }

  function xml_response_handle_character_data($p, $cdata) {
    global $FoundStatusTag;

    if ($FoundStatusTag) {
      echo $cdata;
      $FoundStatusTag = false;
    }
  }

  // $_XML_RESPONSE holds the XML string received from the remote server.
  $_XML_RESPONSE_PARSER = xml_parser_create();
  xml_set_element_handler($_XML_RESPONSE_PARSER, 
'xml_response_open_element_function', 'xml_response_close_element_function');
  xml_set_character_data_handler($_XML_RESPONSE_PARSER, 
'xml_response_handle_character_data');
  // The third argument is the "final chunk" flag, not a length.
  xml_parse($_XML_RESPONSE_PARSER, $_XML_RESPONSE, true);
  xml_parser_free($_XML_RESPONSE_PARSER);

?>
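
A variation on the same event-based parser idea, sketched with closures so that the status text is collected into an array instead of echoed. The sample $response and the variable names are made up for the example, and the xml extension is assumed to be available. The character data handler may be called more than once for a single text node, so this version appends while it is inside the element.

```php
<?php
// Hypothetical response; in practice this is the string returned by the remote server.
$response = "<someXMLtags><status>16664 Rejected: Invalid LTV</status></someXMLtags>";

$inStatus = false;    // true while the parser is inside a <status> element
$statuses = array();  // collected text, one entry per <status> element

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$inStatus, &$statuses) {
        // Names are upper-cased because case folding is enabled by default.
        if ($name === 'STATUS') {
            $inStatus = true;
            $statuses[] = '';
        }
    },
    function ($p, $name) use (&$inStatus) {
        if ($name === 'STATUS') {
            $inStatus = false;
        }
    }
);
xml_set_character_data_handler(
    $parser,
    function ($p, $cdata) use (&$inStatus, &$statuses) {
        // Character data may arrive in several pieces, so append rather than assign.
        if ($inStatus) {
            $statuses[count($statuses) - 1] .= $cdata;
        }
    }
);
xml_parse($parser, $response, true);

print_r($statuses);
```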

= = = Original message = = =

The example provided didn't work for me.  It gave me the same string without
anything modified.

I am also looking for this solution to strip out text from some XML response
I get from posting data to a remote server.  I can do it using substring
functions but I'd like something more compact and portable. (A one-liner
that I could modify for other uses as well)

Example 1:
someXMLtags
~status16664 Rejected: Invalid LTV/status
/someXMLtags

Example 2:
someXMLtags
~statusUnable to Post, Invalid Information/status
/someXMLtags

I want what is inside the status tags.

Does anyone have a working solution how we can get the text from inside
these tags using regex?

Much appreciated,

B



RE: [PHP] Need help with RegEx

2006-12-11 Thread Brad Fuller


I got it.

<?php
$input = "<xmlJunk><status>Hello, World!</status></xmlJunk>";

preg_match("#<status>(.*?)</status>#s", $input, $matches);
echo $matches[1];
?>
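
Since the goal was a one-liner that could be adapted to other tags, the same pattern can be wrapped in a small helper. This is only a sketch: the function name extract_tag_text and the optional-attribute part of the pattern are my own additions, not something posted in the thread. It assumes PHP 7.1 or later (for the type declarations) and that the tag is not nested inside itself.

```php
<?php
// Hypothetical helper: returns the text inside the first <tag>...</tag> pair,
// or null when the tag is not found.
function extract_tag_text(string $tag, string $input): ?string
{
    // Escape the tag name so it is safe inside a pattern delimited by #.
    $name = preg_quote($tag, '#');

    // (?:\s[^>]*)? allows optional attributes; the s modifier lets . match newlines.
    $pattern = '#<' . $name . '(?:\s[^>]*)?>(.*?)</' . $name . '>#s';

    return preg_match($pattern, $input, $matches) === 1 ? $matches[1] : null;
}

// Sample inputs modeled on the examples in this thread.
$xml = "<someXMLtags>\n  <status>Unable to Post, Invalid Information</status>\n</someXMLtags>";

echo extract_tag_text('status', $xml), "\n";
echo extract_tag_text('tag', "<tag lang='ttt'>THIS IS A TEST</tag>"), "\n";
var_dump(extract_tag_text('missing', $xml));
```

Returning null for a missing tag keeps "not found" distinguishable from an empty element, which matters when the remote server sends back an unexpected response.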


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
 Sent: Monday, December 11, 2006 10:59 AM
 To: php-general@lists.php.net
 Cc: [EMAIL PROTECTED]
 Subject: RE: [PHP] Need help with RegEx
 


RE: [PHP] Need help with RegEx

2006-12-11 Thread Michael



At 08:29 AM 12/11/2006 , Brad Fuller wrote:

The example provided didn't work for me.  It gave me the same string without
anything modified.

You are absolutely correct; this is what I get for not testing it explicitly :(
My most sincere apologies to the OP and the list. There is an error in my
example (see below for the correction).

 I have cut and pasted from further down in the quoted message, for
 convenience:
 Using the tags you describe here, and assuming the source html is in the
 variable $source_html, try this:
 
 $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

The ^ symbol should not be included. (I described it as the end-of-string
symbol, but ^ actually anchors the start of the string; $ anchors the end.)
I tested the above function without the ^ and it worked for me. Below is the
TESTED version:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);
* end of pasted section *
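
To see what these patterns actually return, here is a small self-contained check. It is a sketch under assumptions: the sample HTML is invented, and the preg_match and anchored variants are alternatives of mine rather than code from the thread. One thing it shows is that, with a lazy trailing (.*?) and no end anchor, the replace stops consuming at the closing tag, so any text after </div> stays in the result. If only the inner text is wanted, preg_match with a single capture group, or a pattern anchored at both ends, avoids that.

```php
<?php
// Invented sample page; the div uses the unquoted-attribute form seen in the thread.
$source_html = "<html><body>before<div id=result_box dir=ltr>THIS IS\nA TEST</div>after</body></html>";

// The pattern from the message above (without the stray ^).
// Everything up to and including </div> is replaced; the trailing text is left alone.
$unanchored = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s', '$3', $source_html);

// Alternative 1: capture only the inner text with preg_match.
$trans_text = preg_match('/<div id=result_box dir=ltr>(.*?)<\/div>/s', $source_html, $m) ? $m[1] : '';

// Alternative 2: anchor both ends so the replace consumes the whole document.
$anchored = preg_replace('/^.*?<div id=result_box dir=ltr>(.*?)<\/div>.*$/s', '$1', $source_html);

var_dump($unanchored, $trans_text, $anchored);
```

The s modifier matters in all three patterns because the sample text contains a newline between the tags.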




 -Original Message-
 From: Michael [mailto:[EMAIL PROTECTED]
 Sent: Monday, December 11, 2006 6:59 AM
 To: Anthony Papillion
 Cc: php-general@lists.php.net
 Subject: Re: [PHP] Need help with RegEx
 
 At 01:02 AM 12/11/2006 , Anthony Papillion wrote:
 Hello Everyone,
 
 I am having a bit of trouble wrapping my head around regular expressions. I
 thought I had a good grip on them but, for some reason, the expression I've
 created below simply doesn't work! Basically, I need to retrieve all of the
 text between two unique and specific tags, but I don't need the tag text. So
 let's say that the tag is
 
 <tag lang='ttt'>THIS IS A TEST</tag>
 
 I would need to retrieve THIS IS A TEST only and nothing else.
 
 Now, a bit more information: I am using cURL to retrieve the entire contents
 of a webpage into a variable. I am then trying to perform the following
 regular expression on the retrieved text:
 
 $trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
 
 Using the tags you describe here, and assuming the source html is in the
 variable $source_html, try this:
 
 $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

The ^ symbol should not be included. (I described it as the end-of-string
symbol, but ^ actually anchors the start of the string; $ anchors the end.)
I tested the above function without the ^ and it worked for me. Below is the
TESTED version:

$trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s","$3",$source_html);

 
 How this breaks down:
 
 Opening quote for the first parameter (your MATCH pattern).
 
 Open regex match pattern = /
 
 First atom (.*?) = any or no leading text before <div id=result_box dir=ltr>;
 the ? makes it non-greedy so that it stops after finding the first match.
 
 Second atom (<div id=result_box dir=ltr>) = the opening tag you are looking
 for.
 
 Third atom (.*?) = the text you want to strip out, all text even if nothing is
 there, between the 2nd and 4th atoms.
 
 Fourth atom (<\/div>) = the closing tag of the div tag pair.
 
 Fifth atom (.*?) = all of the rest of the source html after the closing tag up
 to the end of the line ^, even if there is nothing there.
 
 Close regex match pattern = /s
 
 In order for this to work on html that may contain newlines, you must specify
 that the . can represent newline characters. This is done by adding the letter
 's' after your regex closing /, so the last thing in your regex match pattern
 would be /s.
 
 End of string ^ (this matches the end of the string you are
 matching/replacing, $source_html)

Ignore this part of the explanation; the ^ is not needed and in fact breaks
the example given.

 
 Closing quote for the first parameter.
 
 The second parameter of preg_replace is the atom # which contains the text
 you want to replace the text matched by the regex match pattern in the first
 parameter. In this case the text we want is in the third atom, so this
 parameter would be $3 (this is the PHP way of back-referencing; if we wanted
 the text before the tag we would use atom 1, or $1; if we wanted the tag
 itself we would use $2, etc. Basically, a $ followed by the atom # that holds
 what we want to carry over from $source_html into $trans_text).
 
 The third parameter of preg_replace is the source you wish to match and
 replace from, in this case your source html in $source_html.
 
 After this executes, $trans_text should contain the innerText of the
 <div id=result_box dir=ltr></div> tag pair from $source_html, if there is
 nothing
