RE: [PHP] Need help with RegEx
    preg_match_all('|<status>([^<]*)</status>|msU', $xml, $matches);
    var_dump($matches);

YMMV

Download and play with "The Regex Coach"

On Mon, December 11, 2006 9:29 am, Brad Fuller wrote:
> The example provided didn't work for me. It gave me the same string without anything modified.
>
> I am also looking for this solution to strip out text from some XML response I get from posting data to a remote server. I can do it using substring functions but I'd like something more compact and portable. (A one-liner that I could modify for other uses as well)
>
> Example 1:
> <someXMLtags>
>     <status>16664 Rejected: Invalid LTV</status>
> </someXMLtags>
>
> Example 2:
> <someXMLtags>
>     <status>Unable to Post, Invalid Information</status>
> </someXMLtags>
>
> I want what is inside the <status> tags.
>
> Does anyone have a working solution how we can get the text from inside these tags using regex?
>
> Much appreciated,
> B

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
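As a rough illustration of the preg_match_all() suggestion above, here is a minimal sketch. The sample response in $xml is made up for the example (it only reuses the element names from Brad's message), and the pattern assumes the <status> elements contain plain text with no nested tags.

```php
<?php
// Hypothetical sample response; only the element names come from the thread.
$xml = "<someXMLtags>\n"
     . "<status>16664 Rejected: Invalid LTV</status>\n"
     . "<status>Unable to Post, Invalid Information</status>\n"
     . "</someXMLtags>";

// [^<]* stops at the next '<', so the capture can never run past </status>.
// '|' is used as the delimiter, so the '/' in the closing tag needs no escaping.
$count = preg_match_all('|<status>([^<]*)</status>|', $xml, $matches);

// $matches[0] holds the full matches, $matches[1] the text of the first group.
$statuses = $matches[1];
print_r($statuses);
```

preg_match_all() returns the number of full matches, so $count can be used to check whether any status element was found before reading $matches[1].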
RE: [PHP] Need help with RegEx
On 11 December 2006 19:43, Michael wrote:
> At 08:29 AM 12/11/2006, Brad Fuller wrote:
> > The example provided didn't work for me. It gave me the same string without anything modified.
>
> You are absolutely correct, this is what I get for not testing it explicitly :( My most sincere apologies to the OP and the list, there is an error in my example (see below for correction).
>
> I have cut and pasted from further down in the quoted message, for convenience:
>
> > Using the tags you describe here, and assuming the source html is in the variable $source_html, try this:
> >
> >     $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);
>
> The End of string symbol ^ should not be included.

That's because ^ is not the end-of-string symbol -- it's the START-of-string symbol. $ is the END-of-string symbol. But the OP doesn't need either of these symbols as he's not trying to match at the start or end of the string, and nor does he need your suggested leading and trailing (.*?) for the same reason. Unless anchored with ^ and/or $, preg is perfectly happy to match in the middle of the subject string.

@Anthony: your pattern is fine -- it's what you're doing with it that's wrong.

On 11 December 2006 08:03, Anthony Papillion wrote:
>     $trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
>
> The problem is that when I echo the value of $trans_text variable, I end up with the entire HTML of the page.

I don't see how this is possible, since preg_match returns an integer telling you how many times the pattern matched -- which will be 0 or 1, since preg_match doesn't do multiple matches! You also clearly haven't given us your actual call, since you've only included the pattern and not the subject string.

What you're after is the third argument to preg_match, which returns an array of matched text; so for:

    preg_match("/<div id=result_box dir=ltr>(.+?)<\\/div>/", $orig, $matches);

$matches[0] will return the entire match (everything from <div to </div>).
$matches[1] will return the first parenthesized expression, which is what you're looking for.

Note also the doubled backslash, since you need to pass a single backslash through to escape the / for preg_match. As an alternative, I would strongly advise using a different delimiter, so that no escaping is needed; for instance:

    preg_match("#<div id=result_box dir=ltr>(.+?)</div>#", $orig, $matches);

Cheers!

Mike

-
Mike Ford, Electronic Information Services Adviser,
Learning Support Services, Learning & Information Services,
JG125, James Graham Building, Leeds Metropolitan University,
Headingley Campus, LEEDS, LS6 3QS, United Kingdom
Email: [EMAIL PROTECTED]
Tel: +44 113 283 2600 extn 4730 Fax: +44 113 283 3211
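To make the distinction between preg_match()'s return value and its third argument concrete, here is a minimal sketch. The page fragment in $orig is invented for the example, and the s modifier is added on the assumption that the div content may span several lines.

```php
<?php
// Hypothetical page fragment standing in for the cURL response.
$orig = "<html><body><div id=result_box dir=ltr>THIS IS\nA TEST</div></body></html>";

// '#' as the delimiter means the '/' in </div> needs no escaping.
// The s modifier lets '.' match the newline inside the div.
$found = preg_match('#<div id=result_box dir=ltr>(.+?)</div>#s', $orig, $matches);

if ($found === 1) {
    $trans_text = $matches[1];   // text between the tags
} else {
    $trans_text = null;          // 0 means no match, false means an error
}

var_dump($found, $trans_text);
```

Assigning the return value of preg_match() to $trans_text, as in the original call, would only ever store 0, 1 or false; the captured text is in $matches.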
RE: [PHP] Need help with RegEx
At 04:56 AM 12/12/2006, Ford, Mike wrote:
> On 11 December 2006 19:43, Michael wrote:
> > The End of string symbol ^ should not be included.
>
> That's because ^ is not the end-of-string symbol -- it's the START-of-string symbol. $ is the END-of-string symbol. But the OP doesn't need either of these symbols as he's not trying to match at the start or end of the string, and nor does he need your suggested leading and trailing (.*?) for the same reason. Unless anchored with ^ and/or $, preg is perfectly happy to match in the middle of the subject string.

Well, DOH, leave it to me to bugger something up like that heh, got the $ and ^ reversed.

Thanks for correcting me :)
Re: [PHP] Need help with RegEx
explode it

I'm having quite the difficulty comprehending the regexp myself, but as training, go ahead.

On Monday 11 December 2006 09:02, Anthony Papillion wrote:
> Basically, I need to retrieve all of the text between two unique and specific tags but I don't need the tag text. So let's say that the tag is
>
>     <tag lang='ttt'>THIS IS A TEST</tag>
>
> I would need to retrieve THIS IS A TEST only and nothing else.

--
---
Børge
Kennel Arivene
http://www.arivene.net
---
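The "explode it" suggestion is not spelled out in the message, so the following is only one possible reading of it: split once on the opening tag, then once on the closing tag. The helper name text_between() and the sample string are hypothetical and do not come from the thread.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// first occurrence of $open and the next occurrence of $close, or null.
function text_between($source, $open, $close)
{
    // Limit 2: everything after the first opening tag stays in one piece.
    $parts = explode($open, $source, 2);
    if (count($parts) < 2) {
        return null;               // opening tag not found
    }
    $rest = explode($close, $parts[1], 2);
    if (count($rest) < 2) {
        return null;               // closing tag not found
    }
    return $rest[0];
}

$html = "<p>x</p><tag lang='ttt'>THIS IS A TEST</tag><p>y</p>";
echo text_between($html, "<tag lang='ttt'>", "</tag>");
```

This only works when the opening tag is known exactly, attributes included, which matches the "unique and specific tags" described in the original question.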
Re: [PHP] Need help with RegEx
I'm no regex guru but something goes wrong here. First off, you are missing the second parameter in preg_match:

    int preg_match ( string pattern, string subject [, array matches [, int flags [, int offset]]] )

If you need the text from two unique tags it should not be too hard:

    $test = "<tag lang='ttt'>THIS IS A TEST</tag>";
    preg_match("/<tag lang='ttt'>(.+?)<\/tag>/", $test, $matches);
    print_r($matches);

Thijs

On Mon, 11 Dec 2006 02:02:46 -0600, Anthony Papillion [EMAIL PROTECTED] wrote:
> I am then trying to perform the following regular expression on the retrieved text:
>
>     $trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
>
> The problem is that when I echo the value of $trans_text variable, I end up with the entire HTML of the page.
Re: [PHP] Need help with RegEx
# [EMAIL PROTECTED] / 2006-12-11 02:02:46 -0600:
> I am then trying to perform the following regular expression on the retrieved text:
>
>     $trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");
>
> The problem is that when I echo the value of $trans_text variable, I end up with the entire HTML of the page.

This is hardly the code you're actually using[1], can you please provide a piece of real code?

[1] int preg_match ( string pattern, string subject [, array matches [, int flags [, int offset]]] )

--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man. You don't KNOW.
Cause you weren't THERE. http://bash.org/?255991
Re: [PHP] Need help with RegEx
At 01:02 AM 12/11/2006, Anthony Papillion wrote:
> Hello Everyone,
>
> I am having a bit of problems wrapping my head around regular expressions. I thought I had a good grip on them but, for some reason, the expression I've created below simply doesn't work! Basically, I need to retrieve all of the text between two unique and specific tags but I don't need the tag text. So let's say that the tag is
>
>     <tag lang='ttt'>THIS IS A TEST</tag>
>
> I would need to retrieve THIS IS A TEST only and nothing else.
>
> Now, a bit more information: I am using cURL to retrieve the entire contents of a webpage into a variable. I am then trying to perform the following regular expression on the retrieved text:
>
>     $trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");

Using the tags you describe here, and assuming the source html is in the variable $source_html, try this:

    $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);

How this breaks down is:

opening quote for first parameter (your MATCH pattern).

open regex match pattern = /

first atom (.*?) = any or no leading text before <div id=result_box dir=ltr>, the ? makes it non-greedy so that it stops after finding the first match.

second atom (<div id=result_box dir=ltr>) = the opening tag you are looking for.

third atom (.*?) = the text you want to strip out, all text even if nothing is there, between the 2nd and 4th atoms.

fourth atom (<\/div>) = the closing tag of the div tag pair.

fifth atom (.*?) = all of the rest of the source html after the closing tag up to the end of the line ^, even if there is nothing there.

close regex match pattern = /s
In order for this to work on html that may contain newlines, you must specify that the . can represent newline characters. This is done by adding the letter 's' after your regex closing /, so the last thing in your regex match pattern would be /s.

end of string ^ (this matches the end of the string you are matching/replacing, $source_html)

closing quote for first parameter.

The second parameter of the preg_replace is the atom # which contains the text you want to replace the text matched by the regex match pattern in the first parameter. In this case the text we want is in the third atom, so this parameter would be $3 (this is the PHP way of back-referencing: if we wanted the text before the tag we would use atom 1, or $1, if we want the tag itself we use $2, etc. Basically a $ followed by the atom # that holds what we want to replace the $source_html into $trans_text).

The third parameter of the preg_replace is the source you wish to match and replace from, in this case your source html in $source_html.

After this executes, $trans_text should contain the innerText of the <div id=result_box dir=ltr></div> tag pair from $source_html. If there is nothing between the opening and closing tags, $trans_text will == "". If there is only a newline between the tags, $trans_text will == "\n".

IMPORTANT: if the text between the tags contains a newline, $trans_text will also contain that newline character because we told . to match newlines.

I am no regex expert by far, but this worked for me (assuming I copied it correctly here heh).

There are doubtless many other ways to do this, and I am sure others on the list here will correct me if my way is wrong or inefficient.

I hope this works for you and that I haven't horribly embarrassed myself here.

Good luck :)

> The problem is that when I echo the value of $trans_text variable, I end up with the entire HTML of the page.
>
> Can anyone clue me in to what I am doing wrong?
>
> Thanks,
> Anthony
Re: [PHP] Need help with RegEx
I just realized I neglected to explain a couple of things here, sorry...

My method will only work for the FIRST occurrence of the div tag pair in $source_html.

The reason this method works is that you are telling preg_replace to replace everything that matches the match pattern with just what is contained in the third atom of the match pattern. Since we are matching everything between the start of $source_html and the end of $source_html (the (.*?) atom at the beginning, and the (.*?)^ atom at the end), your return value ends up being $3, or the contents of the third atom of the match pattern, which represents the text between the opening tag and closing tag of your div element.

Hope this makes sense, I'm writing this at 5am heh

Cheers,
Michael

At 04:58 AM 12/11/2006, Michael wrote:
> Using the tags you describe here, and assuming the source html is in the variable $source_html, try this:
>
>     $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);
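The idea described above (match the whole subject, then replace it with one captured group) can be sketched as follows. This is not the pattern posted in the thread: it is a variant that assumes ^ at the start and $ at the end as anchors, uses a single capture group, and uses # as the delimiter. The sample strings are made up.

```php
<?php
// Hypothetical subject string standing in for the fetched page.
$source_html = "<html>\n<div id=result_box dir=ltr>THIS IS A TEST</div>\n</html>\n";

// ^ ... $ force the pattern to cover the whole subject, so the entire page is
// replaced by capture group 1. The s modifier lets '.' cross newlines.
$pattern = '#^.*?<div id=result_box dir=ltr>(.*?)</div>.*$#s';
$trans_text = preg_replace($pattern, '$1', $source_html);

// Caveat: when nothing matches, preg_replace() returns the subject unchanged,
// so the caller gets the whole page back instead of an empty result.
$unchanged = preg_replace($pattern, '$1', '<p>no div here</p>');

var_dump($trans_text, $unchanged);
```

Because a failed match hands back the original string, a preg_match() call with a $matches array is usually easier to check for success than this replace-everything approach.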
RE: [PHP] Need help with RegEx
The example provided didn't work for me. It gave me the same string without anything modified.

I am also looking for this solution to strip out text from some XML response I get from posting data to a remote server. I can do it using substring functions but I'd like something more compact and portable. (A one-liner that I could modify for other uses as well)

Example 1:
<someXMLtags>
    <status>16664 Rejected: Invalid LTV</status>
</someXMLtags>

Example 2:
<someXMLtags>
    <status>Unable to Post, Invalid Information</status>
</someXMLtags>

I want what is inside the <status> tags.

Does anyone have a working solution how we can get the text from inside these tags using regex?

Much appreciated,
B

-----Original Message-----
From: Michael [mailto:[EMAIL PROTECTED]
Sent: Monday, December 11, 2006 6:59 AM
To: Anthony Papillion
Cc: php-general@lists.php.net
Subject: Re: [PHP] Need help with RegEx

Using the tags you describe here, and assuming the source html is in the variable $source_html, try this:

    $trans_text = preg_replace("/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s","$3",$source_html);
RE: [PHP] Need help with RegEx
If you didn't say "using regex" this is how I'd do it (untested, forgive typos and such..ripped from some code I actively use and stripped down):

    <?PHP
    // Flag must exist before parsing starts, since the handlers read it.
    $FoundStatusTag = false;

    $_XML_RESPONSE_PARSER = xml_parser_create();
    xml_set_element_handler($_XML_RESPONSE_PARSER, 'xml_response_open_element_function', 'xml_response_close_element_function');
    xml_set_character_data_handler($_XML_RESPONSE_PARSER, 'xml_response_handle_character_data');
    xml_parse($_XML_RESPONSE_PARSER, $_XML_RESPONSE, strlen($_XML_RESPONSE));
    xml_parser_free($_XML_RESPONSE_PARSER);

    function xml_response_open_element_function($p, $element, $attributes) {
        global $FoundStatusTag;

        // Element names arrive upper-cased by default (case folding).
        if (strtoupper($element) == "STATUS") $FoundStatusTag = true;
    }

    function xml_response_close_element_function($p, $element) {
        global $FoundStatusTag;

        // do nothing special for now
    }

    function xml_response_handle_character_data($p, $cdata) {
        global $FoundStatusTag;

        if ($FoundStatusTag) {
            echo $cdata;
            $FoundStatusTag = false;
        }
    }
    ?>

= = = Original message = = =

> I want what is inside the <status> tags.
>
> Does anyone have a working solution how we can get the text from inside these tags using regex?
>
> Much appreciated,
> B
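The same event-based parser approach can also be wrapped in a function that returns the text instead of echoing it. This is only a sketch under a few assumptions: the helper name element_texts() is hypothetical, closures replace the global flag used in the message above, and parse errors are not reported (xml_parse() returns 0 on failure, which a real caller would want to check).

```php
<?php
// Hypothetical helper (not from the thread): collects the character data of
// every <$name> element in $xml using PHP's event-based XML parser.
function element_texts($xml, $name)
{
    $texts  = [];
    $inside = false;
    // Case folding is on by default, so element names arrive upper-cased.
    $wanted = strtoupper($name);

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $element, $attributes) use (&$texts, &$inside, $wanted) {
            if ($element === $wanted) {
                $inside  = true;
                $texts[] = '';      // start a new entry for this element
            }
        },
        function ($p, $element) use (&$inside, $wanted) {
            if ($element === $wanted) {
                $inside = false;
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $cdata) use (&$texts, &$inside) {
            // Character data can arrive in several chunks, so append.
            if ($inside) {
                $texts[count($texts) - 1] .= $cdata;
            }
        }
    );

    // Third argument true = this is the final (and only) chunk of data.
    xml_parse($parser, $xml, true);
    return $texts;
}

$response = "<someXMLtags>\n<status>16664 Rejected: Invalid LTV</status>\n</someXMLtags>";
print_r(element_texts($response, 'status'));
```

Resetting the flag in the closing-element handler, rather than after the first chunk of character data, keeps the text intact when the parser delivers it in more than one piece.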
RE: [PHP] Need help with RegEx
I got it.

    <?php
    $input = "<xmlJunk><status>Hello, World!</status></xmlJunk>";
    preg_match("#<status>(.*?)</status>#s", $input, $matches);
    echo $matches[1];
    ?>

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Monday, December 11, 2006 10:59 AM
To: php-general@lists.php.net
Cc: [EMAIL PROTECTED]
Subject: RE: [PHP] Need help with RegEx

If you didn't say "using regex" this is how I'd do it (untested, forgive typos and such..ripped from some code I actively use and stripped down)
RE: [PHP] Need help with RegEx
At 08:29 AM 12/11/2006, Brad Fuller wrote:

The example provided didn't work for me. It gave me the same string without anything modified.

You are absolutely correct; this is what I get for not testing it explicitly :( My most sincere apologies to the OP and the list. There is an error in my example (see below for the correction).

I have cut and pasted from further down in the quoted message, for convenience:

Using the tags you describe here, and assuming the source html is in the variable $source_html, try this:

$trans_text = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s', '$3', $source_html);

The ^ at the end of the pattern should not be included. It is not an end-of-string symbol: ^ anchors the start of the string (the end-of-string anchor is $), so with it in that position the pattern can never match and preg_replace returns the input unchanged. I tested the above function without the ^ and it worked for me. Below is the TESTED version:

$trans_text = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s', '$3', $source_html);

* end of pasted section *

I am also looking for this solution to strip out text from some XML response I get from posting data to a remote server. I can do it using substring functions but I'd like something more compact and portable. (A one-liner that I could modify for other uses as well)

Example 1:
<someXMLtags>
  <status>16664 Rejected: Invalid LTV</status>
</someXMLtags>

Example 2:
<someXMLtags>
  <status>Unable to Post, Invalid Information</status>
</someXMLtags>

I want what is inside the status tags. Does anyone have a working solution how we can get the text from inside these tags using regex?

Much appreciated,
B

-----Original Message-----
From: Michael [mailto:[EMAIL PROTECTED]
Sent: Monday, December 11, 2006 6:59 AM
To: Anthony Papillion
Cc: php-general@lists.php.net
Subject: Re: [PHP] Need help with RegEx

At 01:02 AM 12/11/2006, Anthony Papillion wrote:

Hello Everyone,

I am having a bit of a problem wrapping my head around regular expressions. I thought I had a good grip on them but, for some reason, the expression I've created below simply doesn't work!
Basically, I need to retrieve all of the text between two unique and specific tags, but I don't need the tag text. So let's say that the tag is

<tag lang='ttt'>THIS IS A TEST</tag>

I would need to retrieve THIS IS A TEST only and nothing else.

Now, a bit more information: I am using cURL to retrieve the entire contents of a webpage into a variable. I am then trying to perform the following regular expression on the retrieved text:

$trans_text = preg_match("\/<div id=result_box dir=ltr>(.+?)<\/div>/");

Using the tags you describe here, and assuming the source html is in the variable $source_html, try this:

$trans_text = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)^/s', '$3', $source_html);

The ^ at the end of the pattern should not be included. ^ anchors the start of the string, not the end (that is $), so with it the pattern never matches. I tested the above function without the ^ and it worked for me. Below is the TESTED version:

$trans_text = preg_replace('/(.*?)(<div id=result_box dir=ltr>)(.*?)(<\/div>)(.*?)/s', '$3', $source_html);

How this breaks down:

opening quote for the first parameter (your MATCH pattern).

open regex match pattern = /

first atom (.*?) = any or no leading text before <div id=result_box dir=ltr>; the ? makes it non-greedy so that it stops at the first match.

second atom (<div id=result_box dir=ltr>) = the opening tag you are looking for.

third atom (.*?) = the text you want to pull out, all text (even if nothing is there) between the 2nd and 4th atoms.

fourth atom (<\/div>) = the closing tag of the div tag pair.

fifth atom (.*?) = intended to cover the rest of the source html after the closing tag, even if there is nothing there. Note that a non-greedy (.*?) at the very end of a pattern matches as little as possible, i.e. an empty string, so any text after the closing tag stays in the result; use a greedy (.*) here if that text must be removed.

close regex match pattern = /s. In order for this to work on html that may contain newlines, you must specify that the . can match newline characters. This is done by adding the letter 's' after your regex's closing /, so the last thing in your regex match pattern would be /s.
end of string ^ (this matches the end of the string you are matching/replacing, $source_html). Ignore this part of the explanation: the ^ is not needed, it is the start-of-string anchor rather than the end-of-string one, and it in fact breaks the example given.

closing quote for the first parameter.

The second parameter of preg_replace is the atom number that holds the text you want to substitute for the text matched by the pattern in the first parameter. In this case the text we want is in the third atom, so this parameter would be $3. (This is the PHP way of back-referencing: if we wanted the text before the tag we would use atom 1, or $1; if we wanted the tag itself we would use $2, etc. Basically, a $ followed by the number of the atom that holds what we want to carry over from $source_html into $trans_text.)

The third parameter of preg_replace is the source you wish to match and replace from, in this case your source html in $source_html.

After this executes, $trans_text should contain the innerText of the <div id=result_box dir=ltr></div> tag pair from $source_html, if there is nothing