Re: [PHP] Re: preg_match and dates
Peter Ford wrote:

Michael A. Peters wrote:

I have absolutely no control over the source file. The source file is an xml file (er, sort of, it doesn't follow any particular DTD) and has a tag called VERBATIM_DATE in each record - looks to be required in their output as every record so far has it, but w/o a DTD hard to know - time of day, on the other hand, is not required and sometimes (usually) the tag is missing.

Here's the beauty - VERBATIM_DATE in the same xml file uses multiple different formats. IE -

    12 March 1945
    14 Mar 1967
    Apr 1999
    12-03-2005
    Before 1904
    Winter or Spring 1977

etc.

It does seem that if there is a day, the day is always first - but sometimes it has a space as a delimiter, sometimes a - as delimiter, and sometimes it has both - IE

    10-15 Dec 1934
    12 March-03 April 1956

What I'm trying to do is write a preg_match pattern for each case I come across - if it matches, it then parses according to the pattern to get me an acceptable YYYY-MM-DD (not sure how I'll deal with the season case yet ... but I'm serious, that kind of thing is in there several times).

To at least get started though, is there a wildcard defined that says match a month? IE

    /^([0-9]{2})[\s-](MONTH_MATCH)[\s-]([0-9]{4,4})$/

where MONTH_MATCH is some special magic that matches Mar March Apr April etc. ?

If you must know - it's data from a biology vertebrate museum. Thousands of records may match a given query. Most of them look fairly easily parsable and it does look like when a day is specified, it is always first and year is always last. The data is needed by me, so I'm planning on having the script die if it comes across a date I don't have a regex to parse before it does anything so I can add appropriate regex as necessary, but damn - you'd think a vertebrate museum would have cleaned up their DB somewhat.

My first shot would be to see how far I get with strtotime(), or date_create(). The rest looks like a job for the Mechanical Turk (http://www.mturk.com/mturk).
For your specific query, you could do something like (Jan|January|Feb|February|...) alternation, but that won't catch typos and idiosyncrasies. You probably want to make it case-insensitive too. I suspect you will end up with a bunch of records where the data cannot be parsed sensibly - I would probably write the list of such records to an exception file. Once you have a system that generates a manageable number of exceptions you can deal with those by hand. As for your expectation of a museum: the reputation of dusty old rooms full of stuff is not entirely un-earned, so I wouldn't expect their databases to be spotless!

I got it figured out - the dates I couldn't parse I went ahead and entered into my database anyway but with a date of 1800-01-01 so I can go back and manually deal with them later (not that many)

http://homepage.mac.com/mpeters/misc/map41med.png
http://homepage.mac.com/mpeters/misc/map43med.png
http://homepage.mac.com/mpeters/misc/map49med.png

Working fabulously :) (yes - most of the records are 5 years old, younger records that are not in that museum will be added soon)

-- PHP General Mailing List (http://www.php.net/) To unsubscribe, visit: http://www.php.net/unsub.php
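A minimal sketch of the alternation idea discussed above, assuming only the simple "day month-name year" shape. The function name parse_verbatim_date() and the sample inputs are hypothetical, and ranges, seasons and "Before ..." values are deliberately left unparsed so they land in the exception list:

```php
<?php
// Hypothetical helper: returns YYYY-MM-DD, or null when the pattern does not apply.
function parse_verbatim_date(string $raw): ?string
{
    // Month numbers keyed by the first three letters of the month name.
    $months = ['jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4, 'may' => 5, 'jun' => 6,
               'jul' => 7, 'aug' => 8, 'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12];

    // One capturing group that accepts abbreviated or full month names.
    $month = '(Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|'
           . 'July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)';

    // Day, month name and year separated by a space or hyphen; /i ignores case.
    if (preg_match('/^([0-9]{1,2})[\s-]' . $month . '[\s-]([0-9]{4})$/i', trim($raw), $m)) {
        $mon = $months[strtolower(substr($m[2], 0, 3))];
        // checkdate() rejects impossible dates such as 31 Feb.
        if (checkdate($mon, (int) $m[1], (int) $m[3])) {
            return sprintf('%04d-%02d-%02d', $m[3], $mon, $m[1]);
        }
    }
    return null;
}

// Collect anything unparsed instead of dying, as suggested in the reply.
$exceptions = [];
foreach (['12 March 1945', '14 Mar 1967', 'Before 1904'] as $raw) {
    $iso = parse_verbatim_date($raw);
    if ($iso === null) {
        $exceptions[] = $raw; // candidates for the exception file
    } else {
        echo $raw, ' => ', $iso, "\n";
    }
}
```

Each additional format (month and year only, day ranges, numeric dates) would need its own pattern in the same style; anything left in $exceptions could be stored with a placeholder date such as the 1800-01-01 used above and fixed by hand.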
RE: [PHP] Re: preg_match question...
hmmm...

tried your preg_match/regex...

i get:

    0 - 1145 total
    1 - 1145
    2 - l

i would have thought that the 2nd array item should have had total...

-----Original Message-----
From: Frank Stanovcak [mailto:blindspot...@comcast.net]
Sent: Friday, February 06, 2009 6:15 AM
To: php-general@lists.php.net
Subject: [PHP] Re: preg_match question...

bruce bedoug...@earthlink.net wrote in message news:234801c98863$88f27260$0301a...@tmesa.com...

hi... trying to figure out the best approach to using preg_match to extract the number from the following type of line...

    131646 sometext follows..

basically, i want to extract the number, without the text, but i have to be able to match on the text. i've been playing with different preg_match regexs.. but i'm missing something obvious! thoughts/comments..

How about

    preg_match('#(\d+)(.)+#',$haystack,$match)

if I remember right
$match[0] would be all of it
$match[1] would be the numbers
$match[2] would be the text
Re: [PHP] Re: preg_match question...
bruce wrote: hmmm... tried your preg_match/regex... i get: 0 - 1145 total 1 - 1145 2 - l i would have thought that the 2nd array item should have had total...

Probably want this: '#(\d+)(.+)#'

-- Thanks! -Shawn http://www.spidean.com
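A short sketch of why the first pattern returned only "l": when the quantifier sits outside the parentheses, the group is re-captured on every repetition and keeps just the last character. The variable names here are illustrative only:

```php
<?php
$haystack = '1145 total';

// Quantifier outside the group: (.) is overwritten on each repetition,
// so only the final character of the line is left in the group.
preg_match('#(\d+)(.)+#', $haystack, $a);

// Quantifier inside the group: the group captures the whole run,
// including the leading space.
preg_match('#(\d+)(.+)#', $haystack, $b);

var_dump($a[2]); // string(1) "l"
var_dump($b[2]); // string(6) " total"
```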
Re: [PHP] Re: preg_match question...
2009/2/6 Shawn McKenzie nos...@mckenzies.net

bruce wrote: hmmm... tried your preg_match/regex... i get: 0 - 1145 total 1 - 1145 2 - l i would have thought that the 2nd array item should have had total...

Probably want this: '#(\d+)(.+)#'

That's it, sorry. Take a look at preg_match_all(); you can also put the word "total" in your regexp, like /^([0-9]+) total/

-- Thanks! -Shawn http://www.spidean.com

-- Alpar Torok
Re: [PHP] Re: preg_match question...
Shawn McKenzie nos...@mckenzies.net wrote in message news:e1.67.59347.e494c...@pb1.pair.com...

bruce wrote: hmmm... tried your preg_match/regex... i get: 0 - 1145 total 1 - 1145 2 - l i would have thought that the 2nd array item should have had total...

Probably want this: '#(\d+)(.+)#'

-- Thanks! -Shawn http://www.spidean.com

yep. Realized it after I saw his post. the space doesn't hurt either

    '#(\d+) (.+)#'

Frank...doh!
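For what it's worth, a small sketch of the two refinements mentioned in this thread: a literal space between the groups, and preg_match_all() with the word "total" in the pattern. The multi-line sample text and the /m modifier are assumptions added for illustration:

```php
<?php
// Hypothetical input: several lines, only some of which end in "total".
$text = "1145 total\n131646 sometext follows..\n27 total";

// The literal space between the groups keeps it out of the captured text.
// The dot does not cross the newline, so only the first line is matched.
preg_match('#(\d+) (.+)#', $text, $first);

// Anchored pattern with the literal word; /m lets ^ match at every line start.
preg_match_all('/^([0-9]+) total/m', $text, $all);

print_r($first[2]); // total
print_r($all[1]);   // the numbers from the lines that end in "total"
```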
Re: [PHP] Re: preg_match() returns false but no documentation why
On 5/30/07, Jim Lucas [EMAIL PROTECTED] wrote:
The op will need to use something other than forward slashes.

At 5/30/2007 03:26 PM, Jared Farrish wrote:
You mean the delimiters (a la Richard's suggestion about using '|')?

Hi Jared,

If the pattern delimiter character appears in the pattern it must be escaped so that the regexp processor will correctly interpret it as a pattern character and not as the end of the pattern. This would produce a regexp error:

    /ldap://*/

but this is OK:

    /ldap:\/\/*/

Therefore if you choose another delimiter altogether you don't have to escape the slashes:

    #ldap://*#

Cleaner and more clear.

preg_match('|^ldap(s)?://[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$|', $this->server )
I also recommend using single quotes instead of double quotes here.

Single Quotes: Noted. Any reason why? I guess you might be a little out of luck putting $vars into a regex without . concatenating.

Both PHP and regexp use the backslash as an escape. Inside double quotes, PHP interprets \ as an escape character, while inside single quotes PHP treats \ as a literal backslash (except before another \ or a single quote). When working with regexp in PHP you're dealing with two interpreters, first PHP and then regexp. To support PHP's interpretation with double quotes, you have to escape the escapes:

    Single quotes: '/ldap:\/\/*/'
    Double quotes: "/ldap:\\/\\/*/"

PHP interprets \\/ as \/
RegExp interprets \/ as /

There's also the additional minor argument that single-quoted strings take less processing because PHP isn't scanning them for escaped characters and variables to expand. On a practical level, though, the difference is going to be measured in microseconds and is unlikely to affect the perceptible speed of a typical PHP application.
So, for a pattern like this that contains slashes, it's best to use a non-slash delimiter AND single quotes (unless, as you say, you need to include PHP variables in the pattern):

    $pattern = '#ldap://*#';

Personally I favor heredoc syntax for such situations because I don't have to worry about the quotes:

    $regexp = <<<_
    #ldap://*$var#
    _;

why is there a period in the second pattern?

The period comes from the original article on SitePoint (linked earlier). Is it unnecessary? I can't say I'm real sure what this means for the '.' in regex's: "Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too." - http://www.regular-expressions.info/reference.html

Inside of a bracketed character class, the dot means a literal period character and not a wildcard.

"All non-alphanumeric characters other than \, -, ^ (at the start) and the terminating ] are non-special in character classes"
PHP PREG Pattern Syntax http://www.php.net/manual/en/reference.pcre.pattern.syntax.php (scroll down to 'Square brackets')

Also, why are you allowing for uppercase letters when the RFC's don't allow them?

I hadn't gotten far enough to strtolower(), but that's a good point, I hadn't actually considered it yet.

Perhaps it has to do with the source of the string: can you guarantee that the URIs passed to this routine conform to spec? Another way to handle this would be to simply accept case-insensitive strings:

    |^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|i

Pattern Modifiers http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php
"i (PCRE_CASELESS) If this modifier is set, letters in the pattern match both upper and lower case letters."

Regards,
Paul
__
Paul Novitski Juniper Webcraft Ltd. http://juniperwebcraft.com
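A rough sketch of the delimiter and quoting points from this message, with made-up variable names and test URIs. One caveat, as far as I know: PHP leaves escapes it does not recognise (such as \/) untouched inside double quotes, so doubling the backslash there is the safe habit rather than a strict requirement.

```php
<?php
// Three spellings of one pattern; the variable names are illustrative only.
$slashDelimited = '/^ldaps?:\/\//';     // slash delimiter: inner slashes escaped
$hashDelimited  = '#^ldaps?://#';       // other delimiter: nothing to escape
$doubleQuoted   = "/^ldaps?:\\/\\//";   // double quotes: PHP collapses \\ to \

// Case-insensitive form of the server pattern from the thread.
$serverPattern = '|^ldaps?://[a-z0-9-]+\.[a-z.]{2,5}$|i';

// Hypothetical inputs, not taken from the original post.
foreach (['LDAP://Example.COM', 'ldaps://directory.co.uk', 'http://example.com'] as $uri) {
    printf("%-26s %s\n", $uri, preg_match($serverPattern, $uri) ? 'match' : 'no match');
}
```

With the i modifier in place there is no need to list A-Z in the character classes or to call strtolower() before matching.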
Re: [PHP] Re: Preg_match - Find URL and convert to lower case
In that case you could use the /e trailing option to use strtolower on the subpattern.
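Note that the /e modifier belongs to preg_replace() and was deprecated in PHP 5.5 and removed in PHP 7.0. A sketch of the same idea with preg_replace_callback() follows; the sample text and the deliberately loose URL pattern are assumptions, not taken from the thread:

```php
<?php
// Hypothetical sample of an all-uppercase extract.
$text = 'SEE HTTP://WWW.EXAMPLE.EDU/SCHEDULE FOR DETAILS';

// Lowercase anything that looks like a URL and leave the rest untouched.
// The pattern is a rough guess at "URL": scheme, ://, then non-space characters.
$fixed = preg_replace_callback(
    '#\bhttps?://\S+#i',
    function (array $m): string {
        return strtolower($m[0]); // $m[0] is the whole matched URL
    },
    $text
);

echo $fixed, "\n"; // SEE http://www.example.edu/schedule FOR DETAILS
```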
Re: [PHP] Re: Preg_match - Find URL and convert to lower case
On Nov 30, 2006, at 7:50 PM, Jonesy wrote:

On Thu, 30 Nov 2006 14:16:16 -0800, Kevin Murphy wrote: I have some text that comes out of a database all in uppercase (old IBM Mainframe that only supports uppercase characters).

I see via other followups that you have your kludge working. *But*, what do you mean by "old IBM Mainframe that only supports uppercase characters"? The EBCDIC codes X'81'-X'89' (a-i), X'91'-X'99' (j-r), and X'A2'-X'A9' (s-z) have been defined and used since probably before you were born. I have in front of me my first IBM Green Card (IBM System/360 Reference Data, GX20-1703-3) from 1966 which debunks that urban legend. If the data in the mainframe database is all upper case, it was sloppy programming or sloppy design that got it there. If it _is_ stored in the mainframe database in proper UC/lc form, then it is probably a sloppy extraction procedure that is to blame for your input.

Jonesy -- Marvin L Jones| jonz | W3DHJ | linux 38.24N 104.55W | @ config.com | Jonesy | OS/2 *** Killfiling google posts: http//jonz.net/ng.htm

Yeah, that would be the problem. My website and its MySQL database are totally separate from this data (it's class schedule data). All the data in the database is uppercase and I've been told that all the data must remain as uppercase only. Why? I have no idea. Can I change that? Nope. Welcome to my world and the joys of working with governmental institutions.

So I did misspeak before. The mainframe itself probably supports UC/lc, but whatever program is on it, or maybe it's just a procedure issue (it's _always_ been done this way so we _must_ continue doing it that way). But, the data I see in my extract is all upper case and that's what I am dealing with.

-- Kevin Murphy Webmaster: Information and Marketing Services Western Nevada Community College www.wncc.edu 775-445-3326
Re: [PHP] Re: Preg_match - Find URL and convert to lower case
On Fri, 1 Dec 2006, Kevin Murphy wrote:

On Nov 30, 2006, at 7:50 PM, Jonesy wrote:

On Thu, 30 Nov 2006 14:16:16 -0800, Kevin Murphy wrote: I have some text that comes out of a database all in uppercase (old IBM Mainframe that only supports uppercase characters).

I see via other followups that you have your kludge working. *But*, what do you mean by "old IBM Mainframe that only supports uppercase characters"? The EBCDIC codes X'81'-X'89' (a-i), X'91'-X'99' (j-r), and X'A2'-X'A9' (s-z) have been defined and used since probably before you were born. I have in front of me my first IBM Green Card (IBM System/360 Reference Data, GX20-1703-3) from 1966 which debunks that urban legend. If the data in the mainframe database is all upper case, it was sloppy programming or sloppy design that got it there. If it _is_ stored in the mainframe database in proper UC/lc form, then it is probably a sloppy extraction procedure that is to blame for your input.

Yeah, that would be the problem. My website and its MySQL database are totally separate from this data (it's class schedule data). All the data in the database is uppercase and I've been told that all the data must remain as uppercase only. Why? I have no idea. Can I change that? Nope. Welcome to my world and the joys of working with governmental institutions.

So I did misspeak before. The mainframe itself probably supports UC/lc, but whatever program is on it, or maybe it's just a procedure issue (it's _always_ been done this way so we _must_ continue doing it that way). But, the data I see in my extract is all upper case and that's what I am dealing with.

Ahhh... So, it was sloppy design -- Way Back Then. Except, in MainFrame years, the arrival of the WWW and URL's was only weeks ago. There is no excuse for that DB design to be all uppercase, since that would pollute the data (URL's) and 'they' had to know it even then. Gov. work + lowest bidder.

Good luck with the project. And, Happy Shopping Season.
Jonesy -- Marvin L Jones | jonz | W3DHJ | linux Pueblo, Colorado | @ | Jonesy | OS/2 __ 38.24N 104.55W | config.com | DM78rf |SK
RE: [PHP] Re: preg_match
Thanks, I was reading on the PHP website under the preg_match function and people were saying that you had to escape the $ so that it would be evaluated right. That's what got me confused.

--- Aaron Axelsen AIM: AAAK2 Email: [EMAIL PROTECTED]

-----Original Message-----
From: sven [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 20, 2003 4:31 AM
To: [EMAIL PROTECTED]
Subject: [PHP] Re: preg_match

try without backslashes.

    $pattern = "/$search/i";
    if (preg_match ($pattern, $date[$i])) {
        echo $date[$i] . "<br />";
    }

you don't need the .*? in your regex (either * or ? multiplier) as preg_match searches for any occurrence (not from begin ^ or to end $).

ciao SVEN

Aaron Axelsen [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]

I am trying to code a search, and I need to get my matching expression to work right. The user can enter a value which is stored in the variable $search. What the loop below needs to do is search each entry of the array for any occurrence of the $search string. If I hard-code the string it works, but not when it is passed as a variable. Is there something I am missing? Do I need to convert the $search variable to something?

    if (preg_match ("/.*?\\\$search.*?/i",$date[$i])) {
        print "$date[$i]<br>";
    }

--- Aaron Axelsen AIM: AAAK2 Email: [EMAIL PROTECTED]
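One thing the thread does not mention: if $search comes straight from the user, characters such as ( or . will be treated as regex syntax. A minimal sketch using preg_quote() follows; the helper name find_matches() and the sample entries are hypothetical:

```php
<?php
// Hypothetical helper; the name and signature are illustrative, not from the thread.
function find_matches(array $entries, string $search): array
{
    // preg_quote() escapes regex metacharacters in the user's text;
    // the second argument additionally escapes the chosen delimiter.
    $pattern = '/' . preg_quote($search, '/') . '/i';

    // preg_grep() returns the entries that match, keeping their original keys.
    return preg_grep($pattern, $entries);
}

// Made-up sample data standing in for the poster's $date array.
$date = ['20 June 2003 Meeting', '4 July 2003 Holiday (US)', '1 Aug 2003 meeting notes'];

foreach (find_matches($date, 'meeting') as $entry) {
    echo $entry, "<br />\n";
}
```

For a plain substring search with no pattern syntax at all, stripos() would probably do the same job without a regex.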