Re: [PHP] Re: preg_match and dates

2009-03-02 Thread Michael A. Peters

Peter Ford wrote:

Michael A. Peters wrote:

I have absolutely no control over the source file.

The source file is an xml file (er, sort of, it doesn't follow any
particular DTD) and has a tag called VERBATIM_DATE in each record -
looks to be required in their output as every record so far has it, but
w/o a DTD hard to know - time of day, on the other hand, is not required
and sometimes (usually) the tag is missing.

Here's the beauty - VERBATIM_DATE in the same xml file uses multiple
different formats. IE -

12 March 1945
14 Mar 1967
Apr 1999
12-03-2005
Before 1904
Winter or Spring 1977

etc.

It does seem that if there is a day, the day is always first - but
sometimes it has a space as a delimiter, - as delimiter, and sometimes
it has both - IE

10-15 Dec 1934
12 March-03 April 1956

What I'm trying to do is write a preg_match pattern for each case I come
across - if a date matches the pattern, it is then parsed according to that
pattern to get me an acceptable YYYY-MM-DD (not sure how I'll deal with the
season case yet ... but I'm serious, that kind of thing is in there several
times)

To at least get started though, is there a wildcard defined that says
match a month?

IE

/^([0-9]{2})[\s-](MONTH_MATCH)[\s-]([0-9]{4})$/

where MONTH_MATCH is some special magic that matches Mar March Apr April etc. ?

If you must know - it's data from a biology vertebrate museum. Thousands
of records may match a given query. Most of them look fairly easily
parsable and it does look like when a day is specified, it is always
first and year is always last.

The data is needed by me, so I'm planning on having the script die if it
comes across a date I don't have a regex to parse before it does
anything so I can add appropriate regex as necessary, but damn - you'd
think a vertebrate museum would have cleaned up their DB somewhat.



My first shot would be to see how far I get with strtotime(), or date_create().
The rest looks like a job for the Mechanical Turk (http://www.mturk.com/mturk).

For your specific query, you could do something like
(Jan|January|Feb|February|...) alternation, but that won't catch typos and
idiosyncrasies. You probably want to make it case-insensitive too.

I suspect you will end up with a bunch of records where the data cannot be
parsed sensibly - I would probably write the list of such records to an
exception file. Once you have a system that generates a manageable number of
exceptions you can deal with those by hand.
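
A minimal sketch of the approach described above: a case-insensitive month
alternation, plus a list of records that could not be parsed. The names
parseVerbatimDate(), MONTH_NUMBERS and MONTH_MATCH are hypothetical, and
storing a month-only date as the 1st of the month is an assumption, not
something decided in this thread.

```php
<?php
// Sketch only: all names below are illustrative, not from the original posts.
const MONTH_NUMBERS = [
    'jan' => 1, 'feb' => 2, 'mar' => 3, 'apr' => 4, 'may' => 5, 'jun' => 6,
    'jul' => 7, 'aug' => 8, 'sep' => 9, 'oct' => 10, 'nov' => 11, 'dec' => 12,
];

// Month alternation: three-letter abbreviation, optionally followed by the rest of the name.
const MONTH_MATCH = '(jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?'
    . '|aug(?:ust)?|sep(?:tember)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?)';

function parseVerbatimDate(string $raw): ?string
{
    $raw = trim($raw);

    // Day, month name, year: "12 March 1945", "14-Mar-1967"
    if (preg_match('/^(\d{1,2})[\s-]' . MONTH_MATCH . '[\s-](\d{4})$/i', $raw, $m)) {
        $month = MONTH_NUMBERS[strtolower(substr($m[2], 0, 3))];
        if (checkdate($month, (int) $m[1], (int) $m[3])) {
            return sprintf('%04d-%02d-%02d', $m[3], $month, $m[1]);
        }
        return null; // impossible calendar date, e.g. "31 Feb 1950"
    }

    // Month name and year only: "Apr 1999" (assumption: store as the 1st)
    if (preg_match('/^' . MONTH_MATCH . '[\s-](\d{4})$/i', $raw, $m)) {
        $month = MONTH_NUMBERS[strtolower(substr($m[1], 0, 3))];
        return sprintf('%04d-%02d-01', $m[2], $month);
    }

    return null; // no pattern matched: the caller records it as an exception
}

$exceptions = [];
foreach (['12 March 1945', '14 Mar 1967', 'Apr 1999', 'Before 1904'] as $raw) {
    $iso = parseVerbatimDate($raw);
    if ($iso === null) {
        $exceptions[] = $raw; // deal with these by hand later
        continue;
    }
    echo "$raw => $iso\n";
}
```

Ranges such as "10-15 Dec 1934" and all-numeric dates such as "12-03-2005"
fall through to the exception list here; each would need its own pattern.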

As for your expectation of a museum: the reputation of dusty old rooms full of
stuff is not entirely un-earned, so I wouldn't expect their databases to be
spotless!



I got it figured out - the dates I couldn't parse I went ahead and 
entered into my database anyway but with a date of 1800-01-01 so I can 
go back and manually deal with them later (not that many)


http://homepage.mac.com/mpeters/misc/map41med.png
http://homepage.mac.com/mpeters/misc/map43med.png
http://homepage.mac.com/mpeters/misc/map49med.png

Working fabulously :)

(yes - most the records are  5 years old, younger records that are not 
in that museum will be added soon)


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Re: preg_match question...

2009-02-06 Thread bruce
hmmm...

tried your preg_match/regex...

i get:
0 - 1145 total
1 - 1145
2 - l

i would have thought that the 2nd array item should have had total...



-----Original Message-----
From: Frank Stanovcak [mailto:blindspot...@comcast.net]
Sent: Friday, February 06, 2009 6:15 AM
To: php-general@lists.php.net
Subject: [PHP] Re: preg_match question...



bruce bedoug...@earthlink.net wrote in message 
news:234801c98863$88f27260$0301a...@tmesa.com...
 hi...

 trying to figure out the best approach to using preg_match to extract the
 number from the following type of line...

  131646 sometext follows..

 basically, i want to extract the number, without the text, but i have to 
 be
 able to match on the text

 i've been playing with different preg_match regexs.. but i'm missing
 something obvious!

 thoughts/comments..


How about
preg_match('#(\d+)(.)+#',$haystack,$match)

if I remember right
$match[0] would be all of it
$match[1] would be the numbers
$match[2] would be the text 






Re: [PHP] Re: preg_match question...

2009-02-06 Thread Shawn McKenzie
bruce wrote:
 hmmm...
 
 tried your preg_match/regex...
 
 i get:
 0 - 1145 total
 1 - 1145
 2 - l
 
 i would have thought that the 2nd array item should have had total...
 

Probably want this: '#(\d+)(.+)#'

-- 
Thanks!
-Shawn
http://www.spidean.com




Re: [PHP] Re: preg_match question...

2009-02-06 Thread Alpár Török
2009/2/6 Shawn McKenzie nos...@mckenzies.net

 bruce wrote:
  hmmm...
 
  tried your preg_match/regex...
 
  i get:
  0 - 1145 total
  1 - 1145
  2 - l
 
  i would have thought that the 2nd array item should have had total...
 

 Probably want this: '#(\d+)(.+)#'

That's it, sorry. Take a look at preg_match_all(); you can also put the word
"total" in your regexp, like /^([0-9]+) total/
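
A short sketch of that idea, assuming the input is several lines of text and
only the "total" lines are wanted. The sample text and variable names are
made up for illustration; the /m modifier makes ^ and $ match at every line.

```php
<?php
// Hypothetical input: only lines of the form "<number> total" should match.
$text = "1145 total\n27 skipped\n9 total\n";

// The literal " total" must be present for a match, but only the number is captured.
preg_match_all('/^([0-9]+) total$/m', $text, $matches);

print_r($matches[1]); // the captured numbers, as strings
```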



 --
 Thanks!
 -Shawn
 http://www.spidean.com





-- 
Alpar Torok


Re: [PHP] Re: preg_match question...

2009-02-06 Thread Frank Stanovcak

Shawn McKenzie nos...@mckenzies.net wrote in message 
news:e1.67.59347.e494c...@pb1.pair.com...
 bruce wrote:
 hmmm...

 tried your preg_match/regex...

 i get:
 0 - 1145 total
 1 - 1145
 2 - l

 i would have thought that the 2nd array item should have had total...


 Probably want this: '#(\d+)(.+)#'

 -- 
 Thanks!
 -Shawn
 http://www.spidean.com

yep.  Realized it after I saw his post.

the space doesn't hurt either

'#(\d+) (.+)#'
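
For the record, a small sketch comparing the three patterns from this thread
on the sample line "1145 total". With (.)+ the quantifier sits outside the
group, so the group is captured again on every repetition and only the last
character is kept, which is why "l" came back.

```php
<?php
$line = '1145 total';

// Quantifier outside the group: only the last repetition is kept.
preg_match('#(\d+)(.)+#', $line, $outside);

// Quantifier inside the group: the whole remainder is captured, leading space included.
preg_match('#(\d+)(.+)#', $line, $inside);

// Literal space between the groups: the text is captured without the space.
preg_match('#(\d+) (.+)#', $line, $spaced);

var_dump($outside[2], $inside[2], $spaced[2]);
```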

Frank...doh! 






Re: [PHP] Re: preg_match() returns false but no documentation why

2007-05-30 Thread Paul Novitski



On 5/30/07, Jim Lucas [EMAIL PROTECTED] wrote:


The op will need to use something other than forward slashes.


At 5/30/2007 03:26 PM, Jared Farrish wrote:

You mean the delimiters (a la Richard's suggestion about using '|')?



Hi Jared,

If the pattern delimiter character appears in the pattern it must be 
escaped so that the regexp processor will correctly interpret it as a 
pattern character and not as the end of the pattern.


This would produce a regexp error:

/ldap://*/

but this is OK:

/ldap:\/\/*/

Therefore if you choose another delimiter altogether you don't have 
to escape the slashes:


#ldap://*#

Cleaner and more clear.
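
A quick sketch of the three variants, using a made-up URI. With the
unescaped slashes, the pattern ends at the second slash and the rest is read
as modifiers, so preg_match() raises a warning and returns false (the @ only
hides the warning in this example).

```php
<?php
$uri = 'ldap://example.com'; // hypothetical test value

// Unescaped delimiter inside the pattern: compile error, result is false.
$broken = @preg_match('/ldap://*/', $uri);

// Same delimiter, slashes escaped.
$escaped = preg_match('/ldap:\/\/*/', $uri);

// Different delimiter, nothing to escape.
$hash = preg_match('#ldap://*#', $uri);

var_dump($broken, $escaped, $hash);
```

Checking the return value with === false is how a bad pattern can be told
apart from a plain "no match", which returns 0.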



preg_match('|^ldap(s)?://[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$|', $this->server )


I also recommend using single quotes instead of double quotes here.


Single Quotes: Noted. Any reason why? I guess you might be a little out of
luck putting $vars into a regex without . concatenating.


Both PHP and regexp use the backslash as an escape.  Inside double 
quotes, PHP interprets \ as escape, while inside single quotes PHP 
interprets \ as a simple backslash character.


When working with regexp in PHP you're dealing with two interpreters, 
first PHP and then regexp.  To support PHP's interpretation with 
double quotes, you have to escape the escapes:


Single quotes:  '/ldap:\/\/*/'
Double quotes:  "/ldap:\\/\\/*/"

PHP interprets \\/ as \/
RegExp interprets \/ as /
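
To see what the regexp engine actually receives, one can compare the string
values PHP builds from each literal. This is only a sketch; $tld is a
hypothetical variable added to show interpolation. As far as I know, PHP
leaves an unrecognised escape such as \/ untouched inside double quotes, so
the doubled backslash is the careful form rather than a strict requirement
in this particular pattern.

```php
<?php
$single = '/ldap:\/\/*/';
$double = "/ldap:\\/\\/*/";

// Both literals yield the same pattern text: /ldap:\/\/*/
var_dump($single === $double);

// Double quotes also expand variables, which single quotes never do.
$tld = 'com';
$interpolated = "/\\.$tld$/";   // pattern text: /\.com$/
$literal      = '/\\.$tld$/';   // pattern text: /\.$tld$/

var_dump($interpolated, $literal);
```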

There's also the additional minor argument that single-quoted strings 
take less processing because PHP isn't scanning them for escaped 
characters and variables to expand.  On a practical level, though, 
the difference is going to be measured in microseconds and is 
unlikely to affect the perceptible speed of a typical PHP application.


So, for a pattern like this that contains slashes, it's best to use a 
non-slash delimiter AND single quotes (unless, as you say, you need 
to include PHP variables in the pattern):


$pattern = '#ldap://*#';

Personally I favor heredoc syntax for such situations because I don't 
have to worry about the quotes:


$regexp = <<<_
#ldap://*$var#
_;



why is there a period in the second pattern?


The period comes from the original article on SitePoint (linked earlier). Is
it unnecessary? I can't say I'm real sure what this means for the '.' in
regex's:

Matches any single character except line break characters \r and \n. Most
regex flavors have an option to make the dot match line break characters
too.
- http://www.regular-expressions.info/reference.html


Inside of a bracketed character class, the dot means a literal period 
character and not a wildcard.


All non-alphanumeric characters other than \, -, ^ (at the start) 
and the terminating ] are non-special in character classes


PHP PREG
Pattern Syntax
http://www.php.net/manual/en/reference.pcre.pattern.syntax.php
scroll down to 'Square brackets'



Also, why are you allowing for uppercase letters
when the RFC's don't allow them?


I hadn't gotten far enough to strtolower(), but that's a good point, I
hadn't actually considered it yet.


Perhaps it has to do with the source of the string: can you guarantee 
that the URIs passed to this routine conform to spec?


Another way to handle this would be to simply accept case-insensitive strings:

|^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|i

Pattern Modifiers
http://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

i (PCRE_CASELESS)
If this modifier is set, letters in the pattern match both upper 
and lower case letters.
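
A small sketch of the difference the i modifier makes, with made-up server
names. Note that, as written, the pattern only allows one label before the
dot and two to five characters after it, so a longer host name is rejected
with or without the modifier.

```php
<?php
$strict   = '|^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|';
$caseless = '|^ldap(s)?://[a-z0-9-]+\.[a-z.]{2,5}$|i';

$mixed = 'LDAPS://Example.COM';           // hypothetical input
$long  = 'ldap://directory.example.com';  // hypothetical input

var_dump(preg_match($strict, $mixed));    // lower-case-only pattern
var_dump(preg_match($caseless, $mixed));  // same pattern with /i
var_dump(preg_match($caseless, $long));   // too many characters after the first dot
```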


Regards,

Paul
__

Paul Novitski
Juniper Webcraft Ltd.
http://juniperwebcraft.com 





Re: [PHP] Re: Preg_match - Find URL and convert to lower case

2006-12-01 Thread Dave Goodchild

In that case you could use the /e trailing option to use strtolower on the
subpattern.
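
A note for anyone reading this later: the /e modifier was deprecated in
PHP 5.5 and removed in PHP 7, so on current versions the same idea is
usually written with preg_replace_callback(). A minimal sketch, assuming a
deliberately loose URL pattern; lowercaseUrls() is a hypothetical helper
name and the pattern is not from this thread.

```php
<?php
// Sketch only: lower-cases anything that looks like a URL in upper-case text.
function lowercaseUrls(string $text): string
{
    return preg_replace_callback(
        '#\b(?:https?://|www\.)[^\s<>"]+#i',   // assumed, loose URL pattern
        static function (array $m): string {
            return strtolower($m[0]);          // $m[0] is the whole matched URL
        },
        $text
    );
}

echo lowercaseUrls('SEE HTTP://WWW.WNCC.EDU/SCHEDULE FOR DETAILS'), "\n";
```

This lower-cases the path as well as the host name, which may break links on
servers with case-sensitive paths.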


Re: [PHP] Re: Preg_match - Find URL and convert to lower case

2006-12-01 Thread Kevin Murphy



On Nov 30, 2006, at 7:50 PM, Jonesy wrote:


On Thu, 30 Nov 2006 14:16:16 -0800, Kevin Murphy wrote:


I have some text that comes out of a database all in uppercase (old
IBM Mainframe that only supports uppercase characters).


I see via other followups that you have your kludge working.  *But* ,

What do you mean by "old IBM Mainframe that only supports uppercase
characters"?  The EBCDIC codes X'81' - X'89' (a-i), X'91' - X'99' (j-r),
and X'A2' - X'A9' (s-z) have been defined and used since probably before
you were born.  I have in front of me my first IBM Green Card (IBM
System/360 Reference Data, GX20-1703-3) from 1966 which debunks that
urban legend.

If the data in the mainframe database is all upper case, it was sloppy
programming or sloppy design that got it there.  If it _is_ stored in
the mainframe database in proper UC/lc form, then it is probably a
sloppy extraction procedure that is to blame for your input.

Jonesy
--
  Marvin L Jones| jonz  | W3DHJ  | linux
   38.24N  104.55W  |  @ config.com | Jonesy |  OS/2
*** Killfiling google posts: http//jonz.net/ng.htm


Yeah, that would be the problem. My website and its MySQL database
are totally separate from this data (it's class schedule data). All
the data in the database is uppercase and I've been told that all the
data must remain as uppercase only. Why? I have no idea. Can I change
that? Nope. Welcome to my world and the joys of working with
governmental institutions.


So I did misspeak before. The mainframe itself probably supports UC/lc,
but whatever program is on it doesn't, or maybe it's just a procedure
issue (it's _always_ been done this way so we _must_ continue doing it
that way). But the data I see in my extract is all upper case
and that's what I am dealing with.


--
Kevin Murphy
Webmaster: Information and Marketing Services
Western Nevada Community College
www.wncc.edu
775-445-3326



Re: [PHP] Re: Preg_match - Find URL and convert to lower case

2006-12-01 Thread Jonesy
On Fri, 1 Dec 2006, Kevin Murphy wrote:
 On Nov 30, 2006, at 7:50 PM, Jonesy wrote:
 On Thu, 30 Nov 2006 14:16:16 -0800, Kevin Murphy wrote:

 I have some text that comes out of a database all in uppercase (old
 IBM Mainframe that only supports uppercase characters).

 I see via other followups that you have your kludge working.  *But* ,

 What do you mean by "old IBM Mainframe that only supports uppercase
 characters"?  The EBCDIC codes X'81' - X'89' (a-i), X'91' - X'99' (j-r),
 and X'A2' - X'A9' (s-z) have been defined and used since probably before
 you were born.  I have in front of me my first IBM Green Card (IBM
 System/360 Reference Data, GX20-1703-3) from 1966 which debunks that
 urban legend.

 If the data in the mainframe database is all upper case, it was sloppy
 programming or sloppy design that got it there.  If it _is_ stored in
 the mainframe database in proper UC/lc form, then it is probably a
 sloppy extraction procedure that is to blame for your input.

 Yeah, that would be the problem. My website and its MySQL database are totally
 separate from this data (it's class schedule data). All the data in the
 database is uppercase and I've been told that all the data must remain as
 uppercase only. Why? I have no idea. Can I change that? Nope. Welcome to my
 world and the joys of working with governmental institutions.

 So I did misspeak before. The mainframe itself probably supports UC/lc, but
 whatever program is on it doesn't, or maybe it's just a procedure issue (it's
 _always_ been done this way so we _must_ continue doing it that way). But the
 data I see in my extract is all upper case and that's what I am dealing with.

Ahhh...  So, it was sloppy design -- Way Back Then.  Except, in
MainFrame years, the arrival of the WWW and URL's was only weeks ago.
There is no excuse for that DB design to be all uppercase, since that
would pollute the data (URL's) and 'they' had to know it even then.

Gov. work + lowest bidder .

Good luck with the project.
And, Happy Shopping Season.
Jonesy
-- 
  Marvin L Jones | jonz | W3DHJ  |  linux
   Pueblo, Colorado  |  @   | Jonesy |   OS/2 __
38.24N  104.55W  |   config.com | DM78rf |SK




RE: [PHP] Re: preg_match

2003-06-20 Thread Aaron Axelsen
 

Thanks,

I was reading on the PHP website under the preg_match function, and
people were saying that you had to escape the $ so that it would be
evaluated right.  That's what got me confused.

---
Aaron Axelsen
AIM: AAAK2
Email: [EMAIL PROTECTED]

Want reliable web hosting at affordable prices?
www.modevia.com
 
Web Dev/Design Community/Zine
www.developercube.com



-----Original Message-----
From: sven [mailto:[EMAIL PROTECTED] 
Sent: Friday, June 20, 2003 4:31 AM
To: [EMAIL PROTECTED]
Subject: [PHP] Re: preg_match


try without backslashes.

$pattern = "/$search/i";
if (preg_match ($pattern, $date[$i]))
{
echo "$date[$i]<br />";
}

you don't need the .*? in your regex (either * or ? multiplier) as
preg_match searches for any occurrence (not from begin ^ or to end $).
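
One caveat worth adding: if $search comes from the user, any regex
metacharacter in it (or the delimiter itself) becomes part of the pattern.
A minimal sketch using preg_quote() to treat the input as literal text;
searchEntries() and the sample data are hypothetical, not from the original
posts.

```php
<?php
// Sketch only: returns the entries that contain $search, ignoring case.
function searchEntries(array $entries, string $search): array
{
    // Escape metacharacters and the "/" delimiter so the input matches literally.
    $pattern = '/' . preg_quote($search, '/') . '/i';

    $hits = array_filter($entries, static function (string $entry) use ($pattern): bool {
        return preg_match($pattern, $entry) === 1;
    });

    return array_values($hits); // reindex from 0
}

$date = ['2003-06-20 $5 fee', '2003/06/21 free', 'no date'];
print_r(searchEntries($date, '$5'));
print_r(searchEntries($date, '06/2'));
```

For a plain substring search like this one, stripos() would also do the job
without any regex at all.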

ciao SVEN
  Aaron Axelsen [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]

  I am trying to code a search, and I need to get my matching
  expression to work right.  The user can enter a value which is
  stored in the variable $search.

  What the below loop needs to do is search each entry of the array
  for any occurrence of the $search string.  If I hard-code the
  string it works, but not when passed as a variable.  Is there
  something I am missing? Do I need to convert the $search variable
  to something?

  if (preg_match ("/.*?\\\$search.*?/i",$date[$i])) {
  print "$date[$i]<br>";
  }

  ---
  Aaron Axelsen
  AIM: AAAK2
  Email: [EMAIL PROTECTED]

  Want reliable web hosting at affordable prices?
  www.modevia.com

  Web Dev/Design Community/Zine
  www.developercube.com





