[PHP] Re: PCRE regex result is different between Linux and Windows.

2008-11-12 ClapClap
ClapClap 2000ans at free.fr writes:

> My versions of PCRE:
> - Linux 7.4 2007-09-21 (PHP 5.2.4-2ubuntu5.3)
> - Windows XP 7.2 2007-06-19 (PHP 5.2.4)


And:
- Windows 2000 7.6 2008-01-28 (PHP 5.2.6)

It works fine under Windows with PCRE 7.2 and 7.6.
I do not know why.


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



[PHP] Re: PCRE regex result is different between Linux and Windows.

2008-11-12 ClapClap

Again and again...
I've found the mistake.

It is the conversion of the string encoding to UTF-8 that causes the regex differences.
On Linux, PHP's iconv uses glibc 2.7, while on Windows it uses libiconv 1.11.
According to the PHP manual (http://docs.php.net/manual/en/intro.iconv.php),
we should use GNU libiconv for encoding conversion; it behaves better than glibc's implementation.
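To check which iconv backend each PHP build uses, the iconv extension exposes the `ICONV_IMPL` and `ICONV_VERSION` constants (the same values phpinfo() lists in its iconv section). The sketch below is a hypothetical diagnostic, not the original poster's code: it prints those values and compares the output of iconv() with mb_convert_encoding() for the same bytes. The Windows-1252 source charset and the sample string are assumptions chosen for illustration (Word exports are often in that charset, but the thread does not say so).

```php
<?php
// Hypothetical diagnostic: report the iconv backend and check whether
// iconv() and mbstring agree when converting the same bytes to UTF-8.

// Defined by the iconv extension, e.g. "glibc" on typical Linux builds.
$impl    = defined('ICONV_IMPL') ? ICONV_IMPL : 'unknown';
$version = defined('ICONV_VERSION') ? ICONV_VERSION : 'unknown';
echo "iconv implementation: $impl $version\n";

// Assumption: the source HTML is Windows-1252.
// 0x93 / 0x94 are curly double quotes and 0xE9 is e-acute in that charset.
$input = "\x93caf\xE9\x94";

$viaIconv = iconv('Windows-1252', 'UTF-8', $input);
$viaMb    = mb_convert_encoding($input, 'UTF-8', 'Windows-1252');

if ($viaIconv === $viaMb) {
    echo "iconv and mbstring agree\n";
} else {
    // Print both results as hex so the differing bytes are easy to spot.
    echo "MISMATCH\n";
    echo 'iconv:    ', bin2hex((string) $viaIconv), "\n";
    echo 'mbstring: ', bin2hex($viaMb), "\n";
}
```

Running the same script on both machines shows whether the subject string handed to PCRE already differs before any regex is applied.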

Damn!





[PHP] Re: PCRE regex result is different between Linux and Windows.

2008-11-08 Lupus Michaelis

ClapClap wrote:


> For the PCRE version, I really cannot tell you which one I use...
> Where can I see that?


  In the output of the phpinfo() function.
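Instead of scanning the full phpinfo() page, the version can also be read directly: PHP 5.2.4 and later define the `PCRE_VERSION` constant, which holds the same "version date" string phpinfo() shows. A minimal sketch to run on each machine being compared:

```php
<?php
// Print the versions that matter when comparing regex behaviour across hosts.
// PCRE_VERSION is defined by PHP 5.2.4 and later.
echo 'PHP:  ', PHP_VERSION, "\n";
echo 'OS:   ', PHP_OS, "\n";
echo 'PCRE: ', defined('PCRE_VERSION') ? PCRE_VERSION : 'unknown (PHP < 5.2.4)', "\n";
```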


> So, it may be a bug? Too bad...


  I remember a recent change of behavior in PCRE, but I am not sure.
I just wanted to put this idea out there first.


--
Mickaël Wolff aka Lupus Michaelis
http://lupusmic.org




[PHP] Re: PCRE regex result is different between Linux and Windows.

2008-11-08 ClapClap

Lupus Michaelis wrote:

> ClapClap wrote:


>> For the PCRE version, I really cannot tell you which one I use...
>> Where can I see that?


> In the output of the phpinfo() function.



Thanks.
My versions of PCRE:
- Linux 7.4 2007-09-21 (PHP 5.2.4-2ubuntu5.3)
- Windows XP 7.2 2007-06-19 (PHP 5.2.4)


>> So, it may be a bug? Too bad...


> I remember a recent change of behavior in PCRE, but I am not sure.
> I just wanted to put this idea out there first.




Yeah!
It's probably that. I hope...
Too bad.

--
Julien





[PHP] Re: PCRE regex result is different between Linux and Windows.

2008-11-07 ClapClap

Jochem Maas wrote:

>> [A warning up front: sorry for any language mistakes...]
>
> PHP or English? :-)


Ohhh... sh..! I think I speak PHP better than English (silly, isn't it?).


> Okay, are you using the same PHP version on both machines?
> Is anything different in the php.ini files?


The exact same version is not possible (Windows/Linux).
The php.ini files are almost identical (only some directories differ).
Windows: PHP 5.1.6 on 2000 SP4 and 5.2.4 on XP SP2 (the official builds).
Linux (Ubuntu 8.04): PHP 5.2.4-2ubuntu5.3.

> Are you possibly looking at an input/file character-set encoding related
> issue (i.e. the encoding is different between the two servers)?


All the PHP source is written in UTF-8.
I take the HTML code and convert it to UTF-8 using iconv() / mbstring...
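Once the subject is UTF-8, PCRE only treats it as UTF-8 characters when the pattern carries the `/u` modifier; without it, multi-byte characters are matched byte by byte. A hedged sketch of a more defensive pipeline: convert, verify the result with mb_check_encoding(), then match with `/u`. The function name `countParagraphs()` and the `<p>` pattern are hypothetical placeholders, since the actual pattern is only in the pastebin test. Scalar type hints are left out so the sketch also runs on PHP 5.2.

```php
<?php
// Hypothetical helper: convert HTML to UTF-8, reject invalid output,
// then count matches with a UTF-8-aware pattern.
function countParagraphs($html, $fromCharset)
{
    $utf8 = mb_convert_encoding($html, 'UTF-8', $fromCharset);

    // A /u pattern fails on invalid UTF-8 (preg_match_all returns false),
    // so check the conversion result explicitly first.
    if (!mb_check_encoding($utf8, 'UTF-8')) {
        throw new RuntimeException('Conversion did not produce valid UTF-8');
    }

    // Placeholder pattern: i = case-insensitive, s = dot matches newlines,
    // u = treat pattern and subject as UTF-8.
    $count = preg_match_all('/<p\b[^>]*>.*?<\/p>/isu', $utf8, $matches);
    if ($count === false) {
        throw new RuntimeException('PCRE error code: ' . preg_last_error());
    }
    return $count;
}

// Windows-1252 input with an e-acute byte (0xE9); prints 2.
echo countParagraphs("<p class=\"MsoNormal\">caf\xE9</p><p>x</p>", 'Windows-1252'), "\n";
```

Failing loudly on invalid UTF-8 makes an encoding problem visible instead of silently producing a different match count on each server.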

> Can you post a short, complete script so others can try to reproduce the
> error?


See the following link for the failing test (number of matches: Windows = 90,
Linux = 54): http://pastebin.com/m1c43cc10


The same results are obtained when:
- comments are removed
- the 'm' or 's' PCRE modifiers are used
- recursion is removed (replaced by multiple passes in a while loop; matches
  per pass: 55, 26, 5, 2)


This snippet is part of code whose goal is to convert HTML exported from
Word 2003 into valid XHTML. But that is not the point here...


For the PCRE version, I really cannot tell you which one I use...
Where can I see that?

So, it may be a bug? Too bad...


> Have you tried using the Tidy extension to clean up the input string?
> It has all sorts of wonderful settings for making (X)HTML nice and shiny.


Of course I have already tried it. ;-)
Tidy is too aggressive for parsing HTML from MS Office...

Hope it will work :-/

--
Julien

