ID:               46947
 User updated by:  victor at casnt dot ro
 Reported By:      victor at casnt dot ro
 Status:           Bogus
 Bug Type:         PCRE related
 Operating System: Debian
 PHP Version:      5.2.8
 New Comment:

I have read the docs and I see that preg_match does not work with big
strings.
In this case I would like to change this ticket from a bug report to a
feature request.

It would also help PHP keep up with the competition (Perl works with big
strings).

I need to parse a big XML file.
Using the PHP XML functions would be a complication in my case.
Pattern matching can solve my problem in just 3 lines.

When I say "big string" I mean 100KB (my test failed on a 104KB file).
Pattern matching will clearly be inefficient on huge strings compared to
a dedicated parser.

Thank you.
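
A note on the size limit discussed above: a failure at roughly 100KB is
consistent with the pcre.backtrack_limit setting (default 100000 in PHP
5.2), although nothing in this report confirms that this is the cause. A
greedy .{1,} first runs to the end of the string and then steps back one
character at a time, so the work grows with the length of the input. The
sketch below is only an illustration: match_header() is a hypothetical
helper, not a PHP function, and the lazy pattern is an assumed alternative
to the one in the report. It shows how to tell "no match" apart from an
engine error using preg_last_error().

```php
<?php
// Hypothetical helper (not part of PHP): extracts the XML declaration up to
// the first <report ...> tag and explains the outcome when nothing is returned.
function match_header($contents)
{
    // Lazy .+? stops at the first "<report" instead of scanning to the end
    // of the string and backtracking from there. The 's' modifier lets the
    // dot cross newlines, so the newlines do not need to be removed first.
    $pattern = '/(<\?xml.+?<report[^>]+>)/s';

    $result = preg_match($pattern, $contents, $match);
    if ($result === 1) {
        return $match[1];
    }
    if ($result === false) {
        // preg_match() returns false on an engine error, not on "no match".
        if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
            return 'error: backtrack limit reached';
        }
        return 'error: PCRE error code ' . preg_last_error();
    }
    return 'no match';
}
```

If the greedy pattern has to stay, raising the limit with
ini_set('pcre.backtrack_limit', ...) is another option; whether that is
enough for a given file is something to test rather than assume.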


Previous Comments:
------------------------------------------------------------------------

[2008-12-26 20:02:12] [email protected]

Thank you for taking the time to write to us, but this is not
a bug. Please double-check the documentation available at
http://www.php.net/manual/ and the instructions on how to report
a bug at http://bugs.php.net/how-to-report.php


------------------------------------------------------------------------

[2008-12-26 19:54:10] [email protected]

Hello, the dot character only matches a newline in a string when using
the 's' modifier.

Try:
/(<\?xml.+<report[^>]+>)/s

PS: You don't need to escape < and > in this case (they are not the
delimiter, which is / here).
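
To illustrate the point about the 's' modifier, here is a minimal sketch.
The sample string and variable names are made up for the example and are
not taken from test3.xml.

```php
<?php
// Made-up two-line sample: the XML declaration and <report> are on
// different lines, separated by "\n".
$xml = "<?xml version=\"1.0\"?>\n<report id=\"1\">\n<item/>\n</report>";

// Without 's' the dot stops at the newline, so ".+" can never reach "<report".
$withoutS = preg_match('/(<\?xml.+<report[^>]+>)/', $xml, $m1);

// With 's' the dot also matches "\n", so the pattern can span both lines.
$withS = preg_match('/(<\?xml.+<report[^>]+>)/s', $xml, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
```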

------------------------------------------------------------------------

[2008-12-26 19:16:53] victor at casnt dot ro

Description:
------------
The exact same pattern fails to match the same string once an extra line
is added at the end.
The length of the string seems to be the problem.

Reproduce code:
---------------
The script:
$handle = fopen('test3.xml', "r");
$contents = fread($handle, filesize('test3.xml'));
fclose($handle);
$contents = preg_replace('/\>[\t|\ |\s]{1,}\</', "><", $contents);
$contents = preg_replace('/\n/', "", $contents);
if(preg_match('/(\<\?xml.{1,}\<report[^>]{1,}\>)/', $contents, $match))
{
    print "Match: $match[1]\n";
} else {print "Fail\n";}

OBSERVATION: If you delete the last line from the file test3.xml, the
pattern matching will work fine. The last lines have nothing to do with
the pattern.

You can find the file here: 
http://www.casnt.ro/Files/XML/test3.xml

Expected result:
----------------
Match: <?xml version="1.0" encoding="utf-8"?><report
AppKey="CNAS-v1.0.2801.665" AppID="14" clinic="SSSSSSSSSSSSSSSSSSSSSS"
fiscalCode="11111111" contractNo="123123123/2008"
insuranceHouse="CAS-NT" reportingDate="2008-12-07"
startFrom="2008-11-01" endTo="2008-11-30" invoiceNo="" labValue="0"
hspValue="0" xmlns="http://localhost">


Actual result:
--------------
Fail


------------------------------------------------------------------------

