ID:               33334
 User updated by:  kloske at tpg dot com dot au
 Reported By:      kloske at tpg dot com dot au
-Status:           Bogus
+Status:           Open
 Bug Type:         Regexps related
 Operating System: Linux
 PHP Version:      4.3.10
 New Comment:

I do not believe this bug to be bogus or resolved.


Previous Comments:
------------------------------------------------------------------------

[2005-06-14 12:20:09] kloske at tpg dot com dot au

Hi, strangely enough, you are correct that placing a question mark (for
exactly 0 or 1 matches) works.

*However*, this raises more questions than it answers (and, to my mind,
may point to deeper bugs). The regex manuals all say the following:

1. The behavior of multiple adjacent duplication symbols (+, *, ? and
intervals) produces undefined results.

2. * matches zero or more occurrences, so, setting (1) aside, the most
obvious reading of *? is "zero or more, repeated once or not at all".
That logically collapses to "zero or more", which is exactly what *
means on its own. It is therefore (a) what I had, and (b) logically
equivalent to the suggested solution.

3. A lone '\' or an unescaped '"' can NEVER be matched by
([^\"]|\\|\") (greedy or not), yet my test case clearly shows the PHP
regular expression engine matching them.

(1) suggests that relying on *? to achieve what I am after depends on
undefined behavior and is therefore not correct.

(2) indicates that, even if (1) did not apply, the two expressions
should be equivalent and produce the same behavior (which they clearly
do not)

and

(3) cannot be explained by ANY alternative solution, since it clearly
violates every possible interpretation of the regex.

Put simply: no sequence of characters generated from the regular
expression ([^\"]|\\|\") can contain a single backslash, or a quote
that is not preceded by a backslash, yet the match returned by PHP's
regular expression engine violates this constraint.
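To make point (3) concrete, here is a minimal sketch in Python's re module. Python's re is a Perl-style backtracking engine, not PHP's POSIX regex engine, so this illustrates the reasoning rather than reproducing PHP. It rebuilds the report's field,field pattern and tries two readings of the alternation: reading A, where the escapes stand for an escaped pair (\\ or \" in the text), as the report assumes; and reading B, where they match a single backslash and a single quote. The names two_fields, READING_A and READING_B are hypothetical, chosen only for this illustration.

```python
import re

# Input line from the report: two comma-separated fields, the second one
# containing an escaped quote (\") followed by a comma.
LINE = '"some text","test \\",thing"'

def two_fields(inner: str) -> str:
    """Hypothetical helper: build field,field using the report's field shape,
    a quoted field "(inner)*" or an unquoted field [^",][^,]*.
    As in the report, the last iteration of the repeated group is captured."""
    field = r'("((' + inner + r')*)"|[^",][^,]*)'
    return field + ',' + field

# Reading A: an alternative consumes one ordinary character, an escaped
# backslash pair (\\) or an escaped quote pair (\"), so a lone backslash
# or an unescaped quote can never be matched.
READING_A = r'[^\\"]|\\\\|\\"'

# Reading B: the 2nd and 3rd alternatives match ONE backslash and ONE quote,
# so together the alternation matches any character at all.
READING_B = r'[^\\"]|\\|"'

for name, inner in (("A", READING_A), ("B", READING_B)):
    m = re.fullmatch(two_fields(inner), LINE)
    print(name, m.groups())
```

Under reading A the * is still greedy, yet the groups come out as "some text", some text, t, "test \",thing", test \",thing and g, matching the expected output. Under reading B the greedy * backtracks to the last "," it can use, and the first four groups are "some text","test \", some text","test \, \ and thing" (the last two are None), the same strings as the ACTUAL output.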

I can see three possible situations occurring here:

1. PHP regex differs from the standard forms of regex available on
POSIX systems, and whilst this may be desirable it needs to be clearly
documented (which it currently is not - it is not even hinted at).

2. PHP regex has a bug with its handling of zero or more repetition
generators.

3. There is something I am still missing, despite repeated
inspections, readings of the relevant manuals, and consultation with
peers.

------------------------------------------------------------------------

[2005-06-14 09:52:54] [EMAIL PROTECTED]

Regular expressions are greedy by default.  Change it to:

$r_text = "(\"(([^\\\"]|\\\\|\\\")*?)\"|[^\",][^,]*?)";

or use the U modifier on the call and I bet it will do what you want. 
There is no bug here.  
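The greedy/lazy distinction can be sketched with Python's re module, which, like PHP's PCRE-based preg_* functions, reads *? as a lazy (non-greedy) star. This is an illustration on the report's input line, not a test of PHP itself. Python has no ungreedy modifier (its re.U flag means Unicode), so only the *? form is shown.

```python
import re

LINE = '"some text","test \\",thing"'

# Greedy: .* first consumes the rest of the line, then backtracks to the
# LAST quote that still lets the pattern match.
greedy = re.match(r'"(.*)"', LINE)

# Lazy: .*? consumes as little as possible and stops at the FIRST quote
# that lets the pattern match.
lazy = re.match(r'"(.*?)"', LINE)

print(greedy.group(1))  # some text","test \",thing
print(lazy.group(1))    # some text
```

Note that *? is lazy only in Perl-style engines; as point (1) above says, POSIX regular expressions leave adjacent duplication symbols undefined.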

------------------------------------------------------------------------

[2005-06-14 09:31:40] kloske at tpg dot com dot au

Hi,

Unfortunately the system this is running on at present is in production
and I don't really have the resources just at this stage to get the
latest stable snapshot up and running.

Perhaps someone with this stable snapshot can copy and paste the 10 or
so short lines into a test.php webpage and see if it runs as expected
or not?

If you're asking me to do this because you've tested it on the latest
stable snapshot and it works, then I will try it as soon as I have
time; otherwise I'll have to leave it a while, as I have a lot of work
on at the moment (buying a house, short staffed at work, serious spinal
problems - the usual!)

As a slight aside, I should mention that I just tested it on another
PHP box which is totally unrelated to the first, this time OpenBSD, PHP
4.1.2 and it is also affected.

I probably should have prefaced the report by mentioning that I have a
workaround that is acceptable for my particular case (simply not
accepting unquoted strings that contain quotes).

------------------------------------------------------------------------

[2005-06-14 09:18:46] [EMAIL PROTECTED]

Please try using this CVS snapshot:

  http://snaps.php.net/php4-STABLE-latest.tar.gz
 
For Windows:
 
  http://snaps.php.net/win32/php4-win32-STABLE-latest.zip



------------------------------------------------------------------------

[2005-06-14 09:17:32] kloske at tpg dot com dot au

Note that, due to issues with the CAPTCHA, I somehow included the
wrong expected and actual output.

The ACTUAL output is:
"some text","test \",thing"
("(([^\"]|\\|\")*)"|[^",][^,]*),("(([^\"]|\\|\")*)"|[^",][^,]*)
array(5) {
  [0]=>
  string(27) ""some text","test \",thing""
  [1]=>
  string(20) ""some text","test \""
  [2]=>
  string(18) "some text","test \"
  [3]=>
  string(1) "\"
  [4]=>
  string(6) "thing""
}

And the expected output is:
"some text","test \",thing"
("(([^\"]|\\|\")*)"|[^",][^,]*),("(([^\"]|\\|\")*)"|[^",][^,]*)
array(5) {
  [0]=>
  string(27) ""some text","test \",thing""
  [1]=>
  string(20) ""some text""
  [2]=>
  string(18) "some text"
  [3]=>
  string(1) "t"
  [4]=>
  string(6) ""test \", thing""
  [5]=>
  string(6) "test \", thing"
  [6]=>
  string(1) "g"
}

Sorry for the confusion.

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at
    http://bugs.php.net/33334
