I sent out the following email with an attachment, which is awaiting moderation. So while it's getting approved, I thought I'd share my findings for those interested.

Thanks,
Chris

==========================

Scott,

After two full days of debugging, I finally narrowed it down and found the main culprit. It's an extremely subtle bug that I couldn't reproduce when I was doing direct remote debugging with IntelliJ.

Well, there are actually three bugs. One of them you have already caught, which relates to the processing of the "?". I believe there is possibly another bug that relates to the reluctant quantifier (i.e. 'X*?').

=============================
Bug 1
=============================
This reluctant quantifier behavior may or may not be a bug, but I added an extra check to the Regcomp.java class. If the current character is '*' and the next one is '?', then the code will instantiate a CharUngreedyLoop even if "isGreedy()" returns false. To do this, I added the following switch statement to Regcomp.java at line 181:

    case '*':
      if (tail == null)
        throw error(L.l("'*' requires a preceeding regexp"));

      //special case for reluctant quantifier
      switch(pattern.peek()) {
          case '?':
              pattern.read();
              tail = tail.createLoopUngreedy(this, 0, Integer.MAX_VALUE);
              break;
          default:
              tail = createLoop(pattern, tail, 0, Integer.MAX_VALUE);
      }

      return parseRec(pattern, tail.getTail());
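
For anyone unsure what the '*?' special case is supposed to achieve, here is a minimal sketch of greedy versus reluctant matching. It uses the JDK's own java.util.regex rather than the Quercus regexp classes, so it only shows the expected semantics, not how Regcomp.java implements them. The class name QuantifierDemo and the sample input are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
  // Returns group 1 of the first match, or null if the pattern does not match.
  static String firstGroup(String regex, String input) {
    Matcher m = Pattern.compile(regex).matcher(input);
    return m.find() ? m.group(1) : null;
  }

  public static void main(String[] args) {
    String input = "<!-- A --> text <!-- B -->";

    // Greedy '.*' runs to the last " -->" in the input.
    System.out.println(firstGroup("<!-- (.*) -->", input));   // A --> text <!-- B

    // Reluctant '.*?' stops at the first " -->".
    System.out.println(firstGroup("<!-- (.*?) -->", input));  // A
  }
}
```

A template parser wants the second behavior, which is why a '*?' that silently falls back to a greedy loop breaks tag matching.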

=========================
Bug 2 ------- BIG SUBTLE BUG
=========================

This one was ultra difficult to catch and I believe it may be either a JDK issue or a Linux issue. When Resin is first started, Quercus runs and parses code without any problems. The problem starts appearing after 4 refreshes (removing the phpbb3 cached files and then refreshing the same page in the browser). This was consistent under my hardware setup, which is running CentOS 5 with JDK 1.5.0_14. With remote debugging, I could never catch the problem in the act.

With recompilation and a bunch of System.out statements, I was able to find out that somehow, the Integer.MAX_VALUE comparison is causing weird errors. In RegexNode.java, there are lines such as these:

    for (; i <= max; i++) {   // <================= CULPRIT
      tail = next.match(string, offset + i, state);

      if (tail >= 0)
        return tail;
      if (node.match(string, offset + i, state) < 0) {
        return -1;
      }
    }

These loops are located in various classes in the source file. The weirdest thing is that after the same code has run a few times, the conditional statement "i <= max" is somehow returning false. This happens when i = 0 and max = Integer.MAX_VALUE. Since Integer.MAX_VALUE is the largest int value, "i <= max" should always be true here. There is probably some sort of subtle bug here in the hardware, the OS, or the JDK. It could even be a JIT compilation issue.
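
To make the arithmetic concrete, here is a small sketch of what a correct JVM should report for this comparison. The class and method names (MaxValueCheck, withinBound, increment) are made up for illustration and are not part of Quercus.

```java
class MaxValueCheck {
  // Same comparison as the loop condition, isolated for inspection.
  static boolean withinBound(int i, int max) {
    return i <= max;
  }

  // Incrementing an int that already equals Integer.MAX_VALUE wraps around.
  static int increment(int i) {
    return i + 1;
  }

  public static void main(String[] args) {
    System.out.println(Integer.MAX_VALUE);                       // 2147483647
    System.out.println(Integer.toHexString(Integer.MAX_VALUE));  // 7fffffff
    System.out.println(withinBound(0, Integer.MAX_VALUE));       // true
    System.out.println(increment(Integer.MAX_VALUE) == Integer.MIN_VALUE);  // true
  }
}
```

Because of the wraparound, "i <= Integer.MAX_VALUE" can never become false just by counting, so a loop with that bound can only exit through its return statements. That is an unusual loop shape, which may be why it trips up whatever is going wrong here.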

I fixed this problem by doing a conditional check in each individual constructor:

        if (_max == Integer.MAX_VALUE)
            _max = Integer.MAX_VALUE - 1;

This actually worked. Things became stable again. This is obviously a hack since I couldn't figure out what the real problem may be. Another possibility is that the JIT compiler somehow compiled this particular type of for statement (one with no initialization clause) into something that runs improperly.

I didn't get a chance to test this by either disabling the JIT or changing the for statement into a while statement.
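
For whoever wants to try the while-statement idea, here is a rough sketch of one way it could look. It is not the actual RegexNode.java code: the Node interface, the two-argument match() signature, and the anyChar/literal helpers are simplified stand-ins I made up so the example is self-contained. The loop checks "i == max" explicitly instead of relying on "i <= max", so i is never incremented past the bound.

```java
class LoopSketch {
  // Hypothetical stand-in for a regexp node: returns the new offset, or -1.
  interface Node {
    int match(String s, int offset);
  }

  // Matches exactly one character, if there is one left.
  static Node anyChar() {
    return (s, offset) -> offset < s.length() ? offset + 1 : -1;
  }

  // Matches a fixed string at the given offset.
  static Node literal(String text) {
    return (s, offset) -> s.startsWith(text, offset) ? offset + text.length() : -1;
  }

  // Ungreedy loop: try the continuation first, then consume one more unit.
  static int matchUngreedy(Node node, Node next, String s, int offset, int i, int max) {
    while (true) {
      int tail = next.match(s, offset + i);
      if (tail >= 0)
        return tail;
      if (i == max)              // bound reached: stop without overflowing i
        return -1;
      if (node.match(s, offset + i) < 0)
        return -1;
      i++;
    }
  }

  public static void main(String[] args) {
    int end = matchUngreedy(anyChar(), literal("-->"), "abc-->x", 0, 0, Integer.MAX_VALUE);
    System.out.println(end);  // 6
  }
}
```

As for the other experiment, HotSpot JVMs accept the -Xint flag, which runs everything in the interpreter and should show quickly whether the JIT is involved.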

============================================

Given that the expressions affected by bug #1 and the bug you mentioned are not being parsed properly, I was able to work around the problem in the regexp itself. I changed this original phpbb regexp:

#<!-- ([^<].*?) (?:.*?)? ?-->#

to

#<!-- ([^<].*) (.*)? ?-->#U

Here the U modifier makes the quantifiers non-greedy by default, which effectively makes the '.*?' unnecessary. I began running this with the MAX_VALUE hack and phpbb3 now renders all the smarty templates properly. Some other regular expressions that I tried:

#<!-- ([^<].*?) (.*)? ?-->#U  == works
#<!-- ([^<].*?) (?:.*)? ?-->#U  == fails to match properly with ?: added
#<!-- ([^<].*) (?:.*)? ?-->#U  == fails to match properly, as with the statement above
#<!-- ([^<].*?) (.*?)? ?-->#  == seemingly works with no problems (using my patch from above)

So it looks like "(?:.*?)?" is a very special case that requires some particular mmm... parsing and evaluation.
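
As a reference point for what the original expression should produce, here is a small sketch that runs it through the JDK's java.util.regex. That is a different engine from both PCRE and the Quercus regexp classes, so treat it only as a sanity check of the expected result. The class name TemplateTagDemo and the sample template string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateTagDemo {
  // The original phpbb3 pattern from above, without the PHP '#' delimiters.
  static final Pattern TAG = Pattern.compile("<!-- ([^<].*?) (?:.*?)? ?-->");

  // Collects group 1 (the tag keyword) of every match in the input.
  static List<String> keywords(String code) {
    List<String> out = new ArrayList<>();
    Matcher m = TAG.matcher(code);
    while (m.find()) {
      out.add(m.group(1));
    }
    return out;
  }

  public static void main(String[] args) {
    String code = "<!-- BEGIN loop -->x<!-- END loop --><!-- ENDIF -->";
    System.out.println(keywords(code));  // [BEGIN, END, ENDIF]
  }
}
```

Each tag is matched separately and the keyword lands in group 1, with or without trailing arguments, which is the behavior phpbb3 depends on.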


=========================================


For reference, I have attached to this email the two source files that I modified. For those who are interested in making this work, you are more than welcome to download Quercus 3.1.5 source code and replace the two files with these.

Thanks,
Chris


On Mar 13, 2008, at 9:45 PM, Scott Ferguson wrote:


On Mar 13, 2008, at 4:41 PM, Chris Chen wrote:

I've narrowed this down to the regular expression issue. It's failing on a regular expression that appears to work initially but later on fails for some very strange reason. I have not yet been able to pinpoint it, but let me show you what I have so far.

In PHPBB3's includes/function_template.php file, there is a regular expression line:

preg_match_all('#<!-- ([^<].*?) (.*?)? ?-->#', $code, $blocks, PREG_SET_ORDER);

I've filed that as http://bugs.caucho.com/view.php?id=2526

The "(.*?)?" is more complicated than you'd think.

-- Scott


The $code is actually the template file to parse. This line fails miserably. Interestingly, the regex a few lines above it works well:

preg_match_all('#<!-- INCLUDE ([a-zA-Z0-9\_\-\+\./]+) -->#', $code, $matches);

It's parsing the same $code file.

I am currently testing the possible differences between the two expressions to narrow down the cause. It appears to be the "?" that may be causing serious issues or it could be more than that. Still checking.

I'm hoping that this may give either Scott or you some ideas on where to look, because this is a serious issue if the regex parser is badly written. Too much PHP code relies on regex.


-Chris

On Mar 13, 2008, at 1:59 PM, Andrew Fritz wrote:

Chris Chen wrote:

Your problem is similar to mine.


I spent the entire day yesterday trying to debug phpbb3 running on
quercus.  Phpbb3 appears to be using either Smarty or something
similar and I am getting almost the exact problem that you're getting.

It was super difficult to debug scripts under quercus.  Debugging
under IntelliJ is driving me nuts. :)

Yah, IntelliJ is the bomb... Too bad they don't have PHP support yet... (make sure you leave a comment in their suggestion box that they need it).

Since I am able to consistently get quercus to not process smarty
tags, I am a bit closer to finding the real cause to the problem. I
believe I've at least narrowed it down to either the Regex module or
the Array storage/processing in Quercus.

If I get some time, I may continue to debug and find out why Quercus
simply hates these regular expressions.

I'd guess Array, but that is based only on my experience with lazy initialization in hibernate not working on contained lists (but it works fine everywhere else). Of course, regex is probably equally likely. I spent the better part of the day trying to get HTMLPurifier working. The end result was a java class that System.executes native php to clean html and print the results... I never got it to run under Resin. It would fail to tokenize the HTML... In any case, I would appreciate any info you have if you manage to sort it out.

Perhaps spending the time trying to get vbulletin to work under
quercus might be a more productive use of my time. :)

-Chris

On Mar 13, 2008, at 1:06 PM, Andrew Fritz wrote:


Sorry if this is the wrong list, but the Quercus list appears to be KIA.
There is no mail in the archive and mail to quercus-[EMAIL PROTECTED] bounces.

Now to my question/statement:

Smarty works great in 3.1.3, but is broken in 3.1.4 and 3.1.5. It opens the template file and returns the contents unprocessed. All the tags remain in the file. Is there a workaround to make it work again? I need to upgrade to try to get around another bug, but can't right now since our site is pretty much 100% smarty templates.

Andrew



_______________________________________________
resin-interest mailing list
resin-interest@caucho.com
http://maillist.caucho.com/mailman/listinfo/resin-interest