Hi,

Today we encountered an exception in a source file that wasn't in UTF-8.
This caused first a warning and then an error in the morbo output. Apache2
showed a "502 Proxy Error: Error reading from remote server" in our
apache2+morbo setup.

The file on disk is "binary" because it uses a Perl source filter
<http://perldoc.perl.org/perlfilter.html>. Once given to the parser, it
*is* UTF-8. The same situation could have occurred in a third-party
library encoded in something other than UTF-8 or ASCII. Something similar
was previously discussed in an old thread, "Mojo::Exception generate
warnings while read non-utf8 file"
<https://groups.google.com/forum/#%21searchin/mojolicious/Mojo$3A$3AException$20UTF-8%7Csort:relevance/mojolicious/6c-ZavT2KzQ/vTsJO2E6RQkJ>,
but no solution was presented there.

This occurs when an exception is encountered in a source file, and
"Mojo::Exception->throw($@)" is called because of Mojolicious.pm's

local $SIG{__DIE__}
  = sub { ref $_[0] ? CORE::die $_[0] : Mojo::Exception->throw(shift) };

First, Mojo::Exception's "sub inspect" gave warnings, because it guesses the
source file by parsing $@ and then opens that file with:

next unless -r $file->[0] && open my $handle, '<:utf8', $file->[0];
$self->_context($file->[1], [[<$handle>]]);

Next we get a "Malformed UTF-8 character" fatal error from Mojolicious
(when generating output?) because of a s/// in Mojo::Util's xml_escape:

sub xml_escape {
  return $_[0] if ref $_[0] && ref $_[0] eq 'Mojo::ByteStream';
  my $str = shift // '';
  $str =~ s/([&<>"'])/$XML{$1}/ge;
  return $str;
}

It seems that reading the non-UTF-8 data only causes a warning, but the
later s/// causes a fatal exception. The behavior I see is reproduced by
this snippet:

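#!/usr/bin/perl -w
use strict;
# binary.bin is any file containing bytes that are not valid UTF-8
open my $handle, '<:utf8', "binary.bin" or die "Cannot open binary.bin: $!";
my $str = join('', <$handle>);    # reading only warns here
$str =~ s/([&<>"'])/foo/;         # the substitution is where it dies for us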
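For anyone who wants to try this without our source file, here is a minimal sketch that creates a suitable binary.bin. The file content is made up for illustration: a single Latin-1 byte (0xE9) that is not valid UTF-8 on its own. That should be enough to trigger at least the read warning. Whether the later substitution then dies may depend on the Perl version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write a small file that is valid Latin-1 but NOT valid UTF-8.
# The name binary.bin matches the snippet above; the content is invented.
my $path = 'binary.bin';
open my $out, '>:raw', $path or die "Cannot write $path: $!";

# \xE9 is a UTF-8 lead byte that must be followed by two continuation
# bytes; followed by a space it is malformed.
print {$out} "my \$name = 'caf\xE9 <b>';\n";
close $out or die "Cannot close $path: $!";
```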

I'd like to suggest that reading a source file with UTF-8 problems be
treated as if the file was unreadable. To that end, I suggest this patch
(against current master HEAD 49dd3e7):

londo@peter:~/work/mojo> git diff
diff --git a/lib/Mojo/Exception.pm b/lib/Mojo/Exception.pm
index b218ee0..08759f5 100644
--- a/lib/Mojo/Exception.pm
+++ b/lib/Mojo/Exception.pm
@@ -20,7 +20,14 @@ sub inspect {
   # Search for context in files
   for my $file (@files) {
     next unless -r $file->[0] && open my $handle, '<:utf8', $file->[0];
-    $self->_context($file->[1], [[<$handle>]]);
+    # If there are UTF-8 problems in the source file, don't store any context
+    eval {
+      use warnings 'FATAL' => 'utf8';
+      $self->_context($file->[1], [[<$handle>]]);
+    };
+    if ($@) {
+      next;
+    }
     return $self;
   }

For ASCII or UTF-8 encoded source files nothing changes. For files with
UTF-8 problems, no context is stored via _context(), but they no longer
cause the entire application to crash either. The best of both worlds.
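The "treat it as unreadable" idea can also be sketched outside Mojolicious. Note that this uses a different technique from the patch: instead of fatal warnings on a ':utf8' handle, it reads raw bytes and decodes them strictly with the core Encode module. The helper name read_utf8_lines is hypothetical and not part of Mojolicious.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not a Mojolicious API): returns an array reference
# of decoded lines, or undef if the file cannot be opened or is not
# valid UTF-8.
sub read_utf8_lines {
  my $path = shift;

  # Read the raw bytes first so no decoding happens implicitly
  open my $handle, '<:raw', $path or return undef;
  my $bytes = do { local $/; <$handle> };
  close $handle;
  return [] unless defined $bytes && length $bytes;

  # FB_CROAK makes decode() die on malformed input;
  # LEAVE_SRC keeps $bytes unmodified
  my $text = eval {
    Encode::decode('UTF-8', $bytes,
      Encode::FB_CROAK() | Encode::LEAVE_SRC());
  };
  return undef unless defined $text;

  # Split into lines, keeping the trailing newlines
  return [split /^/, $text];
}
```

A caller would skip the file when the helper returns undef, just as the patch skips it with "next".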

Would that be acceptable? If so, should I just create a GitHub PR? If not,
can somebody suggest an alternative, given that some source files are in
fact not UTF-8?

Sincerely,

Peter
