Hi Eric,

I work with Michael, and this discussion sure got off on the
wrong foot... we love unicorn and use it heavily, and just
want to contribute back to it.

To detail the encoding problem we were trying to fix, unicorn
uses rb_str_new in several places to create Ruby strings.
For Ruby 1.9 and later, these strings are assigned ASCII-8BIT
encoding.

While the Rack specification doesn't dictate what encoding
should be used for strings in the environment, many
developers would probably expect the default external encoding
setting in Encoding.default_external to be used.

Many Rails applications use UTF8 heavily. The use of ASCII-8BIT
in the env can lead to Encoding::CompatibilityErrors being
raised when a UTF8 string and ASCII-8BIT string are concatenated,
which happens frequently when properties like request.url are
referenced in erb templates. To get around these problems,
an app would have to force encoding on the strings in the env
manually. It seems a shame to do this in slower Ruby code when
it could be done up front by unicorn.

We'd like to propose that unicorn use rb_external_str_new to
make strings instead of rb_str_new.

Perhaps you have your reasons for continuing to use rb_str_new
but we figured we'd run this by you.

Here's a proposed patch.

>From befb01530c8d930ba53cc58b979ddf42a4c12565 Mon Sep 17 00:00:00 2001
From: Gary Grossman <[email protected]>
Date: Sat, 2 Aug 2014 00:19:30 -0700
Subject: [PATCH] If unicorn is used with Ruby 1.9 or later, use
 rb_external_str_new instead of rb_str_new to create strings. The resulting
 strings will use the default external encoding. Continue using rb_str_new
for
 older versions of Ruby.

Using the default external encoding instead of ASCII-8BIT for
strings is more in line with developer expectations and will cause
less unexpected bugs such as Encoding::CompatibilityErrors which
result when, say, a UTF8 string and ASCII-8BIT string are
concatenated together.

Added a unit test to ensure that strings returned in the Rack
environment conform to the default external encoding.
---
 ext/unicorn_http/ext_help.h |  6 ++++++
 test/unit/test_request.rb   | 13 +++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/ext/unicorn_http/ext_help.h b/ext/unicorn_http/ext_help.h
index c87c272..6806f8e 100644
--- a/ext/unicorn_http/ext_help.h
+++ b/ext/unicorn_http/ext_help.h
@@ -79,4 +79,10 @@ static int str_cstr_case_eq(VALUE val, const char *ptr,
long len)
 #define STR_CSTR_CASE_EQ(val, const_str) \
   str_cstr_case_eq(val, const_str, sizeof(const_str) - 1)

+#ifdef HAVE_RUBY_ENCODING_H
+/* Use default external encoding for strings for Ruby 1.9+,
+ * fall back to rb_str_new when unavailable */
+#define rb_str_new rb_external_str_new
+#endif
+
 #endif /* ext_help_h */
diff --git a/test/unit/test_request.rb b/test/unit/test_request.rb
index fbda1a2..0a105e0 100644
--- a/test/unit/test_request.rb
+++ b/test/unit/test_request.rb
@@ -179,4 +179,17 @@ class RequestTest < Test::Unit::TestCase
     env['rack.input'].rewind
     res = @lint.call(env)
   end
+
+  def test_encoding
+    if ''.respond_to?(:encoding)
+      client = MockRequest.new("GET http://e:3/x?y=z HTTP/1.1\r\n" \
+                               "Host: foo\r\n\r\n")
+      env = @request.read(client)
+      encoding = Encoding.default_external
+      assert_equal encoding, env['REQUEST_PATH'].encoding
+      assert_equal encoding, env['PATH_INFO'].encoding
+      assert_equal encoding, env['QUERY_STRING'].encoding
+    end
+  end
+
 end
-- 
1.9.1

Gary


Reply via email to