Re: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-29 Thread Paul Barry
I think you are correct.  When I was looking at the packets and seeing two characters, those were actually the characters 
that correspond to the 2 bytes that make up the single UTF-8 character.  I thought the browser was somehow not correctly 
encoding my data, because it was turning 1 character into 2 characters, but actually it is UTF-8 encoding my character 
correctly.  So I think if I use something to read the data and convert it from UTF-8 to Unicode, I will get the correct 
data on the server.
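
The effect described here can be reproduced outside a servlet container. The following is a minimal sketch, assuming a modern JDK (java.nio.charset.StandardCharsets did not exist in 2003); the class and method names are made up for illustration. It encodes one character as UTF-8, reads the bytes back as ISO-8859-1 to get the "two characters" seen in the packet capture, and then reverses the mistake.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Struts or FileUpload.
class Utf8AsLatin1Demo {

    // Encode as UTF-8, then decode with the wrong charset (ISO-8859-1),
    // which turns every byte into its own character.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: recover the raw bytes and decode them as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00E4";           // a-umlaut, one character
        String garbled = misread(original);   // two characters: U+00C3 U+00A4
        System.out.println(garbled.length());
        System.out.println(repair(garbled).equals(original));
    }
}
```

The round trip only works because ISO-8859-1 maps every byte value to exactly one character; with other single-byte charsets some bytes may be lost.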

So from reading the documentation about FileUpload, that seems to be the way to go, but now my question is how to 
integrate FileUpload with Struts?  My thought would be to call a method to populate an ActionForm at the beginning of my 
action, and then use that ActionForm instead of the one I get from the RequestProcessor.  Something like this:

    public ActionForward execute(
            ActionMapping mapping,
            ActionForm pform,
            HttpServletRequest request,
            HttpServletResponse response)
            throws Exception {
        // getFormUsingFileUpload() is a hypothetical helper that would
        // populate the form from the request using FileUpload
        TestActionForm form = getFormUsingFileUpload(request);
        log.info("The value is: " + form.getTest());
        return null;
    }

Is this how others have used Jakarta Commons FileUpload with Struts, or is there a better way?

Jason Lea wrote:

From what I can see there, Resin is expecting UTF-8 for any parameters 
passed to it, and decoding it correctly.  However, multipart/form-data is 
treated differently, as the data is not passed as normal parameters, so 
request.getParameter() cannot be used here (and servlet filters that 
set the request encoding won't help either).

You normally have to use something like the FileUpload component to 
extract form fields and files from the request.  This component is not 
going to know about the character encoding you have given to Resin, so 
it will use the default, which is probably US-ASCII.  With UTF-8 a single 
character can be encoded as 1 to 4 bytes.  When decoding a UTF-8 
string, the decoder combines each multi-byte sequence into 1 
Unicode character.  When UTF-8 is not used to decode the string you will 
see the individual bytes.
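
As a small illustration of the variable-length encoding, here is a sketch (assuming a modern JDK; the class name is hypothetical) that counts how many bytes a string occupies in UTF-8. Characters in the ASCII range take one byte, accented Latin letters two, most other characters in the Basic Multilingual Plane three, and characters outside it four.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class Utf8Lengths {

    // Number of bytes the given string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("a"));            // ASCII letter
        System.out.println(utf8Length("\u00E4"));       // a-umlaut
        System.out.println(utf8Length("\u20AC"));       // euro sign
        System.out.println(utf8Length("\uD83D\uDE00")); // U+1F600, a surrogate pair in Java
    }
}
```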

Looking here (the jakarta apache FileUpload component):
http://jakarta.apache.org/commons/fileupload/apidocs/org/apache/commons/fileupload/FileUploadBase.html 

They have a setHeaderEncoding() method which I assume will deal with 
this problem (I haven't tested this so I don't know).  Are you using a 
file upload component?

Re: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-29 Thread Martin Cooper
In Struts 1.1, the default file upload mechanism *is* Commons FileUpload.
;-)

It seems that you may have omitted to tell the browser explicitly that your
pages are in UTF-8. For some reason that I've never fully understood, declaring
the page encoding causes the browser to use UTF-8 when it submits subsequent
requests from that page. Make sure that you use a meta element in your head to
specify UTF-8.

--
Martin Cooper



Re: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-29 Thread Paul Barry
By using a meta element, do you mean this:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

That doesn't seem to work when the form is multipart/form-data, because the Content-Type header of the request still just has 
multipart/form-data.  The problem seems to be that when I call request.getCharacterEncoding(), I get null.  Is that 
normal?  I would think I should at least get the default character encoding for the webapp. I am using Resin 2.1.10. 
This might be an issue for me to report to them.

This definitely is what is causing my problem, because if I look at the code in 
org.apache.struts.upload.CommonsMultipartRequestHandler.addTextParameter(), this is the first thing it does:

    try {
        value = item.getString(request.getCharacterEncoding());
    } catch (Exception e) {
        value = item.getString();
    }

Since request.getCharacterEncoding() is null, I assume an Exception is being thrown and caught (a log.warn() might be nice 
there) and then I am getting the string without decoding it from UTF-8.
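
The suspected behaviour can be checked in isolation. The sketch below is a stand-in for the try/catch above, not the actual Struts or FileUpload code: the class and method names are hypothetical, and the ISO-8859-1 fallback is an assumption chosen to keep the example deterministic (the real no-argument getString() may use a different default depending on the library version). It relies on the fact that new String(byte[], String) throws a NullPointerException when the charset name is null.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the quoted try/catch; not the Struts source.
class NullEncodingFallback {

    // Mirrors the quoted logic: try the request encoding, fall back on any exception.
    // A null encoding triggers a NullPointerException, so the fallback branch runs.
    static String decode(byte[] body, String requestEncoding) {
        try {
            return new String(body, requestEncoding);
        } catch (Exception e) {
            // Assumed fallback charset, for illustration only.
            return new String(body, StandardCharsets.ISO_8859_1);
        }
    }

    // Alternative: choose an explicit default up front instead of relying on the catch.
    static String decodeWithDefault(byte[] body, String requestEncoding, String defaultEncoding) {
        String encoding = (requestEncoding != null) ? requestEncoding : defaultEncoding;
        return new String(body, Charset.forName(encoding));
    }

    public static void main(String[] args) {
        byte[] body = { (byte) 0xC3, (byte) 0xA4 }; // UTF-8 bytes of a-umlaut
        System.out.println(decode(body, null).length());
        System.out.println(decodeWithDefault(body, null, "UTF-8").length());
    }
}
```

With a null encoding the first method silently returns two characters, which matches the garbled log output; the second returns the single intended character.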

If I manually set the characterEncoding to UTF-8 before this code executes (in processMultipart() in the 
requestProcessor for example), then everything works fine.

So I guess my question is should I be expecting request.getCharacterEncoding() to return null or is there a bug in my 
app server?






Re: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-29 Thread Jason Lea
Paul Barry wrote:

So I guess my question is should I be expecting request.getCharacterEncoding() to return null or is there a bug in my 
app server?
 

Nope, it is not a bug. The browser hasn't set the encoding, so the 
method returns null to indicate this.

Here is the relevant section from Java Servlet Specification Version 
2.3 (servlet-2_3-fcs-spec.pdf)

SRV.4.9 Request data encoding
Currently, many browsers do not send a char encoding qualifier with the 
Content-Type header, leaving open the determination of the character 
encoding for reading HTTP requests. The default encoding of a request 
the container uses to create the request reader and parse POST data must 
be ISO-8859-1, if none has been specified by the client request. 
However, in order to indicate to the developer in this case the failure 
of the client to send a character encoding, the container returns null 
from the getCharacterEncoding method.

If the client hasn't set character encoding and the request data is 
encoded with a different encoding than the default as described above, 
breakage can occur. To remedy this situation, a new method 
setCharacterEncoding(String enc) has been added to the ServletRequest 
interface. Developers can override the character encoding supplied by 
the container by calling this method. It must be called prior to parsing 
any post data or reading any input from the request. Calling this method 
once data has been read will not affect the encoding.
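
To make the rule in SRV.4.9 concrete, here is a plain-Java sketch that imitates it without the servlet API. Everything in it is an illustration under stated assumptions: the class and method names are hypothetical, the header parsing is deliberately naive (it splits on ';' and would mishandle a quoted parameter containing a semicolon), and a real container implements this internally. charsetOf() plays the role of getCharacterEncoding() before any override, returning null when the Content-Type header carries no charset parameter; effectiveEncoding() shows an override taking precedence, with ISO-8859-1 as the last resort.

```java
import java.util.Locale;

// Hypothetical illustration of the SRV.4.9 rules; not container code.
class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type header value,
    // or null when the client did not send one.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Encoding used to read the body: developer override first,
    // then the client's declaration, then the ISO-8859-1 default.
    static String effectiveEncoding(String contentType, String override) {
        if (override != null) {
            return override;
        }
        String declared = charsetOf(contentType);
        return (declared != null) ? declared : "ISO-8859-1";
    }

    public static void main(String[] args) {
        String header = "multipart/form-data; boundary=---7d319628600e4";
        System.out.println(charsetOf(header));
        System.out.println(effectiveEncoding(header, null));
        System.out.println(effectiveEncoding(header, "UTF-8"));
    }
}
```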







RE: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-28 Thread José Gustavo Zagato
Hi!

I don't know if it will fit your needs, but to handle UTF-8 I
built a servlet filter which handles all encode/decode operations. As
far as I know this approach is not a pure Struts solution, but it works
really well!
I didn't test it with an upload form like yours, but it's worth a shot!

Regards

  José Gustavo Zagato Rosa
System Analyst - Atos Origin
[EMAIL PROTECTED]


-----Original Message-----
From: Paul Barry [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 28, 2003 12:07
To: [EMAIL PROTECTED]
Subject: Problem with UTF-8 characters in a mutlipart/form-data encoded
form

I am using Struts 1.1 in an application that needs to support the UTF-8
character set.  I am using Resin 2.1.10 with 
character-encoding=UTF-8, and on most of my forms this seems to work
just fine.  I am having problems with forms that 
have to use the multipart/form-data enctype for handling uploading
files.  If I print out the value of a text element in 
an html:form where the enctype is not set at all (which ends up using
application/x-www-form-urlencoded), using UTF-8 
characters works fine.  This is what I get:

INFO - test.TestAction - The value is: ä

Here is what the actual HTTP request that gets sent to the server looks
like:

--- Start HTTP Request ---
POST /testForm.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, */*
Referer: http://pbdesktop/test.do
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: pbdesktop
Content-Length: 11
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd

test=%C3%AD
--- End HTTP Request ---

But if I modify my html:form to use enctype=multipart/form-data, I get
this:

INFO - test.TestAction - The value is: A¤

And the HTTP request looks like this:

--- Start HTTP Request ---
POST /testForm.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, */*
Referer: http://pbdesktop/test.do
Accept-Language: en-us
Content-Type: multipart/form-data;
boundary=---7d319628600e4
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: pbdesktop
Content-Length: 141
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd

-7d319628600e4
Content-Disposition: form-data; name=test

í
-7d319628600e4-
--- End HTTP Request ---

It looks as if the character is already messed up before it even gets to
the servlet container.  There are messages in 
the mailing list archive that discuss this problem, but I didn't see a
solution.  What is the best way to handle UTF-8 
characters in a multipart/form-data encoded form?

Here is the code that I am testing with:

/test/test.jsp:
<%@ taglib uri="WEB-INF/taglib/struts-html.tld" prefix="html" %>
<%@ taglib uri="WEB-INF/taglib/struts-bean.tld" prefix="bean" %>

<html:html>
  <body>
    <html:form action="testForm.do" enctype="multipart/form-data">
      <html:text property="test" />
      <html:submit />
    </html:form>
  </body>
</html:html>

Relevant parts of struts-config.xml:
<struts-config>

  <form-beans>
    <form-bean name="testForm" type="test.TestActionForm" />
  </form-beans>

  <action-mappings>
    <action path="/test" type="org.apache.struts.actions.ForwardAction"
            parameter="/test/test.jsp" />
    <action path="/testForm" type="test.TestAction" name="testForm"
            input="/test.do" scope="request" />
  </action-mappings>

  <controller contentType="text/html;charset=UTF-8" />

</struts-config>

test.TestAction:
package test;

import javax.servlet.http.*;
import org.apache.commons.logging.*;
import org.apache.struts.action.*;

public class TestAction extends Action {
    private static final Log log = LogFactory.getLog(TestAction.class);

    public ActionForward execute(
            ActionMapping mapping,
            ActionForm pform,
            HttpServletRequest request,
            HttpServletResponse response)
            throws Exception {
        // Struts has already populated the form from the request parameters
        TestActionForm form = (TestActionForm) pform;
        log.info("The value is: " + form.getTest());
        return null;
    }
}

test.TestActionForm:
package test;

import org.apache.struts.action.ActionForm;

public class TestActionForm extends ActionForm {
    private String test;
    public String getTest() { return test; }
    public void setTest(String string) { test = string; }
}
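
The urlencoded body captured in the first request, test=%C3%AD, can be decoded by hand to see how much the result depends on the charset the server assumes. This is only a sketch using java.net.URLDecoder from the JDK; the class name is hypothetical and it says nothing about how Resin decodes parameters internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class for illustration.
class FormValueDecoding {

    // Percent-decode a form value, interpreting the resulting bytes in the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Decoded as UTF-8, the two bytes form a single character (U+00ED).
        System.out.println(decode("%C3%AD", "UTF-8").length());
        // Decoded as ISO-8859-1, each byte becomes its own character (U+00C3, U+00AD).
        System.out.println(decode("%C3%AD", "ISO-8859-1").length());
    }
}
```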



Re: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-28 Thread Paul Barry
Does it work with multipart/form-data encoding?  It seems to me that this problem is happening before the form is 
submitted to the servlet container (take a look at the value of test in the HTTP request with Content-Type: 
multipart/form-data in my original post), so the servlet filter wouldn't help, but I could be wrong.



RE: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-28 Thread javen fang
It's true that a servlet filter is not a pure Struts
method, but it is widely used to solve character-encoding
problems in the Struts framework.


--- Jos#38283;Gustavo_Zagato
[EMAIL PROTECTED] wrote:
 Hi !
 
   I don't if it will fit into your needs but, to
 handler UTF-8 I
 build a serverlet filter with handles all encode /
 Decode operations. As
 far as I know this approach is not a pure Struts
 solution but works
 really fine !
 I didn't test with a upload form like yours, but
 it#25263; a shot !
 
 Regards
 
   Jos?Gustavo Zagato Rosa
 System Analyst - Atos Origin
 [EMAIL PROTECTED]
 
 
 -Original Message-
 From: Paul Barry [mailto:[EMAIL PROTECTED] 
 Sent: ter#37872;-feira, 28 de outubro de 2003 12:07
 To: [EMAIL PROTECTED]
 Subject: Problem with UTF-8 characters in a
 mutlipart/form-data encoded
 form
 
 I am using Struts 1.1 in an application that needs
 to support the UTF-8
 character set.  I am using Resin 2.1.10 with 
 character-encoding=UTF-8, and on most of my forms
 this seems to work
 just fine.  I am having problems with forms that 
 have to use the multipart/form-data enctype for
 handling uploading
 files.  If I print out the value of a text element
 in 
 an html:form where the enctype is not set at all
 (which ends up using
 application/x-www-form-urlencoded), using UTF-8 
 characters works fine.  This is what I get:
 
 INFO - test.TestAction - The value is: ? 
 Here is what the actual HTTP request that gets sent
 to the server looks
 like:
 
 --- Start HTTP Request

-
 POST /testForm.do HTTP/1.1
 Accept: image/gif, image/x-xbitmap, image/jpeg,
 image/pjpeg,
 application/x-shockwave-flash, */*
 Referer: http://pbdesktop/test.do
 Accept-Language: en-us
 Content-Type: application/x-www-form-urlencoded
 Accept-Encoding: gzip, deflate
 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
 Windows NT 5.0)
 Host: pbdesktop
 Content-Length: 11
 Connection: Keep-Alive
 Cache-Control: no-cache
 Cookie: SERVER=op; locale=en_US;
 JSESSIONID=aoUCARQpqsLd
 
 test=%C3%AD
 --- End HTTP Request

--
 
 But if I modify my html:form to use
 enctype=multipart/form-data, I get
 this:
 
 INFO - test.TestAction - The value is: A? 
 And the HTTP request looks like this:
 
 --- Start HTTP Request

-
 POST /testForm.do HTTP/1.1
 Accept: image/gif, image/x-xbitmap, image/jpeg,
 image/pjpeg,
 application/x-shockwave-flash, */*
 Referer: http://pbdesktop/test.do
 Accept-Language: en-us
 Content-Type: multipart/form-data;
 boundary=---7d319628600e4
 Accept-Encoding: gzip, deflate
 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0;
 Windows NT 5.0)
 Host: pbdesktop
 Content-Length: 141
 Connection: Keep-Alive
 Cache-Control: no-cache
 Cookie: SERVER=op; locale=en_US;
 JSESSIONID=aoUCARQpqsLd
 
 -7d319628600e4
 Content-Disposition: form-data; name=test
 
 #38086;
 -7d319628600e4-
 --- End HTTP Request

--
 
 It looks as if the character is already messed up
 before it even gets to
 the servlet container.  There are messages in 
 the mailing list archive that discuss this problem,
 but I didn't see a
 solution.  What is the best way to handle UTF-8 
 characters in a multipart/form-data encoded form?
 
 Here is the code that I am testing with:
 
 /test/test.jsp:
 <%@ taglib uri="WEB-INF/taglib/struts-html.tld" prefix="html" %>
 <%@ taglib uri="WEB-INF/taglib/struts-bean.tld" prefix="bean" %>
 
 <html:html>
   <body>
     <html:form action="testForm.do" enctype="multipart/form-data">
       <html:text property="test" />
       <html:submit />
     </html:form>
   </body>
 </html:html>
 
 Relevant parts of struts-config.xml:
 <struts-config>
 
   <form-beans>
     <form-bean name="testForm" type="test.TestActionForm" />
   </form-beans>
 
   <action-mappings>
     <action path="/test"
             type="org.apache.struts.actions.ForwardAction"
             parameter="/test/test.jsp" />
     <action path="/testForm" type="test.TestAction"
             name="testForm"
             input="/test.do" scope="request" />
   </action-mappings>
 
   <controller contentType="text/html;charset=UTF-8" />
 
 </struts-config>
 
 test.TestAction:
 package test;
 
 import javax.servlet.http.*;
 import org.apache.commons.logging.*;
 import org.apache.struts.action.*;
 
 public class TestAction extends Action {
   private static final Log log = LogFactory.getLog(TestAction.class);
 
   public ActionForward execute(
       ActionMapping mapping,
       ActionForm pform,
       HttpServletRequest request,
       HttpServletResponse response)
       throws Exception {
     TestActionForm form = (TestActionForm) pform;
     // Log the submitted field so the decoded value can be inspected.
     log.info("The value is: " + form.getTest());
     return null;
   }
 }
 
 test.TestActionForm:
 package test;
 
 import 

RE: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-28 Thread José Gustavo Zagato
I have doubts about it as well.
The only thing that I'm doing on the front end is setting the encoding to
UTF-8. I will double-check it.

Regards...

  José Gustavo Zagato Rosa
System Analyst - Atos Origin
[EMAIL PROTECTED]


-----Original Message-----
From: Paul Barry [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 28, 2003 12:26
To: Struts Users Mailing List
Subject: Re: Problem with UTF-8 characters in a mutlipart/form-data
encoded form

Does it work with multipart/form-data encoding?  It seems to me that
this problem is happening before the form is 
submitted to the servlet container (take a look at the value of test
in the HTTP request with Content-Type: 
multipart/form-data in my original post), so the servlet filter wouldn't
help, but I could be wrong.

José Gustavo Zagato wrote:

 Hi !
 
   I don't know if it will fit your needs, but to handle UTF-8 I
 built a servlet filter which handles all encode/decode operations.
 As far as I know this approach is not a pure Struts solution, but it works
 really well!
 I didn't test it with an upload form like yours, but it's worth a shot!
 
 Regards
 
   José Gustavo Zagato Rosa
 System Analyst - Atos Origin
 [EMAIL PROTECTED]
 
 
 -----Original Message-----
 From: Paul Barry [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, October 28, 2003 12:07
 To: [EMAIL PROTECTED]
 Subject: Problem with UTF-8 characters in a mutlipart/form-data encoded form
 
 I am using Struts 1.1 in an application that needs to support the UTF-8
 character set.  I am using Resin 2.1.10 with 
 character-encoding=UTF-8, and on most of my forms this seems to work
 just fine.  I am having problems with forms that 
 have to use the multipart/form-data enctype for handling uploading
 files.  If I print out the value of a text element in 
 an html:form where the enctype is not set at all (which ends up using
 application/x-www-form-urlencoded), using UTF-8 
 characters works fine.  This is what I get:
 
 INFO - test.TestAction - The value is: ä
 
 Here is what the actual HTTP request that gets sent to the server looks like:
 
 --- Start HTTP Request
 -
 POST /testForm.do HTTP/1.1
 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
 application/x-shockwave-flash, */*
 Referer: http://pbdesktop/test.do
 Accept-Language: en-us
 Content-Type: application/x-www-form-urlencoded
 Accept-Encoding: gzip, deflate
 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
 Host: pbdesktop
 Content-Length: 11
 Connection: Keep-Alive
 Cache-Control: no-cache
 Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd
 
 test=%C3%AD
 --- End HTTP Request
 --
 
 But if I modify my html:form to use enctype=multipart/form-data, I get this:
 
 INFO - test.TestAction - The value is: A¤
 
 And the HTTP request looks like this:
 
 --- Start HTTP Request
 -
 POST /testForm.do HTTP/1.1
 Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
 application/x-shockwave-flash, */*
 Referer: http://pbdesktop/test.do
 Accept-Language: en-us
 Content-Type: multipart/form-data;
 boundary=---7d319628600e4
 Accept-Encoding: gzip, deflate
 User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
 Host: pbdesktop
 Content-Length: 141
 Connection: Keep-Alive
 Cache-Control: no-cache
 Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd
 
 -7d319628600e4
 Content-Disposition: form-data; name=test
 
 í
 -7d319628600e4-
 --- End HTTP Request
 --
 
 It looks as if the character is already messed up before it even gets to
 the servlet container.  There are messages in 
 the mailing list archive that discuss this problem, but I didn't see a
 solution.  What is the best way to handle UTF-8 
 characters in a multipart/form-data encoded form?
 
 Here is the code that I am testing with:
 
 /test/test.jsp:
 <%@ taglib uri="WEB-INF/taglib/struts-html.tld" prefix="html" %>
 <%@ taglib uri="WEB-INF/taglib/struts-bean.tld" prefix="bean" %>
 
 <html:html>
   <body>
     <html:form action="testForm.do" enctype="multipart/form-data">
       <html:text property="test" />
       <html:submit />
     </html:form>
   </body>
 </html:html>
 
 Relevant parts of struts-config.xml:
 <struts-config>
 
   <form-beans>
     <form-bean name="testForm" type="test.TestActionForm" />
   </form-beans>
 
   <action-mappings>
     <action path="/test"
             type="org.apache.struts.actions.ForwardAction"
             parameter="/test/test.jsp" />
     <action path="/testForm" type="test.TestAction" name="testForm"
             input="/test.do" scope="request" />
   </action-mappings>
 
   <controller contentType="text/html;charset=UTF-8" />
 
 </struts-config>
 
 test.TestAction:
 package test;
 
 import javax.servlet.http.*;
 import org.apache.commons.logging.*;
 import org.apache.struts.action.*;
 
 public class TestAction extends

Re: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-28 Thread Paul Barry
I think the problem is that the browser is not encoding the text field in UTF-8.  Supposedly, setting the 
accept-charset attribute of the HTML form tag to UTF-8 will make it encode the field in UTF-8.  Unfortunately, there is 
no accept-charset property on the html:form Struts tag, although there is a request for one to be added:

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21986

And on top of that, accept-charset doesn't seem to work for me anyway when I try it outside of the html:form tag. 
For example, if I make an HTML form like this:

  <form action="testForm.do" enctype="multipart/form-data" method="post"
        accept-charset="UTF-8">
    <input type="text" name="test" />
    <input type="submit" />
  </form>
If I use Ethereal to capture the HTTP request as it is received by the server (before it ever gets to the actual servlet 
container), the characters don't show up correctly.  For example, the single character  turns into the two characters .

But, interestingly enough, using IE, if I have View > Encoding set to Auto-Select, it does encode the data in UTF-8 and 
when it gets to my Struts action, I correctly have a .  But if I uncheck View > Encoding > Auto-Select, even though 
just below that View > Encoding > Unicode (UTF-8) is selected, the data then doesn't get encoded correctly and I end up 
with .  So it sounds like this isn't really a Struts problem and more a browser problem, but how do I get the browser 
to encode the data in UTF-8?  Am I doing something wrong with accept-charset?

José Gustavo Zagato wrote:

I have doubts about it as well.
The only thing that I'm doing on the front end is setting the encoding to
UTF-8. I will double-check it.

Regards...

  José Gustavo Zagato Rosa
System Analyst - Atos Origin
[EMAIL PROTECTED]
-----Original Message-----
From: Paul Barry [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 28, 2003 12:26
To: Struts Users Mailing List
Subject: Re: Problem with UTF-8 characters in a mutlipart/form-data
encoded form

Does it work with multipart/form-data encoding?  It seems to me that
this problem is happening before the form is 
submitted to the servlet container (take a look at the value of test
in the HTTP request with Content-Type: 
multipart/form-data in my original post), so the servlet filter wouldn't
help, but I could be wrong.

José Gustavo Zagato wrote:

Hi!

I don't know if it will fit your needs, but to handle UTF-8 I
built a servlet filter which handles all encode/decode operations.
As far as I know this approach is not a pure Struts solution, but it works
really well!
I didn't test it with an upload form like yours, but it's worth a shot!
Regards

 José Gustavo Zagato Rosa
System Analyst - Atos Origin
[EMAIL PROTECTED]
-----Original Message-----
From: Paul Barry [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, October 28, 2003 12:07
To: [EMAIL PROTECTED]
Subject: Problem with UTF-8 characters in a mutlipart/form-data encoded form

I am using Struts 1.1 in an application that needs to support the UTF-8
character set.  I am using Resin 2.1.10 with 
character-encoding=UTF-8, and on most of my forms this seems to work
just fine.  I am having problems with forms that 
have to use the multipart/form-data enctype for handling uploading
files.  If I print out the value of a text element in 
an html:form where the enctype is not set at all (which ends up using
application/x-www-form-urlencoded), using UTF-8 
characters works fine.  This is what I get:

INFO - test.TestAction - The value is: 

Here is what the actual HTTP request that gets sent to the server looks like:

--- Start HTTP Request
-
POST /testForm.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, */*
Referer: http://pbdesktop/test.do
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: pbdesktop
Content-Length: 11
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd
test=%C3%AD
--- End HTTP Request
--
But if I modify my html:form to use enctype=multipart/form-data, I get this:

INFO - test.TestAction - The value is: A

And the HTTP request looks like this:

--- Start HTTP Request
-
POST /testForm.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, */*
Referer: http://pbdesktop/test.do
Accept-Language: en-us
Content-Type: multipart/form-data;
boundary=---7d319628600e4
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: pbdesktop
Content-Length: 141
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd
-7d319628600e4
Content

Re: Problem with UTF-8 characters in a mutlipart/form-data encoded form

2003-10-28 Thread Jason Lea
From what I can see, Resin is expecting UTF-8 for any parameters 
passed to it, and decoding them correctly.  However, multipart/form-data is 
treated differently: the data is not passed as normal parameters, so 
request.getParameter() cannot be used here (and servlet filters that 
set the request encoding won't help either).

You normally have to use something like the FileUpload component to 
extract form fields and files from the request.  This component is not 
going to know about the character encoding you have given to Resin, so 
it will use the default which is probably US-ASCII.  With UTF-8 a single 
character can be encoded as 1 to 4 bytes.  When decoding a UTF-8 
string the decoder will combine each multi-byte sequence into 1 
Unicode character.  When UTF-8 is not used to decode the string you will 
see the individual bytes.
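
To make the point above concrete, here is a small sketch in plain Java. It is not taken from Struts, Resin or FileUpload: the class and method names are made up for illustration, and it assumes a JDK that has java.nio.charset.StandardCharsets (Java 7 or later). It shows how many bytes a character takes in UTF-8, and what the same two bytes look like when decoded as UTF-8 versus as ISO-8859-1, which maps every byte to exactly one character.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
class Utf8DecodeDemo {

    /** Number of bytes the given text occupies once encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Correct decoding: multi-byte sequences are combined into single characters. */
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Wrong decoding for UTF-8 data: ISO-8859-1 turns every byte into its own character. */
    static String decodeAsLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // 0xC3 0xA4 is the UTF-8 encoding of U+00E4 (a with diaeresis).
        byte[] wire = {(byte) 0xC3, (byte) 0xA4};

        System.out.println("UTF-8 bytes for 'a':    " + utf8Length("a"));
        System.out.println("UTF-8 bytes for U+00E4: " + utf8Length("\u00E4"));
        System.out.println("UTF-8 bytes for U+20AC: " + utf8Length("\u20AC"));

        System.out.println("chars when decoded as UTF-8:      " + decodeAsUtf8(wire).length());
        System.out.println("chars when decoded as ISO-8859-1: " + decodeAsLatin1(wire).length());
    }
}
```

Decoded with the wrong charset, the two bytes come out as the two characters U+00C3 and U+00A4, which matches the two-character value seen in the log output quoted in this thread.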

Looking here (the Jakarta Commons FileUpload component):
http://jakarta.apache.org/commons/fileupload/apidocs/org/apache/commons/fileupload/FileUploadBase.html
They have a setHeaderEncoding() method which I assume will deal with 
this problem (I haven't tested this so I don't know).  Are you using a 
file upload component?
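
If the upload component cannot be told the encoding, one commonly used fallback is to re-decode the field value after the fact. The sketch below is only that, a sketch: the class and method names are hypothetical, and it assumes the value was wrongly decoded as ISO-8859-1. That charset maps every byte to one character, so the original bytes can be recovered. If the wrong decoding was US-ASCII, bytes above 0x7F are already lost and this will not help.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; not part of Struts or Commons FileUpload.
class FieldReDecoder {

    /**
     * Recovers text that was sent as UTF-8 but decoded as ISO-8859-1.
     * Assumption: the wrong decoding really was ISO-8859-1.
     */
    static String reDecodeAsUtf8(String misdecoded) {
        // Step 1: turn each character back into the single byte it came from.
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Step 2: decode those bytes with the charset the browser actually used.
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+00C3 U+00A4 is what the bytes 0xC3 0xA4 look like after an ISO-8859-1 decode.
        String fromForm = "\u00C3\u00A4";
        String repaired = reDecodeAsUtf8(fromForm);
        System.out.println("length before: " + fromForm.length());
        System.out.println("length after:  " + repaired.length());
    }
}
```

Pure ASCII values pass through unchanged, so the helper is safe to apply to every text field of a form that is known to be submitted as UTF-8. It should not be applied to values that were already decoded correctly.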

Paul Barry wrote:

I am using Struts 1.1 in an application that needs to support the UTF-8 character set.  I am using Resin 2.1.10 with 
character-encoding=UTF-8, and on most of my forms this seems to work just fine.  I am having problems with forms that 
have to use the multipart/form-data enctype for handling uploading files.  If I print out the value of a text element in 
an html:form where the enctype is not set at all (which ends up using application/x-www-form-urlencoded), using UTF-8 
characters works fine.  This is what I get:

INFO - test.TestAction - The value is: ä

Here is what the actual HTTP request that gets sent to the server looks like:

--- Start HTTP Request -
POST /testForm.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, 
application/x-shockwave-flash, */*
Referer: http://pbdesktop/test.do
Accept-Language: en-us
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: pbdesktop
Content-Length: 11
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd
test=%C3%AD
--- End HTTP Request --
But if I modify my html:form to use enctype=multipart/form-data, I get this:

INFO - test.TestAction - The value is: A¤

And the HTTP request looks like this:

--- Start HTTP Request -
POST /testForm.do HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, 
application/x-shockwave-flash, */*
Referer: http://pbdesktop/test.do
Accept-Language: en-us
Content-Type: multipart/form-data; boundary=---7d319628600e4
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Host: pbdesktop
Content-Length: 141
Connection: Keep-Alive
Cache-Control: no-cache
Cookie: SERVER=op; locale=en_US; JSESSIONID=aoUCARQpqsLd
-7d319628600e4
Content-Disposition: form-data; name=test
í
-7d319628600e4-
--- End HTTP Request --
It looks as if the character is already messed up before it even gets to the servlet container.  There are messages in 
the mailing list archive that discuss this problem, but I didn't see a solution.  What is the best way to handle UTF-8 
characters in a multipart/form-data encoded form?

Here is the code that I am testing with:

/test/test.jsp:
<%@ taglib uri="WEB-INF/taglib/struts-html.tld" prefix="html" %>
<%@ taglib uri="WEB-INF/taglib/struts-bean.tld" prefix="bean" %>
<html:html>
  <body>
    <html:form action="testForm.do" enctype="multipart/form-data">
      <html:text property="test" />
      <html:submit />
    </html:form>
  </body>
</html:html>
Relevant parts of struts-config.xml:
<struts-config>
  <form-beans>
    <form-bean name="testForm" type="test.TestActionForm" />
  </form-beans>
  <action-mappings>
    <action path="/test" type="org.apache.struts.actions.ForwardAction"
            parameter="/test/test.jsp" />
    <action path="/testForm" type="test.TestAction" name="testForm" input="/test.do"
            scope="request" />
  </action-mappings>
  <controller contentType="text/html;charset=UTF-8" />

</struts-config>

test.TestAction:
package test;
import javax.servlet.http.*;
import org.apache.commons.logging.*;
import org.apache.struts.action.*;
public class TestAction extends Action {
private static final Log log = LogFactory.getLog(TestAction.class);

public ActionForward execute(
ActionMapping mapping,
ActionForm pform,
HttpServletRequest request,
HttpServletResponse response)