Encoding issue

using namespace Sat, 27 Jun 2015 12:18:05 -0700

I have deployed a web service on TomEE 1.7.1 and am currently having an encoding
problem when working with XML data in requests. The web service implements one
method, which receives XML data inside a SOAP message like the following:


<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:soap="http://tempuri.org/soaprequest">
   <soapenv:Header/>
   <soapenv:Body>
      <soap:soaprequest>
         <soap:streams>
            <soap:soapin contentType="?">
               <soap:Value>
                  

                     <tag_a>cyrillic text here...</tag_a>
                  

               </soap:Value>
            </soap:soapin>
         </soap:streams>
      </soap:soaprequest>
   </soapenv:Body>
</soapenv:Envelope>

Inside the web-service implementation class I retrieve everything from the
tag and serialize it to a String:

    Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
    Node node = (Node) soapinElement;
    Document document = node.getOwnerDocument();
    DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
    LSSerializer serializer = domImplLS.createLSSerializer();
    LSOutput output = domImplLS.createLSOutput();
    output.setEncoding("UTF-8");
    Writer stringWriter = new StringWriter();
    output.setCharacterStream(stringWriter);
    serializer.write(document, output);
    String soapinString = stringWriter.toString();

I then store soapinString in a CLOB column in an Oracle database.
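
For reference, the serialization step described above can be reproduced in isolation. The following is only a minimal sketch under stated assumptions: the class name DomToStringSketch, the helper names parse and serialize, and the sample XML are hypothetical and are not part of the web service. It decodes a windows-1251 byte payload through a Reader with an explicitly chosen charset, parses it into a DOM, and serializes it to a String with LSSerializer. A DOM holds Unicode characters rather than bytes, so the text should survive as long as the bytes were decoded with the correct charset at parse time.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringWriter;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

public class DomToStringSketch {

    // Decode the raw bytes with an explicitly chosen charset, then parse.
    // Passing a Reader means the parser receives characters, not bytes.
    public static Document parse(byte[] xmlBytes, String charsetName) throws Exception {
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(xmlBytes), Charset.forName(charsetName));
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(reader));
    }

    // Serialize the whole document to a Java String via a character stream,
    // mirroring the approach used in the post.
    public static String serialize(Document document) {
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        StringWriter writer = new StringWriter();
        output.setCharacterStream(writer);
        serializer.write(document, output);
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample payload: three Cyrillic capital letters.
        String xml = "<tag_a>\u041E\u0412\u0414</tag_a>";
        byte[] cp1251 = xml.getBytes(Charset.forName("windows-1251"));
        String result = serialize(parse(cp1251, "windows-1251"));
        System.out.println(result);
    }
}
```

If a sketch like this keeps the Cyrillic text intact, the damage is more likely to happen before the DOM is built (when the request bytes are decoded) or after the String is produced (when it is written to or read from the database) than in the serializer itself.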

Everything works when the SOAP message is encoded in UTF-8, but I get
unreadable characters when the SOAP message uses a different encoding, such as
CP1251. What I see in Oracle as a result is:

                  

                     <tag_a>РћР’Р” Р’РћР</tag_a>
                  


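The garbled sample above is consistent with UTF-8 bytes being decoded as windows-1251 at some point: the first six garbled characters are exactly what the UTF-8 bytes of three Cyrillic capital letters look like under that charset. This is an observation about the sample only, not a confirmed diagnosis of where the wrong decoding happens. A small sketch that reproduces the pattern follows; the class name MojibakeDemo and its method names are hypothetical, and it assumes the JRE ships the windows-1251 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Assumption: the windows-1251 charset is available in this JRE.
    private static final Charset CP1251 = Charset.forName("windows-1251");

    // Encode text as UTF-8, then wrongly decode those bytes as windows-1251.
    public static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1251);
    }

    // Reverse the mistake: re-encode as windows-1251, decode as UTF-8.
    // This only works when every garbled character maps back to a byte.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(CP1251), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u041E\u0412\u0414"; // three Cyrillic capital letters
        String garbled = misdecode(original);
        System.out.println(garbled);
        System.out.println(repair(garbled).equals(original));
    }
}
```

The repair method is shown only to make the byte-level relationship visible. Some UTF-8 byte values have no windows-1251 character, so the round trip is not reliable in general, and the more robust fix is to decode the incoming bytes with the correct charset in the first place.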
I tried encoding conversion like this:

    Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
    Node node = (Node) soapinElement;
    Document document = node.getOwnerDocument();
    DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
    LSSerializer serializer = domImplLS.createLSSerializer();
    LSOutput output = domImplLS.createLSOutput();
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    output.setByteStream(byteArrayOutputStream);
    byte[] result = byteArrayOutputStream.toByteArray();
    InputStream is = new ByteArrayInputStream(result);
    Reader reader = new InputStreamReader(is, "windows-1251");
    OutputStream out = new ByteArrayOutputStream();
    Writer writer = new OutputStreamWriter(out, "UTF-8");
    writer.write("\uFEFF");
    char[] buffer = new char[10];
    int read;
    while ((read = reader.read(buffer)) != -1) {
        writer.write(buffer, 0, read);
    }
    reader.close();
    writer.close();
    serializer.write((Node) out, output);
    String soapinString = output.toString();

But it produces something that looks like byte code.
I would appreciate suggestions on how to convert the data to UTF-8 correctly.



--
View this message in context: 
http://tomee-openejb.979440.n4.nabble.com/Encoding-issue-tp4675408.html
Sent from the TomEE Users mailing list archive at Nabble.com.
