I have deployed a web service on TomEE 1.7.1 and am currently having an encoding problem when working with the request XML data. The web service implements one method, which receives XML data inside a SOAP message like the following:

    <soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
                      xmlns:soap="http://tempuri.org/soaprequest">
      <soapenv:Header/>
      <soapenv:Body>
        <soap:soaprequest>
          <soap:streams>
            <soap:soapin contentType="?">
              <soap:Value>
                <tag_a>cyrillic text here...</tag_a>
              </soap:Value>
            </soap:soapin>
          </soap:streams>
        </soap:soaprequest>
      </soapenv:Body>
    </soapenv:Envelope>

Inside the web service implementation class I retrieve everything from the tag and serialize it to a String:

    Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
    Node node = (Node) soapinElement;
    Document document = node.getOwnerDocument();
    DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
    LSSerializer serializer = domImplLS.createLSSerializer();
    LSOutput output = domImplLS.createLSOutput();
    output.setEncoding("UTF-8");
    Writer stringWriter = new StringWriter();
    output.setCharacterStream(stringWriter);
    serializer.write(document, output);
    String soapinString = stringWriter.toString();

I then store soapinString in a CLOB column of an Oracle database.
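
For reference, the serialization step can be exercised outside the container. The following is only a minimal sketch, not the service code: the class name DomToStringSketch, the helper names, and the sample payload are made up for illustration, and it assumes the incoming bytes carry an XML declaration that names their real encoding. It parses a windows-1251 document with the JDK parser and serializes it to a String the same way as above. Because the LSOutput writes to a character stream, this step handles characters rather than bytes, so any corruption would have to come from how the bytes were decoded earlier or how the String is encoded later.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class DomToStringSketch {

    // Parse raw XML bytes. The parser decodes them using the encoding
    // named in the XML declaration (assumed to be correct here).
    public static Document parse(byte[] xmlBytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xmlBytes));
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not parseable XML", e);
        }
    }

    // Serialize a DOM document to a String through a character stream,
    // mirroring the snippet from the post.
    public static String serialize(Document document) {
        DOMImplementationLS ls = (DOMImplementationLS) document.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        LSOutput output = ls.createLSOutput();
        output.setEncoding("UTF-8");
        StringWriter writer = new StringWriter();
        output.setCharacterStream(writer);
        serializer.write(document, output);
        return writer.toString();
    }

    public static void main(String[] args) {
        // Hypothetical payload: three Cyrillic letters, declared and encoded as windows-1251.
        String xml = "<?xml version=\"1.0\" encoding=\"windows-1251\"?>"
                + "<tag_a>\u041E\u0412\u0414</tag_a>";
        byte[] cp1251Bytes = xml.getBytes(Charset.forName("windows-1251"));

        String result = serialize(parse(cp1251Bytes));
        // Check that the Cyrillic text survived the round trip as characters.
        System.out.println(result.contains("\u041E\u0412\u0414"));
    }
}
```

If a standalone check like this keeps the text intact, the places left to inspect are the points where bytes become characters (the HTTP Content-Type charset versus the XML declaration of the request) and where the String is written to the database.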

Everything works when the SOAP message is encoded in UTF-8, but I get unreadable characters when the message uses a different encoding such as CP1251. What I then see in Oracle is:

    <tag_a>РћР’Р” Р’РћР</tag_a>

I tried converting the encoding like this:

    Element soapinElement = (Element) streams.getSoapin().getValue().getAny();
    Node node = (Node) soapinElement;
    Document document = node.getOwnerDocument();
    DOMImplementationLS domImplLS = (DOMImplementationLS) document.getImplementation();
    LSSerializer serializer = domImplLS.createLSSerializer();
    LSOutput output = domImplLS.createLSOutput();
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    output.setByteStream(byteArrayOutputStream);
    byte[] result = byteArrayOutputStream.toByteArray();
    InputStream is = new ByteArrayInputStream(result);
    Reader reader = new InputStreamReader(is, "windows-1251");
    OutputStream out = new ByteArrayOutputStream();
    Writer writer = new OutputStreamWriter(out, "UTF-8");
    writer.write("\uFEFF");
    char[] buffer = new char[10];
    int read;
    while ((read = reader.read(buffer)) != -1) {
        writer.write(buffer, 0, read);
    }
    reader.close();
    writer.close();
    serializer.write((Node) out, output);
    String soapinString = output.toString();

But this produces something that looks like byte code. I would appreciate any suggestions on how to convert the data to UTF-8 correctly.

--
View this message in context: http://tomee-openejb.979440.n4.nabble.com/Encoding-issue-tp4675408.html
Sent from the TomEE Users mailing list archive at Nabble.com.
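
A note on the garbled sample quoted above: the sequence "РћР’Р”" is exactly what the UTF-8 bytes of the Cyrillic letters "ОВД" look like when they are decoded as windows-1251. That suggests UTF-8 bytes are being read as windows-1251 at some point, although the post does not say where. The sketch below reproduces the pattern and reverses it. The class and method names are hypothetical, and the repair step is a diagnostic aid under that assumption, not a recommended fix for the service.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeCheck {

    private static final Charset CP1251 = Charset.forName("windows-1251");

    // Simulate the corruption: encode as UTF-8, then wrongly decode as windows-1251.
    public static String misdecode(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CP1251);
    }

    // Reverse it: re-encode as windows-1251 to recover the UTF-8 bytes, then decode them.
    // Caveat: byte 0x98 has no mapping in windows-1251, so text whose UTF-8 form
    // contains that byte (for example the letter U+0418) cannot be recovered this way.
    public static String repair(String garbled) {
        return new String(garbled.getBytes(CP1251), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u041E\u0412\u0414";   // the letters O, V, D in Cyrillic
        String garbled = misdecode(original);

        // Same character sequence as the start of the sample stored in Oracle.
        System.out.println(garbled.equals("\u0420\u045B\u0420\u2019\u0420\u201D"));
        System.out.println(repair(garbled).equals(original));
    }
}
```

Running a stored value through repair and getting readable text back would confirm the direction of the mis-decoding; the lasting fix is then to make the declared and actual encodings agree at the point where the bytes are first decoded.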
