On 19.11.2009, at 19:06, Thorbjoern Ravn Andersen wrote:

> Hi.
>
> We have reached a situation where I basically want to log a data structure in
> order to be able to process it later.
>
> After a bit of pondering, I have concluded that the best approach for us to
> do this would be to use the XMLEncoder/XMLDecoder in Java 1.4+ and log the
> generated XML snippets.
>
> The issue I want to solve is that the XMLEncoder writes an UTF-8 encoded XML
> file to an OutputStream, i.e. a byte oriented destination. To the best of my
> knowledge the slf4j backends all deal with Strings, i.e. character oriented
> destinations and the output files are written in the default encoding for the
> platform.
>
> The question now is, what is the best way to handle the OutputStream
> generated by XMLEncoder so it will survive all attempts to mess up any
> unicode characters inside due to encoding differences on the way. I will be
> using a custom layout anyway so much can be done :) A humanly readable
> transport format will be preferred.
>
XMLEncoder will have a severe impact on performance; I've tested this
extensively. Have a look at
http://sourceforge.net/apps/trac/lilith/wiki/SerializationPerformance

In my test cases, XMLEncoder serialized 300 events while a protobuf
serializer managed to handle nearly 10,000!

I'd therefore suggest that you take a mixed approach: use protobuf to
serialize the events to a file, and write an additional converter that
turns those files into whatever XML output you'd like, as needed.

A discussion about such a topic was started here:
http://marc.info/?l=logback-dev&m=124905434331308&w=2
but I completely forgot to file an RFE for it. I've done just that now,
thanks for the reminder!
http://jira.qos.ch/browse/LBCORE-128

> My current thoughts is to use a ByteArrayOutputStream and generate a String
> using the UTF-8 decoding. The resulting string contains a <?xml ...
> encoding="UTF-8"?> which is stripped resulting in an XML String containing
> Unicode chars (instead of encoded bytes). This can then be flattened to an
> ASCII version, by converting all characters outside 32..127 to their numeric
> entity (&#1234;), and THAT can be safely logged. I guess :)
>

That would probably work, but it would further decrease the serialization
speed.

Logback (assuming you use Logback) should really support binary, i.e.
byte-based, logfiles, since this would make a major performance
difference. This should be discussed over at logback-dev, though.

> I'd appreciate comments on my thoughts, as this is a rather important
> intermediate step in us using log files to store information which can be
> used to simulate an external system when replaying an interesting session.
>
>

HTH,
Joern.
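
For anyone who wants to try the quoted ByteArrayOutputStream idea, here is a
minimal Java sketch of it. It is not code from this thread: the class and
method names (XmlLogSnippet, encodeToString, stripDeclaration, toAscii) are
made up for illustration, it assumes Java 7+ for StandardCharsets, and it
deliberately leaves tabs and line breaks untouched instead of escaping them,
which a custom layout may want to handle differently.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; the names are illustrative, not from any library.
final class XmlLogSnippet {

    // Runs XMLEncoder into an in-memory buffer and decodes the bytes as UTF-8.
    static String encodeToString(Object bean) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(bytes);
        encoder.writeObject(bean);
        encoder.close(); // close() flushes and writes the closing tag
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // Drops a leading <?xml ...?> declaration, if there is one.
    static String stripDeclaration(String xml) {
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>");
            if (end >= 0) {
                return xml.substring(end + 2).trim();
            }
        }
        return xml;
    }

    // Replaces every code point outside printable ASCII (32..126) with a
    // decimal numeric character reference. Assumption of this sketch: tab,
    // LF and CR are kept as-is, since they are legal XML whitespace.
    static String toAscii(String xml) {
        StringBuilder out = new StringBuilder(xml.length());
        for (int i = 0; i < xml.length(); ) {
            int cp = xml.codePointAt(i);
            i += Character.charCount(cp); // handles surrogate pairs
            boolean keep = (cp >= 32 && cp <= 126)
                    || cp == '\t' || cp == '\n' || cp == '\r';
            if (keep) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String xml = encodeToString("J\u00f6rn \u04d2");
        System.out.println(toAscii(stripDeclaration(xml)));
    }
}
```

The result of toAscii contains only ASCII characters, so it should pass
unchanged through a String-based appender regardless of the platform default
encoding. An XML parser reading the log back resolves the numeric references
to the original characters.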