On 19.11.2009, at 19:06, Thorbjoern Ravn Andersen wrote:

> Hi.
> 
> We have reached a situation where I basically want to log a data structure in 
> order to be able to process it later.
> 
> After a bit of pondering, I have concluded that the best approach for us to 
> do this would be to use the XMLEncoder/XMLDecoder in Java 1.4+ and log the 
> generated XML snippets.
> 
> The issue I want to solve is that the XMLEncoder writes a UTF-8 encoded XML 
> file to an OutputStream, i.e. a byte oriented destination.  To the best of my 
> knowledge the slf4j backends all deal with Strings, i.e. character oriented 
> destinations and the output files are written in the default encoding for the 
> platform.
> 
> The question now is, what is the best way to handle the OutputStream 
> generated by XMLEncoder so it will survive all attempts to mess up any 
> Unicode characters inside due to encoding differences on the way.  I will be 
> using a custom layout anyway so much can be done :)  A humanly readable 
> transport format will be preferred.
> 

XMLEncoder will have a severe impact on performance; I've tested this 
extensively.
Have a look at 
http://sourceforge.net/apps/trac/lilith/wiki/SerializationPerformance
In my test cases, XMLEncoder serialized 300 events while a protobuf serializer 
managed to handle nearly 10,000!
I'd therefore suggest that you take a mixed approach: use protobuf to 
serialize the events to a file, and write an additional converter that turns 
those files into whatever XML output you need, when you need it.
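
To illustrate the "binary file now, convert later" idea, here is a minimal 
sketch of a length-prefixed record file written with plain java.io. It is not 
the Lilith or Logback implementation; the class name BinaryEventLog and its 
methods are hypothetical, and the payload is treated as opaque bytes (with 
protobuf these would be the bytes of a serialized message). A separate 
converter could read the records back and emit XML only when required.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: stores each event as [4-byte length][payload bytes].
final class BinaryEventLog {

    // Appends one already-serialized event to the stream.
    static void append(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length); // big-endian length prefix
        out.write(payload);
    }

    // Reads all events back, e.g. as input for a later XML converter.
    static List<byte[]> readAll(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        List<byte[]> events = new ArrayList<>();
        while (true) {
            int length;
            try {
                length = data.readInt();
            } catch (EOFException endOfLog) {
                break; // no further length prefix: end of the log
            }
            byte[] payload = new byte[length];
            data.readFully(payload); // throws EOFException on a truncated record
            events.add(payload);
        }
        return events;
    }
}
```

Because the payload never passes through a String, no character encoding is 
involved until the converter decides how to render the data.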

A discussion about such a topic was started here: 
http://marc.info/?l=logback-dev&m=124905434331308&w=2 but I completely forgot 
to file an RFE for it.
I've done just that now, thanks for the reminder!
http://jira.qos.ch/browse/LBCORE-128

> My current thought is to use a ByteArrayOutputStream and generate a String 
> using the UTF-8 decoding.  The resulting string contains a <?xml ... 
> encoding="UTF-8"?> which is stripped resulting in an XML String containing 
> Unicode chars (instead of encoded bytes).  This can then be flattened to an 
> ASCII version, by converting all characters outside 32..127 to their numeric 
> entity (&#1234;), and THAT can be safely logged.  I guess :)
> 

That would probably work, but it would further decrease the serialization 
speed. Logback (assuming you use Logback) should really support binary, i.e. 
byte-based, logfiles, since this would make a major performance difference. 
This should be discussed over at logback-dev, though.
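
For reference, the approach quoted above (encode into a 
ByteArrayOutputStream, decode the bytes as UTF-8, strip the XML declaration, 
then replace everything outside printable ASCII with numeric character 
references) could be sketched roughly as follows. The class and method names 
are hypothetical, StandardCharsets assumes Java 7 or later, and 127 (DEL) is 
treated as non-printable, so the kept range is 32..126.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper turning a bean into a single-line, ASCII-only XML string.
final class XmlLogSnippet {

    static String toAsciiXml(Object bean) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        XMLEncoder encoder = new XMLEncoder(buffer);
        encoder.writeObject(bean);
        encoder.close(); // flushes and writes the closing tag
        String xml = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        return escapeNonAscii(stripXmlDeclaration(xml));
    }

    // Removes a leading <?xml ... ?> declaration, if present.
    static String stripXmlDeclaration(String xml) {
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>");
            if (end >= 0) {
                return xml.substring(end + 2).trim();
            }
        }
        return xml;
    }

    // Replaces every code point outside 32..126 with &#NNNN; (decimal).
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint); // handles surrogate pairs
            if (codePoint >= 32 && codePoint < 127) {
                out.append((char) codePoint);
            } else {
                out.append("&#").append(codePoint).append(';');
            }
        }
        return out.toString();
    }
}
```

Line breaks are escaped as well (&#10;), so each snippet ends up on one log 
line; an XML parser resolves the references again when the snippet is read 
back. The extra pass over the string is part of the additional cost 
mentioned above.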

> I'd appreciate comments on my thoughts, as this is a rather important 
> intermediate step in us using log files to store information which can be 
> used to simulate an external system when replaying an interesting session.
> 
> 

HTH,
Joern.
