Re: [Standards] mobile optimizations (was: Re: G oogle Andro ï d SDK not XMPP compliant ?)

2008-02-14 Thread Boyd Fletcher
Actually, the W3C binary XML standard is significantly better than
traditional compression standards like Zip. The binary conversion
process also compresses the file.

You might want to read:

http://www.w3.org/XML/EXI/
http://www.w3.org/TR/2007/WD-exi-measurements-20070725/
http://www.w3.org/TR/xbc-characterization/#N107D4

BTW, Fast Infoset was not selected by the W3C.



On 2/14/08 5:04 PM, Fabio Forno [EMAIL PROTECTED] wrote:

 On Thu, Feb 14, 2008 at 9:39 PM, Dave Cridland [EMAIL PROTECTED] wrote:
 
   I've never been all that convinced about binary XML forms. They work
   to a degree with the highly fixed XML in, for example, SyncML, and
   they're pretty good at compressing individual stanza-like objects
   over SMS for things like OMA EMN (Email Message Notification, or
   something - I've long since forgotten what these acronyms stand for),
   but for long-running streams I'm under the impression that studies
   show it'll be outperformed.
 
   So if you're a big fan of Binary XML formats, please bring along your
   figures. :-)
 
 I'm missing the reference, but you should get the best results with
 binary + compression. However, it's not worth the candle, since
 EXTENSIBLE binary XML is not easy (there is Fast Infoset, but the
 specification is incredibly complex) and the gain is not that high.
 
 --
 Fabio Forno, Ph.D.
 Bluendo srl http://www.bluendo.com
 jabber id: [EMAIL PROTECTED]
 




Re: [Standards] mobile optimizations (was: Re: G oogle Andro ï d SDK not XMPP compliant ?)

2008-02-14 Thread Boyd Fletcher
Dave,

Take a look at http://www.agiledelta.com/w3c_binary_xml_proposal.html and
http://www.idealliance.org/papers/xml02/dx_xml02/papers/06-02-04/06-02-04.pdf.
The W3C spec is based on Agile Delta's EfficientXML. The data I have seen
on EfficientXML indicate that it is many times more efficient than Zip.

1362 byte message - strongly typed
WinZip 3.13 times smaller than original
EfficientXML 75.67 times smaller than original

980 byte message - loosely typed
WinZip 1.6 times smaller than original
EfficientXML 8.45 times smaller than original

21437 byte message
WinZip 6 times smaller than original
EfficientXML 33 times smaller than original
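
As a rough sanity check on what these ratios mean in bytes, the small
Python sketch below converts them to approximate compressed sizes. It
assumes that "N times smaller" means the original size divided by N;
that reading is an assumption, and the helper name compressed_size is
made up for illustration.

```python
# Figures quoted above: (original size in bytes, WinZip ratio, EfficientXML ratio).
samples = [
    (1362, 3.13, 75.67),
    (980, 1.6, 8.45),
    (21437, 6.0, 33.0),
]


def compressed_size(original_bytes, times_smaller):
    # Assumption: "N times smaller" is read as original size / N.
    return original_bytes / times_smaller


# One row per message: (original, approx. WinZip bytes, approx. EfficientXML bytes).
rows = [
    (size,
     round(compressed_size(size, zip_ratio)),
     round(compressed_size(size, exi_ratio)))
    for size, zip_ratio, exi_ratio in samples
]

for size, zip_bytes, exi_bytes in rows:
    print(f"{size:>6} bytes -> WinZip ~{zip_bytes} bytes, "
          f"EfficientXML ~{exi_bytes} bytes")
```

Under that reading, the strongly typed 1362 byte message would shrink to
roughly 18 bytes with EfficientXML, against roughly 435 bytes with WinZip.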

I have other data for large message sizes if interested. Unfortunately I
can't provide the raw data or the messages used. But the group that did the
study tested the messages with WinZip, MPEG-7+BIM, Xmill, Efficient XML,
ASN.1 PER, and WBXML-like. Efficient XML beat them all by a large margin.


Binary XML will help out in two significant areas where XMPP is used:

1. It can significantly reduce the bandwidth used, which can have a big
impact on the performance of a server.
2. Faster processing in the chat server. Reading XML is expensive. Most of
the binary XML formats were designed to be not only much smaller in size but
also much less CPU intensive to process. This should in theory dramatically
improve the scalability of a given XMPP server.

boyd


On 2/14/08 3:39 PM, Dave Cridland [EMAIL PROTECTED] wrote:

 On Thu Feb 14 20:08:53 2008, Peter Saint-Andre wrote:
  Here's a list of things we might talk about:
 
  1. Recommendations regarding when to use the TCP binding and when
  to use
  the HTTP binding (BOSH).
 
  2. Compression via TLS or XEP-0138 (use it!). Also binary XML as a
  compression mechanism.
 
 
 I've never been all that convinced about binary XML forms. They work
 to a degree with the highly fixed XML in, for example, SyncML, and
 they're pretty good at compressing individual stanza-like objects
 over SMS for things like OMA EMN (Email Message Notification, or
 something - I've long since forgotten what these acronyms stand for),
 but for long-running streams I'm under the impression that studies
 show it'll be outperformed.
 
 So if you're a big fan of Binary XML formats, please bring along your
 figures. :-)
 
 
  3. Fast reconnect to avoid TLS+SASL+resource-binding packets.
 
 
 Lots of work from mobile email (ie, Lemonade) is transferrable here.
 It'd be really nice if Tony Finch was coming, since he could talk us
 through QTLS and QUICKSTART - they're SMTP fast startup work he did a
 while back. Very interesting, but didn't make it into the Lemonade
 Profile itself.
 
 
  4. ETags for roster-get (see XEP-0150, let's resurrect that).
 
 
 (Om. Looks quite ugly, IMHO. I'll do a counter-proposal)
 
 
  5. Advisability of presence-only connections (no roster-get, just
  send
  presence and whatever you receive is nice).
 
 
 If you can optimize the roster fetch sufficiently, this really isn't
 required.
 
 
  Anything else?
 
 Beer, obviously.
 
 Dave.
 --
 Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
   - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
   - http://dave.cridland.net/
 Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade
 




Re: [Standards] mobile optimizations (was: Re: G oogle Andro ï d SDK not XMPP compliant ?)

2008-02-14 Thread Dave Cridland

(Hey, where did that space come from in the subject line?)

On Thu Feb 14 22:06:19 2008, Boyd Fletcher wrote:

1362 byte message - strongly typed
WinZip 3.13 times smaller than original
EfficientXML 75.67 times smaller than original

980 byte message - loosely typed
WinZip 1.6 times smaller than original
Efficient XML 8.45 times smaller than original

21437 byte message
Winzip 6 times smaller
Efficient XML 33 times smaller


Interesting, certainly. My impression has been that binary XML  
formats handle cases best where the schema is fixed, and the data is  
relatively tightly marked up, and the overall document length is low.


Our data is heavy on the text, and our overall schema varies wildly,  
and our documents are quite big.


The Efficient XML Interchange Measurements Note seems to back up this  
impression I have:


The best improvements compared to gzipped XML in the Both case come  
for small documents, which also have sufficient schema information,  
i.e., the FixML and CBMS  groups. Here FXDI and Efficient XML (and  
ASN.1 PER in some cases) manage to achieve a clear improvement,  
sometimes even under half the size of gzipped XML. For the larger  
documents there appears to be no gain over the Document case. For  
example, there is no size difference between gzipped XML and any of  
the candidates for the Seismic document, in contrast to the Schema  
case.


To my mind, the figures and graphs there suggest that improvements  
over DEFLATE will be marginal at best for our kind of data.


But I'll do my reading, certainly, as well as getting some figures  
for some XMPP session compression using existing mechanisms -  
assuming I can. (I vaguely recall that the jabber.org server does  
XEP-0138, and I know ours does TLS compression - I could stick  
XEP-0138 in it quite quickly I think as a test).


Dave.
--
Dave Cridland - mailto:[EMAIL PROTECTED] - xmpp:[EMAIL PROTECTED]
 - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
 - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade


Re: [Standards] mobile optimizations (was: Re: G oogle Andro ï d SDK not XMPP compliant ?)

2008-02-14 Thread Fabio Forno
On Fri, Feb 15, 2008 at 12:03 AM, Dave Cridland [EMAIL PROTECTED] wrote:

  To my mind, the figures and graphs there suggest that improvements
  over DEFLATE will be marginal at best for our kind of data.

That's my point, as you can read in my other mail: benchmarks are too
sensitive to the nature of the data. Before FOSDEM I can produce some
figures with real XMPP data using zlib and, I hope, also with some
binary XML. Anyway, I can already say that with zlib the size of a whole
compressed message stanza is often shorter than, or only minimally longer
than, the uncompressed body alone: do we really need better performance?
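
A rough way to try this kind of comparison is sketched below in Python
with the standard zlib module. The stanzas are synthetic (the JIDs, body
text, and helper names are made up for illustration) and far more
repetitive than real chat traffic, so the numbers only illustrate the
effect of sharing one compression context across a stream; they say
nothing about real XMPP data.

```python
import zlib

BODY = "Message number {i}: see you at FOSDEM"


def make_stanza(i):
    # Hypothetical chat stanza; addresses and text are invented for this sketch.
    return (
        "<message from='alice@example.org/phone' to='bob@example.net/desk' "
        f"type='chat' id='m{i}'><body>{BODY.format(i=i)}</body></message>"
    ).encode("utf-8")


def per_stanza_sizes(stanzas):
    # Every stanza is compressed on its own: no history is shared.
    return [len(zlib.compress(s)) for s in stanzas]


def stream_sizes(stanzas):
    # One compression context for the whole stream, flushed after each
    # stanza so that each one could be sent without waiting for the next.
    comp = zlib.compressobj()
    return [len(comp.compress(s) + comp.flush(zlib.Z_SYNC_FLUSH))
            for s in stanzas]


stanzas = [make_stanza(i) for i in range(50)]
raw_total = sum(len(s) for s in stanzas)
single_total = sum(per_stanza_sizes(stanzas))
stream_total = sum(stream_sizes(stanzas))

# How many compressed stanzas (stream mode) are shorter than their raw body?
body_lengths = [len(BODY.format(i=i).encode("utf-8")) for i in range(50)]
shorter_than_body = sum(
    1 for size, body in zip(stream_sizes(stanzas), body_lengths) if size < body
)

print("raw bytes:            ", raw_total)
print("compressed per stanza:", single_total)
print("compressed as stream: ", stream_total)
print("stanzas smaller than their own body:", shorter_than_body, "of 50")
```

Replacing make_stanza with stanzas captured from a real session would
give figures that are actually comparable with the ones discussed here.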

-- 
Fabio Forno, Ph.D.
Bluendo srl http://www.bluendo.com
jabber id: [EMAIL PROTECTED]


Re: [Standards] mobile optimizations (was: Re: G oogle Andro ï d SDK not XMPP compliant ?)

2008-02-14 Thread Boyd Fletcher



On 2/14/08 5:57 PM, Fabio Forno [EMAIL PROTECTED] wrote:

 On Thu, Feb 14, 2008 at 11:06 PM, Boyd Fletcher
 [EMAIL PROTECTED] wrote:
 
   1362 byte message - strongly typed
   WinZip 3.13 times smaller than original
   EfficientXML 75.67 times smaller than original
 
   980 byte message - loosely typed
   WinZip 1.6 times smaller than original
   Efficient XML 8.45 times smaller than original
 
   21437 byte message
   Winzip 6 times smaller
   Efficient XML 33 times smaller
 
 Uhm, I've seen them; they are not very significant for XMPP traffic.
 Try the same benchmarks on real XMPP streams and you will see that the
 difference is not so high. The reason? Much of the redundancy comes
 from attribute values such as to, from, type and so on. Since
 it's almost impossible to make assumptions about the values of
 attributes (except for a few, like type, where the schema sometimes
 imposes restrictions), binary XML formats usually don't use
 dictionaries for them and therefore don't lead to any gains in these
 cases. Moreover, in streams there is an incredibly high correlation
 between stanzas, which makes zlib perform much better than in the
 single-message scenario. Yes, in the end there is a gain, but it's much
 smaller than the gain from optimizing roster and presence stanza
 exchange and making the connection manager cache some information and
 answer for the client.
 
I agree that protocol improvements are in order. But XMPP data was looked at
by some of the folks on the W3C committee as example data and the
compression was significant. There has also been some internal testing in
DOD using EfficientXML with captured XMPP data streams and we have seen a
decrease in size of 4-5 times compared to the zlib approach.

 
 
 
  It can significantly reduce the bandwidth used, which can have a big impact
  on the performance of a server.
  Faster processing in the chat server. Reading XML is expensive. Most of the
  binary XML formats were designed to be not only much smaller in size but
  also much less CPU intensive to process. This should in theory dramatically
  improve the scalability of a given XMPP server.
 
 On this topic, instead, I agree, though I think you get the biggest
 advantages when connecting very limited nodes, such as in sensor
 networks.
 To make it clear:
 - I don't think that in the wired internet the relatively small
 advantages you can get by abandoning text-based XML will pay off; for
 text XML you have a high number of reliable libraries in any language,
 while binary XML is still far from being mature
 
I strongly disagree. We have been using binary XML for years and the
libraries are quite stable and reliable. Unfortunately there just aren't
very many open source libraries. Hopefully that will change over the next
2 years as W3C's EXI specification is ratified.

In very high-volume production environments (hundreds of thousands of
users/connections), the difference between binary XML and regular XML can
be significant, not just in reduced bandwidth utilization but also in
reduced CPU overhead in processing the XML data.

A couple of years ago, one of the large stock exchanges tried to switch to
XML as the data transport. It tanked because the servers could not process
the XML fast enough to keep up with the transaction rate. They switched back
to their legacy binary protocol within 2 days.


 - In edge cases such as mobiles and sensor networks, XML bindings may
 make sense, especially given the computational constraints, but in these
 cases (even more so for sensors) it's also very likely that a
 downsized version of XMPP is used, connecting to a proxy acting as a gateway
 
 --
 Fabio Forno, Ph.D.
 Bluendo srl http://www.bluendo.com
 jabber id: [EMAIL PROTECTED]
 




Re: [Standards] mobile optimizations (was: Re: G oogle Andro ï d SDK not XMPP compliant ?)

2008-02-14 Thread Fabio Forno
On Fri, Feb 15, 2008 at 12:10 AM, Boyd Fletcher
[EMAIL PROTECTED] wrote:

  I agree that protocol improvements are in order. But XMPP data was looked
 at by some of the folks on the W3C committee as example data and the
 compression was significant. There has also been some internal testing in
 DOD using EfficientXML with captured XMPP data streams and we have seen a
 decrease in size of 4-5 times compared to the zlib approach.

Just to set up the correct benchmark: do you mean EfficientXML +
compression or EfficientXML alone? (I promise I'll try to get some
figures out over the weekend, but without compression it's difficult to
believe you can get those improvements.)

  I strongly disagree. We have been using binary XML for years and the libraries
 are quite stable and reliable. Unfortunately there just aren't very many
 open source libraries.

Indeed, that was what I meant; sorry for not being clear.

 Hopefully that will change over the next 2 years as
 W3C's EXI specification is ratified.

That was the other point about maturity; I should have said
consensus. Even with some libraries available, it is very difficult to
base an extension of XMPP on a standard that is not yet ratified and to
choose between the many XML binding options. If the situation changes (or
has changed) I'd be happy to jump back to the binary supporters' side,
where I was before trying to implement it for J2ME ;)

  In very high-volume production environments (hundreds of thousands of
 users/connections), the difference between binary XML and regular XML can be
 significant, not just in reduced bandwidth utilization but also in reduced
 CPU overhead in processing the XML data.

  A couple of years ago, one of the large stock exchanges tried to switch to
 XML as the data transport. It tanked because the servers could not process
 the XML fast enough to keep up with the transaction rate. They switched back
 to their legacy binary protocol within 2 days.

I have no trouble believing this, but the scenario, I guess, is
slightly different, since I don't think that their format had many
extensibility features (when the grammar is not fixed you lose most
of the possible optimizations).

-- 
Fabio Forno, Ph.D.
Bluendo srl http://www.bluendo.com
jabber id: [EMAIL PROTECTED]