Olivier Crête wrote:
Hello,
I'm one of the developers of Farsight, a media streaming library.
Farsight is used as part of Telepathy to implement Jingle audio/video.
I've recently read the jingle draft and I have a few questions and
suggestions.
Jingle ICE-UDP
Is it really required to send candidates separately instead of sending
them in one batch? Sending them in one batch like the ICE-19 draft says
would make having a single implementation for Jingle/SIP simpler.
Also, ICE-19 needs to order all of the candidate pairs before it does
anything.
The spec doesn't make it clear if it is acceptable to send multiple
candidates in one message; I can't see any reason why it shouldn't be
permitted. However, ICE will inevitably cause candidates to be
generated in multiple events (some instantly, some waiting for responses
from STUN and TURN servers). Because the instantly generated candidates
will be local, and therefore the highest priority, if an aggressive
implementation of ICE is used, when the two clients are on the same
network, it would be possible for ICE to complete before a STUN binding
response is ever received.
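
To make the priority argument concrete, here is a rough sketch of the
candidate and pair priority formulas as published in RFC 5245. The type
preference values are the ones recommended there; that ICE-19 uses the
same numbers is an assumption, and the function names are made up for
illustration.

```python
# Recommended type preferences from RFC 5245 (assumed to match ICE-19).
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type, local_pref=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling, controlled):
    """pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


host = candidate_priority("host")    # available instantly
srflx = candidate_priority("srflx")  # only known after a STUN response
relay = candidate_priority("relay")  # only known after a TURN allocation

# A host/host pair outranks any pair that involves a STUN-derived candidate,
# so checks on it can start (and succeed) before the STUN response arrives.
assert pair_priority(host, host) > pair_priority(host, srflx)
```

So a client that only holds its host candidates already has its
highest-priority pairs and loses nothing by sending them immediately.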
Jingle audio
4. Application format
Why make the name attribute of the payload-type tag optional at all? Why
is the profile optional? If it stays optional, a default should be
specified (probably RTP/AVP).
The name is optional for static payload types because we know the codec
simply from the payload number.
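
As a sketch of what that means for a receiver: the table below holds a
few of the static audio assignments from RFC 3551, and the helper
(a made-up name, not anything in the spec) falls back to it only when no
name was given. Dynamic payload types (96-127) still need a name.

```python
# A few static audio payload types from RFC 3551; not the complete table.
STATIC_AUDIO_TYPES = {0: "PCMU", 3: "GSM", 4: "G723", 8: "PCMA", 9: "G722", 18: "G729"}


def codec_name(payload_id, name=None):
    """Return the codec name for a payload-type, using the id when name is absent."""
    if name is not None:
        return name
    if 96 <= payload_id <= 127:
        raise ValueError("dynamic payload type %d needs an explicit name" % payload_id)
    if payload_id not in STATIC_AUDIO_TYPES:
        raise ValueError("unknown static payload type %d" % payload_id)
    return STATIC_AUDIO_TYPES[payload_id]
```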
I agree that we need to always know the profile type. I'd prefer to
have it a required attribute.
5. Negotiation
Why make the semantics slightly different from those proposed in RFC
3264 (SDP Offer/Answer)? The "declare what we can receive" differs from
how SOA is used with some codecs (e.g. H.264, see RFC 3984 section
8.2.2). That also means that it does not accommodate codecs such as
H.264 that have config data that has to be sent from the sender to the
receiver.
I believe that it should be possible to do H.264 without any information
being sent from the sender to the receiver, although this means forgoing
the symmetry in capabilities which RFC 3984 mandates.
I'm very much in favor of recommending PCMA/U, but mandating it would be
a problem because of its relatively high bandwidth. And RFC 4733 should
probably be mandated for audio/tone and audio/telephone-event. In the
case of audio/telephone-event, the optional properties (the fmtp line in
SDP) do not have the a=b format, so we should probably mandate the
parameter name "event" for the list of supported event types.
There's no need to mandate the "events" parameter; if absent, we assume
0-9, *, # and A-D. It should be possible to restrict this though
(probably to 0-9, * and #), in which case putting:
<parameter name='events' value='0-11'/>
within the payload type tag would be the way to do this. Note 'events',
not 'event', as in 2.4.1 of RFC 4733.
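
For illustration, a receiver could expand the value roughly like this.
Event codes 0-15 map to 0-9, *, # and A-D in RFC 4733; the function
names and the "0-15" default spelling are my own, not from any spec.

```python
DTMF_SYMBOLS = "0123456789*#ABCD"  # RFC 4733 event codes 0-15, in order


def parse_events(value="0-15"):
    """Expand a list such as '0-11' or '0-9,11' into a set of event codes."""
    events = set()
    for part in value.split(","):
        part = part.strip()
        if "-" in part:
            low, high = part.split("-", 1)
            events.update(range(int(low), int(high) + 1))
        else:
            events.add(int(part))
    return events


def supported_dtmf(value="0-15"):
    """Return the DTMF symbols covered by an 'events' value (non-DTMF codes ignored)."""
    codes = sorted(parse_events(value))
    return "".join(DTMF_SYMBOLS[c] for c in codes if c < len(DTMF_SYMBOLS))
```

With this, '0-11' yields 0-9, * and #, and a missing parameter yields the
full set including A-D.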
4. Application format
Why is the height/width specified? With most payload types, it can change
dynamically without the signalling being notified, for example in the
case of H.263. How does width/height relate to x/y? Are x/y coordinates
inside a width/height sized area or is width/height the size of the
rectangle displayed at x/y? In either case, both the size of the
picture and of the full frame should probably be included? And what is
the use case for these?
Height and width are required for some codecs (H.261) to specify the
maximum we can receive, while others do crazier things (H.264). In
fact, most of the non-required attributes seem to be codec-specific,
and should probably be outside the scope of XEP-0180.
7. Error Handling
Why is unsupported-codecs here but not in Jingle audio?
Because everything will have G.711 in common? :-D
Jingle DTMF
Why is RFC 4733 negotiated separately from other audio codecs? It seems
to be redundant with the regular negotiation of codecs.
Maybe there should just be an "on/off" negotiation of the XMPP DTMF
method separate from the use of RFC 4733. Also, since XMPP DTMF does
not include any timing information, it could be argued that it is
actually less real-time than RFC 4733 DTMF.
Because we negotiate one audio channel, one video channel, and one DTMF
channel.
XMPP DTMF has timing information: all the messages are sent in real time
(within the constraints of TCP), so button press durations can be
reasonably accurately recovered.
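
A minimal sketch of that recovery, assuming each press arrives as a
"button-down" message followed by a "button-up" message and that the
receiver timestamps them on arrival. The action names and the tuple
layout are assumptions for illustration only.

```python
def press_durations(events):
    """events: iterable of (arrival_time_seconds, action, code), in arrival order."""
    pending = {}    # code -> arrival time of the unmatched button-down
    durations = []
    for arrival, action, code in events:
        if action == "button-down":
            pending[code] = arrival
        elif action == "button-up" and code in pending:
            durations.append((code, arrival - pending.pop(code)))
    return durations
```

The accuracy is bounded by network jitter between the two messages, not
by anything carried in the messages themselves.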
--
Paul