Sorry this is terribly late - I’ve been reviewing the Rayo XEP prior to voting 
on Draft, and I had a couple of questions/comments. This only covers the first 
half of the XEP (up to the end of section 6), as it seemed more useful for me 
to get the comments out than sit on them until I’m finished.

0) The initial diagram shows SIP being used, with Jingle being optional on the 
other side. I think this is just an example, but is it worth calling this out 
more explicitly in the diagram, perhaps by replacing “SIP” with “e.g. SIP” and 
Jingle similarly?

1) Does leading with the examples help or hinder here? I found the examples at 
the start of one particular use case left me more confused than I think I 
would have been jumping straight in to what it’s trying to achieve. (No impact 
on going to Draft)

2) 5.1 (Actors) places requirements that these JIDs for components/mixers can 
only be under subdomains - why is this? AFAIK, this is the only part of 
XMPP that implies any relationship between a domain and a subdomain, and it 
doesn’t immediately seem like a useful restriction.

3) 5.1.6 Is calling things Components the most useful terminology here, when 
Components have a well-established meaning in XMPP (and a Rayo server is likely 
to be such a component)?

4) 6.1’s reliance on a <show>chat</show> seems odd at best - wouldn’t a normal 
available presence be better here? I’m also not sure that the requirement for 
it to be directed presence is warranted - why wouldn’t broadcast presence work 
here?

5) 6.1 - if you want to rely on presence here, isn’t an unavailable presence 
the best way to signal unavailability? I don’t think the XEP currently covers 
what receiving unavailable presence would mean here.

6) 6.2.1 Is how these metadata are handled defined?

7) 6.2.1 the uri attribute seems like it might be underspecified here. The 
server SHOULD try to create at the appropriate URI, but what happens if it 
decides not to (It’s not a MUST)? Similarly, what restrictions are there on how 
a client should form such a URI?

8) 6.2.1 How does the client discover the available URI schemes for to/from?

9) 6.2.1.1 “Third Party” is introduced as a term here for the first time, 
without explanation of which party this is.

10) 6.2.1.1 Use of presence for sending of notifications like this seems off. I 
realise this boat may have sailed, but it doesn’t seem right to me.

11) 6.2.1.2 Is it right that it has to treat this first as if there’s no join, 
and then process the join? So if it’s trying to join something that doesn’t 
exist, or is invalid, it should set up the call first, and only then say the 
join fails?

12) 6.2.2 Introduces “system” for the first time. Which of the entities is the 
system?

13) 6.6.2 Is requiring the server to immediately reject the call right here? (I 
don’t know.) I’m wondering if it might just let it ring, for example, until it 
has an available controlling party.

14) 6.6.2 MUST offer simultaneously - is this required? Why might it not offer 
to different entities in some staged order?

15) 6.6.2 MUST wait indefinitely - why is this required? If the original caller 
hangs up, for example, wouldn’t the server be able to stop waiting for a 
controller?

16) 6.3 The identifier for calls here is always a JID, isn’t it? If that’s the 
case, it’d make more sense to be using JIDs here, instead of adding the layer 
of indirection of a URI with a fixed scheme.

17) 6.3 I think here we’re getting into the territory where presence stanzas 
are really not appropriate for this.

18) 6.3.4 introduces a direction attribute that I don’t think has been defined 
anywhere at this point.

19) 6.4 "a server SHOULD represent a mixer internally using some alternative 
name scoped to the client's security zone and mapped to the friendly name/URI 
presented to the client for the emission of events and processing of commands” 
- I don’t entirely understand this. If it’s an internal representation, why is 
this important for interop?

20) "A mixer MUST be implicitly created the first time a call attempts to join 
it”. Is this required, or might there be scenarios where a mixer 
can’t/shouldn’t be created?

21) "Mixers MUST respect the normal rules of XMPP presence subscriptions. If a 
client sends directed presence to a mixer, the mixer MUST implicitly create a 
presence subscription for the client.” - but that isn’t the normal rule for 
presence subs, is it?

22) Example 43: It’s not immediately obvious to me what an empty output element 
means here; it seems to be different semantics to the use in Example 6 of 
reading a document with text-to-speech.

23) Example 44: This introduces ‘active speaker detection’, but doesn’t explain 
what this is (or reference an explanation), I think.

24) "Once the last participant unjoins from the mixer, the mixer SHOULD be 
destroyed.” - in what scenarios would it be appropriate not to? Should this be 
discussed?

25) 6.5 "A server SHOULD implement all core components” - what are the 
implications for clients if the server doesn’t implement some of these?

26) 6.5.3 - a reference to SSML here would probably be appropriate.

27) "The component is created using an <output/> command, containing one or 
more documents to render” - I think this implies that the previous examples 
with <output…/> are invalid.

28) If the XML for SSML has to be escaped (which seems to be the case from the 
example), this should probably be called out.

29) 6.5.3.1 - I’m not sure why this is a SHOULD instead of a MUST?

30) 6.5.3.2 - I think a quick description of the necessary addressing here 
would be useful.

31) Example 69 - The units of time for the seek seem to be given only in the 
example title; this would be worth calling out in the text.

32) 6.5.4 I think some reference to DTMF and SRGS specs would be useful here.

33) 6.5.4 - How are the optional/extensible mechanisms discovered?

34) 6.5.4.1 - the SHOULD here seems more like it should be a MUST - is there a 
reason to do otherwise (and are there security implications or client 
implications?)

35) 6.5.4.4 - When would the nomatch be expected to trigger? Presumably it’s 
not firing off e.g. whenever anyone says anything that isn’t a DTMF when a DTMF 
input is configured? Can it trigger multiple times, or is it removed after a 
match?

36) 6.5.5 - I think the rules for what happens to the output when input begins 
aren’t defined. Although it’s implied that the output stops, does it continue 
again after input?

37) 6.5.6 says that there are options supplied, but the example shows none - 
should the text say they’re optional?

38) 6.5.6.1 When there are joins involved, can’t there be multiple callers? If 
so, how does that affect e.g. "In send mode, only the audio sent by the caller 
is recorded.”?

39) Links like 
http://xmpp.org/extensions/xep-0327.html#def-component-record-initial-timeout 
seem to be dead ends.

40) Are x-skill and x-customer-id defined anywhere? I think the <header…/> 
stuff is new here (it doesn’t seem consistent with previous use of <header…/>). 
What are the rules for header here?

41) 6.6.2 - if the client can’t handle the call, what’re the other options than 
rejecting it? (MAY)

42) 6.8.1 - is feature-not-implemented an odd error to use for a protocol 
violation?


/K