Wed, 18 Dec 2019 at 20:12, Ralph Meijer <[email protected]>:

> My assumption was that we are looking at character data on the abstract
> layer /after/ parsing XML. You shouldn't see entities there (they'd be
> resolved to their respective characters), nor should you see <![CDATA[]]>
> wrappers.

Hm, please define the 'abstract' layer more precisely. Citing the example from the XEP proposal, which is the true abstract layer: this, [image: image.png], or this: [image: image.png]? Or the layer with 'codepoints'? Is it really any better than escaped XML text?
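To make the distinction concrete, here is a minimal Python sketch (the xml.sax.saxutils helpers and variable names are only for illustration) of the two layers in question - the parsed character data versus one possible escaped wire form of the same text:

from xml.sax.saxutils import escape, unescape

parsed = "&&&"         # character data after XML parsing (the 'abstract' layer)
wire = escape(parsed)  # one possible wire form: "&amp;&amp;&amp;"

print(len(parsed))               # 3  -> reference indices 0..2 on the parsed text
print(len(wire))                 # 15 -> reference indices 0..14 on the escaped text
print(unescape(wire) == parsed)  # True: parsing resolves the entities again

Whichever layer the XEP picks, both sides have to agree on it, otherwise the same reference points at different characters.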
This approach is also not very practical. When you do stanza processing on a server, you most often take the stanza as is, passing all the referenced data along without converting it to the abstract layer and back. Plus, for a web client this means an additional escaping/de-escaping routine every time something is sent or displayed, because browsers require their own escaping.

Wed, 18 Dec 2019 at 20:41, Marvin W <[email protected]>:

> [inline]
>
> On 12/18/19 3:22 PM, Andrew Nenakhov wrote:
> > In the end we have settled for counting characters of escaped string, so
>
> This sounds like a terrible idea. In encoded XML, "&gt;", "&#62;", "&#x3e;"
> and "<![CDATA[>]]>" are equivalent. I just tried it out and servers
> indeed do convert all of those to their shortest well-formed variant
> (which is ">"), so you cannot rely on their reference length at all.
> Servers may at their discretion convert non-ascii characters to their
> character reference form (starting with &#). I have seen this at least
> once happening with emojis.

Why should the standard be concerned about different server implementations converting anything? If a server for some reason converts from one way of escaping XML to another, of course it should recalculate all references.

> > to draw *&&&* in a client we count it as a string with a length of 15,
> > thus the <bold> reference points to characters 0..14:
> > <reference xmlns="urn:xmpp:reference:0" begin="0" end="14"
> > type="markup"><bold /></reference>
>
> Luckily for you, this looks pretty non-standard, ...
> You are apparently mixing XEP-0372 and XEP-0394.

I am not mixing them. XEP-0394 is pathetic, ill-conceived nonsense, which couldn't even use the same attribute names as the preceding references XEP: 0372 uses 'begin' and 'end', and 394 uses 'start' and 'end'. Standards, right. We chose to ignore both 394 and 385 and have developed a very uniform way to do all the things we need in messages - markup, links, images, voice messages, files, locations, etc. So far our 'non-standard' way of using references is in fact far more 'standard' than what is currently suggested by this mish-mash of different XEPs.

Wed, 18 Dec 2019 at 21:00, Ralph Meijer <[email protected]>:

> On 18-12-2019 16:40, Marvin W wrote:
> > [..]
> >
> > Also that's a weird counting there, usually I would expect end to
> > point to the position after the last referenced character - at least
> > that's what you do in most programming languages (e.g.
> > "&amp;&amp;&amp;"[0:14] will give you "&amp;&amp;&amp" without the
> > last ";").
>
> I'd not be opposed to changing the definition of 'end' here. Twitter
> Entities [1] also points to the character after.

Should we really be blindly fixed on copying the Twitter approach when, in fact, we have a significantly different use case? For one, Twitter entities are ALWAYS separated by some symbol (spaces, punctuation marks). They never have a URL next to a hashtag without some separator between them.

The advantage of this approach is that you can derive the length of a reference by subtracting begin from end, but in return you end up with weird intersecting ranges:

<body>*this**is**not**good*</body>
0..4: bold
4..6: italic
6..9: underscore
9..13: bold italic

Not really cool, right? Also, by Twitter's own rules, the last index should be 9..12, not 9..13:

> The second integer represents the location of the first non-URL character
> occurring after the URL *(or the end of the string if the URL is the last
> part of the Tweet text)*

(emphasis mine).
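As a side note, here is what the two counting conventions look like in code - a minimal Python sketch of the example body above, with illustrative names only:

# Exclusive-end reading (Python slices, Twitter-style): end points one
# past the last referenced character, and length is simply end - begin.
body = "thisisnotgood"  # the <body> text without the markup asterisks
ranges = [(0, 4, "bold"), (4, 6, "italic"), (6, 9, "underscore"), (9, 13, "bold italic")]
for begin, end, markup in ranges:
    print(markup, repr(body[begin:end]))  # bold 'this', italic 'is', ...

# The inclusive-end reading (XEP-0372 text as literally written) would
# instead give 0..3, 4..5, 6..8, 9..12, i.e. body[begin:end + 1].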
Since Twitter does not use null-terminated strings, an entity pointing to the full "&amp;&amp;&amp;" tweet would have indices [0, 14], not [0, 15]. With all this written, I think it is safe to put to rest references (sic) to the Twitter way of doing things. We thank them for the inspiration, but that's it. We have different use cases.

The cited example of programming languages is valid only in part. Yes, it is so in Java or Python, but not in Swift, Obj-C or Erlang. The last three use the index of the first character plus a length, which is actually my favourite approach.

Wed, 18 Dec 2019 at 21:59, Marvin W <[email protected]>:

> I don't think it really is a "change", in XEP-394 it is already defined
> this way ("the last affected codepoint is the one just before end" [1])
> and the example in XEP-372 [2] also counts that way (char 72 is the "J"
> of "Juliet" and char 78 is the space after "Juliet"). Only the text misleadingly
> says "An end attribute is similarly used for the index of the last
> character of the reference.", so this may need a clarification.

Well. I strongly object. The text of XEP-0372 clearly says that the end attribute uses the index of the last referenced character, not the character succeeding it. So the right thing to do here is to change the value of the 'end' attribute in the example to '77' instead of '78'. (Btw, did anyone but us implement this XEP at all?)

On 'already defined' in 394: as we learned from the 0071 debacle, even widely implemented XEPs can be deprecated with vague reasoning, so deprecating a contradictory XEP that, to my knowledge, wasn't even implemented anywhere, shouldn't be too much of an issue.

--
Andrew Nenakhov
CEO, redsolution, OÜ
https://redsolution.com
