Re: [Standards] Support for stickers (custom emojis)

2019-10-26 Thread Tedd Sterr
I presume that the majority of implementations will do the UTF-8 decoding 
before/during XML parsing, so with offsets specified in bytes they will 
likely have to awkwardly re-encode the string just to cross-reference these 
byte offsets with the codepoint* offsets they need.

For those which must operate on the byte level, anything other than byte 
offsets is going to be awkward. You can still manage without fully decoding 
UTF-8, however: all non-head (continuation) bytes have the bit pattern 
10xxxxxx, so counting only head bytes will lead you to the correct 
start-of-codepoint, though it's obviously a little more work than direct 
indexing.
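
To make the head-byte counting concrete, here is a minimal Python sketch. The
function name is made up for illustration, and it assumes the input is already
valid UTF-8. It walks the raw bytes, skips anything matching 10xxxxxx, and
returns the byte offset at which a given codepoint starts:

```python
def byte_offset_of_codepoint(data: bytes, cp_index: int) -> int:
    """Return the byte offset at which codepoint number cp_index starts.

    Assumes data is valid UTF-8. An index equal to the number of
    codepoints maps to len(data), so it can be used as an end offset.
    """
    seen = 0
    for pos, byte in enumerate(data):
        # Continuation bytes look like 10xxxxxx; anything else is a head byte.
        if byte & 0xC0 != 0x80:
            if seen == cp_index:
                return pos
            seen += 1
    if seen == cp_index:
        return len(data)
    raise IndexError("codepoint index out of range")
```

No decoding happens here, only one mask-and-compare per byte, but it is still
a linear scan where a byte offset would have been a direct index.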

Byte offsets have every error case that codepoint offsets have, plus the 
additional possibility of an offset landing mid-codepoint, which is 
impossible if codepoints are your units.

As for mid-glyph offsets, is it such a problem beyond possibly displaying 
badly? Where it's assumed to be an error, an easy solution would be to quietly 
round the start/end offsets to the start/end of their glyphs - obviously this 
is handled most efficiently by the display layer, but presumably that's the 
only place it matters anyway.
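
A very rough Python sketch of that rounding, for illustration only. This is
NOT the UAX #29 grapheme cluster algorithm: it only looks at combining marks,
variation selectors and zero-width joiners, and ignores things like flag pairs
and emoji skin-tone modifiers. The function names are hypothetical, and
offsets are in codepoints:

```python
import unicodedata

ZWJ = "\u200d"

def _attaches_to_previous(text: str, i: int) -> bool:
    """Crude check: does the codepoint at i extend the glyph before it?

    Callers guarantee 0 < i < len(text).
    """
    ch = text[i]
    if ch == ZWJ or text[i - 1] == ZWJ:
        return True
    # Mn/Me/Mc cover combining marks; variation selectors are Mn.
    return unicodedata.category(ch) in ("Mn", "Me", "Mc")

def round_start(text: str, start: int) -> int:
    """Move a start offset backwards to the beginning of its glyph."""
    while 0 < start < len(text) and _attaches_to_previous(text, start):
        start -= 1
    return start

def round_end(text: str, end: int) -> int:
    """Move an end offset forwards to the end of its glyph."""
    while 0 < end < len(text) and _attaches_to_previous(text, end):
        end += 1
    return end
```

A real display layer would presumably rely on its text toolkit's own cluster
boundaries rather than a hand-rolled check like this one.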

From another angle, I'd position XMPP above XML, and XML above the text 
encoding scheme used (UTF-8), so it seems wrong to be concerning 
ourselves with details of the encoding scheme from the top level.



* It's probably worth mentioning that there are a number of confusions people 
have with Unicode, and saying 'character' when they mean 'codepoint' is one of 
them (as they're equivalent for the single-codepoint characters they're 
familiar with.)
___
Standards mailing list
Info: https://mail.jabber.org/mailman/listinfo/standards
Unsubscribe: standards-unsubscr...@xmpp.org
___


Re: [Standards] Support for stickers (custom emojis)

2019-10-25 Thread Sam Whited
On Fri, Oct 25, 2019, at 14:25, Marvin W wrote:
> Yes and no. multi-codepoint emojis are still valid characters when
> split, whereas multi-byte codepoints cannot be split. There is
> nothing wrong with displaying the flag 🇪🇺 as 🇪​🇺 *, so your
> implementation is always capable of strictly following any markup
> being done on a codepoint basis, even if the markup border is inside
> a multi-codepoint emoji.

I don't believe that this is always true, but I don't have a good
example off the top of my head, the flag one might be a bad example.
Sometimes splitting codepoints will not result in two things that can be
displayed, for example if you split just before a zero-width joiner I'm
not sure what the behavior should be for ZWJ followed by an emoji.
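
To illustrate the case I mean, here is a small Python sketch (the helper name
is made up). It builds the "man facepalming" sequence out of its four
codepoints and splits it at a codepoint offset of 1, which leaves the second
half starting with a dangling zero-width joiner:

```python
ZWJ = "\u200d"

# Person facepalming + ZWJ + male sign + variation selector-16.
facepalm = "\U0001F926" + ZWJ + "\u2642\ufe0f"

def split_at(text: str, offset: int) -> tuple:
    """Split text at a codepoint offset (Python str indexes by codepoint)."""
    return text[:offset], text[offset:]

head, tail = split_at(facepalm, 1)

# Both halves are valid Unicode strings and encode to valid UTF-8,
# but the tail now begins with a joiner that has nothing to join to.
tail_starts_with_joiner = tail.startswith(ZWJ)
```

Nothing here is an encoding error, which is exactly why it is unclear what a
client should do with it.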

> Some programming languages handle strings in unicode codepoints
> instead of bytes.

"Some" being the operative word. We're not writing a protocol to be
easily used in only "some" programming languages.

> I agree that this would be an issue for non-messaging content (e.g.
> large files) but I don't think that's what we are talking about. For
> messaging content, it's no issue that the client has to decode all the
> bytes - it will be required to do so anyway for displaying.

This happens at some point, but it doesn't have to happen again at the
application layer. Like I said, it's a minor problem, but it's
definitely more work that I'd prefer not to do.

> Assuming you meant codepoint boundary instead of byte boundary

Indeed.

> I agree that this would also be an option, as long as we make sure
> people actually do these checks. I personally prefer codepoints, but
> both are valid and sane options - as long as we don't go with grapheme
> cluster or any like this, we are fine IMO.

I agree. I thought the answer was grapheme clusters for a while, but the
more I think about it, the more this thread has convinced me that it's a
bad solution.

Currently I'm leaning towards bytes: it's more or less the same as
code points except it's simpler to implement and verify and plays
nicer with low resource hardware in a trusted environment (where we
might not care about doing any checks and assume messages from certain
sources are trusted so we don't want to have to decode UTF-8 to figure
out where the boundary should be). It does add an extra error case,
but it's one that's obviously a fatal error and means the reference
can't be rendered: we just have to explicitly say that this makes the
reference invalid.
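
As a sketch of what that check could look like in Python (function names are
hypothetical, and this assumes the body bytes themselves are already known to
be valid UTF-8), a byte offset is acceptable only if it does not point at a
continuation byte:

```python
def is_codepoint_boundary(data: bytes, offset: int) -> bool:
    """True if offset sits between two codepoints (or at either end)."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # 10xxxxxx marks a continuation byte, i.e. the middle of a codepoint.
    return data[offset] & 0xC0 != 0x80

def reference_is_valid(data: bytes, begin: int, end: int) -> bool:
    """Check a byte-offset reference; anything else is treated as invalid."""
    return (begin <= end
            and is_codepoint_boundary(data, begin)
            and is_codepoint_boundary(data, end))
```

That is two byte inspections per reference, with no decoding of the bytes in
between.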

—Sam

-- 
Sam Whited


Re: [Standards] Support for stickers (custom emojis)

2019-10-25 Thread Marvin W

On 10/25/19 3:15 PM, Sam Whited wrote:
> XMPP uses UTF-8, and there's almost no reason to use anything but UTF-8.


I do agree that this is true inside XMPP, but the data being transported 
inside XMPP might be transcoded to non-xmpp transport (examples: bridges 
to other networks, clients that don't do XMPP on c2s connections) and 
for those use-cases different encodings might occur. We shouldn't focus 
on non-UTF-8 encodings, but considering it also doesn't hurt.

> This problem exists with codepoints too, though to a lesser extent and
> it may be less clear how it should be handled in all cases. For example,
> in the middle of a multi-codepoint emoji or country flag.


Yes and no. multi-codepoint emojis are still valid characters when 
split, whereas multi-byte codepoints cannot be split. There is nothing 
wrong with displaying the flag 🇪🇺 as 🇪​🇺 *, so your implementation 
is always capable of strictly following any markup being done on a 
codepoint basis, even if the markup border is inside a multi-codepoint 
emoji.
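
This is easy to check in Python, as a small illustration (the helper name is
made up). The flag is just two regional indicator codepoints, each of which is
a named character on its own, while cutting the UTF-8 bytes in the middle of
one of them gives something that cannot be decoded at all:

```python
import unicodedata

# The EU flag as two regional indicator codepoints: E followed by U.
flag = "\U0001F1EA\U0001F1FA"

# Each half is a character in its own right.
names = [unicodedata.name(cp) for cp in flag]

def decodes(data: bytes) -> bool:
    """True if data is well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

encoded = flag.encode("utf-8")       # 4 bytes per regional indicator
split_between = decodes(encoded[:4])  # cut between the two codepoints
split_inside = decodes(encoded[:2])   # cut in the middle of the first one
```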



> There's also the minor problem of having to decode all the bytes up to
> the start position at the application layer if we have to count
> codepoints.


Some programming languages handle strings in Unicode codepoints instead 
of bytes. I agree that this would be an issue for non-messaging content 
(e.g. large files) but I don't think that's what we are talking about. For 
messaging content, it's no issue that the client has to decode all the 
bytes - it will be required to do so anyway for displaying.



> With bytes you only have two checks: is the start and the
> end marker on a byte boundary? If so the string in the middle can be
> assumed to be valid.


Assuming you meant codepoint boundary instead of byte boundary, I agree 
that this would also be an option, as long as we make sure people 
actually do these checks. I personally prefer codepoints, but both are 
valid and sane options - as long as we don't go with grapheme cluster or 
any like this, we are fine IMO.


Marvin

--

* I put a zero-width space in there to ensure your mail client is not 
going to merge the two characters.



Re: [Standards] Support for stickers (custom emojis)

2019-10-25 Thread Sam Whited
On Thu, Oct 24, 2019, at 18:32, Marvin W wrote:
> 1) At some point we might want to allow the usage of UTF-16 or any
>other encoding. Byte counts would have to be translated when re-
>encoding which a server is probably unable to do generically.

XMPP uses UTF-8, and there's almost no reason to use anything but UTF-8.
On the public network, I think it's safe to operate under the assumption
that this will never change. If it ever does, we'll have lots of work
and bad assumptions to modify anyways, so one more won't hurt. Assuming
UTF-8 drastically simplifies a lot, so it doesn't seem worth changing
that assumption for a hypothetical.

> 2) There is no useful meaning of starting a link or bold inside a
>codepoint. Depending on the tech stack used, it might cause
>developers to unintentionally allow the generation of invalidly
>encoded strings, causing all kind of issues (including potential
>security impact)

This problem exists with codepoints too, though to a lesser extent and
it may be less clear how it should be handled in all cases. For example,
in the middle of a multi-codepoint emoji or country flag. By contrast,
if the start or end string exists between bytes in a UTF-8 encoding of a
single codepoint, it is easier to detect, and is clearly an error.

There's also the minor problem of having to decode all the bytes up to
the start position at the application layer if we have to count
codepoints. With bytes you only have two checks: is the start and the
end marker on a byte boundary? If so the string in the middle can be
assumed to be valid.


—Sam

-- 
Sam Whited


Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Marvin W

On 10/24/19 9:40 PM, Kim Alvefur wrote:
> > We should refrain from using things like grapheme clusters in wire formats,
> > as those are subject to changes in upcoming Unicode versions and thus the
> > wire format would be understood differently depending on the Unicode version
> > implemented by the client.
>
> Doesn't this also depend on the font?


If the font does not support certain graphemes it may be rendered as 
multiple (it may render 🤦‍♂️ as 🤦 and ♂️). The font rendering toolkit 
may be aware that this is a single grapheme since Emoji 4.0 and thus may 
consider it a single grapheme when selecting (for copy and paste, i.e. 
not allow to only copy the ♂️). If the rendering toolkit does allow to 
select only a part of this grapheme cluster and the user does so and 
instruct the client to make the selected text a reference, this would 
make things interesting again (because in the Unicode counting, you'd be 
in the middle of a character, so it would not be possible to actually do 
what the user instructed). Thus the font may be relevant for various 
UI/UX stuff and developers need to be aware of those when allowing the 
user to input stuff.


For output, the font would not be of any relevance, it doesn't matter if 
in the end the reference link is using a single grapheme or two 
graphemes because the font does not support that single grapheme from 
the newer Unicode version. Of course if the toolkit wants you to give 
highlight instructions in displayed graphemes, you'd have to deal with 
that, but I hope there is no toolkit doing that...


Does it make sense to do an Informational XEP for Unicode handling in XEPs?

Marvin


Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Kim Alvefur
On Thu, Oct 24, 2019 at 08:32:04PM +0200, Marvin W wrote:
> Thus, I would vote for using codepoints.

I agree.

> The rule should just be that clients should not do that on outgoing
> data.

I agree with this as well.

> We should refrain from using things like grapheme clusters in wire formats,
> as those are subject to changes in upcoming Unicode versions and thus the
> wire format would be understood differently depending on the Unicode version
> implemented by the client.

Doesn't this also depend on the font?

-- 
Kim "Zash" Alvefur




Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Marvin W

On 10/21/19 4:06 PM, Jonathan Lennox wrote:
> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29.  ICU has support for this.


We should refrain from using things like grapheme clusters in wire 
formats, as those are subject to changes in upcoming Unicode versions 
and thus the wire format would be understood differently depending on 
the Unicode version implemented by the client.


Technically we could also agree on using a certain Unicode version now 
and for all eternity, but this sounds like a stupid concept and will 
cause people to use ICU or similar which will break eventually as the 
standard changes.


We should strive for the maximum compatibility. This gives us basically 
two options: bytes and codepoints. As our encoding is fixed to UTF-8 per 
RFC6120, both would be equally understandable by clients. However there 
are two good reasons against bytes:
1) At some point we might want to allow the usage of UTF-16 or any other 
encoding. Byte counts would have to be translated when re-encoding which 
a server is probably unable to do generically.
2) There is no useful meaning of starting a link or bold inside a 
codepoint. Depending on the tech stack used, it might cause developers 
to unintentionally allow the generation of invalidly encoded strings, 
causing all kinds of issues (including potential security impact)
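
To illustrate point 1, here is a small Python sketch (the function name is
made up for this example) that computes the offset of the same substring in
three units. The codepoint count stays the same whatever the encoding, while
the UTF-8 byte count and the UTF-16 code unit count differ from it and from
each other:

```python
def offsets(text: str, needle: str) -> dict:
    """Offset of needle within text, expressed in three different units."""
    cp = text.index(needle)  # Python str indexes by codepoint
    prefix = text[:cp]
    return {
        "codepoints": cp,
        "utf8_bytes": len(prefix.encode("utf-8")),
        # Each UTF-16 code unit is two bytes; astral characters use two units.
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
    }

snowman = offsets("\u2603 :sadpanda:", ":sadpanda:")
grinning = offsets("\U0001F601 hi", "hi")
```

Anything that re-encodes the body would have to recompute the byte values for
every reference, whereas the codepoint values can be passed through untouched.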


Thus, I would vote for using codepoints. This would of course raise the 
question of what happens if multiple codepoints result in a single 
grapheme and an offset points inside the grapheme. The rule should just 
be that clients should not do that on outgoing data. If a client 
receives input pointing inside a grapheme, it's implementation-defined 
whether the grapheme is included, excluded or split. In practice this 
shouldn't happen, so I doubt it is really worth defining rules for it in 
the respective XEP, but this would also be an option.


By the way, the often mentioned flag example is not consistent across 
browsers either, try https://larma.de/splitflag.html with various 
browsers and browser versions. (Bonus Task: Build a browser detector 
based on flag rendering)


Marvin


Re: [Standards] Support for stickers (custom emojis)

2019-10-24 Thread Andrey Gagarin
On Mon, 21 Oct 2019 at 19:08, Jonathan Lennox wrote:

> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29.  ICU has support for this.
>

We have succeeded in implementing reference processing on three clients and
on the server side, and none of the developers had problems calculating the
necessary positions. You just handle every emoji as one glyph.

In addition we made an XMPP bot with which you can test different
references: markup, string with escaped text and different media. You can
try it at xmpp:dev...@dev.xabber.com

For instance, if you have the text "😁😂😆 funny comment with some bold
text!" and you want to make part of it bold, you count every symbol in
the text, and in the end you get a message like this to send:






😁😂😆 funny comment with some bold text!


Each of these three emojis is counted as 1 symbol.
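
As a rough Python sketch of that counting (the helper name is hypothetical,
and treating the end offset as exclusive is an assumption made here), the
positions of the word "bold" come out like this. Each of these three emojis
happens to be a single codepoint, so counting glyphs and counting codepoints
give the same numbers for this particular text:

```python
# The three emojis written as escapes: U+1F601, U+1F602, U+1F606.
text = "\U0001F601\U0001F602\U0001F606 funny comment with some bold text!"

def span_of(text: str, needle: str) -> tuple:
    """Return (begin, end) of needle, counting one per codepoint."""
    begin = text.index(needle)
    return begin, begin + len(needle)

begin, end = span_of(text, "bold")
```

A language whose strings are UTF-16 based would report different positions for
the same word unless the client counts codepoints explicitly.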

The client will render:
[image: Screenshot_2019-10-24 Xabber Web.png]

More complex example with unicode combining characters: "Test ◌⃤ BOLD
italic usual text". We count this grapheme as one character. The message
should be like this:








Test ◌⃤ BOLD italic usual text


The client will render:

[image: Screenshot_2019-10-24 Xabber Web(1).png]


-- 
Andrey Gagarin
Developer, Redsolution OÜ


Re: [Standards] Support for stickers (custom emojis)

2019-10-21 Thread Sam Whited
On Sun, Oct 20, 2019, at 18:39, JC Brand wrote:
> You don't need tons of ways, you can just follow the instructions. If
> the sending client is buggy, then this will become clear over time.

"Following the instructions" may  mean different things to different
clients in this case. One might treat it as an error, one might display
it and break up the flag emoji, etc. This is not ideal.

> Yes, you just render the two letters separately given that this is
> what's implied by the information you've been given and it's also a
> legitimate use-case.

Assuming this is the desired behavior and we can actually do this: Now
that they've been rendered separately, what if the receiving client
copies and pastes the message. The highlight is not included, or just
becomes plain text, does this mean the flag emoji is rejoined and now
the copy/pasted message is different from the original? This doesn't
seem ideal.

> > What if it's between something and a zero-width joiner that would
> > join it to another glyph, does that split that and now you have a
> > dangling joiner?
>
> This is as clearly an error as setting an offset in the middle of a
> UTF-8 encoding.

Perhaps. Now we just have to enumerate all the other ways that Unicode
handles things like this, and make sure all clients handle them the same
way. This would of course be a problem if we were using bytes, for
example, too, but the point is that it's not as simple as saying "these
things are errors and these aren't". There are different ways to handle
these, and Unicode has a lot of edge cases we likely won't think of.

> > From a code perspective does this mean that highlighting always has
> > to integrate with the text rendering engine? This seems like a
> > *major* downside to me, as it likely makes the code much more
> > complicated, and we may or may not even have the ability to
> > manipulate how the text rendering engine handles things.
>
> It's not clear to me why you think highlighting will necessarily
> require integration with the rendering engine. It should be possible
> to identify unicode codepoints in a string independent of any
> rendering engine.

How do you propose breaking up a flag emoji, for example? We have to
have a way to tell the text rendering engine "don't render this flag,
show the letters". We could probably include a zero width space or
something between the letters, but now when someone copy/pastes the
message they are copying characters that weren't part of what the sender
actually typed, which doesn't feel great.

—Sam

-- 
Sam Whited


Re: [Standards] Support for stickers (custom emojis)

2019-10-21 Thread Sam Whited
On Mon, Oct 21, 2019, at 14:06, Jonathan Lennox wrote:
> The right concept here is probably "grapheme clusters", as defined in
> Unicode Standard Annex 29.  ICU has support for this.

This was also my suggestion at a summit a few years ago. However, the
downside here is that it significantly increases the footprint of the
code (you have to use a library that supports segmentation and grapheme
clusters, or write a fairly complicated algorithm yourself), requires a
lot more knowledge to implement (getting started with Unicode if it's
not your focus is a lot of new terms and confusing concepts that make it
easy to make a mistake, even if you do have a good library to work
with), and generally makes implementations harder to do.

I go back and forth between using grapheme clusters and bytes
personally, but all the options that have been laid out have their
downsides.

—Sam

-- 
Sam Whited


Re: [Standards] Support for stickers (custom emojis)

2019-10-21 Thread Jonathan Lennox
On Saturday, October 19 2019, "Sam Whited" wrote to "standards@xmpp.org" saying:

> On Sat, Oct 19, 2019, at 04:57, JC Brand wrote:
> > You might still have an offset in between two codepoints that should
> > ideally be shown together like "EU" making the EU flag, but this seems
> > less of an issue to me.
> 
> I don't know if this is better or not, and I'm still not sure how best
> to handle it. If you end up with text in the middle of a UTF-8 encoding,
> at least that's clearly an error. If it's in between the two letters in
> a flag emoji, that's not necessarily an error and there are tons of
> different ways you could handle it, which seems much more complex.
> Does this break the flag emoji back into the letter glyphs that are
> shown if it doesn't form a flag? What if it's between something and a
> zero-width joiner that would join it to another glyph, does that split
> that and now you have a dangling joiner? From a code perspective does
> this mean that highlighting always has to integrate with the text
> rendering engine? This seems like a *major* downside to me, as it likely
> makes the code much more complicated, and we may or may not even have
> the ability to manipulate how the text rendering engine handles things.

The right concept here is probably "grapheme clusters", as defined in
Unicode Standard Annex 29.  ICU has support for this.

-- 
Jonathan Lennox
len...@cs.columbia.edu


Re: [Standards] Support for stickers (custom emojis)

2019-10-20 Thread JC Brand
On Sat, Oct 19, 2019 at 04:41:19PM +, Sam Whited wrote:
> On Sat, Oct 19, 2019, at 04:57, JC Brand wrote:
> > You might still have an offset in between two codepoints that should
> > ideally be shown together like "EU" making the EU flag, but this seems
> > less of an issue to me.
> 
> I don't know if this is better or not, and I'm still not sure how best
> to handle it. If you end up with text in the middle of a UTF-8 encoding,
> at least that's clearly an error. If it's in between the two letters in
> a flag emoji, that's not necessarily an error and there are tons of
> different ways you could handle it, which seems much more complex

You don't need tons of ways, you can just follow the instructions. If the
sending client is buggy, then this will become clear over time.

> Does this break the flag emoji back into the letter glyphs that are
> shown if it doesn't form a flag?

Yes, you just render the two letters separately given that this is
what's implied by the information you've been given and it's also a 
legitimate use-case.

By referencing only one of two consecutive letter glyphs, you're indicating
that they're logically distinct, so it makes sense that they're not rendered
together. In any case, usually you'll want to somehow highlight, make clickable
or replace the referenced text, thereby affirming the need to render them
separately.

> What if it's between something and a
> zero-width joiner that would join it to another glyph, does that split
> that and now you have a dangling joiner?

This is as clearly an error as setting an offset in the middle of a UTF-8
encoding.

> From a code perspective does
> this mean that highlighting always has to integrate with the text
> rendering engine? This seems like a *major* downside to me, as it likely
> makes the code much more complicated, and we may or may not even have
> the ability to manipulate how the text rendering engine handles things.

It's not clear to me why you think highlighting will necessarily require
integration with the rendering engine. It should be possible to identify
unicode codepoints in a string independent of any rendering engine.





Re: [Standards] Support for stickers (custom emojis)

2019-10-19 Thread Sam Whited
On Sat, Oct 19, 2019, at 04:57, JC Brand wrote:
> You might still have an offset in between two codepoints that should
> ideally be shown together like "EU" making the EU flag, but this seems
> less of an issue to me.

I don't know if this is better or not, and I'm still not sure how best
to handle it. If you end up with text in the middle of a UTF-8 encoding,
at least that's clearly an error. If it's in between the two letters in
a flag emoji, that's not necessarily an error and there are tons of
different ways you could handle it, which seems much more complex.
Does this break the flag emoji back into the letter glyphs that are
shown if it doesn't form a flag? What if it's between something and a
zero-width joiner that would join it to another glyph, does that split
that and now you have a dangling joiner? From a code perspective does
this mean that highlighting always has to integrate with the text
rendering engine? This seems like a *major* downside to me, as it likely
makes the code much more complicated, and we may or may not even have
the ability to manipulate how the text rendering engine handles things.

—Sam


Re: [Standards] Support for stickers (custom emojis)

2019-10-18 Thread JC Brand
On Thu, Oct 17, 2019 at 01:46:26PM +, Sam Whited wrote:
> TL;DR — we should avoid using XEP-0372 until "TODO: define character
> appropriately" is removed and resolved.

XEP-0394 (Message Markup) works similarly to XEP-0372 and defines the
"start" and "end" values in "units of unicode code points in the
character data of the body element".

This seems better than bytes because then you'll never have an offset in the
middle of a UTF-8 encoding.

You might still have an offset in between two codepoints that should ideally be
shown together like "EU" making the EU flag, but this seems less of an issue to
me.

I therefore think we should just do the same for XEP-0372. It would in any case
be crazy to specify one way of doing things in XEP-0394 and another in XEP-0372.
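
For illustration, with code point units the receiving side of the
:dancingpanda: example quoted below reduces to a plain slice in a language
whose strings index by code point. A minimal Python sketch (the function name
is made up, and treating "end" as exclusive is an assumption that matches the
numbers in that example):

```python
body = "I feel like dancing! :dancingpanda:"

def referenced_text(body: str, begin: int, end: int) -> str:
    """Return the part of body covered by a code point based reference."""
    if not 0 <= begin <= end <= len(body):
        raise ValueError("reference points outside the body")
    return body[begin:end]

shortname = referenced_text(body, 21, 35)
```

There is no way for such an offset to land inside a UTF-8 encoding, so the
only range error left to handle is an offset outside the body.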

JC

 
> On Thu, Oct 17, 2019, at 10:07, JC Brand wrote:
> > Instead, I propose that we use XEP-0372 references to indicate that
> > a particular shortname (e.g. :dancingpanda:) should be replaced with
> > an image.
> >
> > For example:
> >
> >  <body>I feel like dancing! :dancingpanda:</body>
> >  <reference xmlns="urn:xmpp:reference:0" begin="21" end="35" type="data"
> >             uri="https://images.com/dancingpanda"/>
> 
> We should avoid using references in the wild until a few things are
> cleared up. We don't want lots of pre-mature implementations popping up
> that aren't compatible with one another.
> 
> For example, in the following message:
> 
> "> ☃︎ :sadpanda:"
> 
> Should the start attribute for ":sadpanda:" be 4 or 5? Unicode snowman
> is 2 bytes, after all.
> 
> What about:
> 
> "🇪🇺 :sadpanda:"
> 
> Which may be rendering as an EU flag or as the separate letters 'E', 'U'
> depending on your rendering?
> 
> The easiest way is to probably just say that the offset is in bytes, but
> now what do we do if a buggy or malicious client sends something with
> the offset in the middle of the UTF-8 encoding for the snowman emoji?
> What about in the middle of the two codepoints that will be combined to
> create the EU flag glyph which would still be between valid UTF-8
> encodings?
> 
> This is not an easy problem, and while I don't want to tackle trying to
> solve it in this thread, I think references should be avoided until we
> do or we'll never get all the implementations doing one thing later (and
> emojis are exactly the kind of feature that will lead to lots of
> implementations).
> 
> —Sam
> 
> 
> -- 
> Sam Whited




Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Kim Alvefur
On Thu, Oct 17, 2019 at 04:17:23PM +0100, Matthew Wild wrote:
> On Thu, 17 Oct 2019 at 13:34, JC Brand  wrote:
> > "some other entity" isn't terribly well defined. How do I (or the
> > recipient of my stickers) know what other entity to ask?
> 
> It's part of the identifier, e.g.
> 'cid:sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org'

From what I understand, `@bob.xmpp.org` works as a namespace authority.
You can't ask xmpp:bob.xmpp.org about it, and the string is hardcoded in
the XEP itself (and in code).
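
In other words, building such an identifier needs nothing but the data itself.
A minimal Python sketch, assuming the hash is written as lowercase hex as in
the example above (the function and constant names are made up):

```python
import hashlib

# Fixed suffix, not a host you can query.
BOB_SUFFIX = "@bob.xmpp.org"

def bob_cid(data: bytes, algo: str = "sha1") -> str:
    """Build a cid: URI of the form cid:<algo>+<hex digest>@bob.xmpp.org."""
    digest = hashlib.new(algo, data).hexdigest()
    return f"cid:{algo}+{digest}{BOB_SUFFIX}"
```

Since the identifier is derived from the content, anyone holding the same
bytes computes the same cid, which is what makes caching possible, but it says
nothing about whom to ask for those bytes.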


-- 
Kim "Zash" Alvefur


Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Matthew Wild
On Thu, 17 Oct 2019 at 16:17, Matthew Wild  wrote:
>
> On Thu, 17 Oct 2019 at 13:34, JC Brand  wrote:
> >
> > On Thu, Oct 17, 2019 at 01:23:18PM +0200, Marvin W wrote:
> >
>
> > > - I already dislike the fact that we do HTTP requests to arbitrary servers
> > > for file transfers, as we might be leaking IP addresses in such cases.
> >
> > The file servers are usually not arbitrary but are hosted by your XMPP host.
>
> They are effectively arbitrary. When you upload, you upload to your own
> host, right. But when you receive a jabber:x:oob stanza, the URL can
> be absolutely anything (HTML, 50GB JPEG, "pixel tracker").
>
> > > In the case of Converse, you are likely to get into GDPR issues when 
> > > doing so
> > > without explicit user consent (and you don't want explicit user consent 
> > > for
> > > every emoji).
> >
> > Why would you need user consent to show remote images?
>
> You are providing the user's IP address, user agent, etc. to a third
> party. IANAL and I'm not saying that I buy the GDPR argument in this
> case, but there *is* a case for privacy here.
>
> > Otherwise any website that has user accounts and which links to 3rd party
> > images would need user consent for each particular image.
>
> Reality check: Third-party assets (images, scripts, etc.) are exactly
> how the majority of tracking happens online today.
>
> > > There is a reason why many e-Mail-Clients don't render remote
> > > content in e-Mails...
> >
> > And that's not GDPR, right?
> >
> > AFAIK it's to avoid pixel tracking and IP address leakage.
>
> Right. There are other reasons too, including spam and many other
> categories of media (some illegal) that I don't want my device
> automatically downloading and displaying. And don't forget data usage.
>
> > > - BOB does not require the sender to provide the file referenced by the 
> > > CID
> > > 0231 §2.1 says that you can send the IQ to request the bytes to 
> > > "potentially
> > > some other entity". If you don't expect the sending client to provide the
> > > file, it doesn't need to cache all stickers and it doesn't need to be
> > > online.
> >
> > "some other entity" isn't terribly well defined. How do I (or the
> > recipient of my stickers) know what other entity to ask?
>
> It's part of the identifier, e.g.
> 'cid:sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org'

Zash corrected me here. This is actually a fixed suffix and not a
location, despite the syntax. Something new every day!

That does leave your question unanswered, but not unanswerable.

Regards,
Matthew


Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Matthew Wild
On Thu, 17 Oct 2019 at 13:34, JC Brand  wrote:
>
> On Thu, Oct 17, 2019 at 01:23:18PM +0200, Marvin W wrote:
>

> > - I already dislike the fact that we do HTTP requests to arbitrary servers
> > for file transfers, as we might be leaking IP addresses in such cases.
>
> The file servers are usually not arbitrary but are hosted by your XMPP host.

They are effectively arbitrary. When you upload, you upload to your own
host, right. But when you receive a jabber:x:oob stanza, the URL can
be absolutely anything (HTML, 50GB JPEG, "pixel tracker").

> > In the case of Converse, you are likely to get into GDPR issues when doing 
> > so
> > without explicit user consent (and you don't want explicit user consent for
> > every emoji).
>
> Why would you need user consent to show remote images?

You are providing the user's IP address, user agent, etc. to a third
party. IANAL and I'm not saying that I buy the GDPR argument in this
case, but there *is* a case for privacy here.

> Otherwise any website that has user accounts and which links to 3rd party
> images would need user consent for each particular image.

Reality check: Third-party assets (images, scripts, etc.) are exactly
how the majority of tracking happens online today.

> > There is a reason why many e-Mail-Clients don't render remote
> > content in e-Mails...
>
> And that's not GDPR, right?
>
> AFAIK it's to avoid pixel tracking and IP address leakage.

Right. There are other reasons too, including spam and many other
categories of media (some illegal) that I don't want my device
automatically downloading and displaying. And don't forget data usage.

> > - BOB does not require the sender to provide the file referenced by the CID
> > 0231 §2.1 says that you can send the IQ to request the bytes to "potentially
> > some other entity". If you don't expect the sending client to provide the
> > file, it doesn't need to cache all stickers and it doesn't need to be
> > online.
>
> "some other entity" isn't terribly well defined. How do I (or the
> recipient of my stickers) know what other entity to ask?

It's part of the identifier, e.g.
'cid:sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org'

> How are stickers added to the entity's cache? How does the client know which
> stickers it can send, i.e. which stickers are contained in the cache?
>
> These are all things not spec'd in BOB.

They are not. But BoB is 95% there IMHO, has desirable properties for
this use case, and we already have implementations.

> Ideally I'd like to implement something that works in 2019/2020.

Obviously we don't have anything widespread except for jabber:x:oob
right now (which is not suitable as it stands). I don't think pushing
a BoB solution over the finish line by filling in the gaps with a
'stickers' XEP would be infeasible or more work than standardizing an
HTTP-based solution.

Regards,
Matthew


Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Sam Whited
TL;DR — we should avoid using XEP-0372 until "TODO: define character
appropriately" is removed and resolved.

On Thu, Oct 17, 2019, at 10:07, JC Brand wrote:
> Instead, I propose that we use XEP-0372 references to indicate that
> a particular shortname (e.g. :dancingpanda:) should be replaced with
> an image.
>
> For example:
>
> <message>
>   <body>I feel like dancing! :dancingpanda:</body>
>   <reference xmlns="urn:xmpp:reference:0" begin="21" end="35"
>              type="data" uri="https://images.com/dancingpanda"/>
> </message>

We should avoid using references in the wild until a few things are
cleared up. We don't want lots of premature implementations popping up
that aren't compatible with one another.

For example, in the following message:

"> ☃︎ :sadpanda:"

Should the "begin" attribute for ":sadpanda:" be 4 or 5? The Unicode snowman
is more than one byte, after all (three in UTF-8).

What about:

"🇪🇺 :sadpanda:"

Which may render as an EU flag or as the separate letters 'E' and 'U',
depending on your renderer?
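
To make the ambiguity concrete, here is a small sketch that measures where ":sadpanda:" starts in three different units. The helper name `offsets` is made up for this illustration, and the second test string assumes the snowman is followed by the text variation selector U+FE0E, which may or may not be what a given client actually sends.

```python
def offsets(body: str, needle: str) -> dict:
    """Return where `needle` starts in `body`, measured in three units."""
    prefix = body[: body.index(needle)]
    return {
        "codepoints": len(prefix),
        "utf8_bytes": len(prefix.encode("utf-8")),
        "utf16_units": len(prefix.encode("utf-16-le")) // 2,
    }


plain_snowman = "> \u2603 :sadpanda:"        # U+2603 only
text_snowman = "> \u2603\ufe0e :sadpanda:"   # U+2603 plus variation selector U+FE0E
eu_flag = "\U0001F1EA\U0001F1FA :sadpanda:"  # two regional indicator codepoints

for label, body in [("plain", plain_snowman), ("text", text_snowman), ("flag", eu_flag)]:
    print(label, offsets(body, ":sadpanda:"))
```

Counting codepoints gives 4 for the plain snowman and 5 once the variation selector is present, while the flag example yields 3 codepoints, 9 UTF-8 bytes or 5 UTF-16 code units, so two implementations can disagree without either being obviously wrong.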

The easiest way is probably to just say that the offset is in bytes, but
now what do we do if a buggy or malicious client sends something with
the offset in the middle of the UTF-8 encoding for the snowman emoji?
What about an offset between the two codepoints that combine to create
the EU flag glyph, which would still fall between two valid UTF-8
encodings?
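
A receiving client could at least detect the first case. The sketch below assumes the body is valid UTF-8 and uses hypothetical helper names; it checks whether a byte offset lands on the start of a codepoint. It cannot detect the second case, because an offset between the two regional indicators is a perfectly valid codepoint boundary.

```python
def is_codepoint_boundary(data: bytes, offset: int) -> bool:
    """True if `offset` does not fall inside a multi-byte UTF-8 sequence."""
    if offset < 0 or offset > len(data):
        return False
    if offset == len(data):
        return True
    # Continuation bytes look like 10xxxxxx; anything else starts a codepoint.
    return (data[offset] & 0xC0) != 0x80


def snap_to_boundary(data: bytes, offset: int) -> int:
    """Move `offset` back to the start of the codepoint it points into."""
    offset = max(0, min(offset, len(data)))
    while offset > 0 and not is_codepoint_boundary(data, offset):
        offset -= 1
    return offset


snowman = "> \u2603 x".encode("utf-8")
flag = "\U0001F1EA\U0001F1FA".encode("utf-8")
print(is_codepoint_boundary(snowman, 3))  # inside the snowman
print(is_codepoint_boundary(flag, 4))     # between the two flag codepoints
```

Whether to reject such a reference or quietly snap it is a policy decision that a spec would have to make; the sketch only shows that the check itself is cheap.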

This is not an easy problem, and while I don't want to tackle trying to
solve it in this thread, I think references should be avoided until we
do or we'll never get all the implementations doing one thing later (and
emojis are exactly the kind of feature that will lead to lots of
implementations).

—Sam


-- 
Sam Whited


Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Sergey Ilinykh
> I'm not sure what you mean. The "begin" and "end" attributes on the
> <reference/> element can encapsulate the shortname of the sticker.

I mean whether the encapsulated text has to be presented to the user
somehow, or fully replaced with the sticker as if it was never there.
I believe in some cases (not stickers) the text has to be preserved for
SIMS.

Best Regards,
Sergey



Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread JC Brand
On Thu, Oct 17, 2019 at 03:05:32PM +0300, Sergey Ilinykh wrote:
>Doesn't SIMS (https://xmpp.org/extensions/xep-0385.html) resolve all
>the concerns yet?

Yes, SIMS is more comprehensive than my example. It still uses a XEP-0372
reference, so it's conceptually similar.

>We can have a  with cid: uri too.

yep.

We could probably also specify under  some JID which can return the BOB
data as Marvin suggested.

>The only thing is missed as for me is an attribute where the referenced
>text has to be removed or not.
>Best Regards,

I'm not sure what you mean. The "begin" and "end" attributes on the
<reference/> element can encapsulate the shortname of the sticker.

JC


Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread JC Brand
On Thu, Oct 17, 2019 at 01:23:18PM +0200, Marvin W wrote:
> Hi,

> Regarding your proposal:
> - You should still add a hash in the reference somehow so that clients *can*
> cache entries (even if you won't do it in Converse)

Sure.

> - I already dislike the fact that we do HTTP requests to arbitrary servers
> for file transfers, as we might be leaking IP addresses in such cases.

The file servers are usually not arbitrary but are hosted by your XMPP host.

> In the case of Converse, you are likely to get into GDPR issues when doing so
> without explicit user consent (and you don't want explicit user consent for
> every emoji).

Why would you need user consent to show remote images?

The GDPR makes provision for data capturing that's needed to provide the
actual service. I think that applies in this case.

Otherwise any website that has user accounts and which links to 3rd party
images would need user consent for each particular image.

> There is a reason why many e-Mail-Clients don't render remote
> content in e-Mails...

And that's not GDPR, right?

AFAIK it's to avoid pixel tracking and IP address leakage.

> - When this is combined with body-only e2e-encryption, you are leaking
> information as I guess you don't envision the emoji to be encrypted for each
> e2e session individually.

This is a general problem with body-only E2EE and a good example of
why we need full-stanza encryption.

> You can probably solve the second issue mentioned above and the issue with
> http files by proxying the image request through the server hosting Converse
> (which is what other popular sites that allow arbitrary http links like
> GitHub do). But I guess you don't want to do that.

Depends on the deployment, but no, not really.

> Regarding your issues with using BOB:
> - BOB does not depend on XHTML-IM. 0231 §2.2 specifically says that "any
> appropriate format can be used" to share the CID. This means it is also
> possible to use it in 0372 references (as you suggest to do just without
> http).

Yeah, the thought of doing that did cross my mind, I forgot to mention it in my
post though.

Then there are still the caching issues.

> - BOB does not require the sender to provide the file referenced by the CID
> 0231 §2.1 says that you can send the IQ to request the bytes to "potentially
> some other entity". If you don't expect the sending client to provide the
> file, it doesn't need to cache all stickers and it doesn't need to be
> online.

"some other entity" isn't terribly well defined. How do I (or the
recipient of my stickers) know what other entity to ask?

How are stickers added to the entity's cache? How does the client know which
stickers it can send, i.e. which stickers are contained in the cache?

These are all things not spec'd in BOB.

Ideally I'd like to implement something that works in 2019/2020.



Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Sergey Ilinykh
Doesn't SIMS (https://xmpp.org/extensions/xep-0385.html) already resolve all
the concerns?
We can have a  with cid: uri too.

The only thing missing, as far as I can see, is an attribute that says
whether the referenced text has to be removed or not.

Best Regards,
Sergey



Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Marvin W

Hi,

Regarding your proposal:
- You should still add a hash in the reference somehow so that clients 
*can* cache entries (even if you won't do it in Converse)
- I already dislike the fact that we do HTTP requests to arbitrary 
servers for file transfers, as we might be leaking IP addresses in such 
cases. In the case of Converse, you are likely to get into GDPR issues 
when doing so without explicit user consent (and you don't want explicit 
user consent for every emoji). There is a reason why many email clients
don't render remote content in emails...
- When this is combined with body-only e2e-encryption, you are leaking 
information as I guess you don't envision the emoji to be encrypted for 
each e2e session individually.
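
The first point above (a hash in the reference) could be used roughly as follows. This is only a sketch: the thread does not say which attribute would carry the hash or which algorithm to use, so SHA-256 and the `StickerCache` class are assumptions made for illustration.

```python
import hashlib
from typing import Callable, Dict


class StickerCache:
    """In-memory cache keyed by the hex SHA-256 digest of the image bytes."""

    def __init__(self) -> None:
        self._by_hash: Dict[str, bytes] = {}

    def get(self, sha256_hex: str, fetch: Callable[[], bytes]) -> bytes:
        """Return cached bytes, calling `fetch` only on a cache miss."""
        key = sha256_hex.lower()
        if key in self._by_hash:
            return self._by_hash[key]
        data = fetch()
        # Refuse data that does not match the hash advertised in the reference.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("downloaded data does not match advertised hash")
        self._by_hash[key] = data
        return data
```

With a cache like this, a sticker that is sent repeatedly triggers at most one request, which also limits how often the client's IP address is exposed to the hosting server.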


You can probably solve the second issue mentioned above and the issue 
with http files by proxying the image request through the server hosting 
Converse (which is what other popular sites that allow arbitrary http 
links like GitHub do). But I guess you don't want to do that.
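
A rough sketch of what such proxying might involve, loosely modelled on signed image-proxy URLs. The proxy host, the key and both function names are hypothetical, and nothing here reflects how Converse or any particular site actually does it.

```python
import hashlib
import hmac

PROXY_BASE = "https://proxy.example"       # hypothetical proxy run by the hosting server
PROXY_KEY = b"replace-with-a-real-secret"  # known only to the signer and the proxy


def proxied_url(image_url: str) -> str:
    """Rewrite an arbitrary image URL into a signed URL on the proxy."""
    raw = image_url.encode("utf-8")
    mac = hmac.new(PROXY_KEY, raw, hashlib.sha256).hexdigest()
    return f"{PROXY_BASE}/{mac}/{raw.hex()}"


def verify_and_decode(path: str) -> str:
    """Proxy side: check the signature and recover the original URL."""
    mac, _, hex_url = path.strip("/").partition("/")
    raw = bytes.fromhex(hex_url)
    expected = hmac.new(PROXY_KEY, raw, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("bad signature")
    return raw.decode("utf-8")
```

Note that the signing step has to run somewhere that can keep the key secret, so a purely browser-side client would need a server-side helper for it, which is part of why this is unattractive for a thin client.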


Regarding your issues with using BOB:
- BOB does not depend on XHTML-IM. 0231 §2.2 specifically says that "any
appropriate format can be used" to share the CID. This means it is also
possible to use it in 0372 references (as you suggest, just without
HTTP).
- BOB does not require the sender to provide the file referenced by the
CID. 0231 §2.1 says that you can send the IQ to request the bytes to
"potentially some other entity". If you don't expect the sending client
to provide the file, it doesn't need to cache all stickers and it
doesn't need to be online.
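
For reference, requesting the bytes from some entity other than the sender might look like the sketch below. The target JID is hypothetical, since which entity to ask is exactly what BOB leaves open, and `bob_request` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

BOB_NS = "urn:xmpp:bob"


def bob_request(to_jid: str, cid_uri: str, iq_id: str) -> ET.Element:
    """Build <iq type='get'><data xmlns='urn:xmpp:bob' cid='...'/></iq>."""
    # The cid attribute carries the identifier without the "cid:" URI scheme.
    cid = cid_uri[len("cid:"):] if cid_uri.startswith("cid:") else cid_uri
    iq = ET.Element("iq", {"type": "get", "to": to_jid, "id": iq_id})
    ET.SubElement(iq, f"{{{BOB_NS}}}data", {"cid": cid})
    return iq


request = bob_request(
    "stickers.example.org",  # hypothetical entity serving sticker data
    "cid:sha1+8f35fef110ffc5df08d579a50083ff9308fb6242@bob.xmpp.org",
    "get-data-1",
)
print(ET.tostring(request, encoding="unicode"))
```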


Marvin



Re: [Standards] Support for stickers (custom emojis)

2019-10-17 Thread Philipp Hörist
Just make an HTTP HEAD request, look at how big the content is, and then
start the download.

This problem is not specific to stickers; you always have it when you want
to download data, since the data can potentially always be big.

Or you add a size attribute and trust the sending client.
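
A minimal sketch of the HEAD approach using only the Python standard library. The size limit and the function names are arbitrary choices for this illustration.

```python
import urllib.request
from typing import Mapping, Optional

MAX_STICKER_BYTES = 512 * 1024  # arbitrary limit chosen for this sketch


def declared_size(headers: Mapping[str, str]) -> Optional[int]:
    """Parse Content-Length, returning None if absent or malformed."""
    value = headers.get("Content-Length")
    if value is None or not value.strip().isdigit():
        return None
    return int(value)


def should_download(headers: Mapping[str, str], limit: int = MAX_STICKER_BYTES) -> bool:
    """Only download when the server declares a size within the limit."""
    size = declared_size(headers)
    return size is not None and size <= limit


def head(url: str, timeout: float = 5.0) -> Mapping[str, str]:
    """Issue an HTTP HEAD request and return the response headers."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers
```

A server can omit or misreport Content-Length, so a careful client would still cap the number of bytes it reads during the actual download. The HEAD request itself also reveals the client's IP address, so it does not help with the privacy concerns raised elsewhere in the thread.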

regards
Philipp



[Standards] Support for stickers (custom emojis)

2019-10-17 Thread JC Brand
Hello

I'm currently working on adding support for non-Unicode emojis to Converse.js.

Currently, users can't upload their own images to be used for custom emojis,
mostly because Converse is a thin client with no backend support for it.

So to add custom emojis, the web host needs to edit an emojis.json file
to add new entries with URLs pointing to the actual images.

Concerning compatibility with other clients, I've discussed it with edhelas
and he told me he uses XEP-0231 BOB for sending stickers.

There are a few reasons why I'm not keen on using BOB:

- BOB depends on XHTML-IM which is deprecated. Converse.js doesn't support it
  and I'm reluctant to add support just for this.
- BOB mentions that binary data should be smaller than 1KB. Not sure how
  relevant that still is, but it discourages me from sending larger amounts.
- The sending client needs to maintain a cache of all sent stickers.
- AFAICT, when receiving an uncached BOB message via MAM and the sending client
  is offline, then you can't get the image data.

Instead, I propose that we use XEP-0372 references to indicate that a
particular shortname (e.g. :dancingpanda:) should be replaced with an image.

For example:

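<message>
    <body>I feel like dancing! :dancingpanda:</body>
    <reference xmlns="urn:xmpp:reference:0"
               begin="21"
               end="35"
               type="data"
               uri="https://images.com/dancingpanda"/>
</message>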
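
A sending client could compute the two offsets rather than hard-coding them. The sketch below assumes offsets are counted in codepoints from zero with an exclusive end, which is what matches the numbers in the example; XEP-0372 itself would need to confirm that. `sticker_message` is a made-up helper name.

```python
import xml.etree.ElementTree as ET

REFERENCE_NS = "urn:xmpp:reference:0"


def sticker_message(body: str, shortname: str, uri: str) -> ET.Element:
    """Build a message whose reference covers `shortname` inside `body`."""
    begin = body.index(shortname)  # Python str indices count codepoints
    end = begin + len(shortname)   # exclusive end, as in the example
    message = ET.Element("message")
    ET.SubElement(message, "body").text = body
    ET.SubElement(message, f"{{{REFERENCE_NS}}}reference", {
        "begin": str(begin),
        "end": str(end),
        "type": "data",
        "uri": uri,
    })
    return message


stanza = sticker_message(
    "I feel like dancing! :dancingpanda:",
    ":dancingpanda:",
    "https://images.com/dancingpanda",
)
print(ET.tostring(stanza, encoding="unicode"))
```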


I'm not sure whether "type" should be "data", seems a bit too generic for me,
perhaps it could be something else?

Some criticisms of this approach from edhelas:

- HTTP images can be sent to a webchat client served over HTTPS
- There's no size limit, so users can send links to very large stickers

Concerning the first criticism, a client can choose to not render HTTP
images inline and instead make the shortname a link which opens the image in a
new tab. Not ideal, but a compromise for the privacy and security conscious.
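
That compromise could be expressed as a small decision function. This is a sketch with made-up names, not how Converse implements it.

```python
from urllib.parse import urlsplit


def render_mode(uri: str, page_is_https: bool = True) -> str:
    """Return 'inline', 'link' or 'ignore' for a sticker URI."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme == "https":
        return "inline"
    if scheme == "http":
        # Plain HTTP on an HTTPS page would be mixed content: show a link instead.
        return "link" if page_is_https else "inline"
    return "ignore"
```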

For the second I don't have a good answer.

That said, I currently still prefer my suggestion to using BOB. I'd be
interested to hear your feedback and suggestions.

Regards
JC

