Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-20 Thread Ivan Daschinsky
We copy values unchanged, as-is, in their byte representation. Could you please specify what could go wrong? I see only one possibility: 1. Start the cluster with the default encoding (this is only the Windows case :)). Set some metastorage values with non-ASCII chars. 2. Stop it and restart with specifying

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-20 Thread Andrey Mashenkov
Ivan, I'm still not sure it is a good idea to upgrade metastorage automatically. Because we can't detect the correct charset the metastorage was created with, and at the same time we can't be sure the current charset is the correct one. So, is there any guarantee the metastorage is consistent

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-20 Thread Ivan Daschinsky
Andrey, I believe that we already have all the machinery to do the migration safely. See for example org.apache.ignite.internal.processors.cache.persistence.metastorage.MetaStorage#init and org.apache.ignite.internal.processors.cache.persistence.metastorage.MetaStorage.TmpStorage. This machinery was

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-20 Thread Mikhail Petrov
Thank you all for your replies! I got the idea and agree with it. Based on the results of the discussion, I have filed a ticket [1]. I will try to investigate it. [1] - https://issues.apache.org/jira/browse/IGNITE-16157 On 16.12.2021 20:11, Ivan Daschinsky wrote: Andrey, agree with you,

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-16 Thread Ivan Daschinsky
Andrey, agree with you, good point. Thu, 16 Dec 2021, 16:27 Andrey Mashenkov : > Guys, > > I like the idea with a flag, but for a different purpose. > I think it is easy to detect the issue (using the flag) when > metastorage was created on a new version with a fixed charset, or on an > older

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-16 Thread Andrey Mashenkov
Guys, I like the idea with a flag, but for a different purpose. I think it is easy to detect the issue (using the flag) when metastorage was created on a new version with a fixed charset, or on an older version with the user-defined default. Regarding the flag, we can choose a new strategy

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-16 Thread Ivan Daschinsky
Slava, great ticket! I suppose that we can add a feature flag to BPlusMetaIO, and if it isn't present or its value is false, we can rebuild the metastore during recovery: decode strings using the default system encoding and save all of them back in UTF-8. After recovery, we should use UTF-8 by default.
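The re-encoding step proposed above could look roughly like the sketch below. It is only an illustration under stated assumptions: `MetastoreReencodeSketch`, `reencodeToUtf8`, `migrateIfNeeded`, and the `utf8Flag` parameter are hypothetical names and are not part of Ignite or BPlusMetaIO. The legacy charset is passed in explicitly because, as noted elsewhere in the thread, it cannot be reliably detected from the stored bytes.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these names exist in Apache Ignite.
final class MetastoreReencodeSketch {
    /**
     * Decodes a stored value with the charset it is assumed to have been
     * written with and re-encodes it as UTF-8.
     */
    static byte[] reencodeToUtf8(byte[] stored, Charset legacyCharset) {
        String decoded = new String(stored, legacyCharset);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    /**
     * Hypothetical recovery step: rewrite the value only when the
     * "already stored as UTF-8" flag is absent or false.
     */
    static byte[] migrateIfNeeded(byte[] stored, boolean utf8Flag, Charset legacyCharset) {
        return utf8Flag ? stored : reencodeToUtf8(stored, legacyCharset);
    }

    public static void main(String[] args) {
        // A value written by a node whose default charset was ISO-8859-1 (assumed).
        byte[] legacy = "caf\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] migrated = migrateIfNeeded(legacy, false, StandardCharsets.ISO_8859_1);
        System.out.println(new String(migrated, StandardCharsets.UTF_8));
    }
}
```

The sketch is only correct if the supplied legacy charset really is the one the value was written with; a wrong guess would silently corrupt the value during migration.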

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-16 Thread Вячеслав Коптилин
Hi folks, IMHO, we should do our best to fix all these places and should avoid using the default charset. In my understanding, this is only > The main question is - should we restrict the join of nodes with different encodings or just fix all places where implicit default encoding is used and

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ivan Pavlukhin
Do the encodings in question somehow influence the actual stored data (bytes)? If so, using an implicit platform encoding sounds quite dangerous. Moving data between servers (or perhaps even rebalancing) can lead to bad consequences. Anyway, IMHO an implicit encoding is not good, but sensible default

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ivan Daschinsky
Unpaired surrogates are emoji symbols. One should be completely insane to use emojis in a login. Mon, 13 Dec 2021, 21:30 Mikhail Petrov : > Ivan, strings with unpaired surrogate symbols are serialized and > deserialized by the Java UTF-8 decoder successfully, but the result does not > match the

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Mikhail Petrov
Ivan, strings with unpaired surrogate symbols are serialized and deserialized by the Java UTF-8 decoder successfully, but the result does not match the initial string. As a result, if the user's login contains these symbols, it will be distorted after deserialization and the user will not
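The round-trip behaviour described above can be reproduced with plain JDK calls. This is a minimal sketch; the class name `UnpairedSurrogateDemo` and the sample login are made up for illustration. `String.getBytes(Charset)` does not throw on a lone surrogate; it substitutes the charset's replacement bytes, so the decoded string differs from the original.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only; the class name and login value are hypothetical.
final class UnpairedSurrogateDemo {
    /** Encodes a string to UTF-8 bytes and decodes it back. */
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A login ending with a lone high surrogate (no low surrogate follows).
        String broken = "user\uD83D";
        // A login ending with a complete surrogate pair (U+1F600).
        String paired = "user\uD83D\uDE00";

        System.out.println(broken.equals(roundTrip(broken))); // false
        System.out.println(paired.equals(roundTrip(paired))); // true
    }
}
```

No exception is raised in either case, which is why the distortion is easy to miss.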

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ivan Pavlukhin
> I guess Nikolay is talking about the problem with UTF-8 in case string > contains unpaired surrogate symbols Folks, give me a clue why this is a problem? Naively it seems to be a good restriction rather than a problem. What problems can it cause in practice? 2021-12-13 16:32 GMT+03:00, Ilya

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ilya Kasnacheev
Hello! We already have a warning about this, see IgniteKernal.checkFileEncoding() Regards, -- Ilya Kasnacheev Mon, 13 Dec 2021 at 16:26, Ivan Daschinsky : > >> But now multiple components > >> independently serialize strings for their needs and use default encoding > >> for this. > >> For

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ivan Daschinsky
>> But now multiple components >> independently serialize strings for their needs and use default encoding >> for this. >> For example DirectByteBufferStreamImplV2#writeString, >> MetaStorage#writeRaw and so on We should fix all of them. >> BinaryUtils#utf8BytesToStr Let's use this everywhere.

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Nikolay Izhikov
> Does Java String support all unicode characters and particularly does it > support more characters than UTF-8 It’s not about Java, it’s about UTF-8 standard. Please, take a look at [1] > In November 2003, UTF-8 was restricted by RFC 3629 to match the constraints > of the UTF-16 character

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Mikhail Petrov
Ivan Daschinsky, > a better variant is to enforce all strings to be encoded in UTF-8 I agree that it is a possible way to go. But now multiple components independently serialize strings for their needs and use the default encoding for this. For example DirectByteBufferStreamImplV2#writeString,
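To illustrate why relying on the default encoding is fragile, here is a minimal sketch of two nodes that disagree on the charset. It does not use any Ignite code; `CharsetMismatchDemo`, `write`, and `read` are hypothetical stand-ins for a component that serializes a string on one node and deserializes it on another. ISO-8859-1 stands in for a non-UTF-8 platform default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only; these helpers are not Ignite APIs.
final class CharsetMismatchDemo {
    /** Sender side: serializes a string with the given charset. */
    static byte[] write(String s, Charset cs) {
        return s.getBytes(cs);
    }

    /** Receiver side: deserializes bytes with the given charset. */
    static String read(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String value = "caf\u00E9";

        // Sender's default is assumed to be ISO-8859-1, receiver's UTF-8.
        String mismatched = read(write(value, StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        // Both sides pin UTF-8 explicitly.
        String pinned = read(write(value, StandardCharsets.UTF_8), StandardCharsets.UTF_8);

        System.out.println(value.equals(mismatched)); // false
        System.out.println(value.equals(pinned));     // true
    }
}
```

Passing an explicit `Charset` at every call site, as in the second case, removes the dependency on how each JVM was started.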

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ivan Pavlukhin
> UTF-8 can’t encode all UNICODE characters. Nikolay, could you please elaborate? My understanding is that encoding we speak about matters for conversion from byte arrays to strings. Does Java String support all unicode characters and particularly does it support more characters than UTF-8 (I am

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ivan Daschinsky
UTF-8 is already the default encoding in our BinaryObject format. So I am for unification. Mon, 13 Dec 2021 at 12:50, Nikolay Izhikov : > Hello, Ivan. > > UTF-8 can’t encode all UNICODE characters. > > > On 13 Dec 2021, at 12:49, Ivan Daschinsky > wrote: > > > > Hmm, maybe a better

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Nikolay Izhikov
Hello, Ivan. UTF-8 can’t encode all UNICODE characters. > On 13 Dec 2021, at 12:49, Ivan Daschinsky wrote: > > Hmm, maybe a better variant is to enforce all strings to be encoded in > UTF-8? > AFAIK a multi-OS cluster is quite a common case. > > > Mon, 13 Dec 2021 at 11:36, Mikhail

Re: [DISCUSSION] Reject join of nodes with different character encodings

2021-12-13 Thread Ivan Daschinsky
Hmm, maybe a better variant is to enforce all strings to be encoded in UTF-8? AFAIK a multi-OS cluster is quite a common case. Mon, 13 Dec 2021 at 11:36, Mikhail Petrov : > Igniters, > > Recently we faced the problem that if the cluster consists of nodes > running in the JVM with different