We copy values unchanged, as is, in their byte representation. Could you please
specify what could go wrong?
I see only one possibility:
1. Start the cluster with the default encoding (this is only the Windows case :)).
Set some metastorage values with non-ASCII chars.
2. Stop it and restart with specifying
Ivan,
I'm still not sure it is a good idea to upgrade metastorage automatically.
Because we can't detect the correct charset the metastorage was created
with, and
at the same time we can't be sure the current charset is the correct one.
So, is there any guarantee the metastorage is consistent
Andrey, I believe that we already have all the machinery to do the migration safely.
See, for example,
org.apache.ignite.internal.processors.cache.persistence.metastorage.MetaStorage#init
and
org.apache.ignite.internal.processors.cache.persistence.metastorage.MetaStorage.TmpStorage.
This machinery was
Thank you all for your replies!
I got the idea and agree with it. Based on the results of the
discussion, I have filed a ticket [1].
I will try to investigate it.
[1] - https://issues.apache.org/jira/browse/IGNITE-16157
On 16.12.2021 20:11, Ivan Daschinsky wrote:
Andrey, agree with you, good point.
Thu, 16 Dec 2021, 16:27 Andrey Mashenkov:
> Guys,
>
> I like the idea with a flag, but for a different purpose.
> I think it is easy to detect the issue (using the flag) when
> metastorage was created on a new version with a fixed charset, or on an
> older
Guys,
I like the idea with a flag, but for a different purpose.
I think it is easy to detect the issue (using the flag) when
metastorage was created on a new version with a fixed charset, or on an
older version with the user-defined default.
Regarding the flag, we can choose a new strategy
Slava, great ticket!
I suppose that we can add a feature flag to BPlusMetaIO, and if it is not
present or its value is false, we can rebuild the metastore during
recovery: decode strings using the default system encoding and save all of them
back in UTF-8. After recovery, we should use UTF-8 by default.
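The re-encoding step described above could be sketched roughly as follows. This is only an illustration under assumptions: the class MetastoreStringMigration and the method reencodeToUtf8 are hypothetical names, not Ignite APIs, and the sketch presumes the charset the bytes were originally written in is known, which is exactly the point questioned elsewhere in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not an Ignite class): re-encodes stored string bytes to UTF-8. */
final class MetastoreStringMigration {
    private MetastoreStringMigration() {
    }

    /**
     * Decodes the bytes with the charset they are assumed to have been written in
     * and encodes the resulting string as UTF-8.
     */
    static byte[] reencodeToUtf8(byte[] legacyBytes, Charset legacyCharset) {
        String decoded = new String(legacyBytes, legacyCharset);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent; ISO-8859-1 stands in
        // for whatever default charset the old node happened to run with.
        byte[] legacy = "\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] migrated = reencodeToUtf8(legacy, StandardCharsets.ISO_8859_1);

        System.out.println("legacy length: " + legacy.length);
        System.out.println("migrated length: " + migrated.length);
    }
}
```

If the assumed legacy charset is wrong, the decode step silently produces a different string, so a real migration would still need the flag (or some other marker) to know which charset to decode with.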
Hi folks,
IMHO, we should do our best to fix all these places and should avoid using
the default charset. In my understanding, this is only
> The main question is - should we restrict the join of nodes with
> different encodings or just fix all places where implicit default encoding
> is used and
Do the encodings in question somehow influence the actual stored data
(bytes)? If so, using an implicit platform encoding sounds quite
dangerous. Moving data between servers (or perhaps even rebalancing)
can lead to bad consequences. Anyways, IMHO an implicit encoding is
not good, but sensible default
Unpaired surrogates are emoji symbols. One should be completely insane to
use emojis in a login.
Mon, 13 Dec 2021, 21:30 Mikhail Petrov:
> Ivan, string with unpaired surrogates symbols are serialized and
> deserialized by java UTF-8 decoder successfully but the result does not
> match the
Ivan, strings with unpaired surrogate symbols are serialized and
deserialized by the Java UTF-8 codec successfully, but the result does not
match the initial string. As a result, if the user's login
contains these symbols, it will be distorted after deserialization and
the user will not
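The round-trip mismatch described above can be reproduced with plain JDK calls. A minimal sketch, assuming nothing about Ignite's own serialization code: the class name UnpairedSurrogateRoundTrip and the sample strings are made up for illustration. It relies on the documented behaviour of String.getBytes(Charset), which replaces malformed input with the charset's replacement bytes instead of throwing.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: shows that a lone surrogate does not survive a UTF-8 round trip. */
final class UnpairedSurrogateRoundTrip {
    private UnpairedSurrogateRoundTrip() {
    }

    /** Encodes the string to UTF-8 bytes and decodes it back. */
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A valid surrogate pair (one supplementary code point) after "user".
        String paired = "user\uD83D\uDE00";
        // The same prefix followed by a lone high surrogate.
        String unpaired = "user\uD83D";

        System.out.println("paired survives:   " + paired.equals(roundTrip(paired)));
        System.out.println("unpaired survives: " + unpaired.equals(roundTrip(unpaired)));
    }
}
```

No exception is raised in either case, which is why the distortion can go unnoticed until the restored string is compared with the original.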
> I guess Nikolay is talking about the problem with UTF-8 in case string
> contains unpaired surrogate symbols
Folks, give me a clue why this is a problem. Naively, it seems to be a
good restriction rather than a problem. What problems can it cause in
practice?
2021-12-13 16:32 GMT+03:00, Ilya
Hello!
We already have a warning about this, see IgniteKernal.checkFileEncoding()
Regards,
--
Ilya Kasnacheev
Mon, 13 Dec 2021 at 16:26, Ivan Daschinsky:
> >> But now multiple components
> >> independently serialize strings for their needs and use default encoding
> >> for this.
> >> For
>> But now multiple components
>> independently serialize strings for their needs and use default encoding
>> for this.
>> For example DirectByteBufferStreamImplV2#writeString,
>> MetaStorage#writeRaw and so on
We should fix all of them.
>> BinaryUtils#utf8BytesToStr
Let's use this everywhere.
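To illustrate why an implicit charset is fragile, here is a small sketch in plain JDK code. It does not call BinaryUtils#utf8BytesToStr, since its signature is not shown in this thread; the class ExplicitCharsetExample and its write/read methods are hypothetical stand-ins for a component that serializes strings. ISO-8859-1 plays the role of a node whose default charset differs from UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only: a writer and a reader that must agree on the charset. */
final class ExplicitCharsetExample {
    private ExplicitCharsetExample() {
    }

    /** Serializes a string with an explicitly chosen charset. */
    static byte[] write(String s, Charset cs) {
        return s.getBytes(cs);
    }

    /** Deserializes a string with an explicitly chosen charset. */
    static String read(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";

        // Writer and reader disagree: this models two nodes with different defaults.
        String mismatched = read(write(original, StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
        // Writer and reader both use UTF-8 regardless of the platform default.
        String unified = read(write(original, StandardCharsets.UTF_8), StandardCharsets.UTF_8);

        System.out.println("mismatched equals original: " + original.equals(mismatched));
        System.out.println("unified equals original:    " + original.equals(unified));
    }
}
```

Passing the charset explicitly at every call site is what removes the dependency on the JVM's file.encoding setting.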
> Does Java String support all unicode characters and particularly does it
> support more characters than UTF-8
It’s not about Java, it’s about the UTF-8 standard.
Please, take a look at [1]
> In November 2003, UTF-8 was restricted by RFC 3629 to match the constraints
> of the UTF-16 character
Ivan Daschinsky,
better variant is to enforce all strings to be encoded in
UTF-8
I agree that it is a possible way to go. But now multiple components
independently serialize strings for their needs and use default encoding
for this.
For example DirectByteBufferStreamImplV2#writeString,
> UTF-8 can’t encode all UNICODE characters.
Nikolay, could you please elaborate? My understanding is that the encoding
we are talking about matters for conversion from byte arrays to strings.
Does Java String support all Unicode characters and, in particular, does
it support more characters than UTF-8 (I am
UTF-8 is already a default encoding in our BinaryObject format. So I am
for unification.
Mon, 13 Dec 2021 at 12:50, Nikolay Izhikov:
> Hello, Ivan.
>
> UTF-8 can’t encode all UNICODE characters.
>
> > On 13 Dec 2021, at 12:49, Ivan Daschinsky wrote:
> >
> > Khm, maybe a better
Hello, Ivan.
UTF-8 can’t encode all UNICODE characters.
> On 13 Dec 2021, at 12:49, Ivan Daschinsky wrote:
>
> Khm, maybe a better variant is to enforce all strings to be encoded in
> UTF-8?
> AFAIK multi OS cluster is a quite common case.
>
>
> Mon, 13 Dec 2021 at 11:36, Mikhail
Hmm, maybe a better option is to enforce that all strings are encoded in
UTF-8?
AFAIK, a multi-OS cluster is quite a common case.
Mon, 13 Dec 2021 at 11:36, Mikhail Petrov:
> Igniters,
>
> Recently we faced the problem that if the cluster consists of nodes
> running in the JVM with different