Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-08-01 Thread Pavel Tupitsyn


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-08-01 Thread Vladimir Ozerov


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Artem Schitow
> String encoding is a concept similar to "collation" in RDBMS. You can
> define it either globally, or on per-table basis.

Or on a per-column (per-field) basis. Although Oracle does not have a per-column
charset, some other databases provide this option.

MySQL:
- https://dev.mysql.com/doc/refman/5.7/en/create-table.html
    | CHAR[(length)] [BINARY]
        [CHARACTER SET charset_name] [COLLATE collation_name]
    | VARCHAR(length) [BINARY]
        [CHARACTER SET charset_name] [COLLATE collation_name]
    | TEXT [BINARY]
        [CHARACTER SET charset_name] [COLLATE collation_name]

SQL Server:
- https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql
    <column_definition> ::=
    column_name <data_type>
        [ FILESTREAM ]
        [ COLLATE collation_name ]

Postgres:
- https://www.postgresql.org/docs/9.6/static/sql-createtable.html
    CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name ( [
      { column_name data_type [ COLLATE collation ]

> 1) I have a class Person with field "name". I have two caches/tables - one
> for US persons, where name is in Latin, another for RU persons with
> Cyrillic names. How can I achieve optimal encoding formats for both tables?

You have to have two classes in this case, maybe with a common parent. Or you
have to select a common denominator and settle on one encoding for both of
them, as Java did with UTF-16 for java.lang.String.
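A minimal sketch of the "two classes with a common parent" option, assuming a hypothetical per-class encoding hook. Ignite has no such hook; `Person.nameEncoding()`, `UsPerson` and `RuPerson` are made-up names used only to show how the same field could be encoded differently per class.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical per-class encoding hook: each subclass states which charset its
// string field should be marshalled with. Not an Ignite API.
abstract class Person {
    final String name;

    Person(String name) {
        this.name = name;
    }

    abstract Charset nameEncoding();

    byte[] encodedName() {
        return name.getBytes(nameEncoding());
    }
}

// Latin names: UTF-8 already uses 1 byte per character.
final class UsPerson extends Person {
    UsPerson(String name) {
        super(name);
    }

    @Override
    Charset nameEncoding() {
        return StandardCharsets.UTF_8;
    }
}

// Cyrillic names: windows-1251 uses 1 byte per character instead of 2 in UTF-8.
final class RuPerson extends Person {
    private static final Charset CP1251 = Charset.forName("windows-1251");

    RuPerson(String name) {
        super(name);
    }

    @Override
    Charset nameEncoding() {
        return CP1251;
    }
}
```

With this split, a four-letter Cyrillic name takes 4 bytes in `RuPerson` instead of 8 in UTF-8, at the cost of maintaining two classes for one logical entity.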

—
Artem Schitow
artem.schi...@gmail.com


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Andrey Kuznetsov
Currently, the marshaller determines the type of a field (BYTE, INT, STRING,
etc.) only by the Class of the data being serialized. It seems rather
non-trivial to manage marshalling parameters at cache creation time.
Alternatively, there is a simple and flexible way: just introduce a new Java
type, say, StringWithEncoding, but it looks ugly to me.
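A rough sketch of the StringWithEncoding idea, assuming the marshaller would map this class to its own type code just as it maps java.lang.String to STRING. The class and its methods are hypothetical; nothing like it exists in Ignite.

```java
import java.nio.charset.Charset;
import java.util.Objects;

// Hypothetical wrapper type; not part of Ignite. Because the marshaller picks the
// field type from the value's Class, a distinct class could get its own type code.
final class StringWithEncoding {
    private final String value;
    private final Charset charset;

    StringWithEncoding(String value, Charset charset) {
        this.value = Objects.requireNonNull(value, "value");
        this.charset = Objects.requireNonNull(charset, "charset");
    }

    String value() {
        return value;
    }

    Charset charset() {
        return charset;
    }

    // Payload bytes a marshaller would write instead of the UTF-8 form.
    byte[] encode() {
        return value.getBytes(charset);
    }

    // Reading side: the charset must be known, e.g. stored next to the payload.
    static StringWithEncoding decode(byte[] bytes, Charset charset) {
        return new StringWithEncoding(new String(bytes, charset), charset);
    }
}
```

The drawback mentioned above shows up directly: every field that should use a non-UTF-8 encoding has to change its declared Java type.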


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Vladimir Ozerov
String encoding is a concept similar to "collation" in RDBMS. You can
define it either globally, or on a per-table basis. The same should be done
for Ignite. We do not define the behavior of a type. We define the behavior
of a *storage*.

Two cases where the proposed per-type and per-type-field approach doesn't
work:
1) I have a class Person with field "name". I have two caches/tables, one
for US persons, where names are in Latin, and another for RU persons with
Cyrillic names. How can I achieve optimal encoding formats for both tables?
2) I have an empty grid. Now I want to create a cache/table with a custom
encoding. How can I do that without a cluster restart? There is no way,
because BinaryTypeConfiguration is configured statically, while
caches/tables can be created at runtime.
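A minimal sketch of encoding as a property of the storage rather than the type, assuming a hypothetical registry that is updated whenever a cache/table is created. `CacheEncodings`, `onCacheCreated` and the cache names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical storage-level encoding registry; not an Ignite API.
final class CacheEncodings {
    private final Charset globalDefault;
    private final Map<String, Charset> perCache = new ConcurrentHashMap<>();

    CacheEncodings(Charset globalDefault) {
        this.globalDefault = globalDefault;
    }

    // Would be invoked when a cache/table is created at runtime; no restart involved.
    void onCacheCreated(String cacheName, Charset encoding) {
        perCache.put(cacheName, encoding);
    }

    // Caches without an explicit setting fall back to the global default.
    Charset encodingFor(String cacheName) {
        return perCache.getOrDefault(cacheName, globalDefault);
    }

    // The same Person.name value may produce different bytes in different caches.
    byte[] encode(String cacheName, String value) {
        return value.getBytes(encodingFor(cacheName));
    }
}
```

The sketch also shows the objection raised elsewhere in this thread: the encoded bytes alone do not say how to decode them unless the cache name is known when reading.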


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Pavel Tupitsyn
> As Pavel mentioned, Marshaller should not be tied to cache
> should be added to per-cache level
Not sure if I follow.
Marshalling and caching are two separate mechanisms.
Defining binary format in CacheConfiguration violates separation of
concerns.

> Encoding *must not* be added to per-class or per-field level, this is
> wrong
What is wrong with this? BinaryTypeConfiguration looks like the right place
for such a setting.
Are we talking from the SQL standpoint here, i.e. do you want this to be
defined somehow via DDL in the future?


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Vladimir Ozerov
Encoding *must not* be added at the per-class or per-field level; this is
wrong.

It should be added at the per-cache level, and at the per-cache-column level
in the future.


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Andrey Kuznetsov
We discussed this with Pavel and Anton just a moment ago. Summary follows.

- New byte "flag" is to be added (ENCODED_STRING)
- 'Encoding' property is to be added at
  -- global level (BinaryConfiguration)
  -- per-class level (BinaryTypeConfiguration)
  -- per-field level (BinaryTypeConfiguration)
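A sketch of how the three levels in this summary could be resolved, assuming "most specific wins" (field, then class, then global) and assuming that plain STRING always means UTF-8. The resolver and its setters are hypothetical stand-ins for the proposed BinaryConfiguration / BinaryTypeConfiguration properties, not existing Ignite API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolution of the proposed global / per-class / per-field settings.
final class EncodingResolver {
    private final Charset global;                                  // global level
    private final Map<String, Charset> perClass = new HashMap<>(); // type name -> charset
    private final Map<String, Charset> perField = new HashMap<>(); // "type.field" -> charset

    EncodingResolver(Charset global) {
        this.global = global;
    }

    void setClassEncoding(String typeName, Charset cs) {
        perClass.put(typeName, cs);
    }

    void setFieldEncoding(String typeName, String fieldName, Charset cs) {
        perField.put(typeName + "." + fieldName, cs);
    }

    // Most specific setting wins: field, then class, then global.
    Charset resolve(String typeName, String fieldName) {
        Charset cs = perField.get(typeName + "." + fieldName);
        if (cs == null) {
            cs = perClass.get(typeName);
        }
        return cs != null ? cs : global;
    }

    // Under the assumption that plain STRING is UTF-8, any other charset
    // would need the new ENCODED_STRING flag plus an encoding byte.
    boolean needsEncodedStringFlag(String typeName, String fieldName) {
        return !resolve(typeName, fieldName).equals(StandardCharsets.UTF_8);
    }
}
```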

-- 
Best regards,
  Andrey Kuznetsov.




--
View this message in context: 
http://apache-ignite-developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-support-in-BinaryMarshaller-IGNITE-5655-tp20024p20161.html
Sent from the Apache Ignite Developers mailing list archive at Nabble.com.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-28 Thread Vladimir Ozerov
As Pavel mentioned, Marshaller should not be tied to cache, BinaryObject
should be self-explanatory, i.e. containing all information necessary for
unmarshalling. This is an absolute requirement.

We will have one extra byte in serialized form, meaning that the advantage
of custom encoding will become evident for all strings with length >= 1,
which is perfectly fine. I do not quite understand what we are arguing
about.

As for configuration, we can do it as follows:

1) Add global encoding, UTF8 by default.
2) Add per-cache encoding.
3) Add encoding to JDBC and ODBC driver properties.

This should be enough.
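A back-of-the-envelope check of the one-extra-byte argument, under an assumed layout (1-byte type code, 4-byte length, then payload; the encoded variant adds a 1-byte encoding id). The layout is an illustration, not a statement of Ignite's actual wire format. With windows-1251, "Привет" comes out at 17 bytes as STRING and 12 bytes as ENCODED_STRING, a one-letter Cyrillic string breaks even, and Latin-only text is one byte larger in the encoded form.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Assumed layouts, for illustration only:
//   STRING:         [type code: 1][length: 4][UTF-8 bytes]
//   ENCODED_STRING: [type code: 1][encoding id: 1][length: 4][encoded bytes]
final class EncodedStringSize {
    static int plainSize(String s) {
        return 1 + 4 + s.getBytes(StandardCharsets.UTF_8).length;
    }

    static int encodedSize(String s, Charset encoding) {
        return 1 + 1 + 4 + s.getBytes(encoding).length;
    }
}
```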

On Fri, Jul 28, 2017 at 11:45, Pavel Tupitsyn wrote:

> Val, of course other options should be available, such as
> BinaryTypeConfiguration,
> and maybe field-level and class-level annotations.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Valentin Kulichenko
Pavel,

This forces the user to implement Binarylizable for the whole type in case
they want to change the encoding of one or two fields, right? I really don't
like it. Why not add a default encoding to BinaryTypeConfiguration?

-Val

On Thu, Jul 27, 2017 at 7:54 AM, Pavel Tupitsyn 
wrote:

> > 1 byte for every field just for this
> GridBinaryMarshaller.STRING data type remains untouched.
> We add GridBinaryMarshaller.STRING_ENCODED, which has an additional byte
> for the encoding type.
>
> This means no overhead for existing code.
> I think the most common use case is English, which uses 1 byte per char
> in UTF-8.
> This is already as fast and compact as possible, and we don't want to
> introduce any lookup overhead here.
>
> And when the user knows that their data will be more compact in some
> specific encoding, they use some BinaryWriter.writeString overload, which
> writes a different type code.
>
> Yes, it also writes an extra byte, but you save a byte per char of the
> actual string (for example, when using Windows-1251 for Russian text), so
> this does not matter.

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Dmitriy Setrakyan
Pavel, what would be the size overhead? Are we adding 1 byte for every
field just for this? If you would like to have this info in the binary
object directly, can we in this case have some bitmap of field-to-encoding?
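A sketch of the field-to-encoding bitmap idea, assuming one bit per field in a hypothetical object header, set when that field uses a non-default encoding. With a single bit per field only two choices (default or one alternative) can be expressed; field indexes and the header placement are assumptions.

```java
import java.util.BitSet;

// Hypothetical per-object bitmap: bit i set means field i uses the alternative encoding.
final class FieldEncodingBitmap {
    private final BitSet bits = new BitSet();

    void markEncoded(int fieldIdx) {
        bits.set(fieldIdx);
    }

    boolean isEncoded(int fieldIdx) {
        return bits.get(fieldIdx);
    }

    // Header overhead: ceil(fieldCount / 8) bytes per object,
    // instead of one extra byte per encoded string field.
    static int headerBytes(int fieldCount) {
        return (fieldCount + 7) / 8;
    }

    // BitSet.toByteArray is little-endian (bit 0 is the lowest bit of byte 0)
    // and drops trailing zero bytes, so pad to the fixed header size.
    byte[] toBytes(int fieldCount) {
        byte[] raw = bits.toByteArray();
        byte[] out = new byte[headerBytes(fieldCount)];
        System.arraycopy(raw, 0, out, 0, Math.min(raw.length, out.length));
        return out;
    }
}
```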

D.

On Thu, Jul 27, 2017 at 9:22 AM, Pavel Tupitsyn 
wrote:

> I'm not sure I understand how this "per field" configuration is supposed to
> be implemented.
> * Marshaller is not tied to a cache. It serializes all kinds of things,
> like compute job parameters and results.
> * Raw mode does not involve field names.
>
> Also it seems like a complicated and expensive solution - looking up string
> format somewhere in the metadata will be slow.
>
> "encoded string" data type suggestion from Vladimir looks better to me from
> performance and implementation standpoint.
>
> Thanks,
> Pavel
>
>
>
> On Thu, Jul 27, 2017 at 5:10 PM, Dmitriy Setrakyan 
> wrote:
>
> > On Thu, Jul 27, 2017 at 9:04 AM, Igor Sapego  wrote:
> >
> > > Just a note from the platforms guy:
> > >
> > > Solution with table-level configuration is going to be significantly
> > > harder to implement for platforms and ODBC than the field-level one.
> > >
> >
> > Igor, it seems like you are advocating the per-cell configuration, not
> > per-field one. The per-field configuration can be defined at the
> > table/cache level.
> >
> > I see your point about C++ and .NET integrations, however. Can't we
> > provide this info at node-join time or table-creation time? This way all
> > nodes will receive it and you will be able to grab it on different
> > platforms.
> >
> >
> > >
> > > Also, what about binary objects, which are not stored in cache,
> > > but being marshalled?
> > >
> >
> > I think the default system encoding should be used here. If we don't have
> > configuration for default encoding, we should add it.
> >
> >
> > >
> > >
> > > Best Regards,
> > > Igor
> > >
> > > On Wed, Jul 26, 2017 at 7:22 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > > wrote:
> > >
> > > > On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur <daradu...@gmail.com>
> > > > wrote:
> > > >
> > > > >
> > > > > > Encoding must be set on per field basis. This will give us as
> most
> > > > > flexible
> > > > > > solution at the cost of 1-byte overhead.
> > > > >
> > > > > > Vova, I agree that the encoding should be set on per-field basis,
> > but
> > > > at
> > > > > > the table level, not at a cell level.
> > > > >
> > > > > Dmitriy, Vladimir,
> > > > > Let's use both approaches :-)
> > > > > We can add parameter to CacheConfiguration.
> > > > > If parameter specifie to use cache level encoding then marshaller
> > will
> > > > use
> > > > > encoding in a cache,
> > > > > otherwise marshaller will use per-field encoding.
> > > > > Of course only if it doesn't complicate the solution.
> > > > >
> > > > >
> > > > I think that it will complicate the solution and will complicate the
> > > > marshalling protocol. The advantage of specifying the encoding at
> > > > table/cache level is that we don't need to add extra encoding bytes
> to
> > > the
> > > > marshalling protocol.
> > > >
> > > > I think Vova was suggesting encoding at the cell level, not at the
> > field
> > > > level, which seems to be redundant to me.
> > > >
> > > > Vova, do you agree?
> > > >
> > >
> >
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Pavel Tupitsyn
I'm not sure I understand how this "per field" configuration is supposed to
be implemented.
* Marshaller is not tied to a cache. It serializes all kinds of things,
like compute job parameters and results.
* Raw mode does not involve field names.

Also, it seems like a complicated and expensive solution: looking up the string
format somewhere in the metadata will be slow.

The "encoded string" data type suggested by Vladimir looks better to me from
the performance and implementation standpoint.

Thanks,
Pavel



On Thu, Jul 27, 2017 at 5:10 PM, Dmitriy Setrakyan 
wrote:

> On Thu, Jul 27, 2017 at 9:04 AM, Igor Sapego  wrote:
>
> > Just a note from the platforms guy:
> >
> > A solution with table-level configuration is going to be significantly
> > harder to implement for platforms and ODBC than a field-level one.
> >
>
> Igor, it seems like you are advocating the per-cell configuration, not
> per-field one. The per-field configuration can be defined at the
> table/cache level.
>
> I see your point about C++ and .NET integrations however. Can't we provide
> this info at node-join time or table-creation time? This way all nodes will
> receive it and you will be able to grab it on different platforms.
>
>
> >
> > Also, what about binary objects that are not stored in a cache
> > but are being marshalled?
> >
>
> I think the default system encoding should be used here. If we don't have
> configuration for default encoding, we should add it.
>
>
> >
> >
> > Best Regards,
> > Igor
> >
> > On Wed, Jul 26, 2017 at 7:22 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > wrote:
> >
> > > On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur <daradu...@gmail.com>
> > > wrote:
> > >
> > > >
> > > > > Encoding must be set on a per-field basis. This will give us the most
> > > > > flexible solution at the cost of 1-byte overhead.
> > > >
> > > > > Vova, I agree that the encoding should be set on a per-field basis, but
> > > > > at the table level, not at a cell level.
> > > >
> > > > Dmitriy, Vladimir,
> > > > Let's use both approaches :-)
> > > > We can add a parameter to CacheConfiguration.
> > > > If the parameter says to use cache-level encoding, the marshaller will
> > > > use the cache's encoding;
> > > > otherwise, the marshaller will use per-field encoding.
> > > > Of course, only if it doesn't complicate the solution.
> > > >
> > > >
> > > I think that it will complicate the solution and will complicate the
> > > marshalling protocol. The advantage of specifying the encoding at
> > > table/cache level is that we don't need to add extra encoding bytes to
> > > the marshalling protocol.
> > >
> > > I think Vova was suggesting encoding at the cell level, not at the
> > > field level, which seems to be redundant to me.
> > >
> > > Vova, do you agree?
> > >
> >
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Igor Sapego
> Igor, it seems like you are advocating the per-cell configuration, not
> per-field one.

True, there is some terminology mismatch here.

> I see your point about C++ and .NET integrations however. Can't we provide
> this info at node-join time or table-creation time? This way all nodes will
> receive it and you will be able to grab it on different platforms.

This issue can be solved in different ways; I'm just saying that it will be
significantly more complicated. It's something we may want to consider when
we choose a solution here.

Best Regards,
Igor

On Thu, Jul 27, 2017 at 5:10 PM, Dmitriy Setrakyan 
wrote:

> On Thu, Jul 27, 2017 at 9:04 AM, Igor Sapego  wrote:
>
> > Just a note from the platforms guy:
> >
> > A solution with table-level configuration is going to be significantly
> > harder to implement for platforms and ODBC than a field-level one.
> >
>
> Igor, it seems like you are advocating the per-cell configuration, not
> per-field one. The per-field configuration can be defined at the
> table/cache level.
>
> I see your point about C++ and .NET integrations however. Can't we provide
> this info at node-join time or table-creation time? This way all nodes will
> receive it and you will be able to grab it on different platforms.
>
>
> >
> > Also, what about binary objects that are not stored in a cache
> > but are being marshalled?
> >
>
> I think the default system encoding should be used here. If we don't have
> configuration for default encoding, we should add it.
>
>
> >
> >
> > Best Regards,
> > Igor
> >
> > On Wed, Jul 26, 2017 at 7:22 PM, Dmitriy Setrakyan <dsetrak...@apache.org>
> > wrote:
> >
> > > On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur <daradu...@gmail.com>
> > > wrote:
> > >
> > > >
> > > > > Encoding must be set on a per-field basis. This will give us the most
> > > > > flexible solution at the cost of 1-byte overhead.
> > > >
> > > > > Vova, I agree that the encoding should be set on a per-field basis, but
> > > > > at the table level, not at a cell level.
> > > >
> > > > Dmitriy, Vladimir,
> > > > Let's use both approaches :-)
> > > > We can add a parameter to CacheConfiguration.
> > > > If the parameter says to use cache-level encoding, the marshaller will
> > > > use the cache's encoding;
> > > > otherwise, the marshaller will use per-field encoding.
> > > > Of course, only if it doesn't complicate the solution.
> > > >
> > > >
> > > I think that it will complicate the solution and will complicate the
> > > marshalling protocol. The advantage of specifying the encoding at
> > > table/cache level is that we don't need to add extra encoding bytes to
> > > the marshalling protocol.
> > >
> > > I think Vova was suggesting encoding at the cell level, not at the
> > > field level, which seems to be redundant to me.
> > >
> > > Vova, do you agree?
> > >
> >
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Dmitriy Setrakyan
On Thu, Jul 27, 2017 at 9:04 AM, Igor Sapego  wrote:

> Just a note from the platforms guy:
>
> A solution with table-level configuration is going to be significantly
> harder to implement for platforms and ODBC than a field-level one.
>

Igor, it seems like you are advocating the per-cell configuration, not
per-field one. The per-field configuration can be defined at the
table/cache level.

I see your point about C++ and .NET integrations however. Can't we provide
this info at node-join time or table-creation time? This way all nodes will
receive it and you will be able to grab it on different platforms.


>
> Also, what about binary objects that are not stored in a cache
> but are being marshalled?
>

I think the default system encoding should be used here. If we don't have
configuration for default encoding, we should add it.
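A minimal sketch of the lookup described above, assuming per-field encodings are configured at the table/cache level and a global default covers everything marshalled outside a cache (for example, compute job arguments). `EncodingResolver`, its constructor, and the nested-map layout are hypothetical illustrations, not existing Ignite APIs.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver (not an Ignite API): encodings configured per cache
// and per field, with a global default for everything else.
final class EncodingResolver {
    private final Charset defaultCharset;
    private final Map<String, Map<String, Charset>> byCache; // cache -> field -> charset

    EncodingResolver(Charset defaultCharset, Map<String, Map<String, Charset>> byCache) {
        this.defaultCharset = defaultCharset;
        this.byCache = byCache;
    }

    // cacheName is null for objects marshalled outside any cache;
    // they always get the default encoding.
    Charset resolve(String cacheName, String fieldName) {
        if (cacheName == null)
            return defaultCharset;
        Map<String, Charset> fields =
            byCache.getOrDefault(cacheName, Collections.<String, Charset>emptyMap());
        return fields.getOrDefault(fieldName, defaultCharset);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Charset>> cfg = new HashMap<>();
        cfg.put("Person", Collections.singletonMap("name", StandardCharsets.ISO_8859_1));
        EncodingResolver r = new EncodingResolver(StandardCharsets.UTF_8, cfg);
        System.out.println(r.resolve("Person", "name")); // ISO-8859-1
        System.out.println(r.resolve("Person", "city")); // UTF-8
        System.out.println(r.resolve(null, "name"));     // UTF-8
    }
}
```

Because the encoding lives in configuration rather than in each value, every node (and every platform client) needs the same mapping, which is the distribution concern raised for C++, .NET, and ODBC.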


>
>
> Best Regards,
> Igor
>
> On Wed, Jul 26, 2017 at 7:22 PM, Dmitriy Setrakyan 
> wrote:
>
> > On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur
> > wrote:
> >
> > >
> > > > Encoding must be set on a per-field basis. This will give us the most
> > > > flexible solution at the cost of 1-byte overhead.
> > >
> > > > Vova, I agree that the encoding should be set on a per-field basis, but
> > > > at the table level, not at a cell level.
> > >
> > > Dmitriy, Vladimir,
> > > Let's use both approaches :-)
> > > We can add a parameter to CacheConfiguration.
> > > If the parameter says to use cache-level encoding, the marshaller will
> > > use the cache's encoding;
> > > otherwise, the marshaller will use per-field encoding.
> > > Of course, only if it doesn't complicate the solution.
> > >
> > >
> > I think that it will complicate the solution and will complicate the
> > marshalling protocol. The advantage of specifying the encoding at
> > table/cache level is that we don't need to add extra encoding bytes to
> > the marshalling protocol.
> >
> > I think Vova was suggesting encoding at the cell level, not at the field
> > level, which seems to be redundant to me.
> >
> > Vova, do you agree?
> >
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-27 Thread Igor Sapego
Just a note from the platforms guy:

A solution with table-level configuration is going to be significantly
harder to implement for platforms and ODBC than a field-level one.

Also, what about binary objects that are not stored in a cache
but are being marshalled?


Best Regards,
Igor

On Wed, Jul 26, 2017 at 7:22 PM, Dmitriy Setrakyan 
wrote:

> On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur 
> wrote:
>
> >
> > > Encoding must be set on a per-field basis. This will give us the most
> > > flexible solution at the cost of 1-byte overhead.
> >
> > > Vova, I agree that the encoding should be set on a per-field basis, but
> > > at the table level, not at a cell level.
> >
> > Dmitriy, Vladimir,
> > Let's use both approaches :-)
> > We can add a parameter to CacheConfiguration.
> > If the parameter says to use cache-level encoding, the marshaller will
> > use the cache's encoding;
> > otherwise, the marshaller will use per-field encoding.
> > Of course, only if it doesn't complicate the solution.
> >
> >
> I think that it will complicate the solution and will complicate the
> marshalling protocol. The advantage of specifying the encoding at
> table/cache level is that we don't need to add extra encoding bytes to the
> marshalling protocol.
>
> I think Vova was suggesting encoding at the cell level, not at the field
> level, which seems to be redundant to me.
>
> Vova, do you agree?
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-26 Thread Dmitriy Setrakyan
On Wed, Jul 26, 2017 at 3:40 AM, Vyacheslav Daradur 
wrote:

>
> > Encoding must be set on a per-field basis. This will give us the most
> > flexible solution at the cost of 1-byte overhead.
>
> > Vova, I agree that the encoding should be set on a per-field basis, but at
> > the table level, not at a cell level.
>
> Dmitriy, Vladimir,
> Let's use both approaches :-)
> We can add a parameter to CacheConfiguration.
> If the parameter says to use cache-level encoding, the marshaller will use
> the cache's encoding;
> otherwise, the marshaller will use per-field encoding.
> Of course, only if it doesn't complicate the solution.
>
>
I think that it will complicate the solution and will complicate the
marshalling protocol. The advantage of specifying the encoding at
table/cache level is that we don't need to add extra encoding bytes to the
marshalling protocol.

I think Vova was suggesting encoding at the cell level, not at the field
level, which seems to be redundant to me.

Vova, do you agree?
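To put numbers on the overhead argument, here is rough per-value arithmetic for the string layouts discussed in this thread, counting only the header and payload bytes. The byte counts follow the layouts described in the quoted messages; the class and method names are made up for illustration, and the optional encoding name in the "encoded string" layout is assumed to be absent.

```java
// Illustrative arithmetic: header + payload bytes for one string value under
// the layouts discussed in this thread (payload = encoded string bytes).
final class StringLayoutSizes {
    // Current STRING layout: 1-byte flag + 4-byte length + payload.
    static int current(int payload) {
        return 1 + 4 + payload;
    }

    // Encoding fixed at the table/cache level: each value keeps the current layout.
    static int cacheLevel(int payload) {
        return current(payload);
    }

    // "Encoded string" type: flag + 1-byte encoding code + length + payload
    // (assumes a frequently used encoding, so no encoding name is written).
    static int encodedString(int payload) {
        return 1 + 1 + 4 + payload;
    }

    // Optional 4-byte negative charset code in front of the length.
    static int intCharsetCode(int payload) {
        return 1 + 4 + 4 + payload;
    }

    public static void main(String[] args) {
        int payload = 5; // e.g. "Hello" in a single-byte charset
        System.out.println(cacheLevel(payload));     // 10
        System.out.println(encodedString(payload));  // 11
        System.out.println(intCharsetCode(payload)); // 14
        long values = 1_000_000L;
        // Total cost of the per-value encoding byte over a million values.
        System.out.println(values * (encodedString(payload) - cacheLevel(payload))); // 1000000
    }
}
```

So the per-value encoding byte costs about 1 MB per million string values, while the table/cache-level approach keeps each value at its current size and moves the encoding into configuration.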


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-26 Thread Andrey Kuznetsov
Vladimir,

It's rather simple to support string encoding by setting it in
BinaryConfiguration. But I'm unsure whether it's a desired change. We need
to express our goal more precisely: should we control encoding at the cache
level, the field level, or the binary configuration level? Currently,
BinaryMarshaller is controlled only by BinaryConfiguration, and it's hard
for me to estimate the changes needed to bring string encoding to, say, a
per-cache basis.

2017-07-25 20:17 GMT+03:00 Vladimir Ozerov [via Apache Ignite Developers] <
ml+s2346864n20046...@n4.nabble.com>:

> Vyacheslav,
> When we finish the varlen optimization for string lengths, I am afraid we
> could end up with a very messy protocol if we mix the encoded length and
> the encoding.
>
> Dima,
> Encoding must be set on a per-field basis. This will give us the most
> flexible solution at the cost of 1-byte overhead.
>
> On Tue, Jul 25, 2017 at 20:23, Dmitriy Setrakyan wrote:
>
> > I don't understand why this encoding is done at the per-object level and
> > not at the per-cache level. Shouldn't the column-to-encoding mapping be
> > defined in the cache-level configuration?
> >
> > On Tue, Jul 25, 2017 at 12:13 PM, Vladimir Ozerov
> > wrote:
> >
> > > Andrey,
> > >
> > > You cannot have an optional part in the middle, as it will break
> > > compatibility in a dangerous way, probably leading to a node crash. Also,
> > > having an INT (4 bytes) looks like too much to me.
> > >
> > > Instead, I would add a new type, "encoded string":
> > > 1 byte - type
> > > 1 byte - encoding code; map frequently used encodings to some byte
> > > value; also have a special value meaning that the encoding name will be
> > > written as a string afterwards; this way we will support any encoding
> > > out of the box
> > > [optional] encoding name
> > > 4 bytes - string length
> > > Finally - string bytes
> > >
> > > Vladimir.
> > >
> > > On Tue, Jul 25, 2017 at 18:24, Andrey Kuznetsov wrote:
> > >
> > > > I apologize for the damaged formatting. Below is my message as it
> > > > should be.
> > > >
> > > >
> > > > Hi Igniters,
> > > >
> > > > I'd like to discuss future changes related to
> > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > > <https://issues.apache.org/jira/browse/IGNITE-5655>.
> > > >
> > > > Is it really a good idea to introduce a new flag (ENCODED_STRING) for
> > > > the existing String datatype? It's possible to use the existing STRING
> > > > flag at negligible performance cost.
> > > >
> > > > Currently, a UTF-8-encoded string looks like
> > > >
> > > > byteFlag nonNegativeIntStrLen bytes
> > > >
> > > > This format can be extended in a backward-compatible way to
> > > >
> > > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > >
> > > > Next, I suggest adding a new BinaryConfiguration property for the
> > > > encoding to use, instead of using a global property. It seems to be
> > > > more convenient for the user.
> > > >
> > > > I'll appreciate your feedback.
> > > >
> > > > 2017-07-25 16:13 GMT+03:00 Andrey Kuznetsov:
> > > >
> > > > > Hi Igniters,I'd like to discuss future changes related to
> > IGNITE-5655
> > > > > <https://issues.apache.org/jira/browse/IGNITE-5655>  . Is it
> really
> > > good
> > > > > idea to introduce new flag (ENCODED_STRING) for existing String
> > > datatype?
> > > > > It's possible to use existing STRING flag at negligible
> performance
> > > cost.
> > > > > Currently, utf-8-encoded string looks like
> > > > > byteFlag nonNegativeIntStrLen bytes
> > > > > This format can be backward compatibly extended to
> > > > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > > > Next, I suggest to add new BinaryConfiguration property for
> encoding
> > to
> > > > use
> > > > > instead of using global property. It seems to be more convenient
> for
> > > > > user.I'll ap

Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-26 Thread Vyacheslav Daradur
dback.
> > > > >
> > > > > 2017-07-25 16:13 GMT+03:00 Andrey Kuznetsov <stku...@gmail.com>:
> > > > >
> > > > > > Hi Igniters,I'd like to discuss future changes related to
> > > IGNITE-5655
> > > > > > <https://issues.apache.org/jira/browse/IGNITE-5655>  . Is it
> > really
> > > > good
> > > > > > idea to introduce new flag (ENCODED_STRING) for existing String
> > > > datatype?
> > > > > > It's possible to use existing STRING flag at negligible
> performance
> > > > cost.
> > > > > > Currently, utf-8-encoded string looks like
> > > > > > byteFlag nonNegativeIntStrLen bytes
> > > > > > This format can be backward compatibly extended to
> > > > > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > > > > Next, I suggest to add new BinaryConfiguration property for
> > encoding
> > > to
> > > > > use
> > > > > > instead of using global property. It seems to be more convenient
> > for
> > > > > > user.I'll appreciate your feedback.
> > > > > >
> > > > > >
> > > > > >
> > > > > > -
> > > > > > Best regards,
> > > > > >   Andrey Kuznetsov.
> > > > > > --
> > > > > > View this message in context: http://apache-ignite-
> > > > > > developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-
> > > > > > support-in-BinaryMarshaller-IGNITE-5655-tp20024.html
> > > > > > Sent from the Apache Ignite Developers mailing list archive at
> > > > > Nabble.com.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >   Andrey Kuznetsov.
> > > > >
> > > >
> > >
> >
>



-- 
Best Regards, Vyacheslav D.


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Dmitriy Setrakyan
On Tue, Jul 25, 2017 at 12:36 PM, Vladimir Ozerov <voze...@gridgain.com>
wrote:

> Vyacheslav,
> When we finish the varlen optimization for string lengths, I am afraid we could
> end up with a very messy protocol if we mix the encoded length and the encoding.
>
> Dima,
> Encoding must be set on a per-field basis. This will give us the most flexible
> solution at the cost of 1-byte overhead.
>

Vova, I agree that the encoding should be set on a per-field basis, but at
the table level, not at a cell level. I cannot foresee a situation where we
would have different encodings in the same column. If that ever happens,
then the user can provide already-encoded values.


>
> On Tue, Jul 25, 2017 at 20:23, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:
>
> > I don't understand why this encoding is done at the per-object level and
> > not at the per-cache level. Shouldn't the column-to-encoding mapping be
> > defined in the cache-level configuration?
> >
> > On Tue, Jul 25, 2017 at 12:13 PM, Vladimir Ozerov <voze...@gridgain.com>
> > wrote:
> >
> > > Andrey,
> > >
> > > You cannot have an optional part in the middle, as it will break
> > > compatibility in a dangerous way, probably leading to a node crash. Also,
> > > having an INT (4 bytes) looks like too much to me.
> > >
> > > Instead, I would add a new type, "encoded string":
> > > 1 byte - type
> > > 1 byte - encoding code; map frequently used encodings to some byte
> > > value; also have a special value meaning that the encoding name will be
> > > written as a string afterwards; this way we will support any encoding
> > > out of the box
> > > [optional] encoding name
> > > 4 bytes - string length
> > > Finally - string bytes
> > >
> > > Vladimir.
> > >
> > > On Tue, Jul 25, 2017 at 18:24, Andrey Kuznetsov <stku...@gmail.com> wrote:
> > >
> > > > I apologize for the damaged formatting. Below is my message as it
> > > > should be.
> > > >
> > > >
> > > > Hi Igniters,
> > > >
> > > > I'd like to discuss future changes related to
> > > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > > <https://issues.apache.org/jira/browse/IGNITE-5655>.
> > > >
> > > > Is it really a good idea to introduce a new flag (ENCODED_STRING) for
> > > > the existing String datatype? It's possible to use the existing STRING
> > > > flag at negligible performance cost.
> > > >
> > > > Currently, a UTF-8-encoded string looks like
> > > >
> > > > byteFlag nonNegativeIntStrLen bytes
> > > >
> > > > This format can be extended in a backward-compatible way to
> > > >
> > > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > >
> > > > Next, I suggest adding a new BinaryConfiguration property for the
> > > > encoding to use, instead of using a global property. It seems to be
> > > > more convenient for the user.
> > > >
> > > > I'll appreciate your feedback.
> > > >
> > > > 2017-07-25 16:13 GMT+03:00 Andrey Kuznetsov <stku...@gmail.com>:
> > > >
> > > > > Hi Igniters,I'd like to discuss future changes related to
> > IGNITE-5655
> > > > > <https://issues.apache.org/jira/browse/IGNITE-5655>  . Is it
> really
> > > good
> > > > > idea to introduce new flag (ENCODED_STRING) for existing String
> > > datatype?
> > > > > It's possible to use existing STRING flag at negligible performance
> > > cost.
> > > > > Currently, utf-8-encoded string looks like
> > > > > byteFlag nonNegativeIntStrLen bytes
> > > > > This format can be backward compatibly extended to
> > > > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > > > Next, I suggest to add new BinaryConfiguration property for
> encoding
> > to
> > > > use
> > > > > instead of using global property. It seems to be more convenient
> for
> > > > > user.I'll appreciate your feedback.
> > > > >
> > > > >
> > > > >
> > > > > -
> > > > > Best regards,
> > > > >   Andrey Kuznetsov.
> > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > >   Andrey Kuznetsov.
> > > >
> > >
> >
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Andrey Kuznetsov
Vladimir,

Thanks for the reply. In any case, we'll break compatibility by introducing a
new feature in marshalling, but both approaches preserve backward
compatibility.

I thought it unusual to have two different type markers (flags) for a
single datatype. I can't look at the source right now, but I'm unsure whether
it's possible to map two flags to a single type in the marshaller implementation.
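In a reader that dispatches on the flag byte, mapping two flags to one type comes down to two branches that return the same Java type. The sketch below is hypothetical and is not Ignite's actual reader code: the STRING code 9 is an assumption about the existing flag value, ENCODED_STRING = 100 and the encoding-code table are invented, and a little-endian 4-byte length is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical reader: two type flags that both deserialize to java.lang.String.
final class StringFlagDispatch {
    static final byte STRING = 9;           // assumed code of the existing STRING flag
    static final byte ENCODED_STRING = 100; // hypothetical new flag

    // Hypothetical 1-byte encoding codes used after ENCODED_STRING.
    static Charset charsetForCode(byte code) {
        switch (code) {
            case 1: return StandardCharsets.ISO_8859_1;
            case 2: return StandardCharsets.UTF_16LE;
            default: throw new IllegalArgumentException("Unknown encoding code: " + code);
        }
    }

    static Object readValue(ByteBuffer in) {
        in.order(ByteOrder.LITTLE_ENDIAN);
        byte flag = in.get();
        switch (flag) {
            case STRING:         // legacy layout: always UTF-8
                return readPayload(in, StandardCharsets.UTF_8);
            case ENCODED_STRING: // new layout: encoding code precedes the length
                return readPayload(in, charsetForCode(in.get()));
            default:
                throw new IllegalArgumentException("Unsupported flag: " + flag);
        }
    }

    // Reads a 4-byte length followed by that many bytes, decoded with cs.
    private static String readPayload(ByteBuffer in, Charset cs) {
        byte[] bytes = new byte[in.getInt()];
        in.get(bytes);
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        ByteBuffer buf = ByteBuffer.allocate(1 + 1 + 4 + latin1.length).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(ENCODED_STRING).put((byte) 1).putInt(latin1.length).put(latin1);
        buf.flip();
        Object value = readValue(buf);
        System.out.println(value.getClass().getSimpleName() + " " + value.equals("caf\u00e9")); // String true
    }
}
```

Callers see a plain String either way; only the reader and writer need to know which flag was used on the wire.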

On Jul 25, 2017 at 20:13, "Vladimir Ozerov" <voze...@gridgain.com>
wrote:

> Andrey,
>
> You cannot have an optional part in the middle, as it will break compatibility
> in a dangerous way, probably leading to a node crash. Also, having an INT
> (4 bytes) looks like too much to me.
>
> Instead, I would add a new type, "encoded string":
> 1 byte - type
> 1 byte - encoding code; map frequently used encodings to some byte value;
> also have a special value meaning that the encoding name will be written as
> a string afterwards; this way we will support any encoding out of the box
> [optional] encoding name
> 4 bytes - string length
> Finally - string bytes
>
> Vladimir.
>
> On Tue, Jul 25, 2017 at 18:24, Andrey Kuznetsov wrote:
>
> > I apologize for the damaged formatting. Below is my message as it should be.
> >
> >
> > Hi Igniters,
> >
> > I'd like to discuss future changes related to
> > https://issues.apache.org/jira/browse/IGNITE-5655
> > <https://issues.apache.org/jira/browse/IGNITE-5655>.
> >
> > Is it really a good idea to introduce a new flag (ENCODED_STRING) for the
> > existing String datatype? It's possible to use the existing STRING flag at
> > negligible performance cost.
> >
> > Currently, a UTF-8-encoded string looks like
> >
> > byteFlag nonNegativeIntStrLen bytes
> >
> > This format can be extended in a backward-compatible way to
> >
> > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> >
> > Next, I suggest adding a new BinaryConfiguration property for the encoding
> > to use, instead of using a global property. It seems to be more convenient
> > for the user.
> >
> > I'll appreciate your feedback.
> >
> > 2017-07-25 16:13 GMT+03:00 Andrey Kuznetsov:
> >
> > > Hi Igniters,I'd like to discuss future changes related to  IGNITE-5655
> > > <https://issues.apache.org/jira/browse/IGNITE-5655>  . Is it really
> good
> > > idea to introduce new flag (ENCODED_STRING) for existing String
> datatype?
> > > It's possible to use existing STRING flag at negligible performance
> cost.
> > > Currently, utf-8-encoded string looks like
> > > byteFlag nonNegativeIntStrLen bytes
> > > This format can be backward compatibly extended to
> > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > Next, I suggest to add new BinaryConfiguration property for encoding
> to
> > use
> > > instead of using global property. It seems to be more convenient for
> > > user.I'll appreciate your feedback.
> > >
> > >
> > >
> > > -
> > > Best regards,
> > >   Andrey Kuznetsov.
> >
> >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
> >
>
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Vladimir Ozerov
Vyacheslav,
When we finish the varlen optimization for string lengths, I am afraid we could
end up with a very messy protocol if we mix the encoded length and the encoding.

Dima,
Encoding must be set on a per-field basis. This will give us the most flexible
solution at the cost of 1-byte overhead.
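A minimal sketch of the "encoded string" layout proposed earlier in the thread (type byte, 1-byte encoding code, optional encoding name, 4-byte length, payload), showing where the 1-byte overhead comes from. Everything concrete here is an assumption for illustration: the type id 100, the code table, code 0 as the escape meaning "an encoding name follows", writing that name as a 1-byte length plus US-ASCII bytes, and little-endian byte order. None of these are Ignite protocol constants.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed "encoded string" value; all constants are hypothetical.
final class EncodedStringCodec {
    static final byte ENCODED_STRING = 100; // hypothetical type id
    static final byte CUSTOM_ENCODING = 0;  // escape code: encoding name follows

    // Frequently used encodings get codes 1..n (index + 1).
    static final List<Charset> KNOWN = Arrays.asList(
        StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16LE);

    static byte[] write(String s, Charset cs) {
        byte[] payload = s.getBytes(cs);
        int idx = KNOWN.indexOf(cs);
        byte[] name = idx < 0 ? cs.name().getBytes(StandardCharsets.US_ASCII) : new byte[0];
        if (name.length > 255)
            throw new IllegalArgumentException("Encoding name too long: " + cs.name());
        int size = 1 + 1 + (idx < 0 ? 1 + name.length : 0) + 4 + payload.length;
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(ENCODED_STRING);
        if (idx < 0) {
            buf.put(CUSTOM_ENCODING);
            buf.put((byte) name.length); // unsigned 1-byte name length
            buf.put(name);
        } else {
            buf.put((byte) (idx + 1));
        }
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    static String read(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != ENCODED_STRING)
            throw new IllegalArgumentException("Not an encoded string");
        byte code = buf.get();
        Charset cs;
        if (code == CUSTOM_ENCODING) {
            byte[] name = new byte[buf.get() & 0xFF];
            buf.get(name);
            cs = Charset.forName(new String(name, StandardCharsets.US_ASCII));
        } else {
            cs = KNOWN.get(code - 1);
        }
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return new String(payload, cs);
    }

    public static void main(String[] args) {
        byte[] known = write("Hello", StandardCharsets.ISO_8859_1);
        byte[] custom = write("abc", StandardCharsets.US_ASCII);
        System.out.println(known.length + " " + read(ByteBuffer.wrap(known)));   // 11 Hello
        System.out.println(custom.length + " " + read(ByteBuffer.wrap(custom))); // 18 abc
    }
}
```

Compared with the current 1 + 4 + n bytes, a value in a frequently used encoding costs exactly one extra byte; only encodings outside the table pay for the name.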

On Tue, Jul 25, 2017 at 20:23, Dmitriy Setrakyan <dsetrak...@apache.org> wrote:

> I don't understand why this encoding is done at the per-object level and not
> at the per-cache level. Shouldn't the column-to-encoding mapping be defined
> in the cache-level configuration?
>
> On Tue, Jul 25, 2017 at 12:13 PM, Vladimir Ozerov <voze...@gridgain.com>
> wrote:
>
> > Andrey,
> >
> > You cannot have an optional part in the middle, as it will break
> > compatibility in a dangerous way, probably leading to a node crash. Also,
> > having an INT (4 bytes) looks like too much to me.
> >
> > Instead, I would add a new type, "encoded string":
> > 1 byte - type
> > 1 byte - encoding code; map frequently used encodings to some byte value;
> > also have a special value meaning that the encoding name will be written
> > as a string afterwards; this way we will support any encoding out of the
> > box
> > [optional] encoding name
> > 4 bytes - string length
> > Finally - string bytes
> >
> > Vladimir.
> >
> > On Tue, Jul 25, 2017 at 18:24, Andrey Kuznetsov <stku...@gmail.com> wrote:
> >
> > > I apologize for the damaged formatting. Below is my message as it
> > > should be.
> > >
> > >
> > > Hi Igniters,
> > >
> > > I'd like to discuss future changes related to
> > > https://issues.apache.org/jira/browse/IGNITE-5655
> > > <https://issues.apache.org/jira/browse/IGNITE-5655>.
> > >
> > > Is it really a good idea to introduce a new flag (ENCODED_STRING) for
> > > the existing String datatype? It's possible to use the existing STRING
> > > flag at negligible performance cost.
> > >
> > > Currently, a UTF-8-encoded string looks like
> > >
> > > byteFlag nonNegativeIntStrLen bytes
> > >
> > > This format can be extended in a backward-compatible way to
> > >
> > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > >
> > > Next, I suggest adding a new BinaryConfiguration property for the
> > > encoding to use, instead of using a global property. It seems to be
> > > more convenient for the user.
> > >
> > > I'll appreciate your feedback.
> > >
> > > 2017-07-25 16:13 GMT+03:00 Andrey Kuznetsov <stku...@gmail.com>:
> > >
> > > > Hi Igniters,I'd like to discuss future changes related to
> IGNITE-5655
> > > > <https://issues.apache.org/jira/browse/IGNITE-5655>  . Is it really
> > good
> > > > idea to introduce new flag (ENCODED_STRING) for existing String
> > datatype?
> > > > It's possible to use existing STRING flag at negligible performance
> > cost.
> > > > Currently, utf-8-encoded string looks like
> > > > byteFlag nonNegativeIntStrLen bytes
> > > > This format can be backward compatibly extended to
> > > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > > Next, I suggest to add new BinaryConfiguration property for encoding
> to
> > > use
> > > > instead of using global property. It seems to be more convenient for
> > > > user.I'll appreciate your feedback.
> > > >
> > > >
> > > >
> > > > -
> > > > Best regards,
> > > >   Andrey Kuznetsov.
> > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > >   Andrey Kuznetsov.
> > >
> >
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Dmitriy Setrakyan
I don't understand why this encoding is done at the per-object level and not
at the per-cache level. Shouldn't the column-to-encoding mapping be defined in
the cache-level configuration?

On Tue, Jul 25, 2017 at 12:13 PM, Vladimir Ozerov <voze...@gridgain.com>
wrote:

> Andrey,
>
> You cannot have an optional part in the middle, as it will break compatibility
> in a dangerous way, probably leading to a node crash. Also, having an INT
> (4 bytes) looks like too much to me.
>
> Instead, I would add a new type, "encoded string":
> 1 byte - type
> 1 byte - encoding code; map frequently used encodings to some byte value;
> also have a special value meaning that the encoding name will be written as
> a string afterwards; this way we will support any encoding out of the box
> [optional] encoding name
> 4 bytes - string length
> Finally - string bytes
>
> Vladimir.
>
> On Tue, Jul 25, 2017 at 18:24, Andrey Kuznetsov <stku...@gmail.com> wrote:
>
> > I apologize for the damaged formatting. Below is my message as it should be.
> >
> >
> > Hi Igniters,
> >
> > I'd like to discuss future changes related to
> > https://issues.apache.org/jira/browse/IGNITE-5655
> > <https://issues.apache.org/jira/browse/IGNITE-5655>.
> >
> > Is it really a good idea to introduce a new flag (ENCODED_STRING) for the
> > existing String datatype? It's possible to use the existing STRING flag at
> > negligible performance cost.
> >
> > Currently, a UTF-8-encoded string looks like
> >
> > byteFlag nonNegativeIntStrLen bytes
> >
> > This format can be extended in a backward-compatible way to
> >
> > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> >
> > Next, I suggest adding a new BinaryConfiguration property for the encoding
> > to use, instead of using a global property. It seems to be more convenient
> > for the user.
> >
> > I'll appreciate your feedback.
> >
> > 2017-07-25 16:13 GMT+03:00 Andrey Kuznetsov <stku...@gmail.com>:
> >
> > > Hi Igniters,I'd like to discuss future changes related to  IGNITE-5655
> > > <https://issues.apache.org/jira/browse/IGNITE-5655>  . Is it really
> good
> > > idea to introduce new flag (ENCODED_STRING) for existing String
> datatype?
> > > It's possible to use existing STRING flag at negligible performance
> cost.
> > > Currently, utf-8-encoded string looks like
> > > byteFlag nonNegativeIntStrLen bytes
> > > This format can be backward compatibly extended to
> > > byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
> > > Next, I suggest to add new BinaryConfiguration property for encoding to
> > use
> > > instead of using global property. It seems to be more convenient for
> > > user.I'll appreciate your feedback.
> > >
> > >
> > >
> > > -
> > > Best regards,
> > >   Andrey Kuznetsov.
> > > --
> > > View this message in context: http://apache-ignite-
> > > developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-
> > > support-in-BinaryMarshaller-IGNITE-5655-tp20024.html
> > > Sent from the Apache Ignite Developers mailing list archive at
> > Nabble.com.
> >
> >
> >
> >
> > --
> > Best regards,
> >   Andrey Kuznetsov.
> >
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Vyacheslav Daradur
Hi Andrey.

Sounds very useful.

We can save one byte if we use controlled overflow on
[nonNegativeIntStrLen]:
if [nonNegativeIntStrLen < 0], then [the string is encoded].
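One possible reading of this idea, sketched in Java: keep the existing STRING flag and let a negative length field mark an encoded string. The message doesn't say where the encoding id goes, so several details here are assumptions: the one-byte encoding code placed after the length, the code values themselves, the type id `STRING = 9`, little-endian ints, and storing the length as `~len` so that an empty encoded string still produces a negative field.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical "sign of the length marks an encoded string" layout (not Ignite API). */
final class SignedLengthString {
    static final byte STRING = 9; // assumed STRING type id

    // Hypothetical one-byte encoding codes; UTF-8 keeps today's layout and needs none.
    static byte codeFor(Charset cs) {
        if (cs.equals(StandardCharsets.ISO_8859_1)) return (byte) 1;
        if (cs.equals(StandardCharsets.UTF_16LE)) return (byte) 2;
        throw new IllegalArgumentException("no code for " + cs);
    }

    static Charset charsetFor(byte code) {
        switch (code) {
            case 1: return StandardCharsets.ISO_8859_1;
            case 2: return StandardCharsets.UTF_16LE;
            default: throw new IllegalArgumentException("unknown encoding code " + code);
        }
    }

    static byte[] write(String s, Charset cs) {
        byte[] payload = s.getBytes(cs);
        boolean utf8 = cs.equals(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + (utf8 ? 0 : 1) + payload.length)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(STRING);
        if (utf8) {
            buf.putInt(payload.length);  // unchanged: non-negative length
        } else {
            buf.putInt(~payload.length); // ~n == -n - 1, negative for every n >= 0
            buf.put(codeFor(cs));        // hypothetical encoding code byte
        }
        buf.put(payload);
        return buf.array();
    }

    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != STRING) throw new IllegalArgumentException("not a STRING");
        int lenField = buf.getInt();
        Charset cs = StandardCharsets.UTF_8;
        int len = lenField;
        if (lenField < 0) {              // negative field: encoded string
            len = ~lenField;             // recover the real byte length
            cs = charsetFor(buf.get());
        }
        byte[] payload = new byte[len];
        buf.get(payload);
        return new String(payload, cs);
    }
}
```

A node unaware of this convention would presumably still take the negative field as a length and fail, the same compatibility hazard raised against the optional charset code.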

I have some questions:
Will there be any public API, e.g. an "Encoder" interface?
Will users have the opportunity to define their own encoding format?

2017-07-25 20:13 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:

> Andrey,
>
> You cannot have an optional part in the middle, as it would break
> compatibility in a dangerous way, probably leading to a node crash. Also,
> a whole INT (4 bytes) for the charset code looks like too much to me.
>
> Instead, I would add a new type, "encoded string":
> 1 byte - type
> 1 byte - encoding code: map frequently used encodings to fixed byte values,
> and reserve a special value meaning that the encoding name is written as a
> string right after it; this way we support any encoding out of the box
> [optional] encoding name
> 4 bytes - string length
> Finally - string bytes
>
> Vladimir.



-- 
Best Regards, Vyacheslav D.


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Vladimir Ozerov
Andrey,

You cannot have an optional part in the middle, as it would break compatibility
in a dangerous way, probably leading to a node crash. Also, a whole INT (4 bytes)
for the charset code looks like too much to me.

Instead, I would add a new type, "encoded string":
1 byte - type
1 byte - encoding code: map frequently used encodings to fixed byte values,
and reserve a special value meaning that the encoding name is written as a
string right after it; this way we support any encoding out of the box
[optional] encoding name
4 bytes - string length
Finally - string bytes
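A minimal Java sketch of this "encoded string" layout, to make the byte accounting concrete. Everything specific here is an assumption, not Ignite code: the type id `ENCODED_STRING = 100`, the one-byte codes in `KNOWN`, the escape value `NAME_FOLLOWS = 0`, little-endian ints, and writing the optional encoding name as a 4-byte length plus UTF-8 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

/** Hypothetical codec for the proposed "encoded string" layout (not Ignite API). */
final class EncodedStringCodec {
    static final byte ENCODED_STRING = 100; // hypothetical type id
    static final byte NAME_FOLLOWS = 0;     // hypothetical escape: encoding name is written next

    // Hypothetical one-byte codes for frequently used encodings.
    static final Map<Charset, Byte> KNOWN = Map.of(
        StandardCharsets.UTF_8, (byte) 1,
        StandardCharsets.ISO_8859_1, (byte) 2,
        StandardCharsets.UTF_16LE, (byte) 3);

    static byte[] write(String s, Charset cs) {
        byte[] payload = s.getBytes(cs);
        Byte code = KNOWN.get(cs);
        byte[] name = code == null ? cs.name().getBytes(StandardCharsets.UTF_8) : new byte[0];
        // type + code [+ name length + name] + string length + string bytes
        int size = 1 + 1 + (code == null ? 4 + name.length : 0) + 4 + payload.length;
        ByteBuffer buf = ByteBuffer.allocate(size).order(ByteOrder.LITTLE_ENDIAN);
        buf.put(ENCODED_STRING);
        if (code != null) {
            buf.put(code);
        } else {
            buf.put(NAME_FOLLOWS);
            buf.putInt(name.length);
            buf.put(name);
        }
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != ENCODED_STRING)
            throw new IllegalArgumentException("not an encoded string");
        byte code = buf.get();
        Charset cs;
        if (code == NAME_FOLLOWS) {
            byte[] name = new byte[buf.getInt()];
            buf.get(name);
            cs = Charset.forName(new String(name, StandardCharsets.UTF_8));
        } else {
            cs = byCode(code);
        }
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return new String(payload, cs);
    }

    private static Charset byCode(byte code) {
        for (Map.Entry<Charset, Byte> e : KNOWN.entrySet())
            if (e.getValue() == code)
                return e.getKey();
        throw new IllegalArgumentException("unknown encoding code: " + code);
    }
}
```

With a known code the overhead is 6 bytes (type, code, length), one more than today's 5-byte STRING header; an encoding without a short code also pays for its name, which is the price of supporting any encoding out of the box.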

Vladimir.

Tue, 25 Jul 2017 at 18:24, Andrey Kuznetsov <stku...@gmail.com>:

> I apologize for the damaged formatting. Below is my message as it should be.
>
>
> Hi Igniters,
>
> I'd like to discuss future changes related to
> https://issues.apache.org/jira/browse/IGNITE-5655.
>
> Is it really a good idea to introduce a new flag (ENCODED_STRING) for the
> existing String datatype? It's possible to use the existing STRING flag at
> negligible performance cost.
>
> Currently, a UTF-8-encoded string looks like
>
> byteFlag nonNegativeIntStrLen bytes
>
> This format can be extended in a backward-compatible way to
>
> byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
>
> Next, I suggest adding a new BinaryConfiguration property for the encoding
> instead of using a global property. It seems more convenient for the user.
>
> I'll appreciate your feedback.
>
> --
> Best regards,
>   Andrey Kuznetsov.
>


Re: Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Andrey Kuznetsov
I apologize for the damaged formatting. Below is my message as it should be.


Hi Igniters,

I'd like to discuss future changes related to
https://issues.apache.org/jira/browse/IGNITE-5655.

Is it really a good idea to introduce a new flag (ENCODED_STRING) for the
existing String datatype? It's possible to use the existing STRING flag at
negligible performance cost.

Currently, a UTF-8-encoded string looks like

byteFlag nonNegativeIntStrLen bytes

This format can be extended in a backward-compatible way to

byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes
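A Java sketch of how a reader could tell the two forms apart: a negative first int is a charset code, a non-negative one is the length. Assumptions not taken from the thread: type id `STRING = 9`, the negative codes in `codeFor`/`charsetFor`, and little-endian ints. `legacyRead` is a hypothetical model of a node that only knows the current layout; it reproduces the concern raised in reply, since such a node takes the charset code for the length.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Sketch of: byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes */
final class OptionalCharsetString {
    static final byte STRING = 9; // assumed STRING type id

    // Hypothetical negative charset codes; UTF-8 has no code and keeps today's layout.
    static int codeFor(Charset cs) {
        if (cs.equals(StandardCharsets.ISO_8859_1)) return -1;
        if (cs.equals(StandardCharsets.UTF_16LE)) return -2;
        throw new IllegalArgumentException("no code for " + cs);
    }

    static Charset charsetFor(int code) {
        switch (code) {
            case -1: return StandardCharsets.ISO_8859_1;
            case -2: return StandardCharsets.UTF_16LE;
            default: throw new IllegalArgumentException("unknown charset code " + code);
        }
    }

    static byte[] write(String s, Charset cs) {
        byte[] payload = s.getBytes(cs);
        boolean utf8 = cs.equals(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(1 + (utf8 ? 0 : 4) + 4 + payload.length)
            .order(ByteOrder.LITTLE_ENDIAN);
        buf.put(STRING);
        if (!utf8)
            buf.putInt(codeFor(cs)); // optional negative charset code
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    /** New reader: a negative first int is a charset code, otherwise it is the length. */
    static String read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.get() != STRING) throw new IllegalArgumentException("not a STRING");
        int first = buf.getInt();
        Charset cs = first < 0 ? charsetFor(first) : StandardCharsets.UTF_8;
        int len = first < 0 ? buf.getInt() : first;
        byte[] payload = new byte[len];
        buf.get(payload);
        return new String(payload, cs);
    }

    /** Old reader: always treats the first int as the length. */
    static String legacyRead(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        buf.get(); // type flag
        byte[] payload = new byte[buf.getInt()]; // negative code -> NegativeArraySizeException
        buf.get(payload);
        return new String(payload, StandardCharsets.UTF_8);
    }
}
```

UTF-8 strings stay byte-for-byte identical, so new nodes read old data; the reverse direction is the problem, as `legacyRead` throws `NegativeArraySizeException` on an encoded value.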

Next, I suggest adding a new BinaryConfiguration property for the encoding
instead of using a global property. It seems more convenient for the user.
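A minimal sketch of a configuration-level setting, layered with the cache-level column-to-encoding mapping asked about elsewhere in this thread. `EncodingSettings` and its methods are hypothetical names for illustration, not Ignite's `BinaryConfiguration` API, and the column-then-cache-then-default precedence is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical layered encoding settings (not Ignite's BinaryConfiguration API). */
final class EncodingSettings {
    private Charset defaultEncoding = StandardCharsets.UTF_8; // today's behavior
    private final Map<String, Charset> perCache = new HashMap<>();
    private final Map<String, Map<String, Charset>> perColumn = new HashMap<>();

    EncodingSettings setDefaultEncoding(Charset cs) {
        defaultEncoding = cs;
        return this;
    }

    EncodingSettings setCacheEncoding(String cache, Charset cs) {
        perCache.put(cache, cs);
        return this;
    }

    EncodingSettings setColumnEncoding(String cache, String column, Charset cs) {
        perColumn.computeIfAbsent(cache, k -> new HashMap<>()).put(column, cs);
        return this;
    }

    /** Most specific setting wins: column, then cache, then the configured default. */
    Charset resolve(String cache, String column) {
        Map<String, Charset> cols = perColumn.get(cache);
        if (cols != null && cols.containsKey(column))
            return cols.get(column);
        return perCache.getOrDefault(cache, defaultEncoding);
    }
}
```

Whichever level the setting lives at, the layouts discussed in this thread still write the chosen encoding into each serialized string, so a reader can decode a value without consulting this configuration.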

I'll appreciate your feedback.

-- 
Best regards,
  Andrey Kuznetsov.


Non-UTF-8 string encoding support in BinaryMarshaller (IGNITE-5655)

2017-07-25 Thread Andrey Kuznetsov
Hi Igniters,

I'd like to discuss future changes related to IGNITE-5655
<https://issues.apache.org/jira/browse/IGNITE-5655>. Is it really a good
idea to introduce a new flag (ENCODED_STRING) for the existing String
datatype? It's possible to use the existing STRING flag at negligible
performance cost.

Currently, a UTF-8-encoded string looks like

byteFlag nonNegativeIntStrLen bytes

This format can be extended in a backward-compatible way to

byteFlag [negativeIntCharsetCode] nonNegativeIntStrLen bytes

Next, I suggest adding a new BinaryConfiguration property for the encoding
instead of using a global property. It seems more convenient for the user.

I'll appreciate your feedback.



-
Best regards,
  Andrey Kuznetsov.
--
View this message in context: 
http://apache-ignite-developers.2346864.n4.nabble.com/Non-UTF-8-string-encoding-support-in-BinaryMarshaller-IGNITE-5655-tp20024.html
Sent from the Apache Ignite Developers mailing list archive at Nabble.com.