Re: byte array as keys in Flink

2021-09-24 Thread Dan Hill
Thanks, everyone!

This all makes and why I sent the email.  I think UUID would work.  We also
talked about BigInteger and ByteBuffer.


On Fri, Sep 24, 2021 at 2:32 AM Schwalbe Matthias <
matthias.schwa...@viseca.ch> wrote:

> Hi Dan,
>
>
>
> Did you consider using java.utils.UUID as key type? It consists of two
> longs which should perform well for use as key.
>
> TypeInformation will map to GenericTypeInfo, i.e. it uses KryoSerializer,
> unless you register a specific TypeInformation for this class …
>
>
>
> I didn’t give it a try … keep us posted if that works 
>
>
>
> Thias
>
>
>
>
>
> *From:* Guowei Ma 
> *Sent:* Freitag, 24. September 2021 09:34
> *To:* Caizhi Weng 
> *Cc:* Dan Hill ; user 
> *Subject:* Re: byte array as keys in Flink
>
>
>
> Hi Hill
>
>
>
> As far as I know you could not use byte[] as a keyby. You could find more
> information from [1].
>
>
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/operators/overview/#keyby
>
>
>
> Best,
>
> Guowei
>
>
>
>
>
> On Fri, Sep 24, 2021 at 3:15 PM Caizhi Weng  wrote:
>
> Hi!
>
>
>
> It depends on the state backend you use. For example if you use a heap
> memory state backend it is backed by a hash map and it uses the hash code
> of byte[] to compare the two byte[] (see HeapMapState#put). However for
> rocksdb state backend it uses the serialized bytes (that is to say, the
> content of byte[]) to compare with the records and thus two byte[] with the
> same content can match (see RocksDBMapState#put).
>
>
>
> Dan Hill  于2021年9月24日周五 上午7:43写道:
>
> *Context*
>
> I want to perform joins based on UUIDs.  String version is less efficient
> so I figured I should use the byte[] version.  I did a shallow dive into
> the Flink code I'm not sure it's safe to use byte[] as a key (since it uses
> object equals/hashcode).
>
>
>
> *Request*
>
> How do other Flink devs do for byte[] keys? I want to use byte[] as a key
> in a MapState.
>
>
>
>
>
> Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und
> beinhaltet unter Umständen vertrauliche Mitteilungen. Da die
> Vertraulichkeit von e-Mail-Nachrichten nicht gewährleistet werden kann,
> übernehmen wir keine Haftung für die Gewährung der Vertraulichkeit und
> Unversehrtheit dieser Mitteilung. Bei irrtümlicher Zustellung bitten wir
> Sie um Benachrichtigung per e-Mail und um Löschung dieser Nachricht sowie
> eventueller Anhänge. Jegliche unberechtigte Verwendung oder Verbreitung
> dieser Informationen ist streng verboten.
>
> This message is intended only for the named recipient and may contain
> confidential or privileged information. As the confidentiality of email
> communication cannot be guaranteed, we do not accept any responsibility for
> the confidentiality and the intactness of this message. If you have
> received it in error, please advise the sender by return e-mail and delete
> this message and any attachments. Any unauthorised use or dissemination of
> this information is strictly prohibited.
>


RE: byte array as keys in Flink

2021-09-24 Thread Schwalbe Matthias
Hi Dan,

Did you consider using java.utils.UUID as key type? It consists of two longs 
which should perform well for use as key.
TypeInformation will map to GenericTypeInfo, i.e. it uses KryoSerializer, 
unless you register a specific TypeInformation for this class …

I didn’t give it a try … keep us posted if that works 

Thias


From: Guowei Ma 
Sent: Freitag, 24. September 2021 09:34
To: Caizhi Weng 
Cc: Dan Hill ; user 
Subject: Re: byte array as keys in Flink

Hi Hill

As far as I know you could not use byte[] as a keyby. You could find more 
information from [1].

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/operators/overview/#keyby

Best,
Guowei


On Fri, Sep 24, 2021 at 3:15 PM Caizhi Weng 
mailto:tsreape...@gmail.com>> wrote:
Hi!

It depends on the state backend you use. For example if you use a heap memory 
state backend it is backed by a hash map and it uses the hash code of byte[] to 
compare the two byte[] (see HeapMapState#put). However for rocksdb state 
backend it uses the serialized bytes (that is to say, the content of byte[]) to 
compare with the records and thus two byte[] with the same content can match 
(see RocksDBMapState#put).

Dan Hill mailto:quietgol...@gmail.com>> 于2021年9月24日周五 
上午7:43写道:
Context
I want to perform joins based on UUIDs.  String version is less efficient so I 
figured I should use the byte[] version.  I did a shallow dive into the Flink 
code I'm not sure it's safe to use byte[] as a key (since it uses object 
equals/hashcode).

Request
How do other Flink devs do for byte[] keys? I want to use byte[] as a key in a 
MapState.


Diese Nachricht ist ausschliesslich für den Adressaten bestimmt und beinhaltet 
unter Umständen vertrauliche Mitteilungen. Da die Vertraulichkeit von 
e-Mail-Nachrichten nicht gewährleistet werden kann, übernehmen wir keine 
Haftung für die Gewährung der Vertraulichkeit und Unversehrtheit dieser 
Mitteilung. Bei irrtümlicher Zustellung bitten wir Sie um Benachrichtigung per 
e-Mail und um Löschung dieser Nachricht sowie eventueller Anhänge. Jegliche 
unberechtigte Verwendung oder Verbreitung dieser Informationen ist streng 
verboten.

This message is intended only for the named recipient and may contain 
confidential or privileged information. As the confidentiality of email 
communication cannot be guaranteed, we do not accept any responsibility for the 
confidentiality and the intactness of this message. If you have received it in 
error, please advise the sender by return e-mail and delete this message and 
any attachments. Any unauthorised use or dissemination of this information is 
strictly prohibited.


Re: byte array as keys in Flink

2021-09-24 Thread Guowei Ma
Hi Hill

As far as I know you could not use byte[] as a keyby. You could find more
information from [1].

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/datastream/operators/overview/#keyby

Best,
Guowei


On Fri, Sep 24, 2021 at 3:15 PM Caizhi Weng  wrote:

> Hi!
>
> It depends on the state backend you use. For example if you use a heap
> memory state backend it is backed by a hash map and it uses the hash code
> of byte[] to compare the two byte[] (see HeapMapState#put). However for
> rocksdb state backend it uses the serialized bytes (that is to say, the
> content of byte[]) to compare with the records and thus two byte[] with the
> same content can match (see RocksDBMapState#put).
>
> Dan Hill  于2021年9月24日周五 上午7:43写道:
>
>> *Context*
>> I want to perform joins based on UUIDs.  String version is less efficient
>> so I figured I should use the byte[] version.  I did a shallow dive into
>> the Flink code I'm not sure it's safe to use byte[] as a key (since it uses
>> object equals/hashcode).
>>
>> *Request*
>> How do other Flink devs do for byte[] keys? I want to use byte[] as a key
>> in a MapState.
>>
>>
>>


Re: byte array as keys in Flink

2021-09-24 Thread Caizhi Weng
Hi!

It depends on the state backend you use. For example if you use a heap
memory state backend it is backed by a hash map and it uses the hash code
of byte[] to compare the two byte[] (see HeapMapState#put). However for
rocksdb state backend it uses the serialized bytes (that is to say, the
content of byte[]) to compare with the records and thus two byte[] with the
same content can match (see RocksDBMapState#put).

Dan Hill  于2021年9月24日周五 上午7:43写道:

> *Context*
> I want to perform joins based on UUIDs.  String version is less efficient
> so I figured I should use the byte[] version.  I did a shallow dive into
> the Flink code I'm not sure it's safe to use byte[] as a key (since it uses
> object equals/hashcode).
>
> *Request*
> How do other Flink devs do for byte[] keys? I want to use byte[] as a key
> in a MapState.
>
>
>