Re: String Interning

2018-06-28 Thread Elias Levy
Am I the only one who feels the config should be renamed or the docs on it
expanded?  Turning on object reuse doesn't really reuse objects, not in the
sense that one object is reused for different values / messages / records.
Instead, it stops Flink from making a copy of each record, by serializing
and then deserializing it, when passing it to the next operator.
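
To make the distinction concrete, here is a minimal plain-Java sketch with no
Flink dependency. It uses Java serialization purely as a stand-in for Flink's
own serializers (an assumption for illustration), and the names HandoffDemo,
Event, copyViaSerialization and passByReference are hypothetical. It shows
that a serialize/deserialize hand-off yields an equal but distinct String
instance, while handing over the reference keeps the same instance.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class HandoffDemo {

    // Hypothetical record type with a String field, as in the original question.
    static final class Event implements Serializable {
        private static final long serialVersionUID = 1L;
        final String key;

        Event(String key) {
            this.key = key;
        }
    }

    // Copying hand-off: serialize the record and deserialize a fresh copy.
    // Java serialization is only a stand-in here (assumption), not Flink's serializer.
    static Event copyViaSerialization(Event event) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(event);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Event) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    // Non-copying hand-off: the next stage receives the very same object.
    static Event passByReference(Event event) {
        return event;
    }

    public static void main(String[] args) {
        Event original = new Event("sensor-a".intern());
        Event copied = copyViaSerialization(original);
        Event shared = passByReference(original);

        System.out.println(copied.key.equals(original.key)); // true: same value
        System.out.println(copied.key == original.key);      // false: new String instance
        System.out.println(shared.key == original.key);      // true: same instance
    }
}
```

The second case is also why the docs warn about bugs: once the next operator
holds the same object, a mutation on either side is visible to the other.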

On Tue, Jun 26, 2018 at 1:26 AM Stefan Richter 
wrote:

> Hi,
>
> you can enable object reuse via the execution config [1]: "By default,
> objects are not reused in Flink. Enabling the object reuse mode will
> instruct the runtime to reuse user objects for better performance. Keep in
> mind that this can lead to bugs when the user-code function of an operation
> is not aware of this behavior."
>
> Best,
> Stefan
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/execution_configuration.html
>
> On 22.06.2018 at 20:09, Martin, Nick wrote:
>
> I have a job where I read data from Kafka, do some processing on it, and
> write it to a database. When I read data out of Kafka, I put it into an
> object that has a String field based on the Kafka message key. The possible
> values for the message key are tightly constrained so there are fewer than
> 100 possible unique key values. Profiling of the Flink job shows millions
> of in-flight stream elements, with an equal number of Strings, but I know
> all the strings are duplicates of a small number of unique values.  So it’s
> an ideal use case for String interning. I’ve tried to use interning in the
> constructors for the message elements, but I suspect that I need to do
> something to preserve the interning when Flink serializes/deserializes
> objects when passing them between operators. What’s the best way to
> accomplish that?
>
>
>
>
> --
> Notice: This e-mail is intended solely for use of the individual or entity
> to which it is addressed and may contain information that is proprietary,
> privileged and/or exempt from disclosure under applicable law. If the
> reader is not the intended recipient or agent responsible for delivering
> the message to the intended recipient, you are hereby notified that any
> dissemination, distribution or copying of this communication is strictly
> prohibited. This communication may also contain data subject to U.S. export
> laws. If so, data subject to the International Traffic in Arms Regulation
> cannot be disseminated, distributed, transferred, or copied, whether
> incorporated or in its original form, to foreign nationals residing in the
> U.S. or abroad, absent the express prior approval of the U.S. Department of
> State. Data subject to the Export Administration Act may not be
> disseminated, distributed, transferred or copied contrary to U. S.
> Department of Commerce regulations. If you have received this communication
> in error, please notify the sender by reply e-mail and destroy the e-mail
> message and any physical copies made of the communication.
>  Thank you.
> *
>
>
>


Re: String Interning

2018-06-26 Thread Stefan Richter
Hi,

you can enable object reuse via the execution config [1]: "By default, objects 
are not reused in Flink. Enabling the object reuse mode will instruct the 
runtime to reuse user objects for better performance. Keep in mind that this 
can lead to bugs when the user-code function of an operation is not aware of 
this behavior."

Best,
Stefan

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.5/dev/execution_configuration.html
 

> On 22.06.2018 at 20:09, Martin, Nick wrote:
> 
> I have a job where I read data from Kafka, do some processing on it, and 
> write it to a database. When I read data out of Kafka, I put it into an 
> object that has a String field based on the Kafka message key. The possible 
> values for the message key are tightly constrained so there are fewer than 
> 100 possible unique key values. Profiling of the Flink job shows millions of 
> in-flight stream elements, with an equal number of Strings, but I know all 
> the strings are duplicates of a small number of unique values.  So it’s an 
> ideal use case for String interning. I’ve tried to use interning in the 
> constructors for the message elements, but I suspect that I need to do 
> something to preserve the interning when Flink serializes/deserializes 
> objects when passing them between operators. What’s the best way to 
> accomplish that?



String Interning

2018-06-22 Thread Martin, Nick
I have a job where I read data from Kafka, do some processing on it, and write 
it to a database. When I read data out of Kafka, I put it into an object that 
has a String field based on the Kafka message key. The possible values for the 
message key are tightly constrained so there are fewer than 100 possible unique 
key values. Profiling of the Flink job shows millions of in-flight stream 
elements, with an equal number of Strings, but I know all the strings are 
duplicates of a small number of unique values.  So it's an ideal use case for 
String interning. I've tried to use interning in the constructors for the 
message elements, but I suspect that I need to do something to preserve the 
interning when Flink serializes/deserializes objects when passing them between 
operators. What's the best way to accomplish that?
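
For illustration, one serializer-independent way to restore the sharing is to
re-canonicalize the key as the first step after a record has been
deserialized. The following is only a sketch in plain Java under stated
assumptions: KeyPoolDemo, Event, canonical, simulateDeserializedCopy and
recanonicalize are hypothetical names, and `new String(...)` merely imitates
the fresh instance a deserializer would produce. With fewer than 100 distinct
keys, String.intern() would serve the same purpose as the small pool used
here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class KeyPoolDemo {

    // Hypothetical pool holding one canonical instance per distinct key value.
    private static final ConcurrentMap<String, String> POOL = new ConcurrentHashMap<>();

    // Returns the pooled instance equal to 'key', adding 'key' if none exists yet.
    static String canonical(String key) {
        String existing = POOL.putIfAbsent(key, key);
        return existing != null ? existing : key;
    }

    // Hypothetical message element with a key drawn from a small set of values.
    static final class Event {
        private String key;

        Event(String key) {
            this.key = canonical(key);
        }

        String getKey() {
            return key;
        }

        void setKey(String key) {
            this.key = canonical(key);
        }
    }

    // Imitates a deserializer: writes a brand-new String straight into the field,
    // bypassing the constructor logic that canonicalizes the key.
    static Event simulateDeserializedCopy(Event source) {
        Event copy = new Event("");
        copy.key = new String(source.getKey());
        return copy;
    }

    // Step to run first after deserialization: swap the key for the pooled instance.
    static Event recanonicalize(Event event) {
        event.setKey(event.getKey());
        return event;
    }

    public static void main(String[] args) {
        Event first = new Event("sensor-a");
        Event copy = simulateDeserializedCopy(first);
        System.out.println(first.getKey() == copy.getKey()); // false: duplicate String

        recanonicalize(copy);
        System.out.println(first.getKey() == copy.getKey()); // true: shared again
    }
}
```

Whether a given serializer invokes constructors or setters depends on the
serializer, which is why the sketch applies the fix as an explicit step
rather than relying on either.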




