[jira] [Commented] (FLINK-28057) LD_PRELOAD is hardcoded to x64 on flink-docker

2022-08-30 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17597702#comment-17597702
 ] 

Nicolas Ferrario commented on FLINK-28057:
--

[~chesnay] this happens because the CI host doesn't have jemalloc installed and is effectively running on bare metal. We'd need to change the test suite to run against the Docker image instead to make that warning go away.

> LD_PRELOAD is hardcoded to x64 on flink-docker
> --
>
> Key: FLINK-28057
> URL: https://issues.apache.org/jira/browse/FLINK-28057
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.15.0
>Reporter: Nicolas Ferrario
>Assignee: Nicolas Ferrario
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 1.16.0, 1.15.2, 1.14.6
>
>
> ARM images are not using jemalloc because LD_PRELOAD is hardcoded to use an 
> x64 path, causing this error:
> {noformat}
> ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libjemalloc.so' from 
> LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
> {noformat}
> Right now docker-entrypoint is using this:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>     fi
> }
> {code}
> I propose we use this instead:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         # Maybe use export LD_PRELOAD=$LD_PRELOAD:/usr/lib/$(uname -i)-linux-gnu/libjemalloc.so
>         if [[ `uname -i` == 'aarch64' ]]; then
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/aarch64-linux-gnu/libjemalloc.so
>         else
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>         fi
>     fi
> }
> {code}
> https://github.com/apache/flink-docker/pull/117



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (FLINK-28057) LD_PRELOAD is hardcoded to x64 on flink-docker

2022-07-28 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17572626#comment-17572626
 ] 

Nicolas Ferrario commented on FLINK-28057:
--

Hi [~yunta]

[https://github.com/apache/flink-docker/pull/126] (1.14)

[https://github.com/apache/flink-docker/pull/125] (1.15)

> LD_PRELOAD is hardcoded to x64 on flink-docker
> --
>
> Key: FLINK-28057
> URL: https://issues.apache.org/jira/browse/FLINK-28057
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.15.0
>Reporter: Nicolas Ferrario
>Assignee: Nicolas Ferrario
>Priority: Minor
>  Labels: pull-request-available
>
> ARM images are not using jemalloc because LD_PRELOAD is hardcoded to use an 
> x64 path, causing this error:
> {noformat}
> ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libjemalloc.so' from 
> LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
> {noformat}
> Right now docker-entrypoint is using this:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>     fi
> }
> {code}
> I propose we use this instead:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         # Maybe use export LD_PRELOAD=$LD_PRELOAD:/usr/lib/$(uname -i)-linux-gnu/libjemalloc.so
>         if [[ `uname -i` == 'aarch64' ]]; then
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/aarch64-linux-gnu/libjemalloc.so
>         else
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>         fi
>     fi
> }
> {code}
> https://github.com/apache/flink-docker/pull/117





[jira] [Commented] (FLINK-28057) LD_PRELOAD is hardcoded to x64 on flink-docker

2022-06-14 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554111#comment-17554111
 ] 

Nicolas Ferrario commented on FLINK-28057:
--

Hi [~yunta]. As far as I know, Flink has been building official ARM images since 1.15, right? ([https://hub.docker.com/_/flink])

Everything works perfectly except for jemalloc, though that doesn't prevent Flink from starting.

 

A current workaround is to use these configs:
{code:yaml}
containerized.jobmanager.env.DISABLE_JEMALLOC: true
containerized.jobmanager.env.LD_PRELOAD: /usr/lib/aarch64-linux-gnu/libjemalloc.so
containerized.taskmanager.env.DISABLE_JEMALLOC: true
containerized.taskmanager.env.LD_PRELOAD: /usr/lib/aarch64-linux-gnu/libjemalloc.so
{code}
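To confirm that the manually configured LD_PRELOAD actually took effect (rather than being silently ignored, as in the error above), a quick check can be run inside the container. This is a sketch; the helper name and the use of PID 1 (the entrypoint process in a Flink container) are illustrative assumptions, not part of the official image:

```shell
#!/bin/sh
# Hypothetical helper: report whether jemalloc is mapped into PID 1.
# If LD_PRELOAD pointed at a path for the wrong architecture, ld.so
# ignores the entry and jemalloc never appears in the process maps.
check_jemalloc_loaded() {
    if grep -q jemalloc /proc/1/maps 2>/dev/null; then
        echo "jemalloc active"
    else
        echo "jemalloc NOT loaded"
    fi
}

check_jemalloc_loaded
```

On an ARM image with the hardcoded x64 path, this would report jemalloc as not loaded even though the entrypoint "enabled" it.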

> LD_PRELOAD is hardcoded to x64 on flink-docker
> --
>
> Key: FLINK-28057
> URL: https://issues.apache.org/jira/browse/FLINK-28057
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.15.0
>Reporter: Nicolas Ferrario
>Priority: Minor
>  Labels: pull-request-available
>
> ARM images are not using jemalloc because LD_PRELOAD is hardcoded to use an 
> x64 path, causing this error:
> {noformat}
> ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libjemalloc.so' from 
> LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
> {noformat}
> Right now docker-entrypoint is using this:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>     fi
> }
> {code}
> I propose we use this instead:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         # Maybe use export LD_PRELOAD=$LD_PRELOAD:/usr/lib/$(uname -i)-linux-gnu/libjemalloc.so
>         if [[ `uname -i` == 'aarch64' ]]; then
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/aarch64-linux-gnu/libjemalloc.so
>         else
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>         fi
>     fi
> }
> {code}
> https://github.com/apache/flink-docker/pull/117





[jira] [Updated] (FLINK-28057) LD_PRELOAD is hardcoded to x64 on flink-docker

2022-06-14 Thread Nicolas Ferrario (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-28057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nicolas Ferrario updated FLINK-28057:
-
Description: 
ARM images are not using jemalloc because LD_PRELOAD is hardcoded to use an x64 
path, causing this error:
{noformat}
ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libjemalloc.so' from LD_PRELOAD 
cannot be preloaded (cannot open shared object file): ignored.
{noformat}


Right now docker-entrypoint is using this:

{code:sh}
maybe_enable_jemalloc() {
    if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
        export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
    fi
}
{code}

I propose we use this instead:
{code:sh}
maybe_enable_jemalloc() {
    if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
        # Maybe use export LD_PRELOAD=$LD_PRELOAD:/usr/lib/$(uname -i)-linux-gnu/libjemalloc.so
        if [[ `uname -i` == 'aarch64' ]]; then
            export LD_PRELOAD=$LD_PRELOAD:/usr/lib/aarch64-linux-gnu/libjemalloc.so
        else
            export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
        fi
    fi
}
{code}

https://github.com/apache/flink-docker/pull/117

  was:
Right now docker-entrypoint is using this:

{code:sh}
maybe_enable_jemalloc() {
    if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
        export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
    fi
}
{code}

I propose we use this instead:
{code:sh}
maybe_enable_jemalloc() {
    if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
        # Maybe use export LD_PRELOAD=$LD_PRELOAD:/usr/lib/$(uname -i)-linux-gnu/libjemalloc.so
        if [[ `uname -i` == 'aarch64' ]]; then
            export LD_PRELOAD=$LD_PRELOAD:/usr/lib/aarch64-linux-gnu/libjemalloc.so
        else
            export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
        fi
    fi
}
{code}

https://github.com/apache/flink-docker/pull/117


> LD_PRELOAD is hardcoded to x64 on flink-docker
> --
>
> Key: FLINK-28057
> URL: https://issues.apache.org/jira/browse/FLINK-28057
> Project: Flink
>  Issue Type: Bug
>  Components: flink-docker
>Affects Versions: 1.15.0
>Reporter: Nicolas Ferrario
>Priority: Minor
>
> ARM images are not using jemalloc because LD_PRELOAD is hardcoded to use an 
> x64 path, causing this error:
> {noformat}
> ERROR: ld.so: object '/usr/lib/x86_64-linux-gnu/libjemalloc.so' from 
> LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
> {noformat}
> Right now docker-entrypoint is using this:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>     fi
> }
> {code}
> I propose we use this instead:
> {code:sh}
> maybe_enable_jemalloc() {
>     if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
>         # Maybe use export LD_PRELOAD=$LD_PRELOAD:/usr/lib/$(uname -i)-linux-gnu/libjemalloc.so
>         if [[ `uname -i` == 'aarch64' ]]; then
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/aarch64-linux-gnu/libjemalloc.so
>         else
>             export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
>         fi
>     fi
> }
> {code}
> https://github.com/apache/flink-docker/pull/117





[jira] [Created] (FLINK-28057) LD_PRELOAD is hardcoded to x64 on flink-docker

2022-06-14 Thread Nicolas Ferrario (Jira)
Nicolas Ferrario created FLINK-28057:


 Summary: LD_PRELOAD is hardcoded to x64 on flink-docker
 Key: FLINK-28057
 URL: https://issues.apache.org/jira/browse/FLINK-28057
 Project: Flink
  Issue Type: Bug
  Components: flink-docker
Affects Versions: 1.15.0
Reporter: Nicolas Ferrario


Right now docker-entrypoint is using this:

{code:sh}
maybe_enable_jemalloc() {
    if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
        export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
    fi
}
{code}

I propose we use this instead:
{code:sh}
maybe_enable_jemalloc() {
    if [ "${DISABLE_JEMALLOC:-false}" == "false" ]; then
        # Maybe use export LD_PRELOAD=$LD_PRELOAD:/usr/lib/$(uname -i)-linux-gnu/libjemalloc.so
        if [[ `uname -i` == 'aarch64' ]]; then
            export LD_PRELOAD=$LD_PRELOAD:/usr/lib/aarch64-linux-gnu/libjemalloc.so
        else
            export LD_PRELOAD=$LD_PRELOAD:/usr/lib/x86_64-linux-gnu/libjemalloc.so
        fi
    fi
}
{code}

https://github.com/apache/flink-docker/pull/117
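The same idea can be sketched without the explicit branch by deriving the Debian multiarch directory from the machine type. This is an untested sketch, not the committed fix; the helper name jemalloc_path and the arm64 alias are my additions, and it assumes the Debian/Ubuntu base-image layout used by flink-docker:

```shell
#!/bin/sh
# Sketch: map "uname -m" output to the multiarch jemalloc path.
# Unknown machine types fall back to the x86_64 path, preserving the
# current behavior on unexpected platforms.
jemalloc_path() {
    case "$(uname -m)" in
        aarch64|arm64) echo "/usr/lib/aarch64-linux-gnu/libjemalloc.so" ;;
        *)             echo "/usr/lib/x86_64-linux-gnu/libjemalloc.so" ;;
    esac
}

maybe_enable_jemalloc() {
    if [ "${DISABLE_JEMALLOC:-false}" = "false" ]; then
        # Append rather than clobber any LD_PRELOAD the user already set.
        export LD_PRELOAD="${LD_PRELOAD:+$LD_PRELOAD:}$(jemalloc_path)"
    fi
}

maybe_enable_jemalloc
```

`uname -m` is used here instead of `uname -i` because `-m` is specified by POSIX, while `-i` is a non-portable extension.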





[jira] [Commented] (FLINK-5964) Change TypeSerializers to allow construction of immutable types

2021-10-31 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436499#comment-17436499
 ] 

Nicolas Ferrario commented on FLINK-5964:
-

Agree. We started replacing our Scala pipelines with Kotlin and we couldn't be happier. It'd be awesome to have some Kotlin support, and maybe some new serializers/deserializers, like Scala case classes have.

> Change TypeSerializers to allow construction of immutable types
> ---
>
> Key: FLINK-5964
> URL: https://issues.apache.org/jira/browse/FLINK-5964
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System
>Affects Versions: 1.1.4, 2.0.0
>Reporter: Jayson Minard
>Priority: Minor
>
> If your programming language has a lot of immutable types (with no 
> default constructor), Flink forces you to create new read/write POJO 
> versions, otherwise the types are rejected by the system. In Kotlin, for example, 
> given a class and property values we can determine which constructor to call 
> and invoke it using knowledge of default values, nullable types, and which 
> properties can be set during or after construction.
> Flink provides no opportunity to use this model because Serializers are 
> littered with calls to `createInstance` that are not passed any values, so there is 
> no opportunity to fully inflate the object on construction.  
> This means that when you use Flink you throw away maybe hundreds of state 
> objects (my worst case) and have to create Flink-only variations, which 
> becomes grunt work that adds no value.  
> Currently TypeExtractor isn't extensible, and all of the special cases are 
> hard-coded. The order of checking for type information should be configurable 
> so that pluggable types can be added into the chain of analysis.  
> For example, before `analyzePojo` is called I could special-case a Kotlin 
> class, returning a different TypeInformation instance. But I don't think that 
> solves the whole problem, since other TypeSerializers make assumptions and 
> call `createInstance` on other TypeSerializers without knowing how they would 
> want to do the construction (in the Kotlin case it would be "tell me to 
> construct my instance, give me the list of named fields and serializers to 
> get their values, and let me decide what to do").
> What is the best idea for this change? With guidance, I can work on the PR.





[jira] [Commented] (FLINK-21321) Change RocksDB incremental checkpoint re-scaling to use deleteRange

2021-10-31 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436498#comment-17436498
 ] 

Nicolas Ferrario commented on FLINK-21321:
--

This would be really useful for us. We have a couple of Stateful Functions jobs 
that are small but use a ton of state. Sometimes we need to scale out or down, 
and that may take up to 2 hours, while a recovery without rescaling takes about 
5-10 minutes.

> Change RocksDB incremental checkpoint re-scaling to use deleteRange
> ---
>
> Key: FLINK-21321
> URL: https://issues.apache.org/jira/browse/FLINK-21321
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / State Backends
>Reporter: Joey Pereira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> In FLINK-8790, it was suggested to use RocksDB's {{deleteRange}} API to more 
> efficiently clip the databases for the desired target group.
> During the PR for that ticket, 
> [#5582|https://github.com/apache/flink/pull/5582], the change did not end up 
> using the {{deleteRange}} method, as it was an experimental feature in 
> RocksDB.
> At this point {{deleteRange}} is in a far less experimental state, though I 
> believe it is still formally "experimental". It is heavily used by many others, like 
> CockroachDB and TiKV, and they have teased out several bugs in complex 
> interactions over the years.
> For certain re-scaling situations where restores trigger 
> {{restoreWithScaling}} and the DB clipping logic, this would likely reduce an 
> O(n) operation (n = state size/records) to O(1). For large-state apps, this 
> would potentially represent a non-trivial amount of time spent on 
> re-scaling. In the case of my workplace, we have an operator with 100s of 
> billions of records in state and re-scaling was taking a long time (>>30 min, 
> but it has been a while since doing it).





[jira] [Commented] (FLINK-21409) Add Avro to DataTypes & Serialization docs

2021-10-31 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-21409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17436496#comment-17436496
 ] 

Nicolas Ferrario commented on FLINK-21409:
--

I think this is important. We learned the hard way that serializing a 
GenericRecord using TypeInformation.of actually falls back to Kryo. It used to 
work up to Flink 1.12, but we started getting (useful) errors in 1.13, probably 
also because Avro was bumped from 1.8 to 1.10.

Having docs would definitely help newcomers.

> Add Avro to DataTypes & Serialization docs
> --
>
> Key: FLINK-21409
> URL: https://issues.apache.org/jira/browse/FLINK-21409
> Project: Flink
>  Issue Type: Improvement
>  Components: API / Type Serialization System, Documentation, Formats 
> (JSON, Avro, Parquet, ORC, SequenceFile)
>Reporter: Chesnay Schepler
>Priority: Minor
>  Labels: auto-deprioritized-major
> Fix For: 1.15.0
>
>
> The "Data Types & Serialization" barely mention Avro, which is surprising 
> given how common it is.
> Even basic things like how to create a correct TypeInformation for 
> GenericRecords is missing, or special cases like FLINK-21386 which likely 
> just won't work.





[jira] [Commented] (FLINK-22533) Allow creating custom metrics

2021-10-11 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17427194#comment-17427194
 ] 

Nicolas Ferrario commented on FLINK-22533:
--

[~igal] just tested and it works great as well! Thanks a lot!

> Allow creating custom metrics
> -
>
> Key: FLINK-22533
> URL: https://issues.apache.org/jira/browse/FLINK-22533
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Igal Shilman
>Priority: Minor
>  Labels: auto-deprioritized-major, pull-request-available
>
> Currently it is not possible to create custom metrics in StateFun.
> Let us consider supporting these. 
>  
> Mailing list thread: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html





[jira] [Commented] (FLINK-22533) Allow creating custom metrics

2021-10-08 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17426259#comment-17426259
 ] 

Nicolas Ferrario commented on FLINK-22533:
--

Hi [~igal], I just tested it and it looks awesome! That's exactly what we 
needed.

One more thing we would like to have is the ability to use these metrics on 
StatefulFunctionModule. We have some filter logic there to ignore invalid 
events before they are sent over the network to some function.

I understand that may go against Statefun's goal, and it may end up looking 
like a hack, so I'd definitely understand if this proposal is rejected.

For example, this is our code (in Kotlin), where _stats_ is a StatsD client:
{code:java}
binder.bindIngressRouter(myIngress) { message, downstream ->
    if (message.isValid && message.auctionId > 0) {
        downstream.forward(Identifiers.storageType, message.auctionId.toString(), message)
    } else {
        stats.incrementCounter("invalid-events")
    }
}
{code}

> Allow creating custom metrics
> -
>
> Key: FLINK-22533
> URL: https://issues.apache.org/jira/browse/FLINK-22533
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Igal Shilman
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Currently it is not possible to create custom metrics in StateFun.
> Let us consider supporting these. 
>  
> Mailing list thread: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html





[jira] [Commented] (FLINK-22533) Allow creating custom metrics

2021-10-05 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424504#comment-17424504
 ] 

Nicolas Ferrario commented on FLINK-22533:
--

[~igal] sorry, didn't have a chance last week. I've been fighting some fires :( 
Will test this week for sure

> Allow creating custom metrics
> -
>
> Key: FLINK-22533
> URL: https://issues.apache.org/jira/browse/FLINK-22533
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Igal Shilman
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Currently it is not possible to create custom metrics in StateFun.
> Let us consider supporting these. 
>  
> Mailing list thread: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html





[jira] [Commented] (FLINK-22533) Allow creating custom metrics

2021-09-27 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420837#comment-17420837
 ] 

Nicolas Ferrario commented on FLINK-22533:
--

[~igal] awesome, I'll make sure to try it within this week. Btw I see you added 
`inc` and `dec` functions to the counter, so it can behave like a Gauge!

> Allow creating custom metrics
> -
>
> Key: FLINK-22533
> URL: https://issues.apache.org/jira/browse/FLINK-22533
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Igal Shilman
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Currently it is not possible to create custom metrics in StateFun.
> Let us consider supporting these. 
>  
> Mailing list thread: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html





[jira] [Commented] (FLINK-22533) Allow creating custom metrics

2021-09-27 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420729#comment-17420729
 ] 

Nicolas Ferrario commented on FLINK-22533:
--

Sounds great, thanks!

> Allow creating custom metrics
> -
>
> Key: FLINK-22533
> URL: https://issues.apache.org/jira/browse/FLINK-22533
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Igal Shilman
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Currently it is not possible to create custom metrics in StateFun.
> Let us consider supporting these. 
>  
> Mailing list thread: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html





[jira] [Commented] (FLINK-22533) Allow creating custom metrics

2021-09-27 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17420714#comment-17420714
 ] 

Nicolas Ferrario commented on FLINK-22533:
--

Hi [~igal],

That'd be sweet!! We're just using counters IIRC, so that would be more than 
enough for us! Gauges would be nice to have, but that can wait.

On that note, Accumulators would also be helpful, if that happens to be easier 
to implement for a first iteration.

Thanks!

> Allow creating custom metrics
> -
>
> Key: FLINK-22533
> URL: https://issues.apache.org/jira/browse/FLINK-22533
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Assignee: Igal Shilman
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Currently it is not possible to create custom metrics in StateFun.
> Let us consider supporting these. 
>  
> Mailing list thread: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html





[jira] [Commented] (FLINK-22533) Allow creating custom metrics

2021-09-23 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17419254#comment-17419254
 ] 

Nicolas Ferrario commented on FLINK-22533:
--

This would be really useful to us! We're currently hacking together a StatsD 
client, but we'd definitely love to have a native way to do this.

An embedded-function-only feature would be more than enough for us

> Allow creating custom metrics
> -
>
> Key: FLINK-22533
> URL: https://issues.apache.org/jira/browse/FLINK-22533
> Project: Flink
>  Issue Type: Improvement
>  Components: Stateful Functions
>Reporter: Igal Shilman
>Priority: Minor
>  Labels: auto-deprioritized-major
>
> Currently it is not possible to create custom metrics in StateFun.
> Let us consider supporting these. 
>  
> Mailing list thread: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-metrics-in-Stateful-Functions-td43282.html





[jira] [Commented] (FLINK-18229) Pending worker requests should be properly cleared

2021-05-26 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352240#comment-17352240
 ] 

Nicolas Ferrario commented on FLINK-18229:
--

Hi [~xintongsong], thank you very much! I'll try those workarounds, and 
otherwise I'll just wait for 1.14.

> Pending worker requests should be properly cleared
> --
>
> Key: FLINK-18229
> URL: https://issues.apache.org/jira/browse/FLINK-18229
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
> Coordination
>Affects Versions: 1.9.3, 1.10.1, 1.11.0
>Reporter: Xintong Song
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently, if Kubernetes/Yarn does not have enough resources to fulfill 
> Flink's resource requirement, there will be pending pod/container requests on 
> Kubernetes/Yarn. These pending resource requirements are never cleared until 
> either fulfilled or the Flink cluster is shut down.
> However, sometimes Flink no longer needs the pending resources. E.g., the 
> slot request is fulfilled by other slots that become available, or the 
> job failed due to slot request timeout (in a session cluster). In such cases, 
> Flink does not remove the resource request until the resource is allocated, 
> then it discovers that it no longer needs the allocated resource and releases 
> it. This would affect the underlying Kubernetes/Yarn cluster, especially 
> when the cluster is under heavy workload.
> It would be good for Flink to cancel pod/container requests as early as 
> possible if it can discover that some of the pending workers are no longer 
> needed.
> There are several approaches that could potentially achieve this.
>  # We can always check whether there's a pending worker that can be canceled 
> when a {{PendingTaskManagerSlot}} is unassigned.
>  # We can have a separate timeout for requesting a new worker. If the resource 
> cannot be allocated within the given time since requested, we should cancel 
> that resource request and claim a resource allocation failure.
>  # We can share the same timeout for starting a new worker (proposed in 
> FLINK-13554). This is similar to 2), but it requires the worker to be 
> registered, rather than allocated, before the timeout.





[jira] [Commented] (FLINK-18229) Pending worker requests should be properly cleared

2021-05-26 Thread Nicolas Ferrario (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352093#comment-17352093
 ] 

Nicolas Ferrario commented on FLINK-18229:
--

Hey [~xintongsong], what's the status of this ticket? We just hit this problem 
running on Native Kubernetes. We were using all resources available in a K8s 
cluster and killed a pod (TM) intentionally. Flink somehow requested 6 more 
TMs, and only one of them succeeded, since there were no more resources available 
for the rest. What's interesting is that the old TMs never stopped, so they got 
reused after the job recovered itself from the last checkpoint, but we were left 
with 5 TMs in a Pending state that will never go away unless we free some slots.

> Pending worker requests should be properly cleared
> --
>
> Key: FLINK-18229
> URL: https://issues.apache.org/jira/browse/FLINK-18229
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
> Coordination
>Affects Versions: 1.9.3, 1.10.1, 1.11.0
>Reporter: Xintong Song
>Priority: Major
> Fix For: 1.14.0
>
>
> Currently, if Kubernetes/Yarn does not have enough resources to fulfill 
> Flink's resource requirement, there will be pending pod/container requests on 
> Kubernetes/Yarn. These pending resource requirements are never cleared until 
> either fulfilled or the Flink cluster is shut down.
> However, sometimes Flink no longer needs the pending resources. E.g., the 
> slot request is fulfilled by other slots that become available, or the 
> job failed due to slot request timeout (in a session cluster). In such cases, 
> Flink does not remove the resource request until the resource is allocated, 
> then it discovers that it no longer needs the allocated resource and releases 
> it. This would affect the underlying Kubernetes/Yarn cluster, especially 
> when the cluster is under heavy workload.
> It would be good for Flink to cancel pod/container requests as early as 
> possible if it can discover that some of the pending workers are no longer 
> needed.
> There are several approaches that could potentially achieve this.
>  # We can always check whether there's a pending worker that can be canceled 
> when a {{PendingTaskManagerSlot}} is unassigned.
>  # We can have a separate timeout for requesting a new worker. If the resource 
> cannot be allocated within the given time since requested, we should cancel 
> that resource request and claim a resource allocation failure.
>  # We can share the same timeout for starting a new worker (proposed in 
> FLINK-13554). This is similar to 2), but it requires the worker to be 
> registered, rather than allocated, before the timeout.





[jira] [Created] (FLINK-21825) AvroInputFormat doesn't honor -1 length on FileInputSplits

2021-03-16 Thread Nicolas Ferrario (Jira)
Nicolas Ferrario created FLINK-21825:


 Summary: AvroInputFormat doesn't honor -1 length on FileInputSplits
 Key: FLINK-21825
 URL: https://issues.apache.org/jira/browse/FLINK-21825
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.12.2
Reporter: Nicolas Ferrario


FileInputSplit documentation says that a length of *-1* means the whole file 
should be read; however, AvroInputFormat expects the exact size.

[https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-core/src/main/java/org/apache/flink/core/fs/FileInputSplit.java#L50]
{code:java}
/**
 * Constructs a split with host information.
 *
 * @param num the number of this input split
 * @param file the file name
 * @param start the position of the first byte in the file to process
 * @param length the number of bytes in the file to process (-1 is flag for "read whole file")
 * @param hosts the list of hosts containing the block, possibly null
 */
public FileInputSplit(int num, Path file, long start, long length, String[] hosts)
{code}
[https://github.com/apache/flink/blob/97bfd049951f8d52a2e0aed14265074c4255ead0/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/AvroInputFormat.java#L141]

 
{code:java}
private DataFileReader<E> initReader(FileInputSplit split) throws IOException {
    DatumReader<E> datumReader;

    if (org.apache.avro.generic.GenericRecord.class == avroValueType) {
        datumReader = new GenericDatumReader<E>();
    } else {
        datumReader =
                org.apache.avro.specific.SpecificRecordBase.class.isAssignableFrom(avroValueType)
                        ? new SpecificDatumReader<E>(avroValueType)
                        : new ReflectDatumReader<E>(avroValueType);
    }
    if (LOG.isInfoEnabled()) {
        LOG.info("Opening split {}", split);
    }

    SeekableInput in =
            new FSDataInputStreamWrapper(
                    stream,
                    split.getPath().getFileSystem().getFileStatus(split.getPath()).getLen());
    DataFileReader<E> dataFileReader =
            (DataFileReader<E>) DataFileReader.openReader(in, datumReader);

    if (LOG.isDebugEnabled()) {
        LOG.debug("Loaded SCHEMA: {}", dataFileReader.getSchema());
    }

    end = split.getStart() + split.getLength(); // <--- THIS LINE
    recordsReadSinceLastSync = 0;
    return dataFileReader;
}
{code}
This could be fixed either by updating the documentation, or by honoring -1 in 
AvroInputFormat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)