RE: Re-start strategy without checkpointing enabled

2023-08-23 Thread Kamal Mittal via user
Thanks.

My only question is whether checkpointing and the job restart strategy are 
independent of each other. If checkpointing is not enabled and a restart 
strategy is configured, will Flink still retry the job as per that configuration? 
Checkpointing is not enabled because there is no state to maintain.

If checkpointing is not enabled, the “no restart” strategy is used. If 
checkpointing is activated and the restart strategy has not been configured, 
the fixed-delay strategy is used with Integer.MAX_VALUE restart attempts. See 
the following list of available restart strategies to learn what values are 
supported.
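
For illustration, a minimal sketch of explicitly configuring a restart strategy while leaving 
checkpointing disabled, assuming the Java DataStream API (the attempt count and delay are 
placeholder values):

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    import java.util.concurrent.TimeUnit;

    public class RestartWithoutCheckpointing {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpointing is deliberately NOT enabled (no call to enableCheckpointing).
            // The explicitly configured strategy still applies: the job is retried up to
            // 3 times, waiting 10 seconds between attempts.
            env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.of(10, TimeUnit.SECONDS)));

            // ... sources, transformations and sinks would be defined here ...
            env.execute("restart-strategy-demo");
        }
    }

The same strategy can also be set in the Flink configuration via restart-strategy: fixed-delay, 
restart-strategy.fixed-delay.attempts and restart-strategy.fixed-delay.delay; an explicitly 
configured strategy applies whether or not checkpointing is enabled.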

Rgds,
Kamal

From: liu ron 
Sent: 24 August 2023 07:43 AM
To: user@flink.apache.org
Subject: Re: Re-start strategy without checkpointing enabled

Hi, Kamal

As Hang says. Some extra info about the job failover strategy is available for reference in [1]

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/

Best,
Ron

Hang Ruan <ruanhang1...@gmail.com> wrote on Wed, Aug 23, 2023 at 22:27:
Hi, Kamal.

If we don't enable checkpointing, the job will be started with the configured startup mode 
each time.
For example, suppose the job reads Kafka from the earliest offset and writes to MySQL. 
If the job fails over without checkpointing, the tasks will consume Kafka from 
the earliest offset again.

I think it is best to enable checkpointing so that the job restarts from the position 
where it stopped reading.
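
As a rough sketch of what enabling checkpointing looks like in the Java DataStream API 
(the interval and mode below are placeholder choices, not recommendations):

    import org.apache.flink.streaming.api.CheckpointingMode;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class EnableCheckpointingSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Checkpoint every 60 s: on failover the Kafka source can then resume from
            // the offsets stored in the last completed checkpoint instead of the
            // configured startup offsets (e.g. earliest).
            env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

            // ... Kafka source -> transformations -> MySQL sink would be defined here ...
            env.execute("checkpointing-demo");
        }
    }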

Best,
Hang

Kamal Mittal via user <user@flink.apache.org> wrote on Wed, Aug 23, 2023 at 18:46:
Hello,

If checkpointing is NOT enabled and a restart strategy is configured, does Flink 
retry the whole job execution? I.e. is enabling checkpointing a must for retries 
or not?

Rgds,
Kamal


Re: K8s Application mode cannot support Java dynamic compilation in the Flink jar?

2023-08-23 Thread Weihua Hu
Hi,

Sorry, I am not very familiar with JavaCompiler. I would like to know whether this dynamic 
compilation runs in the UserJar's main method, and how the compiled artifacts are passed to 
Flink?

Best,
Weihua


On Tue, Aug 22, 2023 at 5:12 PM 周坤 <18679131...@163.com> wrote:

> Hello!
>
> I have a question about running Flink in K8s application mode that needs answering.
>
> A Flink job that originally ran in YARN per-job mode needs to be switched to K8s
> application mode.
>
>
>
> Our company environment currently provides a common base Flink 1.13 image.
>
> usrJar: a Flink job we implemented ourselves. It uses javax.tools.JavaCompiler to
> dynamically load Java classes from a database, then compiles, loads and runs them.
>
> When switching over, the job reports that the dependencies of the classes to be
> dynamically compiled cannot be found.
>
>
>
> The dependencies of the class files that need dynamic compilation all exist in the usrJar,
> but at startup the dependent classes are reported as not found.
> At first I thought Flink's classloading mechanism was the cause: adding
> classloader.resolve-order: parent-first had no effect either.
> Later I found that putting the dependencies of the classes to be compiled into lib makes it
> work, but that defeats the purpose of dynamic compilation.
>
> I am quite confused by this. What is causing it? Looking forward to a reply.
>
>
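
For context, a minimal sketch (not from this thread) of the compile-and-load pattern being 
described, using javax.tools and a URLClassLoader; the class name, source text and paths are 
placeholders. The point relevant to the missing-dependency problem is that the system compiler 
only sees what is passed on -classpath, and the freshly compiled class can only resolve classes 
from the user jar if its classloader's parent is the user-code classloader:

    import javax.tools.JavaCompiler;
    import javax.tools.ToolProvider;

    import java.net.URL;
    import java.net.URLClassLoader;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class DynamicCompileSketch {

        public static Class<?> compileAndLoad(String className, String sourceCode) throws Exception {
            Path workDir = Files.createTempDirectory("dyn-compile");
            Path sourceFile = workDir.resolve(className + ".java");
            Files.write(sourceFile, sourceCode.getBytes(StandardCharsets.UTF_8));

            // The system compiler only sees the JVM classpath; dependencies that live only
            // inside the user jar must be added to -classpath explicitly.
            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            int result = compiler.run(null, null, null,
                    "-classpath", System.getProperty("java.class.path"),
                    "-d", workDir.toString(),
                    sourceFile.toString());
            if (result != 0) {
                throw new IllegalStateException("Compilation failed for " + className);
            }

            // Use the current (user-code) classloader as parent so the compiled class can
            // resolve classes shipped in the user jar at runtime.
            ClassLoader parent = Thread.currentThread().getContextClassLoader();
            URLClassLoader loader = new URLClassLoader(new URL[]{workDir.toUri().toURL()}, parent);
            return Class.forName(className, true, loader);
        }
    }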


Re: Re-start strategy without checkpointing enabled

2023-08-23 Thread liu ron
Hi, Kamal

As Hang says. Some extra info about the job failover strategy is available for reference in
[1]

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/task_failure_recovery/

Best,
Ron

Hang Ruan wrote on Wed, Aug 23, 2023 at 22:27:

> Hi, Kamal.
>
> If we don't enable checkpointing, the job will be started with the configured
> startup mode each time.
> For example, suppose the job reads Kafka from the earliest offset and writes to
> MySQL. If the job fails over without checkpointing, the tasks will consume
> Kafka from the earliest offset again.
>
> I think it is best to enable checkpointing so that the job restarts from the
> position where it stopped reading.
>
> Best,
> Hang
>
Kamal Mittal via user wrote on Wed, Aug 23, 2023 at 18:46:
>
>> Hello,
>>
>>
>>
>> If checkpointing is NOT enabled and a restart strategy is configured, does
>> Flink retry the whole job execution? I.e. is enabling checkpointing a must
>> for retries or not?
>>
>>
>>
>> Rgds,
>>
>> Kamal
>>
>


Re: Flink 1.17.2 planned?

2023-08-23 Thread Jing Ge via user
Hi Christian,

Thanks for your understanding. We will take a look at 1.17.2, once the 1.18
release is done. In the meantime, there might be someone in the community
who volunteers to be the 1.17.2 release manager. You will see related email
threads on the dev mailing list. Please stay tuned :-)

Best regards,
Jing

On Wed, Aug 23, 2023 at 9:27 AM Christian Lorenz 
wrote:

> Hi Jing,
>
>
>
> thanks for the answer. I have no idea what kind of work is needed for
> being a release manager. I think we’ll have to wait for the release then
> (if really urgent, the blocking bug can also be patched by us).
>
>
>
> Kind regards,
>
> Christian
>
>
>
> *From: *Jing Ge via user 
> *Date: *Tuesday, August 22, 2023 at 11:40
> *To: *liu ron 
> *Cc: *user@flink.apache.org 
> *Subject: *Re: Flink 1.17.2 planned?
>
>
>
>
> Hi Christian,
>
>
>
> Thanks for reaching out. Like Ron pointed out, the community is
> focusing on the 1.18 release. If you are facing urgent issues, would you
> like to volunteer as the release manager of 1.17.2 and drive the release?
> Theoretically, everyone could be the release manager of a bugfix release.
>
>
>
> Best regards,
>
> Jing
>
>
>
> On Tue, Aug 22, 2023 at 3:41 AM liu ron  wrote:
>
> Hi, Christian
>
>
>
> We released 1.17.1 [1] in May, and the main focus of the community is
> currently on the 1.18 release, so 1.17.2 should be planned for after the
> 1.18 release!
>
>
>
> [1]
> https://flink.apache.org/2023/05/25/apache-flink-1.17.1-release-announcement/
>
>
>
>
>
> Best,
>
> Ron
>
>
>
Christian Lorenz via user wrote on Mon, Aug 21, 2023 at 17:33:
>
> Hi team,
>
>
>
> is there any info available about a bugfix release 1.17.2? E.g. will
> there be another bugfix release of 1.17 / approximate timing?
>
> We are hit by https://issues.apache.org/jira/browse/FLINK-32296 which
> leads to wrong SQL responses in some circumstances.
>
>
>
> Kind regards,
>
> Christian
>
> This e-mail is from Mapp Digital Group and its international legal
> entities and may contain information that is confidential.
> If you are not the intended recipient, do not read, copy or distribute the
> e-mail or any attachments. Instead, please notify the sender and delete the
> e-mail and any attachments.
>
>


Re: Request-Response flow for real-time analytics

2023-08-23 Thread Alex Cruise
This is a pretty hard problem. I would be inclined to try Queryable State (
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/queryable_state/)
first.
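
For reference, a rough job-side sketch of the Queryable State idea, assuming the DataStream API 
and queryable state enabled on the cluster; the state name "customer-totals" and the sample data 
are made up. A separate process would then look up per-customer totals by key via 
QueryableStateClient:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class QueryableTotalsSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // (customerId, amount) events; a Kafka source would replace fromElements in practice.
            env.fromElements(Tuple2.of("CUST1", 100L), Tuple2.of("CUST2", 5L), Tuple2.of("CUST1", 15L))
                    .keyBy(t -> t.f0)
                    .sum(1)                               // running total per customer
                    .keyBy(t -> t.f0)
                    .asQueryableState("customer-totals"); // latest total, queryable by customerId

            env.execute("queryable-state-sketch");
        }
    }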

-0xe1a

On Mon, Aug 21, 2023 at 11:04 PM Jiten Pathy  wrote:

> Hi,
> We are currently evaluating Flink for our analytics engine. We would
> appreciate any help with our experiment in using flink for real-time
> request-response use-case.
>
> To demonstrate the current use-case: our application produces events of
> the following form:
>
> {id, customerId, amount, timestamp}
>
> We calculate some continuous aggregates triggered by each event produced
> and use them to decide on the action.
>
> Examples of Aggregates: sum of amount total, amount group by customerId,
> amount per day(group-by customer), per month etc.
>
> One approach we considered is to correlate the aggregates with the `Id`,
> So for the following input events:
>
> {1, "CUST1", 100, $TS1}
> {2, "CUST2", 5, $TS2}
> {3, "CUST1", 15, $TS3}
>
> We would generate the following(ignoring timestamp for now) into kafka:
>
> {1, "TOTAL", 100} , {1, "GROUPBYCUSTTOTAL", "CUST1", 100}
> {2, "TOTAL", 105} , {2, "GROUPBYCUSTTOTAL", "CUST2", 5}
> {3, "TOTAL", 120} , {3, "GROUPBYCUSTTOTAL", "CUST1", 115}
>
> And our application would read from kafka and process them.
>
> So the flow looks like:
>
> Application -- kafka---> flink --> kafka <--- Application
>
> We want to keep our current request - response model i.e. we need all
> continuous aggregates out for every ingested event into flink, before we
> can further process the said event.
>
> Unfortunately we don't see a way to do this in flink-SQL: As the
> aggregates would not have the requestId for us to correlate with e.g. for
> the following simple continuous query:
> SELECT sum(amount) from EVENTS
>
> We have tried doing this with flink-Datastream API: KeyedProcessFunction
>  with MapState per window, and collecting in processElement and using
> Kafka sink.
>
> A sample code for the windowing would look like the following:
>
>  public void processElement(Transaction transaction, 
> KeyedProcessFunction.Context context, 
> Collector collector) throws Exception {
> ()
> collector.collect(new Aggregate(transaction.getId(), 
> context.getCurrentKey(), agg0, evTime));
> }
>
> If we were to use FlinkSQL instead, how would we accomplish this
> functionality?
>
> If there are any alternative approaches to accomplish this while
> maintaining our invariant: every event must produce all aggregates that
> consume the corresponding event, we would love to hear from the community.
>
> Regards,
>
> Jiten
>
> *The information contained in this transmission (including any
> attachments) is confidential and may be privileged. It is intended only for
> the use of the individual or entity named above. If you are not the
> intended recipient; dissemination, distribution, or copy of this
> communication is strictly prohibited. If you have received this
> communication in error, please erase all copies of this message and its
> attachments and notify me immediately.*
>
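
One option worth exploring in plain Flink SQL: an OVER aggregation emits exactly one output row 
per input row, so the event id can be carried alongside the running aggregate instead of being 
lost in a GROUP BY. A minimal sketch, assuming the Table API and using the datagen connector plus 
processing time only to keep it self-contained (a real job would read the Kafka-backed events 
table and order by its event-time attribute):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class PerEventAggregatesSketch {
        public static void main(String[] args) {
            TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Placeholder source; the schema mirrors {id, customerId, amount, timestamp}.
            tEnv.executeSql(
                    "CREATE TABLE events (" +
                    "  id BIGINT," +
                    "  customer_id STRING," +
                    "  amount INT," +
                    "  proc AS PROCTIME()" +
                    ") WITH ('connector' = 'datagen', 'rows-per-second' = '5')");

            // Each input row produces one output row, so the triggering event's id stays
            // attached to the per-customer running total it produced. A non-partitioned
            // OVER window in a separate query can carry the global total the same way.
            tEnv.executeSql(
                    "SELECT id, customer_id, amount," +
                    "  SUM(amount) OVER (PARTITION BY customer_id ORDER BY proc" +
                    "    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS customer_total" +
                    " FROM events")
                .print();
        }
    }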


Re: Re-start strategy without checkpointing enabled

2023-08-23 Thread Hang Ruan
Hi, Kamal.

If we don't enable checkpointing, the job will be started with the configured startup
mode each time.
For example, suppose the job reads Kafka from the earliest offset and writes to
MySQL. If the job fails over without checkpointing, the tasks will consume
Kafka from the earliest offset again.

I think it is best to enable checkpointing so that the job restarts from the position
where it stopped reading.

Best,
Hang

Kamal Mittal via user wrote on Wed, Aug 23, 2023 at 18:46:

> Hello,
>
>
>
> If checkpointing is NOT enabled and a restart strategy is configured, does
> Flink retry the whole job execution? I.e. is enabling checkpointing a must
> for retries or not?
>
>
>
> Rgds,
>
> Kamal
>


Re-start strategy without checkpointing enabled

2023-08-23 Thread Kamal Mittal via user
Hello,

If checkpointing is NOT enabled and a restart strategy is configured, does Flink 
retry the whole job execution? I.e. is enabling checkpointing a must for retries 
or not?

Rgds,
Kamal


Re: Request-Response flow for real-time analytics

2023-08-23 Thread xiangyu feng
Hi Pathy,

Please check if the following SQL fits your need:


CREATE TABLE event1 (id BIGINT, customer_id STRING, amount INT, ts TIMESTAMP
);

CREATE VIEW event2 AS
SELECT *
FROM event1;

CREATE VIEW event3 AS
SELECT *
FROM event1;

CREATE VIEW temp1 AS
SELECT event2.id AS id,
'Total' AS Total,
SUM(event3.amount) AS total_amount
FROM event2
LEFT JOIN
event3
ON event2.ts >= event3.ts
GROUP BY
event2.id;

CREATE VIEW temp2 AS
SELECT MAX(event2.id) as id,
event2.customer_id AS customer_id,
SUM(event3.amount) AS client_total_amount
FROM event2
LEFT JOIN
event3
ON event2.ts >= event3.ts
AND event2.customer_id = event3.customer_id
GROUP BY
event2.customer_id;

Regards,
Xiangyu

Jiten Pathy wrote on Wed, Aug 23, 2023 at 16:38:

> Hi Xiangyu,
> Yes, that's correct. It is the requestId; we will have one for each request.
>
> On Wed, 23 Aug 2023 at 13:47, xiangyu feng  wrote:
>
>> Hi Pathy,
>>
>> I want to know if the 'id' in {id, customerId, amount, timestamp} stands
>> for 'requestId'? If not,  how is this 'id' field generated and can we
>> add 'requestId' field in the event?
>>
>> Thx,
>> Xiangyu
>>
>> Jiten Pathy wrote on Tue, Aug 22, 2023 at 14:04:
>>
>>> Hi,
>>> We are currently evaluating Flink for our analytics engine. We would
>>> appreciate any help with our experiment in using flink for real-time
>>> request-response use-case.
>>>
>>> To demonstrate the current use-case: our application produces events of
>>> the following form:
>>>
>>> {id, customerId, amount, timestamp}
>>>
>>> We calculate some continuous aggregates triggered by each event produced
>>> and use them to decide on the action.
>>>
>>> Examples of Aggregates: sum of amount total, amount group by customerId,
>>> amount per day(group-by customer), per month etc.
>>>
>>> One approach we considered is to correlate the aggregates with the `Id`,
>>> So for the following input events:
>>>
>>> {1, "CUST1", 100, $TS1}
>>> {2, "CUST2", 5, $TS2}
>>> {3, "CUST1", 15, $TS3}
>>>
>>> We would generate the following(ignoring timestamp for now) into kafka:
>>>
>>> {1, "TOTAL", 100} , {1, "GROUPBYCUSTTOTAL", "CUST1", 100}
>>> {2, "TOTAL", 105} , {2, "GROUPBYCUSTTOTAL", "CUST2", 5}
>>> {3, "TOTAL", 120} , {3, "GROUPBYCUSTTOTAL", "CUST1", 115}
>>>
>>> And our application would read from kafka and process them.
>>>
>>> So the flow looks like:
>>>
>>> Application -- kafka---> flink --> kafka <--- Application
>>>
>>> We want to keep our current request - response model i.e. we need all
>>> continuous aggregates out for every ingested event into flink, before
>>> we can further process the said event.
>>>
>>> Unfortunately we don't see a way to do this in flink-SQL: As the
>>> aggregates would not have the requestId for us to correlate with e.g.
>>> for the following simple continuous query:
>>> SELECT sum(amount) from EVENTS
>>>
>>> We have tried doing this with flink-Datastream API: KeyedProcessFunction
>>>  with MapState per window, and collecting in processElement and using
>>> Kafka sink.
>>>
>>> A sample code for the windowing would look like the following:
>>>
>>>  public void processElement(Transaction transaction, 
>>> KeyedProcessFunction.Context context, 
>>> Collector collector) throws Exception {
>>> ()
>>> collector.collect(new Aggregate(transaction.getId(), 
>>> context.getCurrentKey(), agg0, evTime));
>>> }
>>>
>>> If we were to use FlinkSQL instead, how would we accomplish this
>>> functionality?
>>>
>>> If there are any alternative approaches to accomplish this while
>>> maintaining our invariant: every event must produce all aggregates that
>>> consume the corresponding event, we would love to hear from the community.
>>>
>>> Regards,
>>>
>>> Jiten
>>>
>>> *The information contained in this transmission (including any
>>> attachments) is confidential and may be privileged. It is intended only for
>>> the use of the individual or entity named above. If you are not the
>>> intended recipient; dissemination, distribution, or copy of this
>>> communication is strictly prohibited. If you have received this
>>> communication in error, please erase all copies of this message and its
>>> attachments and notify me immediately.*
>>>
>>
> *The information contained in this transmission (including any
> attachments) is confidential and may be privileged. It is intended only for
> the use of the individual or entity named above. If you are not the
> intended recipient; dissemination, distribution, or copy of this
> communication is strictly prohibited. If you have received this
> communication in error, please erase all copies of this message and its
> attachments and notify me immediately.*
>


Re: Request-Response flow for real-time analytics

2023-08-23 Thread Jiten Pathy
Hi Xiangyu,
Yes, that's correct. It is the requestId; we will have one for each request.

On Wed, 23 Aug 2023 at 13:47, xiangyu feng  wrote:

> Hi Pathy,
>
> I want to know if the 'id' in {id, customerId, amount, timestamp} stands
> for 'requestId'? If not,  how is this 'id' field generated and can we add
> 'requestId' field in the event?
>
> Thx,
> Xiangyu
>
> Jiten Pathy wrote on Tue, Aug 22, 2023 at 14:04:
>
>> Hi,
>> We are currently evaluating Flink for our analytics engine. We would
>> appreciate any help with our experiment in using flink for real-time
>> request-response use-case.
>>
>> To demonstrate the current use-case: our application produces events of
>> the following form:
>>
>> {id, customerId, amount, timestamp}
>>
>> We calculate some continuous aggregates triggered by each event produced
>> and use them to decide on the action.
>>
>> Examples of Aggregates: sum of amount total, amount group by customerId,
>> amount per day(group-by customer), per month etc.
>>
>> One approach we considered is to correlate the aggregates with the `Id`,
>> So for the following input events:
>>
>> {1, "CUST1", 100, $TS1}
>> {2, "CUST2", 5, $TS2}
>> {3, "CUST1", 15, $TS3}
>>
>> We would generate the following(ignoring timestamp for now) into kafka:
>>
>> {1, "TOTAL", 100} , {1, "GROUPBYCUSTTOTAL", "CUST1", 100}
>> {2, "TOTAL", 105} , {2, "GROUPBYCUSTTOTAL", "CUST2", 5}
>> {3, "TOTAL", 120} , {3, "GROUPBYCUSTTOTAL", "CUST1", 115}
>>
>> And our application would read from kafka and process them.
>>
>> So the flow looks like:
>>
>> Application -- kafka---> flink --> kafka <--- Application
>>
>> We want to keep our current request - response model i.e. we need all
>> continuous aggregates out for every ingested event into flink, before we
>> can further process the said event.
>>
>> Unfortunately we don't see a way to do this in flink-SQL: As the
>> aggregates would not have the requestId for us to correlate with e.g.
>> for the following simple continuous query:
>> SELECT sum(amount) from EVENTS
>>
>> We have tried doing this with flink-Datastream API: KeyedProcessFunction
>>  with MapState per window, and collecting in processElement and using
>> Kafka sink.
>>
>> A sample code for the windowing would look like the following:
>>
>>  public void processElement(Transaction transaction, 
>> KeyedProcessFunction.Context context, 
>> Collector collector) throws Exception {
>> ()
>> collector.collect(new Aggregate(transaction.getId(), 
>> context.getCurrentKey(), agg0, evTime));
>> }
>>
>> If we were to use FlinkSQL instead, how would we accomplish this
>> functionality?
>>
>> If there are any alternative approaches to accomplish this while
>> maintaining our invariant: every event must produce all aggregates that
>> consume the corresponding event, we would love to hear from the community.
>>
>> Regards,
>>
>> Jiten
>>
>> *The information contained in this transmission (including any
>> attachments) is confidential and may be privileged. It is intended only for
>> the use of the individual or entity named above. If you are not the
>> intended recipient; dissemination, distribution, or copy of this
>> communication is strictly prohibited. If you have received this
>> communication in error, please erase all copies of this message and its
>> attachments and notify me immediately.*
>>
>

-- 
*The information contained in this transmission (including any attachments) 
is confidential and may be privileged. It is intended only for the use of 
the individual or entity named above. If you are not the intended 
recipient; dissemination, distribution, or copy of this communication is 
strictly prohibited. If you have received this communication in error, 
please erase all copies of this message and its attachments and notify me 
immediately.*


Re: Request-Response flow for real-time analytics

2023-08-23 Thread xiangyu feng
Hi Pathy,

I want to know if the 'id' in {id, customerId, amount, timestamp} stands
for 'requestId'. If not, how is this 'id' field generated, and can we add a
'requestId' field to the event?

Thx,
Xiangyu

Jiten Pathy wrote on Tue, Aug 22, 2023 at 14:04:

> Hi,
> We are currently evaluating Flink for our analytics engine. We would
> appreciate any help with our experiment in using flink for real-time
> request-response use-case.
>
> To demonstrate the current use-case: our application produces events of
> the following form:
>
> {id, customerId, amount, timestamp}
>
> We calculate some continuous aggregates triggered by each event produced
> and use them to decide on the action.
>
> Examples of Aggregates: sum of amount total, amount group by customerId,
> amount per day(group-by customer), per month etc.
>
> One approach we considered is to correlate the aggregates with the `Id`,
> So for the following input events:
>
> {1, "CUST1", 100, $TS1}
> {2, "CUST2", 5, $TS2}
> {3, "CUST1", 15, $TS3}
>
> We would generate the following(ignoring timestamp for now) into kafka:
>
> {1, "TOTAL", 100} , {1, "GROUPBYCUSTTOTAL", "CUST1", 100}
> {2, "TOTAL", 105} , {2, "GROUPBYCUSTTOTAL", "CUST2", 5}
> {3, "TOTAL", 120} , {3, "GROUPBYCUSTTOTAL", "CUST1", 115}
>
> And our application would read from kafka and process them.
>
> So the flow looks like:
>
> Application -- kafka---> flink --> kafka <--- Application
>
> We want to keep our current request - response model i.e. we need all
> continuous aggregates out for every ingested event into flink, before we
> can further process the said event.
>
> Unfortunately we don't see a way to do this in flink-SQL: As the
> aggregates would not have the requestId for us to correlate with e.g. for
> the following simple continuous query:
> SELECT sum(amount) from EVENTS
>
> We have tried doing this with flink-Datastream API: KeyedProcessFunction
>  with MapState per window, and collecting in processElement and using
> Kafka sink.
>
> A sample code for the windowing would look like the following:
>
>  public void processElement(Transaction transaction, 
> KeyedProcessFunction.Context context, 
> Collector collector) throws Exception {
> ()
> collector.collect(new Aggregate(transaction.getId(), 
> context.getCurrentKey(), agg0, evTime));
> }
>
> If we were to use FlinkSQL instead, how would we accomplish this
> functionality?
>
> If there are any alternative approaches to accomplish this while
> maintaining our invariant: every event must produce all aggregates that
> consume the corresponding event, we would love to hear from the community.
>
> Regards,
>
> Jiten
>
> *The information contained in this transmission (including any
> attachments) is confidential and may be privileged. It is intended only for
> the use of the individual or entity named above. If you are not the
> intended recipient; dissemination, distribution, or copy of this
> communication is strictly prohibited. If you have received this
> communication in error, please erase all copies of this message and its
> attachments and notify me immediately.*
>


Fwd: [Discussion] Slack Channel

2023-08-23 Thread Jing Ge via user
Hi devs,

Thanks Giannis for your suggestion. It seems that the last email wasn't
sent to the dev ML. It is also an interesting topic for devs and user-zh.

Best regards,
Jing

-- Forwarded message -
From: Giannis Polyzos 
Date: Tue, Aug 22, 2023 at 11:11 AM
Subject: [Discussion] Slack Channel
To: user , 


Hello folks,
considering how Apache Flink is gaining more and more popularity, and seeing how
other open-source projects use Slack, I wanted to start this thread to see
how we can grow the community.
First of all, one thing I have noticed is that even people already involved
with Flink often only notice late that there is actually an open-source
Slack channel.
Maybe we can somehow help raise awareness around that, for example by inviting
people to join?
The other part is around the channels.
Currently, there is only one channel #troubleshooting for people to ask
questions.
I believe this creates a few limitations, as there are quite a few
questions daily, but it's hard to differentiate between topics, and people
with context on specific parts can't identify them easily.
I'm thinking it would be nice to create more channels like:
#flink-cdc
#flink-paimon
#flink-datastream
#pyflink
#flink-sql
#flink-statebackends
#flink-monitoring
#flink-k8s
#job-board etc.
to help people have more visibility on different topics, make it easier to
find answers to similar questions, and search for things of interest.
This can be a first step towards growing the community.

Best,
Giannis


Fwd: [Discussion] Slack Channel

2023-08-23 Thread Jing Ge
Hi devs,

Thanks Giannis for your suggestion. It seems that the last email wasn't
sent to the dev ML. It is also an interesting topic for devs and user-zh.

Best regards,
Jing

-- Forwarded message -
From: Giannis Polyzos 
Date: Tue, Aug 22, 2023 at 11:11 AM
Subject: [Discussion] Slack Channel
To: user , 


Hello folks,
considering how Apache Flink is gaining more and more popularity, and seeing how
other open-source projects use Slack, I wanted to start this thread to see
how we can grow the community.
First of all, one thing I have noticed is that even people already involved
with Flink often only notice late that there is actually an open-source
Slack channel.
Maybe we can somehow help raise awareness around that, for example by inviting
people to join?
The other part is around the channels.
Currently, there is only one channel #troubleshooting for people to ask
questions.
I believe this creates a few limitations, as there are quite a few
questions daily, but it's hard to differentiate between topics, and people
with context on specific parts can't identify them easily.
I'm thinking it would be nice to create more channels like:
#flink-cdc
#flink-paimon
#flink-datastream
#pyflink
#flink-sql
#flink-statebackends
#flink-monitoring
#flink-k8s
#job-board etc.
to help people have more visibility on different topics, make it easier to
find answers to similar questions, and search for things of interest.
This can be a first step towards growing the community.

Best,
Giannis


Re: [DISCUSS] Status of Statefun Project

2023-08-23 Thread Filip Karnicki
Hi Gordon

Any chance we could get this reviewed and released to the central repo?
We're currently forced to use a Flink version that has a nasty bug causing
an operational nightmare.

Many thanks
Fil

On Sat, 19 Aug 2023 at 01:38, Galen Warren via user 
wrote:

> Gotcha, makes sense as to the original division.
>
> >> Can this be solved by simply passing in the path to the artifacts
>
> This definitely works if we're going to be copying the artifacts on the
> host side -- into the build context -- and then from the context into the
> image. It only gets tricky to have a potentially varying path to the
> artifacts if we're trying to *directly *include the artifacts in the
> Docker context -- then we have a situation where the Docker context must
> contain both the artifacts and playground files, with (potentially)
> different root locations.
>
> Maybe the simplest thing to do here is just to leave the playground as-is
> and then copy the artifacts into the Docker context manually, prior to
> building the playground images. I'm fine with that. It will mean that each
> Statefun release will require two PRs and two sets of build/publish steps
> instead of one, but if everyone else is fine with that I am, too. Unless
> anyone objects, I'll go ahead and queue up a PR for the playground that
> makes these changes.
>
> Also, I should mention -- in case it's not clear -- that I have already
> built and run the playground examples with the code from the PR and
> everything worked. So that PR is ready to move forward with review, etc.,
> at this point.
>
> Thanks.
>
>
>
>
>
>
>
> On Fri, Aug 18, 2023 at 4:16 PM Tzu-Li (Gordon) Tai 
> wrote:
>
>> Hi Galen,
>>
>> The original intent of having a separate repo for the playground repo,
>> was that StateFun users can just go to that and start running simple
>> examples without any other distractions from the core code. I personally
>> don't have a strong preference here and can understand how it would make
>> the workflow more streamlined, but just FYI on the reasoning why are
>> separate in the first place.
>>
>> re: paths for locating StateFun artifacts.
>> Can this be solved by simply passing in the path to the artifacts? As
>> well as the image tag for the locally build base StateFun image. They could
>> probably be environment variables.
>>
>> Cheers,
>> Gordon
>>
>> On Fri, Aug 18, 2023 at 12:13 PM Galen Warren via user <
>> user@flink.apache.org> wrote:
>>
>>> Yes, exactly! And in addition to the base Statefun jars and the jar for
>>> the Java SDK, it does an equivalent copy/register operation for each of the
>>> other SDK libraries (Go, Python, Javascript) so that those libraries are
>>> also available when building the playground examples.
>>>
>>> One more question: In order to copy the various build artifacts into the
>>> Docker containers, those artifacts need to be part of the Docker context.
>>> With the playground being a separate project, that's slightly tricky to do,
>>> as there is no guarantee (other than convention) about the relative paths
>>> of *flink-statefun* and* flink-statefun-playground *in someone's local
>>> filesystem. The way I've set this up locally is to copy the playground into
>>> the* flink-statefun* project -- i.e. to *flink-statefun*/playground --
>>> such that I can set the Docker context to the root folder of
>>> *flink-statefun* and then have access to any local code and/or build
>>> artifacts.
>>>
>>> I'm wondering if there might be any appetite for making that move
>>> permanent, i.e. moving the playground to *flink-statefun*/playground
>>> and deprecating the standalone playground project. In addition to making
>>> the problem of building with unreleased artifacts a bit simpler to solve,
>>> it would also simplify the process of releasing a new Statefun version,
>>> since the entire process could be handled with a single PR and associated
>>> build/deploy tasks. In other words, a single PR could both update and
>>> deploy the Statefun package and the playground code and images.
>>>
>>> As it stands, at least two PRs would be required for each Statefun
>>> version update -- one for *flink-statefun* and one for
>>> *flink-statefun-playground*.
>>>
>>> Anyway, just an idea. Maybe there's an important reason for these
>>> projects to remain separate. If we do want to keep the playground project
>>> where it is, I could solve the copying problem by requiring the two
>>> projects to be siblings in the file system and by pre-copying the local
>>> build artifacts into a location accessible by the existing Docker contexts.
>>> This would still leave us with the need to have two PRs and releases
>>> instead of one, though.
>>>
>>> Thanks for your help!
>>>
>>>
>>> On Fri, Aug 18, 2023 at 2:45 PM Tzu-Li (Gordon) Tai 
>>> wrote:
>>>
 Hi Galen,

 > locally built code is copied into the build containers
 so that it can be accessed during the build.

 That's exactly what we had been doing for release testing, yes. Sorry