flink Job is throwing depdnecy issue when submitted to clusrer

2022-05-06 Thread Great Info
I have one flink job which reads files from s3 and processes them.
Currently, it is running on flink 1.9.0, I need to upgrade my cluster to
1.13.5, so I have done the changes in my job pom and brought up the flink
cluster using 1.13.5 dist.

when I submit my application I am getting the below error when it tries to
connect to s3, have updated the s3 SDK version to the latest, but still
getting the same error.

caused by: java.lang.invoke.lambdaconversionexception: invalid receiver
type interface org.apache.http.header; not a subtype of implementation type
interface org.apache.http.namevaluepair

it works when I just run as a mini-cluster ( running just java -jar
) and also when I submit to the Flink cluster with 1.9.0.

Not able to understand where the dependency match is happening.


Re: Flink Kubernetes operator not having a scale subresource

2022-05-06 Thread Yang Wang
Currently, the flink-kubernetes-operator is using Flink native K8s
integration[1], which means Flink ResourceManager will dynamically allocate
TaskManager on demand.
So the users do not need to specify the replicas of TaskManager.

Just like Gyula said, one possible solution to make "kubectl scale" work is
to change the parallelism of Flink job.

If the standalone mode[2] is introduced in the operator, then it is also
possible to directly change the replicas of TaskManager pods.


[1].
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/resource-providers/native_kubernetes/
[2].
https://cwiki.apache.org/confluence/display/FLINK/FLIP-225%3A+Implement+standalone+mode+support+in+the+kubernetes+operator

Best,
Yang

Gyula Fóra  于2022年5月7日周六 04:26写道:

> Hi Jay!
>
> Interesting question/proposal to add the scale-subresource.
>
> I am not an expert on this area but we will look into this a little and
> give you some feedback and see if we can incorporate something into the
> upcoming release if it makes sense.
>
> On a high level there is not a single replicas value for a
> FlinkDeployment that would be easy to map, but maybe we could use the
> parallelism value for this purpose for Applications/Session jobs.
>
> Cheers,
> Gyula
>
> On Fri, May 6, 2022 at 8:04 PM Jay Ghiya  wrote:
>
>>  Hi Team,
>>
>>
>> I have been experimenting the Flink Kubernetes operator. One of the
>> biggest miss that we have is it does not support scale sub resource as of
>> now to support reactive scaling. Without that commercially it becomes very
>> difficult for products like us who have very varied loads for every hour.
>>
>>
>>
>> Can I get some direction on the same to contribute on
>> https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource
>>  for
>> our Kubernetes operator crd?
>>
>> I have been a hard time reading -> 
>> *https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml
>> 
>>  to
>> figure out the replicas, status,label selector json path of task
>> manager? It may be due to lack of my knowledge so sense of direction will
>> help me.*
>>
>> *-Jay*
>> *GEHC*
>>
>


Re: CsvReaderFormat类的位置

2022-05-06 Thread Shengkai Fang
Hi,

看不到你的图,是说这个吗[1]?

Best,
Shengkai
[1]
https://github.com/apache/flink/blob/ca58a700bbc0522f3c62e9db720f9f89c8bd8313/flink-formats/flink-csv/src/main/java/org/apache/flink/formats/csv/CsvReaderFormat.java

 于2022年5月7日周六 10:07写道:

> 您好
>
>我在flink文档中的文件系统部分看到了这个类:CsvReaderFormat,但是我并没有找到它所在的包位置,能告诉我一下吗?
>
>
>
>
>
> ---
> 免责声明(Disclaimer):
>
> 1.此电子邮件包含来自神州信息的信息,而且是机密的或者专用的信息。这些信息是供所有以上列出的个人或者团体使用的。如果您不是此邮件的预期收件人,请勿阅读、复制、转发或存储此邮件。如果已误收此邮件,请通知发件人。
> This e-mail may contain confidential and/or privileged information from
> DCITS and is intended solely for the attention and use of the person(s)
> named above. If you are not the intended recipient (or have received this
> e-mail in error), please notify the sender immediately and destroy this
> e-mail. Any unauthorized copying, disclosure or distribution of the
> material in this email is strictly forbidden.
> 2.本公司不担保本电子邮件中信息的准确性、适当性或完整性,并且对此产生的任何错误或疏忽不承担任何责任。
> The content provided in this e-mail can not be guaranteed and assured to
> be accurate, appropriate for all, and complete by Digital China, and
> Digital China can not be held responsible for any error or negligence
> derived therefrom.
> 3.接收方应在接收电子邮件或任何附件时检查有无病毒。本公司对由于转载本电子邮件而引发病毒产生的任何损坏不承担任何责任。
> The internet communications through this e-mail can not be guaranteed or
> assured to be error or virus-free, and the sender do not accept liability
> for any errors, omissions or damages arising therefrom.
>
>


CsvReaderFormat类的位置

2022-05-06 Thread wangxzr
您好
我在flink文档中的文件系统部分看到了这个类:CsvReaderFormat,但是我并没有找到它所在的包位置,能告诉我一下吗?



---

免责声明(Disclaimer):

1.此电子邮件包含来自神州信息的信息,而且是机密的或者专用的信息。这些信息是供所有以上列出的个人或者团体使用的。如果您不是此邮件的预期收件人,请勿阅读、复制、转发或存储此邮件。如果已误收此邮件,请通知发件人。

This e-mail may contain confidential and/or privileged information from DCITS 
and is intended solely for the attention and use of the person(s) named above. 
If you are not the intended recipient (or have received this e-mail in error), 
please notify the sender immediately and destroy this e-mail. Any unauthorized 
copying, disclosure or distribution of the material in this email is strictly 
forbidden.

2.本公司不担保本电子邮件中信息的准确性、适当性或完整性,并且对此产生的任何错误或疏忽不承担任何责任。

The content provided in this e-mail can not be guaranteed and assured to be 
accurate, appropriate for all, and complete by Digital China, and Digital China 
can not be held responsible for any error or negligence derived therefrom.

3.接收方应在接收电子邮件或任何附件时检查有无病毒。本公司对由于转载本电子邮件而引发病毒产生的任何损坏不承担任何责任。

The internet communications through this e-mail can not be guaranteed or 
assured to be error or virus-free, and the sender do not accept liability for 
any errors, omissions or damages arising therefrom.



Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread Jingsong Li
Most of the open source communities I know have set up their slack
channels, such as Apache Iceberg [1], Apache Druid [2], etc.
So I think slack can be worth trying.

David is right, there are some cases that need to communicate back and
forth, slack communication will be more effective.

But back to the question, ultimately it's about whether there are
enough core developers willing to invest time in the slack, to
discuss, to answer questions, to communicate.
And whether there will be enough time to reply to the mailing list and
stackoverflow after we put in the slack (which we need to do).

[1] https://iceberg.apache.org/community/#slack
[2] https://druid.apache.org/community/

On Fri, May 6, 2022 at 10:06 PM David Anderson  wrote:
>
> I have mixed feelings about this.
>
> I have been rather visible on stack overflow, and as a result I get a lot of 
> DMs asking for help. I enjoy helping, but want to do it on a platform where 
> the responses can be searched and shared.
>
> It is currently the case that good questions on stack overflow frequently go 
> unanswered because no one with the necessary expertise takes the time to 
> respond. If the Flink community has the collective energy to do more user 
> outreach, more involvement on stack overflow would be a good place to start. 
> Adding slack as another way for users to request help from those who are 
> already actively providing support on the existing communication channels 
> might just lead to burnout.
>
> On the other hand, there are rather rare, but very interesting cases where 
> considerable back and forth is needed to figure out what's going on. This can 
> happen, for example, when the requirements are unusual, or when a difficult 
> to diagnose bug is involved. In these circumstances, something like slack is 
> much better suited than email or stack overflow.
>
> David
>
> On Fri, May 6, 2022 at 3:04 PM Becket Qin  wrote:
>>
>> Thanks for the proposal, Xintong.
>>
>> While I share the same concerns as those mentioned in the previous 
>> discussion thread, admittedly there are benefits of having a slack channel 
>> as a supplementary way to discuss Flink. The fact that this topic is raised 
>> once a while indicates lasting interests.
>>
>> Personally I am open to having such a slack channel. Although it has 
>> drawbacks, it serves a different purpose. I'd imagine that for people who 
>> prefer instant messaging, in absence of the slack channel, a lot of 
>> discussions might just take place offline today, which leaves no public 
>> record at all.
>>
>> One step further, if the channel is maintained by the Flink PMC, some kind 
>> of code of conduct might be necessary. I think the suggestions of ad-hoc 
>> conversations, reflecting back to the emails are good starting points. I am 
>> +1 to give it a try and see how it goes. In the worst case, we can just stop 
>> doing this and come back to where we are right now.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Fri, May 6, 2022 at 8:55 PM Martijn Visser  wrote:
>>>
>>> Hi everyone,
>>>
>>> While I see Slack having a major downside (the results are not indexed by 
>>> external search engines, you can't link directly to Slack content unless 
>>> you've signed up), I do think that the open source space has progressed and 
>>> that Slack is considered as something that's invaluable to users. There are 
>>> other Apache programs that also run it, like Apache Airflow [1]. I also see 
>>> it as a potential option to create a more active community.
>>>
>>> A concern I can see is that users will start DMing well-known 
>>> reviewers/committers to get a review or a PR merged. That can cause a lot 
>>> of noise. I can go +1 for Slack, but then we need to establish a set of 
>>> community rules.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> [1] https://airflow.apache.org/community/
>>>
>>> On Fri, 6 May 2022 at 13:59, Piotr Nowojski  wrote:

 Hi Xintong,

 I'm not sure if slack is the right tool for the job. IMO it works great as
 an adhoc tool for discussion between developers, but it's not searchable
 and it's not persistent. Between devs, it works fine, as long as the result
 of the ad hoc discussions is backported to JIRA/mailing list/design doc.
 For users, that simply would be extremely difficult to achieve. In the
 result, I would be afraid we are answering the same questions over, and
 over and over again, without even a way to provide a link to the previous
 thread, because nobody can search for it .

 I'm +1 for having an open and shared slack space/channel for the
 contributors, but I think I would be -1 for such channels for the users.

 For users, I would prefer to focus more on, for example, stackoverflow.
 With upvoting, clever sorting of the answers (not the oldest/newest at top)
 it's easily searchable - those features make it fit our use case much
 better IMO.

 Best,
 Piotrek




Re: Flink Kubernetes operator not having a scale subresource

2022-05-06 Thread Gyula Fóra
Hi Jay!

Interesting question/proposal to add the scale-subresource.

I am not an expert on this area but we will look into this a little and
give you some feedback and see if we can incorporate something into the
upcoming release if it makes sense.

On a high level there is not a single replicas value for a
FlinkDeployment that would be easy to map, but maybe we could use the
parallelism value for this purpose for Applications/Session jobs.

Cheers,
Gyula

On Fri, May 6, 2022 at 8:04 PM Jay Ghiya  wrote:

>  Hi Team,
>
>
> I have been experimenting the Flink Kubernetes operator. One of the
> biggest miss that we have is it does not support scale sub resource as of
> now to support reactive scaling. Without that commercially it becomes very
> difficult for products like us who have very varied loads for every hour.
>
>
>
> Can I get some direction on the same to contribute on
> https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource
>  for
> our Kubernetes operator crd?
>
> I have been a hard time reading -> 
> *https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml
> 
>  to
> figure out the replicas, status,label selector json path of task
> manager? It may be due to lack of my knowledge so sense of direction will
> help me.*
>
> *-Jay*
> *GEHC*
>


Flink Kubernetes operator not having a scale subresource

2022-05-06 Thread Jay Ghiya
 Hi Team,


I have been experimenting the Flink Kubernetes operator. One of the biggest 
miss that we have is it does not support scale sub resource as of now to 
support reactive scaling. Without that commercially it becomes very difficult 
for products like us who have very varied loads for every hour.



Can I get some direction on the same to contribute on 
https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/#scale-subresource
 

 for our Kubernetes operator crd?

I have been a hard time reading -> 
https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/crds/flinkdeployments.flink.apache.org-v1.yml
 

 to figure out the replicas, status,label selector json path of task manager? 
It may be due to lack of my knowledge so sense of direction will help me.

-Jay
GEHC

Practical guidance with Scala and Flink >= 1.15

2022-05-06 Thread Salva Alcántara
I've always used Scala in the context of Flink. Now that Flink 1.15 has
become Scala-free, I wonder what is the best (most practical) route for me
moving forward. These are my options:

1. Keep using Scala 2.12 for the years to come (and upgrade to newer
versions when the community has come up with something). How long is Flink
expected to support Scala 2.12?

2. Upgrade to Scala 2.13 or Scala 3 and use the Java API directly (without
any Scala-specific wrapper/API). How problematic will that be, especially
regarding type information & scala-specific serializers? I hate those
"returns" (type hints) in the Java API...

3. Switch to Java, at least for the time being...

To be clear, I have a strong preference for Scala over Java, but I'm trying
to look at the "grand scheme of things" here, and be pragmatic. I guess I'm
not alone here, and that many people are indeed evaluating the same pros &
cons. Any feedback will be much appreciated.

Thanks in advance!


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread David Anderson
I have mixed feelings about this.

I have been rather visible on stack overflow, and as a result I get a lot
of DMs asking for help. I enjoy helping, but want to do it on a platform
where the responses can be searched and shared.

It is currently the case that good questions on stack overflow frequently
go unanswered because no one with the necessary expertise takes the time to
respond. If the Flink community has the collective energy to do more user
outreach, more involvement on stack overflow would be a good place to
start. Adding slack as another way for users to request help from those who
are already actively providing support on the existing communication
channels might just lead to burnout.

On the other hand, there are rather rare, but very interesting cases where
considerable back and forth is needed to figure out what's going on. This
can happen, for example, when the requirements are unusual, or when a
difficult to diagnose bug is involved. In these circumstances, something
like slack is much better suited than email or stack overflow.

David

On Fri, May 6, 2022 at 3:04 PM Becket Qin  wrote:

> Thanks for the proposal, Xintong.
>
> While I share the same concerns as those mentioned in the previous
> discussion thread, admittedly there are benefits of having a slack channel
> as a supplementary way to discuss Flink. The fact that this topic is raised
> once a while indicates lasting interests.
>
> Personally I am open to having such a slack channel. Although it has
> drawbacks, it serves a different purpose. I'd imagine that for people who
> prefer instant messaging, in absence of the slack channel, a lot of
> discussions might just take place offline today, which leaves no public
> record at all.
>
> One step further, if the channel is maintained by the Flink PMC, some kind
> of code of conduct might be necessary. I think the suggestions of ad-hoc
> conversations, reflecting back to the emails are good starting points. I
> am +1 to give it a try and see how it goes. In the worst case, we can just
> stop doing this and come back to where we are right now.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, May 6, 2022 at 8:55 PM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> While I see Slack having a major downside (the results are not indexed by
>> external search engines, you can't link directly to Slack content unless
>> you've signed up), I do think that the open source space has progressed and
>> that Slack is considered as something that's invaluable to users. There are
>> other Apache programs that also run it, like Apache Airflow [1]. I also see
>> it as a potential option to create a more active community.
>>
>> A concern I can see is that users will start DMing well-known
>> reviewers/committers to get a review or a PR merged. That can cause a lot
>> of noise. I can go +1 for Slack, but then we need to establish a set of
>> community rules.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1] https://airflow.apache.org/community/
>>
>> On Fri, 6 May 2022 at 13:59, Piotr Nowojski  wrote:
>>
>>> Hi Xintong,
>>>
>>> I'm not sure if slack is the right tool for the job. IMO it works great
>>> as
>>> an adhoc tool for discussion between developers, but it's not searchable
>>> and it's not persistent. Between devs, it works fine, as long as the
>>> result
>>> of the ad hoc discussions is backported to JIRA/mailing list/design doc.
>>> For users, that simply would be extremely difficult to achieve. In the
>>> result, I would be afraid we are answering the same questions over, and
>>> over and over again, without even a way to provide a link to the previous
>>> thread, because nobody can search for it .
>>>
>>> I'm +1 for having an open and shared slack space/channel for the
>>> contributors, but I think I would be -1 for such channels for the users.
>>>
>>> For users, I would prefer to focus more on, for example, stackoverflow.
>>> With upvoting, clever sorting of the answers (not the oldest/newest at
>>> top)
>>> it's easily searchable - those features make it fit our use case much
>>> better IMO.
>>>
>>> Best,
>>> Piotrek
>>>
>>>
>>>
>>> pt., 6 maj 2022 o 11:08 Xintong Song  napisał(a):
>>>
>>> > Thank you~
>>> >
>>> > Xintong Song
>>> >
>>> >
>>> >
>>> > -- Forwarded message -
>>> > From: Xintong Song 
>>> > Date: Fri, May 6, 2022 at 5:07 PM
>>> > Subject: Re: [Discuss] Creating an Apache Flink slack workspace
>>> > To: private 
>>> > Cc: Chesnay Schepler 
>>> >
>>> >
>>> > Hi Chesnay,
>>> >
>>> > Correct me if I'm wrong, I don't find this is *repeatedly* discussed
>>> on the
>>> > ML. The only discussions I find are [1] & [2], which are 4 years ago.
>>> On
>>> > the other hand, I do find many users are asking questions about whether
>>> > Slack should be supported [2][3][4]. Besides, I also find a recent
>>> > discussion thread from ComDev [5], where alternative communication
>>> channels
>>> > are being discussed. It seems to me ASF is quite open to having such
>>> > additional 

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread Becket Qin
Thanks for the proposal, Xintong.

While I share the same concerns as those mentioned in the previous
discussion thread, admittedly there are benefits of having a slack channel
as a supplementary way to discuss Flink. The fact that this topic is raised
once a while indicates lasting interests.

Personally I am open to having such a slack channel. Although it has
drawbacks, it serves a different purpose. I'd imagine that for people who
prefer instant messaging, in absence of the slack channel, a lot of
discussions might just take place offline today, which leaves no public
record at all.

One step further, if the channel is maintained by the Flink PMC, some kind
of code of conduct might be necessary. I think the suggestions of ad-hoc
conversations, reflecting back to the emails are good starting points. I
am +1 to give it a try and see how it goes. In the worst case, we can just
stop doing this and come back to where we are right now.

Thanks,

Jiangjie (Becket) Qin

On Fri, May 6, 2022 at 8:55 PM Martijn Visser  wrote:

> Hi everyone,
>
> While I see Slack having a major downside (the results are not indexed by
> external search engines, you can't link directly to Slack content unless
> you've signed up), I do think that the open source space has progressed and
> that Slack is considered as something that's invaluable to users. There are
> other Apache programs that also run it, like Apache Airflow [1]. I also see
> it as a potential option to create a more active community.
>
> A concern I can see is that users will start DMing well-known
> reviewers/committers to get a review or a PR merged. That can cause a lot
> of noise. I can go +1 for Slack, but then we need to establish a set of
> community rules.
>
> Best regards,
>
> Martijn
>
> [1] https://airflow.apache.org/community/
>
> On Fri, 6 May 2022 at 13:59, Piotr Nowojski  wrote:
>
>> Hi Xintong,
>>
>> I'm not sure if slack is the right tool for the job. IMO it works great as
>> an adhoc tool for discussion between developers, but it's not searchable
>> and it's not persistent. Between devs, it works fine, as long as the
>> result
>> of the ad hoc discussions is backported to JIRA/mailing list/design doc.
>> For users, that simply would be extremely difficult to achieve. In the
>> result, I would be afraid we are answering the same questions over, and
>> over and over again, without even a way to provide a link to the previous
>> thread, because nobody can search for it .
>>
>> I'm +1 for having an open and shared slack space/channel for the
>> contributors, but I think I would be -1 for such channels for the users.
>>
>> For users, I would prefer to focus more on, for example, stackoverflow.
>> With upvoting, clever sorting of the answers (not the oldest/newest at
>> top)
>> it's easily searchable - those features make it fit our use case much
>> better IMO.
>>
>> Best,
>> Piotrek
>>
>>
>>
>> pt., 6 maj 2022 o 11:08 Xintong Song  napisał(a):
>>
>> > Thank you~
>> >
>> > Xintong Song
>> >
>> >
>> >
>> > -- Forwarded message -
>> > From: Xintong Song 
>> > Date: Fri, May 6, 2022 at 5:07 PM
>> > Subject: Re: [Discuss] Creating an Apache Flink slack workspace
>> > To: private 
>> > Cc: Chesnay Schepler 
>> >
>> >
>> > Hi Chesnay,
>> >
>> > Correct me if I'm wrong, I don't find this is *repeatedly* discussed on
>> the
>> > ML. The only discussions I find are [1] & [2], which are 4 years ago. On
>> > the other hand, I do find many users are asking questions about whether
>> > Slack should be supported [2][3][4]. Besides, I also find a recent
>> > discussion thread from ComDev [5], where alternative communication
>> channels
>> > are being discussed. It seems to me ASF is quite open to having such
>> > additional channels and they have been worked well for many projects
>> > already.
>> >
>> > I see two reasons for brining this discussion again:
>> > 1. There are indeed many things that have change during the past 4
>> years.
>> > We have more contributors, including committers and PMC members, and
>> even
>> > more users from various organizations and timezones. That also means
>> more
>> > discussions and Q are happening.
>> > 2. The proposal here is different from the previous discussion. Instead
>> of
>> > maintaining a channel for Flink in the ASF workspace, here we are
>> proposing
>> > to create a dedicated Apache Flink slack workspace. And instead of
>> *moving*
>> > the discussion to Slack, we are proposing to add a Slack Workspace as an
>> > addition to the ML.
>> >
>> > Below is your opinions that I found from your previous -1 [1]. IIUR,
>> these
>> > are all about the using ASF Slack Workspace. If I overlooked anything,
>> > please let me know.
>> >
>> > > 1. According to INFRA-14292 <
>> > > https://issues.apache.org/jira/browse/INFRA-14292> the ASF Slack
>> isn't
>> > > run by the ASF. This alone puts this service into rather questionable
>> > > territory as it /looks/ like an official ASF service. If anyone can
>> > provide

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread Martijn Visser
Hi everyone,

While I see Slack having a major downside (the results are not indexed by
external search engines, you can't link directly to Slack content unless
you've signed up), I do think that the open source space has progressed and
that Slack is considered as something that's invaluable to users. There are
other Apache programs that also run it, like Apache Airflow [1]. I also see
it as a potential option to create a more active community.

A concern I can see is that users will start DMing well-known
reviewers/committers to get a review or a PR merged. That can cause a lot
of noise. I can go +1 for Slack, but then we need to establish a set of
community rules.

Best regards,

Martijn

[1] https://airflow.apache.org/community/

On Fri, 6 May 2022 at 13:59, Piotr Nowojski  wrote:

> Hi Xintong,
>
> I'm not sure if slack is the right tool for the job. IMO it works great as
> an adhoc tool for discussion between developers, but it's not searchable
> and it's not persistent. Between devs, it works fine, as long as the result
> of the ad hoc discussions is backported to JIRA/mailing list/design doc.
> For users, that simply would be extremely difficult to achieve. In the
> result, I would be afraid we are answering the same questions over, and
> over and over again, without even a way to provide a link to the previous
> thread, because nobody can search for it .
>
> I'm +1 for having an open and shared slack space/channel for the
> contributors, but I think I would be -1 for such channels for the users.
>
> For users, I would prefer to focus more on, for example, stackoverflow.
> With upvoting, clever sorting of the answers (not the oldest/newest at top)
> it's easily searchable - those features make it fit our use case much
> better IMO.
>
> Best,
> Piotrek
>
>
>
> pt., 6 maj 2022 o 11:08 Xintong Song  napisał(a):
>
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > -- Forwarded message -
> > From: Xintong Song 
> > Date: Fri, May 6, 2022 at 5:07 PM
> > Subject: Re: [Discuss] Creating an Apache Flink slack workspace
> > To: private 
> > Cc: Chesnay Schepler 
> >
> >
> > Hi Chesnay,
> >
> > Correct me if I'm wrong, I don't find this is *repeatedly* discussed on
> the
> > ML. The only discussions I find are [1] & [2], which are 4 years ago. On
> > the other hand, I do find many users are asking questions about whether
> > Slack should be supported [2][3][4]. Besides, I also find a recent
> > discussion thread from ComDev [5], where alternative communication
> channels
> > are being discussed. It seems to me ASF is quite open to having such
> > additional channels and they have been worked well for many projects
> > already.
> >
> > I see two reasons for brining this discussion again:
> > 1. There are indeed many things that have change during the past 4 years.
> > We have more contributors, including committers and PMC members, and even
> > more users from various organizations and timezones. That also means more
> > discussions and Q are happening.
> > 2. The proposal here is different from the previous discussion. Instead
> of
> > maintaining a channel for Flink in the ASF workspace, here we are
> proposing
> > to create a dedicated Apache Flink slack workspace. And instead of
> *moving*
> > the discussion to Slack, we are proposing to add a Slack Workspace as an
> > addition to the ML.
> >
> > Below is your opinions that I found from your previous -1 [1]. IIUR,
> these
> > are all about the using ASF Slack Workspace. If I overlooked anything,
> > please let me know.
> >
> > > 1. According to INFRA-14292 <
> > > https://issues.apache.org/jira/browse/INFRA-14292> the ASF Slack isn't
> > > run by the ASF. This alone puts this service into rather questionable
> > > territory as it /looks/ like an official ASF service. If anyone can
> > provide
> > > information to the contrary, please do so.
> >
> > 2. We already discuss things on the mailing lists, JIRA and GitHub. All
> of
> > > these are available to the public, whereas the slack channel requires
> an
> > > @apache mail address, i.e. you have to be a committer. This minimizes
> the
> > > target audience rather significantly. I would much rather prefer
> > something
> > > that is also available to contributors.
> >
> >
> > I do agree this should be decided by the whole community. I'll forward
> this
> > to dev@ and user@ ML.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> > [1] https://lists.apache.org/thread/gxwv49ssq82g06dbhy339x6rdxtlcv3d
> > [2] https://lists.apache.org/thread/kcym1sozkrtwxw1fjbnwk1nqrrlzolcc
> > [3] https://lists.apache.org/thread/7rmd3ov6sv3wwhflp97n4czz25hvmqm6
> > [4] https://lists.apache.org/thread/n5y1kzv50bkkbl3ys494dglyxl45bmts
> > [5] https://lists.apache.org/thread/fzwd3lj0x53hkq3od5ot0y719dn3kj1j
> >
> > On Fri, May 6, 2022 at 3:05 PM Chesnay Schepler 
> > wrote:
> >
> > > This has been repeatedly discussed on the ML over the years and was
> > > rejected every time.
> > >
> > > I don't see that anything 

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread Piotr Nowojski
Hi Xintong,

I'm not sure if slack is the right tool for the job. IMO it works great as
an adhoc tool for discussion between developers, but it's not searchable
and it's not persistent. Between devs, it works fine, as long as the result
of the ad hoc discussions is backported to JIRA/mailing list/design doc.
For users, that simply would be extremely difficult to achieve. In the
result, I would be afraid we are answering the same questions over, and
over and over again, without even a way to provide a link to the previous
thread, because nobody can search for it .

I'm +1 for having an open and shared slack space/channel for the
contributors, but I think I would be -1 for such channels for the users.

For users, I would prefer to focus more on, for example, stackoverflow.
With upvoting, clever sorting of the answers (not the oldest/newest at top)
it's easily searchable - those features make it fit our use case much
better IMO.

Best,
Piotrek



pt., 6 maj 2022 o 11:08 Xintong Song  napisał(a):

> Thank you~
>
> Xintong Song
>
>
>
> -- Forwarded message -
> From: Xintong Song 
> Date: Fri, May 6, 2022 at 5:07 PM
> Subject: Re: [Discuss] Creating an Apache Flink slack workspace
> To: private 
> Cc: Chesnay Schepler 
>
>
> Hi Chesnay,
>
> Correct me if I'm wrong, I don't find this is *repeatedly* discussed on the
> ML. The only discussions I find are [1] & [2], which are 4 years ago. On
> the other hand, I do find many users are asking questions about whether
> Slack should be supported [2][3][4]. Besides, I also find a recent
> discussion thread from ComDev [5], where alternative communication channels
> are being discussed. It seems to me ASF is quite open to having such
> additional channels and they have been worked well for many projects
> already.
>
> I see two reasons for brining this discussion again:
> 1. There are indeed many things that have change during the past 4 years.
> We have more contributors, including committers and PMC members, and even
> more users from various organizations and timezones. That also means more
> discussions and Q are happening.
> 2. The proposal here is different from the previous discussion. Instead of
> maintaining a channel for Flink in the ASF workspace, here we are proposing
> to create a dedicated Apache Flink slack workspace. And instead of *moving*
> the discussion to Slack, we are proposing to add a Slack Workspace as an
> addition to the ML.
>
> Below is your opinions that I found from your previous -1 [1]. IIUR, these
> are all about the using ASF Slack Workspace. If I overlooked anything,
> please let me know.
>
> > 1. According to INFRA-14292 <
> > https://issues.apache.org/jira/browse/INFRA-14292> the ASF Slack isn't
> > run by the ASF. This alone puts this service into rather questionable
> > territory as it /looks/ like an official ASF service. If anyone can
> provide
> > information to the contrary, please do so.
>
> 2. We already discuss things on the mailing lists, JIRA and GitHub. All of
> > these are available to the public, whereas the slack channel requires an
> > @apache mail address, i.e. you have to be a committer. This minimizes the
> > target audience rather significantly. I would much rather prefer
> something
> > that is also available to contributors.
>
>
> I do agree this should be decided by the whole community. I'll forward this
> to dev@ and user@ ML.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://lists.apache.org/thread/gxwv49ssq82g06dbhy339x6rdxtlcv3d
> [2] https://lists.apache.org/thread/kcym1sozkrtwxw1fjbnwk1nqrrlzolcc
> [3] https://lists.apache.org/thread/7rmd3ov6sv3wwhflp97n4czz25hvmqm6
> [4] https://lists.apache.org/thread/n5y1kzv50bkkbl3ys494dglyxl45bmts
> [5] https://lists.apache.org/thread/fzwd3lj0x53hkq3od5ot0y719dn3kj1j
>
> On Fri, May 6, 2022 at 3:05 PM Chesnay Schepler 
> wrote:
>
> > This has been repeatedly discussed on the ML over the years and was
> > rejected every time.
> >
> > I don't see that anything has changed that would invalidate the
> previously
> > raised arguments against it, so I'm still -1 on it.
> >
> > This is also not something the PMC should decide anyway, but the project
> > as a whole.
> >
> > On 06/05/2022 06:48, Jark Wu wrote:
> >
> > Thank Xintong, for starting this exciting topic.
> >
> > I think Slack would be an essential addition to the mailing list.
> > I have talked with some Flink users, and they are surprised
> > Flink doesn't have Slack yet, and they would love to use Slack.
> > We can also see a trend that new open-source communities
> > are using Slack as the community base camp.
> >
> > Slack is also helpful for brainstorming and asking people for opinions
> and
> > use cases.
> > I think Slack is not only another place for Q but also a connection to
> > the Flink users.
> > We can create more channels to make the community have more social
> > attributes, for example,
> >  - Share ideas, projects, integrations, articles, and presentations
> > related to Flink in 

Re: trigger once (batch job with streaming semantics)

2022-05-06 Thread Georg Heiler
Hi,

I would disagree:
In the case of spark, it is a streaming application that is offering full
streaming semantics (but with less cost and bigger latency) as it triggers
less often. In particular, windowing and stateful semantics as well as
late-arriving data are handled automatically using the regular streaming
features.

Would these features be available in a Flink Batch job as well?

Best,
Georg

Am Fr., 6. Mai 2022 um 13:26 Uhr schrieb Martijn Visser <
martijnvis...@apache.org>:

> Hi Georg,
>
> Flink batch applications run until all their input is processed. When
> that's the case, the application finishes. You can read more about this in
> the documentation for DataStream [1] or Table API [2]. I think this matches
> the same as Spark is explaining in the documentation.
>
> Best regards,
>
> Martijn
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/
> [2]
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/common/
>
> On Mon, 2 May 2022 at 16:46, Georg Heiler 
> wrote:
>
>> Hi,
>>
>> spark
>> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
>> offers a variety of triggers.
>>
>> In particular, it also has the "once" mode:
>>
>> *One-time micro-batch* The query will execute *only one* micro-batch to
>> process all the available data and then stop on its own. This is useful in
>> scenarios you want to periodically spin up a cluster, process everything
>> that is available since the last period, and then shutdown the cluster. In
>> some case, this may lead to significant cost savings.
>>
>> Does flink have a similar possibility?
>>
>> Best,
>> Georg
>>
>


Re: trigger once (batch job with streaming semantics)

2022-05-06 Thread Martijn Visser
Hi Georg,

Flink batch applications run until all their input is processed. When
that's the case, the application finishes. You can read more about this in
the documentation for DataStream [1] or Table API [2]. I think this matches
the same as Spark is explaining in the documentation.

Best regards,

Martijn

[1]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/execution_mode/
[2]
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/common/

On Mon, 2 May 2022 at 16:46, Georg Heiler  wrote:

> Hi,
>
> spark
> https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#triggers
> offers a variety of triggers.
>
> In particular, it also has the "once" mode:
>
> *One-time micro-batch* The query will execute *only one* micro-batch to
> process all the available data and then stop on its own. This is useful in
> scenarios you want to periodically spin up a cluster, process everything
> that is available since the last period, and then shutdown the cluster. In
> some case, this may lead to significant cost savings.
>
> Does flink have a similar possibility?
>
> Best,
> Georg
>


Re: Flink-SQL returning duplicate rows for some records

2022-05-06 Thread Martijn Visser
Hi Joost,

I'm looping in Leonard and Jark who might be able to help out here.

Best regards,

Martijn

On Mon, 2 May 2022 at 16:01, Joost Molenaar  wrote:

> Hello all,
>
> I'm trying to use Flink-SQL to monitor a Kafka topic that's populated by
> Debezium, which is in turn monitoring a MS-SQL CDC table. For some reason,
> Flink-SQL shows a new row when I update the boolean field, but updates the
> row in place when I update the text field, and I'm not understanding why
> this happens. My ultimate goal is to use Flink-SQL to do a join on records
> that come from both sides of a 1:N relation in the foreign database, to
> expose a more ready to consume JSON object to downstream consumers.
>
> The source table is defined like this in MS-SQL:
>
> CREATE TABLE todo_list (
> id int IDENTITY NOT NULL,
> done bit NOT NULL DEFAULT 0,
> name varchar(MAX) NOT NULL,
> CONSTRAINT PK_todo_list PRIMARY KEY (id)
> );
>
> This is the configuration I'm sending to Debezium, note that I'm not
> including the
> JSON-schema in both keys and values:
>
> {
> "name": "todo-connector",
> "config": {
> "connector.class":
> "io.debezium.connector.sqlserver.SqlServerConnector",
> "tasks.max": "1",
> "database.server.name": "mssql",
> "database.hostname": "10.88.10.1",
> "database.port": "1433",
> "database.user": "sa",
> "database.password": "...",
> "database.dbname": "todo",
> "database.history.kafka.bootstrap.servers": "10.88.10.10:9092
> ",
> "database.history.kafka.topic": "schema-changes.todo",
> "key.converter": "org.apache.kafka.connect.json.JsonConverter",
> "key.converter.schemas.enable": false,
> "value.converter":
> "org.apache.kafka.connect.json.JsonConverter",
> "value.converter.schemas.enable": false
> }
> }
>
> So Debezium is publishing events to Kafka with keys like this:
>
> {"id":3}
>
> And values like this (whitespace added for readability), this is updating
> the
> value of the 'name' field:
>
> {
>   "before": {
> "id": 3,
> "done": false,
> "name": "test"
>   },
>   "after": {
> "id": 3,
> "done": false,
> "name": "test2"
>   },
>   "source": {
> "version": "1.9.0.Final",
> "connector": "sqlserver",
> "name": "mssql",
> "ts_ms": 1651497653043,
> "snapshot": "false",
> "db": "todo",
> "sequence": null,
> "schema": "dbo",
> "table": "todo_list",
> "change_lsn": "0025:0d58:0002",
> "commit_lsn": "0025:0d58:0003",
> "event_serial_no": 2
>   },
>   "op": "u",
>   "ts_ms": 1651497654127,
>   "transaction": null
> }
>
> (I verified this using a Python script that follows the relevant Kafka
> topic.)
>
> Next, I'm trying to follow this CDC stream in Flink by adding the
> Kafka connector
> for Flink SQL, defining a source table and starting a job in the Flink-SQL
> CLI:
>
> ADD JAR '/opt/flink/opt/flink-sql-connector-kafka_2.11-1.14.4.jar';
>
> CREATE TABLE todo_list (
> k_id BIGINT,
> done BOOLEAN,
> name STRING
> )
> WITH (
> 'connector'='kafka',
> 'topic'='mssql.dbo.todo_list',
> 'properties.bootstrap.servers'='10.88.10.10:9092',
> 'properties.group.id'='flinksql-todo-list',
> 'scan.startup.mode'='earliest-offset',
> 'key.format'='json',
> 'key.fields-prefix'='k_',
> 'key.fields'='k_id',
> 'value.format'='debezium-json',
> 'value.debezium-json.schema-include'='false',
> 'value.fields-include'='EXCEPT_KEY'
> );
>
> SELECT * FROM todo_list;
>
> Now, when I perform a query like this in the MS-SQL database:
>
> UPDATE todo_list SET name='test2' WHERE id=3;
>
> Now I see that the Flink-SQL client updates the row with id=3 to have the
> new
> value "test2" for the 'name' field, as I was expecting. However, when I
> duplicate the 'done' field to have a different value, Flink-SQL seems to
> leave
> the old row with values (3, False, 'test2') intact, and shows a new row
> with
> values (3, True, 'test2').
>
> I tried to append a `PRIMARY KEY (k_id) NOT ENFORCED` line between the
> first
> parentheses in the CREATE TABLE statement, but this seems to make no
> difference, except when running `DESCRIBE todo_list` in Flink-SQL.
>
> I have no idea why the boolean field would cause different behavior than
> the
> text field. Am I missing some piece of configuration, are my expectations
> wrong?
>
>
> Regards,
> Joost Molenaar
>


[REMINDER] Final Call for Presentations for Flink Forward San Francisco 2022

2022-05-06 Thread Timo Walther

Hi everyone,

I would like to send out a final reminder. We have already received some 
great submissions for FlinkForward San Francisco 2022. Nevertheless, we 
decided to extend the deadline by another week to give people a second 
chance to work on their abstracts and presentation ideas.


This is the final call to be a part of the event as a speaker - until 
11:59, May 12th PDT.


Any topic that can be categorized as
- Flink Use Cases
- Flink Operations
- Technology Deep Dives
- Ecosystem
- Community
is welcome.

NOTE: This will be an in-person event. However, if your country has 
travel restrictions, please let us know in the form. We will offer a 
limited number of slots for pre-recorded/remote Q talks to not exclude 
anyone.


https://www.flink-forward.org/sf-2022/call-for-presentations

In any case, it would be great to meet each other again!

Looking forward to the event,

Timo




Re: How to return JSON Object from UDF

2022-05-06 Thread Surendra Lalwani
Hi Timo,

Please find the example:

@DataTypeHint(value = "RAW", bridgedTo = JsonObject.class)
public  Object eval(String jsonString) {

//Logic to parse

}

Thanks and Regards ,
Surendra Lalwani


On Fri, May 6, 2022 at 3:13 PM Timo Walther  wrote:

> Can you show the full example? It looks like there is still a JSONObject
> without a @DataTypeHint next to it.
>
>
>
> Am 06.05.22 um 11:18 schrieb Surendra Lalwani:
>
> Hi Timo,
>
> I tried this but still getting error:
>
> Caused by: org.apache.flink.table.api.ValidationException: Could not
> extract a data type from 'class net.minidev.json.JSONObject'. Interpreting
> it as a structured type was also not successful.at
> org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:361)
>at
> org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrError(DataTypeExtractor.java:291)
>at
> org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrRawWithTemplate(DataTypeExtractor.java:233)
>... 83 common frames omittedCaused by:
> org.apache.flink.table.api.ValidationException: Field 'threshold' of class
> 'net.minidev.json.JSONObject' is neither publicly accessible nor does it
> have a corresponding getter method.
>
>
> Thanks and Regards ,
> Surendra Lalwani
>
>
> On Fri, May 6, 2022 at 2:43 PM Timo Walther  wrote:
>
>> Hi Surendra,
>>
>> in general we would like to encourage users to use the SQL type system
>> classes instead of RAW types. Otherwise they are simply black boxes in the
>> SQL engine. A STRING or ROW type might be more appropriate.
>>
>> You can use
>>
>> @DataTypeHint(value = "RAW")  // defaults to Object.class
>>
>> @DataTypeHint(value = "RAW", bridgedTo =JSONObject.class)// more
>> precise class information
>>
>> the rawSerializer is usually not required and Kyro will be used in this
>> case.
>>
>> Regards,
>> Timo
>>
>>
>> Am 06.05.22 um 10:36 schrieb yuxia:
>>
>> Does the DatatypeHint with bridgedTo can meet your requirements?
>> For example:
>> '
>> public @DataTypeHint(
>> value = "RAW",
>> bridgedTo =JSONObject.class,
>> rawSerializer =JSONObjectSerializer.class) JSONObject 
>> eval(String
>> str) {
>> return JSONObject.parse(str);
>> }
>> '
>> You may need to provide a class likeJSONObjectSerializer that
>> extends TypeSerializerSingleton.
>>
>>
>> Best regards,
>> Yuxia
>>
>> --
>> *发件人: *"Surendra Lalwani" 
>> 
>> *收件人: *"User"  
>> *发送时间: *星期五, 2022年 5 月 06日 下午 4:40:19
>> *主题: *How to return JSON Object from UDF
>>
>> Hi Team,
>>
>> I am using Flink 1.13.6 and I have created a UDF and I want to return
>> JSONObject from that UDF or basically an Object but it doesn't seems to
>> work as there is no datatype hint compatible to Object. in earlier flink
>> versions when DataTypeHint wasn't there, it used to work. Any help would be
>> appreciated.
>>
>> Thanks and Regards ,
>> Surendra Lalwani
>>
>>
>> --
>> IMPORTANT NOTICE: This e-mail, including any attachments, may contain
>> confidential information and is intended only for the addressee(s) named
>> above. If you are not the intended recipient(s), you should not
>> disseminate, distribute, or copy this e-mail. Please notify the sender by
>> reply e-mail immediately if you have received this e-mail in error and
>> permanently delete all copies of the original message from your system.
>> E-mail transmission cannot be guaranteed to be secure as it could be
>> intercepted, corrupted, lost, destroyed, arrive late or incomplete, or
>> contain viruses. Company accepts no liability for any damage or loss of
>> confidential information caused by this email or due to any virus
>> transmitted by this email or otherwise.
>>
>>
>>
>
> --
> IMPORTANT NOTICE: This e-mail, including any attachments, may contain
> confidential information and is intended only for the addressee(s) named
> above. If you are not the intended recipient(s), you should not
> disseminate, distribute, or copy this e-mail. Please notify the sender by
> reply e-mail immediately if you have received this e-mail in error and
> permanently delete all copies of the original message from your system.
> E-mail transmission cannot be guaranteed to be secure as it could be
> intercepted, corrupted, lost, destroyed, arrive late or incomplete, or
> contain viruses. Company accepts no liability for any damage or loss of
> confidential information caused by this email or due to any virus
> transmitted by this email or otherwise.
>
>
>

-- 

IMPORTANT NOTICE: This e-mail, including any attachments, may contain 
confidential information and is intended only for the addressee(s) named 
above. If you are not the intended recipient(s), you should not 
disseminate, distribute, or copy this e-mail. Please notify the sender by 
reply e-mail immediately if you have received this e-mail in 

Re: How to return JSON Object from UDF

2022-05-06 Thread Timo Walther
Can you show the full example? It looks like there is still a JSONObject 
without a @DataTypeHint next to it.




Am 06.05.22 um 11:18 schrieb Surendra Lalwani:

Hi Timo,

I tried this but still getting error:

Caused by: org.apache.flink.table.api.ValidationException: Could not 
extract a data type from 'class net.minidev.json.JSONObject'. 
Interpreting it as a structured type was also not successful.       
 at 
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:361) 
       at 
org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrError(DataTypeExtractor.java:291) 
       at 
org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrRawWithTemplate(DataTypeExtractor.java:233) 
       ... 83 common frames omittedCaused by: 
org.apache.flink.table.api.ValidationException: Field 'threshold' of 
class 'net.minidev.json.JSONObject' is neither publicly accessible nor 
does it have a corresponding getter method.



Thanks and Regards ,
Surendra Lalwani


On Fri, May 6, 2022 at 2:43 PM Timo Walther  wrote:

Hi Surendra,

in general we would like to encourage users to use the SQL type
system classes instead of RAW types. Otherwise they are simply
black boxes in the SQL engine. A STRING or ROW type might be more
appropriate.

You can use

@DataTypeHint(value = "RAW")  // defaults to Object.class

@DataTypeHint(value = "RAW", bridgedTo =JSONObject.class) // more
precise class information

the rawSerializer is usually not required and Kyro will be used in
this case.

Regards,
Timo


Am 06.05.22 um 10:36 schrieb yuxia:

Does the DatatypeHint with bridgedTo can meet your requirements?
For example:
'
public @DataTypeHint(
                value = "RAW",
                bridgedTo =JSONObject.class,
                rawSerializer =JSONObjectSerializer.class)
JSONObject eval(String str) {
            return JSONObject.parse(str);
        }
'
You may need to provide a class likeJSONObjectSerializer that
extends TypeSerializerSingleton.


Best regards,
Yuxia


*发件人: *"Surendra Lalwani" 

*收件人: *"User" 

*发送时间: *星期五, 2022年 5 月 06日 下午 4:40:19
*主题: *How to return JSON Object from UDF

Hi Team,

I am using Flink 1.13.6 and I have created a UDF and I want to
return JSONObject from that UDF or basically an Object but it
doesn't seems to work as there is no datatype hint compatible to
Object. in earlier flink versions when DataTypeHint wasn't there,
it used to work. Any help would be appreciated.

Thanks and Regards ,
Surendra Lalwani



IMPORTANT NOTICE: This e-mail, including any attachments, may
contain confidential information and is intended only for the
addressee(s) named above. If you are not the intended
recipient(s), you should not disseminate, distribute, or copy
this e-mail. Please notify the sender by reply e-mail immediately
if you have received this e-mail in error and permanently delete
all copies of the original message from your system. E-mail
transmission cannot be guaranteed to be secure as it could be
intercepted, corrupted, lost, destroyed, arrive late or
incomplete, or contain viruses. Company accepts no liability for
any damage or loss of confidential information caused by this
email or due to any virus transmitted by this email or otherwise.






IMPORTANT NOTICE: This e-mail, including any attachments, may contain 
confidential information and is intended only for the addressee(s) 
named above. If you are not the intended recipient(s), you should not 
disseminate, distribute, or copy this e-mail. Please notify the sender 
by reply e-mail immediately if you have received this e-mail in error 
and permanently delete all copies of the original message from your 
system. E-mail transmission cannot be guaranteed to be secure as it 
could be intercepted, corrupted, lost, destroyed, arrive late or 
incomplete, or contain viruses. Company accepts no liability for any 
damage or loss of confidential information caused by this email or due 
to any virus transmitted by this email or otherwise. 




Re: Flink OLAP 与 Trino TPC-DS 对比

2022-05-06 Thread LuNing Wang
谢谢 Yu Li 老师提醒,
我彻底开放了测试表格,和资源配置文档。

《TPC-DS各引擎耗时》
https://www.yuque.com/deadwind/notes/tpcds-benchmark-table
《TPC-DS资源配置》
https://www.yuque.com/deadwind/notes/tpcds-resource
Best,
LuNing Wang.


Re: How to return JSON Object from UDF

2022-05-06 Thread Surendra Lalwani
Hi Timo,

I tried this but still getting error:

Caused by: org.apache.flink.table.api.ValidationException: Could not
extract a data type from 'class net.minidev.json.JSONObject'. Interpreting
it as a structured type was also not successful.at
org.apache.flink.table.types.extraction.ExtractionUtils.extractionError(ExtractionUtils.java:361)
   at
org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrError(DataTypeExtractor.java:291)
   at
org.apache.flink.table.types.extraction.DataTypeExtractor.extractDataTypeOrRawWithTemplate(DataTypeExtractor.java:233)
   ... 83 common frames omittedCaused by:
org.apache.flink.table.api.ValidationException: Field 'threshold' of class
'net.minidev.json.JSONObject' is neither publicly accessible nor does it
have a corresponding getter method.


Thanks and Regards ,
Surendra Lalwani


On Fri, May 6, 2022 at 2:43 PM Timo Walther  wrote:

> Hi Surendra,
>
> in general we would like to encourage users to use the SQL type system
> classes instead of RAW types. Otherwise they are simply black boxes in the
> SQL engine. A STRING or ROW type might be more appropriate.
>
> You can use
>
> @DataTypeHint(value = "RAW")  // defaults to Object.class
>
> @DataTypeHint(value = "RAW", bridgedTo =JSONObject.class)// more
> precise class information
>
> the rawSerializer is usually not required and Kyro will be used in this
> case.
>
> Regards,
> Timo
>
>
> Am 06.05.22 um 10:36 schrieb yuxia:
>
> Does the DatatypeHint with bridgedTo can meet your requirements?
> For example:
> '
> public @DataTypeHint(
> value = "RAW",
> bridgedTo =JSONObject.class,
> rawSerializer =JSONObjectSerializer.class) JSONObject 
> eval(String
> str) {
> return JSONObject.parse(str);
> }
> '
> You may need to provide a class likeJSONObjectSerializer that
> extends TypeSerializerSingleton.
>
>
> Best regards,
> Yuxia
>
> --
> *发件人: *"Surendra Lalwani" 
> 
> *收件人: *"User"  
> *发送时间: *星期五, 2022年 5 月 06日 下午 4:40:19
> *主题: *How to return JSON Object from UDF
>
> Hi Team,
>
> I am using Flink 1.13.6 and I have created a UDF and I want to return
> JSONObject from that UDF or basically an Object but it doesn't seems to
> work as there is no datatype hint compatible to Object. in earlier flink
> versions when DataTypeHint wasn't there, it used to work. Any help would be
> appreciated.
>
> Thanks and Regards ,
> Surendra Lalwani
>
>
> --
> IMPORTANT NOTICE: This e-mail, including any attachments, may contain
> confidential information and is intended only for the addressee(s) named
> above. If you are not the intended recipient(s), you should not
> disseminate, distribute, or copy this e-mail. Please notify the sender by
> reply e-mail immediately if you have received this e-mail in error and
> permanently delete all copies of the original message from your system.
> E-mail transmission cannot be guaranteed to be secure as it could be
> intercepted, corrupted, lost, destroyed, arrive late or incomplete, or
> contain viruses. Company accepts no liability for any damage or loss of
> confidential information caused by this email or due to any virus
> transmitted by this email or otherwise.
>
>
>

-- 

IMPORTANT NOTICE: This e-mail, including any attachments, may contain 
confidential information and is intended only for the addressee(s) named 
above. If you are not the intended recipient(s), you should not 
disseminate, distribute, or copy this e-mail. Please notify the sender by 
reply e-mail immediately if you have received this e-mail in error and 
permanently delete all copies of the original message from your system. 
E-mail transmission cannot be guaranteed to be secure as it could be 
intercepted, corrupted, lost, destroyed, arrive late or incomplete, or 
contain viruses. Company accepts no liability for any damage or loss of 
confidential information caused by this email or due to any virus 
transmitted by this email or otherwise.


Re: How to return JSON Object from UDF

2022-05-06 Thread Timo Walther

Hi Surendra,

in general we would like to encourage users to use the SQL type system 
classes instead of RAW types. Otherwise they are simply black boxes in 
the SQL engine. A STRING or ROW type might be more appropriate.


You can use

@DataTypeHint(value = "RAW")  // defaults to Object.class

@DataTypeHint(value = "RAW", bridgedTo =JSONObject.class)    // more 
precise class information


the rawSerializer is usually not required and Kyro will be used in this 
case.


Regards,
Timo


Am 06.05.22 um 10:36 schrieb yuxia:

Does the DatatypeHint with bridgedTo can meet your requirements?
For example:
'
public @DataTypeHint(
                value = "RAW",
                bridgedTo =JSONObject.class,
                rawSerializer =JSONObjectSerializer.class) JSONObject 
eval(String str) {

            return JSONObject.parse(str);
        }
'
You may need to provide a class likeJSONObjectSerializer that 
extends TypeSerializerSingleton.



Best regards,
Yuxia


*发件人: *"Surendra Lalwani" 
*收件人: *"User" 
*发送时间: *星期五, 2022年 5 月 06日 下午 4:40:19
*主题: *How to return JSON Object from UDF

Hi Team,

I am using Flink 1.13.6 and I have created a UDF and I want to return 
JSONObject from that UDF or basically an Object but it doesn't seems 
to work as there is no datatype hint compatible to Object. in earlier 
flink versions when DataTypeHint wasn't there, it used to work. Any 
help would be appreciated.


Thanks and Regards ,
Surendra Lalwani



IMPORTANT NOTICE: This e-mail, including any attachments, may contain 
confidential information and is intended only for the addressee(s) 
named above. If you are not the intended recipient(s), you should not 
disseminate, distribute, or copy this e-mail. Please notify the sender 
by reply e-mail immediately if you have received this e-mail in error 
and permanently delete all copies of the original message from your 
system. E-mail transmission cannot be guaranteed to be secure as it 
could be intercepted, corrupted, lost, destroyed, arrive late or 
incomplete, or contain viruses. Company accepts no liability for any 
damage or loss of confidential information caused by this email or due 
to any virus transmitted by this email or otherwise.




Fwd: [Discuss] Creating an Apache Flink slack workspace

2022-05-06 Thread Xintong Song
Thank you~

Xintong Song



-- Forwarded message -
From: Xintong Song 
Date: Fri, May 6, 2022 at 5:07 PM
Subject: Re: [Discuss] Creating an Apache Flink slack workspace
To: private 
Cc: Chesnay Schepler 


Hi Chesnay,

Correct me if I'm wrong, I don't find this is *repeatedly* discussed on the
ML. The only discussions I find are [1] & [2], which are 4 years ago. On
the other hand, I do find many users are asking questions about whether
Slack should be supported [2][3][4]. Besides, I also find a recent
discussion thread from ComDev [5], where alternative communication channels
are being discussed. It seems to me ASF is quite open to having such
additional channels and they have been worked well for many projects
already.

I see two reasons for brining this discussion again:
1. There are indeed many things that have change during the past 4 years.
We have more contributors, including committers and PMC members, and even
more users from various organizations and timezones. That also means more
discussions and Q are happening.
2. The proposal here is different from the previous discussion. Instead of
maintaining a channel for Flink in the ASF workspace, here we are proposing
to create a dedicated Apache Flink slack workspace. And instead of *moving*
the discussion to Slack, we are proposing to add a Slack Workspace as an
addition to the ML.

Below is your opinions that I found from your previous -1 [1]. IIUR, these
are all about the using ASF Slack Workspace. If I overlooked anything,
please let me know.

> 1. According to INFRA-14292 <
> https://issues.apache.org/jira/browse/INFRA-14292> the ASF Slack isn't
> run by the ASF. This alone puts this service into rather questionable
> territory as it /looks/ like an official ASF service. If anyone can provide
> information to the contrary, please do so.

2. We already discuss things on the mailing lists, JIRA and GitHub. All of
> these are available to the public, whereas the slack channel requires an
> @apache mail address, i.e. you have to be a committer. This minimizes the
> target audience rather significantly. I would much rather prefer something
> that is also available to contributors.


I do agree this should be decided by the whole community. I'll forward this
to dev@ and user@ ML.

Thank you~

Xintong Song


[1] https://lists.apache.org/thread/gxwv49ssq82g06dbhy339x6rdxtlcv3d
[2] https://lists.apache.org/thread/kcym1sozkrtwxw1fjbnwk1nqrrlzolcc
[3] https://lists.apache.org/thread/7rmd3ov6sv3wwhflp97n4czz25hvmqm6
[4] https://lists.apache.org/thread/n5y1kzv50bkkbl3ys494dglyxl45bmts
[5] https://lists.apache.org/thread/fzwd3lj0x53hkq3od5ot0y719dn3kj1j

On Fri, May 6, 2022 at 3:05 PM Chesnay Schepler  wrote:

> This has been repeatedly discussed on the ML over the years and was
> rejected every time.
>
> I don't see that anything has changed that would invalidate the previously
> raised arguments against it, so I'm still -1 on it.
>
> This is also not something the PMC should decide anyway, but the project
> as a whole.
>
> On 06/05/2022 06:48, Jark Wu wrote:
>
> Thank Xintong, for starting this exciting topic.
>
> I think Slack would be an essential addition to the mailing list.
> I have talked with some Flink users, and they are surprised
> Flink doesn't have Slack yet, and they would love to use Slack.
> We can also see a trend that new open-source communities
> are using Slack as the community base camp.
>
> Slack is also helpful for brainstorming and asking people for opinions and
> use cases.
> I think Slack is not only another place for Q but also a connection to
> the Flink users.
> We can create more channels to make the community have more social
> attributes, for example,
>  - Share ideas, projects, integrations, articles, and presentations
> related to Flink in the #shows channel
>  - Flink releases, events in the #news channel
>
> Thus, I'm +1 to create an Apache Flink slack, and I can help set up the
> Flink slack and maintain it.
>
> Best,
> Jark
>
> On Fri, 6 May 2022 at 10:38, Xintong Song  wrote:
>
>> Hi all,
>>
>> I’d like to start a discussion on creating an Apache Flink slack
>> workspace.
>>
>> ## Motivation
>> Today many organizations choose to do real time communication through
>> slack. IMHO, we, Flink, as a technique for real time computing, should
>> embrace the more real time way for communication, especially for ad-hoc
>> questions and interactions. With more and more contributors from different
>> organizations joining this community, it would be good to provide a common
>> channel for such real time communications. Therefore, I'd propose to create
>> an Apache Flink slack workspace that is maintained by the Flink PMC.
>>
>> ## Benefits
>> - Easier to reach out to people. Messages are less likely overlooked.
>> - Realtime messages, voice / video calls, file transmissions that help
>> improve the communication efficiency.
>> - Finer-grained channels (e.g., flink-ml, flink-statefun, temporal
>> discussion channels 

Re: How to return JSON Object from UDF

2022-05-06 Thread yuxia
Does the DatatypeHint with bridgedTo can meet your requirements? 
For example: 
' 
public @DataTypeHint( 
value = "RAW", 
bridgedTo = JSONObject .class, 
rawSerializer = JSONObject Serializer.class) JSONObject eval(String str) { 
return JSONObject .parse(str); 
} 
' 
You may need to provide a class like JSONObject Serializer that extends 
TypeSerializerSingleton. 


Best regards, 
Yuxia 


发件人: "Surendra Lalwani"  
收件人: "User"  
发送时间: 星期五, 2022年 5 月 06日 下午 4:40:19 
主题: How to return JSON Object from UDF 

Hi Team, 

I am using Flink 1.13.6 and I have created a UDF and I want to return 
JSONObject from that UDF or basically an Object but it doesn't seems to work as 
there is no datatype hint compatible to Object. in earlier flink versions when 
DataTypeHint wasn't there, it used to work. Any help would be appreciated. 

Thanks and Regards , 
Surendra Lalwani 



IMPORTANT NOTICE: This e-mail, including any attachments, may contain 
confidential information and is intended only for the addressee(s) named above. 
If you are not the intended recipient(s), you should not disseminate, 
distribute, or copy this e-mail. Please notify the sender by reply e-mail 
immediately if you have received this e-mail in error and permanently delete 
all copies of the original message from your system. E-mail transmission cannot 
be guaranteed to be secure as it could be intercepted, corrupted, lost, 
destroyed, arrive late or incomplete, or contain viruses. Company accepts no 
liability for any damage or loss of confidential information caused by this 
email or due to any virus transmitted by this email or otherwise. 



Re: Flink OLAP 与 Trino TPC-DS 对比

2022-05-06 Thread Yu Li
感谢大家的分享和分析,也期待Flink在相关方向的持续优化!

Let's make Flink great together. :-)

btw, 第5个引用的语雀文档链接已过期,建议使用google doc并更新一下链接

Best Regards,
Yu


On Sun, 1 May 2022 at 21:57, Zhilong Hong  wrote:

> Hello,
>
> 这段时间我们针对 LuNing 反馈的问题进行了深入的分析调研,在此将结论同步给社区。特别感谢 LuNing 反馈这一问题并与我们一起进行分析排查。
>
> 根据我们的分析,造成 Flink 1.14 在 TPCDS 10G 数据集、2 节点集群规模的情况下,与 Trino 359
> 性能差距较大的原因主要包括以下 3 点:
>
> 1. 使用 SQL Client 提交 Flink 作业的耗时较长(单 query 约需要 4s)。在需要频繁提交作业的 OLAP
> 场景下,我们建议使用 Flink SQL Gateway 提交作业,避免重复创建 Client 进程、建立网络链接等额外开销。我们目前使用的是
> Ververica 开源的 SQL Gateway [1],此外社区也正在准备推出官方的 SQL Gateway,详见 FLIP-91 [2]。
>
> 2. 测试使用的数据集比较小(10GiB),导致 Hive Source 根据数据量划分出的 Split 数也比较少。而 Split 是 Source
> 处理数据的最小单位,这就导致虽然看起来 Source 有 32 个并发,实际读取并处理数据的往往只有 1~2 个并发。此外,由于测试配置中关闭了
> Hive Source 的自动推断并发度功能 [3],导致上下游的并发数相同并且被 chain
> 在一起,间接导致了下游算子实际处理数据的并发数也受到了影响。这一问题我们此前也发现过 [4],但没有像在 10GiB 这么小的数据集上影响这么大。
>
> 3. 目前对于部分 TPCDS 测试集的 Query,Flink SQL 生成的执行计划不是最优,导致 Flink 实际处理的数据量比 Trino
> 要大。这与我们在大规模数据集上的观察是一致的,目前社区 SQL 模块的小伙伴们也在继续对这些 case 进行优化。
>
> 总的来看,上述 3 点中,第 2 点对 Flink 性能的影响是最大的。我们针对这一问题做了一定优化。打了 patch 后,尽管实际读取并处理数据的
> Hive Source 并发仍达不到配置的 32 并发,但与 Trino 的差距已大幅缩短,详见 [5]。
>
> 目前在 OLAP 场景下 Flink 与 Trino 确实还存在差距,社区目前也正在针对这一场景进行优化
> [6]。我们目前在阿里内部的开发分支上,已经追平了 Trino 的性能,相关优化预计会在 Flink 1.16、1.17 两个版本中陆续贡献回社区。
>
> Best,
> Zhilong Hong
>
> [1] https://github.com/ververica/flink-sql-gateway
> [2]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Gateway
> [3]
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/hive/hive_read_write/#source-parallelism-inference
> [4] https://issues.apache.org/jira/browse/FLINK-27338
> [5]
> https://www.yuque.com/docs/share/b89479ab-9c24-45c8-9390-77299ae0c4cd?#AkK9
> [6] https://issues.apache.org/jira/browse/FLINK-25318
>
> On Tue, Apr 19, 2022 at 5:43 PM LuNing Wang  wrote:
>
> >
> https://www.yuque.com/docs/share/8625d14b-d465-48a3-8dc1-0be32b138f34?#lUX6
> > 《tpcds-各引擎耗时》
> > 链接有效期至 2022-04-22 10:31:05
> >
> > LuNing Wong  于2022年4月18日周一 09:44写道:
> >
> > > 补充,用的Hive 3.1.2 Hadoop 3.1.0做的数据源。
> > >
> > > LuNing Wong  于2022年4月18日周一 09:42写道:
> > >
> > > > Flink版本是1.14.4,
> > Trino是359版本,tm.memory.process.size和CPU资源我都和Trino对齐了。都是32G
> > > > 16核 16线程,2台计算节点。
> > > >
> > > > Zhilong Hong  于2022年4月15日周五 18:21写道:
> > > >
> > > >> Hello, Luning!
> > > >>
> > > >>
> > > >>
> > >
> >
> 我们目前也正在关注Flink在OLAP场景的性能表现,请问你测试的Flink和Trino版本分别是什么呢?另外我看到flink-sql-benchmark中所使用的集群配置和你的不太一样,可能需要根据集群资源对flink-conf.yaml中taskmanager.memory.process.size等资源配置进行调整。
> > > >>
> > > >> Best,
> > > >> Zhilong
> > > >>
> > > >> On Fri, Apr 15, 2022 at 2:38 PM LuNing Wang 
> > > >> wrote:
> > > >>
> > > >> > 跑了100个 TPC-DS SQL
> > > >> > 10 GB 数据、2个Worker(TM)、每个32G内存,16个核心。
> > > >> > Flink平均用时 18秒
> > > >> > Trino平均用时 7秒
> > > >> >
> > > >> > 我看字节跳动和阿里的老师测试,Flink和presto
> > > >> OLAP性能接近,但是我测的差距很大。想进一步和老师交流下,是不是我Flink设置的有问题。
> > > >> > 我基本上是按照下面这个项目里模板配置的Flink相关参数。
> > > >> > https://github.com/ververica/flink-sql-benchmark
> > > >> >
> > > >> >
> > > >> > LuNing Wang  于2022年4月15日周五 14:34写道:
> > > >> >
> > > >> > > 跑了100个SQL
> > > >> > >
> > > >> >
> > > >>
> > > >
> > >
> >
>


How to return JSON Object from UDF

2022-05-06 Thread Surendra Lalwani
Hi Team,

I am using Flink 1.13.6 and I have created a UDF and I want to return
JSONObject from that UDF or basically an Object but it doesn't seems to
work as there is no datatype hint compatible to Object. in earlier flink
versions when DataTypeHint wasn't there, it used to work. Any help would be
appreciated.

Thanks and Regards ,
Surendra Lalwani

-- 

IMPORTANT NOTICE: This e-mail, including any attachments, may contain 
confidential information and is intended only for the addressee(s) named 
above. If you are not the intended recipient(s), you should not 
disseminate, distribute, or copy this e-mail. Please notify the sender by 
reply e-mail immediately if you have received this e-mail in error and 
permanently delete all copies of the original message from your system. 
E-mail transmission cannot be guaranteed to be secure as it could be 
intercepted, corrupted, lost, destroyed, arrive late or incomplete, or 
contain viruses. Company accepts no liability for any damage or loss of 
confidential information caused by this email or due to any virus 
transmitted by this email or otherwise.


Re: 退订

2022-05-06 Thread yuxia
退订发任意主题的邮件至 user-zh-unsubscr...@flink.apache.org 即可


- 原始邮件 -
发件人: zh...@greatld.com
收件人: "user-zh" 
发送时间: 星期五, 2022年 5 月 06日 下午 1:44:44
主题: 退订

退订



zh...@greatld.com
 
发件人: Peng Vincent
发送时间: 2022-04-16 11:59
收件人: user-zh
主题: 退订
Best wishes!
 
Vincent


Re: 退订

2022-05-06 Thread yuxia
退订发任意主题的邮件至 user-zh-unsubscr...@flink.apache.org 即可


- 原始邮件 -
发件人: "顺其自然" <712677...@qq.com.INVALID>
收件人: "user-zh" 
发送时间: 星期五, 2022年 5 月 06日 下午 1:57:30
主题: 退订

退订