Re: [DISCUSS] Stop adding new bash-based e2e tests to Flink

2020-11-23 Thread Kevin Kwon
Hi :)

For me, it's a bit unclear what is meant by not using Docker in e2e tests
in a K8S context, since K8S requires Docker by default

Also, for K8S test orchestration, wouldn't vanilla Python suffice? It has
natively supported client libraries for both K8S and Docker

We might just need to write a few health-check scripts for the other
infrastructure that Flink supports, like Kafka, S3 (mock), etc., to safely
start Flink and perform any blackbox testing. A rough sketch of the idea
follows below.
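
To make that concrete, here is a minimal sketch of such a health-check
script in vanilla Python. The service names, hosts, and ports are made-up
placeholders, and a real version would likely use the K8S and Docker client
libraries mentioned above rather than raw sockets:

import socket
import time

# Illustrative endpoints only; real tests would discover these from the
# K8S/Docker deployment rather than hard-coding them.
SERVICES = {
    "kafka": ("localhost", 9092),    # assumed Kafka broker port
    "s3-mock": ("localhost", 9090),  # assumed mock-S3 endpoint
}

def wait_until_healthy(host, port, timeout_s=60.0):
    """Block until a TCP connection to (host, port) succeeds, or time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2.0):
                return
        except OSError:
            time.sleep(1.0)
    raise TimeoutError(f"{host}:{port} not reachable within {timeout_s}s")

if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        wait_until_healthy(host, port)
        print(f"{name} is up")
    # ...now it is safe to start Flink and submit the blackbox test jobs

Plain TCP probes keep the sketch dependency-free; swapping in proper
client-library checks (listing Kafka topics, a HEAD request against the
mock S3, etc.) would be a straightforward extension.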


On 2020/11/17 15:35:38, Robert Metzger  wrote:
> Hi all,
>
> Since we are currently testing the 1.12 release, and potentially adding
> more automated e2e tests, I would like to bring up our end-to-end tests
> for discussion.
>
> Some time ago, we introduced a Java-based testing framework, with the
> idea of replacing the current bash-based end-to-end tests.
> Since the introduction of the Java-based framework, more bash tests were
> actually added, making a future migration even harder.
>
> *For that reason, I would like to propose that we stop adding any new
> bash end-to-end tests to Flink. All new end-to-end tests must be written
> in Java and rely on the existing testing framework.*
>
> For the 1.13 release, I'm trying to find some time to revisit potential
> improvements for the existing Java e2e framework (such as using Docker
> images everywhere), as well as a migration plan for the existing bash
> tests. We have a large number of bash e2e tests that are just
> parameterized differently. If we started migrating them to Java, we
> could move a larger proportion of the tests over to the new Java
> framework and tackle the more involved bash tests later (Kerberized
> YARN, Kubernetes, ...).
>
> Let me know what you think!
>
> Best,
> Robert
>
>
> PS: If you are wondering why I'm bringing this up now: I'm spending
> quite a lot of time trying to figure out really hard-to-debug issues
> with our bash testing infra.
> Also, it is very difficult to introduce something generic for all tests
> (such as a test timeout, or using Docker as the preferred deployment
> method) since the tests often don't share common tooling.
> Speaking of tooling: there are a lot of utilities everywhere, sometimes
> duplicated, with different features / stability, etc.
> I believe bash is not the right tool for a project this size (in terms
> of developers and lines of code).


[jira] [Created] (FLINK-19517) Support for Confluent Kafka on table creation

2020-10-06 Thread Kevin Kwon (Jira)
Kevin Kwon created FLINK-19517:
--

             Summary: Support for Confluent Kafka on table creation
                 Key: FLINK-19517
                 URL: https://issues.apache.org/jira/browse/FLINK-19517
             Project: Flink
          Issue Type: Wish
    Affects Versions: 1.12.0
            Reporter: Kevin Kwon


Currently, table creation from the SQL client, such as in the example below, works well:
{code:sql}
CREATE TABLE kafkaTable (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'avro',
  'scan.startup.mode' = 'earliest-offset'
)
{code}
However, I wish table creation supported Confluent Kafka configuration as 
well, for example something like:
{code:sql}
CREATE TABLE kafkaTable (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka-confluent',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'schema-registry' = 'http://schema-registry.com',
  'scan.startup.mode' = 'earliest-offset'
)
{code}
Additionally, it would be better if we could
 - specify 'parallelism' within the WITH clause to support parallel partition 
processing
 - specify custom consumer properties within the WITH clause, as documented in 
[https://docs.confluent.io/5.4.2/installation/configuration/consumer-configs.html]
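
For reference, here is a sketch of how the schema-registry part of this wish 
might already be expressible with the existing 'kafka' connector once the 
'avro-confluent' format is available; the format and option names below are 
assumptions based on the 1.12 documentation, not a confirmed API:
{code:sql}
-- Hypothetical sketch: same table, existing 'kafka' connector, with the
-- schema registry configured through the (assumed) 'avro-confluent' format.
CREATE TABLE kafkaTable (
  user_id BIGINT,
  item_id BIGINT,
  category_id BIGINT,
  behavior STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'kafka',
  'topic' = 'user_behavior',
  'properties.bootstrap.servers' = 'localhost:9092',
  'properties.group.id' = 'testGroup',
  'format' = 'avro-confluent',
  'avro-confluent.schema-registry.url' = 'http://schema-registry.com',
  'scan.startup.mode' = 'earliest-offset'
)
{code}
As far as I know, custom consumer settings can already be passed through 
today via 'properties.*' keys on the 'kafka' connector, so the remaining gap 
would be the 'parallelism' option.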


