Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource
+1 to Bowen's proposal. I also saw many requirements on such built-in connectors. I will leave some my thoughts here: > 1. datagen source (random source) I think we can merge the functinality of sequence-source into random source to allow users to custom their data values. Flink can generate random data according to the field types, users can customize their values to be more domain specific, e.g. 'field.user'='User_[1-9]{0,1}' This will be similar to kafka-datagen-connect[1]. > 2. console sink (print sink) This will be very useful in production debugging, to easily output an intermediate view or result view to a `.out` file. So that we can look into the data representation, or check dirty data. This should be out-of-box without manually DDL registration. > 3. blackhole sink (no output sink) This is very useful for high performance testing of Flink, to meansure the throughput of the whole pipeline without sink. Presto also provides this as a built-in connector [2]. Best, Jark [1]: https://github.com/confluentinc/kafka-connect-datagen#define-a-new-schema-specification [2]: https://prestodb.io/docs/current/connector/blackhole.html On Sat, 21 Mar 2020 at 12:31, Bowen Li wrote: > +1. > > I would suggest to take a step even further and see what users really need > to test/try/play with table API and Flink SQL. Besides this one, here're > some more sources and sinks that I have developed or used previously to > facilitate building Flink table/SQL pipelines. > > >1. random input data source > - should generate random data at a specified rate according to schema > - purposes > - test Flink pipeline and data can end up in external storage > correctly > - stress test Flink sink as well as tuning up external storage > 2. print data sink > - should print data in row format in console > - purposes > - make it easier to test Flink SQL job e2e in IDE > - test Flink pipeline and ensure output data format/value is > correct > 3. no output data sink > - just swallow output data without doing anything > - purpose > - evaluate and tune performance of Flink source and the whole > pipeline. Users' don't need to worry about sink back pressure > > These may be taken into consideration all together as an effort to lower > the threshold of running Flink SQL/table API, and facilitate users' daily > work. > > Cheers, > Bowen > > > On Thu, Mar 19, 2020 at 10:32 PM Jingsong Li > wrote: > > > Hi all, > > > > I heard some users complain that table is difficult to test. Now with SQL > > client, users are more and more inclined to use it to test rather than > > program. > > The most common example is Kafka source. If users need to test their SQL > > output and checkpoint, they need to: > > > > - 1.Launch a Kafka standalone, create a Kafka topic . > > - 2.Write a program, mock input records, and produce records to Kafka > > topic. > > - 3.Then test in Flink. > > > > The step 1 and 2 are annoying, although this test is E2E. > > > > Then I found StatefulSequenceSource, it is very good because it has deal > > with checkpoint things, so it is very good to checkpoint > mechanism.Usually, > > users are turned on checkpoint in production. > > > > With computed columns, user are easy to create a sequence source DDL same > > to Kafka DDL. Then they can test inside Flink, don't need launch other > > things. > > > > Have you consider this? What do you think? > > > > CC: @Aljoscha Krettek the author > > of StatefulSequenceSource. > > > > Best, > > Jingsong Lee > > >
[jira] [Created] (FLINK-16705) LocalExecutor tears down MiniCluster before client can retrieve JobResult
Maximilian Michels created FLINK-16705: -- Summary: LocalExecutor tears down MiniCluster before client can retrieve JobResult Key: FLINK-16705 URL: https://issues.apache.org/jira/browse/FLINK-16705 Project: Flink Issue Type: Bug Components: Client / Job Submission Reporter: Maximilian Michels Assignee: Maximilian Michels There is a race condition in {{LocalExecutor}} between (a) shutting down the cluster when the job has finished and (b) the client which retrieves the result of the job execution. This was observed in Beam, running a large test suite with the Flink Runner. We should make sure the job result retrieval and the cluster shutdown do not interfere. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16704) Document has some error for connect.md
hehuiyuan created FLINK-16704: - Summary: Document has some error for connect.md Key: FLINK-16704 URL: https://issues.apache.org/jira/browse/FLINK-16704 Project: Flink Issue Type: Wish Components: Documentation Affects Versions: 1.10.0 Reporter: hehuiyuan Attachments: image-2020-03-21-15-09-59-802.png, image-2020-03-21-15-11-00-061.png For branch < = 1.10 , that has some errors. *Type.ROW ->Types.ROW* !image-2020-03-21-15-09-59-802.png|width=265,height=296! !image-2020-03-21-15-11-00-061.png|width=334,height=193! -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] Releasing Flink 1.10.1
Hi All, Here is the status update of issues in 1.10.1 watch list: * Blockers (7) - [Under Discussion] FLINK-16018 Improve error reporting when submitting batch job (instead of AskTimeoutException) - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job submission - [PR Submitted] FLINK-16170 SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7 - [PR Submitted] FLINK-16262 Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory - [Closed] FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError - [Closed] FLINK-16454 Update the copyright year in NOTICE files - [Open] FLINK-16576 State inconsistency on restore with memory state backends * Critical (4) - [Closed] FLINK-16047 Blink planner produces wrong aggregate results with state clean up - [PR Submitted] FLINK-16070 Blink planner can not extract correct unique key for UpsertStreamTableSink - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be handled as Fatal Error in TaskManager - [Open] FLINK-16408 Bind user code class loader to lifetime of a slot Please let me know if you see any new blockers to add to the list. Thanks. Best Regards, Yu On Wed, 18 Mar 2020 at 00:11, Yu Li wrote: > Thanks for the updates Till! > > For FLINK-16018, maybe we could create two sub-tasks for easy and complete > fix separately, and only include the easy one in 1.10.1? Or please just > feel free to postpone the whole task to 1.10.2 if "all or nothing" policy > is preferred (smile). Thanks. > > Best Regards, > Yu > > > On Tue, 17 Mar 2020 at 23:33, Till Rohrmann wrote: > >> +1 for a soonish bug fix release. Thanks for volunteering as our release >> manager Yu. >> >> I think we can soon merge the increase of metaspace size and improving the >> error message. The assumption is that we currently don't have too many >> small Flink 1.10 deployments with a process size <= 1GB. Of course, the >> sooner we release the bug fix release, the fewer deployments will be >> affected by this change. >> >> For FLINK-16018, I think there would be an easy solution which we could >> include in the bug fix release. The proper fix will most likely take a bit >> longer, though. >> >> Cheers, >> Till >> >> On Fri, Mar 13, 2020 at 8:08 PM Andrey Zagrebin >> wrote: >> >> > > @Andrey and @Xintong - could we have a quick poll on the user mailing >> > list >> > > about increasing the metaspace size in Flink 1.10.1? Specifically >> asking >> > > for who has very small TM setups? >> > >> > There has been a survey about this topic since 10 days: >> > >> > `[Survey] Default size for the new JVM Metaspace limit in 1.10` >> > I can ask there about the small TM setups. >> > >> > On Fri, Mar 13, 2020 at 5:19 PM Yu Li wrote: >> > >> > > Another blocker for 1.10.1: FLINK-16576 State inconsistency on restore >> > with >> > > memory state backends >> > > >> > > Let me recompile the watching list with recent feedbacks. There're >> > totally >> > > 45 issues with Blocker/Critical priority for 1.10.1, out of which 14 >> > > already resolved and 31 left, and the below ones are watched (meaning >> > that >> > > once the below ones got fixed, we will kick of the releasing with left >> > ones >> > > postponed to 1.10.2 unless objections): >> > > >> > > * Blockers (7) >> > > - [Under Discussion] FLINK-16018 Improve error reporting when >> > submitting >> > > batch job (instead of AskTimeoutException) >> > > - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM >> error >> > > on repeated job submission >> > > - [PR Submitted] FLINK-16170 SearchTemplateRequest >> > ClassNotFoundException >> > > when use flink-sql-connector-elasticsearch7 >> > > - [PR Submitted] FLINK-16262 Class loader problem with >> > > FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory >> > > - [Closed] FLINK-16406 Increase default value for JVM Metaspace to >> > > minimise its OutOfMemoryError >> > > - [Open] FLINK-16454 Update the copyright year in NOTICE files >> > > - [Open] FLINK-16576 State inconsistency on restore with memory >> state >> > > backends >> > > >> > > * Critical (4) >> > > - [Open] FLINK-16047 Blink planner produces wrong aggregate results >> > with >> > > state clean up >> > > - [PR Submitted] FLINK-16070 Blink planner can not extract correct >> > unique >> > > key for UpsertStreamTableSink >> > > - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be >> > > handled as Fatal Error in TaskManager >> > > - [Open] FLINK-16408 Bind user code class loader to lifetime of a >> slot >> > > >> > > Please raise your hand if you find any other issues should be put into >> > this >> > > list, thanks. >> > > >> > > Please also note that 1.10.2 version is already created (thanks for >> the >> > > help @jincheng), and feel free to adjust/assign fix version to it when >> > > necessary. >> > > >> > > Best