Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-21 Thread Jark Wu
+1 to Bowen's proposal. I have also seen many requests for such built-in
connectors.

I will leave some of my thoughts here:

> 1. datagen source (random source)
I think we can merge the functionality of the sequence source into the
random source to allow users to customize their data values.
Flink can generate random data according to the field types, and users
can customize the values to be more domain specific, e.g.
'field.user'='User_[1-9]{0,1}'.
This will be similar to kafka-datagen-connect[1].
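To make the idea concrete, here is a minimal sketch (in Python, purely
illustrative; the option names and the simplified pattern syntax are my own
assumptions, not Flink or kafka-connect-datagen APIs) of type-driven random
generation with a per-field pattern override:

```python
import random
import string

# Hypothetical sketch of schema-driven random data generation with
# per-field overrides, similar in spirit to the proposal above.
# The pattern syntax is simplified: "Prefix_[a-b]" appends one random
# integer from [a, b] to the prefix.

def random_value(field_type):
    """Generate a random value purely from the declared field type."""
    if field_type == "INT":
        return random.randint(0, 1_000_000)
    if field_type == "STRING":
        return "".join(random.choices(string.ascii_lowercase, k=8))
    raise ValueError(f"unsupported type: {field_type}")

def custom_value(pattern):
    """Generate a value from a user pattern like 'User_[1-9]'."""
    prefix, _, rng = pattern.partition("[")
    lo, hi = rng.rstrip("]").split("-")
    return f"{prefix}{random.randint(int(lo), int(hi))}"

def generate_row(schema, overrides):
    """Generate one row; user overrides win over type-based generation."""
    return {
        name: custom_value(overrides[name]) if name in overrides
        else random_value(ftype)
        for name, ftype in schema.items()
    }

row = generate_row({"user": "STRING", "age": "INT"},
                   {"user": "User_[1-9]"})
print(row["user"])  # e.g. "User_7"
```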

> 2. console sink (print sink)
This will be very useful in production debugging, to easily output an
intermediate view or result view to a `.out` file.
So that we can look into the data representation, or check dirty data.
This should be out-of-box without manually DDL registration.

> 3. blackhole sink (no output sink)
This is very useful for performance testing of Flink, to measure the
throughput of the whole pipeline without the sink.
Presto also provides this as a built-in connector [2].
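As a rough illustration of why a no-output sink isolates pipeline throughput
(these functions are behavioral sketches only, not Flink interfaces):

```python
import time

# Minimal sketches of the two proposed sinks. A print sink exposes every
# record for inspection; a blackhole sink swallows records, so the
# measured rate reflects the source and pipeline, not the sink.

def print_sink(records):
    """Print every record so intermediate results can be inspected."""
    for record in records:
        print(record)

def blackhole_sink(records):
    """Discard every record, adding no back pressure; return the count."""
    count = 0
    for _ in records:
        count += 1
    return count

start = time.perf_counter()
n = blackhole_sink(range(1_000_000))
elapsed = time.perf_counter() - start
print(f"{n} records in {elapsed:.3f}s ({n / elapsed:,.0f} rec/s)")
```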

Best,
Jark

[1]:
https://github.com/confluentinc/kafka-connect-datagen#define-a-new-schema-specification
[2]: https://prestodb.io/docs/current/connector/blackhole.html


On Sat, 21 Mar 2020 at 12:31, Bowen Li  wrote:

> +1.
>
> I would suggest taking a step even further and seeing what users really
> need to test/try/play with the Table API and Flink SQL. Besides this one,
> here are some more sources and sinks that I have developed or used
> previously to facilitate building Flink table/SQL pipelines.
>
>
>1. random input data source
>   - should generate random data at a specified rate according to schema
>   - purposes
>  - test the Flink pipeline and verify that data ends up in external
>  storage correctly
>  - stress test the Flink sink as well as tune the external storage
>   2. print data sink
>   - should print data in row format in console
>   - purposes
>  - make it easier to test a Flink SQL job e2e in the IDE
>  - test the Flink pipeline and ensure the output data format/values
>  are correct
>   3. no output data sink
>   - just swallow output data without doing anything
>   - purpose
>  - evaluate and tune performance of the Flink source and the whole
>  pipeline. Users don't need to worry about sink back pressure
>
> These may be taken into consideration all together as an effort to lower
> the barrier to entry for the Flink SQL/Table API, and to facilitate users'
> daily work.
>
> Cheers,
> Bowen
>
>
> On Thu, Mar 19, 2020 at 10:32 PM Jingsong Li 
> wrote:
>
> > Hi all,
> >
> > I heard some users complain that the Table API is difficult to test. Now
> > with the SQL client, users are more and more inclined to test with it
> > rather than write programs.
> > The most common example is the Kafka source. If users need to test their
> > SQL output and checkpointing, they need to:
> >
> > - 1. Launch a standalone Kafka cluster and create a Kafka topic.
> > - 2. Write a program, mock input records, and produce the records to the
> > Kafka topic.
> > - 3. Then test in Flink.
> >
> > Steps 1 and 2 are annoying, although this test is E2E.
> >
> > Then I found StatefulSequenceSource. It is very good because it already
> > deals with checkpointing, so it works well with the checkpoint mechanism.
> > Usually, users have checkpointing turned on in production.
> >
> > With computed columns, users can easily create a sequence source DDL
> > similar to the Kafka DDL. Then they can test inside Flink without needing
> > to launch anything else.
> >
> > Have you considered this? What do you think?
> >
> > CC: @Aljoscha Krettek  the author
> > of StatefulSequenceSource.
> >
> > Best,
> > Jingsong Lee
> >
>


[jira] [Created] (FLINK-16705) LocalExecutor tears down MiniCluster before client can retrieve JobResult

2020-03-21 Thread Maximilian Michels (Jira)
Maximilian Michels created FLINK-16705:
--

 Summary: LocalExecutor tears down MiniCluster before client can 
retrieve JobResult
 Key: FLINK-16705
 URL: https://issues.apache.org/jira/browse/FLINK-16705
 Project: Flink
  Issue Type: Bug
  Components: Client / Job Submission
Reporter: Maximilian Michels
Assignee: Maximilian Michels


There is a race condition in {{LocalExecutor}} between (a) shutting down the 
cluster when the job has finished and (b) the client which retrieves the result 
of the job execution.

This was observed in Beam, running a large test suite with the Flink Runner.

We should make sure the job result retrieval and the cluster shutdown do not 
interfere.
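A generic sketch of the race and the fix pattern (plain Python, not Flink's
actual LocalExecutor code; the class and method names here are hypothetical):
tie cluster shutdown to completion of result retrieval instead of job
completion.

```python
from concurrent.futures import Future

# The bug: the cluster shuts down as soon as the job finishes, which can
# happen before the client has fetched the JobResult. The fix sketched
# here: chain the shutdown after the result has been handed to the client.

class MiniCluster:
    def __init__(self):
        self.running = True

    def get_job_result(self):
        if not self.running:
            raise RuntimeError("cluster already shut down")
        return {"state": "FINISHED"}

    def shutdown(self):
        self.running = False

def submit_and_get_result(cluster):
    result_future = Future()
    try:
        result_future.set_result(cluster.get_job_result())
    except RuntimeError as e:
        result_future.set_exception(e)
    # Shut down only once the result future has completed, so retrieval
    # and teardown cannot interfere with each other.
    result_future.add_done_callback(lambda _: cluster.shutdown())
    return result_future.result()

cluster = MiniCluster()
print(submit_and_get_result(cluster))  # {'state': 'FINISHED'}
print(cluster.running)  # False
```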



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16704) Document has some error for connect.md

2020-03-21 Thread hehuiyuan (Jira)
hehuiyuan created FLINK-16704:
-

 Summary: Document has some error for connect.md
 Key: FLINK-16704
 URL: https://issues.apache.org/jira/browse/FLINK-16704
 Project: Flink
  Issue Type: Wish
  Components: Documentation
Affects Versions: 1.10.0
Reporter: hehuiyuan
 Attachments: image-2020-03-21-15-09-59-802.png, 
image-2020-03-21-15-11-00-061.png

For branches <= 1.10, there are some errors.

*Type.ROW -> Types.ROW*

!image-2020-03-21-15-09-59-802.png|width=265,height=296!

!image-2020-03-21-15-11-00-061.png|width=334,height=193!



Re: [DISCUSS] Releasing Flink 1.10.1

2020-03-21 Thread Yu Li
Hi All,

Here is a status update on the issues in the 1.10.1 watch list:

* Blockers (7)

  - [Under Discussion] FLINK-16018 Improve error reporting when submitting
batch job (instead of AskTimeoutException)

  - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM error
on repeated job submission

  - [PR Submitted] FLINK-16170 SearchTemplateRequest ClassNotFoundException
when use flink-sql-connector-elasticsearch7

  - [PR Submitted] FLINK-16262 Class loader problem with
FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory

  - [Closed] FLINK-16406 Increase default value for JVM Metaspace to
minimise its OutOfMemoryError

  - [Closed] FLINK-16454 Update the copyright year in NOTICE files

  - [Open] FLINK-16576 State inconsistency on restore with memory state
backends


* Critical (4)

  - [Closed] FLINK-16047 Blink planner produces wrong aggregate results
with state clean up

  - [PR Submitted] FLINK-16070 Blink planner can not extract correct unique
key for UpsertStreamTableSink

  - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be
handled as Fatal Error in TaskManager

  - [Open] FLINK-16408 Bind user code class loader to lifetime of a slot

Please let me know if you see any new blockers to add to the list. Thanks.

Best Regards,
Yu


On Wed, 18 Mar 2020 at 00:11, Yu Li  wrote:

> Thanks for the updates Till!
>
> For FLINK-16018, maybe we could create two sub-tasks for the easy fix and
> the complete fix separately, and only include the easy one in 1.10.1? Or
> please just feel free to postpone the whole task to 1.10.2 if an "all or
> nothing" policy is preferred (smile). Thanks.
>
> Best Regards,
> Yu
>
>
> On Tue, 17 Mar 2020 at 23:33, Till Rohrmann  wrote:
>
>> +1 for a soonish bug fix release. Thanks for volunteering as our release
>> manager Yu.
>>
>> I think we can soon merge the increase of metaspace size and improving the
>> error message. The assumption is that we currently don't have too many
>> small Flink 1.10 deployments with a process size <= 1GB. Of course, the
>> sooner we release the bug fix release, the fewer deployments will be
>> affected by this change.
>>
>> For FLINK-16018, I think there would be an easy solution which we could
>> include in the bug fix release. The proper fix will most likely take a bit
>> longer, though.
>>
>> Cheers,
>> Till
>>
>> On Fri, Mar 13, 2020 at 8:08 PM Andrey Zagrebin 
>> wrote:
>>
>> > > @Andrey and @Xintong - could we have a quick poll on the user mailing
>> > list
>> > > about increasing the metaspace size in Flink 1.10.1? Specifically,
>> > > asking who has very small TM setups?
>> >
>> > There has been a survey about this topic for the past 10 days:
>> >
>> > `[Survey] Default size for the new JVM Metaspace limit in 1.10`
>> > I can ask there about the small TM setups.
>> >
>> > On Fri, Mar 13, 2020 at 5:19 PM Yu Li  wrote:
>> >
>> > > Another blocker for 1.10.1: FLINK-16576 State inconsistency on restore
>> > with
>> > > memory state backends
>> > >
>> > > Let me recompile the watch list with recent feedback. There are 45
>> > > issues in total with Blocker/Critical priority for 1.10.1, out of
>> > > which 14 are already resolved and 31 are left. The ones below are
>> > > watched (meaning that once they are fixed, we will kick off the
>> > > release, with the remaining ones postponed to 1.10.2 unless there are
>> > > objections):
>> > >
>> > > * Blockers (7)
>> > >   - [Under Discussion] FLINK-16018 Improve error reporting when
>> > submitting
>> > > batch job (instead of AskTimeoutException)
>> > >   - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM
>> error
>> > > on repeated job submission
>> > >   - [PR Submitted] FLINK-16170 SearchTemplateRequest
>> > ClassNotFoundException
>> > > when use flink-sql-connector-elasticsearch7
>> > >   - [PR Submitted] FLINK-16262 Class loader problem with
>> > > FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
>> > >   - [Closed] FLINK-16406 Increase default value for JVM Metaspace to
>> > > minimise its OutOfMemoryError
>> > >   - [Open] FLINK-16454 Update the copyright year in NOTICE files
>> > >   - [Open] FLINK-16576 State inconsistency on restore with memory
>> state
>> > > backends
>> > >
>> > > * Critical (4)
>> > >   - [Open] FLINK-16047 Blink planner produces wrong aggregate results
>> > with
>> > > state clean up
>> > >   - [PR Submitted] FLINK-16070 Blink planner can not extract correct
>> > unique
>> > > key for UpsertStreamTableSink
>> > >   - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be
>> > > handled as Fatal Error in TaskManager
>> > >   - [Open] FLINK-16408 Bind user code class loader to lifetime of a
>> slot
>> > >
>> > > Please raise your hand if you find any other issues that should be
>> > > put into this list, thanks.
>> > >
>> > > Please also note that the 1.10.2 version has already been created
>> > > (thanks for the help @jincheng), and feel free to adjust/assign fix
>> > > versions to it when necessary.
>> > >
>> > > Best