[jira] [Created] (FLINK-11460) Remove useless AcknowledgeStreamMockEnvironment

2019-01-29 Thread zhijiang (JIRA)
zhijiang created FLINK-11460:


 Summary: Remove useless AcknowledgeStreamMockEnvironment
 Key: FLINK-11460
 URL: https://issues.apache.org/jira/browse/FLINK-11460
 Project: Flink
  Issue Type: Task
  Components: Tests
Reporter: zhijiang
Assignee: zhijiang
 Fix For: 1.8.0


This class is no longer used in the code path, so it can be deleted directly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Ratelimiting in the Flink Kafka connector

2019-01-29 Thread Thomas Weise
It is preferred for the service to rate limit. The problem is that not all
Kafka setups have that control enabled / support for it.

Even when rate limiting is enabled, it may still be *nice* for the client
to handle it gracefully.

There was discussion in the past that we should not bloat the Kafka
consumer further and I agree with that.

On the other hand, it would be good if the consumer could be augmented a bit
to provide hooks for customization (we did that for the Kinesis
consumer as well).

Thanks,
Thomas


On Mon, Jan 28, 2019 at 3:14 AM Becket Qin  wrote:

> Hi Lakshmi,
>
> As Nagajun mentioned, you might want to configure quota on the Kafka broker
> side for your Flink connector client.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Sat, Jan 26, 2019 at 10:44 AM Ning Shi  wrote:
>
> > > We have a Flink job reading from Kafka (specifically it uses
> > > FlinkKafkaConsumer011). There are instances when the job is processing
> a
> > > backlog and it ends up reading at a significantly high throughput and
> > > degrades the underlying Kafka cluster. If there was a way to rate limit
> > the
> > > calls to Kafka (by controlling how often the *consumer*.poll() is
> > called),
> > > it would be a useful feature for our use case.
> > >
> > > Has anyone run into a similar issue? Are there any efforts/thoughts
> > > on implementing a rate-limiting feature in the Flink Kafka connector?
> >
> > We had a similar problem and ended up putting a Guava rate limiter
> > inside the Kafka consumer to limit the consumption rate. Since we use
> > POJOs, this is easily done by putting the rate limiter inside the POJO
> > deserializer, which runs in the Kafka source.
> >
> > This has the benefit of not slowing down checkpoints because the
> > source doesn't have to do alignment. If you don't care about
> > checkpoint alignment, you can also add a map function with a Guava
> > rate limiter immediately after the Kafka source. When it throttles,
> > back pressure should eventually cause the Kafka source to slow down
> > consumption.
> >
> > Ning
> >
>
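A minimal sketch of the deserializer-based throttling described above, assuming a
Guava RateLimiter wrapped around an existing DeserializationSchema (the class and
field names are illustrative, not from the original thread; the limit applies per
parallel source subtask):

import com.google.common.util.concurrent.RateLimiter;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

import java.io.IOException;

// Wraps an existing DeserializationSchema and blocks in deserialize() so that the
// Kafka source cannot consume faster than permitsPerSecond records per second.
public class RateLimitedDeserializationSchema<T> implements DeserializationSchema<T> {

    private final DeserializationSchema<T> inner;
    private final double permitsPerSecond;
    private transient RateLimiter rateLimiter;  // Guava's RateLimiter is not serializable

    public RateLimitedDeserializationSchema(DeserializationSchema<T> inner, double permitsPerSecond) {
        this.inner = inner;
        this.permitsPerSecond = permitsPerSecond;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        if (rateLimiter == null) {
            // created lazily on the task manager, once per parallel source instance
            rateLimiter = RateLimiter.create(permitsPerSecond);
        }
        rateLimiter.acquire();  // blocks until a permit is available
        return inner.deserialize(message);
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return inner.isEndOfStream(nextElement);
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return inner.getProducedType();
    }
}

The schema would be passed to the FlinkKafkaConsumer011 constructor in place of the
original one; the same RateLimiter.acquire() call could instead live in a map function
right after the source if checkpoint alignment is not a concern, as Ning mentions.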


Re: Flink Table Case Statements appending spaces

2019-01-29 Thread Hequn Cheng
Hi Ramya,

Fabian is right. In strict SQL standard mode (SQL:2003), the result is
a blank-padded CHAR(N) type.

Best,
Hequn


On Wed, Jan 30, 2019 at 1:59 AM Fabian Hueske  wrote:

> Hi,
>
> Table API queries are executed with SQL semantics.
> In SQL, the strings are padded because the result data type is not a
> VARCHAR but a CHAR with as many characters as the longest string.
> You can use the RTRIM function to remove the padding whitespace.
>
> Best, Fabian
>
>
> Am Di., 29. Jan. 2019 um 15:08 Uhr schrieb Ramya Ramamurthy <
> hair...@gmail.com>:
>
> > Hi,
> >
> > I have encountered a weird issue. When constructing a Table Query with
> CASE
> > Statements, like below:
> > .append("CASE ")
> > .append("WHEN aggrcat = '0' AND botcode='r4'  THEN 'monitoring' ")
> > .append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ")
> > .append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ")
> >  .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ")
> >
> > Followed by converting to a stream:
> > DataStream ds = tableEnv.toAppendStream(table_name, Row.Class)
> > ds.print()
> >
> > I can see that when I print this, each value is padded with spaces up to
> > the length of the longest string among the four categories.
> > E.g. 'crawler' is padded to the length of the longest string ['social network', maybe].
> >
> > This is a problem when my data is sunk to ES. The fields are padded with
> > empty spaces, which affects the querying.
> >
> > I know I can do a simple split to eliminate these spaces in my code, but I am
> > just curious to understand why this behavior occurs. It becomes cumbersome to
> > maintain code this way, and it hurts readability as well.
> >
> > ~Ramya.
> >
>
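As Fabian notes above, RTRIM can strip the CHAR(N) padding before the data is sunk.
A minimal sketch, assuming a StreamTableEnvironment named tableEnv with a registered
table "sourceTopic" (the ELSE branch and names are illustrative, not from the original
thread):

// Wrapping the CASE expression in RTRIM removes the blank padding in the result.
String sql =
    "SELECT RTRIM(" +
    "  CASE WHEN aggrcat = '0' AND botcode = 'r4' THEN 'monitoring' " +
    "       WHEN aggrcat = '1' AND botcode = 'r4' THEN 'aggregator' " +
    "       WHEN aggrcat = '2' AND botcode = 'r4' THEN 'social network' " +
    "       WHEN botcode = 'r0' OR botcode = 'r8' THEN 'crawler' " +
    "       ELSE 'other' END) AS category " +
    "FROM sourceTopic";

Table result = tableEnv.sqlQuery(sql);
DataStream<Row> ds = tableEnv.toAppendStream(result, Row.class);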


Re: [DISCUSS] Proposal of external shuffle service

2019-01-29 Thread qi luo
Very clear. Thanks!

> On Jan 28, 2019, at 10:29 PM, zhijiang  wrote:
> 
> Hi Qi,
> 
> Thanks for the concerns about this proposal. In Blink we implemented the 
> YarnShuffleService, which is mainly used for batch jobs in production and for some 
> benchmarks. This YarnShuffleService is not within the currently proposed 
> ShuffleManager interface and there is also no ShuffleMaster component on the JM 
> side. You can regard it as a simple and special implementation version. The 
> YarnShuffleService can further be refactored within this proposed shuffle 
> manager architecture. 
> 
> Best,
> Zhijiang
> 
> --
> From:qi luo 
> Send Time: January 28, 2019 (Monday) 20:55
> To:dev ; zhijiang 
> Cc:Till Rohrmann ; Andrey Zagrebin 
> 
> Subject:Re: [DISCUSS] Proposal of external shuffle service
> 
> Hi Zhijiang,
> 
> I see there’s a YarnShuffleService in newly released Blink branch. Is there 
> any relationship between that YarnShuffleService and  your external shuffle 
> service?
> 
> Regards,
> Qi
> 
> > On Jan 28, 2019, at 8:07 PM, zhijiang  
> > wrote:
> > 
> > Hi till,
> > 
> > Very glad to receive your feedback; it is actually very helpful.
> > 
> > The proposed ShuffleMaster in the JM would be involved in many existing 
> > processes, such as task deployment, task failover, and TM release, so it might 
> > interact with the corresponding Scheduler, FailoverStrategy, and SlotPool 
> > components. In the first version we try to focus on the deployment process, which 
> > is described in detail in the FLIP. Concerning the other improvements based 
> > on the proposed architecture, we just mentioned the basic ideas and have 
> > not given the whole detailed process. But I think it is reasonable and 
> > natural to solve these issues based on that. And we would further give more 
> > details for the other future steps.
> > 
> > I totally agree with your thought on handling TM release. Currently, once 
> > the task is finished, the corresponding slot is regarded as free no matter 
> > whether the produced partition is consumed or not. Actually we could consider 
> > that both the task and its partitions occupy resources in the slot. So the slot can only be 
> > regarded as free once the internal partition is consumed and released. The 
> > TM release logic is then improved as well. I think your suggestions 
> > below already give the detailed and specific process for this improvement.
> > 
> > I am in favor of launching a separate thread for this discussion again, 
> > thanks for the advice!
> > 
> > Best,
> > Zhijiang
> > 
> > 
> > --
> > From:Till Rohrmann 
> > Send Time: January 28, 2019 (Monday) 19:14
> > To:dev ; zhijiang 
> > Cc:Andrey Zagrebin 
> > Subject:Re: [DISCUSS] Proposal of external shuffle service
> > 
> > Thanks for creating the FLIP-31 for the external shuffle service Zhijiang. 
> > It looks good to me. 
> > 
> > One thing which is not fully clear to me yet is how the lifecycle 
> > management of the partitions integrates with the slot management. At the 
> > moment, conceptually we consider the partition data being owned by the TM 
> > if I understood it correctly. This means the ShuffleMaster is asked whether 
> > a TM can be freed. However, the JobMaster only thinks in terms of slots and 
> > not TMs. Thus, the logic would be that the JM asks the ShuffleMaster 
> > whether it can return a certain slot. Atm the freeing of slots is done by 
> > the `SlotPool` and, thus this would couple the `SlotPool` and the 
> > `ShuffleMaster`. Maybe we need to introduce some mechanism to signal when a 
> > slot still has some occupied resources. In the shared slot case, one could 
> > think of allocating a dummy slot in the shared slot which we only release 
> > after the partition data has been consumed.
> > 
> > In order to give this design document a little bit more visibility, I would 
> > suggest to post it again on the dev mailing list in a separate thread under 
> > the title "[DISCUSS] Flip-31: Pluggable Shuffle Manager" or something like 
> > this.
> > 
> > Cheers,
> > Till
> > On Mon, Jan 21, 2019 at 7:05 AM zhijiang 
> >  wrote:
> > Hi all,
> > 
> > FYI, I created the FLIP-31 under [1] for this proposal and created some 
> > subtasks under umbrella jira [2].
> > Any concerns are welcome in the previous google doc or the specific JIRAs.
> > 
> > [1] 
> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
> > [2] https://issues.apache.org/jira/browse/FLINK-10653
> > 
> > Best,
> > Zhijiang
> > --
> > From:zhijiang 
> > Send Time: January 15, 2019 (Tuesday) 17:55
> > To:Andrey Zagrebin 
> > Cc:dev 
> > Subject:Re: [DISCUSS] Proposal of external shuffle service
> > 
> > Hi all,
> > 
> > After continuous discussion with Andrey offline, we have already reached an 
> > agreement on this proposal and co-authored the latest google doc under [1].
> > 
> > We plan to 

[jira] [Created] (FLINK-11459) Presto S3 does not show errors due to missing credentials with minio

2019-01-29 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11459:
---

 Summary: Presto S3 does not show errors due to missing credentials 
with minio
 Key: FLINK-11459
 URL: https://issues.apache.org/jira/browse/FLINK-11459
 Project: Flink
  Issue Type: Bug
  Components: filesystem-connector
Affects Versions: 1.6.2
Reporter: Nico Kruber


It seems that when using minio for S3-like storage with mis-configurations 
such as missing (maybe also wrong) credentials, Flink gets into a failing state but 
gives no reason for it:
{code}
...
2019-01-29 15:43:27,676 INFO  
org.apache.flink.configuration.GlobalConfiguration   [] - Loading 
configuration property: taskmanager.heap.mb, 353
2019-01-29 15:43:27,738 INFO  
org.apache.flink.configuration.GlobalConfiguration   [] - Loading 
configuration property: jobmanager.heap.mb, 429
2019-01-29 15:43:27,758 INFO  org.apache.flink.api.java.ExecutionEnvironment
   [] - The job has 0 registered types and 0 default Kryo serializers
2019-01-29 15:43:29,943 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2093.606], CredentialsRequestTime=[2092.961], 
2019-01-29 15:43:29,956 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2115.551], 
2019-01-29 15:43:31,946 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[3597.992], CredentialsRequestTime=[3597.788], 
2019-01-29 15:43:31,958 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[3610.417], 
2019-01-29 15:43:33,954 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2907.39], CredentialsRequestTime=[2906.853], 
2019-01-29 15:43:33,963 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2917.786], 
2019-01-29 15:43:36,133 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2005.692], CredentialsRequestTime=[2004.942], 
2019-01-29 15:43:36,156 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2029.473], 
2019-01-29 15:43:38,142 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2077.053], CredentialsRequestTime=[2076.05], 
2019-01-29 15:43:38,164 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2092.878], 
2019-01-29 15:43:42,181 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2005.91], CredentialsRequestTime=[2005.164], 
2019-01-29 15:43:42,186 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2011.204], 
2019-01-29 15:43:44,262 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2007.886], CredentialsRequestTime=[2007.165], 
2019-01-29 15:43:44,276 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency[] - 
ClientExecuteTime=[2024.312], 
2019-01-29 15:43:44,585 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint[] - RECEIVED 
SIGNAL 15: SIGTERM. Shutting down as requested.
2019-01-29 15:43:44,628 INFO  org.apache.flink.runtime.blob.TransientBlobCache  
   [] - Shutting down BLOB cache
2019-01-29 15:43:44,661 INFO  org.apache.flink.runtime.blob.BlobServer  
   [] - Stopped BLOB server at 0.0.0.0:6124
{code}

With AWS S3, it is actually printing an exception instead:
{code}
2019-01-29 19:24:39,968 INFO  
org.apache.flink.configuration.GlobalConfiguration- Loading 
configuration property: rest.port, 8081
2019-01-29 19:24:39,990 INFO  org.apache.flink.api.java.ExecutionEnvironment
- The job has 0 registered types and 0 default Kryo serializers
2019-01-29 19:24:43,117 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency - 
ClientExecuteTime=[2047.535], CredentialsRequestTime=[2033.619], 
2019-01-29 19:24:43,118 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency - 
ClientExecuteTime=[2049.826], 
2019-01-29 19:24:46,215 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency - 
ClientExecuteTime=[2003.168], CredentialsRequestTime=[2002.836], 
2019-01-29 19:24:46,216 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency - 
ClientExecuteTime=[2004.182], 
2019-01-29 19:24:50,384 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency - 
ClientExecuteTime=[2003.15], CredentialsRequestTime=[2002.803], 
2019-01-29 19:24:50,384 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency - 
ClientExecuteTime=[2004.308], 
2019-01-29 19:24:56,691 INFO  
org.apache.flink.fs.s3presto.shaded.com.amazonaws.latency - 
ClientExecuteTime=[2002.596], CredentialsRequestTime=[2002.45], 
2019-01-29 19:24:56,691 INFO  

Re: [DISCUSS] Improvement to Flink Window Operator with Slicing

2019-01-29 Thread Fabian Hueske
Thank you Rong!
The performance of sliding windows is an issue for many users.
Adding support for a more efficient window is a great effort.

Thank you,
Fabian

Am Di., 29. Jan. 2019 um 16:37 Uhr schrieb Rong Rong :

> Hi all,
>
> Thanks for the feedback and suggestions on the design doc. I have created
> a parent JIRA [1] to track the related tasks and started the implementation
> process.
> Any further feedback or suggestions are highly appreciated.
>
> Best,
> Rong
>
> [1] https://issues.apache.org/jira/browse/FLINK-11276
>
> On Wed, Dec 5, 2018 at 6:17 PM Rong Rong  wrote:
>
> > Hi all,
> >
> > Various discussions in the mailing list & JIRA tickets [2] have been brought
> > up in the past regarding the windowing operation performance. As we
> > experimented internally with some of our extreme use cases, we found out that
> > using a slice-based implementation can optimize Flink's windowing mechanism
> > and provide better performance in most cases.
> >
> > We've put together a preliminary enhancement and performance optimization
> > plan [1] for the current windowing operation in Flink. This is largely
> > inspired by stream slicing research shared in recent Flink Forward
> > conference [3] by Philip and Jonas, and the discussion in the main JIRA
> > ticket [2]. The initial design and POC implementations consider
> optimizing
> > the performance for the category of overlapping windows as well as
> allowing
> > chaining of cascade window operators.
> >
> > It will be great to hear the feedbacks and suggestions from the
> community.
> > Please kindly share your comments and suggestions.
> >
> > Thanks,
> > Rong
> >
> > [1]
> >
> https://docs.google.com/document/d/1ziVsuW_HQnvJr_4a9yKwx_LEnhVkdlde2Z5l6sx5HlY/edit?usp=sharing
> > [2] https://issues.apache.org/jira/browse/FLINK-7001
> > [3]
> >
> https://data-artisans.com/flink-forward-berlin/resources/efficient-window-aggregation-with-stream-slicing
> >
> >
> >
> >
>


Re: Side Outputs for late arriving records

2019-01-29 Thread Fabian Hueske
Hi Ramya,

If you don't want to apply any logic but just filter late records, you
should not use a window because it needs to shuffle and group records into
windows.
Instead, you can use a non-keyed ProcessFunction and compare the timestamp
of the record (context.timestamp()) with the current watermark
(context.getTimerService().currentWatermark()) and emit all records that
are late to a side output.
This will avoid the shuffle and reduce processing latency.
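
A minimal sketch of this approach, assuming timestamps and watermarks are already
assigned upstream; Row is used as an illustrative element type and the usual
DataStream API imports are assumed:

// 'input' is the DataStream<Row> with timestamps and watermarks assigned.
final OutputTag<Row> lateTag = new OutputTag<Row>("late-records") {};

SingleOutputStreamOperator<Row> onTime = input
    .process(new ProcessFunction<Row, Row>() {
        @Override
        public void processElement(Row value, Context ctx, Collector<Row> out) {
            Long ts = ctx.timestamp();
            if (ts != null && ts < ctx.timerService().currentWatermark()) {
                ctx.output(lateTag, value);   // late record -> side output
            } else {
                out.collect(value);           // on-time record -> main output
            }
        }
    });

DataStream<Row> lateRecords = onTime.getSideOutput(lateTag);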

Best, Fabian

Am Di., 29. Jan. 2019 um 15:02 Uhr schrieb Ramya Ramamurthy <
hair...@gmail.com>:

> Hi Fabian,
>
> I do not have anything to do in "yourFunction".
> The documentation says that .apply(windowFunction) is legacy. But I am at
> a loss to understand which of reduce, aggregate, fold, or apply I must
> use, as I hardly have any operations to perform other than returning the stream
> with no late data back to me [so that I can construct a Flink table with
> this data and do my processing there].
>
> ~Ramya.
>
>
>
> On Tue, Jan 29, 2019 at 3:52 PM Fabian Hueske  wrote:
>
> > Hi Ramya,
> >
> > This works by calling getSideOutput() on the main output of the window
> > function.
> > The main output is collected by applying a function on the window.
> >
> > DataStream input = ...
> > OutputTag lateTag = ...
> >
> > DataStream mainResult = input
> >   .keyBy(...)
> >   .window(...)
> >   .sideOutputLateData(lateTag)
> >   .apply(yourFunction);
> >
> > DataStream lateRecords = mainResult.getSideOutput(lateTag);
> >
> > Best, Fabian
> >
> > Am Mo., 28. Jan. 2019 um 11:09 Uhr schrieb Ramya Ramamurthy <
> > hair...@gmail.com>:
> >
> > > Hi,
> > >
> > > We were trying to collect the sideOutput.
> > > But we failed to understand how to convert this windowed stream to a
> > > datastream.
> > >
> > > final OutputTag > Timestamp>>
> > > lateOutputTag = new OutputTag > > String, Timestamp>>("late-data"){};
> > > withTime.keyBy(0, 2)
> > > .window(TumblingEventTimeWindows.of(Time.minutes(5)))
> > > .allowedLateness(Time.minutes(1))
> > > .sideOutputLateData(lateOutputTag);
> > >
> > > I would now have a windowed stream with the late records tagged as
> > > lateOutputTag. How do we convert the packets which are not late back to a
> > > datastream? Do we need to use the .apply function to collect this data?
> > > We are quite unsure of this. Appreciate your help.
> > >
> > > Best Regards,
> > >
> > >
> > >
> > > On Thu, Jan 24, 2019 at 11:03 PM Fabian Hueske 
> > wrote:
> > >
> > > > Hi Ramya,
> > > >
> > > > This would be a great feature, but unfortunately it is not supported (yet)
> > > > by Flink SQL.
> > > > Currently, all late records are dropped.
> > > >
> > > > A workaround is to ingest the stream as a DataStream, have a custom
> > > > operator that routes all late records to a side output, and
> registering
> > > the
> > > > DataStream without late records as a table on which the SQL query is
> > > > evaluated.
> > > > This requires quite a bit of boilerplate code but could be hidden in
> a
> > > util
> > > > class.
> > > >
> > > > Best, Fabian
> > > >
> > > > Am Do., 24. Jan. 2019 um 06:42 Uhr schrieb Ramya Ramamurthy <
> > > > hair...@gmail.com>:
> > > >
> > > > > Hi,
> > > > >
> > > > > I have a query with regard to Late arriving records.
> > > > > We are using Flink 1.7 with Kafka Consumers of version 2.11.0.11.
> > > > > In my sink operators, which converts this table to a stream which
> is
> > > > being
> > > > > pushed to Elastic Search, I am able to see this metric "
> > > > > *numLateRecordsDropped*".
> > > > >
> > > > > My Kafka consumers don't seem to have any lag and the events are
> > > > > processed properly. Taking these events to a side output
> > > > > doesn't seem to be possible with tables. Below is the snippet:
> > > > >
> > > > > tableEnv.connect(new Kafka()
> > > > >   /* setting of all kafka properties */
> > > > >.startFromLatest())
> > > > >.withSchema(new Schema()
> > > > >.field("sid", Types.STRING())
> > > > >.field("_zpsbd6", Types.STRING())
> > > > >.field("r1", Types.STRING())
> > > > >.field("r2", Types.STRING())
> > > > >.field("r5", Types.STRING())
> > > > >.field("r10", Types.STRING())
> > > > >.field("isBot", Types.BOOLEAN())
> > > > >.field("botcode", Types.STRING())
> > > > >.field("ts", Types.SQL_TIMESTAMP())
> > > > >.rowtime(new Rowtime()
> > > > >.timestampsFromField("recvdTime")
> > > > >.watermarksPeriodicBounded(1)
> > > > >)
> > > > >)
> > > > >.withFormat(new Json().deriveSchema())
> > > > >.inAppendMode()
> > > > >

[jira] [Created] (FLINK-11458) Ensure sink commit side-effects when cancelling with savepoint.

2019-01-29 Thread Kostas Kloudas (JIRA)
Kostas Kloudas created FLINK-11458:
--

 Summary: Ensure sink commit side-effects when cancelling with 
savepoint.
 Key: FLINK-11458
 URL: https://issues.apache.org/jira/browse/FLINK-11458
 Project: Flink
  Issue Type: New Feature
  Components: State Backends, Checkpointing
Affects Versions: 1.8.0
Reporter: Kostas Kloudas
Assignee: Kostas Kloudas
 Fix For: 1.8.0


TBD.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11457) PrometheusPushGatewayReporter either overwrites its own metrics or creates too many labels

2019-01-29 Thread Oscar Westra van Holthe - Kind (JIRA)
Oscar Westra van Holthe - Kind created FLINK-11457:
--

 Summary: PrometheusPushGatewayReporter either overwrites its own 
metrics or creates too many labels
 Key: FLINK-11457
 URL: https://issues.apache.org/jira/browse/FLINK-11457
 Project: Flink
  Issue Type: Bug
Reporter: Oscar Westra van Holthe - Kind


When using the PrometheusPushGatewayReporter, one has two options:
 * Use a fixed job name, which causes the jobmanager and taskmanager to 
overwrite each other's metrics (i.e. last write wins, and you lose a lot of 
metrics)
 * Use a random suffix for the job name, which creates a lot of labels that 
have to be cleaned up manually

The manual cleanup should not be necessary, but happens nonetheless when using 
a yarn cluster.

A fix could be to add a suffix to the job name, naming the nodes in a non-random 
manner like: {{myjob_jm0}}, {{my_job_tm1}}, {{my_job_tm2}}, 
{{my_job_tm3}}, {{my_job_tm4}}, ..., using a counter (not sure if such is 
available), or some other stable (!) suffix.

Related discussion: FLINK-9187

 

Any thoughts on a solution? I'm happy to implement it, but I'm not sure what the 
best solution would be.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11456) Improve window operator with sliding window assigners

2019-01-29 Thread Rong Rong (JIRA)
Rong Rong created FLINK-11456:
-

 Summary: Improve window operator with sliding window assigners
 Key: FLINK-11456
 URL: https://issues.apache.org/jira/browse/FLINK-11456
 Project: Flink
  Issue Type: Sub-task
  Components: DataStream API
Reporter: Rong Rong


With slicing and merging operators that expose the internals of window 
operators, the current sliding window can be improved by eliminating duplicate 
aggregations or duplicate element inserts into multiple panes (e.g. namespaces). 

The following sliding window operation
{code:java}
val resultStream: DataStream = inputStream
  .keyBy("key")
  .window(SlidingEventTimeWindows.of(Time.seconds(15L), Time.seconds(5L)))
  .sum("value")
{code}
can produce a job graph equivalent to
{code:java}
val resultStream: DataStream = inputStream
  .keyBy("key")
  .sliceWindow(Time.seconds(5L))
  .sum("value")
  .slideOver(Count.of(3))
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Improvement to Flink Window Operator with Slicing

2019-01-29 Thread Rong Rong
Hi all,

Thanks for the feedback and suggestions on the design doc. I have created
a parent JIRA [1] to track the related tasks and started the implementation
process.
Any further feedback or suggestions are highly appreciated.

Best,
Rong

[1] https://issues.apache.org/jira/browse/FLINK-11276

On Wed, Dec 5, 2018 at 6:17 PM Rong Rong  wrote:

> Hi all,
>
> Various discussions in the mailing list & JIRA tickets [2] have been brought
> up in the past regarding the windowing operation performance. As we
> experimented internally with some of our extreme use cases, we found out that
> using a slice-based implementation can optimize Flink's windowing mechanism
> and provide better performance in most cases.
>
> We've put together a preliminary enhancement and performance optimization
> plan [1] for the current windowing operation in Flink. This is largely
> inspired by stream slicing research shared in recent Flink Forward
> conference [3] by Philip and Jonas, and the discussion in the main JIRA
> ticket [2]. The initial design and POC implementations consider optimizing
> the performance for the category of overlapping windows as well as allowing
> chaining of cascade window operators.
>
> It will be great to hear the feedbacks and suggestions from the community.
> Please kindly share your comments and suggestions.
>
> Thanks,
> Rong
>
> [1]
> https://docs.google.com/document/d/1ziVsuW_HQnvJr_4a9yKwx_LEnhVkdlde2Z5l6sx5HlY/edit?usp=sharing
> [2] https://issues.apache.org/jira/browse/FLINK-7001
> [3]
> https://data-artisans.com/flink-forward-berlin/resources/efficient-window-aggregation-with-stream-slicing
>
>
>
>


[jira] [Created] (FLINK-11455) Support evictor operations on slicing and merging operators

2019-01-29 Thread Rong Rong (JIRA)
Rong Rong created FLINK-11455:
-

 Summary: Support evictor operations on slicing and merging 
operators
 Key: FLINK-11455
 URL: https://issues.apache.org/jira/browse/FLINK-11455
 Project: Flink
  Issue Type: Sub-task
Reporter: Rong Rong


The original POC implementation of SliceStream and MergeStream does not 
consider evicting window operations. This support can be further expanded in 
order to cover session windows with multiple timeout durations. See 
[https://docs.google.com/document/d/1ziVsuW_HQnvJr_4a9yKwx_LEnhVkdlde2Z5l6sx5HlY/edit#heading=h.ihxm3alf3tk0.]
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11454) Support MergedStream operation

2019-01-29 Thread Rong Rong (JIRA)
Rong Rong created FLINK-11454:
-

 Summary: Support MergedStream operation
 Key: FLINK-11454
 URL: https://issues.apache.org/jira/browse/FLINK-11454
 Project: Flink
  Issue Type: Sub-task
  Components: DataStream API
Reporter: Rong Rong


Following SlicedStream, the MergedStream operator merges results from the sliced 
stream and produces windowing results.
{code:java}
val slicedStream: SlicedStream = inputStream
  .keyBy("key")
  .sliceWindow(Time.seconds(5L))   // new “slice window” concept: to combine
                                   // tumble results based on discrete
                                   // non-overlapping windows.
  .aggregate(aggFunc)

val mergedStream1: MergedStream = slicedStream
  .slideOver(Time.seconds(10L))    // combine slice results with the same
                                   // windowing function, equivalent to
                                   // WindowOperator with an aggregate state
                                   // and derived aggregate function.

val mergedStream2: MergedStream = slicedStream
  .slideOver(Count.of(5))
  .apply(windowFunction)           // apply a different window function over
                                   // the sliced results.
{code}
MergedStreams are produced by the MergeOperator. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Towards a leaner flink-dist

2019-01-29 Thread Chesnay Schepler
It is not viable for us, as of right now, to release both a lean and fat 
version of flink-dist.
We don't have the required tooling to assemble a correct NOTICE file for 
that scenario.


Besides that, this would also go against recent efforts to reduce the 
total size of a Flink release,
as we'd be increasing the total size again by roughly 60% (and naturally 
also increase the compile time of releases), which I'd like to avoid.

I like Stephan's compromise of excluding reporters and file-systems; this 
removes more than 100 MB from the distribution yet still retains all the user-facing APIs.

Do note that Hadoop will already not be included in convenience binaries 
for 1.8. This was the motivation behind the new section on the download page.

On 25.01.2019 06:42, Jark Wu wrote:

+1 for the leaner distribution and improve the "Download" page.

On Fri, 25 Jan 2019 at 01:54, Bowen Li  wrote:


+1 for leaner distribution and a better 'download' webpage.

+1 for a full distribution if we can automate it besides supporting the
leaner one. If we support both, I'd imagine release managers should be able
to package two distributions with a single change of parameter instead of
manually packaging the full distribution. How to achieve that needs to be
evaluated and discussed; it could probably be something like 'mvn clean install
-Dfull/-Dlean', I'm not sure yet.


On Wed, Jan 23, 2019 at 10:11 AM Thomas Weise  wrote:


+1 for trimming the size by default and offering the fat distribution as
alternative download


On Wed, Jan 23, 2019 at 8:35 AM Till Rohrmann 
wrote:


Ufuk's proposal (having a lean default release and a user convenience
tarball) sounds good to me. That way advanced users won't be bothered by
an
unnecessarily large release and new users can benefit from having many
useful extensions bundled in one tarball.

Cheers,
Till

On Wed, Jan 23, 2019 at 3:42 PM Ufuk Celebi  wrote:


On Wed, Jan 23, 2019 at 11:01 AM Timo Walther 

wrote:

I think what is more important than a big dist bundle is a helpful
"Downloads" page where users can easily find available filesystems,
connectors, and metric reporters. Not everyone checks Maven central for
available JAR files. I just saw that we added an "Optional components"
section recently [1]; we just need to make it more prominent. This is
also done for the SQL connectors and formats [2].

+1 I fully agree with the importance of the Downloads page. We
definitely need to make any optional dependencies that users need to
download easy to find.





[jira] [Created] (FLINK-11453) Support SliceWindow with forwardable pane info

2019-01-29 Thread Rong Rong (JIRA)
Rong Rong created FLINK-11453:
-

 Summary: Support SliceWindow with forwardable pane info
 Key: FLINK-11453
 URL: https://issues.apache.org/jira/browse/FLINK-11453
 Project: Flink
  Issue Type: Sub-task
  Components: DataStream API
Reporter: Rong Rong


Support a slicing operation that produces slices:
{code:java}
val slicedStream: SlicedStream = inputStream
  .keyBy("key")
  .sliceWindow(Time.seconds(5L))   // new “slice window” concept: to combine
                                   // tumble results based on discrete
                                   // non-overlapping windows.
  .aggregate(aggFunc)
{code}
{{SlicedStream}} will produce results that expose the current {{WindowOperator}} 
internal state {{InternalAppendingState}}, which can 
later be applied with a {{WindowFunction}} separately in another operator.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Flink Table Case Statements appending spaces

2019-01-29 Thread Ramya Ramamurthy
Hi,

I have encountered a weird issue. When constructing a Table Query with CASE
Statements, like below:
.append("CASE ")
.append("WHEN aggrcat = '0' AND botcode='r4'  THEN 'monitoring' ")
.append("WHEN aggrcat = '1' AND botcode='r4' THEN 'aggregator' ")
.append("WHEN aggrcat = '2' AND botcode='r4' THEN 'social network' ")
 .append("WHEN botcode='r0' OR botcode='r8' THEN 'crawler' ")

Followed by converting to a stream:
DataStream ds = tableEnv.toAppendStream(table_name, Row.Class)
ds.print()

I can see that when I print this, each value is padded with spaces up to
the length of the longest string among the four categories.
E.g. 'crawler' is padded to the length of the longest string ['social network', maybe].

This is a problem when my data is sunk to ES. The fields are padded with
empty spaces, which affects the querying.

I know I can do a simple split to eliminate these spaces in my code, but I am
just curious to understand why this behavior occurs. It becomes cumbersome to
maintain code this way, and it hurts readability as well.

~Ramya.


Re: Side Outputs for late arriving records

2019-01-29 Thread Ramya Ramamurthy
Hi Fabian,

I do not have anything to do in "yourFunction".
The documentation says that .apply(windowFunction) is legacy. But I am at
a loss to understand which of reduce, aggregate, fold, or apply I must
use, as I hardly have any operations to perform other than returning the stream
with no late data back to me [so that I can construct a Flink table with
this data and do my processing there].

~Ramya.



On Tue, Jan 29, 2019 at 3:52 PM Fabian Hueske  wrote:

> Hi Ramya,
>
> This works by calling getSideOutput() on the main output of the window
> function.
> The main output is collected by applying a function on the window.
>
> DataStream input = ...
> OutputTag lateTag = ...
>
> DataStream mainResult = input
>   .keyBy(...)
>   .window(...)
>   .sideOutputLateData(lateTag)
>   .apply(yourFunction);
>
> DataStream lateRecords = mainResult.getSideOutput(lateTag);
>
> Best, Fabian
>
> Am Mo., 28. Jan. 2019 um 11:09 Uhr schrieb Ramya Ramamurthy <
> hair...@gmail.com>:
>
> > Hi,
> >
> > We were trying to collect the sideOutput.
> > But we failed to understand how to convert this windowed stream to a
> > datastream.
> >
> > final OutputTag Timestamp>>
> > lateOutputTag = new OutputTag > String, Timestamp>>("late-data"){};
> > withTime.keyBy(0, 2)
> > .window(TumblingEventTimeWindows.of(Time.minutes(5)))
> > .allowedLateness(Time.minutes(1))
> > .sideOutputLateData(lateOutputTag);
> >
> > I would now have a windowed stream with the late records tagged as
> > lateOutputTag. How do we convert the packets which are not late back to a
> > datastream? Do we need to use the .apply function to collect this data?
> > We are quite unsure of this. Appreciate your help.
> >
> > Best Regards,
> >
> >
> >
> > On Thu, Jan 24, 2019 at 11:03 PM Fabian Hueske 
> wrote:
> >
> > > Hi Ramya,
> > >
> > > This would be a great feature, but unfortunately it is not supported (yet)
> > > by Flink SQL.
> > > Currently, all late records are dropped.
> > >
> > > A workaround is to ingest the stream as a DataStream, have a custom
> > > operator that routes all late records to a side output, and registering
> > the
> > > DataStream without late records as a table on which the SQL query is
> > > evaluated.
> > > This requires quite a bit of boilerplate code but could be hidden in a
> > util
> > > class.
> > >
> > > Best, Fabian
> > >
> > > Am Do., 24. Jan. 2019 um 06:42 Uhr schrieb Ramya Ramamurthy <
> > > hair...@gmail.com>:
> > >
> > > > Hi,
> > > >
> > > > I have a query with regard to Late arriving records.
> > > > We are using Flink 1.7 with Kafka Consumers of version 2.11.0.11.
> > > > In my sink operators, which converts this table to a stream which is
> > > being
> > > > pushed to Elastic Search, I am able to see this metric "
> > > > *numLateRecordsDropped*".
> > > >
> > > > My Kafka consumers don't seem to have any lag and the events are
> > > > processed properly. Taking these events to a side output
> > > > doesn't seem to be possible with tables. Below is the snippet:
> > > >
> > > > tableEnv.connect(new Kafka()
> > > >   /* setting of all kafka properties */
> > > >.startFromLatest())
> > > >.withSchema(new Schema()
> > > >.field("sid", Types.STRING())
> > > >.field("_zpsbd6", Types.STRING())
> > > >.field("r1", Types.STRING())
> > > >.field("r2", Types.STRING())
> > > >.field("r5", Types.STRING())
> > > >.field("r10", Types.STRING())
> > > >.field("isBot", Types.BOOLEAN())
> > > >.field("botcode", Types.STRING())
> > > >.field("ts", Types.SQL_TIMESTAMP())
> > > >.rowtime(new Rowtime()
> > > >.timestampsFromField("recvdTime")
> > > >.watermarksPeriodicBounded(1)
> > > >)
> > > >)
> > > >.withFormat(new Json().deriveSchema())
> > > >.inAppendMode()
> > > >.registerTableSource("sourceTopic");
> > > >
> > > >String sql = "SELECT sid, _zpsbd6 as ip, COUNT(*) as
> > total_hits, "
> > > >+ "TUMBLE_START(ts, INTERVAL '5' MINUTE) as
> > tumbleStart, "
> > > >+ "TUMBLE_END(ts, INTERVAL '5' MINUTE) as tumbleEnd
> FROM
> > > > sourceTopic "
> > > >+ "WHERE r1='true' or r2='true' or r5='true' or
> > r10='true'
> > > > and isBot='true' "
> > > >+ "GROUP BY TUMBLE(ts, INTERVAL '5' MINUTE), sid,
> > > _zpsbd6";
> > > >
> > > > Table source = tableEnv.sqlQuery(sql) ---> This is where the metric
> is
> > > > showing the lateRecordsDropped, while executing the group by
> operation.
> > > >
> > > > Is there  a way to get the sideOutput of this to be able to debug
> > better
> > > ??
> > > >
> > > > Thanks,
> > > > ~Ramya.
> > > 
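
As a rough sketch of the workaround Fabian describes above (routing late records to a
side output with a custom operator and registering the remaining stream as the table the
SQL query runs on); the names below are illustrative, and registerDataStream may need an
explicit field/rowtime declaration for the TUMBLE window to work:

// Assumes 'withTime' already has event timestamps and watermarks assigned.
final OutputTag<Row> lateTag = new OutputTag<Row>("late-data") {};

SingleOutputStreamOperator<Row> cleanStream = withTime
    .process(new ProcessFunction<Row, Row>() {
        @Override
        public void processElement(Row value, Context ctx, Collector<Row> out) {
            if (ctx.timestamp() != null
                    && ctx.timestamp() < ctx.timerService().currentWatermark()) {
                ctx.output(lateTag, value);   // keep late records around for debugging
            } else {
                out.collect(value);
            }
        }
    });

DataStream<Row> lateRecords = cleanStream.getSideOutput(lateTag);

// Register only the non-late records; the SQL query from above would then
// select FROM "cleanSource" instead of the raw Kafka-backed table.
tableEnv.registerDataStream("cleanSource", cleanStream);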

Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-01-29 Thread Jark Wu
Hi,

Thank you all for welcoming this proposal.
I will draft a detailed proposal about how to contribute it step by step
in the next few days, and create an umbrella JIRA once the proposal is
accepted.

@Fabian @JIngcheng  Regarding the name, I'm fine with
https://flink.apache.org/zh/.

@Yun Tang Thanks for pointing Alluxio out as a reference. I agree that
reviewing Chinese documentation is not easy for the community.
But I think this wouldn't be a big problem as the community already has
many Chinese-speaking committers and more Chinese contributors are joining
in.

@Dian Fu  Adding a step to the review process is a good idea!
Maybe we can automatically add an item of "Chinese document sync" when the
document is modified with the help of the newly introduced review-bot.

Hi @Jörn Franke
> Keep also in mind the other direction eg a new/modified version of the
Chinese documentation needs to be reflected in the English one.
I think we should always modify English doc first, and then translate the
changes to Chinese document.

Hi @Chesnay
> Is the build-system for flink-china.org identical to flink-web?
The build-system of flink-china.org was written by Yadong (vthink...@gmail.com)
and is not using Jekyll. You can access the code of flink-china.org via:
https://github.com/flink-china/doc

Best,
Jark

On Tue, 29 Jan 2019 at 18:21, Chesnay Schepler  wrote:

> Is the build-system for flink-china.org identical to flink-web?
>
> On 28.01.2019 12:48, Jark Wu wrote:
> > Hi all,
> >
> > In the past year, the Chinese community has been working on building a Chinese
> > translated Flink website (http://flink.apache.org) and documents (
> > http://ci.apache.org/projects/flink/flink-docs-master/) in order to help
> > Chinese-speaking users. This is http://flink-china.org and it has received
> > a lot of praise since it went online.
> >
> > In order to follow the Apache Way and grow Apache Flink community, we
> want
> > to contribute it to Apache Flink. It contains two parts to contribute:
> > (1) the Chinese translated version of the Flink website
> > (2) the Chinese translated version of the Flink documentation.
> >
> > But there are some questions are up to discuss:
> >
> > ## The Address of the translated version
> >
> > I think we can add a Chinese channel on the official flink website, such as
> > "https://flink.apache.org/cn/", which is similar to
> > "http://kylin.apache.org/cn/". And use
> > "https://ci.apache.org/projects/flink/flink-docs-zh-master/" to put the
> > Chinese translated docs.
> >
> > ## Add a link to the translated version
> >
> > It would be great if we can add links to each other in both Chinese
> version
> > and English version. For example, we can add a link to the translated
> > website on the sidebar of the Flink website. We can also add a dropdown
> > button for the Chinese document version under the "Pick Docs Version" in
> > Flink document.
> >
> > ## How to contribute the translation in a long term
> >
> > This is a more important problem, because translation is a huge and
> > long-term effort. We need a healthy mechanism to ensure the sustainability of
> > contributions and the quality of translations.
> >
> > I would suggest putting the Chinese version of the documentation in the flink repo (such as
> > a "doc-zh" folder) and updating it with master. Once we modify the English
> > doc, we have to update the Chinese doc together, or create a JIRA (containing
> > the git commit id referring to the English modification) to do that. This will
> > increase some workload when we update the doc. But this will keep the
> > Chinese doc up to date. We can attract more Chinese contributors to help
> > build the doc. And the modification is small enough and easy to review.
> >
> > Maybe there is a better solution and we can also learn how the other
> > projects do it.
> >
> > Any feedbacks are welcome!
> >
> > Best,
> > Jark Wu
> >
>
>


[jira] [Created] (FLINK-11452) Make the table planner pluggable

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11452:


 Summary: Make the table planner pluggable
 Key: FLINK-11452
 URL: https://issues.apache.org/jira/browse/FLINK-11452
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Timo Walther
Assignee: Timo Walther


A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions].

The previous tasks should have split the API from the Planner so we should be 
able to make it pluggable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11451) Move *QueryConfig and TableDescriptor to flink-table-api-java

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11451:


 Summary: Move *QueryConfig and TableDescriptor to 
flink-table-api-java
 Key: FLINK-11451
 URL: https://issues.apache.org/jira/browse/FLINK-11451
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther


A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions].

Move QueryConfig, BatchQueryConfig, StreamQueryConfig, and TableDescriptor to 
flink-table-api-java.

Unblocks TableEnvironment interface task.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Start new Review Process

2019-01-29 Thread Chesnay Schepler
I assume he means that bots cannot do it, as they'd require committer 
permissions. Same with assigning reviewers and such.


On 29.01.2019 13:09, Aljoscha Krettek wrote:

What do you mean by “merging cannot happen through the GitHub user interface”? 
You can in fact merge PRs by clicking on the merge button, or “rebase and 
merge”.

Aljoscha


On 29. Jan 2019, at 11:58, Robert Metzger  wrote:

@Fabian: Thank you for your suggestions. Multiple approvals in one comment
are already possible.
I will see if I can easily add multiple approvals in one line as well.
I will also address 2) and 3).

Regarding usage of the bot: Anyone can use it! It is up to the committer
who's merging in the end whether they are happy with the approver.
One of the feature ideas I have is to indicate whether somebody is PMC or
committer.

I'm against enforcing the order of approvals for now. I fear that this will
make the tool too restrictive. I like Ufuk's idea of putting a note into
the tracking comment for now.
Once it's active and we are using it day to day, we'll probably learn what
features we need the most.


@Ufuk: I think we should put it into an Apache repo at some point. But I'm
not sure if it's worth going through the effort of setting up a new repo
now, given that the bot is not even active yet, and we are not sure if it's
going to be useful.
Once it is active for a month or two, I will move it.

Regarding the bots in general: I don't see a problem with having multiple
bots in place, as long as they get along with each other well.
We should try not to reinvent the wheel, if there's already a good bot
implementation, I don't see a reason to not use it.
The problem in our case is that we have limited access to our GitHub page,
and merging can not happen through the GitHub user interface -- so I guess
many "off the shelf" bots will not work in our environment.
I'm thinking already about approaches how to automatically merge pull
requests with the bot. But this will be a separate mailing list thread :)

Thanks for the feedback!




On Mon, Jan 28, 2019 at 5:20 PM Ufuk Celebi  wrote:


Thanks for the clarification. I agree that it only makes sense to
check the points in order. +1 to add this if we can think of a nice
way to do it. I'm not sure how we would enforce the order with the bot
since there is only indirect feedback to a bot command. The only thing
I can think of at the moment is to add a note to a check in case
earlier steps were not executed. Just ignoring a bot command because
other commands have not been executed before feels not helpful to me
(since we can't prevent reviewers to jump to later steps if they wish
to do so).

I'd rather add a bold note to the bot template that makes clear that
all points should be followed in order to avoid potentially redundant
work.

– Ufuk

On Mon, Jan 28, 2019 at 5:01 PM Fabian Hueske  wrote:

The points in the review template are in the order in which they should

be

checked, i.e., first checking the description, then consensus and finally
checking the code.
Currently, it is possible to tick off the code box before checking the
description.
One motivation for the process was to do the steps in exactly the

proposed

order, for example to avoid detailed code reviews before there was
consensus whether the contribution was welcome or not.

Am Mo., 28. Jan. 2019 um 16:54 Uhr schrieb Ufuk Celebi :


I played around with the bot and it works pretty well. :-) @Robert:
Are there any plans to contribute the code for the bot to Apache
(potentially in another repository)?

I like Fabian's suggestions. Regarding the questions:
1) I would make that dependent on whether you expected the review
guideline to only apply to committers or also regular contributors.
Since the bot is not merging PRs, I don't see a down side in having it
open for all contributors (except potential noise).
2) What do you mean with "order of approvals"?

Since there is another lively discussion going on about a bot for
stale PRs, I'm wondering what the future plans for @flinkbot are. Do
we want to have multiple bots for the project? Or do you plan to
support staleness checks in the future? What about merging of PRs?

– Ufuk

On Mon, Jan 28, 2019 at 4:23 PM Fabian Hueske 

wrote:

Hi Robert,

Thanks for working on the bot!
I have a few suggestions / questions:

Suggestions:
1) It would be great to approve multiple boxes in one comment.

Either as

@flinkbot approve contribution consensus

or by

@flinkbot approve contribution
@flinkbot approve consensus

2) Extend the last line of the description by adding something like

"See

Pull Request Review Guide for how to use the Flink bot."?
3) Rephrase the first line to include "[description]" instead of
"[contribution]", as it is about approving the description

Questions:
1) Can anybody use the bot or just committers?
2) Does it make sense to enforce the order in which approvals are

given?

Best,
Fabian

Am Mi., 23. Jan. 2019 um 13:51 Uhr schrieb Robert Metzger <

Re: [DISCUSS] Start new Review Process

2019-01-29 Thread Aljoscha Krettek
What do you mean by “merging cannot happen through the GitHub user interface”? 
You can in fact merge PRs by clicking on the merge button, or “rebase and 
merge”.

Aljoscha

> On 29. Jan 2019, at 11:58, Robert Metzger  wrote:
> 
> @Fabian: Thank you for your suggestions. Multiple approvals in one comment
> are already possible.
> I will see if I can easily add multiple approvals in one line as well.
> I will also address 2) and 3).
> 
> Regarding usage of the bot: Anyone can use it! It is up to the committer
> who's merging in the end whether they are happy with the approver.
> One of the feature ideas I have is to indicate whether somebody is PMC or
> committer.
> 
> I'm against enforcing the order of approvals for now. I fear that this will
> make the tool too restrictive. I like Ufuk's idea of putting a note into
> the tracking comment for now.
> Once it's active and we are using it day to day, we'll probably learn what
> features we need the most.
> 
> 
> @Ufuk: I think we should put it into an Apache repo at some point. But I'm
> not sure if it's worth going through the effort of setting up a new repo
> now, given that the bot is not even active yet, and we are not sure if it's
> going to be useful.
> Once it is active for a month or two, I will move it.
> 
> Regarding the bots in general: I don't see a problem with having multiple
> bots in place, as long as they get along with each other well.
> We should try not to reinvent the wheel, if there's already a good bot
> implementation, I don't see a reason to not use it.
> The problem in our case is that we have limited access to our GitHub page,
> and merging can not happen through the GitHub user interface -- so I guess
> many "off the shelf" bots will not work in our environment.
> I'm thinking already about approaches how to automatically merge pull
> requests with the bot. But this will be a separate mailing list thread :)
> 
> Thanks for the feedback!
> 
> 
> 
> 
> On Mon, Jan 28, 2019 at 5:20 PM Ufuk Celebi  wrote:
> 
>> Thanks for the clarification. I agree that it only makes sense to
>> check the points in order. +1 to add this if we can think of a nice
>> way to do it. I'm not sure how we would enforce the order with the bot
>> since there is only indirect feedback to a bot command. The only thing
>> I can think of at the moment is to add a note to a check in case
>> earlier steps were not executed. Just ignoring a bot command because
>> other commands have not been executed before feels not helpful to me
>> (since we can't prevent reviewers to jump to later steps if they wish
>> to do so).
>> 
>> I'd rather add a bold note to the bot template that makes clear that
>> all points should be followed in order to avoid potentially redundant
>> work.
>> 
>> – Ufuk
>> 
>> On Mon, Jan 28, 2019 at 5:01 PM Fabian Hueske  wrote:
>>> 
>>> The points in the review template are in the order in which they should
>> be
>>> checked, i.e., first checking the description, then consensus and finally
>>> checking the code.
>>> Currently, it is possible to tick off the code box before checking the
>>> description.
>>> One motivation for the process was to do the steps in exactly the
>> proposed
>>> order, for example to avoid detailed code reviews before there was
>>> consensus whether the contribution was welcome or not.
>>> 
>>> Am Mo., 28. Jan. 2019 um 16:54 Uhr schrieb Ufuk Celebi :
>>> 
 I played around with the bot and it works pretty well. :-) @Robert:
 Are there any plans to contribute the code for the bot to Apache
 (potentially in another repository)?
 
 I like Fabian's suggestions. Regarding the questions:
 1) I would make that dependent on whether you expected the review
 guideline to only apply to committers or also regular contributors.
 Since the bot is not merging PRs, I don't see a down side in having it
 open for all contributors (except potential noise).
 2) What do you mean with "order of approvals"?
 
 Since there is another lively discussion going on about a bot for
 stale PRs, I'm wondering what the future plans for @flinkbot are. Do
 we want to have multiple bots for the project? Or do you plan to
 support staleness checks in the future? What about merging of PRs?
 
 – Ufuk
 
 On Mon, Jan 28, 2019 at 4:23 PM Fabian Hueske 
>> wrote:
> 
> Hi Robert,
> 
> Thanks for working on the bot!
> I have a few suggestions / questions:
> 
> Suggestions:
> 1) It would be great to approve multiple boxes in one comment.
>> Either as
>> @flinkbot approve contribution consensus
> or by
>> @flinkbot approve contribution
>> @flinkbot approve consensus
> 
> 2) Extend the last line of the description by adding something like
>> "See
> Pull Request Review Guide for how to use the Flink bot."?
> 3) Rephrase the first line to include "[description]" instead of
> "[contribution]", as it is about approving the description

[jira] [Created] (FLINK-11450) Port and move TableSource and TableSink to flink-table-common

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11450:


 Summary: Port and move TableSource and TableSink to 
flink-table-common
 Key: FLINK-11450
 URL: https://issues.apache.org/jira/browse/FLINK-11450
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther


A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions].

This step only unblocks the TableEnvironment interfaces task. 
Stream/BatchTableSource/Sink remain in flink-table-api-java-bridge for now until 
they have been reworked.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Start new Review Process

2019-01-29 Thread Robert Metzger
@Fabian: Thank you for your suggestions. Multiple approvals in one comment
are already possible.
I will see if I can easily add multiple approvals in one line as well.
I will also address 2) and 3).

Regarding usage of the bot: Anyone can use it! It is up to the committer
who's merging in the end whether they are happy with the approver.
One of the feature ideas I have is to indicate whether somebody is PMC or
committer.

I'm against enforcing the order of approvals for now. I fear that this will
make the tool too restrictive. I like Ufuk's idea of putting a note into
the tracking comment for now.
Once it's active and we are using it day to day, we'll probably learn what
features we need the most.


@Ufuk: I think we should put it into an Apache repo at some point. But I'm
not sure if it's worth going through the effort of setting up a new repo
now, given that the bot is not even active yet, and we are not sure if it's
going to be useful.
Once it is active for a month or two, I will move it.

Regarding the bots in general: I don't see a problem with having multiple
bots in place, as long as they get along with each other well.
We should try not to reinvent the wheel, if there's already a good bot
implementation, I don't see a reason to not use it.
The problem in our case is that we have limited access to our GitHub page,
and merging can not happen through the GitHub user interface -- so I guess
many "off the shelf" bots will not work in our environment.
I'm thinking already about approaches how to automatically merge pull
requests with the bot. But this will be a separate mailing list thread :)

Thanks for the feedback!




On Mon, Jan 28, 2019 at 5:20 PM Ufuk Celebi  wrote:

> Thanks for the clarification. I agree that it only makes sense to
> check the points in order. +1 to add this if we can think of a nice
> way to do it. I'm not sure how we would enforce the order with the bot
> since there is only indirect feedback to a bot command. The only thing
> I can think of at the moment is to add a note to a check in case
> earlier steps were not executed. Just ignoring a bot command because
> other commands have not been executed before feels not helpful to me
> (since we can't prevent reviewers to jump to later steps if they wish
> to do so).
>
> I'd rather add a bold note to the bot template that makes clear that
> all points should be followed in order to avoid potentially redundant
> work.
>
> – Ufuk
>
> On Mon, Jan 28, 2019 at 5:01 PM Fabian Hueske  wrote:
> >
> > The points in the review template are in the order in which they should
> be
> > checked, i.e., first checking the description, then consensus and finally
> > checking the code.
> > Currently, it is possible to tick off the code box before checking the
> > description.
> > One motivation for the process was to do the steps in exactly the
> proposed
> > order, for example to avoid detailed code reviews before there was
> > consensus whether the contribution was welcome or not.
> >
> > Am Mo., 28. Jan. 2019 um 16:54 Uhr schrieb Ufuk Celebi :
> >
> > > I played around with the bot and it works pretty well. :-) @Robert:
> > > Are there any plans to contribute the code for the bot to Apache
> > > (potentially in another repository)?
> > >
> > > I like Fabian's suggestions. Regarding the questions:
> > > 1) I would make that dependent on whether you expected the review
> > > guideline to only apply to committers or also regular contributors.
> > > Since the bot is not merging PRs, I don't see a down side in having it
> > > open for all contributors (except potential noise).
> > > 2) What do you mean with "order of approvals"?
> > >
> > > Since there is another lively discussion going on about a bot for
> > > stale PRs, I'm wondering what the future plans for @flinkbot are. Do
> > > we want to have multiple bots for the project? Or do you plan to
> > > support staleness checks in the future? What about merging of PRs?
> > >
> > > – Ufuk
> > >
> > > On Mon, Jan 28, 2019 at 4:23 PM Fabian Hueske 
> wrote:
> > > >
> > > > Hi Robert,
> > > >
> > > > Thanks for working on the bot!
> > > > I have a few suggestions / questions:
> > > >
> > > > Suggestions:
> > > > 1) It would be great to approve multiple boxes in one comment.
> Either as
> > > > > @flinkbot approve contribution consensus
> > > > or by
> > > > > @flinkbot approve contribution
> > > > > @flinkbot approve consensus
> > > >
> > > > 2) Extend the last line of the description by adding something like
> "See
> > > > Pull Request Review Guide for how to use the Flink bot."?
> > > > 3) Rephrase the first line to include "[description]" instead of
> > > > "[contribution]", as it is about approving the description
> > > >
> > > > Questions:
> > > > 1) Can anybody use the bot or just committers?
> > > > 2) Does it make sense to enforce the order in which approvals are
> given?
> > > >
> > > > Best,
> > > > Fabian
> > > >
> > > > On Wed, 23 Jan 2019 at 13:51, Robert Metzger <
> > > > 

[jira] [Created] (FLINK-11442) Upgrade OSS SDK Version

2019-01-29 Thread wujinhu (JIRA)
wujinhu created FLINK-11442:
---

 Summary: Upgrade OSS SDK Version
 Key: FLINK-11442
 URL: https://issues.apache.org/jira/browse/FLINK-11442
 Project: Flink
  Issue Type: Improvement
  Components: filesystem-connector
Affects Versions: 1.8.0
Reporter: wujinhu
Assignee: wujinhu


Upgrade the OSS SDK version to exclude the org.json dependency.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11449) Uncouple the Expression class from RexNodes

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11449:


 Summary: Uncouple the Expression class from RexNodes
 Key: FLINK-11449
 URL: https://issues.apache.org/jira/browse/FLINK-11449
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther


A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions].

Calcite will not be part of any API module anymore. Therefore, RexNode 
translation must happen in a different layer. This issue will require a new 
design document.
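
As a purely illustrative sketch (the interface and method names below are
assumptions, not the actual design, which still needs the mentioned design
document), the API module could expose an expression tree that does not
reference Calcite, while a planner translates it into RexNodes in its own
layer:

{code:java}
import java.util.List;

// Lives in the API module; note: no dependency on Calcite.
public interface Expression {

    // sub-expressions of this expression node
    List<Expression> getChildren();

    // double-dispatch hook that lets a planner traverse and translate the tree
    <R> R accept(ExpressionVisitor<R> visitor);
}

// Implemented in a planner module, e.g. to translate Expressions into Calcite
// RexNodes, without the API module ever referencing RexNode.
interface ExpressionVisitor<R> {
    R visit(Expression expression);
}
{code}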



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11448) Clean-up and prepare Table API to be uncoupled from table core

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11448:


 Summary: Clean-up and prepare Table API to be uncoupled from table 
core
 Key: FLINK-11448
 URL: https://issues.apache.org/jira/browse/FLINK-11448
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Timo Walther


A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions].

This step aims to provide a clean API that is preferably implemented in Java 
and uncoupled from the table core. Ideally, the API consists of well-documented 
interfaces. A planner can provide an implementation for those.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11447) Deprecate "new Table(TableEnvironment, String)"

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11447:


 Summary: Deprecate "new Table(TableEnvironment, String)"
 Key: FLINK-11447
 URL: https://issues.apache.org/jira/browse/FLINK-11447
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther


A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions].

Once {{Table}} is an interface, we can easily replace the underlying 
implementation at any time. The constructor call prevents us from converting it 
into an interface.
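
For illustration only, the user-facing change could look roughly like the
following sketch; the constructor arguments follow the issue title, but the
replacement call ({{scan()}}) is an assumption for the example, not a decided
design:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class TableConstructionExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        // "myTable" is assumed to have been registered in the catalog before.

        // Deprecated: constructing a Table directly couples user code to the
        // concrete Table class and blocks turning Table into an interface.
        Table deprecatedStyle = new Table(tableEnv, "myTable");

        // Possible replacement: obtain the Table through the TableEnvironment,
        // so the returned object can later be an interface implemented by a planner.
        Table preferredStyle = tableEnv.scan("myTable");
    }
}
{code}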



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11446) FlinkKafkaProducer011ITCase.testRecoverCommittedTransaction failed on Travis

2019-01-29 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-11446:
-

 Summary: 
FlinkKafkaProducer011ITCase.testRecoverCommittedTransaction failed on Travis
 Key: FLINK-11446
 URL: https://issues.apache.org/jira/browse/FLINK-11446
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.8.0
Reporter: Till Rohrmann


The {{FlinkKafkaProducer011ITCase.testRecoverCommittedTransaction}} failed on 
Travis, producing no output for 10 minutes: 
https://api.travis-ci.org/v3/job/485771998/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Side Outputs for late arriving records

2019-01-29 Thread Fabian Hueske
Hi Ramya,

This works by calling getSideOutput() on the main output of the window
function.
The main output is collected by applying a function on the window.

DataStream input = ...
OutputTag lateTag = ...

DataStream mainResult = input
  .keyBy(...)
  .window(...)
  .sideOutputLateData(lateTag)
  .apply(yourFunction);

DataStream lateRecords = mainResult.getSideOutput(lateTag);
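
If it helps, here is a slightly more complete sketch; the types, field names,
and the window function body are only assumptions for illustration and need to
be adapted to your job:

// assuming the usual flink-streaming-java imports, e.g. OutputTag, Collector,
// TumblingEventTimeWindows, Time, TimeWindow, WindowFunction, Tuple2, and
// assuming 'input' is a DataStream<Tuple2<String, Long>> with event-time
// timestamps and watermarks already assigned
OutputTag<Tuple2<String, Long>> lateTag =
    new OutputTag<Tuple2<String, Long>>("late-data") {};

SingleOutputStreamOperator<Tuple2<String, Long>> mainResult = input
    .keyBy(value -> value.f0)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .allowedLateness(Time.minutes(1))
    .sideOutputLateData(lateTag)
    .apply(new WindowFunction<Tuple2<String, Long>, Tuple2<String, Long>, String, TimeWindow>() {
        @Override
        public void apply(
                String key,
                TimeWindow window,
                Iterable<Tuple2<String, Long>> records,
                Collector<Tuple2<String, Long>> out) {
            // forward the on-time records; this is where your aggregation would go
            for (Tuple2<String, Long> record : records) {
                out.collect(record);
            }
        }
    });

// the late records are only available on the side output of the window operator
DataStream<Tuple2<String, Long>> lateRecords = mainResult.getSideOutput(lateTag);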

Best, Fabian

On Mon, 28 Jan 2019 at 11:09, Ramya Ramamurthy <
hair...@gmail.com>:

> Hi,
>
> We were trying to collect the sideOutput.
> But we failed to understand how to convert this windowed stream into a
> DataStream.
>
> final OutputTag>
> lateOutputTag = new OutputTag String, Timestamp>>("late-data"){};
> withTime.keyBy(0, 2)
> .window(TumblingEventTimeWindows.of(Time.minutes(5)))
> .allowedLateness(Time.minutes(1))
> .sideOutputLateData(lateOutputTag);
>
> I would now have a windowed stream with records coming in late, tagged as
> lateOutputTag. How do we convert the packets which are not late back into a
> DataStream? Do we need to use the .apply function to collect this data? We
> are quite unsure of this. We appreciate your help.
>
> Best Regards,
>
>
>
> On Thu, Jan 24, 2019 at 11:03 PM Fabian Hueske  wrote:
>
> > Hi Ramya,
> >
> > This would be a great feature, but unfortunately it is not supported (yet)
> > by Flink SQL.
> > Currently, all late records are dropped.
> >
> > A workaround is to ingest the stream as a DataStream, have a custom
> > operator that routes all late records to a side output, and register the
> > DataStream without late records as a table on which the SQL query is
> > evaluated.
> > This requires quite a bit of boilerplate code but could be hidden in a
> > util class.
> >
> > Best, Fabian
> >
> > On Thu, 24 Jan 2019 at 06:42, Ramya Ramamurthy <
> > hair...@gmail.com>:
> >
> > > Hi,
> > >
> > > I have a query with regard to Late arriving records.
> > > We are using Flink 1.7 with Kafka Consumers of version 2.11.0.11.
> > > In my sink operators, which converts this table to a stream which is
> > being
> > > pushed to Elastic Search, I am able to see this metric "
> > > *numLateRecordsDropped*".
> > >
> > > My Kafka consumers doesn't seem to have any lag and the events are
> > > processed properly. To be able to take these events to a side outputs
> > > doesn't seem to be possible with tables. Below is the snippet:
> > >
> > > tableEnv.connect(new Kafka()
> > >   /* setting of all kafka properties */
> > >.startFromLatest())
> > >.withSchema(new Schema()
> > >.field("sid", Types.STRING())
> > >.field("_zpsbd6", Types.STRING())
> > >.field("r1", Types.STRING())
> > >.field("r2", Types.STRING())
> > >.field("r5", Types.STRING())
> > >.field("r10", Types.STRING())
> > >.field("isBot", Types.BOOLEAN())
> > >.field("botcode", Types.STRING())
> > >.field("ts", Types.SQL_TIMESTAMP())
> > >.rowtime(new Rowtime()
> > >.timestampsFromField("recvdTime")
> > >.watermarksPeriodicBounded(1)
> > >)
> > >)
> > >.withFormat(new Json().deriveSchema())
> > >.inAppendMode()
> > >.registerTableSource("sourceTopic");
> > >
> > >String sql = "SELECT sid, _zpsbd6 as ip, COUNT(*) as
> total_hits, "
> > >+ "TUMBLE_START(ts, INTERVAL '5' MINUTE) as
> tumbleStart, "
> > >+ "TUMBLE_END(ts, INTERVAL '5' MINUTE) as tumbleEnd FROM
> > > sourceTopic "
> > >+ "WHERE r1='true' or r2='true' or r5='true' or
> r10='true'
> > > and isBot='true' "
> > >+ "GROUP BY TUMBLE(ts, INTERVAL '5' MINUTE), sid,
> > _zpsbd6";
> > >
> > > Table source = tableEnv.sqlQuery(sql) ---> This is where the metric is
> > > showing the lateRecordsDropped, while executing the group by operation.
> > >
> > > Is there  a way to get the sideOutput of this to be able to debug
> better
> > ??
> > >
> > > Thanks,
> > > ~Ramya.
> > >
> >
>


[jira] [Created] (FLINK-11445) Deprecate static methods in TableEnvironments

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11445:


 Summary: Deprecate static methods in TableEnvironments
 Key: FLINK-11445
 URL: https://issues.apache.org/jira/browse/FLINK-11445
 Project: Flink
  Issue Type: New Feature
  Components: Table API & SQL
Reporter: Timo Walther


A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions]
 (Step ).

Point users to the {{Batch/StreamTableEnvironment.create()}} approach. The 
{{create()}} method does not necessarily need to perform planner discovery yet. 
We can hard-code the target table environment for now.

{{TableEnvironment.create()}} is not supported yet.
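
For illustration, the intended change in user code could look roughly like the
sketch below; treat the shown {{create()}} signature as an assumption about the
final API rather than a decided design:

{code:java}
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class TableEnvCreationExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Deprecated: static factory method on TableEnvironment
        StreamTableEnvironment oldStyle = TableEnvironment.getTableEnvironment(env);

        // Proposed direction: create() on the respective table environment
        StreamTableEnvironment newStyle = StreamTableEnvironment.create(env);

        // A unified TableEnvironment.create() is not supported yet (see above).
    }
}
{code}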



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-01-29 Thread Chesnay Schepler

Is the build-system for flink-china.org identical to flink-web?

On 28.01.2019 12:48, Jark Wu wrote:

Hi all,

In the past year, the Chinese community has been working on building a
Chinese-translated Flink website (http://flink.apache.org) and documents (
http://ci.apache.org/projects/flink/flink-docs-master/) in order to help
Chinese-speaking users. This is http://flink-china.org and it has received
a lot of praise since going online.

In order to follow the Apache Way and grow the Apache Flink community, we want
to contribute it to Apache Flink. The contribution contains two parts:
(1) the Chinese-translated version of the Flink website
(2) the Chinese-translated version of the Flink documentation.

But there are some questions up for discussion:

## The Address of the translated version

I think we can add a Chinese channel on the official Flink website, such as
"https://flink.apache.org/cn/", which is similar to
"http://kylin.apache.org/cn/". And use
"https://ci.apache.org/projects/flink/flink-docs-zh-master/" to host the
Chinese-translated docs.

## Add a link to the translated version

It would be great if we could add links to each other in both the Chinese
and the English version. For example, we can add a link to the translated
website on the sidebar of the Flink website. We can also add a dropdown
button for the Chinese document version under the "Pick Docs Version" in
the Flink documentation.

## How to contribute the translation in a long term

This is a more important problem, because translation is a huge, long-term
effort. We need a healthy mechanism to ensure the sustainability of
contributions and the quality of translations.

I would suggest putting the Chinese version of the documents in the flink
repo (such as a "doc-zh" folder) and updating them along with master. Once we
modify the English doc, we have to update the Chinese doc together, or create
a JIRA (containing the git commit id referring to the English modification)
to do that. This will increase the workload a bit when we update the doc, but
it will keep the Chinese doc up to date. We can attract more Chinese
contributors to help build the doc, and the modifications are small enough to
be easy to review.

Maybe there is a better solution, and we can also learn from how other
projects do it.

Any feedbacks are welcome!

Best,
Jark Wu





[jira] [Created] (FLINK-11444) Deprecate methods for uncoupling table API and table core

2019-01-29 Thread Timo Walther (JIRA)
Timo Walther created FLINK-11444:


 Summary: Deprecate methods for uncoupling table API and table core 
 Key: FLINK-11444
 URL: https://issues.apache.org/jira/browse/FLINK-11444
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Timo Walther


Some API changes are required in order to uncouple the Table API from the table 
core.

A more detailed description can be found in 
[FLIP-32|https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions].

We will first deprecate the affected methods to give people time to update the 
calls in the next Flink release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


numLateRecordsDropped

2019-01-29 Thread Ramya Ramamurthy
Hi,

I am reading from Kafka and pushing the data to ES. We have basically two
operators: one consumes from Kafka, and the other performs the operation on
the Flink table and pushes the result to ES.

[image: image.png]

I am able to see the Flink metric numLateRecordsDropped on my second
operator, which is losing some records due to late arrival. Is my
understanding right that these are the messages which arrived later than my
tumbling window allows [my window is 5 minutes with an allowed lateness of
1 minute]?

What are the possible causes for these? I am not seeing any lag in my Kafka
consumer, and the ES sink looks to work fine.
Of course, there is a possibility of the producer producing the data late.
But apart from that, is there anything else that I could have missed?
Any thoughts would be appreciated!

Regards.


Re: [DISCUSS] FLIP-32: Restructure flink-table for future contributions

2019-01-29 Thread Timo Walther

Hi everyone,

thanks for the positive feedback we received so far. I converted the 
design document into an actual FLIP in the wiki. You can find it here:


https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions

I'm in the process of creating the first JIRA issues to kick off the 
preparation efforts. Most of the issues will be first or second level 
children of FLINK-11439:


https://issues.apache.org/jira/browse/FLINK-11439

Feel free to further join the discussions and provide feedback. I'm sure 
there will be a lot of stuff to discuss :-)


Thanks,
Timo


On 25.01.19 at 16:43, fudian.fd wrote:

Hi Timo,

Thanks a lot for bringing up the discussion and the detailed implementation 
plan. The implementation plan makes much sense to me. +1 to the FLIP.

Regards,
Dian



On 25 Jan 2019, at 23:00, Jark Wu  wrote:

Hi Timo,

Thanks for the detailed design. +1 for the FLIP.

It's very nice to have Blink planner and runtime as a plugin in the early
stages. This will keep flink-table stable as much as possible.

Best,
Jark


On Fri, 25 Jan 2019 at 22:26, Hequn Cheng  wrote:


Hi Timo,

+1 for the FLIP!

Great work and thanks a lot for the detailed document! The task dependency
graph is very helpful for teasing out relationships between tasks.
Looking forward to the JIRAs and hoping to contribute to it!

Best,
Hequn


On Fri, Jan 25, 2019 at 7:54 PM Piotr Nowojski 
wrote:


Hi,

+1 from my side for this plan. The proposed decoupling of the TypeSystem and
most interface (TableSource/Sinks, UDFs, Catalogs, …) discussions from the
merging of the MVP Blink runtime gives us the best shot of handling the Blink
merge as fluently and as painlessly as possible. I’m also looking forward to
the follow-up discussions that will need to take place in order to achieve the
final goal :)

Piotrek


On 25 Jan 2019, at 07:53, jincheng sun 

wrote:

Hi Timo,

Thanks a lot for bring up the FLIP-32 discussion and the very detailed
implementation plan document !

Restructure `flink-table` is an important part of flink merge blink,
looking forward to the JIRAs which will be opened !

Cheers,
Jincheng


On Thu, 24 Jan 2019 at 21:06, Timo Walther  wrote:


Hi everyone,

as Stephan already announced on the mailing list [1], the Flink
community will receive a big code contribution from Alibaba. The
flink-table module is one of the biggest parts that will receive many
new features and major architectural improvements. Instead of waiting
until the next major version of Flink or introducing big API-breaking
changes, we would like to gradually build up the Blink-based planner and
runtime while keeping the Table & SQL API mostly stable. Users will be
able to play around with the current merge status of the new planner or
fall back to the old planner until the new one is stable.

We have prepared a design document that discusses a restructuring of the
flink-table module and suggests a rough implementation plan:




https://docs.google.com/document/d/1Tfl2dBqBV3qSBy7oV3qLYvRRDbUOasvA1lhvYWWljQw/edit?usp=sharing

I will briefly summarize the steps we would like to do:

- Split the flink-table module similar to the proposal of FLIP-28 [3]
which is outdated. This is a preparation to separate API from core
(targeted for Flink 1.8).
- Perform minor API changes to separate API from actual implementation
(targeted for Flink 1.8).
- Merge an MVP Blink SQL planner given that necessary Flink core/runtime
changes have been completed.
The merging will happen in stages (e.g. basic planner framework, then
operator by operator). The exact merging plan still needs to be
determined.

- Rework the type system in order to unblock work on unified table
environments, UDFs, sources/sinks, and catalog.
- Enable full end-to-end batch and stream execution features.

Our mid-term goal:

Run full TPC-DS on a unified batch/streaming runtime. Initially, we will
only support ingesting data coming from the DataStream API. Once we have
reworked the source/sink interfaces, we will target full end-to-end
TPC-DS query execution with table connectors.

A rough task dependency graph is illustrated in the design document. A
more detailed task dependency structure will be added to JIRA after we
have agreed on this FLIP.

Looking forward to any feedback.

Thanks,
Timo

[1]



https://lists.apache.org/thread.html/2f7330e85d702a53b4a2b361149930b50f2e89d8e8a572f8ee2a0e6d@%3Cdev.flink.apache.org%3E

[2]



https://lists.apache.org/thread.html/6066abd0f09fc1c41190afad67770ede8efd0bebc36f00938eecc118@%3Cdev.flink.apache.org%3E

[3]



https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free








Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-29 Thread Jark Wu
Cheers!

Subscribed. Looking forward to the first Chinese question ;)

On Tue, 29 Jan 2019 at 17:16, Robert Metzger  wrote:

> Success!
> The mailing list has been created.
>
> Send an email to "user-zh-subscr...@flink.apache.org" to subscribe!
> I've also updated the website with the list:
> https://flink.apache.org/community.html
>
> I will now also tweet about it, even though I believe it'll be more
> important to advertise the list on Chinese social media platforms.
>
>
> On Tue, Jan 29, 2019 at 1:52 AM ZILI CHEN  wrote:
>
> > +1,sounds good
> >
> > On Tue, 29 Jan 2019 at 01:46, Ufuk Celebi  wrote:
> >
> > > I'm late to this party but big +1. Great idea! I think this will help
> > > to better represent the actual Flink community size and increase
> > > interaction between the English and non-English speaking community.
> > > :-)
> > >
> > > On Mon, Jan 28, 2019 at 6:02 PM jincheng sun  >
> > > wrote:
> > > >
> > > > +1,I like the idea very much!
> > > >
> > > > On Thu, 24 Jan 2019 at 19:15, Robert Metzger wrote:
> > > >
> > > > > Hey all,
> > > > >
> > > > > I would like to create a new user support mailing list called "
> > > > > user...@flink.apache.org" to cater the Chinese-speaking Flink
> > > community.
> > > > >
> > > > > Why?
> > > > > In the last year 24% of the traffic on flink.apache.org came from
> > the
> > > US,
> > > > > 22% from China. In the last three months, China is at 30%, the US
> at
> > > 20%.
> > > > > An additional data point is that there's a Flink DingTalk group
> with
> > > more
> > > > > than 5000 members, asking Flink questions.
> > > > > I believe that knowledge about Flink should be available in public
> > > forums
> > > > > (our mailing list), indexable by search engines. If there's a huge
> > > demand
> > > > > in a Chinese language support, we as a community should provide
> these
> > > users
> > > > > the tools they need, to grow our community and to allow them to
> > follow
> > > the
> > > > > Apache way.
> > > > >
> > > > > Is it possible?
> > > > > I believe it is, because a number of other Apache projects are
> > running
> > > > > non-English user@ mailing lists.
> > > > > Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> > > non-English
> > > > > lists: http://mail-archives.apache.org/mod_mbox/
> > > > > One thing I want to make very clear in this discussion is that all
> > > project
> > > > > decisions, developer discussions, JIRA tickets etc. need to happen
> in
> > > > > English, as this is the primary language of the Apache Foundation
> and
> > > our
> > > > > community.
> > > > > We should also clarify this on the page listing the mailing lists.
> > > > >
> > > > > How?
> > > > > If there is consensus in this discussion thread, I would request
> the
> > > new
> > > > > mailing list next Monday.
> > > > > In case of discussions, I will start a vote on Monday or when the
> > > > > discussions have stopped.
> > > > > Then, we should put the new list on our website and start promoting
> > it
> > > (in
> > > > > said DingTalk group and on social media).
> > > > >
> > > > > Let me know what you think about this idea :)
> > > > >
> > > > > Best,
> > > > > Robert
> > > > >
> > > > >
> > > > > PS: In case you are wondering what ZH stands for:
> > > > > https://en.wiktionary.org/wiki/ZH
> > > > >
> > >
> >
>


Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-01-29 Thread JMX
Hi Jark,
I am glad to see your exciting email about contributing to the Flink Chinese
community. Remembering the days of solving outstanding issues in the Spark
official documents, when that project developed slowly, I am looking forward
to more and more contributions to the Flink Chinese community. Of course, I am
glad to join in the activities. I hope that we can help more and more
Chinese developers to build their projects more quickly and
smoothly with Flink. It is a very meaningful thing!

Best,
Hunk



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-01-29 Thread SteNicholas
Hi Jark, 

Thank you for starting this discussion. I am very willing to participate in
Flink document translation.

Best, 
Nicholas



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


[jira] [Created] (FLINK-11443) Recovering from save point error after adding "sum (constant)"

2019-01-29 Thread tivanli (JIRA)
tivanli created FLINK-11443:
---

 Summary:  Recovering from save point error after adding "sum 
(constant)"
 Key: FLINK-11443
 URL: https://issues.apache.org/jira/browse/FLINK-11443
 Project: Flink
  Issue Type: Bug
  Components: Table API & SQL
Affects Versions: 1.6.3, 1.6.2
Reporter: tivanli


h3. Resuming from a savepoint fails when I add a "sum(2)" column to Flink SQL.

 
{code:java}
org.apache.flink.util.StateMigrationException: State migration is currently not 
supported.
at 
org.apache.flink.util.StateMigrationException.notSupported(StateMigrationException.java:42)
at 
org.apache.flink.runtime.state.RegisteredKeyValueStateBackendMetaInfo.resolveKvStateCompatibility(RegisteredKeyValueStateBackendMetaInfo.java:212)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.tryRegisterKvStateInformation(RocksDBKeyedStateBackend.java:1336)
at 
org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend.createInternalState(RocksDBKeyedStateBackend.java:1391)
at 
org.apache.flink.runtime.state.KeyedStateFactory.createInternalState(KeyedStateFactory.java:47)
at 
org.apache.flink.runtime.state.ttl.TtlStateFactory.createStateAndWrapWithTtlIfEnabled(TtlStateFactory.java:63)
at 
org.apache.flink.runtime.state.AbstractKeyedStateBackend.getOrCreateKeyedState(AbstractKeyedStateBackend.java:238)
at 
org.apache.flink.streaming.api.operators.AbstractStreamOperator.getOrCreateKeyedState(AbstractStreamOperator.java:562)
at 
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.open(WindowOperator.java:240)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:424)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:290)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
at java.lang.Thread.run(Thread.java:748)
{code}
 
{code:java}
--old sql
SELECT TUMBLE_START(rowtime, INTERVAL '1' minute) as dtEventTime,
word,
sum(frequency) as frequency
FROM test
GROUP BY word,
TUMBLE(rowtime, INTERVAL '1' minute)

--new sql
SELECT TUMBLE_START(rowtime, INTERVAL '1' minute) as dtEventTime,
word,
sum(frequency) as frequency,
sum(2) as s2
FROM test
GROUP BY word,
TUMBLE(rowtime, INTERVAL '1' minute)
{code}
 

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Start a user...@flink.apache.org mailing list for the Chinese-speaking community?

2019-01-29 Thread Robert Metzger
Success!
The mailing list has been created.

Send an email to "user-zh-subscr...@flink.apache.org" to subscribe!
I've also updated the website with the list:
https://flink.apache.org/community.html

I will now also tweet about it, even though I believe it'll be more
important to advertise the list on Chinese social media platforms.


On Tue, Jan 29, 2019 at 1:52 AM ZILI CHEN  wrote:

> +1,sounds good
>
> On Tue, 29 Jan 2019 at 01:46, Ufuk Celebi  wrote:
>
> > I'm late to this party but big +1. Great idea! I think this will help
> > to better represent the actual Flink community size and increase
> > interaction between the English and non-English speaking community.
> > :-)
> >
> > On Mon, Jan 28, 2019 at 6:02 PM jincheng sun 
> > wrote:
> > >
> > > +1,I like the idea very much!
> > >
> > > On Thu, 24 Jan 2019 at 19:15, Robert Metzger wrote:
> > >
> > > > Hey all,
> > > >
> > > > I would like to create a new user support mailing list called "
> > > > user...@flink.apache.org" to cater the Chinese-speaking Flink
> > community.
> > > >
> > > > Why?
> > > > In the last year 24% of the traffic on flink.apache.org came from
> the
> > US,
> > > > 22% from China. In the last three months, China is at 30%, the US at
> > 20%.
> > > > An additional data point is that there's a Flink DingTalk group with
> > more
> > > > than 5000 members, asking Flink questions.
> > > > I believe that knowledge about Flink should be available in public
> > forums
> > > > (our mailing list), indexable by search engines. If there's a huge
> > demand
> > > > in a Chinese language support, we as a community should provide these
> > users
> > > > the tools they need, to grow our community and to allow them to
> follow
> > the
> > > > Apache way.
> > > >
> > > > Is it possible?
> > > > I believe it is, because a number of other Apache projects are
> running
> > > > non-English user@ mailing lists.
> > > > Apache OpenOffice, Cocoon, OpenMeetings, CloudStack all have
> > non-English
> > > > lists: http://mail-archives.apache.org/mod_mbox/
> > > > One thing I want to make very clear in this discussion is that all
> > project
> > > > decisions, developer discussions, JIRA tickets etc. need to happen in
> > > > English, as this is the primary language of the Apache Foundation and
> > our
> > > > community.
> > > > We should also clarify this on the page listing the mailing lists.
> > > >
> > > > How?
> > > > If there is consensus in this discussion thread, I would request the
> > new
> > > > mailing list next Monday.
> > > > In case of discussions, I will start a vote on Monday or when the
> > > > discussions have stopped.
> > > > Then, we should put the new list on our website and start promoting
> it
> > (in
> > > > said DingTalk group and on social media).
> > > >
> > > > Let me know what you think about this idea :)
> > > >
> > > > Best,
> > > > Robert
> > > >
> > > >
> > > > PS: In case you are wondering what ZH stands for:
> > > > https://en.wiktionary.org/wiki/ZH
> > > >
> >
>


Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-01-29 Thread ????
Hi Jark,
Thanks for starting the discussion! 


I think integrating the website and documentation as subdirectories into
the existing website and docs is a very good approach too.


And when submitting a pull request, we should check whether the author has
changed the documents. If yes, a JIRA should be created for the Chinese documents.


------ Original message ------
From: "fudian.fd"
Date: 2019-01-29 (Tue) 1:30
To: "dev"

Subject: Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink



Hi Jark,

Thanks a lot for starting the discussion! It would be great to have an official 
Flink Chinese doc. For the long-term maintenance problem, I think creating a 
JIRA when the English documentation is updated is a good idea. Should we add 
one item such as "Does this pull request update the documentation? If yes, is 
the Chinese documentation updated?" to the pull-request-template to remind 
the contributors to create the JIRA if the Chinese documentation should be updated?

Regards,
Dian

> On 29 Jan 2019, at 11:13, jincheng sun  wrote:
> 
> Thanks Jark for starting this discussion!
> 
> Hi Fabian, very glad to hear that you like this proposal.
> As far as I know, `zh` is the language and `cn` is the territory; for
> example, `zh-cn` represents Simplified Chinese (China) and `zh-tw`
> represents Traditional Chinese (Taiwan). So I like naming the mailing list
> and website as follows:
> 
> maillist: user...@flink.apache.org
> website: https://flink.apache.org/zh/
> 
> BTW:
> I completely agree that translating resources is a great way to contribute
> without writing source code.
> We welcome and thank every contributor to the translation resources!
> 
> Best, Jincheng
> 
> On 28 Jan 2019, at 9:49, Fabian Hueske  wrote:
> 
>> Hi Jark,
>> 
>> Thank you for starting this discussion!
>> I'm very happy about the various efforts to support the Chinese Flink
>> community.
>> Offering a translated website and documentation gives Flink a lot more
>> reach and will help many users.
>> 
>> I think integrating the website and documentation as subdirectories into
>> the existing website and docs is a very good approach.
>> Regarding the name, does it make sense to keep the URLs in sync with the
>> newly created mailing list (user...@flink.apache.org), i.e.,
>> https://flink.apache.org/zh/  etc?
>> 
>> I think integrating the translated website/documentation into Apache
>> Flink's repositories might also help to grow the number of Chinese non-code
>> contributors.
>> Translating resources is IMO a great way to contribute without writing
>> source code.
>> 
>> I'm very much looking forward to this.
>> 
>> Best, Fabian
>> 
>> On Mon, 28 Jan 2019 at 12:59, Jark Wu  wrote:
>> 
>>> Hi all,
>>> 
>>> In the past year, the Chinese community is working on building a Chinese
>>> translated Flink website (http://flink.apache.org) and documents (
>>> http://ci.apache.org/projects/flink/flink-docs-master/) in order to help
>>> Chinese speaking users. This is http://flink-china.org and it has
>> received
>>> a lot of praise since online.
>>> 
>>> In order to follow the Apache Way and grow Apache Flink community, we
>> want
>>> to contribute it to Apache Flink. It contains two parts to contribute:
>>> (1) the Chinese translated version of the Flink website
>>> (2) the Chinese translated version of the Flink documentation.
>>> 
>>> But there are some questions are up to discuss:
>>> 
>>> ## The Address of the translated version
>>> 
>>> I think we can add a Chinese channel on official flink website, such as "
>>> https://flink.apache.org/cn/;, which is similar as "
>>> http://kylin.apache.org/cn/;. And use "
>>> https://ci.apache.org/projects/flink/flink-docs-zh-master/; to put the
>>> Chinese translated docs.
>>> 
>>> ## Add a link to the translated version
>>> 
>>> It would be great if we can add links to each other in both Chinese
>> version
>>> and English version. For example, we can add a link to the translated
>>> website on the sidebar of the Flink website. We can also add a dropdown
>>> button for the Chinese document version under the "Pick Docs Version" in
>>> Flink document.
>>> 
>>> ## How to contribute the translation in a long term
>>> 
>>> This is a more important problem. Because translation is a huge and
>>> long-term work. We need a healthy mechanism to ensure the sustainability
>> of
>>> contributions and the quality of translations.
>>> 
>>> I would suggest to put the Chinese version document in flink repo (such
>> as
>>> "doc-zh" folder) and update with the master. Once we modify the English
>>> doc, we have to update the Chinese doc together, or create a JIRA
>> (contains
>>> git commit id refer to the English modification) to do that. This will
>>> increase some workload when we update the doc. But this will keep the
>>> Chinese doc up to date. We can attract more Chinese contributors to help
>>> build the doc. And the 

Re: [DISCUSS] Contributing Chinese website and docs to Apache Flink

2019-01-29 Thread Jörn Franke
Also keep in mind the other direction, e.g. a new/modified version of the Chinese 
documentation needs to be reflected in the English one.

> On 29.01.2019 at 06:35, SteNicholas  wrote:
> 
> Hi Jark, 
> 
> Thank you for starting this discussion. I am very willing to participate in
> Flink document translation. 
> 
> Best, 
> Nicholas
> 
> 
> 
> 
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/