[jira] [Created] (FLINK-9188) Provide a mechanism to configure AmazonKinesisClient in FlinkKinesisConsumer

2018-04-16 Thread Thomas Weise (JIRA)
Thomas Weise created FLINK-9188:
---

 Summary: Provide a mechanism to configure AmazonKinesisClient in 
FlinkKinesisConsumer
 Key: FLINK-9188
 URL: https://issues.apache.org/jira/browse/FLINK-9188
 Project: Flink
  Issue Type: Task
  Components: Kinesis Connector
Reporter: Thomas Weise


It should be possible to control the ClientConfiguration used by the Kinesis
client, e.g. to set the socket timeout and other properties.
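
For illustration, these are roughly the AWS SDK settings that should become
configurable. The ClientConfiguration calls below are the AWS SDK's own API;
how FlinkKinesisConsumer would expose them is exactly what this issue asks for:

{code}
import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;

ClientConfiguration clientConfig = new ClientConfiguration();
clientConfig.setSocketTimeout(60000);      // socket read timeout in milliseconds
clientConfig.setConnectionTimeout(10000);  // connection establishment timeout
clientConfig.setMaxConnections(50);

// Plain AWS SDK wiring shown only to illustrate the desired settings;
// the consumer currently builds its own client internally.
AmazonKinesis client = AmazonKinesisClientBuilder.standard()
    .withClientConfiguration(clientConfig)
    .build();
{code}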



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9187) add prometheus pushgateway reporter

2018-04-16 Thread lamber-ken (JIRA)
lamber-ken created FLINK-9187:
-

 Summary: add prometheus pushgateway reporter
 Key: FLINK-9187
 URL: https://issues.apache.org/jira/browse/FLINK-9187
 Project: Flink
  Issue Type: New Feature
  Components: Metrics
Affects Versions: 1.4.2
Reporter: lamber-ken
 Fix For: 1.5.0


Make it possible for the Flink metrics system to send metrics to Prometheus via 
the Pushgateway.
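
For context, a minimal sketch of the push mechanism such a reporter would
presumably wrap, using the Prometheus Java client (the metric name and job name
are made up for illustration):

{code}
import java.io.IOException;

import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.Gauge;
import io.prometheus.client.exporter.PushGateway;

public class PushGatewayExample {
    public static void main(String[] args) throws IOException {
        CollectorRegistry registry = new CollectorRegistry();
        Gauge uptime = Gauge.build()
            .name("flink_job_uptime_seconds")  // illustrative metric name
            .help("Job uptime in seconds.")
            .register(registry);
        uptime.set(42);

        // Push the metrics instead of waiting to be scraped -- useful for
        // short-lived or firewalled jobs.
        PushGateway gateway = new PushGateway("localhost:9091");
        gateway.pushAdd(registry, "flink_metrics");
    }
}
{code}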



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Validating markdown doc files before creating a PR

2018-04-16 Thread Chesnay Schepler
You can serve the documentation locally as described here.


On 16.04.2018 21:17, Ken Krugler wrote:

Hi all,

I’m curious how devs check that their .md file edits are correct.

I've tried a few different plugins and command line utilities, but the mix of 
HTML and markdown has meant none of them render properly.

Thanks,

— Ken

--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr






Validating markdown doc files before creating a PR

2018-04-16 Thread Ken Krugler
Hi all,

I’m curious how devs check that their .md file edits are correct.

I've tried a few different plugins and command line utilities, but the mix of 
HTML and markdown has meant none of them render properly.

Thanks,

— Ken

--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr



Re: Kinesis getRecords read timeout and retry

2018-04-16 Thread Thomas Weise
Hi Gordon,

This is indeed a discussion necessary to have!

The purpose of the previous PRs wasn't to provide solutions to the originally
identified issues, but rather to make it possible to solve them through
customization. What those customizations would be was also communicated, along
with the intent to contribute them subsequently, provided they are deemed
broadly applicable and we find a reasonable contribution path.

So far we have implemented the following in custom code:

* use ListShards for shard discovery (we also considered limiting discovery
to a single subtask and sharing the results between subtasks, which is almost
certainly not something I would propose to add to Flink due to the additional
deployment dependencies).

* override emitRecords in the fetcher to provide source watermarking with
idle shard handling. Related discussions for the Kafka consumer show that it
isn't straightforward to arrive at a solution that will satisfy everyone. I'm
still open to contributing those changes as well, but have not seen a response
to that. Nevertheless, it is key to allow users to implement what they need
for their use case.

* retry certain exceptions in getRecords, based on our production learnings
(a rough sketch follows below). Whether those retries are applicable to
everyone, and whether the Flink implementation should be changed to retry by
default, is a future discussion I intend to start. But in any case, we need
to be able to make the changes that we need on our end.

* ability to configure the AWS HTTP client when the defaults turn out to be
unsuitable for the use case. This is a very basic requirement, and it is
rather surprising that the Flink Kinesis consumer wasn't written to provide
access to the settings that the AWS SDK provides.
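
To make the retry point concrete, a rough sketch of what such a wrapper could
look like (hypothetical and simplified, not the actual code):

import java.net.SocketTimeoutException;
import java.util.function.Supplier;
import com.amazonaws.AmazonClientException;
import com.amazonaws.services.kinesis.model.GetRecordsResult;

// Hypothetical helper: retry a getRecords call on transient client-side
// errors such as socket read timeouts, with a simple linear backoff.
static GetRecordsResult getRecordsWithRetry(Supplier<GetRecordsResult> call, int maxAttempts)
        throws InterruptedException {
    for (int attempt = 1; ; attempt++) {
        try {
            return call.get();
        } catch (AmazonClientException e) {
            boolean timeout = e.getCause() instanceof SocketTimeoutException;
            if (!timeout || attempt >= maxAttempts) {
                throw e;
            }
            Thread.sleep(200L * attempt);
        }
    }
}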

I hope the above examples make clear that it is necessary to leave room for
users to augment a base implementation. There is no such thing as a perfect
connector, and there will always be new discoveries by users that require
improvements or changes. Use-case-specific considerations may require
augmenting even the best default behavior; what works for one user may not
work for another.

If I don't have the hooks that the referenced PRs enable, then the alternative
is to fork the code. That will further reduce the likelihood of changes
making their way back to Flink.

I think we agree on the ultimate goal of improving the default
implementation of the connector. There are more fundamental issues with the
Kinesis connector (and other connectors) that I believe require deeper
design work and a rewrite, which goes beyond what we discuss here.

Finally, I'm also curious how much appetite there is for contributions in the
connector area. I see that we have now accumulated 340 open PRs,
and review bandwidth seems hard to come by.

Thanks,
Thomas


On Sun, Apr 15, 2018 at 8:56 PM, Tzu-Li (Gordon) Tai 
wrote:

> Hi Thomas,
>
> Thanks for your PRs!
>
> I understand and fully agree with both points that you pointed out.
>
> What I'm still a bit torn with is the current proposed solutions for these
> issues (and other similar connector issues).
>
> This might actually call for a good opportunity to bring some thoughts up
> about connector contributions. My arguments would be the following:
>
> The solutions actually break some fundamental designs of the connector
> code.
> For example, in recent PRs for the Kinesis connector we've been proposing
> to relax access of the `KinesisProxy` constructor.
> AFAIK, this fix was triggered by an inefficiency in the
> `KinesisProxy#getShardsOfStream` method which influences shard discovery
> performance.
> First of all, such a change undermines the fact that the class is an internal
> class (it is marked as @Internal). It was made private as it handles
> critical paths such as record fetching and shard listing, and is not
> intended to be modified at all.
> Second of all, the fix in the end did not fix the inefficiency at all,
> except for advanced users who perhaps have seen the corresponding JIRA and
> would bother to do the same and override the inefficient implementations
> themselves.
> If there is a fix that would have benefited all users of the connector in
> general, I would definitely be more in favor of that.
> The same goes for https://github.com/apache/flink/pull/5803 - I'm not
> sure that allowing overrides on the retry logic is ideal. For example, in the
> Elasticsearch connector we previously introduced a user-facing RetryHandler
> API to allow such customizations.
>
> On one hand, I do understand that solving these connector issues properly
> would perhaps require more thorough design groundwork and could be
> more time-consuming.
> On the other hand, I also understand that we need to find a good balance
> that allows production users of these connectors to quickly iterate on the
> issues the current code has and unblock the problems they encounter.
>
> My main concern is that our current approach to fixing these issues, IMO,
> actually does not encourage good 

[jira] [Created] (FLINK-9186) Enable dependency convergence for flink-libraries

2018-04-16 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-9186:
---

 Summary: Enable dependency convergence for flink-libraries
 Key: FLINK-9186
 URL: https://issues.apache.org/jira/browse/FLINK-9186
 Project: Flink
  Issue Type: Sub-task
  Components: Build System
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.5.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9185) Potential null dereference in PrioritizedOperatorSubtaskState#resolvePrioritizedAlternatives

2018-04-16 Thread Ted Yu (JIRA)
Ted Yu created FLINK-9185:
-

 Summary: Potential null dereference in 
PrioritizedOperatorSubtaskState#resolvePrioritizedAlternatives
 Key: FLINK-9185
 URL: https://issues.apache.org/jira/browse/FLINK-9185
 Project: Flink
  Issue Type: Bug
Reporter: Ted Yu


{code}
if (alternative != null
  && alternative.hasState()
  && alternative.size() == 1
  && approveFun.apply(reference, alternative.iterator().next())) {
{code}
The Boolean return value from approveFun.apply is auto-unboxed here.
We should check that the return value is not null before unboxing it.
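
One possible shape of the fix, as a sketch only (types as in the snippet above):
{code}
if (alternative != null
	&& alternative.hasState()
	&& alternative.size() == 1) {

	Boolean approved = approveFun.apply(reference, alternative.iterator().next());
	// guard against a null Boolean before it would be auto-unboxed
	if (Boolean.TRUE.equals(approved)) {
		// ... use the alternative ...
	}
}
{code}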



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9184) Remove warning from kafka consumer docs

2018-04-16 Thread Juho Autio (JIRA)
Juho Autio created FLINK-9184:
-

 Summary: Remove warning from kafka consumer docs
 Key: FLINK-9184
 URL: https://issues.apache.org/jira/browse/FLINK-9184
 Project: Flink
  Issue Type: Sub-task
Reporter: Juho Autio


Once the main problem of FLINK-5479 has been fixed, remove the warning about 
idle partitions from the docs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9183) Kafka consumer docs to warn about idle partitions

2018-04-16 Thread Juho Autio (JIRA)
Juho Autio created FLINK-9183:
-

 Summary: Kafka consumer docs to warn about idle partitions
 Key: FLINK-9183
 URL: https://issues.apache.org/jira/browse/FLINK-9183
 Project: Flink
  Issue Type: Sub-task
Reporter: Juho Autio


Looks like the bug FLINK-5479 entirely prevents 
FlinkKafkaConsumerBase#assignTimestampsAndWatermarks from being used if there 
are any idle partitions. It would be nice to mention in the documentation that 
this currently requires all subscribed partitions to have a constant stream of 
data with growing timestamps. When the watermark gets stalled on an idle 
partition, it blocks everything.
 
Link to current documentation:
[https://ci.apache.org/projects/flink/flink-docs-master/dev/connectors/kafka.html#kafka-consumers-and-timestamp-extractionwatermark-emission]
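
For reference, the affected usage pattern is roughly the following (MyEvent,
MyEventSchema and the timestamp field are placeholders, not part of the report):

{code}
import java.util.Properties;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "localhost:9092");

FlinkKafkaConsumer010<MyEvent> consumer =
    new FlinkKafkaConsumer010<>("my-topic", new MyEventSchema(), properties);

// Per-partition watermarks: with FLINK-5479 unfixed, a single idle partition
// stalls the consumer's overall watermark.
consumer.assignTimestampsAndWatermarks(
    new BoundedOutOfOrdernessTimestampExtractor<MyEvent>(Time.seconds(10)) {
        @Override
        public long extractTimestamp(MyEvent event) {
            return event.getEventTime();
        }
    });
{code}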



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Flink 1.3.2 with CEP Pattern, Memory usage increases results in OOM

2018-04-16 Thread Kostas Kloudas
Hi Abiramalakshmi,

Thanks for reporting this!

As a starting point I would recommend:

1) use RocksDB as your state backend, so that state is not accumulated in memory
2) enable incremental checkpoints
3) the "new IterativeCondition<...>() {…}" can become "new SimpleCondition<...>() {…}", 
as this is more efficient
4) set the default watermark interval to a small value so that you have 
frequent watermarks and elements are not accumulated
(a short code sketch of these settings follows below)
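
A minimal sketch of the above (MyEvent, the predicate and the checkpoint path
are placeholders; note the RocksDBStateBackend constructor may throw IOException):

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

// 1) + 2) RocksDB state backend with incremental checkpoints enabled
env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));

// 4) emit watermarks frequently (value in milliseconds)
env.getConfig().setAutoWatermarkInterval(100);

// 3) a SimpleCondition instead of an anonymous IterativeCondition,
//    when the predicate only needs the current event
Pattern<MyEvent, ?> start = Pattern.<MyEvent>begin("start")
    .where(new SimpleCondition<MyEvent>() {
        @Override
        public boolean filter(MyEvent event) {
            return event.getSeverity() > 3; // illustrative predicate
        }
    });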

If you do the above, please let me know if the problems persist.

Thanks,
Kostas


> On Apr 10, 2018, at 1:19 PM, Abiramalakshmi Natarajan 
>  wrote:
> 
> new Iter



[jira] [Created] (FLINK-9182) async checkpoints for timer service

2018-04-16 Thread makeyang (JIRA)
makeyang created FLINK-9182:
---

 Summary: async checkpoints for timer service
 Key: FLINK-9182
 URL: https://issues.apache.org/jira/browse/FLINK-9182
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.4.2, 1.5.0
Reporter: makeyang
 Fix For: 1.4.3, 1.5.1


 # Problem description:
 ## As the number of 'InternalTimer' objects grows, checkpointing becomes 
slower and slower.
 # Improvement design:
 ## Maintain a stateTableVersion and a set of snapshotVersions in 
InternalTimeServiceManager, analogous to the ones in CopyOnWriteStateTable, 
plus a read-write lock that protects stateTableVersion and snapshotVersions.
 ## For each InternalTimer, add two more properties, a create version and a 
delete version, beside the three existing properties timestamp, key and 
namespace. When a timer is registered in the timer service, its create version 
is set to the current stateTableVersion and its delete version to -1. When a 
timer is deleted in the timer service, it is only marked as deleted by setting 
its delete version to the current stateTableVersion; it is not physically 
removed from the timer service.
 ## Each time a timer snapshot is taken, InternalTimeServiceManager increments 
its stateTableVersion and adds that value to snapshotVersions. These two 
operations are protected by the write lock of InternalTimeServiceManager. The 
current stateTableVersion is taken as the snapshot version of this snapshot.
 ## Shallow-copy the timer tuples.
 ## Then use another thread to asynchronously snapshot everything: the key 
serializer, the namespace serializer and the timers. A timer is serialized if 
it is not deleted (delete version is -1) and its create version is less than 
the snapshot version, or if its delete version is not -1 and is greater than 
or equal to the snapshot version; otherwise it is not serialized by this 
snapshot. (A rough sketch of this visibility check is given below.)
 ## When everything is serialized, the snapshot version is removed from 
snapshotVersions, still on the snapshotting thread; this action is also 
guarded by the write lock.
 ## Last thing: physical timer deletion. There are two places where timers are 
physically deleted. First, when a timer is deleted in the timer service (and 
marked with a delete version as described above), check whether 
snapshotVersions is empty, which means no snapshot is running; if so, delete 
the timer right away. Second, while iterating timers during a snapshot: when a 
timer's delete version is less than the minimum value in snapshotVersions, the 
timer is deleted and no running snapshot needs to keep it.
 ## Some more additions: the processingTimeTimers and eventTimeTimers for each 
key group used to be HashSets; they are changed to ConcurrentHashMaps with 
key+namespace+timestamp as the hash key.
 # Related mailing list thread:
 ## 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Slow-flink-checkpoint-td18946.html
 # GitHub pull request:
 ## //coming soon
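
For illustration, a rough sketch of the proposed visibility check (class and 
field names are made up, not existing Flink APIs):

{code}
// Hypothetical sketch of the copy-on-write visibility scheme for timers.
final class VersionedTimer {

    final long timestamp;
    final Object key;
    final Object namespace;
    final long createVersion;    // stateTableVersion at registration time
    volatile long deleteVersion; // -1 while the timer is live

    VersionedTimer(long timestamp, Object key, Object namespace, long createVersion) {
        this.timestamp = timestamp;
        this.key = key;
        this.namespace = namespace;
        this.createVersion = createVersion;
        this.deleteVersion = -1L;
    }

    /** True if this timer must be written by the snapshot taken at snapshotVersion. */
    boolean visibleTo(long snapshotVersion) {
        // live timer created before the snapshot was triggered
        if (deleteVersion == -1L && createVersion < snapshotVersion) {
            return true;
        }
        // timer deleted at or after the snapshot was triggered: still part of it
        return deleteVersion != -1L && deleteVersion >= snapshotVersion;
    }
}
{code}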



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9181) Add SQL Client documentation page

2018-04-16 Thread Timo Walther (JIRA)
Timo Walther created FLINK-9181:
---

 Summary: Add SQL Client documentation page
 Key: FLINK-9181
 URL: https://issues.apache.org/jira/browse/FLINK-9181
 Project: Flink
  Issue Type: Improvement
  Components: Table API  SQL
Reporter: Timo Walther
Assignee: Timo Walther


The current implementation of the SQL Client needs documentation for the 
upcoming 1.5 release. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9180) Remove REST_ prefix from rest options

2018-04-16 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-9180:
---

 Summary: Remove REST_ prefix from rest options
 Key: FLINK-9180
 URL: https://issues.apache.org/jira/browse/FLINK-9180
 Project: Flink
  Issue Type: Improvement
  Components: Configuration, REST
Affects Versions: 1.5.0
Reporter: Chesnay Schepler
 Fix For: 1.5.0


Several fields in the {{RestOptions}} class have a {{REST_}} prefix. So far we 
have followed the convention of not using such prefixes when they are already 
contained in the class name, hence we should remove the prefix from the field 
names.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9179) Deduplicate WebOptions.PORT and RestOptions.REST_PORT

2018-04-16 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-9179:
---

 Summary: Deduplicate WebOptions.PORT and RestOptions.REST_PORT
 Key: FLINK-9179
 URL: https://issues.apache.org/jira/browse/FLINK-9179
 Project: Flink
  Issue Type: Improvement
  Components: Configuration, REST, Webfrontend
Affects Versions: 1.5.0
Reporter: Chesnay Schepler
 Fix For: 1.5.0


In the past, {{WebOptions.PORT}} was used to configure the port on which the 
WebUI listens. With the rework of the REST API we added a new configuration 
key, {{RestOptions.REST_PORT}}, to specify the port on which the REST API 
listens.

Effectively these two options control the same thing, with the rest option 
being broader and also applicable to components with a REST API but no WebUI.

I suggest deprecating {{WebOptions.PORT}} and adding it as a deprecated key to 
{{RestOptions.REST_PORT}}.
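
A rough sketch of how the deprecated key could be wired up (assuming 
{{WebOptions.PORT}} maps to the "web.port" key; the default value is shown only 
for illustration):

{code}
// Sketch, not the actual field definition.
public static final ConfigOption<Integer> REST_PORT =
    ConfigOptions.key("rest.port")
        .defaultValue(8081)
        .withDeprecatedKeys("web.port");
{code}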



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9178) Add rate control for kafka source

2018-04-16 Thread buptljy (JIRA)
buptljy created FLINK-9178:
--

 Summary: Add rate control for kafka source
 Key: FLINK-9178
 URL: https://issues.apache.org/jira/browse/FLINK-9178
 Project: Flink
  Issue Type: Improvement
  Components: Kafka Connector
Reporter: buptljy


When I want to run a Flink program from the earliest offset in Kafka, it is 
very easy to cause an OOM because of too many HeapMemorySegment instances in 
the NetworkBufferPool.

I think we should support this "rerun" situation, which is very common for many 
businesses.
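
Until such a feature exists, a hypothetical user-level workaround could look 
roughly like the sketch below: wrap the DeserializationSchema and throttle it 
with Guava's RateLimiter. The class name is made up, package names assume 
Flink 1.4+, and this is not the proposed feature itself.

{code}
import java.io.IOException;

import com.google.common.util.concurrent.RateLimiter;
import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;

/** Hypothetical workaround: throttle consumption by rate-limiting deserialization. */
public class RateLimitedSchema<T> implements DeserializationSchema<T> {

    private final DeserializationSchema<T> delegate;
    private final double recordsPerSecond;
    private transient RateLimiter limiter; // not serializable, created lazily per subtask

    public RateLimitedSchema(DeserializationSchema<T> delegate, double recordsPerSecond) {
        this.delegate = delegate;
        this.recordsPerSecond = recordsPerSecond;
    }

    @Override
    public T deserialize(byte[] message) throws IOException {
        if (limiter == null) {
            limiter = RateLimiter.create(recordsPerSecond);
        }
        limiter.acquire(); // blocks until a permit is available
        return delegate.deserialize(message);
    }

    @Override
    public boolean isEndOfStream(T nextElement) {
        return delegate.isEndOfStream(nextElement);
    }

    @Override
    public TypeInformation<T> getProducedType() {
        return delegate.getProducedType();
    }
}
{code}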



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9177) Link under Installing Mesos goes to a 404 page

2018-04-16 Thread Arunan Sugunakumar (JIRA)
Arunan Sugunakumar created FLINK-9177:
-

 Summary: Link under Installing Mesos goes to a 404 page
 Key: FLINK-9177
 URL: https://issues.apache.org/jira/browse/FLINK-9177
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Arunan Sugunakumar


Under "Clusters and Depployment - Mesos" 
[guide|https://ci.apache.org/projects/flink/flink-docs-release-1.5/ops/deployment/mesos.html#installing-mesos],
 installing mesos points to an old link. 

The following link 

[http://mesos.apache.org/documentation/latest/getting-started/]

should be changed to 

[http://mesos.apache.org/documentation/getting-started/]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)