[GitHub] flink issue #2510: FLINK-3322 Allow drivers and iterators to reuse the memor...
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2510 @StephanEwen - Any comments/feedback here. A gentle reminder !! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs
[ https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631588#comment-15631588 ] ASF GitHub Bot commented on FLINK-3322: --- Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2510 @StephanEwen - Any comments/feedback here. A gentle reminder !! > MemoryManager creates too much GC pressure with iterative jobs > -- > > Key: FLINK-3322 > URL: https://issues.apache.org/jira/browse/FLINK-3322 > Project: Flink > Issue Type: Bug > Components: Local Runtime >Affects Versions: 1.0.0 >Reporter: Gabor Gevay >Assignee: ramkrishna.s.vasudevan >Priority: Critical > Fix For: 1.0.0 > > Attachments: FLINK-3322.docx, FLINK-3322_reusingmemoryfordrivers.docx > > > When taskmanager.memory.preallocate is false (the default), released memory > segments are not added to a pool, but the GC is expected to take care of > them. This puts too much pressure on the GC with iterative jobs, where the > operators reallocate all memory at every superstep. > See the following discussion on the mailing list: > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html > Reproducing the issue: > https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc > The class to start is malom.Solver. If you increase the memory given to the > JVM from 1 to 50 GB, performance gradually degrades by more than 10 times. > (It will generate some lookuptables to /tmp on first run for a few minutes.) > (I think the slowdown might also depend somewhat on > taskmanager.memory.fraction, because more unused non-managed memory results > in rarer GCs.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
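The remedy discussed in the linked PRs is to keep released segments for reuse instead of leaving them to the garbage collector. A minimal sketch of that idea, with hypothetical names rather than Flink's actual MemoryManager API:

```java
import java.util.ArrayDeque;

// Sketch of a segment pool: released segments are recycled rather than
// dropped for the garbage collector, so iterative jobs that re-request
// all their memory every superstep do not churn the heap.
public class SegmentPool {
    private final int segmentSize;
    private final ArrayDeque<byte[]> freeSegments = new ArrayDeque<>();

    public SegmentPool(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    public synchronized byte[] request() {
        byte[] seg = freeSegments.poll();
        return seg != null ? seg : new byte[segmentSize]; // allocate only on a pool miss
    }

    public synchronized void release(byte[] segment) {
        freeSegments.push(segment); // keep for reuse instead of letting the GC collect it
    }

    public synchronized int pooled() {
        return freeSegments.size();
    }
}
```

With `taskmanager.memory.preallocate: false` the current behavior corresponds to always taking the allocation branch; the pool turns every superstep after the first into cheap reuse.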
[jira] [Commented] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs
[ https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631587#comment-15631587 ] ASF GitHub Bot commented on FLINK-3322: --- Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2495 @StephanEwen - Any comments/feedback here. A gentle reminder !!
[GitHub] flink issue #2495: FLINK-3322 - Make sorters to reuse the memory pages alloc...
Github user ramkrish86 commented on the issue: https://github.com/apache/flink/pull/2495 @StephanEwen - Any comments/feedback here. A gentle reminder !!
[jira] [Commented] (FLINK-4576) Low Watermark Service in JobManager for Streaming Sources
[ https://issues.apache.org/jira/browse/FLINK-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631405#comment-15631405 ] Tzu-Li (Gordon) Tai commented on FLINK-4576: Ah, right. Got lost on this part mid-discussion. However, I have the feeling that letting all operators interact with this service (as pointed out previously, basically all stream operators would need to work with the service) is a bit of overkill for solving what initially brought out the discussion of the low watermark service in the first place: some source subtasks, like the FlinkKafkaConsumer's, may initially be idle with no partitions to read from, in which case we would want them to emit the global low watermark across subtasks instead of the max-value watermark. So, in the beginning we were trying to aim for a source-only solution. But honestly I currently don't have better ideas on how to achieve that, because we allow users to assign watermarks at sources and in the middle of topologies. > Low Watermark Service in JobManager for Streaming Sources > - > > Key: FLINK-4576 > URL: https://issues.apache.org/jira/browse/FLINK-4576 > Project: Flink > Issue Type: New Feature > Components: JobManager, Streaming, TaskManager >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Fix For: 1.2.0 > > > As per discussion in FLINK-4341 by [~aljoscha] and [~StephanEwen], we need a > low watermark service in the JobManager to support transparent resharding / > partition discovery for our Kafka and Kinesis consumers (and any future > streaming connectors in general for which the external system may elastically > scale up and down independently of the parallelism of sources in Flink). 
The > main idea is to let source subtasks that don't emit their own watermarks > (because they currently don't have data partitions to consume) emit the low > watermark across all subtasks, instead of simply emitting a Long.MAX_VALUE > watermark and forbidding them to be assigned partitions in the future. > The proposed implementation, from a high-level: a {{LowWatermarkCoordinator}} > will be added to execution graphs, periodically triggering only the source > vertices with a {{RetrieveLowWatermark}} message. The tasks reply to the > JobManager through the actor gateway (or a new interface after FLINK-4456 > gets merged) with a {{ReplyLowWatermark}} message. When the coordinator > collects all low watermarks for a particular source vertex and determines the > aggregated low watermark for this round (accounting only values that are > larger than the aggregated low watermark of the last round), it sends a > {{NotifyNewLowWatermark}} message to the source vertex's tasks. > The messages will only be relevant to tasks that implement an internal > {{LowWatermarkCooperatingTask}} interface. For now, only {{SourceStreamTask}} > should implement {{LowWatermarkCooperatingTask}}. > Source functions should implement a public {{LowWatermarkListener}} interface > if they wish to get notified of the aggregated low watermarks across > subtasks. Connectors like the Kinesis consumer can choose to emit this > watermark if the subtask currently does not have any shards, so that > downstream operators may still properly advance time windows (implementation > for this is tracked as a separate issue). 
> Overall, the service will include - > New messages between JobManager <-> TaskManager: > {{RetrieveLowWatermark(jobId, jobVertexId, taskId, timestamp)}} > {{ReplyLowWatermark(jobId, jobVertexId, taskId, currentLowWatermark)}} > {{NotifyNewLowWatermark(jobId, jobVertexId, taskId, newLowWatermark, > timestamp)}} > New internal task interface {{LowWatermarkCooperatingTask}} in flink-runtime > New public interface {{LowWatermarkListener}} in flink-streaming-java > Might also need to extend {{SourceFunction.SourceContext}} to support > retrieving the current low watermark of sources. > Any feedback for this is appreciated!
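The coordinator's round logic described above can be sketched as follows. This is an illustration with made-up names, not the proposed Flink classes: collect each subtask's {{ReplyLowWatermark}} value, take the minimum, and advance the aggregated watermark only when the new minimum exceeds the previous round's value:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the aggregation step performed by the coordinator for one
// source vertex: gather per-subtask low watermarks, take the minimum,
// and never let the aggregated value regress across rounds.
public class LowWatermarkAggregator {
    private final Map<Integer, Long> reported = new HashMap<>();
    private long aggregated = Long.MIN_VALUE;

    // A subtask's ReplyLowWatermark for the current round.
    public void reply(int subtaskIndex, long currentLowWatermark) {
        reported.put(subtaskIndex, currentLowWatermark);
    }

    // Called once all subtasks of the vertex have replied for this round;
    // the return value would be sent out via NotifyNewLowWatermark.
    public long completeRound() {
        long min = Long.MAX_VALUE;
        for (long wm : reported.values()) {
            min = Math.min(min, wm);
        }
        if (min > aggregated) { // only account for values larger than last round's aggregate
            aggregated = min;
        }
        reported.clear();
        return aggregated;
    }
}
```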
[jira] [Commented] (FLINK-4964) FlinkML - Add StringIndexer
[ https://issues.apache.org/jira/browse/FLINK-4964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630729#comment-15630729 ] ASF GitHub Bot commented on FLINK-4964: --- Github user tfournier314 commented on the issue: https://github.com/apache/flink/pull/2740 Yes, I've just updated the PR title > FlinkML - Add StringIndexer > --- > > Key: FLINK-4964 > URL: https://issues.apache.org/jira/browse/FLINK-4964 > Project: Flink > Issue Type: New Feature >Reporter: Thomas FOURNIER >Priority: Minor > > Add StringIndexer as described here: > http://spark.apache.org/docs/latest/ml-features.html#stringindexer > This will be added in package preprocessing of FlinkML
[GitHub] flink issue #2740: [FLINK-4964] [ml]
Github user tfournier314 commented on the issue: https://github.com/apache/flink/pull/2740 Yes, I've just updated the PR title
[jira] [Commented] (FLINK-4022) Partition discovery / regex topic subscription for the Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630704#comment-15630704 ] Stephan Ewen commented on FLINK-4022: - Ah, okay, this is relevant when there are initially idle subtasks. When there are not, this is not a problem. > Partition discovery / regex topic subscription for the Kafka consumer > - > > Key: FLINK-4022 > URL: https://issues.apache.org/jira/browse/FLINK-4022 > Project: Flink > Issue Type: New Feature > Components: Kafka Connector, Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.2.0 > > > Example: allow users to subscribe to "topic-n*", so that the consumer > automatically reads from "topic-n1", "topic-n2", ... and so on as they are > added to Kafka. > I propose to implement this feature by the following description: > Since the overall list of partitions to read will change after job > submission, the main big change required for this feature will be dynamic > partition assignment to subtasks while the Kafka consumer is running. This > will mainly be accomplished using Kafka 0.9.x API > `KafkaConsumer#subscribe(java.util.regex.Pattern, > ConsumerRebalanceListener)`. Each KafkaConsumers in each subtask will be > added to the same consumer group when instantiated, and rely on Kafka to > dynamically reassign partitions to them whenever a rebalance happens. The > registered `ConsumerRebalanceListener` is a callback that is called right > before and after rebalancing happens. 
We'll use this callback to let each subtask commit the last offsets of the partitions it is currently responsible for to an external store (or Kafka) before a rebalance; after the rebalance, when the subtasks get the new partitions they'll be reading from, they read from the external store to get the last offsets for their new partitions (partitions which don't have offset entries in the store are the new partitions that caused the rebalancing). > The tricky part will be restoring Flink checkpoints when the partition assignment is dynamic. Snapshotting will remain the same - subtasks snapshot the offsets of partitions they are currently holding. Restoring will be a bit different in that subtasks might not be assigned partitions matching the snapshot the subtask is restored with (since we're letting Kafka dynamically assign partitions). There will need to be a coordination process where, if a restore state exists, all subtasks first commit the offsets they receive (as a result of the restore state) to the external store, and then all subtasks attempt to find a last offset for the partitions they are holding. > However, if the globally merged restore state feature mentioned by [~StephanEwen] in https://issues.apache.org/jira/browse/FLINK-3231 is available, then the restore will be simple again, as each subtask has full access to the previous global state and therefore coordination is not required. > I think changing to dynamic partition assignment is also good in the long run for handling topic repartitioning. > Overall, > User-facing API changes: > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, DeserializationSchema, Properties) > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, KeyedDeserializationSchema, Properties) > Implementation changes: > 1. Dynamic partition assignment depending on KafkaConsumer#subscribe > - Remove partition list querying from the constructor > - Remove static partition assignment to subtasks in run() > - Instead of using KafkaConsumer#assign() in fetchers to manually assign static partitions, use subscribe() registered with the callback implementation explained above. > 2. Restoring from checkpointed states > - Snapshotting should remain unchanged > - Restoring requires subtasks to coordinate the restored offsets they hold before continuing (unless we are able to have merged restore states). > 3. For the previous consumer functionality (consuming from a fixed list of topics), KafkaConsumer#subscribe() has a corresponding overload for a fixed list of topics. We can simply decide which subscribe() overload to use depending on whether a regex Pattern or a list of topics is supplied. > 4. If subtasks don't initially have any assigned partitions, we shouldn't emit a MAX_VALUE watermark, since they may hold partitions after a rebalance. Instead, unassigned subtasks should run a fetcher instance too and take part as a process pool for the consumer group of the subscribed topics.
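The offset hand-off around a rebalance described above can be sketched as follows. A plain Map stands in for the external offset store and Strings stand in for Kafka's TopicPartition, so this illustrates the coordination scheme only, not the connector's actual code:

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the offset hand-off: before a rebalance, a subtask commits
// the offsets of the partitions it currently owns to a shared external
// store; after the rebalance, whichever subtask now owns a partition
// resumes from the stored offset. Names here are illustrative.
public class OffsetHandoff {
    private final Map<String, Long> externalStore; // shared across subtasks
    private final Map<String, Long> currentOffsets = new ConcurrentHashMap<>();

    public OffsetHandoff(Map<String, Long> externalStore) {
        this.externalStore = externalStore;
    }

    // Track progress while reading (mirrors the fetcher's bookkeeping).
    public void recordProgress(String partition, long offset) {
        currentOffsets.put(partition, offset);
    }

    // Invoked before a rebalance (cf. ConsumerRebalanceListener#onPartitionsRevoked):
    // commit the offsets of partitions this subtask is responsible for.
    public void onPartitionsRevoked(Collection<String> partitions) {
        for (String p : partitions) {
            Long off = currentOffsets.remove(p);
            if (off != null) {
                externalStore.put(p, off);
            }
        }
    }

    // Invoked after a rebalance: partitions without an entry in the store
    // are new partitions (the ones that caused the rebalancing).
    public long startingOffsetFor(String partition) {
        return externalStore.getOrDefault(partition, 0L);
    }
}
```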
[jira] [Created] (FLINK-5002) Lack of synchronization in LocalBufferPool#getNumberOfUsedBuffers
Ted Yu created FLINK-5002: - Summary: Lack of synchronization in LocalBufferPool#getNumberOfUsedBuffers Key: FLINK-5002 URL: https://issues.apache.org/jira/browse/FLINK-5002 Project: Flink Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} public int getNumberOfUsedBuffers() { return numberOfRequestedMemorySegments - availableMemorySegments.size(); } {code} Access to availableMemorySegments should be protected with proper synchronization as other methods do.
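A minimal sketch of the suggested fix, guarding the read with the same monitor the pool's mutating methods would use. The surrounding class is heavily simplified; only the field names follow the snippet above:

```java
import java.util.ArrayDeque;

// Sketch: synchronize on availableMemorySegments so the subtraction in
// getNumberOfUsedBuffers() sees a consistent pair of values, matching
// the locking discipline of the pool's other methods.
public class LocalBufferPoolSketch {
    private final ArrayDeque<Object> availableMemorySegments = new ArrayDeque<>();
    private int numberOfRequestedMemorySegments;

    public int getNumberOfUsedBuffers() {
        synchronized (availableMemorySegments) { // the fix proposed in the ticket
            return numberOfRequestedMemorySegments - availableMemorySegments.size();
        }
    }

    public Object requestBuffer() {
        synchronized (availableMemorySegments) {
            numberOfRequestedMemorySegments++;
            Object seg = availableMemorySegments.poll();
            return seg != null ? seg : new Object(); // simplified allocation
        }
    }

    public void recycle(Object segment) {
        synchronized (availableMemorySegments) {
            availableMemorySegments.add(segment);
        }
    }
}
```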
[jira] [Commented] (FLINK-4992) Expose String parameter for timers in Timely functions and TimerService
[ https://issues.apache.org/jira/browse/FLINK-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630678#comment-15630678 ] Gyula Fora commented on FLINK-4992: --- Well, it doesn't have to be Strings but there should be a way of attaching some sort of metadata to the registered timer (like the namespace internally). Can you please explain what you meant by different timer services? Do you mean attaching metadata to the timer service instead of the actual registered timer? > Expose String parameter for timers in Timely functions and TimerService > --- > > Key: FLINK-4992 > URL: https://issues.apache.org/jira/browse/FLINK-4992 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.2.0 >Reporter: Gyula Fora >Priority: Minor > > Currently it is very hard to register and execute multiple different types of timers from the same user function because timers don't carry any metadata. > We propose to extend the timer registration and onTimer logic by attaching a String argument so users of these features can implement functionality that depends on this additional metadata. > The proposed new methods: > In the TimerService: > void registerProcessingTimeTimer(long time, String label); > void registerEventTimeTimer(long time, String label); > In the TimelyFunctions: > void onTimer(long timestamp, String label, TimeDomain timeDomain, TimerService timerService...); > This extended functionality can be mapped to a String namespace for the internal timer service. I suggest we don't use the term "namespace" here because it just complicates things for the users; I think "label" or "id" or "name" is much simpler to understand.
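The label-carrying timers in the proposal could be backed by something like the following sketch, where the label is stored alongside the timestamp much as the internal namespace is today. Names are illustrative, not Flink's actual InternalTimerService:

```java
import java.util.PriorityQueue;

// Sketch of a timer service whose timers carry a user-supplied String
// label, so onTimer can distinguish multiple timer types registered
// from the same function.
public class LabeledTimerService {
    // A registered timer: fire time plus user-supplied metadata.
    public static final class Timer implements Comparable<Timer> {
        public final long time;
        public final String label;
        Timer(long time, String label) { this.time = time; this.label = label; }
        @Override public int compareTo(Timer o) { return Long.compare(time, o.time); }
    }

    private final PriorityQueue<Timer> queue = new PriorityQueue<>(); // ordered by fire time

    public void registerProcessingTimeTimer(long time, String label) {
        queue.add(new Timer(time, label));
    }

    // Returns the next timer due at or before `now`, or null if none is due;
    // the caller would pass timer.label through to onTimer.
    public Timer advanceTo(long now) {
        Timer head = queue.peek();
        if (head != null && head.time <= now) {
            return queue.poll();
        }
        return null;
    }
}
```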
[jira] [Commented] (FLINK-4983) Web UI: Add favicon
[ https://issues.apache.org/jira/browse/FLINK-4983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630674#comment-15630674 ] ASF GitHub Bot commented on FLINK-4983: --- Github user siliconcat commented on the issue: https://github.com/apache/flink/pull/2737 Ok, what about this? Black and white for the favicon in browsers. In colour for the mobile screens. Generated using the favicon generator above... Couldn't test the iphone, as I don't have one. > Web UI: Add favicon > --- > > Key: FLINK-4983 > URL: https://issues.apache.org/jira/browse/FLINK-4983 > Project: Flink > Issue Type: Improvement > Components: Webfrontend >Reporter: Mischa Krüger >Priority: Trivial > > Makes the tab easier to find when having multiple tabs open :)
[GitHub] flink issue #2737: [FLINK-4983] Web UI: Add favicon
Github user siliconcat commented on the issue: https://github.com/apache/flink/pull/2737 Ok, what about this? Black and white for the favicon in browsers. In colour for the mobile screens. Generated using the favicon generator above... Couldn't test the iphone, as I don't have one.
[jira] [Commented] (FLINK-4990) Remove JAR option from savepoint disposal
[ https://issues.apache.org/jira/browse/FLINK-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630637#comment-15630637 ] Stephan Ewen commented on FLINK-4990: - +1 > Remove JAR option from savepoint disposal > - > > Key: FLINK-4990 > URL: https://issues.apache.org/jira/browse/FLINK-4990 > Project: Flink > Issue Type: Improvement > Components: Startup Shell Scripts >Reporter: Ufuk Celebi >Assignee: Ufuk Celebi > Fix For: 1.2.0 > > > For 1.1 we needed to have the user JAR present to dispose savepoints. With the recent state refactorings this is not necessary anymore and we should deprecate/remove this option for 1.2. In general, disposal of savepoints might not be the business of Flink, but that's a question for a different release. > [~StephanEwen] do you concur?
[jira] [Commented] (FLINK-4992) Expose String parameter for timers in Timely functions and TimerService
[ https://issues.apache.org/jira/browse/FLINK-4992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630631#comment-15630631 ] Stephan Ewen commented on FLINK-4992: - Would having different timer services solve this? The side of me that cares about performance and efficiency goes into a shock when thinking about attaching strings to every element ;-)
[jira] [Commented] (FLINK-4999) Improve accuracy of SingleInputGate#getNumberOfQueuedBuffers
[ https://issues.apache.org/jira/browse/FLINK-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630624#comment-15630624 ] Stephan Ewen commented on FLINK-4999: - I am skeptical about locks acquired by metrics that block the data execution paths. The philosophy for metrics should be to never interfere with the processing threads - certainly not block them. > Improve accuracy of SingleInputGate#getNumberOfQueuedBuffers > > > Key: FLINK-4999 > URL: https://issues.apache.org/jira/browse/FLINK-4999 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Priority: Minor > > {code} > // re-try 3 times, if fails, return 0 for "unknown" > for (int retry = 0; retry < 3; retry++) { > ... > catch (Exception ignored) {} > {code} > There is no synchronization around accessing inputChannels currently. > Therefore the method expects potential exceptions. > Upon the 3rd try, synchronization should be taken w.r.t. inputChannels so > that the return value is accurate.
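One lock-free alternative consistent with that philosophy is to maintain the count with atomic counters updated on the hot path, so the metrics thread never contends for a data-path lock. A hedged sketch with illustrative names, not Flink's actual SingleInputGate:

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch: track queued buffers with LongAdder counters bumped by the
// processing threads; the metrics gauge computes a (possibly slightly
// stale) difference without acquiring any lock, so it can never block
// the data execution path.
public class QueuedBuffersMetric {
    private final LongAdder enqueued = new LongAdder();
    private final LongAdder dequeued = new LongAdder();

    public void onBufferQueued()   { enqueued.increment(); }
    public void onBufferConsumed() { dequeued.increment(); }

    // Called by the metrics thread: approximate, but non-blocking.
    public long getNumberOfQueuedBuffers() {
        return Math.max(0L, enqueued.sum() - dequeued.sum());
    }
}
```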
[jira] [Updated] (FLINK-5001) Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
[ https://issues.apache.org/jira/browse/FLINK-5001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-5001: Priority: Blocker (was: Critical) > Ensure that the Kafka 0.9+ connector is compatible with > kafka-consumer-groups.sh > > > Key: FLINK-5001 > URL: https://issues.apache.org/jira/browse/FLINK-5001 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Affects Versions: 1.1.0, 1.2.0 >Reporter: Robert Metzger >Priority: Blocker > > Similarly to FLINK-4822, the offsets committed by Flink's Kafka 0.9+ consumer > are not available through the {{kafka-consumer-groups.sh}} tool.
[jira] [Resolved] (FLINK-4315) Deprecate Hadoop dependent methods in flink-java
[ https://issues.apache.org/jira/browse/FLINK-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-4315. -- Resolution: Done Fix Version/s: (was: 2.0.0) 1.2.0 Done for 1.2.0 with 7d61e1f2fd0c9b0e3719b2d7252a164cfdf941c4 Thanks for the contribution [~kenmy]! > Deprecate Hadoop dependent methods in flink-java > > > Key: FLINK-4315 > URL: https://issues.apache.org/jira/browse/FLINK-4315 > Project: Flink > Issue Type: Task > Components: Java API >Reporter: Stephan Ewen >Assignee: Evgeny Kincharov > Fix For: 1.2.0 > > > The API projects should be independent of Hadoop, because Hadoop is not an integral part of the Flink stack, and we should have the option to offer Flink without Hadoop dependencies. > The current batch APIs have a hard dependency on Hadoop, mainly because the API has utility methods like `readHadoopFile(...)`. > I suggest to deprecate those methods and add helpers in the `flink-hadoop-compatibility` project. > FLINK-4048 will later remove the deprecated methods.
[jira] [Closed] (FLINK-4743) The sqrt/power function not accept the real data types.
[ https://issues.apache.org/jira/browse/FLINK-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-4743. Resolution: Fixed Fix Version/s: 1.2.0 Fixed for 1.2.0 with 4565170088595838ec53f3ca9b898126c62abbbc > The sqrt/power function not accept the real data types. > --- > > Key: FLINK-4743 > URL: https://issues.apache.org/jira/browse/FLINK-4743 > Project: Flink > Issue Type: Bug > Components: Table API & SQL >Affects Versions: 1.1.1 >Reporter: Anton Mushin >Assignee: Anton Solovev > Fix For: 1.2.0 > > > When calculating SQL aggregate functions over a column of a real (floating-point) type, the following exception is thrown: No applicable constructor/method found for actual parameters "float, java.math.BigDecimal". For a column of an integer type the problem does not occur. > Code to reproduce: > {code} > @Test > def test(): Unit = { > val env = ExecutionEnvironment.getExecutionEnvironment > val tEnv = TableEnvironment.getTableEnvironment(env, config) > val ds = env.fromElements( > (1.0f, 1), > (2.0f, 2)).toTable(tEnv) > tEnv.registerTable("MyTable", ds) > val sqlQuery = "SELECT " + > "SQRT((SUM(a*a) - SUM(a)*SUM(a)/COUNT(a))/COUNT(a)) " + > "from (select _1 as a from MyTable)" > tEnv.sql(sqlQuery).toDataSet[Row].collect().foreach(x => print(s"$x ")) > } > {code} > Exception thrown: > {noformat} > org.apache.flink.api.common.InvalidProgramException: Table program cannot be > compiled. This is a bug. Please file an issue. 
> at > org.apache.flink.api.table.runtime.FunctionCompiler$class.compile(FunctionCompiler.scala:37) > at > org.apache.flink.api.table.runtime.FlatMapRunner.compile(FlatMapRunner.scala:28) > at > org.apache.flink.api.table.runtime.FlatMapRunner.open(FlatMapRunner.scala:42) > at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38) > at > org.apache.flink.api.common.operators.base.FlatMapOperatorBase.executeOnCollections(FlatMapOperatorBase.java:62) > at > org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:249) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:147) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129) > at > org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:180) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:156) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:113) > at > org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:35) > at > org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:47) > at > org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:42) > at > org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:637) > at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547) > at > org.apache.flink.api.scala.batch.sql.AggregationsITCase.test(AggregationsITCase.scala:307) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at
[jira] [Closed] (FLINK-4623) Create Physical Execution Plan of a DataStream
[ https://issues.apache.org/jira/browse/FLINK-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-4623. Resolution: Fixed Fix Version/s: 1.2.0 Fixed for 1.2.0 with d60fe723aa357733c6ad8715b0e8c4e55ab7f52d Thanks for the contribution [~tonycox]! > Create Physical Execution Plan of a DataStream > -- > > Key: FLINK-4623 > URL: https://issues.apache.org/jira/browse/FLINK-4623 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Anton Solovev > Labels: starter > Fix For: 1.2.0 > > > The {{StreamTableEnvironment#explain(Table)}} command for tables of a {{StreamTableEnvironment}} only outputs the abstract syntax tree. It would be helpful if the {{explain}} method could also generate a string from the {{DataStream}} containing a physical execution plan.
[jira] [Closed] (FLINK-4943) flink-mesos/ConfigConstants: Typo: YYARN -> YARN
[ https://issues.apache.org/jira/browse/FLINK-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-4943. Resolution: Fixed Assignee: Mischa Krüger Fix Version/s: 1.2.0 Fixed for 1.2.0 with ed6a602b34d185c1482b60b06ff585d08dab308b > flink-mesos/ConfigConstants: Typo: YYARN -> YARN > > > Key: FLINK-4943 > URL: https://issues.apache.org/jira/browse/FLINK-4943 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Mischa Krüger >Assignee: Mischa Krüger >Priority: Trivial > Fix For: 1.2.0
[jira] [Closed] (FLINK-4996) Make CrossHint @Public
[ https://issues.apache.org/jira/browse/FLINK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-4996. Resolution: Fixed Fix Version/s: 1.1.4 Fixed for 1.2.0 with 6346a89972416489bc43ee30946078341496d1e1 Fixed for 1.1.4 with 6e57e7f463873cc9a36c71e3edde91a724bd48a6 > Make CrossHint @Public > -- > > Key: FLINK-4996 > URL: https://issues.apache.org/jira/browse/FLINK-4996 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.2.0, 1.1.4 > > > {{CrossHint}} should be annotated {{@Public}} as is {{JoinHint}}. It is currently marked {{@Internal}} by its enclosing class {{CrossOperatorBase}}.
[jira] [Commented] (FLINK-4943) flink-mesos/ConfigConstants: Typo: YYARN -> YARN
[ https://issues.apache.org/jira/browse/FLINK-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630434#comment-15630434 ] ASF GitHub Bot commented on FLINK-4943: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2704
[jira] [Commented] (FLINK-4996) Make CrossHint @Public
[ https://issues.apache.org/jira/browse/FLINK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630430#comment-15630430 ] ASF GitHub Bot commented on FLINK-4996: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2743 > Make CrossHint @Public > -- > > Key: FLINK-4996 > URL: https://issues.apache.org/jira/browse/FLINK-4996 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.2.0 > > > {{CrossHint}} should be annotated {{@Public}} as is {{JoinHint}}. It is > currently marked {{@Internal}} by its enclosing class {{CrossOperatorBase}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2637: [FLINK-4315] Deprecate Hadoop dependent methods in...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2637 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-4743) The sqrt/power function not accept the real data types.
[ https://issues.apache.org/jira/browse/FLINK-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630431#comment-15630431 ] ASF GitHub Bot commented on FLINK-4743: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2686 > The sqrt/power function not accept the real data types. > --- > > Key: FLINK-4743 > URL: https://issues.apache.org/jira/browse/FLINK-4743 > Project: Flink > Issue Type: Bug > Components: Table API & SQL >Affects Versions: 1.1.1 >Reporter: Anton Mushin >Assignee: Anton Solovev > > When calculating SQL aggregate functions over a REAL (float) column, the > following exception is thrown: No applicable constructor/method found for actual parameters > "float, java.math.BigDecimal" > For a column of integer type the problem does not occur. > Code to reproduce: > {code} > @Test > def test():Unit={ > val env = ExecutionEnvironment.getExecutionEnvironment > val tEnv = TableEnvironment.getTableEnvironment(env, config) > val ds = env.fromElements( > (1.0f, 1), > (2.0f, 2)).toTable(tEnv) > tEnv.registerTable("MyTable", ds) > val sqlQuery = "SELECT " + > "SQRT((SUM(a*a) - SUM(a)*SUM(a)/COUNT(a))/COUNT(a)) "+ > "from (select _1 as a from MyTable)" > tEnv.sql(sqlQuery).toDataSet[Row].collect().foreach(x=>print(s"$x ")) > } > {code} > Exception: > {noformat} > org.apache.flink.api.common.InvalidProgramException: Table program cannot be > compiled. This is a bug. Please file an issue. 
> at > org.apache.flink.api.table.runtime.FunctionCompiler$class.compile(FunctionCompiler.scala:37) > at > org.apache.flink.api.table.runtime.FlatMapRunner.compile(FlatMapRunner.scala:28) > at > org.apache.flink.api.table.runtime.FlatMapRunner.open(FlatMapRunner.scala:42) > at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38) > at > org.apache.flink.api.common.operators.base.FlatMapOperatorBase.executeOnCollections(FlatMapOperatorBase.java:62) > at > org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:249) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:147) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129) > at > org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:180) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:156) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:113) > at > org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:35) > at > org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:47) > at > org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:42) > at > org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:637) > at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547) > at > org.apache.flink.api.scala.batch.sql.AggregationsITCase.test(AggregationsITCase.scala:307) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at
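The "No applicable constructor/method found for actual parameters float, java.math.BigDecimal" failure comes down to the fact that Java has no implicit conversion between `float` and `BigDecimal`, so generated arithmetic that mixes the two must convert explicitly. A minimal illustration of the type mismatch (not the actual generated code):

```java
import java.math.BigDecimal;

public class MixedArithmeticDemo {
    // Java provides no operator or implicit conversion for (float, BigDecimal),
    // so code mixing the two types must convert one side explicitly.
    public static double divide(float numerator, BigDecimal denominator) {
        // double d = numerator / denominator;   // would not compile
        return numerator / denominator.doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(divide(3.0f, BigDecimal.valueOf(2))); // prints 1.5
    }
}
```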
[jira] [Commented] (FLINK-4315) Deprecate Hadoop dependent methods in flink-java
[ https://issues.apache.org/jira/browse/FLINK-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630433#comment-15630433 ] ASF GitHub Bot commented on FLINK-4315: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2637 > Deprecate Hadoop dependent methods in flink-java > > > Key: FLINK-4315 > URL: https://issues.apache.org/jira/browse/FLINK-4315 > Project: Flink > Issue Type: Task > Components: Java API >Reporter: Stephan Ewen >Assignee: Evgeny Kincharov > Fix For: 2.0.0 > > > The API projects should be independent of Hadoop, because Hadoop is not an > integral part of the Flink stack, and we should have the option to offer > Flink without Hadoop dependencies. > The current batch APIs have a hard dependency on Hadoop, mainly because the > API has utility methods like `readHadoopFile(...)`. > I suggest to deprecate those methods and add helpers in the > `flink-hadoop-compatibility` project. > FLINK-4048 will later remove the deprecated methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
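The migration path described here (deprecate the API method, offer an equivalent helper in `flink-hadoop-compatibility`, remove the deprecated method later in FLINK-4048) follows the usual deprecate-and-delegate pattern. The class and method shapes below are a hypothetical sketch, not the real Flink classes:

```java
// Hypothetical sketch of the deprecate-and-delegate pattern described above;
// names are illustrative, not the real Flink API.
public class DeprecationSketch {

    /** Stand-in for a helper that would live in flink-hadoop-compatibility. */
    static class HadoopInputs {
        static String readHadoopFile(String path) {
            return "hadoop-input:" + path;
        }
    }

    /** Stand-in for the old API method; behavior stays identical so callers
     *  can migrate at their own pace before the method is removed. */
    @Deprecated
    public static String readHadoopFile(String path) {
        return HadoopInputs.readHadoopFile(path);
    }

    public static void main(String[] args) {
        System.out.println(readHadoopFile("/tmp/data")); // prints hadoop-input:/tmp/data
    }
}
```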
[jira] [Commented] (FLINK-4623) Create Physical Execution Plan of a DataStream
[ https://issues.apache.org/jira/browse/FLINK-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630432#comment-15630432 ] ASF GitHub Bot commented on FLINK-4623: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2720 > Create Physical Execution Plan of a DataStream > -- > > Key: FLINK-4623 > URL: https://issues.apache.org/jira/browse/FLINK-4623 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Anton Solovev > Labels: starter > > The {{StreamTableEnvironment#explain(Table)}} command for tables of a > {{StreamTableEnvironment}} only outputs the abstract syntax tree. It would be > helpful if the {{explain}} method could also generate a string from the > {{DataStream}} containing a physical execution plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2743: [FLINK-4996] [core] Make CrossHint @Public
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2743
[GitHub] flink pull request #2686: [FLINK-4743] The sqrt/power function not accept th...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2686
[GitHub] flink pull request #2704: [FLINK-4943] ConfigConstants: YYARN -> YARN
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2704
[GitHub] flink pull request #2720: [FLINK-4623] Create Physical Execution Plan of a D...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2720
[jira] [Created] (FLINK-5001) Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh
Robert Metzger created FLINK-5001: - Summary: Ensure that the Kafka 0.9+ connector is compatible with kafka-consumer-groups.sh Key: FLINK-5001 URL: https://issues.apache.org/jira/browse/FLINK-5001 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: 1.1.0, 1.2.0 Reporter: Robert Metzger Priority: Critical Similarly to FLINK-4822, the offsets committed by Flink's Kafka 0.9+ consumer are not available through the {{kafka-consumer-groups.sh}} tool. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (FLINK-4022) Partition discovery / regex topic subscription for the Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629801#comment-15629801 ] Tzu-Li (Gordon) Tai edited comment on FLINK-4022 at 11/2/16 5:48 PM: - Hi [~StephanEwen], this is what I recall from FLINK-4341: Right now, subtasks that do not have partitions assigned to them emit a max value watermark, because they will not read any data. However, with partition discovery enabled, subtasks which initially have no partitions might be assigned one later on and start reading data, and this will mess up the watermarks downstream. We also cannot let idle subtasks emit no watermarks at all while waiting for partitions to be assigned, because then downstream time window state would accumulate unboundedly (FLINK-4341). So, the watermark service came about as a means to let source subtasks that currently read no data (and thus cannot emit any watermarks for event time) emit the global low watermark across subtasks instead. was (Author: tzulitai): Hi [~StephanEwen], this is what I recall from FLINK-4341: Right now, subtasks that do not have partitions assigned to them emit a max value watermark, because they will not read any data. However, with partition discovery enabled, subtasks which initially have no partitions might be assigned one later on and start reading data, and this will mess up the watermarks downstream. We also cannot let idle subtasks emit no watermarks at all while waiting for partitions to be assigned, because then downstream time window state would accumulate unboundedly (FLINK-4341). So, the watermark service came about as a means to let source subtasks that currently read no data (and thus cannot emit any watermarks for event time) emit the global low watermark across subtasks instead. For this I think we still need the watermark service. 
> Partition discovery / regex topic subscription for the Kafka consumer > - > > Key: FLINK-4022 > URL: https://issues.apache.org/jira/browse/FLINK-4022 > Project: Flink > Issue Type: New Feature > Components: Kafka Connector, Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.2.0 > > > Example: allow users to subscribe to "topic-n*", so that the consumer > automatically reads from "topic-n1", "topic-n2", ... and so on as they are > added to Kafka. > I propose to implement this feature as follows: > Since the overall list of partitions to read will change after job > submission, the main change required for this feature will be dynamic > partition assignment to subtasks while the Kafka consumer is running. This > will mainly be accomplished using the Kafka 0.9.x API > `KafkaConsumer#subscribe(java.util.regex.Pattern, > ConsumerRebalanceListener)`. The KafkaConsumer in each subtask will be > added to the same consumer group when instantiated, and rely on Kafka to > dynamically reassign partitions to them whenever a rebalance happens. The > registered `ConsumerRebalanceListener` is a callback that is called right > before and after rebalancing happens. We'll use this callback to let each > subtask commit the last offsets of the partitions it is currently responsible for to > an external store (or Kafka) before a rebalance; after the rebalance, once the > subtasks get the new partitions they'll be reading from, they'll read from > the external store to get the last offsets for their new partitions > (partitions which don't have offset entries in the store are new partitions > causing the rebalancing). > The tricky part will be restoring Flink checkpoints when the partition > assignment is dynamic. Snapshotting will remain the same - subtasks snapshot > the offsets of partitions they are currently holding. Restoring will be a > bit different in that subtasks might not be assigned partitions matching > the snapshot the subtask is restored with (since we're letting Kafka > dynamically assign partitions). There will need to be a coordination process > where, if a restore state exists, all subtasks first commit the offsets they > receive (as a result of the restore state) to the external store, and then > all subtasks attempt to find a last offset for the partitions they are holding. > However, if the globally merged restore state feature mentioned by > [~StephanEwen] in https://issues.apache.org/jira/browse/FLINK-3231 is > available, then the restore will be simple again, as each subtask has full > access to the previous global state, so coordination is not required. > I think changing to dynamic partition assignment is also good in the long run > for
[jira] [Commented] (FLINK-4022) Partition discovery / regex topic subscription for the Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629801#comment-15629801 ] Tzu-Li (Gordon) Tai commented on FLINK-4022: Hi [~StephanEwen], Right now, subtasks that do not have partitions assigned to them emit a max value watermark, because they will not read any data. However, with partition discovery enabled, subtasks which initially have no partitions might be assigned one later on and start reading data, and this will mess up the watermarks downstream. We also cannot let idle subtasks emit no watermarks at all while waiting for partitions to be assigned, because then downstream time window state would accumulate unboundedly (FLINK-4341). So, the watermark service came about as a means to let source subtasks that currently read no data (and thus cannot emit any watermarks for event time) emit the global low watermark across subtasks instead. For this I think we still need the watermark service. > Partition discovery / regex topic subscription for the Kafka consumer > - > > Key: FLINK-4022 > URL: https://issues.apache.org/jira/browse/FLINK-4022 > Project: Flink > Issue Type: New Feature > Components: Kafka Connector, Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.2.0 > > > Example: allow users to subscribe to "topic-n*", so that the consumer > automatically reads from "topic-n1", "topic-n2", ... and so on as they are > added to Kafka. > I propose to implement this feature as follows: > Since the overall list of partitions to read will change after job > submission, the main change required for this feature will be dynamic > partition assignment to subtasks while the Kafka consumer is running. This > will mainly be accomplished using the Kafka 0.9.x API > `KafkaConsumer#subscribe(java.util.regex.Pattern, > ConsumerRebalanceListener)`. The KafkaConsumer in each subtask will be > added to the same consumer group when instantiated, and rely on Kafka to > dynamically reassign partitions to them whenever a rebalance happens. The > registered `ConsumerRebalanceListener` is a callback that is called right > before and after rebalancing happens. We'll use this callback to let each > subtask commit the last offsets of the partitions it is currently responsible for to > an external store (or Kafka) before a rebalance; after the rebalance, once the > subtasks get the new partitions they'll be reading from, they'll read from > the external store to get the last offsets for their new partitions > (partitions which don't have offset entries in the store are new partitions > causing the rebalancing). > The tricky part will be restoring Flink checkpoints when the partition > assignment is dynamic. Snapshotting will remain the same - subtasks snapshot > the offsets of partitions they are currently holding. Restoring will be a > bit different in that subtasks might not be assigned partitions matching > the snapshot the subtask is restored with (since we're letting Kafka > dynamically assign partitions). There will need to be a coordination process > where, if a restore state exists, all subtasks first commit the offsets they > receive (as a result of the restore state) to the external store, and then > all subtasks attempt to find a last offset for the partitions they are holding. > However, if the globally merged restore state feature mentioned by > [~StephanEwen] in https://issues.apache.org/jira/browse/FLINK-3231 is > available, then the restore will be simple again, as each subtask has full > access to the previous global state, so coordination is not required. > I think changing to dynamic partition assignment is also good in the long run > for handling topic repartitioning. > Overall, > User-facing API changes: > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > DeserializationSchema, Properties) > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > KeyedDeserializationSchema, Properties) > Implementation changes: > 1. Dynamic partition assignment depending on KafkaConsumer#subscribe > - Remove partition list querying from the constructor > - Remove static partition assignment to subtasks in run() > - Instead of using KafkaConsumer#assign() in fetchers to manually assign > static partitions, use subscribe() registered with the callback > implementation explained above. > 2. Restoring from checkpointed states > - Snapshotting should remain unchanged > - Restoring requires subtasks to coordinate the restored offsets they hold > before continuing (unless we are able to have merged restore states). > 3. For previous consumer functionality
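The commit-before-revoke / look-up-after-assign handshake described in the proposal can be modeled without the Kafka client at all. In the sketch below, the `Map`-backed store and the `partitionsRevoked`/`partitionsAssigned` method names are illustrative stand-ins for the external store and Kafka's `ConsumerRebalanceListener` callbacks, not the real API:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Models the handshake from the proposal: before a rebalance, each subtask
// commits its current offsets to an external store; after the rebalance, it
// looks up the last offsets for the partitions it now owns (partitions with
// no entry are new). Names are illustrative, not Kafka's actual API.
public class RebalanceOffsetHandoff {
    private final Map<String, Long> externalStore;            // external store stand-in
    private final Map<String, Long> currentOffsets = new HashMap<>();

    public RebalanceOffsetHandoff(Map<String, Long> externalStore) {
        this.externalStore = externalStore;
    }

    public void record(String partition, long offset) {
        currentOffsets.put(partition, offset);
    }

    /** Called right before a rebalance: commit what we hold. */
    public void partitionsRevoked() {
        externalStore.putAll(currentOffsets);
        currentOffsets.clear();
    }

    /** Called after a rebalance: resume from stored offsets, or 0 for new partitions. */
    public void partitionsAssigned(List<String> partitions) {
        for (String p : partitions) {
            currentOffsets.put(p, externalStore.getOrDefault(p, 0L));
        }
    }

    public long offsetOf(String partition) {
        return currentOffsets.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        Map<String, Long> store = new HashMap<>();
        RebalanceOffsetHandoff subtaskA = new RebalanceOffsetHandoff(store);
        subtaskA.record("topic-n1-0", 42L);
        subtaskA.partitionsRevoked();                         // commit before rebalance
        RebalanceOffsetHandoff subtaskB = new RebalanceOffsetHandoff(store);
        subtaskB.partitionsAssigned(Arrays.asList("topic-n1-0", "topic-n2-0"));
        System.out.println(subtaskB.offsetOf("topic-n1-0"));  // resumes at 42
        System.out.println(subtaskB.offsetOf("topic-n2-0"));  // new partition: 0
    }
}
```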
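The watermark-service idea from the comment above (an idle source subtask emits the global minimum watermark across subtasks, rather than a max value watermark or nothing at all) can be sketched as follows. This is an illustration of the idea only, not Flink's actual watermark service:

```java
import java.util.Arrays;

// Sketch of the idea in the comment: a subtask reading no partitions must
// not emit Long.MAX_VALUE (it may get a partition later, messing up event
// time downstream) nor stay silent (downstream window state would accumulate
// unboundedly); instead it emits the global low watermark across subtasks.
public class GlobalLowWatermark {

    /**
     * @param localWatermarks current watermark per subtask; Long.MIN_VALUE
     *                        marks a subtask that currently reads no partitions.
     */
    public static long watermarkToEmit(int subtask, long[] localWatermarks) {
        if (localWatermarks[subtask] != Long.MIN_VALUE) {
            return localWatermarks[subtask];   // active subtask: its own watermark
        }
        // Idle subtask: emit the minimum over active subtasks, so downstream
        // event time never advances past what active sources have seen.
        return Arrays.stream(localWatermarks)
                .filter(w -> w != Long.MIN_VALUE)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        long[] wm = {100L, Long.MIN_VALUE, 250L};
        System.out.println(watermarkToEmit(1, wm)); // idle subtask 1 emits 100
    }
}
```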
[jira] [Created] (FLINK-5000) Rename Methods in ManagedInitializationContext
Aljoscha Krettek created FLINK-5000: --- Summary: Rename Methods in ManagedInitializationContext Key: FLINK-5000 URL: https://issues.apache.org/jira/browse/FLINK-5000 Project: Flink Issue Type: Improvement Components: Streaming Affects Versions: 1.2.0 Reporter: Aljoscha Krettek Priority: Blocker We should rename {{getManagedOperatorStateStore()}} to {{getOperatorStateStore()}} and {{getManagedKeyedStateStore()}} to {{getKeyedStateStore()}}. There are no unmanaged stores and having that extra word there seems a bit confusing, plus it makes the names longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5000) Rename Methods in ManagedInitializationContext
[ https://issues.apache.org/jira/browse/FLINK-5000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629745#comment-15629745 ] Aljoscha Krettek commented on FLINK-5000: - CC: [~srichter] > Rename Methods in ManagedInitializationContext > -- > > Key: FLINK-5000 > URL: https://issues.apache.org/jira/browse/FLINK-5000 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.2.0 >Reporter: Aljoscha Krettek >Priority: Blocker > > We should rename {{getManagedOperatorStateStore()}} to > {{getOperatorStateStore()}} and {{getManagedKeyedStateStore()}} to > {{getKeyedStateStore()}}. There are no unmanaged stores and having that extra > word there seems a bit confusing, plus it makes the names longer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4996) Make CrossHint @Public
[ https://issues.apache.org/jira/browse/FLINK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629658#comment-15629658 ] ASF GitHub Bot commented on FLINK-4996: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2743 Merging > Make CrossHint @Public > -- > > Key: FLINK-4996 > URL: https://issues.apache.org/jira/browse/FLINK-4996 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.2.0 > > > {{CrossHint}} should be annotated {{@Public}} as is {{JoinHint}}. It is > currently marked {{@Internal}} by its enclosing class {{CrossOperatorBase}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink issue #2704: [FLINK-4943] ConfigConstants: YYARN -> YARN
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2704 Thanks for the fix @Makman2! Merging --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-4623) Create Physical Execution Plan of a DataStream
[ https://issues.apache.org/jira/browse/FLINK-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629676#comment-15629676 ] ASF GitHub Bot commented on FLINK-4623: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2720 Merging :-) > Create Physical Execution Plan of a DataStream > -- > > Key: FLINK-4623 > URL: https://issues.apache.org/jira/browse/FLINK-4623 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Anton Solovev > Labels: starter > > The {{StreamTableEnvironment#explain(Table)}} command for tables of a > {{StreamTableEnvironment}} only outputs the abstract syntax tree. It would be > helpful if the {{explain}} method could also generate a string from the > {{DataStream}} containing a physical execution plan.
[GitHub] flink issue #2720: [FLINK-4623] Create Physical Execution Plan of a DataStre...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2720 Merging :-)
[GitHub] flink issue #2686: [FLINK-4743] The sqrt/power function not accept the real ...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2686 Merging
[jira] [Commented] (FLINK-4943) flink-mesos/ConfigConstants: Typo: YYARN -> YARN
[ https://issues.apache.org/jira/browse/FLINK-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629675#comment-15629675 ] ASF GitHub Bot commented on FLINK-4943: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2704 Thanks for the fix @Makman2! Merging > flink-mesos/ConfigConstants: Typo: YYARN -> YARN > > > Key: FLINK-4943 > URL: https://issues.apache.org/jira/browse/FLINK-4943 > Project: Flink > Issue Type: Improvement > Components: Documentation >Reporter: Mischa Krüger >Priority: Trivial
[jira] [Commented] (FLINK-4743) The sqrt/power function not accept the real data types.
[ https://issues.apache.org/jira/browse/FLINK-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629680#comment-15629680 ] ASF GitHub Bot commented on FLINK-4743: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2686 Merging
> The sqrt/power function does not accept the real data types.
> ---
>
> Key: FLINK-4743
> URL: https://issues.apache.org/jira/browse/FLINK-4743
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Affects Versions: 1.1.1
> Reporter: Anton Mushin
> Assignee: Anton Solovev
>
> When computing SQL aggregate functions over a column of real (float) type, the following exception is thrown: No applicable constructor/method found for actual parameters "float, java.math.BigDecimal". For a column of integer type the problem does not occur.
> Code to reproduce:
> {code}
> @Test
> def test(): Unit = {
>   val env = ExecutionEnvironment.getExecutionEnvironment
>   val tEnv = TableEnvironment.getTableEnvironment(env, config)
>   val ds = env.fromElements(
>     (1.0f, 1),
>     (2.0f, 2)).toTable(tEnv)
>   tEnv.registerTable("MyTable", ds)
>   val sqlQuery = "SELECT " +
>     "SQRT((SUM(a*a) - SUM(a)*SUM(a)/COUNT(a))/COUNT(a)) " +
>     "from (select _1 as a from MyTable)"
>   tEnv.sql(sqlQuery).toDataSet[Row].collect().foreach(x => print(s"$x "))
> }
> {code}
> Resulting exception:
> {noformat}
> org.apache.flink.api.common.InvalidProgramException: Table program cannot be compiled. This is a bug. Please file an issue.
> at org.apache.flink.api.table.runtime.FunctionCompiler$class.compile(FunctionCompiler.scala:37)
> at org.apache.flink.api.table.runtime.FlatMapRunner.compile(FlatMapRunner.scala:28)
> at org.apache.flink.api.table.runtime.FlatMapRunner.open(FlatMapRunner.scala:42)
> at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
> at org.apache.flink.api.common.operators.base.FlatMapOperatorBase.executeOnCollections(FlatMapOperatorBase.java:62)
> at org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:249)
> at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:147)
> at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129)
> at org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:180)
> at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:156)
> at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129)
> at org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:113)
> at org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:35)
> at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:47)
> at org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:42)
> at org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:637)
> at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547)
> at org.apache.flink.api.scala.batch.sql.AggregationsITCase.test(AggregationsITCase.scala:307)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
> at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at
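The failure in FLINK-4743 is a codegen type-mixing problem (a `float` operand meeting a `java.math.BigDecimal`), not a problem with the math itself. As a plain-Java sanity check, the population standard deviation the query computes, `SQRT((SUM(a*a) - SUM(a)*SUM(a)/COUNT(a))/COUNT(a))`, is well-defined for float inputs once every operand is widened to a common numeric type. This sketch widens to `double`; it illustrates the arithmetic only and is not the Table API code generator's fix:

```java
// Sanity check of the math behind the failing query, with every float
// operand widened to double before the arithmetic.
public class StddevCheck {

    static double stddev(float[] values) {
        double sum = 0.0;
        double sumSq = 0.0;
        for (float v : values) {
            sum += v;                 // float widened to double
            sumSq += (double) v * v;  // widen before multiplying
        }
        int n = values.length;
        // SQRT((SUM(a*a) - SUM(a)*SUM(a)/COUNT(a))/COUNT(a))
        return Math.sqrt((sumSq - sum * sum / n) / n);
    }

    public static void main(String[] args) {
        // Same data as the reproducer: (1.0f, 2.0f)
        System.out.println(stddev(new float[] {1.0f, 2.0f})); // prints 0.5
    }
}
```

For the reproducer's two values 1.0 and 2.0: SUM(a*a) = 5, SUM(a) = 3, so (5 - 9/2)/2 = 0.25 and the result is 0.5.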
[GitHub] flink pull request #2658: [FLINK-4850] [ml] FlinkML - SVM predict Operation ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2658
[GitHub] flink issue #2743: [FLINK-4996] [core] Make CrossHint @Public
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2743 Merging
[jira] [Commented] (FLINK-4850) FlinkML - SVM predict Operation for Vector and not LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629666#comment-15629666 ] ASF GitHub Bot commented on FLINK-4850: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2658 > FlinkML - SVM predict Operation for Vector and not LabeledVector > > > Key: FLINK-4850 > URL: https://issues.apache.org/jira/browse/FLINK-4850 > Project: Flink > Issue Type: Bug >Reporter: Thomas FOURNIER >Assignee: Theodore Vasiloudis >Priority: Minor > > It seems that the evaluate operation is defined for Vector and not LabeledVector. > It impacts the QuickStart guide for FlinkML when using SVM. > We need to update the documentation as follows: > val astroTest:DataSet[(Vector,Double)] = MLUtils > .readLibSVM(env, "src/main/resources/svmguide1.t") > .map(l => (l.vector, l.label)) > val predictionPairs = svm.evaluate(astroTest)
[jira] [Resolved] (FLINK-4850) FlinkML - SVM predict Operation for Vector and not LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann resolved FLINK-4850. -- Resolution: Fixed Fixed via da991aebb038b13a2d34344cf456c32feb4222dd > FlinkML - SVM predict Operation for Vector and not LabeledVector > > > Key: FLINK-4850 > URL: https://issues.apache.org/jira/browse/FLINK-4850 > Project: Flink > Issue Type: Bug >Reporter: Thomas FOURNIER >Assignee: Theodore Vasiloudis >Priority: Minor > > It seems that the evaluate operation is defined for Vector and not LabeledVector. > It impacts the QuickStart guide for FlinkML when using SVM. > We need to update the documentation as follows: > val astroTest:DataSet[(Vector,Double)] = MLUtils > .readLibSVM(env, "src/main/resources/svmguide1.t") > .map(l => (l.vector, l.label)) > val predictionPairs = svm.evaluate(astroTest)
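The documentation fix quoted in FLINK-4850 boils down to one transformation: `evaluate` expects `(features, label)` pairs, so a data set of labeled points must first be mapped with `l => (l.vector, l.label)`. The shape of that map step can be sketched in plain Java, with `LabeledVector` and the pair type as stand-ins for the FlinkML types (this is an illustration, not the FlinkML API):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the documented fix: turn labeled points into (features, label)
// pairs before handing them to an evaluate-style operation.
public class EvaluatePairs {

    // Stand-in for FlinkML's LabeledVector (hypothetical, minimal).
    static class LabeledVector {
        final double[] vector;
        final double label;
        LabeledVector(double[] vector, double label) {
            this.vector = vector;
            this.label = label;
        }
    }

    // Mirrors the Scala map step `l => (l.vector, l.label)`.
    static List<AbstractMap.SimpleEntry<double[], Double>> toEvaluatePairs(List<LabeledVector> data) {
        List<AbstractMap.SimpleEntry<double[], Double>> out = new ArrayList<>();
        for (LabeledVector l : data) {
            out.add(new AbstractMap.SimpleEntry<>(l.vector, l.label));
        }
        return out;
    }

    public static void main(String[] args) {
        List<LabeledVector> test =
            Arrays.asList(new LabeledVector(new double[] {1.0, 2.0}, 1.0));
        System.out.println(toEvaluatePairs(test).get(0).getValue()); // prints 1.0
    }
}
```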
[jira] [Commented] (FLINK-4315) Deprecate Hadoop dependent methods in flink-java
[ https://issues.apache.org/jira/browse/FLINK-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629654#comment-15629654 ] ASF GitHub Bot commented on FLINK-4315: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2637 Merging > Deprecate Hadoop dependent methods in flink-java > > > Key: FLINK-4315 > URL: https://issues.apache.org/jira/browse/FLINK-4315 > Project: Flink > Issue Type: Task > Components: Java API >Reporter: Stephan Ewen >Assignee: Evgeny Kincharov > Fix For: 2.0.0 > > > The API projects should be independent of Hadoop, because Hadoop is not an > integral part of the Flink stack, and we should have the option to offer > Flink without Hadoop dependencies. > The current batch APIs have a hard dependency on Hadoop, mainly because the > API has utility methods like `readHadoopFile(...)`. > I suggest to deprecate those methods and add helpers in the > `flink-hadoop-compatibility` project. > FLINK-4048 will later remove the deprecated methods.
[GitHub] flink issue #2637: [FLINK-4315] Deprecate Hadoop dependent methods in flink-...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2637 Merging
[jira] [Commented] (FLINK-4850) FlinkML - SVM predict Operation for Vector and not LabeledVector
[ https://issues.apache.org/jira/browse/FLINK-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629649#comment-15629649 ] ASF GitHub Bot commented on FLINK-4850: --- Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2658 Thanks for your contribution @thvasilo. Changes look good. Will merge your PR. > FlinkML - SVM predict Operation for Vector and not LabeledVector > > > Key: FLINK-4850 > URL: https://issues.apache.org/jira/browse/FLINK-4850 > Project: Flink > Issue Type: Bug >Reporter: Thomas FOURNIER >Assignee: Theodore Vasiloudis >Priority: Minor > > It seems that the evaluate operation is defined for Vector and not LabeledVector. > It impacts the QuickStart guide for FlinkML when using SVM. > We need to update the documentation as follows: > val astroTest:DataSet[(Vector,Double)] = MLUtils > .readLibSVM(env, "src/main/resources/svmguide1.t") > .map(l => (l.vector, l.label)) > val predictionPairs = svm.evaluate(astroTest)
[GitHub] flink issue #2658: [FLINK-4850] [ml] FlinkML - SVM predict Operation for Vec...
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2658 Thanks for your contribution @thvasilo. Changes look good. Will merge your PR.
[GitHub] flink issue #2684: Add EvaluateDataSet Operation for LabeledVector - This cl...
Github user tillrohrmann commented on the issue: https://github.com/apache/flink/pull/2684 Thanks for your contribution @tfournier314. I'm wondering whether we should introduce a new `EvaluateDataSetOperation` for the special case of a `LabeledVector`. Can't you easily achieve the same by `val evaluateDS = labeledVectorDS.map(x => (x.vector, x.label))` and then using `evaluateDS` for the `evaluate` operation? In general I think it would be better to write a generic `EvaluateDataSetOperation` which can take a `ValueExtractor` which can extract the value from the `Testing` type. Then we could offer a `ValueExtractor` for the `LabeledVector`. That way we could extend it easily for different types as well. A general remark: It's always helpful to write a PR description and format the title according to the contribution guidelines [1]. Furthermore, your code does not adhere to the existing code style. Even though Flink does not have a strict code style, it is always good to stick to the code style of the existing code. [1] http://flink.apache.org/how-to-contribute.html
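The generic design suggested in the review above — one evaluate operation parameterized over a `ValueExtractor` that pulls the features and true value out of any `Testing` type — can be sketched as follows. All names here are illustrative stand-ins, not the actual FlinkML API:

```java
// Sketch of the extractor pattern from the review: one generic evaluate
// step, with one ValueExtractor per Testing type (names are hypothetical).
public class EvaluateOperationSketch {

    interface ValueExtractor<T, F, L> {
        F features(T testing);
        L trueValue(T testing);
    }

    // Stand-in for FlinkML's LabeledVector.
    static class LabeledVector {
        final double[] vector;
        final double label;
        LabeledVector(double[] vector, double label) {
            this.vector = vector;
            this.label = label;
        }
    }

    // Extractor for LabeledVector; mirrors `x => (x.vector, x.label)`.
    static final ValueExtractor<LabeledVector, double[], Double> LABELED_VECTOR =
        new ValueExtractor<LabeledVector, double[], Double>() {
            public double[] features(LabeledVector l) { return l.vector; }
            public Double trueValue(LabeledVector l) { return l.label; }
        };

    // A generic evaluate step needs only the extractor, never the concrete type,
    // so supporting a new Testing type means adding one extractor instance.
    static <T, F, L> L extractTrueValue(T testing, ValueExtractor<T, F, L> ex) {
        return ex.trueValue(testing);
    }

    public static void main(String[] args) {
        LabeledVector l = new LabeledVector(new double[] {1.0, 2.0}, 3.0);
        System.out.println(extractTrueValue(l, LABELED_VECTOR)); // prints 3.0
    }
}
```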
[GitHub] flink issue #2741: [FLINK-4998][yarn] fail if too many task slots are config...
Github user mxm commented on the issue: https://github.com/apache/flink/pull/2741 Added a test case to verify the error reporting.
[jira] [Commented] (FLINK-4998) ResourceManager fails when num task slots > Yarn vcores
[ https://issues.apache.org/jira/browse/FLINK-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629550#comment-15629550 ] ASF GitHub Bot commented on FLINK-4998: --- Github user mxm commented on the issue: https://github.com/apache/flink/pull/2741 Added a test case to verify the error reporting. > ResourceManager fails when num task slots > Yarn vcores > --- > > Key: FLINK-4998 > URL: https://issues.apache.org/jira/browse/FLINK-4998 > Project: Flink > Issue Type: Bug > Components: ResourceManager, YARN Client >Affects Versions: 1.2.0, 1.1.3 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.2.0 > > > The ResourceManager fails to acquire containers when the user configures the > number of task slots to be greater than the maximum number of virtual cores > of the Yarn cluster. > We should check during deployment that the task slots are not configured to > be larger than the virtual cores. > {noformat} > 2016-11-02 14:39:01,948 ERROR org.apache.flink.yarn.YarnFlinkResourceManager > - FATAL ERROR IN YARN APPLICATION MASTER: Connection to YARN > Resource Manager failed > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, requested virtual cores < 0, or requested virtual cores > > max configured, requestedVirtualCores=3, maxVirtualCores=1 > {noformat}
[jira] [Commented] (FLINK-4996) Make CrossHint @Public
[ https://issues.apache.org/jira/browse/FLINK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629521#comment-15629521 ] ASF GitHub Bot commented on FLINK-4996: --- GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/2743 [FLINK-4996] [core] Make CrossHint @Public You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 4996_make_crosshint_public Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2743.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2743 commit b6e97b9212f069100f1e19d632b821c4a7c50b8f Author: Greg Hogan Date: 2016-11-02T15:02:51Z [FLINK-4996] [core] Make CrossHint @Public > Make CrossHint @Public > -- > > Key: FLINK-4996 > URL: https://issues.apache.org/jira/browse/FLINK-4996 > Project: Flink > Issue Type: Improvement > Components: Core >Affects Versions: 1.2.0 >Reporter: Greg Hogan >Assignee: Greg Hogan >Priority: Trivial > Fix For: 1.2.0 > > > {{CrossHint}} should be annotated {{@Public}} as is {{JoinHint}}. It is > currently marked {{@Internal}} by its enclosing class {{CrossOperatorBase}}.
[GitHub] flink pull request #2743: [FLINK-4996] [core] Make CrossHint @Public
GitHub user greghogan opened a pull request: https://github.com/apache/flink/pull/2743 [FLINK-4996] [core] Make CrossHint @Public You can merge this pull request into a Git repository by running: $ git pull https://github.com/greghogan/flink 4996_make_crosshint_public Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2743.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2743 commit b6e97b9212f069100f1e19d632b821c4a7c50b8f Author: Greg Hogan Date: 2016-11-02T15:02:51Z [FLINK-4996] [core] Make CrossHint @Public
[GitHub] flink issue #2637: [FLINK-4315] Deprecate Hadoop dependent methods in flink-...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2637 Thanks for the update @kenmy! +1 to merge. Regarding moving the Hadoop tests from `flink-tests` to `flink-hadoop-compatibility` I agree. Let's do this as a separate issue. Do you want to create a JIRA issue for that? Thanks, Fabian
[jira] [Commented] (FLINK-4315) Deprecate Hadoop dependent methods in flink-java
[ https://issues.apache.org/jira/browse/FLINK-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629424#comment-15629424 ] ASF GitHub Bot commented on FLINK-4315: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2637 Thanks for the update @kenmy! +1 to merge. Regarding moving the Hadoop tests from `flink-tests` to `flink-hadoop-compatibility` I agree. Let's do this as a separate issue. Do you want to create a JIRA issue for that? Thanks, Fabian > Deprecate Hadoop dependent methods in flink-java > > > Key: FLINK-4315 > URL: https://issues.apache.org/jira/browse/FLINK-4315 > Project: Flink > Issue Type: Task > Components: Java API >Reporter: Stephan Ewen >Assignee: Evgeny Kincharov > Fix For: 2.0.0 > > > The API projects should be independent of Hadoop, because Hadoop is not an > integral part of the Flink stack, and we should have the option to offer > Flink without Hadoop dependencies. > The current batch APIs have a hard dependency on Hadoop, mainly because the > API has utility methods like `readHadoopFile(...)`. > I suggest to deprecate those methods and add helpers in the > `flink-hadoop-compatibility` project. > FLINK-4048 will later remove the deprecated methods.
[jira] [Created] (FLINK-4999) Improve accuracy of SingleInputGate#getNumberOfQueuedBuffers
Ted Yu created FLINK-4999: - Summary: Improve accuracy of SingleInputGate#getNumberOfQueuedBuffers Key: FLINK-4999 URL: https://issues.apache.org/jira/browse/FLINK-4999 Project: Flink Issue Type: Bug Reporter: Ted Yu Priority: Minor {code} // re-try 3 times, if fails, return 0 for "unknown" for (int retry = 0; retry < 3; retry++) { ... catch (Exception ignored) {} {code} There is no synchronization around accessing inputChannels currently, so the method anticipates a potential exception. Upon the 3rd try, synchronization should be taken w.r.t. inputChannels so that the return value is accurate.
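The fix FLINK-4999 proposes can be sketched as follows: keep the unsynchronized fast path for the first attempts, but take the lock on the final attempt so the caller never has to fall back to the "unknown" value 0. The class, field, and lock names below are illustrative stand-ins, not the actual `SingleInputGate` internals:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "retry unsynchronized, then lock on the last attempt" so the
// final count is accurate instead of an 'unknown' 0.
public class QueuedBufferCounter {
    private final Object lock = new Object();
    private final Map<Integer, Integer> queuedPerChannel = new HashMap<>();

    int getNumberOfQueuedBuffers() {
        // Fast path: may race with concurrent channel updates and throw.
        for (int retry = 0; retry < 2; retry++) {
            try {
                return sumUnsynchronized();
            } catch (Exception ignored) {
                // fall through and retry
            }
        }
        // Final attempt: take the lock, so the result is accurate by construction.
        synchronized (lock) {
            return sumUnsynchronized();
        }
    }

    private int sumUnsynchronized() {
        int total = 0;
        for (int n : queuedPerChannel.values()) {
            total += n;
        }
        return total;
    }

    void add(int channel, int buffers) {
        synchronized (lock) {
            queuedPerChannel.merge(channel, buffers, Integer::sum);
        }
    }

    public static void main(String[] args) {
        QueuedBufferCounter c = new QueuedBufferCounter();
        c.add(0, 2);
        c.add(1, 3);
        System.out.println(c.getNumberOfQueuedBuffers()); // prints 5
    }
}
```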
[GitHub] flink pull request #2741: [FLINK-4998][yarn] fail if too many task slots are...
GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/2741 [FLINK-4998][yarn] fail if too many task slots are configured This fails the deployment of the Yarn application if the number of task slots is configured to be larger than the maximum virtual cores of the Yarn cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink FLINK-4998 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2741.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2741 commit 35c4ad3cb086abe6fa85c5755daa8a83fbdfbf56 Author: Maximilian Michels Date: 2016-11-02T15:37:56Z [FLINK-4998][yarn] fail if too many task slots are configured This fails the deployment of the Yarn application if the number of task slots is configured to be larger than the maximum virtual cores of the Yarn cluster.
[GitHub] flink pull request #2742: [FLINK-4944] Replace Akka's death watch with own h...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/2742 [FLINK-4944] Replace Akka's death watch with own heartbeat on the TM side This PR introduces the HeartbeatActor which is used by the TaskManager to monitor the JobManager. The HeartbeatActor constantly sends Heartbeat messages to the JobManager which responds with a HeartbeatResponse. If the HeartbeatResponse fails to be received for an acceptable heartbeat pause, then the HeartbeatActor sends a HeartbeatTimeout message to the owner of the HeartbeatActor. The acceptable heartbeat pause can be extended by the HeartbeatActor if it detects that it has been stalled by garbage collection, for example. The HeartbeatActor is started as a child actor of the TaskManager. Add ClusterOptions Add comments You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink removeDeathWatch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2742.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2742 commit 4437ef25a3f7a084b3f1a577411a7863410bfde3 Author: Till Rohrmann Date: 2016-11-01T20:14:40Z [FLINK-4944] Replace Akka's death watch with own heartbeat on the TM side This PR introduces the HeartbeatActor which is used by the TaskManager to monitor the JobManager. The HeartbeatActor constantly sends Heartbeat messages to the JobManager which responds with a HeartbeatResponse. If the HeartbeatResponse fails to be received for an acceptable heartbeat pause, then the HeartbeatActor sends a HeartbeatTimeout message to the owner of the HeartbeatActor. The acceptable heartbeat pause can be extended by the HeartbeatActor if it detects that it has been stalled by garbage collection, for example. The HeartbeatActor is started as a child actor of the TaskManager. 
Add ClusterOptions Add comments
[jira] [Commented] (FLINK-4944) Replace Akka's death watch with own heartbeat on the TM side
[ https://issues.apache.org/jira/browse/FLINK-4944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629399#comment-15629399 ] ASF GitHub Bot commented on FLINK-4944: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/2742 [FLINK-4944] Replace Akka's death watch with own heartbeat on the TM side This PR introduces the HeartbeatActor which is used by the TaskManager to monitor the JobManager. The HeartbeatActor constantly sends Heartbeat messages to the JobManager which responds with a HeartbeatResponse. If the HeartbeatResponse fails to be received for an acceptable heartbeat pause, then the HeartbeatActor sends a HeartbeatTimeout message to the owner of the HeartbeatActor. The acceptable heartbeat pause can be extended by the HeartbeatActor if it detects that it has been stalled by garbage collection, for example. The HeartbeatActor is started as a child actor of the TaskManager. Add ClusterOptions Add comments You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink removeDeathWatch Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2742.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2742 commit 4437ef25a3f7a084b3f1a577411a7863410bfde3 Author: Till Rohrmann Date: 2016-11-01T20:14:40Z [FLINK-4944] Replace Akka's death watch with own heartbeat on the TM side This PR introduces the HeartbeatActor which is used by the TaskManager to monitor the JobManager. The HeartbeatActor constantly sends Heartbeat messages to the JobManager which responds with a HeartbeatResponse. If the HeartbeatResponse fails to be received for an acceptable heartbeat pause, then the HeartbeatActor sends a HeartbeatTimeout message to the owner of the HeartbeatActor. 
The acceptable heartbeat pause can be extended by the HeartbeatActor if it detects that it has been stalled by garbage collection, for example. The HeartbeatActor is started as a child actor of the TaskManager. Add ClusterOptions Add comments > Replace Akka's death watch with own heartbeat on the TM side > > > Key: FLINK-4944 > URL: https://issues.apache.org/jira/browse/FLINK-4944 > Project: Flink > Issue Type: Improvement > Components: TaskManager >Affects Versions: 1.2.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann > Fix For: 1.2.0 > > > In order to properly implement FLINK-3347, the {{TaskManager}} must no longer > use Akka's death watch mechanism to detect {{JobManager}} failures. The > reason is that a hard {{JobManager}} failure will lead to quarantining the > {{JobManager's}} {{ActorSystem}} by the {{TaskManagers}}. This in combination > with FLINK-3347 would lead to a shutdown of all {{TaskManagers}}. > Instead we should use our own heartbeat signal to detect dead > {{JobManagers}}. In case of a heartbeat timeout, the {{TaskManager}} won't > shut down but simply cancel and clear everything.
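The timeout logic described in the PR above — declare the JobManager dead after an acceptable pause without a heartbeat response, but extend the deadline when the monitor detects it was itself stalled, e.g. by garbage collection — can be sketched deterministically with explicit timestamps. The class and its heuristics are illustrative, not the PR's actual HeartbeatActor:

```java
// Sketch of heartbeat-timeout detection with a GC-stall allowance: if far
// more time passed between checks than the check interval, the monitor was
// stalled and should not blame the JobManager for responses it could not
// observe. Names and the 2x stall threshold are illustrative assumptions.
public class HeartbeatMonitor {
    private final long acceptablePauseMs;
    private final long checkIntervalMs;
    private long lastResponseMs;
    private long lastCheckMs;

    public HeartbeatMonitor(long acceptablePauseMs, long checkIntervalMs, long nowMs) {
        this.acceptablePauseMs = acceptablePauseMs;
        this.checkIntervalMs = checkIntervalMs;
        this.lastResponseMs = nowMs;
        this.lastCheckMs = nowMs;
    }

    public void onHeartbeatResponse(long nowMs) {
        lastResponseMs = nowMs;
    }

    public boolean isTimedOut(long nowMs) {
        long sinceLastCheck = nowMs - lastCheckMs;
        lastCheckMs = nowMs;
        if (sinceLastCheck > 2 * checkIntervalMs) {
            // We were stalled (e.g. by GC): credit the missed time back.
            lastResponseMs += sinceLastCheck - checkIntervalMs;
        }
        return nowMs - lastResponseMs > acceptablePauseMs;
    }

    public static void main(String[] args) {
        HeartbeatMonitor m = new HeartbeatMonitor(1000, 100, 0);
        System.out.println(m.isTimedOut(100));  // prints false
        System.out.println(m.isTimedOut(1200)); // prints false: the 1.1s gap is treated as a stall
    }
}
```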
[jira] [Commented] (FLINK-4998) ResourceManager fails when num task slots > Yarn vcores
[ https://issues.apache.org/jira/browse/FLINK-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629366#comment-15629366 ] ASF GitHub Bot commented on FLINK-4998: --- GitHub user mxm opened a pull request: https://github.com/apache/flink/pull/2741 [FLINK-4998][yarn] fail if too many task slots are configured This fails the deployment of the Yarn application if the number of task slots is configured to be larger than the maximum virtual cores of the Yarn cluster. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mxm/flink FLINK-4998 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/2741.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2741 commit 35c4ad3cb086abe6fa85c5755daa8a83fbdfbf56 Author: Maximilian Michels Date: 2016-11-02T15:37:56Z [FLINK-4998][yarn] fail if too many task slots are configured This fails the deployment of the Yarn application if the number of task slots is configured to be larger than the maximum virtual cores of the Yarn cluster. > ResourceManager fails when num task slots > Yarn vcores > --- > > Key: FLINK-4998 > URL: https://issues.apache.org/jira/browse/FLINK-4998 > Project: Flink > Issue Type: Bug > Components: ResourceManager, YARN Client >Affects Versions: 1.2.0, 1.1.3 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.2.0 > > > The ResourceManager fails to acquire containers when the user configures the > number of task slots to be greater than the maximum number of virtual cores > of the Yarn cluster. > We should check during deployment that the task slots are not configured to > be larger than the virtual cores. 
> {noformat} > 2016-11-02 14:39:01,948 ERROR org.apache.flink.yarn.YarnFlinkResourceManager > - FATAL ERROR IN YARN APPLICATION MASTER: Connection to YARN > Resource Manager failed > org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid > resource request, requested virtual cores < 0, or requested virtual cores > > max configured, requestedVirtualCores=3, maxVirtualCores=1 > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
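The proposed fix is essentially a one-line validation at deployment time: reject the configuration before the application master ever asks YARN for containers. The sketch below illustrates the intent; the class name and error message are assumptions, not the actual code in the PR.

```java
// Illustrative pre-deployment check: refuse to start the YARN session when
// the configured task slots exceed the cluster's maximum virtual cores.
// Class name and message are hypothetical, not taken from the Flink codebase.
public class SlotConfigValidator {
    public static void validate(int taskSlots, int maxVcores) {
        if (taskSlots > maxVcores) {
            throw new IllegalArgumentException(
                "Number of task slots (" + taskSlots + ") exceeds the maximum "
                + "number of virtual cores (" + maxVcores + ") of the YARN cluster.");
        }
    }
}
```

Failing fast here replaces the late, cryptic `InvalidResourceRequestException` shown above with an actionable error before deployment.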
[jira] [Commented] (FLINK-4998) ResourceManager fails when num task slots > Yarn vcores
[ https://issues.apache.org/jira/browse/FLINK-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629345#comment-15629345 ] Maximilian Michels commented on FLINK-4998: --- This is related to FLINK-2213. > ResourceManager fails when num task slots > Yarn vcores > --- > > Key: FLINK-4998 > URL: https://issues.apache.org/jira/browse/FLINK-4998 > Project: Flink > Issue Type: Bug > Components: ResourceManager, YARN Client >Affects Versions: 1.2.0, 1.1.3 >Reporter: Maximilian Michels >Assignee: Maximilian Michels > Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4998) ResourceManager fails when num task slots > Yarn vcores
Maximilian Michels created FLINK-4998: - Summary: ResourceManager fails when num task slots > Yarn vcores Key: FLINK-4998 URL: https://issues.apache.org/jira/browse/FLINK-4998 Project: Flink Issue Type: Bug Components: ResourceManager, YARN Client Affects Versions: 1.1.3, 1.2.0 Reporter: Maximilian Michels Assignee: Maximilian Michels Fix For: 1.2.0 The ResourceManager fails to acquire containers when the user configures the number of task slots to be greater than the maximum number of virtual cores of the Yarn cluster. We should check during deployment that the task slots are not configured to be larger than the virtual cores. {noformat} 2016-11-02 14:39:01,948 ERROR org.apache.flink.yarn.YarnFlinkResourceManager - FATAL ERROR IN YARN APPLICATION MASTER: Connection to YARN Resource Manager failed org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested virtual cores < 0, or requested virtual cores > max configured, requestedVirtualCores=3, maxVirtualCores=1 {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4022) Partition discovery / regex topic subscription for the Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629341#comment-15629341 ] Stephan Ewen commented on FLINK-4022: - Thinking about this, I am actually not sure that we need the low watermark service: For Event Time: - The actual timestamps are usually part of the data, not derived from the low watermark service - If the timestamps are late, then the data is actually late (partition was discovered late) - Watermarks are derived from the data, not from some other service. - When the FlinkKafkaConsumer tracks per-partition watermarks, then the new "currentWatermark" for the new partition in that source instance starts with that source's current watermark - When watermark generation only happens after the FlinkKafkaConsumer, then that operator has its watermark anyway and just potentially sees some late data pass through it For Ingestion Time: - Timestamps and watermarks depend purely on the local clock [~tzulitai] and [~aljoscha] What do you think? Why do we still need a watermark service here? > Partition discovery / regex topic subscription for the Kafka consumer > - > > Key: FLINK-4022 > URL: https://issues.apache.org/jira/browse/FLINK-4022 > Project: Flink > Issue Type: New Feature > Components: Kafka Connector, Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.2.0 > > > Example: allow users to subscribe to "topic-n*", so that the consumer > automatically reads from "topic-n1", "topic-n2", ... and so on as they are > added to Kafka. > I propose to implement this feature by the following description: > Since the overall list of partitions to read will change after job > submission, the main big change required for this feature will be dynamic > partition assignment to subtasks while the Kafka consumer is running. 
This > will mainly be accomplished using the Kafka 0.9.x API > `KafkaConsumer#subscribe(java.util.regex.Pattern, > ConsumerRebalanceListener)`. Each KafkaConsumer in each subtask will be > added to the same consumer group when instantiated, and rely on Kafka to > dynamically reassign partitions to them whenever a rebalance happens. The > registered `ConsumerRebalanceListener` is a callback that is called right > before and after rebalancing happens. We'll use this callback to let each > subtask commit the last offsets of the partitions it is currently responsible for to > an external store (or Kafka) before a rebalance; after the rebalance, once the > subtasks get the new partitions they'll be reading from, they'll read from > the external store to get the last offsets for their new partitions > (partitions which don't have offset entries in the store are new partitions > causing the rebalancing). > The tricky part will be restoring Flink checkpoints when the partition > assignment is dynamic. Snapshotting will remain the same - subtasks snapshot > the offsets of partitions they are currently holding. Restoring will be a > bit different in that subtasks might not be assigned partitions matching > the snapshot the subtask is restored with (since we're letting Kafka > dynamically assign partitions). There will need to be a coordination process > where, if a restore state exists, all subtasks first commit the offsets they > receive (as a result of the restore state) to the external store, and then > all subtasks attempt to find a last offset for the partitions they are holding. > However, if the globally merged restore state feature mentioned by > [~StephanEwen] in https://issues.apache.org/jira/browse/FLINK-3231 is > available, then the restore will be simple again, as each subtask has full > access to the previous global state, so coordination is not required. > I think changing to dynamic partition assignment is also good in the long run > for handling topic repartitioning. 
> Overall, > User-facing API changes: > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > DeserializationSchema, Properties) > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > KeyedDeserializationSchema, Properties) > Implementation changes: > 1. Dynamic partition assigning depending on KafkaConsumer#subscribe > - Remove partition list querying from constructor > - Remove static partition assigning to subtasks in run() > - Instead of using KafkaConsumer#assign() in fetchers to manually assign > static partitions, use subscribe() registered with the callback > implementation explained above. > 2. Restoring from checkpointed states > - Snapshotting should remain unchanged > - Restoring requires subtasks to coordinate the restored offsets they hold > before continuing (unless we are able to have merged restore states).
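The commit-before-rebalance / lookup-after-rebalance handoff described above can be modeled without any Kafka dependency. In the sketch below, a plain map stands in for the external offset store, and the two methods stand in for the `ConsumerRebalanceListener` hooks (`onPartitionsRevoked` / `onPartitionsAssigned`); all names are illustrative, not Kafka's or Flink's actual API.

```java
import java.util.*;

// Self-contained model of the described handoff protocol: before a rebalance,
// a subtask commits its current offsets to an external store; after a
// rebalance, it looks up offsets for its newly assigned partitions.
public class RebalanceOffsetHandoff {
    private final Map<String, Long> externalStore;                 // partition -> last offset
    private final Map<String, Long> assignedOffsets = new HashMap<>();

    public RebalanceOffsetHandoff(Map<String, Long> externalStore) {
        this.externalStore = externalStore;
    }

    /** Called right before partitions are revoked (cf. onPartitionsRevoked). */
    public void beforeRebalance() {
        externalStore.putAll(assignedOffsets);
        assignedOffsets.clear();
    }

    /** Called after partitions are assigned (cf. onPartitionsAssigned). */
    public void afterRebalance(List<String> newPartitions) {
        for (String p : newPartitions) {
            // Partitions absent from the store are the new ones that caused the rebalance.
            assignedOffsets.put(p, externalStore.getOrDefault(p, 0L));
        }
    }

    public long offsetOf(String partition) {
        return assignedOffsets.get(partition);
    }

    public void advance(String partition, long offset) {
        assignedOffsets.put(partition, offset);
    }
}
```

If subtask A commits offset 42 for "t-0" before a rebalance, subtask B picks up 42 for "t-0" afterwards, while a newly discovered "t-1" starts from the default.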
[jira] [Updated] (FLINK-4997) Extending Window Function Metadata
[ https://issues.apache.org/jira/browse/FLINK-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ventura Del Monte updated FLINK-4997: - External issue URL: (was: https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata) > Extending Window Function Metadata > -- > > Key: FLINK-4997 > URL: https://issues.apache.org/jira/browse/FLINK-4997 > Project: Flink > Issue Type: New Feature > Components: DataStream API, Streaming, Windowing Operators >Reporter: Ventura Del Monte >Assignee: Ventura Del Monte > Fix For: 1.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4997) Extending Window Function Metadata
[ https://issues.apache.org/jira/browse/FLINK-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ventura Del Monte updated FLINK-4997: - Description: https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata > Extending Window Function Metadata > -- > > Key: FLINK-4997 > URL: https://issues.apache.org/jira/browse/FLINK-4997 > Project: Flink > Issue Type: New Feature > Components: DataStream API, Streaming, Windowing Operators >Reporter: Ventura Del Monte >Assignee: Ventura Del Monte > Fix For: 1.2.0 > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-2+Extending+Window+Function+Metadata -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4997) Extending Window Function Metadata
Ventura Del Monte created FLINK-4997: Summary: Extending Window Function Metadata Key: FLINK-4997 URL: https://issues.apache.org/jira/browse/FLINK-4997 Project: Flink Issue Type: New Feature Components: DataStream API, Streaming, Windowing Operators Reporter: Ventura Del Monte Assignee: Ventura Del Monte Fix For: 1.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4996) Make CrossHint @Public
Greg Hogan created FLINK-4996: - Summary: Make CrossHint @Public Key: FLINK-4996 URL: https://issues.apache.org/jira/browse/FLINK-4996 Project: Flink Issue Type: Improvement Components: Core Affects Versions: 1.2.0 Reporter: Greg Hogan Assignee: Greg Hogan Priority: Trivial Fix For: 1.2.0 {{CrossHint}} should be annotated {{@Public}} as is {{JoinHint}}. It is currently marked {{@Internal}} by its enclosing class {{CrossOperatorBase}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4315) Deprecate Hadoop dependent methods in flink-java
[ https://issues.apache.org/jira/browse/FLINK-4315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629201#comment-15629201 ] ASF GitHub Bot commented on FLINK-4315: --- Github user kenmy commented on the issue: https://github.com/apache/flink/pull/2637 Thanks @fhueske for the detailed review. I have done everything except moving the Hadoop-related tests into flink-hadoop-compatibility; I'll do that later. IMO that is out of the scope of "Deprecate Hadoop dependent methods in flink-java", as is moving them from flink-scala. Maybe it would be better to move the hadoop-test work out of this "god issue" into a separate issue? Anyway, I have published the current state and welcome any advice on how to make my PR better. BR, Evgeny > Deprecate Hadoop dependent methods in flink-java > > > Key: FLINK-4315 > URL: https://issues.apache.org/jira/browse/FLINK-4315 > Project: Flink > Issue Type: Task > Components: Java API >Reporter: Stephan Ewen >Assignee: Evgeny Kincharov > Fix For: 2.0.0 > > > The API projects should be independent of Hadoop, because Hadoop is not an > integral part of the Flink stack, and we should have the option to offer > Flink without Hadoop dependencies. > The current batch APIs have a hard dependency on Hadoop, mainly because the > API has utility methods like `readHadoopFile(...)`. > I suggest to deprecate those methods and add helpers in the > `flink-hadoop-compatibility` project. > FLINK-4048 will later remove the deprecated methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink issue #2637: [FLINK-4315] Deprecate Hadoop dependent methods in flink-...
Github user kenmy commented on the issue: https://github.com/apache/flink/pull/2637 Thanks @fhueske for the detailed review. I have done everything except moving the Hadoop-related tests into flink-hadoop-compatibility; I'll do that later. IMO that is out of the scope of "Deprecate Hadoop dependent methods in flink-java", as is moving them from flink-scala. Maybe it would be better to move the hadoop-test work out of this "god issue" into a separate issue? Anyway, I have published the current state and welcome any advice on how to make my PR better. BR, Evgeny --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-4022) Partition discovery / regex topic subscription for the Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629178#comment-15629178 ] Tzu-Li (Gordon) Tai commented on FLINK-4022: This is currently blocked by FLINK-4576, which I'm working on resolving first. For this task in particular there isn't much progress yet. However, I'd also like to target this feature for the 1.2 release. > Partition discovery / regex topic subscription for the Kafka consumer > - > > Key: FLINK-4022 > URL: https://issues.apache.org/jira/browse/FLINK-4022 > Project: Flink > Issue Type: New Feature > Components: Kafka Connector, Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.2.0 > > > Example: allow users to subscribe to "topic-n*", so that the consumer > automatically reads from "topic-n1", "topic-n2", ... and so on as they are > added to Kafka. > I propose to implement this feature by the following description: > Since the overall list of partitions to read will change after job > submission, the main big change required for this feature will be dynamic > partition assignment to subtasks while the Kafka consumer is running. This > will mainly be accomplished using the Kafka 0.9.x API > `KafkaConsumer#subscribe(java.util.regex.Pattern, > ConsumerRebalanceListener)`. Each KafkaConsumer in each subtask will be > added to the same consumer group when instantiated, and rely on Kafka to > dynamically reassign partitions to them whenever a rebalance happens. The > registered `ConsumerRebalanceListener` is a callback that is called right > before and after rebalancing happens. 
We'll use this callback to let each > subtask commit the last offsets of the partitions it is currently responsible for to > an external store (or Kafka) before a rebalance; after the rebalance, once the > subtasks get the new partitions they'll be reading from, they'll read from > the external store to get the last offsets for their new partitions > (partitions which don't have offset entries in the store are new partitions > causing the rebalancing). > The tricky part will be restoring Flink checkpoints when the partition > assignment is dynamic. Snapshotting will remain the same - subtasks snapshot > the offsets of partitions they are currently holding. Restoring will be a > bit different in that subtasks might not be assigned partitions matching > the snapshot the subtask is restored with (since we're letting Kafka > dynamically assign partitions). There will need to be a coordination process > where, if a restore state exists, all subtasks first commit the offsets they > receive (as a result of the restore state) to the external store, and then > all subtasks attempt to find a last offset for the partitions they are holding. > However, if the globally merged restore state feature mentioned by > [~StephanEwen] in https://issues.apache.org/jira/browse/FLINK-3231 is > available, then the restore will be simple again, as each subtask has full > access to the previous global state, so coordination is not required. > I think changing to dynamic partition assignment is also good in the long run > for handling topic repartitioning. > Overall, > User-facing API changes: > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > DeserializationSchema, Properties) > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > KeyedDeserializationSchema, Properties) > Implementation changes: > 1. 
Dynamic partition assigning depending on KafkaConsumer#subscribe > - Remove partition list querying from constructor > - Remove static partition assigning to subtasks in run() > - Instead of using KafkaConsumer#assign() in fetchers to manually assign > static partitions, use subscribe() registered with the callback > implementation explained above. > 2. Restoring from checkpointed states > - Snapshotting should remain unchanged > - Restoring requires subtasks to coordinate the restored offsets they hold > before continuing (unless we are able to have merged restore states). > 3. For previous consumer functionality (consume from fixed list of topics), > the KafkaConsumer#subscribe() has a corresponding overload method for a fixed > list of topics. We can simply decide which subscribe() overload to use > depending on whether a regex Pattern or a list of topics is supplied. > 4. If subtasks don't initially have any assigned partitions, we shouldn't > emit a MAX_VALUE watermark, since they may hold partitions after a rebalance. > Instead, un-assigned subtasks should be running a fetcher instance too and > take part as a process pool for the consumer group of the subscribed topics. -- This message was sent by Atlassian JIRA
[jira] [Commented] (FLINK-3659) Add ConnectWithBroadcast Operation
[ https://issues.apache.org/jira/browse/FLINK-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629167#comment-15629167 ] Aljoscha Krettek commented on FLINK-3659: - Yep ;-) > Add ConnectWithBroadcast Operation > -- > > Key: FLINK-3659 > URL: https://issues.apache.org/jira/browse/FLINK-3659 > Project: Flink > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.0.0 >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > We should add a new operation that has a main input that can be keyed (but > doesn't have to be) and a second input that is always broadcast. This is > similar to a {{CoFlatMap}} or {{CoMap}}, but there either both inputs have to > be keyed or both non-keyed. > This builds on FLINK-4940 which aims at adding broadcast/global state. When > processing an element from the broadcast input, only access to broadcast state > is allowed. When processing an element from the main input, both the > regular keyed state and the broadcast state can be accessed. > I'm proposing this as an intermediate/low-level operation because it will > probably take a while until we add support for side-inputs in the API. This > new operation would allow expressing new patterns that cannot be expressed > with the currently available operations. > This is the new proposed API (names are non-final): > 1) Add {{DataStream.connectWithBroadcast(DataStream)}} and > {{KeyedStream.connectWithBroadcast(DataStream)}} > 2) Add {{ConnectedWithBroadcastStream}}, akin to {{ConnectedStreams}}. > 3) Add {{BroadcastFlatMap}} and {{TimelyBroadcastFlatMap}} as the user > functions. > Sketch of the user function: > {code} > interface BroadcastFlatMapFunction<IN, BIN, OUT> { > public void flatMap(IN in, Collector<OUT> out); > public void processBroadcastInput(BIN in); > } > {code} > The API names and function names are a bit verbose, and we have to add two new > different ones, but I don't see a way around this with the current way the > Flink API works. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
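To make the proposed user function concrete, here is a toy rendition in the spirit of the sketch above: the broadcast input updates shared state (a blocked-words set standing in for the broadcast state of FLINK-4940), and the main input is filtered against it. The `List`-based collector and all names are simplifications for self-containment, not the proposed Flink API.

```java
import java.util.*;

// Toy rendition of the proposed BroadcastFlatMapFunction; Collector is
// replaced by a List so the example runs without Flink on the classpath.
public class BroadcastFilterExample {

    interface BroadcastFlatMapFunction<IN, BIN, OUT> {
        void flatMap(IN in, List<OUT> out);       // stand-in for Collector<OUT>
        void processBroadcastInput(BIN in);
    }

    // Broadcast input carries blocked words; main input is filtered against them.
    static class BlockedWordFilter implements BroadcastFlatMapFunction<String, String, String> {
        private final Set<String> blocked = new HashSet<>();

        @Override
        public void processBroadcastInput(String word) {
            blocked.add(word);                    // would live in broadcast state
        }

        @Override
        public void flatMap(String in, List<String> out) {
            if (!blocked.contains(in)) {
                out.add(in);                      // only keyed + broadcast state readable here
            }
        }
    }
}
```

The asymmetry the issue describes shows up directly: the broadcast path only writes the shared state, while the main path reads it per element.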
[jira] [Updated] (FLINK-3869) WindowedStream.apply with FoldFunction is too restrictive
[ https://issues.apache.org/jira/browse/FLINK-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-3869: Assignee: Yassine Marzougui (was: Aljoscha Krettek) > WindowedStream.apply with FoldFunction is too restrictive > - > > Key: FLINK-3869 > URL: https://issues.apache.org/jira/browse/FLINK-3869 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Yassine Marzougui > > Right now we have this signature: > {code} > public <R> SingleOutputStreamOperator<R> apply(R initialValue, > FoldFunction<T, R> foldFunction, WindowFunction<R, R, K, W> function) { > {code} > but we should have this signature to allow users to return a type other than > the fold accumulator type from their window function: > {code} > public <ACC, R> SingleOutputStreamOperator<R> apply(ACC initialValue, > FoldFunction<T, ACC> foldFunction, WindowFunction<ACC, R, K, W> function) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-3869) WindowedStream.apply with FoldFunction is too restrictive
[ https://issues.apache.org/jira/browse/FLINK-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-3869: Issue Type: Improvement (was: Sub-task) Parent: (was: FLINK-3957) > WindowedStream.apply with FoldFunction is too restrictive > - > > Key: FLINK-3869 > URL: https://issues.apache.org/jira/browse/FLINK-3869 > Project: Flink > Issue Type: Improvement > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4576) Low Watermark Service in JobManager for Streaming Sources
[ https://issues.apache.org/jira/browse/FLINK-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629137#comment-15629137 ] Aljoscha Krettek commented on FLINK-4576: - Hi, I think the watermark needs to be collected across the parallel instances of an operator. For example, if you have a source, you would collect the current watermark from all instances of that source, combine it and then send out the minimum watermark to all those source instances. If there are several operators in the topology that would need the watermark notification functionality then this process has to be done for each operator separately, i.e. for all parallel instances of one operator. > Low Watermark Service in JobManager for Streaming Sources > - > > Key: FLINK-4576 > URL: https://issues.apache.org/jira/browse/FLINK-4576 > Project: Flink > Issue Type: New Feature > Components: JobManager, Streaming, TaskManager >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai >Priority: Blocker > Fix For: 1.2.0 > > > As per discussion in FLINK-4341 by [~aljoscha] and [~StephanEwen], we need a > low watermark service in the JobManager to support transparent resharding / > partition discovery for our Kafka and Kinesis consumers (and any future > streaming connectors in general for which the external system may elastically > scale up and down independently of the parallelism of sources in Flink). The > main idea is to let source subtasks that don't emit their own watermarks > (because they currently don't have data partitions to consume) emit the low > watermark across all subtasks, instead of simply emitting a Long.MAX_VALUE > watermark and forbidding them to be assigned partitions in the future. > The proposed implementation, from a high-level: a {{LowWatermarkCoordinator}} > will be added to execution graphs, periodically triggering only the source > vertices with a {{RetrieveLowWatermark}} message. 
The tasks reply to the > JobManager through the actor gateway (or a new interface after FLINK-4456 > gets merged) with a {{ReplyLowWatermark}} message. When the coordinator > collects all low watermarks for a particular source vertex and determines the > aggregated low watermark for this round (accounting only values that are > larger than the aggregated low watermark of the last round), it sends a > {{NotifyNewLowWatermark}} message to the source vertex's tasks. > The messages will only be relevant to tasks that implement an internal > {{LowWatermarkCooperatingTask}} interface. For now, only {{SourceStreamTask}} > should implement {{LowWatermarkCooperatingTask}}. > Source functions should implement a public {{LowWatermarkListener}} interface > if they wish to get notified of the aggregated low watermarks across > subtasks. Connectors like the Kinesis consumer can choose to emit this > watermark if the subtask currently does not have any shards, so that > downstream operators may still properly advance time windows (implementation > for this is tracked as a separate issue). > Overall, the service will include - > New messages between JobManager <-> TaskManager: > {{RetrieveLowWatermark(jobId, jobVertexId, taskId, timestamp)}} > {{ReplyLowWatermark(jobId, jobVertexId, taskId, currentLowWatermark)}} > {{NotifyNewLowWatermark(jobId, jobVertexId, taskId, newLowWatermark, > timestamp)}} > New internal task interface {{LowWatermarkCooperatingTask}} in flink-runtime > New public interface {{LowWatermarkListener}} in flink-streaming-java > Might also need to extend {{SourceFunction.SourceContext}} to support > retrieving the current low watermark of sources. > Any feedback for this is appreciated! -- This message was sent by Atlassian JIRA (v6.3.4#6332)
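The aggregation round the coordinator performs can be modeled in a few lines: take the minimum of the collected subtask watermarks and notify only when it advances past the previous round's value. This is a simplified model of the described behavior under assumed semantics, not the actual `LowWatermarkCoordinator`.

```java
import java.util.*;

// Minimal model of one coordinator round: collect per-subtask low watermarks,
// take the minimum, and report it only if it advanced since the last round.
public class LowWatermarkCoordinator {
    private long lastAggregated = Long.MIN_VALUE;

    /** Returns the new aggregated low watermark, or empty if it did not advance. */
    public OptionalLong aggregate(Collection<Long> subtaskWatermarks) {
        if (subtaskWatermarks.isEmpty()) {
            return OptionalLong.empty();
        }
        long min = Long.MAX_VALUE;
        for (long w : subtaskWatermarks) {
            min = Math.min(min, w);
        }
        if (min <= lastAggregated) {
            return OptionalLong.empty();   // no progress this round, no notification
        }
        lastAggregated = min;
        return OptionalLong.of(min);       // would trigger NotifyNewLowWatermark
    }
}
```

The min over all subtasks is what makes it safe for an idle (partition-less) subtask to emit this value instead of Long.MAX_VALUE: it can never overtake the watermark of any active sibling.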
[GitHub] flink issue #2720: [FLINK-4623] Create Physical Execution Plan of a DataStre...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2720 +1 to merge. Btw @tonycox, please leave a brief comment when you update a PR. Otherwise, we won't get an email and the update might not be noticed until somebody looks at the PR. Thanks! ---
[jira] [Commented] (FLINK-4623) Create Physical Execution Plan of a DataStream
[ https://issues.apache.org/jira/browse/FLINK-4623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629134#comment-15629134 ] ASF GitHub Bot commented on FLINK-4623: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2720 +1 to merge. Btw @tonycox, please leave a brief comment when you update a PR. Otherwise, we won't get an email and the update might not be noticed until somebody looks at the PR. Thanks! > Create Physical Execution Plan of a DataStream > -- > > Key: FLINK-4623 > URL: https://issues.apache.org/jira/browse/FLINK-4623 > Project: Flink > Issue Type: Improvement > Components: Table API & SQL >Reporter: Timo Walther >Assignee: Anton Solovev > Labels: starter > > The {{StreamTableEnvironment#explain(Table)}} command for tables of a > {{StreamTableEnvironment}} only outputs the abstract syntax tree. It would be > helpful if the {{explain}} method could also generate a string from the > {{DataStream}} containing a physical execution plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-4991) TestTask hangs in testWatchDogInterruptsTask
[ https://issues.apache.org/jira/browse/FLINK-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi closed FLINK-4991. -- Resolution: Fixed Fix Version/s: 1.1.4 1.2.0 Fixed in {{30a53ef}} (master), {{8412234}} (release-1.1). > TestTask hangs in testWatchDogInterruptsTask > > > Key: FLINK-4991 > URL: https://issues.apache.org/jira/browse/FLINK-4991 > Project: Flink > Issue Type: Bug > Components: TaskManager >Reporter: Ufuk Celebi > Labels: test-stability > Fix For: 1.2.0, 1.1.4 > > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/172410444/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-4743) The sqrt/power function not accept the real data types.
[ https://issues.apache.org/jira/browse/FLINK-4743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629126#comment-15629126 ] ASF GitHub Bot commented on FLINK-4743: --- Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2686 Great! +1 to merge :-) > The sqrt/power function not accept the real data types. > --- > > Key: FLINK-4743 > URL: https://issues.apache.org/jira/browse/FLINK-4743 > Project: Flink > Issue Type: Bug > Components: Table API & SQL >Affects Versions: 1.1.1 >Reporter: Anton Mushin >Assignee: Anton Solovev > > When calculating SQL aggregate functions over a column of a real (floating-point) type, > the following exception occurs: No applicable constructor/method found for actual parameters > "float, java.math.BigDecimal". > For a column of an integer type the problem does not occur. > Code to reproduce: > {code} > @Test > def test():Unit={ > val env = ExecutionEnvironment.getExecutionEnvironment > val tEnv = TableEnvironment.getTableEnvironment(env, config) > val ds = env.fromElements( > (1.0f, 1), > (2.0f, 2)).toTable(tEnv) > tEnv.registerTable("MyTable", ds) > val sqlQuery = "SELECT " + > "SQRT((SUM(a*a) - SUM(a)*SUM(a)/COUNT(a))/COUNT(a)) "+ > "from (select _1 as a from MyTable)" > tEnv.sql(sqlQuery).toDataSet[Row].collect().foreach(x=>print(s"$x ")) > } > {code} > got exception: > {noformat} > org.apache.flink.api.common.InvalidProgramException: Table program cannot be > compiled. This is a bug. Please file an issue. 
> at > org.apache.flink.api.table.runtime.FunctionCompiler$class.compile(FunctionCompiler.scala:37) > at > org.apache.flink.api.table.runtime.FlatMapRunner.compile(FlatMapRunner.scala:28) > at > org.apache.flink.api.table.runtime.FlatMapRunner.open(FlatMapRunner.scala:42) > at > org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38) > at > org.apache.flink.api.common.operators.base.FlatMapOperatorBase.executeOnCollections(FlatMapOperatorBase.java:62) > at > org.apache.flink.api.common.operators.CollectionExecutor.executeUnaryOperator(CollectionExecutor.java:249) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:147) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129) > at > org.apache.flink.api.common.operators.CollectionExecutor.executeDataSink(CollectionExecutor.java:180) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:156) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:129) > at > org.apache.flink.api.common.operators.CollectionExecutor.execute(CollectionExecutor.java:113) > at > org.apache.flink.api.java.CollectionEnvironment.execute(CollectionEnvironment.java:35) > at > org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:47) > at > org.apache.flink.test.util.CollectionTestEnvironment.execute(CollectionTestEnvironment.java:42) > at > org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:637) > at org.apache.flink.api.scala.DataSet.collect(DataSet.scala:547) > at > org.apache.flink.api.scala.batch.sql.AggregationsITCase.test(AggregationsITCase.scala:307) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55) > at org.junit.rules.RunRules.evaluate(RunRules.java:20) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) > at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) > at
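The reported failure is a type mismatch in the generated code: `java.lang.Math`'s sqrt/power methods take `double`, while the generated call passes a `float` and a `java.math.BigDecimal`. Since both are subclasses of `java.lang.Number`, the bridge the code generator needs is a `doubleValue()` conversion, sketched hypothetically below (this is not Flink's actual code generation):

```java
// Hypothetical illustration of the needed conversion: widen any numeric
// operand (Float, BigDecimal, ...) to double before calling Math.sqrt.
public class SqrtCast {
    public static double sqrtOf(Object value) {
        // Float and java.math.BigDecimal are both java.lang.Number,
        // so doubleValue() bridges them to the double-typed Math API.
        return Math.sqrt(((Number) value).doubleValue());
    }
}
```

This explains why integer columns work: they widen to `double` implicitly, whereas the `float`/`BigDecimal` mix needs an explicit conversion in the generated expression.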
[jira] [Commented] (FLINK-4991) TestTask hangs in testWatchDogInterruptsTask
[ https://issues.apache.org/jira/browse/FLINK-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629122#comment-15629122 ] ASF GitHub Bot commented on FLINK-4991: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2738 > TestTask hangs in testWatchDogInterruptsTask > > > Key: FLINK-4991 > URL: https://issues.apache.org/jira/browse/FLINK-4991 > Project: Flink > Issue Type: Bug > Components: TaskManager >Reporter: Ufuk Celebi > Labels: test-stability > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/172410444/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink issue #2686: [FLINK-4743] The sqrt/power function not accept the real ...
Github user fhueske commented on the issue: https://github.com/apache/flink/pull/2686 Great! +1 to merge :-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request #2738: [FLINK-4991] [taskmanager] Fix too aggressive time...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2738
[jira] [Commented] (FLINK-4945) KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
[ https://issues.apache.org/jira/browse/FLINK-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629116#comment-15629116 ] ASF GitHub Bot commented on FLINK-4945: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2706 > KafkaConsumer logs wrong warning about confirmation for unknown checkpoint > -- > > Key: FLINK-4945 > URL: https://issues.apache.org/jira/browse/FLINK-4945 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Reporter: Stefan Richter >Assignee: Stefan Richter >Priority: Minor > Fix For: 1.2.0 > > > Checkpoints are currently not registered in all cases. While the code still > behaves correctly this leads to misleading warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-4995) YarnFlinkResourceManagerTest JobManager Lost Leadership test failed
Ufuk Celebi created FLINK-4995: -- Summary: YarnFlinkResourceManagerTest JobManager Lost Leadership test failed Key: FLINK-4995 URL: https://issues.apache.org/jira/browse/FLINK-4995 Project: Flink Issue Type: Bug Components: YARN Affects Versions: 1.1.3 Reporter: Ufuk Celebi
{code}
Running org.apache.flink.yarn.YarnFlinkResourceManagerTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.621 sec <<< FAILURE! - in org.apache.flink.yarn.YarnFlinkResourceManagerTest
testYarnFlinkResourceManagerJobManagerLostLeadership(org.apache.flink.yarn.YarnFlinkResourceManagerTest) Time elapsed: 0.397 sec <<< FAILURE!
java.lang.AssertionError: assertion failed: expected class org.apache.flink.runtime.messages.Acknowledge, found class org.apache.flink.runtime.clusterframework.messages.RegisterResourceManager
	at scala.Predef$.assert(Predef.scala:165)
	at akka.testkit.TestKitBase$class.expectMsgClass_internal(TestKit.scala:424)
	at akka.testkit.TestKitBase$class.expectMsgClass(TestKit.scala:419)
	at akka.testkit.TestKit.expectMsgClass(TestKit.scala:718)
	at akka.testkit.JavaTestKit.expectMsgClass(JavaTestKit.java:408)
	at org.apache.flink.yarn.YarnFlinkResourceManagerTest$1.(YarnFlinkResourceManagerTest.java:179)
	at org.apache.flink.yarn.YarnFlinkResourceManagerTest.testYarnFlinkResourceManagerJobManagerLostLeadership(YarnFlinkResourceManagerTest.java:90)
{code}
https://travis-ci.org/uce/flink/jobs/172552415 Failed in a branch with an unrelated change in TaskTest. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-4945) KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
[ https://issues.apache.org/jira/browse/FLINK-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-4945. --- Resolution: Fixed Thank you for fixing the issue. Merged to master in http://git-wip-us.apache.org/repos/asf/flink/commit/223b0aa0 > KafkaConsumer logs wrong warning about confirmation for unknown checkpoint > -- > > Key: FLINK-4945 > URL: https://issues.apache.org/jira/browse/FLINK-4945 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Reporter: Stefan Richter >Assignee: Stefan Richter >Priority: Minor > Fix For: 1.2.0 > > > Checkpoints are currently not registered in all cases. While the code still > behaves correctly this leads to misleading warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4945) KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
[ https://issues.apache.org/jira/browse/FLINK-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-4945: -- Fix Version/s: 1.2.0 > KafkaConsumer logs wrong warning about confirmation for unknown checkpoint > -- > > Key: FLINK-4945 > URL: https://issues.apache.org/jira/browse/FLINK-4945 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Reporter: Stefan Richter >Assignee: Stefan Richter >Priority: Minor > Fix For: 1.2.0 > > > Checkpoints are currently not registered in all cases. While the code still > behaves correctly this leads to misleading warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-4945) KafkaConsumer logs wrong warning about confirmation for unknown checkpoint
[ https://issues.apache.org/jira/browse/FLINK-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-4945: -- Component/s: Kafka Connector > KafkaConsumer logs wrong warning about confirmation for unknown checkpoint > -- > > Key: FLINK-4945 > URL: https://issues.apache.org/jira/browse/FLINK-4945 > Project: Flink > Issue Type: Bug > Components: Kafka Connector >Reporter: Stefan Richter >Assignee: Stefan Richter >Priority: Minor > Fix For: 1.2.0 > > > Checkpoints are currently not registered in all cases. While the code still > behaves correctly this leads to misleading warnings. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request #2706: [FLINK-4945] KafkaConsumer logs wrong warning abou...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/2706
[jira] [Commented] (FLINK-4713) Implementing ranking evaluation scores for recommender systems
[ https://issues.apache.org/jira/browse/FLINK-4713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629109#comment-15629109 ] Gábor Hermann commented on FLINK-4713: -- We have managed to rework the evaluation framework proposed by Theodore so that ranking predictions fit in. Our approach is to use separate {{RankingPredictor}} and {{Predictor}} traits. One main problem remains, however: there is no common superclass for {{RankingPredictor}} and {{Predictor}}, so the pipelining mechanism might not work. A {{Predictor}} can only be at the end of the pipeline, so this should not really be a problem, but I do not know for sure. An alternative solution would be to have different objects {{ALS}} and {{RankingALS}} that give different predictions, but both extend only a {{Predictor}}. There could be implicit conversions between the two. I would prefer the current solution if it does not break the pipelining. [~tvas] What do you think about this? (This seems to be a problem similar to having a {{predict_proba}} function in scikit-learn classification models, where the same model for the same input gives two different predictions: {{predict}} for discrete predictions and {{predict_proba}} for a probability.) On the other hand, we seem to have solved the scoring issue. Users can evaluate a recommendation algorithm such as ALS with a score operating on rankings (e.g. nDCG) or a score operating on ratings (e.g. RMSE); they only need to change the {{Score}} they use in their code, and nothing else. The main problem was that the {{evaluate}} method and {{EvaluateDataSetOperation}} were not general enough. They prepare the evaluation as {{(trueValue, predictedValue)}} pairs (i.e. a {{DataSet\[(PredictionType, PredictionType)\]}}), while ranking evaluations need a more general input: the true ratings ({{DataSet\[(Int,Int,Double)\]}}) and the predicted rankings ({{DataSet\[(Int,Int,Int)\]}}). 
Instead of using {{EvaluateDataSetOperation}} we use a more general {{PrepareOperation}}. We rename the {{Score}} in the original evaluation framework to {{PairwiseScore}}. {{RankingScore}} and {{PairwiseScore}} have a common trait {{AbstractScore}}. This way the user can use both a {{RankingScore}} and a {{PairwiseScore}} for a given model, and only needs to alter the score used in the code. In the case of pairwise scores (which only need true and predicted value pairs for evaluation), {{EvaluateDataSetOperation}} is used as a {{PrepareOperation}}: it prepares the evaluation by creating {{(trueValue, predictedValue)}} pairs from the test dataset. Thus, the result of preparing, and the input of {{PairwiseScore}}s, will be {{DataSet\[(PredictionType,PredictionType)\]}}. In the case of rankings, the {{PrepareOperation}} passes the test dataset through and creates the rankings; the result of preparing, and the input of {{RankingScore}}s, will be {{(DataSet\[Int,Int,Double\], DataSet\[Int,Int,Int\])}}. I believe this is a fairly acceptable solution that avoids breaking the API. We have not gone ahead with the implementation, documentation, and code cleanup yet, as we need feedback regarding the API decisions. Are we on the right path? What do you think about our solution? How acceptable is it? The sketch code can be found on this branch: [https://github.com/gaborhermann/flink/tree/ranking-rec-eval] > Implementing ranking evaluation scores for recommender systems > -- > > Key: FLINK-4713 > URL: https://issues.apache.org/jira/browse/FLINK-4713 > Project: Flink > Issue Type: New Feature > Components: Machine Learning Library >Reporter: Domokos Miklós Kelen >Assignee: Gábor Hermann > > Follow-up work to [4712|https://issues.apache.org/jira/browse/FLINK-4712] > includes implementing ranking recommendation evaluation metrics (such as > precision@k, recall@k, ndcg@k), [similar to Spark's > implementations|https://spark.apache.org/docs/1.5.0/mllib-evaluation-metrics.html#ranking-systems]. 
> It would be beneficial if we were able to design the API such that it could > be included in the proposed evaluation framework (see > [2157|https://issues.apache.org/jira/browse/FLINK-2157]). > In its current form, this would mean generalizing the PredictionType type > parameter of the Score class to allow for {{Array[Int]}} or {{Array[(Int, > Double)]}}, and outputting the recommendations in the form {{DataSet[(Int, > Array[Int])]}} or {{DataSet[(Int, Array[(Int,Double)])]}}, meaning (user, > array of items), possibly including the predicted scores as well. > However, calculating for example nDCG for a given user u requires us to be > able to access all of the (u, item, relevance) records in the test dataset, > which means we would need to put this information in the second element of
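As a rough illustration of the kind of ranking score discussed in the comment above, here is precision@k in plain Java, independent of the proposed Flink ML API (the class and method names are made up for this sketch):

```java
import java.util.List;
import java.util.Set;

// Minimal sketch of a ranking score such as precision@k.
// "ranked" holds the recommended items for one user, best first;
// "relevant" is the set of truly relevant items from the test data.
public class PrecisionAtK {
    public static double precisionAtK(List<Integer> ranked, Set<Integer> relevant, int k) {
        int cutoff = Math.min(k, ranked.size());
        int hits = 0;
        for (int i = 0; i < cutoff; i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
            }
        }
        // Divide by k (not cutoff): short recommendation lists are penalized.
        return (double) hits / k;
    }

    public static void main(String[] args) {
        List<Integer> ranked = List.of(7, 3, 9, 1);
        Set<Integer> relevant = Set.of(3, 1, 5);
        // Among the top 3 recommendations (7, 3, 9), only item 3 is relevant.
        System.out.println(precisionAtK(ranked, relevant, 3));
    }
}
```

In the framework sketched above, a {{RankingScore}} would compute something like this per user from the {{(Int,Int,Int)}} ranking dataset and average over users.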
[jira] [Commented] (FLINK-4022) Partition discovery / regex topic subscription for the Kafka consumer
[ https://issues.apache.org/jira/browse/FLINK-4022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629091#comment-15629091 ] Jamie Grier commented on FLINK-4022: I also think this would be a great feature, and a few Flink users have asked about this -- both dynamic partition discovery within one topic and dynamic topic discovery. Any progress on this? > Partition discovery / regex topic subscription for the Kafka consumer > - > > Key: FLINK-4022 > URL: https://issues.apache.org/jira/browse/FLINK-4022 > Project: Flink > Issue Type: New Feature > Components: Kafka Connector, Streaming Connectors >Affects Versions: 1.0.0 >Reporter: Tzu-Li (Gordon) Tai >Assignee: Tzu-Li (Gordon) Tai > Fix For: 1.2.0 > > > Example: allow users to subscribe to "topic-n*", so that the consumer > automatically reads from "topic-n1", "topic-n2", ... and so on as they are > added to Kafka. > I propose to implement this feature by the following description: > Since the overall list of partitions to read will change after job > submission, the main big change required for this feature will be dynamic > partition assignment to subtasks while the Kafka consumer is running. This > will mainly be accomplished using the Kafka 0.9.x API > `KafkaConsumer#subscribe(java.util.regex.Pattern, > ConsumerRebalanceListener)`. Each KafkaConsumer in each subtask will be > added to the same consumer group when instantiated, and rely on Kafka to > dynamically reassign partitions to them whenever a rebalance happens. The > registered `ConsumerRebalanceListener` is a callback that is called right > before and after rebalancing happens. 
We'll use this callback to let each > subtask commit the last offsets of the partitions it is currently responsible > for to an external store (or Kafka) before a rebalance; after the rebalance, > when the subtasks get the new partitions they'll be reading from, they'll read > from the external store to get the last offsets for their new partitions > (partitions which don't have offset entries in the store are new partitions > that caused the rebalancing). > The tricky part will be restoring Flink checkpoints when the partition > assignment is dynamic. Snapshotting will remain the same - subtasks snapshot > the offsets of partitions they are currently holding. Restoring will be a > bit different in that subtasks might not be assigned partitions matching > the snapshot the subtask is restored with (since we're letting Kafka > dynamically assign partitions). There will need to be a coordination process > where, if a restore state exists, all subtasks first commit the offsets they > receive (as a result of the restore state) to the external store, and then > all subtasks attempt to find the last offset for the partitions they are holding. > However, if the globally merged restore state feature mentioned by > [~StephanEwen] in https://issues.apache.org/jira/browse/FLINK-3231 is > available, then the restore will be simple again, as each subtask has full > access to the previous global state, and therefore coordination is not required. > I think changing to dynamic partition assignment is also good in the long run > for handling topic repartitioning. > Overall, > User-facing API changes: > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > DeserializationSchema, Properties) > - New constructor - FlinkKafkaConsumer09(java.util.regex.Pattern, > KeyedDeserializationSchema, Properties) > Implementation changes: > 1. 
Dynamic partition assignment depending on KafkaConsumer#subscribe > - Remove partition list querying from the constructor > - Remove static partition assignment to subtasks in run() > - Instead of using KafkaConsumer#assign() in fetchers to manually assign > static partitions, use subscribe() registered with the callback > implementation explained above. > 2. Restoring from checkpointed states > - Snapshotting should remain unchanged > - Restoring requires subtasks to coordinate the restored offsets they hold > before continuing (unless we are able to have merged restore states). > 3. For the previous consumer functionality (consume from a fixed list of topics), > KafkaConsumer#subscribe() has a corresponding overload for a fixed > list of topics. We can simply decide which subscribe() overload to use > depending on whether a regex Pattern or a list of topics is supplied. > 4. If subtasks don't initially have any assigned partitions, we shouldn't > emit a MAX_VALUE watermark, since they may be assigned partitions after a rebalance. > Instead, unassigned subtasks should also run a fetcher instance and > take part as a process pool for the consumer group of the subscribed topics. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
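The commit-then-restore handshake around a rebalance described in the proposal above can be sketched with plain Java maps standing in for the external offset store (the Kafka client types are omitted and all names here are illustrative, not the actual connector code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the rebalance handshake: commit owned offsets before the
// rebalance, look up starting offsets for the new assignment after it.
public class RebalanceSketch {
    // External store: partition -> last committed offset.
    static final Map<String, Long> externalStore = new HashMap<>();

    // Called right before a rebalance: commit the offsets of the
    // partitions this subtask currently owns.
    static void onPartitionsRevoked(Map<String, Long> currentOffsets) {
        externalStore.putAll(currentOffsets);
    }

    // Called after a rebalance: fetch last offsets for the newly assigned
    // partitions; partitions with no entry are brand new and start at 0.
    static Map<String, Long> onPartitionsAssigned(List<String> assigned) {
        Map<String, Long> startOffsets = new HashMap<>();
        for (String p : assigned) {
            startOffsets.put(p, externalStore.getOrDefault(p, 0L));
        }
        return startOffsets;
    }

    public static void main(String[] args) {
        // This subtask owned topic-n1-0 at offset 42 before the rebalance;
        // afterwards it is assigned topic-n1-0 plus a newly discovered topic.
        onPartitionsRevoked(Map.of("topic-n1-0", 42L));
        Map<String, Long> start = onPartitionsAssigned(List.of("topic-n1-0", "topic-n2-0"));
        System.out.println(start.get("topic-n1-0") + " " + start.get("topic-n2-0"));
    }
}
```

In the real connector these two methods would be the `ConsumerRebalanceListener` callbacks registered with `KafkaConsumer#subscribe(Pattern, ConsumerRebalanceListener)`, and the store would be Kafka's offset topic or an external system.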
[jira] [Commented] (FLINK-4177) CassandraConnectorTest.testCassandraCommitter causing unstable builds
[ https://issues.apache.org/jira/browse/FLINK-4177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629089#comment-15629089 ] ASF GitHub Bot commented on FLINK-4177: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/2484 @StephanEwen I've rebased and updated the PR. The tests now use the `TestJvmProcess` class to set up the Cassandra instance and have a fixed deadline for the initial connection instead of a fixed number of attempts. I have also removed the `sleep(5000)` statement; I would like to see whether we can get by without it. > CassandraConnectorTest.testCassandraCommitter causing unstable builds > - > > Key: FLINK-4177 > URL: https://issues.apache.org/jira/browse/FLINK-4177 > Project: Flink > Issue Type: Bug > Components: Cassandra Connector, Streaming Connectors >Affects Versions: 1.1.0 >Reporter: Robert Metzger > Labels: test-stability > > This build: https://api.travis-ci.org/jobs/143272982/log.txt?deansi=true > failed with > {code} > 07/08/2016 09:59:12 Job execution switched to status FINISHED. > Tests run: 7, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 146.646 sec > <<< FAILURE! - in > org.apache.flink.streaming.connectors.cassandra.CassandraConnectorTest > testCassandraCommitter(org.apache.flink.streaming.connectors.cassandra.CassandraConnectorTest) > Time elapsed: 9.057 sec <<< ERROR! 
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_SERIAL (1 replica were required but only 0 acknowledged the write)
> at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:73)
> at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:26)
> at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
> at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
> at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
> at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:39)
> at org.apache.flink.streaming.connectors.cassandra.CassandraCommitter.open(CassandraCommitter.java:103)
> at org.apache.flink.streaming.connectors.cassandra.CassandraConnectorTest.testCassandraCommitter(CassandraConnectorTest.java:284)
> Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency LOCAL_SERIAL (1 replica were required but only 0 acknowledged the write)
> at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:100)
> at com.datastax.driver.core.Responses$Error.asException(Responses.java:122)
> at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:477)
> at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
> at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
> at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
> at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
> at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
> at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
> at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
> at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
> at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
> at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
> at
[GitHub] flink issue #2484: [FLINK-4177] Harden CassandraConnectorTest
Github user zentol commented on the issue: https://github.com/apache/flink/pull/2484 @StephanEwen I've rebased and updated the PR. The tests now use the `TestJvmProcess` class to set up the Cassandra instance and have a fixed deadline for the initial connection instead of a fixed number of attempts. I have also removed the `sleep(5000)` statement; I would like to see whether we can get by without it.
[jira] [Commented] (FLINK-3869) WindowedStream.apply with FoldFunction is too restrictive
[ https://issues.apache.org/jira/browse/FLINK-3869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629062#comment-15629062 ] Fabian Hueske commented on FLINK-3869: -- I like that idea, [~aljoscha]. +1 from me to add {{reduce}} and {{fold}} methods and deprecate the incremental {{apply}} methods. > WindowedStream.apply with FoldFunction is too restrictive > - > > Key: FLINK-3869 > URL: https://issues.apache.org/jira/browse/FLINK-3869 > Project: Flink > Issue Type: Sub-task > Components: Streaming >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > Right now we have this signature: > {code} > public SingleOutputStreamOperator apply(R initialValue, > FoldFunction foldFunction, WindowFunction function) { > {code} > but we should have this signature to allow users to return a type other than > the fold accumulator type from their window function: > {code} > public SingleOutputStreamOperator apply(ACC initialValue, > FoldFunction foldFunction, WindowFunction function) { > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
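The gist of the change above, stripped of the Flink window machinery (this `apply` is a stand-in for illustration, not the real `WindowedStream` API): with a separate ACC type parameter, the fold accumulator and the window function's result type are independent.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Stand-in for the proposed signature: ACC (fold accumulator) and
// R (window result) are independent type parameters.
public class FoldApplySketch {
    static <T, ACC, R> R apply(ACC initialValue,
                               BiFunction<ACC, T, ACC> foldFunction,
                               Function<ACC, R> windowFunction,
                               List<T> windowElements) {
        ACC acc = initialValue;
        for (T element : windowElements) {
            acc = foldFunction.apply(acc, element);
        }
        // With the restrictive signature (R == ACC), converting the
        // accumulator to a different result type would not compile.
        return windowFunction.apply(acc);
    }

    public static void main(String[] args) {
        // Fold Strings into an Integer count, then emit a String result.
        String result = apply(0,
                (count, s) -> count + 1,
                count -> "count=" + count,
                List.of("a", "b", "c"));
        System.out.println(result);
    }
}
```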
[GitHub] flink issue #2740: Implement StringIndexer
Github user zentol commented on the issue: https://github.com/apache/flink/pull/2740 Is there a JIRA issue for this PR? If so, could you include it in the PR title?
[GitHub] flink issue #2738: [FLINK-4991] [taskmanager] Fix too aggressive timeout and...
Github user uce commented on the issue: https://github.com/apache/flink/pull/2738 Travis passed here: https://travis-ci.org/uce/flink/builds/172551729 Merging.
[jira] [Commented] (FLINK-4991) TestTask hangs in testWatchDogInterruptsTask
[ https://issues.apache.org/jira/browse/FLINK-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629056#comment-15629056 ] ASF GitHub Bot commented on FLINK-4991: --- Github user uce commented on the issue: https://github.com/apache/flink/pull/2738 Travis passed here: https://travis-ci.org/uce/flink/builds/172551729 Merging. > TestTask hangs in testWatchDogInterruptsTask > > > Key: FLINK-4991 > URL: https://issues.apache.org/jira/browse/FLINK-4991 > Project: Flink > Issue Type: Bug > Components: TaskManager >Reporter: Ufuk Celebi > Labels: test-stability > > https://s3.amazonaws.com/archive.travis-ci.org/jobs/172410444/log.txt -- This message was sent by Atlassian JIRA (v6.3.4#6332)