[GitHub] flink issue #5614: [FLINK-8827] When FLINK_CONF_DIR contains spaces, execute...
Github user zentol commented on the issue: https://github.com/apache/flink/pull/5614 +1 ---
[jira] [Commented] (FLINK-8827) When FLINK_CONF_DIR contains spaces, execute zookeeper related scripts failed
[ https://issues.apache.org/jira/browse/FLINK-8827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387428#comment-16387428 ] ASF GitHub Bot commented on FLINK-8827: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/5614 +1 > When FLINK_CONF_DIR contains spaces, execute zookeeper related scripts failed > - > > Key: FLINK-8827 > URL: https://issues.apache.org/jira/browse/FLINK-8827 > Project: Flink > Issue Type: Bug >Affects Versions: 1.4.0 > Environment: Red Hat Enterprise Linux Server release 6.5 (Santiago) >Reporter: Donghui Xu >Priority: Major > > When the path of FLINK_CONF_DIR including spaces, executing zookeeper related > scripts failed with the following error message: Expect binary expression. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5643: can integrate and support on apache kudu ?
Github user zentol commented on the issue: https://github.com/apache/flink/pull/5643 Pull Requests are not meant to ask question, please head to the [mailing lists](http://flink.apache.org/community.html#mailing-lists) or open a [JIRA](https://issues.apache.org/jira/projects/FLINK). ---
[jira] [Commented] (FLINK-8687) MaterializedCollectStreamResult#retrievePage should take resultLock
[ https://issues.apache.org/jira/browse/FLINK-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387372#comment-16387372 ] mingleizhang commented on FLINK-8687: - [~yuzhih...@gmail.com] Shouldn't we lock {{write}} situation ? means, also lock on {{CliTableResultView#updatePage}}. If we do not do that, I think it is still encounter a multi-threads problem. > MaterializedCollectStreamResult#retrievePage should take resultLock > --- > > Key: FLINK-8687 > URL: https://issues.apache.org/jira/browse/FLINK-8687 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: mingleizhang >Priority: Major > > Currently MaterializedCollectStreamResult#retrievePage checks page range and > calls snapshot.subList() without holding resultLock. > {{resultLock}} should be taken. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5643: can integrate and support on apache kudu ?
GitHub user qi20088 opened a pull request: https://github.com/apache/flink/pull/5643 can integrate and support on apache kudu ? *Thank you very much for contributing to Apache Flink - we are happy that you want to help us improve Flink. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.* *Please understand that we do not do this to make contributions to Flink a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.* ## Contribution Checklist - Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/FLINK/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue. - Name the pull request in the form "[FLINK-] [component] Title of the pull request", where *FLINK-* should be replaced by the actual issue number. Skip *component* if you are unsure about which is the best component. Typo fixes that have no associated JIRA issue should be named following this pattern: `[hotfix] [docs] Fix typo in event time introduction` or `[hotfix] [javadocs] Expand JavaDoc for PuncuatedWatermarkGenerator`. - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review. - Make sure that the change passes the automated tests, i.e., `mvn clean verify` passes. You can set up Travis CI to do that following [this guide](http://flink.apache.org/contribute-code.html#best-practices). - Each pull request should address only one issue, not mix up code from multiple issues. - Each commit in the pull request has a meaningful commit message (including the JIRA id) - Once all items of the checklist are addressed, remove the above text and this checklist, leaving only the filled out template below. **(The sections below can be removed for hotfixes of typos)** ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/flink release-1.5 Alternatively you can review and apply these changes as the patch at:
[jira] [Assigned] (FLINK-5621) Flink should provide a mechanism to prevent scheduling tasks on TaskManagers with operational issues
[ https://issues.apache.org/jira/browse/FLINK-5621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] vinoyang reassigned FLINK-5621: --- Assignee: vinoyang > Flink should provide a mechanism to prevent scheduling tasks on TaskManagers > with operational issues > > > Key: FLINK-5621 > URL: https://issues.apache.org/jira/browse/FLINK-5621 > Project: Flink > Issue Type: Bug > Components: Core >Affects Versions: 1.1.4 >Reporter: Jamie Grier >Assignee: vinoyang >Priority: Critical > > There are cases where jobs can get into a state where no progress can be made > if there is something pathologically wrong with one of the TaskManager nodes > in the cluster. > An example of this would be a TaskManager on a machine that runs out of disk > space. Flink never considers the TM to be "bad" and will keep using it to > attempt to run tasks -- which will continue to fail. > A suggestion for overcoming this would be to allow an option where a TM will > commit suicide if that TM was the source of an exception that caused a job to > fail/restart. > I'm sure there are plenty of other approaches to solving this.. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8687) MaterializedCollectStreamResult#retrievePage should take resultLock
[ https://issues.apache.org/jira/browse/FLINK-8687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingleizhang updated FLINK-8687: Priority: Major (was: Minor) > MaterializedCollectStreamResult#retrievePage should take resultLock > --- > > Key: FLINK-8687 > URL: https://issues.apache.org/jira/browse/FLINK-8687 > Project: Flink > Issue Type: Bug >Reporter: Ted Yu >Assignee: mingleizhang >Priority: Major > > Currently MaterializedCollectStreamResult#retrievePage checks page range and > calls snapshot.subList() without holding resultLock. > {{resultLock}} should be taken. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8871) Checkpoint cancellation is not propagated to stop checkpointing threads on the task manager
[ https://issues.apache.org/jira/browse/FLINK-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387335#comment-16387335 ] Sihua Zhou commented on FLINK-8871: --- [~srichter] Have you already work on this or decide to work on this? If not and if you don't mind, I'd like to take this ticket. > Checkpoint cancellation is not propagated to stop checkpointing threads on > the task manager > --- > > Key: FLINK-8871 > URL: https://issues.apache.org/jira/browse/FLINK-8871 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.3.2, 1.5.0, 1.4.1 >Reporter: Stefan Richter >Priority: Critical > Fix For: 1.6.0 > > > Flink currently lacks any form of feedback mechanism from the job manager / > checkpoint coordinator to the tasks when it comes to failing a checkpoint. > This means that running snapshots on the tasks are also not stopped even if > their owning checkpoint is already cancelled. Two examples for cases where > this applies are checkpoint timeouts and local checkpoint failures on a task > together with a configuration that does not fail tasks on checkpoint failure. > Notice that those running snapshots do no longer account for the maximum > number of parallel checkpoints, because their owning checkpoint is considered > as cancelled. > Not stopping the task's snapshot thread can lead to a problematic situation > where the next checkpoints already started, while the abandoned checkpoint > thread from a previous checkpoint is still lingering around running. This > scenario can potentially cascade: many parallel checkpoints will slow down > checkpointing and make timeouts even more likely. > > A possible solution is introducing a {{cancelCheckpoint}} method as > counterpart to the {{triggerCheckpoint}} method in the task manager gateway, > which is invoked by the checkpoint coordinator as part of cancelling the > checkpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-6160) Retry JobManager/ResourceManager connection in case of timeout
[ https://issues.apache.org/jira/browse/FLINK-6160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387316#comment-16387316 ] ASF GitHub Bot commented on FLINK-6160: --- Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5440 @tillrohrmann Could you please take a look on this when available ? Thanks > Retry JobManager/ResourceManager connection in case of timeout > --- > > Key: FLINK-6160 > URL: https://issues.apache.org/jira/browse/FLINK-6160 > Project: Flink > Issue Type: Sub-task > Components: Distributed Coordination >Affects Versions: 1.3.0 >Reporter: Till Rohrmann >Priority: Major > Labels: flip-6 > > In case of a heartbeat timeout, the {{TaskExecutor}} closes the connection to > the remote component. Furthermore, it assumes that the component has actually > failed and, thus, it will only start trying to connect to the component if it > is notified about a new leader address and leader session id. This is > brittle, because the heartbeat could also time out without the component > having crashed. Thus, we should add an automatic retry to the latest known > leader address information in case of a timeout. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5440: [FLINK-6160] [flip-6] Retry JobManager/ResourceManager co...
Github user zhangminglei commented on the issue: https://github.com/apache/flink/pull/5440 @tillrohrmann Could you please take a look on this when available ? Thanks ---
[GitHub] flink issue #5481: [FLINK-8560] Access to the current key in ProcessFunction...
Github user bowenli86 commented on the issue: https://github.com/apache/flink/pull/5481 @kl0u @aljoscha I added the scala example, and I believe the only build failure in Travis is irrelevant ---
[jira] [Commented] (FLINK-8560) add KeyedProcessFunction to expose the key in onTimer() and other methods
[ https://issues.apache.org/jira/browse/FLINK-8560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387304#comment-16387304 ] ASF GitHub Bot commented on FLINK-8560: --- Github user bowenli86 commented on the issue: https://github.com/apache/flink/pull/5481 @kl0u @aljoscha I added the scala example, and I believe the only build failure in Travis is irrelevant > add KeyedProcessFunction to expose the key in onTimer() and other methods > - > > Key: FLINK-8560 > URL: https://issues.apache.org/jira/browse/FLINK-8560 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Affects Versions: 1.4.0 >Reporter: Jürgen Thomann >Assignee: Bowen Li >Priority: Major > Fix For: 1.5.0 > > > Currently it is required to store the key of a keyBy() in the processElement > method to have access to it in the OnTimerContext. > This is not so good as you have to check in the processElement method for > every element if the key is already stored and set it if it's not already set. > A possible solution would adding OnTimerContext#getCurrentKey() or a similar > method. Maybe having it in the open() method could maybe work as well. > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Getting-Key-from-keyBy-in-ProcessFunction-tt18126.html -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (FLINK-7795) Utilize error-prone to discover common coding mistakes
[ https://issues.apache.org/jira/browse/FLINK-7795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16345955#comment-16345955 ] Ted Yu edited comment on FLINK-7795 at 3/6/18 4:48 AM: --- error-prone has JDK 8 dependency . was (Author: yuzhih...@gmail.com): error-prone has JDK 8 dependency. > Utilize error-prone to discover common coding mistakes > -- > > Key: FLINK-7795 > URL: https://issues.apache.org/jira/browse/FLINK-7795 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Ted Yu >Priority: Major > > http://errorprone.info/ is a tool which detects common coding mistakes. > We should incorporate into Flink build process. > Here are the dependencies: > {code} > > com.google.errorprone > error_prone_annotation > ${error-prone.version} > provided > > > > com.google.auto.service > auto-service > 1.0-rc3 > true > > > com.google.errorprone > error_prone_check_api > ${error-prone.version} > provided > > > com.google.code.findbugs > jsr305 > > > > > com.google.errorprone > javac > 9-dev-r4023-3 > provided > > > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8428) Implement stream-stream non-window left outer join
[ https://issues.apache.org/jira/browse/FLINK-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387274#comment-16387274 ] ASF GitHub Bot commented on FLINK-8428: --- Github user walterddr commented on a diff in the pull request: https://github.com/apache/flink/pull/5327#discussion_r172408402 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/NonWindowLeftJoinWithNonEquiPredicates.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.table.runtime.join + +import org.apache.flink.api.common.state._ +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2} +import org.apache.flink.configuration.Configuration +import org.apache.flink.streaming.api.functions.co.CoProcessFunction +import org.apache.flink.table.api.{StreamQueryConfig, Types} +import org.apache.flink.table.runtime.types.CRow +import org.apache.flink.types.Row +import org.apache.flink.util.Collector + +/** + * Connect data for left stream and right stream. Only use for LeftJoin with NonEquiPredicates. + * An MapState of type [Row, Long] is added to record how many rows from the right table can be + * matched for each left row. Left join without NonEquiPredicates doesn't need it because + * left rows can always join right rows as long as join keys are same. + * + * @param leftTypethe input type of left stream + * @param rightType the input type of right stream + * @param resultType the output type of join + * @param genJoinFuncName the function code of other non-equi condition + * @param genJoinFuncCode the function name of other non-equi condition + * @param queryConfig the configuration for the query to generate + */ +class NonWindowLeftJoinWithNonEquiPredicates( +leftType: TypeInformation[Row], +rightType: TypeInformation[Row], +resultType: TypeInformation[CRow], +genJoinFuncName: String, +genJoinFuncCode: String, +queryConfig: StreamQueryConfig) + extends NonWindowJoin( +leftType, +rightType, +resultType, +genJoinFuncName, +genJoinFuncCode, +queryConfig) { + + // result row, all field from right will be null + private var resultRow: Row = _ + // how many matched rows from the right table for each left row + private var leftJoinCnt: MapState[Row, Long] = _ + + override def open(parameters: Configuration): Unit = { +super.open(parameters) + +val leftJoinCntDescriptor = new MapStateDescriptor[Row, Long]( --- End diff -- I think either is fine as long as they are consistent. > Implement stream-stream non-window left outer join > -- > > Key: FLINK-8428 > URL: https://issues.apache.org/jira/browse/FLINK-8428 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > > Implement stream-stream non-window left outer join for sql/table-api. A > simple design doc can be found > [here|https://docs.google.com/document/d/1u7bJHeEBP_hFhi8Jm4oT3FqQDOm2pJDqCtq1U1WMHDo/edit?usp=sharing] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8428) Implement stream-stream non-window left outer join
[ https://issues.apache.org/jira/browse/FLINK-8428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387275#comment-16387275 ] ASF GitHub Bot commented on FLINK-8428: --- Github user walterddr commented on a diff in the pull request: https://github.com/apache/flink/pull/5327#discussion_r172408177 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/NonWindowJoin.scala --- @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.table.runtime.join + +import org.apache.flink.api.common.functions.FlatJoinFunction +import org.apache.flink.api.common.state.{MapState, MapStateDescriptor, ValueState, ValueStateDescriptor} +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2} +import org.apache.flink.api.java.typeutils.TupleTypeInfo +import org.apache.flink.configuration.Configuration +import org.apache.flink.streaming.api.functions.co.CoProcessFunction +import org.apache.flink.table.api.{StreamQueryConfig, Types} +import org.apache.flink.table.codegen.Compiler +import org.apache.flink.table.runtime.types.CRow +import org.apache.flink.table.typeutils.TypeCheckUtils._ +import org.apache.flink.table.util.Logging +import org.apache.flink.types.Row +import org.apache.flink.util.Collector + +/** + * Connect data for left stream and right stream. Base class for stream-stream non-window Join. + * + * @param leftType the input type of left stream + * @param rightType the input type of right stream + * @param resultTypethe output type of join + * @param genJoinFuncName the function code of other non-equi condition + * @param genJoinFuncCode the function name of other non-equi condition + * @param queryConfig the configuration for the query to generate + */ +abstract class NonWindowJoin( +leftType: TypeInformation[Row], +rightType: TypeInformation[Row], +resultType: TypeInformation[CRow], +genJoinFuncName: String, +genJoinFuncCode: String, +queryConfig: StreamQueryConfig) + extends CoProcessFunction[CRow, CRow, CRow] + with Compiler[FlatJoinFunction[Row, Row, Row]] + with Logging { + + // check if input types implement proper equals/hashCode + validateEqualsHashCode("join", leftType) + validateEqualsHashCode("join", rightType) + + // state to hold left stream element + protected var leftState: MapState[Row, JTuple2[Int, Long]] = _ + // state to hold right stream element + protected var rightState: MapState[Row, JTuple2[Int, Long]] = _ + protected var cRowWrapper: CRowWrappingMultiOutputCollector = _ + + protected val minRetentionTime: Long = queryConfig.getMinIdleStateRetentionTime + protected val maxRetentionTime: Long = queryConfig.getMaxIdleStateRetentionTime + protected val stateCleaningEnabled: Boolean = minRetentionTime > 1 + + // state to record last timer of left stream, 0 means no timer + protected var leftTimer: ValueState[Long] = _ + // state to record last timer of right stream, 0 means no timer + protected var rightTimer: ValueState[Long] = _ + + // other condition function + protected var joinFunction: FlatJoinFunction[Row, Row, Row] = _ + + override def open(parameters: Configuration): Unit = { +LOG.debug(s"Compiling JoinFunction: $genJoinFuncName \n\n " + +s"Code:\n$genJoinFuncCode") +val clazz = compile( + getRuntimeContext.getUserCodeClassLoader, + genJoinFuncName, + genJoinFuncCode) +LOG.debug("Instantiating JoinFunction.") +joinFunction = clazz.newInstance() + +// initialize left and right state, the first element of tuple2 indicates how many rows of +// this row, while the second element represents the expired time of this
[GitHub] flink pull request #5327: [FLINK-8428] [table] Implement stream-stream non-w...
Github user walterddr commented on a diff in the pull request: https://github.com/apache/flink/pull/5327#discussion_r172408177 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/NonWindowJoin.scala --- @@ -0,0 +1,271 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.table.runtime.join + +import org.apache.flink.api.common.functions.FlatJoinFunction +import org.apache.flink.api.common.state.{MapState, MapStateDescriptor, ValueState, ValueStateDescriptor} +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2} +import org.apache.flink.api.java.typeutils.TupleTypeInfo +import org.apache.flink.configuration.Configuration +import org.apache.flink.streaming.api.functions.co.CoProcessFunction +import org.apache.flink.table.api.{StreamQueryConfig, Types} +import org.apache.flink.table.codegen.Compiler +import org.apache.flink.table.runtime.types.CRow +import org.apache.flink.table.typeutils.TypeCheckUtils._ +import org.apache.flink.table.util.Logging +import org.apache.flink.types.Row +import org.apache.flink.util.Collector + +/** + * Connect data for left stream and right stream. Base class for stream-stream non-window Join. + * + * @param leftType the input type of left stream + * @param rightType the input type of right stream + * @param resultTypethe output type of join + * @param genJoinFuncName the function code of other non-equi condition + * @param genJoinFuncCode the function name of other non-equi condition + * @param queryConfig the configuration for the query to generate + */ +abstract class NonWindowJoin( +leftType: TypeInformation[Row], +rightType: TypeInformation[Row], +resultType: TypeInformation[CRow], +genJoinFuncName: String, +genJoinFuncCode: String, +queryConfig: StreamQueryConfig) + extends CoProcessFunction[CRow, CRow, CRow] + with Compiler[FlatJoinFunction[Row, Row, Row]] + with Logging { + + // check if input types implement proper equals/hashCode + validateEqualsHashCode("join", leftType) + validateEqualsHashCode("join", rightType) + + // state to hold left stream element + protected var leftState: MapState[Row, JTuple2[Int, Long]] = _ + // state to hold right stream element + protected var rightState: MapState[Row, JTuple2[Int, Long]] = _ + protected var cRowWrapper: CRowWrappingMultiOutputCollector = _ + + protected val minRetentionTime: Long = queryConfig.getMinIdleStateRetentionTime + protected val maxRetentionTime: Long = queryConfig.getMaxIdleStateRetentionTime + protected val stateCleaningEnabled: Boolean = minRetentionTime > 1 + + // state to record last timer of left stream, 0 means no timer + protected var leftTimer: ValueState[Long] = _ + // state to record last timer of right stream, 0 means no timer + protected var rightTimer: ValueState[Long] = _ + + // other condition function + protected var joinFunction: FlatJoinFunction[Row, Row, Row] = _ + + override def open(parameters: Configuration): Unit = { +LOG.debug(s"Compiling JoinFunction: $genJoinFuncName \n\n " + +s"Code:\n$genJoinFuncCode") +val clazz = compile( + getRuntimeContext.getUserCodeClassLoader, + genJoinFuncName, + genJoinFuncCode) +LOG.debug("Instantiating JoinFunction.") +joinFunction = clazz.newInstance() + +// initialize left and right state, the first element of tuple2 indicates how many rows of +// this row, while the second element represents the expired time of this row. +val tupleTypeInfo = new TupleTypeInfo[JTuple2[Int, Long]](Types.INT, Types.LONG) +val leftStateDescriptor = new MapStateDescriptor[Row, JTuple2[Int, Long]]( + "left", leftType, tupleTypeInfo) +val
[GitHub] flink pull request #5327: [FLINK-8428] [table] Implement stream-stream non-w...
Github user walterddr commented on a diff in the pull request: https://github.com/apache/flink/pull/5327#discussion_r172408402 --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/NonWindowLeftJoinWithNonEquiPredicates.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.flink.table.runtime.join + +import org.apache.flink.api.common.state._ +import org.apache.flink.api.common.typeinfo.TypeInformation +import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2} +import org.apache.flink.configuration.Configuration +import org.apache.flink.streaming.api.functions.co.CoProcessFunction +import org.apache.flink.table.api.{StreamQueryConfig, Types} +import org.apache.flink.table.runtime.types.CRow +import org.apache.flink.types.Row +import org.apache.flink.util.Collector + +/** + * Connect data for left stream and right stream. Only use for LeftJoin with NonEquiPredicates. + * An MapState of type [Row, Long] is added to record how many rows from the right table can be + * matched for each left row. Left join without NonEquiPredicates doesn't need it because + * left rows can always join right rows as long as join keys are same. + * + * @param leftTypethe input type of left stream + * @param rightType the input type of right stream + * @param resultType the output type of join + * @param genJoinFuncName the function code of other non-equi condition + * @param genJoinFuncCode the function name of other non-equi condition + * @param queryConfig the configuration for the query to generate + */ +class NonWindowLeftJoinWithNonEquiPredicates( +leftType: TypeInformation[Row], +rightType: TypeInformation[Row], +resultType: TypeInformation[CRow], +genJoinFuncName: String, +genJoinFuncCode: String, +queryConfig: StreamQueryConfig) + extends NonWindowJoin( +leftType, +rightType, +resultType, +genJoinFuncName, +genJoinFuncCode, +queryConfig) { + + // result row, all field from right will be null + private var resultRow: Row = _ + // how many matched rows from the right table for each left row + private var leftJoinCnt: MapState[Row, Long] = _ + + override def open(parameters: Configuration): Unit = { +super.open(parameters) + +val leftJoinCntDescriptor = new MapStateDescriptor[Row, Long]( --- End diff -- I think either is fine as long as they are consistent. ---
[jira] [Assigned] (FLINK-8861) Add support for batch queries in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-8861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xingcan Cui reassigned FLINK-8861: -- Assignee: Xingcan Cui > Add support for batch queries in SQL Client > --- > > Key: FLINK-8861 > URL: https://issues.apache.org/jira/browse/FLINK-8861 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Reporter: Timo Walther >Assignee: Xingcan Cui >Priority: Major > > This issue is a subtask of part two "Full Embedded SQL Client" of the > implementation plan mentioned in > [FLIP-24|https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client]. > Similar to streaming queries, it should be possible to execute batch queries > in the SQL Client and collect the results using {{DataSet.collect()}} for > debugging purposes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8689) Add runtime support of distinct filter using MapView
[ https://issues.apache.org/jira/browse/FLINK-8689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16387014#comment-16387014 ] ASF GitHub Bot commented on FLINK-8689: --- Github user walterddr commented on the issue: https://github.com/apache/flink/pull/ @hequn8128 @fhueske Thanks for the feedback. I have updated the diff to directly use DistinctAccumulator for filtering and modified the `generateAggregation` API. Please kindly take another look when you have time. I have resolved the issue of multiple layer dataview codegen. In terms of reusing same `DataView` for multiple distinct aggregations against the same field, I tried to incorporate but there are many assumptions with single mapping between `AggregateFunction`s and `Accumulator`s that's hard to deal with. I am planning to continue and improve on it in a separated JIRA, what do you think? > Add runtime support of distinct filter using MapView > - > > Key: FLINK-8689 > URL: https://issues.apache.org/jira/browse/FLINK-8689 > Project: Flink > Issue Type: Sub-task >Reporter: Rong Rong >Assignee: Rong Rong >Priority: Major > > This ticket should cover distinct aggregate function support to codegen for > *AggregateCall*, where *isDistinct* fields is set to true. > This can be verified using the following SQL, which is not currently > producing correct results. > {code:java} > SELECT > a, > SUM(b) OVER (PARTITION BY a ORDER BY proctime ROWS BETWEEN 5 PRECEDING AND > CURRENT ROW) > FROM > MyTable{code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5555: [FLINK-8689][table]Add runtime support of distinct filter...
Github user walterddr commented on the issue: https://github.com/apache/flink/pull/ @hequn8128 @fhueske Thanks for the feedback. I have updated the diff to directly use DistinctAccumulator for filtering and modified the `generateAggregation` API. Please kindly take another look when you have time. I have resolved the issue of multiple layer dataview codegen. In terms of reusing same `DataView` for multiple distinct aggregations against the same field, I tried to incorporate but there are many assumptions with single mapping between `AggregateFunction`s and `Accumulator`s that's hard to deal with. I am planning to continue and improve on it in a separated JIRA, what do you think? ---
[jira] [Created] (FLINK-8874) rewrite Flink docs/dev/stream/operators/process_function.md to recommend using KeyedProcessFunction
Bowen Li created FLINK-8874: --- Summary: rewrite Flink docs/dev/stream/operators/process_function.md to recommend using KeyedProcessFunction Key: FLINK-8874 URL: https://issues.apache.org/jira/browse/FLINK-8874 Project: Flink Issue Type: Improvement Components: DataStream API, Documentation Affects Versions: 1.5.0 Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.6.0 We need to completely rewrite Flink docs/dev/stream/operators/process_function.md to recommend using KeyedProcessFunction -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-5479) Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions
[ https://issues.apache.org/jira/browse/FLINK-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386676#comment-16386676 ] ASF GitHub Bot commented on FLINK-5479: --- Github user EronWright commented on the issue: https://github.com/apache/flink/pull/5634 I think the ideal would be that idleness would occur only for tail reads, i.e. due to a timeout from `kafkaConsumer.poll(pollTimeout)`.In other words, an intermittent connection issue would ideally not trigger idleness. > Per-partition watermarks in FlinkKafkaConsumer should consider idle partitions > -- > > Key: FLINK-5479 > URL: https://issues.apache.org/jira/browse/FLINK-5479 > Project: Flink > Issue Type: Improvement > Components: Kafka Connector >Reporter: Tzu-Li (Gordon) Tai >Priority: Blocker > Fix For: 1.5.0 > > > Reported in ML: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kafka-topic-partition-skewness-causes-watermark-not-being-emitted-td11008.html > Similar to what's happening to idle sources blocking watermark progression in > downstream operators (see FLINK-5017), the per-partition watermark mechanism > in {{FlinkKafkaConsumer}} is also being blocked of progressing watermarks > when a partition is idle. The watermark of idle partitions is always > {{Long.MIN_VALUE}}, therefore the overall min watermark across all partitions > of a consumer subtask will never proceed. > It's normally not a common case to have Kafka partitions not producing any > data, but it'll probably be good to handle this as well. I think we should > have a localized solution similar to FLINK-5017 for the per-partition > watermarks in {{AbstractFetcher}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5634: [FLINK-5479] [kafka] Idleness detection for periodic per-...
Github user EronWright commented on the issue: https://github.com/apache/flink/pull/5634 I think the ideal would be that idleness would occur only for tail reads, i.e. due to a timeout from `kafkaConsumer.poll(pollTimeout)`.In other words, an intermittent connection issue would ideally not trigger idleness. ---
[jira] [Created] (FLINK-8873) move unit tests of KeyedStream.scala from DataStreamTest.scala to KeyedStreamTest.scala
Bowen Li created FLINK-8873: --- Summary: move unit tests of KeyedStream.scala from DataStreamTest.scala to KeyedStreamTest.scala Key: FLINK-8873 URL: https://issues.apache.org/jira/browse/FLINK-8873 Project: Flink Issue Type: Improvement Components: DataStream API, Tests Affects Versions: 1.5.0 Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.6.0 move unit tests of KeyedStream.scala from DataStreamTest.scala to KeyedStreamTest.scala, in order to have clearer separation -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8480) Implement Java API to expose join functionality of TimeBoundedStreamJoinOperator
[ https://issues.apache.org/jira/browse/FLINK-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386643#comment-16386643 ] ASF GitHub Bot commented on FLINK-8480: --- Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172303424 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/JoinedStreams.java --- @@ -137,6 +158,151 @@ public EqualTo equalTo(KeySelectorkeySelector) { public WithWindow window(WindowAssigner, W> assigner) { return new WithWindow<>(input1, input2, keySelector1, keySelector2, keyType, assigner, null, null); } + + /** +* Specifies the time boundaries over which the join operation works, so that +* leftElement.timestamp + lowerBound <= rightElement.timestamp <= leftElement.timestamp + upperBound +* By default both the lower and the upper bound are inclusive. This can be configured +* with {@link TimeBounded#lowerBoundExclusive(boolean)} and +* {@link TimeBounded#upperBoundExclusive(boolean)} +* +* @param lowerBound The lower bound. Needs to be smaller than or equal to the upperBound +* @param upperBound The upper bound. Needs to be bigger than or equal to the lowerBound +*/ + public TimeBounded between(Time lowerBound, Time upperBound) { + + TimeCharacteristic timeCharacteristic = + input1.getExecutionEnvironment().getStreamTimeCharacteristic(); + + if (timeCharacteristic != TimeCharacteristic.EventTime) { + throw new RuntimeException("Time-bounded stream joins are only supported in event time"); + } + + checkNotNull(lowerBound, "A lower bound needs to be provided for a time-bounded join"); + checkNotNull(upperBound, "An upper bound needs to be provided for a time-bounded join"); + return new TimeBounded<>( + input1, + input2, + lowerBound.toMilliseconds(), + upperBound.toMilliseconds(), + true, + true, + keySelector1, + keySelector2 + ); + } + } + } + + /** +* Joined streams that have keys for both sides as well as the time boundaries over which +* elements should be joined defined. +* +* @param Input type of elements from the first stream +* @param Input type of elements from the second stream +* @param The type of the key +*/ + public static class TimeBounded { + + private static final String TIMEBOUNDED_JOIN_FUNC_NAME = "TimeBoundedJoin"; + + private final DataStream left; + private final DataStream right; + + private final long lowerBound; + private final long upperBound; + + private final KeySelector keySelector1; + private final KeySelector keySelector2; + + private boolean lowerBoundInclusive; + private boolean upperBoundInclusive; + + public TimeBounded( + DataStream left, + DataStream right, + long lowerBound, + long upperBound, + boolean lowerBoundInclusive, + boolean upperBoundInclusive, + KeySelector keySelector1, + KeySelector keySelector2) { + + this.left = Preconditions.checkNotNull(left); + this.right = Preconditions.checkNotNull(right); + + this.lowerBound = lowerBound; + this.upperBound = upperBound; + + this.lowerBoundInclusive = lowerBoundInclusive; + this.upperBoundInclusive = upperBoundInclusive; + + this.keySelector1 = Preconditions.checkNotNull(keySelector1); + this.keySelector2 =
[jira] [Commented] (FLINK-8480) Implement Java API to expose join functionality of TimeBoundedStreamJoinOperator
[ https://issues.apache.org/jira/browse/FLINK-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386642#comment-16386642 ] ASF GitHub Bot commented on FLINK-8480: --- Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172306197 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/TimeBoundedStreamJoinOperator.java --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.streaming.api.functions; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.state.MapState; +import org.apache.flink.api.common.state.MapStateDescriptor; +import org.apache.flink.api.common.typeutils.TypeSerializer; +import org.apache.flink.api.common.typeutils.base.BooleanSerializer; +import org.apache.flink.api.common.typeutils.base.ListSerializer; +import org.apache.flink.api.common.typeutils.base.LongSerializer; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.api.java.typeutils.runtime.TupleSerializer; +import org.apache.flink.runtime.state.StateInitializationContext; +import org.apache.flink.runtime.state.VoidNamespace; +import org.apache.flink.runtime.state.VoidNamespaceSerializer; +import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator; +import org.apache.flink.streaming.api.operators.InternalTimer; +import org.apache.flink.streaming.api.operators.InternalTimerService; +import org.apache.flink.streaming.api.operators.TimestampedCollector; +import org.apache.flink.streaming.api.operators.Triggerable; +import org.apache.flink.streaming.api.operators.TwoInputStreamOperator; +import org.apache.flink.streaming.api.watermark.Watermark; +import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; +import org.apache.flink.util.OutputTag; +import org.apache.flink.util.Preconditions; + +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +/** + * A TwoInputStreamOperator to execute time-bounded stream inner joins. + * + * By using a configurable lower and upper bound this operator will emit exactly those pairs + * (T1, T2) where t2.ts ∈ [T1.ts + lowerBound, T1.ts + upperBound]. Both the lower and the + * upper bound can be configured to be either inclusive or exclusive. + * + * As soon as elements are joined they are passed to a user-defined {@link TimeBoundedJoinFunction}, + * as a {@link Tuple2}, with f0 being the left element and f1 being the right element + * + * The basic idea of this implementation is as follows: Whenever we receive an element at + * {@link #processElement1(StreamRecord)} (a.k.a. the left side), we add it to the left buffer. + * We then check the right buffer to see whether there are any elements that can be joined. If + * there are, they are joined and passed to a user-defined {@link TimeBoundedJoinFunction}. + * The same happens the other way around when receiving an element on the right side. + * + * In some cases the watermark needs to be delayed. This for example can happen if + * if t2.ts ∈ [t1.ts + 1, t1.ts + 2] and elements from t1 arrive earlier than elements from t2 and + * therefore get added to the left buffer. When an element now arrives on the right side, the + * watermark might have already progressed. The right element now gets joined with an + * older element from the left side, where the timestamp of the left element is lower than the + * current watermark, which would make this element late. This can be avoided by holding back the + * watermarks. + * + * The left and right buffers are cleared from unused values periodically + * (triggered by watermarks) in order not
[jira] [Commented] (FLINK-8480) Implement Java API to expose join functionality of TimeBoundedStreamJoinOperator
[ https://issues.apache.org/jira/browse/FLINK-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386641#comment-16386641 ] ASF GitHub Bot commented on FLINK-8480: --- Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172302583 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/JoinedStreams.java --- @@ -137,6 +158,151 @@ public EqualTo equalTo(KeySelectorkeySelector) { public WithWindow window(WindowAssigner, W> assigner) { return new WithWindow<>(input1, input2, keySelector1, keySelector2, keyType, assigner, null, null); } + + /** +* Specifies the time boundaries over which the join operation works, so that +* leftElement.timestamp + lowerBound <= rightElement.timestamp <= leftElement.timestamp + upperBound +* By default both the lower and the upper bound are inclusive. This can be configured +* with {@link TimeBounded#lowerBoundExclusive(boolean)} and +* {@link TimeBounded#upperBoundExclusive(boolean)} +* +* @param lowerBound The lower bound. Needs to be smaller than or equal to the upperBound +* @param upperBound The upper bound. Needs to be bigger than or equal to the lowerBound +*/ + public TimeBounded between(Time lowerBound, Time upperBound) { + + TimeCharacteristic timeCharacteristic = + input1.getExecutionEnvironment().getStreamTimeCharacteristic(); + + if (timeCharacteristic != TimeCharacteristic.EventTime) { + throw new RuntimeException("Time-bounded stream joins are only supported in event time"); + } + + checkNotNull(lowerBound, "A lower bound needs to be provided for a time-bounded join"); + checkNotNull(upperBound, "An upper bound needs to be provided for a time-bounded join"); + return new TimeBounded<>( + input1, + input2, + lowerBound.toMilliseconds(), + upperBound.toMilliseconds(), + true, + true, + keySelector1, + keySelector2 + ); + } + } + } + + /** +* Joined streams that have keys for both sides as well as the time boundaries over which +* elements should be joined defined. +* +* @param Input type of elements from the first stream +* @param Input type of elements from the second stream +* @param The type of the key +*/ + public static class TimeBounded { + + private static final String TIMEBOUNDED_JOIN_FUNC_NAME = "TimeBoundedJoin"; --- End diff -- hmm... this might be not very relevant, but I'd prefer a single config class that holds all function's names, rather than having them scattered all over the code base. > Implement Java API to expose join functionality of > TimeBoundedStreamJoinOperator > > > Key: FLINK-8480 > URL: https://issues.apache.org/jira/browse/FLINK-8480 > Project: Flink > Issue Type: Sub-task >Reporter: Florian Schmidt >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8480) Implement Java API to expose join functionality of TimeBoundedStreamJoinOperator
[ https://issues.apache.org/jira/browse/FLINK-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386640#comment-16386640 ] ASF GitHub Bot commented on FLINK-8480: --- Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172303671 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/TimeBoundedStreamJoinOperator.java --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.streaming.api.functions; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.state.MapState; +import org.apache.flink.api.common.state.MapStateDescriptor; +import org.apache.flink.api.common.typeutils.TypeSerializer; +import org.apache.flink.api.common.typeutils.base.BooleanSerializer; +import org.apache.flink.api.common.typeutils.base.ListSerializer; +import org.apache.flink.api.common.typeutils.base.LongSerializer; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.api.java.typeutils.runtime.TupleSerializer; +import org.apache.flink.runtime.state.StateInitializationContext; +import org.apache.flink.runtime.state.VoidNamespace; +import org.apache.flink.runtime.state.VoidNamespaceSerializer; +import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator; +import org.apache.flink.streaming.api.operators.InternalTimer; +import org.apache.flink.streaming.api.operators.InternalTimerService; +import org.apache.flink.streaming.api.operators.TimestampedCollector; +import org.apache.flink.streaming.api.operators.Triggerable; +import org.apache.flink.streaming.api.operators.TwoInputStreamOperator; +import org.apache.flink.streaming.api.watermark.Watermark; +import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; +import org.apache.flink.util.OutputTag; +import org.apache.flink.util.Preconditions; + +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +/** + * A TwoInputStreamOperator to execute time-bounded stream inner joins. + * + * By using a configurable lower and upper bound this operator will emit exactly those pairs + * (T1, T2) where t2.ts ∈ [T1.ts + lowerBound, T1.ts + upperBound]. Both the lower and the + * upper bound can be configured to be either inclusive or exclusive. --- End diff -- bound**s** > Implement Java API to expose join functionality of > TimeBoundedStreamJoinOperator > > > Key: FLINK-8480 > URL: https://issues.apache.org/jira/browse/FLINK-8480 > Project: Flink > Issue Type: Sub-task >Reporter: Florian Schmidt >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8480) Implement Java API to expose join functionality of TimeBoundedStreamJoinOperator
[ https://issues.apache.org/jira/browse/FLINK-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386639#comment-16386639 ] ASF GitHub Bot commented on FLINK-8480: --- Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172302147 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/JoinedStreams.java --- @@ -137,6 +158,151 @@ public EqualTo equalTo(KeySelectorkeySelector) { public WithWindow window(WindowAssigner, W> assigner) { return new WithWindow<>(input1, input2, keySelector1, keySelector2, keyType, assigner, null, null); } + + /** +* Specifies the time boundaries over which the join operation works, so that +* leftElement.timestamp + lowerBound <= rightElement.timestamp <= leftElement.timestamp + upperBound +* By default both the lower and the upper bound are inclusive. This can be configured +* with {@link TimeBounded#lowerBoundExclusive(boolean)} and +* {@link TimeBounded#upperBoundExclusive(boolean)} +* +* @param lowerBound The lower bound. Needs to be smaller than or equal to the upperBound +* @param upperBound The upper bound. Needs to be bigger than or equal to the lowerBound +*/ + public TimeBounded between(Time lowerBound, Time upperBound) { + + TimeCharacteristic timeCharacteristic = + input1.getExecutionEnvironment().getStreamTimeCharacteristic(); + + if (timeCharacteristic != TimeCharacteristic.EventTime) { + throw new RuntimeException("Time-bounded stream joins are only supported in event time"); --- End diff -- should use `IllegalStateException`. or even better, shall we create a Flink specific exception? > Implement Java API to expose join functionality of > TimeBoundedStreamJoinOperator > > > Key: FLINK-8480 > URL: https://issues.apache.org/jira/browse/FLINK-8480 > Project: Flink > Issue Type: Sub-task >Reporter: Florian Schmidt >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5482: [FLINK-8480][DataStream] Add Java API for timeboun...
Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172306197 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/TimeBoundedStreamJoinOperator.java --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.streaming.api.functions; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.state.MapState; +import org.apache.flink.api.common.state.MapStateDescriptor; +import org.apache.flink.api.common.typeutils.TypeSerializer; +import org.apache.flink.api.common.typeutils.base.BooleanSerializer; +import org.apache.flink.api.common.typeutils.base.ListSerializer; +import org.apache.flink.api.common.typeutils.base.LongSerializer; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.api.java.typeutils.runtime.TupleSerializer; +import org.apache.flink.runtime.state.StateInitializationContext; +import org.apache.flink.runtime.state.VoidNamespace; +import org.apache.flink.runtime.state.VoidNamespaceSerializer; +import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator; +import org.apache.flink.streaming.api.operators.InternalTimer; +import org.apache.flink.streaming.api.operators.InternalTimerService; +import org.apache.flink.streaming.api.operators.TimestampedCollector; +import org.apache.flink.streaming.api.operators.Triggerable; +import org.apache.flink.streaming.api.operators.TwoInputStreamOperator; +import org.apache.flink.streaming.api.watermark.Watermark; +import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; +import org.apache.flink.util.OutputTag; +import org.apache.flink.util.Preconditions; + +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +/** + * A TwoInputStreamOperator to execute time-bounded stream inner joins. + * + * By using a configurable lower and upper bound this operator will emit exactly those pairs + * (T1, T2) where t2.ts â [T1.ts + lowerBound, T1.ts + upperBound]. Both the lower and the + * upper bound can be configured to be either inclusive or exclusive. + * + * As soon as elements are joined they are passed to a user-defined {@link TimeBoundedJoinFunction}, + * as a {@link Tuple2}, with f0 being the left element and f1 being the right element + * + * The basic idea of this implementation is as follows: Whenever we receive an element at + * {@link #processElement1(StreamRecord)} (a.k.a. the left side), we add it to the left buffer. + * We then check the right buffer to see whether there are any elements that can be joined. If + * there are, they are joined and passed to a user-defined {@link TimeBoundedJoinFunction}. + * The same happens the other way around when receiving an element on the right side. + * + * In some cases the watermark needs to be delayed. This for example can happen if + * if t2.ts â [t1.ts + 1, t1.ts + 2] and elements from t1 arrive earlier than elements from t2 and + * therefore get added to the left buffer. When an element now arrives on the right side, the + * watermark might have already progressed. The right element now gets joined with an + * older element from the left side, where the timestamp of the left element is lower than the + * current watermark, which would make this element late. This can be avoided by holding back the + * watermarks. + * + * The left and right buffers are cleared from unused values periodically + * (triggered by watermarks) in order not to grow infinitely. + * + * + * @param The type of the elements in the left stream + * @param The type of the elements in the right stream + * @param The output type created by the user-defined function + */
[GitHub] flink pull request #5482: [FLINK-8480][DataStream] Add Java API for timeboun...
Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172303424 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/JoinedStreams.java --- @@ -137,6 +158,151 @@ public EqualTo equalTo(KeySelectorkeySelector) { public WithWindow window(WindowAssigner, W> assigner) { return new WithWindow<>(input1, input2, keySelector1, keySelector2, keyType, assigner, null, null); } + + /** +* Specifies the time boundaries over which the join operation works, so that +* leftElement.timestamp + lowerBound <= rightElement.timestamp <= leftElement.timestamp + upperBound +* By default both the lower and the upper bound are inclusive. This can be configured +* with {@link TimeBounded#lowerBoundExclusive(boolean)} and +* {@link TimeBounded#upperBoundExclusive(boolean)} +* +* @param lowerBound The lower bound. Needs to be smaller than or equal to the upperBound +* @param upperBound The upper bound. Needs to be bigger than or equal to the lowerBound +*/ + public TimeBounded between(Time lowerBound, Time upperBound) { + + TimeCharacteristic timeCharacteristic = + input1.getExecutionEnvironment().getStreamTimeCharacteristic(); + + if (timeCharacteristic != TimeCharacteristic.EventTime) { + throw new RuntimeException("Time-bounded stream joins are only supported in event time"); + } + + checkNotNull(lowerBound, "A lower bound needs to be provided for a time-bounded join"); + checkNotNull(upperBound, "An upper bound needs to be provided for a time-bounded join"); + return new TimeBounded<>( + input1, + input2, + lowerBound.toMilliseconds(), + upperBound.toMilliseconds(), + true, + true, + keySelector1, + keySelector2 + ); + } + } + } + + /** +* Joined streams that have keys for both sides as well as the time boundaries over which +* elements should be joined defined. +* +* @param Input type of elements from the first stream +* @param Input type of elements from the second stream +* @param The type of the key +*/ + public static class TimeBounded { + + private static final String TIMEBOUNDED_JOIN_FUNC_NAME = "TimeBoundedJoin"; + + private final DataStream left; + private final DataStream right; + + private final long lowerBound; + private final long upperBound; + + private final KeySelector keySelector1; + private final KeySelector keySelector2; + + private boolean lowerBoundInclusive; + private boolean upperBoundInclusive; + + public TimeBounded( + DataStream left, + DataStream right, + long lowerBound, + long upperBound, + boolean lowerBoundInclusive, + boolean upperBoundInclusive, + KeySelector keySelector1, + KeySelector keySelector2) { + + this.left = Preconditions.checkNotNull(left); + this.right = Preconditions.checkNotNull(right); + + this.lowerBound = lowerBound; + this.upperBound = upperBound; + + this.lowerBoundInclusive = lowerBoundInclusive; + this.upperBoundInclusive = upperBoundInclusive; + + this.keySelector1 = Preconditions.checkNotNull(keySelector1); + this.keySelector2 = Preconditions.checkNotNull(keySelector2); + } + + /** +* Configure whether the upper bound should be considered exclusive or inclusive. +*/ + public TimeBounded
[GitHub] flink pull request #5482: [FLINK-8480][DataStream] Add Java API for timeboun...
Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172302147 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/JoinedStreams.java --- @@ -137,6 +158,151 @@ public EqualTo equalTo(KeySelectorkeySelector) { public WithWindow window(WindowAssigner, W> assigner) { return new WithWindow<>(input1, input2, keySelector1, keySelector2, keyType, assigner, null, null); } + + /** +* Specifies the time boundaries over which the join operation works, so that +* leftElement.timestamp + lowerBound <= rightElement.timestamp <= leftElement.timestamp + upperBound +* By default both the lower and the upper bound are inclusive. This can be configured +* with {@link TimeBounded#lowerBoundExclusive(boolean)} and +* {@link TimeBounded#upperBoundExclusive(boolean)} +* +* @param lowerBound The lower bound. Needs to be smaller than or equal to the upperBound +* @param upperBound The upper bound. Needs to be bigger than or equal to the lowerBound +*/ + public TimeBounded between(Time lowerBound, Time upperBound) { + + TimeCharacteristic timeCharacteristic = + input1.getExecutionEnvironment().getStreamTimeCharacteristic(); + + if (timeCharacteristic != TimeCharacteristic.EventTime) { + throw new RuntimeException("Time-bounded stream joins are only supported in event time"); --- End diff -- should use `IllegalStateException`. or even better, shall we create a Flink specific exception? ---
[GitHub] flink pull request #5482: [FLINK-8480][DataStream] Add Java API for timeboun...
Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172302583 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/datastream/JoinedStreams.java --- @@ -137,6 +158,151 @@ public EqualTo equalTo(KeySelectorkeySelector) { public WithWindow window(WindowAssigner, W> assigner) { return new WithWindow<>(input1, input2, keySelector1, keySelector2, keyType, assigner, null, null); } + + /** +* Specifies the time boundaries over which the join operation works, so that +* leftElement.timestamp + lowerBound <= rightElement.timestamp <= leftElement.timestamp + upperBound +* By default both the lower and the upper bound are inclusive. This can be configured +* with {@link TimeBounded#lowerBoundExclusive(boolean)} and +* {@link TimeBounded#upperBoundExclusive(boolean)} +* +* @param lowerBound The lower bound. Needs to be smaller than or equal to the upperBound +* @param upperBound The upper bound. Needs to be bigger than or equal to the lowerBound +*/ + public TimeBounded between(Time lowerBound, Time upperBound) { + + TimeCharacteristic timeCharacteristic = + input1.getExecutionEnvironment().getStreamTimeCharacteristic(); + + if (timeCharacteristic != TimeCharacteristic.EventTime) { + throw new RuntimeException("Time-bounded stream joins are only supported in event time"); + } + + checkNotNull(lowerBound, "A lower bound needs to be provided for a time-bounded join"); + checkNotNull(upperBound, "An upper bound needs to be provided for a time-bounded join"); + return new TimeBounded<>( + input1, + input2, + lowerBound.toMilliseconds(), + upperBound.toMilliseconds(), + true, + true, + keySelector1, + keySelector2 + ); + } + } + } + + /** +* Joined streams that have keys for both sides as well as the time boundaries over which +* elements should be joined defined. +* +* @param Input type of elements from the first stream +* @param Input type of elements from the second stream +* @param The type of the key +*/ + public static class TimeBounded { + + private static final String TIMEBOUNDED_JOIN_FUNC_NAME = "TimeBoundedJoin"; --- End diff -- hmm... this might be not very relevant, but I'd prefer a single config class that holds all function's names, rather than having them scattered all over the code base. ---
[GitHub] flink pull request #5482: [FLINK-8480][DataStream] Add Java API for timeboun...
Github user bowenli86 commented on a diff in the pull request: https://github.com/apache/flink/pull/5482#discussion_r172303671 --- Diff: flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/TimeBoundedStreamJoinOperator.java --- @@ -0,0 +1,480 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.streaming.api.functions; + +import org.apache.flink.annotation.Internal; +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.api.common.state.MapState; +import org.apache.flink.api.common.state.MapStateDescriptor; +import org.apache.flink.api.common.typeutils.TypeSerializer; +import org.apache.flink.api.common.typeutils.base.BooleanSerializer; +import org.apache.flink.api.common.typeutils.base.ListSerializer; +import org.apache.flink.api.common.typeutils.base.LongSerializer; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.api.java.tuple.Tuple3; +import org.apache.flink.api.java.typeutils.runtime.TupleSerializer; +import org.apache.flink.runtime.state.StateInitializationContext; +import org.apache.flink.runtime.state.VoidNamespace; +import org.apache.flink.runtime.state.VoidNamespaceSerializer; +import org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator; +import org.apache.flink.streaming.api.operators.InternalTimer; +import org.apache.flink.streaming.api.operators.InternalTimerService; +import org.apache.flink.streaming.api.operators.TimestampedCollector; +import org.apache.flink.streaming.api.operators.Triggerable; +import org.apache.flink.streaming.api.operators.TwoInputStreamOperator; +import org.apache.flink.streaming.api.watermark.Watermark; +import org.apache.flink.streaming.runtime.streamrecord.StreamRecord; +import org.apache.flink.util.OutputTag; +import org.apache.flink.util.Preconditions; + +import java.util.ArrayList; +import java.util.Iterator; +import java.util.List; +import java.util.Map; + +/** + * A TwoInputStreamOperator to execute time-bounded stream inner joins. + * + * By using a configurable lower and upper bound this operator will emit exactly those pairs + * (T1, T2) where t2.ts â [T1.ts + lowerBound, T1.ts + upperBound]. Both the lower and the + * upper bound can be configured to be either inclusive or exclusive. --- End diff -- bound**s** ---
[jira] [Updated] (FLINK-8829) Flink in EMR(YARN) is down due to Akka communication issue
[ https://issues.apache.org/jira/browse/FLINK-8829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Filichkin updated FLINK-8829: --- Description: Hi, We have running Flink 1.3.2 app in Amazon EMR with YARN. Every week our Flink job is down due to: _2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]]] Caused by: [Connection refused: ip-10-97-34-209.tr-fr-nonprod.aws-int.com/10.97.34.209:42177] 2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]] 2018-02-16 19:00:05,596 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to JobManager akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177/user/jobmanager|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager]. Triggering connection timeout._ Do you have any ideas how to troubleshoot it? was: Hi, We have running Flink 1.3.2 app in Amazon EMR with YARN. Every week our Flink job is down due to: _2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - Association with remote system [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]]] Caused by: [Connection refused: ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com/10.97.34.209:42177] 2018-02-16 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177]] 2018-02-16 19:00:05,596 INFO org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to JobManager akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager]. Triggering connection timeout._ Do you have any ideas how to troubleshoot it? > Flink in EMR(YARN) is down due to Akka communication issue > -- > > Key: FLINK-8829 > URL: https://issues.apache.org/jira/browse/FLINK-8829 > Project: Flink > Issue Type: Bug > Components: YARN >Affects Versions: 1.3.2 >Reporter: Aleksandr Filichkin >Priority: Major > > Hi, > We have running Flink 1.3.2 app in Amazon EMR with YARN. Every week our Flink > job is down due to: > _2018-02-16 19:00:04,595 WARN akka.remote.ReliableDeliverySupervisor - > Association with remote system > [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]] > has failed, address is now gated for [5000] ms. Reason: [Association failed > with > [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]]] > Caused by: [Connection refused: > ip-10-97-34-209.tr-fr-nonprod.aws-int.com/10.97.34.209:42177] 2018-02-16 > 19:00:05,593 WARN akka.remote.RemoteWatcher - Detected unreachable: > [akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177]] > 2018-02-16 19:00:05,596 INFO > org.apache.flink.runtime.client.JobSubmissionClientActor - Lost connection to > JobManager > akka.tcp://[fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.com:42177/user/jobmanager|mailto:fl...@ip-10-97-34-209.tr-fr-nonprod.aws-int.thomsonreuters.com:42177/user/jobmanager]. > Triggering connection timeout._ > Do you have any ideas how to troubleshoot it? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5642: [FLINK-8091] [flink-dist] Support running historys...
GitHub user fnk opened a pull request: https://github.com/apache/flink/pull/5642 [FLINK-8091] [flink-dist] Support running historyserver in foreground ## Brief change log *Allow historyserver to run in foreground* - The scripts "flink-console.sh" and "historyserver.sh" were adjusted to handle the "start-foreground" flag - The documentation for historyserver was adjusted accordingly ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. We use this change in our production setup with flink 1.3 but as far as i can see the related scripts did not change between releases. "mvn clean verify" succeeded. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? yes - If yes, how is the feature documented? docs You can merge this pull request into a Git repository by running: $ git pull https://github.com/noris-network/flink FLINK-8091 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5642.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5642 commit 1fdc24a86b59843020e7ccecb82e9f7daf9f34f7 Author: Andreas FinkDate: 2018-03-05T17:26:57Z [FLINK-8091] Support running historyserver in foreground ---
[jira] [Commented] (FLINK-8091) Support running historyserver in foreground
[ https://issues.apache.org/jira/browse/FLINK-8091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386414#comment-16386414 ] ASF GitHub Bot commented on FLINK-8091: --- GitHub user fnk opened a pull request: https://github.com/apache/flink/pull/5642 [FLINK-8091] [flink-dist] Support running historyserver in foreground ## Brief change log *Allow historyserver to run in foreground* - The scripts "flink-console.sh" and "historyserver.sh" were adjusted to handle the "start-foreground" flag - The documentation for historyserver was adjusted accordingly ## Verifying this change This change is a trivial rework / code cleanup without any test coverage. We use this change in our production setup with flink 1.3 but as far as i can see the related scripts did not change between releases. "mvn clean verify" succeeded. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? yes - If yes, how is the feature documented? docs You can merge this pull request into a Git repository by running: $ git pull https://github.com/noris-network/flink FLINK-8091 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5642.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5642 commit 1fdc24a86b59843020e7ccecb82e9f7daf9f34f7 Author: Andreas FinkDate: 2018-03-05T17:26:57Z [FLINK-8091] Support running historyserver in foreground > Support running historyserver in foreground > --- > > Key: FLINK-8091 > URL: https://issues.apache.org/jira/browse/FLINK-8091 > Project: Flink > Issue Type: Improvement > Components: Docker, History Server >Affects Versions: 1.3.2 >Reporter: Joshua Griffith >Priority: Minor > > The historyserver runs as a daemon. To use it with Docker it would be > convenient if it supported running in the foreground. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8703) Migrate tests from LocalFlinkMiniCluster to MiniClusterResource
[ https://issues.apache.org/jira/browse/FLINK-8703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386404#comment-16386404 ] ASF GitHub Bot commented on FLINK-8703: --- Github user zentol closed the pull request at: https://github.com/apache/flink/pull/5636 > Migrate tests from LocalFlinkMiniCluster to MiniClusterResource > --- > > Key: FLINK-8703 > URL: https://issues.apache.org/jira/browse/FLINK-8703 > Project: Flink > Issue Type: Sub-task > Components: Tests >Reporter: Aljoscha Krettek >Assignee: Chesnay Schepler >Priority: Blocker > Fix For: 1.5.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5636: [FLINK-8703][tests] Port CancelingTestBase to Mini...
Github user zentol closed the pull request at: https://github.com/apache/flink/pull/5636 ---
[jira] [Commented] (FLINK-8818) Harden YarnFileStageTest upload test for eventual consistent read-after-write
[ https://issues.apache.org/jira/browse/FLINK-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386398#comment-16386398 ] ASF GitHub Bot commented on FLINK-8818: --- Github user NicoK commented on the issue: https://github.com/apache/flink/pull/5601 The main issue was actually described in FLINK-8801: According to https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel: > Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write. Some S3 file system implementations may actually execute such a request for the about-to-write object and thus the read-after-write is only eventually consistent. > Harden YarnFileStageTest upload test for eventual consistent read-after-write > - > > Key: FLINK-8818 > URL: https://issues.apache.org/jira/browse/FLINK-8818 > Project: Flink > Issue Type: Sub-task > Components: FileSystem, Tests, YARN >Affects Versions: 1.5.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Fix For: 1.5.0, 1.4.3 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5601: [FLINK-8818][yarn/s3][tests] harden YarnFileStageTest upl...
Github user NicoK commented on the issue: https://github.com/apache/flink/pull/5601 The main issue was actually described in FLINK-8801: According to https://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel: > Amazon S3 provides read-after-write consistency for PUTS of new objects in your S3 bucket in all regions with one caveat. The caveat is that if you make a HEAD or GET request to the key name (to find if the object exists) before creating the object, Amazon S3 provides eventual consistency for read-after-write. Some S3 file system implementations may actually execute such a request for the about-to-write object and thus the read-after-write is only eventually consistent. ---
[jira] [Closed] (FLINK-8757) Add MiniClusterResource.getClusterClient()
[ https://issues.apache.org/jira/browse/FLINK-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-8757. --- Resolution: Duplicate Fix Version/s: (was: 1.5.0) > Add MiniClusterResource.getClusterClient() > -- > > Key: FLINK-8757 > URL: https://issues.apache.org/jira/browse/FLINK-8757 > Project: Flink > Issue Type: Improvement > Components: Client, Tests >Affects Versions: 1.5.0 >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek >Priority: Blocker > > Some of the tests that we need to port as part of FLINK-8700 need a way to > submit jobs asynchronously to the testing cluster. For this, we need to be > able to retrieve a {{ClusterClient}} and expose a method for asynchronous job > submission. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8871) Checkpoint cancellation is not propagated to stop checkpointing threads on the task manager
Stefan Richter created FLINK-8871: - Summary: Checkpoint cancellation is not propagated to stop checkpointing threads on the task manager Key: FLINK-8871 URL: https://issues.apache.org/jira/browse/FLINK-8871 Project: Flink Issue Type: Bug Components: State Backends, Checkpointing Affects Versions: 1.4.1, 1.3.2, 1.5.0 Reporter: Stefan Richter Fix For: 1.6.0 Flink currently lacks any form of feedback mechanism from the job manager / checkpoint coordinator to the tasks when it comes to failing a checkpoint. This means that running snapshots on the tasks are also not stopped even if their owning checkpoint is already cancelled. Two examples for cases where this applies are checkpoint timeouts and local checkpoint failures on a task together with a configuration that does not fail tasks on checkpoint failure. Notice that those running snapshots do no longer account for the maximum number of parallel checkpoints, because their owning checkpoint is considered as cancelled. Not stopping the task's snapshot thread can lead to a problematic situation where the next checkpoints already started, while the abandoned checkpoint thread from a previous checkpoint is still lingering around running. This scenario can potentially cascade: many parallel checkpoints will slow down checkpointing and make timeouts even more likely. A possible solution is introducing a {{cancelCheckpoint}} method as counterpart to the {{triggerCheckpoint}} method in the task manager gateway, which is invoked by the checkpoint coordinator as part of cancelling the checkpoint. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8872) Yarn detached mode via -yd does not detach
Nico Kruber created FLINK-8872: -- Summary: Yarn detached mode via -yd does not detach Key: FLINK-8872 URL: https://issues.apache.org/jira/browse/FLINK-8872 Project: Flink Issue Type: Bug Components: Client, YARN Affects Versions: 1.5.0 Reporter: Nico Kruber Assignee: Nico Kruber Fix For: 1.5.0 Running yarn per-job cluster in detached mode currently does not work and waits for the job to finish. Example: {code} ./bin/flink run -m yarn-cluster -yn 10 -yjm 768 -ytm 3072 -ys 2 -yd -p 20 -c org.apache.flink.streaming.examples.wordcount.WordCount ./examples/streaming/WordCount.jar --input {code} Output in case of an infinite program would then end with something like this: {code} 2018-03-05 13:41:23,311 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Waiting for the cluster to be allocated 2018-03-05 13:41:23,313 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - Deploying cluster, current state ACCEPTED 2018-03-05 13:41:28,342 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - YARN application has been deployed successfully. 2018-03-05 13:41:28,343 INFO org.apache.flink.yarn.AbstractYarnClusterDescriptor - The Flink YARN client has been started in detached mode. In order to stop Flink on YARN, use the following command or a YARN web interface to stop it: yarn application -kill application_1519984124671_0006 Please also note that the temporary files of the YARN session in the home directoy will not be removed. Starting execution of program {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-8849) Wrong link from concepts/runtime to doc on chaining
[ https://issues.apache.org/jira/browse/FLINK-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-8849. --- Resolution: Fixed Assignee: Ken Krugler Fix Version/s: 1.4.2 1.5.0 master: 38785a0072b58b0238615a3bdf8f6da579f98154 1.5: e19b9fd9debcc56747b1ba364cbe1d5837274d86 1.4: 5aa481d5aa2a22755959fdc48e3449535f2b44e9 > Wrong link from concepts/runtime to doc on chaining > --- > > Key: FLINK-8849 > URL: https://issues.apache.org/jira/browse/FLINK-8849 > Project: Flink > Issue Type: Bug > Components: Documentation >Reporter: Ken Krugler >Assignee: Ken Krugler >Priority: Minor > Fix For: 1.5.0, 1.4.2 > > > On > https://ci.apache.org/projects/flink/flink-docs-master/concepts/runtime.html > there's a link to "chaining docs" that currently points at: > https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_api.html#task-chaining-and-resource-groups > but it should link to: > https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#task-chaining-and-resource-groups -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8601) Introduce PartitionedBloomFilter for Approximate calculation and other situations of performance optimization
[ https://issues.apache.org/jira/browse/FLINK-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386365#comment-16386365 ] ASF GitHub Bot commented on FLINK-8601: --- Github user sihuazhou commented on the issue: https://github.com/apache/flink/pull/5641 For the sake of easy to discussion later.. > Introduce PartitionedBloomFilter for Approximate calculation and other > situations of performance optimization > - > > Key: FLINK-8601 > URL: https://issues.apache.org/jira/browse/FLINK-8601 > Project: Flink > Issue Type: New Feature > Components: DataStream API, State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Sihua Zhou >Assignee: Sihua Zhou >Priority: Major > > h3. Backgroud > Bloom filter is useful in many situation, for example: > * 1. Approximate calculation: deduplication (eg: UV calculation) > * 2. Performance optimization: eg, [runtime filter > join|https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html] >By using BF, we can greatly reduce the number of queries for state > data in a stream join, and these filtered queries will eventually fail to > find any results, which is a poor performance for rocksdb-based state due to > traversing ```sst``` on the disk. > However, based on the current status provided by flink, it is hard to use the > bloom filter for the following reasons: > * 1. Serialization problem: Bloom filter status can be large (for example: > 100M), if implement it based on the RocksDB state, the state data will need > to be serialized each time it is queried and updated, and the performance > will be very poor. > * 2. Data skewed: Data in different key group can be skewed, and the > information of data skewed can not be accurately predicted before the program > is running. Therefore, it is impossible to determine how much resources bloom > filter should allocate. One way to do this is to allocate space needed for > the most skewed case, but this can lead to very serious waste of resources. > h3. Requirement > Therefore, I introduce the PartitionedBloomFilter for flink, which at least > need to meet the following features: > * 1. Support for changing Parallelism > * 2. Only serialize when necessary: when performing checkpoint > * 3. Can deal with data skew problem: users only need to specify a > PartitionedBloomFilter with the desired input, fpp, system will allocate > resource dynamic. > * 4. Do not conflict with other state: user can use KeyedState and > OperateState when using this bloom filter. > * 5. Support relax ttl (ie: the data survival time at least greater than the > specified time) > Design doc: [design > doc|https://docs.google.com/document/d/1s8w2dkNFDM9Fb2zoHwHY0hJRrqatAFta42T97nDXmqc/edit?usp=sharing] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5641: [FLINK-8601][WIP] Introduce PartitionedBloomFilter for Ap...
Github user sihuazhou commented on the issue: https://github.com/apache/flink/pull/5641 For the sake of easy to discussion later.. ---
[jira] [Closed] (FLINK-8857) HBase connector read example throws exception at the end.
[ https://issues.apache.org/jira/browse/FLINK-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chesnay Schepler closed FLINK-8857. --- Resolution: Fixed Fix Version/s: 1.4.3 1.5.0 master: 72bba50aa3382dbfe904cdb36797bc716f76a129 1.5: 302781dfe5d9e0a6b9246732ac3227db2ced64f6 1.4: 87af955b3a0d9e173c1901c5998cf5876cff91ea > HBase connector read example throws exception at the end. > - > > Key: FLINK-8857 > URL: https://issues.apache.org/jira/browse/FLINK-8857 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.2.0, 1.3.2, 1.5.0, 1.4.1 >Reporter: Xu Zhang >Assignee: Xu Zhang >Priority: Trivial > Labels: easy-fix, starter > Fix For: 1.5.0, 1.4.3 > > > Running test case example of > {code:java} > flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseReadExample.java{code} > Although the result has been printed out successfully, but at the end, driver > will throw the following exception. > {code:java} > > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: The main method > caused an error. > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:381) > at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:838) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:259) > at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130) > Caused by: java.lang.RuntimeException: No new data sinks have been defined > since the last execution. The last execution refers to the latest call to > 'execute()', 'count()', 'collect()', or 'print()'. > at > org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1050) > at > org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1032) > at > org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:59) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926) > at com.hulu.ap.flink.scoring.pipeline.ScoringJob.main(ScoringJob.java:82) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) > ... 13 more > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5641: [FLINK-8601][WIP] Introduce PartitionedBloomFilter...
GitHub user sihuazhou opened a pull request: https://github.com/apache/flink/pull/5641 [FLINK-8601][WIP] Introduce PartitionedBloomFilter for Approximate calculation and other situations of performance optimization This PR introduce PartitionedBloomFilter which support rescaling and can deal with data skew problem properly. ## Brief change log - introduce PartitionedBloomFilter for Approximate calculation and other situations of performance optimization. ## Verifying this change This change can be verified by the unit tests in below files: - PartitionedBloomFilterTest.java - LinkedBloomFilterTest.java - LinkedBloomFilterNodeTest.java - PartitionedBloomFilterManagerTest.java ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (yes) doc: [google doc](https://docs.google.com/document/d/1s8w2dkNFDM9Fb2zoHwHY0hJRrqatAFta42T97nDXmqc/edit?usp=sharing) You can merge this pull request into a Git repository by running: $ git pull https://github.com/sihuazhou/flink bloomfilter_state Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5641.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5641 commit 5429abe0031a93596b12dada6e9696f3179eb4e8 Author: summerleafsDate: 2018-02-06T16:47:25Z introduce bloom filter state. commit 2d1f66c10fbf74272be76283b909b290ae55d4fd Author: summerleafs Date: 2018-02-07T14:52:22Z add unit tests for bloom filter state. commit 433370a12814f7bd80127d4508e1dd0812a9d3fe Author: summerleafs Date: 2018-02-07T18:12:13Z add general type support. commit 5e05ee84353516fe7ff6eb7dd3a01dfdb3337bc5 Author: summerleafs Date: 2018-02-09T15:10:11Z this is a tmp commit. commit 6e4ff0cebed853c598e0647e9f8aa56b5b59d0cc Author: summerleafs Date: 2018-02-10T14:30:13Z this is a tmp commit. commit aa672e6e1e89b185722fde44a9b4044b87010c99 Author: summerleafs Date: 2018-02-10T15:32:01Z this is a tmp commit. commit 3b04502ba277cad2a7b0bc381fb192d18b56f17d Author: summerleafs Date: 2018-02-11T11:34:54Z fix build. commit 775d6aaf354de35c7ddff242f8e006e13e9a0e76 Author: summerleafs Date: 2018-02-12T03:52:43Z add annotation for classes. commit b7f04303aa1ec1fbe9696bb58b13838b6a74a7ae Author: summerleafs Date: 2018-02-12T03:53:19Z a temp commit. commit 28222bf5fc352a26082f2aee19be70ca5f9aa9d9 Author: sihuazhou Date: 2018-03-05T16:48:15Z fix build. ---
[jira] [Commented] (FLINK-8601) Introduce PartitionedBloomFilter for Approximate calculation and other situations of performance optimization
[ https://issues.apache.org/jira/browse/FLINK-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386361#comment-16386361 ] ASF GitHub Bot commented on FLINK-8601: --- GitHub user sihuazhou opened a pull request: https://github.com/apache/flink/pull/5641 [FLINK-8601][WIP] Introduce PartitionedBloomFilter for Approximate calculation and other situations of performance optimization This PR introduce PartitionedBloomFilter which support rescaling and can deal with data skew problem properly. ## Brief change log - introduce PartitionedBloomFilter for Approximate calculation and other situations of performance optimization. ## Verifying this change This change can be verified by the unit tests in below files: - PartitionedBloomFilterTest.java - LinkedBloomFilterTest.java - LinkedBloomFilterNodeTest.java - PartitionedBloomFilterManagerTest.java ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (yes) doc: [google doc](https://docs.google.com/document/d/1s8w2dkNFDM9Fb2zoHwHY0hJRrqatAFta42T97nDXmqc/edit?usp=sharing) You can merge this pull request into a Git repository by running: $ git pull https://github.com/sihuazhou/flink bloomfilter_state Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5641.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5641 commit 5429abe0031a93596b12dada6e9696f3179eb4e8 Author: summerleafsDate: 2018-02-06T16:47:25Z introduce bloom filter state. commit 2d1f66c10fbf74272be76283b909b290ae55d4fd Author: summerleafs Date: 2018-02-07T14:52:22Z add unit tests for bloom filter state. commit 433370a12814f7bd80127d4508e1dd0812a9d3fe Author: summerleafs Date: 2018-02-07T18:12:13Z add general type support. commit 5e05ee84353516fe7ff6eb7dd3a01dfdb3337bc5 Author: summerleafs Date: 2018-02-09T15:10:11Z this is a tmp commit. commit 6e4ff0cebed853c598e0647e9f8aa56b5b59d0cc Author: summerleafs Date: 2018-02-10T14:30:13Z this is a tmp commit. commit aa672e6e1e89b185722fde44a9b4044b87010c99 Author: summerleafs Date: 2018-02-10T15:32:01Z this is a tmp commit. commit 3b04502ba277cad2a7b0bc381fb192d18b56f17d Author: summerleafs Date: 2018-02-11T11:34:54Z fix build. commit 775d6aaf354de35c7ddff242f8e006e13e9a0e76 Author: summerleafs Date: 2018-02-12T03:52:43Z add annotation for classes. commit b7f04303aa1ec1fbe9696bb58b13838b6a74a7ae Author: summerleafs Date: 2018-02-12T03:53:19Z a temp commit. commit 28222bf5fc352a26082f2aee19be70ca5f9aa9d9 Author: sihuazhou Date: 2018-03-05T16:48:15Z fix build. > Introduce PartitionedBloomFilter for Approximate calculation and other > situations of performance optimization > - > > Key: FLINK-8601 > URL: https://issues.apache.org/jira/browse/FLINK-8601 > Project: Flink > Issue Type: New Feature > Components: DataStream API, State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Sihua Zhou >Assignee: Sihua Zhou >Priority: Major > > h3. Backgroud > Bloom filter is useful in many situation, for example: > * 1. Approximate calculation: deduplication (eg: UV calculation) > * 2. Performance optimization: eg, [runtime filter > join|https://www.cloudera.com/documentation/enterprise/5-9-x/topics/impala_runtime_filtering.html] >By using BF, we can greatly reduce the number of queries for state > data in a stream join, and these filtered queries will eventually fail to > find any results, which is a poor performance for rocksdb-based state due to > traversing ```sst``` on the disk. > However, based on the current status provided by flink, it is hard to use the > bloom filter for the following reasons: > * 1. Serialization problem: Bloom filter status can be large (for example: > 100M), if implement it based on
[jira] [Resolved] (FLINK-8337) GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled instable
[ https://issues.apache.org/jira/browse/FLINK-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nico Kruber resolved FLINK-8337. Resolution: Fixed Assignee: Nico Kruber Fix Version/s: (was: 1.4.3) Should be fixed with FLINK-8517. > GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled instable > > > Key: FLINK-8337 > URL: https://issues.apache.org/jira/browse/FLINK-8337 > Project: Flink > Issue Type: Bug > Components: Gelly, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Assignee: Nico Kruber >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > The {{GatherSumApplyITCase.testConnectedComponentsWithObjectReuseDisabled}} > fails on Travis. It looks as if a sub partition has not been registered at > the task event dispatcher. > https://travis-ci.org/apache/flink/jobs/323930301 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8849) Wrong link from concepts/runtime to doc on chaining
[ https://issues.apache.org/jira/browse/FLINK-8849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386350#comment-16386350 ] ASF GitHub Bot commented on FLINK-8849: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5630 > Wrong link from concepts/runtime to doc on chaining > --- > > Key: FLINK-8849 > URL: https://issues.apache.org/jira/browse/FLINK-8849 > Project: Flink > Issue Type: Bug > Components: Documentation >Reporter: Ken Krugler >Priority: Minor > > On > https://ci.apache.org/projects/flink/flink-docs-master/concepts/runtime.html > there's a link to "chaining docs" that currently points at: > https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_api.html#task-chaining-and-resource-groups > but it should link to: > https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#task-chaining-and-resource-groups -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8857) HBase connector read example throws exception at the end.
[ https://issues.apache.org/jira/browse/FLINK-8857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386351#comment-16386351 ] ASF GitHub Bot commented on FLINK-8857: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5633 > HBase connector read example throws exception at the end. > - > > Key: FLINK-8857 > URL: https://issues.apache.org/jira/browse/FLINK-8857 > Project: Flink > Issue Type: Bug > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.2.0, 1.3.2, 1.5.0, 1.4.1 >Reporter: Xu Zhang >Assignee: Xu Zhang >Priority: Trivial > Labels: easy-fix, starter > > Running test case example of > {code:java} > flink-hbase/src/test/java/org/apache/flink/addons/hbase/example/HBaseReadExample.java{code} > Although the result has been printed out successfully, but at the end, driver > will throw the following exception. > {code:java} > > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: The main method > caused an error. > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:381) > at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:838) > at org.apache.flink.client.CliFrontend.run(CliFrontend.java:259) > at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1086) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1133) > at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1130) > at > org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656) > at > org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40) > at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1130) > Caused by: java.lang.RuntimeException: No new data sinks have been defined > since the last execution. The last execution refers to the latest call to > 'execute()', 'count()', 'collect()', or 'print()'. > at > org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1050) > at > org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1032) > at > org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:59) > at > org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:926) > at com.hulu.ap.flink.scoring.pipeline.ScoringJob.main(ScoringJob.java:82) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528) > ... 13 more > {code} > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-7226) REST responses contain invalid content-encoding header
[ https://issues.apache.org/jira/browse/FLINK-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386349#comment-16386349 ] ASF GitHub Bot commented on FLINK-7226: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5590 > REST responses contain invalid content-encoding header > -- > > Key: FLINK-7226 > URL: https://issues.apache.org/jira/browse/FLINK-7226 > Project: Flink > Issue Type: Bug > Components: Webfrontend >Affects Versions: 1.2.0, 1.1.4, 1.3.1, 1.4.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Major > Fix For: 1.4.0, 1.3.2 > > > FLINK-5705 made changes to the {{RuntimeMonitorHandler}} to set the > {{content-encoding}} header to {{UTF-8}}. This however isn't a valid value > for this header, and should instead be included in the {{content-type}} > header. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5590: [hotfix][REST] Fix CONTENT_TYPE header
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5590 ---
[GitHub] flink pull request #5592: [hotfix] fix javadoc link of ClusterClient#trigger...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5592 ---
[GitHub] flink pull request #5607: [hotfix][docs] Drop the incorrect parallel remark ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5607 ---
[GitHub] flink pull request #5633: [FLINK-8857] [Hbase] Avoid HBase connector read ex...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5633 ---
[GitHub] flink pull request #5630: [FLINK-8849][Documentation] Fix link to chaining d...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5630 ---
[GitHub] flink pull request #5627: [doc] Remove missed CheckpointedRestoring
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5627 ---
[jira] [Commented] (FLINK-8818) Harden YarnFileStageTest upload test for eventual consistent read-after-write
[ https://issues.apache.org/jira/browse/FLINK-8818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386341#comment-16386341 ] ASF GitHub Bot commented on FLINK-8818: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/5601 S3 is actually strongly consistent when reading newly created objects, just not in listing or renaming objects (files). The test seems to actually use reads of full paths, so wondering why there is a failure in the first place. If there is an issue that the Yarn upload code relies on eventually consistent operations, then fixing the test by retries may disguise the actual issue. If there is no eventually consistent operation, then this should not be necessary in the first place. I fear this change may be down a tricky path... Can you explain/double check why the failure happened and why the retry is necessary to stabilize the test, but the actual Yarn code is not affected by this? > Harden YarnFileStageTest upload test for eventual consistent read-after-write > - > > Key: FLINK-8818 > URL: https://issues.apache.org/jira/browse/FLINK-8818 > Project: Flink > Issue Type: Sub-task > Components: FileSystem, Tests, YARN >Affects Versions: 1.5.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Fix For: 1.5.0, 1.4.3 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5601: [FLINK-8818][yarn/s3][tests] harden YarnFileStageTest upl...
Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/5601 S3 is actually strongly consistent when reading newly created objects, just not in listing or renaming objects (files). The test seems to actually use reads of full paths, so wondering why there is a failure in the first place. If there is an issue that the Yarn upload code relies on eventually consistent operations, then fixing the test by retries may disguise the actual issue. If there is no eventually consistent operation, then this should not be necessary in the first place. I fear this change may be down a tricky path... Can you explain/double check why the failure happened and why the retry is necessary to stabilize the test, but the actual Yarn code is not affected by this? ---
[jira] [Closed] (FLINK-8870) End-to-end tests wrongly pass if md5sum is not installed
[ https://issues.apache.org/jira/browse/FLINK-8870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Schmidt closed FLINK-8870. -- Resolution: Invalid Behaves as expected, I just misinterpreted the output > End-to-end tests wrongly pass if md5sum is not installed > > > Key: FLINK-8870 > URL: https://issues.apache.org/jira/browse/FLINK-8870 > Project: Flink > Issue Type: Bug > Components: Tests >Affects Versions: 1.4.1 >Reporter: Florian Schmidt >Priority: Major > Fix For: 1.5.0 > > > Actual: The end-to-end tests don't fail if md5sum is not installed > {code:bash} > Job with JobID 95f6a482cc9800f1daeb907f4940a116 has finished. > Job Runtime: 1067 ms > /Users/florianschmidt/dev/flink/flink-end-to-end-tests/test-scripts/common.sh: > line 120: md5sum: command not found > pass WordCount > Stopping taskexecutor daemon (pid: 33007) on host Florians-MBP.fritz.box. > Stopping standalonesession daemon (pid: 32715) on host Florians-MBP.fritz.box. > All tests PASS > {code} > Expected: The tests should fail -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8807) ZookeeperCompleted checkpoint store can get stuck in infinite loop
[ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386260#comment-16386260 ] Aljoscha Krettek commented on FLINK-8807: - Fixed on release-1.4 in 64adf4bcfa387047036534059cfe361977439cca > ZookeeperCompleted checkpoint store can get stuck in infinite loop > -- > > Key: FLINK-8807 > URL: https://issues.apache.org/jira/browse/FLINK-8807 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Aljoscha Krettek >Priority: Blocker > Fix For: 1.5.0 > > > This code: > https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201 > can be stuck forever if at least one checkpoint is not readable because > {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}} > anymore. > We have to fix this and also add a unit test that verifies the loop still > works if we make one snapshot unreadable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8870) End-to-end tests wrongly pass if md5sum is not installed
Florian Schmidt created FLINK-8870: -- Summary: End-to-end tests wrongly pass if md5sum is not installed Key: FLINK-8870 URL: https://issues.apache.org/jira/browse/FLINK-8870 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 1.4.1 Reporter: Florian Schmidt Fix For: 1.5.0 Actual: The end-to-end tests don't fail if md5sum is not installed {code:bash} Job with JobID 95f6a482cc9800f1daeb907f4940a116 has finished. Job Runtime: 1067 ms /Users/florianschmidt/dev/flink/flink-end-to-end-tests/test-scripts/common.sh: line 120: md5sum: command not found pass WordCount Stopping taskexecutor daemon (pid: 33007) on host Florians-MBP.fritz.box. Stopping standalonesession daemon (pid: 32715) on host Florians-MBP.fritz.box. All tests PASS {code} Expected: The tests should fail -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8807) ZookeeperCompleted checkpoint store can get stuck in infinite loop
[ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386252#comment-16386252 ] ASF GitHub Bot commented on FLINK-8807: --- Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/5623 Thanks for the comments! I'll incorporate them and merge. > ZookeeperCompleted checkpoint store can get stuck in infinite loop > -- > > Key: FLINK-8807 > URL: https://issues.apache.org/jira/browse/FLINK-8807 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Aljoscha Krettek >Priority: Blocker > Fix For: 1.5.0 > > > This code: > https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201 > can be stuck forever if at least one checkpoint is not readable because > {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}} > anymore. > We have to fix this and also add a unit test that verifies the loop still > works if we make one snapshot unreadable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8807) ZookeeperCompleted checkpoint store can get stuck in infinite loop
[ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386253#comment-16386253 ] ASF GitHub Bot commented on FLINK-8807: --- Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/5623 > ZookeeperCompleted checkpoint store can get stuck in infinite loop > -- > > Key: FLINK-8807 > URL: https://issues.apache.org/jira/browse/FLINK-8807 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Aljoscha Krettek >Priority: Blocker > Fix For: 1.5.0 > > > This code: > https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201 > can be stuck forever if at least one checkpoint is not readable because > {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}} > anymore. > We have to fix this and also add a unit test that verifies the loop still > works if we make one snapshot unreadable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5623: [FLINK-8807] Fix ZookeeperCompleted checkpoint store can ...
Github user aljoscha commented on the issue: https://github.com/apache/flink/pull/5623 Thanks for the comments! I'll incorporate them and merge. ---
[GitHub] flink pull request #5623: [FLINK-8807] Fix ZookeeperCompleted checkpoint sto...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/5623 ---
[jira] [Created] (FLINK-8869) Kafka restore from checkpoint without react to the new add partition to kafka server
aitozi created FLINK-8869: - Summary: Kafka restore from checkpoint without react to the new add partition to kafka server Key: FLINK-8869 URL: https://issues.apache.org/jira/browse/FLINK-8869 Project: Flink Issue Type: Bug Components: Kafka Connector Affects Versions: 1.3.2 Reporter: aitozi Assignee: aitozi When job restore from a savepoint and the kafka server has added serval partition , it doesnt consume data or produce data from/to the new partition -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8807) ZookeeperCompleted checkpoint store can get stuck in infinite loop
[ https://issues.apache.org/jira/browse/FLINK-8807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386197#comment-16386197 ] Aljoscha Krettek commented on FLINK-8807: - Fixed on release-1.5 in f8681a9d7732f6f36483e26ec68624809f6cd4b1 Fixed on master in 4226bf22aab5b4359998422fe53755db19785515 > ZookeeperCompleted checkpoint store can get stuck in infinite loop > -- > > Key: FLINK-8807 > URL: https://issues.apache.org/jira/browse/FLINK-8807 > Project: Flink > Issue Type: Bug > Components: State Backends, Checkpointing >Affects Versions: 1.5.0 >Reporter: Aljoscha Krettek >Priority: Blocker > Fix For: 1.5.0 > > > This code: > https://github.com/apache/flink/blob/9071e3befb8c279f73c3094c9f6bddc0e7cce9e5/flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/ZooKeeperCompletedCheckpointStore.java#L201 > can be stuck forever if at least one checkpoint is not readable because > {{CompletedCheckpoint}} does not have a proper {{equals()}}/{{hashCode()}} > anymore. > We have to fix this and also add a unit test that verifies the loop still > works if we make one snapshot unreadable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8868) Support Table Function as Table
Ruidong Li created FLINK-8868: - Summary: Support Table Function as Table Key: FLINK-8868 URL: https://issues.apache.org/jira/browse/FLINK-8868 Project: Flink Issue Type: Improvement Components: Table API SQL Reporter: Ruidong Li Assignee: Ruidong Li support SQL like: SELECT * FROM TABLE(tf("a")) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8839) Table source factory discovery is broken in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386152#comment-16386152 ] ASF GitHub Bot commented on FLINK-8839: --- GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/5640 [FLINK-8839] [sql-client] Fix table source factory discovery ## What is the purpose of the change This PR fixes the table source factory discovery by adding dependencies to the classloader. It also implements an `ExecutionContext` that can be reused during the same session. ## Brief change log - New `ExecutionContext` abstraction - Possibility to pass a classloader to the Java service provider ## Verifying this change - See `DependencyTest` ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? JavaDocs You can merge this pull request into a Git repository by running: $ git pull https://github.com/twalthr/flink FLINK-8839 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5640.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5640 commit 8c7b1427f94082dc023073125b32eceda556d8cd Author: Timo WaltherDate: 2018-03-05T12:46:41Z [FLINK-8839] [sql-client] Fix table source factory discovery > Table source factory discovery is broken in SQL Client > -- > > Key: FLINK-8839 > URL: https://issues.apache.org/jira/browse/FLINK-8839 > Project: Flink > Issue Type: Bug > Components: Table API SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Blocker > Fix For: 1.5.0 > > > Table source factories cannot not be discovered if they were added using a > jar file. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5640: [FLINK-8839] [sql-client] Fix table source factory...
GitHub user twalthr opened a pull request: https://github.com/apache/flink/pull/5640 [FLINK-8839] [sql-client] Fix table source factory discovery ## What is the purpose of the change This PR fixes the table source factory discovery by adding dependencies to the classloader. It also implements an `ExecutionContext` that can be reused during the same session. ## Brief change log - New `ExecutionContext` abstraction - Possibility to pass a classloader to the Java service provider ## Verifying this change - See `DependencyTest` ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): no - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: no - The serializers: no - The runtime per-record code paths (performance sensitive): no - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no - The S3 file system connector: no ## Documentation - Does this pull request introduce a new feature? no - If yes, how is the feature documented? JavaDocs You can merge this pull request into a Git repository by running: $ git pull https://github.com/twalthr/flink FLINK-8839 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5640.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5640 commit 8c7b1427f94082dc023073125b32eceda556d8cd Author: Timo WaltherDate: 2018-03-05T12:46:41Z [FLINK-8839] [sql-client] Fix table source factory discovery ---
[jira] [Created] (FLINK-8867) Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config
Shashank Agarwal created FLINK-8867: --- Summary: Rocksdb checkpointing failing with fs.default-scheme: hdfs:// config Key: FLINK-8867 URL: https://issues.apache.org/jira/browse/FLINK-8867 Project: Flink Issue Type: Bug Components: Configuration, State Backends, Checkpointing, YARN Affects Versions: 1.4.1, 1.4.2 Reporter: Shashank Agarwal Fix For: 1.5.0, 1.4.3 In our setup, when we put an entry in our Flink_conf file for default schema. {code} fs.default-scheme: hdfs://mydomain.com:8020/flink {code} Than application with rocksdb state backend fails with the following exception. When we remove this config it works fine. It's working fine with other state backends. {code} AsynchronousException{java.lang.Exception: Could not materialize checkpoint 1 for operator order ip stream (1/1).} at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:948) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.lang.Exception: Could not materialize checkpoint 1 for operator order ip stream (1/1). ... 6 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894) ... 5 more Suppressed: java.lang.Exception: Could not properly cancel managed keyed state future. at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:91) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:976) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:939) ... 5 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) at org.apache.flink.runtime.state.StateUtil.discardStateFuture(StateUtil.java:66) at org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:89) ... 7 more Caused by: java.lang.IllegalStateException at org.apache.flink.util.Preconditions.checkState(Preconditions.java:179) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$RocksDBIncrementalSnapshotOperation.materializeSnapshot(RocksDBKeyedStateBackend.java:926) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:389) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend$1.call(RocksDBKeyedStateBackend.java:386) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:40) at org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:894) ... 5 more [CIRCULAR REFERENCE:java.lang.IllegalStateException] {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8866) Create unified interfaces to configure and instatiate TableSinks
Timo Walther created FLINK-8866: --- Summary: Create unified interfaces to configure and instatiate TableSinks Key: FLINK-8866 URL: https://issues.apache.org/jira/browse/FLINK-8866 Project: Flink Issue Type: New Feature Components: Table API SQL Reporter: Timo Walther Assignee: Timo Walther Similar to the efforts done in FLINK-8240. We need unified ways to configure and instantiate TableSinks. Among other applications, this is necessary in order to declare table sinks in an environment file of the SQL client. Such that the sink can be used for {{INSERT INTO}} statements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8858) Add support for INSERT INTO in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther updated FLINK-8858: Issue Type: Sub-task (was: New Feature) Parent: FLINK-7594 > Add support for INSERT INTO in SQL Client > - > > Key: FLINK-8858 > URL: https://issues.apache.org/jira/browse/FLINK-8858 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.6.0 >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > > The current design of SQL Client embedded mode doesn't support long running > queries. It would be useful for simple jobs that can be expressed in a single > sql statement if we can submit sql statements stored in files as long running > queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-7756) RocksDB state backend Checkpointing (Async and Incremental) is not working with CEP.
[ https://issues.apache.org/jira/browse/FLINK-7756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386138#comment-16386138 ] Aljoscha Krettek commented on FLINK-7756: - [~shashank734] Yes, because that should also fail with a better error message. > RocksDB state backend Checkpointing (Async and Incremental) is not working > with CEP. > - > > Key: FLINK-7756 > URL: https://issues.apache.org/jira/browse/FLINK-7756 > Project: Flink > Issue Type: Sub-task > Components: CEP, State Backends, Checkpointing, Streaming >Affects Versions: 1.4.0, 1.3.2 > Environment: Flink 1.3.2, Yarn, HDFS, RocksDB backend >Reporter: Shashank Agarwal >Assignee: Aljoscha Krettek >Priority: Blocker > Fix For: 1.5.0, 1.4.1, 1.4.2 > > Attachments: jobmanager.log, jobmanager_without_cassandra.log, > taskmanager.log, taskmanager_without_cassandra.log > > > When i try to use RocksDBStateBackend on my staging cluster (which is using > HDFS as file system) it crashes. But When i use FsStateBackend on staging > (which is using HDFS as file system) it is working fine. > On local with local file system it's working fine in both cases. > Please check attached logs. I have around 20-25 tasks in my app. > {code:java} > 2017-09-29 14:21:31,639 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=0). > 2017-09-29 14:21:31,640 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,020 INFO > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink - No state > to restore for the BucketingSink (taskIdx=1). > 2017-09-29 14:21:32,022 INFO > org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackend - > Initializing RocksDB keyed state backend from snapshot. > 2017-09-29 14:21:32,078 INFO com.datastax.driver.core.NettyUtil > - Found Netty's native epoll transport in the classpath, using > it > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (1/2) > (b879f192c4e8aae6671cdafb3a24c00a). > 2017-09-29 14:21:34,177 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Map (2/2) > (1ea5aef6ccc7031edc6b37da2912d90b). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Attempting to fail task externally Co-Flat Map (2/2) > (4bac8e764c67520d418a4c755be23d4d). > 2017-09-29 14:21:34,178 INFO org.apache.flink.runtime.taskmanager.Task > - Co-Flat Map (1/2) (b879f192c4e8aae6671cdafb3a24c00a) switched > from RUNNING to FAILED. > AsynchronousException{java.lang.Exception: Could not materialize checkpoint 2 > for operator Co-Flat Map (1/2).} > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:970) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.Exception: Could not materialize checkpoint 2 for > operator Co-Flat Map (1/2). > ... 6 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:43) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:897) > ... 5 more > Suppressed: java.lang.Exception: Could not properly cancel managed > keyed state future. > at > org.apache.flink.streaming.api.operators.OperatorSnapshotResult.cancel(OperatorSnapshotResult.java:90) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.cleanup(StreamTask.java:1023) > at > org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:961) > ... 5 more > Caused by: java.util.concurrent.ExecutionException: > java.lang.IllegalStateException > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at
[jira] [Updated] (FLINK-8858) Add support for INSERT INTO in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-8858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther updated FLINK-8858: Summary: Add support for INSERT INTO in SQL Client (was: SQL Client to submit long running query in file) > Add support for INSERT INTO in SQL Client > - > > Key: FLINK-8858 > URL: https://issues.apache.org/jira/browse/FLINK-8858 > Project: Flink > Issue Type: New Feature > Components: Table API SQL >Affects Versions: 1.6.0 >Reporter: Renjie Liu >Assignee: Renjie Liu >Priority: Major > > The current design of SQL Client embedded mode doesn't support long running > queries. It would be useful for simple jobs that can be expressed in a single > sql statement if we can submit sql statements stored in files as long running > queries. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8860) SlotManager spamming log files
[ https://issues.apache.org/jira/browse/FLINK-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-8860: Priority: Blocker (was: Critical) > SlotManager spamming log files > -- > > Key: FLINK-8860 > URL: https://issues.apache.org/jira/browse/FLINK-8860 > Project: Flink > Issue Type: Bug > Components: JobManager, ResourceManager >Affects Versions: 1.5.0 >Reporter: Nico Kruber >Assignee: Nico Kruber >Priority: Blocker > Labels: flip-6 > Fix For: 1.5.0 > > > {{SlotManager}} is spamming the log files a lot with > {code} > 2018-03-05 10:45:12,393 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance b16c4e516995d1e672c0933bb380770c. > 2018-03-05 10:45:12,393 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance de58fbf1c069620a4275c8b529deb20b. > 2018-03-05 10:45:12,393 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance 86ab5a7e1d57bb2883fc0d1f2aebb304. > 2018-03-05 10:45:12,393 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance 90f9ab2bd433db41b3ab567fd246fb3c. > 2018-03-05 10:45:12,393 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance ec99fcc5a801272402af9afe08a1001d. > 2018-03-05 10:45:12,394 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance 4c1c4b5ce52195dc90196c10c26d9ef8. > 2018-03-05 10:45:12,394 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance 2541d0f1398fc307aaf86bf7750535f1. > 2018-03-05 10:45:12,394 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance 91d94a9fdfce3cbcef32ac1c6e7b3fbf. > 2018-03-05 10:45:22,392 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance 91d94a9fdfce3cbcef32ac1c6e7b3fbf. > 2018-03-05 10:45:22,394 INFO > org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received > slot report from instance 90f9ab2bd433db41b3ab567fd246fb3c. > {code} > This message is printed once per {{TaskManager}} heartbeat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8865) Add CLI query code completion in SQL Client
Timo Walther created FLINK-8865: --- Summary: Add CLI query code completion in SQL Client Key: FLINK-8865 URL: https://issues.apache.org/jira/browse/FLINK-8865 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Timo Walther This issue is a subtask of part two "Full Embedded SQL Client" of the implementation plan mentioned in [FLIP-24|https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client]. Calcite already offers a code completion functionality. It would be great if we could expose this feature also through the SQL CLI Client. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8864) Add CLI query history in SQL Client
[ https://issues.apache.org/jira/browse/FLINK-8864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther updated FLINK-8864: Summary: Add CLI query history in SQL Client (was: Add CLI query history) > Add CLI query history in SQL Client > --- > > Key: FLINK-8864 > URL: https://issues.apache.org/jira/browse/FLINK-8864 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Reporter: Timo Walther >Priority: Major > > This issue is a subtask of part two "Full Embedded SQL Client" of the > implementation plan mentioned in > [FLIP-24|https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client]. > It would be great to have the possibility of persisting the CLI's query > history. Such that queries can be reused when the CLI Client is started > again. Also a search feature as it is offered by terminals would be good. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8862) Support HBase snapshot read
[ https://issues.apache.org/jira/browse/FLINK-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386135#comment-16386135 ] ASF GitHub Bot commented on FLINK-8862: --- GitHub user neoremind opened a pull request: https://github.com/apache/flink/pull/5639 [FLINK-8862] [HBase] Support HBase snapshot read ## What is the purpose of the change *Flink-hbase connector only supports reading/scanning HBase over region server scanner, there is also [snapshot](http://hbase.apache.org/book.html#ops.snapshots) scanning solution, just like Hadoop provides 2 ways to scan HBase, one is [TableInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html), the other is [TableSnapshotInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.html), so it would be great if flink supports both solutions to ensure more wider usage scope and provide alternatives for users.* ## Brief change log - *Create `TableInputSplitStrategy` interface and its implementations as abstraction logic for `AbstractTableInputFormat`* - *Update `HBaseRowInputFormat` and `TableInputFormat`* - *Add `HBaseSnapshotRowInputFormat` and `TableSnapshotInputFormat`* - *Extract 2 interfaces including `HBaseTableScannerAware` and `ResultToTupleMapper`* - *Add `HBaseSnapshotReadExample`* ## Verifying this change This change is already covered by existing tests as follows, and new test cases has been added as well. `org.apache.flink.addons.hbase.HBaseConnectorITCase` This change added tests and can be verified as follows: - *Manually create one snapshot for a specific HBase table, and use TableSnapshotInputFormat to do full scan.* - *Running existing HBaseReadExample to do full scan.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (**yes** / no) - If yes, how is the feature documented? (not applicable / **docs** / **JavaDocs** / not documented) - For document, please visit [JIRA ticket](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-8862?filter=allopenissues), a detailed design doc and class diagram have been attached. You can merge this pull request into a Git repository by running: $ git pull https://github.com/neoremind/flink snapshot Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5639 commit 0b36b434f987a971b6463ce3441c483380cfa9dd Author: neoremindDate: 2018-03-05T14:14:09Z Support HBase snapshot read > Support HBase snapshot read > --- > > Key: FLINK-8862 > URL: https://issues.apache.org/jira/browse/FLINK-8862 > Project: Flink > Issue Type: Improvement > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.2.0 >Reporter: Xu Zhang >Priority: Major > Attachments: FLINK-8862-Design-Class-Diagram.png, > FLINK-8862-DesignDoc.pdf > > > Flink-hbase connector only supports reading/scanning HBase over region server > scanner, there is also snapshot scanning solution, just like Hadoop provides > 2 ways to scan HBase, one is TableInputFormat, the other is > TableSnapshotInputFormat, so it would be great if flink supports both > solutions to ensure more wider usage scope and provide alternatives for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5639: [FLINK-8862] [HBase] Support HBase snapshot read
GitHub user neoremind opened a pull request: https://github.com/apache/flink/pull/5639 [FLINK-8862] [HBase] Support HBase snapshot read ## What is the purpose of the change *Flink-hbase connector only supports reading/scanning HBase over region server scanner, there is also [snapshot](http://hbase.apache.org/book.html#ops.snapshots) scanning solution, just like Hadoop provides 2 ways to scan HBase, one is [TableInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableInputFormat.html), the other is [TableSnapshotInputFormat](https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.html), so it would be great if flink supports both solutions to ensure more wider usage scope and provide alternatives for users.* ## Brief change log - *Create `TableInputSplitStrategy` interface and its implementations as abstraction logic for `AbstractTableInputFormat`* - *Update `HBaseRowInputFormat` and `TableInputFormat`* - *Add `HBaseSnapshotRowInputFormat` and `TableSnapshotInputFormat`* - *Extract 2 interfaces including `HBaseTableScannerAware` and `ResultToTupleMapper`* - *Add `HBaseSnapshotReadExample`* ## Verifying this change This change is already covered by existing tests as follows, and new test cases has been added as well. `org.apache.flink.addons.hbase.HBaseConnectorITCase` This change added tests and can be verified as follows: - *Manually create one snapshot for a specific HBase table, and use TableSnapshotInputFormat to do full scan.* - *Running existing HBaseReadExample to do full scan.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (**yes** / no) - If yes, how is the feature documented? (not applicable / **docs** / **JavaDocs** / not documented) - For document, please visit [JIRA ticket](https://issues.apache.org/jira/projects/FLINK/issues/FLINK-8862?filter=allopenissues), a detailed design doc and class diagram have been attached. You can merge this pull request into a Git repository by running: $ git pull https://github.com/neoremind/flink snapshot Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5639.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5639 commit 0b36b434f987a971b6463ce3441c483380cfa9dd Author: neoremindDate: 2018-03-05T14:14:09Z Support HBase snapshot read ---
[jira] [Created] (FLINK-8864) Add CLI query history
Timo Walther created FLINK-8864: --- Summary: Add CLI query history Key: FLINK-8864 URL: https://issues.apache.org/jira/browse/FLINK-8864 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Timo Walther This issue is a subtask of part two "Full Embedded SQL Client" of the implementation plan mentioned in [FLIP-24|https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client]. It would be great to have the possibility of persisting the CLI's query history. Such that queries can be reused when the CLI Client is started again. Also a search feature as it is offered by terminals would be good. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8863) Add user-defined function support in SQL Client
Timo Walther created FLINK-8863: --- Summary: Add user-defined function support in SQL Client Key: FLINK-8863 URL: https://issues.apache.org/jira/browse/FLINK-8863 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Timo Walther This issue is a subtask of part two "Full Embedded SQL Client" of the implementation plan mentioned in [FLIP-24|https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client]. It should be possible to declare user-defined functions in the SQL client. For now, we limit the registration to classes that implement {{ScalarFunction}}, {{TableFunction}}, {{AggregateFunction}}. Functions that are implemented in SQL are not part of this issue. I would suggest to introduce a {{functions}} top-level property. The declaration could look similar to: {code} functions: - name: testFunction from: class <-- optional, default: class class: org.my.MyScalarFunction constructor: <-- optional, needed for certain types of functions - 42.0 - class: org.my.Class <-- possibility to create objects via properties constructor: - 1 - true - false - "whatever" - type: INT value: 1 {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8862) Support HBase snapshot read
[ https://issues.apache.org/jira/browse/FLINK-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Zhang updated FLINK-8862: Attachment: FLINK-8862-Design-Class-Diagram.png > Support HBase snapshot read > --- > > Key: FLINK-8862 > URL: https://issues.apache.org/jira/browse/FLINK-8862 > Project: Flink > Issue Type: Improvement > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.2.0 >Reporter: Xu Zhang >Priority: Major > Attachments: FLINK-8862-Design-Class-Diagram.png, > FLINK-8862-DesignDoc.pdf > > > Flink-hbase connector only supports reading/scanning HBase over region server > scanner, there is also snapshot scanning solution, just like Hadoop provides > 2 ways to scan HBase, one is TableInputFormat, the other is > TableSnapshotInputFormat, so it would be great if flink supports both > solutions to ensure more wider usage scope and provide alternatives for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8517) StaticlyNestedIterationsITCase.testJobWithoutObjectReuse unstable on Travis
[ https://issues.apache.org/jira/browse/FLINK-8517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386129#comment-16386129 ] ASF GitHub Bot commented on FLINK-8517: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5621 > StaticlyNestedIterationsITCase.testJobWithoutObjectReuse unstable on Travis > --- > > Key: FLINK-8517 > URL: https://issues.apache.org/jira/browse/FLINK-8517 > Project: Flink > Issue Type: Bug > Components: DataSet API, TaskManager, Tests >Affects Versions: 1.5.0 >Reporter: Till Rohrmann >Priority: Blocker > Labels: test-stability > Fix For: 1.5.0 > > > The {{StaticlyNestedIterationsITCase.testJobWithoutObjectReuse}} test case > fails on Travis. This exception might be relevant: > {code:java} > org.apache.flink.runtime.client.JobExecutionException: Job execution failed. > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply$mcV$sp(JobManager.scala:891) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834) > at > org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anonfun$applyOrElse$6.apply(JobManager.scala:834) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) > at > scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) > at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39) > at > akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415) > at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) > at > scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) > at > scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) > at > scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) > Caused by: java.lang.IllegalStateException: Partition > 557b069f2b89f8ba599e6ab0956a3f5a@58f1a6b7d8ae10b9141f17c08d06cecb not > registered at task event dispatcher. > at > org.apache.flink.runtime.io.network.TaskEventDispatcher.subscribeToEvent(TaskEventDispatcher.java:107) > at > org.apache.flink.runtime.iterative.task.IterationHeadTask.initSuperstepBarrier(IterationHeadTask.java:242) > at > org.apache.flink.runtime.iterative.task.IterationHeadTask.run(IterationHeadTask.java:266) > at > org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:368) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703) > at java.lang.Thread.run(Thread.java:748){code} > > https://api.travis-ci.org/v3/job/60156/log.txt -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5621: [FLINK-8517] fix missing synchronization in TaskEv...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5621 ---
[jira] [Updated] (FLINK-8862) Support HBase snapshot read
[ https://issues.apache.org/jira/browse/FLINK-8862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xu Zhang updated FLINK-8862: Attachment: FLINK-8862-DesignDoc.pdf > Support HBase snapshot read > --- > > Key: FLINK-8862 > URL: https://issues.apache.org/jira/browse/FLINK-8862 > Project: Flink > Issue Type: Improvement > Components: Batch Connectors and Input/Output Formats >Affects Versions: 1.2.0 >Reporter: Xu Zhang >Priority: Major > Attachments: FLINK-8862-DesignDoc.pdf > > > Flink-hbase connector only supports reading/scanning HBase over region server > scanner, there is also snapshot scanning solution, just like Hadoop provides > 2 ways to scan HBase, one is TableInputFormat, the other is > TableSnapshotInputFormat, so it would be great if flink supports both > solutions to ensure more wider usage scope and provide alternatives for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8862) Support HBase snapshot read
Xu Zhang created FLINK-8862: --- Summary: Support HBase snapshot read Key: FLINK-8862 URL: https://issues.apache.org/jira/browse/FLINK-8862 Project: Flink Issue Type: Improvement Components: Batch Connectors and Input/Output Formats Affects Versions: 1.2.0 Reporter: Xu Zhang Flink-hbase connector only supports reading/scanning HBase over region server scanner, there is also snapshot scanning solution, just like Hadoop provides 2 ways to scan HBase, one is TableInputFormat, the other is TableSnapshotInputFormat, so it would be great if flink supports both solutions to ensure more wider usage scope and provide alternatives for users. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-8861) Add support for batch queries in SQL Client
Timo Walther created FLINK-8861: --- Summary: Add support for batch queries in SQL Client Key: FLINK-8861 URL: https://issues.apache.org/jira/browse/FLINK-8861 Project: Flink Issue Type: Sub-task Components: Table API SQL Reporter: Timo Walther This issue is a subtask of part two "Full Embedded SQL Client" of the implementation plan mentioned in [FLIP-24|https://cwiki.apache.org/confluence/display/FLINK/FLIP-24+-+SQL+Client]. Similar to streaming queries, it should be possible to execute batch queries in the SQL Client and collect the results using {{DataSet.collect()}} for debugging purposes. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8853) SQL Client cannot emit query results that contain a rowtime attribute
[ https://issues.apache.org/jira/browse/FLINK-8853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther updated FLINK-8853: Issue Type: Sub-task (was: Bug) Parent: FLINK-7594 > SQL Client cannot emit query results that contain a rowtime attribute > - > > Key: FLINK-8853 > URL: https://issues.apache.org/jira/browse/FLINK-8853 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.5.0 >Reporter: Fabian Hueske >Assignee: Timo Walther >Priority: Blocker > Fix For: 1.5.0 > > > Emitting a query result that contains a rowtime attribute fails with the > following exception: > {code:java} > Caused by: java.lang.ClassCastException: java.sql.Timestamp cannot be cast to > java.lang.Long > at > org.apache.flink.api.common.typeutils.base.LongSerializer.serialize(LongSerializer.java:27) > at > org.apache.flink.api.java.typeutils.runtime.RowSerializer.serialize(RowSerializer.java:160) > at > org.apache.flink.api.java.typeutils.runtime.RowSerializer.serialize(RowSerializer.java:46) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:125) > at > org.apache.flink.api.java.typeutils.runtime.TupleSerializer.serialize(TupleSerializer.java:30) > at > org.apache.flink.streaming.experimental.CollectSink.invoke(CollectSink.java:66) > ... 44 more{code} > The problem is cause by the {{ResultStore}} which configures the > {{CollectionSink}} with the field types obtained from the {{TableSchema}}. > The type of the rowtime field is a {{TimeIndicatorType}} which is serialized > as Long. However, in the query result it is represented as Timestamp. Hence, > the type must be replaced by a {{SqlTimeTypeInfo}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8852) SQL Client does not work with new FLIP-6 mode
[ https://issues.apache.org/jira/browse/FLINK-8852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther updated FLINK-8852: Issue Type: Sub-task (was: Bug) Parent: FLINK-7594 > SQL Client does not work with new FLIP-6 mode > - > > Key: FLINK-8852 > URL: https://issues.apache.org/jira/browse/FLINK-8852 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.5.0 >Reporter: Fabian Hueske >Assignee: Timo Walther >Priority: Blocker > Fix For: 1.5.0 > > > The SQL client does not submit queries to local Flink cluster that runs in > FLIP-6 mode. It doesn't throw an exception either. > Job submission works if the legacy Flink cluster mode is used (`mode: old`) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8850) SQL Client does not support Event-time
[ https://issues.apache.org/jira/browse/FLINK-8850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther updated FLINK-8850: Issue Type: Sub-task (was: Bug) Parent: FLINK-7594 > SQL Client does not support Event-time > -- > > Key: FLINK-8850 > URL: https://issues.apache.org/jira/browse/FLINK-8850 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.5.0 >Reporter: Fabian Hueske >Assignee: Timo Walther >Priority: Blocker > Fix For: 1.5.0 > > > The SQL client fails with an exception if a table includes a rowtime > attribute. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-8686) Improve basic embedded SQL client
[ https://issues.apache.org/jira/browse/FLINK-8686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timo Walther updated FLINK-8686: Issue Type: Sub-task (was: Improvement) Parent: FLINK-7594 > Improve basic embedded SQL client > -- > > Key: FLINK-8686 > URL: https://issues.apache.org/jira/browse/FLINK-8686 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Reporter: Timo Walther >Assignee: Timo Walther >Priority: Blocker > Fix For: 1.5.0 > > > This issue describes follow-up issues that should be fixes in order to make > the SQL client more stable: > - Add more tests for executor > - Configure JVM heap size > - Limit changelog and table buffers > - "The input is invalid please check it again." => add allowed range > - Load dependencies recursively > - Cache table & environments in executor > - Clean up results in result store > - Improve error message for unsupported batch queries > - Add more logging instead swallowing exceptions > - List properties in error message about missing TS factory sorted by name > - Add command to show loaded TS factories and their required propeties > - Add command to reload configuration from files (no need to restart client) > - Improve error message in case of invalid json-schema (right now: > {{java.lang.IllegalArgumentException: No type could be found in node:}} > - Add switch to show full stacktraces of exceptions > - Give error message when setting unknown parameters > {{result-mode=changelog}} does not give an error but should be > {{execution.result-mode=changelog}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-8274) Fix Java 64K method compiling limitation for CommonCalc
[ https://issues.apache.org/jira/browse/FLINK-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386109#comment-16386109 ] ASF GitHub Bot commented on FLINK-8274: --- Github user twalthr commented on the issue: https://github.com/apache/flink/pull/5613 @Xpray and @hequn8128 are you fine with merging this PR for now and then open follow-up issues for more splitting (unboxing, expression, class)? > Fix Java 64K method compiling limitation for CommonCalc > --- > > Key: FLINK-8274 > URL: https://issues.apache.org/jira/browse/FLINK-8274 > Project: Flink > Issue Type: Bug > Components: Table API SQL >Affects Versions: 1.5.0 >Reporter: Ruidong Li >Assignee: Ruidong Li >Priority: Critical > > For complex SQL Queries, the generated code for {code}DataStreamCalc{code}, > {code}DataSetCalc{code} may exceed Java's method length limitation 64kb. > > This issue will split long method to several sub method calls. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink issue #5613: [FLINK-8274] [table] Split generated methods for preventi...
Github user twalthr commented on the issue: https://github.com/apache/flink/pull/5613 @Xpray and @hequn8128 are you fine with merging this PR for now and then open follow-up issues for more splitting (unboxing, expression, class)? ---
[jira] [Commented] (FLINK-6924) ADD LOG(X) supported in TableAPI
[ https://issues.apache.org/jira/browse/FLINK-6924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386093#comment-16386093 ] ASF GitHub Bot commented on FLINK-6924: --- GitHub user buptljy opened a pull request: https://github.com/apache/flink/pull/5638 [FLINK-6924][table]ADD LOG(X) supported in TableAPI ## What is the purpose of the change * Add LOG(X) function in TableAPI. ## Brief change log * Add LOG(X) function in TableAPI. * Modify LOG(X) unit tests from "testSqlApi" to "testAllApis". ## Verifying this change * This can be tested by unit testing. ## Does this pull request potentially affect one of the following parts: * No ## Documentation * You can merge this pull request into a Git repository by running: $ git pull https://github.com/buptljy/flink log Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5638.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5638 commit 376cee8a6bc25afdf8df50b65c3fa8f35a5e4b7c Author: Liao JiayiDate: 2018-03-04T09:15:10Z add log table function commit 9ab7a07c0b614cb7af0fee7e69f6d58bf5004b28 Author: Liao Jiayi Date: 2018-03-04T09:15:23Z Merge branch 'master' of github.com:apache/flink into log > ADD LOG(X) supported in TableAPI > > > Key: FLINK-6924 > URL: https://issues.apache.org/jira/browse/FLINK-6924 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Affects Versions: 1.4.0 >Reporter: sunjincheng >Assignee: zjuwangg >Priority: Major > Labels: starter > > See FLINK-6891 for detail. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] flink pull request #5638: [FLINK-6924][table]ADD LOG(X) supported in TableAP...
GitHub user buptljy opened a pull request: https://github.com/apache/flink/pull/5638 [FLINK-6924][table]ADD LOG(X) supported in TableAPI ## What is the purpose of the change * Add LOG(X) function in TableAPI. ## Brief change log * Add LOG(X) function in TableAPI. * Modify LOG(X) unit tests from "testSqlApi" to "testAllApis". ## Verifying this change * This can be tested by unit testing. ## Does this pull request potentially affect one of the following parts: * No ## Documentation * You can merge this pull request into a Git repository by running: $ git pull https://github.com/buptljy/flink log Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5638.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5638 commit 376cee8a6bc25afdf8df50b65c3fa8f35a5e4b7c Author: Liao JiayiDate: 2018-03-04T09:15:10Z add log table function commit 9ab7a07c0b614cb7af0fee7e69f6d58bf5004b28 Author: Liao Jiayi Date: 2018-03-04T09:15:23Z Merge branch 'master' of github.com:apache/flink into log ---