[GitHub] Aitozi commented on issue #7543: [FLINK-10996]Fix the possible state leak with CEP processing time model
Aitozi commented on issue #7543: [FLINK-10996]Fix the possible state leak with CEP processing time model URL: https://github.com/apache/flink/pull/7543#issuecomment-456022735 Thanks for your reply, I will take a look into that PR these two days. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XuPingyong opened a new pull request #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear
XuPingyong opened a new pull request #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear URL: https://github.com/apache/flink/pull/4267 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XuPingyong removed a comment on issue #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear
XuPingyong removed a comment on issue #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear URL: https://github.com/apache/flink/pull/4267#issuecomment-456053593 no need This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] clarkyzl closed pull request #3473: [FLINK-5833] [table] Support for Hive GenericUDF
clarkyzl closed pull request #3473: [FLINK-5833] [table] Support for Hive GenericUDF URL: https://github.com/apache/flink/pull/3473 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-7015) Separate OperatorConfig from StreamConfig
[ https://issues.apache.org/jira/browse/FLINK-7015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-7015: -- Labels: pull-request-available (was: ) > Separate OperatorConfig from StreamConfig > - > > Key: FLINK-7015 > URL: https://issues.apache.org/jira/browse/FLINK-7015 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Xu Pingyong >Assignee: Xu Pingyong >Priority: Major > Labels: pull-request-available > > Motivation: > A Task contains one or more operators with chainning, however > configs of operator and task are all put in StreamConfig. For example, when a > opeator sets up with the StreamConfig, it can see the interface about > physicalEdges or chained.task.configs that are confused. Similarly a > streamTask should not see the interface aboule chain.index. > So we need to separate OperatorConfig from StreamConfig. A > streamTask builds execution enviroment with the streamConfig, and extract > operatorConfigs from it, then build streamOperators with every > operatorConfig. > >OperatorConfig: for the streamOperator to setup with, it constains > informations that only belong to the streamOperator. It contains: >1) operator information: name, id >2) Serialized StreamOperator >3) input serializer. >4) output edges and serializers. >5) chain.index >6) state.key.serializer > StreamConfig: for the streamTask to use: >1) in.physical.edges > 2) out.physical.edges >3) chained OperatorConfigs >4) execution environment: checkpoint, state.backend and so on... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] XuPingyong closed pull request #4273: [FLINK-7065] Separate the flink-streaming-java module from flink-clients
XuPingyong closed pull request #4273: [FLINK-7065] Separate the flink-streaming-java module from flink-clients URL: https://github.com/apache/flink/pull/4273 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XuPingyong closed pull request #4241: [FLINK-7015] [streaming] separate OperatorConfig from StreamConfig
XuPingyong closed pull request #4241: [FLINK-7015] [streaming] separate OperatorConfig from StreamConfig URL: https://github.com/apache/flink/pull/4241 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XuPingyong commented on issue #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear
XuPingyong commented on issue #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear URL: https://github.com/apache/flink/pull/4267#issuecomment-456053593 no need This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XuPingyong closed pull request #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear
XuPingyong closed pull request #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear URL: https://github.com/apache/flink/pull/4267 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-5833) Support for Hive GenericUDF
[ https://issues.apache.org/jira/browse/FLINK-5833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-5833: -- Labels: pull-request-available (was: ) > Support for Hive GenericUDF > --- > > Key: FLINK-5833 > URL: https://issues.apache.org/jira/browse/FLINK-5833 > Project: Flink > Issue Type: Sub-task > Components: Table API SQL >Reporter: Zhuoluo Yang >Assignee: Zhuoluo Yang >Priority: Major > Labels: pull-request-available > > The second step of FLINK-5802 is to support Hive's GenericUDF. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7065) separate the flink-streaming-java module from flink-clients
[ https://issues.apache.org/jira/browse/FLINK-7065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-7065: -- Labels: pull-request-available (was: ) > separate the flink-streaming-java module from flink-clients > > > Key: FLINK-7065 > URL: https://issues.apache.org/jira/browse/FLINK-7065 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Xu Pingyong >Assignee: Xu Pingyong >Priority: Major > Labels: pull-request-available > > Motivation: > It is not good that "flink-streaming-java" module depends on > "flink-clients". Flink-clients should see something in "flink-streaming-java". > Related Change: > 1. LocalStreamEnvironment and RemoteStreamEnvironment can also execute > a job by the executors(LocalExecutor and RemoteExecutor). Introduce > StreamGraphExecutor which executors a streamGraph as PlanExecutor executors > the plan. StreamGraphExecutor and PlanExecutor all extend Executor. > 2. Introduce StreamExecutionEnvironmentFactory which works similarly > to ContextEnvironmentFactory in flink-clients. > When a object of ContextEnvironmentFactory, > OptimizerPlanEnvironmentFactory or PreviewPlanEnvironmentFactory is set into > ExecutionEnvironment(by calling initializeContextEnvironment), the relevant > StreamEnvFactory is alsot set into StreamExecutionEnvironment. It is similar > when calling unsetContext. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] godfreyhe closed pull request #4572: [Flink-7243] [connectors] Add ParquetInputFormat
godfreyhe closed pull request #4572: [Flink-7243] [connectors] Add ParquetInputFormat URL: https://github.com/apache/flink/pull/4572 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] godfreyhe closed pull request #3818: [FLINK-5514] [table] Implement an efficient physical execution for CUBE/ROLLUP/GROUPING SETS
godfreyhe closed pull request #3818: [FLINK-5514] [table] Implement an efficient physical execution for CUBE/ROLLUP/GROUPING SETS URL: https://github.com/apache/flink/pull/3818 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-5514) Implement an efficient physical execution for CUBE/ROLLUP/GROUPING SETS
[ https://issues.apache.org/jira/browse/FLINK-5514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-5514: -- Labels: pull-request-available (was: ) > Implement an efficient physical execution for CUBE/ROLLUP/GROUPING SETS > --- > > Key: FLINK-5514 > URL: https://issues.apache.org/jira/browse/FLINK-5514 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Timo Walther >Assignee: godfrey he >Priority: Major > Labels: pull-request-available > > A first support for GROUPING SETS has been added in FLINK-5303. However, the > current runtime implementation is not very efficient as it basically only > translates logical operators to physical operators i.e. grouping sets are > currently only translated into multiple groupings that are unioned together. > A rough design document for this has been created in FLINK-2980. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] XuPingyong closed pull request #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear
XuPingyong closed pull request #4267: [FLINK-7018][streaming] Refactor streamGraph to make interfaces clear URL: https://github.com/apache/flink/pull/4267 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] godfreyhe closed pull request #4667: [FLINK-5859] [table] Add PartitionableTableSource for partition pruning
godfreyhe closed pull request #4667: [FLINK-5859] [table] Add PartitionableTableSource for partition pruning URL: https://github.com/apache/flink/pull/4667 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-7620) Supports user defined optimization phase
[ https://issues.apache.org/jira/browse/FLINK-7620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-7620: -- Labels: pull-request-available (was: ) > Supports user defined optimization phase > > > Key: FLINK-7620 > URL: https://issues.apache.org/jira/browse/FLINK-7620 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: godfrey he >Assignee: godfrey he >Priority: Major > Labels: pull-request-available > > Currently, the optimization phases are hardcode in {{BatchTableEnvironment}} > and {{StreamTableEnvironment}}. It's better that user could define the > optimization phases and the rules in each phase as needed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7018) Refactor streamGraph to make interfaces clear
[ https://issues.apache.org/jira/browse/FLINK-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-7018: -- Labels: pull-request-available (was: ) > Refactor streamGraph to make interfaces clear > - > > Key: FLINK-7018 > URL: https://issues.apache.org/jira/browse/FLINK-7018 > Project: Flink > Issue Type: Improvement > Components: DataStream API >Reporter: Xu Pingyong >Assignee: Xu Pingyong >Priority: Major > Labels: pull-request-available > > Motivation: >1. StreamGraph is a graph consisted of some streamNodes. So virtual > nodes (such as select, sideOutput, partition) should be moved away from it. > Main iterfaces of StreamGraph should be as following: > addSource(StreamNode sourceNode) > addSink(StreamNode sinkNode) >addOperator(StreamNode streamNode) >addEdge(Integer upStreamVertexID, Integer downStreamVertexID, > StreamEdge.InputOrder inputOrder, StreamPartitioner partitioner, > List outputNames, OutputTag outputTag) > getJobGraph() > 2. StreamExecutionEnvironment should not be in StreamGraph, I create > StreamGraphProperties which extracts all env information the streamGraph > needs from StreamExecutionEnvironment. It contains: > 1) executionConfig > 2) checkpointConfig > 3) timeCharacteristic > 4) stateBackend > 5) chainingEnabled > 6) cachedFiles > 7) jobName > > Related Changes: >I moved the part of dealing with virtual nodes to > StreamGraphGenerator. And get properties of StreamGraph from > StreamGraphProperties instead of StreamExecutionEnvironment. > > It is only a code abstraction internally. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6516) using real row count instead of dummy row count when optimizing plan
[ https://issues.apache.org/jira/browse/FLINK-6516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-6516: -- Labels: pull-request-available (was: ) > using real row count instead of dummy row count when optimizing plan > > > Key: FLINK-6516 > URL: https://issues.apache.org/jira/browse/FLINK-6516 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: godfrey he >Assignee: godfrey he >Priority: Major > Labels: pull-request-available > > Currently, the statistic of {{TableSourceTable}} is {{UNKNOWN}} mostly, and > the statistic from {{ExternalCatalog}} maybe is null also. Actually, only > each {{TableSource}} knows its statistic exactly, especial for > {{FilterableTableSource}} and {{PartitionableTableSource}}. So we can add > {{getTableStats}} method in {{TableSource}}, and use it in TableSourceScan's > estimateRowCount method to get real row count. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11046) ElasticSearch6Connector cause thread blocked when index failed with retry
[ https://issues.apache.org/jira/browse/FLINK-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747810#comment-16747810 ] Dawid Wysakowicz commented on FLINK-11046: -- Hi [~xueyu] Did you manage to make any progress on this issue? > ElasticSearch6Connector cause thread blocked when index failed with retry > - > > Key: FLINK-11046 > URL: https://issues.apache.org/jira/browse/FLINK-11046 > Project: Flink > Issue Type: Bug > Components: ElasticSearch Connector >Affects Versions: 1.6.2 >Reporter: luoguohao >Assignee: xueyu >Priority: Major > > When i'm using es6 sink to index into es, bulk process with some exception > catched, and i trying to reindex the document with the call > `indexer.add(action)` in the `ActionRequestFailureHandler.onFailure()` > method, but things goes incorrect. The call thread stuck there, and with the > thread dump, i saw the `bulkprocessor` object was locked by other thread. > {code:java} > public interface ActionRequestFailureHandler extends Serializable { > void onFailure(ActionRequest action, Throwable failure, int restStatusCode, > RequestIndexer indexer) throws Throwable; > } > {code} > After i read the code implemented in the `indexer.add(action)`, i find that > `synchronized` is needed on each add operation. > {code:java} > private synchronized void internalAdd(DocWriteRequest request, @Nullable > Object payload) { > ensureOpen(); > bulkRequest.add(request, payload); > executeIfNeeded(); > } > {code} > And, at i also noticed that `bulkprocessor` object would also locked in the > bulk process thread. > the bulk process operation is in the following code: > {code:java} > public void execute(BulkRequest bulkRequest, long executionId) { > Runnable toRelease = () -> {}; > boolean bulkRequestSetupSuccessful = false; > try { > listener.beforeBulk(executionId, bulkRequest); > semaphore.acquire(); > toRelease = semaphore::release; > CountDownLatch latch = new CountDownLatch(1); > retry.withBackoff(consumer, bulkRequest, new > ActionListener() { > @Override > public void onResponse(BulkResponse response) { > try { > listener.afterBulk(executionId, bulkRequest, response); > } finally { > semaphore.release(); > latch.countDown(); > } > } > @Override > public void onFailure(Exception e) { > try { > listener.afterBulk(executionId, bulkRequest, e); > } finally { > semaphore.release(); > latch.countDown(); > } > } > }, Settings.EMPTY); > bulkRequestSetupSuccessful = true; >if (concurrentRequests == 0) { >latch.await(); > } > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > logger.info(() -> new ParameterizedMessage("Bulk request {} has been > cancelled.", executionId), e); > listener.afterBulk(executionId, bulkRequest, e); > } catch (Exception e) { > logger.warn(() -> new ParameterizedMessage("Failed to execute bulk > request {}.", executionId), e); > listener.afterBulk(executionId, bulkRequest, e); > } finally { > if (bulkRequestSetupSuccessful == false) { // if we fail on > client.bulk() release the semaphore > toRelease.run(); > } > } > } > {code} > As the read line i marked above, i think, that's the reason why the retry > operation thread was block, because the the bulk process thread never release > the lock on `bulkprocessor`. and, i also trying to figure out why the field > `concurrentRequests` was set to zero. And i saw the the initialize for > bulkprocessor in class `ElasticsearchSinkBase`: > {code:java} > protected BulkProcessor buildBulkProcessor(BulkProcessor.Listener listener) { > ... > BulkProcessor.Builder bulkProcessorBuilder = > callBridge.createBulkProcessorBuilder(client, listener); > // This makes flush() blocking > bulkProcessorBuilder.setConcurrentRequests(0); > > ... > return bulkProcessorBuilder.build(); > } > {code} > this field value was set to zero explicitly. So, all things seems to make > sense, but i still wonder why the retry operation is not in the same thread > as the bulk process execution, after i read the code, `bulkAsync` method > might be the last puzzle. > {code:java} > @Override > public BulkProcessor.Builder createBulkProcessorBuilder(RestHighLevelClient > client, BulkProcessor.Listener listener) { > return BulkProcessor.builder(client::bulkAsync, listener); > } >
[jira] [Commented] (FLINK-11046) ElasticSearch6Connector cause thread blocked when index failed with retry
[ https://issues.apache.org/jira/browse/FLINK-11046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747831#comment-16747831 ] xueyu commented on FLINK-11046: --- Hi, [~dawidwys], sorry that I only writed the codes and didn't write any test and end-to-end test yet...The codes are on [my branch|https://github.com/xueyumusic/flink/tree/es6-onfailure]. According to my current work status I estimated that still need about two weeks from my side.. If this issue is urgent, please feel free to take it over. Thank you and sorry for delay..., [~dawidwys] > ElasticSearch6Connector cause thread blocked when index failed with retry > - > > Key: FLINK-11046 > URL: https://issues.apache.org/jira/browse/FLINK-11046 > Project: Flink > Issue Type: Bug > Components: ElasticSearch Connector >Affects Versions: 1.6.2 >Reporter: luoguohao >Assignee: xueyu >Priority: Major > > When i'm using es6 sink to index into es, bulk process with some exception > catched, and i trying to reindex the document with the call > `indexer.add(action)` in the `ActionRequestFailureHandler.onFailure()` > method, but things goes incorrect. The call thread stuck there, and with the > thread dump, i saw the `bulkprocessor` object was locked by other thread. > {code:java} > public interface ActionRequestFailureHandler extends Serializable { > void onFailure(ActionRequest action, Throwable failure, int restStatusCode, > RequestIndexer indexer) throws Throwable; > } > {code} > After i read the code implemented in the `indexer.add(action)`, i find that > `synchronized` is needed on each add operation. > {code:java} > private synchronized void internalAdd(DocWriteRequest request, @Nullable > Object payload) { > ensureOpen(); > bulkRequest.add(request, payload); > executeIfNeeded(); > } > {code} > And, at i also noticed that `bulkprocessor` object would also locked in the > bulk process thread. > the bulk process operation is in the following code: > {code:java} > public void execute(BulkRequest bulkRequest, long executionId) { > Runnable toRelease = () -> {}; > boolean bulkRequestSetupSuccessful = false; > try { > listener.beforeBulk(executionId, bulkRequest); > semaphore.acquire(); > toRelease = semaphore::release; > CountDownLatch latch = new CountDownLatch(1); > retry.withBackoff(consumer, bulkRequest, new > ActionListener() { > @Override > public void onResponse(BulkResponse response) { > try { > listener.afterBulk(executionId, bulkRequest, response); > } finally { > semaphore.release(); > latch.countDown(); > } > } > @Override > public void onFailure(Exception e) { > try { > listener.afterBulk(executionId, bulkRequest, e); > } finally { > semaphore.release(); > latch.countDown(); > } > } > }, Settings.EMPTY); > bulkRequestSetupSuccessful = true; >if (concurrentRequests == 0) { >latch.await(); > } > } catch (InterruptedException e) { > Thread.currentThread().interrupt(); > logger.info(() -> new ParameterizedMessage("Bulk request {} has been > cancelled.", executionId), e); > listener.afterBulk(executionId, bulkRequest, e); > } catch (Exception e) { > logger.warn(() -> new ParameterizedMessage("Failed to execute bulk > request {}.", executionId), e); > listener.afterBulk(executionId, bulkRequest, e); > } finally { > if (bulkRequestSetupSuccessful == false) { // if we fail on > client.bulk() release the semaphore > toRelease.run(); > } > } > } > {code} > As the read line i marked above, i think, that's the reason why the retry > operation thread was block, because the the bulk process thread never release > the lock on `bulkprocessor`. and, i also trying to figure out why the field > `concurrentRequests` was set to zero. And i saw the the initialize for > bulkprocessor in class `ElasticsearchSinkBase`: > {code:java} > protected BulkProcessor buildBulkProcessor(BulkProcessor.Listener listener) { > ... > BulkProcessor.Builder bulkProcessorBuilder = > callBridge.createBulkProcessorBuilder(client, listener); > // This makes flush() blocking > bulkProcessorBuilder.setConcurrentRequests(0); > > ... > return bulkProcessorBuilder.build(); > } > {code} > this field value was set to zero explicitly. So, all things seems to make > sense, but i still wonder why the retry operation is not in the same thread > as the bulk
[GitHub] zhangxinyu1 commented on issue #6770: [FLINK-10002] [Webfrontend] WebUI shows jm/tm logs more friendly.
zhangxinyu1 commented on issue #6770: [FLINK-10002] [Webfrontend] WebUI shows jm/tm logs more friendly. URL: https://github.com/apache/flink/pull/6770#issuecomment-456045406 Three classes: `JobManagerLogFileHandlerTest.java`, `JobManagerLogListHandlerTest.java`, `TaskManagerLogListHandler.java` and a method `testRequestUploadFile` in TaskExecutor.java are added for unit testing. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on issue #7497: [FLINK-11338] Bump maven-enforcer-plugin from 3.0.0-M1 to 3.0.0-M2
Fokko commented on issue #7497: [FLINK-11338] Bump maven-enforcer-plugin from 3.0.0-M1 to 3.0.0-M2 URL: https://github.com/apache/flink/pull/7497#issuecomment-456021166 Mostly getting ready for Java8> This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] dawidwys commented on issue #7543: [FLINK-10996]Fix the possible state leak with CEP processing time model
dawidwys commented on issue #7543: [FLINK-10996]Fix the possible state leak with CEP processing time model URL: https://github.com/apache/flink/pull/7543#issuecomment-456020369 Hi @Aitozi, First of all I wouldn't say this fixes a bug, but rather introduces a new feature that should be discussed beforehand. I think we should aim for a better approach, that would further unify Processing and Event time that is already covered by [FLINK-7384](https://github.com/apache/flink/pull/4514/files) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on issue #7499: [FLINK-11340] Bump commons-configuration from 1.7 to 1.10
Fokko commented on issue #7499: [FLINK-11340] Bump commons-configuration from 1.7 to 1.10 URL: https://github.com/apache/flink/pull/7499#issuecomment-456022861 https://commons.apache.org/proper/commons-configuration/changes-report.html#a1.10 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko commented on issue #7498: [FLINK-11339] Bump exec-maven-plugin from 1.5.0 to 1.6.0
Fokko commented on issue #7498: [FLINK-11339] Bump exec-maven-plugin from 1.5.0 to 1.6.0 URL: https://github.com/apache/flink/pull/7498#issuecomment-456022154 https://github.com/mojohaus/exec-maven-plugin/compare/exec-maven-plugin-1.5.0...exec-maven-plugin-1.6.0 Dropped support for
[GitHub] Fokko edited a comment on issue #7497: [FLINK-11338] Bump maven-enforcer-plugin from 3.0.0-M1 to 3.0.0-M2
Fokko edited a comment on issue #7497: [FLINK-11338] Bump maven-enforcer-plugin from 3.0.0-M1 to 3.0.0-M2 URL: https://github.com/apache/flink/pull/7497#issuecomment-456021166 Mostly getting ready for Java8> Failed build is a flaky test: https://travis-ci.org/Fokko/flink/builds/480128791 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] KurtYoung closed pull request #4940: [FLINK-7959] [table] Split CodeGenerator into CodeGeneratorContext and ExprCodeGenerator.
KurtYoung closed pull request #4940: [FLINK-7959] [table] Split CodeGenerator into CodeGeneratorContext and ExprCodeGenerator. URL: https://github.com/apache/flink/pull/4940 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-7959) Split CodeGenerator into CodeGeneratorContext and ExprCodeGenerator
[ https://issues.apache.org/jira/browse/FLINK-7959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-7959: -- Labels: pull-request-available (was: ) > Split CodeGenerator into CodeGeneratorContext and ExprCodeGenerator > --- > > Key: FLINK-7959 > URL: https://issues.apache.org/jira/browse/FLINK-7959 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Kurt Young >Assignee: Kurt Young >Priority: Major > Labels: pull-request-available > > Right now {{CodeGenerator}} actually acts two roles, one is responsible for > generating codes from RexNode, and the other one is keeping lots of reusable > statements. It makes more sense to split these logic into two dedicated > classes. > The new {{CodeGeneratorContext}} will keep all the reusable statements, while > the new {{ExprCodeGenerator}} will only do generating codes from RexNode. > And for classes like {{AggregationCodeGenerator}} or > {{FunctionCodeGenerator}}, I think the should not be the subclasses of the > {{CodeGenerator}}, but should all as standalone classes. They can create > {{ExprCodeGenerator}} when they need to generating codes from RexNode, and > they can also generating codes by themselves. The {{CodeGeneratorContext}} > can be passed around to collect all reusable statements, and list them in the > final generated class. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] godfreyhe closed pull request #4667: [FLINK-5859] [table] Add PartitionableTableSource for partition pruning
godfreyhe closed pull request #4667: [FLINK-5859] [table] Add PartitionableTableSource for partition pruning URL: https://github.com/apache/flink/pull/4667 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-5859) support partition pruning on Table API & SQL
[ https://issues.apache.org/jira/browse/FLINK-5859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-5859: -- Labels: pull-request-available (was: ) > support partition pruning on Table API & SQL > > > Key: FLINK-5859 > URL: https://issues.apache.org/jira/browse/FLINK-5859 > Project: Flink > Issue Type: New Feature > Components: Table API SQL >Reporter: godfrey he >Assignee: godfrey he >Priority: Major > Labels: pull-request-available > > Many data sources are partitionable storage, e.g. HDFS, Druid. And many > queries just need to read a small subset of the total data. We can use > partition information to prune or skip over files irrelevant to the user’s > queries. Both query optimization time and execution time can be reduced > obviously, especially for a large partitioned table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] godfreyhe closed pull request #3860: [FLINK-6516] [table] using real row count instead of dummy row count when optimizing plan
godfreyhe closed pull request #3860: [FLINK-6516] [table] using real row count instead of dummy row count when optimizing plan URL: https://github.com/apache/flink/pull/3860 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] godfreyhe closed pull request #4820: [FLINK-7834] [table] cost model extends network cost and memory cost
godfreyhe closed pull request #4820: [FLINK-7834] [table] cost model extends network cost and memory cost URL: https://github.com/apache/flink/pull/4820 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] godfreyhe closed pull request #4682: [FLINK-7620] [table] Supports user defined optimization phase
godfreyhe closed pull request #4682: [FLINK-7620] [table] Supports user defined optimization phase URL: https://github.com/apache/flink/pull/4682 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] godfreyhe opened a new pull request #4667: [FLINK-5859] [table] Add PartitionableTableSource for partition pruning
godfreyhe opened a new pull request #4667: [FLINK-5859] [table] Add PartitionableTableSource for partition pruning URL: https://github.com/apache/flink/pull/4667 ## What is the purpose of the change This pull request adds PartitionableTableSource for partition pruning when optimizing the query plan. That way both query optimization time and execution time can be reduced obviously, especially for a large partitioned table. ## Brief change log - *Adds PartitionableTableSource which extends FilterableTableSource* - *Adds setRelBuilder method in FilterableTableSource class* - *Adds implementation for partition pruning and extracting partition predicates* ## Verifying this change This change added tests and can be verified as follows: - *Added integration tests for PartitionableTableSource on batch and stream sql* - *Added test that validates the correct of partition pruning* - *Added test that validates the correct of extracting partition predicates* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) ## Documentation - Does this pull request introduce a new feature? (yes) - If yes, how is the feature documented? (JavaDocs) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] zhangxinyu1 commented on issue #6770: [FLINK-10002] [Webfrontend] WebUI shows jm/tm logs more friendly.
zhangxinyu1 commented on issue #6770: [FLINK-10002] [Webfrontend] WebUI shows jm/tm logs more friendly. URL: https://github.com/apache/flink/pull/6770#issuecomment-456047509 @yanghua @tillrohrmann @ifndef-SleePy Sorry to bother you. Could you please look at this pr if you have time? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-6626) Unifying lifecycle management of SubmittedJobGraph- and CompletedCheckpointStore
[ https://issues.apache.org/jira/browse/FLINK-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747808#comment-16747808 ] Till Rohrmann commented on FLINK-6626: -- Sure [~app-tarush]. Before diving into the implementation details I would be interested in your solution idea/design. > Unifying lifecycle management of SubmittedJobGraph- and > CompletedCheckpointStore > > > Key: FLINK-6626 > URL: https://issues.apache.org/jira/browse/FLINK-6626 > Project: Flink > Issue Type: Improvement > Components: Distributed Coordination >Affects Versions: 1.3.0, 1.4.0 >Reporter: Till Rohrmann >Assignee: Tarush Grover >Priority: Major > > Currently, Flink uses the {{SubmittedJobGraphStore}} to persist {{JobGraphs}} > such that they can be recovered in case of failures. The > {{SubmittedJobGraphStore}} is managed by by the {{JobManager}}. Additionally, > Flink has the {{CompletedCheckpointStore}} which stores checkpoints for a > given {{ExecutionGraph}}/job. The {{CompletedCheckpointStore}} is managed by > the {{CheckpointCoordinator}}. > The {{SubmittedJobGraphStore}} and the {{CompletedCheckpointStore}} are > somewhat related because in the latter we store checkpoints for jobs > contained in the former. I think it would be nice wrt lifecycle management to > let the {{SubmittedJobGraphStore}} manage the lifecycle of the > {{CompletedCheckpointStore}}, because often it does not make much sense to > keep only checkpoints without a job or a job without checkpoints. > An idea would be when we register a job with the {{SubmittedJobGraphStore}} > then it returns a {{CompletedCheckpointStore}}. This store can then be given > to the {{CheckpointCoordinator}} to store the checkpoints. When a job enters > a terminal state it could be the responsibility of the > {{SubmittedJobGraphStore}} to decide what to do with the job data > ({{JobGraph}} and {{Checkpoints}}), e.g. keeping it or cleaning it up. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] wuchong closed pull request #4189: [FLINK-6958] [async] Async I/O hang when the source is bounded and collect timeout
wuchong closed pull request #4189: [FLINK-6958] [async] Async I/O hang when the source is bounded and collect timeout URL: https://github.com/apache/flink/pull/4189 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Guibo-Pan commented on issue #6773: [FLINK-10152] [table] Add reverse function in Table API and SQL
Guibo-Pan commented on issue #6773: [FLINK-10152] [table] Add reverse function in Table API and SQL URL: https://github.com/apache/flink/pull/6773#issuecomment-455989720 @twalthr @xccui @tillrohrmann Is here anyone who can help review this? Thanks~ This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249114818 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java ## @@ -251,6 +269,18 @@ public FlinkKafkaConsumerBase( // Configuration // + /** +* Make sure that auto commit is disabled when our offset commit mode is ON_CHECKPOINTS. +* This overwrites whatever setting the user configured in the properties. +* @param properties - Kafka configuration properties to be adjusted +* @param offsetCommitMode offset commit mode +*/ + protected static void adjustAutoCommitConfig(Properties properties, OffsetCommitMode offsetCommitMode) { Review comment: The method can be package-private. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] wuchong closed pull request #4145: [FLINK-6938][FLINK-6939] [cep] Not store IterativeCondition with NFA state and support RichFunction interface
wuchong closed pull request #4145: [FLINK-6938][FLINK-6939] [cep] Not store IterativeCondition with NFA state and support RichFunction interface URL: https://github.com/apache/flink/pull/4145 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-6958) Async I/O timeout not work
[ https://issues.apache.org/jira/browse/FLINK-6958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-6958: -- Labels: pull-request-available (was: ) > Async I/O timeout not work > -- > > Key: FLINK-6958 > URL: https://issues.apache.org/jira/browse/FLINK-6958 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.2.1 >Reporter: feng xiaojie >Assignee: Jark Wu >Priority: Major > Labels: pull-request-available > > when use Async I/O with UnorderedStreamElementQueue, the queue will always > full if you don't call the AsyncCollector.collect to ack them. > Timeout shall collect these entries when the timeout trigger,but it isn't work > I debug find, > when time out, it will call resultFuture.completeExceptionally(error); > but not call UnorderedStreamElementQueue.onCompleteHandler > it will cause that async i/o hang always -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] fhueske commented on issue #6873: [hotfix] Add Review Progress section to PR description template.
fhueske commented on issue #6873: [hotfix] Add Review Progress section to PR description template. URL: https://github.com/apache/flink/pull/6873#issuecomment-456085278 I think if we modify the review checklist in place (i.e., update it in the PR description) this is almost automatically given. AFAIK, only the PR author and members of the ASF Github organization (or maybe even registered Flink committers?) can update the PR description. If we ask committers to sign-off changes with their name, we should be good. A review progress bot would be great to have, IMO. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] hequn8128 commented on a change in pull request #6787: [FLINK-8577][table] Implement proctime DataStream to Table upsert conversion
hequn8128 commented on a change in pull request #6787: [FLINK-8577][table] Implement proctime DataStream to Table upsert conversion URL: https://github.com/apache/flink/pull/6787#discussion_r249476414 ## File path: flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/stream/sql/FromUpsertStreamTest.scala ## @@ -0,0 +1,140 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.table.api.stream.sql + +import org.apache.flink.api.common.typeinfo.BasicTypeInfo +import org.apache.flink.api.java.typeutils.{RowTypeInfo, TupleTypeInfo} +import org.apache.flink.api.scala._ +import org.apache.flink.table.api.Types +import org.apache.flink.table.api.scala._ +import org.apache.flink.table.utils.TableTestUtil.{UpsertTableNode, term, unaryNode} +import org.apache.flink.table.utils.{StreamTableTestUtil, TableTestBase} +import org.apache.flink.types.Row +import org.junit.Test +import org.apache.flink.api.java.tuple.{Tuple2 => JTuple2} +import java.lang.{Boolean => JBool} + +class FromUpsertStreamTest extends TableTestBase { + + private val streamUtil: StreamTableTestUtil = streamTestUtil() + + @Test + def testRemoveUpsertToRetraction() = { +streamUtil.addTableFromUpsert[(Boolean, (Int, String, Long))]( + "MyTable", 'a, 'b.key, 'c, 'proctime.proctime, 'rowtime.rowtime) + +val sql = "SELECT a, b, c, proctime, rowtime FROM MyTable" + +val expected = + unaryNode( +"DataStreamCalc", +UpsertTableNode(0), +term("select", "a", "b", "c", "PROCTIME(proctime) AS proctime", + "CAST(rowtime) AS rowtime") + ) +streamUtil.verifySql(sql, expected) + } + + @Test + def testMaterializeTimeIndicatorAndCalcUpsertToRetractionTranspose() = { Review comment: `Calc` can't always be pushed down(More details in `CalcUpsertToRetractionTransposeRule`), so there are two tests for each of the case: - `testCalcTransposeUpsertToRetraction()` - `testCalcCannotTransposeUpsertToRetraction()` As for `testMaterializeTimeIndicatorAndCalcUpsertToRetractionTranspose()`, maybe it's better to rename to `testMaterializeTimeIndicator()`. It is dedicated to test the materialization logic. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] StefanRRichter commented on issue #7009: [FLINK-10712] Support to restore state when using RestartPipelinedRegionStrategy
StefanRRichter commented on issue #7009: [FLINK-10712] Support to restore state when using RestartPipelinedRegionStrategy URL: https://github.com/apache/flink/pull/7009#issuecomment-456105318 I think I agree with the assessment of the existing operators. About adding a `RecoveryMode` to consider, would that mean that we would prevent all jobs that use union state to work with partial recovery? I think if we just consider a few popular operators like `KafkaConsumer`, that would already prevent a lot of jobs from using different recovery modes. I can see that this comes from the concern about existing code that uses union state. However, stricly speaking it should not be a regression because those recovery modes previously did not support state recovery at all. We also cannot prevent users from making wrong implementations, so I feel like a good thing to do is document what to care care for union state when using such recovery modes. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Assigned] (FLINK-11326) Using offsets to adjust windows to timezones UTC-8 throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/FLINK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kezhu Wang reassigned FLINK-11326: -- Assignee: Kezhu Wang > Using offsets to adjust windows to timezones UTC-8 throws > IllegalArgumentException > -- > > Key: FLINK-11326 > URL: https://issues.apache.org/jira/browse/FLINK-11326 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.3 >Reporter: TANG Wen-hui >Assignee: Kezhu Wang >Priority: Major > > According to comments, we can use offset to adjust windows to timezones other > than UTC-0. For example, in China you would have to specify an offset of > {{Time.hours(-8)}}. > > {code:java} > /** > * Creates a new {@code TumblingEventTimeWindows} {@link WindowAssigner} that > assigns > * elements to time windows based on the element timestamp and offset. > * > * For example, if you want window a stream by hour,but window begins at > the 15th minutes > * of each hour, you can use {@code of(Time.hours(1),Time.minutes(15))},then > you will get > * time windows start at 0:15:00,1:15:00,2:15:00,etc. > * > * Rather than that,if you are living in somewhere which is not using > UTC±00:00 time, > * such as China which is using UTC+08:00,and you want a time window with > size of one day, > * and window begins at every 00:00:00 of local time,you may use {@code > of(Time.days(1),Time.hours(-8))}. > * The parameter of offset is {@code Time.hours(-8))} since UTC+08:00 is 8 > hours earlier than UTC time. > * > * @param size The size of the generated windows. > * @param offset The offset which window start would be shifted by. > * @return The time policy. > */ > public static TumblingEventTimeWindows of(Time size, Time offset) { > return new TumblingEventTimeWindows(size.toMilliseconds(), > offset.toMilliseconds()); > }{code} > > but when offset is smaller than zero, a IllegalArgumentException will be > throwed. > > {code:java} > protected TumblingEventTimeWindows(long size, long offset) { > if (offset < 0 || offset >= size) { > throw new IllegalArgumentException("TumblingEventTimeWindows parameters must > satisfy 0 <= offset < size"); > } > this.size = size; > this.offset = offset; > }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-6196) Support dynamic schema in Table Function
[ https://issues.apache.org/jira/browse/FLINK-6196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-6196: -- Labels: pull-request-available (was: ) > Support dynamic schema in Table Function > > > Key: FLINK-6196 > URL: https://issues.apache.org/jira/browse/FLINK-6196 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Zhuoluo Yang >Assignee: Zhuoluo Yang >Priority: Major > Labels: pull-request-available > > In many of our use cases. We have to decide the schema of a UDTF at the run > time. For example. udtf('c1, c2, c3') will generate three columns for a > lateral view. > Most systems such as calcite and hive support this feature. However, the > current implementation of flink didn't implement the feature correctly. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] clarkyzl closed pull request #3623: [FLINK-6196] [table] Support dynamic schema in Table Function
clarkyzl closed pull request #3623: [FLINK-6196] [table] Support dynamic schema in Table Function URL: https://github.com/apache/flink/pull/3623 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (FLINK-11399) Parsing nested ROW()s in SQL
Benoît Paris created FLINK-11399: Summary: Parsing nested ROW()s in SQL Key: FLINK-11399 URL: https://issues.apache.org/jira/browse/FLINK-11399 Project: Flink Issue Type: Bug Components: Table API SQL Affects Versions: 1.7.1 Reporter: Benoît Paris Hi! I'm trying to build a nested structure in SQL (mapping to json with flink-json). This works fine: {code:java} INSERT INTO outputTable SELECT ROW(col1, col2) FROM ( SELECT col1, ROW(col1, col1) as col2 FROM inputTable ) tbl2 {code} (and I use it as a workaround), but it fails in the simpler version: {code:java} INSERT INTO outputTable SELECT ROW(col1, ROW(col1, col1)) FROM inputTable {code} , yielding the following stacktrace: {noformat} Exception in thread "main" org.apache.flink.table.api.SqlParserException: SQL parse failed. Encountered ", ROW" at line 1, column 40. Was expecting one of: ")" ... "," ... "," ... "," ... "," ... "," ... at org.apache.flink.table.calcite.FlinkPlannerImpl.parse(FlinkPlannerImpl.scala:94) at org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:803) at org.apache.flink.table.api.TableEnvironment.sqlUpdate(TableEnvironment.scala:777) at TestBug.main(TestBug.java:32) Caused by: org.apache.calcite.sql.parser.SqlParseException: Encountered ", ROW" at line 1, column 40. Was expecting one of: ")" ... "," ... "," ... "," ... "," ... "," ... at org.apache.calcite.sql.parser.impl.SqlParserImpl.convertException(SqlParserImpl.java:347) at org.apache.calcite.sql.parser.impl.SqlParserImpl.normalizeException(SqlParserImpl.java:128) at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:137) at org.apache.calcite.sql.parser.SqlParser.parseStmt(SqlParser.java:162) at org.apache.flink.table.calcite.FlinkPlannerImpl.parse(FlinkPlannerImpl.scala:90) ... 3 more Caused by: org.apache.calcite.sql.parser.impl.ParseException: Encountered ", ROW" at line 1, column 40. Was expecting one of: ")" ... "," ... "," ... "," ... "," ... "," ... at org.apache.calcite.sql.parser.impl.SqlParserImpl.generateParseException(SqlParserImpl.java:23019) at org.apache.calcite.sql.parser.impl.SqlParserImpl.jj_consume_token(SqlParserImpl.java:22836) at org.apache.calcite.sql.parser.impl.SqlParserImpl.ParenthesizedSimpleIdentifierList(SqlParserImpl.java:4466) at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression3(SqlParserImpl.java:3328) at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression2b(SqlParserImpl.java:3066) at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression2(SqlParserImpl.java:3092) at org.apache.calcite.sql.parser.impl.SqlParserImpl.Expression(SqlParserImpl.java:3045) at org.apache.calcite.sql.parser.impl.SqlParserImpl.SelectExpression(SqlParserImpl.java:1525) at org.apache.calcite.sql.parser.impl.SqlParserImpl.SelectItem(SqlParserImpl.java:1500) at org.apache.calcite.sql.parser.impl.SqlParserImpl.SelectList(SqlParserImpl.java:1477) at org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlSelect(SqlParserImpl.java:912) at org.apache.calcite.sql.parser.impl.SqlParserImpl.LeafQuery(SqlParserImpl.java:552) at org.apache.calcite.sql.parser.impl.SqlParserImpl.LeafQueryOrExpr(SqlParserImpl.java:3030) at org.apache.calcite.sql.parser.impl.SqlParserImpl.QueryOrExpr(SqlParserImpl.java:2949) at org.apache.calcite.sql.parser.impl.SqlParserImpl.OrderedQueryOrExpr(SqlParserImpl.java:463) at org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlInsert(SqlParserImpl.java:1212) at org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlStmt(SqlParserImpl.java:847) at org.apache.calcite.sql.parser.impl.SqlParserImpl.SqlStmtEof(SqlParserImpl.java:869) at org.apache.calcite.sql.parser.impl.SqlParserImpl.parseSqlStmtEof(SqlParserImpl.java:184) at org.apache.calcite.sql.parser.SqlParser.parseQuery(SqlParser.java:130) ... 5 more{noformat} I was thinking it could be a naming/referencing issue; or I was not using ROW() properly, in the json-idiomatic way I want to push on it. Anyway this is very minor, thanks for all the good work on Flink! Cheers, Ben -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (FLINK-7834) cost model (RelOptCost) extends network cost and memory cost
[ https://issues.apache.org/jira/browse/FLINK-7834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-7834: -- Labels: pull-request-available (was: ) > cost model (RelOptCost) extends network cost and memory cost > > > Key: FLINK-7834 > URL: https://issues.apache.org/jira/browse/FLINK-7834 > Project: Flink > Issue Type: New Feature > Components: Table API SQL >Reporter: godfrey he >Priority: Major > Labels: pull-request-available > > {{RelOptCost}} defines an interface for optimizer cost in terms of number of > row processed, CPU cost, and I/O cost. Flink is a distributed framework, > network and memory are also two important cost metrics which should be > considered for optimizer. This feature is to extend RelOptCost. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (FLINK-11249) FlinkKafkaProducer011 can not be migrated to FlinkKafkaProducer
[ https://issues.apache.org/jira/browse/FLINK-11249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16747937#comment-16747937 ] Piotr Nowojski commented on FLINK-11249: I have realised about one more issue. This fix alone might not fully solve the problem. With this fix in place, user will be able to update his job from {{0.11}} connector to the universal one, but what happens if he upgrades the Kafka Brokers at any point of time? If user stops Kafka Brokers, upgrades and then restarts them, does this process preserves the pending transactions, that Flink already "pre committed"? Or are they automatically aborted? If they are automatically aborted we might have a data loss from our perspective. If "pre committed" transactions are aborted during the Kafka brokers upgrades, we would need "clean stop with savepoint" feature to handle this user story. I guess this needs more experiments and more testing. CC [~tzulitai] [~aljoscha] > FlinkKafkaProducer011 can not be migrated to FlinkKafkaProducer > --- > > Key: FLINK-11249 > URL: https://issues.apache.org/jira/browse/FLINK-11249 > Project: Flink > Issue Type: Sub-task > Components: Kafka Connector >Affects Versions: 1.7.0, 1.7.1 >Reporter: Piotr Nowojski >Assignee: vinoyang >Priority: Blocker > Labels: pull-request-available > Fix For: 1.7.2, 1.8.0 > > Time Spent: 10m > Remaining Estimate: 0h > > As reported by a user on the mailing list "How to migrate Kafka Producer ?" > (on 18th December 2018), {{FlinkKafkaProducer011}} can not be migrated to > {{FlinkKafkaProducer}} and the same problem can occur in the future Kafka > producer versions/refactorings. > The issue is that {{ListState > FlinkKafkaProducer#nextTransactionalIdHintState}} field is serialized using > java serializers and this is causing problems/collisions on > {{FlinkKafkaProducer011.NextTransactionalIdHint}} vs > {{FlinkKafkaProducer.NextTransactionalIdHint}}. > To fix that we probably need to release new versions of those classes, that > will rewrite/upgrade this state field to a new one, that doesn't relay on > java serialization. After this, we could drop the support for the old field > and that in turn will allow users to upgrade from 0.11 connector to the > universal one. > One bright side is that technically speaking our {{FlinkKafkaProducer011}} > has the same compatibility matrix as the universal one (it's also forward & > backward compatible with the same Kafka versions), so for the time being > users can stick to {{FlinkKafkaProducer011}}. > FYI [~tzulitai] [~yanghua] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11398) Add a dedicated phase to materialize time indicators for nodes produce updates
Hequn Cheng created FLINK-11398: --- Summary: Add a dedicated phase to materialize time indicators for nodes produce updates Key: FLINK-11398 URL: https://issues.apache.org/jira/browse/FLINK-11398 Project: Flink Issue Type: Improvement Components: Table API SQL Reporter: Hequn Cheng Assignee: Hequn Cheng As discussed [here|https://github.com/apache/flink/pull/6787#discussion_r249056249], we need a dedicated phase to materialize time indicators for nodes produce updates. Details: Currently, we materialize time indicators in `RelTimeInidicatorConverter`. We need to introduce another materialize phase that materializes all time attributes on nodes that produce updates. We can not do it inside `RelTimeInidicatorConverter`, because only later, after physical optimization phase, we know whether it is a non-window outer join which will produce updates There are a few other things we need to consider. - Whether we can unify the two converter phase. - Take window with early fire into consideration(not been implemented yet). In this case, we don't need to materialize time indicators even it produces updates. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11400) JobManagerRunner does not wait for suspension of JobMaster
Till Rohrmann created FLINK-11400: - Summary: JobManagerRunner does not wait for suspension of JobMaster Key: FLINK-11400 URL: https://issues.apache.org/jira/browse/FLINK-11400 Project: Flink Issue Type: Bug Components: Distributed Coordination Affects Versions: 1.7.1, 1.6.3, 1.8.0 Reporter: Till Rohrmann Assignee: Till Rohrmann Fix For: 1.8.0 The {{JobManagerRunner}} does not wait for the suspension of the {{JobMaster}} to finish before granting leadership again. This can lead to a state where the {{JobMaster}} tries to start the {{ExecutionGraph}} but the {{SlotPool}} is still stopped. I suggest to linearize the leadership operations (granting and revoking leadership) similarly to the {{Dispatcher}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] rmetzger commented on issue #6873: [hotfix] Add Review Progress section to PR description template.
rmetzger commented on issue #6873: [hotfix] Add Review Progress section to PR description template. URL: https://github.com/apache/flink/pull/6873#issuecomment-456081384 @zentol: I'm a bit against this `**NOTE: THE REVIEW PROGRESS MUST ONLY BE UPDATED BY AN APACHE FLINK COMMITTER!**` line, as it seems not very inviting for new community members. Maybe as a convention, reviewers put into the comments what they have reviewed, so if a committer who's merging a PR has doubts about the checklist, they can check the comments? I'm also thinking* about implementing a bot that takes care of tracking the checklist. *strongly enough to have created a project locally already :) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249367559 ## File path: flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer011.java ## @@ -641,6 +641,9 @@ public void invoke(KafkaTransactionState transaction, IN next, Context context) } else { record = new ProducerRecord<>(targetTopic, null, timestamp, serializedKey, serializedValue); } + for (Map.Entry header: schema.headers(next)) { Review comment: space before `:` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] dawidwys commented on a change in pull request #7436: [FLINK-11071][core] add support for dynamic proxy classes resolution …
dawidwys commented on a change in pull request #7436: [FLINK-11071][core] add support for dynamic proxy classes resolution … URL: https://github.com/apache/flink/pull/7436#discussion_r249462832 ## File path: flink-core/src/test/java/org/apache/flink/util/InstantiationUtilTest.java ## @@ -46,6 +50,17 @@ */ public class InstantiationUtilTest extends TestLogger { + @Test + public void testResolveProxyClass() throws Exception { Review comment: This test does not check the new behavior. The proxy class is available in the same classloader as `InstantiationUtil`, which corresponds to a situation that the proxy class is available on the parent classpath in flink cluster. What you should actually test is when the proxy class is only available from user classloader. In `org.apache.flink.runtime.classloading.ClassLoaderTest#testMessageDecodingWithUnavailableClass` you can check how to create a new classloader with a class. Also always please make sure that your test fails without the changes that are supposed to fix the tested problem (It is not the case here, it passes without your changes). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-11398) Add a dedicated phase to materialize time indicators for nodes produce updates
[ https://issues.apache.org/jira/browse/FLINK-11398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hequn Cheng updated FLINK-11398: Description: As discussed [here|https://github.com/apache/flink/pull/6787#discussion_r247926320], we need a dedicated phase to materialize time indicators for nodes produce updates. Details: Currently, we materialize time indicators in `RelTimeInidicatorConverter`. We need to introduce another materialize phase that materializes all time attributes on nodes that produce updates. We can not do it inside `RelTimeInidicatorConverter`, because only later, after physical optimization phase, we know whether it is a non-window outer join which will produce updates There are a few other things we need to consider. - Whether we can unify the two converter phase. - Take window with early fire into consideration(not been implemented yet). In this case, we don't need to materialize time indicators even it produces updates. was: As discussed [here|https://github.com/apache/flink/pull/6787#discussion_r249056249], we need a dedicated phase to materialize time indicators for nodes produce updates. Details: Currently, we materialize time indicators in `RelTimeInidicatorConverter`. We need to introduce another materialize phase that materializes all time attributes on nodes that produce updates. We can not do it inside `RelTimeInidicatorConverter`, because only later, after physical optimization phase, we know whether it is a non-window outer join which will produce updates There are a few other things we need to consider. - Whether we can unify the two converter phase. - Take window with early fire into consideration(not been implemented yet). In this case, we don't need to materialize time indicators even it produces updates. > Add a dedicated phase to materialize time indicators for nodes produce updates > -- > > Key: FLINK-11398 > URL: https://issues.apache.org/jira/browse/FLINK-11398 > Project: Flink > Issue Type: Improvement > Components: Table API SQL >Reporter: Hequn Cheng >Assignee: Hequn Cheng >Priority: Major > > As discussed > [here|https://github.com/apache/flink/pull/6787#discussion_r247926320], we > need a dedicated phase to materialize time indicators for nodes produce > updates. > Details: > Currently, we materialize time indicators in `RelTimeInidicatorConverter`. We > need to introduce another materialize phase that materializes all time > attributes on nodes that produce updates. We can not do it inside > `RelTimeInidicatorConverter`, because only later, after physical optimization > phase, we know whether it is a non-window outer join which will produce > updates > There are a few other things we need to consider. > - Whether we can unify the two converter phase. > - Take window with early fire into consideration(not been implemented yet). > In this case, we don't need to materialize time indicators even it produces > updates. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] bowenli86 commented on issue #7011: [FLINK-10768][Table & SQL] Move external catalog related code from TableEnvironment to CatalogManager
bowenli86 commented on issue #7011: [FLINK-10768][Table & SQL] Move external catalog related code from TableEnvironment to CatalogManager URL: https://github.com/apache/flink/pull/7011#issuecomment-456154882 closing this PR in favor of new plan This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] bowenli86 closed pull request #7011: [FLINK-10768][Table & SQL] Move external catalog related code from TableEnvironment to CatalogManager
bowenli86 closed pull request #7011: [FLINK-10768][Table & SQL] Move external catalog related code from TableEnvironment to CatalogManager URL: https://github.com/apache/flink/pull/7011 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] bowenli86 closed pull request #6970: [FLINK-10696][Table API & SQL]Add APIs to ExternalCatalog, CrudExternalCatalog and InMemoryCrudExternalCatalog for views and UDFs
bowenli86 closed pull request #6970: [FLINK-10696][Table API & SQL]Add APIs to ExternalCatalog, CrudExternalCatalog and InMemoryCrudExternalCatalog for views and UDFs URL: https://github.com/apache/flink/pull/6970 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] bowenli86 closed pull request #7012: [FLINK-10769][Table & SQL] port InMemoryExternalCatalog to java
bowenli86 closed pull request #7012: [FLINK-10769][Table & SQL] port InMemoryExternalCatalog to java URL: https://github.com/apache/flink/pull/7012 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] bowenli86 commented on issue #7012: [FLINK-10769][Table & SQL] port InMemoryExternalCatalog to java
bowenli86 commented on issue #7012: [FLINK-10769][Table & SQL] port InMemoryExternalCatalog to java URL: https://github.com/apache/flink/pull/7012#issuecomment-456154851 closing this PR in favor of new plan This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] bowenli86 commented on issue #6997: [FLINK-10697][Table & SQL] Create an in-memory catalog that stores Flink's meta objects
bowenli86 commented on issue #6997: [FLINK-10697][Table & SQL] Create an in-memory catalog that stores Flink's meta objects URL: https://github.com/apache/flink/pull/6997#issuecomment-456154907 closing this PR in favor of new plan This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249512219 ## File path: flink-connectors/flink-connector-kafka-0.8/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/SimpleConsumerThread.java ## @@ -370,8 +370,8 @@ else if (partitionsRemoved) { keyPayload.get(keyBytes); } - final T value = deserializer.deserialize(keyBytes, valueBytes, - currentPartition.getTopic(), currentPartition.getPartition(), offset); + final T value = deserializer.deserialize( Review comment: Here we could try: ``` ConsumerRecord consumerRecord = new ConsumerRecord<>(currentPartition.getTopic(), currentPartition.getPartition(), keyBytes, valueBytes, offset); final T value = deserializer.deserialize(consumerRecord); ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249368384 ## File path: flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka011ConsumerRecord.java ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.streaming.connectors.kafka.internal; + +import org.apache.flink.shaded.guava18.com.google.common.base.Function; +import org.apache.flink.shaded.guava18.com.google.common.collect.Iterables; + +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.common.header.Header; + +import javax.annotation.Nonnull; +import javax.annotation.Nullable; + +import java.util.AbstractMap; +import java.util.Map; + +/** + * Extends base Kafka09ConsumerRecord to provide access to Kafka headers. Review comment: could you put into `{@link Kafka09ConsumerRecord}` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249478973 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/KeyedDeserializationSchema.java ## @@ -32,6 +34,49 @@ */ @PublicEvolving public interface KeyedDeserializationSchema extends Serializable, ResultTypeQueryable { + /** +* Kafka record to be deserialized. +* Record consists of key,value pair, topic name, partition offset, headers and a timestamp (if available) +*/ + interface Record { Review comment: kafka-clients 0.8 actually has `ConsumerRecord`, just `SimpleConsumerThread` does not use it but it seems to be possible to manually wrap `MessageAndOffset` into that `ConsumerRecord`. I would give it a try. It seems to be simpler option at the moment and would eliminate currently introduced inheritance for the sake of wrapping. Not sure, though, how big the risk is that the Kafka API changes again. The Flink wrapping, as now, seems to be a safer option but it also adds some performance overhead. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249510977 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java ## @@ -82,93 +84,110 @@ */ @Internal public abstract class FlinkKafkaConsumerBase extends RichParallelSourceFunction implements - CheckpointListener, - ResultTypeQueryable, - CheckpointedFunction { + CheckpointListener, Review comment: This class contains a lot of unrelated changes, which makes it more difficult to review. I would suggest to have either a follow-up PR for them or at least put them into a separate commit. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
azagrebin commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249482757 ## File path: flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/internal/Kafka011ConsumerRecord.java ## @@ -0,0 +1,57 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.streaming.connectors.kafka.internal; + +import org.apache.flink.shaded.guava18.com.google.common.base.Function; +import org.apache.flink.shaded.guava18.com.google.common.collect.Iterables; + +import org.apache.kafka.clients.consumer.ConsumerRecord; +import org.apache.kafka.common.header.Header; + +import javax.annotation.Nonnull; +import javax.annotation.Nullable; + +import java.util.AbstractMap; +import java.util.Map; + +/** + * Extends base Kafka09ConsumerRecord to provide access to Kafka headers. + */ +class Kafka011ConsumerRecord extends Kafka09ConsumerRecord { + /** +* Wraps {@link Header} as Map.Entry. +*/ + private static final Function> HEADER_TO_MAP_ENTRY_FUNCTION = + new Function>() { + @Nonnull + @Override + public Map.Entry apply(@Nullable Header header) { + return new AbstractMap.SimpleImmutableEntry<>(header.key(), header.value()); + } + }; + + Kafka011ConsumerRecord(ConsumerRecord consumerRecord) { + super(consumerRecord); + } + + @Override + public Iterable> headers() { + return Iterables.transform(consumerRecord.headers(), HEADER_TO_MAP_ENTRY_FUNCTION); Review comment: Could we avoid relying on non-standard java libraries like guava? The record/headers processing is also on the time critical path of per record latency. I would suggest to implement our own Iterable wrapper which does only this header wrapping. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-11400) JobManagerRunner does not wait for suspension of JobMaster
[ https://issues.apache.org/jira/browse/FLINK-11400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748078#comment-16748078 ] TisonKun commented on FLINK-11400: -- Do you mean introduce a {{recoveryOperation}} in {{JobManagerRunner}}? > JobManagerRunner does not wait for suspension of JobMaster > -- > > Key: FLINK-11400 > URL: https://issues.apache.org/jira/browse/FLINK-11400 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.6.3, 1.7.1, 1.8.0 >Reporter: Till Rohrmann >Assignee: Till Rohrmann >Priority: Major > Fix For: 1.8.0 > > > The {{JobManagerRunner}} does not wait for the suspension of the > {{JobMaster}} to finish before granting leadership again. This can lead to a > state where the {{JobMaster}} tries to start the {{ExecutionGraph}} but the > {{SlotPool}} is still stopped. > I suggest to linearize the leadership operations (granting and revoking > leadership) similarly to the {{Dispatcher}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] GJL opened a new pull request #7546: [FLINK-11390][tests] Port YARNSessionCapacitySchedulerITCase#testTaskManagerFailure to new code base
GJL opened a new pull request #7546: [FLINK-11390][tests] Port YARNSessionCapacitySchedulerITCase#testTaskManagerFailure to new code base URL: https://github.com/apache/flink/pull/7546 ## What is the purpose of the change *Port `YARNSessionCapacitySchedulerITCase#testTaskManagerFailure` to new code base.* cc: @tillrohrmann ## Brief change log - *Extract HA test out of `testTaskManagerFailure`* - *Rename test `testTaskManagerFailure` so that the name reflects what is actually asserted* ## Verifying this change This change is already covered by existing tests, such as *itself*. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / **no**) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / **no**) - The serializers: (yes / **no** / don't know) - The runtime per-record code paths (performance sensitive): (yes / **no** / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / **no** / don't know) - The S3 file system connector: (yes / **no** / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / **no**) - If yes, how is the feature documented? (**not applicable** / docs / JavaDocs / not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-11401) Allow compression on ParquetBulkWriter
[ https://issues.apache.org/jira/browse/FLINK-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11401: --- Labels: pull-request-available (was: ) > Allow compression on ParquetBulkWriter > -- > > Key: FLINK-11401 > URL: https://issues.apache.org/jira/browse/FLINK-11401 > Project: Flink > Issue Type: Improvement >Affects Versions: 1.7.1 >Reporter: Fokko Driesprong >Assignee: Fokko Driesprong >Priority: Major > Labels: pull-request-available > Fix For: 1.8.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko opened a new pull request #7547: [FLINK-11401] Allow setting of compression on ParquetWriter
Fokko opened a new pull request #7547: [FLINK-11401] Allow setting of compression on ParquetWriter URL: https://github.com/apache/flink/pull/7547 ## What is the purpose of the change *(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)* ## Brief change log *(for example:)* - *The TaskInfo is stored in the blob store on job creation time as a persistent artifact* - *Deployments RPC transmits only the blob storage reference* - *TaskManagers retrieve the TaskInfo from the blob cache* ## Verifying this change *(Please pick either of the following options)* This change is a trivial rework / code cleanup without any test coverage. *(or)* This change is already covered by existing tests, such as *(please describe tests)*. *(or)* This change added tests and can be verified as follows: *(example:)* - *Added integration tests for end-to-end deployment with large payloads (100MB)* - *Extended integration test for recovery after master (JobManager) failure* - *Added test that validates that TaskInfo is transferred only once across recoveries* - *Manually verified the change by running a 4 node cluser with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.* ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (yes / no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes / no) - The serializers: (yes / no / don't know) - The runtime per-record code paths (performance sensitive): (yes / no / don't know) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know) - The S3 file system connector: (yes / no / don't know) ## Documentation - Does this pull request introduce a new feature? (yes / no) - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented) This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (FLINK-11401) Allow compression on ParquetBulkWriter
Fokko Driesprong created FLINK-11401: Summary: Allow compression on ParquetBulkWriter Key: FLINK-11401 URL: https://issues.apache.org/jira/browse/FLINK-11401 Project: Flink Issue Type: Improvement Affects Versions: 1.7.1 Reporter: Fokko Driesprong Assignee: Fokko Driesprong Fix For: 1.8.0 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249606898 ## File path: flink-connectors/flink-connector-kafka-0.11/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaProducer011.java ## @@ -641,6 +641,9 @@ public void invoke(KafkaTransactionState transaction, IN next, Context context) } else { record = new ProducerRecord<>(targetTopic, null, timestamp, serializedKey, serializedValue); } + for (Map.Entry header: schema.headers(next)) { Review comment: Pushed e0ca3e0 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kezhuw opened a new pull request #7548: [FLINK-11326] Fix forbidden negative offset in window assigners
kezhuw opened a new pull request #7548: [FLINK-11326] Fix forbidden negative offset in window assigners URL: https://github.com/apache/flink/pull/7548 ## What is the purpose of the change Allow negative window offset in window assignment as the javadoc says. ## Brief change log - Allow negative window offset in window assignment. - Throw `IllegalArgumentException` if offset is out of range for `SlidingEventTimeWindows.of` and `SlidingProcessingTimeWindows.of`. ## Verifying this change This change is already covered by existing tests and new test cases has been added to allow negative window offset. ## Does this pull request potentially affect one of the following parts: - Dependencies (does it add or upgrade a dependency): (no) - The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (yes) - The serializers: (no) - The runtime per-record code paths (performance sensitive): (no) - Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) - The S3 file system connector: (no) ## Documentation - Does this pull request introduce a new feature? (no) - If yes, how is the feature documented? (not applicable) ## Breaking changes * `@PublicEvolving` API `SlidingEventTimeWindows.of` and `SlidingProcessingTimeWindows.of` allows out of window offset previously, this merge request forbid this behavior. This way they behaves same as `TumblingEventTimeWindows.of` and `TumblingProcessingTimeWindows.of`. I think it is easier for caller to understand [sliding windows](https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/windows.html#sliding-windows). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-11402) User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders
[ https://issues.apache.org/jira/browse/FLINK-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-11402: Component/s: Local Runtime > User code can fail with an UnsatisfiedLinkError in the presence of multiple > classloaders > > > Key: FLINK-11402 > URL: https://issues.apache.org/jira/browse/FLINK-11402 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination, Local Runtime >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Priority: Major > Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz > > > As reported on the user mailing list thread "[`env.java.opts` not persisting > after job canceled or failed and then > restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E];, > there can be issues with using native libraries and user code class loading. > h2. Steps to reproduce > I was able to reproduce the issue reported on the mailing list using > [snappy-java|https://github.com/xerial/snappy-java] in a user program. > Running the attached user program works fine on initial submission, but > results in a failure when re-executed. > I'm using Flink 1.7.0 using a standalone cluster started via > {{bin/start-cluster.sh}}. > 0. Unpack attached Maven project and build using {{mvn clean package}} *or* > directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}} > 1. Download > [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar] > and unpack libsnappyjava for your system: > {code} > jar tf snappy-java-1.1.7.2.jar | grep libsnappy > ... > org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so > ... > org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib > ... > {code} > 2. Configure system library path to {{libsnappyjava}} in {{flink-conf.yaml}} > (path needs to be adjusted for your system): > {code} > env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64 > {code} > 3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > Program execution finished > Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished. > Job Runtime: 359 ms > {code} > 4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: Job failed. > (JobID: 7d69baca58f33180cb9251449ddcd396) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) > at com.github.uce.HelloSnappy.main(HelloSnappy.java:18) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813) > at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287) > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213) > at > org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050) > at > org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126) > at > org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > execution failed. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) > ... 17 more > Caused by: java.lang.UnsatisfiedLinkError: Native Library > /.../org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib already loaded > in another classloader > at
[jira] [Updated] (FLINK-11326) Using offsets to adjust windows to timezones UTC-8 throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/FLINK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kezhu Wang updated FLINK-11326: --- Affects Version/s: 1.7.1 > Using offsets to adjust windows to timezones UTC-8 throws > IllegalArgumentException > -- > > Key: FLINK-11326 > URL: https://issues.apache.org/jira/browse/FLINK-11326 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.3, 1.7.1 >Reporter: TANG Wen-hui >Assignee: Kezhu Wang >Priority: Major > > According to comments, we can use offset to adjust windows to timezones other > than UTC-0. For example, in China you would have to specify an offset of > {{Time.hours(-8)}}. > > {code:java} > /** > * Creates a new {@code TumblingEventTimeWindows} {@link WindowAssigner} that > assigns > * elements to time windows based on the element timestamp and offset. > * > * For example, if you want window a stream by hour,but window begins at > the 15th minutes > * of each hour, you can use {@code of(Time.hours(1),Time.minutes(15))},then > you will get > * time windows start at 0:15:00,1:15:00,2:15:00,etc. > * > * Rather than that,if you are living in somewhere which is not using > UTC±00:00 time, > * such as China which is using UTC+08:00,and you want a time window with > size of one day, > * and window begins at every 00:00:00 of local time,you may use {@code > of(Time.days(1),Time.hours(-8))}. > * The parameter of offset is {@code Time.hours(-8))} since UTC+08:00 is 8 > hours earlier than UTC time. > * > * @param size The size of the generated windows. > * @param offset The offset which window start would be shifted by. > * @return The time policy. > */ > public static TumblingEventTimeWindows of(Time size, Time offset) { > return new TumblingEventTimeWindows(size.toMilliseconds(), > offset.toMilliseconds()); > }{code} > > but when offset is smaller than zero, a IllegalArgumentException will be > throwed. > > {code:java} > protected TumblingEventTimeWindows(long size, long offset) { > if (offset < 0 || offset >= size) { > throw new IllegalArgumentException("TumblingEventTimeWindows parameters must > satisfy 0 <= offset < size"); > } > this.size = size; > this.offset = offset; > }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249606803 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java ## @@ -251,6 +269,18 @@ public FlinkKafkaConsumerBase( // Configuration // + /** +* Make sure that auto commit is disabled when our offset commit mode is ON_CHECKPOINTS. +* This overwrites whatever setting the user configured in the properties. +* @param properties - Kafka configuration properties to be adjusted +* @param offsetCommitMode offset commit mode +*/ + protected static void adjustAutoCommitConfig(Properties properties, OffsetCommitMode offsetCommitMode) { Review comment: Pushed 3cfda16 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] KarmaGYZ commented on issue #7423: [hotfix][docs] Remove the legacy hint in production-ready doc
KarmaGYZ commented on issue #7423: [hotfix][docs] Remove the legacy hint in production-ready doc URL: https://github.com/apache/flink/pull/7423#issuecomment-456233322 @alpinegizmo Thanks for the review. In the stale version, this sentence point to the asynchronous snapshotting. However, I keep this sentence because there are [explanation](https://ci.apache.org/projects/flink/flink-docs-master/ops/state/state_backends.html#the-rocksdbstatebackend) about the limitation of the throughput of RocksDBBackend. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] dianfu closed pull request #7354: [FLINK-11214] [table] Support non mergable aggregates for group windows on batch table
dianfu closed pull request #7354: [FLINK-11214] [table] Support non mergable aggregates for group windows on batch table URL: https://github.com/apache/flink/pull/7354 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-11402) User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders
[ https://issues.apache.org/jira/browse/FLINK-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748241#comment-16748241 ] Ufuk Celebi commented on FLINK-11402: - Similar issue for RocksDB (FLINK-5408) > User code can fail with an UnsatisfiedLinkError in the presence of multiple > classloaders > > > Key: FLINK-11402 > URL: https://issues.apache.org/jira/browse/FLINK-11402 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Priority: Major > Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz > > > As reported on the user mailing list thread "[`env.java.opts` not persisting > after job canceled or failed and then > restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E];, > there can be issues with using native libraries and user code class loading. > h2. Steps to reproduce > I was able to reproduce the issue reported on the mailing list using > [snappy-java|https://github.com/xerial/snappy-java] in a user program. > Running the attached user program works fine on initial submission, but > results in a failure when re-executed. > I'm using Flink 1.7.0 using a standalone cluster started via > {{bin/start-cluster.sh}}. > 0. Unpack attached Maven project and build using {{mvn clean package}} *or* > directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}} > 1. Download > [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar] > and unpack libsnappyjava for your system: > {code} > jar tf snappy-java-1.1.7.2.jar | grep libsnappy > ... > org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so > ... > org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib > ... > {code} > 2. Configure system library path to {{libsnappyjava}} in {{flink-conf.yaml}} > (path needs to be adjusted for your system): > {code} > env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64 > {code} > 3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > Program execution finished > Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished. > Job Runtime: 359 ms > {code} > 4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: Job failed. > (JobID: 7d69baca58f33180cb9251449ddcd396) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) > at com.github.uce.HelloSnappy.main(HelloSnappy.java:18) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813) > at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287) > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213) > at > org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050) > at > org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126) > at > org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > execution failed. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) > ... 17 more > Caused by: java.lang.UnsatisfiedLinkError: Native Library > /.../org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib already loaded > in another classloader > at
[jira] [Updated] (FLINK-11402) User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders
[ https://issues.apache.org/jira/browse/FLINK-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-11402: Attachment: hello-snappy.tgz > User code can fail with an UnsatisfiedLinkError in the presence of multiple > classloaders > > > Key: FLINK-11402 > URL: https://issues.apache.org/jira/browse/FLINK-11402 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Priority: Major > Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz > > > As reported on the user mailing list thread "[`env.java.opts` not persisting > after job canceled or failed and then > restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E];, > there can be issues with using native libraries and user code class loading. > h2. Steps to reproduce > I was able to reproduce the issue reported on the mailing list using > [snappy-java|https://github.com/xerial/snappy-java] in a user program. > Running the attached user program works fine on initial submission, but > results in a failure when re-executed. > I'm using Flink 1.7.0 using a standalone cluster started via > {{bin/start-cluster.sh}}. > 0. Unpack attached Maven project and build using {{mvn clean package}} *or* > directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}} > 1. Download > [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar] > and unpack libsnappyjava for your system: > {code} > jar tf snappy-java-1.1.7.2.jar | grep libsnappy > ... > org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so > ... > org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib > ... > {code} > 2. Configure system library path to {{libsnappyjava}} in {{flink-conf.yaml}} > (path needs to be adjusted for your system): > {code} > env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64 > {code} > 3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > Program execution finished > Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished. > Job Runtime: 359 ms > {code} > 4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: Job failed. > (JobID: 7d69baca58f33180cb9251449ddcd396) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) > at com.github.uce.HelloSnappy.main(HelloSnappy.java:18) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813) > at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287) > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213) > at > org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050) > at > org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126) > at > org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > execution failed. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) > ... 17 more > Caused by: java.lang.UnsatisfiedLinkError: Native Library > /.../org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib already loaded > in another classloader > at
[jira] [Updated] (FLINK-11402) User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders
[ https://issues.apache.org/jira/browse/FLINK-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-11402: Attachment: hello-snappy-1.0-SNAPSHOT.jar > User code can fail with an UnsatisfiedLinkError in the presence of multiple > classloaders > > > Key: FLINK-11402 > URL: https://issues.apache.org/jira/browse/FLINK-11402 > Project: Flink > Issue Type: Bug > Components: Distributed Coordination >Affects Versions: 1.7.0 >Reporter: Ufuk Celebi >Priority: Major > Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz > > > As reported on the user mailing list thread "[`env.java.opts` not persisting > after job canceled or failed and then > restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E];, > there can be issues with using native libraries and user code class loading. > h2. Steps to reproduce > I was able to reproduce the issue reported on the mailing list using > [snappy-java|https://github.com/xerial/snappy-java] in a user program. > Running the attached user program works fine on initial submission, but > results in a failure when re-executed. > I'm using Flink 1.7.0 using a standalone cluster started via > {{bin/start-cluster.sh}}. > 0. Unpack attached Maven project and build using {{mvn clean package}} *or* > directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}} > 1. Download > [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar] > and unpack libsnappyjava for your system: > {code} > jar tf snappy-java-1.1.7.2.jar | grep libsnappy > ... > org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so > ... > org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib > ... > {code} > 2. Configure system library path to {{libsnappyjava}} in {{flink-conf.yaml}} > (path needs to be adjusted for your system): > {code} > env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64 > {code} > 3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > Program execution finished > Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished. > Job Runtime: 359 ms > {code} > 4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}} > {code} > bin/flink run hello-snappy-1.0-SNAPSHOT.jar > Starting execution of program > > The program finished with the following exception: > org.apache.flink.client.program.ProgramInvocationException: Job failed. > (JobID: 7d69baca58f33180cb9251449ddcd396) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487) > at > org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) > at com.github.uce.HelloSnappy.main(HelloSnappy.java:18) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529) > at > org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421) > at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) > at > org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813) > at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287) > at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213) > at > org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050) > at > org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126) > at > org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) > at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126) > Caused by: org.apache.flink.runtime.client.JobExecutionException: Job > execution failed. > at > org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) > at > org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) > ... 17 more > Caused by: java.lang.UnsatisfiedLinkError: Native Library > /.../org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib already loaded > in another classloader > at
[GitHub] KarmaGYZ commented on a change in pull request #7260: [hotfix][docs] Fix typo and make improvement in Kafka Connectors doc
KarmaGYZ commented on a change in pull request #7260: [hotfix][docs] Fix typo and make improvement in Kafka Connectors doc URL: https://github.com/apache/flink/pull/7260#discussion_r249611548 ## File path: docs/dev/connectors/kafka.md ## @@ -681,13 +681,13 @@ chosen by passing appropriate `semantic` parameter to the `FlinkKafkaProducer011 `Semantic.EXACTLY_ONCE` mode relies on the ability to commit transactions that were started before taking a checkpoint, after recovering from the said checkpoint. If the time -between Flink application crash and completed restart is larger then Kafka's transaction timeout +between Flink application crash and completed restart is larger than Kafka's transaction timeout there will be data loss (Kafka will automatically abort transactions that exceeded timeout time). Having this in mind, please configure your transaction timeout appropriately to your expected down times. Kafka brokers by default have `transaction.max.timeout.ms` set to 15 minutes. This property will -not allow to set transaction timeouts for the producers larger then it's value. +not allow to set transaction timeouts for the producers larger than it's value. `FlinkKafkaProducer011` by default sets the `transaction.timeout.ms` property in producer config to Review comment: It sounds more clear and fluent to me. Thanks! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stevenzwu commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
stevenzwu commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249606472 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/KeyedDeserializationSchema.java ## @@ -32,6 +34,49 @@ */ @PublicEvolving public interface KeyedDeserializationSchema extends Serializable, ResultTypeQueryable { + /** +* Kafka record to be deserialized. +* Record consists of key,value pair, topic name, partition offset, headers and a timestamp (if available) +*/ + interface Record { Review comment: @alexeyt820 as pointed out by @azagrebin, kafka 0.8 seems to have `ConsumerRecord` interface https://archive.apache.org/dist/kafka/0.8.2-beta/java-doc/org/apache/kafka/clients/consumer/ConsumerRecord.html As Flink is moving to the direction of just one modern kafka connector, I am also wondering if it is ok to drop kafka 0.8 connector? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (FLINK-11326) Using offsets to adjust windows to timezones UTC-8 throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/FLINK-11326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated FLINK-11326: --- Labels: pull-request-available (was: ) > Using offsets to adjust windows to timezones UTC-8 throws > IllegalArgumentException > -- > > Key: FLINK-11326 > URL: https://issues.apache.org/jira/browse/FLINK-11326 > Project: Flink > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.3, 1.7.1 >Reporter: TANG Wen-hui >Assignee: Kezhu Wang >Priority: Major > Labels: pull-request-available > > According to comments, we can use offset to adjust windows to timezones other > than UTC-0. For example, in China you would have to specify an offset of > {{Time.hours(-8)}}. > > {code:java} > /** > * Creates a new {@code TumblingEventTimeWindows} {@link WindowAssigner} that > assigns > * elements to time windows based on the element timestamp and offset. > * > * For example, if you want window a stream by hour,but window begins at > the 15th minutes > * of each hour, you can use {@code of(Time.hours(1),Time.minutes(15))},then > you will get > * time windows start at 0:15:00,1:15:00,2:15:00,etc. > * > * Rather than that,if you are living in somewhere which is not using > UTC±00:00 time, > * such as China which is using UTC+08:00,and you want a time window with > size of one day, > * and window begins at every 00:00:00 of local time,you may use {@code > of(Time.days(1),Time.hours(-8))}. > * The parameter of offset is {@code Time.hours(-8))} since UTC+08:00 is 8 > hours earlier than UTC time. > * > * @param size The size of the generated windows. > * @param offset The offset which window start would be shifted by. > * @return The time policy. > */ > public static TumblingEventTimeWindows of(Time size, Time offset) { > return new TumblingEventTimeWindows(size.toMilliseconds(), > offset.toMilliseconds()); > }{code} > > but when offset is smaller than zero, a IllegalArgumentException will be > throwed. > > {code:java} > protected TumblingEventTimeWindows(long size, long offset) { > if (offset < 0 || offset >= size) { > throw new IllegalArgumentException("TumblingEventTimeWindows parameters must > satisfy 0 <= offset < size"); > } > this.size = size; > this.offset = offset; > }{code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11402) User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders
Ufuk Celebi created FLINK-11402: --- Summary: User code can fail with an UnsatisfiedLinkError in the presence of multiple classloaders Key: FLINK-11402 URL: https://issues.apache.org/jira/browse/FLINK-11402 Project: Flink Issue Type: Bug Components: Distributed Coordination Affects Versions: 1.7.0 Reporter: Ufuk Celebi Attachments: hello-snappy-1.0-SNAPSHOT.jar, hello-snappy.tgz As reported on the user mailing list thread "[`env.java.opts` not persisting after job canceled or failed and then restarted|https://lists.apache.org/thread.html/37cc1b628e16ca6c0bacced5e825de8057f88a8d601b90a355b6a291@%3Cuser.flink.apache.org%3E];, there can be issues with using native libraries and user code class loading. h2. Steps to reproduce I was able to reproduce the issue reported on the mailing list using [snappy-java|https://github.com/xerial/snappy-java] in a user program. Running the attached user program works fine on initial submission, but results in a failure when re-executed. I'm using Flink 1.7.0 using a standalone cluster started via {{bin/start-cluster.sh}}. 0. Unpack attached Maven project and build using {{mvn clean package}} *or* directly use attached {{hello-snappy-1.0-SNAPSHOT.jar}} 1. Download [snappy-java-1.1.7.2.jar|http://central.maven.org/maven2/org/xerial/snappy/snappy-java/1.1.7.2/snappy-java-1.1.7.2.jar] and unpack libsnappyjava for your system: {code} jar tf snappy-java-1.1.7.2.jar | grep libsnappy ... org/xerial/snappy/native/Linux/x86_64/libsnappyjava.so ... org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib ... {code} 2. Configure system library path to {{libsnappyjava}} in {{flink-conf.yaml}} (path needs to be adjusted for your system): {code} env.java.opts: -Djava.library.path=/.../org/xerial/snappy/native/Mac/x86_64 {code} 3. Run attached {{hello-snappy-1.0-SNAPSHOT.jar}} {code} bin/flink run hello-snappy-1.0-SNAPSHOT.jar Starting execution of program Program execution finished Job with JobID ae815b918dd7bc64ac8959e4e224f2b4 has finished. Job Runtime: 359 ms {code} 4. Rerun attached {{hello-snappy-1.0-SNAPSHOT.jar}} {code} bin/flink run hello-snappy-1.0-SNAPSHOT.jar Starting execution of program The program finished with the following exception: org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 7d69baca58f33180cb9251449ddcd396) at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487) at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66) at com.github.uce.HelloSnappy.main(HelloSnappy.java:18) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421) at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427) at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813) at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287) at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213) at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050) at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126) at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30) at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126) Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146) at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265) ... 17 more Caused by: java.lang.UnsatisfiedLinkError: Native Library /.../org/xerial/snappy/native/Mac/x86_64/libsnappyjava.jnilib already loaded in another classloader at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1907) at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1861) at java.lang.Runtime.loadLibrary0(Runtime.java:870) at java.lang.System.loadLibrary(System.java:1122) at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:182) at org.xerial.snappy.SnappyLoader.loadSnappyApi(SnappyLoader.java:154) at org.xerial.snappy.Snappy.(Snappy.java:47) at
[GitHub] stevenzwu commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
stevenzwu commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249607011 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/KeyedDeserializationSchema.java ## @@ -32,6 +34,49 @@ */ @PublicEvolving public interface KeyedDeserializationSchema extends Serializable, ResultTypeQueryable { + /** +* Kafka record to be deserialized. +* Record consists of key,value pair, topic name, partition offset, headers and a timestamp (if available) +*/ + interface Record { Review comment: We can add a new deserialize method to `KeyedDeserializationSchema` interface with a default implementation that just forwards to the other deserialize method (mark as deprecated). {code} T deserialize(ConsumerRecord consumerRecord) throws IOException { return deserialize(messageKey, message, topic, partition, offset); } {code} This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] stevenzwu commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
stevenzwu commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249607011 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/util/serialization/KeyedDeserializationSchema.java ## @@ -32,6 +34,49 @@ */ @PublicEvolving public interface KeyedDeserializationSchema extends Serializable, ResultTypeQueryable { + /** +* Kafka record to be deserialized. +* Record consists of key,value pair, topic name, partition offset, headers and a timestamp (if available) +*/ + interface Record { Review comment: We can add a new deserialize method to `KeyedDeserializationSchema` interface with a default implementation that just forwards to the other deserialize method (mark as deprecated). ```java T deserialize(ConsumerRecord consumerRecord) throws IOException { return deserialize(messageKey, message, topic, partition, offset); } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Closed] (FLINK-11214) Support non mergable aggregates for group windows on batch table
[ https://issues.apache.org/jira/browse/FLINK-11214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dian Fu closed FLINK-11214. --- Resolution: Won't Do > Support non mergable aggregates for group windows on batch table > > > Key: FLINK-11214 > URL: https://issues.apache.org/jira/browse/FLINK-11214 > Project: Flink > Issue Type: New Feature > Components: Table API SQL >Reporter: Dian Fu >Assignee: Dian Fu >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, it does not support non-mergable aggregates for group window on > batch table. It would be nice to support it as many code paths(but not all) > have already considered it. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Clarkkkkk closed pull request #6628: [FLINK-10245] [Streaming Connector] Add Pojo, Tuple, Row and Scala Product DataStream Sink for HBase
Clark closed pull request #6628: [FLINK-10245] [Streaming Connector] Add Pojo, Tuple, Row and Scala Product DataStream Sink for HBase URL: https://github.com/apache/flink/pull/6628 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249637422 ## File path: flink-connectors/flink-connector-kafka-0.8/src/main/java/org/apache/flink/streaming/connectors/kafka/internals/SimpleConsumerThread.java ## @@ -370,8 +370,8 @@ else if (partitionsRemoved) { keyPayload.get(keyBytes); } - final T value = deserializer.deserialize(keyBytes, valueBytes, - currentPartition.getTopic(), currentPartition.getPartition(), offset); + final T value = deserializer.deserialize( Review comment: Thank you for suggestion, I will try This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (FLINK-9920) BucketingSinkFaultToleranceITCase fails on travis
[ https://issues.apache.org/jira/browse/FLINK-9920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16748419#comment-16748419 ] Yun Tang commented on FLINK-9920: - Another instance: [https://api.travis-ci.org/v3/job/481937341/log.txt] [~aljoscha] when you try to verify this case locally. Did you use hadoop-2.8.3 just like travis used, and the local OS is linux not mac-os? > BucketingSinkFaultToleranceITCase fails on travis > - > > Key: FLINK-9920 > URL: https://issues.apache.org/jira/browse/FLINK-9920 > Project: Flink > Issue Type: Bug > Components: filesystem-connector >Affects Versions: 1.6.0, 1.7.0 >Reporter: Chesnay Schepler >Assignee: Aljoscha Krettek >Priority: Critical > Labels: test-stability > Fix For: 1.8.0 > > > https://travis-ci.org/zentol/flink/jobs/407021898 > {code} > Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.082 sec > <<< FAILURE! - in > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSinkFaultToleranceITCase > runCheckpointedProgram(org.apache.flink.streaming.connectors.fs.bucketing.BucketingSinkFaultToleranceITCase) > Time elapsed: 5.696 sec <<< FAILURE! > java.lang.AssertionError: Read line does not match expected pattern. > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.flink.streaming.connectors.fs.bucketing.BucketingSinkFaultToleranceITCase.postSubmit(BucketingSinkFaultToleranceITCase.java:182) > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Issue Comment Deleted] (FLINK-11374) See more failover and can filter by time range
[ https://issues.apache.org/jira/browse/FLINK-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lining updated FLINK-11374: --- Comment: was deleted (was: agg by vertex !image-2019-01-22-11-42-33-808.png!) > See more failover and can filter by time range > -- > > Key: FLINK-11374 > URL: https://issues.apache.org/jira/browse/FLINK-11374 > Project: Flink > Issue Type: Improvement > Components: REST, Webfrontend >Reporter: lining >Assignee: lining >Priority: Major > Attachments: image-2019-01-22-11-40-53-135.png, > image-2019-01-22-11-42-33-808.png > > > Now failover just show limit size task failover latest time. If task has > failed many time, we can not see the earlier time failover. Can we add filter > by time to see failover which contains task attemp fail msg. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11405) rest api can see exception by start end time filter
lining created FLINK-11405: -- Summary: rest api can see exception by start end time filter Key: FLINK-11405 URL: https://issues.apache.org/jira/browse/FLINK-11405 Project: Flink Issue Type: Sub-task Reporter: lining Assignee: lining -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (FLINK-11404) web ui add see page and can filter by time
lining created FLINK-11404: -- Summary: web ui add see page and can filter by time Key: FLINK-11404 URL: https://issues.apache.org/jira/browse/FLINK-11404 Project: Flink Issue Type: Sub-task Reporter: lining Assignee: lining -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (FLINK-6101) GroupBy fields with arithmetic expression (include UDF) can not be selected
[ https://issues.apache.org/jira/browse/FLINK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lincoln.lee closed FLINK-6101. -- Resolution: Later > GroupBy fields with arithmetic expression (include UDF) can not be selected > --- > > Key: FLINK-6101 > URL: https://issues.apache.org/jira/browse/FLINK-6101 > Project: Flink > Issue Type: Bug > Components: Table API SQL >Reporter: lincoln.lee >Assignee: lincoln.lee >Priority: Minor > > currently the TableAPI do not support selecting GroupBy fields with > expression either using original field name or the expression > {code} > val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, > 'd, 'e) > .groupBy('e, 'b % 3) > .select('b, 'c.min, 'e, 'a.avg, 'd.count) > {code} > caused > {code} > org.apache.flink.table.api.ValidationException: Cannot resolve [b] given > input [e, ('b % 3), TMP_0, TMP_1, TMP_2]. > {code} > (BTW, this syntax is invalid in RDBMS which will indicate the selected column > is invalid in the select list because it is not contained in either an > aggregate function or the GROUP BY clause in SQL Server.) > and > {code} > val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, > 'd, 'e) > .groupBy('e, 'b % 3) > .select('b%3, 'c.min, 'e, 'a.avg, 'd.count) > {code} > will also cause > {code} > org.apache.flink.table.api.ValidationException: Cannot resolve [b] given > input [e, ('b % 3), TMP_0, TMP_1, TMP_2]. > {code} > and add an alias in groupBy clause "group(e, 'b%3 as 'b)" work without avail. > and apply an UDF doesn’t work either > {code} >table.groupBy('a, Mod('b, 3)).select('a, Mod('b, 3), 'c.count, 'c.count, > 'd.count, 'e.avg) > org.apache.flink.table.api.ValidationException: Cannot resolve [b] given > input [a, org.apache.flink.table.api.scala.batch.table.Mod$('b, 3), TMP_0, > TMP_1, TMP_2]. > {code} > the only way to get this work can be > {code} > val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, > 'd, 'e) > .select('a, 'b%3 as 'b, 'c, 'd, 'e) > .groupBy('e, 'b) > .select('b, 'c.min, 'e, 'a.avg, 'd.count) > {code} > One way to solve this is to add support alias in groupBy clause ( it seems a > bit odd against SQL though TableAPI has a different groupBy grammar), > and I prefer to support select original expressions and UDF in groupBy > clause(make consistent with SQL). > as thus: > {code} > // use expression > val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, > 'd, 'e) > .groupBy('e, 'b % 3) > .select('b % 3, 'c.min, 'e, 'a.avg, 'd.count) > // use UDF > val t = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c, > 'd, 'e) > .groupBy('e, Mod('b,3)) > .select(Mod('b,3), 'c.min, 'e, 'a.avg, 'd.count) > {code} > After had a look into the code, found there was a problem in the groupBy > implementation, validation hadn't considered the expressions in groupBy > clause. it should be noted that a table has been actually changed after > groupBy operation ( a new Table) and the groupBy keys replace the original > field reference in essence. > > What do you think? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249637189 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java ## @@ -82,93 +84,110 @@ */ @Internal public abstract class FlinkKafkaConsumerBase extends RichParallelSourceFunction implements - CheckpointListener, - ResultTypeQueryable, - CheckpointedFunction { + CheckpointListener, Review comment: Yes, sorry. Looks like I intentionally reformat it. Will revert. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers
alexeyt820 commented on a change in pull request #6615: [FLINK-8354] [flink-connectors] Add ability to access and provider Kafka headers URL: https://github.com/apache/flink/pull/6615#discussion_r249637189 ## File path: flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java ## @@ -82,93 +84,110 @@ */ @Internal public abstract class FlinkKafkaConsumerBase extends RichParallelSourceFunction implements - CheckpointListener, - ResultTypeQueryable, - CheckpointedFunction { + CheckpointListener, Review comment: Yes, sorry. Looks like I un-intentionally reformat it. Will revert. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Created] (FLINK-11403) Remove ResultPartitionConsumableNotifier from ResultPartition
zhijiang created FLINK-11403: Summary: Remove ResultPartitionConsumableNotifier from ResultPartition Key: FLINK-11403 URL: https://issues.apache.org/jira/browse/FLINK-11403 Project: Flink Issue Type: Sub-task Components: Network Reporter: zhijiang Assignee: zhijiang Fix For: 1.8.0 This is the precondition for introducing pluggable {{ShuffleService}} on TM side. In current process of creating {{ResultPartition}}, the {{ResultPartitionConsumableNotifier}} regarded as TM level component has to be passed into the constructor. In order to create {{ResultPartition}} easily from {{ShuffleService}}, the required information should be covered by {{ResultPartitionDeploymentDescriptor}} as much as possible, then we could remove this notifier from the constructor. And it is also reasonable for notifying consumable partition via {{TaskActions}} which is already covered in {{ResultPartition}}. -- This message was sent by Atlassian JIRA (v7.6.3#76005)