date:20220224

[jira] [Created] (FLINK-26367) Move sanity check in FlinkService#cancelJob to DefaultDeploymentValidator

2022-02-24 Thread Yang Wang (Jira)

Yang Wang created FLINK-26367:
-

 Summary: Move sanity check in FlinkService#cancelJob to 
DefaultDeploymentValidator
 Key: FLINK-26367
 URL: https://issues.apache.org/jira/browse/FLINK-26367
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Yang Wang


FLINK-26136 has introduced a unified validator, we should move the 
{{FlinkService#cancelJob}} to {{{}DefaultDeploymentValidator{}}}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26334) when the (timestamp - offset + windowSize) is less than 0 the calculation result of TimeWindow.getWindowSTartWithOffset is incorrect

2022-02-24 Thread realdengziqi (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

realdengziqi updated FLINK-26334:
-
Priority: Blocker  (was: Major)

> when the (timestamp - offset + windowSize) is less than 0 the calculation 
> result of TimeWindow.getWindowSTartWithOffset is incorrect
> 
>
> Key: FLINK-26334
> URL: https://issues.apache.org/jira/browse/FLINK-26334
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.0, 1.14.3
> Environment: flink version 1.14.3
>Reporter: realdengziqi
>Priority: Blocker
>   Original Estimate: 16h
>  Remaining Estimate: 16h
>
>  
> source code
> {code:java}
> //Method to get the window start for a timestamp.
> //Params:
> //timestamp – epoch millisecond to get the window start.
> //offset – The offset which window start would be shifted by.
> //windowSize – The size of the generated windows.
> //Returns:
> //window start
> public static long getWindowStartWithOffset(long timestamp, long offset, long 
> windowSize) {
> return timestamp - (timestamp - offset + windowSize) % windowSize;
> } {code}
> If windowSize is 6 seconds, an element with a timestamp of -7000L should be 
> assigned to a window with a start time of -12000L. But this code will assign 
> it to the window whose start time is -6000L.
> According to the current calculation method, when the timestamp is (timestamp 
> - offset + windowSize)  is less than 0, the start time of the calculated time 
> window will be offset by one windowsSide unit in the direction of 0.
> I had a discussion with a friend and thought it was because the current 
> calculation logic is rounding towards 0. We should make it round to -∞.
> Do you think this is a bug. We would like to submit a pull request on github 
> to fix it.
> Below is a sample program for a scrolling window.
> {code:java}
> public class Test01 {
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> env
> .fromElements(
> Tuple2.of("a",-7L*1000L),// start time should be 
> -12s
> Tuple2.of("b",-1L*1000L),
> Tuple2.of("c",1L*1000L),
> Tuple2.of("d",7L*1000L)
> )
> .assignTimestampsAndWatermarks(
> 
> WatermarkStrategy.>forMonotonousTimestamps()
> .withTimestampAssigner(
> new 
> SerializableTimestampAssigner>() {
> @Override
> public long 
> extractTimestamp(Tuple2 element, long recordTimestamp) {
> return element.f1;
> }
> }
> )
> )
> .keyBy(r->1)
> .window(TumblingEventTimeWindows.of(Time.seconds(6)))
> .process(
> new ProcessWindowFunction, 
> String, Integer, TimeWindow>() {
> @Override
> public void process(Integer integer, 
> ProcessWindowFunction, String, Integer, 
> TimeWindow>.Context context, Iterable> elements, 
> Collector out) throws Exception {
> for (Tuple2 element : elements) 
> {
> JSONObject item = new JSONObject();
> item.put("data",element.toString());
> item.put("windowStartTime",new 
> Timestamp(context.window().getStart()).toString() );
> item.put("windowEndTime",new 
> Timestamp(context.window().getEnd()).toString() );
> out.collect(item.toJSONString());
> }
> }
> }
> )
> .print();
> env.execute();
> }
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1032265310


   
   ## CI report:
   
   * ce34627cabe844f0af69ddca6f1b5599e5bf1ef8 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32197)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Closed] (FLINK-26141) Support last-state upgrade mode

2022-02-24 Thread Yang Wang (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Wang closed FLINK-26141.
-
Resolution: Fixed

Fixed via:

main: baaed88b21a27df0eb26a5bdce4516ec3c7510c1

> Support last-state upgrade mode
> ---
>
> Key: FLINK-26141
> URL: https://issues.apache.org/jira/browse/FLINK-26141
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Yang Wang
>Priority: Major
>  Labels: pull-request-available
>
> The operator currently only implements savepoint and stateless upgrade 
> strategies for Flink jobs.
> We should investigate if we can provide last-state upgrade strategy that 
> would use the latest available checkpoint the make job upgrades instead of 
> savepoints.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink-ml] lindong28 closed pull request #66: [FLINK-26263] (followup) Check data size in LogisticRegression

2022-02-24 Thread GitBox



lindong28 closed pull request #66:
URL: https://github.com/apache/flink-ml/pull/66


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink-ml] lindong28 commented on pull request #66: [FLINK-26263] (followup) Check data size in LogisticRegression

2022-02-24 Thread GitBox



lindong28 commented on pull request #66:
URL: https://github.com/apache/flink-ml/pull/66#issuecomment-1050616922


   Thanks for the update! LGTM.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #18917: error

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18917:
URL: https://github.com/apache/flink/pull/18917#issuecomment-1050613556


   
   ## CI report:
   
   * 7bd2b6a4ebda7ea5b727984a4b4877c0dbf8432b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32201)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #18917: error

2022-02-24 Thread GitBox



flinkbot commented on pull request #18917:
URL: https://github.com/apache/flink/pull/18917#issuecomment-1050613861


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit 7bd2b6a4ebda7ea5b727984a4b4877c0dbf8432b (Fri Feb 25 
07:47:43 UTC 2022)
   
   **Warnings:**
* Documentation files were touched, but no `docs/content.zh/` files: Update 
Chinese documentation or file Jira ticket.
* **Invalid pull request title: No valid Jira ID provided**
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer of PMC member is required Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #18917: error

2022-02-24 Thread GitBox



flinkbot commented on pull request #18917:
URL: https://github.com/apache/flink/pull/18917#issuecomment-1050613556


   
   ## CI report:
   
   * 7bd2b6a4ebda7ea5b727984a4b4877c0dbf8432b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] zhaishujie opened a new pull request #18917: error

2022-02-24 Thread GitBox



zhaishujie opened a new pull request #18917:
URL: https://github.com/apache/flink/pull/18917


   
   
   ## What is the purpose of the change
   
   *(For example: This pull request makes task deployment go through the blob 
server, rather than through RPC. That way we avoid re-transferring them on each 
deployment (during recovery).)*
   
   
   ## Brief change log
   
   *(for example:)*
 - *The TaskInfo is stored in the blob store on job creation time as a 
persistent artifact*
 - *Deployments RPC transmits only the blob storage reference*
 - *TaskManagers retrieve the TaskInfo from the blob cache*
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
 - *Added integration tests for end-to-end deployment with large payloads 
(100MB)*
 - *Extended integration test for recovery after master (JobManager) 
failure*
 - *Added test that validates that TaskInfo is transferred only once across 
recoveries*
 - *Manually verified the change by running a 4 node cluser with 2 
JobManagers and 4 TaskManagers, a stateful streaming program, and killing one 
JobManager and two TaskManagers during the execution, verifying that recovery 
happens correctly.*
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (yes / no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / no)
 - The serializers: (yes / no / don't know)
 - The runtime per-record code paths (performance sensitive): (yes / no / 
don't know)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
 - The S3 file system connector: (yes / no / don't know)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (yes / no)
 - If yes, how is the feature documented? (not applicable / docs / JavaDocs 
/ not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink-ml] lindong28 commented on pull request #66: [FLINK-26263] (followup) Check data size in LogisticRegression

2022-02-24 Thread GitBox



lindong28 commented on pull request #66:
URL: https://github.com/apache/flink-ml/pull/66#issuecomment-1050611807


   @zhipeng93 It is generally preferred to have a test to reproduce this bug 
with high likelihood, or be able to reproduce this bug manually. This is 
generally done on a best-effort basis. The goal is to increase the confidence 
that we indeed fixed a bug with this PR.
   
   If the flaky test is really hard to reproduce and there is no known way to 
cover the bug with a new test, we can still choose to merge the PR based on our 
understanding. This is just less than ideal.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-26354) "-restoreMode" should be "--restoreMode" and should have a shorthand

2022-02-24 Thread Konstantin Knauf (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497959#comment-17497959
 ] 

Konstantin Knauf commented on FLINK-26354:
--

>From my perspective consistency with existing options trumps the other 
>arguments here. So far, we (almost) always have a short option and a long 
>option. So that's what we should do here and in 
>https://issues.apache.org/jira/browse/FLINK-26353 as well.

> "-restoreMode" should be "--restoreMode" and should have a shorthand
> 
>
> Key: FLINK-26354
> URL: https://issues.apache.org/jira/browse/FLINK-26354
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.15.0
>Reporter: Konstantin Knauf
>Priority: Minor
>
> {code:java}
> -restoreMode  Defines how should we restore
> from the given savepoint.
> Supported options: [claim -
> claim ownership of the 
> savepoint
> and delete once it is 
> subsumed,
> no_claim (default) - do not
> claim ownership, the first
> checkpoint will not reuse any
> files from the restored one,
> legacy - the old behaviour, do
> not assume ownership of the
> savepoint files, but can reuse
> some shared files.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Comment Edited] (FLINK-26306) Triggered checkpoints can be delayed by discarding shared state

2022-02-24 Thread Yuan Mei (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497408#comment-17497408
 ] 

Yuan Mei edited comment on FLINK-26306 at 2/25/22, 7:39 AM:


# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, but can not prevent the problem completely, because 
neither can not guarantee that "checkpoint triggering speed < state deletion 
speed".
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.


was (Author: ym):
# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, but can not prevent the problem completely, because 
neither can guarantee that "checkpoint triggering speed < state deletion speed".
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.

> Triggered checkpoints can be delayed by discarding shared state
> ---
>
> Key: FLINK-26306
> URL: https://issues.apache.org/jira/browse/FLINK-26306
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.16.0
>
>
> Quick note: CheckpointCleaner is not involved here.
> When a checkpoint is subsumed, SharedStateRegistry schedules its unused 
> shared state for async deletion. It uses common IO pool for this and adds a 
> Runnable per state handle. ( see SharedStateRegistryImpl.scheduleAsyncDelete)
> When a checkpoint is started, CheckpointCoordinator uses the same thread pool 
> to initialize the location for it. (see 
> CheckpointCoordinator.initializeCheckpoint)
> The thread pool is of fixed size 
> [jobmanager.io-pool.size|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-io-pool-size];
>  by default it's the number of CPU cores) and uses FIFO queue for tasks.
> When there is a spike in state deletion, the next checkpoint is delayed 
> waiting for an available IO thread.
> Back-pressure seems reasonable here (similar to CheckpointCleaner); however, 
> this shared state deletion could be spread across multiple subsequent 
> checkpoints, not neccesarily the next one.
>  
> I believe the issue is an pre-existing one; but it particularly affects 
> changelog state backend, because 1) such spikes are likely there; 2) 
> workloads are latency sensitive.
> In the tests, checkpoint duration grows from seconds to minutes immediately 
> after the materialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18915: [FLINK-26351][k8s/web ui] After scaling a flink task running on k8s, …

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18915:
URL: https://github.com/apache/flink/pull/18915#issuecomment-1050485121


   
   ## CI report:
   
   * acc2bffef19cb00ce33dac3330cd0e1b7d692835 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32198)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Comment Edited] (FLINK-26306) Triggered checkpoints can be delayed by discarding shared state

2022-02-24 Thread Yuan Mei (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497408#comment-17497408
 ] 

Yuan Mei edited comment on FLINK-26306 at 2/25/22, 7:32 AM:


# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, but can not prevent the problem completely, because 
neither can guarantee that "checkpoint triggering speed < state deletion speed".
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.


was (Author: ym):
# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, but can not prevent the problem completely, because 
neither can guarantee that "checkpoint triggering speed < state deletion speed".
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.

> Triggered checkpoints can be delayed by discarding shared state
> ---
>
> Key: FLINK-26306
> URL: https://issues.apache.org/jira/browse/FLINK-26306
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.16.0
>
>
> Quick note: CheckpointCleaner is not involved here.
> When a checkpoint is subsumed, SharedStateRegistry schedules its unused 
> shared state for async deletion. It uses common IO pool for this and adds a 
> Runnable per state handle. ( see SharedStateRegistryImpl.scheduleAsyncDelete)
> When a checkpoint is started, CheckpointCoordinator uses the same thread pool 
> to initialize the location for it. (see 
> CheckpointCoordinator.initializeCheckpoint)
> The thread pool is of fixed size 
> [jobmanager.io-pool.size|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-io-pool-size];
>  by default it's the number of CPU cores) and uses FIFO queue for tasks.
> When there is a spike in state deletion, the next checkpoint is delayed 
> waiting for an available IO thread.
> Back-pressure seems reasonable here (similar to CheckpointCleaner); however, 
> this shared state deletion could be spread across multiple subsequent 
> checkpoints, not neccesarily the next one.
>  
> I believe the issue is an pre-existing one; but it particularly affects 
> changelog state backend, because 1) such spikes are likely there; 2) 
> workloads are latency sensitive.
> In the tests, checkpoint duration grows from seconds to minutes immediately 
> after the materialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Comment Edited] (FLINK-26306) Triggered checkpoints can be delayed by discarding shared state

2022-02-24 Thread Yuan Mei (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497408#comment-17497408
 ] 

Yuan Mei edited comment on FLINK-26306 at 2/25/22, 7:32 AM:


# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, but can not prevent the problem completely, because 
neither can guarantee that "checkpoint triggering speed < state deletion speed".
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.


was (Author: ym):
# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, but can not prevent the problem completely.
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.

> Triggered checkpoints can be delayed by discarding shared state
> ---
>
> Key: FLINK-26306
> URL: https://issues.apache.org/jira/browse/FLINK-26306
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.16.0
>
>
> Quick note: CheckpointCleaner is not involved here.
> When a checkpoint is subsumed, SharedStateRegistry schedules its unused 
> shared state for async deletion. It uses common IO pool for this and adds a 
> Runnable per state handle. ( see SharedStateRegistryImpl.scheduleAsyncDelete)
> When a checkpoint is started, CheckpointCoordinator uses the same thread pool 
> to initialize the location for it. (see 
> CheckpointCoordinator.initializeCheckpoint)
> The thread pool is of fixed size 
> [jobmanager.io-pool.size|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-io-pool-size];
>  by default it's the number of CPU cores) and uses FIFO queue for tasks.
> When there is a spike in state deletion, the next checkpoint is delayed 
> waiting for an available IO thread.
> Back-pressure seems reasonable here (similar to CheckpointCleaner); however, 
> this shared state deletion could be spread across multiple subsequent 
> checkpoints, not neccesarily the next one.
>  
> I believe the issue is an pre-existing one; but it particularly affects 
> changelog state backend, because 1) such spikes are likely there; 2) 
> workloads are latency sensitive.
> In the tests, checkpoint duration grows from seconds to minutes immediately 
> after the materialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Closed] (FLINK-26136) Implement shared validation logic for FlinkDeployment objects

2022-02-24 Thread Gyula Fora (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gyula Fora closed FLINK-26136.
--
Resolution: Fixed

Merged: 85d997eaa0310a298bb3a3a704afacf3844635f7

> Implement shared validation logic for FlinkDeployment objects
> -
>
> Key: FLINK-26136
> URL: https://issues.apache.org/jira/browse/FLINK-26136
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Major
>  Labels: pull-request-available
>
> At the moment there is only a very basic “placeholder” validation logic 
> implemented in the webhook module: 
> org.apache.flink.kubernetes.operator.admission.FlinkDeploymentValidator
> We should aim to validate parts of the FlinkDeployment that can be done 
> upfront, things like most common Flink config options, parallelism, resources 
> etc.
> As described in https://issues.apache.org/jira/browse/FLINK-26135 this 
> validation should be part of the flink-kubernetes-operator module.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Comment Edited] (FLINK-26306) Triggered checkpoints can be delayed by discarding shared state

2022-02-24 Thread Yuan Mei (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497947#comment-17497947
 ] 

Yuan Mei edited comment on FLINK-26306 at 2/25/22, 7:25 AM:


The severity of this problem also depends on the test setup:
 * how often checkpoint is triggered
 * how big the state is
 * materialization interval (maybe)

If as the number/example shared above, triggering cp each 1s, I do agree that 
severity is an improvement/major.


was (Author: ym):
>From the number shared above, I do agree that this is an improvement/major.

> Triggered checkpoints can be delayed by discarding shared state
> ---
>
> Key: FLINK-26306
> URL: https://issues.apache.org/jira/browse/FLINK-26306
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.16.0
>
>
> Quick note: CheckpointCleaner is not involved here.
> When a checkpoint is subsumed, SharedStateRegistry schedules its unused 
> shared state for async deletion. It uses common IO pool for this and adds a 
> Runnable per state handle. ( see SharedStateRegistryImpl.scheduleAsyncDelete)
> When a checkpoint is started, CheckpointCoordinator uses the same thread pool 
> to initialize the location for it. (see 
> CheckpointCoordinator.initializeCheckpoint)
> The thread pool is of fixed size 
> [jobmanager.io-pool.size|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-io-pool-size];
>  by default it's the number of CPU cores) and uses FIFO queue for tasks.
> When there is a spike in state deletion, the next checkpoint is delayed 
> waiting for an available IO thread.
> Back-pressure seems reasonable here (similar to CheckpointCleaner); however, 
> this shared state deletion could be spread across multiple subsequent 
> checkpoints, not neccesarily the next one.
>  
> I believe the issue is an pre-existing one; but it particularly affects 
> changelog state backend, because 1) such spikes are likely there; 2) 
> workloads are latency sensitive.
> In the tests, checkpoint duration grows from seconds to minutes immediately 
> after the materialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26306) Triggered checkpoints can be delayed by discarding shared state

2022-02-24 Thread Yuan Mei (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497947#comment-17497947
 ] 

Yuan Mei commented on FLINK-26306:
--

>From the number shared, I do agree that this is an improvement/major.

> Triggered checkpoints can be delayed by discarding shared state
> ---
>
> Key: FLINK-26306
> URL: https://issues.apache.org/jira/browse/FLINK-26306
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.16.0
>
>
> Quick note: CheckpointCleaner is not involved here.
> When a checkpoint is subsumed, SharedStateRegistry schedules its unused 
> shared state for async deletion. It uses common IO pool for this and adds a 
> Runnable per state handle. ( see SharedStateRegistryImpl.scheduleAsyncDelete)
> When a checkpoint is started, CheckpointCoordinator uses the same thread pool 
> to initialize the location for it. (see 
> CheckpointCoordinator.initializeCheckpoint)
> The thread pool is of fixed size 
> [jobmanager.io-pool.size|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-io-pool-size];
>  by default it's the number of CPU cores) and uses FIFO queue for tasks.
> When there is a spike in state deletion, the next checkpoint is delayed 
> waiting for an available IO thread.
> Back-pressure seems reasonable here (similar to CheckpointCleaner); however, 
> this shared state deletion could be spread across multiple subsequent 
> checkpoints, not neccesarily the next one.
>  
> I believe the issue is an pre-existing one; but it particularly affects 
> changelog state backend, because 1) such spikes are likely there; 2) 
> workloads are latency sensitive.
> In the tests, checkpoint duration grows from seconds to minutes immediately 
> after the materialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Comment Edited] (FLINK-26306) Triggered checkpoints can be delayed by discarding shared state

2022-02-24 Thread Yuan Mei (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497947#comment-17497947
 ] 

Yuan Mei edited comment on FLINK-26306 at 2/25/22, 7:12 AM:


>From the number shared above, I do agree that this is an improvement/major.


was (Author: ym):
>From the number shared, I do agree that this is an improvement/major.

> Triggered checkpoints can be delayed by discarding shared state
> ---
>
> Key: FLINK-26306
> URL: https://issues.apache.org/jira/browse/FLINK-26306
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.16.0
>
>
> Quick note: CheckpointCleaner is not involved here.
> When a checkpoint is subsumed, SharedStateRegistry schedules its unused 
> shared state for async deletion. It uses common IO pool for this and adds a 
> Runnable per state handle. ( see SharedStateRegistryImpl.scheduleAsyncDelete)
> When a checkpoint is started, CheckpointCoordinator uses the same thread pool 
> to initialize the location for it. (see 
> CheckpointCoordinator.initializeCheckpoint)
> The thread pool is of fixed size 
> [jobmanager.io-pool.size|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-io-pool-size];
>  by default it's the number of CPU cores) and uses FIFO queue for tasks.
> When there is a spike in state deletion, the next checkpoint is delayed 
> waiting for an available IO thread.
> Back-pressure seems reasonable here (similar to CheckpointCleaner); however, 
> this shared state deletion could be spread across multiple subsequent 
> checkpoints, not neccesarily the next one.
>  
> I believe the issue is an pre-existing one; but it particularly affects 
> changelog state backend, because 1) such spikes are likely there; 2) 
> workloads are latency sensitive.
> In the tests, checkpoint duration grows from seconds to minutes immediately 
> after the materialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] wangyang0918 commented on a change in pull request #18739: [FLINK-26030][yarn] Set FLINK_LIB_DIR to 'lib' under working dir in YARN containers

2022-02-24 Thread GitBox



wangyang0918 commented on a change in pull request #18739:
URL: https://github.com/apache/flink/pull/18739#discussion_r814522068



##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1829,4 +1809,41 @@ public static void logDetachedClusterInformation(
 yarnApplicationId,
 yarnApplicationId);
 }
+
+@VisibleForTesting
+Map generateApplicationMasterEnv(
+final YarnApplicationFileUploader fileUploader,
+final String classPathStr,
+final String localFlinkJarStr,
+final String appIdStr)
+throws IOException {
+final Map env = new HashMap<>();
+// set YARN classpath
+env.put(ENV_FLINK_CLASSPATH, classPathStr);
+Utils.setupYarnClassPath(this.yarnConfiguration, env);
+// 
https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md#identity-on-an-insecure-cluster-hadoop_user_name
+env.put(
+YarnConfigKeys.ENV_HADOOP_USER_NAME,
+UserGroupInformation.getCurrentUser().getUserName());
+// set user specified app master environment variables
+env.putAll(
+ConfigurationUtils.getPrefixedKeyValuePairs(
+ResourceManagerOptions.CONTAINERIZED_MASTER_ENV_PREFIX,
+this.flinkConfiguration));
+// Set FLINK_LIB_DIR to `lib` folder under working dir in container
+env.put(ENV_FLINK_LIB_DIR, Path.CUR_DIR + "/" + 
ConfigConstants.DEFAULT_FLINK_LIB_DIR);

Review comment:
   We might need to add a test to guard the `ENV_FLINK_LIB_DIR` will not be 
overridden by user specified environments.

##
File path: 
flink-yarn/src/main/java/org/apache/flink/yarn/YarnClusterDescriptor.java
##
@@ -1829,4 +1809,41 @@ public static void logDetachedClusterInformation(
 yarnApplicationId,
 yarnApplicationId);
 }
+
+@VisibleForTesting
+Map generateApplicationMasterEnv(
+final YarnApplicationFileUploader fileUploader,
+final String classPathStr,
+final String localFlinkJarStr,
+final String appIdStr)
+throws IOException {
+final Map env = new HashMap<>();
+// set YARN classpath
+env.put(ENV_FLINK_CLASSPATH, classPathStr);

Review comment:
   nit: why do we change the put operation orders as before?

##
File path: 
flink-yarn/src/test/java/org/apache/flink/yarn/YarnClusterDescriptorTest.java
##
@@ -830,4 +832,42 @@ private YarnClusterDescriptor 
createYarnClusterDescriptor(Configuration configur
 .setYarnClusterInformationRetriever(() -> YARN_MAX_VCORES)
 .build();
 }
+
+@Test
+public void testGenerateApplicationMasterEnv() throws IOException {
+final Configuration flinkConfig = new Configuration();
+final File flinkHomeDir = temporaryFolder.newFolder();
+final String fakeLocalFlinkJar = "./lib/flink_dist.jar";
+final String fakeClassPath = fakeLocalFlinkJar + ":./usrlib/user.jar";
+final ApplicationId appId = ApplicationId.newInstance(0, 0);
+try (final YarnClusterDescriptor yarnClusterDescriptor =
+createYarnClusterDescriptor(flinkConfig)) {
+final YarnApplicationFileUploader yarnApplicationFileUploader =
+YarnApplicationFileUploader.from(
+FileSystem.get(new YarnConfiguration()),
+new Path(flinkHomeDir.getPath()),
+new ArrayList<>(),
+appId,
+DFSConfigKeys.DFS_REPLICATION_DEFAULT);
+final Map masterEnv =
+yarnClusterDescriptor.generateApplicationMasterEnv(
+yarnApplicationFileUploader,
+fakeClassPath,
+fakeLocalFlinkJar,
+appId.toString());
+
+Assert.assertEquals("./lib", 
masterEnv.get(ConfigConstants.ENV_FLINK_LIB_DIR));

Review comment:
   ```suggestion
   assertEquals("./lib", 
masterEnv.get(ConfigConstants.ENV_FLINK_LIB_DIR));
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] 1996fanrui commented on pull request #18852: [FLINK-26049][checkpoint] Improve logic after checkpoint trigger fails

2022-02-24 Thread GitBox



1996fanrui commented on pull request #18852:
URL: https://github.com/apache/flink/pull/18852#issuecomment-1050574741


   > @1996fanrui , thanks a lot for these changes. I left a several comments in 
the PR, please, take a look. I also have an one question: According to ticket, 
you had a problem with `flink won't execute the tolerable-failed-checkpoints 
logic. ` but as I see you didn't make any fix for that. Do I understand 
correctly, that you had such a problem because you have the old version of 
Flink but right now(with the master) it is not a problem anymore?
   
   Hi @akalash , thanks for your review. 
   
   Yeah, you are right. I have replied in JIRA that `our prod env use Flink 
1.13. I see some jiras have resolved this issue.`, so I just fixed another 
problems.
   
   https://issues.apache.org/jira/browse/FLINK-23189 and 
https://issues.apache.org/jira/browse/FLINK-24344
   
I will address your comments and resubmit this PR as soon as possible. 
Thanks a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #18641: [FLINK-25761] [docs] Translate Avro format page into Chinese.

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18641:
URL: https://github.com/apache/flink/pull/18641#issuecomment-1031234534


   
   ## CI report:
   
   * 684cc9586eed1e214683ea3d881e30bbc1065670 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32196)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Assigned] (FLINK-18229) Pending worker requests should be properly cleared

2022-02-24 Thread Xintong Song (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xintong Song reassigned FLINK-18229:


Assignee: huweihua

> Pending worker requests should be properly cleared
> --
>
> Key: FLINK-18229
> URL: https://issues.apache.org/jira/browse/FLINK-18229
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
> Coordination
>Affects Versions: 1.9.3, 1.10.1, 1.11.0
>Reporter: Xintong Song
>Assignee: huweihua
>Priority: Major
> Fix For: 1.15.0
>
>
> Currently, if Kubernetes/Yarn does not have enough resources to fulfill 
> Flink's resource requirement, there will be pending pod/container requests on 
> Kubernetes/Yarn. These pending resource requirements are never cleared until 
> either fulfilled or the Flink cluster is shutdown.
> However, sometimes Flink no longer needs the pending resources. E.g., the 
> slot request is then fulfilled by another slots that become available, or the 
> job failed due to slot request timeout (in a session cluster). In such cases, 
> Flink does not remove the resource request until the resource is allocated, 
> then it discovers that it no longer needs the allocated resource and release 
> them. This would affect the underlying Kubernetes/Yarn cluster, especially 
> when the cluster is under heavy workload.
> It would be good for Flink to cancel pod/container requests as earlier as 
> possible if it can discover that some of the pending workers are no longer 
> needed.
> There are several approaches potentially achieve this.
>  # We can always check whether there's a pending worker that can be canceled 
> when a \{{PendingTaskManagerSlot}} is unassigned.
>  # We can have a separate timeout for requesting new worker. If the resource 
> cannot be allocated within the given time since requested, we should cancel 
> that resource request and claim a resource allocation failure.
>  # We can share the same timeout for starting new worker (proposed in 
> FLINK-13554). This is similar to 2), but it requires the worker to be 
> registered, rather than allocated, before timeout.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-18229) Pending worker requests should be properly cleared

2022-02-24 Thread Xintong Song (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497938#comment-17497938
 ] 

Xintong Song commented on FLINK-18229:
--

Thanks for volunteering, [~huwh]. You are assigned.

> Pending worker requests should be properly cleared
> --
>
> Key: FLINK-18229
> URL: https://issues.apache.org/jira/browse/FLINK-18229
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
> Coordination
>Affects Versions: 1.9.3, 1.10.1, 1.11.0
>Reporter: Xintong Song
>Assignee: huweihua
>Priority: Major
> Fix For: 1.15.0
>
>
> Currently, if Kubernetes/Yarn does not have enough resources to fulfill 
> Flink's resource requirement, there will be pending pod/container requests on 
> Kubernetes/Yarn. These pending resource requirements are never cleared until 
> either fulfilled or the Flink cluster is shutdown.
> However, sometimes Flink no longer needs the pending resources. E.g., the 
> slot request is then fulfilled by another slots that become available, or the 
> job failed due to slot request timeout (in a session cluster). In such cases, 
> Flink does not remove the resource request until the resource is allocated, 
> then it discovers that it no longer needs the allocated resource and release 
> them. This would affect the underlying Kubernetes/Yarn cluster, especially 
> when the cluster is under heavy workload.
> It would be good for Flink to cancel pod/container requests as earlier as 
> possible if it can discover that some of the pending workers are no longer 
> needed.
> There are several approaches potentially achieve this.
>  # We can always check whether there's a pending worker that can be canceled 
> when a \{{PendingTaskManagerSlot}} is unassigned.
>  # We can have a separate timeout for requesting new worker. If the resource 
> cannot be allocated within the given time since requested, we should cancel 
> that resource request and claim a resource allocation failure.
>  # We can share the same timeout for starting new worker (proposed in 
> FLINK-13554). This is similar to 2), but it requires the worker to be 
> registered, rather than allocated, before timeout.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26356) Revisit the create of RestClusterClient

2022-02-24 Thread Yang Wang (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497937#comment-17497937
 ] 

Yang Wang commented on FLINK-26356:
---

I think we could always use the {{-rest.}} to access the 
Flink rest server no matter the HA is enabled or not. Right?

> Revisit the create of RestClusterClient
> ---
>
> Key: FLINK-26356
> URL: https://issues.apache.org/jira/browse/FLINK-26356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Aitozi
>Priority: Major
>
> The clusterClient is built as below. The config is mixed up with the 
> FlinkDeploymentSpec and local default config. 
> {code:java}
> final int port = config.getInteger(RestOptions.PORT);
> final String host =
> config.getString(
> RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, 
> namespace));
> final String restServerAddress = String.format("http://%s:%s;, host, port); 
> {code}
> But the {{RestOptions.ADDRESS}} is generated at the entrypoint when the HA is 
> enabled, so the option can not obtain from the FlinkDeploymentSpec.
> Furthermore, the default rest url is not suitable for all the service type. I 
> think we should extract the rest endpoint from the Flink external service.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26334) when the (timestamp - offset + windowSize) is less than 0 the calculation result of TimeWindow.getWindowSTartWithOffset is incorrect

2022-02-24 Thread realdengziqi (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

realdengziqi updated FLINK-26334:
-
Affects Version/s: 1.15.0

> when the (timestamp - offset + windowSize) is less than 0 the calculation 
> result of TimeWindow.getWindowSTartWithOffset is incorrect
> 
>
> Key: FLINK-26334
> URL: https://issues.apache.org/jira/browse/FLINK-26334
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.15.0, 1.14.3
> Environment: flink version 1.14.3
>Reporter: realdengziqi
>Priority: Major
>   Original Estimate: 16h
>  Remaining Estimate: 16h
>
>  
> source code
> {code:java}
> //Method to get the window start for a timestamp.
> //Params:
> //timestamp – epoch millisecond to get the window start.
> //offset – The offset which window start would be shifted by.
> //windowSize – The size of the generated windows.
> //Returns:
> //window start
> public static long getWindowStartWithOffset(long timestamp, long offset, long 
> windowSize) {
> return timestamp - (timestamp - offset + windowSize) % windowSize;
> } {code}
> If windowSize is 6 seconds, an element with a timestamp of -7000L should be 
> assigned to a window with a start time of -12000L. But this code will assign 
> it to the window whose start time is -6000L.
> According to the current calculation method, when the timestamp is (timestamp 
> - offset + windowSize)  is less than 0, the start time of the calculated time 
> window will be offset by one windowsSide unit in the direction of 0.
> I had a discussion with a friend and thought it was because the current 
> calculation logic is rounding towards 0. We should make it round to -∞.
> Do you think this is a bug. We would like to submit a pull request on github 
> to fix it.
> Below is a sample program for a scrolling window.
> {code:java}
> public class Test01 {
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> env
> .fromElements(
> Tuple2.of("a",-7L*1000L),// start time should be 
> -12s
> Tuple2.of("b",-1L*1000L),
> Tuple2.of("c",1L*1000L),
> Tuple2.of("d",7L*1000L)
> )
> .assignTimestampsAndWatermarks(
> 
> WatermarkStrategy.>forMonotonousTimestamps()
> .withTimestampAssigner(
> new 
> SerializableTimestampAssigner>() {
> @Override
> public long 
> extractTimestamp(Tuple2 element, long recordTimestamp) {
> return element.f1;
> }
> }
> )
> )
> .keyBy(r->1)
> .window(TumblingEventTimeWindows.of(Time.seconds(6)))
> .process(
> new ProcessWindowFunction, 
> String, Integer, TimeWindow>() {
> @Override
> public void process(Integer integer, 
> ProcessWindowFunction, String, Integer, 
> TimeWindow>.Context context, Iterable> elements, 
> Collector out) throws Exception {
> for (Tuple2 element : elements) 
> {
> JSONObject item = new JSONObject();
> item.put("data",element.toString());
> item.put("windowStartTime",new 
> Timestamp(context.window().getStart()).toString() );
> item.put("windowEndTime",new 
> Timestamp(context.window().getEnd()).toString() );
> out.collect(item.toJSONString());
> }
> }
> }
> )
> .print();
> env.execute();
> }
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-25963) FLIP-212: Introduce Flink Kubernetes Operator

2022-02-24 Thread Yang Wang (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-25963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497933#comment-17497933
 ] 

Yang Wang commented on FLINK-25963:
---

FYI: I have enabled the auto link reference for flink-kubernetes-operator repo.

 

https://issues.apache.org/jira/browse/INFRA-22916

> FLIP-212: Introduce Flink Kubernetes Operator
> -
>
> Key: FLINK-25963
> URL: https://issues.apache.org/jira/browse/FLINK-25963
> Project: Flink
>  Issue Type: Improvement
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Assignee: Thomas Weise
>Priority: Major
>
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-212%3A+Introduce+Flink+Kubernetes+Operator]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-18229) Pending worker requests should be properly cleared

2022-02-24 Thread huweihua (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497929#comment-17497929
 ] 

huweihua commented on FLINK-18229:
--

Hi [~xtsong] , I would like to deal with this issue, could you assign it to me?

> Pending worker requests should be properly cleared
> --
>
> Key: FLINK-18229
> URL: https://issues.apache.org/jira/browse/FLINK-18229
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes, Deployment / YARN, Runtime / 
> Coordination
>Affects Versions: 1.9.3, 1.10.1, 1.11.0
>Reporter: Xintong Song
>Priority: Major
> Fix For: 1.15.0
>
>
> Currently, if Kubernetes/Yarn does not have enough resources to fulfill 
> Flink's resource requirement, there will be pending pod/container requests on 
> Kubernetes/Yarn. These pending resource requirements are never cleared until 
> either fulfilled or the Flink cluster is shutdown.
> However, sometimes Flink no longer needs the pending resources. E.g., the 
> slot request is then fulfilled by another slots that become available, or the 
> job failed due to slot request timeout (in a session cluster). In such cases, 
> Flink does not remove the resource request until the resource is allocated, 
> then it discovers that it no longer needs the allocated resource and release 
> them. This would affect the underlying Kubernetes/Yarn cluster, especially 
> when the cluster is under heavy workload.
> It would be good for Flink to cancel pod/container requests as earlier as 
> possible if it can discover that some of the pending workers are no longer 
> needed.
> There are several approaches potentially achieve this.
>  # We can always check whether there's a pending worker that can be canceled 
> when a \{{PendingTaskManagerSlot}} is unassigned.
>  # We can have a separate timeout for requesting new worker. If the resource 
> cannot be allocated within the given time since requested, we should cancel 
> that resource request and claim a resource allocation failure.
>  # We can share the same timeout for starting new worker (proposed in 
> FLINK-13554). This is similar to 2), but it requires the worker to be 
> registered, rather than allocated, before timeout.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] RocMarshal edited a comment on pull request #18480: [FLINK-25789][docs-zh] Translate the formats/hadoop page into Chinese.

2022-02-24 Thread GitBox



RocMarshal edited a comment on pull request #18480:
URL: https://github.com/apache/flink/pull/18480#issuecomment-1037264660


   @gaoyunhaii @wuchong Could you help me to merge it if there's nothing 
Inappropriate？  Thank you very much.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Comment Edited] (FLINK-26355) VarCharType was not be considered in HiveTableSqlFunction

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497923#comment-17497923
 ] 

luoyuxia edited comment on FLINK-26355 at 2/25/22, 6:14 AM:


[~zoucao] HiveTableSqlFunction  is deprecated. I think the better way is 
waiting [FLINK-26364|https://issues.apache.org/jira/browse/FLINK-26364]  ready.
Hopefully I'll fix it when I'm free. 
Yes, we should consider about compatibility for backporting. But I'm not sure 
what's your concern the about backforward compatibility as it seems won't bring 
any bother about backforward compatibility. Would you like to explain more in 
case I miss some thing?


was (Author: luoyuxia):
[~zoucao] HiveTableSqlFunction  is deprecated. I think the better way is 
waiting [FLINK-26364|https://issues.apache.org/jira/browse/FLINK-26364]  ready.
Hopefully I'll fix it when I'm free. 
Yes, we should consider about compatibility for backporting. But I'm not sure 
what's your concern the about backforward compatibility as it seems won't bring 
any bother about backforward compatibility. Would you like to explain more in 
case I miss some thing.

> VarCharType was not be considered in HiveTableSqlFunction
> -
>
> Key: FLINK-26355
> URL: https://issues.apache.org/jira/browse/FLINK-26355
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: zoucao
>Priority: Major
>
> VarCharType was not be considered in `HiveTableSqlFunction#coerce`, see 
> [link|https://github.com/apache/flink/blob/a7192af8707f3f0d0f30fc71f3477edd92135cac/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/utils/HiveTableSqlFunction.java#L146],
>  before invoke `HiveTableSqlFunction#coerce`, flink will call the method 
> `createFieldTypeFromLogicalType` to build argumentsArray, if the field's type 
> is varchar, the exception will be thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26355) VarCharType was not be considered in HiveTableSqlFunction

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497923#comment-17497923
 ] 

luoyuxia commented on FLINK-26355:
--

[~zoucao] HiveTableSqlFunction  is deprecated. I think the better way is 
waiting [FLINK-26364|https://issues.apache.org/jira/browse/FLINK-26364]  ready.
Hopefully I'll fix it when I'm free. 
Yes, we should consider about compatibility for backporting. But I'm not sure 
what's your concern the about backforward compatibility as it seems won't bring 
any bother about backforward compatibility. Would you like to explain more in 
case I miss some thing.

> VarCharType was not be considered in HiveTableSqlFunction
> -
>
> Key: FLINK-26355
> URL: https://issues.apache.org/jira/browse/FLINK-26355
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: zoucao
>Priority: Major
>
> VarCharType was not be considered in `HiveTableSqlFunction#coerce`, see 
> [link|https://github.com/apache/flink/blob/a7192af8707f3f0d0f30fc71f3477edd92135cac/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/utils/HiveTableSqlFunction.java#L146],
>  before invoke `HiveTableSqlFunction#coerce`, flink will call the method 
> `createFieldTypeFromLogicalType` to build argumentsArray, if the field's type 
> is varchar, the exception will be thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] wangyang0918 commented on pull request #17502: [FLINK-24538][runtime][tests] Fix ZooKeeperLeaderElectionTest.testLeaderShouldBeCorrectedWhenOverwritten fails with NPE

2022-02-24 Thread GitBox



wangyang0918 commented on pull request #17502:
URL: https://github.com/apache/flink/pull/17502#issuecomment-1050551870


   Sorry for the late response. @dmvk I think your suggestion makes sense to 
me. Then we will have only one place `waitForNewLeader` or 
`waitForEmptyLeaderInformation` to update the `leader`.
   
   @xmarker WDYT?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink-table-store] shenzhu commented on pull request #24: [FLINK-26217] Introduce manifest.merge-min-count in commit

2022-02-24 Thread GitBox



shenzhu commented on pull request #24:
URL: https://github.com/apache/flink-table-store/pull/24#issuecomment-1050550061


   Hey @tsreaper , I updated this PR and added some tests, would you mind 
taking a look again when you have a moment? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-26355) VarCharType was not be considered in HiveTableSqlFunction

2022-02-24 Thread zoucao (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497918#comment-17497918
 ] 

zoucao commented on FLINK-26355:


I use default dialect with hive module rather than hive dialect. Is it 
necessary to fix the varchar type in `HiveTableSqlFunction#coerce` ? or waiting 
FLINK-26364 ready. WDYS [~luoyuxia]. If we want to fix it in FLINK-26364, I 
think we should consider more about compatibility for backporting.

> VarCharType was not be considered in HiveTableSqlFunction
> -
>
> Key: FLINK-26355
> URL: https://issues.apache.org/jira/browse/FLINK-26355
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: zoucao
>Priority: Major
>
> VarCharType was not be considered in `HiveTableSqlFunction#coerce`, see 
> [link|https://github.com/apache/flink/blob/a7192af8707f3f0d0f30fc71f3477edd92135cac/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/utils/HiveTableSqlFunction.java#L146],
>  before invoke `HiveTableSqlFunction#coerce`, flink will call the method 
> `createFieldTypeFromLogicalType` to build argumentsArray, if the field's type 
> is varchar, the exception will be thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] schumiyi commented on pull request #18914: [FLINK-26259][table-planner]Partial insert and partition insert canno…

2022-02-24 Thread GitBox



schumiyi commented on pull request #18914:
URL: https://github.com/apache/flink/pull/18914#issuecomment-1050529549


   @wuchong @JingsongLi Would you like to review this pr? Thanks for your time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #18916: [FLINK-26314] Disable unaligned checkpoints for StreamingExecutionFileSinkITCase.

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18916:
URL: https://github.com/apache/flink/pull/18916#issuecomment-1050514075


   
   ## CI report:
   
   * be996b8c559bb17a7a4f3c3d397b79ed70d8ec9b Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32200)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #18916: [FLINK-26314] Disable unaligned checkpoints for StreamingExecutionFileSinkITCase.

2022-02-24 Thread GitBox



flinkbot commented on pull request #18916:
URL: https://github.com/apache/flink/pull/18916#issuecomment-1050515056


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit be996b8c559bb17a7a4f3c3d397b79ed70d8ec9b (Fri Feb 25 
04:38:32 UTC 2022)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer of PMC member is required Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #18916: [FLINK-26314] Disable unaligned checkpoints for StreamingExecutionFileSinkITCase.

2022-02-24 Thread GitBox



flinkbot commented on pull request #18916:
URL: https://github.com/apache/flink/pull/18916#issuecomment-1050514075


   
   ## CI report:
   
   * be996b8c559bb17a7a4f3c3d397b79ed70d8ec9b UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-26314) StreamingCompactingFileSinkITCase.testFileSink failed on azure

2022-02-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-26314:
---
Labels: pull-request-available test-stability  (was: test-stability)

> StreamingCompactingFileSinkITCase.testFileSink failed on azure
> --
>
> Key: FLINK-26314
> URL: https://issues.apache.org/jira/browse/FLINK-26314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Gen Luo
>Priority: Critical
>  Labels: pull-request-available, test-stability
>
> {code:java}
> Feb 22 13:34:32 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 12.735 s <<< FAILURE! - in 
> org.apache.flink.connector.file.sink.StreamingCompactingFileSinkITCase
> Feb 22 13:34:32 [ERROR] StreamingCompactingFileSinkITCase.testFileSink  Time 
> elapsed: 3.311 s  <<< FAILURE!
> Feb 22 13:34:32 java.lang.AssertionError: The record 6788 should occur 4 
> times,  but only occurs 3time expected:<4> but was:<3>
> Feb 22 13:34:32   at org.junit.Assert.fail(Assert.java:89)
> Feb 22 13:34:32   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 22 13:34:32   at org.junit.Assert.assertEquals(Assert.java:647)
> Feb 22 13:34:32   at 
> org.apache.flink.connector.file.sink.utils.IntegerFileSinkTestDataUtils.checkIntegerSequenceSinkOutput(IntegerFileSinkTestDataUtils.java:155)
> Feb 22 13:34:32   at 
> org.apache.flink.connector.file.sink.FileSinkITBase.testFileSink(FileSinkITBase.java:84)
> Feb 22 13:34:32   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Feb 22 13:34:32   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Feb 22 13:34:32   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Feb 22 13:34:32   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 22 13:34:32   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Feb 22 13:34:32   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Feb 22 13:34:32   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Feb 22 13:34:32   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Feb 22 13:34:32   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Feb 22 13:34:32   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Feb 22 13:34:32   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Feb 22 13:34:32   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Feb 22 13:34:32   at org.junit.runners.Suite.runChild(Suite.java:128)
> Feb 22 13:34:32   at org.junit.runners.Suite.runChild(Suite.java:27)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>  {code}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=32023=logs=d44f43ce-542c-597d-bf94-b0718c71e5e8=ed165f3f-d0f6-524b-5279-86f8ee7d0e2d=11103

[GitHub] [flink] pltbkd opened a new pull request #18916: [FLINK-26314] Disable unaligned checkpoints for StreamingExecutionFileSinkITCase.

2022-02-24 Thread GitBox



pltbkd opened a new pull request #18916:
URL: https://github.com/apache/flink/pull/18916


   
   
   ## What is the purpose of the change
   
   This pull request fixes the unstable test case of 
StreamingCompactingFileSinkITCase since the compactor of FileSink can't work 
with unaligned checkpoint.
   
   ## Brief change log
   
 - Disable unaligned checkpoints explicitly for 
StreamingExecutionFileSinkITCase
   
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   This change is already covered by existing tests.
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-26360) [Umbrella] Improvement for Hive Query Syntax Compatibility

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26360:
-
Description: 
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't support or some other issues when 
using hive synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowledge 
about flink sql.

Feel free to leave your comment.

  was:
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't support or some issues when using 
hive synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowledge 
about flink sql.

Feel free to leave your comment.


> [Umbrella] Improvement for Hive Query Syntax Compatibility
> --
>
> Key: FLINK-26360
> URL: https://issues.apache.org/jira/browse/FLINK-26360
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Assignee: luoyuxia
>Priority: Major
> Fix For: 1.16.0
>
>
> Currently, we have a support for hive synatax compatibility in flink as 
> described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].
> But there're still some features we don't support or some other issues when 
> using hive synatax.
> In here, we want to make a improvement to solve the issues encountered when 
> using Hive dialect to make it be more smoothly when you mrigate your hive job 
> to flink or enable you write flink job using hive synatax with less knowledge 
> about flink sql.
> Feel free to leave your comment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26360) [Umbrella] Improvement for Hive Query Syntax Compatibility

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26360:
-
Description: 
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't support or some issues when using 
hive synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowledge 
about flink sql.

Feel free to leave your comment.

  was:
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't support or some bugs when using hive 
synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowledge 
about flink sql.

Feel free to leave your comment.


> [Umbrella] Improvement for Hive Query Syntax Compatibility
> --
>
> Key: FLINK-26360
> URL: https://issues.apache.org/jira/browse/FLINK-26360
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Assignee: luoyuxia
>Priority: Major
> Fix For: 1.16.0
>
>
> Currently, we have a support for hive synatax compatibility in flink as 
> described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].
> But there're still some features we don't support or some issues when using 
> hive synatax.
> In here, we want to make a improvement to solve the issues encountered when 
> using Hive dialect to make it be more smoothly when you mrigate your hive job 
> to flink or enable you write flink job using hive synatax with less knowledge 
> about flink sql.
> Feel free to leave your comment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26366) Support "insert directory"

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26366:
-
Description: 
In hive, it allow to  write data into the filesystem from queries using the 
following sql
"
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available 
starting with Hive 0.11.0)
  SELECT ... FROM ...
"
See more detail in 
[Writingdataintothefilesystemfromqueries|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82903069#LanguageManualDML-Writingdataintothefilesystemfromqueries].

As it's quite often used in production environment. We also need to support 
such usage in hive dialect.


  was:
In hive, it allow to  write data into the filesystem from queries using the 
following sql
"
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available 
starting with Hive 0.11.0)
  SELECT ... FROM ...
"
See more detail in 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82903069#LanguageManualDML-Writingdataintothefilesystemfromqueries.

As it's quite often used in production environment. We also need to support 
such usage in hive dialect.



> Support "insert directory"
> --
>
> Key: FLINK-26366
> URL: https://issues.apache.org/jira/browse/FLINK-26366
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> In hive, it allow to  write data into the filesystem from queries using the 
> following sql
> "
> INSERT OVERWRITE [LOCAL] DIRECTORY directory1
>   [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available 
> starting with Hive 0.11.0)
>   SELECT ... FROM ...
> "
> See more detail in 
> [Writingdataintothefilesystemfromqueries|https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82903069#LanguageManualDML-Writingdataintothefilesystemfromqueries].
> As it's quite often used in production environment. We also need to support 
> such usage in hive dialect.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18891: [FLINK-25243][k8s] Increase the k8s transactional operation max retries in the integration tests

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18891:
URL: https://github.com/apache/flink/pull/18891#issuecomment-1048391421


   
   ## CI report:
   
   * 4a1e6f20a8703f2fa59879da3d287e5ad5e284c5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32081)
 
   * 59ac7f07a95fd92cea3894cde9245c41bdd7a851 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32199)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Created] (FLINK-26366) Support "insert directory"

2022-02-24 Thread luoyuxia (Jira)

luoyuxia created FLINK-26366:


 Summary: Support "insert directory"
 Key: FLINK-26366
 URL: https://issues.apache.org/jira/browse/FLINK-26366
 Project: Flink
  Issue Type: Sub-task
Reporter: luoyuxia


In hive, it allow to  write data into the filesystem from queries using the 
following sql
"
INSERT OVERWRITE [LOCAL] DIRECTORY directory1
  [ROW FORMAT row_format] [STORED AS file_format] (Note: Only available 
starting with Hive 0.11.0)
  SELECT ... FROM ...
"
See more detail in 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=82903069#LanguageManualDML-Writingdataintothefilesystemfromqueries.

As it's quite often used in production environment. We also need to support 
such usage in hive dialect.




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18891: [FLINK-25243][k8s] Increase the k8s transactional operation max retries in the integration tests

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18891:
URL: https://github.com/apache/flink/pull/18891#issuecomment-1048391421


   
   ## CI report:
   
   * 4a1e6f20a8703f2fa59879da3d287e5ad5e284c5 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32081)
 
   * 59ac7f07a95fd92cea3894cde9245c41bdd7a851 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Assigned] (FLINK-26360) [Umbrella] Improvement for Hive Query Syntax Compatibility

2022-02-24 Thread Jark Wu (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu reassigned FLINK-26360:
---

Assignee: luoyuxia

> [Umbrella] Improvement for Hive Query Syntax Compatibility
> --
>
> Key: FLINK-26360
> URL: https://issues.apache.org/jira/browse/FLINK-26360
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Assignee: luoyuxia
>Priority: Major
>
> Currently, we have a support for hive synatax compatibility in flink as 
> described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].
> But there're still some features we don't support or some bugs when using 
> hive synatax.
> In here, we want to make a improvement to solve the issues encountered when 
> using Hive dialect to make it be more smoothly when you mrigate your hive job 
> to flink or enable you write flink job using hive synatax with less knowledge 
> about flink sql.
> Feel free to leave your comment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26360) [Umbrella] Improvement for Hive Query Syntax Compatibility

2022-02-24 Thread Jark Wu (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jark Wu updated FLINK-26360:

Fix Version/s: 1.16.0

> [Umbrella] Improvement for Hive Query Syntax Compatibility
> --
>
> Key: FLINK-26360
> URL: https://issues.apache.org/jira/browse/FLINK-26360
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Assignee: luoyuxia
>Priority: Major
> Fix For: 1.16.0
>
>
> Currently, we have a support for hive synatax compatibility in flink as 
> described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].
> But there're still some features we don't support or some bugs when using 
> hive synatax.
> In here, we want to make a improvement to solve the issues encountered when 
> using Hive dialect to make it be more smoothly when you mrigate your hive job 
> to flink or enable you write flink job using hive synatax with less knowledge 
> about flink sql.
> Feel free to leave your comment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26355) VarCharType was not be considered in HiveTableSqlFunction

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497891#comment-17497891
 ] 

luoyuxia commented on FLINK-26355:
--

[~zoucao] Thanks for reporting it.  Seems you're using hive dialect? Yes, you 
can rewrite `getHiveResultType` to fix it.
But to work around it, you can change you sql to 
{code:sql}
 json_tuple(tb.json, repeat('f1', 1), repeat('f2', 1)) 
{code}
Hopes it can help.

But, anyway, the long running way is to use new type inference as 
[FLINK-26364|https://issues.apache.org/jira/browse/FLINK-26364] described.



> VarCharType was not be considered in HiveTableSqlFunction
> -
>
> Key: FLINK-26355
> URL: https://issues.apache.org/jira/browse/FLINK-26355
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: zoucao
>Priority: Major
>
> VarCharType was not be considered in `HiveTableSqlFunction#coerce`, see 
> [link|https://github.com/apache/flink/blob/a7192af8707f3f0d0f30fc71f3477edd92135cac/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/utils/HiveTableSqlFunction.java#L146],
>  before invoke `HiveTableSqlFunction#coerce`, flink will call the method 
> `createFieldTypeFromLogicalType` to build argumentsArray, if the field's type 
> is varchar, the exception will be thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] v-qiunan commented on pull request #18915: [FLINK-26351][k8s/web ui] After scaling a flink task running on k8s, …

2022-02-24 Thread GitBox



v-qiunan commented on pull request #18915:
URL: https://github.com/apache/flink/pull/18915#issuecomment-1050498632


   hello


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-26365) improve hive udaf

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26365:
-
Description: 
Currently we can't get call Hive UDAF that requires constant arguments like 
[FLINK-16568|https://issues.apache.org/jira/browse/FLINK-16568] describes.
Also, we can't infer the suitable argument type which may cause some issue as 
[FLINK-25641|https://issues.apache.org/jira/browse/FLINK-25641] describes.


There has been a jira 
[FLINK-15855|https://issues.apache.org/jira/browse/FLINK-15855] to use new type 
inference for hive udaf, so we just keep track in here.

> improve hive udaf
> -
>
> Key: FLINK-26365
> URL: https://issues.apache.org/jira/browse/FLINK-26365
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
>
> Currently we can't get call Hive UDAF that requires constant arguments like 
> [FLINK-16568|https://issues.apache.org/jira/browse/FLINK-16568] describes.
> Also, we can't infer the suitable argument type which may cause some issue as 
> [FLINK-25641|https://issues.apache.org/jira/browse/FLINK-25641] describes.
> There has been a jira 
> [FLINK-15855|https://issues.apache.org/jira/browse/FLINK-15855] to use new 
> type inference for hive udaf, so we just keep track in here.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26314) StreamingCompactingFileSinkITCase.testFileSink failed on azure

2022-02-24 Thread Gen Luo (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497887#comment-17497887
 ] 

Gen Luo commented on FLINK-26314:
-

I found that the PseudoRandomValueSelector is selecting PT0S for 
execution.checkpointing.alignment-timeout and true for 
execution.checkpointing.unaligned for this test. I can reproduce these issues 
with these options. It seems that the compactor for FileSink can not work with 
unaligned checkpoint at present. 

I will first create a PR with a hotfix for the test to disable unaligned 
checkpoint explicitly, then work on a bugfix to make the compactor supporting 
unaligned.

> StreamingCompactingFileSinkITCase.testFileSink failed on azure
> --
>
> Key: FLINK-26314
> URL: https://issues.apache.org/jira/browse/FLINK-26314
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.15.0
>Reporter: Yun Gao
>Assignee: Gen Luo
>Priority: Critical
>  Labels: test-stability
>
> {code:java}
> Feb 22 13:34:32 [ERROR] Tests run: 2, Failures: 1, Errors: 0, Skipped: 0, 
> Time elapsed: 12.735 s <<< FAILURE! - in 
> org.apache.flink.connector.file.sink.StreamingCompactingFileSinkITCase
> Feb 22 13:34:32 [ERROR] StreamingCompactingFileSinkITCase.testFileSink  Time 
> elapsed: 3.311 s  <<< FAILURE!
> Feb 22 13:34:32 java.lang.AssertionError: The record 6788 should occur 4 
> times,  but only occurs 3time expected:<4> but was:<3>
> Feb 22 13:34:32   at org.junit.Assert.fail(Assert.java:89)
> Feb 22 13:34:32   at org.junit.Assert.failNotEquals(Assert.java:835)
> Feb 22 13:34:32   at org.junit.Assert.assertEquals(Assert.java:647)
> Feb 22 13:34:32   at 
> org.apache.flink.connector.file.sink.utils.IntegerFileSinkTestDataUtils.checkIntegerSequenceSinkOutput(IntegerFileSinkTestDataUtils.java:155)
> Feb 22 13:34:32   at 
> org.apache.flink.connector.file.sink.FileSinkITBase.testFileSink(FileSinkITBase.java:84)
> Feb 22 13:34:32   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)
> Feb 22 13:34:32   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> Feb 22 13:34:32   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> Feb 22 13:34:32   at java.lang.reflect.Method.invoke(Method.java:498)
> Feb 22 13:34:32   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> Feb 22 13:34:32   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> Feb 22 13:34:32   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> Feb 22 13:34:32   at 
> org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:54)
> Feb 22 13:34:32   at 
> org.apache.flink.util.TestNameProvider$1.evaluate(TestNameProvider.java:45)
> Feb 22 13:34:32   at 
> org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> Feb 22 13:34:32   at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> Feb 22 13:34:32   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> Feb 22 13:34:32   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> Feb 22 13:34:32   at org.junit.runners.Suite.runChild(Suite.java:128)
> Feb 22 13:34:32   at org.junit.runners.Suite.runChild(Suite.java:27)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> Feb 22 13:34:32   at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> Feb 22 13:34:32   at 
>

[jira] [Created] (FLINK-26365) improve hive udaf

2022-02-24 Thread luoyuxia (Jira)

luoyuxia created FLINK-26365:


 Summary: improve hive udaf
 Key: FLINK-26365
 URL: https://issues.apache.org/jira/browse/FLINK-26365
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: luoyuxia






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26364) Improve hive udtf

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26364:
-
Description: 
There're some issue sreported releated to calling hive udft like 
[FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727], 
[FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355] whose main 
cause is type inference and some others releated. 

There has been a jira for use the new type inference for hive udtf 
[FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
So, we just keep track it in here.


  was:
There're some issue sreported releated to calling hive udft like 
[FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727],[FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355],
 whose main cause is type inference and some others releated. 

There has been a jira for use the new type inference for hive udtf 
[FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
So, we just keep track it in here.



> Improve hive udtf
> -
>
> Key: FLINK-26364
> URL: https://issues.apache.org/jira/browse/FLINK-26364
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> There're some issue sreported releated to calling hive udft like 
> [FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727], 
> [FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355] whose main 
> cause is type inference and some others releated. 
> There has been a jira for use the new type inference for hive udtf 
> [FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
> So, we just keep track it in here.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26355) VarCharType was not be considered in HiveTableSqlFunction

2022-02-24 Thread zoucao (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497885#comment-17497885
 ] 

zoucao commented on FLINK-26355:


Hi, [~luoyuxia], I found this problem in the process of driving the type 
exception about json_tuple, the same problem proposed in FLINK-25727 we faced. 
To skip this type not match exception, I change the SQL from 
{code:java} json_tuple(tb.json, 'f1', 'f2') {code}
 to  
{code:java} json_tuple(tb.json, cast('f1' as string), cast('f2' as sting))  
{code}
After rewriting the SQL, I got the 'not support VarcharType' exception from 
`HiveTableSqlFunction#coerce`, I wish it can help you.
By the way, I found the [pr|https://github.com/apache/flink/pull/18548] 
proposed in FLINK-25727 is not work in my environment,  the method 
`setArgumentTypesAndConstants` was not invoked. IIUC, 
`HiveGenericUDTF#setArgumentTypesAndConstants` was only invoked by 
`HiveGenericUDAF`, should we rewrite `getHiveResultType` to fix it ?
Feel free to let me know what I can help to do about it.

> VarCharType was not be considered in HiveTableSqlFunction
> -
>
> Key: FLINK-26355
> URL: https://issues.apache.org/jira/browse/FLINK-26355
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: zoucao
>Priority: Major
>
> VarCharType was not be considered in `HiveTableSqlFunction#coerce`, see 
> [link|https://github.com/apache/flink/blob/a7192af8707f3f0d0f30fc71f3477edd92135cac/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/utils/HiveTableSqlFunction.java#L146],
>  before invoke `HiveTableSqlFunction#coerce`, flink will call the method 
> `createFieldTypeFromLogicalType` to build argumentsArray, if the field's type 
> is varchar, the exception will be thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26364) Improve hive udtf

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26364:
-
Description: 
There're some issue sreported releated to calling hive udft like 
[FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727],[FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355],
 whose main cause is type inference and some others releated. 

There has been a jira for use the new type inference for hive udtf 
[FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
So, we just keep track it in here.


  was:
There're some issue sreported releated to calling hive udft like 
[FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727],[FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355],
 whose main cause is type inference and some others releated. There has been a 
jira for use the new type inference for hive udtf 
[FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
So, we just keep track it in here.



> Improve hive udtf
> -
>
> Key: FLINK-26364
> URL: https://issues.apache.org/jira/browse/FLINK-26364
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> There're some issue sreported releated to calling hive udft like 
> [FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727],[FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355],
>  whose main cause is type inference and some others releated. 
> There has been a jira for use the new type inference for hive udtf 
> [FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
> So, we just keep track it in here.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26364) Improve hive udtf

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26364:
-
Description: 
There're some issue sreported releated to calling hive udft like 
[FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727],[FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355],
 whose main cause is type inference and some others releated. There has been a 
jira for use the new type inference for hive udtf 
[FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
So, we just keep track it in here.


> Improve hive udtf
> -
>
> Key: FLINK-26364
> URL: https://issues.apache.org/jira/browse/FLINK-26364
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> There're some issue sreported releated to calling hive udft like 
> [FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727],[FLINK-26355|https://issues.apache.org/jira/browse/FLINK-26355],
>  whose main cause is type inference and some others releated. There has been 
> a jira for use the new type inference for hive udtf 
> [FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].
> So, we just keep track it in here.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18915: [FLINK-26351][k8s/web ui] After scaling a flink task running on k8s, …

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18915:
URL: https://github.com/apache/flink/pull/18915#issuecomment-1050485121


   
   ## CI report:
   
   * acc2bffef19cb00ce33dac3330cd0e1b7d692835 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32198)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Created] (FLINK-26364) Improve hive udtf

2022-02-24 Thread luoyuxia (Jira)

luoyuxia created FLINK-26364:


 Summary: Improve hive udtf
 Key: FLINK-26364
 URL: https://issues.apache.org/jira/browse/FLINK-26364
 Project: Flink
  Issue Type: Sub-task
Reporter: luoyuxia






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink-ml] zhipeng93 edited a comment on pull request #66: [FLINK-26263] (followup) Check data size in LogisticRegression

2022-02-24 Thread GitBox



zhipeng93 edited a comment on pull request #66:
URL: https://github.com/apache/flink-ml/pull/66#issuecomment-1050431249


   Hi Dong, Thanks for the review.
   
   > Could you confirm that the flaky test could be reproduced before this 
patch but not after this patch?
   
   I have repeatedly run this flaky test for up to 100 times and there is no 
failures.
   Do you mean that we need another test case for this bug fix? I think 
`LogisticRegressionTest#testMoreSubtaskThanData` already covers this case --- 
It sometimes fails before, but did not fail for 100 runs after this fix.

   > And could you update the AllReduceImpl's Java doc to replace `only one 
double array` with `up to one double array`? And maybe update allReduceSum() 
Java doc similarly?
   
   The java doc is updated and I also added one more test case to cover this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #18915: [FLINK-26351][k8s/web ui] After scaling a flink task running on k8s, …

2022-02-24 Thread GitBox



flinkbot commented on pull request #18915:
URL: https://github.com/apache/flink/pull/18915#issuecomment-1050485301


   Thanks a lot for your contribution to the Apache Flink project. I'm the 
@flinkbot. I help the community
   to review your pull request. We will use this comment to track the progress 
of the review.
   
   
   ## Automated Checks
   Last check on commit acc2bffef19cb00ce33dac3330cd0e1b7d692835 (Fri Feb 25 
03:26:13 UTC 2022)
   
   **Warnings:**
* No documentation files were touched! Remember to keep the Flink docs up 
to date!
* **This pull request references an unassigned [Jira 
ticket](https://issues.apache.org/jira/browse/FLINK-26351).** According to the 
[code contribution 
guide](https://flink.apache.org/contributing/contribute-code.html), tickets 
need to be assigned before starting with the implementation work.
   
   
   Mention the bot in a comment to re-run the automated checks.
   ## Review Progress
   
   * ❓ 1. The [description] looks good.
   * ❓ 2. There is [consensus] that the contribution should go into to Flink.
   * ❓ 3. Needs [attention] from.
   * ❓ 4. The change fits into the overall [architecture].
   * ❓ 5. Overall code [quality] is good.
   
   Please see the [Pull Request Review 
Guide](https://flink.apache.org/contributing/reviewing-prs.html) for a full 
explanation of the review process.
The Bot is tracking the review progress through labels. Labels are applied 
according to the order of the review items. For consensus, approval by a Flink 
committer of PMC member is required Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot approve description` to approve one or more aspects (aspects: 
`description`, `consensus`, `architecture` and `quality`)
- `@flinkbot approve all` to approve all aspects
- `@flinkbot approve-until architecture` to approve everything until 
`architecture`
- `@flinkbot attention @username1 [@username2 ..]` to require somebody's 
attention
- `@flinkbot disapprove architecture` to remove an approval you gave earlier
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot commented on pull request #18915: [FLINK-26351][k8s/web ui] After scaling a flink task running on k8s, …

2022-02-24 Thread GitBox



flinkbot commented on pull request #18915:
URL: https://github.com/apache/flink/pull/18915#issuecomment-1050485121


   
   ## CI report:
   
   * acc2bffef19cb00ce33dac3330cd0e1b7d692835 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-26363) Fail to use such expression like if(1= 1, 2, 3)

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497881#comment-17497881
 ] 

luoyuxia commented on FLINK-26363:
--

There're are some issue like 
[FLINK-25095|https://issues.apache.org/jira/browse/FLINK-25095], 
[FLINK-20765|https://issues.apache.org/jira/browse/FLINK-20765] with same 
reason to report such case. 
We keep tracking it in here.

> Fail to use such expression like if(1= 1, 2, 3)
> ---
>
> Key: FLINK-26363
> URL: https://issues.apache.org/jira/browse/FLINK-26363
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
>
> Using hive dialect, can be reproduced using the following sql
> {code:sql}
> select if(1=1, 2,3)
> {code}
> Then it'll throw such exception " Mismatch of function's argument data type 
> 'BOOLEAN NOT NULL' and actual argument type 'BOOLEAN'.".
> The main reason is ScalarOperatorGens always generates expressions will 
> nullable return type, which can be different from the inferred type in 
> Calcite.
> We need to fix it in ScalarOperatorGens.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26351) After scaling a flink task running on k8s, the flink web ui graph always shows the parallelism of the first deployment.

2022-02-24 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated FLINK-26351:
---
Labels: pull-request-available  (was: )

> After scaling a flink task running on k8s, the flink web ui graph always 
> shows the parallelism of the first deployment.
> ---
>
> Key: FLINK-26351
> URL: https://issues.apache.org/jira/browse/FLINK-26351
> Project: Flink
>  Issue Type: Bug
>  Components: API / Core
>Affects Versions: 1.15.0
>Reporter: qiunan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.15.0
>
>
> In the code，flink web ui graph data from under method.
> AdaptiveScheduler.requestJob()
> {code:java}
>  @Override
>     public ExecutionGraphInfo requestJob() {
>         return new ExecutionGraphInfo(state.getJob(), 
> exceptionHistory.toArrayList());
>    } {code}
> This executionGraphInfo is task restart build and restore to state.
> You can see the code, the parallelism recalculate and copy jobGraph to reset.
> AdaptiveScheduler.createExecutionGraphWithAvailableResourcesAsync().
> {code:java}
> vertexParallelism = determineParallelism(slotAllocator);
> JobGraph adjustedJobGraph = jobInformation.copyJobGraph();
> for (JobVertex vertex : adjustedJobGraph.getVertices()) {
> JobVertexID id = vertex.getID();
> // use the determined "available parallelism" to use
> // the resources we have access to
> vertex.setParallelism(vertexParallelism.getParallelism(id));
> }{code}
> But in the restoreState copy jobGraph again, so the jobGraph parallelism 
> always deployed for the first time.
> AdaptiveScheduler.createExecutionGraphAndRestoreState(VertexParallelismStore 
> adjustedParallelismStore)  
> {code:java}
> private ExecutionGraph createExecutionGraphAndRestoreState(
> VertexParallelismStore adjustedParallelismStore) throws Exception {
> return executionGraphFactory.createAndRestoreExecutionGraph(
> jobInformation.copyJobGraph(),
> completedCheckpointStore,
> checkpointsCleaner,
> checkpointIdCounter,
> 
> TaskDeploymentDescriptorFactory.PartitionLocationConstraint.MUST_BE_KNOWN,
> initializationTimestamp,
> vertexAttemptNumberStore,
> adjustedParallelismStore,
> deploymentTimeMetrics,
> LOG);
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26363) Fail to use such expression like if(1= 1, 2, 3)

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26363:
-
Description: 
Using hive dialect, can be reproduced using the following sql

{code:sql}
select if(1=1, 2,3)
{code}

Then it'll throw such exception " Mismatch of function's argument data type 
'BOOLEAN NOT NULL' and actual argument type 'BOOLEAN'.".

The main reason is ScalarOperatorGens always generates expressions will 
nullable return type, which can be different from the inferred type in Calcite.
We need to fix it in ScalarOperatorGens.

  was:
Using hive dialect, can be reproduced using the following sql

{code:sql}
select if(1=1, 2,3)
{code}



> Fail to use such expression like if(1= 1, 2, 3)
> ---
>
> Key: FLINK-26363
> URL: https://issues.apache.org/jira/browse/FLINK-26363
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
>
> Using hive dialect, can be reproduced using the following sql
> {code:sql}
> select if(1=1, 2,3)
> {code}
> Then it'll throw such exception " Mismatch of function's argument data type 
> 'BOOLEAN NOT NULL' and actual argument type 'BOOLEAN'.".
> The main reason is ScalarOperatorGens always generates expressions will 
> nullable return type, which can be different from the inferred type in 
> Calcite.
> We need to fix it in ScalarOperatorGens.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] v-qiunan opened a new pull request #18915: [FLINK-26351][k8s/web ui] After scaling a flink task running on k8s, …

2022-02-24 Thread GitBox



v-qiunan opened a new pull request #18915:
URL: https://github.com/apache/flink/pull/18915


   …the flink web ui graph always shows the parallelism of the first deployment.
   
   
   
   ## What is the purpose of the change
   
   *(After scaling a flink task running on k8s, the flink web ui graph always 
shows the parallelism of the first deployment.)*
   
   
   ## Brief change log
   
 - *When the flink k8s task builds the ExecutionGraph, it is return this 
graph instead of clone graph.*
   
   ## Verifying this change
   
   Please make sure both new and modified tests in this PR follows the 
conventions defined in our code quality guide: 
https://flink.apache.org/contributing/code-style-and-quality-common.html#testing
   
   This change is already covered by existing tests, such as *(please describe 
tests)*.
   AdaptiveSchedulerTest
   
   ## Does this pull request potentially affect one of the following parts:
   
 - Dependencies (does it add or upgrade a dependency): (no)
 - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
 - The serializers: (no)
 - The runtime per-record code paths (performance sensitive): (no)
 - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: ( no)
 - The S3 file system connector: (no)
   
   ## Documentation
   
 - Does this pull request introduce a new feature? (no)
 - If yes, how is the feature documented? (not applicable)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-26363) Fail to use such expression like if(1= 1, 2, 3)

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26363:
-
Description: 
Using hive dialect, can be reproduced using the following sql

{code:sql}
select if(1=1, 2,3)
{code}


> Fail to use such expression like if(1= 1, 2, 3)
> ---
>
> Key: FLINK-26363
> URL: https://issues.apache.org/jira/browse/FLINK-26363
> Project: Flink
>  Issue Type: Sub-task
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
>
> Using hive dialect, can be reproduced using the following sql
> {code:sql}
> select if(1=1, 2,3)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-26363) Fail to use such expression like if(1= 1, 2, 3)

2022-02-24 Thread luoyuxia (Jira)

luoyuxia created FLINK-26363:


 Summary: Fail to use such expression like if(1= 1, 2, 3)
 Key: FLINK-26363
 URL: https://issues.apache.org/jira/browse/FLINK-26363
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Hive
Reporter: luoyuxia






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26354) "-restoreMode" should be "--restoreMode" and should have a shorthand

2022-02-24 Thread Yun Tang (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497878#comment-17497878
 ] 

Yun Tang commented on FLINK-26354:
--

Since the latest version of commons-cli is 1.5.0, which is what we already used 
in Flink. And it must have a short option key while the long option key could 
be optional. And FLINK-22701 actually suffers from this problem due to short 
option key conflicts. Maybe we could change the dependent library in 
Flink-1.16. For current solution, I prefer [~knaufk]'s idea, maybe we could 
introduce the short option key such 'rm' for the long option key 'restoreMode'.

> "-restoreMode" should be "--restoreMode" and should have a shorthand
> 
>
> Key: FLINK-26354
> URL: https://issues.apache.org/jira/browse/FLINK-26354
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.15.0
>Reporter: Konstantin Knauf
>Priority: Minor
>
> {code:java}
> -restoreMode  Defines how should we restore
> from the given savepoint.
> Supported options: [claim -
> claim ownership of the 
> savepoint
> and delete once it is 
> subsumed,
> no_claim (default) - do not
> claim ownership, the first
> checkpoint will not reuse any
> files from the restored one,
> legacy - the old behaviour, do
> not assume ownership of the
> savepoint files, but can reuse
> some shared files.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1032265310


   
   ## CI report:
   
   * 4324aa57719c8e0f0e408daaa4dd7064109bc5fa Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32104)
 
   * ce34627cabe844f0af69ddca6f1b5599e5bf1ef8 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32197)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1032265310


   
   ## CI report:
   
   * 4324aa57719c8e0f0e408daaa4dd7064109bc5fa Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32104)
 
   * ce34627cabe844f0af69ddca6f1b5599e5bf1ef8 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] MrWhiteSike commented on pull request #18655: [FLINK-25799] [docs] Translate table/filesystem.md page into Chinese.

2022-02-24 Thread GitBox



MrWhiteSike commented on pull request #18655:
URL: https://github.com/apache/flink/pull/18655#issuecomment-1050465364


   [@RocMarshal](https://github.com/RocMarshal) Thanks for the comments. Please 
review it again. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-26356) Revisit the create of RestClusterClient

2022-02-24 Thread Aitozi (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26356:
---
Description: 
The clusterClient is built as below. The config is mixed up with the 
FlinkDeploymentSpec and local default config. 
{code:java}
final int port = config.getInteger(RestOptions.PORT);
final String host =
config.getString(
RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, 
namespace));
final String restServerAddress = String.format("http://%s:%s;, host, port); 
{code}
But the {{RestOptions.ADDRESS}} is generated at the entrypoint when the HA is 
enabled, so the option can not obtain from the FlinkDeploymentSpec.

Furthermore, the default rest url is not suitable for all the service type. I 
think we should extract the rest endpoint from the Flink external service.

  was:
The clusterClient is built as below. The config is mixed up with the 
FlinkDeploymentSpec and local default config. 
{code:java}
final int port = config.getInteger(RestOptions.PORT);
final String host =
config.getString(
RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, 
namespace));
final String restServerAddress = String.format("http://%s:%s;, host, port); 
{code}
But the {{RestOptions.ADDRESS}} is generated at the entrypoint when the HA is 
enabled, so the option can not obtain from the FlinkDeploymentSpec.

Furthermore, the default rest url is not suitable for all the service type. I 
think we should extract the rest endpoint from the Flink external service.

One more concern is that, if the operator manage the multiple namespace, the 
rest url of \{{serviceName.namespace}} may not enough, it can not access across 
the namespace. 


> Revisit the create of RestClusterClient
> ---
>
> Key: FLINK-26356
> URL: https://issues.apache.org/jira/browse/FLINK-26356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Aitozi
>Priority: Major
>
> The clusterClient is built as below. The config is mixed up with the 
> FlinkDeploymentSpec and local default config. 
> {code:java}
> final int port = config.getInteger(RestOptions.PORT);
> final String host =
> config.getString(
> RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, 
> namespace));
> final String restServerAddress = String.format("http://%s:%s;, host, port); 
> {code}
> But the {{RestOptions.ADDRESS}} is generated at the entrypoint when the HA is 
> enabled, so the option can not obtain from the FlinkDeploymentSpec.
> Furthermore, the default rest url is not suitable for all the service type. I 
> think we should extract the rest endpoint from the Flink external service.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26360) [Umbrella] Improvement for Hive Query Syntax Compatibility

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26360:
-
Description: 
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't support or some bugs when using hive 
synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowledge 
about flink sql.

Feel free to leave your comment.

  was:
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't suport or some bugs when using hive 
synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowle 
about flink sql.

Feel free to leave your comment.


> [Umbrella] Improvement for Hive Query Syntax Compatibility
> --
>
> Key: FLINK-26360
> URL: https://issues.apache.org/jira/browse/FLINK-26360
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
>
> Currently, we have a support for hive synatax compatibility in flink as 
> described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].
> But there're still some features we don't support or some bugs when using 
> hive synatax.
> In here, we want to make a improvement to solve the issues encountered when 
> using Hive dialect to make it be more smoothly when you mrigate your hive job 
> to flink or enable you write flink job using hive synatax with less knowledge 
> about flink sql.
> Feel free to leave your comment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26362) "IndexOutOfBoundsException" when subquery select all field from using hive dialect

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497857#comment-17497857
 ] 

luoyuxia commented on FLINK-26362:
--

We should add a extra project node when type conversion is required. I'll fix 
it.

> "IndexOutOfBoundsException" when subquery select all field from using hive 
> dialect
> --
>
> Key: FLINK-26362
> URL: https://issues.apache.org/jira/browse/FLINK-26362
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> With hive dialect, can be reproduced using following code:
> {code:java}
>  tableEnv.executeSql("CREATE TABLE t1 (c1 INT, c2 CHAR(100))");
>  tableEnv.executeSql("CREATE TABLE t2 (c1 INT)");
> List results = CollectionUtil.iteratorToList(tableEnv.executeSql("SELECT 
> c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)").
> collect());
> {code}
> Then it will throw IndexOutOfBoundsException: 0, the reason is 
> " If it's a subquery and the project is identity, we skip creating this 
> project. This is to handle an issue with calcite SubQueryRemoveRule. The rule 
> checks col uniqueness by calling RelMetadataQuery::areColumnsUnique with an 
> empty col set, which always returns null for a project and thus introduces 
> unnecessary agg node.
> "
> So there could be no project node and only tablescan node in subquery, but 
> when we try to do type conversion for the subquery, with the following code, 
> it'll throw exception when there's no project node.
> {code:java}
> if (queryRelNode instanceof Project) {
>   return replaceProjectForTypeConversion(
> rexBuilder,
> (Project) queryRelNode,
> targetCalcTypes,
> targetHiveTypes,
> funcConverter);
> } else {
>   RelNode newInput =
> addTypeConversions(
> rexBuilder,
> queryRelNode.getInput(0),
> targetCalcTypes,
> targetHiveTypes,
> funcConverter);
> queryRelNode.replaceInput(0, newInput);
> return queryRelNode;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26362) "IndexOutOfBoundsException" when subquery select all field from using hive dialect

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26362:
-
Description: 
With hive dialect, can be reproduced using following code:

{code:java}
 tableEnv.executeSql("CREATE TABLE t1 (c1 INT, c2 CHAR(100))");
 tableEnv.executeSql("CREATE TABLE t2 (c1 INT)");
List results = CollectionUtil.iteratorToList(tableEnv.executeSql("SELECT 
c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)").
collect());
{code}

Then it will throw IndexOutOfBoundsException: 0, the reason is 
" If it's a subquery and the project is identity, we skip creating this 
project. This is to handle an issue with calcite SubQueryRemoveRule. The rule 
checks col uniqueness by calling RelMetadataQuery::areColumnsUnique with an 
empty col set, which always returns null for a project and thus introduces 
unnecessary agg node.
"
So there could be no project node and only tablescan node in subquery, but when 
we try to do type conversion for the subquery, with the following code, it'll 
throw exception when there's no project node.

{code:java}
if (queryRelNode instanceof Project) {
  return replaceProjectForTypeConversion(
rexBuilder,
(Project) queryRelNode,
targetCalcTypes,
targetHiveTypes,
funcConverter);
} else {
  RelNode newInput =
addTypeConversions(
rexBuilder,
queryRelNode.getInput(0),
targetCalcTypes,
targetHiveTypes,
funcConverter);
queryRelNode.replaceInput(0, newInput);
return queryRelNode;
}
{code}



  was:
With hive dialect, can be reproduced using following code:

{code:java}
 tableEnv.executeSql("CREATE TABLE t1 (c1 INT, c2 CHAR(100))");
 tableEnv.executeSql("CREATE TABLE t2 (c1 INT)");
List results = CollectionUtil.iteratorToList(tableEnv.executeSql("SELECT 
c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)").
collect());
{code}

Then it will throw index



> "IndexOutOfBoundsException" when subquery select all field from using hive 
> dialect
> --
>
> Key: FLINK-26362
> URL: https://issues.apache.org/jira/browse/FLINK-26362
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> With hive dialect, can be reproduced using following code:
> {code:java}
>  tableEnv.executeSql("CREATE TABLE t1 (c1 INT, c2 CHAR(100))");
>  tableEnv.executeSql("CREATE TABLE t2 (c1 INT)");
> List results = CollectionUtil.iteratorToList(tableEnv.executeSql("SELECT 
> c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)").
> collect());
> {code}
> Then it will throw IndexOutOfBoundsException: 0, the reason is 
> " If it's a subquery and the project is identity, we skip creating this 
> project. This is to handle an issue with calcite SubQueryRemoveRule. The rule 
> checks col uniqueness by calling RelMetadataQuery::areColumnsUnique with an 
> empty col set, which always returns null for a project and thus introduces 
> unnecessary agg node.
> "
> So there could be no project node and only tablescan node in subquery, but 
> when we try to do type conversion for the subquery, with the following code, 
> it'll throw exception when there's no project node.
> {code:java}
> if (queryRelNode instanceof Project) {
>   return replaceProjectForTypeConversion(
> rexBuilder,
> (Project) queryRelNode,
> targetCalcTypes,
> targetHiveTypes,
> funcConverter);
> } else {
>   RelNode newInput =
> addTypeConversions(
> rexBuilder,
> queryRelNode.getInput(0),
> targetCalcTypes,
> targetHiveTypes,
> funcConverter);
> queryRelNode.replaceInput(0, newInput);
> return queryRelNode;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18914: [FLINK-26259][table-planner]Partial insert and partition insert canno…

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18914:
URL: https://github.com/apache/flink/pull/18914#issuecomment-1050372877


   
   ## CI report:
   
   * a591d1c40b987902427e7692e208e972b70bcb33 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32192)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-26362) "IndexOutOfBoundsException" when subquery select all field from using hive dialect

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26362:
-
Description: 
With hive dialect, can be reproduced using following code:

{code:java}
 tableEnv.executeSql("CREATE TABLE t1 (c1 INT, c2 CHAR(100))");
 tableEnv.executeSql("CREATE TABLE t2 (c1 INT)");
List results = CollectionUtil.iteratorToList(tableEnv.executeSql("SELECT 
c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)").
collect());
{code}

Then it will throw index


> "IndexOutOfBoundsException" when subquery select all field from using hive 
> dialect
> --
>
> Key: FLINK-26362
> URL: https://issues.apache.org/jira/browse/FLINK-26362
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> With hive dialect, can be reproduced using following code:
> {code:java}
>  tableEnv.executeSql("CREATE TABLE t1 (c1 INT, c2 CHAR(100))");
>  tableEnv.executeSql("CREATE TABLE t2 (c1 INT)");
> List results = CollectionUtil.iteratorToList(tableEnv.executeSql("SELECT 
> c1 FROM t1 WHERE c1 IN (SELECT c1 FROM t2)").
> collect());
> {code}
> Then it will throw index



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18641: [FLINK-25761] [docs] Translate Avro format page into Chinese.

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18641:
URL: https://github.com/apache/flink/pull/18641#issuecomment-1031234534


   
   ## CI report:
   
   * 6d23d5651638281b39766cf214cbbb2b111a4da7 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32146)
 
   * 684cc9586eed1e214683ea3d881e30bbc1065670 Azure: 
[PENDING](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32196)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Updated] (FLINK-26362) "IndexOutOfBoundsException" when subquery select all field from using hive dialect

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26362:
-
Summary: "IndexOutOfBoundsException" when subquery select all field from 
using hive dialect  (was: IndexOutOfBoundsException when subquery select all 
field from using hive dialect)

> "IndexOutOfBoundsException" when subquery select all field from using hive 
> dialect
> --
>
> Key: FLINK-26362
> URL: https://issues.apache.org/jira/browse/FLINK-26362
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-26362) IndexOutOfBoundsException when subquery select all field from using hive dialect

2022-02-24 Thread luoyuxia (Jira)

luoyuxia created FLINK-26362:


 Summary: IndexOutOfBoundsException when subquery select all field 
from using hive dialect
 Key: FLINK-26362
 URL: https://issues.apache.org/jira/browse/FLINK-26362
 Project: Flink
  Issue Type: Sub-task
Reporter: luoyuxia






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Closed] (FLINK-26343) IndexOutOfBoundsException when subquery select all field from using hive dialect

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia closed FLINK-26343.

Resolution: Duplicate

> IndexOutOfBoundsException when subquery select all field from  using hive 
> dialect
> -
>
> Key: FLINK-26343
> URL: https://issues.apache.org/jira/browse/FLINK-26343
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.13.3
>Reporter: luoyuxia
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] flinkbot edited a comment on pull request #18641: [FLINK-25761] [docs] Translate Avro format page into Chinese.

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18641:
URL: https://github.com/apache/flink/pull/18641#issuecomment-1031234534


   
   ## CI report:
   
   * 6d23d5651638281b39766cf214cbbb2b111a4da7 Azure: 
[SUCCESS](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32146)
 
   * 684cc9586eed1e214683ea3d881e30bbc1065670 UNKNOWN
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] MrWhiteSike commented on pull request #18641: [FLINK-25761] [docs] Translate Avro format page into Chinese.

2022-02-24 Thread GitBox



MrWhiteSike commented on pull request #18641:
URL: https://github.com/apache/flink/pull/18641#issuecomment-1050452261


   [@RocMarshal](https://github.com/RocMarshal) Thanks for the comments.I 
updated this pr based on your comments. Please review it again. Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-26361) "Unexpected correlate variable $cor0" when using hive dialect to write a subquery

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497850#comment-17497850
 ] 

luoyuxia commented on FLINK-26361:
--

I'll fix it.

> "Unexpected correlate variable $cor0" when using hive dialect to write a 
> subquery
> -
>
> Key: FLINK-26361
> URL: https://issues.apache.org/jira/browse/FLINK-26361
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> With hive dialect, can be reproduced using such code: 
> {code:java}
> tableEnv.executeSql("CREATE TABLE src (key string, value string)");
> tableEnv.executeSql("create view cv1 as \n"
> + "select * \n"
> + "from src b where exists\n"
> + "  (select a.key \n"
> + "  from src a \n"
> + "  where b.value = a.value )");
> List results =
> CollectionUtil.iteratorToList(tableEnv.executeSql("select * 
> from src where src"
> + ".key in (select key from cv1)").collect());
> {code}
> The plan for such sql is :
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[key, value])
>   LogicalProject(key=[$0], value=[$1])
> LogicalFilter(condition=[IN($0, {
> LogicalProject(key=[$0])
>   LogicalProject(key=[$0], value=[$1])
> LogicalFilter(condition=[EXISTS({
> LogicalProject(key=[$0])
>   LogicalFilter(condition=[=($cor0.value, $1)])
> LogicalTableScan(table=[[test-catalog, default, src]])
> })])
>   LogicalTableScan(table=[[test-catalog, default, src]])
> })])
>   LogicalTableScan(table=[[test-catalog, default, src]])
> {code}
> The node LogicalFilter(condition=[=($cor0.value, $1)]) contains `$cor0`, but 
> miss variablesSet. To fix it, we should pass variablesSet when create 
> LogicalFilter.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26361) "Unexpected correlate variable $cor0" when using hive dialect to write a subquery

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26361:
-
Description: 
With hive dialect, can be reproduced using such code: 

{code:java}
tableEnv.executeSql("CREATE TABLE src (key string, value string)");
tableEnv.executeSql("create view cv1 as \n"
+ "select * \n"
+ "from src b where exists\n"
+ "  (select a.key \n"
+ "  from src a \n"
+ "  where b.value = a.value )");
List results =
CollectionUtil.iteratorToList(tableEnv.executeSql("select * 
from src where src"
+ ".key in (select key from cv1)").collect());
{code}


The plan for such sql is :

{code:java}
LogicalSink(table=[*anonymous_collect$1*], fields=[key, value])
  LogicalProject(key=[$0], value=[$1])
LogicalFilter(condition=[IN($0, {
LogicalProject(key=[$0])
  LogicalProject(key=[$0], value=[$1])
LogicalFilter(condition=[EXISTS({
LogicalProject(key=[$0])
  LogicalFilter(condition=[=($cor0.value, $1)])
LogicalTableScan(table=[[test-catalog, default, src]])
})])
  LogicalTableScan(table=[[test-catalog, default, src]])
})])
  LogicalTableScan(table=[[test-catalog, default, src]])
{code}

The node LogicalFilter(condition=[=($cor0.value, $1)]) contains `$cor0`, but 
miss variablesSet. To fix it, we should pass variablesSet when create 
LogicalFilter.




> "Unexpected correlate variable $cor0" when using hive dialect to write a 
> subquery
> -
>
> Key: FLINK-26361
> URL: https://issues.apache.org/jira/browse/FLINK-26361
> Project: Flink
>  Issue Type: Sub-task
>Reporter: luoyuxia
>Priority: Major
>
> With hive dialect, can be reproduced using such code: 
> {code:java}
> tableEnv.executeSql("CREATE TABLE src (key string, value string)");
> tableEnv.executeSql("create view cv1 as \n"
> + "select * \n"
> + "from src b where exists\n"
> + "  (select a.key \n"
> + "  from src a \n"
> + "  where b.value = a.value )");
> List results =
> CollectionUtil.iteratorToList(tableEnv.executeSql("select * 
> from src where src"
> + ".key in (select key from cv1)").collect());
> {code}
> The plan for such sql is :
> {code:java}
> LogicalSink(table=[*anonymous_collect$1*], fields=[key, value])
>   LogicalProject(key=[$0], value=[$1])
> LogicalFilter(condition=[IN($0, {
> LogicalProject(key=[$0])
>   LogicalProject(key=[$0], value=[$1])
> LogicalFilter(condition=[EXISTS({
> LogicalProject(key=[$0])
>   LogicalFilter(condition=[=($cor0.value, $1)])
> LogicalTableScan(table=[[test-catalog, default, src]])
> })])
>   LogicalTableScan(table=[[test-catalog, default, src]])
> })])
>   LogicalTableScan(table=[[test-catalog, default, src]])
> {code}
> The node LogicalFilter(condition=[=($cor0.value, $1)]) contains `$cor0`, but 
> miss variablesSet. To fix it, we should pass variablesSet when create 
> LogicalFilter.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-26356) Revisit the create of RestClusterClient

2022-02-24 Thread Aitozi (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497848#comment-17497848
 ] 

Aitozi commented on FLINK-26356:


I have not make it very clear yet, I will do more investigation first. I can 
take this ticket. 

> Revisit the create of RestClusterClient
> ---
>
> Key: FLINK-26356
> URL: https://issues.apache.org/jira/browse/FLINK-26356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Aitozi
>Priority: Major
>
> The clusterClient is built as below. The config is mixed up with the 
> FlinkDeploymentSpec and local default config. 
> {code:java}
> final int port = config.getInteger(RestOptions.PORT);
> final String host =
> config.getString(
> RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, 
> namespace));
> final String restServerAddress = String.format("http://%s:%s;, host, port); 
> {code}
> But the {{RestOptions.ADDRESS}} is generated at the entrypoint when the HA is 
> enabled, so the option can not obtain from the FlinkDeploymentSpec.
> Furthermore, the default rest url is not suitable for all the service type. I 
> think we should extract the rest endpoint from the Flink external service.
> One more concern is that, if the operator manage the multiple namespace, the 
> rest url of \{{serviceName.namespace}} may not enough, it can not access 
> across the namespace. 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-26361) "Unexpected correlate variable $cor0" when using hive dialect to write a subquery

2022-02-24 Thread luoyuxia (Jira)

luoyuxia created FLINK-26361:


 Summary: "Unexpected correlate variable $cor0" when using hive 
dialect to write a subquery
 Key: FLINK-26361
 URL: https://issues.apache.org/jira/browse/FLINK-26361
 Project: Flink
  Issue Type: Sub-task
Reporter: luoyuxia






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Updated] (FLINK-26360) [Umbrella] Improvement for Hive Query Syntax Compatibility

2022-02-24 Thread luoyuxia (Jira)



 [ 
https://issues.apache.org/jira/browse/FLINK-26360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

luoyuxia updated FLINK-26360:
-
Description: 
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't suport or some bugs when using hive 
synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowle 
about flink sql.

Feel free to leave your comment.

  was:
Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't suport or some bugs when using hive 
synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowle 
about flink sql.


> [Umbrella] Improvement for Hive Query Syntax Compatibility
> --
>
> Key: FLINK-26360
> URL: https://issues.apache.org/jira/browse/FLINK-26360
> Project: Flink
>  Issue Type: Improvement
>  Components: Connectors / Hive
>Reporter: luoyuxia
>Priority: Major
>
> Currently, we have a support for hive synatax compatibility in flink as 
> described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].
> But there're still some features we don't suport or some bugs when using hive 
> synatax.
> In here, we want to make a improvement to solve the issues encountered when 
> using Hive dialect to make it be more smoothly when you mrigate your hive job 
> to flink or enable you write flink job using hive synatax with less knowle 
> about flink sql.
> Feel free to leave your comment.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-25727) Flink SQL convert constant string to char type which cause hive udtf json_tuple not work

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-25727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497847#comment-17497847
 ] 

luoyuxia commented on FLINK-25727:
--

I will fix it in 
[FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].

> Flink SQL convert constant string to char type which cause hive udtf 
> json_tuple not work
> 
>
> Key: FLINK-25727
> URL: https://issues.apache.org/jira/browse/FLINK-25727
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Hive
>Affects Versions: 1.14.2
> Environment: Flink 1.14.2
> Hive 2.3.9
>Reporter: syntomic
>Priority: Not a Priority
>  Labels: features, pull-request-available
> Attachments: image-2022-01-20-17-44-58-478.png
>
>
> Flink SQL(use default dialect) is:
> {code:java}
> SELECT
>     a.`log`,
>     b.`role_id`
> FROM
>     tmp_kafka a, lateral table(json_tuple(`log`, 'role_id')) AS b(`role_id`); 
> {code}
> Exception is:
> {code:java}
> org.apache.flink.table.api.ValidationException: SQL validation failed. 
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.org$apache$flink$table$planner$calcite$FlinkPlannerImpl$$validate(FlinkPlannerImpl.scala:164)
>   at 
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:107)
>   at 
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:215)
>   at 
> org.apache.flink.table.planner.delegation.ParserImpl.parse(ParserImpl.java:101)
>   at 
> org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlQuery(TableEnvironmentImpl.java:716)
>   at 
> org.apache.zeppelin.flink.sql.AbstractStreamSqlJob.run(AbstractStreamSqlJob.java:106)
>   at 
> org.apache.zeppelin.flink.FlinkStreamSqlInterpreter.callInnerSelect(FlinkStreamSqlInterpreter.java:86)
>   at 
> org.apache.zeppelin.flink.FlinkSqlInterpreter.callSelect(FlinkSqlInterpreter.java:494)
>   at 
> org.apache.zeppelin.flink.FlinkSqlInterpreter.callCommand(FlinkSqlInterpreter.java:257)
>   at 
> org.apache.zeppelin.flink.FlinkSqlInterpreter.runSqlList(FlinkSqlInterpreter.java:151)
>   at 
> org.apache.zeppelin.flink.FlinkSqlInterpreter.internalInterpret(FlinkSqlInterpreter.java:109)
>   at 
> org.apache.zeppelin.interpreter.AbstractInterpreter.interpret(AbstractInterpreter.java:55)
>   at 
> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:110)
>   at 
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:860)
>   at 
> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:752)
>   at org.apache.zeppelin.scheduler.Job.run(Job.java:172)
>   at 
> org.apache.zeppelin.scheduler.AbstractScheduler.runJob(AbstractScheduler.java:132)
>   at 
> org.apache.zeppelin.scheduler.ParallelScheduler.lambda$runJobInScheduler$0(ParallelScheduler.java:46)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: 
> java.lang.reflect.InvocationTargetException
>   at 
> org.apache.flink.table.planner.functions.utils.HiveFunctionUtils.invokeGetResultType(HiveFunctionUtils.java:83)
>   at 
> org.apache.flink.table.planner.functions.utils.HiveTableSqlFunction.getRowType(HiveTableSqlFunction.java:116)
>   at 
> org.apache.flink.table.planner.functions.utils.TableSqlFunction$$anon$1.inferReturnType(TableSqlFunction.scala:89)
>   at 
> org.apache.calcite.sql.validate.ProcedureNamespace.validateImpl(ProcedureNamespace.java:69)
>   at 
> org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:84)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:997)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:975)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3085)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3070)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateJoin(SqlValidatorImpl.java:3133)
>   at 
> org.apache.flink.table.planner.calcite.FlinkCalciteSqlValidator.validateJoin(FlinkCalciteSqlValidator.java:117)
>   at 
> org.apache.calcite.sql.validate.SqlValidatorImpl.validateFrom(SqlValidatorImpl.java:3076)
>   at 
>

[jira] [Commented] (FLINK-26355) VarCharType was not be considered in HiveTableSqlFunction

2022-02-24 Thread luoyuxia (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497845#comment-17497845
 ] 

luoyuxia commented on FLINK-26355:
--

[~zoucao] The answer is yes. There's a similar issue 
[FLINK-25727|https://issues.apache.org/jira/browse/FLINK-25727]. It does a 
problem, I will fix it in 
[FLINK-15854|https://issues.apache.org/jira/browse/FLINK-15854].

And by the way, could you like to show your sql and exception to help us cover 
more case?

> VarCharType was not be considered in HiveTableSqlFunction
> -
>
> Key: FLINK-26355
> URL: https://issues.apache.org/jira/browse/FLINK-26355
> Project: Flink
>  Issue Type: Improvement
>  Components: Table SQL / Ecosystem
>Reporter: zoucao
>Priority: Major
>
> VarCharType was not be considered in `HiveTableSqlFunction#coerce`, see 
> [link|https://github.com/apache/flink/blob/a7192af8707f3f0d0f30fc71f3477edd92135cac/flink-table/flink-table-planner/src/main/java/org/apache/flink/table/planner/functions/utils/HiveTableSqlFunction.java#L146],
>  before invoke `HiveTableSqlFunction#coerce`, flink will call the method 
> `createFieldTypeFromLogicalType` to build argumentsArray, if the field's type 
> is varchar, the exception will be thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Created] (FLINK-26360) [Umbrella] Improvement for Hive Query Syntax Compatibility

2022-02-24 Thread luoyuxia (Jira)

luoyuxia created FLINK-26360:


 Summary: [Umbrella] Improvement for Hive Query Syntax Compatibility
 Key: FLINK-26360
 URL: https://issues.apache.org/jira/browse/FLINK-26360
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: luoyuxia


Currently, we have a support for hive synatax compatibility in flink as 
described in [FLINK-21529|https://issues.apache.org/jira/browse/FLINK-21529].

But there're still some features we don't suport or some bugs when using hive 
synatax.

In here, we want to make a improvement to solve the issues encountered when 
using Hive dialect to make it be more smoothly when you mrigate your hive job 
to flink or enable you write flink job using hive synatax with less knowle 
about flink sql.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink-ml] zhipeng93 edited a comment on pull request #66: [FLINK-26263] (followup) Check data size in LogisticRegression

2022-02-24 Thread GitBox



zhipeng93 edited a comment on pull request #66:
URL: https://github.com/apache/flink-ml/pull/66#issuecomment-1050431249


   Not ready..


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Comment Edited] (FLINK-26306) Triggered checkpoints can be delayed by discarding shared state

2022-02-24 Thread Yuan Mei (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497408#comment-17497408
 ] 

Yuan Mei edited comment on FLINK-26306 at 2/25/22, 1:42 AM:


# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, but can not prevent the problem completely.
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.


was (Author: ym):
# Why using a separate pool for deletion is not a good idea?
 # If the answer to 1 is due to "backpressure". When mentioning "backpressure", 
do you mean triggering/starting new checkpoints faster than we can 
subsume/delete the old ones' states? 
 # If yes, then using separate pools, we can still pause triggering new 
checkpoint if state deletion speed not catching up
 # I agree that batching deletion and randomizing triggering materialization 
can mitigate the problem, and can not prevent it completely.
 # When talking about `backpressure`, isn't it usually related to data 
processing? I do not think checkpointing should affect normal data processing 
if that's the case.

> Triggered checkpoints can be delayed by discarding shared state
> ---
>
> Key: FLINK-26306
> URL: https://issues.apache.org/jira/browse/FLINK-26306
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Checkpointing
>Affects Versions: 1.15.0, 1.14.3
>Reporter: Roman Khachatryan
>Assignee: Roman Khachatryan
>Priority: Major
> Fix For: 1.16.0
>
>
> Quick note: CheckpointCleaner is not involved here.
> When a checkpoint is subsumed, SharedStateRegistry schedules its unused 
> shared state for async deletion. It uses common IO pool for this and adds a 
> Runnable per state handle. ( see SharedStateRegistryImpl.scheduleAsyncDelete)
> When a checkpoint is started, CheckpointCoordinator uses the same thread pool 
> to initialize the location for it. (see 
> CheckpointCoordinator.initializeCheckpoint)
> The thread pool is of fixed size 
> [jobmanager.io-pool.size|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#jobmanager-io-pool-size];
>  by default it's the number of CPU cores) and uses FIFO queue for tasks.
> When there is a spike in state deletion, the next checkpoint is delayed 
> waiting for an available IO thread.
> Back-pressure seems reasonable here (similar to CheckpointCleaner); however, 
> this shared state deletion could be spread across multiple subsequent 
> checkpoints, not neccesarily the next one.
>  
> I believe the issue is an pre-existing one; but it particularly affects 
> changelog state backend, because 1) such spikes are likely there; 2) 
> workloads are latency sensitive.
> In the tests, checkpoint duration grows from seconds to minutes immediately 
> after the materialization.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink-ml] zhipeng93 commented on pull request #66: [FLINK-26263] (followup) Check data size in LogisticRegression

2022-02-24 Thread GitBox



zhipeng93 commented on pull request #66:
URL: https://github.com/apache/flink-ml/pull/66#issuecomment-1050431249


   Hi Dong, Thanks for the review.
   
   > Could you confirm that the flaky test could be reproduced before this 
patch but not after this patch?
   
   I have repeatedly run this flaky test for up to 100 times and there is no 
failures.
   Do you mean that we need another test case for this bug fix? I think 
`LogisticRegressionTest#testMoreSubtaskThanData` already covers this case --- 
It sometimes fails before, but did not fail for 100 runs after this fix.

   > And could you update the AllReduceImpl's Java doc to replace `only one 
double array` with `up to one double array`? And maybe update allReduceSum() 
Java doc similarly?
   
   The java doc is updated and I also added one more test case to cover this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] flinkbot edited a comment on pull request #18913: [FLINK-26331] Make cleanup retry strategy configurable similar to how the task restart is configurable

2022-02-24 Thread GitBox



flinkbot edited a comment on pull request #18913:
URL: https://github.com/apache/flink/pull/18913#issuecomment-1050153597


   
   ## CI report:
   
   * 0ba635c5ae44773c15a90e8c668007e744b37251 Azure: 
[FAILURE](https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_build/results?buildId=32190)
 
   
   
   Bot commands
 The @flinkbot bot supports the following commands:
   
- `@flinkbot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [flink] afedulov commented on a change in pull request #18903: [FLINK-26342][docs][formats] Add DataStream documentation for the CSV format

2022-02-24 Thread GitBox



afedulov commented on a change in pull request #18903:
URL: https://github.com/apache/flink/pull/18903#discussion_r814220659



##
File path: docs/content/docs/connectors/datastream/formats/csv.md
##
@@ -0,0 +1,87 @@
+---
+title:  "CSV"
+weight: 4
+type: docs
+aliases:
+- /dev/connectors/formats/csv.html
+- /apis/streaming/connectors/formats/csv.html
+---
+
+
+
+# CSV format
+
+To use the CSV format you need to add the Flink CSV dependency to your project:
+
+```xml
+
+   org.apache.flink
+   flink-csv
+   {{< version >}}
+
+```
+
+### Read
+
+Flink supports reading CSV files using `CsvReaderFormat`. The reader utilizes 
Jackson library and allows passing the corresponding configuration for the CSV 
schema and parsing options.
+
+`CsvReaderFormat` can be initialized and used like this:
+```java
+CsvReaderFormat csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
+FileSource source = 
+FileSource.forRecordStreamFormat(csvFormat, 
Path.fromLocalFile(...)).build();
+```
+
+The schema for CSV parsing, in this case, is automatically derived based on 
the fields of the `SomePojo` class using the `Jackson` library. (Note: you 
might need to add `@JsonPropertyOrder({field1, field2, ...})` annotation to 
your class definition with the fields order exactly matching those of the CSV 
file columns).
+
+If you need more fine-grained control over the CSV schema or the parsing 
options, use the more low-level `forSchema` static factory method of 
`CsvReaderFormat`:
+
+```java
+CsvReaderFormat forSchema(CsvMapper mapper, 
+ CsvSchema schema, 
+ TypeInformation typeInformation) 
+```
+
+Similarly to the `TextLineInputFormat`, `CsvReaderFormat` can be used in both 
continues and batch modes (see [TextLineInputFormat]({{< ref 
"docs/connectors/formats/text_files" >}}  for examples).
+
+### Write
+
+For writing CSV files, use `CsvBulkWriter`. An example of producing CSV files 
from a stream of POJOs  looks like this:
+
+```java
+DataStream stream = ...
+BulkWriter.Factory bulkWriterFactory = (out) -> 
CsvBulkWriter.forPojo(SomePojo.class, out);
+
+FileSink sink = FileSink.forBulkFormat(new Path(...), 
bulkWriterFactory)
+.withBucketAssigner(new 
BasePathBucketAssigner<>())
+.build();
+
+stream.sinkTo(sink);
+```
+
+Similarly to the reader, the writer also utilizes Jackson library and supports 
fully customizable configuration using its `CsvMapper` and `CsvSchema` classes:
+
+```java
+
+Converter converter = (value, context) -> value; 
//An optional custom types converter, no-op in this example 
+CsvMapper csvMapper = new CsvMapper();
+CsvSchema schema = 
csvMapper.schemaFor(CityPojo.class).withColumnSeparator("|");;
+BulkWriter.Factory bulkWriterFactory = (out) -> 
CsvBulkWriter.forSchema(csvMapper, schema, converter, null, out);

Review comment:
   Hmm, indeed. IIRC Arvid proposed to make it package-private during the 
review, but that was before we added convenience static factory methods. It was 
not really defined as a requirement, but I do not see why not expose it in the 
next version.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[jira] [Commented] (FLINK-26334) when the (timestamp - offset + windowSize) is less than 0 the calculation result of TimeWindow.getWindowSTartWithOffset is incorrect

2022-02-24 Thread realdengziqi (Jira)



[ 
https://issues.apache.org/jira/browse/FLINK-26334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497832#comment-17497832
 ] 

realdengziqi commented on FLINK-26334:
--

I want to work on this:D

> when the (timestamp - offset + windowSize) is less than 0 the calculation 
> result of TimeWindow.getWindowSTartWithOffset is incorrect
> 
>
> Key: FLINK-26334
> URL: https://issues.apache.org/jira/browse/FLINK-26334
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.14.3
> Environment: flink version 1.14.3
>Reporter: realdengziqi
>Priority: Major
>   Original Estimate: 16h
>  Remaining Estimate: 16h
>
>  
> source code
> {code:java}
> //Method to get the window start for a timestamp.
> //Params:
> //timestamp – epoch millisecond to get the window start.
> //offset – The offset which window start would be shifted by.
> //windowSize – The size of the generated windows.
> //Returns:
> //window start
> public static long getWindowStartWithOffset(long timestamp, long offset, long 
> windowSize) {
> return timestamp - (timestamp - offset + windowSize) % windowSize;
> } {code}
> If windowSize is 6 seconds, an element with a timestamp of -7000L should be 
> assigned to a window with a start time of -12000L. But this code will assign 
> it to the window whose start time is -6000L.
> According to the current calculation method, when the timestamp is (timestamp 
> - offset + windowSize)  is less than 0, the start time of the calculated time 
> window will be offset by one windowsSide unit in the direction of 0.
> I had a discussion with a friend and thought it was because the current 
> calculation logic is rounding towards 0. We should make it round to -∞.
> Do you think this is a bug. We would like to submit a pull request on github 
> to fix it.
> Below is a sample program for a scrolling window.
> {code:java}
> public class Test01 {
> public static void main(String[] args) throws Exception {
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> env.setParallelism(1);
> env
> .fromElements(
> Tuple2.of("a",-7L*1000L),// start time should be 
> -12s
> Tuple2.of("b",-1L*1000L),
> Tuple2.of("c",1L*1000L),
> Tuple2.of("d",7L*1000L)
> )
> .assignTimestampsAndWatermarks(
> 
> WatermarkStrategy.>forMonotonousTimestamps()
> .withTimestampAssigner(
> new 
> SerializableTimestampAssigner>() {
> @Override
> public long 
> extractTimestamp(Tuple2 element, long recordTimestamp) {
> return element.f1;
> }
> }
> )
> )
> .keyBy(r->1)
> .window(TumblingEventTimeWindows.of(Time.seconds(6)))
> .process(
> new ProcessWindowFunction, 
> String, Integer, TimeWindow>() {
> @Override
> public void process(Integer integer, 
> ProcessWindowFunction, String, Integer, 
> TimeWindow>.Context context, Iterable> elements, 
> Collector out) throws Exception {
> for (Tuple2 element : elements) 
> {
> JSONObject item = new JSONObject();
> item.put("data",element.toString());
> item.put("windowStartTime",new 
> Timestamp(context.window().getStart()).toString() );
> item.put("windowEndTime",new 
> Timestamp(context.window().getEnd()).toString() );
> out.collect(item.toJSONString());
> }
> }
> }
> )
> .print();
> env.execute();
> }
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[GitHub] [flink] afedulov commented on a change in pull request #18911: [FLINK-26357][format] add FLINK API annotations

2022-02-24 Thread GitBox



afedulov commented on a change in pull request #18911:
URL: https://github.com/apache/flink/pull/18911#discussion_r814275794



##
File path: 
flink-formats/flink-parquet/src/main/java/org/apache/flink/formats/parquet/avro/AvroParquetRecordFormat.java
##
@@ -42,7 +43,17 @@
 
 import static org.apache.flink.util.Preconditions.checkArgument;
 
-/** A reader format that reads individual Avro records from a Parquet stream. 
*/
+/**
+ * A reader format that reads individual Avro records from a Parquet stream. 
This class leverages
+ * {@link ParquetReader} underneath. Developer should make sure the parquet 
files can be worked with
+ * provided avro schema and take care of any further compatibility issue.
+ *
+ * It is recommended to use the factory class {@link AvroParquetReaders} 
which is capable to

Review comment:
   Suggestion:
   
   ```
   For instantiation, it is recommended to use the factory class  {@link 
AvroParquetReaders}. It is capable of creating 
   versions of {@link AvroParquetRecordFormat} that can work with {@link 
GenericRecord GenericRecords}, {@link 
   org.apache.avro.specific.SpecificRecord SpecificRecords}, or {@link 
org.apache.avro.reflect.ReflectData reflect 
   records}.
   ```
   But I would probably remove the second sentence altogether because this is 
what the user will see in the API of the AvroParquetReaders class.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

1 2 3 4 5 6 >

1 - 100 of 561 matches

Mail list logo