[jira] [Created] (FLINK-17994) Fix the race condition between CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived

2020-05-27 Thread Zhijiang (Jira)
Zhijiang created FLINK-17994:


 Summary: Fix the race condition between 
CheckpointBarrierUnaligner#processBarrier and #notifyBarrierReceived
 Key: FLINK-17994
 URL: https://issues.apache.org/jira/browse/FLINK-17994
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Reporter: Zhijiang
Assignee: Zhijiang
 Fix For: 1.11.0


The race condition happens as follows:
 * Barrier ch1 is received from the network by the netty thread, which schedules the checkpoint for ch1 into the mailbox via #notifyBarrierReceived.
 * Barrier ch2 is received from the network by the netty thread, but it is inserted into the channel's data queue before #notifyBarrierReceived is called. As a result, the task thread can process ch2 before the netty thread executes #notifyBarrierReceived for it.
 * The task thread executes the checkpoint for ch2 directly because ch2 > ch1.
 * Afterwards, the previously scheduled ch1 is taken from the mailbox by the task thread, which causes an IllegalArgumentException inside SubtaskCheckpointCoordinatorImpl#checkpointState because it breaks the assumption that checkpoints are executed with increasing IDs.

One possible solution for this race condition is to insert the received barrier into the channel's data queue only after calling #notifyBarrierReceived. Then we can rely on the assumption that the checkpoint is always triggered by the netty thread, which simplifies the current situation where a checkpoint might be triggered either by the task thread or by the netty thread.

This also avoids the task thread calling #notifyBarrierReceived while processing the barrier, which simplifies the logic inside CheckpointBarrierUnaligner.
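The proposed ordering can be illustrated with a minimal, single-threaded sketch (hypothetical class and method names; the real logic lives in CheckpointBarrierUnaligner and the input channel implementations). Enqueuing the barrier only after #notifyBarrierReceived returns means the task thread can never observe a barrier whose checkpoint has not already been triggered by the netty thread:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of the proposed fix (hypothetical names): the netty thread
// must call notifyBarrierReceived BEFORE the barrier becomes visible in the
// channel's data queue, so checkpoints are always triggered in barrier order.
public class BarrierOrderingSketch {
    static final Queue<Integer> channelDataQueue = new ArrayDeque<>();
    static final Queue<Integer> triggeredCheckpoints = new ArrayDeque<>();
    static long latestTriggeredCheckpoint = -1;

    // Stand-in for CheckpointBarrierUnaligner#notifyBarrierReceived.
    static void notifyBarrierReceived(int checkpointId) {
        if (checkpointId > latestTriggeredCheckpoint) {
            latestTriggeredCheckpoint = checkpointId;
            triggeredCheckpoints.add(checkpointId);
        }
    }

    // Netty thread path after the fix: notify first, then enqueue.
    static void onBarrier(int checkpointId) {
        notifyBarrierReceived(checkpointId);
        channelDataQueue.add(checkpointId);
    }

    public static void main(String[] args) {
        onBarrier(1); // ch1
        onBarrier(2); // ch2
        // Checkpoints are triggered strictly in order 1, 2: the task thread
        // cannot see ch2 in the queue before ch1 has been notified.
        System.out.println(triggeredCheckpoints); // [1, 2]
    }
}
```

Under this invariant, the ch2-before-ch1 interleaving described above becomes impossible by construction.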



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17993) read schema from orc file

2020-05-27 Thread dingli123 (Jira)
dingli123 created FLINK-17993:
-

 Summary: read schema from orc file
 Key: FLINK-17993
 URL: https://issues.apache.org/jira/browse/FLINK-17993
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / ORC
Reporter: dingli123


Currently, if we use OrcTableSource, we must provide the schema info explicitly.

In fact, the ORC file already stores the schema info in the file itself.

(hive --orcfiledump can show the schema info.)

So OrcTableSource could read the schema info directly from the ORC file.





[jira] [Created] (FLINK-17992) Exception from RemoteInputChannel#onBuffer should not fail the whole NetworkClientHandler

2020-05-27 Thread Zhijiang (Jira)
Zhijiang created FLINK-17992:


 Summary: Exception from RemoteInputChannel#onBuffer should not 
fail the whole NetworkClientHandler
 Key: FLINK-17992
 URL: https://issues.apache.org/jira/browse/FLINK-17992
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Network
Affects Versions: 1.10.1, 1.10.0
Reporter: Zhijiang
Assignee: Zhijiang
 Fix For: 1.11.0


RemoteInputChannel#onBuffer is invoked by 
CreditBasedPartitionRequestClientHandler while receiving and decoding network 
data. #onBuffer can throw exceptions, which tag the error in the client handler 
and fail all input channels registered with that handler. This can cause a 
tricky issue, as follows.

If the RemoteInputChannel is being canceled by the canceler thread, the task 
thread might exit before the canceler thread terminates. That means the 
PartitionRequestClient might not yet be closed (this is triggered by the 
canceler thread) when a new task attempt is already deployed onto this 
TaskManager. The new task might then reuse the previous PartitionRequestClient 
while requesting partitions, but the respective client handler was already 
tagged with an error during the earlier RemoteInputChannel#onBuffer call. This 
causes an unnecessary failover in the next round.

This issue is hard to spot in production because the job eventually recovers 
after one or more additional failovers. We found it via 
UnalignedCheckpointITCase, which asserts a precise number of restarts for the 
configured failures.

The solution is to fail only the respective task when its 
RemoteInputChannel#onBuffer throws an exception, instead of failing all 
channels inside the client handler. The client then stays healthy and can be 
reused by other input channels as long as it has not been released yet.
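A minimal sketch of that error scoping, with hypothetical names (the real change would live in CreditBasedPartitionRequestClientHandler and the channel classes): an exception from one channel's onBuffer is caught and attributed to that channel alone instead of poisoning the shared handler.

```java
// Sketch of the proposed error scoping (hypothetical names): an exception
// from one channel's onBuffer fails only that channel, leaving the shared
// client handler healthy for the remaining channels.
public class PerChannelFailureSketch {
    static class InputChannel {
        final String name;
        final boolean broken;
        boolean failed;
        InputChannel(String name, boolean broken) { this.name = name; this.broken = broken; }
        void onBuffer() {
            if (broken) throw new IllegalStateException("decode error on " + name);
        }
    }

    static final InputChannel healthy = new InputChannel("ch-healthy", false);
    static final InputChannel broken = new InputChannel("ch-broken", true);
    static boolean handlerFailed = false; // the old behavior set this on any error

    // New behavior: catch per channel, mark only that channel as failed.
    static void dispatchBuffer(InputChannel channel) {
        try {
            channel.onBuffer();
        } catch (Exception e) {
            channel.failed = true; // fail only the owning task's channel
        }
    }

    public static void main(String[] args) {
        dispatchBuffer(broken);
        dispatchBuffer(healthy);
        // The handler stays usable and the healthy channel is unaffected.
        System.out.println(handlerFailed + " " + broken.failed + " " + healthy.failed);
        // prints: false true false
    }
}
```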





[jira] [Created] (FLINK-17991) Add JdbcAsyncLookupFunction

2020-05-27 Thread Jira
底限 created FLINK-17991:
--

 Summary: Add JdbcAsyncLookupFunction
 Key: FLINK-17991
 URL: https://issues.apache.org/jira/browse/FLINK-17991
 Project: Flink
  Issue Type: New Feature
  Components: Connectors / JDBC
Affects Versions: 1.12.0
Reporter: 底限


Supporting a JdbcAsyncLookupFunction would be very useful: users could use it 
to asynchronously look up dimension fields in their streaming jobs.
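As an illustration of the idea (hypothetical names and a stdlib-only stand-in; a real JdbcAsyncLookupFunction would extend Flink's AsyncTableFunction and issue queries through JDBC), an asynchronous lookup returns a future so the caller is not blocked on the database round-trip:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Stdlib-only sketch of an async dimension-table lookup (hypothetical names).
// A real connector would query the database instead of an in-memory map.
public class AsyncLookupSketch {
    static final Map<Integer, String> dimensionTable = Map.of(1, "red", 2, "blue");

    // Each lookup runs asynchronously (here on the common pool; a real
    // implementation would use a dedicated executor or non-blocking client),
    // so the stream task is never blocked waiting on the round-trip.
    static CompletableFuture<String> asyncLookup(int key) {
        return CompletableFuture.supplyAsync(
            () -> dimensionTable.getOrDefault(key, "unknown"));
    }

    public static void main(String[] args) {
        CompletableFuture<String> f1 = asyncLookup(1);
        CompletableFuture<String> f2 = asyncLookup(99);
        System.out.println(f1.join() + " " + f2.join()); // red unknown
    }
}
```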





[jira] [Created] (FLINK-17990) ArrowSourceFunctionTestBase.testParallelProcessing is unstable

2020-05-27 Thread Dian Fu (Jira)
Dian Fu created FLINK-17990:
---

 Summary: ArrowSourceFunctionTestBase.testParallelProcessing is unstable
 Key: FLINK-17990
 URL: https://issues.apache.org/jira/browse/FLINK-17990
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.11.0
Reporter: Dian Fu
Assignee: Dian Fu


It failed in the cron JDK 11 tests with the following error:
{code}
org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTest
[ERROR] testParallelProcessing(org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTest)  Time elapsed: 0.433 s  <<< FAILURE!
java.lang.AssertionError: expected:<4> but was:<5>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:834)
	at org.junit.Assert.assertEquals(Assert.java:645)
	at org.junit.Assert.assertEquals(Assert.java:631)
	at org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTestBase.checkElementsEquals(ArrowSourceFunctionTestBase.java:243)
	at org.apache.flink.table.runtime.arrow.sources.ArrowSourceFunctionTestBase.testParallelProcessing(ArrowSourceFunctionTestBase.java:233)
{code}



instance: 
[https://dev.azure.com/apache-flink/98463496-1af2-4620-8eab-a2ecc1a2e6fe/_apis/build/builds/2304/logs/535]





Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

2020-05-27 Thread Guowei Ma
Hi,
I think the StreamingFileSink does not currently support Azure.
You can find more detailed info here [1].

[1] https://issues.apache.org/jira/browse/FLINK-17444
Best,
Guowei


On Thu, May 28, 2020 at 6:04 AM Israel Ekpo  wrote:

> You can assign the task to me, and I would like to collaborate with someone
> to fix it.
>
> On Wed, May 27, 2020 at 5:52 PM Israel Ekpo  wrote:
>
>> Some users are running into issues when using Azure Blob Storage for the
>> StreamingFileSink.
>>
>> https://issues.apache.org/jira/browse/FLINK-17989
>>
>> The issue is that certain packages are relocated in the POM file and
>> some classes are dropped from the final shaded jar.
>>
>> I have attempted to comment out the relocations and recompile the source,
>> but I keep hitting roadblocks from other relocations and filtering each time
>> I update a specific pom file.
>>
>> How can this be addressed so that these users can be unblocked? Why are
>> the classes filtered out? What is the workaround? I can work on the patch
>> if I have some guidance.
>>
>> This is an issue in Flink 1.9 and 1.10, and I believe 1.11 has the same
>> issue, but I have yet to confirm.
>>
>> Thanks.
>>
>>
>>
>


Re: [DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

2020-05-27 Thread Israel Ekpo
You can assign the task to me, and I would like to collaborate with someone
to fix it.

On Wed, May 27, 2020 at 5:52 PM Israel Ekpo  wrote:

> Some users are running into issues when using Azure Blob Storage for the
> StreamingFileSink.
>
> https://issues.apache.org/jira/browse/FLINK-17989
>
> The issue is that certain packages are relocated in the POM file and
> some classes are dropped from the final shaded jar.
>
> I have attempted to comment out the relocations and recompile the source,
> but I keep hitting roadblocks from other relocations and filtering each
> time I update a specific pom file.
>
> How can this be addressed so that these users can be unblocked? Why are
> the classes filtered out? What is the workaround? I can work on the patch
> if I have some guidance.
>
> This is an issue in Flink 1.9 and 1.10, and I believe 1.11 has the same
> issue, but I have yet to confirm.
>
> Thanks.
>
>
>


[DISCUSS] FLINK-17989 - java.lang.NoClassDefFoundError org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter

2020-05-27 Thread Israel Ekpo
Some users are running into issues when using Azure Blob Storage for the
StreamingFileSink.

https://issues.apache.org/jira/browse/FLINK-17989

The issue is that certain packages are relocated in the POM file and
some classes are dropped from the final shaded jar.

I have attempted to comment out the relocations and recompile the source,
but I keep hitting roadblocks from other relocations and filtering each
time I update a specific pom file.

How can this be addressed so that these users can be unblocked? Why are the
classes filtered out? What is the workaround? I can work on the patch if I
have some guidance.

This is an issue in Flink 1.9 and 1.10, and I believe 1.11 has the same
issue, but I have yet to confirm.

Thanks.


[jira] [Created] (FLINK-17989) java.lang.NoClassDefFoundError: org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter

2020-05-27 Thread Israel Ekpo (Jira)
Israel Ekpo created FLINK-17989:
---

 Summary: java.lang.NoClassDefFoundError: 
org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter
 Key: FLINK-17989
 URL: https://issues.apache.org/jira/browse/FLINK-17989
 Project: Flink
  Issue Type: Bug
  Components: Connectors / FileSystem, FileSystems
Affects Versions: 1.10.1, 1.9.3
 Environment: Ubuntu 18

Java 1.8

Flink 1.9.x and 1.10.x
Reporter: Israel Ekpo
 Fix For: 1.10.1, 1.9.3


In the POM.xml, classes from certain packages are relocated and filtered out of 
the final jar.

This is causing errors for customers and users that are using the 
StreamingFileSink with Azure Blob Storage in Flink versions 1.9.x, 1.10.x, and 
possibly 1.11.x.

https://github.com/apache/flink/blob/release-1.9/flink-filesystems/flink-azure-fs-hadoop/pom.xml#L170
https://github.com/apache/flink/blob/release-1.9/flink-filesystems/flink-fs-hadoop-shaded/pom.xml#L233

I would like to know why the relocation is happening and the reasoning behind 
the exclusion and filtering of the classes.
It seems to affect just the Azure file systems in my sample implementations.
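For readers unfamiliar with shading, the kind of stanza the links above point to looks roughly like the following (an illustrative maven-shade-plugin fragment with hypothetical patterns, not the exact content of those pom files): a relocation rewrites package names inside the shaded jar, and a filter can drop classes from it entirely, which is how a class that resolves at compile time can be missing at runtime.

```xml
<!-- Illustrative maven-shade-plugin fragment (hypothetical patterns, not the
     exact Flink pom content). Relocations rewrite package names in the shaded
     jar; filters exclude classes from it, which can later surface as
     java.lang.NoClassDefFoundError. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <pattern>org.apache.hadoop</pattern>
        <shadedPattern>org.apache.flink.fs.shaded.org.apache.hadoop</shadedPattern>
      </relocation>
    </relocations>
    <filters>
      <filter>
        <artifact>*:*</artifact>
        <excludes>
          <exclude>org/example/excluded/**</exclude>
        </excludes>
      </filter>
    </filters>
  </configuration>
</plugin>
```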

 

 
{code:java}
String outputPath = 
"wasbs://contai...@account.blob.core.windows.net/streaming-output/";

final StreamingFileSink<String> sink = StreamingFileSink
        .forRowFormat(new Path(outputPath), new SimpleStringEncoder<String>("UTF-8"))
        .build();
stream.addSink(sink);

// execute program
env.execute(StreamingJob.class.getCanonicalName());
{code}

{code}
2020-05-27 17:23:16
java.lang.NoClassDefFoundError: org/apache/flink/fs/azure/common/hadoop/HadoopRecoverableWriter
	at org.apache.flink.fs.azure.common.hadoop.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
	at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
	at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
	at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$RowFormatBuilder.createBuckets(StreamingFileSink.java:242)
	at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:327)
	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
	at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
	at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.flink.fs.azure.common.hadoop.HadoopRecoverableWriter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at org.apache.flink.util.ChildFirstClassLoader.loadClass(ChildFirstClassLoader.java:60)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
{code}




[jira] [Created] (FLINK-17988) Checkpointing slows down after reaching state.checkpoints.num-retained

2020-05-27 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17988:
-

 Summary: Checkpointing slows down after reaching 
state.checkpoints.num-retained
 Key: FLINK-17988
 URL: https://issues.apache.org/jira/browse/FLINK-17988
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.11.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan








[jira] [Created] (FLINK-17987) KafkaITCase.testStartFromGroupOffsets fails with SchemaException: Error reading field

2020-05-27 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17987:
--

 Summary: KafkaITCase.testStartFromGroupOffsets fails with 
SchemaException: Error reading field
 Key: FLINK-17987
 URL: https://issues.apache.org/jira/browse/FLINK-17987
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka, Tests
Affects Versions: 1.12.0
Reporter: Robert Metzger


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=2276&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=684b1416-4c17-504e-d5ab-97ee44e08a20

{code}
2020-05-27T13:05:24.5355101Z Test 
testStartFromGroupOffsets(org.apache.flink.streaming.connectors.kafka.KafkaITCase)
 failed with:
2020-05-27T13:05:24.5355935Z 
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 
'api_versions': Error reading array of size 131084, only 12 bytes available
2020-05-27T13:05:24.5356501Zat 
org.apache.kafka.common.protocol.types.Schema.read(Schema.java:77)
2020-05-27T13:05:24.5356911Zat 
org.apache.kafka.common.protocol.ApiKeys.parseResponse(ApiKeys.java:308)
2020-05-27T13:05:24.5357350Zat 
org.apache.kafka.common.protocol.ApiKeys$1.parseResponse(ApiKeys.java:152)
2020-05-27T13:05:24.5357838Zat 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:687)
2020-05-27T13:05:24.5358333Zat 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:811)
2020-05-27T13:05:24.5358840Zat 
org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:544)
2020-05-27T13:05:24.5359297Zat 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:265)
2020-05-27T13:05:24.5359832Zat 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
2020-05-27T13:05:24.5360659Zat 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:227)
2020-05-27T13:05:24.5361292Zat 
org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.awaitMetadataUpdate(ConsumerNetworkClient.java:161)
2020-05-27T13:05:24.5361885Zat 
org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureCoordinatorReady(AbstractCoordinator.java:245)
2020-05-27T13:05:24.5362454Zat 
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:657)
2020-05-27T13:05:24.5363089Zat 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1425)
2020-05-27T13:05:24.5363558Zat 
org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1384)
2020-05-27T13:05:24.5364130Zat 
org.apache.flink.streaming.connectors.kafka.KafkaTestEnvironmentImpl$KafkaOffsetHandlerImpl.setCommittedOffset(KafkaTestEnvironmentImpl.java:444)
2020-05-27T13:05:24.5365027Zat 
org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase.runStartFromGroupOffsets(KafkaConsumerTestBase.java:554)
2020-05-27T13:05:24.5365596Zat 
org.apache.flink.streaming.connectors.kafka.KafkaITCase.testStartFromGroupOffsets(KafkaITCase.java:158)
2020-05-27T13:05:24.5366035Zat 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2020-05-27T13:05:24.5366425Zat 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
2020-05-27T13:05:24.5366871Zat 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
2020-05-27T13:05:24.5367285Zat 
java.lang.reflect.Method.invoke(Method.java:498)
2020-05-27T13:05:24.5367675Zat 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
2020-05-27T13:05:24.5368142Zat 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
2020-05-27T13:05:24.5368655Zat 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
2020-05-27T13:05:24.5369103Zat 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
2020-05-27T13:05:24.5369590Zat 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
2020-05-27T13:05:24.5370094Zat 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
2020-05-27T13:05:24.5370543Zat 
java.util.concurrent.FutureTask.run(FutureTask.java:266)
2020-05-27T13:05:24.5370947Zat java.lang.Thread.run(Thread.java:748)
{code}





Re: Season of Docs - Ex-participant Intro

2020-05-27 Thread Kartik Khare
Hi,
I accidentally hit the Reply button instead of Reply All.
Are you able to see my previous mail now?

On Wed, May 27, 2020 at 11:09 PM Marta Paes Moreira 
wrote:

> Hi, Kartik!
>
> Sorry that you didn't get our replies to your self-presentation. You can
> find them below!
>
> Let me know if you have any questions.
>
> Marta
>
> -- Forwarded message -
> From: Marta Paes Moreira 
> Date: Thu, May 14, 2020 at 8:37 AM
> Subject: Re: Season of Docs - Ex-participant Intro
> To: dev 
> Cc: 
>
>
> Hey, Kartik!
>
> Thanks for reaching out and sharing your experience on GSoD last year.
>
> Let us know if you have any questions about the project that we can help
> with before the applications phase kicks off!
>
> Marta
>
> On Wed, May 13, 2020 at 11:30 PM Seth Wiesman  wrote:
>
>> Hey Kartik,
>>
>> Nice to connect, I am one of the GSoD mentors, along with Aljoscha in cc.
>> Happy to see you're interested in working with the community.
>>
>> Seth
>>
>> On Wed, May 13, 2020 at 4:10 PM Kartik Khare 
>> wrote:
>>
>> > Hi,
>> > This is Kartik. I am genuinely interested in working with your
>> organisation
>> > for Google season of Docs. I work as data engineer @ walmart labs but
>> also
>> > write a lot of blogs on distributed systems on
>> > https://medium.com/@kharekartik.
>> >
>> > I contributed one on flink's website as well:
>> >
>> >
>> https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html
>> >
>> > I participated as technical writer last year as well for Apache Airflow.
>> > You can check my project and journey here:
>> >
>> >
>> https://airflow.apache.org/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/
>> >
>> > Let's keep in touch.
>> >
>> >
>> > Regards,
>> > Kartik Khare
>> >
>>
>


Fwd: Season of Docs - Ex-participant Intro

2020-05-27 Thread Marta Paes Moreira
Hi, Kartik!

Sorry that you didn't get our replies to your self-presentation. You can
find them below!

Let me know if you have any questions.

Marta

-- Forwarded message -
From: Marta Paes Moreira 
Date: Thu, May 14, 2020 at 8:37 AM
Subject: Re: Season of Docs - Ex-participant Intro
To: dev 
Cc: 


Hey, Kartik!

Thanks for reaching out and sharing your experience on GSoD last year.

Let us know if you have any questions about the project that we can help
with before the applications phase kicks off!

Marta

On Wed, May 13, 2020 at 11:30 PM Seth Wiesman  wrote:

> Hey Kartik,
>
> Nice to connect, I am one of the GSoD mentors, along with Aljoscha in cc.
> Happy to see you're interested in working with the community.
>
> Seth
>
> On Wed, May 13, 2020 at 4:10 PM Kartik Khare 
> wrote:
>
> > Hi,
> > This is Kartik. I am genuinely interested in working with your
> organisation
> > for Google season of Docs. I work as data engineer @ walmart labs but
> also
> > write a lot of blogs on distributed systems on
> > https://medium.com/@kharekartik.
> >
> > I contributed one on flink's website as well:
> >
> >
> https://flink.apache.org/news/2020/02/07/a-guide-for-unit-testing-in-apache-flink.html
> >
> > I participated as technical writer last year as well for Apache Airflow.
> > You can check my project and journey here:
> >
> >
> https://airflow.apache.org/blog/experience-in-google-season-of-docs-2019-with-apache-airflow/
> >
> > Let's keep in touch.
> >
> >
> > Regards,
> > Kartik Khare
> >
>


[jira] [Created] (FLINK-17986) Erroneous check in FsCheckpointStateOutputStream#write(int)

2020-05-27 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17986:
-

 Summary: Erroneous check in 
FsCheckpointStateOutputStream#write(int)
 Key: FLINK-17986
 URL: https://issues.apache.org/jira/browse/FLINK-17986
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing, Runtime / Task
Affects Versions: 1.11.0, 1.12.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.11.0, 1.12.0








Re: [DISCUSS] Releasing Stateful Functions 2.1.0 soon?

2020-05-27 Thread Tzu-Li (Gordon) Tai
Hi all,

Quick update on progress:
There is still ongoing work on adding state TTL for remote functions
(FLINK-17875), but the PR is expected to be mergeable over the next day or
two.

Therefore the feature branch cut will be slightly delayed. I'll announce
the cut in a separate email once it happens.

Gordon


On Tue, May 26, 2020, 1:32 AM Stephan Ewen  wrote:

> Nice work, thanks for pushing this, Gordon!
>
> +1 also from my side for a quick release.
>
> I think it already warrants a release to have the 1.10.1 upgrade and the
> fix to not fail on savepoints that are triggered concurrently to a
> checkpoint.
> Even nicer that there are two cool new features included.
>
> On Fri, May 22, 2020 at 7:13 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Thanks for the positive feedback so far.
> >
> > Lets then set the feature freeze date for Stateful Functions 2.1.0 to be
> > next Wednesday (May 27th).
> >
> > We've made good progress over the past days, all mentioned features
> merged
> > besides the following:
> > - https://issues.apache.org/jira/browse/FLINK-17875 State TTL support
> for
> > remote functions
> >
> > Will keep track of that and hopefully cut the feature branch as
> scheduled.
> >
> > Cheers,
> > Gordon
> >
> > On Thu, May 21, 2020 at 7:22 PM Yuan Mei  wrote:
> >
> > > faster iteration definitely helps early-stage projects.
> > >
> > > +1
> > >
> > > Best,
> > > Yuan
> > >
> > >
> > > On Thu, May 21, 2020 at 4:14 PM Congxian Qiu 
> > > wrote:
> > >
> > > > +1 from my side to have smaller and more frequent feature releases
> for
> > > the
> > > > project in its early phases.
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > >
> > > > > On Thu, May 21, 2020 at 12:49 PM Marta Paes Moreira  wrote:
> > > >
> > > > > +1 for more frequent releases with a shorter (but
> feedback-informed)
> > > > > feature set.
> > > > >
> > > > > Thanks, Gordon (and Igal)!
> > > > >
> > > > > Marta
> > > > >
> > > > > On Thu, 21 May 2020 at 03:44, Yu Li  wrote:
> > > > >
> > > > > > +1, it makes a lot of sense for stateful functions to evolve
> > faster.
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Wed, 20 May 2020 at 23:36, Zhijiang <
> wangzhijiang...@aliyun.com
> > > > > > .invalid>
> > > > > > wrote:
> > > > > >
> > > > > > > I also like this idea, considering stateful functions flexible
> > > enough
> > > > > to
> > > > > > > have a faster release cycle. +1 from my side.
> > > > > > >
> > > > > > > Best,
> > > > > > > Zhijiang
> > > > > > >
> > > > > > >
> > > > > > >
> > --
> > > > > > > From:Seth Wiesman 
> > > > > > > Send Time: Wednesday, May 20, 2020, 21:45
> > > > > > > To:dev 
> > > > > > > Subject:Re: [DISCUSS] Releasing Stateful Functions 2.1.0 soon?
> > > > > > >
> > > > > > > +1 for a fast release cycle
> > > > > > >
> > > > > > > Seth
> > > > > > >
> > > > > > > On Wed, May 20, 2020 at 8:43 AM Robert Metzger <
> > > rmetz...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I like the idea of releasing Statefun more frequently to have
> > > > faster
> > > > > > > > feedback cycles!
> > > > > > > >
> > > > > > > > No objections for releasing 2.1.0 from my side.
> > > > > > > >
> > > > > > > > On Wed, May 20, 2020 at 2:22 PM Tzu-Li (Gordon) Tai <
> > > > > > tzuli...@apache.org
> > > > > > > >
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi devs,
> > > > > > > > >
> > > > > > > > > Since Stateful Functions 2.0 was released early April,
> > > > > > > > > we've been getting some good feedback from various
> channels,
> > > > > > > > > including the Flink mailing lists, JIRA issues, as well as
> > > Stack
> > > > > > > Overflow
> > > > > > > > > questions.
> > > > > > > > >
> > > > > > > > > Some of the discussions have actually translated into new
> > > > features
> > > > > > > > > currently being implemented into the project, such as:
> > > > > > > > >
> > > > > > > > >- State TTL for the state primitives in Stateful
> Functions
> > > > (for
> > > > > > both
> > > > > > > > >embedded/remote functions)
> > > > > > > > >- Transport for remote functions via UNIX domain
> sockets,
> > > > which
> > > > > > > would
> > > > > > > > be
> > > > > > > > >useful when remote functions are co-located with Flink
> > > > StateFun
> > > > > > > > workers
> > > > > > > > >(i.e. the "sidecar" deployment mode)
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Besides that, some critical shortcomings have already been
> > > > > addressed
> > > > > > > > since
> > > > > > > > > the last release:
> > > > > > > > >
> > > > > > > > >- After upgrading to Flink 1.10.1, failure recovery in
> > > > Stateful
> > > > > > > > >Functions now works properly with the new scheduler.
> > > > > > > > >- Support for concurrent checkpoints
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > With these ongoing threads, while it's only been just short
> > of
> > > 2
> > > > > > months
> > > > > > > > > 

[jira] [Created] (FLINK-17985) Consolidate Connector Documentation

2020-05-27 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-17985:


 Summary: Consolidate Connector Documentation
 Key: FLINK-17985
 URL: https://issues.apache.org/jira/browse/FLINK-17985
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Seth Wiesman


The connector documentation is disorganized and needs to be restructured. We 
should organize it by external system vs. Flink API and make formats a 
first-class section.





[jira] [Created] (FLINK-17984) Update Flink sidebar nav

2020-05-27 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-17984:


 Summary: Update Flink sidebar nav
 Key: FLINK-17984
 URL: https://issues.apache.org/jira/browse/FLINK-17984
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Seth Wiesman


Flink's docs contain two distinct sections

1) Getting started material
2) Reference documentation

This should be better differentiated in the side nav. 





[jira] [Created] (FLINK-17983) Add new Flink docs homepage (style)

2020-05-27 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-17983:


 Summary: Add new Flink docs homepage (style)
 Key: FLINK-17983
 URL: https://issues.apache.org/jira/browse/FLINK-17983
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Seth Wiesman


The Flink documentation homepage is one of the first things users see when 
working with Flink. We should invest in making it more visually appealing. 





[jira] [Created] (FLINK-17982) Finish Flink concepts docs

2020-05-27 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-17982:


 Summary: Finish Flink concepts docs
 Key: FLINK-17982
 URL: https://issues.apache.org/jira/browse/FLINK-17982
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Seth Wiesman


The concepts section of the documentation contains several TODOs. These should 
be replaced with content.





[jira] [Created] (FLINK-17981) Add new Flink docs homepage content

2020-05-27 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-17981:


 Summary: Add new Flink docs homepage content
 Key: FLINK-17981
 URL: https://issues.apache.org/jira/browse/FLINK-17981
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Seth Wiesman


The Flink docs homepage requires a serious redesign to better guide users 
through the different sections of the documentation.

This ticket is focused solely on updating the text.






[jira] [Created] (FLINK-17980) Simplify Flink Getting Started Material

2020-05-27 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-17980:


 Summary: Simplify Flink Getting Started Material
 Key: FLINK-17980
 URL: https://issues.apache.org/jira/browse/FLINK-17980
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Seth Wiesman


With the addition of the hands-on training, the Flink documentation contains 
too much overlapping material and is confusing for new users. We propose a 
Try / Learn reorganization.

The updated structure should: 

* Remove the "Getting Started" section and replace it with "Try Flink" 
containing the walkthroughs and playground

* Move project setup content to DataStream

* Rename Hands-on Flink to Learn Flink





[jira] [Created] (FLINK-17979) Support rescaling for Unaligned Checkpoints

2020-05-27 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17979:
-

 Summary: Support rescaling for Unaligned Checkpoints
 Key: FLINK-17979
 URL: https://issues.apache.org/jira/browse/FLINK-17979
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Reporter: Roman Khachatryan
 Fix For: 1.12.0


This is one of the limitations of Unaligned Checkpoints MVP.

(see [Unaligned checkpoints: recovery & 
rescaling|https://docs.google.com/document/d/1T2WB163uf8xt6Eu2JS0Jyy2XZyF4YpnzGiHlo6twrks/edit?usp=sharing]
 for possible options)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17978) Test Hadoop dependency change

2020-05-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17978:
-

 Summary: Test Hadoop dependency change
 Key: FLINK-17978
 URL: https://issues.apache.org/jira/browse/FLINK-17978
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Test the Hadoop dependency change:

* Run Flink with HBase/ORC (maybe add e2e test)
* Validate meaningful exception message if Hadoop dependency is missing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17977) Check log sanity

2020-05-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17977:
-

 Summary: Check log sanity
 Key: FLINK-17977
 URL: https://issues.apache.org/jira/browse/FLINK-17977
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Run a normal Flink workload (e.g. a job with a fixed number of failures on a 
session cluster) and check that the produced Flink logs make sense and don't 
contain confusing statements.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17976) Test native K8s integration

2020-05-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17976:
-

 Summary: Test native K8s integration
 Key: FLINK-17976
 URL: https://issues.apache.org/jira/browse/FLINK-17976
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Test Flink's native K8s integration:

* session mode
* application mode
* custom Flink image
* custom configuration and log properties



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17975) Test ZooKeeper 3.5 support

2020-05-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17975:
-

 Summary: Test ZooKeeper 3.5 support
 Key: FLINK-17975
 URL: https://issues.apache.org/jira/browse/FLINK-17975
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Setup a Flink cluster with ZooKeeper 3.5 and run some HA tests on it (killing 
processes, failing jobs).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17974) Test new Flink Docker image

2020-05-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17974:
-

 Summary: Test new Flink Docker image
 Key: FLINK-17974
 URL: https://issues.apache.org/jira/browse/FLINK-17974
 Project: Flink
  Issue Type: Task
  Components: Deployment / Docker
Affects Versions: 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Test Flink's new Docker image and the corresponding Dockerfile:

* Try to build custom image
* Try to run different Flink processes (Master (session, per-job), TaskManager)
* Try custom configuration and log properties



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17973) Test memory configuration of Flink cluster

2020-05-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17973:
-

 Summary: Test memory configuration of Flink cluster
 Key: FLINK-17973
 URL: https://issues.apache.org/jira/browse/FLINK-17973
 Project: Flink
  Issue Type: Task
  Components: Runtime / Coordination, Tests
Affects Versions: 1.11.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Make sure that Flink processes (in particular Master processes) fail with a 
meaningful exception message if they exceed the configured memory budgets.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17972) Consider restructuring channel state

2020-05-27 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-17972:
-

 Summary: Consider restructuring channel state
 Key: FLINK-17972
 URL: https://issues.apache.org/jira/browse/FLINK-17972
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Checkpointing
Affects Versions: 1.11.0
Reporter: Roman Khachatryan
Assignee: Roman Khachatryan
 Fix For: 1.12.0


The current structure is the following (this PR doesn't change it):

{code:java}
Each subtask reports to JM a TaskStateSnapshot
  each with zero or more OperatorSubtaskState,
    each with zero or more InputChannelStateHandle and ResultSubpartitionStateHandle
      each referencing an underlying StreamStateHandle
{code}
The underlying {{StreamStateHandle}} duplicates the filename 
({{ByteStreamStateHandle}} has it too, at least because of {{equals/hashCode}}, 
I guess).

An alternative would be something like

{code:java}
Each subtask reports to JM a TaskStateSnapshot
  each with zero or more OperatorSubtaskState
    each with zero or one StreamStateHandle (for channel state)
    each with zero or more InputChannelStateHandle and ResultSubpartitionStateHandle
{code}
(probably, with {{StreamStateHandle}} and {{InputChannelStateHandle}} and 
{{ResultSubpartitionStateHandle}} encapsulated)

It would be more efficient (less data duplication) but probably also more 
error-prone (implicit structure) and less flexible (re-scaling).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17971) Speed up RocksDB bulk loading with SST generation and ingestion

2020-05-27 Thread Joey Pereira (Jira)
Joey Pereira created FLINK-17971:


 Summary: Speed up RocksDB bulk loading with SST generation and 
ingestion
 Key: FLINK-17971
 URL: https://issues.apache.org/jira/browse/FLINK-17971
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / State Backends
Reporter: Joey Pereira


RocksDB provides an API for creating SST files and ingesting them directly into 
RocksDB: 
[https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files]

Using this method for bulk loading data into RocksDB may provide a significant 
performance increase, specifically for paths doing inserts such as full 
savepoint recovery and state migrations. This is one method of optimizing bulk 
loads, as described in https://issues.apache.org/jira/browse/FLINK-17288

This was discussed on the user maillist: 
[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/RocksDB-savepoint-recovery-performance-improvements-td35238.html]

A draft PR is here: [https://github.com/apache/flink/pull/12345/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17970) Increase default value of IO pool executor to 4 * #cores

2020-05-27 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-17970:
-

 Summary: Increase default value of IO pool executor to 4 * #cores
 Key: FLINK-17970
 URL: https://issues.apache.org/jira/browse/FLINK-17970
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Coordination
Affects Versions: 1.11.0
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 1.11.0


Currently, the default value of 
{{ClusterOptions.CLUSTER_IO_EXECUTOR_POOL_SIZE}} is #cores. I propose to 
increase it to 4 * #cores to support a higher load of blocking IO operations. 

Moreover, I propose to use a cached thread pool instead of a fixed thread pool. 
That way, only those use cases which have high IO load will actually occupy the 
required resources to start more threads.

Last but not least, I propose to change the config option name from 
{{cluster.io-executor.pool-size}} to {{cluster.io-pool.size}} which is a bit 
shorter.
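The difference between the current fixed pool and the proposed cached pool can 
be sketched with plain {{java.util.concurrent}} (a stdlib illustration of the 
idea only, not Flink's actual executor setup; class and variable names are made 
up):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class IoPoolSketch {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        int poolSize = 4 * cores; // proposed default: 4 * #cores

        // A fixed pool keeps poolSize threads alive once they are created,
        // whether or not the workload ever needs them.
        ThreadPoolExecutor fixed =
                (ThreadPoolExecutor) Executors.newFixedThreadPool(poolSize);

        // A cached pool starts with zero threads and creates them on demand,
        // so low-IO workloads don't occupy resources for unused threads.
        ThreadPoolExecutor cached =
                (ThreadPoolExecutor) Executors.newCachedThreadPool();

        System.out.println(fixed.getCorePoolSize()); // poolSize
        System.out.println(cached.getPoolSize());    // 0 before any task runs

        fixed.shutdown();
        cached.shutdown();
    }
}
```

Note that a plain {{newCachedThreadPool}} is effectively unbounded; presumably 
a real implementation would still cap the pool at the configured size.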



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17969) Enhance Flink (Task) logging to include job name as context diagnostic information

2020-05-27 Thread Bhagavan (Jira)
Bhagavan created FLINK-17969:


 Summary: Enhance Flink (Task) logging to include job name as 
context diagnostic information
 Key: FLINK-17969
 URL: https://issues.apache.org/jira/browse/FLINK-17969
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Affects Versions: 1.10.0
Reporter: Bhagavan


Problem statement:
We use a shared session cluster (Standalone/Yarn) to execute jobs. All logs 
from the cluster are shipped using a log aggregation framework (Logstash/Splunk) 
so that application diagnostics are easier.
However, we are missing one vital piece of information in the log line: the job 
name, which would let us filter the logs for a single job.

Background:
Currently, Flink logging uses SLF4J as an API to abstract away from the concrete 
logging implementation (log4j 1.x, Logback or log4j2), and the logging pattern 
and implementation can be configured at deployment time. However, there is no 
MDC info from the framework indicating the job context.

Proposed improvement:

Add a jobName field to the Task class so that we can add it as MDC when the 
task thread starts executing.

The change is trivial and uses the SLF4J MDC API.

With this change, users can customise the logging pattern to include the MDC 
(e.g. in Logback: [%X{jobName}]).

Change required:

Change required.
{code:java}
@@ -319,6 +323,7 @@ public class Task implements Runnable, TaskSlotPayload, 
TaskActions, PartitionPr
 
this.jobId = jobInformation.getJobId();
+   this.jobName = jobInformation.getJobName();
this.vertexId = taskInformation.getJobVertexId();
@@ -530,8 +535,10 @@ public class Task implements Runnable, TaskSlotPayload, 
TaskActions, PartitionPr
@Override
public void run() {
try {
+   MDC.put("jobName", this.jobName);
doRun();
} finally {
+   MDC.remove("jobName");
terminationFuture.complete(executionState);
}
}
{code}
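The put/remove contract the diff above relies on can be mimicked with a 
stdlib-only stand-in (a sketch with hypothetical names; the real change would 
use {{org.slf4j.MDC}}, which the logging layout reads via %X):

```java
import java.util.HashMap;
import java.util.Map;

public class MdcPatternSketch {
    // SLF4J's MDC is essentially a per-thread map consulted by the log
    // layout; this stand-in mimics its put/remove contract.
    static final ThreadLocal<Map<String, String>> MDC =
            ThreadLocal.withInitial(HashMap::new);

    static String runTask(String jobName, Runnable doRun) {
        MDC.get().put("jobName", jobName);
        try {
            doRun.run();
            // A pattern like "[%X{jobName}] ..." would read the value here.
            return "[" + MDC.get().get("jobName") + "] task finished";
        } finally {
            // Remove in finally, as in the proposed Task#run() change, so
            // pooled threads don't leak the previous job's name.
            MDC.get().remove("jobName");
        }
    }

    public static void main(String[] args) {
        System.out.println(runTask("wordcount", () -> {}));
        System.out.println(MDC.get().get("jobName")); // null after the task
    }
}
```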

If we are in agreement on this small change, I will raise a PR.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17968) Hadoop Configuration is not properly serialized in HBaseRowInputFormat

2020-05-27 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-17968:
--

 Summary: Hadoop Configuration is not properly serialized in 
HBaseRowInputFormat
 Key: FLINK-17968
 URL: https://issues.apache.org/jira/browse/FLINK-17968
 Project: Flink
  Issue Type: Bug
  Components: Connectors / HBase, Table SQL / Ecosystem
Affects Versions: 1.12.0
Reporter: Robert Metzger


This pull request mentions an issue with the serialization of the 
{{Configuration}} field in {{HBaseRowInputFormat}}: 
https://github.com/apache/flink/pull/12146.
After reviewing the code, it seems that this field is {{transient}}, thus it is 
always {{null}} at runtime.

Note: {{HBaseRowDataInputFormat}} is likely suffering from the same issue 
(because the code has been copied).
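The effect can be reproduced with plain Java serialization (the class and field 
names below are illustrative, not Flink's; it only shows why a transient 
configuration field comes out null after the format is shipped to a task):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientFieldDemo {
    // Mimics the problematic pattern: a transient configuration-like field
    // alongside serializable state.
    static class FakeInputFormat implements Serializable {
        private static final long serialVersionUID = 1L;
        final String tableName;
        transient Object conf; // skipped by Java serialization

        FakeInputFormat(String tableName, Object conf) {
            this.tableName = tableName;
            this.conf = conf;
        }
    }

    public static void main(String[] args) throws Exception {
        FakeInputFormat original = new FakeInputFormat("t1", new Object());

        // Serialize and deserialize, as happens when the format is
        // distributed to the cluster.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(original);
        oos.flush();
        FakeInputFormat copy = (FakeInputFormat) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();

        System.out.println(copy.tableName); // "t1" survives
        System.out.println(copy.conf);      // null: transient field dropped
    }
}
```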



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17967) Could not find document *** in docs

2020-05-27 Thread Aven Wu (Jira)
Aven Wu created FLINK-17967:
---

 Summary: Could not find document *** in docs
 Key: FLINK-17967
 URL: https://issues.apache.org/jira/browse/FLINK-17967
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.11.1
Reporter: Aven Wu
 Fix For: 1.12.0, 1.11.1


When I run build_docs.sh -p in the docs dir on *master*, I get errors like the 
one listed below, and the web server cannot start.
{color:#FF}Liquid Exception: Could not find document 
'concepts/stateful-stream-processing.md' in tag 'link'. Make sure the document 
exists and the path is correct. in dev/stream/state/state.zh.md{color}

The files listed below have the same problem.
_concepts/flink-architecture.zh.md_
_dev/batch/hadoop_compatibility.zh.md_
_dev/batch/index.zh.md_
_dev/connectors/cassandra.zh.md_
_dev/stream/operators/index.zh.md_
_dev/stream/operators/state.zh.md_
_dev/table/common.zh.md_

1.11 seems to have the same problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17966) Documentation improvement of checkpointing API

2020-05-27 Thread Vijaya Bhaskar V (Jira)
Vijaya Bhaskar V created FLINK-17966:


 Summary: Documentation improvement of checkpointing API
 Key: FLINK-17966
 URL: https://issues.apache.org/jira/browse/FLINK-17966
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.10.1
Reporter: Vijaya Bhaskar V


As per the mail thread discussion:

[https://lists.apache.org/thread.html/rfe38f6f15f35c28b3ea7865cee2a4dd1d31d93c8d212ad0723b9fd7b%40%3Cuser.flink.apache.org%3E]

When documenting the Flink REST monitoring APIs related to checkpoints 
([https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html]),
 it would be better to reference the checkpoint monitoring overview 
([https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/checkpoint_monitoring.html#overview-tab])
 and retained checkpoints 
([https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/checkpoints.html])

and give examples of the various response fields in the following scenarios:

a) When the job has had enough attempts after failure and recovery

b) When the job has retained checkpoints, how the response looks

c) When the job has its last-but-one attempt left and already has a few 
checkpoints left



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17965) Hive dialect doesn't properly handle special character escaping with SQL CLI

2020-05-27 Thread Rui Li (Jira)
Rui Li created FLINK-17965:
--

 Summary: Hive dialect doesn't properly handle special character 
escaping with SQL CLI
 Key: FLINK-17965
 URL: https://issues.apache.org/jira/browse/FLINK-17965
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Client
Reporter: Rui Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17964) Hivemodule does not support map type

2020-05-27 Thread Jira
夏帅 created FLINK-17964:
--

 Summary: Hivemodule does not support map type
 Key: FLINK-17964
 URL: https://issues.apache.org/jira/browse/FLINK-17964
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Affects Versions: 1.10.1
Reporter: 夏帅
 Fix For: 1.12.0, 1.11.1


{code:java}
// code placeholder
Exception in thread "main" scala.MatchError: MAP (of class 
org.apache.flink.table.types.logical.LogicalTypeRoot)Exception in thread "main" 
scala.MatchError: MAP (of class 
org.apache.flink.table.types.logical.LogicalTypeRoot) at 
org.apache.flink.table.planner.codegen.CodeGenUtils$.hashCodeForType(CodeGenUtils.scala:212)
 at 
org.apache.flink.table.planner.codegen.HashCodeGenerator$$anonfun$2.apply(HashCodeGenerator.scala:97)
 at 
org.apache.flink.table.planner.codegen.HashCodeGenerator$$anonfun$2.apply(HashCodeGenerator.scala:91)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
 at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
 at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at 
scala.collection.TraversableLike$class.map(TraversableLike.scala:234) at 
scala.collection.AbstractTraversable.map(Traversable.scala:104) at 
org.apache.flink.table.planner.codegen.HashCodeGenerator$.generateCodeBody(HashCodeGenerator.scala:91)
 at 
org.apache.flink.table.planner.codegen.HashCodeGenerator$.generateRowHash(HashCodeGenerator.scala:61)
 at 
org.apache.flink.table.planner.plan.nodes.physical.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.scala:182)
 at 
org.apache.flink.table.planner.plan.nodes.physical.batch.BatchExecExchange.translateToPlanInternal(BatchExecExchange.scala:52)
 at 
org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:58)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17963) Revert execution environment patching in StatefulFunctionsJob (FLINK-16926)

2020-05-27 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-17963:
---

 Summary: Revert execution environment patching in 
StatefulFunctionsJob (FLINK-16926)
 Key: FLINK-17963
 URL: https://issues.apache.org/jira/browse/FLINK-17963
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Affects Versions: statefun-2.0.1, statefun-2.1.0
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


In FLINK-16926, we explicitly "patched" the {{StreamExecutionEnvironment}} due 
to FLINK-16560.

Now that we have upgraded the Flink version in StateFun to 1.10.1 which 
includes a fix for FLINK-16560, we can now revert the patching of 
{{StreamExecutionEnvironment}} in the {{StatefulFunctionsJob}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release flink-shaded 11.0, release candidate #1

2020-05-27 Thread Robert Metzger
+1 (binding)

Checks:
- diff to flink-shaded 1.10:
https://github.com/apache/flink-shaded/compare/release-10.0...release-11.0-rc1

- mvn clean install passes on the source archive
- sha of source archive is correct
- source archive is signed by Chesnay
- mvn staging repo looks reasonable
- flink-shaded-zookeeper 3 jar license documentation seems correct



On Mon, May 25, 2020 at 7:14 PM Chesnay Schepler  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 11.0,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2], which are signed with the key with fingerprint 11D464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-11.0-rc1" [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Chesnay
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522=12347784
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-11.0-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1372/
> [5] https://github.com/apache/flink-shaded/tree/release-11.0-rc1
> [6] https://github.com/apache/flink-web/pull/340
>
>


[jira] [Created] (FLINK-17962) Add document for how to define Python UDF with DDL

2020-05-27 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-17962:
---

 Summary: Add document for how to define Python UDF with DDL
 Key: FLINK-17962
 URL: https://issues.apache.org/jira/browse/FLINK-17962
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Reporter: Hequn Cheng






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17961) Create an Elasticsearch source

2020-05-27 Thread Etienne Chauchot (Jira)
Etienne Chauchot created FLINK-17961:


 Summary: Create an Elasticsearch source
 Key: FLINK-17961
 URL: https://issues.apache.org/jira/browse/FLINK-17961
 Project: Flink
  Issue Type: New Feature
Reporter: Etienne Chauchot


There is only an Elasticsearch sink available. There are open-source GitHub 
repos such as [this 
one|https://github.com/mnubo/flink-elasticsearch-source-connector], and the 
Apache Bahir project does not provide an Elasticsearch source connector for 
Flink either. IMHO the project would benefit from having a stock source 
connector for ES alongside the available sink connector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17960) Improve commands in the "Common Questions" document for PyFlink

2020-05-27 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-17960:
---

 Summary: Improve commands in the "Common Questions" document for 
PyFlink
 Key: FLINK-17960
 URL: https://issues.apache.org/jira/browse/FLINK-17960
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Affects Versions: 1.11.0
Reporter: Hequn Cheng


Currently, in the ["Common 
Questions"|https://ci.apache.org/projects/flink/flink-docs-master/dev/table/python/common_questions.html#preparing-python-virtual-environment]
 document, we have the command `$ setup-pyflink-virtual-env.sh` to run the 
script. However, the script is not executable. It would be better to replace the 
command with `$ sh setup-pyflink-virtual-env.sh` and add a download command:
{code}
$ curl -O 
https://ci.apache.org/projects/flink/flink-docs-master/downloads/setup-pyflink-virtual-env.sh
$ sh setup-pyflink-virtual-env.sh
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] (Document) Backwards Compatibility of Savepoints

2020-05-27 Thread Ufuk Celebi
I agree with Konstantin and Steven that it makes sense to point this out
explicitly.

I think that the following would be helpful:

1/ Mention breaking compatibility in release notes

2/ Update the linked table to reflect compatibilities while pointing out
what the community commits to maintain going forward (e.g. "happens to
work" vs. "guaranteed to work")

In general, the table is quite large. Would it make sense to order the
releases in reverse order (assuming that the table is more relevant for
recent releases)?

– Ufuk

On Tue, May 26, 2020 at 8:36 PM Steven Wu  wrote:

> > A use case for this might be when you want to rollback a framework
> upgrade (after some time) due to e.g. a performance
> or stability issue.
>
> Downgrade (that Konstantin called out) is an important and realistic
> scenario. It will be great to support backward compatibility for savepoint
> or at least document any breaking change.
>
> On Tue, May 26, 2020 at 4:39 AM Piotr Nowojski 
> wrote:
>
> > Hi,
> >
> > It might have been implicit choice, but so far we were not supporting the
> > scenario that you are asking for. It has never been tested and we have
> > lots of state migration code sprinkled among our code base (for example
> > upgrading state fields of the operators like [1]), that only supports
> > upgrades, not downgrades.
> >
> > Also we do not have testing infrastructure for checking the downgrades.
> We
> > would need to check if save points taken from master branch, are readable
> > by previous releases (not release branch!).
> >
> > So all in all, I don’t think it can be easily done. It would require some
> > effort to start maintaining backward compatibility.
> >
> > Piotrek
> >
> > [1]
> >
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011#migrateNextTransactionalIdHindState
> >
> > > On 26 May 2020, at 13:18, Konstantin Knauf  wrote:
> > >
> > > Hi everyone,
> > >
> > > I recently stumbled across the fact that Savepoints created with Flink
> > 1.11
> > > can not be read by Flink 1.10. A use case for this might be when you
> want
> > > to rollback a framework upgrade (after some time) due to e.g. a
> > performance
> > > or stability issue.
> > >
> > > From the documentation [1] it seems as if the Savepoint format is
> > generally
> > > only forward-compatible although in many cases it is actually also
> > > backwards compatible (e.g. Savepoint taken in Flink 1.10, restored with
> > > Flink 1.9).
> > >
> > > Was it a deliberate choice not to document any backwards compatibility?
> > If
> > > not, should we add the missing entries in the compatibility table?
> > >
> > > Thanks,
> > >
> > > Konstantin
> > >
> > > [1]
> > >
> >
> https://ci.apache.org/projects/flink/flink-docs-master/ops/upgrading.html#compatibility-table
> > >
> > > --
> > >
> > > Konstantin Knauf
> > >
> > > https://twitter.com/snntrable
> > >
> > > https://github.com/knaufk
> >
> >
>


[jira] [Created] (FLINK-17959) Exception: "CANCELLED: call already cancelled" is thrown when run python udf

2020-05-27 Thread Hequn Cheng (Jira)
Hequn Cheng created FLINK-17959:
---

 Summary: Exception: "CANCELLED: call already cancelled" is thrown 
when run python udf
 Key: FLINK-17959
 URL: https://issues.apache.org/jira/browse/FLINK-17959
 Project: Flink
  Issue Type: Bug
  Components: API / Python
Affects Versions: 1.10.1, 1.11.0
Reporter: Hequn Cheng


The exception is thrown when running Python UDF:
{code:java}
May 27, 2020 3:20:49 PM 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.SerializingExecutor run
SEVERE: Exception while executing runnable 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed@3960b30e
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.StatusRuntimeException: CANCELLED: 
call already cancelled
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Status.asRuntimeException(Status.java:524)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$ServerCallStreamObserverImpl.onCompleted(ServerCalls.java:366)
at 
org.apache.beam.runners.fnexecution.state.GrpcStateService$Inbound.onError(GrpcStateService.java:145)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.stub.ServerCalls$StreamingServerCallHandler$StreamingServerCallListener.onCancel(ServerCalls.java:270)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.PartialForwardingServerCallListener.onCancel(PartialForwardingServerCallListener.java:40)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:23)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onCancel(ForwardingServerCallListener.java:40)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.Contexts$ContextualizedServerCallListener.onCancel(Contexts.java:96)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.closed(ServerCallImpl.java:337)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1Closed.runInContext(ServerImpl.java:793)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at 
org.apache.beam.vendor.grpc.v1p21p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}

The job outputs the right results; however, it seems something goes wrong 
during the shutdown procedure.

You can reproduce the exception with the following code(note: the exception 
happens occasionally):
{code}
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.table import StreamTableEnvironment, DataTypes
from pyflink.table.descriptors import Schema, OldCsv, FileSystem
from pyflink.table.udf import udf

env = StreamExecutionEnvironment.get_execution_environment()
env.set_parallelism(1)
t_env = StreamTableEnvironment.create(env)

add = udf(lambda i, j: i + j, [DataTypes.BIGINT(), DataTypes.BIGINT()], 
DataTypes.BIGINT())

t_env.register_function("add", add)

t_env.connect(FileSystem().path('/tmp/input')) \
.with_format(OldCsv()
 .field('a', DataTypes.BIGINT())
 .field('b', DataTypes.BIGINT())) \
.with_schema(Schema()
 .field('a', DataTypes.BIGINT())
 .field('b', DataTypes.BIGINT())) \
.create_temporary_table('mySource')

t_env.connect(FileSystem().path('/tmp/output')) \
.with_format(OldCsv()
 .field('sum', DataTypes.BIGINT())) \
.with_schema(Schema()
 .field('sum', DataTypes.BIGINT())) \
.create_temporary_table('mySink')

t_env.from_path('mySource')\
.select("add(a, b)") \
.insert_into('mySink')

t_env.execute("tutorial_job")
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17958) Kubernetes session constantly allocates taskmanagers after cancel a job

2020-05-27 Thread Yang Wang (Jira)
Yang Wang created FLINK-17958:
-

 Summary: Kubernetes session constantly allocates taskmanagers 
after cancel a job
 Key: FLINK-17958
 URL: https://issues.apache.org/jira/browse/FLINK-17958
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.11.0, 1.12.0
Reporter: Yang Wang
 Fix For: 1.11.0


While testing {{kubernetes-session.sh}}, I found that the 
{{KubernetesResourceManager}} constantly allocates TaskManagers after a job is 
cancelled. I think it is caused by a bug in the following code: when the 
{{dividend}} is 0 and the {{divisor}} is bigger than 1, the return value is 1, 
whereas we expect it to be 0.
{code:java}
/**
 * Divide and rounding up to integer.
 * E.g., divideRoundUp(3, 2) returns 2.
 * @param dividend value to be divided by the divisor
 * @param divisor value by which the dividend is to be divided
 * @return the quotient rounding up to integer
 */
public static int divideRoundUp(int dividend, int divisor) {
   return (dividend - 1) / divisor + 1;
}{code}
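One possible fix is to special-case a zero dividend (a sketch only; the actual 
fix applied in Flink may differ). In Java, (0 - 1) / divisor truncates toward 
zero to 0, so the buggy version returns 0 + 1 = 1:

```java
public class DivideRoundUp {
    /**
     * Divide, rounding up to an integer, with the zero-dividend case fixed:
     * divideRoundUp(0, 3) returns 0 instead of 1; divideRoundUp(3, 2) still
     * returns 2.
     */
    static int divideRoundUp(int dividend, int divisor) {
        return dividend == 0 ? 0 : (dividend - 1) / divisor + 1;
    }

    public static void main(String[] args) {
        System.out.println(divideRoundUp(0, 3)); // 0 (buggy version gave 1)
        System.out.println(divideRoundUp(3, 2)); // 2
        System.out.println(divideRoundUp(4, 2)); // 2
    }
}
```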
 

How to reproduce this issue?
 # Start a Kubernetes session
 # Submit a Flink job to the existing session
 # Cancel the job and wait for the TaskManager released via idle timeout
 # More and more TaskManagers will be allocated



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17957) Forbidden syntax "CREATE SYSTEM FUNCTION" for sql parser

2020-05-27 Thread Danny Chen (Jira)
Danny Chen created FLINK-17957:
--

 Summary: Forbidden syntax "CREATE SYSTEM FUNCTION" for sql parser
 Key: FLINK-17957
 URL: https://issues.apache.org/jira/browse/FLINK-17957
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Affects Versions: 1.11.0
Reporter: Danny Chen
 Fix For: 1.12.0


This syntax is invalid, but the parser still accepts it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-17956) Add Flink 1.11 MigrationVersion

2020-05-27 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-17956:


 Summary: Add Flink 1.11 MigrationVersion
 Key: FLINK-17956
 URL: https://issues.apache.org/jira/browse/FLINK-17956
 Project: Flink
  Issue Type: Task
  Components: API / Core
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
 Fix For: 1.11.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)