Re: (No subject)

2020-12-25 Thread Shawn Huang
Hi Liwen,

you don't need contributor permissions to contribute to Apache Flink.

Just open/pick an issue and ping a committer to assign the ticket to you.

Best,
Shawn Huang


Liwen Liu (刘力文) wrote on Friday, December 25, 2020 at 6:24 PM:

> Hi Guys,
> I want to contribute to Apache Flink.
> Would you please give me the permission as a contributor?
> My JIRA ID is Liwen Liu.
>
>
> 刘力文 (Liwen Liu)
> Email: liulw_g...@163.com


[jira] [Created] (FLINK-20773) Support allow-unescaped-control-chars option for JSON format

2020-12-25 Thread xiaozilong (Jira)
xiaozilong created FLINK-20773:
--

 Summary: Support allow-unescaped-control-chars option for JSON 
format
 Key: FLINK-20773
 URL: https://issues.apache.org/jira/browse/FLINK-20773
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Affects Versions: 1.12.0
Reporter: xiaozilong
 Attachments: image-2020-12-25-20-21-50-637.png

Can we add an option `allow-unquoted-ctrl-char` for the JSON format? Currently 
the format throws an exception when the data contains illegal unquoted 
(unescaped) control characters.

!image-2020-12-25-20-21-50-637.png!
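A hedged illustration of the behavior using Python's standard-library json module (not Flink's JSON format): by default the parser rejects unescaped control characters inside strings, and `strict=False` relaxes this, analogous to the option proposed here.

```python
import json

# A raw (unescaped) control character inside a JSON string value.
payload = '{"msg": "line1\x01line2"}'

# Default (strict) parsing rejects unescaped control characters.
try:
    json.loads(payload)
except json.JSONDecodeError as e:
    print("strict parse failed:", e)

# strict=False tolerates them, analogous to an
# allow-unescaped-control-chars style option.
doc = json.loads(payload, strict=False)
print(doc["msg"])
```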



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Make Temporal Join syntax easier to use

2020-12-25 Thread Leonard Xu
Hi, Shuo Cheng

Thanks for bringing up this topic. I think it's a good idea to simplify the 
current temporal join syntax.
But I think the most valuable simplification would target the long keyword 
sequence FOR SYSTEM_TIME AS OF in the FOR SYSTEM_TIME AS OF L.PROCTIME syntax, 
rather than replacing the L.PROCTIME reference with a function PROCTIME() or a 
pseudo-column _PROCTIME.

> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to avoid 
> conflict) to the table by default, just like the pseudo-column of classic 
> database
-1 for the pseudo-column. I don't see any benefit in introducing a whole 
pseudo-column concept when the motivation is only a minor syntax 
simplification.

For PROCTIME(), I have the same concern: it's not clear who owns the function 
call, the LHS table, the RHS table, or even the JOIN itself. I think Danny's 
explanation that the PROCTIME() computation should happen at the row level 
makes sense, but I think the computation should be triggered when the LHS 
table's record correlates with a version of the RHS table. The event-time 
temporal join syntax FOR SYSTEM_TIME AS OF l.rowtime follows a similar 
semantic: the rowtime in the LHS table's record is used to correlate with a 
version of the RHS table.

Thus I tend to keep the current temporal join syntax; I won't say +1 or -1 for 
the 'FOR SYSTEM_TIME AS OF PROCTIME()' simplification.
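For reference, a sketch of the two variants under discussion (table and column names are illustrative assumptions, not from the thread):

```sql
-- Current syntax: the processing-time attribute comes from the LHS table.
SELECT o.order_id, o.price * r.rate
FROM Orders AS o
JOIN Rates FOR SYSTEM_TIME AS OF o.proctime AS r
  ON o.currency = r.currency;

-- Proposed simplification being debated.
SELECT o.order_id, o.price * r.rate
FROM Orders AS o
JOIN Rates FOR SYSTEM_TIME AS OF PROCTIME() AS r
  ON o.currency = r.currency;
```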


Best,
Leonard

 
> I don't think add a pseudo-column is a good solution because of these 
> reasons:
> 
> - A normal pseudo-column or system column like _rowID_ has underlying
> storage; a user can select the column from a table [1], and each row has a
> deterministic value bound to it for the pseudo-column (although it may
> change when the row is deleted and inserted again). But PROCTIME for Flink
> behaves more like a row-level runtime attribute, which differs across
> queries and even across SQL contexts.
> 
> - The pseudo-column makes the table schema more complex, but it is only
> useful when we want to use the time attributes.
> 
>> Actually, we have another simpler solution, i.e., enrich the syntax for
> temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'.
> 
> Maybe our biggest concern is that the syntax does not make it clear who
> triggers the PROCTIME() computation. Similar to `current_timestamp`, from my
> understanding the `PROCTIME()` is computed at the row level by the system,
> when a record from the LHS is used to join the RHS table. So generally I'm
> +1 for 'FOR SYSTEM_TIME AS OF PROCTIME()'.
> 
> BTW, in the long term we should think about how to simplify the `FOR
> SYSTEM_TIME AS OF ` syntax, because the processing-time temporal table join
> is the most common case and we should make the temporal table join default
> to 'PROCTIME'. Ideally a plain `A JOIN B` could describe a PROCTIME temporal
> table join. The solution of adding a pseudo-column seems to deviate further
> and further from this path.
> 
> [1]
> https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns008.htm
> 
> Shuo Cheng wrote on Monday, December 21, 2020 at 10:16 AM:
>>
>> I think maybe we can add a pseudo-column `PROCTIME` (or `_PROCTIME` to
>> avoid conflict) to the table by default, just like the pseudo-column of
>> classic database, e.g., `ROWID` in Oracle. 




>> Less elegant solution:
>> Actually, we have another simpler solution, i.e., enriching the syntax for
>> temporal table join to support 'FOR SYSTEM_TIME AS OF PROCTIME()'. It is
>> also very convenient. However, the `PROCTIME()` in 'FOR SYSTEM_TIME AS OF
>> PROCTIME()' is ambiguous, because it cannot tell where the version time of
>> the temporal table comes from: the left table or the right table? The
>> former is what we want. So I think this solution is not preferred.
>> 
>> Looking forward to your feedback~
>> 
>> Best,
>> Shuo
>> 



[jira] [Created] (FLINK-20772) [DISCUSS] RocksDBValueState with TTL throws NullPointerException when calling update(null)

2020-12-25 Thread Seongbae Chang (Jira)
Seongbae Chang created FLINK-20772:
--

 Summary: [DISCUSS] RocksDBValueState with TTL throws 
NullPointerException when calling update(null) 
 Key: FLINK-20772
 URL: https://issues.apache.org/jira/browse/FLINK-20772
 Project: Flink
  Issue Type: Bug
  Components: Runtime / State Backends
Affects Versions: 1.11.2
 Environment: Flink version: 1.11.2

Flink Cluster: Standalone cluster with 3 Job managers and Task managers on 
CentOS 7
Reporter: Seongbae Chang


h2. Problem
 * I use ValueState in my custom trigger, and I set a TTL for these 
ValueStates in a RocksDB backend environment.
 * I found an error when running the code below. I know that 
ValueState.update(null) normally behaves the same as ValueState.clear(). 
Unfortunately, this error occurs once TTL is enabled.

{code:java}
// My Code
ctx.getPartitionedState(batchTotalSizeStateDesc).update(null);
{code}
 * I tested this on Flink 1.11.2, but I think it would also be a problem in 
later versions.
 * Also, I'm a beginner, so if there is any problem with this discussion 
issue, please give me advice and I'll fix it!

{code:java}
// Error Stacktrace
Caused by: TimerException{org.apache.flink.util.FlinkRuntimeException: Error 
while adding data to RocksDB}
... 12 more
Caused by: org.apache.flink.util.FlinkRuntimeException: Error while adding data 
to RocksDB
at 
org.apache.flink.contrib.streaming.state.RocksDBValueState.update(RocksDBValueState.java:108)
at 
org.apache.flink.runtime.state.ttl.TtlValueState.update(TtlValueState.java:50)
at .onProcessingTime(ActionBatchTimeTrigger.java:102)
at .onProcessingTime(ActionBatchTimeTrigger.java:29)
at 
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator$Context.onProcessingTime(WindowOperator.java:902)
at 
org.apache.flink.streaming.runtime.operators.windowing.WindowOperator.onProcessingTime(WindowOperator.java:498)
at 
org.apache.flink.streaming.api.operators.InternalTimerServiceImpl.onProcessingTime(InternalTimerServiceImpl.java:260)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invokeProcessingTimeCallback(StreamTask.java:1220)
... 11 more
Caused by: java.lang.NullPointerException
at 
org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:69)
at 
org.apache.flink.api.common.typeutils.base.IntSerializer.serialize(IntSerializer.java:32)
at 
org.apache.flink.api.common.typeutils.CompositeSerializer.serialize(CompositeSerializer.java:142)
at 
org.apache.flink.contrib.streaming.state.AbstractRocksDBState.serializeValueInternal(AbstractRocksDBState.java:158)
at 
org.apache.flink.contrib.streaming.state.AbstractRocksDBState.serializeValue(AbstractRocksDBState.java:178)
at 
org.apache.flink.contrib.streaming.state.AbstractRocksDBState.serializeValue(AbstractRocksDBState.java:167)
at 
org.apache.flink.contrib.streaming.state.RocksDBValueState.update(RocksDBValueState.java:106)
... 18 more
{code}
 
h2. Reason
 * It relates to how RocksDBValueState interacts with TtlValueState.
 * In RocksDBValueState (as well as the other ValueState implementations), 
*update(null)* is supposed to be caught by the null check. With TTL enabled, 
however, the null check is skipped and the state tries to serialize the null 
value.

{code:java}
// 
https://github.com/apache/flink/blob/release-1.11/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBValueState.java#L96-L110

@Override
public void update(V value) {
    if (value == null) {
        clear();
        return;
    }

    try {
        backend.db.put(columnFamily, writeOptions,
                serializeCurrentKeyWithGroupAndNamespace(), serializeValue(value));
    } catch (Exception e) {
        throw new FlinkRuntimeException("Error while adding data to RocksDB", e);
    }
}{code}
 * This is because TtlValueState wraps the (null) value with the last access 
timestamp, creating a new TtlValue object whose user value is null.

{code:java}
// 
https://github.com/apache/flink/blob/release-1.11/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/TtlValueState.java#L47-L51

@Override
public void update(T value) throws IOException {
    accessCallback.run();
    original.update(wrapWithTs(value));
}
{code}
{code:java}
// 
https://github.com/apache/flink/blob/release-1.11/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/TtlUtils.java#L46-L48

static <V> TtlValue<V> wrapWithTs(V value, long ts) {
    return new TtlValue<>(value, ts);
}{code}
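To make the control flow concrete, here is a minimal Python sketch (an assumed simplification, not Flink's actual code) of how wrapping the null value before the inner null check turns update(null) into a serialization failure:

```python
import time
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class TtlValue(Generic[T]):
    user_value: Optional[T]
    last_access_ts: int

class InnerValueState:
    """Stands in for RocksDBValueState: its null check maps update(None) to clear()."""
    def __init__(self) -> None:
        self.stored: Optional[bytes] = None

    def update(self, value) -> None:
        if value is None:  # the null check that update(null) is supposed to hit
            self.clear()
            return
        # Serialization fails when the wrapped user value is None,
        # analogous to the NPE raised in IntSerializer.serialize.
        self.stored = value.user_value.to_bytes(4, "big")

    def clear(self) -> None:
        self.stored = None

class TtlValueState:
    """Stands in for the TTL wrapper: it wraps even None before delegating."""
    def __init__(self, original: InnerValueState) -> None:
        self.original = original

    def update(self, value) -> None:
        # wrapWithTs(None) yields a non-None TtlValue, so the inner
        # null check never fires and serialization blows up instead.
        self.original.update(TtlValue(value, int(time.time() * 1000)))

state = TtlValueState(InnerValueState())
state.update(7)            # works: 7 is wrapped and serialized
try:
    state.update(None)     # fails, mirroring the reported NullPointerException
except AttributeError as e:
    print("update(None) failed:", e)
```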
{code:java}
// 
https://github.com/apache/flink/blob/release-1.11/flink-runtime/src/main/java/org/apache/flink/runtime/state/ttl/TtlValue.java

public class TtlValue<T> implements Serializable {
    private static final long serialVersionUID = 5221129704201125020L;

    @Nullable
    private final T 
{code}

(No subject)

2020-12-25 Thread Liwen Liu (刘力文)
Hi Guys,
I want to contribute to Apache Flink.
Would you please give me the permission as a contributor?
My JIRA ID is Liwen Liu.


刘力文 (Liwen Liu)
Email: liulw_g...@163.com

[jira] [Created] (FLINK-20771) Hive partition is not added when there is a lot of data

2020-12-25 Thread hehuiyuan (Jira)
hehuiyuan created FLINK-20771:
-

 Summary: Hive partition is not added when there is a lot of data
 Key: FLINK-20771
 URL: https://issues.apache.org/jira/browse/FLINK-20771
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Hive
Reporter: hehuiyuan
 Attachments: image-2020-12-25-18-09-42-707.png, 
image-2020-12-25-18-15-07-519.png

A Hive partition is not added when the amount of data is large.

!image-2020-12-25-18-09-42-707.png|width=437,height=115!

Before the partition commit, *inProgressPart* is reinitialized.

But the bucket is still active:

!image-2020-12-25-18-15-07-519.png|width=574,height=192!

Because the bucket is still active, notifyBucketInactive is not executed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-20770) Incorrect description for config option kubernetes.rest-service.exposed.type

2020-12-25 Thread Yang Wang (Jira)
Yang Wang created FLINK-20770:
-

 Summary: Incorrect description for config option 
kubernetes.rest-service.exposed.type
 Key: FLINK-20770
 URL: https://issues.apache.org/jira/browse/FLINK-20770
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Affects Versions: 1.11.3, 1.12.0
Reporter: Yang Wang


{code:java}
public static final ConfigOption<ServiceExposedType> REST_SERVICE_EXPOSED_TYPE =
    key("kubernetes.rest-service.exposed.type")
        .enumType(ServiceExposedType.class)
        .defaultValue(ServiceExposedType.LoadBalancer)
        .withDescription("The type of the rest service (ClusterIP or NodePort or LoadBalancer). " +
            "When set to ClusterIP, the rest service will not be created.");
{code}
The description of this config option is incorrect: since the Kubernetes 
decorators were refactored in FLINK-16194, the rest service is always created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Releasing Apache Flink 1.12.1

2020-12-25 Thread Xintong Song
Hi devs,

I'm very glad to announce that all known blocker issues for release-1.12.1
have been resolved.

I'm creating our first release candidate now and will start a separate
voting thread as soon as RC1 is created.

Thanks everyone, and Merry Christmas.

Thank you~

Xintong Song



On Wed, Dec 23, 2020 at 6:07 PM Xintong Song  wrote:

> Hi devs,
>
> Updates on the progress of release.
>
> In the past week, more than 20 issues were resolved for release 1.12.1.
> Thanks for the efforts.
>
> We still have 3 unresolved release blockers at the moment.
>
>- [FLINK-20648] Unable to restore from savepoints with Kubernetes HA.
>Consensus has been reached on the solution. @Yang Wang is working on a
>PR.
>- [FLINK-20654] Unaligned checkpoint recovery may lead to corrupted
>data stream.
>@Roman Khachatryan is still investigating the problem.
>- [FLINK-20664] Support setting service account for TaskManager pod.
>    Boris Lublinsky has opened a PR, which has already been reviewed and is
>    close to mergeable.
>
> Since we are targeting a swift release, I don't intend to further delay
> the release for other non-blocker issues unless there's a good reason.
> If there's anything that you believe is absolutely necessary for release
> 1.12.1, please reach out to me.
> Otherwise, the voting process will be started as soon as the above
> blockers are addressed.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Mon, Dec 21, 2020 at 10:05 AM Xingbo Huang  wrote:
>
>> Hi Xintong,
>>
>> Thanks a lot for driving this.
>>
>> I'd like to bring one more issue to your attention:
>> https://issues.apache.org/jira/browse/FLINK-20389.
>> This issue occurs quite frequently. Arvid and Kezhu have done some
>> investigation, and it may indicate a bug in the new Source API. It would
>> be great to figure out the root cause of this issue.
>>
>> Best,
>> Xingbo
>>
>> Xintong Song wrote on Friday, December 18, 2020 at 7:49 PM:
>>
>> > Thanks for the replies so far.
>> >
>> > I've been reaching out to the owners of the reported issues. It seems
>> > most of the blockers will likely be resolved in the next few days.
>> >
>> > Since some of the issues are quite critical, I'd like to aim for a
>> > *feature freeze on Dec. 23rd*, and start the release voting process by
>> > the end of this week.
>> >
>> > If there's anything you might need more time for, please reach out to me.
>> >
>> > Thank you~
>> >
>> > Xintong Song
>> >
>> >
>> >
>> > On Fri, Dec 18, 2020 at 3:19 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>> > wrote:
>> >
>> > > Thanks Xintong for driving this.
>> > >
>> > > I'd like to make two more issues related to the Kinesis connector
>> > > changes in 1.12.0 blockers for 1.12.1:
>> > > https://issues.apache.org/jira/browse/FLINK-20630
>> > > https://issues.apache.org/jira/browse/FLINK-20629
>> > >
>> > > There are already PRs for these issues from Danny Cranmer; I will try
>> > > to merge these very soon.
>> > >
>> > > Cheers,
>> > > Gordon
>> > >
>> > > On Fri, Dec 18, 2020 at 1:19 PM Guowei Ma wrote:
>> > >
>> > >> Thanks for driving this release Xintong.
>> > >> I think https://issues.apache.org/jira/browse/FLINK-20652 should be
>> > >> addressed.
>> > >>
>> > >> Best,
>> > >> Guowei
>> > >>
>> > >>
>> > >> > On Fri, Dec 18, 2020 at 11:53 AM Jingsong Li wrote:
>> > >>
>> > >> > Thanks for volunteering as our release manager, Xintong. +1 for
>> > >> > releasing Flink 1.12.1 soon.
>> > >> >
>> > >> > I think https://issues.apache.org/jira/browse/FLINK-20665 should be
>> > >> > addressed; I marked it as a Blocker.
>> > >> >
>> > >> > Best,
>> > >> > Jingsong
>> > >> >
>> > >> > On Fri, Dec 18, 2020 at 11:16 AM Yang Wang 
>> > >> wrote:
>> > >> >
>> > >> > > Hi David,
>> > >> > >
>> > >> > > I will take a look at ticket FLINK-20648 and try to get it
>> > >> > > resolved in this release cycle.
>> > >> > >
>> > >> > > @Xintong Song 
>> > >> > > One more Kubernetes HA related issue: we need to support setting
>> > >> > > the service account for the TaskManager pod [1]. Even though we
>> > >> > > have a workaround for this issue, it is not acceptable to require
>> > >> > > that the default service account always has sufficient permissions.
>> > >> > >
>> > >> > > [1]. https://issues.apache.org/jira/browse/FLINK-20664
>> > >> > >
>> > >> > > Best,
>> > >> > > Yang
>> > >> > >
>> > >> > >
>> > >> > > > David Morávek wrote on Friday, December 18, 2020 at 12:47 AM:
>> > >> > >
>> > >> > > > Hi, I think https://issues.apache.org/jira/browse/FLINK-20648
>> > >> > > > should be addressed, as Kubernetes HA was one of the main selling
>> > >> > > > points of this release. WDYT?
>> > >> > > >
>> > >> > > > D.
>> > >> > > >
>> > >> > > > Sent from my iPhone
>> > >> > > >
>> > >> > > > > On 17. 12. 2020, at 13:54, Yun Tang wrote:
>> > >> > > > >
>> > >> > > > > Thanks for driving this quick-fix release.
>> > >> > > > > +1 for fixing the bug of RocksDB 

[jira] [Created] (FLINK-20769) Support minibatch to optimize Python UDAF

2020-12-25 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-20769:


 Summary: Support minibatch to optimize Python UDAF
 Key: FLINK-20769
 URL: https://issues.apache.org/jira/browse/FLINK-20769
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Huang Xingbo
 Fix For: 1.13.0


Support minibatch to optimize Python UDAF



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Apache Flink Stateful Functions 2.2.2, release candidate #2

2020-12-25 Thread Dian Fu
Thanks a lot for driving this, Gordon!

+1 (binding)

- verified the checksum and signature
- looked through the changes since 2.2.1 for license changes, verified the 
NOTICE file of statefun-flink-datastream and it looks good to me
- the website PR LGTM

Regards,
Dian

> Xingbo Huang wrote on December 24, 2020 at 3:21 PM:
> 
> +1 (non-binding)
> 
> - Verify checksums and GPG files
> - Verify that the source archives do not contain any binaries
> - Build the source with Maven to ensure all source files have Apache
> headers (JDK8)
> Command: mvn clean install -Papache-release
> - Run e2e tests (JDK8)
> - Check that all POM files, Dockerfiles, examples point to the same
> version. That includes the quickstart artifact POM files.
> - pip install apache_flink_statefun-2.2.2-py3-none-any.whl
> - Verified NOTICE files in statefun-flink-datastream,
> statefun-flink-distribution and statefun-ridesharing-example-simulator
> 
> Best,
> Xingbo
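The "verify checksums" step mentioned in the votes above can be sketched as follows (the file name is an illustrative placeholder, not the actual release artifact):

```shell
# Create a stand-in artifact, record its SHA-512, then verify it,
# mirroring the checksum check performed on a release candidate.
echo "release artifact contents" > artifact.tgz
sha512sum artifact.tgz > artifact.tgz.sha512
sha512sum -c artifact.tgz.sha512
```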



[jira] [Created] (FLINK-20768) Support routing field for Elasticsearch connector

2020-12-25 Thread wangsan (Jira)
wangsan created FLINK-20768:
---

 Summary: Support routing field for Elasticsearch connector
 Key: FLINK-20768
 URL: https://issues.apache.org/jira/browse/FLINK-20768
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / ElasticSearch
Reporter: wangsan


Routing in Elasticsearch can improve search efficiency for large-scale 
datasets; we should support this feature as an optional config in the 
Elasticsearch connector.
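As a hedged sketch of what routing means at the REST level (host, index, and id values are illustrative assumptions): Elasticsearch accepts a routing value as a query parameter on index requests, and a connector option would supply this per record.

```python
from typing import Optional

def index_request_url(host: str, index: str, doc_id: str,
                      routing: Optional[str] = None) -> str:
    """Build an Elasticsearch index-document URL, optionally with a routing value.

    Documents sharing a routing value land on the same shard, which is what
    makes routed searches cheaper on large datasets.
    """
    url = f"{host}/{index}/_doc/{doc_id}"
    if routing is not None:
        url += f"?routing={routing}"
    return url

# Without routing, ES hashes the document id to pick a shard; with routing,
# it hashes the routing value instead.
print(index_request_url("http://localhost:9200", "orders", "1"))
print(index_request_url("http://localhost:9200", "orders", "1", routing="user-42"))
```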




--
This message was sent by Atlassian Jira
(v8.3.4#803005)