How to set an existing field as the rowtime attribute

2019-10-28 Thread 苏 欣
Hi all, I want to use a field from a Kafka message as the rowtime attribute, but I ran into the following problems (Flink version 1.9.1). Both approaches I tried throw errors. Has anyone hit a similar issue, and how did you solve it? Thanks! Code 1: tEnv.connect( new Kafka() .version("universal") .topic("flink-test-dept-1") .startFromGroupOffsets()
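For reference, in the Flink 1.9 connector descriptor API an existing field can be declared as the rowtime attribute through the schema descriptor. A minimal sketch, assuming a JSON format, a timestamp field named `ts`, and a 5-second out-of-orderness bound (none of these come from the original post):

```java
import org.apache.flink.table.api.Types;
import org.apache.flink.table.descriptors.Json;
import org.apache.flink.table.descriptors.Kafka;
import org.apache.flink.table.descriptors.Rowtime;
import org.apache.flink.table.descriptors.Schema;

tEnv.connect(
        new Kafka()
            .version("universal")
            .topic("flink-test-dept-1")
            .startFromGroupOffsets())
    .withFormat(new Json())
    .withSchema(
        new Schema()
            .field("dept_id", Types.STRING())
            .field("ts", Types.SQL_TIMESTAMP())
                .rowtime(
                    new Rowtime()
                        .timestampsFromField("ts")          // read rowtime from the existing field
                        .watermarksPeriodicBounded(5000)))  // tolerate 5s of out-of-orderness
    .inAppendMode()
    .registerTableSource("dept");
```

The key piece is `Rowtime().timestampsFromField(...)`, which maps an existing physical field to the rowtime attribute instead of deriving it from the Kafka record timestamp.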

Re: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread Dian Fu
I guess this is a bug in Hadoop 2.6.5 and has been fixed in Hadoop 2.8.0 [1]. You can work around it by explicitly setting the configuration "hadoop.security.crypto.codec.classes.aes.ctr.nopadding" as "org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,
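A configuration sketch of the suggested workaround. The value shown is the full default codec list from later Hadoop versions (OpenSSL codec first, JCE codec as fallback); verify it against your cluster before applying:

```xml
<!-- core-site.xml: pin the AES/CTR codec classes explicitly so that
     CryptoCodec resolution does not come back null on Hadoop 2.6.5 -->
<property>
  <name>hadoop.security.crypto.codec.classes.aes.ctr.nopadding</name>
  <value>org.apache.hadoop.crypto.OpensslAesCtrCryptoCodec,org.apache.hadoop.crypto.JceAesCtrCryptoCodec</value>
</property>
```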

Re: Re: Flink's log directory has produced 34G of logs

2019-10-28 Thread Henry
Wow, thanks a lot, it opens now. I'll look into it, thanks! On 2019-10-28 23:23:57, "Dian Fu" wrote: >It works fine on my side, so try this link: http://mail-archives.apache.org/mod_mbox/flink-user-zh/201909.mbox/%3c465086db-1679-4dd9-aab5-086bb4039...@gmail.com%3E >

Re: Flink's log directory has produced 34G of logs

2019-10-28 Thread Dian Fu
It works fine on my side, so try this link: http://mail-archives.apache.org/mod_mbox/flink-user-zh/201909.mbox/%3c465086db-1679-4dd9-aab5-086bb4039...@gmail.com%3E > On Oct 28, 2019, at 10:00 PM, Henry wrote: >

Re: Re: Flink's log directory has produced 34G of logs

2019-10-28 Thread Henry
Thank you. But the link won't open. http://apache-flink.147419.n8.nabble.com/Flink-BlobServerConnection-NoSuchFileException-td722.html#a724 On 2019-10-28 20:02:21, "Dian Fu" wrote: >There was a similar issue before; take a look at this reply and see whether it helps you: http://apache-flink.147419.n8.nabble.com/Flink-BlobServerConnection-NoSuchFileException-td722.html#a724 >

Re: Flink 1.5+ performance in a Java standalone environment

2019-10-28 Thread Jakub Danilewicz
Thanks for your replies. We use Flink from within a standalone Java 8 application (no Hadoop, no clustering), so it basically boils down to running simple code like this: import java.util.*; import org.apache.flink.api.java.ExecutionEnvironment; import org.apache.flink.graph.*; import

RE: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hi, From the debug logs I can see the entries below in the taskmanager. Please have a look. org.apache.hadoop.ipc.ProtobufRpcEngine Call: addBlock took 372ms"} org.apache.hadoop.ipc.ProtobufRpcEngine Call: addBlock took 372ms"} org.apache.hadoop.hdfs.DFSClient pipeline = 10.76.113.216:1044"}

Re: Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread Dian Fu
It seems that the CryptoCodec is null from the exception stack trace. This may occur when "hadoop.security.crypto.codec.classes.aes.ctr.nopadding" is misconfigured. You could change the log level to "DEBUG" and it will show more detailed information about why CryptoCodec is null. > On

Re: Flink's log directory has produced 34G of logs

2019-10-28 Thread Dian Fu
There was a similar issue before; take a look at this reply and see whether it helps you: http://apache-flink.147419.n8.nabble.com/Flink-BlobServerConnection-NoSuchFileException-td722.html#a724 > On Oct 28, 2019, at 3:54 PM, Henry wrote: > > Hello! > > >

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread 杨力
Thank you for your response. Registering a timer at Long.MaxValue works. And I have found the mistake in my original code. When a timer fires and there are elements in the priority queue with timestamp greater than current watermark, they do not get processed. A new timer should be registered for
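A sketch of the fix described above, in a `KeyedProcessFunction` that buffers elements and sorts them by timestamp: when a timer fires and buffered elements are still ahead of the current watermark, a new timer must be registered for the earliest of them. Class and type parameters are assumptions for illustration, and the in-memory `PriorityQueue` would need to be backed by Flink state for fault tolerance:

```java
import java.util.PriorityQueue;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SortFunction extends KeyedProcessFunction<String, Long, Long> {
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();

    @Override
    public void processElement(Long value, Context ctx, Collector<Long> out) {
        buffer.add(value);
        // ask to be called back once the watermark passes this element
        ctx.timerService().registerEventTimeTimer(ctx.timestamp());
    }

    @Override
    public void onTimer(long ts, OnTimerContext ctx, Collector<Long> out) {
        long watermark = ctx.timerService().currentWatermark();
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            out.collect(buffer.poll());
        }
        // Elements newer than the watermark stay buffered: re-register a
        // timer so they are emitted once the watermark reaches them.
        if (!buffer.isEmpty()) {
            ctx.timerService().registerEventTimeTimer(buffer.peek());
        }
    }
}
```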

Flink 1.8.1 HDFS 2.6.5 issue

2019-10-28 Thread V N, Suchithra (Nokia - IN/Bangalore)
Hi, I am trying to execute Wordcount.jar on Flink 1.8.1 with Hadoop version 2.6.5. HDFS is enabled with Kerberos+SSL. While writing output to HDFS, the job fails with the exception below. Please let me know if you have any suggestions for debugging this issue. Caused by:

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread David Anderson
The reason why the watermark is not advancing is that assignAscendingTimestamps is a periodic watermark generator. This style of watermark generator is called at regular intervals to create watermarks -- by default, this is done every 200 msec. With only a tiny bit of data to process, the job
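One way to observe this in small test jobs is to shorten the periodic watermark interval so watermarks have a chance to be emitted before the tiny input is exhausted. A minimal sketch (the 10 ms value is illustrative):

```java
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
// Periodic watermark generators fire every 200 ms by default; with only a
// handful of records the job may finish before the first watermark is emitted.
env.getConfig().setAutoWatermarkInterval(10);
```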

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread vino yang
Hi Michael, From the WindowTranslationTest, I did not see anything about the initialization of a mini-cluster. Here we are testing the operator, and the operator test harness seems to provide the necessary infrastructure. You can try it and see if anything is missing. Best, Vino Nguyen, Michael

PreAggregate operator with timeout trigger

2019-10-28 Thread Felipe Gutierrez
Hi all, I have my own stream operator which triggers an aggregation based on the number of items received (OneInputStreamOperator#processElement(StreamRecord)). However, the aggregation may never fire if the operator does not receive the configured maximum number of items. So, I need a
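One way to add such a timeout is a processing-time timer inside the custom operator: flush when either the count threshold is reached or the timer fires. A hedged sketch against the Flink 1.8/1.9 operator API; class and field names are assumptions, not from the original post:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.operators.OneInputStreamOperator;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.runtime.tasks.ProcessingTimeCallback;

public class PreAggregateOperator<T>
        extends AbstractStreamOperator<List<T>>
        implements OneInputStreamOperator<T, List<T>>, ProcessingTimeCallback {

    private final int maxCount;
    private final long timeoutMs;
    private transient List<T> bundle;

    public PreAggregateOperator(int maxCount, long timeoutMs) {
        this.maxCount = maxCount;
        this.timeoutMs = timeoutMs;
    }

    @Override
    public void open() throws Exception {
        super.open();
        bundle = new ArrayList<>();
        long now = getProcessingTimeService().getCurrentProcessingTime();
        getProcessingTimeService().registerTimer(now + timeoutMs, this);
    }

    @Override
    public void processElement(StreamRecord<T> element) {
        bundle.add(element.getValue());
        if (bundle.size() >= maxCount) {
            flush(); // count threshold reached
        }
    }

    @Override
    public void onProcessingTime(long timestamp) {
        flush(); // timeout: emit whatever accumulated, even below maxCount
        getProcessingTimeService().registerTimer(timestamp + timeoutMs, this);
    }

    private void flush() {
        if (!bundle.isEmpty()) {
            output.collect(new StreamRecord<>(new ArrayList<>(bundle)));
            bundle.clear();
        }
    }
}
```

The buffered bundle here is plain heap memory; for exactly-once guarantees it would additionally need to be snapshotted in operator state.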

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread Nguyen, Michael
Hi Vino, This is a great example – thank you! It looks like I need to instantiate a StreamExecutionEnvironment in order to get my OneInputStreamOperator. Would I need to set up a local Flink cluster using MiniClusterWithClientResource in order to use StreamExecutionEnvironment? Best, Michael

Re: Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread vino yang
Hi Michael, You may need to know `KeyedOneInputStreamOperatorTestHarness` test class. You can consider `WindowTranslationTest#testAggregateWithWindowFunctionEventTime` or `WindowTranslationTest#testAggregateWithWindowFunctionProcessingTime`[1](both of them call `processElementAndEnsureOutput`)
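A hedged sketch of driving a window operator through `KeyedOneInputStreamOperatorTestHarness`, with no MiniCluster involved. The operator under test, the key type, and the record types are placeholders:

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.runtime.streamrecord.StreamRecord;
import org.apache.flink.streaming.util.KeyedOneInputStreamOperatorTestHarness;

KeyedOneInputStreamOperatorTestHarness<String, Tuple2<String, Long>, Long> harness =
    new KeyedOneInputStreamOperatorTestHarness<>(
        operator,                       // WindowOperator built from your Aggregate/ProcessWindowFunction
        t -> t.f0,                      // key selector
        BasicTypeInfo.STRING_TYPE_INFO);

harness.open();
harness.processElement(new StreamRecord<>(Tuple2.of("a", 1L), 10L));
harness.processWatermark(1000L);        // advance event time so the window fires
// assert on harness.getOutput() here
harness.close();
```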

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread Dian Fu
Before a program closes, it emits Long.MaxValue as the watermark, and that watermark triggers all the windows. This is the reason why your `timeWindow` program works. However, in the first program you have not registered the event time timer (though

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread 杨力
It seems to be the case. But when I use timeWindow or CEP with fromCollection, it works well. For example, ``` sEnv.fromCollection(Seq[Long](1, 1002, 2002, 3002)).assignAscendingTimestamps(identity[Long]) .keyBy(_ % 2).timeWindow(Time.seconds(1)).sum(0).print() ``` prints ``` 1 1002 2002

Flink's log directory has produced 34G of logs

2019-10-28 Thread Henry
Hello! I found that a Flink job failed, the program's own restarts also failed, and then it kept writing logs nonstop, producing 34G of logs. I can't tell what caused this. Thanks, everyone. The log contents are as follows: https://img-blog.csdnimg.cn/20191028155219828.png https://img-blog.csdnimg.cn/20191028155224293.png https://img-blog.csdnimg.cn/20191028155227494.png

Re: Cannot modify parallelism (rescale job) more than once

2019-10-28 Thread vino yang
Hi Pankaj, It seems to be a bug. You can report it by opening a Jira issue. Best, Vino Pankaj Chand wrote on Mon, Oct 28, 2019 at 10:51 AM: > Hello, > > I am trying to modify the parallelism of a streaming Flink job (wiki-edits > example) multiple times on a standalone cluster (one local machine) having >

Re: Complex SQL Queries on Java Streams

2019-10-28 Thread Mohammed Tabreaz
Thanks for the feedback; I'm clear about non-blocking interfaces. However, can you clarify or point me to any other libraries that can be used with Java collections for complex analytics? On Mon, Oct 28, 2019, 11:29 Jörn Franke wrote: > Flink is merely StreamProcessing. I would not use it in a

Re: Complex SQL Queries on Java Streams

2019-10-28 Thread Jörn Franke
Flink is primarily stream processing. I would not use it in a synchronous web call. In fact, I would not make any complex analytic function available on a synchronous web service at all. I would deploy a messaging bus (RabbitMQ, ZeroMQ, etc.) and send the request there (if the source is a web app

Testing AggregateFunction() and ProcessWindowFunction() on KeyedDataStream

2019-10-28 Thread Nguyen, Michael
Hello everybody, Has anyone tried testing AggregateFunction() and ProcessWindowFunction() on a KeyedDataStream? I have reviewed the testing page on Flink’s official website (https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/testing.html) and I am not quite sure how I could

Complex SQL Queries on Java Streams

2019-10-28 Thread Mohammed Tabreaz
Recently we moved from Oracle to Cassandra. In Oracle we were heavily using advanced analytical functions such as lag, lead, and MATCH_RECOGNIZE. I was trying to identify equivalent functionality in Java and came across Apache Flink; however, I'm not sure if I should use that library in stand-alone

Re: Issue with writeAsText() to S3 bucket

2019-10-28 Thread Nguyen, Michael
Hi Fabian, Thank you for the response. So I am currently using .writeAsText() to print out 9 different datastreams in one Flink job as I am printing my original datastream with various filters applied to it. I usually see around 6-7 of my datastreams successfully list the JSON file in my S3

DynamoStreams Consumer millisBehindLatest metric

2019-10-28 Thread Vinay Patil
Hi, I am currently using FlinkDynamoStreamsConsumer in production. For monitoring the lag I am relying on the millisBehindLatest metric, but it always returns -1 even if the DynamoDB stream contains a million records upfront. Also, it would be great if we could add documentation mentioning that Flink

Re: Custom Partitioning with keyed state

2019-10-28 Thread Congxian Qiu
Hi, Have you tried the key selector function [1]? [1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/api_concepts.html#define-keys-using-key-selector-functions Best, Congxian Heidi Hazem Mohamed wrote on Sun, Oct 27, 2019 at 11:04 PM: > Hi, > > What I want: I have my own partitioning technique
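A minimal sketch of keying a stream with a custom `KeySelector`, which controls how records are grouped (and therefore which keyed state they share). The `Event` type, its accessor, and the modulo scheme are assumptions for illustration:

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;

KeyedStream<Event, Integer> keyed =
    events.keyBy(new KeySelector<Event, Integer>() {
        @Override
        public Integer getKey(Event e) {
            // any deterministic function of the record can serve as the key,
            // e.g. bucketing region ids into 10 groups
            return e.getRegionId() % 10;
        }
    });
```

Note that `keyBy` still hashes the returned key onto key groups; the selector shapes the grouping, not the physical slot assignment.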

Re: Watermark won't advance in ProcessFunction

2019-10-28 Thread Dian Fu
Hi, It generates watermarks periodically by default in the underlying implementation of `assignAscendingTimestamps`. So for your test program, the watermark has not been generated yet, and I think that's the reason why it's Long.MinValue. Regards, Dian > On Oct 28, 2019, at 11:59 AM, 杨力 wrote: > >