Re: Batch reading from Cassandra. How to?

2020-02-17 Thread Till Rohrmann
Hi Lasse, as far as I know, the best way to read from Cassandra is to use the CassandraInputFormat [1]. Unfortunately, at the moment there is no optimized way to read large amounts of data comparable to what Spark offers. But if you want to contribute this feature to Flink, then the community would
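For reference, a minimal sketch of reading via CassandraInputFormat; the contact point, query, and column types below are made-up placeholders, not details from the thread:

    import com.datastax.driver.core.Cluster;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.typeutils.TupleTypeInfo;
    import org.apache.flink.batch.connectors.cassandra.CassandraInputFormat;
    import org.apache.flink.streaming.connectors.cassandra.ClusterBuilder;

    public class CassandraBatchRead {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Connection settings are placeholders.
            ClusterBuilder clusterBuilder = new ClusterBuilder() {
                @Override
                protected Cluster buildCluster(Cluster.Builder builder) {
                    return builder.addContactPoint("127.0.0.1").build();
                }
            };

            // The query and schema are hypothetical; CassandraInputFormat emits Tuples.
            DataSet<Tuple2<String, Long>> rows = env.createInput(
                    new CassandraInputFormat<Tuple2<String, Long>>(
                            "SELECT user_id, event_count FROM demo.events;", clusterBuilder),
                    new TupleTypeInfo<>(BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.LONG_TYPE_INFO));

            rows.first(10).print();
        }
    }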

Re: Test sink behaviour

2020-02-17 Thread Till Rohrmann
Hi David, if you want to test the behavior together with S3, then you could check that S3 contains a file after the job has completed. If you want to test the failure and retry behaviour, then I would suggest introducing your own abstraction for the S3 access which you can control. That way you
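A hypothetical sketch of such an abstraction (all names invented here): the production code talks to an ObjectStore interface, and the test injects a flaky implementation to drive the retry path:

    import java.io.IOException;

    // Hypothetical seam around the S3 access, so tests can control it.
    interface ObjectStore {
        void put(String key, byte[] payload) throws IOException;
    }

    // Test double: fails a configurable number of times before succeeding,
    // letting a test assert how often the sink retries.
    class FlakyObjectStore implements ObjectStore {
        private final int failuresBeforeSuccess;
        private int attempts;

        FlakyObjectStore(int failuresBeforeSuccess) {
            this.failuresBeforeSuccess = failuresBeforeSuccess;
        }

        @Override
        public void put(String key, byte[] payload) throws IOException {
            attempts++;
            if (attempts <= failuresBeforeSuccess) {
                throw new IOException("injected failure, attempt " + attempts);
            }
            // Success: a real test would also record key/payload for assertions.
        }

        int attempts() {
            return attempts;
        }
    }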

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Maxim Parkachov
Hi Yang, I've just tried your suggestions, but, unfortunately, in yarn per-job mode it doesn't work; both commands return null. I double-checked that the file is shipped to the yarn container, but I feel that it happens later in the process. At the moment I'm reading the file with the File interface, instead of

Re: Process stream multiple times with different KeyBy

2020-02-17 Thread Eduardo Winpenny Tejedor
Hi Sebastien, Without being entirely sure of what your use case/end goal is, I'll tell you (some of) the options Flink provides for defining a flow. If your use case is to apply the same rule to each of your "swimlanes" of data (one with category=foo AND subcategory=bar, another with
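For the "same rule per swimlane" case, one of those options boils down to keying by the (category, subcategory) pair; a minimal sketch with assumed class and field names (Event, RuleEvaluator, eventSource):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class SwimlaneJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Event, eventSource() and RuleEvaluator are assumed application classes.
            DataStream<Event> events = env.addSource(eventSource());

            events
                .keyBy(e -> e.getCategory() + "|" + e.getSubcategory())
                .process(new RuleEvaluator()) // the same rule runs for every swimlane
                .print();

            env.execute("swimlane-demo");
        }
    }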

Process stream multiple times with different KeyBy

2020-02-17 Thread Lehuede sebastien
Hi all, I'm currently working on a Flink application where I match events against a set of rules. At the beginning I wanted to dynamically create streams following the category of events (events are JSON formatted and I have a field like "category":"foo" in each event) but I'm stuck by the

Re: Process stream multiple times with different KeyBy

2020-02-17 Thread theo.diefent...@scoop-software.de
Hi Sebastien, I'd also highly recommend a recent Flink blog post to you where exactly this question was answered in quite some detail: https://flink.apache.org/news/2020/01/15/demo-fraud-detection.html Best regards, Theo. Original message from: Eduardo Winpenny Tejedor

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Yang Wang
Hi Maxim, Both Yarn per-job and session clusters should work, since before the JobManager and TaskManager are launched, the Yarn NodeManager guarantees that all the local resources have been localized and are accessible. If you do not want to use getResource to read the file and use the File interface

Re: job history server

2020-02-17 Thread Richard Moorhead
I did not know that. I have since wiped the directory. I will post when I see this error again. On Mon, Feb 17, 2020 at 8:03 PM Benchao Li wrote: > `df -H` only gives the sizes, not inode information. Could you also show > us the result of `df -iH`? > > Richard Moorhead wrote on Tue, Feb 18, 2020

Re: Flink 'Job Cluster' mode UI Access

2020-02-17 Thread Jatin Banger
Hi, Recently I upgraded the Flink version to 1.8.3. For the session cluster it shows the version correctly, but for the job cluster I get this in the logs: *Starting StandaloneJobClusterEntryPoint (Version: , Rev:6322618, Date:04.09.2019 @ 22:07:41 CST)* And my classpath has these jars: *Classpath:

job history server

2020-02-17 Thread Richard Moorhead
I see the following exception often: 2020-02-17 18:13:26,796 ERROR org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - Failure while fetching/processing job archive for job eaf0639027aca1624adaa100bdf1332e. java.nio.file.FileSystemException:

Re: job history server

2020-02-17 Thread Benchao Li
Hi Richard, Have you checked whether the inodes of the disk partition are full or not? Richard Moorhead wrote on Tue, Feb 18, 2020 at 8:16 AM: > I see the following exception often: > > 2020-02-17 18:13:26,796 ERROR > org.apache.flink.runtime.webmonitor.history.HistoryServerArchiveFetcher - > Failure while

Re: job history server

2020-02-17 Thread Richard Moorhead
Yes, I did. I mentioned it last but I should have been clearer: 22526:~/ $ df -H [18:15:20] Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg00-rootlv00 2.1G 777M 1.2G 41% / tmpfs 2.1G 753M 1.4G 37%

Re: job history server

2020-02-17 Thread Benchao Li
`df -H` only gives the sizes, not inode information. Could you also show us the result of `df -iH`? Richard Moorhead wrote on Tue, Feb 18, 2020 at 9:40 AM: > Yes, I did. I mentioned it last but I should have been clearer: > > 22526:~/ $ df -H > > >[18:15:20] > Filesystem

Can Connected Components run on a streaming dataset using iterate delta?

2020-02-17 Thread kant kodali
Hi All, I am wondering if connected components can run on streaming data, or say an incremental batch? I see that with delta iteration not all vertices need to participate in every iteration
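For reference, the batch formulation of this idea is Flink's classic delta-iteration connected components, where only vertices whose component id changed remain in the workset; a condensed sketch (toy data, DataSet API, edges treated as directed for brevity, not a streaming solution):

    import org.apache.flink.api.common.functions.FlatJoinFunction;
    import org.apache.flink.api.common.functions.JoinFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.aggregation.Aggregations;
    import org.apache.flink.api.java.operators.DeltaIteration;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    public class ConnectedComponentsSketch {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Toy input: vertex ids and (source, target) edges, all made up.
            DataSet<Long> vertices = env.fromElements(1L, 2L, 3L, 4L, 5L);
            DataSet<Tuple2<Long, Long>> edges =
                    env.fromElements(Tuple2.of(1L, 2L), Tuple2.of(2L, 3L), Tuple2.of(4L, 5L));

            // Every vertex starts in its own component.
            DataSet<Tuple2<Long, Long>> initial =
                    vertices.map(new MapFunction<Long, Tuple2<Long, Long>>() {
                        @Override
                        public Tuple2<Long, Long> map(Long id) {
                            return Tuple2.of(id, id);
                        }
                    });

            // Solution set = current component per vertex; workset = vertices
            // whose component changed in the last round.
            DeltaIteration<Tuple2<Long, Long>, Tuple2<Long, Long>> iteration =
                    initial.iterateDelta(initial, 100, 0);

            DataSet<Tuple2<Long, Long>> changes = iteration.getWorkset()
                    // Send own component id to each neighbor.
                    .join(edges).where(0).equalTo(0)
                    .with(new JoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
                        @Override
                        public Tuple2<Long, Long> join(Tuple2<Long, Long> vertex, Tuple2<Long, Long> edge) {
                            return Tuple2.of(edge.f1, vertex.f1);
                        }
                    })
                    // Keep the smallest candidate component id per vertex.
                    .groupBy(0).aggregate(Aggregations.MIN, 1)
                    // Emit only vertices whose component id actually shrank.
                    .join(iteration.getSolutionSet()).where(0).equalTo(0)
                    .with(new FlatJoinFunction<Tuple2<Long, Long>, Tuple2<Long, Long>, Tuple2<Long, Long>>() {
                        @Override
                        public void join(Tuple2<Long, Long> candidate, Tuple2<Long, Long> current,
                                         Collector<Tuple2<Long, Long>> out) {
                            if (candidate.f1 < current.f1) {
                                out.collect(candidate);
                            }
                        }
                    });

            iteration.closeWith(changes, changes).print();
        }
    }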

Flink's Either type information

2020-02-17 Thread jacopo.gobbi
Hi all, How can an Either value be returned by a KeyedBroadcastProcessFunction? We keep getting "InvalidTypesException: Type extraction is not possible on Either type as it does not contain information about the 'left' type." when doing: out.collect(Either.Right(myObject)); Thanks, Jacopo
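One common fix for this kind of InvalidTypesException (a hedged sketch, not necessarily what the thread settled on) is to state the Either type information explicitly after .process(...), since it cannot be extracted from the generics; MyLeft, MyRight, and the surrounding streams are placeholders:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.types.Either;

    // returns(...) overrides the failing type extraction with explicit type info.
    SingleOutputStreamOperator<Either<MyLeft, MyRight>> out =
            keyedStream
                    .connect(broadcastStream)
                    .process(new MyKeyedBroadcastProcessFunction())
                    .returns(Types.EITHER(
                            Types.POJO(MyLeft.class),
                            Types.POJO(MyRight.class)));

Alternatively, the function itself can implement ResultTypeQueryable and return the same type information from getProducedType().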

Parallelize Kafka Deserialization of a single partition?

2020-02-17 Thread Theo Diefenthal
Hi, As for most pipelines, our Flink pipeline starts with parsing source Kafka events into POJOs. We perform this step within a KafkaDeserializationSchema so that we properly extract the event time timestamp for the downstream timestamp assigner. Now it turned out that parsing is currently the
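One pattern that is often suggested for this situation (my own hedged sketch, not from the thread): make the KafkaDeserializationSchema a cheap pass-through of raw bytes plus the record timestamp, and do the expensive parsing in a downstream operator whose parallelism is independent of the partition count:

    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.connectors.kafka.KafkaDeserializationSchema;
    import org.apache.kafka.clients.consumer.ConsumerRecord;

    // Pass-through schema: defers parsing, keeps the Kafka timestamp alongside
    // the raw bytes so a timestamp assigner can still run after parsing.
    public class RawRecordSchema implements KafkaDeserializationSchema<Tuple2<byte[], Long>> {
        @Override
        public boolean isEndOfStream(Tuple2<byte[], Long> next) {
            return false;
        }

        @Override
        public Tuple2<byte[], Long> deserialize(ConsumerRecord<byte[], byte[]> record) {
            return Tuple2.of(record.value(), record.timestamp());
        }

        @Override
        public TypeInformation<Tuple2<byte[], Long>> getProducedType() {
            return Types.TUPLE(Types.PRIMITIVE_ARRAY(Types.BYTE), Types.LONG);
        }
    }

    // Usage (hypothetical names): shuffle, then parse with higher parallelism,
    // at the cost of an extra network hop.
    // env.addSource(new FlinkKafkaConsumer<>("events", new RawRecordSchema(), props))
    //    .rebalance()
    //    .map(new ParseJsonToPojo())
    //    .setParallelism(32);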

Re: FlinkCEP questions - architecture

2020-02-17 Thread Kostas Kloudas
Hi Juergen, I will reply to your questions inline. As a general comment I would suggest also having a look at [3] so that you have an idea of some of the alternatives. With that said, here come the answers :) 1) We receive files every day, which are exports from some database tables, containing

Re: Flink job fails with org.apache.kafka.common.KafkaException: Failed to construct kafka consumer

2020-02-17 Thread Piotr Nowojski
Hey, sorry but I know very little about the KafkaConsumer. I hope that someone else might know more. However, did you try to google this issue? It doesn't sound like a Flink-specific problem, but like a general Kafka issue. Also, a solution might be just as simple as bumping the limit of opened

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Maxim Parkachov
Hi Yang, thanks, this explains why the classpath behavior changed, but now I struggle to understand how I could overwrite a resource which is already shipped in the job jar. Before, I had the job.properties file in the JAR under resources/lib/job.properties for local development, and when deploying on the cluster it

Re: [Flink 1.10] Classpath doesn't include custom files in lib/

2020-02-17 Thread Yang Wang
Hi Maxim, I have verified that the following two ways could both work. getClass().getClassLoader().getResource("lib/job.properties") getClass().getClassLoader().getResource("job.properties") Best, Yang Maxim Parkachov wrote on Mon, Feb 17, 2020 at 6:47 PM: > Hi Yang, > > thanks, this explains why
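And a small follow-up sketch (my addition, not from the thread) for actually loading the shipped file into java.util.Properties via the classloader:

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    public final class JobConfig {
        // Reads lib/job.properties from the classpath, i.e. from the resources
        // shipped with the job; returns empty properties if it is absent.
        static Properties loadJobProperties() throws IOException {
            Properties props = new Properties();
            try (InputStream in = Thread.currentThread().getContextClassLoader()
                    .getResourceAsStream("lib/job.properties")) {
                if (in != null) {
                    props.load(in);
                }
            }
            return props;
        }
    }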

Re: flink-1.10.0 exception when submitting a job via run -m yarn-cluster

2020-02-17 Thread Weihua Hu
Hi, amenhub. You are probably trying to submit the job to yarn. This error should be caused by the FlinkYarnSessionCli not being loaded correctly; these logs are not the root cause of the failure. Could you provide some more logs to look at? Best, Weihua Hu > On Feb 18, 2020 at 10:56, amenhub wrote: > > parseHostPortAddress

flink-1.10.0 exception when submitting a job via run -m yarn-cluster

2020-02-17 Thread amenhub
Hi all, The program finished with the following exception: java.lang.IllegalArgumentException: The given host:port ('yarn-cluster') doesn't contain a valid port at org.apache.flink.util.NetUtils.validateHostPortString(NetUtils.java:108) at

Help with running TPC-DS on Flink 1.10

2020-02-17 Thread faaron zheng
I am using the code from the earlier Flink TPC-DS performance-optimization contest to run TPC-DS, with Flink version 1.10. I made a simple modification to the constructor of ParquetTableSource in the Flink source code to support passing a TableSchema. But when running query1 I got the error below: while validating the data source, the decimal precision and scale do not match, although the data's own format is Decimal(7,2). Is this validation mandatory? Where should I do the conversion?

Re: Re: flink-1.10.0 exception when submitting a job via run -m yarn-cluster

2020-02-17 Thread amenhub
Hi, Weihua. As you said, I want to submit a job to the cluster via Flink on YARN's run mode, but when I run ./bin/flink run -m yarn-cluster ../examples/batch/WordCount.jar I still get the same error, and this is all the log information there is. If, as you said, it is caused by the FlinkYarnSessionCli not being loaded successfully, what could be the reasons for that? Thanks! Best wishes, amenhub. On 2020-02-18 11:29:13, "Weihua Hu" wrote: >Hi, amenhub > >You are probably trying to submit the job to yarn

Re: Problem using ParquetTableSource under the blink table planner

2020-02-17 Thread jun su
Hi Jark, it is precisely because the event_name field in my data has no value '没有这个值' that this is strange. Jark Wu wrote on Tue, Feb 18, 2020 at 12:15 PM: > Hi jun, > > This is the expected behavior. It means your source has 4 records whose event_name value is '没有这个值' > > Best, > Jark > > On Mon, 17 Feb 2020 at 23:26, jun su wrote: >> hi Jark Wu, >> >> Thanks for your help. I also found some other issues during my earlier inquiries: >>

Re: Problem using ParquetTableSource under the blink table planner

2020-02-17 Thread Jark Wu
Hi jun, This is the expected behavior. It means your source has 4 records whose event_name value is '没有这个值' Best, Jark On Mon, 17 Feb 2020 at 23:26, jun su wrote: > hi Jark Wu, > > Thanks for your help. I also found some other issues during my earlier inquiries: > > ParquetTableSource shows this behavior under the flink table planner in both stream and batch modes: > when selecting a field and the where condition contains an = predicate, the output result takes the where condition >

Re: Problem using ParquetTableSource under the blink table planner

2020-02-17 Thread Jark Wu
I looked into it, and it is indeed a bug; I opened an issue to follow up: https://issues.apache.org/jira/browse/FLINK-16113 The current workaround is to put the constant into the select, e.g. select a,b,'windows进程创建' from MyTable where c = 'windows进程创建' Best, Jark On Mon, 17 Feb 2020 at 15:15, jun su wrote: > A supplement to the previous question, under the blink table planner: > > select event_name