Looking for MultipleLinearRegression in Flink

2021-12-14 Thread thekingofcity
Hi, I'm looking for multiple linear regression in recent Flink versions. I do find it in Flink 1.2 but have no idea where to find it in 1.10+. https://nightlies.apache.org/flink/flink-docs-release-1.2/dev/libs/ml/multiple_linear_regression.html I also find a flink-ml repo but can't find it eith

Direct buffer memory in job with hbase client

2021-12-14 Thread Anton
Hi, from time to time my job is stopping to process messages with warn message listed below. Tried to increase jobmanager.memory.process.size and taskmanager.memory.process.size but it didn't help. What else can I try? "Framework Off-heap" is 128mb now as seen is task manager dashboard and Task Of

Re: Hybrid Source with Parquet Files from GCS + KafkaSource

2021-12-14 Thread Meghajit Mazumdar
Hi, Thanks. I was able to get this working. Had to use recordFileFormat though. Is there a performance difference between FileRecordFormat and BulkFormat

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-14 Thread Yingjie Cao
Hi Till, Thanks for the suggestion. I think it makes a lot of sense to also extend the documentation for the sort shuffle to include a tuning guide. Best, Yingjie Till Rohrmann 于2021年12月14日周二 18:57写道: > As part of this FLIP, does it make sense to also extend the documentation > for the sort sh

Re: reading gz files

2021-12-14 Thread Caizhi Weng
Hi! Thanks for raising this issue. This is unfortunately a bug. I've created a JIRA ticket [1] and you can check the progress of this issue there. [1] https://issues.apache.org/jira/browse/FLINK-25311 Egor Ryashin 于2021年12月15日周三 02:33写道: > Hey, > > I’m using Flink 1.14 and having trouble inges

Re: Java dependencies management in Pyflink

2021-12-14 Thread Paul Lam
Hi Dian, Thanks a lot for your input. That’s a valid solution. We avoid using fat jars in Java API, because it easily leads to class conflicts. But PyFlink is like SQL API, user-imported Java dependencies are comparatively rare, so fat jar is a proper choice. Best, Paul Lam > 2021年12月14日 19:

Re: UDF and Broadcast State Pattern

2021-12-14 Thread Caizhi Weng
Hi! Currently you can't use broadcast state in Flink SQL UDF because UDFs are all stateless. However you mentioned your use case that you want to control the logic in UDF with some information. If that is the case, you can just run a thread in your UDF to read that information and change the beha

Re: Python Interop with Java defined org.apache.flink.api.common.functions.Function classes?

2021-12-14 Thread Dian Fu
Hi Kevin, You could try to use it as following: ``` from pyflink.java_gateway import get_gateway jvm = get_gateway().jvm ds = ( DataStream(ds._j_data_stream.map(jvm.com.example.MyJavaMapFunction())) ) ``` Regards, Dian On Wed, Dec 15, 2021 at 5:41 AM Kevin Lam wrote: > Hi all, > > We cur

Re: FileSource with Parquet Format - parallelism level

2021-12-14 Thread Krzysztof Chmielewski
Hi Arvid, thank you for your response. I did a little bit more digging and analyzing and I noticed one thing, Please correct me if I'm wrong. Whether the Parquet file will be read in parallel in fact depends on underlying file system. If the file system supports file blocks then we will have spli

UDF and Broadcast State Pattern

2021-12-14 Thread Krzysztof Chmielewski
Hi, Is there a way to build an UDF [1] for FLink SQL that can be used with Broadcast State Pattern [2]? I have a use case, where I would like to be able to use broadcast control stream to change logic in UDF. Regards, Krzysztof Chmielewski [1] https://nightlies.apache.org/flink/flink-docs-master

Python Interop with Java defined org.apache.flink.api.common.functions.Function classes?

2021-12-14 Thread Kevin Lam
Hi all, We currently operate several Flink applications using the Scala API, and run on kubernetes in Application mode. We're interested in researching the Python API and how we can support Python for application developers that prefer to use Python. We have a common library which implements a nu

unaligned checkpoint for job with large start delay

2021-12-14 Thread Mason Chen
Hi all, I'm using Flink 1.13 and my job is experiencing high start delay, more so than high alignment time. (our flip 27 kafka source is heavily backpressured). Since our alignment timeout is set to 1s, the unaligned checkpoint never triggers since alignment delay is always below the threshold. I

reading gz files

2021-12-14 Thread Egor Ryashin
Hey, I’m using Flink 1.14 and having trouble ingesting data from json gz file. I’ve successfully created a table but number of records is wrong. I’m using this SQL: create table i1( line_item_id STRING ) with ( 'connector'='filesystem', 'path'='/Users/egorryashin/temp/test.json', 'format' = '

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-14 Thread Yun Gao
Hi, > I think setting taskmanager.network.sort-shuffle.min-parallelism to 1 and > using sort-shuffle for all cases by default is a good suggestion. I am not > choosing this value mainly because two reasons: > 1. The first one is that it increases the usage of network memory which may > cause

Re: Sending an Alert to Slack, AWS sns, mattermost

2021-12-14 Thread Seth Wiesman
Sure, Just implement `RichSinkFunction`. You will initialize your client inside the open method and then send alerts from invoke. Seth On Mon, Dec 13, 2021 at 9:17 PM Robert Cullen wrote: > Yes, That's the correct use case. Will this work with the DataStream > API? UDFs are for the Table API

Re: Java dependencies management in Pyflink

2021-12-14 Thread Dian Fu
Hi Paul, For connectors(including Kafka), it's recommended to use the fat jar which contains the dependencies. For example, for kafka, you could use https://repo.maven.apache.org/maven2/org/apache/flink/flink-sql-connector-kafka_2.11/1.14.0/flink-sql-connector-kafka_2.11-1.14.0.jar Regards, Dian

Re: [DISCUSS] Change some default config values of blocking shuffle

2021-12-14 Thread Till Rohrmann
As part of this FLIP, does it make sense to also extend the documentation for the sort shuffle [1] to include a tuning guide? I am thinking of a more in depth description of what things you might observe and how to influence them with the configuration options. [1] https://nightlies.apache.org/fli

Java dependencies management in Pyflink

2021-12-14 Thread Paul Lam
Hi! I’m trying out PyFlink and looking for the best practice to manage Java dependencies. The docs recommends to use ‘pipeline-jars’ configuration or command line options to specify jars for a PyFlink job. However, PyFlink users may not know what Java dependencies is required. For example, a

Re: [DISCUSS] Strong read-after-write consistency of Flink FileSystems

2021-12-14 Thread David Morávek
Any other thoughts on the topic? If there are no concerns, I'd continue with creating a FLIP for changing the "written" contract of the Flink FileSystems to reflect this. Best, D. On Wed, Dec 8, 2021 at 5:53 PM David Morávek wrote: > Hi Martijn, > > I simply wasn't aware of that one :) It seem