Re: Debugging Kryo Fallback

2024-04-02 Thread Salva Alcántara
FYI Reposted in SO: - https://stackoverflow.com/questions/78265380/how-to-debug-the-kryo-fallback-in-flink On Thu, Mar 28, 2024 at 7:24 AM Salva Alcántara wrote: > I wonder which is the simplest way of troubleshooting/debugging what > causes the Kryo fallback. > > Detecting it is just a matter

Re: join two streams with pyflink

2024-04-02 Thread Biao Geng
Hi Thierry, Your case is not very complex and I believe all programming language(e.g. Java, Python, SQL) interfaces of flink can do that. When using pyflink, you can use pyflink datastream/table/SQL API. Here are some examples of using pyflink table api:

Re: How to handle tuple keys with null values

2024-04-02 Thread Hang Ruan
Hi Sachin. I think maybe we could cast the Long as String to handle the null value. Or as Asimansu said, try to filter out the null data. Best, Hang Asimansu Bera 于2024年4月3日周三 08:35写道: > Hello Sachin, > > The same issue had been reported in the past and JIRA was closed without > resolution. >

Execute Python UDF in Java Flink application

2024-04-02 Thread Zhou, Tony
Hi everyone, Out of curiosity, I have a high level question with Flink: I have a use case where I want to define some UDFs in Python while have the main logics written in Java. I am wondering how complex it is going to be with this design choice, or even if it is possible with Flink. Thanks,

Re: GCS FileSink Read Timeouts

2024-04-02 Thread Asimansu Bera
Hello Dylan, I'm not an expert. There are many configuration settings(tuning) which could be setup via flink configuration. Pls refer to the second link below - specifically retry options. https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/filesystems/gcs/

Re: How to handle tuple keys with null values

2024-04-02 Thread Asimansu Bera
Hello Sachin, The same issue had been reported in the past and JIRA was closed without resolution. https://issues.apache.org/jira/browse/FLINK-4823 I do see this is as a data quality issue. You need to understand what you would like to do with the null value. Either way, better to filter out

GCS FileSink Read Timeouts

2024-04-02 Thread Dylan Fontana via user
Hey Flink Users, We've been facing an issue with GCS that I'm keen to hear the community's thoughts or insights on. We're using the GCS FileSystem on a FileSink to write parquets in our Flink app. We're finding sporadic instances of `com.google.cloud.storage.StorageException: Read timed out`

Re: Understanding checkpoint/savepoint storage requirements

2024-04-02 Thread Robert Young
Thank you both for the information! Rob On Thu, Mar 28, 2024 at 7:08 PM Asimansu Bera wrote: > To add more details to it so that it will be clear why access to > persistent object stores for all JVM processes are required for a job graph > of Flink for consistent recovery. > *JoB Manager:* > >

How to handle tuple keys with null values

2024-04-02 Thread Sachin Mittal
Hello folks, I am keying my stream using a Tuple: example: public class MyKeySelector implements KeySelector> { @Override public Tuple2 getKey(Data data) { return Tuple2.of(data.id, data.id1); } } Now id1 can have null values. In this case how should I handle this? Right now I am getting

join two streams with pyflink

2024-04-02 Thread Fokou Toukam, Thierry
Hi, i have 2 streams as sean in this example (6> {"tripId": "275118740", "timestamp": "2024-04-02T06:20:00Z", "stopSequence": 13, "stopId": "61261", "bearing": 0.0, "speed": 0.0, "vehicleId": "39006", "routeId": "747"} 1> {"visibility": 1, "weather_conditions": "clear sky", "timestamp":