Re: Redis as a State Backend

2024-01-31 Thread Chirag Dewan via user
guess. Best,Zakelly On Tue, Jan 30, 2024 at 2:15 PM Chirag Dewan via user wrote: Hi, I was looking at the FLIP-254: Redis Streams Connector and I was wondering if Flink ever considered Redis as a state backend? And if yes, why was it discarded compared to RocksDB?  If someone can point me

Redis as a State Backend

2024-01-29 Thread Chirag Dewan via user
Hi, I was looking at the FLIP-254: Redis Streams Connector and I was wondering if Flink ever considered Redis as a state backend? And if yes, why was it discarded compared to RocksDB?  If someone can point me towards any deep dives on why RocksDB is a better fit as a state backend, it would be

Re: Invalid Null Check in DefaultFileFilter

2023-10-26 Thread Chirag Dewan via user
of defensive programming for a public interface and the decision here is to be more lenient when facing potentially erroneous user input rather than blow up the whole application with a NullPointerException. Best,Alexander Fedulov On Thu, 26 Oct 2023 at 07:35, Chirag Dewan via user wrote: Hi

Re: Enhancing File Processing and Kafka Integration with Flink Jobs

2023-10-26 Thread Chirag Dewan via user
Hi Arjun, Flink's FileSource doesnt move or delete the files as of now. It will keep the files as is and remember the name of the file read in checkpointed state to ensure it doesnt read the same file twice. Flink's source API works in a way that single Enumerator operates on the JobManager.

Invalid Null Check in DefaultFileFilter

2023-10-25 Thread Chirag Dewan via user
Hi, I was looking at this check in DefaultFileFilter: public boolean test(Path path) { final String fileName = path.getName(); if (fileName == null || fileName.length() == 0) { return true; }Was wondering how can a file name be null? And if null, shouldnt this be return false?

Re: Securing Keytab File in Flink

2023-09-26 Thread Chirag Dewan via user
* Rotate the keytab time to time* The keytab can be encrypted at rest but that's fully custom logic outside of Flink G On Fri, Sep 15, 2023 at 7:05 AM Chirag Dewan via user wrote: Hi, I am trying to implement a HDFS Source connector that can collect files from Kerberos enabled HDFS. As per

Securing Keytab File in Flink

2023-09-14 Thread Chirag Dewan via user
Hi, I am trying to implement a HDFS Source connector that can collect files from Kerberos enabled HDFS. As per the Kerberos support, I have provided my keytab file to Job Managers and all the Task Managers. Now, I understand that keytab file is a security concern and if left unsecured can be

Re: Keytab Setup on Kubernetes

2023-09-06 Thread Chirag Dewan via user
/docs/deployment/security/security-delegation-token/ G On Tue, Sep 5, 2023 at 1:31 PM Chirag Dewan via user wrote: Hi, I am trying to use the FileSource to collect files from HDFS. The HDFS cluster is secured and has Kerberos enabled. My Flink cluster runs on Kubernetes (not using the Fli

Keytab Setup on Kubernetes

2023-09-05 Thread Chirag Dewan via user
Hi, I am trying to use the FileSource to collect files from HDFS. The HDFS cluster is secured and has Kerberos enabled. My Flink cluster runs on Kubernetes (not using the Flink operator) with 2 Job Managers in HA and 3 Task Managers. I wanted to understand the correct way to configure the

Re: Splitting in Stream Formats for File Source

2023-08-21 Thread Chirag Dewan via user
/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java#L57 Best,Ron Chirag Dewan via user 于2023年8月17日周四 12:00写道: Hi,I am trying to collect files from HDFS in my DataStream job. I need to collect two types of files - CSV and Parquet.  I understand that Flink

Splitting in Stream Formats for File Source

2023-08-16 Thread Chirag Dewan via user
Hi,I am trying to collect files from HDFS in my DataStream job. I need to collect two types of files - CSV and Parquet.  I understand that Flink supports both formats, but in Streaming mode, Flink doesnt support splitting these formats. Splitting is only supported in Table API. I wanted to

Flink Job across Data Centers

2023-04-12 Thread Chirag Dewan via user
Hi, Can anyone share any experience on running Flink jobs across data centers? I am trying to create a Multi site/Geo Replicated Kafka cluster. I want that my Flink job to be closely colocated with my Kafka multi site cluster. If the Flink job is bound to a single data center, I believe we will

Questions on S3 File Sink Behavior

2023-03-29 Thread Chirag Dewan via user
Hi,   We are tying to use Flink's File sink to distribute files to AWS S3 storage. We are using Flink provided Hadoop s3a connector as plugin. We have some observations that we needed to clarify: 1. When using file sink for local filesystem distribution, we can see that the sink creates 3

Re: CSV File Sink in Streaming Use Case

2023-03-10 Thread Chirag Dewan via user
`CsvBulkWriter` and create `FileSink` by `FileSink.forBulkFormat`. You can see the detail in `DataStreamCsvITCase.testCustomBulkWriter` Best,Shammon On Tue, Mar 7, 2023 at 7:41 PM Chirag Dewan via user wrote: Hi, I am working on a Java DataStream application and need to implement a File sink

CSV File Sink in Streaming Use Case

2023-03-07 Thread Chirag Dewan via user
Hi, I am working on a Java DataStream application and need to implement a File sink with CSV format. I see that I have two options here - Row and Bulk (https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem/#format-types-1) So for CSV file distribution

Avro 1.11 with Flink 1.14

2022-07-26 Thread Chirag Dewan via user
Hi, Is it possible to use Avro 1.11 with Flink 1.14? I know that Avro version is still at 1.10, but due to my job using Avro 1.11, I was planning to use it in Flink as well.  Also, I know that Avro 1.10 had some performance issues with Flink 1.12 ([FLINK-19440] Performance regression on

Kafka Source in a Geo Replicated Kafka Cluster

2022-03-02 Thread Chirag Dewan
Hi, I need to manage geo-redundancy in my Kafka cluster across zones. I am planning to do this with Apache Mirror Maker to maintain an active-passive site. I wanted to understand consumer and producer failover when the primary cluster fails. Is there any way to detect and failover Flink's Kafka

Re: Failed to serialize the result for RPC call : requestMultipleJobDetails after Upgrading to Flink 1.14.3

2022-02-16 Thread Chirag Dewan
(i.e., the overview over all jobs). On 16/02/2022 06:15, Chirag Dewan wrote: Ah, should have looked better. I think  https://issues.apache.org/jira/browse/FLINK-25732 causes this. Are there any side effects of this? How can I avoid this problem so that it doesn't affect my processing

Re: Failed to serialize the result for RPC call : requestMultipleJobDetails after Upgrading to Flink 1.14.3

2022-02-15 Thread Chirag Dewan
Ah, should have looked better. I think  https://issues.apache.org/jira/browse/FLINK-25732 causes this. Are there any side effects of this? How can I avoid this problem so that it doesn't affect my processing? Thanks On Wednesday, 16 February, 2022, 10:19:12 am IST, Chirag Dewan wrote

Failed to serialize the result for RPC call : requestMultipleJobDetails after Upgrading to Flink 1.14.3

2022-02-15 Thread Chirag Dewan
Hi, We are running a Flink cluster with 2 JMs in HA and 2 TMs on a standalone K8 cluster. After migrating to 1.14.3, we started to see some exceptions in the JM logs: 2022-02-15 11:30:00,100 ERROR org.apache.flink.runtime.rest.handler.job.JobIdsHandler      [] POD_NAME:

Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-09 Thread Chirag Dewan
sed by: org.apache.flink.util.SerializedThrowable: java.lang.IllegalArgumentException: Key group 2 is not in KeyGroupRange{startKeyGroup=64, endKeyGroup=95}. I have checked that there's no concurrent access on the ValueState. Any more leads? Thanks,Chirag On Monday, 7 June, 2021, 06:56:56 pm IST, Chirag Dewan wrote:

Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-08 Thread Chirag Dewan
by: org.apache.flink.util.SerializedThrowable: java.lang.IllegalArgumentException: Key group 2 is not in KeyGroupRange{startKeyGroup=64, endKeyGroup=95}. I have checked that there's no concurrent access on the ValueState. Any more leads? Thanks,Chirag On Monday, 7 June, 2021, 06:56:56 pm IST, Chirag Dewan

Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-07 Thread Chirag Dewan
. Does this lead to state corruption? Thanks,Chirag On Monday, 7 June, 2021, 08:54:39 am IST, Chirag Dewan wrote: Thanks for the reply Yun. I strangely don't see any nulls. And infact this exception comes on the first few records and then job starts processing normally. Also, I don't see

Re: Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-06 Thread Chirag Dewan
Thanks for the reply Yun. I strangely don't see any nulls. And infact this exception comes on the first few records and then job starts processing normally. Also, I don't see any reason for Concurrent access to the state in my code. Could more CPU cores than task slots to the Task Manager be

Multiple Exceptions during Load Test in State Access APIs with RocksDB

2021-06-05 Thread Chirag Dewan
Hi, I am getting multiple exceptions while trying to use RocksDB as astate backend.  I have 2 Task Managers with 2 taskslots and 4 cores each.  Below is our setup:   Kafka(Topic with 2 partitions) ---> FlinkKafkaConsumer(2Parallelism) > KeyedProcessFunction(4 Parallelism) >

Re: The Role of TimerService in ProcessFunction

2021-03-25 Thread Chirag Dewan
the ProcessFunction on a keyed stream and there you can use the TimerService. It is advised to use a KeyedProcessFunction on a keyed stream, however for backwards compatibility the old behaviour has been kept. Hope that it clarifies the things a bit. Best, Dawid On 17/03/2021 07:47, Chirag Dewan wrote

The Role of TimerService in ProcessFunction

2021-03-17 Thread Chirag Dewan
Hi, Currently, both ProcessFunction and KeyedProcessFunction (and their CoProcess counterparts) expose the Context and TimerService in the processElement() method. However, if we use the TimerService in non keyed context, it gives a runtime error.  I am a bit confused about these APIs. Is there

Production Readiness of File Source

2021-03-16 Thread Chirag Dewan
Hi, I am intending to use the File source for a production use case. I have a few use cases that are currently not supported like deleting a file once it's processed.  So I was wondering if we can use this in production or write my own implementation? Is there any recommendations around this?

Re: Understanding Job Manager Web UI in HA Mode

2021-02-15 Thread Chirag Dewan
it. Cheers, Till On Mon, Feb 15, 2021 at 9:38 AM Chirag Dewan wrote: Hi, We configured Job Manager HA with Kubernetes strategy and found that the Web UI for all 3 Job Managers is accessible on their configured rpc addresses. There's no information on the Web UI that suggests which Job Manager

Understanding Job Manager Web UI in HA Mode

2021-02-15 Thread Chirag Dewan
Hi, We configured Job Manager HA with Kubernetes strategy and found that the Web UI for all 3 Job Managers is accessible on their configured rpc addresses. There's no information on the Web UI that suggests which Job Manager is the leader or task managers are registered to. However, from the

Re: Flink Jobmanager HA deployment on k8s

2021-01-19 Thread Chirag Dewan
Hi, Can we have multiple replicas with ZK HA in K8 as well?In this case, how does Task Managers and clients recover the Job Manager RPC address? Are they updated in ZK?Also, since there are 3 replicas behind the same service endpoint and only one of them is the leader, how should clients reach

Throwing Recoverable Exceptions from Tasks

2020-12-27 Thread Chirag Dewan
Hi, I am building an alerting system where based on some input events I need to raise an alert from the user defined aggregate function.  My first approach was to use an asynchronous REST API to send alerts outside the task slot. But this obviously involves IO from within the task and if I

Flink Kafka Connection Failure Notifications

2019-04-30 Thread Chirag Dewan
Hi, I am using Flink 1.7.2 with Kafka Connector 0.11 for Consuming records from Kafka.  I observed that if the broker is down, Kafka Consumer does nothing but logs the connection error and keeps on reconnecting to the broker. And infact the log level seems to be DEBUG.  Is there any way to

Incorrect Javadoc in CheckpointedFunction.java?

2019-02-14 Thread Chirag Dewan
Hi, I was going through the Javadoc for CheckpointedFunction.java, it says that: * // get the state data structure for the per-key state * countPerKey = context.getKeyedStateStore().getReducingState( * new ReducingStateDescriptor<>("perKeyCount", new

Re: Broadcast state before events stream consumption

2019-02-13 Thread Chirag Dewan
topic is related to event-time alignment in sources, which has been actively discussed in the community in the past and we might be able to solve this in a similar way in the future.  Cheers,  Konstantin On Fri, Feb 8, 2019 at 5:48 PM Chirag Dewan wrote: Hi Vadim, I would be interested in this too.  Pr

Re: Broadcast state before events stream consumption

2019-02-08 Thread Chirag Dewan
Hi Vadim, I would be interested in this too.  Presently, I have to read my lookup source in the open method and keep it in a cache. By doing that I cannot make use of the broadcast state until ofcourse the first emit comes on the Broadcast stream. The problem with waiting the event stream is

JDBCAppendTableSink on Data stream

2019-02-05 Thread Chirag Dewan
Hi, In the documentation, the JDBC sink is mentioned as a source on Table API/stream.  Can I use the same sink with a Data stream as well? My use case is to read the data from Kafka and send the data to Postgres. I was also hoping to achieve Exactly-Once since these will mainly be Idempotent

Endorsed lib in Flink

2019-02-05 Thread Chirag Dewan
Hi, Is there some sort of endorsed lib in Flink yet? A brief about my use case : I am using a 3PP in my job which uses SLF4J as logging facade but has included a log4j1 binding in its source code. And I am trying to use log4j2 for my Flink application. I wired Flink to use log4j2 - added all

Re: Custom Metrics in Windowed AggregateFunction

2018-12-19 Thread Chirag Dewan
(aggregateFunction, windowFunction) and register metrics in the windowFunction? Best, Dawid On 19/12/2018 04:30, Chirag Dewan wrote: Hi, I am writing a Flink job for aggregating events in a window.  I am trying to use the AggregateFunction implementation for this.  Now, since

Re: Need the way to create custom metrics

2018-12-18 Thread Chirag Dewan
I have a similar issue. I raised a JIRA :  https://issues.apache.org/jira/browse/FLINK-11198 Thanks, Chirag On Wednesday, 19 December, 2018, 11:35:02 AM IST, Fabian Hueske wrote: Hi, AFAIK it is not possible to collect metrics for an AggregateFunction.You can open a feature request by

Custom Metrics in Windowed AggregateFunction

2018-12-18 Thread Chirag Dewan
Hi, I am writing a Flink job for aggregating events in a window.  I am trying to use the AggregateFunction implementation for this.  Now, since WindowedStream does not allow a RichAggregateFunction for aggregation, I cant use the RuntimeContext to get the Metric group.  I dont even see any other

Re: FlinkUserClassLoader in AggregateFunction

2018-10-05 Thread Chirag Dewan
  Thread.currentThread().getContextClassLoader(), which always should have the user-code ClassLoader set. Best,Aljoscha On 4. Oct 2018, at 12:14, Chirag Dewan wrote: Hi All, Is there any other way to get hold of the FlinkUserClassLoaderother than the RuntimeContext? The problem is, AggregateFunction

FlinkUserClassLoader in AggregateFunction

2018-10-04 Thread Chirag Dewan
Hi All, Is there any other way to get hold of the FlinkUserClassLoaderother than the RuntimeContext? The problem is, AggregateFunction cant be a RichFunction. I understand that's  because of the state merging issue(from a thread here earlier). Now, I need DynamicClassLoading in

Re: In-Memory Lookup in Flink Operators

2018-10-04 Thread Chirag Dewan
-to-broadcast-state-in-apache-flink Am So., 30. Sep. 2018 um 10:48 Uhr schrieb Chirag Dewan : Thanks Lasse, that is rightly put. That's the only solution I can think of too. Only thing which I can't get my head around is using the coMap and coFlatMap functions with such a stream. Since they dont

Re: In-Memory Lookup in Flink Operators

2018-09-30 Thread Chirag Dewan
is done first time is not simple but a simple solution could be to implement a delay operation or keep the data in your process function until data arrive from your database stream.  Med venlig hilsen / Best regardsLasse Nedergaard Den 28. sep. 2018 kl. 06.28 skrev Chirag Dewan : Hi, I saw Apache

In-Memory Lookup in Flink Operators

2018-09-27 Thread Chirag Dewan
Hi, I saw Apache Flink User Mailing List archive. - static/dynamic lookups in flink streaming being discussed, and then I saw this FLIP  https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API.   I know we havent made much progress on this topic. I still wanted to

Re: Production readiness of Flink Job Stop Service

2018-07-22 Thread Chirag Dewan
#resuming-from- savepoints[2]: https://ci.apache.org/ projects/flink/flink-docs- release-1.5/monitoring/rest_ api.html#cancel-job-with- savepoint[3]: https://ci.apache.org/ projects/flink/flink-docs- release-1.5/ops/upgrading.html Thanks, vino. 2018-07-19 14:25 GMT+08:00 Chirag Dewan : Hi, I am

Production readiness of Flink Job Stop Service

2018-07-19 Thread Chirag Dewan
Hi, I am planning to use the Stop Service for stopping/resuming/pausing my Flink Job. My intention is to stop sources before we take the savepoint i.e. stop with savepoint.  I know that since Flink 1.4.2, Stop is not stable/not production ready.  With Flink 1.5 can it be used for stopping jobs?

Hundreds of parallel jobs on Flink Cluster

2018-06-07 Thread Chirag Dewan
Hi, I am coming across a use case where I may have to run more than100 parallel jobs(which may have different processing needs) on a Flink cluster.  My flink cluster, currently, has 1 Job Manager and 4/5 Task Managers depending on the processing needed is running on a Kubernetes cluster with 3

Is Flink:1.5 Docker image broken?

2018-05-30 Thread Chirag Dewan
Hi, flink:latest docker image doesn't seem to work. I am not able to access the Flink Dashboard after deploying it on Kubernetes.   Anyone else facing the issue? Thanks, Chirag 

Gluster as file system for state backend

2018-05-30 Thread Chirag Dewan
Hi, I am evaluating some File Systems as state backend. I can see that Flink currently supports S3, MAPRFS and HDFS as file systems.  However, I was hoping I can use Gluster as my state backend, since its already a part of existing eco system. Since I have stateful operators in my job and I am

Large number of sources in Flink Job

2018-05-27 Thread Chirag Dewan
Hi, I am working on a use case where my Flink job needs to collect data from thousands of sources.  As an example, I want to collect data from more than 2000 File Directories, process(filter, transform) the data and distribute the processed data streams to 200 different directories. Are there

Flink FSStateBackend Checkpointing Buffer size

2018-05-14 Thread Chirag Dewan
Hi, I am trying to use Gluster File System as my FileSystem backed by RocksDB as state backend. I can see from FsCheckpointStateOutputStream that the  DEFAULT_WRITE_BUFFER_SIZE = 4096. Is the buffer size configurable in any way? Any idea about the checkpointing performance with default buffer

Re: Retaining uploaded job jars on Flink HA restarts on Kubernetes

2018-05-07 Thread Chirag Dewan
I think you are looking for jobmanager.web.tmpdir along with upload.dir  >From the documentation : - jobmanager.web.tmpdir: This configuration parameter allows defining the Flink web directory to be used by the web interface. The web interface will copy its static files into the

Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Chirag Dewan
ackend. This is because of some internal implementation details that allow the FS checkpoints to be slightly more consise in the file format but we might „de-optimize“ this minor difference for the sake of compatibility in the near future. Am 26.04.2018 um 15:22 schrieb Chirag Dewan <ch

Re: Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Chirag Dewan
resume your job (savepoints). Best,Stefan Am 26.04.2018 um 13:16 schrieb Chirag Dewan <chirag.dewa...@yahoo.in>: Hi, I am working on a use case where I need to store a large amount of data in state. I am using RocksDB as my state backend. Now to ensure data replication, I want to store the R

Using RocksDB as State Backend over a Distributed File System

2018-04-26 Thread Chirag Dewan
Hi, I am working on a use case where I need to store a large amount of data in state. I am using RocksDB as my state backend. Now to ensure data replication, I want to store the RocksDB files in some distributed file system. >From the documentation I can see that Flink recommends a list of

Re: Record Delivery Guarantee with Kafka 1.0.0

2018-03-13 Thread Chirag Dewan
, which flushes any buffered records.  Is my understanding correct here? Or am I still missing something?   thanks, Chirag   On Monday, 12 March, 2018, 12:59:51 PM IST, Chirag Dewan <chirag.dewa...@yahoo.in> wrote: Hi, I am trying to use Kafka Sink 0.11 with ATLEAST_ONCE se

Re: Partial aggregation result sink

2018-03-12 Thread Chirag Dewan
Hi LiYue, This should help : Apache Flink 1.5-SNAPSHOT Documentation: Windows | | | | Apache Flink 1.5-SNAPSHOT Documentation: Windows | | | So basically you need to register a processing time trigger at every 10 minutes and on callback, you can FIRE the window result like this:  

Record Delivery Guarantee with Kafka 1.0.0

2018-03-12 Thread Chirag Dewan
Hi, I am trying to use Kafka Sink 0.11 with ATLEAST_ONCE semantic and experiencing some data loss on Task Manager failure. Its a simple job with parallelism=1 and a single Task Manager. After a few checkpoints(kafka flush's) i kill one of my Task Manager running as a container on Docker Swarm. 

Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes

2018-02-14 Thread Chirag Dewan
should be roughly the same settings that you use in your JobManager. They are described here:  https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#zookeeper-based-ha-mode On 14. Feb 2018, at 15:32, Chirag Dewan <chirag.dewa...@yahoo.in> wrote: Thanks Aljoscha. I haven't chec

Re: Deploying Flink with JobManager HA on Docker Swarm/Kubernetes

2018-02-14 Thread Chirag Dewan
rectly connect to ZooKeeper as well? They need this in order to find the JobManager leader. Best,Aljoscha On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewa...@yahoo.in> wrote: Hi, I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For JobManager HA, I have started a 3 no

Deploying Flink with JobManager HA on Docker Swarm/Kubernetes

2018-02-13 Thread Chirag Dewan
Hi, I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For JobManager HA, I have started a 3 node zookeeper service on the same swarm network and configured Flink's zookeeper quorum with zookeeper service instances.  JobManager gets started with the LeaderElectionService and