guess.
Best,Zakelly
On Tue, Jan 30, 2024 at 2:15 PM Chirag Dewan via user
wrote:
Hi,
I was looking at the FLIP-254: Redis Streams Connector and I was wondering if
Flink ever considered Redis as a state backend? And if yes, why was it
discarded compared to RocksDB?
If someone can point me
Hi,
I was looking at the FLIP-254: Redis Streams Connector and I was wondering if
Flink ever considered Redis as a state backend? And if yes, why was it
discarded compared to RocksDB?
If someone can point me towards any deep dives on why RocksDB is a better fit
as a state backend, it would be
of defensive programming for a public interface and
the decision here is to be more lenient when facing potentially erroneous user
input rather than blow up the whole application with a NullPointerException.
Best,Alexander Fedulov
On Thu, 26 Oct 2023 at 07:35, Chirag Dewan via user
wrote:
Hi
Hi Arjun,
Flink's FileSource doesnt move or delete the files as of now. It will keep the
files as is and remember the name of the file read in checkpointed state to
ensure it doesnt read the same file twice.
Flink's source API works in a way that single Enumerator operates on the
JobManager.
Hi,
I was looking at this check in DefaultFileFilter:
public boolean test(Path path) {
final String fileName = path.getName();
if (fileName == null || fileName.length() == 0) {
return true;
}Was wondering how can a file name be null?
And if null, shouldnt this be return false?
* Rotate the keytab time
to time* The keytab can be encrypted at rest but that's fully custom logic
outside of Flink
G
On Fri, Sep 15, 2023 at 7:05 AM Chirag Dewan via user
wrote:
Hi,
I am trying to implement a HDFS Source connector that can collect files from
Kerberos enabled HDFS. As per
Hi,
I am trying to implement a HDFS Source connector that can collect files from
Kerberos enabled HDFS. As per the Kerberos support, I have provided my keytab
file to Job Managers and all the Task Managers.
Now, I understand that keytab file is a security concern and if left unsecured
can be
/docs/deployment/security/security-delegation-token/
G
On Tue, Sep 5, 2023 at 1:31 PM Chirag Dewan via user
wrote:
Hi,
I am trying to use the FileSource to collect files from HDFS. The HDFS cluster
is secured and has Kerberos enabled.
My Flink cluster runs on Kubernetes (not using the Fli
Hi,
I am trying to use the FileSource to collect files from HDFS. The HDFS cluster
is secured and has Kerberos enabled.
My Flink cluster runs on Kubernetes (not using the Flink operator) with 2 Job
Managers in HA and 3 Task Managers. I wanted to understand the correct way to
configure the
/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/reader/StreamFormat.java#L57
Best,Ron
Chirag Dewan via user 于2023年8月17日周四 12:00写道:
Hi,I am trying to collect files from HDFS in my DataStream job. I need to
collect two types of files - CSV and Parquet.
I understand that Flink
Hi,I am trying to collect files from HDFS in my DataStream job. I need to
collect two types of files - CSV and Parquet.
I understand that Flink supports both formats, but in Streaming mode, Flink
doesnt support splitting these formats. Splitting is only supported in Table
API.
I wanted to
Hi,
Can anyone share any experience on running Flink jobs across data centers?
I am trying to create a Multi site/Geo Replicated Kafka cluster. I want that my
Flink job to be closely colocated with my Kafka multi site cluster. If the
Flink job is bound to a single data center, I believe we will
Hi,
We are tying to use Flink's File sink to distribute files to AWS S3 storage. We
are using Flink provided Hadoop s3a connector as plugin.
We have some observations that we needed to clarify:
1. When using file sink for local filesystem distribution, we can see that the
sink creates 3
`CsvBulkWriter` and
create `FileSink` by `FileSink.forBulkFormat`. You can see the detail in
`DataStreamCsvITCase.testCustomBulkWriter`
Best,Shammon
On Tue, Mar 7, 2023 at 7:41 PM Chirag Dewan via user
wrote:
Hi,
I am working on a Java DataStream application and need to implement a File sink
Hi,
I am working on a Java DataStream application and need to implement a File sink
with CSV format.
I see that I have two options here - Row and Bulk
(https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/connectors/datastream/filesystem/#format-types-1)
So for CSV file distribution
Hi,
Is it possible to use Avro 1.11 with Flink 1.14? I know that Avro version is
still at 1.10, but due to my job using Avro 1.11, I was planning to use it in
Flink as well.
Also, I know that Avro 1.10 had some performance issues with Flink 1.12
([FLINK-19440] Performance regression on
Hi,
I need to manage geo-redundancy in my Kafka cluster across zones. I am planning
to do this with Apache Mirror Maker to maintain an active-passive site.
I wanted to understand consumer and producer failover when the primary cluster
fails. Is there any way to detect and failover Flink's Kafka
(i.e., the overview over all jobs).
On 16/02/2022 06:15, Chirag Dewan wrote:
Ah, should have looked better. I think
https://issues.apache.org/jira/browse/FLINK-25732 causes this.
Are there any side effects of this? How can I avoid this problem so that it
doesn't affect my processing
Ah, should have looked better. I think
https://issues.apache.org/jira/browse/FLINK-25732 causes this.
Are there any side effects of this? How can I avoid this problem so that it
doesn't affect my processing?
Thanks
On Wednesday, 16 February, 2022, 10:19:12 am IST, Chirag Dewan
wrote
Hi,
We are running a Flink cluster with 2 JMs in HA and 2 TMs on a standalone K8
cluster. After migrating to 1.14.3, we started to see some exceptions in the JM
logs:
2022-02-15 11:30:00,100 ERROR
org.apache.flink.runtime.rest.handler.job.JobIdsHandler [] POD_NAME:
sed by: org.apache.flink.util.SerializedThrowable:
java.lang.IllegalArgumentException: Key group 2 is not in
KeyGroupRange{startKeyGroup=64, endKeyGroup=95}.
I have checked that there's no concurrent access on the ValueState.
Any more leads?
Thanks,Chirag
On Monday, 7 June, 2021, 06:56:56 pm IST, Chirag Dewan
wrote:
by: org.apache.flink.util.SerializedThrowable:
java.lang.IllegalArgumentException: Key group 2 is not in
KeyGroupRange{startKeyGroup=64, endKeyGroup=95}.
I have checked that there's no concurrent access on the ValueState.
Any more leads?
Thanks,Chirag
On Monday, 7 June, 2021, 06:56:56 pm IST, Chirag Dewan
.
Does this lead to state corruption?
Thanks,Chirag
On Monday, 7 June, 2021, 08:54:39 am IST, Chirag Dewan
wrote:
Thanks for the reply Yun. I strangely don't see any nulls. And infact this
exception comes on the first few records and then job starts processing
normally.
Also, I don't see
Thanks for the reply Yun. I strangely don't see any nulls. And infact this
exception comes on the first few records and then job starts processing
normally.
Also, I don't see any reason for Concurrent access to the state in my code.
Could more CPU cores than task slots to the Task Manager be
Hi,
I am getting multiple exceptions while trying to use RocksDB as astate backend.
I have 2 Task Managers with 2 taskslots and 4 cores each.
Below is our setup:
Kafka(Topic with 2 partitions) ---> FlinkKafkaConsumer(2Parallelism) >
KeyedProcessFunction(4 Parallelism) >
the ProcessFunction on a keyed stream and
there you can use the TimerService. It is advised to use a KeyedProcessFunction
on a keyed stream, however for backwards compatibility the old behaviour has
been kept.
Hope that it clarifies the things a bit.
Best,
Dawid
On 17/03/2021 07:47, Chirag Dewan wrote
Hi,
Currently, both ProcessFunction and KeyedProcessFunction (and their CoProcess
counterparts) expose the Context and TimerService in the processElement()
method. However, if we use the TimerService in non keyed context, it gives a
runtime error.
I am a bit confused about these APIs. Is there
Hi,
I am intending to use the File source for a production use case. I have a few
use cases that are currently not supported like deleting a file once it's
processed.
So I was wondering if we can use this in production or write my own
implementation? Is there any recommendations around this?
it.
Cheers,
Till
On Mon, Feb 15, 2021 at 9:38 AM Chirag Dewan wrote:
Hi,
We configured Job Manager HA with Kubernetes strategy and found that the Web UI
for all 3 Job Managers is accessible on their configured rpc addresses. There's
no information on the Web UI that suggests which Job Manager
Hi,
We configured Job Manager HA with Kubernetes strategy and found that the Web UI
for all 3 Job Managers is accessible on their configured rpc addresses. There's
no information on the Web UI that suggests which Job Manager is the leader or
task managers are registered to. However, from the
Hi,
Can we have multiple replicas with ZK HA in K8 as well?In this case, how does
Task Managers and clients recover the Job Manager RPC address? Are they updated
in ZK?Also, since there are 3 replicas behind the same service endpoint and
only one of them is the leader, how should clients reach
Hi,
I am building an alerting system where based on some input events I need to
raise an alert from the user defined aggregate function.
My first approach was to use an asynchronous REST API to send alerts outside
the task slot. But this obviously involves IO from within the task and if I
Hi,
I am using Flink 1.7.2 with Kafka Connector 0.11 for Consuming records from
Kafka.
I observed that if the broker is down, Kafka Consumer does nothing but logs the
connection error and keeps on reconnecting to the broker. And infact the log
level seems to be DEBUG.
Is there any way to
Hi,
I was going through the Javadoc for CheckpointedFunction.java, it says that:
* // get the state data structure for the per-key state
* countPerKey = context.getKeyedStateStore().getReducingState(
* new ReducingStateDescriptor<>("perKeyCount", new
topic is related to event-time alignment in sources, which has been
actively discussed in the community in the past and we might be able to solve
this in a similar way in the future.
Cheers,
Konstantin
On Fri, Feb 8, 2019 at 5:48 PM Chirag Dewan wrote:
Hi Vadim,
I would be interested in this too.
Pr
Hi Vadim,
I would be interested in this too.
Presently, I have to read my lookup source in the open method and keep it in a
cache. By doing that I cannot make use of the broadcast state until ofcourse
the first emit comes on the Broadcast stream.
The problem with waiting the event stream is
Hi,
In the documentation, the JDBC sink is mentioned as a source on Table
API/stream.
Can I use the same sink with a Data stream as well?
My use case is to read the data from Kafka and send the data to Postgres.
I was also hoping to achieve Exactly-Once since these will mainly be Idempotent
Hi,
Is there some sort of endorsed lib in Flink yet?
A brief about my use case :
I am using a 3PP in my job which uses SLF4J as logging facade but has included
a log4j1 binding in its source code. And I am trying to use log4j2 for my Flink
application.
I wired Flink to use log4j2 - added all
(aggregateFunction, windowFunction) and register metrics in the
windowFunction?
Best,
Dawid
On 19/12/2018 04:30, Chirag Dewan wrote:
Hi,
I am writing a Flink job for aggregating events in a window.
I am trying to use the AggregateFunction implementation for this.
Now, since
I have a similar issue. I raised a JIRA :
https://issues.apache.org/jira/browse/FLINK-11198
Thanks,
Chirag
On Wednesday, 19 December, 2018, 11:35:02 AM IST, Fabian Hueske
wrote:
Hi,
AFAIK it is not possible to collect metrics for an AggregateFunction.You can
open a feature request by
Hi,
I am writing a Flink job for aggregating events in a window.
I am trying to use the AggregateFunction implementation for this.
Now, since WindowedStream does not allow a RichAggregateFunction for
aggregation, I cant use the RuntimeContext to get the Metric group.
I dont even see any other
Thread.currentThread().getContextClassLoader(), which always should have the
user-code ClassLoader set.
Best,Aljoscha
On 4. Oct 2018, at 12:14, Chirag Dewan wrote:
Hi All,
Is there any other way to get hold of the FlinkUserClassLoaderother than the
RuntimeContext?
The problem is, AggregateFunction
Hi All,
Is there any other way to get hold of the FlinkUserClassLoaderother than the
RuntimeContext?
The problem is, AggregateFunction cant be a RichFunction. I understand that's
because of the state merging issue(from a thread here earlier). Now, I need
DynamicClassLoading in
-to-broadcast-state-in-apache-flink
Am So., 30. Sep. 2018 um 10:48 Uhr schrieb Chirag Dewan
:
Thanks Lasse, that is rightly put. That's the only solution I can think of too.
Only thing which I can't get my head around is using the coMap and coFlatMap
functions with such a stream. Since they dont
is done first time is not simple but a simple solution
could be to implement a delay operation or keep the data in your process
function until data arrive from your database stream.
Med venlig hilsen / Best regardsLasse Nedergaard
Den 28. sep. 2018 kl. 06.28 skrev Chirag Dewan :
Hi,
I saw Apache
Hi,
I saw Apache Flink User Mailing List archive. - static/dynamic lookups in flink
streaming being discussed, and then I saw this FLIP
https://cwiki.apache.org/confluence/display/FLINK/FLIP-17+Side+Inputs+for+DataStream+API.
I know we havent made much progress on this topic. I still wanted to
#resuming-from- savepoints[2]: https://ci.apache.org/
projects/flink/flink-docs- release-1.5/monitoring/rest_
api.html#cancel-job-with- savepoint[3]: https://ci.apache.org/
projects/flink/flink-docs- release-1.5/ops/upgrading.html
Thanks, vino.
2018-07-19 14:25 GMT+08:00 Chirag Dewan :
Hi,
I am
Hi,
I am planning to use the Stop Service for stopping/resuming/pausing my Flink
Job. My intention is to stop sources before we take the savepoint i.e. stop
with savepoint.
I know that since Flink 1.4.2, Stop is not stable/not production ready.
With Flink 1.5 can it be used for stopping jobs?
Hi,
I am coming across a use case where I may have to run more than100 parallel
jobs(which may have different processing needs) on a Flink cluster.
My flink cluster, currently, has 1 Job Manager and 4/5 Task Managers depending
on the processing needed is running on a Kubernetes cluster with 3
Hi,
flink:latest docker image doesn't seem to work. I am not able to access the
Flink Dashboard after deploying it on Kubernetes.
Anyone else facing the issue?
Thanks,
Chirag
Hi,
I am evaluating some File Systems as state backend. I can see that Flink
currently supports S3, MAPRFS and HDFS as file systems.
However, I was hoping I can use Gluster as my state backend, since its already
a part of existing eco system. Since I have stateful operators in my job and I
am
Hi,
I am working on a use case where my Flink job needs to collect data from
thousands of sources.
As an example, I want to collect data from more than 2000 File Directories,
process(filter, transform) the data and distribute the processed data streams
to 200 different directories.
Are there
Hi,
I am trying to use Gluster File System as my FileSystem backed by RocksDB as
state backend. I can see from FsCheckpointStateOutputStream that the
DEFAULT_WRITE_BUFFER_SIZE = 4096.
Is the buffer size configurable in any way? Any idea about the checkpointing
performance with default buffer
I think you are looking for jobmanager.web.tmpdir along with upload.dir
>From the documentation :
-
jobmanager.web.tmpdir: This configuration parameter allows defining the Flink
web directory to be used by the web interface. The web interface will copy its
static files into the
ackend. This is because of some
internal implementation details that allow the FS checkpoints to be slightly
more consise in the file format but we might „de-optimize“ this minor
difference for the sake of compatibility in the near future.
Am 26.04.2018 um 15:22 schrieb Chirag Dewan <ch
resume your job (savepoints).
Best,Stefan
Am 26.04.2018 um 13:16 schrieb Chirag Dewan <chirag.dewa...@yahoo.in>:
Hi,
I am working on a use case where I need to store a large amount of data in
state. I am using RocksDB as my state backend. Now to ensure data replication,
I want to store the R
Hi,
I am working on a use case where I need to store a large amount of data in
state. I am using RocksDB as my state backend. Now to ensure data replication,
I want to store the RocksDB files in some distributed file system.
>From the documentation I can see that Flink recommends a list of
, which flushes any buffered records.
Is my understanding correct here? Or am I still missing something?
thanks,
Chirag
On Monday, 12 March, 2018, 12:59:51 PM IST, Chirag Dewan
<chirag.dewa...@yahoo.in> wrote:
Hi,
I am trying to use Kafka Sink 0.11 with ATLEAST_ONCE se
Hi LiYue,
This should help : Apache Flink 1.5-SNAPSHOT Documentation: Windows
|
|
| |
Apache Flink 1.5-SNAPSHOT Documentation: Windows
|
|
|
So basically you need to register a processing time trigger at every 10 minutes
and on callback, you can FIRE the window result like this:
Hi,
I am trying to use Kafka Sink 0.11 with ATLEAST_ONCE semantic and experiencing
some data loss on Task Manager failure.
Its a simple job with parallelism=1 and a single Task Manager. After a few
checkpoints(kafka flush's) i kill one of my Task Manager running as a container
on Docker Swarm.
should be roughly the same settings that you use in your JobManager. They
are described here:
https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#zookeeper-based-ha-mode
On 14. Feb 2018, at 15:32, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
Thanks Aljoscha.
I haven't chec
rectly connect to ZooKeeper
as well? They need this in order to find the JobManager leader.
Best,Aljoscha
On 14. Feb 2018, at 06:12, Chirag Dewan <chirag.dewa...@yahoo.in> wrote:
Hi,
I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For
JobManager HA, I have started a 3 no
Hi,
I am trying to deploy a Flink cluster (1 JM, 2TM) on a Docker Swarm. For
JobManager HA, I have started a 3 node zookeeper service on the same swarm
network and configured Flink's zookeeper quorum with zookeeper service
instances.
JobManager gets started with the LeaderElectionService and
63 matches
Mail list logo