write into parquet with variable number of columns

2021-08-05 Thread Sharipov, Rinat
va:334)* Maybe someone has tried this feature and can guess what's wrong with the current code and how to make it work. Anyway I have a fallback - accumulate a batch of events, define the schema for them and write into the file system manually, but I still hope that I can do this in a more elegant way. Thx for your advice and time ! -- Best Regards, *Sharipov Rinat* CleverDATA make your data clever
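A minimal sketch of the fallback mentioned above (accumulate a batch, derive one schema from it, write the file manually), assuming plain pyarrow outside of Flink; all names and paths are illustrative:

    import pyarrow as pa
    import pyarrow.parquet as pq

    def flush_batch(rows, path):
        # rows is a list of dict-shaped events with a varying set of keys;
        # the schema is derived from the union of all keys in the batch,
        # and events missing a key get nulls in that column
        columns = sorted({key for row in rows for key in row})
        table = pa.table({c: [row.get(c) for row in rows] for c in columns})
        pq.write_table(table, path)

    # usage with two events that carry different columns
    flush_batch([{"a": 1, "b": "x"}, {"a": 2, "c": 3.5}], "/tmp/batch-0.parquet")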

Re: [Flink::Test] access registered accumulators via harness

2020-10-30 Thread Sharipov, Rinat
"myCustomCounter1").getCount(), equalTo(1)); assertThat(harness.getMetricGroup().counter("myCustomCounter2").getCount(), equalTo(0)); } } What do you think about such a harness API proposal ? Thx ! On Fri, 30 Oct 2020 at 12:54, Dawid Wysakowicz wrote: > Hi Rinat, > > F

[Flink::Test] access registered accumulators via harness

2020-10-27 Thread Sharipov, Rinat
Hi mates ! I guess that I'm doing something wrong, but I couldn't find a way to access registered accumulators and their values via the org.apache.flink.streaming.util.ProcessFunctionTestHarness function wrapper that I'm using to test my functions. During the code research I've found that

Re: PyFlink :: Bootstrap UDF function

2020-10-15 Thread Sharipov, Rinat
Hi Dian ! Thx a lot for your reply, it's very helpful for us. On Thu, 15 Oct 2020 at 04:30, Dian Fu wrote: > Hi Rinat, > > It's called in single thread fashion and so there is no need for the > synchronization. > > Besides, there is a pair of open/close methods in the ScalarFuncti
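For reference, a minimal sketch of the open/close pattern described in the reply above, assuming PyFlink 1.11; load_model is a hypothetical helper standing in for the ML-registry call:

    from pyflink.table import DataTypes
    from pyflink.table.udf import ScalarFunction, udf

    class Predict(ScalarFunction):

        def open(self, function_context):
            # called once per parallel instance, in a single-threaded
            # fashion, so no synchronization is needed here
            self._model = load_model("my-model")  # hypothetical registry call

        def close(self):
            self._model = None  # release the model on shutdown

        def eval(self, features):
            return float(self._model.predict(features))

    predict = udf(Predict(), input_types=[DataTypes.STRING()],
                  result_type=DataTypes.DOUBLE())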

PyFlink :: Bootstrap UDF function

2020-10-14 Thread Sharipov, Rinat
Hi mates ! I keep moving in my research of new features of PyFlink and I'm really excited about that functionality. My main goal is to understand how to integrate our ML registry, powered by ML Flow, with PyFlink jobs and what restrictions we have. I need to bootstrap the UDF function on its

Re: PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-13 Thread Sharipov, Rinat
t. > > Best, > Xingbo > > Sharipov, Rinat wrote on Tue, 13 Oct 2020 at 01:56: > >> Hi mates ! >> >> I'm very new to pyflink and trying to register a custom UDF function >> using the python API. >> Currently I'm facing an issue in both the server env and my local IDE >>

PyFlink 1.11.2 couldn’t configure [taskmanager.memory.task.off-heap.size] property when registering custom UDF function

2020-10-12 Thread Sharipov, Rinat
Hi mates ! I'm very new to pyflink and trying to register a custom UDF function using the python API. Currently I'm facing an issue in both the server env and my local IDE environment. When I try to execute the example below I get an error message: *The configured Task Off-Heap Memory 0 bytes is less
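The documented fix for this error in PyFlink 1.11 is to reserve task off-heap memory for the Python UDF worker before executing the job; a sketch, assuming a table environment named t_env:

    # reserve off-heap memory for the Python UDF worker
    t_env.get_config().get_configuration().set_string(
        "taskmanager.memory.task.off-heap.size", "80m")
    # alternatively, python.fn-execution.memory.managed can be set to true
    # so the Python worker uses managed memory instead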

Re: [PyFlink] update udf functions on the fly

2020-10-12 Thread Sharipov, Rinat
Hi Dian, thx for your reply ! I was wondering whether I could replace the UDF on the fly from Flink; of course I'm pretty sure that it's possible to implement the update logic directly in Python, thx for the idea. Regards, Rinat On Mon, 12 Oct 2020 at 14:20, Dian Fu wrote: > Hi Rinat, > > Do you want to replace
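A minimal sketch of that idea - reload inside the UDF whenever the registry publishes a new version - assuming the current model is exposed as a file whose modification time changes; the path and the load_model helper are hypothetical:

    import os

    from pyflink.table.udf import ScalarFunction

    class ReloadingPredict(ScalarFunction):

        MODEL_PATH = "/models/current"  # hypothetical registry mount

        def open(self, function_context):
            self._loaded_at = -1.0
            self._model = None

        def _maybe_reload(self):
            # cheap staleness check on every call; reload only when a
            # newer model version has been published
            mtime = os.path.getmtime(self.MODEL_PATH)
            if mtime > self._loaded_at:
                self._model = load_model(self.MODEL_PATH)  # hypothetical
                self._loaded_at = mtime

        def eval(self, features):
            self._maybe_reload()
            return float(self._model.predict(features))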

Re: [PyFlink] update udf functions on the fly

2020-10-12 Thread Sharipov, Rinat
is - can I update the UDF function on the fly without a job restart ? Because new model versions become available on a daily basis and we should use them as soon as possible. Thx ! On Mon, 12 Oct 2020 at 11:32, Arvid Heise wrote: > Hi Rinat, > > Which API are you using? If you use datas

Re: [PyFlink] register udf functions with different versions of the same library in the same job

2020-10-12 Thread Sharipov, Rinat
what would be even better with different > python environments and they won't clash > > A PyFlink job All nodes use the same python environment path currently. So > there is no way to make each UDF use a different python execution > environment. Maybe you need to use multiple jobs to a

[PyFlink] update udf functions on the fly

2020-10-10 Thread Sharipov, Rinat
Hi mates ! I'm at the beginning of the road of building a recommendation pipeline on top of Flink. I'm going to register a list of UDF python functions at job startup, where each UDF is an ML model. Over time new model versions appear in the ML registry and I would like to update my UDF

[PyFlink] register udf functions with different versions of the same library in the same job

2020-10-09 Thread Sharipov, Rinat
Hi mates ! I've just read an amazing article about PyFlink and I'm absolutely delighted. I got some questions about udf registration, and it seems that it's possible to specify the list of libraries that
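For the archives: the per-job dependency-management calls in question (PyFlink 1.11) look roughly like this, assuming a table environment named t_env and illustrative paths:

    # ship a requirements file so every worker installs the listed libraries
    t_env.set_python_requirements("requirements.txt")
    # or ship individual python files and packed virtual environments
    t_env.add_python_file("udfs/model_utils.py")
    t_env.add_python_archive("venv.zip")

Note that, as the reply above explains, all UDFs of one job share a single python environment, so two versions of the same library still can't coexist within one job.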

“feedback loop” and checkpoints in iterative streams

2020-04-04 Thread Sharipov, Rinat
Hi mates, for some reason it's necessary to create a feedback loop in my streaming application. The best choice to implement it was an iterative stream, but at the time of job implementation (flink version 1.6.1) it wasn't checkpointed. So I decided to write this output into kafka. As I see,

StreamingFileSink with hdfs less than 2.7

2019-06-17 Thread Rinat
from the tmp directory move the file to the final state, what do you think about it ? Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

Re: [Flink 1.6.1] _metadata file in retained checkpoint

2019-06-14 Thread Rinat
Hi Vasyl, thx for your reply, I’ll check > On 10 Jun 2019, at 14:22, Vasyl Bervetskyi wrote: > > Hi Rinat, > > A savepoint needs to be triggered when you want to create a point in time which > you want to use in the future to revert your state back; also you could cancel > the job

[Flink 1.6.1] _metadata file in retained checkpoint

2019-06-05 Thread Rinat
? Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

expose number of entries in map state as a metric

2019-03-07 Thread Rinat
of time Thx a lot Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

event time & watermarks in connected streams with broadcast state

2019-02-27 Thread Rinat
ice with the more convenient approach. Thx ! Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

[flink :: connected-streams :: integration-tests]

2019-02-21 Thread Rinat
state manually on job startup, but I still can’t find out how to do that. Thx for your advice. Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

Re: In-Memory state serialization with kryo fails

2019-02-15 Thread Rinat
; Gordon > > [1] > https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/state/custom_serialization.html > On Wed, Feb 13, 2019 at 2:56 AM Rinat

[flink :: connected-streams :: integration-tests]

2019-02-15 Thread Rinat
state manually on job startup, but I still can’t find out how to do that. Thx for your advice. Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

In-Memory state serialization with kryo fails

2019-02-12 Thread Rinat
cerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

[Flink 1.6.1] local app infinitely hangs

2018-10-15 Thread Rinat
"VM Thread" os_prio=31 tid=0x7f9b6a01b800 nid=0x2d03 runnable "GC task thread#0 (ParallelGC)" os_prio=31 tid=0x7f9b6900b000 nid=0x2503 runnable "GC task thread#1 (ParallelGC)" os_prio=31 tid=0x7f9b6900c000 nid=0x2703 runnable "GC task thread#2 (ParallelGC)" os_prio=31 tid=0x7f9b6900c800 nid=0x2903 runnable "GC task thread#3 (ParallelGC)" os_prio=31 tid=0x7f9b6900d000 nid=0x2b03 runnable "VM Periodic Task Thread" os_prio=31 tid=0x7f9b68a67000 nid=0x4d03 waiting on condition JNI global references: 682 Sincerely yours,Rinat SharipovSoftware Engineer at 1DMP CORE Teamemail: r.shari...@cleverdata.rumobile: +7 (925) 416-37-26CleverDATAmake your data clever

Re: [BucketingSink] notify on moving into pending/ final state

2018-10-11 Thread Rinat
://issues.apache.org/jira/browse/FLINK-9592 Thx ! > On 14 Jun 2018, at 23:29, Rinat wrote: > > Hi Piotr, I’ve created an issue > https://issues.apache.org/jira/browse/FLINK-9592 > > The third proposal looks great, may I t

Re: [deserialization schema] skip data, that couldn't be properly deserialized

2018-10-10 Thread Rinat
Hi Fabian, I have created the issue, https://issues.apache.org/jira/browse/FLINK-10525 Thx ! > On 10 Oct 2018, at 16:47, Fabian Hueske wrote: > > Hi Rinat, > > Thanks for discussing this idea. Yes, I think this would be a good feature. > Can you open a Jira issue and de

[deserialization schema] skip data, that couldn't be properly deserialized

2018-10-04 Thread Rinat
and other Deserializers in the source code ? Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

Re: [BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-07-06 Thread Rinat
he whole flink repository, to run your example locally from the IDE, but from my point of view the problem exists, and the following test shows its existence, please have a look. I’m working on the flink project assembly on my local machine … Thx > On 25 Jun 2018, at 10:44, Rinat wrote: > > Hi

Re: [BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-06-25 Thread Rinat
<>("test1", 1L)); testHarness.setProcessingTime(101L); testHarness.snapshot(0, 0); testHarness.notifyOfCompletedCheckpoint(0); sink.close(); assertThat(Files.exists(bucket.resolve("part-0-1")), is(true)); } > On 24 Jun 2018, at 06:02, zhangminglei <1871

Re: [BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-06-23 Thread Rinat
Hi mates, could anyone please have a look at my PR that fixes the issue of incorrect indexing in the BucketingSink component ? Thx > On 18 Jun 2018, at 10:55, Rinat wrote: > > I’ve created a JIRA issue https://issues.apache.org/jira/browse/FLINK-9603

Re: [BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-06-18 Thread Rinat
I’ve created a JIRA issue https://issues.apache.org/jira/browse/FLINK-9603 and added a proposal with a PR. Thx > On 16 Jun 2018, at 17:21, Rinat wrote: > > Hi mates, since the 1.5 release, BucketingSink has the ability to c

[BucketingSink] incorrect indexing of part files, when part suffix is specified

2018-06-16 Thread Rinat
Before creating the writer, we append the partSuffix here, but it should already be appended before the index checks: if (partSuffix != null) { partPath = partPath.suffix(partSuffix); } I’ll create an issue and try to submit a fix. Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email:

Re: [BucketingSink] notify on moving into pending/ final state

2018-06-14 Thread Rinat
} > > Alternatively, we could implement before mentioned callbacks support in > TwoPhaseCommitSinkFunction and provide such feature to > Kafka/Pravega/BucketingSink at once. > > Piotrek Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

Re: [BucketingSink] notify on moving into pending/ final state

2018-06-13 Thread Rinat
such feature is that Kostas is now > working on a larger BucketingSink rework/refactor. > > Piotrek > >> On 8 Jun 2018, at 16:38, Rinat wrote: >> >> Hi mates, I got a proposal about the functionality of BucketingSink. >

Checkpoint/ Savepoint usage

2018-06-13 Thread Rinat
RocksDB, it could only be used as a checkpointing backend, and when I decide to create a savepoint, it’ll be stored in hdfs ? do we have any ability to configure the job to use the last checkpoint as the starting state out of the box ? Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team

Re: State life-cycle for different state-backend implementations

2018-06-13 Thread Rinat
Hi Sihua, Thx for your reply > On 9 Jun 2018, at 11:42, sihua zhou wrote: > > Hi Rinat, > > I think there is one configuration {{state.checkpoints.num-retained}} to > control the maximum number of completed checkpoints to retain, the default > value is 1. So the risk

State life-cycle for different state-backend implementations

2018-06-08 Thread Rinat
. Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

[BucketingSink] notify on moving into pending/ final state

2018-06-08 Thread Rinat
could be useful, and could be a part of the BucketingSink API. What do you think, should I make a PR ? Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverdata.ru mobile: +7 (925) 416-37-26 CleverDATA make your data clever

Re: [flink-connector-filesystem] OutOfMemory in checkpointless environment

2018-06-08 Thread Rinat
ld > make sure that it fails early if it used that way. > > It would be great if you could open a JIRA. > > On 08.06.2018 10:08, Rinat wrote: >> Piotr, thx for your reply, for now everything is pretty clear. But from my >> point of view, it’s better to add some infor

Re: [flink-connector-filesystem] OutOfMemory in checkpointless environment

2018-06-08 Thread Rinat
asons, you could always > use for example > org.apache.flink.streaming.api.datastream.DataStream#writeUsingOutputFormat > and write your own simple `OutputFormat` or look if one of the existing ones > meet your needs. > > Piotrek > >> On 7 Jun 2018, at 14:23, Rinat > &

[flink-connector-filesystem] OutOfMemory in checkpointless environment

2018-06-07 Thread Rinat
rtunity to run the job without checkpointing, but really, if I do so, I get an exception in the sink component. What do you think about this ? Has anyone had the same problem, and how have you solved it ? Sincerely yours, Rinat Sharipov Software Engineer at 1DMP CORE Team email: r.shari...@cleverd

Re: Flink send checkpointing message in IT

2017-11-07 Thread Rinat
check) > > On 02.11.2017 19:11, Rinat wrote: >> Chesnay, thanks for your reply, it was very helpful, but I took logic from >> this test template and tried to reuse it in my IT case, but found one more >> issue. >> I’ve registered an accumulator in my source function,

Re: Flink send checkpointing message in IT

2017-11-02 Thread Rinat
; How to do this depends a bit on how your test case is written, but you can > take a look at the SavepointMigrationTestBase#executeAndSavepoint which is > all about running jobs and triggering > savepoints once certain conditions have been met. > > On 30.10.2017 16:01, Rinat wrote

How to lock and wait, until checkpointing is completed

2017-10-30 Thread Rinat
Hi guys, got one more question for you, maybe someone has already implemented such a feature or found a good technique. I wrote an IT that runs a flink job that reads data from a kafka topic and flushes it onto the fs using BucketingSink. I implemented some custom logic that fires on

Flink send checkpointing message in IT

2017-10-30 Thread Rinat
Hi guys, I’ve got a question about working with checkpointing. I would like to implement an IT test where the source is a fixed collection of items and the sink performs additional logic when checkpointing is completed. I would like to force checkpointing to execute when all messages from my test

Re: How to test new sink

2017-10-23 Thread Rinat
is to understand the existing best practices for testing functions that use TimeService and after this, if necessary, create an issue with proposals. Thx. > On 23 Oct 2017, at 17:51, Timo Walther <twal...@apache.org> wrote: > > Hi Rinat, > > using one of the Flink tes

How to test new sink

2017-10-23 Thread Rinat
Hi !!! I’ve just implemented a new sink that extends the functionality of the existing BucketingSink; currently I’m trying to test the functionality related to timing. My sink implements ProcessingTimeCallback, similarly to the original BucketingSink. I’m trying to inject

BucketingSink with disabled checkpointing will never clean up its state

2017-10-20 Thread Rinat
Hi, got one more little question about BucketingSink with disabled checkpointing. In terms of my current task, I’m looking through the sources of BucketingSink and it seems that I’ve found an issue for the case when checkpointing is disabled. BucketingSink is a flink rich function that also

Re: Flink BucketingSink, subscribe on moving of file into final state

2017-10-20 Thread Rinat
Piotrek, thanks for your reply. Yes, now I’m looking for the most suitable way to extend the BucketingSink functionality to handle the moments of moving the file into the final state. I thought that maybe someone had already implemented such a thing or knows other approaches that will help me to not

Flink BucketingSink, subscribe on moving of file into final state

2017-10-20 Thread Rinat
Hi All ! I’m trying to create a meta-info file that contains a link to a file created by Flink BucketingSink. At first I was trying to implement my own org.apache.flink.streaming.connectors.fs.Writer that creates a meta-file on the close method call. But I understood that it’s not completely

Flink Job Deployment (Not enough resources)

2017-09-04 Thread Rinat
Hi everyone, I’ve got the following problem: when I’m trying to submit a new job and the cluster doesn’t have enough resources, the job submission fails with the following exception. But in YARN the job hangs and waits for the requested resources. When resources become available, the job runs successfully. What can I

Flink Job Deployment

2017-09-04 Thread Rinat
Hi folks ! I’ve got a question about running a flink job on top of YARN. Is there any possibility to store job sources in hdfs, for example /app/flink/job-name/ - /lib/*.jar - /etc/*.properties, and specify the directories that should be added to the job classpath ? Thx.

Re: write into hdfs using avro

2017-07-27 Thread Rinat
kWriter is a simple example of a writer that uses Avro for > serialization, and takes as input KV 2-tuples. > If you want to have a writer that takes as input your own event types, AFAIK > you’ll need to implement your own Writer. > > Cheers, > Gordon > > On 21 July

write into hdfs using avro

2017-07-21 Thread Rinat
Hi, folks ! I’ve got a little question: I’m trying to save a stream of events from Kafka into HDFS using org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink with AVRO serialization. If I understood properly, I should use some implementation of