Re: KafkaSource vs FlinkKafkaConsumer010

2022-01-31 Thread Francesco Guardiani
The latter link you posted refers to a very old Flink release. You should
use the first link, which refers to the latest release.
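
For reference, a minimal sketch of the new KafkaSource against the 1.14 API
(broker address, topic, and group id below are placeholders):

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // The unified KafkaSource replaces the deprecated FlinkKafkaConsumer variants.
        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")
                .setTopics("input-topic")
                .setGroupId("my-consumer-group")
                .setStartingOffsets(OffsetsInitializer.earliest())
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "Kafka Source").print();
        env.execute("KafkaSource example");
    }
}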

FG

On Tue, Feb 1, 2022 at 8:20 AM HG  wrote:

> Hello all
>
> I am confused.
> What is the difference between KafkaSource as defined in:
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/
> and FlinkKafkaConsumer010 as defined in
> https://nightlies.apache.org/flink/flink-docs-release-1.2/api/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer010.html
> 
>
> When should I use which?
>
> Regards Hans
>


Re: MetricRegistryTestUtils java class (flink-runtime/metrics) not found in source code version 1.14.3

2022-01-31 Thread Martijn Visser
Hi Prasanna,

Just a quick note that the Github links are all pointing to release
candidate 1 for 1.14.3. The released version is in
https://github.com/apache/flink/blob/release-1.14.3/flink-runtime/src/test/java/org/apache/flink/runtime/metrics/MetricRegistryTestUtils.java

Best regards,

Martijn

On Tue, 1 Feb 2022 at 07:35, Prasanna kumar 
wrote:

> NVM, I was able to find it in a different place:
>
>
> https://github.com/apache/flink/blob/release-1.14.3-rc1/flink-runtime/src/test/java/org/apache/flink/runtime/metrics/MetricRegistryTestUtils.java
>
> On Tue, Feb 1, 2022 at 11:58 AM Prasanna kumar <
> prasannakumarram...@gmail.com> wrote:
>
>> Hi,
>>
>> Team, we are writing our own Prometheus reporter to make sure that we
>> capture data as histograms rather than summaries.
>>
>> We were able to do it successfully in version 1.12.7.
>>
>> But while upgrading to version 1.14.3, we find
>> that MetricRegistryTestUtils is not available in the source code published
>> by Flink on GitHub.
>>
>> PrometheusReporterTest.java throws an error that this file is unavailable.
>>
>> [image: Screen Shot 2022-02-01 at 11.50.09 AM.png]
>>
>> [image: Screen Shot 2022-02-01 at 11.53.17 AM.png]
>>
>> The code below is in 1.12.7, but the method
>> defaultMetricRegistryConfiguration is deprecated in the latest version.
>>
>> [image: Screen Shot 2022-02-01 at 11.51.46 AM.png]
>>
>> Looking at the GitHub link
>> https://github.com/apache/flink/tree/release-1.14.3-rc1/flink-runtime/src/main/java/org/apache/flink/runtime/metrics
>> also shows that MetricRegistryTestUtils is not available. It's not
>> available in the master branch either:
>> https://github.com/apache/flink/tree/master/flink-runtime/src/main/java/org/apache/flink/runtime/metrics
>>
>> [image: Screen Shot 2022-02-01 at 11.55.19 AM.png]
>>
>> Could the community please add the class file on GitHub?
>>
>> Thanks,
>> Prasanna.
>>
>


KafkaSource vs FlinkKafkaConsumer010

2022-01-31 Thread HG
Hello all

I am confused.
What is the difference between KafkaSource as defined in:
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/connectors/datastream/kafka/
and FlinkKafkaConsumer010 as defined in
https://nightlies.apache.org/flink/flink-docs-release-1.2/api/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumer010.html


When should I use which?

Regards Hans


Re: Flink 1.14 metrics : taskmanager host name missing

2022-01-31 Thread Mayur Gubrele
Hi,

Can someone please take a look at this? It's a major blocker for us.

Thanks,
Mayur

On Fri, Jan 21, 2022 at 2:11 PM Mayur Gubrele 
wrote:

> Hello,
>
> We recently upgraded our Flink cluster to 1.14 and noticed that all the
> taskmanager metrics we receive in our Prometheus data source get host IPs
> instead of hostnames, which we used to get before we moved to 1.14.
>
> I see that on the Flink dashboard as well, under taskmanager details, host
> IPs are being populated now. This is a breaking change for us. Some of our
> APIs use this host tag value, which used to be the hostname.
>
> Can you tell us if there's a way we can configure Flink to get hostnames instead
> of IPs?
>
> Thanks,
> Mayur
>


Re: Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
Ok, it's working! Thanks. Just out of curiosity, why is the println in keyBy
printing twice?

On Mon, 31 Jan 2022 at 17:22, John Smith  wrote:

> Oh ok. I was reading here:
> https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/streaming_analytics/#latency-vs-completeness
> and I did a cut and paste lol
>
> Ok, I'll let you know.
>
> On Mon, 31 Jan 2022 at 17:18, Dario Heinisch 
> wrote:
>
>> Then you should be using a processing-time window, in your case:
>> TumblingProcessingTimeWindows
>>
>> See
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/
>> for more info
>> On 31.01.22 23:13, John Smith wrote:
>>
>> Hi Dario, I don't care about event time; I just want a tumbling window
>> over "processing time", i.e. count whatever I have in the last 5 minutes.
>>
>> On Mon, 31 Jan 2022 at 17:09, Dario Heinisch 
>> wrote:
>>
>>> Hi John
>>>
>>> This is because you are using event time (TumblingEventTimeWindows) but
>>> you do not have an event-time watermark strategy.
>>> It is also why I opened:
>>> https://issues.apache.org/jira/browse/FLINK-24623 because I feel like
>>> Flink should be throwing an exception in that case
>>> on startup.
>>>
>>> Take a look at the documentation at:
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/
>>> which should have everything.
>>>
>>> > In order to work with event time, Flink needs to know the events
>>> timestamps, meaning each element in the stream needs to have its event
>>> timestamp assigned. This is usually done by accessing/extracting the
>>> timestamp from > some field in the element by using a TimestampAssigner.
>>> > Timestamp assignment goes hand-in-hand with generating watermarks,
>>> which tell the system about progress in event time. You can configure this
>>> by specifying a WatermarkGenerator.
>>>
>>> Best regards,
>>>
>>> Dario
>>> On 31.01.22 22:28, John Smith wrote:
>>>
>>> Hi, I have the following job... I'm expecting the
>>> System.out.println(key.toString()); to at least print, but nothing prints.
>>>
>>> - .flatMap: Fires and prints my debug message once, as expected.
>>> - .keyBy: Also fires, but prints my debug message twice.
>>> - .apply: Doesn't seem to fire. The debug statement doesn't seem to
>>> print. I'm expecting it to print the key from the keyBy above.
>>>
>>> // NOTE: the MyEvent element type and Tuple3<String, String, String> key type are assumptions.
>>> DataStream<MyEvent> slStream = env.fromSource(kafkaSource,
>>>         WatermarkStrategy.noWatermarks(), "Kafka Source")
>>>     .uid(kafkaTopic).name(kafkaTopic)
>>>     .setParallelism(1)
>>>     .flatMap(new MapToMyEvent("my-event", "message")) // <--- This works
>>>     .uid("map-json-logs").name("map-json-logs");
>>>
>>> slStream.keyBy(new MinutesKeySelector(windowSizeMins)) // <--- This prints twice
>>>     .window(TumblingEventTimeWindows.of(Time.minutes(windowSizeMins)))
>>>     .apply(new WindowFunction<MyEvent, String, Tuple3<String, String, String>, TimeWindow>() {
>>>         @Override
>>>         public void apply(Tuple3<String, String, String> key, TimeWindow window,
>>>                 Iterable<MyEvent> input, Collector<String> out) throws Exception {
>>>             // This should print.
>>>             System.out.println(key.toString());
>>>             // Do nothing for now
>>>         }
>>>     })
>>>     .uid("process").name("process");
>>>
>>>
>>>
>>>


Re: Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
Oh ok. I was reading here:
https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/learn-flink/streaming_analytics/#latency-vs-completeness
and I did a cut and paste lol

Ok, I'll let you know.

On Mon, 31 Jan 2022 at 17:18, Dario Heinisch 
wrote:

> Then you should be using a processing-time window, in your case:
> TumblingProcessingTimeWindows
>
> See
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/
> for more info
> On 31.01.22 23:13, John Smith wrote:
>
> Hi Dario, I don't care about event time; I just want a tumbling window
> over "processing time", i.e. count whatever I have in the last 5 minutes.
>
> On Mon, 31 Jan 2022 at 17:09, Dario Heinisch 
> wrote:
>
>> Hi John
>>
>> This is because you are using event time (TumblingEventTimeWindows) but
>> you do not have an event-time watermark strategy.
>> It is also why I opened:
>> https://issues.apache.org/jira/browse/FLINK-24623 because I feel like
>> Flink should be throwing an exception in that case
>> on startup.
>>
>> Take a look at the documentation at:
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/
>> which should have everything.
>>
>> > In order to work with event time, Flink needs to know the events
>> timestamps, meaning each element in the stream needs to have its event
>> timestamp assigned. This is usually done by accessing/extracting the
>> timestamp from > some field in the element by using a TimestampAssigner.
>> > Timestamp assignment goes hand-in-hand with generating watermarks,
>> which tell the system about progress in event time. You can configure this
>> by specifying a WatermarkGenerator.
>>
>> Best regards,
>>
>> Dario
>> On 31.01.22 22:28, John Smith wrote:
>>
>> Hi, I have the following job... I'm expecting the
>> System.out.println(key.toString()); to at least print, but nothing prints.
>>
>> - .flatMap: Fires and prints my debug message once, as expected.
>> - .keyBy: Also fires, but prints my debug message twice.
>> - .apply: Doesn't seem to fire. The debug statement doesn't seem to
>> print. I'm expecting it to print the key from the keyBy above.
>>
>> // NOTE: the MyEvent element type and Tuple3<String, String, String> key type are assumptions.
>> DataStream<MyEvent> slStream = env.fromSource(kafkaSource,
>>         WatermarkStrategy.noWatermarks(), "Kafka Source")
>>     .uid(kafkaTopic).name(kafkaTopic)
>>     .setParallelism(1)
>>     .flatMap(new MapToMyEvent("my-event", "message")) // <--- This works
>>     .uid("map-json-logs").name("map-json-logs");
>>
>> slStream.keyBy(new MinutesKeySelector(windowSizeMins)) // <--- This prints twice
>>     .window(TumblingEventTimeWindows.of(Time.minutes(windowSizeMins)))
>>     .apply(new WindowFunction<MyEvent, String, Tuple3<String, String, String>, TimeWindow>() {
>>         @Override
>>         public void apply(Tuple3<String, String, String> key, TimeWindow window,
>>                 Iterable<MyEvent> input, Collector<String> out) throws Exception {
>>             // This should print.
>>             System.out.println(key.toString());
>>             // Do nothing for now
>>         }
>>     })
>>     .uid("process").name("process");
>>
>>
>>
>>


Re: Tumbling window apply will not "fire"

2022-01-31 Thread Dario Heinisch
Then you should be using a processing-time window, in your case:
TumblingProcessingTimeWindows


See 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/windows/ 
for more info
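
A minimal sketch of that change, reusing the names from the job quoted below
(the MyEvent element type and Tuple3 key type are assumptions):

slStream.keyBy(new MinutesKeySelector(windowSizeMins))
    // Processing-time windows fire off the machine clock, so no watermark
    // strategy is needed.
    .window(TumblingProcessingTimeWindows.of(Time.minutes(windowSizeMins)))
    .apply(new WindowFunction<MyEvent, String, Tuple3<String, String, String>, TimeWindow>() {
        @Override
        public void apply(Tuple3<String, String, String> key, TimeWindow window,
                Iterable<MyEvent> input, Collector<String> out) {
            System.out.println(key.toString());
        }
    });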


On 31.01.22 23:13, John Smith wrote:
Hi Dario, I don't care about event time; I just want a tumbling window
over "processing time", i.e. count whatever I have in the last 5 minutes.


On Mon, 31 Jan 2022 at 17:09, Dario Heinisch 
 wrote:


Hi John

This is because you are using event time
(TumblingEventTimeWindows) but you do not have an event-time
watermark strategy.
It is also why I opened:
https://issues.apache.org/jira/browse/FLINK-24623 because I feel
like Flink should be throwing an exception in that case
on startup.

Take a look at the documentation at:

https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/

which should have everything.

> In order to work with event time, Flink needs to know the events
timestamps, meaning each element in the stream needs to have its
event timestamp assigned. This is usually done by
accessing/extracting the timestamp from > some field in the
element by using a TimestampAssigner.
> Timestamp assignment goes hand-in-hand with generating
watermarks, which tell the system about progress in event time.
You can configure this by specifying a WatermarkGenerator.

Best regards,

Dario

On 31.01.22 22:28, John Smith wrote:

Hi, I have the following job... I'm expecting the
System.out.println(key.toString()); to at least print, but nothing prints.

- .flatMap: Fires and prints my debug message once, as expected.
- .keyBy: Also fires, but prints my debug message twice.
- .apply: Doesn't seem to fire. The debug statement doesn't seem
to print. I'm expecting it to print the key from the keyBy above.

// NOTE: the MyEvent element type and Tuple3<String, String, String> key type are assumptions.
DataStream<MyEvent> slStream = env.fromSource(kafkaSource,
        WatermarkStrategy.noWatermarks(), "Kafka Source")
    .uid(kafkaTopic).name(kafkaTopic)
    .setParallelism(1)
    .flatMap(new MapToMyEvent("my-event", "message")) // <--- This works
    .uid("map-json-logs").name("map-json-logs");

slStream.keyBy(new MinutesKeySelector(windowSizeMins)) // <--- This prints twice
    .window(TumblingEventTimeWindows.of(Time.minutes(windowSizeMins)))
    .apply(new WindowFunction<MyEvent, String, Tuple3<String, String, String>, TimeWindow>() {
        @Override
        public void apply(Tuple3<String, String, String> key, TimeWindow window,
                Iterable<MyEvent> input, Collector<String> out) throws Exception {
            // This should print.
            System.out.println(key.toString());
            // Do nothing for now
        }
    })
    .uid("process").name("process");



Re: Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
Hi Dario, I don't care about event time; I just want a tumbling window
over "processing time", i.e. count whatever I have in the last 5 minutes.

On Mon, 31 Jan 2022 at 17:09, Dario Heinisch 
wrote:

> Hi John
>
> This is because you are using event time (TumblingEventTimeWindows) but
> you do not have an event-time watermark strategy.
> It is also why I opened: https://issues.apache.org/jira/browse/FLINK-24623
> because I feel like Flink should be throwing an exception in that case
> on startup.
>
> Take a look at the documentation at:
> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/
> which should have everything.
>
> > In order to work with event time, Flink needs to know the events
> timestamps, meaning each element in the stream needs to have its event
> timestamp assigned. This is usually done by accessing/extracting the
> timestamp from > some field in the element by using a TimestampAssigner.
> > Timestamp assignment goes hand-in-hand with generating watermarks, which
> tell the system about progress in event time. You can configure this by
> specifying a WatermarkGenerator.
>
> Best regards,
>
> Dario
> On 31.01.22 22:28, John Smith wrote:
>
> Hi, I have the following job... I'm expecting the
> System.out.println(key.toString()); to at least print, but nothing prints.
>
> - .flatMap: Fires and prints my debug message once, as expected.
> - .keyBy: Also fires, but prints my debug message twice.
> - .apply: Doesn't seem to fire. The debug statement doesn't seem to print.
> I'm expecting it to print the key from the keyBy above.
>
> // NOTE: the MyEvent element type and Tuple3<String, String, String> key type are assumptions.
> DataStream<MyEvent> slStream = env.fromSource(kafkaSource,
>         WatermarkStrategy.noWatermarks(), "Kafka Source")
>     .uid(kafkaTopic).name(kafkaTopic)
>     .setParallelism(1)
>     .flatMap(new MapToMyEvent("my-event", "message")) // <--- This works
>     .uid("map-json-logs").name("map-json-logs");
>
> slStream.keyBy(new MinutesKeySelector(windowSizeMins)) // <--- This prints twice
>     .window(TumblingEventTimeWindows.of(Time.minutes(windowSizeMins)))
>     .apply(new WindowFunction<MyEvent, String, Tuple3<String, String, String>, TimeWindow>() {
>         @Override
>         public void apply(Tuple3<String, String, String> key, TimeWindow window,
>                 Iterable<MyEvent> input, Collector<String> out) throws Exception {
>             // This should print.
>             System.out.println(key.toString());
>             // Do nothing for now
>         }
>     })
>     .uid("process").name("process");
>
>
>
>


Re: Tumbling window apply will not "fire"

2022-01-31 Thread Dario Heinisch

Hi John

This is because you are using event time (TumblingEventTimeWindows) but
you do not have an event-time watermark strategy.
It is also why I opened: 
https://issues.apache.org/jira/browse/FLINK-24623 because I feel like 
Flink should be throwing an exception in that case

on startup.

Take a look at the documentation at: 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/event-time/generating_watermarks/ 


which should have everything.

> In order to work with event time, Flink needs to know the events 
timestamps, meaning each element in the stream needs to have its event 
timestamp assigned. This is usually done by accessing/extracting the 
timestamp from > some field in the element by using a TimestampAssigner.
> Timestamp assignment goes hand-in-hand with generating watermarks, 
which tell the system about progress in event time. You can configure 
this by specifying a WatermarkGenerator.
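
A minimal sketch of such a strategy for the job quoted below, assuming MyEvent
carries an epoch-millis timestamp exposed by a (hypothetical) getTimestamp()
accessor:

WatermarkStrategy<MyEvent> strategy = WatermarkStrategy
        // Tolerate up to 5 seconds of out-of-orderness (arbitrary choice).
        .<MyEvent>forBoundedOutOfOrderness(Duration.ofSeconds(5))
        .withTimestampAssigner((event, recordTimestamp) -> event.getTimestamp());

DataStream<MyEvent> slStream = env.fromSource(kafkaSource, strategy, "Kafka Source");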


Best regards,

Dario

On 31.01.22 22:28, John Smith wrote:
Hi, I have the following job... I'm expecting the
System.out.println(key.toString()); to at least print, but nothing prints.

- .flatMap: Fires and prints my debug message once, as expected.
- .keyBy: Also fires, but prints my debug message twice.
- .apply: Doesn't seem to fire. The debug statement doesn't seem to
print. I'm expecting it to print the key from the keyBy above.

// NOTE: the MyEvent element type and Tuple3<String, String, String> key type are assumptions.
DataStream<MyEvent> slStream = env.fromSource(kafkaSource,
        WatermarkStrategy.noWatermarks(), "Kafka Source")
    .uid(kafkaTopic).name(kafkaTopic)
    .setParallelism(1)
    .flatMap(new MapToMyEvent("my-event", "message")) // <--- This works
    .uid("map-json-logs").name("map-json-logs");

slStream.keyBy(new MinutesKeySelector(windowSizeMins)) // <--- This prints twice
    .window(TumblingEventTimeWindows.of(Time.minutes(windowSizeMins)))
    .apply(new WindowFunction<MyEvent, String, Tuple3<String, String, String>, TimeWindow>() {
        @Override
        public void apply(Tuple3<String, String, String> key, TimeWindow window,
                Iterable<MyEvent> input, Collector<String> out) throws Exception {
            // This should print.
            System.out.println(key.toString());
            // Do nothing for now
        }
    })
    .uid("process").name("process");



Read Avro type records from kafka using Python - Datastream API

2022-01-31 Thread Hussein El Ghoul
Hello,

I am currently working on a program that uses Flink to read Avro-typed records
from Kafka.
I have the Avro schema of the records I want to read in a file, but I looked
all over GitHub, the documentation, and Stack Overflow for examples on how to
use AvroRowDeserializationSchema to deserialize a Kafka topic with Avro types,
and could not find any.
Could you please provide an example using Python that is compatible with the
DataStream API?

Best Regards,
Hussein
Quiqup - Data Engineer

Tumbling window apply will not "fire"

2022-01-31 Thread John Smith
Hi, I have the following job... I'm expecting the
System.out.println(key.toString()); to at least print, but nothing prints.

- .flatMap: Fires and prints my debug message once, as expected.
- .keyBy: Also fires, but prints my debug message twice.
- .apply: Doesn't seem to fire. The debug statement doesn't seem to print.
I'm expecting it to print the key from the keyBy above.

// NOTE: the MyEvent element type and Tuple3<String, String, String> key type are assumptions.
DataStream<MyEvent> slStream = env.fromSource(kafkaSource,
        WatermarkStrategy.noWatermarks(), "Kafka Source")
    .uid(kafkaTopic).name(kafkaTopic)
    .setParallelism(1)
    .flatMap(new MapToMyEvent("my-event", "message")) // <--- This works
    .uid("map-json-logs").name("map-json-logs");

slStream.keyBy(new MinutesKeySelector(windowSizeMins)) // <--- This prints twice
    .window(TumblingEventTimeWindows.of(Time.minutes(windowSizeMins)))
    .apply(new WindowFunction<MyEvent, String, Tuple3<String, String, String>, TimeWindow>() {
        @Override
        public void apply(Tuple3<String, String, String> key, TimeWindow window,
                Iterable<MyEvent> input, Collector<String> out) throws Exception {
            // This should print.
            System.out.println(key.toString());
            // Do nothing for now
        }
    })
    .uid("process").name("process");


Re: Kafka Consumer Group Name not set if no checkpointing?

2022-01-31 Thread John Smith
Hi, yes, see below. It only seems to show the consumer offsets if
checkpointing is on... That's the only diff I can see between my two
different jobs. The moment I enabled it on the job, it started showing
in Kafka Explorer here: https://www.kafkatool.com/

return KafkaSource.builder()
.setBootstrapServers(bootstrapServers)
.setTopics(topic)
.setValueOnlyDeserializer(new VertxJsonSchema())
.setGroupId(consumerGroup)
.setStartingOffsets(oi)
.setProperties(props)
.build();


On Mon, 31 Jan 2022 at 12:03, Fabian Paul  wrote:

> Hi John,
> First I would like to ask you to give us some more information about
> how you consume from Kafka with Flink. It is currently recommended to
> use the KafkaSource that subsumes the deprecated FlinkKafkaConsumer.
>
> One thing to already note is that by default Flink does not commit the
> Kafka offset back to the topic because it is not needed from a
> Flink perspective and is only supported on a best-effort basis if a
> checkpoint completes.
>
> I am not very familiar with Kafka Explorer but I can imagine it only
> shows the consumer group if there are actually committed offsets.
>
> Best,
> Fabian
>
> On Mon, Jan 31, 2022 at 3:41 PM John Smith  wrote:
> >
> > Hi, using Flink 1.14.3, I have a job that doesn't use checkpointing. I
> have noticed that the Kafka Consumer Group is not set?
> >
> > I use Kafka Explorer to see all the consumers and when I run the job I
> don't see the consumer group. Finally I decided to enable checkpointing and
> restart the job and finally saw the consumer group.
> >
> > Is this expected behaviour?
>


Re: Resolving a CatalogTable

2022-01-31 Thread Balázs Varga
Hi Timo,

Thanks for the reply. I've thought a bit more about the problem,
considering your points.
This is not critical as of now, but for the sake of discussion, I think it
could be interesting.

The problem stems from the fact that we don't create the table via DDL, but
the following custom method:
- The user uploads an Avro schema definition (avsc file) accompanied by
some metadata. The metadata contains info about event time and such.
- Types are extracted from the avro Schema using the AvroSchemaConverter
class (currently into TypeInformation).
- There are no computed columns, UDFs, etc. The created tables are always
backed by Kafka.
- The TypeInformation and metadata are used to create a TableSchema that is
to be persisted.

I realize that Flink's new model would work well if we were creating the table
through DDL, because then it would go through the CatalogManager, where the
table is resolved.
It seems a viable solution to do exactly that: instead of directly creating
a CatalogTable from the Avro schema, create a TableDescriptor, obtain a
TableEnvironment, and call createTable along with an object path pointing
back to my Catalog. What do you think about this approach?
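
For concreteness, a rough sketch of that idea against the 1.14 Table API (the
columns, connector options, and object path are made-up placeholders):

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Schema;
import org.apache.flink.table.api.TableDescriptor;
import org.apache.flink.table.api.TableEnvironment;

TableEnvironment tableEnv =
        TableEnvironment.create(EnvironmentSettings.newInstance().inStreamingMode().build());

// Schema derived from the uploaded Avro definition plus the event-time metadata.
Schema schema = Schema.newBuilder()
        .column("id", DataTypes.STRING())
        .column("ts", DataTypes.TIMESTAMP_LTZ(3))
        .watermark("ts", "ts - INTERVAL '5' SECOND")
        .build();

TableDescriptor descriptor = TableDescriptor.forConnector("kafka")
        .schema(schema)
        .format("avro")
        .option("topic", "events")
        .option("properties.bootstrap.servers", "localhost:9092")
        .build();

// createTable goes through the CatalogManager, which resolves the table
// before handing it to the catalog.
tableEnv.createTable("my_catalog.my_db.my_table", descriptor);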

Also, if my understanding is correct, this means that the reason Flink
doesn't expose the resolution functionality is that in the general case,
when there might be external references, we need all the catalogs to
resolve these. So Flink actually covers the intended use cases properly.

Thanks,
Balazs

On Fri, Jan 28, 2022 at 3:37 PM Timo Walther  wrote:

> Hi Balazs,
>
> you are right, the new APIs only allow the serialization of resolved
> instances. This ensures that only validated, correct instances are put
> into the persistent storage such as a database. The framework will
> always provide resolved instances and call the corresponding methods
> with those. They should be easily serializable.
>
> However, when reading from a persistent storage such as a database, the
> framework needs to validate the input and resolve expressions and data
> types (e.g. from a string representation).
>
> The new design reflects the reality better. A catalog implementation
> does not need to be symmetric. It follows the principle:
>
> - "Resolved" into the catalog (with all information if implementers need
> it)
> - "Unresolved" out of the catalog (let the framework deal with the
> resolution, also with cross references to other catalogs)
>
>
> Use ResolvedCatalogTable#toProperties for putting basic info into your
> database.
>
> Use CatalogTable#fromProperties to restore the table.
>
> This is especially important for expression resolution of computed columns and
> watermark strategies. Functions could come from other catalogs as well.
>
> So for implementers it is usually not important to resolve the
> `CatalogTable` manually.
>
> If it is important for you, maybe you can elaborate a bit on your use case?
>
> Regards,
> Timo
>
>
> On 26.01.22 12:18, Balázs Varga wrote:
> > Hi everyone,
> >
> > I'm trying to migrate from the old set of CatalogTable related APIs
> > (CatalogTableImpl, TableSchema, DescriptorProperties) to the new ones
> > (CatalogBaseTable, Schema and ResolvedSchema, CatalogPropertiesUtil), in
> > a custom catalog.
> >
> > The catalog stores table definitions, and the current logic involves
> > persisting the
> > schema from a CatalogBaseTable to a database. When we get a table, its
> > definition is read from the database and the CatalogTable is built up
> > and returned.
> >
> > For this, we currently serialize the schema like this:
> > descriptorProperties.putTableSchema(Schema.SCHEMA,
> > catalogBaseTable.getSchema());
> >
> > The new API seems to intentionally only allow the serialization of the
> > Resolved version of objects (e.g. ResolvedCatalogTable, ResolvedSchema).
> >
> > 1. Could you please clarify why this limitation was put into place? It
> > seems to me that it would
> > be sufficient to resolve the CatalogTables once we are actually trying
> > to pass the table to the DynamicTableFactory.
> >
> > 2. What additional information is gained during the resolution of a
> > CatalogTable, and where does that information come from? Are there some
> > references to things in other catalogs?
> >
> > 3. Is it possible to "manually" resolve a CatalogTable? (invoke
> > something like what the internal DefaultSchemaResolver does). What
> > context is required?
> >
> > Thanks,
> > Balazs
> >
>
>


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Andrew Otto
https://golb.hplar.ch/2018/02/Access-Server-Sent-Events-from-Java.html
looks like a nice tutorial.
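
For the okhttp-eventsource suggestion quoted below, a rough sketch (untested,
assuming the 2.x-era API of that client) of reading the Wikimedia
recentchange stream:

import com.launchdarkly.eventsource.EventHandler;
import com.launchdarkly.eventsource.EventSource;
import com.launchdarkly.eventsource.MessageEvent;

import java.net.URI;
import java.util.concurrent.TimeUnit;

public class WikimediaSseDemo {
    public static void main(String[] args) throws Exception {
        EventHandler handler = new EventHandler() {
            @Override public void onOpen() {}
            @Override public void onClosed() {}
            @Override public void onMessage(String event, MessageEvent messageEvent) {
                // Each SSE message carries one JSON-encoded wiki change event.
                System.out.println(messageEvent.getData());
            }
            @Override public void onComment(String comment) {}
            @Override public void onError(Throwable t) { t.printStackTrace(); }
        };

        EventSource source = new EventSource.Builder(handler,
                URI.create("https://stream.wikimedia.org/v2/stream/recentchange")).build();
        source.start();
        TimeUnit.SECONDS.sleep(10); // consume for a few seconds, then shut down
        source.close();
    }
}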

On Mon, Jan 31, 2022 at 12:27 PM Andrew Otto  wrote:

> Any SSE/EventSource Java Client should work.  I have not personally used
> one.  From a quick search, maybe
> https://github.com/launchdarkly/okhttp-eventsource or something like it?
>
>
>
> On Mon, Jan 31, 2022 at 11:45 AM Francesco Guardiani <
> france...@ververica.com> wrote:
>
>> > Shameless plug:  Maybe the Wikipedia EventStreams
>>  SSE API
>>  would make for a great
>> connector example in Flink?
>>
>> Sounds like a great idea! Do you have a ready to use Java Client for
>> that?
>>
>> On Mon, Jan 31, 2022 at 3:47 PM Jing Ge  wrote:
>>
>>> Thanks @Martijn for driving this! +1 for deprecating and removing it.
>>> All the concerns mentioned previously are valid. It is good to know that
>>> the upcoming connector template/archetype will help the user for the
>>> kickoff. Beyond that, speaking of using a real connector as a sample, since
>>> Flink is heading towards the unified batch and stream processing, IMHO, it
>>> would be nice to pick up a feasible connector for this trend to let the
>>> user get a sample close to the use cases.
>>>
>>> Best regards
>>> Jing
>>>
>>> On Mon, Jan 31, 2022 at 3:07 PM Andrew Otto  wrote:
>>>
 Shameless plug:  Maybe the Wikipedia EventStreams
  SSE
 API  would make for a
 great connector example in Flink?

 :D

 On Mon, Jan 31, 2022 at 5:41 AM Martijn Visser 
 wrote:

> Hi all,
>
> Thanks for your feedback. It's not about having this connector in the
> main repo, that has been voted on already. This is strictly about the
> connector itself, since it's not maintained and most probably also can't 
> be
> used due to changes in Twitter's API that aren't reflected in our 
> connector
> implementation. Therefore I propose to remove it.
>
> Fully agree on the template part, what's good to know is that a
> connector template/archetype is part of the goals for the external
> connector repository.
>
> Best regards,
>
> Martijn
>
> On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani <
> france...@ververica.com> wrote:
>
>> Hi,
>>
>> I agree with the concern about having this connector in the main
>> repo. But I think in general it doesn't harm to have a sample connector 
>> to
>> show how to develop a custom connector, and I think that the Twitter
>> connector can be a good candidate for such a template. It needs rework 
>> for
>> sure, as it has evident issues, notably it doesn't work with table.
>>
>> So I understand if we wanna remove what we have right now, but I
>> think we should have some replacement for a "connector template", which 
>> is
>> both ready to use and easy to hack to build your own connector starting
>> from it. Twitter API is a good example for such a template, as it's both
>> "related" to the known common use cases of Flink and because is quite
>> simple to get started with.
>>
>> FG
>>
>> On Sun, Jan 30, 2022 at 12:31 PM David Anderson <
>> da...@alpinegizmo.com> wrote:
>>
>>> I agree.
>>>
>>> The Twitter connector is used in a few (unofficial) tutorials, so if
>>> we remove it that will make it more difficult for those tutorials to be
>>> maintained. On the other hand, if I recall correctly, that connector 
>>> uses
>>> V1 of the Twitter API, which has been deprecated, so it's really not 
>>> very
>>> useful even for that purpose.
>>>
>>> David
>>>
>>>
>>>
>>> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser <
>>> mart...@ververica.com> wrote:
>>>
 Hi everyone,

 I would like to discuss deprecating Flinks' Twitter connector [1].
 This was one of the first connectors that was added to Flink, which 
 could
 be used to access the tweets from Twitter. Given the evolution of Flink
 over Twitter, I don't think that:

 * Users are still using this connector at all
 * That the code for this connector should be in the main Flink
 codebase.

 Given the circumstances, I would propose to deprecate and remove
 this connector. I'm looking forward to your thoughts. If you agree, 
 please
 also let me know if you think we should first deprecate it in Flink 
 1.15
 and remove it in a version after that, or if you think we can remove it
 directly.

 Best regards,

 Martijn Visser
 https://twitter.com/MartijnVisser82

 [1]

Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Andrew Otto
Any SSE/EventSource Java Client should work.  I have not personally used
one.  From a quick search, maybe
https://github.com/launchdarkly/okhttp-eventsource or something like it?



On Mon, Jan 31, 2022 at 11:45 AM Francesco Guardiani <
france...@ververica.com> wrote:

> > Shameless plug:  Maybe the Wikipedia EventStreams
>  SSE API
>  would make for a great
> connector example in Flink?
>
> Sounds like a great idea! Do you have a ready to use Java Client for that?
>
> On Mon, Jan 31, 2022 at 3:47 PM Jing Ge  wrote:
>
>> Thanks @Martijn for driving this! +1 for deprecating and removing it. All
>> the concerns mentioned previously are valid. It is good to know that the
>> upcoming connector template/archetype will help the user for the kickoff.
>> Beyond that, speaking of using a real connector as a sample, since Flink is
>> heading towards the unified batch and stream processing, IMHO, it would be
>> nice to pick up a feasible connector for this trend to let the user get a
>> sample close to the use cases.
>>
>> Best regards
>> Jing
>>
>> On Mon, Jan 31, 2022 at 3:07 PM Andrew Otto  wrote:
>>
>>> Shameless plug:  Maybe the Wikipedia EventStreams
>>>  SSE
>>> API  would make for a great
>>> connector example in Flink?
>>>
>>> :D
>>>
>>> On Mon, Jan 31, 2022 at 5:41 AM Martijn Visser 
>>> wrote:
>>>
 Hi all,

 Thanks for your feedback. It's not about having this connector in the
 main repo, that has been voted on already. This is strictly about the
 connector itself, since it's not maintained and most probably also can't be
 used due to changes in Twitter's API that aren't reflected in our connector
 implementation. Therefore I propose to remove it.

 Fully agree on the template part, what's good to know is that a
 connector template/archetype is part of the goals for the external
 connector repository.

 Best regards,

 Martijn

 On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani <
 france...@ververica.com> wrote:

> Hi,
>
> I agree with the concern about having this connector in the main repo.
> But I think in general it doesn't harm to have a sample connector to show
> how to develop a custom connector, and I think that the Twitter connector
> can be a good candidate for such a template. It needs rework for sure, as
> it has evident issues, notably it doesn't work with table.
>
> So I understand if we wanna remove what we have right now, but I think
> we should have some replacement for a "connector template", which is both
> ready to use and easy to hack to build your own connector starting from 
> it.
> Twitter API is a good example for such a template, as it's both "related"
> to the known common use cases of Flink and because is quite simple to get
> started with.
>
> FG
>
> On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
> wrote:
>
>> I agree.
>>
>> The Twitter connector is used in a few (unofficial) tutorials, so if
>> we remove it that will make it more difficult for those tutorials to be
>> maintained. On the other hand, if I recall correctly, that connector uses
>> V1 of the Twitter API, which has been deprecated, so it's really not very
>> useful even for that purpose.
>>
>> David
>>
>>
>>
>> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to discuss deprecating Flinks' Twitter connector [1].
>>> This was one of the first connectors that was added to Flink, which 
>>> could
>>> be used to access the tweets from Twitter. Given the evolution of Flink
>>> over Twitter, I don't think that:
>>>
>>> * Users are still using this connector at all
>>> * That the code for this connector should be in the main Flink
>>> codebase.
>>>
>>> Given the circumstances, I would propose to deprecate and remove
>>> this connector. I'm looking forward to your thoughts. If you agree, 
>>> please
>>> also let me know if you think we should first deprecate it in Flink 1.15
>>> and remove it in a version after that, or if you think we can remove it
>>> directly.
>>>
>>> Best regards,
>>>
>>> Martijn Visser
>>> https://twitter.com/MartijnVisser82
>>>
>>> [1]
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>>>
>>>


Re: How to put external data in EmbeddedRocksDB

2022-01-31 Thread Fabian Paul
Hi Surendra,

I do not think there is an out-of-the-box way to do look-ups against the
local RocksDB instance. In general, I am a bit skeptical about whether
it is a good idea to use the RocksDB instance for your state
management and for look-ups in parallel. It may overload RocksDB
and cause unexpected side effects.

Can you elaborate a bit more about your use case and what kind of
information you are writing/reading?

Best,
Fabian

On Sat, Jan 29, 2022 at 8:00 AM Surendra Lalwani
 wrote:
>
> Hi Team,
>
> I am working on a solution where we need to perform some lookups from a
> Flink job. Earlier we were using Redis and calling it via a UDF from the
> Flink job, but now we are planning to remove the external Redis dependency
> from Flink and want to use the embedded RocksDB as a lookup store. Does
> anyone have an idea how this can be done? Any other solution would also be
> appreciated.
>
> Thanks and Regards,
> Surendra Lalwani
>
>
> 


Re: Flink 1.14.2/3 - KafkaSink vs deprecated FlinkKafkaProducer

2022-01-31 Thread Fabian Paul
Hi Daniel,

Thanks for reaching out, we are constantly trying to improve the
reliability of our connectors. I assume you are running the KafkaSink
with exactly-once delivery guarantee. On startup, the KafkaSink tries
to abort lingering transactions from previous executions.
Unfortunately, nothing immediately comes to mind as to why your job
hangs.

Can you maybe share the logs with us from such a run? It would be also
great to know more information about your environment e.g. bounded or
unbounded jobs, parallelism, Kafka client/server version, potential
topic acls, transactional id prefix.
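
For reference, a minimal exactly-once KafkaSink setup on 1.14 looks roughly
like this (broker, topic, and prefix are placeholders; checkpointing must be
enabled for the transactions to commit):

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output-topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliverGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
        // The prefix must be unique per application so that a restarted job
        // only aborts its own lingering transactions.
        .setTransactionalIdPrefix("my-app")
        .build();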

Best,
Fabian

On Mon, Jan 31, 2022 at 3:27 PM Daniel Peled
 wrote:
>
> Hi everyone,
>
> Has anyone encountered any problem with the new KafkaSink that is used in 
> Flink 1.14 ?
>
> When running our jobs, the sinks of some of our jobs are stuck in 
> initializing for more than an hour.
> The only thing that helps is deleting the topic __transaction_state.
> After deleting this topic, all sinks are immediately released and are in 
> running status.
> The problem is quite random each time in a different job.
> There are times that all jobs start running without any problems.
>
> Unfortunately we had to go back to the deprecated FlinkKafkaProducer.
>
> We didn't have these problems with Flink 1.13 and FlinkKafkaProducer
>
> Any ideas on what to do ?
> What are we doing wrong?
>
> BR,
> Daniel
>


Re: Kafka Consumer Group Name not set if no checkpointing?

2022-01-31 Thread Fabian Paul
Hi John,
First I would like to ask you to give us some more information about
how you consume from Kafka with Flink. It is currently recommended to
use the KafkaSource that subsumes the deprecated FlinkKafkaConsumer.

One thing to already note is that by default Flink does not commit the
Kafka offset back to the topic because it is not needed from a
Flink perspective and is only supported on a best-effort basis if a
checkpoint completes.
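
If the group's offsets need to be visible even without checkpointing, one
option (an untested sketch; the deserializer is a placeholder) is to fall back
to the Kafka client's own periodic auto-commit via consumer properties:

KafkaSource<String> source = KafkaSource.<String>builder()
        .setBootstrapServers(bootstrapServers)
        .setTopics(topic)
        .setGroupId(consumerGroup)
        .setValueOnlyDeserializer(new SimpleStringSchema())
        // Offsets committed this way are for monitoring only; Flink does not
        // use them for fault tolerance.
        .setProperty("enable.auto.commit", "true")
        .setProperty("auto.commit.interval.ms", "5000")
        .build();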

I am not very familiar with Kafka Explorer but I can imagine it only
shows the consumer group if there are actually committed offsets.

Best,
Fabian

On Mon, Jan 31, 2022 at 3:41 PM John Smith  wrote:
>
> Hi, using Flink 1.14.3, I have a job that doesn't use checkpointing. I have 
> noticed that the Kafka Consumer Group is not set?
>
> I use Kafka Explorer to see all the consumers and when I run the job I don't 
> see the consumer group. Finally I decided to enable checkpointing and restart 
> the job and finally saw the consumer group.
>
> Is this expected behaviour?


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Francesco Guardiani
> Shameless plug:  Maybe the Wikipedia EventStreams
 SSE API
 would make for a great
connector example in Flink?

Sounds like a great idea! Do you have a ready to use Java Client for that?

On Mon, Jan 31, 2022 at 3:47 PM Jing Ge  wrote:

> Thanks @Martijn for driving this! +1 for deprecating and removing it. All
> the concerns mentioned previously are valid. It is good to know that the
> upcoming connector template/archetype will help the user for the kickoff.
> Beyond that, speaking of using a real connector as a sample, since Flink is
> heading towards the unified batch and stream processing, IMHO, it would be
> nice to pick up a feasible connector for this trend to let the user get a
> sample close to the use cases.
>
> Best regards
> Jing
>
> On Mon, Jan 31, 2022 at 3:07 PM Andrew Otto  wrote:
>
>> Shameless plug:  Maybe the Wikipedia EventStreams
>>  SSE API
>>  would make for a great
>> connector example in Flink?
>>
>> :D
>>
>> On Mon, Jan 31, 2022 at 5:41 AM Martijn Visser 
>> wrote:
>>
>>> Hi all,
>>>
>>> Thanks for your feedback. It's not about having this connector in the
>>> main repo, that has been voted on already. This is strictly about the
>>> connector itself, since it's not maintained and most probably also can't be
>>> used due to changes in Twitter's API that aren't reflected in our connector
>>> implementation. Therefore I propose to remove it.
>>>
>>> Fully agree on the template part, what's good to know is that a
>>> connector template/archetype is part of the goals for the external
>>> connector repository.
>>>
>>> Best regards,
>>>
>>> Martijn
>>>
>>> On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani <
>>> france...@ververica.com> wrote:
>>>
 Hi,

 I agree with the concern about having this connector in the main repo.
 But I think in general it doesn't harm to have a sample connector to show
 how to develop a custom connector, and I think that the Twitter connector
 can be a good candidate for such a template. It needs rework for sure, as
 it has evident issues, notably it doesn't work with table.

 So I understand if we wanna remove what we have right now, but I think
 we should have some replacement for a "connector template", which is both
 ready to use and easy to hack to build your own connector starting from it.
 Twitter API is a good example for such a template, as it's both "related"
 to the known common use cases of Flink and because is quite simple to get
 started with.

 FG

 On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
 wrote:

> I agree.
>
> The Twitter connector is used in a few (unofficial) tutorials, so if
> we remove it that will make it more difficult for those tutorials to be
> maintained. On the other hand, if I recall correctly, that connector uses
> V1 of the Twitter API, which has been deprecated, so it's really not very
> useful even for that purpose.
>
> David
>
>
>
> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> I would like to discuss deprecating Flinks' Twitter connector [1].
>> This was one of the first connectors that was added to Flink, which could
>> be used to access the tweets from Twitter. Given the evolution of Flink
>> over Twitter, I don't think that:
>>
>> * Users are still using this connector at all
>> * That the code for this connector should be in the main Flink
>> codebase.
>>
>> Given the circumstances, I would propose to deprecate and remove this
>> connector. I'm looking forward to your thoughts. If you agree, please 
>> also
>> let me know if you think we should first deprecate it in Flink 1.15 and
>> remove it in a version after that, or if you think we can remove it
>> directly.
>>
>> Best regards,
>>
>> Martijn Visser
>> https://twitter.com/MartijnVisser82
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>>
>>


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Jing Ge
Thanks @Martijn for driving this! +1 for deprecating and removing it. All
the concerns mentioned previously are valid. It is good to know that the
upcoming connector template/archetype will help the user for the kickoff.
Beyond that, speaking of using a real connector as a sample, since Flink is
heading towards the unified batch and stream processing, IMHO, it would be
nice to pick up a feasible connector for this trend to let the user get a
sample close to the use cases.

Best regards
Jing

On Mon, Jan 31, 2022 at 3:07 PM Andrew Otto  wrote:

> Shameless plug:  Maybe the Wikipedia EventStreams
>  SSE API
>  would make for a great
> connector example in Flink?
>
> :D
>
> On Mon, Jan 31, 2022 at 5:41 AM Martijn Visser 
> wrote:
>
>> Hi all,
>>
>> Thanks for your feedback. It's not about having this connector in the
>> main repo, that has been voted on already. This is strictly about the
>> connector itself, since it's not maintained and most probably also can't be
>> used due to changes in Twitter's API that aren't reflected in our connector
>> implementation. Therefore I propose to remove it.
>>
>> Fully agree on the template part, what's good to know is that a connector
>> template/archetype is part of the goals for the external
>> connector repository.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani <
>> france...@ververica.com> wrote:
>>
>>> Hi,
>>>
>>> I agree with the concern about having this connector in the main repo.
>>> But I think in general it doesn't harm to have a sample connector to show
>>> how to develop a custom connector, and I think that the Twitter connector
>>> can be a good candidate for such a template. It needs rework for sure, as
>>> it has evident issues, notably it doesn't work with table.
>>>
>>> So I understand if we wanna remove what we have right now, but I think
>>> we should have some replacement for a "connector template", which is both
>>> ready to use and easy to hack to build your own connector starting from it.
>>> Twitter API is a good example for such a template, as it's both "related"
>>> to the known common use cases of Flink and because is quite simple to get
>>> started with.
>>>
>>> FG
>>>
>>> On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
>>> wrote:
>>>
 I agree.

 The Twitter connector is used in a few (unofficial) tutorials, so if we
 remove it that will make it more difficult for those tutorials to be
 maintained. On the other hand, if I recall correctly, that connector uses
 V1 of the Twitter API, which has been deprecated, so it's really not very
 useful even for that purpose.

 David



 On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
 wrote:

> Hi everyone,
>
> I would like to discuss deprecating Flinks' Twitter connector [1].
> This was one of the first connectors that was added to Flink, which could
> be used to access the tweets from Twitter. Given the evolution of Flink
> over Twitter, I don't think that:
>
> * Users are still using this connector at all
> * That the code for this connector should be in the main Flink
> codebase.
>
> Given the circumstances, I would propose to deprecate and remove this
> connector. I'm looking forward to your thoughts. If you agree, please also
> let me know if you think we should first deprecate it in Flink 1.15 and
> remove it in a version after that, or if you think we can remove it
> directly.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>
>


Kafka Consumer Group Name not set if no checkpointing?

2022-01-31 Thread John Smith
Hi, using Flink 1.14.3, I have a job that doesn't use checkpointing. I have
noticed that the Kafka Consumer Group is not set?

I use Kafka Explorer to see all the consumers and when I run the job I
don't see the consumer group. Finally I decided to enable checkpointing and
restart the job and finally saw the consumer group.

Is this expected behaviour?


Flink 1.14.2/3 - KafkaSink vs deprecated FlinkKafkaProducer

2022-01-31 Thread Daniel Peled
Hi everyone,

Has anyone encountered any problem with the new KafkaSink that is used in
Flink 1.14 ?

When running our jobs, the *sinks* of some of our jobs are stuck in
initializing for more than an hour.
The only thing that helps is deleting the topic *__transaction_state*.
After deleting this topic, all sinks are immediately released and are in
running status.
The problem is quite random each time in a different job.
There are times that all jobs start running without any problems.

Unfortunately we had to go back to the deprecated FlinkKafkaProducer.

*We didn't have these problems with Flink 1.13 and FlinkKafkaProducer*

Any ideas on what to do ?
What are we doing wrong?

BR,
Daniel


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Andrew Otto
Shameless plug:  Maybe the Wikipedia EventStreams
 SSE API
 would make for a great
connector example in Flink?

:D

On Mon, Jan 31, 2022 at 5:41 AM Martijn Visser 
wrote:

> Hi all,
>
> Thanks for your feedback. It's not about having this connector in the main
> repo, that has been voted on already. This is strictly about the connector
> itself, since it's not maintained and most probably also can't be used due
> to changes in Twitter's API that aren't reflected in our connector
> implementation. Therefore I propose to remove it.
>
> Fully agree on the template part, what's good to know is that a connector
> template/archetype is part of the goals for the external
> connector repository.
>
> Best regards,
>
> Martijn
>
> On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani 
> wrote:
>
>> Hi,
>>
>> I agree with the concern about having this connector in the main repo.
>> But I think in general it doesn't harm to have a sample connector to show
>> how to develop a custom connector, and I think that the Twitter connector
>> can be a good candidate for such a template. It needs rework for sure, as
>> it has evident issues, notably it doesn't work with table.
>>
>> So I understand if we wanna remove what we have right now, but I think we
>> should have some replacement for a "connector template", which is both
>> ready to use and easy to hack to build your own connector starting from it.
>> Twitter API is a good example for such a template, as it's both "related"
>> to the known common use cases of Flink and because is quite simple to get
>> started with.
>>
>> FG
>>
>> On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
>> wrote:
>>
>>> I agree.
>>>
>>> The Twitter connector is used in a few (unofficial) tutorials, so if we
>>> remove it that will make it more difficult for those tutorials to be
>>> maintained. On the other hand, if I recall correctly, that connector uses
>>> V1 of the Twitter API, which has been deprecated, so it's really not very
>>> useful even for that purpose.
>>>
>>> David
>>>
>>>
>>>
>>> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
>>> wrote:
>>>
 Hi everyone,

 I would like to discuss deprecating Flinks' Twitter connector [1]. This
 was one of the first connectors that was added to Flink, which could be
 used to access the tweets from Twitter. Given the evolution of Flink over
 Twitter, I don't think that:

 * Users are still using this connector at all
 * That the code for this connector should be in the main Flink
 codebase.

 Given the circumstances, I would propose to deprecate and remove this
 connector. I'm looking forward to your thoughts. If you agree, please also
 let me know if you think we should first deprecate it in Flink 1.15 and
 remove it in a version after that, or if you think we can remove it
 directly.

 Best regards,

 Martijn Visser
 https://twitter.com/MartijnVisser82

 [1]
 https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/




Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Martijn Visser
Hi all,

Thanks for your feedback. It's not about having this connector in the main
repo, that has been voted on already. This is strictly about the connector
itself, since it's not maintained and most probably also can't be used due
to changes in Twitter's API that aren't reflected in our connector
implementation. Therefore I propose to remove it.

Fully agree on the template part, what's good to know is that a connector
template/archetype is part of the goals for the external
connector repository.

Best regards,

Martijn

On Mon, 31 Jan 2022 at 11:32, Francesco Guardiani 
wrote:

> Hi,
>
> I agree with the concern about having this connector in the main repo. But
> I think in general it doesn't harm to have a sample connector to show how
> to develop a custom connector, and I think that the Twitter connector can
> be a good candidate for such a template. It needs rework for sure, as it
> has evident issues, notably it doesn't work with table.
>
> So i understand if we wanna remove what we have right now, but I think we
> should have some replacement for a "connector template", which is both
> ready to use and easy to hack to build your own connector starting from it.
> Twitter API is a good example for such a template, as it's both "related"
> to the known common use cases of Flink and because is quite simple to get
> started with.
>
> FG
>
> On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
> wrote:
>
>> I agree.
>>
>> The Twitter connector is used in a few (unofficial) tutorials, so if we
>> remove it that will make it more difficult for those tutorials to be
>> maintained. On the other hand, if I recall correctly, that connector uses
>> V1 of the Twitter API, which has been deprecated, so it's really not very
>> useful even for that purpose.
>>
>> David
>>
>>
>>
>> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
>> wrote:
>>
>>> Hi everyone,
>>>
>>> I would like to discuss deprecating Flinks' Twitter connector [1]. This
>>> was one of the first connectors that was added to Flink, which could be
>>> used to access the tweets from Twitter. Given the evolution of Flink over
>>> Twitter, I don't think that:
>>>
>>> * Users are still using this connector at all
>>> * That the code for this connector should be in the main Flink codebase.
>>>
>>> Given the circumstances, I would propose to deprecate and remove this
>>> connector. I'm looking forward to your thoughts. If you agree, please also
>>> let me know if you think we should first deprecate it in Flink 1.15 and
>>> remove it in a version after that, or if you think we can remove it
>>> directly.
>>>
>>> Best regards,
>>>
>>> Martijn Visser
>>> https://twitter.com/MartijnVisser82
>>>
>>> [1]
>>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>>>
>>>


Re: [DISCUSS] Deprecate/remove Twitter connector

2022-01-31 Thread Francesco Guardiani
Hi,

I agree with the concern about having this connector in the main repo. But
I think in general it doesn't harm to have a sample connector to show how
to develop a custom connector, and I think that the Twitter connector can
be a good candidate for such a template. It needs rework for sure, as it
has evident issues, notably it doesn't work with table.

So I understand if we wanna remove what we have right now, but I think we
should have some replacement for a "connector template", which is both
ready to use and easy to hack to build your own connector starting from it.
Twitter API is a good example for such a template, as it's both "related"
to the known common use cases of Flink and because is quite simple to get
started with.

FG

On Sun, Jan 30, 2022 at 12:31 PM David Anderson 
wrote:

> I agree.
>
> The Twitter connector is used in a few (unofficial) tutorials, so if we
> remove it that will make it more difficult for those tutorials to be
> maintained. On the other hand, if I recall correctly, that connector uses
> V1 of the Twitter API, which has been deprecated, so it's really not very
> useful even for that purpose.
>
> David
>
>
>
> On Fri, Jan 21, 2022 at 9:34 AM Martijn Visser 
> wrote:
>
>> Hi everyone,
>>
>> I would like to discuss deprecating Flinks' Twitter connector [1]. This
>> was one of the first connectors that was added to Flink, which could be
>> used to access the tweets from Twitter. Given the evolution of Flink over
>> Twitter, I don't think that:
>>
>> * Users are still using this connector at all
>> * That the code for this connector should be in the main Flink codebase.
>>
>> Given the circumstances, I would propose to deprecate and remove this
>> connector. I'm looking forward to your thoughts. If you agree, please also
>> let me know if you think we should first deprecate it in Flink 1.15 and
>> remove it in a version after that, or if you think we can remove it
>> directly.
>>
>> Best regards,
>>
>> Martijn Visser
>> https://twitter.com/MartijnVisser82
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/twitter/
>>
>>