[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125944#comment-17125944 ] Etienne Chauchot commented on FLINK-17961: -- I just commented in the existing design discussion thread: [https://lists.apache.org/thread.html/r33cd907cecfd125ab1164ddc8a4d8e45d6bd3afd332fbb034881b1ff%40%3Cdev.flink.apache.org%3E] > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17125806#comment-17125806 ] Etienne Chauchot commented on FLINK-17961: -- [~chesnay] ES source can definitely mask the overall complexity to the user. As an example in Apache Beam ([available here|https://github.com/apache/beam/blob/e1963c11f9a853564d62f83993dec08ed8a9321f/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L156]) what we do is we use sliced scroll to split the input collection for parallel reading and apply it to the user ES query or to a default _select * from index_ when there is no provided query. Thus, the user API remains simple with _ESIO.read().from(index).withQuery(query)._ My worries here are more related to streaming and failover capabilities raised by Aljoscha. Even though ES is a main source (not an enrichment one IMO) it does not meet some Flink expectancies (cf comments above). So the question is reduced to: is it worth investing some time to make an ES source still? Regarding the thread on an ES table source, I'll read it and comment if I have anything useful to say. > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124925#comment-17124925 ] Chesnay Schepler commented on FLINK-17961: -- There's an ongoing discussion about adding an elasticsearch sink on the mailing list, with a design document; primarily aimed at the table API though. Overall I'm somewhat skeptic given how many different ways there are to read data from Elasticsearch; with basic queries only returning a limited number of values and pagination/scrolling working slightly differently and partially relying on the initial query (after_key). Unless there's a client that abstracts all these differences it seems difficult to implement in a way where users don't have to configure a lot themselves. > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17124918#comment-17124918 ] Aljoscha Krettek commented on FLINK-17961: -- I think that's a good idea, I see ES more as a "side source" than a main source. People would probably use it to enrich a main stream of data? Unfortunately we don't have a general API for that currently, so yes, people would have to manually implement such lookups. > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17119635#comment-17119635 ] Etienne Chauchot commented on FLINK-17961: -- Thanks Aljoscha for commenting. ES has data streams features but only for time series data; the aim of this source is to read all kind of data. Apart from data streams it behaves like a database. You read the content of an index (similar to a table) corresponding to the given query (similar to SQL). So, regarding streaming changes, if there are changes between 2 read requests, at the second the whole index (containing the change) will be read another time. Regarding failover: I guess exactly once semantics cannot be guaranteed only at least once. Indeed there is no ack mechanism on already read data. Under those circumstances, I guess an ES source cannot get into ES. So what should a user do to read from ES? Should he send ES requests manually from a Map ? > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118872#comment-17118872 ] Aljoscha Krettek commented on FLINK-17961: -- I think this one is tricky. We should only add a source if we can also support it for streaming programs. Meaning it has to be fault-tolerant and support reading updates. Is it possible to get, for example, a stream of changes from ES? > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118740#comment-17118740 ] Etienne Chauchot commented on FLINK-17961: -- A question: I'm wondering if the source needs to implement _CheckpointedFunction or related._ ES source will be of batch type, executing an ES query storing the results and serving them one by one via the context.collect. For such a source, is this important to store the elements read so far in a checkpointed state in case of failure or should we just ignore the checkpointing mechanism and re-read from ES in case of error? > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17118405#comment-17118405 ] Etienne Chauchot commented on FLINK-17961: -- [~aljoscha] can you assign me this ticket? > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (FLINK-17961) Create an Elasticsearch source
[ https://issues.apache.org/jira/browse/FLINK-17961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17117590#comment-17117590 ] Etienne Chauchot commented on FLINK-17961: -- I was thinking about implementing it as a _RichParallelSourceFunction_ with similar architecture as the simple _IntegerSource._ Is this the correct way to go ? Also, to connect to ES there are low level Rest client that supports all versions of ES but with a very low level REST string API and there is the high level REST client that has versions for each ES backend. As the existing ES sink uses high level REST client and as packages are already divided into ES version specific modules, it is better to keep this organization and use the high level client. > Create an Elasticsearch source > -- > > Key: FLINK-17961 > URL: https://issues.apache.org/jira/browse/FLINK-17961 > Project: Flink > Issue Type: New Feature > Components: Connectors / ElasticSearch >Reporter: Etienne Chauchot >Priority: Minor > > There is only an Elasticsearch sink available. There are opensource github > repos such as [this > one|[https://github.com/mnubo/flink-elasticsearch-source-connector]]. Also > the apache bahir project does not provide an Elasticsearch source connector > for flink either. IMHO I think the project would benefit from having an > bundled source connector for ES alongside with the available sink connector. -- This message was sent by Atlassian Jira (v8.3.4#803005)