Re: Question: Determining Total Recovery Time

2020-02-10 Thread Arvid Heise
Hi Morgan, as Timo pointed out, there is no general solution, but in your setting you could look at the consumer lag of the input topic after a crash. Lag would spike until all tasks have restarted and reprocessing begins. Note that, by default, offsets are only committed on checkpoints. Best, Arvid On
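The lag-based estimate Arvid describes can be sketched with plain offset arithmetic. The offset maps below are hypothetical placeholders; in practice they would come from Kafka (the partitions' log-end offsets versus the consumer group's committed offsets, which Flink commits only on checkpoint completion by default).

```python
def total_consumer_lag(log_end_offsets, committed_offsets):
    """Sum of per-partition lag; after a crash this spikes and then
    shrinks back to normal once all tasks have restarted and caught up."""
    return sum(
        max(0, log_end_offsets[p] - committed_offsets.get(p, 0))
        for p in log_end_offsets
    )

# Hypothetical example: partition 0 fell behind after a crash.
log_end = {0: 1200, 1: 900}
committed = {0: 1000, 1: 900}
print(total_consumer_lag(log_end, committed))  # 200
```

Watching this number return to its pre-crash baseline gives a rough, external measure of total recovery time.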

Batch reading from Cassandra. How to?

2020-02-10 Thread Lasse Nedergaard
Hi. We would like to do some batch analytics on our data set stored in Cassandra and are looking for an efficient way to load data from a single table. Not by key, but a random 15%, 50% or 100% of the data. Databricks has created an efficient way to load Cassandra data into Apache Spark, and they are doing it by
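The idea behind the Spark/Cassandra connector Lasse mentions is reading by token ranges. A random N% could then be approximated by sampling ranges of the Murmur3 token ring; this is a hedged sketch of that arithmetic only (split count, fraction, and seed are illustrative choices, not connector parameters).

```python
import random

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3 partitioner token ring

def token_ranges(num_splits):
    """Partition the token ring into num_splits contiguous (start, end) ranges."""
    width = (MAX_TOKEN - MIN_TOKEN) // num_splits
    starts = [MIN_TOKEN + i * width for i in range(num_splits)]
    return [(s, s + width) for s in starts]

def sample_ranges(ranges, fraction, seed=42):
    """Pick roughly `fraction` of the ranges at random, e.g. for a 15% scan."""
    rng = random.Random(seed)
    k = max(1, round(len(ranges) * fraction))
    return rng.sample(ranges, k)

ranges = token_ranges(100)
picked = sample_ranges(ranges, 0.15)
print(len(picked))  # 15
```

Each sampled range would then be scanned with a `token(pk) >= start AND token(pk) < end` query, so the sample is spread across all replicas instead of hitting one key range.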

Re: SSL configuration - default behaviour

2020-02-10 Thread Krzysztof Chmielewski
Thanks Robert, just a small suggestion: maybe change the documentation a little bit. I'm not sure if it's only my impression, but from the sentence *"All internal connections are SSL authenticated and encrypted"* I initially thought that this is the default configuration. Thanks, Krzysztof pon.,

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread Dian Fu
+1 (non-binding) - Verified the signature and checksum - Pip installed the package successfully: pip install apache-flink-1.9.2.tar.gz - Ran the word count example successfully. Regards, Dian > On Feb 11, 2020, at 11:44 AM, jincheng sun wrote: > > > +1 (binding) > > - Install the PyFlink by `pip install`

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread jincheng sun
+1 (binding) - Install the PyFlink by `pip install` [SUCCESS] - Run word_count in both command line and IDE [SUCCESS] Best, Jincheng On Tue, Feb 11, 2020 at 11:17 AM, Wei Zhong wrote: > Hi, > > Thanks for driving this, Jincheng. > > +1 (non-binding) > > - Verified signatures and checksums. > - Verified

Flink sink to ElasticSearch

2020-02-10 Thread ????????
Sink to ElasticSearch (ES): package etl.estest; import org.apache.flink.api.common.functions.RuntimeContext; import org.apache.flink.api.common.serialization.SimpleStringSchema; import org.apache.flink.api.java.tuple.Tuple4;

Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread Wei Zhong
Hi, Thanks for driving this, Jincheng. +1 (non-binding) - Verified signatures and checksums. - Verified README.md and setup.py. - Run `pip install apache-flink-1.9.2.tar.gz` in Python 2.7.15 and Python 3.7.5 successfully. - Start local pyflink shell in Python 2.7.15 and Python 3.7.5 via

Re:Re: Flink connect hive with hadoop HA

2020-02-10 Thread sunfulin
Hi guys, thanks for the kind reply. Actually I want to know how to change the client-side hadoop conf while using the Table API within my program. Hope for some useful suggestions. At 2020-02-11 02:42:31, "Bowen Li" wrote: Hi sunfulin, Sounds like you didn't config the hadoop HA correctly on the client

Re: merge implementation in count distinct

2020-02-10 Thread Jark Wu
Hi Fanbin, Thanks for reporting this. I think you are right, the implementation is not correct. I have created a JIRA issue [1] to fix this. Btw, the CountDistinctWithMerge in blink planner is implemented correctly [2]. Best, Jark [1]: https://issues.apache.org/jira/browse/FLINK-15979 [2]:
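The bug Jark confirms is in how two accumulators are combined. A count-distinct accumulator can be modeled as a map from value to count, and a correct `merge` must sum the counts for keys present in both sides rather than drop one of them. This is a simplified Python model of that invariant, not the actual `JavaUserDefinedAggFunctions` code.

```python
def merge_accumulators(acc, other):
    """Merge `other` into `acc`, summing counts for shared keys so that
    no occurrence from either accumulator is lost."""
    for value, count in other.items():
        acc[value] = acc.get(value, 0) + count
    return acc

def count_distinct(acc):
    """The aggregate result is the number of distinct keys."""
    return len(acc)

a = {"x": 2, "y": 1}
b = {"y": 3, "z": 1}
merged = merge_accumulators(dict(a), b)
print(merged)                   # {'x': 2, 'y': 4, 'z': 1}
print(count_distinct(merged))   # 3
```

Keeping counts (not just a set) matters once retractions are involved: a value is only truly gone when its count reaches zero.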

merge implementation in count distinct

2020-02-10 Thread Fanbin Bu
Hi, for the following implementation of merge, https://github.com/apache/flink/blob/0ab1549f52f1f544e8492757c6b0d562bf50a061/flink-table/flink-table-planner/src/test/java/org/apache/flink/table/runtime/utils/JavaUserDefinedAggFunctions.java#L224 what if acc has some of the same keys as mergeAcc? The

Re: Flink connect hive with hadoop HA

2020-02-10 Thread Bowen Li
Hi sunfulin, Sounds like you didn't config the hadoop HA correctly on the client side according to [1]. Let us know if it helps resolve the issue. [1] https://stackoverflow.com/questions/25062788/namenode-ha-unknownhostexception-nameservice1 On Mon, Feb 10, 2020 at 7:11 AM Khachatryan Roman
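The fix behind the linked answer is that the client's `hdfs-site.xml` (in the directory `HADOOP_CONF_DIR` points to) must define the HA nameservice so the client can resolve it. A hedged sketch follows, assuming the CDH default nameservice name `nameservice1`; the hostnames are placeholders.

```xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>nameservice1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.nameservice1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.nameservice1.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.nameservice1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

Without the failover proxy provider entry, the client treats `nameservice1` as a hostname and fails with the `UnknownHostException` described in [1].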

Re: Flink Minimal requirements

2020-02-10 Thread Khachatryan Roman
Hi Kristof, Flink doesn't have any specific requirements. You can run Flink on a single node with just one core. The number of threads is dynamic. However, you'll probably want to configure memory usage if the default values are greater than what the actual machine has. Regards, Roman On Mon,

FlinkCEP questions - architecture

2020-02-10 Thread Juergen Donnerstag
Hi, we're in very early stages evaluating options. I'm not a Flink expert, but did read some of the docs and watched videos. Could you please help me understand if and how certain of our reqs are covered by Flink (CEP). Is this mailing list the right channel for such questions? 1) We receive

Re: Exactly once semantics for hdfs sink

2020-02-10 Thread Khachatryan Roman
Hi Vishwas, Yes, Streaming File Sink does support exactly-once semantics and can be used with HDFS. Regards, Roman On Mon, Feb 10, 2020 at 5:20 PM Vishwas Siravara wrote: > Hi all, > I want to use the StreamingFile sink for writing data to hdfs. Can I > achieve exactly once semantics with

Exactly once semantics for hdfs sink

2020-02-10 Thread Vishwas Siravara
Hi all, I want to use the StreamingFile sink for writing data to hdfs. Can I achieve exactly once semantics with this sink ? Best, HW.

Re: Flink connect hive with hadoop HA

2020-02-10 Thread Khachatryan Roman
Hi, Could you please provide a full stacktrace? Regards, Roman On Mon, Feb 10, 2020 at 2:12 PM sunfulin wrote: > Hi, guys > I am using Flink 1.10 and testing functional cases with hive integration. > Hive is 1.1.0-cdh5.3.0 with hadoop HA enabled. Running a flink job I can > see a successful

Re: Flink Elasticsearch upsert document in ES

2020-02-10 Thread ORIOL LOPEZ SANCHEZ
When building the request, you should build an UpdateRequest, like the following snippet: import org.elasticsearch.action.update.UpdateRequest import org.elasticsearch.common.xcontent.XContentType val doc: String = ??? val targetIndex: String = ??? val indexType: Option[String] = ??? new
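What the `UpdateRequest` approach boils down to on the wire is the body of the Elasticsearch `_update` API. The helper and field values below are hypothetical, modeling the JSON an upsert-style update sends: with `doc_as_upsert`, the fields are merged into an existing document, or the doc is inserted as-is when the id does not exist.

```python
import json

def build_upsert_body(doc):
    """Build the _update API body for an idempotent upsert of `doc`."""
    return {"doc": doc, "doc_as_upsert": True}

body = build_upsert_body({"user": "morgan", "count": 3})
print(json.dumps(body, sort_keys=True))
# {"doc": {"count": 3, "user": "morgan"}, "doc_as_upsert": true}
```

Because an upsert merges rather than blindly indexes, replaying the same record after a restore no longer clobbers fields that another application updated in the meantime.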

Flink Elasticsearch upsert document in ES

2020-02-10 Thread ApoorvK
Team, presently I have added elasticsearch as a sink to a stream and am inserting the json data. The problem is that when I restore the application after a crash, it reprocesses the data in between (meanwhile a backend application has updated the document in ES) and flink reinserts the document in ES, and all

Re: SSL configuration - default behaviour

2020-02-10 Thread Robert Metzger
Hi, thanks a lot for your message. By default, internal connections are not encrypted. On Fri, Feb 7, 2020 at 4:08 PM KristoffSC wrote: > Hi, > In documentation [1] we can read that > > All internal connections are SSL authenticated and encrypted. The > connections use mutual authentication,

Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-10 Thread Flavio Pompermaier
+1 for dropping all Elasticsearch connectors < 6.x On Mon, Feb 10, 2020 at 2:45 PM Dawid Wysakowicz wrote: > Hi all, > > As described in this https://issues.apache.org/jira/browse/FLINK-11720 > ticket our elasticsearch 5.x connector does not work out of the box on > some systems and requires a

Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-10 Thread Benchao Li
+1 for dropping 2.x - 5.x. FYI currently only the 6.x and 7.x ES connectors are supported by the table api. Flavio Pompermaier wrote on Mon, Feb 10, 2020 at 10:03 PM: > +1 for dropping all Elasticsearch connectors < 6.x > > On Mon, Feb 10, 2020 at 2:45 PM Dawid Wysakowicz > wrote: > > > Hi all, > > > > As described

Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-10 Thread Itamar Syn-Hershko
+1 for dropping old versions because of jar hells etc. However, in the wild there are still a lot of 2.x clusters and definitely 5.x clusters that are having a hard time upgrading. We know because we assist them on a daily basis. It is very easy to create an HTTP-based connector that works with

Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-10 Thread Robert Metzger
Thanks for starting this discussion! +1 to drop both On Mon, Feb 10, 2020 at 2:45 PM Dawid Wysakowicz wrote: > Hi all, > > As described in this https://issues.apache.org/jira/browse/FLINK-11720 > ticket our elasticsearch 5.x connector does not work out of the box on > some systems and requires

[DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-10 Thread Dawid Wysakowicz
Hi all, As described in the https://issues.apache.org/jira/browse/FLINK-11720 ticket, our elasticsearch 5.x connector does not work out of the box on some systems and requires a version bump. This also affects our e2e tests. We cannot bump the version in the es 5.x connector, because the 5.x connector

Flink connect hive with hadoop HA

2020-02-10 Thread sunfulin
Hi, guys I am using Flink 1.10 and testing functional cases with hive integration. Hive is 1.1.0-cdh5.3.0 with hadoop HA enabled. Running a flink job I can see a successful connection with the hive metastore, but cannot read table data, with exception: java.lang.IllegalArgumentException:

Re: Flink Elasticsearch upsert document in ES

2020-02-10 Thread Apoorv Upadhyay
I have tried providing an opType to the elasticsearch builder; I am getting an error message "document already exists" on my console, but it still updates the value in elasticsearch. val jsonString = write(record) val rqst: IndexRequest = Requests.indexRequest .index(parameter.get("esIndexName"))
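The intended semantics of the two op types can be shown with a toy in-memory model (a dict standing in for the index; everything here is illustrative, not the ES client API): "index" overwrites an existing id, while "create" must fail with a conflict. Whether a real sink then still writes depends on how its failure handler treats the 409, which is one plausible source of the behavior Apoorv sees.

```python
def apply_request(store, doc_id, doc, op_type="index"):
    """Apply an index/create request against `store`; "create" refuses
    to touch an id that already exists, "index" overwrites it."""
    if op_type == "create" and doc_id in store:
        return {"status": 409, "error": "document already exists"}
    store[doc_id] = doc
    return {"status": 201 if op_type == "create" else 200}

index = {}
print(apply_request(index, "1", {"v": 1}, op_type="create"))  # created
print(apply_request(index, "1", {"v": 2}, op_type="create"))  # 409 conflict
print(index["1"])  # {'v': 1}  (the create did not overwrite)
```

If the document is still being overwritten despite the 409, it is worth checking whether a retry path or a second request with the default "index" op type is reapplying the write.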

[VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #1

2020-02-10 Thread jincheng sun
Hi everyone, Please review and vote on the release candidate #1 for the PyFlink version 1.9.2, as follows: [ ] +1, Approve the release [ ] -1, Do not approve the release (please provide specific comments) The complete staging area is available for your review, which includes: * the official

Re: [Help] Anyone know where I can find performance test result?

2020-02-10 Thread Khachatryan Roman
+ user@flink.apache.org (re-adding) If you have a PR and would like to check the performance you can reach Flink committers to see the results at http://codespeed.dak8s.net:8000/ This UI uses https://github.com/tobami/codespeed So you can also set it up in your environment. Regards, Roman On

Re: [Help] Anyone know where I can find performance test result?

2020-02-10 Thread Khachatryan Roman
Hi Xu Yan, Do you mean the flink-benchmarks repo? Regards, Roman On Mon, Feb 10, 2020 at 4:18 AM 闫旭 wrote: > Hi there, > > I am just exploring the apache flink git repo and found the performance > tests. I have already tested on my local machine; I'm wondering if there > are online results? > > Thanks

Re: Flink HA for Job Cluster

2020-02-10 Thread KristoffSC
Thank you both for the answers. So I just want to get this right: can I achieve HA for the Job Cluster Docker config by having the zookeeper quorum configured as mentioned in [1] (with s3 and zookeeper)? I assume I need to modify the default Job Cluster config to match the [1] setup. [1]
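The setup KristoffSC describes amounts to a few entries in `flink-conf.yaml`. A hedged sketch follows; the quorum hosts, S3 bucket, and cluster id are placeholders, and the S3 filesystem plugin must of course be available in the image.

```yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: zk1:2181,zk2:2181,zk3:2181
high-availability.storageDir: s3://my-flink-bucket/ha/
high-availability.cluster-id: /my-job-cluster
```

ZooKeeper only stores pointers; the actual JobManager metadata lands in `storageDir`, so the S3 path must be durable and reachable from all containers.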

Flink Minimal requirements

2020-02-10 Thread KristoffSC
Hi all, well this may be a little bit of a strange question, but are there any minimal machine requirements (memory size, CPU etc.) and non-functional requirements (number of nodes, network ports, etc.) for Flink? I know it all boils down to what my deployed Job will be, but if we could just put