[jira] [Resolved] (BEAM-4825) Nexmark query3 is flaky on Direct runner. State and/or timer issue ?
[ https://issues.apache.org/jira/browse/BEAM-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-4825. Fix Version/s: Not applicable Resolution: Fixed No flakiness observed (Nexmark PerfKit dashboards for the Direct runner) in the last quarter; closing the ticket > Nexmark query3 is flaky on Direct runner. State and/or timer issue ? > > > Key: BEAM-4825 > URL: https://issues.apache.org/jira/browse/BEAM-4825 > Project: Beam > Issue Type: Bug > Components: runner-direct >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2 > Fix For: Not applicable > > > Query3 exercises state and timers. It asks this question to Nexmark auction > system: > Who is selling in particular US states? > And the sketch of its code is: > * Apply global window to events with trigger repeatedly after at least > nbEvents in pane => results will be materialized each time nbEvents are > received. > * input1: collection of auctions events filtered by category and keyed by > seller id > * input2: collection of persons events filtered by US state codes and keyed > by person id > * CoGroupByKey to group auctions and persons by personId/sellerId + tags to > distinguish persons and auctions > * ParDo to do the incremental join: auctions and person events can arrive > out of order > * person element stored in persistent state in order to match future > auctions by that person. Set a timer to clear the person state after a TTL > * auction elements stored in persistent state until we have seen the > corresponding person record. 
Then, it can be output and cleared > * output NameCityStateId(person.name, person.city, person.state, auction.id) > objects > > > *The output size should be constant and it is not.* > *The output size of this query is different between batch and streaming modes* > > See query 3 dashboard in this graph: > https://apache-beam-testing.appspot.com/explore?dashboard=5099379773931520 -- This message was sent by Atlassian Jira (v8.3.4#803005)
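The incremental-join steps quoted above can be sketched outside of Beam. The following is a minimal plain-Python simulation of the per-key state logic (buffer auctions until the person arrives, cache the person for future auctions, clear the person on a TTL timer); all names are illustrative and this is not the actual Nexmark query3 DoFn:

```python
# Plain-Python sketch of the stateful incremental join described in the
# ticket. Person and auction events for the same key may arrive in any
# order; names are illustrative, not Beam APIs.
from collections import defaultdict

class IncrementalJoin:
    def __init__(self):
        self.person_by_key = {}                    # "person state" per person/seller id
        self.pending_auctions = defaultdict(list)  # auctions buffered until the person is seen

    def on_event(self, key, kind, event):
        """Process one event; return the joined (person, auction) pairs it produces."""
        outputs = []
        if kind == "person":
            self.person_by_key[key] = event
            # Auctions that arrived before the person are output once, then cleared.
            for auction in self.pending_auctions.pop(key, []):
                outputs.append((event, auction))
        else:  # kind == "auction"
            person = self.person_by_key.get(key)
            if person is not None:
                outputs.append((person, event))
            else:
                self.pending_auctions[key].append(event)
        return outputs

    def expire_person(self, key):
        """Simulates the TTL timer clearing the person state."""
        self.person_by_key.pop(key, None)
```

Under this model the output per matched pair is emitted exactly once, which is why the ticket expected a constant output size across batch and streaming runs.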
[jira] [Resolved] (BEAM-5178) Gradle: Try to reduce the initialization time when running a UTest multiple times.
[ https://issues.apache.org/jira/browse/BEAM-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-5178. Fix Version/s: Not applicable Resolution: Fixed With Gradle changes in the build and IDE plugin improvements, this delay is less harmful; closing the ticket. > Gradle: Try to reduce the initialization time when running a UTest multiple > times. > -- > > Key: BEAM-5178 > URL: https://issues.apache.org/jira/browse/BEAM-5178 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Etienne Chauchot >Priority: P2 > Labels: gradle, stale-P2 > Fix For: Not applicable > > > When we run a unit test multiple times in intellij, we pay each time 6s in > initialization of the test (on my machine). Is there a configuration item > like caching that allows to reduce this time? > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-5177) Integrate javadocs in the IDE with gradle
[ https://issues.apache.org/jira/browse/BEAM-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-5177. -- Fix Version/s: Not applicable Resolution: Won't Fix Not related to Beam; this is IDE configuration. Closing > Integrate javadocs in the IDE with gradle > - > > Key: BEAM-5177 > URL: https://issues.apache.org/jira/browse/BEAM-5177 > Project: Beam > Issue Type: Improvement > Components: build-system >Reporter: Etienne Chauchot >Priority: P2 > Labels: gradle, stale-P2 > Fix For: Not applicable > > > It would be good to have automatic javadoc artifacts download for external > dependencies -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-6344) :beam-sdks-java-io-elasticsearch-tests-2:test failed owing to get 503 from ES
[ https://issues.apache.org/jira/browse/BEAM-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-6344. -- Fix Version/s: Not applicable Resolution: Fixed There were recent overload countermeasures on the ES tests and there have been no flaky tests since. Closing this issue. > :beam-sdks-java-io-elasticsearch-tests-2:test failed owing to get 503 from ES > - > > Key: BEAM-6344 > URL: https://issues.apache.org/jira/browse/BEAM-6344 > Project: Beam > Issue Type: Test > Components: test-failures >Reporter: Boyuan Zhang >Priority: P2 > Labels: stale-P2 > Fix For: Not applicable > > > [https://builds.apache.org/job/beam_Release_NightlySnapshot/290/] > log: > https://scans.gradle.com/s/umeki7yl4ipq6/tests/fo2bghfaj5ysq-7zrcy3fjk3uwg -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7290) Upgrade hadoop-client
[ https://issues.apache.org/jira/browse/BEAM-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-7290. Fix Version/s: Not applicable Resolution: Fixed This ticket was opened for a CVE on the hadoop-client dependency, but there is no longer a CVE raised for the current Beam dependency. Closing > Upgrade hadoop-client > - > > Key: BEAM-7290 > URL: https://issues.apache.org/jira/browse/BEAM-7290 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2 > Fix For: Not applicable > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-7291) Upgrade hadoop-common
[ https://issues.apache.org/jira/browse/BEAM-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-7291. -- Fix Version/s: Not applicable Resolution: Fixed This ticket was opened for a CVE on the hadoop-common dependency, but there is no longer a CVE raised for the current Beam dependency. Closing > Upgrade hadoop-common > - > > Key: BEAM-7291 > URL: https://issues.apache.org/jira/browse/BEAM-7291 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2 > Fix For: Not applicable > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7292) Upgrade hadoop-mapreduce-client-core
[ https://issues.apache.org/jira/browse/BEAM-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-7292. Fix Version/s: Not applicable Resolution: Fixed This ticket was opened for a CVE on the hadoop-mapreduce-client-core dependency, but there is no longer a CVE raised for the current Beam dependency. Closing > Upgrade hadoop-mapreduce-client-core > > > Key: BEAM-7292 > URL: https://issues.apache.org/jira/browse/BEAM-7292 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2 > Fix For: Not applicable > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-7295) Upgrade javax.mail
[ https://issues.apache.org/jira/browse/BEAM-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-7295. -- Fix Version/s: Not applicable Resolution: Fixed No longer in use in Beam, so closing the issue > Upgrade javax.mail > -- > > Key: BEAM-7295 > URL: https://issues.apache.org/jira/browse/BEAM-7295 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2 > Fix For: Not applicable > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-7298) Upgrade zookeeper
[ https://issues.apache.org/jira/browse/BEAM-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-7298. Fix Version/s: Not applicable Resolution: Fixed Opened because of a CVE in ZooKeeper. [~iemejia] upgraded the ZooKeeper Beam dependency on 08/02/2019, so closing this issue > Upgrade zookeeper > - > > Key: BEAM-7298 > URL: https://issues.apache.org/jira/browse/BEAM-7298 > Project: Beam > Issue Type: Sub-task > Components: dependencies >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2 > Fix For: Not applicable > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-7400) Authentication for ElasticsearchIO seems to be failing
[ https://issues.apache.org/jira/browse/BEAM-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-7400: --- Description: It seems to be failing at least on Elasticsearch Cloud: {code:java} [WARNING] org.elasticsearch.client.ResponseException: GET https://a3626ec04ef549318d444cde10db468d.europe-west1.gcp.cloud.es.io:9243/: HTTP/1.1 401 Unauthorized {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:monitor/main] requires authentication","header":{"WWW-Authenticate":["Bearer realm=\"security\"","ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}}],"type":"security_exception","reason":"action [cluster:monitor/main] requires authentication","header":{"WWW-Authenticate":["Bearer realm=\"security\"","ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}},"status":401} {code} was: It seem to be failing at list on elasticsearch cloud: {code} [WARNING] org.elasticsearch.client.ResponseException: GET https://a3626ec04ef549318d444cde10db468d.europe-west1.gcp.cloud.es.io:9243/: HTTP/1.1 401 Unauthorized {"error":{"root_cause":[{"type":"security_exception","reason":"action [cluster:monitor/main] requires authentication","header":{"WWW-Authenticate":["Bearer realm=\"security\"","ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}}],"type":"security_exception","reason":"action [cluster:monitor/main] requires authentication","header":{"WWW-Authenticate":["Bearer realm=\"security\"","ApiKey","Basic realm=\"security\" charset=\"UTF-8\""]}},"status":401} {code} > Authentication for ElasticsearchIO seems to be failing > -- > > Key: BEAM-7400 > URL: https://issues.apache.org/jira/browse/BEAM-7400 > Project: Beam > Issue Type: Bug > Components: io-java-elasticsearch >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2 > > It seems to be failing at least on Elasticsearch Cloud: > {code:java} > [WARNING] > org.elasticsearch.client.ResponseException: GET > 
https://a3626ec04ef549318d444cde10db468d.europe-west1.gcp.cloud.es.io:9243/: > HTTP/1.1 401 Unauthorized > {"error":{"root_cause":[{"type":"security_exception","reason":"action > [cluster:monitor/main] requires > authentication","header":{"WWW-Authenticate":["Bearer > realm=\"security\"","ApiKey","Basic realm=\"security\" > charset=\"UTF-8\""]}}],"type":"security_exception","reason":"action > [cluster:monitor/main] requires > authentication","header":{"WWW-Authenticate":["Bearer > realm=\"security\"","ApiKey","Basic realm=\"security\" > charset=\"UTF-8\""]}},"status":401} > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
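The 401 response above advertises Basic authentication ("Basic realm=\"security\""), i.e. the request reached the cluster without any Authorization header. As a hedged illustration (not ElasticsearchIO's actual configuration API), this standard-library sketch shows the header such a request would need; the credentials are hypothetical:

```python
# Sketch: building the HTTP Basic Authorization header the 401 above asks
# for. Credentials are placeholders; this is not the ElasticsearchIO API.
import base64

def basic_auth_header(user: str, password: str) -> dict:
    # Basic auth is "Basic " + base64("user:password")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Attaching such a header (or the equivalent driver-level credential configuration) is what distinguishes an authenticated request from the anonymous one rejected above.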
[jira] [Closed] (BEAM-8859) Fix SplittableDoFnTest.testLifecycleMethodsBounded in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-8859. -- Fix Version/s: Not applicable Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/BEAM-8907 > Fix SplittableDoFnTest.testLifecycleMethodsBounded in new spark runner > -- > > Key: BEAM-8859 > URL: https://issues.apache.org/jira/browse/BEAM-8859 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-P2, structured-streaming > Fix For: Not applicable > > > validates runner test > org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded > fails in spark structured streaming runner -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-4797) Allow the user to limit the number of result docs in ElasticsearchIO.read()
[ https://issues.apache.org/jira/browse/BEAM-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-4797. Fix Version/s: Not applicable Resolution: Won't Fix A contributor offered to implement the feature but never sent the PR. The other user who asked for it solved their problem manually, so closing the issue. > Allow the user to limit the number of result docs in ElasticsearchIO.read() > --- > > Key: BEAM-4797 > URL: https://issues.apache.org/jira/browse/BEAM-4797 > Project: Beam > Issue Type: New Feature > Components: io-java-elasticsearch >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: P2 > Labels: stale-assigned > Fix For: Not applicable > > > In some cases, like sampling, the users will be interested in limiting the > number of docs a ESIO.read() returns. > It is as simple as allowing the user to pass terminate_after parameter to the > IO configuration. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-5370) Set a cause exception in ElasticsearchIO#getBackendVersion
[ https://issues.apache.org/jira/browse/BEAM-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-5370. Fix Version/s: 2.7.0 Resolution: Fixed Forgot to close that one; thanks for the reminder. I went through the history to see in which version it was fixed. > Set a cause exception in ElasticsearchIO#getBackendVersion > -- > > Key: BEAM-5370 > URL: https://issues.apache.org/jira/browse/BEAM-5370 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Affects Versions: 2.6.0 >Reporter: Shinsuke Sugaya >Assignee: Etienne Chauchot >Priority: P3 > Labels: stale-assigned > Fix For: 2.7.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I could not check a root cause of IllegalArgumentException in > ElasticsearchIO#getBackendVersion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-5757) Elasticsearch IO provide delete function
[ https://issues.apache.org/jira/browse/BEAM-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123561#comment-17123561 ] Etienne Chauchot commented on BEAM-5757: Deleting an entire index does not make a lot of sense inside the IO, but deleting documents in an index does, and some other IOs, including CassandraIO, support such a feature. If someone is willing to submit a PR on that, I could do the review. > Elasticsearch IO provide delete function > > > Key: BEAM-5757 > URL: https://issues.apache.org/jira/browse/BEAM-5757 > Project: Beam > Issue Type: New Feature > Components: io-java-elasticsearch >Affects Versions: 2.7.0 >Reporter: Prasad Marne >Priority: P2 > Labels: stale-assigned > > Hello beam maintainers, > I am trying to delete some index from Elasticsearch using beam pipeline. I > couldn't find and delete function in ElasticsearchIO. It would be nice to > have and would make sense for cleaning up indexes overtime. > I also checked some other IO classes and they also don't support delete. Not > sure if beam has some policy against supporting delete. > Please guide me on this. I am willing to create pull request if this feature > makes sense to project contributors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-5757) Elasticsearch IO provide delete function
[ https://issues.apache.org/jira/browse/BEAM-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-5757: -- Assignee: (was: Etienne Chauchot) > Elasticsearch IO provide delete function > > > Key: BEAM-5757 > URL: https://issues.apache.org/jira/browse/BEAM-5757 > Project: Beam > Issue Type: New Feature > Components: io-java-elasticsearch >Affects Versions: 2.7.0 >Reporter: Prasad Marne >Priority: P2 > Labels: stale-assigned > > Hello beam maintainers, > I am trying to delete some index from Elasticsearch using beam pipeline. I > couldn't find and delete function in ElasticsearchIO. It would be nice to > have and would make sense for cleaning up indexes overtime. > I also checked some other IO classes and they also don't support delete. Not > sure if beam has some policy against supporting delete. > Please guide me on this. I am willing to create pull request if this feature > makes sense to project contributors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-6304) can ElasticsearchIO add a ExceptionHandlerFn
[ https://issues.apache.org/jira/browse/BEAM-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-6304. Fix Version/s: Not applicable Resolution: Won't Fix As it received no update, I guess the user followed my advice and implemented it as a pre-processing DoFn. > can ElasticsearchIO add a ExceptionHandlerFn > - > > Key: BEAM-6304 > URL: https://issues.apache.org/jira/browse/BEAM-6304 > Project: Beam > Issue Type: New Feature > Components: io-java-elasticsearch >Affects Versions: Not applicable >Reporter: big >Assignee: Etienne Chauchot >Priority: P2 > Labels: stale-assigned > Fix For: Not applicable > > > I use ElasticsearchIO to write my data to elasticSearch. However, the data is > from other platform and not easy to check its validity. If we get the invalid > data, we can ignore it( even though use batch insert, we can ignore all of > them). So, I wish has a registered exception catch function to process it. > From now on, I read the source code about write function in ProcessElement, > it just throw the exception and cause my job to stop. > I can catch pipeline.run().waitUntilFinish() on direct runner and force it > run again use while statement ungracefully. However, when it deploy to Flink, > it will fail because Flink report exception that it cannot optimize the job. > If there is a method let user to decide how to process exception is required. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-8860) Implement combine with context in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-8860: -- Assignee: (was: Etienne Chauchot) > Implement combine with context in new spark runner > -- > > Key: BEAM-8860 > URL: https://issues.apache.org/jira/browse/BEAM-8860 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-assigned, structured-streaming > > Validates runner tests below fail in spark structured streaming runner > because combine with context (for side inputs) is not implemented: > > 'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext''org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext''org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty''org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext''org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext''org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9032) Replace broadcast variables based side inputs with temp views
[ https://issues.apache.org/jira/browse/BEAM-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-9032. Fix Version/s: Not applicable Resolution: Won't Fix Temp views need to be created/accessed through the Spark SQL context, so the Spark SQL context would need to be accessed from the SideInputReader, but it is not serializable. Temp views therefore cannot be used for side inputs; keeping the broadcast variables. > Replace broadcast variables based side inputs with temp views > - > > Key: BEAM-9032 > URL: https://issues.apache.org/jira/browse/BEAM-9032 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: P2 > Labels: stale-assigned, structured-streaming > Fix For: Not applicable > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9032) Replace broadcast variables based side inputs with temp views
[ https://issues.apache.org/jira/browse/BEAM-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123510#comment-17123510 ] Etienne Chauchot commented on BEAM-9032: Forgot to post the update. Thanks for the reminder, Kenn! > Replace broadcast variables based side inputs with temp views > - > > Key: BEAM-9032 > URL: https://issues.apache.org/jira/browse/BEAM-9032 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: P2 > Labels: stale-assigned, structured-streaming > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory
[ https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-9307: -- Assignee: (was: Etienne Chauchot) > Put windows inside the key to avoid having all values for the same key in > memory > > > Key: BEAM-9307 > URL: https://issues.apache.org/jira/browse/BEAM-9307 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Priority: P2 > Labels: stale-assigned, structured-streaming > > On both group by key and combinePerKey. > Like it was done for the current runner. > See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
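The BEAM-9307 idea above (putting the window inside the key) can be illustrated without any runner code. Grouping by the composite (key, window) pair means each group holds only one window's values, instead of every value for a key across all windows. This plain-Python sketch uses illustrative names, not the Spark runner's actual types:

```python
# Illustration of BEAM-9307's proposal: group by (key, window) instead of
# key alone, so no single group accumulates all windows' values for a key.
# Names are illustrative; this is not Beam or Spark runner code.
from collections import defaultdict

def group_by_key_and_window(elements):
    """elements: iterable of (key, window, value) triples."""
    groups = defaultdict(list)
    for key, window, value in elements:
        groups[(key, window)].append(value)  # window becomes part of the key
    return dict(groups)
```

With the plain key, a hot key's values for every window would sit in one in-memory group; with the composite key they are split across groups, which is the memory saving the ticket targets for groupByKey and combinePerKey.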
[jira] [Resolved] (BEAM-9436) Improve performance of GBK
[ https://issues.apache.org/jira/browse/BEAM-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-9436. Fix Version/s: 2.20.0 Resolution: Fixed Performance of GBK was improved by the linked PR, even though materialization cannot be avoided, so closing this ticket. The improvement was merged for 2.20, so targeting that version > Improve performance of GBK > -- > > Key: BEAM-9436 > URL: https://issues.apache.org/jira/browse/BEAM-9436 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: P2 > Labels: stale-assigned, structured-streaming > Fix For: 2.20.0 > > Time Spent: 15h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-10017) Expose SocketOptions timeouts in CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111273#comment-17111273 ] Etienne Chauchot commented on BEAM-10017: - There is a jar task in the Cassandra Gradle module. You can use your IDE with the Gradle plugin > Expose SocketOptions timeouts in CassandraIO > > > Key: BEAM-10017 > URL: https://issues.apache.org/jira/browse/BEAM-10017 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Reporter: Nathan Fisher >Priority: P3 > Time Spent: 50m > Remaining Estimate: 0h > > Currently there are no options to tune the configuration of the CassandraIO > reader/writer. This can be useful for either slow clusters, large queries, or > high latency links. > The intent would be to expose the following configuration elements as setters > on the CassandraIO builder similar to withKeyspace and other methods. > > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setConnectTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setConnectTimeoutMillis-int-](int > connectTimeoutMillis)}} > Sets the connection timeout in milliseconds.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setKeepAlive|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setKeepAlive-boolean-](boolean > keepAlive)}} > Sets whether to enable TCP keepalive.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReadTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReadTimeoutMillis-int-](int > readTimeoutMillis)}} > Sets the per-host read timeout in milliseconds.| > 
|{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReceiveBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReceiveBufferSize-int-](int > receiveBufferSize)}} > Sets a hint to the size of the underlying buffers for incoming network I/O.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReuseAddress|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReuseAddress-boolean-](boolean > reuseAddress)}} > Sets whether to enable reuse-address.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSendBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSendBufferSize-int-](int > sendBufferSize)}} > Sets a hint to the size of the underlying buffers for outgoing network I/O.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSoLinger|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSoLinger-int-](int > soLinger)}} > Sets the linger-on-close timeout.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setTcpNoDelay|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setTcpNoDelay-boolean-](boolean > tcpNoDelay)}} > Sets whether to disable Nagle's algorithm.| -- This message was sent by Atlassian Jira (v8.3.4#803005)
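The driver options tabled above (connect/read timeouts, keepalive, Nagle, buffer sizes) are thin wrappers over ordinary TCP socket settings. As a language-neutral illustration of what those knobs control (this is a standard-library sketch, not CassandraIO or the DataStax driver API), the analogous settings look like:

```python
# Stdlib sketch of the TCP knobs behind the driver's SocketOptions table.
# Not CassandraIO or DataStax driver code; parameter names are illustrative.
import socket

def configured_socket(timeout_s=5.0, keep_alive=True, no_delay=True):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout_s)  # ~ setConnectTimeoutMillis / setReadTimeoutMillis
    s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, int(keep_alive))  # ~ setKeepAlive
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, int(no_delay))    # ~ setTcpNoDelay
    return s
```

Exposing only the two timeouts on the CassandraIO builder (as the later comments converge on) keeps the IO aligned with Beam's no-knobs philosophy while still covering slow clusters and high-latency links.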
[jira] [Commented] (BEAM-10017) Expose SocketOptions timeouts in CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110950#comment-17110950 ] Etienne Chauchot commented on BEAM-10017: - Hi Nathan, thanks for moving the design discussion to the ticket and applying the no-knobs philosophy of Beam. Connect timeout and Read timeout make total sense. > Expose SocketOptions timeouts in CassandraIO > > > Key: BEAM-10017 > URL: https://issues.apache.org/jira/browse/BEAM-10017 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Reporter: Nathan Fisher >Priority: P3 > Time Spent: 50m > Remaining Estimate: 0h > > Currently there are no options to tune the configuration of the CassandraIO > reader/writer. This can be useful for either slow clusters, large queries, or > high latency links. > The intent would be to expose the following configuration elements as setters > on the CassandraIO builder similar to withKeyspace and other methods. > > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setConnectTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setConnectTimeoutMillis-int-](int > connectTimeoutMillis)}} > Sets the connection timeout in milliseconds.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setKeepAlive|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setKeepAlive-boolean-](boolean > keepAlive)}} > Sets whether to enable TCP keepalive.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReadTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReadTimeoutMillis-int-](int > readTimeoutMillis)}} > Sets the per-host read timeout in milliseconds.| > 
|{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReceiveBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReceiveBufferSize-int-](int > receiveBufferSize)}} > Sets a hint to the size of the underlying buffers for incoming network I/O.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReuseAddress|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReuseAddress-boolean-](boolean > reuseAddress)}} > Sets whether to enable reuse-address.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSendBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSendBufferSize-int-](int > sendBufferSize)}} > Sets a hint to the size of the underlying buffers for outgoing network I/O.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSoLinger|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSoLinger-int-](int > soLinger)}} > Sets the linger-on-close timeout.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setTcpNoDelay|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setTcpNoDelay-boolean-](boolean > tcpNoDelay)}} > Sets whether to disable Nagle's algorithm.| -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-10017) Expose SocketOptions timeouts in CassandraIO
[ https://issues.apache.org/jira/browse/BEAM-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-10017: Summary: Expose SocketOptions timeouts in CassandraIO (was: Expose SocketOptions timeouts in CassandraIO builder) > Expose SocketOptions timeouts in CassandraIO > > > Key: BEAM-10017 > URL: https://issues.apache.org/jira/browse/BEAM-10017 > Project: Beam > Issue Type: Improvement > Components: io-java-cassandra >Reporter: Nathan Fisher >Priority: P3 > Time Spent: 50m > Remaining Estimate: 0h > > Currently there are no options to tune the configuration of the CassandraIO > reader/writer. This can be useful for either slow clusters, large queries, or > high latency links. > The intent would be to expose the following configuration elements as setters > on the CassandraIO builder similar to withKeyspace and other methods. > > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setConnectTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setConnectTimeoutMillis-int-](int > connectTimeoutMillis)}} > Sets the connection timeout in milliseconds.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setKeepAlive|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setKeepAlive-boolean-](boolean > keepAlive)}} > Sets whether to enable TCP keepalive.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReadTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReadTimeoutMillis-int-](int > readTimeoutMillis)}} > Sets the per-host read timeout in milliseconds.| > 
|{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReceiveBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReceiveBufferSize-int-](int > receiveBufferSize)}} > Sets a hint to the size of the underlying buffers for incoming network I/O.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReuseAddress|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReuseAddress-boolean-](boolean > reuseAddress)}} > Sets whether to enable reuse-address.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSendBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSendBufferSize-int-](int > sendBufferSize)}} > Sets a hint to the size of the underlying buffers for outgoing network I/O.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSoLinger|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSoLinger-int-](int > soLinger)}} > Sets the linger-on-close timeout.| > |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setTcpNoDelay|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setTcpNoDelay-boolean-](boolean > tcpNoDelay)}} > Sets whether to disable Nagle's algorithm.| -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-3926) Support MetricsPusher in Dataflow Runner
[ https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108082#comment-17108082 ] Etienne Chauchot edited comment on BEAM-3926 at 5/15/20, 8:47 AM: -- [~foegler], [~pabloem], [~ajam...@google.com], I have a user who asks for this feature in Dataflow. Is there a willingness to implement it for the Dataflow runner? If so, as the MetricsPusher needs to be instanciated at the engine side (cf arguments in the description of the ticket), I was wondering if the worker part of the dataflow runner could be the correct spot and as it was donated it would enable the community to implement the feature for Dataflow. was (Author: echauchot): [~foegler], [~pabloem], I have a user who asks for this feature in Dataflow. Is there a willingness to implement it for the Dataflow runner? If so, as the MetricsPusher needs to be instanciated at the engine side (cf arguments in the description of the ticket), I was wondering if the worker part of the dataflow runner could be the correct spot and as it was donated it would enable the community to implement the feature for Dataflow. > Support MetricsPusher in Dataflow Runner > > > Key: BEAM-3926 > URL: https://issues.apache.org/jira/browse/BEAM-3926 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Pablo Estrada >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > See [relevant email > thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E]. > From [~echauchot]: > > _AFAIK Dataflow being a cloud hosted engine, the related runner is very > different from the others. It just submits a job to the cloud hosted engine. > So, no access to metrics container etc... from the runner. 
So I think that > the MetricsPusher (component responsible for merging metrics and pushing them > to a sink backend) must not be instantiated in DataflowRunner, otherwise it > would be more a client (driver) piece of code and we would lose all the > benefit of being close to the execution engine (among other things, > instrumentation of the execution of the pipelines). I think that the > MetricsPusher needs to be instantiated in the actual Dataflow engine._ > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3926) Support MetricsPusher in Dataflow Runner
[ https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108082#comment-17108082 ] Etienne Chauchot commented on BEAM-3926: [~foegler], [~pabloem], I have a user who asks for this feature in Dataflow. Is there a willingness to implement it for the Dataflow runner? If so, as the MetricsPusher needs to be instantiated at the engine side (cf. arguments in the description of the ticket), I was wondering if the worker part of the Dataflow runner could be the correct spot; as it was donated, it would enable the community to implement the feature for Dataflow. > Support MetricsPusher in Dataflow Runner > > > Key: BEAM-3926 > URL: https://issues.apache.org/jira/browse/BEAM-3926 > Project: Beam > Issue Type: Sub-task > Components: runner-dataflow >Reporter: Scott Wegner >Assignee: Pablo Estrada >Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > See [relevant email > thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E]. > From [~echauchot]: > > _AFAIK Dataflow being a cloud hosted engine, the related runner is very > different from the others. It just submits a job to the cloud hosted engine. > So, no access to metrics container etc... from the runner. So I think that > the MetricsPusher (component responsible for merging metrics and pushing them > to a sink backend) must not be instantiated in DataflowRunner, otherwise it > would be more a client (driver) piece of code and we would lose all the > benefit of being close to the execution engine (among other things, > instrumentation of the execution of the pipelines). I think that the > MetricsPusher needs to be instantiated in the actual Dataflow engine._ > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
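As background, the merge-and-push role described for the MetricsPusher can be sketched as follows. This is an assumed simplification for illustration, not Beam's actual MetricsPusher code:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: merge the counter updates coming from several
// workers into one snapshot, which a pusher would then periodically send
// to the configured sink backend.
class MetricsPusherSketch {
    static Map<String, Long> merge(List<Map<String, Long>> workerCounters) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> counters : workerCounters) {
            // Sum values per metric name across workers.
            counters.forEach((name, value) -> merged.merge(name, value, Long::sum));
        }
        return merged;
    }
}
```

Running this merge inside the engine, rather than in the client-side runner, is exactly the point argued in the comment above.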
[jira] [Resolved] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-8025. Fix Version/s: 2.21.0 Resolution: Fixed I took a look at the statistics since my fix PR was merged and within 14 days, no failure. So I'm closing this ticket. > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Critical > Labels: flake > Fix For: 2.21.0 > > Time Spent: 6h > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098905#comment-17098905 ] Etienne Chauchot commented on BEAM-8025: My last fix was merged; I'm waiting two weeks to see if the increased retry count and retry delay are enough to fix the flakiness issue. > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Critical > Labels: flake > Time Spent: 6h > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096405#comment-17096405 ] Etienne Chauchot commented on BEAM-8025: Just submitted a PR: [https://github.com/apache/beam/pull/11578] I put you as reviewer. > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Critical > Labels: flake > Time Spent: 4.5h > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096391#comment-17096391 ] Etienne Chauchot edited comment on BEAM-8025 at 4/30/20, 10:17 AM: --- Hi Kenn, no, it is due to Jenkins cluster load. I can give a higher timeout. But it is difficult to reproduce. I have very little time these weeks but I can try to take a look at it. was (Author: echauchot): Hi Kenn, no, it is due to Jenkins cluster load. I can give a higher timeout. But it is difficult to reproduce > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Critical > Labels: flake > Time Spent: 4h 20m > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096391#comment-17096391 ] Etienne Chauchot edited comment on BEAM-8025 at 4/30/20, 10:02 AM: --- Hi Kenn, no, it is due to Jenkins cluster load. I can give a higher timeout. But it is difficult to reproduce was (Author: echauchot): Hi Kenn, no, it is due to Jenkins cluster load. I can give a higher timeout. > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Critical > Labels: flake > Time Spent: 4h 20m > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096391#comment-17096391 ] Etienne Chauchot commented on BEAM-8025: Hi Kenn, no, it is due to Jenkins cluster load. I can give a higher timeout. > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Critical > Labels: flake > Time Spent: 4h 20m > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9436) Improve performance of GBK
[ https://issues.apache.org/jira/browse/BEAM-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-9436: --- Summary: Improve performance of GBK (was: Try to avoid elements list materialization in GBK) > Improve performance of GBK > -- > > Key: BEAM-9436 > URL: https://issues.apache.org/jira/browse/BEAM-9436 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Time Spent: 13h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9133) CassandraIOTest.classMethod test is still flaky
[ https://issues.apache.org/jira/browse/BEAM-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-9133. Fix Version/s: Not applicable Resolution: Duplicate [~aromanenko] thanks. I know it is still flaky. That is why I reopened BEAM-8025 some days ago. So, this ticket is duplicate. Unfortunately I have no time to work on that right now. Maybe [~adejanovski] ? > CassandraIOTest.classMethod test is still flaky > --- > > Key: BEAM-9133 > URL: https://issues.apache.org/jira/browse/BEAM-9133 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.17.0 >Reporter: Alexey Romanenko >Assignee: Etienne Chauchot >Priority: Critical > Fix For: Not applicable > > > CassandraIOTest is still flaky. For example: > https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1646/ > https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1625/ > {code} > Error Message > java.lang.RuntimeException: Unable to create embedded Cassandra cluster > Stacktrace > java.lang.RuntimeException: Unable to create embedded Cassandra cluster > at > org.apache.beam.sdk.io.cassandra.CassandraIOTest.buildCluster(CassandraIOTest.java:167) > at > org.apache.beam.sdk.io.cassandra.CassandraIOTest.beforeClass(CassandraIOTest.java:146) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33) > at > 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) > at org.junit.runners.ParentRunner.run(ParentRunner.java:412) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) > at > org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) > at > org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) > at > org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) > at > org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) > at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) > at > org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) > at > org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175) > at > org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157) >
[jira] [Updated] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-8025: --- Fix Version/s: (was: 2.19.0) > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Blocker > Time Spent: 4h 20m > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reopened BEAM-8025: Still flaky. Might be for another reason though: {code:java} java.lang.RuntimeException: Unable to create embedded Cassandra cluster at org.apache.beam.sdk.io.cassandra.CassandraIOTest.buildCluster(CassandraIOTest.java:167) at org.apache.beam.sdk.io.cassandra.CassandraIOTest.beforeClass(CassandraIOTest.java:146) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305) at org.junit.runners.ParentRunner.run(ParentRunner.java:412) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175) at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157) at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404) at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63) at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55) at java.lang.Thread.run(Thread.java:748) {code} and {code:java} java.lang.NullPointerException at info.archinnov.achilles.embedded.CassandraShutDownHook.shutDownNow(CassandraShutDownHook.java:81) at org.apache.beam.sdk.io.cassandra.CassandraIOTest.afterClass(CassandraIOTest.java:172) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at
[jira] [Created] (BEAM-9480) org.apache.beam.sdk.io.mongodb.MongoDbIOTest.testReadWithAggregate is flaky
Etienne Chauchot created BEAM-9480: -- Summary: org.apache.beam.sdk.io.mongodb.MongoDbIOTest.testReadWithAggregate is flaky Key: BEAM-9480 URL: https://issues.apache.org/jira/browse/BEAM-9480 Project: Beam Issue Type: Test Components: io-java-mongodb Reporter: Etienne Chauchot [https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1831/testReport/junit/org.apache.beam.sdk.io.mongodb/MongoDbIOTest/testReadWithAggregate/] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9470) :sdks:java:io:kinesis:test is flaky
Etienne Chauchot created BEAM-9470: -- Summary: :sdks:java:io:kinesis:test is flaky Key: BEAM-9470 URL: https://issues.apache.org/jira/browse/BEAM-9470 Project: Beam Issue Type: Test Components: io-java-kinesis Reporter: Etienne Chauchot [https://scans.gradle.com/s/b4jmmu72ku5jc/console-log?task=:sdks:java:io:kinesis:test] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9436) Try to avoid elements list materialization in GBK
[ https://issues.apache.org/jira/browse/BEAM-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051980#comment-17051980 ] Etienne Chauchot commented on BEAM-9436: Could not avoid materialization because ReduceFnRunner needs an Iterable, but managed to avoid a flatmap step and extra memory consumption from KV creation. > Try to avoid elements list materialization in GBK > - > > Key: BEAM-9436 > URL: https://issues.apache.org/jira/browse/BEAM-9436 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9436) Try to avoid elements list materialization in GBK
Etienne Chauchot created BEAM-9436: -- Summary: Try to avoid elements list materialization in GBK Key: BEAM-9436 URL: https://issues.apache.org/jira/browse/BEAM-9436 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory
[ https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-9307: -- Assignee: Etienne Chauchot > Put windows inside the key to avoid having all values for the same key in > memory > > > Key: BEAM-9307 > URL: https://issues.apache.org/jira/browse/BEAM-9307 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > > On both group by key and combinePerKey. > Like it was done for the current runner. > See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
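The idea in the ticket above can be sketched independently of Spark: group by a composite of (key, window) rather than by key alone, so a grouping step only ever needs the values of one window in memory. A toy illustration, not the runner's actual translation:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of "windows inside the key": each element carries
// {key, window, value}; the composite grouping key is key + window.
class WindowedKeySketch {
    static Map<String, List<Integer>> groupByKeyAndWindow(List<String[]> elements) {
        Map<String, List<Integer>> grouped = new HashMap<>();
        for (String[] e : elements) {
            String compositeKey = e[0] + "@" + e[1]; // key and window together
            grouped.computeIfAbsent(compositeKey, k -> new ArrayList<>())
                   .add(Integer.parseInt(e[2]));
        }
        return grouped;
    }
}
```

With the window inside the key, a hot key spread over many windows no longer forces all of its values into a single in-memory group.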
[jira] [Updated] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory
[ https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-9307: --- Description: On both group by key and combinePerKey. Like it was done for the current runner. See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10 was: Like it was done for the current runner. See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10 > Put windows inside the key to avoid having all values for the same key in > memory > > > Key: BEAM-9307 > URL: https://issues.apache.org/jira/browse/BEAM-9307 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Priority: Major > Labels: structured-streaming > > On both group by key and combinePerKey. > Like it was done for the current runner. > See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-9327) Avoid creating a new accumulator for each add input in Combine translation
[ https://issues.apache.org/jira/browse/BEAM-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-9327. Fix Version/s: Not applicable Resolution: Won't Fix Recoding the "add an input to an accumulator" translation with manual window merging is less performant (measured with Nexmark) than calling mergeAccumulators(accumulator, addInput(createAccumulator, input)). Using the current SparkCombineFn in the new runner is also less performant. > Avoid creating a new accumulator for each add input in Combine translation > -- > > Key: BEAM-9327 > URL: https://issues.apache.org/jira/browse/BEAM-9327 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Fix For: Not applicable > > > similar to the latest improvement in the current runner. See: > [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 12 -- This message was sent by Atlassian Jira (v8.3.4#803005)
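The comparison in the resolution above can be made concrete with a toy sum CombineFn; this is an illustrative sketch, not the runner's translation code:

```java
// Toy sum CombineFn illustrating the per-element path the resolution
// describes: create a fresh accumulator for the input, then merge it into
// the running accumulator. The ticket measured this simpler path (with
// Nexmark) as more performant than a hand-rolled translation with manual
// window merging.
class SumCombineFn {
    long createAccumulator() { return 0L; }
    long addInput(long accumulator, long input) { return accumulator + input; }
    long mergeAccumulators(long a, long b) { return a + b; }

    long addViaMerge(long accumulator, long input) {
        return mergeAccumulators(accumulator, addInput(createAccumulator(), input));
    }
}
```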
[jira] [Assigned] (BEAM-9327) Avoid creating a new accumulator for each add input in Combine translation
[ https://issues.apache.org/jira/browse/BEAM-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-9327: -- Assignee: Etienne Chauchot > Avoid creating a new accumulator for each add input in Combine translation > -- > > Key: BEAM-9327 > URL: https://issues.apache.org/jira/browse/BEAM-9327 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > > similar to the latest improvement in the current runner. See: > [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 12 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory
[ https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-9307: --- Summary: Put windows inside the key to avoid having all values for the same key in memory (was: new spark runner: put windows inside the key to avoid having all values for the same key in memory) > Put windows inside the key to avoid having all values for the same key in > memory > > > Key: BEAM-9307 > URL: https://issues.apache.org/jira/browse/BEAM-9307 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Priority: Major > > Like it was done for the current runner. > See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9327) Avoid creating a new accumulator for each add input in Combine translation
Etienne Chauchot created BEAM-9327: -- Summary: Avoid creating a new accumulator for each add input in Combine translation Key: BEAM-9327 URL: https://issues.apache.org/jira/browse/BEAM-9327 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot similar to the latest improvement in the current runner. See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 12 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9307) new spark runner: put windows inside the key to avoid having all values for the same key in memory
Etienne Chauchot created BEAM-9307: -- Summary: new spark runner: put windows inside the key to avoid having all values for the same key in memory Key: BEAM-9307 URL: https://issues.apache.org/jira/browse/BEAM-9307 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot Like it was done for the current runner. See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9270) Unable to use ElasticsearchIO for indexpatterns: index-*
[ https://issues.apache.org/jira/browse/BEAM-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033716#comment-17033716 ] Etienne Chauchot commented on BEAM-9270: I did some more testing: the problem is really restricted to the size estimation only. Once a proper estimation is obtained, the read of index-* works fine > Unable to use ElasticsearchIO for indexpatterns: index-* > > > Key: BEAM-9270 > URL: https://issues.apache.org/jira/browse/BEAM-9270 > Project: Beam > Issue Type: Bug > Components: io-java-elasticsearch >Affects Versions: 2.19.0 >Reporter: Krishnaiah Narukulla >Priority: Major > > ElasticsearchIO input does not work with index patterns with a wildcard, for > example: index-*. It works well with a single index. It becomes a problem if > we need to query multiple indices using an index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9270) Unable to use ElasticsearchIO for indexpatterns: index-*
[ https://issues.apache.org/jira/browse/BEAM-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033655#comment-17033655 ] Etienne Chauchot commented on BEAM-9270: Hi [~krisnaru], thanks for raising this! I can indeed reproduce this bug. What happens is that Beam Read needs to evaluate the size of the data it has to read in order to know how to split for parallel reads. To get the size of the data, ElasticsearchIO calls the stats API of ES. Right now the IO uses a low-level ES REST client, so it gets a JSON object from the ES cluster listing all the indices available in the cluster. The IO then searches for the index name provided by the user in the JSON. In that case it does not find a match because of the "*" character, considers that the provided index does not exist and that the size is 0, and returns no documents. Fixing that would require updating the JSON parsing to manually expand the wildcard. But there is an ongoing complete refactoring of ElasticsearchIO (see the thread on the dev mailing list) that would allow using the high-level ES client and unlock many ES features. So either we fix the index-name matching in the stats using a regular expression, or we wait for the new ElasticsearchIO. > Unable to use ElasticsearchIO for indexpatterns: index-* > > > Key: BEAM-9270 > URL: https://issues.apache.org/jira/browse/BEAM-9270 > Project: Beam > Issue Type: Bug > Components: io-java-elasticsearch >Affects Versions: 2.19.0 >Reporter: Krishnaiah Narukulla >Priority: Major > > ElasticsearchIO input does not work with index patterns with a wildcard, for > example: index-*. It works well with a single index. It becomes a problem if > we need to query multiple indices using an index pattern. -- This message was sent by Atlassian Jira (v8.3.4#803005)
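The regular-expression option mentioned above could look like the following sketch: expand the user-supplied wildcard pattern into a regex and match it against the index names returned by the stats API, instead of requiring an exact name match. Illustrative only, not the IO's actual code:

```java
import java.util.regex.Pattern;

// Sketch: turn a wildcard index pattern (e.g. "index-*") into a regex.
// Everything except '*' is treated literally via Pattern.quote (\Q...\E
// quoting); each '*' becomes ".*".
class IndexPatternMatcher {
    static boolean matches(String indexPattern, String indexName) {
        String regex = Pattern.quote(indexPattern).replace("*", "\\E.*\\Q");
        return indexName.matches(regex);
    }
}
```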
[jira] [Commented] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework
[ https://issues.apache.org/jira/browse/BEAM-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025780#comment-17025780 ] Etienne Chauchot commented on BEAM-8470: [~udim] it was already, see there is a new column > Create a new Spark runner based on Spark Structured streaming framework > --- > > Key: BEAM-8470 > URL: https://issues.apache.org/jira/browse/BEAM-8470 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Fix For: 2.18.0 > > Time Spent: 16h > Remaining Estimate: 0h > > h1. Why is it worth creating a new runner based on structured streaming: > Because this new framework brings: > * Unified batch and streaming semantics: > * no more RDD/DStream distinction, as in Beam (only PCollection) > * Better state management: > * incremental state instead of saving all each time > * No more synchronous saving delaying computation: per batch and partition > delta file saved asynchronously + in-memory hashmap synchronous put/get > * Schemas in datasets: > * The dataset knows the structure of the data (fields) and can optimize > later on > * Schemas in PCollection in Beam > * New Source API > * Very close to Beam bounded source and unbounded sources > h1. Why make a new runner from scratch? > * Structured streaming framework is very different from the RDD/Dstream > framework > h1. We hope to gain > * More up to date runner in terms of libraries: leverage new features > * Leverage learnt practices from the previous runners > * Better performance thanks to the DAG optimizer (catalyst) and by > simplifying the code. > * Simplify the code and ease the maintenance > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-9205) Regression in validates runner tests configuration in spark module
[ https://issues.apache.org/jira/browse/BEAM-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-9205. -- Fix Version/s: 2.20.0 Resolution: Fixed > Regression in validates runner tests configuration in spark module > -- > > Key: BEAM-9205 > URL: https://issues.apache.org/jira/browse/BEAM-9205 > Project: Beam > Issue Type: Sub-task > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.20.0 > > > Not all the metrics tests are run: at least MetricsPusher is no longer run -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9205) Regression in validates runner tests configuration in spark module
[ https://issues.apache.org/jira/browse/BEAM-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-9205: --- Parent: BEAM-3310 Issue Type: Sub-task (was: Test) > Regression in validates runner tests configuration in spark module > -- > > Key: BEAM-9205 > URL: https://issues.apache.org/jira/browse/BEAM-9205 > Project: Beam > Issue Type: Sub-task > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > > Not all the metrics tests are run: at least MetricsPusher is no longer run -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9205) Regression in validates runner tests configuration in spark module
Etienne Chauchot created BEAM-9205: -- Summary: Regression in validates runner tests configuration in spark module Key: BEAM-9205 URL: https://issues.apache.org/jira/browse/BEAM-9205 Project: Beam Issue Type: Test Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot Not all the metrics tests are run: at least MetricsPusher is no longer run -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9187) org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactoryTest.loadBalancesBundles is flaky
Etienne Chauchot created BEAM-9187: -- Summary: org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactoryTest.loadBalancesBundles is flaky Key: BEAM-9187 URL: https://issues.apache.org/jira/browse/BEAM-9187 Project: Beam Issue Type: Test Components: runner-core Reporter: Etienne Chauchot [org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactoryTest.loadBalancesBundles|https://builds.apache.org/job/beam_PreCommit_Java_Commit/9730/testReport/junit/org.apache.beam.runners.fnexecution.control/DefaultJobBundleFactoryTest/loadBalancesBundles/] UTest seems to be flaky: [https://builds.apache.org/job/beam_PreCommit_Java_Commit/9730/testReport/junit/org.apache.beam.runners.fnexecution.control/DefaultJobBundleFactoryTest/loadBalancesBundles/] I used runner-core as the tag, but this test is actually in org.apache.beam.runners.fnexecution.control, for which there is no corresponding tag -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-9065) Spark runner accumulates metrics (incorrectly) between runs
[ https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020913#comment-17020913 ] Etienne Chauchot edited comment on BEAM-9065 at 1/22/20 9:53 AM: - Hey ! Yes, no problem, no hurry, we are still waiting to reproduce the failure in a UTest was (Author: echauchot): Hey ! Yes, no problem, no hurry, we are still waiting to reproduce the failure. > Spark runner accumulates metrics (incorrectly) between runs > --- > > Key: BEAM-9065 > URL: https://issues.apache.org/jira/browse/BEAM-9065 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When pipeline.run() is called, MetricsAccumulator (wrapper of > MetricsContainerStepMap spark accumulator) is initialized. Spark needs this > class to be a singleton for failover. The problem is that when several > pipelines are run inside the same JVM, the initialization of > MetricsAccumulator singleton does not reset the underlying spark accumulator > causing metrics to be accumulated between runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9065) Spark runner accumulates metrics (incorrectly) between runs
[ https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020913#comment-17020913 ] Etienne Chauchot commented on BEAM-9065: Hey ! Yes, no problem, no hurry, we are still waiting to reproduce the failure. > Spark runner accumulates metrics (incorrectly) between runs > --- > > Key: BEAM-9065 > URL: https://issues.apache.org/jira/browse/BEAM-9065 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.20.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When pipeline.run() is called, MetricsAccumulator (wrapper of > MetricsContainerStepMap spark accumulator) is initialized. Spark needs this > class to be a singleton for failover. The problem is that when several > pipelines are run inside the same JVM, the initialization of > MetricsAccumulator singleton does not reset the underlying spark accumulator > causing metrics to be accumulated between runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-9065) Spark runner accumulates metrics (incorrectly) between runs
[ https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011875#comment-17011875 ] Etienne Chauchot commented on BEAM-9065: [~udim] If 2.18 is not released when the patch is ready, I prefer that it be cherry-picked to 2.18. Also I'm still investigating the precise environment needed to reproduce this observed behavior, so it might not come before 2.19 > Spark runner accumulates metrics (incorrectly) between runs > --- > > Key: BEAM-9065 > URL: https://issues.apache.org/jira/browse/BEAM-9065 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.18.0 > > Time Spent: 40m > Remaining Estimate: 0h > > When pipeline.run() is called, MetricsAccumulator (wrapper of > MetricsContainerStepMap spark accumulator) is initialized. Spark needs this > class to be a singleton for failover. The problem is that when several > pipelines are run inside the same JVM, the initialization of > MetricsAccumulator singleton does not reset the underlying spark accumulator > causing metrics to be accumulated between runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-9065) metrics are not reset upon init
[ https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-9065: --- Fix Version/s: 2.18.0 > metrics are not reset upon init > --- > > Key: BEAM-9065 > URL: https://issues.apache.org/jira/browse/BEAM-9065 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.18.0 > > Time Spent: 10m > Remaining Estimate: 0h > > When pipeline.run() is called, MetricsAccumulator (wrapper of > MetricsContainerStepMap spark accumulator) is initialized. Spark needs this > class to be a singleton for failover. The problem is that when several > pipelines are run inside the same JVM, the initialization of > MetricsAccumulator singleton does not reset the underlying spark accumulator > causing metrics to be accumulated between runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9065) metrics are not reset upon init
Etienne Chauchot created BEAM-9065: -- Summary: metrics are not reset upon init Key: BEAM-9065 URL: https://issues.apache.org/jira/browse/BEAM-9065 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot When pipeline.run() is called, MetricsAccumulator (wrapper of MetricsContainerStepMap spark accumulator) is initialized. Spark needs this class to be a singleton for failover. The problem is that when several pipelines are run inside the same JVM, the initialization of MetricsAccumulator singleton does not reset the underlying spark accumulator causing metrics to be accumulated between runs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
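The singleton-reset pitfall described in BEAM-9065 can be sketched in a minimal, self-contained analogue. Assuming the issue description above; the class `MetricsSingleton` and its methods are illustrative stand-ins, not Beam's actual MetricsAccumulator API: a JVM-wide singleton whose initialization does not clear the wrapped accumulator lets a second pipeline run in the same JVM observe the first run's counts.

```java
public class MetricsSingleton {
  private static MetricsSingleton instance;
  private long counter; // stand-in for the wrapped Spark accumulator

  // Buggy behaviour described above: init returns the existing instance
  // without clearing it, so state survives across pipeline runs.
  static MetricsSingleton initWithoutReset() {
    if (instance == null) instance = new MetricsSingleton();
    return instance;
  }

  // Fixed behaviour: initialization also resets the underlying accumulator.
  static MetricsSingleton initWithReset() {
    if (instance == null) instance = new MetricsSingleton();
    instance.counter = 0;
    return instance;
  }

  void inc(long n) { counter += n; }
  long value() { return counter; }

  public static void main(String[] args) {
    MetricsSingleton run1 = initWithoutReset();
    run1.inc(5);
    // Second "pipeline run" in the same JVM sees the first run's metrics.
    System.out.println(initWithoutReset().value()); // prints 5, not 0
    // With the reset in init, each run starts from a clean accumulator.
    System.out.println(initWithReset().value()); // prints 0
  }
}
```

The singleton itself must stay (Spark failover requires it); only the state it wraps needs to be cleared on each initialization.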
[jira] [Work started] (BEAM-9032) Replace broadcast variables based side inputs with temp views
[ https://issues.apache.org/jira/browse/BEAM-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-9032 started by Etienne Chauchot. -- > Replace broadcast variables based side inputs with temp views > - > > Key: BEAM-9032 > URL: https://issues.apache.org/jira/browse/BEAM-9032 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9032) Replace broadcast variables based side inputs with temp views
Etienne Chauchot created BEAM-9032: -- Summary: Replace broadcast variables based side inputs with temp views Key: BEAM-9032 URL: https://issues.apache.org/jira/browse/BEAM-9032 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-5192) Support Elasticsearch 7.x
[ https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-5192. -- Fix Version/s: (was: Not applicable) 2.19.0 Resolution: Fixed > Support Elasticsearch 7.x > - > > Key: BEAM-5192 > URL: https://issues.apache.org/jira/browse/BEAM-5192 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.19.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > ES v7 is not out yet. But Elastic team scheduled a breaking change for ES > 7.0: the removal of the type feature. See > [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch] > This will require a good amount of changes in the IO. > This ticket is there to track the future work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-9019) Improve Spark Encoders (wrappers of beam coders)
Etienne Chauchot created BEAM-9019: -- Summary: Improve Spark Encoders (wrappers of beam coders) Key: BEAM-9019 URL: https://issues.apache.org/jira/browse/BEAM-9019 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot To improve maintainability and performance, replace as much of the catalyst-generated code as possible with compiled java code. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-5690. Fix Version/s: 2.19.0 Resolution: Fixed > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > Fix For: 2.19.0 > > Time Spent: 4.5h > Remaining Estimate: 0h > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999012#comment-16999012 ] Etienne Chauchot commented on BEAM-5690: Yes, thanks for pointing out [~iemejia], I forgot to close the related jira. > Issue with GroupByKey in BeamSql using SparkRunner > -- > > Key: BEAM-5690 > URL: https://issues.apache.org/jira/browse/BEAM-5690 > Project: Beam > Issue Type: Task > Components: runner-spark >Reporter: Kenneth Knowles >Priority: Major > Time Spent: 4.5h > Remaining Estimate: 0h > > Reported on user@ > {quote}We are trying to setup a pipeline with using BeamSql and the trigger > used is default (AfterWatermark crosses the window). > Below is the pipeline: > >KafkaSource (KafkaIO) >---> Windowing (FixedWindow 1min) >---> BeamSql >---> KafkaSink (KafkaIO) > > We are using Spark Runner for this. > The BeamSql query is: > {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code} > We are grouping by Col3 which is a string. It can hold values string[0-9]. > > The records are getting emitted out at 1 min to kafka sink, but the output > record in kafka is not as expected. 
> Below is the output observed: (WST and WET are indicators for window start > time and window end time) > {code} > {"count_col1":1,"Col3":"string5","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":3,"Col3":"string7","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":2,"Col3":"string8","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string2","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":1,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 +"} > {"count_col1":0,"Col3":"string6","WST":"2018-10-09 09-55-00 > +","WET":"2018-10-09 09-56-00 0} > {code} > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8894) Fix multiple coder bug in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-8894. Fix Version/s: Not applicable Resolution: Not A Problem The model lacks a definition in that case: should runners re-encode? Pending a common definition among runners, all runners exclude this test. > Fix multiple coder bug in new spark runner > -- > > Key: BEAM-8894 > URL: https://issues.apache.org/jira/browse/BEAM-8894 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Fix For: Not applicable > > Time Spent: 20m > Remaining Estimate: 0h > > The test below does not pass. It fails with an EOF while calling > NullableCoder(BigEndianCoder.decode) > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > In the "old" spark runner, this test passes because it never calls the > NullableCoder, as there is no serialization done. > There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
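A hedged sketch of how such an EOF can arise when coders disagree. This mimics the general shape of a nullable-style coder, a one-byte present/absent marker followed by the big-endian value, but is not Beam's actual NullableCoder code: bytes written without the marker get misread, and the decoder runs out of input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class NullableRoundTrip {
  // Encode: one marker byte (0 = null, 1 = present) then the big-endian long.
  static byte[] encodeNullable(Long v) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);
      out.writeByte(v == null ? 0 : 1);
      if (v != null) out.writeLong(v);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static Long decodeNullable(byte[] bytes) {
    try {
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
      return in.readByte() == 0 ? null : in.readLong();
    } catch (IOException e) {
      throw new UncheckedIOException(e); // wraps EOFException on truncated input
    }
  }

  public static void main(String[] args) throws IOException {
    // Round trip through the matching coder works.
    System.out.println(decodeNullable(encodeNullable(42L))); // prints 42

    // Now decode bytes written WITHOUT the marker: eight 0xFF bytes (-1L).
    // The first value byte is consumed as the marker, readLong() then finds
    // only seven bytes left and throws EOFException.
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    new DataOutputStream(bos).writeLong(-1L);
    try {
      decodeNullable(bos.toByteArray());
    } catch (UncheckedIOException e) {
      System.out.println("EOF while decoding, as in the reported failure");
    }
  }
}
```

The broader question raised by the resolution stays open: whether a runner that serializes between transforms should re-encode with the downstream coder, since a runner that never serializes (the "old" spark runner) never hits the mismatch.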
[jira] [Closed] (BEAM-8338) Support ES 7.x for ElasticsearchIO
[ https://issues.apache.org/jira/browse/BEAM-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-8338. -- Fix Version/s: Not applicable Resolution: Duplicate > Support ES 7.x for ElasticsearchIO > -- > > Key: BEAM-8338 > URL: https://issues.apache.org/jira/browse/BEAM-8338 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Michal Brunát >Priority: Major > Fix For: Not applicable > > Time Spent: 4h 10m > Remaining Estimate: 0h > > Elasticsearch has released 7.4 but ElasticsearchIO only supports 2x,5.x,6.x. > We should support ES 7.x for ElasticsearchIO. > [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html] > > [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Reopened] (BEAM-5192) Support Elasticsearch 7.x
[ https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reopened BEAM-5192: Assignee: Etienne Chauchot (was: Chet Aldrich) reopening to resume the work of [~zhongchen] (who has no more time) on that > Support Elasticsearch 7.x > - > > Key: BEAM-5192 > URL: https://issues.apache.org/jira/browse/BEAM-5192 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: Not applicable > > > ES v7 is not out yet. But Elastic team scheduled a breaking change for ES > 7.0: the removal of the type feature. See > [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch] > This will require a good amount of changes in the IO. > This ticket is there to track the future work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8894) Fix multiple coder bug in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8894 started by Etienne Chauchot. -- > Fix multiple coder bug in new spark runner > -- > > Key: BEAM-8894 > URL: https://issues.apache.org/jira/browse/BEAM-8894 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Time Spent: 10m > Remaining Estimate: 0h > > The test below does not pass. It fails with an EOF while calling > NullableCoder(BigEndianCoder.decode) > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > In the "old" spark runner, this test passes because it never calls the > NullableCoder, as there is no serialization done. > There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Work started] (BEAM-8025) Cassandra IO classMethod test is flaky
[ https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on BEAM-8025 started by Etienne Chauchot. -- > Cassandra IO classMethod test is flaky > -- > > Key: BEAM-8025 > URL: https://issues.apache.org/jira/browse/BEAM-8025 > Project: Beam > Issue Type: Bug > Components: io-java-cassandra, test-failures >Affects Versions: 2.16.0 >Reporter: Kyle Weaver >Assignee: Etienne Chauchot >Priority: Blocker > Time Spent: 3h 40m > Remaining Estimate: 0h > > The most recent runs of this test are failing: > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/] > [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner
[ https://issues.apache.org/jira/browse/BEAM-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-8830. Fix Version/s: 2.19.0 Resolution: Fixed > fix Flatten tests in Spark Structured Streaming runner > -- > > Key: BEAM-8830 > URL: https://issues.apache.org/jira/browse/BEAM-8830 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Fix For: 2.19.0 > > Time Spent: 5.5h > Remaining Estimate: 0h > > 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput' > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty' > > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-3310) Push metrics to a backend in an runner agnostic way
[ https://issues.apache.org/jira/browse/BEAM-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993587#comment-16993587 ] Etienne Chauchot commented on BEAM-3310: unassigned myself because remaining work is to support Metrics pusher in runners other than Spark or Flink > Push metrics to a backend in an runner agnostic way > --- > > Key: BEAM-3310 > URL: https://issues.apache.org/jira/browse/BEAM-3310 > Project: Beam > Issue Type: New Feature > Components: runner-extensions-metrics, sdk-java-core >Reporter: Etienne Chauchot >Priority: Major > Time Spent: 18h 50m > Remaining Estimate: 0h > > The idea is to avoid relying on the runners to provide access to the metrics > (either at the end of the pipeline or while it runs) because they don't have > all the same capabilities towards metrics (e.g. spark runner configures sinks > like csv, graphite or in memory sinks using the spark engine conf). The > target is to push the metrics in the common runner code so that no matter the > chosen runner, a user can get his metrics out of beam. > Here is the link to the discussion thread on the dev ML: > https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E > And the design doc: > https://s.apache.org/runner_independent_metrics_extraction -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-3310) Push metrics to a backend in an runner agnostic way
[ https://issues.apache.org/jira/browse/BEAM-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-3310: -- Assignee: (was: Etienne Chauchot) > Push metrics to a backend in an runner agnostic way > --- > > Key: BEAM-3310 > URL: https://issues.apache.org/jira/browse/BEAM-3310 > Project: Beam > Issue Type: New Feature > Components: runner-extensions-metrics, sdk-java-core >Reporter: Etienne Chauchot >Priority: Major > Time Spent: 18h 50m > Remaining Estimate: 0h > > The idea is to avoid relying on the runners to provide access to the metrics > (either at the end of the pipeline or while it runs) because they don't have > all the same capabilities towards metrics (e.g. spark runner configures sinks > like csv, graphite or in memory sinks using the spark engine conf). The > target is to push the metrics in the common runner code so that no matter the > chosen runner, a user can get his metrics out of beam. > Here is the link to the discussion thread on the dev ML: > https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E > And the design doc: > https://s.apache.org/runner_independent_metrics_extraction -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8894) Fix multiple coder bug in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581 ] Etienne Chauchot edited comment on BEAM-8894 at 12/11/19 2:17 PM: -- discussion on heterogeneous coders in the output PCollection of flatten:[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E] was (Author: echauchot): [discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]] on heterogeneous coders in the output PCollection of flatten: [discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]] > Fix multiple coder bug in new spark runner > -- > > Key: BEAM-8894 > URL: https://issues.apache.org/jira/browse/BEAM-8894 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Time Spent: 10m > Remaining Estimate: 0h > > The test below does not pass. I fails with an EOF while calling > NullableCoder(BigEndianCoder.decode) > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > In the "old" spark runner, this test passes because it never call the > NullableCoder because there is no serialization done. > There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-4088) ExecutorServiceParallelExecutorTest#ensureMetricsThreadDoesntLeak in PR #4965 does not pass in gradle
[ https://issues.apache.org/jira/browse/BEAM-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-4088: -- Assignee: (was: Etienne Chauchot) > ExecutorServiceParallelExecutorTest#ensureMetricsThreadDoesntLeak in PR #4965 > does not pass in gradle > - > > Key: BEAM-4088 > URL: https://issues.apache.org/jira/browse/BEAM-4088 > Project: Beam > Issue Type: Sub-task > Components: testing >Reporter: Etienne Chauchot >Priority: Major > Fix For: Not applicable > > Time Spent: 3.5h > Remaining Estimate: 0h > > This is a new test being added to ensure threads don't leak. The failure > seems to indicate that threads do leak. > This test fails using gradle but previously passed using maven > PR: https://github.com/apache/beam/pull/4965 > Presubmit Failure: > * https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/4059/ > * > https://scans.gradle.com/s/grha56432j3t2/tests/jqhvlvf72f7pg-ipde5etqqejoa?openStackTraces=WzBd -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8894) Fix multiple coder bug in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581 ] Etienne Chauchot edited comment on BEAM-8894 at 12/11/19 2:17 PM: -- discussion on heterogeneous coders in the output PCollection of flatten: [https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E] was (Author: echauchot): discussion on heterogeneous coders in the output PCollection of flatten:[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E] > Fix multiple coder bug in new spark runner > -- > > Key: BEAM-8894 > URL: https://issues.apache.org/jira/browse/BEAM-8894 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Time Spent: 10m > Remaining Estimate: 0h > > The test below does not pass. I fails with an EOF while calling > NullableCoder(BigEndianCoder.decode) > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > In the "old" spark runner, this test passes because it never call the > NullableCoder because there is no serialization done. > There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-8894) Fix multiple coder bug in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581 ] Etienne Chauchot edited comment on BEAM-8894 at 12/11/19 2:16 PM: -- [discussion|https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E] on heterogeneous coders in the output PCollection of flatten was (Author: echauchot): [discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]] on heterogeneous coders in the output PCollection of flatten: > Fix multiple coder bug in new spark runner > -- > > Key: BEAM-8894 > URL: https://issues.apache.org/jira/browse/BEAM-8894 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Time Spent: 10m > Remaining Estimate: 0h > > The test below does not pass. I fails with an EOF while calling > NullableCoder(BigEndianCoder.decode) > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > In the "old" spark runner, this test passes because it never call the > NullableCoder because there is no serialization done. > There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8894) Fix multiple coder bug in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581 ] Etienne Chauchot commented on BEAM-8894: [discussion|https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E] on heterogeneous coders in the output PCollection of flatten: > Fix multiple coder bug in new spark runner > -- > > Key: BEAM-8894 > URL: https://issues.apache.org/jira/browse/BEAM-8894 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Time Spent: 10m > Remaining Estimate: 0h > > The test below does not pass. It fails with an EOF while calling > NullableCoder(BigEndianCoder.decode) > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > In the "old" spark runner, this test passes because it never calls the > NullableCoder, as no serialization is done. > There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8909) Implement Timer API translation in spark runner
Etienne Chauchot created BEAM-8909: -- Summary: Implement Timer API translation in spark runner Key: BEAM-8909 URL: https://issues.apache.org/jira/browse/BEAM-8909 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8908) Implement State API translation in spark runner
Etienne Chauchot created BEAM-8908: -- Summary: Implement State API translation in spark runner Key: BEAM-8908 URL: https://issues.apache.org/jira/browse/BEAM-8908 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8907) Implement SplittableDoFn translation in spark runner
Etienne Chauchot created BEAM-8907: -- Summary: Implement SplittableDoFn translation in spark runner Key: BEAM-8907 URL: https://issues.apache.org/jira/browse/BEAM-8907 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework
[ https://issues.apache.org/jira/browse/BEAM-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-8470. Fix Version/s: 2.18.0 Resolution: Fixed > Create a new Spark runner based on Spark Structured streaming framework > --- > > Key: BEAM-8470 > URL: https://issues.apache.org/jira/browse/BEAM-8470 > Project: Beam > Issue Type: Improvement > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Labels: structured-streaming > Fix For: 2.18.0 > > Time Spent: 15h 50m > Remaining Estimate: 0h > > h1. Why is it worth creating a new runner based on structured streaming: > Because this new framework brings: > * Unified batch and streaming semantics: > * no more RDD/DStream distinction, as in Beam (only PCollection) > * Better state management: > * incremental state updates instead of saving everything each time > * No more synchronous saving delaying computation: per batch and partition > delta file saved asynchronously + in-memory hashmap synchronous put/get > * Schemas in datasets: > * The dataset knows the structure of the data (fields) and can optimize > later on > * Schemas in PCollection in Beam > * New Source API > * Very close to Beam bounded and unbounded sources > h1. Why make a new runner from scratch? > * The Structured streaming framework is very different from the RDD/DStream > framework > h1. We hope to gain > * A more up-to-date runner in terms of libraries: leverage new features > * Leverage lessons learned from the previous runners > * Better performance thanks to the DAG optimizer (Catalyst) and by > simplifying the code. > * Simpler code and easier maintenance > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-8894) Fix multiple coder bug in new spark runner
[ https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988666#comment-16988666 ] Etienne Chauchot commented on BEAM-8894: I have split this test out because it might be unrelated to flatten > Fix multiple coder bug in new spark runner > -- > > Key: BEAM-8894 > URL: https://issues.apache.org/jira/browse/BEAM-8894 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > > The test below does not pass. It fails with an EOF while calling > NullableCoder(BigEndianCoder.decode) > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > In the "old" spark runner, this test passes because it never calls the > NullableCoder, as no serialization is done. > There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (BEAM-6209) Remove Http Metrics Sink specific methods from PipelineOptions
[ https://issues.apache.org/jira/browse/BEAM-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot resolved BEAM-6209. Fix Version/s: 2.10.0 Resolution: Fixed > Remove Http Metrics Sink specific methods from PipelineOptions > -- > > Key: BEAM-6209 > URL: https://issues.apache.org/jira/browse/BEAM-6209 > Project: Beam > Issue Type: Improvement > Components: runner-extensions-metrics >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > Fix For: 2.10.0 > > > Methods specific to Metrics Http Sink should be moved to a > PipelineOptionsMetricsHttpSink interface to avoid having technology specific > methods in base classes/interfaces. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-2499) Support Custom Windows in Spark runner
[ https://issues.apache.org/jira/browse/BEAM-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-2499: -- Assignee: (was: Etienne Chauchot) > Support Custom Windows in Spark runner > -- > > Key: BEAM-2499 > URL: https://issues.apache.org/jira/browse/BEAM-2499 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Priority: Major > > If we extend {{IntervalWindow}} and we try to merge these custom windows like > in this PR: > https://github.com/apache/beam/pull/3286 > Then the Spark runner fails with > {{org.apache.beam.sdk.Pipeline$PipelineExecutionException: > java.lang.ClassCastException: > org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to > org.apache.beam.sdk.transforms.windowing.MergingCustomWindowsTest$CustomWindow}} > It seems to be because of the cast to {{IntervalWindow}} there: > https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkGlobalCombineFn.java#L111 -- This message was sent by Atlassian Jira (v8.3.4#803005)
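The BEAM-2499 failure mode above can be reproduced with a self-contained toy, independent of Beam (the class names below are hypothetical stand-ins, not the actual Beam API): merge logic that always rebuilds the base window type discards the subclass, so a downstream cast back to the custom window type throws exactly the kind of ClassCastException quoted in the ticket.

```java
// Toy sketch of the custom-window cast failure (hypothetical classes).
public class CustomWindowCastDemo {

  // Stand-in for IntervalWindow.
  static class IntervalWindowLike {
    final long start;
    final long end;

    IntervalWindowLike(long start, long end) {
      this.start = start;
      this.end = end;
    }
  }

  // A user-defined window extending the interval window, as in the linked PR.
  static class CustomWindow extends IntervalWindowLike {
    final String tag;

    CustomWindow(long start, long end, String tag) {
      super(start, end);
      this.tag = tag;
    }
  }

  // Merge logic analogous to the cast in SparkGlobalCombineFn: it always
  // constructs the base type, discarding any subclass information.
  static IntervalWindowLike merge(IntervalWindowLike a, IntervalWindowLike b) {
    return new IntervalWindowLike(
        Math.min(a.start, b.start), Math.max(a.end, b.end));
  }

  // Returns true when treating the merged window as a CustomWindow fails.
  static boolean mergeLosesSubclass() {
    IntervalWindowLike merged =
        merge(new CustomWindow(0, 5, "a"), new CustomWindow(3, 9, "a"));
    try {
      CustomWindow c = (CustomWindow) merged; // ClassCastException here
      return c == null;
    } catch (ClassCastException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println(mergeLosesSubclass()); // prints true
  }
}
```

A fix would need the runner to merge windows through the user's WindowFn rather than reconstructing IntervalWindow instances itself; the toy only shows why the unchecked downcast is the visible symptom.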
[jira] [Assigned] (BEAM-2176) Support state API in Spark streaming mode.
[ https://issues.apache.org/jira/browse/BEAM-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-2176: -- Assignee: (was: Etienne Chauchot) > Support state API in Spark streaming mode. > -- > > Key: BEAM-2176 > URL: https://issues.apache.org/jira/browse/BEAM-2176 > Project: Beam > Issue Type: Sub-task > Components: runner-spark >Reporter: Aviem Zur >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-4226) Migrate hadoop dependency to 2.7.4 or higher to fix a CVE
[ https://issues.apache.org/jira/browse/BEAM-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-4226: -- Assignee: (was: Etienne Chauchot) > Migrate hadoop dependency to 2.7.4 or higher to fix a CVE > > > Key: BEAM-4226 > URL: https://issues.apache.org/jira/browse/BEAM-4226 > Project: Beam > Issue Type: Task > Components: sdk-java-core >Reporter: Etienne Chauchot >Priority: Major > > Apache Hadoop is subject to a vulnerability: > CVE-2016-6811: Apache Hadoop Privilege escalation vulnerability > We should upgrade the dependency to 2.7.4, which is the closest to what we > actually use (2.7.3) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (BEAM-5171) org.apache.beam.sdk.io.CountingSourceTest.test[Un]boundedSourceSplits tests are flaky in Spark runner
[ https://issues.apache.org/jira/browse/BEAM-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot reassigned BEAM-5171: -- Assignee: (was: Etienne Chauchot) > org.apache.beam.sdk.io.CountingSourceTest.test[Un]boundedSourceSplits tests > are flaky in Spark runner > - > > Key: BEAM-5171 > URL: https://issues.apache.org/jira/browse/BEAM-5171 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Valentyn Tymofieiev >Priority: Major > > Two tests: > org.apache.beam.sdk.io.CountingSourceTest.testUnboundedSourceSplits > org.apache.beam.sdk.io.CountingSourceTest.testBoundedSourceSplits > failed in a PostCommit [Spark Validates Runner test > suite|https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/1277/testReport/] > with an error that seems to be common for Spark. Could this be due to > misconfiguration of the Spark cluster? > Task serialization failed: java.io.IOException: Failed to create local dir in > /tmp/blockmgr-de91f449-e5d1-4be4-acaa-3ee06fdfa95b/1d. > java.io.IOException: Failed to create local dir in > /tmp/blockmgr-de91f449-e5d1-4be4-acaa-3ee06fdfa95b/1d. 
> at > org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70) > at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:116) > at > org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1511) > at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1045) > at > org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083) > at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:841) > at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1404) > at > org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:123) > at > org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88) > at > org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34) > at > org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62) > at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1482) > at > org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1039) > at > org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:947) > at > org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:891) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1780) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772) > at > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761) > at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8894) Fix multiple coder bug in new spark runner
Etienne Chauchot created BEAM-8894: -- Summary: Fix multiple coder bug in new spark runner Key: BEAM-8894 URL: https://issues.apache.org/jira/browse/BEAM-8894 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot The test below does not pass. It fails with an EOF while calling NullableCoder(BigEndianCoder.decode) 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' In the "old" spark runner, this test passes because it never calls the NullableCoder, as no serialization is done. There may be a problem in the NullableCoder itself -- This message was sent by Atlassian Jira (v8.3.4#803005)
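The EOF described in BEAM-8894, a value encoded with one coder being decoded with a nullable wrapper coder, can be illustrated with a self-contained toy, independent of Beam (the coder methods below are hypothetical stand-ins, not the real NullableCoder/BigEndianLongCoder API): the nullable side expects a one-byte null marker that the plain side never wrote, so the final readLong runs off the end of the buffer instead of failing with a type error.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy stand-ins for Beam coders (hypothetical, for illustration only).
public class CoderMismatchDemo {

  // Plain coder: 8 big-endian bytes, no null marker.
  static byte[] encodePlain(long value) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      new DataOutputStream(bos).writeLong(value);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Nullable coder: expects a 1-byte null marker before the value.
  static Long decodeNullable(byte[] bytes) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
    int marker = in.readUnsignedByte(); // eats the first value byte as a marker
    if (marker == 0) {
      return null;
    }
    return in.readLong(); // only 7 bytes remain -> EOFException
  }

  // Returns true when decoding with the mismatched coder dies with an EOF.
  static boolean decodeFailsWithEof(byte[] bytes) {
    try {
      decodeNullable(bytes);
      return false;
    } catch (EOFException e) {
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // -1L encodes to eight 0xFF bytes, so the stolen marker byte is non-zero.
    System.out.println(decodeFailsWithEof(encodePlain(-1L))); // prints true
  }
}
```

In the real runner the cure is to make the coder used at deserialization match the one used at serialization for each flattened input; the toy only shows why the mismatch surfaces as an EOF rather than a clearer error.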
[jira] [Updated] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner
[ https://issues.apache.org/jira/browse/BEAM-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot updated BEAM-8830: --- Description: 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo' was: 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo' > fix Flatten tests in Spark Structured Streaming runner > -- > > Key: BEAM-8830 > URL: https://issues.apache.org/jira/browse/BEAM-8830 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Assignee: Etienne Chauchot >Priority: Major > > 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput' > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty' > > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-8839) Failed ValidatesRunner tests for new Spark Runner
[ https://issues.apache.org/jira/browse/BEAM-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-8839. -- Fix Version/s: Not applicable Resolution: Duplicate Split these failures into 3 tickets for 3 causes: DoFn lifecycle, combine with context, and flatten > Failed ValidatesRunner tests for new Spark Runner > - > > Key: BEAM-8839 > URL: https://issues.apache.org/jira/browse/BEAM-8839 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Alexey Romanenko >Priority: Major > Labels: structured-streaming > Fix For: Not applicable > > > Some of the transform translations for the new Spark runner (Flatten and Combine > in particular) fail on several corner cases when ValidatesRunner tests are > running. These tests were excluded from the run for the new Spark Runner to keep > the Jenkins job green and detect regressions easily (if any), but they have to be > addressed and the corresponding translations have to be fixed. > List of these tests: > {code} > > 'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext' > > 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext' > > 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty' > > 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext' > > 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext' > > 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext' > 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput' > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty' > > 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo' > > 'org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded' > {code} -- 
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8860) Implement combine with context in new spark runner
Etienne Chauchot created BEAM-8860: -- Summary: Implement combine with context in new spark runner Key: BEAM-8860 URL: https://issues.apache.org/jira/browse/BEAM-8860 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot The ValidatesRunner tests below fail in the Spark Structured Streaming runner because combine with context (for side inputs) is not implemented: 'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext' 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext' 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty' 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext' 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext' 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8859) Fix org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded in new spark runner
Etienne Chauchot created BEAM-8859: -- Summary: Fix org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded in new spark runner Key: BEAM-8859 URL: https://issues.apache.org/jira/browse/BEAM-8859 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Etienne Chauchot The ValidatesRunner test org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded fails in the Spark Structured Streaming runner -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner
Etienne Chauchot created BEAM-8830: -- Summary: fix Flatten tests in Spark Structured Streaming runner Key: BEAM-8830 URL: https://issues.apache.org/jira/browse/BEAM-8830 Project: Beam Issue Type: Bug Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty' 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo' -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-5192) Support Elasticsearch 7.x
[ https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-5192. -- Fix Version/s: Not applicable Resolution: Duplicate > Support Elasticsearch 7.x > - > > Key: BEAM-5192 > URL: https://issues.apache.org/jira/browse/BEAM-5192 > Project: Beam > Issue Type: Improvement > Components: io-java-elasticsearch >Reporter: Etienne Chauchot >Assignee: Chet Aldrich >Priority: Major > Fix For: Not applicable > > > ES v7 is not out yet, but the Elastic team has scheduled a breaking change for ES > 7.0: the removal of the type feature. See > [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch] > This will require a good amount of changes in the IO. > This ticket is here to track the future work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-198) Spark runner batch translator to work with Datasets instead of RDDs
[ https://issues.apache.org/jira/browse/BEAM-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-198. - Fix Version/s: Not applicable Resolution: Fixed > Spark runner batch translator to work with Datasets instead of RDDs > --- > > Key: BEAM-198 > URL: https://issues.apache.org/jira/browse/BEAM-198 > Project: Beam > Issue Type: New Feature > Components: runner-spark >Reporter: Amit Sela >Assignee: Jean-Baptiste Onofré >Priority: Major > Fix For: Not applicable > > > Currently, the Spark runner translates batch pipelines into RDD code, meaning > it doesn't benefit from the optimizations DataFrames (which aren't type-safe) > enjoy. > With Datasets, batch pipelines will benefit from these optimizations; add to that > that Datasets are type-safe and encoder-based, and they seem like a much better > fit for the Beam model. > Looking ahead, Datasets are a good choice since they are the basis for the future > of Spark streaming as well (Structured Streaming), so this will hopefully lay > a solid foundation for a native integration between Spark 2.0 and Beam. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework
Etienne Chauchot created BEAM-8470: -- Summary: Create a new Spark runner based on Spark Structured streaming framework Key: BEAM-8470 URL: https://issues.apache.org/jira/browse/BEAM-8470 Project: Beam Issue Type: Improvement Components: runner-spark Reporter: Etienne Chauchot Assignee: Etienne Chauchot h1. Why is it worth creating a new runner based on structured streaming: Because this new framework brings: * Unified batch and streaming semantics: * no more RDD/DStream distinction, as in Beam (only PCollection) * Better state management: * incremental state updates instead of saving everything each time * No more synchronous saving delaying computation: per batch and partition delta file saved asynchronously + in-memory hashmap synchronous put/get * Schemas in datasets: * The dataset knows the structure of the data (fields) and can optimize later on * Schemas in PCollection in Beam * New Source API * Very close to Beam bounded and unbounded sources h1. Why make a new runner from scratch? * The Structured streaming framework is very different from the RDD/DStream framework h1. We hope to gain * A more up-to-date runner in terms of libraries: leverage new features * Leverage lessons learned from the previous runners * Better performance thanks to the DAG optimizer (Catalyst) and by simplifying the code. * Simpler code and easier maintenance -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (BEAM-7184) Performance regression on spark-runner
[ https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Etienne Chauchot closed BEAM-7184. -- Fix Version/s: Not applicable Resolution: Cannot Reproduce > Performance regression on spark-runner > -- > > Key: BEAM-7184 > URL: https://issues.apache.org/jira/browse/BEAM-7184 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Priority: Major > Fix For: Not applicable > > > There is a performance degradation of +200% in the Spark runner starting on 04/10 > for all the Nexmark queries. See > https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7184) Performance regression on spark-runner
[ https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947644#comment-16947644 ] Etienne Chauchot commented on BEAM-7184: As it has not shown up since April, I guess we can close it for now; if a similar perf drop arises again, we can reopen. > Performance regression on spark-runner > -- > > Key: BEAM-7184 > URL: https://issues.apache.org/jira/browse/BEAM-7184 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Priority: Major > > There is a performance degradation of +200% in the Spark runner starting on 04/10 > for all the Nexmark queries. See > https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (BEAM-7184) Performance regression on spark-runner
[ https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946820#comment-16946820 ] Etienne Chauchot commented on BEAM-7184: Do we know what was causing the perf drop? > Performance regression on spark-runner > -- > > Key: BEAM-7184 > URL: https://issues.apache.org/jira/browse/BEAM-7184 > Project: Beam > Issue Type: Bug > Components: runner-spark >Reporter: Etienne Chauchot >Priority: Major > > There is a performance degradation of +200% in the Spark runner starting on 04/10 > for all the Nexmark queries. See > https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712 -- This message was sent by Atlassian Jira (v8.3.4#803005)