[jira] [Resolved] (BEAM-4825) Nexmark query3 is flaky on Direct runner. State and/or timer issue ?

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-4825.

Fix Version/s: Not applicable
   Resolution: Fixed

No flakiness observed (Nexmark PerfKit dashboards for the Direct runner) in the 
last quarter; closing the ticket.

> Nexmark query3 is flaky on Direct runner. State and/or timer issue ?
> 
>
> Key: BEAM-4825
> URL: https://issues.apache.org/jira/browse/BEAM-4825
> Project: Beam
>  Issue Type: Bug
>  Components: runner-direct
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>
> Query3 exercises state and timers. It asks this question to Nexmark auction 
> system:
> Who is selling in particular US states?
> And the sketch of its code is:
>  * Apply a global window to events, with a trigger that fires repeatedly after at least 
> nbEvents elements in the pane => results are materialized each time nbEvents elements 
> are received.
>  * input1: collection of auctions events filtered by category and keyed by 
> seller id
>  * input2: collection of persons events filtered by US state codes and keyed 
> by person id
>  * CoGroupByKey to group auctions and persons by personId/sellerId + tags to 
> distinguish persons and auctions
>  * ParDo to do the incremental join: auctions and person events can arrive 
> out of order
>  * person elements are stored in persistent state in order to match future 
> auctions by that person; a timer is set to clear the person state after a TTL
>  * auction elements are stored in persistent state until the corresponding person 
> record has been seen; then they can be output and the state cleared
>  * output NameCityStateId(person.name, person.city, person.state, auction.id) 
> objects
>  
>  
> *The output size should be constant and it is not.*
> *The output size of this query is different between batch and streaming modes*
>  
> See query 3 dashboard in this graph: 
> https://apache-beam-testing.appspot.com/explore?dashboard=5099379773931520
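For readers less familiar with the state/timer pattern described above, here is a minimal,
illustrative sketch of such an incremental join DoFn. This is not the actual Nexmark Query3
implementation; the Person/Auction types, the PERSON_TAG/AUCTION_TAG tuple tags and the
10-minute TTL are assumptions for the example.

{code:java}
// Illustrative sketch only, NOT the real Query3 code.
import org.apache.beam.sdk.state.BagState;
import org.apache.beam.sdk.state.StateSpec;
import org.apache.beam.sdk.state.StateSpecs;
import org.apache.beam.sdk.state.TimeDomain;
import org.apache.beam.sdk.state.Timer;
import org.apache.beam.sdk.state.TimerSpec;
import org.apache.beam.sdk.state.TimerSpecs;
import org.apache.beam.sdk.state.ValueState;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.values.KV;
import org.joda.time.Duration;

class IncrementalJoinFn extends DoFn<KV<Long, CoGbkResult>, String> {
  @StateId("person")
  private final StateSpec<ValueState<Person>> personSpec = StateSpecs.value();

  @StateId("pendingAuctions")
  private final StateSpec<BagState<Auction>> pendingAuctionsSpec = StateSpecs.bag();

  @TimerId("personTtl")
  private final TimerSpec personTtlSpec = TimerSpecs.timer(TimeDomain.EVENT_TIME);

  @ProcessElement
  public void process(
      ProcessContext c,
      @StateId("person") ValueState<Person> personState,
      @StateId("pendingAuctions") BagState<Auction> pendingAuctions,
      @TimerId("personTtl") Timer personTtl) {
    // Remember the person (if present in this group) and arm a TTL timer to clear it later.
    for (Person p : c.element().getValue().getAll(PERSON_TAG)) {
      personState.write(p);
      personTtl.offset(Duration.standardMinutes(10)).setRelative(); // assumed TTL
    }
    Person seller = personState.read();
    for (Auction a : c.element().getValue().getAll(AUCTION_TAG)) {
      if (seller != null) {
        c.output(seller.name + "," + seller.city + "," + seller.state + "," + a.id);
      } else {
        pendingAuctions.add(a); // hold the auction until the matching person arrives
      }
    }
    if (seller != null) {
      for (Auction a : pendingAuctions.read()) {
        c.output(seller.name + "," + seller.city + "," + seller.state + "," + a.id);
      }
      pendingAuctions.clear();
    }
  }

  @OnTimer("personTtl")
  public void onPersonTtl(@StateId("person") ValueState<Person> personState) {
    personState.clear(); // drop the buffered person once the TTL expires
  }
}
{code}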



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5178) Gradle: Try to reduce the initialization time when running a UTest multiple times.

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-5178.

Fix Version/s: Not applicable
   Resolution: Fixed

With the Gradle changes in the build and the IDE plugin improvements, this delay is less 
harmful; closing the ticket.

> Gradle: Try to reduce the initialization time when running a UTest multiple 
> times.
> --
>
> Key: BEAM-5178
> URL: https://issues.apache.org/jira/browse/BEAM-5178
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: gradle, stale-P2
> Fix For: Not applicable
>
>
> When we run a unit test multiple times in IntelliJ, we pay 6s of test initialization 
> each time (on my machine). Is there a configuration item, such as caching, that would 
> allow us to reduce this time?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-5177) Integrate javadocs in the IDE with gradle

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-5177.
--
Fix Version/s: Not applicable
   Resolution: Won't Fix

Not related to Beam; this is IDE configuration. Closing.

> Integrate javadocs in the IDE with gradle
> -
>
> Key: BEAM-5177
> URL: https://issues.apache.org/jira/browse/BEAM-5177
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: gradle, stale-P2
> Fix For: Not applicable
>
>
> It would be good to have automatic download of javadoc artifacts for external 
> dependencies.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-6344) :beam-sdks-java-io-elasticsearch-tests-2:test failed owing to get 503 from ES

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-6344.
--
Fix Version/s: Not applicable
   Resolution: Fixed

Countermeasures against overload were recently added to the ES tests and there have been 
no flaky tests since. Closing this issue.

> :beam-sdks-java-io-elasticsearch-tests-2:test failed owing to get 503 from ES
> -
>
> Key: BEAM-6344
> URL: https://issues.apache.org/jira/browse/BEAM-6344
> Project: Beam
>  Issue Type: Test
>  Components: test-failures
>Reporter: Boyuan Zhang
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>
> [https://builds.apache.org/job/beam_Release_NightlySnapshot/290/]
> log: 
> https://scans.gradle.com/s/umeki7yl4ipq6/tests/fo2bghfaj5ysq-7zrcy3fjk3uwg



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7290) Upgrade hadoop-client

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-7290.

Fix Version/s: Not applicable
   Resolution: Fixed

This ticket was opened for a CVE on the hadoop-client dependency, but no CVE is raised 
anymore for the current Beam dependency. Closing.

> Upgrade hadoop-client
> -
>
> Key: BEAM-7290
> URL: https://issues.apache.org/jira/browse/BEAM-7290
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-7291) Upgrade hadoop-common

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-7291.
--
Fix Version/s: Not applicable
   Resolution: Fixed

This ticket was opened for a CVE on hadoop-common, but no CVE is raised anymore for the 
current Beam dependency. Closing.

> Upgrade hadoop-common
> -
>
> Key: BEAM-7291
> URL: https://issues.apache.org/jira/browse/BEAM-7291
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7292) Upgrade hadoop-mapreduce-client-core

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-7292.

Fix Version/s: Not applicable
   Resolution: Fixed

This ticket was opened for a CVE on hadoop-mapreduce-client-core, but no CVE is raised 
anymore for the current Beam dependency. Closing.

> Upgrade hadoop-mapreduce-client-core
> 
>
> Key: BEAM-7292
> URL: https://issues.apache.org/jira/browse/BEAM-7292
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-7295) Upgrade javax.mail

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-7295.
--
Fix Version/s: Not applicable
   Resolution: Fixed

No longer in use in Beam, so closing the issue.

> Upgrade javax.mail
> --
>
> Key: BEAM-7295
> URL: https://issues.apache.org/jira/browse/BEAM-7295
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-7298) Upgrade zookeeper

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-7298.

Fix Version/s: Not applicable
   Resolution: Fixed

Opened because of a CVE in ZooKeeper. [~iemejia] upgraded the ZooKeeper dependency in Beam 
on 08/02/2019, so closing this issue.

> Upgrade zookeeper
> -
>
> Key: BEAM-7298
> URL: https://issues.apache.org/jira/browse/BEAM-7298
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7400) Authentication for ElasticsearchIO seems to be failing

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-7400:
---
Description: 
It seems to be failing at least on Elasticsearch cloud:
{code:java}
[WARNING] 
org.elasticsearch.client.ResponseException: GET 
https://a3626ec04ef549318d444cde10db468d.europe-west1.gcp.cloud.es.io:9243/: 
HTTP/1.1 401 Unauthorized
{"error":{"root_cause":[{"type":"security_exception","reason":"action 
[cluster:monitor/main] requires 
authentication","header":{"WWW-Authenticate":["Bearer 
realm=\"security\"","ApiKey","Basic realm=\"security\" 
charset=\"UTF-8\""]}}],"type":"security_exception","reason":"action 
[cluster:monitor/main] requires 
authentication","header":{"WWW-Authenticate":["Bearer 
realm=\"security\"","ApiKey","Basic realm=\"security\" 
charset=\"UTF-8\""]}},"status":401}
{code}

  was:
It seem to be failing at list on elasticsearch cloud:

{code}
[WARNING] 
org.elasticsearch.client.ResponseException: GET 
https://a3626ec04ef549318d444cde10db468d.europe-west1.gcp.cloud.es.io:9243/: 
HTTP/1.1 401 Unauthorized
{"error":{"root_cause":[{"type":"security_exception","reason":"action 
[cluster:monitor/main] requires 
authentication","header":{"WWW-Authenticate":["Bearer 
realm=\"security\"","ApiKey","Basic realm=\"security\" 
charset=\"UTF-8\""]}}],"type":"security_exception","reason":"action 
[cluster:monitor/main] requires 
authentication","header":{"WWW-Authenticate":["Bearer 
realm=\"security\"","ApiKey","Basic realm=\"security\" 
charset=\"UTF-8\""]}},"status":401}
{code}




> Authentication for ElasticsearchIO seems to be failing
> --
>
> Key: BEAM-7400
> URL: https://issues.apache.org/jira/browse/BEAM-7400
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2
>
> It seems to be failing at least on Elasticsearch cloud:
> {code:java}
> [WARNING] 
> org.elasticsearch.client.ResponseException: GET 
> https://a3626ec04ef549318d444cde10db468d.europe-west1.gcp.cloud.es.io:9243/: 
> HTTP/1.1 401 Unauthorized
> {"error":{"root_cause":[{"type":"security_exception","reason":"action 
> [cluster:monitor/main] requires 
> authentication","header":{"WWW-Authenticate":["Bearer 
> realm=\"security\"","ApiKey","Basic realm=\"security\" 
> charset=\"UTF-8\""]}}],"type":"security_exception","reason":"action 
> [cluster:monitor/main] requires 
> authentication","header":{"WWW-Authenticate":["Bearer 
> realm=\"security\"","ApiKey","Basic realm=\"security\" 
> charset=\"UTF-8\""]}},"status":401}
> {code}
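For reference, ElasticsearchIO's ConnectionConfiguration accepts basic-auth credentials; a
minimal sketch of wiring them is shown below. The endpoint, index and credentials are
placeholders, and `pipeline` is assumed to be an existing Pipeline.

{code:java}
// Sketch: passing basic-auth credentials to ElasticsearchIO (placeholder values).
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.values.PCollection;

ElasticsearchIO.ConnectionConfiguration connection =
    ElasticsearchIO.ConnectionConfiguration
        .create(new String[] {"https://my-cluster.example.com:9243"}, "my-index", "_doc")
        .withUsername("elastic")    // placeholder user
        .withPassword("changeme");  // placeholder password

PCollection<String> docs =
    pipeline.apply(ElasticsearchIO.read().withConnectionConfiguration(connection));
{code}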



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8859) Fix SplittableDoFnTest.testLifecycleMethodsBounded in new spark runner

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-8859.
--
Fix Version/s: Not applicable
   Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/BEAM-8907

> Fix SplittableDoFnTest.testLifecycleMethodsBounded in new spark runner
> --
>
> Key: BEAM-8859
> URL: https://issues.apache.org/jira/browse/BEAM-8859
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-P2, structured-streaming
> Fix For: Not applicable
>
>
> validates runner test 
> org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded
> fails in spark structured streaming runner



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4797) Allow the user to limit the number of result docs in ElasticsearchIO.read()

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-4797.

Fix Version/s: Not applicable
   Resolution: Won't Fix

A contributor offered to implement the feature but never sent the PR. The other user who 
asked for it solved their problem manually, so closing the issue.

> Allow the user to limit the number of result docs in ElasticsearchIO.read()
> ---
>
> Key: BEAM-4797
> URL: https://issues.apache.org/jira/browse/BEAM-4797
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: P2
>  Labels: stale-assigned
> Fix For: Not applicable
>
>
> In some cases, like sampling, users will be interested in limiting the 
> number of docs an ESIO.read() returns.
> It is as simple as allowing the user to pass the terminate_after parameter to the 
> IO configuration.
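To make the proposal concrete, it could have looked roughly like the following. Note that
withTerminateAfter is purely hypothetical: the feature was never implemented and this method
does not exist in ElasticsearchIO.

{code:java}
// Hypothetical sketch only: ElasticsearchIO does NOT expose withTerminateAfter.
PCollection<String> sample =
    pipeline.apply(
        ElasticsearchIO.read()
            .withConnectionConfiguration(connection) // an existing ConnectionConfiguration
            .withTerminateAfter(1000)); // hypothetical: forwarded as the ES terminate_after parameter
{code}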



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5370) Set a cause exception in ElasticsearchIO#getBackendVersion

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-5370.

Fix Version/s: 2.7.0
   Resolution: Fixed

Forgot to close that one, thanks for the reminder. I went through the history to see in 
which version it was fixed.

> Set a cause exception in ElasticsearchIO#getBackendVersion
> --
>
> Key: BEAM-5370
> URL: https://issues.apache.org/jira/browse/BEAM-5370
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Affects Versions: 2.6.0
>Reporter: Shinsuke Sugaya
>Assignee: Etienne Chauchot
>Priority: P3
>  Labels: stale-assigned
> Fix For: 2.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I could not check the root cause of an IllegalArgumentException in 
> ElasticsearchIO#getBackendVersion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5757) Elasticsearch IO provide delete function

2020-06-02 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123561#comment-17123561
 ] 

Etienne Chauchot commented on BEAM-5757:


Deleting an entire index does not make a lot of sense inside the IO, but deleting 
documents in an index does, and some other IOs, including CassandraIO, support such a 
feature. If someone is willing to submit a PR for that, I could do the review.
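To illustrate the suggestion, a document-level delete could hypothetically be exposed on the
write builder, for example as a predicate deciding whether a given document should be deleted
rather than indexed. The withIsDeleteFn shape below is an assumption for illustration, not an
API that existed at the time of this comment.

{code:java}
// Hypothetical sketch of per-document delete support on ElasticsearchIO.write().
docs.apply(
    ElasticsearchIO.write()
        .withConnectionConfiguration(connection)                     // existing configuration
        .withIsDeleteFn(doc -> doc.contains("\"_deleted\":true")));  // hypothetical predicate
{code}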

> Elasticsearch IO provide delete function
> 
>
> Key: BEAM-5757
> URL: https://issues.apache.org/jira/browse/BEAM-5757
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-elasticsearch
>Affects Versions: 2.7.0
>Reporter: Prasad Marne
>Priority: P2
>  Labels: stale-assigned
>
> Hello Beam maintainers,
> I am trying to delete an index from Elasticsearch using a Beam pipeline. I 
> couldn't find a delete function in ElasticsearchIO. It would be nice to 
> have and would make sense for cleaning up indexes over time.
>  I also checked some other IO classes and they also don't support delete. Not 
> sure if Beam has some policy against supporting delete. 
>  Please guide me on this. I am willing to create a pull request if this feature 
> makes sense to the project contributors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-5757) Elasticsearch IO provide delete function

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-5757:
--

Assignee: (was: Etienne Chauchot)

> Elasticsearch IO provide delete function
> 
>
> Key: BEAM-5757
> URL: https://issues.apache.org/jira/browse/BEAM-5757
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-elasticsearch
>Affects Versions: 2.7.0
>Reporter: Prasad Marne
>Priority: P2
>  Labels: stale-assigned
>
> Hello Beam maintainers,
> I am trying to delete an index from Elasticsearch using a Beam pipeline. I 
> couldn't find a delete function in ElasticsearchIO. It would be nice to 
> have and would make sense for cleaning up indexes over time.
>  I also checked some other IO classes and they also don't support delete. Not 
> sure if Beam has some policy against supporting delete. 
>  Please guide me on this. I am willing to create a pull request if this feature 
> makes sense to the project contributors.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6304) can ElasticsearchIO add a ExceptionHandlerFn

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-6304.

Fix Version/s: Not applicable
   Resolution: Won't Fix

As it received no update, I guess the user followed my advice and implemented 
it as a pre-processing DoFn.
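A minimal sketch of that pre-processing approach: validate documents before
ElasticsearchIO.write() and route invalid ones to a dead-letter output. The isValid check,
the tag names and the `docs`/`connection` variables are assumptions for the example.

{code:java}
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollectionTuple;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TupleTagList;

final TupleTag<String> validTag = new TupleTag<String>() {};
final TupleTag<String> invalidTag = new TupleTag<String>() {};

PCollectionTuple validated =
    docs.apply(
        ParDo.of(
                new DoFn<String, String>() {
                  @ProcessElement
                  public void process(ProcessContext c) {
                    if (isValid(c.element())) {          // assumed validation function
                      c.output(c.element());             // main output: safe to index
                    } else {
                      c.output(invalidTag, c.element()); // dead-letter output
                    }
                  }
                })
            .withOutputTags(validTag, TupleTagList.of(invalidTag)));

// Only valid documents reach Elasticsearch; invalid ones can be logged or written elsewhere.
validated
    .get(validTag)
    .apply(ElasticsearchIO.write().withConnectionConfiguration(connection));
{code}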

> can ElasticsearchIO add a ExceptionHandlerFn 
> -
>
> Key: BEAM-6304
> URL: https://issues.apache.org/jira/browse/BEAM-6304
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-elasticsearch
>Affects Versions: Not applicable
>Reporter: big
>Assignee: Etienne Chauchot
>Priority: P2
>  Labels: stale-assigned
> Fix For: Not applicable
>
>
> I use ElasticsearchIO to write my data to Elasticsearch. However, the data comes 
> from another platform and it is not easy to check its validity. If we get invalid 
> data, we want to ignore it (even with batch inserts, we can ignore all of them). 
> So I wish there were a registered exception-handling function to process it. 
> From reading the source code of the write function in ProcessElement, it just 
> throws the exception and causes my job to stop. 
> I can catch the exception around pipeline.run().waitUntilFinish() on the direct 
> runner and force it to run again with a while statement, ungracefully. However, 
> when the job is deployed to Flink, it fails because Flink reports an exception 
> that it cannot optimize the job.
> A method letting the user decide how to process the exception is required.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8860) Implement combine with context in new spark runner

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-8860:
--

Assignee: (was: Etienne Chauchot)

> Implement combine with context in new spark runner
> --
>
> Key: BEAM-8860
> URL: https://issues.apache.org/jira/browse/BEAM-8860
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-assigned, structured-streaming
>
> Validates runner tests below fail in spark structured streaming runner 
> because combine with context (for side inputs) is not implemented:
>  
> 'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty'
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext'
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9032) Replace broadcast variables based side inputs with temp views

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-9032.

Fix Version/s: Not applicable
   Resolution: Won't Fix

Temp views need to be created/accessed through the Spark SQL context. So the Spark SQL 
context would need to be accessed from the SideInputReader, but it is not serializable. 
So temp views cannot be used for side inputs; keeping the broadcast variables.
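For context, a hedged illustration of the two Spark mechanisms being compared; the
`jsc`, `sparkSession`, `sideInputDataset` and `sideInputRows` names are assumed.

{code:java}
import java.util.List;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// 1) Broadcast variable: the Broadcast handle is serializable, so it can be captured by the
//    SideInputReader and read lazily on the executors.
Broadcast<List<Row>> broadcastSideInput = jsc.broadcast(sideInputRows);

// 2) Temp view: registering and querying it goes through the SparkSession / SQL context,
//    which is not serializable and therefore cannot be shipped inside the SideInputReader.
sideInputDataset.createOrReplaceTempView("side_input");
Dataset<Row> lookedUp = sparkSession.sql("SELECT * FROM side_input");
{code}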

> Replace broadcast variables based side inputs with temp views
> -
>
> Key: BEAM-9032
> URL: https://issues.apache.org/jira/browse/BEAM-9032
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: P2
>  Labels: stale-assigned, structured-streaming
> Fix For: Not applicable
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9032) Replace broadcast variables based side inputs with temp views

2020-06-02 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123510#comment-17123510
 ] 

Etienne Chauchot commented on BEAM-9032:


Forgot to post the update. Thanks for the reminder, Kenn!

> Replace broadcast variables based side inputs with temp views
> -
>
> Key: BEAM-9032
> URL: https://issues.apache.org/jira/browse/BEAM-9032
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: P2
>  Labels: stale-assigned, structured-streaming
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-9307:
--

Assignee: (was: Etienne Chauchot)

> Put windows inside the key to avoid having all values for the same key in 
> memory
> 
>
> Key: BEAM-9307
> URL: https://issues.apache.org/jira/browse/BEAM-9307
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: P2
>  Labels: stale-assigned, structured-streaming
>
> On both group by key and combinePerKey.
> Like it was done for the current runner.
> See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10
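A hedged sketch of the idea follows: illustrative Beam-level code, not the runner's actual
translation; the element types are assumed and coder concerns for the composite key are
ignored.

{code:java}
// Re-key each element by (key, window) so a downstream GroupByKey only has to hold the
// values of one (key, window) pair at a time, instead of all windows for a key.
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

PCollection<KV<KV<String, BoundedWindow>, Integer>> rekeyed =
    input.apply(
        ParDo.of(
            new DoFn<KV<String, Integer>, KV<KV<String, BoundedWindow>, Integer>>() {
              @ProcessElement
              public void process(ProcessContext c, BoundedWindow window) {
                c.output(KV.of(KV.of(c.element().getKey(), window), c.element().getValue()));
              }
            }));
// A GroupByKey on `rekeyed` then groups per (key, window) rather than per key.
{code}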



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9436) Improve performance of GBK

2020-06-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-9436.

Fix Version/s: 2.20.0
   Resolution: Fixed

Performance of GBK was improved by the linked PR even though materialization cannot be 
avoided, so closing this ticket. The improvement was merged for 2.20, so targeting this 
version.

> Improve performance of GBK
> --
>
> Key: BEAM-9436
> URL: https://issues.apache.org/jira/browse/BEAM-9436
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: P2
>  Labels: stale-assigned, structured-streaming
> Fix For: 2.20.0
>
>  Time Spent: 15h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10017) Expose SocketOptions timeouts in CassandraIO

2020-05-19 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111273#comment-17111273
 ] 

Etienne Chauchot commented on BEAM-10017:
-

There is a jar task in the Cassandra Gradle module. You can use your IDE with the Gradle 
plugin.

> Expose SocketOptions timeouts in CassandraIO
> 
>
> Key: BEAM-10017
> URL: https://issues.apache.org/jira/browse/BEAM-10017
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Reporter: Nathan Fisher
>Priority: P3
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently there are no options to tune the configuration of the CassandraIO 
> reader/writer. This can be useful for either slow clusters, large queries, or 
> high latency links.
> The intent would be to expose the following configuration elements as setters 
> on the CassandraIO builder similar to withKeyspace and other methods.
>  
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setConnectTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setConnectTimeoutMillis-int-](int
>  connectTimeoutMillis)}}
> Sets the connection timeout in milliseconds.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setKeepAlive|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setKeepAlive-boolean-](boolean
>  keepAlive)}}
> Sets whether to enable TCP keepalive.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReadTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReadTimeoutMillis-int-](int
>  readTimeoutMillis)}}
> Sets the per-host read timeout in milliseconds.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReceiveBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReceiveBufferSize-int-](int
>  receiveBufferSize)}}
> Sets a hint to the size of the underlying buffers for incoming network I/O.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReuseAddress|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReuseAddress-boolean-](boolean
>  reuseAddress)}}
> Sets whether to enable reuse-address.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSendBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSendBufferSize-int-](int
>  sendBufferSize)}}
> Sets a hint to the size of the underlying buffers for outgoing network I/O.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSoLinger|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSoLinger-int-](int
>  soLinger)}}
> Sets the linger-on-close timeout.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setTcpNoDelay|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setTcpNoDelay-boolean-](boolean
>  tcpNoDelay)}}
> Sets whether to disable Nagle's algorithm.|
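For reference, this is how such options are set on the underlying DataStax 3.x driver today
(a sketch with placeholder host and values); the proposal is for CassandraIO to forward these
values to the driver it builds internally.

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.SocketOptions;

SocketOptions socketOptions =
    new SocketOptions()
        .setConnectTimeoutMillis(10_000)  // connection timeout
        .setReadTimeoutMillis(30_000);    // per-host read timeout

Cluster cluster =
    Cluster.builder()
        .addContactPoint("cassandra.example.com") // placeholder host
        .withSocketOptions(socketOptions)
        .build();
{code}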



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-10017) Expose SocketOptions timeouts in CassandraIO

2020-05-19 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110950#comment-17110950
 ] 

Etienne Chauchot commented on BEAM-10017:
-

Hi Nathan, thanks for moving the design discussion to the ticket and applying 
the no-knobs philosophy of Beam.

Connect timeout and Read timeout make total sense.
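A sketch of the builder shape being discussed: the withConnectTimeout/withReadTimeout setters
below did not exist at the time of this comment and are shown only as an assumption of what
the change could look like; MyEntity, the host and the keyspace are placeholders.

{code:java}
CassandraIO.<MyEntity>read()
    .withHosts(java.util.Arrays.asList("cassandra.example.com"))
    .withKeyspace("beam_ks")
    .withTable("my_table")
    .withEntity(MyEntity.class)
    .withConnectTimeout(10_000)  // hypothetical: mapped to SocketOptions#setConnectTimeoutMillis
    .withReadTimeout(30_000);    // hypothetical: mapped to SocketOptions#setReadTimeoutMillis
{code}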

> Expose SocketOptions timeouts in CassandraIO
> 
>
> Key: BEAM-10017
> URL: https://issues.apache.org/jira/browse/BEAM-10017
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Reporter: Nathan Fisher
>Priority: P3
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently there are no options to tune the configuration of the CassandraIO 
> reader/writer. This can be useful for either slow clusters, large queries, or 
> high latency links.
> The intent would be to expose the following configuration elements as setters 
> on the CassandraIO builder similar to withKeyspace and other methods.
>  
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setConnectTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setConnectTimeoutMillis-int-](int
>  connectTimeoutMillis)}}
> Sets the connection timeout in milliseconds.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setKeepAlive|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setKeepAlive-boolean-](boolean
>  keepAlive)}}
> Sets whether to enable TCP keepalive.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReadTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReadTimeoutMillis-int-](int
>  readTimeoutMillis)}}
> Sets the per-host read timeout in milliseconds.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReceiveBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReceiveBufferSize-int-](int
>  receiveBufferSize)}}
> Sets a hint to the size of the underlying buffers for incoming network I/O.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReuseAddress|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReuseAddress-boolean-](boolean
>  reuseAddress)}}
> Sets whether to enable reuse-address.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSendBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSendBufferSize-int-](int
>  sendBufferSize)}}
> Sets a hint to the size of the underlying buffers for outgoing network I/O.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSoLinger|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSoLinger-int-](int
>  soLinger)}}
> Sets the linger-on-close timeout.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setTcpNoDelay|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setTcpNoDelay-boolean-](boolean
>  tcpNoDelay)}}
> Sets whether to disable Nagle's algorithm.|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-10017) Expose SocketOptions timeouts in CassandraIO

2020-05-19 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-10017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-10017:

Summary: Expose SocketOptions timeouts in CassandraIO  (was: Expose 
SocketOptions timeouts in CassandraIO builder)

> Expose SocketOptions timeouts in CassandraIO
> 
>
> Key: BEAM-10017
> URL: https://issues.apache.org/jira/browse/BEAM-10017
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-cassandra
>Reporter: Nathan Fisher
>Priority: P3
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently there are no options to tune the configuration of the CassandraIO 
> reader/writer. This can be useful for either slow clusters, large queries, or 
> high latency links.
> The intent would be to expose the following configuration elements as setters 
> on the CassandraIO builder similar to withKeyspace and other methods.
>  
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setConnectTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setConnectTimeoutMillis-int-](int
>  connectTimeoutMillis)}}
> Sets the connection timeout in milliseconds.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setKeepAlive|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setKeepAlive-boolean-](boolean
>  keepAlive)}}
> Sets whether to enable TCP keepalive.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReadTimeoutMillis|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReadTimeoutMillis-int-](int
>  readTimeoutMillis)}}
> Sets the per-host read timeout in milliseconds.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReceiveBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReceiveBufferSize-int-](int
>  receiveBufferSize)}}
> Sets a hint to the size of the underlying buffers for incoming network I/O.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setReuseAddress|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setReuseAddress-boolean-](boolean
>  reuseAddress)}}
> Sets whether to enable reuse-address.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSendBufferSize|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSendBufferSize-int-](int
>  sendBufferSize)}}
> Sets a hint to the size of the underlying buffers for outgoing network I/O.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setSoLinger|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setSoLinger-int-](int
>  soLinger)}}
> Sets the linger-on-close timeout.|
> |{{[SocketOptions|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html]}}|{{[setTcpNoDelay|https://docs.datastax.com/en/drivers/java/3.8/com/datastax/driver/core/SocketOptions.html#setTcpNoDelay-boolean-](boolean
>  tcpNoDelay)}}
> Sets whether to disable Nagle's algorithm.|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-3926) Support MetricsPusher in Dataflow Runner

2020-05-15 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108082#comment-17108082
 ] 

Etienne Chauchot edited comment on BEAM-3926 at 5/15/20, 8:47 AM:
--

[~foegler], [~pabloem], [~ajam...@google.com], I have a user who asks for this 
feature in Dataflow. Is there a willingness to implement it for the Dataflow 
runner? 

If so, as the MetricsPusher needs to be instantiated at the engine side (cf. the 
arguments in the description of the ticket), I was wondering if the worker part 
of the Dataflow runner could be the correct spot; as it was donated, it would 
enable the community to implement the feature for Dataflow.


was (Author: echauchot):
[~foegler], [~pabloem], I have a user who asks for this feature in Dataflow. Is 
there a willingness to implement it for the Dataflow runner? 

If so, as the MetricsPusher needs to be instanciated at the engine side (cf 
arguments in the description of the ticket), I was wondering if the worker part 
of the dataflow runner could be the correct spot and as it was donated it would 
enable the community to implement the feature for Dataflow.

> Support MetricsPusher in Dataflow Runner
> 
>
> Key: BEAM-3926
> URL: https://issues.apache.org/jira/browse/BEAM-3926
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> See [relevant email 
> thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E].
>  From [~echauchot]:
>   
> _AFAIK Dataflow being a cloud hosted engine, the related runner is very 
> different from the others. It just submits a job to the cloud hosted engine. 
> So, no access to metrics container etc... from the runner. So I think that 
> the MetricsPusher (component responsible for merging metrics and pushing them 
> to a sink backend) must not be instantiated in DataflowRunner otherwise it 
> would be more a client (driver) piece of code and we will lose all the 
> interest of being close to the execution engine (among other things 
> instrumentation of the execution of the pipelines).  I think that the 
> MetricsPusher needs to be instantiated in the actual Dataflow engine._
>  
>   
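As background for what this would enable, here is a sketch of how a pipeline opts into pushed
metrics via MetricsOptions, assuming the MetricsOptions and MetricsHttpSink classes from the
Java SDK and runners-core; the sink URL and period are example values. The point of this
ticket is that the Dataflow runner would additionally have to instantiate the MetricsPusher on
the service/worker side for these options to take effect.

{code:java}
import org.apache.beam.runners.core.metrics.MetricsHttpSink;
import org.apache.beam.sdk.metrics.MetricsOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

MetricsOptions options = PipelineOptionsFactory.create().as(MetricsOptions.class);
options.setMetricsSink(MetricsHttpSink.class);                      // where to push metrics
options.setMetricsHttpSinkUrl("https://metrics.example.com/push");  // example sink URL
options.setMetricsPushPeriod(60L);                                  // example period, in seconds
{code}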



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3926) Support MetricsPusher in Dataflow Runner

2020-05-15 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17108082#comment-17108082
 ] 

Etienne Chauchot commented on BEAM-3926:


[~foegler], [~pabloem], I have a user who asks for this feature in Dataflow. Is 
there a willingness to implement it for the Dataflow runner? 

If so, as the MetricsPusher needs to be instantiated at the engine side (cf. the 
arguments in the description of the ticket), I was wondering if the worker part 
of the Dataflow runner could be the correct spot; as it was donated, it would 
enable the community to implement the feature for Dataflow.

> Support MetricsPusher in Dataflow Runner
> 
>
> Key: BEAM-3926
> URL: https://issues.apache.org/jira/browse/BEAM-3926
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-dataflow
>Reporter: Scott Wegner
>Assignee: Pablo Estrada
>Priority: Major
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> See [relevant email 
> thread|https://lists.apache.org/thread.html/2e87f0adcdf8d42317765f298e3e6fdba72917a72d4a12e71e67e4b5@%3Cdev.beam.apache.org%3E].
>  From [~echauchot]:
>   
> _AFAIK Dataflow being a cloud hosted engine, the related runner is very 
> different from the others. It just submits a job to the cloud hosted engine. 
> So, no access to metrics container etc... from the runner. So I think that 
> the MetricsPusher (component responsible for merging metrics and pushing them 
> to a sink backend) must not be instantiated in DataflowRunner otherwise it 
> would be more a client (driver) piece of code and we will lose all the 
> interest of being close to the execution engine (among other things 
> instrumentation of the execution of the pipelines).  I think that the 
> MetricsPusher needs to be instantiated in the actual Dataflow engine._
>  
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-05-14 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-8025.

Fix Version/s: 2.21.0
   Resolution: Fixed

I took a look at the statistics since my fix PR was merged: no failure within 14 days. 
So I'm closing this ticket.

> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: flake
> Fix For: 2.21.0
>
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-05-04 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17098905#comment-17098905
 ] 

Etienne Chauchot commented on BEAM-8025:


My last fix was merged. I'm waiting 2 weeks to see if the increased retry count and 
retry delay are enough to fix the flakiness issue.

> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: flake
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-04-30 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096405#comment-17096405
 ] 

Etienne Chauchot commented on BEAM-8025:


Just submitted a PR: [https://github.com/apache/beam/pull/11578] I put you as 
reviewer.

> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: flake
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-04-30 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096391#comment-17096391
 ] 

Etienne Chauchot edited comment on BEAM-8025 at 4/30/20, 10:17 AM:
---

Hi Kenn, no, it is due to Jenkins cluster load. I can set a higher timeout, but it is 
difficult to reproduce. I am quite short on time these weeks, but I can try to take a 
look at it.


was (Author: echauchot):
Hi Kenn, no it is due to jenkins cluster load. I can give a higher timeout. But 
it is difficult to reproduce 

> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: flake
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-04-30 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096391#comment-17096391
 ] 

Etienne Chauchot edited comment on BEAM-8025 at 4/30/20, 10:02 AM:
---

Hi Kenn, no it is due to jenkins cluster load. I can give a higher timeout. But 
it is difficult to reproduce 


was (Author: echauchot):
Hi Kenn, no it is due to jenkins cluster load. I can give a higher timeout.

> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: flake
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-04-30 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096391#comment-17096391
 ] 

Etienne Chauchot commented on BEAM-8025:


Hi Kenn, no it is due to jenkins cluster load. I can give a higher timeout.

> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Critical
>  Labels: flake
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9436) Improve performance of GBK

2020-03-26 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-9436:
---
Summary: Improve performance of GBK  (was: Try to avoid elements list 
materialization in GBK)

> Improve performance of GBK
> --
>
> Key: BEAM-9436
> URL: https://issues.apache.org/jira/browse/BEAM-9436
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 13h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9133) CassandraIOTest.classMethod test is still flaky

2020-03-25 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-9133.

Fix Version/s: Not applicable
   Resolution: Duplicate

[~aromanenko] thanks. I know it is still flaky; that is why I reopened BEAM-8025 some 
days ago. So this ticket is a duplicate. Unfortunately, I have no time to work on it 
right now. Maybe [~adejanovski]?

> CassandraIOTest.classMethod test is still flaky
> ---
>
> Key: BEAM-9133
> URL: https://issues.apache.org/jira/browse/BEAM-9133
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.17.0
>Reporter: Alexey Romanenko
>Assignee: Etienne Chauchot
>Priority: Critical
> Fix For: Not applicable
>
>
> CassandraIOTest is still flaky. For example:
> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1646/
> https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1625/
> {code}
> Error Message
> java.lang.RuntimeException: Unable to create embedded Cassandra cluster
> Stacktrace
> java.lang.RuntimeException: Unable to create embedded Cassandra cluster
>   at 
> org.apache.beam.sdk.io.cassandra.CassandraIOTest.buildCluster(CassandraIOTest.java:167)
>   at 
> org.apache.beam.sdk.io.cassandra.CassandraIOTest.beforeClass(CassandraIOTest.java:146)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at 
> org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
>   at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
>  

[jira] [Updated] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-03-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-8025:
---
Fix Version/s: (was: 2.19.0)

> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Blocker
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-8025) Cassandra IO classMethod test is flaky

2020-03-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reopened BEAM-8025:


Still flaky.

Might be for another reason though:

 
{code:java}
java.lang.RuntimeException: Unable to create embedded Cassandra cluster
at 
org.apache.beam.sdk.io.cassandra.CassandraIOTest.buildCluster(CassandraIOTest.java:167)
at 
org.apache.beam.sdk.io.cassandra.CassandraIOTest.beforeClass(CassandraIOTest.java:146)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
at java.lang.Thread.run(Thread.java:748)
{code}
and 

 
{code:java}
java.lang.NullPointerException
at info.archinnov.achilles.embedded.CassandraShutDownHook.shutDownNow(CassandraShutDownHook.java:81)
at org.apache.beam.sdk.io.cassandra.CassandraIOTest.afterClass(CassandraIOTest.java:172)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 

[jira] [Created] (BEAM-9480) org.apache.beam.sdk.io.mongodb.MongoDbIOTest.testReadWithAggregate is flaky

2020-03-10 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9480:
--

 Summary: 
org.apache.beam.sdk.io.mongodb.MongoDbIOTest.testReadWithAggregate is flaky
 Key: BEAM-9480
 URL: https://issues.apache.org/jira/browse/BEAM-9480
 Project: Beam
  Issue Type: Test
  Components: io-java-mongodb
Reporter: Etienne Chauchot


[https://builds.apache.org/job/beam_PreCommit_Java_Phrase/1831/testReport/junit/org.apache.beam.sdk.io.mongodb/MongoDbIOTest/testReadWithAggregate/]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9470) :sdks:java:io:kinesis:test is flaky

2020-03-09 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9470:
--

 Summary: :sdks:java:io:kinesis:test is flaky
 Key: BEAM-9470
 URL: https://issues.apache.org/jira/browse/BEAM-9470
 Project: Beam
  Issue Type: Test
  Components: io-java-kinesis
Reporter: Etienne Chauchot


[https://scans.gradle.com/s/b4jmmu72ku5jc/console-log?task=:sdks:java:io:kinesis:test]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9436) Try to avoid elements list materialization in GBK

2020-03-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051980#comment-17051980
 ] 

Etienne Chauchot commented on BEAM-9436:


Could not avoid materialization because ReduceFnRunner needs an Iterable, but 
managed to avoid a flatMap step and the extra memory consumption from KV creation.
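
For illustration only, a minimal hypothetical sketch (not the runner's translation 
code) of the trade-off mentioned above: a consumer that may iterate more than once 
forces the values to be copied into memory, while a single-pass consumer could be 
fed a lazy, one-shot Iterable.

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not the Spark runner's code.
class MaterializationSketch {

  // Needed when the consumer may iterate several times: all values are held in memory.
  static <V> Iterable<V> materialize(Iterator<V> values) {
    List<V> copy = new ArrayList<>();
    values.forEachRemaining(copy::add);
    return copy;
  }

  // Enough for a single-pass consumer: lazy, nothing is copied, but only one traversal is valid.
  static <V> Iterable<V> oneShot(Iterator<V> values) {
    return () -> values;
  }
}
{code}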

> Try to avoid elements list materialization in GBK
> -
>
> Key: BEAM-9436
> URL: https://issues.apache.org/jira/browse/BEAM-9436
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9436) Try to avoid elements list materialization in GBK

2020-03-04 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9436:
--

 Summary: Try to avoid elements list materialization in GBK
 Key: BEAM-9436
 URL: https://issues.apache.org/jira/browse/BEAM-9436
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory

2020-03-04 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-9307:
--

Assignee: Etienne Chauchot

> Put windows inside the key to avoid having all values for the same key in 
> memory
> 
>
> Key: BEAM-9307
> URL: https://issues.apache.org/jira/browse/BEAM-9307
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>
> On both group by key and combinePerKey.
> Like it was done for the current runner.
> See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory

2020-03-04 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-9307:
---
Description: 
On both group by key and combinePerKey.

Like it was done for the current runner.

See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10

  was:
Like it was done for the current runner.

See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10


> Put windows inside the key to avoid having all values for the same key in 
> memory
> 
>
> Key: BEAM-9307
> URL: https://issues.apache.org/jira/browse/BEAM-9307
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>
> On both group by key and combinePerKey.
> Like it was done for the current runner.
> See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-9327) Avoid creating a new accumulator for each add input in Combine translation

2020-03-04 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-9327.

Fix Version/s: Not applicable
   Resolution: Won't Fix

Re-coding the "add an input to an accumulator" translation with manual window 
merging is less performant (measured with Nexmark) than calling 
mergeAccumulators(accumulator, addInput(createAccumulator, input)).

Using the current SparkCombineFn in the new runner is also less performant.
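
For reference, a hedged sketch of the two variants using the standard CombineFn 
API (this is not the runner's actual translation code):

{code:java}
import java.util.Arrays;
import org.apache.beam.sdk.transforms.Combine.CombineFn;

// Sketch only; the real translation also deals with windows and accumulator coders.
class AddInputSketch {

  // Variant kept: a fresh accumulator per input, merged into the running one.
  static <InputT, AccumT> AccumT viaMerge(
      CombineFn<InputT, AccumT, ?> fn, AccumT running, InputT input) {
    AccumT single = fn.addInput(fn.createAccumulator(), input);
    return fn.mergeAccumulators(Arrays.asList(running, single));
  }

  // Variant explored in this ticket: add the input directly into the running accumulator.
  static <InputT, AccumT> AccumT direct(
      CombineFn<InputT, AccumT, ?> fn, AccumT running, InputT input) {
    return fn.addInput(running, input);
  }
}
{code}

The direct variant avoids allocating one accumulator per input, but the manual 
window merging it required cost more than it saved in the Nexmark measurements.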

> Avoid creating a new accumulator for each add input in Combine translation
> --
>
> Key: BEAM-9327
> URL: https://issues.apache.org/jira/browse/BEAM-9327
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
> Fix For: Not applicable
>
>
> similar to the latest improvement in the current runner. See: 
> [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 12



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-9327) Avoid creating a new accumulator for each add input in Combine translation

2020-02-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-9327:
--

Assignee: Etienne Chauchot

> Avoid creating a new accumulator for each add input in Combine translation
> --
>
> Key: BEAM-9327
> URL: https://issues.apache.org/jira/browse/BEAM-9327
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>
> similar to the latest improvement in the current runner. See: 
> [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 12



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9307) Put windows inside the key to avoid having all values for the same key in memory

2020-02-17 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-9307:
---
Summary: Put windows inside the key to avoid having all values for the same 
key in memory  (was: new spark runner: put windows inside the key to avoid 
having all values for the same key in memory)

> Put windows inside the key to avoid having all values for the same key in 
> memory
> 
>
> Key: BEAM-9307
> URL: https://issues.apache.org/jira/browse/BEAM-9307
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> Like it was done for the current runner.
> See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9327) Avoid creating a new accumulator for each add input in Combine translation

2020-02-17 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9327:
--

 Summary: Avoid creating a new accumulator for each add input in 
Combine translation
 Key: BEAM-9327
 URL: https://issues.apache.org/jira/browse/BEAM-9327
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot


similar to the latest improvement in the current runner. See: 
[https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 12



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9307) new spark runner: put windows inside the key to avoid having all values for the same key in memory

2020-02-13 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9307:
--

 Summary: new spark runner: put windows inside the key to avoid 
having all values for the same key in memory
 Key: BEAM-9307
 URL: https://issues.apache.org/jira/browse/BEAM-9307
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot


Like it was done for the current runner.

See: [https://www.youtube.com/watch?v=ZIFtmx8nBow=721s] min 10
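
A hypothetical sketch of the idea (not the runner's code): move the window into 
the grouping key so that only the values of one (key, window) pair have to be held 
together, instead of all values of a key across all of its windows.

{code:java}
import org.apache.beam.sdk.transforms.windowing.BoundedWindow;
import org.apache.beam.sdk.values.KV;

// Sketch only: re-key each windowed element on a composite (key, window) before grouping.
class WindowInKeySketch {
  static <K, V> KV<KV<K, BoundedWindow>, V> explode(K key, V value, BoundedWindow window) {
    return KV.of(KV.of(key, window), value);
  }
}
{code}

Grouping then happens on the composite key, and results are re-keyed to the user 
key afterwards.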



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9270) Unable to use ElasticsearchIO for indexpatterns: index-*

2020-02-10 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033716#comment-17033716
 ] 

Etienne Chauchot commented on BEAM-9270:


I did some more testing: the problem is really restricted to the size estimation 
only. Once a proper estimation is obtained, the read on index* works fine.

> Unable to use ElasticsearchIO for indexpatterns: index-*
> 
>
> Key: BEAM-9270
> URL: https://issues.apache.org/jira/browse/BEAM-9270
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch
>Affects Versions: 2.19.0
>Reporter: Krishnaiah Narukulla
>Priority: Major
>
> ElasticsearchIO input does not work with index patterns containing a wildcard, for 
> example: index-*. It works well with a single index. It becomes a problem if 
> we need to query multiple indices using an index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9270) Unable to use ElasticsearchIO for indexpatterns: index-*

2020-02-10 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17033655#comment-17033655
 ] 

Etienne Chauchot commented on BEAM-9270:


Hi [~krisnaru], thanks for raising this!

I can indeed reproduce this bug.

What happens is this:

Beam Read needs to evaluate the size of the data to read in order to know how to 
split for parallel reads. To get that size, ElasticsearchIO calls the ES stats API. 
Right now the IO uses the low-level ES REST client, so it gets a JSON object from 
the ES cluster listing all the indices available in the cluster. The IO then 
searches this JSON for the index name provided by the user. In this case it does 
not find a match because of the "*" character, considers that the provided index 
does not exist, estimates the size as 0 and returns no documents.

To fix that, we would need to update the JSON parsing to expand the wildcard 
manually. But there is an ongoing complete refactoring of ElasticsearchIO (see the 
thread on the dev mailing list) that will allow using the high-level ES client and 
unlock many ES features. So either we fix the index-name matching in the stats 
using a regexp, or we wait for the new ElasticsearchIO.
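
For the regexp option, a hypothetical helper (not the actual ElasticsearchIO code) 
could turn the user-supplied pattern into a regular expression and match it against 
the index names listed in the stats response:

{code:java}
import java.util.regex.Pattern;

// Hypothetical sketch of the "match with a regexp" option.
class IndexPatternMatcher {

  // Quote the whole pattern, then re-enable '*' as "any sequence of characters".
  static Pattern toPattern(String indexPattern) {
    String regex = Pattern.quote(indexPattern).replace("*", "\\E.*\\Q");
    return Pattern.compile(regex);
  }

  static boolean matches(String indexPattern, String indexName) {
    return toPattern(indexPattern).matcher(indexName).matches();
  }
}
{code}

With that, matches("index-*", "index-2020.01.01") is true, whereas the current 
exact string lookup finds no match.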

 

 

> Unable to use ElasticsearchIO for indexpatterns: index-*
> 
>
> Key: BEAM-9270
> URL: https://issues.apache.org/jira/browse/BEAM-9270
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-elasticsearch
>Affects Versions: 2.19.0
>Reporter: Krishnaiah Narukulla
>Priority: Major
>
> ElasticsearchIO input does not work with index patterns containing a wildcard, for 
> example: index-*. It works well with a single index. It becomes a problem if 
> we need to query multiple indices using an index pattern.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2020-01-29 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17025780#comment-17025780
 ] 

Etienne Chauchot commented on BEAM-8470:


[~udim] it was already done, see: there is a new column

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
> Fix For: 2.18.0
>
>  Time Spent: 16h
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/Dstream 
> framework
> h1. We hope to gain
>  * More up to date runner in terms of libraries: leverage new features
>  * Leverage learnt practices from the previous runners
>  * Better performance thanks to the DAG optimizer (catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-9205) Regression in validates runner tests configuration in spark module

2020-01-28 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-9205.
--
Fix Version/s: 2.20.0
   Resolution: Fixed

> Regression in validates runner tests configuration in spark module
> --
>
> Key: BEAM-9205
> URL: https://issues.apache.org/jira/browse/BEAM-9205
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.20.0
>
>
> Not all the metrics tests are run: at least the MetricsPusher test is no longer run



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9205) Regression in validates runner tests configuration in spark module

2020-01-28 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-9205:
---
Parent: BEAM-3310
Issue Type: Sub-task  (was: Test)

> Regression in validates runner tests configuration in spark module
> --
>
> Key: BEAM-9205
> URL: https://issues.apache.org/jira/browse/BEAM-9205
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> Not all the metrics tests are run: at least the MetricsPusher test is no longer run



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9205) Regression in validates runner tests configuration in spark module

2020-01-28 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9205:
--

 Summary: Regression in validates runner tests configuration in 
spark module
 Key: BEAM-9205
 URL: https://issues.apache.org/jira/browse/BEAM-9205
 Project: Beam
  Issue Type: Test
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


Not all the metrics tests are run: at least the MetricsPusher test is no longer run



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9187) org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactoryTest.loadBalancesBundles is flaky

2020-01-24 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9187:
--

 Summary: 
org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactoryTest.loadBalancesBundles
 is flaky
 Key: BEAM-9187
 URL: https://issues.apache.org/jira/browse/BEAM-9187
 Project: Beam
  Issue Type: Test
  Components: runner-core
Reporter: Etienne Chauchot


[org.apache.beam.runners.fnexecution.control.DefaultJobBundleFactoryTest.loadBalancesBundles|https://builds.apache.org/job/beam_PreCommit_Java_Commit/9730/testReport/junit/org.apache.beam.runners.fnexecution.control/DefaultJobBundleFactoryTest/loadBalancesBundles/]
 unit test seems to be flaky: 

[https://builds.apache.org/job/beam_PreCommit_Java_Commit/9730/testReport/junit/org.apache.beam.runners.fnexecution.control/DefaultJobBundleFactoryTest/loadBalancesBundles/]

I have runner-core as the component, but in reality this test is in 
org.apache.beam.runners.fnexecution.control and there is no corresponding component tag



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-9065) Spark runner accumulates metrics (incorrectly) between runs

2020-01-22 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020913#comment-17020913
 ] 

Etienne Chauchot edited comment on BEAM-9065 at 1/22/20 9:53 AM:
-

Hey! Yes, no problem, no hurry, we are still waiting to reproduce the failure 
in a unit test


was (Author: echauchot):
Hey ! Yes, no problem, no hurry, we are still waiting to reproduce the failure.

> Spark runner accumulates metrics (incorrectly) between runs
> ---
>
> Key: BEAM-9065
> URL: https://issues.apache.org/jira/browse/BEAM-9065
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When pipeline.run() is called, MetricsAccumulator (wrapper of 
> MetricsContainerStepMap spark accumulator) is initialized. Spark needs this 
> class to be a singleton for failover. The problem is that when several 
> pipelines are run inside the same JVM, the initialization of 
> MetricsAccumulator singleton does not reset the underlying spark accumulator  
> causing metrics to be accumulated between runs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9065) Spark runner accumulates metrics (incorrectly) between runs

2020-01-22 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17020913#comment-17020913
 ] 

Etienne Chauchot commented on BEAM-9065:


Hey ! Yes, no problem, no hurry, we are still waiting to reproduce the failure.

> Spark runner accumulates metrics (incorrectly) between runs
> ---
>
> Key: BEAM-9065
> URL: https://issues.apache.org/jira/browse/BEAM-9065
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.20.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When pipeline.run() is called, MetricsAccumulator (wrapper of 
> MetricsContainerStepMap spark accumulator) is initialized. Spark needs this 
> class to be a singleton for failover. The problem is that when several 
> pipelines are run inside the same JVM, the initialization of 
> MetricsAccumulator singleton does not reset the underlying spark accumulator  
> causing metrics to be accumulated between runs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-9065) Spark runner accumulates metrics (incorrectly) between runs

2020-01-09 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17011875#comment-17011875
 ] 

Etienne Chauchot commented on BEAM-9065:


[~udim] If 2.18 is not released by the time the patch is ready, I would prefer 
that it is cherry-picked to 2.18.

Also, I'm still investigating the precise environment needed to reproduce this 
observed behavior, so it might not come before 2.19.

> Spark runner accumulates metrics (incorrectly) between runs
> ---
>
> Key: BEAM-9065
> URL: https://issues.apache.org/jira/browse/BEAM-9065
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When pipeline.run() is called, MetricsAccumulator (wrapper of 
> MetricsContainerStepMap spark accumulator) is initialized. Spark needs this 
> class to be a singleton for failover. The problem is that when several 
> pipelines are run inside the same JVM, the initialization of 
> MetricsAccumulator singleton does not reset the underlying spark accumulator  
> causing metrics to be accumulated between runs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-9065) metrics are not reset upon init

2020-01-08 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-9065:
---
Fix Version/s: 2.18.0

> metrics are not reset upon init
> ---
>
> Key: BEAM-9065
> URL: https://issues.apache.org/jira/browse/BEAM-9065
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.18.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> When pipeline.run() is called, MetricsAccumulator (wrapper of 
> MetricsContainerStepMap spark accumulator) is initialized. Spark needs this 
> class to be a singleton for failover. The problem is that when several 
> pipelines are run inside the same JVM, the initialization of 
> MetricsAccumulator singleton does not reset the underlying spark accumulator  
> causing metrics to be accumulated between runs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9065) metrics are not reset upon init

2020-01-08 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9065:
--

 Summary: metrics are not reset upon init
 Key: BEAM-9065
 URL: https://issues.apache.org/jira/browse/BEAM-9065
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


When pipeline.run() is called, MetricsAccumulator (the wrapper of the 
MetricsContainerStepMap Spark accumulator) is initialized. Spark needs this class 
to be a singleton for failover. The problem is that when several pipelines are run 
inside the same JVM, the initialization of the MetricsAccumulator singleton does 
not reset the underlying Spark accumulator, causing metrics to be accumulated 
between runs.
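
To illustrate the failure mode, here is a hypothetical sketch (names and types are 
made up, it is not the runner's MetricsAccumulator): a singleton holder whose 
init() keeps the instance created by a previous run, so values keep accumulating 
across pipelines in the same JVM.

{code:java}
// Hypothetical sketch only, not the runner's MetricsAccumulator.
class SingletonAccumulatorSketch {

  private static MetricsHolder instance;

  // Buggy behavior: a second pipeline run in the same JVM reuses the old, non-empty holder.
  static synchronized MetricsHolder init() {
    if (instance == null) {
      instance = new MetricsHolder();
    }
    return instance;
  }

  // Possible fix direction: re-create (reset) the holder each time a pipeline is initialized.
  static synchronized MetricsHolder initAndReset() {
    instance = new MetricsHolder();
    return instance;
  }

  static final class MetricsHolder {
    long counter;

    void add(long value) {
      counter += value;
    }
  }
}
{code}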



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-9032) Replace broadcast variables based side inputs with temp views

2020-01-06 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-9032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-9032 started by Etienne Chauchot.
--
> Replace broadcast variables based side inputs with temp views
> -
>
> Key: BEAM-9032
> URL: https://issues.apache.org/jira/browse/BEAM-9032
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9032) Replace broadcast variables based side inputs with temp views

2019-12-26 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9032:
--

 Summary: Replace broadcast variables based side inputs with temp 
views
 Key: BEAM-9032
 URL: https://issues.apache.org/jira/browse/BEAM-9032
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-5192) Support Elasticsearch 7.x

2019-12-26 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-5192.
--
Fix Version/s: (was: Not applicable)
   2.19.0
   Resolution: Fixed

> Support Elasticsearch 7.x
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> ES v7 is not out yet. But Elastic team scheduled a breaking change for ES 
> 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amont of changes in the IO. 
> This ticket is there to track the future work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-9019) Improve Spark Encoders (wrappers of beam coders)

2019-12-23 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-9019:
--

 Summary: Improve Spark Encoders (wrappers of beam coders)
 Key: BEAM-9019
 URL: https://issues.apache.org/jira/browse/BEAM-9019
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


To improve maintainability and performance, replace as much as possible of the 
Catalyst-generated code with compiled Java code.
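
As a hedged sketch of the direction (not the runner's actual Encoder 
implementation), the generated code could delegate to plain compiled Java helpers 
that round-trip values through a Beam Coder:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.beam.sdk.coders.Coder;

// Hypothetical helper: compiled Java that a Spark Encoder wrapper could call
// instead of relying on Catalyst-generated serialization code.
class BeamCoderBytes {

  static <T> byte[] encode(Coder<T> coder, T value) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    coder.encode(value, out);
    return out.toByteArray();
  }

  static <T> T decode(Coder<T> coder, byte[] bytes) throws IOException {
    return coder.decode(new ByteArrayInputStream(bytes));
  }
}
{code}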



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-12-18 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-5690.

Fix Version/s: 2.19.0
   Resolution: Fixed

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5690) Issue with GroupByKey in BeamSql using SparkRunner

2019-12-18 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999012#comment-16999012
 ] 

Etienne Chauchot commented on BEAM-5690:


Yes, thanks for pointing out [~iemejia], I forgot to close the related jira.

> Issue with GroupByKey in BeamSql using SparkRunner
> --
>
> Key: BEAM-5690
> URL: https://issues.apache.org/jira/browse/BEAM-5690
> Project: Beam
>  Issue Type: Task
>  Components: runner-spark
>Reporter: Kenneth Knowles
>Priority: Major
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Reported on user@
> {quote}We are trying to setup a pipeline with using BeamSql and the trigger 
> used is default (AfterWatermark crosses the window). 
> Below is the pipeline:
>   
>KafkaSource (KafkaIO) 
>---> Windowing (FixedWindow 1min)
>---> BeamSql
>---> KafkaSink (KafkaIO)
>  
> We are using Spark Runner for this. 
> The BeamSql query is:
> {code}select Col3, count(*) as count_col1 from PCOLLECTION GROUP BY Col3{code}
> We are grouping by Col3 which is a string. It can hold values string[0-9]. 
>  
> The records are getting emitted out at 1 min to kafka sink, but the output 
> record in kafka is not as expected.
> Below is the output observed: (WST and WET are indicators for window start 
> time and window end time)
> {code}
> {"count_col1":1,"Col3":"string5","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":3,"Col3":"string7","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":2,"Col3":"string8","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string2","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":1,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00   +"}
> {"count_col1":0,"Col3":"string6","WST":"2018-10-09  09-55-00   
> +","WET":"2018-10-09  09-56-00 0}
> {code}
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-8894.

Fix Version/s: Not applicable
   Resolution: Not A Problem

The model is not defined for this case: should runners re-encode? While waiting 
for a common definition among runners, all runners exclude this test.

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
> Fix For: Not applicable
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as no serialization is done. 
> There may be a problem in the NullableCoder itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8338) Support ES 7.x for ElasticsearchIO

2019-12-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-8338.
--
Fix Version/s: Not applicable
   Resolution: Duplicate

> Support ES 7.x for ElasticsearchIO
> --
>
> Key: BEAM-8338
> URL: https://issues.apache.org/jira/browse/BEAM-8338
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Michal Brunát
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Elasticsearch has released 7.4 but ElasticsearchIO only supports 2x,5.x,6.x.
>  We should support ES 7.x for ElasticsearchIO.
>  [https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html]
>  
> [https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-5192) Support Elasticsearch 7.x

2019-12-12 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reopened BEAM-5192:

  Assignee: Etienne Chauchot  (was: Chet Aldrich)

reopening to take over from [~zhongchen] (who has no more time) on this 

> Support Elasticsearch 7.x
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: Not applicable
>
>
> ES v7 is not out yet. But Elastic team scheduled a breaking change for ES 
> 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amont of changes in the IO. 
> This ticket is there to track the future work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-11 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8894 started by Etienne Chauchot.
--
> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as no serialization is done. 
> There may be a problem in the NullableCoder itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Work started] (BEAM-8025) Cassandra IO classMethod test is flaky

2019-12-11 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on BEAM-8025 started by Etienne Chauchot.
--
> Cassandra IO classMethod test is flaky
> --
>
> Key: BEAM-8025
> URL: https://issues.apache.org/jira/browse/BEAM-8025
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-cassandra, test-failures
>Affects Versions: 2.16.0
>Reporter: Kyle Weaver
>Assignee: Etienne Chauchot
>Priority: Blocker
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The most recent runs of this test are failing:
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7398/]
> [https://builds.apache.org/job/beam_PreCommit_Java_Commit/7399/console]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-11 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-8830.

Fix Version/s: 2.19.0
   Resolution: Fixed

> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
> Fix For: 2.19.0
>
>  Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3310) Push metrics to a backend in an runner agnostic way

2019-12-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993587#comment-16993587
 ] 

Etienne Chauchot commented on BEAM-3310:


unassigned myself because the remaining work is to support MetricsPusher in 
runners other than Spark or Flink

> Push metrics to a backend in an runner agnostic way
> ---
>
> Key: BEAM-3310
> URL: https://issues.apache.org/jira/browse/BEAM-3310
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-extensions-metrics, sdk-java-core
>Reporter: Etienne Chauchot
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> The idea is to avoid relying on the runners to provide access to the metrics 
> (either at the end of the pipeline or while it runs) because they don't have 
> all the same capabilities towards metrics (e.g. spark runner configures sinks 
>  like csv, graphite or in memory sinks using the spark engine conf). The 
> target is to push the metrics in the common runner code so that no matter the 
> chosen runner, a user can get his metrics out of beam.
> Here is the link to the discussion thread on the dev ML: 
> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
> And the design doc:
> https://s.apache.org/runner_independent_metrics_extraction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-3310) Push metrics to a backend in an runner agnostic way

2019-12-11 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-3310:
--

Assignee: (was: Etienne Chauchot)

> Push metrics to a backend in an runner agnostic way
> ---
>
> Key: BEAM-3310
> URL: https://issues.apache.org/jira/browse/BEAM-3310
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-extensions-metrics, sdk-java-core
>Reporter: Etienne Chauchot
>Priority: Major
>  Time Spent: 18h 50m
>  Remaining Estimate: 0h
>
> The idea is to avoid relying on the runners to provide access to the metrics 
> (either at the end of the pipeline or while it runs) because they don't have 
> all the same capabilities towards metrics (e.g. spark runner configures sinks 
>  like csv, graphite or in memory sinks using the spark engine conf). The 
> target is to push the metrics in the common runner code so that no matter the 
> chosen runner, a user can get his metrics out of beam.
> Here is the link to the discussion thread on the dev ML: 
> https://lists.apache.org/thread.html/01a80d62f2df6b84bfa41f05e15fda900178f882877c294fed8be91e@%3Cdev.beam.apache.org%3E
> And the design doc:
> https://s.apache.org/runner_independent_metrics_extraction



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581
 ] 

Etienne Chauchot edited comment on BEAM-8894 at 12/11/19 2:17 PM:
--

discussion on heterogeneous coders in the output PCollection of 
flatten:[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]


was (Author: echauchot):
[discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]]
 on heterogeneous coders in the output PCollection of flatten: 
[discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]]

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as no serialization is done. 
> There may be a problem in the NullableCoder itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4088) ExecutorServiceParallelExecutorTest#ensureMetricsThreadDoesntLeak in PR #4965 does not pass in gradle

2019-12-11 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-4088:
--

Assignee: (was: Etienne Chauchot)

> ExecutorServiceParallelExecutorTest#ensureMetricsThreadDoesntLeak in PR #4965 
> does not pass in gradle
> -
>
> Key: BEAM-4088
> URL: https://issues.apache.org/jira/browse/BEAM-4088
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Etienne Chauchot
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> This is a new test being added to ensure threads don't leak. The failure 
> seems to indicate that threads do leak.
> This test fails using gradle but previously passed using maven
> PR: https://github.com/apache/beam/pull/4965
> Presubmit Failure: 
>  * https://builds.apache.org/job/beam_PreCommit_Java_GradleBuild/4059/
>  * 
> https://scans.gradle.com/s/grha56432j3t2/tests/jqhvlvf72f7pg-ipde5etqqejoa?openStackTraces=WzBd



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581
 ] 

Etienne Chauchot edited comment on BEAM-8894 at 12/11/19 2:17 PM:
--

discussion on heterogeneous coders in the output PCollection of flatten: 
[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]


was (Author: echauchot):
discussion on heterogeneous coders in the output PCollection of 
flatten:[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as no serialization is done. 
> There may be a problem in the NullableCoder itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581
 ] 

Etienne Chauchot edited comment on BEAM-8894 at 12/11/19 2:16 PM:
--

[discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]]
 on heterogeneous coders in the output PCollection of flatten: 
[discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]]


was (Author: echauchot):
[discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]]
 on heterogeneous coders in the output PCollection of flatten: 

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as no serialization is done. 
> There may be a problem in the NullableCoder itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-11 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16993581#comment-16993581
 ] 

Etienne Chauchot commented on BEAM-8894:


[discussion|[https://lists.apache.org/thread.html/bf340f975b78a5e4237cd2678bc9a09b0a47bb8adae2b0f7034c6749%40%3Cdev.beam.apache.org%3E]]
 on heterogeneous coders in the output PCollection of flatten: 

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as no serialization is done. 
> There may be a problem in the NullableCoder itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8909) Implement Timer API translation in spark runner

2019-12-06 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8909:
--

 Summary: Implement Timer API translation in spark runner
 Key: BEAM-8909
 URL: https://issues.apache.org/jira/browse/BEAM-8909
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8908) Implement State API translation in spark runner

2019-12-06 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8908:
--

 Summary: Implement State API translation in spark runner
 Key: BEAM-8908
 URL: https://issues.apache.org/jira/browse/BEAM-8908
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8907) Implement SplittableDoFn translation in spark runner

2019-12-06 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8907:
--

 Summary: Implement SplittableDoFn translation in spark runner
 Key: BEAM-8907
 URL: https://issues.apache.org/jira/browse/BEAM-8907
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-8470.

Fix Version/s: 2.18.0
   Resolution: Fixed

> Create a new Spark runner based on Spark Structured streaming framework
> ---
>
> Key: BEAM-8470
> URL: https://issues.apache.org/jira/browse/BEAM-8470
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>  Labels: structured-streaming
> Fix For: 2.18.0
>
>  Time Spent: 15h 50m
>  Remaining Estimate: 0h
>
> h1. Why is it worth creating a new runner based on structured streaming:
> Because this new framework brings:
>  * Unified batch and streaming semantics:
>  * no more RDD/DStream distinction, as in Beam (only PCollection)
>  * Better state management:
>  * incremental state instead of saving all each time
>  * No more synchronous saving delaying computation: per batch and partition 
> delta file saved asynchronously + in-memory hashmap synchronous put/get
>  * Schemas in datasets:
>  * The dataset knows the structure of the data (fields) and can optimize 
> later on
>  * Schemas in PCollection in Beam
>  * New Source API
>  * Very close to Beam bounded source and unbounded sources
> h1. Why make a new runner from scratch?
>  * Structured streaming framework is very different from the RDD/Dstream 
> framework
> h1. We hope to gain
>  * More up to date runner in terms of libraries: leverage new features
>  * Leverage learnt practices from the previous runners
>  * Better performance thanks to the DAG optimizer (catalyst) and by 
> simplifying the code.
>  * Simplify the code and ease the maintenance
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-05 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988666#comment-16988666
 ] 

Etienne Chauchot commented on BEAM-8894:


I have split this test out because it might be unrelated to Flatten

> Fix multiple coder bug in new spark runner
> --
>
> Key: BEAM-8894
> URL: https://issues.apache.org/jira/browse/BEAM-8894
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> The test below does not pass. It fails with an EOF while calling 
> NullableCoder(BigEndianCoder.decode): 
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
> In the "old" Spark runner, this test passes because it never calls the 
> NullableCoder, as no serialization is done. 
> There may be a problem in the NullableCoder itself



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-6209) Remove Http Metrics Sink specific methods from PipelineOptions

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot resolved BEAM-6209.

Fix Version/s: 2.10.0
   Resolution: Fixed

> Remove Http Metrics Sink specific methods from PipelineOptions
> --
>
> Key: BEAM-6209
> URL: https://issues.apache.org/jira/browse/BEAM-6209
> Project: Beam
>  Issue Type: Improvement
>  Components: runner-extensions-metrics
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
> Fix For: 2.10.0
>
>
> Methods specific to Metrics Http Sink should be moved to a 
> PipelineOptionsMetricsHttpSink interface to avoid having technology specific 
> methods in base classes/interfaces.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-2499) Support Custom Windows in Spark runner

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-2499:
--

Assignee: (was: Etienne Chauchot)

> Support Custom Windows in Spark runner
> --
>
> Key: BEAM-2499
> URL: https://issues.apache.org/jira/browse/BEAM-2499
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> If we extend {{IntervalWindow}} and we try to merge these custom windows like 
> in this PR:
> https://github.com/apache/beam/pull/3286
> Then spark runner fails with 
> {{org.apache.beam.sdk.Pipeline$PipelineExecutionException: 
> java.lang.ClassCastException: 
> org.apache.beam.sdk.transforms.windowing.IntervalWindow cannot be cast to 
> org.apache.beam.sdk.transforms.windowing.MergingCustomWindowsTest$CustomWindow}}
> It seems to be because of the cast to {{IntervalWindow}} there: 
> https://github.com/apache/beam/blob/master/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/SparkGlobalCombineFn.java#L111



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-2176) Support state API in Spark streaming mode.

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-2176:
--

Assignee: (was: Etienne Chauchot)

> Support state API in Spark streaming mode.
> --
>
> Key: BEAM-2176
> URL: https://issues.apache.org/jira/browse/BEAM-2176
> Project: Beam
>  Issue Type: Sub-task
>  Components: runner-spark
>Reporter: Aviem Zur
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4226) Migrate hadoop dependency to 2.7.4 or upper to fix a CVE

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-4226:
--

Assignee: (was: Etienne Chauchot)

> Migrate hadoop dependency to 2.7.4 or upper to fix a CVE
> 
>
> Key: BEAM-4226
> URL: https://issues.apache.org/jira/browse/BEAM-4226
> Project: Beam
>  Issue Type: Task
>  Components: sdk-java-core
>Reporter: Etienne Chauchot
>Priority: Major
>
> apache hadoop is subject to a vulnerability:
> CVE-2016-6811: Apache Hadoop Privilege escalation vulnerability
> We should upgrade the dep to maybe 2.7.4 which is the closest to what we 
> actually use (2.7.3)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-5171) org.apache.beam.sdk.io.CountingSourceTest.test[Un]boundedSourceSplits tests are flaky in Spark runner

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot reassigned BEAM-5171:
--

Assignee: (was: Etienne Chauchot)

> org.apache.beam.sdk.io.CountingSourceTest.test[Un]boundedSourceSplits tests 
> are flaky in Spark runner
> -
>
> Key: BEAM-5171
> URL: https://issues.apache.org/jira/browse/BEAM-5171
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Valentyn Tymofieiev
>Priority: Major
>
> Two tests: 
>  org.apache.beam.sdk.io.CountingSourceTest.testUnboundedSourceSplits 
>  org.apache.beam.sdk.io.CountingSourceTest.testBoundedSourceSplits
> failed in a PostCommit [Spark Validates Runner test 
> suite|https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark_Gradle/1277/testReport/]
>  with an error that seems to be common for Spark. Could this be due to a 
> misconfiguration of the Spark cluster? 
> Task serialization failed: java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-de91f449-e5d1-4be4-acaa-3ee06fdfa95b/1d.
>  java.io.IOException: Failed to create local dir in 
> /tmp/blockmgr-de91f449-e5d1-4be4-acaa-3ee06fdfa95b/1d.
>  at 
> org.apache.spark.storage.DiskBlockManager.getFile(DiskBlockManager.scala:70)
>  at org.apache.spark.storage.DiskStore.remove(DiskStore.scala:116)
>  at 
> org.apache.spark.storage.BlockManager.removeBlockInternal(BlockManager.scala:1511)
>  at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1045)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1083)
>  at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:841)
>  at org.apache.spark.storage.BlockManager.putSingle(BlockManager.scala:1404)
>  at 
> org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:123)
>  at 
> org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:88)
>  at 
> org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
>  at 
> org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:62)
>  at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1482)
>  at 
> org.apache.spark.scheduler.DAGScheduler.submitMissingTasks(DAGScheduler.scala:1039)
>  at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:947)
>  at 
> org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:891)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1780)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1772)
>  at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1761)
>  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
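> One thing worth ruling out is an unwritable or full {{/tmp}} on the workers. A hedged 
> sketch of pointing Spark's scratch space elsewhere (not a confirmed fix for this 
> flakiness, and the path below is only an example):
> {code}
> import org.apache.spark.SparkConf;
>
> public class SparkLocalDirCheck {
>   public static void main(String[] args) {
>     // Point the block manager's scratch space ("spark.local.dir", a standard Spark
>     // property) at a directory known to be writable and to have enough free space.
>     SparkConf conf = new SparkConf()
>         .setAppName("counting-source-test")
>         .set("spark.local.dir", "/var/tmp/spark-local");
>     System.out.println(conf.get("spark.local.dir"));
>   }
> }
> {code}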



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8894) Fix multiple coder bug in new spark runner

2019-12-05 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8894:
--

 Summary: Fix multiple coder bug in new spark runner
 Key: BEAM-8894
 URL: https://issues.apache.org/jira/browse/BEAM-8894
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


The test below does not pass. It fails with an EOF while calling 
NullableCoder(BigEndianCoder.decode):

'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'

In the "old" Spark runner, this test passes because the NullableCoder is never 
called, as no serialization is done. 

There may be a problem in the NullableCoder itself.
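A quick way to check the coder in isolation is to round-trip a value and then decode a 
truncated copy of the bytes, which reproduces the same kind of EOF failure (a standalone 
sketch; it assumes BigEndianIntegerCoder as the inner coder, which may not be the exact 
coder involved in the test):
{code}
import java.util.Arrays;
import org.apache.beam.sdk.coders.BigEndianIntegerCoder;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.NullableCoder;
import org.apache.beam.sdk.util.CoderUtils;

public class NullableCoderCheck {
  public static void main(String[] args) throws Exception {
    Coder<Integer> coder = NullableCoder.of(BigEndianIntegerCoder.of());

    // Round trip works: 1 presence marker byte + 4 big-endian value bytes.
    byte[] bytes = CoderUtils.encodeToByteArray(coder, 42);
    System.out.println(CoderUtils.decodeFromByteArray(coder, bytes)); // prints 42

    // Decoding truncated bytes throws a CoderException caused by an EOF, similar to
    // what happens if the runner hands an incomplete serialized payload to the coder.
    byte[] truncated = Arrays.copyOf(bytes, bytes.length - 1);
    System.out.println(CoderUtils.decodeFromByteArray(coder, truncated));
  }
}
{code}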



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-12-05 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot updated BEAM-8830:
---
Description: 
'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
 
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'

  was:
'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'


> fix Flatten tests in Spark Structured Streaming runner
> --
>
> Key: BEAM-8830
> URL: https://issues.apache.org/jira/browse/BEAM-8830
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Assignee: Etienne Chauchot
>Priority: Major
>
> 'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8839) Failed ValidatesRunner tests for new Spark Runner

2019-12-02 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-8839.
--
Fix Version/s: Not applicable
   Resolution: Duplicate

split these failures into 3 tickets, one per cause:

- DoFn lifecycle

- combine with context

- flatten

> Failed ValidatesRunner tests for new Spark Runner
> -
>
> Key: BEAM-8839
> URL: https://issues.apache.org/jira/browse/BEAM-8839
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Alexey Romanenko
>Priority: Major
>  Labels: structured-streaming
> Fix For: Not applicable
>
>
> Some of the transform translations for the new Spark runner (Flatten and Combine 
> in particular) fail on several corner cases when the ValidatesRunner tests are 
> run. These tests were excluded from the run for the new Spark runner to keep the 
> Jenkins job green and detect regressions easily (if any), but they have to be 
> addressed and the corresponding translations have to be fixed.
> List of these tests:
> {code}
>  
> 'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext'
>  
> 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext'
>  
> 'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty'
>  
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext'
>  
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext'
>  
> 'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext'
>  'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
>  'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
>  'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
>  
> 'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'
>  
> 'org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded'
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8860) Implement combine with context in new spark runner

2019-12-02 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8860:
--

 Summary: Implement combine with context in new spark runner
 Key: BEAM-8860
 URL: https://issues.apache.org/jira/browse/BEAM-8860
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


The ValidatesRunner tests below fail in the Spark Structured Streaming runner because 
combine with context (for side inputs) is not implemented:

'org.apache.beam.sdk.transforms.CombineFnsTest.testComposedCombineWithContext'
'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContext'
'org.apache.beam.sdk.transforms.CombineTest$CombineWithContextTests.testSimpleCombineWithContextEmpty'
'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testFixedWindowsCombineWithContext'
'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSessionsCombineWithContext'
'org.apache.beam.sdk.transforms.CombineTest$WindowingTests.testSlidingWindowsCombineWithContext'
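The kind of pipeline these tests build can be sketched as follows (a minimal example 
with a hypothetical multiplier side input, not the actual test code):
{code}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Combine;
import org.apache.beam.sdk.transforms.CombineWithContext.CombineFnWithContext;
import org.apache.beam.sdk.transforms.CombineWithContext.Context;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

public class CombineWithContextSketch {

  /** Sums the inputs and scales the result by a multiplier read from a side input. */
  static class ScaledSumFn extends CombineFnWithContext<Integer, Integer, Integer> {
    private final PCollectionView<Integer> multiplierView;

    ScaledSumFn(PCollectionView<Integer> multiplierView) {
      this.multiplierView = multiplierView;
    }

    @Override
    public Integer createAccumulator(Context c) {
      return 0;
    }

    @Override
    public Integer addInput(Integer accum, Integer input, Context c) {
      return accum + input;
    }

    @Override
    public Integer mergeAccumulators(Iterable<Integer> accums, Context c) {
      int sum = 0;
      for (int a : accums) {
        sum += a;
      }
      return sum;
    }

    @Override
    public Integer extractOutput(Integer accum, Context c) {
      // The side input is only reachable through the context: this is the part the
      // runner translation has to wire up.
      return accum * c.sideInput(multiplierView);
    }

    @Override
    public Integer defaultValue() {
      // Value emitted for an empty input.
      return 0;
    }
  }

  public static void main(String[] args) {
    Pipeline p = Pipeline.create();
    PCollectionView<Integer> multiplier =
        p.apply("multiplier", Create.of(10)).apply(View.asSingleton());
    PCollection<Integer> result =
        p.apply("values", Create.of(1, 2, 3))
            .apply(Combine.globally(new ScaledSumFn(multiplier)).withSideInputs(multiplier));
    p.run().waitUntilFinish();
  }
}
{code}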



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8859) Fix org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded in new spark runner

2019-12-02 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8859:
--

 Summary: Fix 
org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded 
in new spark runner
 Key: BEAM-8859
 URL: https://issues.apache.org/jira/browse/BEAM-8859
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot


The ValidatesRunner test 
org.apache.beam.sdk.transforms.SplittableDoFnTest.testLifecycleMethodsBounded
fails in the Spark Structured Streaming runner.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8830) fix Flatten tests in Spark Structured Streaming runner

2019-11-26 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8830:
--

 Summary: fix Flatten tests in Spark Structured Streaming runner
 Key: BEAM-8830
 URL: https://issues.apache.org/jira/browse/BEAM-8830
 Project: Beam
  Issue Type: Bug
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


'org.apache.beam.sdk.transforms.FlattenTest.testEmptyFlattenAsSideInput'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenMultipleCoders'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmpty'
'org.apache.beam.sdk.transforms.FlattenTest.testFlattenPCollectionsEmptyThenParDo'
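For reference, the empty-flatten cases boil down to pipelines like this one (a 
simplified sketch of what the tests exercise, not the actual test code):
{code}
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.transforms.Flatten;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionList;

public class EmptyFlattenSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create();

    // Flattening an empty list of PCollections must still produce an (empty)
    // PCollection with an explicitly set coder; this is one of the corner cases
    // the tests above cover.
    PCollection<Integer> flattened =
        PCollectionList.<Integer>empty(p)
            .apply(Flatten.pCollections())
            .setCoder(VarIntCoder.of());

    p.run().waitUntilFinish();
  }
}
{code}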



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-5192) Support Elasticsearch 7.x

2019-11-26 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-5192.
--
Fix Version/s: Not applicable
   Resolution: Duplicate

> Support Elasticsearch 7.x
> -
>
> Key: BEAM-5192
> URL: https://issues.apache.org/jira/browse/BEAM-5192
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-elasticsearch
>Reporter: Etienne Chauchot
>Assignee: Chet Aldrich
>Priority: Major
> Fix For: Not applicable
>
>
> ES v7 is not out yet, but the Elastic team has scheduled a breaking change for ES 
> 7.0: the removal of the type feature. See 
> [https://www.elastic.co/blog/index-type-parent-child-join-now-future-in-elasticsearch]
> This will require a good amount of changes in the IO. 
> This ticket is here to track the future work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-198) Spark runner batch translator to work with Datasets instead of RDDs

2019-10-24 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-198.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Spark runner batch translator to work with Datasets instead of RDDs
> ---
>
> Key: BEAM-198
> URL: https://issues.apache.org/jira/browse/BEAM-198
> Project: Beam
>  Issue Type: New Feature
>  Components: runner-spark
>Reporter: Amit Sela
>Assignee: Jean-Baptiste Onofré
>Priority: Major
> Fix For: Not applicable
>
>
> Currently, the Spark runner translates batch pipelines into RDD code, meaning 
> it doesn't benefit from the optimizations that DataFrames (which aren't type-safe) 
> enjoy.
> With Datasets, batch pipelines will benefit from these optimizations; in addition, 
> Datasets are type-safe and encoder-based, so they seem like a much better 
> fit for the Beam model.
> Looking ahead, Datasets are a good choice since they are the basis for the future 
> of Spark streaming as well (Structured Streaming), so this will hopefully lay 
> a solid foundation for a native integration between Spark 2.0 and Beam.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8470) Create a new Spark runner based on Spark Structured streaming framework

2019-10-24 Thread Etienne Chauchot (Jira)
Etienne Chauchot created BEAM-8470:
--

 Summary: Create a new Spark runner based on Spark Structured 
streaming framework
 Key: BEAM-8470
 URL: https://issues.apache.org/jira/browse/BEAM-8470
 Project: Beam
  Issue Type: Improvement
  Components: runner-spark
Reporter: Etienne Chauchot
Assignee: Etienne Chauchot


h1. Why is it worth creating a new runner based on structured streaming:

Because this new framework brings:
 * Unified batch and streaming semantics:
 ** no more RDD/DStream distinction, as in Beam (only PCollection)
 * Better state management:
 ** incremental state instead of saving everything each time
 ** no more synchronous saving delaying computation: per-batch and per-partition 
delta files saved asynchronously + in-memory hashmap with synchronous put/get
 * Schemas in datasets:
 ** the dataset knows the structure of the data (fields) and can optimize later on
 ** Beam has schemas in PCollections as well
 * New Source API:
 ** very close to Beam bounded and unbounded sources

h1. Why make a new runner from scratch?
 * The Structured Streaming framework is very different from the RDD/DStream 
framework

h1. We hope to gain
 * A more up-to-date runner in terms of libraries: leverage new features
 * Leverage practices learnt from the previous runners
 * Better performance thanks to the DAG optimizer (Catalyst) and to a simplified 
code base
 * Simpler code that is easier to maintain

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-7184) Performance regression on spark-runner

2019-10-09 Thread Etienne Chauchot (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Etienne Chauchot closed BEAM-7184.
--
Fix Version/s: Not applicable
   Resolution: Cannot Reproduce

> Performance regression on spark-runner
> --
>
> Key: BEAM-7184
> URL: https://issues.apache.org/jira/browse/BEAM-7184
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
> Fix For: Not applicable
>
>
> There is a performance degradation of +200% in the Spark runner starting on 04/10 
> for all the Nexmark queries. See 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7184) Performance regression on spark-runner

2019-10-09 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16947644#comment-16947644
 ] 

Etienne Chauchot commented on BEAM-7184:


As it has not shown up since April, I guess we can close it for now; if a 
similar perf drop arises again, we can still reopen.

> Performance regression on spark-runner
> --
>
> Key: BEAM-7184
> URL: https://issues.apache.org/jira/browse/BEAM-7184
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> There is a performance degradation of +200% in the Spark runner starting on 04/10 
> for all the Nexmark queries. See 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-7184) Performance regression on spark-runner

2019-10-08 Thread Etienne Chauchot (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16946820#comment-16946820
 ] 

Etienne Chauchot commented on BEAM-7184:


Do we know what was causing the perf drop?

> Performance regression on spark-runner
> --
>
> Key: BEAM-7184
> URL: https://issues.apache.org/jira/browse/BEAM-7184
> Project: Beam
>  Issue Type: Bug
>  Components: runner-spark
>Reporter: Etienne Chauchot
>Priority: Major
>
> There is a performance degradation of +200% in the Spark runner starting on 04/10 
> for all the Nexmark queries. See 
> https://apache-beam-testing.appspot.com/explore?dashboard=5138380291571712



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

