[jira] [Created] (BEAM-9559) Remove smoke load test for Java and Python SDK

2020-03-19 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-9559:
---

 Summary: Remove smoke load test for Java and Python SDK
 Key: BEAM-9559
 URL: https://issues.apache.org/jira/browse/BEAM-9559
 Project: Beam
  Issue Type: Wish
  Components: testing
Reporter: Lukasz Gajowy
Assignee: Michał Walenia


As discussed in PR: 
[https://github.com/apache/beam/pull/11135#discussion_r392852028] 

No one seems to use them, and the regular load tests will fail too whenever 
something is wrong. There are plenty of load tests now, and they are smaller 
than they were when the smoke tests were created, so if something is wrong they 
will give feedback quite quickly. All of that makes the smoke tests redundant, imo. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8938) Tests end up leaving stale Dataflow jobs in apache-beam-testing project and exhaust GCP resources

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8938:
---

Assignee: Kamil Wasilewski

> Tests end up leaving stale Dataflow jobs in apache-beam-testing project and 
> exhaust GCP resources
> -
>
> Key: BEAM-8938
> URL: https://issues.apache.org/jira/browse/BEAM-8938
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Kamil Wasilewski
>Priority: Blocker
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Some tests do not seem to be killed and keep consuming our resources. I'm not 
> sure this list is exhaustive, but these appear in the Dataflow console 
> repeatedly: 
>   - [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487] (spotted multiple times in the Dataflow console) (Python SDK)
>   - [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596] (Python SDK)
>   - [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158] (Java SDK)
>   - [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125] (Java SDK)
>
> A temporary solution is to ignore them. A real solution requires further 
> investigation.
> Please see the devlist thread for more context: 
> [https://lists.apache.org/thread.html/01eb33ae9c05d12bb0698f91adc0021662fdfe2978cfdfde28dc56b2%40%3Cdev.beam.apache.org%3E]





[jira] [Commented] (BEAM-8938) Tests end up leaving stale Dataflow jobs in apache-beam-testing project and exhaust GCP resources

2019-12-19 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000161#comment-17000161
 ] 

Lukasz Gajowy commented on BEAM-8938:
-

[~kamilwu] I assigned you because this requires some care. Feel free to 
unassign/reassign.






[jira] [Commented] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-19 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000158#comment-17000158
 ] 

Lukasz Gajowy commented on BEAM-8424:
-

There are still problems with Java11 tests (timeouting)

> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> These jobs take longer than the currently set timeout (3h). 
>  
> EDIT: currently, after reopening the issue, the timeout is set to 4.5h. 





[jira] [Assigned] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8424:
---

Assignee: Michal Walenia






[jira] [Comment Edited] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-19 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000158#comment-17000158
 ] 

Lukasz Gajowy edited comment on BEAM-8424 at 12/19/19 4:00 PM:
---

There are still problems with Java11 tests (timeouting)


was (Author: łukaszg):
There are still problem with Java11 tests (timeouting)






[jira] [Closed] (BEAM-7368) Run Python GBK load tests on portable Flink runner

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-7368.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Run Python GBK load tests on portable Flink runner
> --
>
> Key: BEAM-7368
> URL: https://issues.apache.org/jira/browse/BEAM-7368
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (BEAM-7245) Encapsulate supplier, monitor and metric naming logic in some common TestMetric type

2019-12-19 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000153#comment-17000153
 ] 

Lukasz Gajowy commented on BEAM-7245:
-

[~mwalenia] is this ticket finished?

> Encapsulate supplier, monitor and metric naming logic in some common 
> TestMetric type 
> -
>
> Key: BEAM-7245
> URL: https://issues.apache.org/jira/browse/BEAM-7245
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> After an offline discussion with @mwalenia, we decided to create a concrete 
> class for each metric type (Item_count, byte_count, time). Each such class 
> will contain: 
> - metric name
> - supplier for the metric 
> - monitor for the metric
> It turns out that all of this (the name along with the monitor and supplier) 
> can be encapsulated and then attached to the pipeline/metrics reading where 
> needed. This will also encapsulate the naming logic (so that there are no 
> typos again).
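
The encapsulation described above could look roughly like the following. This is a minimal Python sketch, not Beam's actual test-utils code: `MetricDefinition`, `MetricStore` and the three subclasses are hypothetical names, a plain dict stands in for the real metrics backend, and the way each metric type reduces its values (count, sum, max minus min) is an assumption made for illustration.

```python
from typing import Callable, Dict, List

# Stand-in for a metrics backend: metric name -> recorded values (assumption).
MetricStore = Dict[str, List[float]]


class MetricDefinition:
    """Hypothetical base type owning the name, the monitor and the supplier."""

    name = "unnamed"  # each concrete metric type fixes its own name once

    def monitor(self, store: MetricStore) -> Callable[[float], None]:
        """Return a recorder to attach wherever the pipeline observes values."""
        def record(value: float) -> None:
            store.setdefault(self.name, []).append(value)
        return record

    def supply(self, store: MetricStore) -> float:
        """Read the metric back; subclasses decide how values are reduced."""
        raise NotImplementedError


class ItemCount(MetricDefinition):
    name = "item_count"

    def supply(self, store: MetricStore) -> float:
        # One recorded value per item, so the count is the number of values.
        return float(len(store.get(self.name, [])))


class ByteCount(MetricDefinition):
    name = "byte_count"

    def supply(self, store: MetricStore) -> float:
        return float(sum(store.get(self.name, [])))


class Time(MetricDefinition):
    name = "time"

    def supply(self, store: MetricStore) -> float:
        # Elapsed time between the earliest and latest recorded timestamps.
        values = store.get(self.name, [])
        return max(values) - min(values) if values else 0.0
```

Because the writer (`monitor`) and the reader (`supply`) both take the name from the same class attribute, the metric name is spelled in exactly one place.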





[jira] [Comment Edited] (BEAM-7245) Encapsulate supplier, monitor and metric naming logic in some common TestMetric type

2019-12-19 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000153#comment-17000153
 ] 

Lukasz Gajowy edited comment on BEAM-7245 at 12/19/19 3:58 PM:
---

[~mwalenia] should we close this ticket?


was (Author: łukaszg):
[~mwalenia] is this ticket finished?






[jira] [Commented] (BEAM-7115) TFRecordIOIT write_time metrics are always 0.0

2019-12-19 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000151#comment-17000151
 ] 

Lukasz Gajowy commented on BEAM-7115:
-

[~pawel.pasterz] do you think you could take a look?

> TFRecordIOIT write_time metrics are always 0.0
> ---
>
> Key: BEAM-7115
> URL: https://issues.apache.org/jira/browse/BEAM-7115
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>
> Is it because the test is so small? Or is the metric not collected correctly?
> This is visible in the dashboards: 
> [https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688] 
> (look for TFRecordIOIT widget)





[jira] [Commented] (BEAM-6969) Provide way to collect start/end read/write time inside the IOs

2019-12-19 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000150#comment-17000150
 ] 

Lukasz Gajowy commented on BEAM-6969:
-

[~pawel.pasterz] [~mwalenia] you might find this (historical) ticket 
interesting... :)

> Provide way to collect start/end read/write time inside the IOs
> ---
>
> Key: BEAM-6969
> URL: https://issues.apache.org/jira/browse/BEAM-6969
> Project: Beam
>  Issue Type: Wish
>  Components: io-ideas, testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> Currently, IO tests measure time using the Metrics API but collect start/end 
> times from ParDo transforms that are adjacent to the IO. That's fine for some 
> tests but could probably be done better. The drawback of the current solution 
> is that we cannot collect time before PBegin and after PDone. In addition, the 
> time we collect now is still not the exact time of read/write start/end but 
> only the time at which the first/last record appeared in the DoFn.
> See: 
> [TimeMonitor.java|https://github.com/apache/beam/blob/957b7cc7746aa626d2eb4dea341f668ec19d5d39/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/TimeMonitor.java]
>  as an example of such DoFn.
> Possible solution: save metrics in startBundle / finishBundle method in IOs 
> whenever a dedicated pipelineOption is set to true. 
> In general, maybe it's a good idea to place some other metrics inside IOs 
> too? wdyt?
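
To make the difference concrete, here is a small Python sketch under stated assumptions. It does not use Beam: `TimeRange`, `run_bundle` and the `collect_bundle_times` flag are hypothetical stand-ins for a time monitor, a bundle's lifecycle and the dedicated pipeline option proposed above. It only shows that timestamps taken per record miss whatever happens before the first record and after the last one, while start/finish hooks would capture it.

```python
from typing import Callable, Iterable, Optional


class TimeRange:
    """Tracks the earliest and latest timestamp reported so far."""

    def __init__(self) -> None:
        self.start: Optional[float] = None
        self.end: Optional[float] = None

    def update(self, timestamp: float) -> None:
        self.start = timestamp if self.start is None else min(self.start, timestamp)
        self.end = timestamp if self.end is None else max(self.end, timestamp)

    def duration(self) -> float:
        if self.start is None or self.end is None:
            return 0.0
        return self.end - self.start


def run_bundle(
    records: Iterable[object],
    clock: Callable[[], float],
    element_range: TimeRange,
    bundle_range: TimeRange,
    collect_bundle_times: bool,  # hypothetical option, off by default in practice
) -> None:
    """Simulate one bundle and record both kinds of timestamps."""
    if collect_bundle_times:
        bundle_range.update(clock())  # startBundle-style hook
    for _ in records:
        element_range.update(clock())  # what an adjacent ParDo can observe
    if collect_bundle_times:
        bundle_range.update(clock())  # finishBundle-style hook
```

With a fake clock that returns 0, 2, 3, 4 and 9 for a three-record bundle, the per-record range covers only 2 to 4, while the bundle-level range covers 0 to 9.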





[jira] [Resolved] (BEAM-6448) Enable SDK log messages in test log output

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-6448.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Enable SDK log messages in test log output 
> ---
>
> Key: BEAM-6448
> URL: https://issues.apache.org/jira/browse/BEAM-6448
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>
> Dataflow test logs on Jenkins do not contain SDK log messages. Enabling them 
> would make debugging easier.   
> See this 
> [suggestion|https://github.com/apache/beam/pull/7497#pullrequestreview-192245735]
>  for reference.





[jira] [Closed] (BEAM-6408) beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-6408.
---
Fix Version/s: Not applicable
   Resolution: Resolved

> beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts
> --
>
> Key: BEAM-6408
> URL: https://issues.apache.org/jira/browse/BEAM-6408
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This job starts a load test that lasts 4 hours on Dataflow (on 10 workers). 
> It fails due to the Jenkins timeout. 





[jira] [Closed] (BEAM-6349) Exceptions (IllegalArgumentException or NoClassDefFoundError) when running tests on Dataflow runner

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-6349.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Exceptions (IllegalArgumentException or NoClassDefFoundError) when running 
> tests on Dataflow runner
> ---
>
> Key: BEAM-6349
> URL: https://issues.apache.org/jira/browse/BEAM-6349
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Craig Chambers
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Running GroupByKeyLoadTest results in the following error on Dataflow runner:
>  
> {code:java}
> java.lang.ExceptionInInitializerError
>   at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:344)
>   at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$2.typedApply(IntrinsicMapTaskExecutorFactory.java:338)
>   at org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:63)
>   at org.apache.beam.runners.dataflow.worker.graph.Networks$TypeSafeNodeFunction.apply(Networks.java:50)
>   at org.apache.beam.runners.dataflow.worker.graph.Networks.replaceDirectedNetworkNodes(Networks.java:87)
>   at org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory.create(IntrinsicMapTaskExecutorFactory.java:120)
>   at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.doWork(BatchDataflowWorker.java:337)
>   at org.apache.beam.runners.dataflow.worker.BatchDataflowWorker.getAndPerformWork(BatchDataflowWorker.java:291)
>   at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.doWork(DataflowBatchWorkerHarness.java:135)
>   at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:115)
>   at org.apache.beam.runners.dataflow.worker.DataflowBatchWorkerHarness$WorkerThread.call(DataflowBatchWorkerHarness.java:102)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.IllegalArgumentException: Multiple entries with same key: kind:varint=org.apache.beam.runners.dataflow.util.CloudObjectTranslators$8@39b69c48 and kind:varint=org.apache.beam.runners.dataflow.worker.RunnerHarnessCoderCloudObjectTranslatorRegistrar$1@7966f294
>   at org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap.checkNoConflict(ImmutableMap.java:136)
>   at org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.checkNoConflictInKeyBucket(RegularImmutableMap.java:100)
>   at org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.RegularImmutableMap.fromEntryArray(RegularImmutableMap.java:86)
>   at org.apache.beam.repackaged.beam_runners_google_cloud_dataflow_java.com.google.common.collect.ImmutableMap$Builder.build(ImmutableMap.java:300)
>   at org.apache.beam.runners.dataflow.util.CloudObjects.populateCloudObjectTranslators(CloudObjects.java:60)
>   at org.apache.beam.runners.dataflow.util.CloudObjects.<clinit>(CloudObjects.java:39)
>   ... 15 more
> {code}
>  
> Example command to run the tests (FWIW, it also runs the  "clean" task 
> although I don't know if it's necessary):
> {code:java}
> ./gradlew clean :beam-sdks-java-load-tests:run --info 
> -PloadTest.mainClass=org.apache.beam.sdk.loadtests.GroupByKeyLoadTest 
> -Prunner=:beam-runners-google-cloud-dataflow-java 
> '-PloadTest.args=--sourceOptions={"numRecords":1000,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}}
>  
> --stepOptions={"outputRecordsPerInputRecord":1,"preservesInputKeyDistribution":true,"perBundleDelay":1,"perBundleDelayType":"MIXED","cpuUtilizationInMixedDelay":0.5}
>  --fanout=1 --iterations=1 --runner=DataflowRunner'{code}
>  
> After reverting commit bac909b8e237ef8a2ab7e17ac986e5cc90143e5b ([PR: 
> 7351|https://github.com/apache/beam/pull/7351]) I can no longer reproduce 
> this issue.
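
The "Multiple entries with same key" message in the trace comes from a map builder that refuses duplicate keys. The following Python sketch illustrates that failure mode only; it is not the Dataflow worker's code. `merge_translators` and the registrar names are hypothetical, and plain strings stand in for translator objects.

```python
from typing import Dict, Iterable, Tuple


def merge_translators(
    registrars: Iterable[Tuple[str, Dict[str, str]]],
) -> Dict[str, str]:
    """Merge the entries of all registrars, rejecting duplicate keys.

    Each registrar is a (name, {key: translator}) pair. Like a strict
    immutable-map builder, the merge fails as soon as two registrars
    contribute the same key instead of silently overwriting one of them.
    """
    merged: Dict[str, str] = {}
    owners: Dict[str, str] = {}
    for registrar_name, translators in registrars:
        for key, translator in translators.items():
            if key in merged:
                raise ValueError(
                    f"Multiple entries with same key: {key} "
                    f"(from {owners[key]} and {registrar_name})"
                )
            merged[key] = translator
            owners[key] = registrar_name
    return merged
```

If two registrars on the classpath both provide a translator for `kind:varint`, as the trace suggests, building the combined map fails during class initialization, which would explain the `ExceptionInInitializerError` wrapper.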




[jira] [Closed] (BEAM-4367) Phrase-triggered job (IOIT) never stopped running.

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-4367.
---
Fix Version/s: Not applicable
   Resolution: Cannot Reproduce

> Phrase-triggered job (IOIT) never stopped running.
> --
>
> Key: BEAM-4367
> URL: https://issues.apache.org/jira/browse/BEAM-4367
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>
> +Steps to reproduce:+ 
> 1. Define a new Jenkins job and enable both phrase triggering and timer 
> triggering (cron job) for it.
> 2. Type "Run seed job" in a pull request comment to trigger the Jenkins seed 
> job on the PR.
> 3. Type the phrase to trigger the job (e.g. "Run Java ParquetIO Performance 
> Test").
> 4. Run the seed job again (from the master branch).
> +Expected result:+
> The job gets triggered only a few times (once would be best). It never 
> triggers from cron after the seed job from the master branch is run, because 
> there's no job definition on master.
> +Actual result:+
> The job never stops triggering, even though the seed job from the master 
> branch was run many times.
> This happened while developing ParquetIO. The ParquetIO IT kept running 15 
> more times (as of the time of writing this issue) even though the seed job 
> from master should have canceled it. Is it due to the fact that the seed job 
> (from step 2) failed?
> +See:+
> [https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_ParquetIOIT/]
> [https://builds.apache.org/view/A-D/view/Beam/job/beam_SeedJob/] 
> [https://builds.apache.org/view/A-D/view/Beam/job/beam_SeedJob/1725/console] 





[jira] [Closed] (BEAM-3561) Provide kubernetes cluster instance for IOITs.

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-3561.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Provide kubernetes cluster instance for IOITs.
> --
>
> Key: BEAM-3561
> URL: https://issues.apache.org/jira/browse/BEAM-3561
> Project: Beam
>  Issue Type: Task
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>
> Performance tests that require running Kubernetes scripts currently cannot be 
> run on Jenkins. This is because there is no dedicated Kubernetes cluster for 
> them, so Jenkins jobs cannot set up the needed infrastructure anywhere.
> To allow running such tests, we should provide a Kubernetes cluster instance 
> (for example, a cluster hosted on GKE) and all credentials needed to connect 
> to it from Jenkins executors (a proper kubeconfig file on all Jenkins 
> executors). 





[jira] [Resolved] (BEAM-4041) Performance tests fail due to kubernetes load balancer problems

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-4041.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Performance tests fail due to kubernetes load balancer problems
> ---
>
> Key: BEAM-4041
> URL: https://issues.apache.org/jira/browse/BEAM-4041
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Recently, as we added more IOITs to be run on Jenkins using Kubernetes, some 
> of them started to fail randomly because they couldn't retrieve the 
> LoadBalancer address. Normally, obtaining the address took about one minute. 
> Perfkit waits for the address (actively checking for it) for 3 minutes. This 
> should be enough to get the address, yet it recently started to exceed the 
> 3-minute limit. I also noticed that this error didn't happen when there were 
> fewer tests.
> Example logs:
> https://builds.apache.org/view/A-D/view/Beam/job/beam_PerformanceTests_Compressed_TextIOIT_HDFS/31/console
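
The waiting behaviour described above (active polling with a fixed time limit) can be sketched as follows. This is not Perfkit's implementation: `wait_for_address` and its parameters are hypothetical, the 180-second default mirrors the 3-minute limit mentioned in the issue, and the 5-second poll interval is an assumption.

```python
import time
from typing import Callable, Optional


def wait_for_address(
    get_address: Callable[[], Optional[str]],
    timeout_s: float = 180.0,       # 3 minutes, as in the issue description
    poll_interval_s: float = 5.0,   # assumed value, not taken from Perfkit
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_address until it returns a value or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        address = get_address()
        if address:
            return address
        if clock() >= deadline:
            raise TimeoutError(
                f"no LoadBalancer address after {timeout_s:.0f} seconds"
            )
        sleep(poll_interval_s)
```

When address assignment slows down under load, raising `timeout_s` is the obvious knob, at the cost of slower failure when the address never arrives.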





[jira] [Assigned] (BEAM-8940) Load dataflow jobs using java 11 in Java 11 Dataflow tests

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8940:
---

Assignee: Michal Walenia

> Load dataflow jobs using java 11 in Java 11 Dataflow tests
> --
>
> Key: BEAM-8940
> URL: https://issues.apache.org/jira/browse/BEAM-8940
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Major
>
> Currently, Java 11 tests use only java11 docker worker image for verifying 
> Java 11 compatibility. Everything else (artifact staging and job startup) is 
> done using Java 8. It should be done with java11 as well - this is how users 
> will do it. 





[jira] [Assigned] (BEAM-4420) Add KafkaIO Integration Tests

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4420:
---

Assignee: (was: Lukasz Gajowy)

> Add KafkaIO Integration Tests
> -
>
> Key: BEAM-4420
> URL: https://issues.apache.org/jira/browse/BEAM-4420
> Project: Beam
>  Issue Type: Test
>  Components: io-java-kafka, testing
>Reporter: Ismaël Mejía
>Priority: Minor
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> It is a good idea to have ITs for KafkaIO.
> There are two possible issues:
> 1. The tests should probably invert the pattern to be readThenWrite given 
> that Unbounded IOs block on Read and ...
> 2. Until we have a way to do PAsserts on Unbounded sources we can rely on 
> withMaxNumRecords to ensure this test ends.
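
The idea behind point 2 is that capping the number of records turns an unbounded read into one that terminates. The Python sketch below shows only that idea and is not KafkaIO code: `unbounded_source` and `read_with_max_num_records` are hypothetical names, and a generator stands in for an unbounded source.

```python
import itertools
from typing import Iterable, Iterator, List


def unbounded_source() -> Iterator[int]:
    """Stand-in for an unbounded source: yields records forever."""
    n = 0
    while True:
        yield n
        n += 1


def read_with_max_num_records(source: Iterable[int], max_num_records: int) -> List[int]:
    """Stop reading after max_num_records so the test can finish."""
    return list(itertools.islice(source, max_num_records))
```

A test that writes a known number of records and then reads with the same cap can compare the two sets without waiting on the source indefinitely.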





[jira] [Assigned] (BEAM-8319) Errorprone 0.0.13 fails during JDK11 build

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8319:
---

Assignee: (was: Lukasz Gajowy)

> Errorprone 0.0.13 fails during JDK11 build
> --
>
> Key: BEAM-8319
> URL: https://issues.apache.org/jira/browse/BEAM-8319
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I'm using openjdk 1.11.02. After switching the version to:
> {code:java}
> javaVersion = 11 {code}
> in BeamModule Plugin and running
> {code:java}
> ./gradlew clean build -p sdks/java/code -xtest {code}
> building fails. I was able to run errorprone after upgrading it but had 
> problems with conflicting guava version. See more here: 
> https://issues.apache.org/jira/browse/BEAM-5085
>  
> Stacktrace:
> {code:java}
> org.gradle.api.tasks.TaskExecutionException: Execution failed for task ':model:pipeline:compileJava'.
>   at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:121)
>   at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter$2.accept(ExecuteActionsTaskExecuter.java:117)
>   at org.gradle.internal.Try$Failure.ifSuccessfulOrElse(Try.java:184)
>   at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.execute(ExecuteActionsTaskExecuter.java:110)
>   at org.gradle.api.internal.tasks.execution.ResolveIncrementalChangesTaskExecuter.execute(ResolveIncrementalChangesTaskExecuter.java:84)
>   at org.gradle.api.internal.tasks.execution.ResolveTaskOutputCachingStateExecuter.execute(ResolveTaskOutputCachingStateExecuter.java:91)
>   at org.gradle.api.internal.tasks.execution.FinishSnapshotTaskInputsBuildOperationTaskExecuter.execute(FinishSnapshotTaskInputsBuildOperationTaskExecuter.java:51)
>   at org.gradle.api.internal.tasks.execution.ResolveBuildCacheKeyExecuter.execute(ResolveBuildCacheKeyExecuter.java:102)
>   at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionStateTaskExecuter.execute(ResolveBeforeExecutionStateTaskExecuter.java:74)
>   at org.gradle.api.internal.tasks.execution.ValidatingTaskExecuter.execute(ValidatingTaskExecuter.java:58)
>   at org.gradle.api.internal.tasks.execution.SkipEmptySourceFilesTaskExecuter.execute(SkipEmptySourceFilesTaskExecuter.java:109)
>   at org.gradle.api.internal.tasks.execution.ResolveBeforeExecutionOutputsTaskExecuter.execute(ResolveBeforeExecutionOutputsTaskExecuter.java:67)
>   at org.gradle.api.internal.tasks.execution.StartSnapshotTaskInputsBuildOperationTaskExecuter.execute(StartSnapshotTaskInputsBuildOperationTaskExecuter.java:52)
>   at org.gradle.api.internal.tasks.execution.ResolveAfterPreviousExecutionStateTaskExecuter.execute(ResolveAfterPreviousExecutionStateTaskExecuter.java:46)
>   at org.gradle.api.internal.tasks.execution.CleanupStaleOutputsExecuter.execute(CleanupStaleOutputsExecuter.java:93)
>   at org.gradle.api.internal.tasks.execution.FinalizePropertiesTaskExecuter.execute(FinalizePropertiesTaskExecuter.java:45)
>   at org.gradle.api.internal.tasks.execution.ResolveTaskExecutionModeExecuter.execute(ResolveTaskExecutionModeExecuter.java:94)
>   at org.gradle.api.internal.tasks.execution.SkipTaskWithNoActionsExecuter.execute(SkipTaskWithNoActionsExecuter.java:57)
>   at org.gradle.api.internal.tasks.execution.SkipOnlyIfTaskExecuter.execute(SkipOnlyIfTaskExecuter.java:56)
>   at org.gradle.api.internal.tasks.execution.CatchExceptionTaskExecuter.execute(CatchExceptionTaskExecuter.java:36)
>   at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.executeTask(EventFiringTaskExecuter.java:63)
>   at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:49)
>   at org.gradle.api.internal.tasks.execution.EventFiringTaskExecuter$1.call(EventFiringTaskExecuter.java:46)
>   at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:416)
>   at org.gradle.internal.operations.DefaultBuildOperationExecutor$CallableBuildOperationWorker.execute(DefaultBuildOperationExecutor.java:406)
>   at org.gradle.internal.operations.DefaultBuildOperationExecutor$1.execute(DefaultBuildOperationExecutor.java:165)
>   at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:250)
>   at org.gradle.internal.operations.DefaultBuildOperationExecutor.execute(DefaultBuildOperationExecutor.java:158)
> 

[jira] [Assigned] (BEAM-8940) Load dataflow jobs using java 11 in Java 11 Dataflow tests

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8940:
---

Assignee: (was: Lukasz Gajowy)

> Load dataflow jobs using java 11 in Java 11 Dataflow tests
> --
>
> Key: BEAM-8940
> URL: https://issues.apache.org/jira/browse/BEAM-8940
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>
> Currently, Java 11 tests use only the Java 11 Docker worker image to verify 
> Java 11 compatibility. Everything else (artifact staging and job startup) is 
> done using Java 8. It should be done with Java 11 as well, since this is how 
> users will do it. 





[jira] [Assigned] (BEAM-8559) Run Dataflow Nexmark suites with Java 11

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8559:
---

Assignee: Michal Walenia  (was: Lukasz Gajowy)

> Run Dataflow Nexmark suites with Java 11
> 
>
> Key: BEAM-8559
> URL: https://issues.apache.org/jira/browse/BEAM-8559
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing-nexmark
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Minor
> Fix For: Not applicable
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> This task is similar to https://issues.apache.org/jira/browse/BEAM-6936.
> The goal is to compile the Nexmark suites with Java 8 but run them with 
> Java 11 to verify compatibility. 





[jira] [Resolved] (BEAM-8003) Remove all mentions of PKB on Confluence / website docs

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-8003.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Remove all mentions of PKB on Confluence / website docs
> ---
>
> Key: BEAM-8003
> URL: https://issues.apache.org/jira/browse/BEAM-8003
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing, website
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (BEAM-7658) Synthetic unbounded source loses (duplicates?) data while splitting

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-7658:
---

Assignee: (was: Lukasz Gajowy)

> Synthetic unbounded source loses (duplicates?) data while splitting
> 
>
> Key: BEAM-7658
> URL: https://issues.apache.org/jira/browse/BEAM-7658
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>
> This came up while creating KafkaIOIT, which ingests data generated using 
> SyntheticUnboundedSource. The hashcode of the data, computed by
> {code:java}
> .apply("Calculate hashcode", Combine.globally(new HashingFn()).withoutDefaults()){code}
> was different on every run for 1000 records. When the number of splits was 
> set to 1 (sourceOptions.forceNumInitialBundles), the problem disappeared. 
>   
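
The consistency check described in this ticket can be illustrated with a small
stand-alone sketch. It does not use Beam or its HashingFn (whose internals are
not shown in this thread); the class SplitHashSketch, its methods, and the
"record-N" data are hypothetical. The idea is only that a combined hash built
with a commutative, associative operation should not depend on how the records
are split into bundles, so a hash that changes with the number of splits points
at lost or duplicated records rather than at the hashing itself.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration, not Beam code.
final class SplitHashSketch {

  // Hash a single record. CRC32 is only a stand-in for a real per-record hash.
  static long recordHash(String record) {
    CRC32 crc = new CRC32();
    crc.update(record.getBytes(StandardCharsets.UTF_8));
    return crc.getValue();
  }

  // Combine the hashes of one bundle by addition (commutative and associative).
  static long bundleHash(List<String> bundle) {
    long sum = 0;
    for (String record : bundle) {
      sum += recordHash(record);
    }
    return sum;
  }

  // Generate records 0..numRecords-1, distribute them round-robin over
  // numSplits bundles, and combine the per-bundle hashes.
  static long hashWithSplits(int numRecords, int numSplits) {
    List<List<String>> bundles = new ArrayList<>();
    for (int i = 0; i < numSplits; i++) {
      bundles.add(new ArrayList<>());
    }
    for (int i = 0; i < numRecords; i++) {
      bundles.get(i % numSplits).add("record-" + i);
    }
    long total = 0;
    for (List<String> bundle : bundles) {
      total += bundleHash(bundle);
    }
    return total;
  }
}
```

With a correct source, hashWithSplits(1000, 1) and hashWithSplits(1000, 100)
return the same value, so a mismatch between the two configurations would mean
that splitting changed the set of records.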





[jira] [Assigned] (BEAM-7786) Clean up docker images in GCR after test on Portable Flink runner

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-7786:
---

Assignee: (was: Lukasz Gajowy)

> Clean up docker images in GCR after test on Portable Flink runner
> -
>
> Key: BEAM-7786
> URL: https://issues.apache.org/jira/browse/BEAM-7786
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Kasia Kucharczyk
>Priority: Major
>
> It would be useful to set a TTL on Docker images saved in GCR while running 
> Portable Flink tests. If a TTL is not possible, the solution would be to 
> clean up the images in test teardown.





[jira] [Assigned] (BEAM-5980) Add load tests for Core Apache Beam operations

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-5980:
---

Assignee: (was: Lukasz Gajowy)

> Add load tests for Core Apache Beam operations 
> ---
>
> Key: BEAM-5980
> URL: https://issues.apache.org/jira/browse/BEAM-5980
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This involves adding a suite of load tests described in this proposal: 
> [https://s.apache.org/load-test-basic-operations]





[jira] [Assigned] (BEAM-8266) Run IOITs on Flink Runner

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8266:
---

Assignee: (was: Lukasz Gajowy)

> Run IOITs on Flink Runner
> -
>
> Key: BEAM-8266
> URL: https://issues.apache.org/jira/browse/BEAM-8266
> Project: Beam
>  Issue Type: New Feature
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> IO integration tests currently run only on Dataflow and Direct runners. There 
> are Jenkins jobs only for the Dataflow runner. The goal of this ticket is to 
> create Jenkins job(s) with IOITs running on the Apache Flink runner.





[jira] [Assigned] (BEAM-7659) Create IO tests for synthetic sources

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-7659:
---

Assignee: (was: Lukasz Gajowy)

> Create IO tests for synthetic sources
> -
>
> Key: BEAM-7659
> URL: https://issues.apache.org/jira/browse/BEAM-7659
> Project: Beam
>  Issue Type: Wish
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>
> Synthetic sources are not tested thoroughly. It would be good to test the 
> consistency of generated data (IOIT tests?) in different configurations (e.g. 
> multiple splits).
>  
>  





[jira] [Assigned] (BEAM-7368) Run Python GBK load tests on portable Flink runner

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-7368:
---

Assignee: (was: Lukasz Gajowy)

> Run Python GBK load tests on portable Flink runner
> --
>
> Key: BEAM-7368
> URL: https://issues.apache.org/jira/browse/BEAM-7368
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 11h 10m
>  Remaining Estimate: 0h
>






[jira] [Assigned] (BEAM-7772) Stop using Perfkit Benchmarker tool in all tests

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-7772:
---

Assignee: (was: Lukasz Gajowy)

> Stop using Perfkit Benchmarker tool in all tests
> 
>
> Key: BEAM-7772
> URL: https://issues.apache.org/jira/browse/BEAM-7772
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 6h 50m
>  Remaining Estimate: 0h
>
> [Devlist thread 
> link|https://lists.apache.org/thread.html/dab1c093799248787e8b75e63b66d7389b594b649a4d9a4a5db1cfbb@%3Cdev.beam.apache.org%3E]
>  
> Currently Python, IOIT and some Dataflow and Spark performance tests rely on 
> the Perfkit Benchmarker tool. For the reasons discussed on the devlist, it 
> was decided to remove it from Beam's tests. 
> Problems that we face currently:
>  # Changes to Gradle tasks/build configuration in the Beam codebase have to 
> be reflected in Perfkit code. This requires PRs to Perfkit, which can take a 
> long time, and the tests sometimes break in the meantime (no change in 
> Perfkit + change already there in Beam = incompatibility). This is what 
> happened in PR 8919 (above),
>  # It can't run on Python 3 (it depends on Python 2-only libraries like 
> functools32),
>  # It is black-box testing, which makes it hard to collect pipeline-related 
> metrics,
>  # Measurement of run time is inaccurate,
>  # It offers relatively little flexibility compared with e.g. Jenkins tasks 
> in terms of setting up the testing infrastructure (runners, databases). For 
> example, if we'd like to set up a Flink runner and reuse it in subsequent 
> tests in one go, that would be impossible. We can easily do this in Jenkins.
> Tests that use Perfkit:
>  # IO integration tests,
>  # Python performance tests,
>  # beam_PerformanceTests_Dataflow (disabled),
>  # beam_PerformanceTests_Spark (failing constantly, appears unmaintained).
>  





[jira] [Assigned] (BEAM-6394) Support for writing protobuf data to parquet

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-6394:
---

Assignee: (was: Lukasz Gajowy)

> Support for writing protobuf data to parquet
> 
>
> Key: BEAM-6394
> URL: https://issues.apache.org/jira/browse/BEAM-6394
> Project: Beam
>  Issue Type: New Feature
>  Components: io-java-parquet
>Reporter: Jozef Vilcek
>Priority: Major
>
> Parquet infrastructure does support writing protobuf data to parquet. Beam's 
> ParquetIO could give pipeline developers an option to write protobuf data 
> instead of converting them to avro.





[jira] [Resolved] (BEAM-7431) ObjectSizeCalculator causing regressions in load tests and IOITs

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-7431.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> ObjectSizeCalculator causing regressions in load tests and IOITs
> 
>
> Key: BEAM-7431
> URL: https://issues.apache.org/jira/browse/BEAM-7431
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Critical
> Fix For: Not applicable
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> It looks (based on my investigation) like `ObjectSizeCalculator` is causing a 
> regression in load tests of core operations. The runtime increased from ~400s 
> to ~2000s. Calculating object size this way seems to be an expensive 
> operation - we probably could change it to something simpler (or not?). 





[jira] [Assigned] (BEAM-5222) Parquet IO position invalid when writing integers

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-5222:
---

Assignee: (was: Lukasz Gajowy)

> Parquet IO position invalid when writing integers
> -
>
> Key: BEAM-5222
> URL: https://issues.apache.org/jira/browse/BEAM-5222
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-parquet
>Reporter: George Hilios
>Priority: Minor
>
> Please see 
> [https://github.com/apache/beam/blob/master/sdks/java/io/parquet/src/main/java/org/apache/beam/sdk/io/parquet/ParquetIO.java#L345]
> On write(int b), position is incremented by 1. On write(byte[] b, int off, 
> int len) position is incremented by the length of the byte array.
> write(int b) should increment by 4 bytes, if I'm understanding correctly.
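
One point that can be checked against the Java API independently of ParquetIO:
java.io.OutputStream.write(int b) is specified to write a single byte (the
eight low-order bits of b), not a 4-byte integer. The sketch below demonstrates
this with ByteArrayOutputStream; the class name WriteIntSketch is hypothetical,
and whether this settles the ticket depends on how the position counter is used
in ParquetIO, which is not shown in this thread.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration of the OutputStream.write(int) contract.
final class WriteIntSketch {

  // Returns the bytes produced by a single write(int) call.
  static byte[] writeOneInt(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // Only the low 8 bits of 'value' are written; the upper 24 bits are ignored.
    out.write(value);
    return out.toByteArray();
  }
}
```

For example, writeOneInt(0x12345678) yields a one-element array containing
0x78, so a counter that tracks bytes written would advance by 1 for this call.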





[jira] [Assigned] (BEAM-6408) beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-6408:
---

Assignee: (was: Lukasz Gajowy)

> beam_Java_LoadTests_GroupByKey_Dataflow_Small timeouts
> --
>
> Key: BEAM-6408
> URL: https://issues.apache.org/jira/browse/BEAM-6408
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This job starts a load test that lasts 4 hours on Dataflow (on 10 workers). 
> It fails due to the Jenkins timeout. 





[jira] [Assigned] (BEAM-5982) Create Side input load test for Java SDK

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-5982:
---

Assignee: (was: Lukasz Gajowy)

> Create Side input load test for Java SDK
> 
>
> Key: BEAM-5982
> URL: https://issues.apache.org/jira/browse/BEAM-5982
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> This is more thoroughly described in this proposal: 
> [https://docs.google.com/document/d/1PuIQv4v06eosKKwT76u7S6IP88AnXhTf870Rcj1AHt4/edit?usp=sharing]
>  
> In short: this ticket is about implementing the Side Input load test that 
> uses SyntheticStep and Synthetic source to create load on the pipeline. 





[jira] [Resolved] (BEAM-6351) OutOfMemoryError on DirectRunner while running load tests

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-6351.
-
Fix Version/s: Not applicable
   Resolution: Not A Bug

That's actually a normal situation - the local (Direct) runner just runs out of 
resources when trying to process that huge amount of data. Closing.

> OutOfMemoryError on DirectRunner while running load tests 
> --
>
> Key: BEAM-6351
> URL: https://issues.apache.org/jira/browse/BEAM-6351
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kasia Kucharczyk
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The GroupByKey Java load test with the number of records set to 10 fails on 
> DirectRunner with an OutOfMemoryError. The test is then aborted after a timeout.
> This is an [example failing run of the 
> job|https://builds.apache.org/job/beam_Java_LoadTests_GroupByKey_Direct_Small_PR/2/].
> The stack trace follows:
> {code:java}
> Exception in thread "main" java.lang.RuntimeException: 
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 18:02:15  at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:204)
> 18:02:15  at 
> org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:64)
> 18:02:15  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
> 18:02:15  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
> 18:02:15  at org.apache.beam.sdk.loadtests.LoadTest.run(LoadTest.java:75)
> 18:02:15  at 
> org.apache.beam.sdk.loadtests.GroupByKeyLoadTest.run(GroupByKeyLoadTest.java:58)
> 18:02:15  at 
> org.apache.beam.sdk.loadtests.GroupByKeyLoadTest.main(GroupByKeyLoadTest.java:130)
> 18:02:15 Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
> 18:02:15  at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:97)
> 18:02:15  at 
> org.apache.beam.sdk.util.CoderUtils.decodeFromByteArray(CoderUtils.java:92)
> 18:02:15  at 
> org.apache.beam.sdk.util.MutationDetectors$CodedValueMutationDetector.<init>(MutationDetectors.java:117)
> 18:02:15  at 
> org.apache.beam.sdk.util.MutationDetectors.forValueWithCoder(MutationDetectors.java:44)
> 18:02:15  at 
> org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:112)
> 18:02:15  at 
> org.apache.beam.runners.direct.BoundedReadEvaluatorFactory$BoundedReadEvaluator.processElement(BoundedReadEvaluatorFactory.java:151)
> 18:02:15  at 
> org.apache.beam.runners.direct.DirectTransformExecutor.processElements(DirectTransformExecutor.java:160)
> 18:02:15  at 
> org.apache.beam.runners.direct.DirectTransformExecutor.run(DirectTransformExecutor.java:124)
> 18:02:15  at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> 18:02:15  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 18:02:15  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 18:02:15  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 18:02:15  at java.lang.Thread.run(Thread.java:748)
> 19:29:58 Build timed out (after 100 minutes). Marking the build as aborted.
> {code}
> The command to run this test is:
> {code:java}
> gradlew --info  
> -PloadTest.mainClass=org.apache.beam.sdk.loadtests.GroupByKeyLoadTest 
> -Prunner=:beam-runners-direct-java '-PloadTest.args=--publishToBigQuery=false 
> --sourceOptions={"numRecords":10,"splitPointFrequencyRecords":1,"keySizeBytes":1,"valueSizeBytes":9,"numHotKeys":0,"hotKeyFraction":0,"seed":123456,"bundleSizeDistribution":{"type":"const","const":42},"forceNumInitialBundles":100,"progressShape":"LINEAR","initializeDelayDistribution":{"type":"const","const":42}}
>  
> --stepOptions={"outputRecordsPerInputRecord":1,"preservesInputKeyDistribution":true,"perBundleDelay":1,"perBundleDelayType":"MIXED","cpuUtilizationInMixedDelay":0.5}
>  --fanout=10 --iterations=1 --runner=DirectRunner' 
> :beam-sdks-java-load-tests:run
> {code}





[jira] [Assigned] (BEAM-4508) Adapt old IOITs to current IO performance testing standards

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4508:
---

Assignee: (was: Lukasz Gajowy)

> Adapt old IOITs to current IO performance testing standards
> ---
>
> Key: BEAM-4508
> URL: https://issues.apache.org/jira/browse/BEAM-4508
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> There are some IOITs that were created before the current IO testing 
> infrastructure appeared in Beam. We should adapt the old tests to meet 
> current standards. 
> Documentation describing IOIT testing: 
> [https://beam.apache.org/documentation/io/testing/#i-o-transform-integration-tests]
> We should make the old tests coherent with other tests, more specifically: 
>  - write them in writeThenReadAll style
>  - enable running them with Perfkit
>  - provide Jenkins jobs to run them periodically 





[jira] [Assigned] (BEAM-6449) Create PostCommit smoke test suite (besides phrase triggered one)

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-6449:
---

Assignee: (was: Lukasz Gajowy)

> Create PostCommit smoke test suite (besides phrase triggered one)
> -
>
> Key: BEAM-6449
> URL: https://issues.apache.org/jira/browse/BEAM-6449
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>  Labels: triaged
>
> Load tests are very large and consume lots of resources. To avoid 
> unnecessary runs on broken code, we should have a small post-commit 
> variant that validates that everything works. If run post-commit (on 
> every commit), such smoke tests will give us time to fix build/runtime/other 
> errors before the actual load test suites are run. 





[jira] [Assigned] (BEAM-6065) Provide standard configs to run load tests

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-6065:
---

Assignee: (was: Lukasz Gajowy)

> Provide standard configs to run load tests
> --
>
> Key: BEAM-6065
> URL: https://issues.apache.org/jira/browse/BEAM-6065
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> Perhaps a wrapper/builder to run all tests conveniently with a set of 
> parameters.
> We can think of:
> Uniform
> Normal
> Onekey
> Fanout 256 (could be uniform)
> (+ maybe a few hot keys in a larger distribution.)





[jira] [Assigned] (BEAM-6278) Refactor Post Commit nexmark jobs to use the NexmarkBuilder

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-6278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-6278:
---

Assignee: (was: Lukasz Gajowy)

> Refactor Post Commit nexmark jobs to use the NexmarkBuilder
> ---
>
> Key: BEAM-6278
> URL: https://issues.apache.org/jira/browse/BEAM-6278
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>
> PR-triggered Nexmark jobs already use it successfully, so this is simply a 
> reduction of code duplication.





[jira] [Assigned] (BEAM-4383) Enable block size support in ParquetIO

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4383:
---

Assignee: (was: Lukasz Gajowy)

> Enable block size support in ParquetIO
> --
>
> Key: BEAM-4383
> URL: https://issues.apache.org/jira/browse/BEAM-4383
> Project: Beam
>  Issue Type: Improvement
>  Components: io-ideas, io-java-parquet
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> The Parquet API supports configuring block size, which can improve IO 
> performance when working with Parquet files. Currently, ParquetIO does not 
> support it at all, so there is room for improvement in this IO.
> Good intro into this topic: [https://www.dremio.com/tuning-parquet/] 





[jira] [Assigned] (BEAM-5559) Beam Dependency Update Request: com.google.guava:guava

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-5559:
---

Assignee: (was: Lukasz Gajowy)

> Beam Dependency Update Request: com.google.guava:guava
> --
>
> Key: BEAM-5559
> URL: https://issues.apache.org/jira/browse/BEAM-5559
> Project: Beam
>  Issue Type: Sub-task
>  Components: dependencies
>Reporter: Beam JIRA Bot
>Priority: Major
>
>  - 2018-10-01 19:30:53.471497 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2018-10-08 12:18:05.174889 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 26.0-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-15 12:32:27.737694 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-04-22 12:10:18.539470 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 20.0. The latest version is 27.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-12 22:48:00.063941 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 26.0-jre. The latest version is 28.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-11-19 21:06:09.552946 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 26.0-jre. The latest version is 28.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-02 12:11:56.870028 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 26.0-jre. The latest version is 28.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 
>  - 2019-12-09 12:11:09.244912 
> -
> Please consider upgrading the dependency com.google.guava:guava. 
> The current version is 26.0-jre. The latest version is 28.1-jre 
> cc: 
>  Please refer to [Beam Dependency Guide 
> |https://beam.apache.org/contribute/dependencies/]for more information. 
> Do Not Modify The Description Above. 





[jira] [Assigned] (BEAM-4298) Perfkit runs are always "official" while running them with Jenkins

2019-12-19 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4298:
---

Assignee: (was: Lukasz Gajowy)

> Perfkit runs are always "official" while running them with Jenkins
> --
>
> Key: BEAM-4298
> URL: https://issues.apache.org/jira/browse/BEAM-4298
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Major
>  Labels: triaged
>
> +What is "official" flag?+
> "Official" flag is a PerfkitBenchmarker boolean flag that is set in Jenkins. 
> Perfkit uses it to determine whether test results should be treated as 
> official ones. Currently, in the whole Performance Testing Framework it means 
> that such results will be:
>  - displayed in the Performance Testing Dashboards 
> ([https://apache-beam-testing.appspot.com/explore?dashboard=5755685136498688])
>  - used for detecting anomalies in those dashboards and reporting them to the 
> community (this is work in progress at the time of writing)
> +How can Performance tests be run right now?+
> Currently, we have two options for running performance tests:
> - run them periodically (this is done by Jenkins on master branch 4 times a 
> day)
> - trigger them on demand from a Pull Request, for example by typing "Run Java 
> TextIO Performance Test" in a GitHub comment on the PR. For now, they also use 
> master for building the code, which is useless in terms of running them in 
> the PR. They should use the branch's code instead; otherwise it's misleading 
> for the user and error-prone. This issue is addressed here: 
> https://issues.apache.org/jira/browse/BEAM-4140 
> +What is the problem?+ 
> We shouldn't mark on-demand runs as "official", because those are run to test 
> unmerged code. This will pollute the dashboards with unwanted results on the 
> plots. In the future, when anomaly detection is merged, it will discover 
> false-positive anomalies (based on unmerged code). 
> +Proposed solution:+
> The official flag in Jenkins should be set based on "GIT_BRANCH" environment 
> variable (AFAIK it's out of the box in Jenkins). Only results from master 
> should be official. 
>  
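
The proposed rule can be sketched as a small predicate. This is a hypothetical
illustration, not code from Beam's Jenkins configuration: the class and method
names are invented, and the assumption that GIT_BRANCH holds either "master" or
"origin/master" for a master build should be verified against the actual
Jenkins setup.

```java
// Hypothetical illustration of deriving the "official" flag from GIT_BRANCH.
final class OfficialFlagSketch {

  // Returns true only for builds of the master branch.
  // Assumption: the branch value is "master" or "origin/master" on master builds.
  static boolean isOfficial(String gitBranch) {
    if (gitBranch == null) {
      return false; // variable not set: never treat the run as official
    }
    String branch = gitBranch.trim();
    return branch.equals("master") || branch.equals("origin/master");
  }
}
```

A job could call isOfficial(System.getenv("GIT_BRANCH")) and pass the result to
Perfkit, so that runs triggered from pull request branches are never marked as
official.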





[jira] [Comment Edited] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998297#comment-16998297
 ] 

Lukasz Gajowy edited comment on BEAM-5495 at 12/17/19 3:03 PM:
---

[~romain.manni-bucau] well then, it seems that I was not careful enough - 
apologies. If you want to correct my error [?] always feel free to contribute 
your solution improving the one I proposed. 

Regarding your concern - as I said I don't know much about xbean so it's hard 
for me to compare them in terms of quality, "giving guarantee of health" and 
specifically say which one is a better option for the job. All I can say is 
that all the input in the PR was talked about earlier on the 
[devlist|https://lists.apache.org/thread.html/61ae8750b4ed20413c6e93ba949ddd48dd0107a0a039ef518f9d6d21%40%3Cdev.beam.apache.org%3E]
 and classgraph seemed a sensible solution. This, of course, can be discussed 
further.

The bright side is that changing the implementation is as easy as implementing 
the PipelineResourcesDetector interface so it should be a relatively easy thing 
to do. The default algorithm can also be changed easily.


was (Author: łukaszg):
[~romain.manni-bucau] well then, it seems that I was not careful enough - 
apologies. If you want to correct my error(?) always feel free to contribute 
your solution improving the one I proposed. 

Regarding your concern - as I said I don't know much about xbean so it's hard 
for me to compare them in terms of quality, "giving guarantee of health" and 
specifically say which one is a better option for the job. All I can say is 
that all the input in the PR was talked about earlier on the 
[devlist|https://lists.apache.org/thread.html/61ae8750b4ed20413c6e93ba949ddd48dd0107a0a039ef518f9d6d21%40%3Cdev.beam.apache.org%3E]
 and classgraph seemed a sensible solution. This, of course, can be discussed 
further.

The bright side is that changing the implementation is as easy as implementing 
the PipelineResourcesDetector interface so it should be a relatively easy thing 
to do. The default algorithm can also be changed easily.

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> staged files.
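
The first problem can be illustrated with a minimal sketch. It is not Beam's
PipelineResources code and not the classgraph-based detector discussed above;
the class ClasspathSketch and its detect method are hypothetical. It only shows
why an unconditional cast to URLClassLoader is fragile and how a fallback to
the java.class.path system property avoids the cast.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not Beam code.
final class ClasspathSketch {

  // Returns classpath entries for the given loader without assuming its type.
  static List<String> detect(ClassLoader loader) {
    List<String> entries = new ArrayList<>();
    if (loader instanceof URLClassLoader) {
      // Only valid when the loader really is a URLClassLoader.
      for (URL url : ((URLClassLoader) loader).getURLs()) {
        entries.add(url.getPath());
      }
      return entries;
    }
    // Fallback for other loaders (including a null loader): read the
    // java.class.path system property instead of casting.
    String classpath = System.getProperty("java.class.path", "");
    for (String entry : classpath.split(File.pathSeparator)) {
      if (!entry.isEmpty()) {
        entries.add(entry);
      }
    }
    return entries;
  }
}
```

This fallback is only one possible strategy; it does not address the second
problem (filtering out JRE entries), and a pluggable detector as suggested in
the ticket leaves that choice to the user.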





[jira] [Commented] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998297#comment-16998297
 ] 

Lukasz Gajowy commented on BEAM-5495:
-

[~romain.manni-bucau] well then, it seems that I was not careful enough; 
apologies. If you want to correct my error(?), feel free to contribute a 
solution that improves on the one I proposed. 

Regarding your concern: as I said, I don't know much about xbean, so it's hard 
for me to compare the two in terms of quality or "giving guarantee of health", 
or to say specifically which one is the better option for the job. All I can 
say is that all the input in the PR was discussed earlier on the 
[devlist|https://lists.apache.org/thread.html/61ae8750b4ed20413c6e93ba949ddd48dd0107a0a039ef518f9d6d21%40%3Cdev.beam.apache.org%3E]
 and classgraph seemed a sensible solution. This, of course, can be discussed 
further.

The bright side is that changing the implementation is as easy as implementing 
the PipelineResourcesDetector interface, so it should be relatively easy to 
do. The default algorithm can also be changed easily.

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.





[jira] [Commented] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998160#comment-16998160
 ] 

Lukasz Gajowy commented on BEAM-5495:
-

I did not consider xbean, nor did I know about it. I decided to use classgraph 
because it looked the most promising from what I saw, does the job well, and 
can easily be replaced with another library if needed (thanks to the way the 
whole PR was implemented).

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.





[jira] [Commented] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-12-17 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998042#comment-16998042
 ] 

Lukasz Gajowy commented on BEAM-5495:
-

I'm not sure what you are asking about - what proposal do you have in mind? Is 
it a better solution than the one implemented? 

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: 2.19.0
>
>  Time Spent: 14.5h
>  Remaining Estimate: 0h
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> Looks like this resource detection algorithm can't work and should be replaced 
> by an SPI rather than a built-in, non-extensible algorithm. Another valid 
> alternative is to just drop that "guess" logic and force the user to set 
> staged files.





[jira] [Updated] (BEAM-8947) Change Jenkins jobs configuration to account for new Flink Docker image names

2019-12-13 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8947:

Status: Open  (was: Triage Needed)

> Change Jenkins jobs configuration to account for new Flink Docker image names
> -
>
> Key: BEAM-8947
> URL: https://issues.apache.org/jira/browse/BEAM-8947
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Major
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> After the changes in 8b859ab8a52778d1bc14ca76f2eef7c9e70d528d, the Flink Docker 
> containers changed their names. As a result, Jenkins jobs that were not 
> updated to account for the new names fail.





[jira] [Updated] (BEAM-8951) Stop using nose in load tests

2019-12-11 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8951:

Status: Open  (was: Triage Needed)

> Stop using nose in load tests
> -
>
> Key: BEAM-8951
> URL: https://issues.apache.org/jira/browse/BEAM-8951
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>
> The community is considering moving away from nose to pytest: 
> https://issues.apache.org/jira/browse/BEAM-3713. We should change the way 
> Python load tests are run: instead of being subclasses of 
> `unittest.TestCase`, they could be plain Python scripts, just like the 
> wordcount examples. This brings one additional benefit: the 
> _LOAD_TEST_ENABLED_ guard will no longer be needed and can be safely removed.
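
As a rough illustration of the plain-script approach described above, a load 
test could expose a main() entry point and parse its own options, with no 
unittest.TestCase subclass and no enable/disable guard. This is only a sketch 
under assumptions: the run_load_test function, the --num_records flag and the 
stand-in workload are hypothetical, and a real load test would build and run a 
Beam pipeline instead.

```python
import argparse
import time


def run_load_test(num_records):
    """Stand-in workload (hypothetical): sum a range and time it."""
    start = time.monotonic()
    total = sum(range(num_records))
    runtime_sec = time.monotonic() - start
    return {"records": num_records, "total": total, "runtime_sec": runtime_sec}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical load test script")
    parser.add_argument("--num_records", type=int, default=1000)
    # Flags the parser does not know are returned separately, so they could
    # be forwarded to the runner instead of causing an error.
    known_args, _other_args = parser.parse_known_args(argv)
    result = run_load_test(known_args.num_records)
    print(result)
    return result


if __name__ == "__main__":
    main()
```

Running the file directly executes the test once, so whether the test runs is 
decided by whoever invokes the script rather than by a guard inside the test.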





[jira] [Created] (BEAM-8940) Load dataflow jobs using java 11 in Java 11 Dataflow tests

2019-12-10 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8940:
---

 Summary: Load dataflow jobs using java 11 in Java 11 Dataflow tests
 Key: BEAM-8940
 URL: https://issues.apache.org/jira/browse/BEAM-8940
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Lukasz Gajowy
Assignee: Lukasz Gajowy


Currently, Java 11 tests use only the java11 Docker worker image to verify Java 
11 compatibility. Everything else (artifact staging and job startup) is done 
using Java 8. It should be done with Java 11 as well, since this is how users 
will do it. 





[jira] [Created] (BEAM-8939) beam_CancelStaleDataflowJobs is failing

2019-12-10 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8939:
---

 Summary: beam_CancelStaleDataflowJobs is failing
 Key: BEAM-8939
 URL: https://issues.apache.org/jira/browse/BEAM-8939
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Lukasz Gajowy


This job is failing, so no stale Dataflow jobs are canceled, which leads to 
resource quota exhaustion. 
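
To make the intended behavior concrete, here is a minimal sketch of how a 
cleanup job might pick the jobs to cancel. It is an illustration under 
assumptions, not the actual beam_CancelStaleDataflowJobs logic: the job record 
layout (id, state, created), the RUNNING state name and the 3-hour threshold 
are all hypothetical, and the cancellation call itself is left out.

```python
from datetime import datetime, timedelta, timezone


def find_stale_jobs(jobs, now, max_age=timedelta(hours=3)):
    """Return the ids of jobs that are still running after max_age."""
    return [
        job["id"]
        for job in jobs
        if job["state"] == "RUNNING" and now - job["created"] > max_age
    ]


# Hypothetical job records; a real script would fetch these from the service.
now = datetime(2019, 12, 10, 12, 0, tzinfo=timezone.utc)
jobs = [
    {"id": "job-a", "state": "RUNNING", "created": now - timedelta(hours=5)},
    {"id": "job-b", "state": "RUNNING", "created": now - timedelta(minutes=30)},
    {"id": "job-c", "state": "DONE", "created": now - timedelta(hours=8)},
]
stale = find_stale_jobs(jobs, now)
print(stale)
```

Only job-a is selected here: job-b is still within the threshold and job-c has 
already finished. When a cleanup step like this stops running, nothing removes 
the long-running entries, and they keep counting against the quota.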





[jira] [Updated] (BEAM-8938) Tests end up leaving stale Dataflow jobs in apache-beam-testing project and exhaust GCP resources

2019-12-10 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8938:

Description: 
Some tests seem not to be killed and keep consuming our resources. I'm not sure 
this list is exhaustive, but these appear in the Dataflow console repeatedly: 
  - [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487] (spotted multiple times in the Dataflow console) (Python SDK)
  - [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596] (Python SDK)
  - [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158] (Java SDK)
  - [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125] (Java SDK)

A temporary solution is to ignore them. A real solution requires further 
investigation.

Please see the devlist thread for more context: 
[https://lists.apache.org/thread.html/01eb33ae9c05d12bb0698f91adc0021662fdfe2978cfdfde28dc56b2%40%3Cdev.beam.apache.org%3E]

  was:
Some tests (I'm not sure if this is the exhaustive list but they seem to appear 
in the dataflow console repeatedly) that seem to not be killed and eat our 
resources: 
  - 
[test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
 (spotted multiple times in the dataflow console) (Python SDK)
  - 
[test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
 (Python SDK)
  - 
[testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
 (Java SDK)

 -  
[testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125]
 (Java SDK)
  
 Temporary solution is to ignore them. Real solution requires greater 
investigation.


> Tests end up leaving stale Dataflow jobs in apache-beam-testing project and 
> exhaust GCP resources
> -
>
> Key: BEAM-8938
> URL: https://issues.apache.org/jira/browse/BEAM-8938
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Blocker
>
> Some tests (I'm not sure if this is the exhaustive list but they seem to 
> appear in the dataflow console repeatedly) that seem to not be killed and eat 
> our resources: 
>   - 
> [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
>  (spotted multiple times in the dataflow console) (Python SDK)
>   - 
> [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
>  (Python SDK)
>   - 
> [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
>  (Java SDK)
>  -  
> [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125]
>  (Java SDK)
>   
>  Temporary solution is to ignore them. Real solution requires greater 
> investigation.
>  
> Please see the devlist thread for more context: 
> [https://lists.apache.org/thread.html/01eb33ae9c05d12bb0698f91adc0021662fdfe2978cfdfde28dc56b2%40%3Cdev.beam.apache.org%3E]





[jira] [Updated] (BEAM-8938) Tests end up leaving stale Dataflow jobs in apache-beam-testing project and exhaust GCP resources

2019-12-10 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8938:

Description: 
Some tests seem not to be killed and keep consuming our resources. I'm not sure 
this list is exhaustive, but these appear in the Dataflow console repeatedly: 
  - [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487] (spotted multiple times in the Dataflow console) (Python SDK)
  - [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596] (Python SDK)
  - [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158] (Java SDK)
  - [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125] (Java SDK)

A temporary solution is to ignore them. A real solution requires further 
investigation.

Please see the devlist thread for more context: 
[https://lists.apache.org/thread.html/01eb33ae9c05d12bb0698f91adc0021662fdfe2978cfdfde28dc56b2%40%3Cdev.beam.apache.org%3E]

  was:
Some tests (I'm not sure if this is the exhaustive list but they seem to appear 
in the dataflow console repeatedly) that seem to not be killed and eat our 
resources: 
  - 
[test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
 (spotted multiple times in the dataflow console) (Python SDK)
  - 
[test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
 (Python SDK)
  - 
[testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
 (Java SDK)

 -  
[testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125]
 (Java SDK)
  
 Temporary solution is to ignore them. Real solution requires greater 
investigation.

 

Please see the devlist thread for more context: 
[https://lists.apache.org/thread.html/01eb33ae9c05d12bb0698f91adc0021662fdfe2978cfdfde28dc56b2%40%3Cdev.beam.apache.org%3E]


> Tests end up leaving stale Dataflow jobs in apache-beam-testing project and 
> exhaust GCP resources
> -
>
> Key: BEAM-8938
> URL: https://issues.apache.org/jira/browse/BEAM-8938
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Blocker
>
> Some tests (I'm not sure if this is the exhaustive list but they seem to 
> appear in the dataflow console repeatedly) that seem to not be killed and eat 
> our resources: 
>   - 
> [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
>  (spotted multiple times in the dataflow console) (Python SDK)
>   - 
> [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
>  (Python SDK)
>   - 
> [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
>  (Java SDK)
>  -  
> [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125]
>  (Java SDK)
>   
>  Temporary solution is to ignore them. Real solution requires greater 
> investigation.
> Please see the devlist thread for more context: 
> [https://lists.apache.org/thread.html/01eb33ae9c05d12bb0698f91adc0021662fdfe2978cfdfde28dc56b2%40%3Cdev.beam.apache.org%3E]





[jira] [Updated] (BEAM-8938) Tests end up leaving stale Dataflow jobs in apache-beam-testing project and exhaust GCP resources

2019-12-10 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8938:

Summary: Tests end up leaving stale Dataflow jobs in apache-beam-testing 
project and exhaust GCP resources  (was: Tests end up leaving stale Dataflow 
jobs in apache-beam-testing project)

> Tests end up leaving stale Dataflow jobs in apache-beam-testing project and 
> exhaust GCP resources
> -
>
> Key: BEAM-8938
> URL: https://issues.apache.org/jira/browse/BEAM-8938
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Blocker
>
> Some tests (I'm not sure if this is the exhaustive list but they seem to 
> appear in the dataflow console repeatedly) that seem to not be killed and eat 
> our resources: 
>   - 
> [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
>  (spotted multiple times in the dataflow console) (Python SDK)
>   - 
> [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
>  (Python SDK)
>   - 
> [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
>  (Java SDK)
>  -  
> [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125]
>  (Java SDK)
>   
>  Temporary solution is to ignore them. Real solution requires greater 
> investigation.





[jira] [Updated] (BEAM-8938) Tests end up leaving stale Dataflow jobs in apache-beam-testing project

2019-12-10 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8938:

Summary: Tests end up leaving stale Dataflow jobs in apache-beam-testing 
project  (was: Tests result in stale dataflow jobs in apache-beam-testing 
project)

> Tests end up leaving stale Dataflow jobs in apache-beam-testing project
> ---
>
> Key: BEAM-8938
> URL: https://issues.apache.org/jira/browse/BEAM-8938
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Blocker
>
> Some tests (I'm not sure if this is the exhaustive list but they seem to 
> appear in the dataflow console repeatedly) that seem to not be killed and eat 
> our resources: 
>   - 
> [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
>  (spotted multiple times in the dataflow console) (Python SDK)
>   - 
> [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
>  (Python SDK)
>   - 
> [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
>  (Java SDK)
>  -  
> [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125]
>  (Java SDK)
>   
>  Temporary solution is to ignore them. Real solution requires greater 
> investigation.





[jira] [Updated] (BEAM-8938) Tests result in stale dataflow jobs in apache-beam-testing project

2019-12-10 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8938:

Description: 
Some tests seem not to be killed and keep consuming our resources. I'm not sure 
this list is exhaustive, but these appear in the Dataflow console repeatedly: 
  - [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487] (spotted multiple times in the Dataflow console) (Python SDK)
  - [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596] (Python SDK)
  - [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158] (Java SDK)
  - [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125] (Java SDK)

A temporary solution is to ignore them. A real solution requires further 
investigation.

  was:
Some tests (I'm not sure if this is the exhaustive list but they seem to appear 
in the dataflow console repeatedly) that seem to not be killed and eat our 
resources: 
 - 
[test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
 (spotted multiple times in the dataflow console) (Python SDK)
 - 
[test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
 (Python SDK)
 - 
[testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
 (Java SDK)
 
Temporary solution is to ignore them. Real solution requires greater 
investigation.


> Tests result in stale dataflow jobs in apache-beam-testing project
> --
>
> Key: BEAM-8938
> URL: https://issues.apache.org/jira/browse/BEAM-8938
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Blocker
>
> Some tests (I'm not sure if this is the exhaustive list but they seem to 
> appear in the dataflow console repeatedly) that seem to not be killed and eat 
> our resources: 
>   - 
> [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487]
>  (spotted multiple times in the dataflow console) (Python SDK)
>   - 
> [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596]
>  (Python SDK)
>   - 
> [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158]
>  (Java SDK)
>  -  
> [testPairWithIndexBasicBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L125]
>  (Java SDK)
>   
>  Temporary solution is to ignore them. Real solution requires greater 
> investigation.





[jira] [Created] (BEAM-8938) Tests result in stale dataflow jobs in apache-beam-testing project

2019-12-10 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8938:
---

 Summary: Tests result in stale dataflow jobs in 
apache-beam-testing project
 Key: BEAM-8938
 URL: https://issues.apache.org/jira/browse/BEAM-8938
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Lukasz Gajowy


Some tests seem not to be killed and keep consuming our resources. I'm not sure 
this list is exhaustive, but these appear in the Dataflow console repeatedly: 
 - [test_reshuffle_preserves_timestamps|https://github.com/apache/beam/blob/719b8cc5e51dcd3e98425ecae5ec246657d46eca/sdks/python/apache_beam/transforms/util_test.py#L487] (spotted multiple times in the Dataflow console) (Python SDK)
 - [test_flatten_same_pcollections|https://github.com/apache/beam/blob/44d456830442e5f13b7fd3bd684695e2b69e2c0d/sdks/python/apache_beam/transforms/ptransform_test.py#L596] (Python SDK)
 - [testPairWithIndexWindowedTimestampedBounded|https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/core/src/test/java/org/apache/beam/sdk/transforms/SplittableDoFnTest.java#L158] (Java SDK)

A temporary solution is to ignore them. A real solution requires further 
investigation.





[jira] [Updated] (BEAM-8918) Split BigQueryIOIT test into two tests for Avro and Json reads/writes

2019-12-09 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8918:

Status: Open  (was: Triage Needed)

> Split BigQueryIOIT test into two tests for Avro and Json reads/writes
> -
>
> Key: BEAM-8918
> URL: https://issues.apache.org/jira/browse/BEAM-8918
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>
> Since BigQueryIO doesn't support streaming writes, it is necessary to split 
> the IO integration test by write method.
> The split will also make the test use the intended amount of data instead of 
> twice as much.





[jira] [Created] (BEAM-8919) Move JAVA_11_HOME and JAVA_8_HOME variables to Jenkins envs.

2019-12-09 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8919:
---

 Summary: Move JAVA_11_HOME and JAVA_8_HOME variables to Jenkins 
envs.
 Key: BEAM-8919
 URL: https://issues.apache.org/jira/browse/BEAM-8919
 Project: Beam
  Issue Type: Bug
  Components: testing
Reporter: Lukasz Gajowy


Some tests that use different java versions rely on the following paths to java 
home:

final String JAVA_11_HOME = '/usr/lib/jvm/java-11-openjdk-amd64'
final String JAVA_8_HOME = '/usr/lib/jvm/java-8-openjdk-amd64'

 

The paths themselves should be held as Jenkins environment variables. Benefits: 

 - easier to reuse

 - no room for typos in the path
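
A small sketch of the idea: resolve the Java home from an environment variable 
and fail loudly when it is missing, rather than repeating the literal path in 
every job. The helper name java_home is hypothetical, and the sketch is written 
in Python purely for illustration (the snippet above looks like Groovy); it 
assumes the agents export JAVA_8_HOME and JAVA_11_HOME.

```python
import os


def java_home(variable, environ=None):
    """Return the Java home path stored in the given environment variable."""
    environ = os.environ if environ is None else environ
    try:
        return environ[variable]
    except KeyError:
        # A missing variable is a configuration error, so report it clearly.
        raise RuntimeError(f"{variable} is not set in the environment") from None


# Example with an explicit mapping standing in for the agent's environment.
example_env = {"JAVA_11_HOME": "/usr/lib/jvm/java-11-openjdk-amd64"}
print(java_home("JAVA_11_HOME", example_env))
```

With this shape the path is defined once, in the environment, and a job that 
asks for a variable the agent does not provide fails immediately with a clear 
message.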

 

 





[jira] [Commented] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988910#comment-16988910
 ] 

Lukasz Gajowy commented on BEAM-8424:
-

I created a PR to test whether the only Java-related change introduced right 
before the tests started timing out affects the VR test execution time: 
[https://github.com/apache/beam/pull/10295]

> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
> EDIT: currently, after reopening the issue the timeout is set to 4.5h. 
>  
>  
>  





[jira] [Updated] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Description: 
[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]

[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]

These jobs take longer than the currently set timeout (3h). 

EDIT: after the issue was reopened, the timeout is currently set to 4.5h.

  was:
[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]

[https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]


these jobs take more than currently set timeout (3h). 

 

 

 


> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
> EDIT: currently, after reopening the issue the timeout is set to 4.5h. 
>  
>  
>  





[jira] [Updated] (BEAM-8424) Java Dataflow ValidatesRunner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Summary: Java Dataflow ValidatesRunner tests are timeouting  (was: java 11 
dataflow validates runner tests are timeouting)

> Java Dataflow ValidatesRunner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988847#comment-16988847
 ] 

Lukasz Gajowy commented on BEAM-8424:
-

Moreover, when the Jenkins job is aborted due to a timeout, we are unable to see 
the Gradle scan (the Gradle process is killed, so the scan is not generated). 

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-8424:
---

Assignee: (was: Lukasz Gajowy)

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988846#comment-16988846
 ] 

Lukasz Gajowy commented on BEAM-8424:
-

The previous fix (PR #9819) mitigated the issue, and the tests didn't time out 
after the time limit was bumped to 4.5h. Currently, the tests exceed even this limit. 

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reopened BEAM-8424:
-
  Assignee: Lukasz Gajowy

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-8424.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Status: Open  (was: Triage Needed)

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8424) java 11 dataflow validates runner tests are timeouting

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8424:

Status: Open  (was: Triage Needed)

> java 11 dataflow validates runner tests are timeouting
> --
>
> Key: BEAM-8424
> URL: https://issues.apache.org/jira/browse/BEAM-8424
> Project: Beam
>  Issue Type: Bug
>  Components: test-failures
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_Dataflow/]
> [https://builds.apache.org/view/A-D/view/Beam/view/All/job/beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow/]
> these jobs take more than currently set timeout (3h). 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8895) BigQueryIO streaming test on Java is flaky

2019-12-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8895:

Status: Open  (was: Triage Needed)

> BigQueryIO streaming test on Java is flaky
> --
>
> Key: BEAM-8895
> URL: https://issues.apache.org/jira/browse/BEAM-8895
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>
> {code:java}
> SEVERE: 2019-12-05T06:57:31.089Z: java.lang.RuntimeException: 
> com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad 
> Request
> 07:57:32 {
> 07:57:32   "code" : 400,
> 07:57:32   "errors" : [ {
> 07:57:32 "domain" : "global",
> 07:57:32 "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32 "reason" : "invalid"
> 07:57:32   } ],
> 07:57:32   "message" : "Invalid table ID 
> \"bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617\". Table IDs must 
> be alphanumeric (plus underscores) and must be at most 1024 characters long. 
> Also, Table decorators cannot be used.",
> 07:57:32   "status" : "INVALID_ARGUMENT"
> 07:57:32 }
> {code}
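
The error above says that table IDs may contain only alphanumeric characters and underscores, so the hyphens in the UUID suffix are a likely cause of the rejection. A minimal sketch of one way to build a compliant ID, assuming a hypothetical helper class (TableIdSketch and its method names are illustrative and not part of Beam or the BigQuery client):

```java
import java.util.UUID;

public class TableIdSketch {

  // Replace every character outside [A-Za-z0-9_] with an underscore,
  // matching the constraint quoted in the error message.
  static String sanitize(String rawId) {
    return rawId.replaceAll("[^A-Za-z0-9_]", "_");
  }

  // Hypothetical helper: append a random UUID to a base name and sanitize
  // the result so that the UUID's hyphens do not end up in the table ID.
  static String uniqueTableId(String baseName) {
    return sanitize(baseName + "_" + UUID.randomUUID());
  }

  public static void main(String[] args) {
    // The ID from the log, with hyphens replaced by underscores.
    System.out.println(
        sanitize("bqio_write_10GB_java_e27dc010-6896-41ac-90f3-25b5adc58617"));
    // A fresh ID; the suffix differs on every run.
    System.out.println(uniqueTableId("bqio_write_10GB_java"));
  }
}
```

Whether the test should sanitize the UUID or use a different suffix altogether is a design choice the issue does not settle; the sketch only shows the character constraint being respected.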



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-3708) Implement the portable lifted Combiner transforms in Java SDK

2019-12-03 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-3708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986987#comment-16986987
 ] 

Lukasz Gajowy commented on BEAM-3708:
-

According to the Portability Support matrix, this issue is a blocker for the 
Combine operation in Java: 
[https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit?ts=5dd6b19b#gid=0]

Is this still the case? (I see that it's resolved now; I just wanted to confirm.)

> Implement the portable lifted Combiner transforms in Java SDK
> -
>
> Key: BEAM-3708
> URL: https://issues.apache.org/jira/browse/BEAM-3708
> Project: Beam
>  Issue Type: Sub-task
>  Components: sdk-java-core, sdk-java-harness
>Reporter: Daniel Oliveira
>Assignee: Daniel Oliveira
>Priority: Major
>  Labels: portability
> Fix For: 2.6.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> Lifted combines are split into separate parts with different URNs. These 
> parts need to be implemented in the Java SDK harness so that the SDK can 
> actually execute them when receiving Combine transforms with the 
> corresponding URNs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8826) Investigate possibility of system metrics usage in portable performance tests

2019-11-26 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8826:
---

 Summary: Investigate possibility of system metrics usage in 
portable performance tests
 Key: BEAM-8826
 URL: https://issues.apache.org/jira/browse/BEAM-8826
 Project: Beam
  Issue Type: Task
  Components: testing
Reporter: Lukasz Gajowy


We currently use 
[TimeMonitor.java|https://github.com/apache/beam/blob/master/sdks/java/testing/test-utils/src/main/java/org/apache/beam/sdk/testutils/metrics/TimeMonitor.java]
 and 
[MeasureTime.py|https://github.com/apache/beam/blob/1988284a89b10b60eea48325f8a3b370b551c77c/sdks/python/apache_beam/testing/load_tests/load_test_metrics_utils.py#L406]
 DoFns to collect runtime in both portable and non-portable performance tests. 
However, in portable tests it seems to be possible to use 
[TOTAL_TIME_MSECS|https://github.com/apache/beam/blob/1988284a89b10b60eea48325f8a3b370b551c77c/model/pipeline/src/main/proto/metrics.proto#L130]
 for collecting execution time. Other system metrics are available as well (size, 
bundle size, etc.).

It seems like a good way to simplify things and get more useful metrics from 
portable jobs so it is worth investigating ways of using it in performance 
tests.
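
For context, a runtime measurement of the kind described above can be sketched as follows. This assumes that each monitoring DoFn records a wall-clock timestamp per processed element and that the runtime is taken as the difference between the latest and the earliest timestamp; that is an assumption for illustration, not something the issue text confirms, and the class and method names are hypothetical:

```java
import java.util.Arrays;
import java.util.List;

public class RuntimeFromTimestamps {

  // Returns (latest - earliest) timestamp in milliseconds,
  // or 0 when no timestamps were recorded.
  static long runtimeMillis(List<Long> timestamps) {
    if (timestamps.isEmpty()) {
      return 0L;
    }
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long t : timestamps) {
      min = Math.min(min, t);
      max = Math.max(max, t);
    }
    return max - min;
  }

  public static void main(String[] args) {
    // Timestamps (ms) as they might be reported by a monitoring step.
    List<Long> seen = Arrays.asList(1_000L, 1_250L, 4_500L);
    System.out.println(runtimeMillis(seen)); // prints 3500
  }
}
```

A runner-provided metric such as TOTAL_TIME_MSECS would make this kind of user-side bookkeeping unnecessary in portable tests, which is the simplification the issue proposes to investigate.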



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8792) Bring back the names of the runtime metrics to "runtime"

2019-11-20 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8792:

Status: Open  (was: Triage Needed)

> Bring back the names of the runtime metrics to "runtime"
> 
>
> Key: BEAM-8792
> URL: https://issues.apache.org/jira/browse/BEAM-8792
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Priority: Major
>
> Since this PR ([https://github.com/apache/beam/pull/8941]), the names of the 
> runtime metrics defined in Python load test pipelines have changed to a 
> combination of the metrics namespace and "runtime". This made querying the 
> BigQuery table containing the results more difficult. The goal is to bring 
> back the names of the metrics to "runtime" to stay consistent with the 
> previous records.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-4775) JobService should support returning metrics

2019-11-18 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-4775.
-
Fix Version/s: Not applicable
   Resolution: Fixed

The JobServer part is done: all kinds of MonitoringInfos are forwarded through 
gRPC to PortableRunners (SDK side). The PortableRunners can decide how to 
digest them (e.g. create MetricsResult from monitoring infos where possible). 

> JobService should support returning metrics
> ---
>
> Key: BEAM-4775
> URL: https://issues.apache.org/jira/browse/BEAM-4775
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Eugene Kirpichov
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 55h
>  Remaining Estimate: 0h
>
> Design doc: [https://s.apache.org/get-metrics-api].
> Further discussion is ongoing on [this 
> doc|https://docs.google.com/document/d/1m83TsFvJbOlcLfXVXprQm1B7vUakhbLZMzuRrOHWnTg/edit?ts=5c826bb4#heading=h.faqan9rjc6dm].
> We want to report job metrics back to the portability harness from the runner 
> harness, for displaying to users.
> h1. Relevant PRs in flight:
> h2. Ready for Review:
>  * [#8022|https://github.com/apache/beam/pull/8022]: correct the Job RPC 
> protos from [#8018|https://github.com/apache/beam/pull/8018].
> h2. Iterating / Discussing:
>  * [#7971|https://github.com/apache/beam/pull/7971]: Flink portable metrics: 
> get ptransform from MonitoringInfo, not stage name
>  ** this is a simpler, Flink-specific PR that is basically duplicated inside 
> each of the following two, so may be worth trying to merge in first
>  * [#7915|https://github.com/apache/beam/pull/7915]: use MonitoringInfo data 
> model in Java SDK metrics
>  * [#7868|https://github.com/apache/beam/pull/7868]: MonitoringInfo URN tweaks
> h2. Merged
>  * [#8018|https://github.com/apache/beam/pull/8018]: add job metrics RPC 
> protos
>  * [#7867|https://github.com/apache/beam/pull/7867]: key MetricResult by a 
> MetricKey
>  * [#7938|https://github.com/apache/beam/pull/7938]: move MonitoringInfo 
> protos to model/pipeline module
>  * [#7883|https://github.com/apache/beam/pull/7883]: Add 
> MetricQueryResults.allMetrics() helper
>  * [#7866|https://github.com/apache/beam/pull/7866]: move function helpers 
> from fn-harness to sdks/java/core
>  * [#7890|https://github.com/apache/beam/pull/7890]: consolidate MetricResult 
> implementations
> h2. Closed
>  * [#7934|https://github.com/apache/beam/pull/7934]: job metrics RPC + SDK 
> support
>  * [#7876|https://github.com/apache/beam/pull/7876]: Clean up metric protos; 
> support integer distributions, gauges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4775) JobService should support returning metrics

2019-11-18 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4775:
---

Assignee: Lukasz Gajowy  (was: Kamil Wasilewski)

> JobService should support returning metrics
> ---
>
> Key: BEAM-4775
> URL: https://issues.apache.org/jira/browse/BEAM-4775
> Project: Beam
>  Issue Type: Bug
>  Components: beam-model
>Reporter: Eugene Kirpichov
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 55h
>  Remaining Estimate: 0h
>
> Design doc: [https://s.apache.org/get-metrics-api].
> Further discussion is ongoing on [this 
> doc|https://docs.google.com/document/d/1m83TsFvJbOlcLfXVXprQm1B7vUakhbLZMzuRrOHWnTg/edit?ts=5c826bb4#heading=h.faqan9rjc6dm].
> We want to report job metrics back to the portability harness from the runner 
> harness, for displaying to users.
> h1. Relevant PRs in flight:
> h2. Ready for Review:
>  * [#8022|https://github.com/apache/beam/pull/8022]: correct the Job RPC 
> protos from [#8018|https://github.com/apache/beam/pull/8018].
> h2. Iterating / Discussing:
>  * [#7971|https://github.com/apache/beam/pull/7971]: Flink portable metrics: 
> get ptransform from MonitoringInfo, not stage name
>  ** this is a simpler, Flink-specific PR that is basically duplicated inside 
> each of the following two, so may be worth trying to merge in first
>  * [#7915|https://github.com/apache/beam/pull/7915]: use MonitoringInfo data 
> model in Java SDK metrics
>  * [#7868|https://github.com/apache/beam/pull/7868]: MonitoringInfo URN tweaks
> h2. Merged
>  * [#8018|https://github.com/apache/beam/pull/8018]: add job metrics RPC 
> protos
>  * [#7867|https://github.com/apache/beam/pull/7867]: key MetricResult by a 
> MetricKey
>  * [#7938|https://github.com/apache/beam/pull/7938]: move MonitoringInfo 
> protos to model/pipeline module
>  * [#7883|https://github.com/apache/beam/pull/7883]: Add 
> MetricQueryResults.allMetrics() helper
>  * [#7866|https://github.com/apache/beam/pull/7866]: move function helpers 
> from fn-harness to sdks/java/core
>  * [#7890|https://github.com/apache/beam/pull/7890]: consolidate MetricResult 
> implementations
> h2. Closed
>  * [#7934|https://github.com/apache/beam/pull/7934]: job metrics RPC + SDK 
> support
>  * [#7876|https://github.com/apache/beam/pull/7876]: Clean up metric protos; 
> support integer distributions, gauges



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-4777) Python PortableRunner should support metrics

2019-11-18 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-4777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-4777:
---

Assignee: Kamil Wasilewski  (was: Lukasz Gajowy)

> Python PortableRunner should support metrics
> 
>
> Key: BEAM-4777
> URL: https://issues.apache.org/jira/browse/BEAM-4777
> Project: Beam
>  Issue Type: Bug
>  Components: sdk-py-core
>Reporter: Eugene Kirpichov
>Assignee: Kamil Wasilewski
>Priority: Major
>
> BEAM-4775 concerns adding metrics to the JobService API; the current issue is 
> about making Python PortableRunner understand them.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-7404) ObjectSizeCalculator not supported in Portable Java pipelines

2019-11-14 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-7404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-7404:

Fix Version/s: (was: 2.16.0)
   Not applicable

> ObjectSizeCalculator not supported in Portable Java pipelines
> -
>
> Key: BEAM-7404
> URL: https://issues.apache.org/jira/browse/BEAM-7404
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Lukasz Gajowy
>Assignee: Michal Walenia
>Priority: Major
> Fix For: Not applicable
>
> Attachments: stacktrace.txt
>
>
> In byte monitor there's a problem when using ObjectSizeCalculator. See the 
> stacktrace for details. I think we should replace it.
>  
> {code:java}
> Caused by: java.lang.ExceptionInInitializerError
> at 
> jdk.nashorn.internal.ir.debug.ObjectSizeCalculator.getObjectSize(ObjectSizeCalculator.java:122)
> at 
> org.apache.beam.sdk.testutils.metrics.ByteMonitor.processElement(ByteMonitor.java:42)
> Caused by: java.lang.UnsupportedOperationException: ObjectSizeCalculator only 
> supported on HotSpot VM{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (BEAM-8616) ParquetIO should have Hadoop dependencies as provided

2019-11-13 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973499#comment-16973499
 ] 

Lukasz Gajowy edited comment on BEAM-8616 at 11/13/19 4:39 PM:
---

[~iemejia] I agree, however, I can't find the label - which one do you have in 
mind?


was (Author: łukaszg):
I agree, however, I can't find the label - which one do you have in mind?

> ParquetIO should have Hadoop dependencies as provided
> -
>
> Key: BEAM-8616
> URL: https://issues.apache.org/jira/browse/BEAM-8616
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-parquet
>Affects Versions: 2.16.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> ParquetIO has the hadoop-client dependency as a compile-time dependency 
> however this dependency should be provided by the user as defined in 
> parquet-hadoop. By pinning a hadoop version we are limiting users from 
> providing different Hadoop jars (as they can with native Parquet), it also 
> limits us from providing different hadoop versions to test that the Parquet 
> module is compatible with Hadoop 3 (when it will).
> Note this is a 'backwards incompatible' change in the sense that users might 
> need to explicitly provide the dependency from now on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8616) ParquetIO should have Hadoop dependencies as provided

2019-11-13 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973499#comment-16973499
 ] 

Lukasz Gajowy commented on BEAM-8616:
-

I agree, however, I can't find the label - which one do you have in mind?

> ParquetIO should have Hadoop dependencies as provided
> -
>
> Key: BEAM-8616
> URL: https://issues.apache.org/jira/browse/BEAM-8616
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-parquet
>Affects Versions: 2.16.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> ParquetIO has the hadoop-client dependency as a compile-time dependency 
> however this dependency should be provided by the user as defined in 
> parquet-hadoop. By pinning a hadoop version we are limiting users from 
> providing different Hadoop jars (as they can with native Parquet), it also 
> limits us from providing different hadoop versions to test that the Parquet 
> module is compatible with Hadoop 3 (when it will).
> Note this is a 'backwards incompatible' change in the sense that users might 
> need to explicitly provide the dependency from now on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8616) ParquetIO should have Hadoop dependencies as provided

2019-11-12 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16972396#comment-16972396
 ] 

Lukasz Gajowy commented on BEAM-8616:
-

ParquetIO is marked as experimental so I think it's safe to assume that its 
dependencies will change too, do you agree? 

> ParquetIO should have Hadoop dependencies as provided
> -
>
> Key: BEAM-8616
> URL: https://issues.apache.org/jira/browse/BEAM-8616
> Project: Beam
>  Issue Type: Sub-task
>  Components: io-java-parquet
>Affects Versions: 2.16.0
>Reporter: Ismaël Mejía
>Assignee: Ismaël Mejía
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> ParquetIO has the hadoop-client dependency as a compile-time dependency 
> however this dependency should be provided by the user as defined in 
> parquet-hadoop. By pinning a hadoop version we are limiting users from 
> providing different Hadoop jars (as they can with native Parquet), it also 
> limits us from providing different hadoop versions to test that the Parquet 
> module is compatible with Hadoop 3 (when it will).
> Note this is a 'backwards incompatible' change in the sense that users might 
> need to explicitly provide the dependency from now on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-11-08 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16970485#comment-16970485
 ] 

Lukasz Gajowy commented on BEAM-5495:
-

I'd like to push things forward here, so I assigned myself. I'm looking into 
both solutions now and trying to determine what is best for the project. I will 
also revive this thread on the dev list for all devs: 
[https://lists.apache.org/thread.html/61ae8750b4ed20413c6e93ba949ddd48dd0107a0a039ef518f9d6d21@%3Cdev.beam.apache.org%3E]

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> staged files.
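
The first point can be illustrated with a small sketch. It is not the PipelineResources implementation: the class and method names are hypothetical, and falling back to the java.class.path system property is only one possible approach, shown here as an assumption:

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class ClasspathSketch {

  // Collect candidate resources without assuming a URLClassLoader.
  static List<String> detectClasspath(ClassLoader loader) {
    List<String> files = new ArrayList<>();
    if (loader instanceof URLClassLoader) {
      // Typical for the application loader on Java 8.
      for (URL url : ((URLClassLoader) loader).getURLs()) {
        files.add(url.toString());
      }
      return files;
    }
    // On Java 9+ the application loader is not a URLClassLoader,
    // so read the classpath from the system property instead.
    String classpath = System.getProperty("java.class.path", "");
    for (String entry : classpath.split(File.pathSeparator)) {
      if (!entry.isEmpty()) {
        files.add(new File(entry).getAbsolutePath());
      }
    }
    return files;
  }

  public static void main(String[] args) {
    List<String> found = detectClasspath(ClasspathSketch.class.getClassLoader());
    System.out.println(found.size() + " classpath entries detected");
  }
}
```

The sketch does not address the second point (filtering out JRE entries), and it still guesses; the SPI or the explicit staged-files option mentioned in the issue would remove the guessing entirely.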



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (BEAM-5495) PipelineResources algorithm is not working in most environments

2019-11-08 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy reassigned BEAM-5495:
---

Assignee: Lukasz Gajowy

> PipelineResources algorithm is not working in most environments
> ---
>
> Key: BEAM-5495
> URL: https://issues.apache.org/jira/browse/BEAM-5495
> Project: Beam
>  Issue Type: Bug
>  Components: runner-flink, runner-spark, sdk-java-core
>Reporter: Romain Manni-Bucau
>Assignee: Lukasz Gajowy
>Priority: Major
>
> Issues are:
> 1. it assumes the classloader is a URLClassLoader (not always true, and Java 
> >= 9 breaks that as well for the app loader)
> 2. it uses loader.getURLs(), which leads to including the JRE itself in the 
> staged files
> It looks like this resource detection algorithm can't work and should be 
> replaced by an SPI rather than a built-in, non-extensible algorithm. Another 
> valid alternative is to just drop that "guess" logic and force the user to set 
> staged files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (BEAM-8559) Run Dataflow Nexmark suites with Java 11

2019-11-05 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8559:
---

 Summary: Run Dataflow Nexmark suites with Java 11
 Key: BEAM-8559
 URL: https://issues.apache.org/jira/browse/BEAM-8559
 Project: Beam
  Issue Type: Sub-task
  Components: testing-nexmark
Reporter: Lukasz Gajowy
Assignee: Lukasz Gajowy
 Fix For: Not applicable


This task is similar to https://issues.apache.org/jira/browse/BEAM-6936.

The goal is to run the Nexmark suites with Java 11 but compile them with Java 8 
to verify compatibility. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (BEAM-8558) BigQueryIOIT Jenkins job flakes

2019-11-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8558:

Status: Open  (was: Triage Needed)

> BigQueryIOIT Jenkins job flakes
> ---
>
> Key: BEAM-8558
> URL: https://issues.apache.org/jira/browse/BEAM-8558
> Project: Beam
>  Issue Type: Bug
>  Components: testing
>Reporter: Michal Walenia
>Assignee: Michal Walenia
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Java BigQueryIOIT fails with exception:
>  
> {code:java}
> org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT > testWriteThenRead 
> FAILED java.lang.IllegalStateException: BigQuery table is not empty: 
> apache-beam-testing:beam_performance.bqio_write_10GB_java. at 
> org.apache.beam.vendor.guava.v26_0_jre.com.google.common.base.Preconditions.checkState(Preconditions.java:588)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryHelpers.verifyTableNotExistOrEmpty(BigQueryHelpers.java:511)
>  at 
> org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO$Write.validate(BigQueryIO.java:2246)
>  at 
> org.apache.beam.sdk.Pipeline$ValidateVisitor.enterCompositeTransform(Pipeline.java:643)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:653)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
>  at 
> org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
>  at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:460) at 
> org.apache.beam.sdk.Pipeline.validate(Pipeline.java:579) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:314) at 
> org.apache.beam.sdk.Pipeline.run(Pipeline.java:301) at 
> org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT.testWrite(BigQueryIOIT.java:146)
>  at 
> org.apache.beam.sdk.bigqueryioperftests.BigQueryIOIT.testWriteThenRead(BigQueryIOIT.java:116)
> {code}
> The fix is to append a unique UUID to the table names in the test.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-11-05 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-8432.
-
Resolution: Fixed

> Parametrize source & target compatibility for beam Java modules
> ---
>
> Key: BEAM-8432
> URL: https://issues.apache.org/jira/browse/BEAM-8432
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, the "javaVersion" property is hardcoded in BeamModulePlugin in 
> [JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].
> For the sake of migrating the project to Java 11, we could use a mechanism 
> that will allow parametrizing the version from the command line, e.g.:
> {code:java}
> // this could set source and target compatibility to 11:
> ./gradlew clean build -PjavaVersion=11{code}
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-11-04 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966883#comment-16966883
 ] 

Lukasz Gajowy commented on BEAM-8432:
-

I think you're right - I missed this one. Submitting PR...

> Parametrize source & target compatibility for beam Java modules
> ---
>
> Key: BEAM-8432
> URL: https://issues.apache.org/jira/browse/BEAM-8432
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, the "javaVersion" property is hardcoded in BeamModulePlugin in 
> [JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].
> For the sake of migrating the project to Java 11, we could use a mechanism 
> that allows parametrizing the version from the command line, e.g.:
> {code:java}
> // this could set source and target compatibility to 11:
> ./gradlew clean build -PjavaVersion=11{code}
>  





[jira] [Updated] (BEAM-8548) Provide separate Jenkins job instances for each triggering modes

2019-11-02 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8548:

Description: 
Currently, there are several Jenkins job definitions that can be run in 
multiple ways (excluding manual job invocation from jenkins dashboard): 
  - periodic invocation (cron)

 - pre/post-commit

 - phrase triggered invocation (on demand)

I'd suggest we separate the single job that can be triggered many ways to 
multiple job instances that can be triggered one way only. For an IOIT this 
would look like this (example): 
  -  
[beam_PerformanceTests_MongoDBIO_IT|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_MongoDBIO_IT/]
 (for the "cron" job version)

 -  beam_PerformanceTest_MongoDBIO_IT_PR (the phrase triggered version)

 

*Why even do that?*

This approach brings much more elasticity in terms of job configuration. For 
example:
  - we can stop sending emails to builds@ for jobs that are Phrase triggered - 
phrase triggering is signalled on github so there's no need for an email. 
builds@ could in turn be notified only for important reasons 
(preCommit/postCommit fails, cron job fails). This was discussed in BEAM-8422.

 - we can store metrics collected during testing in different db tables (or not 
store them at all) so that the results from master and PR branches do not mix. 
Ideally, when we look at the IOITs chart, we'd like to skip the results from 
phrase-triggered job invocations and stick only to data collected from the master 
branch (the cron job versions would do that). This was also discussed in BEAM-6011.

Some of the jobs already follow this approach, at least partially. Part of the task 
would be to ensure that we are consistent in the naming and conventions that we 
follow (_cron, _Pr/_phrase suffixes in the job names, more?). It would be best 
to enforce the conventions programmatically using job builders and a proper API 
written over the Groovy job DSL, so that it's impossible to break the 
conventions when adding new jobs.

 

  

  was:
Currently, there are several Jenkins job definitions that can be run in 
multiple ways (excluding manual job invocation from jenkins dashboard): 
  - periodic invocation (cron)

 - pre/post-commit

 - phrase triggered invocation (on demand)

I'd suggest we separate the single job that can be triggered many ways to 
multiple job instances that can be triggered one way only. For an IOIT this 
would look like this (example): 
  -  
[beam_PerformanceTests_MongoDBIO_IT|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_MongoDBIO_IT/]
 (for the "cron" job version)

 -  beam_PerformanceTest_MongoDBIO_IT_PR (the phrase triggered version)

 

*Why even do that?*

This approach brings much more elasticity in terms of job configuration. For 
example:
  - we can stop sending emails to builds@ for jobs that are Phrase triggered - 
phrase triggering is signalled on github so there's no need for an email. 
builds@ could in turn be notified only for important reasons 
(preCommit/postCommit fails, cron job fails). This was discussed in BEAM-8422.

 - we can store metrics collected during testing in different db tables (or not 
store them at all) so that the results from master and PR branches do not mix. 
Ideally, when we look at the IOITs chart, we'd like to skip the results from 
phrase-triggered job invocations and stick only to data collected from the master 
branch (the cron job versions would do that). This was also discussed in BEAM-6011.

 - Some of the jobs already follow this approach, at least partially. Part of 
the task would be to ensure that we are consistent in the naming and conventions that 
we follow (_cron, _Pr/_phrase suffixes in the job names, more?). It would be 
best to enforce the conventions programmatically using job builders and a proper 
API written over the Groovy job DSL, so that it's impossible to break the 
conventions when adding new jobs.

 

  


> Provide separate Jenkins job instances for each triggering modes
> 
>
> Key: BEAM-8548
> URL: https://issues.apache.org/jira/browse/BEAM-8548
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> Currently, there are several Jenkins job definitions that can be run in 
> multiple ways (excluding manual job invocation from jenkins dashboard): 
>   - periodic invocation (cron)
>  - pre/post-commit
>  - phrase triggered invocation (on demand)
> I'd suggest we separate the single job that can be triggered many ways to 
> multiple job instances that can be triggered one way only. For an IOIT this 
> would look like this (example): 
>   -  
> 

[jira] [Closed] (BEAM-8422) Send emails to builds@ when job ends with state "ABORTED"

2019-11-02 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy closed BEAM-8422.
---
Fix Version/s: Not applicable
   Resolution: Fixed

> Send emails to builds@ when job ends with state "ABORTED"
> -
>
> Key: BEAM-8422
> URL: https://issues.apache.org/jira/browse/BEAM-8422
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I noticed that jobs that time out end up in an ABORTED state (black dot on the 
> Jenkins dashboard). No email is sent to the builds@ list when this happens. 
> This reduces the visibility of the problem: anyone relying on builds@ won't 
> see that a Jenkins job takes too much time.
>  
> At the time of writing, 5 jobs are affected:
> beam_PostCommit_Java11_ValidatesRunner_Dataflow_PR
> beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow
> beam_PostCommit_Java_PVR_Spark_Batch
> beam_PostCommit_Python37_PR
> beam_sonarqube_report
>  
> I propose changing this behavior and sending emails to builds@ when a job ends 
> with the ABORTED state.
> The drawback of this solution is that every time someone aborts the job 
> manually the email will be sent too -  there's no way to distinguish those 
> two situations. However, IMO we should not allow timeouts to be unnoticed and 
> manual job aborting does not happen very often (even committers cannot do 
> that now in Jenkins). 





[jira] [Commented] (BEAM-8422) Send emails to builds@ when job ends with state "ABORTED"

2019-11-02 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965517#comment-16965517
 ] 

Lukasz Gajowy commented on BEAM-8422:
-

[~kamilwu] created: https://issues.apache.org/jira/browse/BEAM-8548 for future 
reference. Thanks!

> Send emails to builds@ when job ends with state "ABORTED"
> -
>
> Key: BEAM-8422
> URL: https://issues.apache.org/jira/browse/BEAM-8422
> Project: Beam
>  Issue Type: Bug
>  Components: build-system
>Reporter: Lukasz Gajowy
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> I noticed that jobs that time out end up in an ABORTED state (black dot on the 
> Jenkins dashboard). No email is sent to the builds@ list when this happens. 
> This reduces the visibility of the problem: anyone relying on builds@ won't 
> see that a Jenkins job takes too much time.
>  
> At the time of writing, 5 jobs are affected:
> beam_PostCommit_Java11_ValidatesRunner_Dataflow_PR
> beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow
> beam_PostCommit_Java_PVR_Spark_Batch
> beam_PostCommit_Python37_PR
> beam_sonarqube_report
>  
> I propose changing this behavior and sending emails to builds@ when a job ends 
> with the ABORTED state.
> The drawback of this solution is that every time someone aborts the job 
> manually the email will be sent too -  there's no way to distinguish those 
> two situations. However, IMO we should not allow timeouts to be unnoticed and 
> manual job aborting does not happen very often (even committers cannot do 
> that now in Jenkins). 





[jira] [Commented] (BEAM-8548) Provide separate Jenkins job instances for each triggering modes

2019-11-02 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16965516#comment-16965516
 ] 

Lukasz Gajowy commented on BEAM-8548:
-

CC: [~kamilwu]

> Provide separate Jenkins job instances for each triggering modes
> 
>
> Key: BEAM-8548
> URL: https://issues.apache.org/jira/browse/BEAM-8548
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>
> Currently, there are several Jenkins job definitions that can be run in 
> multiple ways (excluding manual job invocation from jenkins dashboard): 
>   - periodic invocation (cron)
>  - pre/post-commit
>  - phrase triggered invocation (on demand)
> I'd suggest we separate the single job that can be triggered many ways to 
> multiple job instances that can be triggered one way only. For an IOIT this 
> would look like this (example): 
>   -  
> [beam_PerformanceTests_MongoDBIO_IT|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_MongoDBIO_IT/]
>  (for the "cron" job version)
>  -  beam_PerformanceTest_MongoDBIO_IT_PR (the phrase triggered version)
>  
> *Why even do that?*
> This approach brings much more elasticity in terms of job configuration. For 
> example:
>   - we can stop sending emails to builds@ for jobs that are Phrase triggered 
> - phrase triggering is signalled on github so there's no need for an email. 
> builds@ could in turn be notified only for important reasons 
> (preCommit/postCommit fails, cron job fails). This was discussed in BEAM-8422.
>  - we can store metrics collected during testing in different db tables (or not 
> store them at all) so that the results from master and PR branches do not mix. 
> Ideally, when we look at the IOITs chart, we'd like to skip the results from 
> phrase-triggered job invocations and stick only to data collected from the master 
> branch (the cron job versions would do that). This was also discussed in 
> BEAM-6011.
>  - Some of the jobs already follow this approach, at least partially. Part of 
> the task would be to ensure that we are consistent in the naming and conventions that 
> we follow (_cron, _Pr/_phrase suffixes in the job names, more?). It would be 
> best to enforce the conventions programmatically using job builders and 
> a proper API written over the Groovy job DSL, so that it's impossible to 
> break the conventions when adding new jobs.
>  
>   





[jira] [Created] (BEAM-8548) Provide separate Jenkins job instances for each triggering modes

2019-11-02 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8548:
---

 Summary: Provide separate Jenkins job instances for each 
triggering modes
 Key: BEAM-8548
 URL: https://issues.apache.org/jira/browse/BEAM-8548
 Project: Beam
  Issue Type: Improvement
  Components: testing
Reporter: Lukasz Gajowy


Currently, there are several Jenkins job definitions that can be run in 
multiple ways (excluding manual job invocation from jenkins dashboard): 
  - periodic invocation (cron)

 - pre/post-commit

 - phrase triggered invocation (on demand)

I'd suggest we separate the single job that can be triggered many ways to 
multiple job instances that can be triggered one way only. For an IOIT this 
would look like this (example): 
  -  
[beam_PerformanceTests_MongoDBIO_IT|https://builds.apache.org/view/A-D/view/Beam/view/PerformanceTests/job/beam_PerformanceTests_MongoDBIO_IT/]
 (for the "cron" job version)

 -  beam_PerformanceTest_MongoDBIO_IT_PR (the phrase triggered version)

 

*Why even do that?*

This approach brings much more elasticity in terms of job configuration. For 
example:
  - we can stop sending emails to builds@ for jobs that are Phrase triggered - 
phrase triggering is signalled on github so there's no need for an email. 
builds@ could in turn be notified only for important reasons 
(preCommit/postCommit fails, cron job fails). This was discussed in BEAM-8422.

 - we can store metrics collected during testing in different db tables (or not 
store them at all) so that the results from master and PR branches do not mix. 
Ideally, when we look at the IOITs chart, we'd like to skip the results from 
phrase-triggered job invocations and stick only to data collected from the master 
branch (the cron job versions would do that). This was also discussed in BEAM-6011.

 - Some of the jobs already follow this approach, at least partially. Part of 
the task would be to ensure that we are consistent in the naming and conventions that 
we follow (_cron, _Pr/_phrase suffixes in the job names, more?). It would be 
best to enforce the conventions programmatically using job builders and a proper 
API written over the Groovy job DSL, so that it's impossible to break the 
conventions when adding new jobs.
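
The naming convention described above could be enforced by deriving every job name from one place. A rough sketch in plain Java rather than the Groovy job DSL; the class, the enum and the `_commit` suffix are assumptions made for illustration, and only `_cron` and `_PR` come from the discussion:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derives one job name per triggering mode from a base name.
final class JobNames {

  // Suffixes are assumptions; only "_cron" and "_PR" are mentioned in the issue.
  enum Trigger {
    CRON("_cron"),
    COMMIT("_commit"),
    PHRASE("_PR");

    final String suffix;

    Trigger(String suffix) {
      this.suffix = suffix;
    }
  }

  // Builds the name of the job instance for a single triggering mode.
  static String nameFor(String baseName, Trigger trigger) {
    return baseName + trigger.suffix;
  }

  // Lists the names of all job instances generated for one base job.
  static List<String> allNamesFor(String baseName) {
    List<String> names = new ArrayList<>();
    for (Trigger trigger : Trigger.values()) {
      names.add(nameFor(baseName, trigger));
    }
    return names;
  }

  public static void main(String[] args) {
    for (String name : allNamesFor("beam_PerformanceTests_MongoDBIO_IT")) {
      System.out.println(name);
    }
  }
}
```

Because a job author only supplies the base name, a suffix cannot be misspelled or forgotten when a new job is added.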

 

  





[jira] [Commented] (BEAM-6303) Add .parquet extension to files in ParquetIO

2019-10-29 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961871#comment-16961871
 ] 

Lukasz Gajowy commented on BEAM-6303:
-

I decided to improve the docs and do nothing else for this issue. PR submitted.

> Add .parquet extension to files in ParquetIO
> 
>
> Key: BEAM-6303
> URL: https://issues.apache.org/jira/browse/BEAM-6303
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-parquet
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A .parquet extension should be added by default when writing files with 
> ParquetIO





[jira] [Comment Edited] (BEAM-6303) Add .parquet extension to files in ParquetIO

2019-10-29 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961871#comment-16961871
 ] 

Lukasz Gajowy edited comment on BEAM-6303 at 10/29/19 10:31 AM:


I decided to improve the docs and do nothing else for this issue. PR submitted.


was (Author: łukaszg):
I decided to improve the docs and do nothing else for this issue. PR subbmitted.

> Add .parquet extension to files in ParquetIO
> 
>
> Key: BEAM-6303
> URL: https://issues.apache.org/jira/browse/BEAM-6303
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-parquet
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> A .parquet extension should be added by default when writing files with 
> ParquetIO





[jira] [Updated] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-10-29 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8432:

Status: Open  (was: Triage Needed)

> Parametrize source & target compatibility for beam Java modules
> ---
>
> Key: BEAM-8432
> URL: https://issues.apache.org/jira/browse/BEAM-8432
> Project: Beam
>  Issue Type: Improvement
>  Components: build-system
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Currently, the "javaVersion" property is hardcoded in BeamModulePlugin in 
> [JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].
> For the sake of migrating the project to Java 11, we could use a mechanism 
> that allows parametrizing the version from the command line, e.g.:
> {code:java}
> // this could set source and target compatibility to 11:
> ./gradlew clean build -PjavaVersion=11{code}
>  





[jira] [Commented] (BEAM-6303) Add .parquet extension to files in ParquetIO

2019-10-28 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-6303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961315#comment-16961315
 ] 

Lukasz Gajowy commented on BEAM-6303:
-

There is an easy way to provide the extension right now:
{code:java}
// "records" is assumed to be a PCollection<GenericRecord>
records.apply(
    FileIO.<GenericRecord>write()
        .via(ParquetIO.sink(SCHEMA))
        .to(filenamePrefix)
        .withSuffix(".parquet"));{code}
Other file IOs (TFRecordIO, AvroIO, TextIO) do not set a default extension 
either, and their javadoc comments likewise suggest using the withSuffix() method. 

If we don't want to decorate the sink with the suffix by wrapping it in 
PTransform (imho there's no need to do that for the suffix only), the best 
solution is to update the javadoc comment in ParquetIO.

> Add .parquet extension to files in ParquetIO
> 
>
> Key: BEAM-6303
> URL: https://issues.apache.org/jira/browse/BEAM-6303
> Project: Beam
>  Issue Type: Improvement
>  Components: io-java-parquet
>Reporter: Lukasz Gajowy
>Assignee: Lukasz Gajowy
>Priority: Major
>
> A .parquet extension should be added by default when writing files with 
> ParquetIO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (BEAM-8462) Upgrade supported Flink version to 1.9

2019-10-25 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy resolved BEAM-8462.
-
Fix Version/s: Not applicable
   Resolution: Fixed

> Upgrade supported Flink version to 1.9
> --
>
> Key: BEAM-8462
> URL: https://issues.apache.org/jira/browse/BEAM-8462
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
> Fix For: Not applicable
>
>  Time Spent: 5h 40m
>  Remaining Estimate: 0h
>
> Recently, Flink 1.9 support was introduced in Beam. However, the 
> load tests still use Flink 1.7. We should consider an upgrade.





[jira] [Commented] (BEAM-8422) Send emails to builds@ when job ends with state "ABORTED"

2019-10-24 Thread Lukasz Gajowy (Jira)


[ 
https://issues.apache.org/jira/browse/BEAM-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16959062#comment-16959062
 ] 

Lukasz Gajowy commented on BEAM-8422:
-

I looked more into possibilities of turning off emails for phrase triggered 
jobs. 

This would be possible for jobs that have their definitions "duplicated" for PR 
triggers (with the _PR suffix) and cron jobs. We could deal with the problem at the 
job definition level and that's it: "*_PR" jobs would skip mailers and the others 
wouldn't.

The problem is with jobs that have one definition (we still have some of them) 
- we cannot dynamically set mailers based on env variables (like BUILD_CAUSE 
variable) or some other conditions that we do not know upfront. Since build 
cause is known only after the job is defined in Jenkins this is impossible to 
do.

IMO, at least for now, we should enable emails for all aborted jobs. Once we 
have all jobs duplicated as described above we can easily improve this.  
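
Once all jobs are split as described, the mailer decision could be made from the job name alone. A minimal sketch of such a rule, assuming a policy class of our own (`BuildsMailPolicy`, its `Result` enum and both methods are hypothetical and do not correspond to any Jenkins or Beam API):

```java
// Hypothetical sketch: decides at job-definition level whether builds@ is notified.
final class BuildsMailPolicy {

  // Simplified set of build results; Jenkins itself defines more.
  enum Result {
    SUCCESS,
    FAILURE,
    ABORTED
  }

  // Phrase-triggered jobs are recognised by name only, because the build cause
  // is not available when the job is defined.
  static boolean isPhraseTriggered(String jobName) {
    return jobName.endsWith("_PR");
  }

  // Jobs with the _PR suffix never notify; other jobs notify on failures and aborts.
  static boolean shouldNotify(String jobName, Result result) {
    if (isPhraseTriggered(jobName)) {
      return false;
    }
    return result == Result.FAILURE || result == Result.ABORTED;
  }

  public static void main(String[] args) {
    System.out.println(shouldNotify("beam_PostCommit_Python37_PR", Result.ABORTED));
    System.out.println(shouldNotify("beam_PostCommit_Java_PVR_Spark_Batch", Result.ABORTED));
  }
}
```

The rule depends only on information known when the job is defined, which is the constraint described above for jobs with a single definition.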

> Send emails to builds@ when job ends with state "ABORTED"
> -
>
> Key: BEAM-8422
> URL: https://issues.apache.org/jira/browse/BEAM-8422
> Project: Beam
>  Issue Type: Improvement
>  Components: testing
>Reporter: Lukasz Gajowy
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> I noticed that jobs that time out end up in an ABORTED state (black dot on the 
> Jenkins dashboard). No email is sent to the builds@ list when this happens. 
> This reduces the visibility of the problem: anyone relying on builds@ won't 
> see that a Jenkins job takes too much time.
>  
> At the time of writing, 5 jobs are affected:
> beam_PostCommit_Java11_ValidatesRunner_Dataflow_PR
> beam_PostCommit_Java11_ValidatesRunner_PortabilityApi_Dataflow
> beam_PostCommit_Java_PVR_Spark_Batch
> beam_PostCommit_Python37_PR
> beam_sonarqube_report
>  
> I propose changing this behavior and sending emails to builds@ when a job ends 
> with the ABORTED state.
> The drawback of this solution is that every time someone aborts the job 
> manually the email will be sent too -  there's no way to distinguish those 
> two situations. However, IMO we should not allow timeouts to be unnoticed and 
> manual job aborting does not happen very often (even committers cannot do 
> that now in Jenkins). 





[jira] [Updated] (BEAM-8462) Upgrade supported Flink version to 1.9

2019-10-24 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8462:

Status: Open  (was: Triage Needed)

> Upgrade supported Flink version to 1.9
> --
>
> Key: BEAM-8462
> URL: https://issues.apache.org/jira/browse/BEAM-8462
> Project: Beam
>  Issue Type: Sub-task
>  Components: testing
>Reporter: Kamil Wasilewski
>Assignee: Kamil Wasilewski
>Priority: Major
>
> Recently, Flink 1.9 support was introduced in Beam. However, the 
> load tests still use Flink 1.7. We should consider an upgrade.





[jira] [Updated] (BEAM-8207) KafkaIOITs generate different hashes each run, sometimes dropping records

2019-10-21 Thread Lukasz Gajowy (Jira)


 [ 
https://issues.apache.org/jira/browse/BEAM-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lukasz Gajowy updated BEAM-8207:

Status: Open  (was: Triage Needed)

> KafkaIOITs generate different hashes each run, sometimes dropping records
> -
>
> Key: BEAM-8207
> URL: https://issues.apache.org/jira/browse/BEAM-8207
> Project: Beam
>  Issue Type: Bug
>  Components: io-java-kafka, testing
>Reporter: Michal Walenia
>Priority: Major
>
> While working to adapt Java's KafkaIOIT to work with a large dataset 
> generated by a SyntheticSource I encountered a problem. I want to push 100M 
> records through a Kafka topic, verify data correctness and at the same time 
> check the performance of KafkaIO.Write and KafkaIO.Read.
>  
> To perform the tests I'm using a Kafka cluster on Kubernetes from the Beam 
> repo 
> ([here|https://github.com/apache/beam/tree/master/.test-infra/kubernetes/kafka-cluster]).
>  
> The expected result would be that first the records are generated in a 
> deterministic way (using hashes of list positions as Random seeds), next they 
> are written to Kafka - this concludes the write pipeline.
> As for reading and correctness checking - first, the data is read from the 
> topic and after being decoded into String representations, a hashcode of the 
> whole PCollection is calculated (For details, check KafkaIOIT.java).
>  
> During the testing I ran into several problems:
> 1. When all the records are read from the Kafka topic, the hash is different 
> each time.
> 2. Sometimes not all the records are read and the Dataflow task waits for the 
> input indefinitely, occasionally throwing exceptions.
>  
> I believe there are two possible causes of this behavior:
>  
> either there is something wrong with the Kafka cluster configuration
> or KafkaIO behaves erratically on high data volumes, duplicating and/or 
> dropping records.
> The second option seems troubling, and I would be grateful for help with the first.
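
The verification idea in the description can be illustrated with a small standalone sketch. This is not the KafkaIOIT code: `DeterministicRecords`, `recordAt` and `collectionHash` are hypothetical names, and the sum of per-record hash codes is just one simple order-independent hash chosen for the example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: deterministic records plus an order-independent collection hash.
final class DeterministicRecords {

  // The record content depends only on its position, which seeds the generator.
  static String recordAt(long position) {
    Random random = new Random(Long.hashCode(position));
    return position + ":" + Long.toHexString(random.nextLong());
  }

  // Summing per-record hashes ignores ordering; a dropped or duplicated
  // record is very likely to change the result.
  static long collectionHash(Iterable<String> records) {
    long sum = 0;
    for (String record : records) {
      sum += record.hashCode();
    }
    return sum;
  }

  public static void main(String[] args) {
    List<String> records = new ArrayList<>();
    for (long i = 0; i < 1000; i++) {
      records.add(recordAt(i));
    }
    long expected = collectionHash(records);
    // Reordering simulates records arriving from the topic in a different order.
    Collections.shuffle(records);
    System.out.println(expected == collectionHash(records));
  }
}
```

If the generated data and the hash are both deterministic in this sense, a hash that changes between runs points at records being lost, duplicated or altered somewhere between the write and the read.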





[jira] [Created] (BEAM-8432) Parametrize source & target compatibility for beam Java modules

2019-10-18 Thread Lukasz Gajowy (Jira)
Lukasz Gajowy created BEAM-8432:
---

 Summary: Parametrize source & target compatibility for beam Java 
modules
 Key: BEAM-8432
 URL: https://issues.apache.org/jira/browse/BEAM-8432
 Project: Beam
  Issue Type: Improvement
  Components: build-system
Reporter: Lukasz Gajowy
Assignee: Lukasz Gajowy
 Fix For: Not applicable


Currently, the "javaVersion" property is hardcoded in BeamModulePlugin in 
[JavaNatureConfiguration|https://github.com/apache/beam/blob/master/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L82].

For the sake of migrating the project to Java 11, we could use a mechanism that 
allows parametrizing the version from the command line, e.g.:
{code:java}
// this could set source and target compatibility to 11:

./gradlew clean build -PjavaVersion=11{code}
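
The lookup behind such a flag can be sketched as follows. This is plain Java rather than the Groovy used in BeamModulePlugin; the class name, the map standing in for Gradle project properties and the "1.8" default are all assumptions made for illustration:

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: picks the Java version from project properties, with a fallback.
final class JavaVersionProperty {

  // Assumed default, used when -PjavaVersion is not passed.
  static final String DEFAULT_VERSION = "1.8";

  // Returns the requested version if present and non-empty, otherwise the default.
  static String resolve(Map<String, String> projectProperties) {
    String requested = projectProperties.get("javaVersion");
    if (requested == null || requested.isEmpty()) {
      return DEFAULT_VERSION;
    }
    return requested;
  }

  public static void main(String[] args) {
    // Corresponds to running the build without and with -PjavaVersion=11.
    System.out.println(resolve(Collections.emptyMap()));
    System.out.println(resolve(Collections.singletonMap("javaVersion", "11")));
  }
}
```

The resolved value would then be applied to both source and target compatibility, so that a build without the flag behaves exactly as before.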
 




