[jira] [Updated] (BEAM-1188) More Verifiers For Python E2E Tests
[ https://issues.apache.org/jira/browse/BEAM-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-1188: --- Description: Add more basic verifiers to e2e tests to verify output data in different storage/filesystems: 1. File verifier: compute and verify the checksum of file(s) stored on a filesystem (GCS / local fs). 2. BigQuery verifier: query a BigQuery table and verify the response content. Also update TestOptions.on_success_matcher to accept a list of matchers instead of a single one. Note: Add retries when doing IO to avoid test flakiness that may come from filesystem inconsistency. This problem happened in Java integration tests. > More Verifiers For Python E2E Tests > --- > > Key: BEAM-1188 > URL: https://issues.apache.org/jira/browse/BEAM-1188 > Project: Beam > Issue Type: Task > Components: sdk-py, testing > Reporter: Mark Liu > Assignee: Mark Liu > > Add more basic verifiers to e2e tests to verify output data in different > storage/filesystems: > 1. File verifier: compute and verify the checksum of file(s) stored on a > filesystem (GCS / local fs). > 2. BigQuery verifier: query a BigQuery table and verify the response content. > Also update TestOptions.on_success_matcher to accept a list of matchers > instead of a single one. > Note: Add retries when doing IO to avoid test flakiness that may come from > filesystem inconsistency. This problem happened in Java integration > tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
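The file verifier described above can be sketched as follows. This is an illustrative stand-in, not Beam's actual matcher API: the function names, the order-insensitive SHA-1 scheme, and the retry policy are all assumptions made for the sketch.

```python
# Illustrative sketch of a file checksum verifier with IO retries;
# names and retry policy are assumptions, not Beam's actual API.
import glob
import hashlib
import time


def compute_checksum(lines):
    """SHA-1 over the sorted output lines, so shard order doesn't matter."""
    sha1 = hashlib.sha1()
    for line in sorted(lines):
        sha1.update(line.encode('utf-8'))
    return sha1.hexdigest()


def verify_file_checksum(file_pattern, expected_checksum,
                         max_retries=4, pause_secs=1.0):
    """Read all files matching the pattern and compare checksums.

    Retries the glob + read, as the issue suggests, to tolerate
    eventually consistent filesystems.
    """
    for _ in range(max_retries):
        paths = glob.glob(file_pattern)
        if paths:
            lines = []
            for path in paths:
                with open(path) as f:
                    lines.extend(f.read().splitlines())
            if compute_checksum(lines) == expected_checksum:
                return True
        time.sleep(pause_secs)
    return False
```

A BigQuery verifier would follow the same shape, with the glob + read step replaced by a query and the checksum taken over the response rows.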
[jira] [Updated] (BEAM-1188) More Verifiers For Python E2E Tests
[ https://issues.apache.org/jira/browse/BEAM-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-1188: --- Description: Add more basic verifiers to e2e tests to verify output data in different storage/filesystems: 1. File verifier: compute and verify the checksum of file(s) stored on a filesystem (GCS / local fs). 2. BigQuery verifier: query a BigQuery table and verify the response content. ... Also update TestOptions.on_success_matcher to accept a list of matchers instead of a single one. Note: Add retries when doing IO to avoid test flakiness that may come from filesystem inconsistency. This problem happened in Java integration tests. was: Add more basic verifiers to e2e tests to verify output data in different storage/filesystems: 1. File verifier: compute and verify the checksum of file(s) stored on a filesystem (GCS / local fs). 2. BigQuery verifier: query a BigQuery table and verify the response content. Also update TestOptions.on_success_matcher to accept a list of matchers instead of a single one. Note: Add retries when doing IO to avoid test flakiness that may come from filesystem inconsistency. This problem happened in Java integration tests. > More Verifiers For Python E2E Tests > --- > > Key: BEAM-1188 > URL: https://issues.apache.org/jira/browse/BEAM-1188 > Project: Beam > Issue Type: Task > Components: sdk-py, testing > Reporter: Mark Liu > Assignee: Mark Liu > > Add more basic verifiers to e2e tests to verify output data in different > storage/filesystems: > 1. File verifier: compute and verify the checksum of file(s) stored on a > filesystem (GCS / local fs). > 2. BigQuery verifier: query a BigQuery table and verify the response content. > ... > Also update TestOptions.on_success_matcher to accept a list of matchers > instead of a single one. > Note: Add retries when doing IO to avoid test flakiness that may come from > filesystem inconsistency. This problem happened in Java integration > tests. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1188) More Verifiers For Python E2E Tests
Mark Liu created BEAM-1188: -- Summary: More Verifiers For Python E2E Tests Key: BEAM-1188 URL: https://issues.apache.org/jira/browse/BEAM-1188 Project: Beam Issue Type: Task Components: sdk-py, testing Reporter: Mark Liu Assignee: Mark Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-1091) Python E2E WordCount test in Precommit
[ https://issues.apache.org/jira/browse/BEAM-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-1091: --- Summary: Python E2E WordCount test in Precommit (was: E2E WordCount test in Precommit) > Python E2E WordCount test in Precommit > -- > > Key: BEAM-1091 > URL: https://issues.apache.org/jira/browse/BEAM-1091 > Project: Beam > Issue Type: Task > Components: sdk-py, testing > Reporter: Mark Liu > Assignee: Mark Liu > > We want to include some e2e tests in precommit in order to catch bugs earlier, > instead of breaking postcommit so often. > Matching what we have in postcommit, we want the same WordCount test in precommit, > executed by DirectPipelineRunner and DataflowRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-1124) Python ValidateRunner Test test_multi_valued_singleton_side_input Break Postcommit
[ https://issues.apache.org/jira/browse/BEAM-1124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-1124: --- Description: The Python test_multi_valued_singleton_side_input test, a ValidatesRunner test that runs on the Dataflow service, failed and broke postcommit (https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/853/). Here is the stack trace:
{code}
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/dataflow_test.py", line 186, in test_multi_valued_singleton_side_input
    pipeline.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/pipeline.py", line 159, in run
    return self.runner.run(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/dataflow_runner.py", line 195, in run
    % getattr(self, 'last_error_msg', None), self.result)
DataflowRuntimeException: Dataflow pipeline failed: (99aeafa7a8dffcc7): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 514, in do_work
    work_executor.execute()
  File "dataflow_worker/executor.py", line 892, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:24008)
    op.start()
  File "dataflow_worker/executor.py", line 456, in dataflow_worker.executor.DoOperation.start (dataflow_worker/executor.c:13870)
    def start(self):
  File "dataflow_worker/executor.py", line 483, in dataflow_worker.executor.DoOperation.start (dataflow_worker/executor.c:13685)
    self.dofn_runner = common.DoFnRunner(
  File "apache_beam/runners/common.py", line 89, in apache_beam.runners.common.DoFnRunner.__init__ (apache_beam/runners/common.c:3469)
    args, kwargs, [side_input[global_window]
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/sideinputs.py", line 192, in __getitem__
    _FilteringIterable(self._iterable, target_window), self._view_options)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pvalue.py", line 279, in _from_runtime_iterable
    'PCollection with more than one element accessed as '
ValueError: PCollection with more than one element accessed as a singleton view.
{code}
Worker logs are here: https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/853/console In order to temporarily ignore this test in postcommit, we can comment out the "@attr('ValidatesRunner')" annotation of this test. Then it will only run as a unit test (executed by DirectRunner), not as a ValidatesRunner test. was: The Python test_multi_valued_singleton_side_input test, a ValidatesRunner test that runs on the Dataflow service, failed and broke postcommit. Here is the stack trace:
{code}
Traceback (most recent call last):
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/dataflow_test.py", line 186, in test_multi_valued_singleton_side_input
    pipeline.run()
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/pipeline.py", line 159, in run
    return self.runner.run(self)
  File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python_Verify/sdks/python/apache_beam/runners/dataflow_runner.py", line 195, in run
    % getattr(self, 'last_error_msg', None), self.result)
DataflowRuntimeException: Dataflow pipeline failed: (99aeafa7a8dffcc7): Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 514, in do_work
    work_executor.execute()
  File "dataflow_worker/executor.py", line 892, in dataflow_worker.executor.MapTaskExecutor.execute (dataflow_worker/executor.c:24008)
    op.start()
  File "dataflow_worker/executor.py", line 456, in dataflow_worker.executor.DoOperation.start (dataflow_worker/executor.c:13870)
    def start(self):
  File "dataflow_worker/executor.py", line 483, in dataflow_worker.executor.DoOperation.start (dataflow_worker/executor.c:13685)
    self.dofn_runner = common.DoFnRunner(
  File "apache_beam/runners/common.py", line 89, in apache_beam.runners.common.DoFnRunner.__init__ (apache_beam/runners/common.c:3469)
    args, kwargs, [side_input[global_window]
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/transforms/sideinputs.py", line 192, in __getitem__
    _FilteringIterable(self._iterable, target_window), self._view_options)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/pvalue.py", line 279, in _from_runtime_iterable
    'PCollection with more than one element accessed as '
ValueError: PCollection with more than one element accessed as a singleton view.
{code}
Worker logs are here: https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/853/console In order to temporarily ignore this test in postcommit, we can comment out the "@attr('ValidatesRunner')" annotation of this test.
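The failure above comes from a multi-element PCollection being consumed as a singleton side input. A minimal illustration of the check that raises this error (a simplified stand-in for the `_from_runtime_iterable` logic in the trace, not Beam's actual implementation):

```python
# Simplified stand-in for the singleton-view check seen in the stack
# trace above; not Beam's actual pvalue code.
def from_runtime_iterable(iterable, as_singleton):
    """Materialize a side input as an iterable or as a singleton.

    A singleton view is only valid when the backing PCollection
    contains exactly one element; otherwise the runtime raises.
    """
    if not as_singleton:
        return list(iterable)
    elements = list(iterable)
    if len(elements) > 1:
        raise ValueError(
            'PCollection with more than one element accessed as '
            'a singleton view.')
    return elements[0]
```

The test itself is expected to trigger exactly this ValueError; the postcommit breakage was in how the resulting pipeline failure surfaced on the Dataflow service.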
[jira] [Created] (BEAM-1112) Python E2E Integration Test Framework - Batch Only
Mark Liu created BEAM-1112: -- Summary: Python E2E Integration Test Framework - Batch Only Key: BEAM-1112 URL: https://issues.apache.org/jira/browse/BEAM-1112 Project: Beam Issue Type: Task Components: sdk-py, testing Reporter: Mark Liu Assignee: Mark Liu Parity with Java. Build an e2e integration test framework that can configure and run a batch pipeline with a specified test runner, wait for pipeline execution, and verify results with the given verifiers at the end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
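The run/wait/verify flow described above can be sketched as a small harness. Everything here is hypothetical (the class and function names are not Beam's actual API); it only shows the control flow the issue asks for:

```python
# Hypothetical sketch of the batch e2e test flow: run with a test
# runner, wait for completion, then apply every verifier in turn.
# Names are illustrative, not Beam's actual API.
class TestRunResult:
    def __init__(self, state, output):
        self.state = state
        self.output = output

    def wait_until_finish(self):
        # A real runner would block until the service reports a
        # terminal state; this stub finishes immediately.
        return self.state


def run_integration_test(run_pipeline, verifiers):
    """Run the pipeline, wait for it, then check every verifier."""
    result = run_pipeline()
    state = result.wait_until_finish()
    assert state == 'DONE', 'Pipeline finished in state %s' % state
    for verify in verifiers:
        assert verify(result), 'Verifier %r failed' % verify
    return result
```

Accepting a list of verifiers here mirrors the BEAM-1188 change to let TestOptions.on_success_matcher take multiple matchers.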
[jira] [Closed] (BEAM-1077) @ValidatesRunner test in Python postcommit
[ https://issues.apache.org/jira/browse/BEAM-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu closed BEAM-1077. -- Resolution: Done Fix Version/s: Not applicable > @ValidatesRunner test in Python postcommit > -- > > Key: BEAM-1077 > URL: https://issues.apache.org/jira/browse/BEAM-1077 > Project: Beam > Issue Type: Test > Components: sdk-py, testing >Reporter: Mark Liu >Assignee: Mark Liu > Fix For: Not applicable > > > Modify run_postcommit.sh to have @ValidatesRunner tests running on service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-1109) Python ValidatesRunner Tests on Dataflow Service Timeout
[ https://issues.apache.org/jira/browse/BEAM-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15731495#comment-15731495 ] Mark Liu commented on BEAM-1109: Actually the current ValidatesRunner tests are not running in parallel, since the parallel config is missing in this context. Also, a unique job_name should be assigned to each pipeline job instead of specifying a unique name in the test execution command. In order to get postcommit back to green soon, let's remove the parallel config first and add it back later once fully tested. > Python ValidatesRunner Tests on Dataflow Service Timeout > > > Key: BEAM-1109 > URL: https://issues.apache.org/jira/browse/BEAM-1109 > Project: Beam > Issue Type: Bug > Components: sdk-py, testing > Reporter: Mark Liu > Assignee: Mark Liu > > ValidatesRunner tests time out with the following logs: > https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/839/console > Need to increase "--process-timeout" in the execution command > (https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/run_postcommit.sh#L77). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1109) Python ValidatesRunner Tests on Dataflow Service Timeout
Mark Liu created BEAM-1109: -- Summary: Python ValidatesRunner Tests on Dataflow Service Timeout Key: BEAM-1109 URL: https://issues.apache.org/jira/browse/BEAM-1109 Project: Beam Issue Type: Bug Components: sdk-py, testing Reporter: Mark Liu Assignee: Mark Liu ValidatesRunner tests time out with the following logs: https://builds.apache.org/view/Beam/job/beam_PostCommit_Python_Verify/839/console Need to increase "--process-timeout" in the execution command (https://github.com/apache/incubator-beam/blob/python-sdk/sdks/python/run_postcommit.sh#L77). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-1091) E2E WordCount test in Precommit
Mark Liu created BEAM-1091: -- Summary: E2E WordCount test in Precommit Key: BEAM-1091 URL: https://issues.apache.org/jira/browse/BEAM-1091 Project: Beam Issue Type: Task Components: sdk-py, testing Reporter: Mark Liu Assignee: Mark Liu We want to include some e2e tests in precommit in order to catch bugs earlier, instead of breaking postcommit so often. Matching what we have in postcommit, we want the same WordCount test in precommit, executed by DirectPipelineRunner and DataflowRunner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (BEAM-719) Run WindowedWordCount Integration Test in Spark
[ https://issues.apache.org/jira/browse/BEAM-719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu reassigned BEAM-719: - Assignee: Mark Liu (was: Amit Sela) > Run WindowedWordCount Integration Test in Spark > --- > > Key: BEAM-719 > URL: https://issues.apache.org/jira/browse/BEAM-719 > Project: Beam > Issue Type: Improvement > Components: testing >Reporter: Mark Liu >Assignee: Mark Liu > > The purpose of running WindowedWordCountIT in Spark is to have a streaming > test pipeline running in Jenkins pre-commit using TestSparkRunner. > More discussion happened here: > https://github.com/apache/incubator-beam/pull/1045#issuecomment-251531770 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-605) Create BigQuery Verifier
[ https://issues.apache.org/jira/browse/BEAM-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu resolved BEAM-605. --- Resolution: Done Fix Version/s: Not applicable > Create BigQuery Verifier > > > Key: BEAM-605 > URL: https://issues.apache.org/jira/browse/BEAM-605 > Project: Beam > Issue Type: Task > Components: testing > Affects Versions: Not applicable > Reporter: Mark Liu > Assignee: Mark Liu > Fix For: Not applicable > > > Create a BigQuery verifier that is used to verify the output of integration > tests that use BigQuery as the output destination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
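A BigQuery verifier of the kind described above can be sketched without a real client by injecting the query runner. The function names and the checksum-over-rows scheme are assumptions for illustration, not the actual verifier's API:

```python
# Illustrative sketch of a BigQuery result verifier: run a query, then
# compare an order-insensitive checksum of the rows against an expected
# value. The query runner is injected so no real BigQuery client is
# needed; names are assumptions.
import hashlib


def checksum_of_rows(rows):
    """Order-insensitive SHA-1 over the string form of each row."""
    sha1 = hashlib.sha1()
    for row in sorted(str(r) for r in rows):
        sha1.update(row.encode('utf-8'))
    return sha1.hexdigest()


def verify_query_result(run_query, query, expected_checksum):
    """Execute the query and verify the response content by checksum."""
    rows = run_query(query)
    return checksum_of_rows(rows) == expected_checksum
```

Checksumming sorted rows makes the verification independent of the order in which BigQuery returns results.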
[jira] [Commented] (BEAM-747) Text checksum verifier is not resilient to eventually consistent filesystems
[ https://issues.apache.org/jira/browse/BEAM-747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576630#comment-15576630 ] Mark Liu commented on BEAM-747: --- Yes, it's worth having retries in file path matching and reading in order to handle IO failures from the filesystem and special cases like no file being found. As for example 2, one place to add the sharding name template is the output path argument passed to FileChecksumMatcher. Instead of using ".../result*", we can use ".../result*-of-*". This can avoid reading irrelevant files, but can't guarantee all shards are read unless the total number of shards is given. My current thought is to pass the number of shards from the command line as an optional test option, then pass it to the verifier. Not sure if we have a better way to do that, since from previous test results I found that the number of shards is runner dependent. > Text checksum verifier is not resilient to eventually consistent filesystems > > > Key: BEAM-747 > URL: https://issues.apache.org/jira/browse/BEAM-747 > Project: Beam > Issue Type: Bug > Components: testing > Affects Versions: Not applicable > Reporter: Daniel Halperin > Assignee: Mark Liu > > Example 1: > https://builds.apache.org/job/beam_PreCommit_MavenVerify/3934/org.apache.beam$beam-examples-java/console > Here it looks like we need to retry listing files, at least a little bit, if > none are found. 
They did show up: > {code} > gsutil ls > gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results\* > gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-0-of-3 > gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-1-of-3 > gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-13-12-37-02-467/output/results-2-of-3 > {code} > Example 2: > https://builds.apache.org/job/beam_PostCommit_MavenVerify/org.apache.beam$beam-examples-java/1525/testReport/junit/org.apache.beam.examples/WordCountIT/testE2EWordCount/ > Here it looks like we need to fill in the shard template if the filesystem > does not give us a consistent result: > {code} > Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher > readLines > INFO: [0 of 1] Read 162 lines from file: > gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-0-of-3 > Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher > readLines > INFO: [1 of 1] Read 144 lines from file: > gs://temp-storage-for-end-to-end-tests/WordCountIT-2016-10-14-00-25-55-609/output/results-2-of-3 > Oct 14, 2016 12:31:16 AM org.apache.beam.sdk.testing.FileChecksumMatcher > matchesSafely > INFO: Generated checksum for output data: > aec68948b2515e6ea35fd1ed7649c267a10a01e5 > {code} > We missed shard 1-of-3 and hence got the wrong checksum. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
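The comment above suggests combining the `*-of-*` shard template with an expected shard count. A hedged sketch of how a verifier might wait until every expected shard is visible (the shard-name format follows the Beam-style `prefix-SSSSS-of-NNNNN` seen in the logs; the function names and retry numbers are assumptions):

```python
# Sketch of shard-aware output checking, assuming Beam-style shard
# naming "prefix-SSSSS-of-NNNNN"; retry policy is illustrative.
import glob
import os
import time


def expected_shard_names(prefix, num_shards):
    """e.g. results-00000-of-00003 ... results-00002-of-00003."""
    return {
        '%s-%05d-of-%05d' % (prefix, i, num_shards)
        for i in range(num_shards)
    }


def wait_for_all_shards(output_dir, prefix, num_shards,
                        max_retries=4, pause_secs=1.0):
    """Retry listing until every expected shard file is present."""
    expected = expected_shard_names(prefix, num_shards)
    found = set()
    for _ in range(max_retries):
        found = {
            os.path.basename(p)
            for p in glob.glob(os.path.join(output_dir, prefix + '-*-of-*'))
        }
        if expected <= found:
            return sorted(expected)
        time.sleep(pause_secs)
    raise IOError('Missing shards: %s' % sorted(expected - found))
```

This addresses example 2 directly: with the shard count known, a listing that misses `1-of-3` is retried instead of silently producing the wrong checksum.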
[jira] [Created] (BEAM-720) Running WindowedWordCount Integration Test in Flink
Mark Liu created BEAM-720: - Summary: Running WindowedWordCount Integration Test in Flink Key: BEAM-720 URL: https://issues.apache.org/jira/browse/BEAM-720 Project: Beam Issue Type: Improvement Reporter: Mark Liu Assignee: Aljoscha Krettek In order to have coverage of streaming pipeline test in pre-commit, it's important to have TestFlinkRunner to be able to run WindowedWordCountIT successfully. Relevant works in TestDataflowRunner: https://github.com/apache/incubator-beam/pull/1045 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-719) Running WindowedWordCount Integration Test in Spark
Mark Liu created BEAM-719: - Summary: Running WindowedWordCount Integration Test in Spark Key: BEAM-719 URL: https://issues.apache.org/jira/browse/BEAM-719 Project: Beam Issue Type: Improvement Reporter: Mark Liu Assignee: Amit Sela The purpose of running WindowedWordCountIT in Spark is to have a streaming test pipeline running in Jenkins pre-commit using TestSparkRunner. More discussion happened here: https://github.com/apache/incubator-beam/pull/1045#issuecomment-251531770 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-495) Generalize FileChecksumMatcher used for all E2E test
[ https://issues.apache.org/jira/browse/BEAM-495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu closed BEAM-495. - > Generalize FileChecksumMatcher used for all E2E test > > > Key: BEAM-495 > URL: https://issues.apache.org/jira/browse/BEAM-495 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu > Fix For: Not applicable > > > Refactor WordCountOnSuccessMatcher to be more general so that it can be > reused by other tests. > Requirement: > Given input file path (accept glob) and expected checksum, generate checksum > of file(s) and verify with expected. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (BEAM-624) NoClassDefFoundError Failed Dataflow Streaming Job
[ https://issues.apache.org/jira/browse/BEAM-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu resolved BEAM-624. --- Resolution: Fixed Fix Version/s: Not applicable > NoClassDefFoundError Failed Dataflow Streaming Job > -- > > Key: BEAM-624 > URL: https://issues.apache.org/jira/browse/BEAM-624 > Project: Beam > Issue Type: Bug > Reporter: Mark Liu > Assignee: Mark Liu > Fix For: Not applicable > > > NoClassDefFoundError when running a streaming job on the Dataflow service. > Full stacktrace here: > {code} > (bb04a33113307d77): Exception: java.lang.NoClassDefFoundError: > org/apache/beam/sdk/util/AttemptBoundedExponentialBackOff > com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources$UnboundedReaderIterator.advance(WorkerCustomSources.java:694) > > com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:371) > > com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:198) > > com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:139) > > com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:71) > > com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:660) > > com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$500(StreamingDataflowWorker.java:89) > > com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:487) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > {code} > Restore the AttemptBounded/AttemptAndTimeBounded backoff classes, which were removed > in commit [1], in order to pass the Dataflow streaming job at HEAD. 
> [1] > https://github.com/apache/incubator-beam/commit/dbbcbe604e167b306feac2443bec85f2da3c1dd6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (BEAM-624) NoClassDefFoundError Failed Dataflow Streaming Job
[ https://issues.apache.org/jira/browse/BEAM-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu closed BEAM-624. - > NoClassDefFoundError Failed Dataflow Streaming Job > -- > > Key: BEAM-624 > URL: https://issues.apache.org/jira/browse/BEAM-624 > Project: Beam > Issue Type: Bug > Reporter: Mark Liu > Assignee: Mark Liu > Fix For: Not applicable > > > NoClassDefFoundError when running a streaming job on the Dataflow service. > Full stacktrace here: > {code} > (bb04a33113307d77): Exception: java.lang.NoClassDefFoundError: > org/apache/beam/sdk/util/AttemptBoundedExponentialBackOff > com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources$UnboundedReaderIterator.advance(WorkerCustomSources.java:694) > > com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:371) > > com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:198) > > com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:139) > > com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:71) > > com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:660) > > com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$500(StreamingDataflowWorker.java:89) > > com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:487) > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > {code} > Restore the AttemptBounded/AttemptAndTimeBounded backoff classes, which were removed > in commit [1], in order to pass the Dataflow streaming job at HEAD. 
> [1] > https://github.com/apache/incubator-beam/commit/dbbcbe604e167b306feac2443bec85f2da3c1dd6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-624) NoClassDefFoundError Failed Dataflow Streaming Job
Mark Liu created BEAM-624: - Summary: NoClassDefFoundError Failed Dataflow Streaming Job Key: BEAM-624 URL: https://issues.apache.org/jira/browse/BEAM-624 Project: Beam Issue Type: Bug Reporter: Mark Liu Assignee: Mark Liu NoClassDefFoundError when running a streaming job on the Dataflow service. Full stacktrace here:
{code}
(bb04a33113307d77): Exception: java.lang.NoClassDefFoundError: org/apache/beam/sdk/util/AttemptBoundedExponentialBackOff
com.google.cloud.dataflow.worker.runners.worker.WorkerCustomSources$UnboundedReaderIterator.advance(WorkerCustomSources.java:694)
com.google.cloud.dataflow.worker.util.common.worker.ReadOperation$SynchronizedReaderIterator.advance(ReadOperation.java:371)
com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.runReadLoop(ReadOperation.java:198)
com.google.cloud.dataflow.worker.util.common.worker.ReadOperation.start(ReadOperation.java:139)
com.google.cloud.dataflow.worker.util.common.worker.MapTaskExecutor.execute(MapTaskExecutor.java:71)
com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.process(StreamingDataflowWorker.java:660)
com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker.access$500(StreamingDataflowWorker.java:89)
com.google.cloud.dataflow.worker.runners.worker.StreamingDataflowWorker$6.run(StreamingDataflowWorker.java:487)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
{code}
Restore the AttemptBounded/AttemptAndTimeBounded backoff classes, which were removed in commit [1], in order to pass the Dataflow streaming job at HEAD. [1] https://github.com/apache/incubator-beam/commit/dbbcbe604e167b306feac2443bec85f2da3c1dd6 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-604: -- Description: Currently, a streaming job with bounded input can't be terminated automatically, and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration tests such as WindowedWordCountIT can run with it. Implementation: Query the watermark of each step and wait until all watermarks are set to MAX, then cancel the job. Update: As suggested by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all Dataflow streaming jobs with bounded input will take advantage of this change and be canceled automatically when watermarks reach the max value. Also, Dataflow runners can stay simple and free from handling the two batch and streaming cases separately. Update: The pipeline author should have control over whether and when to cancel a streaming job. The test framework is a better place to auto-cancel a streaming test job when certain conditions are met, rather than in waitUntilFinish(). was: Currently, a streaming job with bounded input can't be terminated automatically, and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration tests such as WindowedWordCountIT can run with it. Implementation: Query the watermark of each step and wait until all watermarks are set to MAX, then cancel the job. Update: As suggested by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all Dataflow streaming jobs with bounded input will take advantage of this change and be canceled automatically when watermarks reach the max value. Also, Dataflow runners can stay simple and free from handling the two batch and streaming cases separately. Update: 1. The pipeline author should have control over whether and when to cancel a streaming job. The ideal way to do this is:
{code}
job = pipeline.run();
job.waitUntilFinish();
job.cancel();
{code}
> Use Watermark Check Streaming Job Finish in TestDataflowRunner > -- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement > Reporter: Mark Liu > Assignee: Mark Liu > Priority: Minor > > Currently, a streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. Need to update > TestDataflowRunner so that streaming integration tests such as > WindowedWordCountIT can run with it. > Implementation: > Query the watermark of each step and wait until all watermarks are set to MAX, > then cancel the job. > Update: > As suggested by [~pei...@gmail.com], implement checkMaxWatermark in > DataflowPipelineJob#waitUntilFinish. Thus, all Dataflow streaming jobs with > bounded input will take advantage of this change and be canceled > automatically when watermarks reach the max value. Also, Dataflow runners can > stay simple and free from handling the two batch and streaming cases separately. > Update: > The pipeline author should have control over whether and when to cancel a streaming > job. The test framework is a better place to auto-cancel a streaming test job > when certain conditions are met, rather than in waitUntilFinish(). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
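The watermark-based termination described in this issue can be sketched as a polling loop. This is a hypothetical stand-in (the real logic lives in the Java TestDataflowRunner / DataflowPipelineJob, and the accessor name and polling numbers here are assumptions):

```python
# Hypothetical sketch of "wait until all step watermarks hit MAX,
# then cancel" for a streaming job with bounded input; not Beam's
# real API.
import time

MAX_WATERMARK = 2 ** 63 - 1  # stand-in for the max watermark value


def cancel_when_drained(job, poll_secs=1.0, max_polls=60):
    """Poll per-step watermarks; cancel once every step is at MAX."""
    for _ in range(max_polls):
        watermarks = job.get_step_watermarks()  # assumed accessor
        if watermarks and all(w >= MAX_WATERMARK for w in watermarks):
            job.cancel()
            return True
        time.sleep(poll_secs)
    return False
```

Placing this loop in the test framework rather than in waitUntilFinish() matches the final update above: the pipeline author keeps control over whether and when the job is canceled.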
[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-604: -- Description: Currently, a streaming job with bounded input can't be terminated automatically, and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration tests such as WindowedWordCountIT can run with it. Implementation: Query the watermark of each step and wait until all watermarks are set to MAX, then cancel the job. Update: As suggested by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all Dataflow streaming jobs with bounded input will take advantage of this change and be canceled automatically when watermarks reach the max value. Also, Dataflow runners can stay simple and free from handling the two batch and streaming cases separately. Update: 1. The pipeline author should have control over whether and when to cancel a streaming job. The ideal way to do this is:
{code}
job = pipeline.run();
job.waitUntilFinish();
job.cancel();
{code}
was: Currently, a streaming job with bounded input can't be terminated automatically, and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration tests such as WindowedWordCountIT can run with it. Implementation: Query the watermark of each step and wait until all watermarks are set to MAX, then cancel the job. Update: As suggested by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all Dataflow streaming jobs with bounded input will take advantage of this change and be canceled automatically when watermarks reach the max value. Also, Dataflow runners can stay simple and free from handling the two batch and streaming cases separately. Update: > Use Watermark Check Streaming Job Finish in TestDataflowRunner > -- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement > Reporter: Mark Liu > Assignee: Mark Liu > Priority: Minor > > Currently, a streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. Need to update > TestDataflowRunner so that streaming integration tests such as > WindowedWordCountIT can run with it. > Implementation: > Query the watermark of each step and wait until all watermarks are set to MAX, > then cancel the job. > Update: > As suggested by [~pei...@gmail.com], implement checkMaxWatermark in > DataflowPipelineJob#waitUntilFinish. Thus, all Dataflow streaming jobs with > bounded input will take advantage of this change and be canceled > automatically when watermarks reach the max value. Also, Dataflow runners can > stay simple and free from handling the two batch and streaming cases separately. > Update: > 1. The pipeline author should have control over whether and when to cancel a streaming job. > The ideal way to do this is: > {code} > job = pipeline.run(); > job.waitUntilFinish(); > job.cancel(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-604: -- Description: Currently, streaming job with bounded input can't be terminated automatically and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration test such as WindowedWordCountIT can run with it. Implementation: Query watermark of each step and wait until all watermarks set to MAX then cancel the job. Update: Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with bounded input will take advantage of this change and are canceled automatically when watermarks reach to max value. Also Dataflow runners can keep simple and free from handling batch and streaming two cases. Update: was: Currently, streaming job with bounded input can't be terminated automatically and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration test such as WindowedWordCountIT can run with it. Implementation: Query watermark of each step and wait until all watermarks set to MAX then cancel the job. Update: Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with bounded input will take advantage of this change and are canceled automatically when watermarks reach to max value. Also Dataflow runners can keep simple and free from handling batch and streaming two cases. > Use Watermark Check Streaming Job Finish in TestDataflowRunner > -- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Minor > > Currently, streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. 
Need to update > TestDataflowRunner so that streaming integration test such as > WindowedWordCountIT can run with it. > Implementation: > Query watermark of each step and wait until all watermarks set to MAX then > cancel the job. > Update: > Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in > DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with > bounded input will take advantage of this change and are canceled > automatically when watermarks reach to max value. Also Dataflow runners can > keep simple and free from handling batch and streaming two cases. > Update: -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-604: -- Summary: Use Watermark Check Streaming Job Finish in TestDataflowRunner (was: Use Watermark Check Streaming Job Finish in DataflowPipelineJob) > Use Watermark Check Streaming Job Finish in TestDataflowRunner > -- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Minor > > Currently, streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. Need to update > TestDataflowRunner so that streaming integration test such as > WindowedWordCountIT can run with it. > Implementation: > Query watermark of each step and wait until all watermarks set to MAX then > cancel the job. > Update: > Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in > DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with > bounded input will take advantage of this change and are canceled > automatically when watermarks reach to max value. Also Dataflow runners can > keep simple and free from handling batch and streaming two cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in DataflowPipelineJob
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-604: -- Description: Currently, streaming job with bounded input can't be terminated automatically and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration test such as WindowedWordCountIT can run with it. Implementation: Query watermark of each step and wait until all watermarks set to MAX then cancel the job. Update: Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with bounded input will take advantage of this change and are canceled automatically when watermarks reach to max value. Also Dataflow runners can keep simple and free from handling batch and streaming two cases. was: Currently, streaming job with bounded input can't be terminated automatically and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration test such as WindowedWordCountIT can run with it. Implementation: Query watermark of each step and wait until all watermarks set to MAX then cancel the job. Update: Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with bounded input will take advantage of this change and are canceled automatically when watermarks reach to max value. Also > Use Watermark Check Streaming Job Finish in DataflowPipelineJob > --- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Minor > > Currently, streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. Need to update > TestDataflowRunner so that streaming integration test such as > WindowedWordCountIT can run with it. 
> Implementation: > Query watermark of each step and wait until all watermarks set to MAX then > cancel the job. > Update: > Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in > DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with > bounded input will take advantage of this change and are canceled > automatically when watermarks reach to max value. Also Dataflow runners can > keep simple and free from handling batch and streaming two cases. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in DataflowPipelineJob
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-604: -- Description: Currently, streaming job with bounded input can't be terminated automatically and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration test such as WindowedWordCountIT can run with it. Implementation: Query watermark of each step and wait until all watermarks set to MAX then cancel the job. Update: Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in DataflowPipelineJob#waitUntilFinish. Thus, all dataflow streaming jobs with bounded input will take advantage of this change and are canceled automatically when watermarks reach to max value. Also was: Currently, streaming job with bounded input can't be terminated automatically and TestDataflowRunner can't handle this case. Need to update TestDataflowRunner so that streaming integration test such as WindowedWordCountIT can run with it. Implementation: Query watermark of each step and wait until all watermarks set to MAX then cancel the job. > Use Watermark Check Streaming Job Finish in DataflowPipelineJob > --- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Minor > > Currently, streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. Need to update > TestDataflowRunner so that streaming integration test such as > WindowedWordCountIT can run with it. > Implementation: > Query watermark of each step and wait until all watermarks set to MAX then > cancel the job. > Update: > Suggesting by [~pei...@gmail.com], implement checkMaxWatermark in > DataflowPipelineJob#waitUntilFinish. 
Thus, all dataflow streaming jobs with > bounded input will take advantage of this change and are canceled > automatically when watermarks reach to max value. Also -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-604) Use Watermark Check Streaming Job Finish in DataflowPipelineJob
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-604: -- Summary: Use Watermark Check Streaming Job Finish in DataflowPipelineJob (was: Use Watermark Check Streaming Job Finish in TestDataflowRunner ) > Use Watermark Check Streaming Job Finish in DataflowPipelineJob > --- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Minor > > Currently, streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. Need to update > TestDataflowRunner so that streaming integration test such as > WindowedWordCountIT can run with it. > Implementation: > Query watermark of each step and wait until all watermarks set to MAX then > cancel the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-572) Remove Spark references in WordCount
[ https://issues.apache.org/jira/browse/BEAM-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452754#comment-15452754 ] Mark Liu commented on BEAM-572: --- The PR is closed. This JIRA can be marked as resolved. > Remove Spark references in WordCount > > > Key: BEAM-572 > URL: https://issues.apache.org/jira/browse/BEAM-572 > Project: Beam > Issue Type: Bug > Components: examples-java >Reporter: Pei He >Assignee: Mark Liu > > Examples should be runner agnostic. > We don't want to have Spark references in > https://github.com/apache/incubator-beam/blob/master/examples/java/src/main/java/org/apache/beam/examples/WordCount.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-604) Use Watermark Check Streaming Job Finish in TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15449381#comment-15449381 ] Mark Liu commented on BEAM-604: --- Right. [~pei...@gmail.com] may have some context on this. > Use Watermark Check Streaming Job Finish in TestDataflowRunner > --- > > Key: BEAM-604 > URL: https://issues.apache.org/jira/browse/BEAM-604 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Minor > > Currently, streaming job with bounded input can't be terminated automatically > and TestDataflowRunner can't handle this case. Need to update > TestDataflowRunner so that streaming integration test such as > WindowedWordCountIT can run with it. > Implementation: > Query watermark of each step and wait until all watermarks set to MAX then > cancel the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-605) Create BigQuery Verifier
Mark Liu created BEAM-605: - Summary: Create BigQuery Verifier Key: BEAM-605 URL: https://issues.apache.org/jira/browse/BEAM-605 Project: Beam Issue Type: Task Reporter: Mark Liu Create a BigQuery verifier to verify the output of integration tests that use BigQuery as their output destination. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
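A minimal sketch of what such a verifier could check: run the query against the output table and compare the response rows with the expected content, ignoring row order. The query is injected as a plain Supplier so the sketch stays independent of any BigQuery client library; BigqueryResultVerifier and its method are invented names, not a real Beam class:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical shape of a BigQuery output verifier: issue the query (injected
// so no BigQuery client is needed here) and check that the response rows
// match the expected content, treating the result as an unordered set.
class BigqueryResultVerifier {
    static boolean matches(Supplier<List<String>> queryResponse, Set<String> expectedRows) {
        return new HashSet<>(queryResponse.get()).equals(expectedRows);
    }
}
```

A real verifier would also want retries around the query, since freshly written output may not be immediately visible.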
[jira] [Created] (BEAM-604) Use Watermark Check Streaming Job Finish in TestDataflowRunner
Mark Liu created BEAM-604: - Summary: Use Watermark Check Streaming Job Finish in TestDataflowRunner Key: BEAM-604 URL: https://issues.apache.org/jira/browse/BEAM-604 Project: Beam Issue Type: Bug Reporter: Mark Liu Assignee: Mark Liu Currently, a streaming job with bounded input can't be terminated automatically, and TestDataflowRunner can't handle this case. TestDataflowRunner needs to be updated so that streaming integration tests such as WindowedWordCountIT can run with it. Implementation: Query the watermark of each step and wait until all watermarks are set to MAX, then cancel the job. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-594) Support cancel() and waitUntilFinish() in FlinkRunnerResult
[ https://issues.apache.org/jira/browse/BEAM-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446359#comment-15446359 ] Mark Liu commented on BEAM-594: --- Duplicate of BEAM-593. > Support cancel() and waitUntilFinish() in FlinkRunnerResult > --- > > Key: BEAM-594 > URL: https://issues.apache.org/jira/browse/BEAM-594 > Project: Beam > Issue Type: New Feature > Components: runner-flink >Reporter: Pei He > > We introduced both functions to PipelineResult. > Currently, both of them throw UnsupportedOperationException in the Flink runner. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-592) StackOverflowError Failed example/java/WordCount When Using SparkRunner
[ https://issues.apache.org/jira/browse/BEAM-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-592: -- Description: WordCount (example/java/WordCount) fails when run with the SparkRunner using the following command:
{code}
mvn clean compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=/tmp/kinglear.txt --runner=SparkRunner --sparkMaster=local --tempLocation=/tmp/out"
{code}
Part of the stack trace follows:
{code}
Caused by: java.lang.StackOverflowError
at java.util.concurrent.CopyOnWriteArrayList.toArray(CopyOnWriteArrayList.java:374)
at java.util.logging.Logger.accessCheckedHandlers(Logger.java:1782)
at java.util.logging.LogManager$RootLogger.accessCheckedHandlers(LogManager.java:1668)
at java.util.logging.Logger.getHandlers(Logger.java:1776)
at java.util.logging.Logger.log(Logger.java:735)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:580)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:650)
at org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:224)
at org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:301)
at java.util.logging.Logger.log(Logger.java:738)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:580)
at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:650)
at org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:224)
at org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:301)
...
{code}
This matches [http://slf4j.org/legacy.html#jul-to-slf4j], in particular the section stating that "jul-to-slf4j.jar and slf4j-jdk14.jar cannot be present simultaneously". Changing the slf4j-jdk14 dependency scope to test solves the above problem, but WordCountIT still fails for the same reason.
was: WordCount failed running with sparkRunner in following command: {code} mvn clean compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount -Dexec.args="--inputFile=/tmp/kinglear.txt --runner=SparkRunner --sparkMaster=local --tempLocation=/tmp/out" {code} Following is part of stacktrace: {code} Caused by: java.lang.StackOverflowError at java.util.concurrent.CopyOnWriteArrayList.toArray(CopyOnWriteArrayList.java:374) at java.util.logging.Logger.accessCheckedHandlers(Logger.java:1782) at java.util.logging.LogManager$RootLogger.accessCheckedHandlers(LogManager.java:1668) at java.util.logging.Logger.getHandlers(Logger.java:1776) at java.util.logging.Logger.log(Logger.java:735) at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:580) at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:650) at org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:224) at org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:301) at java.util.logging.Logger.log(Logger.java:738) at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:580) at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:650) at org.slf4j.bridge.SLF4JBridgeHandler.callLocationAwareLogger(SLF4JBridgeHandler.java:224) at org.slf4j.bridge.SLF4JBridgeHandler.publish(SLF4JBridgeHandler.java:301) ... {code} According to [http://slf4j.org/legacy.html#jul-to-slf4j] and in particular section: "jul-to-slf4j.jar and slf4j-jdk14.jar cannot be present simultaneously". Change slf4j-jdk14 dependency scope to test solve the above problem, but WordCountIT still failed in same reason. 
> StackOverflowError Failed example/java/WordCount When Using SparkRunner > --- > > Key: BEAM-592 > URL: https://issues.apache.org/jira/browse/BEAM-592 > Project: Beam > Issue Type: Bug >Reporter: Mark Liu > > WordCount(example/java/WordCount) failed running with sparkRunner in > following command: > {code} > mvn clean compile exec:java > -Dexec.mainClass=org.apache.beam.examples.WordCount > -Dexec.args="--inputFile=/tmp/kinglear.txt --runner=SparkRunner > --sparkMaster=local --tempLocation=/tmp/out" > {code} > Following is part of stacktrace: > {code} > Caused by: java.lang.StackOverflowError > at > java.util.concurrent.CopyOnWriteArrayList.toArray(CopyOnWriteArrayList.java:374) > at java.util.logging.Logger.accessCheckedHandlers(Logger.java:1782) > at > java.util.logging.LogManager$RootLogger.accessCheckedHandlers(LogManager.java:1668) > at java.util.logging.Logger.getHandlers(Logger.java:1776) > at java.util.logging.Logger.log(Logger.java:735) > at org.slf4j.impl.JDK14LoggerAdapter.log(JDK14LoggerAdapter.java:580) > at
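The dependency-scope change described in this report would look roughly like the fragment below in the pom. This is a sketch, not the actual Beam pom; the version property is a placeholder:

```xml
<!-- Keep slf4j-jdk14 off the runtime classpath so it cannot coexist with
     jul-to-slf4j (the cycle in the stack trace above). Version is a
     placeholder for whatever the project's slf4j property resolves to. -->
<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>${slf4j.version}</version>
  <scope>test</scope>
</dependency>
```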
[jira] [Commented] (BEAM-583) Auto Register TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435790#comment-15435790 ] Mark Liu commented on BEAM-583: --- Makes sense to me. TestPipeline is mostly used in unit and functional tests, and people should really use it there. But once a pipeline is built and people want to test it end to end, TestYYYRunner + the IT framework is a better choice. > Auto Register TestDataflowRunner > - > > Key: BEAM-583 > URL: https://issues.apache.org/jira/browse/BEAM-583 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu > > Register TestDataflowRunner automatically. > Simplify the option's arguments when using TestDataflowRunner to run end-to-end tests > against the Dataflow service. Instead of using > {code}--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner{code}, > we can also use {code}--runner=TestDataflowRunner{code}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-583) Auto Register TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435760#comment-15435760 ] Mark Liu commented on BEAM-583: --- I can't think of a case where a pipeline author uses TestDataflowRunner directly; it is used to kick off an integration test like WordCountIT. So do we want to register only runners that pipeline authors use directly? > Auto Register TestDataflowRunner > - > > Key: BEAM-583 > URL: https://issues.apache.org/jira/browse/BEAM-583 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu > > Register TestDataflowRunner automatically. > Simplify the option's arguments when using TestDataflowRunner to run end-to-end tests > against the Dataflow service. Instead of using > {code}--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner{code}, > we can also use {code}--runner=TestDataflowRunner{code}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-583) Auto Register TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435739#comment-15435739 ] Mark Liu commented on BEAM-583: --- Using {code}--runner=TestDataflowRunner{code} causes the following error, which may confuse people:
{code}
java.lang.IllegalArgumentException: Unknown 'runner' specified 'TestDataflowRunner', supported pipeline runners [BlockingDataflowRunner, DataflowRunner, DirectRunner, FlinkRunner, SparkRunner, TestFlinkRunner, TestSparkRunner]
{code}
> Auto Register TestDataflowRunner > - > > Key: BEAM-583 > URL: https://issues.apache.org/jira/browse/BEAM-583 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu > > Register TestDataflowRunner automatically. > Simplify the option's arguments when using TestDataflowRunner to run end-to-end tests > against the Dataflow service. Instead of using > {code}--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner{code}, > we can also use {code}--runner=TestDataflowRunner{code}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (BEAM-583) Auto Register TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15435710#comment-15435710 ] Mark Liu commented on BEAM-583: --- I haven't figured out how to use --help, but according to the PipelineOptionsFactory code, only registered options will be listed. I intend to register TestDataflowRunner in DataflowPipelineRegistrar if that makes sense to you. > Auto Register TestDataflowRunner > - > > Key: BEAM-583 > URL: https://issues.apache.org/jira/browse/BEAM-583 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu > > Register TestDataflowRunner automatically. > Simplify the option's arguments when using TestDataflowRunner to run end-to-end tests > against the Dataflow service. Instead of using > {code}--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner{code}, > we can also use {code}--runner=TestDataflowRunner{code}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-583) Auto Register TestDataflowRunner
[ https://issues.apache.org/jira/browse/BEAM-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-583: -- Description: Register TestDataflowRunner automatically. Simplify option's arguments when using TestDataflowRunner run end-to-end test against dataflow service. Instead of using {code}--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner{code}, we can also use {code}--runner=TestDataflowRunner{code}. was: Register TestDataflowRunner automatically. Simplify option's arguments when using TestDataflowRunner run end-to-end test against dataflow service. Instead of using "--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner", we can also use "--runner=TestDataflowRunner". > Auto Register TestDataflowRunner > - > > Key: BEAM-583 > URL: https://issues.apache.org/jira/browse/BEAM-583 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu > > Register TestDataflowRunner automatically. > Simplify option's arguments when using TestDataflowRunner run end-to-end test > against dataflow service. Instead of using > {code}--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner{code}, > we can also use {code}--runner=TestDataflowRunner{code}. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-583) Auto Register TestDataflowRunner
Mark Liu created BEAM-583: - Summary: Auto Register TestDataflowRunner Key: BEAM-583 URL: https://issues.apache.org/jira/browse/BEAM-583 Project: Beam Issue Type: Improvement Reporter: Mark Liu Assignee: Mark Liu Register TestDataflowRunner automatically. Simplify the option's arguments when using TestDataflowRunner to run end-to-end tests against the Dataflow service. Instead of using "--runner=org.apache.beam.runners.dataflow.testing.TestDataflowRunner", we can also use "--runner=TestDataflowRunner". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
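Why registration enables the short form can be illustrated with a toy registry: the options factory can only match `--runner=TestDataflowRunner` against runner classes that some registrar has contributed to its supported set. All names below are illustrative; Beam's real lookup lives in PipelineOptionsFactory and uses ServiceLoader-based registrars such as DataflowPipelineRegistrar:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of runner-name resolution: a registrar contributes fully
// qualified runner class names, and the --runner flag may then be either the
// simple name or the fully qualified name. Unregistered names fail, which is
// exactly the "Unknown 'runner' specified" error quoted in the comments.
class RunnerRegistry {
    private final Map<String, String> simpleNameToClass = new LinkedHashMap<>();

    /** What a registrar would contribute for one runner class. */
    void register(String fullyQualifiedName) {
        String simple = fullyQualifiedName.substring(fullyQualifiedName.lastIndexOf('.') + 1);
        simpleNameToClass.put(simple, fullyQualifiedName);
    }

    /** Accepts either the simple name or the fully qualified name. */
    String resolve(String runnerArg) {
        if (simpleNameToClass.containsValue(runnerArg)) return runnerArg;
        String fqn = simpleNameToClass.get(runnerArg);
        if (fqn == null) {
            throw new IllegalArgumentException("Unknown 'runner' specified '" + runnerArg + "'");
        }
        return fqn;
    }
}
```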
[jira] [Assigned] (BEAM-561) Add WindowedWordCountIT
[ https://issues.apache.org/jira/browse/BEAM-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu reassigned BEAM-561: - Assignee: Mark Liu > Add WindowedWordCountIT > --- > > Key: BEAM-561 > URL: https://issues.apache.org/jira/browse/BEAM-561 > Project: Beam > Issue Type: Bug >Reporter: Jason Kuster >Assignee: Mark Liu > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-495) Generalize FileChecksumMatcher used for all E2E test
Mark Liu created BEAM-495: - Summary: Generalize FileChecksumMatcher used for all E2E test Key: BEAM-495 URL: https://issues.apache.org/jira/browse/BEAM-495 Project: Beam Issue Type: Improvement Reporter: Mark Liu Assignee: Mark Liu Refactor WordCountOnSuccessMatcher to be more general so that it can be reused by other tests. Requirement: given an input file path (accepting globs) and an expected checksum, generate the checksum of the file(s) and verify it against the expected value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
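The requirement above can be sketched as follows. This is a hypothetical illustration of the matcher's core (FileChecksum and its methods are invented names), not Beam's actual FileChecksumMatcher:

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.List;

// Sketch: expand a glob under a directory, hash all matching files' lines in
// a deterministic order, and compare the hex digest with an expected
// checksum. A production matcher would also retry reads on eventually
// consistent filesystems such as GCS.
class FileChecksum {
    /** Order-independent SHA-1 hex digest over the given lines. */
    static String checksumOfLines(List<String> lines) {
        try {
            List<String> sorted = new ArrayList<>(lines);
            sorted.sort(null); // deterministic regardless of how output is sharded
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            for (String line : sorted) sha1.update(line.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : sha1.digest()) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Reads every file matching the glob (e.g. "out-*") and checksums the union of their lines. */
    static String checksumOfGlob(Path dir, String glob) {
        try {
            List<String> lines = new ArrayList<>();
            try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
                for (Path p : matches) lines.addAll(Files.readAllLines(p, StandardCharsets.UTF_8));
            }
            return checksumOfLines(lines);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Sorting before hashing is the design choice that makes the checksum stable across different shard counts of the same logical output.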
[jira] [Closed] (BEAM-446) Improve IOChannelUtils.resolve() to accept multiple paths at once
[ https://issues.apache.org/jira/browse/BEAM-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu closed BEAM-446. - > Improve IOChannelUtils.resolve() to accept multiple paths at once > - > > Key: BEAM-446 > URL: https://issues.apache.org/jira/browse/BEAM-446 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Assignee: Mark Liu >Priority: Minor > Fix For: 0.2.0-incubating > > > Currently, IOChannelUtils.resolve() method can only resolve one path against > base path. > It's useful to have another method with arguments that includes one base path > and multiple others. The return string will be a directory that start with > base path and append rests which are separated by file separator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (BEAM-446) Improve IOChannelUtils.resolve() to accept multiple paths at once
[ https://issues.apache.org/jira/browse/BEAM-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Liu updated BEAM-446: -- Description: Currently, the IOChannelUtils.resolve() method can only resolve one path against a base path. It would be useful to have another method that takes one base path and multiple others. The returned string is a directory path that starts with the base path and appends the rest, separated by the file separator. was: Currently, the IOChannelUtils.resolve() method can only resolve one path against a base path. It would be useful to have another method that takes one base path and multiple other paths. The returned string is a full path that starts with the base path and appends the rest, separated by the file separator. > Improve IOChannelUtils.resolve() to accept multiple paths at once > - > > Key: BEAM-446 > URL: https://issues.apache.org/jira/browse/BEAM-446 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Priority: Minor > > Currently, the IOChannelUtils.resolve() method can only resolve one path against > a base path. > It would be useful to have another method that takes one base path > and multiple others. The returned string is a directory path that starts with > the base path and appends the rest, separated by the file separator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
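The proposed overload's contract can be sketched like this. PathResolve is an invented name, and the base path is assumed non-empty; the real IOChannelUtils method would delegate to a filesystem-specific resolve rather than plain string concatenation:

```java
// Sketch of resolving several path components against a base in one call,
// inserting a separator between components as needed.
class PathResolve {
    static String resolve(String base, String... others) {
        StringBuilder result = new StringBuilder(base);
        for (String other : others) {
            // Append a separator only if the path built so far doesn't end with one.
            if (result.charAt(result.length() - 1) != '/') result.append('/');
            result.append(other);
        }
        return result.toString();
    }
}
```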
[jira] [Commented] (BEAM-446) Improve IOChannelUtils.resolve() to accept multiple paths at once
[ https://issues.apache.org/jira/browse/BEAM-446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15373860#comment-15373860 ] Mark Liu commented on BEAM-446: --- [~lcwik] assign to me? > Improve IOChannelUtils.resolve() to accept multiple paths at once > - > > Key: BEAM-446 > URL: https://issues.apache.org/jira/browse/BEAM-446 > Project: Beam > Issue Type: Improvement >Reporter: Mark Liu >Priority: Minor > > Currently, IOChannelUtils.resolve() method can only resolve one path against > base path. > It's useful to have another method with arguments that includes one base path > and multiple other paths. The return string will be a full path that start > with base path and append rests which is separated by file separator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (BEAM-445) Beam-examples-java build failed through local "mvn install"
Mark Liu created BEAM-445: - Summary: Beam-examples-java build failed through local "mvn install" Key: BEAM-445 URL: https://issues.apache.org/jira/browse/BEAM-445 Project: Beam Issue Type: Bug Environment: linux Reporter: Mark Liu Priority: Critical Building the project under beam/examples/java with the command "mvn clean install -DskipTests" fails with the following error:
{code}
[ERROR] Failed to execute goal on project beam-examples-java: Could not resolve dependencies for project org.apache.beam:beam-examples-java:jar:0.2.0-incubating-SNAPSHOT: Could not transfer artifact io.netty:netty-tcnative-boringssl-static:jar:${os.detected.classifier}:1.1.33.Fork13 from/to central (http://repo.maven.apache.org/maven2): Illegal character in path at index 138: http://repo.maven.apache.org/maven2/io/netty/netty-tcnative-boringssl-static/1.1.33.Fork13/netty-tcnative-boringssl-static-1.1.33.Fork13-${os.detected.classifier}.jar
{code}
Reason: ${os.detected.classifier} can't be resolved in the beam/sdks/java/io/google-cloud-platform pom file. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
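Not stated in the report, but for context: ${os.detected.classifier} is conventionally populated by the os-maven-plugin build extension, so a missing or un-inherited extension declaration of roughly this shape would explain the unresolved property. This is an assumption about the likely fix, and the version shown is illustrative:

```xml
<!-- Assumed remedy (not taken from this report): the os-maven-plugin build
     extension sets ${os.detected.classifier}, which the netty-tcnative
     dependency relies on. Version number is illustrative. -->
<build>
  <extensions>
    <extension>
      <groupId>kr.motd.maven</groupId>
      <artifactId>os-maven-plugin</artifactId>
      <version>1.4.1.Final</version>
    </extension>
  </extensions>
</build>
```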