[jira] [Created] (BEAM-8520) Expansion of _CombinePerKeyWithHotKeyFanout has no typehints
Yifan Mai created BEAM-8520: --- Summary: Expansion of _CombinePerKeyWithHotKeyFanout has no typehints Key: BEAM-8520 URL: https://issues.apache.org/jira/browse/BEAM-8520 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Yifan Mai The expansion of [_CombinePerKeyWithHotKeyFanout|https://github.com/apache/beam/blob/bd09a038f4df896ef6d903a4944083fcd2e8f727/sdks/python/apache_beam/transforms/core.py#L1913], which is used by [CombinePerKey.with_hot_key_fanout()|https://github.com/apache/beam/blob/bd09a038f4df896ef6d903a4944083fcd2e8f727/sdks/python/apache_beam/transforms/core.py#L1734], uses DoFns (SplitHotCold) and CombineFns (PreCombineFn, PostCombineFn) and a lambda (SplitNonce) that have no typehints. If the user specified a typehints on the original CombineFn, the typehints are lost. This means that the fallback coder rather than the user specified coder. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8381) Typehints on DoFn subclass are incorrectly set on superclass
[ https://issues.apache.org/jira/browse/BEAM-8381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-8381: Summary: Typehints on DoFn subclass are incorrectly set on superclass (was: Typehints on DoFn subclass are incorrect set on superclass) > Typehints on DoFn subclass are incorrectly set on superclass > > > Key: BEAM-8381 > URL: https://issues.apache.org/jira/browse/BEAM-8381 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Reporter: Yifan Mai >Priority: Minor > > Suppose a parent DoFn with typehints is subclassed into a child DoFn with > different typehints. The typehints will be incorrectly set on the parent DoFn > class instead of the child DoFn class, causing type checking errors. This > only happens if the parent already has typehints; if the parent has no > typehints, then the typehints are correctly set on the child. > Here's a example test case. I would expect this example to run successfully, > but instead it fails with {{apache_beam.typehints.decorators.TypeCheckError: > Type hint violation for 'AddOneAndStringify': requires but got > for element}}. > {code} > def test_do_fn_pipeline_pipeline_type_check_satisfied_with_subclassing(self): > @with_input_types(int) > @with_output_types(int) > class AddOne(beam.DoFn): > def add_one(elements): > return element + 1 > def process(self, element): > return [add_one(element)] > @with_input_types(int) > @with_output_types(str) > class AddOneAndStringify(AddOne): > def process(self, element): > return [str(add_one(element))] > d = (self.p >| 'T' >> beam.Create([1, 2, 3]).with_output_types(int) >| 'AddOne' >> beam.ParDo(AddOne()) >| 'AddOneAndStringify' >> beam.ParDo(AddOneAndStringify())) > assert_that(d, equal_to(['3', '4', '5'])) > self.p.run() > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (BEAM-8519) Typehint for Python CombineFn Accumulator
[ https://issues.apache.org/jira/browse/BEAM-8519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-8519: Description: Feature request: Allow users to specify a typehint for the accumulator type of CombineFn, either through a typehints decorator, or by using Python type annotations on the CombineFn methods, or through inheriting a CombineFn [generic type|https://docs.python.org/3/library/typing.html#user-defined-generic-types]. Benefits: * Allow the user to [specify a more efficient coder|https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety] for accumulators if one is available. Currently users are able to do this for the input and output elements, but not the accumulators, which [always uses the fallback coder|https://github.com/apache/beam/blob/d664592ee761d45b0273edb027a18c6ed9340349/sdks/python/apache_beam/transforms/core.py#L859-L860]. * Allows typechecking of methods for CombineFn e.g. check that create_accumulator(), add_input() and merge_accumulators() returns accumulators. This provides better developer ergonomics as the CombineFnAPI is non-trivial. was: Feature request: Allow users to specify a typehint for the accumulator type of CombineFn, either through a typehints decorator, or by using Python type annotations on the CombineFn methods, or through inheriting a CombineFn [generic type|https://docs.python.org/3/library/typing.html#user-defined-generic-types]. Benefits: * Allow the user to [specify a more efficient coder|https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety] for accumulators if one is available. Currently users are able to do this for the input and output elements, but not the accumulators, which [always uses the fallback coder|https://github.com/apache/beam/blob/d664592ee761d45b0273edb027a18c6ed9340349/sdks/python/apache_beam/transforms/core.py#L859-L860]. * Allows typechecking of methods for CombineFn e.g. check that create_accumulator(), add_input() and merge_accumulators() returns accumulators. This provides better developer ergonomics as the CombineFnAPI is non-trivial > Typehint for Python CombineFn Accumulator > - > > Key: BEAM-8519 > URL: https://issues.apache.org/jira/browse/BEAM-8519 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Yifan Mai >Priority: Minor > > Feature request: > Allow users to specify a typehint for the accumulator type of CombineFn, > either through a typehints decorator, or by using Python type annotations on > the CombineFn methods, or through inheriting a CombineFn [generic > type|https://docs.python.org/3/library/typing.html#user-defined-generic-types]. > Benefits: > * Allow the user to [specify a more efficient > coder|https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety] > for accumulators if one is available. Currently users are able to do this > for the input and output elements, but not the accumulators, which [always > uses the fallback > coder|https://github.com/apache/beam/blob/d664592ee761d45b0273edb027a18c6ed9340349/sdks/python/apache_beam/transforms/core.py#L859-L860]. > * Allows typechecking of methods for CombineFn e.g. check that > create_accumulator(), add_input() and merge_accumulators() returns > accumulators. This provides better developer ergonomics as the CombineFnAPI > is non-trivial. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8519) Typehint for Python CombineFn Accumulator
Yifan Mai created BEAM-8519: --- Summary: Typehint for Python CombineFn Accumulator Key: BEAM-8519 URL: https://issues.apache.org/jira/browse/BEAM-8519 Project: Beam Issue Type: New Feature Components: sdk-py-core Reporter: Yifan Mai Feature request: Allow users to specify a typehint for the accumulator type of CombineFn, either through a typehints decorator, or by using Python type annotations on the CombineFn methods, or through inheriting a CombineFn [generic type|https://docs.python.org/3/library/typing.html#user-defined-generic-types]. Benefits: * Allow the user to [specify a more efficient coder|https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety] for accumulators if one is available. Currently users are able to do this for the input and output elements, but not the accumulators, which [always uses the fallback coder|https://github.com/apache/beam/blob/d664592ee761d45b0273edb027a18c6ed9340349/sdks/python/apache_beam/transforms/core.py#L859-L860]. * Allows typechecking of methods for CombineFn e.g. check that create_accumulator(), add_input() and merge_accumulators() returns accumulators. This provides better developer ergonomics as the CombineFnAPI is non-trivial -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (BEAM-8381) Typehints on DoFn subclass are incorrect set on superclass
Yifan Mai created BEAM-8381: --- Summary: Typehints on DoFn subclass are incorrect set on superclass Key: BEAM-8381 URL: https://issues.apache.org/jira/browse/BEAM-8381 Project: Beam Issue Type: Bug Components: sdk-py-core Reporter: Yifan Mai Suppose a parent DoFn with typehints is subclassed into a child DoFn with different typehints. The typehints will be incorrectly set on the parent DoFn class instead of the child DoFn class, causing type checking errors. This only happens if the parent already has typehints; if the parent has no typehints, then the typehints are correctly set on the child. Here's a example test case. I would expect this example to run successfully, but instead it fails with {{apache_beam.typehints.decorators.TypeCheckError: Type hint violation for 'AddOneAndStringify': requires but got for element}}. {code} def test_do_fn_pipeline_pipeline_type_check_satisfied_with_subclassing(self): @with_input_types(int) @with_output_types(int) class AddOne(beam.DoFn): def add_one(elements): return element + 1 def process(self, element): return [add_one(element)] @with_input_types(int) @with_output_types(str) class AddOneAndStringify(AddOne): def process(self, element): return [str(add_one(element))] d = (self.p | 'T' >> beam.Create([1, 2, 3]).with_output_types(int) | 'AddOne' >> beam.ParDo(AddOne()) | 'AddOneAndStringify' >> beam.ParDo(AddOneAndStringify())) assert_that(d, equal_to(['3', '4', '5'])) self.p.run() {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (BEAM-3645) Support multi-process execution on the FnApiRunner
[ https://issues.apache.org/jira/browse/BEAM-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924585#comment-16924585 ] Yifan Mai edited comment on BEAM-3645 at 9/6/19 8:54 PM: - While testing this I noticed that the multi-process runner does not handle SIGINT gracefully. I added the repro steps to BEAM-8149. was (Author: myffi...@gmail.com): While testing this I noticed that the multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > Support multi-process execution on the FnApiRunner > -- > > Key: BEAM-3645 > URL: https://issues.apache.org/jira/browse/BEAM-3645 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Charles Chen >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.15.0 > > Time Spent: 35h 20m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance > gain over the previous DirectRunner. We can do even better in multi-core > environments by supporting multi-process execution in the FnApiRunner, to > scale past Python GIL limitations. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8149) [FnApiRunner]multi-process runner does not terminate cleanly upon receiving SIGINT
[ https://issues.apache.org/jira/browse/BEAM-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-8149: Description: The multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the first comment in BEAM-3645 (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} was: The multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > [FnApiRunner]multi-process runner does not terminate cleanly upon receiving > SIGINT > -- > > Key: BEAM-8149 > URL: https://issues.apache.org/jira/browse/BEAM-8149 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > > The multi-process runner does not handle SIGINT gracefully. To reproduce, run > wordcount.py using the "Run with multiprocessing mode" instructions from the > first comment in BEAM-3645 (in Python 3). > Expected: wordcount terminates gracefully when Ctrl-C is pressed during > pipeline execution (similarly to default direct runner) > Actual: wordcount hangs forever after printing the following once per worker: > {code} > Exception in thread run_worker: > Traceback (most recent call last): > File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner > self.run() > File "/usr/lib/python3.6/threading.py", line 864, in run > self._target(*self._args, **self._kwargs) > File > "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", > line 216, in run > 'Worker subprocess exited with return code %s' % p.returncode) > RuntimeError: Worker subprocess exited with return code 1 > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Updated] (BEAM-8149) [FnApiRunner]multi-process runner does not terminate cleanly upon receiving SIGINT
[ https://issues.apache.org/jira/browse/BEAM-8149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-8149: Description: The multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > [FnApiRunner]multi-process runner does not terminate cleanly upon receiving > SIGINT > -- > > Key: BEAM-8149 > URL: https://issues.apache.org/jira/browse/BEAM-8149 > Project: Beam > Issue Type: Bug > Components: sdk-py-core >Affects Versions: 2.15.0 >Reporter: Hannah Jiang >Assignee: Hannah Jiang >Priority: Major > > The multi-process runner does not handle SIGINT gracefully. To reproduce, run > wordcount.py using the "Run with multiprocessing mode" instructions from the > comment above (in Python 3). > Expected: wordcount terminates gracefully when Ctrl-C is pressed during > pipeline execution (similarly to default direct runner) > Actual: wordcount hangs forever after printing the following once per worker: > {code} > Exception in thread run_worker: > Traceback (most recent call last): > File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner > self.run() > File "/usr/lib/python3.6/threading.py", line 864, in run > self._target(*self._args, **self._kwargs) > File > "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", > line 216, in run > 'Worker subprocess exited with return code %s' % p.returncode) > RuntimeError: Worker subprocess exited with return code 1 > {code} -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Commented] (BEAM-3645) Support multi-process execution on the FnApiRunner
[ https://issues.apache.org/jira/browse/BEAM-3645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924585#comment-16924585 ] Yifan Mai commented on BEAM-3645: - While testing this I noticed that the multi-process runner does not handle SIGINT gracefully. To reproduce, run wordcount.py using the "Run with multiprocessing mode" instructions from the comment above (in Python 3). Expected: wordcount terminates gracefully when Ctrl-C is pressed during pipeline execution (similarly to default direct runner) Actual: wordcount hangs forever after printing the following once per worker: {code} Exception in thread run_worker: Traceback (most recent call last): File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner self.run() File "/usr/lib/python3.6/threading.py", line 864, in run self._target(*self._args, **self._kwargs) File "/usr/local/google/home/yifanmai/venv/wordcount/lib/python3.6/site-packages/apache_beam/runners/portability/local_job_service.py", line 216, in run 'Worker subprocess exited with return code %s' % p.returncode) RuntimeError: Worker subprocess exited with return code 1 {code} > Support multi-process execution on the FnApiRunner > -- > > Key: BEAM-3645 > URL: https://issues.apache.org/jira/browse/BEAM-3645 > Project: Beam > Issue Type: Improvement > Components: sdk-py-core >Affects Versions: 2.2.0, 2.3.0 >Reporter: Charles Chen >Assignee: Hannah Jiang >Priority: Major > Fix For: 2.15.0 > > Time Spent: 35h 20m > Remaining Estimate: 0h > > https://issues.apache.org/jira/browse/BEAM-3644 gave us a 15x performance > gain over the previous DirectRunner. We can do even better in multi-core > environments by supporting multi-process execution in the FnApiRunner, to > scale past Python GIL limitations. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Comment Edited] (BEAM-529) Check immutability violations in DirectPipelineRunner
[ https://issues.apache.org/jira/browse/BEAM-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856777#comment-16856777 ] Yifan Mai edited comment on BEAM-529 at 6/5/19 3:06 PM: Sorry, I haven't captured the proposal on JIRA yet. The general idea to have DoFnRunner hash each input element (or some sample of input elements) before and after the DoFn is run. If the hashes differ, then the input element was mutated and the pipeline should return an error. The problem is that does not actually have the semantics we want. See https://docs.python.org/3/reference/datamodel.html#object.__hash__ # Not all objects are hashable. For instance mutable containers like lists are unhashable. # User defined classes are hashable by default, but the default hash is simply the id of the object, rather than its contents. I've tried some workarounds such as: # Convert unhashable containers to immutable hashable containers before hashing them # Directly read the {{__attr__}} of user defined classes and hash the elements Even so, there are user defined classes that still break under this scheme. For instance, pandas DataFrame has properties that, when read, modifies a cache that is stored as a parameter. This scheme will treat the cache modification as a mutation and incorrectly raise a false positive. As such, I haven't come up with a way to do this in a way that is robust enough to cover all conceivable user code. was (Author: myffi...@gmail.com): Sorry, I haven't captured the proposal on JIRA yet. The general idea to have DoFnRunner hash each input element (or some sample of input elements) before and after the DoFn is run. If the hashes differ, then the input element was mutated and the pipeline should return an error. The problem is that does not actually have the semantics we want. See https://docs.python.org/3/reference/datamodel.html#object.__hash__ # Not all objects are hashable. For instance mutable containers like lists are unhashable. # User defined classes are hashable by default, but the default hash is simply the id of the object, rather than its contents. I've tried some workarounds such as: # Convert unhashable containers to immutable hashable containers before hashing them # Traverse into the __attr__ of user defined classes and hash the elements Even so, there are user defined classes that still break under this scheme. For instance, pandas DataFrame has properties that, when read, modifies a cache that is stored as a parameter. This scheme will treat the cache modification as a mutation and incorrectly raise a false positive. As such, I haven't come up with a way to do this in a way that is robust enough to cover all conceivable user code. > Check immutability violations in DirectPipelineRunner > - > > Key: BEAM-529 > URL: https://issues.apache.org/jira/browse/BEAM-529 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > > Users are going to mutate inputs and outputs of DoFn inappropriately. We > should help their tests fail to catch such mistakes. (Similar to the > DirectPipelineRunner in Java SDK) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-529) Check immutability violations in DirectPipelineRunner
[ https://issues.apache.org/jira/browse/BEAM-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856777#comment-16856777 ] Yifan Mai commented on BEAM-529: Sorry, I haven't captured the proposal on JIRA yet. The general idea to have DoFnRunner hash each input element (or some sample of input elements) before and after the DoFn is run. If the hashes differ, then the input element was mutated and the pipeline should return an error. The problem is that does not actually have the semantics we want. See https://docs.python.org/3/reference/datamodel.html#object.__hash__ # Not all objects are hashable. For instance mutable containers like lists are unhashable. # User defined classes are hashable by default, but the default hash is simply the id of the object, rather than its contents. I've tried some workarounds such as: # Convert unhashable containers to immutable hashable containers before hashing them # Traverse into the __attr__ of user defined classes and hash the elements Even so, there are user defined classes that still break under this scheme. For instance, pandas DataFrame has properties that, when read, modifies a cache that is stored as a parameter. This scheme will treat the cache modification as a mutation and incorrectly raise a false positive. As such, I haven't come up with a way to do this in a way that is robust enough to cover all conceivable user code. > Check immutability violations in DirectPipelineRunner > - > > Key: BEAM-529 > URL: https://issues.apache.org/jira/browse/BEAM-529 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > > Users are going to mutate inputs and outputs of DoFn inappropriately. We > should help their tests fail to catch such mistakes. (Similar to the > DirectPipelineRunner in Java SDK) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (BEAM-529) Check immutability violations in DirectPipelineRunner
[ https://issues.apache.org/jira/browse/BEAM-529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16856227#comment-16856227 ] Yifan Mai commented on BEAM-529: No, not planning to work on this in the near future. As discussed elsewhere, I am not confident that my proposal will actually work. > Check immutability violations in DirectPipelineRunner > - > > Key: BEAM-529 > URL: https://issues.apache.org/jira/browse/BEAM-529 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Minor > Labels: newbie, starter > > Users are going to mutate inputs and outputs of DoFn inappropriately. We > should help their tests fail to catch such mistakes. (Similar to the > DirectPipelineRunner in Java SDK) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7340) DoFn.teardown metrics are lost in Python SDK
Yifan Mai created BEAM-7340: --- Summary: DoFn.teardown metrics are lost in Python SDK Key: BEAM-7340 URL: https://issues.apache.org/jira/browse/BEAM-7340 Project: Beam Issue Type: Bug Components: sdk-py-harness Reporter: Yifan Mai If user code in DoFn.shutdown updates custom user metrics, those updates will not get registered e.g. counter increments are not registered. Context: In [FnApiRunner.run_stages|https://github.com/apache/beam/blob/4629e82512ef1606c78cf28a2d66402c3533e23f/sdks/python/apache_beam/runners/portability/fn_api_runner.py#L342-L364], DoFn.teardown is called in worker_handler_manager.close_all, but this is called outside of the FnApiRunner.run_stage calls, so no metrics / monitoring info is retrieved there. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-563) DoFn Reuse: Update DirectRunner
[ https://issues.apache.org/jira/browse/BEAM-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai closed BEAM-563. -- Resolution: Done Fix Version/s: Not applicable This was also done in https://github.com/apache/beam/pull/7994 > DoFn Reuse: Update DirectRunner > --- > > Key: BEAM-563 > URL: https://issues.apache.org/jira/browse/BEAM-563 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Priority: Major > Fix For: Not applicable > > > https://issues.apache.org/jira/browse/BEAM-562 will add setup and teardown > methods to DoFns. Update DirectRunner to add support for these new methods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (BEAM-562) DoFn Reuse: Add new methods to DoFn
[ https://issues.apache.org/jira/browse/BEAM-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai resolved BEAM-562. Resolution: Done Fix Version/s: Not applicable > DoFn Reuse: Add new methods to DoFn > --- > > Key: BEAM-562 > URL: https://issues.apache.org/jira/browse/BEAM-562 > Project: Beam > Issue Type: New Feature > Components: sdk-py-core >Reporter: Ahmet Altay >Assignee: Yifan Mai >Priority: Major > Labels: sdk-consistency > Fix For: Not applicable > > Time Spent: 10h 40m > Remaining Estimate: 0h > > Java SDK added setup and teardown methods to the DoFns. This makes DoFns > reusable and provide performance improvements. Python SDK should add support > for these new DoFn methods: > Proposal doc: > https://docs.google.com/document/d/1LLQqggSePURt3XavKBGV7SZJYQ4NW8yCu63lBchzMRk/edit?ts=5771458f# -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-7121) Provide deterministic version of Python's ProtoCoder
[ https://issues.apache.org/jira/browse/BEAM-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai closed BEAM-7121. --- Resolution: Done Fix Version/s: Not applicable > Provide deterministic version of Python's ProtoCoder > > > Key: BEAM-7121 > URL: https://issues.apache.org/jira/browse/BEAM-7121 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Yifan Mai >Priority: Minor > Fix For: Not applicable > > Time Spent: 3h 10m > Remaining Estimate: 0h > > Passing deterministic=true to proto's > [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] > will result in deterministic encoding of maps in protos. This can be used to > provide a deterministic version of ProtoCoder. > This would allow protos to be used as a key for grouping by key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7121) Provide deterministic version of Python's ProtoCoder
[ https://issues.apache.org/jira/browse/BEAM-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-7121: Description: Passing deterministic=true to proto's [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] will result in deterministic encoding of maps in protos. This can be used to provide a deterministic version of ProtoCoder. This would allow protos to be used as a key for grouping by key. was:Passing deterministic=true to proto's [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] will result in deterministic encoding of maps in protos. This can be used to provide a deterministic version of ProtoCoder. > Provide deterministic version of Python's ProtoCoder > > > Key: BEAM-7121 > URL: https://issues.apache.org/jira/browse/BEAM-7121 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Yifan Mai >Priority: Minor > > Passing deterministic=true to proto's > [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] > will result in deterministic encoding of maps in protos. This can be used to > provide a deterministic version of ProtoCoder. > This would allow protos to be used as a key for grouping by key. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7121) Provide deterministic version of Python's ProtoCoder
[ https://issues.apache.org/jira/browse/BEAM-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-7121: Description: Passing deterministic=true to proto's [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] will result in deterministic encoding of maps in protos. This can be used to provide a deterministic version of ProtoCoder. (was: Passing deterministic=true to proto's [SerializeToString|[https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204]] will result in deterministic encoding of maps in protos. This can be used to provide a deterministic version of ProtoCoder.) > Provide deterministic version of Python's ProtoCoder > > > Key: BEAM-7121 > URL: https://issues.apache.org/jira/browse/BEAM-7121 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Yifan Mai >Priority: Minor > > Passing deterministic=true to proto's > [SerializeToString|https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204] > will result in deterministic encoding of maps in protos. This can be used to > provide a deterministic version of ProtoCoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-7121) Provide deterministic version of Python's ProtoCoder
[ https://issues.apache.org/jira/browse/BEAM-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-7121: Description: Passing deterministic=true to proto's [SerializeToString|[https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204]] will result in deterministic encoding of maps in protos. This can be used to provide a deterministic version of ProtoCoder. (was: [Passing deterministic=true to proto's SerializeToString|[https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204]] will result in deterministic encoding of maps in protos. This can be used to provide a deterministic version of ProtoCoder.) > Provide deterministic version of Python's ProtoCoder > > > Key: BEAM-7121 > URL: https://issues.apache.org/jira/browse/BEAM-7121 > Project: Beam > Issue Type: Improvement > Components: runner-core >Reporter: Yifan Mai >Priority: Minor > > Passing deterministic=true to proto's > [SerializeToString|[https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204]] > will result in deterministic encoding of maps in protos. This can be used to > provide a deterministic version of ProtoCoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-7121) Provide deterministic version of Python's ProtoCoder
Yifan Mai created BEAM-7121: --- Summary: Provide deterministic version of Python's ProtoCoder Key: BEAM-7121 URL: https://issues.apache.org/jira/browse/BEAM-7121 Project: Beam Issue Type: Improvement Components: runner-core Reporter: Yifan Mai [Passing deterministic=true to proto's SerializeToString|[https://github.com/protocolbuffers/protobuf/blob/60b66a119d17f0a2a595a231bea87cd4f4cf2689/python/google/protobuf/message.py#L189-L204]] will result in deterministic encoding of maps in protos. This can be used to provide a deterministic version of ProtoCoder. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (BEAM-6746) Support DoFn.setup and DoFn.teardown in Python
[ https://issues.apache.org/jira/browse/BEAM-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai closed BEAM-6746. --- Resolution: Duplicate Fix Version/s: Not applicable Looks like this was already filed as BEAM-562 > Support DoFn.setup and DoFn.teardown in Python > -- > > Key: BEAM-6746 > URL: https://issues.apache.org/jira/browse/BEAM-6746 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-py-core >Reporter: Yifan Mai >Priority: Major > Fix For: Not applicable > > > DoFn.setup and DoFn.teardown is currently supported in Java but not Python. > These would be useful for performing expensive per-thread initialization. > While lazy initialization is a usable workaround, first class support for > setup and teardown would encourage consistent conventions and make the API > more uniform with the Java version. > This is related to BEAM-3736 (same issue, but for CombineFn). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (BEAM-6746) Support DoFn.setup and DoFn.teardown in Python
[ https://issues.apache.org/jira/browse/BEAM-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yifan Mai updated BEAM-6746: Issue Type: Improvement (was: Bug) > Support DoFn.setup and DoFn.teardown in Python > -- > > Key: BEAM-6746 > URL: https://issues.apache.org/jira/browse/BEAM-6746 > Project: Beam > Issue Type: Improvement > Components: beam-model, sdk-py-core >Reporter: Yifan Mai >Priority: Major > > DoFn.setup and DoFn.teardown is currently supported in Java but not Python. > These would be useful for performing expensive per-thread initialization. > While lazy initialization is a usable workaround, first class support for > setup and teardown would encourage consistent conventions and make the API > more uniform with the Java version. > This is related to BEAM-3736 (same issue, but for CombineFn). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (BEAM-6746) Support DoFn.setup and DoFn.teardown in Python
Yifan Mai created BEAM-6746: --- Summary: Support DoFn.setup and DoFn.teardown in Python Key: BEAM-6746 URL: https://issues.apache.org/jira/browse/BEAM-6746 Project: Beam Issue Type: Bug Components: beam-model, sdk-py-core Reporter: Yifan Mai DoFn.setup and DoFn.teardown is currently supported in Java but not Python. These would be useful for performing expensive per-thread initialization. While lazy initialization is a usable workaround, first class support for setup and teardown would encourage consistent conventions and make the API more uniform with the Java version. This is related to BEAM-3736 (same issue, but for CombineFn). -- This message was sent by Atlassian JIRA (v7.6.3#76005)