Re: [PR] [yaml] : fix create different type elements issue [beam]
tvalentyn merged PR #37585: URL: https://github.com/apache/beam/pull/37585 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on PR #37585: URL: https://github.com/apache/beam/pull/37585#issuecomment-3915620290 Hi @tvalentyn, I have addressed your comments. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on code in PR #37585: URL: https://github.com/apache/beam/pull/37585#discussion_r2802023556 ## sdks/python/apache_beam/yaml/yaml_provider_unit_test.py: ## @@ -364,3 +364,17 @@ def test_empty_base(self): if __name__ == '__main__': logging.getLogger().setLevel(logging.INFO) unittest.main() + + +class YamlProvidersCreateTest(unittest.TestCase): + def test_create_mixed_types(self): +with beam.Pipeline(options=beam.options.pipeline_options.PipelineOptions( +pickle_library='cloudpickle')) as p: Review Comment: removed, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2802023040
##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,31 @@ def create(elements: Iterable[Any], reshuffle:
Optional[bool] = True):
if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
raise TypeError('elements must be a list of elements')
+if elements:
+ # Normalize elements to be all dicts or all primitives.
Review Comment:
just updated, thanks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
tvalentyn commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2802011844
##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,31 @@ def create(elements: Iterable[Any], reshuffle:
Optional[bool] = True):
if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
raise TypeError('elements must be a list of elements')
+if elements:
+ # Normalize elements to be all dicts or all primitives.
Review Comment:
(lmk if you plan changes here, otherwise I will merge)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on PR #37585: URL: https://github.com/apache/beam/pull/37585#issuecomment-3894502107 > i thought whether we should mention this in changes.md. looking at the issue, this is not a breaking change, correct? It shouldn't be, as an error should be thrown with existing code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
tvalentyn commented on PR #37585: URL: https://github.com/apache/beam/pull/37585#issuecomment-3894424156 i thought whether we should mention this in changes.md. looking at the issue, this is not a breaking change, correct? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
tvalentyn commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2801933772
##
sdks/python/apache_beam/yaml/yaml_provider_unit_test.py:
##
@@ -364,3 +364,17 @@ def test_empty_base(self):
if __name__ == '__main__':
logging.getLogger().setLevel(logging.INFO)
unittest.main()
+
+
+class YamlProvidersCreateTest(unittest.TestCase):
+ def test_create_mixed_types(self):
+with beam.Pipeline(options=beam.options.pipeline_options.PipelineOptions(
+pickle_library='cloudpickle')) as p:
Review Comment:
pickle_library should be already cloudpickle by default
##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,31 @@ def create(elements: Iterable[Any], reshuffle:
Optional[bool] = True):
if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
raise TypeError('elements must be a list of elements')
+if elements:
+ # Normalize elements to be all dicts or all primitives.
Review Comment:
should some of this comment content be part of the docstring?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
github-actions[bot] commented on PR #37585: URL: https://github.com/apache/beam/pull/37585#issuecomment-3894345731 Assigning reviewers: R: @tvalentyn for label python. Note: If you would like to opt out of this review, comment `assign to next reviewer`. Available commands: - `stop reviewer notifications` - opt out of the automated review tooling - `remind me after tests pass` - tag the comment author after tests pass - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers) The PR bot will only process comments in the main thread (not review comments). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on PR #37585: URL: https://github.com/apache/beam/pull/37585#issuecomment-3894229541 Run Python_Runners PreCommit 3.11 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2801551054
##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,22 @@ def create(elements: Iterable[Any], reshuffle:
Optional[bool] = True):
if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
raise TypeError('elements must be a list of elements')
+if elements:
+ # Normalize elements to be all dicts or all primitives.
+ # If we have a mix, we want to treat them all as dicts for the purpose
+ # of schema inference (so we can have a schema like
+ # Row(element=..., other_field=...)).
+ # Note that we don't want to change the elements themselves if they
+ # are already all dicts or all primitives, as that would change the
+ # resulting schema (e.g. from int to Row(element=int)).
+ is_dict = [isinstance(e, dict) for e in elements]
+ if not all(is_dict) and any(is_dict):
+elements = [
+e if isinstance(e, dict) else {
+'element': e
+} for e in elements
+]
Review Comment:
done thanks
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
gemini-code-assist[bot] commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2801359749
##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,22 @@ def create(elements: Iterable[Any], reshuffle:
Optional[bool] = True):
if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
raise TypeError('elements must be a list of elements')
+if elements:
+ # Normalize elements to be all dicts or all primitives.
+ # If we have a mix, we want to treat them all as dicts for the purpose
+ # of schema inference (so we can have a schema like
+ # Row(element=..., other_field=...)).
+ # Note that we don't want to change the elements themselves if they
+ # are already all dicts or all primitives, as that would change the
+ # resulting schema (e.g. from int to Row(element=int)).
+ is_dict = [isinstance(e, dict) for e in elements]
+ if not all(is_dict) and any(is_dict):
+elements = [
+e if isinstance(e, dict) else {
+'element': e
+} for e in elements
+]
Review Comment:

The current implementation for checking mixed types creates an intermediate
list `is_dict`, which can be memory-intensive for a large number of elements. A
more efficient approach would be to iterate once to detect if there's a mix of
types, avoiding the creation of this intermediate list. This can also
short-circuit as soon as a mix is detected.
```python
has_dict = False
has_non_dict = False
for e in elements:
if isinstance(e, dict):
has_dict = True
else:
has_non_dict = True
if has_dict and has_non_dict:
break
if has_dict and has_non_dict:
elements = [
e if isinstance(e, dict) else {
'element': e
} for e in elements
]
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on PR #37585: URL: https://github.com/apache/beam/pull/37585#issuecomment-3893717088 /gemini review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] [yaml] : fix create different type elements issue [beam]
derrickaw commented on PR #37585: URL: https://github.com/apache/beam/pull/37585#issuecomment-3893573437 Run Python PreCommit 3.13 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
