Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-17 Thread via GitHub


tvalentyn merged PR #37585:
URL: https://github.com/apache/beam/pull/37585


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-17 Thread via GitHub


derrickaw commented on PR #37585:
URL: https://github.com/apache/beam/pull/37585#issuecomment-3915620290

   Hi @tvalentyn, I have addressed your comments.  Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


derrickaw commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2802023556


##
sdks/python/apache_beam/yaml/yaml_provider_unit_test.py:
##
@@ -364,3 +364,17 @@ def test_empty_base(self):
 if __name__ == '__main__':
   logging.getLogger().setLevel(logging.INFO)
   unittest.main()
+
+
+class YamlProvidersCreateTest(unittest.TestCase):
+  def test_create_mixed_types(self):
+with beam.Pipeline(options=beam.options.pipeline_options.PipelineOptions(
+pickle_library='cloudpickle')) as p:

Review Comment:
   removed, thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


derrickaw commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2802023040


##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,31 @@ def create(elements: Iterable[Any], reshuffle: 
Optional[bool] = True):
 if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
   raise TypeError('elements must be a list of elements')
 
+if elements:
+  # Normalize elements to be all dicts or all primitives.

Review Comment:
   just updated, thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


tvalentyn commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2802011844


##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,31 @@ def create(elements: Iterable[Any], reshuffle: 
Optional[bool] = True):
 if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
   raise TypeError('elements must be a list of elements')
 
+if elements:
+  # Normalize elements to be all dicts or all primitives.

Review Comment:
   (lmk if you plan changes here, otherwise I will merge)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


derrickaw commented on PR #37585:
URL: https://github.com/apache/beam/pull/37585#issuecomment-3894502107

   > i thought whether we should mention this in changes.md. looking at the 
issue, this is not a breaking change, correct?
   
   It shouldn't be, as an error should be thrown with existing code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


tvalentyn commented on PR #37585:
URL: https://github.com/apache/beam/pull/37585#issuecomment-3894424156

   i thought whether we should mention this in changes.md.  looking at the 
issue, this is not a breaking change, correct? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


tvalentyn commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2801933772


##
sdks/python/apache_beam/yaml/yaml_provider_unit_test.py:
##
@@ -364,3 +364,17 @@ def test_empty_base(self):
 if __name__ == '__main__':
   logging.getLogger().setLevel(logging.INFO)
   unittest.main()
+
+
+class YamlProvidersCreateTest(unittest.TestCase):
+  def test_create_mixed_types(self):
+with beam.Pipeline(options=beam.options.pipeline_options.PipelineOptions(
+pickle_library='cloudpickle')) as p:

Review Comment:
   pickle_library should be already cloudpickle by default



##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,31 @@ def create(elements: Iterable[Any], reshuffle: 
Optional[bool] = True):
 if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
   raise TypeError('elements must be a list of elements')
 
+if elements:
+  # Normalize elements to be all dicts or all primitives.

Review Comment:
   should some of this comment content be part of the docstring?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


github-actions[bot] commented on PR #37585:
URL: https://github.com/apache/beam/pull/37585#issuecomment-3894345731

   Assigning reviewers:
   
   R: @tvalentyn for label python.
   
   
   Note: If you would like to opt out of this review, comment `assign to next 
reviewer`.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any 
comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review 
comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


derrickaw commented on PR #37585:
URL: https://github.com/apache/beam/pull/37585#issuecomment-3894229541

   Run Python_Runners PreCommit 3.11


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


derrickaw commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2801551054


##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,22 @@ def create(elements: Iterable[Any], reshuffle: 
Optional[bool] = True):
 if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
   raise TypeError('elements must be a list of elements')
 
+if elements:
+  # Normalize elements to be all dicts or all primitives.
+  # If we have a mix, we want to treat them all as dicts for the purpose
+  # of schema inference (so we can have a schema like
+  # Row(element=..., other_field=...)).
+  # Note that we don't want to change the elements themselves if they
+  # are already all dicts or all primitives, as that would change the
+  # resulting schema (e.g. from int to Row(element=int)).
+  is_dict = [isinstance(e, dict) for e in elements]
+  if not all(is_dict) and any(is_dict):
+elements = [
+e if isinstance(e, dict) else {
+'element': e
+} for e in elements
+]

Review Comment:
   done thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


gemini-code-assist[bot] commented on code in PR #37585:
URL: https://github.com/apache/beam/pull/37585#discussion_r2801359749


##
sdks/python/apache_beam/yaml/yaml_provider.py:
##
@@ -878,6 +878,22 @@ def create(elements: Iterable[Any], reshuffle: 
Optional[bool] = True):
 if not isinstance(elements, Iterable) or isinstance(elements, (dict, str)):
   raise TypeError('elements must be a list of elements')
 
+if elements:
+  # Normalize elements to be all dicts or all primitives.
+  # If we have a mix, we want to treat them all as dicts for the purpose
+  # of schema inference (so we can have a schema like
+  # Row(element=..., other_field=...)).
+  # Note that we don't want to change the elements themselves if they
+  # are already all dicts or all primitives, as that would change the
+  # resulting schema (e.g. from int to Row(element=int)).
+  is_dict = [isinstance(e, dict) for e in elements]
+  if not all(is_dict) and any(is_dict):
+elements = [
+e if isinstance(e, dict) else {
+'element': e
+} for e in elements
+]

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The current implementation for checking mixed types creates an intermediate 
list `is_dict`, which can be memory-intensive for a large number of elements. A 
more efficient approach would be to iterate once to detect if there's a mix of 
types, avoiding the creation of this intermediate list. This can also 
short-circuit as soon as a mix is detected.
   
   ```python
 has_dict = False
 has_non_dict = False
 for e in elements:
   if isinstance(e, dict):
 has_dict = True
   else:
 has_non_dict = True
   if has_dict and has_non_dict:
 break
   
 if has_dict and has_non_dict:
   elements = [
   e if isinstance(e, dict) else {
   'element': e
   } for e in elements
   ]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


derrickaw commented on PR #37585:
URL: https://github.com/apache/beam/pull/37585#issuecomment-3893717088

   /gemini review


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [yaml] : fix create different type elements issue [beam]

2026-02-12 Thread via GitHub


derrickaw commented on PR #37585:
URL: https://github.com/apache/beam/pull/37585#issuecomment-3893573437

   Run Python PreCommit 3.13


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]