jhaberstroh-sharethis opened a new pull request, #41686:
URL: https://github.com/apache/spark/pull/41686

   The `on` argument of `join` raised an error when I passed it a tuple. That's 
because it checked specifically for `list`, and wrapped anything else in a list 
like `[on]`. A tuple of column names therefore became a one-element list holding 
the whole tuple, which failed immediately. This was surprising: tuples and lists 
are usually interchangeable, and tuples are typically accepted wherever lists 
are. I have proposed a change that follows the principle of least surprise for 
this situation.
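A minimal plain-Python sketch of the failure mode, assuming the old normalization amounts to `if on is not None and not isinstance(on, list): on = [on]`. The function name `normalize_on_old` is hypothetical and this is not the PySpark source:

```python
def normalize_on_old(on):
    """Hypothetical reproduction of the list-only check (not PySpark code)."""
    # Anything that is not a list gets wrapped, including a tuple of names.
    if on is not None and not isinstance(on, list):
        on = [on]
    return on


print(normalize_on_old("id"))          # ['id']
print(normalize_on_old(["id", "ts"]))  # ['id', 'ts']
print(normalize_on_old(("id", "ts")))  # [('id', 'ts')]  one bogus "column name"
```

The tuple survives as a single element, so the join then tries to resolve `('id', 'ts')` as one column instead of two.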
   
   It checked specifically for `list` because `Column` is itself an `Iterable` 
(in the `collections.abc` sense): it implements `__iter__`. It only does so 
because it implements `__getitem__`, which on its own lets `iter()` walk over it 
through Python's legacy sequence protocol. That caused bad behavior (iteration 
never terminated), so `__iter__` was implemented to raise an exception whenever 
a Column is iterated over. That change was made in SPARK-10417:
   https://github.com/apache/spark/pull/8574
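The interaction can be reproduced with two toy classes. `LegacyColumn` and `GuardedColumn` are hypothetical stand-ins for `Column` before and after SPARK-10417, assuming the real `__getitem__` returns a new expression for any key and never raises `IndexError`:

```python
import itertools
from collections.abc import Iterable


class LegacyColumn:
    """Hypothetical stand-in: defines __getitem__ but not __iter__."""

    def __getitem__(self, key):
        # Never raises IndexError, so the sequence protocol never stops.
        return ("item", key)


class GuardedColumn(LegacyColumn):
    """Hypothetical stand-in for the SPARK-10417 guard: iteration is refused."""

    def __iter__(self):
        raise TypeError("Column is not iterable")


legacy = LegacyColumn()
# iter() falls back to __getitem__(0), __getitem__(1), ...; take only 3 items.
print(list(itertools.islice(iter(legacy), 3)))  # [('item', 0), ('item', 1), ('item', 2)]
print(isinstance(legacy, Iterable))             # False: no __iter__ defined

guarded = GuardedColumn()
print(isinstance(guarded, Iterable))            # True: __iter__ exists, even though it raises
try:
    iter(guarded)
except TypeError as exc:
    print("TypeError:", exc)                    # TypeError: Column is not iterable
```

The guard fixes the runaway iteration but makes `isinstance(col, Iterable)` report `True`, which is why an `Iterable` check alone cannot tell a `Column` apart from a list of columns.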
   
   It also happens that the Python docs specifically note that 
`isinstance(x, Iterable)` is not a reliable test of iterability, and that 
calling `iter(x)` is the only reliable way to check. For reference:
   
https://stackoverflow.com/questions/1952464/in-python-how-do-i-determine-if-an-object-is-iterable
   
https://docs.python.org/3/library/collections.abc.html#collections.abc.Iterable
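A sketch of the `iter()`-based check, contrasted with `isinstance(x, Iterable)`. The helper `is_iterable` and the class `FakeColumn` are hypothetical; `str` is treated as a scalar on the assumption that a bare string is a single column name:

```python
from collections.abc import Iterable


def is_iterable(x):
    """Hypothetical helper: iterable per iter(), with str treated as a scalar."""
    if isinstance(x, str):
        return False  # a single column name, not a collection of names
    try:
        iter(x)
    except TypeError:
        return False
    return True


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column: __iter__ always raises."""

    def __getitem__(self, key):
        return ("item", key)

    def __iter__(self):
        raise TypeError("Column is not iterable")


col = FakeColumn()
print(isinstance(col, Iterable), is_iterable(col))                # True False
print(isinstance(("a", "b"), Iterable), is_iterable(("a", "b")))  # True True
print(is_iterable("a"), is_iterable(["a"]))                       # False True
```

Because the guarded `__iter__` raises `TypeError`, asking `iter()` directly classifies a `Column` as a scalar while still accepting both lists and tuples.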
   
   There will be no user-facing changes for existing working code. It will only 
fix code that did not work previously.
   
   
   ### How was this patch tested?
   Tests for:
    * `isinstance_interable` behaves as expected for all combinations of (str, 
col) and (bare, list, tuple).
    * `to_list_column_style` creates a list when passed any of these types, and 
every element of that list is non-iterable (as defined by the check above).
    * all of these different forms of `on` produce the same join result.
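The (str, col) × (bare, list, tuple) test matrix can be sketched without Spark. `FakeColumn`, `is_iterable`, `to_list`, and `check_all_combinations` are hypothetical stand-ins and do not reproduce the PR's actual helpers or tests:

```python
import itertools


class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column (iteration raises)."""

    def __iter__(self):
        raise TypeError("Column is not iterable")


def is_iterable(x):
    """Hypothetical helper: iter()-based check, str treated as a scalar."""
    if isinstance(x, str):
        return False
    try:
        iter(x)
    except TypeError:
        return False
    return True


def to_list(on):
    """Hypothetical normalizer: wrap scalars, turn any other iterable into a list."""
    return list(on) if is_iterable(on) else [on]


def check_all_combinations():
    scalars = {"str": "id", "col": FakeColumn()}
    wrappers = {
        "bare": lambda v: v,
        "list": lambda v: [v],
        "tuple": lambda v: (v,),
    }
    for (s_name, value), (w_name, wrap) in itertools.product(
        scalars.items(), wrappers.items()
    ):
        result = to_list(wrap(value))
        # Every form normalizes to a one-element list holding the same scalar.
        assert isinstance(result, list), (s_name, w_name)
        assert len(result) == 1 and result[0] is value, (s_name, w_name)
        assert not is_iterable(result[0]), (s_name, w_name)
    return True


print(check_all_combinations())  # True
```

Checking that every wrapper yields the same normalized list is the pure-Python counterpart of requiring that each join variant returns the same result.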


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

