[GitHub] [spark] gerashegalov commented on a change in pull request #32555: [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show

GitBox Sun, 16 May 2021 11:07:09 -0700


gerashegalov commented on a change in pull request #32555:
URL: https://github.com/apache/spark/pull/32555#discussion_r633120362




##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -482,10 +482,23 @@ def show(self, n=20, truncate=True, vertical=False):
          age  | 5
          name | Bob
         """
+
+        if not isinstance(n, int) or isinstance(n, bool):
+            raise TypeError("Parameter 'n' (number of rows) must be an int")
+
+        if not isinstance(vertical, bool):
+            raise TypeError("Parameter 'vertical' must be a bool")
+
         if isinstance(truncate, bool) and truncate:
             print(self._jdf.showString(n, 20, vertical))
         else:
-            print(self._jdf.showString(n, int(truncate), vertical))
+            try:
+                int_truncate = int(truncate)
+            except ValueError:
+                raise ValueError(f"Non-bool parameter 'truncate={truncate}'"

Review comment:
       We are in the `else` branch, which handles non-bool `truncate` values. 
Currently, anything that `int(x)` can 
[convert](https://docs.python.org/3/library/functions.html#int) to an int 
works. We only catch exceptions that would have been thrown previously, so that 
we can add more diagnostics without restricting the accepted values any further, 
for backwards compatibility.
   
   Thus the test documents these calls as working:
   ```
           df.show(n=5, truncate='1', vertical=False)
           df.show(n=5, truncate=1.5, vertical=False)
   ```
   Although this is not the documented behavior, I thought we should avoid 
potentially breaking user code.
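
As a rough illustration of the behavior described above (a standalone sketch, not the PR's actual code), the helper below mimics the bool check and the `int()` fallback. The name `coerce_truncate` and the wording of the error message are hypothetical; only the built-in `int()` semantics are taken as given.

```python
def coerce_truncate(truncate):
    """Hypothetical helper: map a `truncate` argument to a column width."""
    if isinstance(truncate, bool):
        # True means the default width of 20; False falls through to int(False) == 0.
        return 20 if truncate else int(truncate)
    try:
        # int() accepts numeric strings and floats, so '1' and 1.5 both pass.
        return int(truncate)
    except ValueError as exc:
        # Same exception type as before, with an illustrative extra hint.
        raise ValueError(
            f"cannot interpret truncate={truncate!r} as an int"
        ) from exc


print(coerce_truncate('1'))    # numeric string is converted
print(coerce_truncate(1.5))    # float is truncated toward zero
```

Note that `int(None)` raises `TypeError` rather than `ValueError`, so a handler that catches only `ValueError`, as in this sketch, lets that case propagate unchanged.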
   
   This is the test case that would trigger the exception: 
https://github.com/apache/spark/pull/32555/files#diff-3695fad77c3d1ed787c659b0772fbfa936033fdc233904a9ed35836e2643c839R855-R856