[GitHub] [spark] gerashegalov commented on a change in pull request #32555: [SPARK-35408][PYTHON] Improve parameter validation in DataFrame.show

GitBox Mon, 17 May 2021 11:07:19 -0700


gerashegalov commented on a change in pull request #32555:
URL: https://github.com/apache/spark/pull/32555#discussion_r633047787




##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -482,10 +482,22 @@ def show(self, n=20, truncate=True, vertical=False):
          age  | 5
          name | Bob
         """
+
+        if not isinstance(n, int) or isinstance(n, bool):

Review comment:
       Right, so if we pass `n=False`, then `not isinstance(n, int)` is `False`, 
because `bool` is a subclass of `int`. With this check alone the error would 
not be raised, so we need an explicit check to reject a `bool`-typed `n`.
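
   A minimal sketch of the point above, assuming a standalone helper 
(`validate_n` is a hypothetical name used only for illustration; it is not part 
of the PR). It shows why the `int` check alone lets booleans through:

   ```python
   def validate_n(n):
       """Hypothetical helper mirroring the check under review."""
       # bool is a subclass of int, so isinstance(False, int) is True.
       # The second clause is what actually rejects True/False.
       if not isinstance(n, int) or isinstance(n, bool):
           raise TypeError("Parameter 'n' (number of rows) must be an int")
       return n


   # The int check on its own does not filter out booleans.
   assert issubclass(bool, int)
   assert isinstance(False, int)

   validate_n(20)           # accepted
   try:
       validate_n(False)    # rejected only because of the explicit bool check
   except TypeError as exc:
       print(exc)
   ```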

##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -482,10 +482,23 @@ def show(self, n=20, truncate=True, vertical=False):
          age  | 5
          name | Bob
         """
+
+        if not isinstance(n, int) or isinstance(n, bool):
+            raise TypeError("Parameter 'n' (number of rows) must be an int")
+
+        if not isinstance(vertical, bool):
+            raise TypeError("Parameter 'vertical' must be a bool")
+
         if isinstance(truncate, bool) and truncate:
             print(self._jdf.showString(n, 20, vertical))
         else:
-            print(self._jdf.showString(n, int(truncate), vertical))
+            try:
+                int_truncate = int(truncate)
+            except ValueError:
+                raise ValueError(f"Non-bool parameter 'truncate={truncate}'"

Review comment:
       We are in the `else` branch, dealing with non-bool `truncate` values. 
Currently, anything that `int(x)` can 
[convert](https://docs.python.org/3/library/functions.html#int) to an int 
works. We only catch the exceptions that would have been thrown previously, in 
order to add more diagnostics, without restricting the accepted values any 
further, for backward compatibility.
   
   Thus the test documents these calls as working:
   ```python
           df.show(n=5, truncate='1', vertical=False)
           df.show(n=5, truncate=1.5, vertical=False)
   ```
   While this is not the documented behavior, I thought we should avoid 
potentially breaking user code.
   
   This is the test case that would trigger the exception: 
https://github.com/apache/spark/pull/32555/files#diff-3695fad77c3d1ed787c659b0772fbfa936033fdc233904a9ed35836e2643c839R855-R856
   
   ```python
           with self.assertRaisesRegex(ValueError, "Non-bool parameter 'truncate=foo'"):
               df.show(truncate='foo')
   ```
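   
   The conversion behavior described in this comment can be sketched as 
follows. This is an illustration under stated assumptions, not the PR's code: 
`coerce_truncate` is a hypothetical helper, and the tail of the error message 
is assumed wording (only the `Non-bool parameter 'truncate=...'` prefix comes 
from the diff and the test).

   ```python
   def coerce_truncate(truncate):
       """Hypothetical helper: the width that would be passed to showString."""
       if isinstance(truncate, bool) and truncate:
           return 20  # truncate=True keeps the default width of 20
       try:
           # Anything int() accepts keeps working: '1' -> 1, 1.5 -> 1, False -> 0.
           return int(truncate)
       except ValueError as exc:
           # Same exception type as before, with the offending value added.
           # The text after the quoted value is an assumption.
           raise ValueError(
               f"Non-bool parameter 'truncate={truncate}' "
               "could not be converted to int"
           ) from exc


   print(coerce_truncate('1'))
   print(coerce_truncate(1.5))
   try:
       coerce_truncate('foo')
   except ValueError as exc:
       print(exc)
   ```

   Note that `int(None)` raises `TypeError` rather than `ValueError`, so in 
this sketch such a value propagates unchanged.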
   
   
    
   

##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -482,10 +482,22 @@ def show(self, n=20, truncate=True, vertical=False):
          age  | 5
          name | Bob
         """
+
+        if not isinstance(n, int) or isinstance(n, bool):

Review comment:
       Maybe we can come up with a generic mechanism to tackle the 
non-existing-method exceptions from `ReflectionEngine.getMethod`.



