zhengruifeng opened a new pull request, #44793:
URL: https://github.com/apache/spark/pull/44793

   ### What changes were proposed in this pull request?
   Make `shuffle` specify the datatype of `seed`
   
   
   ### Why are the changes needed?
   The `shuffle` function may fail with an extremely low probability (~2e-10):
   `shuffle` is an unregistered function and requires a Long-typed `seed`. The Scala client uses `SparkClassUtils.random.nextLong`, which guarantees the type, while in Python `lit(random.randint(0, sys.maxsize))` may return a Literal Integer instead of a Literal Long when the random value is small enough to fit in an Integer.
   
   ```
   In [26]: from pyspark.sql import functions as sf
   
   In [27]: df = spark.createDataFrame([([1, 20, 3, 5],)], ['data'])
   
   In [28]: df.select(sf.shuffle(df.data)).show()
   +-------------+
   |shuffle(data)|
   +-------------+
   |[1, 3, 5, 20]|
   +-------------+
   
   
   In [29]: df.select(sf.call_udf("shuffle", df.data, sf.lit(123456789000000))).show()
   +-------------+
   |shuffle(data)|
   +-------------+
   |[20, 1, 5, 3]|
   +-------------+
   
   
   In [30]: df.select(sf.call_udf("shuffle", df.data, sf.lit(12345))).show()
   ...
   SparkConnectGrpcException: (org.apache.spark.sql.connect.common.InvalidPlanInput) seed should be a literal long, but got 12345
   
   ```
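
The failure probability quoted above can be estimated with plain Python. This is only a rough sketch: it assumes a 64-bit CPython build (where `sys.maxsize` is 2**63 - 1) and assumes that `lit` produces an Integer literal whenever the value fits in a signed 32-bit integer, which is consistent with the `sf.lit(12345)` failure shown above but is not confirmed here. The names `INT32_MAX`, `total_outcomes`, `int_sized_outcomes` and `probability` are illustrative only.

```python
import sys

# Assumption: values up to 2**31 - 1 become an Integer literal, larger ones a Long literal.
INT32_MAX = 2**31 - 1

# random.randint(0, sys.maxsize) draws uniformly from [0, sys.maxsize], both ends inclusive.
total_outcomes = sys.maxsize + 1

# Seeds in [0, INT32_MAX] are the ones that would hit the "literal long" check.
int_sized_outcomes = INT32_MAX + 1

# Fraction of possible seeds that are small enough to trigger the failure.
probability = int_sized_outcomes / total_outcomes
print(f"{probability:.3e}")
```

On a 64-bit build this works out to 2**-32, roughly 2.3e-10, which matches the order of magnitude given in the description.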
   
   Another case is `uuid`, but it is not supported in Python due to namespace conflicts.
   I did not find any other similar cases.
   
   ### Does this PR introduce _any_ user-facing change?
   no
   
   
   ### How was this patch tested?
   Manually checked.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

