bzhaoopenstack commented on PR #37369:
URL: https://github.com/apache/spark/pull/37369#issuecomment-1204705860

   > If it's a common mistake, we might want to add this fix, but for this 
patch, I personally think this example seems a little too extreme. The user 
could also find the error by seeing `due to data type mismatch: differing 
types`. @bzhaoopenstack Or do you see any other advantages?
   
   That's just a demo. We cannot predict what developers will type in a 
Python session, and they may end up with a confusing error message. Imagine you 
are a developer using the function for the first time and you want to probe 
its edge cases. I compared the pandas code path with PySpark and tried my best 
to close the validation gap between pandas and PySpark.
   
   From the discussion and my experience with this kind of issue, validation, 
and early validation in particular, is necessary. I think the key point now is 
finding a good way to support validation at runtime. It would be good if the 
community could agree on an acceptable way to make this happen, since the 
PySpark Pandas code tree already contains so many separate validations. So I 
think that would be a good improvement for PySpark Pandas. ;-)
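
   A minimal sketch of what early validation could look like, in plain Python. 
The names `validate_bound` and `clip_values` are hypothetical and are not part 
of pandas or PySpark. They only illustrate checking arguments up front and 
raising a clear error before any real work starts.

```python
from numbers import Real


def validate_bound(name, value):
    """Fail fast with a clear message if a bound is not a real number."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, Real):
        raise TypeError(
            f"{name} must be a real number, got {type(value).__name__}"
        )
    return value


def clip_values(values, lower, upper):
    """Hypothetical helper: clip each value into [lower, upper]."""
    # Validate every argument before touching the data.
    lower = validate_bound("lower", lower)
    upper = validate_bound("upper", upper)
    if lower > upper:
        raise ValueError(f"lower ({lower}) must not exceed upper ({upper})")
    return [min(max(v, lower), upper) for v in values]
```

   With a check like this, a call such as `clip_values([1, 5, 10], "a", 8)` 
raises a `TypeError` naming the offending argument, instead of failing later 
with a message about mismatched data types.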


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

