bzhaoopenstack commented on PR #37369: URL: https://github.com/apache/spark/pull/37369#issuecomment-1204705860
> If it's a common mistake, we might want to add this fix, but for this patch, I personally think this example seems a little too extreme. The user could also find the error by seeing `due to data type mismatch: differing types`. @bzhaoopenstack Or do you have any other advantages in mind?

That's just a demo. We cannot predict every way developers will type things into a Python session and end up with a confusing error message. Imagine you are a developer using the function for the first time: you want to probe the boundaries of what it accepts. I compared the pandas code path with PySpark and tried my best to close the validation gap between pandas and PySpark.

From the discussion and my experience with this kind of issue, validation, and early validation in particular, is necessary. The key point now, I think, is finding a good way to support validation at runtime. It would be good if the community could agree on an acceptable way to make this happen, since the PySpark Pandas code tree already contains many separate validations. So I think that would be a good improvement for PySpark Pandas. ;-)
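As a rough illustration of the "validate early" idea (not the approach taken in this PR, and not an existing pandas or PySpark API), the plain-Python sketch below checks argument types when a function is called, so a bad input fails immediately with a message that names the argument, instead of surfacing later as an analyzer error such as `differing types`. The decorator `validate_args` and the function `fill_missing` are hypothetical names chosen for this example; `fill_missing` works on a plain list as a stand-in for a pandas-on-Spark method.

```python
import functools
import inspect
from typing import Any, Callable, Tuple, Type, Union

# A single type or a tuple of accepted types, as accepted by isinstance().
TypeSpec = Union[Type[Any], Tuple[Type[Any], ...]]


def validate_args(**expected: TypeSpec) -> Callable:
    """Hypothetical decorator: type-check selected arguments before the body runs."""

    def decorator(func: Callable) -> Callable:
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Map positional/keyword arguments onto parameter names, filling defaults.
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, types in expected.items():
                value = bound.arguments[name]
                if not isinstance(value, types):
                    allowed = (
                        types.__name__
                        if isinstance(types, type)
                        else " or ".join(t.__name__ for t in types)
                    )
                    # Fail early, naming the offending argument and its type.
                    raise TypeError(
                        f"{func.__name__}(): argument '{name}' must be {allowed}, "
                        f"got {type(value).__name__}"
                    )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@validate_args(fill_value=(int, float))
def fill_missing(values: list, fill_value: float = 0.0) -> list:
    """Hypothetical stand-in for a pandas-on-Spark method: replace None entries."""
    return [fill_value if v is None else v for v in values]
```

With this sketch, `fill_missing([1.0, None, 3.0], 0.0)` returns `[1.0, 0.0, 3.0]`, while `fill_missing([1.0, None], "x")` raises a `TypeError` that names `fill_value` before any work is done. Centralizing the checks in one decorator is one possible answer to the concern about many scattered validations; whether something like this fits the existing PySpark Pandas code tree is exactly the open question above.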
