[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API
viirya commented on pull request #31461: URL: https://github.com/apache/spark/pull/31461#issuecomment-781801206 cc @cloud-fan This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API
viirya commented on pull request #31461: URL: https://github.com/apache/spark/pull/31461#issuecomment-778581932 That makes sense to me. Keeping the status quo working on Java 9+ sounds like a good idea, with no (or very little) harm. Because `UserDefinedType` is already well known and widely used by Spark users, it does not make much difference in this case whether we improve the API before opening it as a DeveloperApi or afterwards.
[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API
viirya commented on pull request #31461: URL: https://github.com/apache/spark/pull/31461#issuecomment-775709570

> @jnh5y @viirya I got a good question from @marmbrus - can you support user-defined types by just defining an Encoder for it? so that it can work in a Dataset?

This sounds like a much bigger change than the PR I proposed. As I understand it, `Encoder` is for converting a JVM object to a top-level row in Spark SQL. Currently `Encoder` is a fairly abstract trait with no defined APIs except for returning its schema and class tag. The only implementation of `Encoder` is `ExpressionEncoder`, and it looks intended for internal use since it is based on Catalyst expressions. If using `Encoder` for user-defined types is the plan, we would probably need to add a public API for users to define an `Encoder`?
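To make the point about the small `Encoder` surface concrete, here is a minimal sketch, assuming `spark-sql` is on the classpath. The `Point` case class and the `EncoderSurface` object are hypothetical names used only for illustration. It obtains an encoder through the public `Encoders.product` factory and reads the only two members the trait exposes, `schema` and `clsTag`.

```scala
import org.apache.spark.sql.{Encoder, Encoders}

// Hypothetical user class, used only for illustration.
case class Point(x: Double, y: Double)

object EncoderSurface {
  def main(args: Array[String]): Unit = {
    // Public factory for case classes; the object it returns is backed by
    // the catalyst-based ExpressionEncoder mentioned in the comment above.
    val enc: Encoder[Point] = Encoders.product[Point]

    // The two members the Encoder trait defines: schema and class tag.
    println(enc.schema.simpleString)
    println(enc.clsTag.runtimeClass.getName)
  }
}
```

Nothing on the trait itself lets a user describe how a custom type maps to a row, which is why a new public API would probably be needed before `Encoder` could stand in for UDTs.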
[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API
viirya commented on pull request #31461: URL: https://github.com/apache/spark/pull/31461#issuecomment-773084514

> So what about #16478 @viirya ? is that still a sound proposal for refactoring the interface? That seems more ideal to add first.

Hm, yeah, that refactoring tries to hide internal classes such as `UnsafeArrayData`, `GenericInternalRow`, etc. from UDT usage, so that developers use Scala classes when writing UDTs. However, since UDT has been private for many years, I guess developers have simply been placing their UDTs in the Spark namespace and coding them against the internal classes for a while. Technically, yes, it seems more ideal to refactor and hide the internal classes first.
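For readers unfamiliar with the workaround described here, the following is a rough sketch of what such a UDT tends to look like while `UserDefinedType` is still `private[spark]`. The package name, `Point`, and `PointUDT` are hypothetical. It assumes a Spark version where the abstract members are `sqlType`, `serialize`, `deserialize`, and `userClass`, and it needs `spark-catalyst` on the classpath.

```scala
// Placed under org.apache.spark only to get access to the private[spark] class.
package org.apache.spark.example

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.sql.types._

// Hypothetical user class, used only for illustration.
case class Point(x: Double, y: Double)

class PointUDT extends UserDefinedType[Point] {
  // Catalyst-side representation of a Point: a struct of two doubles.
  override def sqlType: DataType = StructType(Seq(
    StructField("x", DoubleType, nullable = false),
    StructField("y", DoubleType, nullable = false)))

  // The user code touches an internal class (GenericInternalRow) directly.
  override def serialize(p: Point): InternalRow =
    new GenericInternalRow(Array[Any](p.x, p.y))

  // Reads the fields back out of the internal row representation.
  override def deserialize(datum: Any): Point = datum match {
    case row: InternalRow => Point(row.getDouble(0), row.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}
```

The `serialize` and `deserialize` methods are where the internal classes leak into user code, which is the part the refactoring in #16478 aims to hide.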
[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API
viirya commented on pull request #31461: URL: https://github.com/apache/spark/pull/31461#issuecomment-772849433 I just checked the current UDT classes. It looks like they still use internal Spark classes. Actually, as mentioned in my previous comment, I don't see any obvious improvement to UDT in recent years, unless I'm missing something.
[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API
viirya commented on pull request #31461: URL: https://github.com/apache/spark/pull/31461#issuecomment-772847017 Hmm, I am not aware that any API changes were made. I closed it because it seemed to me that what these users want is for the UDT API to be opened. I'm not sure whether those API changes are needed or not. And you know, it was not reviewed for many years...