[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API

2021-02-18 Thread GitBox


viirya commented on pull request #31461:
URL: https://github.com/apache/spark/pull/31461#issuecomment-781801206


   cc @cloud-fan 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API

2021-02-13 Thread GitBox


viirya commented on pull request #31461:
URL: https://github.com/apache/spark/pull/31461#issuecomment-778581932


   That makes sense to me. Getting the status quo working on Java 9+ sounds 
like a good idea, with no (or very little) harm. Because `UserDefinedType` is 
already well known and widely used among Spark users, it does not make much 
difference in this case whether we improve the API before opening it as a 
DeveloperApi or afterwards.
   
   
   






[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API

2021-02-08 Thread GitBox


viirya commented on pull request #31461:
URL: https://github.com/apache/spark/pull/31461#issuecomment-775709570


   > @jnh5y @viirya I got a good question from @marmbrus - can you support 
user-defined types by just defining an Encoder for it? so that it can work in a 
Dataset?
   
   This sounds like a much bigger change than the PR I proposed. I think 
`Encoder` is for converting a JVM object to a top-level row in Spark SQL. 
Currently `Encoder` is a pretty abstract trait without any defined APIs, except 
for returning its schema and class tag. The only implementation of `Encoder` is 
`ExpressionEncoder`, and it looks intended more for internal usage, as it is 
based on catalyst expressions.
   
   If using `Encoder` for user-defined types is the plan, we probably need to 
add a public API for users to define an `Encoder`?
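
   To illustrate the gap, here is a minimal sketch in plain Java. It is a toy 
model and not Spark's API: `EncoderLike`, `UserEncoder`, `toRow`, `fromRow`, 
and `Temperature` are all hypothetical names. The first interface only reports 
a schema and a target class, as described above. The second shows the kind of 
conversion hooks a public, user-definable encoder would presumably also need.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy model only: these types are illustrative and are not Spark's API. */
final class EncoderSketch {

    /** Mirrors the shape described above: only a schema and the target class. */
    interface EncoderLike<T> {
        Map<String, String> schema();   // column name -> type name (assumed encoding)
        Class<T> clsTag();
    }

    /** Hypothetical extension: conversion hooks a user-defined encoder would need. */
    interface UserEncoder<T> extends EncoderLike<T> {
        Object[] toRow(T value);
        T fromRow(Object[] row);
    }

    /** Example user class. */
    static final class Temperature {
        final String city;
        final double celsius;

        Temperature(String city, double celsius) {
            this.city = city;
            this.celsius = celsius;
        }
    }

    /** A user-written encoder for Temperature, using only plain JVM types. */
    static final class TemperatureEncoder implements UserEncoder<Temperature> {
        @Override
        public Map<String, String> schema() {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("city", "string");
            fields.put("celsius", "double");
            return fields;
        }

        @Override
        public Class<Temperature> clsTag() {
            return Temperature.class;
        }

        @Override
        public Object[] toRow(Temperature t) {
            return new Object[] {t.city, t.celsius};
        }

        @Override
        public Temperature fromRow(Object[] row) {
            return new Temperature((String) row[0], (Double) row[1]);
        }
    }

    public static void main(String[] args) {
        TemperatureEncoder enc = new TemperatureEncoder();
        Temperature back = enc.fromRow(enc.toRow(new Temperature("Oslo", -3.5)));
        System.out.println(back.city + " " + back.celsius);
    }
}
```

   With only `EncoderLike`, a user can describe the shape of the data but has 
no place to put the conversion logic, which is why a new public API would 
likely be required.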
   
   






[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API

2021-02-05 Thread GitBox


viirya commented on pull request #31461:
URL: https://github.com/apache/spark/pull/31461#issuecomment-773084514


   > So what about #16478 @viirya ? is that still a sound proposal for 
refactoring the interface? That seems more ideal to add first.
   
   Hm, yes, the refactoring tries to hide internal classes such as 
`UnsafeArrayData`, `GenericInternalRow`, etc. from UDT usage, so that developers 
use Scala classes when writing UDTs. However, since UDT has been private for 
many years, I guess developers have been placing their UDTs under the Spark 
namespace and coding them against internal classes for a while. Technically, 
yes, it seems more ideal to refactor and hide the internal classes first.
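
   The difference between the two styles can be sketched roughly as follows. 
This is a toy model in plain Java, not Spark code: `InternalArray` merely 
stands in for an engine-internal container, and `PlainUdt`, `toInternal`, and 
`fromInternal` are hypothetical names for what a refactored interface and its 
engine-side adapter might look like.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types exist in Spark. */
final class UdtSketch {

    /** User-facing value class. */
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Stand-in for an engine-internal container. */
    static final class InternalArray {
        private final double[] values;

        InternalArray(double[] values) {
            this.values = values.clone();
        }

        double getDouble(int i) {
            return values[i];
        }

        int numElements() {
            return values.length;
        }
    }

    /** Style 1: the UDT author reads and writes the internal container directly. */
    static final class PointUdtInternal {
        InternalArray serialize(Point p) {
            return new InternalArray(new double[] {p.x, p.y});
        }

        Point deserialize(InternalArray a) {
            return new Point(a.getDouble(0), a.getDouble(1));
        }
    }

    /** Style 2: the UDT author only sees plain JVM types. */
    interface PlainUdt<T> {
        List<Double> toPlain(T value);
        T fromPlain(List<Double> fields);
    }

    static final class PointUdtPlain implements PlainUdt<Point> {
        @Override
        public List<Double> toPlain(Point p) {
            return Arrays.asList(p.x, p.y);
        }

        @Override
        public Point fromPlain(List<Double> f) {
            return new Point(f.get(0), f.get(1));
        }
    }

    /** Hypothetical engine-side adapter: plain values -> internal container. */
    static <T> InternalArray toInternal(PlainUdt<T> udt, T value) {
        List<Double> fields = udt.toPlain(value);
        double[] raw = new double[fields.size()];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = fields.get(i);
        }
        return new InternalArray(raw);
    }

    /** Hypothetical engine-side adapter: internal container -> plain values. */
    static <T> T fromInternal(PlainUdt<T> udt, InternalArray a) {
        Double[] boxed = new Double[a.numElements()];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = a.getDouble(i);
        }
        return udt.fromPlain(Arrays.asList(boxed));
    }

    public static void main(String[] args) {
        PointUdtPlain udt = new PointUdtPlain();
        Point q = fromInternal(udt, toInternal(udt, new Point(1.5, -2.0)));
        System.out.println(q.x + ", " + q.y);
    }
}
```

   In the second style only the adapter touches the internal container, so 
the internal representation could change without breaking user-written UDTs.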
   
   






[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API

2021-02-03 Thread GitBox


viirya commented on pull request #31461:
URL: https://github.com/apache/spark/pull/31461#issuecomment-772849433


   I just checked the current UDT classes. It looks like they still use 
internal Spark classes. Actually, as mentioned in my previous comment, I don't 
see any obvious improvement to UDT in recent years, unless I am missing 
something.






[GitHub] [spark] viirya commented on pull request #31461: [SPARK-7768][CORE][SQL] Open UserDefinedType as a Developer API

2021-02-03 Thread GitBox


viirya commented on pull request #31461:
URL: https://github.com/apache/spark/pull/31461#issuecomment-772847017


   Hmm, I am not aware that any API changes were made. I closed it because it 
seems to me that what these users want is for the UDT API to be opened. I'm not 
sure whether these API changes are needed or not. And you know, it was not 
reviewed for many years...


