HeartSaVioR edited a comment on pull request #28326:
URL: https://github.com/apache/spark/pull/28326#issuecomment-620310834


   Sorry if this is off-topic, but it looks to me like column metadata may have 
more issues, because neither its usage nor how it should be handled is properly 
defined.
   
   I'm actually a bit surprised that Spark and end users share the same metadata 
and can overwrite/hide each other's entries. I thought metadata was only used 
internally by Spark, and wasn't aware that Spark exposes a public API to modify 
it.
   
   I'm not sure this really needs to be exposed to end users (or even 3rd 
parties), because they have no way to retrieve metadata using only public APIs. 
From an end user's perspective, retrieving metadata requires either pattern 
matching `col.expr` against `NamedExpression` and calling `metadata`, which is 
already in the catalyst area (not a public API), or a package hack to call the 
`named` method (also not a public API). That means they may just blindly 
overwrite the existing metadata and hide the metadata of the underlying 
attribute. Do we have any actual usage of it?
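   
   To make the concern concrete, here is a minimal sketch. It assumes a Spark 
version where `Column.expr` returns a catalyst `Expression` (2.x/3.x); the 
object name, the helper `metadataOf`, and the metadata keys `owner` and 
`comment` are hypothetical and only there for illustration. It is not meant 
to describe how Spark itself uses metadata.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.catalyst.expressions.NamedExpression
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{Metadata, MetadataBuilder}

// Hypothetical sketch; no SparkSession is needed because nothing is resolved or executed.
object MetadataOverwriteSketch {

  // Reading metadata back goes through col.expr and NamedExpression,
  // i.e. catalyst internals rather than a public API.
  def metadataOf(c: Column): Option[Metadata] = c.expr match {
    case ne: NamedExpression => Some(ne.metadata)
    case _                   => None
  }

  // Stand-in for metadata attached by Spark (hypothetical key).
  val internal: Metadata =
    new MetadataBuilder().putString("owner", "spark").build()

  // Stand-in for metadata attached by an end user (hypothetical key).
  val fromUser: Metadata =
    new MetadataBuilder().putString("comment", "user note").build()

  // Column.as(alias, metadata) is the public API that sets metadata.
  val tagged: Column = col("id").as("id", internal)

  // A second alias with explicit metadata replaces what the outer
  // expression reports; the inner alias's metadata is no longer visible.
  val overwritten: Column = tagged.as("id", fromUser)

  def main(args: Array[String]): Unit = {
    println(metadataOf(tagged))
    println(metadataOf(overwritten))
  }
}
```

   In this sketch the caller who applies the second alias has no public way to 
check what was already attached before replacing it.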
   
   Also, as @cloud-fan commented earlier 
https://github.com/apache/spark/pull/28326#issuecomment-619769222, metadata 
propagation doesn't seem to be clearly defined. Alias propagates the metadata, 
but I'm not sure which other operations take metadata propagation into account 
when computing their output.


