maropu commented on pull request #30867:
URL: https://github.com/apache/spark/pull/30867#issuecomment-748818566
Thanks for the comment, @dongjoon-hyun !

> The category is based on the outputType or inputType? Some functions are at the intersection of both types. e.g. StructToCsv or CreateArray.

A basic policy for re-categorizing functions is that functions in the same file are put into the same group. But yeah, the two cases you pointed out above are ambiguous, I think. In the current approach, `StructToCsv` is categorized into `csv_funcs` because it lives in the `csvExpressions.scala` file (that is, it is categorized based on its functionality). `CreateArray` is categorized into `array_funcs` based on its output type (though the other functions in `array_funcs` are categorized based on their input types...).

> The definition of array_func and collection_func?

`array_funcs` and `map_funcs` are sub-groups of `collection_funcs` in the current approach. For example, `array_contains` is used only for arrays, so it is assigned to `array_funcs`. On the other hand, `reverse` is used for both arrays and strings, so it is assigned to `collection_funcs`.

Anyway, this is a first shot at re-categorizing them, so I'm open to other ideas.
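To make the sub-grouping idea concrete, here is a small pure-Python sketch (not Spark code; the `FUNCTION_GROUPS` table and `effective_groups` helper are invented for illustration) of how a function's specific group could roll up into the broader `collection_funcs` group:

```python
# Hypothetical category table, for illustration only. In this scheme,
# a function that works on exactly one collection type goes into the
# specific sub-group; a function like `reverse` that works on several
# types goes directly into the broader `collection_funcs` group.
FUNCTION_GROUPS = {
    "array_contains": "array_funcs",   # arrays only
    "map_keys": "map_funcs",           # maps only
    "reverse": "collection_funcs",     # arrays *and* strings
}

# array_funcs and map_funcs are sub-groups of collection_funcs.
SUB_GROUPS = {"array_funcs", "map_funcs"}

def effective_groups(func_name):
    """Return the function's group, plus the parent group if it is a sub-group."""
    group = FUNCTION_GROUPS[func_name]
    if group in SUB_GROUPS:
        return [group, "collection_funcs"]
    return [group]

print(effective_groups("array_contains"))  # ['array_funcs', 'collection_funcs']
print(effective_groups("reverse"))         # ['collection_funcs']
```

Under this sketch, listing everything in `collection_funcs` would still pick up `array_contains` via its parent group, while `reverse` appears only at the top level.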
