GitHub user liancheng commented on the pull request:
https://github.com/apache/spark/pull/5263#issuecomment-87696048
This is a good point. Actually, all of these characters ` ,;{}()\n\t=` (note
the leading space character) can be problematic if they appear in field
names, according to [`MessageTypeParser`][1].
However, personally I think simply replacing these characters with
legitimate ones like brackets might be confusing. On the other hand, similar
problems can be worked around easily by assigning an alias. So how about this:
1. Check all field names for invalid characters in `convertFromAttributes`.
2. Throw an exception when any invalid character is found.
3. In the error message, suggest that the user explicitly add an alias to
   the field.
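
The three steps above could be sketched roughly as follows. This is only an illustration, not the actual patch: `FieldNameCheck`, `findInvalid`, and `checkFieldNames` are hypothetical names, and the assumption is that `convertFromAttributes` would call such a check on the attribute names before building the Parquet schema.

```scala
// Hypothetical helper for illustration only; not part of Spark's API.
object FieldNameCheck {
  // The characters listed above as problematic in Parquet field names
  // (the set starts with a space character).
  val invalidChars: Set[Char] = " ,;{}()\n\t=".toSet

  /** Returns the field names that contain at least one invalid character. */
  def findInvalid(fieldNames: Seq[String]): Seq[String] =
    fieldNames.filter(name => name.exists(invalidChars.contains))

  /** Throws IllegalArgumentException if any field name is invalid. */
  def checkFieldNames(fieldNames: Seq[String]): Unit = {
    val bad = findInvalid(fieldNames)
    if (bad.nonEmpty) {
      val quoted = bad.map(n => "'" + n + "'").mkString(", ")
      throw new IllegalArgumentException(
        "Field name(s) " + quoted + " contain invalid character(s) among " +
        "\" ,;{}()\\n\\t=\". Please use an alias to rename them.")
    }
  }
}
```

With this sketch, `FieldNameCheck.checkFieldNames(Seq("id", "a b"))` would throw and name `'a b'` in the message, while a list of clean names passes silently.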
[1]: https://github.com/apache/incubator-parquet-mr/blob/b8f5d89e0f4347ce54cf680bd7dffc9bc02f876a/parquet-column/src/main/java/parquet/schema/MessageTypeParser.java#L46