[jira] [Comment Edited] (BEAM-3581) [SQL] Support for Non-ASCII chars is flaky

2018-02-01 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349074#comment-16349074
 ] 

Anton Kedin edited comment on BEAM-3581 at 2/1/18 6:37 PM:
---

As the next step after [fixing the tests|https://github.com/apache/beam/pull/4564], we 
should document the current behavior:
 - Beam SQL parser tries to use UTF16;
 - unless system properties were overridden;
 - or unless you used some Calcite classes before using Beam, which loaded the 
default charset;
 - except for Beam tests, which set the system properties to UTF16;
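
The "only when it hasn't been set yet" behavior in the list above can be sketched roughly as follows. This is a minimal illustration, not Beam's actual code: the class `CharsetDefaults` and its method are hypothetical, and the property key `saffron.default.charset` is an assumption about Calcite's naming that this thread does not spell out.

```java
import java.util.Properties;

/** Hypothetical helper illustrating the "set only if absent" behavior. */
final class CharsetDefaults {
  // Assumed Calcite property key; the thread does not name it explicitly.
  static final String CHARSET_KEY = "saffron.default.charset";

  private CharsetDefaults() {}

  /**
   * Sets the charset property only when no value is present yet,
   * and returns the value that is in effect afterwards.
   */
  static String setIfAbsent(Properties props, String charset) {
    if (props.getProperty(CHARSET_KEY) == null) {
      props.setProperty(CHARSET_KEY, charset);
    }
    return props.getProperty(CHARSET_KEY);
  }
}
```

Calling `CharsetDefaults.setIfAbsent(System.getProperties(), "UTF-16")` would apply this to the real system properties. If the property was already set to ISO-8859-1, the call leaves it untouched, which is the case that makes the tests flaky.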

Then we need to upgrade to a Calcite version that reads saffron.properties from 
resources, and use that instead of system properties.

We should also keep this JIRA open, or create another one for the follow-up.


> [SQL] Support for Non-ASCII chars is flaky
> --
>
> Key: BEAM-3581
> URL: https://issues.apache.org/jira/browse/BEAM-3581
> Project: Beam
> Issue Type: Bug
> Components: dsl-sql
> Reporter: Anton Kedin
> Assignee: Anton Kedin
> Priority: Major
>
> Beam SQL overrides the default charset that Calcite uses and sets it to UTF16. 
> This is done via system properties.
> The problem is that we do this only when it hasn't been set yet. So if the system 
> property has been set to ISO-8859-1 (Calcite's default), then test runs will 
> fail when trying to encode characters not supported in that encoding.
> Solution:
>  - because it's a system property, we don't want to force override it;
>  - for the same reason we cannot set it for a specific query execution;
>  - we can expose a static method on BeamSql to override these properties if 
> explicitly requested;
>  - affected tests will explicitly override it;
>  - otherwise behavior will stay unchanged and we will respect defaults and 
> user settings;
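
The failure mode in the description (characters that ISO-8859-1 cannot represent) can be checked with the JDK alone. The sketch below is only an illustration; `EncodingCheck` is a hypothetical name and does not exist in Beam or Calcite.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: checks whether a string is encodable in a charset. */
final class EncodingCheck {
  private EncodingCheck() {}

  /** Returns true if every character of text can be encoded in charset. */
  static boolean canEncode(Charset charset, String text) {
    // A fresh encoder is created per call because encoders are stateful.
    return charset.newEncoder().canEncode(text);
  }
}
```

For example, `EncodingCheck.canEncode(StandardCharsets.ISO_8859_1, "\u4e2d\u6587")` returns false, while the same call with `StandardCharsets.UTF_16` returns true.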



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (BEAM-3581) [SQL] Support for Non-ASCII chars is flaky

2018-01-31 Thread Anton Kedin (JIRA)

[ 
https://issues.apache.org/jira/browse/BEAM-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347860#comment-16347860
 ] 

Anton Kedin edited comment on BEAM-3581 at 2/1/18 1:33 AM:
---

[~dkulp]
It looks like during initialization of SqlTypeName it [calls 
Util.enumConstants()|https://github.com/apache/calcite/blob/cb8376b13ad50003134e398a87161bec68908606/core/src/main/java/org/apache/calcite/sql/type/SqlTypeName.java#L138].
But Util has a static member DEFAULT_CHARSET 
([here|https://github.com/apache/calcite/blob/cb8376b13ad50003134e398a87161bec68908606/core/src/main/java/org/apache/calcite/util/Util.java#L142]) 
which creates the instance of SaffronProperties. So if I understand it right, 
if we didn't initialize the system properties before that, then the default charset 
will be used?
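
The ordering concern can be reproduced without Calcite. The sketch below is a stand-in under stated assumptions: `CachedDefaults` plays the role of a utility class with a static member such as DEFAULT_CHARSET, and the key `demo.default.charset` is made up for the demo. It only shows the general Java rule that a static field reading a system property is evaluated once, when the class is first initialized.

```java
/** Hypothetical demo of static initialization capturing a system property. */
final class InitOrderDemo {
  // Made-up key for this demo; not a real Calcite property.
  static final String KEY = "demo.default.charset";

  private InitOrderDemo() {}

  /** Stand-in for a utility class that caches the property in a static field. */
  static final class CachedDefaults {
    // Evaluated once, at class initialization; falls back to ISO-8859-1.
    static final String DEFAULT_CHARSET = System.getProperty(KEY, "ISO-8859-1");
  }

  /** Returns the cached value before and after the property is set. */
  static String run() {
    System.clearProperty(KEY);
    String before = CachedDefaults.DEFAULT_CHARSET; // first use initializes the class
    System.setProperty(KEY, "UTF-16");              // too late: value is already cached
    String after = CachedDefaults.DEFAULT_CHARSET;
    return before + "," + after;
  }
}
```

Both reads see the fallback value, because the property was set only after the class had been initialized. If the same holds for Calcite's Util, any earlier use of a class that touches it would lock in the default charset before Beam sets its properties.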





was (Author: kedin):
[~dkulp]
It looks like during initialization of SqlTypeName.VARCHAR it [calls 
Util.enumConstants()|https://github.com/apache/calcite/blob/cb8376b13ad50003134e398a87161bec68908606/core/src/main/java/org/apache/calcite/sql/type/SqlTypeName.java#L138].
But Util has a static member DEFAULT_CHARSET 
([here|https://github.com/apache/calcite/blob/cb8376b13ad50003134e398a87161bec68908606/core/src/main/java/org/apache/calcite/util/Util.java#L142]) 
which creates the instance of SaffronProperties. So if I understand it right, 
if we didn't initialize the system properties before that, then the default charset 
will be used?






