[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999266#comment-16999266 ]

ASF subversion and git services commented on AVRO-2644:
---

Commit c0638c8e4f329752064b016929b470c15c0b2f3b in avro's branch refs/heads/branch-1.9 from austin ce
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=c0638c8 ]

AVRO-2644: Fix deterministic directory compilation

> Non-Deterministic avsc Directory Compilation
> ---
>
> Key: AVRO-2644
> URL: https://issues.apache.org/jira/browse/AVRO-2644
> Project: Apache Avro
> Issue Type: Bug
> Reporter: Austin Cawley-Edwards
> Priority: Minor
> Fix For: 1.10.0, 1.9.2
>
> We're trying to use the `compile {src dir} {output dir}` command in
> `avro-tools` and finding that there are some non-deterministic
> behaviors between systems, depending on how the OS sorts files.
>
> Example:
> schemas/Component.avsc
>  - defines Component record type in the namespace `com.test`
> schemas/Parent.avsc
>  - defines a Parent record, in the same `com.test` namespace, with a
>    field of type `com.test.Component`
>
> With the command, `java -jar avro-tools-1.9.1.jar compile schemas/ out-dir/`,
> some systems compile the directory in the order Component, Parent
> while others compile in the order Parent, Component. The latter
> fails as Component has not been defined when it is referenced by
> Parent.
>
> We have also tried using the IDL and importing the dependency types,
> and then converting them to avsc, and finally compiling the entire
> directory, but that fails as the generated avsc files embed/duplicate
> the "Component" types each time it is used.
>
> OS:
> Linux 857aaf92e059 4.15.0-70-generic #79-Ubuntu SMP Tue Nov 12
> 10:36:11 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> Avro:
> version 1.9.1
>
> Would a PR be accepted that enforces LANG=C semantics or would that have to
> be shipped as a breaking change?
>
> Coming from this thread in the mailing list:
> [http://mail-archives.apache.org/mod_mbox/avro-user/201911.mbox/%3CCALGL%2BUDH03pCyKAQ5a%2B_fvwnUVougwwEXe8%2BHFAuR8Q%3D2cqYmw%40mail.gmail.com%3E]

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999230#comment-16999230 ]

Hudson commented on AVRO-2644:
---

SUCCESS: Integrated in Jenkins build AvroJava #789 (See [https://builds.apache.org/job/AvroJava/789/])
AVRO-2644: Fix deterministic directory compilation (ryan: [https://github.com/apache/avro/commit/a4a0bccfaebcecce429acc78e7aab0b68cccaa45])
* (edit) lang/java/tools/src/main/java/org/apache/avro/tool/SpecificCompilerTool.java
[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999210#comment-16999210 ]

ASF subversion and git services commented on AVRO-2644:
---

Commit a4a0bccfaebcecce429acc78e7aab0b68cccaa45 in avro's branch refs/heads/master from austin ce
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=a4a0bcc ]

AVRO-2644: Fix deterministic directory compilation
[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16990027#comment-16990027 ]

Austin Cawley-Edwards commented on AVRO-2644:
---

In the `rules_avro` usage, Bazel will not automatically order the files as it just finds a common directory and calls the avro-tools CLI on it. We have modified this internally and manually sorted the sources instead of using the directory and would put in a PR for the patch but the maintainers seem to be MIA for the time being.
[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16988632#comment-16988632 ]

Ryan Skraba commented on AVRO-2644:
---

I took a look at the bazel rule -- thanks! It looks like it runs `avro-tools` directly as opposed to wrapping SpecificCompiler itself, so I imagine that it would have the same problem as a user running it on the CLI...

Do you know if file order is preserved in bazel rules? Can the build script specify which avsc to compile first? Or do we need to add an option like the avro-maven-plugin "imports" to support this type of avro-tools use?

(Either way, it doesn't affect the PR above...)
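For illustration only, the ordering that an "imports"-style option implies could be sketched as below: the explicitly listed files come first, in the order given, and everything else follows in sorted order. This is an assumption about how such an option might behave, not an existing avro-tools feature; the class `ImportsFirstOrder` and its `order` method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of avro-tools or the avro-maven-plugin.
class ImportsFirstOrder {

    /**
     * Returns the explicitly listed imports first, in the order given,
     * followed by every other file name in natural String order.
     * A name that appears in both lists is emitted only once.
     */
    static List<String> order(List<String> imports, List<String> allFiles) {
        // LinkedHashSet keeps insertion order and silently drops repeats.
        Set<String> ordered = new LinkedHashSet<>(imports);
        // TreeSet iterates in natural String order, so the remaining
        // names are appended sorted; already-present imports are skipped.
        ordered.addAll(new TreeSet<>(allFiles));
        return new ArrayList<>(ordered);
    }
}
```

With imports `[Component.avsc]` and a directory containing `Parent.avsc`, `Component.avsc` and `Aardvark.avsc`, this yields `Component.avsc`, `Aardvark.avsc`, `Parent.avsc`, so the dependency is parsed before anything that refers to it.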
[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986198#comment-16986198 ]

Austin Cawley-Edwards commented on AVRO-2644:
---

PR here: https://github.com/apache/avro/pull/731
[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16986124#comment-16986124 ]

Austin Cawley-Edwards commented on AVRO-2644:
---

Hi [~ryanskraba], thanks for the quick reply. I'll get started on something later today.

I agree with all the above assumptions, though I know some users of the Bazel build system might be affected. This is an open-sourced extension from Meetup that doesn't have wide usage but might be used internally by some companies (like ours): [https://github.com/chenrui333/rules_avro]

Again thanks, and talk soon.
[jira] [Commented] (AVRO-2644) Non-Deterministic avsc Directory Compilation
[ https://issues.apache.org/jira/browse/AVRO-2644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16985916#comment-16985916 ]

Ryan Skraba commented on AVRO-2644:
---

{quote}
Would a PR be accepted that enforces LANG=C semantics or would that have to be shipped as a breaking change?
{quote}

My opinion is that it would be a welcome and non-breaking improvement.

The Java listFiles() eventually ends up at a [readdir|http://man7.org/linux/man-pages/man3/readdir.3.html]: "The order in which filenames are read by successive calls to readdir() depends on the filesystem implementation; it is unlikely that the names will be sorted in any fashion."

It is _possible_ that a use of avro-tools is currently working and would break after sorting. Your case demonstrates it currently depends on good luck and the whim of readdir. Sorting would ensure that the same avro-tools command line would work/break regardless of the filesystem.

I would estimate that the majority of scripted use of the Java schema compiler is through the maven plugin, which would be unaffected.

The default sort of Java Strings is independent of locale (close enough if not identical to LANG=C), so that sounds good to me.

The *current workaround* is to explicitly specify the order of the files to be compiled on the command line, which should continue to work after sorting.
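As a rough sketch of the sorting discussed in the comment above (not the code from the actual patch), a directory listing can be made independent of the filesystem by sorting the result of listFiles() on the file name. The class `SortedSchemaListing` and the method `listSchemasSorted` are hypothetical names, and the `.avsc` filter is an assumption made for the example.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration; the names here are not taken from avro-tools.
class SortedSchemaListing {

    /**
     * Lists the .avsc files directly under dir and returns them sorted by
     * file name. String.compareTo compares UTF-16 code units, so the result
     * does not depend on the locale or on the order readdir() happens to use.
     */
    static List<File> listSchemasSorted(File dir) {
        File[] found = dir.listFiles((parent, name) -> name.endsWith(".avsc"));
        if (found == null) {
            // listFiles() returns null when dir is not a directory or cannot be read.
            return new ArrayList<>();
        }
        Arrays.sort(found, Comparator.comparing(File::getName));
        return new ArrayList<>(Arrays.asList(found));
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: SortedSchemaListing <schema dir>");
            return;
        }
        for (File schema : listSchemasSorted(new File(args[0]))) {
            System.out.println(schema.getPath());
        }
    }
}
```

Sorting on `File::getName` rather than relying on `File.compareTo` is a deliberate choice in this sketch: the latter is documented as case-insensitive on Windows, which would bring back a platform dependency. Note that a sorted order only happens to help the Component/Parent example because "Component" sorts before "Parent"; it makes the outcome reproducible, not dependency-aware.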