[jira] [Work logged] (AVRO-3423) Add release step to build.sh for C#
[ https://issues.apache.org/jira/browse/AVRO-3423?focusedWorklogId=769539=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769539 ]

ASF GitHub Bot logged work on AVRO-3423:

Author: ASF GitHub Bot
Created on: 12/May/22 10:58
Start Date: 12/May/22 10:58
Worklog Time Spent: 10m
Work Description: martin-g commented on code in PR #1570:
URL: https://github.com/apache/avro/pull/1570#discussion_r871242405

## lang/rust/build.sh: ##
@@ -17,57 +17,69 @@
 set -e # exit on error
-root_dir=$(pwd)
-build_dir="../../build/rust"
-dist_dir="../../dist/rust"
+cd "$(dirname "$0")" # If being called from another folder, cd into the directory containing this script.
+# shellcheck disable=SC1091
+source ../../share/build-helper.sh "Rust"
+
+build_dir="$BUILD_ROOT/build/rust"
+dist_dir="$BUILD_ROOT/dist/rust"
 function clean {
-  if [ -d $build_dir ]; then
-    find $build_dir | xargs chmod 755
-    rm -rf $build_dir
+  if [ -d "$build_dir" ]; then
+    execute find "$build_dir" -exec chmod 755 {} +
+    execute rm -rf "$build_dir"
   fi
 }
-
 function prepare_build {
   clean
-  mkdir -p $build_dir
+  execute mkdir -p "$build_dir"
+}
+
+function command_clean()
+{
+  execute cargo clean
+}
+
+function command_lint()
+{
+  execute cargo clippy --all-targets --all-features

Issue Time Tracking
---
Worklog Id: (was: 769539)
Remaining Estimate: 20h 10m (was: 20h 20m)
Time Spent: 3h 50m (was: 3h 40m)

> Add release step to build.sh for C#
> ---
>
> Key: AVRO-3423
> URL: https://issues.apache.org/jira/browse/AVRO-3423
> Project: Apache Avro
> Issue Type: Improvement
> Components: csharp
> Reporter: Zoltan Csizmadia
> Priority: Minor
> Labels: pull-request-available
> Original Estimate: 24h
> Time Spent: 3h 50m
> Remaining Estimate: 20h 10m
>
> Add release step to build.sh to simplify pushing nuget packages during new releases.

--
This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Work logged] (AVRO-3266) Output stream incompatible with MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/AVRO-3266?focusedWorklogId=769503=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769503 ]

ASF GitHub Bot logged work on AVRO-3266:

Author: ASF GitHub Bot
Created on: 12/May/22 09:33
Start Date: 12/May/22 09:33
Worklog Time Spent: 10m
Work Description: RyanSkraba commented on PR #1618:
URL: https://github.com/apache/avro/pull/1618#issuecomment-1124752427

Hello! My apologies, I'm getting back to Avro after a pretty constrained month. This looks important for 1.11.1, which should be coming out within the month! I've set the fix version in the JIRA, so you can be sure that this PR will get some attention.

Issue Time Tracking
---
Worklog Id: (was: 769503)
Time Spent: 2.5h (was: 2h 20m)

> Output stream incompatible with MagicS3GuardCommitter
> ---
>
> Key: AVRO-3266
> URL: https://issues.apache.org/jira/browse/AVRO-3266
> Project: Apache Avro
> Issue Type: Bug
> Components: java
> Reporter: Michiel de Jong
> Assignee: Emil Ejbyfeldt
> Priority: Minor
> Labels: pull-request-available
> Fix For: 1.11.1
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> Avro's output stream cannot be used in combination with the MagicS3GuardCommitter:
> {code:java}
> Error: java.lang.ClassCastException: class org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter cannot be cast to class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
> {code}
> The reason for this problem is that AvroOutputFormatBase.getAvroFileOutputStream tries to cast the output committer to a FileOutputCommitter.
> It can be solved by casting to a PathOutputCommitter instead (which is a superclass of both the FileOutputCommitter and the MagicS3GuardCommitter).
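The cast problem described in the issue can be illustrated with a small sketch. It is written in Python with stand-in classes that only borrow the Hadoop class names; the `work_path` attribute and both helper functions are hypothetical and are not the Hadoop or Avro API. It only shows why checking against the shared superclass accepts both committers while checking against one subclass rejects its sibling.

```python
class PathOutputCommitter:
    """Stand-in for the common superclass named in the issue (not the Hadoop class)."""

    def __init__(self, work_path):
        # Hypothetical attribute used only to give the helpers something to return.
        self.work_path = work_path


class FileOutputCommitter(PathOutputCommitter):
    """Stand-in for the classic file committer."""


class MagicS3GuardCommitter(PathOutputCommitter):
    """Stand-in for the S3A magic committer; a sibling of FileOutputCommitter."""


def work_path_via_file_committer(committer):
    """Mimics the failing behaviour: insists on the FileOutputCommitter subclass."""
    if not isinstance(committer, FileOutputCommitter):
        raise TypeError(
            f"{type(committer).__name__} cannot be cast to FileOutputCommitter"
        )
    return committer.work_path


def work_path_via_path_committer(committer):
    """Mimics the proposed fix: only requires the shared superclass."""
    if not isinstance(committer, PathOutputCommitter):
        raise TypeError(
            f"{type(committer).__name__} cannot be cast to PathOutputCommitter"
        )
    return committer.work_path
```

With these stand-ins, passing a `MagicS3GuardCommitter` to the first helper raises a `TypeError`, while the second helper accepts both committer types.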
[jira] [Assigned] (AVRO-3266) Output stream incompatible with MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/AVRO-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Skraba reassigned AVRO-3266:
Assignee: Emil Ejbyfeldt
[jira] [Updated] (AVRO-3266) Output stream incompatible with MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/AVRO-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Skraba updated AVRO-3266:
Status: Patch Available (was: Open)
[jira] [Updated] (AVRO-3266) Output stream incompatible with MagicS3GuardCommitter
[ https://issues.apache.org/jira/browse/AVRO-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Skraba updated AVRO-3266:
Fix Version/s: 1.11.1
Re: Avro Big Data Question from a developer
Hi Gokay,

I am not sure whether you received Fokko's response, since you are/were not subscribed to the mailing list (I know because I moderated your first email). Please check https://lists.apache.org/thread/58c9v7qbof3jzgxzx6qf9h436zcp79wp

On Thu, May 12, 2022 at 11:44 AM Gokay Tosunoglu wrote:
> Hi there,
> I am new to Avro. I have a C# application which deals with big data; for
> example, a single file can be more than 100 GB. This big file is a CSV file,
> and to read the data I am using a BufferedStream and checking for end of line
> to read the next buffer. I want to support the Avro file type too, but I
> couldn't find how to read a 100 GB Avro file and/or how to divide the file
> into buffers. Can any of you send me a way to do it, or maybe some sample code?
> Thanks in advance.
> Gokay Tosunoglu
Avro Big Data Question from a developer
Hi there, I am new to Avro. I have a C# application which deals with big data; for example, a single file can be more than 100 GB. This big file is a CSV file, and to read the data I am using a BufferedStream and checking for end of line to read the next buffer. I want to support the Avro file type too, but I couldn't find how to read a 100 GB Avro file and/or how to divide the file into buffers. Can any of you send me a way to do it, or maybe some sample code? Thanks in advance. Gokay Tosunoglu
Re: Avro Big Data Question from a developer
Hi Gokay,

That's some CSV file. It will probably be much smaller in Avro.

An Avro file is a so-called Object Container File. The format was designed in the MapReduce era to make sure that the workload for each of the workers is roughly the same, which makes it easier to tune the memory requirements. An Avro file contains one or more blocks, which are individually compressed. The blocks are separated by the synchronization marker. More info can be found here: https://avro.apache.org/docs/current/spec.html

Also, the C# code gives some good pointers: https://github.com/apache/avro/blob/42edbd721fedc0ed6cde89ab3b64a9ac606aa74f/lang/csharp/src/apache/main/File/DataFileWriter.cs

You could read (and uncompress) the blocks one by one to keep memory usage constant, or use the block boundaries to parallelize the reading of the file, which might significantly improve the throughput of the application.

Kind regards,
Fokko Driesprong

On Thu, 12 May 2022 at 08:16, Gokay Tosunoglu wrote:
> Hi there, I have a C# application which deals with big data; for example,
> a single file can be more than 100 GB. This big file is a CSV file, and I am
> reading this customer data in buffers, checking for end of line to read the
> next buffer. I want to support Avro too, but I couldn't find how to read a
> 100 GB Avro file and/or how to divide the file into buffers. Can any of you
> send me a way to do it, or maybe some sample code? Thanks in advance.
> Gokay Tosunoglu
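Reading the blocks one by one, as suggested above, can be sketched roughly as follows. This is only an illustration in Python using the standard library, not the C# API of the Avro project. The helper names (`read_exact`, `read_long`, `read_bytes`, `read_header`, `iter_blocks`) are made up for this sketch. It assumes the container layout from the spec linked above: magic bytes, a metadata map, a 16-byte sync marker, then blocks consisting of an object count, a byte size, the payload and the sync marker again. Only the "null" and "deflate" codecs are handled, and the records inside each block are not decoded.

```python
import zlib

MAGIC = b"Obj\x01"
SYNC_SIZE = 16


def read_exact(stream, n):
    """Read exactly n bytes or fail; guards against truncated files."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of stream")
    return data


def read_long(stream, allow_eof=False):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if allow_eof and shift == 0:
                return None  # clean end of file at a block boundary
            raise EOFError("unexpected end of stream inside a long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            return (accum >> 1) ^ -(accum & 1)
        shift += 7


def read_bytes(stream):
    """Decode an Avro bytes/string value: a long length, then that many bytes."""
    return read_exact(stream, read_long(stream))


def read_header(stream):
    """Return (metadata dict, sync marker) from the start of a container file."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:  # a negative count is followed by the map block's byte size
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta, read_exact(stream, SYNC_SIZE)


def iter_blocks(stream):
    """Yield (object_count, uncompressed_payload) for one block at a time."""
    meta, sync = read_header(stream)
    codec = meta.get("avro.codec", b"null").decode("ascii")
    while True:
        count = read_long(stream, allow_eof=True)
        if count is None:
            return
        payload = read_exact(stream, read_long(stream))
        if read_exact(stream, SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch")
        if codec == "deflate":
            payload = zlib.decompress(payload, -15)  # raw deflate, no zlib header
        elif codec != "null":
            raise NotImplementedError(f"codec {codec!r} is not handled in this sketch")
        yield count, payload
```

With a real file this would be used along the lines of `with open(path, "rb") as f: for count, payload in iter_blocks(f): ...`, so that only one block is held in memory at a time. The writer schema is stored in the header metadata under the key `avro.schema` and would be needed to decode the records in each payload. In C#, the library's own file reader is probably the better starting point than hand-written parsing.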
Avro Big Data Question from a developer
Hi there, I have a C# application which deals with big data; for example, a single file can be more than 100 GB. This big file is a CSV file, and I am reading this customer data in buffers, checking for end of line to read the next buffer. I want to support Avro too, but I couldn't find how to read a 100 GB Avro file and/or how to divide the file into buffers. Can any of you send me a way to do it, or maybe some sample code? Thanks in advance. Gokay Tosunoglu
[jira] [Commented] (AVRO-3517) Rust: Optimize crates' size by disabling default features of the dependencies
[ https://issues.apache.org/jira/browse/AVRO-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17535886#comment-17535886 ]

ASF subversion and git services commented on AVRO-3517:
---
Commit eab0404a02d7d93ff371e9aaabd5a18eda19b511 in avro's branch refs/heads/branch-1.11 from Martin Grigorov
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=eab0404a0 ]

AVRO-3517: Do not use the default features of the dependencies (#1684)

Explicitly list the features used/needed by Avro

Signed-off-by: Martin Tzvetanov Grigorov
(cherry picked from commit f56051539d9330722c36888b730aee5b559e01ec)

> Rust: Optimize crates' size by disabling default features of the dependencies
> ---
>
> Key: AVRO-3517
> URL: https://issues.apache.org/jira/browse/AVRO-3517
> Project: Apache Avro
> Issue Type: Improvement
> Components: rust
> Reporter: Martin Tzvetanov Grigorov
> Assignee: Martin Tzvetanov Grigorov
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Inspired-by: [https://github.com/paupino/rust-decimal/blob/master/Cargo.toml]
> All dependencies are declared with "default-features = false" and then only the used features are listed explicitly.
> This may reduce the size of the dependency tree.
[jira] [Resolved] (AVRO-3517) Rust: Optimize crates' size by disabling default features of the dependencies
[ https://issues.apache.org/jira/browse/AVRO-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martin Tzvetanov Grigorov resolved AVRO-3517.
Fix Version/s: 1.11.1, 1.12.0
Resolution: Fixed
[jira] [Commented] (AVRO-3517) Rust: Optimize crates' size by disabling default features of the dependencies
[ https://issues.apache.org/jira/browse/AVRO-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17535884#comment-17535884 ]

ASF subversion and git services commented on AVRO-3517:
---
Commit f56051539d9330722c36888b730aee5b559e01ec in avro's branch refs/heads/master from Martin Grigorov
[ https://gitbox.apache.org/repos/asf?p=avro.git;h=f56051539 ]

AVRO-3517: Do not use the default features of the dependencies (#1684)

Explicitly list the features used/needed by Avro

Signed-off-by: Martin Tzvetanov Grigorov
[jira] [Work logged] (AVRO-3517) Rust: Optimize crates' size by disabling default features of the dependencies
[ https://issues.apache.org/jira/browse/AVRO-3517?focusedWorklogId=769432=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-769432 ]

ASF GitHub Bot logged work on AVRO-3517:

Author: ASF GitHub Bot
Created on: 12/May/22 06:05
Start Date: 12/May/22 06:05
Worklog Time Spent: 10m
Work Description: martin-g merged PR #1684:
URL: https://github.com/apache/avro/pull/1684

Issue Time Tracking
---
Worklog Id: (was: 769432)
Time Spent: 20m (was: 10m)
[GitHub] [avro] martin-g merged pull request #1684: AVRO-3517: Do not use the default features of the dependencies
martin-g merged PR #1684: URL: https://github.com/apache/avro/pull/1684 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@avro.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org