[rust-dev] GZip and Deflate
I wanted to add full compression support to Rust. Full support means stream compression (good for HTTP compression) and multiple-call compression (compressing a large file with multiple batches of reads). To get to that point, miniz.cpp and the deflate API in the Rust runtime need to be enhanced to overcome a few limitations.

I've worked with the author of miniz.c (Rich Geldreich) to merge changes into miniz.c on his codeline for the API that Rust needs, and resolved the decompression bug in miniz's code when working with gzip'ed files.

http://code.google.com/p/miniz/issues/detail?id=25&can=1
http://code.google.com/p/miniz/issues/detail?id=23&can=1

I've implemented a full set of deflate APIs in Rust to support stream compression and multiple-call compression, with caller-driven and callee-driven pipe-style APIs. I've also written the Rust GZip library with stream support, such as GZipReader and GZipWriter.

For testing, I've re-implemented most of the gzip command line program on top of the Rust GZip library. Some performance data for the interested: Rust compression is about 1.8 times slower, and decompression is about 3 times slower, than gzip. Overall it seems solid.

See https://github.com/williamw520/rustyzip for the source.

Now I need help to merge the changes into the Rust master codeline. There are a couple of things.

1. What license to assign for the new files? I use MPL currently.
2. There's a new version of miniz.cpp needed for things to work. Is the Rust runtime still open for C++ file changes?
3. There are two new files for the deflate library (deflate.rs) and the gzip library (gzip.rs). Which Rust runtime library is the appropriate one to put them in?
4. Should the command line tool rgzip stay in an outside codeline or be merged into the Rust runtime? If merging in, where is a good place for tools?

Thanks
William

___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
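To make the two API styles mentioned above a little more concrete, here is a minimal sketch in present-day Rust (not the 2013 dialect used in this thread). It is only an illustration under stated assumptions: a toy run-length encoder stands in for deflate, and `RleEncoder`, `write_chunk`, `finish`, `encode_in_batches` and `pipe` are hypothetical names that do not come from rustyzip or miniz. The point it tries to show is that encoder state has to survive between calls so that a run split across two batches still encodes the same way.

```rust
use std::io::{self, Read, Write};

/// Toy stand-in for a deflate stream (an assumption for illustration only):
/// run-length encoding emitted as (count, byte) pairs.
struct RleEncoder {
    /// (byte, run length so far); carried across calls to `write_chunk`.
    current: Option<(u8, u8)>,
}

impl RleEncoder {
    fn new() -> Self {
        RleEncoder { current: None }
    }

    /// Caller-driven style: the caller feeds one batch at a time.
    /// Completed runs are appended to `out`; the last run stays pending.
    fn write_chunk(&mut self, chunk: &[u8], out: &mut Vec<u8>) {
        for &b in chunk {
            match self.current {
                // Same byte and the counter still fits: extend the run.
                Some((byte, n)) if byte == b && n < u8::MAX => {
                    self.current = Some((byte, n + 1));
                }
                // Different byte (or counter full): emit the run, start a new one.
                Some((byte, n)) => {
                    out.push(n);
                    out.push(byte);
                    self.current = Some((b, 1));
                }
                None => self.current = Some((b, 1)),
            }
        }
    }

    /// Flush the pending run at end of stream.
    fn finish(&mut self, out: &mut Vec<u8>) {
        if let Some((byte, n)) = self.current.take() {
            out.push(n);
            out.push(byte);
        }
    }
}

/// Caller-driven usage: the caller owns the loop. `batch` must be non-zero.
fn encode_in_batches(data: &[u8], batch: usize) -> Vec<u8> {
    let mut enc = RleEncoder::new();
    let mut out = Vec::new();
    for chunk in data.chunks(batch) {
        enc.write_chunk(chunk, &mut out);
    }
    enc.finish(&mut out);
    out
}

/// Callee-driven "pipe" style: this function owns the read/encode/write loop.
fn pipe<R: Read, W: Write>(mut input: R, mut output: W) -> io::Result<()> {
    let mut enc = RleEncoder::new();
    let mut buf = [0u8; 4]; // deliberately tiny so runs straddle reads
    let mut out = Vec::new();
    loop {
        let n = input.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        enc.write_chunk(&buf[..n], &mut out);
        output.write_all(&out)?;
        out.clear();
    }
    enc.finish(&mut out);
    output.write_all(&out)
}
```

Calling `encode_in_batches(data, 2)` and `encode_in_batches(data, data.len())` should give the same bytes, and `pipe(&data[..], &mut sink)` should write the same bytes into `sink`. A real deflate binding would presumably keep the miniz stream state where this sketch keeps `current`.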
Re: [rust-dev] GZip and Deflate
I can't answer your questions, but I do want to say that this is very interesting!

> Rust compression is about 1.8 times slower, decompression is about 3 times slower than gzip.

Have you tried profiling this to see where our bottlenecks are? It would be great if we could use this as an opportunity to improve our performance.

On Fri, Nov 1, 2013 at 6:02 AM, William Wong williamw...@gmail.com wrote:
> I wanted to add full compression support to Rust. [...]
Re: [rust-dev] GZip and Deflate
On 11/1/13 8:40 AM, Benjamin Striegel wrote:
> I can't answer your questions, but I do want to say that this is very interesting!
>
>> Rust compression is about 1.8 times slower, decompression is about 3 times slower than gzip.
>
> Have you tried profiling this to see where our bottlenecks are? It would be great if we could use this as an opportunity to improve our performance.

It sounds like the problem is in miniz, so not in Rust code.

(BTW, compile times for small crates are very gated on decompression of metadata. Improvement of decompression speed improves compile times. Although maybe we should just switch to Snappy or something.)

Patrick
[rust-dev] RFC about std::option and std::result API
Hello everyone!

In the last week I've been trying to update the APIs of both std::Result and std::Option to match each other more closely, and I'd like to hear the opinion of the community and the devs on a few areas this touches.

# Option today

The baseline here is Option's current API, which roughly looks like this:

1. Methods for querying the variant
   - fn is_some(&self) -> bool
   - fn is_none(&self) -> bool
2. Adapters for working with references
   - fn as_ref<'r>(&'r self) -> Option<&'r T>
   - fn as_mut<'r>(&'r mut self) -> Option<&'r mut T>
3. Methods for getting to the contained value
   - fn expect(self, msg: &str) -> T
   - fn unwrap(self) -> T
   - fn unwrap_or(self, def: T) -> T
   - fn unwrap_or_else(self, f: &fn() -> T) -> T
4. Methods for transforming the contained value
   - fn map<U>(self, f: &fn(T) -> U) -> Option<U>
   - fn map_default<U>(self, def: U, f: &fn(T) -> U) -> U
   - fn mutate(&mut self, f: &fn(T) -> T) -> bool
   - fn mutate_default(&mut self, def: T, f: &fn(T) -> T) -> bool
5. Iterator constructors
   - fn iter<'r>(&'r self) -> OptionIterator<&'r T>
   - fn mut_iter<'r>(&'r mut self) -> OptionIterator<&'r mut T>
   - fn move_iter(self) -> OptionIterator<T>
6. Boolean-like operations on the values, eager and lazy
   - fn and<U>(self, optb: Option<U>) -> Option<U>
   - fn and_then<U>(self, f: &fn(T) -> Option<U>) -> Option<U>
   - fn or(self, optb: Option<T>) -> Option<T>
   - fn or_else(self, f: &fn() -> Option<T>) -> Option<T>
7. Other useful methods
   - fn take(&mut self) -> Option<T>
   - fn filtered(self, f: &fn(t: &T) -> bool) -> Option<T>
   - fn while_some(self, blk: &fn(v: T) -> Option<T>)
8. Common special cases that are shorthand for chaining two other methods together
   - fn take_unwrap(&mut self) -> T
   - fn get_ref<'a>(&'a self) -> &'a T
   - fn get_mut_ref<'a>(&'a mut self) -> &'a mut T

Based on this, I have a few areas of the API of both modules to discuss:

# Renaming `unwrap` to `get`

This is a known issue (https://github.com/mozilla/rust/issues/9784), and I think we should just go through with it. There are a few things that speak in favor of this:

- The name `unwrap` implies destruction of the original `Option`, which is not the case for implicitly copyable types, so it's a bad name.
- Ever since we got the reference adapters, Option's primary usage is to take self by value, with `unwrap` being the main method for getting the value out of an Option. `get` is a shorter name than `unwrap`, so it would make using Option less painful.
- `Option` already has two shorthands for `.as_ref().unwrap()` and `.as_mut().unwrap()`: `.get_ref()` and `.get_mut_ref()`, so the name `get` has precedent in the current API.

# Renaming `map_default` and `mutate_default` to `map_or` and `mutate_or_set`

I can't find an issue for this, but I remember there being an informal agreement to not use the `_default` suffix in methods unless they are related to the `std::default::Default` trait. A confirmation of this would be nice. The names `map_or` and `mutate_or_set` would fit in the current naming scheme.

# The problem with Result

Now, the big issue. Up until now, work on the result module tried to converge on option's API, except adapted to work with Result. For example, option has `fn or_else(self, f: &fn() -> Option<T>) -> Option<T>` to evaluate a function in case of a `None` value, while result has `pub fn or_else<F>(self, op: &fn(E) -> Result<T, F>) -> Result<T, F>` to evaluate a function in case of an `Err` value.

However, while some of the operations are directly compatible with this approach, most others require two methods each, one for the `Ok` and one for the `Err` variant:

- `fn unwrap(self) -> T` vs `fn unwrap_err(self) -> E`
- `fn expect(self, &str) -> T` vs `fn expect_err(self, &str) -> E`
- `fn map_default<U>(self, def: U, op: &fn(T) -> U) -> U` vs `fn map_err_default<F>(self, def: F, op: &fn(E) -> F) -> F`
- ... and many other methods in blocks 3, 4 and 5.

As you can see, this leads to twofold API duplication: all those methods already exist on Option, and all those methods exist for both the `Ok` and the `Err` variant.

This is not a Result-only issue: every enum that is laid out like Result either suffers the same kind of method duplication, or simply does not provide any methods, instead requiring the user to match and manipulate the variants manually. Examples would be `std::either::Either`, or `std::unstable::sync::UnsafeArcUnwrap`.

To solve this problem, I'm proposing a convention for all enums that consist of only newtype-like variants:

# Variant adapters for newtype variant enums

Basically, we should start the convention that every enum of the form

    enum Foo<A, B, C, ...> {
        VariantA(A),
        VariantB(B),
        VariantC(C),
        ...
    }

should implement two sets of methods:

1. Reference adapters for the type itself:
   - fn as_ref<'r>(&'r self) -> Foo<&'r A, &'r B, &'r C, ...>
   - fn as_mut<'r>(&'r mut self) -> Foo<&'r mut A, &'r mut B, &'r mut C, ...>
2. Option adapters for each variant:
   - fn variant_a(self) -> Option<A>
   - fn variant_b(self) -> Option<B>
   - fn variant_c(self) -> Option<C>
   - ...

This would
Re: [rust-dev] GZip and Deflate
On 11/01/2013 03:02 AM, William Wong wrote:
> I wanted to add full compression support to Rust. Full support means stream compression (good for http compression) and multiple call compression (compressing large file with multiple batches of read). To get to that point, the miniz.cpp and deflate API in Rust runtime need to be enhanced to overcome a few limitations.
>
> I've worked with the author of miniz.c (Rich Geldreich) to merge changes into miniz.c on his codeline for the needed API for Rust, and resolved the decompression bug in miniz's code when working with gzip'ed file.
>
> http://code.google.com/p/miniz/issues/detail?id=25&can=1
> http://code.google.com/p/miniz/issues/detail?id=23&can=1

Awesome!

> I've implemented a full set of deflate API in Rust to support stream compression and multiple call compression, with caller-driven and callee-driven pipe style API. Also I've written the Rust GZip library with stream support like GZipReader and GZipWriter.
>
> For testing, I've re-implemented most of the gzip command line program on top of the Rust GZip library. Some performance data for the interested: Rust compression is about 1.8 times slower, decompression is about 3 times slower than gzip. Overall it seems solid.
>
> See https://github.com/williamw520/rustyzip for the source.

Even more awesome!

> Now I need help to merge the changes into the Rust master codeline. There are couple things.
>
> 1. What license to assign for the new files? I use MPL currently.

APL2/MIT dual license. Just copy the same headers that exist on all the other .rs files.

> 2. There's a new version of miniz.cpp needed for things to work. Is the Rust runtime still open for C++ file changes?

This change is fine.

> 3. There are two new files for the deflate library (deflate.rs) and the gzip library (gzip.rs). Which Rust runtime library is the appropriate one to put them in?

Replace extra::flate with these two.

> 4. Should the command line tool rgzip stay in an outside codeline or be merged into the Rust runtime? If merging in, where is a good place for tools?

Not in mainline.

Note that libextra is slated to be broken up into a number of supported but external crates in the near future, and this will probably end up in its own crate. For now though I think the above strategy will work.

Regards,
Brian
Re: [rust-dev] GZip and Deflate
On 11/01/2013 11:42 AM, Brian Anderson wrote:
> On 11/01/2013 03:02 AM, William Wong wrote:
>> 1. What license to assign for the new files? I use MPL currently.
>
> APL2/MIT dual license. Just copy the same headers that exist on all the other .rs files.

Er, that's ASL2 (Apache License 2.0)
Re: [rust-dev] RFC about std::option and std::result API
> # Renaming `unwrap` to `get`

I would personally find this change a little odd because we still have a large number of `unwrap` methods throughout the codebase. Most of these do indeed imply destruction of the enclosing type. A change like this would mean that when you decide to write your unwrapping method you must internally think about whether this always implies that the outer type would be destroyed or not.

In my opinion, `unwrap()` on `Option<int>` does exactly what it should and it's just a bug vs state of mind kind of thing. I would rather strive for consistency across all APIs than have a special case based on whether the type just happens to not be destroyed because the whole thing is implicitly copyable.
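The "implicitly copyable" point under discussion can be sketched roughly as follows. This is written in present-day Rust rather than the 2013 dialect (so `i32` instead of `int`), and `unwrap_copyable` and `unwrap_owned` are hypothetical helper names used only for illustration. It is meant to show that `unwrap()` always takes `self` by value, but for a copyable payload the caller's original `Option` remains usable because only a copy was consumed.

```rust
/// For a Copy payload, `unwrap()` consumes a copy of the option,
/// so `opt` can still be returned afterwards.
fn unwrap_copyable(opt: Option<i32>) -> (i32, Option<i32>) {
    let value = opt.unwrap();
    (value, opt) // `opt` was not destroyed from the caller's point of view
}

/// For a non-Copy payload, `unwrap()` moves the String out and `opt`
/// is gone afterwards; `as_ref()` is the way to look without consuming.
fn unwrap_owned(opt: Option<String>) -> (usize, String) {
    let len = opt.as_ref().unwrap().len(); // borrows, `opt` is still intact
    let owned = opt.unwrap();              // consumes `opt`
    // Using `opt` here would be a compile error (use of moved value).
    (len, owned)
}
```

Here `unwrap_copyable(Some(3))` hands back both the value and the still-valid option, while in `unwrap_owned` the option cannot be touched after the final `unwrap()`.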
Re: [rust-dev] GZip and Deflate
Date: Fri, 1 Nov 2013 11:40:18 -0400
From: Benjamin Striegel ben.strie...@gmail.com
Subject: Re: [rust-dev] GZip and Deflate

> I can't answer your questions, but I do want to say that this is very interesting!
>
>> Rust compression is about 1.8 times slower, decompression is about 3 times slower than gzip.
>
> Have you tried profiling this to see where our bottlenecks are? It would be great if we could use this as an opportunity to improve our performance.

I haven't profiled the code yet; I will get to it once I have some time. The first priority was just to get things working correctly; there were some tricky edge cases on some large files of different sizes. Those have been flushed out and fixed.

Without profiling, it's hard to say where the bottlenecks are. The CPU was maxed out, so I suspect it's the compression code rather than the IO code.
Re: [rust-dev] RFC about std::option and std::result API
On 11/01/2013 08:22 PM, Brian Anderson wrote:
> My first reaction is that it's not obvious the difference between `is_ok` and `ok`. Also, it would seem that `ok()` is required to return a different type to make this work, which seems fairly burdensome. If it *is* required to return a different type, then the obvious thing to do is just transform it into `Option<T>` which is fairly nice since it just defers the problem to `Option` completely.

The idea is that, for example, for `res: Result<T, E>`, `res.ok()` returns `Option<T>` and `res.err()` returns `Option<E>`. So yes, it would just defer to Option.

About the difference between `is_ok` and `ok`: fair enough. However, I think the only way to make this clearer is to rename `ok()` and `err()` to something longer, which would make the chaining more verbose and has the problem that all short words that would fit are already heavily overloaded in Rust terminology:

- `as_ok()` and `as_err()`
- `to_ok()` and `to_err()`
- `get_ok()` and `get_err()`
- `ok_get()` and `err_get()`

Maybe an abbreviation of variant would work:

- `ok_var()` and `err_var()`

Seems to read nice at least:

~~~
res.ok_var().get();
res.err_var().get();
res.err_var().expect(...);
~~~
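For readers who want to see the shape of this idea in code, here is a small sketch in present-day Rust. `Result::ok()` and `Result::err()` exist in today's standard library with the `Option`-returning behaviour described above. The `Either` enum and its `as_ref`, `left` and `right` methods, as well as `parse_port`, are hypothetical and written here only to illustrate the proposed convention on a user-defined enum; they are not taken from the thread or from `std::either`.

```rust
/// Hypothetical two-variant newtype-style enum, standing in for `Foo<A, B>`.
#[derive(Debug, PartialEq)]
enum Either<A, B> {
    Left(A),
    Right(B),
}

impl<A, B> Either<A, B> {
    /// Reference adapter: borrow the payload without consuming the enum.
    fn as_ref(&self) -> Either<&A, &B> {
        match self {
            Either::Left(a) => Either::Left(a),
            Either::Right(b) => Either::Right(b),
        }
    }

    /// Option adapter for the `Left` variant.
    fn left(self) -> Option<A> {
        match self {
            Either::Left(a) => Some(a),
            Either::Right(_) => None,
        }
    }

    /// Option adapter for the `Right` variant.
    fn right(self) -> Option<B> {
        match self {
            Either::Left(_) => None,
            Either::Right(b) => Some(b),
        }
    }
}

/// Small helper producing a `Result` to demonstrate `ok()` and `err()`.
fn parse_port(s: &str) -> Result<u16, String> {
    s.parse::<u16>().map_err(|e| e.to_string())
}
```

With this in place, `parse_port("8080").ok()` yields an `Option<u16>` and `parse_port("x").err()` yields an `Option<String>`, after which every `Option` method applies. On the custom enum, `e.as_ref().left()` plays the same role without consuming `e`, which is the duplication-avoiding pattern the proposal appears to be after.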
[rust-dev] (no subject)
Date: Fri, 01 Nov 2013 09:14:26 -0700
From: Patrick Walton pcwal...@mozilla.com
Subject: Re: [rust-dev] GZip and Deflate

> On 11/1/13 8:40 AM, Benjamin Striegel wrote:
>> Have you tried profiling this to see where our bottlenecks are? It would be great if we could use this as an opportunity to improve our performance.
>
> It sounds like the problem is in miniz, so not in Rust code.
>
> (BTW, compile times for small crates are very gated on decompression of metadata. Improvement of decompression speed improves compile times. Although maybe we should just switch to Snappy or something.)
>
> Patrick

I suspect that's the case, but without profiling it's difficult to pinpoint the bottleneck.

Snappy looks interesting. I'll look into it later when I get some time.
Re: [rust-dev] GZip and Deflate
From: Brian Anderson bander...@mozilla.com
Subject: Re: [rust-dev] GZip and Deflate

>> 1. What license to assign for the new files? I use MPL currently.
>
> APL2/MIT dual license. Just copy the same headers that exist on all the other .rs files.
>
> Er, that's ASL2 (Apache License 2.0)

That's great. Thanks. I'll use that.
Re: [rust-dev] RFC about std::option and std::result API
On 11/01/2013 08:30 PM, Alex Crichton wrote:
>> # Renaming `unwrap` to `get`
>
> I would personally find this change a little odd because we still have a large number of `unwrap` methods throughout the codebase. Most of these do indeed imply destruction of the enclosing type. A change like this would mean that when you decide to write your unwrapping method you must internally think about whether this always implies that the outer type would be destroyed or not.
>
> In my opinion, `unwrap()` on `Option<int>` does exactly what it should and it's just a bug vs state of mind kind of thing. I would rather strive for consistency across all APIs than have a special case based on whether the type just happens to not be destroyed because the whole thing is implicitly copyable.

Imo we still keep consistency even with this rename. `get` is simply the more general term, which we'd use for generic situations where we don't know anything about the type, while specific implementations can choose either name depending on the situation.

I think it's more useful to say: use the name `unwrap` if the function does something non-trivial. For example, `ARC::unwrap()` should probably not be renamed to `get` because it can block the task.
Re: [rust-dev] (no subject)
For my data experiments, I would rather like to see an LZ4 implementation https://code.google.com/p/lz4/ (lossless, with very, very, very, very, very, very, very, very fast decompression and the same compression; the very's are dependent on how many cpu cores you have :-) ) and it's BSD licensed.

-Thad
Re: [rust-dev] RFC about std::option and std::result API
On Nov 1, 2013, at 12:59 PM, Marvin Löbel loebel.mar...@gmail.com wrote:
> Maybe an abbreviation of variant would work:
>
> - `ok_var()` and `err_var()`
>
> Seems to read nice at least:
>
> ~~~
> res.ok_var().get();
> res.err_var().get();
> res.err_var().expect(...);
> ~~~

"var" here makes me think "variable". My two cents says go with `res.ok().get()` and `res.err().get()`. It's unfortunate that `ok()` can be read as if it were `is_ok()`, but I think people will get used to it pretty fast.

-Kevin
Re: [rust-dev] RFC about std::option and std::result API
Yeah, many of the overlaps between these APIs should or could be expressed as additional traits...

-- Kevin

On Nov 1, 2013 12:49 PM, Brendan Zabarauskas bjz...@yahoo.com.au wrote:
> My first thought is unrelated: it would be awesome if we had a lint mode that detected methods like `get`, `get_ref`, etc. - all these common patterns - and confirmed that their result type looked like what we expect. We could apply this to all the official libraries to try to stay consistent.
>
> This could help to ensure that our APIs could remain reasonably intact through a transition to higher kinded types.
>
> ~Brendan