[rust-dev] GZip and Deflate

2013-11-01 Thread William Wong
I wanted to add full compression support to Rust.  Full support means
stream compression (good for http compression) and multiple call
compression (compressing a large file with multiple batches of reads).  To
get to that point, the miniz.cpp and deflate API in the Rust runtime need
to be enhanced to overcome a few limitations.

I've worked with the author of miniz.c (Rich Geldreich) to merge changes
into miniz.c on his codeline for the needed API for Rust, and resolved the
decompression bug in miniz's code when working with gzip'ed file.

http://code.google.com/p/miniz/issues/detail?id=25&can=1
http://code.google.com/p/miniz/issues/detail?id=23&can=1

I've implemented a full set of deflate API in Rust to support stream
compression and multiple call compression, with caller-driven and
callee-driven pipe style API.  Also I've written the Rust GZip library with
stream support like GZipReader and GZipWriter.  For testing, I've
re-implemented most of the gzip command line program on top of the Rust
GZip library.  Some performance data for the interested: Rust compression
is about 1.8 times slower, decompression is about 3 times slower than
gzip.  Overall it seems solid.  See
https://github.com/williamw520/rustyzip for the source.
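
To make the two API styles mentioned above concrete, here is a minimal sketch in present-day Rust. It is not the rustyzip API: `RleEncoder`, `push`, `finish`, `compress_chunks`, and `pipe` are hypothetical names, and a trivial run-length encoder stands in for deflate so the example stays self-contained. In the caller-driven style the caller feeds chunks one call at a time; in the callee-driven (pipe) style a function is handed a reader and a writer and drives the loop itself.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for a deflate encoder: emits (count, byte) pairs.
/// It keeps the unfinished run between calls, so input may arrive in chunks.
struct RleEncoder {
    run: Option<(u8, u8)>, // (byte, count) of the run still in progress
}

impl RleEncoder {
    fn new() -> Self {
        RleEncoder { run: None }
    }

    /// Caller-driven step: consume one chunk, append finished runs to `out`.
    fn push(&mut self, chunk: &[u8], out: &mut Vec<u8>) {
        for &b in chunk {
            match self.run {
                Some((byte, count)) if byte == b && count < u8::MAX => {
                    self.run = Some((byte, count + 1));
                }
                Some((byte, count)) => {
                    out.extend_from_slice(&[count, byte]);
                    self.run = Some((b, 1));
                }
                None => self.run = Some((b, 1)),
            }
        }
    }

    /// Flush the last pending run; call once after the final chunk.
    fn finish(&mut self, out: &mut Vec<u8>) {
        if let Some((byte, count)) = self.run.take() {
            out.extend_from_slice(&[count, byte]);
        }
    }
}

/// Caller-driven use: the caller decides when each chunk is fed in.
fn compress_chunks(chunks: &[&str]) -> Vec<u8> {
    let mut enc = RleEncoder::new();
    let mut out = Vec::new();
    for chunk in chunks {
        enc.push(chunk.as_bytes(), &mut out);
    }
    enc.finish(&mut out);
    out
}

/// Callee-driven (pipe) use: this function pulls from `src` and pushes to `dst`.
fn pipe<R: Read, W: Write>(mut src: R, mut dst: W) -> io::Result<()> {
    let mut enc = RleEncoder::new();
    let mut buf = [0u8; 4]; // deliberately tiny so several reads are needed
    let mut out = Vec::new();
    loop {
        let n = src.read(&mut buf)?;
        if n == 0 {
            break;
        }
        enc.push(&buf[..n], &mut out);
        dst.write_all(&out)?;
        out.clear();
    }
    enc.finish(&mut out);
    dst.write_all(&out)
}

fn main() -> io::Result<()> {
    let one_shot = compress_chunks(&["aaab"]);
    let two_calls = compress_chunks(&["aa", "ab"]);
    assert_eq!(one_shot, two_calls); // chunk boundaries do not change the output

    let mut piped: Vec<u8> = Vec::new();
    pipe(&b"aaab"[..], &mut piped)?;
    assert_eq!(piped, one_shot);
    Ok(())
}
```

The point of the sketch is only the shape of the state handling: because the encoder carries its pending run across calls, a file can be compressed in batches without holding it all in memory.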

Now I need help to merge the changes into the Rust master codeline.  There
are a couple of things.

1. What license to assign for the new files?  I use MPL currently.
2. There's a new version of miniz.cpp needed for things to work.  Is the
Rust runtime still open for C++ file changes?
3. There are two new files for the deflate library (deflate.rs) and the
gzip library (gzip.rs).  Which Rust runtime library is the appropriate one
to put them in?
4. Should the command line tool rgzip stay in an outside codeline or be
merged into the Rust runtime?  If merging in, where is a good place for
tools?

Thanks

William
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] GZip and Deflate

2013-11-01 Thread Benjamin Striegel
I can't answer your questions, but I do want to say that this is very
interesting!

> Rust compression is about 1.8 times slower, decompression is about 3
> times slower than gzip.

Have you tried profiling this to see where our bottlenecks are? It would be
great if we could use this as an opportunity to improve our performance.




Re: [rust-dev] GZip and Deflate

2013-11-01 Thread Patrick Walton

On 11/1/13 8:40 AM, Benjamin Striegel wrote:
> I can't answer your questions, but I do want to say that this is very
> interesting!
>
>> Rust compression is about 1.8 times slower, decompression is about 3
>> times slower than gzip.
>
> Have you tried profiling this to see where our bottlenecks are? It would
> be great if we could use this as an opportunity to improve our performance.


It sounds like the problem is in miniz, so not in Rust code.

(BTW, compile times for small crates are very gated on decompression of 
metadata. Improvement of decompression speed improves compile times. 
Although maybe we should just switch to Snappy or something.)


Patrick



[rust-dev] RFC about std::option and std::result API

2013-11-01 Thread Marvin Löbel

Hello everyone!

In the last week I've been trying to update both std::Result and
std::Option's API to match each other more, and I'd like to see the
opinion of the community and the Devs on a few areas it touches.

# Option today

The baseline here is Option's current API, which roughly looks like this:

1. Methods for querying the variant
  - fn is_some(&self) -> bool
  - fn is_none(&self) -> bool

2. Adapter for working with references
  - fn as_ref<'r>(&'r self) -> Option<&'r T>
  - fn as_mut<'r>(&'r mut self) -> Option<&'r mut T>

3. Methods for getting to the contained value
  - fn expect(self, msg: &str) -> T
  - fn unwrap(self) -> T
  - fn unwrap_or(self, def: T) -> T
  - fn unwrap_or_else(self, f: &fn() -> T) -> T

4. Methods for transforming the contained value
  - fn map<U>(self, f: &fn(T) -> U) -> Option<U>
  - fn map_default<U>(self, def: U, f: &fn(T) -> U) -> U
  - fn mutate(&mut self, f: &fn(T) -> T) -> bool
  - fn mutate_default(&mut self, def: T, f: &fn(T) -> T) -> bool

5. Iterator constructors
  - fn iter<'r>(&'r self) -> OptionIterator<&'r T>
  - fn mut_iter<'r>(&'r mut self) -> OptionIterator<&'r mut T>
  - fn move_iter(self) -> OptionIterator<T>

6. Boolean-like operations on the values, eager and lazy
  - fn and<U>(self, optb: Option<U>) -> Option<U>
  - fn and_then<U>(self, f: &fn(T) -> Option<U>) -> Option<U>
  - fn or(self, optb: Option<T>) -> Option<T>
  - fn or_else(self, f: &fn() -> Option<T>) -> Option<T>

7. Other useful methods
  - fn take(&mut self) -> Option<T>
  - fn filtered(self, f: &fn(t: &T) -> bool) -> Option<T>
  - fn while_some(self, blk: &fn(v: T) -> Option<T>)

8. Common special cases that are shorthand for chaining two other
   methods together.
  - fn take_unwrap(&mut self) -> T
  - fn get_ref<'a>(&'a self) -> &'a T
  - fn get_mut_ref<'a>(&'a mut self) -> &'a mut T
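
As a rough illustration of how several of these method groups combine, here is a small sketch. It is written against the current stable standard library rather than the 2013 API listed above, so closure syntax and some names differ; the helper functions (`half`, `len_or_zero`, `half_or_zero`, `drain`) are hypothetical and exist only for this example.

```rust
/// Halve a number only if it is even (used to show `and_then`).
fn half(n: i32) -> Option<i32> {
    if n % 2 == 0 { Some(n / 2) } else { None }
}

/// Groups 2, 4 and 3: borrow the contents, transform them, fall back to a default.
fn len_or_zero(name: &Option<String>) -> usize {
    name.as_ref().map(|s| s.len()).unwrap_or(0)
}

/// Group 6: lazy chaining; the `or_else` closure runs only if the chain is `None`.
fn half_or_zero(n: Option<i32>) -> Option<i32> {
    n.and_then(half).or_else(|| Some(0))
}

/// Group 7: `take` moves the value out and leaves `None` behind.
fn drain(slot: &mut Option<String>) -> Option<String> {
    slot.take()
}

fn main() {
    let mut name = Some(String::from("rust"));
    assert!(name.is_some()); // group 1: query the variant
    assert_eq!(len_or_zero(&name), 4);
    assert_eq!(drain(&mut name), Some(String::from("rust")));
    assert!(name.is_none());
    assert_eq!(half_or_zero(Some(8)), Some(4));
    assert_eq!(half_or_zero(Some(7)), Some(0));
    assert_eq!(half_or_zero(None), Some(0));
}
```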

Based on this, I have a few areas of the API of both modules to discuss:

# Renaming `unwrap` to `get`

This is a known issue (https://github.com/mozilla/rust/issues/9784), and I
think we should just go through with it.
There are a few things that speak in favor of this:
- The name `unwrap` implies destruction of the original `Option`, which is
  not the case for implicitly copyable types, so it's a bad name.
- Ever since we got the reference adapters, Option's primary
  usage is to take self by value, with `unwrap` being the main method for
  getting the value out of an Option.
  `get` is a shorter name than `unwrap`, so it would make using Option
  less painful.
- `Option` already has two shorthands for `.as_ref().unwrap()` and
  `.as_mut().unwrap()`: `.get_ref()` and `.get_mut_ref()`, so the name
  `get` has precedent in the current API.
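
The first point can be checked directly. The sketch below uses today's Rust (where `Copy` plays the role of "implicitly copyable"); the function names are hypothetical and chosen only for this illustration.

```rust
/// `Option<i32>` is `Copy`, so each `unwrap` consumes only a copy.
fn unwrap_twice(opt: Option<i32>) -> (i32, i32) {
    let first = opt.unwrap();
    let second = opt.unwrap(); // `opt` is still usable here
    (first, second)
}

/// `Option<String>` is not `Copy`: `unwrap` moves the value out.
fn unwrap_owned(opt: Option<String>) -> String {
    let owned = opt.unwrap();
    // opt.unwrap(); // would not compile: `opt` was moved on the line above
    owned
}

/// The `.as_ref().unwrap()` chain from the list: borrow without consuming.
fn peek_len(opt: &Option<String>) -> usize {
    opt.as_ref().unwrap().len()
}

fn main() {
    assert_eq!(unwrap_twice(Some(7)), (7, 7));
    assert_eq!(unwrap_owned(Some(String::from("gz"))), "gz");

    let name = Some(String::from("deflate"));
    assert_eq!(peek_len(&name), 7);
    assert!(name.is_some()); // still intact after peeking
}
```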


# Renaming `map_default` and `mutate_default` to `map_or` and `mutate_or_set`

I can't find an issue for this, but I remember there being an informal
agreement to not use the `_default` suffix in methods unless they are
related to the `std::default::Default` trait. A confirmation of this would
be nice.

The names `map_or` and `mutate_or_set` would fit in the current naming
scheme.


# The problem with Result

Now, the big issue. Up until now, work on the result module tried to
converge on option's API, except adapted to work with Result.

For example, option has
`fn or_else(self, f: &fn() -> Option<T>) -> Option<T>`
to evaluate a function in case of a `None` value, while result has
`pub fn or_else<F>(self, op: &fn(E) -> Result<T, F>) -> Result<T, F>`
to evaluate a function in case of an `Err` value.
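
The difference between the two `or_else` shapes is easier to see side by side. This sketch uses the current standard library, where both methods exist with closures in place of `&fn`; the function names and the default value 8080 are hypothetical.

```rust
/// `Option::or_else`: the closure takes no argument and runs only on `None`.
fn port_or_default(configured: Option<u16>) -> Option<u16> {
    configured.or_else(|| Some(8080))
}

/// `Result::or_else`: the closure receives the error and may change the
/// error type (here from `ParseIntError` to `String`).
fn parse_or_recover(input: &str) -> Result<i32, String> {
    input.parse::<i32>().or_else(|err| {
        if input.is_empty() {
            Ok(0) // recover from the empty-string case
        } else {
            Err(format!("cannot parse {:?}: {}", input, err))
        }
    })
}

fn main() {
    assert_eq!(port_or_default(None), Some(8080));
    assert_eq!(port_or_default(Some(22)), Some(22));
    assert_eq!(parse_or_recover("17"), Ok(17));
    assert_eq!(parse_or_recover(""), Ok(0));
    assert!(parse_or_recover("abc").is_err());
}
```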

However, while some of the operations are directly compatible with this
approach, most others require two methods each, one for the `Ok` and one
for the `Err` variant:

- `fn unwrap(self) -> T` vs
  `fn unwrap_err(self) -> E`
- `fn expect(self, &str) -> T` vs
  `fn expect_err(self, &str) -> E`
- `fn map_default<U>(self, def: U, op: &fn(T) -> U) -> U` vs
  `fn map_err_default<F>(self, def: F, op: &fn(E) -> F) -> F`
- ... and many other methods in blocks 3, 4 and 5.

As you can see, this leads to API duplication twofold: All those methods
already exist on Option, and all those methods exist both for the `Ok`
and the `Err` variant.

This is not a Result-only issue: Every enum that is laid out like
Result either suffers the same kind of method duplication, or simply does
not provide any, instead requiring the user to match and manipulate them
manually.

Examples would be `std::either::Either`, or
`std::unstable::sync::UnsafeArcUnwrap`.


To solve this problem, I'm proposing a convention for all enums that
consist of only newtype-like variants:

# Variant adapters for newtype variant enums

Basically, we should start the convention that every enum of the form

    enum Foo<A, B, C, ...> {
        VariantA(A),
        VariantB(B),
        VariantC(C),
        ...
    }

should implement two sets of methods:

1. Reference adapters for the type itself:
  - fn as_ref<'r>(&'r self) -> Foo<&'r A, &'r B, &'r C, ...>
  - fn as_mut<'r>(&'r mut self) -> Foo<&'r mut A, &'r mut B, &'r mut C, ...>

2. Option adapters for each variant:
  - fn variant_a(self) -> Option<A>
  - fn variant_b(self) -> Option<B>
  - fn variant_c(self) -> Option<C>
  - ...
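
A minimal sketch of this convention for a two-variant enum, in present-day Rust. The `Either` type here is defined locally for the example (it is not `std::either::Either`), and the adapter names `left` and `right` are assumptions standing in for `variant_a` and `variant_b`.

```rust
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

impl<L, R> Either<L, R> {
    /// Reference adapter: borrow whichever variant is present.
    fn as_ref(&self) -> Either<&L, &R> {
        match self {
            Either::Left(l) => Either::Left(l),
            Either::Right(r) => Either::Right(r),
        }
    }

    /// Mutable reference adapter.
    fn as_mut(&mut self) -> Either<&mut L, &mut R> {
        match self {
            Either::Left(l) => Either::Left(l),
            Either::Right(r) => Either::Right(r),
        }
    }

    /// Option adapter for the `Left` variant.
    fn left(self) -> Option<L> {
        match self {
            Either::Left(l) => Some(l),
            Either::Right(_) => None,
        }
    }

    /// Option adapter for the `Right` variant.
    fn right(self) -> Option<R> {
        match self {
            Either::Left(_) => None,
            Either::Right(r) => Some(r),
        }
    }
}

fn main() {
    let mut e: Either<i32, String> = Either::Left(1);
    // Everything past the adapter is plain `Option` API.
    if let Some(n) = e.as_mut().left() {
        *n += 1;
    }
    assert_eq!(e.as_ref().left(), Some(&2));
    assert_eq!(e.as_ref().right(), None);
    assert_eq!(e.left(), Some(2));
}
```

With only these four methods on the enum, operations such as unwrapping, mapping, or supplying a default for either variant are reached by chaining through `Option` instead of being duplicated per variant.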

This would 

Re: [rust-dev] GZip and Deflate

2013-11-01 Thread Brian Anderson

On 11/01/2013 03:02 AM, William Wong wrote:
> I wanted to add full compression support to Rust.  Full support means
> stream compression (good for http compression) and multiple call
> compression (compressing large file with multiple batches of read).
> To get to that point, the miniz.cpp and deflate API in Rust runtime
> need to be enhanced to overcome a few limitations.
>
> I've worked with the author of miniz.c (Rich Geldreich) to merge
> changes into miniz.c on his codeline for the needed API for Rust, and
> resolved the decompression bug in miniz's code when working with
> gzip'ed file.
>
> http://code.google.com/p/miniz/issues/detail?id=25&can=1
> http://code.google.com/p/miniz/issues/detail?id=23&can=1

Awesome!

> I've implemented a full set of deflate API in Rust to support stream
> compression and multiple call compression, with caller-driven and
> callee-driven pipe style API.  Also I've written the Rust GZip library
> with stream support like GZipReader and GZipWriter.  For testing, I've
> re-implemented most of the gzip command line program on top of the
> Rust GZip library.  Some performance data for the interested: Rust
> compression is about 1.8 times slower, decompression is about 3 times
> slower than gzip.  Overall it seems solid.  See
> https://github.com/williamw520/rustyzip for the source.

Even more awesome!

> Now I need help to merge the changes into the Rust master codeline.
> There are couple things.
>
> 1. What license to assign for the new files?  I use MPL currently.

APL2/MIT dual license. Just copy the same headers that exist on all the
other .rs files.

> 2. There's a new version of miniz.cpp needed for things to work.  Is
> the Rust runtime still open for C++ file changes?

This change is fine.

> 3. There are two new files for the deflate library (deflate.rs) and the
> gzip library (gzip.rs).  Which Rust runtime library is the appropriate
> one to put them in?

Replace extra::flate with these two.

> 4. Should the command line tool rgzip stay in an outside codeline or
> be merged into the Rust runtime?  If merging in, where is a good place
> for tools?

Not in mainline.

Note that libextra is slated to be broken up into a number of supported
but external crates in the near future, and this will probably end up in
its own crate. For now though I think the above strategy will work.


Regards,
Brian



Re: [rust-dev] GZip and Deflate

2013-11-01 Thread Brian Anderson

On 11/01/2013 11:42 AM, Brian Anderson wrote:


>> 1. What license to assign for the new files?  I use MPL currently.
>
> APL2/MIT dual license. Just copy the same headers that exist on all
> the other .rs files.

Er, that's ASL2 (Apache License 2.0)


Re: [rust-dev] RFC about std::option and std::result API

2013-11-01 Thread Alex Crichton
> # Renaming `unwrap` to `get`

I would personally find this change a little odd because we still have
a large number of `unwrap` methods throughout the codebase. Most of
these do indeed imply destruction of the enclosing type. A change like
this would mean that when you decide to write your unwrapping method
you must internally think about whether this always implies that the
outer type would be destroyed or not. In my opinion, unwrap() on
Option<int> does exactly what it should and it's just a bug vs state
of mind kind of thing. I would rather strive for consistency across
all APIs than have a special case based on whether the type just
happens to not be destroyed because the whole thing is implicitly
copyable.


Re: [rust-dev] GZip and Deflate

2013-11-01 Thread William Wong
> Date: Fri, 1 Nov 2013 11:40:18 -0400
> From: Benjamin Striegel <ben.strie...@gmail.com>
> Cc: rust-dev@mozilla.org
> Subject: Re: [rust-dev] GZip and Deflate
>
> I can't answer your questions, but I do want to say that this is very
> interesting!
>
>> Rust compression is about 1.8 times slower, decompression is about 3
>> times slower than gzip.
>
> Have you tried profiling this to see where our bottlenecks are? It would be
> great if we could use this as an opportunity to improve our performance.

I haven't profiled the code yet; will get to it once I get some time.  The
first priority was just to get things working correctly; there were some
tricky edge cases on some large files with different sizes.  They have been
flushed out and fixed.

Without profiling, it's hard to tell where the bottlenecks are.  The
CPU was maxed out, so I suspect it's the compression code rather than the
IO code.


Re: [rust-dev] RFC about std::option and std::result API

2013-11-01 Thread Marvin Löbel

On 11/01/2013 08:22 PM, Brian Anderson wrote:
> My first reaction is that it's not obvious the difference between
> `is_ok` and `ok`. Also, it would seem that `ok()` is required to
> return a different type to make this work, which seems fairly
> burdensome. If it *is* required to return a different type, then the
> obvious thing to do is just transform it into `Option<T>` which is
> fairly nice since it just defers the problem to `Option` completely.

The idea is that for example for `res: Result<T, E>`, `res.ok()` returns
`Option<T>` and `res.err()` returns `Option<E>`. So yes, it would just
defer to Option.


About the difference between `is_ok` and `ok` - fair enough. However I 
think the only way to make this more clear is to rename `ok()` and 
`err()` to something longer, which would make the chaining more verbose 
and has the problem that all short words that would fit are already 
heavily overloaded in rust terminology:

- `as_ok()` and `as_err()`
- `to_ok()` and `to_err()`
- `get_ok()` and `get_err()`
- `ok_get()` and `err_get()`

Maybe an abbreviation of variant would work:

- `ok_var()` and `err_var()`

Seems to read nice at least:

~~~
res.ok_var().get();
res.err_var().get();
res.err_var().expect(...);
~~~
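
For comparison, the short `ok()` / `err()` spelling discussed above is what the current standard library provides on `Result`. The sketch below uses it in today's Rust; the helper names `parse`, `value_or`, and `error_text` are hypothetical.

```rust
use std::num::ParseIntError;

fn parse(input: &str) -> Result<i32, ParseIntError> {
    input.trim().parse::<i32>()
}

/// `ok()` keeps the value and drops the error, so `Option` methods apply.
fn value_or(input: &str, default: i32) -> i32 {
    parse(input).ok().unwrap_or(default)
}

/// `err()` keeps only the error, again as an `Option`.
fn error_text(input: &str) -> Option<String> {
    parse(input).err().map(|e| e.to_string())
}

fn main() {
    assert_eq!(value_or("42", 0), 42);
    assert_eq!(value_or("x", -1), -1);
    assert_eq!(error_text("42"), None);
    assert!(error_text("x").is_some());
}
```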



[rust-dev] (no subject)

2013-11-01 Thread William Wong
> Date: Fri, 01 Nov 2013 09:14:26 -0700
> From: Patrick Walton <pcwal...@mozilla.com>
> To: rust-dev@mozilla.org
> Subject: Re: [rust-dev] GZip and Deflate
>
> On 11/1/13 8:40 AM, Benjamin Striegel wrote:
>> I can't answer your questions, but I do want to say that this is very
>> interesting!
>>
>>> Rust compression is about 1.8 times slower, decompression is about 3
>>> times slower than gzip.
>>
>> Have you tried profiling this to see where our bottlenecks are? It would
>> be great if we could use this as an opportunity to improve our
>> performance.
>
> It sounds like the problem is in miniz, so not in Rust code.
>
> (BTW, compile times for small crates are very gated on decompression of
> metadata. Improvement of decompression speed improves compile times.
> Although maybe we should just switch to Snappy or something.)
>
> Patrick


I suspect that's the case, but without profiling it's difficult to pinpoint
the bottleneck.

Snappy looks interesting.  I'll look into it later when I get some time.


Re: [rust-dev] GZip and Deflate

2013-11-01 Thread William Wong
> From: Brian Anderson <bander...@mozilla.com>
> To: rust-dev@mozilla.org
> Subject: Re: [rust-dev] GZip and Deflate
>
>>> 1. What license to assign for the new files?  I use MPL currently.
>>
>> APL2/MIT dual license. Just copy the same headers that exist on all
>> the other .rs files.
>
> Er, that's ASL2 (Apache License 2.0)

That's great.  Thanks.  I'll use that.


Re: [rust-dev] RFC about std::option and std::result API

2013-11-01 Thread Marvin Löbel

On 11/01/2013 08:30 PM, Alex Crichton wrote:
>> # Renaming `unwrap` to `get`
>
> I would personally find this change a little odd because we still have
> a large number of `unwrap` methods throughout the codebase. Most of
> these do indeed imply destruction of the enclosing type. A change like
> this would mean that when you decide to write your unwrapping method
> you must internally think about whether this always implies that the
> outer type would be destroyed or not. In my opinion, unwrap() on
> Option<int> does exactly what it should and it's just a bug vs state
> of mind kind of thing. I would rather strive for consistency across
> all APIs than have a special case based on whether the type just
> happens to not be destroyed because the whole thing is implicitly
> copyable.

Imo we still keep consistency even with this rename. `get` is simply the
more general term which we'd use for generic situations where we don't
know anything about the type, while specific implementations can choose
either name depending on situation.

I think it's more useful to say use the name unwrap if the function
does something non-trivial. For example, `ARC::unwrap()` should
probably not be renamed to `get` because it can block the task.



Re: [rust-dev] (no subject)

2013-11-01 Thread Thad Guidry
For my data experiments, I would rather like to see an LZ4 implementation
https://code.google.com/p/lz4/  (a lossless, very, very, very, very, very,
very, very, very fast decompression, with same compression - the very's are
dependent on how many cpu cores you have :-) ) and it's BSD licensed.

-- 
-Thad
+ThadGuidry https://www.google.com/+ThadGuidry
Thad on LinkedIn http://www.linkedin.com/in/thadguidry/


Re: [rust-dev] RFC about std::option and std::result API

2013-11-01 Thread Kevin Ballard

On Nov 1, 2013, at 12:59 PM, Marvin Löbel <loebel.mar...@gmail.com> wrote:

> Maybe an abbreviation of variant would work:
>
> - `ok_var()` and `err_var()`
>
> Seems to read nice at least:
>
> ~~~
> res.ok_var().get();
> res.err_var().get();
> res.err_var().expect(...);
> ~~~

var here makes me think variable.

My two cents says go with `res.ok().get()` and `res.err().get()`. It's 
unfortunate that `ok()` can be read as if it were `is_ok()`, but I think people 
will get used to it pretty fast.

-Kevin


Re: [rust-dev] RFC about std::option and std::result API

2013-11-01 Thread Kevin Cantu
Yeah, many of the overlaps between these APIs should or could be expressed
as additional traits...

-- Kevin
On Nov 1, 2013 12:49 PM, Brendan Zabarauskas <bjz...@yahoo.com.au> wrote:

>> My first thought is unrelated: it would be awesome if we had a lint mode
>> that detected methods like `get`, `get_ref`, etc. - all these common
>> patterns - and confirmed that their result type looked like what we expect.
>> We could apply this to all the official libraries to try to stay consistent.
>
> This could help to ensure that our APIs could remain reasonably intact
> through a transition to higher kinded types.
>
> ~Brendan