Re: [rust-dev] Cryptol, the language of cryptography

2014-04-27 Thread Bill Myers
There is nothing hard about it, assuming you are using a decent language.

Just add a Crypto<T> type that wraps integers and booleans and that doesn't 
allow any non-constant-time operations nor implicit conversion to anything that 
is not Crypto<T> (which of course means you can't index memory or do branches 
based on it).

If the optimizer can screw up things, implement the CryptoT operations in 
assembly.
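A minimal sketch of such a wrapper in today's Rust syntax (Ct and all method 
names here are hypothetical; a real implementation would still need to audit 
the generated code, or drop to assembly as noted above):

```rust
// Hypothetical Crypto-wrapper sketch: only branch-free, constant-time
// operations are exposed, and getting a plain integer back out requires
// an explicit, greppable declassify() call.
#[derive(Clone, Copy)]
pub struct Ct(u32);

impl Ct {
    pub fn new(v: u32) -> Ct { Ct(v) }

    // Constant-time equality: Ct(1) if equal, Ct(0) otherwise,
    // with no data-dependent branch.
    pub fn ct_eq(self, other: Ct) -> Ct {
        let x = self.0 ^ other.0;                   // zero iff equal
        let nonzero = (x | x.wrapping_neg()) >> 31; // 1 iff x != 0
        Ct(1 ^ nonzero)
    }

    // Constant-time select: flag must be Ct(0) or Ct(1); mask-based, no `if`.
    pub fn select(flag: Ct, a: Ct, b: Ct) -> Ct {
        let mask = flag.0.wrapping_neg();           // all-ones if flag == 1
        Ct((a.0 & mask) | (b.0 & !mask))
    }

    // The only way out of the wrapper; deliberately explicit.
    pub fn declassify(self) -> u32 { self.0 }
}
```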

  
___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] Optimizing pattern bindings to not copy anything at runtime

2014-04-24 Thread Bill Myers
Nice, but why isn't the LLVM optimizer removing the move?
Is it lack of proper alias analysis?
Sounds like that is a separate issue worth pursuing.

> The remaining, more difficult, issue is initialization of
> aggregate data structures via constructor functions, which still
> involves a bunch of moves, but I don't really see any way short of
> macros to optimize that at the moment.

Wouldn't #[inline(always)] on the aggregate constructor fix that? (it's not 
great, but it's strictly better than a macro)
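As a sketch (with a hypothetical aggregate type), the suggestion amounts to:

```rust
// Hypothetical aggregate with an always-inlined constructor, so the
// struct can be built directly in the caller's slot instead of being
// constructed in the callee and then moved out.
struct Big {
    data: [u64; 16],
}

impl Big {
    #[inline(always)] // strictly a hint forcing cross-function construction
    fn new(seed: u64) -> Big {
        Big { data: [seed; 16] }
    }
}
```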

At any rate, shouldn't LLVM inlining heuristics catch that automatically as 
well?
If not, that sounds like another separate issue.



Re: [rust-dev] Removing ~foo

2014-04-24 Thread Bill Myers
> Something like this will work, yes. It'll probably look more like:
>
> Box::new(*x)
>
> This will be described in some of the RFCs that are coming up soon.

Awesome!

We should really get rid of the ~T syntax in favor of Foo<T> (where Foo = Box, 
Own, Heap, etc.), since it is deceptively simple for something that should in 
fact be rarely used.

  


Re: [rust-dev] Reminder: ~[T] is not going away

2014-04-02 Thread Bill Myers

 
> At the moment, Rust is completely broken in this regard. The following
> expression evaluates to None:
> Some(~())

Ouch, this is a disaster.

Is there a bug filed for this?

Anyway, I don't get your argument about size to free having anything to do with 
fixing it (although I agree that size to free is awesome).

If you don't care about equality (i.e. the fact that &*~() != &*~(), but a == a 
where a = &*~()), just return the address of a single private static 1-byte 
item for any 0-sized allocation.

If you DO care about equality, then you will need at least an integer 
allocation scheme in all cases on 32-bit platforms, and the real costs are the 
data structures to track that (at least a bit in a bitmask, probably at least 2 
bits for an efficient implementation).
If you can't use the 1-2GB of kernel address space, then you'll also need to 
allocate one byte of actual usable address space (but not committed memory).

On 64-bit platforms, you generally have at least around 2^60-2^63 bytes of 
unusable address space, so you can just increment a pointer pointing there for 
each allocation, at zero cost.

Of course the quick and simple fix is to try to call malloc(0) and if it 
returns NULL, remember that and switch to using malloc(1) instead.
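As a historical note, the scheme Rust eventually adopted is simpler than any of 
these: never call the allocator for a zero-sized value at all, and hand out a 
well-aligned dangling pointer instead (which matches the "don't care about 
equality" option above). A sketch:

```rust
use std::ptr::NonNull;

// Sketch: for a zero-sized type, skip the allocator entirely and return a
// dangling but well-aligned, non-null pointer (the approach std later took).
// The pointer is never dereferenced; its address is align_of::<T>().
fn zst_ptr<T>() -> NonNull<T> {
    assert_eq!(std::mem::size_of::<T>(), 0, "only for zero-sized types");
    NonNull::dangling()
}
```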



Re: [rust-dev] matching on a few bits in int

2014-03-29 Thread Bill Myers
I think the best solution is to add uN and sN types where N is not a power of 
two, which LLVM should already support.

Then you can write your match like this:
match (val >> 6) as u2
{
  ...
}

And it will work as desired.

Biggest issue is that to make it work nicely you'd need to add some way to 
generalize over the bit-length of integers, and that's going to require 
generics with int parameters and work to add those.
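u2 never landed, but the same effect is available in today's Rust by masking so 
all four values can be covered explicitly; a sketch:

```rust
// Sketch of the same match in today's Rust: no u2 type, so mask the shifted
// value down to two bits and cover all four cases (the `_` arm can only
// ever be 0b11).
fn top_two_bits(val: u8) -> u8 {
    match (val >> 6) & 0b11 {
        0b00 => 0,
        0b01 => 1,
        0b10 => 2,
        _ => 3,
    }
}
```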



Re: [rust-dev] Virtual fn is a bad idea

2014-03-12 Thread Bill Myers
> However, the extensibility of trait objects comes at the cost of fat
> pointers, which can be a problem if you have a lot of pointers.

This is fixable without introducing virtual functions, by adding a way to 
express "Struct and vtable for impl Trait for Struct" and "thin pointer to 
Struct and vtable for impl Trait for Struct" (or by using an indirect pointer, 
i.e. ~~Trait or Rc<~Trait>).

> It also implies that downcasting isn't really possible, at least not
> cheaply, because there is no notion of the actual type of a
> particular object.

I don't understand this.

You can downcast trait pointers to struct pointers just fine, since you can put 
whatever information is needed in the trait impl vtable (such as a typeid).

Sure, you can't downcast a struct to a struct, but that's the whole point of 
structs.

The idea of a struct is that one is referring to something of EXACTLY that 
type, and so downcasting makes no sense; when one wants a variable type one 
uses a trait and then you can (attempt to) downcast to concrete structs.

One can of course introduce virtual structs which are no longer exactly of 
that type but are now virtual, but that just overlaps the role of traits.
> (As an aside, I am personally happier to see virtual structs than to see
> traits extending structs or traits extending a `HasPrefix` trait, as was
> included in previous designs. Both of those approaches mean that the trait
> is no longer pure interface, and if you write generic code against that
> trait, you're actually coding at least partially against a fixed
> implementation, not an interface.)

I think traits inheriting structs is a better design, because you can have 
multiple traits inheriting the same struct.

For example, let's say you are modelling GUI widgets which need both input 
handling and drawing support.

With traits inheriting structs, you can have several traits, one for input 
handling, one for drawing in memory, one for drawing with OpenGL, etc., while 
with virtual functions (and without multiple inheritance) you have to put 
everything together, and you have to either implement all functionality or add 
"not supported" error codes.

In other words, a trait inheriting a struct neatly separates the trait part 
where multiple inheritance is natural from the struct part where single 
inheritance is natural.
Also, a trait inheriting a Struct behaves to users of the trait exactly like a 
trait with a get_struct() accessor method, but with better performance.

It's only different for implementors, where it mandates how get_struct() is 
implemented for the sake of performance, at the cost of making it impossible to 
implement it along with traits inheriting from incompatible structs.
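A sketch of that separation with hypothetical widget names: one shared base 
struct and several independent traits, the input-handling one exposing the base 
through exactly the kind of accessor method described above:

```rust
// Hypothetical GUI example: one base struct, multiple independent traits.
struct WidgetBase {
    x: i32,
    y: i32,
}

trait InputHandling {
    // The get_struct()-style accessor the post describes.
    fn base(&self) -> &WidgetBase;

    fn hit_test(&self, px: i32, py: i32) -> bool {
        let b = self.base();
        b.x == px && b.y == py
    }
}

trait Drawing {
    fn draw(&self) -> String;
}

struct Button {
    base: WidgetBase,
    label: String,
}

impl InputHandling for Button {
    fn base(&self) -> &WidgetBase {
        &self.base
    }
}

impl Drawing for Button {
    fn draw(&self) -> String {
        format!("[{}]", self.label)
    }
}
```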

In general I think it's better to have multiple options at a low level like 
this, rather than having multiple options at a high semantic level like 
virtual struct vs trait, since that exposes the choice less to other modules.



[rust-dev] Virtual fn is a bad idea

2014-03-11 Thread Bill Myers
I see a proposal to add virtual struct and virtual fn in the workweek 
meeting notes, which appears to add an exact copy of Java's OO system to Rust.

I think however that this should be carefully considered, and preferably not 
added at all (or failing that, feature gated and discouraged).

The core problem of virtual functions (shared by Java's classes, etc.) is 
that rather than exposing a single public API, they expose two: the API formed 
by public functions, and the API formed by virtual functions to be overridden 
by subclasses, and the second API is exposed in an inflexible and unclean way.

A much better way to allow overriding part of a struct's behavior is to define 
a trait with the overridable functionality and allow passing an implementation 
of the trait to the base class, while also providing a default implementation 
if desired.

Another way is to have the subclass implement all the traits that the base 
class implements, include a field of the base class type, and then direct 
all non-overridden functionality to the base class (here syntax sugar can be 
easily added to eliminate the boilerplate, by automatically implementing all 
non-implemented trait functions by calling the same function on the base class 
field).

These approaches can be combined, as the first approach allows to change the 
inside behavior of the base class, while the second one allows to put extra 
behavior around the base class code.
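A sketch of the first approach (all names hypothetical): the overridable 
behavior lives in a trait, an implementation is passed into the base type, and 
dispatch is static via a generic parameter:

```rust
// The overridable part of the behavior, as a trait.
trait FooSource {
    fn compute_foo(&self) -> i32;
}

// The default implementation the post mentions.
struct DefaultFoo;
impl FooSource for DefaultFoo {
    fn compute_foo(&self) -> i32 {
        42
    }
}

struct Base<F: FooSource> {
    foo_source: F, // encapsulated: nothing else in Base can bypass it
}

impl<F: FooSource> Base<F> {
    fn new(foo_source: F) -> Self {
        Base { foo_source }
    }
    fn run(&self) -> i32 {
        self.foo_source.compute_foo() * 2
    }
}

// The "override": a different FooSource, no inheritance involved.
struct CustomFoo;
impl FooSource for CustomFoo {
    fn compute_foo(&self) -> i32 {
        7
    }
}
```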

The fact that OO using virtual functions (as opposed to traits) is a bad design 
is one of the crucial steps forward of the design of languages like Go and 
current Rust compared to earlier OO languages, and Rust should not go backwards 
on this.

Here is a list of issues with virtual functions:

1. Incentive for bad documentation

Usually there is no documentation for how virtual functions are supposed to be 
overridden, and it is awkward to add it, since it needs to be mixed with the 
documentation on how to use the struct.

2. Mishmashing multiple unrelated APIs

With traits, you could pass in multiple objects to implement separate sets of 
overridable functionality; with virtual structs you need to mishmash all those 
interfaces into a single set of virtual functions, all sharing data even when 
not appropriate.

3. No encapsulation

Private data for virtual function implementations is accessible to all other 
functions in the struct.

This means for instance that if you have a virtual function called 
compute_foo() that is implemented by default by reading a foo field in the 
base class, then all other parts of the base class can access foo too.

If anything else mistakenly accesses foo directly, which it can, then 
overriding compute_foo() will not work as expected.

If compute_foo() were provided by an external trait implementation, then foo 
would be private and inaccessible, eliminating the problem.

4. Data for overridden implementations is left in a zombie state

In the above example, if you override compute_foo(), the foo variable in the 
base class will no longer be used, yet it will still be present in the type and 
allocated in memory.

5. Inability to statically dispatch

With a trait implementation, you can pass the concrete type as a generic 
parameter, allowing static dispatch.

If you instead call an overridable virtual function, then you can't dispatch 
that statically at all (unless you add cumbersome syntax for that).

6. Adds a ton of unnecessary complexity to the language 
  


Re: [rust-dev] Handling I/O errors

2014-02-03 Thread Bill Myers
> it is guaranteed to happen on all readers

I meant all finite readers, such as those for normal disk files.
  


Re: [rust-dev] Will smart-pointer deref operator allow making iter::ByRef more generic?

2014-02-03 Thread Bill Myers
I don't think so, because the fact that the particular instance of T implements 
the Deref trait cannot have any effect on the decorator code, since it's not in 
the bounds for T.

What instead would work is to change the language so that if type Type 
implements Trait and all Trait methods take &self or &mut self (as opposed to 
by-value self or ~self), then an implementation of Trait for &'a mut Type is 
automatically generated (with the obvious implementation).

Likewise if all Trait methods take &self, then an implementation of Trait for 
&'a Type is also automatically generated.

Then what you want to do will just work without the need of any wrapper or 
special syntax.
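A sketch of what that auto-generation would produce, written by hand in today's 
Rust (Counter and Simple are hypothetical names): all of the trait's methods 
take &mut self, so a blanket impl for &mut T can forward each call:

```rust
trait Counter {
    fn bump(&mut self) -> u32;
}

struct Simple(u32);

impl Counter for Simple {
    fn bump(&mut self) -> u32 {
        self.0 += 1;
        self.0
    }
}

// The impl the compiler would generate automatically under the proposal:
// every method forwards through one level of indirection.
impl<'a, T: Counter + ?Sized> Counter for &'a mut T {
    fn bump(&mut self) -> u32 {
        (**self).bump()
    }
}
```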

One could then, as an additional step, automatically generate an implementation 
of Trait for MutDerefType if Trait is implemented by mut Type (possibly due 
to the above technique), but this would not be required for the example.
   


[rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
Stack management for green tasks has been based in the past first on segmented 
stacks and then on standard large stacks.

However, I just realized that there is a third alternative which might well be 
better than both of those.

The idea is very simple: a green task would run on a large stack like now, but 
when it is descheduled, a memory buffer of the size of its used stack space is 
allocated, and its stack data is copied there; when it is rescheduled, the 
stack data is copied back to the original address.

The copying can be performed either by memcpy (perhaps with a specialized 
memcpy that assumes alignment and always uses SIMD instructions to copy with no 
checks), or perhaps by using a compression algorithm designed for maximum 
speed, such as Snappy or maybe a new ad-hoc design.
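A toy illustration of the snapshot step, using a Vec as a stand-in for the 
large stack (real code would operate on the actual stack memory and the task's 
saved stack pointer):

```rust
// Toy model: on deschedule, copy out only the used prefix of the large
// stack; on reschedule, copy it back to the same addresses, so borrowed
// pointers into the stack stay valid.
struct LargeStack {
    mem: Vec<u8>,
}

struct Snapshot {
    used: Vec<u8>, // sized exactly to the task's used stack space
}

fn deschedule(stack: &LargeStack, used_len: usize) -> Snapshot {
    Snapshot { used: stack.mem[..used_len].to_vec() }
}

fn reschedule(stack: &mut LargeStack, snap: &Snapshot) {
    stack.mem[..snap.used.len()].copy_from_slice(&snap.used);
}
```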

This allows the best possible memory efficiency, and thus the maximum possible 
number of tasks (even better than segmented stacks, due to precise sizing and 
optional compression), while not adding any overhead to code that does not 
block, either in terms of code-generation complexity or runtime cost.

There is of course an additional runtime cost when blocking, but blocking 
already often involves both system calls and context switching, and workloads 
with lots of blocking green tasks are probably I/O bound anyway; furthermore, 
stack data is probably in the CPU cache, so there shouldn't be much penalty.

The only significant drawback is that two green tasks running from the 
same large stack (note that stacks cannot be easily switched, as that 
would invalidate borrowed pointers on the stack to the stack) cannot 
ever run concurrently; however, by making the number of large stacks a 
large enough multiple of the number of CPU cores, the probability of 
reduced parallelism due to this issue can be made as small as desired.

In general, one would use an adaptive approach, where this kind of copying 
would start to kick in only once the number of green tasks becomes large; when 
not copying, one would just optionally use madvise(MADV_DONTNEED) to trim 
unused whole pages of the stack.

Also, if the data to be copied is large (several OS pages), one may use 
mremap() or similar facilities to move the address space mappings instead of 
the data itself.

Some unsafe code that assumes that tasks can access each other's stacks will 
break (probably only the new Mutex implementation), but that can be fixed by 
putting the data in the task structure instead of the stack.

Overall, this should allow writing something like a parallel network server or 
web crawler in Rust in a natural blocking style, while being fully confident in 
its scalability, which is something that is not really possible at the moment.

Once this is in place, the best practices for Rust programs would become:
1. Use native tasks for tasks whose number is bounded at compile time and small
2. Use green tasks for tasks whose number is unbounded or large

One could even automate this to some extent by spawning by default a native 
task the first C times a given proc() code address is used to start a task 
(where C = number of CPU cores), and green tasks afterwards.

What do you think?


Re: [rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
> I assume this is incompatible with work stealing and task migration
> between threads?

It is compatible, assuming that the number of large stacks is sufficiently 
larger than the number of threads.

Basically, each green task can only run on a specific large stack, but as long 
as you aren't unlucky that all runnable green tasks you'd steal are running on 
a large stack that has already a running green task, then you can steal one.

The probability of that obviously decreases the more large stacks you use; 
large stacks can only consume address space (if you wipe them with 
madvise(MADV_DONTNEED) or similar when unused), so one can reasonably have 256 
8MB large stacks on 32-bit, for instance.

So for instance, if you had an HT 4-core machine with 8 system threads running 
green tasks and 1024 large stacks, the probability that a given runnable green 
task is not executable (because its large stack is in use) is about 7/1024; 
obviously if K tasks are runnable, then the probability of having none 
executable becomes (7/1024)^K (i.e. exponentially lower).

On 64-bit, the probability can be made arbitrarily low, although you consume 
more kernel memory to store the metadata associated with the memory mappings 
for the large stacks.

It would require rearchitecting the work-stealing code to work with a two-level 
data structure, and to first steal a large stack with at least one non-blocked 
task and then run the task, rather than directly stealing the task.


Re: [rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
Interesting: my proposal appears to be indeed a generalization of the greenlet 
approach.

Specifically, while the greenlet proposal seems to use only one large stack per 
native thread, I'm suggesting to use multiple large stacks that can be stolen 
by other threads, which mitigates the issues stemming from position-dependent 
stacks (they are not eliminated, but they have a low probability of happening).

It's also indeed possible to fully eliminate those issues by autoboxing 
everything whose address is taken, but that would have a potentially large 
performance impact on non-blocking code, while merely copying the stack only 
imposes a performance penalty to code that blocks.  
  


Re: [rust-dev] Idea: Memcpyed stacks for green tasks

2014-01-25 Thread Bill Myers
The ratio of native threads to stacks and of stacks to tasks can actually be 
used to characterize all systems discussed.

(stacks/thread, tasks/stack)
(1, 1) = current Rust native tasks
(1, N) = Python greenlets
(N, 1) = current Rust green tasks
(N, M) = proposal in my original mail


Re: [rust-dev] Ephemeral byte arrays for cryptographic keys/plaintexts

2014-01-10 Thread Bill Myers
This can be easily implemented in Rust as a struct doing exactly that.

There's no need to modify the I/O layer, since you'd simply borrow an &mut [u8] 
from the type and pass it, resulting in the I/O layer directly writing into the 
locked, zeroed-on-destruction memory.
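A sketch of such a struct (names hypothetical; a production version would also 
mlock the pages and take stronger measures against the compiler optimizing the 
wipe away):

```rust
// Byte buffer that zeroes itself on destruction; the &mut [u8] borrow is
// what would be handed to the I/O layer to read secrets directly into it.
struct Ephemeral {
    buf: Vec<u8>,
}

impl Ephemeral {
    fn new(len: usize) -> Ephemeral {
        Ephemeral { buf: vec![0; len] }
    }
    fn as_mut_slice(&mut self) -> &mut [u8] {
        &mut self.buf
    }
}

impl Drop for Ephemeral {
    fn drop(&mut self) {
        // Volatile writes as a best-effort defense against dead-store
        // elimination of the wipe.
        for b in self.buf.iter_mut() {
            unsafe { std::ptr::write_volatile(b, 0) }
        }
    }
}
```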

> As for crypto, it seems the plan is to not implement it in Rust, but
> to bind to libraries such as OpenSSL, libgcrypt, Windows CryptoAPI, etc.

I guess patches would be welcome to implement this. 
  


Re: [rust-dev] Ephemeral byte arrays for cryptographic keys/plaintexts

2014-01-10 Thread Bill Myers
At any rate, note that what you are trying to do only provides some mitigation 
and is far from a complete solution, because in practice you can't prevent 
leakage of all confidential data in this way (what about hibernation while the 
key is in memory? what about plaintext decrypted with the key?)

The only effective solution is to encrypt all storage including swap using 
full-disk encryption, as well as all internal network links using IPsec or 
similar, so that it doesn't matter if sensitive data is swapped, accidentally 
written to files or communicated between servers.   
  


[rust-dev] RFC: Generalized monadic notation

2013-12-24 Thread Bill Myers
Some languages support a special do notation that allows expressing monadic 
operations more naturally.

However, there is an even more powerful option, that I'd call in notation (I 
came up with it, but it's obvious, so I'm sure there's some language that has 
something like it).

The idea is that we add an in keyword, and in-types, which are the type in 
T where T is a monadic type.

In-types are physically represented as the monadic type, but semantically 
behave like the type contained in the monad; they are constructed with the 
expression in x.

The out keyword does the opposite, converting an in-type to the wrapped type.

Operations performed on in types actually act on the value inside of the monad, 
as better specified below

Quick examples:
out(in Some(3) + 6) gives Some(9)
out(in Some(3) + in Some(6)) also gives Some(9)
out(in Some(3) + in None) gives None
out(in ~[1, 2, 3] + 10) gives ~[11, 12, 13]
out(in ~[1, 2, 3] + in ~[10, 20, 30]) gives ~[11, 21, 31, 12, 22, 32, 13, 23, 
33]

Internally, when the compiler encounters any expression including an in-type 
(expressions include control flow constructs), it proceeds like this 
(considering its operands from left to right):
1. Optionally, if the expression is correctly typed (e.g. calling a function 
that takes an in-type), it compiles it normally
2. If the expression does not have an in-type value itself, it converts the 
expression into a closure, and passes it to map()
3. If the expression does have an in-type value itself (for example in x + in y 
has an in-type when viewed as a function of in x, due to the presence of in y), 
it converts out(expression) into a closure, and passes it to flat_map()

Non-movable control flow like return or break is forbidden in expression 
involving in-types (they can be implemented with a flag, but that causes other 
equivalence issues).
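What those rules produce, hand-written with today's combinators (map, and_then, 
and flat_map standing in for the monad operations; Vec in place of ~[]):

```rust
// out(in Some(3) + 6)            ~ a single map()
// out(in Some(3) + in Some(6))   ~ flat_map (and_then), then map
// out(in vec![..] + in vec![..]) ~ the quadratic double loop
fn demo() -> (Option<i32>, Option<i32>, Option<i32>, Vec<i32>) {
    let a = Some(3).map(|x| x + 6);
    let b = Some(3).and_then(|x| Some(6).map(|y| x + y));
    let c = Some(3).and_then(|x| None::<i32>.map(|y| x + y));
    let d = vec![1, 2, 3]
        .into_iter()
        .flat_map(|x| vec![10, 20, 30].into_iter().map(move |y| x + y))
        .collect();
    (a, b, c, d)
}
```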

The advantage of this is that special do-notation syntax is not required, and 
one can naturally manipulate the values.

The disadvantage is that it adds a layer of magic, making what the code 
actually does less obvious.

This is particularly true with list-like monads, where simple things like (in 
x) + (in y) actually become a double loop with quadratic run-time; this can be 
optionally avoided by only allowing in notation to be used with monads that 
call the closure once.

Also, side effects will be performed a different amount of times depending on 
the monad values, which may also not be obvious.

Note that there are a lot of details that I omitted for the sake of brevity, 
and would need to be handled.


Re: [rust-dev] Safely writing iterators over idiomatic C structures

2013-12-06 Thread Bill Myers
Maybe the language should be changed to allow Iterator to have a signature like 
this:

pub trait Iterator<A> {
    fn next<'a>(&'a mut self) -> Option<&'a A>;
}

Then you could return the &mut by reborrowing and would be able to advance the 
iterator without issue afterwards.


Re: [rust-dev] Persistent data structures

2013-12-05 Thread Bill Myers
No, the problem you describe does not exist in my implementation because it 
requires an &mut to the smart pointer.

In particular, if the reference count is 1, then there is no other Rc or Arc 
pointing to the same data, and because we have an &mut there is also no other 
borrowed reference to the Rc or Arc we are manipulating.

Hence, if the reference count is 1 and we have an &mut to the Rc or Arc, it's 
safe to return an &mut pointing to the contents and thus mutate its contents in 
place using it.

If the reference count is more than 1, then it uses read-only access to clone 
the contents, giving a new allocation with reference count 1 that can then be 
mutated in place as above.
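This is essentially the rule that later stabilized in std as Rc::make_mut; a 
sketch of the same clone-only-when-shared behavior:

```rust
use std::rc::Rc;

// Mutate through an &mut Rc<T>: in place when the count is 1, after a
// clone otherwise; Rc::make_mut implements exactly the rule above.
fn push_value(rc: &mut Rc<Vec<i32>>, v: i32) {
    Rc::make_mut(rc).push(v);
}
```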



Re: [rust-dev] How to reply properly in digest mode?

2013-12-05 Thread Bill Myers
Never use digest mode.

Instead, use normal mode and if necessary add a filter in your e-mail website 
or application to separate all mailing list messages in a specific folder or 
label.



Re: [rust-dev] Persistent data structures

2013-12-04 Thread Bill Myers



Hello, I already implemented a persistent tree-map called SnapMap: you can find 
the source code at https://github.com/mozilla/rust/pull/9816

I stopped working on it before I made a serious effort to push it into the Rust 
codebase and don't have time to work further on it, so it would be awesome if 
you were interested in continuing that effort.

It's pretty much finished, but pretty much completely untested (however, it was 
derived from the existing treemap, so the number of bugs should not be as high 
as in something written from scratch and untested).

The first thing that needs to be done is to run the existing tests inherited 
from treemap, fix any bug that shows up, and then write more tests to test 
SnapMap specific usage patterns and features.

Then, one should look at the mutable/handle-based iterators in the code and 
decide whether they should be removed, kept as is, or improved, possibly after 
changing the iterator interface to return an object with the lifetime of the 
function call instead of the iterator.

The rest of the code should be fine (except for bugs due to no testing), but 
there might be places where unnecessary copy-on-write is performed.

My code used an improved version of Rc that supports copy-on-write by cloning 
only when the reference count is > 1.




Re: [rust-dev] Persistent data structures

2013-12-04 Thread Bill Myers
> Did COW improve performance?  What's a good way to do performance
> testing of Rust code?

The reason I introduced COW when RC > 1 is that it allows persistent data 
structures to be mutated in place if there aren't extra references, just like 
non-persistent data structures.

Lots of languages with persistent data structures can't do that because they 
use garbage collection (and thus they can't tell whether there are other 
references due to lack of a reference count), but it seems an essential feature 
to have if one is using reference counting in a language like Rust that is 
supposed to be a systems language producing optimal code.

> Should I try to get your patches to Rc and Arc merged?  They look
> generally useful, if folks think they fit the design of Rc.  You also
> have a patch that adds Own<T> to std which is equivalent to ~T but has
> methods that look like Rc's.  You use this, and putting most of your
> treemap.rs inside macro definitions, to get a sort of
> reference-type-polymorphism in your treemap.  Shall I continue with this
> approach and try to get Own<T> merged too?  (Opinions from anybody are
> welcome.)

I did it because I realized that having two different balanced-tree 
implementations would have imposed a significant maintenance burden, and thus 
the macro and Own<T> approach would have allowed avoiding that by replacing the 
current treemap code completely with the new optionally-persistent version.

Also, having both Rc and Arc versions seems quite useful anyway, and currently 
I think macros are the best way to have both, until Rust starts supporting 
higher-kinded types (which would allow passing Rc itself as a parameter), or at 
least recursive types (which would allow expressing something like 
RcTreeNode<T> = Rc<TreeNode<T>> and passing it as a parameter).

I don't have much more input on what the best names for the methods are, 
whether to have Own<T> instead of ~T and so on; I guess someone on the Rust 
core team will have to decide on that, and there's already some discussion on 
that at https://github.com/mozilla/rust/pull/9786



Re: [rust-dev] Removing some autoref magic

2013-11-20 Thread Bill Myers
Have you considered making deref the default instead and requiring moves to use 
an explicit move keyword?

Basically, from this hypothetical syntax to the current one:
- x => &x
- mut x => &mut x
- move x => x

One could even make the  implicit in parameter types in function declarations 
unless the move keyword is used, so we would have (new = old syntax):
- fn(self, ...) = fn(self, ...)
- fn(mut self, ...) = fn(mut self, ...)
- fn(move self, ...) = fn(self, ...)

Regarding owned pointers, one could introduce this syntax:
let *x = func_that_gives_owned_pointer()

which makes x a variable of type T referring to the contents of the owned box, 
and thus derefed to & when passed to a function as above, without having to 
write &*x in the function call.

One could then add a new #x or similar syntax that would get back to the 
owned pointer.

Not sure if all this is a good idea, just throwing it out there. I actually 
think keeping autoref/autoderef is probably better.



[rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers
I see several proposals for the future of Rust tasks, and I think one of the 
best approaches is being overlooked, and that is something similar to async in 
C# (http://msdn.microsoft.com/en-us/library/vstudio/hh191443.aspx).

In C#, the async keyword can be applied to functions and it causes the 
compiler to transform a function that returns T into a function that returns a 
Task<T> (which is C#'s name for a future of type T) representing the 
potentially asynchronous computation.

Blocking is represented by using the await keyword on a future (typically 
returned by a call to another async function), and it causes the compiler to 
perform a Continuation-Passing Style transformation and attach the 
continuation to the future, returning another future representing the composed 
computation.

I/O functions are designed to return futures, so in this system blocking causes 
all calling async functions to return, building up a chain of continuations 
on the heap, which is equivalent to the machine stack in a current-Rust task, 
but which is as small as needed, and is only used for call chains that block.

In Rust, this transformation is much more delicate, because the resulting 
return value futures must have a lifetime that is the smallest among all the 
arguments to the function, if those arguments are needed by the continuation, 
and the argument types must be shareable between parallel forks (e.g. 
std::rc::Rc is not shareable because RC manipulation is non-atomic).

However, it is possible to restrict the system to use non-delimited 
continuations instead of the delimited continuations and futures, which would 
avoid this problem, since futures cannot be used explicitly anymore (at the 
cost of making flexible parallelism impossible).

In this latter case, it would be equivalent to the current task system, except 
for requiring blocking functions to be marked async; the await keyword 
would not be required, since it would effectively become compulsory if there 
are no first-class futures returned for async functions.

Advantages:
- Functions and interfaces that can perform I/O or can block are clearly marked 
by a keyword
- Can have billions of blocked tasks (with hundreds of gigabytes of RAM) since 
the memory used by each blocked task is truly minimized because it's on the heap

Disadvantages:
- Requires a heap allocation for every function call to an async function (to 
hold the continuation data)
- Non-async functions cannot call async functions, so interfaces must be 
explicitly marked as async or not
- Requires to implement the transform in the compiler

Microsoft switched to this paradigm in the latest version of C# and in the 
Windows RT API, and it might be an appropriate choice for Rust too.
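Rust did eventually adopt this model: async fn returns a future, and .await is 
the suspension point. A minimal sketch in today's syntax, with a hand-rolled 
executor that is sufficient because this particular future never actually 
suspends:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// `async fn` compiles to a state machine returning a Future, much like
// C#'s Task<T>; `.await` marks the continuation point.
async fn read_char() -> char {
    'x' // stand-in for real async I/O
}

async fn foo() -> usize {
    let c = read_char().await;
    c as usize
}

// No-op waker: fine here because `foo` is immediately ready.
fn noop_raw_waker() -> RawWaker {
    fn no_op(_: *const ()) {}
    fn clone(_: *const ()) -> RawWaker {
        noop_raw_waker()
    }
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    RawWaker::new(std::ptr::null(), &VTABLE)
}

// Tiny executor: poll the future until it yields a value.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = unsafe { Waker::from_raw(noop_raw_waker()) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(v) = fut.as_mut().poll(&mut cx) {
            return v;
        }
    }
}
```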

  ___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev


Re: [rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers



> This is similar to IO Monad in Haskell. Adding that to previously pure
> computational code is painful, but on the other hand it does emphasize
> that computational code != IO code, and minimizing mixing between the two
> typically leads to better design overall.

Yes, but you can have the compiler automatically transform normal code into the 
equivalent of Haskell's do notation, which is what C# does with async.

Basically code like this:

async fn foo() -> uint {
    let a: int = compute_a();
    let b: int = compute_b();
    let c: char = await read_char();
    let r: uint = compute_r(a, b, c);
    return r;
}

becomes after the compiler performs CPS transform:

// assume that we have trait Continuation<T> {fn continue(~self, v: T);}
// assume that cps_read_char is provided by the runtime and tells a main loop
// to read a char and then schedule a new task running the passed continuation

struct foo_Continuation(~Continuation<uint>, int, int);

fn cps_foo(continuation: ~Continuation<uint>) {
    let a: int = compute_a();
    let b: int = compute_b();
    return cps_read_char(~foo_Continuation(continuation, a, b));
}

impl Continuation<char> for foo_Continuation
{
    fn continue(~self, c: char) {
        let (continuation, a, b) = *self;
        let r = compute_r(a, b, c);
        continuation.continue(r);
    }
}

and then a future-based version can also be generated by the compiler based on 
the CPS transform results:

// assume that Continuation<T> is implemented on TaskFuture<T> and causes the
// TaskFuture to be completed with the value

fn future_foo() -> ~TaskFuture<uint> {
    let f = ~TaskFuture::new();
    cps_foo(f);
    return f;
}

Note that if foo() had taken a borrowed pointer with lifetime 'a, then the CPS 
version becomes unsafe, and TaskFuture would also need an 'a parameter.

Also, one can use borrowed pointers instead of owned pointers for 
continuations and futures, but then they need to be waited on before their 
lifetime region ends.
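For readers on current compilers, the sketch above can be approximated in modern Rust, with Box<dyn FnOnce> standing in for ~Continuation<T>. The compute_* functions and the synchronous cps_read_char stub are placeholders invented for illustration, not part of any real runtime:

```rust
// Placeholders standing in for the letter's compute_a/compute_b/compute_r.
fn compute_a() -> i64 { 1 }
fn compute_b() -> i64 { 2 }
fn compute_r(a: i64, b: i64, c: char) -> u64 { (a + b) as u64 + c as u64 }

// Stub for the runtime primitive: a real one would register the
// continuation with an event loop; here it supplies a char synchronously.
fn cps_read_char(k: Box<dyn FnOnce(char)>) {
    k('x');
}

// What the compiler would generate from the async fn: run the code before
// the blocking point, then package the live locals (a, b) into the
// continuation handed to cps_read_char.
fn cps_foo(k: Box<dyn FnOnce(u64)>) {
    let a = compute_a();
    let b = compute_b();
    cps_read_char(Box::new(move |c| {
        let r = compute_r(a, b, c);
        k(r);
    }));
}

fn demo() -> u64 {
    let result = std::rc::Rc::new(std::cell::Cell::new(0u64));
    let out = std::rc::Rc::clone(&result);
    cps_foo(Box::new(move |r| out.set(r)));
    result.get()
}

fn main() {
    assert_eq!(demo(), 123); // 1 + 2 + ('x' as u64 == 120)
    println!("ok");
}
```

Note how the heap-allocated closure plays exactly the role of the foo_Continuation struct: it captures only the locals that are live across the blocking point.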




Re: [rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers
> The issue with async/await is that while it maps very well to the AIO
> primitives like IOCP and POSIX AIO, it doesn't map well to something
> that's solid on Linux. It's just not how I/O is done on the platform.
> It uses *non-blocking* I/O to scale up socket servers, with
> notification of ready state rather than completion state. This doesn't
> work for file system access though.

Well, I/O would work exactly the same as the current libuv-based Rust tasks 
approach, except that one needs a stack for each CPU thread and not for each 
task, because task creation functions return when a task wants to block, and 
thus there is no stack to save and restore.

On Linux, the epoll interface allows implementing this, and I think it's what 
libuv uses.

The only problem is that Linux doesn't really support asynchronously resolving 
file paths to inodes (aka opening files), but that can be done on a dedicated 
thread pool, with the advantage that those threads don't do anything 
stack-intensive, so they don't have zombie stacks.

The problem with blocking on all threads/the 1:1 thread model is that if you do 
a computation that requires 8MB of stack, and then block with a shallow call 
stack, the 8MB of stack are not freed, so you waste 8MB per TCP connection in a 
TCP server example, which means that on a system with 32GB RAM you can only 
service 4096 TCP connections without swapping to disk instead of millions.

A dedicated thread pool for blocking I/O doesn't have this issue 
because it can run with only 4KB stack or so since it doesn't do 
anything stack intensive, and the number of its threads can be limited 
without user-visible results.
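As a rough sketch of such a dedicated pool in modern Rust (the 64 KiB stack size, the worker count, and the pool_sum helper are all illustrative choices, not a real runtime API):

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Run `jobs` simulated blocking operations on a 4-thread pool with
// deliberately small (64 KiB) stacks, and return the sum of their results.
fn pool_sum(jobs: i32) -> i32 {
    let (tx, rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();
    let rx = Arc::new(Mutex::new(rx));
    let mut workers = Vec::new();
    for _ in 0..4 {
        let rx = Arc::clone(&rx);
        workers.push(
            thread::Builder::new()
                .stack_size(64 * 1024) // workers never run deep computations
                .spawn(move || loop {
                    // take the lock only long enough to receive one job
                    let job = { rx.lock().unwrap().recv() };
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: shut down
                    }
                })
                .unwrap(),
        );
    }
    let (done_tx, done_rx) = mpsc::channel();
    for i in 0..jobs {
        let done_tx = done_tx.clone();
        // stand-in for a blocking call such as open(2)
        tx.send(Box::new(move || done_tx.send(i).unwrap())).unwrap();
    }
    drop(tx);
    drop(done_tx);
    let sum = done_rx.iter().sum();
    for w in workers {
        w.join().unwrap();
    }
    sum
}

fn main() {
    assert_eq!(pool_sum(8), 28); // 0 + 1 + ... + 7
    println!("ok");
}
```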



Re: [rust-dev] C# async for Rust

2013-11-13 Thread Bill Myers
Although, on second thought, one could just free the unused part of the user 
mode stack whenever a thread blocks, either in the user mode code (i.e. using 
madvise MADV_DONTNEED or equivalent to discard everything below the stack 
pointer modulo the page size, perhaps minus the page size) or automatically in 
a modified kernel, and thus greatly reduce the worst case.

It's still going to have higher overhead than the CPS continuations on the 
heap, because the stack will tend to have holes where dead variables live, for 
aligning to page boundaries, and you also keep around the kernel stack and 
kernel scheduler objects.

And it doesn't work on 32-bit, because you cannot have more than around 2048 
tasks with megabyte-sized task stacks (which is way too low for several 
usages), and unfortunately there are lots of 32-bit-only smartphones and 
tablets that should probably be supported.



Re: [rust-dev] On Stack Safety

2013-10-21 Thread Bill Myers

> It seems to me that trying to determine max stack size is incompatible with
> dynamic linking. So even if you disallow recursion, any function that calls
> a function outside of its own crate is not going to be able to trust its
> calculated max stack size.

The maximum stack size needs to be computed dynamically, in a C++-style global 
constructor that rustc would place in every compiled crate library/executable 
and that would write the computed value into suitable global variables.

The check only needs to be done by task creation routines, by function calls 
contained in a recursion cycle, by closures and by stubs put in ~Trait/@Trait 
vtables (where Trait is public and thus implementable by other crates).

Note that the latter two cannot really be avoided by taking the max over all 
implementations dynamically because a new dynamic library could be loaded 
between the check and the indirect call.



Re: [rust-dev] Should I/O use conditions?

2013-10-16 Thread Bill Myers
What about the idea of making Result cause task failure if it is destroyed in 
the error case? (as opposed to destructuring it)

This just needs a simple language change to add an attribute that would be 
applied on Result to declare that it's OK to destructure it and cause drop() to 
not be called.

In addition, maybe some EH syntax sugar like the one I proposed in 
http://www.mail-archive.com/rust-dev@mozilla.org/msg04093.html
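In today's Rust this idea can be approximated in library code: a wrapper whose destructor panics (the modern analogue of task failure) when an unhandled Err is dropped, and which is disarmed by destructuring. The Checked name and API are invented for illustration:

```rust
use std::mem::ManuallyDrop;

struct Checked<T, E>(ManuallyDrop<Result<T, E>>);

impl<T, E> Checked<T, E> {
    fn new(r: Result<T, E>) -> Self {
        Checked(ManuallyDrop::new(r))
    }
    // Taking the Result out ("destructuring" it) disarms the destructor.
    fn into_inner(mut self) -> Result<T, E> {
        let r = unsafe { ManuallyDrop::take(&mut self.0) };
        std::mem::forget(self);
        r
    }
}

impl<T, E> Drop for Checked<T, E> {
    fn drop(&mut self) {
        if self.0.is_err() {
            panic!("unhandled error result dropped");
        }
    }
}

fn main() {
    // explicitly handled: no panic
    assert!(Checked::<i32, &str>::new(Err("boom")).into_inner().is_err());

    // dropped while still holding an Err: panics (task-failure analogue)
    let panicked = std::panic::catch_unwind(|| {
        let _c = Checked::<i32, &str>::new(Err("boom"));
    })
    .is_err();
    assert!(panicked);
    println!("ok");
}
```

The built-in #[must_use] attribute on today's Result gives a compile-time warning instead; the sketch above shows the stronger run-time behavior the message proposes.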



Re: [rust-dev] Fate of const

2013-09-19 Thread Bill Myers
> Allowing one closure to take &mut while another takes &const would
> create a data race if the two closures are executed in parallel.

Closures executable in parallel would probably have kind bounds forbidding 
&const: 
http://smallcultfollowing.com/babysteps/blog/2013/06/11/data-parallelism-in-rust/
 explicitly mentions this.

BTW, how about keeping it, and calling it volatile instead of const, 
since that's what C uses to name something that can be changed outside the 
program's control?



Re: [rust-dev] Structural enums for datasort refinements

2013-08-28 Thread Bill Myers
The issues in O'Caml seem to be due to the fact that in O'Caml function 
parameter and return types are inferred, and thus accidentally oversized enum 
types can propagate through them.

In Rust, they must specified by the user, so those oversized enum types will 
cause an error as they are passed as a parameter or return value with an 
user-specified type that is smaller than the inferred type.



[rust-dev] RE: Structural enums for datasort refinements

2013-08-28 Thread Bill Myers
I was talking about 
http://smallcultfollowing.com/babysteps/blog/2012/08/24/datasort-refinements/, 
which essentially introduces structural enums but only for variants belonging 
to the same named enum.


> * This strikes me as an extreme change to the language, but
> perhaps my gut is overly conservative.

Well, to some extent: it does indeed cause a fundamental change in type 
inference, since now any finite set of types can be unified as a structural 
enum type.

That said, I think this needs to be done anyway in something like Matsakis' 
proposal, since otherwise you can't pass an x declared as let mut x = A; 
if(foo) {x = B;} to something taking a Enum[A, B] if Enum also has a C case, 
so it only really changes how frequently the expanded type inference would be 
used.

BTW, it should be allowed to call impl methods and trait methods, and to access 
fields, that are common to A, B and C on A | B | C, effectively adding a new 
closed polymorphism system: code can switch a type from Trait to A | B | C 
when the only possible Trait implementations in that code are A, B and C, with 
the rest of the code unchanged.

> You have not addressed in your proposal how you would change the
> match syntax to deal with non-struct variants, such as ~str or int.

Scala uses "x: int" to match an int (or a type pattern in general) and assign 
it to x, and "x @ pattern" to assign x to something matched by a pattern 
(since Scala distinguishes between value patterns and type patterns).

In Rust, distinguishing between value and type patterns might not be 
necessary, so one can probably use the same character for both cases.

> Do the tags on the variants need to
> encode the whole type, and not just which struct it is?

They of course need to identify the whole type, and they will probably change 
numerically depending on the generic instantiation, since some variants can 
unify under particular instantiations.

> And what about the poor user who didn't think about the fact that they
> might alias each other, and thought that all the clauses in the code
> for A | B | S<Y> were disjoint, but in fact they potentially overlap
> due to the potential aliasing, and thus the order of the cases in the
> above is now crucial, yes?

Well, the poor user's thought process turns out to be incorrect, since any enum 
including a naked type parameter is obviously not disjoint in all cases.


> -- Another example: Can/should I now just throw seemingly
> unrelated structs into a match, in order to anticipate that the
> parameters will be instantiated with that struct? Consider the
> following:

Yes: the compiler can potentially emit an error or warning if the match will 
never match under any generic instantiation.

  
This also allows a form of internal partial specialization of functions, 
where a function can have specialized behavior for some subsets of generic 
instantiations.



[rust-dev] Structural enums for datasort refinements

2013-08-27 Thread Bill Myers
I was reading a proposal about adding datasort refinements to make enum 
variants first-class types, and it seems to me there is a simpler and more 
effective way of solving the problem.

The idea is that if A, B and C are types, then A | B | C is a structural 
enum type that can be either A, B or C.

In addition, A can be implicitly converted to A | B, A | B can be 
implicitly converted to A | B | C, and also (A | B) | C and A | (B | C) 
are equivalent to A | B | C, and finally C | B | A is equivalent to A | B 
| C (to support the latter, the implementation needs to sort variants in some 
arbitrary total order before assigning tag numbers).

Furthermore, a way to bind variables to an or pattern is introduced to allow 
to convert A | B | C to A | B in the case that it holds an A or a B.

This way, one can rewrite Option as a type alias like this:
struct Some<T>(T);
struct None;

type Option<T> = None | Some<T>;

Which is like the current Option, but also makes None and Some<T> first-class 
types.

The current enum syntax can remain as syntax sugar for the above code.

The only issue I see is what to do for code such as let mut x = Some(3); x = 
None;: with this proposal, Some and None are separate unrelated types, so we 
either have this code emit an error, or x must be given the type Some<int> | 
None automatically, which however can lead to obscure error messages if one 
mistakenly attempts to assign a string to it, causing the type to become 
Some<int> | None | ~str (i.e. the user might be told that a match is not 
exhaustive because it does not handle the ~str case, rather than that they 
assigned a ~str to an Option-typed variable).

It should be possible to allow this, and make the error-emitting code use 
heuristics to figure out whether it is more likely that the user assigned a 
value of the wrong type, or used an enum improperly (for example, by looking at 
whether the implicitly created enum type is ever written explicitly in the 
source, and whether the deduced structural enum type is being used in places 
that require a non-enum type).

Alternatively, one can stipulate that only types that are structs, or that are 
structs marked enum struct or case struct or similar can become part of an 
inferred structural enum, but this seems unappealing.

Note that some structural enums can change representation depending on generic 
instantiation, since T | int becomes just int if T = int, while it is ~str 
| int if T = ~str (and similar for Some<T> | Some<int>), but this should not 
be a problem.
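Current Rust has no structural A | B types, but the shape of the proposal can be approximated with an explicit enum plus From impls standing in for the implicit A -> A | B conversions; all names below are invented for illustration:

```rust
// First-class variant types, as in the proposal.
struct MySome<T>(T);
struct MyNone;

// The explicit sum standing in for the structural `MyNone | MySome<T>`.
enum MyOption<T> {
    Some(MySome<T>),
    None(MyNone),
}

// From impls play the role of the implicit A -> A | B conversion.
impl<T> From<MySome<T>> for MyOption<T> {
    fn from(s: MySome<T>) -> Self { MyOption::Some(s) }
}
impl<T> From<MyNone> for MyOption<T> {
    fn from(n: MyNone) -> Self { MyOption::None(n) }
}

fn describe(o: MyOption<i32>) -> String {
    match o {
        MyOption::Some(MySome(v)) => format!("some({})", v),
        MyOption::None(MyNone) => "none".to_string(),
    }
}

fn main() {
    // MySome and MyNone are usable as standalone types that convert
    // into the sum at call sites.
    assert_eq!(describe(MySome(3).into()), "some(3)");
    assert_eq!(describe(MyNone.into()), "none");
    println!("ok");
}
```

What the emulation cannot express is precisely the interesting part of the proposal: the associativity/commutativity equivalences and the automatic widening of A | B to A | B | C.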



Re: [rust-dev] cycle time, compile/test performance

2013-08-23 Thread Bill Myers
> > 2. Distribute compilations and tests across a cluster of machines (like
> > distcc)
>
> Compilation is 99% serial (the only things that happen in parallel
> are rustpkg and rustdoc etc at the end, and they are almost nothing),
> though tests could be distributed (and Graydon is working on doing
> that afaik).

Are you sure?

I'm not an expert on rustc implementation, but I'd expect you could have a 
sequence of parallel phases and collection points: specifically I'd guess you 
could parse all files in parallel, and once you have the AST for all files, do 
checking and typing in parallel, and then do codegen in parallel.

For comparison, C/C++ can do everything except linking in parallel, except for 
parsing header files if not using precompiled headers (which is still parallel 
but repeated for each file); linking can be parallelized to some extent on a 
single machine with gold.



Re: [rust-dev] cycle time, compile/test performance

2013-08-21 Thread Bill Myers
Have you considered the following non-specific quick fixes?

1. Build on a ramfs/ramdisk

2. Distribute compilations and tests across a cluster of machines (like distcc)

3. If non-parallelizable code is still the bottleneck, use the fastest CPU 
possible (i.e. an overclocked Core i7 4770K, an overclocked Core i7 
4960X/3960X/4930K/3930K, dual Xeon E5 2687W or quad Xeon E5 4650, depending on 
whether you need 4, 6, 16 or 32 cores)

4. Read metadata only once, in one of these ways:
4a. Pass all files to a single compiler invocation (per machine or core)
4b. Have a long-lived rustc daemon (per machine or core) that keeps crate 
metadata in memory and gets passed files to compile by fd
4c. Use CRIU suspend/restore (or the unexec from Emacs or whatever) to suspend 
a rustc process after metadata is read and restore that image for each file 
instead of spawning a new one
4d. Allocate metadata using a special allocator that allocates it from a block 
at a fixed memory address, then just dump the block into a file, and read 
metadata with a single mmap system call at that same fixed address (this is a 
security hole in general, so it needs to be optional and off by default)



Re: [rust-dev] Segmented stacks (was: IsRustSlimYet (IsRustFastYet v2))

2013-07-05 Thread Bill Myers
I believe that instead of segmented stacks, the runtime should determine a 
tight upper bound on stack space for a task's function, and only allocate a 
fixed stack of that size, falling back to a large C-sized stack if a bound 
cannot be determined.

Such a bound can always be computed if there is no recursion, dynamic dispatch, 
dynamic allocation on the stack, or foreign C functions.

And in fact, it can probably be computed even for foreign C functions, as long 
as DWARF exception tables are available, or some constraints on machine code 
are assumed (i.e. that negative offsets are not applied to pointers to the 
stack stored in memory).

This would allow for instance to transform many simple internal iterators into 
external ones automatically via coroutines, without large memory overhead.
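The bound computation itself is straightforward when the call graph is acyclic: each function needs its own frame plus the maximum over its callees. A toy sketch with invented frame sizes:

```rust
use std::collections::HashMap;

// Worst-case stack need of `f`: its own frame plus the deepest callee,
// computed by memoized DFS over an acyclic call graph. Recursion or
// indirect calls would make the graph cyclic, matching the fallback
// to a large C-sized stack described above.
fn max_stack(
    f: &str,
    frames: &HashMap<&str, u64>,
    calls: &HashMap<&str, Vec<&str>>,
    memo: &mut HashMap<String, u64>,
) -> u64 {
    if let Some(&v) = memo.get(f) {
        return v;
    }
    let deepest = calls
        .get(f)
        .map(|cs| {
            cs.iter()
                .map(|&c| max_stack(c, frames, calls, memo))
                .max()
                .unwrap_or(0)
        })
        .unwrap_or(0);
    let total = frames[f] + deepest;
    memo.insert(f.to_string(), total);
    total
}

fn demo() -> u64 {
    // invented per-function frame sizes, in bytes
    let frames = HashMap::from([("main", 64), ("parse", 128), ("emit", 256), ("util", 32)]);
    let calls = HashMap::from([
        ("main", vec!["parse", "emit"]),
        ("parse", vec!["util"]),
        ("emit", vec!["util"]),
    ]);
    max_stack("main", &frames, &calls, &mut HashMap::new())
}

fn main() {
    // deepest chain is main -> emit -> util: 64 + 256 + 32
    assert_eq!(demo(), 352);
    println!("ok");
}
```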


Re: [rust-dev] On tunable Garbage Collection

2013-06-18 Thread Bill Myers
For Rc, it should be enough to have the GC traverse raw pointers that have 
(or don't have) a special attribute (probably traversing by default is the 
best choice), as long as the raw pointers have the correct type.

Obviously it only makes sense to traverse types if it is possible for them to 
point to GC pointers.

For data held exclusively by external C libraries (assuming that there is 
any), an unsafe GcVisitor trait that can be implemented to do the visit 
should work, along with a NonGc kind for code that doesn't want to implement 
GcVisitor (otherwise, mark-and-sweep GC would kill live objects).



Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-07 Thread Bill Myers

> This would be a big step away from the advantages of Rust's current
> trait system. Right now, if the definition of a generic function type
> checks, it's valid for all possible types implementing the trait
> bounds. There are no hidden or implicit requirements.

Yes, but since Rust, like C++, instantiates generic functions at specialization 
time, this shouldn't be an issue. It just means that, like with C++ templates, 
compilation errors concerning the template code can happen at instantiation 
time.

In general, this is unavoidable with a mutable reference counting scheme, 
because if you have something like this:
struct A<T>
{
    v: T
}

impl<T> A<T>
{
    fn set_v(&mut self, v: T)
    {
        self.v = v;
    }
}

Then you can allow calling set_v when T is u32, but not when T contains an rc 
pointer back to A<T>, because that could create a cycle for some values of v.

The alternative is to just disallow programs that have cycles of rc pointers 
where at least one of them is mutable, which is less powerful.
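The alternative of ruling out strong cycles is essentially what modern Rust's Rc/Weak split does; a small runnable illustration (modern syntax, RefCell for the mutable back-pointer):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Node {
    parent: RefCell<Weak<Node>>,      // weak back-pointer: breaks the cycle
    child: RefCell<Option<Rc<Node>>>, // strong forward pointer
}

fn demo() -> (usize, usize) {
    let parent = Rc::new(Node {
        parent: RefCell::new(Weak::new()),
        child: RefCell::new(None),
    });
    let child = Rc::new(Node {
        parent: RefCell::new(Rc::downgrade(&parent)),
        child: RefCell::new(None),
    });
    *parent.child.borrow_mut() = Some(Rc::clone(&child));

    // the weak pointer still reaches the parent while it is alive
    assert!(child.parent.borrow().upgrade().is_some());
    (Rc::strong_count(&parent), Rc::strong_count(&child))
}

fn main() {
    // parent: only the local handle; child: local handle + parent.child.
    // No strong cycle exists, so dropping both handles frees everything;
    // had `parent` been an Rc<Node> instead of Weak, the pair would leak.
    assert_eq!(demo(), (1, 2));
    println!("ok");
}
```

This is the run-time discipline that the static analysis proposed in this thread would enforce at compile time.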



Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-07 Thread Bill Myers
> Every
> single reviewer I showed a no cycles variant of rust to told me
> it was unacceptable and they would walk away from a language that
> prohibited cycles in all cases. All of them. We tried limiting it
> within subsets of the type system rather than pervasively: it still
> irritated and confused everyone.

Interesting, are these versions online anywhere?

> - It doesn't do anything about fragmentation. Our longer-term plan for
> the gc involves moving to mostly-copying[1], which compacts pages
> other than the conservatively-scanned set. And is incremental.

True; in fact, fragmentation might be as bad as or worse than the dead objects 
living on in GC.

> > However, it cannot support cycles (without destroying its advantages
> > over GC), so it is a good idea to statically prevent them to avoid leaks.
>
> We tried this; various strategies for it. Ownership cycles even just
> within subtrees of well-organized ownership trees are not a rare
> occurrence. They are a result both of normal patterns within common data
> structures, and normal levels of coupling between subsystems in large
> applications. Especially web browsers.

How about, however, something like (hypothetically) rewriting the Linux kernel 
in Rust? (with limited amounts of unsafe code)

The issue there is that a kernel needs to provide syscalls to threads running 
concurrently on multiple CPUs, which manipulate shared data structures like 
the filesystem.

Having the filesystem structures be local to a single task would destroy 
performance on things like filesystem-bound loads on 32-core servers, while 
adding global GC would make it impossible to provide soft and hard real time 
guarantees needed for embedded systems.

So using a MutexARC/RWARC global reference counting scheme with fine grained 
locks like the Linux kernel in fact does (along with RCU, but let's ignore that 
for now) seems the only option, but it's currently not safely possible in the 
current Rust, due to the inability to nest those structures safely with 
checking.

Now it's not totally clear whether full checking of reference cycles is indeed 
viable, but it seems the only way to achieve that, and it would be bothersome 
if one could not write a fast, safe, real-time POSIX-implementing kernel in 
what is pretty much the only mainstream language that can hope to do so.

Even excluding kernels, it seems network servers running on multi-socket 
machines and needing to concurrently mutate complex object graphs would have 
similar issues.

Although for those real-time might not be so essential, so adding optional 
concurrent global GC (like Java has) would be an alternative (but is also quite 
a bit of work).



Re: [rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-07 Thread Bill Myers
> Restructuring your code to avoid cycles is problematic when you're
> implementing a platform where the spec allows users to create ownership
> cycles --- like, say, the Web platform. So if Rust didn't support cyclic
> ownership, Servo would have to implement its own GC and tracing code just
> like Gecko does. That's a situation we need to get away from.

Yes, my idea was essentially to extend the compiler with facilities allowing 
RcMut<T> and RWARC<T> to be implemented in such a way that they can be nested 
without creating memory leaks (subject to some restrictions, but fewer 
restrictions than the blanket inability to nest), with the goal of replacing 
C++ (or Java, perhaps) in high-performance multithreaded servers and kernels.

That would be orthogonal to the issue of what to do with @-pointers and whether 
to have garbage collection (which is indeed very desirable since some programs 
need it).


> An interesting meta-point is that when you're implementing a platform,
> rather than a specific application, you have much less freedom to
> structure the implementation because you don't know what any given
> instance of your system actually does. This affects everything from the
> requirements on the programming language to the design of graphics APIs.

Yes, although note that while task-local GC is sufficient to implement a 
browser with JavaScript (since JavaScript is single-threaded), it would not be 
enough to implement a high-performance Java VM, which requires GC over memory 
simultaneously accessible by multiple threads.

In general, it would be nice to have great support for all 4 models: 
sophisticated local and global reference counting, and local and global GC (as 
well as owned and borrowed pointers).

It's probably reasonable though for Mozilla to mostly focus on the browser use 
case.



Re: [rust-dev] Adding exception handling as syntax sugar with declared exceptions

2013-06-06 Thread Bill Myers


 Date: Thu, 16 May 2013 10:58:28 -0700
 From: gray...@mozilla.com
 To: bill_my...@outlook.com
 CC: rust-dev@mozilla.org
 Subject: Re: [rust-dev] Adding exception handling as syntax sugar with 
 declared exceptions
 
 On 12/05/2013 8:00 PM, Bill Myers wrote:
  This is a suggestion for adding an exception system to Rust that
  satisfies these requirements:
  1. Unwinding does NOT pass through code that is not either in a function
  that declares throwing exceptions or in a try block (instead, that
  triggers task failure)
  2. Callers can ignore the fact that a function throws an exception,
  resulting in task failure
  3. Callers can handle exceptions if desired, with the same syntax of
  C++, Java and C#
 
  (1) avoids the issue with C++ exception handling forcing to think about
  exception safety everywhere
  (2) avoids the issue with Java's checked exceptions forcing to
  explicitly handle or redeclare them
  (3) provides an easy-to-use exception mechanism
 
 Hi, sorry, I did mean to get back to this.
 
 I think this is ... unlikely to work out, or be something we incorporate 
 into our repo. For a few reasons:
 
- It's very late in the design cycle to be making changes of
  this scope. We're trying to stabilize.
 
- It'll be a big performance hit on code paths that want to use
  such exceptions. Things that look like just function calls
  turn into quite extensive operations. We're already struggling
  with the costs of a return-bool solution for unwinding on
  platforms that have difficult normal EH (#4342).
 
- Most seriously: it doesn't actually resolve (1) or (2), imo.
 
  (1) You'd still have to declare checked exceptions everywhere. But
  worse than in java, if you failed to declare them _anywhere_
  you wouldn't get a compilation error, but a runtime failure.
  This is like making C++ default to 'throw ()', which tends
  to surprise everyone.

Well, the idea is that exceptions would generally either be caught in the 
immediate caller or escalate to task failure, with rare cases of propagation.

The target use case is things like str_to_int functions that would throw 
format exceptions, caught by the immediate caller in case it wants to handle 
them.

  (2) You still have to write in worry about exceptions everywhere
  style inside a try-block or function-that-rethrows. Only
  get to avoid it when you're insulating yourself by the implicit
  throw () declaration.

Yes, but you can't always avoid this anyway, since in some cases operations are 
truly irreversible.

For instance, if you want to make three HTTP POST requests and the second 
fails, then the first will have gone through and cannot be reverted 
automatically in any way, so the logic to handle "we did something and then 
failed, leaving us in a half-done state" is fundamentally necessary in some 
cases regardless of language design.

A limited version of this might be implementable by macros that could 
automatically generate a try_ version of a function with Result return type 
along with a version that fails.
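A rough sketch of that macro idea in modern Rust, with an invented throwing! macro that emits both variants from one Result-returning body:

```rust
// From one fallible body, generate a `try_`-style Result-returning
// function and a panicking (task-failure analogue) wrapper.
// The macro name and `name / try_name` syntax are invented here.
macro_rules! throwing {
    (fn $name:ident / $try_name:ident ($($arg:ident : $ty:ty),*) -> $ok:ty, $err:ty { $($body:tt)* }) => {
        fn $try_name($($arg: $ty),*) -> Result<$ok, $err> { $($body)* }
        fn $name($($arg: $ty),*) -> $ok {
            $try_name($($arg),*).unwrap_or_else(|e| panic!("{:?}", e))
        }
    };
}

throwing! {
    fn str_to_int / try_str_to_int(s: &str) -> i64, std::num::ParseIntError {
        s.parse::<i64>()
    }
}

fn main() {
    assert_eq!(str_to_int("42"), 42);         // failing version
    assert!(try_str_to_int("nope").is_err()); // Result version
    println!("ok");
}
```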


The only other approach that can improve on this is to have support for 
reversible execution.

Basically, one would compile everything in two versions, where one of the 
versions keeps an undo log of memory writes. When a try block is entered, the 
compiler switches to the undo-log-enabled code, and when an exception is 
thrown, the log is played to undo everything. Hardware transactional memory 
like Haswell TSX can also be tried first if available.

Of course, this wouldn't be able to handle external function calls and 
modifications of shared memory via unsafe pointers: this could be handled by 
having a special unsafe construct that passes a closure to unwind the changes.

Calling irreversible functions like I/O would need to cause a transaction 
abort/throw an exception. Also, some things like RWARC modifications would be 
impossible to handle without a full STM implementation (with MVCC and so on).

The biggest drawback of this is that it will double code size and compilation 
time.

I'm not sure if it's worthwhile, since it cannot work in general due to I/O, 
and simple cases can be handled with the proposed syntax sugar, or by doing 
the same by hand with Result return types and so on.




Re: [rust-dev] The future of iterators in Rust

2013-06-06 Thread Bill Myers
Scala has a similar design, with the following traits:
- TraversableOnce: can be internally iterated once (has a foreach() method that 
takes a closure)
- Traversable: can be internally iterated unlimited times (has a foreach() 
method that takes a closure)
- Iterable: can be externally iterated (has an iterator() method that returns 
an Iterator trait)

The way it works is that Iterable extends Traversable, which extends 
TraversableOnce, and the for loop just uses TraversableOnce, and Iterable has a 
default implementation of the TraversableOnce foreach() function using the 
iterator() function.

Also, the Iterator trait itself extends TraversableOnce, implementing foreach() 
by mutating itself.

It might be a good idea to investigate copying this design.
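Rust's eventual iterator design ended up close to this: the Iterator trait provides external iteration (and for-loop support via IntoIterator), while internal iteration like Scala's foreach falls out of adapter methods such as for_each. A minimal modern-Rust example:

```rust
// A counting-down iterator: 3, 2, 1.
struct Countdown(u32);

impl Iterator for Countdown {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.0 == 0 {
            None
        } else {
            self.0 -= 1;
            Some(self.0 + 1)
        }
    }
}

fn main() {
    // external iteration: the for loop drives next()
    let mut collected = Vec::new();
    for x in Countdown(3) {
        collected.push(x);
    }
    assert_eq!(collected, vec![3, 2, 1]);

    // internal iteration, like Scala's foreach on TraversableOnce;
    // like a Scala Iterator, this consumes the iterator by mutating it
    let mut sum = 0;
    Countdown(3).for_each(|x| sum += x);
    assert_eq!(sum, 6);
    println!("ok");
}
```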



[rust-dev] Statically preventing reference-counted cycles while allowing nested rc pointers

2013-06-06 Thread Bill Myers
Reference counting is generally more desirable than garbage collection, since 
it is simple and deterministic, and it avoids scanning the whole heap of the 
program, which causes pauses, destroys caches, prevents effective swapping, 
and requires tolerating memory usage inflated by a multiplicative factor so 
that GC cost can be amortized to linear in the number of dead objects in the 
general case.

However, it cannot support cycles (without destroying its advantages over GC), 
so it is a good idea to statically prevent them to avoid leaks.

Currently Rust has several types of reference-counted pointers, with different 
strategies to do that:
- The @ and @mut pointers, which seem to allow cycles and cause them to leak 
until task termination, although they are sort of supposed to be backed by a 
GC eventually or be removed.
- The MutexARC type, which allows cycles but requires unsafe code
- The Rc and ARC types, which avoid cycles by not allowing mutation
- The RcMut and RWARC types, which avoid cycles by not allowing rc pointers in 
the data

The issue is that none of this is optimal, since for instance it is not 
possible to have an RcMut<A> where A contains an RcMut<B> but B contains no 
pointers, even though it is perfectly safe to do so, and in general reference 
counted pointers cannot be created for some types like the A in this example.

However, there seems to be a good solution: the following rule extends both the 
no mutations and no rc pointers cases while allowing the above, and allows 
creating rc pointers to any type at the expense of forbidding some mutations:
*** If there is a cycle of pointers (of any kind, including raw, owned and 
borrowed, excluding raw pointers annotated as weak references) or enums 
containing at least one reference-counted pointer, forbid mutating all those 
pointers (even with an &mut) unless the object being assigned is the result of 
an expression that doesn't include pointers to previously-created objects (and 
thus cannot point to the pointer), or the value containing the pointer is on 
the stack or owned by a stack slot (and thus is not in an rc box or owned by 
it, and so cannot be pointed to by the assigned value) ***

Here a cycle means a cyclic sequence of structure fields such that each can 
point to an instance of the structure of the next field in the sequence.

So for instance this case would be allowed:
struct A { b: Rc<B> }
struct B { a: Option<Rc<A>> }

but the A.b and B.a fields would be immutable even through &mut A or &mut B 
pointers, and would only be mutable while the A or B structure is on the stack 
or owned from the stack (and thus has no incoming rc pointers).

On the other hand, this would be allowed with no restrictions, since the 
points-to relationship of the types is acyclic:
struct A { b: RcMut<B> }
struct B { c: Rc<C> }
struct C { x: u32 }
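The acyclic case can be written in today's Rust with Rc plus RefCell standing 
in for RcMut (a sketch; the replace_c helper is a hypothetical name, not part 
of any API). Since A can reach B and B can reach C but nothing points back, no 
mutation through the RefCell can ever close a cycle:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct C { x: u32 }
struct B { c: Rc<C> }
struct A { b: Rc<RefCell<B>> }

// The points-to graph A -> B -> C is acyclic, so mutating B.c through
// the RefCell can never create a reference cycle.
fn replace_c(a: &A, x: u32) {
    a.b.borrow_mut().c = Rc::new(C { x });
}

fn main() {
    let a = A { b: Rc::new(RefCell::new(B { c: Rc::new(C { x: 1 }) })) };
    replace_c(&a, 2);
    assert_eq!(a.b.borrow().c.x, 2);
}
```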

To implement this, one would need to extend the language with an annotation to 
tell the compiler that the raw pointers inside of Rc/RcMut/ARC/RWARC/etc. are 
to be treated as reference-counted pointers.

The compiler can then perform the required static analysis: do an SCC 
decomposition of a graph whose nodes are types, enum members and pointer 
fields, with edges from types to their contained types, enum members and 
pointer fields, from pointer fields to the pointed-to type, from traits to all 
possible implementing types, and from enums to their members; then forbid 
modification of the pointers in any SCC containing at least one 
reference-counted pointer.
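A toy sketch of that analysis, under the assumption that the type graph has 
already been flattened to numbered nodes and pointer edges (the sccs function 
and the example graph are illustrations, not compiler internals). An rc edge 
whose two endpoints fall in the same SCC must be frozen against mutation; an 
rc edge crossing SCCs is fine:

```rust
// Compute SCC ids with Kosaraju's algorithm: DFS finish order on the
// graph, then flood-fill on the transpose in reverse finish order.
fn sccs(n: usize, adj: &[Vec<usize>]) -> Vec<usize> {
    fn dfs(v: usize, adj: &[Vec<usize>], seen: &mut Vec<bool>, order: &mut Vec<usize>) {
        seen[v] = true;
        for &w in &adj[v] {
            if !seen[w] { dfs(w, adj, seen, order); }
        }
        order.push(v);
    }
    let mut seen = vec![false; n];
    let mut order = Vec::new();
    for v in 0..n {
        if !seen[v] { dfs(v, adj, &mut seen, &mut order); }
    }
    let mut radj = vec![Vec::new(); n];
    for v in 0..n {
        for &w in &adj[v] { radj[w].push(v); }
    }
    let mut comp = vec![usize::MAX; n];
    let mut c = 0;
    for &v in order.iter().rev() {
        if comp[v] == usize::MAX {
            let mut stack = vec![v];
            while let Some(u) = stack.pop() {
                if comp[u] != usize::MAX { continue; }
                comp[u] = c;
                stack.extend(radj[u].iter().copied());
            }
            c += 1;
        }
    }
    comp
}

fn main() {
    // Types: 0 = A, 1 = B, 2 = C.
    // Edges: A -> B (rc), B -> A (via Option, still an edge), B -> C (rc).
    let adj = vec![vec![1], vec![0, 2], vec![]];
    let rc_edges = [(0usize, 1usize), (1, 2)];
    let comp = sccs(3, &adj);
    // A and B form one SCC containing the rc edge A -> B: freeze it.
    // C is its own SCC, so the rc edge B -> C allows mutation.
    let frozen: Vec<bool> = rc_edges.iter()
        .map(|&(a, b)| comp[a] == comp[b])
        .collect();
    assert_eq!(frozen, vec![true, false]);
}
```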

There are at least two complications: public trait pointers need to be assumed 
to possibly point to anything (unless the whole program is visible), while 
generic types need to be analyzed in their specialized versions. The latter 
means that some methods would be valid to call only for some values of the 
generic type parameters, and that the analysis needs to be redone globally from 
scratch for every module compilation.

In addition to all this, weak versions of all the rc pointers should be 
supported (to allow weak parent references, for instance), which requires a 
further annotation telling the compiler that the contained unsafe pointer is 
weak and thus does not participate in the pointer cycle analysis.
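The weak-parent pattern already works in today's Rust with std's Weak, which 
is effectively what the annotation would mark (a sketch; the Tree type and 
new_node helper are hypothetical names). The weak edge neither keeps the 
parent alive nor counts as a cycle edge:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct Tree {
    parent: RefCell<Weak<Tree>>, // weak edge: excluded from cycle analysis
    value: u32,
}

fn new_node(value: u32) -> Rc<Tree> {
    Rc::new(Tree { parent: RefCell::new(Weak::new()), value })
}

fn main() {
    let root = new_node(1);
    let child = new_node(2);
    *child.parent.borrow_mut() = Rc::downgrade(&root);
    // While root is alive, the weak parent edge can be upgraded.
    assert_eq!(child.parent.borrow().upgrade().unwrap().value, 1);
    drop(root);
    // With the last strong owner gone, the weak edge simply dangles:
    // no leak, and no cycle was ever formed.
    assert!(child.parent.borrow().upgrade().is_none());
}
```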

Also, a keyword could be provided to allow the forbidden mutations at the cost 
of a full run-time scan of the assigned value, to ensure it doesn't point to 
the object whose field is being assigned, failing or returning an error if the 
check fails (this probably needs to be explicit, since the scan can of course 
be very expensive).

Some structures like mutable strong doubly-linked lists would not be allowed by 
this and would instead require an ad-hoc implementation with raw pointers 
(where the list node smart pointer will increase the reference count on both 
the node, and the whole list, and splicing will be O(n)).

This infrastructure should be able to handle most programs without using 
garbage collection, except those that are interpreters of garbage-collected 
languages or 

[rust-dev] Adding exception handling as syntax sugar with declared exceptions

2013-05-12 Thread Bill Myers
This is a suggestion for adding an exception system to Rust that satisfies 
these requirements:
1. Unwinding does NOT pass through code that is not either in a function that 
declares throwing exceptions or in a try block (instead, that triggers task 
failure)
2. Callers can ignore the fact that a function throws an exception, resulting 
in task failure
3. Callers can handle exceptions if desired, with the same syntax of C++, Java 
and C#

(1) avoids the issue with C++ exception handling forcing to think about 
exception safety everywhere
(2) avoids the issue with Java's checked exceptions forcing to explicitly 
handle or redeclare them
(3) provides an easy-to-use exception mechanism

The specification is based on adding some syntax sugar, which is then 
transformed by the compiler into the current Rust syntax, and thus does not 
really alter runtime behavior (but it is also possible to implement this with 
traditional unwinding if desired).

Functions can be declared to throw exceptions, and if the conceptual unwinding 
process would have to pass through a function call that does not declare 
throwing an exception of that type, task failure is invoked instead.

To integrate with the condition system, one can raise a condition instead of 
throwing, and declare condition types and handlers with throws, allowing them 
to throw exceptions if desired (note that this will still just result in task 
failure unless the function raising the condition is itself declared to throw).

It is of course possible to code in this style by hand, but the syntax sugar 
makes it much terser, and allows converting functions that use task failure 
into functions that throw exceptions without changing the source code of 
callers that don't want to handle them.

* New syntax added:

- throws E1, E2, ... modifier on functions and closures
- try/catch block
- try() expression
- throw e statement

* New type declarations added:
enum Result1<T, E1>
{
  Ok(T),
  Fail1(E1)
}

enum Result2<T, E1, E2>
{
  Ok(T),
  Fail1(E1),
  Fail2(E2)
}

... and so on ...

* Transformations to implement the system:

Everywhere:
fn foo() -> T throws E1 [similar with closures] => fn foo() -> Result1<T, E1>
fn foo() -> T throws E1, E2 [similar with closures] => fn foo() -> Result2<T, E1, E2>
try(f(args)) [if f declared with throws] => f(args)
f(args)   [if f returns T and is declared with throws, not inside try()] =>
match f(args) {Ok(x) => x, Fail1(e) => throw e, Fail2(e) => throw e, ...}

In functions declared throws:
return v => return Ok(v)

try {try_body} catch(e1: E1) {catch1_body} catch(e2: E2) {catch2_body}
=>
let mut e1_: Option<E1> = None;
let mut e2_: Option<E2> = None;
'try_: loop {try_body; break;}
if e1_.is_some() { let e1 = e1_.unwrap(); catch1_body }
else if e2_.is_some() { let e2 = e2_.unwrap(); catch2_body }

throw e, if there is any containing try block that has a catch block with 
argument eK of type EK such that e is convertible to EK (taking the innermost 
suitable try block and the first suitable catch block)
=> eK_ = Some(e); break 'try_

throw e, if there are no suitable try blocks but the function is declared 
throws and e is convertible to a declared throws type EK
=> return FailK(e)

throw e, otherwise, if e implements ToStr:
=> fail!(e.to_str())

throw e, otherwise:
=> fail!()
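The whole proposal can be hand-desugared in today's Rust, where Result took 
exactly this role (a sketch; parse, ignoring_caller and handling_caller are 
hypothetical names, and unwrap plays the part of the implicit task failure):

```rust
// A function "declared throws" becomes one returning Result.
fn parse(s: &str) -> Result<u32, String> {
    s.parse::<u32>().map_err(|e| e.to_string())
}

// Requirement (2): a caller may ignore the throws clause; unwrap turns
// the error into a panic (task failure, in 2013 terms).
fn ignoring_caller() -> u32 {
    parse("42").unwrap()
}

// Requirement (3): a try/catch block desugars to a match on the Result.
fn handling_caller(s: &str) -> u32 {
    match parse(s) {
        Ok(x) => x,   // the try body's value
        Err(_) => 0,  // the catch body
    }
}

fn main() {
    assert_eq!(ignoring_caller(), 42);
    assert_eq!(handling_caller("7"), 7);
    assert_eq!(handling_caller("oops"), 0);
}
```

The proposal's try() expression corresponds to just keeping the Result value 
instead of matching on it.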

  ___
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev