On 12/23/12 10:43 AM, Michael Neumann wrote:
Hi,

I've spent the last days hacking in Rust and a few questions and ideas
have accumulated over that time.

* If I use unique ~pointers, there is absolutely no runtime overhead,
   so neither ref-counting nor GC is involved, right?

Well, you have to malloc and free, but I assume you aren't counting that. There is no reference counting or GC, and the GC is totally unaware of such pointers.

(There is one caveat: when we have tracing GC, it must scan ~ pointers that contain @ pointers, just as it must scan the stack. Such pointers are generally uncommon though.)

* Heap-allocated pointers incur ref-counting. So when I pass a
   @pointer, I will basically pass a

     struct heap_ptr {ptr: *byte, cnt: uint}

   around. Right?

They currently do thread-unsafe reference counting, but we would like to eventually change that to tracing GC. However, the structure is different: we use intrusive reference counting, so it's actually a pointer to this structure:

pointer --> [ ref count, type_info, next alloc, prev alloc, data... ]

You're only passing one word around, not two. The reference count is inside the object pointed to. This setup saves one allocation over C++ std::shared_ptr.
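As an illustrative sketch (the field names here are my own, not the actual rustc internals), the box header looks something like:

    struct BoxHeader {
        ref_count: uint,       // intrusive reference count
        type_info: *TypeDesc,  // layout/drop info for the contents
        next: *BoxHeader,      // doubly-linked list of all @ allocations
        prev: *BoxHeader,
        // data follows immediately after the header
    }

An @T value is then a single word pointing just past the header, at the data.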

* vec::build_sized() somehow seems to be pretty slow. When I use it,
   instead of a for() loop, my rust-msgpack library slows down by
   factor 2 for loading msgpack data.

   Also, I would have expected that vec::build_sized() will call my
   supplied function "n" times. IMHO the name is a little bit
   misleading here.

You want vec::from_fn() instead. vec::build_sized() is not commonly used and could probably be renamed without too much trouble.

I suspect the performance problem you're seeing with it is due to not supplying enough LLVM inline hints. LLVM's inline heuristics are not well tuned to Rust at the moment; we work around it by writing #[inline(always)] in a lot of places, but we should probably have the compiler insert those automatically for certain uses of higher-order functions. When LLVM inlines properly, the higher-order functions generally compile down into for loops.
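For example, vec::from_fn calls the supplied function once per index to build the vector:

    let squares = vec::from_fn(4, |i| i * i);
    // squares == ~[0, 1, 4, 9]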

* I do not fully understand the warning of the following script:

   fn main() {
     let bytes =
       io::read_whole_file(&path::Path("/tmp/matching.msgpack")).get();
   }

   t2.rs:2:14: 2:78 warning: instantiating copy type parameter with a not
   implicitly copyable type
   t2.rs:2   let bytes = io::read_whole_file(&path::Path("/tmp/matching.msgpack")).get();
                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   Does it mean that it will copy the ~str again? When I use pattern
   matching instead of get(), I don't get this warning, but it seems to
   be slower. Will it just silence the warning???

Yes, it means it will copy the string again. To avoid this, you want result::unwrap() or option::unwrap() instead. I've been thinking for some time that .unwrap() should change to .get() and .get() should change to .copy_value() or something.
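For instance, something like this avoids the copy by moving the buffer out of the Result instead:

    let bytes = result::unwrap(
        io::read_whole_file(&path::Path("/tmp/matching.msgpack")));
    // unwrap() moves the contents out rather than copying them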


* This is also strange to me:

   fn nowarn(bytes: &[u8]) {}

   fn main() {
     let bytes = ~[1,2,3];
     nowarn(bytes);
     let br = io::BytesReader { bytes: bytes, pos: 0 }; // FAILS
   }

   t.rs:6:36: 6:41 error: mismatched types: expected `&/[u8]` but found
   `~[u8]` ([] storage differs: expected & but found ~)
   t.rs:6   let br = io::BytesReader { bytes: bytes, pos: 0 };
                                              ^~~~~

   It implicitly converts the ~pointer into a borrowed pointer when
   calling the function, but the same does not work when using the
   BytesReader struct. I think I should use a make_bytes_reader
   function, but I didn't find one.

This is a missing feature that should be in the language. Struct literals are basically just like functions; their fields should cause coercions as well.
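In the meantime, a small helper function (hypothetical name; modulo region annotations) will trigger the coercion at the call boundary:

    fn bytes_reader(bytes: &[u8]) -> io::BytesReader {
        io::BytesReader { bytes: bytes, pos: 0 }
    }

    let br = bytes_reader(bytes);  // ~[u8] coerces to &[u8] here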

* String literals seem to be not immutable. Is that right? That means
   they are always "heap" allocated. I wish they were immutable, so
   that ~"my string" would be stored in read-only memory.

~"my string" isn't designed to be stored in read-only memory. You want `&static/str` instead; since it's a borrowed pointer, it cannot be mutated. `static` is the read-only memory region.

   Is there a way how a function which takes a ~str can state that it
   will not modify the content?

Take an `&str` (or an `&~str`) instead. Functions that take `~str` require that the caller give up its ownership of the string. If you, the caller, give up a string, then you give up your say in how it is used, including mutability. However, if you as the callee *borrow* the string via `&str` or `&~str`, then you are not allowed to change its mutability, since you are not the owner.
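For example, a sketch of a function that borrows rather than takes ownership:

    fn byte_len(s: &str) -> uint {
        s.len()  // read-only access; the callee cannot mutate or free s
    }

    let owned = ~"hello";
    let n = byte_len(owned);  // ~str coerces to &str; caller keeps ownership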

   In this regard I very much like the way the D language handles this.
   It uses "const" to state that it won't modify the value, while the
   value itself may be mutable. Then there is "immutable", and a value
   declared as such will not change during the whole lifetime.

We have a "const" qualifier as well, which means what "const" does in D. It has a good chance of becoming redundant and going away, however, with the changes suggested in the blog post "Imagine Never Hearing the Words 'Aliasable, Mutable' Again".

Having the ability to declare that a data type is forever immutable is something we've talked about a lot, but we'd have to add a lot of type system machinery for it to be as flexible as we'd like. Being able to arbitrarily freeze and thaw deep data structures is a very powerful feature, and having data types specify that they must be immutable forever is at odds with that. (For that matter, the `mut` keyword on struct fields is at odds with that in the other direction, which is why I'd like to get rid of that too.)

   Of course in Rust, thanks to unique pointers, there is less need for
   immutability, as you cannot share a unique pointer between threads.

You can share unique pointers between threads with an ARC data type, actually (in `std::arc`). The ARC demands that the pointer be immutable and will not allow it to be mutated.
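Roughly, using the std::arc API as it stands today (details may shift):

    extern mod std;
    use std::arc;

    let shared = arc::ARC(~[1, 2, 3]);
    let child = arc::clone(&shared);
    do task::spawn |move child| {
        // read-only access from the other task
        let v = arc::get(&child);
        assert v.len() == 3;
    }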

* Appending to strings. It's easy to push an element to an array by
   doing:

   let mut v: ~[int] = ~[1,2];
   v.push(3);
   v.push(4);

   But when I want to append to a string, I have to write:

   let mut s: ~str = ~"";
   let mut s = str::append(s, "abc");
   let mut s = str::append(s, "def");

   I found this a bit counter-intuitive. I know there exists "+=", but
   this will always create a new string. A "<<" operator would be really
   nice to append to strings (or to arrays).

There should probably be a ".append()" method on strings with a "&mut self" argument. Then you could write:

    let mut s = ~"";
    s.append("abc");
    s.append("def");

There are also plans to make "+=" separately overloadable. This would allow += to work in this case, I believe.

* Default initializers for structs. Would be nice to specify them like:

   struct S {a: int = 4, b: int = 3};

   I know I can use the ".." notation, and this is very cool and more
   flexible, but I will have to type in a lot of code if the struct gets
   pretty large.

   const DefaultS = S{a: 4, b: 3}; // imagine this has 100 fields :)
   let s = S{a: 4, ..DefaultS};

Perhaps. This might be a good job for a macro at first, then we can see about folding it into the language if it's widely used.
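A sketch of what such a macro might look like (all names here are made up; macro_rules! cannot synthesize identifiers, so the constructor name is passed in explicitly):

    macro_rules! struct_with_defaults (
        ($name:ident, $defaults:ident,
         { $($field:ident : $t:ty = $default:expr),* }) => (
            struct $name { $($field: $t),* }
            fn $defaults() -> $name {
                $name { $($field: $default),* }
            }
        )
    )

    struct_with_defaults!(S, default_s, { a: int = 4, b: int = 3 })

    let s = S { a: 10, ..default_s() };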

* Metaprogramming

   Given an arbitrary struct S {...} with some fields, it would be nice
   to somehow derive S.serialize and S.deserialize functions
   automatically. Are there any ideas how to do that? In C++ I use the
   preprocessor and templates for that. In D, thanks to
   compile-time-code-evaluation, I can write code that will introspect
   the struct during compile-time and then generate code.

There are #[auto_encode] and #[auto_decode] syntax extensions that exist already, actually (although the documentation is almost nonexistent). These are polymorphic over the actual serialization method, so you can choose the actual serialization format. There is also a visitor you can use for reflection, although it will be slower than generating the code at compile time.
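Usage is roughly (a sketch, since the documentation is sparse):

    #[auto_encode]
    #[auto_decode]
    struct Point { x: int, y: int }
    // generates Encodable/Decodable impls, so any encoder
    // (e.g. the JSON one in std) can serialize a Point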

We currently have syntax extensions written as compiler plugins. These allow you to write any code you want and have it executed at compile time. There are two main issues with them at the moment: (1) they have to be compiled as part of the compiler itself; (2) they expose too many internals of the `rustc` compiler, making your code likely to break when we change the compiler (or on alternative compilers implementing the Rust language, if they existed). The plan to fix (1) is to allow plugins to be written as separate crates and dynamically loaded; we've also talked about, longer-term, allowing them to be JIT'd, allowing you to execute any code you wish at compile time. The plan to fix (2) is to make the syntax extensions operate on token trees, not AST nodes, basically along the lines of Scheme syntax objects.

   I guess I could write a macro like:

   define_ser_struct!(S, field1, int, field2, uint, ...)

   which would generate the struct S and two functions for
   serialization. Would that be possible with macros?

Yes, you should be able to do this with macros today, now that macros can expand to items.

Patrick

_______________________________________________
Rust-dev mailing list
[email protected]
https://mail.mozilla.org/listinfo/rust-dev