Re: Visual D 0.3.32 maintenance release
On 5/13/2012 3:25 PM, Ary Manzana wrote:

On 5/13/12 7:31 PM, Rainer Schuetze wrote: With the workflow of bugzilla/svn it was just copying and pasting the diff into the bug report. I understand it is easier on Walter's side, though.

But where did you get the diff from? I'm sure you checked out the project and made the changes on it. If that's the case, then it's the same as forking and cloning.

With small patches to a single file (which is what most patches are), it was just the diff against the svn working base, which you could copy and paste from a shell context-menu command. You could even adjust the diff manually to filter out unrelated changes. With pull requests you have to redo the patch on a clean branch of the full source tree. Maintaining larger patches did get messy, though.

I *do* expect contributions to appear in Visual D. Since it's so easy to contribute on github, and it is standardized, people know how to do it: fork, work, make a pull request (as opposed to making a patch, sending it... mmm... is that the author's email? I hope it works. And I hope it checks emails and mine doesn't go to the spam folder! Um, maybe I should post in the forums... but does he read them? Ah, maybe I will leave the patch for another day).

Well, the bug-tracking system is/was probably the right place. But I agree, the infrastructure provided by github is very impressive and might be more attractive to contributors.
Re: Visual D 0.3.32 maintenance release
On 5/11/2012 9:49 PM, Walter Bright wrote:

On 5/1/2012 9:46 AM, Rainer Schuetze wrote: The Visual D installer can be downloaded from its website at http://www.dsource.org/projects/visuald

Can you please move it to github?

I will give it a try...
Re: DCT: D compiler as a collection of libraries
On Saturday, 12 May 2012 at 03:32:20 UTC, Ary Manzana wrote:

I think you are wasting much more memory and performance by storing all the tokens in the lexer. Imagine I want to implement a simple syntax highlighter: just highlight keywords. How can I tell DCT *not* to store all tokens, because I need each one in turn? And since I'll be highlighting in the editor I will need column and line information. That means I'll have to do that O(log(n)) operation for every token. So you see, for the simplest use case of a lexer the performance of DCT is awful.

Now imagine I want to build an AST. Again, I consume the tokens one by one, probably peeking in some cases. If I want to store line and column information I just copy it into the AST. You say the tokens are discarded but their data is not, and that's why their data is usually copied.

Currently I am thinking about making Token a class instead of a struct. A token (from https://github.com/roman-d-boiko/dct/blob/master/fe/core.d) is:

// Represents a lexed token
struct Token
{
    size_t startIndex; // position of the first code unit in the source string
    string spelling;   // characters from which this token has been lexed
    TokenKind kind;    // enum; each keyword and operator has a dedicated kind
    ubyte annotations; // meta information, e.g. whether a token is valid, or whether an integer literal is signed, long, hexadecimal, etc.
}

Making it a class would give several benefits:

* allow not worrying about allocating a big array of tokens. E.g., on a 64-bit OS the largest module in Phobos (IIRC, std.datetime) consumes 13.5MB in an array of almost 500K tokens. It would require a 4 times smaller chunk of contiguous memory if it were an array of class references, because each would consume only 8 bytes instead of 32.
* allow subclassing, for example, for storing strongly typed literal values; this flexibility could also facilitate future extensibility (but it's difficult to predict which kind of extension may be needed)

* there would be no need to copy data from tokens into the AST, passing an object would be enough (again, copying 8 bytes instead of 32); the same applies to passing into methods - no need to pass by ref to minimise overhead

It would incur some additional memory overhead (at least 8 bytes per token), but that's hardly significant. There is also an additional price for accessing token members because of indirection, and, possibly, worse cache friendliness (token instances may be allocated anywhere in memory, not close to each other).

These considerations are mostly about performance. I think there is also some impact on design, but I couldn't find anything significant (given that currently I see a token as merely a data structure without associated behavior).

Could anybody suggest other pros and cons? Which option would you choose?
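The O(log(n)) line/column lookup Ary mentions is typically a binary search over precomputed line-start offsets. A minimal illustrative sketch in Python (the names `build_line_starts` and `line_col` are hypothetical, not DCT's API):

```python
import bisect

def build_line_starts(source: str) -> list[int]:
    """Index of the first character of each line (0-based)."""
    starts = [0]
    for i, ch in enumerate(source):
        if ch == "\n":
            starts.append(i + 1)
    return starts

def line_col(starts: list[int], index: int) -> tuple[int, int]:
    """Map a flat character index to (line, column), both 0-based.
    bisect_right finds the last line start <= index in O(log n)."""
    line = bisect.bisect_right(starts, index) - 1
    return line, index - starts[line]
```

A token need only store its flat `startIndex`; line and column are derived on demand, which is what makes the per-token lookup cost O(log(n)) rather than free.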
Re: DCT: D compiler as a collection of libraries
On Monday, 14 May 2012 at 15:00:37 UTC, Roman D. Boiko wrote: Could anybody suggest other pros and cons? Which option would you choose? Further discussion on this topic (struct vs class) is at http://forum.dlang.org/thread/asdrqlaydzcdpqwsb...@forum.dlang.org
Re: DCT: D compiler as a collection of libraries
On Monday, 14 May 2012 at 16:30:21 UTC, deadalnix wrote:

On 14/05/2012 17:00, Roman D. Boiko wrote:

Making it a class would give several benefits:

* allow not worrying about allocating a big array of tokens. E.g., on a 64-bit OS the largest module in Phobos (IIRC, std.datetime) consumes 13.5MB in an array of almost 500K tokens. It would require a 4 times smaller chunk of contiguous memory if it were an array of class references, because each would consume only 8 bytes instead of 32.

Why is this a benefit?

NNTP error: 400 load at 23.60, try later -- prevented me from answering :) Because it might be difficult to find a big chunk of available memory (3.5M vs 14M for this particular case).

* allow subclassing, for example, for storing strongly typed literal values; this flexibility could also facilitate future extensibility (but it's difficult to predict which kind of extension may be needed)

I'm pretty sure that D's tokens will not change that much. If the need isn't identified right now, I'd advocate for YAGNI.

Agree.

* there would be no need to copy data from tokens into the AST, passing an object would be enough (again, copying 8 bytes instead of 32); the same applies to passing into methods - no need to pass by ref to minimise overhead

Yes, but now you add pressure on the GC and add indirections. I'm not sure it's worth it. It seems to me like a premature optimization.

It looks so. Thanks.

It would incur some additional memory overhead (at least 8 bytes per token), but that's hardly significant. There is also an additional price for accessing token members because of indirection, and, possibly, worse cache friendliness (token instances may be allocated anywhere in memory, not close to each other). These considerations are mostly about performance. I think there is also some impact on design, but I couldn't find anything significant (given that currently I see a token as merely a data structure without associated behavior).

Could anybody suggest other pros and cons?
Which option would you choose?

You are over-engineering the whole thing.

I'm trying to resolve this and other tradeoffs. I'd like to simplify, but still satisfy my design goals.
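The arithmetic behind this exchange can be checked directly. A sketch, taking the 32-byte struct size and roughly 500K tokens from the post above, and assuming (as a loud assumption, since the exact figure is runtime-dependent) at least 16 bytes of per-object header on a 64-bit GC heap:

```python
TOKENS = 500_000     # roughly the token count quoted for the largest Phobos module
STRUCT_SIZE = 32     # size_t + string (ptr + len) + kind + annotations, padded, 64-bit
REF_SIZE = 8         # a class reference is one pointer
OBJ_OVERHEAD = 16    # assumed minimum per-object header on a 64-bit GC heap

struct_array = TOKENS * STRUCT_SIZE   # one big contiguous allocation (~15 MiB)
ref_array = TOKENS * REF_SIZE         # the contiguous part shrinks 4x (~4 MiB)...
class_total = ref_array + TOKENS * (STRUCT_SIZE + OBJ_OVERHEAD)
# ...but total memory grows, and each token becomes a separate GC allocation

print(f"array of structs: {struct_array / 2**20:.1f} MiB contiguous")
print(f"array of refs:    {ref_array / 2**20:.1f} MiB contiguous")
print(f"class total:      {class_total / 2**20:.1f} MiB overall")
```

This makes both sides of the argument concrete: the class design only needs a quarter of the contiguous memory, but deadalnix's objection holds too, since total footprint and GC pressure go up.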
Re: DCT: D compiler as a collection of libraries
On Monday, 14 May 2012 at 16:58:42 UTC, Roman D. Boiko wrote:

You are over-engineering the whole thing. I'm trying to resolve this and other tradeoffs. I'd like to simplify, but still satisfy my design goals.

What if there were two different lexer modes, with different structs?

1. For an IDE with on-the-fly lexing. Assumption: the error rate is high (need to keep much info).

2. For the compiler. Assumption: the error rate is practically non-existent, and if there is an error it really doesn't matter if it's slow.

So... when choosing the compiler mode, and there actually is an error, just lex it again to produce a pretty error message ;)

try { lex(mode.compiler); }
catch { lex(mode.ide); } // calculates column etc., whatever info it needs
Re: DCT: D compiler as a collection of libraries
On Monday, 14 May 2012 at 19:04:20 UTC, Tove wrote:

On Monday, 14 May 2012 at 16:58:42 UTC, Roman D. Boiko wrote: You are over-engineering the whole thing. I'm trying to resolve this and other tradeoffs. I'd like to simplify, but still satisfy my design goals.

What if there were two different lexer modes, with different structs?

1. For an IDE with on-the-fly lexing. Assumption: the error rate is high (need to keep much info).

2. For the compiler. Assumption: the error rate is practically non-existent, and if there is an error it really doesn't matter if it's slow.

So... when choosing the compiler mode, and there actually is an error, just lex it again to produce a pretty error message ;)

try { lex(mode.compiler); }
catch { lex(mode.ide); } // calculates column etc., whatever info it needs

So far it doesn't seem expensive to tolerate errors and proceed. The only thing I'm missing is some specification of when to stop including characters in a token's spelling and start a new token. I don't think I'll use backtracking for that in the near future. If I did, I would really separate out that part of the lexer and provide two implementations of it. Given this, accepting errors and moving on simply requires some finite set of rules about the boundaries of invalid tokens. I also think structural code editing concepts will help here, but I haven't done any research on this topic yet.

The problem with multiple lexer implementations is that it might become much more difficult to maintain them.
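The "finite set of rules about the boundaries of invalid tokens" can be illustrated with a toy error-tolerant lexer: on an unexpected character it emits an invalid token and moves on, with no backtracking and no second error-reporting pass. A minimal Python sketch; the token kinds and the one-character recovery rule are invented for illustration, not taken from DCT:

```python
import re

# Toy token grammar; anything unmatched becomes a one-character "invalid" token,
# so lexing always makes progress even on broken input.
TOKEN_RE = re.compile(r"""
    (?P<ident>  [A-Za-z_]\w* )
  | (?P<number> \d+ )
  | (?P<op>     [+\-*/=;] )
  | (?P<ws>     \s+ )
""", re.VERBOSE)

def lex(source: str) -> list[tuple[str, str]]:
    tokens, pos = [], 0
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m:
            if m.lastgroup != "ws":   # skip whitespace
                tokens.append((m.lastgroup, m.group()))
            pos = m.end()
        else:
            # Recovery rule: the invalid token's boundary is exactly one character.
            tokens.append(("invalid", source[pos]))
            pos += 1
    return tokens
```

For example, `lex("a @ b")` yields an `ident`, an `invalid` token for `@`, and another `ident`. The boundary rule here is deliberately trivial; the point is that once such a rule is fixed, a single lexer can serve both the IDE and compiler cases without two implementations to maintain.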
Re: DCT: D compiler as a collection of libraries
On Monday, 14 May 2012 at 19:13:39 UTC, Roman D. Boiko wrote:

On Monday, 14 May 2012 at 19:04:20 UTC, Tove wrote: What if there were two different lexer modes, with different structs? 1. For an IDE with on-the-fly lexing. Assumption: the error rate is high (need to keep much info). 2. For the compiler. Assumption: the error rate is practically non-existent, and if there is an error it really doesn't matter if it's slow. So... when choosing the compiler mode, and there actually is an error, just lex it again to produce a pretty error message ;) ...

The problem with multiple lexer implementations is that it might become much more difficult to maintain them.

Just to clarify: different lexer modes are, in my view, like two different implementations combined in a non-trivial way (unless the difference is minor). So complexity comes from two factors: the different implementations and how to combine them. I try to avoid this.