Re: Visual D 0.3.32 maintenance release

2012-05-14 Thread Rainer Schuetze


On 5/13/2012 3:25 PM, Ary Manzana wrote:

On 5/13/12 7:31 PM, Rainer Schuetze wrote:

With the bugzilla/svn workflow it was just a matter of copying and pasting the diff
into the bug report. I understand it is easier on Walter's side, though.


But where did you get the diff from? I'm sure you checked out the
project and made the changes on it. If that's the case, then it's the
same as forking and cloning.


With small patches to a single file (which is what most patches are), it 
was just the diff to the svn working base that you could copy and paste 
from a shell context menu command. You could even adjust the diff 
manually to filter out other unrelated changes. With pull requests you 
have to redo the patch on a clean branch of the full source tree. 
Maintaining larger patches did get messy, though.




I *do* expect contributions to appear in Visual D. Since it's so easy to
contribute on GitHub, and the process is standardized, people know how to do it:
fork, work, make a pull request (as opposed to making a patch, sending
it... mmm... is that the author's email? I hope it still works. And I hope
he checks his email and mine doesn't go to the spam folder! Um, maybe I
should post in the forums... but does he read them? Ah, maybe I will
leave the patch for another day).


Well, the bug-tracking system is/was probably the right place. But I 
agree, the infrastructure provided by GitHub is very impressive and 
might be more attractive to contributors.




Re: Visual D 0.3.32 maintenance release

2012-05-14 Thread Rainer Schuetze



On 5/11/2012 9:49 PM, Walter Bright wrote:

On 5/1/2012 9:46 AM, Rainer Schuetze wrote:

The Visual D installer can be downloaded from its website at
http://www.dsource.org/projects/visuald


Can you please move it to GitHub?



I will give it a try...


Re: DCT: D compiler as a collection of libraries

2012-05-14 Thread Roman D. Boiko

On Saturday, 12 May 2012 at 03:32:20 UTC, Ary Manzana wrote:
I think you are wasting much more memory and performance by 
storing all the tokens in the lexer.


Imagine I want to implement a simple syntax highlighter: just 
highlight keywords. How can I tell DCT *not* to store all tokens, 
since I only need each one in turn? And since I'll be 
highlighting in the editor I will need column and line 
information. That means I'll have to do that O(log(n)) 
operation for every token.


So you see, for the simplest use case of a lexer the 
performance of DCT is awful.


Now imagine I want to build an AST. Again, I consume the tokens 
one by one, probably peeking in some cases. If I want to store 
line and column information I just copy them to the AST. You 
say the tokens are discarded but their data is not, and that's 
why their data is usually copied.
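
For concreteness, the O(log(n)) operation mentioned above is typically a binary search over precomputed line-start indices. A minimal sketch in D; the names and layout are illustrative, not DCT's actual API:

```d
import std.range : assumeSorted;
import std.typecons : Tuple, tuple;

struct LineMap
{
    size_t[] lineStarts; // ascending code-unit indices where each line begins

    Tuple!(size_t, size_t) lineAndColumn(size_t startIndex)
    {
        auto sorted = assumeSorted(lineStarts);
        // count of line starts <= startIndex gives the 1-based line number
        size_t line = sorted.lowerBound(startIndex + 1).length;
        size_t column = startIndex - lineStarts[line - 1] + 1;
        return tuple(line, column);
    }
}
```

Each lookup is O(log(n)) in the number of lines, which is the per-token cost Ary objects to paying in the highlighter use case.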


Currently I am considering making Token a class instead of a struct.

A token (from 
https://github.com/roman-d-boiko/dct/blob/master/fe/core.d) is:


// Represents a lexed token
struct Token
{
    size_t startIndex; // position of the first code unit in the source string
    string spelling;   // characters from which this token has been lexed
    TokenKind kind;    // enum; each keyword and operator has a dedicated kind
    ubyte annotations; // meta information, e.g. whether a token is valid,
                       // or whether an integer literal is signed, long,
                       // hexadecimal, etc.
}

Making it a class would give several benefits:

* no need to worry about allocating a big array of tokens. E.g., on a 
64-bit OS the largest module in Phobos (IIRC, std.datetime) consumes 
13.5MB as an array of almost 500K tokens. An array of class references 
would require a four times smaller chunk of contiguous memory, because 
each element would consume only 8 bytes instead of 32.


* allow subclassing, for example, for storing strongly typed 
literal values; this flexibility could also facilitate future 
extensibility (but it's difficult to predict which kind of 
extension may be needed)


* there would be no need to copy data from tokens into the AST; 
passing an object reference would be enough (again, copying 8 instead 
of 32 bytes); the same applies to passing tokens into methods - no 
need to pass by ref to minimise overhead


It would incur some additional memory overhead (at least 8 bytes 
per token), but that's hardly significant. Also there is an 
additional price for accessing token members because of 
indirection, and, possibly, worse cache friendliness (token 
instances may be allocated anywhere in memory, not close to each 
other).


These considerations are mostly about performance. I think there 
is also some impact on design, but I couldn't find anything 
significant (given that currently I see a token as merely a 
data structure without associated behavior).


Could anybody suggest other pros and cons? Which option would you 
choose?
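
For reference, the 32-vs-8-byte figures can be checked mechanically at compile time. A hedged sketch assuming a 64-bit target, with TokenKind reduced to a stub (in D, `.sizeof` of a class type is the size of the reference, while `__traits(classInstanceSize, ...)` gives the heap instance size):

```d
enum TokenKind : ubyte { identifier, keyword, operator }

struct TokenS { size_t startIndex; string spelling; TokenKind kind; ubyte annotations; }
class  TokenC { size_t startIndex; string spelling; TokenKind kind; ubyte annotations; }

void main()
{
    // struct: 8 (size_t) + 16 (string: pointer + length) + 1 + 1, padded to 32
    static assert(TokenS.sizeof == 32);
    // an array of class objects stores only 8-byte references...
    static assert(TokenC.sizeof == 8);
    // ...but each instance still lives on the GC heap, with object
    // header overhead on top of the fields
    pragma(msg, __traits(classInstanceSize, TokenC));
}
```

So the contiguous allocation shrinks fourfold, but the total memory (references plus heap instances) grows, which is the overhead conceded above.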


Re: DCT: D compiler as a collection of libraries

2012-05-14 Thread Roman D. Boiko

On Monday, 14 May 2012 at 15:00:37 UTC, Roman D. Boiko wrote:
Could anybody suggest other pros and cons? Which option would 
you choose?
Further discussion on this topic (struct vs class) is at 
http://forum.dlang.org/thread/asdrqlaydzcdpqwsb...@forum.dlang.org


Re: DCT: D compiler as a collection of libraries

2012-05-14 Thread Roman D. Boiko

On Monday, 14 May 2012 at 16:30:21 UTC, deadalnix wrote:

On 14/05/2012 17:00, Roman D. Boiko wrote:

Making it a class would give several benefits:
* no need to worry about allocating a big array of tokens. E.g., on a 
64-bit OS the largest module in Phobos (IIRC, std.datetime) consumes 
13.5MB as an array of almost 500K tokens. An array of class references 
would require a four times smaller chunk of contiguous memory, because 
each element would consume only 8 bytes instead of 32.

Why is this a benefit?
NNTP error "400 load at 23.60, try later" prevented me from 
answering sooner :)


Because it might be difficult to find a big contiguous chunk of 
available memory (3.5M vs 14M for this particular case).

* allow subclassing, for example, for storing strongly typed 
literal values; this flexibility could also facilitate future 
extensibility (but it's difficult to predict which kind of 
extension may be needed)


I'm pretty sure that D's tokens will not change that much. If 
the need isn't identified right now, I'd advocate for YAGNI.

Agree.

* there would be no need to copy data from tokens into the AST; 
passing an object reference would be enough (again, copying 8 
instead of 32 bytes); the same applies to passing tokens into 
methods - no need to pass by ref to minimise overhead


Yes, but now you add pressure on the GC and add indirections. 
I'm not sure it's worth it. It seems to me like a premature 
optimization.

It looks so. Thanks.

It would incur some additional memory overhead (at least 8 bytes 
per token), but that's hardly significant. Also there is an 
additional price for accessing token members because of 
indirection, and, possibly, worse cache friendliness (token 
instances may be allocated anywhere in memory, not close to each 
other).

These considerations are mostly about performance. I think there 
is also some impact on design, but I couldn't find anything 
significant (given that currently I see a token as merely a 
data structure without associated behavior).

Could anybody suggest other pros and cons? Which option would 
you choose?


You are over-engineering the whole thing.

I'm trying to balance this against other tradeoffs. I'd like to
simplify, but still satisfy my design goals.


Re: DCT: D compiler as a collection of libraries

2012-05-14 Thread Tove

On Monday, 14 May 2012 at 16:58:42 UTC, Roman D. Boiko wrote:

You are over-engineering the whole thing.

I'm trying to balance this against other tradeoffs. I'd like to
simplify, but still satisfy my design goals.


What if there were two different lexer modes, with different 
structs?


1. For an IDE with on-the-fly lexing:
   Assumption: the error rate is high (need to keep much info).

2. For the compiler:
   Assumption: the error rate is near zero, and if there is an 
error it really doesn't matter if it's slow.


So... when choosing the compiler mode... and there actually is 
an error, then just lex it again, to produce a pretty error 
message ;)


try
{
  lex(mode.compiler);
}
catch
{
  lex(mode.ide); // calculates column etc., whatever info it needs
}



Re: DCT: D compiler as a collection of libraries

2012-05-14 Thread Roman D. Boiko

On Monday, 14 May 2012 at 19:04:20 UTC, Tove wrote:

On Monday, 14 May 2012 at 16:58:42 UTC, Roman D. Boiko wrote:

You are over-engineering the whole thing.

I'm trying to balance this against other tradeoffs. I'd like to
simplify, but still satisfy my design goals.


What if there were two different lexer modes, with different 
structs?


1. For an IDE with on-the-fly lexing:
   Assumption: the error rate is high (need to keep much info).

2. For the compiler:
   Assumption: the error rate is near zero, and if there is an 
error it really doesn't matter if it's slow.


So... when choosing the compiler mode... and there actually is 
an error, then just lex it again, to produce a pretty error 
message ;)


try
{
  lex(mode.compiler);
}
catch
{
  lex(mode.ide); // calculates column etc., whatever info it needs
}

So far it doesn't seem expensive to tolerate errors and proceed.
The only thing I miss is some specification of when to stop
including characters in a token's spelling and start a new token. I
don't think I'll use backtracking for that in the near future.
If I did, I would separate out that part of the lexer and provide
two implementations for it. Given this, accepting errors and
moving on simply requires some finite set of rules about the
boundaries of invalid tokens. I also think structural code
editing concepts will help here, but I haven't done any research on
this topic yet.

The problem with multiple lexer implementations is that it might
become much more difficult to maintain them.
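
A finite set of boundary rules could be as simple as: consume code units until one that can start a valid token, then emit an error token spanning that range. A hypothetical sketch reusing the Token struct quoted earlier in the thread; `canStartToken` and `TokenKind.invalid` are assumptions for illustration, not DCT code:

```d
// Hypothetical recovery rule: an invalid token extends up to the next
// code unit that can begin a valid token.
Token lexInvalid(string source, size_t start)
{
    size_t end = start + 1;
    while (end < source.length && !canStartToken(source[end]))
        end++;
    // spelling keeps the raw characters so diagnostics can show them
    return Token(start, source[start .. end], TokenKind.invalid, 0);
}
```

With a rule like this the lexer never needs to backtrack: every code unit ends up inside exactly one token, valid or not.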


Re: DCT: D compiler as a collection of libraries

2012-05-14 Thread Roman D. Boiko

On Monday, 14 May 2012 at 19:13:39 UTC, Roman D. Boiko wrote:

On Monday, 14 May 2012 at 19:04:20 UTC, Tove wrote:
What if there were two different lexer modes, with different 
structs?


1. For an IDE with on-the-fly lexing:
   Assumption: the error rate is high (need to keep much info).

2. For the compiler:
   Assumption: the error rate is near zero, and if there is an 
error it really doesn't matter if it's slow.


So... when choosing the compiler mode... and there actually is 
an error, then just lex it again, to produce a pretty error 
message ;)

...

The problem with multiple lexer implementations is that it might
become much more difficult to maintain them.

Just to clarify: in my view, different lexer modes are like two
different implementations combined in a non-trivial way (unless
the difference is minor). So complexity comes from two factors: the
different implementations themselves, and how to combine them. I try
to avoid this.
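

One D idiom that could keep both modes in a single implementation is a compile-time mode parameter: the shared body stays in one place, and `static if` strips the bookkeeping a mode does not need. A sketch under that assumption, not DCT's actual design:

```d
enum LexMode { compiler, ide }

struct Lexer(LexMode mode)
{
    string source;
    size_t pos;
    static if (mode == LexMode.ide)
        size_t line = 1, column = 1; // bookkeeping only the IDE mode pays for

    void advance()
    {
        static if (mode == LexMode.ide)
        {
            if (source[pos] == '\n') { line++; column = 1; }
            else column++;
        }
        pos++;
    }
}
```

Here `Lexer!(LexMode.compiler)` and `Lexer!(LexMode.ide)` are two instantiations of one body, so there is only one implementation to maintain, at the cost of template bloat and of every shared change being compiled against both modes.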