d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread MIURA Masahiro
Hi,

Being happy to see issue 3415 (broken JSON format) fixed,
I have written a utility to convert DMD2's JSON output
to Exuberant Ctags format.  This enables you to tagjump in Vim
and other editors/IDEs.  It's just 150+ lines, thanks to D2's
powerful string handling.  Enjoy!

http://github.com/Dubhead/d2tags

usage:
% dmd -Xftags.json foo.d
% d2tags tags.json > tags
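
For readers who want a feel for what such a conversion involves, here is a minimal Python sketch. It is not the d2tags source (which is written in D), and the JSON layout it reads is an assumption: a list of modules, each with "file" and "members", where every member carries "name", "line" and optionally "kind" and nested "members". The function name json_to_tags and the sample data are made up for illustration.

```python
import json


def json_to_tags(modules):
    """Flatten DMD-style JSON modules into sorted ctags lines.

    Assumed (hypothetical) schema: each module is a dict with "file" and
    "members"; each member has "name", "line", and optionally "kind" and
    nested "members".
    """
    lines = []

    def walk(members, path):
        for m in members:
            name, line = m.get("name"), m.get("line")
            if name is not None and line is not None:
                # Extended ctags line: tagname <TAB> file <TAB> address;"
                tag = f'{name}\t{path}\t{line};"'
                if m.get("kind"):
                    tag += f'\tkind:{m["kind"]}'
                lines.append(tag)
            # Descend into struct/class members, if any.
            walk(m.get("members", []), path)

    for mod in modules:
        walk(mod.get("members", []), mod.get("file", ""))
    # A sorted tags file lets editors such as Vim use binary search.
    return sorted(lines)


sample = json.loads("""
[{"name": "foo", "kind": "module", "file": "foo.d",
  "members": [{"name": "main", "kind": "function", "line": 3},
              {"name": "Point", "kind": "struct", "line": 7,
               "members": [{"name": "x", "kind": "variable", "line": 8}]}]}]
""")
print("\n".join(json_to_tags(sample)))
```

Using the line number as the tag address keeps the sketch short; a real tool might prefer search patterns so that tags survive small edits to the source file.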


Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Steven Schveighoffer
On Wed, 05 May 2010 23:45:50 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



> Walter Bright wrote:
>> Alex Makhotin wrote:
>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
>>> kernel 2.6.32.4), to get the actual error messages about the undefined
>>> identifier.
>>
>> Definitely there's a problem.
>
> The problem is the spell checker is O(n*n) on the number of characters
> in the undefined identifier.


That can't be it.  The identifier shown by Alex is only 33 characters.   
O(n^2) is not that slow, especially for smaller variables.  There must be  
other factors you're not considering...


-Steve


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Bernard Helyer
On 06/05/10 22:46, MIURA Masahiro wrote:
> Hi,
>
> Being happy to see issue 3415 (broken JSON format) fixed,
> I have written a utility to convert DMD2's JSON output
> to Exuberant Ctags format.  This enables you to tagjump in Vim
> and other editors/IDEs.  It's just 150+ lines, thanks to D2's
> powerful string handling.  Enjoy!
>
> http://github.com/Dubhead/d2tags
>
> usage:
> % dmd -Xftags.json foo.d
> % d2tags tags.json > tags

Awesome!


Re: dmd 1.060 and 2.045 release

2010-05-06 Thread BCS

Hello Walter,


> Walter Bright wrote:
>> Alex Makhotin wrote:
>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
>>> kernel 2.6.32.4), to get the actual error messages about the
>>> undefined identifier.
>>
>> Definitely there's a problem.
>
> The problem is the spell checker is O(n*n) on the number of characters
> in the undefined identifier.



How about switching algorithms for long identifiers? You could bucket the
known identifiers by length and compare character histograms on those of
similar length. Or maybe just turn it off for long names.
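
One way to read the bucketing idea, sketched in Python rather than in dmd's own code: group the known identifiers by length, look only at buckets within the allowed distance of the unknown name, and discard candidates whose character counts already differ too much. The function names and the max_dist=2 default are assumptions made for illustration. The histogram test is only a lower bound on the edit distance, so survivors would still need a real distance check.

```python
from collections import Counter, defaultdict


def histogram_lower_bound(a, b):
    """Lower bound on the edit distance between a and b.

    Each insert, delete or substitute fixes at most one surplus and one
    missing character, so the larger of the two totals is a valid bound.
    """
    ca, cb = Counter(a), Counter(b)
    surplus = sum((ca - cb).values())  # characters a has that b lacks
    deficit = sum((cb - ca).values())  # characters b has that a lacks
    return max(surplus, deficit)


def bucket_by_length(names):
    """Map each length to the list of known names of that length."""
    buckets = defaultdict(list)
    for n in names:
        buckets[len(n)].append(n)
    return buckets


def candidates(unknown, buckets, max_dist=2):
    """Known names that could still be within max_dist edits of unknown."""
    out = []
    for length in range(len(unknown) - max_dist, len(unknown) + max_dist + 1):
        for name in buckets.get(length, []):
            if histogram_lower_bound(unknown, name) <= max_dist:
                out.append(name)
    return out


known = bucket_by_length(["writeln", "writefln", "readln", "toString"])
print(candidates("wirteln", known))
```

With the sample names above, only "writeln" and "writefln" survive the filter for the misspelling "wirteln".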


--
... IXOYE





Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Michel Fortin

On 2010-05-05 23:45:50 -0400, Walter Bright newshou...@digitalmars.com said:


> Walter Bright wrote:
>> Alex Makhotin wrote:
>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
>>> kernel 2.6.32.4), to get the actual error messages about the undefined
>>> identifier.
>>
>> Definitely there's a problem.
>
> The problem is the spell checker is O(n*n) on the number of characters
> in the undefined identifier.


That's an algorithm that can't scale then.

Checking the Levenshtein distance for each known identifier within a
small difference in length would be a better idea. (Clang is said to
use the Levenshtein distance; it probably does something of the sort.)


http://en.wikipedia.org/wiki/Levenshtein_distance
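
A rough Python sketch of that suggestion, for illustration only: compute the plain Levenshtein distance, but skip any known identifier whose length differs from the unknown one by more than the allowed distance. This is not how dmd or Clang implement it; the names levenshtein and suggest and the max_dist=2 default are assumptions.

```python
def levenshtein(a, b):
    """Classic two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def suggest(unknown, known, max_dist=2):
    """Closest known name within max_dist edits, or None."""
    best = None
    for name in known:
        # A length gap larger than max_dist cannot be closed in max_dist edits.
        if abs(len(name) - len(unknown)) > max_dist:
            continue
        d = levenshtein(unknown, name)
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, name)
    return best[1] if best else None


print(suggest("writenl", ["writeln", "readln", "toString"]))
```

The cost is then one small matrix per candidate of similar length, rather than work that grows with the number of spelling variations of the unknown name.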

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Leandro Lucarella
Steven Schveighoffer, on May 6 at 07:17, you wrote to me:
> On Wed, 05 May 2010 23:45:50 -0400, Walter Bright
> newshou...@digitalmars.com wrote:
>
>> Walter Bright wrote:
>>> Alex Makhotin wrote:
>>>> It takes ~40 seconds 50% load on the dual core
>>>> processor(CentOS 5.3 kernel 2.6.32.4), to get the actual error
>>>> messages about the undefined identifier.
>>>
>>> Definitely there's a problem.
>>
>> The problem is the spell checker is O(n*n) on the number of
>> characters in the undefined identifier.
>
> That can't be it.  The identifier shown by Alex is only 33
> characters.  O(n^2) is not that slow, especially for smaller
> variables.  There must be other factors you're not considering...

Run a profiler.

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
The sound of the sea would not exist if life lacked ear and seashell.
-- Ricardo Vaporeso. Cosquín, 1908.


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Leandro Lucarella
MIURA Masahiro, on May 6 at 19:46, you wrote to me:
> Hi,
>
> Being happy to see issue 3415 (broken JSON format) fixed,
> I have written a utility to convert DMD2's JSON output
> to Exuberant Ctags format.  This enables you to tagjump in Vim
> and other editors/IDEs.  It's just 150+ lines, thanks to D2's
> powerful string handling.  Enjoy!
>
> http://github.com/Dubhead/d2tags
>
> usage:
> % dmd -Xftags.json foo.d
> % d2tags tags.json > tags
  % vim -t tags foo.d

Great!

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
How awful, this country keeps going further and further backwards...
-- Sidharta Kiwi


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Robert Clipsham

On 06/05/10 11:46, MIURA Masahiro wrote:

> Hi,
>
> Being happy to see issue 3415 (broken JSON format) fixed,
> I have written a utility to convert DMD2's JSON output
> to Exuberant Ctags format.  This enables you to tagjump in Vim
> and other editors/IDEs.  It's just 150+ lines, thanks to D2's
> powerful string handling.  Enjoy!
>
> http://github.com/Dubhead/d2tags
>
> usage:
> % dmd -Xftags.json foo.d
> % d2tags tags.json > tags


I love it! I don't suppose you have a guide for how to get it set up and 
working in vim do you? I've never managed to get ctags working, even 
with C/C++ :/


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Andrei Alexandrescu
MIURA Masahiro wrote:
> Hi,
>
> Being happy to see issue 3415 (broken JSON format) fixed,
> I have written a utility to convert DMD2's JSON output
> to Exuberant Ctags format.  This enables you to tagjump in Vim
> and other editors/IDEs.  It's just 150+ lines, thanks to D2's
> powerful string handling.  Enjoy!
>
> http://github.com/Dubhead/d2tags
>
> usage:
> % dmd -Xftags.json foo.d
> % d2tags tags.json > tags

Very useful, and a beautiful example of D scripting.

I wonder if this is of enough general utility to warrant inclusion
within the D distribution, along with rdmd. Thoughts?

One small suggestion, Masahiro: you may want to replace the file reading
loop in main() with simply std.file.readText(args[1]).


Andrei


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Pelle

On 05/06/2010 06:48 PM, Andrei Alexandrescu wrote:

> I wonder if this is of enough general utility to warrant inclusion
> within the D distribution, along with rdmd. Thoughts?



Yes please, rdmd --tags would be great.


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Andrei Alexandrescu

Pelle wrote:

> On 05/06/2010 06:48 PM, Andrei Alexandrescu wrote:
>
>> I wonder if this is of enough general utility to warrant inclusion
>> within the D distribution, along with rdmd. Thoughts?
>
> Yes please, rdmd --tags would be great.


I was thinking of including the utility as a separate program.

Andrei


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Leandro Lucarella
Andrei Alexandrescu, on May 6 at 09:48, you wrote to me:
> MIURA Masahiro wrote:
>> Hi,
>>
>> Being happy to see issue 3415 (broken JSON format) fixed,
>> I have written a utility to convert DMD2's JSON output
>> to Exuberant Ctags format.  This enables you to tagjump in Vim
>> and other editors/IDEs.  It's just 150+ lines, thanks to D2's
>> powerful string handling.  Enjoy!
>>
>> http://github.com/Dubhead/d2tags
>>
>> usage:
>> % dmd -Xftags.json foo.d
>> % d2tags tags.json > tags
>
> Very useful, and a beautiful example of D scripting.
>
> I wonder if this is of enough general utility to warrant inclusion
> within the D distribution, along with rdmd. Thoughts?

I think it might be better to add support to the common tools, like
exuberant-ctags[1], but having it as part of rdmd or whatever could be
nice too.

[1] http://ctags.sourceforge.net/

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--
To which Peperino answered them: let him who has chilblains soak them,
he who suffers from baldness does not suffer from a teddy bear, it is not
good to eat suckling pig on a day of gastritis, do not mix wine with
watermelon, take out the trash after eight, in case of emergency break
the glass with the hammer, detour via Pavón in one hundred meters.
-- Peperino Pómoro


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Masahiro Nakagawa

On Fri, 07 May 2010 01:48:59 +0900, Andrei Alexandrescu
seewebsiteforem...@erdani.org wrote:


> MIURA Masahiro wrote:
>
>> Hi,
>>
>> Being happy to see issue 3415 (broken JSON format) fixed,
>> I have written a utility to convert DMD2's JSON output
>> to Exuberant Ctags format.  This enables you to tagjump in Vim
>> and other editors/IDEs.  It's just 150+ lines, thanks to D2's
>> powerful string handling.  Enjoy!
>>
>> http://github.com/Dubhead/d2tags
>>
>> usage:
>> % dmd -Xftags.json foo.d
>> % d2tags tags.json > tags
>
> Very useful, and a beautiful example of D scripting.
>
> I wonder if this is of enough general utility to warrant inclusion
> within the D distribution, along with rdmd. Thoughts?


vote++


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Lutger
Andrei Alexandrescu wrote:

> MIURA Masahiro wrote:
>> Hi,
>>
>> Being happy to see issue 3415 (broken JSON format) fixed,
>> I have written a utility to convert DMD2's JSON output
>> to Exuberant Ctags format.  This enables you to tagjump in Vim
>> and other editors/IDEs.  It's just 150+ lines, thanks to D2's
>> powerful string handling.  Enjoy!
>>
>> http://github.com/Dubhead/d2tags
>>
>> usage:
>> % dmd -Xftags.json foo.d
>> % d2tags tags.json > tags
>
> Very useful, and a beautiful example of D scripting.
>
> I wonder if this is of enough general utility to warrant inclusion
> within the D distribution, along with rdmd. Thoughts?

Yes it's very useful. How about also including the source in the examples 
directory?


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Bernard Helyer

On 07/05/10 06:30, Lutger wrote:


> Yes it's very useful. How about also including the source in the examples
> directory?


That's a good idea, seeing as most of the examples are either for 
Windows, or outdated.


Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Steven Schveighoffer
On Thu, 06 May 2010 17:07:12 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



> Steven Schveighoffer wrote:
>> That can't be it.  The identifier shown by Alex is only 33 characters.
>> O(n^2) is not that slow, especially for smaller variables.  There must
>> be other factors you're not considering...
>
> I recompiled dmd with the profiler (-gt switch) which confirmed it.


So a single unknown symbol (from Alex's example), which can be checked
against each existing symbol in O(n^2) time, takes 40 seconds on a decent
CPU?  How many other symbols are there?  33^2 == 1089; if there are 10,000
symbols, that's about 10 million iterations.  That shouldn't take 40
seconds to run, should it?  Are there more symbols to compare against?
Do you use heuristics to prune the search?  For example, if the max
distance is 2, and the difference in length between two strings is
greater than 2, you should be able to return immediately.
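
The back-of-the-envelope estimate above can be written out as a small Python sketch. The symbol count of 10,000 is an assumption (it is the count that makes the "10 million" figure work out), and dp_cells is a made-up helper that counts one n*m edit-distance matrix per comparison, optionally skipping candidates whose length gap already exceeds the allowed distance.

```python
def dp_cells(unknown_len, known_lens, max_dist=None):
    """Count edit-distance matrix cells filled when comparing one unknown
    identifier against every known one (one n*m matrix per comparison)."""
    total = 0
    for n in known_lens:
        # Length gap alone already exceeds the budget: return immediately.
        if max_dist is not None and abs(n - unknown_len) > max_dist:
            continue
        total += unknown_len * n
    return total


# 33-character identifier against an assumed 10,000 symbols of equal length.
print(dp_cells(33, [33] * 10_000))                 # 10890000
# With pruning at max distance 2, only lengths 31..35 are compared.
print(dp_cells(33, [33, 10, 35, 36], max_dist=2))  # 2244
```

Roughly eleven million cell updates is far too little work to account for 40 seconds, which is why the estimate points at some other cost.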


-Steve


Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Walter Bright

Steven Schveighoffer wrote:
> On Thu, 06 May 2010 17:07:12 -0400, Walter Bright
> newshou...@digitalmars.com wrote:
>
>> Steven Schveighoffer wrote:
>>> That can't be it.  The identifier shown by Alex is only 33
>>> characters.  O(n^2) is not that slow, especially for smaller
>>> variables.  There must be other factors you're not considering...
>>
>> I recompiled dmd with the profiler (-gt switch) which confirmed it.
>
> So a single unknown symbol (from Alex's example), which can be checked
> against each existing symbol in O(n^2) time, takes 40 seconds on a
> decent CPU?  How many other symbols are there?  33^2 == 1089; if there
> are 10,000 symbols, that's about 10 million iterations.  That shouldn't
> take 40 seconds to run, should it?  Are there more symbols to compare
> against?  Do you use heuristics to prune the search?  For example, if
> the max distance is 2, and the difference in length between two strings
> is greater than 2, you should be able to return immediately.


Check out the profiler output. It's clearly the vast number of calls to the 
symbol lookup, not the time spent in each call.


------------------------------------------------------------
     Num        Tree       Func   Per
   Calls        Time       Time  Call

50409318   632285778  145858160     2  Dsymbol *syscall ScopeDsymbol::search(Loc ,Identifier *,int )
50411264   131394915  106356855     2  void **syscall StringTable::search(char const *,unsigned )
50409329   341960075  105532978     2  Dsymbol *syscall DsymbolTable::lookup(Identifier *)
50409329   236427096  105037393     2  StringValue *syscall StringTable::lookup(char const *,unsigned )
12602340   613890619   67393753     5  Dsymbol *syscall Scope::search(Loc ,Identifier *,Dsymbol **)
12602178   693915197   66918360     5  void *cdecl scope_search_fp(void *,char const *)
25204505   461352920   52529164     2  Dsymbol *syscall Module::search(Loc ,Identifier *,int )
50412137    25038474   25038474     0  unsigned cdecl Dchar::calcHash(char const *,unsigned )
    3520  1428323068   20349375  5781  void *cdecl spellerX(char const *,void *cdecl (*)(void *,char const *),void *,char const *,int )
12602664     6811916    6811916     0  syscall Identifier::Identifier(char const *,int )
12602178     6299089    6299089     0  void cdecl Module::clearCache()
12602183     6151175    6151175     0  Module *syscall Module::isModule()
    1600       11329       4261     2  StringValue *syscall StringTable::update(char const *,unsigned )
------------------------------------------------------------
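
To make the "vast number of calls" point concrete, the call counts from the Num Calls column can be divided against each other. The short Python sketch below only does that arithmetic; the three counts are copied from the profile above, and the variable names are made up for illustration.

```python
# Call counts copied from the "Num Calls" column of the profile.
calls = {
    "spellerX": 3520,
    "scope_search_fp": 12602178,
    "ScopeDsymbol::search": 50409318,
}

# How many symbol lookups each spellerX call triggers on average.
lookups_per_speller_call = calls["scope_search_fp"] / calls["spellerX"]

# How many scope searches each of those lookups performs on average.
scope_searches_per_lookup = (
    calls["ScopeDsymbol::search"] / calls["scope_search_fp"]
)

print(round(lookups_per_speller_call))       # 3580
print(round(scope_searches_per_lookup, 2))   # 4.0
```

So each individual lookup is cheap (a per-call time of 2 to 5 in the table), but a few thousand speller calls fan out into more than twelve million lookups and about fifty million scope searches.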


Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Lionello Lunesu
On 6-5-2010 22:37, Michel Fortin wrote:
> On 2010-05-05 23:45:50 -0400, Walter Bright newshou...@digitalmars.com
> said:
>
>> Walter Bright wrote:
>>> Alex Makhotin wrote:
>>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
>>>> kernel 2.6.32.4), to get the actual error messages about the
>>>> undefined identifier.
>>>
>>> Definitely there's a problem.
>>
>> The problem is the spell checker is O(n*n) on the number of characters
>> in the undefined identifier.
>
> That's an algorithm that can't scale then.
>
> Checking the Levenshtein distance for each known identifier within a
> small difference in length would be a better idea. (Clang is said to use
> the Levenshtein distance, it probably does something of the sort.)
>
> http://en.wikipedia.org/wiki/Levenshtein_distance

 
and especially this line:

# If we are only interested in the distance if it is smaller than a
threshold k, then it suffices to compute a diagonal stripe of width 2k+1
in the matrix. In this way, the algorithm can be run in O(kl) time,
where l is the length of the shortest string.
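
The quoted stripe trick can be sketched in Python as follows. This is an illustration under stated assumptions, not dmd's code: the function name bounded_levenshtein is made up, and it returns None whenever the distance exceeds the threshold k. Only cells with |i - j| <= k are touched, so each row costs O(k) work.

```python
def bounded_levenshtein(a, b, k):
    """Return the Levenshtein distance of a and b if it is <= k, else None.

    Only the diagonal stripe |i - j| <= k of the matrix is filled in.
    """
    if abs(len(a) - len(b)) > k:
        return None                       # length gap alone exceeds k
    if len(a) > len(b):
        a, b = b, a                       # iterate over the shorter string
    m, n = len(a), len(b)
    inf = k + 1                           # stands for "more than k"
    prev = [min(j, inf) for j in range(n + 1)]   # row 0
    cur = [inf] * (n + 1)
    for i in range(1, m + 1):
        lo, hi = max(1, i - k), min(n, i + k)
        # Cell just left of the stripe: real border value or "too far".
        cur[lo - 1] = min(i, inf) if lo == 1 else inf
        for j in range(lo, hi + 1):
            cost = a[i - 1] != b[j - 1]
            cur[j] = min(prev[j] + 1,         # delete
                         cur[j - 1] + 1,      # insert
                         prev[j - 1] + cost,  # substitute or match
                         inf)
        prev, cur = cur, prev
    return prev[n] if prev[n] <= k else None


print(bounded_levenshtein("kitten", "sitting", 3))  # 3
print(bounded_levenshtein("kitten", "sitting", 2))  # None
```

Cells to the right of the stripe are never written before they enter it, so they keep their initial "more than k" value; that is what lets the two row buffers be reused without clearing them.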


Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Brad Roberts
On Fri, 7 May 2010, Lionello Lunesu wrote:

> On 6-5-2010 22:37, Michel Fortin wrote:
>> On 2010-05-05 23:45:50 -0400, Walter Bright newshou...@digitalmars.com
>> said:
>>
>>> Walter Bright wrote:
>>>> Alex Makhotin wrote:
>>>>> It takes ~40 seconds 50% load on the dual core processor(CentOS 5.3
>>>>> kernel 2.6.32.4), to get the actual error messages about the
>>>>> undefined identifier.
>>>>
>>>> Definitely there's a problem.
>>>
>>> The problem is the spell checker is O(n*n) on the number of characters
>>> in the undefined identifier.
>>
>> That's an algorithm that can't scale then.
>>
>> Checking the Levenshtein distance for each known identifier within a
>> small difference in length would be a better idea. (Clang is said to use
>> the Levenshtein distance, it probably does something of the sort.)
>>
>> http://en.wikipedia.org/wiki/Levenshtein_distance
>
> and especially this line:
>
> # If we are only interested in the distance if it is smaller than a
> threshold k, then it suffices to compute a diagonal stripe of width 2k+1
> in the matrix. In this way, the algorithm can be run in O(kl) time,
> where l is the length of the shortest string.

The source for this is pretty isolated. Does anyone want to volunteer to
take a shot at improving this part of dmd?

Later,
Brad


Re: dmd 1.060 and 2.045 release

2010-05-06 Thread Walter Bright

Walter Bright wrote:

> I recompiled dmd with the profiler (-gt switch) which confirmed it.


For those interested, try out changeset 470.


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread MIURA Masahiro
On 05/07/2010 01:48 AM, Andrei Alexandrescu wrote:
> I wonder if this is of enough general utility to warrant inclusion
> within the D distribution, along with rdmd. Thoughts?

That's my pleasure, actually!

> One small suggestion, Masahiro: you may want to replace the file reading
> loop in main() with simply std.file.readText(args[1]).

Done, thank you for the advice.


I'm considering an enhancement:  d2tags DIRECTORY reads
all JSON files in the directory.  However I'm not sure if it
should recurse into subdirectories.


Re: d2tags - converts DMD2's JSON output to Exuberant Ctags format

2010-05-06 Thread Ali Çehreli

MIURA Masahiro wrote:


> I'm considering an enhancement:  d2tags DIRECTORY reads
> all JSON files in the directory.  However I'm not sure if it
> should recurse into subdirectories.


I think simpler is better. There are already tools like find on all 
Linux shells that could do the recursion.


Ali