Re: [Geany-devel] tagmanager changes - tree refactoring

2012-06-07 Thread Colomban Wendling
Le 07/06/2012 17:59, Nick Treleaven a écrit :
 On 06/06/2012 15:42, Nick Treleaven wrote:
 OK, I tried building it and found some fixes are necessary for the
 Windows makefiles - I'll commit these to a branch. After those changes,
 it seems to build fine.

 Now pushed to:
 https://github.com/ntrel/geany/commits/tm/tree-refactoring

   Thanks for doing this, the structure is much better than before.
   However, the build changes are quite substantial, so I think it
 might be
   best not to include them in the 1.22 release in case it introduces any
   problems.
  
   Also, I found the problem with suppressed gcc warnings in tagmanager
   code - it was the hardcoded -w switch (which I read as -W)!

Oh great, tricky to find out!

   Unfortunately this commit will conflict with the tree refactoring. I
   will try to resolve the conflicts so we can merge the branch, perhaps
   after the 1.22 release?

 I also have a ctags/c.c fix I'd like to include in the release, so
 please let me know whether to delay it until merging this branch,
 although as I said, it might be better to leave this out of the release.
 I'm still happy to try to fix the conflicts myself, even if we do delay
 the merge.
 
 I've now merged in master to my branch, same URL - surprisingly git
 didn't report any conflicts, I think because it recognizes moved files,
 even when there is some changed content, e.g. tagmanager/makefile.win32
 - tagmanager/ctags/makefile.win32. So all I had to do was make my CFLAG
 changes to the new file tagmanager/src/makefile.win32.
 
 I don't mind if you want to merge my branch before the release, I just
 wanted to raise the issue. Or shall I merge it now?

No you're right, it's quite large and maybe have hidden problems
(although I doubt so but only testing will show off), after the release
is better. And it's not something users mind about, so they won't notice
either way anyway.

So just push your fix to current master and we will merge the branch
after the release.  As you said, Git seems to be clever enough to deal
with the renames anyway so it shouldn't cause too much trouble merging
later anyway.

And thanks for the tests and fixes :)

Regards,
Colomban
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-09 Thread Thomas Martitz

Am 09.05.2012 07:47, schrieb Lex Trotman:

Using ctags, including locals in the tags generated from Geany source,
slightly more than doubled the number of tags, and for some C++ I have
around nearly four times the number.


But you only need the tags for the current scope and can drop them if 
you enter another (non-nested) scope. This surely doesn't double or 
quadruple the tags.


If I'm editing func A I don't want the locals of func B through Z in my 
autocompletion list.


Best regards.
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-09 Thread Lex Trotman
On 9 May 2012 16:54, Thomas Martitz
thomas.mart...@student.htw-berlin.de wrote:
 Am 09.05.2012 07:47, schrieb Lex Trotman:

 Using ctags, including locals in the tags generated from Geany source,
 slightly more than doubled the number of tags, and for some C++ I have
 around nearly four times the number.


 But you only need the tags for the current scope and can drop them if you
 enter another (non-nested) scope. This surely doesn't double or quadruple
 the tags.

You can't drop them from the tags structures because when you are
parsing you don't know which scope the cursor is in.  So you have to
add them all, then decide which ones apply to the current scope.


 If I'm editing func A I don't want the locals of func B through Z in my
 autocompletion list.

Yes, correct, but they have to be in the tags first, then tagmanager
has to be taught to choose the in-scope declaration, thats my point
the parsers do not generate the information and tagmanager/symbols.c
doesn't know how to use it

Cheers
Lex



 Best regards.

 ___
 Geany-devel mailing list
 Geany-devel@uvena.de
 https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-09 Thread Thomas Martitz

Am 09.05.2012 09:37, schrieb Lex Trotman:

On 9 May 2012 16:54, Thomas Martitz
thomas.mart...@student.htw-berlin.de  wrote:

Am 09.05.2012 07:47, schrieb Lex Trotman:


Using ctags, including locals in the tags generated from Geany source,
slightly more than doubled the number of tags, and for some C++ I have
around nearly four times the number.


But you only need the tags for the current scope and can drop them if you
enter another (non-nested) scope. This surely doesn't double or quadruple
the tags.

You can't drop them from the tags structures because when you are
parsing you don't know which scope the cursor is in.  So you have to
add them all, then decide which ones apply to the current scope.


Okay, but still only for the current file and not an entire project.

Best regards.
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-09 Thread Lex Trotman
On 9 May 2012 17:40, Thomas Martitz
thomas.mart...@student.htw-berlin.de wrote:
 Am 09.05.2012 09:37, schrieb Lex Trotman:

 On 9 May 2012 16:54, Thomas Martitz
 thomas.mart...@student.htw-berlin.de  wrote:

 Am 09.05.2012 07:47, schrieb Lex Trotman:

 Using ctags, including locals in the tags generated from Geany source,
 slightly more than doubled the number of tags, and for some C++ I have
 around nearly four times the number.


 But you only need the tags for the current scope and can drop them if you
 enter another (non-nested) scope. This surely doesn't double or quadruple
 the tags.

 You can't drop them from the tags structures because when you are
 parsing you don't know which scope the cursor is in.  So you have to
 add them all, then decide which ones apply to the current scope.


 Okay, but still only for the current file and not an entire project.

Yes, Geany would only have the open files parsed, I only parsed the
entire project to see how much the number of symbols increased if you
go parse locals, by using all of Geany I got an average increase, note
I only said two times not an absolute number :)

Cheers
Lex



 Best regards.
 ___
 Geany-devel mailing list
 Geany-devel@uvena.de
 https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Matthew Brush

On 12-05-07 08:32 PM, Lex Trotman wrote:

On 8 May 2012 13:27, Matthew Brushmbr...@codebrainz.ca  wrote:

On 12-05-07 05:03 PM, Lex Trotman wrote:


On 8 May 2012 02:04, Nick Treleavennick.trelea...@btinternet.comwrote:


On 02/05/2012 05:46, Lex Trotman wrote:




4. Ctags parsers

Agree with Nick that the parsers are usable, but if we start modifying
them to handle local declarations then they will be totally
incompatible with the Ctags project so I guess it doesn't matter other
than for getting languages we don't currently parse.




ctags c.c already parses local tags



No it doesn't AFAICT:



I'm guessing he's referring to the upstream CTags c.c, which does have a
l kind for local variables (off by default). See `ctags --list-kinds=C`.
I'm not sure if the Geany fork has this, was forked before it was added, or
if the guy that wrote TM took it out.



Ok, I havn't looked at Ctags c.c because IIUC from other comments it
isn't really mergable with our c.c.

Does upstream c.c use tagmanager, and if so how does it structure
scopes?  (A good exercise for your compiler :)



1) no
1a) I never could understand CTags scopes, something like a 2-byte 
value, maybe an index into some array?


Cheers,
Matthew Brush


___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Colomban Wendling
Le 07/05/2012 18:04, Nick Treleaven a écrit :
 On 02/05/2012 05:46, Lex Trotman wrote:
 Hi All,

 To summarise since the thread has several subthreads.

 1. Tagmanager Understandability

 a. I generated the doxygen documentation for tagmanager, it works if
 you set recursive, but didn't help much:

 - if its not OOP why does it say things like TMWorkspace is derived
 from TMWorkObject and similar?
 
 documentation bug IMO

I don't think so.  TM uses a more or less OOP-like approach.  See for
example TMWorkspace:

typedef struct
{
TMWorkObject work_object; /*! The parent work object */
GPtrArray *global_tags; /*! Global tags loaded at startup */
GPtrArray *work_objects; /*! An array of TMWorkObject pointers */
} TMWorkspace;

The first field (work_object) is the inherited class, here
TMWorkObject.  And you'll see numerous places where the code uses such a
derived structure as a TMWorkObject -- since it is one actually --,
which looks quite like OOP.

Or see tm_workspace.c:44:tm_create_workspace():  it uses
tm_work_object_register() to register itself as a new type of work
object with a few methods (or vfuncs), and the initializes iself with
tm_work_object_init(), etc.

I very well understand Lex's questionings about how it does actually
work, since it brings a second OOP-style programming in C, less well
known than GObject -- though of course less complex also, but still (BTW
maybe porting to GObject could help?)

 - its not clear how it all goes together, the workspace contains
 global tags and work_objects, or is that files and whats the
 
 workspace work objects are document tags. global tags explained in
 geany's manual.
 
 difference between source_file and file_entry?
 
 It doesn't look like tm_file_entry_ is really used.
 

 - similarly whats the difference between symbol and tag?
 
 tm_symbol_ doesn't seem to be used.
 

 2. Ability to expand tagmanager to handle names declared in lexical
 scopes (not to be confused with struct/class scopes).  Here is the
 example again with some numbers so I can refer to them

 { struct a o; struct a p;
  o./* struct a members */
 { struct b o;
   o./* struct b members */
   p./* struct a members */
 }
 o./* struct a members */
 p./* struct a members1  */
 { struct c o;
   o./* struct c members */
   p./* struct a members2  */
 }
 o./* struct a members */
 p./* struct a members */
   }

 a. yep, tries use more memory than an array, the usual speed/space,
 pick one, tradeoff :)

 b. @Nick, when you say sort by scope then name, are you wanting to
 have an entry in the table for each declaration of the name?
 
 no
 

 - If so this makes the array much bigger to search and your search
 speed depends on size, and it doesn't get you anything, you can't
 search by scope since you don't know if the name is declared in the
 scope you are in or an outer scope compare p at1  and2

 - having a single name array which then points to scope info for the
 name is a viable approach (disclosure, thats how I'm doing the symbol
 table for a language I'm developing) but the table being searched is
 usually larger than if you have nested arrays.  Being smaller these
 are faster to search if the search isn't O(1), hence the suggestion of
 trie instead of bsearch.
 
 the gain in simplicity makes a bigger array to search worth it.
 Remember, global tags aren't included in the workspace array of
 tagmanager, so we're not talking a big number of tags, and we have o(log
 n) searching.
 
 
 4. Ctags parsers

 Agree with Nick that the parsers are usable, but if we start modifying
 them to handle local declarations then they will be totally
 incompatible with the Ctags project so I guess it doesn't matter other
 than for getting languages we don't currently parse.
 
 ctags c.c already parses local tags
 

 5. Overloaded symbols

 Since Colombans patch, overloaded symbols are now stable for all
 practical code (I think theoretically it could get confused if the
 overloads are on the same line but thats unlikely enough to ignore for
 human generated code)
 
 If you're talking about master, I think I still experienced wrong
 parenting on reparsing when removing lines.
 
 6. Moving functionality from symbols.c to tagmanager

 a. Since its the 100th anniversary of the Titanic sinking, I think
 that shuffling the deckchairs is an apt analogy, the functionality
 has to be somewhere, its only useful to move it if the destination
 significantly reduces the effort required.
 
 I don't think I suggested moving functionality. I wondered whether TM
 could help make symbols.c less complex. I would need to understand the
 complexity to know whether this is appropriate or not.

Well, what symbols.c tries to do when updating the symbols tree is (as
documented above update_tree_tags() BTW):

1) update tags that still exist and remove obsolescent ones
2) add the remaining tags (new ones) to the tree

The implementation is a (tiny) little (bit) more 

Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Colomban Wendling
Le 02/05/2012 06:46, Lex Trotman a écrit :
 [...]
 
 3. Background/asynchronous whatever you want to call it parsing
 
 a. @Colomban, you say that caches are created on first lookup, doesn't
 that throw work back into the UI thread which could have been done in
 the parsing thread?

Well, yes, the UI thread gets the load (unless there is background/async
lookups too :)).  However in my approach there can be any number of
caches [1], and those caches can't really be guessed, so... but maybe
it's simply a bad approach.  Or the used caches could be hard-coded in
CTM so they never get created implicitly -- quite ugly but works).

[1] cache means:  a array of tags sorted by a given sort function.  Each
cache is then used for a particular type of lookups:
* simple completions (sort by name);
* scope completion (uses more than one cache, sorted by scope and name,
and sorted by type);
* etc. (I think there is a third one we use somewhere, can't remenber
what though)
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Colomban Wendling
Le 08/05/2012 02:03, Lex Trotman a écrit :
 On 8 May 2012 02:04, Nick Treleaven nick.trelea...@btinternet.com wrote:
 [...]

 It doesn't look like tm_file_entry_ is really used.
 
 Along with your comment below and about project on the next post,
 sounds like tm code could be reduced significantly.  Might help :)

Agreed at 100%!  If we could cut down TM to remove the code that's
actually not used (or not useful for us) would certainly help a lot to
towards making it easier to understand.  (BTW I think I remember
something about Jiří having done something like it a long time ago)
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Colomban Wendling
Le 30/04/2012 18:54, Nick Treleaven a écrit :
 On 29/04/2012 15:42, Colomban Wendling wrote:
 * it support asynchronous parsing (though not concurrent parsing);

 What's the difference? Also, what does it buy us?

 What I meant when saying it's asynchronous but not concurrent is that it
 supports launching the parsers in a separate thread, but it cannot
 launch several parsers at once.  This is because CTags parsers aren't
 thread-safe (have a lot of global states and no locks).

 What asynchronous parsing gives us is quite simple: no blocking.  This
 means that a slow parser (like e.g. the HTML one on Windows before you
 changed the regex library) wouldn't lock Geany.  Same for project
 plugins that want to parse thousands of files:  this would still take a
 long time, but Geany would be still usable in the meantime.
 
 Ok. It seems a good idea to support this, but for parsing tags in open
 documents it doesn't seem particularly useful, as this ought to be fast.
 The regex problem was unusual IMO and due to an old version of GNU regex.

Right, though a really big file could also be slow to parse.

 BTW, how does TagManager do fast searches?  I always though it did a
 sorting with new attributes each time they changed, so in such
 situations it's even worse than O(n), isn't it?
 
 For searching, it doesn't do any sorting ever. For adding tags the work
 object (i.e. document tags) have to be sorted, which I think is good,
 but also the workspace tags are currently resorted, which I think may be
 a bad design.

If it is never resorted, it means ALL lookups are done on the same
criteria: the name.  Right?  If so, how could scope completion be fast?
 It requires a lot of different search:

1) name search for finding the type of the element to complete;
2) type search for resolving possible typedefs (recursively);
3) scope search for getting the actual results.

Or do I miss something again?

 - a multi-cache one that, as its name suggests, maintains
 multiple
   caches (sorted tags arrays).  it uses a little more memory and is
   slower on insertion since it maintains several sorted lists,
 but a
   search on an already existing cache is very fast.

 Won't this be slow for adding many tags at once? How is this design
 better than tagmanager?

 Well, adding a tag would require a bsearch() on each cache, yes.
 However, adding tags is mostly done in a separate thread (if async
 parsing it used), so it can be less of a problem.
 
 I haven't studied your design, but I would prefer that any design is
 efficient on all threads, so the user's computer can use remaining
 performance for compiling  whatever else they want.

Yeah of course.  One possible simple improvement would be to merge only
when all tags got parsed, getting something like you did earlier with
the patch on TM global tags loading.

But yes, it would still require insertions in multiple caches; but I
though it was worth it since it provided fast search for all caches (see
above for why I though/thinks multiple cache are useful).

 Also, what about global tags? Those can add a lot of tags all at once.

I didn't tried to deal with them yet, but I naively reproduced something
like in TM, e.g. another array (cache(s)) for them, so they shouldn't
make the whole think much slower.  However I'm not certain that having a
completely separate array is really good for searching, but again, I
just replicated the TM design here.

And again, I must admin I didn't actually implemented this for real yet
anyway, so I can't really tell.

 And a search is simply a bsearch() (O(n log n), right?) given the cache
 already exists.  If the cache doesn't exist, it has to be created so
 yeah on the first search it's slow.
 
 If it can be slow enough for the user to notice this is probably bad.

As said in another mail, if it's a problem the required caches can be
hard-coded so they are created anyway.  I doesn't seem a clean thing to
me, but that's very well doable.

 It's not strictly needed, but it makes some memory management easier,
 and fits well with GTK memory management.  And this last point helps a
 lot to maintain the GtkTreeStore on src/symbols.c, now tags are updated
 and not removed.
 But that's not new, I already added this in TM.
 
 Yes. Is the reason the tree should be updated and not recreated to
 preserve fold states and scrollbar position? In fact I'm a bit confused

Yes, that's the most prominent reason.  It's to:

* keep fold state
* keep selection
* keep scroll position
* avoid overall flickering

basically it tries to avoid any visible side effects of replacing the tree.

 about this, how come old tags are still accessed after reparsing the
 document with new tags?

That's the magic of reference counting :)  The GtkListStore actually
holds a reference to the tag, so they can still be used after TM dropped
them.  But anyway we always update the symbol list to have a reference
to the current TM tag/

 BTW, what don't you understand in 

Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Matthew Brush

On 12-05-08 05:44 AM, Colomban Wendling wrote:

Le 08/05/2012 02:03, Lex Trotman a écrit :

On 8 May 2012 02:04, Nick Treleavennick.trelea...@btinternet.com  wrote:

[...]

It doesn't look like tm_file_entry_ is really used.


Along with your comment below and about project on the next post,
sounds like tm code could be reduced significantly.  Might help :)


Agreed at 100%!  If we could cut down TM to remove the code that's
actually not used (or not useful for us) would certainly help a lot to
towards making it easier to understand.  (BTW I think I remember
something about Jiří having done something like it a long time ago)


+1000

Also, it wouldn't hurt to make the file system structure and coding 
standard/style as other Geany files. For example:


tagmanager/tm_*.[ch] - delete include/ dir, maybe remove tm_ prefix
tagmanager/mio/*
tagmanager/ctags/* - all non-tm files here

And then we could run the files through GNU Indent or similar program to 
match Geany's coding style.


Cheers,
Matthew Brush
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Colomban Wendling
Le 08/05/2012 14:12, Colomban Wendling a écrit :
 Le 30/04/2012 19:07, Nick Treleaven a écrit :
 On 29/04/2012 15:47, Colomban Wendling wrote:
 Le 26/04/2012 18:53, Nick Treleaven a écrit :
 On 26/04/2012 16:02, Nick Treleaven wrote:
 On 24/04/2012 22:31, Colomban Wendling wrote:
 * it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;

 BTW this is a good idea to clearly separate CTags from tagmanager. If
 this change can be applied separately, perhaps it could be merged into
 master.

 It should be quite easy -- though it won't still be a vanilla CTags of
 course, my own isn't either (yet?).

 I just thought it was a separate change from the TM rewrite.
 
 It could very well be I think, basically it only changes the directory
 structure a little.  I'll try to replicate this on TM.

Here we go: https://github.com/b4n/geany/commits/tm/tree-refactoring
Looks reasonable?

The Autotools and Waf build systems should be OK (tested  running), but
I haven't tested the makefile.win32s; so if somebody could check them
it'd be awesome.

Cheers,
Colomban
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Lex Trotman
On 8 May 2012 22:31, Colomban Wendling lists@herbesfolles.org wrote:
 Le 07/05/2012 18:04, Nick Treleaven a écrit :
 On 02/05/2012 05:46, Lex Trotman wrote:
 Hi All,

 To summarise since the thread has several subthreads.

 1. Tagmanager Understandability

 a. I generated the doxygen documentation for tagmanager, it works if
 you set recursive, but didn't help much:

 - if its not OOP why does it say things like TMWorkspace is derived
 from TMWorkObject and similar?

 documentation bug IMO

 I don't think so.  TM uses a more or less OOP-like approach.  See for
 example TMWorkspace:

 typedef struct
 {
    TMWorkObject work_object; /*! The parent work object */
    GPtrArray *global_tags; /*! Global tags loaded at startup */
    GPtrArray *work_objects; /*! An array of TMWorkObject pointers */
 } TMWorkspace;

 The first field (work_object) is the inherited class, here
 TMWorkObject.  And you'll see numerous places where the code uses such a
 derived structure as a TMWorkObject -- since it is one actually --,
 which looks quite like OOP.

 Or see tm_workspace.c:44:tm_create_workspace():  it uses
 tm_work_object_register() to register itself as a new type of work
 object with a few methods (or vfuncs), and the initializes iself with
 tm_work_object_init(), etc.

 I very well understand Lex's questionings about how it does actually
 work, since it brings a second OOP-style programming in C, less well
 known than GObject -- though of course less complex also, but still (BTW
 maybe porting to GObject could help?)

Thanks Colomban, that helps :)

[...]

Cheers
Lex
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-08 Thread Lex Trotman
Hi All,

I did a step back and got some numbers (what! hard evidence in the
discussion? :)

Using ctags, including locals in the tags generated from Geany source,
slightly more than doubled the number of tags, and for some C++ I have
around nearly four times the number.

For Geany bsearching a sorted array, four times the size is only two
more iterations, doesn't matter :)

But as Colomban pointed out, with real-time tag generation on, *most*
lookups will be from the parser, not Geany.  This is needed to see if
the name it found is a type so it can tell the statement is a
declaration. (c.c does not do this at the moment since it only looks
at top level statements and all top level statements are declarations
in {} languages).

That workload *will* be proportional to the file size.  And the number
of insertions will also be proportional to the file size.  This is a
big change from the current situation where parsing occurs rarely.

For a simple array structure then, whilst parsing, either the search
is linear, or the array is re-sorted on each insert, neither is an
attractive prospect for larger files with 4* the number of tags.

Also on examining the tags file ctags produced I found that local
variable entries did not actually get parsed, ie there was no way to
go from a local variable to its type.  All that happened was an
occurrence of the name was recorded.  Also no information about
lexical scope is recorded.  So even updating to ctags c.c wouldn't
give the information on local variables we need to handle the example
I posted earlier.

So although improving tagmanager/Colomban manager is still worthwhile,
significantly more work is needed on the parsers as well.

And in terms of handling local variables properly and allowing
contextual completion ctags looks like a dead end.

Cheers
Lex
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-07 Thread Nick Treleaven

On 02/05/2012 05:46, Lex Trotman wrote:

Hi All,

To summarise since the thread has several subthreads.

1. Tagmanager Understandability

a. I generated the doxygen documentation for tagmanager, it works if
you set recursive, but didn't help much:

- if its not OOP why does it say things like TMWorkspace is derived
from TMWorkObject and similar?


documentation bug IMO


- its not clear how it all goes together, the workspace contains
global tags and work_objects, or is that files and whats the


workspace work objects are document tags. global tags explained in 
geany's manual.



difference between source_file and file_entry?


It doesn't look like tm_file_entry_ is really used.



- similarly whats the difference between symbol and tag?


tm_symbol_ doesn't seem to be used.



2. Ability to expand tagmanager to handle names declared in lexical
scopes (not to be confused with struct/class scopes).  Here is the
example again with some numbers so I can refer to them

{ struct a o; struct a p;
 o./* struct a members */
{ struct b o;
  o./* struct b members */
  p./* struct a members */
}
o./* struct a members */
p./* struct a members1  */
{ struct c o;
  o./* struct c members */
  p./* struct a members2  */
}
o./* struct a members */
p./* struct a members */
  }

a. yep, tries use more memory than an array, the usual speed/space,
pick one, tradeoff :)

b. @Nick, when you say sort by scope then name, are you wanting to
have an entry in the table for each declaration of the name?


no



- If so this makes the array much bigger to search and your search
speed depends on size, and it doesn't get you anything, you can't
search by scope since you don't know if the name is declared in the
scope you are in or an outer scope compare p at1  and2

- having a single name array which then points to scope info for the
name is a viable approach (disclosure, thats how I'm doing the symbol
table for a language I'm developing) but the table being searched is
usually larger than if you have nested arrays.  Being smaller these
are faster to search if the search isn't O(1), hence the suggestion of
trie instead of bsearch.


the gain in simplicity makes a bigger array to search worth it. 
Remember, global tags aren't included in the workspace array of 
tagmanager, so we're not talking a big number of tags, and we have o(log 
n) searching.




4. Ctags parsers

Agree with Nick that the parsers are usable, but if we start modifying
them to handle local declarations then they will be totally
incompatible with the Ctags project so I guess it doesn't matter other
than for getting languages we don't currently parse.


ctags c.c already parses local tags



5. Overloaded symbols

Since Colombans patch, overloaded symbols are now stable for all
practical code (I think theoretically it could get confused if the
overloads are on the same line but thats unlikely enough to ignore for
human generated code)


If you're talking about master, I think I still experienced wrong 
parenting on reparsing when removing lines.



6. Moving functionality from symbols.c to tagmanager

a. Since its the 100th anniversary of the Titanic sinking, I think
that shuffling the deckchairs is an apt analogy, the functionality
has to be somewhere, its only useful to move it if the destination
significantly reduces the effort required.


I don't think I suggested moving functionality. I wondered whether TM 
could help make symbols.c less complex. I would need to understand the 
complexity to know whether this is appropriate or not.


Titanic comment is bizarre. My suggestion could equally apply to 
ctagsmanager.

___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-07 Thread Nick Treleaven

On 07/05/2012 17:04, Nick Treleaven wrote:

On 02/05/2012 05:46, Lex Trotman wrote:

- its not clear how it all goes together, the workspace contains
global tags and work_objects, or is that files and whats the


workspace work objects are document tags. global tags explained in
geany's manual.


difference between source_file and file_entry?


It doesn't look like tm_file_entry_ is really used.



- similarly whats the difference between symbol and tag?


tm_symbol_ doesn't seem to be used.


Also, tm_project_ is not used either.





2. Ability to expand tagmanager to handle names declared in lexical
scopes (not to be confused with struct/class scopes). Here is the
example again with some numbers so I can refer to them



b. @Nick, when you say sort by scope then name, are you wanting to
have an entry in the table for each declaration of the name?


no



- If so this makes the array much bigger to search and your search
speed depends on size, and it doesn't get you anything, you can't
search by scope since you don't know if the name is declared in the
scope you are in or an outer scope compare p at1 and2

- having a single name array which then points to scope info for the
name is a viable approach (disclosure, thats how I'm doing the symbol
table for a language I'm developing) but the table being searched is
usually larger than if you have nested arrays. Being smaller these
are faster to search if the search isn't O(1), hence the suggestion of
trie instead of bsearch.


the gain in simplicity makes a bigger array to search worth it.
Remember, global tags aren't included in the workspace array of
tagmanager, so we're not talking a big number of tags, and we have o(log
n) searching.


Oops, forget the global/workspace division, we still need to search 
global scopes. But I don't know why you think this is too slow.

___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-07 Thread Lex Trotman
On 8 May 2012 02:04, Nick Treleaven nick.trelea...@btinternet.com wrote:
 On 02/05/2012 05:46, Lex Trotman wrote:

 Hi All,

 To summarise since the thread has several subthreads.

 1. Tagmanager Understandability

 a. I generated the doxygen documentation for tagmanager, it works if
 you set recursive, but didn't help much:

 - if its not OOP why does it say things like TMWorkspace is derived
 from TMWorkObject and similar?


 documentation bug IMO

O-K, clearly the documenter obviously had a meaning for the term in
mind since it occurs in several places, what do you think they meant?



 - its not clear how it all goes together, the workspace contains
 global tags and work_objects, or is that files and whats the


 workspace work objects are document tags. global tags explained in geany's
 manual.

And the work objects/document tags are in a different array to global
tags if I read your comment on the next post right?



 difference between source_file and file_entry?


 It doesn't look like tm_file_entry_ is really used.

Along with your comment below and about project on the next post,
sounds like tm code could be reduced significantly.  Might help :)




 - similarly whats the difference between symbol and tag?


 tm_symbol_ doesn't seem to be used.



 2. Ability to expand tagmanager to handle names declared in lexical
 scopes (not to be confused with struct/class scopes).  Here is the
 example again with some numbers so I can refer to them

 { struct a o; struct a p;
     o./* struct a members */
    { struct b o;
      o./* struct b members */
      p./* struct a members */
    }
    o./* struct a members */
    p./* struct a members1  */
    { struct c o;
      o./* struct c members */
      p./* struct a members2  */
    }
    o./* struct a members */
    p./* struct a members */
  }

 a. yep, tries use more memory than an array, the usual speed/space,
 pick one, tradeoff :)

 b. @Nick, when you say sort by scope then name, are you wanting to
 have an entry in the table for each declaration of the name?


 no

So what did you mean?




 - If so this makes the array much bigger to search and your search
 speed depends on size, and it doesn't get you anything, you can't
 search by scope since you don't know if the name is declared in the
 scope you are in or an outer scope compare p at1  and2

 - having a single name array which then points to scope info for the
 name is a viable approach (disclosure, thats how I'm doing the symbol
 table for a language I'm developing) but the table being searched is
 usually larger than if you have nested arrays.  Being smaller these
 are faster to search if the search isn't O(1), hence the suggestion of
 trie instead of bsearch.


 the gain in simplicity makes a bigger array to search worth it. Remember,
 global tags aren't included in the workspace array of tagmanager, so we're
 not talking a big number of tags, and we have o(log n) searching.


Ok, but to date document tags don't include local variables (see
below) so they will add many more names, with lots of replication.



 4. Ctags parsers

 Agree with Nick that the parsers are usable, but if we start modifying
 them to handle local declarations then they will be totally
 incompatible with the Ctags project so I guess it doesn't matter other
 than for getting languages we don't currently parse.


 ctags c.c already parses local tags

No it doesn't AFAICT:

on brace open createTags() calls tagCheck() and nest(),

on brace open checkTag() only makes a tag for a preceding function or
contextual entity (== class, enum, namespace struct or union) it does
not look at the contents of the {},

and nest() only recurses if the { opens a class, enum, namespace,
struct, union otherwise it skips everything in the {},

so no local declarations are parsed unless I've missed something?



 5. Overloaded symbols

 Since Colombans patch, overloaded symbols are now stable for all
 practical code (I think theoretically it could get confused if the
 overloads are on the same line but thats unlikely enough to ignore for
 human generated code)


 If you're talking about master, I think I still experienced wrong parenting
 on reparsing when removing lines.

HEAD of master gives me no stability problems with C++, can't cause
any problems with C either, can you reproduce?



 6. Moving functionality from symbols.c to tagmanager

 a. Since its the 100th anniversary of the Titanic sinking, I think
 that shuffling the deckchairs is an apt analogy, the functionality
 has to be somewhere, its only useful to move it if the destination
 significantly reduces the effort required.


 I don't think I suggested moving functionality.

OK, I read it that you were suggesting that.

I wondered whether TM could
 help make symbols.c less complex. I would need to understand the complexity
 to know whether this is appropriate or not.

 Titanic comment is bizarre. My suggestion could equally apply to
 ctagsmanager.

Oh dear, misdirected attempt at 

Re: [Geany-devel] tagmanager changes

2012-05-07 Thread Matthew Brush

On 12-05-07 05:03 PM, Lex Trotman wrote:

On 8 May 2012 02:04, Nick Treleavennick.trelea...@btinternet.com  wrote:

On 02/05/2012 05:46, Lex Trotman wrote:



4. Ctags parsers

Agree with Nick that the parsers are usable, but if we start modifying
them to handle local declarations then they will be totally
incompatible with the Ctags project so I guess it doesn't matter other
than for getting languages we don't currently parse.



ctags c.c already parses local tags


No it doesn't AFAICT:



I'm guessing he's referring to the upstream CTags c.c, which does have 
a l kind for local variables (off by default). See `ctags 
--list-kinds=C`. I'm not sure if the Geany fork has this, was forked 
before it was added, or if the guy that wrote TM took it out.


Cheers,
Matthew Brush
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-07 Thread Lex Trotman
On 8 May 2012 13:27, Matthew Brush mbr...@codebrainz.ca wrote:
 On 12-05-07 05:03 PM, Lex Trotman wrote:

 On 8 May 2012 02:04, Nick Treleavennick.trelea...@btinternet.com  wrote:

 On 02/05/2012 05:46, Lex Trotman wrote:


 4. Ctags parsers

 Agree with Nick that the parsers are usable, but if we start modifying
 them to handle local declarations then they will be totally
 incompatible with the Ctags project so I guess it doesn't matter other
 than for getting languages we don't currently parse.



 ctags c.c already parses local tags


 No it doesn't AFAICT:


 I'm guessing he's referring to the upstream CTags c.c, which does have a
 l kind for local variables (off by default). See `ctags --list-kinds=C`.
 I'm not sure if the Geany fork has this, was forked before it was added, or
 if the guy that wrote TM took it out.


Ok, I havn't looked at Ctags c.c because IIUC from other comments it
isn't really mergable with our c.c.

Does upstream c.c use tagmanager, and if so how does it structure
scopes?  (A good exercise for your compiler :)

Cheers
Lex


 Cheers,
 Matthew Brush

 ___
 Geany-devel mailing list
 Geany-devel@uvena.de
 https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-05-01 Thread Lex Trotman
Hi All,

To summarise since the thread has several subthreads.

1. Tagmanager Understandability

a. I generated the doxygen documentation for tagmanager, it works if
you set recursive, but didn't help much:

- if its not OOP why does it say things like TMWorkspace is derived
from TMWorkObject and similar?

- its not clear how it all goes together, the workspace contains
global tags and work_objects, or is that files and whats the
difference between source_file and file_entry?

- similarly whats the difference between symbol and tag?

2. Ability to expand tagmanager to handle names declared in lexical
scopes (not to be confused with struct/class scopes).  Here is the
example again with some numbers so I can refer to them

{ struct a o; struct a p;
o./* struct a members */
   { struct b o;
 o./* struct b members */
 p./* struct a members */
   }
   o./* struct a members */
   p./* struct a members 1 */
   { struct c o;
 o./* struct c members */
 p./* struct a members 2 */
   }
   o./* struct a members */
   p./* struct a members */
 }

a. yep, tries use more memory than an array, the usual speed/space,
pick one, tradeoff :)

b. @Nick, when you say sort by scope then name, are you wanting to
have an entry in the table for each declaration of the name?

- If so this makes the array much bigger to search and your search
speed depends on size, and it doesn't get you anything, you can't
search by scope since you don't know if the name is declared in the
scope you are in or an outer scope compare p at 1 and 2

- having a single name array which then points to scope info for the
name is a viable approach (disclosure, thats how I'm doing the symbol
table for a language I'm developing) but the table being searched is
usually larger than if you have nested arrays.  Being smaller these
are faster to search if the search isn't O(1), hence the suggestion of
trie instead of bsearch.

3. Background/asynchronous whatever you want to call it parsing

a. @Colomban, you say that caches are created on first lookup, doesn't
that throw work back into the UI thread which could have been done in
the parsing thread?

4. Ctags parsers

Agree with Nick that the parsers are usable, but if we start modifying
them to handle local declarations then they will be totally
incompatible with the Ctags project so I guess it doesn't matter other
than for getting languages we don't currently parse.

5. Overloaded symbols

Since Colombans patch, overloaded symbols are now stable for all
practical code (I think theoretically it could get confused if the
overloads are on the same line but thats unlikely enough to ignore for
human generated code)

6. Moving functionality from symbols.c to tagmanager

a. Since its the 100th anniversary of the Titanic sinking, I think
that shuffling the deckchairs is an apt analogy, the functionality
has to be somewhere, its only useful to move it if the destination
significantly reduces the effort required.

Cheers
Lex
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-30 Thread Nick Treleaven

On 29/04/2012 14:07, Lex Trotman wrote:

On 29 April 2012 22:07, Nick Treleavennick.trelea...@btinternet.com  wrote:

On 27/04/2012 06:30, Lex Trotman wrote:
I don't see this myself, I see some complicating issues which could be
resolved (and I'm willing to work on them), but generally a sound design for
what it provides and for extra things we may want to add.



Maybe my problem is simply the attempt to simulate object orientation
and inheritance in C makes it look complicated for a spoilt C++
programmer :)


Eh? Tagmanager isn't OOP.


I expect the design is better in some respects (and to be fair I didn't look
for better things), but finding a tag based on its name is something we are
always going to want to be fast. Even for scope completion, you still need
to lookup a tag structure from a name string. So I think we will always want
a sorted array of tags per document that we can bsearch (or something
equally fast).


Yes, we could have one name table and then prune the results to the
scope required, or a name table per scope (which makes the tables
smaller and simple searches faster).


It seems to me that if we supported proper scope completion then we 
could still have one array per document, sorted first by scope, then by 
name.



I don't know, but we still need fast tag lookup based on name. If O(n) scope
lookup is too slow, we will need additional data structures arranged
differently, but whatever we have should have something like O(log n) lookup
for names as this is by far the most common operation.



Agree, name lookup should be as fast as possible, O(log n) what about
O(1) like compilers :)

Well since we can't use hashes due to the need to identify prefixes, a
radix trie seems the best since it can be as fast as a hash ie two
passes through the key, and that gives a sub-trie that begins with the
prefix.  But as you say thats orthogonal to the question of scope
layout.


I only know about a simple trie, but wouldn't that use a lot more memory?

I don't think we want name lookup 'as fast as possible', just no slower 
than we have already.



* tags (and most types) are reference-counted (so they aren't
   necessarily duplicated, e.g. in the multicache backend);




I don't really understand src/symbols.c since the real-time parsing
change,
so don't really understand why this is needed.



Blame C++ and overloaded names I think.



I looked at the thread about that, and from what I could tell, the problem
was for reparsing unsaved files. Wasn't the order OK for files that have
just been saved? (Also I don't follow what that has to do with reference
counting).


Hmm, you're right, looks like Colomban is duplicating tags in nested
caches, so all searches are just one cache, but in fact they are not
duplicated, but reference the same tag using a reference count to tell
when to destroy it.  I am not sure where the reference to real time
parsing comes from?


Sorry, I meant reparsing not real time updates. But is your problem with 
overloaded symbols OK when the file is parsed after saving, i.e. the 
order is only wrong on reparsing?


___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-30 Thread Nick Treleaven

On 29/04/2012 15:42, Colomban Wendling wrote:

* it support asynchronous parsing (though not concurrent parsing);


What's the difference? Also, what does it buy us?


What I meant when saying it's asynchronous but not concurrent is that it
supports launching the parsers in a separate thread, but it cannot
launch several parsers at once.  This is because CTags parsers aren't
thread-safe (have a lot of global states and no locks).

What asynchronous parsing gives us is quite simple: no blocking.  This
means that a slow parser (like e.g. the HTML one on Windows before you
changed the regex library) wouldn't lock Geany.  Same for project
plugins that want to parse thousands of files:  this would still take a
long time, but Geany would be still usable in the meantime.


Ok. It seems a good idea to support this, but for parsing tags in open 
documents it doesn't seem particularly useful, as this ought to be fast. 
The regex problem was unusual IMO and due to an old version of GNU regex.



I don't understand why tagmanager has to be replaced, why not just
replace the parts you want to improve? Rewriting it is likely to lead to
a new set of bugs and be hard to review and merge changes from master.


I think Lex and Matthew probably summarize it quite correctly:  it's not
that TM is bad or has to be replaced; it is that (I'm) unable to
understand it enough to fix anything in it, like scope completion.
Maybe it's only me, but I tried hard :)

And the only reason why maybe TagManager would need replacement/large
changes is asynchronous parsing.  I'm not sure how hard it would be to
get it with TM -- but again, maybe it's only that I don't understand it
enough.


Not sure about this, but I think the only messy part of TM is when it 
updates the workspace array, which I think could be removed (I'll 
explain this in another reply).



* there is 2 backends as of today:
- a simple one that is simple and that doesn't waste (too much)
  memory, but that searches in O(n);


Linear searching seems like a bad idea when we currently have O(log n).
Removing x tags is O(n * x) by the look of it.


I agree it's not interesting regarding performances, and this backend
isn't meant for it.  I needed an implementation for early testing, so I
wrote it simple.  And it showed useful for testing later on too, since
because of it's simplicity it isn't bugged -- just slow :)


Ok.


BTW, how does TagManager do fast searches?  I always though it did a
sorting with new attributes each time they changed, so in such
situations it's even worse than O(n), isn't it?


For searching, it doesn't do any sorting ever. For adding tags the work 
object (i.e. document tags) have to be sorted, which I think is good, 
but also the workspace tags are currently resorted, which I think may be 
a bad design.



- a multi-cache one that, as its name suggests, maintains multiple
  caches (sorted tags arrays).  it uses a little more memory and is
  slower on insertion since it maintains several sorted lists, but a
  search on an already existing cache is very fast.


Won't this be slow for adding many tags at once? How is this design
better than tagmanager?


Well, adding a tag would require a bsearch() on each cache, yes.
However, adding tags is mostly done in a separate thread (if async
parsing it used), so it can be less of a problem.


I haven't studied your design, but I would prefer that any design is 
efficient on all threads, so the user's computer can use remaining 
performance for compiling  whatever else they want.


Also, what about global tags? Those can add a lot of tags all at once.


And a search is simply a bsearch() (O(n log n), right?) given the cache
already exists.  If the cache doesn't exist, it has to be created so
yeah on the first search it's slow.


If it can be slow enough for the user to notice this is probably bad.


I don't see why having two is better. The memory overhead for a pointer
array is not much vs. the size of the tag structures. Fast searching is
important.


Multiple backends isn't really useful probably.  As said above, the
first was mostly a testing one, and I wanted to be able to keep both
during development for better testing.  At one point I also though maybe
there could be some backends optimized for particular situations, like
fast search, fast insertion, low memory consumption, etc., but I don't
see much use for this anymore.


Ok.


* tags (and most types) are reference-counted (so they aren't
necessarily duplicated, e.g. in the multicache backend);


I don't really understand src/symbols.c since the real-time parsing
change, so don't really understand why this is needed.


BTW here I meant reparsing document tags without saving, sorry for any 
confusion.



It's not strictly needed, but it makes some memory management easier,
and fits well with GTK memory management.  And this last point helps a
lot to maintain the GtkTreeStore on src/symbols.c, now tags are updated
and not 

Re: [Geany-devel] tagmanager changes

2012-04-30 Thread Nick Treleaven

On 29/04/2012 15:47, Colomban Wendling wrote:

Le 26/04/2012 18:53, Nick Treleaven a écrit :

On 26/04/2012 16:02, Nick Treleaven wrote:

On 24/04/2012 22:31, Colomban Wendling wrote:

* it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;


BTW this is a good idea to clearly separate CTags from tagmanager. If
this change can be applied separately, perhaps it could be merged into
master.


It should be quite easy -- though it won't still be a vanilla CTags of
course, my own isn't either (yet?).


I just thought it was a separate change from the TM rewrite.

We have a lot of changes from CTags e.g. new languages and improvements 
to existing parsers (and even CTags itself) which IMO we can't throw away.


It is possible to merge some changes from CTags but for e.g. c.c this 
has become pretty difficult. I tried this once but it ended up breaking 
the parser and I didn't work out why.


IMO the biggest issue with CTags is that it isn't really maintained and 
they rarely accept patches even when it fixes an important bug (e.g. a 
fix to regex callback parsing with ignored matches).



For avoiding resorting of workspace tags when only reparsing one source
object, we can remove the source object's old tags  merge the new tags.
This is all O(n) for the workspace array. I haven't looked into
implementing this yet because now it's clear you're working on a
competing change.


Another option is to remove the workspace array altogether and just have
Geany collate tags from each (sorted) source object when needed. Not
sure the implications of this yet but it would simplify tagmanager.


Well, tagmanager needs to know all tags to perform e.g. scope completion
beyond file's boundaries.  And for search too, it would need us to pass
it everything, or to perform a search on each document's tags and then
merge the result.  Doesn't sound sensible at a first glance, but maybe
I'm missing something.


It comes to a trade off between merging/sorting results from multiple 
tag arrays and merging/sorting the whole workspace array even when only 
one document is reparsed.


Actually I suppose the first one can cause lags which the user could 
notice if there are a lot of results, whereas the second one only causes 
problems on reparsing, which is less often than searching.


I'm undecided. The second one can be more serious if the calling code 
doesn't handle e.g. save all specially and the workspace array size is 
quite big. The special handling is to not update the workspace array 
until all documents have been parsed, then to resort it. If it's only 
save all, we can cope, but there may be other ways this can happen.

___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-29 Thread Nick Treleaven

On 27/04/2012 06:30, Lex Trotman wrote:

[...]


I don't understand why tagmanager has to be replaced, why not just replace
the parts you want to improve? Rewriting it is likely to lead to a new set
of bugs and be hard to review and merge changes from master.



One of the problems with tagmanager is its complexity, leading to
considerable wariness on the part of many of us about changing it
since we don't understand what we might break.


I don't see this myself, I see some complicating issues which could be 
resolved (and I'm willing to work on them), but generally a sound design 
for what it provides and for extra things we may want to add.



Actually documenting the overall structure of tagmanager and how it is
supposed to work would be a good thing, whats a workspace? what is it
meant to represent, how are scopes represented? etc.


Isn't it clear from the data structures? Look at TMWorkspace. Scopes and 
other tag metadata is the same as CTags. Obviously if we had at-a-glance 
overall documentation that would be good.


One confusing thing is that a TMTag can be used for an actual tag or for 
a file. Probably that could be cleaned up.



   - a multi-cache one that, as its name suggests, maintains multiple
 caches (sorted tags arrays).  it uses a little more memory and is
 slower on insertion since it maintains several sorted lists, but a
 search on an already existing cache is very fast.



Won't this be slow for adding many tags at once? How is this design better
than tagmanager?


Perhaps Colomban could confirm, but my first thought is that this is
for nested scopes.


I expect the design is better in some respects (and to be fair I didn't 
look for better things), but finding a tag based on its name is 
something we are always going to want to be fast. Even for scope 
completion, you still need to lookup a tag structure from a name string. 
So I think we will always want a sorted array of tags per document that 
we can bsearch (or something equally fast).


Also, I've probably sounded quite harsh on Colomban's design, but I'm 
commenting on what I think is important. I am genuinely interested in 
why his design decisions are better. It's a lot to take in all at once, 
so probably needs some explanation. Sorry if I didn't make that clear.



How does tagmanager handle nested scopes, or how would it need to be
modified to do so, considering the example (in C)

{ struct a o; struct a p;
o./* struct a members */
   { struct b o;
 o./* struct b members */
 p./* struct a members */
   }
   o./* struct a members */
   p./* struct a members */
   { struct c o;
 o./* struct c members */
 p./* struct a members */
   }
   o./* struct a members */
   p./* struct a members */
}

How much needs to be changed in tagmanager so that the right
autocompletes can be provided at each comment?  (assuming c.c is
taught to parse local variables of course)


I don't know, but we still need fast tag lookup based on name. If O(n) 
scope lookup is too slow, we will need additional data structures 
arranged differently, but whatever we have should have something like 
O(log n) lookup for names as this is by far the most common operation.



* this backend abstraction might be really overkill, and maybe we
   could do better without it?;



I don't see why having two is better. The memory overhead for a pointer
array is not much vs. the size of the tag structures. Fast searching is
important.



It is but is it flexible enough to be expanded to nested scopes.


see above.


* tags (and most types) are reference-counted (so they aren't
   necessarily duplicated, e.g. in the multicache backend);



I don't really understand src/symbols.c since the real-time parsing change,
so don't really understand why this is needed.


Blame C++ and overloaded names I think.


I looked at the thread about that, and from what I could tell, the 
problem was for reparsing unsaved files. Wasn't the order OK for files 
that have just been saved? (Also I don't follow what that has to do with 
reference counting).



I don't really see what the problem understanding it is. I thought scope
completion was just tm_workspace_find_scoped and related functions, not some
tagmanager-wide problem.


I think the fact that this isn't clear is the problem :)


If I'm following you correctly, I think you're saying the design needs 
to change, which I accept may be true. What I was saying was that 
understanding the existing code for scope completion is not really that 
complex.


Regards,
Nick
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-29 Thread Matthew Brush

Hi Nick,

I think maybe you just didn't realize how much everyone doesn't 
understand TagManager because we always bitch about it on IRC in 
passing. Actually, you might be the only person who *really* understands 
it :)


I'll just rant a little bit about some problems with TM, as I see them 
(and as bitched about on IRC), and maybe that can spin off some 
discussions on ways we could improve it:


- Not invented here; none of us wrote it and not in Geany's coding 
style, file system layout and naming convention, etc. I personally see 
it as an upstream project like Scintilla, even though the upstream 
project is dead (at least the TM part).
- Seems to be overly complex for what it needs to do (this might not be 
true, but it's how it seems at a glance).
- Contains a *whole other fork* of CTags; for me this is the worst part. 
It's far too difficult to take upstream improvements on files like c.c, 
for example.
- Isn't threaded, blocks the UI for several seconds while parsing many 
tags files before Geany can start, and even worse for the project 
plugins that parse all the project files on opening. This makes Geany 
appear really slow and in some cases *too* slow (ie. several minutes or 
more, if there's enough files to parse).
- Isn't re-entrant or thread-safe, uses lots of global state, I guess 
this is mostly due to CTags but also I think TM itself has some same 
issues. This means it's really hard to get tag parsing out of the main 
thread.
- Upstream project doesn't use or support TM anymore, just us. AFAIK 
they are using a simpler scheme[1] involving forking out to a CTags 
binary and using a (seemingly) more logical database (sqlite) for 
storing and accessing tags.
- Doesn't complete local variables, scope completion doesn't seem to 
work properly either.
- Doesn't support CTags format files for some reason (though I added 
this previously in my fork, so it's certainly do-able).


Of course I don't mean to make it sound like TM is garbage, looking at 
the code shows it's quite well engineered/optimized, and I'm confident 
that it has lots of good qualities, even if I don't understand them.


Anyways, I'll end ranting here and hope it might give some ideas about 
the problems some of us see with TM, and we could work towards fixing, 
if we aren't to replace TM altogether.


Cheers,
Matthew Brush

[1] http://git.gnome.org/browse/anjuta/tree/plugins/symbol-db

On 12-04-29 05:07 AM, Nick Treleaven wrote:

On 27/04/2012 06:30, Lex Trotman wrote:

[...]


I don't understand why tagmanager has to be replaced, why not just
replace
the parts you want to improve? Rewriting it is likely to lead to a
new set
of bugs and be hard to review and merge changes from master.



One of the problems with tagmanager is its complexity, leading to
considerable wariness on the part of many of us about changing it
since we don't understand what we might break.


I don't see this myself, I see some complicating issues which could be
resolved (and I'm willing to work on them), but generally a sound design
for what it provides and for extra things we may want to add.


Actually documenting the overall structure of tagmanager and how it is
supposed to work would be a good thing, whats a workspace? what is it
meant to represent, how are scopes represented? etc.


Isn't it clear from the data structures? Look at TMWorkspace. Scopes and
other tag metadata is the same as CTags. Obviously if we had at-a-glance
overall documentation that would be good.

One confusing thing is that a TMTag can be used for an actual tag or for
a file. Probably that could be cleaned up.


- a multi-cache one that, as its name suggests, maintains multiple
caches (sorted tags arrays). it uses a little more memory and is
slower on insertion since it maintains several sorted lists, but a
search on an already existing cache is very fast.



Won't this be slow for adding many tags at once? How is this design
better
than tagmanager?


Perhaps Colomban could confirm, but my first thought is that this is
for nested scopes.


I expect the design is better in some respects (and to be fair I didn't
look for better things), but finding a tag based on its name is
something we are always going to want to be fast. Even for scope
completion, you still need to lookup a tag structure from a name string.
So I think we will always want a sorted array of tags per document that
we can bsearch (or something equally fast).

Also, I've probably sounded quite harsh on Colomban's design, but I'm
commenting on what I think is important. I am genuinely interested in
why his design decisions are better. It's a lot to take in all at once,
so probably needs some explanation. Sorry if I didn't make that clear.


How does tagmanager handle nested scopes, or how would it need to be
modified to do so, considering the example (in C)

{ struct a o; struct a p;
o./* struct a members */
{ struct b o;
o./* struct b members */
p./* struct a members */
}
o./* struct a members */
p./* struct 

Re: [Geany-devel] tagmanager changes

2012-04-29 Thread Colomban Wendling
Hi,

Le 26/04/2012 17:02, Nick Treleaven a écrit :
 On 24/04/2012 22:31, Colomban Wendling wrote:
 Le 17/04/2012 18:20, Nick Treleaven a écrit :
 Sorry for the long delays -- and also small activity -- recently.  I
 have/had a lot of non-Geany stuff to do and stuff, the whole story, you
 know.
 
 No problem.
 
 I finally committed it and pushed it so you can see it, comment, blame,
 flame  more:  see https://github.com/b4n/geany/tree/wip%2Fctagsmanager

 A few points, as they comes to my mind:

 * it support asynchronous parsing (though not concurrent parsing);
 
 What's the difference? Also, what does it buy us?

What I meant when saying it's asynchronous but not concurrent is that it
supports launching the parsers in a separate thread, but it cannot
launch several parsers at once.  This is because CTags parsers aren't
thread-safe (have a lot of global states and no locks).

What asynchronous parsing gives us is quite simple: no blocking.  This
means that a slow parser (like e.g. the HTML one on Windows before you
changed the regex library) wouldn't lock Geany.  Same for project
plugins that want to parse thousands of files:  this would still take a
long time, but Geany would be still usable in the meantime.

 * it is under the new ctagsmanager/ directory;
 * it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
 * all types have different names than the tagmanager ones, though
currently the API is almost an 1:1 mapping -- and that's maybe a
huge mistake?;
 
 I don't understand why tagmanager has to be replaced, why not just
 replace the parts you want to improve? Rewriting it is likely to lead to
 a new set of bugs and be hard to review and merge changes from master.

I think Lex and Matthew probably summarize it quite correctly:  it's not
that TM is bad or has to be replaced; it is that (I'm) unable to
understand it enough to fix anything in it, like scope completion.
Maybe it's only me, but I tried hard :)

And the only reason why maybe TagManager would need replacement/large
changes is asynchronous parsing.  I'm not sure how hard it would be to
get it with TM -- but again, maybe it's only that I don't understand it
enough.

 * there is 2 backends as of today:
- a simple one that is simple and that doesn't waste (too much)
  memory, but that searches in O(n);
 
 Linear searching seems like a bad idea when we currently have O(log n).
 Removing x tags is O(n * x) by the look of it.

I agree it's not interesting regarding performances, and this backend
isn't meant for it.  I needed an implementation for early testing, so I
wrote it simple.  And it showed useful for testing later on too, since
because of it's simplicity it isn't bugged -- just slow :)

BTW, how does TagManager do fast searches?  I always though it did a
sorting with new attributes each time they changed, so in such
situations it's even worse than O(n), isn't it?

- a multi-cache one that, as its name suggests, maintains multiple
  caches (sorted tags arrays).  it uses a little more memory and is
  slower on insertion since it maintains several sorted lists, but a
  search on an already existing cache is very fast.
 
 Won't this be slow for adding many tags at once? How is this design
 better than tagmanager?

Well, adding a tag would require a bsearch() on each cache, yes.
However, adding tags is mostly done in a separate thread (if async
parsing it used), so it can be less of a problem.

And a search is simply a bsearch() (O(n log n), right?) given the cache
already exists.  If the cache doesn't exist, it has to be created so
yeah on the first search it's slow.

 Given my global tags merge change (see below), I think tagmanager needs
 some changes to avoid sorting the workspace tags array each time tags
 are updated, but this is certainly doable.
 
In practice I haven't yet tested anything big enough to see any
difference in performances between these two backends, and that's
something that probably isn't worth bothering about for now.

 * this backend abstraction might be really overkill, and maybe we
could do better without it?;
 
 I don't see why having two is better. The memory overhead for a pointer
 array is not much vs. the size of the tag structures. Fast searching is
 important.

Multiple backends isn't really useful probably.  As said above, the
first was mostly a testing one, and I wanted to be able to keep both
during development for better testing.  At one point I also though maybe
there could be some backends optimized for particular situations, like
fast search, fast insertion, low memory consumption, etc., but I don't
see much use for this anymore.

 * tags (and most types) are reference-counted (so they aren't
necessarily duplicated, e.g. in the multicache backend);
 
 I don't really understand src/symbols.c since the real-time parsing
 change, so don't really understand why this is needed.

It's not strictly needed, but it makes some memory management 

Re: [Geany-devel] tagmanager changes

2012-04-29 Thread Colomban Wendling
Le 26/04/2012 18:53, Nick Treleaven a écrit :
 On 26/04/2012 16:02, Nick Treleaven wrote:
 On 24/04/2012 22:31, Colomban Wendling wrote:
 * it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
 
 BTW this is a good idea to clearly separate CTags from tagmanager. If
 this change can be applied separately, perhaps it could be merged into
 master.

It should be quite easy -- though it won't still be a vanilla CTags of
course, my own isn't either (yet?).

 For avoiding resorting of workspace tags when only reparsing one source
 object, we can remove the source object's old tags  merge the new tags.
 This is all O(n) for the workspace array. I haven't looked into
 implementing this yet because now it's clear you're working on a
 competing change.
 
 Another option is to remove the workspace array altogether and just have
 Geany collate tags from each (sorted) source object when needed. Not
 sure the implications of this yet but it would simplify tagmanager.

Well, tagmanager needs to know all tags to perform e.g. scope completion
beyond file's boundaries.  And for search too, it would need us to pass
it everything, or to perform a search on each document's tags and then
merge the result.  Doesn't sound sensible at a first glance, but maybe
I'm missing something.
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-29 Thread Colomban Wendling
Le 27/04/2012 07:30, Lex Trotman a écrit :
 [...]
   - a multi-cache one that, as its name suggests, maintains multiple
 caches (sorted tags arrays).  it uses a little more memory and is
 slower on insertion since it maintains several sorted lists, but a
 search on an already existing cache is very fast.


 Won't this be slow for adding many tags at once? How is this design better
 than tagmanager?
 
 Perhaps Colomban could confirm, but my first thought is that this is
 for nested scopes.

Not at all, sorry :)
It's only for performance, so search on different criteria can remain
fast (just a bsearch()).

 How does tagmanager handle nested scopes, or how would it need to be
 modified to do so, considering the example (in C)
 
 { struct a o; struct a p;
o./* struct a members */
   { struct b o;
 o./* struct b members */
 p./* struct a members */
   }
   o./* struct a members */
   p./* struct a members */
   { struct c o;
 o./* struct c members */
 p./* struct a members */
   }
   o./* struct a members */
   p./* struct a members */
 }
 
 How much needs to be changed in tagmanager so that the right
 autocompletes can be provided at each comment?  (assuming c.c is
 taught to parse local variables of course)
 
 And of course the same question for Colomban.

To really support such thing would require the parsers to provide a true
tree, not a flat thing.  However, my scope completion algorithm takes
into account a few things to try to find The Right Thing™:

1) the file in which the tag appears (e.g. tries to resolve things
locally first);
2) the distance in lines between the tags (e.g. the tag that comes
just before the one to complete has precedence over another one).

but this wouldn't be enough to get correct results with your example; it
would really need a tree.


Cheers,
Colomban
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-29 Thread Colomban Wendling
Le 29/04/2012 14:07, Nick Treleaven a écrit :
 On 27/04/2012 06:30, Lex Trotman wrote:
 [...]

 I don't understand why tagmanager has to be replaced, why not just
 replace
 the parts you want to improve? Rewriting it is likely to lead to a
 new set
 of bugs and be hard to review and merge changes from master.
 
 One of the problems with tagmanager is its complexity, leading to
 considerable wariness on the part of many of us about changing it
 since we don't understand what we might break.
 
 I don't see this myself, I see some complicating issues which could be
 resolved (and I'm willing to work on them), but generally a sound design
 for what it provides and for extra things we may want to add.
 
 Actually documenting the overall structure of tagmanager and how it is
 supposed to work would be a good thing, whats a workspace? what is it
 meant to represent, how are scopes represented? etc.
 
 Isn't it clear from the data structures? Look at TMWorkspace. Scopes and
 other tag metadata is the same as CTags. Obviously if we had at-a-glance
 overall documentation that would be good.

What I personally blame TM for is not the data structure or overall
design, but the code.  I can't get my head around the implementation.
Every time I try to get it, I get a headache and generally no fix :(

But maybe I'm just plain stupid, or maybe I just miss one or two key
things of how it's written to get it.

 One confusing thing is that a TMTag can be used for an actual tag or for
 a file. Probably that could be cleaned up.

Agreed, that's something I think that should be quite easy to fix and
that would make the thing easier to understand at first (though it's not
one of the things that makes me not understand TM ;)).  BTW, do we
actually use any file tags?

- a multi-cache one that, as its name suggests, maintains multiple
  caches (sorted tags arrays).  it uses a little more memory and is
  slower on insertion since it maintains several sorted lists, but a
  search on an already existing cache is very fast.


 Won't this be slow for adding many tags at once? How is this design
 better
 than tagmanager?

 Perhaps Colomban could confirm, but my first thought is that this is
 for nested scopes.
 
 I expect the design is better in some respects (and to be fair I didn't
 look for better things), but finding a tag based on its name is
 something we are always going to want to be fast. Even for scope
 completion, you still need to lookup a tag structure from a name string.
 So I think we will always want a sorted array of tags per document that
 we can bsearch (or something equally fast).
 
 Also, I've probably sounded quite harsh on Colomban's design, but I'm
 commenting on what I think is important. I am genuinely interested in
 why his design decisions are better. It's a lot to take in all at once,
 so probably needs some explanation. Sorry if I didn't make that clear.

Don't worry about sounding harsh or something, I totally agree that
changing for changing is not a good idea.  To be honest, there were
really two reasons why I tried to write a new tagmanager on the first place:

1) because I wasn't able to fix the current one;
2) to support asynchronous parsing.

And actually I haven't added much great design, it actually works quite
the same as does TM currently -- per-file tags, a workspace holding a
merge of all tags.

 [...]
 
 * tags (and most types) are reference-counted (so they aren't
necessarily duplicated, e.g. in the multicache backend);


 I don't really understand src/symbols.c since the real-time parsing
 change,
 so don't really understand why this is needed.

 Blame C++ and overloaded names I think.
 
 I looked at the thread about that, and from what I could tell, the
 problem was for reparsing unsaved files. Wasn't the order OK for files
 that have just been saved? (Also I don't follow what that has to do with
 reference counting).

We don't parse unsaved files at all because TM can't do that.  And
that's once another thing that should be fixed.

However I don't get the point you're talking about, maybe my answer is
OT then.


Cheers,
Colomban
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-26 Thread Nick Treleaven

On 24/04/2012 22:31, Colomban Wendling wrote:

Le 17/04/2012 18:20, Nick Treleaven a écrit :
Sorry for the long delays -- and also small activity -- recently.  I
have/had a lot of non-Geany stuff to do and stuff, the whole story, you
know.


No problem.


I finally committed it and pushed it so you can see it, comment, blame,
flame  more:  see https://github.com/b4n/geany/tree/wip%2Fctagsmanager

A few points, as they comes to my mind:

* it support asynchronous parsing (though not concurrent parsing);


What's the difference? Also, what does it buy us?


* it is under the new ctagsmanager/ directory;
* it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
* all types have different names than the tagmanager ones, though
   currently the API is almost an 1:1 mapping -- and that's maybe a
   huge mistake?;


I don't understand why tagmanager has to be replaced, why not just 
replace the parts you want to improve? Rewriting it is likely to lead to 
a new set of bugs and be hard to review and merge changes from master.



* there is 2 backends as of today:
   - a simple one that is simple and that doesn't waste (too much)
 memory, but that searches in O(n);


Linear searching seems like a bad idea when we currently have O(log n). 
Removing x tags is O(n * x) by the look of it.



   - a multi-cache one that, as its name suggests, maintains multiple
 caches (sorted tags arrays).  it uses a little more memory and is
 slower on insertion since it maintains several sorted lists, but a
 search on an already existing cache is very fast.


Won't this be slow for adding many tags at once? How is this design 
better than tagmanager?


Given my global tags merge change (see below), I think tagmanager needs 
some changes to avoid sorting the workspace tags array each time tags 
are updated, but this is certainly doable.



   In practice I haven't yet tested anything big enough to see any
   difference in performances between these two backends, and that's
   something that probably isn't worth bothering about for now.

* this backend abstraction might be really overkill, and maybe we
   could do better without it?;


I don't see why having two is better. The memory overhead for a pointer 
array is not much vs. the size of the tag structures. Fast searching is 
important.



* tags (and most types) are reference-counted (so they aren't
   necessarily duplicated, e.g. in the multicache backend);


I don't really understand src/symbols.c since the real-time parsing 
change, so don't really understand why this is needed.



* tag matching/finding is done using ctm_data_backend_find() (or a
   convenience wrapper), which takes 2 functions for performing the
   search:
   - a sort function, used to, heh, sort the tags to search and/or the
 resulting list (the simple backend should use it to sort the
 result; and the multi-cache backend uses it to sort the caches)


It doesn't sound a good idea to have to sort the haystack on each 
lookup. Lookup should be very fast, like tagmanager. Adding/removing 
tags is always less important because real-time updates can be disabled 
anyway.



   - a match function, use to check whether a tag should be included in
 the results.  Like the sort function it returns a strcmp()-like
 value, with the only difference that it probably returns 0 for more
 tags.

   It is somewhat similar to what tagmanager did, but it's more flexible
   -- and maybe complex, though many sort/match functions would already
   be provided.

* no pruning is done yet, so there is duplicate in the results;
* there is an (almost) working scope completeion implementation;
* ... I don't see anything to add, so I'll stop here :)

All this isn't of course written in stone: if we already redo a lot of
things, let's get something nice -- though IMO we'll always be better
than with tagmanager, since each time I wanted to touch it it took me
hours, and sometimes I even haven't been able to fix the problem
(thinking of e.g. scope completion...).


I don't really see what the problem understanding it is. I thought scope 
completion was just tm_workspace_find_scoped and related functions, not 
some tagmanager-wide problem.



Also, later in the thread he says that performance problems with
resorting global and workspace tags cannot be fixed with the design of
tagmanager. I've been working yesterday on improving this significantly
by merging the new tags each time instead of resorting *all* the tags. I
hope to commit this in the next few days.


Cool!  I haven't done any profiling on either tagmanager or may new
attempt, so I can't tell what's actually slow, but any improvement is
good to have :)


I finished the global tags merging change:
https://github.com/geany/geany/commit/712cdd6aa0005f65bd77b15c34cd29067864628d

IMO this should make loading global tags fast enough, even for heavy 
use, as the whole array is now only traversed once.


For avoiding resorting of workspace 

Re: [Geany-devel] tagmanager changes

2012-04-26 Thread Nick Treleaven

On 26/04/2012 16:02, Nick Treleaven wrote:

On 24/04/2012 22:31, Colomban Wendling wrote:

* it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;


BTW this is a good idea to clearly separate CTags from tagmanager. If 
this change can be applied separately, perhaps it could be merged into 
master.



For avoiding resorting of workspace tags when only reparsing one source
object, we can remove the source object's old tags  merge the new tags.
This is all O(n) for the workspace array. I haven't looked into
implementing this yet because now it's clear you're working on a
competing change.


Another option is to remove the workspace array altogether and just have 
Geany collate tags from each (sorted) source object when needed. Not 
sure the implications of this yet but it would simplify tagmanager.

___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-26 Thread Lex Trotman
[...]

 I don't understand why tagmanager has to be replaced, why not just replace
 the parts you want to improve? Rewriting it is likely to lead to a new set
 of bugs and be hard to review and merge changes from master.


Hi Nick,

It is good to see someone supporting tagmanager rather than it just
being trashed (by the likes of me :)

One of the problems with tagmanager is its complexity, leading to
considerable wariness on the part of many of us about changing it
since we don't understand what we might break.

Actually documenting the overall structure of tagmanager and how it is
supposed to work would be a good thing, whats a workspace? what is it
meant to represent, how are scopes represented? etc.


[...]
   - a multi-cache one that, as its name suggests, maintains multiple
     caches (sorted tags arrays).  it uses a little more memory and is
     slower on insertion since it maintains several sorted lists, but a
     search on an already existing cache is very fast.


 Won't this be slow for adding many tags at once? How is this design better
 than tagmanager?

Perhaps Colomban could confirm, but my first thought is that this is
for nested scopes.

How does tagmanager handle nested scopes, or how would it need to be
modified to do so, considering the example (in C)

{ struct a o; struct a p;
   o./* struct a members */
  { struct b o;
o./* struct b members */
p./* struct a members */
  }
  o./* struct a members */
  p./* struct a members */
  { struct c o;
o./* struct c members */
p./* struct a members */
  }
  o./* struct a members */
  p./* struct a members */
}

How much needs to be changed in tagmanager so that the right
autocompletes can be provided at each comment?  (assuming c.c is
taught to parse local variables of course)

And of course the same question for Colomban.


 Given my global tags merge change (see below), I think tagmanager needs some
 changes to avoid sorting the workspace tags array each time tags are
 updated, but this is certainly doable.


   In practice I haven't yet tested anything big enough to see any
   difference in performances between these two backends, and that's
   something that probably isn't worth bothering about for now.

 * this backend abstraction might be really overkill, and maybe we
   could do better without it?;


 I don't see why having two is better. The memory overhead for a pointer
 array is not much vs. the size of the tag structures. Fast searching is
 important.


It is but is it flexible enough to be expanded to nested scopes.


 * tags (and most types) are reference-counted (so they aren't
   necessarily duplicated, e.g. in the multicache backend);


 I don't really understand src/symbols.c since the real-time parsing change,
 so don't really understand why this is needed.

Blame C++ and overloaded names I think.

[...]

 I don't really see what the problem understanding it is. I thought scope
 completion was just tm_workspace_find_scoped and related functions, not some
 tagmanager-wide problem.

I think the fact that this isn't clear is the problem :)



 Also, later in the thread he says that performance problems with
 resorting global and workspace tags cannot be fixed with the design of
 tagmanager. I've been working yesterday on improving this significantly
 by merging the new tags each time instead of resorting *all* the tags. I
 hope to commit this in the next few days.


 Cool!  I haven't done any profiling on either tagmanager or may new
 attempt, so I can't tell what's actually slow, but any improvement is
 good to have :)


 I finished the global tags merging change:
 https://github.com/geany/geany/commit/712cdd6aa0005f65bd77b15c34cd29067864628d

 IMO this should make loading global tags fast enough, even for heavy use, as
 the whole array is now only traversed once.

Excellent, good one Nick.

Cheers
Lex
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-24 Thread Colomban Wendling
Le 17/04/2012 18:20, Nick Treleaven a écrit :
 Hi,
 How's it going?

Hi,

Sorry for the long delays -- and also small activity -- recently.  I
have/had a lot of non-Geany stuff to do and stuff, the whole story, you
know.

 Lex mentioned in this mail:
 http://lists.uvena.de/pipermail/geany/2012-April/007991.html
 
 that (according to him) you are working to 'replace it'. Maybe he's
 exaggerating, but this sounds interesting. If you have time could you
 maybe send me a link to the IRC log, or better, explain a little about
 the work on the devel list?

That's true, I have a WIP on writing a new ctags management code.  It is
far from being ready for production, and I'm not even sure the ideas in
it are really appropriate.  But yes, there is a something :)

I finally committed it and pushed it so you can see it, comment, blame,
flame  more:  see https://github.com/b4n/geany/tree/wip%2Fctagsmanager

A few points, as they comes to my mind:

* it is under the new ctagsmanager/ directory;
* it uses the same tag parsers tagmanager used, in ctagsmanager/ctags;
* it support asynchronous parsing (though not concurrent parsing);
* all types have different names than the tagmanager ones, though
  currently the API is almost an 1:1 mapping -- and that's maybe a
  huge mistake?;
* there is 2 backends as of today:
  - a simple one that is simple and that doesn't waste (too much)
memory, but that searches in O(n);
  - a multi-cache one that, as its name suggests, maintains multiple
caches (sorted tags arrays).  it uses a little more memory and is
slower on insertion since it maintains several sorted lists, but a
search on an already existing cache is very fast.

  In practice I haven't yet tested anything big enough to see any
  difference in performances between these two backends, and that's
  something that probably isn't worth bothering about for now.

* this backend abstraction might be really overkill, and maybe we
  could do better without it?;
* tags (and most types) are reference-counted (so they aren't
  necessarily duplicated, e.g. in the multicache backend);
* tag matching/finding is done using ctm_data_backend_find() (or a
  convenience wrapper), which takes 2 functions for performing the
  search:
  - a sort function, used to, heh, sort the tags to search and/or the
resulting list (the simple backend should use it to sort the
result; and the multi-cache backend uses it to sort the caches)
  - a match function, use to check whether a tag should be included in
the results.  Like the sort function it returns a strcmp()-like
value, with the only difference that it probably returns 0 for more
tags.

  It is somewhat similar to what tagmanager did, but it's more flexible
  -- and maybe complex, though many sort/match functions would already
  be provided.

* no pruning is done yet, so there is duplicate in the results;
* there is an (almost) working scope completeion implementation;
* ... I don't see anything to add, so I'll stop here :)

All this isn't of course written in stone: if we already redo a lot of
things, let's get something nice -- though IMO we'll always be better
than with tagmanager, since each time I wanted to touch it it took me
hours, and sometimes I even haven't been able to fix the problem
(thinking of e.g. scope completion...).

 Also, later in the thread he says that performance problems with
 resorting global and workspace tags cannot be fixed with the design of
 tagmanager. I've been working yesterday on improving this significantly
 by merging the new tags each time instead of resorting *all* the tags. I
 hope to commit this in the next few days.

Cool!  I haven't done any profiling on either tagmanager or may new
attempt, so I can't tell what's actually slow, but any improvement is
good to have :)


Regards,
Colomban

___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel


Re: [Geany-devel] tagmanager changes

2012-04-24 Thread Lex Trotman
[...]
 Also, later in the thread he says that performance problems with
 resorting global and workspace tags cannot be fixed with the design of
 tagmanager. I've been working yesterday on improving this significantly
 by merging the new tags each time instead of resorting *all* the tags. I
 hope to commit this in the next few days.

 Cool!  I haven't done any profiling on either tagmanager or may new
 attempt, so I can't tell what's actually slow, but any improvement is
 good to have :)


Coo, have I accidentally ignited a competition between Nick fixing
tagmanager and Colomban's potential replacement? :)

Didn't mean to, but if we get to pick the best, great :)

Cheers
Lex


 Regards,
 Colomban

 ___
 Geany-devel mailing list
 Geany-devel@uvena.de
 https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel
___
Geany-devel mailing list
Geany-devel@uvena.de
https://lists.uvena.de/cgi-bin/mailman/listinfo/geany-devel