Re: ENB: about external file format 5-thin

2020-06-06 Thread Edward K. Ream
On Sat, Jun 6, 2020 at 8:30 AM vitalije  wrote:

> Anyway, I won't insist on changing the format, but if we are changing
> something it would be better to make all changes at once.
>

I agree.

>
> Regarding the first node start sentinel, perhaps new read code can just
> skip this sentinel and use the values from the xml for gnx and headline.
> When writing a file, Leo can check to see if this sentinel is present in
> the external file and if it is, it will keep this sentinel line unchanged.
> Leo always reads existing file to check whether there is a change or not,
> so this check won't be too expensive. This way single external file can be
> opened using different paths in different outlines without generating
> unnecessary file changes.
>

I don't have an opinion about this. Do what you think best, and we'll all
test it.

Edward

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/CAMF8tS1ZbDUJB7wTgSCJjDO%2BSyOO_EdKjan9Js1-HPGcNJ_hgA%40mail.gmail.com.


Re: ENB: about external file format 5-thin

2020-06-06 Thread Edward K. Ream
On Sat, Jun 6, 2020 at 10:48 AM Thomas Passin  wrote:

>
>
> On Saturday, June 6, 2020 at 10:42:37 AM UTC-4, vitalije wrote:
>>
>>
>> You wonder why the speed of reading and writing matters. Perhaps when you
>> use Leo it doesn't matter to you if it will load 200ms faster or not. But
>> if a developer wants to run a thousand tests, then 20ms less per test
>> actually means 20 seconds less. Waiting 20 seconds more for tests to finish
>> might break the developer's thought flow. Keeping the developer's thought
>> flow leads to better code. So in the end users will benefit even if they
>> don't care about these micro-optimizations.
>>
>
> Well, there's something in what you say.
>

But not enough. Any new confusion or bug will cost Leo's users and devs
hours, days or weeks of work.

Edward



Re: ENB: about external file format 5-thin

2020-06-06 Thread Thomas Passin


On Saturday, June 6, 2020 at 10:42:37 AM UTC-4, vitalije wrote:
>
>
> You wonder why the speed of reading and writing matters. Perhaps when you 
> use Leo it doesn't matter to you if it will load 200ms faster or not. But 
> if a developer wants to run a thousand tests, then 20ms less per test 
> actually means 20 seconds less. Waiting 20 seconds more for tests to finish 
> might break the developer's thought flow. Keeping the developer's thought 
> flow leads to better code. So in the end users will benefit even if they 
> don't care about these micro-optimizations. 
>

Well, there's something in what you say.



Re: ENB: about external file format 5-thin

2020-06-06 Thread vitalije

On Saturday, June 6, 2020 at 4:18:00 PM UTC+2, Thomas Passin wrote:
>
> Edward also mentioned redundancy.  IMO, redundancy that helps in error 
> recovery is good.  Remember, there are going to be tens of thousands of 
> files in the new format eventually.  Some of them will have mis-used 
> directives, some of them will have some kind of corruption.  We need to 
> have a good chance of recovering those files anyway. 
>

While I would agree that redundancy usually means better error recovery, I 
really doubt that this applies here. The redundant parts that I've 
mentioned don't add any valuable information that could possibly be used 
for error recovery. And by the way, for redundancy to be used for error 
recovery you must have error recovery tools that can use it (which AFAIK 
Leo doesn't have). So the redundancy here just means more complexity, more 
garbage and nothing valuable in return. 

As I said before, I won't insist on this change, but for the sake of being 
precise I won't let false arguments stand either.

You wonder why the speed of reading and writing matters. Perhaps when you 
use Leo it doesn't matter to you if it will load 200ms faster or not. But 
if a developer wants to run a thousand tests, then 20ms less per test 
actually means 20 seconds less. Waiting 20 seconds more for tests to finish 
might break the developer's thought flow. Keeping the developer's thought 
flow leads to better code. So in the end users will benefit even if they 
don't care about these micro-optimizations. 

Vitalije



Re: ENB: about external file format 5-thin

2020-06-06 Thread Thomas Passin

On Saturday, June 6, 2020 at 9:30:38 AM UTC-4, vitalije wrote:
>
> 6. Changing Leo's file format to make your new code easier to test would 
>> be letting the tail wag the dog. I am confident that you can find a robust 
>> testing strategy that does not depend on a new file format.
>>
>
> I understand your unease for making this kind of change. There is nothing 
> urgent in my proposition. If we change write code so that it outputs 
> starting sentinel *@+leo-ver=6*, we can use two different functions for 
> parsing the rest of the file content. Old files having *@+leo-ver=5* will 
> be loaded using the old reading code. So there won't be any inconveniences 
> for users, developers and future maintainers.
>

I'm with Edward on this one.  Having had corrupted or obsolete .leo files 
before, I do not want to have any possibility of having more.  In addition, 
if a new version of Leo starts to write a new format for say @file nodes, 
those still using an older version will not be able to read them.   It's 
already confusing enough to know what we are getting - what is the 
difference between @auto vs @file, for example?  Adding a new format will 
add to the uncertainty, and if you call it something different like @file1, 
that would be even more confusing for a lot of people.

Edward also mentioned redundancy.  IMO, redundancy that helps in error 
recovery is good.  Remember, there are going to be tens of thousands of 
files in the new format eventually.  Some of them will have mis-used 
directives, some of them will have some kind of corruption.  We need to 
have a good chance of recovering those files anyway.  And we would still 
need to keep the old code for the old format in Leo for many years.  So the 
result will be more complexity for Leo (both code branches will need to be 
maintained), not less, and more potential confusion for users and not less.

If Leo had to read large files rapidly and repeatedly, the conclusion might 
be different.  But why should I care if Leo could read leoref.leo in 20 ms 
less time?  It wouldn't matter at all as a practical matter.  As a 
technical matter, of course it's cool if you develop new, clean, fast code 
- who doesn't like that?  But for Leo as an everyday tool, there's really 
no benefit that I can see.



Re: ENB: about external file format 5-thin

2020-06-06 Thread vitalije

>
> 6. Changing Leo's file format to make your new code easier to test would 
> be letting the tail wag the dog. I am confident that you can find a robust 
> testing strategy that does not depend on a new file format.
>

I wrote this post not because I couldn't write tests. The attached Leo 
document contains scripts that test the read and write functions by 
performing a round trip on all external files found in the Leo installation 
folder. Each external file is read and parsed using the function 
*nodes_from_thin_file*, a generator yielding tuples suitable to be piped 
into the *build_tree* function, which I wrote and tested earlier. The first 
part of the test compares the tuple values with the values produced by 
Leo's normal read logic. Then the test script builds a VNode instance 
representing the whole external file, uses the function *v_to_string* to 
generate the content of the external file, and compares the resulting 
content with the source file. 
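
To make the idea of a round trip check concrete, here is a minimal sketch. It does not use the real *nodes_from_thin_file*, *build_tree* or *v_to_string* (their signatures are not shown in this thread); `toy_parse`, `toy_serialize` and `lossy_parse` are hypothetical stand-ins. It also shows why a reader that discards information makes different files produce the same outline, so the round trip fails for some of them.

```python
def check_round_trip(text, parse, serialize):
    """Return True if serialize(parse(text)) reproduces text exactly."""
    return serialize(parse(text)) == text

# Hypothetical stand-ins for a real reader and writer.
def toy_parse(text):
    # Lossless: keeps every line exactly as written.
    return text.split('\n')

def toy_serialize(lines):
    return '\n'.join(lines)

def lossy_parse(text):
    # Lossy: drops trailing whitespace, so 'a \n' and 'a\n' parse identically.
    return [line.rstrip() for line in text.split('\n')]

if __name__ == '__main__':
    print(check_round_trip('a\nb\n', toy_parse, toy_serialize))     # True
    print(check_round_trip('a \nb\n', lossy_parse, toy_serialize))  # False
```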

I understand your unease for making this kind of change. There is nothing 
urgent in my proposition. If we change write code so that it outputs 
starting sentinel *@+leo-ver=6*, we can use two different functions for 
parsing the rest of the file content. Old files having *@+leo-ver=5* will 
be loaded using the old reading code. So there won't be any inconveniences 
for users, developers and future maintainers.
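
A rough sketch of that dispatch, assuming the version number can be taken from the opening *@+leo-ver=N* sentinel. The names `read_v5`, `read_v6` and `read_external_file` are made up for illustration and are not Leo's API; the two readers are placeholders.

```python
import re

# Matches the version field of the opening sentinel, e.g. "#@+leo-ver=5-thin".
LEO_HEADER = re.compile(r'@\+leo-ver=(\d+)')

def read_v5(lines):
    # Placeholder for the existing reading code.
    return ('v5', len(lines))

def read_v6(lines):
    # Placeholder for a hypothetical new reader.
    return ('v6', len(lines))

READERS = {5: read_v5, 6: read_v6}

def read_external_file(text):
    """Find the opening sentinel and hand the rest of the file to a reader."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        m = LEO_HEADER.search(line)
        if m:
            version = int(m.group(1))
            reader = READERS.get(version)
            if reader is None:
                raise ValueError(f'unsupported external file format: {version}')
            return reader(lines[i + 1:])
    raise ValueError('no @+leo sentinel found')
```

With this shape, old files keep going through the old code path untouched, and an unknown version fails loudly instead of being misread.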

Explicit is better than implicit, I agree. Then why is the node level 
encoded using '*', '**', '*3*', '*4*', ...? Why is it better than just 
simple '1', '2', '3', ..? Isn't the second variant more explicit?
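
For reference, the current level encoding can be sketched as two small helpers. This assumes levels 1 and 2 are written as '*' and '**' and deeper levels as '*N*', as described above; `encode_level` and `decode_level` are illustrative names, not functions from Leo.

```python
def encode_level(level):
    """Return the format-5 style level token: '*', '**', '*3*', '*4*', ..."""
    if level < 1:
        raise ValueError('level must be >= 1')
    return '*' * level if level <= 2 else f'*{level}*'

def decode_level(token):
    """Inverse of encode_level; note the two special cases it needs."""
    if token in ('*', '**'):
        return len(token)
    if len(token) >= 3 and token[0] == '*' and token[-1] == '*' and token[1:-1].isdigit():
        return int(token[1:-1])
    raise ValueError(f'not a level token: {token!r}')

if __name__ == '__main__':
    for n in (1, 2, 3, 6):
        print(n, encode_level(n))
```

With a plain '1', '2', '3' encoding both helpers would reduce to `str(level)` and `int(token)`.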

The need for the *@last* directive is a result of having the *@-leo* 
sentinel. Try it yourself: delete the closing Leo sentinel and all *@@last* 
lines before it, and Leo will read this file correctly, placing the last 
lines at the end of the node. The closing Leo sentinel doesn't add any 
useful information to the reading process. But because it exists, it creates 
a need for the @last directives, which means more code to execute, more 
regex searches to perform and no gains in return.
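
As a sketch of the position-based alternative: lines before the opening sentinel are "first" lines, and lines after the closing sentinel (if there is one) are "last" lines. The helper name is hypothetical, and matching sentinels by substring is a simplification; real code would have to respect the file's comment delimiters.

```python
def split_first_body_last(text):
    """Split file text into (first_lines, body_lines, last_lines)."""
    lines = text.splitlines(keepends=True)
    # The opening sentinel is the first line containing '@+leo'.
    start = next((i for i, s in enumerate(lines) if '@+leo' in s), None)
    if start is None:
        raise ValueError('no @+leo sentinel found')
    # The closing sentinel, if present, is the last line containing '@-leo'.
    end = next(
        (i for i in range(len(lines) - 1, start, -1) if '@-leo' in lines[i]),
        len(lines),
    )
    return lines[:start], lines[start + 1:end], lines[end + 1:]
```

Without a closing sentinel everything after the opening one is returned as body, which matches the observation that the last lines then simply end up at the end of the node.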

If you edit an external file and separate the opening *@+others* or *@+<<* 
sentinel from the following start node sentinel (for example, by inserting 
a few lines between them), Leo will read this file correctly, but on the 
following write it will report the file as changed even if the user didn't 
change anything. If those two sentinels are expressed *explicitly*, not on 
their own separate line but in the following node start sentinel as a single 
character (for example "+/-" can represent the presence/absence of this 
directive), it won't be possible to separate those two sentinels and we 
would have two fewer patterns to match while reading.

Even if you prefer user being able to better understand sentinels, having 
two consecutive lines containing the same *<> *text is not 
helping a lot. But it does cause user to see (and read) more garbage 
content.

Perhaps we could have a new setting *@int default-external-file-format=5* 
by default and user can override it to 6 in myLeoSettings.leo. I am sure 
format-6 would be faster to read and write and some users would prefer to 
use it instead. 

Anyway, I won't insist on changing the format, but if we are changing 
something it would be better to make all changes at once. 

Regarding the first node start sentinel, perhaps new read code can just 
skip this sentinel and use the values from the xml for gnx and headline. 
When writing a file, Leo can check to see if this sentinel is present in 
the external file and if it is, it will keep this sentinel line unchanged. 
Leo always reads existing file to check whether there is a change or not, 
so this check won't be too expensive. This way single external file can be 
opened using different paths in different outlines without generating 
unnecessary file changes.

Or we can just skip this sentinel when writing file. This will cause a 
single change to each external file, but after this no changes will ever be 
caused by accessing this file from different outlines. 

Vitalije




Re: ENB: about external file format 5-thin

2020-06-06 Thread Edward K. Ream
On Fri, Jun 5, 2020 at 10:10 AM vitalije  wrote:

> For the past few days I've been working on the reusable functions for both
> parsing content of external files and writing external files. In the
> attached Leo document there are two new scripts. One is for generating the
> test data, and the other is for testing these two new functions. All tests
> are passing and round trip (*text-> outline -> text*) confirms that these
> functions have almost the same effect as Leo's FastAtFile reading and
> atFile writing methods.
>

Good to know.

> Thinking about the format of external files and looking at them, I've come
> to the conclusion that this format contains some redundant information.
> This is not a big problem, but since I am currently working on this part of
> the Leo's code base, I wish to propose some improvements to this format.
> Having redundant information means that different files may produce the
> same outline. This can cause problems when testing round trip
> transformations.
>

Some general reactions:

1. Changing Leo's file format would be a big deal. It will be inconvenient
for Leo's users, Leo's devs, and future maintainers. A new file format
would, at minimum, create migration problems. It would require new
documentation and probably migration scripts similar to the script I
recently wrote.

2. Leo's existing file format explicitly represents all of Leo's syntactic
constructs. I never considered using a minimal set of sentinels. I only
considered the *clearest*, most explicit, set of sentinels. The second
principle of the zen of python is "Explicit is better than implicit." I want
to retain the explicit correspondences between sentinels, nodes, @others and
section references.

True, the first zen-of-python principle is "Beautiful is better than
ugly."  Imo, this principle does not apply here. Eliding sentinels makes it
harder for users to understand the sentinels. Again imo, there is nothing
very beautiful about embedding subtle inferences in crucial read logic.

3. Error correction is not possible without redundancy. Removing various
"non-essential" sentinels would make it harder to write scripts that act on
external files. Such scripts would have to recreate the clever inferences
that make eliding sentinels possible in the first place.

4. @clean allows users to eliminate *all* sentinels. Those who dislike
sentinels are already using @clean. Those who don't care much about
sentinels will not appreciate yet another unnecessary change to Leo.

5. Changing Leo's file format might affect the @clean logic. This logic
does a diff between the external file and a recreation of that file (with
sentinels) generated from the outline itself. *Maybe* that diff will work
with a new file format, but that is not guaranteed. For sure, removing
redundancy in the file format will make the @clean logic more fragile, in
hard to predict ways.

6. Changing Leo's file format to make your new code easier to test would be
letting the tail wag the dog. I am confident that you can find a robust
testing strategy that does not depend on a new file format.

Now to specific comments:

> top level node gnx and its headline are not necessary. Both headline and
> gnx are present in the xml. They don't provide any useful information. This
> also can cause problems when two different outlines contain the same
> external file. If the top level node has a different path or different gnx
> in those outlines, then they would produce different files even if they have
> the same content.
>

I agree with you and Bob that this can be a problem. Imo, the way forward
is to define clearly what happens when the xml and external file collide. I
welcome your thoughts on this. Imo, it should be considered as a separate
issue.

>
> - *@+<<* sentinels are redundant too. When we encounter the node whose
>   headline is a section reference, we know that the section reference was
>   just before the opening node line.
>

Yes, but I don't care.

>
> - *@-<<* sentinel and *@afterref* can be joined in one. The section
>   name is not necessary because opening and closing sections must be properly
>   nested. We know for sure that the closing section has the same headline
>   as the last open one. The closing *@-<<* sentinel can give a clue
>   whether the following line is *@afterref* or an ordinary line. For
>   example *@-<<[* means same as closing section sentinel followed by an
>   *@after* line, while *@-<<]* means there is no *@after* line after
>   this closing sentinel.
>

The documentation for @afterref is: "Marks
non-whitespace text appearing after a section reference." I don't know
whether these words are still true. Perhaps @afterref can truly be
eliminated. If so, the way to do that is to change the *write* logic, not
the read logic. Leo should be able to read @afterref "forever".

>
>- *@+others* is not 

Re: ENB: about external file format 5-thin

2020-06-05 Thread Thomas Passin

On Friday, June 5, 2020 at 1:34:31 PM UTC-4, vitalije wrote:
>
>
>>
>> I just used @delims the other day for a Windows command file.  In cmd 
>> files I use "::" as a comment marker.  I didn't find a Leo file type for 
>> cmd files, so I just went ahead and used the directive.  
>>
>
> Ok, this is a valid use case, though I didn't object to this kind of 
> usage. This kind of directive may be skipped when writing the external file. 
> Which delimiters were used to write external file can (and should) be 
> deduced from *@+leo* sentinel line. If those delimiters don't match 
> delimiters defined for this file extension (or if there are no defaults 
> like in your case), the *@delims* directive can be automatically added to 
> the top level body. That way we could prevent a possibility of having 
> different pairs of delimiters in a single external file. A possibility to 
> create such ambiguous file is the main reason why these directives are 
> considered dangerous. Handling them during the process of parsing the 
> external file content makes this code complex. And I can't think of a valid 
> use case for this kind of situation.
>
> Delimiters are used in order to allow Leo sentinels to be written in the 
> external file as a comment lines using the proper syntax for the given 
> file. If we have two *@delims* directives with the different values 
> inside one external file, this file can't be syntactically correct.
>
> I am not against letting user to choose which delimiters to use for any 
> given file. I am just suggesting that this choice should be limited to one 
> set of delimiters per file. If we agree on this limitation, then the *@delims 
> *directive can be used but it doesn't have to be written in the external 
> file. If it is necessary (i.e. if it clashes with the default delimiters), 
> then reading code would add it automatically in the top level body. Or 
> perhaps it can be written just as a  flag in the *@+leo* sentinel 
> signaling only that this directive was (or was not) present in the top 
> level body. The delimiters deduced from the *@+leo *should be used for 
> the entire file.
>

Yes you have!  It makes perfect sense. 



Re: ENB: about external file format 5-thin

2020-06-05 Thread vitalije


> This has bothered me five or ten times when for unusual reasons I wanted 
> to @file one external file from two Leo-Editor files.  In most cases 
> this problem caused me to do something else.  In one or two cases I 
> lived with this problem. 
>
> -- 
> Segundo Bob 
> segun...@gmail.com  
>

One way to solve this issue is to add a node with the correct @path 
directive one level above the @file node. This allows the @file node 
in both outlines to have the same headline. Then it is necessary to make 
sure that these @file nodes have the same gnx in both outlines. To achieve 
this you should copy the @file node from one outline and then paste it, 
retaining clones, in the other outline. After this both outlines will 
produce the same external file.

It is not impossible to solve this problem using this trick, but it is 
cumbersome. It would be much easier if the top level gnx and headline were 
not written in the external file. Every outline could have its own gnx and 
file path, but they would produce the same output.

Vitalije



Re: ENB: about external file format 5-thin

2020-06-05 Thread vitalije

>
>
>
> I just used @delims the other day for a Windows command file.  In cmd 
> files I use "::" as a comment marker.  I didn't find a Leo file type for 
> cmd files, so I just went ahead and used the directive.  
>

Ok, this is a valid use case, though I didn't object to this kind of 
usage. This kind of directive may be skipped when writing the external file. 
Which delimiters were used to write external file can (and should) be 
deduced from *@+leo* sentinel line. If those delimiters don't match 
delimiters defined for this file extension (or if there are no defaults 
like in your case), the *@delims* directive can be automatically added to 
the top level body. That way we could prevent a possibility of having 
different pairs of delimiters in a single external file. A possibility to 
create such ambiguous file is the main reason why these directives are 
considered dangerous. Handling them during the process of parsing the 
external file content makes this code complex. And I can't think of a valid 
use case for this kind of situation.

Delimiters are used in order to allow Leo sentinels to be written in the 
external file as a comment lines using the proper syntax for the given 
file. If we have two *@delims* directives with the different values inside 
one external file, this file can't be syntactically correct.

I am not against letting the user choose which delimiters to use for any 
given file. I am just suggesting that this choice should be limited to one 
set of delimiters per file. If we agree on this limitation, then the 
*@delims* directive can be used but it doesn't have to be written in the 
external file. If it is necessary (i.e. if it clashes with the default 
delimiters), then the reading code would add it automatically in the top 
level body. Or perhaps it can be written just as a flag in the *@+leo* 
sentinel, signaling only that this directive was (or was not) present in the 
top level body. The delimiters deduced from the *@+leo* sentinel should be 
used for the entire file.
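
A minimal sketch of deducing the delimiters from the opening sentinel line. It assumes the header has the shape `<start>@+leo-ver=N-thin<end>` and ignores any extra fields a real header may carry after `-thin`. The function names and the `DEFAULT_DELIMS` table are hypothetical, not Leo's.

```python
import re

# <start delimiter>@+leo-ver=<N>-thin<optional end delimiter>
LEO_HEADER = re.compile(r'^(?P<start>.*?)@\+leo-ver=(?P<version>\d+)-thin(?P<end>.*)$')

def delims_from_leo_sentinel(line):
    """Return (start, end) comment delimiters used to write the sentinel."""
    m = LEO_HEADER.match(line.strip())
    if not m:
        raise ValueError('not an @+leo sentinel line')
    return m.group('start'), m.group('end')

# Hypothetical per-extension defaults, for illustration only.
DEFAULT_DELIMS = {'.py': ('#', ''), '.html': ('<!--', '-->')}

def needs_delims_directive(extension, delims):
    """True if the reader should add @delims to the top level body."""
    return DEFAULT_DELIMS.get(extension) != delims
```

For the cmd file case mentioned earlier, a header starting with `::` and an extension with no default would make `needs_delims_directive` return True, so the directive would be restored in the top level body on read.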

I hope I made my point a bit more clear.

Vitalije



Re: ENB: about external file format 5-thin

2020-06-05 Thread Segundo Bob
On 6/5/20 8:10 AM, vitalije wrote:
> top level node gnx and its headline are not necessary. Both headline and
> gnx are present in the xml. They don't provide any useful information.
> This also can cause problems when two different outlines contain the
> same external file. If the top level node has a different path or
> different gnx in those outlines, then they would produce different files
> even if they have the same content.

This has bothered me five or ten times when for unusual reasons I wanted
to @file one external file from two Leo-Editor files.  In most cases
this problem caused me to do something else.  In one or two cases I
lived with this problem.

-- 
Segundo Bob
segundo...@gmail.com



Re: ENB: about external file format 5-thin

2020-06-05 Thread Thomas Passin

On Friday, June 5, 2020 at 11:10:24 AM UTC-4, vitalije wrote:
>
> For the past few days I've been working on the reusable functions for both 
> parsing content of external files and writing external files. In the 
> attached Leo document there are two new scripts. One is for generating the 
> test data, and the other is for testing these two new functions. All tests 
> are passing and round trip (*text-> outline -> text*) confirms that these 
> functions have almost the same effect as Leo's FastAtFile reading and 
> atFile writing methods.
>
> Thinking about the format of external files and looking at them, I've come 
> to the conclusion that this format contains some redundant information. 
> This is not a big problem, but since I am currently working on this part of 
> the Leo's code base, I wish to propose some improvements to this format. 
> Having redundant information means that different files may produce the 
> same outline. This can cause problems when testing round trip 
> transformations.
>
> First of all I have to say, that I wrote two simple scripts that can 
> automatically convert current external file content to the new format and 
> back to the original format.
> Also, the so-called "dangerous directives" (*@comment* and *@delims*) are 
> never used in Leo's code base. Personally, I can't think of a use case 
> for those directives. If anyone knows for a specific use case where these 
> directives can solve a real life problem which can't be solved without 
> these directives, please share it here. I wish to understand why would 
> anyone wish to use these directives. If no such use case can be found, I 
> would strongly suggest dropping support for those dangerous directives. It 
> would allow us to further simplify both reading and writing code.
> [snip]
> Fewer sentinel lines mean less parsing, less ambiguity and less work, which 
> leads to both simpler code and faster execution.
>
> Your thoughts, please.
>

I just used @delims the other day for a Windows command file.  In cmd files 
I use "::" as a comment marker.  I didn't find a Leo file type for cmd 
files, so I just went ahead and used the directive.  I have used it a few 
other times over the years. I imagine that @comment is also needed from 
time to time.  I can't be the only one.  So I wouldn't get rid of these two.

I'm all in favor of simplifying code, but I think you may be drifting into 
the area of premature optimization.

-- 
You received this message because you are subscribed to the Google Groups 
"leo-editor" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to leo-editor+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/leo-editor/2f73a133-4a9e-44e7-8e13-66f92f820a74o%40googlegroups.com.


Re: ENB: about external file format 5-thin

2020-06-05 Thread vitalije
I forgot to mention that round trip using new functions is 1.9 times faster 
than using c.atFileCommands. Test script compares round trip of 
leo/core/leoGlobals.py

$ python p.py
setting leoID from os.getenv('USER'): 'vitalije'
f_new average: 30.429ms
f_old average: 58.055ms
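
A timing harness of roughly this shape can produce averages like the ones above. This is only a sketch: the bodies of `f_old` and `f_new` below are placeholder workloads, not the actual round trip code from the attached outline, and `average_ms` and `speedup` are illustrative helpers.

```python
import timeit

def average_ms(func, runs=20):
    """Average wall-clock time of func() in milliseconds over `runs` calls."""
    total_seconds = timeit.timeit(func, number=runs)
    return 1000.0 * total_seconds / runs

def speedup(old_ms, new_ms):
    """How many times faster the new code is than the old code."""
    return old_ms / new_ms

# Placeholder workloads standing in for the old and new round trips.
def f_old():
    return sorted(str(i) for i in range(2000))

def f_new():
    return sorted(map(str, range(2000)))

if __name__ == '__main__':
    print(f'f_new average: {average_ms(f_new):.3f}ms')
    print(f'f_old average: {average_ms(f_old):.3f}ms')
    # The figures reported in this message give the stated ratio:
    print(round(speedup(58.055, 30.429), 2))  # 1.91
```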

Vitalije



ENB: about external file format 5-thin

2020-06-05 Thread vitalije
For the past few days I've been working on the reusable functions for both 
parsing content of external files and writing external files. In the 
attached Leo document there are two new scripts. One is for generating the 
test data, and the other is for testing these two new functions. All tests 
are passing and round trip (*text-> outline -> text*) confirms that these 
functions have almost the same effect as Leo's FastAtFile reading and 
atFile writing methods.

Thinking about the format of external files and looking at them, I've come 
to the conclusion that this format contains some redundant information. 
This is not a big problem, but since I am currently working on this part of 
the Leo's code base, I wish to propose some improvements to this format. 
Having redundant information means that different files may produce the 
same outline. This can cause problems when testing round trip 
transformations.

First of all I have to say, that I wrote two simple scripts that can 
automatically convert current external file content to the new format and 
back to the original format.


   - The top level node's gnx and headline are not necessary. Both headline 
   and gnx are present in the xml. They don't provide any useful information. 
   This can also cause problems when two different outlines contain the same 
   external file. If the top level node has a different path or different gnx 
   in those outlines, then they would produce different files even if they 
   have the same content.
   - *@+<<* sentinels are redundant too. When we encounter the node whose 
   headline is a section reference, we know that the section reference was 
   just before the opening node line.
   - The *@-<<* sentinel and *@afterref* can be joined into one. The section 
   name is not necessary because opening and closing sections must be properly 
   nested. We know for sure that the closing section has the same headline 
   as the last open one. The closing *@-<<* sentinel can give a clue 
   whether the following line is *@afterref* or an ordinary line. For 
   example *@-<<[* means the same as a closing section sentinel followed by an 
   *@after* line, while *@-<<]* means there is no *@after* line after this 
   closing sentinel.
   - *@+others* is not necessary because when we hit the first open node 
   without a section reference in its headline we know for sure that just 
   before this node was an @others directive. Also when we encounter a new 
   open node with a different indentation we can be sure that just before this 
   node was an *@others* directive. When reading the external file this line 
   is used just to push the current node data on the stack. But this signal 
   can be added to the opening node sentinel as a single character.
   - The format of the *@+node* sentinel can be changed so that the headline 
   comes first and the gnx and level come at the end of the line, for example:
   #@ at.findFilesToRead:ekr.20190108054317.1:6
   instead of 
   #@+node:ekr.20190108054317.1: *6* at.findFilesToRead
   It would be nicer to read source code using other editors.
   - The closing *@-leo* line is not necessary and there is no need for 
   *@last* directives either. Last lines are just the last lines of the top 
   level node.
   - The *@first* directive can be present in the body, but it doesn't need to 
   be written in the external file, because we know that all lines coming 
   before the *@+leo* sentinel are first lines.
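
The proposed change to the node sentinel can be illustrated with a small two-way converter. This is a sketch built only from the two example lines in the list above; it is not one of the conversion scripts mentioned in this post, and `old_to_new` and `new_to_old` are illustrative names.

```python
import re

# Current form:   <delim>@+node:<gnx>: <level token> <headline>
OLD_NODE = re.compile(
    r'^(?P<delim>.*?)@\+node:(?P<gnx>[^:]+): (?P<stars>\*\d+\*|\*{1,2}) (?P<head>.*)$')
# Proposed form:  <delim>@ <headline>:<gnx>:<level>
NEW_NODE = re.compile(
    r'^(?P<delim>.*?)@ (?P<head>.*):(?P<gnx>[^:]+):(?P<level>\d+)$')

def old_to_new(line):
    m = OLD_NODE.match(line)
    if not m:
        raise ValueError('not a current-format node sentinel')
    stars = m.group('stars')
    # '*' and '**' mean levels 1 and 2; '*N*' means level N.
    level = len(stars) if stars in ('*', '**') else int(stars[1:-1])
    return f"{m.group('delim')}@ {m.group('head')}:{m.group('gnx')}:{level}"

def new_to_old(line):
    m = NEW_NODE.match(line)
    if not m:
        raise ValueError('not a proposed-format node sentinel')
    level = int(m.group('level'))
    stars = '*' * level if level <= 2 else f'*{level}*'
    return f"{m.group('delim')}@+node:{m.group('gnx')}: {stars} {m.group('head')}"
```

Because the gnx and level sit at the end of the line, a headline containing colons still parses: the pattern takes the last two colon-separated fields as gnx and level.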
   
Also, the so-called "dangerous directives" (*@comment* and *@delims*) are 
never used in Leo's code base. Personally, I can't think of a use case for 
those directives. If anyone knows for a specific use case where these 
directives can solve a real life problem which can't be solved without 
these directives, please share it here. I wish to understand why would 
anyone wish to use these directives. If no such use case can be found, I 
would strongly suggest dropping support for those dangerous directives. It 
would allow us to further simplify both reading and writing code.

Fewer sentinel lines mean less parsing, less ambiguity and less work, which 
leads to both simpler code and faster execution.

Your thoughts, please.

Vitalije



issue-1598-experiments.leo
Description: Binary data