On 05/02/2016 14:55, Christophe Demarey wrote:
On 5 Feb 2016 at 14:33, Thierry Goubier wrote:
On 05/02/2016 11:33, Christophe Demarey wrote:
Hi Thierry,
Just some thoughts I wanted to share:
On 3 Feb 2016 at 10:18, Thierry Goubier wrote:
I went through all the different possible file formats,
class-based, package-based, method-based, log metadata and the
like, and I concluded that:
- the method-based format is as good as any other. Even better,
since it has a spec (Cypress).
I see cons that a class (or package) format would not have. The
one-file-per-method approach generates plenty of small files.
In general, file systems do not like that:
- it may consume a lot of space. I remember I had a Java/Maven
project with a lot of small files and I filled up the inode table
on my Unix system.
- you generate long paths. Long paths are not user-friendly and
some OSes have restrictions on path length.
The method based structure of filetree is very close to how code is
navigated in Smalltalk browsers: one method at a time, with a
package/class/protocol hierarchical layering on top. The one file
per class / one file per package is a reference to the base unit of
C / C++.
I fully agree with that. As we have a lot of (small) methods, we will
have a lot of small files, and some file systems do not like that. I
remember huge slowdowns because of that. It is good to keep that in
mind.
The problem is linked to writing too many files. Because of a possible
uncertainty about the on-disk state, FileTree erases the complete package
directory, then rewrites everything, letting the VCS decide what has
really changed. This is doubly slow, because it hits both the filesystem
and the VCS.
I told Dale I'd look into a diff-based writer; it should improve
things a lot.
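
To make the diff-based writer idea concrete, here is a minimal sketch in Python. It is not FileTree's code and not the writer discussed with Dale: the function name `write_package_diff`, the dictionary input and the example paths are all assumptions made for illustration. Instead of erasing the package directory and rewriting everything, it compares each wanted file with what is on disk, rewrites only what differs, and deletes only what is no longer wanted.

```python
import os
import tempfile


def write_package_diff(package_dir, methods):
    """Bring package_dir in line with `methods` (relative path -> source text).

    Hypothetical helper: only files whose content differs are rewritten and
    only files that are no longer wanted are deleted.
    Returns (written, deleted, unchanged) as sorted lists of relative paths.
    """
    written, deleted, unchanged = [], [], []
    wanted = {os.path.normpath(p) for p in methods}

    # Remove files on disk that are not part of the package any more.
    # (Directories left empty are not cleaned up in this sketch.)
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, package_dir)
            if rel not in wanted:
                os.remove(full)
                deleted.append(rel)

    # Write only the files that are new or whose content has changed.
    for rel, source in methods.items():
        rel = os.path.normpath(rel)
        path = os.path.join(package_dir, rel)
        old = None
        if os.path.isfile(path):
            with open(path, encoding="utf-8", newline="") as f:
                old = f.read()
        if old == source:
            unchanged.append(rel)
            continue
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(source)
        written.append(rel)

    return sorted(written), sorted(deleted), sorted(unchanged)


if __name__ == "__main__":
    # Illustrative layout only; the paths are made up for the example.
    pkg = tempfile.mkdtemp()
    print(write_package_diff(pkg, {"Foo.class/instance/bar.st": "bar\n\t^ 1"}))
    print(write_package_diff(pkg, {"Foo.class/instance/bar.st": "bar\n\t^ 1"}))
```

With this approach an unchanged package costs one read per file and no writes, so the VCS also sees fewer touched files. The price is that the writer has to trust what it reads from disk, which is the uncertainty the erase-and-rewrite strategy avoids.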
And no OS in general use has path restrictions that matter. OK, the
Windows VM has issues, but this is a VM bug, not a filesystem
issue.
Windows command-line has (had) this limitation.
Good to know.
By adopting a file-per-method approach, you also move further away
from a common script format for Smalltalk. By that I mean
a file where you could define classes and methods, and run
arbitrary portions of Smalltalk code.
This format is called fileout, and it already exists.
I mean something like a python script:
http://archive.stsci.edu/vo/python_examples.html
Not entirely keen on going that way. I prefer declarative formats for
storing packages. And I still think that the fileout format is already
that kind of script (a sequence of scripts to execute, separated by !!).
All you describe is also available in the FileTree/Cypress format
and is technically better specified.
- the method-based format allows method-history queries on the
git/VCS history (as well as class-based / package-based
queries).
- the tree structure on GitHub or Bitbucket is quite
convenient (and browsable), to the point that one could edit a
package directly in it (I do when I need to do a quick fix).
but it is a pain to navigate: too many clicks to actually browse
a method's content.
You must hate Nautilus, then, since this is Nautilus's approach as
well. Just count the number of clicks you do in Nautilus, and the
number in GitHub.
But we have Spotter! (I just miss an exact search, to avoid clicking
and scrolling too much.)
Then you want Spotter on the web :)
If we remove the instance sub-directory and write instance-side
methods just below the class name in FileTree, then you'll get the
exact same number of clicks to reach a method as in Nautilus.
it would be a good idea
Why not.
Fun fact: if you do that with the Mac Finder in NeXT mode (Miller
columns) over a FileTree repository, you'll see that it almost
looks like Nautilus's top panes.
I do not know what the best format would be, but I think we need
to take care not to generate too many files / folders. The file
system and the VCS will appreciate it too.
I'd say, overall, what we need to remember is that we produce a
lot less lines of code than other languages, and that we shouldn't
over-optimize.
I'll probably look into optimising FileTree-like writing in the
future; I wasn't that good at planning for it and it shows in
specific cases.
That is actually the problem: we generate a lot of small files. I do
not have numbers, but I think it would be good to stress a file
system a bit to see where we hit the barrier, and compare that with the
Pharo code base. On the git side, I'm not aware of any limitation
regarding small files.
I'm sure the numbers are already available. And, as I said above, you
may be measuring FileTree implementation limitations and nothing related
to filesystem issues (or git issues).
Thierry
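
The stress test suggested above could be sketched as follows, in Python. It writes many small files straight to a temporary directory, so it measures the filesystem alone and leaves FileTree (and git) out of the picture. The function name `time_small_files`, the payload, the directory naming and the 50 files per directory are all assumptions chosen for illustration, not measurements or settings from the thread.

```python
import os
import tempfile
import time


def time_small_files(n_files, payload=b"foo\n\t^ 42\n", files_per_dir=50):
    """Create n_files small files under a temp directory.

    Hypothetical helper: returns (number of files found on disk, seconds
    spent creating them). The temporary directory is removed afterwards.
    """
    with tempfile.TemporaryDirectory() as root:
        start = time.perf_counter()
        for i in range(n_files):
            # Group files in per-"class" directories, loosely imitating a
            # one-file-per-method layout (names are made up).
            d = os.path.join(root, "Class%d.class" % (i // files_per_dir), "instance")
            os.makedirs(d, exist_ok=True)
            with open(os.path.join(d, "method%d.st" % i), "wb") as f:
                f.write(payload)
        elapsed = time.perf_counter() - start
        count = sum(len(files) for _root, _dirs, files in os.walk(root))
    return count, elapsed


if __name__ == "__main__":
    for n in (1000, 10000):
        count, seconds = time_small_files(n)
        print("%d files written in %.3f s" % (count, seconds))
```

Comparing these timings with the time FileTree takes to write a package of similar size would help tell a filesystem limit apart from an implementation limit.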