Re: what are guidelines for when to split a module into a package?
On Thu, Feb 22, 2018 at 03:53:59PM +0100, Timon Gehr via Digitalmars-d wrote: > On 22.02.2018 08:13, Timothee Cour wrote: > > Advantages of splitting: > > ... > > * easier to edit (no need to scroll much to see entirety of module > > we're editing) > > I don't think this particular point is true. Splitting can actually > make editing slightly harder. Scrolling is an inefficient way to find > the definition to edit; just search the file for the definition. +1. A worthwhile coding editor will have built-in functionality to search for symbols and making bookmarks. In Vim, I usually mark several important places in the code, e.g., a mark on the class/struct definition that I'm working on, another mark on a related template, another mark on the function, another mark on the associated unittest, etc.. This lets me jump to any relevant place in two keystrokes, without needing to care exactly where it is in the file. I also have a semi-standard assignment of marks, and update them as I go along, so I can always switch between different parts of the functionality I'm working on. For looking up other things, searching for symbols and ctags with jump-to-definition is invaluable. Also, Vim's ability to split the window at any time is invaluable: it lets me, for example, work on a unittest while having the function body in view or vice versa. Or jump between the two and work on two places in the same file at the same time. Any editor/IDE incapable of such basic functionality ought to be thrown out the window and replaced with something more usable. T -- GEEK = Gatherer of Extremely Enlightening Knowledge
Re: what are guidelines for when to split a module into a package?
On 22.02.2018 08:13, Timothee Cour wrote: Advantages of splitting: ... * easier to edit (no need to scroll much to see entirety of module we're editing) I don't think this particular point is true. Splitting can actually make editing slightly harder. Scrolling is an inefficient way to find the definition to edit; just search the file for the definition.
Re: what are guidelines for when to split a module into a package?
On Thursday, 22 February 2018 at 08:04:19 UTC, Timothee Cour wrote: it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split my understand is that we have: time22 < time1 but time21 ~= time1 so we're in no way worse than before the split unless time21 > time1 (noticably) in which case you have a strong argument On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d wrote: On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote: ``` import std.algorithm.searching : find; not import std.algorithm : find; ``` that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect. Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols. - Jonathan M Davis Regarding import time: Apart from the fact that you usually end up importing almost entire Phobos anyways, the import cost for importing entire Phobos is about 0.2s on my system with the upcoming 2.079 and we are working on getting it even further towards 0 which shouldn't be too hard (there's still a lot of unnecessary work being done by default). I'm really looking forward to the point where we can just tell people to use `import std;` because there's almost no overhead. (I am aware that there are downsides to importing everything but that's not what this discussion is about)
Re: what are guidelines for when to split a module into a package?
> that doesn't help anyone who's actually reading the documentation and trying > to find stuff that way how about the following fix for that: having a DDOC token on a package.d to indicate merging the submodules in the documentation, eg: ``` /// MERGE_SUBMODULES std/aglorithm/package.d ``` when user browses to https://dlang.org/phobos/std_algorithm.html, he would see the DDOC contents from all direct submodules of std/aglorithm/ right there in that ddoc page. On Thu, Feb 22, 2018 at 12:04 AM, Timothee Cour wrote: >> it actually does reduce compilation times if the imports go directly to the >> module in > question rather than to a module that publicly imports the symbols > > time1=compilation time of `import std.algorithm : find;` before split > time21=compilation time of `import std.algorithm : find;` after split > time22=compilation time of `import std.algorithm.searching : find;` after > split > > my understand is that we have: > time22 < time1 but time21 ~= time1 > so we're in no way worse than before the split > unless time21 > time1 (noticably) in which case you have a strong argument > > > On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d > wrote: >> On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d >> wrote: >>> ``` >>> import std.algorithm.searching : find; >>> >>> not >>> >>> import std.algorithm : find; >>> ``` >>> >>> that's just a missed opportunity to benefit from the split; we're in >>> no way worse after the split than before the split in that regard. We >>> can just leave it as `import std.algorithm : find;` with no adverse >>> effect. >> >> Maybe, but the CI stuff for Phobos doesn't like that, and it actually does >> reduce compilation times if the imports go directly to the module in >> question rather than to a module that publicly imports the symbols. >> >> - Jonathan M Davis >>
Re: what are guidelines for when to split a module into a package?
> it actually does reduce compilation times if the imports go directly to the > module in question rather than to a module that publicly imports the symbols time1=compilation time of `import std.algorithm : find;` before split time21=compilation time of `import std.algorithm : find;` after split time22=compilation time of `import std.algorithm.searching : find;` after split my understand is that we have: time22 < time1 but time21 ~= time1 so we're in no way worse than before the split unless time21 > time1 (noticably) in which case you have a strong argument On Wed, Feb 21, 2018 at 11:57 PM, Jonathan M Davis via Digitalmars-d wrote: > On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d > wrote: >> ``` >> import std.algorithm.searching : find; >> >> not >> >> import std.algorithm : find; >> ``` >> >> that's just a missed opportunity to benefit from the split; we're in >> no way worse after the split than before the split in that regard. We >> can just leave it as `import std.algorithm : find;` with no adverse >> effect. > > Maybe, but the CI stuff for Phobos doesn't like that, and it actually does > reduce compilation times if the imports go directly to the module in > question rather than to a module that publicly imports the symbols. > > - Jonathan M Davis >
Re: what are guidelines for when to split a module into a package?
On Wednesday, February 21, 2018 23:44:49 Timothee Cour via Digitalmars-d wrote: > > it's harder to find symbols > > i don't understand this argument. > > ``` > dscanner --declaration startsWith > ./std/algorithm/searching.d(4105:6) > ./std/algorithm/searching.d(4195:6) > ./std/algorithm/searching.d(4265:6) > ./std/algorithm/searching.d(4301:6) > ``` I have no intention whatsoever of using dscanner. I can do something similar with grep, but either way, that doesn't help anyone who's actually reading the documentation and trying to find stuff that way, which is often what people do. Regardless, the more split up stuff is, the more places that you have to look for it. We already have plenty of complaints about folks not knowing whether something is in std.algorithm, std.array, std.range, and std.string without getting into issues of sub-modules. And I really don't understand why splitting up a module like std.array would help anything except maybe import times, but that only matters if the module is large (which std.array definitely isn't) and if you're importing the modules as specifically as possible and not using public imports through something like std.array.package. - Jonathan M Davis
Re: what are guidelines for when to split a module into a package?
On Wednesday, February 21, 2018 23:48:32 Timothee Cour via Digitalmars-d wrote: > ``` > import std.algorithm.searching : find; > > not > > import std.algorithm : find; > ``` > > that's just a missed opportunity to benefit from the split; we're in > no way worse after the split than before the split in that regard. We > can just leave it as `import std.algorithm : find;` with no adverse > effect. Maybe, but the CI stuff for Phobos doesn't like that, and it actually does reduce compilation times if the imports go directly to the module in question rather than to a module that publicly imports the symbols. - Jonathan M Davis
Re: what are guidelines for when to split a module into a package?
``` import std.algorithm.searching : find; not import std.algorithm : find; ``` that's just a missed opportunity to benefit from the split; we're in no way worse after the split than before the split in that regard. We can just leave it as `import std.algorithm : find;` with no adverse effect. On Wed, Feb 21, 2018 at 11:44 PM, Timothee Cour wrote: >> it's harder to find symbols > > i don't understand this argument. > > ``` > dscanner --declaration startsWith > ./std/algorithm/searching.d(4105:6) > ./std/algorithm/searching.d(4195:6) > ./std/algorithm/searching.d(4265:6) > ./std/algorithm/searching.d(4301:6) > ``` > > > On Wed, Feb 21, 2018 at 11:31 PM, Jonathan M Davis via Digitalmars-d > wrote: >> On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d >> wrote: >>> from my perspective it makes sense to split a module M into submodules >>> A, B when: >>> * M is large >>> * there's little interaction between A and B (eg only few symbols from >>> A are needed in B and vice versa) >>> * A and B are logically grouped (that is domain specific) >>> * it doesn't turn into an extreme (1 function per module) >>> >>> Advantages of splitting: >>> * easier to review >>> * easier to edit (no need to scroll much to see entirety of module >>> we're editing) >>> * less pollution from top-level imports as they only affect submodule >>> (likewise with top-level attributes etc) >>> * more modular >>> * doesn't affect existing code since `import M` will continue to work >>> after M is split into a package >>> * less memory when using separate compilation >>> * allows fine-grained compiler options (eg we can compile B with `-O` if >>> needed) * allows to run unittests just for A instead of M >>> * allows selective import in client to avoid pulling in too many >>> dependencies (see arguments that were made for std.range.primitives) >>> >>> Disadvantages of splitting: >>> * more files; but not sure why that's a problem so long we don't go >>> into extremes (eg 1 function per module or other things of bad taste) >>> >>> --- >>> while working on https://github.com/dlang/phobos/pull/6178 I had >>> initially split M:std.array into submodules: >>> A:std.array.util (the old std.array) and B:std.array.static_array >>> (everything added in the PR) >>> IMO this made sense according to my above criteria (in this case there >>> was 0 interaction between A and B), but the reviewers disagreed with >>> the split. >>> >>> So, what are the guidelines? >> >> It's decided on a case-by-case basis but is generally only done if the >> module is quite large. std.array is not particularly large. It's less than >> 4000 lines, including unit tests and documentation, and it only has 18 >> top-level symbols. >> >> Also, remember that within Phobos, imports are supposed to be as localized >> as possible - both in terms of where the import is placed and in terms of >> selective imports - e.g. it would be >> >> import std.algorithm.searching : find; >> >> not >> >> import std.algorithm : find; >> >> which means that splitting the module then requires that all of those >> imports be even more specific. User code can choose to do that or not, but >> it does make having modules split up further that much more tedious. Related >> to that is the fact that anyone searching for these symbols now has more >> modules to search through. So, finding symbols will be harder. Take >> std.algorithm for instance. It was split, because it was getting large >> enough that compiling it on machines without large amounts of memory >> resulted in the compiler running out of memory. So, there was a very good >> argument for splitting it. However, now, even if you know that a symbol is >> in std.algorithm, do you know where in std.algorithm it is? Some are obvious >> - e.g. sort is in std.algorithm.sorting. However, others are not so >> obviously - e.g. where does startsWith live? Arguably, it could go in either >> std.algorithm.comparison or std.algorithm.searching. It turns out that it's >> in std.algorithm.searching, but I generally have to look it up. And where to >> functions like map or filter live? std.algorithm.mutation? >> std.algorithm.iteration? It's not necessarily obvious at all. >> >> From the perspective of users trying to find stuff, splitting modules up >> comes at a real cost, and I honestly don't understand why some folks are in >> a hurry to make module really small. That means more import statements when >> using those modules, and it means that it's harder to find symbols. >> >> Personally, I think that we should be very slow to consider splitting >> modules and only do so when it's clear that there's a real need, and >> std.array is nowhere near that level. >> >> - Jonathan M Davis >> >>
Re: what are guidelines for when to split a module into a package?
> it's harder to find symbols i don't understand this argument. ``` dscanner --declaration startsWith ./std/algorithm/searching.d(4105:6) ./std/algorithm/searching.d(4195:6) ./std/algorithm/searching.d(4265:6) ./std/algorithm/searching.d(4301:6) ``` On Wed, Feb 21, 2018 at 11:31 PM, Jonathan M Davis via Digitalmars-d wrote: > On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d > wrote: >> from my perspective it makes sense to split a module M into submodules >> A, B when: >> * M is large >> * there's little interaction between A and B (eg only few symbols from >> A are needed in B and vice versa) >> * A and B are logically grouped (that is domain specific) >> * it doesn't turn into an extreme (1 function per module) >> >> Advantages of splitting: >> * easier to review >> * easier to edit (no need to scroll much to see entirety of module >> we're editing) >> * less pollution from top-level imports as they only affect submodule >> (likewise with top-level attributes etc) >> * more modular >> * doesn't affect existing code since `import M` will continue to work >> after M is split into a package >> * less memory when using separate compilation >> * allows fine-grained compiler options (eg we can compile B with `-O` if >> needed) * allows to run unittests just for A instead of M >> * allows selective import in client to avoid pulling in too many >> dependencies (see arguments that were made for std.range.primitives) >> >> Disadvantages of splitting: >> * more files; but not sure why that's a problem so long we don't go >> into extremes (eg 1 function per module or other things of bad taste) >> >> --- >> while working on https://github.com/dlang/phobos/pull/6178 I had >> initially split M:std.array into submodules: >> A:std.array.util (the old std.array) and B:std.array.static_array >> (everything added in the PR) >> IMO this made sense according to my above criteria (in this case there >> was 0 interaction between A and B), but the reviewers disagreed with >> the split. >> >> So, what are the guidelines? > > It's decided on a case-by-case basis but is generally only done if the > module is quite large. std.array is not particularly large. It's less than > 4000 lines, including unit tests and documentation, and it only has 18 > top-level symbols. > > Also, remember that within Phobos, imports are supposed to be as localized > as possible - both in terms of where the import is placed and in terms of > selective imports - e.g. it would be > > import std.algorithm.searching : find; > > not > > import std.algorithm : find; > > which means that splitting the module then requires that all of those > imports be even more specific. User code can choose to do that or not, but > it does make having modules split up further that much more tedious. Related > to that is the fact that anyone searching for these symbols now has more > modules to search through. So, finding symbols will be harder. Take > std.algorithm for instance. It was split, because it was getting large > enough that compiling it on machines without large amounts of memory > resulted in the compiler running out of memory. So, there was a very good > argument for splitting it. However, now, even if you know that a symbol is > in std.algorithm, do you know where in std.algorithm it is? Some are obvious > - e.g. sort is in std.algorithm.sorting. However, others are not so > obviously - e.g. where does startsWith live? Arguably, it could go in either > std.algorithm.comparison or std.algorithm.searching. It turns out that it's > in std.algorithm.searching, but I generally have to look it up. And where to > functions like map or filter live? std.algorithm.mutation? > std.algorithm.iteration? It's not necessarily obvious at all. > > From the perspective of users trying to find stuff, splitting modules up > comes at a real cost, and I honestly don't understand why some folks are in > a hurry to make module really small. That means more import statements when > using those modules, and it means that it's harder to find symbols. > > Personally, I think that we should be very slow to consider splitting > modules and only do so when it's clear that there's a real need, and > std.array is nowhere near that level. > > - Jonathan M Davis > >
Re: what are guidelines for when to split a module into a package?
On Wednesday, February 21, 2018 23:13:33 Timothee Cour via Digitalmars-d wrote: > from my perspective it makes sense to split a module M into submodules > A, B when: > * M is large > * there's little interaction between A and B (eg only few symbols from > A are needed in B and vice versa) > * A and B are logically grouped (that is domain specific) > * it doesn't turn into an extreme (1 function per module) > > Advantages of splitting: > * easier to review > * easier to edit (no need to scroll much to see entirety of module > we're editing) > * less pollution from top-level imports as they only affect submodule > (likewise with top-level attributes etc) > * more modular > * doesn't affect existing code since `import M` will continue to work > after M is split into a package > * less memory when using separate compilation > * allows fine-grained compiler options (eg we can compile B with `-O` if > needed) * allows to run unittests just for A instead of M > * allows selective import in client to avoid pulling in too many > dependencies (see arguments that were made for std.range.primitives) > > Disadvantages of splitting: > * more files; but not sure why that's a problem so long we don't go > into extremes (eg 1 function per module or other things of bad taste) > > --- > while working on https://github.com/dlang/phobos/pull/6178 I had > initially split M:std.array into submodules: > A:std.array.util (the old std.array) and B:std.array.static_array > (everything added in the PR) > IMO this made sense according to my above criteria (in this case there > was 0 interaction between A and B), but the reviewers disagreed with > the split. > > So, what are the guidelines? It's decided on a case-by-case basis but is generally only done if the module is quite large. std.array is not particularly large. It's less than 4000 lines, including unit tests and documentation, and it only has 18 top-level symbols. Also, remember that within Phobos, imports are supposed to be as localized as possible - both in terms of where the import is placed and in terms of selective imports - e.g. it would be import std.algorithm.searching : find; not import std.algorithm : find; which means that splitting the module then requires that all of those imports be even more specific. User code can choose to do that or not, but it does make having modules split up further that much more tedious. Related to that is the fact that anyone searching for these symbols now has more modules to search through. So, finding symbols will be harder. Take std.algorithm for instance. It was split, because it was getting large enough that compiling it on machines without large amounts of memory resulted in the compiler running out of memory. So, there was a very good argument for splitting it. However, now, even if you know that a symbol is in std.algorithm, do you know where in std.algorithm it is? Some are obvious - e.g. sort is in std.algorithm.sorting. However, others are not so obviously - e.g. where does startsWith live? Arguably, it could go in either std.algorithm.comparison or std.algorithm.searching. It turns out that it's in std.algorithm.searching, but I generally have to look it up. And where to functions like map or filter live? std.algorithm.mutation? std.algorithm.iteration? It's not necessarily obvious at all. >From the perspective of users trying to find stuff, splitting modules up comes at a real cost, and I honestly don't understand why some folks are in a hurry to make module really small. That means more import statements when using those modules, and it means that it's harder to find symbols. Personally, I think that we should be very slow to consider splitting modules and only do so when it's clear that there's a real need, and std.array is nowhere near that level. - Jonathan M Davis
Re: what are guidelines for when to split a module into a package?
On 22/02/2018 8:13 PM, Timothee Cour wrote: from my perspective it makes sense to split a module M into submodules A, B when: * M is large Soft 1k, 2-3k hard limit. At this point either your scope is good, or you need to subdivide it again.