Re: [racket-dev] proposal: `data' collection

2010-07-25 Thread Matthew Flatt
I've pushed the splice-at-file-level change so we can try it out.

There are many uses of `collection-path' that should change to
`collection-file-path'. Most would be easy to change, but as expected,
that's not always the case. For a while, I think file-level splicing
will work well only for collections where that's expected, such as the
possible data collection.
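
A minimal Python model of why callers of `collection-path' can break under file-level splicing. This is only a sketch of the idea, not Racket's implementation: the two functions below are hypothetical stand-ins named after the Racket functions, and the `user' and `main' roots are invented for the demo.

```python
import os
import tempfile

def collection_path(roots, collection):
    """Directory-level lookup: first root containing the collection directory."""
    for root in roots:
        candidate = os.path.join(root, collection)
        if os.path.isdir(candidate):
            return candidate
    raise FileNotFoundError(collection)

def collection_file_path(roots, collection, filename):
    """File-level lookup: first root whose collection directory has the file."""
    for root in roots:
        candidate = os.path.join(root, collection, filename)
        if os.path.isfile(candidate):
            return candidate
    raise FileNotFoundError(os.path.join(collection, filename))

def make_demo_roots(base):
    """Create two roots that each contribute one file to a `data' collection."""
    user = os.path.join(base, "user")
    main = os.path.join(base, "main")
    for root, name in ((user, "foo.rkt"), (main, "bar.rkt")):
        os.makedirs(os.path.join(root, "data"))
        with open(os.path.join(root, "data", name), "w") as f:
            f.write("#lang racket/base\n")
    return [user, main]

with tempfile.TemporaryDirectory() as base:
    roots = make_demo_roots(base)
    # Old style: find the collection directory, then append a file name.
    old_style = os.path.join(collection_path(roots, "data"), "bar.rkt")
    # New style: let the file name take part in the search.
    new_style = collection_file_path(roots, "data", "bar.rkt")
    old_style_found = os.path.isfile(old_style)
    new_style_found = os.path.isfile(new_style)
    print(old_style_found, new_style_found)
```

In this model the old-style caller looks only in the first root that has a `data' directory and misses bar.rkt, while the file-level lookup finds it in the second root.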

At Fri, 9 Jul 2010 13:10:26 -0600, Matthew Flatt wrote:
 At Wed, 30 Jun 2010 22:28:48 -0400, Eli Barzilay wrote:
  Back to `data', the problem is that you cannot have two toplevel
  `data' collections -- which means that you cannot have separate
  distributions of `data/foo' and `data/bar' since they must both appear
  in your plt installation or in your user directory -- not in both.
 
 The more I think about it, the more I'm convinced that it's ok to
 splice collections at the file level instead of the directory level:
 
  * Splicing at the file level doesn't create any issues for resolving
    module names: There's already a search path to find the directory
    for a collection, and the filename is known at that point, so the
    filename could be used as part of the search.
 
  * The `collection-path' function would have to be deprecated, and we'd
    add a `collection-file-path' function that splices at the file
    level.
 
    Most uses of `collection-path' could be easily replaced with
    `collection-file-path'.
 
    Some other uses of `collection-path' don't particularly need
    splicing (e.g., locating a file used by a test suite).
 
    A Planet package (or some other code outside the main development
    repository) might use `collection-path' in a way that would break if
    a collection is spliced at the file level. If the package is useful
    enough, I imagine there will be plenty of time to fix it before
    file-level splicing becomes common.
 
 Does anyone see a problem that I've overlooked?

_
  For list-related administrative tasks:
  http://lists.racket-lang.org/listinfo/dev


Re: [racket-dev] proposal: `data' collection

2010-07-10 Thread Matthew Flatt
At Sat, 10 Jul 2010 01:48:09 -0400, Eli Barzilay wrote:
 On Jul  9, Matthew Flatt wrote:
  At Wed, 30 Jun 2010 22:28:48 -0400, Eli Barzilay wrote:
   Back to `data', the problem is that you cannot have two toplevel
   `data' collections -- which means that you cannot have separate
   distributions of `data/foo' and `data/bar' since they must both
   appear in your plt installation or in your user directory -- not
   in both.
  
  The more I think about it, the more I'm convinced that it's ok to
  splice collections at the file level instead of the directory level:
  
   * Splicing at the file level doesn't create any issues for
     resolving module names: There's already a search path to find the
     directory for a collection, and the filename is known at that
     point, so the filename could be used as part of the search.
 
 Was there an issue of efficiency?  

I don't think so.

 Also, I think that there was a
 potential issue with, for example, collects/foo/bar.rkt having:
 
   (require foo/blah)
   (require "blah.rkt")
 
 not have the same meaning, since the first would search for the file
 in all roots, and the second is always in the same directory. So
 assuming those would be different, there should be a scan on all
 requires in the tree, making sure that the appropriate style is used.

Using "blah.rkt" wouldn't always mean the "blah.rkt" in the same
directory, due to the occasional need for module-path collapsing (as
opposed to module-path resolution). That is, neither style of `require'
would avoid the possibility that "blah.rkt" comes from a different
directory or that there will be a mess if it exists in multiple places.

On points like this, I came to the conclusion that users can just break
things with bad collection configurations. Similar bad things can
happen with info.rkt files. Maybe `raco setup' should perform a
sanity check, but I don't think we can rule out this kind of confusion
by design within our current constraints.
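
A rough Python sketch of the kind of sanity check mentioned above, assuming it amounts to reporting files that more than one collects root provides. The function name and the demo layout are hypothetical; this is not what `raco setup' does.

```python
import os
import tempfile
from collections import defaultdict

def find_shadowed_files(roots):
    """Return {relative path: [roots providing it]} for paths found in 2+ roots."""
    providers = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                providers[rel].append(root)
    return {rel: found for rel, found in providers.items() if len(found) > 1}

def touch(path):
    """Create an empty file, making parent directories as needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()

with tempfile.TemporaryDirectory() as base:
    user = os.path.join(base, "user")
    main = os.path.join(base, "main")
    touch(os.path.join(user, "foo", "blah.rkt"))   # shadows the copy in main
    touch(os.path.join(main, "foo", "blah.rkt"))
    touch(os.path.join(main, "foo", "bar.rkt"))    # provided by one root only
    shadowed = find_shadowed_files([user, main])
    shadowed_names = sorted(shadowed)
    print(shadowed_names)
```

A check like this can only warn about a suspicious configuration; it does not decide which copy was intended.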



Re: [racket-dev] proposal: `data' collection

2010-07-10 Thread Matthew Flatt
At Sat, 10 Jul 2010 09:49:31 -0500, Robby Findler wrote:
 On Sat, Jul 10, 2010 at 9:47 AM, Matthew Flatt mfl...@cs.utah.edu wrote:
  At Sat, 10 Jul 2010 09:35:28 -0500, Robby Findler wrote:
  Just to be sure I understand, you're saying that these two may or may
  not refer to the same file:
 
     (require foo/blah)
     (require "blah.rkt")
 
  right?
 
  Right --- depending on whether the enclosing file is required through a
  `lib' path or through a `file' path, and when "blah.rkt" is shadowed in
  an alternative collects directory.
 
  That much is true already if you shadow the foo collection through a
  different collects. Currently, I think the "blah.rkt" form will
  always refer to a file in the same directory as the enclosing module,
  but I'm not certain.
 
 That seems like it matches the principle of least surprise: quote
 marks mean relative directories and things like (require foo/blah)
 mean go find it in the collection tree. So the difference is
 syntactically apparent.

I agree that it's a nice property. I'm just not convinced that it's
crucial, and even if it holds now, maybe we can live without it in
exchange for collection splicing at the file level.

Along similar lines, various things can go wrong if there are multiple
paths to your main collects tree. I have /home/mflatt symlinked to
/Users/mflatt on my machine, and sometimes I end up referring to the
`racket' binary through one path while requiring a file through the
other path --- which breaks the expected correspondence between `file'
and `lib' paths. The sandbox.rktl test suite goes wrong when that
happens, for example, but I also hit other problems. I suppose I keep
my broken configuration just to maintain a sense of how often things go
wrong and how reasonable it is to try to defend against such problems.
So far, I'm left with the sense that defense against broken
configurations is too hard.
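
A small Python illustration of the mismatch described above, assuming a POSIX-like system where symlinks can be created. The directory names mirror the /home and /Users example but live in a temporary directory, and `same_tree' is a hypothetical helper. It only shows why two spellings of one tree compare unequal as strings; it is not a claim about how Racket compares paths.

```python
import os
import tempfile

def same_tree(path_a, path_b):
    """Compare two paths after resolving symlinks."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)

with tempfile.TemporaryDirectory() as base:
    real_home = os.path.join(base, "Users", "mflatt")
    os.makedirs(os.path.join(real_home, "collects"))
    os.makedirs(os.path.join(base, "home"))
    alias_home = os.path.join(base, "home", "mflatt")
    os.symlink(real_home, alias_home)  # may need extra privileges on Windows

    via_real = os.path.join(real_home, "collects")
    via_alias = os.path.join(alias_home, "collects")
    naive_equal = (via_real == via_alias)            # plain string comparison
    resolved_equal = same_tree(via_real, via_alias)  # after symlink resolution
    print(naive_equal, resolved_equal)
```

Resolving symlinks covers this one case; it does not cover every way a configuration can be inconsistent.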


Re: [racket-dev] proposal: `data' collection

2010-07-09 Thread Matthew Flatt
At Wed, 30 Jun 2010 22:28:48 -0400, Eli Barzilay wrote:
 Back to `data', the problem is that you cannot have two toplevel
 `data' collections -- which means that you cannot have separate
 distributions of `data/foo' and `data/bar' since they must both appear
 in your plt installation or in your user directory -- not in both.

The more I think about it, the more I'm convinced that it's ok to
splice collections at the file level instead of the directory level:

 * Splicing at the file level doesn't create any issues for resolving
   module names: There's already a search path to find the directory
   for a collection, and the filename is known at that point, so the
   filename could be used as part of the search.

 * The `collection-path' function would have to be deprecated, and we'd
   add a `collection-file-path' function that splices at the file
   level.

   Most uses of `collection-path' could be easily replaced with
   `collection-file-path'.

   Some other uses of `collection-path' don't particularly need
   splicing (e.g., locating a file used by a test suite).

   A Planet package (or some other code outside the main development
   repository) might use `collection-path' in a way that would break if
   a collection is spliced at the file level. If the package is useful
   enough, I imagine there will be plenty of time to fix it before
   file-level splicing becomes common.

Does anyone see a problem that I've overlooked?



Re: [racket-dev] proposal: `data' collection

2010-07-09 Thread Eli Barzilay
On Jul  9, Matthew Flatt wrote:
 At Wed, 30 Jun 2010 22:28:48 -0400, Eli Barzilay wrote:
  Back to `data', the problem is that you cannot have two toplevel
  `data' collections -- which means that you cannot have separate
  distributions of `data/foo' and `data/bar' since they must both
  appear in your plt installation or in your user directory -- not
  in both.
 
 The more I think about it, the more I'm convinced that it's ok to
 splice collections at the file level instead of the directory level:
 
  * Splicing at the file level doesn't create any issues for
    resolving module names: There's already a search path to find the
    directory for a collection, and the filename is known at that
    point, so the filename could be used as part of the search.

Was there an issue of efficiency?  Also, I think that there was a
potential issue with, for example, collects/foo/bar.rkt having:

  (require foo/blah)
  (require "blah.rkt")

not have the same meaning, since the first would search for the file
in all roots, and the second is always in the same directory.  So
assuming those would be different, there should be a scan on all
requires in the tree, making sure that the appropriate style is used.
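
A rough Python sketch of such a scan, assuming it only needs to tell collection-style requires from relative string requires (the form written with quote marks in Racket source). The regular expression is a simplification invented for this example: it handles only `require' forms with a single plain spec and is not a Racket parser.

```python
import re

# Matches (require SPEC) where SPEC is a string literal or a bare module path.
REQUIRE_RE = re.compile(r'\(require\s+("(?:[^"\\]|\\.)*"|[^\s()"]+)\s*\)')

def classify_requires(source):
    """Return (spec, kind) pairs, with kind "relative" or "collection"."""
    results = []
    for match in REQUIRE_RE.finditer(source):
        spec = match.group(1)
        kind = "relative" if spec.startswith('"') else "collection"
        results.append((spec.strip('"'), kind))
    return results

sample = '''
(require foo/blah)
(require "blah.rkt")
'''
found = classify_requires(sample)
print(found)
```

A real scan would also have to handle multi-spec requires and nested forms such as `for-syntax'.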


  * The `collection-path' function would have to be deprecated, and
    we'd add a `collection-file-path' function that splices at the
    file level.
 
    Most uses of `collection-path' could be easily replaced with
    `collection-file-path'.
 
    Some other uses of `collection-path' don't particularly need
    splicing (e.g., locating a file used by a test suite).
 
    A Planet package (or some other code outside the main development
    repository) might use `collection-path' in a way that would break
    if a collection is spliced at the file level. If the package is
    useful enough, I imagine there will be plenty of time to fix it
    before file-level splicing becomes common.

This seems reasonable to me.


 Does anyone see a problem that I've overlooked?

I don't see anything else.


[

But a related question is preparing for some definition of package
(not in the highlevel sense, but in a concrete way as info files were
kind-of used as).  It might be related to the way requires work or
something like that.  For example, something that I thought about
recently: maybe it makes sense to have a rule that private
directories can appear *only* in relative string requires that do not
have .. parts in them.  I think that this would rule out any
unauthorized use of such modules in a simple way -- and perhaps this
can be extended somehow to create packages in a similar way.

]
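
One possible reading of the rule in the bracketed aside, as a Python sketch: a path that mentions a `private' directory is acceptable only when it is a relative string require with no `..' component. The function name and its two-argument shape are assumptions made for this example.

```python
def private_require_allowed(spec, is_string_path):
    """Check one require spec against the proposed rule for private directories.

    spec           -- the module path text, e.g. "private/util.rkt"
    is_string_path -- True for a relative string require, False for a
                      collection-style path such as foo/private/util
    """
    parts = spec.split("/")
    if "private" not in parts:
        return True   # the rule only constrains private directories
    if not is_string_path:
        return False  # collection-style path reaching into a private directory
    return ".." not in parts

checks = [
    private_require_allowed("private/util.rkt", True),     # same subtree
    private_require_allowed("../private/util.rkt", True),  # climbs out with ..
    private_require_allowed("foo/private/util", False),    # collection path
    private_require_allowed("foo/blah", False),            # no private part
]
print(checks)
```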

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!


Re: [racket-dev] proposal: `data' collection

2010-07-08 Thread Matthias Felleisen

This sounds like we should give up on stratification. 


On Jul 7, 2010, at 5:21 PM, Eli Barzilay wrote:

 On Jul  6, Petey Aldous wrote:
 That would be interesting and it would not be terribly difficult to
 instrument setup-plt to do it.
 
 There's no reason to do that -- the data is all there in the dep
 files.  It just needs to be trimmed for the collection name instead of
 the full paths.
 
 I'm attaching a file with a list of all toplevel collections and a hash
 table that maps collection names to what they depend on.  I tried to
 visualize this with graphviz in a number of ways -- and nothing
 produced a graph that would make sense.  I then wrote
 some code and there are two groups of collections -- one huge ball of
 code that is all inter-dependent, and then a bunch of collections that
 are nearly free of dependency problems.  The only problem in the
 latter group is teachpack and deinprogramm, which form a cycle.
 
 So, the unproblematic set of collects is:
 
  2htdp, afm, algol60, combinator-parser, datalog, defaults,
  embedded-gui, eopl, frtime, gui-debugger, guibuilder, handin-client,
  handin-server, hierlist, honu, lazy, macro-debugger, make, mysterx,
  mzcom, plai, plot, preprocessor, racklog, redex, repo-time-stamp,
  schemeunit, sgl, sirmail, slatex, srpersist, swindle,
  test-box-recovery, tex2page, waterworld, games, tests, meta
 
 it would work fine to install them one by one in this order, except
 for {teachpack, deinprogramm} which are their own set.
 
 The problematic set of collects is:
 
  racket, at-exp, browser, compiler, config, drracket, drscheme,
  dynext, errortrace, ffi, file, framework, graphics, help, htdp,
  html, icons, lang, launcher, mred, mrlib, mzlib, mzscheme, net,
  openssl, parser-tools, planet, profile, r5rs, r6rs, rackunit, raco,
  reader, readline, rnrs, s-exp, scheme, scribble, scribblings,
  scriblib, setup, slideshow, srfi, stepper, string-constants, syntax,
  syntax-color, test-engine, texpict, trace, typed, typed-scheme,
  unstable, version, web-server, wxme, xml
 
 deps.rktd
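
The message does not say how the two groups were computed. One standard way to find inter-dependent groups in a dependency table is to take strongly connected components; the Python sketch below uses Tarjan's algorithm. The collection names and edges in `deps' are invented for illustration and are not taken from deps.rktd.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. graph maps a node to the nodes it depends on.

    Components are returned dependencies-first, so the result doubles as an
    install order for the groups.
    """
    index, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = [0]

    def visit(node):
        index[node] = lowlink[node] = counter[0]
        counter[0] += 1
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                lowlink[node] = min(lowlink[node], lowlink[dep])
            elif dep in on_stack:
                lowlink[node] = min(lowlink[node], index[dep])
        if lowlink[node] == index[node]:
            # node is the root of a component: pop its members off the stack
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components

# Hypothetical dependency table (collection -> collections it requires).
deps = {
    "base": ["util"],
    "util": ["base"],
    "gui": ["base"],
    "teach-a": ["teach-b", "gui"],
    "teach-b": ["teach-a"],
    "tool": ["gui"],
}
components = strongly_connected_components(deps)
cycles = [c for c in components if len(c) > 1]
print(components)
```

Any component with more than one member is a cycle that has to be installed as a unit; single-member components can be installed one by one in the order returned.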
 -- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!



Re: [racket-dev] proposal: `data' collection

2010-07-07 Thread Petey Aldous
That would be interesting and it would not be terribly difficult to
instrument setup-plt to do it.

I can't promise to have any time in the lab between now and the beginning of
my time at the University of Utah, so if you'd like it done anytime soon, I
won't be of much help.

- Petey

-Original Message-
From: Matthias Felleisen [mailto:matth...@ccs.neu.edu] 
Sent: Sunday, July 04, 2010 4:16 PM
To: Petey Aldous
Cc: 'Jay McCarthy'; 'Robby Findler'; dev@racket-lang.org
Subject: Re: [racket-dev] proposal: `data' collection


Wouldn't the more interesting thing be to measure the connectivity at the
collects level (not the files) and to discover cycles in this graph? --
Matthias



On Jul 2, 2010, at 5:50 PM, Petey Aldous wrote:

 Here it is. This is a simplified dependency graph; rather than showing
file-to-file dependencies, it shows file dependencies from collection to
collection. Cheers!
 
 - Petey
 
 -Original Message-
 From: Jay McCarthy [mailto:jay.mccar...@gmail.com] 
 Sent: Friday, July 02, 2010 5:22 AM
 To: Robby Findler
 Cc: Eli Barzilay; dev@racket-lang.org; Petey Aldous
 Subject: Re: [racket-dev] proposal: `data' collection
 
 On Fri, Jul 2, 2010 at 5:17 AM, Robby Findler
 ro...@eecs.northwestern.edu wrote:
 Those numbers seem pretty small in today's disk sizes, but I do agree
 that there is value in being able to divide up the distribution and to
 be able to stratify things so we can better keep track of our
 dependencies.
 
 I feel like I routinely download programs and dev environments where
 the distribution is over 100MBs.
 
 (BTW, just a random question: have you thought about
 trying to visualize the collection-level dependencies with, say, dot?)
 
 My student did that. It is absurd. I'll CC him to get the image.
 
 Jay
 
 
 It seems like you're after something that would allow multiple
 collections with the same name. Is that part of it, all of it, or
 mostly irrelevant to your main issue?
 
 Robby
 
 On Fri, Jul 2, 2010 at 1:15 AM, Eli Barzilay e...@barzilay.org wrote:
 [Sorry for the late reply.]
 
 
 On Jun 30, Matthias Felleisen wrote:
 Which part is a symptom? My request for a description when there's
 no owner?
 
 The no-owner fact?
 
 The unstable collects?
 
 All of the above.
 
 Here are some questions that can demonstrate the problem better:
 
 1. What text would you expect to find in the purpose.txt file of
  `unstable'?  Of `data'?
 
 2. My course code is installed in a local collection named `pl'.  Why
  would I need to rename it if a new `pl' module was added to the
  racket distribution?
 
 3. Say that you want to install apache on your machine.  What would
  you think if your OS tells you that you need to install powerpoint
  for that?
 
 4. Assuming that there is a `data' collection with a few known data
  structures implemented, what happens when there's another data
  structure that happens to be just the thing for some project X
  and otherwise it's not too useful, or at least it seems that way.
  Why can't project X come with a new data/foo module?
 
 In any case, keep in mind that there is another way to make me stop
 saying "coherent" and "package" -- give up the idea of ever getting a
 smaller racket distribution, and the problem is solved.  We won't even
 need the distribution specs, since everything will be included...
 (From my POV, this would work out great since it looks like the
 general attitude towards it is that it's just something that *I*
 choose to be concerned with, and otherwise there's no problems.)
 
 For reference, here's a table of installer sizes (the Windows one,
 which has the highest compression) and source bundle size (the unix
 one, which has the highest compression in the sources bundles), with
 roughly one representative per year:
 
            bin   src
 ver  year  size  size
 ---  ----  ----  ----
  53  1998  2.6M
 103  2000  3.4M  4.6M
 200  2001  4.3M  6.7M
 203  2002  4.8M  6.0M
 205  2003  5.8M  7.6M
 209  2004  8.4M  11M
 300  2005  12M   13M
 372  2007  14M   15M
 4.0  2008  22M   14M
 4.2  2009  25M   15M
 5.0  2010  28M   16M
 
 --
 ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
   http://barzilay.org/   Maze is Life!
 
 
 
 
 -- 
 Jay McCarthy j...@cs.byu.edu
 Assistant Professor / Brigham Young University
 http://teammccarthy.org/jay
 
 The glory of God is Intelligence - DC 93
 dag.png

Re: [racket-dev] proposal: `data' collection

2010-07-02 Thread Robby Findler
Those numbers seem pretty small in today's disk sizes, but I do agree
that there is value in being able to divide up the distribution and to
be able to stratify things so we can better keep track of our
dependencies. (BTW, just a random question: have you thought about
trying to visualize the collection-level dependencies with, say, dot?)

It seems like you're after something that would allow multiple
collections with the same name. Is that part of it, all of it, or
mostly irrelevant to your main issue?

Robby

On Fri, Jul 2, 2010 at 1:15 AM, Eli Barzilay e...@barzilay.org wrote:
 [Sorry for the late reply.]


 On Jun 30, Matthias Felleisen wrote:
 Which part is a symptom? My request for a description when there's
 no owner?

 The no-owner fact?

 The unstable collects?

 All of the above.

 Here are some questions that can demonstrate the problem better:

 1. What text would you expect to find in the purpose.txt file of
   `unstable'?  Of `data'?

 2. My course code is installed in a local collection named `pl'.  Why
   would I need to rename it if a new `pl' module was added to the
   racket distribution?

 3. Say that you want to install apache on your machine.  What would
   you think if your OS tells you that you need to install powerpoint
   for that?

 4. Assuming that there is a `data' collection with a few known data
   structures implemented, what happens when there's another data
   structure that happens to be just the thing for some project X
   and otherwise it's not too useful, or at least it seems that way.
   Why can't project X come with a new data/foo module?

 In any case, keep in mind that there is another way to make me stop
 saying "coherent" and "package" -- give up the idea of ever getting a
 smaller racket distribution, and the problem is solved.  We won't even
 need the distribution specs, since everything will be included...
 (From my POV, this would work out great since it looks like the
 general attitude towards it is that it's just something that *I*
 choose to be concerned with, and otherwise there's no problems.)

 For reference, here's a table of installer sizes (the Windows one,
 which has the highest compression) and source bundle size (the unix
 one, which has the highest compression in the sources bundles), with
 roughly one representative per year:

                 bin   src
      ver  year  size  size
      ---  ----  ----  ----
       53  1998  2.6M
      103  2000  3.4M  4.6M
      200  2001  4.3M  6.7M
      203  2002  4.8M  6.0M
      205  2003  5.8M  7.6M
      209  2004  8.4M  11M
      300  2005  12M   13M
      372  2007  14M   15M
      4.0  2008  22M   14M
      4.2  2009  25M   15M
      5.0  2010  28M   16M
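
For a sense of scale, the growth in the binary installer column can be worked out with a few lines of Python. The sizes are copied from the `bin' column above (in MB); the choice of a compound annual rate as the summary is an assumption of this sketch, not something the message computes.

```python
# Installer sizes in MB, keyed by year, from the "bin size" column.
installer_mb = {
    1998: 2.6, 2000: 3.4, 2001: 4.3, 2002: 4.8, 2003: 5.8, 2004: 8.4,
    2005: 12, 2007: 14, 2008: 22, 2009: 25, 2010: 28,
}

def annual_growth(sizes):
    """Compound annual growth rate between the first and last year."""
    first_year, last_year = min(sizes), max(sizes)
    ratio = sizes[last_year] / sizes[first_year]
    return ratio ** (1 / (last_year - first_year)) - 1

overall_ratio = installer_mb[2010] / installer_mb[1998]
rate = annual_growth(installer_mb)
print(f"{overall_ratio:.1f}x overall, {rate:.1%} per year")
```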

 --
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!


Re: [racket-dev] proposal: `data' collection

2010-07-02 Thread Jay McCarthy
On Fri, Jul 2, 2010 at 5:17 AM, Robby Findler
ro...@eecs.northwestern.edu wrote:
 Those numbers seem pretty small in today's disk sizes, but I do agree
 that there is value in being able to divide up the distribution and to
 be able to stratify things so we can better keep track of our
 dependencies.

I feel like I routinely download programs and dev environments where
the distribution is over 100MBs.

 (BTW, just a random question: have you thought about
 trying to visualize the collection-level dependencies with, say, dot?)

My student did that. It is absurd. I'll CC him to get the image.

Jay


 It seems like you're after something that would allow multiple
 collections with the same name. Is that part of it, all of it, or
 mostly irrelevant to your main issue?

 Robby

 On Fri, Jul 2, 2010 at 1:15 AM, Eli Barzilay e...@barzilay.org wrote:
 [Sorry for the late reply.]


 On Jun 30, Matthias Felleisen wrote:
 Which part is a symptom? My request for a description when there's
 no owner?

 The no-owner fact?

 The unstable collects?

 All of the above.

 Here are some questions that can demonstrate the problem better:

 1. What text would you expect to find in the purpose.txt file of
   `unstable'?  Of `data'?

 2. My course code is installed in a local collection named `pl'.  Why
   would I need to rename it if a new `pl' module was added to the
   racket distribution?

 3. Say that you want to install apache on your machine.  What would
   you think if your OS tells you that you need to install powerpoint
   for that?

 4. Assuming that there is a `data' collection with a few known data
   structures implemented, what happens when there's another data
   structure that happens to be just the thing for some project X
   and otherwise it's not too useful, or at least it seems that way.
   Why can't project X come with a new data/foo module?

 In any case, keep in mind that there is another way to make me stop
 saying "coherent" and "package" -- give up the idea of ever getting a
 smaller racket distribution, and the problem is solved.  We won't even
 need the distribution specs, since everything will be included...
 (From my POV, this would work out great since it looks like the
 general attitude towards it is that it's just something that *I*
 choose to be concerned with, and otherwise there's no problems.)

 For reference, here's a table of installer sizes (the Windows one,
 which has the highest compression) and source bundle size (the unix
 one, which has the highest compression in the sources bundles), with
 roughly one representative per year:

                 bin   src
      ver  year  size  size
      ---  ----  ----  ----
       53  1998  2.6M
      103  2000  3.4M  4.6M
      200  2001  4.3M  6.7M
      203  2002  4.8M  6.0M
      205  2003  5.8M  7.6M
      209  2004  8.4M  11M
      300  2005  12M   13M
      372  2007  14M   15M
      4.0  2008  22M   14M
      4.2  2009  25M   15M
      5.0  2010  28M   16M

 --
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!




-- 
Jay McCarthy j...@cs.byu.edu
Assistant Professor / Brigham Young University
http://teammccarthy.org/jay

The glory of God is Intelligence - DC 93

Re: [racket-dev] proposal: `data' collection

2010-07-01 Thread Robby Findler
This sounds like a good plan to me. Taking a month to think thru the
issues carefully to plan the talk can only have good consequences for
the discussion. And delaying the solution for the release after August
does not seem like it hurts anything.

Robby

On Wed, Jun 30, 2010 at 9:34 PM, Matthias Felleisen
matth...@ccs.neu.edu wrote:

 I do not understand your answer.

 I nominate you for a major speech at PLT day.
 You can get up to 30 mins full time and 30 mins
 of discussion time.

 If things aren't clear after that, you will never
 be allowed again to use the words 'coherent' and
 'package.'

 Until then I propose we postpone collects/data/ .

 This is a one-month delay and I think in our interest.




 On Jun 30, 2010, at 10:28 PM, Eli Barzilay wrote:

 On Jun 30, Matthias Felleisen wrote:
 Eli, I do not understand and/or appreciate your objection.

 Here is what I understand:

 -- you believe that top-level collects are something coherent
    Q: could you explain this? In what sense is lang/ more or less
    coherent than data/

 They are currently coherent by necessity, because very little can be
 done with their contents.  For example, the `lang' collection has both
 the student languages and the r5rs implementation -- and this was a
 mess because they cannot be separated.  Later on this was cleared up
 by moving the r5rs code into its own collection.  Another problem
 around `lang': we've talked about moving generic language support
 there (like `syntax/module-reader'), but this is impractical to do
 with the current setup.

 Back to `data', the problem is that you cannot have two toplevel
 `data' collections -- which means that you cannot have separate
 distributions of `data/foo' and `data/bar' since they must both appear
 in your plt installation or in your user directory -- not in both.


 -- you introduce the notion of a package
    Q: what is that and how does it differ from a collects?

 A package would be a (coherent) unit of distribution -- it should be
 possible to distribute it as an independent unit, it should have an
 owner (one or more people) -- but most of all, it should be clear what
 code is in the package.  Currently, we only have collections: some with
 no owner, some with files that are owned by different people.  And to
 put things very concretely, I want to start working on the
 distribution thing -- and a solution to this problem is *needed*.
 I think that we're beyond the reasonable limit of a monolithic
 distribution -- so splitting it up to such packages is necessary.

 (That's why I said that the name is only a symptom, and that overall I
 *want* to see a solution to this.  And I want one now (as in august),
 not in some hypothetical future.)

 --
          ((lambda (x) (x x)) (lambda (x) (x x)))          Eli Barzilay:
                    http://barzilay.org/                   Maze is Life!



Re: [racket-dev] proposal: `data' collection

2010-06-24 Thread Matthias Felleisen

On Jun 23, 2010, at 5:37 PM, Sam Tobin-Hochstadt wrote:

 To clarify, I'm proposing that this be a part of the core 


I agree with this goal and the name. We could call it 'collections' hierarchy 
as in Java, but I don't think that this is a good name. Ideally, I'd like to 
call it data-structure but that isn't a good path element. 


Re: [racket-dev] proposal: `data' collection

2010-06-24 Thread Eli Barzilay
On Jun 24, Matthias Felleisen wrote:
 On Jun 23, 2010, at 5:37 PM, Sam Tobin-Hochstadt wrote:
 
  To clarify, I'm proposing that this be a part of the core
 
 I agree with this goal and the name.

[BTW, when I talked about part of the core earlier, the meaning was
the actual `racket' collection -- the area where it's difficult to get
into because you're running into all kinds of circularity problems.
IIUC, Sam's meaning is more of a core distribution, which is much
easier to deal with.  (And I'm not making any opinions about TR being
more in the core in the former meaning -- if we take types seriously,
then that's probably the better way to go (with the untyped language
being layered on top), but that's a much more fundamental change than
distribution issues.)]


 We could call it 'collections' hierarchy as in Java, but I don't
 think that this is a good name. Ideally, I'd like to call it
 data-structure but that isn't a good path element.

+1 on both.  `data' does seem to me better than both of these, but I
still dislike it since it's a vague name like `etc'.  Here's an
attempted clarification of what bothers me about it, and possibly
something to think about before August.

Currently, we use the toplevel collections as units of coherent pieces
of code -- they match both how the code is layered (at least it
should) and how it's distributed.  Yes, the plan for a minimal
distribution is still not concrete -- but we're already doing that.
For example, planet's granularity is by collection, and most of the
distribution specs are in terms of collections too.  The bottom
line is that currently we have a toplevel collection as something
that roughly corresponds to a package.

Now, a name as generic as `data' is going outside of this role.  It's
likely to have in there general core things like `data/list' as well
as specific things like some queue that is optimized for a specific
task or perhaps a persistent set that is backed by a database.
Because of this I view `data' as a bad choice -- at least as long as
we have the current meaning of a collection.  Even if the decision
was not made consciously, I think that the fact that the core data
types are in the `racket' collection is a direct byproduct of this
issue too.  (There's also the fact that such libraries might need to
behave differently -- for example, not getting an error in terms of
`vector-length' when the original call was some Honu `x.length()'.)

Perhaps the role of (toplevel) collections should change.  More
likely, it's about time we decide -- concretely -- on defining
packages.  These would be relevant for planet, for minimizing the
distribution, and a whole bunch of other issues that depend on this.

It seems reasonable to define these packages somehow as either
toplevel collections or complete subtrees of them, with some way of
specifying which directory (or maybe a group of sibling directories)
is the root of a package.  But this requires modifying the current
way that toplevel collections are spliced together -- for example, you
should be able to install a user-specific data/foo package
(something that is not possible now).

In that case, a generic name like `data' works out much better.  Since
that's a separate issue, my objection to `data' is based on the
current state of the system.

-- 
  ((lambda (x) (x x)) (lambda (x) (x x)))  Eli Barzilay:
http://barzilay.org/   Maze is Life!