Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-11 Thread Dan Sugalski

At 11:36 PM 2/11/2001 -0500, Sam Tregar wrote:
>On Sun, 11 Feb 2001, Jan Dubois wrote:
>
> > However, I couldn't solve the problem of "deterministic destruction
> > behavior": Currently Perl will call DESTROY on any object as soon as the
> > last reference to it goes out of scope.  This becomes important if the
> > object owns scarce external resources (e.g. file handles or database
> > connections) that are only freed during DESTROY.  Postponing DESTROY until
> > an indeterminate time in the future can lead to program failures due to
> > resource exhaustion.
>
>Well put.  Can we finally admit that if we want Perl to DWIM with respect
>to DESTROY, we need to keep counting references?

Perl needs some level of tracking for objects with finalization attached to 
them. Full refcounting isn't required, however. Also, the vast majority of 
perl variables have no finalization attached to them.
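
(To make that concrete: a class only needs finalization tracking if it 
has a DESTROY.  Both classes below are made up for illustration.)

  package PlainData;            # no DESTROY: the collector can reclaim
  sub new { bless {}, shift }   # instances without calling back into Perl

  package TempFile;             # has DESTROY: needs finalization tracking
  sub new     { my ($class, $path) = @_; bless { path => $path }, $class }
  sub DESTROY { unlink $_[0]{path} }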

I do wish people would get garbage collection and finalization split in 
their minds. They are two separate things which can, and will, be dealt 
with separately.

For the record:

THE GARBAGE COLLECTOR WILL HAVE NOTHING TO DO WITH FINALIZATION, AND NO 
PERL OBJECT CODE WILL BE CALLED FOR VARIABLES UNDERGOING GARBAGE COLLECTION.

Thank you.

I do wish this stuff would flare up during the week...

>Speaking of which, do any of the high priests know when Larry might come
>down off the mountain?  Any day now the true believers are going to melt
>down their copies of Camel III and cast themselves a golden Python.

The Cabal Magic 5-ball says "Outlook cloudy, try again later".

Dan

--"it's like this"---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-11 Thread Sam Tregar

On Sun, 11 Feb 2001, Jan Dubois wrote:

> However, I couldn't solve the problem of "deterministic destruction
> behavior": Currently Perl will call DESTROY on any object as soon as the
> last reference to it goes out of scope.  This becomes important if the
> object owns scarce external resources (e.g. file handles or database
> connections) that are only freed during DESTROY.  Postponing DESTROY until
> an indeterminate time in the future can lead to program failures due to
> resource exhaustion.

Well put.  Can we finally admit that if we want Perl to DWIM with respect
to DESTROY, we need to keep counting references?  I certainly hope so.
I think this research project has gone on long enough.  If Larry comes
back and says that DESTROYs can be called in a non-deterministic fashion
I'll be very surprised and we can certainly revive the GC debate then.
Until then I think we might as well accept that ref-counting is here to
stay.

Speaking of which, do any of the high priests know when Larry might come
down off the mountain?  Any day now the true believers are going to melt
down their copies of Camel III and cast themselves a golden Python.

-sam





Re: Garbage collection

2001-02-11 Thread Bryan C. Warnock

crossed to -internals

Jan Dubois:
> Not necessarily; you would have to implement it that way: When you try to
> open a file and you don't succeed, you run the garbage collector and try
> again.  But what happens in the case of XS code: some external library
> tries to open a file and gets a failure.  How would it trigger a GC in the
> Perl internals?  It wouldn't even know that it had been embedded in a
> Perl app.

But that would be the point of the API, no?  Even in XS, you'd interface 
through perl for memory or file management.  So the core would still be 
able to invoke the GC.  Granted, these are last-ditch efforts anyway - what 
would actually need to trigger one?  E[MN]FILE?  ENOMEM?  Weird cases of 
ENOSPC?  If you happen to hit one, force a GC pass and retry whatever the 
call was.  Even if the GC is unsuccessful (at resource reclamation), 
wouldn't you still want Perl, rather than the XS code, to be the one to panic?
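
Something like this retry, sketched in Perl (run_gc() is a made-up name 
for whatever hook the core would expose; a sketch, not a proposal):

  use Errno;

  sub gc_open {
      my ($path) = @_;
      my $fh;
      return $fh if open $fh, '<', $path;
      if ($!{EMFILE} || $!{ENFILE} || $!{ENOMEM}) {
          run_gc();                            # force a collection pass
          return $fh if open $fh, '<', $path;  # ...and retry once
      }
      die "open $path: $!";                    # GC didn't help: panic
  }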

>
> This scheme would only work if *all* resources including memory and
> garbage collection are handled by the OS (or at least by a virtual machine
> like JVM or .NET runtime).  But this still doesn't solve the destruction
> order problem.

Well, no.  My thought would be that if A needed to be destroyed before B, 
then B wouldn't/shouldn't be marked for GC until after A was destroyed.  It 
might take several sweeps to clean an entire dependency tree, unfortunately.
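
A sketch of what one such sweep might look like (the must_destroy_first 
list and finalized flag are invented bookkeeping, not anything real):

  sub sweep {
      my @dead = @_;            # objects found unreachable on this pass
      my @deferred;
      for my $obj (@dead) {
          if (grep { !$_->{finalized} } @{ $obj->{must_destroy_first} || [] }) {
              push @deferred, $obj;   # B waits until A has been destroyed
          }
          else {
              $obj->DESTROY;
              $obj->{finalized} = 1;
          }
      }
      return @deferred;         # left for the next sweep to try again
  }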

-- 
Bryan C. Warnock
bwarnock@(gtemail.net|capita.com)



Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-11 Thread Jan Dubois

On Sun, 11 Feb 2001 21:11:09 -0500, "Bryan C. Warnock"
<[EMAIL PROTECTED]> wrote:

>On Sunday 11 February 2001 19:08, Jan Dubois wrote:
>> However, I couldn't solve the problem of "deterministic destruction
>> behavior": Currently Perl will call DESTROY on any object as soon as the
>> last reference to it goes out of scope.  This becomes important if the
>> object owns scarce external resources (e.g. file handles or database
>> connections) that are only freed during DESTROY.  Postponing DESTROY until
>> an indeterminate time in the future can lead to program failures due to
>> resource exhaustion.
>
>But doesn't resource exhaustion usually trigger garbage collection and 
>resource reallocation?  (Not that this addresses the remainder of your 
>post.)

Not necessarily; you would have to implement it that way: When you try to
open a file and you don't succeed, you run the garbage collector and try
again.  But what happens in the case of XS code: some external library
tries to open a file and gets a failure.  How would it trigger a GC in the
Perl internals?  It wouldn't even know that it had been embedded in a
Perl app.

This scheme would only work if *all* resources including memory and
garbage collection are handled by the OS (or at least by a virtual machine
like JVM or .NET runtime).  But this still doesn't solve the destruction
order problem.

-Jan




Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-11 Thread Bryan C. Warnock

On Sunday 11 February 2001 19:08, Jan Dubois wrote:
> However, I couldn't solve the problem of "deterministic destruction
> behavior": Currently Perl will call DESTROY on any object as soon as the
> last reference to it goes out of scope.  This becomes important if the
> object owns scarce external resources (e.g. file handles or database
> connections) that are only freed during DESTROY.  Postponing DESTROY until
> an indeterminate time in the future can lead to program failures due to
> resource exhaustion.

But doesn't resource exhaustion usually trigger garbage collection and 
resource reallocation?  (Not that this addresses the remainder of your 
post.)

-- 
Bryan C. Warnock
bwarnock@(gtemail.net|capita.com)



Re: Garbage collection (was Re: JWZ on s/Java/Perl/)

2001-02-11 Thread Jan Dubois

On Fri, 09 Feb 2001 13:19:36 -0500, Dan Sugalski <[EMAIL PROTECTED]> wrote:

>Almost all refcounting schemes are messy. That's one of their problems. A 
>mark and sweep GC system tends to be less prone to leaks because of program 
>bugs, and when it *does* leak, the leaks tend to be large. Plus the code to 
>do the GC work is very localized, which tends not to be the case in 
>refcounting schemes.
>
>Going to a more advanced garbage collection scheme certainly isn't a 
>universal panacea--mark and sweep in perl 6 will *not* bring about world 
>peace or anything. It will (hopefully) make our lives easier, though.

I currently don't have much time to follow the perl6 discussions, so I
might have missed this, but I have some questions about abandoning
reference counts for Perl internals.  When I reimplemented some of the
Perl guts in C# last year for the 'Perl for .NET' research project, I
tried to get rid of reference counting because the runtime already
provides a generational garbage collection scheme.

However, I couldn't solve the problem of "deterministic destruction
behavior": Currently Perl will call DESTROY on any object as soon as the
last reference to it goes out of scope.  This becomes important if the
object owns scarce external resources (e.g. file handles or database
connections) that are only freed during DESTROY.  Postponing DESTROY until
an indeterminate time in the future can lead to program failures due to
resource exhaustion.
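
To make the stakes concrete (FileThing is a hypothetical class whose 
DESTROY closes the underlying handle):

  for my $i (1 .. 10_000) {
      my $f = FileThing->new("scratch.$i");
  }   # With refcounting, DESTROY runs here on every iteration and the
      # handle is closed immediately.  With deferred GC, DESTROY waits
      # for some later collection, and this loop can run out of file
      # descriptors long before the first handle is reclaimed.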

The second problem is destruction order:  With reference counts you can
honor a dependency graph between objects.  Without them, destruction can
only happen in an arbitrary order, which is sometimes a problem: You may have a
database connection and a recordset.  The recordset may need to be
DESTROYed first because it may contain unsaved data that still needs to be
written back to the database.
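
For instance (hypothetical classes, just to show the dependency):

  package RecordSet;
  sub DESTROY {                       # must run while the connection
      my $self = shift;               # is still alive
      $self->{conn}->write_back($self->{dirty}) if $self->{dirty};
  }

  package Connection;
  sub DESTROY { $_[0]->disconnect }

With reference counts the recordset, which holds a reference to the 
connection, always drops to zero first, so the flush is safe.  With 
unordered finalization the connection may already be gone when the 
recordset tries to write back.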

I've discussed this with Sarathy multiple times over the last year,
and he insists that relying on DESTROY for resource cleanup is bad style
and shouldn't be done anyway.  But always explicitly calling e.g. Close()
or whatever is pretty messy at the application level: you have to use
eval{} blocks all over the place to guarantee that Close() is called even
when something else blows up.
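
That application-level mess looks something like this (Database and its 
Close() are stand-ins, not a real API):

  my $db = Database->connect($dsn);
  my $rs = $db->query($sql);
  eval {
      process($rs);             # anything in here may die
  };
  my $err = $@;
  $rs->Close();                 # must happen even on failure
  $db->Close();
  die $err if $err;

Every scope that owns a resource needs the same eval{} scaffolding, 
which is exactly the bookkeeping DESTROY normally hides.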

As an implementer I most definitely see the advantages of trading
deterministic destruction behavior for arbitrary sequences of finalizer
calls.  But as a Perl programmer I loathe the additional complexity my
programs would need to stay robust.  There is a reason memory allocation
isn't exposed to the user either. :-)

Have these issues been discussed somewhere for Perl6?  If yes, could you
point me to that discussion?

-Jan




Re: JWZ on s/Java/Perl/

2001-02-11 Thread Ken Fox

[Please be careful with attributions -- I didn't write any
 of the quoted material...]

Russ Allbery wrote:
> >>  sub test {
> >>  my($foo, $bar, %baz);
> >>  ...
> >>  return \%baz;
> >>  }

> That's a pretty fundamental aspect of the Perl language; I use that sort
> of construct all over the place.  We don't want to turn Perl into C, where
> if you want to return anything non-trivial without allocation you have to
> pass in somewhere to put it.

There are no problems at all with that code. It's not going to break under
Perl 6. It's not going to be deprecated -- this is one of the ultimate
Keep Perl Perl language features!

I think that there's a lot of concern and confusion about what it means to
replace perl's current memory manager (aka garbage collector) with something
else. The short-term survival guide for dealing with this is "only believe
what Dan says." The longer-term guide is "only believe what Benchmark says."

There are only three Perl-visible features of a collector that I can think
of (besides the obvious "does it work?"):

1. How fast does it run?
2. How efficient is it? (i.e. what's the overhead?)
3. When does it call object destructors?

It's too early to talk about the first two, but if Perl 6 ends up worse
than Perl 5, something is seriously wrong.

The last has never been defined in Perl, but it's definitely something to
discuss before the internals are written. Changing it could be a *major*
job.

- Ken



Re: JWZ on s/Java/Perl/

2001-02-11 Thread Ken Fox

Bart Lateur wrote:
> On Fri, 09 Feb 2001 12:06:12 -0500, Ken Fox wrote:
> > 1. Cheap allocations. Most fast collectors have a one or two
> >instruction malloc. In C it looks like this:
> >
> >  void *malloc(size_t size) { void *obj = heap; heap += size; return obj; }
> > ...
> 
> That is not a garbage collector.

I said it was an allocator, not a garbage collector. An advanced
garbage collector just makes very simple/fast allocators possible.

> That is "drop everything you don't need, and we'll never use it
> again." Oh, sure, not doing garbage collection at all is faster then
> doing reference counting.

You don't have a clue. The allocator I posted is a very common allocator
used with copying garbage collectors. This is *not* a "pool" allocator
like Apache uses. What happens is that when the heap fills up (often
detected by a fault from touching memory just past the end of the current
space), the collector is triggered. It traverses live data and copies it into a
new space (in a simple copying collector these are called "from" and "to"
spaces). Generational collectors often work similarly, but they have
more than two spaces and special rules for references between spaces.
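
A toy version of the whole cycle, in Perl purely for discussion (a real 
collector traces reachability from the root set; the 'live' flag below 
just stands in for that):

  use constant HEAP_SIZE => 1024;
  my @from_space;                     # the current heap

  sub gc_alloc {                      # the cheap allocator: just bump
      my ($obj) = @_;                 # the end of the heap
      collect() if @from_space >= HEAP_SIZE;
      push @from_space, $obj;
      return $obj;
  }

  sub collect {
      # Copy live data into the to-space; garbage is never even visited,
      # so the work done is proportional to live data, not total data.
      my @to_space = grep { $_->{live} } @from_space;
      @from_space = @to_space;        # the to-space becomes the new heap
  }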

> > 2. Work proportional to live data, not total data. This is hard to
> >believe for a C programmer, but good garbage collectors don't have
> >to "free" every allocation -- they just have to preserve the live,
> >or reachable, data. Some researchers have estimated that 90% or
> >more of all allocated data dies (becomes unreachable) before the
> >next collection. A ref count system has to work on every object,
> >but smarter collectors only work on 10% of the objects.
> 
> That may work for C, but not for Perl.

Um, no. It works pretty well for Lisp, ML, Prolog, etc. I'm positive
that it would work fine for Perl too.

> sub test {
> my($foo, $bar, %baz);
> ...
> return \%baz;
> }
> 
> You may notice that only PART of the locally malloced memory gets
> freed. The memory of %baz may well be in the middle of that pool. You're
> making a huge mistake if you simply declare the whole block dead weight.

You don't understand how collectors work. You can't think about individual
allocations anymore -- that per-allocation mindset is a fundamental and
severe restriction imposed by malloc(). What happens is that the garbage
accumulates until a collection happens. When the collection happens, live
data is saved and the garbage over-written.

In your example above, the memory for $foo and $bar is not reclaimed
until a collection occurs. %baz is live data and will be saved when
the collection occurs (often done by copying it to a new heap space).
Yes, this means it is *totally* unsafe to hold pointers to objects in
places the garbage collector doesn't know about. It also means that
memory working-set sizes may be larger than with a malloc-style system.

There are lots of advantages though -- re-read my previous note.

The one big down-side to non-ref-count GC is that finalization is
delayed until collection -- which may be relatively infrequent when
there's lots of memory. Data flow analysis can allow us to trigger
finalizers earlier, but that's a lot harder than just watching a ref
count.

- Ken



Re: Auto-install (was autoloaded...)

2001-02-11 Thread James Mastros

You should probably also take a look at Debian's packaging format, the .deb.

It consists of an ar archive containing three members: one for the magic
(named debian-binary, containing "2.0"), one for the control metadata
(control.tar.gz), and one for the filesystem image (data.tar.gz).

On Fri, Feb 09, 2001 at 06:17:34PM -0200, Branden wrote:
> | Platform independent | Yes | Yes | Yes |
Yes.

> | Available in a wide  | Yes | No  | Yes |
> | range of platforms   | | (Win32 +/-, | |
> |  | | MacOS, VMS) | |
No -- only Debian, but that includes several HW archs, and both Linux and
the Hurd.  But the source should be portable and abstracted decently.

> | Allow platform   | Yes | Yes | No  |
> | dependent deployment | | | |
Yes.

> | Supports binary, | Yes | Yes | No  |
> | source and bytecode  | | |  (source?)  |
Yes.  Source format is .dsc (metadata) + .orig.tar.gz (upstream) + .diff.gz
(Debian patches).  Keeping that would allow many CPAN packages to be
used unmodified.  Not keeping it would allow for single-file distribution.

> | Install archive  | Yes | Yes | No  |
> | automatically| | |  (manually) |
Yes.

> | Uninstall and| Yes | Yes | No  |
> | upgrade archive  | | | |
Yes.

> | Install, uninstall   | No  | Yes | No  |
> | and upgrade scripts  | (possibly)  | | |
Yes.

> | Run from archive | Yes | No  | Yes |
No, but certainly possible.  (Replace the .tar.gz files with .zips.  Worse
compression, but easier to use individual files.  We could do .bz2.tar or
somesuch, but that's nastier for others to deal with.)

> | Resources| Yes | Yes | Yes |
Yes, I think.  (Not certain what you mean by this.)

> | Documentation| Yes | Yes | No  |
Yes.

> | Supports various | Yes | No  | Yes |
> | modules per archive  | |(yes)| (packages)  |
No (one file, one package).  This could easily be changed, though.

> | Merge many archives  | Yes | No  | Yes |
> | in one   | | | |
No, but it wouldn't be hard with a small extension.

> | Usable with external | Yes | No  | Yes |
> | tools (e.g. WinZip)  | | | |
Yes, with a little pain (ar archive + .tar.gz files).

> | Dependencies of  | Yes | Yes | No  |
> | the archive  |  (included) | | |
Yes.  Complex dependencies are supported (versions between A and B), with
support for autogeneration of dependencies in many cases.

> | Build archive from   | Yes | Yes | No  |
> | source tree  | | (external)  | |
Yes.

> | Could be bundled | Yes |  Probably   |  Maybe (if  |
> | with Perl 6? | | No  |  we bundle  |
> |  | |  (too big)  |  a JVM too) |
Yes.  (I think.)  (The binary of the format-handling program is 110,288
bytes -- a bit on the big side, but not too bad.)

> | Signed archives  | No  | No  | Yes |
No.  (Source packages are signed, though.)  (At present the feature is
planned for the future, and shouldn't be all that hard.)

-=- James Mastros
-- 
"All I really want is somebody to curl up with and pretend the world is a
safe place."
AIM: theorbtwo   homepage: http://www.rtweb.net/theorb/



Re: JWZ on s/Java/Perl/

2001-02-11 Thread Bart Lateur

On Fri, 9 Feb 2001 16:14:34 -0800, Mark Koopman wrote:

>but is this an example of the way people SHOULD code, or simply are ABLE to
>code this?  are we considering deprecating this type of bad style, and forcing
>a programmer to, in this case, supply a ref to %baz in the arguments to
>this sub?

I think you're trying too hard to turn Perl into just another C clone.
Dynamic variable allocation and freeing, like this, is one of the main
selling points for Perl as a language.

Note that %baz can also contain, as values, references to other lexically
scoped variables, like \$foo and \$bar.  There's no prototyping around
that.

>>  sub test {
>>  my($foo, $bar, %baz);
>>  ...
>>  return \%baz;
>>  }

You could, theoretically, create special versions of "my", or a "my"
with an attribute, so that the variables so declared are kept out of the
normal lexical pool and garbage collected in a more elaborate way,
perhaps even by reference counting.
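
Something like this, with the syntax entirely invented:

  sub test {
      my ($foo, $bar);       # plain lexicals: left to the lazy collector
      my %baz : counted;     # made-up attribute: refcounted, so DESTROY
                             # (if any) still fires deterministically
      # ...
      return \%baz;
  }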

-- 
Bart.