Re: [Haskell-cafe] [ANNOUNCE] (and request for review): directory-tree v0.9.0

2010-08-10 Thread Jason Dagit
On Tue, Aug 10, 2010 at 5:54 PM, Brandon Simmons <
brandon.m.simm...@gmail.com> wrote:

> On Tue, Aug 10, 2010 at 4:34 PM, Jason Dagit  wrote:
> >
> >
> > On Mon, Aug 9, 2010 at 10:48 PM, Brandon Simmons
> >  wrote:
> >>
> >> Greetings Haskellers!
> >>
> >> directory-tree is a module providing a directory-tree-like datatype
> >> along with Foldable and Traversable instances, along with a simple,
> >> high-level IO interface. You can see the package along with some
> >> examples here (apologies if the haddock docs haven't been generated
> >> yet) :
> >>
> >>    http://hackage.haskell.org/package/directory-tree
> >
> > If I understand what you're saying, then your library is very similar to an
> > abstraction that darcs had for years known as "Slurpy".  The experience in
> > the darcs project was that it led to performance issues and correctness
> > issues that were hard to find/fix.
> >>
> >> The primary change in this release is the addition of two
> >> experimental "lazy" functions: `readDirectoryWithL` and `buildL`.
> >> These functions use `unsafePerformIO` behind the scenes to traverse
> >> the filesystem as required by pure computations consuming the returned
> >> DirTree data structure. I believe I am doing this safely and sanely
> >> but would love if some more experienced folks could comment on the
> >> code.
> >
> > unsafePerformIO or unsafeInterleaveIO?
> > Either way, to me it seems a bit dangerous to be doing this sort of lazy IO.
> > If the directory structure is large will I run out of file handles?  How
> > will IO errors be handled?  Will I receive the exceptions in pure code or
> > inside my IO actions?  Will I run into space leaks if something holds on to
> > 1 file and then references it "after" the directory traversal?  I might have
> > my history wrong, but as I recall darcs started with lazy slurpies and moved
> > to doing things strictly due to space leaks, running out of file
> > descriptors, file descriptor leaks (not running out, but having the file be
> > locked long after darcs should have been 'done' with it), and exception
> > delivery.
>
> IO Errors are caught in a pure constructor called "Failed". In
> practice I think my unsafe version is better in many of those respects
> than the original, for example with regard to running out of file
> handles. Are you referring to lazy IO in general, which those problems
> you mention seem to apply to, or the use of unsafePerformIO?
>

It boils down to the same thing, right?


>
> I certainly want this module to be as useful and problem-free as
> possible, but I will be content if it is no more problematic than lazy
> IO already is.
>
> Could you elaborate on
>
> > "Will I run into space leaks if something holds on to 1 file and
> > then references it "after" the directory traversal"?
>
>
Let me give you an example.  Prelude's readFile is lazy.  That is, it
returns immediately and then only fetches from the file as you demand the
contents of the file.  This makes it possible to stream the file.  If you
process it in chunks, say 1 line at a time, then you can do so in constant
space.

If you then let the contents of the file escape, meaning somewhere else in
the processing references it, then you'll stop streaming it and start
holding on to the whole thing at once.  Something like this, untested:

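-- Streams: each line can be collected once it has been printed.
notleaky1 :: IO ()
notleaky1 = do
  xs <- readFile "foo"
  mapM_ print (lines xs)

-- Streams: length consumes xs and nothing else refers to it.
notleaky2 :: IO ()
notleaky2 = do
  xs <- readFile "foo"
  print (length xs)

-- Leaks: xs is still needed by length, so mapM_ cannot let go of it.
leaky :: IO ()
leaky = do
  xs <- readFile "foo"
  mapM_ print (lines xs)
  print (length xs)

-- Leaks the handle: only 10 characters are ever demanded, so the file
-- is never read to the end and the handle stays open.
handleleak :: IO String
handleleak = do
  xs <- readFile "foo"
  return (take 10 xs)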

Now, in leaky if you calculated the length and printed the lines in the same
iteration, the leak would go away.  In the handleleak example the file stays
open even after handleleak produces all 10 elements.
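
For what it's worth, here is a minimal sketch of both fixes, only lightly checked. The names printAndCount and firstTen are made up for illustration, and the count deliberately leaves out newline characters, so it is not the same number as `length xs`. The first function prints and counts in one pass; the second forces the 10 characters it needs while the handle is still open and lets withFile close it.

```haskell
import Control.Exception (evaluate)
import Control.Monad (foldM)
import System.IO

-- Print every line and count characters (newlines excluded) in the same
-- pass.  Nothing refers to the whole contents after the fold, so the
-- file can still be streamed.
printAndCount :: FilePath -> IO Int
printAndCount path = do
  xs <- readFile path
  foldM step 0 (lines xs)
  where
    step n l = do
      print l
      return $! n + length l  -- keep the running total evaluated

-- Return the first 10 characters without leaving the handle open.
firstTen :: FilePath -> IO String
firstTen path = withFile path ReadMode $ \h -> do
  xs <- hGetContents h
  let prefix = take 10 xs
  -- Forcing the spine of prefix performs the reads now, before
  -- withFile closes the handle.
  _ <- evaluate (length prefix)
  return prefix
```

Once firstTen returns, the file is closed, so a later writeFile to the same path should not be refused as locked.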

Now imagine those examples in terms of directory traversals instead of reads
from a file.
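
A strict traversal sidesteps those questions at the cost of doing all the IO up front. As a rough sketch, assuming the listDirectory and getFileSize functions from recent versions of the directory package (duStrict is a made-up name, and this is not how directory-tree or darcs does it): it never calls openFile, and any IOException surfaces in the IO action that called duStrict.

```haskell
import Control.Monad (foldM)
import System.Directory  -- doesDirectoryExist, listDirectory, getFileSize
import System.FilePath ((</>))

-- Total size in bytes of every file at or below the given path.
-- All IO happens before the result is returned, so nothing is deferred
-- into pure code.  Symlink cycles are not guarded against.
duStrict :: FilePath -> IO Integer
duStrict path = do
  isDir <- doesDirectoryExist path
  if isDir
    then do
      names <- listDirectory path
      foldM addEntry 0 names
    else getFileSize path
  where
    addEntry acc name = do
      size <- duStrict (path </> name)
      return $! acc + size  -- keep the running total evaluated
```

The trade-off is that nothing comes back until the whole tree has been walked, which is the cost the lazy version is trying to avoid.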

This would still be a problem even if you replace readFile with readFile':

readFile' f = unsafePerformIO (readFile f)
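
To make the timing concrete, here is a small sketch using unsafeInterleaveIO, which is what GHC's hGetContents is built on. The names deferredEffect and deferredError are invented for this example. It is meant to show that the effect runs only when the pure value is demanded, and that a failure to open the file is raised at that point too, not where the action was written.

```haskell
import Control.Exception (IOException, evaluate, try)
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Reports whether the deferred action had run before and after its
-- result was demanded.
deferredEffect :: IO (Bool, Bool)
deferredEffect = do
  ran <- newIORef False
  v <- unsafeInterleaveIO (writeIORef ran True >> return (42 :: Int))
  before <- readIORef ran
  _ <- evaluate v  -- demanding v is what runs the action
  after <- readIORef ran
  return (before, after)

-- Opening the file is deferred as well, so for a missing file the
-- exception appears where the string is consumed, not on the first line.
deferredError :: FilePath -> IO (Either IOException Int)
deferredError path = do
  s <- unsafeInterleaveIO (readFile path)
  try (evaluate (length s))
```

Without the try around the consumer, that exception would come out of whatever code happened to force the string first.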

I hope that helps,
Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] [ANNOUNCE] (and request for review): directory-tree v0.9.0

2010-08-10 Thread Brandon Simmons
On Tue, Aug 10, 2010 at 4:34 PM, Jason Dagit  wrote:
>
>
> On Mon, Aug 9, 2010 at 10:48 PM, Brandon Simmons
>  wrote:
>>
>> Greetings Haskellers!
>>
>> directory-tree is a module providing a directory-tree-like datatype
>> along with Foldable and Traversable instances, along with a simple,
>> high-level IO interface. You can see the package along with some
>> examples here (apologies if the haddock docs haven't been generated
>> yet) :
>>
>>    http://hackage.haskell.org/package/directory-tree
>
> If I understand what you're saying, then your library is very similar to an
> abstraction that darcs had for years known as "Slurpy".  The experience in
> the darcs project was that it led to performance issues and correctness
> issues that were hard to find/fix.
>>
>> The primary change in this release is the addition of two
>> experimental "lazy" functions: `readDirectoryWithL` and `buildL`.
>> These functions use `unsafePerformIO` behind the scenes to traverse
>> the filesystem as required by pure computations consuming the returned
>> DirTree data structure. I believe I am doing this safely and sanely
>> but would love if some more experienced folks could comment on the
>> code.
>
> unsafePerformIO or unsafeInterleaveIO?
> Either way, to me it seems a bit dangerous to be doing this sort of lazy IO.
>  If the directory structure is large will I run out of file handles?  How
> will IO errors be handled?  Will I receive the exceptions in pure code or
> inside my IO actions?  Will I run into space leaks if something holds on to
> 1 file and then references it "after" the directory traversal?  I might have
> my history wrong, but as I recall darcs started with lazy slurpies and moved
> to doing things strictly due to space leaks, running out of file
> descriptors, file descriptor leaks (not running out, but having the file be
> locked long after darcs should have been 'done' with it), and exception
> delivery.

IO Errors are caught in a pure constructor called "Failed". In
practice I think my unsafe version is better in many of those respects
than the original, for example with regard to running out of file
handles. Are you referring to lazy IO in general, which those problems
you mention seem to apply to, or the use of unsafePerformIO?

I certainly want this module to be as useful and problem-free as
possible, but I will be content if it is no more problematic than lazy
IO already is.

Could you elaborate on

> "Will I run into space leaks if something holds on to 1 file and
> then references it "after" the directory traversal"?

?

> It's a seductive path but one that does not seem to have a good ending.
> I'm not sure what darcs uses these days.  Perhaps that's what hashed-storage
> provides, although I haven't been able to find any documentation on
> hashed-storage other than the haddocks (which only document the api with no
> overview or explanation of the problem hashed-storage solves).
> Jason

Eric Kow just pointed out the existence of hashed-storage to me (I
believe you are right that it is what darcs does/will use) and it will
be interesting to see the approach in there, if I can grok it.

Thanks a lot for the input.

Brandon Simmons
http://coder.bsimmons.name


Re: [Haskell-cafe] [ANNOUNCE] (and request for review): directory-tree v0.9.0

2010-08-10 Thread Jason Dagit
On Mon, Aug 9, 2010 at 10:48 PM, Brandon Simmons <
brandon.m.simm...@gmail.com> wrote:

> Greetings Haskellers!
>
> directory-tree is a module providing a directory-tree-like datatype
> along with Foldable and Traversable instances, along with a simple,
> high-level IO interface. You can see the package along with some
> examples here (apologies if the haddock docs haven't been generated
> yet) :
>
>    http://hackage.haskell.org/package/directory-tree


If I understand what you're saying, then your library is very similar to an
abstraction that darcs had for years known as "Slurpy".  The experience in
the darcs project was that it led to performance issues and correctness
issues that were hard to find/fix.

>
>
> The primary change in this release is the addition of two
> experimental "lazy" functions: `readDirectoryWithL` and `buildL`.
> These functions use `unsafePerformIO` behind the scenes to traverse
> the filesystem as required by pure computations consuming the returned
> DirTree data structure. I believe I am doing this safely and sanely
> but would love if some more experienced folks could comment on the
> code.
>

unsafePerformIO or unsafeInterleaveIO?

Either way, to me it seems a bit dangerous to be doing this sort of lazy IO.
 If the directory structure is large will I run out of file handles?  How
will IO errors be handled?  Will I receive the exceptions in pure code or
inside my IO actions?  Will I run into space leaks if something holds on to
1 file and then references it "after" the directory traversal?  I might have
my history wrong, but as I recall darcs started with lazy slurpies and moved
to doing things strictly due to space leaks, running out of file
descriptors, file descriptor leaks (not running out, but having the file be
locked long after darcs should have been 'done' with it), and exception
delivery.

It's a seductive path but one that does not seem to have a good ending.

I'm not sure what darcs uses these days.  Perhaps that's what hashed-storage
provides, although I haven't been able to find any documentation on
hashed-storage other than the haddocks (which only document the api with no
overview or explanation of the problem hashed-storage solves).

Jason


[Haskell-cafe] [ANNOUNCE] (and request for review): directory-tree v0.9.0

2010-08-09 Thread Brandon Simmons
Greetings Haskellers!

directory-tree is a module providing a directory-tree-like datatype
along with Foldable and Traversable instances, along with a simple,
high-level IO interface. You can see the package along with some
examples here (apologies if the haddock docs haven't been generated
yet) :

http://hackage.haskell.org/package/directory-tree

The primary change in this release is the addition of two
experimental "lazy" functions: `readDirectoryWithL` and `buildL`.
These functions use `unsafePerformIO` behind the scenes to traverse
the filesystem as required by pure computations consuming the returned
DirTree data structure. I believe I am doing this safely and sanely
but would love if some more experienced folks could comment on the
code.

These changes (and this whole revamping of this originally very simple
module) were inspired by the fact that a few people seemed to really
like this API, and this recent reddit post lamenting the perceived
difficulty of writing a `du`-like function in Haskell.


http://www.reddit.com/r/haskell/comments/cs54i/how_would_you_write_du_in_haskell/

One could write such a function using directory-tree as follows (sorry
if the monadic compositional style is foreign):

> import System.Directory.Tree
> import qualified Data.Foldable as F
> import System.IO
> import Control.Monad
>
> du :: FileName -> IO ()
> du = print . F.sum . free <=< readDirectoryWithL (hFileSize <=< readHs)
>     where readHs = flip openFile ReadMode

Thanks for reading and for any input, especially performance
suggestions or opinions on my unsafe function usage. I hope this is
useful to someone.


Sincerely,
Brandon Simmons
http://coder.bsimmons.name/