Re: [Haskell-cafe] zip-archive performance/memmory usage

2010-08-10 Thread Neil Brown

On 10/08/10 00:29, Pieter Laeremans wrote:

Hello,

I'm trying some haskell scripting. I'm writing a script to print some 
information
from a zip archive.  The zip-archive library does look nice but the 
performance of zip-archive/lazy bytestring

doesn't seem to scale.

Executing :

   eRelativePath $ head $ zEntries archive

on an archive of around 12 MB with around 20 files yields

Stack space overflow: current size 8388608 bytes.


The script in question can be found at :

http://github.com/plaeremans/HaskellSnipplets/blob/master/ZipList.hs

I'm using the latest version of haskell platform.  Are these libaries 
not production ready,

or am I doing something terribly wrong ?


I downloaded your program and compiled it (GHC 6.12.1, zip-archive 
0.1.1.6, bytestring 0.9.1.5).  I ran it on the JVM src.zip (20MB, ~8000 
files) and it sat there for a minute (67s), taking 2.2% memory according 
to top, then completed successfully.  Same behaviour with -O2.  Which 
compares very badly in time to the instant return when I ran unzip -l on 
the same file, but I didn't see any memory problems.  Presumably your 
archive is valid and works with unzip and other tools?


Thanks,

Neil.

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] zip-archive performance/memmory usage

2010-08-10 Thread David Powell
I was interested to see if I could determine what was happening with this.
After some playing around, I noticed the code was running significantly
faster if I *didn't* compile it, but ran it with 'runghc' instead (running
under ghci was also fast).

Here are the running times I found.  The 'Zip.hs' program comes with the
zip-archive package.  The runtime of the compiled version didn't seem to be
affected by optimisations.  Regardless, I'm quite surprised running
interpreted was significantly faster than compiled.

> time runghc ./Zip.hs -l ~/jdk1.6.0_05-src.zip
 1.48s user 0.17s system 97% cpu 1.680 total

> time ./dist/build/Zip/Zip -l ~/jdk1.6.0_05-src.zip
 89.00s user 1.06s system 98% cpu 1:31.84 total

The file 'jdk1.6.0_05-src.zip' was just an 18MB zip file I had lying
around.  I'm using ghc 6.12.1

Cheers,

-- 
David Powell


On Tue, Aug 10, 2010 at 12:10 PM, Jason Dagit  wrote:

>
>
> On Mon, Aug 9, 2010 at 4:29 PM, Pieter Laeremans wrote:
>
>> Hello,
>>
>> I'm trying some haskell scripting. I'm writing a script to print some
>> information
>> from a zip archive.  The zip-archive library does look nice but the
>> performance of zip-archive/lazy bytestring
>> doesn't seem to scale.
>>
>> Executing :
>>
>>eRelativePath $ head $ zEntries archive
>>
>> on an archive of around 12 MB with around 20 files yields
>>
>> Stack space overflow: current size 8388608 bytes.
>>
>
> So it's a stack overflow at about 8 megs.  I don't have a strong sense of
> what is normal, but that seems like a small stack to me.  Oh, actually I
> just check and that is the default stack size :)
>
> I looked at Zip.hs (included as an example).  The closest I see to your
> example is some code for listing the files in the archive.  Perhaps you
> should try the supplied program on your archive and see if it too has a
> stack overflow.
>
> The line the author uses to list files is:
> List-> mapM_ putStrLn $ filesInArchive archive
>
> But, you're taking the head of the entries, so I don't see how you'd be
> holding on to too much data.  I just don't see anything wrong with your
> program.  Did you remember to compile with optimizations?  Perhaps try the
> author's way of listing entries and see if performance changes?
>
>
>>
>> The script in question can be found at :
>>
>> http://github.com/plaeremans/HaskellSnipplets/blob/master/ZipList.hs
>>
>> I'm using the latest version of haskell platform.  Are these libaries not
>> production ready,
>> or am I doing something terribly wrong ?
>>
>
> Not production ready would be my assumption.  I think an iteratee style
> might be more appropriate for these sorts of nested streams of potentially
> large size anyway.  I'm skeptical of anything that depends on lazy
> bytestrings or lazy io.  In this case, the performance would appear to be
> depend on lazy bytestrings.
>
> You might want to experiment with increasing the stack size.  Something
> like this:
> ./ZipList +RTS -K100M -RTS foo.zip
>
> Jason
>
> ___
> Haskell-Cafe mailing list
> Haskell-Cafe@haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
>
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] zip-archive performance/memmory usage

2010-08-09 Thread Jason Dagit
On Mon, Aug 9, 2010 at 4:29 PM, Pieter Laeremans wrote:

> Hello,
>
> I'm trying some haskell scripting. I'm writing a script to print some
> information
> from a zip archive.  The zip-archive library does look nice but the
> performance of zip-archive/lazy bytestring
> doesn't seem to scale.
>
> Executing :
>
>eRelativePath $ head $ zEntries archive
>
> on an archive of around 12 MB with around 20 files yields
>
> Stack space overflow: current size 8388608 bytes.
>

So it's a stack overflow at about 8 megs.  I don't have a strong sense of
what is normal, but that seems like a small stack to me.  Oh, actually I
just check and that is the default stack size :)

I looked at Zip.hs (included as an example).  The closest I see to your
example is some code for listing the files in the archive.  Perhaps you
should try the supplied program on your archive and see if it too has a
stack overflow.

The line the author uses to list files is:
List-> mapM_ putStrLn $ filesInArchive archive

But, you're taking the head of the entries, so I don't see how you'd be
holding on to too much data.  I just don't see anything wrong with your
program.  Did you remember to compile with optimizations?  Perhaps try the
author's way of listing entries and see if performance changes?


>
> The script in question can be found at :
>
> http://github.com/plaeremans/HaskellSnipplets/blob/master/ZipList.hs
>
> I'm using the latest version of haskell platform.  Are these libaries not
> production ready,
> or am I doing something terribly wrong ?
>

Not production ready would be my assumption.  I think an iteratee style
might be more appropriate for these sorts of nested streams of potentially
large size anyway.  I'm skeptical of anything that depends on lazy
bytestrings or lazy io.  In this case, the performance would appear to be
depend on lazy bytestrings.

You might want to experiment with increasing the stack size.  Something like
this:
./ZipList +RTS -K100M -RTS foo.zip

Jason
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] zip-archive performance/memmory usage

2010-08-09 Thread Ivan Lazar Miljenovic
On 10 August 2010 09:29, Pieter Laeremans  wrote:
> Hello,
>
> I'm trying some haskell scripting. I'm writing a script to print some
> information
> from a zip archive.  The zip-archive library does look nice but the
> performance of zip-archive/lazy bytestring
> doesn't seem to scale.
>
> Executing :
>
>    eRelativePath $ head $ zEntries archive
>
> on an archive of around 12 MB with around 20 files yields
>
> Stack space overflow: current size 8388608 bytes.
>
>
> The script in question can be found at :
>
> http://github.com/plaeremans/HaskellSnipplets/blob/master/ZipList.hs
>
> I'm using the latest version of haskell platform.  Are these libaries not
> production ready,
> or am I doing something terribly wrong ?

I'm not familiar with the library, but I have a sneaking suspicion
that you're doing something terribly wrong, namely keeping the entire
archive in memory rather than using it lazily.

-- 
Ivan Lazar Miljenovic
ivan.miljeno...@gmail.com
IvanMiljenovic.wordpress.com
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] zip-archive performance/memmory usage

2010-08-09 Thread Pieter Laeremans
Hello,

I'm trying some haskell scripting. I'm writing a script to print some
information
from a zip archive.  The zip-archive library does look nice but the
performance of zip-archive/lazy bytestring
doesn't seem to scale.

Executing :

   eRelativePath $ head $ zEntries archive

on an archive of around 12 MB with around 20 files yields

Stack space overflow: current size 8388608 bytes.


The script in question can be found at :

http://github.com/plaeremans/HaskellSnipplets/blob/master/ZipList.hs

I'm using the latest version of haskell platform.  Are these libaries not
production ready,
or am I doing something terribly wrong ?

kind regards,

Pieter

-- 
Pieter Laeremans 

"The future is here. It's just not evenly distributed yet."  W. Gibson
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe