Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-19 Thread Henning Thielemann

On 02.07.2011 01:26, Gwern Branwen wrote:


Another thing you can do along the same lines is generate a script to
download all the repos from packages which declare repos. Some ugly
code:


If 'script' also includes Haskell code, then the 'tar' package could be 
of help to walk throught the TAR archive.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-09 Thread Gwern Branwen
On Fri, Jul 1, 2011 at 5:37 PM, Gwern Branwen gwe...@gmail.com wrote:
 Looking at it, the index tarball contains the .cabal files for all
 versions known to Hackage, which isn't necessarily the interesting set
 of cabal files - I'm usually more interested in just the cabal files
 of the latest version of every package. No doubt there's a scripting
 solution (loop over the untarred directory of packages, and take the
 lexically last cabal file?), but it was easier to just exploit cabal
 fetch's behavior of fetching only the latest version and work with
 those tarballs.

The version using just the index tarball is kind of ugly; the
filtering and extracting doesn't seem terribly easy, so the best
script I could come up with was:

cd ~/.cabal/packages/hackage.haskell.org  for DIR in */; do (for
CABAL in `tar --wildcards $DIR -tf 00-index.tar|head -1`; do (tar
-Oxf 00-index.tar $CABAL | runhaskell ~/deps.hs); done); done

(Parentheses aren't necessary but make it more readable.)

-- 
gwern
http://www.gwern.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


[Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
Athas on #haskell wondered how many dependencies the average Haskell
package had. I commented that it seemed like some fairly simple
scripting to find out, and as these things tend to go, I wound up
doing a complete solution myself.

First, we get most/all of Hackage locally to examine, as tarballs:

for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal
fetch $package; done

Then we cd .cabal/packages/hackage.haskell.org

Now we can run a command which extracts the .cabal file from each
tarball to standard output:

find . -name *.tar.gz -exec tar --wildcards *.cabal -Oxf {} \;

We could grep for 'build-depends' or something, but that gives
unreliable dirty results. (80k items, resulting in a hard to believe
87k total deps and an average of 27 deps.) So instead, we use the
Cabal library and write a program to parse Cabal files  spit out the
dependencies, and we feed each .cabal into that:

find . -name *.tar.gz -exec sh -c 'tar --wildcards *.cabal
-Oxf {} | runhaskell ~/deps.hs' \;

And what is deps.hs? Turns out to be surprisingly easy to parse a
String, extract the Library and Executable AST, and grab the
[Dependency] field, and then print it out (code is not particularly
clean):

import Distribution.Package
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse
main :: IO ()
main = do cbl - getContents
  let desc = parsePackageDescription cbl
  case desc of
ParseFailed _ - return ()
ParseOk _ d - putStr $ unlines $ map show $ map
(\(Dependency x _) - x) $ extractDeps d
extractDeps :: GenericPackageDescription - [Dependency]
extractDeps d = ldeps ++ edeps
  where ldeps = case (condLibrary d) of
Nothing - []
Just c - condTreeConstraints c
edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d

So what are the results? (The output of one run is attached.) I get
18,134 dependencies, having run on 3,137 files, or 5.8 dependencies
per package.

-- 
gwern
http://www.gwern.net


deps.txt.gz
Description: GNU Zip compressed data
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
On Fri, Jul 1, 2011 at 4:49 PM, L Corbijn aspergesoe...@gmail.com wrote:
 Is this including or exluding 'or'-ed dependency lists like
 http://hackage.haskell.org/package/hugs2yc ?

Excluding, it seems. When I run the script on that tarball:

$ tar --wildcards *.cabal -Oxf `find . -name *.tar.gz | g hugs2yc`
| runhaskell /home/gwern/deps.hs
PackageName mtl
PackageName uniplate
PackageName yhccore
PackageName ycextra
PackageName parsec
PackageName directory
PackageName filepath

No version of base or containers appears. (mtl appears in both
branches and also the general build-depends list.)

-- 
gwern
http://www.gwern.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Rogan Creswick
On Fri, Jul 1, 2011 at 1:43 PM, Gwern Branwen gwe...@gmail.com wrote:
 Athas on #haskell wondered how many dependencies the average Haskell
 package had. I commented that it seemed like some fairly simple
 scripting to find out, and as these things tend to go, I wound up
 doing a complete solution myself.

 First, we get most/all of Hackage locally to examine, as tarballs:

    for package in `cabal list | grep '\*' | tr -d '\*'`; do cabal
 fetch $package; done

I think the index tarball has all the info you need, and would be
faster to retrieve / process, if you or anyone else needs to get the
.cabal files again:

http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2mb)

The set of the latest package sdists is also available:

http://hackage.haskell.org/cgi-bin/hackage-scripts/archive.tar (~150mb)

--Rogan

 Then we cd .cabal/packages/hackage.haskell.org

 Now we can run a command which extracts the .cabal file from each
 tarball to standard output:

    find . -name *.tar.gz -exec tar --wildcards *.cabal -Oxf {} \;

 We could grep for 'build-depends' or something, but that gives
 unreliable dirty results. (80k items, resulting in a hard to believe
 87k total deps and an average of 27 deps.) So instead, we use the
 Cabal library and write a program to parse Cabal files  spit out the
 dependencies, and we feed each .cabal into that:

    find . -name *.tar.gz -exec sh -c 'tar --wildcards *.cabal
 -Oxf {} | runhaskell ~/deps.hs' \;

 And what is deps.hs? Turns out to be surprisingly easy to parse a
 String, extract the Library and Executable AST, and grab the
 [Dependency] field, and then print it out (code is not particularly
 clean):

 import Distribution.Package
 import Distribution.PackageDescription
 import Distribution.PackageDescription.Parse
 main :: IO ()
 main = do cbl - getContents
          let desc = parsePackageDescription cbl
          case desc of
            ParseFailed _ - return ()
            ParseOk _ d - putStr $ unlines $ map show $ map
 (\(Dependency x _) - x) $ extractDeps d
 extractDeps :: GenericPackageDescription - [Dependency]
 extractDeps d = ldeps ++ edeps
  where ldeps = case (condLibrary d) of
                Nothing - []
                Just c - condTreeConstraints c
        edeps = concat $ map (condTreeConstraints . snd) $ condExecutables d

 So what are the results? (The output of one run is attached.) I get
 18,134 dependencies, having run on 3,137 files, or 5.8 dependencies
 per package.

 --
 gwern
 http://www.gwern.net

 ___
 Haskell-Cafe mailing list
 Haskell-Cafe@haskell.org
 http://www.haskell.org/mailman/listinfo/haskell-cafe



___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
On Fri, Jul 1, 2011 at 5:23 PM, Rogan Creswick cresw...@gmail.com wrote:

 I think the index tarball has all the info you need, and would be
 faster to retrieve / process, if you or anyone else needs to get the
 .cabal files again:

 http://hackage.haskell.org/packages/archive/00-index.tar.gz (2.2mb)

Looking at it, the index tarball contains the .cabal files for all
versions known to Hackage, which isn't necessarily the interesting set
of cabal files - I'm usually more interested in just the cabal files
of the latest version of every package. No doubt there's a scripting
solution (loop over the untarred directory of packages, and take the
lexically last cabal file?), but it was easier to just exploit cabal
fetch's behavior of fetching only the latest version and work with
those tarballs.

-- 
gwern
http://www.gwern.net

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Parsing cabal files to calculate average number of dependencies

2011-07-01 Thread Gwern Branwen
Another thing you can do along the same lines is generate a script to
download all the repos from packages which declare repos. Some ugly
code:

import Data.Maybe (fromJust)
import Distribution.PackageDescription
import Distribution.PackageDescription.Parse
import Control.Monad (unless)

main :: IO ()
main = do cbl - getContents
  let desc = parsePackageDescription cbl
  case desc of
ParseFailed _ - return ()
ParseOk _ d - do let repos = repoPair $ extractHead $
extractRepos d
  let cmd = concatMap shellify repos
  unless (null cmd) $ putStrLn cmd

shellify :: (RepoType, String) - String
shellify (rt,url) = case rt of
   Darcs - darcs get  ++ url
   Git - git clone  ++ url
   SVN - svn clone  ++ url
   CVS - cvs co  ++ url
   Mercurial - hg clone  ++ url
   _ - 

repoPair :: [SourceRepo] - [(RepoType, String)]
repoPair = map (\x - (fromJust $ repoType x, fromJust $ repoLocation x))

extractHead :: [SourceRepo] - [SourceRepo]
extractHead rs = filter (\x - isnothing x  ishead x) rs
where ishead sr = case repoKind sr of
RepoHead - True
_ - False
  isnothing ss = case repoType ss of
   Nothing - False
   Just _ - case repoLocation ss of
 Nothing - False
 Just _ - True

extractRepos :: GenericPackageDescription - [SourceRepo]
extractRepos = sourceRepos . packageDescription

This generates results (with the same find command and setup as
previously) like:

...
git clone git://gitorious.org/maximus/mandulia.git
darcs get http://darcs.cielonegro.org/HsOpenSSL/
darcs get http://darcs.cielonegro.org/HsOpenSSL/
hg clone https://bitbucket.org/bos/text-icugit clone
https://github.com/bos/text-icu
darcs get http://code.haskell.org/Graphalyze
darcs get http://code.haskell.org/~roelvandijk/code/base-unicode-symbols
git clone git://github.com/roelvandijk/base-unicode-symbols.git
darcs get http://code.haskell.org/~basvandijk/code/regions
git clone https://github.com/skogsbaer/xmlgen
git clone git://github.com/tanakh/HongoDB.git
darcs get http://repos.mornfall.net/shellish
darcs get http://patch-tag.com/r/Saizan/syb-with-class/
git clone git://github.com/ekmett/eq.git
git clone git://github.com/ekmett/data-lens-fd.git
git clone git://github.com/ekmett/streams.git
git clone git://github.com/alanz/hjsmin.git
darcs get http://patch-tag.com/r/byorgey/diagrams-lib
...

--
gwern
http://www.gwern.net/haskell/Archiving%20GitHub

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe