[Hackage] #767: Highlighted source code on hackage.haskell.org mangles Unicode
#767: Highlighted source code on hackage.haskell.org mangles Unicode
---------------------------------+------------------------------------------
  Reporter:  andersk             |        Owner:
      Type:  defect              |       Status:  new
  Priority:  normal              |    Milestone:  HackageDB
 Component:  hackageDB website   |      Version:
  Severity:  normal              |     Keywords:
Difficulty:  very easy (1 hour)  |   Ghcversion:
  Platform:                      |
---------------------------------+------------------------------------------
 On hackage.haskell.org, when you browse the source of a module that
 contains Unicode (e.g.
 [http://hackage.haskell.org/packages/archive/base-unicode-symbols/latest/doc/html/src/Data-Eq-Unicode.html Data.Eq.Unicode]),
 it is sent with a `Content-Type: text/html` header with no charset. There
 is a charset in the XML declaration
 `<?xml version="1.0" encoding="UTF-8"?>`, but that is ignored by Firefox
 because of the non-XML `Content-Type`. Therefore, the wrong encoding is
 detected and the Unicode symbols get mangled.

 Possible fixes include sending `Content-Type: text/html; charset=UTF-8`,
 or sending `Content-Type: application/xhtml+xml` so that the XML
 declaration is respected, or both
 (`Content-Type: application/xhtml+xml; charset=utf-8`), or adding
 equivalent `<meta http-equiv="Content-Type">` tags.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/767
Hackage http://haskell.org/cabal/
Hackage: Cabal and related projects
_______________________________________________
cabal-devel mailing list
cabal-devel@haskell.org
http://www.haskell.org/mailman/listinfo/cabal-devel
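To make the failure mode in the ticket concrete, here is a minimal Haskell sketch of how a client might look for a charset in a `Content-Type` header value. The names `charsetOf`, `splitOn` and `trim` are hypothetical helpers written for this illustration; they are not taken from Hackage, Cabal or any browser, and the parsing is deliberately simplified (it does not handle quoted parameter values that contain semicolons).

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)

-- | Trim leading and trailing whitespace.
trim :: String -> String
trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Extract the charset parameter from a Content-Type header value.
-- Returns Nothing when the header carries no charset, which is the
-- situation the ticket describes for "text/html".
charsetOf :: String -> Maybe String
charsetOf header =
  case [ drop (length key) p | p <- params, key `isPrefixOf` map toLower p ] of
    (v : _) -> Just (filter (/= '"') v)
    []      -> Nothing
  where
    -- Everything after the media type is a ';'-separated parameter list.
    params = map trim (drop 1 (splitOn ';' header))
    key    = "charset="
```

With the header as currently sent, `charsetOf "text/html"` yields `Nothing`, so a client has to guess the encoding; with either of the charset-bearing headers proposed in the ticket it yields the declared encoding.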
Re: hackage-server: index format
On Thu, 2010-11-18 at 19:46 -0600, Antoine Latter wrote:
> Hi folks,
>
> The index tar-ball on Hackage has an odd naming convention. Package
> descriptions are given paths of the form:
>
>   ./$pkg/$version/$pkg.cabal
>
> including the leading "./". I'm guessing that this is done as a method
> of distinguishing non-package meta-data.
>
> Is this a convention we need to preserve?

The .cabal extension is essential. Tools are required to ignore file
extensions they do not understand. This provides a bit of forwards
compatibility.

In theory the file path should not be significant. However the current
cabal-install code does rely on the name and version directories. It
uses this to find the package id without having to parse the .cabal
file. This is bad and fragile. But basically you cannot change that
layout for the moment.

I would like to move to a model where the file name may be meaningful
but the path is not significant. I would also like to make the proper
way to find the package id be to parse the file.

I'd like to change cabal-install so that it generates its own fast
cache on each cabal update, rather than reading the index.tar every
time. This would mean we could pay the expense of parsing all the .cabal
files and thus could do it properly.

Matt and I also discussed making the 00-index.tar.gz into a RESTful
format by adding proper URLs for package tarballs. Currently clients
have to know the URL structure of the server: given a package Id taken
from the index they construct a URL $root/pkg-ver/pkg-ver.tar.gz. As we
all know, forcing clients to construct URLs is bad (inflexible etc etc).

To extend the format to contain URLs we were thinking of making use of
the tar format's support for symlinks. The symlink content can be
interpreted as a URL, either relative or absolute, e.g.:

  foo-1.0.tar.gz -> /package/foo-1.0/foo-1.0.tar.gz

or

  foo-1.0.tar.gz -> http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz

That is, the index contains a bunch of cabal files, and also a bunch
of .tar.gz symlinks. Like URLs in html these are interpreted relative to
the URL of the index.tar.gz itself. So if we got the index.tar.gz from
say:

  http://hackage.haskell.org/index.tar.gz

then a relative URL like /package/foo-1.0/foo-1.0.tar.gz is interpreted
as

  http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz

This is totally standard URL convention; the only odd thing is using
tarball symlinks as URLs, though it seems like a pretty natural
generalisation. It works fine if you unpack the tarball with ordinary
tar programs, it just makes broken symlinks.

So note that the name of the tarball entry foo-1.0.tar.gz is
significant, beyond the fact of the extension. The name foo-1.0 is
significant as it is the key in the package Id -> url mapping.

Duncan
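As an illustration of the resolution rule described above, the following Haskell sketch interprets a symlink target relative to the URL the index was fetched from, and derives the package id key from the entry name. It is only a sketch under stated assumptions: `resolveTarget`, `splitAuthority`, `breakOn` and `packageIdOf` are hypothetical names, the base URL is assumed to be absolute, and the resolver is far simpler than full RFC 3986 reference resolution (no `..` segments, queries or fragments).

```haskell
import Data.List (isInfixOf, isPrefixOf, stripPrefix)

-- | Break a string at the first occurrence of a pattern.
breakOn :: String -> String -> (String, String)
breakOn pat = go ""
  where
    go acc r
      | pat `isPrefixOf` r = (reverse acc, r)
      | otherwise = case r of
          []       -> (reverse acc, [])
          (c : cs) -> go (c : acc) cs

-- | Split an absolute URL into (scheme plus host, path).
-- Assumes the input contains "://".
splitAuthority :: String -> (String, String)
splitAuthority url =
  let (scheme, rest) = breakOn "://" url
      (host, path)   = break (== '/') (drop 3 rest)
  in (scheme ++ "://" ++ host, path)

-- | Interpret a symlink target relative to the URL of the index.
resolveTarget :: String -> String -> String
resolveTarget base target
  | "://" `isInfixOf` target = target                       -- already absolute
  | "/" `isPrefixOf` target  = root ++ target               -- relative to the server root
  | otherwise                = root ++ dirOf path ++ target -- relative to the index's directory
  where
    (root, path) = splitAuthority base
    dirOf p = case reverse (dropWhile (/= '/') (reverse p)) of
      "" -> "/"
      d  -> d

-- | The package id is the entry name without its ".tar.gz" suffix.
packageIdOf :: String -> Maybe String
packageIdOf name = reverse <$> stripPrefix (reverse ".tar.gz") (reverse name)
```

For the example in the message, resolving `/package/foo-1.0/foo-1.0.tar.gz` against `http://hackage.haskell.org/index.tar.gz` gives `http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz`, and `packageIdOf "foo-1.0.tar.gz"` gives the key `foo-1.0`.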
Re: hackage-server: index format
On Fri, 2010-11-19 at 12:27 +, Duncan Coutts wrote:
> Matt and I also discussed making the 00-index.tar.gz into a RESTful
> format by adding proper URLs for package tarballs.

Indeed we could go further and use a single general format for
describing or distributing bundles of packages.

Use case: local build trees
---------------------------

A bunch of related packages (e.g. gtk2hs, happstack-* etc) unpacked
locally.

  /home/me/prgs/myproj/foo/           -- top of source tree for foo
  /home/me/prgs/myproj/foo/foo.cabal
  /home/me/prgs/myproj/bar/
  /home/me/prgs/myproj/bar/bar.cabal

Now we can have an index.tar containing symlinks to .cabal files!

  /home/me/prgs/myproj/index.tar: containing
    foo.cabal -> foo/foo.cabal
    bar.cabal -> bar/bar.cabal

So these are not copies of the .cabal files, these really are symlinks
to the local .cabal files (but inside the tarball). I guess we need some
extra index entry to point to the location of the source tree, though
it's not a .tar.gz kind.

Now just as we can have symlinks (or really URLs) inside the tarball, we
could also have full file contents there too. Next use case...

Use case: distribution bundles
------------------------------

Shipping a bunch of source packages as a single file

  some-name.tar: containing
    foo.cabal
    foo-1.0.tar.gz
    bar.cabal
    bar-1.0.tar.gz

So now instead of symlinks/URLs to separate tarballs, the whole file
contents is right there. We have a hackage-like index plus the file
tarballs.

We might have to have a different naming convention than simply blah.tar
for these indexes, otherwise cabal install might not know how to
interpret

  cabal install foo.tar

should it interpret foo.tar as an index or as a single package?

Opinions?

Duncan
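One way to picture the generalised format is as a small classification of index entries by name and by whether the entry is stored inline or as a symlink. The Haskell sketch below is purely illustrative: the types `Content` and `Kind` and the function `classify` are hypothetical and do not correspond to anything in cabal-install or the tar package. The "ignore unknown entries" case follows the forwards-compatibility rule mentioned earlier in the thread.

```haskell
import Data.List (isSuffixOf)
import System.FilePath (takeExtension)

-- | How an entry is stored in the index tarball (hypothetical model).
data Content
  = Inline          -- the file's bytes are stored in the tarball
  | LinkTo String   -- a symlink entry; the target is a path or URL
  deriving (Eq, Show)

-- | What a client could take an entry to mean (hypothetical model).
data Kind
  = Description                -- .cabal file stored in the index (Hackage today)
  | LocalDescription FilePath  -- .cabal symlink into a local build tree
  | RemoteTarball String       -- .tar.gz symlink, target read as a URL
  | BundledTarball             -- .tar.gz stored in full (distribution bundle)
  | Unknown                    -- anything else: ignored for forwards compatibility
  deriving (Eq, Show)

-- | Classify an entry from its name and storage form.
classify :: FilePath -> Content -> Kind
classify name content
  | takeExtension name == ".cabal" = case content of
      Inline   -> Description
      LinkTo t -> LocalDescription t
  | ".tar.gz" `isSuffixOf` name = case content of
      Inline     -> BundledTarball
      LinkTo url -> RemoteTarball url
  | otherwise = Unknown
```

Under this model the local-build-tree index consists only of `LocalDescription` entries, while a distribution bundle pairs each `Description` with a `BundledTarball`. It does not answer the open question of how `cabal install foo.tar` should tell an index from a single package.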
[Hackage] #768: Cabal cannot find GHC when using relative path in -w flag
#768: Cabal cannot find GHC when using relative path in -w flag
---------------------------------+------------------------------------------
  Reporter:  tibbe               |        Owner:
      Type:  defect              |       Status:  new
  Priority:  normal              |    Milestone:
 Component:  Cabal library       |      Version:  1.8.0.6
  Severity:  normal              |     Keywords:
Difficulty:  unknown             |   Ghcversion:
  Platform:                      |
---------------------------------+------------------------------------------
 Trying to build the network package while standing at the root of a GHC
 build tree fails to find GHC:

 {{{
 $ cabal install -w inplace/bin/ghc-stage2 network -v2
 inplace/bin/ghc-stage2 --numeric-version
 looking for package tool: ghc-pkg near compiler in inplace/bin
 found package tool in inplace/bin/ghc-pkg
 inplace/bin/ghc-pkg --version
 inplace/bin/ghc-stage2 --supported-languages
 Reading installed packages...
 inplace/bin/ghc-pkg dump --global
 inplace/bin/ghc-pkg dump --user
 inplace/bin/ghc-stage2 --print-libdir
 Reading available packages...
 Resolving dependencies...
 selecting network-2.3 (hackage) and discarding network-2.0, 2.1.0.0, 2.2.0.0, 2.2.0.1, 2.2.1, 2.2.1.1, 2.2.1.2, 2.2.1.3, 2.2.1.4, 2.2.1.5, 2.2.1.6, 2.2.1.7, 2.2.1.8, 2.2.1.9, 2.2.1.10, 2.2.3 and 2.2.3.1
 selecting base-4.3.0.0 (installed)
 selecting ffi-1.0 (installed)
 selecting ghc-prim-0.2.0.0 (installed)
 selecting integer-gmp-0.2.0.2 (installed)
 selecting rts-1.0 (installed)
 selecting parsec-2.1.0.1 (hackage) and discarding parsec-2.0, 2.1.0.0, 3.0.0, 3.0.1 and 3.1.0
 selecting unix-2.4.1.0 (installed) and discarding unix-2.0, 2.2.0.0, 2.3.0.0, 2.3.1.0, 2.3.2.0, 2.4.0.0, 2.4.0.1 and 2.4.0.2
 selecting bytestring-0.9.1.8 (installed) and discarding bytestring-0.9, 0.9.0.1, 0.9.0.2, 0.9.0.3, 0.9.0.4, 0.9.1.0, 0.9.1.1, 0.9.1.2, 0.9.1.3, 0.9.1.4, 0.9.1.5, 0.9.1.6 and 0.9.1.7
 In order, the following would be installed:
 parsec-2.1.0.1 (new package)
 network-2.3 (new package)
 parsec-2.1.0.1 has already been downloaded.
 Extracting /home/tibell/.cabal/packages/hackage.haskell.org/parsec/2.1.0.1/parsec-2.1.0.1.tar.gz to /tmp/parsec-2.1.0.117969...
 Configuring parsec-2.1.0.1...
 cabal: Cannot find the program 'ghc' at 'inplace/bin/ghc-stage2' or on the path
 cabal: Error: some packages failed to install:
 network-2.3 depends on parsec-2.1.0.1 which failed to install.
 parsec-2.1.0.1 failed during the configure step. The exception was:
 ExitFailure 1
 }}}

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/768
Re: [Hackage] #768: Cabal cannot find GHC when using relative path in -w flag
#768: Cabal cannot find GHC when using relative path in -w flag
---------------------------------+------------------------------------------
  Reporter:  tibbe               |        Owner:
      Type:  defect              |       Status:  new
  Priority:  normal              |    Milestone:
 Component:  cabal-install tool  |      Version:  1.8.0.6
  Severity:  normal              |     Keywords:
Difficulty:  unknown             |   Ghcversion:
  Platform:                      |
---------------------------------+------------------------------------------
Changes (by duncan):

  * component:  Cabal library => cabal-install tool

Comment:

 Presumably due to cabal-install changing the current directory when it
 builds the package in question. See SetupWrapper in cabal-install.

--
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/768#comment:1
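If the cause is indeed a change of working directory, one plausible remedy is to resolve the `-w` argument against the directory the user started in, before any directory change happens. The sketch below shows that idea in Haskell using only `System.FilePath`. It is an assumption about how such a fix could look, not the change actually made in cabal-install, and `resolveCompiler` is a hypothetical name.

```haskell
import System.FilePath (isPathSeparator, isRelative, normalise, (</>))

-- | Resolve a compiler path given on the command line against the
-- directory the tool was started in (hypothetical helper).
--
-- A bare program name such as "ghc" is left alone so that it can still
-- be looked up on the search path.
resolveCompiler :: FilePath -> FilePath -> FilePath
resolveCompiler startDir path
  | not (any isPathSeparator path) = path                         -- bare name: PATH lookup
  | isRelative path                = normalise (startDir </> path) -- anchor to the start directory
  | otherwise                      = normalise path                -- already absolute
```

With a start directory of `/home/me/ghc`, the argument `inplace/bin/ghc-stage2` becomes `/home/me/ghc/inplace/bin/ghc-stage2`, which stays valid after the build moves into a temporary directory.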
Re: hackage-server: index format
Duncan Coutts wrote:
> [...] symlinks [...] Opinions?

How would this interact with the absence of symlinks on Windows?

  Tillmann
Re: hackage-server: index format
On Thu, Nov 18, 2010 at 07:46:33PM -0600, Antoine Latter wrote:
> The index tar-ball on Hackage has an odd naming convention. Package
> descriptions are given paths of the form:
>
>   ./$pkg/$version/$pkg.cabal
>
> including the leading "./". I'm guessing that this is done as a method
> of distinguishing non-package meta-data.
>
> Is this a convention we need to preserve?

I've removed the leading "./"; let's see if it breaks anything.
Re: hackage-server: index format
On Fri, Nov 19, 2010 at 02:44:39PM +0100, Tillmann Rendel wrote:
> Duncan Coutts wrote:
>> [...] symlinks [...]
>
> How would this interact with the absence of symlinks on Windows?

Note that NTFS has supported all kinds of links, sym- and hard-, since
Vista and up, so I guess you're referring to exotic filesystems like
FAT32, or slow-to-adopt environments.

--
Lars Viklund | z...@acc.umu.se
Re: hackage-server: index format
--- don't think the message made it to cabal-devel, forwarding, sorry if
you get it twice ---

On Fri, Nov 19, 2010 at 8:44 AM, Tillmann Rendel ren...@informatik.uni-marburg.de wrote:
> Duncan Coutts wrote:
>> [...] symlinks [...] Opinions?
>
> How would this interact with the absence of symlinks on Windows?
>
>   Tillmann

Symlinks are supported in the tar format with some special markers.
Since the index tarball is never unpacked on the client's filesystem,
only cabal needs to know about it, and it is backwards/forwards
compatible. It would be just another Tar.Entry (though LinkTarget
doesn't provide any straightforward extraction functions..).

http://hackage.haskell.org/packages/archive/tar/latest/doc/html/Codec-Archive-Tar.html

But if it does get unpacked by, say, 7zip on Windows -- not an
unreasonable thing to do -- we should at least check the behaviors
aren't too pathological.

-.-

Antoine, as far as I can tell, the only reason the leading ./ is there
in the first place is because the tarball is created by piping find into
tar.

  ( echo preferred-versions; find . -maxdepth 3 -name '*.cabal' ) \
    | tar -c -T - -f - | gzip -9 > $tmp
  mv $tmp 00-index.tar.gz

To answer your question: running tar -tf on hackage-server's
index.tar.gz, it doesn't include the ./, and cabal seems to have no
problems with it. I think Duncan mentioned that any other files that are
worth adding to the index tarball can be simply added for future
versions of cabal, since it ignores unknown files... any kind of
metadata you want.

Matt
Re: hackage-server: index format
On Fri, Nov 19, 2010 at 7:01 AM, Duncan Coutts duncan.cou...@googlemail.com wrote:
> On Fri, 2010-11-19 at 12:27 +, Duncan Coutts wrote:
>> Matt and I also discussed making the 00-index.tar.gz into a RESTful
>> format by adding proper URLs for package tarballs.
>
> Indeed we could go further and use a single general format for
> describing or distributing bundles of packages.
>
> Use case: local build trees
> ---------------------------
>
> A bunch of related packages (e.g. gtk2hs, happstack-* etc) unpacked
> locally.
>
>   /home/me/prgs/myproj/foo/           -- top of source tree for foo
>   /home/me/prgs/myproj/foo/foo.cabal
>   /home/me/prgs/myproj/bar/
>   /home/me/prgs/myproj/bar/bar.cabal
>
> Now we can have an index.tar containing symlinks to .cabal files!
>
>   /home/me/prgs/myproj/index.tar: containing
>     foo.cabal -> foo/foo.cabal
>     bar.cabal -> bar/bar.cabal
>
> So these are not copies of the .cabal files, these really are symlinks
> to the local .cabal files (but inside the tarball). I guess we need some
> extra index entry to point to the location of the source tree, though
> it's not a .tar.gz kind.
>
> Now just as we can have symlinks (or really URLs) inside the tarball, we
> could also have full file contents there too. Next use case...
>
> Use case: distribution bundles
> ------------------------------
>
> Shipping a bunch of source packages as a single file
>
>   some-name.tar: containing
>     foo.cabal
>     foo-1.0.tar.gz
>     bar.cabal
>     bar-1.0.tar.gz
>
> So now instead of symlinks/URLs to separate tarballs, the whole file
> contents is right there. We have a hackage-like index plus the file
> tarballs.
>
> We might have to have a different naming convention than simply blah.tar
> for these indexes, otherwise cabal install might not know how to
> interpret
>
>   cabal install foo.tar
>
> should it interpret foo.tar as an index or as a single package?
>
> Opinions?

It feels like an abuse of tar-files to me - if we want to have a set of
meta-data about the location of resources in a package repository, I
think it would be better to come up with a file format that has the
information we want directly and then serve it up.

This hypothetical "cabal-repository.description" file would be pointed
at by a user's .cabal/conf, and the config file would describe either
what resources the repo makes available or how to discover what
resources it makes available.

So for a small repo, this file could contain a listing of package ids
and where the tar-ball/package descriptions are. We could even have a
special case for local or file-share hosted repositories - the presence
of an empty repo description file would imply that the contents of the
repo is every tar, tar.gz or directory containing a .cabal file in the
top level.

A larger repository would point to another file which contains a
collection of packages and their meta-data.

One of the resources could be "here's where to find a tarball containing
the package descriptions of every package I know how to serve", to
support the current model of solving dependencies.

In this scenario the 'repo description' files would exactly be a REST
description of the contents of Hackage Server.

It's the same information as what you'd wanted to put in the index
tarball, and we might even want to make it so that the repo config file
can live in the tarball and address resources in the tarball it is
hosted in (so I can deploy a local cabal repo by dropping a tarball into
a fileshare). But slipstreaming metadata into soft-links in a tarball
feels weird, and since we need client changes to make it work we may as
well do it right.

Does this sort of approach sound sensible? I don't mind fleshing it out
more as a start.

Antoine
Re: hackage-server: index format
On Fri, 2010-11-19 at 14:44 +0100, Tillmann Rendel wrote:
> Duncan Coutts wrote:
>> [...] symlinks [...] Opinions?
>
> How would this interact with the absence of symlinks on Windows?

Not a problem at all. The index tarballs are never unpacked to files on
disk. We read the tar file directly using the tar package.

Duncan
Re: hackage-server: index format
On Fri, 2010-11-19 at 13:46 +, Ross Paterson wrote:
> On Thu, Nov 18, 2010 at 07:46:33PM -0600, Antoine Latter wrote:
>> The index tar-ball on Hackage has an odd naming convention. Package
>> descriptions are given paths of the form:
>>
>>   ./$pkg/$version/$pkg.cabal
>>
>> including the leading "./". I'm guessing that this is done as a method
>> of distinguishing non-package meta-data.
>>
>> Is this a convention we need to preserve?
>
> I've removed the leading "./"; let's see if it breaks anything.

I expect it'll be fine. cabal-install uses:

  case Tar.entryContent entry of
    Tar.NormalFile content _
      | takeExtension fileName == ".cabal"
      -> case splitDirectories (normalise fileName) of
           [pkgname,vers,_] -> ...

and

  splitDirectories (normalise "./$pkg/$version/$pkg.cabal")
    = ["$pkg", "$version", "$pkg.cabal"]

Duncan
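The path handling quoted above can be tried in isolation. The following Haskell sketch wraps the same `takeExtension`, `normalise` and `splitDirectories` calls from `System.FilePath` in a standalone function. The name `packageIdFromPath` and the `Maybe (String, String)` result are choices made for this illustration only, and the tar-entry handling from cabal-install is left out.

```haskell
import System.FilePath (normalise, splitDirectories, takeExtension)

-- | Recover (package name, version) from an index entry path of the
-- form $pkg/$version/$pkg.cabal, with or without a leading "./".
-- Entries with other extensions or layouts are ignored.
packageIdFromPath :: FilePath -> Maybe (String, String)
packageIdFromPath fileName
  | takeExtension fileName == ".cabal" =
      case splitDirectories (normalise fileName) of
        [pkgname, vers, _] -> Just (pkgname, vers)
        _                  -> Nothing
  | otherwise = Nothing
```

Because `normalise` drops the leading `./`, both `./foo/1.0/foo.cabal` and `foo/1.0/foo.cabal` yield the same name and version, which is consistent with the expectation that removing the prefix from the index is harmless.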