[Hackage] #767: Highlighted source code on hackage.haskell.org mangles Unicode

2010-11-19 Thread Hackage
#767: Highlighted source code on hackage.haskell.org mangles Unicode
--+-
  Reporter:  andersk  |Owner:   
  Type:  defect   |   Status:  new  
  Priority:  normal   |Milestone:  HackageDB
 Component:  hackageDB website|  Version:   
  Severity:  normal   | Keywords:   
Difficulty:  very easy (1 hour)  |   Ghcversion:   
  Platform:   |  
--+-
 On hackage.haskell.org, when you browse the source of a module that
 contains Unicode (e.g. [http://hackage.haskell.org/packages/archive/base-
 unicode-symbols/latest/doc/html/src/Data-Eq-Unicode.html
 Data.Eq.Unicode]), it is sent with a `Content-Type: text/html` header with
 no charset.  There is a charset in the XML declaration
 `<?xml version="1.0" encoding="UTF-8"?>`, but that is ignored by Firefox because
 of the non-XML `Content-Type`.  Therefore, the wrong encoding is detected
 and the Unicode symbols get mangled.

 Possible fixes include sending `Content-Type: text/html; charset=UTF-8`,
 or sending `Content-Type: application/xhtml+xml` so that the XML
 declaration is respected, or both (`Content-Type: application/xhtml+xml;
 charset=utf-8`), or adding equivalent `<meta http-equiv="Content-Type">`
 tags.
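 A minimal Haskell sketch of the header-based and meta-tag options, assuming
 a server that chooses the header from the file extension. The names
 `contentTypeFor` and `charsetMetaTag` are hypothetical and not part of the
 Hackage code. The fallback type for other files is only a placeholder.

```haskell
import System.FilePath (takeExtension)

-- | Hypothetical helper: pick the Content-Type value for a file from the
-- documentation tree, always naming the charset for (X)HTML pages.
contentTypeFor :: FilePath -> String
contentTypeFor path = case takeExtension path of
  ".html"  -> "text/html; charset=UTF-8"              -- keep text/html, add a charset
  ".xhtml" -> "application/xhtml+xml; charset=UTF-8"  -- XML media type plus charset
  _        -> "application/octet-stream"              -- placeholder for everything else

-- | The in-page alternative: a <meta> tag carrying the same information,
-- for pages whose HTTP headers cannot be changed.
charsetMetaTag :: String
charsetMetaTag =
  "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">"
```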

-- 
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/767
Hackage http://haskell.org/cabal/
Hackage: Cabal and related projects

___
cabal-devel mailing list
cabal-devel@haskell.org
http://www.haskell.org/mailman/listinfo/cabal-devel


Re: hackage-server: index format

2010-11-19 Thread Duncan Coutts
On Thu, 2010-11-18 at 19:46 -0600, Antoine Latter wrote:
 Hi folks,
 
 The index tar-ball on Hackage has an odd naming convention. Package
 descriptions are given paths of the form:
 
 ./$pkg/$version/$pkg.cabal
 
 including the leading ./.
 I'm guessing that this is done as a method of distinguishing
 non-package meta-data.
 
 Is this a convention we need to preserve?

The .cabal extension is essential. Tools are required to ignore file
extensions they do not understand. This provides a bit of forwards
compatibility.
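The forwards-compatibility rule can be sketched over plain entry paths.
`knownIndexEntries` is a hypothetical name. A real client filters tar
entries rather than strings.

```haskell
import System.FilePath (takeExtension)

-- | Keep only the entries this client understands (.cabal files). Anything
-- else, e.g. an extension-less "preferred-versions" file or some future
-- metadata file, is skipped rather than treated as an error.
knownIndexEntries :: [FilePath] -> [FilePath]
knownIndexEntries = filter ((== ".cabal") . takeExtension)
```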

In theory the file path should not be significant. However the current
cabal-install code does rely on the name and version directories. It
uses this to find the package id without having to parse the .cabal
file. This is bad and fragile.

But basically you cannot change that layout for the moment.

I would like to move to a model where the file name may be meaningful
but the path is not significant. I would also like to make the proper
way to find the package id be to parse the file. I'd like to change
cabal-install so that it generates its own fast cache on each cabal
update, rather than reading the index.tar every time. This would mean
we could pay the expense of parsing all the .cabal files and thus could
do it properly.
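As a rough illustration of finding the package id by parsing the file
instead of relying on the path, here is a deliberately naive field reader. It
is a hypothetical stand-in and not the Cabal library's parser. It only looks
at unindented `name:` and `version:` lines and ignores everything else in the
.cabal format.

```haskell
import Data.Char (isSpace, toLower)

-- | Naive, hypothetical: read the top-level "name:" and "version:" fields.
-- Field names in .cabal files are case-insensitive, so keys are lower-cased.
packageIdFromCabalText :: String -> Maybe (String, String)
packageIdFromCabalText txt = (,) <$> field "name" <*> field "version"
  where
    topLevelFields =
      [ (map toLower (trim key), trim (drop 1 rest))  -- drop the ':'
      | line@(c : _) <- lines txt
      , not (isSpace c)                               -- skip indented (section) lines
      , let (key, rest) = break (== ':') line
      , not (null rest)                               -- skip headers like "library"
      ]
    field k = lookup k topLevelFields
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse
```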

Matt and I also discussed making the 00-index.tar.gz into a RESTful
format by adding proper URLs for package tarballs. Currently clients
have to know the URL structure of the server: given a package Id taken
from the index they construct a URL $root/pkg-ver/pkg-ver.tar.gz. As we
all know, forcing clients to construct URLs is bad (inflexible etc etc).

To extend the format to contain URLs we were thinking of making use of
the tar format's support for symlinks. The symlink content can be
interpreted as a URL, either relative or absolute, e.g.:

foo-1.0.tar.gz -> /package/foo-1.0/foo-1.0.tar.gz
or
foo-1.0.tar.gz -> http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz

That is, the index contains a bunch of cabal files, and also a bunch
of .tar.gz symlinks. Like URLs in html these are interpreted relative to
the URL of the index.tar.gz itself. So if we got the index.tar.gz from
say:

http://hackage.haskell.org/index.tar.gz
then a relative URL like /package/foo-1.0/foo-1.0.tar.gz is interpreted
as
http://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz
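The resolution rule can be sketched in Haskell under simplifying
assumptions. `resolveLink` and its helpers are hypothetical and handle only
absolute URLs, absolute paths and plain relative paths. They skip `..`
segments, queries and fragments. A real client would more likely use
`relativeTo` from the network-uri package's `Network.URI`.

```haskell
import Data.Char (isAlpha)

-- | Resolve a symlink target against the URL the index was fetched from.
-- Simplified, not RFC 3986 complete.
resolveLink :: String -> String -> String
resolveLink base target
  | hasScheme target     = target                         -- already absolute
  | take 1 target == "/" = origin ++ target               -- absolute path
  | otherwise            = origin ++ dirOf path ++ target  -- relative path
  where
    (origin, path) = splitOrigin base

-- | Does the string start with "scheme://"?
hasScheme :: String -> Bool
hasScheme t = case break (== ':') t of
  (s@(_ : _), ':' : '/' : '/' : _) -> all isAlpha s
  _                                -> False

-- | "http://host/a/b.tar.gz" becomes ("http://host", "/a/b.tar.gz").
splitOrigin :: String -> (String, String)
splitOrigin url = (scheme ++ "://" ++ host, path)
  where
    (scheme, rest) = break (== ':') url
    (host, path)   = break (== '/') (drop 3 rest)  -- drop the "://"

-- | Directory part of a URL path, keeping the trailing slash.
dirOf :: String -> String
dirOf p = case reverse (dropWhile (/= '/') (reverse p)) of
  "" -> "/"
  d  -> d
```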

This is totally standard URL convention; the only odd thing is using tarball
symlinks as URLs, though it seems like a pretty natural generalisation.
It works fine if you unpack the tarball with ordinary tar programs; it
just makes broken symlinks.

So note that the name of the tarball entry foo-1.0.tar.gz is
significant, beyond the fact of the extension. The name foo-1.0 is
significant as it is the key in the package Id -> URL mapping.
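Here is a sketch of building that package Id -> URL mapping. The
`IndexEntry` type is a hypothetical simplification. With the tar package you
would read entries via `Codec.Archive.Tar.read` and match on the
`SymbolicLink` entry content instead.

```haskell
import Data.List (isSuffixOf)

-- | Hypothetical, simplified view of the entries in an index tarball.
data IndexEntry
  = CabalFile FilePath String  -- ^ entry name and .cabal file contents
  | Link FilePath String       -- ^ entry name and link target (a URL)

-- | The key is the entry name with ".tar.gz" stripped, e.g. "foo-1.0".
-- Non-link entries and links with other extensions are ignored.
tarballLocations :: [IndexEntry] -> [(String, String)]
tarballLocations entries =
  [ (take (length name - length ext) name, url)
  | Link name url <- entries
  , ext `isSuffixOf` name
  ]
  where
    ext = ".tar.gz"
```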

Duncan



Re: hackage-server: index format

2010-11-19 Thread Duncan Coutts
On Fri, 2010-11-19 at 12:27 +0000, Duncan Coutts wrote:

 Matt and I also discussed making the 00-index.tar.gz into a RESTful
 format by adding proper URLs for package tarballs.

Indeed we could go further and use a single general format for
describing or distributing bundles of packages.

Use case: local build trees
---

A bunch of related packages (e.g. gtk2hs, happstack-* etc) unpacked
locally.

/home/me/prgs/myproj/foo/            -- top of source tree for foo
/home/me/prgs/myproj/foo/foo.cabal
/home/me/prgs/myproj/bar/
/home/me/prgs/myproj/bar/bar.cabal

Now we can have an index.tar containing symlinks to .cabal files!

/home/me/prgs/myproj/index.tar: containing
foo.cabal -> foo/foo.cabal
bar.cabal -> bar/bar.cabal

So these are not copies of the .cabal files, these really are symlinks
to the local .cabal files (but inside the tarball). I guess we need some
extra index entry to point to the location of the source tree, though
it's not a .tar.gz kind.
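For the local case, a link target would presumably be resolved relative to
the directory holding index.tar, much as the URL case resolves against the
index's URL. Here is a small sketch with the filepath package.
`resolveLocalLink` is a hypothetical name.

```haskell
import System.FilePath (normalise, takeDirectory, (</>))

-- | Resolve a symlink stored in a local index.tar relative to the directory
-- containing the index. An absolute target wins, because (</>) returns its
-- second argument when that is absolute. normalise drops "." segments but
-- does not resolve "..".
resolveLocalLink :: FilePath -> FilePath -> FilePath
resolveLocalLink indexFile target = normalise (takeDirectory indexFile </> target)
```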

Now just as we can have symlinks (or really URLs) inside the tarball, we
could also have full file contents there too. Next use case...

Use case: distribution bundles
--

Shipping a bunch of source packages as a single file

some-name.tar: containing
foo.cabal
foo-1.0.tar.gz
bar.cabal
bar-1.0.tar.gz

So now instead of symlinks/URLs to separate tarballs, the whole file
contents is right there. We have a hackage-like index plus the file
tarballs.


We might have to have a different naming convention than simply blah.tar
for these indexes, otherwise cabal install might not know how to
interpret "cabal install foo.tar": should it interpret foo.tar as an
index or as a single package?

Opinions?

Duncan



[Hackage] #768: Cabal cannot find GHC when using relative path in -w flag

2010-11-19 Thread Hackage
#768: Cabal cannot find GHC when using relative path in -w flag
+---
  Reporter:  tibbe  |Owner: 
  Type:  defect |   Status:  new
  Priority:  normal |Milestone: 
 Component:  Cabal library  |  Version:  1.8.0.6
  Severity:  normal | Keywords: 
Difficulty:  unknown|   Ghcversion: 
  Platform: |  
+---
 Trying to build the network package while standing at the root of a GHC
 build tree fails to find GHC:

 {{{
 $ cabal install -w inplace/bin/ghc-stage2 network -v2
 inplace/bin/ghc-stage2 --numeric-version
 looking for package tool: ghc-pkg near compiler in inplace/bin
 found package tool in inplace/bin/ghc-pkg
 inplace/bin/ghc-pkg --version
 inplace/bin/ghc-stage2 --supported-languages
 Reading installed packages...
 inplace/bin/ghc-pkg dump --global
 inplace/bin/ghc-pkg dump --user
 inplace/bin/ghc-stage2 --print-libdir
 Reading available packages...
 Resolving dependencies...
 selecting network-2.3 (hackage) and discarding network-2.0, 2.1.0.0, 2.2.0.0, 2.2.0.1, 2.2.1, 2.2.1.1, 2.2.1.2, 2.2.1.3, 2.2.1.4, 2.2.1.5, 2.2.1.6, 2.2.1.7, 2.2.1.8, 2.2.1.9, 2.2.1.10, 2.2.3 and 2.2.3.1
 selecting base-4.3.0.0 (installed)
 selecting ffi-1.0 (installed)
 selecting ghc-prim-0.2.0.0 (installed)
 selecting integer-gmp-0.2.0.2 (installed)
 selecting rts-1.0 (installed)
 selecting parsec-2.1.0.1 (hackage) and discarding parsec-2.0, 2.1.0.0, 3.0.0, 3.0.1 and 3.1.0
 selecting unix-2.4.1.0 (installed) and discarding unix-2.0, 2.2.0.0, 2.3.0.0, 2.3.1.0, 2.3.2.0, 2.4.0.0, 2.4.0.1 and 2.4.0.2
 selecting bytestring-0.9.1.8 (installed) and discarding bytestring-0.9, 0.9.0.1, 0.9.0.2, 0.9.0.3, 0.9.0.4, 0.9.1.0, 0.9.1.1, 0.9.1.2, 0.9.1.3, 0.9.1.4, 0.9.1.5, 0.9.1.6 and 0.9.1.7
 In order, the following would be installed:
 parsec-2.1.0.1 (new package)
 network-2.3 (new package)
 parsec-2.1.0.1 has already been downloaded.
 Extracting /home/tibell/.cabal/packages/hackage.haskell.org/parsec/2.1.0.1/parsec-2.1.0.1.tar.gz to /tmp/parsec-2.1.0.117969...
 Configuring parsec-2.1.0.1...
 cabal: Cannot find the program 'ghc' at 'inplace/bin/ghc-stage2' or on the path
 cabal: Error: some packages failed to install:
 network-2.3 depends on parsec-2.1.0.1 which failed to install.
 parsec-2.1.0.1 failed during the configure step. The exception was:
 ExitFailure 1
 }}}

-- 
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/768



Re: [Hackage] #768: Cabal cannot find GHC when using relative path in -w flag

2010-11-19 Thread Hackage
#768: Cabal cannot find GHC when using relative path in -w flag
-+--
  Reporter:  tibbe   |Owner: 
  Type:  defect  |   Status:  new
  Priority:  normal  |Milestone: 
 Component:  cabal-install tool  |  Version:  1.8.0.6
  Severity:  normal  | Keywords: 
Difficulty:  unknown |   Ghcversion: 
  Platform:  |  
-+--
Changes (by duncan):

  * component:  Cabal library => cabal-install tool


Comment:

 Presumably due to cabal-install changing the current directory when it
 builds the package in question. See SetupWrapper in cabal-install.
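 One possible workaround, sketched under the assumption that the
 working-directory change is the cause: make a relative program path
 absolute before switching directories, and leave bare names such as `ghc`
 to the usual PATH search. The function names are hypothetical, and this is
 not necessarily how cabal-install resolved the ticket.

```haskell
import System.Directory (makeAbsolute)
import System.FilePath (isPathSeparator, isRelative)

-- | "inplace/bin/ghc-stage2" is relative to where the user ran cabal, so it
-- breaks once the working directory changes. A bare "ghc" is looked up on
-- the PATH and an absolute path is already safe, so both are left alone.
needsAbsolutising :: FilePath -> Bool
needsAbsolutising p = isRelative p && any isPathSeparator p

-- | Hypothetical fix-up, run once before changing into the build directory.
absolutiseProgramPath :: FilePath -> IO FilePath
absolutiseProgramPath p
  | needsAbsolutising p = makeAbsolute p
  | otherwise           = pure p
```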

-- 
Ticket URL: http://hackage.haskell.org/trac/hackage/ticket/768#comment:1



Re: hackage-server: index format

2010-11-19 Thread Tillmann Rendel

Duncan Coutts wrote:

[...] symlinks [...]

Opinions?


How would this interact with the absence of symlinks on Windows?

  Tillmann



Re: hackage-server: index format

2010-11-19 Thread Ross Paterson
On Thu, Nov 18, 2010 at 07:46:33PM -0600, Antoine Latter wrote:
 The index tar-ball on Hackage has an odd naming convention. Package
 descriptions are given paths of the form:
 
 ./$pkg/$version/$pkg.cabal
 
 including the leading ./.
 I'm guessing that this is done as a method of distinguishing
 non-package meta-data.
 
 Is this a convention we need to preserve?

I've removed the leading ./; let's see if it breaks anything.



Re: hackage-server: index format

2010-11-19 Thread Lars Viklund
On Fri, Nov 19, 2010 at 02:44:39PM +0100, Tillmann Rendel wrote:
 Duncan Coutts wrote:
 [...] symlinks [...]

 How would this interact with the absence of symlinks on Windows?

Note that NTFS has supported all kinds of links, both symbolic and hard,
from Vista onwards, so I guess you're referring to exotic filesystems like
FAT32, or slow-to-adopt environments.

-- 
Lars Viklund | z...@acc.umu.se



Re: hackage-server: index format

2010-11-19 Thread Matthew Gruen
--- I don't think the message made it to cabal-devel, so I'm forwarding it;
sorry if you get it twice ---

On Fri, Nov 19, 2010 at 8:44 AM, Tillmann Rendel
ren...@informatik.uni-marburg.de wrote:
 Duncan Coutts wrote:

 [...] symlinks [...]

 Opinions?

 How would this interact with the absence of symlinks on Windows?

  Tillmann


Symlinks are supported in the tar format with some special markers.
Since the index tarball is never unpacked on the client's filesystem,
only cabal needs to know about it, and it is backwards/forwards
compatible. It would be just another Tar.Entry (though LinkTarget
doesn't provide any straightforward extraction functions).
http://hackage.haskell.org/packages/archive/tar/latest/doc/html/Codec-Archive-Tar.html

But if it does get unpacked by, say, 7zip on Windows -- not an
unreasonable thing to do -- we should at least check the behaviors
aren't too pathological. -.-

Antoine, as far as I can tell, the only reason the leading ./ is there
in the first place is because the tarball is created by piping find
into tar.

( echo preferred-versions; find . -maxdepth 3 -name '*.cabal' ) \
       | tar -c -T - -f - | gzip -9 > "$tmp"
mv "$tmp" 00-index.tar.gz

To answer your question: running tar -tf on hackage-server's
index.tar.gz shows that it doesn't include the ./, and cabal seems to have no
problems with it. I think Duncan mentioned that any other files that
are worth adding to the index tarball can be simply added for future
versions of cabal, since it ignores unknown files... any kind of
metadata you want.

Matt



Re: hackage-server: index format

2010-11-19 Thread Antoine Latter
On Fri, Nov 19, 2010 at 7:01 AM, Duncan Coutts
duncan.cou...@googlemail.com wrote:
 On Fri, 2010-11-19 at 12:27 +0000, Duncan Coutts wrote:

 Matt and I also discussed making the 00-index.tar.gz into a RESTful
 format by adding proper URLs for package tarballs.

 Indeed we could go further and use a single general format for
 describing or distributing bundles of packages.

 Use case: local build trees
 ---

 A bunch of related packages (e.g. gtk2hs, happstack-* etc) unpacked
 locally.

 /home/me/prgs/myproj/foo/            -- top of source tree for foo
 /home/me/prgs/myproj/foo/foo.cabal
 /home/me/prgs/myproj/bar/
 /home/me/prgs/myproj/bar/bar.cabal

 Now we can have an index.tar containing symlinks to .cabal files!

 /home/me/prgs/myproj/index.tar: containing
        foo.cabal -> foo/foo.cabal
        bar.cabal -> bar/bar.cabal

 So these are not copies of the .cabal files, these really are symlinks
 to the local .cabal files (but inside the tarball). I guess we need some
 extra index entry to point to the location of the source tree, though
 it's not a .tar.gz kind.

 Now just as we can have symlinks (or really URLs) inside the tarball, we
 could also have full file contents there too. Next use case...

 Use case: distribution bundles
 --

 Shipping a bunch of source packages as a single file

 some-name.tar: containing
        foo.cabal
        foo-1.0.tar.gz
        bar.cabal
        bar-1.0.tar.gz

 So now instead of symlinks/URLs to separate tarballs, the whole file
 contents is right there. We have a hackage-like index plus the file
 tarballs.


 We might have to have a different naming convention than simply blah.tar
 for these indexes, otherwise cabal install might not know how to
 interpret "cabal install foo.tar": should it interpret foo.tar as an
 index or as a single package?

 Opinions?


It feels like an abuse of tar-files to me - if we want to have a set
of meta-data about the location of resources in a package repository,
I think it would be better to come up with a file format that has the
information we want directly and then serve it up.

This hypothetical cabal-repository.description file would be pointed
at by a user's .cabal/conf, and the config file would describe either
what resources the repo makes available or how to discover what
resources it makes available.

So for a small repo, this file could contain a listing of package ids
and where the tar-ball/package descriptions are.

We could even have a special case for local or file-share hosted
repositories - the presence of an empty repo description file would
imply that the contents of the repo is every tar, tar.gz or directory
containing a .cabal file in the top level.
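The empty-description rule could be sketched like this. The repository's top
level is modelled as a hypothetical `TopLevelItem` list so that the logic
stays pure. A real implementation would list the directory with
`System.Directory`.

```haskell
import Data.List (isSuffixOf)
import System.FilePath (takeExtension)

-- | What sits at the top level of a hypothetical local or file-share repo.
data TopLevelItem
  = FileItem FilePath            -- ^ a plain file
  | DirItem FilePath [FilePath]  -- ^ a directory and the names it contains

-- | With an empty repo description, every tar, tar.gz, or directory that
-- holds a .cabal file at its top level counts as a package (order kept).
implicitPackages :: [TopLevelItem] -> [FilePath]
implicitPackages = concatMap pick
  where
    pick (FileItem name)
      | isTarball name = [name]
    pick (DirItem name contents)
      | any ((== ".cabal") . takeExtension) contents = [name]
    pick _ = []
    isTarball name = ".tar" `isSuffixOf` name || ".tar.gz" `isSuffixOf` name
```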

A larger repository would point to another file which contains a
collection of packages and their meta-data. One of the resources could
be "here's where to find a tarball containing the package descriptions
of every package I know how to serve", to support the current model of
solving dependencies based. In this scenario the 'repo description'
files would exactly be a REST description of the contents of Hackage
Server.

It's the same information as what you'd wanted to put in the index
tarball, and we might even want to make it so that the repo config
file can live in the tarball and address resources in the tarball it
is hosted in (so I can deploy a local cabal repo by dropping a tarball
into a fileshare).

But slipstreaming metadata into soft-links in a tarball feels weird,
and since we need client changes to make it work we may as well do it
right.

Does this sort of approach sound sensible? I don't mind fleshing it
out more as a start.

Antoine



Re: hackage-server: index format

2010-11-19 Thread Duncan Coutts
On Fri, 2010-11-19 at 14:44 +0100, Tillmann Rendel wrote:
 Duncan Coutts wrote:
  [...] symlinks [...]
 
  Opinions?
 
 How would this interact with the absence of symlinks on Windows?

Not a problem at all. The index tarballs are never unpacked to files on
disk. We read the tar file directly using the tar package.

Duncan



Re: hackage-server: index format

2010-11-19 Thread Duncan Coutts
On Fri, 2010-11-19 at 13:46 +0000, Ross Paterson wrote:
 On Thu, Nov 18, 2010 at 07:46:33PM -0600, Antoine Latter wrote:
  The index tar-ball on Hackage has an odd naming convention. Package
  descriptions are given paths of the form:
  
  ./$pkg/$version/$pkg.cabal
  
  including the leading ./.
  I'm guessing that this is done as a method of distinguishing
  non-package meta-data.
  
  Is this a convention we need to preserve?
 
 I've removed the leading ./; let's see if it breaks anything.

I expect it'll be fine. cabal-install uses:

case Tar.entryContent entry of
  Tar.NormalFile content _
    | takeExtension fileName == ".cabal"
    -> case splitDirectories (normalise fileName) of
         [pkgname, vers, _] ->

and

 splitDirectories (normalise "./$pkg/$version/$pkg.cabal")
==
 ["$pkg", "$version", "$pkg.cabal"]
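Here is a self-contained version of that check, with the tar plumbing
replaced by a plain path. `packageIdFromPath` is a hypothetical name. The
check shows why the leading ./ is harmless: `normalise` removes it before
the path is split. Without it, `splitDirectories` would return a fourth
"." component and the three-element pattern would not match.

```haskell
import System.FilePath (normalise, splitDirectories, takeExtension)

-- | Recover (name, version) from an index entry path of the form
-- $pkg/$version/$pkg.cabal, with or without a leading "./".
packageIdFromPath :: FilePath -> Maybe (String, String)
packageIdFromPath fileName
  | takeExtension fileName == ".cabal"
  , [pkgname, vers, _] <- splitDirectories (normalise fileName)
  = Just (pkgname, vers)
  | otherwise = Nothing
```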

Duncan
