Re: Hackage 2 status

2012-07-03 Thread Ian Lynagh
On Mon, Jul 02, 2012 at 08:14:01PM +0100, Duncan Coutts wrote:
 On Mon, 2012-07-02 at 12:25 +0100, Ian Lynagh wrote:
 
  Conclusion
  --
  
  I think the following are the blockers for deploying Hackage 2:
  
  * #911 upload perms; may be good enough already
  * #916 check URLs are OK
  * #918 build haddock (and HsColour) docs

I forgot that the bug tracker had moved to github. So actually these
are now:

* #901 upload perms; may be good enough already
* #906 check URLs are OK
* #908 build haddock (and HsColour) docs

and are the tickets marked important or urgent on
https://github.com/haskell/cabal/issues?labels=hackage2&page=1&state=open

  * Show source repository on package pages
 
 Should be easy to port that from the old code.

I've filed #965 (hackage2, important) for that.

  * Support the existing Distributions files, and show info on package pages
 
 I advocated at the time the feature was added that it should be done
 differently so that the hackage server does not poll some url, but
 people in charge of distros push instead. I think it would not be a
 blocker to not implement the distribution info system as it is now and
 when eventually spending the time to implement it, switch to doing it in
 a more sensible way.

OK, I won't treat that as a blocker then.

  (plus enough testing to give us confidence in it, of course).
 
 One of the main things here is adding tests that the database
 dump/restore mechanism round trips correctly.

#966 (hackage2, important) filed.
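
A round-trip test of the kind described above could be sketched as follows. This is only an illustration: the Uploaders type, the "package,user" line format, and the names dump, restore and roundTrips are all hypothetical and are not taken from the hackage-server code. It assumes names contain neither commas nor newlines.

```haskell
import qualified Data.Map as Map

-- Hypothetical slice of server state: who may upload each package.
type Uploaders = Map.Map String [String]

-- Dump to plain text, one "package,user" pair per line.
-- Assumption: names contain neither commas nor newlines.
dump :: Uploaders -> String
dump st =
  unlines [ pkg ++ "," ++ user | (pkg, users) <- Map.toList st, user <- users ]

-- Restore by splitting each line at its first comma. Lines without a
-- comma are skipped. 'flip (++)' keeps users in their original order.
restore :: String -> Uploaders
restore txt =
  Map.fromListWith (flip (++))
    [ (pkg, [user])
    | line <- lines txt
    , let (pkg, rest) = break (== ',') line
    , (',' : user) <- [rest]
    ]

-- The property under test: restoring a dump gives back the same state.
roundTrips :: Uploaders -> Bool
roundTrips st = restore (dump st) == st

main :: IO ()
main = do
  let ok     = Map.fromList [("foo", ["alice", "bob"]), ("bar", ["alice"])]
      orphan = Map.fromList [("orphan", [])]
  print (roundTrips ok)      -- True
  print (roundTrips orphan)  -- False: a package with no uploaders is lost
```

The second case shows why such a test is worth having: this toy format silently drops a package that has no uploaders, which is exactly the sort of loss a round-trip check is meant to expose before a real restore is needed.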

 Something to keep in mind is memory usage.

Will do, but currently I don't think this is a blocker for deploying
2.0.


Thanks
Ian


___
cabal-devel mailing list
cabal-devel@haskell.org
http://www.haskell.org/mailman/listinfo/cabal-devel


Re: Hackage 2 status

2012-07-03 Thread Ben Millwood
On Tue, Jul 3, 2012 at 2:27 PM, Ian Lynagh i...@well-typed.com wrote:
 On Mon, Jul 02, 2012 at 08:14:01PM +0100, Duncan Coutts wrote:
 Something to keep in mind is memory usage.

 Will do, but currently I don't think this is a blocker for deploying
 2.0.


Isn't it the reason why the test server
(http://hackage.factisresearch.com/) is constantly down? Or is that
just because no-one's paying much attention?



Re: Hackage 2 status

2012-07-03 Thread Johan Tibell
On Mon, Jul 2, 2012 at 3:14 PM, Duncan Coutts
duncan.cou...@googlemail.com wrote:
 Something to keep in mind is memory usage. I know Jeremy is looking at
 this from the infrastructure side, but I think from the app side there's
 also some likely culprits. Cabal's GenericPackageDescription type is
 very large in memory. Having tens of thousands of these means lots of
 memory. One hopefully easy way to save memory here without going to the
 hassle of redoing Cabal's type definitions is simply to increase
 sharing. There's a huge amount of repeated information. Start by sharing
 all the package names and versions. Then there's other meta-data that
 rarely changes between versions of the same package. This kind of thing
 should be easy to evaluate, just write a test prog that reads the index
 file and look at peak memory use. Then try sharing stuff and see how
 much it drops. This sharing optimisation would still be useful even if
 later we go and redo GenericPackageDescription to be more compact.
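
The sharing idea in the paragraph above can be sketched with a small intern table. This is a minimal sketch under stated assumptions: Entry is a hypothetical, much-simplified stand-in for an index entry (the real GenericPackageDescription has many more fields), and intern and shareEntries are illustrative names, not part of Cabal or hackage-server. To compare peak memory with and without sharing, one option is to link with -rtsopts and run the program with +RTS -s.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical, simplified index entry. The fields are strict so that
-- building an Entry actually performs the intern lookup instead of
-- holding on to a thunk that still references the unshared copy.
data Entry = Entry
  { entryName    :: !String
  , entryVersion :: ![Int]
  , entryLicense :: !String
  } deriving (Eq, Show)

-- An intern table maps each distinct value to one canonical copy of it.
type Intern a = Map.Map a a

-- Return the canonical copy if the value was seen before; otherwise
-- record this value as the canonical copy.
intern :: Ord a => Intern a -> a -> (Intern a, a)
intern table x =
  case Map.lookup x table of
    Just canonical -> (table, canonical)
    Nothing        -> (Map.insert x x table, x)

-- Rebuild the entries so that equal names and licenses refer to one
-- shared heap object rather than one copy per entry.
shareEntries :: [Entry] -> [Entry]
shareEntries = snd . mapAccumL step (Map.empty, Map.empty)
  where
    step (names, licenses) e =
      let (names', n)    = intern names (entryName e)
          (licenses', l) = intern licenses (entryLicense e)
      in ((names', licenses'), e { entryName = n, entryLicense = l })

main :: IO ()
main = do
  -- In a real measurement these would come from parsing the index file,
  -- where every entry starts out with its own copy of each string.
  let entries = [ Entry "bytestring" [0, 9, v] "BSD3" | v <- [0 .. 4] ]
      shared  = shareEntries entries
  -- Sharing changes only the heap representation, never the values.
  print (shared == entries)
  print (length shared)
```

The same table-per-field approach would extend to versions and other metadata that rarely changes between releases of a package.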

This should not hold up the launch of Hackage 2 (which is very
important) but I think it's an important issue that we need to
address: we don't want to store perhaps the most important data the
Haskell community has in an experimental data store! Creating a
correct data store (i.e. ACID) that also handles a moderate amount of
load is quite a difficult undertaking and it shouldn't be taken
lightly. Let's stick the data in some SQL database and spend our energy
on other things. :)

Cheers,
Johan



Re: Hackage 2 status

2012-07-03 Thread Duncan Coutts
On 3 July 2012 20:38, Johan Tibell johan.tib...@gmail.com wrote:
 On Mon, Jul 2, 2012 at 3:14 PM, Duncan Coutts
 duncan.cou...@googlemail.com wrote:
 Something to keep in mind is memory usage. I know Jeremy is looking at
 this from the infrastructure side, but I think from the app side there's
 also some likely culprits. Cabal's GenericPackageDescription type is
 very large in memory. Having tens of thousands of these means lots of
 memory. One hopefully easy way to save memory here without going to the
 hassle of redoing Cabal's type definitions is simply to increase
 sharing. There's a huge amount of repeated information. Start by sharing
 all the package names and versions. Then there's other meta-data that
 rarely changes between versions of the same package. This kind of thing
 should be easy to evaluate, just write a test prog that reads the index
 file and look at peak memory use. Then try sharing stuff and see how
 much it drops. This sharing optimisation would still be useful even if
 later we go and redo GenericPackageDescription to be more compact.

 This should not hold up the launch of Hackage 2 (which is very
 important) but I think it's an important issue that we need to
 address: we don't want to store perhaps the most important data the
 Haskell community has in an experimental data store! Creating a
 correct data store (i.e. ACID) that also handles a moderate amount of
 load is quite a difficult undertaking and it shouldn't be taken
 lightly. Let's stick the data in some SQL database and spend our energy
 on other things. :)

I still disagree that going with an external SQL db will be easier.
The big advantage of the acid-state (and similar) data stores is that
they let us use Haskell types properly and don't imply a separate
external data model and a marshalling stage.

That said, I also do not trust acid-state for long-term storage
(simply because the binary format it uses isn't sensible) which is why
the hackage server already has a system for dumping and restoring to
standard formats (like CSV, tarballs etc.). So if we use this backup
system properly (i.e. in combination with a system for backups to other
machines) then I think there's little chance of data loss.
Additionally, the really important data (the packages) are stored in
the file system.

Duncan



Re: Hackage 2 status

2012-07-03 Thread Johan Tibell
On Tue, Jul 3, 2012 at 4:05 PM, Duncan Coutts
duncan.cou...@googlemail.com wrote:
 I still disagree that going with an external SQL db will be easier.
 The big advantage of the acid-state (and similar) data stores is that
 they let us use Haskell types properly and don't imply a separate
 external data model and a marshalling stage.

This is moot if the data ends up being corrupted* or if the data store
doesn't handle the load. :) This might be the cranky old engineer in
me talking, but these things don't usually end well.

Using something like mysql-simple to marshal the data is pretty
convenient; it's very much like writing Binary instances for the data
types.

 Additionally, the really important data (the packages) are stored in
 the file system.

While this is true now (we don't have much data except the packages!),
in my experience, long term, the user-generated data (i.e. actions they
perform on the Hackage site) will be the most valuable (as the
packages can be regenerated from source if need be). For example,
using this data is how we're going to do ranking of packages. In fact,
this data is what should make Hackage 2 an improvement over Hackage
1.

* It doesn't matter much if we can restore the data from backups. Any
corruption will still cause downtime and will likely require both
manual maintenance and bug fixing in acid-state.

-- Johan
