Re: Hackage 2 status
On Mon, Jul 02, 2012 at 08:14:01PM +0100, Duncan Coutts wrote:
> On Mon, 2012-07-02 at 12:25 +0100, Ian Lynagh wrote:
> > Conclusion
> > ----------
> >
> > I think the following are the blockers for deploying Hackage 2:
> > * #911 upload perms; may be good enough already
> > * #916 check URLs are OK
> > * #918 build haddock (and HsColour) docs

I forgot that the bug tracker had moved to github. So actually these are now:
* #901 upload perms; may be good enough already
* #906 check URLs are OK
* #908 build haddock (and HsColour) docs
and are the tickets marked important or urgent on
https://github.com/haskell/cabal/issues?labels=hackage2&page=1&state=open

> > * Show source repository on package pages
>
> Should be easy to port that from the old code.

I've filed #965 (hackage2, important) for that.

> > * Support the existing Distributions files, and show info on package pages
>
> When the feature was added, I advocated doing it differently, so that the hackage server does not poll some URL but the people in charge of distros push the information instead. I don't think it would be a blocker to leave out the distribution info system as it is now; when we eventually spend the time to implement it, we can switch to doing it in a more sensible way.

OK, I won't treat that as a blocker then.

> > (plus enough testing to give us confidence in it, of course).
>
> One of the main things here is adding tests that the database dump/restore mechanism round trips correctly.

#966 (hackage2, important) filed.

> Something to keep in mind is memory usage.

Will do, but currently I don't think this is a blocker for deploying 2.0.

Thanks
Ian

___
cabal-devel mailing list
cabal-devel@haskell.org
http://www.haskell.org/mailman/listinfo/cabal-devel
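A minimal sketch of the kind of dump/restore round-trip check mentioned above. The `UserRecord` type, the CSV-style row layout and all helper names are hypothetical illustrations, not hackage-server's actual backup code. The point is the `roundTrips` property: restoring a dump must give back exactly the state that was dumped, including awkward values with quotes, commas or newlines.

```haskell
-- Hypothetical stand-in for one table of server state; not a real
-- hackage-server type.
data UserRecord = UserRecord { userId :: Int, userName :: String }
  deriving (Eq, Show)

-- Dump: one row per record, id,"name" followed by a newline, with any
-- double quote inside the name doubled (CSV-style escaping).
dumpUsers :: [UserRecord] -> String
dumpUsers = concatMap row
  where
    row (UserRecord i n) = show i ++ ",\"" ++ concatMap esc n ++ "\"\n"
    esc '"' = "\"\""
    esc c   = [c]

-- Restore: parse a dump back into records, or report what went wrong.
restoreUsers :: String -> Either String [UserRecord]
restoreUsers "" = Right []
restoreUsers s = do
  (r, rest) <- parseRow s
  rs <- restoreUsers rest
  pure (r : rs)

-- Parse one row: an Int, a comma, a quoted name, then a newline.
parseRow :: String -> Either String (UserRecord, String)
parseRow s = case reads s :: [(Int, String)] of
  [(i, ',' : '"' : rest)] -> do
    (name, rest') <- parseQuoted rest
    case rest' of
      '\n' : more -> Right (UserRecord i name, more)
      _           -> Left "expected end of row"
  _ -> Left "malformed row"

-- Read a quoted field body up to its closing quote; "" stands for one
-- literal quote. Commas and newlines inside the quotes are kept as-is.
parseQuoted :: String -> Either String (String, String)
parseQuoted ('"' : '"' : rest) = prepend '"' (parseQuoted rest)
parseQuoted ('"' : rest)       = Right ("", rest)
parseQuoted (c : rest)         = prepend c (parseQuoted rest)
parseQuoted []                 = Left "unterminated quoted field"

prepend :: Char -> Either String (String, String) -> Either String (String, String)
prepend c = fmap (\(xs, r) -> (c : xs, r))

-- The property a round-trip test should check for every dump format.
roundTrips :: [UserRecord] -> Bool
roundTrips users = restoreUsers (dumpUsers users) == Right users
```

In a real test suite the same property would typically be run over generated data (for example with QuickCheck) rather than a handful of hand-picked records.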
Re: Hackage 2 status
On Tue, Jul 3, 2012 at 2:27 PM, Ian Lynagh i...@well-typed.com wrote:
> On Mon, Jul 02, 2012 at 08:14:01PM +0100, Duncan Coutts wrote:
> > Something to keep in mind is memory usage.
>
> Will do, but currently I don't think this is a blocker for deploying 2.0.

Isn't that the reason the test server (http://hackage.factisresearch.com/) is constantly down? Or is that just because no one is paying much attention to it?
Re: Hackage 2 status
On Mon, Jul 2, 2012 at 3:14 PM, Duncan Coutts duncan.cou...@googlemail.com wrote:
> Something to keep in mind is memory usage.
>
> I know Jeremy is looking at this from the infrastructure side, but I think there are also some likely culprits on the app side. Cabal's GenericPackageDescription type is very large in memory, and having tens of thousands of these means lots of memory.
>
> One hopefully easy way to save memory here, without going to the hassle of redoing Cabal's type definitions, is simply to increase sharing. There's a huge amount of repeated information. Start by sharing all the package names and versions. Then there's other metadata that rarely changes between versions of the same package.
>
> This kind of thing should be easy to evaluate: just write a test program that reads the index file and look at peak memory use. Then try sharing stuff and see how much it drops.
>
> This sharing optimisation would still be useful even if we later go and redo GenericPackageDescription to be more compact.

This should not hold up the launch of Hackage 2 (which is very important), but I think it's an important issue that we need to address: we don't want to store perhaps the most important data the Haskell community has in an experimental data store! Creating a correct data store (i.e. ACID) that also handles a moderate amount of load is quite a difficult undertaking, and it shouldn't be taken lightly. Let's stick the data in some SQL database and spend our energy on other things. :)

Cheers,
Johan
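A minimal sketch of the sharing idea, assuming a hypothetical cut-down `PkgInfo` record in place of Cabal's real `GenericPackageDescription`. Each distinct name, license string or version is interned through a `Data.Map`, so records that parsed to equal values end up pointing at one canonical copy instead of each holding its own.

```haskell
import Data.List (mapAccumL)
import qualified Data.Map.Strict as Map

-- Hypothetical, heavily cut-down stand-in for per-version package metadata;
-- Cabal's real GenericPackageDescription has many more fields.
data PkgInfo = PkgInfo
  { pkgName    :: String
  , pkgVersion :: [Int]
  , pkgLicense :: String
  } deriving (Eq, Show)

-- An intern table maps each distinct value to a single canonical copy.
type Table a = Map.Map a a

-- Return the canonical copy of x, registering x itself if it is new.
intern :: Ord a => Table a -> a -> (Table a, a)
intern tbl x = case Map.lookup x tbl of
  Just canonical -> (tbl, canonical)
  Nothing        -> (Map.insert x x tbl, x)

-- Rebuild every record so that equal names, licenses and versions all
-- refer to one shared heap object rather than separate parsed copies.
-- (A production version would also force records strictly as it goes,
-- so that no thunks keep the unshared originals alive.)
shareAll :: [PkgInfo] -> ((Table String, Table [Int]), [PkgInfo])
shareAll = mapAccumL step (Map.empty, Map.empty)
  where
    step (strs, vers) (PkgInfo n v l) =
      let (strs1, n') = intern strs n
          (strs2, l') = intern strs1 l
          (vers1, v') = intern vers v
      in ((strs2, vers1), PkgInfo n' v' l')
```

To evaluate it as suggested, a test program compiled with `-rtsopts` and run with `+RTS -s` prints its maximum residency, which can be compared with and without the sharing pass applied to the parsed index.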
Re: Hackage 2 status
On 3 July 2012 20:38, Johan Tibell johan.tib...@gmail.com wrote:
> This should not hold up the launch of Hackage 2 (which is very important), but I think it's an important issue that we need to address: we don't want to store perhaps the most important data the Haskell community has in an experimental data store! Creating a correct data store (i.e. ACID) that also handles a moderate amount of load is quite a difficult undertaking, and it shouldn't be taken lightly. Let's stick the data in some SQL database and spend our energy on other things. :)

I still disagree that going with an external SQL db will be easier. The big advantage of the acid-state (and similar) data stores is that they let us use Haskell types properly and don't imply a separate external data model and a marshalling stage.
That said, I also do not trust acid-state for long-term storage (simply because the binary format it uses isn't sensible), which is why the hackage server already has a system for dumping and restoring to standard formats (like CSV, tarballs, etc.). So if we use this backup system properly (i.e. in combination with a system for backups to other machines), then I think there's little chance of data loss.

Additionally, the really important data (the packages) are stored in the file system.

Duncan
Re: Hackage 2 status
On Tue, Jul 3, 2012 at 4:05 PM, Duncan Coutts duncan.cou...@googlemail.com wrote:
> I still disagree that going with an external SQL db will be easier. The big advantage of the acid-state (and similar) data stores is that they let us use Haskell types properly and don't imply a separate external data model and a marshalling stage.

This is moot if the data ends up being corrupted* or if the data store doesn't handle the load. :) This might be the cranky old engineer in me talking, but these things don't usually end well. Using something like mysql-simple to marshal the data is pretty convenient; it's very much like writing Binary instances for the data types.

> Additionally, the really important data (the packages) are stored in the file system.

While this is true now (we don't have much data except the packages!), in my experience the user-generated data (i.e. the actions users perform on the Hackage site) will be the most valuable in the long term, as the packages can be regenerated from source if need be. For example, this data is what we're going to use to rank packages. In fact, this data is what should make Hackage 2 an improvement over Hackage 1.

* It doesn't matter much that we can restore the data from backups. Any corruption will still cause downtime and will likely require both manual maintenance and bug fixing in acid-state.

--
Johan
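For reference, a hand-written instance of the `binary` package's `Binary` class, the style of per-type marshalling the comparison above refers to, looks roughly like this. The `DownloadEvent` type is a hypothetical example of logged user-action data; the sketch does not show mysql-simple's own classes.

```haskell
import Data.Binary (Binary (..), decode, encode)

-- Hypothetical record for one logged user action; not a real
-- hackage-server type.
data DownloadEvent = DownloadEvent
  { eventPackage :: String
  , eventUserId  :: Int
  } deriving (Eq, Show)

-- Marshalling is written once per type: fields are written in a fixed
-- order and read back in the same order.
instance Binary DownloadEvent where
  put (DownloadEvent pkg uid) = put pkg >> put uid
  get = DownloadEvent <$> get <*> get

-- Encoding then decoding should give back the original value.
roundTripsBinary :: DownloadEvent -> Bool
roundTripsBinary ev = decode (encode ev) == ev
```

A row mapping for an SQL library follows the same shape: one function turns a value into its columns, another rebuilds the value from them, and the two must stay in agreement.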