Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi On 14 Mar 2008, at 21:39, Aaron Denney wrote: On 2008-03-14, Conor McBride [EMAIL PROTECTED] wrote: Hi On 13 Mar 2008, at 23:33, Aaron Denney wrote: On 2008-03-13, Conor McBride [EMAIL PROTECTED] wrote: For a suitable notion of = on quotients, and with a suitable abstraction barrier at least morally in place, is that really too much to ask? I really think it is. I don't think the case of equivalent for this purpose, but not that purpose can be ignored. Sure. But use the right tools for the job. So what are the right tools then? Why is a typeclass not the right tool? I guess I mean to distinguish *equality*, which is necessarily respected by all observations (for some notion of observation which it's important to pin down), from coarser equivalences where respect takes careful effort. Take Roman's example of alpha-equivalence for the lambda-terms with strings for bound variables. No way would I ever call that equality, because respecting alpha-equivalence takes quite a lot of care. (Correspondingly, I'd switch to a representation which admitted equality.) [..] Of course, if you want to expose the representation for some other legitimate purpose, then it wasn't equality you were interested in, so you should call it something else. I'm perfectly happy calling it Equivalence. I'm perfectly happy having equivalences around, but if Eq means has an equivalence rather than has equality, then I'm not very happy about the use of the == sign. [..] That's a workable definition, but I don't know if I'd call it a sort, precisely. The standard unix tool tsort (for topological sort, a bit of a misnomer) does this. Will that do? Unfortunately, one can't just reuse the standard algorithms. Indeed. (Does anyone know a topological sort algorithm which behaves like an ordinary sort if you do give it a total order? Or a reason why there's no such thing?) So you're probably right that x = y \/ y = x should hold for the order relation used by library sort. That's not the axiom I was thinking of dropping when I said sort's type was too tight (it was you who brought up incomparability): I was thinking of dropping antisymmetry. If a sort can't support the standard sort on this key technique, and don't munge everything for two keys that compare equal, something is wrong. And I don't think sort is that special a case. I quite agree. That's why I'm suggesting we drop antisymmetry as a requirement for sorting. Instances, rather than explicit functions, are nice because they let us use the type system to ensure that we never have incompatible functions used when combining two data structures, or pass in a function that's incompatible with the invariants already in a data structure built with another function. I'm not sure what you mean here. So we surely do need an equivalence relation typeclass. And there are Eq instances that aren't quite equality, but are equivalences, and work with almost all code that takes Eq instances. My main concern is that we should know where we stand. I think it would be a very good thing if we knew what the semantic notion of equality should be for each type. What notion of equality should we use in discussion? What do we mean when we write laws like map f . map g = map (f . g) ? I write = to distinguish it from whatever Bool-valued function at some type or other that we might call ==. Given sneaky ways to observe memory pointers or fairly ordinary ways to expose representations which are supposed to be abstract, it's clearly impossible to ensure that = is absolutely always respected. It would be good if it was clear which operations were peculiar in this way. I'd like to know when I can reason just by replacing equals for equals, and when more care is required (eg, when ensuring that substitution respects alpha- equivalence). From the point of view of reasoning (informally or formally) it then becomes useful to know that some binary Bool-valued function is sound with respect to = and complete when one argument is defined and finite. It's useful to know that one is testing equality, rather than just some equivalence, because equality supports stronger reasoning principles. Equivalences are useful too, but harder to work with. I quite agree that we should support them, and that it is reasonable to do so via typeclasses: if a type supports multiple useful equivalences, then the usual newtype trick is a reasonable enough way to manage it. The only time treating equalities as equivalences won't work is when we need to coalesce equivalent elements into one representative, and the choice of representative matters. Another time when treating equalities just as equivalences won't do is when it's time to think about whether your program works. This issue is not just an operational one, although in the case of TypeRep, for example, it can get pretty near the bone. So, do we mark equivalencies as special, or observational
Re: Some clarity please! (was Re: [Haskell-cafe] Re: (flawed?) benchmark : sort)
Hi On 14 Mar 2008, at 03:48, Roman Leshchinskiy wrote: Adrian Hey wrote: I would ask for any correct Eq instance something like the law: (x==y) = True implies x=y (and vice-versa) which implies f x = f y for all definable f which implies (f x == f y) = True (for expression types which are instances of Eq). This pretty much requires structural equality for concrete types. For abstract types you can do something different provided functions which can give different answers for two equal arguments are not exposed. How do you propose something like this to be specified in the language definition? The report doesn't (and shouldn't) know about abstract types. Why not? Why shouldn't there be at least a standard convention, if not an abstype-like feature for establishing an abstraction barrier, and hence determine the appropriate observational equality for an abstract type? So you can either require your law to hold everywhere, which IMO isn't a good idea, or you don't require it to hold. From the language definition point of view, I don't see any middle ground here. Why not demand it in the definition, but allow unsafe leaks in practice? As usual. Why are you so determined that there's nothing principled to do here? People like to say Haskell's easy to reason about. How much of a lie would you like that not to be? Also, when you talk about definable functions, do you include things like I/O? What if I want to store things (such as a Set) on a disk? If the same abstract value can have multiple representations, do I have to convert them all to some canonical representation before writing them to a file? Canonical representations are not necessary for observational congruence. Representation hiding is enough. This might be rather slow and is, IMO, quite unnecessary. From a more philosophical points of view, I'd say that the appropriate concept of equality depends a lot on the problem domain. It's certainly true that different predicates may respect different equivalence relations. The equivalence relation you call equality should be the finest of those, with finer representational distinctions abstracted away. What that buys you is a class of predicates which are guaranteed to respect equality without further ado... Personally, I quite strongly disagree with restricting Eq instances in the way you propose. I have never found much use for strict structural equality (as opposed to domain-specific equality which may or may not coincide with the structural one). ...which is how we use equality when we think. I certainly don't think strict structural equality should be compulsory. In fact, for Haskell's lazy data structures, you rather need lazy structural simulation if you want to explain why cycle x = cycle xx What would be so wrong with establishing a convention for saying, at each given type (1) this is the propositional equivalence which we expect functions on this type to respect (2) here is an interface which respects that equivalence (3) here are some unsafe functions which break that equivalence: use them at your own risk ? Why is it pragmatically necessary to make reasoning difficult? I'm sure that wise folk out there have wise answers to that question which they don't consider to be an embarrassment. When representation-hiding is bliss, 'tis folly to be wise. All the best Conor ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Dan Weston wrote: 6.3.2 (The Ord Class): The Ord class is used for totally ordered datatypes. This *requires* that it be absolutely impossible in valid code to distinguish equivalent (in the EQ sense, not the == sense) things via the functions of Ord. The intended interpretation of these functions is clear and can be taken as normative: forall f . (compare x y == EQ and (f x or f y is defined)) == f x == f y) Thanks Dan. I didn't grasp the significance of this at first, but I believe you are correct. But maybe it should be = not == in the last line? forall f . (compare x y == EQ and (f x or f y is defined)) == f x = f y) So assuming your (and my) logic is correct, the existing report text does indeed settle the original dispute that sparked this thread. Essentially you can't have 2 distinct values that compare equal, so if they do then they must be indistinguishable? Is that right? So there is no need for the sort on a list of elements whose type is an instance of Ord to be stable as the difference between the results of a stable and unstable sort cannot be observable for any (correct) Ord instance (assuming the the instances compare method was used to perform the sort). So if we have a compare method on this type we can establish the == method: x == y = case compare x y of EQ - True _ - False and from this it follows that (x == y) = True implies x and y are indistingushable. So I believe for types that are instances of both Eq and Ord, this settles the question of what (x == y) = True implies. So now I'm wondering what about types that are instances of Eq but not of Ord? Well from para. 6.3.1 The Eq class provides equality (==) and inequality (/=) methods. Well I guess assuming that saying two values are equal is another way of saying they are indistinguishable then I think it's pretty clear what the report is saying. This interpretation also ensures consistency between Eq/Ord instances and Eq only instances. Assuming this is all correct, I think I can sleep easier now I can forget about all this things being equal and not equal at the same time craziness, at least for Eq/Ord instances that are compliant with the standard (which are the only ones I care about). I think anyone wanting standard classes with different mathematical properties should define them, stick them in Hackage and propose them for Haskell-prime (if that's still happening?) Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Some clarity please! (was Re: [Haskell-cafe] Re: (flawed?) benchmark : sort)
Conor McBride wrote: Hi On 14 Mar 2008, at 03:48, Roman Leshchinskiy wrote: Adrian Hey wrote: I would ask for any correct Eq instance something like the law: (x==y) = True implies x=y (and vice-versa) which implies f x = f y for all definable f which implies (f x == f y) = True (for expression types which are instances of Eq). This pretty much requires structural equality for concrete types. For abstract types you can do something different provided functions which can give different answers for two equal arguments are not exposed. How do you propose something like this to be specified in the language definition? The report doesn't (and shouldn't) know about abstract types. Why not? Why shouldn't there be at least a standard convention, if not an abstype-like feature for establishing an abstraction barrier, and hence determine the appropriate observational equality for an abstract type? Adrian's original question/proposal was about the language report. I'm only pointing out that all other considerations aside, it's not clear how to distinguish between the implementation part of the ADT and everything else in the report. So you can either require your law to hold everywhere, which IMO isn't a good idea, or you don't require it to hold. From the language definition point of view, I don't see any middle ground here. Why not demand it in the definition, but allow unsafe leaks in practice? As usual. Why are you so determined that there's nothing principled to do here? People like to say Haskell's easy to reason about. How much of a lie would you like that not to be? I'm not sure what you mean here. Should the report say something like a valid Eq instance must ensure that x == y implies f x == f y for all f? Probably not, since this requires structural equality which is not what you want for ADTs. Should it be for all f which are not part of the implementation of the type? That's a non-requirement if the report doesn't specify what the implementation is. So what should it say? Unsafe leaks are ok as long as they are rarely used. If you have to resort to unsafe leaks to define an ADT, then something is wrong. Also, when you talk about definable functions, do you include things like I/O? What if I want to store things (such as a Set) on a disk? If the same abstract value can have multiple representations, do I have to convert them all to some canonical representation before writing them to a file? Canonical representations are not necessary for observational congruence. Representation hiding is enough. I beg to disagree. If the representation is stored on the disk, for instance, then it becomes observable, even if it's still hidden in the sense that you can't do anything useful with it other than read it back. Actually, we don't even need the disk. What about ADTs which implement Storable, for instance? What would be so wrong with establishing a convention for saying, at each given type (1) this is the propositional equivalence which we expect functions on this type to respect (2) here is an interface which respects that equivalence (3) here are some unsafe functions which break that equivalence: use them at your own risk My (probably erroneous) understanding of the above is that you propose to call (==) propositional equivalence and to require that for every type, we define what that means. To be honest, I don't quite see how this is different from saying that the meaning of (==) should be documented for every type, which I wholeheartedly agree with. But the unsafe bit really doesn't make sense to me. As an example, consider the following data type: data Expr = Var String | Lam String Expr | App Expr Expr The most natural notion of equality here is probably equality up to alpha conversion and that's what I usually would expect (==) to mean. In fact, I would be quite surprised if (==) meant structural equality. Should I now consider the Show instance for this type somehow unsafe? I don't see why this makes sense. Most of the time I probably don't even want to make this type abstract. Are the constructors also unsafe? Why? To summarise my views on this: an Eq instance should define a meaningful equivalence relation and be documented. Requiring anything beyond that just doesn't make sense to me. Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Note that even if you wanted Eq to mean observational equality, you still can't perform that kind of reordering or 'sort' optimizations without running into trouble. for a not contrived at all example: data Id = Id { idIdent :: Int, idFreeVarCache :: [Id] } instance Eq Id where x == y = idIdent x == idIdent y now, this type represents an identifier in a language that is annotated with the free variables it contains. Note that the Eq instance really does declare observational equality here, the free var cache is only a copy of what is in the definition of the Id. now consider the id for the simple v1 = v1 all of the following are observationally the same x = Id 1 [x] x = Id 1 [Id 1 [x]] x = Id 1 [Id 1 [Id 1 [Id 1 ... now, this is just fine, there is no way for a program to tell the difference between them, but the difference is very important! the second wastes space and the third is an honest to goodness space leak. One has to rely on the fact Set.insert really replaces its element, max x y where x == y is always y and other such things to reasonably reason about the space usage of haskell programs, something that is hard enough as it is without basics like 'sort' trying to be clever. So, even if a == b always meant observational equality, specifying bias is still very important. Even if you document it as 'unspecified' that is fine (though it limits the use of said library), but it is part of the API. For the record I also always thought of 'Eq' as an arbitrary equality relationship and 'Ord' as a compatible total ordering. It is not even clear whether structural equality is meaningful for a lot of types, even though they might have a 'natural' equality relationship. John -- John Meacham - ⑆repetae.net⑆john⑈ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
[EMAIL PROTECTED] wrote: G'day all. Adrian Hey wrote: This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. Nonsense. Consider a Schwartzian transform wrapper: data OrdWrap k v = OrdWrap k v instance (Ord k) = Ord (OrdWrap k v) where compare (OrdWrap k1 v1) (OrdWrap k2 v2) = OrdWrap k1 k2 I take it you mean something like .. instance Ord k = Ord (OrdWrap k v) where compare (OrdWrap k1 v1) (OrdWrap k2 v2) = compare k1 k2 Where's the Eq instance for OrdWrap? This may or may not satisfy the law: (compare a b) = EQ implies (a == b) = True. I think everbody agrees about that, but I can't tell from the code you've posted if it does in this case. What's disputed is whether or not this law should hold: (a == b) = True implies a = b Again, I can't tell if it does or not in this case, but I assume the point of your post is that it doesn't. AFAICT the report is ambiguous about this, or at least the non-intutive equality semantics are not at all clear to me from what I can see in the Eq class definition (para 6.3.1). I think an the absence of any clear and *explicit* statement to the contrary people are entitled to assume this law is mandatory for all (correct) Eq instances. It would be incorrect (and not sane) for sort [a,b] to return [a,a] in this case, though a case could be made that either [a,b] or [b,a] make sense. Quoting Jules Bean [EMAIL PROTECTED]: Stability is a nice property. I don't understand why you are arguing against this so aggressiviely. Stability is an occasionally very useful property. However, if there is a tradeoff between stability and performance, I'd prefer it if the library didn't choose for me. Well I hope you or anyone else hasn't used Data.Map or with OrdWrap keys because if so it's likely that the code has either been broken in the past, or is broken now (not sure which). But the equality semantics some people seem to want seem to me like a very good way to guarantee that similar bugs and ambiguities will occur all over the place, now and forever. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Thu, Mar 13, 2008 at 1:00 AM, Adrian Hey [EMAIL PROTECTED] wrote: AFAICT the report is ambiguous about this, or at least the non-intutive equality semantics are not at all clear to me from what I can see in the Eq class definition (para 6.3.1). I think an the absence of any clear and *explicit* statement to the contrary people are entitled to assume this law is mandatory for all (correct) Eq instances. In mathematics we usually *don't* assume things that aren't stated assumptions. Luke ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Thu, Mar 13, 2008 at 3:02 AM, Adrian Hey [EMAIL PROTECTED] wrote: The report doesn't state that for all Ints, (x==y = True) implies that x=y. There's no reason to suppose the Int instance is in any way special, so do you really seriously consider the possibility that this might not hold in your Int related code? if (x==y) then f x else g x y might not mean the same as.. if (x==y) then f y else g x y Of course not :-). However, on what grounds am I to assume that these two will be semantically equivalent for instances other than Int? Int *is* special insofar as its implementation of Eq differs from that of other types (of course, all other instances of Eq are special then, too). So it's reasonable that == means observational equivalence for Int but not for other types, since it's possible to implement them that way and there is no (explicitly stated) law which requires it. But I agree that Eq should have some laws, just maybe not observational equivalence because it is very limiting from a user's perspective (that is, if I have a data type, requiring observational equivalence makes it much less likely that I will be able to write an instance of Eq, even if it makes sense in some stretch of the imagination). Saying that it's reasonable for everyone, everywhere to assume that Eq means what you want it to mean is a stretch. I believe every for function I've written which was polymorphic in Eq it would have sufficed for Eq to be any equivalence relation. What reason do I have to restrict the users of my functions to those who can implement observational equivalence? But I'm just blabbering. Here's my position on the issue with an argument for why I think it's a good one: Eq should be allowed to be any equivalence relation, because there are many data types for which it is impossible to satisfy the constraint of observational equivalence, thus reducing the usefulness of data structures written over types with Eq. On the other hand, (and this is anecdotal), no data structures have been unable to cope with Eq not implying observational equivalence. Here's another argument. Since Eq has no stated laws in the report, give Eq no assumptions*, and allow the community to create an empty subclass of Eq (ObsEq, say) which documents the necessary laws. Then a data structure which relies on the observational equivalence property can specify it explicitly. But really the thing that makes me choose this position is that it sucks not to be able to use someone's code only because it is impossible to satisfy instance laws, even though the code would be perfectly reasonable anyway (though it isn't a strong argument, consider the case of the broken ListT, still, it's enough to convince me for the time being). * No assumptions at all would be strange, but also okay with me, as long as functions which rely on Eq specify that they need it to conform to certain laws. But I consider equivalence relation reasonable because (1) everyone here seems to be on common ground that it should *at least* be that, and (2) all the prelude functions on Eq assume that (and note that none assume obs. eq.). Indeed, group is almost meaningless if Eq imples obs. eq. Luke ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Some clarity please! (was Re: [Haskell-cafe] Re: (flawed?) benchmark : sort)
Hello All, I'm top posting because I'm getting bored and frustrated with this thread and I don't want to respond detail to everything Aaron has said below. Aaron: Where are you getting this equivalence stuff from? Searching the report for the word equivalence the only remotely relevant section seems to be in para. 17.6.. When the “By” function replaces an Eq context by a binary predicate, the predicate is assumed to define an equivalence Which is fair enough, but this is talking about the argument of By functions. The Haskell wiki refers me to wikipedia, which contains the words In Haskell, a class Eq intended to contain types that admit equality would be declared in the following way http://en.wikipedia.org/wiki/Type_class Not that this is necessarily authoritative, but it seems to be contaradicting some peoples interpretation. Also, on page 60 of the report I find the words Even though the equality is taken at the list type.. So I don't know if all this is really is the correct reading of the report, but if so would like to appeal to movers and shakers in the language definition to please decide exactly what the proper interpretation of standard Eq and Ord class laws are and in the next version of the report give an explanation of these in plain English using terms that people without a Mathematics degree are likely to understand. Aaron's interpretation may indeed be very correct and precise, but I fear in reality this is just going to be incomprehensible to many programmers and a terrible source of bugs in real world code. I cite previous left biasing bugs Data.Map as evidence. I would ask for any correct Eq instance something like the law: (x==y) = True implies x=y (and vice-versa) which implies f x = f y for all definable f which implies (f x == f y) = True (for expression types which are instances of Eq). This pretty much requires structural equality for concrete types. For abstract types you can do something different provided functions which can give different answers for two equal arguments are not exposed. Anything else is just wrong (according to the language specification, even if it can be right in some mathematical sense). Before anyone jumps down my throat, I remind you that this is a request, not an assertion! :-) On the subject of ambiguity, bugs and maxima, I see we have in Data.List -- | 'maximum' returns the maximum value from a list, -- which must be non-empty, finite, and of an ordered type. -- It is a special case of 'Data.List.maximumBy', which allows the -- programmer to supply their own comparison function. maximum :: (Ord a) = [a] - a maximum [] = errorEmptyList maximum maximum xs = foldl1 max xs -- | The 'maximumBy' function takes a comparison function and a list -- and returns the greatest element of the list by the comparison function. -- The list must be finite and non-empty. maximumBy :: (a - a - Ordering) - [a] - a maximumBy _ [] = error List.maximumBy: empty list maximumBy cmp xs= foldl1 max xs where max x y = case cmp x y of GT - x _ - y So I believe I'm correct in saying that maximumBy returns the last of several possible maximum elements of the list. This obviously needs specifying in the Haddock. Because maximumBy documentation is ambiguous in this respect, so is the overloaded maximum documentation. At least you can't figure it out from the Haddock. Despite this ambiguity, the statement that maximum is a special case of maximumBy is true *provided* max in the Ord instance is defined the way Aaron says is should be: (x==y = True) implies max x y = y. But it could be be made unconditionally true using.. maximum :: Ord a = [a] - a maximum [] = error List.maximum: empty list maximum xs = maximumBy compare xs AFAICT, the original report authors either did not perceive an ambiguity in maximum, or just failed to notice and resolve it. If there is no ambiguity this could be for 2 reasons. 1 - It doesn't matter which maximum is returned because: (x==y) = True implies x=y 2 - It does matter, and the result is guaranteed to be the last maximum in all cases because: (x==y) = True implies max x y = y But without either of the above, it is unsafe to assume maximum = maximumBy compare Regarding the alleged max law this too is not mentioned in the Haddock for the Ord class, nor is it a law AFAICT from reading the report. The report (page 83) just says that the default methods are reasonable, but presumably not manditory in any semantic sense. This interpretation also seems to be the intent of this from the second para. of Chapter 8: The default method definitions, given with class declarations, constitute a specification only of the default method. They do not constitute a specification of the meaning of the method in all instances. I
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Luke Palmer wrote: On Thu, Mar 13, 2008 at 1:00 AM, Adrian Hey [EMAIL PROTECTED] wrote: AFAICT the report is ambiguous about this, or at least the non-intutive equality semantics are not at all clear to me from what I can see in the Eq class definition (para 6.3.1). I think an the absence of any clear and *explicit* statement to the contrary people are entitled to assume this law is mandatory for all (correct) Eq instances. In mathematics we usually *don't* assume things that aren't stated assumptions. But the trouble is the report says practically *nothing* about Eq class or what the (==) operator means. It all seems to be assumed, and even when it does talk about it informally it talks about equality, not equivalence or some other word. The report doesn't state that for all Ints, (x==y = True) implies that x=y. There's no reason to suppose the Int instance is in any way special, so do you really seriously consider the possibility that this might not hold in your Int related code? if (x==y) then f x else g x y might not mean the same as.. if (x==y) then f y else g x y Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Luke Palmer wrote: On Thu, Mar 13, 2008 at 3:02 AM, Adrian Hey [EMAIL PROTECTED] wrote: The report doesn't state that for all Ints, (x==y = True) implies that x=y. There's no reason to suppose the Int instance is in any way special, so do you really seriously consider the possibility that this might not hold in your Int related code? if (x==y) then f x else g x y might not mean the same as.. if (x==y) then f y else g x y Of course not :-). However, on what grounds am I to assume that these two will be semantically equivalent for instances other than Int? Umm..Maybe the fact that you're using the == method from the Eq class, not some Int specific isIntEqual function? :-) Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Aaron Denney wrote: so do you really seriously consider the possibility that this might not hold in your Int related code? if (x==y) then f x else g x y might not mean the same as.. if (x==y) then f y else g x y In Int code, of course not, because I know the types, and I know the behaviour of (==) on Ints. But f is specialized to work on Ints, isn't it, so it's reasonable to know what semantics (==) has for Ints, and depend on them? Why are Ints special in this way? Couldn't you use say exacly the same about any type (just substitute type X of your choice for Int) IMO if your going to define a type X which is intended to be an Eq instance you should always ensure, one way or another that all exposed primitives that operate on that type respect equality, as defined by == for the instance method. (And hence more complex functions built on those primitives do too). Just MO, the report doesn't make this clear 1 way or another AFAICS. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Some clarity please! (was Re: [Haskell-cafe] Re: (flawed?) benchmark : sort)
Hi folks I'm late into this thread, so apologies if I'm being dim. On 13 Mar 2008, at 16:17, [EMAIL PROTECTED] wrote: Adrian Hey [EMAIL PROTECTED] wrote: I would ask for any correct Eq instance something like the law: (x==y) = True implies x=y (and vice-versa) I wish I knew what = meant here. Did somebody say? I don't think it's totally obvious what equational propositions should mean. Nor do I think it's desirable to consider only those propositions expressible in QuickCheck. If = is reflexive and distinguishes undefined from True, then x = y implies (x == y) = True will be tricky to satisfy for quite a lot of types. What about undefined == undefined or repeat 'x' == repeat 'x' ? For some suitable (slightly subtle) definition of finite, you might manage x finite and x = y implies (x == y) = True One rather intensional definition of x = y might be x and y have a common n-step reduct with respect to some suitable operational semantics. I don't think this is what Adrian had in mind, but it certainly falls foul of Wolfram's objection. The easiest counterexample are sets (or finite maps) provided as an abstract data type (i.e., exported without access to the implementation, in particular constructors). Different kinds of balanced trees typically do not produce the same representation for the same set constructed by different construction expressions. This suggests that we should seek to define x = y on a type by type basis, to mean x and y support the same observations, for some suitable notion of observation, which might depend crucially on what operations are exported from the (notional or actual) module which defines the type. If so, it's clearly crucial to allow some observations which rely on waiting for ever, in order to avoid _|_-induced collapse. Something of the sort should allow this... Therefore, (==) on sets will be expected to produce equality of sets, which will only be an equivalence on set representations. ...but this... which implies (f x == f y) = True (for expression types which are instances of Eq). This specifies that (==) is a congurence for f, and is in my opinion the right specification: (==) should be a congurence on all datatypes with respect to all purely defineable functions. ...is more troublesome. Take f = repeat. Define f = f. I'd hope x == y = True would give us x = y, and that x == y would be defined if at least one of x and y is finite. That implies f x = f y, which should guarantee that f x == f y is not False. But at least nowadays people occasionally do export functions that allow access to representation details, [..] I consider this as an argument to remove showTree from the interface of Data.Set, and to either specify toList to produce an ordered list (replacing toAscList), or to remove it from the interface as well. Perhaps that's a little extreme but I agree with the sentiment. How about designating such abstraction- breaking functions nosy, in the same way that functions which might break purity are unsafe. (mapMonotonic should of course be removed, too, or specified to fail (preferably in some MonadZero) if the precondition is violated, which should still be implementable in linear time.) What's wrong with mapMonotonic that isn't wrong with, say, sortBy?, or Eq instances for parametrized types? but if so would like to appeal to movers and shakers in the language definition to please decide exactly what the proper interpretation of standard Eq and Ord class laws are and in the next version of the report give an explanation of these Strongly seconded, inserting ``precise'' before ``explanation'' ;-) (And I'd expect equivalences and congruences to be accessible on the basis of standard first-year math...) Before we can talk about what == should return, can we settle what we mean by = ? I think we need to pragmatic about breaking the rules, given suitable documentation and maybe warnings. We should at least aspire to some principles, which means we should try to know what we're talking about and to know what we're doing, even if we don't always do what we're talking about. I'll shut up now. Potatoes to peel Conor ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Wed, Mar 12, 2008 at 4:29 PM, Aaron Denney [EMAIL PROTECTED] wrote: When defining max, yes, you must take care to make sure it useable for cases when Eq is an equivalence relation, rather than equality. If you're writing library code, then it won't generally know whether Eq means true equality rather than equivalence. If this would let you optimize things, you need some other way to communicate this. The common typeclasses are for generic, parameterizable polymorphism. Equivalence is a more generally useful notion than equality, so that's what I want captured by the Eq typeclass. I agree that equivalence is more general than equality, but I think equality makes more sense for Eq. Unfortunately, my reasons are mostly circumstantial. (1) You get at most one instance of Eq per type, and you get at most one equality relation per type. On the other hand, you have at least one equivalence (trivial equivalence) and will usually have several. Type classes don't work well when you have more than one of something per type (consider monoids). (2) Libraries like Data.Set and the Edison have to go through a lot of hoops because they can't assume that an Eq tests equality. (The Edison API, in particular, has to create a distinction between observable and non-observable collections, in order to support, e.g., a bag that doesn't store multiple copies of equivalent elements.) (3) Eq uses (==), which is commonly known as the equality sign, not the equivalence sign. (4) The Prelude already provides alternative functions that support any equivalence (sortBy, nubBy, etc.). If I were creating Haskell right now, I'd use Eq for equality and provide an additional class for equivalences along these lines: data P r class Equivalence r where type EqOver r :: * equiv :: P r - EqOver r - EqOver r - Bool data Equality a instance (Eq a) = Equivalence (Equality a) where type EqOver (Equality a) = a equiv _ = (==) data Trivial a instance Equivalence (Trivial a) where type EqOver (Trivial a) = a equiv _ _ _ = True Similar tricks can be used for orderings. -- Dave Menendez [EMAIL PROTECTED] http://www.eyrie.org/~zednenem/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
G'day all. Quoting Adrian Hey [EMAIL PROTECTED]: I take it you mean something like .. Err... yes, I did. Where's the Eq instance for OrdWrap? Omitted for brevity. This may or may not satisfy the law: (compare a b) = EQ implies (a == b) = True. I think everbody agrees about that, but I can't tell from the code you've posted if it does in this case. The default implementation of compare says that. One thing that's not explicitly stated in the report is whether or not the instances of typeclasses like Eq or Ord need to do the same thing as[*] the default implementations. Does x /= y do the same thing as not (x == y)? What's disputed is whether or not this law should hold: (a == b) = True implies a = b Apart from possibly your good self, I don't think this is disputed. Quotient types, as noted elsewhere in this thread, are very useful. Their common use predates Miranda, so it's way too late to unbless them now. Well I hope you or anyone else hasn't used Data.Map or with OrdWrap keys because if so it's likely that the code has either been broken in the past, or is broken now (not sure which). For Data.Map, using an OrdWrap-like wrapper for keys is wrong, because it's not necessary. OrdWrap is for situations where you need to associate a value with a key which is, unsurprisingly, what Data.Map also does. As for sort, the behaviour hasn't been broken at any point in the past or present that I'm aware of, and a lot of people would strongly resist it if it ever were proposed that it be broken. Cheers, Andrew Bromage [*] Do the same thing as here means that they mean the same thing, but allows for the possibility that some implementation may be less stack-consuming, lazier or in some way more efficient than its default. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi On 13 Mar 2008, at 22:28, [EMAIL PROTECTED] wrote: G'day all. Quoting Adrian Hey [EMAIL PROTECTED]: What's disputed is whether or not this law should hold: (a == b) = True implies a = b Apart from possibly your good self, I don't think this is disputed. Quotient types, as noted elsewhere in this thread, are very useful. For a suitable notion of = on quotients, and with a suitable abstraction barrier at least morally in place, is that really too much to ask? Their common use predates Miranda, so it's way too late to unbless them now. How depressing! Untyped programming also predates Miranda. We can always aspire for better. It's not that we need to get rid of Quotients: it's just that we need to manage information hiding properly, which is perhaps not such a tall order. Meanwhile, the sort/Ord/OrdWrap issue may be a storm in a different teacup: the type of sort is too tight. Ord (total ordering) is way too strong a requirement for sorting. Antisymmetry isn't needed for sorting and isn't possessed by OrdWrap. A bit more structure for order-related classes would surely help here. Isn't there room for hope? All the best Conor ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
[EMAIL PROTECTED] wrote: What's disputed is whether or not this law should hold: (a == b) = True implies a = b Apart from possibly your good self, I don't think this is disputed. If that's supposed it imply you think I'm in a minority of one I don't think you've been following this thread very well. Even the report uses the word equality in the prose. And as I pointed out in another post, even the standard library maximum function appears to ambiguous if the law doesn't hold. It can be disambiguated if Aarons max law holds: (a == b) = True implies max x y = y But this is only true for the *default* max implementation. One of the few explicit things the report does say on these matters is that the default methods should *not* be regarded as definitive. Besides there are good pragmatic safety and performance reasons why Haskell should provide at least one class that offers strong guarantees regarding equality and the (==) operator. If that class isn't Eq, then where is it? The (==) law holds for: 1- All standard Eq instances 2- All wholly derived Eq instances 3- Most hand defined instances I suspect. ..and has almost certainly been implicitly assumed by heaven knows what extant code (some of it in the standard libraries I suspect). So I think that we should recognise that this was the original intent for the Eq class and this should be made official, albeit retrospectively. If there's a need for a similar class where the (==) law doesn't hold that's fine. But please don't insist that class must be Eq. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
G'day all. Quoting Conor McBride [EMAIL PROTECTED]: How depressing! Sorry, I don't understand that. Quotient types are good, but they're not the whole story. They just happen to be one use case with a solid history behind them. it's just that we need to manage information hiding properly, which is perhaps not such a tall order. It's my opinion (and I know I'm not alone in this) that modularity is probably the one thing that Haskell really hasn't (yet) gotten right. Haskell's implementation of modules/namespaces/whatever is the bare minimum needed to be minimally useful. It's a shame, because abstraction, in Haskell, is extremely cheap. It's often only one line, and you've got a compiler-checked abstraction that can't be accidentally confused with its representation. This should encourage micro-abstractions everywhere, but without submodules, namespaces or whatever you want to call them, these abstractions are easy to break (on purpose, not by accident). If only you could add a couple more lines of code, and instantly have your abstraction unbreakable. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi On 13 Mar 2008, at 23:33, Aaron Denney wrote: On 2008-03-13, Conor McBride [EMAIL PROTECTED] wrote: Hi On 13 Mar 2008, at 22:28, [EMAIL PROTECTED] wrote: G'day all. Quoting Adrian Hey [EMAIL PROTECTED]: What's disputed is whether or not this law should hold: (a == b) = True implies a = b Apart from possibly your good self, I don't think this is disputed. Quotient types, as noted elsewhere in this thread, are very useful. For a suitable notion of = on quotients, and with a suitable abstraction barrier at least morally in place, is that really too much to ask? I really think it is. I don't think the case of equivalent for this purpose, but not that purpose can be ignored. Sure. But use the right tools for the job. Now, it may be the case that fooBy functions are then the right thing, but it's not clear to me at all that this is true. And if the fooBy option works, then why would the foo option fail for equivalence classes? It seems reasonable to construct quotients from arbitrary equivalences: if fooBy works for the carrier, foo should work for the quotient. Of course, if you want to expose the representation for some other legitimate purpose, then it wasn't equality you were interested in, so you should call it something else. A bit more structure for order-related classes would surely help here. Say what? Don't panic! If I don't have a total ordering, then it's possible two elements are incomparable Quite so. -- what should a sort algorithm do in such a situation? Not care. Produce a resulting list where for any [..., x, ..., y, ...] in the result, y = x implies x = y. Vacuously satisfied in the case of incomparable elements. In the case of a total order, that gives you y = x implies x = y (and everything in between), but for a preorder, you put less in, you get less out. Will that do? Conor ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
G'day all. Quoting Adrian Hey [EMAIL PROTECTED]: If that's supposed it imply you think I'm in a minority of one I don't think you've been following this thread very well. Sorry, that was a bit of hyperbole. Even the report uses the word equality in the prose. Indeed, and the only sensible meaning of equality that I can think of is _semantic_ equality. Two values are semantically equal if they mean the same thing. A concrete example of a quotient type that I had in mind is rationals. A rational is implemented as, for the sake of argument, a pair of integers. Two rational numbers, a/b and c/d, are equal iff ad = bc. That's what everyone means by equality for rationals. It's true that rationals have a normal form, and this can be enforced by a smart constructor and an unbreakable abstraction. But if you've got an unbreakable abstraction, then you've also got the mechanism to enforce observational equality. Moreover, not all quotient types have a one true normal form (e.g. regular expressions), and even in cases where there is a sensible normal form, it might be undesirable for reasons of performance or convenience. Besides there are good pragmatic safety and performance reasons why Haskell should provide at least one class that offers strong guarantees regarding equality and the (==) operator. Well, I haven't heard any reasons that have convinced me yet. No arguing over taste, of course. ..and has almost certainly been implicitly assumed by heaven knows what extant code (some of it in the standard libraries I suspect). Nobody has yet gone to the trouble of consulting either heaven or the source code (in whatever order is deemed appropriate) to see if this claim is true. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Some clarity please! (was Re: [Haskell-cafe] Re: (flawed?) benchmark : sort)
Hi On 13 Mar 2008, at 23:42, [EMAIL PROTECTED] wrote: Conor McBride [EMAIL PROTECTED] responded to my comment: (mapMonotonic should of course be removed, too, or specified to fail (preferably in some MonadZero) if the precondition is violated, which should still be implementable in linear time.) What's wrong with mapMonotonic that isn't wrong with, say, sortBy?, or Eq instances for parametrized types? Prelude :m + Data.Set Prelude Data.Set toAscList $ mapMonotonic (10 -) (fromList [1 .. 5]) [9,8,7,6,5] Prelude Data.Set 5 `member` mapMonotonic (10 -) (fromList [1 .. 5]) False Something's certainly wrong there! But nothing out of the ordinary: garbage in, garbage out. Happens all the time, even in Haskell. Why pick on mapMonotonic? Prelude Data.List sortBy (\_ _ - GT) [1,3,2,5,4] [4,5,2,3,1] Prelude Data.List sortBy (\_ _ - GT) [4,5,3,2,1] [1,2,3,5,4] I guess there's a question of what we might call toxic waste---junk values other than undefined. I think undefined is bad enough already. So the type system can't express the spec. I don't think we should be casual about that: we should be precise in documentation about the obligations which fall on the programmer. Some dirt is pragmatically necessary: we shouldn't pretend that it ain't so; we shouldn't pretend that dirt is clean. Before we can talk about what == should return, can we settle what we mean by = ? ``='' is not in the Haskell interface! ;-) No, but is is in the human interface! Therefore, I talked only about (==). Ah, but you talked about things. Which things? Is one of the things you talked about the same as (==)? the same as (flip (==))? The best way to include ``='' seems to be the semantic equality of P-logic [Harrison-Kieburtz-2005], which is quite a heavy calibre, and at least in that paper, classes are not yet included. I expect it's hard work. It's hard work in much better behaved systems. My point is that it's worth it, in order to facilitate more meaningful discussions. All the best Conor ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Thursday 13 March 2008 07:33:12 pm Aaron Denney wrote: [snip] I've seen mention of difficulties with Data.Map, and edison, but not in enough detail to really grasp what the problems are. Until I do, my natural bias (which I'm trying to resist, really) is that it's a matter of lazy coding, not any inherent difficulty. For the specific case of Edison, the Haddock documentation for the following two modules tells the whole sordid story: http://hackage.haskell.org/packages/archive/EdisonAPI/1.2.1/doc/html/Data-Edison.html http://hackage.haskell.org/packages/archive/EdisonAPI/1.2.1/doc/html/Data-Edison-Coll.html The Cliff Notes version is that Edison assumes the following things about Eq and Ord instances: * An Eq instance correctly defines an equivalence relation (but not necessarily structural equality); that is, we assume (==) (considered as a relation) is reflexive, symmetric and transitive, but allow that equivalent items may be distinguishable by other means. * An Ord instance correctly defines a total order which is consistent with the Eq instance for that type. It's not explicitly stated, but Edison also assumes that the operations within a class are consistent, i.e., that (not (x == y)) calculates the same function as (x /= y), etc. I suppose that should go in the docs too. Edison makes no particular assumptions about min and max, except that they are consistent with the defined order. Anyway, the end result for Edison is that some operations aren't well-defined, and can't be made well-defined without restrictions. For example, consider the operation of folding in non-decreasing order over the elements of a multi-set. If the function being folded distinguishes between two elements x and y, but (compare x y) = EQ, and x and y are both contained in the multi-set, then the result of the fold depends on internal state that is not supposed to be user-visible (e.g., the exact shape of a balanced tree). Blah, blah, blah, its all in the documentation. The point is that making loose assumptions about the meaning of the operations provided by Eq and Ord complicates things in ways that can't be made to go away. Rob Dockins ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: Some clarity please! (was Re: [Haskell-cafe] Re: (flawed?) benchmark : sort)
Adrian Hey wrote: I would ask for any correct Eq instance something like the law: (x==y) = True implies x=y (and vice-versa) which implies f x = f y for all definable f which implies (f x == f y) = True (for expression types which are instances of Eq). This pretty much requires structural equality for concrete types. For abstract types you can do something different provided functions which can give different answers for two equal arguments are not exposed. How do you propose something like this to be specified in the language definition? The report doesn't (and shouldn't) know about abstract types. It only talks about things which are exported and things which are not. The distinction between implementation modules and client modules is made by the programmer, not by the language. So you can either require your law to hold everywhere, which IMO isn't a good idea, or you don't require it to hold. From the language definition point of view, I don't see any middle ground here. Also, when you talk about definable functions, do you include things like I/O? What if I want to store things (such as a Set) on a disk? If the same abstract value can have multiple representations, do I have to convert them all to some canonical representation before writing them to a file? This might be rather slow and is, IMO, quite unnecessary. From a more philosophical points of view, I'd say that the appropriate concept of equality depends a lot on the problem domain. Personally, I quite strongly disagree with restricting Eq instances in the way you propose. I have never found much use for strict structural equality (as opposed to domain-specific equality which may or may not coincide with the structural one). On the subject of ambiguity, bugs and maxima, I see we have in Data.List [...] So I believe I'm correct in saying that maximumBy returns the last of several possible maximum elements of the list. This obviously needs specifying in the Haddock. Because maximumBy documentation is ambiguous in this respect, so is the overloaded maximum documentation. At least you can't figure it out from the Haddock. Why not simply say that maximumBy returns some maximum element from the list but it's not specified which one? That's how I always understood the spec. Code which needs a particular maximum element can't use maximumBy but such code is rare. I don't see how this is ambiguous, it just leaves certain things unspecified which is perfectly ok. Roman ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Tue, Mar 11, 2008 at 4:01 AM, Adrian Hey [EMAIL PROTECTED] wrote: and sorting is meant to be a permutation, so we happily have the situation where this has a correct answer: 2. Anything else is incorrect. Isn't 3 also a permutation? Why is it incorrect? Because it is not stable. The documentation for Data.List.sort says the sort is stable: The sort function implements a stable sorting algorithm. A stable sort respects the order of equal elements as they occur in the input list. -- Denis ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Denis Bueno wrote: On Tue, Mar 11, 2008 at 4:01 AM, Adrian Hey [EMAIL PROTECTED] wrote: and sorting is meant to be a permutation, so we happily have the situation where this has a correct answer: 2. Anything else is incorrect. Isn't 3 also a permutation? Why is it incorrect? Because it is not stable. The documentation for Data.List.sort says the sort is stable: The sort function implements a stable sorting algorithm. A stable sort respects the order of equal elements as they occur in the input list. This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. So really I think the docs have this backwards. It's sortBy that implements a stable sort (assuming a suitably sane comparison function I guess) and apparently sort is whatever you get from (sortBy compare). But this is unduly restrictive on possible correct sort implementations IMO. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Speaking as someone who often has to answer questions along the lines of why isn't my code generating the results I want on your system?, I wouldn't call it evil, just commonly mistaken for Real. NaN breaks most assumptions about ordering: (NaN = _) == false (NaN == _) == false (NaN = _) == false It doesn't even compare equal to a bitwise copy of itself. This would imply that it's impossible to write a stable sort for (IEEE) floating-point types. x (x+n) (and variations thereof) does not always hold either. - Derek Brandon S. Allbery KF8NH wrote: On Mar 11, 2008, at 0:20 , Chaddaï Fouché wrote: 2008/3/11, David Menendez [EMAIL PROTECTED]: I think Adrian is just arguing that a == b should imply f a == f b, for all definable f, in which case it doesn't *matter* which of two equal elements you choose, because there's no semantic difference. I completely agree that this propriety should be true for all Eq instance exported by a public module. I don't care if it is not the case in a isolated code, but libraries shouldn't break expected invariant (or at least be very cautious and warn the user). Even Eq Double respects this propriety as far as I know. I wouldn't want to bet on that (Eq Double, that is). Floating point's just *evil*. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Tue, Mar 11, 2008 at 01:43:36AM -0400, Brandon S. Allbery KF8NH wrote: On Mar 11, 2008, at 0:20 , Chaddaï Fouché wrote: 2008/3/11, David Menendez [EMAIL PROTECTED]: I think Adrian is just arguing that a == b should imply f a == f b, for all definable f, in which case it doesn't *matter* which of two equal elements you choose, because there's no semantic difference. I completely agree that this propriety should be true for all Eq instance exported by a public module. I don't care if it is not the case in a isolated code, but libraries shouldn't break expected invariant (or at least be very cautious and warn the user). Even Eq Double respects this propriety as far as I know. I wouldn't want to bet on that (Eq Double, that is). Floating point's just *evil*. I wouldn't bet on it either: Prelude 0.0 == -0.0 True Prelude isNegativeZero 0.0 == isNegativeZero (-0.0) False Although isNegativeZero might be considered a ``private, internal interface that exposes implementation details.'' Groeten, Remi ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Adrian Hey [EMAIL PROTECTED] writes: So really I think the docs have this backwards. It's sortBy that implements a stable sort (assuming a suitably sane comparison function I guess) and apparently sort is whatever you get from (sortBy compare). But this is unduly restrictive on possible correct sort implementations IMO. Somebody (maybe you?) suggested that 'sort' should be a function in class Ord, giving the implementer freedom to decide exactly what is optimal for that particular data type. Could this also be used to solve the Data.Map issue? I mean, could Data.Map.insert use 'sort' instead of 'compare' to place new elements? For types where there is no observable difference between EQ elements (which you know when you instantiate Ord for the type), 'sort [a,b]' could return [a,a] when a == b, saving you space. For types with observably different but EQual values (like Neil's (Foo Int (Int-Int))), you would need to fall back to the old behavior. Just wondering. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Adrian Hey wrote: Denis Bueno wrote: On Tue, Mar 11, 2008 at 4:01 AM, Adrian Hey [EMAIL PROTECTED] wrote: and sorting is meant to be a permutation, so we happily have the situation where this has a correct answer: 2. Anything else is incorrect. Isn't 3 also a permutation? Why is it incorrect? Because it is not stable. The documentation for Data.List.sort says the sort is stable: The sort function implements a stable sorting algorithm. A stable sort respects the order of equal elements as they occur in the input list. This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. The fact that you can't implement a function to differentiation does not meant the difference is not important. [b,a] might cause 500G of file IO which [a,b] will not cause. This is not observable within haskell, but is very observable to the user. Stability is a nice property. I don't understand why you are arguing against this so aggressiviely. Jules ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Derek Gladding wrote: Speaking as someone who often has to answer questions along the lines of why isn't my code generating the results I want on your system?, I wouldn't call it evil, just commonly mistaken for Real. Yes, of course. Double is an excellent example since it indicates that there exist types for which Ord (and Eq) instances are not quite sensible, but nonetheless we want them to exist because it is a real pain if they don't. (Or at least, we definitely want Ord. It's easier make the argument that we don't really want Eq) Jules ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Ketil Malde wrote: Adrian Hey [EMAIL PROTECTED] writes: So really I think the docs have this backwards. It's sortBy that implements a stable sort (assuming a suitably sane comparison function I guess) and apparently sort is whatever you get from (sortBy compare). But this is unduly restrictive on possible correct sort implementations IMO. Somebody (maybe you?) suggested that 'sort' should be a function in class Ord, giving the implementer freedom to decide exactly what is optimal for that particular data type. Could this also be used to solve the Data.Map issue? I mean, could Data.Map.insert use 'sort' instead of 'compare' to place new elements? I don't really think so. To be consistent people would have to do this all over the place, not just in Data.Map/Set. Anywhere where you have code like this (for Ord instances) if (x==y) then f x -- f y should be just as good! else g x y you'd now have to have something like.. if (x==y) then f (head (sort ([x,y])) else g x y Also, since the problem is with the concept of equality, in that we're now admitting that things can be equal but also not equal at the same time then choice should really be a method of the Eq class.. Something like.. -- Returns Nothing if args are not equal -- Just p if args are equal, where p is the prefered equal value chose :: Eq a = a - a - Maybe a Like I said, this way lies madness!! Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
I agree, I view == as some kind of equivalence relation in Haskell, and not a congruence relation (which would force x==y = f x == f y). Of course, the Haskell type system isn't strong enough to enforce anything more than it being a function returning a boolean. -- Lennart On Wed, Mar 12, 2008 at 12:55 AM, Aaron Denney [EMAIL PROTECTED] wrote: On 2008-03-11, Adrian Hey [EMAIL PROTECTED] wrote: Neil Mitchell wrote: Hi (sort [a,b]) in the case we have: (compare a b = EQ) Which of the following 4 possible results are correct/incorrect? 1- [a,a] 2- [a,b] 3- [b,a] 4- [b,b] Fortunately the Haskell sort is meant to be stable, I would have said it is meant to be *correct* first and *efficient* second. You're ruling out a whole bunch of possibly more efficient and correct sorts on the grounds that they may give observably different results for a tiny minority of (IMO broken) Eq/Ord instances. It's exactly your opinion that these are broken that we're arguing about. My view is that they are just equivalence and ordering relations, not complete equality relations. Using sortBy, or wrapping in a newtype with a different ordering instance and then using sort should be equivalent. Wrt to *sortBy* (vs. *sort*), I would be inclined to agree with you. I sure hope someone has proven that the (apparently) different sortBy implementations provided by ghc,nhc and h98 library report are precisely equivalent for all (type legal) function arguments. and sorting is meant to be a permutation, so we happily have the situation where this has a correct answer: 2. Anything else is incorrect. Isn't 3 also a permutation? Why is it incorrect? Stability -- see Fortunately the Haskell sort is meant to be stable, above. Adrian: I think its fairly clear we disagree about these things. However, we both understand the others point of view, so I guess its just a question of opinion - and I doubt either of us will change. As such I think any further discussion may just lead to sleep deprivation [1]. I think I'm coming from a more discrete maths/theoretical background while you are coming from a more practical/pragmatist background. If the discrete maths/theoretical POV necessatates to the kind of biasing madness of Data.Map/Set (for example) then it *sucks* bigtime IMO :-) Having tried this approach myself too (with the clone) I can confirm that *this way lies madness*, so in future I will not be making any effort to define or respect sane, unambiguous and stable behaviour for insane Eq/Ord instances for any lib I produce or hack on. Instead I will be aiming for correctness and optimal efficiency on the assumption that Eq and Ord instances are sensible. Good. But sensible only means that the Eq and Ord instances agree, not that x == y = f x == f y. -- Aaron Denney -- ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Jules Bean wrote: Adrian Hey wrote: This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. The fact that you can't implement a function to differentiation does not meant the difference is not important. [b,a] might cause 500G of file IO which [a,b] will not cause. I can't imagine why, unless this is some weird side effect of lazy IO (which I thought was generally agreed to be a bad thing). What if it's the [a,b] ordering that causes this but the [b,a] ordering that doesn't? The choice is arbitrary (for sort), but neither is obviously correct. This is not observable within haskell, but is very observable to the user. Yes, there are plenty of things like space and time behaviour that are not observable in the semantic sense, but have real obvervable consequenses in the practical sense of real executables running on real machines. But constraints like this and Data.Set/Map left biasing often mean that implementations have to be made unnecessarily time and space *inefficient* for no good semantic reason. Stability is a nice property. I don't understand why you are arguing against this so aggressiviely. I'm not arguing against it for sortBy. I wouldn't even object to the existance of an overloaded.. stableSort = sortBy compare by definition. I am arguing against it for the default sort that is applied to all types because for many types there will be more efficient alternatives which are perfectly correct in the semantic sense, but don't respect the (semantically meaningless IMO for Ord instances) stability property. Of course the proper place for this hypothetical sort (and several other variations) is probably as an Ord class method, not a single overloaded function in Data.List. I would also regard any use of stableSort (in preference to the hypothetical unstable overloaded sort) with about the same degree of suspicion as any use of unsafePerformIO. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Remi Turk wrote: I wouldn't bet on it either: Prelude 0.0 == -0.0 True Prelude isNegativeZero 0.0 == isNegativeZero (-0.0) False Although isNegativeZero might be considered a ``private, internal interface that exposes implementation details.'' Interesting example. So is the correct conclusion from this that all (polymorphic) code that assumes (x == y) = True implies x=y is inherently broken, or is just this particular Eq instance that's broken? Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Aaron Denney wrote: On 2008-03-11, Adrian Hey [EMAIL PROTECTED] wrote: Having tried this approach myself too (with the clone) I can confirm that *this way lies madness*, so in future I will not be making any effort to define or respect sane, unambiguous and stable behaviour for insane Eq/Ord instances for any lib I produce or hack on. Instead I will be aiming for correctness and optimal efficiency on the assumption that Eq and Ord instances are sensible. Good. But sensible only means that the Eq and Ord instances agree, not that x == y = f x == f y. So can I assume that max x y = max y x? If not, how can I tell if I've made the correct choice of argument order. If I can't tell then I guess I have no alternative but document my arbitrary choice in the Haddock, and probably for the (sake of completeness) provide 2 or more alternative definitions of the same function which use a different argument order. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
I'd say that any polymorphic code that assumes that x==y implies x=y is broken. But apart from that, floating point numbers break all kinds of laws that we might expect to hold. Even so, they are convenient to have instances of various classes. On Wed, Mar 12, 2008 at 7:31 PM, Adrian Hey [EMAIL PROTECTED] wrote: Remi Turk wrote: I wouldn't bet on it either: Prelude 0.0 == -0.0 True Prelude isNegativeZero 0.0 == isNegativeZero (-0.0) False Although isNegativeZero might be considered a ``private, internal interface that exposes implementation details.'' Interesting example. So is the correct conclusion from this that all (polymorphic) code that assumes (x == y) = True implies x=y is inherently broken, or is just this particular Eq instance that's broken? Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
G'day all. Adrian Hey wrote: This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. Nonsense. Consider a Schwartzian transform wrapper: data OrdWrap k v = OrdWrap k v instance (Ord k) = Ord (OrdWrap k v) where compare (OrdWrap k1 v1) (OrdWrap k2 v2) = OrdWrap k1 k2 It would be incorrect (and not sane) for sort [a,b] to return [a,a] in this case, though a case could be made that either [a,b] or [b,a] make sense. Quoting Jules Bean [EMAIL PROTECTED]: Stability is a nice property. I don't understand why you are arguing against this so aggressiviely. Stability is an occasionally very useful property. However, if there is a tradeoff between stability and performance, I'd prefer it if the library didn't choose for me. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
In OCaml you have sort and fastsort - the latter doesn't have to be stable. It currently is, because fastsort = sort. I think it is a good thing to leave people an option, if there is something important to choose. On Thu, Mar 13, 2008 at 12:48 AM, [EMAIL PROTECTED] wrote: G'day all. Adrian Hey wrote: This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. Nonsense. Consider a Schwartzian transform wrapper: data OrdWrap k v = OrdWrap k v instance (Ord k) = Ord (OrdWrap k v) where compare (OrdWrap k1 v1) (OrdWrap k2 v2) = OrdWrap k1 k2 It would be incorrect (and not sane) for sort [a,b] to return [a,a] in this case, though a case could be made that either [a,b] or [b,a] make sense. Quoting Jules Bean [EMAIL PROTECTED]: Stability is a nice property. I don't understand why you are arguing against this so aggressiviely. Stability is an occasionally very useful property. However, if there is a tradeoff between stability and performance, I'd prefer it if the library didn't choose for me. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Wed, Mar 12, 2008 at 7:48 PM, [EMAIL PROTECTED] wrote: Adrian Hey wrote: This might be a reasonable thing to say about *sortBy*, but not sort as the ordering of equal elements should not be observable (for any correct instance of Ord). It should be impossible to implement a function which can discriminate between [a,a],[a,b],[b,a],[b,b] if compare a b = EQ. Nonsense. Consider a Schwartzian transform wrapper: data OrdWrap k v = OrdWrap k v instance (Ord k) = Ord (OrdWrap k v) where compare (OrdWrap k1 v1) (OrdWrap k2 v2) = OrdWrap k1 k2 It would be incorrect (and not sane) for sort [a,b] to return [a,a] in this case, though a case could be made that either [a,b] or [b,a] make sense. Adrian is arguing that compare a b == EQ should imply compare (f a) (f b) == EQ for all functions f (excluding odd stuff). Thus, the problem with your example would be in the Ord instance, not the sort function. -- Dave Menendez [EMAIL PROTECTED] http://www.eyrie.org/~zednenem/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
G'day all. Quoting David Menendez [EMAIL PROTECTED]: Adrian is arguing that compare a b == EQ should imply compare (f a) (f b) == EQ for all functions f (excluding odd stuff). Thus, the problem with your example would be in the Ord instance, not the sort function. Understood, and the Schwartzian transform might be better understood as sortBy rather than sort. As others have noted, this really is a question of what Eq and Ord mean. And the answer to that is: Whatever makes the most domain-specific sense. Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Mar12, [EMAIL PROTECTED] wrote: G'day all. Quoting David Menendez [EMAIL PROTECTED]: Adrian is arguing that compare a b == EQ should imply compare (f a) (f b) == EQ for all functions f (excluding odd stuff). Thus, the problem with your example would be in the Ord instance, not the sort function. Understood, and the Schwartzian transform might be better understood as sortBy rather than sort. As others have noted, this really is a question of what Eq and Ord mean. And the answer to that is: Whatever makes the most domain-specific sense. I think the notion of a quotient type (C / ~) may be helpful in this discussion. A quotient type represents the equivalence classes of some carrier type C under some equivalence relation ~. Functions of type (C / ~) - A are often defined by working with the underlying carrier type C. However, not all functions C - A define functions (C / ~) - A: to be a well-defined function on equivalence classes, the function C - A must respect the equivalence relation ~, in the sense that c ~ c implies f(c) =_A f(c') where =_A is whatever equality at A is. For example, you can think of a type Set of sets as (List / ~) where ~ equates two lists iff they are permutations of each other. Then a function List - A counts as a function Set - A iff it takes permutations to equal A's. For instance, you can't write a function tolist :: Set - List that simply dumps out the underlying representation, because then you can distinguish different representatives of the same equivalence class. Now, Haskell doesn't let you define quotient types directly, but you can code them up with abstract types: if you hide the implementation of a type C and ensure that all functions C - A respect some equivalence relation ~, then you effectively have a quotient type (C / ~), because all functions on C are well-defined on the equivalence classes. So, I think a way of summing up the two points of view on Eq are: (1) You're only allowed to add an instance Eq A where (==) = ~ if A is really (A / ~). Then all functions on A necessarily respect ==. (2) The instance for Eq A is just some equivalence relation ~ that I might quotient A by. I.e., in Eq A, is A the quotient type or the underlying carrier? Both are reasonable and useful notions, but it might make sense to have two different type classes for these two notions, since if you expect one and get the other you can get into trouble. -Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
ok: On 11 Mar 2008, at 12:27 pm, Krzysztof Skrzętnicki wrote: I've written little framework to work on. See sortbench.hs and sortbench.py attachments. Furthermore, I checked Yhc's implementation of sort: it is indeed very fast: I took his earlier code and plugged yhc's sort into it. Compiling with ghc -O2 using GHC 6.8.2, I found the yhc code (basically variant of natural merge) to be considerably slower than some of the alternatives. There is a pretty obvious way to speed up the YHC code which you would expect to provide nearly a factor of two speedup, and with the random integer data, it does. However, there is one simple but extremely important point which must be considered in evaluating a sorting routine: the library 'sort' function is, or should be, a *general-purpose* sort. It should be useful with any data type which is an instance of Ord or for which you can write a `cmp` function, and it should work at least as well with already-sorted input as with randomised input. quicksort (whose original reason for existence was to sort on a machine whose memory would disgrace today's wristwatches) is well known for doing deceptively well on randomised integer sequences. When I run Krzystztof's test harness (which I have currently brought up to 25 different sorting functions) with a list of the form [1..N] instead of a random list, suddenly all the variants of merge sort come out ahead of all the variants of quick sort. In fact his best version of quicksort, qsort_iv, comes out fully 1155 times slower than the YHC algorithm on a list of 10,000 ordered integers. That can be improved by spending a bit of effort on choosing a good pivot, but then the quicksort algorithms are no longer so competitive for randomised inputs. The classic Engineering a Quicksort paper by Bentley and McIlroy from Software : Practice Experience recommends a whole bunch of distribution shapes (one run, two runs, sawtooth, organ pipes, random, ...) that should be benchmarked before drawing too many conclusions. It is also wise to try more than one data type. How do the different algorithms compare on random samples from a Scrabble dictionary? (Why that particular question? Because I mean to try it.) Right now, I remain happy with merge sort, because it is never mysteriously several thousand times slower than expected. Do you have these tests bundled up in a repository? It would be very useful to drop these into the base library testsuite, so we can point to this when needed. -- Don ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Jonathan Cast wrote: On 10 Mar 2008, at 4:00 AM, Adrian Hey wrote: Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). Data.Map is implicitly using such an Ord instance behind the scenes, and I think it has to to maintain its own invariants. If I take the `union' of two maps that take the same key to different values, I have to decide which value to use, even if every Ord instance supplied by my clients is 100% Adrian-compliant. The biasing policy for Data.Map/Set is refering to (Set) elements, or (Map) keys, not the associated values (in a Map). So during an insertion op, if an equal element/key is found the Set/Map the biasing policy tells me which of the two equal elements/keys in incorporated in the resulting Set/Map. So there's an implied acceptance of the posibility that the choice is significant and that the two elements/keys can be both equal and not equal at the same time. This is crazy IMO. Implementors should not have to offer an guarantees about this and should be perfectly free to make their own unspecified choice regarding which of two equal values is used in any expression (on space efficiency grounds say). For example, the left biasing of insertion on Data.Set forces the implementation to burn O(log n) heap space creating a new equal set, even if the set already contains an old element that is equal to the new element. I would argue that in this situation it should be perfectly correct to simply return the old set instead. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Are you really sure that your results are correct? I obviously did all my tests with -O2 on. Please rerun your tests on the new framework. Also note that this one contains tests for three cases: - sorted - revsorted - randomized From previous results I can guess that with randomized input Yhc's code will be ~3 times slower then the fastest quicksort out there. But I'm not going to run O(n^2) algorithm to compare with O(n log n) - and this is the case for (rev?)sorted inputs. Christopher Skrzętnicki On Tue, Mar 11, 2008 at 5:14 AM, Richard A. O'Keefe [EMAIL PROTECTED] wrote: On 11 Mar 2008, at 12:27 pm, Krzysztof Skrzętnicki wrote: I've written little framework to work on. See sortbench.hs and sortbench.py attachments. Furthermore, I checked Yhc's implementation of sort: it is indeed very fast: I took his earlier code and plugged yhc's sort into it. Compiling with ghc -O2 using GHC 6.8.2, I found the yhc code (basically variant of natural merge) to be considerably slower than some of the alternatives. (...) When I run Krzystztof's test harness (which I have currently brought up to 25 different sorting functions) with a list of the form [1..N] instead of a random list, suddenly all the variants of merge sort come out ahead of all the variants of quick sort. In fact his best version of quicksort, qsort_iv, comes out fully 1155 times slower than the YHC algorithm on a list of 10,000 ordered integers. That can be improved by spending a bit of effort on choosing a good pivot, but then the quicksort algorithms are no longer so competitive for randomised inputs. This paper looks interesting, I'm going to implement those tests. The classic Engineering a Quicksort paper by Bentley and McIlroy from Software : Practice Experience recommends a whole bunch of distribution shapes (one run, two runs, sawtooth, organ pipes, random, ...) that should be benchmarked before drawing too many conclusions. This is the right point. A further work will be to add different input generators to run them too. It is also wise to try more than one data type. How do the different algorithms compare on random samples from a Scrabble dictionary? (Why that particular question? Because I mean to try it.) ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Krzysztof Skrzętnicki [EMAIL PROTECTED] writes: The above results are for 100 Ints x 10 runs, but I don't expect any drastic changes in longer run. I leave the interpretation up to you. Might I suggest (also) testing with numbers of smaller magnitude? Lots of collisions is another killer for the naïve quicksort (albeit easily remedied, of course), and something a general sorting algorithm should handle. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b sort [Foo 1 2, Foo 1 -2] must return the original list back, in that order. You cannot delete things and duplicate them later. To guarantee the ordering you must also do other stuff. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi Can whoever picks this up please try the sort code from Yhc in their comparisons. In my benchmarks it ran up to twice as fast as the GHC code. http://darcs.haskell.org/yhc/src/packages/yhc-base-1.0/Data/List.hs I think what we really need is first quickCheck and timing framework for measuring sorts. After we have decided what makes one sort faster/better than another, then is the time to start deciding what sort is the best one. Ian did some initial work on this: http://www.haskell.org/pipermail/glasgow-haskell-users/2002-May/003376.html Until the sort-check package is uploaded to hackage I think most people will find it hard to be convinced of one sort over another. Thanks Neil On Mon, Mar 10, 2008 at 8:27 AM, Neil Mitchell [EMAIL PROTECTED] wrote: Hi 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b sort [Foo 1 2, Foo 1 -2] must return the original list back, in that order. You cannot delete things and duplicate them later. To guarantee the ordering you must also do other stuff. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On 10 Mar 2008, at 08:36, Neil Mitchell wrote: Can whoever picks this up please try the sort code from Yhc in their comparisons. In my benchmarks it ran up to twice as fast as the GHC code. http://darcs.haskell.org/yhc/src/packages/yhc-base-1.0/Data/List.hs I believe the Yhc sort implementation is faster because Lennart did some extensive performance tuning of sorting with hbc, about ten years ago, and contributed the resulting winner to nhc98 way back then. Regards, Malcolm ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. Otherwise knowing a type is an instance of Ord tells me nothing that I can rely on. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Mon, Mar 10, 2008 at 11:00 AM, Adrian Hey [EMAIL PROTECTED] wrote: Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). It could be. I actually don't know what Haskell specifies here. If a type has an Eq instance and x == y, must x and y be observationally equivalent? Or is it reasonable to allow some wiggle room? I'd say (==) definitely has to be an equivalence relation, but beyond that, it's open to the implementor, since if an algorithm only depends on (Eq a), it can't tell the difference between observational equality and any other equivalence relation. But that's just one argument (by example, in a way). That is, an argument that this is hopelessly broken isn't trivial, it needs to be defended. There is nonetheless a need to handle duplicates gracefully, that is keeping a count won't cut it, because of sortBy. Luke ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). The sort in Haskell is specified to be stable. What that means is that the above ord relation is fine. The above ordering observes all the necessary mathematical definitions of ordering, assuming an Eq of: instance Eq Foo where (==) (Foo a _) (Foo b _) = (==) a b Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. Otherwise knowing a type is an instance of Ord tells me nothing that I can rely on. Please give the sane law that this ordering violates. I can't see any! What if I had made the definition of Foo: data Foo = Foo Int (Int - Int) Now, is the only possible answer that any Ord instance for Foo is wrong? Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Adrian Hey wrote: or specify and control the behaviour of their behaviour for such instances. Urk, sorry for the gibberish. I guess I should get into the habit of reading what I write before posting :-) Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: Hi instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). The sort in Haskell is specified to be stable. What that means is that the above ord relation is fine. The above ordering observes all the necessary mathematical definitions of ordering, assuming an Eq of: instance Eq Foo where (==) (Foo a _) (Foo b _) = (==) a b Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. Otherwise knowing a type is an instance of Ord tells me nothing that I can rely on. Please give the sane law that this ordering violates. I can't see any! The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. The Haskell standard doen't even specify that compare x y = EQ implies (x == y) = True, but again it should (what's the purpose of the Eq constraint on Ord class otherwise). What if I had made the definition of Foo: data Foo = Foo Int (Int - Int) Now, is the only possible answer that any Ord instance for Foo is wrong? Yes, if the Foo constuctor is exported. If it's scope confined to one module then you could maintain the invariant that the same function is always associated with a given Int. However, if this is the case then the issue you raise wrt sort behaviour is irrelevant. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Monday 10 March 2008, Neil Mitchell wrote: That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b sort [Foo 1 2, Foo 1 -2] must return the original list back, in that order. You cannot delete things and duplicate them later. To guarantee the ordering you must also do other stuff. Ah! Quite right. So, instead, we'd have to store the elements themselves. Something like: treeSort = concatMap (reverse . snd) . Map.toAscList . Map.fromListWith (++) . map (\x - (x,[x])) I had a feeling the duplicate counting one wasn't the correct answer, but your example slipped my mind last night. -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. Wrong. It shouldn't, it doesn't, and I don't think it even can! The Haskell standard doen't even specify that compare x y = EQ implies (x == y) = True, but again it should (what's the purpose of the Eq constraint on Ord class otherwise). Correct. Yes, this is one law that _should_ be true, along with others: a b b c = a c a == b = b == a etc. But a == b = a = b is not a law that needs to hold, and not a law that can be stated in Haskell, even as a quickcheck property. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: Hi The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. Wrong. It shouldn't, Should too it doesn't, indeed and I don't think it even can! Well we need to be precise about exactly what = means, but informally I guess we're talking about observational equvalence. But seriously, once you admit the possibility that even if x == y it still matters which of x or y is used in expressions than all hell breaks loose. I shudder to think just how much Haskell code there must be out there that is (at best) ambiguious or just plain broken if this is a serious possibility. Again, I have to cite Data.Map as an obvious example. It's unclear to me exactly what the proper interpretation of left biasing is for all functions in the API. Furthermore, until quite recently some function implementations in Data.Map we're actually broken wrt the stated biasing policy (though few actually noticed this for obvious reasons). Perhaps some still are? Who knows.. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Adrian Hey [EMAIL PROTECTED] writes: But seriously, once you admit the possibility that even if x == y it still matters which of x or y is used in expressions than all hell breaks loose. I shudder to think just how much Haskell code there must be out there that is (at best) ambiguious or just plain broken if this is a serious possibility. Just search for copy (on ByteStrings). One program of mine was extracting substrings from a large file. Since the file was pretty huge, I used lazy bytestrings for this purpose. Unfortunately, the short substrings I retained pulled with them rather large chunks from the file -- since a bytestring is essentially a pointer to a chunk, an offset, and a length. The solution is copy, which creates a new string, indistinguishable from within Haskell, but with very different effects on the program. -k -- If I haven't seen further, it is by standing in the footprints of giants ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Bulat Ziganshin wrote: Hello Adrian, Monday, March 10, 2008, 2:00:18 PM, you wrote: instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I h. for example, imagine files in file manager sorted by size or date In such cases you should be using sortBy, not the overloaded sort (you have several reasonable orderings for the same record type say). Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Mon, Mar 10, 2008 at 10:10 AM, Adrian Hey [EMAIL PROTECTED] wrote: The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. Unless I'm missing something obvious, the example Neil gave earlier should make it clear how impossible this requirement is: What if I had made the definition of Foo: data Foo = Foo Int (Int - Int) There is no way in general to decide the observational equivalence of two values of this data type (by reduction to the halting problem). Therefore it is impossible to write any function implementing such an equality test. -- Denis ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Ketil Malde wrote: Adrian Hey [EMAIL PROTECTED] writes: But seriously, once you admit the possibility that even if x == y it still matters which of x or y is used in expressions than all hell breaks loose. I shudder to think just how much Haskell code there must be out there that is (at best) ambiguious or just plain broken if this is a serious possibility. Just search for copy (on ByteStrings). One program of mine was extracting substrings from a large file. Since the file was pretty huge, I used lazy bytestrings for this purpose. Unfortunately, the short substrings I retained pulled with them rather large chunks from the file -- since a bytestring is essentially a pointer to a chunk, an offset, and a length. The solution is copy, which creates a new string, indistinguishable from within Haskell, but with very different effects on the program. We're talking about *semantic correctness*, not efficiency. If you want to fine tune your code like this you shouldn't be relying on overloaded general purpose function implementations. E.G. the COrdering type used by AVL lib is one way to do this. This lets a combining comparison chose which of two equal values is used (and other things). Indeed, one of my main objections the having things like biasing policy as part of a functions specification in that it seriously inhibits you when producing more refined and efficient implementations. BTW, I noticed this when I was writing my Data.Map clone. Respecting the stated biasing policy resulted in less efficient implementations. It broke my heart to knowingly write code that would slow down 99% of users code just keep the 1% who'd defined broken Ord instances happy, so I defined biasing policy differently for the clone. On reflection I think even that was a mistake and is something I intend drop if I ever do a Hackage release (the lib should not specify any biasing policy whatsoever). Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi We're talking about *semantic correctness*, not efficiency. If you want to fine tune your code like this you shouldn't be relying on overloaded general purpose function implementations. E.G. the COrdering type used by AVL lib is one way to do this. This lets a combining comparison chose which of two equal values is used (and other things). I would have thought: sort x == sortBy compare x was a reasonable property to have. But you are proposing that sort should make assumptions on the compare function, which you can't even state in Haskell, but sortBy should not. That seems suspicious... I also know of a data type: data Set xs = Set [xs] where the Set constructor is all nicely hidden. If I define Set ab to be equal to Set ba, should the compiler burst into flames? If we _require_ all Eq definitions to follow our very narrowly defined notion of equality, then the question comes up why we permit people to write Eq at all - why doesn't the compiler just do it for us, if there is only one right answer. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Denis Bueno wrote: On Mon, Mar 10, 2008 at 10:10 AM, Adrian Hey [EMAIL PROTECTED] wrote: The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. Unless I'm missing something obvious, the example Neil gave earlier should make it clear how impossible this requirement is: What if I had made the definition of Foo: data Foo = Foo Int (Int - Int) There is no way in general to decide the observational equivalence of two values of this data type (by reduction to the halting problem). Therefore it is impossible to write any function implementing such an equality test. Did you read my original response to this example? http://www.haskell.org/pipermail/haskell-cafe/2008-March/040356.html Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Mon, Mar 10, 2008 at 12:19 PM, Adrian Hey [EMAIL PROTECTED] wrote: Denis Bueno wrote: On Mon, Mar 10, 2008 at 10:10 AM, Adrian Hey [EMAIL PROTECTED] wrote: The Eq instance you've given violates the law that (x == y) = True implies x = y. Of course the Haskell standard doesn't specify this law, but it should. Unless I'm missing something obvious, the example Neil gave earlier should make it clear how impossible this requirement is: What if I had made the definition of Foo: data Foo = Foo Int (Int - Int) There is no way in general to decide the observational equivalence of two values of this data type (by reduction to the halting problem). Therefore it is impossible to write any function implementing such an equality test. Did you read my original response to this example? http://www.haskell.org/pipermail/haskell-cafe/2008-March/040356.html Yes. You would argue that one should not export the data constructor Foo. That is a decision that requires more details about the code providing Foo, although it certainly seems a reasonable approach in many cases. Supposing I wanted to export Foo, though, the condition you'd like to put on == breaks down. Even if I don't export Foo, how do I ensure that any standard library functions called from the Foo library don't depend on the condition you'd like to put on ==? Do I have to examine them individually? Wouldn't it be easier to reason about code if we constrain the semantics of == as *little* as possible (as an equivalence relation)? -- Denis ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: instance Eq Foo where (==) (Foo a _) (Foo b _) = (==) a b [...] Please give the sane law that this ordering violates. I can't see any! The (non-existant) law would be (Eq1) x == y = f x == f y, for all f of appropriate type which is analogous to this (existant) law about observational equality: (Eq2) x = y = f x = f y, for all f of appropriate type Kalman -- Finally - A spam blocker that actually works. http://www.bluebottle.com/tag/4 ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Neil Mitchell wrote: Hi We're talking about *semantic correctness*, not efficiency. If you want to fine tune your code like this you shouldn't be relying on overloaded general purpose function implementations. E.G. the COrdering type used by AVL lib is one way to do this. This lets a combining comparison chose which of two equal values is used (and other things). I would have thought: sort x == sortBy compare x was a reasonable property to have. Certainly, but this is part of (but not the complete) specification for sortBy, not sort. But given sane Ord/Eq instances and sortBy implementation, then this is indeed also one of many possible correct implementatations of sort. But you are proposing that sort should make assumptions on the compare function, Not just sort, but any function with an Ord constraint is entited to assume sane behavior wrt to compare. Without this the Ord class just becomes quite useless, other than saving a few keystrokes for folk who be bothered to pass any old compare function as explicit arg. Surely type classes are supposed to be more than that? which you can't even state in Haskell, There are plenty of formal statements about things that can't be written in Haskell. That doesn't mean they aren't true or should not be respected or relied upon. It just means Haskell is an imperfect language for expressing such things. but sortBy should not. sortBy should not assume anything about the function of type x - x - Ordering. Rather, what sortBy actually does with that function should be specified. I also know of a data type: data Set xs = Set [xs] where the Set constructor is all nicely hidden. If I define Set ab to be equal to Set ba, should the compiler burst into flames? ?? If we _require_ all Eq definitions to follow our very narrowly defined notion of equality, then the question comes up why we permit people to write Eq at all - why doesn't the compiler just do it for us, if there is only one right answer. You provided one example yourself.. data Foo = Foo Int (Int - Int) It's perfectly possible for Foo to be an abstract type exported from a module that preserves the invariant that the same function is always associated with a given Int (value). If this is the case there's no reason why Foo should not be an instance of Ord or Eq. If this isn't the case then Foo should certainly not be an instance or either class IMO. If this was intended to be the case but in fact isn't the case, then that's a bug. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. In fact there are requirements in the Haskell98 report: 6.3 Standard Haskell Classes Note the word reasonable in the paragraph below: Default class method declarations (Section 4.3) are provided for many of the methods in standard classes. A comment with each class declaration in Chapter 8 specifies the smallest collection of method definitions that, together with the default declarations, provide a reasonable definition for all the class methods. This (coupled with the premise that anything not required is optional) means that default definitions are not normative, so the following Ord default code comment need not hold: -- Note that (min x y, max x y) = (x,y) or (y,x) However, the report text is normative: 6.3.2 (The Ord Class): The Ord class is used for totally ordered datatypes. This *requires* that it be absolutely impossible in valid code to distinguish equivalent (in the EQ sense, not the == sense) things via the functions of Ord. The intended interpretation of these functions is clear and can be taken as normative: forall f . (compare x y == EQ and (f x or f y is defined)) == f x == f y) There is an (seriously insane but required by the total ordering, and in any case) officially encouraged use of left-bias in sum types: The declared order of the constructors in the data declaration determines the ordering in derived Ord instances. Perhaps in Haskell' the total ordering requirement can be loosened to a partial order (say in a class between Eq and Ord), with comparePartial :: a - a - PartialOrdering? Dan Adrian Hey wrote: Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. Otherwise knowing a type is an instance of Ord tells me nothing that I can rely on. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Ok, my turn now. Let's think about algorithm that takes equivalence relation EQ, ordering relation ORD on abstraction classes generated by this equivalence ( - equivalence classes ) and divides given input elements XS into appropriate classes and then prints them out according to given ordering ORD. If we pose the restriction (let's call it (*)), that input XS should have at most one element from every abstraction class, we get sorting in a way that you desire. Hovewer, if we don't pose that restriction the algorithm still makes perfect sense and is given by standard library sortBy. I see no reason for restriction (*). For efficiency reasons there could be additional class StrictEq. If the type is in that class then we can assume (*) and use more space-efficient algorithm. Now, the problem with treeSort = concatMap (reverse . snd) . Map.toAscList . Map.fromListWith (++) . map (\x - (x,[x])) is that i can't tell any compact way of implementing treeSortBy in nice manner. This is because Data.Map does not allow us to provide our own comparison test instead of compare. On Mon, Mar 10, 2008 at 6:10 PM, Adrian Hey [EMAIL PROTECTED] wrote: Neil Mitchell wrote: Hi We're talking about *semantic correctness*, not efficiency. If you want to fine tune your code like this you shouldn't be relying on overloaded general purpose function implementations. E.G. the COrdering type used by AVL lib is one way to do this. This lets a combining comparison chose which of two equal values is used (and other things). I would have thought: sort x == sortBy compare x was a reasonable property to have. Certainly, but this is part of (but not the complete) specification for sortBy, not sort. But given sane Ord/Eq instances and sortBy implementation, then this is indeed also one of many possible correct implementatations of sort. But you are proposing that sort should make assumptions on the compare function, Not just sort, but any function with an Ord constraint is entited to assume sane behavior wrt to compare. Without this the Ord class just becomes quite useless, other than saving a few keystrokes for folk who be bothered to pass any old compare function as explicit arg. Surely type classes are supposed to be more than that? which you can't even state in Haskell, There are plenty of formal statements about things that can't be written in Haskell. That doesn't mean they aren't true or should not be respected or relied upon. It just means Haskell is an imperfect language for expressing such things. but sortBy should not. sortBy should not assume anything about the function of type x - x - Ordering. Rather, what sortBy actually does with that function should be specified. I also know of a data type: data Set xs = Set [xs] where the Set constructor is all nicely hidden. If I define Set ab to be equal to Set ba, should the compiler burst into flames? ?? If we _require_ all Eq definitions to follow our very narrowly defined notion of equality, then the question comes up why we permit people to write Eq at all - why doesn't the compiler just do it for us, if there is only one right answer. You provided one example yourself.. data Foo = Foo Int (Int - Int) It's perfectly possible for Foo to be an abstract type exported from a module that preserves the invariant that the same function is always associated with a given Int (value). If this is the case there's no reason why Foo should not be an instance of Ord or Eq. If this isn't the case then Foo should certainly not be an instance or either class IMO. If this was intended to be the case but in fact isn't the case, then that's a bug. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On the other hand, though the behavior of == is not defined by the Report, it does require in 6.3.1 that if compare is defined, then == must be defined. That strongly implies a semantic causal link (in the Free Theorem kind of way), that the semantics of Ord completely specify the semantics of Eq, and the only free and continuous way to specify this is to make == and EQ always agree. I would (almost) take this conclusion as normative as well. Dan Dan Weston wrote: Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. In fact there are requirements in the Haskell98 report: 6.3 Standard Haskell Classes Note the word reasonable in the paragraph below: Default class method declarations (Section 4.3) are provided for many of the methods in standard classes. A comment with each class declaration in Chapter 8 specifies the smallest collection of method definitions that, together with the default declarations, provide a reasonable definition for all the class methods. This (coupled with the premise that anything not required is optional) means that default definitions are not normative, so the following Ord default code comment need not hold: -- Note that (min x y, max x y) = (x,y) or (y,x) However, the report text is normative: 6.3.2 (The Ord Class): The Ord class is used for totally ordered datatypes. This *requires* that it be absolutely impossible in valid code to distinguish equivalent (in the EQ sense, not the == sense) things via the functions of Ord. The intended interpretation of these functions is clear and can be taken as normative: forall f . (compare x y == EQ and (f x or f y is defined)) == f x == f y) There is an (seriously insane but required by the total ordering, and in any case) officially encouraged use of left-bias in sum types: The declared order of the constructors in the data declaration determines the ordering in derived Ord instances. Perhaps in Haskell' the total ordering requirement can be loosened to a partial order (say in a class between Eq and Ord), with comparePartial :: a - a - PartialOrdering? Dan Adrian Hey wrote: Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). Unfortunately the Haskell standards don't currently specify sane laws for Eq and Ord class instances, but they should. Otherwise knowing a type is an instance of Ord tells me nothing that I can rely on. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi The Ord class is used for totally ordered datatypes. This *requires* that it be absolutely impossible in valid code to distinguish equivalent (in the EQ sense, not the == sense) things via the functions of Ord. The intended interpretation of these functions is clear and can be taken as normative: forall f . (compare x y == EQ and (f x or f y is defined)) == f x == f y) Are you sure? I would have read this as the ordering must be reflexive, antisymetric and transitive - the standard restrictions on any ordering. See http://en.wikipedia.org/wiki/Total_ordering Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Mon, Mar 10, 2008 at 3:12 PM, Neil Mitchell [EMAIL PROTECTED] wrote: Hi The Ord class is used for totally ordered datatypes. This *requires* that it be absolutely impossible in valid code to distinguish equivalent (in the EQ sense, not the == sense) things via the functions of Ord. The intended interpretation of these functions is clear and can be taken as normative: forall f . (compare x y == EQ and (f x or f y is defined)) == f x == f y) Are you sure? I would have read this as the ordering must be reflexive, antisymetric and transitive - the standard restrictions on any ordering. See http://en.wikipedia.org/wiki/Total_ordering This is my reading, too. In addition, to make it total, the property that any two elements are comparable (this is the property that a partial order does not necessarily have). -- Denis ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
It certainly makes perfect sense, because total order antisymmetry law states that IF a = b AND b = a THEN a = b However it should rather be written IF a = b AND b = a THEN a ~= b, since = could be any equivalence class. However, we can also specify the Ord on type type Foo = Foo Int (Int-Int) in this way: instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b which yields equivalence relation that is not assuming equivalence of the functions. So this restriction does not seem to work on Adrian Hey's side. Christopher Skrzętnicki On Mon, Mar 10, 2008 at 8:06 PM, Dan Weston [EMAIL PROTECTED] wrote: On the other hand, though the behavior of == is not defined by the Report, it does require in 6.3.1 that if compare is defined, then == must be defined. That strongly implies a semantic causal link (in the Free Theorem kind of way), that the semantics of Ord completely specify the semantics of Eq, and the only free and continuous way to specify this is to make == and EQ always agree. I would (almost) take this conclusion as normative as well. Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Am Montag, 10. März 2008 20:12 schrieb Neil Mitchell: Hi The Ord class is used for totally ordered datatypes. This *requires* that it be absolutely impossible in valid code to distinguish equivalent (in the EQ sense, not the == sense) things via the functions of Ord. The intended interpretation of these functions is clear and can be taken as normative: forall f . (compare x y == EQ and (f x or f y is defined)) == f x == f y) Are you sure? I would have read this as the ordering must be reflexive, antisymetric and transitive - the standard restrictions on any ordering. See http://en.wikipedia.org/wiki/Total_ordering But antisymmetry means that (x = y) (y = x) == x = y, where '=' means identity. Now what does (should) 'identity' mean? Depends on the type, I dare say. For e.g. Int, it should mean 'identical bit pattern', shouldn't it? For IntSet it should mean 'x and y contain exactly the same elements', the internal tree-structure being irrelevant. But that means IntSet shouldn't export functions that allow to distinguish (other than by performance) between x and y. In short, I would consider code where for some x, y and a function f we have (x = y) (y = x) [or, equivalently, compare x y == EQ] but f x /= f y broken indeed. So for data Foo = Foo Int (Int - Int), an Ord instance via compare (Foo a _) (Foo b _) = compare a b is okay if Foo is an abstract datatype and outside the defining module it's guaranteed that compare (Foo a f) (Foo b g) == EQ implies (forall n. f n == g n), but not if the data-constructor Foo is exported. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
No, '=' should not mean an identity but any equivalence relation. Therefore, we can use whatever equivalence relation suits us. The reasoning you provided is IMHO rather blur. Anyway, having possibility of using different equivalence relations is great because they mean different abstraction classes - and not all of them are isomorphic. On Mon, Mar 10, 2008 at 9:09 PM, Daniel Fischer [EMAIL PROTECTED] wrote: But antisymmetry means that (x = y) (y = x) == x = y, where '=' means identity. Now what does (should) 'identity' mean? Depends on the type, I dare say. For e.g. Int, it should mean 'identical bit pattern', shouldn't it? For IntSet it should mean 'x and y contain exactly the same elements', the internal tree-structure being irrelevant. But that means IntSet shouldn't export functions that allow to distinguish (other than by performance) between x and y. In short, I would consider code where for some x, y and a function f we have (x = y) (y = x) [or, equivalently, compare x y == EQ] but f x /= f y broken indeed. So for data Foo = Foo Int (Int - Int), an Ord instance via compare (Foo a _) (Foo b _) = compare a b is okay if Foo is an abstract datatype and outside the defining module it's guaranteed that compare (Foo a f) (Foo b g) == EQ implies (forall n. f n == g n), but not if the data-constructor Foo is exported. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi But antisymmetry means that (x = y) (y = x) == x = y, where '=' means identity. Now what does (should) 'identity' mean? I think you are using the word identity when the right would would be equality. Hence, the answer is, without a doubt, (==). If you define equality, then you are defining equality. In short, I would consider code where for some x, y and a function f we have (x = y) (y = x) [or, equivalently, compare x y == EQ] but f x /= f y broken indeed. I would consider it slightly bad code too. But not broken code. I think Ord functions can assume that Ord is a total ordering, nothing more. Nothing to do with the existence (or otherwise) of entirely unrelated code. Consider the following implementation of Data.Set, which *does* meet all the invariants in Data.Set: data Set a = Set [a] insert x (Set xs) = Set $ x : filter (/= x) xs elems (Set xs) = xs i.e. there is real code in the base libraries which breaks this notion of respecting classes etc. Is the interface to Data.Set broken? I would say it isn't. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
If x = y y = x does not imply that x == y, then Ord has no business being a subclass of Eq. By your logic, there is absolutely no constructive subclassing going on here, only an existence proof of (==) given (=). What is the rational basis of such an existence claim, unless == has the obvious meaning? Or should I take it that you are suggesting we should move Ord up to be a peer of Eq? Dan Neil Mitchell wrote: Hi But antisymmetry means that (x = y) (y = x) == x = y, where '=' means identity. Now what does (should) 'identity' mean? I think you are using the word identity when the right would would be equality. Hence, the answer is, without a doubt, (==). If you define equality, then you are defining equality. In short, I would consider code where for some x, y and a function f we have (x = y) (y = x) [or, equivalently, compare x y == EQ] but f x /= f y broken indeed. I would consider it slightly bad code too. But not broken code. I think Ord functions can assume that Ord is a total ordering, nothing more. Nothing to do with the existence (or otherwise) of entirely unrelated code. Consider the following implementation of Data.Set, which *does* meet all the invariants in Data.Set: data Set a = Set [a] insert x (Set xs) = Set $ x : filter (/= x) xs elems (Set xs) = xs i.e. there is real code in the base libraries which breaks this notion of respecting classes etc. Is the interface to Data.Set broken? I would say it isn't. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi If x = y y = x does not imply that x == y, then Ord has no business being a subclass of Eq. By your logic, there is absolutely no constructive subclassing going on here, only an existence proof of (==) given (=). What is the rational basis of such an existence claim, unless == has the obvious meaning? Is this directed at me? I think x = y y = x implies x == y. My point above was that where you have used x = y, I think = should be ==. I also think (compare x y == EQ) = (x == y), where = is bi-implication or boolean equality. i.e. Eq is a fine parent to Ord, but given a Eq/Ord pair they must be in agreement. Thanks Neil ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Krzysztof Skrze;tnicki wrote: Ok, my turn now. Let's think about algorithm that takes equivalence relation EQ, ordering relation ORD on abstraction classes generated by this equivalence ( - equivalence classes ) and divides given input elements XS into appropriate classes and then prints them out according to given ordering ORD. If we pose the restriction (let's call it (*)), that input XS should have at most one element from every abstraction class, we get sorting in a way that you desire. Hovewer, if we don't pose that restriction the algorithm still makes perfect sense and is given by standard library sortBy. I see no reason for restriction (*). I don't understand the above paragraph. Let's consider a simple example: (sort [a,b]) in the case we have: (compare a b = EQ) Which of the following 4 possible results are correct/incorrect? 1- [a,a] 2- [a,b] 3- [b,a] 4- [b,b] I would say they are all correct and equivalent for any sane Ord instance, though from the point of view of space efficiency 1 or 4 are preferable to 2 or 3. For efficiency reasons there could be additional class StrictEq. If the type is in that class then we can assume (*) and use more space-efficient algorithm. Now, the problem with treeSort = concatMap (reverse . snd) . Map.toAscList . Map.fromListWith (++) . map (\x - (x,[x])) is that i can't tell any compact way of implementing treeSortBy in nice manner. This is because Data.Map does not allow us to provide our own comparison test instead of compare. As a practical matter, for benchmarking you should also count the actual number of comparisons needed, not just execution times for simple types (Ints presumably). Also, I think you'll find that the AVL lib gives better performance than Data.Map, particularly for sorted inputs. Unfortunately you can't use this implementation in the standard libs without making the AVL lib a standard lib (the same happens to be true of Data.Map too, thought this is widely perceived as being standard because of ghc library bundling :-) But actually I would say that if either (both) of these is faster than the the standard sort then this is some kind of performance bug with the current ghc release. They weren't faster last time I tested (with Ints). I also happen to think that sort should be made an Ord class method, so that trie based sorts are possible (which should be faster for complex data types). We should only use sort = sortBy compare as the default. Regards -- Adrian Hey ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Hi (sort [a,b]) in the case we have: (compare a b = EQ) Which of the following 4 possible results are correct/incorrect? 1- [a,a] 2- [a,b] 3- [b,a] 4- [b,b] Fortunately the Haskell sort is meant to be stable, and sorting is meant to be a permutation, so we happily have the situation where this has a correct answer: 2. Anything else is incorrect. Anyone submitting a revised sort through the Haskell libraries process will have to ensure the answer is 2. I hope someone does take the time to do this, as a faster base library will benefit everyone. Adrian: I think its fairly clear we disagree about these things. However, we both understand the others point of view, so I guess its just a question of opinion - and I doubt either of us will change. As such I think any further discussion may just lead to sleep deprivation [1]. I think I'm coming from a more discrete maths/theoretical background while you are coming from a more practical/pragmatist background. Thanks Neil [1] http://xkcd.com/386/ ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Mon, Mar 10, 2008 at 10:13 PM, Adrian Hey [EMAIL PROTECTED] wrote: (sort [a,b]) in the case we have: (compare a b = EQ) Which of the following 4 possible results are correct/incorrect? 1- [a,a] 2- [a,b] 3- [b,a] 4- [b,b] I'd say 2 and 3 are sane, while 2 is correct - because we need stable sort. Stable - this is the keyword! If `==` would mean identity then we wouldn't need a stable sorting algorithm. Christopher Skrzętnicki ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
Am Montag, 10. März 2008 21:34 schrieb Neil Mitchell: Hi But antisymmetry means that (x = y) (y = x) == x = y, where '=' means identity. Now what does (should) 'identity' mean? I think you are using the word identity when the right would would be equality. Hence, the answer is, without a doubt, (==). If you define equality, then you are defining equality. Okay, bad choice of words. Of course I expect compare x y == EQ == x == y for any Ord instance. And for f :: (Eq a, Eq b) = a - b I expect (x == y) == (f x == f y). In short, I would consider code where for some x, y and a function f we have (x = y) (y = x) [or, equivalently, compare x y == EQ] but f x /= f y broken indeed. I would consider it slightly bad code too. But not broken code. I Perhaps 'broken' is a stronger word than I thought. I wouldn't say there can never be a reason for such. It would not necessarily be *badly* broken, though at the moment I can't see a case where such behaviour (in an exported function) would be reasonable. Of course, internal fuctions are a different matter. think Ord functions can assume that Ord is a total ordering, nothing more. Nothing to do with the existence (or otherwise) of entirely unrelated code. Consider the following implementation of Data.Set, which *does* meet all the invariants in Data.Set: data Set a = Set [a] insert x (Set xs) = Set $ x : filter (/= x) xs elems (Set xs) = xs i.e. there is real code in the base libraries which breaks this notion of respecting classes etc. Is the interface to Data.Set broken? I would say it isn't. I would say, if we have x = Set [1,2], y = Set [2,1] and an Eq instance where x == y, then elems shouldn't be exported. Thanks Neil Cheers, Daniel ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
G'day all. Quoting Dan Weston [EMAIL PROTECTED]: 6.3.2 (The Ord Class): The Ord class is used for totally ordered datatypes. So... Double shouldn't be there, then? As previously noted, nowhere is it even required that x /= y should do the same thing as not (x == y). Cheers, Andrew Bromage ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
I've written little framework to work on. See sortbench.hs and sortbench.pyattachments. Furthermore, I checked Yhc's implementation of sort: it is indeed very fast: [EMAIL PROTECTED] sorting]$ python sortbench.py Benchmark type: OnSorted [1 of 1] Compiling Main ( sortbench.hs, sortbench.o ) Linking sortbenchOnSorted.bin ... 1/10 (...) 10/10 Total time: 171.392577887 Scaled vs best.: ('yhcSort', 1.0) ('sort', 4.1826933506099904) ('treeSort', 4.2466878529708207) Benchmark type: OnRevsorted [1 of 1] Compiling Main ( sortbench.hs, sortbench.o ) Linking sortbenchOnRevsorted.bin ... 1/10 (...) 10/10 Total time: 187.789487839 Scaled vs best.: ('yhcSort', 1.0) ('treeSort', 1.2973727012306746) ('sort', 1.3028663057478311) Benchmark type: OnRandom [1 of 1] Compiling Main ( sortbench.hs, sortbench.o ) Linking sortbenchOnRandom.bin ... 1/10 (...) 10/10 Total time: 289.231264114 Scaled vs best.: ('yhcSort', 1.0) ('treeSort', 1.1167200854190948) ('sort', 1.2050043053111394) The above results are for 100 Ints x 10 runs, but I don't expect any drastic changes in longer run. I leave the interpretation up to you. I must also admit there are not quickCheck properties in the code. Maybe someone will want to write some. Christopher Skrzętnicki On Mon, Mar 10, 2008 at 9:36 AM, Neil Mitchell [EMAIL PROTECTED] wrote: Hi Can whoever picks this up please try the sort code from Yhc in their comparisons. In my benchmarks it ran up to twice as fast as the GHC code. http://darcs.haskell.org/yhc/src/packages/yhc-base-1.0/Data/List.hs I think what we really need is first quickCheck and timing framework for measuring sorts. After we have decided what makes one sort faster/better than another, then is the time to start deciding what sort is the best one. Ian did some initial work on this: http://www.haskell.org/pipermail/glasgow-haskell-users/2002-May/003376.html Until the sort-check package is uploaded to hackage I think most people will find it hard to be convinced of one sort over another. sorting.tar.gz Description: GNU Zip compressed data #! /usr/bin/python import os, time CLEAN_CMD = 'rm -f *.o *.hi *.bin' BUILD_CMD = 'ghc -main-is main%s --make sortbench.hs -o %s' BINARY_NAME = 'sortbench%s.bin' TYPES = ['OnSorted','OnRevsorted','OnRandom'] def clean(): os.system(CLEAN_CMD) def build( benchType ): binName = BINARY_NAME % (benchType,) os.system(BUILD_CMD % (benchType, binName)) def test( benchType ): print 'Benchmark type:', benchType clean() build(benchType) binName = BINARY_NAME % (benchType,) DATA_LENGTH = 100 # beware, 300 gives ~700Mb memory consumption for Int TEST_CASES = 10 dateNow = _.join(map(str,time.localtime()[:-3]))[2:] # ugly, but works output_log = 'sortbench%s_%s.txt' % (benchType,dateNow) start = time.time() for i in range(TEST_CASES): print %d/%d % (i+1,TEST_CASES) os.system('./%s %d 1/dev/null 2%s' % (binName, DATA_LENGTH, output_log) ) stop = time.time() print 'Total time:',stop-start dc = {} for line in open(output_log): (t,al) = eval(line) dc.setdefault( al, 0 ); dc[al] += t print Scaled vs best.: g = list(dc.iteritems()) g.sort(key = lambda (x,y) : y ) for el in g: print tuple([el[0],float(el[1]) / g[0][1]]) def main(): for t in TYPES: test(t) if __name__ == '__main__': main() {-# OPTIONS_GHC -O2 -fbang-patterns #-} module Main where import System.CPUTime import System.IO import System.Environment import System.Random import Data.List( partition, sort, unfoldr ) import Control.Parallel.Strategies import Control.Arrow import qualified Data.Map as Map -- functions to benchmark treeSort = concatMap (reverse . snd) . Map.toAscList . Map.fromListWith (++) . map (\x - (x,[x])) yhcSort :: (Ord a) = [a] - [a] yhcSort = sortByYhc compare sortByYhc cmp = mergeAll . sequences where sequences (a:b:xs) | a `cmp` b == GT = descending b [a] xs | otherwise = ascending b [a] xs sequences xs = [xs] descending a as (b:bs) | a `cmp` b == GT = descending b (a:as) bs descending a as bs = (a:as): sequences bs ascending a as (b:bs) | a `cmp` b /= GT = ascending b (a:as) bs ascending a as bs = rev as [a] : sequences bs rev (x:xs) ys = rev xs (x:ys) rev [] ys = ys mergeAll [x] = x mergeAll xs = mergeAll (mergePairs xs) mergePairs (a:b:xs) = merge a b: mergePairs xs mergePairs xs = xs merge as@(a:as') bs@(b:bs') | a `cmp` b == GT = b:merge as bs' | otherwise = a:merge as' bs merge [] bs = bs merge as [] = as -- begin benchmark making code makeBenchs benchs xs = do let (funcNames, funcs) = unzip benchs tBegin - getCPUTime timers - mapM (\f- print (f xs) getCPUTime) funcs let times = zipWith (-) timers (tBegin:timers)
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On 10 Mar 2008, at 4:00 AM, Adrian Hey wrote: Neil Mitchell wrote: 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. That would be wrong. Consider: data Foo = Foo Int Int instance Ord Foo where compare (Foo a _) (Foo b _) = compare a b I would consider such an Ord instance to be hopelessly broken, and I don't think it's the responsibility of authors of functions with Ord constraints in their sigs (such as sort) to consider such possibilities or specify and control the behaviour of their behaviour for such instances. Trying to do this is what leads to horrors such as the left biasing of Data.Map (for example). Data.Map is implicitly using such an Ord instance behind the scenes, and I think it has to to maintain its own invariants. If I take the `union' of two maps that take the same key to different values, I have to decide which value to use, even if every Ord instance supplied by my clients is 100% Adrian-compliant. jcc ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On 11 Mar 2008, at 12:27 pm, Krzysztof Skrzętnicki wrote: I've written little framework to work on. See sortbench.hs and sortbench.py attachments. Furthermore, I checked Yhc's implementation of sort: it is indeed very fast: I took his earlier code and plugged yhc's sort into it. Compiling with ghc -O2 using GHC 6.8.2, I found the yhc code (basically variant of natural merge) to be considerably slower than some of the alternatives. There is a pretty obvious way to speed up the YHC code which you would expect to provide nearly a factor of two speedup, and with the random integer data, it does. However, there is one simple but extremely important point which must be considered in evaluating a sorting routine: the library 'sort' function is, or should be, a *general-purpose* sort. It should be useful with any data type which is an instance of Ord or for which you can write a `cmp` function, and it should work at least as well with already-sorted input as with randomised input. quicksort (whose original reason for existence was to sort on a machine whose memory would disgrace today's wristwatches) is well known for doing deceptively well on randomised integer sequences. When I run Krzystztof's test harness (which I have currently brought up to 25 different sorting functions) with a list of the form [1..N] instead of a random list, suddenly all the variants of merge sort come out ahead of all the variants of quick sort. In fact his best version of quicksort, qsort_iv, comes out fully 1155 times slower than the YHC algorithm on a list of 10,000 ordered integers. That can be improved by spending a bit of effort on choosing a good pivot, but then the quicksort algorithms are no longer so competitive for randomised inputs. The classic Engineering a Quicksort paper by Bentley and McIlroy from Software : Practice Experience recommends a whole bunch of distribution shapes (one run, two runs, sawtooth, organ pipes, random, ...) that should be benchmarked before drawing too many conclusions. It is also wise to try more than one data type. How do the different algorithms compare on random samples from a Scrabble dictionary? (Why that particular question? Because I mean to try it.) Right now, I remain happy with merge sort, because it is never mysteriously several thousand times slower than expected. ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
2008/3/11, David Menendez [EMAIL PROTECTED]: I think Adrian is just arguing that a == b should imply f a == f b, for all definable f, in which case it doesn't *matter* which of two equal elements you choose, because there's no semantic difference. (Actually, it's probably not desirable to require it for *all* definable functions, since an implementation might define e.g. an unsafe function that does pointer comparisons. We'd probably also exclude functions using a private, internal interface that exposes implementation details.) I like that property, and it bugs me when I have to use a datatype whose Eq instance doesn't have it (either because (==) throws away information or because the type exposes non-semantic information). I completely agree that this propriety should be true for all Eq instance exported by a public module. I don't care if it is not the case in a isolated code, but libraries shouldn't break expected invariant (or at least be very cautious and warn the user). Even Eq Double respects this propriety as far as I know. Ord case is less evident, but assuming a propriety like (compare x y = EQ = x == y) seems like a reasonable guess. Doing it in a library (with a warning) doesn't seems all that bad to me. -- Jedaï ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Sunday 09 March 2008, Krzysztof Skrzętnicki wrote: Ok, I did some search and found Data.Map, which can be used to implement pretty fast sorting: import qualified Data.Map as Map treeSort :: Ord a = [a] - [a] treeSort = map (\(x,_) - x ) . Map.toAscList . Map.fromList . map (\x-(x,())) In fact It is likely to behave like sort, with the exception that it is 23% faster. I did not hovever check the memory consumption. It works well on random, sorted and reverse-sorted inputs, and the speed difference is always about the same. I belive I could take Data.Map and get datatype isomorphic to specialized *Data.Map a ()* of it, so that treeSort will became Map.toAscList . Map.fromList. This may also bring some speedup. What do you think about this particular function? Some thoughts: 1) To get your function specifically, you could just use Data.Set.Set a instead of Map a (). 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. 3) I imagine the time taken to get any output is always O(n*log n). Various lazy sorts can be written (and I'd guess the standard library sort is written this way, although I don't know for sure) such that 'head (sort l)' is O(n), or O(n + k*log n) for getting the first k elements. However, Map, being a balanced binary tree, doesn't (I think) have the right characteristics for this. At the very least, you'll probably want to test with a function that doesn't delete duplicate elements. Something like this: treeSort = concatMap (\(x,n) - replicate n x) . Map.toAscList . Map.fromListWith (+) . map (\x - (x,1)) -- Dan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
dan.doel: On Sunday 09 March 2008, Krzysztof Skrzętnicki wrote: Ok, I did some search and found Data.Map, which can be used to implement pretty fast sorting: import qualified Data.Map as Map treeSort :: Ord a = [a] - [a] treeSort = map (\(x,_) - x ) . Map.toAscList . Map.fromList . map (\x-(x,())) In fact It is likely to behave like sort, with the exception that it is 23% faster. I did not hovever check the memory consumption. It works well on random, sorted and reverse-sorted inputs, and the speed difference is always about the same. I belive I could take Data.Map and get datatype isomorphic to specialized *Data.Map a ()* of it, so that treeSort will became Map.toAscList . Map.fromList. This may also bring some speedup. What do you think about this particular function? Some thoughts: 1) To get your function specifically, you could just use Data.Set.Set a instead of Map a (). 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. And a little QuickCheck to help things along: import qualified Data.Map as Map import Data.List import Test.QuickCheck treeSort :: Ord a = [a] - [a] treeSort = map (\(x,_) - x ) . Map.toAscList . Map.fromList . map (\x-(x,())) main = quickCheck prop_sort prop_sort xs = sort xs == treeSort xs where _ = xs :: [Int] Running: $ runhaskell A.hs Falsifiable, after 11 tests: [-2,-2,5] ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe
Re: [Haskell-cafe] Re: (flawed?) benchmark : sort
On Sun, 2008-03-09 at 23:04 -0400, Dan Doel wrote: On Sunday 09 March 2008, Krzysztof Skrzętnicki wrote: What do you think about this particular function? Some thoughts: 1) To get your function specifically, you could just use Data.Set.Set a instead of Map a (). 2) What does it do with duplicate elements in the list? I expect it deletes them. To avoid this, you'd need to use something like fromListWith, keeping track of how many duplicates there are, and expanding at the end. 3) I imagine the time taken to get any output is always O(n*log n). Various lazy sorts can be written (and I'd guess the standard library sort is written this way, although I don't know for sure) such that 'head (sort l)' is O(n), or O(n + k*log n) for getting the first k elements. However, Map, being a balanced binary tree, doesn't (I think) have the right characteristics for this. Sounds to me like we should try a heap sort. As a data structure it should have similar constant factors to Data.Map (or .Set) but a heap is less ordered than a search tree and gives the O(n + k*log n) property. Duncan ___ Haskell-Cafe mailing list Haskell-Cafe@haskell.org http://www.haskell.org/mailman/listinfo/haskell-cafe