Tony Plate wrote:
>
> AFAIK identical() first introduced by Chambers in "Programming with
> data"? On p262 he writes:
>
> identical: The two objects must be exactly equal in all respects; if not
> identical returns FALSE
> all.equal: The two objects are expected to be identical up to small
> differences that might be considered irrelevant...
>
> Taken literally, this would seem to argue against identical() treating
> attributes as a set (unless one were to tighten up the definition of
> attributes in Section 2.2 of the R Language Definition to explicitly state
> that attributes are to be treated as an unordered set).

We're certainly down in the fine points here, so arguments either way
aren't very strong, but on the whole it seems cleaner to keep identical
on the pedantic side, dealing with what's actually in the object, rather
than what was "meant".

Yes, for practical purposes attribute order better NOT matter, but we do
store the attributes in a way that creates an "order", i.e., as an
internal vector or list structure rather than, say, a hash table.

>
> However, given the primary use of identical() on complex objects is in
> software testing, and AFAIK no software depends on the order of attributes,
> I still think it would be reasonable for attributes to be treated as a set
> by identical(). (Unless anyone can show that it's important to recognize
> order of attributes in some code.)

Treating attributes as a set would have some logical appeal, but it seems
likely the fix would have to be more widespread than just to
identical(). Otherwise, for example, you could find yourself in a
situation where:
  identical(x,y)
was TRUE but
  identical(attributes(x), attributes(y))
was FALSE, because attributes() just reported out the attributes in
their (irrelevant) stored order.

>
> I'm proposing a more general fix for this problem because I strongly
> suspect that factor subsetting is not the only thing that can change the
> order of attributes, and because I've wasted many hours tracking down
> problems that turned out to be caused by problems with data.dump() and
> identical() in S-plus. Another possible fix might be for the attr() and
> attributes() replacement functions to store attributes as a sorted list. I
> don't know if this would be easy or difficult to implement, or what
> consequences it might have in terms of existing tests that involve printed
> output of attributes.

Yes, as above, it does seem that a satisfactory solution would require
treating attributes() as something other than a vector, returned in
internal order.
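
To make the attribute-order point above concrete, here is a minimal sketch in R. The objects x and y, the attribute names "a" and "b", and the helper same_attribute_set() are hypothetical, introduced only for illustration. The helper is one possible way to compare attributes as an unordered set; it is not how identical() itself is implemented.

```r
# Two vectors carrying the same attributes, assigned in opposite order.
# x, y and the attribute names "a" and "b" are made up for illustration.
x <- 1:3
attr(x, "a") <- "first"
attr(x, "b") <- "second"

y <- 1:3
attr(y, "b") <- "second"
attr(y, "a") <- "first"

# attributes() returns a named list in stored order, so comparing its
# results directly is sensitive to the order of assignment.
ordered_match <- identical(attributes(x), attributes(y))

# Hypothetical helper: treat attributes as an unordered set by sorting
# both attribute lists by name before comparing them.
same_attribute_set <- function(x, y) {
  ax <- attributes(x)
  ay <- attributes(y)
  if (length(ax) != length(ay)) return(FALSE)
  identical(ax[order(names(ax))], ay[order(names(ay))])
}

set_match <- same_attribute_set(x, y)
```

With these definitions, ordered_match is FALSE while set_match is TRUE: the two objects carry the same attributes, and only the stored order differs.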

Once started down this path, there are a number of other cases where a
vector has been used, for convenience, when an unordered set was the more
likely model. I think there have been debates over whether the order of
the levels of an unordered factor should be considered relevant.

It would increase consistency to replace vectors in these examples with
an efficient structure that only depended on the set of values
(presumably a suitable hashing mechanism would do). But it's not too
likely to get to the head of the priority queue, I'd guess.

It's not out of the question, as an alternative that doesn't require deep
changes to the system, to write methods for identical() for some classes
of objects.

>
> -- Tony Plate
>
> At Tuesday 09:13 AM 4/20/2004, Prof Brian Ripley wrote:
> >I wondered that, but I think we need to hear from the author of
> >identical().
> >
> >It is neater to have attributes printed in a consistent order, though.
> >
> >On Tue, 20 Apr 2004, Tony Plate wrote:
> >
> > > What about changing identical() to ignore the order of attributes? Is
> > > there any code anywhere that depends on the order of attributes, other
> > > than identical()? I've only seen attributes treated as an unordered
> > > set, and never as an ordered list. There are some functions in S-plus
> > > that change the order of attributes, and the only thing this affects is
> > > identical(). (Which in S-plus also pays attention to the order of
> > > attributes.)
> > >
> > > -- Tony Plate
> > >
> > > At Tuesday 05:42 AM 4/20/2004, [EMAIL PROTECTED] wrote:
> > > >"Swinton, Jonathan" <[EMAIL PROTECTED]> writes:
> > > >
> > > > > # works as expected
> > > > > > ac <- c('A','B');
> > > > > > identical(ac,ac[1:2])
> > > > > [1] TRUE
> > > > >
> > > > > #but
> > > > > > af <- factor(ac)
> > > > > > identical(af,af[1:2])
> > > > > [1] FALSE
> > > > >
> > > > > Any opinions?
> > > >
> > > >Did a cross-check with Splus and it doesn't do that, so I think it
> > > >qualifies as a bug.
> > > >Shouldn't be too hard to fix (might lose a little
> > > >efficiency though).
> > > >
> > > >--
> > > >   O__  ---- Peter Dalgaard             Blegdamsvej 3
> > > >  c/ /'_ --- Dept. of Biostatistics     2200 Cph. N
> > > > (*) \(*) -- University of Copenhagen   Denmark      Ph: (+45) 35327918
> > > >~~~~~~~~~~ - ([EMAIL PROTECTED])             FAX: (+45) 35327907
> > > >
> > > >______________________________________________
> > > >[EMAIL PROTECTED] mailing list
> > > >https://www.stat.math.ethz.ch/mailman/listinfo/r-devel
> >
> >--
> >Brian D. Ripley,                  [EMAIL PROTECTED]
> >Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> >University of Oxford,             Tel:  +44 1865 272861 (self)
> >1 South Parks Road,                     +44 1865 272866 (PA)
> >Oxford OX1 3TG, UK                Fax:  +44 1865 272595

--
John M. Chambers                  [EMAIL PROTECTED]
Bell Labs, Lucent Technologies    office: (908)582-2681
700 Mountain Avenue, Room 2C-282  fax:    (908)582-3340
Murray Hill, NJ  07974            web: http://www.cs.bell-labs.com/~jmc