On Mon, Feb 01, 2010 at 11:35:15AM -0800, [email protected] wrote:
> On Mon, Feb 01, 2010 at 01:29:42PM -0600, Shawn Walker wrote:
> > On 02/ 1/10 01:26 PM, [email protected] wrote:
> > >On Mon, Feb 01, 2010 at 01:17:18PM -0600, Shawn Walker wrote:
> > >>    Why not use a dict instead here?
> > >
> > >Storing the objects in a list uses less space.  Since the requests are
> > >now unique when identified by a url and uuid tuple, it seemed obvious to
> > >store these as a tuple.  Can you explain what you think will be gained
> > >by using a dict here?  It didn't seem like an obvious choice to me.
> > 
> > A dict or a set would be fine, but checking for a key in a dict
> > should be faster than checking for a tuple pair in a list.  I don't
> > know about a set.
> 
> Ok.  I will benchmark different approaches before making a final
> decision.

According to my tests, set lookups are only slightly slower than dict
lookups, and both are roughly 2,000 times faster than scanning the list.
Since I don't really need any of the other features of a dictionary, I'm
going with the set, unless there are objections.

Here's what I came up with (total seconds for 10,000 lookups each):

Time to find entry in 15000 item list
6.14760088921
Time to not find entry in 15000 item list
5.86997008324
Time to find entry in 15000 item set
0.00315093994141
Time to not find entry in 15000 item set
0.00269889831543
Time to find entry in 15000 item dict
0.00259494781494
Time to not find entry in 15000 item dict
0.00237011909485
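
A compact way to reproduce the shape of these numbers is sketched below. It is
not the attached script: it passes a callable to Timer (supported since Python
2.6) instead of a setup import string, and the helper names make_entries() and
time_lookup() are made up for this example. The repeat count is lowered to 200
so the list case finishes quickly; absolute timings will differ by machine.

```python
import uuid
from timeit import Timer

def make_entries(count):
    """Build `count` unique (url, uuid) tuples, like the attached script does."""
    return [("http://www.ipkg-%d.com" % i, uuid.uuid4().int)
            for i in range(count)]

def time_lookup(container, entry, number=200):
    """Return total seconds for `number` membership tests of `entry`."""
    return Timer(lambda: entry in container).timeit(number)

entries = make_entries(15000)
entries.append(("http://upkg.sfbay", 12345678901234567890))
found = entries[-1]                                   # present, at the end
missing = ("http://ypkg.sfbay", 11345678901134567890) # never inserted

as_list = list(entries)
as_set = set(entries)

results = {}
for name, container in (("list", as_list), ("set", as_set)):
    results[name + " found"] = time_lookup(container, found)
    results[name + " not found"] = time_lookup(container, missing)

for name in ("list found", "list not found", "set found", "set not found"):
    print("%-15s %.6f" % (name, results[name]))
```

The list cases have to compare against every element, while the set cases hash
the tuple once, which is where the gap in the results comes from.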

Source is attached, for those who are curious.

-j
#!/usr/bin/python2.6

#
# Copyright 2010 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#

import uuid
from timeit import Timer

def test_list_found():
        """Test list found"""

        if entryf in list1:
                pass

def test_list_notfound():
        """Test list not found"""

        if entrynf in list1:
                pass

def test_set_found():
        """Test set found"""

        if entryf in set1:
                pass

def test_set_notfound():
        """Test set not found"""

        if entrynf in set1:
                pass

def test_dict_found():
        """Test if found in dictionary"""

        if fmrif in dict1:
                pass

def test_dict_notfound():
        """Test if not found in dictionary"""

        if fmrinf in dict1:
                pass

def gen_uniq_tup_fu(count):
        """Generate (fmri, uuid) tuples with unique fmris."""

        tup_list = []

        for i in range(count):
                fmristr = "http://www.ipkg-%d.com" % i
                uuid_val = uuid.uuid4().int
                tup_list.append((fmristr, uuid_val))

        return tup_list

def gen_dict_fu(count):
        """Generate dict[fmri] = [uuid]"""

        test_dict = {}

        for i in range(count):
                fmristr = "http://www.ipkg-%d.com" % i
                uuid_val = uuid.uuid4().int
                test_dict.setdefault(fmristr, []).append(uuid_val)

        return test_dict


if __name__ == '__main__':


        dict1 = gen_dict_fu(15000)

        list1 = gen_uniq_tup_fu(15000)
        list1.append(("http://upkg.sfbay", 12345678901234567890))

        set1 = set(list1)
        entryf = ("http://upkg.sfbay", 12345678901234567890)
        entrynf = ("http://ypkg.sfbay", 11345678901134567890)
        fmrif = "http://www.ipkg-124.com"
        fmrinf = "http://ipkg.com"

        t1 = Timer("test_list_found()",
            "from __main__ import entryf,list1,test_list_found")
        print "Time to find entry in 15000 item list"
        print t1.timeit(10000)

        t2 = Timer("test_list_notfound()",
            "from __main__ import entrynf,list1,test_list_notfound")
        print "Time to not find entry in 15000 item list"
        print t2.timeit(10000)

        t3 = Timer("test_set_found()",
            "from __main__ import entryf,set1,test_set_found")
        print "Time to find entry in 15000 item set"
        print t3.timeit(10000)

        t4 = Timer("test_set_notfound()",
            "from __main__ import entrynf,set1,test_set_notfound")
        print "Time to not find entry in 15000 item set"
        print t4.timeit(10000)

        t5 = Timer("test_dict_found()",
            "from __main__ import fmrif,dict1,test_dict_found")
        print "Time to find entry in 15000 item dict"
        print t5.timeit(10000)

        t6 = Timer("test_dict_notfound()",
            "from __main__ import fmrinf,dict1,test_dict_notfound")
        print "Time to not find entry in 15000 item dict"
        print t6.timeit(10000)

_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
