Hi,
Here is the full code that classifies the data read from the files.

        ## Classify the users
        table = {}
        table[('', '')] = 0
        for x in horizontal_criterias:
            table[(x['id'], '')] = 0
        for y in vertical_criterias:
            table[('', y['id'])] = 0
        for x in horizontal_criterias:
            x = x['id']
            for y in vertical_criterias:
                table[(x, y['id'])] = 0

        for brain in brains:
            x = getattr(brain, horizontal)
            if isinstance(x, list):
                for item in x:
                    x = item
            else:
                x
            y = getattr(brain, vertical)
            if isinstance(y, list):
                for item in y:
                    y = item
            else:
                y
            if x and y and (x, y) in table:
                table[(x, y)] += 1
                table[(x, '')] += 1
                table[('', y)] += 1
                table[('', '')] += 1


table is a dictionary which, for example, contains:

{   ('', ''): 1,
    ('', 'fr'): 0,
    ('', 'uk'): 1,
    ('', 'us'): 0,
    ('airport-car-parking', ''): 2,
    ('airport-car-parking', 'fr'): 0,
    ('airport-car-parking', 'uk'): 2,
    ('airport-car-parking', 'us'): 0,
    ('air-taxi-operators', ''): 1,
    ('air-taxi-operators', 'fr'): 0,
    ('air-taxi-operators', 'uk'): 1,
    ('air-taxi-operators', 'us'): 0,
     ...
    ('worldwide-attractions-and-ticket-agents', ''): 0,
    ('worldwide-attractions-and-ticket-agents', 'fr'): 0,
    ('worldwide-attractions-and-ticket-agents', 'uk'): 0,
    ('worldwide-attractions-and-ticket-agents', 'us'): 0}


The output is something like:


country |airport-car|air-taxi-operators|airlines-schedule|total
---------------------------------------------------------------
france  |0          |0                 |0                |0
uk      |2          |0                 |0                |2
us      |0          |0                 |0                |0
---------------------------------------------------------------
total   |2          |0                 |0                |2
---------------------------------------------------------------

What I can't figure out is how to do a cumulative sum over each record. For example, my files contain:

file1
<topic>airport-car air-taxi-operators</topic>

file2
<topic>airport-car air-taxi-operators airlines-schedule</topic>

etc...


If I add a print to this code, to see what is listed:

            if isinstance(x, list):
                for item in x:
                    x = item
                    pp.pprint(x)
            else:

I get:

u'airport-car-parking'
u'air-taxi-operators'
u'airport-car-parking'
u'airlines-scheduled'
u'air-taxi-operators'

This is correct, but the table only counts the first item of each list.

Ideally my table should be:

country |airport-car|air-taxi-operators|airlines-schedule|total
---------------------------------------------------------------
france  |0          |0                 |0                |0
uk      |2          |2                 |1                |2
us      |0          |0                 |0                |0
---------------------------------------------------------------
total   |2          |2                 |1                |2
---------------------------------------------------------------
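
A minimal sketch of one way to get counts like these, assuming each record may carry either a single value or a list for the horizontal and vertical attributes. Record, as_list, build_table and classify are hypothetical names, and the sample records (including 'uk' as the country for both files) are assumptions for illustration, not data from the real catalog. Each item is counted in its own cell and column total, while the row total and the grand total go up once per record, which is what the table above suggests:

```python
class Record(object):
    """Hypothetical stand-in for a catalog brain."""
    def __init__(self, topic, country):
        self.topic = topic
        self.country = country


def as_list(value):
    """Return value as a list so single values and lists are handled alike."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value] if value else []


def build_table(x_ids, y_ids):
    """Create a zeroed table, including the '' keys used for the totals."""
    table = {}
    for x in [''] + list(x_ids):
        for y in [''] + list(y_ids):
            table[(x, y)] = 0
    return table


def classify(records, horizontal, vertical, table):
    """Count every known (x, y) combination of each record in table."""
    for record in records:
        # Drop empty values so they cannot collide with the '' total keys.
        xs = set(v for v in as_list(getattr(record, horizontal)) if v)
        ys = set(v for v in as_list(getattr(record, vertical)) if v)
        pairs = [(x, y) for x in xs for y in ys if (x, y) in table]
        if not pairs:
            continue
        for x, y in pairs:
            table[(x, y)] += 1
        # Column and row totals count each record at most once.
        for x in set(x for x, _ in pairs):
            table[(x, '')] += 1
        for y in set(y for _, y in pairs):
            table[('', y)] += 1
        table[('', '')] += 1
    return table


# Assumed sample data modelled on the two files described above.
records = [
    Record(['airport-car-parking', 'air-taxi-operators'], 'uk'),
    Record(['airport-car-parking', 'airlines-scheduled',
            'air-taxi-operators'], 'uk'),
]
table = build_table(
    ['airport-car-parking', 'air-taxi-operators', 'airlines-scheduled'],
    ['fr', 'uk', 'us'])
classify(records, 'topic', 'country', table)
print(table[('air-taxi-operators', 'uk')])  # 2
print(table[('', 'uk')])                    # 2
```

The key difference from the loop above is that no item is thrown away: instead of reassigning x inside the for loop, every item in the list takes part in the count.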


Cheers

Norman

Martin Walsh wrote:
Hi Norman,

Norman Khine wrote:
        for brain in brains:
            x = getattr(brain, horizontal)
            x = string.join(x, '' )
            y = getattr(brain, vertical)
            y = string.join(y, '' )
            if x and y and (x, y) in table:
                table[(x, y)] += 1
                table[(x, '')] += 1
                table[('', y)] += 1
                table[('', '')] += 1

For what it's worth, string.join has been deprecated since the addition
of the join method for str and unicode types. Other deprecated string
module functions are documented here: http://docs.python.org/lib/node42.html

If I'm not mistaken, the conventional form would be:

  x = ''.join(x)
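
As a small illustration of what join does here (the variable names and the two-key table are hypothetical, the topic values are borrowed from this thread): an empty separator glues all items together, so a one-item list gives back that item, while a longer list gives a string that matches no key.

```python
# Sample values borrowed from the thread; the names are hypothetical.
one_topic = ['airport-car-parking']
two_topics = ['airport-car-parking', 'air-taxi-operators']

joined_one = ''.join(one_topic)
joined_two = ''.join(two_topics)

print(joined_one)  # airport-car-parking
print(joined_two)  # airport-car-parkingair-taxi-operators

# A cut-down table with only two keys, for the membership check.
table = {('airport-car-parking', 'uk'): 0,
         ('air-taxi-operators', 'uk'): 0}
print((joined_one, 'uk') in table)  # True
print((joined_two, 'uk') in table)  # False
```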

So now my list becomes a string, which is not really good for me, as
this fails when there is more than one item.

Is there a better way to loop through this and sum up all occurrences of
each entry, i.e. 'airport-car-parking'?

Maybe, can you show us a brief excerpt of what 'table' might look like
before the loop, and what you expect it to look like after one
iteration, with data samples for both x and y?

Most likely it's just me, but I'm having trouble reconciling your code
examples with your questions. AFAICT, either you want more than just a
simple count of occurrences from your data set, or you have some
confusion regarding dictionaries (if 'table' is a dictionary, of course).

If you want a count of each unique occurrence in a list -- not sure if
it's better, but something like this might get you started (untested):

x = ['airlines-scheduled', 'airport-car-parking',
     'more-than-100ml', 'do-not-bring-toothpaste',
     'airport-car-parking', 'airlines-scheduled']

# Count how often each distinct item occurs; the built-in set
# (Python 2.4+) replaces the deprecated sets.Set.
entity_count = dict((item, x.count(item)) for item in set(x))
print(entity_count['airlines-scheduled'])
# 2
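
If a newer Python is available (2.7 or later, which is an assumption about your setup), collections.Counter does the same kind of counting in a single pass over the list. A short sketch, with topics as a hypothetical name for the same sample list:

```python
from collections import Counter

topics = ['airlines-scheduled', 'airport-car-parking',
          'more-than-100ml', 'do-not-bring-toothpaste',
          'airport-car-parking', 'airlines-scheduled']

# Counter maps each distinct item to its number of occurrences.
entity_count = Counter(topics)
print(entity_count['airlines-scheduled'])  # 2
print(entity_count['not-in-the-list'])     # 0
```

A Counter returns 0 for a missing key instead of raising KeyError, which can save an explicit membership test.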

HTH,
Marty

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
