Hi,
Here is the 'full' code that gets the data from the files.
## Classify the users
table = {}
table[('', '')] = 0
for x in horizontal_criterias:
    table[(x['id'], '')] = 0
for y in vertical_criterias:
    table[('', y['id'])] = 0
for x in horizontal_criterias:
    x = x['id']
    for y in vertical_criterias:
        table[(x, y['id'])] = 0
for brain in brains:
    x = getattr(brain, horizontal)
    if isinstance(x, list):
        for item in x:
            x = item
    else:
        x
    y = getattr(brain, vertical)
    if isinstance(y, list):
        for item in y:
            y = item
    else:
        y
    if x and y and (x, y) in table:
        table[(x, y)] += 1
        table[(x, '')] += 1
        table[('', y)] += 1
        table[('', '')] += 1
table is a dictionary which contains, for example:
{('', ''): 1,
 ('', 'fr'): 0,
 ('', 'uk'): 1,
 ('', 'us'): 0,
 ('airport-car-parking', ''): 2,
 ('airport-car-parking', 'fr'): 0,
 ('airport-car-parking', 'uk'): 2,
 ('airport-car-parking', 'us'): 0,
 ('air-taxi-operators', ''): 1,
 ('air-taxi-operators', 'fr'): 0,
 ('air-taxi-operators', 'uk'): 1,
 ('air-taxi-operators', 'us'): 0,
 ...
 ('worldwide-attractions-and-ticket-agents', ''): 0,
 ('worldwide-attractions-and-ticket-agents', 'fr'): 0,
 ('worldwide-attractions-and-ticket-agents', 'uk'): 0,
 ('worldwide-attractions-and-ticket-agents', 'us'): 0}
The output is something like:
country |airport-car|air-taxi-operators|airlines-schedule| total
---------------------------------------------------------------
france  |0          |0                 |0                |0
uk      |2          |0                 |0                |2
us      |0          |0                 |0                |0
---------------------------------------------------------------
total   |2          |0                 |0                |2
---------------------------------------------------------------
What I can't seem to figure out is how to do a cumulative sum for each
record, for example, my files contain:
file1
<topic>airport-car air-taxi-operators</topic>
file2
<topic>airport-car air-taxi-operators airlines-schedule</topic>
etc...
If I add a print to this part of the code, to see what is listed:
if isinstance(x, list):
    for item in x:
        x = item
        pp.pprint(x)
else:
    x
I get:
u'airport-car-parking'
u'air-taxi-operators'
u'airport-car-parking'
u'airlines-scheduled'
u'air-taxi-operators'
Which is correct, but the table only counts the first item of the tuple.
Ideally my table should be:
country |airport-car|air-taxi-operators|airlines-schedule| total
---------------------------------------------------------------
france  |0          |0                 |0                |0
uk      |2          |2                 |1                |2
us      |0          |0                 |0                |0
---------------------------------------------------------------
total   |2          |2                 |1                |2
---------------------------------------------------------------
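One way to arrive at the table above is to count every item in the list rather than overwriting x on each pass through the loop. The following is only a minimal sketch under assumptions: the helper names as_list and cross_tabulate are hypothetical, and the records list is a stand-in for the (getattr(brain, horizontal), getattr(brain, vertical)) pairs, built from the two sample files shown earlier. Each record is counted once per cell, once per row and column total, and once in the grand total, which is what the desired totals of 2 imply.

```python
from itertools import product


def as_list(value):
    """Wrap a single value in a list; drop empty values."""
    if isinstance(value, (list, tuple)):
        return [item for item in value if item]
    return [value] if value else []


def cross_tabulate(records, x_ids, y_ids):
    """Count records per (x, y) cell; '' marks the totals row/column."""
    table = {}
    for x, y in product([''] + list(x_ids), [''] + list(y_ids)):
        table[(x, y)] = 0
    for raw_x, raw_y in records:
        # De-duplicate so a record counts once per cell; keep known ids only.
        xs = [x for x in set(as_list(raw_x)) if x in x_ids]
        ys = [y for y in set(as_list(raw_y)) if y in y_ids]
        if not xs or not ys:
            continue
        for x, y in product(xs, ys):
            table[(x, y)] += 1
        for x in xs:
            table[(x, '')] += 1  # total for this horizontal value
        for y in ys:
            table[('', y)] += 1  # total for this vertical value
        table[('', '')] += 1     # grand total: one per counted record
    return table


# Hypothetical stand-ins for the values read from each brain.
records = [
    (['airport-car', 'air-taxi-operators'], 'uk'),
    (['airport-car', 'air-taxi-operators', 'airlines-schedule'], 'uk'),
]
topics = ['airport-car', 'air-taxi-operators', 'airlines-schedule']
countries = ['fr', 'uk', 'us']
table = cross_tabulate(records, topics, countries)
print(table[('airlines-schedule', 'uk')])
```

Whether the totals should count records (as here) or individual (x, y) pairs is a design choice; counting pairs would give a uk total of 5 instead of 2.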
Cheers
Norman
Martin Walsh wrote:
Hi Norman,
Norman Khine wrote:
for brain in brains:
    x = getattr(brain, horizontal)
    x = string.join(x, '' )
    y = getattr(brain, vertical)
    y = string.join(y, '' )
    if x and y and (x, y) in table:
        table[(x, y)] += 1
        table[(x, '')] += 1
        table[('', y)] += 1
        table[('', '')] += 1
For what it's worth, string.join has been deprecated since the addition
of the join method for str and unicode types. Other deprecated string
module functions are documented here: http://docs.python.org/lib/node42.html
If I'm not mistaken, the conventional form would be:
x = ''.join(x)
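As a small illustration of what joining with an empty separator does, here is a sketch using made-up sample values (the two-entry table is hypothetical, not the real one). A one-item list joins to the item itself, but a multi-item list collapses into one concatenated string that matches no key in the table.

```python
# Hypothetical two-cell table, just enough to show the key lookup.
table = {('airport-car-parking', 'uk'): 0, ('air-taxi-operators', 'uk'): 0}

single = ''.join(['airport-car-parking'])
several = ''.join(['airport-car-parking', 'air-taxi-operators'])

print(single)                    # airport-car-parking
print(several)                   # airport-car-parkingair-taxi-operators
print((single, 'uk') in table)   # True
print((several, 'uk') in table)  # False
```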
So now my list becomes a string, which is not really good for me, as
this fails when there is more than one item.
Is there a better way to loop through this and sum up all occurrences of
each entry, e.g. 'airport-car-parking'?
Maybe, can you show us a brief excerpt of what 'table' might look like
before the loop, and what you expect it to look like after one
iteration, with data samples for both x and y?
Most likely it's just me, but I'm having trouble reconciling your code
examples with your questions. AFAICT, either you want more than just a
simple count of occurrences from your data set, or you have some
confusion regarding dictionaries (if 'table' is a dictionary, of course).
If you want a count of each unique occurrence in a list -- not sure if
it's better, but something like this might get you started (untested):
x = ['airlines-scheduled', 'airport-car-parking',
     'more-than-100ml', 'do-not-bring-toothpaste',
     'airport-car-parking', 'airlines-scheduled']
# The built-in set (Python 2.4+) replaces the deprecated sets.Set.
entity_count = dict((item, x.count(item)) for item in set(x))
print(entity_count['airlines-scheduled'])
# 2
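An alternative sketch, assuming Python 2.7 or later is available: collections.Counter tallies the list in a single pass, whereas calling x.count(item) once per unique item rescans the list each time. The sample list is the same one used above.

```python
from collections import Counter

x = ['airlines-scheduled', 'airport-car-parking',
     'more-than-100ml', 'do-not-bring-toothpaste',
     'airport-car-parking', 'airlines-scheduled']

# Counter maps each distinct item to its number of occurrences.
entity_count = Counter(x)
print(entity_count['airlines-scheduled'])
# 2
```

A Counter returns 0 for items it has never seen, so looking up an unknown topic does not raise KeyError.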
HTH,
Marty
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor