Helvin wrote:
Hi,
Sorry I did not want to bother the group, but I really do not
understand this seeming trivial problem.
I am reading from a textfile, where each line has 2 values, with
spaces before and between the values.
I would like to read in these values, but of course, I don't want the
whitespaces between them.
I have looked at documentation, and how strings and lists work, but I
cannot understand the behaviour of the following:
line = f.readline()
line = line.lstrip() # take away whitespace at the beginning of the readline.
file.readline returns the line with its trailing newline character (which
the str.strip method treats as whitespace), so you may want to use
line.strip() instead of line.lstrip()
list = line.split(' ')
Slightly OT, but: don't use the names of builtin types or functions as
identifiers - this shadows the builtin object.
Also, the default behaviour of str.split is to split on runs of whitespace
and discard the empty strings. You would get better results by not
specifying a delimiter here:
>>> " a a a a ".split(' ')
['', 'a', '', 'a', '', 'a', '', 'a', '']
>>> " a a a a ".split()
['a', 'a', 'a', 'a']
>>>
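Applied to a line shaped like the one in the question, this could look like the minimal sketch below. It is written for Python 3, and the helper name parse_line and the sample text are assumptions made for illustration, not the poster's actual code or data:

```python
def parse_line(line):
    """Return the whitespace-separated fields of a line, without empty strings."""
    # split() with no argument splits on runs of whitespace and ignores
    # leading and trailing whitespace, including the newline kept by readline().
    return line.split()


sample = "   44     0.000000000\n"  # hypothetical line from the text file
fields = parse_line(sample)
print(fields)  # ['44', '0.000000000']
```

Because split() already ignores surrounding whitespace, the explicit strip() is optional here, though it does no harm.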
# the list has empty strings in it, so now, remove these empty strings
A problem you could have avoided right from the start !-)
for item in list:
    if item is ' ':
Don't use identity comparison when you want to test for equality. It
happens to kind of work in your above example but only because CPython
implements a cache for _some_ small strings, but you should _never_ rely
on such implementation details. A string containing accented characters
would not have been cached:
>>> s = 'ééé'
>>> s is 'ééé'
False
>>>
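A small sketch of the difference, written for Python 3. The helper is_empty and the sample values are hypothetical; the point is only that == compares contents while 'is' compares object identity:

```python
# Build an equal string at run time, so it need not be the same object
# as the literal below.
built = "".join(["4", "4"])
literal = "44"

# Equality compares the characters, which is what the original test wanted.
print(built == literal)  # True

# 'built is literal' would ask whether both names refer to the very same
# object; the answer depends on interpreter caching, so it is not relied on.


def is_empty(field):
    # Hypothetical helper: an equality test against the empty string.
    return field == ""
```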
Also, this is surely not your actual code : ' ' is not an empty string,
it's a string with a single space character. The empty string is ''. And
FWIW, empty strings (like most empty sequences and collections, all
numerical zeros, and the None object) have a false value in a boolean
context, so you can just test the string directly:
for s in ['', 0, 0.0, [], {}, (), None]:
    if not s:
        print "'%s' is empty, so it's false" % str(s)
        print 'discard these: ',item
        index = list.index(item)
        del list[index] # remove this item from the list
And then you do have a big problem : the internal pointer used by the
iterator is not in sync with the list anymore, so the next iteration
will skip one item.
As a general rule: *don't* add or remove elements to/from a sequence while
iterating over it. If you really need to modify the sequence while
iterating over it, do a reverse iteration - but there are usually better
solutions.
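For the rare case where in-place removal really is needed, the reverse iteration mentioned above can be sketched as follows (Python 3; the function name remove_empty_in_place is an assumption made for illustration):

```python
def remove_empty_in_place(items):
    """Delete false-valued items from the list itself, walking backwards."""
    # Going from the last index down to 0 means a deletion only shifts
    # elements that have already been visited, so nothing is skipped.
    for index in range(len(items) - 1, -1, -1):
        if not items[index]:
            del items[index]
    return items


fields = ['44', '', '', '', '', '', '0.000000000']
remove_empty_in_place(fields)
print(fields)  # ['44', '0.000000000']
```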
    else:
        print 'keep this: ',item
The problem is,
Make it a plural - there's more than 1 problem here !-)
when my list is : ['44', '', '', '', '', '', '0.000000000\n']
The output is:
len of list: 7
keep this: 44
discard these:
discard these:
discard these:
So finally the list is: ['44', '', '', '0.000000000\n']
The code above removes all the empty strings in the middle, all except
two. My code seems to miss two of the empty strings.
Would you know why this is occurring?
cf above... and below:
>>> alist = ['44', '', '', '', '', '', '0.000000000']
>>> for i, it in enumerate(alist):
...     print 'i : %s - it : "%s"' % (i, it)
...     if not it:
...         del alist[i]
...     print "alist is now %s" % alist
...
i : 0 - it : "44"
alist is now ['44', '', '', '', '', '', '0.000000000']
i : 1 - it : ""
alist is now ['44', '', '', '', '', '0.000000000']
i : 2 - it : ""
alist is now ['44', '', '', '', '0.000000000']
i : 3 - it : ""
alist is now ['44', '', '', '0.000000000']
>>>
Ok, now for practical answers:
1/ in the above case, use line.strip().split(), you'll have no more
problem !-)
2/ as a general rule, if you need to filter a sequence, don't try to do
it in place (unless it's a *very* big sequence and you run into memory
problems but then there are probably better solutions).
The common idioms for filtering a sequence are:
* filter(predicate, sequence):
the 'predicate' param is a callback function which takes an item from the
sequence and returns a value tested for truth (true to keep the item, false
to discard it). The following example will filter out even integers:
def is_odd(n):
    return n % 2

alist = range(10)
odds = filter(is_odd, alist)
print alist
print odds
Alternatively, filter() can take None as its first param, in which case
it will filter out items that have a false value in a boolean context, i.e.:
alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = filter(None, alist)
print result
* list comprehensions
Here you directly build the result list:
alist = range(10)
odds = [n for n in alist if n % 2]
alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = [item for item in alist if item]
print result
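Both idioms applied to the list from the question, as a sketch written for Python 3. One difference from the Python 2 snippets above: on Python 3, filter() returns a lazy iterator, so it is wrapped in list() here. The variable names are chosen for illustration:

```python
fields = ['44', '', '', '', '', '', '0.000000000']

# filter(None, ...) keeps only the items that are true in a boolean context.
with_filter = list(filter(None, fields))

# The list comprehension builds the same result directly.
with_comprehension = [field for field in fields if field]

print(with_filter)         # ['44', '0.000000000']
print(with_comprehension)  # ['44', '0.000000000']
```

Neither version touches the original list, which is what makes them safer than deleting while iterating.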
HTH
--
http://mail.python.org/mailman/listinfo/python-list