Helvin wrote:
Hi,

Sorry I did not want to bother the group, but I really do not
understand this seemingly trivial problem.
I am reading from a textfile, where each line has 2 values, with
spaces before and between the values.
I would like to read in these values, but of course, I don't want the
whitespaces between them.
I have looked at documentation, and how strings and lists work, but I
cannot understand the behaviour of the following:
    line = f.readline()
    line = line.lstrip() # take away whitespace at the beginning of the readline.

file.readline returns the line including its trailing newline character (which str.strip treats as whitespace), so you may want to use line.strip() instead of line.lstrip().
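
A small illustration of the difference. This is only a sketch: io.StringIO stands in for your text file, and the sample content is made up.

```python
import io

# Stand-in for the real text file; the content is invented for the example.
f = io.StringIO(u"   44      0.000000000\n")

line = f.readline()
left_only = line.lstrip()   # leading spaces removed, trailing newline kept
both_ends = line.strip()    # leading spaces and trailing newline removed

print(repr(left_only))
print(repr(both_ends))
```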

 list = line.split(' ')

Slightly OT, but: don't use the names of builtin types or functions as identifiers, since this shadows the builtin object.
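
To see what shadowing does in practice, here is a minimal sketch (the function name shadowing_demo and the sample string are just for illustration). The shadowing is kept inside a function so it doesn't leak into the rest of the session:

```python
def shadowing_demo():
    # The local name 'list' now hides the builtin type inside this function.
    list = "44 0.0".split()
    try:
        list("abc")            # tries to call a list object, not the type
    except TypeError as exc:
        return str(exc)
    return None

error = shadowing_demo()
print(error)

# A neutral name avoids the problem, and the builtin stays usable.
fields = "44 0.0".split()
print(list("abc"))
```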

Also, the default behaviour of str.split is to split on runs of whitespace and discard the delimiters, so no empty strings end up in the result. You'll get better results by not specifying a delimiter here:

>>> " a  a  a  a ".split(' ')
['', 'a', '', 'a', '', 'a', '', 'a', '']
>>> " a  a  a  a ".split()
['a', 'a', 'a', 'a']
>>>

 # the list has empty strings in it, so now, remove these empty strings

A problem you could have avoided right from the start !-)

 for item in list:
   if item is ' ':

Don't use identity comparison ('is') when you want to test for equality ('=='). It happens to kind of work in your example above, but only because CPython caches _some_ small strings, and you should _never_ rely on such implementation details. A string containing accented characters would not have been cached:
>>> s = 'ééé'
>>> s is 'ééé'
False
>>>
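
A minimal sketch of the safe way to compare (the variable names are only for illustration). Whether the two names end up bound to the same object depends on the interpreter, so the example deliberately relies on '==' alone:

```python
expected = "0.000000000"
# Built at runtime, so it may well be a separate object with the same content.
built = "".join(["0.", "000000000"])

# '==' compares the contents, which is what you want here.
print(built == expected)

# 'built is expected' would ask whether both names refer to the very same
# object; the answer depends on the interpreter's caching, so don't test it.
```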


Also, this is surely not your actual code: ' ' is not an empty string, it's a string containing a single space character. The empty string is ''. And FWIW, empty strings (like most empty sequences and collections, all numerical zeros, and the None object) are false in a boolean context, so you can just test the string directly:

for s in ['', 0, 0.0, [], {}, (), None]:
   if not s:
      print "'%s' is empty, so it's false" % str(s)


        print 'discard these: ',item
        index = list.index(item)
        del list[index]         # remove this item from the list

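And then you do have a big problem: the internal index used by the iterator is no longer in sync with the list. Deleting the item at position i moves the following item into position i, but the iterator goes on to position i + 1, so the next iteration skips one item.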

As a general rule: *don't* add elements to or remove elements from a sequence while iterating over it. If you really need to modify the sequence while iterating over it, do a reverse iteration, but there are usually better solutions.
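
For the record, a minimal sketch of the reverse iteration mentioned above, using the list from your post (minus the trailing newline):

```python
alist = ['44', '', '', '', '', '', '0.000000000']

# Walk the indexes from the end: deleting an item only shifts the items
# that come after it, and those have already been visited.
for i in range(len(alist) - 1, -1, -1):
    if not alist[i]:
        del alist[i]

print(alist)
```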

   else:
        print 'keep this: ',item
The problem is,

Make it a plural - there's more than 1 problem here !-)

when my list is :  ['44', '', '', '', '', '',
'0.000000000\n']
The output is:
    len of list:  7
    keep this:  44
    discard these:
    discard these:
    discard these:
So finally the list is:   ['44', '', '', '0.000000000\n']
The code above removes all the empty strings in the middle, all except
two. My code seems to miss two of the empty strings.

Would you know why this is occurring?


cf above... and below:

>>> alist = ['44', '', '', '', '', '', '0.000000000']
>>> for i, it in enumerate(alist):
...     print 'i : %s -  it : "%s"' % (i, it)
...     if not it:
...         del alist[i]
...     print "alist is now %s" % alist
...
i : 0 -  it : "44"
alist is now ['44', '', '', '', '', '', '0.000000000']
i : 1 -  it : ""
alist is now ['44', '', '', '', '', '0.000000000']
i : 2 -  it : ""
alist is now ['44', '', '', '', '0.000000000']
i : 3 -  it : ""
alist is now ['44', '', '', '0.000000000']
>>>


Ok, now for practical answers:

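1/ In the above case, use line.strip().split() and the problem goes away !-) (In fact split() with no argument already ignores leading and trailing whitespace, so line.split() alone is enough.)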
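
Put together, reading the two values from each line could look like the sketch below. Assumptions on my side: the first value is an integer and the second a float (your post doesn't say), io.StringIO stands in for your file, and the second sample line is invented.

```python
import io

# Stand-in for the real file; the second line is made up for the example.
f = io.StringIO(u"   44      0.000000000\n   45      1.500000000\n")

pairs = []
for line in f:
    fields = line.split()      # no argument: splits on runs of whitespace
    if len(fields) != 2:
        continue               # skip blank or malformed lines
    # Assumed types: integer first, float second.
    pairs.append((int(fields[0]), float(fields[1])))

print(pairs)
```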

2/ as a general rule, if you need to filter a sequence, don't try to do it in place (unless it's a *very* big sequence and you run into memory problems but then there are probably better solutions).

The common idioms for filtering a sequence are:

* filter(predicate, sequence):

the 'predicate' param is a callback function which takes an item from the sequence and returns a boolean value (True to keep the item, False to discard it). The following example will filter out even integers:

def is_odd(n):
   return n % 2

alist = range(10)
odds = filter(is_odd, alist)
print alist
print odds

Alternatively, filter() can take None as its first param, in which case it will filter out items that have a false value in a boolean context, i.e.:

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = filter(None, alist)
print result


* list comprehensions

Here you directly build the result list:

alist = range(10)
odds = [n for n in alist if n % 2]

alist = ['', 'a', 0, 1, [], [1], None, object, False, True]
result = [item for item in alist if item]
print result
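
Applied to the list from your post, both idioms could be sketched as follows. The list() call around filter() is harmless on Python 2 and needed on Python 3, where filter() returns an iterator instead of a list; the strip() call is my addition to get rid of the trailing newline.

```python
fields = ['44', '', '', '', '', '', '0.000000000\n']

# filter(None, ...) keeps only the items that are true in a boolean context.
with_filter = list(filter(None, fields))

# A list comprehension can filter and clean up in one pass.
cleaned = [item.strip() for item in fields if item.strip()]

print(with_filter)
print(cleaned)
```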



HTH
--
http://mail.python.org/mailman/listinfo/python-list
