Re: split an iteration

2005-04-01 Thread Peter Otten
Robin Becker wrote:

> eg for
> >>> e = enumerate([0,1,2,3,4,5])
> >>> for i,a in e:
> ...     if a==3: break
> ...
> >>> for i,a in e:
> ...     print i,a
> ...
> 4 4
> 5 5
>
> I think the second loop needs to start at 3, i.e. the split needs to have
> start, limit semantics.
>
> It would be nice to be able to fix it with a move back method.

I have to reread your previous post, it seems. Meanwhile:

>>> import itertools
>>> e = enumerate(range(6))
>>> for i, a in e:
...     if a == 3:
...         for i, a in itertools.chain([(i, a)], e):
...             print i, a
...         break
...
3 3
4 4
5 5

Nesting the loops is necessary to handle empty lists and lists with no
matching item correctly. Alternatively, you could set a 'found' flag.
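
The 'found' flag variant might look like the sketch below. It is written in Python 3 syntax (the session above is Python 2), and the helper name split_at is hypothetical, not something from this thread.

```python
import itertools

def split_at(iterable, wanted):
    """Return (index, item) pairs from the first item equal to `wanted` onward.

    Hypothetical helper: the result is empty for an empty iterable or
    when no item matches.
    """
    e = enumerate(iterable)
    found = False
    for i, a in e:
        if a == wanted:
            found = True
            break
    if not found:
        return iter(())
    # Put the matching pair back in front of the partially consumed iterator.
    return itertools.chain([(i, a)], e)

print(list(split_at(range(6), 3)))  # [(3, 3), (4, 4), (5, 5)]
print(list(split_at([], 3)))        # []
print(list(split_at([0, 1], 3)))    # []
```

The flag keeps the second loop out of the first one, at the cost of an extra variable.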

Another option:

>>> def predicate((i, a)): return a != 3
...
>>> for i, a in itertools.dropwhile(predicate, enumerate(range(6))):
...     print i, a
...
3 3
4 4
5 5

The extra function call might have a negative impact on performance, though.
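
For readers on Python 3: tuple parameters such as def predicate((i, a)) were removed there (PEP 3113), so the predicate has to take the pair as a single argument. A rough equivalent, as a sketch only:

```python
import itertools

def predicate(pair):
    # pair is an (index, item) tuple produced by enumerate()
    return pair[1] != 3

result = list(itertools.dropwhile(predicate, enumerate(range(6))))
for i, a in result:
    print(i, a)
# prints:
# 3 3
# 4 4
# 5 5
```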

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: split an iteration

2005-03-31 Thread Peter Otten
Robin Becker wrote:

> Is there a fast way to get enumerate to operate over a slice of an
> iterable?

I think you don't need that here:

e = enumerate(active_nodes)
for insert_index, a in e:
    # ...

for index, a in e:
    # ...

Peter



Re: split an iteration

2005-03-31 Thread Raymond Hettinger
[Robin Becker]
> This function from texlib in oedipus.sf.net is a real cpu hog and I determined
> to see if it could be optimized.
>
> def add_active_node(self, active_nodes, node):
>     """Add a node to the active node list.
>     The node is added so that the list of active nodes is always
>     sorted by line number, and so that the set of (position, line,
>     fitness_class) tuples has no repeated values.
>     """

If you can change the data structure to be an actual list of tuples, then the
bisect module can be used directly:

    insert_index = bisect.bisect_left(active_nodes, node)
    if active_nodes[insert_index] == node:
        return  # avoid creating a duplicate entry
    active_nodes.insert(insert_index, node)
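
One detail worth guarding against: when the new node sorts after every existing entry, bisect_left returns len(active_nodes), and indexing the list at that position raises IndexError. A minimal sketch with a bounds check, in Python 3 syntax, assuming the nodes are plain (line, position, fitness_class) tuples and using a standalone function rather than the texlib method:

```python
import bisect

def add_active_node(active_nodes, node):
    """Insert `node` into the sorted list unless an equal tuple is present."""
    insert_index = bisect.bisect_left(active_nodes, node)
    # The length check covers the case where node sorts after everything.
    if insert_index < len(active_nodes) and active_nodes[insert_index] == node:
        return  # avoid creating a duplicate entry
    active_nodes.insert(insert_index, node)

nodes = []
for n in [(2, 10, 1), (1, 5, 0), (2, 10, 1), (3, 7, 2)]:
    add_active_node(nodes, n)
print(nodes)  # [(1, 5, 0), (2, 10, 1), (3, 7, 2)]
```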

If the data structure cannot be changed to tuples, then try adding a custom
compare operation to the node class:

    def __cmp__(self, other):
        return cmp((self.line, self.position, self.fitness_class),
                   (other.line, other.position, other.fitness_class))
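
Neither __cmp__ nor cmp() exists in Python 3, so there the same ordering would be expressed through rich comparison methods. A sketch, assuming a hypothetical Node class standing in for the texlib node (only the three attribute names come from this thread):

```python
import bisect
from functools import total_ordering

@total_ordering
class Node:
    """Hypothetical stand-in for the texlib node class."""

    def __init__(self, line, position, fitness_class):
        self.line = line
        self.position = position
        self.fitness_class = fitness_class

    def _key(self):
        return (self.line, self.position, self.fitness_class)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

nodes = [Node(1, 5, 0), Node(3, 7, 2)]
index = bisect.bisect_left(nodes, Node(2, 10, 1))
print(index)  # 1
```

total_ordering fills in the remaining comparison methods from __eq__ and __lt__, which is enough for bisect to work on the node objects directly.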



>     insert_index = nan
>     for index, a in enumerate(active_nodes):
>         if a.line>=node_line:
>             insert_index = index
>             break
>     index = insert_index

This loop can be squeezed a bit more using itertools.imap() and
operator.attrgetter() for the attribute lookup:

    for index, aline in enumerate(imap(attrgetter('line'), active_nodes)):
        if aline >= node_line:
            . . .
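
In Python 3 the built-in map() is already lazy, so itertools.imap() is not needed. The sketch below makes two assumptions that the thread does not confirm: the comparison is >=, and the fallback index is len(active_nodes) (the quoted code uses a name, nan, whose definition is not shown). The function name is hypothetical.

```python
from operator import attrgetter
from types import SimpleNamespace

def find_insert_index(active_nodes, node_line):
    """Index of the first node whose line is >= node_line, else len(active_nodes)."""
    for index, aline in enumerate(map(attrgetter('line'), active_nodes)):
        if aline >= node_line:
            return index
    return len(active_nodes)

# SimpleNamespace objects stand in for real nodes; only .line is used.
nodes = [SimpleNamespace(line=n) for n in (1, 2, 2, 4)]
print(find_insert_index(nodes, 2))  # 1
print(find_insert_index(nodes, 3))  # 3
print(find_insert_index(nodes, 9))  # 4
```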



> Is there a fast way to get enumerate to operate over a slice of an iterable?

enumerate(s) is the same as izip(count(), s).
So, to start from position i, write:

    for index, value in izip(count(i), s[i:]):
        . . .
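
In Python 3 syntax, where zip() is lazy and izip() is gone, the same idea can be written two ways. This is only a sketch with made-up sample data; enumerate() has accepted a start value since Python 2.6, which postdates this thread.

```python
from itertools import count, islice

s = ['a', 'b', 'c', 'd', 'e']
i = 2

# Direct translation of izip(count(i), s[i:]).
via_zip = list(zip(count(i), s[i:]))

# enumerate() with a start value; islice() avoids copying the tail of s,
# although it still steps over the first i items internally.
via_enumerate = list(enumerate(islice(s, i, None), i))

print(via_zip)        # [(2, 'c'), (3, 'd'), (4, 'e')]
print(via_enumerate)  # [(2, 'c'), (3, 'd'), (4, 'e')]
```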

That being said, your best bet is to eliminate the initial linear search which
is likely consuming most of the clock cycles.

Also, I noticed that the code does not reference self.  Accordingly, it is a
good candidate for being a staticmethod or standalone function.



Raymond Hettinger




Re: split an iteration

2005-03-31 Thread Robin Becker
Peter Otten wrote:
> Robin Becker wrote:
>
> > Is there a fast way to get enumerate to operate over a slice of an
> > iterable?
>
> I think you don't need that here:
>
> e = enumerate(active_nodes)
> for insert_index, a in e:
>     # ...
>
> for index, a in e:
>     # ...
>
> Peter

I tried your solution, but I think we miss the split element
eg for
>>> e = enumerate([0,1,2,3,4,5])
>>> for i,a in e:
...     if a==3: break
...
>>> for i,a in e:
...     print i,a
...
4 4
5 5

I think the second loop needs to start at 3, i.e. the split needs to have
start, limit semantics.

It would be nice to be able to fix it with a move back method.
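
A "move back" method of the kind wished for here could be sketched as a small wrapper class. This is Python 3 syntax, and the class and method names are hypothetical; nothing like it is built into enumerate.

```python
class PushBackIterator:
    """Hypothetical wrapper: an iterator with a push_back() method."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._pushed = []

    def __iter__(self):
        return self

    def __next__(self):
        # Items that were pushed back are returned before new ones.
        if self._pushed:
            return self._pushed.pop()
        return next(self._it)

    def push_back(self, item):
        self._pushed.append(item)

e = PushBackIterator(enumerate([0, 1, 2, 3, 4, 5]))
for i, a in e:
    if a == 3:
        e.push_back((i, a))  # make the split element visible again
        break

rest = list(e)
print(rest)  # [(3, 3), (4, 4), (5, 5)]
```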
--
Robin Becker


Re: split an iteration

2005-03-31 Thread Robin Becker
Raymond Hettinger wrote:
> [Robin Becker]
> > This function from texlib in oedipus.sf.net is a real cpu hog and I determined
> > to see if it could be optimized.
> >
> > def add_active_node(self, active_nodes, node):
> >     """Add a node to the active node list.
> >     The node is added so that the list of active nodes is always
> >     sorted by line number, and so that the set of (position, line,
> >     fitness_class) tuples has no repeated values.
> >     """
>
> If you can change the data structure to be an actual list of tuples, then the
> bisect module can be used directly:

This is a way forward and I think is doable. The original Knuth algo 
used only global integer arrays. The actual insert point depends on the 
line attribute only which would be the first tuple element. So 
apparently we always insert non-identical line breaks at the beginning 
of their line group; I think we can do the insert check and then a bit 
of checking to find the actual insert point using bisect_right handwave 
handwave.
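
A rough sketch of the idea in the paragraph above, in Python 3 syntax. It assumes the nodes are (line, position, fitness_class) tuples with numeric positions, that the list is kept sorted by line only, and that a new node goes at the start of its line group. The function below and those details are assumptions, not the texlib code.

```python
import bisect

def add_active_node(active_nodes, node):
    """Insert `node` at the start of its line group unless already present."""
    line = node[0]
    # (line,) sorts before every 3-tuple that starts with `line`.
    start = bisect.bisect_left(active_nodes, (line,))
    # (line, inf) sorts after every 3-tuple that starts with `line`.
    end = bisect.bisect_right(active_nodes, (line, float('inf')))
    if node in active_nodes[start:end]:
        return  # avoid creating a duplicate entry
    active_nodes.insert(start, node)

nodes = []
for n in [(2, 10, 1), (1, 5, 0), (2, 30, 0), (2, 10, 1), (3, 7, 2)]:
    add_active_node(nodes, n)
print(nodes)  # [(1, 5, 0), (2, 30, 0), (2, 10, 1), (3, 7, 2)]
```

The two bisect calls locate the line group in logarithmic time; only the duplicate check inside the group remains linear.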


>     insert_index = bisect.bisect_left(active_nodes, node)
>     if active_nodes[insert_index] == node:
>         return  # avoid creating a duplicate entry
>     active_nodes.insert(insert_index, node)
>
> If the data structure cannot be changed to tuples, then try adding a custom
> compare operation to the node class:
>
>     def __cmp__(self, other):
>         return cmp((self.line, self.position, self.fitness_class),
>                    (other.line, other.position, other.fitness_class))
>
> >     insert_index = nan
> >     for index, a in enumerate(active_nodes):
> >         if a.line>=node_line:
> >             insert_index = index
> >             break
> >     index = insert_index
>
> This loop can be squeezed a bit more using itertools.imap() and
> operator.attrgetter() for the attribute lookup:
>
>     for index, aline in enumerate(imap(attrgetter('line'), active_nodes)):
>         if aline >= node_line:
>             . . .
>
> > Is there a fast way to get enumerate to operate over a slice of an iterable?
>
> enumerate(s) is the same as izip(count(), s).
> So, to start from position i, write:
>
>     for index, value in izip(count(i), s[i:]):
>         . . .
>
> That being said, your best bet is to eliminate the initial linear search which
> is likely consuming most of the clock cycles.
>
> Also, I noticed that the code does not reference self.  Accordingly, it is a
> good candidate for being a staticmethod or standalone function.
>
> Raymond Hettinger


--
Robin Becker