Re: yaouh! 0.4 is out - 10x speedup hack

2009-02-11 Thread Helge Hafting
Robin Paulson wrote:

> This is a fine idea. It's somewhat clumsy invoking 10 scripts manually;
> I'm sure it could be coded to do this automatically, but it's a
> decent first step.

It is clumsy indeed. It was just a test to see whether parallel curl would
help or not. It turns out that it did. :-)

Running curl and md5sum in parallel would help too, and it would be close
to perfect: md5sum would then use the CPU while curl waits for the
network to respond. Currently, curl waits for an answer, and md5sum runs
only after that. md5sum is quicker than curl, so the md5sum time could
be hidden completely. One md5sum doesn't take much time, but saving that
time for 50,000 invocations will probably amount to something.
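A minimal Python 3 sketch of that overlap, assuming the tile server's ETag is the MD5 of the tile (as yaouh's `etag != md5` comparison implies). The names `fetch_etag`, `file_md5` and `is_outdated` are hypothetical, and an in-process HEAD request via `urllib` and `hashlib` stands in for the `curl` and `md5sum` subprocesses:

```python
import hashlib
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_etag(url):
    """HEAD request returning the ETag without quotes.

    Rough stand-in for yaouh's `curl -I ... | grep ETag` pipeline.
    """
    req = urllib.request.Request(url, method="HEAD",
                                 headers={"User-Agent": "yaouh"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return (resp.headers.get("ETag") or "").strip('"')


def file_md5(path):
    """Hash a local tile in-process instead of spawning md5sum."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_outdated(path, url, pool, fetch=fetch_etag):
    """Start the network request first, then hash while it is in flight."""
    etag_future = pool.submit(fetch, url)  # network wait in a worker thread
    md5 = file_md5(path)                   # CPU work overlaps with that wait
    return etag_future.result() != md5
```

Usage would be something like `with ThreadPoolExecutor(max_workers=1) as pool: is_outdated(path, url, pool)`. The `fetch` parameter only exists so the network call can be swapped out, for example when testing offline.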

Helge Hafting

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: yaouh! 0.4 is out - 10x speedup hack

2009-02-10 Thread Robin Paulson
2009/2/10 Helge Hafting <helge.haft...@hist.no>:
> Carlo Minucci wrote:
>
>> Download from http://wiki.openmoko.org/wiki/Yaouh!
>>
>> A small bugfix, and added support for multiple parallel wget downloads.
>> I think it is faster now.
>>
>> Please test and give feedback.
>
> Seems to work fine, but it looks like only the downloading happens in
> parallel. A common case seems to be only 5%-10% new tiles. The rest
> just needs checking. This is done with sequential calls to curl, i.e. no
> parallel checking, although checking is a big part of a yaouh run.
>
>
> I am no expert in Python threading, so I did a much simpler hack.
> I run 10 instances of yaouh. yaouh0.py only checks files matching
> *0.png, yaouh1.py only checks files matching *1.png, and so on.
>
> This gives a tremendous speedup because curl transfers very little data.
> Serialized curl calls spend most of their time waiting for an answer while
> one request and a very small answer move through the network to a distant
> server and back again. The bandwidth of the connection is nowhere near fully
> utilized, not even when using USB networking.

This is a fine idea. It's somewhat clumsy invoking 10 scripts manually;
I'm sure it could be coded to do this automatically, but it's a
decent first step.

Carlo, any chance of implementing this in a forthcoming release?



Re: yaouh! 0.4 is out - 10x speedup hack

2009-02-10 Thread Carlo Minucci
Robin Paulson wrote:

> Carlo, any chance of implementing this in a forthcoming release?

Yes.
I promise to include this idea in the next release.




Re: yaouh! 0.4 is out - 10x speedup hack

2009-02-09 Thread Helge Hafting

Carlo Minucci wrote:

> Download from http://wiki.openmoko.org/wiki/Yaouh!
>
> A small bugfix, and added support for multiple parallel wget downloads.
> I think it is faster now.
>
> Please test and give feedback.


Seems to work fine, but it looks like only the downloading happens in 
parallel. A common case seems to be only 5%-10% new tiles. The rest
just needs checking. This is done with sequential calls to curl, i.e. no 
parallel checking, although checking is a big part of a yaouh run.



I am no expert in Python threading, so I did a much simpler hack.
I run 10 instances of yaouh. yaouh0.py only checks files matching
*0.png, yaouh1.py only checks files matching *1.png, and so on.
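The split rule is simply "the digit just before `.png`". A small sketch of it, assuming tile file names are numeric and end in `.png` (the names `shard_of` and `partition` are hypothetical, not part of yaouh):

```python
NUM_SHARDS = 10  # one shard per final digit: yaouh0.py ... yaouh9.py


def shard_of(name):
    """Return the digit just before '.png', e.g. '1234.png' -> 4.

    Same character the patch tests with name[len(name)-5];
    raises ValueError if that character is not a digit.
    """
    return int(name[-5])


def partition(names, shards=NUM_SHARDS):
    """Split tile file names into per-digit buckets, one per instance."""
    buckets = [[] for _ in range(shards)]
    for name in names:
        buckets[shard_of(name) % shards].append(name)
    return buckets
```

Since tile numbers are spread fairly evenly over their last digit, each of the 10 buckets should get roughly a tenth of the work.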

This gives a tremendous speedup because curl transfers very little data. 
Serialized curl calls spend most of their time waiting for an answer while 
one request and a very small answer move through the network to a distant 
server and back again. The bandwidth of the connection is nowhere near 
fully utilized, not even when using USB networking.


So my 10 processes fire off 10 curl requests in parallel, giving a 10x 
speedup if the network can handle the load. USB networking seems to 
handle it fine when there are few updates.
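The same effect can in principle be had inside one process with a pool of 10 threads, since the threads mostly sit waiting on the network. A sketch under that assumption; `check_all` is a hypothetical helper, and `check` is any function returning True for an outdated tile (for example one comparing a HEAD ETag with the local MD5):

```python
from concurrent.futures import ThreadPoolExecutor


def check_all(jobs, check, workers=10):
    """Run check(path, url) for every (path, url) job, `workers` at a time.

    With 10 workers, up to 10 requests are in flight at once, like the
    10 yaouh processes, but in a single process. Returns the paths whose
    check reported them outdated, in the original job order.
    """
    jobs = list(jobs)  # iterated twice below
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda job: check(*job), jobs)
        return [path for (path, _url), outdated in zip(jobs, results)
                if outdated]
```

Threads are enough here because the work is I/O-bound; Python's global interpreter lock is released while a thread waits on the socket.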


This is a hack and not yet a real solution; there is much room for
improvement. First, I have to press the start button in all 10 
windows, which is excessive. There shouldn't be 10 windows in a real
solution. Then all 10 of my yaouh processes run the same find
at the beginning in order to count the files. That is 
unnecessary: such a startup is 10 times heavier than it needs to be. 
Still, this doesn't take much time compared to the rest. Finally, there 
are now 10 progress bars being updated in those 10 windows.
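One way those three problems might go away together is a single directory walk, one worker pool, and one overall progress value. A sketch assuming only `.png` files are tiles; `collect_tiles` and `check_with_progress` are hypothetical names, and `on_progress` stands in for a single progress bar update such as a `set_fraction` call:

```python
import os
from concurrent.futures import ThreadPoolExecutor, as_completed


def collect_tiles(top):
    """Walk the tile directory once; len() of the result replaces the
    per-process find that was only used for counting."""
    paths = []
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith(".png"):
                paths.append(os.path.join(root, name))
    return paths


def check_with_progress(paths, check, on_progress, workers=10):
    """Check every path in parallel and report one overall fraction.

    on_progress is called from the calling thread only, so it is safe
    to point it at a GUI widget that must not be touched by workers.
    """
    outdated = []
    total = len(paths) or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(check, p): p for p in paths}
        for done, fut in enumerate(as_completed(futures), start=1):
            if fut.result():
                outdated.append(futures[fut])
            on_progress(done / total)
    return outdated
```

Results arrive in completion order, so `outdated` is not sorted; sort it if the download order matters.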


Still, this approach has checked 5000 files (out of 50,000) in the 20 
minutes it took to write this mail, and downloaded about 300 outdated 
tiles. So there is hope that my 50,000 tiles will be up to date in 3.5 
hours. :-)
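The 3.5-hour figure follows from the observed rate: 5000 files in 20 minutes is 250 files per minute, so 50,000 files take about 200 minutes, or roughly 3.3 hours. The same estimate as a tiny helper (`eta_minutes` is a hypothetical name), assuming the rate stays constant:

```python
def eta_minutes(checked, elapsed_min, total):
    """Estimate total run time, assuming the observed rate holds."""
    rate = checked / elapsed_min  # files checked per minute
    return total / rate
```

`eta_minutes(5000, 20, 50000)` gives 200.0 minutes.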


A patch for yaouh 0.4 is attached, if anyone wants to test this, or 
improve it further. The patched file is available here:

http://www.aitel.hist.no/~helgehaf/openmoko/yaouhx.py

To use it, make 10 copies named yaouh0.py, yaouh1.py, ... yaouh9.py.
In each script, edit line 159, so yaouh0.py has 0 in the if-test,
yaouh1.py has 1 in the if-test, yaouh2.py has 2 in the test, and so on.

Then, start everything with a command like:
$ yaouh0.py & yaouh1.py & yaouh2.py & yaouh3.py & yaouh4.py & yaouh5.py & \
  yaouh6.py & yaouh7.py & yaouh8.py & yaouh9.py &
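Instead of backgrounding each script in the shell, a small Python launcher could start all of them and wait for every one to finish. A sketch only; `launch_all` is a hypothetical helper, and it assumes `yaouh0.py` ... `yaouh9.py` are executable files in the current directory:

```python
import subprocess
import sys


def launch_all(commands):
    """Start every command at once, then wait for all of them.

    Each command is an argument list such as ["./yaouh3.py"].
    Returns the exit codes in the same order as `commands`.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Called as `launch_all([["./yaouh%d.py" % n] for n in range(10)])`, it returns once all 10 windows have been closed. If the scripts are not executable, prefix each list with the interpreter path, e.g. `[sys.executable, "yaouh3.py"]`.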


Running 10 scripts uses a fair amount of memory, so some people may not be 
able to run all 10 at the same time. Having a swap partition may help with that.


Helge Hafting



--- yaouh.py    2009-02-09 13:32:59.0 +0100
+++ ../public_html/openmoko/yaouhx.py   2009-02-09 13:19:33.0 +0100
@@ -150,21 +150,27 @@
         for root, dirs, files in os.walk(dir):
             for name in files:
                 i=i+1
+                self.pbar.set_fraction(p)
+                p=p+ip
+
+                #in the if-test, use 0 for yaouh0.py,
+                #use 1 for yaouh1.py, and so on
+                #up to yaouh9.py
+                if name[len(name)-5] != "0" :
+                    continue
+
                 url = self.create_url(plain_url, root, name[:name.find('.')], invert)
                 command_curl = "curl -A yaouh -I \"" + url + "\" --stderr /dev/null | grep ETag | cut -d \" \" -f 2"
                 curl=os.popen(command_curl)
                 etag=curl.read()
                 etag=etag.replace("\"", "").rstrip()
-   
+   
                 command_md5 = "md5sum " + join(root, name)
                 md5sum=os.popen(command_md5)
                 md5=md5sum.read()
                 md5=string.split(md5, " ")
                 md5=md5[0]
 
-                self.pbar.set_fraction(p)
-   
-                p=p+ip
                 if etag != md5:
                     c=c+1
                     self.command_wget = "wget --user-agent yaouh -q " + url + " -O " + join(root, name) + 