Re: [cs-lisp] RE: Another programming challenge - Re: [cs-discuss] PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili -
Emre Sevinc wrote: [ ... ] This was a lovely feature. Believe me. People liked it. i was in love with that since Orkut. Now, people and the programmer are sorry. Why? Because the aforementioned contacts table (I've given an example in my previous mail, you can see it below, just includes an ID and ContactID) has grown bigger. It's now about 5 MB. cember.net became popular, more than a few thousand members. There is an another problem here. That path calculation is not required to be done in real time! ( that's a design decision ( ? :) ) ) it could be done daily and with some more clever decisions like has any member added any member as a contact, regenerate the path, else use cached version. [ ... ] -- @n ___ cs-lisp mailing list cs-lisp@cs.bilgi.edu.tr http://church.cs.bilgi.edu.tr/lcg http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp
[cs-lisp] Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili
Emre Sevinc wrote: -Original Message- From: [EMAIL PROTECTED] on behalf of Can Burak Cilingir Sent: Sat 12/17/2005 10:12 AM To: cs-lisp@cs.bilgi.edu.tr Cc: cs-discuss Subject: Re: [cs-lisp] RE: Another programming challenge - Re: [cs-discuss]PHP+MySQL versus Lisp: Shortest Path problemi ileilgili - Emre Sevinc wrote: [ ... ] This was a lovely feature. Believe me. People liked it. i was in love with that since Orkut. Another research subject for psychology. Now, people and the programmer are sorry. Why? Because the aforementioned contacts table (I've given an example in my previous mail, you can see it below, just includes an ID and ContactID) has grown bigger. It's now about 5 MB. cember.net became popular, more than a few thousand members. There is an another problem here. That path calculation is not required to be done in real time! ( that's a design decision ( ? :) ) ) it could be done daily and with some more clever decisions like has any member added any member as a contact, regenerate the path, else use cached version. The problem is that, you have to do it real time. What are you going to cache? Every possibility? Every day, every minute new members come, they connect to someone, then they click on a member that they are not directly connected to in order to see their connection path, if any. And they want to see the correct answer. This is not about *my* connection list. I'm directly connected to them. The path is clear. The problem is that the question is asked all the time by the new users which means new IDs. Any new or any old user may wish to see if they are connected to any user. Think about number 1000 and feel the combinatorial explosion deep in your soul. Are you going to generate a path for me, a path in which I'm connected to user A, user B, ... , userZeta (let's say in some way I'm connected to 500 hundred users out of 1000). Are you going to calculate and store the results for me and 500 hundred people (because I may wish to examine all of them). And same thing for the remaining 999 people. And we're talking about 1000. Make it a 10.000 (not a big number for a popular site) and cover your ears in order not to hear the explosion. But then, maybe you'll propose to buy another machine, replicate the database and let that machine work like hell to calculate all possible connections' paths and store them and do that everyday for a few thousand users. If you be more clear about what kind of a caching scheme you intended then I can understand better. Maybe I misunderstood the deal. oh, I missed the point of generating paths to every visiting member and underestimated the scenario. let me rethink the caching scheme (for real-time calculation). once you generate the path from a to b, you don't need to regenerate it unless any member's connection list is updated. so: function shortest-path (membera memberb) { if ispathcached (membera memberb) p = getcachedpath (membera memberb) t = getcachedtime (membera memberb) //is cache still valid? for each member of p as m mt = getmodificationtimeofconnlist(m) if (mt t) { np = regeneratecache(membera memberb) return np } return p } Emre Sevinc -- @n ___ cs-lisp mailing list cs-lisp@cs.bilgi.edu.tr http://church.cs.bilgi.edu.tr/lcg http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp
[cs-lisp] RE: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili
Title: RE: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili -Original Message- From: Can Burak Cilingir [mailto:[EMAIL PROTECTED]] Sent: Sat 12/17/2005 11:15 AM To: Emre Sevinc Cc: cs-lisp@cs.bilgi.edu.tr; cs-discuss Subject: Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili Emre Sevinc wrote: -Original Message- If you be more clear about what kind of a caching scheme you intended then I can understand better. Maybe I misunderstood the deal. oh, I missed the point of generating paths to every visiting member and underestimated the scenario. let me rethink the caching scheme (for real-time calculation). once you generate the path from a to b, you don't need to regenerate it unless any member's connection list is updated. so: Let me restate the consequences: You came to the lovely, business oriented social network. You love those paths, you want to see them everywhere. You are not connected to me. I'm not connected to you. You click on me. (Never was I happier for using English instead of my native language Turkish.) We have a common acquaintance, thus the shortest path between us exists. You had to calculate this. When? When asked. OK, you now have this data, you've already put it in some SQL table. What advantage does this have? If you (or I) click on me (or you) then fetch the result quickly from the SQL table. But as you have seen in the above scenario, lots, lots of things are not cached. I'm not saying that caching is useless. It isn't. I'm saying that new members are coming and also today I'm clicking on people that I haven't ever clicked before. Hundreds or thousands of people doing the same thing. Which cache? Your server is crunching under the load of calculation. What you say is: Once your server is crunched and cached the results of those queries, ok, it won't crash if the same queries are made. But of course, each time brand new queries with different Contacts are generated. Could I make myself clear this time? function shortest-path (membera memberb) { if ispathcached (membera memberb) Probably not. p = getcachedpath (membera memberb) t = getcachedtime (membera memberb) //is cache still valid? for each member of p as m mt = getmodificationtimeofconnlist(m) if (mt t) { np = regeneratecache(membera memberb) return np } And you imagine connections are rock hard? Maybe our good old acquaintance has just left the network. I'm making the same query, you and me but the network data has changed. So you have to modify your cache. That means recalculating. What an acquaintance! Anyway, again you had to calculate. Lots of calculations, people are clicking, think of 10.000 people network, a few thousand online, every minute a few 10 people are coming, partially connected and making queries which are not cached yet. Maybe we need a trade-off here. Emre S. ___ cs-lisp mailing list cs-lisp@cs.bilgi.edu.tr http://church.cs.bilgi.edu.tr/lcg http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp
[cs-lisp] Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili
[ ... ] What you say is: Once your server is crunched and cached the results of those queries, ok, it won't crash if the same queries are made. But of course, each time brand new queries with different Contacts are generated. Could I make myself clear this time? I was already clear on that but trying to implement caching somehow. My question for the previous mail was not is caching can handle the load or not. It was the point of my first mail (do that once a day) and understood that is not possible. so my question is is there a better way to cache this? (see my paragragh below Maybe we need a trade-off here) function shortest-path (membera memberb) { if ispathcached (membera memberb) Probably not. You are absolutely right. When I browse such sites, I rarely click on people whom I already know. p = getcachedpath (membera memberb) t = getcachedtime (membera memberb) //is cache still valid? for each member of p as m mt = getmodificationtimeofconnlist(m) if (mt t) { np = regeneratecache(membera memberb) return np } And you imagine connections are rock hard? Maybe our good old acquaintance has just left the network. I'm making the same query, you and me but the network data has changed. So you have to modify your cache. That means recalculating. What an acquaintance! Anyway, again you had to calculate. Lots of calculations, people are clicking, think of 10.000 people network, a few thousand online, every minute a few 10 people are coming, partially connected and making queries which are not cached yet. Maybe we need a trade-off here. we can cache some intermediate paths and somehow calculate the not so shortest path. a - c is (a g j o d c) for a-d, you can assume (a g j o d) although there may be a path such that (a w d) if you need a connection, we have that. but if you need the shortest path this is not the answer. I think this is better than showing nothing. We are also talking about shortest path which is not an easy to solve problem. and doing this over and over again. I am curious how orkut handles that load (with the help of google's cluster? :)). Emre S. -- @n ___ cs-lisp mailing list cs-lisp@cs.bilgi.edu.tr http://church.cs.bilgi.edu.tr/lcg http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp
Programming challenge: Re: [cs-lisp] Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili
1. OK we misunderstood the input format - it is neighbour lists, not pairs of contacts, Emre's example test data misled us. 2. My challenge to my students still stands (modified in the light of our new information). a) Modify the Graham code to work on pairs of contacts b) Find the -algorithmic- inefficiency in Graham's code and correct it. My example (misunderstood) test data gives a clue to the algorithmic inefficiency. c) Solve the implementation inefficiencies in Graham's code to get down to a solution that is O(e+v) or, at the very worst, O( (e+v) log (e+v)) I once again attach the scheme version of the Graham code. 3. Technicalities:- There are two inefficiencies in Graham's code (a) the algorithmic inefficiency that is the subject of challenge 2(b). (b) the use of inherently time inefficient LISP structures - this is important, but -much- less important than (a). LISP/Scheme, of course, offers alternative, more efficient, structures, or you can easily and cheaply build your own. 4. Cember's problem: a) I repeat that this is a problem whose solution is LINEAR in the total size of its input. It is therefore easily soluble, in Cember's practical situation, without blowing up the server, even for tens of thousands of subscribers. You can make that hundreds of thousands of subscribers, given the likely e/v ratio (quite low) and the likely pattern of contacts. O(e+v) is for finding ALL shortest paths starting at a give node. We only want one. b) IF the Cember SQL solution has the same -algorithmic- inefficiency as Graham's solution that will explain the explosions. c) Those on the mail lists who are proposing cacheing are missing the point. If we keep the neighbour lists in primary memory, this problem is trivial. If there is a performance problem, do -not- cache the shortest paths (it is a list of size O(n^3)!) Cache the neighbour lists!! We know when we update the neighbour lists in the database, so we can keep them in a simple write through cache. d) Language choice is not important. Whatever language the existing system is written in, use that. PHP gives extensible arrays, efficiently implemented as hash tables, and easy interface to the database. Given that you then only have to write seven lines of code, even I would be prepared to (in fact prefer to!) write it in PHP: then wash my hands, of course. Writing this particular algorithm efficiently in LISP/Scheme requires doing some unlispy things or writing some complicated code. Why bother with the interfacing problems for no significant gain? Actually you are exaggerating the interfacing problems. Emre, put Cember in touch with me. I will solve their problem. It is trivial. After 0900 on Wednesday I will publish my solutions. Sorry, Can Burak, shortest path is an -easy- problem! CS Can Burak Cilingir wrote: [ ... ] What you say is: Once your server is crunched and cached the results of those queries, ok, it won't crash if the same queries are made. But of course, each time brand new queries with different Contacts are generated. Could I make myself clear this time? I was already clear on that but trying to implement caching somehow. My question for the previous mail was not is caching can handle the load or not. It was the point of my first mail (do that once a day) and understood that is not possible. so my question is is there a better way to cache this? (see my paragragh below Maybe we need a trade-off here) function shortest-path (membera memberb) { if ispathcached (membera memberb) Probably not. You are absolutely right. When I browse such sites, I rarely click on people whom I already know. p = getcachedpath (membera memberb) t = getcachedtime (membera memberb) //is cache still valid? for each member of p as m mt = getmodificationtimeofconnlist(m) if (mt t) { np = regeneratecache(membera memberb) return np } And you imagine connections are rock hard? Maybe our good old acquaintance has just left the network. I'm making the same query, you and me but the network data has changed. So you have to modify your cache. That means recalculating. What an acquaintance! Anyway, again you had to calculate. Lots of calculations, people are clicking, think of 10.000 people network, a few thousand online, every minute a few 10 people are coming, partially connected and making queries which are not cached yet. Maybe we need a trade-off here. we can cache some intermediate paths and somehow calculate the not so shortest path. a - c is (a g j o d c) for a-d, you can assume (a g j o d) although there may be a path such that (a w d) if you need a connection, we have that. but if you need the shortest path this is not the answer. I think this is better than showing nothing. We are also talking about shortest path which is not an easy to solve problem. and doing this over and over again. I am curious how orkut
[cs-lisp] Re: Another programming challenge - Another Clue Re: [cs-discuss] PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili -
Emre and everyone Kod dzgn almyor The Graham code , even with input in the correct format, does NOT work correctly. Consider this input (shortest-path 1 54 '((1 5 ) (5 1 12 17) ( 12 26 28 55) (17 1))) This should produce the answer () or nil, because there is no path from 1 to 54. It does not. The fact that it does not also exposes indirectly the algorithmic problem with his code. CS Emre Sevinc wrote: RE: Another programming challenge - Re: [cs-discuss] PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili - -Original Message- From: Chris Stephenson [mailto:[EMAIL PROTECTED]] Sent: Sat 12/17/2005 12:07 AM To: Emre Sevinc Cc: cs-lisp@cs.bilgi.edu.tr; cs-discuss; [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Another programming challenge - Re: [cs-discuss] PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili - Aldigin Kod Paul Graham'in "Ansi Common Lisp" kitabindan. Maalesef kitap online degil ve elimde degil ve kutuphane'de "out". Graham'in kodun hangi sorunu zdgn iddia ettigini grmek isterim. Kitap muhtemelen Sinan'da ya da Can Burak'ta ya da Haldun'da (ya da baska bir Common Lisp sevdalisinda, kim acaba? ;-) Bu kod bu sorunu zmyor, zemez. Istersen bunu deneyin! Try this! (shortest-path '1 '26 '((1 5) (5 1) (5 12) (5 17) (12 26) (12 28) (12 54))) daha da kt bir sonu! assoc bildigimiz assoc ise, bu kod alisamaz. Kod calisiyor. Tabii ben, elimde kitap olmadan giristigim icin ne tr input bekledigini anlayamamisim basta. BM'nin tavsiyeleri dogrultusunda kurcalayinca input'un nasil olmasi gerektigi anlasildi. Tekrar kodu gecmek gerekirse: == (defun shortest-path (start end net) (bfs end (list (list start)) net)) (defun bfs (end queue net) (if (null queue) nil (let ((path (car queue))) (let ((node (car path))) (if (eql node end) (reverse path) (bfs end (append (cdr queue) (new-paths path node net)) net)) (defun new-paths (path node net) (mapcar #'(lambda (n) (cons n path)) (cdr (assoc node net === bfs, anladigimiz kadari ile Breadth First Search manasinda. Ve her elemanin komsularini yazip yle arayacak olursak: === (time (shortest-path '1 '54 '( (1 5) (5 1 12 17) (12 5 26 28 54) (17 5) (26 12) (28 12) (54 12 Evaluation took: 0.0 seconds of real time 0.0 seconds of user run time 0.0 seconds of system run time 0 page faults and 8,192 bytes consed. (1 5 12 54) Dogru sonucu yani 1 - 5 - 12 - 54 sonucunu buluyor. Windows'ta CLISP'te ve GNU/Linux'ta SBCL ile denedim. kodu Scheme'e evirdim, ayni buglar mevcut. (buglu Scheme ektedir) Kodu alisir haline getirdim ancak hala performans aisindan byk listeler iin berbat olmasi lazim. Performansini da zmek de zor degil. Yukaridaki Common Lisp kodu grdgm kadari ile dzgn calisiyor (biraz daha bytp karmasiklastirdim inputu gene dzgn sonuc verdi) === CL-USER (time (shortest-path '1 '19 '((1 5) (5 1 12 17) (12 5 26 28 54) (17 5) (26 12) (28 12) (54 12 47) (47 15) (15 2) (2 15 100 101) (3 100) (100 2 15 19 Evaluation took: 0.417 seconds of real time 0.054991 seconds of user run time 0.014998 seconds of system run time 46 page faults and 1,211,352 bytes consed. (1 5 12 54 47 15 2 100 19) CL-USER (time (shortest-path '1 '19 '((1 5) (5 1 12 17) (12 5 26 28 54) (17 5) (26 12) (28 12) (54 12 47) (47 15) (15 2) (2 15 100 101) (3 100) (100 2 15 19 Evaluation took: 0.005 seconds of real time 0.005999 seconds of user run time 0.0 seconds of system run time 0 page faults and 1,224,704 bytes consed. (1 5 12 54 47 15 2 100 19) CL-USER (time (shortest-path '1 '19 '((1 5) (5 1 12 17) (12 5 26 28 54) (17 5) (26 12) (28 12) (54 12 47) (47 15) (15 2) (2 15 100 101) (3 100) (100 19 Evaluation took: 0.005 seconds of real time 0.004999 seconds of user run time 0.0 seconds of system run time 0 page faults and 1,204,224 bytes consed. (1 5 12 54 47 15 2 100 19) Hafiza acisindan problemli oldugu asikar, diger yandan "cache"lemenin getirdigi tekrar hesaplamalardaki hiz farki da grlebiliyor. Tail-recursive hale getirilirse herhalde hafiza problemi halledilebilir, hiz konusunda ise su anda bir fikrim yok. Bu sorun zm besbelli basit bir Bilgisayar Bilimleri sorunu. Algoritmik olarak karisik degil ancak basit olmayan kisimlarini asagida yazacagim. So a programming challenge for any student in any year. Solve either part of Emre's problem (a) find and correct the bugs in Paul Graham's code. Emre has demonstrated one, my example above demonstrates another. (b) produce