Re: [cs-lisp] RE: Another programming challenge - Re: [cs-discuss] PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili -

2005-12-17 Başlik Can Burak Cilingir

Emre Sevinc wrote:

[ ... ]


This was a lovely feature. Believe me. People liked it.


i was in love with that since Orkut.



Now, people and the programmer are sorry. Why? Because
the aforementioned contacts table (I've given an example
in my previous mail, you can see it below, just includes
an ID and ContactID) has grown bigger. It's now about 5 MB.
cember.net became popular, more than a few thousand members.


There is an another problem here. That path calculation is not required 
to be done in real time! ( that's a design decision ( ? :) ) )


it could be done daily and with some more clever decisions like has any 
member added any member as a contact, regenerate the path, else use 
cached version.


[ ... ]

--
@n

___
cs-lisp mailing list
cs-lisp@cs.bilgi.edu.tr
http://church.cs.bilgi.edu.tr/lcg
http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp


[cs-lisp] Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili

2005-12-17 Başlik Can Burak Cilingir

Emre Sevinc wrote:

-Original Message-
From: [EMAIL PROTECTED] on behalf of Can Burak Cilingir
Sent: Sat 12/17/2005 10:12 AM
To: cs-lisp@cs.bilgi.edu.tr
Cc: cs-discuss
Subject: Re: [cs-lisp] RE: Another programming challenge - Re: 
[cs-discuss]PHP+MySQL versus Lisp: Shortest Path problemi ileilgili -


Emre Sevinc wrote:

[ ... ]

  This was a lovely feature. Believe me. People liked it.
 
 i was in love with that since Orkut.
 

Another research subject for psychology.


  Now, people and the programmer are sorry. Why? Because
  the aforementioned contacts table (I've given an example
  in my previous mail, you can see it below, just includes
  an ID and ContactID) has grown bigger. It's now about 5 MB.
  cember.net became popular, more than a few thousand members.
 
 There is an another problem here. That path calculation is not required
 to be done in real time! ( that's a design decision ( ? :) ) )
 it could be done daily and with some more clever decisions like has any
 member added any member as a contact, regenerate the path, else use
 cached version.




The problem is that, you have to do it real time. What are you
going to cache? Every possibility? Every day, every minute new members
come, they connect to someone, then they click on a member that
they are not directly connected to in order to see their connection
path, if any. And they want to see the correct answer.

This is not about *my* connection list. I'm directly connected to them.
The path is clear.

The problem is that the question is asked all the time by the new users
which means new IDs. Any new or any old user may wish to see if they
are connected to any user. Think about number 1000 and feel the 
combinatorial

explosion deep in your soul. Are you going to generate a path
for me, a path in which I'm connected to user A, user B, ... , userZeta
(let's say in some way I'm connected to 500 hundred users out of 1000). 
Are you

going to calculate and store the results for me and 500 hundred people
(because I may wish to examine all of them). And same thing for the 
remaining

999 people. And we're talking about 1000. Make it a 10.000 (not a big number
for a popular site) and cover your ears in order not to hear the explosion.

But then, maybe you'll propose to buy another machine, replicate
the database and let that machine work like hell to calculate all
possible connections' paths and store them and do that everyday
for a few thousand users.

If you be more clear about what kind of a caching scheme you intended
then I can understand better. Maybe I misunderstood the deal.


oh, I missed the point of generating paths to every visiting member and 
underestimated the scenario.


let me rethink the caching scheme (for real-time calculation). once you 
generate the path from a to b, you don't need to regenerate it unless 
any member's connection list is updated. so:



function shortest-path (membera memberb)
{
if ispathcached (membera memberb)
p = getcachedpath (membera memberb)
t = getcachedtime (membera memberb)

//is cache still valid?
for each member of p as m
mt = getmodificationtimeofconnlist(m)
if (mt  t)
{
np = regeneratecache(membera memberb)
return np
}

return p
}



Emre Sevinc




--
@n

___
cs-lisp mailing list
cs-lisp@cs.bilgi.edu.tr
http://church.cs.bilgi.edu.tr/lcg
http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp


[cs-lisp] RE: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili

2005-12-17 Başlik Emre Sevinc
Title: RE: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili






-Original Message-
From: Can Burak Cilingir [mailto:[EMAIL PROTECTED]]
Sent: Sat 12/17/2005 11:15 AM
To: Emre Sevinc
Cc: cs-lisp@cs.bilgi.edu.tr; cs-discuss
Subject: Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili

Emre Sevinc wrote:
 -Original Message-
 If you be more clear about what kind of a caching scheme you intended
 then I can understand better. Maybe I misunderstood the deal.

oh, I missed the point of generating paths to every visiting member and
underestimated the scenario.

let me rethink the caching scheme (for real-time calculation). once you
generate the path from a to b, you don't need to regenerate it unless
any member's connection list is updated. so:


Let me restate the consequences:

You came to the lovely, business oriented social network. You
love those paths, you want to see them everywhere.

You are not connected to me. I'm not connected to you.

You click on me. (Never was I happier for using English
instead of my native language Turkish.)

We have a common acquaintance, thus the shortest path between us
exists.

You had to calculate this.

When?

When asked.

OK, you now have this data, you've already put it in some
SQL table.

What advantage does this have? If you (or I) click on me (or you)
then fetch the result quickly from the SQL table.

But as you have seen in the above scenario, lots, lots of
things are not cached.

I'm not saying that caching is useless. It isn't.

I'm saying that new members are coming and also today I'm
clicking on people that I haven't ever clicked before. Hundreds
or thousands of people doing the same thing. Which cache? Your
server is crunching under the load of calculation.

What you say is: Once your server is crunched and cached
the results of those queries, ok, it won't crash if the
same queries are made. But of course, each time brand new
queries with different Contacts are generated.

Could I make myself clear this time?

function shortest-path (membera memberb)
{
if ispathcached (membera memberb)

Probably not.

 p = getcachedpath (membera memberb)
 t = getcachedtime (membera memberb)

//is cache still valid?
for each member of p as m
 mt = getmodificationtimeofconnlist(m)
 if (mt  t)
 {
 np = regeneratecache(membera memberb)
 return np
 }

And you imagine connections are rock hard?
Maybe our good old acquaintance has just left the
network. I'm making the same query, you and me
but the network data has changed. So you have
to modify your cache. That means recalculating.
What an acquaintance! Anyway, again you had to
calculate. Lots of calculations, people are
clicking, think of 10.000 people network, a few
thousand online, every minute a few 10 people
are coming, partially connected and making queries
which are not cached yet.

Maybe we need a trade-off here.

Emre S.



___
cs-lisp mailing list
cs-lisp@cs.bilgi.edu.tr
http://church.cs.bilgi.edu.tr/lcg
http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp


[cs-lisp] Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili

2005-12-17 Başlik Can Burak Cilingir

[ ... ]

What you say is: Once your server is crunched and cached
the results of those queries, ok, it won't crash if the
same queries are made. But of course, each time brand new
queries with different Contacts are generated.

Could I make myself clear this time?


I was already clear on that but trying to implement caching somehow. My 
question for the previous mail was not is caching can handle the load 
or not. It was the point of my first mail (do that once a day) and 
understood that is not possible.


so my question is is there a better way to cache this? (see my paragragh 
below Maybe we need a trade-off here)




 function shortest-path (membera memberb)
 {
 if ispathcached (membera memberb)

Probably not.
You are absolutely right. When I browse such sites, I rarely click on 
people whom I already know.


  p = getcachedpath (membera memberb)
  t = getcachedtime (membera memberb)

 //is cache still valid?
 for each member of p as m
  mt = getmodificationtimeofconnlist(m)
  if (mt  t)
  {
  np = regeneratecache(membera memberb)
  return np
  }

And you imagine connections are rock hard?
Maybe our good old acquaintance has just left the
network. I'm making the same query, you and me
but the network data has changed. So you have
to modify your cache. That means recalculating.
What an acquaintance! Anyway, again you had to
calculate. Lots of calculations, people are
clicking, think of 10.000 people network, a few
thousand online, every minute a few 10 people
are coming, partially connected and making queries
which are not cached yet.

Maybe we need a trade-off here.


we can cache some intermediate paths and somehow calculate the not so 
shortest path.  a - c is (a g j o d c) for a-d, you can assume (a g j 
o d) although there may be a path such that (a w d) if you need a 
connection, we have that. but if you need the shortest path this is 
not the answer. I think this is better than showing nothing.


We are also talking about shortest path which is not an easy to solve 
problem. and doing this over and over again. I am curious how orkut 
handles that load (with the help of google's cluster? :)).




Emre S.




--
@n

___
cs-lisp mailing list
cs-lisp@cs.bilgi.edu.tr
http://church.cs.bilgi.edu.tr/lcg
http://cs.bilgi.edu.tr/mailman/listinfo/cs-lisp


Programming challenge: Re: [cs-lisp] Re: PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili

2005-12-17 Başlik Chris Stephenson

1.
OK we misunderstood the input format - it is neighbour lists, not pairs 
of contacts, Emre's example test data misled us.


2.
My challenge to my students still stands (modified in the light of our 
new information).

a) Modify the Graham code to work on pairs of contacts
b) Find the -algorithmic- inefficiency in Graham's code and correct it. 
My example (misunderstood) test data gives a clue to the algorithmic 
inefficiency.
c) Solve the implementation inefficiencies in Graham's code to get down 
to a solution that is O(e+v) or, at the very worst, O( (e+v) log (e+v))


I once again attach the scheme version of the Graham code.

3.
Technicalities:-
There are two inefficiencies in Graham's code

(a) the algorithmic inefficiency that is the subject of challenge 2(b).

(b) the use of inherently time inefficient LISP structures - this is 
important, but -much- less important than (a). LISP/Scheme, of course, 
offers alternative, more efficient, structures, or you can easily and 
cheaply build your own.


4.
Cember's problem:
a) I repeat that this is a problem whose solution is LINEAR in the total 
size of its input. It is therefore easily soluble, in Cember's practical 
situation, without blowing up the server, even for tens of thousands of 
subscribers. You can make that hundreds of thousands of subscribers, 
given the likely e/v ratio (quite low) and the likely pattern of 
contacts. O(e+v) is for finding ALL shortest paths starting at a give 
node. We only want one.
b) IF the Cember SQL solution has the same -algorithmic- inefficiency as 
Graham's solution that will explain the explosions.
c) Those on the mail lists who are proposing cacheing are missing the 
point. If we keep the neighbour lists in primary memory, this problem is 
trivial. If there is a performance problem, do -not- cache the shortest 
paths (it is a list of size O(n^3)!) Cache the neighbour lists!! We know 
when we update the neighbour lists in the database, so we can keep them 
in a simple write through cache.
d) Language choice is not important. Whatever language the existing 
system is written in, use that. PHP gives extensible arrays, efficiently 
implemented as hash tables, and easy interface to the database. Given 
that you then only have to write seven lines of code, even I would be 
prepared to (in fact prefer to!) write it in PHP: then wash my hands, of 
course. Writing this particular algorithm efficiently in LISP/Scheme 
requires doing some unlispy things or writing some complicated code. Why 
bother with the interfacing problems for no significant gain? Actually 
you are exaggerating the interfacing problems.


Emre, put Cember in touch with me. I will solve their problem. It is 
trivial.


After 0900 on Wednesday I will publish my solutions.

Sorry, Can Burak, shortest path is an -easy- problem!

CS




Can Burak Cilingir wrote:


[ ... ]


What you say is: Once your server is crunched and cached
the results of those queries, ok, it won't crash if the
same queries are made. But of course, each time brand new
queries with different Contacts are generated.

Could I make myself clear this time?



I was already clear on that but trying to implement caching somehow. 
My question for the previous mail was not is caching can handle the 
load or not. It was the point of my first mail (do that once a day) 
and understood that is not possible.


so my question is is there a better way to cache this? (see my 
paragragh below Maybe we need a trade-off here)




 function shortest-path (membera memberb)
 {
 if ispathcached (membera memberb)

Probably not.


You are absolutely right. When I browse such sites, I rarely click on 
people whom I already know.




  p = getcachedpath (membera memberb)
  t = getcachedtime (membera memberb)

 //is cache still valid?
 for each member of p as m
  mt = getmodificationtimeofconnlist(m)
  if (mt  t)
  {
  np = regeneratecache(membera memberb)
  return np
  }

And you imagine connections are rock hard?
Maybe our good old acquaintance has just left the
network. I'm making the same query, you and me
but the network data has changed. So you have
to modify your cache. That means recalculating.
What an acquaintance! Anyway, again you had to
calculate. Lots of calculations, people are
clicking, think of 10.000 people network, a few
thousand online, every minute a few 10 people
are coming, partially connected and making queries
which are not cached yet.

Maybe we need a trade-off here.



we can cache some intermediate paths and somehow calculate the not so 
shortest path.  a - c is (a g j o d c) for a-d, you can assume (a g 
j o d) although there may be a path such that (a w d) if you need a 
connection, we have that. but if you need the shortest path this is 
not the answer. I think this is better than showing nothing.


We are also talking about shortest path which is not an easy to 
solve problem. and doing this over and over again. I am curious how 
orkut 

[cs-lisp] Re: Another programming challenge - Another Clue Re: [cs-discuss] PHP+MySQL versus Lisp: Shortest Path problemi ile ilgili -

2005-12-17 Başlik Chris Stephenson




Emre and everyone

Kod dzgn almyor
The Graham code , even with input in the correct format, does NOT work
correctly.

Consider this input

(shortest-path 1 54 '((1 5 ) (5 1 12 17) ( 12 26 28 55) (17 1)))

This should produce the answer () or nil, because there is no path from
1 to 54.

It does not. The fact that it does not also exposes indirectly the
algorithmic problem with his code.

CS

Emre Sevinc wrote:

  
  
  RE: Another programming challenge - Re: [cs-discuss] PHP+MySQL
versus Lisp: Shortest Path problemi ile ilgili - 

  -Original Message-
From: Chris Stephenson [mailto:[EMAIL PROTECTED]]
Sent: Sat 12/17/2005 12:07 AM
To: Emre Sevinc
Cc: cs-lisp@cs.bilgi.edu.tr; cs-discuss;
[EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Another programming challenge - Re: [cs-discuss] PHP+MySQL
versus Lisp: Shortest Path problemi ile ilgili -
  
 Aldigin Kod Paul Graham'in "Ansi Common Lisp" kitabindan.
Maalesef kitap online degil ve elimde degil ve kutuphane'de "out".
Graham'in kodun hangi sorunu zdgn iddia ettigini grmek
isterim.
  
Kitap muhtemelen Sinan'da ya da Can Burak'ta ya da Haldun'da
(ya da baska bir Common Lisp sevdalisinda, kim acaba? ;-)
  
 Bu kod bu sorunu zmyor, zemez.
Istersen bunu deneyin!
Try this!
(shortest-path '1 '26 '((1 5)
 (5 1)
 (5 12)
 (5 17)
 (12 26)
 (12 28)
 (12 54)))
daha da kt bir sonu!
assoc bildigimiz assoc ise, bu kod alisamaz.
  
Kod calisiyor. Tabii ben, elimde kitap olmadan giristigim
icin ne tr input bekledigini anlayamamisim basta. BM'nin
tavsiyeleri dogrultusunda kurcalayinca input'un nasil olmasi
gerektigi anlasildi.
  
Tekrar kodu gecmek gerekirse:
  
==
(defun shortest-path (start end net)
(bfs end (list (list start)) net))
  
(defun bfs (end queue net)
(if (null queue)
 nil
 (let ((path (car queue)))
 (let ((node (car path)))
 (if (eql node end)
 (reverse path)
 (bfs end
 (append (cdr queue)
 (new-paths path node net))
 net))
  
(defun new-paths (path node net)
(mapcar #'(lambda (n)
 (cons n path))
 (cdr (assoc node net
===
  
  
bfs, anladigimiz kadari ile Breadth First Search manasinda.
  
Ve her elemanin komsularini yazip yle arayacak olursak:
  
  
===
(time (shortest-path '1 '54 '(
 (1 5)
 (5 1 12 17)
 (12 5 26 28 54)
 (17 5)
 (26 12)
 (28 12)
 (54 12
  
Evaluation took:
 0.0 seconds of real time
 0.0 seconds of user run time
 0.0 seconds of system run time
 0 page faults and
 8,192 bytes consed.
  
(1 5 12 54)

  
Dogru sonucu yani 1 - 5 - 12 - 54 sonucunu buluyor.
Windows'ta
CLISP'te ve GNU/Linux'ta SBCL ile denedim.
  
kodu Scheme'e evirdim, ayni buglar mevcut. (buglu Scheme ektedir)
Kodu
alisir haline getirdim ancak hala performans aisindan byk
listeler
iin berbat olmasi lazim. Performansini da zmek de zor degil.
  
Yukaridaki Common Lisp kodu grdgm kadari ile dzgn
calisiyor (biraz daha bytp karmasiklastirdim inputu gene
dzgn sonuc verdi)
  
===
CL-USER (time (shortest-path '1 '19 '((1 5)
 (5 1 12 17)
 (12 5 26 28 54)
 (17 5)
 (26 12)
 (28 12)
 (54 12 47)
 (47 15)
 (15 2)
 (2 15 100 101)
 (3 100)
 (100 2 15 19
Evaluation took:
 0.417 seconds of real time
 0.054991 seconds of user run time
 0.014998 seconds of system run time
 46 page faults and
 1,211,352 bytes consed.
  
(1 5 12 54 47 15 2 100 19)
  
CL-USER (time (shortest-path '1 '19 '((1 5)
 (5 1 12 17)
 (12 5 26 28 54)
 (17 5)
 (26 12)
 (28 12)
 (54 12 47)
 (47 15)
 (15 2)
 (2 15 100 101)
 (3 100)
 (100 2 15 19
Evaluation took:
 0.005 seconds of real time
 0.005999 seconds of user run time
 0.0 seconds of system run time
 0 page faults and
 1,224,704 bytes consed.
  
(1 5 12 54 47 15 2 100 19)
  
CL-USER (time (shortest-path '1 '19 '((1 5)
 (5 1 12 17)
 (12 5 26 28 54)
 (17 5)
 (26 12)
 (28 12)
 (54 12 47)
 (47 15)
 (15 2)
 (2 15 100 101)
 (3 100)
 (100 19
Evaluation took:
 0.005 seconds of real time
 0.004999 seconds of user run time
 0.0 seconds of system run time
 0 page faults and
 1,204,224 bytes consed.
  
(1 5 12 54 47 15 2 100 19)

  
Hafiza acisindan problemli oldugu asikar, diger yandan
"cache"lemenin getirdigi tekrar hesaplamalardaki hiz
farki da grlebiliyor.
  
Tail-recursive hale getirilirse herhalde hafiza problemi
halledilebilir, hiz konusunda ise su anda bir fikrim yok.
  
 Bu sorun zm besbelli basit bir Bilgisayar Bilimleri sorunu.
  
Algoritmik olarak karisik degil ancak basit olmayan kisimlarini
asagida yazacagim.
  
 So a programming challenge for any student in any year. Solve
either
 part of Emre's problem

 (a) find and correct the bugs in Paul Graham's code. Emre has
 demonstrated one, my example above demonstrates another.
 (b) produce