You must have missed Jose Mario Quintana's post: http://jsoftware.com/pipermail/programming/2012-January/026885.html
-- Raul On Fri, Jan 27, 2012 at 2:53 PM, Devon McCormick <devon...@gmail.com> wrote: > One thing not considered so far is that birthdays are not evenly > distributed - see http://www.panix.com/~murphy/bday.html - and this affects > the probability. > > On Fri, Jan 27, 2012 at 9:50 AM, Linda Alvord <lindaalv...@verizon.net>wrote: > >> I'm glad that we have finally covered most of the conditions that could >> occur. I'll enjoy pondering your work. I'm working on a "narrower" >> challenge for next time that will not take us so far afield. >> >> Linda >> >> -----Original Message----- >> From: programming-boun...@jsoftware.com >> [mailto:programming-boun...@jsoftware.com] On Behalf Of Jose Mario >> Quintana >> Sent: Thursday, January 26, 2012 11:08 PM >> To: Programming forum >> Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays >> >> Regarding the original task, one can proceed (knowing the actual densities) >> via a shifted multinomial simulation, >> >> multinomial=. +/\ o [ <: o ((0: , [) I. ]) ? o $&0 o ] NB. dyadic verb >> >> 7 (samples=. ((densities o [) (2+multinomial) ]) ("0)) 10 10 10 10 10 NB. >> day of the week samples >> 2 3 4 6 5 6 2 6 2 6 >> 3 4 2 8 2 5 4 5 3 5 >> 3 2 2 6 2 3 4 5 2 3 >> 4 5 5 6 2 6 5 7 2 3 >> 5 2 3 5 5 5 4 5 4 7 >> >> mean=. +/ % # >> >> 365 (samplesmeans=. (mean"1 o samples)) 10000000 NB. day of the year 10 >> million sample mean >> 24.6141 >> >> 10 (] , mean) o (365 &samplesmeans) o # 500 NB. the original task >> 24.264 23.782 24.334 24.516 25.016 24.704 25.05 24.514 23.93 25.25 24.536 >> >> (] , mean) o (365 &samplesmeans) o # f. NB. according to the "simple" >> rules? >> (] , +/ % #)@:(365&((+/ % #)"1@:((+/\^:_1@:(1 - */\@:(1 - ] %~ 1 + i.))@:[ >> (2 + +/\@:[ <:@:((0: , [) I. ]) ?@:$&0@:]) ])"0)))@:# >> >> Regarding accuracy, among other things, it can be argued that the >> distribution could even depend whether the experiment is conducted in the >> northern or the southern hemisphere (see >> http://www.panix.com/~murphy/bday.html and >> http://answers.google.com/answers/threadview/id/280242.html). Models, >> maps, >> and other representations are ultimately doomed to be inaccurate; the >> subject matter is not only too complex but also evolving; above all, of >> course, my representation of "the world" that is my mind is affected as >> well :) >> >> ________________________________________ >> From: programming-boun...@jsoftware.com [programming-boun...@jsoftware.com >> ] >> On Behalf Of Jose Mario Quintana [josemarioquint...@2bestsystems.com] >> Sent: Tuesday, January 24, 2012 2:23 PM >> To: Programming forum >> Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays >> >> > +/(2+i.1000) * p * q NB. expected value >> > 24.6166 >> >> I found the same solution in a slightly different way, >> >> ((2 + i.) +/ .* +/\^:_1@:((1 - */\)@:(1 - ] %~ 1 + i.))) 365 >> 24.6166 >> >> The outline follows: >> >> It is easier to start dealing with the day of the week birthday process >> first, >> >> (outcomes=. 2 + i.) 7 NB. all other outcomes have zero densities; thus, >> they are irrelevant >> 2 3 4 5 6 7 8 >> >> o=. @: >> >> (cp=. 1 - ] %~ 1 + i.) 7 NB. conditional probabilities the process will >> not stop at each outcome given that it did not stop at its predecessor >> 0.857143 0.714286 0.571429 0.428571 0.285714 0.142857 0 >> >> cdf=. 1 - */\ o cp NB. cumulative distribution function >> >> load'plot' >> plot (0 0 , cdf) 7 NB. ploting the (smoothed) cdf >> >> densities=. +/\^:_1 o cdf NB. since cdf -: +/\ densities >> >> (mean=. outcomes +/ .* densities) 7 NB. formula for discrete densities >> 4.01814 >> >> This generalizes to the day of the year birthday process, >> >> plot (0 0 , cdf) 365 >> mean 365 >> 24.6166 >> >> mean f. >> (2 + i.) +/ .* +/\^:_1@:(1 - */\@:(1 - ] %~ 1 + i.)) >> >> >> >> ________________________________________ >> From: programming-boun...@jsoftware.com [programming-boun...@jsoftware.com >> ] >> On Behalf Of Mike Day [mike_liz....@tiscali.co.uk] >> Sent: Friday, January 20, 2012 7:54 PM >> To: Programming forum >> Subject: Re: [Jprogramming] Challenge 4 Bountiful Birthdays >> >> My "trial" function, listed earlier (and below) was >> not quite correct, as it failed to count the >> successful person. >> >> So it should be: >> >> trialb =: ([: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0 >> >> So we get, for example (but it's very slow! My variant >> triala discussed with Linda is somewhat better): >> >> (mean, stdev) mean trialb 5000 100 $ _1 >> 24.6133 0.180788 >> >> Linda thinks the mean should be somewhat lower, and >> Brian thinks it's a lot lower. However, the standard >> deviation suggests it's close to the true value. >> >> I think this is the way to find the true expected number >> of people. We don't need Markov after all: >> >> Probability that (n-1) arrivals all have different >> b/days: >> >> q =: Prod (1 - i%Y), 0<: i <: n-2, Y =~ 365 >> >> Probability that the nth arrival's b/day is one of >> those present, ie is one of n-1 distinct bdays: >> >> p =: (n-1) % Y >> >> Expected value of number of arrivals for "success": >> >> Sum (2+i) pi * qi, 0 <: i <: n-2 >> >> In J: >> 5{. q =: */\(1 - (365 %~(i.))) 1000 >> 1 0.99726 0.991796 0.983644 0.972864 >> >> 5 {. p =: (365 %~>:@i. )1000 >> 0.00273973 0.00547945 0.00821918 0.0109589 0.0136986 >> >> +/(2+i.1000) * p * q NB. expected value >> 24.6166 >> >> This is not the same as the median, where the >> probability q moves below 0.5, >> >> 21 22 { q >> 0.524305 0.492703 >> >> As Roger observes, the index origin comes into play; >> we should add one as the first person is 1, not zero (!) >> and the median group size is therefore just below 23. >> >> This last is dealing with a slightly different problem: >> what is the probability that a certain sized group of >> people do (not) share a birthday? So we shouldn't be >> surprised at the difference. >> >> Mike >> >> On 18/01/2012 3:17 PM, Mike Day wrote: >> > People seem to be tackling two different problems. >> > >> > Variations on the Birthday Problem as I remember them: >> > (a) what is the probability that two (or more) people >> > share a birthday in a group of N people? >> > (b) what should N be for the probability to be (say) 0.5 ? >> > The somewhat counter-intuitive answers are dealt with in >> > Roger's Wiki Essay, among many treatments, and also >> > Pablo's message, below. The essential point is to >> > consider the probability that there are no matches. >> > >> > However, Linda's single trial as stated is a random >> > process with a stopping condition: >> > take one person at a time until the new person shares a >> > birthday with those already present. The result is the >> > number of people including the new arrival. >> > >> > I expect you need a Markov Process approach to get the >> > exact expected value for the stopping number. Not proved! >> > >> > Here's a stab at the required simulation, avoiding @ and @: >> > though using [: >> > >> > NB. I use _1 as seed, so need to decrement the count >> > >> > trial =: (_1 + [: # (] (,`]@.e.~ ([: ? 365"_)))^:_)"0 >> > >> > trial 10#_1 NB. eg conduct 10 trials >> > 27 19 29 2 24 42 30 9 34 33 >> > >> > mean =: +/%# NB. ok for vectors or columns of matrix >> > >> > ([:(;~mean) mean) TRIALS =: trial 500 10 $ _1 >> > >> >> +-------+------------------------------------------------------------------- >> + >> > >> > |23.5882|22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81 >> > 23.654| >> > >> >> +-------+------------------------------------------------------------------- >> + >> > >> > >> > These means are indeed close to N in problems >> > (a) & (b) where the probability is ~0.50, namely >> > 21 for 0.475695 and 22 for 0.507297, but not the >> > same. >> > >> > I used 365 rather than Pablo's 365.25 . The simulation >> > could be done for 365.25, using the integer 1461 (say). >> > The stopping condition would be a bit more complicated. >> > >> > The deviation of trials is quite large: >> > SS =: [: *: (-"1 mean) NB. squared deviations from mean >> > stdev=: [: %: [: mean SS NB. Observed Standard deviation >> > NB. not necessarily recommended for real, large sets of data >> > >> > (mean,:stdev) TRIALS >> > 22.696 23.676 23.894 24.044 23.874 23.56 24.258 23.416 22.81 >> > 23.654 >> > 11.9378 12.6587 12.6917 12.5288 12.281 11.9741 12.1957 11.442 12.0969 >> > 12.8718 >> > >> > NB. standard deviation of the means: >> > >> > (mean, stdev) mean TRIALS >> > 23.5882 0.477041 >> > >> > Mike >> > >> > >> > >> > >> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm >> > > > > -- > Devon McCormick, CFA > ^me^ at acm. > org is my > preferred e-mail > ---------------------------------------------------------------------- > For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm