Re: [gentoo-user] OT: Mapping random numbers (PRNG)
On Sat, Jun 07, 2014 at 12:03:29AM +0300, Matti Nykyri wrote: On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote: On Thu, Jun 5, 2014 at 9:56 PM, meino.cra...@gmx.de wrote: Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. So it's a 32 bit integer. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. You mean *characters* int the range [A-Za-z0-9]? Well this isn't as simple problem as it sounds. A random 32 bit integer has 32 bits of randomness. If you take a divison reminder of 62 from this integer you will get only 5,95419631039 bits of randomness (log(62)/log(2)). So you are wasting 81,4% of your random data. Which is quite much and usually random data is quite expensive. You can save your precious random data by taking only 6 bit from your 32 bit integer and dividing it by 62. Then you will be wasting only 0,8% of random data. Another problem is alignment, but that is about mathematical correctness. How can I do this mathemtically (in concern of the quality of output) correct? The easiest thing to do would be: The easiest is not mathematically correct though. Random data will stay random only if you select and modify it so that randomness is preserved. If you take devison reminder of 62 from 32 bit integer there are 69 273 667 possibilities of the reminder to be 3 or less. For the reminder to 4 or more the number of possibilities is 69 273 666. In mathematically ideal case the probability for every index of the list should be same: 1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%. If you wish not to waste those random bits the probabilities will get worse. With 6 bits of random the probability for index 0-1 will be 2/64 and for 2-63 it will be 1/64. This is a very significant change because first and second index will appear twice as much as the rest. If you add 2 characters to your list you will perfect alignment and you can take 6 bits of data without it modifying probabilities. If you are looking a mathematically perfect solution there is a simple one even if your list is not in the power of 2! Take 6 bits at a time of the random data. If the result is 62 or 63 you will discard the data and get the next 6 bits. This selectively modifies the random data but keeps the probabilities in correct balance. Now the probability for index of 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. --- #include time.h #include stdio.h #include stdlib.h #define N (26+26+10) static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; int next_character() { // Use the correct call for ISAAC instead of rand() unsigned int idx = rand() % N; return S[idx]; } so modify the next_char function: char next_character() { static unsigned int rand = 0; //(sizeof(int) = 32) static char bit_avail = 0; char result = 0; char move_bits = 0; char bits_moved = 0; do { if (!bits_avail) { // Use the correct call for ISAAC instead of rand() rand = rand(); bit_avail = 32; } move_bits = bits_avail = 6 ? 6 : bits_avail; result = move_bits; result = (result | rand (0xFF (8 - move_bits))) 0x3F; bits_avail -= move_bits; bits_moved += move_bits; rand = move_bits; } while (bits_moved != 6 result 61); return result; } Well actually it looks simpler if you break this like this: unsigned char get_6bits () { static unsigned int rand = 0; //(sizeof(int) = 32) static char bits_avail = 0; unsigned char result = 0; //get 2 bits 3 times: 32 is devidable by 2 for (int i = 0; i 3; i++) { // --std=c99 //Fill buffer if it is empty! if (!bits_avail || bits_avail 0 ) { //if bits_avail 0 it is an error!
Re: [gentoo-user] OT: Mapping random numbers (PRNG)
Am Sat, 7 Jun 2014 12:19:11 +0300 schrieb Matti Nykyri matti.nyk...@iki.fi: On Sat, Jun 07, 2014 at 12:03:29AM +0300, Matti Nykyri wrote: [...] unsigned char get_6bits () { static unsigned int rand = 0; //(sizeof(int) = 32) Just an itty bitty nitpick: since you already require C99 as per the for loop below, you might as well use uint32_t (from stdint.h) instead of assuming that sizeof(int) == 32 :) . static char bits_avail = 0; unsigned char result = 0; //get 2 bits 3 times: 32 is devidable by 2 for (int i = 0; i 3; i++) { // --std=c99 //Fill buffer if it is empty! if (!bits_avail || bits_avail 0 ) { //if bits_avail 0 it is an error! // Use the correct call for ISAAC instead of rand() rand = rand(); bits_avail = 32; } result = 2; //move two bits to left. result = result | (rand 0x3); //add two least signifigant bits to the result. rand = 2; //move two bits to right. bits_avail -= 2; } return result; //result has 6 bits of random data... } char next_character() { unsigned char idx = 0; do { idx = get_6bits(); } while (idx 61); return S[idx]; } Very simple :) [...] -- Marc Joliet -- People who think they know everything really annoy those of us who know we don't - Bjarne Stroustrup signature.asc Description: PGP signature
Re: [gentoo-user] OT: Mapping random numbers (PRNG)
Canek Peláez Valdés can...@gmail.com [14-06-06 17:36]: On Thu, Jun 5, 2014 at 9:56 PM, meino.cra...@gmx.de wrote: Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. So it's a 32 bit integer. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. You mean *characters* int the range [A-Za-z0-9]? How can I do this mathemtically (in concern of the quality of output) correct? The easiest thing to do would be: --- #include time.h #include stdio.h #include stdlib.h #define N (26+26+10) static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; int next_character() { // Use the correct call for ISAAC instead of rand() unsigned int idx = rand() % N; return S[idx]; } int main(int argc, char* argv[]) { // Use the correct call for initializing the ISAAC seed srand((unsigned int)time(NULL)); for (int i = 0; i 20; i++) // --std=c99 printf(%c\n, next_character()); return 0; } --- If the ISAAC RNG has a good distribution, then the next_character() function will give a good distribution among the set [A-Za-z0-9]. Unless I missunderstood what you meant with create random numbers in the range of [a-Za-z0-9]. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México Hi, Thank you very much for the input! :) I have a question about the algorithm: Suppose rand() has an equal distribution of numbers and furthermore one has a count of 2^32 random numbers listed in numerical sort order. In this list each number would appear (nearly) with the same count: 1 To get an better imagination of that...suppose the rand() would only return numbers in the range of 1...12 and the alphabet has only 8 characters (as 2^32 is not devideable by 62) rand(): 1 2 3 4 5 6 7 8 9 10 11 12 rand()%N : rand()%7 1 2 3 4 5 6 7 0 1 2 3 4 or in other words: An even distribution of numbers of rand() would result in a unevenly distributed sequence of characters...or? This would break the quality of ISAACs output. I am sure I did something wrong here...but where is the logic trap? Thank you very much for any help in advance! Best regards, mcc
Re: [gentoo-user] OT: Mapping random numbers (PRNG)
On 06/06/14 20:39, meino.cra...@gmx.de wrote: To get an better imagination of that...suppose the rand() would only return numbers in the range of 1...12 and the alphabet has only 8 characters (as 2^32 is not devideable by 62) rand(): 1 2 3 4 5 6 7 8 9 10 11 12 rand()%N : rand()%7 1 2 3 4 5 6 7 0 1 2 3 4 or in other words: An even distribution of numbers of rand() would result in a unevenly distributed sequence of characters...or? This would break the quality of ISAACs output. I am sure I did something wrong here...but where is the logic trap? You thought on that is totally right. Let's say random numbers have the rande [0..RAND_MAX] Blindly calculating modulus creates a bias towards lower numbers if RAND_MAX+1 is not divisible by N. For most applications this bias is negligible. If you really want the same distribution you have to discard numbers greater than RAND_MAX - ((RAND_MAX+1)%N) (Beware the undefined behaviour in the above line if you try C.) Basically you want one less than the largest by N dividable number in your range. With numbers: rand(): 0 1 2 3 4 5 6 7 8 9 10 11 12 rand()%5 with discarding 12-((12+1)%5)=9: 0 1 2 3 4 0 1 2 3 4 discarded --- null_ptr Please don't follow me.
Re: [gentoo-user] OT: Mapping random numbers (PRNG)
On Fri, Jun 6, 2014 at 1:39 PM, meino.cra...@gmx.de wrote: Canek Peláez Valdés can...@gmail.com [14-06-06 17:36]: On Thu, Jun 5, 2014 at 9:56 PM, meino.cra...@gmx.de wrote: Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. So it's a 32 bit integer. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. You mean *characters* int the range [A-Za-z0-9]? How can I do this mathemtically (in concern of the quality of output) correct? The easiest thing to do would be: --- #include time.h #include stdio.h #include stdlib.h #define N (26+26+10) static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; int next_character() { // Use the correct call for ISAAC instead of rand() unsigned int idx = rand() % N; return S[idx]; } int main(int argc, char* argv[]) { // Use the correct call for initializing the ISAAC seed srand((unsigned int)time(NULL)); for (int i = 0; i 20; i++) // --std=c99 printf(%c\n, next_character()); return 0; } --- If the ISAAC RNG has a good distribution, then the next_character() function will give a good distribution among the set [A-Za-z0-9]. Unless I missunderstood what you meant with create random numbers in the range of [a-Za-z0-9]. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México Hi, Thank you very much for the input! :) I have a question about the algorithm: Suppose rand() has an equal distribution of numbers and furthermore one has a count of 2^32 random numbers listed in numerical sort order. In this list each number would appear (nearly) with the same count: 1 To get an better imagination of that...suppose the rand() would only return numbers in the range of 1...12 and the alphabet has only 8 characters (as 2^32 is not devideable by 62) rand(): 1 2 3 4 5 6 7 8 9 10 11 12 rand()%N : rand()%7 1 2 3 4 5 6 7 0 1 2 3 4 or in other words: An even distribution of numbers of rand() would result in a unevenly distributed sequence of characters...or? This would break the quality of ISAACs output. I am sure I did something wrong here...but where is the logic trap? In theory, it doesn't matter if the number of random characters you want is not multiple of the size of the set of characters. In practice, it doesn't matter if the number of random characters you want is big enough. Consider the following modification to my program (I changed the order of the array S so it's sorted and I can do binary search): -- #include time.h #include stdio.h #include stdlib.h #define N (10+26+26) static char S[] = { '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z' }; static int index_of_aux(char c, int a, int b) { if (a b) return -1; int m = (a + b) / 2; if (S[m] == c) return m; if (S[m] c) return index_of_aux(c, a, m-1); return index_of_aux(c, m+1, b); } static int index_of(char c) { return index_of_aux(c, 0, N-1); } int next_character() { // Use the correct call for isaac instead of rand() unsigned int idx = rand() % N; return S[idx]; } int main(int argc, char* argv[]) { srand((unsigned int)time(NULL)); int total = 100; // Size = 62 int count[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }; for (int i = 0; i total; i++) { char c = next_character(); count[index_of(c)]++; } int min =
Re: [gentoo-user] OT: Mapping random numbers (PRNG)
On Thu, Jun 05, 2014 at 10:58:51PM -0500, Canek Peláez Valdés wrote: On Thu, Jun 5, 2014 at 9:56 PM, meino.cra...@gmx.de wrote: Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. So it's a 32 bit integer. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. You mean *characters* int the range [A-Za-z0-9]? Well this isn't as simple problem as it sounds. A random 32 bit integer has 32 bits of randomness. If you take a divison reminder of 62 from this integer you will get only 5,95419631039 bits of randomness (log(62)/log(2)). So you are wasting 81,4% of your random data. Which is quite much and usually random data is quite expensive. You can save your precious random data by taking only 6 bit from your 32 bit integer and dividing it by 62. Then you will be wasting only 0,8% of random data. Another problem is alignment, but that is about mathematical correctness. How can I do this mathemtically (in concern of the quality of output) correct? The easiest thing to do would be: The easiest is not mathematically correct though. Random data will stay random only if you select and modify it so that randomness is preserved. If you take devison reminder of 62 from 32 bit integer there are 69 273 667 possibilities of the reminder to be 3 or less. For the reminder to 4 or more the number of possibilities is 69 273 666. In mathematically ideal case the probability for every index of the list should be same: 1/62 = 1,61290322581%. But the modulo 62 modifies this probability: for index 0-3 the probability is 69 273 667/2^32 = 1,61290324759%. And for indexes 4-61 the probability will be 69 273 666/2^32 = 1,6129032243%. If you wish not to waste those random bits the probabilities will get worse. With 6 bits of random the probability for index 0-1 will be 2/64 and for 2-63 it will be 1/64. This is a very significant change because first and second index will appear twice as much as the rest. If you add 2 characters to your list you will perfect alignment and you can take 6 bits of data without it modifying probabilities. If you are looking a mathematically perfect solution there is a simple one even if your list is not in the power of 2! Take 6 bits at a time of the random data. If the result is 62 or 63 you will discard the data and get the next 6 bits. This selectively modifies the random data but keeps the probabilities in correct balance. Now the probability for index of 0-61 is 1/62 because the probability to get 62-63 out of 64 if 0. --- #include time.h #include stdio.h #include stdlib.h #define N (26+26+10) static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; int next_character() { // Use the correct call for ISAAC instead of rand() unsigned int idx = rand() % N; return S[idx]; } so modify the next_char function: char next_character() { static unsigned int rand = 0; //(sizeof(int) = 32) static char bit_avail = 0; char result = 0; char move_bits = 0; char bits_moved = 0; do { if (!bits_avail) { // Use the correct call for ISAAC instead of rand() rand = rand(); bit_avail = 32; } move_bits = bits_avail = 6 ? 6 : bits_avail; result = move_bits; result = (result | rand (0xFF (8 - move_bits))) 0x3F; bits_avail -= move_bits; bits_moved += move_bits; rand = move_bits; } while (bits_moved != 6 result 61); return result; } This function will give perfect distribution of 1/62 probability for every index. It will waste 6 bits with the probability of 1/32 (2/64). int main(int argc, char* argv[]) { // Use the correct call for initializing the ISAAC seed srand((unsigned int)time(NULL)); for (int i = 0; i 20; i++) // --std=c99 printf(%c\n, next_character()); return 0; } --- If the ISAAC RNG has a good distribution, then the next_character() function will give a good distribution
Re: [gentoo-user] OT: Mapping random numbers (PRNG)
On Fri, Jun 06, 2014 at 08:39:28PM +0200, meino.cra...@gmx.de wrote: Canek Peláez Valdés can...@gmail.com [14-06-06 17:36]: On Thu, Jun 5, 2014 at 9:56 PM, meino.cra...@gmx.de wrote: Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. So it's a 32 bit integer. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. You mean *characters* int the range [A-Za-z0-9]? How can I do this mathemtically (in concern of the quality of output) correct? The easiest thing to do would be: --- #include time.h #include stdio.h #include stdlib.h #define N (26+26+10) static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; int next_character() { // Use the correct call for ISAAC instead of rand() unsigned int idx = rand() % N; return S[idx]; } int main(int argc, char* argv[]) { // Use the correct call for initializing the ISAAC seed srand((unsigned int)time(NULL)); for (int i = 0; i 20; i++) // --std=c99 printf(%c\n, next_character()); return 0; } --- If the ISAAC RNG has a good distribution, then the next_character() function will give a good distribution among the set [A-Za-z0-9]. Unless I missunderstood what you meant with create random numbers in the range of [a-Za-z0-9]. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México Hi, Thank you very much for the input! :) I have a question about the algorithm: Suppose rand() has an equal distribution of numbers and furthermore one has a count of 2^32 random numbers listed in numerical sort order. In this list each number would appear (nearly) with the same count: 1 To get an better imagination of that...suppose the rand() would only return numbers in the range of 1...12 and the alphabet has only 8 characters (as 2^32 is not devideable by 62) rand(): 1 2 3 4 5 6 7 8 9 10 11 12 rand()%N : rand()%7 1 2 3 4 5 6 7 0 1 2 3 4 or in other words: An even distribution of numbers of rand() would result in a unevenly distributed sequence of characters...or? This would break the quality of ISAACs output. I am sure I did something wrong here...but where is the logic trap? This is the thing I explained in my message. -- -Matti
[gentoo-user] OT: Mapping random numbers (PRNG)
Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. How can I do this mathemtically (in concern of the quality of output) correct? Thank you very much in advance for any help! Best regards, mcc
Re: [gentoo-user] OT: Mapping random numbers (PRNG)
On Thu, Jun 5, 2014 at 9:56 PM, meino.cra...@gmx.de wrote: Hi, I am experimenting with the C code of the ISAAC pseudo random number generator (http://burtleburtle.net/bob/rand/isaacafa.html). Currently the implementation creates (on my embedded linux) 32 bit hexadecimal output. So it's a 32 bit integer. From this I want to create random numbers in the range of [a-Za-z0-9] *without violating randomness* and (if possible) without throwing away bits of the output. You mean *characters* int the range [A-Za-z0-9]? How can I do this mathemtically (in concern of the quality of output) correct? The easiest thing to do would be: --- #include time.h #include stdio.h #include stdlib.h #define N (26+26+10) static char S[] = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9' }; int next_character() { // Use the correct call for ISAAC instead of rand() unsigned int idx = rand() % N; return S[idx]; } int main(int argc, char* argv[]) { // Use the correct call for initializing the ISAAC seed srand((unsigned int)time(NULL)); for (int i = 0; i 20; i++) // --std=c99 printf(%c\n, next_character()); return 0; } --- If the ISAAC RNG has a good distribution, then the next_character() function will give a good distribution among the set [A-Za-z0-9]. Unless I missunderstood what you meant with create random numbers in the range of [a-Za-z0-9]. Regards. -- Canek Peláez Valdés Profesor de asignatura, Facultad de Ciencias Universidad Nacional Autónoma de México