Re: [math] Improving speed of UnitSphereRandomVectorGenerator

2013-01-30 Thread Gilles

Hi.

On Wed, 30 Jan 2013 11:54:50 +, Sean Owen wrote:

Hello all, I have a small proposal to improve the speed of
UnitSphereRandomVectorGenerator. This class picks a random point on
the unit n-sphere -- a unit vector, chosen uniformly from all 
possible

directions.

It does so using a rejection process -- generates a random point in
the unit n-cube (well, with side lengths 2) and rejects any points
outside the unit n-sphere, then normalizes the length. This is 
correct

and works well at low dimension. However the volume of the unit
n-sphere compared to the unit n-cube drops exponentially. This method
eventually takes an extraordinary amount of time when dimensions get
past about 20, since virtually no samples are usable.

For example, here is the time taken to make pick 10 points as a
function of dimension up to 20:

1 : 11
2 : 1
3 : 0
4 : 1
5 : 0
6 : 1
7 : 1
8 : 17
9 : 4
10 : 3
11 : 13
12 : 32
13 : 15
14 : 41
15 : 220
16 : 897
17 : 1770
18 : 7426
19 : 48457
20 : 122647
...

It's equally correct, and much faster, to generate these points by
picking n standard Gaussians and normalizing. This method takes
negligible time even into thousands of dimensions.

Is this non-trivial enough that I might propose a patch? I wanted to
ask users first.


Patch certainly welcome!
Also, please open a ticket on the bug-tracking system.

Thanks,
Gilles


-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [math] Improving speed of UnitSphereRandomVectorGenerator

2013-01-30 Thread Thomas Neidhart
On Wed, Jan 30, 2013 at 4:56 PM, Gilles gil...@harfang.homelinux.orgwrote:

 Hi.


 On Wed, 30 Jan 2013 11:54:50 +, Sean Owen wrote:

 Hello all, I have a small proposal to improve the speed of
 UnitSphereRandomVectorGenerato**r. This class picks a random point on
 the unit n-sphere -- a unit vector, chosen uniformly from all possible
 directions.

 It does so using a rejection process -- generates a random point in
 the unit n-cube (well, with side lengths 2) and rejects any points
 outside the unit n-sphere, then normalizes the length. This is correct
 and works well at low dimension. However the volume of the unit
 n-sphere compared to the unit n-cube drops exponentially. This method
 eventually takes an extraordinary amount of time when dimensions get
 past about 20, since virtually no samples are usable.

 For example, here is the time taken to make pick 10 points as a
 function of dimension up to 20:

 1 : 11
 2 : 1
 3 : 0
 4 : 1
 5 : 0
 6 : 1
 7 : 1
 8 : 17
 9 : 4
 10 : 3
 11 : 13
 12 : 32
 13 : 15
 14 : 41
 15 : 220
 16 : 897
 17 : 1770
 18 : 7426
 19 : 48457
 20 : 122647
 ...

 It's equally correct, and much faster, to generate these points by
 picking n standard Gaussians and normalizing. This method takes
 negligible time even into thousands of dimensions.

 Is this non-trivial enough that I might propose a patch? I wanted to
 ask users first.


 Patch certainly welcome!
 Also, please open a ticket on the bug-tracking system.


ohoh, I had the reply open and was distracted, and in the mean time you
already responded ;-).

Thomas