Re: [julia-users] randperm run time is slow

2016-01-25 Thread Stefan Karpinski
The algorithm was fine, the issue was that sampling from a range of
integers was heavily optimized for the case of sampling repeatedly from the
same interval, rather than from different intervals. Since the Fisher-Yates
shuffle changes the interval for every sample, it was effectively a worst
case scenario. Rafael posted a change that gets around that worst case by
not calling the version that's optimized for sampling from the same
interval repeatedly.

On Sunday, January 24, 2016, Dan  wrote:

> As requested I deleted the post. But algorithms for creating permutations
> are standard and very much in the public domain (what does
> TheArtOfProgramming say?). If someone really invests a little effort into
> it, a more formally correct algorithm can be implemented.
>
> On Saturday, January 23, 2016 at 4:43:36 PM UTC+2, Kevin Squire wrote:
>>
>> Hi Dan,
>>
>> It would also be good if you deleted that post (or edited it and removed
>> the code), for the same reason Viral mentioned: if someone reads the post
>> and then creates a pull request for changing the Julia implementation, it
>> could be argued that that implementation was derived from GPL code, which
>> makes it GPL.  A clean-room implementation (such as creating the patch from
>> a higher level description of the code) is okay.
>>
>> (Viral, it would probably be good for you to remove the code from your
>> post as well.  I did so in this post below.)
>>
>> Cheers,
>>Kevin
>>
>> On Saturday, January 23, 2016 at 6:30:07 AM UTC-8, Viral Shah wrote:
>>>
>>> Unfortunately, this makes the implementation GPL. If you can describe
>>> the algorithm in an issue on github, someone can do a cleanroom
>>> implementation.
>>>
>>> -viral
>>>
>>> On Saturday, January 23, 2016 at 3:17:19 PM UTC+5:30, Dan wrote:

 Another way to go about it, is to look at R's implementation of a
 random permutation and recreate it in Julia, hoping for similar
 performance. Having done so, the resulting Julia code is:

 
 A size 1,000,000 permutation was generated x2 faster with this method.
 But, this method uses uniform floating random numbers, which might not
 be uniform on integers for large enough numbers. In general, it should be
 possible to optimize a more correct implementation. But if R envy is a
 driver, R performance is recovered with a pure Julia implementation (the R
 implementation is in C).

 On Saturday, January 23, 2016 at 7:26:49 AM UTC+2, Rafael Fourquet
 wrote:
>
> > Let's capture this as a Julia performance issue on github,
> > if we can't figure out an easy way to speed this up right away.
>
> I think I remember having identified a potentially sub-optimal
> implementation of this function few weeks back (perhaps no more than
> what Tim suggested) and had planned to investigate further (when time
> permits...)
>



Re: [julia-users] randperm run time is slow

2016-01-24 Thread Dan
A good description of random permutation generation is given in Wikipedia. 
It's the Fisher-Yates algorithm and its variants.

On Saturday, January 23, 2016 at 4:43:36 PM UTC+2, Kevin Squire wrote:
>
> Hi Dan, 
>
> It would also be good if you deleted that post (or edited it and removed 
> the code), for the same reason Viral mentioned: if someone reads the post 
> and then creates a pull request for changing the Julia implementation, it 
> could be argued that that implementation was derived from GPL code, which 
> makes it GPL.  A clean-room implementation (such as creating the patch from 
> a higher level description of the code) is okay.
>
> (Viral, it would probably be good for you to remove the code from your 
> post as well.  I did so in this post below.)
>
> Cheers,
>Kevin
>
> On Saturday, January 23, 2016 at 6:30:07 AM UTC-8, Viral Shah wrote:
>>
>> Unfortunately, this makes the implementation GPL. If you can describe the 
>> algorithm in an issue on github, someone can do a cleanroom implementation.
>>
>> -viral
>>
>> On Saturday, January 23, 2016 at 3:17:19 PM UTC+5:30, Dan wrote:
>>>
>>> Another way to go about it, is to look at R's implementation of a random 
>>> permutation and recreate it in Julia, hoping for similar performance. 
>>> Having done so, the resulting Julia code is:
>>>
>>> 
>>> A size 1,000,000 permutation was generated x2 faster with this method.
>>> But, this method uses uniform floating random numbers, which might not 
>>> be uniform on integers for large enough numbers. In general, it should be 
>>> possible to optimize a more correct implementation. But if R envy is a 
>>> driver, R performance is recovered with a pure Julia implementation (the R 
>>> implementation is in C).
>>>  
>>> On Saturday, January 23, 2016 at 7:26:49 AM UTC+2, Rafael Fourquet wrote:

 > Let's capture this as a Julia performance issue on github, 
 > if we can't figure out an easy way to speed this up right away. 

 I think I remember having identified a potentially sub-optimal 
 implementation of this function few weeks back (perhaps no more than 
 what Tim suggested) and had planned to investigate further (when time 
 permits...) 

>>>

Re: [julia-users] randperm run time is slow

2016-01-24 Thread Dan
As requested I deleted the post. But algorithms for creating permutations 
are standard and very much in the public domain (what does 
TheArtOfProgramming say?). If someone really invests a little effort into 
it, a more formally correct algorithm can be implemented.

On Saturday, January 23, 2016 at 4:43:36 PM UTC+2, Kevin Squire wrote:
>
> Hi Dan, 
>
> It would also be good if you deleted that post (or edited it and removed 
> the code), for the same reason Viral mentioned: if someone reads the post 
> and then creates a pull request for changing the Julia implementation, it 
> could be argued that that implementation was derived from GPL code, which 
> makes it GPL.  A clean-room implementation (such as creating the patch from 
> a higher level description of the code) is okay.
>
> (Viral, it would probably be good for you to remove the code from your 
> post as well.  I did so in this post below.)
>
> Cheers,
>Kevin
>
> On Saturday, January 23, 2016 at 6:30:07 AM UTC-8, Viral Shah wrote:
>>
>> Unfortunately, this makes the implementation GPL. If you can describe the 
>> algorithm in an issue on github, someone can do a cleanroom implementation.
>>
>> -viral
>>
>> On Saturday, January 23, 2016 at 3:17:19 PM UTC+5:30, Dan wrote:
>>>
>>> Another way to go about it, is to look at R's implementation of a random 
>>> permutation and recreate it in Julia, hoping for similar performance. 
>>> Having done so, the resulting Julia code is:
>>>
>>> 
>>> A size 1,000,000 permutation was generated x2 faster with this method.
>>> But, this method uses uniform floating random numbers, which might not 
>>> be uniform on integers for large enough numbers. In general, it should be 
>>> possible to optimize a more correct implementation. But if R envy is a 
>>> driver, R performance is recovered with a pure Julia implementation (the R 
>>> implementation is in C).
>>>  
>>> On Saturday, January 23, 2016 at 7:26:49 AM UTC+2, Rafael Fourquet wrote:

 > Let's capture this as a Julia performance issue on github, 
 > if we can't figure out an easy way to speed this up right away. 

 I think I remember having identified a potentially sub-optimal 
 implementation of this function few weeks back (perhaps no more than 
 what Tim suggested) and had planned to investigate further (when time 
 permits...) 

>>>

Re: [julia-users] randperm run time is slow

2016-01-23 Thread Dan
Another way to go about it, is to look at R's implementation of a random 
permutation and recreate it in Julia, hoping for similar performance. 
Having done so, the resulting Julia code is:

function nrandperm(r::AbstractRNG, n::Integer)
   res = Array(typeof(n),n)
   if n == 0
  return res
   end
   a = typeof(n)[i for i=1:n]
   nn = n
   @inbounds for i = 1:n
   j = floor(Int,nn*rand(r))+1
   res[i] = a[j]
   a[j] = a[nn]
   nn -= 1
   end
   return res
end
nrandperm(n::Integer) = nrandperm(Base.Random.GLOBAL_RNG, n)

A size 1,000,000 permutation was generated x2 faster with this method.
But, this method uses uniform floating random numbers, which might not be 
uniform on integers for large enough numbers. In general, it should be 
possible to optimize a more correct implementation. But if R envy is a 
driver, R performance is recovered with a pure Julia implementation (the R 
implementation is in C).
 
On Saturday, January 23, 2016 at 7:26:49 AM UTC+2, Rafael Fourquet wrote:
>
> > Let's capture this as a Julia performance issue on github, 
> > if we can't figure out an easy way to speed this up right away. 
>
> I think I remember having identified a potentially sub-optimal 
> implementation of this function few weeks back (perhaps no more than 
> what Tim suggested) and had planned to investigate further (when time 
> permits...) 
>


Re: [julia-users] randperm run time is slow

2016-01-23 Thread Viral Shah
Unfortunately, this makes the implementation GPL. If you can describe the 
algorithm in an issue on github, someone can do a cleanroom implementation.

-viral

On Saturday, January 23, 2016 at 3:17:19 PM UTC+5:30, Dan wrote:
>
> Another way to go about it, is to look at R's implementation of a random 
> permutation and recreate it in Julia, hoping for similar performance. 
> Having done so, the resulting Julia code is:
>
> function nrandperm(r::AbstractRNG, n::Integer)
>res = Array(typeof(n),n)
>if n == 0
>   return res
>end
>a = typeof(n)[i for i=1:n]
>nn = n
>@inbounds for i = 1:n
>j = floor(Int,nn*rand(r))+1
>res[i] = a[j]
>a[j] = a[nn]
>nn -= 1
>end
>return res
> end
> nrandperm(n::Integer) = nrandperm(Base.Random.GLOBAL_RNG, n)
>
> A size 1,000,000 permutation was generated x2 faster with this method.
> But, this method uses uniform floating random numbers, which might not be 
> uniform on integers for large enough numbers. In general, it should be 
> possible to optimize a more correct implementation. But if R envy is a 
> driver, R performance is recovered with a pure Julia implementation (the R 
> implementation is in C).
>  
> On Saturday, January 23, 2016 at 7:26:49 AM UTC+2, Rafael Fourquet wrote:
>>
>> > Let's capture this as a Julia performance issue on github, 
>> > if we can't figure out an easy way to speed this up right away. 
>>
>> I think I remember having identified a potentially sub-optimal 
>> implementation of this function few weeks back (perhaps no more than 
>> what Tim suggested) and had planned to investigate further (when time 
>> permits...) 
>>
>

Re: [julia-users] randperm run time is slow

2016-01-23 Thread Kevin Squire
Hi Dan, 

It would also be good if you deleted that post (or edited it and removed 
the code), for the same reason Viral mentioned: if someone reads the post 
and then creates a pull request for changing the Julia implementation, it 
could be argued that that implementation was derived from GPL code, which 
makes it GPL.  A clean-room implementation (such as creating the patch from 
a higher level description of the code) is okay.

(Viral, it would probably be good for you to remove the code from your post 
as well.  I did so in this post below.)

Cheers,
   Kevin

On Saturday, January 23, 2016 at 6:30:07 AM UTC-8, Viral Shah wrote:
>
> Unfortunately, this makes the implementation GPL. If you can describe the 
> algorithm in an issue on github, someone can do a cleanroom implementation.
>
> -viral
>
> On Saturday, January 23, 2016 at 3:17:19 PM UTC+5:30, Dan wrote:
>>
>> Another way to go about it, is to look at R's implementation of a random 
>> permutation and recreate it in Julia, hoping for similar performance. 
>> Having done so, the resulting Julia code is:
>>
>> 
>> A size 1,000,000 permutation was generated x2 faster with this method.
>> But, this method uses uniform floating random numbers, which might not be 
>> uniform on integers for large enough numbers. In general, it should be 
>> possible to optimize a more correct implementation. But if R envy is a 
>> driver, R performance is recovered with a pure Julia implementation (the R 
>> implementation is in C).
>>  
>> On Saturday, January 23, 2016 at 7:26:49 AM UTC+2, Rafael Fourquet wrote:
>>>
>>> > Let's capture this as a Julia performance issue on github, 
>>> > if we can't figure out an easy way to speed this up right away. 
>>>
>>> I think I remember having identified a potentially sub-optimal 
>>> implementation of this function few weeks back (perhaps no more than 
>>> what Tim suggested) and had planned to investigate further (when time 
>>> permits...) 
>>>
>>

Re: [julia-users] randperm run time is slow

2016-01-23 Thread Rafael Fourquet
The problem I think came essentially from the repeated creation of
RangeGenerator objects, cf.
https://github.com/JuliaLang/julia/pull/14772.


[julia-users] randperm run time is slow

2016-01-22 Thread Brian Lucena
I am running a simulation that requires me to generate random permutations 
of size 300 million.  I would ideally like to run this generation in a loop 
10K or 100K times. I am surprised that randperm is not faster in Julia than 
it is.  It seems to be considerably slower than the equivalent in R (and R 
is clearly not known for speed)  :) 

In Julia:

*julia> **@time asd=randperm(3)*

 43.437829 seconds (6 allocations: 2.235 GB, 0.01% gc time)

*3-element Array{Int64,1}:*


In R


> start = Sys.time()

> asd = sample(3)

> Sys.time()-start

Time difference of 23.27244 secs

Julia seems to be twice as slow!

Any thoughts on why that is?  Does randperm use the "knuth shuffle" or does 
it use some other algorithm.

Thanks,
Brian



Re: [julia-users] randperm run time is slow

2016-01-22 Thread Rafael Fourquet
> Let's capture this as a Julia performance issue on github,
> if we can't figure out an easy way to speed this up right away.

I think I remember having identified a potentially sub-optimal
implementation of this function few weeks back (perhaps no more than
what Tim suggested) and had planned to investigate further (when time
permits...)


Re: [julia-users] randperm run time is slow

2016-01-22 Thread Viral Shah
Generating one by one or together is a small part of the execution time:

julia> @time rand(3*10^8);
  1.281946 seconds (10 allocations: 2.235 GB, 0.23% gc time)

julia> r(n) = for i=1:n; rand(); end;
julia> @time r(3*10^8)
  0.632389 seconds (6 allocations: 192 bytes)

The one by one is perhaps faster because it doesn't have the array 
allocation overhead. So, it 


-viral

On Saturday, January 23, 2016 at 6:53:38 AM UTC+5:30, Tim Holy wrote:
>
> Try 
> @edit randperm(10) 
> and see for yourself. 
>
> My bet is that you could speed it up by generating all the random numbers 
> you'll need in one go, rather than generating them one-by-one. Want to 
> give it 
> a shot? 
>
> --Tim 
>
> On Friday, January 22, 2016 02:54:51 PM Brian Lucena wrote: 
> > I am running a simulation that requires me to generate random 
> permutations 
> > of size 300 million.  I would ideally like to run this generation in a 
> loop 
> > 10K or 100K times. I am surprised that randperm is not faster in Julia 
> than 
> > it is.  It seems to be considerably slower than the equivalent in R (and 
> R 
> > is clearly not known for speed)  :) 
> > 
> > In Julia: 
> > 
> > *julia> **@time asd=randperm(3)* 
> > 
> >  43.437829 seconds (6 allocations: 2.235 GB, 0.01% gc time) 
> > 
> > *3-element Array{Int64,1}:* 
> > 
> > 
> > In R 
> > 
> > > start = Sys.time() 
> > > 
> > > asd = sample(3) 
> > > 
> > > Sys.time()-start 
> > 
> > Time difference of 23.27244 secs 
> > 
> > Julia seems to be twice as slow! 
> > 
> > Any thoughts on why that is?  Does randperm use the "knuth shuffle" or 
> does 
> > it use some other algorithm. 
> > 
> > Thanks, 
> > B 
>


Re: [julia-users] randperm run time is slow

2016-01-22 Thread Viral Shah
If you see the code as @timholy suggested, you will see the knuth shuffle 
implementation. Let's capture this as a Julia performance issue on github, 
if we can't figure out an easy way to speed this up right away.

-viral

On Saturday, January 23, 2016 at 9:47:21 AM UTC+5:30, Viral Shah wrote:
>
> Generating one by one or together is a small part of the execution time:
>
> julia> @time rand(3*10^8);
>   1.281946 seconds (10 allocations: 2.235 GB, 0.23% gc time)
>
> julia> r(n) = for i=1:n; rand(); end;
> julia> @time r(3*10^8)
>   0.632389 seconds (6 allocations: 192 bytes)
>
> The one by one is perhaps faster because it doesn't have the array 
> allocation overhead. So, it 
>
>
> -viral
>
> On Saturday, January 23, 2016 at 6:53:38 AM UTC+5:30, Tim Holy wrote:
>>
>> Try 
>> @edit randperm(10) 
>> and see for yourself. 
>>
>> My bet is that you could speed it up by generating all the random numbers 
>> you'll need in one go, rather than generating them one-by-one. Want to 
>> give it 
>> a shot? 
>>
>> --Tim 
>>
>> On Friday, January 22, 2016 02:54:51 PM Brian Lucena wrote: 
>> > I am running a simulation that requires me to generate random 
>> permutations 
>> > of size 300 million.  I would ideally like to run this generation in a 
>> loop 
>> > 10K or 100K times. I am surprised that randperm is not faster in Julia 
>> than 
>> > it is.  It seems to be considerably slower than the equivalent in R 
>> (and R 
>> > is clearly not known for speed)  :) 
>> > 
>> > In Julia: 
>> > 
>> > *julia> **@time asd=randperm(3)* 
>> > 
>> >  43.437829 seconds (6 allocations: 2.235 GB, 0.01% gc time) 
>> > 
>> > *3-element Array{Int64,1}:* 
>> > 
>> > 
>> > In R 
>> > 
>> > > start = Sys.time() 
>> > > 
>> > > asd = sample(3) 
>> > > 
>> > > Sys.time()-start 
>> > 
>> > Time difference of 23.27244 secs 
>> > 
>> > Julia seems to be twice as slow! 
>> > 
>> > Any thoughts on why that is?  Does randperm use the "knuth shuffle" or 
>> does 
>> > it use some other algorithm. 
>> > 
>> > Thanks, 
>> > B 
>>
>

Re: [julia-users] randperm run time is slow

2016-01-22 Thread Tim Holy
Try
@edit randperm(10)
and see for yourself.

My bet is that you could speed it up by generating all the random numbers 
you'll need in one go, rather than generating them one-by-one. Want to give it 
a shot?

--Tim

On Friday, January 22, 2016 02:54:51 PM Brian Lucena wrote:
> I am running a simulation that requires me to generate random permutations
> of size 300 million.  I would ideally like to run this generation in a loop
> 10K or 100K times. I am surprised that randperm is not faster in Julia than
> it is.  It seems to be considerably slower than the equivalent in R (and R
> is clearly not known for speed)  :)
> 
> In Julia:
> 
> *julia> **@time asd=randperm(3)*
> 
>  43.437829 seconds (6 allocations: 2.235 GB, 0.01% gc time)
> 
> *3-element Array{Int64,1}:*
> 
> 
> In R
> 
> > start = Sys.time()
> > 
> > asd = sample(3)
> > 
> > Sys.time()-start
> 
> Time difference of 23.27244 secs
> 
> Julia seems to be twice as slow!
> 
> Any thoughts on why that is?  Does randperm use the "knuth shuffle" or does
> it use some other algorithm.
> 
> Thanks,
> B