Re: [julia-users] Security problem with unitialized memory

2014-11-25 Thread Viral Shah
Much of this this discussion is now captured in the issue Stefan opened. 

-viral

On Tuesday, November 25, 2014 9:05:38 PM UTC+5:30, Stefan Karpinski wrote:
>
> Issue created: https://github.com/JuliaLang/julia/issues/9147.
>
> On Tue, Nov 25, 2014 at 10:16 AM, Stefan Karpinski  
> wrote:
>
>> It seems more reasonable to me to always zero uninitialized fields of 
>> composite values. This is basically free since objects larger than a memory 
>> page are not common.
>>
>> On Tue, Nov 25, 2014 at 1:13 AM, Ronald L. Rivest  
>> wrote:
>>
>>> Sorry; zeros() does not work here instead of new().  My mistake.
>>> Is there a safe alternative to new() that guarantees that all fields
>>> will have a definite fixed value?
>>>
>>> Cheers,
>>> Ron
>>>
>>>
>>> On Tuesday, November 25, 2014 1:05:40 AM UTC-5, Ronald L. Rivest wrote:

 The problem also exists for new() (e.g. when initializing a 
 record/object).  zeros() can
 apparently be used here instead.

 Cheers,
 Ron

 On Tuesday, November 25, 2014 12:29:07 AM UTC-5, Viral Shah wrote:
>
> Much has been already said on this topic. 
>
> The Array(...) interface was kind of meant to be low-level for the 
> user of scientific computing, only to be used when they know what they 
> are 
> doing. You get the raw uninitialized memory as fast as possible.
>
> The user-facing interface was always an array constructor - zeros(), 
> ones(), rand(), etc. Some of this is because of our past experience 
> coming 
> from a matlab/R-like world. 
>
> As Julia has become more popular, we have realized that those not 
> coming from matlab/R end up using all the possible constructors. While 
> this 
> has raised a variety of issues, I'd like to say that this will not get 
> sorted out satisfactorily before the 0.4 release. For a class that may be 
> taught soon, the thing to do would be to use the zeros/ones/rand 
> constructors to construct arrays, instead of Array(), which currently is 
> more for a package developer. I understand that Array() is a much better 
> name as Stefan points out, but zeros() is not too terrible - it at least 
> clearly tells the user that they get zeroed out arrays.
>
> While we have other "features" that can lead to unsafe code (ccall, 
> @inbounds), none of these are things one is likely to run into while 
> learning the language.
>
> -viral
>
> On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest 
> wrote:
>>
>> Regarding initialization:
>>
>>-- I'm toying with the idea of recommending Julia for an 
>> introductory programming
>>   class (rather than Python).  
>>
>>-- For this purpose, the language should not have hazards that 
>> catch the unwary.
>>
>>-- Not initializing storage is definitely a hazard.  With 
>> uninitialized storage, a 
>>   program may run fine one day, and fail mysteriously the next, 
>> depending on 
>>   the contents of memory.  This is about predictability, 
>> reliability, dependability,
>>   and correctness.
>>
>>-- I would favor a solution like
>>  A = Array(Int64,n)   -- fills with zeros
>>  A = Array(Int64,n,fill=1)  -- to fill with ones
>>  A = Array(Int64,n,fill=None)-- for an uninitialized 
>> array
>>so that the *default* is an initialized array, but the speed 
>> geeks
>>can get what they want.
>>
>> Cheers,
>> Ron
>>
>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>>
>>> If we can make allocating zeroed arrays faster that's great, but 
>>> unless we can close the performance gap all the way and eliminate the 
>>> need 
>>> to allocated uninitialized arrays altogether, this proposal is just a 
>>> rename – Unchecked.Array plays the exact same role as the current 
>>> Array constructor. It's unclear that this would even address the 
>>> original 
>>> concern since it still *allows* uninitialized allocation of arrays. 
>>> This 
>>> rename would just force people who have used Array correctly in code 
>>> that 
>>> cares about being as efficient as possible even for very large arrays 
>>> to 
>>> change their code and use Unchecked.Array instead.
>>>
>>> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
>>>
>>> I think that Rivest’s question may be a good reason to rethink the 
>>> initialization of structs and offer the explicit guarantee that all 
>>> unassigned elements will be initialized to 0 (and not just the 
>>> jl_value_t 
>>> pointers). I would argue that the current behavior resulted more from a 
>>> desire to avoid clearing the array twice (if the user is about to call 
>>> fill

Re: [julia-users] Security problem with unitialized memory

2014-11-25 Thread Erik Schnetter
On Tue, Nov 25, 2014 at 11:10 AM, Jeff Waller  wrote:
>
>
> On Monday, November 24, 2014 10:54:36 PM UTC-5, Simon Kornblith wrote:
>>
>> In general, arrays cannot be assumed to be 16-byte aligned because it's
>> always possible to create one that isn't using pointer_to_array. However,
>> from Intel's AVX introduction:
>>
>> Intel® AVX has relaxed some memory alignment requirements, so now Intel
>> AVX by default allows unaligned access; however, this access may come at a
>> performance slowdown, so the old rule of designing your data to be memory
>> aligned is still good practice (16-byte aligned for 128-bit access and
>> 32-byte aligned for 256-bit access).
>
>
> And BTW 512 bit AVX registers are coming next year.
> http://en.wikipedia.org/wiki/AVX-512

The are already here ; see e.g.
.

-erik

-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] Security problem with unitialized memory

2014-11-25 Thread Jeff Waller


On Monday, November 24, 2014 10:54:36 PM UTC-5, Simon Kornblith wrote:
>
> In general, arrays cannot be assumed to be 16-byte aligned because it's 
> always possible to create one that isn't using pointer_to_array. However, 
> from Intel's AVX introduction 
> 
> :
>
> Intel® AVX has relaxed some memory alignment requirements, so now Intel 
> AVX by default allows unaligned access; however, this access may come at a 
> performance slowdown, so the old rule of designing your data to be memory 
> aligned is still good practice (16-byte aligned for 128-bit access and 
> 32-byte aligned for 256-bit access).
>

And BTW 512 bit AVX registers are coming next year.  
http://en.wikipedia.org/wiki/AVX-512
 


Re: [julia-users] Security problem with unitialized memory

2014-11-25 Thread Stefan Karpinski
Issue created: https://github.com/JuliaLang/julia/issues/9147.

On Tue, Nov 25, 2014 at 10:16 AM, Stefan Karpinski 
wrote:

> It seems more reasonable to me to always zero uninitialized fields of
> composite values. This is basically free since objects larger than a memory
> page are not common.
>
> On Tue, Nov 25, 2014 at 1:13 AM, Ronald L. Rivest 
> wrote:
>
>> Sorry; zeros() does not work here instead of new().  My mistake.
>> Is there a safe alternative to new() that guarantees that all fields
>> will have a definite fixed value?
>>
>> Cheers,
>> Ron
>>
>>
>> On Tuesday, November 25, 2014 1:05:40 AM UTC-5, Ronald L. Rivest wrote:
>>>
>>> The problem also exists for new() (e.g. when initializing a
>>> record/object).  zeros() can
>>> apparently be used here instead.
>>>
>>> Cheers,
>>> Ron
>>>
>>> On Tuesday, November 25, 2014 12:29:07 AM UTC-5, Viral Shah wrote:

 Much has been already said on this topic.

 The Array(...) interface was kind of meant to be low-level for the user
 of scientific computing, only to be used when they know what they are
 doing. You get the raw uninitialized memory as fast as possible.

 The user-facing interface was always an array constructor - zeros(),
 ones(), rand(), etc. Some of this is because of our past experience coming
 from a matlab/R-like world.

 As Julia has become more popular, we have realized that those not
 coming from matlab/R end up using all the possible constructors. While this
 has raised a variety of issues, I'd like to say that this will not get
 sorted out satisfactorily before the 0.4 release. For a class that may be
 taught soon, the thing to do would be to use the zeros/ones/rand
 constructors to construct arrays, instead of Array(), which currently is
 more for a package developer. I understand that Array() is a much better
 name as Stefan points out, but zeros() is not too terrible - it at least
 clearly tells the user that they get zeroed out arrays.

 While we have other "features" that can lead to unsafe code (ccall,
 @inbounds), none of these are things one is likely to run into while
 learning the language.

 -viral

 On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest
 wrote:
>
> Regarding initialization:
>
>-- I'm toying with the idea of recommending Julia for an
> introductory programming
>   class (rather than Python).
>
>-- For this purpose, the language should not have hazards that
> catch the unwary.
>
>-- Not initializing storage is definitely a hazard.  With
> uninitialized storage, a
>   program may run fine one day, and fail mysteriously the next,
> depending on
>   the contents of memory.  This is about predictability,
> reliability, dependability,
>   and correctness.
>
>-- I would favor a solution like
>  A = Array(Int64,n)   -- fills with zeros
>  A = Array(Int64,n,fill=1)  -- to fill with ones
>  A = Array(Int64,n,fill=None)-- for an uninitialized
> array
>so that the *default* is an initialized array, but the speed
> geeks
>can get what they want.
>
> Cheers,
> Ron
>
> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>
>> If we can make allocating zeroed arrays faster that's great, but
>> unless we can close the performance gap all the way and eliminate the 
>> need
>> to allocated uninitialized arrays altogether, this proposal is just a
>> rename – Unchecked.Array plays the exact same role as the current
>> Array constructor. It's unclear that this would even address the original
>> concern since it still *allows* uninitialized allocation of arrays. This
>> rename would just force people who have used Array correctly in code that
>> cares about being as efficient as possible even for very large arrays to
>> change their code and use Unchecked.Array instead.
>>
>> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
>>
>> I think that Rivest’s question may be a good reason to rethink the
>> initialization of structs and offer the explicit guarantee that all
>> unassigned elements will be initialized to 0 (and not just the jl_value_t
>> pointers). I would argue that the current behavior resulted more from a
>> desire to avoid clearing the array twice (if the user is about to call
>> fill, zeros, ones, +, etc.) than an intentional, casual exposure of
>> uninitialized memory.
>>
>> A random array of integers is also a security concern if an attacker
>> can extract some other information (with some probability) about the 
>> state
>> of the program. Julia is not hardened by design, so you can’t safely run 
>> an
>> unknown code fragment

Re: [julia-users] Security problem with unitialized memory

2014-11-25 Thread Stefan Karpinski
It seems more reasonable to me to always zero uninitialized fields of
composite values. This is basically free since objects larger than a memory
page are not common.

On Tue, Nov 25, 2014 at 1:13 AM, Ronald L. Rivest 
wrote:

> Sorry; zeros() does not work here instead of new().  My mistake.
> Is there a safe alternative to new() that guarantees that all fields
> will have a definite fixed value?
>
> Cheers,
> Ron
>
>
> On Tuesday, November 25, 2014 1:05:40 AM UTC-5, Ronald L. Rivest wrote:
>>
>> The problem also exists for new() (e.g. when initializing a
>> record/object).  zeros() can
>> apparently be used here instead.
>>
>> Cheers,
>> Ron
>>
>> On Tuesday, November 25, 2014 12:29:07 AM UTC-5, Viral Shah wrote:
>>>
>>> Much has been already said on this topic.
>>>
>>> The Array(...) interface was kind of meant to be low-level for the user
>>> of scientific computing, only to be used when they know what they are
>>> doing. You get the raw uninitialized memory as fast as possible.
>>>
>>> The user-facing interface was always an array constructor - zeros(),
>>> ones(), rand(), etc. Some of this is because of our past experience coming
>>> from a matlab/R-like world.
>>>
>>> As Julia has become more popular, we have realized that those not coming
>>> from matlab/R end up using all the possible constructors. While this has
>>> raised a variety of issues, I'd like to say that this will not get sorted
>>> out satisfactorily before the 0.4 release. For a class that may be taught
>>> soon, the thing to do would be to use the zeros/ones/rand constructors to
>>> construct arrays, instead of Array(), which currently is more for a package
>>> developer. I understand that Array() is a much better name as Stefan points
>>> out, but zeros() is not too terrible - it at least clearly tells the user
>>> that they get zeroed out arrays.
>>>
>>> While we have other "features" that can lead to unsafe code (ccall,
>>> @inbounds), none of these are things one is likely to run into while
>>> learning the language.
>>>
>>> -viral
>>>
>>> On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest
>>> wrote:

 Regarding initialization:

-- I'm toying with the idea of recommending Julia for an
 introductory programming
   class (rather than Python).

-- For this purpose, the language should not have hazards that catch
 the unwary.

-- Not initializing storage is definitely a hazard.  With
 uninitialized storage, a
   program may run fine one day, and fail mysteriously the next,
 depending on
   the contents of memory.  This is about predictability,
 reliability, dependability,
   and correctness.

-- I would favor a solution like
  A = Array(Int64,n)   -- fills with zeros
  A = Array(Int64,n,fill=1)  -- to fill with ones
  A = Array(Int64,n,fill=None)-- for an uninitialized
 array
so that the *default* is an initialized array, but the speed
 geeks
can get what they want.

 Cheers,
 Ron

 On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>
> If we can make allocating zeroed arrays faster that's great, but
> unless we can close the performance gap all the way and eliminate the need
> to allocated uninitialized arrays altogether, this proposal is just a
> rename – Unchecked.Array plays the exact same role as the current
> Array constructor. It's unclear that this would even address the original
> concern since it still *allows* uninitialized allocation of arrays. This
> rename would just force people who have used Array correctly in code that
> cares about being as efficient as possible even for very large arrays to
> change their code and use Unchecked.Array instead.
>
> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
>
> I think that Rivest’s question may be a good reason to rethink the
> initialization of structs and offer the explicit guarantee that all
> unassigned elements will be initialized to 0 (and not just the jl_value_t
> pointers). I would argue that the current behavior resulted more from a
> desire to avoid clearing the array twice (if the user is about to call
> fill, zeros, ones, +, etc.) than an intentional, casual exposure of
> uninitialized memory.
>
> A random array of integers is also a security concern if an attacker
> can extract some other information (with some probability) about the state
> of the program. Julia is not hardened by design, so you can’t safely run 
> an
> unknown code fragment, but you still might have an unintended memory
> exposure in a client-facing app. While zero’ing memory doesn’t prevent the
> user from simply reusing a memory buffer in a security-unaware fashion
> (rather than consistently allocating a new o

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Ronald L. Rivest
Sorry; zeros() does not work here instead of new().  My mistake.
Is there a safe alternative to new() that guarantees that all fields
will have a definite fixed value?

Cheers,
Ron

On Tuesday, November 25, 2014 1:05:40 AM UTC-5, Ronald L. Rivest wrote:
>
> The problem also exists for new() (e.g. when initializing a 
> record/object).  zeros() can
> apparently be used here instead.
>
> Cheers,
> Ron
>
> On Tuesday, November 25, 2014 12:29:07 AM UTC-5, Viral Shah wrote:
>>
>> Much has been already said on this topic. 
>>
>> The Array(...) interface was kind of meant to be low-level for the user 
>> of scientific computing, only to be used when they know what they are 
>> doing. You get the raw uninitialized memory as fast as possible.
>>
>> The user-facing interface was always an array constructor - zeros(), 
>> ones(), rand(), etc. Some of this is because of our past experience coming 
>> from a matlab/R-like world. 
>>
>> As Julia has become more popular, we have realized that those not coming 
>> from matlab/R end up using all the possible constructors. While this has 
>> raised a variety of issues, I'd like to say that this will not get sorted 
>> out satisfactorily before the 0.4 release. For a class that may be taught 
>> soon, the thing to do would be to use the zeros/ones/rand constructors to 
>> construct arrays, instead of Array(), which currently is more for a package 
>> developer. I understand that Array() is a much better name as Stefan points 
>> out, but zeros() is not too terrible - it at least clearly tells the user 
>> that they get zeroed out arrays.
>>
>> While we have other "features" that can lead to unsafe code (ccall, 
>> @inbounds), none of these are things one is likely to run into while 
>> learning the language.
>>
>> -viral
>>
>> On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest wrote:
>>>
>>> Regarding initialization:
>>>
>>>-- I'm toying with the idea of recommending Julia for an introductory 
>>> programming
>>>   class (rather than Python).  
>>>
>>>-- For this purpose, the language should not have hazards that catch 
>>> the unwary.
>>>
>>>-- Not initializing storage is definitely a hazard.  With 
>>> uninitialized storage, a 
>>>   program may run fine one day, and fail mysteriously the next, 
>>> depending on 
>>>   the contents of memory.  This is about predictability, 
>>> reliability, dependability,
>>>   and correctness.
>>>
>>>-- I would favor a solution like
>>>  A = Array(Int64,n)   -- fills with zeros
>>>  A = Array(Int64,n,fill=1)  -- to fill with ones
>>>  A = Array(Int64,n,fill=None)-- for an uninitialized 
>>> array
>>>so that the *default* is an initialized array, but the speed geeks
>>>can get what they want.
>>>
>>> Cheers,
>>> Ron
>>>
>>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:

 If we can make allocating zeroed arrays faster that's great, but unless 
 we can close the performance gap all the way and eliminate the need to 
 allocated uninitialized arrays altogether, this proposal is just a rename 
 – Unchecked.Array 
 plays the exact same role as the current Array constructor. It's 
 unclear that this would even address the original concern since it still 
 *allows* uninitialized allocation of arrays. This rename would just force 
 people who have used Array correctly in code that cares about being as 
 efficient as possible even for very large arrays to change their code and 
 use Unchecked.Array instead.

 On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:

 I think that Rivest’s question may be a good reason to rethink the 
 initialization of structs and offer the explicit guarantee that all 
 unassigned elements will be initialized to 0 (and not just the jl_value_t 
 pointers). I would argue that the current behavior resulted more from a 
 desire to avoid clearing the array twice (if the user is about to call 
 fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
 uninitialized memory.

 A random array of integers is also a security concern if an attacker 
 can extract some other information (with some probability) about the state 
 of the program. Julia is not hardened by design, so you can’t safely run 
 an 
 unknown code fragment, but you still might have an unintended memory 
 exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
 user from simply reusing a memory buffer in a security-unaware fashion 
 (rather than consistently allocating a new one for each use), it’s not 
 clear to me that the performance penalty would be all that noticeable for 
 map Array(X) to zero(X), and only providing an internal constructor for 
 grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from 
 #8227)

 On Mon Nov 24

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Ronald L. Rivest
The problem also exists for new() (e.g. when initializing a record/object). 
 zeros() can
apparently be used here instead.

Cheers,
Ron

On Tuesday, November 25, 2014 12:29:07 AM UTC-5, Viral Shah wrote:
>
> Much has been already said on this topic. 
>
> The Array(...) interface was kind of meant to be low-level for the user of 
> scientific computing, only to be used when they know what they are doing. 
> You get the raw uninitialized memory as fast as possible.
>
> The user-facing interface was always an array constructor - zeros(), 
> ones(), rand(), etc. Some of this is because of our past experience coming 
> from a matlab/R-like world. 
>
> As Julia has become more popular, we have realized that those not coming 
> from matlab/R end up using all the possible constructors. While this has 
> raised a variety of issues, I'd like to say that this will not get sorted 
> out satisfactorily before the 0.4 release. For a class that may be taught 
> soon, the thing to do would be to use the zeros/ones/rand constructors to 
> construct arrays, instead of Array(), which currently is more for a package 
> developer. I understand that Array() is a much better name as Stefan points 
> out, but zeros() is not too terrible - it at least clearly tells the user 
> that they get zeroed out arrays.
>
> While we have other "features" that can lead to unsafe code (ccall, 
> @inbounds), none of these are things one is likely to run into while 
> learning the language.
>
> -viral
>
> On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest wrote:
>>
>> Regarding initialization:
>>
>>-- I'm toying with the idea of recommending Julia for an introductory 
>> programming
>>   class (rather than Python).  
>>
>>-- For this purpose, the language should not have hazards that catch 
>> the unwary.
>>
>>-- Not initializing storage is definitely a hazard.  With 
>> uninitialized storage, a 
>>   program may run fine one day, and fail mysteriously the next, 
>> depending on 
>>   the contents of memory.  This is about predictability, reliability, 
>> dependability,
>>   and correctness.
>>
>>-- I would favor a solution like
>>  A = Array(Int64,n)   -- fills with zeros
>>  A = Array(Int64,n,fill=1)  -- to fill with ones
>>  A = Array(Int64,n,fill=None)-- for an uninitialized array
>>so that the *default* is an initialized array, but the speed geeks
>>can get what they want.
>>
>> Cheers,
>> Ron
>>
>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>>
>>> If we can make allocating zeroed arrays faster that's great, but unless 
>>> we can close the performance gap all the way and eliminate the need to 
>>> allocated uninitialized arrays altogether, this proposal is just a rename – 
>>> Unchecked.Array 
>>> plays the exact same role as the current Array constructor. It's 
>>> unclear that this would even address the original concern since it still 
>>> *allows* uninitialized allocation of arrays. This rename would just force 
>>> people who have used Array correctly in code that cares about being as 
>>> efficient as possible even for very large arrays to change their code and 
>>> use Unchecked.Array instead.
>>>
>>> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
>>>
>>> I think that Rivest’s question may be a good reason to rethink the 
>>> initialization of structs and offer the explicit guarantee that all 
>>> unassigned elements will be initialized to 0 (and not just the jl_value_t 
>>> pointers). I would argue that the current behavior resulted more from a 
>>> desire to avoid clearing the array twice (if the user is about to call 
>>> fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
>>> uninitialized memory.
>>>
>>> A random array of integers is also a security concern if an attacker can 
>>> extract some other information (with some probability) about the state of 
>>> the program. Julia is not hardened by design, so you can’t safely run an 
>>> unknown code fragment, but you still might have an unintended memory 
>>> exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
>>> user from simply reusing a memory buffer in a security-unaware fashion 
>>> (rather than consistently allocating a new one for each use), it’s not 
>>> clear to me that the performance penalty would be all that noticeable for 
>>> map Array(X) to zero(X), and only providing an internal constructor for 
>>> grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from 
>>> #8227)
>>>
>>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski 
>>> stefan.karpin...@gmail.com  
>>> wrote:
>>>
>>> There are two rather different issues to consider:

 1. Preventing problems due to inadvertent programmer errors.
 2. Preventing malicious security attacks.

 When we initially made this choice, it wasn't clear if 1 woul

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Viral Shah
Much has been already said on this topic. 

The Array(...) interface was kind of meant to be low-level for the user of 
scientific computing, only to be used when they know what they are doing. 
You get the raw uninitialized memory as fast as possible.

The user-facing interface was always an array constructor - zeros(), 
ones(), rand(), etc. Some of this is because of our past experience coming 
from a matlab/R-like world. 

As Julia has become more popular, we have realized that those not coming 
from matlab/R end up using all the possible constructors. While this has 
raised a variety of issues, I'd like to say that this will not get sorted 
out satisfactorily before the 0.4 release. For a class that may be taught 
soon, the thing to do would be to use the zeros/ones/rand constructors to 
construct arrays, instead of Array(), which currently is more for a package 
developer. I understand that Array() is a much better name as Stefan points 
out, but zeros() is not too terrible - it at least clearly tells the user 
that they get zeroed out arrays.

While we have other "features" that can lead to unsafe code (ccall, 
@inbounds), none of these are things one is likely to run into while 
learning the language.

-viral

On Tuesday, November 25, 2014 1:00:10 AM UTC+5:30, Ronald L. Rivest wrote:
>
> Regarding initialization:
>
>-- I'm toying with the idea of recommending Julia for an introductory 
> programming
>   class (rather than Python).  
>
>-- For this purpose, the language should not have hazards that catch 
> the unwary.
>
>-- Not initializing storage is definitely a hazard.  With uninitialized 
> storage, a 
>   program may run fine one day, and fail mysteriously the next, 
> depending on 
>   the contents of memory.  This is about predictability, reliability, 
> dependability,
>   and correctness.
>
>-- I would favor a solution like
>  A = Array(Int64,n)   -- fills with zeros
>  A = Array(Int64,n,fill=1)  -- to fill with ones
>  A = Array(Int64,n,fill=None)-- for an uninitialized array
>so that the *default* is an initialized array, but the speed geeks
>can get what they want.
>
> Cheers,
> Ron
>
> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>
>> If we can make allocating zeroed arrays faster that's great, but unless 
>> we can close the performance gap all the way and eliminate the need to 
>> allocated uninitialized arrays altogether, this proposal is just a rename – 
>> Unchecked.Array 
>> plays the exact same role as the current Array constructor. It's unclear 
>> that this would even address the original concern since it still *allows* 
>> uninitialized allocation of arrays. This rename would just force people who 
>> have used Array correctly in code that cares about being as efficient as 
>> possible even for very large arrays to change their code and use 
>> Unchecked.Array instead.
>>
>> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
>>
>> I think that Rivest’s question may be a good reason to rethink the 
>> initialization of structs and offer the explicit guarantee that all 
>> unassigned elements will be initialized to 0 (and not just the jl_value_t 
>> pointers). I would argue that the current behavior resulted more from a 
>> desire to avoid clearing the array twice (if the user is about to call 
>> fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
>> uninitialized memory.
>>
>> A random array of integers is also a security concern if an attacker can 
>> extract some other information (with some probability) about the state of 
>> the program. Julia is not hardened by design, so you can’t safely run an 
>> unknown code fragment, but you still might have an unintended memory 
>> exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
>> user from simply reusing a memory buffer in a security-unaware fashion 
>> (rather than consistently allocating a new one for each use), it’s not 
>> clear to me that the performance penalty would be all that noticeable for 
>> map Array(X) to zero(X), and only providing an internal constructor for 
>> grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from 
>> #8227)
>>
>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski 
>> stefan.karpin...@gmail.com  
>> wrote:
>>
>> There are two rather different issues to consider:
>>>
>>> 1. Preventing problems due to inadvertent programmer errors.
>>> 2. Preventing malicious security attacks.
>>>
>>> When we initially made this choice, it wasn't clear if 1 would be a big 
>>> issue but we decided to see how it played out. It hasn't been a problem in 
>>> practice: once people grok that the Array(T, dims...) constructor gives 
>>> uninitialized memory and that the standard usage pattern is to call it and 
>>> then immediately initialize the memory, everything is ok. I can't 
>>> rec

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Erik Schnetter
On Mon, Nov 24, 2014 at 11:15 PM, Stefan Karpinski  wrote:
> That's not the point - if you already have memory and have to fill it, then
> you're not in any position for the kernel to lazily zero it, so the
> alignment of arbitrary arrays is irrelevant. The point SGJ was making is
> that we want to allocate the memory using something calloc-like so that the
> kernel can do lazy zeroing for us, but we also need that memory to be
> 16-byte aligned, but there is not portable way to get 16-byte-aligned memory
> that the kernel will lazily zero for you. We can have lazy zeroing or
> 16-byte alignment but not both. This makes me wonder if we couldn't just
> allocate 15 bytes more than necessary and return the first address that on a
> 16-byte boundary.

On each system, malloc makes certain guarantees about the alignment of
the pointer it returns. On a 64-bit system, malloc will likely return
memory that is at least aligned to 8 bytes, maybe more. Thus one would
need to allocate 8 additional bytes, not 15.

-erik

-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Viral Shah
Right - we can always use calloc and then provide aligned memory.

Perhaps this is worth benchmarking. It is still likely to be much faster than 
malloc + memset, because it should have significantly better cache behaviour, 
even though the zeroing is not free. The question is, whether this cost is 
small enough.

-viral



> On 25-Nov-2014, at 9:45 am, Stefan Karpinski  wrote:
> 
> That's not the point – if you already have memory and have to fill it, then 
> you're not in any position for the kernel to lazily zero it, so the alignment 
> of arbitrary arrays is irrelevant. The point SGJ was making is that we want 
> to allocate the memory using something calloc-like so that the kernel can do 
> lazy zeroing for us, but we also need that memory to be 16-byte aligned, but 
> there is not portable way to get 16-byte-aligned memory that the kernel will 
> lazily zero for you. We can have lazy zeroing or 16-byte alignment but not 
> both. This makes me wonder if we couldn't just allocate 15 bytes more than 
> necessary and return the first address that on a 16-byte boundary.
> 
> On Mon, Nov 24, 2014 at 11:02 PM, Viral Shah  wrote:
> To add to the point, you can also get non-aligned stuff with subarrays or 
> results from a ccall.
> 
> -viral
> 
> 
> On Tuesday, November 25, 2014 9:24:36 AM UTC+5:30, Simon Kornblith wrote:
> In general, arrays cannot be assumed to be 16-byte aligned because it's 
> always possible to create one that isn't using pointer_to_array. However, 
> from Intel's AVX introduction:
> 
> Intel® AVX has relaxed some memory alignment requirements, so now Intel AVX 
> by default allows unaligned access; however, this access may come at a 
> performance slowdown, so the old rule of designing your data to be memory 
> aligned is still good practice (16-byte aligned for 128-bit access and 
> 32-byte aligned for 256-bit access).
> 
> On Monday, November 24, 2014 10:01:45 PM UTC-5, Erik Schnetter wrote:
> On Mon, Nov 24, 2014 at 9:30 PM, Steven G. Johnson 
>  wrote: 
> > Unfortunately, Julia allocates 16-byte aligned data by default (to help 
> > SIMD 
> > code), and there is no calloc version of posix_memalign as far as I know. 
> 
> The generated machine code I've seen does not make use of this. All 
> the load/store instructions in vectorized or unrolled loops assume 
> unaligned pointers. (Plus, with AVX one should align to 32 bytes 
> instead.) 
> 
> -erik 
> 
> -- 
> Erik Schnetter  
> http://www.perimeterinstitute.ca/personal/eschnetter/ 
> 



Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Stefan Karpinski
That's not the point – if you already have memory and have to fill it, then
you're not in any position for the kernel to lazily zero it, so the
alignment of arbitrary arrays is irrelevant. The point SGJ was making is
that we want to allocate the memory using something calloc-like so that the
kernel can do lazy zeroing for us, but we also need that memory to be
16-byte aligned, but there is not portable way to get 16-byte-aligned
memory that the kernel will lazily zero for you. We can have lazy zeroing
or 16-byte alignment but not both. This makes me wonder if we couldn't just
allocate 15 bytes more than necessary and return the first address that on
a 16-byte boundary.

On Mon, Nov 24, 2014 at 11:02 PM, Viral Shah  wrote:

> To add to the point, you can also get non-aligned stuff with subarrays or
> results from a ccall.
>
> -viral
>
>
> On Tuesday, November 25, 2014 9:24:36 AM UTC+5:30, Simon Kornblith wrote:
>>
>> In general, arrays cannot be assumed to be 16-byte aligned because it's
>> always possible to create one that isn't using pointer_to_array.
>> However, from Intel's AVX introduction
>> 
>> :
>>
>> Intel® AVX has relaxed some memory alignment requirements, so now Intel
>> AVX by default allows unaligned access; however, this access may come at a
>> performance slowdown, so the old rule of designing your data to be memory
>> aligned is still good practice (16-byte aligned for 128-bit access and
>> 32-byte aligned for 256-bit access).
>>
>> On Monday, November 24, 2014 10:01:45 PM UTC-5, Erik Schnetter wrote:
>>>
>>> On Mon, Nov 24, 2014 at 9:30 PM, Steven G. Johnson
>>>  wrote:
>>> > Unfortunately, Julia allocates 16-byte aligned data by default (to
>>> help SIMD
>>> > code), and there is no calloc version of posix_memalign as far as I
>>> know.
>>>
>>> The generated machine code I've seen does not make use of this. All
>>> the load/store instructions in vectorized or unrolled loops assume
>>> unaligned pointers. (Plus, with AVX one should align to 32 bytes
>>> instead.)
>>>
>>> -erik
>>>
>>> --
>>> Erik Schnetter 
>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>
>>


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Viral Shah
To add to the point, you can also get non-aligned stuff with subarrays or 
results from a ccall.

-viral

On Tuesday, November 25, 2014 9:24:36 AM UTC+5:30, Simon Kornblith wrote:
>
> In general, arrays cannot be assumed to be 16-byte aligned because it's 
> always possible to create one that isn't using pointer_to_array. However, 
> from Intel's AVX introduction 
> 
> :
>
> Intel® AVX has relaxed some memory alignment requirements, so now Intel 
> AVX by default allows unaligned access; however, this access may come at a 
> performance slowdown, so the old rule of designing your data to be memory 
> aligned is still good practice (16-byte aligned for 128-bit access and 
> 32-byte aligned for 256-bit access).
>
> On Monday, November 24, 2014 10:01:45 PM UTC-5, Erik Schnetter wrote:
>>
>> On Mon, Nov 24, 2014 at 9:30 PM, Steven G. Johnson 
>>  wrote: 
>> > Unfortunately, Julia allocates 16-byte aligned data by default (to help 
>> SIMD 
>> > code), and there is no calloc version of posix_memalign as far as I 
>> know. 
>>
>> The generated machine code I've seen does not make use of this. All 
>> the load/store instructions in vectorized or unrolled loops assume 
>> unaligned pointers. (Plus, with AVX one should align to 32 bytes 
>> instead.) 
>>
>> -erik 
>>
>> -- 
>> Erik Schnetter  
>> http://www.perimeterinstitute.ca/personal/eschnetter/ 
>>
>

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Simon Kornblith
In general, arrays cannot be assumed to be 16-byte aligned because it's 
always possible to create one that isn't using pointer_to_array. However, 
from Intel's AVX introduction 

:

Intel® AVX has relaxed some memory alignment requirements, so now Intel AVX 
by default allows unaligned access; however, this access may come at a 
performance slowdown, so the old rule of designing your data to be memory 
aligned is still good practice (16-byte aligned for 128-bit access and 
32-byte aligned for 256-bit access).

On Monday, November 24, 2014 10:01:45 PM UTC-5, Erik Schnetter wrote:
>
> On Mon, Nov 24, 2014 at 9:30 PM, Steven G. Johnson 
> > wrote: 
> > Unfortunately, Julia allocates 16-byte aligned data by default (to help 
> SIMD 
> > code), and there is no calloc version of posix_memalign as far as I 
> know. 
>
> The generated machine code I've seen does not make use of this. All 
> the load/store instructions in vectorized or unrolled loops assume 
> unaligned pointers. (Plus, with AVX one should align to 32 bytes 
> instead.) 
>
> -erik 
>
> -- 
> Erik Schnetter > 
> http://www.perimeterinstitute.ca/personal/eschnetter/ 
>


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Viral Shah
Filling random garbage is even more time consuming than filling zeros! 

-viral

On Tuesday, November 25, 2014 1:26:29 AM UTC+5:30, Stefan Karpinski wrote:
>
> I guess part of the problem is that calling the `zeros` function may be 
> less obvious as a way of constructing an array to many new programmers than 
> calling the `Array` constructor. Having a Boolean fill keyword argument 
> approach might be reasonable, although calling it `zeroed` might be more 
> accurate since we won't be filling the array with a specific value but 
> rather zeroing the memory first. Alternatively, we could just fill the 
> array with random garbage intentionally so that programmers are made 
> painfully aware that they didn't initialize the array ;-)
>
> On Nov 24, 2014, at 2:39 PM, Tomas Lycken  wrote:
>
> That *is* the default usage in most introductory settings - just don't 
> show them the Array(T,n) constructor, but give them zeros and ones 
> functions instead. (It's perfectly fine to do e.g. A = zeros(10); fill!(A, 
> 5) if you don't care about the extra write...)
>
> If there is a specific setting where the students actually *need* to 
> allocate uninitialized memory (e.g. for speed), they are probably ready to 
> learn that the Array constructor gives them that.
>
> Julia's approach has so far seemed to be that the users are consenting 
> adults. I like that approach.
>
> // T
>
> On Monday, November 24, 2014 8:30:10 PM UTC+1, Ronald L. Rivest wrote:
>>
>> Regarding initialization:
>>
>>-- I'm toying with the idea of recommending Julia for an introductory 
>> programming
>>   class (rather than Python).  
>>
>>-- For this purpose, the language should not have hazards that catch 
>> the unwary.
>>
>>-- Not initializing storage is definitely a hazard.  With 
>> uninitialized storage, a 
>>   program may run fine one day, and fail mysteriously the next, 
>> depending on 
>>   the contents of memory.  This is about predictability, reliability, 
>> dependability,
>>   and correctness.
>>
>>-- I would favor a solution like
>>  A = Array(Int64,n)   -- fills with zeros
>>  A = Array(Int64,n,fill=1)  -- to fill with ones
>>  A = Array(Int64,n,fill=None)-- for an uninitialized array
>>so that the *default* is an initialized array, but the speed geeks
>>can get what they want.
>>
>> Cheers,
>> Ron
>>
>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>>
>>> If we can make allocating zeroed arrays faster that's great, but unless 
>>> we can close the performance gap all the way and eliminate the need to 
>>> allocated uninitialized arrays altogether, this proposal is just a rename – 
>>> Unchecked.Array 
>>> plays the exact same role as the current Array constructor. It's 
>>> unclear that this would even address the original concern since it still 
>>> *allows* uninitialized allocation of arrays. This rename would just force 
>>> people who have used Array correctly in code that cares about being as 
>>> efficient as possible even for very large arrays to change their code and 
>>> use Unchecked.Array instead.
>>>
>>> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
>>>
>>> I think that Rivest’s question may be a good reason to rethink the 
>>> initialization of structs and offer the explicit guarantee that all 
>>> unassigned elements will be initialized to 0 (and not just the jl_value_t 
>>> pointers). I would argue that the current behavior resulted more from a 
>>> desire to avoid clearing the array twice (if the user is about to call 
>>> fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
>>> uninitialized memory.
>>>
>>> A random array of integers is also a security concern if an attacker can 
>>> extract some other information (with some probability) about the state of 
>>> the program. Julia is not hardened by design, so you can’t safely run an 
>>> unknown code fragment, but you still might have an unintended memory 
>>> exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
>>> user from simply reusing a memory buffer in a security-unaware fashion 
>>> (rather than consistently allocating a new one for each use), it’s not 
>>> clear to me that the performance penalty would be all that noticeable for 
>>> map Array(X) to zero(X), and only providing an internal constructor for 
>>> grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from 
>>> #8227)
>>>
>>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski 
>>> stefan.karpin...@gmail.com  
>>> wrote:
>>>
>>> There are two rather different issues to consider:

 1. Preventing problems due to inadvertent programmer errors.
 2. Preventing malicious security attacks.

 When we initially made this choice, it wasn't clear if 1 would be a big 
 issue but we decided to see how it played out. It hasn't been a problem in 
 pr

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Viral Shah
How much faster is zeros with your PR? IIRC, we used to have something like 
this in the early days.

https://github.com/JuliaLang/julia/pull/9142/

-viral

On Tuesday, November 25, 2014 8:30:00 AM UTC+5:30, Erik Schnetter wrote:
>
> On Mon, Nov 24, 2014 at 7:19 PM, Stefan Karpinski  
> wrote: 
> > Should the comparison actually be more like this: 
> > 
> > julia> @time begin 
> >x = Array(Int,N) 
> >fill!(x,1) 
> >end; 
> > elapsed time: 6.782572096 seconds (800128 bytes allocated) 
> > 
> > julia> @time begin 
> >x = zeros(Int,N) 
> >fill!(x,1) 
> >end; 
> > elapsed time: 14.166256835 seconds (800176 bytes allocated) 
> > 
> > 
> > At least that's the comparison that makes sense for code that allocates 
> and 
> > then initializes an array. I consistently see a 2x slowdown or more. 
>
> My laptop can zero memory at 6+ GByte/sec. The overhead in your case 
> should be about ten times less than what you report. I suspect that 
> what you are measuring is not just the overhead of `zeros` over 
> `Array`, but something else as well. 
>
> Or Julia's `zeros` is not implemented efficiently. Zeroing memory is 
> surprisingly CPU intensive, and one has to call `memset`, or has to 
> ensure that the vectorizer kicks in, and may have to manually unroll 
> the loops. I'm not sure LLVM gets this right by default... 
>
> ... yes, Julia's `zeros` is not as good as it should be. It calls 
> `fill!`, but the generated machine code leaves to be desired. Time to 
> add a special case to `fill!`, delegating things to `memset` in 
> certain cases? 
>
> -erik 
>
> -- 
> Erik Schnetter  
> http://www.perimeterinstitute.ca/personal/eschnetter/ 
>


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Erik Schnetter
On Mon, Nov 24, 2014 at 9:30 PM, Steven G. Johnson
 wrote:
> Unfortunately, Julia allocates 16-byte aligned data by default (to help SIMD
> code), and there is no calloc version of posix_memalign as far as I know.

The generated machine code I've seen does not make use of this. All
the load/store instructions in vectorized or unrolled loops assume
unaligned pointers. (Plus, with AVX one should align to 32 bytes
instead.)

-erik

-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Erik Schnetter
On Mon, Nov 24, 2014 at 7:19 PM, Stefan Karpinski  wrote:
> Should the comparison actually be more like this:
>
> julia> @time begin
>x = Array(Int,N)
>fill!(x,1)
>end;
> elapsed time: 6.782572096 seconds (800128 bytes allocated)
>
> julia> @time begin
>x = zeros(Int,N)
>fill!(x,1)
>end;
> elapsed time: 14.166256835 seconds (800176 bytes allocated)
>
>
> At least that's the comparison that makes sense for code that allocates and
> then initializes an array. I consistently see a 2x slowdown or more.

My laptop can zero memory at 6+ GByte/sec. The overhead in your case
should be about ten times less than what you report. I suspect that
what you are measuring is not just the overhead of `zeros` over
`Array`, but something else as well.

Or Julia's `zeros` is not implemented efficiently. Zeroing memory is
surprisingly CPU intensive, and one has to call `memset`, or has to
ensure that the vectorizer kicks in, and may have to manually unroll
the loops. I'm not sure LLVM gets this right by default...

... yes, Julia's `zeros` is not as good as it should be. It calls
`fill!`, but the generated machine code leaves to be desired. Time to
add a special case to `fill!`, delegating things to `memset` in
certain cases?

-erik

-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Jameson Nash
we could also turn on USE_MMAP by default. all systems should be returning
zero pages for new memory. then we only need to worry about manually
zero'ing things that are smaller than a page

On Mon Nov 24 2014 at 9:30:46 PM Steven G. Johnson 
wrote:

>
>
> On Monday, November 24, 2014 3:05:59 PM UTC-5, Stefan Karpinski wrote:
>>
>> Using calloc could reduce this significantly since the kernel can lazily
>> fill zero pages only when you access them.
>>
>
> Even better, on some systems (e.g. Linux) calloc is implemented via
> copy-on-write of a single zero page, so it only needs to fill new zero
> pages when you write.
>
> Unfortunately, Julia allocates 16-byte aligned data by default (to help
> SIMD code), and there is no calloc version of posix_memalign as far as I
> know.
>


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Steven G. Johnson


On Monday, November 24, 2014 3:05:59 PM UTC-5, Stefan Karpinski wrote:
>
> Using calloc could reduce this significantly since the kernel can lazily 
> fill zero pages only when you access them.
>

Even better, on some systems (e.g. Linux) calloc is implemented via 
copy-on-write of a single zero page, so it only needs to fill new zero 
pages when you write.

Unfortunately, Julia allocates 16-byte aligned data by default (to help 
SIMD code), and there is no calloc version of posix_memalign as far as I 
know.


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Jacob Quinn
The variance you're seeing is most likely due to the garbage collector
kicking with that much memory being allocated and then abandoned.

On Mon, Nov 24, 2014 at 7:53 PM, David Smith  wrote:

> This is what I was thinking. I just assumed that the fill() time would be
> constant for both and factored that out, not knowing that malloc() was lazy.
>
> I get similar results for Stefan's bench, although the variance is large.
>
> On Monday, November 24, 2014 6:20:21 PM UTC-6, Stefan Karpinski wrote:
>>
>> Should the comparison actually be more like this:
>>
>> julia> @time begin
>>x = Array(Int,N)
>>fill!(x,1)
>>end;
>> elapsed time: 6.782572096 seconds (800128 bytes allocated)
>>
>> julia> @time begin
>>x = zeros(Int,N)
>>fill!(x,1)
>>end;
>> elapsed time: 14.166256835 seconds (800176 bytes allocated)
>>
>>
>> At least that's the comparison that makes sense for code that allocates
>> and then initializes an array. I consistently see a 2x slowdown or more.
>>
>> On Mon, Nov 24, 2014 at 7:09 PM, Jameson Nash  wrote:
>>
>>> > But you initialized it in both cases.
>>>
>>> Yes.
>>>
>>> > Is there a compiler optimization going on here that combines the
>>> zeros() and fill()?
>>>
>>> No.
>>>
>>> But there is a kernel optimization going on that complicates this
>>> measurement. Approximately, the memory requested by `malloc` (& friends) is
>>> not actually allocated until you try to read or write to it. So there are
>>> in fact 3 effects here (roughly speaking, they are malloc, A[1:4096:end],
>>> and fill()), where that second operation is unavoidable, and orders of
>>> magnitude slower than the other two. You measured the speed of 1 vs. 1+2+3.
>>> Whereas I measured the speed of 1+2+3 vs 1+2+3+3.
>>>
>>> On Mon Nov 24 2014 at 6:59:50 PM David Smith  wrote:
>>>
 But you initialized it in both cases.  Is there a compiler optimization
 going on here that combines the zeros() and fill()?


 On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:

> yes. the point is to compare the cost of implicitly calling `zero`
> (resulting in the equivalent of calling zero twice) to the cost of not
> initializing the memory before writing to it. I could alternatively have
> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>
> On Mon Nov 24 2014 at 5:57:29 PM David Smith 
> wrote:
>
>> Did you mean to call zeros() in both cases?
>>
>>
>> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>>
>>> It appears the fill operation accounts for about 0.15 seconds of the
>>> 6.15 seconds that my OS X laptop takes to create this array:
>>>
>>> $ ./julia -q
>>>
>>> *julia> **N=10^9*
>>>
>>> *10*
>>>
>>>
>>> *julia> **@time begin x=zeros(Int64,N); fill(x,0) end*
>>>
>>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71%
>>> gc time)
>>>
>>> *0-element Array{Array{Int64,1},1}*
>>>
>>>
>>> $ ./julia -q
>>>
>>> *julia> **N=10^9*
>>>
>>> *10*
>>>
>>>
>>> *julia> **@time x=zeros(Int64,N)*
>>>
>>> elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22%
>>> gc time)
>>>
>>>
>>>
>>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter 
>>> wrote:
>>>
 On Mon, Nov 24, 2014 at 3:01 PM, David Smith 
 wrote:
 > To add some data to this conversation, I just timed allocating a
 billion
 > Int64s on my macbook, and I got this (I ran these multiple times
 before this
 > and got similar timings):
 >
 > julia> N=1_000_000_000
 > 10
 >
 > julia> @time x = Array(Int64,N);
 > elapsed time: 0.022577671 seconds (800128 bytes allocated)
 >
 > julia> @time x = zeros(Int64,N);
 > elapsed time: 3.95432248 seconds (800152 bytes allocated)
 >
 > So we are talking adding possibly seconds to a program per large
 array
 > allocation.

 This is not quite right -- the first does not actually map the pages
 into memory; this is only done lazily when they are accessed the
 first
 time. You need to compare "alloc uninitialized; then initialize
 once"
 with "alloc zero-initialized; then initialize again".

 Current high-end system architectures have memory write speeds of
 ten
 or twenty GByte per second; this is what you should see for very
 large
 arrays -- this would be about 0.4 seconds for your case. For smaller
 arrays, the data would reside in the cache, so that the allocation
 overhead should be significantly smaller even.

 -erik

 --

>>> Erik Schnetter 
 http://www

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread David Smith
This is what I was thinking. I just assumed that the fill() time would be 
constant for both and factored that out, not knowing that malloc() was lazy.

I get similar results for Stefan's bench, although the variance is large.

On Monday, November 24, 2014 6:20:21 PM UTC-6, Stefan Karpinski wrote:
>
> Should the comparison actually be more like this:
>
> julia> @time begin
>x = Array(Int,N)
>fill!(x,1)
>end;
> elapsed time: 6.782572096 seconds (800128 bytes allocated)
>
> julia> @time begin
>x = zeros(Int,N)
>fill!(x,1)
>end;
> elapsed time: 14.166256835 seconds (800176 bytes allocated)
>
>
> At least that's the comparison that makes sense for code that allocates 
> and then initializes an array. I consistently see a 2x slowdown or more.
>
> On Mon, Nov 24, 2014 at 7:09 PM, Jameson Nash  > wrote:
>
>> > But you initialized it in both cases. 
>>
>> Yes.
>>
>> > Is there a compiler optimization going on here that combines the 
>> zeros() and fill()?
>>
>> No.
>>
>> But there is a kernel optimization going on that complicates this 
>> measurement. Approximately, the memory requested by `malloc` (& friends) is 
>> not actually allocated until you try to read or write to it. So there are 
>> in fact 3 effects here (roughly speaking, they are malloc, A[1:4096:end], 
>> and fill()), where that second operation is unavoidable, and orders of 
>> magnitude slower than the other two. You measured the speed of 1 vs. 1+2+3. 
>> Whereas I measured the speed of 1+2+3 vs 1+2+3+3.
>>
>> On Mon Nov 24 2014 at 6:59:50 PM David Smith > > wrote:
>>
>>> But you initialized it in both cases.  Is there a compiler optimization 
>>> going on here that combines the zeros() and fill()?
>>>
>>>
>>> On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>>>
 yes. the point is to compare the cost of implicitly calling `zero` 
 (resulting in the equivalent of calling zero twice) to the cost of not 
 initializing the memory before writing to it. I could alternatively have 
 done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.

 On Mon Nov 24 2014 at 5:57:29 PM David Smith  
 wrote:

> Did you mean to call zeros() in both cases?
>
>
> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>
>> It appears the fill operation accounts for about 0.15 seconds of the 
>> 6.15 seconds that my OS X laptop takes to create this array:
>>
>> $ ./julia -q
>>
>> *julia> **N=10^9*
>>
>> *10*
>>
>>
>> *julia> **@time begin x=zeros(Int64,N); fill(x,0) end*
>>
>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% 
>> gc time)
>>
>> *0-element Array{Array{Int64,1},1}*
>>
>>
>> $ ./julia -q
>>
>> *julia> **N=10^9*
>>
>> *10*
>>
>>
>> *julia> **@time x=zeros(Int64,N)*
>>
>> elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22% 
>> gc time)
>>
>>
>>
>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter  
>> wrote:
>>
>>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith  
>>> wrote:
>>> > To add some data to this conversation, I just timed allocating a 
>>> billion
>>> > Int64s on my macbook, and I got this (I ran these multiple times 
>>> before this
>>> > and got similar timings):
>>> >
>>> > julia> N=1_000_000_000
>>> > 10
>>> >
>>> > julia> @time x = Array(Int64,N);
>>> > elapsed time: 0.022577671 seconds (800128 bytes allocated)
>>> >
>>> > julia> @time x = zeros(Int64,N);
>>> > elapsed time: 3.95432248 seconds (800152 bytes allocated)
>>> >
>>> > So we are talking adding possibly seconds to a program per large 
>>> array
>>> > allocation.
>>>
>>> This is not quite right -- the first does not actually map the pages
>>> into memory; this is only done lazily when they are accessed the 
>>> first
>>> time. You need to compare "alloc uninitialized; then initialize once"
>>> with "alloc zero-initialized; then initialize again".
>>>
>>> Current high-end system architectures have memory write speeds of ten
>>> or twenty GByte per second; this is what you should see for very 
>>> large
>>> arrays -- this would be about 0.4 seconds for your case. For smaller
>>> arrays, the data would reside in the cache, so that the allocation
>>> overhead should be significantly smaller even.
>>>
>>> -erik
>>>
>>> --
>>>
>> Erik Schnetter 
>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>
>>  
>

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Stefan Karpinski
Should the comparison actually be more like this:

julia> @time begin
   x = Array(Int,N)
   fill!(x,1)
   end;
elapsed time: 6.782572096 seconds (800128 bytes allocated)

julia> @time begin
   x = zeros(Int,N)
   fill!(x,1)
   end;
elapsed time: 14.166256835 seconds (800176 bytes allocated)


At least that's the comparison that makes sense for code that allocates and
then initializes an array. I consistently see a 2x slowdown or more.

On Mon, Nov 24, 2014 at 7:09 PM, Jameson Nash  wrote:

> > But you initialized it in both cases.
>
> Yes.
>
> > Is there a compiler optimization going on here that combines the zeros()
> and fill()?
>
> No.
>
> But there is a kernel optimization going on that complicates this
> measurement. Approximately, the memory requested by `malloc` (& friends) is
> not actually allocated until you try to read or write to it. So there are
> in fact 3 effects here (roughly speaking, they are malloc, A[1:4096:end],
> and fill()), where that second operation is unavoidable, and orders of
> magnitude slower than the other two. You measured the speed of 1 vs. 1+2+3.
> Whereas I measured the speed of 1+2+3 vs 1+2+3+3.
>
> On Mon Nov 24 2014 at 6:59:50 PM David Smith 
> wrote:
>
>> But you initialized it in both cases.  Is there a compiler optimization
>> going on here that combines the zeros() and fill()?
>>
>>
>> On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>>
>>> yes. the point is to compare the cost of implicitly calling `zero`
>>> (resulting in the equivalent of calling zero twice) to the cost of not
>>> initializing the memory before writing to it. I could alternatively have
>>> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>>>
>>> On Mon Nov 24 2014 at 5:57:29 PM David Smith  wrote:
>>>
 Did you mean to call zeros() in both cases?


 On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:

> It appears the fill operation accounts for about 0.15 seconds of the
> 6.15 seconds that my OS X laptop takes to create this array:
>
> $ ./julia -q
>
> *julia> **N=10^9*
>
> *10*
>
>
> *julia> **@time begin x=zeros(Int64,N); fill(x,0) end*
>
> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71%
> gc time)
>
> *0-element Array{Array{Int64,1},1}*
>
>
> $ ./julia -q
>
> *julia> **N=10^9*
>
> *10*
>
>
> *julia> **@time x=zeros(Int64,N)*
>
> elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22%
> gc time)
>
>
>
> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter 
> wrote:
>
>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith 
>> wrote:
>> > To add some data to this conversation, I just timed allocating a
>> billion
>> > Int64s on my macbook, and I got this (I ran these multiple times
>> before this
>> > and got similar timings):
>> >
>> > julia> N=1_000_000_000
>> > 10
>> >
>> > julia> @time x = Array(Int64,N);
>> > elapsed time: 0.022577671 seconds (800128 bytes allocated)
>> >
>> > julia> @time x = zeros(Int64,N);
>> > elapsed time: 3.95432248 seconds (800152 bytes allocated)
>> >
>> > So we are talking adding possibly seconds to a program per large
>> array
>> > allocation.
>>
>> This is not quite right -- the first does not actually map the pages
>> into memory; this is only done lazily when they are accessed the first
>> time. You need to compare "alloc uninitialized; then initialize once"
>> with "alloc zero-initialized; then initialize again".
>>
>> Current high-end system architectures have memory write speeds of ten
>> or twenty GByte per second; this is what you should see for very large
>> arrays -- this would be about 0.4 seconds for your case. For smaller
>> arrays, the data would reside in the cache, so that the allocation
>> overhead should be significantly smaller even.
>>
>> -erik
>>
>> --
>>
> Erik Schnetter 
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>
>


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Jameson Nash
> But you initialized it in both cases.

Yes.

> Is there a compiler optimization going on here that combines the zeros()
and fill()?

No.

But there is a kernel optimization going on that complicates this
measurement. Approximately, the memory requested by `malloc` (& friends) is
not actually allocated until you try to read or write to it. So there are
in fact 3 effects here (roughly speaking, they are malloc, A[1:4096:end],
and fill()), where that second operation is unavoidable, and orders of
magnitude slower than the other two. You measured the speed of 1 vs. 1+2+3.
Whereas I measured the speed of 1+2+3 vs 1+2+3+3.

On Mon Nov 24 2014 at 6:59:50 PM David Smith  wrote:

> But you initialized it in both cases.  Is there a compiler optimization
> going on here that combines the zeros() and fill()?
>
>
> On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>
>> yes. the point is to compare the cost of implicitly calling `zero`
>> (resulting in the equivalent of calling zero twice) to the cost of not
>> initializing the memory before writing to it. I could alternatively have
>> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>>
>> On Mon Nov 24 2014 at 5:57:29 PM David Smith  wrote:
>>
>>> Did you mean to call zeros() in both cases?
>>>
>>>
>>> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>>>
 It appears the fill operation accounts for about 0.15 seconds of the
 6.15 seconds that my OS X laptop takes to create this array:

 $ ./julia -q

 *julia> **N=10^9*

 *10*


 *julia> **@time begin x=zeros(Int64,N); fill(x,0) end*

 elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc
 time)

 *0-element Array{Array{Int64,1},1}*


 $ ./julia -q

 *julia> **N=10^9*

 *10*


 *julia> **@time x=zeros(Int64,N)*

 elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22% gc
 time)



 On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter 
 wrote:

> On Mon, Nov 24, 2014 at 3:01 PM, David Smith 
> wrote:
> > To add some data to this conversation, I just timed allocating a
> billion
> > Int64s on my macbook, and I got this (I ran these multiple times
> before this
> > and got similar timings):
> >
> > julia> N=1_000_000_000
> > 10
> >
> > julia> @time x = Array(Int64,N);
> > elapsed time: 0.022577671 seconds (800128 bytes allocated)
> >
> > julia> @time x = zeros(Int64,N);
> > elapsed time: 3.95432248 seconds (800152 bytes allocated)
> >
> > So we are talking adding possibly seconds to a program per large
> array
> > allocation.
>
> This is not quite right -- the first does not actually map the pages
> into memory; this is only done lazily when they are accessed the first
> time. You need to compare "alloc uninitialized; then initialize once"
> with "alloc zero-initialized; then initialize again".
>
> Current high-end system architectures have memory write speeds of ten
> or twenty GByte per second; this is what you should see for very large
> arrays -- this would be about 0.4 seconds for your case. For smaller
> arrays, the data would reside in the cache, so that the allocation
> overhead should be significantly smaller even.
>
> -erik
>
> --
>
 Erik Schnetter 
> http://www.perimeterinstitute.ca/personal/eschnetter/
>



Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread David Smith
But you initialized it in both cases.  Is there a compiler optimization 
going on here that combines the zeros() and fill()?

On Monday, November 24, 2014 5:12:56 PM UTC-6, Jameson wrote:
>
> yes. the point is to compare the cost of implicitly calling `zero` 
> (resulting in the equivalent of calling zero twice) to the cost of not 
> initializing the memory before writing to it. I could alternatively have 
> done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.
>
> On Mon Nov 24 2014 at 5:57:29 PM David Smith  > wrote:
>
>> Did you mean to call zeros() in both cases?
>>
>>
>> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>>
>>> It appears the fill operation accounts for about 0.15 seconds of the 
>>> 6.15 seconds that my OS X laptop takes to create this array:
>>>
>>> $ ./julia -q
>>>
>>> *julia> **N=10^9*
>>>
>>> *10*
>>>
>>>
>>> *julia> **@time begin x=zeros(Int64,N); fill(x,0) end*
>>>
>>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc 
>>> time)
>>>
>>> *0-element Array{Array{Int64,1},1}*
>>>
>>>
>>> $ ./julia -q
>>>
>>> *julia> **N=10^9*
>>>
>>> *10*
>>>
>>>
>>> *julia> **@time x=zeros(Int64,N)*
>>>
>>> elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22% gc 
>>> time)
>>>
>>>
>>>
>>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter  
>>> wrote:
>>>
 On Mon, Nov 24, 2014 at 3:01 PM, David Smith  
 wrote:
 > To add some data to this conversation, I just timed allocating a 
 billion
 > Int64s on my macbook, and I got this (I ran these multiple times 
 before this
 > and got similar timings):
 >
 > julia> N=1_000_000_000
 > 10
 >
 > julia> @time x = Array(Int64,N);
 > elapsed time: 0.022577671 seconds (800128 bytes allocated)
 >
 > julia> @time x = zeros(Int64,N);
 > elapsed time: 3.95432248 seconds (800152 bytes allocated)
 >
 > So we are talking adding possibly seconds to a program per large array
 > allocation.

 This is not quite right -- the first does not actually map the pages
 into memory; this is only done lazily when they are accessed the first
 time. You need to compare "alloc uninitialized; then initialize once"
 with "alloc zero-initialized; then initialize again".

 Current high-end system architectures have memory write speeds of ten
 or twenty GByte per second; this is what you should see for very large
 arrays -- this would be about 0.4 seconds for your case. For smaller
 arrays, the data would reside in the cache, so that the allocation
 overhead should be significantly smaller even.

 -erik

 --

>>> Erik Schnetter 
 http://www.perimeterinstitute.ca/personal/eschnetter/

>>>  

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Jameson Nash
yes. the point is to compare the cost of implicitly calling `zero`
(resulting in the equivalent of calling zero twice) to the cost of not
initializing the memory before writing to it. I could alternatively have
done: `@time x=zeros(); @time fill(x, 0)` to measure the same information.

On Mon Nov 24 2014 at 5:57:29 PM David Smith  wrote:

> Did you mean to call zeros() in both cases?
>
>
> On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>
>> It appears the fill operation accounts for about 0.15 seconds of the 6.15
>> seconds that my OS X laptop takes to create this array:
>>
>> $ ./julia -q
>>
>> *julia> **N=10^9*
>>
>> *10*
>>
>>
>> *julia> **@time begin x=zeros(Int64,N); fill(x,0) end*
>>
>> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc
>> time)
>>
>> *0-element Array{Array{Int64,1},1}*
>>
>>
>> $ ./julia -q
>>
>> *julia> **N=10^9*
>>
>> *10*
>>
>>
>> *julia> **@time x=zeros(Int64,N)*
>>
>> elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22% gc
>> time)
>>
>>
>>
>> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter 
>> wrote:
>>
>>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith  wrote:
>>> > To add some data to this conversation, I just timed allocating a
>>> billion
>>> > Int64s on my macbook, and I got this (I ran these multiple times
>>> before this
>>> > and got similar timings):
>>> >
>>> > julia> N=1_000_000_000
>>> > 10
>>> >
>>> > julia> @time x = Array(Int64,N);
>>> > elapsed time: 0.022577671 seconds (800128 bytes allocated)
>>> >
>>> > julia> @time x = zeros(Int64,N);
>>> > elapsed time: 3.95432248 seconds (800152 bytes allocated)
>>> >
>>> > So we are talking adding possibly seconds to a program per large array
>>> > allocation.
>>>
>>> This is not quite right -- the first does not actually map the pages
>>> into memory; this is only done lazily when they are accessed the first
>>> time. You need to compare "alloc uninitialized; then initialize once"
>>> with "alloc zero-initialized; then initialize again".
>>>
>>> Current high-end system architectures have memory write speeds of ten
>>> or twenty GByte per second; this is what you should see for very large
>>> arrays -- this would be about 0.4 seconds for your case. For smaller
>>> arrays, the data would reside in the cache, so that the allocation
>>> overhead should be significantly smaller even.
>>>
>>> -erik
>>>
>>> --
>>>
>> Erik Schnetter 
>>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>>
>>


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread David Smith
Did you mean to call zeros() in both cases?

On Monday, November 24, 2014 3:09:38 PM UTC-6, Jameson wrote:
>
> It appears the fill operation accounts for about 0.15 seconds of the 6.15 
> seconds that my OS X laptop takes to create this array:
>
> $ ./julia -q
>
> *julia> **N=10^9*
>
> *10*
>
>
> *julia> **@time begin x=zeros(Int64,N); fill(x,0) end*
>
> elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc 
> time)
>
> *0-element Array{Array{Int64,1},1}*
>
>
> $ ./julia -q
>
> *julia> **N=10^9*
>
> *10*
>
>
> *julia> **@time x=zeros(Int64,N)*
>
> elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22% gc 
> time)
>
>
>
> On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter  > wrote:
>
>> On Mon, Nov 24, 2014 at 3:01 PM, David Smith > > wrote:
>> > To add some data to this conversation, I just timed allocating a billion
>> > Int64s on my macbook, and I got this (I ran these multiple times before 
>> this
>> > and got similar timings):
>> >
>> > julia> N=1_000_000_000
>> > 10
>> >
>> > julia> @time x = Array(Int64,N);
>> > elapsed time: 0.022577671 seconds (800128 bytes allocated)
>> >
>> > julia> @time x = zeros(Int64,N);
>> > elapsed time: 3.95432248 seconds (800152 bytes allocated)
>> >
>> > So we are talking adding possibly seconds to a program per large array
>> > allocation.
>>
>> This is not quite right -- the first does not actually map the pages
>> into memory; this is only done lazily when they are accessed the first
>> time. You need to compare "alloc uninitialized; then initialize once"
>> with "alloc zero-initialized; then initialize again".
>>
>> Current high-end system architectures have memory write speeds of ten
>> or twenty GByte per second; this is what you should see for very large
>> arrays -- this would be about 0.4 seconds for your case. For smaller
>> arrays, the data would reside in the cache, so that the allocation
>> overhead should be significantly smaller even.
>>
>> -erik
>>
>> --
>> Erik Schnetter >
>> http://www.perimeterinstitute.ca/personal/eschnetter/
>>
>

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Jameson Nash
It appears the fill operation accounts for about 0.15 seconds of the 6.15
seconds that my OS X laptop takes to create this array:

$ ./julia -q

*julia> **N=10^9*

*10*


*julia> **@time begin x=zeros(Int64,N); fill(x,0) end*

elapsed time: 6.325660691 seconds (8000136616 bytes allocated, 1.71% gc
time)

*0-element Array{Array{Int64,1},1}*


$ ./julia -q

*julia> **N=10^9*

*10*


*julia> **@time x=zeros(Int64,N)*

elapsed time: 6.160623835 seconds (814320 bytes allocated, 0.22% gc
time)



On Mon Nov 24 2014 at 3:18:39 PM Erik Schnetter 
wrote:

> On Mon, Nov 24, 2014 at 3:01 PM, David Smith 
> wrote:
> > To add some data to this conversation, I just timed allocating a billion
> > Int64s on my macbook, and I got this (I ran these multiple times before
> this
> > and got similar timings):
> >
> > julia> N=1_000_000_000
> > 10
> >
> > julia> @time x = Array(Int64,N);
> > elapsed time: 0.022577671 seconds (800128 bytes allocated)
> >
> > julia> @time x = zeros(Int64,N);
> > elapsed time: 3.95432248 seconds (800152 bytes allocated)
> >
> > So we are talking adding possibly seconds to a program per large array
> > allocation.
>
> This is not quite right -- the first does not actually map the pages
> into memory; this is only done lazily when they are accessed the first
> time. You need to compare "alloc uninitialized; then initialize once"
> with "alloc zero-initialized; then initialize again".
>
> Current high-end system architectures have memory write speeds of ten
> or twenty GByte per second; this is what you should see for very large
> arrays -- this would be about 0.4 seconds for your case. For smaller
> arrays, the data would reside in the cache, so that the allocation
> overhead should be significantly smaller even.
>
> -erik
>
> --
> Erik Schnetter 
> http://www.perimeterinstitute.ca/personal/eschnetter/
>


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Erik Schnetter
On Mon, Nov 24, 2014 at 3:01 PM, David Smith  wrote:
> To add some data to this conversation, I just timed allocating a billion
> Int64s on my macbook, and I got this (I ran these multiple times before this
> and got similar timings):
>
> julia> N=1_000_000_000
> 10
>
> julia> @time x = Array(Int64,N);
> elapsed time: 0.022577671 seconds (800128 bytes allocated)
>
> julia> @time x = zeros(Int64,N);
> elapsed time: 3.95432248 seconds (800152 bytes allocated)
>
> So we are talking adding possibly seconds to a program per large array
> allocation.

This is not quite right -- the first does not actually map the pages
into memory; this is only done lazily when they are accessed the first
time. You need to compare "alloc uninitialized; then initialize once"
with "alloc zero-initialized; then initialize again".

Current high-end system architectures have memory write speeds of ten
or twenty GByte per second; this is what you should see for very large
arrays -- this would be about 0.4 seconds for your case. For smaller
arrays, the data would reside in the cache, so that the allocation
overhead should be significantly smaller even.

-erik

-- 
Erik Schnetter 
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Stefan Karpinski
Using calloc could reduce this significantly since the kernel can lazily fill 
zero pages only when you access them. 


> On Nov 24, 2014, at 3:01 PM, David Smith  wrote:
> 
> To add some data to this conversation, I just timed allocating a billion 
> Int64s on my macbook, and I got this (I ran these multiple times before this 
> and got similar timings):
> 
> julia> N=1_000_000_000 
> 10 
> 
> julia> @time x = Array(Int64,N); 
> elapsed time: 0.022577671 seconds (800128 bytes allocated) 
> 
> julia> @time x = zeros(Int64,N); 
> elapsed time: 3.95432248 seconds (800152 bytes allocated) 
> 
> So we are talking adding possibly seconds to a program per large array 
> allocation.
> 
> 
> 


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread David Smith
To add some data to this conversation, I just timed allocating a billion 
Int64s on my macbook, and I got this (I ran these multiple times before 
this and got similar timings):

julia> N=1_000_000_000 
10 

julia> @time x = Array(Int64,N); 
elapsed time: 0.022577671 seconds (800128 bytes allocated) 

julia> @time x = zeros(Int64,N); 
elapsed time: 3.95432248 seconds (800152 bytes allocated) 

So we are talking adding possibly seconds to a program per large array 
allocation.




Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Stefan Karpinski
I guess part of the problem is that calling the `zeros` function may be less 
obvious as a way of constructing an array to many new programmers than calling 
the `Array` constructor. Having a Boolean fill keyword argument approach might 
be reasonable, although calling it `zeroed` might be more accurate since we 
won't be filling the array with a specific value but rather zeroing the memory 
first. Alternatively, we could just fill the array with random garbage 
intentionally so that programmers are made painfully aware that they didn't 
initialize the array ;-)

> On Nov 24, 2014, at 2:39 PM, Tomas Lycken  wrote:
> 
> That *is* the default usage in most introductory settings - just don't show 
> them the Array(T,n) constructor, but give them zeros and ones functions 
> instead. (It's perfectly fine to do e.g. A = zeros(10); fill!(A, 5) if you 
> don't care about the extra write...)
> 
> If there is a specific setting where the students actually *need* to allocate 
> uninitialized memory (e.g. for speed), they are probably ready to learn that 
> the Array constructor gives them that.
> 
> Julia's approach has so far seemed to be that the users are consenting 
> adults. I like that approach.
> 
> // T
> 
>> On Monday, November 24, 2014 8:30:10 PM UTC+1, Ronald L. Rivest wrote:
>> Regarding initialization:
>> 
>>-- I'm toying with the idea of recommending Julia for an introductory 
>> programming
>>   class (rather than Python).  
>> 
>>-- For this purpose, the language should not have hazards that catch the 
>> unwary.
>> 
>>-- Not initializing storage is definitely a hazard.  With uninitialized 
>> storage, a 
>>   program may run fine one day, and fail mysteriously the next, 
>> depending on 
>>   the contents of memory.  This is about predictability, reliability, 
>> dependability,
>>   and correctness.
>> 
>>-- I would favor a solution like
>>  A = Array(Int64,n)   -- fills with zeros
>>  A = Array(Int64,n,fill=1)  -- to fill with ones
>>  A = Array(Int64,n,fill=None)-- for an uninitialized array
>>so that the *default* is an initialized array, but the speed geeks
>>can get what they want.
>> 
>> Cheers,
>> Ron
>> 
>>> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>> If we can make allocating zeroed arrays faster that's great, but unless we 
>>> can close the performance gap all the way and eliminate the need to 
>>> allocated uninitialized arrays altogether, this proposal is just a rename – 
>>> Unchecked.Array plays the exact same role as the current Array constructor. 
>>> It's unclear that this would even address the original concern since it 
>>> still *allows* uninitialized allocation of arrays. This rename would just 
>>> force people who have used Array correctly in code that cares about being 
>>> as efficient as possible even for very large arrays to change their code 
>>> and use Unchecked.Array instead.
>>> 
 On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
 
 I think that Rivest’s question may be a good reason to rethink the 
 initialization of structs and offer the explicit guarantee that all 
 unassigned elements will be initialized to 0 (and not just the jl_value_t 
 pointers). I would argue that the current behavior resulted more from a 
 desire to avoid clearing the array twice (if the user is about to call 
 fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
 uninitialized memory.
 
 A random array of integers is also a security concern if an attacker can 
 extract some other information (with some probability) about the state of 
 the program. Julia is not hardened by design, so you can’t safely run an 
 unknown code fragment, but you still might have an unintended memory 
 exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
 user from simply reusing a memory buffer in a security-unaware fashion 
 (rather than consistently allocating a new one for each use), it’s not 
 clear to me that the performance penalty would be all that noticeable for 
 map Array(X) to zero(X), and only providing an internal constructor for 
 grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from #8227)
 
 On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski 
 stefan.karpin...@gmail.com wrote:
 
> There are two rather different issues to consider:
> 
> 1. Preventing problems due to inadvertent programmer errors.
> 2. Preventing malicious security attacks.
> 
> When we initially made this choice, it wasn't clear if 1 would be a big 
> issue but we decided to see how it played out. It hasn't been a problem 
> in practice: once people grok that the Array(T, dims...) constructor 
> gives uninitialized memory and that the standard usage pattern is to call 
> it and then immediately initialize 

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Tomas Lycken
That *is* the default usage in most introductory settings - just don't show 
them the Array(T,n) constructor, but give them zeros and ones functions 
instead. (It's perfectly fine to do e.g. A = zeros(10); fill!(A, 5) if you 
don't care about the extra write...)

If there is a specific setting where the students actually *need* to 
allocate uninitialized memory (e.g. for speed), they are probably ready to 
learn that the Array constructor gives them that.

Julia's approach has so far seemed to be that the users are consenting 
adults. I like that approach.

// T

On Monday, November 24, 2014 8:30:10 PM UTC+1, Ronald L. Rivest wrote:
>
> Regarding initialization:
>
>-- I'm toying with the idea of recommending Julia for an introductory 
> programming
>   class (rather than Python).  
>
>-- For this purpose, the language should not have hazards that catch 
> the unwary.
>
>-- Not initializing storage is definitely a hazard.  With uninitialized 
> storage, a 
>   program may run fine one day, and fail mysteriously the next, 
> depending on 
>   the contents of memory.  This is about predictability, reliability, 
> dependability,
>   and correctness.
>
>-- I would favor a solution like
>  A = Array(Int64,n)   -- fills with zeros
>  A = Array(Int64,n,fill=1)  -- to fill with ones
>  A = Array(Int64,n,fill=None)-- for an uninitialized array
>so that the *default* is an initialized array, but the speed geeks
>can get what they want.
>
> Cheers,
> Ron
>
> On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>>
>> If we can make allocating zeroed arrays faster that's great, but unless 
>> we can close the performance gap all the way and eliminate the need to 
>> allocated uninitialized arrays altogether, this proposal is just a rename – 
>> Unchecked.Array 
>> plays the exact same role as the current Array constructor. It's unclear 
>> that this would even address the original concern since it still *allows* 
>> uninitialized allocation of arrays. This rename would just force people who 
>> have used Array correctly in code that cares about being as efficient as 
>> possible even for very large arrays to change their code and use 
>> Unchecked.Array instead.
>>
>> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
>>
>> I think that Rivest’s question may be a good reason to rethink the 
>> initialization of structs and offer the explicit guarantee that all 
>> unassigned elements will be initialized to 0 (and not just the jl_value_t 
>> pointers). I would argue that the current behavior resulted more from a 
>> desire to avoid clearing the array twice (if the user is about to call 
>> fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
>> uninitialized memory.
>>
>> A random array of integers is also a security concern if an attacker can 
>> extract some other information (with some probability) about the state of 
>> the program. Julia is not hardened by design, so you can’t safely run an 
>> unknown code fragment, but you still might have an unintended memory 
>> exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
>> user from simply reusing a memory buffer in a security-unaware fashion 
>> (rather than consistently allocating a new one for each use), it’s not 
>> clear to me that the performance penalty would be all that noticeable for 
>> map Array(X) to zero(X), and only providing an internal constructor for 
>> grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from 
>> #8227)
>>
>> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski 
>> stefan.karpin...@gmail.com  
>> wrote:
>>
>> There are two rather different issues to consider:
>>>
>>> 1. Preventing problems due to inadvertent programmer errors.
>>> 2. Preventing malicious security attacks.
>>>
>>> When we initially made this choice, it wasn't clear if 1 would be a big 
>>> issue but we decided to see how it played out. It hasn't been a problem in 
>>> practice: once people grok that the Array(T, dims...) constructor gives 
>>> uninitialized memory and that the standard usage pattern is to call it and 
>>> then immediately initialize the memory, everything is ok. I can't 
>>> recall a single situation where someone has had some terrible bug due to 
>>> uninitialized int/float arrays.
>>>
>>> Regarding 2, Julia is not intended to be a hardened language for writing 
>>> highly secure software. It allows all sorts of unsafe actions: pointer 
>>> arithmetic, direct memory access, calling arbitrary C functions, etc. The 
>>> future of really secure software seems to be small formally verified 
>>> kernels written in statically typed languages that communicate with larger 
>>> unverified systems over restricted channels. Julia might be appropriate for 
>>> the larger unverified system but certainly not for the trusted kernel. 
>>> Adding enoug

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Ronald L. Rivest
Regarding initialization:

   -- I'm toying with the idea of recommending Julia for an introductory 
programming
  class (rather than Python).  

   -- For this purpose, the language should not have hazards that catch the 
unwary.

   -- Not initializing storage is definitely a hazard.  With uninitialized 
storage, a 
  program may run fine one day, and fail mysteriously the next, 
depending on 
  the contents of memory.  This is about predictability, reliability, 
dependability,
  and correctness.

   -- I would favor a solution like
 A = Array(Int64,n)   -- fills with zeros
 A = Array(Int64,n,fill=1)  -- to fill with ones
 A = Array(Int64,n,fill=None)-- for an uninitialized array
   so that the *default* is an initialized array, but the speed geeks
   can get what they want.

Cheers,
Ron

On Monday, November 24, 2014 1:57:14 PM UTC-5, Stefan Karpinski wrote:
>
> If we can make allocating zeroed arrays faster that's great, but unless we 
> can close the performance gap all the way and eliminate the need to 
> allocated uninitialized arrays altogether, this proposal is just a rename – 
> Unchecked.Array 
> plays the exact same role as the current Array constructor. It's unclear 
> that this would even address the original concern since it still *allows* 
> uninitialized allocation of arrays. This rename would just force people who 
> have used Array correctly in code that cares about being as efficient as 
> possible even for very large arrays to change their code and use 
> Unchecked.Array instead.
>
> On Nov 24, 2014, at 1:36 PM, Jameson Nash > 
> wrote:
>
> I think that Rivest’s question may be a good reason to rethink the 
> initialization of structs and offer the explicit guarantee that all 
> unassigned elements will be initialized to 0 (and not just the jl_value_t 
> pointers). I would argue that the current behavior resulted more from a 
> desire to avoid clearing the array twice (if the user is about to call 
> fill, zeros, ones, +, etc.) than an intentional, casual exposure of 
> uninitialized memory.
>
> A random array of integers is also a security concern if an attacker can 
> extract some other information (with some probability) about the state of 
> the program. Julia is not hardened by design, so you can’t safely run an 
> unknown code fragment, but you still might have an unintended memory 
> exposure in a client-facing app. While zero’ing memory doesn’t prevent the 
> user from simply reusing a memory buffer in a security-unaware fashion 
> (rather than consistently allocating a new one for each use), it’s not 
> clear to me that the performance penalty would be all that noticeable for 
> map Array(X) to zero(X), and only providing an internal constructor for 
> grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from #8227)
>
> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski 
> stefan.karpin...@gmail.com  
> wrote:
>
> There are two rather different issues to consider:
>>
>> 1. Preventing problems due to inadvertent programmer errors.
>> 2. Preventing malicious security attacks.
>>
>> When we initially made this choice, it wasn't clear if 1 would be a big 
>> issue but we decided to see how it played out. It hasn't been a problem in 
>> practice: once people grok that the Array(T, dims...) constructor gives 
>> uninitialized memory and that the standard usage pattern is to call it and 
>> then immediately initialize the memory, everything is ok. I can't recall 
>> a single situation where someone has had some terrible bug due to 
>> uninitialized int/float arrays.
>>
>> Regarding 2, Julia is not intended to be a hardened language for writing 
>> highly secure software. It allows all sorts of unsafe actions: pointer 
>> arithmetic, direct memory access, calling arbitrary C functions, etc. The 
>> future of really secure software seems to be small formally verified 
>> kernels written in statically typed languages that communicate with larger 
>> unverified systems over restricted channels. Julia might be appropriate for 
>> the larger unverified system but certainly not for the trusted kernel. 
>> Adding enough verification to Julia to write secure kernels is not 
>> inconceivable, but would be a major research effort. The implementation 
>> would have to check lots of things, including, of course, ensuring that all 
>> arrays are initialized.
>>
>> A couple of other points:
>>
>> Modern OSes protect against data leaking between processes by zeroing 
>> pages before a process first accesses them. Thus any data exposed by 
>> Array(T, dims...) comes from the same process and is not a security leak.
>>
>> An uninitialized array of, say, integers is not in itself a security 
>> concern – the issue is what you do with those integers. The classic 
>> security hole is to use a "random" value from uninitialized memory to 
>> access other memory by using

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Stefan Karpinski
If we can make allocating zeroed arrays faster that's great, but unless we can 
close the performance gap all the way and eliminate the need to allocated 
uninitialized arrays altogether, this proposal is just a rename – 
Unchecked.Array plays the exact same role as the current Array constructor. 
It's unclear that this would even address the original concern since it still 
*allows* uninitialized allocation of arrays. This rename would just force 
people who have used Array correctly in code that cares about being as 
efficient as possible even for very large arrays to change their code and use 
Unchecked.Array instead.

> On Nov 24, 2014, at 1:36 PM, Jameson Nash  wrote:
> 
> I think that Rivest’s question may be a good reason to rethink the 
> initialization of structs and offer the explicit guarantee that all 
> unassigned elements will be initialized to 0 (and not just the jl_value_t 
> pointers). I would argue that the current behavior resulted more from a 
> desire to avoid clearing the array twice (if the user is about to call fill, 
> zeros, ones, +, etc.) than an intentional, casual exposure of uninitialized 
> memory.
> 
> A random array of integers is also a security concern if an attacker can 
> extract some other information (with some probability) about the state of the 
> program. Julia is not hardened by design, so you can’t safely run an unknown 
> code fragment, but you still might have an unintended memory exposure in a 
> client-facing app. While zero’ing memory doesn’t prevent the user from simply 
> reusing a memory buffer in a security-unaware fashion (rather than 
> consistently allocating a new one for each use), it’s not clear to me that 
> the performance penalty would be all that noticeable for map Array(X) to 
> zero(X), and only providing an internal constructor for grabbing 
> uninitialized memory (perhaps Base.Unchecked.Array(X) from #8227)
> 
> On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski stefan.karpin...@gmail.com 
> wrote:
> 
>> There are two rather different issues to consider:
>> 
>> 1. Preventing problems due to inadvertent programmer errors.
>> 2. Preventing malicious security attacks.
>> 
>> When we initially made this choice, it wasn't clear if 1 would be a big 
>> issue but we decided to see how it played out. It hasn't been a problem in 
>> practice: once people grok that the Array(T, dims...) constructor gives 
>> uninitialized memory and that the standard usage pattern is to call it and 
>> then immediately initialize the memory, everything is ok. I can't recall a 
>> single situation where someone has had some terrible bug due to 
>> uninitialized int/float arrays.
>> 
>> Regarding 2, Julia is not intended to be a hardened language for writing 
>> highly secure software. It allows all sorts of unsafe actions: pointer 
>> arithmetic, direct memory access, calling arbitrary C functions, etc. The 
>> future of really secure software seems to be small formally verified kernels 
>> written in statically typed languages that communicate with larger 
>> unverified systems over restricted channels. Julia might be appropriate for 
>> the larger unverified system but certainly not for the trusted kernel. 
>> Adding enough verification to Julia to write secure kernels is not 
>> inconceivable, but would be a major research effort. The implementation 
>> would have to check lots of things, including, of course, ensuring that all 
>> arrays are initialized.
>> 
>> A couple of other points:
>> 
>> Modern OSes protect against data leaking between processes by zeroing pages 
>> before a process first accesses them. Thus any data exposed by Array(T, 
>> dims...) comes from the same process and is not a security leak.
>> 
>> An uninitialized array of, say, integers is not in itself a security concern 
>> – the issue is what you do with those integers. The classic security hole is 
>> to use a "random" value from uninitialized memory to access other memory by 
>> using it to index into an array or otherwise convert it to a pointer. In the 
>> presence of bounds checking, however, this isn't actually a big concern 
>> since you will still either get a bounds error or a valid array value – not 
>> a meaningful one, of course, but still just a value.
>> 
>> Writing programs that are secure against malicious attacks is a hard, 
>> unsolved problem. So is doing efficient, productive high-level numerical 
>> programming. Trying to solve both problems at the same time seems like a 
>> recipe for failing at both.
>> 
>>> On Nov 24, 2014, at 11:43 AM, David Smith  wrote:
>>> 
>>> Some ideas:
>>> 
>>> Is there a way to return an error for accesses before at least one 
>>> assignment in bits types?  I.e. when the object is created uninitialized it 
>>> is marked "dirty" and only after assignment of some user values can it be 
>>> "cleanly" accessed?
>>> 
>>> Can Julia provide a thin memory management layer that grabs memory from the 
>>> OS first, zeroes it, and then gives it to the 

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Jameson Nash
I think that Rivest’s question may be a good reason to rethink the
initialization of structs and offer the explicit guarantee that all
unassigned elements will be initialized to 0 (and not just the jl_value_t
pointers). I would argue that the current behavior resulted more from a
desire to avoid clearing the array twice (if the user is about to call
fill, zeros, ones, +, etc.) than an intentional, casual exposure of
uninitialized memory.

A random array of integers is also a security concern if an attacker can
extract some other information (with some probability) about the state of
the program. Julia is not hardened by design, so you can’t safely run an
unknown code fragment, but you still might have an unintended memory
exposure in a client-facing app. While zero’ing memory doesn’t prevent the
user from simply reusing a memory buffer in a security-unaware fashion
(rather than consistently allocating a new one for each use), it’s not
clear to me that the performance penalty would be all that noticeable for
map Array(X) to zero(X), and only providing an internal constructor for
grabbing uninitialized memory (perhaps Base.Unchecked.Array(X) from #8227)

On Mon Nov 24 2014 at 12:57:22 PM Stefan Karpinski
stefan.karpin...@gmail.com  wrote:

There are two rather different issues to consider:
>
> 1. Preventing problems due to inadvertent programmer errors.
> 2. Preventing malicious security attacks.
>
> When we initially made this choice, it wasn't clear if 1 would be a big
> issue but we decided to see how it played out. It hasn't been a problem in
> practice: once people grok that the Array(T, dims...) constructor gives
> uninitialized memory and that the standard usage pattern is to call it and
> then immediately initialize the memory, everything is ok. I can't recall
> a single situation where someone has had some terrible bug due to
> uninitialized int/float arrays.
>
> Regarding 2, Julia is not intended to be a hardened language for writing
> highly secure software. It allows all sorts of unsafe actions: pointer
> arithmetic, direct memory access, calling arbitrary C functions, etc. The
> future of really secure software seems to be small formally verified
> kernels written in statically typed languages that communicate with larger
> unverified systems over restricted channels. Julia might be appropriate for
> the larger unverified system but certainly not for the trusted kernel.
> Adding enough verification to Julia to write secure kernels is not
> inconceivable, but would be a major research effort. The implementation
> would have to check lots of things, including, of course, ensuring that all
> arrays are initialized.
>
> A couple of other points:
>
> Modern OSes protect against data leaking between processes by zeroing
> pages before a process first accesses them. Thus any data exposed by
> Array(T, dims...) comes from the same process and is not a security leak.
>
> An uninitialized array of, say, integers is not in itself a security
> concern – the issue is what you do with those integers. The classic
> security hole is to use a "random" value from uninitialized memory to
> access other memory by using it to index into an array or otherwise convert
> it to a pointer. In the presence of bounds checking, however, this isn't
> actually a big concern since you will still either get a bounds error or a
> valid array value – not a meaningful one, of course, but still just a value.
>
> Writing programs that are secure against malicious attacks is a hard,
> unsolved problem. So is doing efficient, productive high-level numerical
> programming. Trying to solve both problems at the same time seems like a
> recipe for failing at both.
>
> On Nov 24, 2014, at 11:43 AM, David Smith  wrote:
>
> Some ideas:
>
> Is there a way to return an error for accesses before at least one
> assignment in bits types?  I.e. when the object is created uninitialized it
> is marked "dirty" and only after assignment of some user values can it be
> "cleanly" accessed?
>
> Can Julia provide a thin memory management layer that grabs memory from
> the OS first, zeroes it, and then gives it to the user upon initial
> allocation?  After gc+reallocation it doesn't need to be zeroed again,
> unless the next allocation is larger than anything previous, at which time
> Julia grabs more memory, sanitizes it, and hands it off.
>
> On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>>
>> Pointer types will initialise to undef and any operation on them fails:
>> julia> a = Array(ASCIIString, 5);
>>
>> julia> a[1]
>> ERROR: access to undefined reference
>>  in getindex at array.jl:246
>>
>> But you're right, for bits-types this is not an error an will just
>> return whatever was there before.  I think the reason this will stay
>> that way is that Julia is a numerics oriented language.  Thus you many
>> wanna create a 1GB array of Float64 and then fill it with something as
>> opposed to first fil

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Stefan Karpinski
There are two rather different issues to consider:

1. Preventing problems due to inadvertent programmer errors.
2. Preventing malicious security attacks.

When we initially made this choice, it wasn't clear if 1 would be a big issue 
but we decided to see how it played out. It hasn't been a problem in practice: 
once people grok that the Array(T, dims...) constructor gives uninitialized 
memory and that the standard usage pattern is to call it and then immediately 
initialize the memory, everything is ok. I can't recall a single situation 
where someone has had some terrible bug due to uninitialized int/float arrays.

Regarding 2, Julia is not intended to be a hardened language for writing highly 
secure software. It allows all sorts of unsafe actions: pointer arithmetic, 
direct memory access, calling arbitrary C functions, etc. The future of really 
secure software seems to be small formally verified kernels written in 
statically typed languages that communicate with larger unverified systems over 
restricted channels. Julia might be appropriate for the larger unverified 
system but certainly not for the trusted kernel. Adding enough verification to 
Julia to write secure kernels is not inconceivable, but would be a major 
research effort. The implementation would have to check lots of things, 
including, of course, ensuring that all arrays are initialized.

A couple of other points:

Modern OSes protect against data leaking between processes by zeroing pages 
before a process first accesses them. Thus any data exposed by Array(T, 
dims...) comes from the same process and is not a security leak.

An uninitialized array of, say, integers is not in itself a security concern – 
the issue is what you do with those integers. The classic security hole is to 
use a "random" value from uninitialized memory to access other memory by using 
it to index into an array or otherwise convert it to a pointer. In the presence 
of bounds checking, however, this isn't actually a big concern since you will 
still either get a bounds error or a valid array value – not a meaningful one, 
of course, but still just a value.

Writing programs that are secure against malicious attacks is a hard, unsolved 
problem. So is doing efficient, productive high-level numerical programming. 
Trying to solve both problems at the same time seems like a recipe for failing 
at both.

> On Nov 24, 2014, at 11:43 AM, David Smith  wrote:
> 
> Some ideas:
> 
> Is there a way to return an error for accesses before at least one assignment 
> in bits types?  I.e. when the object is created uninitialized it is marked 
> "dirty" and only after assignment of some user values can it be "cleanly" 
> accessed?
> 
> Can Julia provide a thin memory management layer that grabs memory from the 
> OS first, zeroes it, and then gives it to the user upon initial allocation?  
> After gc+reallocation it doesn't need to be zeroed again, unless the next 
> allocation is larger than anything previous, at which time Julia grabs more 
> memory, sanitizes it, and hands it off. 
> 
>> On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>> Pointer types will initialise to undef and any operation on them fails: 
>> julia> a = Array(ASCIIString, 5); 
>> 
>> julia> a[1] 
>> ERROR: access to undefined reference 
>>  in getindex at array.jl:246 
>> 
>> But you're right, for bits-types this is not an error an will just 
>> return whatever was there before.  I think the reason this will stay 
>> that way is that Julia is a numerics oriented language.  Thus you many 
>> wanna create a 1GB array of Float64 and then fill it with something as 
>> opposed to first fill it with zeros and then fill it with something. 
>> See: 
>> 
>> julia> @time b = Array(Float64, 10^9); 
>> elapsed time: 0.029523638 seconds (800144 bytes allocated) 
>> 
>> julia> @time c = zeros(Float64, 10^9); 
>> elapsed time: 0.835062841 seconds (800168 bytes allocated) 
>> 
>> You can argue that the time gain isn't worth the risk but I suspect that 
>> others may feel different. 
>> 
>> On Mon, 2014-11-24 at 09:28, Ronald L. Rivest  wrote: 
>> > I am just learning Julia... 
>> > 
>> > I was quite shocked today to learn that Julia does *not* 
>> > initialize allocated storage (e.g. to 0 or some default value). 
>> > E.g. the code 
>> >  A = Array(Int64,5) 
>> >  println(A[1]) 
>> > has unpredictable behavior, may disclose information from 
>> > other modules, etc. 
>> > 
>> > This is really quite unacceptable in a modern programming 
>> > language; it is as bad as not checking array reads for out-of-bounds 
>> > indices.   
>> > 
>> > Google for "uninitialized security" to find numerous instances 
>> > of security violations and unreliability problems caused by the 
>> > use of uninitialized variables, and numerous security advisories 
>> > warning of problems caused by the (perhaps inadvertent) use 
>> > of uninitialized variables. 
>> > 
>> > You can't design a programming language toda

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread David Smith
Some ideas:

Is there a way to return an error for accesses before at least one 
assignment in bits types?  I.e. when the object is created uninitialized it 
is marked "dirty" and only after assignment of some user values can it be 
"cleanly" accessed?

Can Julia provide a thin memory management layer that grabs memory from the 
OS first, zeroes it, and then gives it to the user upon initial allocation? 
 After gc+reallocation it doesn't need to be zeroed again, unless the next 
allocation is larger than anything previous, at which time Julia grabs more 
memory, sanitizes it, and hands it off. 

On Monday, November 24, 2014 2:48:05 AM UTC-6, Mauro wrote:
>
> Pointer types will initialise to undef and any operation on them fails: 
> julia> a = Array(ASCIIString, 5); 
>
> julia> a[1] 
> ERROR: access to undefined reference 
>  in getindex at array.jl:246 
>
> But you're right, for bits-types this is not an error an will just 
> return whatever was there before.  I think the reason this will stay 
> that way is that Julia is a numerics oriented language.  Thus you many 
> wanna create a 1GB array of Float64 and then fill it with something as 
> opposed to first fill it with zeros and then fill it with something. 
> See: 
>
> julia> @time b = Array(Float64, 10^9); 
> elapsed time: 0.029523638 seconds (800144 bytes allocated) 
>
> julia> @time c = zeros(Float64, 10^9); 
> elapsed time: 0.835062841 seconds (800168 bytes allocated) 
>
> You can argue that the time gain isn't worth the risk but I suspect that 
> others may feel different. 
>
> On Mon, 2014-11-24 at 09:28, Ronald L. Rivest  > wrote: 
> > I am just learning Julia... 
> > 
> > I was quite shocked today to learn that Julia does *not* 
> > initialize allocated storage (e.g. to 0 or some default value). 
> > E.g. the code 
> >  A = Array(Int64,5) 
> >  println(A[1]) 
> > has unpredictable behavior, may disclose information from 
> > other modules, etc. 
> > 
> > This is really quite unacceptable in a modern programming 
> > language; it is as bad as not checking array reads for out-of-bounds 
> > indices.   
> > 
> > Google for "uninitialized security" to find numerous instances 
> > of security violations and unreliability problems caused by the 
> > use of uninitialized variables, and numerous security advisories 
> > warning of problems caused by the (perhaps inadvertent) use 
> > of uninitialized variables. 
> > 
> > You can't design a programming language today under the naive 
> > assumption that code in that language won't be used in highly 
> > critical applications or won't be under adversarial attack. 
> > 
> > You can't reasonably ask all programmers to properly initialize 
> > their allocated storage manually any more than you can ask them 
> > to test all indices before accessing an array manually; these are 
> > things that a high-level language should do for you. 
> > 
> > The default non-initialization of allocated storage is a 
> > mis-feature that should absolutely be fixed. 
> > 
> > There is no efficiency argument here in favor of uninitialized storage 
> > that can outweigh the security and reliability disadvantages... 
> > 
> > Cheers, 
> > Ron Rivest 
>
>

Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Tamas Papp
On Mon, Nov 24 2014, Ronald L. Rivest  wrote:

> Google for "uninitialized security" to find numerous instances
> of security violations and unreliability problems caused by the
> use of uninitialized variables, and numerous security advisories
> warning of problems caused by the (perhaps inadvertent) use
> of uninitialized variables.

AFAIK Julia was not intended for applications that need to be
secure. Consequently, it may not be the best language for implementing
the next kernel, SSH server, etc. Besides the issue that you brought up,
it may have other vulnerabilities, for example, I suspect that if you
implement crypto naively in Julia it may be vulnerable to timing attacks
because of the dynamic nature of the language.

Given that initialization has a cost (however minor), I think is
reasonable not to initialize when in 99% of the cases the array will be
filled anyway with values that are actually computed using other data.

> You can't reasonably ask all programmers to properly initialize
> their allocated storage manually any more than you can ask them
> to test all indices before accessing an array manually; these are
> things that a high-level language should do for you.

Sorry, I may be missing something, but if you are not filling an array
with values, why did you create it in the first place? If you just need
a bunch of zeroes, you should probably use sparse data structures.

> There is no efficiency argument here in favor of uninitialized storage
> that can outweigh the security and reliability disadvantages...

All these considerations are subjective and application-dependent, I
doubt that there is general rule. Some high-level languages use
uninitialized arrays, Common Lisp is an example.

Best,

Tamas


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Rafael Fourquet
Regarding Mauro's example: it is my understanding that your first
example is that much faster mainly for cache reasons; indeed, the
speed difference is less visible with the following two functions:

f1(n) = (b = Array(Float64, n); fill!(b, 0.0))
f2(n) = (b = zeros(Float64, n); fill!(b, 0.0))

Then f2 will be at most twice as slow, but often faster than that.

Rafael


Re: [julia-users] Security problem with unitialized memory

2014-11-24 Thread Mauro
Pointer types will initialise to undef and any operation on them fails:
julia> a = Array(ASCIIString, 5);

julia> a[1]
ERROR: access to undefined reference
 in getindex at array.jl:246

But you're right, for bits-types this is not an error an will just
return whatever was there before.  I think the reason this will stay
that way is that Julia is a numerics oriented language.  Thus you many
wanna create a 1GB array of Float64 and then fill it with something as
opposed to first fill it with zeros and then fill it with something.
See:

julia> @time b = Array(Float64, 10^9);
elapsed time: 0.029523638 seconds (800144 bytes allocated)

julia> @time c = zeros(Float64, 10^9);
elapsed time: 0.835062841 seconds (800168 bytes allocated)

You can argue that the time gain isn't worth the risk but I suspect that
others may feel different.

On Mon, 2014-11-24 at 09:28, Ronald L. Rivest  wrote:
> I am just learning Julia...
>
> I was quite shocked today to learn that Julia does *not*
> initialize allocated storage (e.g. to 0 or some default value).
> E.g. the code
>  A = Array(Int64,5)
>  println(A[1])
> has unpredictable behavior, may disclose information from
> other modules, etc.
>
> This is really quite unacceptable in a modern programming
> language; it is as bad as not checking array reads for out-of-bounds
> indices.  
>
> Google for "uninitialized security" to find numerous instances
> of security violations and unreliability problems caused by the
> use of uninitialized variables, and numerous security advisories
> warning of problems caused by the (perhaps inadvertent) use
> of uninitialized variables.
>
> You can't design a programming language today under the naive
> assumption that code in that language won't be used in highly
> critical applications or won't be under adversarial attack.
>
> You can't reasonably ask all programmers to properly initialize 
> their allocated storage manually any more than you can ask them 
> to test all indices before accessing an array manually; these are
> things that a high-level language should do for you. 
>
> The default non-initialization of allocated storage is a 
> mis-feature that should absolutely be fixed.
>
> There is no efficiency argument here in favor of uninitialized storage
> that can outweigh the security and reliability disadvantages...
>
> Cheers,
> Ron Rivest



[julia-users] Security problem with unitialized memory

2014-11-24 Thread Ronald L. Rivest
I am just learning Julia...

I was quite shocked today to learn that Julia does *not*
initialize allocated storage (e.g. to 0 or some default value).
E.g. the code
 A = Array(Int64,5)
 println(A[1])
has unpredictable behavior, may disclose information from
other modules, etc.

This is really quite unacceptable in a modern programming
language; it is as bad as not checking array reads for out-of-bounds
indices.  

Google for "uninitialized security" to find numerous instances
of security violations and unreliability problems caused by the
use of uninitialized variables, and numerous security advisories
warning of problems caused by the (perhaps inadvertent) use
of uninitialized variables.

You can't design a programming language today under the naive
assumption that code in that language won't be used in highly
critical applications or won't be under adversarial attack.

You can't reasonably ask all programmers to properly initialize 
their allocated storage manually any more than you can ask them 
to test all indices before accessing an array manually; these are
things that a high-level language should do for you. 

The default non-initialization of allocated storage is a 
mis-feature that should absolutely be fixed.

There is no efficiency argument here in favor of uninitialized storage
that can outweigh the security and reliability disadvantages...

Cheers,
Ron Rivest