Re: [sage-support] Re: Sampling elements from a vector space

Martin Albrecht Mon, 12 May 2014 02:13:28 -0700

Why not:

%time A = random_matrix(GF(4,'a'), 36, 10^6)
CPU times: user 568 ms, sys: 12 ms, total: 580 ms
Wall time: 578 ms


However, getting the rows out takes ages. The reason is that vectors over 
GF(4) are generic, i.e. noone sat down and wrote up a simple class which 
implements these vectors as matrices with one row:

sage: A = matrix(GF(4,'a'),10, 10)
sage: type(A)
<type 'sage.matrix.matrix_mod2e_dense.Matrix_mod2e_dense'>

sage: v = vector(GF(4,'a'),10)
sage: type(v) # generic type
<type 'sage.modules.free_module_element.FreeModuleElement_generic_dense'>


Compare that with GF(2):

sage: %time A = random_matrix(GF(2), 36, 10^6)
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 16 ms
sage: %time V = A.rows()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 2.55 ms

sage: A = matrix(GF(2),10, 10)
sage: type(A)
<type 'sage.matrix.matrix_mod2_dense.Matrix_mod2_dense'>

sage: v = vector(GF(2),10)
sage: type(v) # specialised type
<type 'sage.modules.vector_mod2_dense.Vector_mod2_dense'>

So I'm afraid the answer is: "implement it and send a patch" :)

On Monday 12 May 2014 11:56:14 Gerli Viikmaa wrote:
> Thank you - this has sped up the dataset creation 2 times.
> 
> This is the resulting function:
> 
> def generate_data(field, length, n):
>      data = zero_matrix(field, n, length)
>      for i in range(n):
>          for j in range(length):
>              data[i,j] = field.random_element()
>      return data
> 
> Here the vectors are rows, so I can call sequences[i] to get the ith
> vector.
> 
> %time sequences = generate_data(GF(4,'a'), 36, 10^6)
> CPU times: user 26.28 s, sys: 0.56 s, total: 26.84 s
> Wall time: 26.93 s
> 
> compared to
> 
> space = VectorSpace(GF(4,'a'), 36)
> %time sequences=[space.random_element() for __ in range(10^6)]
> CPU times: user 58.41 s, sys: 0.20 s, total: 58.60 s
> Wall time: 58.75 s
> 
> I do wonder if the filling up can be done even better...
> 
> Gerli
> 
> On 11/05/14 23:34, Dima Pasechnik wrote:
> > On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> >> Hi,
> >> 
> >> Thank you for your reply.
> >> 
> >> This doesn't improve the time at all (which is my main concern). I am
> >> still looking for a different way of generating this data, something
> >> that would be at least 5-10 times faster than my proposed way.
> > 
> > the problem is apparently the slow creation of elements of the space,
> > not the process of generating random elements.
> > Here are results of profiling:
> > 
> > sage: %prun sequences=[space.random_element() for __ in range(50000)]
> > 
> >           6050003 function calls in 393.762 seconds
> >           
> >              Ordered by: internal time
> >       
> >       ncalls     tottime  percall  cumtime  percall
> >       filename:lineno(function)
> >       50000      386.048  0.008  386.185    0.008
> >       free_module.py:1903(zero_vector) 50000      4.533    0.000  393.574
> >          0.008 free_module.py:4609(random_element) 1800000    1.092   
> >       0.000    1.755    0.000 finite_field_givaro.py:208(random_element)
> >       1800000    0.663    0.000    0.663    0.000 {method
> >       'random_element' of
> >       'sage.rings.finite_rings.element_givaro.Cache_gi} 1800000    0.391 
> >         0.000    0.391    0.000 {method 'random' of '_random.Random'
> >       objects} 50000      0.380    0.000  386.742    0.008
> >       free_module.py:5012(__call__)> 
> > ...... some entries, not taking much time at all, deleted...
> > 
> > If you look at the random_element() code in sage/modules/free_module.py
> > (starting at the line 4609, see e.g.
> > https://github.com/sagemath/sage/blob/master/src/sage/modules/free_module.
> > py)
> > 
> > then you see that it calls self(0), i.e. zero_vector(),
> > and this is what eats up almost all the time.
> > 
> > It will be much faster to allocate a zero matrix
> > (zero_matrix(GF(4, 'a'), 36,10^6))
> > and fill it in, basically using the random_element() code,
> > adapted appropriately.
> > 
> > HTH,
> > Dima
> > 
> >> Gerli
> >> 
> >> On 11/05/14 22:23, Dima Pasechnik wrote:
> >>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> >>>> Hi,
> >>>> 
> >>>> I am trying to analyse sets of random vectors from a vector space
> >>>> GF(4)^36.
> >>>> For that I need to sample rather large sets (up to 10^6 vectors) many
> >>>> times
> >>>> (I would like it to be 1000, depending on how fast I can analyse).
> >>>> 
> >>>> I first thought my analysis was slow on such large sets but apparently
> >>>> just
> >>>> generating the sets takes an incredibly long time!
> >>>> 
> >>>> What is the preferred method for efficiently generating sets of random
> >>>> vectors?
> >>>> 
> >>>> Setup:
> >>>> space = VectorSpace(GF(4, 'a'), 36)
> >>>> n = 10^6
> >>>> 
> >>>> 
> >>>> I have tried the following methods:
> >>>> 
> >>>> sample(space, n)
> >>>> gives me
> >>>> OverflowError: Python int too large to convert to C long
> >>>> 
> >>>> An attempt to sample indexes and then ask for space[i] instead:
> >>>> sample(range(4^36), n)
> >>>> also results in
> >>>> OverflowError: range() result has too many items
> >>>> 
> >>>> Trying to use space.random_element():
> >>>> First I tried to get unique samples:
> >>>> sequences = []
> >>>> 
> >>>> while len(sequences) < n:
> >>>>       elem = space.random_element()
> >>>>       
> >>>>       if elem not in sequences:
> >>>>           sequences.append(elem)
> >>>> 
> >>>> but this takes forever (and is impossible to interrupt). I let it run
> >>>> for
> >>>> several minutes and realized this was not going to work. I don't know
> >>>> how
> >>>> long it would actually take. I cannot use a set since the vectors are
> >>>> mutable (although, I must admit I haven't tried turning them immutable
> >>>> and
> >>>> seeing if using a set works better).
> >>> 
> >>> yes, you should make them immutable.
> >>> See below how.
> >>> 
> >>>> Then I decided not to care about the uniqueness:
> >>>> %time sequences=[space.random_element() for __ in range(n)]
> >>>> Best so far (in that it actually gives me a result). This takes about
> >>>> *60
> >>>> seconds* on my computer (based on a couple runs). Using xrange didn't
> >>>> affect the time.
> >>>> 
> >>>> Is it possible to improve this time? And I would prefer it if the set
> >>>> didn't contain any duplicates.
> >>> 
> >>> just do the following:
> >>> 
> >>> sequences=[space.random_element() for __ in range(n)]
> >>> 
> >>> for i in sequences:
> >>>      i.set_immutable()
> >>> 
> >>> seq_noreps=set(sequences) # now there are no repetitions
> >>> 
> >>> HTH,
> >>> Dima
> >>> 
> >>>> If generating the dataset takes a whole
> >>>> minute, it would take me over two weeks just to generate 1000 datasets
> >>>> of
> >>>> this size...
> >>>> 
> >>>> Thanks in advance,
> >>>> Gerli

signature.asc
Description: This is a digitally signed message part.

Re: [sage-support] Re: Sampling elements from a vector space

Reply via email to