Why not:

%time A = random_matrix(GF(4,'a'), 36, 10^6)
CPU times: user 568 ms, sys: 12 ms, total: 580 ms
Wall time: 578 ms

However, getting the rows out takes ages. The reason is that vectors over
GF(4) are generic, i.e. no one sat down and wrote up a simple class which
implements these vectors as matrices with one row:

sage: A = matrix(GF(4,'a'),10, 10)
sage: type(A)
<type 'sage.matrix.matrix_mod2e_dense.Matrix_mod2e_dense'>

sage: v = vector(GF(4,'a'),10)
sage: type(v) # generic type
<type 'sage.modules.free_module_element.FreeModuleElement_generic_dense'>

Compare that with GF(2):

sage: %time A = random_matrix(GF(2), 36, 10^6)
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 16 ms

sage: %time V = A.rows()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 2.55 ms

sage: A = matrix(GF(2),10, 10)
sage: type(A)
<type 'sage.matrix.matrix_mod2_dense.Matrix_mod2_dense'>

sage: v = vector(GF(2),10)
sage: type(v) # specialised type
<type 'sage.modules.vector_mod2_dense.Vector_mod2_dense'>

So I'm afraid the answer is: "implement it and send a patch" :)

On Monday 12 May 2014 11:56:14 Gerli Viikmaa wrote:
> Thank you - this has sped up the dataset creation 2 times.
>
> This is the resulting function:
>
> def generate_data(field, length, n):
>     data = zero_matrix(field, n, length)
>     for i in range(n):
>         for j in range(length):
>             data[i,j] = field.random_element()
>     return data
>
> Here the vectors are rows, so I can call sequences[i] to get the ith
> vector.
>
> %time sequences = generate_data(GF(4,'a'), 36, 10^6)
> CPU times: user 26.28 s, sys: 0.56 s, total: 26.84 s
> Wall time: 26.93 s
>
> compared to
>
> space = VectorSpace(GF(4,'a'), 36)
> %time sequences=[space.random_element() for __ in range(10^6)]
> CPU times: user 58.41 s, sys: 0.20 s, total: 58.60 s
> Wall time: 58.75 s
>
> I do wonder if the filling up can be done even better...
>
> Gerli
>
> On 11/05/14 23:34, Dima Pasechnik wrote:
> > On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> >> Hi,
> >>
> >> Thank you for your reply.
> >>
> >> This doesn't improve the time at all (which is my main concern).
> >> I am still looking for a different way of generating this data, something
> >> that would be at least 5-10 times faster than my proposed way.
> >
> > the problem is apparently the slow creation of elements of the space,
> > not the process of generating random elements.
> > Here are results of profiling:
> >
> > sage: %prun sequences=[space.random_element() for __ in range(50000)]
> >
> > 6050003 function calls in 393.762 seconds
> >
> > Ordered by: internal time
> >
> >  ncalls  tottime  percall  cumtime  percall filename:lineno(function)
> >   50000  386.048    0.008  386.185    0.008 free_module.py:1903(zero_vector)
> >   50000    4.533    0.000  393.574    0.008 free_module.py:4609(random_element)
> > 1800000    1.092    0.000    1.755    0.000 finite_field_givaro.py:208(random_element)
> > 1800000    0.663    0.000    0.663    0.000 {method 'random_element' of 'sage.rings.finite_rings.element_givaro.Cache_gi}
> > 1800000    0.391    0.000    0.391    0.000 {method 'random' of '_random.Random' objects}
> >   50000    0.380    0.000  386.742    0.008 free_module.py:5012(__call__)
> >
> > ...... some entries, not taking much time at all, deleted...
> >
> > If you look at the random_element() code in sage/modules/free_module.py
> > (starting at the line 4609, see e.g.
> > https://github.com/sagemath/sage/blob/master/src/sage/modules/free_module.py)
> >
> > then you see that it calls self(0), i.e. zero_vector(),
> > and this is what eats up almost all the time.
> >
> > It will be much faster to allocate a zero matrix
> > (zero_matrix(GF(4, 'a'), 36,10^6))
> > and fill it in, basically using the random_element() code,
> > adapted appropriately.
> >
> > HTH,
> > Dima
> >
> >> Gerli
> >>
> >> On 11/05/14 22:23, Dima Pasechnik wrote:
> >>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> I am trying to analyse sets of random vectors from a vector space
> >>>> GF(4)^36.
> >>>> For that I need to sample rather large sets (up to 10^6 vectors) many
> >>>> times (I would like it to be 1000, depending on how fast I can analyse).
> >>>>
> >>>> I first thought my analysis was slow on such large sets but apparently
> >>>> just generating the sets takes an incredibly long time!
> >>>>
> >>>> What is the preferred method for efficiently generating sets of random
> >>>> vectors?
> >>>>
> >>>> Setup:
> >>>> space = VectorSpace(GF(4, 'a'), 36)
> >>>> n = 10^6
> >>>>
> >>>> I have tried the following methods:
> >>>>
> >>>> sample(space, n)
> >>>> gives me
> >>>> OverflowError: Python int too large to convert to C long
> >>>>
> >>>> An attempt to sample indexes and then ask for space[i] instead:
> >>>> sample(range(4^36), n)
> >>>> also results in
> >>>> OverflowError: range() result has too many items
> >>>>
> >>>> Trying to use space.random_element():
> >>>> First I tried to get unique samples:
> >>>> sequences = []
> >>>>
> >>>> while len(sequences) < n:
> >>>>     elem = space.random_element()
> >>>>
> >>>>     if elem not in sequences:
> >>>>         sequences.append(elem)
> >>>>
> >>>> but this takes forever (and is impossible to interrupt). I let it run
> >>>> for several minutes and realized this was not going to work. I don't
> >>>> know how long it would actually take. I cannot use a set since the
> >>>> vectors are mutable (although, I must admit I haven't tried turning
> >>>> them immutable and seeing if using a set works better).
> >>>
> >>> yes, you should make them immutable.
> >>> See below how.
> >>>
> >>>> Then I decided not to care about the uniqueness:
> >>>> %time sequences=[space.random_element() for __ in range(n)]
> >>>> Best so far (in that it actually gives me a result). This takes about
> >>>> *60 seconds* on my computer (based on a couple runs). Using xrange
> >>>> didn't affect the time.
> >>>>
> >>>> Is it possible to improve this time?
> >>>> And I would prefer it if the set didn't contain any duplicates.
> >>>
> >>> just do the following:
> >>>
> >>> sequences=[space.random_element() for __ in range(n)]
> >>>
> >>> for i in sequences:
> >>>     i.set_immutable()
> >>>
> >>> seq_noreps=set(sequences) # now there are no repetitions
> >>>
> >>> HTH,
> >>> Dima
> >>>
> >>>> If generating the dataset takes a whole minute, it would take me over
> >>>> two weeks just to generate 1000 datasets of this size...
> >>>>
> >>>> Thanks in advance,
> >>>> Gerli
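
A footnote on the "sample indexes" idea from the original question: it can
probably be made to work without ever building range(4^36). Each vector in
GF(4)^36 fits in a 72-bit integer (2 bits per coordinate), so drawing distinct
integers gives distinct vectors. The following is only a rough sketch in plain
Python, not Sage; the names sample_unique_codes and decode are made up for
illustration, and how the digits 0..3 are mapped onto the four field elements
is left open.

```python
import random

Q_BITS = 2    # GF(4) has 4 elements, so one coordinate needs 2 bits
LENGTH = 36   # vector length used in the thread


def sample_unique_codes(n, length=LENGTH, rng=random):
    """Draw n distinct integers in [0, 4**length); each one encodes a vector.

    Hypothetical helper: uniqueness comes from collecting the draws in a set.
    """
    total_bits = Q_BITS * length
    if n > (1 << total_bits):
        raise ValueError("the space has fewer than n distinct vectors")
    seen = set()
    while len(seen) < n:
        # Duplicates are silently absorbed by the set, so we just keep drawing.
        seen.add(rng.getrandbits(total_bits))
    return list(seen)


def decode(code, length=LENGTH):
    """Unpack an integer code into a tuple of `length` base-4 digits (0..3)."""
    return tuple((code >> (Q_BITS * i)) & 3 for i in range(length))
```

Since 10^6 is tiny compared with 4^36, repeated draws should be rare and the
loop should run barely more than n times. Turning the digit tuples into actual
Sage vectors would presumably still go through the generic vector class
discussed above, so this addresses uniqueness only, not the construction cost.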
