On 12 May 2014 09:56, Gerli Viikmaa <[email protected]> wrote: > Thank you - this has sped up the dataset creation 2 times. > > This is the resulting function: > > def generate_data(field, length, n): > data = zero_matrix(field, n, length) > for i in range(n): > for j in range(length): > data[i,j] = field.random_element() > return data > > Here the vectors are rows, so I can call sequences[i] to get the ith > vector. > > %time sequences = generate_data(GF(4,'a'), 36, 10^6) > CPU times: user 26.28 s, sys: 0.56 s, total: 26.84 s > Wall time: 26.93 s > > compared to > > > space = VectorSpace(GF(4,'a'), 36) > %time sequences=[space.random_element() for __ in range(10^6)] > CPU times: user 58.41 s, sys: 0.20 s, total: 58.60 s > Wall time: 58.75 s > > I do wonder if the filling up can be done even better... >
To save memory, though not necessarily time, do not create the whole n-row matrix. Instead, create and return one random vector using "yield", with a count so you can stop the iteration when n vectors have been returned. In other words, make this into an iterator function. John Cremona > > Gerli > > > On 11/05/14 23:34, Dima Pasechnik wrote: > >> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote: >> >>> Hi, >>> >>> Thank you for your reply. >>> >>> This doesn't improve the time at all (which is my main concern). I am >>> still looking for a different way of generating this data, something >>> that would be at least 5-10 times faster than my proposed way. >>> >> the problem is apparently the slow creation of elements of the space, >> not the process of generating random elements. >> Here are results of profiling: >> >> sage: %prun sequences=[space.random_element() for __ in range(50000)] >> 6050003 function calls in 393.762 seconds >> >> Ordered by: internal time >> >> ncalls tottime percall cumtime percall >> filename:lineno(function) >> 50000 386.048 0.008 386.185 0.008 >> free_module.py:1903(zero_vector) >> 50000 4.533 0.000 393.574 0.008 >> free_module.py:4609(random_element) >> 1800000 1.092 0.000 1.755 0.000 >> finite_field_givaro.py:208(random_element) >> 1800000 0.663 0.000 0.663 0.000 {method >> 'random_element' of 'sage.rings.finite_rings.element_givaro.Cache_gi} >> 1800000 0.391 0.000 0.391 0.000 {method 'random' of >> '_random.Random' objects} >> 50000 0.380 0.000 386.742 0.008 >> free_module.py:5012(__call__) >> ...... some entries, not taking much time at all, deleted... >> >> If you look at the random_element() code in sage/modules/free_module.py >> (starting at the line 4609, see e.g. >> https://github.com/sagemath/sage/blob/master/src/sage/ >> modules/free_module.py) >> >> then you see that it calls self(0), i.e. zero_vector(), >> and this is what eats up almost all the time. >> >> It will be much faster to allocate a zero matrix >> (zero_matrix(GF(4, 'a'), 36,10^6)) >> and fill it in, basically using the random_element() code, >> adapted appropriately. >> >> HTH, >> Dima >> >> Gerli >>> >>> On 11/05/14 22:23, Dima Pasechnik wrote: >>> >>>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote: >>>> >>>>> Hi, >>>>> >>>>> I am trying to analyse sets of random vectors from a vector space >>>>> GF(4)^36. >>>>> For that I need to sample rather large sets (up to 10^6 vectors) many >>>>> times >>>>> (I would like it to be 1000, depending on how fast I can analyse). >>>>> >>>>> I first thought my analysis was slow on such large sets but apparently >>>>> just >>>>> generating the sets takes an incredibly long time! >>>>> >>>>> What is the preferred method for efficiently generating sets of random >>>>> vectors? >>>>> >>>>> Setup: >>>>> space = VectorSpace(GF(4, 'a'), 36) >>>>> n = 10^6 >>>>> >>>>> >>>>> I have tried the following methods: >>>>> >>>>> sample(space, n) >>>>> gives me >>>>> OverflowError: Python int too large to convert to C long >>>>> >>>>> An attempt to sample indexes and then ask for space[i] instead: >>>>> sample(range(4^36), n) >>>>> also results in >>>>> OverflowError: range() result has too many items >>>>> >>>>> Trying to use space.random_element(): >>>>> First I tried to get unique samples: >>>>> sequences = [] >>>>> while len(sequences) < n: >>>>> elem = space.random_element() >>>>> if elem not in sequences: >>>>> sequences.append(elem) >>>>> but this takes forever (and is impossible to interrupt). I let it run >>>>> for >>>>> several minutes and realized this was not going to work. I don't know >>>>> how >>>>> long it would actually take. I cannot use a set since the vectors are >>>>> mutable (although, I must admit I haven't tried turning them immutable >>>>> and >>>>> seeing if using a set works better). >>>>> >>>> yes, you should make them immutable. >>>> See below how. >>>> >>>> Then I decided not to care about the uniqueness: >>>>> %time sequences=[space.random_element() for __ in range(n)] >>>>> Best so far (in that it actually gives me a result). This takes about >>>>> *60 >>>>> seconds* on my computer (based on a couple runs). Using xrange didn't >>>>> affect the time. >>>>> >>>>> Is it possible to improve this time? And I would prefer it if the set >>>>> didn't contain any duplicates. >>>>> >>>> just do the following: >>>> >>>> sequences=[space.random_element() for __ in range(n)] >>>> for i in sequences: >>>> i.set_immutable() >>>> seq_noreps=set(sequences) # now there are no repetitions >>>> >>>> HTH, >>>> Dima >>>> >>>> If generating the dataset takes a whole >>>>> minute, it would take me over two weeks just to generate 1000 datasets >>>>> of >>>>> this size... >>>>> >>>>> Thanks in advance, >>>>> Gerli >>>>> >>>>> > -- > You received this message because you are subscribed to the Google Groups > "sage-support" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to [email protected]. > To post to this group, send email to [email protected]. > Visit this group at http://groups.google.com/group/sage-support. > For more options, visit https://groups.google.com/d/optout. > -- You received this message because you are subscribed to the Google Groups "sage-support" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To post to this group, send email to [email protected]. Visit this group at http://groups.google.com/group/sage-support. For more options, visit https://groups.google.com/d/optout.
