On 12 May 2014 09:56, Gerli Viikmaa <[email protected]> wrote:

> Thank you - this has sped up the dataset creation 2 times.
>
> This is the resulting function:
>
> def generate_data(field, length, n):
>     data = zero_matrix(field, n, length)
>     for i in range(n):
>         for j in range(length):
>             data[i,j] = field.random_element()
>     return data
>
> Here the vectors are rows, so I can use sequences[i] to get the i-th
> vector.
>
> %time sequences = generate_data(GF(4,'a'), 36, 10^6)
> CPU times: user 26.28 s, sys: 0.56 s, total: 26.84 s
> Wall time: 26.93 s
>
> compared to
>
>
> space = VectorSpace(GF(4,'a'), 36)
> %time sequences=[space.random_element() for __ in range(10^6)]
> CPU times: user 58.41 s, sys: 0.20 s, total: 58.60 s
> Wall time: 58.75 s
>
> I do wonder if the filling up can be done even better...
>
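One way the filling might be improved is to draw randomness in bulk. The sketch below is plain Python and rests on a hypothetical encoding: the four elements of GF(4) are represented as the integers 0..3, and converting them back to Sage field elements is left out. Because 4 is a power of two, a single `random.getrandbits(2 * length)` call gives a uniform integer whose 2-bit chunks are independent and uniform on 0..3. A whole row then costs one RNG call instead of `length` calls. `random_rows_bits` is an illustrative name, not a Sage function, and no speed-up is claimed here.

```python
import random


def random_rows_bits(length, n, rng=random):
    """Return n rows of `length` entries, each uniform on {0, 1, 2, 3}.

    Hypothetical encoding: 0..3 stand for the four elements of GF(4);
    converting them to Sage field elements is not done here.
    """
    if length < 1:
        raise ValueError("length must be positive")
    rows = []
    for _ in range(n):
        bits = rng.getrandbits(2 * length)  # one RNG call per row
        # Split the integer into `length` independent 2-bit digits.
        rows.append([(bits >> (2 * j)) & 3 for j in range(length)])
    return rows


rows = random_rows_bits(36, 1000)
```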

To save memory, though not necessarily time, do not create the whole n-row
matrix.  Instead, create and return one random vector using "yield", with a
count so you can stop the iteration when n vectors have been returned.  In
other words, make this into an iterator function.
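Here is a minimal plain-Python sketch of that iterator (generator) idea. As before, it assumes field elements are encoded as integers 0..q-1. In Sage, the loop body would presumably call `field.random_element()` instead. `random_vectors` is an illustrative name, and the `range(n)` loop plays the role of the count that stops the iteration after n vectors.

```python
import random


def random_vectors(q, length, n, rng=random):
    """Yield n random vectors (tuples of ints in range(q)), one at a time.

    Only the vector currently being yielded is held in memory, unlike a
    pre-built n-row matrix. Hypothetical encoding: ints stand in for
    GF(q) elements.
    """
    for _ in range(n):  # the count: stop after n vectors
        yield tuple(rng.randrange(q) for _ in range(length))


# Consume lazily, e.g. analysing each vector without storing all of them.
total = 0
for vec in random_vectors(4, 36, 1000):
    total += 1
```

This saves memory but does the same amount of random generation, so on its own it should not be expected to reduce the running time.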

John Cremona


>
> Gerli
>
>
> On 11/05/14 23:34, Dima Pasechnik wrote:
>
>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Thank you for your reply.
>>>
>>> This doesn't improve the time at all (which is my main concern). I am
>>> still looking for a different way of generating this data, something
>>> that would be at least 5-10 times faster than my proposed way.
>>>
>> the problem is apparently the slow creation of elements of the space,
>> not the process of generating random elements.
>> Here are results of profiling:
>>
>> sage: %prun sequences=[space.random_element() for __ in range(50000)]
>>           6050003 function calls in 393.762 seconds
>>
>>              Ordered by: internal time
>>
>>       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
>>        50000  386.048    0.008  386.185    0.008 free_module.py:1903(zero_vector)
>>        50000    4.533    0.000  393.574    0.008 free_module.py:4609(random_element)
>>      1800000    1.092    0.000    1.755    0.000 finite_field_givaro.py:208(random_element)
>>      1800000    0.663    0.000    0.663    0.000 {method 'random_element' of 'sage.rings.finite_rings.element_givaro.Cache_gi}
>>      1800000    0.391    0.000    0.391    0.000 {method 'random' of '_random.Random' objects}
>>        50000    0.380    0.000  386.742    0.008 free_module.py:5012(__call__)
>> ...... some entries, not taking much time at all, deleted...
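Outside Sage/IPython, a report similar to the `%prun` output above can be produced with the standard-library `cProfile` and `pstats` modules. This is only a sketch: `make_vector` is a hypothetical stand-in for `space.random_element()`, so its timings say nothing about Sage itself. Sorting by `"tottime"` gives the same "Ordered by: internal time" ordering.

```python
import cProfile
import io
import pstats
import random


def make_vector(length=36, q=4):
    # Hypothetical stand-in for space.random_element(): builds one list per call.
    return [random.randrange(q) for _ in range(length)]


def profile_generation(n):
    """Profile generating n vectors and return (vectors, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    vectors = [make_vector() for _ in range(n)]
    profiler.disable()
    out = io.StringIO()
    # "tottime" is time spent inside each function itself, excluding callees.
    pstats.Stats(profiler, stream=out).sort_stats("tottime").print_stats(10)
    return vectors, out.getvalue()


vectors, report = profile_generation(5000)
print(report)
```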
>>
>> If you look at the random_element() code in sage/modules/free_module.py
>> (starting at line 4609; see e.g.
>> https://github.com/sagemath/sage/blob/master/src/sage/modules/free_module.py),
>> you will see that it calls self(0), i.e. zero_vector(),
>> and this is what eats up almost all the time.
>>
>> It will be much faster to allocate a zero matrix
>> (zero_matrix(GF(4, 'a'), 36,10^6))
>> and fill it in, basically using the random_element() code,
>> adapted appropriately.
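Here is a rough plain-Python analogue of the preallocate-and-fill suggestion. It allocates storage for all n*length entries once and then fills it in, so no per-vector zero object is created. It assumes a hypothetical encoding in which entries are small integers 0..q-1 stored in a `bytearray` in row-major order, so that row i is one sampled vector. Note that `zero_matrix(GF(4, 'a'), 36, 10^6)` as written above would put the vectors in columns instead. The helper names are illustrative only.

```python
import random


def fill_preallocated(length, n, q=4, rng=random):
    """Allocate one flat buffer for n vectors of `length` entries and fill it.

    Hypothetical encoding: entries are ints in range(q) (q <= 256 so each
    fits in a byte), standing in for GF(q) elements.
    """
    if not 1 <= q <= 256:
        raise ValueError("q must be between 1 and 256 for a bytearray")
    data = bytearray(n * length)  # single allocation, initialised to zeros
    for k in range(n * length):
        data[k] = rng.randrange(q)
    return data


def row(data, length, i):
    """Return the i-th vector (row i) as a bytes slice."""
    return bytes(data[i * length:(i + 1) * length])


data = fill_preallocated(36, 1000)
first = row(data, 36, 0)
```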
>>
>> HTH,
>> Dima
>>
>>> Gerli
>>>
>>> On 11/05/14 22:23, Dima Pasechnik wrote:
>>>
>>>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to analyse sets of random vectors from a vector space
>>>>> GF(4)^36. For that I need to sample rather large sets (up to 10^6
>>>>> vectors) many times (I would like it to be 1000, depending on how fast
>>>>> I can analyse).
>>>>>
>>>>> I first thought my analysis was slow on such large sets but apparently
>>>>> just generating the sets takes an incredibly long time!
>>>>>
>>>>> What is the preferred method for efficiently generating sets of random
>>>>> vectors?
>>>>>
>>>>> Setup:
>>>>> space = VectorSpace(GF(4, 'a'), 36)
>>>>> n = 10^6
>>>>>
>>>>>
>>>>> I have tried the following methods:
>>>>>
>>>>> sample(space, n)
>>>>> gives me
>>>>> OverflowError: Python int too large to convert to C long
>>>>>
>>>>> An attempt to sample indexes and then ask for space[i] instead:
>>>>> sample(range(4^36), n)
>>>>> also results in
>>>>> OverflowError: range() result has too many items
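Both errors appear to come from building, or taking the length of, a sequence with 4^36 = 2^72 entries, which is more than a 64-bit C integer can represent. Below is a plain-Python sketch of the index idea that avoids this. It draws distinct integers with `random.randrange`, which accepts arbitrarily large ints, keeps them in a set, and decodes each index into its base-4 digits. The decoding is one bijection chosen for illustration. It is not necessarily the order behind Sage's `space[i]`, and both helper names are hypothetical.

```python
import random


def index_to_vector(i, q, length):
    """Decode 0 <= i < q**length into a tuple of `length` base-q digits.

    Least significant digit first. This is one possible bijection, not
    necessarily the ordering behind Sage's space[i].
    """
    digits = []
    for _ in range(length):
        i, d = divmod(i, q)
        digits.append(d)
    return tuple(digits)


def sample_distinct_vectors(q, length, n, rng=random):
    """Return n distinct random vectors via distinct random indices."""
    total = q ** length  # 4**36 == 2**72: a Python int, no C-long limit
    if n > total:
        raise ValueError("n exceeds the number of vectors in the space")
    chosen = set()
    while len(chosen) < n:
        # randrange works with arbitrarily large ints; no range object is built.
        chosen.add(rng.randrange(total))
    return [index_to_vector(i, q, length) for i in chosen]


vectors = sample_distinct_vectors(4, 36, 1000)
```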
>>>>>
>>>>> Trying to use space.random_element():
>>>>> First I tried to get unique samples:
>>>>> sequences = []
>>>>> while len(sequences) < n:
>>>>>       elem = space.random_element()
>>>>>       if elem not in sequences:
>>>>>           sequences.append(elem)
>>>>> but this takes forever (and is impossible to interrupt). I let it run for
>>>>> several minutes and realized this was not going to work. I don't know how
>>>>> long it would actually take. I cannot use a set since the vectors are
>>>>> mutable (although, I must admit, I haven't tried making them immutable
>>>>> and seeing if using a set works better).
>>>>>
>>>> yes, you should make them immutable.
>>>> See below how.
>>>>
>>>>> Then I decided not to care about the uniqueness:
>>>>> %time sequences=[space.random_element() for __ in range(n)]
>>>>> Best so far (in that it actually gives me a result). This takes about
>>>>> *60 seconds* on my computer (based on a couple of runs). Using xrange
>>>>> didn't affect the time.
>>>>>
>>>>> Is it possible to improve this time? And I would prefer it if the set
>>>>> didn't contain any duplicates.
>>>>>
>>>> just do the following:
>>>>
>>>> sequences = [space.random_element() for __ in range(n)]
>>>> for v in sequences:
>>>>     v.set_immutable()  # only immutable vectors are hashable
>>>> seq_noreps = set(sequences)  # now there are no repetitions
>>>>
>>>> HTH,
>>>> Dima
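In plain Python the same issue shows up with lists versus tuples. A list, like a mutable Sage vector, is unhashable, while a tuple can go into a set, and set membership tests are O(1) on average instead of the O(len) scan done by `elem not in sequences` on a list. The sketch below assumes vectors are represented as lists of ints, a hypothetical stand-in for Sage vectors, with the tuple conversion playing the role of `set_immutable()`. As with the set built above, the result can hold fewer than n vectors if duplicates were drawn. With 4^36 possible vectors and n = 10^6 this is very unlikely, but the count is worth checking.

```python
def dedupe(vectors):
    """Return the distinct vectors as a set of tuples.

    Lists are mutable and therefore unhashable; converting each one to a
    tuple is the plain-Python counterpart of v.set_immutable() in Sage.
    """
    return {tuple(v) for v in vectors}


vectors = [[1, 0, 2], [3, 3, 1], [1, 0, 2]]
unique = dedupe(vectors)
# unique == {(1, 0, 2), (3, 3, 1)}: the repeated vector is kept once
```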
>>>>
>>>>> If generating the dataset takes a whole minute, it would take me about
>>>>> 1000 minutes (roughly 17 hours) just to generate 1000 datasets of this
>>>>> size...
>>>>>
>>>>> Thanks in advance,
>>>>> Gerli
>>>>>
>>>>>
> --
> You received this message because you are subscribed to the Google Groups
> "sage-support" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sage-support.
> For more options, visit https://groups.google.com/d/optout.
>
