On 2014-05-12, Gerli Viikmaa <[email protected]> wrote:
> Thank you - this has sped up the dataset creation 2 times.
>
> This is the resulting function:
>
> def generate_data(field, length, n):
>      data = zero_matrix(field, n, length)
>      for i in range(n):
>          for j in range(length):
>              data[i,j] = field.random_element()
>      return data
>
> Here the vectors are rows, so I can call sequences[i] to get the ith 
> vector.
>
> %time sequences = generate_data(GF(4,'a'), 36, 10^6)
> CPU times: user 26.28 s, sys: 0.56 s, total: 26.84 s
> Wall time: 26.93 s
>
> compared to
>
> space = VectorSpace(GF(4,'a'), 36)
> %time sequences=[space.random_element() for __ in range(10^6)]
> CPU times: user 58.41 s, sys: 0.20 s, total: 58.60 s
> Wall time: 58.75 s
>
> I do wonder if the filling up can be done even better...

you can get another 30-50% or so of speedup if you replace
field.random_element() by field._cache.random_element()
in your code.
It is because finite_field_givaro.py:208(random_element)
is a python function which just calls _cache.random_element()
and does nothing else.
(again, use %prun to see where the bottleneck in your code is).

You might then also try using Cython.
Finally, I don't know how these random elements are found; it could be a
generic procedure that does not do too well for small fields.
It is done by Givaro (cf http://givaro.forge.imag.fr/givaro-html/)
so you might want to dig into it a bit.
But probably you'd then need to code in Cython or in C/C++ to
get an improvement...

HTH,
Dima

>
> Gerli
>
> On 11/05/14 23:34, Dima Pasechnik wrote:
>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
>>> Hi,
>>>
>>> Thank you for your reply.
>>>
>>> This doesn't improve the time at all (which is my main concern). I am
>>> still looking for a different way of generating this data, something
>>> that would be at least 5-10 times faster than my proposed way.
>> the problem is apparently the slow creation of elements of the space,
>> not the process of generating random elements.
>> Here are results of profiling:
>>
>> sage: %prun sequences=[space.random_element() for __ in range(50000)]
>>           6050003 function calls in 393.762 seconds
>>
>>              Ordered by: internal time
>>
>>       ncalls     tottime  percall  cumtime  percall filename:lineno(function)
>>       50000      386.048  0.008  386.185    0.008 
>> free_module.py:1903(zero_vector)
>>       50000      4.533    0.000  393.574    0.008 
>> free_module.py:4609(random_element)
>>       1800000    1.092    0.000    1.755    0.000 
>> finite_field_givaro.py:208(random_element)
>>       1800000    0.663    0.000    0.663    0.000 {method 'random_element' 
>> of 'sage.rings.finite_rings.element_givaro.Cache_gi}
>>       1800000    0.391    0.000    0.391    0.000 {method 'random' of 
>> '_random.Random' objects}
>>       50000      0.380    0.000  386.742    0.008 
>> free_module.py:5012(__call__)
>> ...... some entries, not taking much time at all, deleted...
>>
>> If you look at the random_element() code in sage/modules/free_module.py
>> (starting at the line 4609, see e.g.
>> https://github.com/sagemath/sage/blob/master/src/sage/modules/free_module.py)
>>
>> then you see that it calls self(0), i.e. zero_vector(),
>> and this is what eats up almost all the time.
>>
>> It will be much faster to allocate a zero matrix
>> (zero_matrix(GF(4, 'a'), 36,10^6))
>> and fill it in, basically using the random_element() code,
>> adapted appropriately.
>>
>> HTH,
>> Dima
>>
>>> Gerli
>>>
>>> On 11/05/14 22:23, Dima Pasechnik wrote:
>>>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> I am trying to analyse sets of random vectors from a vector space 
>>>>> GF(4)^36.
>>>>> For that I need to sample rather large sets (up to 10^6 vectors) many 
>>>>> times
>>>>> (I would like it to be 1000, depending on how fast I can analyse).
>>>>>
>>>>> I first thought my analysis was slow on such large sets but apparently 
>>>>> just
>>>>> generating the sets takes an incredibly long time!
>>>>>
>>>>> What is the preferred method for efficiently generating sets of random
>>>>> vectors?
>>>>>
>>>>> Setup:
>>>>> space = VectorSpace(GF(4, 'a'), 36)
>>>>> n = 10^6
>>>>>
>>>>>
>>>>> I have tried the following methods:
>>>>>
>>>>> sample(space, n)
>>>>> gives me
>>>>> OverflowError: Python int too large to convert to C long
>>>>>
>>>>> An attempt to sample indexes and then ask for space[i] instead:
>>>>> sample(range(4^36), n)
>>>>> also results in
>>>>> OverflowError: range() result has too many items
>>>>>
>>>>> Trying to use space.random_element():
>>>>> First I tried to get unique samples:
>>>>> sequences = []
>>>>> while len(sequences) < n:
>>>>>       elem = space.random_element()
>>>>>       if elem not in sequences:
>>>>>           sequences.append(elem)
>>>>> but this takes forever (and is impossible to interrupt). I let it run for
>>>>> several minutes and realized this was not going to work. I don't know how
>>>>> long it would actually take. I cannot use a set since the vectors are
>>>>> mutable (although, I must admit I haven't tried turning them immutable and
>>>>> seeing if using a set works better).
>>>> yes, you should make them immutable.
>>>> See below how.
>>>>
>>>>> Then I decided not to care about the uniqueness:
>>>>> %time sequences=[space.random_element() for __ in range(n)]
>>>>> Best so far (in that it actually gives me a result). This takes about *60
>>>>> seconds* on my computer (based on a couple runs). Using xrange didn't
>>>>> affect the time.
>>>>>
>>>>> Is it possible to improve this time? And I would prefer it if the set
>>>>> didn't contain any duplicates.
>>>> just do the following:
>>>>
>>>> sequences=[space.random_element() for __ in range(n)]
>>>> for i in sequences:
>>>>      i.set_immutable()
>>>> seq_noreps=set(sequences) # now there are no repetitions
>>>>
>>>> HTH,
>>>> Dima
>>>>
>>>>> If generating the dataset takes a whole
>>>>> minute, it would take me over two weeks just to generate 1000 datasets of
>>>>> this size...
>>>>>
>>>>> Thanks in advance,
>>>>> Gerli
>>>>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"sage-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.

Reply via email to