On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> Hi,
>
> Thank you for your reply.
>
> This doesn't improve the time at all (which is my main concern). I am 
> still looking for a different way of generating this data, something 
> that would be at least 5-10 times faster than my proposed way.

The problem is apparently the slow creation of elements of the space,
not the generation of the random entries themselves.
Here are results of profiling:

sage: %prun sequences=[space.random_element() for __ in range(50000)]
         6050003 function calls in 393.762 seconds

   Ordered by: internal time

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     50000  386.048    0.008  386.185    0.008 free_module.py:1903(zero_vector)
     50000    4.533    0.000  393.574    0.008 free_module.py:4609(random_element)
   1800000    1.092    0.000    1.755    0.000 finite_field_givaro.py:208(random_element)
   1800000    0.663    0.000    0.663    0.000 {method 'random_element' of 'sage.rings.finite_rings.element_givaro.Cache_gi}
   1800000    0.391    0.000    0.391    0.000 {method 'random' of '_random.Random' objects}
     50000    0.380    0.000  386.742    0.008 free_module.py:5012(__call__)
... remaining entries, not taking much time at all, deleted ...

If you look at the random_element() code in sage/modules/free_module.py
(starting at line 4609; see e.g.
https://github.com/sagemath/sage/blob/master/src/sage/modules/free_module.py),
you will see that it calls self(0), i.e. zero_vector(),
and this is what eats up almost all the time.

It will be much faster to allocate a zero matrix once
(zero_matrix(GF(4, 'a'), 36, 10^6))
and fill it in, basically reusing the random_element() code,
adapted appropriately.
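A minimal pure-Python sketch of that idea (not Sage API): GF(4) elements are
encoded here as the integers 0..3, which is an assumption for illustration
only, and the helper names random_gf4_matrix / unique_rows are made up. In
Sage itself one would instead write into zero_matrix(GF(4, 'a'), 36, 10^6):

```python
import random

def random_gf4_matrix(n_rows, n_cols, seed=None):
    """Allocate the whole table up front, then fill it in place.

    GF(4) elements are modelled as the integers 0..3 (an assumption for
    this sketch). Filling a pre-allocated table avoids the per-vector
    zero_vector() overhead that dominates the profile above.
    """
    rng = random.Random(seed)
    rows = [[0] * n_cols for _ in range(n_rows)]
    for row in rows:
        for j in range(n_cols):
            row[j] = rng.randrange(4)
    return rows

def unique_rows(rows):
    """Deduplicate via immutable tuples -- the same trick as calling
    set_immutable() on Sage vectors before putting them in a set."""
    return set(map(tuple, rows))
```

For the original problem size one would call random_gf4_matrix(10**6, 36)
and then unique_rows() to discard duplicates.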

HTH,
Dima

>
> Gerli
>
> On 11/05/14 22:23, Dima Pasechnik wrote:
>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
>>> Hi,
>>>
>>> I am trying to analyse sets of random vectors from a vector space GF(4)^36.
>>> For that I need to sample rather large sets (up to 10^6 vectors) many times
>>> (I would like it to be 1000, depending on how fast I can analyse).
>>>
>>> I first thought my analysis was slow on such large sets but apparently just
>>> generating the sets takes an incredibly long time!
>>>
>>> What is the preferred method for efficiently generating sets of random
>>> vectors?
>>>
>>> Setup:
>>> space = VectorSpace(GF(4, 'a'), 36)
>>> n = 10^6
>>>
>>>
>>> I have tried the following methods:
>>>
>>> sample(space, n)
>>> gives me
>>> OverflowError: Python int too large to convert to C long
>>>
>>> An attempt to sample indexes and then ask for space[i] instead:
>>> sample(range(4^36), n)
>>> also results in
>>> OverflowError: range() result has too many items
>>>
>>> Trying to use space.random_element():
>>> First I tried to get unique samples:
>>> sequences = []
>>> while len(sequences) < n:
>>>      elem = space.random_element()
>>>      if elem not in sequences:
>>>          sequences.append(elem)
>>> but this takes forever (and is impossible to interrupt). I let it run for
>>> several minutes and realized this was not going to work. I don't know how
>>> long it would actually take. I cannot use a set since the vectors are
>>> mutable (although, I must admit I haven't tried turning them immutable and
>>> seeing if using a set works better).
>> yes, you should make them immutable.
>> See below how.
>>
>>> Then I decided not to care about the uniqueness:
>>> %time sequences=[space.random_element() for __ in range(n)]
>>> Best so far (in that it actually gives me a result). This takes about *60
>>> seconds* on my computer (based on a couple runs). Using xrange didn't
>>> affect the time.
>>>
>>> Is it possible to improve this time? And I would prefer it if the set
>>> didn't contain any duplicates.
>> just do the following:
>>
>> sequences=[space.random_element() for __ in range(n)]
>> for i in sequences:
>>     i.set_immutable()
>> seq_noreps=set(sequences) # now there are no repetitions
>>
>> HTH,
>> Dima
>>
>>> If generating the dataset takes a whole
>>> minute, it would take me over two weeks just to generate 1000 datasets of
>>> this size...
>>>
>>> Thanks in advance,
>>> Gerli
>>>
>

-- 
You received this message because you are subscribed to the Google Groups "sage-support" group.