On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> Hi,
>
> Thank you for your reply.
>
> This doesn't improve the time at all (which is my main concern). I am
> still looking for a different way of generating this data, something
> that would be at least 5-10 times faster than my proposed way.
the problem is apparently the slow creation of elements of the space,
not the process of generating random elements.
Here are results of profiling:
sage: %prun sequences=[space.random_element() for __ in range(50000)]
6050003 function calls in 393.762 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
50000 386.048 0.008 386.185 0.008
free_module.py:1903(zero_vector)
50000 4.533 0.000 393.574 0.008
free_module.py:4609(random_element)
1800000 1.092 0.000 1.755 0.000
finite_field_givaro.py:208(random_element)
1800000 0.663 0.000 0.663 0.000 {method 'random_element' of
'sage.rings.finite_rings.element_givaro.Cache_gi}
1800000 0.391 0.000 0.391 0.000 {method 'random' of
'_random.Random' objects}
50000 0.380 0.000 386.742 0.008 free_module.py:5012(__call__)
...... some entries, not taking much time at all, deleted...
If you look at the random_element() code in sage/modules/free_module.py
(starting at the line 4609, see e.g.
https://github.com/sagemath/sage/blob/master/src/sage/modules/free_module.py)
then you see that it calls self(0), i.e. zero_vector(),
and this is what eats up almost all the time.
It will be much faster to allocate a zero matrix
(zero_matrix(GF(4, 'a'), 36,10^6))
and fill it in, basically using the random_element() code,
adapted appropriately.
HTH,
Dima
>
> Gerli
>
> On 11/05/14 22:23, Dima Pasechnik wrote:
>> On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
>>> Hi,
>>>
>>> I am trying to analyse sets of random vectors from a vector space GF(4)^36.
>>> For that I need to sample rather large sets (up to 10^6 vectors) many times
>>> (I would like it to be 1000, depending on how fast I can analyse).
>>>
>>> I first thought my analysis was slow on such large sets but apparently just
>>> generating the sets takes an incredibly long time!
>>>
>>> What is the preferred method for efficiently generating sets of random
>>> vectors?
>>>
>>> Setup:
>>> space = VectorSpace(GF(4, 'a'), 36)
>>> n = 10^6
>>>
>>>
>>> I have tried the following methods:
>>>
>>> sample(space, n)
>>> gives me
>>> OverflowError: Python int too large to convert to C long
>>>
>>> An attempt to sample indexes and then ask for space[i] instead:
>>> sample(range(4^36), n)
>>> also results in
>>> OverflowError: range() result has too many items
>>>
>>> Trying to use space.random_element():
>>> First I tried to get unique samples:
>>> sequences = []
>>> while len(sequences) < n:
>>> elem = space.random_element()
>>> if elem not in sequences:
>>> sequences.append(elem)
>>> but this takes forever (and is impossible to interrupt). I let it run for
>>> several minutes and realized this was not going to work. I don't know how
>>> long it would actually take. I cannot use a set since the vectors are
>>> mutable (although, I must admit I haven't tried turning them immutable and
>>> seeing if using a set works better).
>> yes, you should make them immutable.
>> See below how.
>>
>>> Then I decided not to care about the uniqueness:
>>> %time sequences=[space.random_element() for __ in range(n)]
>>> Best so far (in that it actually gives me a result). This takes about *60
>>> seconds* on my computer (based on a couple runs). Using xrange didn't
>>> affect the time.
>>>
>>> Is it possible to improve this time? And I would prefer it if the set
>>> didn't contain any duplicates.
>> just do the following:
>>
>> sequences=[space.random_element() for __ in range(n)]
>> for i in sequences:
>> i.set_immutable()
>> seq_noreps=set(sequences) # now there are no repetitions
>>
>> HTH,
>> Dima
>>
>>> If generating the dataset takes a whole
>>> minute, it would take me over two weeks just to generate 1000 datasets of
>>> this size...
>>>
>>> Thanks in advance,
>>> Gerli
>>>
>
--
You received this message because you are subscribed to the Google Groups
"sage-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-support.
For more options, visit https://groups.google.com/d/optout.