On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> Hi,
>
> I am trying to analyse sets of random vectors from the vector space GF(4)^36.
> For that I need to sample rather large sets (up to 10^6 vectors) many
> times (ideally 1000 times, depending on how fast the analysis is).
>
> At first I thought my analysis was slow on such large sets, but apparently
> just generating the sets takes an incredibly long time!
>
> What is the preferred method for efficiently generating sets of random
> vectors?
>
> Setup:
> space = VectorSpace(GF(4, 'a'), 36)
> n = 10^6
>
> I have tried the following methods:
>
> sample(space, n)
> gives me
> OverflowError: Python int too large to convert to C long
>
> An attempt to sample indices and then ask for space[i] instead:
> sample(range(4^36), n)
> also results in
> OverflowError: range() result has too many items
>
> Trying to use space.random_element():
> First I tried to get unique samples:
> sequences = []
> while len(sequences) < n:
>     elem = space.random_element()
>     if elem not in sequences:
>         sequences.append(elem)
> but this takes forever (and is impossible to interrupt). I let it run for
> several minutes and realised this was not going to work; I don't know how
> long it would actually take. I cannot use a set, since the vectors are
> mutable (although, I must admit, I haven't tried making them immutable
> and seeing whether a set works better).
Yes, you should make them immutable; see below how.

> Then I decided not to care about the uniqueness:
> %time sequences = [space.random_element() for __ in range(n)]
> This is the best so far (in that it actually gives me a result). It takes
> about *60 seconds* on my computer (based on a couple of runs). Using
> xrange didn't affect the time.
>
> Is it possible to improve this time? And I would prefer it if the set
> didn't contain any duplicates.

Just do the following:

sequences = [space.random_element() for __ in range(n)]
for v in sequences:
    v.set_immutable()        # immutable vectors are hashable
seq_noreps = set(sequences)  # now there are no repetitions

HTH,
Dima

> If generating the dataset takes a whole minute, it would take me over
> two weeks just to generate 1000 datasets of this size...
>
> Thanks in advance,
> Gerli
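P.S. For what it's worth: with n = 10^6 draws from a space of size
4^36 ≈ 4.7*10^21, the probability of even one collision is roughly
n^2/(2*4^36) ≈ 10^-10, so the set above will almost surely contain
exactly n vectors. If you want uniqueness guaranteed by construction,
here is a rough sketch (untested, and the names elems and codes are
just ones I made up): draw distinct integer codes below 4^36, then
decode the base-4 digits of each code into a vector over GF(4).

F = GF(4, 'a')
V = VectorSpace(F, 36)
n = 10^6
elems = F.list()               # the four elements of GF(4)
codes = set()                  # distinct integers in [0, 4^36)
while len(codes) < n:
    codes.add(ZZ.random_element(4^36))
sequences = [V([elems[d] for d in c.digits(base=4, padto=36)])
             for c in codes]

Since the integer-to-vector decoding is a bijection and the codes are
uniform, the resulting vectors are uniform as well, and uniqueness of
the codes gives uniqueness of the vectors for free.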
