On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
> Hi,
>
> I am trying to analyse sets of random vectors from the vector space GF(4)^36.
> For that I need to sample rather large sets (up to 10^6 vectors) many
> times (ideally 1000 times, depending on how fast the analysis is).
>
> At first I thought my analysis was slow on such large sets, but apparently
> just generating the sets takes an incredibly long time!
>
> What is the preferred method for efficiently generating sets of random
> vectors?
>
> Setup:
> space = VectorSpace(GF(4, 'a'), 36)
> n = 10^6
>
> I have tried the following methods:
>
> sample(space, n)
> gives me
> OverflowError: Python int too large to convert to C long
>
> An attempt to sample indices and then ask for space[i] instead:
> sample(range(4^36), n)
> also results in
> OverflowError: range() result has too many items
>
> Trying to use space.random_element():
> First I tried to get unique samples:
> sequences = []
> while len(sequences) < n:
>     elem = space.random_element()
>     if elem not in sequences:
>         sequences.append(elem)
> but this takes forever (and is impossible to interrupt). I let it run for
> several minutes and realised this was not going to work; I don't know how
> long it would actually take. I cannot use a set, since the vectors are
> mutable (although, I must admit, I haven't tried making them immutable
> and seeing whether a set works better).
Yes, you should make them immutable; see below how.

> Then I decided not to care about the uniqueness:
> %time sequences = [space.random_element() for __ in range(n)]
> This is the best so far (in that it actually gives me a result). It takes
> about *60 seconds* on my computer (based on a couple of runs). Using
> xrange didn't affect the time.
>
> Is it possible to improve this time? And I would prefer it if the set
> didn't contain any duplicates.

Just do the following:

sequences = [space.random_element() for __ in range(n)]
for v in sequences:
    v.set_immutable()        # immutable vectors are hashable
seq_noreps = set(sequences)  # now there are no repetitions

HTH,
Dima

> If generating the dataset takes a whole minute, it would take me over
> two weeks just to generate 1000 datasets of this size...
>
> Thanks in advance,
> Gerli
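P.S. For what it's worth: with n = 10^6 draws from a space of size
4^36 ≈ 4.7*10^21, the probability of even one collision is roughly
n^2/(2*4^36) ≈ 10^-10, so the set above will almost surely contain
exactly n vectors. If you want uniqueness guaranteed by construction,
here is a rough sketch (untested, and the names elems and codes are
just ones I made up): draw distinct integer codes below 4^36, then
decode the base-4 digits of each code into a vector over GF(4).

F = GF(4, 'a')
V = VectorSpace(F, 36)
n = 10^6
elems = F.list()               # the four elements of GF(4)
codes = set()                  # distinct integers in [0, 4^36)
while len(codes) < n:
    codes.add(ZZ.random_element(4^36))
sequences = [V([elems[d] for d in c.digits(base=4, padto=36)])
             for c in codes]

Since the integer-to-vector decoding is a bijection and the codes are
uniform, the resulting vectors are uniform as well, and uniqueness of
the codes gives uniqueness of the vectors for free.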
