Hmm, I didn't know about the random_matrix command.

So far it works very well for me:
%time sequences = random_matrix(GF(4,'a'), 10^6, 36) # notice the argument order
CPU times: user 0.79 s, sys: 0.00 s, total: 0.79 s
Wall time: 0.77 s

%time sequences[0]
CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s
Wall time: 0.00 s
(0, a + 1, a, a, a, a, 0, a, a, a, 0, a, 1, a, a + 1, a, 1, a, 0, a, 0, a, a + 1, 1, a, 1, a + 1, 0, 0, a, 0, a + 1, a, 1, a + 1, 0)

This part is slow, though, yes:
%time V = sequences.rows()
CPU times: user 38.08 s, sys: 0.42 s, total: 38.50 s
Wall time: 38.63 s

And I did manage to fill up my memory when creating the transposed matrix (as you suggested) and printing out the first row (containing a million elements).

Too bad I'm not working in GF(2)...

I believe I can work with this now. Thank you all for your help.

Gerli

On 12/05/14 12:12, Martin Albrecht wrote:
Why not:

%time A = random_matrix(GF(4,'a'), 36, 10^6)
CPU times: user 568 ms, sys: 12 ms, total: 580 ms
Wall time: 578 ms

However, getting the rows out takes ages. The reason is that vectors over
GF(4) are generic, i.e. no one has sat down and written a simple class which
implements these vectors as matrices with one row:

sage: A = matrix(GF(4,'a'),10, 10)
sage: type(A)
<type 'sage.matrix.matrix_mod2e_dense.Matrix_mod2e_dense'>

sage: v = vector(GF(4,'a'),10)
sage: type(v) # generic type
<type 'sage.modules.free_module_element.FreeModuleElement_generic_dense'>


Compare that with GF(2):

sage: %time A = random_matrix(GF(2), 36, 10^6)
CPU times: user 16 ms, sys: 0 ns, total: 16 ms
Wall time: 16 ms
sage: %time V = A.rows()
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 2.55 ms

sage: A = matrix(GF(2),10, 10)
sage: type(A)
<type 'sage.matrix.matrix_mod2_dense.Matrix_mod2_dense'>

sage: v = vector(GF(2),10)
sage: type(v) # specialised type
<type 'sage.modules.vector_mod2_dense.Vector_mod2_dense'>

So I'm afraid the answer is: "implement it and send a patch" :)
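To illustrate why the specialised GF(2) types are so much faster, here is a rough plain-Python sketch (not Sage internals; both class names are made up for the illustration): packing a whole vector into a single machine integer turns addition into one XOR, while the generic layout touches every coordinate individually.

```python
import random

class PackedGF2Vector:
    """All n coordinates packed into a single integer (bit i = coordinate i)."""
    def __init__(self, n, bits=0):
        self.n = n
        self.bits = bits

    @classmethod
    def random(cls, n):
        return cls(n, random.getrandbits(n))

    def __add__(self, other):
        # Addition over GF(2) is a single XOR on the whole word.
        return PackedGF2Vector(self.n, self.bits ^ other.bits)

class GenericVector:
    """One Python object per coordinate -- the 'generic' layout."""
    def __init__(self, coords):
        self.coords = list(coords)

    @classmethod
    def random(cls, n):
        return cls(random.randrange(2) for _ in range(n))

    def __add__(self, other):
        # Addition walks every coordinate one at a time.
        return GenericVector((a + b) % 2
                             for a, b in zip(self.coords, other.coords))

u = PackedGF2Vector.random(36)
v = PackedGF2Vector.random(36)
w = u + v  # one XOR, regardless of vector length
```

The specialised Sage classes go further (m4ri, word-level operations in C), but the principle is the same: the per-coordinate Python overhead disappears.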

On Monday 12 May 2014 11:56:14 Gerli Viikmaa wrote:
Thank you - this has sped up the dataset creation by a factor of two.

This is the resulting function:

def generate_data(field, length, n):
    data = zero_matrix(field, n, length)
    for i in range(n):
        for j in range(length):
            data[i,j] = field.random_element()
    return data

Here the vectors are rows, so I can call sequences[i] to get the i-th vector.

%time sequences = generate_data(GF(4,'a'), 36, 10^6)
CPU times: user 26.28 s, sys: 0.56 s, total: 26.84 s
Wall time: 26.93 s

compared to

space = VectorSpace(GF(4,'a'), 36)
%time sequences=[space.random_element() for __ in range(10^6)]
CPU times: user 58.41 s, sys: 0.20 s, total: 58.60 s
Wall time: 58.75 s

I do wonder if the filling up can be done even better...
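One generic trick (plain Python here, not Sage; `random.choices` is stdlib) is to draw all n*length symbols in a single bulk call and only then chunk them into rows, instead of calling the RNG once per entry. In Sage the same idea would mean building one flat list of field elements and handing it to the matrix constructor in one go.

```python
import random

def generate_data_bulk(alphabet, length, n, seed=None):
    """Draw n rows of `length` symbols each with one bulk RNG call.

    `alphabet` stands in for the field elements (e.g. the four
    elements of GF(4)); this is a plain-Python sketch, not Sage code.
    """
    rng = random.Random(seed)
    flat = rng.choices(alphabet, k=n * length)  # one call, n*length draws
    # Chunk the flat list into rows of `length` symbols each.
    return [flat[i * length:(i + 1) * length] for i in range(n)]

rows = generate_data_bulk(['0', '1', 'a', 'a+1'], 36, 1000)
```

Moving the loop out of Python and into one library call is usually where the remaining speed-up hides.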

Gerli

On 11/05/14 23:34, Dima Pasechnik wrote:
On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
Hi,

Thank you for your reply.

This doesn't improve the time at all (which is my main concern). I am
still looking for a different way of generating this data, something
that would be at least 5-10 times faster than my proposed way.
the problem is apparently the slow creation of elements of the space,
not the process of generating random elements.
Here are results of profiling:

sage: %prun sequences=[space.random_element() for __ in range(50000)]

         6050003 function calls in 393.762 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50000  386.048    0.008  386.185    0.008 free_module.py:1903(zero_vector)
    50000    4.533    0.000  393.574    0.008 free_module.py:4609(random_element)
  1800000    1.092    0.000    1.755    0.000 finite_field_givaro.py:208(random_element)
  1800000    0.663    0.000    0.663    0.000 {method 'random_element' of 'sage.rings.finite_rings.element_givaro.Cache_gi}
  1800000    0.391    0.000    0.391    0.000 {method 'random' of '_random.Random' objects}
    50000    0.380    0.000  386.742    0.008 free_module.py:5012(__call__)
...... some entries, not taking much time at all, deleted...

If you look at the random_element() code in sage/modules/free_module.py
(starting at line 4609, see e.g.
https://github.com/sagemath/sage/blob/master/src/sage/modules/free_module.py)

then you see that it calls self(0), i.e. zero_vector(),
and this is what eats up almost all the time.

It will be much faster to allocate a zero matrix
(zero_matrix(GF(4, 'a'), 36,10^6))
and fill it in, basically using the random_element() code,
adapted appropriately.

HTH,
Dima

Gerli

On 11/05/14 22:23, Dima Pasechnik wrote:
On 2014-05-11, Gerli Viikmaa <[email protected]> wrote:
Hi,

I am trying to analyse sets of random vectors from a vector space
GF(4)^36.
For that I need to sample rather large sets (up to 10^6 vectors) many times
(I would like it to be 1000, depending on how fast I can analyse).

I first thought my analysis was slow on such large sets, but apparently just
generating the sets takes an incredibly long time!

What is the preferred method for efficiently generating sets of random
vectors?

Setup:
space = VectorSpace(GF(4, 'a'), 36)
n = 10^6


I have tried the following methods:

sample(space, n)
gives me
OverflowError: Python int too large to convert to C long

An attempt to sample indexes and then ask for space[i] instead:
sample(range(4^36), n)
also results in
OverflowError: range() result has too many items

Trying to use space.random_element():
First I tried to get unique samples:
sequences = []

while len(sequences) < n:
    elem = space.random_element()
    if elem not in sequences:
        sequences.append(elem)

but this takes forever (and is impossible to interrupt). I let it run for
several minutes and realized this was not going to work. I don't know how
long it would actually take. I cannot use a set since the vectors are
mutable (although, I must admit, I haven't tried making them immutable
and seeing if using a set works better).
yes, you should make them immutable.
See below how.

Then I decided not to care about the uniqueness:
%time sequences=[space.random_element() for __ in range(n)]
Best so far (in that it actually gives me a result). This takes about
*60 seconds* on my computer (based on a couple of runs). Using xrange
didn't affect the time.

Is it possible to improve this time? And I would prefer it if the set
didn't contain any duplicates.
just do the following:

sequences=[space.random_element() for __ in range(n)]

for i in sequences:
    i.set_immutable()

seq_noreps=set(sequences) # now there are no repetitions
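If exactly n distinct vectors are needed, note that deduplicating after the fact may leave the set slightly short of n; one can instead keep drawing until the set is full. A plain-Python sketch, with tuples playing the role of immutable vectors:

```python
import random

def unique_random_rows(n, length, q=4, seed=None):
    """Draw random rows until we have exactly n distinct ones.

    Tuples stand in for immutable vectors over a q-element field.
    With q^length vastly larger than n (here 4^36 vs. 10^6),
    duplicates are rare and the loop terminates quickly.
    """
    rng = random.Random(seed)
    seen = set()
    while len(seen) < n:
        seen.add(tuple(rng.randrange(q) for _ in range(length)))
    return list(seen)

rows = unique_random_rows(10**4, 36)
```

Because membership tests on a set are O(1), this avoids the quadratic `if elem not in sequences` scan over a list that made the earlier attempt unusable.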

HTH,
Dima

If generating the dataset takes a whole minute, it would take me over
two weeks just to generate 1000 datasets of this size...

Thanks in advance,
Gerli
