[python-win32] Identify unique data from sequence array

2010-12-22 Thread otrov
Hi,
My first attempt to solve this problem with MATLAB/Octave failed, as I have only 
just started using these tools for data manipulation. I then thought I would try 
Python, as a more feature-rich, descriptive language, and post the problem to the 
Python group I am already subscribed to.

Let's consider this simple array object (a scipy array):

X = array([[1, 2],
   [1, 2],
   [2, 2],
   [3, 1],
   [2, 3],
   [1, 2],
   [1, 2],
   [2, 2],
   [3, 1],
   [2, 3],
   [1, 2],
   [1, 2],
   [2, 2],
   [3, 1],
   [2, 3],
   ...,
   [1, 2],
   [1, 2],
   [2, 2],
   [3, 1],
   [2, 3]])

I would like to extract the sequence of data that repeats:

Y = array([[1, 2],
   [1, 2],
   [2, 2],
   [3, 1],
   [2, 3]])

as a result.

The arrays consist of 10^7 to 10^8 elements, and the unique sequence consists of 
at most 10^6 elements, usually fewer, around 10^5.

Thanks for your time

___
python-win32 mailing list
python-win32@python.org
http://mail.python.org/mailman/listinfo/python-win32


Re: [python-win32] Identify unique data from sequence array

2010-12-22 Thread otrov
 1. Start with the first element (call it L)
 2. Scan downwind for a matching element (call it R)
 3. Compare L+1 and R+1 until you find a mismatch -- that's the current
 largest match.
 4. Repeat from 2 to see if you can find a longer match.
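
The four steps above could be sketched roughly as follows. This is only an illustration, assuming X is a 2-D NumPy array that is an exact repetition of one block (possibly cut off at the end). The names match_length and find_repeating_block and the chunk parameter are made up for this sketch and do not come from the thread.

```python
import numpy as np


def match_length(X, r, chunk=1 << 16):
    """Step 3: count the leading rows k for which X[k] == X[r + k]."""
    total = len(X) - r
    # Compare in chunks so a mismatch near the start exits early
    # instead of scanning the whole (possibly huge) array.
    for start in range(0, total, chunk):
        stop = min(start + chunk, total)
        same = (X[start:stop] == X[r + start:r + stop]).all(axis=1)
        bad = np.flatnonzero(~same)
        if bad.size:
            return start + int(bad[0])
    return total


def find_repeating_block(X):
    """Return the shortest leading block whose repetition reproduces X."""
    X = np.asarray(X)
    n = len(X)
    # Step 1: L is row 0. Step 2: every later row equal to L is a candidate R.
    candidates = np.flatnonzero((X[1:] == X[0]).all(axis=1)) + 1
    for r in candidates:
        r = int(r)
        # Steps 3 and 4: a candidate whose match runs to the end of the
        # array is the period; otherwise move on to the next candidate.
        if match_length(X, r) == n - r:
            return X[:r]
    return X  # no repetition found


block = np.array([[1, 2], [1, 2], [2, 2], [3, 1], [2, 3]])
X = np.tile(block, (4, 1))
Y = find_repeating_block(X)
print(Y)
```

With the five-row block repeated four times, the candidate R = 1 is rejected after one matching row and R = 5 matches to the end, so Y is the five-row block. Candidates are tried in increasing order, so the first one that survives is the shortest period.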

Actually, step 4 ("Repeat from 2...") can be further optimized by checking for a 
match between the element preceding R and the element shifted by R positions 
relative to R, followed by a couple of routine random checks for matches between 
the elements in between L and R and the values shifted by R positions ;)
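
A minimal sketch of that shortcut, under my own reading of it: with L at index 0 and a candidate R at index r, compare row r-1 with row 2r-1, then spot-check a few random rows i between L and R against row i+r. The function name quick_period_check and the samples and seed parameters are hypothetical, chosen only for this illustration.

```python
import random


def quick_period_check(X, r, samples=16, seed=0):
    """Cheap filter: does r look like a period of X? (not a proof)"""
    n = len(X)
    if r < 1 or 2 * r > n:
        # Not enough data to compare one full period with the next.
        return False
    # Element preceding R versus its counterpart R positions later.
    if tuple(X[r - 1]) != tuple(X[2 * r - 1]):
        return False
    # A few random checks between L and R against the shifted rows.
    rng = random.Random(seed)
    for _ in range(samples):
        i = rng.randrange(r)
        if tuple(X[i]) != tuple(X[i + r]):
            return False
    return True


block = [[1, 2], [1, 2], [2, 2], [3, 1], [2, 3]]
X = block * 4
print(quick_period_check(X, 5))
print(quick_period_check(X, 3))
```

A candidate that passes is only probably a period. For example, r = 1 passes on this data because the first two rows are equal, so a full comparison is still needed to confirm the final answer.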
