Re: [Numpy-discussion] how to store variable length string in array and get single char by it's position?

2010-09-14 Thread Keith Goodman
On Tue, Sep 14, 2010 at 9:25 AM, kee chen keekychen.sha...@gmail.com wrote:
 Dear All,

 Suppose I have a group of strings such as DNA sequences:

 1  ATGCATGCAATTGGCC
 2  ATGCATGCAATTGGCCATCD
 3  CATGCAATTGGC
 ..
 10 CATGCAAATTGGC

 The length of each string is not fixed and may change later. How can I
 store the sequences above (including their IDs) in a numpy array so that
 a single character is easy to read and update?

 For example, given
 1.  ATGCATGCAATTGGCC
 I want to get the first T with something like array[1][1], meaning
 A[T]G..., and to update the 3rd position with array[1][2] = 'T',
 changing AT[G]C... to AT[T]C...

How about using a Python list:

 >>> array = ['ATGC', 'CATGA', 'A']
 >>> array[0][1]
 'T'
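
One caveat for the update part of the question: Python strings are immutable, so array[0][2] = 'T' raises a TypeError on a list of strings. Below is a minimal sketch of one way around that, assuming a dict keyed by ID with each sequence held as a list of single characters. The names sequences, get_base, set_base and as_string are made up for illustration and are not from any library.

```python
# Hypothetical container: ID -> list of single-character strings.
# Lists are mutable and can grow or shrink, unlike str.
sequences = {
    1: list("ATGCATGCAATTGGCC"),
    2: list("ATGCATGCAATTGGCCATCD"),
    3: list("CATGCAATTGGC"),
}


def get_base(seqs, seq_id, pos):
    """Return the character at 0-based position pos of sequence seq_id."""
    return seqs[seq_id][pos]


def set_base(seqs, seq_id, pos, base):
    """Overwrite the character at 0-based position pos of sequence seq_id."""
    seqs[seq_id][pos] = base


def as_string(seqs, seq_id):
    """Join the stored characters back into an ordinary string."""
    return "".join(seqs[seq_id])


first_t = get_base(sequences, 1, 1)   # the first T of sequence 1
set_base(sequences, 1, 2, "T")        # AT[G]C... becomes AT[T]C...
updated = as_string(sequences, 1)
```

Positions are 0-based here, so the "3rd position" of the question is index 2. A bytearray per sequence would work the same way for plain ASCII data.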
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] how to store variable length string in array and get single char by it's position?

2010-09-14 Thread Bruce Southey

Variable length is incompatible with numpy:
'NumPy arrays have a fixed size at creation, unlike Python lists (which 
can grow dynamically).'

http://docs.scipy.org/doc/numpy/user/whatisnumpy.html

So you have to allocate space for the longest sequence, although you 
can try a SciPy sparse matrix in which nucleotides/amino acids are 
coded as numbers starting at 1, so that zeros mark the empty positions.
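
A minimal sketch of the "allocate for the longest sequence" idea, assuming plain ASCII sequences and a 2-D array of one-byte strings (dtype 'S1') in which unused trailing positions stay empty. The helper name row_to_str and the sample data are only for illustration.

```python
import numpy as np

seqs = ["ATGCATGCAATTGGCC", "ATGCATGCAATTGGCCATCD", "CATGCAATTGGC"]

# Width of the array is the length of the longest sequence.
width = max(len(s) for s in seqs)

# One byte per position; np.zeros leaves unused positions as empty bytes.
arr = np.zeros((len(seqs), width), dtype="S1")
for i, s in enumerate(seqs):
    arr[i, :len(s)] = np.frombuffer(s.encode("ascii"), dtype="S1")


def row_to_str(row):
    """Convert one padded row back to a str, dropping the padding."""
    return row.tobytes().rstrip(b"\x00").decode("ascii")


second_base = arr[0, 1]   # read a single position (0-based row and column)
arr[0, 2] = b"T"          # update a single position in place
updated = row_to_str(arr[0])
```

The row index is the position in seqs rather than the ID from the question, so a separate ID-to-row mapping would be needed. A sequence that later grows past width forces a new, wider array to be allocated.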


If lists (or dictionaries) don't work for you, you might want to explore 
bioinformatics packages like 'pygr' (http://code.google.com/p/pygr/) and 
Biopython (http://biopython.org/wiki/Main_Page), or try more general 
approaches such as HDF5 and PyTables.


Bruce