Re: Extracting lines from text files - script with a couple of 'side effects'

2013-09-26 Thread mstagliamonte
Hi,

Thanks for the answers! And Dave, thanks for explaining the cause of the 
problem; I will keep that in mind for the future. You're right, I am doing the 
search backwards; it just seemed easier for me to do it that way. Looks like 
I need to keep practising...

Both your suggestions work, I will try and learn from them.

Have a nice day
Max
-- 
https://mail.python.org/mailman/listinfo/python-list


Extracting lines from text files - script with a couple of 'side effects'

2013-09-25 Thread mstagliamonte
Dear All,

Here I am, with another newbie question. I am trying to extract some lines from 
a fasta (text) file which match the headers in another file, e.g.:
Fasta file:
>header1|info1:info2_info3
general text
>header2|info1:info2_info3
general text

headers file:
header1|info1:info2_info3
header2|info1:info2_info3

I want to create a third file, similar to the first one, but containing only 
the headers and text of what is listed in the second file. I also want to print 
out how many headers from the second file were actually found in the first.

I have written a script which seems to work, but with a couple of 'side effects'.
Here is my script:
---
import re
class Extractor():

    def __init__(self, headers_file, fasta_file, output_file):
        with open(headers_file,'r') as inp0:
            counter0=0
            container=''
            inp0_bis=inp0.read().split('\n')
            for x in inp0_bis:
                container+=x.replace(':','_').replace('|','_')
            with open(fasta_file,'r') as inp1:
                inp1_bis=inp1.read().split('>')
                for i in inp1_bis:
                    i_bis= i.split('\n')
                    match = re.search(i_bis[0].replace(':','_').replace('|','_'),container)
                    if match:
                        counter0+=1
                        with open(output_file,'at') as out0:
                            out0.write('>'+i)
            print '{} sequences were found'.format(counter0)

---
Side effects:
1) The very first header is written as >>header1 rather than >header1
2) The number of sequences found is 1 more than the ones actually found!

Have you got any thoughts about causes/solutions?

Thanks for your time!
P.S.: I think I have removed the double posting... not sure...
Max
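
A minimal sketch of what may be behind both side effects, assuming the fasta text is split on the '>' that starts each record. If the file begins with '>', the first chunk produced by the split is an empty string; an empty regular expression matches any text, so that chunk is counted once and written out as a lone '>'. The function name `extract_records`, the sample data and the assumption that the headers file lists headers without the leading '>' are all hypothetical, and the sketch is written for Python 3.

```python
import re

def extract_records(fasta_text, wanted_headers):
    """Return the FASTA records whose header line is in wanted_headers."""
    wanted = set(wanted_headers)      # exact lookup instead of re.search
    records = []
    for chunk in fasta_text.split('>'):
        if not chunk:
            continue                  # skip the empty chunk before the first '>'
        header = chunk.split('\n', 1)[0]
        if header in wanted:
            records.append('>' + chunk)
    return records

# Hypothetical sample data modelled on the post.
fasta_text = (
    ">header1|info1:info2_info3\ngeneral text\n"
    ">header2|info1:info2_info3\ngeneral text\n"
    ">header3|info1:info2_info3\ngeneral text\n"
)
wanted_headers = ["header1|info1:info2_info3", "header2|info1:info2_info3"]

# The text starts with '>', so the first chunk is an empty string ...
first_chunk = fasta_text.split('>')[0]
# ... and an empty pattern matches anything, which explains the extra count.
empty_matches = re.search(first_chunk, 'anything') is not None

records = extract_records(fasta_text, wanted_headers)
print('{} sequences were found'.format(len(records)))
```

Looking each fasta header up in a set of wanted headers, rather than searching for it inside one long concatenated string, also removes the need to replace ':' and '|' before matching.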


lstrip problem - beginner question

2013-06-04 Thread mstagliamonte
Hi everyone,

I am a beginner in python and trying to find my way through... :)

I am writing a script to get numbers from the headers of a text file.

If the header is something like:
h01 = ('scaffold_1')
I just use:
h01.lstrip('scaffold_')
and this returns '1'

But, if the header is:
h02: ('contig-100_0')
if I use:
h02.lstrip('contig-100_')
this returns: ''
...basically nothing. What surprises me is that if I do it this other way:
h02b = h02.lstrip('contig-100')
I get h02b = ('_1')
and subsequently:
h02b.lstrip('_')
returns '1', which is what I wanted!

Why is this happening? What am I missing?

Thanks for your help and attention
Max


Re: lstrip problem - beginner question

2013-06-04 Thread mstagliamonte
On Tuesday, June 4, 2013 11:21:53 AM UTC-4, mstagliamonte wrote:
> Hi everyone,
>
> I am a beginner in python and trying to find my way through... :)
>
> I am writing a script to get numbers from the headers of a text file.
>
> If the header is something like:
> h01 = ('scaffold_1')
> I just use:
> h01.lstrip('scaffold_')
> and this returns me '1'
>
> But, if the header is:
> h02: ('contig-100_0')
> if I use:
> h02.lstrip('contig-100_')
> this returns me with: ''
> ...basically nothing. What surprises me is that if I do in this other way:
> h02b = h02.lstrip('contig-100')
> I get h02b = ('_1')
> and subsequently:
> h02b.lstrip('_')
> returns me with: '1' which is what I wanted!
>
> Why is this happening? What am I missing?
>
> Thanks for your help and attention
> Max

edit:
h02 = ('contig-100_1')


Re: lstrip problem - beginner question

2013-06-04 Thread mstagliamonte
On Tuesday, June 4, 2013 11:41:43 AM UTC-4, Fábio Santos wrote:
> On 4 Jun 2013 16:34, mstagliamonte madm...@yahoo.it wrote:
> > On Tuesday, June 4, 2013 11:21:53 AM UTC-4, mstagliamonte wrote:
> > > Hi everyone,
> > >
> > > I am a beginner in python and trying to find my way through... :)
> > >
> > > I am writing a script to get numbers from the headers of a text file.
> > >
> > > If the header is something like:
> > > h01 = ('scaffold_1')
> > > I just use:
> > > h01.lstrip('scaffold_')
> > > and this returns me '1'
> > >
> > > But, if the header is:
> > > h02: ('contig-100_0')
> > > if I use:
> > > h02.lstrip('contig-100_')
> > > this returns me with: ''
> > > ...basically nothing. What surprises me is that if I do in this other way:
> > > h02b = h02.lstrip('contig-100')
> > > I get h02b = ('_1')
> > > and subsequently:
> > > h02b.lstrip('_')
> > > returns me with: '1' which is what I wanted!
> > >
> > > Why is this happening? What am I missing?
> > >
> > > Thanks for your help and attention
> > > Max
> >
> > edit: h02: ('contig-100_1')
>
> You don't have to use ('..') to declare a string. Just 'your string' will do.
>
> You can use str.split to split your string by a character.
>
> (Not tested)
>
> string_on_left, numbers = 'contig-100_01'.split('-')
> left_number, right_number = numbers.split('_')
> left_number, right_number = int(left_number), int(right_number)
>
> Of course, you will want to replace the variable names.
>
> If you have more advanced parsing needs, you will want to look at regular 
> expressions or blobs.
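
The untested split-based suggestion quoted above can be written out and checked as follows. This is only a sketch: the variable names `trailing` and `match`, and the regular-expression pattern, are illustrative assumptions rather than anything proposed in the thread.

```python
import re

header = 'contig-100_01'

# The split-based idea, step by step.
string_on_left, numbers = header.split('-')          # 'contig' and '100_01'
left_number, right_number = (int(n) for n in numbers.split('_'))

# If only the number after the last '_' is wanted, rpartition does not
# depend on the header containing a '-' at all.
trailing = int(header.rpartition('_')[2])

# A regular-expression variant for headers of the form name-NUMBER_NUMBER.
match = re.fullmatch(r'[A-Za-z]+-(\d+)_(\d+)', header)

print(string_on_left, left_number, right_number, trailing, match.groups())
```

Note that `header.split('-')` unpacks into two names only when the header contains exactly one '-'; the `rpartition` form also works for a header such as 'scaffold_1'.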

Thanks, I will try it straight away. Still, I don't understand why the original 
command returns nothing. Have you got any idea? 
I am trying to understand the 'nuts and bolts' of what I am doing, and 
this result does not make any sense to me.

Regards
Max


Re: lstrip problem - beginner question

2013-06-04 Thread mstagliamonte
On Tuesday, June 4, 2013 11:48:55 AM UTC-4, MRAB wrote:
> On 04/06/2013 16:21, mstagliamonte wrote:
> > Hi everyone,
> >
> > I am a beginner in python and trying to find my way through... :)
> >
> > I am writing a script to get numbers from the headers of a text file.
> >
> > If the header is something like:
> > h01 = ('scaffold_1')
> > I just use:
> > h01.lstrip('scaffold_')
> > and this returns me '1'
> >
> > But, if the header is:
> > h02: ('contig-100_0')
> > if I use:
> > h02.lstrip('contig-100_')
> > this returns me with: ''
> > ...basically nothing. What surprises me is that if I do in this other way:
> > h02b = h02.lstrip('contig-100')
> > I get h02b = ('_1')
> > and subsequently:
> > h02b.lstrip('_')
> > returns me with: '1' which is what I wanted!
> >
> > Why is this happening? What am I missing?
>
> The methods 'lstrip', 'rstrip' and 'strip' don't strip a string, they
> strip characters.
>
> You should think of the argument as a set of characters to be removed.
>
> This code:
>
> h01.lstrip('scaffold_')
>
> will return the result of stripping the characters '_', 'a', 'c',
> 'd', 'f', 'l', 'o' and 's' from the left-hand end of h01.
>
> A simpler example:
>
> >>> 'xyyxyabc'.lstrip('xy')
> 'abc'
>
> It strips the characters 'x' and 'y' from the string, not the string
> 'xy' as such.
>
> They are that way because they have been in Python for a long time,
> long before sets and such like were added to the language.

Hey,

Great! Now I understand!
So, basically, it is also stripping the numbers after the '_' !!
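
The behaviour can be checked directly. The sketch below assumes the corrected header 'contig-100_1'; the helper `strip_prefix` is a hypothetical name introduced here to show one way of removing a leading substring as a whole instead of as a set of characters.

```python
def strip_prefix(text, prefix):
    """Remove prefix once, as a whole string, if text starts with it."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text

h01 = 'scaffold_1'
h02 = 'contig-100_1'

# '1' is not among the characters of 'scaffold_', so stripping stops there.
set_based_h01 = h01.lstrip('scaffold_')

# '1' is among the characters of 'contig-100_', so the whole string is consumed.
set_based_h02 = h02.lstrip('contig-100_')

# '_' is not among the characters of 'contig-100', so stripping stops at '_1'.
two_step = h02.lstrip('contig-100').lstrip('_')

print(repr(set_based_h01), repr(set_based_h02), repr(two_step))
print(repr(strip_prefix(h02, 'contig-100_')))
```

On Python 3.9 and later, the built-in `str.removeprefix` does the same job as the helper.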

Thank you, I know a bit more now!

Have a nice day everyone :)
Max


Re: lstrip problem - beginner question

2013-06-04 Thread mstagliamonte
Thanks to everyone! I didn't expect so many replies in such a short time!

Regards,
Max