Re: Extracting lines from text files - script with a couple of 'side effects'
Hi,

Thanks for the answers! And Dave, thanks for explaining the cause of the problem; I will keep that in mind for the future. You're right, I am doing the search backward; it just seemed easier for me to do it that way. Looks like I need to keep practising...

Both your suggestions work, and I will try to learn from them.

Have a nice day,
Max
--
https://mail.python.org/mailman/listinfo/python-list
Extracting lines from text files - script with a couple of 'side effects'
Dear All,

Here I am, with another newbie question. I am trying to extract some lines from a fasta (text) file which match the headers in another file, i.e.:

Fasta file:
>header1|info1:info2_info3
general text
>header2|info1:info2_info3
general text

Headers file:
header1|info1:info2_info3
header2|info1:info2_info3

I want to create a third file, similar to the first one, but containing only the headers and text of what is listed in the second file. Also, I want to print out how many headers from the second file were actually found in the first.

I have written a script which seems to work, but with a couple of 'side effects'. Here is my script:

---
import re

class Extractor():
    def __init__(self,headers_file, fasta_file,output_file):
        with open(headers_file,'r') as inp0:
            counter0=0
            container=''
            inp0_bis=inp0.read().split('\n')
            for x in inp0_bis:
                container+=x.replace(':','_').replace('|','_')
            with open(fasta_file,'r') as inp1:
                inp1_bis=inp1.read().split('>')
                for i in inp1_bis:
                    i_bis= i.split('\n')
                    match = re.search(i_bis[0].replace(':','_').replace('|','_'),container)
                    if match:
                        counter0+=1
                        with open(output_file,'at') as out0:
                            out0.write('>'+i)
        print '{} sequences were found'.format(counter0)
---

Side effects:
1) The very first header is written as >>header1 rather than >header1
2) The number of sequences found is 1 more than the ones actually found!

Have you got any thoughts about causes/solutions? Thanks for your time!

P.S.: I think I have removed the double posting... not sure...

Max
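A possible explanation for both side effects, offered as a sketch rather than as the answer given on the list: assuming the fasta text is split on the '>' header marker (the separator appears to have been lost in the archived script), the first chunk is an empty string, because the file starts with '>'. An empty pattern passed to re.search matches any text, so that empty chunk is counted once and written out as a lone '>' in front of the first real header. The version below skips empty chunks and looks each fasta header up in a set of wanted headers instead of running a regex search over one concatenated string. The names parse_fasta and extract are hypothetical, the sample data mirrors the example in the post, and the code is written for Python 3 although the original script uses the Python 2 print statement.

```python
import re

# An empty pattern matches anywhere, which is why the empty first chunk
# produced by splitting on '>' gets counted as a "found" sequence.
assert re.search('', 'any container text') is not None


def parse_fasta(text):
    """Split fasta text into (header, body) pairs.

    Hypothetical helper: the empty chunk before the first '>' is skipped.
    """
    records = []
    for chunk in text.split('>'):
        if not chunk.strip():
            continue  # nothing before the first '>' (or a blank chunk)
        header, _, body = chunk.partition('\n')
        records.append((header.strip(), body))
    return records


def extract(fasta_text, wanted_headers):
    """Return (output_text, number_found) for headers listed in wanted_headers."""
    wanted = {h.strip() for h in wanted_headers if h.strip()}
    kept = [(h, b) for h, b in parse_fasta(fasta_text) if h in wanted]
    output = ''.join('>{}\n{}'.format(h, b) for h, b in kept)
    return output, len(kept)


# Sample data modelled on the post; header3 is an extra record added
# here only to show that unlisted headers are left out.
fasta = (
    '>header1|info1:info2_info3\ngeneral text\n'
    '>header2|info1:info2_info3\ngeneral text\n'
    '>header3|info1:info2_info3\nother text\n'
)
headers = 'header1|info1:info2_info3\nheader2|info1:info2_info3\n'.split('\n')

result, found = extract(fasta, headers)
print('{} sequences were found'.format(found))
```

Comparing whole headers with a set also avoids accidental substring matches (for example 'header1' matching inside 'header10') and removes the need to replace ':' and '|' before searching.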
Re: lstrip problem - beginner question
On Tuesday, June 4, 2013 11:21:53 AM UTC-4, mstagliamonte wrote:
> Hi everyone,
>
> I am a beginner in python and trying to find my way through... :)
>
> I am writing a script to get numbers from the headers of a text file.
>
> If the header is something like:
> h01 = ('scaffold_1')
> I just use:
> h01.lstrip('scaffold_')
> and this returns me with '1'
>
> But, if the header is:
> h02: ('contig-100_1')
> if I use:
> h02.lstrip('contig-100_')
> this returns me with: ''
> ...basically nothing. What surprises me is that if I do in this other way:
> h02b = h02.lstrip('contig-100')
> I get h02b = ('_1')
> and subsequently:
> h02b.lstrip('_')
> returns me with: '1' which is what I wanted!
>
> Why is this happening? What am I missing?
>
> Thanks for your help and attention
> Max
--
http://mail.python.org/mailman/listinfo/python-list
lstrip problem - beginner question
Hi everyone,

I am a beginner in python and trying to find my way through... :)

I am writing a script to get numbers from the headers of a text file.

If the header is something like:
h01 = ('scaffold_1')
I just use:
h01.lstrip('scaffold_')
and this returns me '1'

But, if the header is:
h02: ('contig-100_0')
if I use:
h02.lstrip('contig-100_')
this returns me with: ''
...basically nothing. What surprises me is that if I do in this other way:
h02b = h02.lstrip('contig-100')
I get h02b = ('_1')
and subsequently:
h02b.lstrip('_')
returns me with: '1' which is what I wanted!

Why is this happening? What am I missing?

Thanks for your help and attention
Max
Re: lstrip problem - beginner question
On Tuesday, June 4, 2013 11:21:53 AM UTC-4, mstagliamonte wrote:
> Hi everyone,
>
> I am a beginner in python and trying to find my way through... :)
>
> I am writing a script to get numbers from the headers of a text file.
>
> If the header is something like:
> h01 = ('scaffold_1')
> I just use:
> h01.lstrip('scaffold_')
> and this returns me '1'
>
> But, if the header is:
> h02: ('contig-100_0')
> if I use:
> h02.lstrip('contig-100_')
> this returns me with: ''
> ...basically nothing. What surprises me is that if I do in this other way:
> h02b = h02.lstrip('contig-100')
> I get h02b = ('_1')
> and subsequently:
> h02b.lstrip('_')
> returns me with: '1' which is what I wanted!
>
> Why is this happening? What am I missing?
>
> Thanks for your help and attention
> Max

edit: h02: ('contig-100_1')
Re: lstrip problem - beginner question
On Tuesday, June 4, 2013 11:21:53 AM UTC-4, mstagliamonte wrote:
> Hi everyone,
>
> I am a beginner in python and trying to find my way through... :)
>
> I am writing a script to get numbers from the headers of a text file.
>
> If the header is something like:
> h01 = ('scaffold_1')
> I just use:
> h01.lstrip('scaffold_')
> and this returns me '1'
>
> But, if the header is:
> h02: ('contig-100_0')
> if I use:
> h02.lstrip('contig-100_')
> this returns me with: ''
> ...basically nothing. What surprises me is that if I do in this other way:
> h02b = h02.lstrip('contig-100')
> I get h02b = ('_1')
> and subsequently:
> h02b.lstrip('_')
> returns me with: '1' which is what I wanted!
>
> Why is this happening? What am I missing?
>
> Thanks for your help and attention
> Max

edit: h02= ('contig-100_1')
Re: lstrip problem - beginner question
On Tuesday, June 4, 2013 11:41:43 AM UTC-4, Fábio Santos wrote:
> On 4 Jun 2013 16:34, mstagliamonte madm...@yahoo.it wrote:
>> On Tuesday, June 4, 2013 11:21:53 AM UTC-4, mstagliamonte wrote:
>>> Hi everyone,
>>>
>>> I am a beginner in python and trying to find my way through... :)
>>>
>>> I am writing a script to get numbers from the headers of a text file.
>>>
>>> If the header is something like:
>>> h01 = ('scaffold_1')
>>> I just use:
>>> h01.lstrip('scaffold_')
>>> and this returns me '1'
>>>
>>> But, if the header is:
>>> h02: ('contig-100_0')
>>> if I use:
>>> h02.lstrip('contig-100_')
>>> this returns me with: ''
>>> ...basically nothing. What surprises me is that if I do in this other way:
>>> h02b = h02.lstrip('contig-100')
>>> I get h02b = ('_1')
>>> and subsequently:
>>> h02b.lstrip('_')
>>> returns me with: '1' which is what I wanted!
>>>
>>> Why is this happening? What am I missing?
>>>
>>> Thanks for your help and attention
>>> Max
>>
>> edit: h02: ('contig-100_1')
>
> You don't have to use ('..') to declare a string. Just 'your string' will do.
>
> You can use str.split to split your string by a character.
>
> (Not tested)
>
> string_on_left, numbers = 'contig-100_01'.split('-')
> left_number, right_number = numbers.split('_')
> left_number, right_number = int(left_number), int(right_number)
>
> Of course, you will want to replace the variable names.
>
> If you have more advanced parsing needs, you will want to look at regular expressions or blobs.

Thanks, I will try it straight away. Still, I don't understand why the original command is returning me with nothing !? Have you got any idea? I am trying to understand a bit the 'nuts and bolts' of what I am doing and this result does not make any sense to me

Regards
Max
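The untested str.split suggestion quoted above can be checked with a small sketch. The function names parse_contig_header and trailing_number are hypothetical, and the regex variant is an illustration of the "regular expressions" hint rather than code from the thread. Note that the split version assumes exactly one '-' and one '_' in the header, so it would raise ValueError on a header such as 'scaffold_1'.

```python
import re


def parse_contig_header(header):
    """Split e.g. 'contig-100_01' into ('contig', 100, 1) with str.split."""
    string_on_left, numbers = header.split('-')
    left_number, right_number = numbers.split('_')
    return string_on_left, int(left_number), int(right_number)


def trailing_number(header):
    """Return the run of digits at the very end of a header as an int."""
    match = re.search(r'(\d+)$', header)
    if match is None:
        raise ValueError('no trailing number in {!r}'.format(header))
    return int(match.group(1))


print(parse_contig_header('contig-100_01'))  # ('contig', 100, 1)
print(trailing_number('scaffold_1'))         # 1
print(trailing_number('contig-100_1'))       # 1
```

The regex variant handles both header styles from the post because it only looks at the digits at the end of the string.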
Re: lstrip problem - beginner question
On Tuesday, June 4, 2013 11:48:55 AM UTC-4, MRAB wrote:
> On 04/06/2013 16:21, mstagliamonte wrote:
>> Hi everyone,
>>
>> I am a beginner in python and trying to find my way through... :)
>>
>> I am writing a script to get numbers from the headers of a text file.
>>
>> If the header is something like:
>> h01 = ('scaffold_1')
>> I just use:
>> h01.lstrip('scaffold_')
>> and this returns me '1'
>>
>> But, if the header is:
>> h02: ('contig-100_0')
>> if I use:
>> h02.lstrip('contig-100_')
>> this returns me with: ''
>> ...basically nothing. What surprises me is that if I do in this other way:
>> h02b = h02.lstrip('contig-100')
>> I get h02b = ('_1')
>> and subsequently:
>> h02b.lstrip('_')
>> returns me with: '1' which is what I wanted!
>>
>> Why is this happening? What am I missing?
>
> The methods 'lstrip', 'rstrip' and 'strip' don't strip a string, they strip characters.
>
> You should think of the argument as a set of characters to be removed.
>
> This code:
>
> h01.lstrip('scaffold_')
>
> will return the result of stripping the characters '_', 'a', 'c', 'd', 'f', 'l', 'o' and 's' from the left-hand end of h01.
>
> A simpler example:
>
> >>> 'xyyxyabc'.lstrip('xy')
> 'abc'
>
> It strips the characters 'x' and 'y' from the string, not the string 'xy' as such.
>
> They are that way because they have been in Python for a long time, long before sets and such like were added to the language.

Hey,

Great! Now I understand! So, basically, it is also stripping the numbers after the '_' !!

Thank you, I know a bit more now!

Have a nice day everyone :)
Max
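The character-set behaviour described above can be reproduced with the headers from the original question (using the corrected value 'contig-100_1'). The helper strip_prefix is a hypothetical name introduced here to show one way of removing a literal prefix instead of a set of characters; on Python 3.9 and later the built-in str.removeprefix does the same job.

```python
h01 = 'scaffold_1'
h02 = 'contig-100_1'

# lstrip removes leading characters for as long as each one is in the
# given set; '1' is not among the characters of 'scaffold_', so it stays.
print(repr(h01.lstrip('scaffold_')))    # '1'

# Here '1' IS in the set (it occurs in 'contig-100_'), so every
# character of h02 is stripped and nothing is left.
print(repr(h02.lstrip('contig-100_')))  # ''

# Without '_' in the set, stripping stops at the underscore.
print(repr(h02.lstrip('contig-100')))   # '_1'


def strip_prefix(text, prefix):
    """Remove prefix from the start of text only if it is really there."""
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(repr(strip_prefix(h02, 'contig-100_')))  # '1'
print(repr(strip_prefix(h01, 'contig-100_')))  # 'scaffold_1' (unchanged)
```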
Re: lstrip problem - beginner question
Thanks to everyone! I didn't expect so many replies in such a short time!

Regards,
Max