Re: [Tutor] R: Tutor Digest, Vol 125, Issue 49

Wolfgang Maier Wed, 16 Jul 2014 13:57:26 -0700

On 16.07.2014 10:04, [email protected] wrote:

Hi there!!!
I have a file with this data:
['uc002uvo.3 ', 'uc001mae.1']
['uc010dya.2 ', 'uc001kko.2']
['uc003ejx.2 ', 'uc010yfr.1']
['uc001bhk.2 ', 'uc003eib.2']
['uc001znc.2 ', 'uc001efn.2']
['uc002ycq.2 ', 'uc001vnh.2']
['uc001odf.1 ', 'uc002mwd.2']
['uc010jkn.1 ', 'uc010luk.1']
['uc003uhf.3 ', 'uc010tqd.1']
['uc002rue.3 ', 'uc001tex.2']
['uc011dtt.1 ', 'uc001lkv.1']
['uc003yyt.2 ', 'uc003mkl.2']
['uc003pkv.2 ', 'uc003ytw.2']
['uc010bhz.2 ', 'uc002kbt.1']
['uc001wnj.2 ', 'uc009wtj.1']
['uc011lyh.1 ', 'uc003jvb.2']
['uc002awj.1 ', 'uc009znm.1']
['uc010bft.2 ', 'uc002cxz.1']
['uc011mar.1 ', 'uc001lvb.1']
['uc001oxl.2 ', 'uc002lvx.1']


I want to remove everything after the dots, so I want to have a file with
this output:

['uc002uvo ', 'uc001mae']
['uc010dya ', 'uc001kko']
...

I tried to use a regular expression, but I get strange output:

with open("non_annotati.csv") as p:
     for i in p:
         lines= i.rstrip("\n").split("\t")


lines is not the best variable name. Why not use:
           gene1, gene2 = i.rstrip("\n").split("\t")

         mit = re.sub(r'(\.\d$)','',lines[0])
         mit2 = re.sub(r'(\.\d$)','',lines[1])
         print mit,mit2

While Danny has pointed out the actual reason why your code is not working with this specific input data, it's generally not a good idea to make too specific assumptions about input formatting by specifying '\n' and '\t' explicitly when all you want to do is to eliminate whitespace:


>>> help(s.split)
Help on built-in function split:

split(...) method of builtins.str instance
    S.split(sep=None, maxsplit=-1) -> list of strings

    Return a list of the words in S, using sep as the
    delimiter string.  If maxsplit is given, at most maxsplit
    splits are done. If sep is not specified or is None, any
    whitespace string is a separator and empty strings are
    removed from the result.

>>> s='uc002uvo.3 \tuc001mae.1\r\n'  # Windows line breaks
>>> s.split()
['uc002uvo.3', 'uc001mae.1']

I also agree with Joel that re is overkill here. In fact, your current regexp will fail with two-digit numbers after the dot, though I don't know whether such names can occur in your data.
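
To illustrate both points, here is a minimal sketch in Python 3. The helper name strip_version and the two-digit sample value 'uc001mae.12' are made up for the example and do not come from your data; the sketch also assumes that the part you want to drop always follows the last dot in a name.

```python
import re

def strip_version(name):
    # Hypothetical helper: keep everything before the last dot.
    # Names without a dot are returned unchanged.
    return name.rsplit(".", 1)[0]

# Sample line with a trailing space, a tab and a Windows line break.
line = "uc002uvo.3 \tuc001mae.12\r\n"

# split() without arguments discards all surrounding whitespace.
gene1, gene2 = (strip_version(field) for field in line.split())
print(gene1, gene2)

# For comparison: the original pattern accepts exactly one digit
# before the end of the string, so a two-digit suffix is left alone.
one_digit = re.sub(r'(\.\d$)', '', 'uc001mae.12')
any_digits = re.sub(r'\.\d+$', '', 'uc001mae.12')
print(one_digit, any_digits)
```

If you do want to stay with re, using \d+ instead of \d covers suffixes of any length, but the fields still have to be stripped of whitespace first so that $ can match.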


Best,
Wolfgang

_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
