Fw: Re: Re: joining files
Note: Forwarded message attached

-- Original Message --
From: mannu jha mannu_0...@rediffmail.com
To: tuomas.vesteri...@iki.fi
Subject: Re: Re: joining files

---BeginMessage---

    import os

    def merge_sources(sources):
        # sources is a list of tuples (source_name, source_data)
        data = []
        keysets = []
        for nme, sce in sources:
            lines = {}
            for line in sce.split(os.linesep):
                lst = line.split()
                lines[lst[0]] = (nme, lst)
            keysets.append(set(lines.keys()))
            data.append(lines)
        common_keys = keysets[0]
        for keys in keysets[1:]:
            common_keys = common_keys.intersection(keys)
        result = {}
        for key in common_keys:
            result[key] = dict(d[key] for d in data if key in d)
        return result

    if __name__ == "__main__":
        # Your test files here are replaced by local strings
        print merge_sources([("file1", file1), ("file2", file2), ("file3", file3)])
        print merge_sources([("input1", input1), ("input2", input2)])

    Test_results = '''
    {'22': {'file3': ['22', 'C'],
            'file2': ['22', '0'],
            'file1': ['22', '110.1', '33', '331.5', '22.7', '5', '271.9', '17.2', '33.4']}}
    {'194': {'input2': ['194', 'C'],
             'input1': ['194', '8.00', '121.23', '54.79', '4.12', '180.06']},
     '175': {'input2': ['175', 'H', '176', 'H', '180', 'H'],
             'input1': ['175', '8.42', '120.50', '55.31', '4.04', '180.33']},
     '15': {'input2': ['15', 'H', '37', 'H', '95', 'T'],
            'input1': ['15', '8.45', '119.04', '55.02', '4.08', '178.89']},
     '187': {'input2': ['187', 'H', '190', 'T'],
             'input1': ['187', '7.79', '122.27', '54.37', '4.26', '179.75']}}
    '''

Dear Sir,

I tried the above program, but it is showing an error:

    nmru...@caf:~ python join1.py
    Traceback (most recent call last):
      File "join1.py", line 24, in <module>
        print merge_sources([("file1", file1), ("file2", file2), ("file3",file3)])
    NameError: name 'file1' is not defined
    nmru...@caf:~

Add test data to the code as:

    file1 = '''22 110.1 33 331.5 22.7 5 271.9 17.2 33.4 4 55.1'''

Thank you very much sir, it is working. Thank you once again for your kind help.

There is only one problem I am now facing: when I tried to replace the test data with filenames, i.e.

    file1 = open("input11.txt")
    file2 = open("output22.txt")
    print merge_sources([("file1", file1), ("file2", file2)])

it shows this error:

    ph08...@linux-af0n:~ python new.py
    Traceback (most recent call last):
      File "new.py", line 25, in <module>
        print merge_sources([("file1", file1), ("file2", file2)])
      File "new.py", line 9, in merge_sources
        for line in sce.split(os.linesep):
    AttributeError: 'file' object has no attribute 'split'
    ph08...@linux-af0n:~

where my input11.txt is:

    '''187 7.79 122.27 54.37 4.26 179.75
    194 8.00 121.23 54.79 4.12 180.06
    15 8.45 119.04 55.02 4.08 178.89
    176 7.78 118.68 54.57 4.20 181.06
    180 7.50 119.21 53.93 179.80
    190 7.58 120.44 54.62 4.25 180.02
    152 8.39 120.63 55.10 4.15 179.10
    154 7.79 119.62 54.47 4.22 180.46
    175 8.42 120.50 55.31 4.04 180.33'''

and output22.txt is:

    '''15 H
    37 H
    95 T
    124 H
    130 H
    152 H
    154 H
    158 H
    164 H
    175 H
    176 H
    180 H
    187 H
    190 T
    194 C'''

Since my files are very big, I want to give the filenames as input.

---End Message---
-- http://mail.python.org/mailman/listinfo/python-list
Re: Re: joining files
On Mon, 17 May 2010 23:57:18 +0530 wrote

Try:

    file = open("input11.txt")
    file1 = file.read()    # file1 is a string
    file.close()

or

    file1 = open("input11.txt")    # file1 is an open file object

and replace lines:

    for line in sce.split(os.linesep):
        lst = line.split()
        lines[lst[0]] = (nme, lst)

with lines:

    for line in sce:
        lst = line.split()
        lines[lst[0]] = (nme, lst)
    sce.close()

Thank you very much sir, it has worked. Thank you once again.
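The second variant above (iterating over the open file) is the one that suits big files, since it never holds the whole file in memory. A minimal sketch of that idea for Python 3 follows (the thread's own code is Python 2). The helper name `read_keyed_lines` is hypothetical, and skipping blank lines is an added assumption, not something the thread specifies.

```python
import os
import tempfile


def read_keyed_lines(path):
    """Map the first whitespace-separated field of each line to the line's fields."""
    lines = {}
    with open(path) as handle:      # the file is closed when the block ends
        for line in handle:         # lazy iteration, one line at a time
            fields = line.split()
            if not fields:          # assumption: blank lines are skipped
                continue
            lines[fields[0]] = fields
    return lines


if __name__ == "__main__":
    # A small temporary file stands in for the thread's input11.txt.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("187 7.79 122.27\n\n194 8.00 121.23\n")
    try:
        print(read_keyed_lines(tmp.name))
    finally:
        os.remove(tmp.name)
```

Passing a filename rather than an open file object also means the caller cannot forget to close it.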
Re: joining files
mannu jha wrote:
On Sun, 16 May 2010 13:52:31 +0530 wrote
mannu jha wrote:

Hi,

I have few files like this:

file1:

    22 110.1
    33 331.5 22.7
    5 271.9 17.2 33.4
    4 55.1

file1 has total 4 column but some of them are missing in few row.

file2:

    5 H
    22 0

file3:

    4 T
    5 B
    22 C
    121 S

in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files.

output required:

    5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C

I am trying with this :

    from collections import defaultdict

    def merge(sources):
        blanks = [blank for items, blank, keyfunc in sources]
        d = defaultdict(lambda: blanks[:])
        for index, (items, blank, keyfunc) in enumerate(sources):
            for item in items:
                d[keyfunc(item)][index] = item
        for key in sorted(d):
            yield d[key]

    if __name__ == "__main__":
        a = open("input1.txt")
        c = open("input2.txt")

        def key(line):
            return line[:2]

        def source(stream, blank="", key=key):
            return (line.strip() for line in stream), blank, key

        for m in merge([source(x) for x in [a,c]]):
            print "|".join(c.ljust(10) for c in m)

but with input1.txt:

    187 7.79 122.27 54.37 4.26 179.75
    194 8.00 121.23 54.79 4.12 180.06
    15 8.45 119.04 55.02 4.08 178.89
    176 7.78 118.68 54.57 4.20 181.06
    180 7.50 119.21 53.93 179.80
    190 7.58 120.44 54.62 4.25 180.02
    152 8.39 120.63 55.10 4.15 179.10
    154 7.79 119.62 54.47 4.22 180.46
    175 8.42 120.50 55.31 4.04 180.33

and input2.txt:

    15 H
    37 H
    95 T
    124 H
    130 H
    152 H
    154 H
    158 H
    164 H
    175 H
    176 H
    180 H
    187 H
    190 T
    194 C
    196 H
    207 H
    210 H
    232 H

it is giving output as:

    |
    |124 H
    |130 H
    154 7.79 119.62 54.47 4.22 180.46|158 H
    |164 H
    175 8.42 120.50 55.31 4.04 180.33|176 H
    180 7.50 119.21 53.93 179.80|187 H
    190 7.58 120.44 54.62 4.25 180.02|196 H
    |207 H
    |210 H
    |232 H
    |37 H
    |95 T

so it not matching it properly, can anyone please suggest where I am doing mistake.

Several mistakes here, some in making it unnecessarily complex, but I'll concentrate on the ones that just don't work.

Your key() function returns the first two characters of the line. So you're keying not on the whole number, but only on the first two digits of it.

To find out what's going on, you need to decompose the complex line from:

    d[keyfunc(item)][index] = item

to some things you can actually examine:

    key = keyfunc(item)
    d[key][index] = item

You don't make any check to see if a particular item is in all the files. For your particular data structure, this would mean that a particular value in the dictionary (which is a list of two items) has all non-blank strings in it. To do this, you might want to do an all() function on the list.

DaveA
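The two points in this reply (key on the whole first column, then keep only rows whose slots are all filled) can be sketched as follows, in Python 3. This is one possible reading of the advice, not the code from the thread: `first_field` and `rows_in_every_source` are hypothetical names, and treating the keys as integers for sorting is an assumption based on the sample data.

```python
from collections import defaultdict


def first_field(line):
    """Key on the whole first column, not on a fixed number of characters."""
    return line.split()[0]


def rows_in_every_source(sources, blank=""):
    """Yield (key, rows) only for keys that appear in every source."""
    table = defaultdict(lambda: [blank] * len(sources))
    for index, lines in enumerate(sources):
        for line in lines:
            line = line.strip()
            if line:                            # assumption: blank lines are ignored
                table[first_field(line)][index] = line
    for key in sorted(table, key=int):          # assumption: keys are integers
        rows = table[key]
        if all(row != blank for row in rows):   # every source filled its slot
            yield key, rows


if __name__ == "__main__":
    input1 = ["187 7.79 122.27", "15 8.45 119.04", "154 7.79 119.62"]
    input2 = ["15 H", "37 H", "154 H", "158 H", "187 H"]
    for key, rows in rows_in_every_source([input1, input2]):
        print(key, rows)
```

With `line[:2]` as the key, "154 ..." and "158 H" both land under "15", which is why the original output paired unrelated rows.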
Re: Re: joining files
On Sun, 16 May 2010 23:51:10 +0530 wrote On 05/16/2010 05:04 PM, Dave Angel wrote: (You forgot to include the python-list in your response. So it only went to me. Normally, you just do reply-all to the message) mannu jha wrote: On Sun, 16 May 2010 13:52:31 +0530 wrote mannu jha wrote: Hi, I have few files like this: file1: 22 110.1 33 331.5 22.7 5 271.9 17.2 33.4 4 55.1 file1 has total 4 column but some of them are missing in few row. file2: 5 H 22 0 file3: 4 T 5 B 22 C 121 S in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files. output required: 5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C I am trying with this : from collections import defaultdict def merge(sources): blanks = [blank for items, blank, keyfunc in sources] d = defaultdict(lambda: blanks[:]) for index, (items, blank, keyfunc) in enumerate(sources): for item in items: d[keyfunc(item)][index] = item for key in sorted(d): yield d[key] if __name__ == __main__: a = open(input1.txt) c = open(input2.txt) def key(line): return line[:2] def source(stream, blank=, key=key): return (line.strip() for line in stream), blank, key for m in merge([source(x) for x in [a,c]]): print |.join(c.ljust(10) for c in m) but with input1.txt: 187 7.79 122.27 54.37 4.26 179.75 194 8.00 121.23 54.79 4.12 180.06 15 8.45 119.04 55.02 4.08 178.89 176 7.78 118.68 54.57 4.20 181.06 180 7.50 119.21 53.93 179.80 190 7.58 120.44 54.62 4.25 180.02 152 8.39 120.63 55.10 4.15 179.10 154 7.79 119.62 54.47 4.22 180.46 175 8.42 120.50 55.31 4.04 180.33 and input2.txt: 15 H 37 H 95 T 124 H 130 H 152 H 154 H 158 H 164 H 175 H 176 H 180 H 187 H 190 T 194 C 196 H 207 H 210 H 232 H it is giving output as: | |124 H |130 H 154 7.79 119.62 54.47 4.22 180.46|158 H |164 H 175 8.42 120.50 55.31 4.04 180.33|176 H 180 7.50 119.21 53.93 179.80|187 H 190 7.58 120.44 54.62 4.25 180.02|196 H |207 H |210 H |232 H |37 H |95 T so it not matching it 
properly, can anyone please suggest where I am doing mistake. import os def merge_sources(sources): # sources is a list of tuples (source_name, source_data) data = [] keysets = [] for nme, sce in sources: lines = {} for line in sce.split(os.linesep): lst = line.split() lines[lst[0]] = (nme, lst) keysets.append(set(lines.keys())) data.append(lines) common_keys = keysets[0] for keys in keysets[1:]: common_keys = common_keys.intersection(keys) result = {} for key in common_keys: result[key] = dict(d[key] for d in data if key in d) return result if __name__ == __main__: # Your test files here are replaced by local strings print merge_sources([(file1, file1), (file2, file2), (file3, file3)]) print merge_sources([(input1, input1), (input2, input2)]) Test_results = ''' {'22': {'file3': ['22', 'C'], 'file2': ['22', '0'], 'file1': ['22', '110.1', '33', '331.5', '22.7', '5', '271.9', '17.2', '33.4']}} {'194': {'input2': ['194', 'C'], 'input1': ['194', '8.00', '121.23', '54.79', '4.12', '180.06']}, '175': {'input2': ['175', 'H', '176', 'H', '180', 'H'], 'input1': ['175', '8.42', '120.50', '55.31', '4.04', '180.33']}, '15': {'input2': ['15', 'H', '37', 'H', '95', 'T'], 'input1': ['15', '8.45', '119.04', '55.02', '4.08', '178.89']}, '187': {'input2': ['187', 'H', '190', 'T'], 'input1': ['187', '7.79', '122.27', '54.37', '4.26', '179.75']}} Dear Sir, I tried above program but with that it is showing error: nmru...@caf:~ python join1.py Traceback (most recent call last): File join1.py, line 24, in print merge_sources([(file1, file1), (file2, file2), (file3, NameError: name 'file1' is not defined nmru...@caf:~ -- http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list
Fw: Re: Re: Re: joining files
Note: Forwarded message attached -- Original Message -- From: mannu jhamannu_0...@rediffmail.com To: mannu_0...@rediffmail.com Subject: Re: Re: Re: joining files---BeginMessage--- On Sun, 16 May 2010 23:51:10 +0530 wrote On 05/16/2010 05:04 PM, Dave Angel wrote: (You forgot to include the python-list in your response. So it only went to me. Normally, you just do reply-all to the message) mannu jha wrote: On Sun, 16 May 2010 13:52:31 +0530 wrote mannu jha wrote: Hi, I have few files like this: file1: 22 110.1 33 331.5 22.7 5 271.9 17.2 33.4 4 55.1 file1 has total 4 column but some of them are missing in few row. file2: 5 H 22 0 file3: 4 T 5 B 22 C 121 S in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files. output required: 5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C I am trying with this : from collections import defaultdict def merge(sources): blanks = [blank for items, blank, keyfunc in sources] d = defaultdict(lambda: blanks[:]) for index, (items, blank, keyfunc) in enumerate(sources): for item in items: d[keyfunc(item)][index] = item for key in sorted(d): yield d[key] if __name__ == __main__: a = open(input1.txt) c = open(input2.txt) def key(line): return line[:2] def source(stream, blank=, key=key): return (line.strip() for line in stream), blank, key for m in merge([source(x) for x in [a,c]]): print |.join(c.ljust(10) for c in m) but with input1.txt: 187 7.79 122.27 54.37 4.26 179.75 194 8.00 121.23 54.79 4.12 180.06 15 8.45 119.04 55.02 4.08 178.89 176 7.78 118.68 54.57 4.20 181.06 180 7.50 119.21 53.93 179.80 190 7.58 120.44 54.62 4.25 180.02 152 8.39 120.63 55.10 4.15 179.10 154 7.79 119.62 54.47 4.22 180.46 175 8.42 120.50 55.31 4.04 180.33 and input2.txt: 15 H 37 H 95 T 124 H 130 H 152 H 154 H 158 H 164 H 175 H 176 H 180 H 187 H 190 T 194 C 196 H 207 H 210 H 232 H it is giving output as: | |124 H |130 H 154 7.79 119.62 54.47 4.22 180.46|158 H 
|164 H 175 8.42 120.50 55.31 4.04 180.33|176 H 180 7.50 119.21 53.93 179.80|187 H 190 7.58 120.44 54.62 4.25 180.02|196 H |207 H |210 H |232 H |37 H |95 T so it not matching it properly, can anyone please suggest where I am doing mistake. import os def merge_sources(sources): # sources is a list of tuples (source_name, source_data) data = [] keysets = [] for nme, sce in sources: lines = {} for line in sce.split(os.linesep): lst = line.split() lines[lst[0]] = (nme, lst) keysets.append(set(lines.keys())) data.append(lines) common_keys = keysets[0] for keys in keysets[1:]: common_keys = common_keys.intersection(keys) result = {} for key in common_keys: result[key] = dict(d[key] for d in data if key in d) return result if __name__ == __main__: # Your test files here are replaced by local strings print merge_sources([(file1, file1), (file2, file2), (file3, file3)]) print merge_sources([(input1, input1), (input2, input2)]) Test_results = ''' {'22': {'file3': ['22', 'C'], 'file2': ['22', '0'], 'file1': ['22', '110.1', '33', '331.5', '22.7', '5', '271.9', '17.2', '33.4']}} {'194': {'input2': ['194', 'C'], 'input1': ['194', '8.00', '121.23', '54.79', '4.12', '180.06']}, '175': {'input2': ['175', 'H', '176', 'H', '180', 'H'], 'input1': ['175', '8.42', '120.50', '55.31', '4.04', '180.33']}, '15': {'input2': ['15', 'H', '37', 'H', '95', 'T'], 'input1': ['15', '8.45', '119.04', '55.02', '4.08', '178.89']}, '187': {'input2': ['187', 'H', '190', 'T'], 'input1': ['187', '7.79', '122.27', '54.37', '4.26', '179.75']}} Dear Sir, I tried above program but with that it is showing error: nmru...@caf:~ python join1.py Traceback (most recent call last): File join1.py, line 24, in print merge_sources([(file1, file1), (file2, file2), (file3, NameError: name 'file1' is not defined nmru...@caf:~ I tried with this: import os def merge_sources(sources): # sources is a list of tuples (source_name, source_data) data = [] keysets = [] for nme, sce in sources: lines = {} for line in 
sce.split(os.linesep): lst = line.split() lines[lst[0]] = (nme, lst) keysets.append(set(lines.keys())) data.append(lines) common_keys = keysets[0] for keys in keysets[1:]: common_keys = common_keys.intersection(keys) result = {} for key in common_keys: result[key] = dict(d[key] for d in data if key in d) return result if __name__ == __main__: # Your test files here are replaced by local strings file1 = [22 110.1 33 331.5 22.7 5 271.9 17.2 33.4, 4 55.1] file2 = [5 H, 22 0] print merge_sources([(file1, file1), (file2, file2)]) but with this it is showing error: nmru...@caf:~ python join1.py Traceback (most recent call last): File join1.py, line 28, in print merge_sources([(file1, file1), (file2, file2)]) File join1.py, line 9, in merge_sources for line
Re: joining files
On Sun, May 16, 2010 at 5:02 PM, mannu jha mannu_0...@rediffmail.com wrote:

Hi,

I have few files like this:

file1:

    22 110.1
    33 331.5 22.7
    5 271.9 17.2 33.4
    4 55.1

file1 has total 4 column but some of them are missing in few row.

file2:

    5 H
    22 0

file3:

    4 T
    5 B
    22 C
    121 S

in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files.

output required:

    5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C

This had better not be yet another assignment you're asking us to help you with? *sigh*

Break your problem down! Since you haven't really asked a specific question I can't give you a specific answer.

--James
Re: joining files
On Sun, May 16, 2010 at 12:02 AM, mannu jha mannu_0...@rediffmail.com wrote:

Hi,

I have few files like this:

file1:

    22 110.1
    33 331.5 22.7
    5 271.9 17.2 33.4
    4 55.1

file1 has total 4 column but some of them are missing in few row.

file2:

    5 H
    22 0

file3:

    4 T
    5 B
    22 C
    121 S

in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files.

output required:

    5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C

Outline of approach:

1. For each file, create a dict mapping the first number on each line to that line.
2. Take the set intersection of the key sets of the dictionaries.
3. For each key in the intersection, get the values associated with it from all the dicts and combine them, then output the combination.

HTH, but you're not gonna get any code out of me. Some trivial parsing and knowledge of Python's datatypes is involved; you should already know that or be able to readily figure it out from the docs.

Cheers,
Chris
--
insert Indian programmer quality joke here
http://blog.rebertia.com
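The three-step outline above can be sketched in Python 3 roughly as below. The function names are hypothetical, and the row boundaries in the sample strings are an assumption, since the archive flattened the original files onto one line.

```python
def index_by_first_field(text):
    """Step 1: map the first field of each non-blank line to that line."""
    index = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            index[fields[0]] = line.strip()
    return index


def join_on_first_field(texts):
    """Steps 2 and 3: intersect the key sets, then collect the matching lines."""
    indexes = [index_by_first_field(text) for text in texts]
    # Iterating a dict yields its keys, so the dicts can be passed directly.
    common = set(indexes[0]).intersection(*indexes[1:])
    return {key: [index[key] for index in indexes] for key in common}


if __name__ == "__main__":
    # Assumed row layout; only the first column matters for the join.
    file1 = "22 110.1\n33 331.5 22.7\n5 271.9 17.2 33.4\n4 55.1"
    file2 = "5 H\n22 0"
    file3 = "4 T\n5 B\n22 C\n121 S"
    joined = join_on_first_field([file1, file2, file3])
    for key in sorted(joined, key=int):
        print(" ".join(joined[key]))
```

Keeping the per-file dictionaries in a list preserves file order, so each output row lists the file1 entry first, then file2, then file3.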
Re: joining files
mannu jha wrote:

Hi,

I have few files like this:

file1:

    22 110.1
    33 331.5 22.7
    5 271.9 17.2 33.4
    4 55.1

file1 has total 4 column but some of them are missing in few row.

file2:

    5 H
    22 0

file3:

    4 T
    5 B
    22 C
    121 S

in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files.

output required:

    5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C

Do you have a spec? Have you added any code to the last assignment to deal with this question, and in what way isn't it working? Why don't you post your code?

Generally, you seem to have lines where the first word is a key to the line. The word appears to be distinguished by whitespace. So finding the key from a line would be just

    line.split()[0]

Then you build a dictionary from each file. You didn't specify whether a given file might have multiple lines with the same key, so I'll just say to watch out for that, as a dictionary will cheerfully overwrite entries with new ones.

Since your rule on multiple files is apparently to throw out any line whose key isn't in all files, you'd need to make a dictionary for each file, then analyze all of them in a later pass. That pass could involve iterating through one of the dictionaries, and for each key deciding if it's in all of the others. One way to do that is to build a list and run all() on it.

DaveA
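Two of the cautions in this reply, repeated keys being overwritten silently and the later pass that checks each key against the other dictionaries with all(), might look like this in Python 3. Raising an error on a repeated key is only one possible policy, chosen here as an assumption, and `build_index` and `keys_in_all` are hypothetical names.

```python
def build_index(lines, name="source"):
    """Map first field -> fields, refusing to overwrite a repeated key silently."""
    index = {}
    for line in lines:
        fields = line.split()
        if not fields:                  # assumption: blank lines are skipped
            continue
        key = fields[0]
        if key in index:
            raise ValueError("duplicate key %r in %s" % (key, name))
        index[key] = fields
    return index


def keys_in_all(indexes):
    """Walk the first dictionary and keep the keys present in every other one."""
    first, others = indexes[0], indexes[1:]
    return [key for key in first if all(key in other for other in others)]


if __name__ == "__main__":
    a = build_index(["5 271.9 17.2 33.4", "22 110.1", "4 55.1"], "file1")
    b = build_index(["5 H", "22 0"], "file2")
    print(keys_in_all([a, b]))
```

If repeated keys are legitimate in the real data, the dictionary values could hold a list of rows per key instead of a single row.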
Fw: Re: Re: joining files
Note: Forwarded message attached -- Original Message -- From: mannu jhamannu_0...@rediffmail.com To: da...@ieee.org Subject: Re: Re: joining files---BeginMessage--- On Sun, 16 May 2010 13:52:31 +0530 wrote mannu jha wrote: Hi, I have few files like this: file1: 22 110.1 33 331.5 22.7 5 271.9 17.2 33.4 4 55.1 file1 has total 4 column but some of them are missing in few row. file2: 5 H 22 0 file3: 4 T 5 B 22 C 121 S in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files. output required: 5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C I am trying with this : from collections import defaultdict def merge(sources): blanks = [blank for items, blank, keyfunc in sources] d = defaultdict(lambda: blanks[:]) for index, (items, blank, keyfunc) in enumerate(sources): for item in items: d[keyfunc(item)][index] = item for key in sorted(d): yield d[key] if __name__ == __main__: a = open(input1.txt) c = open(input2.txt) def key(line): return line[:2] def source(stream, blank=, key=key): return (line.strip() for line in stream), blank, key for m in merge([source(x) for x in [a,c]]): print |.join(c.ljust(10) for c in m) but with input1.txt: 1877.79 122.27 54.37 4.26 179.75 1948.00 121.23 54.79 4.12 180.06 158.45 119.04 55.02 4.08 178.89 1767.78 118.68 54.57 4.20 181.06 1807.50 119.21 53.93 179.80 1907.58 120.44 54.62 4.25 180.02 1528.39 120.63 55.10 4.15 179.10 1547.79 119.62 54.47 4.22 180.46 1758.42 120.50 55.31 4.04 180.33 and input2.txt: 15 H 37 H 95 T 124 H 130 H 152 H 154 H 158 H 164 H 175 H 176 H 180 H 187 H 190 T 194 C 196 H 207 H 210 H 232 H it is giving output as: | |124 H |130 H 1547.79 119.62 54.47 4.22 180.46|158 H |164 H 1758.42 120.50 55.31 4.04 180.33|176 H 1807.50 119.21 53.93 179.80|187 H 1907.58 120.44 54.62 4.25 180.02|196 H |207 H |210 H |232 H |37 H |95 T so it not matching it properly, can anyone please suggest where I am doing mistake. 
---End Message--- -- http://mail.python.org/mailman/listinfo/python-list
Re: joining files
(You forgot to include the python-list in your response. So it only went to me. Normally, you just do reply-all to the message) mannu jha wrote: On Sun, 16 May 2010 13:52:31 +0530 wrote mannu jha wrote: Hi, I have few files like this: file1: 22 110.1 33 331.5 22.7 5 271.9 17.2 33.4 4 55.1 file1 has total 4 column but some of them are missing in few row. file2: 5 H 22 0 file3: 4 T 5 B 22 C 121 S in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files. output required: 5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C I am trying with this : from collections import defaultdict def merge(sources): blanks = [blank for items, blank, keyfunc in sources] d = defaultdict(lambda: blanks[:]) for index, (items, blank, keyfunc) in enumerate(sources): for item in items: d[keyfunc(item)][index] = item for key in sorted(d): yield d[key] if __name__ == __main__: a = open(input1.txt) c = open(input2.txt) def key(line): return line[:2] def source(stream, blank=, key=key): return (line.strip() for line in stream), blank, key for m in merge([source(x) for x in [a,c]]): print |.join(c.ljust(10) for c in m) but with input1.txt: 1877.79 122.27 54.37 4.26 179.75 1948.00 121.23 54.79 4.12 180.06 158.45 119.04 55.02 4.08 178.89 1767.78 118.68 54.57 4.20 181.06 1807.50 119.21 53.93 179.80 1907.58 120.44 54.62 4.25 180.02 1528.39 120.63 55.10 4.15 179.10 1547.79 119.62 54.47 4.22 180.46 1758.42 120.50 55.31 4.04 180.33 and input2.txt: 15 H 37 H 95 T 124 H 130 H 152 H 154 H 158 H 164 H 175 H 176 H 180 H 187 H 190 T 194 C 196 H 207 H 210 H 232 H it is giving output as: | |124 H |130 H 1547.79 119.62 54.47 4.22 180.46|158 H |164 H 1758.42 120.50 55.31 4.04 180.33|176 H 1807.50 119.21 53.93 179.80|187 H 1907.58 120.44 54.62 4.25 180.02|196 H |207 H |210 H |232 H |37 H |95 T so it not matching it properly, can anyone please suggest where I am doing mistake. 
I'm about to travel all day, so my response will be quite brief.

Not sure what you mean by the blank and key values that source() takes, since they're just passed on to its return value.

I don't see any place where you compare the items from the various files, so you aren't checking if an item is in multiple files.

DaveA
Re: joining files
In article mailman.255.1273997908.32709.python-l...@python.org, Chris Rebert c...@rebertia.com wrote:

-- insert Indian programmer quality joke here

That's not funny. I'm sure I'd have little difficulty finding poor programmers of whatever demographic groups you belong to. Or perhaps you haven't noticed that PEBKACs are everywhere?

--
Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/
f u cn rd ths, u cn gt a gd jb n nx prgrmmng.
Re: joining files
On 05/16/2010 05:04 PM, Dave Angel wrote: (You forgot to include the python-list in your response. So it only went to me. Normally, you just do reply-all to the message) mannu jha wrote: On Sun, 16 May 2010 13:52:31 +0530 wrote mannu jha wrote: Hi, I have few files like this: file1: 22 110.1 33 331.5 22.7 5 271.9 17.2 33.4 4 55.1 file1 has total 4 column but some of them are missing in few row. file2: 5 H 22 0 file3: 4 T 5 B 22 C 121 S in all these files first column is the main source of matching their entries. So What I want in the output is only those entries which is coming in all three files. output required: 5 271.9 17.2 33.4 5 H 5 T 22 110.1 22 0 22 C I am trying with this : from collections import defaultdict def merge(sources): blanks = [blank for items, blank, keyfunc in sources] d = defaultdict(lambda: blanks[:]) for index, (items, blank, keyfunc) in enumerate(sources): for item in items: d[keyfunc(item)][index] = item for key in sorted(d): yield d[key] if __name__ == __main__: a = open(input1.txt) c = open(input2.txt) def key(line): return line[:2] def source(stream, blank=, key=key): return (line.strip() for line in stream), blank, key for m in merge([source(x) for x in [a,c]]): print |.join(c.ljust(10) for c in m) but with input1.txt: 187 7.79 122.27 54.37 4.26 179.75 194 8.00 121.23 54.79 4.12 180.06 15 8.45 119.04 55.02 4.08 178.89 176 7.78 118.68 54.57 4.20 181.06 180 7.50 119.21 53.93 179.80 190 7.58 120.44 54.62 4.25 180.02 152 8.39 120.63 55.10 4.15 179.10 154 7.79 119.62 54.47 4.22 180.46 175 8.42 120.50 55.31 4.04 180.33 and input2.txt: 15 H 37 H 95 T 124 H 130 H 152 H 154 H 158 H 164 H 175 H 176 H 180 H 187 H 190 T 194 C 196 H 207 H 210 H 232 H it is giving output as: | |124 H |130 H 154 7.79 119.62 54.47 4.22 180.46|158 H |164 H 175 8.42 120.50 55.31 4.04 180.33|176 H 180 7.50 119.21 53.93 179.80|187 H 190 7.58 120.44 54.62 4.25 180.02|196 H |207 H |210 H |232 H |37 H |95 T so it not matching it properly, can anyone please suggest where I am 
doing mistake.

I'm about to travel all day, so my response will be quite brief.

Not sure what you mean by the blank and key values that source() takes, since they're just passed on to its return value.

I don't see any place where you compare the items from the various files, so you aren't checking if an item is in multiple files.

DaveA

    import os

    def merge_sources(sources):
        # sources is a list of tuples (source_name, source_data)
        data = []
        keysets = []
        for nme, sce in sources:
            lines = {}
            for line in sce.split(os.linesep):
                lst = line.split()
                lines[lst[0]] = (nme, lst)
            keysets.append(set(lines.keys()))
            data.append(lines)
        common_keys = keysets[0]
        for keys in keysets[1:]:
            common_keys = common_keys.intersection(keys)
        result = {}
        for key in common_keys:
            result[key] = dict(d[key] for d in data if key in d)
        return result

    if __name__ == "__main__":
        # Your test files here are replaced by local strings
        print merge_sources([("file1", file1), ("file2", file2), ("file3", file3)])
        print merge_sources([("input1", input1), ("input2", input2)])

    Test_results = '''
    {'22': {'file3': ['22', 'C'],
            'file2': ['22', '0'],
            'file1': ['22', '110.1', '33', '331.5', '22.7', '5', '271.9', '17.2', '33.4']}}
    {'194': {'input2': ['194', 'C'],
             'input1': ['194', '8.00', '121.23', '54.79', '4.12', '180.06']},
     '175': {'input2': ['175', 'H', '176', 'H', '180', 'H'],
             'input1': ['175', '8.42', '120.50', '55.31', '4.04', '180.33']},
     '15': {'input2': ['15', 'H', '37', 'H', '95', 'T'],
            'input1': ['15', '8.45', '119.04', '55.02', '4.08', '178.89']},
     '187': {'input2': ['187', 'H', '190', 'T'],
             'input1': ['187', '7.79', '122.27', '54.37', '4.26', '179.75']}}
    '''
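One detail of the code above may be worth a second look: `sce.split(os.linesep)` depends on the platform. On Windows `os.linesep` is "\r\n", while text read in text mode normally contains plain "\n", so the string would not be split at all. A hedged alternative for Python 3 is `str.splitlines()`, sketched below with a hypothetical helper name; skipping blank lines is again an assumption.

```python
def keyed_records(text):
    """Split text into lines portably and key each non-blank line on its first field."""
    records = {}
    for line in text.splitlines():      # recognises "\n", "\r\n" and "\r", among others
        fields = line.split()
        if fields:                      # assumption: blank lines are skipped
            records[fields[0]] = fields
    return records


if __name__ == "__main__":
    unix_text = "15 H\n37 H\n95 T\n"
    windows_text = "15 H\r\n37 H\r\n95 T\r\n"
    # Both spellings of the line ending give the same records.
    print(keyed_records(unix_text) == keyed_records(windows_text))
```

Unlike `split("\n")`, `splitlines()` also leaves no empty string at the end when the text finishes with a newline, which avoids an IndexError on `lst[0]`.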