Fw: Re: Re: joining files

2010-05-18 Thread mannu jha
Note: Forwarded message attached

-- Original Message --

From: mannu jha <mannu_0...@rediffmail.com>
To: tuomas.vesteri...@iki.fi
Subject: Re: Re: joining files
---BeginMessage---
import os

def merge_sources(sources):
    # sources is a list of tuples (source_name, source_data)
    data = []
    keysets = []
    for nme, sce in sources:
        lines = {}
        for line in sce.split(os.linesep):
            lst = line.split()
            lines[lst[0]] = (nme, lst)
        keysets.append(set(lines.keys()))
        data.append(lines)
    common_keys = keysets[0]
    for keys in keysets[1:]:
        common_keys = common_keys.intersection(keys)
    result = {}
    for key in common_keys:
        result[key] = dict(d[key] for d in data if key in d)
    return result

if __name__ == "__main__":
    # Your test files here are replaced by local strings
    print merge_sources([("file1", file1), ("file2", file2), ("file3",
                                                              file3)])
    print merge_sources([("input1", input1), ("input2", input2)])

Test_results = '''
{'22': {'file3': ['22', 'C'],
        'file2': ['22', '0'],
        'file1': ['22', '110.1', '33', '331.5', '22.7', '5', '271.9',
                  '17.2', '33.4']}}
{'194': {'input2': ['194', 'C'],
         'input1': ['194', '8.00', '121.23', '54.79', '4.12',
                    '180.06']},
 '175': {'input2': ['175', 'H', '176', 'H', '180', 'H'],
         'input1': ['175', '8.42', '120.50', '55.31', '4.04',
                    '180.33']},
 '15': {'input2': ['15', 'H', '37', 'H', '95', 'T'],
        'input1': ['15', '8.45', '119.04', '55.02', '4.08',
                   '178.89']},
 '187': {'input2': ['187', 'H', '190', 'T'],
         'input1': ['187', '7.79', '122.27', '54.37', '4.26',
                    '179.75']}}
'''

 Dear Sir,

 I tried above program but with that it is showing error:
 nmru...@caf:~ python join1.py
 Traceback (most recent call last):
   File "join1.py", line 24, in <module>
     print merge_sources([("file1", file1), ("file2", file2), ("file3",file3)])
 NameError: name 'file1' is not defined
 nmru...@caf:~
Add test data to the code as:
file1 = '''22 110.1 33 331.5 22.7 5 271.9 17.2 33.4
4 55.1'''
Thank you very much, sir, it is working. Thank you once again for your kind
help.

Only one problem remains: when I tried to replace the test data with
filenames, i.e.
file1 = open("input11.txt")
file2 = open("output22.txt")
print merge_sources([("file1", file1), ("file2", file2)])
then it shows this error:

ph08...@linux-af0n:~ python new.py
Traceback (most recent call last):
  File "new.py", line 25, in <module>
    print merge_sources([("file1", file1), ("file2", file2)])
  File "new.py", line 9, in merge_sources
    for line in sce.split(os.linesep):
AttributeError: 'file' object has no attribute 'split'
ph08...@linux-af0n:~ 

where my input11.txt is:
'''187 7.79 122.27 54.37 4.26 179.75
194 8.00 121.23 54.79 4.12 180.06
15 8.45 119.04 55.02 4.08 178.89
176 7.78 118.68 54.57 4.20 181.06
180 7.50 119.21 53.93 179.80
190 7.58 120.44 54.62 4.25 180.02
152 8.39 120.63 55.10 4.15 179.10
154 7.79 119.62 54.47 4.22 180.46
175 8.42 120.50 55.31 4.04 180.33'''

and output22.txt is:
'''15 H
37 H
95 T
124 H
130 H
152 H
154 H
158 H
164 H
175 H
176 H
180 H
187 H
190 T
194 C'''

Since my files are very big, I want to give the filenames as input.





---End Message---
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Re: joining files

2010-05-18 Thread mannu jha


On Mon, 17 May 2010 23:57:18 +0530  wrote
Try:
    file = open("input11.txt")
    file1 = file.read()          # file1 is a string
    file.close()
or
    file1 = open("input11.txt")  # file1 is an open file object
and replace the lines:
    for line in sce.split(os.linesep):
        lst = line.split()
        lines[lst[0]] = (nme, lst)
with the lines:
    for line in sce:
        lst = line.split()
        lines[lst[0]] = (nme, lst)
    sce.close()
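
For large inputs, the second variant (iterating over the open file) can be wrapped up so that the merge takes filenames directly. The following is only a sketch under assumptions: it is written for Python 3, whereas the thread uses Python 2 print statements, and `index_file` and `merge_files` are hypothetical names that do not appear in the posted code. It also skips blank lines, which the posted loop does not.

```python
def index_file(path):
    """Map the first whitespace-separated field of each line to its fields."""
    rows = {}
    with open(path) as handle:       # the file is closed automatically
        for line in handle:          # read lazily, one line at a time
            fields = line.split()
            if fields:               # skip blank lines
                rows[fields[0]] = fields
    return rows


def merge_files(paths):
    """Return {key: {path: fields}} for the keys present in every file."""
    tables = [(path, index_file(path)) for path in paths]
    common = set(tables[0][1])
    for _, rows in tables[1:]:
        common &= set(rows)          # keep only keys seen in this file too
    return {key: {path: rows[key] for path, rows in tables}
            for key in common}
```

A call such as `merge_files(["input11.txt", "output22.txt"])` would then return one entry per key found in both files.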


Thank you very much, sir, it has worked. Thank you once again.


Re: joining files

2010-05-17 Thread Dave Angel

mannu jha wrote:

On Sun, 16 May 2010 13:52:31 +0530  wrote

mannu jha wrote:

Hi,

I have few files like this:
file1:
22 110.1
33 331.5 22.7
5 271.9 17.2 33.4
4 55.1

file1 has total 4 column but some of them are missing in few row.

file2:
5 H
22 0

file3:
4 T
5 B
22 C
121 S

in all these files first column is the main source of matching their entries.
So What I want in the output is only those entries which is coming in all three
files.

output required:

5 271.9 17.2 33.4 5 H 5 T
22 110.1 22 0 22 C

I am trying with this :


from collections import defaultdict

def merge(sources):
    blanks = [blank for items, blank, keyfunc in sources]
    d = defaultdict(lambda: blanks[:])
    for index, (items, blank, keyfunc) in enumerate(sources):
        for item in items:
            d[keyfunc(item)][index] = item
    for key in sorted(d):
        yield d[key]

if __name__ == "__main__":
    a = open("input1.txt")
    c = open("input2.txt")

    def key(line):
        return line[:2]
    def source(stream, blank="", key=key):
        return (line.strip() for line in stream), blank, key
    for m in merge([source(x) for x in [a,c]]):
        print "|".join(c.ljust(10) for c in m)

but with input1.txt:
187   7.79   122.27   54.37   4.26   179.75
194   8.00   121.23   54.79   4.12   180.06
15   8.45   119.04   55.02   4.08   178.89
176   7.78   118.68   54.57   4.20   181.06
180   7.50   119.21   53.93  179.80
190   7.58   120.44   54.62   4.25   180.02
152   8.39   120.63   55.10   4.15   179.10
154   7.79   119.62   54.47   4.22   180.46
175   8.42   120.50   55.31   4.04   180.33
and input2.txt:
 15   H 
 37   H 
 95   T
124   H 
130   H 
152   H 
154   H 
158   H 
164   H
175   H 
176   H 
180   H
187   H 
190   T

194   C
196   H 
207   H 
210   H 
232   H 
it is giving output as:

  |
  |124   H
  |130   H
154   7.79   119.62   54.47   4.22   180.46|158   H
  |164   H
175   8.42   120.50   55.31   4.04   180.33|176   H
180   7.50   119.21   53.93  179.80|187   H
190   7.58   120.44   54.62   4.25   180.02|196   H
  |207   H
  |210   H
  |232   H
  |37   H
  |95   T
so it not matching it properly, can anyone please suggest where I am doing 
mistake.


  
Several mistakes here, some in making it unnecessarily complex, but I'll 
concentrate on the ones that just don't work.


Your key() function returns the first two characters of the line.  So 
you're keying not on the whole number, but only on the first two digits 
of it.  To find out what's going on, you need to decompose the complex 
line from:
 


   d[keyfunc(item)][index] = item

to some things you can actually examine:
  key = keyfunc(item)
  d[key][index] = item


You don't make any check to see if a particular item is in all the 
files.  For your particular data structure, this would mean that a 
particular value in the dictionary (which is a list of two items) has 
all non-blank strings in it.  To do this, you might want to do an all() 
function on the list.
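
The two points above (key on the whole first field, then keep only rows present in every source) can be sketched as follows. This is an illustration rather than the code from the thread: it is written for Python 3, it simplifies `merge()` to take plain iterables of lines, the short sample lists stand in for the real files, and the numeric sort assumes every key is an integer.

```python
from collections import defaultdict


def merge(sources):
    """sources: a list of iterables of lines; yield rows found in every source."""
    d = defaultdict(lambda: [""] * len(sources))
    for index, lines in enumerate(sources):
        for line in lines:
            line = line.strip()
            if not line:
                continue                # ignore blank lines
            key = line.split()[0]       # the whole first field, not line[:2]
            d[key][index] = line
    for key in sorted(d, key=int):      # numeric order; assumes integer keys
        if all(d[key]):                 # every slot holds a non-blank string
            yield d[key]


# Stand-in data (an assumption, shortened from the files in the post).
input1 = ["187 7.79 122.27", "15 8.45 119.04", "154 7.79 119.62"]
input2 = ["15 H", "154 H", "158 H", "187 H"]
for row in merge([input1, input2]):
    print("|".join(cell.ljust(20) for cell in row))
```

With `line[:2]` as the key, the lines for 15, 152, 154 and 158 all collapse onto the key "15", which is why the posted output pairs 154 with 158.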



DaveA




Re: Re: joining files

2010-05-17 Thread mannu jha

On Sun, 16 May 2010 23:51:10 +0530  wrote
On 05/16/2010 05:04 PM, Dave Angel wrote:
 (You forgot to include the python-list in your response. So it only
 went to me. Normally, you just do reply-all to the message)
 mannu jha wrote:
 On Sun, 16 May 2010 13:52:31 +0530 wrote
 mannu jha wrote:
 Hi,
 I have few files like this:
 file1:
 22 110.1 33 331.5 22.7 5 271.9 17.2 33.4
 4 55.1
 file1 has total 4 column but some of them are missing in few row.
 file2:
 5 H
 22 0
 file3:
 4 T
 5 B
 22 C
 121 S
 in all these files first column is the main source of matching their 
 entries. So What I want in the output is only those entries which is coming 
 in all three files. output required:
 5 271.9 17.2 33.4 5 H 5 T
 22 110.1 22 0 22 C
 I am trying with this :
from collections import defaultdict

def merge(sources):
    blanks = [blank for items, blank, keyfunc in sources]
    d = defaultdict(lambda: blanks[:])
    for index, (items, blank, keyfunc) in enumerate(sources):
        for item in items:
            d[keyfunc(item)][index] = item
    for key in sorted(d):
        yield d[key]

if __name__ == "__main__":
    a = open("input1.txt")
    c = open("input2.txt")

    def key(line):
        return line[:2]
    def source(stream, blank="", key=key):
        return (line.strip() for line in stream), blank, key
    for m in merge([source(x) for x in [a,c]]):
        print "|".join(c.ljust(10) for c in m)
 but with input1.txt:
 187 7.79 122.27 54.37 4.26 179.75
 194 8.00 121.23 54.79 4.12 180.06
 15 8.45 119.04 55.02 4.08 178.89
 176 7.78 118.68 54.57 4.20 181.06
 180 7.50 119.21 53.93 179.80
 190 7.58 120.44 54.62 4.25 180.02
 152 8.39 120.63 55.10 4.15 179.10
 154 7.79 119.62 54.47 4.22 180.46
 175 8.42 120.50 55.31 4.04 180.33
 and input2.txt:
 15 H 37 H 95 T
 124 H 130 H 152 H 154 H 158 H 164 H
 175 H 176 H 180 H
 187 H 190 T
 194 C
 196 H 207 H 210 H 232 H it is giving output as:
 |
 |124 H
 |130 H
 154 7.79 119.62 54.47 4.22 180.46|158 H
 |164 H
 175 8.42 120.50 55.31 4.04 180.33|176 H
 180 7.50 119.21 53.93 179.80|187 H
 190 7.58 120.44 54.62 4.25 180.02|196 H
 |207 H
 |210 H
 |232 H
 |37 H
 |95 T
 so it not matching it properly, can anyone please suggest where I am doing 
 mistake.

import os

def merge_sources(sources):
    # sources is a list of tuples (source_name, source_data)
    data = []
    keysets = []
    for nme, sce in sources:
        lines = {}
        for line in sce.split(os.linesep):
            lst = line.split()
            lines[lst[0]] = (nme, lst)
        keysets.append(set(lines.keys()))
        data.append(lines)
    common_keys = keysets[0]
    for keys in keysets[1:]:
        common_keys = common_keys.intersection(keys)
    result = {}
    for key in common_keys:
        result[key] = dict(d[key] for d in data if key in d)
    return result

if __name__ == "__main__":
    # Your test files here are replaced by local strings
    print merge_sources([("file1", file1), ("file2", file2), ("file3",
                                                              file3)])
    print merge_sources([("input1", input1), ("input2", input2)])

Test_results = '''
{'22': {'file3': ['22', 'C'],
        'file2': ['22', '0'],
        'file1': ['22', '110.1', '33', '331.5', '22.7', '5', '271.9',
                  '17.2', '33.4']}}
{'194': {'input2': ['194', 'C'],
         'input1': ['194', '8.00', '121.23', '54.79', '4.12',
                    '180.06']},
 '175': {'input2': ['175', 'H', '176', 'H', '180', 'H'],
         'input1': ['175', '8.42', '120.50', '55.31', '4.04',
                    '180.33']},
 '15': {'input2': ['15', 'H', '37', 'H', '95', 'T'],
        'input1': ['15', '8.45', '119.04', '55.02', '4.08',
                   '178.89']},
 '187': {'input2': ['187', 'H', '190', 'T'],
         'input1': ['187', '7.79', '122.27', '54.37', '4.26',
                    '179.75']}}
'''

Dear Sir,

I tried above program but with that it is showing error:
nmru...@caf:~ python join1.py
Traceback (most recent call last):
  File "join1.py", line 24, in <module>
print merge_sources([(file1, file1), (file2, file2), (file3,
NameError: name 'file1' is not defined
nmru...@caf:~ 





Fw: Re: Re: Re: joining files

2010-05-17 Thread mannu jha
Note: Forwarded message attached

-- Original Message --

From: mannu jha <mannu_0...@rediffmail.com>
To: mannu_0...@rediffmail.com
Subject: Re: Re: Re: joining files
---BeginMessage---

I tried with this:
import os

def merge_sources(sources):
    # sources is a list of tuples (source_name, source_data)
    data = []
    keysets = []
    for nme, sce in sources:
        lines = {}
        for line in sce.split(os.linesep):
            lst = line.split()
            lines[lst[0]] = (nme, lst)
        keysets.append(set(lines.keys()))
        data.append(lines)
    common_keys = keysets[0]
    for keys in keysets[1:]:
        common_keys = common_keys.intersection(keys)
    result = {}
    for key in common_keys:
        result[key] = dict(d[key] for d in data if key in d)
    return result

if __name__ == "__main__":
    # Your test files here are replaced by local strings
    file1 = ["22 110.1 33 331.5 22.7 5 271.9 17.2 33.4",
             "4 55.1"]
    file2 = ["5 H",
             "22 0"]
    print merge_sources([("file1", file1), ("file2", file2)])

but with this it is showing error:
nmru...@caf:~ python join1.py
Traceback (most recent call last):
  File "join1.py", line 28, in <module>
    print merge_sources([("file1", file1), ("file2", file2)])
  File "join1.py", line 9, in merge_sources
for line

Re: joining files

2010-05-16 Thread James Mills
On Sun, May 16, 2010 at 5:02 PM, mannu jha <mannu_0...@rediffmail.com> wrote:
 Hi,

 I have few files like this:
 file1:
 22 110.1
 33 331.5 22.7
 5 271.9 17.2 33.4
 4 55.1

 file1 has total 4 column but some of them are missing in few row.

 file2:
 5 H
 22 0

 file3:
 4 T
 5 B
 22 C
 121 S

 in all these files first column is the main source of matching their
 entries. So What I want in the output is only those entries which is coming
 in all three files.
 output required:
 5 271.9 17.2 33.4 5 H 5 T
 22 110.1 22 0 22 C

This had better not be yet another assignment
you're asking us to help you with ? *sigh*

Break your problem down!

Since you haven't really asked a specific question
I can't give you a specific answer.

--James


Re: joining files

2010-05-16 Thread Chris Rebert
On Sun, May 16, 2010 at 12:02 AM, mannu jha <mannu_0...@rediffmail.com> wrote:
 Hi,

 I have few files like this:
 file1:
 22 110.1
 33 331.5 22.7
 5 271.9 17.2 33.4
 4 55.1

 file1 has total 4 column but some of them are missing in few row.

 file2:
 5 H
 22 0

 file3:
 4 T
 5 B
 22 C
 121 S

 in all these files first column is the main source of matching their
 entries. So What I want in the output is only those entries which is coming
 in all three files.
 output required:
 5 271.9 17.2 33.4 5 H 5 T
 22 110.1 22 0 22 C

Outline of approach:
1. For each file, create a dict mapping the first number on each line
to that line.
2. Take the set intersection of the key sets of the dictionaries.
3. For each key in the intersection, get the values associated with it
from all the dicts and combine them, then output the combination.
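
The three steps of the outline can be sketched as below. This is one possible reading of it, not code posted in the thread: it is written for Python 3, `index_lines` and `join_on_first_column` are hypothetical names, the sample files are given as local strings, and the numeric ordering assumes integer keys.

```python
def index_lines(text):
    """Step 1: map the first field of each non-blank line to that line."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if fields:
            table[fields[0]] = " ".join(fields)
    return table


def join_on_first_column(texts):
    tables = [index_lines(text) for text in texts]
    # Step 2: intersect the key sets of all the dictionaries.
    common = set(tables[0]).intersection(*tables[1:])
    # Step 3: combine the matching lines, in numeric key order.
    return [" ".join(table[key] for table in tables)
            for key in sorted(common, key=int)]


file1 = "22 110.1\n33 331.5 22.7\n5 271.9 17.2 33.4\n4 55.1"
file2 = "5 H\n22 0"
file3 = "4 T\n5 B\n22 C\n121 S"
for row in join_on_first_column([file1, file2, file3]):
    print(row)
```

On the sample data from the original post this keeps only the rows for keys 5 and 22, the two keys that occur in all three files.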

HTH, but you're not gonna get any code out of me.
Some trivial parsing and knowledge of Python's datatypes is involved;
you should already know that or be able to readily figure it out from
the docs.

Cheers,
Chris
--
insert Indian programmer quality joke here
http://blog.rebertia.com


Re: joining files

2010-05-16 Thread Dave Angel

mannu jha wrote:

Hi,

I have few files like this:
file1:
22 110.1  
33 331.5 22.7 
5 271.9 17.2 33.4
4 55.1 


file1 has total 4 column but some of them are missing in few row.

file2:
5 H
22 0

file3:
4 T
5 B
22 C
121 S

in all these files first column is the main source of matching their entries. 
So What I want in the output is only those entries which is coming in all three 
files.
output required:
5 271.9 17.2 33.4 5 H  5 T
22 110.1 22 0 22 C


  
Do you have a spec?  Have you added any code to the last assignment to 
deal with this question, and in what way isn't it working?  Why don't 
you post your code?


Generally, you seem to have lines where the first word is a key to the 
line.  The word appears to be distinguished by whitespace.  So finding 
the key from a line would be just line.split()[0].  Then you build a 
dictionary from each file.  You didn't specify whether a given file 
might have multiple lines with the same key, so I'll just say to watch 
out for that, as a dictionary will cheerfully overwrite entries with new 
ones.


Since your rule on multiple files is apparently to throw out any line 
whose key isn't in all files, you'd need to make a dictionary for each 
file, then analyze all  of them in a later pass.  That pass could 
involve iterating through one of the dictionaries, and for each key 
deciding if it's in all of the others.  One way to do that is to build a 
list and run all() on it.
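
A small sketch of those two cautions, assuming Python 3: `build_index` and `keys_in_all` are hypothetical helper names, and raising an error on a repeated key is just one possible policy (the post only says to watch out for it).

```python
def build_index(lines, name):
    """Map each line's first word to the line, refusing silent overwrites."""
    index = {}
    for line in lines:
        words = line.split()
        if not words:
            continue                     # skip blank lines
        key = words[0]
        if key in index:                 # a plain dict would just overwrite
            raise ValueError("duplicate key %r in %s" % (key, name))
        index[key] = line.strip()
    return index


def keys_in_all(first, others):
    """Iterate one dictionary and keep the keys found in all of the others."""
    return [key for key in first if all(key in other for other in others)]


d1 = build_index(["22 110.1", "33 331.5 22.7", "5 271.9 17.2 33.4", "4 55.1"], "file1")
d2 = build_index(["5 H", "22 0"], "file2")
d3 = build_index(["4 T", "5 B", "22 C", "121 S"], "file3")
for key in keys_in_all(d1, [d2, d3]):
    print(d1[key], d2[key], d3[key])
```

If duplicates are legitimate in the real data, the dictionary values could instead be lists of lines; which is right depends on the spec asked for above.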


DaveA





Fw: Re: Re: joining files

2010-05-16 Thread mannu jha
Note: Forwarded message attached

-- Original Message --

From: mannu jha <mannu_0...@rediffmail.com>
To: da...@ieee.org
Subject: Re: Re: joining files
---BeginMessage---


On Sun, 16 May 2010 13:52:31 +0530  wrote
mannu jha wrote:

 Hi,



 I have few files like this:
 file1:
 22 110.1 
 33 331.5 22.7 
 5 271.9 17.2 33.4
 4 55.1 

 file1 has total 4 column but some of them are missing in few row.

 file2:
 5 H
 22 0

 file3:
 4 T
 5 B
 22 C
 121 S



 in all these files first column is the main source of matching their entries. 
 So What I want in the output is only those entries which is coming in all 
 three files.

 output required:

 5 271.9 17.2 33.4 5 H 5 T
 22 110.1 22 0 22 C


I am trying with this :

from collections import defaultdict

def merge(sources):
    blanks = [blank for items, blank, keyfunc in sources]
    d = defaultdict(lambda: blanks[:])
    for index, (items, blank, keyfunc) in enumerate(sources):
        for item in items:
            d[keyfunc(item)][index] = item
    for key in sorted(d):
        yield d[key]

if __name__ == "__main__":
    a = open("input1.txt")
    c = open("input2.txt")

    def key(line):
        return line[:2]
    def source(stream, blank="", key=key):
        return (line.strip() for line in stream), blank, key
    for m in merge([source(x) for x in [a,c]]):
        print "|".join(c.ljust(10) for c in m)

but with input1.txt:
187   7.79   122.27   54.37   4.26   179.75
194   8.00   121.23   54.79   4.12   180.06
15   8.45   119.04   55.02   4.08   178.89
176   7.78   118.68   54.57   4.20   181.06
180   7.50   119.21   53.93  179.80
190   7.58   120.44   54.62   4.25   180.02
152   8.39   120.63   55.10   4.15   179.10
154   7.79   119.62   54.47   4.22   180.46
175   8.42   120.50   55.31   4.04   180.33
and input2.txt:
 15   H 
 37   H 
 95   T
124   H 
130   H 
152   H 
154   H 
158   H 
164   H
175   H 
176   H 
180   H
187   H 
190   T
194   C
196   H 
207   H 
210   H 
232   H 
it is giving output as:
  |
  |124   H
  |130   H
154   7.79   119.62   54.47   4.22   180.46|158   H
  |164   H
175   8.42   120.50   55.31   4.04   180.33|176   H
180   7.50   119.21   53.93  179.80|187   H
190   7.58   120.44   54.62   4.25   180.02|196   H
  |207   H
  |210   H
  |232   H
  |37   H
  |95   T
so it not matching it properly, can anyone please suggest where I am doing 
mistake.


---End Message---


Re: joining files

2010-05-16 Thread Dave Angel
(You forgot to include the python-list in your response.  So it only 
went to me.  Normally, you just do reply-all to the message)



I'm about to travel all day, so my response will be quite brief.

Not sure what you mean by the blank and key values that source() takes, 
since they're just passed on to its return value.


I don't see any place where you compare the items from the various 
files, so you aren't checking if an item is in multiple files.


DaveA



Re: joining files

2010-05-16 Thread Aahz
In article <mailman.255.1273997908.32709.python-l...@python.org>,
Chris Rebert  <c...@rebertia.com> wrote:

--
insert Indian programmer quality joke here

That's not funny.  I'm sure I'd have little difficulty finding poor
programmers of whatever demographic groups you belong to.  Or perhaps you
haven't noticed that PEBKACs are everywhere?
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

f u cn rd ths, u cn gt a gd jb n nx prgrmmng.


Re: joining files

2010-05-16 Thread Tuomas Vesterinen

On 05/16/2010 05:04 PM, Dave Angel wrote:

(You forgot to include the python-list in your response.  So it only
went to me. Normally, you just do reply-all to the message)

mannu jha wrote:

On Sun, 16 May 2010 13:52:31 +0530 wrote

mannu jha wrote:



Hi,




I have few files like this:
file1:
22 110.1 33 331.5 22.7 5 271.9 17.2 33.4
4 55.1



file1 has total 4 column but some of them are missing in few row.



file2:
5 H
22 0



file3:
4 T
5 B
22 C
121 S




in all these files first column is the main source of matching their
entries. So What I want in the output is only those entries which is
coming in all three files.



output required:



5 271.9 17.2 33.4 5 H 5 T
22 110.1 22 0 22 C


I am trying with this :

from collections import defaultdict

def merge(sources):
    blanks = [blank for items, blank, keyfunc in sources]
    d = defaultdict(lambda: blanks[:])
    for index, (items, blank, keyfunc) in enumerate(sources):
        for item in items:
            d[keyfunc(item)][index] = item
    for key in sorted(d):
        yield d[key]

if __name__ == "__main__":
    a = open("input1.txt")
    c = open("input2.txt")

    def key(line):
        return line[:2]
    def source(stream, blank="", key=key):
        return (line.strip() for line in stream), blank, key
    for m in merge([source(x) for x in [a,c]]):
        print "|".join(c.ljust(10) for c in m)

but with input1.txt:
187 7.79 122.27 54.37 4.26 179.75
194 8.00 121.23 54.79 4.12 180.06
15 8.45 119.04 55.02 4.08 178.89
176 7.78 118.68 54.57 4.20 181.06
180 7.50 119.21 53.93 179.80
190 7.58 120.44 54.62 4.25 180.02
152 8.39 120.63 55.10 4.15 179.10
154 7.79 119.62 54.47 4.22 180.46
175 8.42 120.50 55.31 4.04 180.33
and input2.txt:
15 H 37 H 95 T
124 H 130 H 152 H 154 H 158 H 164 H
175 H 176 H 180 H
187 H 190 T
194 C
196 H 207 H 210 H 232 H it is giving output as:
|
|124 H
|130 H
154 7.79 119.62 54.47 4.22 180.46|158 H
|164 H
175 8.42 120.50 55.31 4.04 180.33|176 H
180 7.50 119.21 53.93 179.80|187 H
190 7.58 120.44 54.62 4.25 180.02|196 H
|207 H
|210 H
|232 H
|37 H
|95 T
so it not matching it properly, can anyone please suggest where I am
doing mistake.




I'm about to travel all day, so my response will be quite brief.

Not sure what you mean by the blank and key values that source() takes,
since they're just passed on to its return value.

I don't see any place where you compare the items from the various
files, so you aren't checking if an item is in multiple files.

DaveA


import os

def merge_sources(sources):
    # sources is a list of tuples (source_name, source_data)
    data = []
    keysets = []
    for nme, sce in sources:
        lines = {}
        for line in sce.split(os.linesep):
            lst = line.split()
            lines[lst[0]] = (nme, lst)
        keysets.append(set(lines.keys()))
        data.append(lines)
    common_keys = keysets[0]
    for keys in keysets[1:]:
        common_keys = common_keys.intersection(keys)
    result = {}
    for key in common_keys:
        result[key] = dict(d[key] for d in data if key in d)
    return result

if __name__ == "__main__":
    # Your test files here are replaced by local strings
    print merge_sources([("file1", file1), ("file2", file2), ("file3",
                                                              file3)])
    print merge_sources([("input1", input1), ("input2", input2)])

Test_results = '''
{'22': {'file3': ['22', 'C'],
        'file2': ['22', '0'],
        'file1': ['22', '110.1', '33', '331.5', '22.7', '5', '271.9',
                  '17.2', '33.4']}}

{'194': {'input2': ['194', 'C'],
         'input1': ['194', '8.00', '121.23', '54.79', '4.12',
                    '180.06']},
 '175': {'input2': ['175', 'H', '176', 'H', '180', 'H'],
         'input1': ['175', '8.42', '120.50', '55.31', '4.04',
                    '180.33']},
 '15': {'input2': ['15', 'H', '37', 'H', '95', 'T'],
        'input1': ['15', '8.45', '119.04', '55.02', '4.08',
                   '178.89']},
 '187': {'input2': ['187', 'H', '190', 'T'],
         'input1': ['187', '7.79', '122.27', '54.37', '4.26',
                    '179.75']}}
'''
