Re: [Tutor] csv merge with different column title
Pareshkumar Panchal wrote: > I am trying to merge two csv files with following condition. > > filea.csv column( a1,a2,a3) > fileb.csv column( b1,b2,b3) > > i wanted to merge if a1 & b1 reference (as the title is different but > contents are valid for comparison). If the column title is the problem you can rename it. > also the result i need is > a1,a2,a3,b1,b2 (not b3). Then delete that column either before or after the merge. > i am using openpyxl,csv modules at this moment. > > any help appreciated. Here's a demo using pandas: >>> import pandas as pd Read the original files: >>> left = pd.read_csv("left.csv") >>> left alpha beta gamma 0 fooonetwo 1 bar three four 2 baz fivesix 3 ham seven eight [4 rows x 3 columns] >>> right = pd.read_csv("right.csv") >>> right delta epsilon zeta 0 foo the quick 1 bar brownfox 2 baz jumps over 3 spam the lazy [4 rows x 3 columns] Merge by alpha/delta column: >>> left.merge(right, left_on="alpha", right_on="delta", how="outer") alpha beta gamma delta epsilon zeta 0 fooonetwo foo the quick 1 bar three four bar brownfox 2 baz fivesix baz jumps over 3 ham seven eight NaN NaNNaN 4 NaNNaNNaN spam the lazy [5 rows x 6 columns] >>> both = _ >>> import sys Write to file (or stream as I did for demonstration purposes): >>> both.to_csv(sys.stdout) ,alpha,beta,gamma,delta,epsilon,zeta 0,foo,one,two,foo,the,quick 1,bar,three,four,bar,brown,fox 2,baz,five,six,baz,jumps,over 3,ham,seven,eight,,, 4spam,the,lazy Delete a column you don't need (could have done that earlier): >>> del both["zeta"] >>> both.to_csv(sys.stdout) ,alpha,beta,gamma,delta,epsilon 0,foo,one,two,foo,the 1,bar,three,four,bar,brown 2,baz,five,six,baz,jumps 3,ham,seven,eight,, 4spam,the ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] csv merge with different column title
On 14/04/18 04:32, Pareshkumar Panchal wrote: > I am trying to merge two csv files with following condition. > > filea.csv column( a1,a2,a3) > fileb.csv column( b1,b2,b3) > > i wanted to merge if a1 & b1 reference (as the title is different but > contents are valid for comparison). also the result i need is > a1,a2,a3,b1,b2 (not b3). This doesn't have much to do with the data being CSV but applies to merging any data structures. Lets start by assuming the data is small enough to read both filers into memory as two lists. (If not the same principles apply but you have to do it from disk which will be slower) First read your two lists A and B. Sort both lists based on the first field. Start at the top of list A and search for a corresponding item in B. step through B processing each item until the field no longer matches step onto the next item in A and repeat the above process When you reach the end of either list you are done. There are some other (cleverer) ways to do this but that simple approach is how batch data processing has worked for the last 60 years. It's simple to write and debug. Try it, if it doesn't work for you show us what you've tried and any errors. -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ http://www.amazon.com/author/alan_gauld Follow my photo-blog on Flickr at: http://www.flickr.com/photos/alangauldphotos ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
[Tutor] csv merge with different column title
Hi, I am trying to merge two csv files with following condition. filea.csv column( a1,a2,a3) fileb.csv column( b1,b2,b3) i wanted to merge if a1 & b1 reference (as the title is different but contents are valid for comparison). also the result i need is a1,a2,a3,b1,b2 (not b3). i am using openpyxl,csv modules at this moment. any help appreciated. Thank you, P ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor