On Thursday, June 22, 2017 at 12:16:28 PM UTC+1, kishan.samp...@gmail.com wrote:
> I want to write a common file in which It can add the frequency by adding 
> multiple csv file and if the same words are repeated in python then it should 
> add the frequency in the common file can any one help me please
> 
> 
> import re
> import operator
> import string
> 
> class words:
>     def __init__(self,fh):
>         self.fh = fh
>     def read(self):
>         for line in fh:
>             yield line.split()
> 
> if __name__ == "__main__":
>     frequency = {}
>     document_text = open('data_analysis.csv', 'r')
>     common1_file = open("common_file1.csv", "r")
>     
>     text_string = document_text.read().lower()
>     match_pattern = re.findall(r'\b[a-z]{3,15}\b', text_string)
>     
>     text_string_one = common1_file.read().lower()
>     match_pattern_one = re.findall(r'\b[a-z]{3,15}\b', text_string_one)
>     #print("match_pattern"+(str(match_pattern)))
>     for word in match_pattern:
>         for word1 in match_pattern_one:
>             count = frequency.get(word,0)
>             count1 = frequency.get(word1,0)
>             if word1 == word:
>                 frequency[word] = count + count1
>             else:
>                 frequency[word] = count 
>     
> 
>     frequency_list = frequency.keys()
>     text_file = open("common_file1.csv", "w")
>     for words in frequency_list:
>         data = (words, frequency[words])
>         print (data)
>         #text_file = open("common_file1.csv", "w")
>         #for i in data:
>         #store_fre = (str(data)+"\n")
>         text_file.write(str(data)+"\n")
>     
> 
>     text_file.close()
> 
> 
> this is my code written by me til now but not getting satisfied results

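The dictionary 'frequency' only ever receives values of 0: it starts
empty, so every frequency.get(...) call returns 0 and the loop stores
0 + 0 (or just 0) each time.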

If the aim is to get a count of occurrences for each word 
where the word exists in both input files, you could replace this:

for word in match_pattern: 
    for word1 in match_pattern_one: 
        count = frequency.get(word,0) 
        count1 = frequency.get(word1,0) 
        if word1 == word: 
            frequency[word] = count + count1 
        else: 
            frequency[word] = count 

with this:

all_words = match_pattern + match_pattern_one
word_set = set(match_pattern) & set(match_pattern_one)
while word_set:
    word = word_set.pop()
    count = all_words.count(word)
    frequency[word] = count
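
As a variation on the snippet above, collections.Counter can do the
counting in one pass per list instead of calling all_words.count() for
every word. This is only a sketch: the two sample lists are made up and
stand in for the re.findall() results from the original script.

```python
from collections import Counter

# Hypothetical sample data standing in for the two re.findall() results.
match_pattern = ["apple", "banana", "apple", "cherry"]
match_pattern_one = ["apple", "cherry", "cherry", "grape"]

# Count each list once.
counts = Counter(match_pattern)
counts_one = Counter(match_pattern_one)

# Keep only words present in both inputs and add their counts together.
frequency = {
    word: counts[word] + counts_one[word]
    for word in counts.keys() & counts_one.keys()
}
```

The result should match what the while loop produces for the same
input, since both count every occurrence of each shared word across
the two lists.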

Other observations:
- Reading from and writing to the csv files is not utilising the csv format
- The regex may be too restrictive: it only matches runs of 3 to 15
letters a-z, so shorter or longer words and words containing digits
are not extracted
- The output is written to one of the input files, overwriting the original 
content of the input file
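
The observations above could be addressed roughly as follows. This is a
sketch under assumptions, not a drop-in fix: the function names
(count_words, write_counts, merge_files) are hypothetical, the looser
pattern [a-z]+ is my own choice, and it sums the counts of all words
across the input files rather than only the shared ones.

```python
import csv
import re
from collections import Counter

# Assumption: any run of letters counts as a word (no length limit).
WORD_RE = re.compile(r"[a-z]+")


def count_words(path):
    """Count the words found in every cell of one CSV file."""
    counts = Counter()
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            for cell in row:
                counts.update(WORD_RE.findall(cell.lower()))
    return counts


def write_counts(path, counts):
    """Write one 'word,count' row per word, sorted by word."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for word, count in sorted(counts.items()):
            writer.writerow([word, count])


def merge_files(in_paths, out_path):
    """Add up word counts over several input files and save the total."""
    total = Counter()
    for path in in_paths:
        total.update(count_words(path))
    write_counts(out_path, total)
    return total
```

It might be called as merge_files(["data_analysis.csv",
"common_file1.csv"], "combined_counts.csv"), where combined_counts.csv
is a made-up name for a separate output file, so that neither input is
overwritten.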
-- 
https://mail.python.org/mailman/listinfo/python-list
