On 12/16/21 3:00 PM, hanan lamaazi wrote:
Dear All,

I really need your assistance,

I have a dataset with 1005000 rows and 25 columns.

The main columns that I repeatedly use are Time, ID, and Reputation.

First I sliced the data based on the time, and I appended the sliced data to
a list called "df_list". So I get 201 slices with 25 columns each.

The main code starts here:

for elem in df_list:
    {do something.....}
    {Here I'm trying to calculate the outliers}
    Out.append(outliers)

Now my problem is that I need to locate those outliers in df_list and
then update another column, which is "Reputation".

Note that there are duplicated IDs, but in different time slots.

For example, if ID = 1 is an outlier, I need to select all rows with ID = 1
in the list and update their Reputation column.

I tried those solutions:
1)

grp = data11.groupby(['ID'])
for i in GlobalNotOutliers.ID:
    data11.loc[grp.get_group(i).index, 'Reput'] += 1

for j in GlobalOutliers.ID:
    data11.loc[grp.get_group(j).index, 'Reput'] -= 1


It works for a dataframe but not for a list

2)

for elem in df_list:
    elem.loc[elem['ID'].isin(Outlier['ID'])]


It doesn't select the right IDs, it gives the whole values in elem

3) Here I set the index using IDs:

for i in Outlier.index:
     for elem in df_list:
         print(elem.Reput)
         if i in elem.index:
#             elem.loc[elem[i] , 'Reput'] += 1
             m = elem.iloc[i, :]
             print(m)


It gives this error:

IndexError: single positional indexer is out-of-bounds


I'm greatly thankful to anyone who can help me,

I'd suggest you group your records by date and put each group into a dict keyed by date. As you collect each record into its group, store alongside it the index of that record in the original list. Then go through all your groups, record by record, looking for outliers. The index stored with each record identifies the record in the original list that you want to update. Something like this:

    # Group (record, original index) pairs by date.
    dictionary = {}
    for i, record in enumerate(original_list):
        date = record[DATE_INDEX]
        if date in dictionary:
            dictionary[date].append((record, i))
        else:
            dictionary[date] = [(record, i)]

    # Collect the original indexes of all records flagged as outliers.
    reputation_indexes = set()
    for date, records in dictionary.items():
        for record, i in records:
            if has_outlier(record):
                reputation_indexes.add(i)

    # Update the flagged records in the original list.
    for i in reputation_indexes:
        update_reputation(original_list[i])
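
The sketch above updates only the records that were themselves flagged. If, as the question describes, every record sharing an outlier's ID should be updated in all time slots, one possible variation is to collect the flagged IDs first and then make a second pass over the original list. The example below is only an illustration with made-up data: the "Value" field, the fixed threshold in has_outlier, and the +1/-1 rule (borrowed from the first attempt in the question) are assumptions, not the real outlier test.

```python
from collections import defaultdict

# Toy records. "Time", "ID" and "Reput" mirror the question's columns;
# "Value" is a hypothetical field used only to have something to test.
records = [
    {"Time": 1, "ID": 1, "Value": 10.0, "Reput": 5},
    {"Time": 1, "ID": 2, "Value": 11.0, "Reput": 5},
    {"Time": 1, "ID": 3, "Value": 95.0, "Reput": 5},
    {"Time": 2, "ID": 1, "Value": 12.0, "Reput": 5},
    {"Time": 2, "ID": 3, "Value": 11.5, "Reput": 5},
]


def has_outlier(record, threshold=50.0):
    """Hypothetical stand-in: flag a record whose Value exceeds a threshold."""
    return record["Value"] > threshold


# Group the indexes of the records by time slot.
groups = defaultdict(list)
for i, record in enumerate(records):
    groups[record["Time"]].append(i)

# Collect the IDs that are flagged in at least one time slot.
outlier_ids = set()
for time_slot, indexes in groups.items():
    for i in indexes:
        if has_outlier(records[i]):
            outlier_ids.add(records[i]["ID"])

# Second pass: update every record by ID, whatever its time slot.
for record in records:
    if record["ID"] in outlier_ids:
        record["Reput"] -= 1
    else:
        record["Reput"] += 1
```

Here ID 3 is flagged only in time slot 1, yet its record in time slot 2 is penalised as well, which is the behaviour asked for in the question.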

Frederic



--
https://mail.python.org/mailman/listinfo/python-list
