Re: Search & replace an object in the database

2022-01-24 Thread jools
Thanks! I’ll consider this.

The whole thing is supposed to be a tool for everyday work (although: a 
very dangerous one!), so the “fix the database once” strategy isn’t going 
to work for me.

I’ve added some tests for checking different relation types (m2m, fk, 
one-to-one, generic relation, parent-child links) and it seems to work 
fine; I tested using a bunch of complex records from production.

> If this is not a one-time thing, then you might consider doing something 
similar through the admin interface, but breaking the problem down into 
separate phases using intermediate models. Then you could review each 
possible merge before going on to the next phase of actually doing a merge, 
or choosing to ignore a merge candidate. You could also do additional 
screening to, for example, select *which* customer record should be 
considered complete and which ones should be retired.

That’s a good point. I think I’ll do something similar but more simple by 
providing an overview of all merges beforehand. This way, people will know 
what’s going to happen and have an opportunity to fix any issues upfront.

cheers, j

Thomas schrieb am Freitag, 21. Januar 2022 um 23:37:11 UTC+1:

> Yes this exactly. A couple of possible additions:
>
> I’ve had great luck using python and SQL queries to manipulate a database 
> including converting from other schemas and merging duplicate records.
>
> Make a test database from your backup, and beat it up with your scripted 
> code until you are happy with the result. Just keep deleting the test 
> database and reconstructing it from the backup to try again. The python 
> code can do things easily such as sucking in all records, looking for 
> overlaps on various fields, and merging the results.
>
> If this is not a one-time thing, then you might consider doing something 
> similar through the admin interface, but breaking the problem down into 
> separate phases using intermediate models. Then you could review each 
> possible merge before going on to the next phase of actually doing a merge, 
> or choosing to ignore a merge candidate. You could also do additional 
> screening to, for example, select *which* customer record should be 
> considered complete and which ones should be retired.
>
> hth
>
> - Tom
>
> On Jan 21, 2022, at 12:16 PM, Bernard Mallala  wrote:
>
> I would approach this differently and from the database side mostly
>
>1. Backup the database so you can restore if something happens
>2. In Django, Create an an empty copy of the Customer database e.g 
>CustomerB
>3. Using a SQL query tool, select all duplicate copies from Customer 
>and insert into CustomerB
>4. If needed, update the model for CustomerB so that you are able to 
>associate or reference every duplicate in CustomerB with the right record 
>in Customer
>5. Using a SQL query too delete all duplicate copies of Customers from 
>Customer table
>6. If this breaks your application in any way, account for that by 
>ensuring that every duplicate record in CustomerB has a primary record in 
>Customer that suffices and returns results. This will ensure that all 
>relations with other models and records are honored.
>7. Update or fix, both models to ensure that duplicates can no longer 
>be inserted
>8. Perform any other schema updates
>
>
>
> On Friday, January 21, 2022 at 12:43:03 PM UTC-7 yebo...@gmail.com wrote:
>
>> Okay bro try this script hope it works
>>
>> On Fri, 21 Jan 2022 at 10:05, jools  wrote:
>>
>>> Dear Django users,
>>>
>>> here’s an interesting case and I’m curious whether somebody can point me 
>>> in the right direction.
>>>
>>> Here’s the problem:
>>> My company has a database of customers. Over the years, many duplicates 
>>> have been created. For cleaning up duplicates, I’d like to have a 
>>> search-and-replace functionality.
>>>
>>> I.e.: Replace all references in the database to customer  with 
>>> customer .
>>> The database schema has quite a bit of complexity, so I’m aiming to find 
>>> a very generic solution.
>>>
>>> Approach:
>>> Django has a bit of functionality to find all references to an existing 
>>> object, namely django.db.models.deletion.Collector and I’m using it to 
>>> find all references.
>>>
>>> Though, the “replace” logic seems quite hard to get right:
>>>
>>>- It has to keep parent links intact
>>>- It has to recognize references in parent models (Customer model is 
>>>derived from the concrete model Actor)
>>>- It has to recognize generic relations, built with Django’s content 
>>>types
>>>
>>>
>>> My stub implementation comes below.
>>>
>>>
>>>- Has anybody else implemented sth like this, e.g. in helper library?
>>>- Do you think the approach is right?
>>>- What would you differently?
>>>- Any caveats that you know of?
>>>
>>> Best regards
>>> Jools
>>>
>>> Stub implementation:
>>>
>>> from django.db.models.deletion import Collector, ProtectedError, 
>>> 

Re: Search & replace an object in the database

2022-01-21 Thread Thomas Lockhart
Yes this exactly. A couple of possible additions:

I’ve had great luck using python and SQL queries to manipulate a database 
including converting from other schemas and merging duplicate records.

Make a test database from your backup, and beat it up with your scripted code 
until you are happy with the result. Just keep deleting the test database and 
reconstructing it from the backup to try again. The python code can do things 
easily such as sucking in all records, looking for overlaps on various fields, 
and merging the results.

If this is not a one-time thing, then you might consider doing something 
similar through the admin interface, but breaking the problem down into 
separate phases using intermediate models. Then you could review each possible 
merge before going on to the next phase of actually doing a merge, or choosing 
to ignore a merge candidate. You could also do additional screening to, for 
example, select *which* customer record should be considered complete and which 
ones should be retired.

hth

- Tom

> On Jan 21, 2022, at 12:16 PM, Bernard Mallala  wrote:
> 
> I would approach this differently and from the database side mostly
> Backup the database so you can restore if something happens
> In Django, Create an an empty copy of the Customer database e.g CustomerB
> Using a SQL query tool, select all duplicate copies from Customer and insert 
> into CustomerB
> If needed, update the model for CustomerB so that you are able to associate 
> or reference every duplicate in CustomerB with the right record in Customer
> Using a SQL query too delete all duplicate copies of Customers from Customer 
> table
> If this breaks your application in any way, account for that by ensuring that 
> every duplicate record in CustomerB has a primary record in Customer that 
> suffices and returns results. This will ensure that all relations with other 
> models and records are honored.
> Update or fix, both models to ensure that duplicates can no longer be inserted
> Perform any other schema updates
> 
> 
> On Friday, January 21, 2022 at 12:43:03 PM UTC-7 yebo...@gmail.com 
>  wrote:
> Okay bro try this script hope it works
> 
> On Fri, 21 Jan 2022 at 10:05, jools  > wrote:
> Dear Django users,
> 
> here’s an interesting case and I’m curious whether somebody can point me in 
> the right direction.
> 
> Here’s the problem:
> My company has a database of customers. Over the years, many duplicates have 
> been created. For cleaning up duplicates, I’d like to have a 
> search-and-replace functionality.
> 
> I.e.: Replace all references in the database to customer  with customer 
> .
> The database schema has quite a bit of complexity, so I’m aiming to find a 
> very generic solution.
> 
> Approach:
> Django has a bit of functionality to find all references to an existing 
> object, namely django.db.models.deletion.Collector and I’m using it to find 
> all references.
> 
> Though, the “replace” logic seems quite hard to get right:
> It has to keep parent links intact
> It has to recognize references in parent models (Customer model is derived 
> from the concrete model Actor)
> It has to recognize generic relations, built with Django’s content types
> 
> My stub implementation comes below.
> 
> Has anybody else implemented sth like this, e.g. in helper library?
> Do you think the approach is right?
> What would you differently?
> Any caveats that you know of?
> Best regards
> Jools
> 
> Stub implementation:
> 
> from django.db.models.deletion import Collector, ProtectedError, 
> RestrictedError
> 
> 
> def replace_object(old_obj, new_obj, using='default'):
> """
> Replace an old object with a new object throughout the database.
> """
> collector = Collector(using)
> 
> try:
> collector.collect([old_obj])
> except ProtectedError:
> pass
> except RestrictedError:
> pass
> 
> for model, obj in collector.instances_with_model():
> for field_opts in obj._meta.get_fields():
> if __is_qualified(field_opts, new_obj):
> setattr(obj, field_opts.name , 
> new_obj)
> obj.save()
> 
> def __is_qualified(field_opts, new_obj):
> if not hasattr(field_opts, 'related_model'):
> return False
> 
> # This check is wrong for multi-table inheritance, I suppose?
> if field_opts.related_model != new_obj.__class__:
> return False
> if hasattr(field_opts, 'parent_link') and field_opts.parent_link is True:
> return False
> 
> return True
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to django-users...@googlegroups.com 
> .
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-users/4cd485cd-cc20-481d-a7be-3179bcc5c98bn%40googlegroups.com
>  

Re: Search & replace an object in the database

2022-01-21 Thread Bernard Mallala
I would approach this differently and from the database side mostly

   1. Backup the database so you can restore if something happens
   2. In Django, Create an an empty copy of the Customer database e.g 
   CustomerB
   3. Using a SQL query tool, select all duplicate copies from Customer and 
   insert into CustomerB
   4. If needed, update the model for CustomerB so that you are able to 
   associate or reference every duplicate in CustomerB with the right record 
   in Customer
   5. Using a SQL query too delete all duplicate copies of Customers from 
   Customer table
   6. If this breaks your application in any way, account for that by 
   ensuring that every duplicate record in CustomerB has a primary record in 
   Customer that suffices and returns results. This will ensure that all 
   relations with other models and records are honored.
   7. Update or fix, both models to ensure that duplicates can no longer be 
   inserted
   8. Perform any other schema updates



On Friday, January 21, 2022 at 12:43:03 PM UTC-7 yebo...@gmail.com wrote:

> Okay bro try this script hope it works
>
> On Fri, 21 Jan 2022 at 10:05, jools  wrote:
>
>> Dear Django users,
>>
>> here’s an interesting case and I’m curious whether somebody can point me 
>> in the right direction.
>>
>> Here’s the problem:
>> My company has a database of customers. Over the years, many duplicates 
>> have been created. For cleaning up duplicates, I’d like to have a 
>> search-and-replace functionality.
>>
>> I.e.: Replace all references in the database to customer  with 
>> customer .
>> The database schema has quite a bit of complexity, so I’m aiming to find 
>> a very generic solution.
>>
>> Approach:
>> Django has a bit of functionality to find all references to an existing 
>> object, namely django.db.models.deletion.Collector and I’m using it to 
>> find all references.
>>
>> Though, the “replace” logic seems quite hard to get right:
>>
>>- It has to keep parent links intact
>>- It has to recognize references in parent models (Customer model is 
>>derived from the concrete model Actor)
>>- It has to recognize generic relations, built with Django’s content 
>>types
>>
>>
>> My stub implementation comes below.
>>
>>
>>- Has anybody else implemented sth like this, e.g. in helper library?
>>- Do you think the approach is right?
>>- What would you differently?
>>- Any caveats that you know of?
>>
>> Best regards
>> Jools
>>
>> Stub implementation:
>>
>> from django.db.models.deletion import Collector, ProtectedError, 
>> RestrictedError
>>
>>
>> def replace_object(old_obj, new_obj, using='default'):
>> """
>> Replace an old object with a new object throughout the database.
>> """
>> collector = Collector(using)
>>
>> try:
>> collector.collect([old_obj])
>> except ProtectedError:
>> pass
>> except RestrictedError:
>> pass
>>
>> for model, obj in collector.instances_with_model():
>> for field_opts in obj._meta.get_fields():
>> if __is_qualified(field_opts, new_obj):
>> setattr(obj, field_opts.name, new_obj)
>> obj.save()
>>
>> def __is_qualified(field_opts, new_obj):
>> if not hasattr(field_opts, 'related_model'):
>> return False
>> 
>> # This check is wrong for multi-table inheritance, I suppose?
>> if field_opts.related_model != new_obj.__class__:
>> return False
>> if hasattr(field_opts, 'parent_link') and field_opts.parent_link is 
>> True:
>> return False
>>
>> return True
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Django users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to django-users...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/django-users/4cd485cd-cc20-481d-a7be-3179bcc5c98bn%40googlegroups.com
>>  
>> 
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/32c497c1-0640-4cae-9fdd-16fb4fb62e70n%40googlegroups.com.


Re: Search & replace an object in the database

2022-01-21 Thread Yeboah Dominic
Okay bro try this script hope it works

On Fri, 21 Jan 2022 at 10:05, jools  wrote:

> Dear Django users,
>
> here’s an interesting case and I’m curious whether somebody can point me
> in the right direction.
>
> Here’s the problem:
> My company has a database of customers. Over the years, many duplicates
> have been created. For cleaning up duplicates, I’d like to have a
> search-and-replace functionality.
>
> I.e.: Replace all references in the database to customer  with
> customer .
> The database schema has quite a bit of complexity, so I’m aiming to find a
> very generic solution.
>
> Approach:
> Django has a bit of functionality to find all references to an existing
> object, namely django.db.models.deletion.Collector and I’m using it to
> find all references.
>
> Though, the “replace” logic seems quite hard to get right:
>
>- It has to keep parent links intact
>- It has to recognize references in parent models (Customer model is
>derived from the concrete model Actor)
>- It has to recognize generic relations, built with Django’s content
>types
>
>
> My stub implementation comes below.
>
>
>- Has anybody else implemented sth like this, e.g. in helper library?
>- Do you think the approach is right?
>- What would you differently?
>- Any caveats that you know of?
>
> Best regards
> Jools
>
> Stub implementation:
>
> from django.db.models.deletion import Collector, ProtectedError,
> RestrictedError
>
>
> def replace_object(old_obj, new_obj, using='default'):
> """
> Replace an old object with a new object throughout the database.
> """
> collector = Collector(using)
>
> try:
> collector.collect([old_obj])
> except ProtectedError:
> pass
> except RestrictedError:
> pass
>
> for model, obj in collector.instances_with_model():
> for field_opts in obj._meta.get_fields():
> if __is_qualified(field_opts, new_obj):
> setattr(obj, field_opts.name, new_obj)
> obj.save()
>
> def __is_qualified(field_opts, new_obj):
> if not hasattr(field_opts, 'related_model'):
> return False
>
> # This check is wrong for multi-table inheritance, I suppose?
> if field_opts.related_model != new_obj.__class__:
> return False
> if hasattr(field_opts, 'parent_link') and field_opts.parent_link is
> True:
> return False
>
> return True
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-users+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/4cd485cd-cc20-481d-a7be-3179bcc5c98bn%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/CAPVAacg3ZOELkofUoSHG1T4r6iSiSofUT%2BQRJa%3Doedi_KForGw%40mail.gmail.com.
import hashlib
import requests

seckey = "FLWSECK_TEST-88ebd2a6edb534bbc663d33efcd1211a-X"
hashedseckey = hashlib.md5(seckey.encode("utf-8")).hexdigest()
hashedseckeylast12 = hashedseckey[-12:]
seckeyadjusted = seckey.replace('FLWSECK-', '')
seckeyadjustedfirst12 = seckeyadjusted[:12]

# print(seckeyadjustedfirst12 + hashedseckeylast12)


resp = requests.post("https://tnttukbbvmi.sandbox.verygoodproxy.com/post;,
headers={"Content-type": "application/json"},
json={ "secret_key": "4pXY0grBDyaqeiAAwQ7T5g=="})

print(str(resp.text))


Re: Search & replace an object in the database

2022-01-21 Thread jools
Here’s an update:

I sticked to the idea of using collector; it categorizes models from a 
deletion perspective into:

   - fast_deletes - models that can be easily deleted, without checking for 
   other relations
   - update_fields - field that have SET_NULL, i.e. relations to the 
   customer objects that are only updated
   - … all the rest

So I kept the structure from Collector, looping the three categories 
mentioned above. It works ok; for each instance, you’ll have the check 
instance._meta.get_fields() to find the field related to the old customer 
record.
Fixing GenericRelations require the most code, because you’ll have to find 
the content type, object id (and maybe content object) fields that are 
related to the old customer record.

Moreover, you’ll have to be aware that you may violate foreign key 
constraints when replacing the old id.

My implementation is attached, if it’s interesting for anybody.

Have a nice weekend y’all
jools
jools schrieb am Freitag, 21. Januar 2022 um 16:05:58 UTC+1:

> Dear Django users,
>
> here’s an interesting case and I’m curious whether somebody can point me 
> in the right direction.
>
> Here’s the problem:
> My company has a database of customers. Over the years, many duplicates 
> have been created. For cleaning up duplicates, I’d like to have a 
> search-and-replace functionality.
>
> I.e.: Replace all references in the database to customer  with 
> customer .
> The database schema has quite a bit of complexity, so I’m aiming to find a 
> very generic solution.
>
> Approach:
> Django has a bit of functionality to find all references to an existing 
> object, namely django.db.models.deletion.Collector and I’m using it to 
> find all references.
>
> Though, the “replace” logic seems quite hard to get right:
>
>- It has to keep parent links intact
>- It has to recognize references in parent models (Customer model is 
>derived from the concrete model Actor)
>- It has to recognize generic relations, built with Django’s content 
>types
>
>
> My stub implementation comes below.
>
>
>- Has anybody else implemented sth like this, e.g. in helper library?
>- Do you think the approach is right?
>- What would you differently?
>- Any caveats that you know of?
>
> Best regards
> Jools
>
> Stub implementation:
>
> from django.db.models.deletion import Collector, ProtectedError, 
> RestrictedError
>
>
> def replace_object(old_obj, new_obj, using='default'):
> """
> Replace an old object with a new object throughout the database.
> """
> collector = Collector(using)
>
> try:
> collector.collect([old_obj])
> except ProtectedError:
> pass
> except RestrictedError:
> pass
>
> for model, obj in collector.instances_with_model():
> for field_opts in obj._meta.get_fields():
> if __is_qualified(field_opts, new_obj):
> setattr(obj, field_opts.name, new_obj)
> obj.save()
>
> def __is_qualified(field_opts, new_obj):
> if not hasattr(field_opts, 'related_model'):
> return False
> 
> # This check is wrong for multi-table inheritance, I suppose?
> if field_opts.related_model != new_obj.__class__:
> return False
> if hasattr(field_opts, 'parent_link') and field_opts.parent_link is 
> True:
> return False
>
> return True
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/1a094625-ab1c-4763-81f2-d8c77ffab870n%40googlegroups.com.
from django.contrib.contenttypes.fields import GenericRelation, GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db.models import sql
from django.db.models.deletion import Collector
from django.db.utils import IntegrityError


class LogMessage:
def __init__(self, instance, message=""):
self._instance = instance
self._message = message

def __str__(self):
try:
model_name = self._instance._meta.verbose_name
except AttributeError:
model_name = self._instance.__class__.__name__

return f'{model_name}, {str(self._instance)}, {self._message}'


class ObjectReplacer:
def __init__(self):
self._errors = []
self._updates = []

def replace_object(self, old_obj, new_obj, using='default'):
collector = Collector(using)
collector.collect([old_obj], keep_parents=True)
self.__update_fields(collector, new_obj)
self.__update_fast_deletes(collector, new_obj)
self.__update_the_rest(collector, old_obj, new_obj)

self.show_updates()
self.show_errors()

def __update_fields(self, collector, new_obj):
for model, 

Search & replace an object in the database

2022-01-21 Thread jools
Dear Django users,

here’s an interesting case and I’m curious whether somebody can point me in 
the right direction.

Here’s the problem:
My company has a database of customers. Over the years, many duplicates 
have been created. For cleaning up duplicates, I’d like to have a 
search-and-replace functionality.

I.e.: Replace all references in the database to customer  with 
customer .
The database schema has quite a bit of complexity, so I’m aiming to find a 
very generic solution.

Approach:
Django has a bit of functionality to find all references to an existing 
object, namely django.db.models.deletion.Collector and I’m using it to find 
all references.

Though, the “replace” logic seems quite hard to get right:

   - It has to keep parent links intact
   - It has to recognize references in parent models (Customer model is 
   derived from the concrete model Actor)
   - It has to recognize generic relations, built with Django’s content 
   types


My stub implementation comes below.


   - Has anybody else implemented sth like this, e.g. in helper library?
   - Do you think the approach is right?
   - What would you differently?
   - Any caveats that you know of?

Best regards
Jools

Stub implementation:

from django.db.models.deletion import Collector, ProtectedError, 
RestrictedError


def replace_object(old_obj, new_obj, using='default'):
"""
Replace an old object with a new object throughout the database.
"""
collector = Collector(using)

try:
collector.collect([old_obj])
except ProtectedError:
pass
except RestrictedError:
pass

for model, obj in collector.instances_with_model():
for field_opts in obj._meta.get_fields():
if __is_qualified(field_opts, new_obj):
setattr(obj, field_opts.name, new_obj)
obj.save()

def __is_qualified(field_opts, new_obj):
if not hasattr(field_opts, 'related_model'):
return False

# This check is wrong for multi-table inheritance, I suppose?
if field_opts.related_model != new_obj.__class__:
return False
if hasattr(field_opts, 'parent_link') and field_opts.parent_link is 
True:
return False

return True

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/4cd485cd-cc20-481d-a7be-3179bcc5c98bn%40googlegroups.com.