I still don't see the need for a dict-like structure holding 10M hashes just 
to decide which of some 10k incoming lines to insert...
solutions:
1)
if the files you're going to insert have fewer rows than the table, reverse 
the logic: fetch only the table rows that could match the file's rows. 
Instead of fetching and hashing 10M things, you hash (or compare) just the 
10k incoming ones (see the first sketch after this list)
2)
choose proper pkeys and code a trigger (ON INSERT). Let the backend do the 
work (guess what, it's engineered to manage data!), not a single Python 
process that fills up memory (second sketch below)
3)
store the hash in a separate column (or a separate table). Instead of 
fetching n rows times the number of columns and then hashing them, you fetch 
the already-computed hash value (third sketch below)
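
A minimal sketch of 1), assuming a pydal-style `db` with a table `records` 
whose duplicates are defined by two columns `colA` and `colB` (all names here 
are made up for illustration):

    def insert_file_rows(db, file_rows):
        # Keys present in the incoming file (~10k), not in the whole table (~10M).
        file_keys = [(r['colA'], r['colB']) for r in file_rows]

        # Fetch only the table rows that could collide with this file.
        candidates = db(db.records.colA.belongs([k[0] for k in file_keys])).select(
            db.records.colA, db.records.colB)
        existing = set((c.colA, c.colB) for c in candidates)

        # Insert only what is not already there.
        for r in file_rows:
            if (r['colA'], r['colB']) not in existing:
                db.records.insert(**r)
        db.commit()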
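
For 2), a trigger needs backend-specific SQL, so here is the simplest variant 
of "let the backend do the work": a unique constraint on the key column, with 
the application just skipping whatever the database refuses. Again, table and 
column names are invented:

    from pydal import DAL, Field  # inside web2py, DAL and Field already exist

    db = DAL('postgres://user:pass@localhost/mydb')  # placeholder URI
    db.define_table('records',
        Field('colA'),
        Field('colB'),
        Field('dedup_key', unique=True),  # the backend enforces uniqueness
    )

    def insert_ignoring_duplicates(rows):
        for r in rows:
            try:
                db.records.insert(**r)  # r must include 'dedup_key'
                db.commit()
            except Exception:
                db.rollback()           # duplicate key: the backend refused it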
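
And a sketch of 3): compute the hash once at insert time and store it next to 
the row, so any later duplicate check fetches one narrow column instead of 
every column of every row (column names are assumptions):

    import hashlib

    def dedup_hash(row):
        # Hash only the columns that define "the same record" (assumed: colA, colB).
        raw = '|'.join([str(row['colA']), str(row['colB'])])
        return hashlib.sha1(raw.encode('utf-8')).hexdigest()

    def insert_new_rows(db, file_rows):
        # One short column fetched, instead of n rows * number of columns.
        known = set(r.dedup_key
                    for r in db(db.records.id > 0).select(db.records.dedup_key))
        for r in file_rows:
            h = dedup_hash(r)
            if h not in known:
                db.records.insert(dedup_key=h, **r)
                known.add(h)
        db.commit()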

On Tuesday, March 17, 2015 at 12:14:20 AM UTC+1, LoveWeb2py wrote:
>
> Thank you for the feedback everyone.
>
> The main reason I fetch them all first is to make sure I'm not inserting 
> duplicate records. We have a lot of files that have thousands of records 
> and sometimes they're duplicates. I hash a few columns from each record and 
> if the value is the same then I don't insert the record. If there is a more 
> efficient way to do this please let me know.
>
>
