On Sun, Sep 23, 2018 at 9:05 PM Ioannes <[email protected]> wrote:
>
> Hi all,
>
> I had originally asked a question here: 
> https://stackoverflow.com/questions/52391072/creating-and-appending-to-a-list-in-sqlalchemy-database-table/52391339#52391339.
>  However as I am still really stuck, I thought asking to a group more 
> specifically focussed on SQLAlchemy would be better.
>
> I have a table in the database (see below for test file going into table).The 
> table has two fields, name (the latin name e.g. homo sapiens) and other names 
> (the common names e.g. human, man). I want to update a field (other names) in 
> the table, so instead of having:
>
> Rana rugosa human
> Rana rugosa man
> Rana rugosa frog
> Rana rugosa cow
>
> In the database table, It will return this:
>
> Rana rugosa human,man,frog,cow
>
> The test_data file looks like this:
>
> origin_organism        common_name         tested_organism
> Rana rugosa            human                -
> Rana rugosa            man                  -
> Rana rugosa            frog                 homo sapiens
> Rana rugosa            cow                  Rana rugosa
> Rana rugosa            frog                 Rana rugosa
> Rana rugosa            frog                 -
> Rana rugosa            frog                 -
> Rana rugosa            frog                homo sapiens
> -                      -                               -
> -                      -                               homo sapiens
> -                      -                               -
> -                      -                               -
> -                      -                               -
> -                      -                               -
> streptococcus pneumoniae    -              -
>

The traditional way to model this in a database would be to have a
separate table for the common_name. There would be multiple rows in
the common_name table pointing back at a single row in the organism
table; this is known as a "one-to-many relationship". However, I'll
assume for the moment that you have your own reasons for wanting to
store them all in the same table.

> The code:
>
> import sys
> from sqlalchemy.orm  import *
> from sqlalchemy  import *

("from <module> import *" is generally frowned upon in Python, because
it makes the rest of the code harder to reason about)

> from dbn.sqlalchemy_module  import lib
> import pd
>
> engine = lib.get_engine(user="user", psw="pwd", db="db", 
> db_host="111.111.111.11")
> Base = lib.get_automapped_base(engine)
> session = Session(engine)
> tbs = lib.get_mapped_classes(Base)
> session.rollback()
> df = pd.read_excel('test_data.xlsx', sheet_name = 'test2')
>
>
>
>
> for index, row in df.iterrows():
>     origin_latin_name = row['origin_organism'].strip().lower()
>     other_names_name = row['common_name'].strip().lower()
>     tested_species = row['tested_organism'].strip().lower()
>
>
> if origin_latin_name not in [None, "None", "", "-"]:
>     instance = [x[0] for x in 
> Session.query(session,tbs['species'].name).filter_by(name=origin_latin_name).all()]

"Session.query(session, ...)" seems like an overly-verbose way to
spell "session.query(...)". Is there any particular reason why you've
written it like that?

At this point, "instance" will either be an empty list, or a list
containing a single string, the value of the "name" column that
matched origin_latin_name.

It's important to note that, depending on how the database was set up,
the string in the list is not necessarily exactly the same as
origin_latin_name. For example, MySQL string fields compare
case-insensitively by default. So it's possible that a row would match
in the database, but then your "if" clause below would not match.

>     if origin_latin_name not in instance:
>         origin_species = lib.get_or_create(
>             session,
>             tbs["species"],
>             name = origin_latin_name,
>             other_names = other_names_name
>         )

I assume by the name that "get_or_create" is actually going to query
the database again to see if a matching row already exists, which
seems like wasted effort. You already know that it's not in the
database.

>
>     elif origin_latin_name in instance:

Above would be better written as "else:"

>         other_names_query = 
> Session.query(session,tbs['species'].other_names).filter_by(name=origin_latin_name)
>         other_names_query_list = [x for x in other_names_query]
>         original_list2 = list(set([y for y in x[0].split(',') for x in 
> other_names_query_list]))
>         if other_names_name not in original_list2:
>             original_list2.append(other_names_name)
>             new_list = ','.join(original_list2)
>             new_names = {'other_names':','.join(original_list2)}

Note here that if other_names_name is *already* in original_list2, you
won't enter the body of this "if" statement, and "new_list" will not
be defined (or will still have the value from a previous iteration,
since you go round the outer loop multiple times)

>
>         origin_species = lib.get_or_create(
>             session,
>             tbs["species"],
>             name = origin_latin_name,
>             other_names = new_list
>         )
>
> The part from the elif statement doesn't work. I've ran into two problems:
>
> (1) The most recent error I got: NameError: name 'new_list' is not defined

As mentioned above, this will happen when "other_names_name" is
already in "original_list2".

>
> (2) another error I got is that I have another table further on
>
> map1 = lib.get_or_create(
>     session,
>     tbs["map1"],
>     age_id_id = age,
>>
>>     name_id_id = origin_species.id
>>
>
>     )
>
> ...and it said that origin_species object cannot be found, but I think this 
> is linked to the elif statement, that somehow the origin_species object is 
> not being updated properly.

When you get this kind of error, it's often helpful to split your code
up into smaller functions, to make it more obvious where each name is
defined and used. In this case, you've somehow ended up on this line
without ever executing one of the "origin_species = ..." lines.

>
> If anyone could help I would appreciate it.
>

You have made life harder for yourself by querying for individual
columns, rather than using the SQLAlchemy ORM in the "natural" way.
Rather than passing a column to session.query(), you should pass the
table itself. The object that you get back will contain the primary
key of the corresponding row in the database, which means that
SQLAlchemy can update that row for you later. (Note that this does
require your table to have a primary key, which shouldn't be a problem
in this case). Something like this:

# A shortcut to save us typing tbs['species'] all the time
Species = tbs['species']

origin_species = session.query(Species).filter_by(name=origin_latin_name).one()

if origin_species is None:
    # No corresponding row in the database, so create one
    origin_species = Species(
        name=origin_latin_name,
        other_names=other_names_name,
    )
    session.add(origin_species)
else:
    # Row already exists in the database, just update the other_names column
    other_names = set(origin_species.other_names.split(','))
    other_names.add(other_names_name)
    other_names_csv = ','.join(sorted(other_names))
    origin_species.other_names = other_names_csv

# Flushing the session causes our changes to be written to the database
session.flush()


Hope that helps,

Simon

-- 
SQLAlchemy - 
The Python SQL Toolkit and Object Relational Mapper

http://www.sqlalchemy.org/

To post example code, please provide an MCVE: Minimal, Complete, and Verifiable 
Example.  See  http://stackoverflow.com/help/mcve for a full description.
--- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to