> Just in case this wasn't apparent, you certainly *can* use psycopg2's
> bindings when you're in an otherwise SQLAlchemy app.  Worst case you can
> retrieve a raw psycopg2 connection using connection.raw_connection and do
> whatever you need.  If you truly have some use for LOBs, SQLAlchemy isn't
> preventing you from using it, it's just not providing any niceties around
> it.   The fact that these unusual use cases are not ever prohibited by SQLA
> further raises the bar to adding first class support for them.
>
> Yes, this much I knew would work, it's just clunkier than it could be.



>
>
>  PG's BYTEA is NOT arbitrarily large; it has quite real practical limits,
> and I've hit them regularly in storing compressed HDF5 documents in the
> database as part of a bench-testing framework.  The theoretical limit is
> 1GB, but the practical limit is far lower (
> http://www.postgresql.org/message-id/cafj8pracfkoinp2uxeizod5krx29n2ofsoldh0w6ej7rxko...@mail.gmail.com
> ); see http://www.postgresql.org/docs/9.3/static/storage-toast.html.  I'm
> not looking for streaming, just retrieve/set of binary buffers.  It'd be
> nice to translate them transparently to in-memory HDF5 Python objects
> (i.e. h5py).
>
> Python translation is very easy in SQLA; it's just that if you have special
> needs for SQL syntaxes, that's where special behaviors may be needed.  So
> far it sounds like the only blocking factor is that bind_expression() needs
> to distinguish between INSERT and UPDATE.  That's not a terribly tall
> order, though it is inconvenient in that the API would need a
> backwards-compatibility layer.
>

Perhaps you could keep backwards compatibility by introspecting the callbacks
to determine their arity.
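For instance, the dispatcher could inspect each callback's signature and call it accordingly. A rough sketch, assuming Python 3's `inspect.signature`; `call_bind` and the two-argument context-aware form are hypothetical names, not SQLAlchemy's actual API:

```python
import inspect

def call_bind(callback, value, context=None):
    # Dispatch on arity for backwards compatibility: legacy processors
    # take (value,), while newer ones could take (value, context).
    nparams = len(inspect.signature(callback).parameters)
    if nparams >= 2:
        return callback(value, context)
    return callback(value)

print(call_bind(lambda v: v.upper(), "abc"))             # legacy form
print(call_bind(lambda v, ctx: (v, ctx), "abc", "ins"))  # context-aware form
```

One caveat: callbacks using `*args` or default parameters make arity detection fuzzy, which is why this kind of introspection tends to be fragile.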


>
>>
>> As the feature involves SQL functions I don't think you'd be dealing only
>> with bind_processor(), the SQL functions themselves would probably be via
>> SQL-level processing, see
>> http://docs.sqlalchemy.org/en/rel_0_9/core/types.html#applying-sql-level-bind-result-processing.
>> It's true these functions aren't given access to the compiler context where
>> you'd be able to discern INSERT from UPDATE, so I'd probably use two
>> variants of datatype, an InsertLOB and UpdateLOB datatype, then I'd apply
>> these to an INSERT or UPDATE statement at runtime probably using a
>> @compiles decorator -
>> http://docs.sqlalchemy.org/en/rel_0_9/core/compiler.html, that is, take
>> the existing LOB type and just swap it out for InsertLOB/UpdateLOB.
>> Adding compiler context to TypeEngine is something we could look into as a
>> feature as well so that bind_expression() has this available somehow and
>> switching around types wouldn't be needed.
>>
>
>  Separate datatypes are clearly not a good approach to this.
>
>
> Let me clarify that these separate datatypes would be totally invisible to
> the user.   The user would work with a single LOB type.  Translation to
> Insert/Update versions would not be explicit and would occur at the point
> at which the insert/update construct is compiled.   At the moment this may
> be the best approach short of modifying the library (but then again I can't
> say much about the approach because I have little idea what the SQL we're
> talking about looks like).
>

Ah.  I thought it would be done in the declarative table descriptions.
Being relatively new to SQLA, I'm not sure what that would look like.
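For what it's worth, here's a minimal sketch of the single user-facing type, using `bind_expression()` per the SQL-level processing docs linked above. The `LOB` name, the `OID` column spec, and wrapping the bind in `lo_creat()` are my assumptions, not an agreed design:

```python
import sqlalchemy as sa
from sqlalchemy.dialects import postgresql
from sqlalchemy.types import UserDefinedType

class LOB(UserDefinedType):
    """Hypothetical single user-facing large-object type."""
    cache_ok = True

    def get_col_spec(self, **kw):
        return "OID"

    def bind_expression(self, bindvalue):
        # Wrap the bound parameter in a server-side function at statement
        # compile time.  An InsertLOB/UpdateLOB swap via @compiles would
        # diverge here once INSERT vs UPDATE can be distinguished.
        return sa.func.lo_creat(bindvalue)

meta = sa.MetaData()
t = sa.Table("pr_result", meta, sa.Column("trace_oid", LOB()))
stmt = t.insert().values(trace_oid=-1)
print(stmt.compile(dialect=postgresql.dialect()))
```

Compiling the statement shows the bind wrapped in `lo_creat(...)` in the emitted INSERT, which is the user-invisible behavior described above.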


>
>
>   Everything starts with a transaction block.
>
> the psycopg2 DBAPI is implicitly in a transaction block at all times
> unless "autocommit" mode is set up, so there shouldn't be any problem here.
>

AFAIK it won't work with autocommit, but I don't have documentation to
prove that; I just remember getting errors, I think in JDBC, pertaining to it.


>
>    Hand-crafted inserts use server-side lo_creat(-1) (in SQL), which
> allocates a new large object, plus a RETURNING clause to get the OID back
> to the client side in one go.  Then I start using the lobject API on the
> returned OID.
>
> I'd like to see explicit SQL, preferably in the form of a psycopg2 script
> that illustrates all the operations you wish to support and specifically
> how they must interact with the database.
>
>
I've attached example usages.  As I indicated in my prior email, right now
I only do inserts/selects.


>
>    In patches I've made to cppdb, in the bind of std::istream I call
> client-side lo_create, lo_open, a sequence of lo_writes, and finally
> lo_close.  In retrospect this doesn't handle updates particularly well,
> but maybe it could be used in SQLAlchemy as-is too: an update just
> allocates a new large object and unreferences the old one.
>
> I wonder why the mechanics of these functions can't be more cleanly
> wrapped into server-side stored procedures?    If they can be distilled
> into simple insert_lob() update_lob() functions that would potentially be a
> better separation of duties.
>
>
I think it's because only "small" LOBs could rely on that; huge LOBs that
exceed working memory fit better with a file-like API.  It would be a nice
convenience to have, perhaps.  These same "small" LOBs (i.e. LOBs that fit
in working memory, keeping in mind modern commodity servers easily carry
64GB) are the kind I would expect SQLA to deal with, although if it bound
file-like objects and buffers, both cases would be covered.  I don't think
I'll be writing code that uses those stored procedures, but they could be
nice conveniences.  If SQLA managed to support these LOBs well, however, I
don't think users would call the procedures directly.
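If someone did want those convenience wrappers, they might look roughly like this. This is a hypothetical sketch: `insert_lob`/`update_lob` are just the names from the discussion, `lo_from_bytea()` only exists in PostgreSQL 9.4+, and these only make sense for LOBs that fit in memory:

```sql
-- Allocate a new large object and fill it from a bytea in one call.
CREATE FUNCTION insert_lob(data bytea) RETURNS oid AS $$
    SELECT lo_from_bytea(0, data);
$$ LANGUAGE sql;

-- Unlink the old large object and allocate a fresh one, returning its OID.
CREATE FUNCTION update_lob(old_oid oid, data bytea) RETURNS oid AS $$
    SELECT lo_unlink(old_oid);
    SELECT lo_from_bytea(0, data);
$$ LANGUAGE sql;
```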


>
>  --
> You received this message because you are subscribed to a topic in the
> Google Groups "sqlalchemy" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/sqlalchemy/4TCg_TVLgHc/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
>

# Attached example usages (psycopg2, Python 2).  Assumes: import os;
# from os.path import join as pjoin; from psycopg2 import Binary.

def chunklengths(N, n):
    """Yield lengths of successive chunks of size at most `n` covering `N` bytes."""
    for start in range(0, N, n):
        yield min(n, N - start)

def chunks(s, n):
    """Yield successive chunks of `s`, each at most `n` characters long."""
    offset = 0
    for chunklen in chunklengths(len(s), n):
        yield s[offset:offset + chunklen]
        offset += chunklen
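Incidentally, the two helpers collapse into a single stride loop; a self-contained equivalent for comparison (named `chunks2` to avoid clashing with the version above):

```python
def chunks2(s, n):
    """Yield successive chunks of `s`, each at most `n` items long."""
    for offset in range(0, len(s), n):
        # Python slicing clamps past the end, so the last chunk is short.
        yield s[offset:offset + n]

print([len(c) for c in chunks2("x" * 25, 10)])  # → [10, 10, 5]
```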

with conn.cursor() as cur:
    cur.execute(
        'select dataset, flight, run, input.icd from input inner join run_table on run_table.id = input.flight_run_id where input.id = %s;',
        (input_id,)
    )
    dataset, flight, run, icd = cur.fetchone()
    flight_path = pjoin(dbroot, dataset, 'flights', flight)
    run_path = pjoin(flight_path, 'runs', run)
    trace_path = pjoin(run_path, icd, 'trace.h5')
    settings = file(pjoin(flight_path, 'conf', 'settings.yaml'), 'r').read()
    icd_output = file(pjoin(run_path, icd, 'icd_output.bin'), 'rb').read()

    print 'inserting : sizes: conf: %d icd_output: %d trace: %d'%(len(settings), len(icd_output), os.path.getsize(trace_path))
    cur.execute(
        "INSERT INTO pr_result(result_set, configuration, configuration_md5, input_id, icd_output, trace_oid) values ("
        "%(result_set)s, %(configuration)s, md5(%(configuration)s), %(input_id)s, %(icd_output)s, lo_creat(-1) ) "
        "RETURNING id, trace_oid;",
        {'result_set': opts.result_set, 'configuration': settings, 'input_id': input_id, 'icd_output': Binary(icd_output)}
    )
    rid, trace_oid = cur.fetchone()
    print 'uploading trace'
    trace_remote = conn.lobject(trace_oid, 'rw')
    trace_str = file(trace_path, 'rb').read()
    for chunk in chunks(trace_str, 1024*1024*10):
        trace_remote.write(chunk)
    trace_remote.close()
    conn.commit()
    print 'Inserted result %s as id = %d'%(run_path, rid)


# pull-to-a-file example; in many cases one will pull to memory instead
with conn.cursor() as cur:
    cur.execute(
        "select trace_oid from pr_result where id = %(id)s;",
        {'id': opts.id}
    )
    trace_oid, = cur.fetchone()
    trace_remote = conn.lobject(trace_oid, 'r')
    trace_remote.seek(0, 2)  # SEEK_END
    nbytes = trace_remote.tell()
    trace_remote.seek(0, 0)  # SEEK_SET
    print 'pulling trace nbytes = %d'%(nbytes,)

    with file(trace_path, 'wb') as trace_file:
        for chunksize in chunklengths(nbytes, 1024*1024*10):
            chunk = trace_remote.read(chunksize)
            if not chunk:  # defensive: stop if the read comes up empty
                break
            trace_file.write(chunk)
    trace_remote.close()
    conn.commit()
