New to python, web development and hence flask. I have a small application 
that accepts excel inputs, extracts the data and writes it out oracle. My 
stack includes flask-sqlalchemy, cx_oracle. When I upload a small excel 
file with about 10 K rows, it works just fine. When I throw really large 
files at it up to a million rows, it craps out and I get :

sqlalchemy.exc.operationalerror: (cx_oracle.operationalerror) ora-03114: not 
connected to oracle

My view function:

@theapp.route('/upload', methods=('GET', 'POST'))
def upload():
    start_time = time.time()
    form = UploadForm()
    if form.validate_on_submit():
        file_name = secure_filename(form.upload.data.filename)
        form.upload.data.save(os.path.join(theapp.config['UPLOAD_FOLDER'], 
file_name))
        input_file = os.path.join(theapp.config['UPLOAD_FOLDER'], file_name)
        wb = load_workbook(input_file, data_only = True, read_only=True, 
use_iterators=True)
        sheet = wb.worksheets[0]
        allrows = sheet.iter_rows() #this is a generator
        headerobject = next(allrows) # row object for headerobject

        # Column names from excel file
        headerlist = []
        for c in headerobject:
            if c.value is not None: # If user uses delete button to delete 
contents of header cell, ignore it
                if isinstance(c.value, str) is False: # if not a string
                    return bad_request("Column names have to be strings 
without spaces. Value in cell {} is not a string".format(c.coordinate))
                if not c.value.strip(): # Empty string in header
                    return bad_request("Cell {} has line space(s). Delete 
the contents of the cell".format(c.coordinate))
                headerlist.append(c.value.strip())
        #print("Excel {}".format(headerlist))

        duplicate_elements = [k for k,v in Counter(headerlist).items() if v 
>1]
        if duplicate_elements: # A falsy. If there are duplicate elements
            return bad_request("Column(s) {} are 
duplicates".format(duplicate_elements))

        tableinfo = form.targettableinfo.data # tableinfo object in the 
databse order
        tablecolumns = []
        for col in tableinfo.fields:
            tablecolumns.append(col.name)
        #print("Table columns {}".format(tablecolumns))

        # Check if the columns match
        if not set(headerlist) == set(tablecolumns):
            foreign_columns = [x for x in headerlist if x not in 
tablecolumns]
            if foreign_columns:
                return bad_request("Column(s) {} in your excel file are not 
defined".format(foreign_columns))
            unfulfilled_columns = [x for x in tablecolumns if x not in 
headerlist]
            if unfulfilled_columns:
                return bad_request("Column(s) {} are expected by the table 
but not found in your excel file.".format(unfulfilled_columns))

        # Sort the tablecolumns object in the order of the headerlist
        sortedtablecolumnsobject = sorted(tableinfo.fields, key=lambda x: 
headerlist.index(x.name))

        # create a list of dicts to hold data to be written
        output_data = []
        for row in allrows:
            row_dict = {}
            for fobject,cobject in zip(sortedtablecolumnsobject,row):
            ######################################
            ##Validation & assignment
            ######################################
                if fobject.type.name == "Text":
                    if cobject.value == None:
                        row_dict[fobject.name] = cobject.value
                    else:
                        row_dict[fobject.name] = 
removeNonAscii(str(cobject.value))
                        #row_dict[fobject.name] = 
cobject.value.encode('utf-8')
                        #print(str(cobject.value))
                elif fobject.type.name == "Number":
                    if dao.isnumber(cobject.value):
                        row_dict[fobject.name] = float(cobject.value)
                        #print(float(cobject.value))
                    elif cobject.value is None:
                        row_dict[fobject.name] = cobject.value
                        #print(cobject.value)
                    else:
                        return bad_request("Cell {} has value {}. A numeric 
value is expected".format(cobject.coordinate, cobject.value))
                elif fobject.type.name == "Date":
                    if type(cobject.value) == datetime.datetime:
                        row_dict[fobject.name] = cobject.value
                    elif cobject.value is None:
                        row_dict[fobject.name] = cobject.value
                    else:
                        return bad_request("Cell {} does not contain a 
valid date".format(cobject.coordinate))
            ##############################################################

            output_data.append(row_dict)

        #print(output_data)
        m = db.MetaData()
        t = db.Table(tableinfo.table_name,m,autoload = True, autoload_with 
= db.engine)
        db.engine.execute(t.insert(),output_data)

        print("{} seconds" .format(time.time() - start_time))

        return redirect(url_for('summary'))

    return render_template('upload.html', title = 'Upload', form = form)

Basically, I create a giant list of dictionaries and execute an insert on 
the destination table with that output list. Like I said, it works for 
smaller files. For larger files, I think the connection to the database 
dies before I attempt to do the inserts. My stack trace shows that the 
error originates from the line that does the insertion. Is there a way to 
keep the database connection alive even if the request is taking too long? 
I have experimented with SQLALCHEMY_POOL_RECYCLE and 
SQLALCHEMY_POOL_TIMEOUT configurations in flask-sqlalchemy but it had no 
significance. What can I do to keep my database session/connection active 
during a long request? Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Reply via email to