Hi Adrian,

From: "Adrian Klaver" <adrian.kla...@aklaver.com>
Subject: Re: [GENERAL] "Ungroup" data for import into PostgreSQL


On 01/15/2015 04:56 PM, Jim Nasby wrote:
On 1/15/15 9:43 AM, George Weaver wrote:
Hi List,

I need to import data from a large Excel spreadsheet into a PostgreSQL
table.  I have a program that uses ODBC to connect to Excel and
extract data using SQL queries.  The program then inserts the data
into a PostgreSQL table.

The challenge with this particular spreadsheet is that it is arrayed
thus:

Model No 1     Product Code 15
    Serial No 1       No on Hand
    Serial No 2       No on Hand
    Serial No 3       No on Hand
Model No 4    Product Code 9
    Serial No 12     No on Hand
Model No 5    Product Code 27
    Serial No 6       No on Hand
    Serial No 14     No on Hand

etc.

I need the data in PostgreSQL arrayed thus

Model No 1    Product Code 15    Serial No 1       No on Hand
Model No 1    Product Code 15    Serial No 2       No on Hand
Model No 1    Product Code 15    Serial No 3       No on Hand
Model No 4    Product Code 9      Serial No 12     No on Hand
Model No 5    Product Code 27    Serial No 6       No on Hand
Model No 5    Product Code 27    Serial No 14     No on Hand

I can import the data procedurely using plpgsql to match the
individual rows to the master for each row (import the raw data into a
buffer table in PostgreSQL and then looping through the rows in the
buffer table and checking to see when the Model No changes).

Note that if you're doing that you better be putting the rownumber from
excel into the table... result sets are NOT guaranteed to be in insert
order!

I'm wondering if there is a more elegant way to do this using straight
sql from Excel?

Well, that's really an excel question, not a Postgres question...

If you load the whole spreadsheet into a single table and have a way to
differentiate between the different rows then you might be able to do
something with CTE's to relate a serial number to the product code. That
might be faster than plpgsql.

You might also be able to do something creative with formulas in excel
to copy the product code data to the serial # rows. You could then
import the whole thing and re-normalize it.

There's probably some stuff you could do with VBA too. If you care about
performance you don't want to execute SQL statements for each
spreadsheet row.

Or if you really want to slice and dice and you use Python, then take a look at Pandas:

http://pandas.pydata.org/

In particular the IO functions:

http://pandas.pydata.org/pandas-docs/stable/io.html

I don't use Python but I found the information in these links very interesting.

Thanks!



--
Adrian Klaver
adrian.kla...@aklaver.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to