Hi,

I'm trying to import some relatively huge (~300GiB) set of data from
Microsoft SQL Server database to PostgreSQL. For this purpose, I use
bcp.exe (bulk copy utility) comes with MSSQL. Despite there are command
line arguments which you can specify batch size, escape characters, null
fields, etc.; %90 of these arguments are unsuprisingly ignored by
bcp.exe. The problem is, bcp.exe

- Uses nothing to specify null fields, which is equivalent to

    COPY ... WITH NULL AS ''

  in PostgreSQL.

- Uses \x00 character to specify empty strings.

(Sorry not any single part of this mess is configurable.) To
successfully accept such an input during "COPY ... FROM ..." in
PostgreSQL, I decided to write middleware triggers to tables including
columns of type character varying. Below is the procedure I come up with

  CREATE OR REPLACE FUNCTION create_bcp_fix_triggers (_table_schema text)
  RETURNS void AS $$
  DECLARE
      _table       record;
      _column      record;
      _create_stmt text;
  BEGIN
      SET standard_conforming_strings TO off;
      SET escape_string_warning TO off;
      -- Find every table having a column of type 'character varying'.
      FOR _table
       IN SELECT table_name
            FROM information_schema.columns
           WHERE table_schema = _table_schema AND
                 data_type = 'character varying'
        GROUP BY table_name
        ORDER BY table_name
     LOOP _create_stmt = 'BEGIN;\n' ||
                         'CREATE OR REPLACE\n' ||
                         'FUNCTION ' || _table.table_name || '_bcp_fix ()\n' ||
                         'RETURNS "trigger" AS $bcp-fix$\n' ||
                         'BEGIN\n';
          -- Create appropriate bcp.exe fix clauses for every found column.
          FOR _column
           IN SELECT column_name
                FROM information_schema.columns
               WHERE table_schema = _table_schema AND
                     table_name   = _table.table_name
         LOOP _create_stmt = _create_stmt ||
                             '    IF NEW.' || _column.column_name || ' = 
E''\0'' THEN\n' ||
                             '        NEW.' || _column.column_name || ' = 
''''\n' ||
                             '    END IF;\n';
          END LOOP;
          _create_stmt = _create_stmt ||
                         '    RETURN NEW;\n' ||
                         'END;' ||
                         '$bcp-fix$ LANGUAGE plpgsql;\n' ||
                         'CREATE TRIGGER ' || _table.table_name || '_bcp_fix\n' 
||
                         '    BEFORE INSERT ON ' || _table.table_name || '\n' ||
                         '    FOR EACH ROW EXECUTE ' ||
                         '    PROCEDURE ' || _table.table_name || 
'_bcp_fix();\n' ||
                         'COMMIT;';
          EXECUTE _create_stmt;
      END LOOP;
  END;
  $$ LANGUAGE plpgsql;

But executing this procedure fails for some reason I couldn't
understand.

  > SELECT public.create_bcp_fix_triggers('commsrv');
  ERROR:  syntax error at or near "AS $"
  LINE 4: RETURNS "trigger" AS $bcp-fix$
                            ^
  QUERY:  BEGIN;
  CREATE OR REPLACE
  FUNCTION xyz_bcp_fix ()
  RETURNS "trigger" AS $bcp-fix$
  BEGIN
      IF NEW.foo = E'        NEW.foo = ''
      END IF;
      IF NEW.bar = E'        NEW.bar = ''
      END IF;
      RETURN NEW;
  END;$bcp-fix$ LANGUAGE plpgsql;
  CREATE TRIGGER xyz_bcp_fix
      BEFORE INSERT ON xyz
      FOR EACH ROW EXECUTE     PROCEDURE xyz_bcp_fix();
  COMMIT;
  CONTEXT:  PL/pgSQL function "create_bcp_fix_triggers" line 41 at
  EXECUTE statement

Can anybody help me to spot the problem?


Regards.

P.S. For data transfer during migratition, I first considered using "EMS
     Data Export for SQL Server" tool, but it lacks of performance.
     (~1000rows/sec) Namely, I'm open to any bcp.exe alternatives.

Reply via email to