On 01/07/2015 08:25 AM, Aaron Botsis wrote:
Hi folks, I was having a problem importing json data with COPY. Lots of things export data nicely as one json blob per line. This is excellent for directly importing into a JSON/JSONB column for analysis.

...Except when there’s an embedded double quote, or anything else that’s escaped. COPY handles the escaping, but by the time the escaped char reaches the JSON parser, it's not escaped anymore. This breaks the JSON parsing, which means I need to manipulate the input data to double-escape it. See bug #12320 for an example. Yuck.
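To illustrate the problem, here's a minimal Python sketch. The `copy_text_unescape` helper is a hypothetical stand-in for COPY's text-format backslash handling (the real rules also cover `\n`, `\t`, octal escapes, etc.); the point is just that one level of backslash escaping gets consumed before the value reaches the JSON parser:

```python
import json

record = {"user": 'ro"ot'}
line = json.dumps(record)            # {"user": "ro\"ot"}

def copy_text_unescape(s):
    # Hypothetical sketch of COPY text format: a backslash escapes
    # the following character (only the case relevant here).
    out, i = [], 0
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s):
            out.append(s[i + 1])
            i += 2
        else:
            out.append(s[i])
            i += 1
    return ''.join(out)

# After COPY strips the backslash, the quote is bare and the JSON
# is broken: {"user": "ro"ot"}
after_copy = copy_text_unescape(line)

# Double-escaping the input first survives the round trip:
doubled = line.replace('\\', '\\\\')
```

So the data has to be pre-processed (doubling every backslash) before COPY will deliver valid JSON to the column.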

I propose this small patch, which simply allows specifying COPY … ESCAPE without requiring the CSV parser. It will make it much easier for folks to import JSON-formatted export data directly going forward. This seemed like the simplest route.

Usage is simply:

postgres=# copy t1 from '/Users/nok/Desktop/queries.json';
ERROR:  invalid input syntax for type json
DETAIL:  Token "root" is invalid.
CONTEXT: JSON data, line 1: ...1418066241619 AND <=1418671041621) AND user:"root... COPY t1, line 3, column bleh: "{"timestamp":"2014-12-15T19:17:32.505Z","duration":7.947,"query":{"query":{"filtered":{"filter":{"qu..."
postgres=# copy t1 from '/Users/nok/Desktop/queries.json' escape '';
COPY 1966




This isn't a bug. Neither CSV format nor TEXT format is particularly suitable for json. I'm quite certain I could compose legal json that would break your proposal (for example, with an embedded newline in the white space.)
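For instance, a perfectly legal JSON value can span several physical lines, since newlines are insignificant whitespace outside strings. A quick Python sketch of the counterexample:

```python
import json

# Pretty-printed JSON is still a single legal JSON value,
# but it spans multiple physical lines:
#   {
#     "a": 1
#   }
pretty = json.dumps({"a": 1}, indent=2)
assert '\n' in pretty

# A line-oriented COPY with escaping disabled would hand each
# physical line to the JSON parser separately, and a bare "{"
# is not valid JSON on its own.
first_line = pretty.splitlines()[0]
```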

It's also unnecessary. CSV format, while not designed for this, is nevertheless sufficiently flexible to allow successful import of json data meeting certain criteria (essentially no newlines), like this:

   copy the_table(jsonfield)
   from '/path/to/jsondata'
   csv quote e'\x01' delimiter e'\x02';
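The trick works because the control bytes \x01 and \x02 can never appear literally in serialized JSON: inside a JSON string, control characters must be written as \u escapes. A quick Python check of that property:

```python
import json

# Control characters below U+0020 are always escaped in JSON
# output (e.g. as \u0001), so choosing \x01 and \x02 as the CSV
# quote and delimiter cannot collide with the data itself.
s = json.dumps({"ctl": "\x01\x02"})
assert '\x01' not in s and '\x02' not in s
```

So as long as the input really is one JSON value per line, the bogus quote and delimiter are never matched and each line passes through to the jsonb parser untouched.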


You aren't the first person to encounter this problem. See <http://adpgtech.blogspot.com/2014/09/importing-json-data.html>

Maybe we need to add something like this to the docs, or to the wiki.

Note too my comment in that blog post:

   Now this solution is a bit of a hack. I wonder if there's a case for
   a COPY mode that simply treats each line as a single datum. I also
   wonder if we need some more specialized tools for importing JSON,
   possibly one or more Foreign Data Wrappers. Such things could
   handle, say, embedded newline punctuation.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers