Re: Using COPY to import large xml file

Anto Aravinth Mon, 25 Jun 2018 08:32:16 -0700

On Mon, Jun 25, 2018 at 8:54 PM, Anto Aravinth <[email protected]>
wrote:


>
>
> On Mon, Jun 25, 2018 at 8:20 PM, Nicolas Paris <[email protected]>
> wrote:
>
>>
>> 2018-06-25 16:25 GMT+02:00 Anto Aravinth <[email protected]>:
>>
>>> Thanks a lot. But I ran into a lot of challenges! It looks like the SO data
>>> contains a lot of tabs within itself, so a tab delimiter didn't work for me.
>>> I thought I could give a special delimiter, but it looks like PostgreSQL COPY
>>> allows only a single character as the delimiter :(
>>>
>>> Sad, I guess the only way is to insert, or to do a thorough serialization of
>>> my data into something that COPY can understand.
>>>
>>
>> The easiest way would be:
>> xml -> csv -> \copy
>>
>> By csv, I mean regular quoted csv (simply wrap each csv field in double
>> quotes, and escape any double quotes it contains with another double
>> quote).
>>
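The quoting rule described above can be sketched in a few lines. This is only a minimal illustration, assuming Python's standard csv module is an acceptable stand-in for the converter; the function name rows_to_csv and the sample row are hypothetical and not taken from the thread.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows as quoted CSV: every field is wrapped in double
    quotes and any embedded double quote is doubled."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical row: the last field holds double quotes, a tab and a newline.
sample = [
    (1, "A title", 'a <a href="http://example.com/">link</a>\twith tab\nand newline'),
]
text = rows_to_csv(sample)

# Reading the text back with a CSV-aware parser recovers the fields intact,
# including the embedded tab and newline.
parsed = list(csv.reader(io.StringIO(text)))
```

With this kind of quoting, tabs and newlines inside a field stop mattering to the delimiter, because a CSV-aware reader treats everything between the enclosing double quotes as field content.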
>
> I tried but no luck. Here is the sample csv I wrote from my xml converter:
>
> 1       "Are questions about animations or comics inspired by Japanese
> culture or styles considered on-topic?"  "pExamples include a href=""
> http://www.imdb.com/title/tt0417299/""; rel=""nofollow""Avatar/a, a href=""
> http://www.imdb.com/title/tt1695360/""; rel=""nofollow""Korra/a and, to
> some extent, a href=""http://www.imdb.com/title/tt0278238/"";
> rel=""nofollow""Samurai Jack/a. They're all widely popular American
> cartoons, sometimes even referred to as ema href=""https://en.wikipedia.
> org/wiki/Anime-influenced_animation"" rel=""nofollow""Amerime/a/em./p
>
>
> pAre questions about these series on-topic?/p
>
> "       "pExamples include a href=""http://www.imdb.com/title/tt0417299/"";
> rel=""nofollow""Avatar/a, a href=""http://www.imdb.com/title/tt1695360/"";
> rel=""nofollow""Korra/a and, to some extent, a href=""http://www.imdb.com/
> title/tt0278238/"" rel=""nofollow""Samurai Jack/a. They're all widely
> popular American cartoons, sometimes even referred to as ema href=""
> https://en.wikipedia.org/wiki/Anime-influenced_animation"";
> rel=""nofollow""Amerime/a/em./p
>
>
> pAre questions about these series on-topic?/p
>
> "       "null"
>
> the schema of my table is:
>
>   CREATE TABLE so2 (
>     id  INTEGER NOT NULL PRIMARY KEY,
>     title varchar(1000) NULL,
>     posts text,
>     body TSVECTOR,
>     parent_id INTEGER NULL,
>     FOREIGN KEY (parent_id) REFERENCES so1(id)
> );
>
> and when I run:
>
> COPY so2 from '/Users/user/programs/js/node-mbox/file.csv';
>
>
> I get:
>
>
> ERROR:  missing data for column "body"
> CONTEXT:  COPY so2, line 1: "1 "Are questions about animations or comics
> inspired by Japanese culture or styles considered on-top..."
>
> Not sure what I'm missing. I'm not sure whether the above csv is breaking
> because I have newlines within my content, but the error message is very
> hard to debug.
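One way to check the newline suspicion is to compare how many fields a line-by-line reader sees with how many a CSV-aware reader sees. The sketch below is only a diagnostic aid under stated assumptions: the helper field_counts and the shortened sample string are hypothetical stand-ins for the real file, which is assumed to be tab-delimited with double-quoted fields as in the sample above.

```python
import csv
import io


def field_counts(text, delimiter="\t"):
    """Return (fields per physical line, fields per CSV record)."""
    # Naive view: every newline ends a row, quotes mean nothing.
    per_line = [len(line.split(delimiter)) for line in text.split("\n") if line]
    # CSV view: a newline inside double quotes belongs to the field.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    per_record = [len(record) for record in reader]
    return per_line, per_record


# Hypothetical, shortened stand-in for the file: five tab-delimited fields,
# with a newline inside the quoted third field.
sample = '1\t"A title"\t"<p>first</p>\n<p>second</p>"\t"tsv text"\t"null"\n'
per_line, per_record = field_counts(sample)
```

Here the naive view finds only three fields on the first physical line, which matches the error: with no format given, COPY reads the file in its default text format, where each newline ends a row, so line 1 runs out of data right before "body". Telling COPY that the file is CSV, for example WITH (FORMAT csv, DELIMITER E'\t'), should let the quoted newlines through. The quoted "null" in the last field would probably still need attention, since parent_id is an INTEGER.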
>
>
>
>>
>> The PostgreSQL COPY csv parser is one of the most robust I have ever
>> tested.
>>
>
>
