Hi,

The CSV file is about 250 GB, with about 1 billion rows of data. Persistence is on and there is enough memory.
It was imported successfully, but the import took a long time.

The current problem is that after this large table was imported successfully, we imported another table of about 50 million rows, and the write speed slowed down significantly.

We have four hosts in total. The cache configuration is as follows:
<property name="backups" value="1"/>
<property name="partitionLossPolicy" value="READ_ONLY_SAFE"/>

Persistence is enabled; the other parameters are nothing special.
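
For reference, a minimal Java sketch of roughly this configuration, assuming all data sits in the default data region (the cache name below is a placeholder, not from our setup):

import org.apache.ignite.Ignition;
import org.apache.ignite.cache.PartitionLossPolicy;
import org.apache.ignite.configuration.CacheConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class ServerConfigSketch {
    public static void main(String[] args) {
        // Enable persistence on the default data region (assumes all caches use it).
        DataStorageConfiguration storageCfg = new DataStorageConfiguration();
        storageCfg.getDefaultDataRegionConfiguration().setPersistenceEnabled(true);

        // Cache settings matching the XML snippet above: 1 backup, READ_ONLY_SAFE.
        CacheConfiguration<Object, Object> cacheCfg = new CacheConfiguration<>("bigTableCache");
        cacheCfg.setBackups(1);
        cacheCfg.setPartitionLossPolicy(PartitionLossPolicy.READ_ONLY_SAFE);

        IgniteConfiguration cfg = new IgniteConfiguration();
        cfg.setDataStorageConfiguration(storageCfg);
        cfg.setCacheConfiguration(cacheCfg);

        // Start a server node with this configuration.
        Ignition.start(cfg);
    }
}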

On 2019/7/12 1:47 PM, Павлухин Иван wrote:
Hi,

Currently COPY is a mechanism designed for the fastest data load. Yes,
you can try to separate your data into chunks and execute COPY in parallel.
By the way, where is your input located, and what is its size in bytes
(GB)? Is persistence enabled? Does the DataRegion have enough memory to
keep all the data?
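
To illustrate the chunked approach, here is a minimal sketch that spreads the load over several JDBC thin connections, each running COPY on one pre-split chunk file; the host address, table name, column list and chunk paths are assumptions, not taken from this thread:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelCopyLoad {
    public static void main(String[] args) throws Exception {
        // Hypothetical chunk files produced by splitting the original CSV beforehand.
        List<String> chunks = List.of(
            "/data/chunks/part_00.csv",
            "/data/chunks/part_01.csv",
            "/data/chunks/part_02.csv",
            "/data/chunks/part_03.csv");

        ExecutorService pool = Executors.newFixedThreadPool(chunks.size());

        for (String chunk : chunks) {
            pool.submit(() -> {
                // One JDBC thin connection per worker, each issuing its own COPY statement.
                try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
                     Statement stmt = conn.createStatement()) {
                    stmt.executeUpdate(
                        "COPY FROM '" + chunk + "' INTO BIG_TABLE (ID, NAME, VALUE) FORMAT CSV");
                }
                catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
    }
}

Whether this actually scales will depend on where the bottleneck is (server-side disk/WAL versus a single client connection), so it is worth measuring with a small number of chunks first.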

On Wed, 10 Jul 2019 at 05:02, 18624049226 <18624049...@163.com> wrote:
If the COPY command is used to import a large amount of data, the execution
time is quite long.
In the current test environment, the throughput is a little over 10,000 rows/s,
so importing 100 million rows takes several hours.

Is there a faster way to import, or can COPY be run in parallel?

thanks!
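
For context, the single-statement COPY load being timed above presumably looks roughly like the sketch below (host, table, columns and file path are placeholders); one statement like this runs over a single JDBC thin connection and is not parallelized by itself:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class SingleCopyLoad {
    public static void main(String[] args) throws Exception {
        // One JDBC thin connection; the whole CSV goes through a single COPY statement.
        try (Connection conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1");
             Statement stmt = conn.createStatement()) {
            int updated = stmt.executeUpdate(
                "COPY FROM '/data/big_table.csv' INTO BIG_TABLE (ID, NAME, VALUE) FORMAT CSV");
            System.out.println("Update count reported by the driver: " + updated);
        }
    }
}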


