Our DB team has been using Greenplum for one year. We need to load about 680 GB (more than one billion lines) of data into Greenplum every day, so we wrote a program that calls gpload every 10 minutes; each run loads between 100,000 and 10,000,000+ lines. We can accept about 1 error line per 100,000.
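For context, a gpload control file for a batch like this might look roughly as below. This is only a sketch: the database, hosts, paths, and table name are made up, and the exact option names (ERROR_LIMIT, MAX_LINE_LENGTH, ERROR_TABLE) should be checked against the gpload reference for your version — some of them changed across 4.3.x releases.

```yaml
VERSION: 1.0.0.1
DATABASE: warehouse        # hypothetical database name
USER: gpadmin
HOST: mdw
PORT: 5432
GPLOAD:
   INPUT:
    - SOURCE:
         LOCAL_HOSTNAME:
           - etl1          # hypothetical ETL host
         PORT: 8081
         FILE:
           - /data/load/*.txt
    - FORMAT: text
    - DELIMITER: '|'
    - MAX_LINE_LENGTH: 1048576   # raise the gpfdist row-length limit
    - ERROR_LIMIT: 100           # ~1/100000 of a 10M-line batch
    - ERROR_TABLE: public.err_load
   OUTPUT:
    - TABLE: public.fact_events  # hypothetical target table
    - MODE: insert
```

With ERROR_LIMIT set, rows that fail to parse are logged to the error table instead of aborting the whole load, up to the given count.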
Our environment:
  OS: RHEL 6.3
  Greenplum: 4.3.5.2
  gpload: 4.3.5.2

Here are some problems we hit while using gpload:

1. "line too long". This error makes gpload fail even if only one line among all the files being loaded is affected. We set "error_limit" / "segment reject limit", but it had no effect. Finding the offending line is very hard, so we set max_line_length to 1048576.

2. "no partition key". This also gives us headaches. Maybe only one line is bad (an unexpected delimiter inside a column) or the encoding is not recognized, but it makes gpload fail just like problem 1.

3. Column too long. This makes gpload fail, too. We changed all column data types to text to work around it.

4. In our production environment, when the Greenplum cluster got an error, it logged:

    "fatal","57m01","the database system is in mirror or uninitialized mode",,,,,,,0,,"postmaster.c",2994

but the gpload and gpfdist processes slept instead of exiting. Reading the gpload.py script, we found a case that is not handled. gpload.py loads data in these steps:

    step 1: read_config()
    step 2: setup_connection()    -- connects to the database the first time
    step 3: read_table_metadata()
    step 4: read_columns()
    step 5: read_mapping()
    step 6: start_gpfdists()
    step 7: do_method()

Finally, it will:

    step 8: remove temporary data -- connects to the database the second time
    step 9: kill gpfdist

We found that when step 8 gets an error (the database cannot be connected), the process sleeps instead of exiting.

Thanks for reading.

[email protected]

From: Lei Chang
Date: 2016-04-08 07:52
To: user
CC: dev
Subject: Re: May I mention some gpload issues?

please. thanks!

Cheers
Lei

On Thu, Apr 7, 2016 at 11:56 PM, [email protected] <[email protected]> wrote:

When I was using gpload in my work, I ran into some problems with it. May I mention some gpload issues here?

[email protected]
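The step-8 hang described above could be avoided with a defensive cleanup sequence: if removing the temporary data fails (for example, because the second database connection cannot be made), killing gpfdist must still happen so the process exits. A minimal Python sketch of that idea follows; the function names are illustrative, not gpload.py's real identifiers.

```python
# Sketch: guard the cleanup phase so a failed second DB connection
# cannot leave the loader (and its gpfdist children) hanging.
# load / remove_temp_data / kill_gpfdist are hypothetical callables
# standing in for gpload.py's steps 1-7, 8, and 9 respectively.

def run_with_safe_cleanup(load, remove_temp_data, kill_gpfdist):
    """Run the load, then always attempt both cleanup steps."""
    try:
        load()                       # steps 1-7: config, connect, load
    finally:
        try:
            remove_temp_data()       # step 8: may fail if DB is down
        except Exception as exc:
            print("cleanup failed, continuing: %s" % exc)
        kill_gpfdist()               # step 9: always reap gpfdist
```

The key point is the inner try/except: an exception in step 8 is logged and swallowed so that step 9 always runs and the process can exit with an error instead of sleeping.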
