Hi,

Indeed, the -e parameter isn't documented for the bulk loader on the documentation page. This is definitely not intentional.

It also does indeed appear that there are some functional differences in the way the escape character is defined between psql (where there is no default escape character) and the bulk loader (where backslash is the default escape character).

Also, the ignore-errors flag should truly ignore errors -- I think the reason it isn't doing that here is that the error is being thrown from the CSV parsing library, and not from Phoenix itself. However, I agree that this shouldn't stop the bulk load tool. Could you please log these issues in the Phoenix jira (https://issues.apache.org/jira/browse/PHOENIX)?

A potential work-around (although I'm not sure if it will work) is to supply an empty character string as the escape character for the bulk loader, as follows: -e '' (that's the -e flag, followed by two single quotes). That might resolve this issue in the short term, although I'm not sure.
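
To spell that out with the table and file names from your example below, the full command would look something like the following. This is only a sketch based on the CsvBulkLoadTool invocation shown on the bulk load documentation page -- the exact jar name depends on your Phoenix version and installation, and as I said I'm not sure that an empty escape value will actually be accepted:

    hadoop jar phoenix-<version>-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool -g -t A -i a.csv -e ''
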
- Gabriel

On Fri, Dec 9, 2016 at 2:28 AM, rubysina <[email protected]> wrote:
> ok. thank you.
>
> but there's no parameter -e on page
> http://phoenix.apache.org/bulk_dataload.html
> and, why doesn't the -g,--ignore-errors parameter work? if there are some
> lines ended with backslash, just ignore them, why fail?
>
> there's always something wrong in txt files. why not ignore it? how?
>
> and, if using the -e parameter, what character should I use?
> seems that I must find a special character, but I don't know which is
> correct.
> actually, I don't want to use any escape character.
> is there any special option like "escape off" or something else, so I can
> load anything without treating any character as an escape letter?
>
> some other products, like greenplum, do have such an interesting setting
> when bulk loading a txt file: escape: 'OFF'
>
> -----------------------------------------------------------------
> quote from http://phoenix.apache.org/bulk_dataload.html
>
> The following parameters can be used with the MapReduce loader.
>
> Parameter             Description
> -i,--input            Input CSV path (mandatory)
> -t,--table            Phoenix table name (mandatory)
> -a,--array-delimiter  Array element delimiter (optional)
> -c,--import-columns   Comma-separated list of columns to be imported
> -d,--delimiter        Input delimiter, defaults to comma
> -g,--ignore-errors    Ignore input errors
> -o,--output           Output path for temporary HFiles (optional)
> -s,--schema           Phoenix schema name (optional)
> -z,--zookeeper        Zookeeper quorum to connect to (optional)
> -it,--index-table     Index table name to load (optional)
>
> --------------------------------
>
> From: Gabriel Reid <[email protected]>
> Subject: Re: Error with lines ended with backslash when Bulk Data Loading
> Date: 2016-12-09 02:06 (+0800)
> List: [email protected]
>
> Hi
>
> Backslash is the default escape character that is used for parsing CSV
> data when running a bulk import, so it has a special meaning. You can
> supply a different (custom) escape character with the -e or --escape flag
> on the command line so that parsing your CSV files that include
> backslashes like this will run properly.
>
> - Gabriel
>
> ----- Original Message -----
> From: "rubysina" <[email protected]>
> To: "user" <[email protected]>
> Subject: Error with lines ended with backslash when Bulk Data Loading
> Date: 2016-12-08 16:11
>
> hi, I'm new to phoenix sql and here's a little problem.
>
> I'm following this page http://phoenix.apache.org/bulk_dataload.html
> I just found that the MapReduce importer could not load a file with lines
> ended with backslash, even with the -g parameter, i.e. ignore-errors:
> "java.io.IOException: EOF whilst processing escape sequence"
>
> but it's OK if the line contains a backslash that is not at the end of
> the line,
> and there's no problem when using psql.py to load the same file.
>
> why? how?
>
> thank you.
>
> -----------------------------------------------------------------------------------------------
> for example:
>
> create table a(a char(100) primary key)
>
> echo \\>a.csv
> cat a.csv
> \
> hdfs dfs -put a.csv
> ...JsonBulkLoadTool -g -t a -i a.csv
> -- error
> 16/12/08 15:44:21 INFO mapreduce.Job: Task Id : attempt_1481093434027_0052_m_000000_0, Status : FAILED
> Error: java.lang.RuntimeException: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
>     at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:202)
>     at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:74)
>     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: java.lang.RuntimeException: java.io.IOException: EOF whilst processing escape sequence
>     at org.apache.commons.csv.CSVParser$1.getNextRecord(CSVParser.java:398)
>     at org.apache.commons.csv.CSVParser$1.hasNext(CSVParser.java:407)
>     at com.google.common.collect.Iterators.getNext(Iterators.java:890)
>     at com.google.common.collect.Iterables.getFirst(Iterables.java:781)
>     at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:109)
>     at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$CsvLineParser.parse(CsvToKeyValueMapper.java:91)
>     at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:161)
>     ... 9 more
>
> echo \\a>a.csv
> cat a.csv
> \a
> hdfs dfs -rm a.csv
> hdfs dfs -put a.csv
> ...JsonBulkLoadTool -g -t a -i a.csv
> -- success
>
> echo \\>a.csv
> cat a.csv
> \
> psql.py -t A zoo a.csv
> CSV Upsert complete. 1 rows upserted
> -- success
>
> thank you.
