To add a little more detail on this issue, the real problem appears to be that 
a "\" character in the CSV is being interpreted as the start of an escape 
sequence by Phoenix (java.lang.String). I happened to have a row where a "\" 
appeared directly before my delimiter, so the delimiter was escaped and ignored.

I'm wondering whether this is desirable behavior. Should the CSV be allowed to 
contain escape sequences, or should the text be interpreted literally, exactly 
as it appears? In other words, if you want a tab (\t), it should just be ASCII 
0x09 in the file (or whatever the latest and greatest text format is these days).
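
If it helps to make this concrete, here is a minimal standalone sketch using 
Apache Commons CSV, which I believe is what the bulk loader parses with (I'm 
assuming backslash as the escape character, matching what I'm seeing). This is 
not the actual Phoenix code path, just an illustration of how an escape 
character directly before the delimiter swallows the field break:

    import java.io.IOException;
    import org.apache.commons.csv.CSVFormat;
    import org.apache.commons.csv.CSVParser;
    import org.apache.commons.csv.CSVRecord;

    public class EscapeDemo {
        public static void main(String[] args) throws IOException {
            // A field ending in "\" directly before the delimiter,
            // i.e. the raw line is: foo\,bar
            String line = "foo\\,bar";

            // With backslash as the escape character, the comma is escaped
            // and the parser sees a single field: "foo,bar".
            CSVFormat escaped = CSVFormat.DEFAULT.withEscape('\\');
            for (CSVRecord record : CSVParser.parse(line, escaped)) {
                System.out.println(record.size() + " field(s): " + record); // 1 field
            }

            // With no escape character, the same line splits into two fields.
            for (CSVRecord record : CSVParser.parse(line, CSVFormat.DEFAULT)) {
                System.out.println(record.size() + " field(s): " + record); // 2 fields
            }
        }
    }

If that is what's happening, then (assuming I'm reading the tool's options 
right) overriding the escape character on the command line, or escaping the 
backslash itself in the data, should sidestep it.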

From: Cox, Jonathan A [mailto:ja...@sandia.gov]
Sent: Wednesday, March 30, 2016 4:41 PM
To: user@phoenix.apache.org
Subject: [EXTERNAL] RE: Problem Bulk Loading CSV with Empty Value at End of Row

Actually, it seems that the line causing my problem really was missing a 
column. I checked the behavior of StringToArrayConverter in 
org.apache.phoenix.util.csv, and it does not drop a trailing empty column.

So the fault is on my end.

Thanks

From: Cox, Jonathan A
Sent: Wednesday, March 30, 2016 3:36 PM
To: 'user@phoenix.apache.org'
Subject: Problem Bulk Loading CSV with Empty Value at End of Row

I am using the CsvBulkLoadTool to ingest a tab-separated file that can 
contain empty columns. The problem is that the loader incorrectly interprets an 
empty last column as a missing column (instead of as a null entry).

For example, imagine I have a comma separated CSV with the following format:
key,username,password,gender,position,age,school,favorite_color

Now, let's say my CSV file contains the following row, where the gender field 
is empty. This row will load correctly:
*#Ssj289,joeblow,sk29ssh, ,CEO,102,MIT,blue<new line>

However, if the empty field happens to be the last entry (favorite_color), the 
loader complains that only 7 of the 8 required columns are present:
*#Ssj289,joeblow,sk29ssh,female ,CEO,102,MIT, <new line>

This error causes the entire CSV file to fail to load. Any pointers on how I 
can modify the source so that Phoenix interprets <delimiter><newline> as an 
empty/null last column?
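
For reference, the classic Java trap that produces exactly this symptom is 
String.split, which silently discards trailing empty strings unless it is 
given a negative limit. A tiny illustration (plain Java, not necessarily what 
the Phoenix parsing path actually does):

    public class TrailingFieldDemo {
        public static void main(String[] args) {
            // A row whose last column is empty: <delimiter> right before <newline>.
            String row = "a\tb\tc\t";

            // Default split drops trailing empty strings -> 3 fields.
            System.out.println(row.split("\t").length);     // prints 3

            // A negative limit keeps them -> 4 fields, the last one empty.
            System.out.println(row.split("\t", -1).length); // prints 4
        }
    }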

Thanks,
Jon
(actual error is pasted below)


java.lang.Exception: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:197)
        at org.apache.phoenix.mapreduce.FormatToBytesWritableMapper.map(FormatToBytesWritableMapper.java:72)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: CSV record does not have enough values (has 26, but needs 27)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:74)
        at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:44)
        at org.apache.phoenix.util.UpsertExecutor.execute(UpsertExecutor.java:133)
        ... 10 more
16/03/30 15:01:01 INFO mapreduce.Job: Job job_local1507432235_0
