[ 
https://issues.apache.org/jira/browse/CASSANDRA-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453359#comment-16453359
 ] 

Stefania commented on CASSANDRA-14391:
--------------------------------------

The columns can be specified in the command line. From typing {{HELP COPY}} in 
cqlsh:
{code:java}
COPY <table_name> [ ( column [, ...] ) ]
FROM ( '<file_pattern_1, file_pattern_2, ... file_pattern_n>' | STDIN )
[ WITH <option>='value' [AND ...] ];{code}
The {{HEADER}} option simply indicates whether the first line of the file 
should be ignored: 
{code:java}
HEADER=false            - whether to ignore the first line{code}
There is no present way to load column names from the file, this is not a bug 
it's the expected behavior. It can be improved but I don't think this ticket 
should be a bug.

> COPY FROM ignores headers
> -------------------------
>
>                 Key: CASSANDRA-14391
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14391
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>         Environment: cqlsh 5.0.1 and Cassandra 3.11.2 on macOS 10.13.2.
>            Reporter: M. Justin
>            Priority: Major
>
> COPY FROM appears to ignore the headers value, even when "headers = true" is 
> specified. This means that if the columns are reordered, the import process 
> will save values in the wrong columns.  Additionally, if there are missing 
> columns, an error occurs, even if those columns are not primary key columns.
> This behavior contradicts the behavior [specified in the 
> docs|https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html] 
> (emphasis mine).
> {quote}COPY FROM imports data from a CSV file into an existing table. Each 
> line in the source file is imported as a row. All rows in the dataset must 
> contain the same number of fields and have values in the PRIMARY KEY fields. 
> The process verifies the PRIMARY KEY and updates existing records. If 
> {{HEADER = false}} and no column names are specified, the fields are imported 
> in deterministic order. *When column names are specified, fields are imported 
> in that order. Missing and empty fields are set to null. The source cannot 
> have more fields than the target table, however it can have fewer fields.*
> {quote}
> h2. Example
> {noformat:title=temp.csv}
> col2,col1,col3
> column value 1,key2,3
> column value 2,key4,3
> column value 3,key3,3
> column value 4,key1,3
> {noformat}
> {code:sql}
> create keyspace copy_to_from_test WITH replication = { 'class' : 
> 'SimpleStrategy', 'replication_factor' : 1 };
> use copy_to_from_test;
> create table test_table (col1 text primary key, col2 text, col3 bigint);
> copy test_table from 'temp.csv' with header = true;
> {code}
> The above code will incorrectly swap the "col2" and "col1" values, since it 
> expects the first column to be "col1". If I had instead swapped the order of 
> "col3", I would have received an error on input, as it would have attempted 
> to store text in a numerical column.
> h2.  Expected Behavior
> I would expect specifying "with header = true" on a COPY FROM statement to 
> use the headers as column names for insertion, rather than merely skipping 
> the header row.  Missing non-primary key columns should be set to null.
> h2. Other
> I ran across this issue when copying between two of my environments. One of 
> the environments had changed the columns in the primary key, but the other 
> had not yet. This caused the order of the columns to vary between the 
> environments.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to