[ https://issues.apache.org/jira/browse/CASSANDRA-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453359#comment-16453359 ]
Stefania commented on CASSANDRA-14391: -------------------------------------- The columns can be specified in the command line. From typing {{HELP COPY}} in cqlsh: {code:java} COPY <table_name> [ ( column [, ...] ) ] FROM ( '<file_pattern_1, file_pattern_2, ... file_pattern_n>' | STDIN ) [ WITH <option>='value' [AND ...] ];{code} The {{HEADER}} option simply indicates whether the first line of the file should be ignored: {code:java} HEADER=false - whether to ignore the first line{code} There is no present way to load column names from the file, this is not a bug it's the expected behavior. It can be improved but I don't think this ticket should be a bug. > COPY FROM ignores headers > ------------------------- > > Key: CASSANDRA-14391 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14391 > Project: Cassandra > Issue Type: Bug > Components: CQL > Environment: cqlsh 5.0.1 and Cassandra 3.11.2 on macOS 10.13.2. > Reporter: M. Justin > Priority: Major > > COPY FROM appears to ignore the headers value, even when "headers = true" is > specified. This means that if the columns are reordered, the import process > will save values in the wrong columns. Additionally, if there are missing > columns, an error occurs, even if those columns are not primary key columns. > This behavior contradicts the behavior [specified in the > docs|https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html] > (emphasis mine). > {quote}COPY FROM imports data from a CSV file into an existing table. Each > line in the source file is imported as a row. All rows in the dataset must > contain the same number of fields and have values in the PRIMARY KEY fields. > The process verifies the PRIMARY KEY and updates existing records. If > {{HEADER = false}} and no column names are specified, the fields are imported > in deterministic order. *When column names are specified, fields are imported > in that order. Missing and empty fields are set to null. The source cannot > have more fields than the target table, however it can have fewer fields.* > {quote} > h2. Example > {noformat:title=temp.csv} > col2,col1,col3 > column value 1,key2,3 > column value 2,key4,3 > column value 3,key3,3 > column value 4,key1,3 > {noformat} > {code:sql} > create keyspace copy_to_from_test WITH replication = { 'class' : > 'SimpleStrategy', 'replication_factor' : 1 }; > use copy_to_from_test; > create table test_table (col1 text primary key, col2 text, col3 bigint); > copy test_table from 'temp.csv' with header = true; > {code} > The above code will incorrectly swap the "col2" and "col1" values, since it > expects the first column to be "col1". If I had instead swapped the order of > "col3", I would have received an error on input, as it would have attempted > to store text in a numerical column. > h2. Expected Behavior > I would expect specifying "with header = true" on a COPY FROM statement to > use the headers as column names for insertion, rather than merely skipping > the header row. Missing non-primary key columns should be set to null. > h2. Other > I ran across this issue when copying between two of my environments. One of > the environments had changed the columns in the primary key, but the other > had not yet. This caused the order of the columns to vary between the > environments. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org