James,
Looks like I'm on the right track; however, I'm not sure why it isn't accepting
my delimiters. I'm using the TPC-H data set, so for instance here is what a
line from customer.csv looks like:
6967|Customer#000006967|uMPce8nER9v3PCIcsZmNlSrCKcau6tJd4qe|13|23-816-949-8373|7865.21|MACHINERY|r pinto beans. regular multipliers detect carefully. carefully final instructions affix quickly. packages boost af|
When I try to import the CSV file into my table "CUSTOMER", it looks like psql
doesn't like the delimiters I pass in. If I use the three numbers as in the
usage below, I just get a wrong-format error, but it at least attempts to
import the data. Any thoughts?
./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT -d | localhost:2181 customer.csv
Usage: psql [-t table-name] [-h comma-separated-column-names | in-line] [-d field-delimiter-char quote-char escape-char] <zookeeper> <path-to-sql-or-csv-file>...
By default, the name of the CSV file is used to determine the Phoenix table
into which the CSV data is loaded, and the ordinal value of the columns
determines the mapping.
-t overrides the table into which the CSV data is loaded
-h overrides the column names to which the CSV data maps. A special value of
in-line indicates that the first line of the CSV file determines the columns
to which the data maps.
-s uses strict mode, throwing an exception if a column name doesn't match
during CSV loading.
-d uses custom delimiters for the CSV loader; specify a single char each for
the field delimiter, phrase delimiter, and escape char. A digit is not usually
a delimiter and is taken as a control character: 1 -> Ctrl-A, 2 -> Ctrl-B, ...
9 -> Ctrl-I.
Examples:
psql localhost my_ddl.sql
psql localhost my_ddl.sql my_table.csv
psql -t my_table my_cluster:1825 my_table2012-Q3.csv
psql -t my_table -h col1,col2,col3 my_cluster:1825 my_table2012-Q3.csv
psql -t my_table -h col1,col2,col3 -d 1 2 3 my_cluster:1825 my_table2012-Q3.csv
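Note: the unquoted | in the -d | above is likely being interpreted by the shell as a pipe operator before psql.sh ever sees it, which may be why only the usage text appears. A minimal sketch of the same command with the delimiter quoted, assuming -d takes the field delimiter, quote char, and escape char in that order as shown in the usage above (the quote and escape characters here are only placeholder assumptions):
./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT -d '|' '"' '\' localhost:2181 customer.csv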
Thanks
From: Devin Pinkston [mailto:[email protected]]
Sent: Thursday, February 06, 2014 8:41 AM
To: [email protected]
Subject: RE: Import Delimiter
James,
Interesting, thanks for the info. So if I were to import data containing pipe
delimiters, I would have to use the non-map-reduce bulk loader. Are you saying
that sqlline would have to be used?
Sorry, I'm just trying to figure out how I can import these large flat files
this way.
Thank you.
From: James Taylor [mailto:[email protected]]
Sent: Wednesday, February 05, 2014 8:25 PM
To: [email protected]<mailto:[email protected]>
Subject: Re: Import Delimiter
You're right. It was added to the non-map-reduce bulk loader. This is the
loader that loads local CSV files through the bin/psql.sh script. There's a -d
option that was added in this pull request[1]. It would be nice to add this
same functionality to our csv map-reduce bulk loader too if anyone is
interested.
Thanks,
James
[1] https://github.com/forcedotcom/phoenix/pull/514
On Wed, Feb 5, 2014 at 9:35 AM, Nick Dimiduk
<[email protected]<mailto:[email protected]>> wrote:
Hi James,
I'm looking through the bulkload job, and it looks to me like this isn't
configurable at the moment. Have a look at
https://github.com/apache/incubator-phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/map/reduce/MapReduceJob.java#L136
Is there something I'm missing? Perhaps I'm looking in the wrong place?
Thanks,
Nick
On Wed, Feb 5, 2014 at 10:16 AM, Devin Pinkston
<[email protected]<mailto:[email protected]>> wrote:
James,
Thanks for the quick response. Do you know what the argument or command is to
pass in?
For instance ./csv-bulk-loader.sh -delimiter '|'
Thanks
From: James Taylor
[mailto:[email protected]<mailto:[email protected]>]
Sent: Wednesday, February 05, 2014 11:51 AM
To: [email protected]<mailto:[email protected]>
Subject: Re: Import Delimiter
Hello,
The CSV map-reduce based bulk loader supports custom delimiters. It might need
to be documented, though.
Thanks,
James
On Wednesday, February 5, 2014, Devin Pinkston
<[email protected]<mailto:[email protected]>> wrote:
Hello,
I am trying to import data into HBase; however, I have '|' (pipe) delimiters
in my file instead of commas. I don't see a way to pass in a different
separator/delimiter with the jar. What would be the best way to import data
like this?
Thanks