Hi Roshan,
The following snippet summarizes the delimiters for your Hive table:
colelction.delim \u0002
field.delim \u0001
mapkey.delim \u0003
serialization.format \u0001
Your fields are delimited by \u0001, collections are delimited by \u0002 and
the delimiter between the key and value in any maps is \u0003. Can you verify
that your XML content doesn't contain any of these characters?
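
One way to run that check is a small script along these lines. It is only a sketch: the function name and the sample value are made up for illustration, and the newline check is an addition that is not in the output above (TextInputFormat reads one row per line, so a line break inside the XML would also end the row early).

```python
# Delimiters reported by `describe formatted` for this table.
HIVE_DELIMS = {
    "\u0001": "field.delim",
    "\u0002": "colelction.delim",  # spelled as Hive prints it
    "\u0003": "mapkey.delim",
}
# Assumption, not taken from the output above: TextInputFormat is
# line-oriented, so an embedded newline also splits the row.
LINE_BREAKS = {"\n": "newline"}


def find_delimiters(text):
    """Return (offset, name) for every problematic character in text."""
    suspects = {**HIVE_DELIMS, **LINE_BREAKS}
    return [(i, suspects[ch]) for i, ch in enumerate(text) if ch in suspects]


# Hypothetical XML value; substitute a value from the affected column.
sample_xml = "<template>\n  <name>welcome</name>\n</template>"
for offset, name in find_delimiters(sample_xml):
    print(f"offset {offset}: {name}")
```

An empty result for a given value would suggest that the value itself is not what breaks the row.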
If this still doesn't help, could you pick an affected row and share how the
XML appears in Hive and what you expect it to be?
Good luck!
Mark
Mark Grover, Business Intelligence Analyst
OANDA Corporation
www: oanda.com www: fxtrade.com
----- Original Message -----
From: "mperformer" <[email protected]>
To: [email protected]
Sent: Sunday, May 6, 2012 11:34:55 PM
Subject: Re: Data are not displayed correctly on hive tables
Hi Mark
Many thanks for your reply. Please find the output below.
hive> describe formatted messagetemplate;
OK
# col_name                    data_type    comment
messagetemplateid             bigint       None
messagetemplatename           string       None
datacol                       string       None
messagetemplatetype           string       None
messagetype                   string       None
messagetemplatedescription    string       None
originatingtemplateid         bigint       None
edited                        boolean      None
userid                        bigint       None
projectid                     bigint       None
responsetemplateid            bigint       None
# Detailed Table Information
Database: default
Owner: root
CreateTime: Mon May 07 12:06:59 EST 2012
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://app6:9100/mnt/hive-test/warehouse/messagetemplate
Table Type: MANAGED_TABLE
Table Parameters:
comment This is the messagetemplate table
transient_lastDdlTime 1336356473
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
colelction.delim \u0002
field.delim \u0001
mapkey.delim \u0003
serialization.format \u0001
Time taken: 3.2 seconds
Thanks again.
./Roshan.
On Mon, May 7, 2012 at 1:06 PM, Mark Grover < [email protected] > wrote:
Could you share the output of the following command in Hive:
describe formatted messagetemplate
My hunch is that your Hive table is using a delimiter (e.g. '\t') that appears
in the content of your XML.
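
To illustrate the hunch, here is a rough model of how a delimited text table gets read. It is not Hive's actual code: the function name, the 4-column width, and the sample record are assumptions made for the example.

```python
def read_rows(raw, field_delim="\t", n_cols=4):
    """Rough model of reading a delimited text table: one row per line,
    fields split on field_delim, short rows padded with None (NULL),
    surplus fields dropped."""
    rows = []
    for line in raw.split("\n"):
        fields = line.split(field_delim)
        padded = fields + [None] * (n_cols - len(fields))
        rows.append(padded[:n_cols])
    return rows


# Hypothetical record whose XML value contains a tab character.
raw = "1\twelcome\t<a>\t</a>\temail"
print(read_rows(raw))
```

In this model the XML is cut at the embedded tab, the rest of it lands in the next column, and the real last value no longer fits in the row.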
----- Original Message -----
From: "mperformer" < [email protected] >
To: [email protected]
Sent: Sunday, May 6, 2012 8:34:27 PM
Subject: Data are not displayed correctly on hive tables
Hi
I am using
• Hadoop 0.20.2
• Hive 0.8.1
• Sqoop 1.4.1-incubating
in my sample project. I am currently importing data from PostgreSQL into a Hive
table using Sqoop. My PostgreSQL table has 4 columns, one of which stores a
fairly large XML document as the TEXT data type. The same column is defined in
Hive as string, but the data after that column is not imported and shows as NULL.
Table structure in PostgreSQL
CREATE TABLE public.messagetemplate (
    messagetemplateid BIGSERIAL,
    messagetemplatename TEXT,
    data TEXT,
    messagetemplatetype TEXT,
    CONSTRAINT pk_messagetemplate PRIMARY KEY (messagetemplateid)
) WITHOUT OIDS;
Table structure in Hive
hive> desc messagetemplate;
OK
messagetemplateid bigint
messagetemplatename string
data string
messagetemplatetype string
The data column stores the XML document as text. The import into Hive itself
seems fine: all the data is present in the files in HDFS (I checked them). But
a Hive SELECT statement shows only a small part of the XML text, and the
remaining column (the last one) is NULL.
Could someone please help me sort this out? Thanks.