On Fri, Jun 16, 2017 at 6:36 AM, Kuldeep Chitrakar <
kuldeep.chitra...@synechron.com> wrote:

> I have two questions regarding loading tables which are defined as RCFile,
> SequenceFile, etc.
>
>
>
> Q1.
>
> 1.  Suppose a table is defined as STORED AS RCFILE or SEQUENCEFILE,
> how do we load this table? If the source data file is in CSV format, do we have
> to load that file into some temp table and then load the target RCFile table
> from this temp table using INSERT OVERWRITE…? If this is true, then does it
> mean that only managed tables can be defined as
> RCFile or SequenceFile, as we cannot load an external table using an INSERT
> statement?
>
Yes, the normal flow is to have a staging table that uses the incoming
format and then use an insert to reformat the data. (Side plug: you should
seriously consider using ORC or Parquet. They are much better for most use
cases than RCFile or SequenceFile.) The normal flow looks like:

create table passwd_staging (
  name string,
  not_used string,
  uid int,
  gid int,
  full_name string,
  home_dir string,
  shell string
) row format delimited fields terminated by ":"
  stored as textfile;

create table passwd (
  name string,
  not_used string,
  uid int,
  gid int,
  full_name string,
  home_dir string,
  shell string
) stored as orc;

load data local inpath "/etc/passwd" overwrite into table passwd_staging;

insert overwrite table passwd select * from passwd_staging;

> 2.  How can we store the data in an external table which is defined as
> RCFile? Do we have to convert the source data file into RCFile format first
> using some tool?
>
It is easy to write such a tool. In fact, there is a patch on
https://issues.apache.org/jira/browse/ORC-199 that extends the ORC convert
tool to convert CSV into ORC. If you have to do it at scale however, you
are better off with the staging table approach above, because that will
generate a distributed job.
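
For the external-table case, the staging approach can be sketched roughly as
follows. This is only an illustration: it assumes the passwd_staging table
above has already been loaded, the table name passwd_ext and its location are
hypothetical, and it relies on Hive accepting INSERT against an external
table whose location is writable, which is worth verifying on your version.

```sql
-- Hypothetical external table; the name and the location are illustrative only.
create external table passwd_ext (
  name string,
  not_used string,
  uid int,
  gid int,
  full_name string,
  home_dir string,
  shell string
) stored as rcfile
  location '/tmp/passwd_ext';

-- Rewrite the staged text data as RCFile files under the external location.
insert overwrite table passwd_ext select * from passwd_staging;
```

Dropping passwd_ext afterwards removes only the table metadata; the RCFile
data stays in place under the location.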

.. Owen

>
>
> Q2. When we use DESCRIBE FORMATTED <TableName> it shows COMPRESSED = NO.
> What does this mean? Even for compressed data in a table it shows NO.
>
>
>
>
>
> Thanks,
>
> Kuldeep
>
