Yes, we ran into this issue too, typically when the text Hive table
exceeds 100 million rows while converting the text table into an ORC table.
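
For reference, the four steps quoted below look roughly like this in
HiveQL (a minimal sketch; the table name, columns, and input path are
placeholders, not from the original thread):

```sql
-- 1. Normal table using the textfile format (placeholder schema)
CREATE TABLE events_txt (id BIGINT, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- 2. Load the raw data into the textfile table
LOAD DATA INPATH '/data/events.csv' INTO TABLE events_txt;

-- 3. Same schema, but stored as ORC
CREATE TABLE events_orc (id BIGINT, payload STRING)
STORED AS ORC;

-- 4. Copy the data from the textfile table into the ORC table
INSERT OVERWRITE TABLE events_orc SELECT * FROM events_txt;
```

Step 4 is where the real work happens: Hive reads and parses every text
row and re-encodes it into ORC, so its cost grows with both row count and
the number of columns.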

On Fri, Dec 9, 2016 at 9:08 AM, Joaquin Alzola <joaquin.alz...@lebara.com>
wrote:

> HI List
>
>
>
> The transformation from a textfile table to a table stored as ORC takes
> quite a long time.
>
>
>
> Steps follow:
>
>
>
> 1. Create one normal table using the textfile format
>
> 2. Load the data normally into this table
>
> 3. Create one table with the schema of the expected results of your normal
> Hive table, using stored as ORC
>
> 4. Run an insert overwrite query to copy the data from the textfile table
> to the ORC table
>
>
>
> I have about 1.5 million records, with about 550 fields in each row.
>
>
>
> Doing step 4 takes about 30 minutes (moving from one format to the other).
>
>
>
> I am running Spark standalone with only one worker (same for HDFS), but
> with 25 GB of memory and 14 cores on that worker.
>
>
>
> BR
>
>
>
> Joaquin
>
