Hi Rakesh,
How big are your files, and is the data ordered/sorted by the column on which
you are running distinct? If the column contains empty strings, nulls, and
spaces, Hive treats them all as different values. Converting them to Hive's
native NULL type can help improve performance.
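A minimal sketch of that normalization in HiveQL, assuming a hypothetical table my_table with a string column my_col (neither name is from your setup):

```sql
-- Hypothetical table and column names (my_table, my_col), for illustration only.
-- Map empty and whitespace-only strings to NULL so they are no longer
-- counted as separate distinct values. COUNT(DISTINCT ...) skips NULLs.
SELECT COUNT(DISTINCT
         CASE
           WHEN my_col IS NULL OR trim(my_col) = '' THEN NULL
           ELSE my_col
         END) AS distinct_values
FROM my_table;
```

If the cleanup helps, it may be worth applying it once when the data is loaded rather than in every query.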
Thank you,
As an extension to the problem below, I have noticed something else in Hive
2.1.0.
If I create an *external* table with a specific location and with partitions,
then after renaming a partition, the underlying folder names do not change.
For example:
insert into test_local_part partition (col2=1) values
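One way to check this behaviour could look like the sketch below. The target value col2=2 is an assumption chosen for illustration, not taken from the original test:

```sql
-- Hypothetical rename of a partition (col2=1 to col2=2); the target value is assumed.
ALTER TABLE test_local_part PARTITION (col2=1) RENAME TO PARTITION (col2=2);

-- List the partitions as the metastore now sees them.
SHOW PARTITIONS test_local_part;

-- The Location field in the output shows which folder the partition points to.
DESCRIBE FORMATTED test_local_part PARTITION (col2=2);
```

Comparing the reported Location with the actual folder listing should show whether the directory was renamed along with the partition.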
If you were running the CTAS command from the Hive CLI, you might have noticed that
the headers get printed in the CLI once the query execution is completed. I think
the property hive.cli.print.header is there only to print headers in the
CLI.
Not sure about S3, but I tried the command below, which worked perfectly:
hive -e
Apologies,
my command was:
hive -e "set hive.cli.print.header=true; select * from abc" >> output.txt
Thank you,
*Pushkar Gujar*
On Wed, Mar 8, 2017 at 8:49 PM, Pushkar.Gujar <pushkarvgu...@gmail.com>
wrote:
> if you were running CTAS command from hive CL