Joe McDonnell created IMPALA-8821:
-------------------------------------

             Summary: Dataload for remote clusters should use recover partitions
                 Key: IMPALA-8821
                 URL: https://issues.apache.org/jira/browse/IMPALA-8821
             Project: IMPALA
          Issue Type: Improvement
          Components: Infrastructure
    Affects Versions: Impala 3.3.0
            Reporter: Joe McDonnell


Some test setups have data already in place and only need to run the DDLs to 
sync up the metadata. This corresponds to running 
testdata/bin/create-load-data.sh using a data snapshot but without 
skip_metadata_load.

Right now, for partitioned tables where the partitions are created dynamically 
as part of the insert, generate-schema-statements.py forces a reload:
{noformat}
# Force reloading of the table if the user specified the --force option or
# if the table is partitioned and there was no ALTER section specified. This is to
# ensure the partition metadata is always properly created. The ALTER section is
# used to create partitions, so if that section exists there is no need to force
# reload.
# IMPALA-6579: Also force reload all Kudu tables. The Kudu entity referenced
# by the table may or may not exist, so requiring a force reload guarantees
# that the Kudu entity is always created correctly.
# TODO: Rename the ALTER section to ALTER_TABLE_ADD_PARTITION
force_reload = options.force_reload or (partition_columns and not alter) or \
    file_format == 'kudu'
{noformat}
When the data is already in place, this drops that data and reloads it. 
Instead, we should just run "recover partitions" (ALTER TABLE ... RECOVER 
PARTITIONS) on that table to pick up all the partition information.
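
A rough sketch of what that decision could look like, to make the proposal concrete. This is not the actual change to generate-schema-statements.py: the function names, the data_in_place flag, and the returned tuple are all hypothetical. Only the force_reload condition is taken from the snippet above, and ALTER TABLE ... RECOVER PARTITIONS is the existing Impala statement.

```python
def plan_table_load(partition_columns, alter, file_format,
                    force_reload_option=False, data_in_place=False):
    """Return (force_reload, recover_partitions) for one table.

    Hypothetical helper: data_in_place is an assumed flag meaning the table's
    files already exist (e.g. loaded from a data snapshot).
    """
    # Kudu tables and an explicit --force still always reload (IMPALA-6579).
    if force_reload_option or file_format == 'kudu':
        return True, False

    # Partitioned table with no ALTER section: the partitions would normally
    # be created dynamically by the insert.
    dynamic_partitions = bool(partition_columns) and not alter
    if dynamic_partitions:
        if data_in_place:
            # Keep the data and only rebuild the partition metadata.
            return False, True
        return True, False

    return False, False


def recover_partitions_stmt(db_name, table_name):
    """Build the statement that discovers partitions from existing directories."""
    return "ALTER TABLE {0}.{1} RECOVER PARTITIONS".format(db_name, table_name)
```

With this shape, a partitioned table whose data is already present would get a single RECOVER PARTITIONS statement instead of a drop and reload, while the Kudu and --force cases keep their current behavior.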


