[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2023-04-25 Thread tanghui (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

tanghui updated HIVE-17361:
---
Description: 
引入 ACID 后不支持加载数据。需要填补 ACID 表和常规配置单元表之间的差距。

当前文档位于[DML 
操作|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]和[将文件加载到表中|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

 
 * 加载数据对数据执行非常有限的验证,特别是它使用可能不在 0_0 中的输入文件名,这可能会破坏某些读取逻辑。(当然会酸)。
 * 它不检查文件的架构。这对于 Acid 来说可能不是问题,它需要自描述的 ORC,因此 Schema Evolution 
可以无缝地处理这个问题。(假设架构没有太大不同)。
 * 它会检查 _InputFormat_S 是否兼容。
 * 分桶(并因此排序)表不支持加载数据(但仅当 hive.strict.checks.bucketing=true(默认))。将保留对 Acid 的限制。
 * 加载数据支持 OVERWRITE 子句
 * 文件权限/所有权会发生什么:重命名与复制差异


实施将遵循与中相同的想法HIVE-14988并为 OVERWRITE 子句使用 base_N/ 目录。


minor compaction 如何处理原始文件的 delta/base?
由于 delta_8_8/_meta_data 是在文件移动之前创建的,因此 delta_8_8 在填充之前变得可见。这是一个问题吗?
不是因为 txn 8 没有提交。
h3. 实施说明/限制(补丁 25)
 * 不支持分桶/排序表
 * 输入文件名必须采用 0_0/0_0_copy_1 形式 - 强制执行。(HIVE-18125)
 * 加载数据创建一个包含新文件的 delta_x_x/
 * Load Data w/Overwrite 创建一个包含新文件的 base_x/
 * “_metadata_acid”文件放置在目标目录中以指示它需要在读取时进行特殊处理
 * 输入文件必须是“普通”ORC 文件,即不包含 acid 元数据列,如果这些文件是从另一个 Acid 
表复制的,就会出现这种情况。在后一种情况下,数据中嵌入的 ROW_ID 
在目标表中可能没有意义(例如,如果它在不同的集群中)。此类文件也可能混合了已提交和已中止的数据。
 ** 稍后可以通过向 _metadata_acid 文件添加信息以在读取时忽略现有的 ROW_ID 来放松这一点。
 * ROW_ID 在读取时动态附加,并通过压缩永久保存。这与处理在转换为 Acid 之前写入表的文件的方式相同。
 * 支持矢量化

  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

h3. Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced. 
(HIVE-18125)
* Load Data creates a delta_x_x/ that contains new files
* Load Data w/Overwrite creates a base_x/ that contains new files
* A '_metadata_acid' file is placed in the target directory to indicate it 
requires special handling on read
* The input files must be 'plain' ORC files, i.e. not contain acid metadata 
columns as would be the case if these files were copied from another Acid 
table.  In the latter case, the ROW_IDs embedded in the data may not make sense 
in the target table (if it's in a different cluster, for example).  Such files 
may also have a mix of committed and aborted data.
** this could be relaxed later by adding info to the _metadata_acid file to 
ignore existing ROW_IDs on read.
* ROW_IDs are attached dynamically at read time and made permanent by 
compaction.  This is done the same way has handling of files that were written 
to a table before it was converted to Acid.
* Vectorization is supported




> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.23.patch, HIVE-17361.24.patch, HIVE-17361.25.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> 引入 ACID 后不支持加载数据。需要填补 

[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-12-01 Thread Lefty Leverenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lefty Leverenz updated HIVE-17361:
--
Labels: TODOC3.0  (was: )

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
>  Labels: TODOC3.0
> Fix For: 3.0.0
>
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.23.patch, HIVE-17361.24.patch, HIVE-17361.25.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?
> It's not since txn 8 is not committed.
> h3. Implementation Notes/Limitations (patch 25)
> * bucketed/sorted tables are not supported
> * input files names must be of the form 0_0/0_0_copy_1 - enforced. 
> (HIVE-18125)
> * Load Data creates a delta_x_x/ that contains new files
> * Load Data w/Overwrite creates a base_x/ that contains new files
> * A '_metadata_acid' file is placed in the target directory to indicate it 
> requires special handling on read
> * The input files must be 'plain' ORC files, i.e. not contain acid metadata 
> columns as would be the case if these files were copied from another Acid 
> table.  In the latter case, the ROW_IDs embedded in the data may not make 
> sense in the target table (if it's in a different cluster, for example).  
> Such files may also have a mix of committed and aborted data.
> ** this could be relaxed later by adding info to the _metadata_acid file to 
> ignore existing ROW_IDs on read.
> * ROW_IDs are attached dynamically at read time and made permanent by 
> compaction.  This is done the same way has handling of files that were 
> written to a table before it was converted to Acid.
> * Vectorization is supported



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-30 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

committed to master
thanks Alan for the review

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.23.patch, HIVE-17361.24.patch, HIVE-17361.25.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?
> It's not since txn 8 is not committed.
> h3. Implementation Notes/Limitations (patch 25)
> * bucketed/sorted tables are not supported
> * input files names must be of the form 0_0/0_0_copy_1 - enforced. 
> (HIVE-18125)
> * Load Data creates a delta_x_x/ that contains new files
> * Load Data w/Overwrite creates a base_x/ that contains new files
> * A '_metadata_acid' file is placed in the target directory to indicate it 
> requires special handling on read
> * The input files must be 'plain' ORC files, i.e. not contain acid metadata 
> columns as would be the case if these files were copied from another Acid 
> table.  In the latter case, the ROW_IDs embedded in the data may not make 
> sense in the target table (if it's in a different cluster, for example).  
> Such files may also have a mix of committed and aborted data.
> ** this could be relaxed later by adding info to the _metadata_acid file to 
> ignore existing ROW_IDs on read.
> * ROW_IDs are attached dynamically at read time and made permanent by 
> compaction.  This is done the same way has handling of files that were 
> written to a table before it was converted to Acid.
> * Vectorization is supported



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

h3. Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced. 
(HIVE-18125)
* Load Data creates a delta_x_x/ that contains new files
* Load Data w/Overwrite creates a base_x/ that contains new files
* A '_metadata_acid' files is placed in the target directory to indicate it 
requires special handling on read
* The input files must be 'plain' ORC files, i.e. not contain acid metadata 
columns as would be the case if these files were copied from another Acid 
table.  In the latter case, the ROW_IDs embedded in the data may not make sense 
in the target table (if it's in a different cluster, for example).  Such files 
may also have a mix of committed and aborted data.
** this could be relaxed later by adding info to the _metadata_acid file to 
ignore existing ROW_IDs on read.
* ROW_IDs are attached dynamically at read time and made permanent by 
compaction.  This is done the same way has handling of files that were written 
to a table before it was converted to Acid.
* Vectorization is supported



  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

h3. Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced. 
(HIVE-18125)
* Load Data creates a delta_x_x/ that contains new files
* Load Data w/Overwrite creates a base_x/ that contains new files
* The input files must be 'plain' ORC files, i.e. not contain acid metadata 
columns as would be the case if these files were copied from another Acid 
table.  In the latter case, the ROW_IDs embedded in the data may not make sense 
in the target table (if it's in a different cluster, for example).  Such files 
may also have a mix of committed and aborted 

[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

h3. Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced. 
(HIVE-18125)
* Load Data creates a delta_x_x/ that contains new files
* Load Data w/Overwrite creates a base_x/ that contains new files
* A '_metadata_acid' file is placed in the target directory to indicate it 
requires special handling on read
* The input files must be 'plain' ORC files, i.e. not contain acid metadata 
columns as would be the case if these files were copied from another Acid 
table.  In the latter case, the ROW_IDs embedded in the data may not make sense 
in the target table (if it's in a different cluster, for example).  Such files 
may also have a mix of committed and aborted data.
** this could be relaxed later by adding info to the _metadata_acid file to 
ignore existing ROW_IDs on read.
* ROW_IDs are attached dynamically at read time and made permanent by 
compaction.  This is done the same way has handling of files that were written 
to a table before it was converted to Acid.
* Vectorization is supported



  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

h3. Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced. 
(HIVE-18125)
* Load Data creates a delta_x_x/ that contains new files
* Load Data w/Overwrite creates a base_x/ that contains new files
* A '_metadata_acid' files is placed in the target directory to indicate it 
requires special handling on read
* The input files must be 'plain' ORC files, i.e. not contain acid metadata 
columns as would be the case if these files were copied from another Acid 
table.  In the latter case, the ROW_IDs embedded in the data may not make sense 
in the target 

[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

h3. Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced. 
(HIVE-18125)
* Load Data creates a delta_x_x/ that contains new files
* Load Data w/Overwrite creates a base_x/ that contains new files
* The input files must be 'plain' ORC files, i.e. not contain acid metadata 
columns as would be the case if these files were copied from another Acid 
table.  In the latter case, the ROW_IDs embedded in the data may not make sense 
in the target table (if it's in a different cluster, for example).  Such files 
may also have a mix of committed and aborted data.
** this could be relaxed later by adding info to the _metadata_acid file to 
ignore existing ROW_IDs on read.
* ROW_IDs are attached dynamically at read time and made permanent by 
compaction.  This is done the same way has handling of files that were written 
to a table before it was converted to Acid.
* Vectorization is supported



  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced.




> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> 

[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?
It's not since txn 8 is not committed.

Implementation Notes/Limitations (patch 25)
* bucketed/sorted tables are not supported
* input files names must be of the form 0_0/0_0_copy_1 - enforced.



  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?



> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.23.patch, HIVE-17361.24.patch, HIVE-17361.25.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming 

[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-28 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.25.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.23.patch, HIVE-17361.24.patch, HIVE-17361.25.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.24.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.23.patch, HIVE-17361.24.patch, HIVE-17361.3.patch, 
> HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.23.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.23.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: (was: HIVE-17361.22.patch)

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-22 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.22.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.22.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.21.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.21.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.20.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.20.patch, HIVE-17361.3.patch, 
> HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-21 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.19.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.19.patch, 
> HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-20 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.17.patch

patch 17 has vectorized code path and some test fixes

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.17.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-17 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.16.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.16.patch, HIVE-17361.2.patch, HIVE-17361.3.patch, 
> HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-16 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.14.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.14.patch, 
> HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?
> Since delta_8_8/_meta_data is created before files are moved, delta_8_8 
> becomes visible before it's populated.  Is that an issue?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?
Since delta_8_8/_meta_data is created before files are moved, delta_8_8 becomes 
visible before it's populated.  Is that an issue?


  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?



> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle 

[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-15 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.12.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.12.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.11.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.11.patch, HIVE-17361.2.patch, HIVE-17361.3.patch, 
> HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.10.patch

10 is the same as 9 - build bot lost it again

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.10.patch, 
> HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-13 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.09.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.09.patch, HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.08.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.08.patch, 
> HIVE-17361.1.patch, HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.

\\
How is minor compaction going to handle delta/base with original files?


  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.
\\
How is minor compaction going to handle delta/base with original files?



> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.1.patch, 
> HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.
\\
How is minor compaction going to handle delta/base with original files?


  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.



> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.1.patch, 
> HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.
> \\
> How is minor compaction going to handle delta/base with original files?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Status: Patch Available  (was: Open)

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.1.patch, 
> HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Attachment: HIVE-17361.07.patch

HIVE-17361.07.patch - WIP

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.07.patch, HIVE-17361.1.patch, 
> HIVE-17361.2.patch, HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-03 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences

\\
The implementation will follow the same idea as in HIVE-14988 and use a base_N/ 
dir for OVERWRITE clause.


  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences




> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences
> \\
> The implementation will follow the same idea as in HIVE-14988 and use a 
> base_N/ dir for OVERWRITE clause.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-02 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause
* What happens to file permissions/ownership: rename vs copy differences



  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause




> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause
> * What happens to file permissions/ownership: rename vs copy differences



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data (but only if 
hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
Acid.
* Load Data supports OVERWRITE clause



  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data.  Will keep this 
restriction for Acid.
* Load Data supports OVERWRITE clause




> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data (but only if 
> hive.strict.checks.bucketing=true (default)).  Will keep this restriction for 
> Acid.
> * Load Data supports OVERWRITE clause



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under [DML 
Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
 and [Loading files into 
tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:

\\
* Load Data performs very limited validations of the data, in particular it 
uses the input file name which may not be in 0_0 which can break some read 
logic.  (Certainly will for Acid).
* It does not check the schema of the file.  This may be a non issue for Acid 
which requires ORC which is self describing so Schema Evolution may handle this 
seamlessly.  (Assuming Schema is not too different).
* It does check that _InputFormat_S are compatible. 
* Bucketed (and thus sorted) tables don't support Load Data.  Will keep this 
restriction for Acid.
* Load Data supports OVERWRITE clause



  was:
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under : 


> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under [DML 
> Operations|https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-DMLOperations]
>  and [Loading files into 
> tables|https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Loadingfilesintotables]:
> \\
> * Load Data performs very limited validations of the data, in particular it 
> uses the input file name which may not be in 0_0 which can break some 
> read logic.  (Certainly will for Acid).
> * It does not check the schema of the file.  This may be a non issue for Acid 
> which requires ORC which is self describing so Schema Evolution may handle 
> this seamlessly.  (Assuming Schema is not too different).
> * It does check that _InputFormat_S are compatible. 
> * Bucketed (and thus sorted) tables don't support Load Data.  Will keep this 
> restriction for Acid.
> * Load Data supports OVERWRITE clause



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-11-01 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Description: 
LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
between ACID table and regular hive table.

Current Documentation is under : 

  was:LOAD DATA was not supported since ACID was introduced. Need to fill this 
gap between ACID table and regular hive table.


> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.
> Current Documentation is under : 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-10-27 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Issue Type: New Feature  (was: Bug)

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-09-29 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Priority: Critical  (was: Major)

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Eugene Koifman
>Priority: Critical
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-09-11 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17361:
--
Status: Open  (was: Patch Available)

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-08-31 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-17361:
-
Attachment: HIVE-17361.4.patch

The build bot failed for patch 3.
Made an identical patch 4 to trigger precommit test.

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch, HIVE-17361.4.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-08-31 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-17361:
-
Attachment: HIVE-17361.3.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch, 
> HIVE-17361.3.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-08-30 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-17361:
-
Attachment: HIVE-17361.2.patch

patch 2 with a different approach

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-17361.1.patch, HIVE-17361.2.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-08-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-17361:
-
Attachment: HIVE-17361.1.patch

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-17361.1.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17361) Support LOAD DATA for transactional tables

2017-08-22 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-17361:
-
Status: Patch Available  (was: Open)

> Support LOAD DATA for transactional tables
> --
>
> Key: HIVE-17361
> URL: https://issues.apache.org/jira/browse/HIVE-17361
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-17361.1.patch
>
>
> LOAD DATA was not supported since ACID was introduced. Need to fill this gap 
> between ACID table and regular hive table.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)