
[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-08 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

The failures are unrelated; pushed to master.

 

Thanks [~sershe] and [~ashutoshc] for the reviews.

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.11.patch, HIVE-18350.12.patch, HIVE-18350.13.patch, 
> HIVE-18350.14.patch, HIVE-18350.15.patch, HIVE-18350.16.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files whose names end in _0, such as 000000_0 and
> 000001_0. However, LOAD DATA keeps the input file name. The resulting
> inconsistent naming convention makes SMB (sort-merge-bucket) joins difficult
> in some scenarios and may cause trouble for other types of queries in the
> future. We need a consistent naming convention.
> For non-bucketed tables, Hive renames all the files regardless of how the
> user named them.
> For bucketed tables in non-strict mode, Hive relies on the user to name the
> files to match the buckets, and it assumes that all the data in a file
> belongs to the same bucket. In strict mode, loading data into bucketed tables
> is disabled.
> This change will likely affect most of the tests that load data, which is a
> significant amount of work, so it is divided into two subtasks for a smoother
> merge.
> For existing tables in customer databases, it is recommended to reload
> bucketed tables. Otherwise, if a customer runs an SMB join and there is a
> bucket for which there is no split, the join may return incorrect results.
> This is not a regression, however, as it would happen even without the patch.
> With this patch, and after reloading the data, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in
> behavior and reloading data is not needed.
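The renaming rule in the description can be sketched as follows. This is a minimal illustration, not Hive's actual LOAD DATA code: it assumes insert-style names are a six-digit, zero-padded number followed by `_0`, and the helper names `insert_style_name` and `plan_load_renames` are hypothetical. Non-bucketed loads rename every file; bucketed loads require the user's file name to identify the bucket.

```python
import re

# Assumption: insert-style names are a six-digit, zero-padded task/bucket
# number followed by an attempt suffix, e.g. 0 -> "000000_0".
def insert_style_name(task_id: int) -> str:
    return f"{task_id:06d}_0"

# Matches names that start with a number and "_<digits>", capturing the
# number, which this sketch treats as the bucket id ("000003_0" -> 3).
_BUCKET_NAME = re.compile(r"^(\d+)_\d+")

def plan_load_renames(input_files, num_buckets=None):
    """Map each input file name to the name it would get in the table.

    Non-bucketed table (num_buckets is None): every file is renamed, in
    sorted order, regardless of what the user called it.
    Bucketed table: the user must already name each file after its
    bucket, and the whole file is assumed to belong to that bucket.
    Files already present in the table directory are ignored here.
    """
    if num_buckets is None:
        return {src: insert_style_name(i)
                for i, src in enumerate(sorted(input_files))}

    plan = {}
    used = set()
    for src in input_files:
        m = _BUCKET_NAME.match(src)
        if m is None:
            raise ValueError(f"{src!r} does not name a bucket")
        bucket = int(m.group(1))
        if bucket >= num_buckets:
            raise ValueError(f"{src!r} is bucket {bucket}, table has {num_buckets}")
        if bucket in used:
            raise ValueError(f"more than one file for bucket {bucket}")
        used.add(bucket)
        plan[src] = insert_style_name(bucket)
    return plan
```

For example, `plan_load_renames(["sales.csv", "more.csv"])` yields `{"more.csv": "000000_0", "sales.csv": "000001_0"}`. Sorting the inputs first keeps the non-bucketed renaming deterministic across runs.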

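The reload recommendation above can be illustrated with a hypothetical pre-join check: given the file names in a bucketed table's directory, list the bucket ids that have no file, which is the situation the description says may lead to incorrect SMB join results. The function `missing_buckets` and the name pattern it relies on are assumptions for illustration only; Hive does not expose such a helper.

```python
import re

# Assumed pattern: a file belongs to the bucket given by its leading
# number, as in insert-style names like "000003_0".
_BUCKET_NAME = re.compile(r"^(\d+)_\d+")

def missing_buckets(file_names, num_buckets):
    """Return (bucket ids with no file, file names with no bucket id).

    Files whose names do not follow the pattern are reported separately,
    because their bucket cannot be inferred from the name alone.
    """
    seen = set()
    unnamed = []
    for name in file_names:
        m = _BUCKET_NAME.match(name)
        if m:
            seen.add(int(m.group(1)))
        else:
            unnamed.append(name)
    missing = [b for b in range(num_buckets) if b not in seen]
    return missing, sorted(unnamed)
```

For instance, `missing_buckets(["000000_0", "000002_0", "data.csv"], 4)` returns `([1, 3], ["data.csv"])`, flagging two empty buckets and one file loaded under a user-chosen name.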


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-07 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: (was: HIVE-18350.16.patch)

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-07 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.16.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-07 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.16.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-06 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.15.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-05 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.14.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-03 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.13.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-02 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.12.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-02 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.11.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-02 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: (was: HIVE-18350.11.patch)

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-02 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.11.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.10.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: (was: HIVE-18350.10.patch)

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.10.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch
>
>
> Insert statements create files whose names end with _0, 0001_0, etc. 
> However, LOAD DATA keeps the input file names. The resulting inconsistent 
> naming convention makes SMB (sort-merge-bucket) joins difficult in some 
> scenarios and may cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For non-bucketed tables, Hive renames all the files regardless of how the 
> user named them.
> For bucketed tables in non-strict mode, Hive relies on the user to name the 
> files to match the buckets, and assumes that all the data in a file belongs 
> to the same bucket. In strict mode, loading a bucketed table is disabled.
> This will likely affect most of the tests that load data, which makes this a 
> significant change, so it is further divided into two subtasks for a 
> smoother merge.
> For existing tables in customer databases, reloading bucketed tables is 
> recommended. Otherwise, if a customer runs an SMB join and there is a bucket 
> with no split, the join may return incorrect results. However, this is not a 
> regression, as it would happen even without the patch.
> With this patch, and after reloading the data, the results should be correct.
> For non-bucketed tables and external tables, there is no difference in 
> behavior and reloading data is not needed.
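To make the SMB failure mode concrete, here is a toy Python model; it is not Hive code. It assumes (the thread does not say this) that when file names carry no bucket id, files are paired with buckets by their position in a sorted directory listing, and that insert-style names look like `000002_0` with a zero-padded bucket id. Under those assumptions, a bucket with no file shifts every later file onto the wrong bucket, while reading the bucket id from the name keeps the gap where it belongs. Because an SMB join pairs each bucket of one table with the corresponding bucket of the other, a shifted mapping can silently drop matches.

```python
import re

# Assumed naming pattern for illustration only: a zero-padded bucket id
# followed by "_<n>", e.g. "000002_0". Not taken from Hive's source.
BUCKET_FILE = re.compile(r"^(\d+)_\d+$")


def map_by_position(files, num_buckets):
    """Toy model: pair the i-th file in sorted order with bucket i."""
    ordered = sorted(files)
    return {b: ordered[b] if b < len(ordered) else None for b in range(num_buckets)}


def map_by_name(files, num_buckets):
    """Pair each file with the bucket id encoded in its name."""
    mapping = {b: None for b in range(num_buckets)}
    for name in files:
        match = BUCKET_FILE.match(name)
        if match is None:
            raise ValueError(f"file name does not encode a bucket id: {name!r}")
        bucket = int(match.group(1))
        if bucket >= num_buckets:
            raise ValueError(f"bucket {bucket} out of range for {num_buckets} buckets")
        mapping[bucket] = name
    return mapping


# Bucket 1 received no rows, so no file exists for it.
files = ["000000_0", "000002_0", "000003_0"]
by_position = map_by_position(files, 4)
by_name = map_by_name(files, 4)
print(by_position)  # bucket 1 is wrongly paired with "000002_0"
print(by_name)      # bucket 1 is correctly left empty
```

In this model, renaming loaded files to names that encode the bucket id is what lets the reader recover the right pairing, which matches the advice above to reload existing bucketed tables.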



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: (was: HIVE-18350.10.patch)

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.10.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.10.patch, 
> HIVE-18350.2.patch, HIVE-18350.3.patch, HIVE-18350.4.patch, 
> HIVE-18350.5.patch, HIVE-18350.6.patch, HIVE-18350.7.patch, 
> HIVE-18350.8.patch, HIVE-18350.9.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-02-01 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.9.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, 
> HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch, HIVE-18350.9.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-31 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.8.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, 
> HIVE-18350.6.patch, HIVE-18350.7.patch, HIVE-18350.8.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-31 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.7.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, 
> HIVE-18350.6.patch, HIVE-18350.7.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-17 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.6.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch, HIVE-18350.6.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-16 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.5.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch, HIVE-18350.5.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-16 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Description: 
Insert statements create files whose names end with _0, 0001_0, etc. 
However, LOAD DATA keeps the input file names. The resulting inconsistent 
naming convention makes SMB joins difficult in some scenarios and may cause 
trouble for other types of queries in the future.

We need a consistent naming convention.

For non-bucketed tables, Hive renames all the files regardless of how the user 
named them.
For bucketed tables in non-strict mode, Hive relies on the user to name the 
files to match the buckets, and assumes that all the data in a file belongs to 
the same bucket. In strict mode, loading a bucketed table is disabled.

This will likely affect most of the tests that load data, which makes this a 
significant change, so it is further divided into two subtasks for a smoother 
merge.

For existing tables in customer databases, reloading bucketed tables is 
recommended. Otherwise, if a customer runs an SMB join and there is a bucket 
with no split, the join may return incorrect results. However, this is not a 
regression, as it would happen even without the patch.
With this patch, and after reloading the data, the results should be correct.

For non-bucketed tables and external tables, there is no difference in behavior 
and reloading data is not needed.

  was:
Insert statements create files whose names end with _0, 0001_0, etc. 
However, LOAD DATA keeps the input file names. The resulting inconsistent 
naming convention makes SMB joins difficult in some scenarios and may cause 
trouble for other types of queries in the future.

We need a consistent naming convention.


For non-bucketed tables, Hive renames all the files regardless of how the user 
named them.
For bucketed tables in non-strict mode, Hive relies on the user to name the 
files to match the buckets, and assumes that all the data in a file belongs to 
the same bucket. In strict mode, loading a bucketed table is disabled.

This will likely affect most of the tests that load data, which makes this a 
significant change, so it is further divided into two subtasks for a smoother 
merge.


> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-16 Thread Jason Dere (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-18350:
--
Hadoop Flags: Incompatible change

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch
>
>
> Insert statements create files whose names end with _0, 0001_0, etc. 
> However, LOAD DATA keeps the input file names. The resulting inconsistent 
> naming convention makes SMB joins difficult in some scenarios and may cause 
> trouble for other types of queries in the future.
> We need a consistent naming convention.
> For non-bucketed tables, Hive renames all the files regardless of how the 
> user named them.
> For bucketed tables in non-strict mode, Hive relies on the user to name the 
> files to match the buckets, and assumes that all the data in a file belongs 
> to the same bucket. In strict mode, loading a bucketed table is disabled.
> This will likely affect most of the tests that load data, which makes this a 
> significant change, so it is further divided into two subtasks for a 
> smoother merge.
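The renaming rules in the description can be summarized as a small decision procedure. The Python sketch below is hypothetical: the function name `plan_load_renames`, the `<6-digit id>_0` target pattern, and the rule that a bucketed file's leading digits give its bucket id are all assumptions made for illustration, not details taken from the patch.

```python
import re


def plan_load_renames(src_files, bucketed, strict, num_buckets=None):
    """Return a {source file: target file} rename plan for a LOAD DATA call.

    Hypothetical sketch of the rules described in HIVE-18350, not Hive's API.
    Target names use an assumed "<6-digit id>_0" pattern.
    """
    if not bucketed:
        # Non-bucketed: rename every file, ignoring the user's names.
        return {src: f"{i:06d}_0" for i, src in enumerate(sorted(src_files))}
    if strict:
        # Strict mode: loading a bucketed table is disabled.
        raise PermissionError("LOAD DATA into a bucketed table is disabled in strict mode")
    if num_buckets is None:
        raise ValueError("num_buckets is required for a bucketed table")
    plan = {}
    for src in src_files:
        # Non-strict: trust the user's name. The leading digits are assumed to
        # be the bucket id, and every row in the file is assumed to belong to it.
        match = re.match(r"\d+", src)
        if match is None or int(match.group()) >= num_buckets:
            raise ValueError(f"cannot derive a valid bucket id from {src!r}")
        target = f"{int(match.group()):06d}_0"
        if target in plan.values():
            # Simplification: reject two files for one bucket instead of
            # guessing how Hive would name the second one.
            raise ValueError(f"more than one file maps to bucket file {target}")
        plan[src] = target
    return plan


print(plan_load_renames(["sales_b.csv", "sales_a.csv"], bucketed=False, strict=False))
# {'sales_a.csv': '000000_0', 'sales_b.csv': '000001_0'}
print(plan_load_renames(["1_part.txt", "0_part.txt"], bucketed=True, strict=False, num_buckets=2))
# {'1_part.txt': '000001_0', '0_part.txt': '000000_0'}
```

Sorting the source names before numbering keeps the non-bucketed plan deterministic; the actual patch may order or number files differently.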




[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-16 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.4.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-16 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: (was: HIVE-18350.4.patch)

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-16 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.4.patch

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>Priority: Major
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch, HIVE-18350.4.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-12 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.3.patch

Re-uploading the patch after a build failure.

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch, 
> HIVE-18350.3.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-12 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.2.patch

Patch with all the code changes and new tests.

> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-06 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Description: 
Insert statements create files whose names end with _0, 0001_0, etc. 
However, LOAD DATA keeps the input file names. The resulting inconsistent 
naming convention makes SMB joins difficult in some scenarios and may cause 
trouble for other types of queries in the future.

We need a consistent naming convention.


For non-bucketed tables, Hive renames all the files regardless of how the user 
named them.
For bucketed tables in non-strict mode, Hive relies on the user to name the 
files to match the buckets, and assumes that all the data in a file belongs to 
the same bucket. In strict mode, loading a bucketed table is disabled.

This will likely affect most of the tests that load data, which makes this a 
significant change, so it is further divided into two subtasks for a smoother 
merge.

  was:
Insert statements create files whose names end with _0, 0001_0, etc. 
However, LOAD DATA keeps the input file names. The resulting inconsistent 
naming convention makes SMB joins difficult in some scenarios and may cause 
trouble for other types of queries in the future.

We need a consistent naming convention.


For non-bucketed tables, Hive renames all the files regardless of how the user 
named them.
For bucketed tables in non-strict mode, Hive relies on the user to name the 
files to match the buckets, and assumes that all the data in a file belongs to 
the same bucket. In strict mode, loading a bucketed table is disabled.

This will likely affect most of the tests that load data, which makes this a 
significant change.


> load data should rename files consistent with insert statements
> ---
>
> Key: HIVE-18350
> URL: https://issues.apache.org/jira/browse/HIVE-18350
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18350.1.patch

[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-04 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-04 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Attachment: HIVE-18350.1.patch

Only contains changes for bucketed tables.
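The bucketed-table rule from the description (in non-strict mode, each file's name must identify its bucket; in strict mode the load is refused) can be sketched in Java as follows. This is not the code in HIVE-18350.1.patch. The class and method names are hypothetical, and the accepted name pattern (digits, an underscore, then digits, such as 000003_0 for bucket 3) is an assumption for illustration.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not Hive's actual implementation.
final class BucketFileNameSketch {

    // Assumption: a bucket file name is the bucket id in digits, "_", and a
    // numeric suffix. At most 9 digits keeps Integer.parseInt from overflowing.
    private static final Pattern BUCKET_FILE = Pattern.compile("(\\d{1,9})_\\d+");

    // Returns the bucket id encoded in the file name, or empty if the name
    // does not follow the assumed pattern.
    static OptionalInt bucketIdOf(String fileName) {
        Matcher m = BUCKET_FILE.matcher(fileName);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }

    // Strict mode: loading into a bucketed table is rejected outright.
    // Non-strict mode: accept the file only if its name maps to an existing
    // bucket. File contents are not checked; as in the description, all rows
    // in a file are assumed to belong to the bucket its name indicates.
    static boolean acceptForBucketedLoad(String fileName, int numBuckets, boolean strictMode) {
        if (strictMode) {
            return false;
        }
        OptionalInt id = bucketIdOf(fileName);
        return id.isPresent() && id.getAsInt() < numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(acceptForBucketedLoad("000001_0", 4, false)); // true
        System.out.println(acceptForBucketedLoad("kv1.txt", 4, false));  // false
        System.out.println(acceptForBucketedLoad("000001_0", 4, true));  // false
    }
}
```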


[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-04 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Description: 
Insert statements create files with names ending in _0, 0001_0, etc. 
However, LOAD DATA uses the input file name. This results in an inconsistent 
naming convention, which makes SMB joins difficult in some scenarios and may 
cause trouble for other types of queries in the future.

We need a consistent naming convention.


For non-bucketed tables, Hive renames all the files regardless of how they were 
named by the user.
For bucketed tables in non-strict mode, Hive relies on the user to name the 
files so that they match the buckets. Hive assumes that all the data in a file 
belongs to the same bucket. In strict mode, loading bucketed tables is disabled.

This will likely affect most of the tests that load data, so the impact is 
significant.

  was:
Insert statements create files with names ending in _0, 0001_0, etc. 
However, LOAD DATA uses the input file name. This results in an inconsistent 
naming convention, which makes SMB joins difficult in some scenarios and may 
cause trouble for other types of queries in the future.

We need a consistent naming convention.


For non-bucketed tables, Hive renames all the files regardless of how they were 
named by the user.
For bucketed tables in non-strict mode, Hive relies on the user to name the 
files so that they match the buckets. Hive assumes that all the data in a file 
belongs to the same bucket. In strict mode, loading bucketed tables is disabled.




[jira] [Updated] (HIVE-18350) load data should rename files consistent with insert statements

2018-01-03 Thread Deepak Jaiswal (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18350:
--
Description: 
Insert statements create files with names ending in _0, 0001_0, etc. 
However, LOAD DATA uses the input file name. This results in an inconsistent 
naming convention, which makes SMB joins difficult in some scenarios and may 
cause trouble for other types of queries in the future.

We need a consistent naming convention.


For non-bucketed tables, Hive renames all the files regardless of how they were 
named by the user.
For bucketed tables in non-strict mode, Hive relies on the user to name the 
files so that they match the buckets. Hive assumes that all the data in a file 
belongs to the same bucket. In strict mode, loading bucketed tables is disabled.


  was:
Insert statements create files with names ending in _0, 0001_0, etc. 
However, LOAD DATA uses the input file name. This results in an inconsistent 
naming convention, which makes SMB joins difficult in some scenarios and may 
cause trouble for other types of queries in the future.

We need a consistent naming convention.


