[jira] Updated: (HIVE-1950) Block merge for RCFile

2011-02-23 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1950:
---

Attachment: HIVE-1950.5.patch

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch, 
 HIVE-1950.4.patch, HIVE-1950.5.patch


 In our environment, there are many small files inside a single 
 partition/table. To reduce the namenode load, we run a dedicated housekeeping 
 job to merge these files. Right now the merge is an 'insert overwrite' in 
 Hive, which requires decompressing and then recompressing the data. This jira 
 adds a command in Hive that does the merge without decompressing and 
 recompressing the data, something like 
 alter table tbl_name [partition ()] merge files. In this jira the new command 
 will only support RCFile, since it needs some new APIs in the file format.
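
 To illustrate the idea of merging at the block level, here is a minimal 
 sketch in Python. It is not the patch's code and does not use the RCFile 
 layout: the toy format (a 4-byte length followed by a zlib-compressed 
 payload) and the helper names write_blocks, iter_raw_blocks, merge_files and 
 read_rows are all assumptions made for illustration. A real RCFile also 
 carries a file header and per-block metadata that a merge would have to 
 handle. The point is only that merge_files copies compressed blocks as opaque 
 bytes and never calls the decompressor.

```python
import io
import struct
import zlib


def write_blocks(rows_per_block):
    """Build a toy 'file': each block is a 4-byte length + zlib payload."""
    out = io.BytesIO()
    for rows in rows_per_block:
        payload = zlib.compress("\n".join(rows).encode("utf-8"))
        out.write(struct.pack(">I", len(payload)))
        out.write(payload)
    return out.getvalue()


def iter_raw_blocks(data):
    """Yield each compressed payload as opaque bytes, without decompressing."""
    pos = 0
    while pos < len(data):
        (size,) = struct.unpack_from(">I", data, pos)
        pos += 4
        yield data[pos:pos + size]
        pos += size


def merge_files(files):
    """Block merge: copy compressed blocks verbatim into one output file."""
    out = io.BytesIO()
    for data in files:
        for block in iter_raw_blocks(data):
            out.write(struct.pack(">I", len(block)))
            out.write(block)  # still compressed; no zlib call here
    return out.getvalue()


def read_rows(data):
    """Decompress every block; used only to check the merged result."""
    return [row
            for block in iter_raw_blocks(data)
            for row in zlib.decompress(block).decode("utf-8").split("\n")]


# Two hypothetical small files in the same partition.
small_a = write_blocks([["a1", "a2"]])
small_b = write_blocks([["b1"], ["b2", "b3"]])

merged = merge_files([small_a, small_b])
print(read_rows(merged))  # ['a1', 'a2', 'b1', 'b2', 'b3']
```

 The 'insert overwrite' approach would instead decompress every row and 
 compress it again on write, which is the cost this jira wants to avoid.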

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1950) Block merge for RCFile

2011-02-09 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1950:
---

Attachment: HIVE-1950.3.patch

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch






[jira] Updated: (HIVE-1950) Block merge for RCFile

2011-02-08 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1950:
---

Attachment: HIVE-1950.2.patch

A new patch that addresses the review comments.

A few items, including the stat update, will go into a follow-up.

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch






[jira] Updated: (HIVE-1950) Block merge for RCFile

2011-02-08 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1950:
---

Status: Patch Available  (was: Open)

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch






[jira] Updated: (HIVE-1950) Block merge for RCFile

2011-02-03 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-1950:
---

Attachment: HIVE-1950.1.patch

A patch for review.

The code is now fairly clean. Comments on how to make it cleaner are 
welcome!

 Block merge for RCFile
 --

 Key: HIVE-1950
 URL: https://issues.apache.org/jira/browse/HIVE-1950
 Project: Hive
  Issue Type: New Feature
Reporter: He Yongqiang
Assignee: He Yongqiang
 Attachments: HIVE-1950.1.patch

