[jira] [Commented] (HIVE-1950) Block merge for RCFile

2014-10-16 Thread Lefty Leverenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174764#comment-14174764
 ] 

Lefty Leverenz commented on HIVE-1950:
--

Doc note:  [~prasanth_j] documented this in the wiki here:

* [DDL -- Alter Table/Partition Concatenate | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/PartitionConcatenate]

> Block merge for RCFile
> --
>
> Key: HIVE-1950
> URL: https://issues.apache.org/jira/browse/HIVE-1950
> Project: Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>Assignee: He Yongqiang
> Fix For: 0.8.0
>
> Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch, 
> HIVE-1950.4.patch, HIVE-1950.5.patch, HIVE-1950.6.patch
>
>
> In our env, there are a lot of small files inside one partition/table. In 
> order to reduce the namenode load, we have one dedicated housekeeping job 
> running to merge these file. Right now the merge is an 'insert overwrite' in 
> hive, and requires decompress the data and compress it. This jira is to add a 
> command in Hive to do the merge without decompress and recompress the data.
> Something like "alter table tbl_name [partition ()] merge files". In this 
> jira the new command will only support RCFile, since there need some new APIs 
> to the fileformat.

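The idea in the issue description above, merging files without decompressing and recompressing them, can be illustrated with an analogy. The sketch below uses gzip rather than RCFile: concatenating two gzip members byte for byte yields a stream that still decompresses to the combined content. The class and method names are made up for this example, and nothing here reflects how the HIVE-1950 patch actually copies RCFile blocks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Analogy only: uses gzip members, not RCFile blocks. All names are hypothetical. */
final class ConcatWithoutRecompressSketch {
    private ConcatWithoutRecompressSketch() {}

    /** Compresses text into a single gzip member (stands in for one small file). */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    /** "Merges" compressed inputs by copying their bytes verbatim; no codec is invoked. */
    static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] part : parts) {
            out.write(part, 0, part.length);
        }
        return out.toByteArray();
    }

    /** Decompresses a stream that may contain several concatenated gzip members. */
    static String gunzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The merge step here is a plain byte copy, which is why it avoids the CPU cost that an 'insert overwrite' rewrite would pay.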


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-23 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998583#comment-12998583
 ] 

He Yongqiang commented on HIVE-1950:


It's a typo, and I fixed it in the new patch.

HIVE_STATS_ATOMIC is an existing configuration variable for stats.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-23 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998582#comment-12998582
 ] 

Ning Zhang commented on HIVE-1950:
--

@Ted, thanks. Yongqiang has addressed the typo in his latest patch. 

+1 Will commit if tests pass. 




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-23 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998572#comment-12998572
 ] 

Ted Yu commented on HIVE-1950:
--

This may confuse somebody:
{code}
+  boolean automatic = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_STATS_ATOMIC);
{code}





[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-23 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998522#comment-12998522
 ] 

He Yongqiang commented on HIVE-1950:


A new patch, based on trunk, that integrates Ning's last comments.




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-22 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12998060#comment-12998060
 ] 

He Yongqiang commented on HIVE-1950:


Will update with a new patch after 1517.




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-15 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12995112#comment-12995112
 ] 

Ning Zhang commented on HIVE-1950:
--

2nd Round:
==========
QTestUtil:
  Same as my previous comment: revert the change if it belongs to another JIRA.

StatsTask:
  Lines 274, 310: you may not need the updateOnly variable in StatsWork. Instead
you can just check HiveConf.ConfVars.HIVE_STATS_ATOMIC.

CombineHiveKey.java:
  Missing Apache license header.

RCFileMergeMapper.java:
  The jobClose() function should handle the exceptional case where abort is
true (similar to what FileSinkOperator does), or where an exception was thrown
from the Hadoop layer but close(abort=true) was not called.
  Also, in jobClose(), the partition's old directory is first moved to a backup
directory and then the intermediate directory is moved to the partition's
destination directory. All of this is done while the partition is online (other
queries can read the partition's directory). You may want to create a follow-up
JIRA to take the partition offline during the move.
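
The two-step move described for jobClose() can be sketched as follows. This is a minimal illustration, assuming local paths and java.nio.file as a stand-in for the Hadoop FileSystem API; the class and method names are hypothetical and are not taken from the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper, not part of Hive; local-filesystem stand-in for the move in jobClose(). */
final class PartitionSwapSketch {
    private PartitionSwapSketch() {}

    /**
     * Moves dest to backup, then moves merged into dest.
     * If the second move fails, the backup is moved back so dest keeps its old data.
     * Between the two moves dest does not exist, which is the window a
     * concurrent reader of the partition could observe.
     */
    static void swap(Path merged, Path dest, Path backup) throws IOException {
        Files.move(dest, backup);        // step 1: old data out of the way
        try {
            Files.move(merged, dest);    // step 2: merged data into place
        } catch (IOException e) {
            Files.move(backup, dest);    // roll back to the original contents
            throw e;
        }
    }
}
```

The gap between step 1 and step 2 is the reason a follow-up to take the partition offline during the move is suggested.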






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-14 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994616#comment-12994616
 ] 

He Yongqiang commented on HIVE-1950:


The QTestUtil.java change is not related to this JIRA; a new one should be opened for it.

>>jobExecHelper is constructed in both the constructors and initialize(). Is 
>>there a reason?
This is because the existing code may use ExecDriver and may not call 
initialize() (like ExecDriver's main()).

>>checkFatalError: why removed some code?
No code is removed, just some code is moved to jobExecHelper.




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-14 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994610#comment-12994610
 ] 

Ning Zhang commented on HIVE-1950:
--

Yongqiang, I'm still reviewing the new patch (.4) but found that some of my comments
have not been addressed (e.g., QTestUtil). Can you elaborate on which comments have
been addressed and which have not (and the reasons)?




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-11 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993852#comment-12993852
 ] 

Ning Zhang commented on HIVE-1950:
--

Yongqiang, the patch doesn't compile. Below are some initial review comments from me:

QTestUtil.java:
 334: you may want to add the index tables that you want to keep to
srcTables. Otherwise indexes that are created inside a test will not be
cleaned up, which is a side effect.

StatsTask:
 A StatsTask is added in DDLSemanticAnalyzer for the merge task, but why is it set
to do nothing?

ExecDriver:
 jobExecHelper is constructed in both the constructors and initialize(). Is
there a reason?

 checkFatalError: why was some code removed?

 Why remove METASTOREPWD?

DDLTask:
 Move semantic checking (index and archive checking, etc.) to
DDLSemanticAnalyzer. Execution should only raise an exception for
runtime errors. In other words, the explain plan of the query should throw an
exception if there are indexes or the table is archived.
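
The DDLTask point above can be sketched as a check that runs during analysis rather than execution. All types below are hypothetical stand-ins written for this example (Hive's real DDLSemanticAnalyzer and SemanticException have different shapes); the sketch only shows where the check sits.

```java
/** Hypothetical stand-ins for illustration; Hive's real classes look different. */
final class MergeAnalysisSketch {
    private MergeAnalysisSketch() {}

    /** Stand-in for a compile-time (analysis) error. */
    static final class SemanticException extends Exception {
        SemanticException(String message) {
            super(message);
        }
    }

    /** Minimal table metadata used only by this sketch. */
    static final class TableInfo {
        final String name;
        final boolean hasIndexes;
        final boolean archived;

        TableInfo(String name, boolean hasIndexes, boolean archived) {
            this.name = name;
            this.hasIndexes = hasIndexes;
            this.archived = archived;
        }
    }

    /**
     * Runs in the analysis phase, before any task executes, so an
     * EXPLAIN of the statement would hit the same checks.
     */
    static void analyzeMerge(TableInfo table) throws SemanticException {
        if (table.hasIndexes) {
            throw new SemanticException("Cannot merge " + table.name + ": table has indexes");
        }
        if (table.archived) {
            throw new SemanticException("Cannot merge " + table.name + ": table is archived");
        }
    }
}
```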




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-11 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993806#comment-12993806
 ] 

Ning Zhang commented on HIVE-1950:
--

Yongqiang, does the review board have the latest patch?




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-09 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992701#comment-12992701
 ] 

He Yongqiang commented on HIVE-1950:


>>2. Move RCFile check to SemanticAnalyzer from runtime.
SemanticAnalyzer only throws SemanticException, and we should probably keep that
convention. Moving the check to SemanticAnalyzer would require it to handle a lot of
HiveExceptions (thrown by getTable etc.).
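
For reference, the handling mentioned above would amount to translating metadata errors at the analyzer boundary. The sketch below is only an illustration: the exception classes and the TableLookup interface are hypothetical stand-ins defined here, not Hive's own types, and the "RCFile" format string is an assumption.

```java
/** Hypothetical stand-ins; these are not Hive's HiveException/SemanticException classes. */
final class ExceptionTranslationSketch {
    private ExceptionTranslationSketch() {}

    /** Stand-in for an error raised by a metadata lookup. */
    static final class HiveException extends Exception {
        HiveException(String message) {
            super(message);
        }
    }

    /** Stand-in for the only exception type the analyzer is allowed to throw. */
    static final class SemanticException extends Exception {
        SemanticException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical metadata lookup that may fail with a HiveException. */
    interface TableLookup {
        String inputFormatOf(String table) throws HiveException;
    }

    /** Performs the file-format check at analysis time, wrapping lookup failures. */
    static void checkIsRCFile(TableLookup lookup, String table) throws SemanticException {
        final String format;
        try {
            format = lookup.inputFormatOf(table);
        } catch (HiveException e) {
            // Keep the analyzer's contract: callers only ever see SemanticException.
            throw new SemanticException("Cannot resolve table " + table, e);
        }
        if (!"RCFile".equals(format)) {
            throw new SemanticException("Merge is only supported for RCFile, found " + format, null);
        }
    }
}
```

Each metadata call that can throw needs the same wrapping, which is the cost being weighed in the comment above.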




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-09 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992674#comment-12992674
 ] 

Namit Jain commented on HIVE-1950:
--

1. Can you change merge_files to concatenate?
   alter table  concatenate;

2. Move RCFile check to SemanticAnalyzer from runtime.

3. More comments: DDLTask.java/mergeFiles
   RCFile: all the new functions etc.





[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-08 Thread Ning Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992227#comment-12992227
 ] 

Ning Zhang commented on HIVE-1950:
--

As discussed offline, this patch should be able to handle the stats update
(by creating a StatsTask as a child).

Also, please keep in mind that the new MergeTask should be designed and
implemented so that it is easy to reuse in the merge step of INSERT OVERWRITE.




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-08 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992112#comment-12992112
 ] 

He Yongqiang commented on HIVE-1950:


Review comments from internal review:
1) if stats are present, try to correct them
2) jobClose of RCFileMergeMapper should share the code in FileSinkOperator
3) move the original data to a dump location first
4) remove getRecordWriter() and RCFileBlockMergeOutputFormat
5) ioCxt for input file changed
6) disable merge for archived tables/partitions and bucketized tables/partitions
7) comments
8) negative tests for hiveinputformat






[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-03 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990434#comment-12990434
 ] 

Namit Jain commented on HIVE-1950:
--

I will take a look. One minor comment:
can you add some negative tests?

1. merge_files should fail if there is an index on the table/partition.
2. merge_files should fail if the table is partitioned, but the user did not
specify the partition.


Going forward, we should support merge even if the partition is not fully
specified.

alter table srcpart partition (ds='1') merge_files;

should merge ds=1/hr=1 and ds=1/hr=2 as a follow-up.
But for now, it should throw an error.
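
The "fully specified" rule behind these negative tests can be sketched as below. The class and method names are hypothetical and written only for this example; the sketch assumes a partition spec is a map from partition column to value.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not taken from the patch. */
final class PartitionSpecSketch {
    private PartitionSpecSketch() {}

    /** True when the spec names every partition column of the table. */
    static boolean isFullySpecified(List<String> partitionColumns, Map<String, String> spec) {
        return spec.keySet().containsAll(partitionColumns);
    }

    /** Rejects a merge on a partitioned table whose spec is missing or partial. */
    static void requireFullSpec(List<String> partitionColumns, Map<String, String> spec) {
        if (!isFullySpecified(partitionColumns, spec)) {
            throw new IllegalArgumentException(
                "merge requires a value for every partition column " + partitionColumns
                    + ", got " + spec.keySet());
        }
    }
}
```

With partition columns (ds, hr), a spec of ds='1' alone is rejected, matching the srcpart example above; an unpartitioned table with an empty spec passes.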




[jira] Commented: (HIVE-1950) Block merge for RCFile

2011-02-03 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990389#comment-12990389
 ] 

He Yongqiang commented on HIVE-1950:


review board:
https://reviews.apache.org/r/388/
