[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974496#comment-13974496 ] Ashutosh Chauhan commented on HIVE-1608: That indeed is useful. Lets do this change. I guess for performance there might be some binary format we may want to choose later, but for now supporting new line character in column is welcome change. We have to make sure its backward compatible for insert overwrite case though. > use sequencefile as the default for storing intermediate results > > > Key: HIVE-1608 > URL: https://issues.apache.org/jira/browse/HIVE-1608 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Brock Noland > Fix For: 0.14.0 > > Attachments: HIVE-1608.patch > > > The only argument for having a text file for storing intermediate results > seems to be better debuggability. > But, tailing a sequence file is possible, and it should be more space > efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973285#comment-13973285 ] Brock Noland commented on HIVE-1608: The big win here is that columns with new lines don't get screwed up by default. That is they work out of the box. > use sequencefile as the default for storing intermediate results > > > Key: HIVE-1608 > URL: https://issues.apache.org/jira/browse/HIVE-1608 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Brock Noland > Fix For: 0.14.0 > > Attachments: HIVE-1608.patch > > > The only argument for having a text file for storing intermediate results > seems to be better debuggability. > But, tailing a sequence file is possible, and it should be more space > efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973210#comment-13973210 ] Edward Capriolo commented on HIVE-1608: --- It is not much. SequenceFile + none (codec) only ads some block information around text. I still thing sequence by default is a good idea. It makes it easier to add compression later without sacrificing split- ablility. > use sequencefile as the default for storing intermediate results > > > Key: HIVE-1608 > URL: https://issues.apache.org/jira/browse/HIVE-1608 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Brock Noland > Fix For: 0.14.0 > > Attachments: HIVE-1608.patch > > > The only argument for having a text file for storing intermediate results > seems to be better debuggability. > But, tailing a sequence file is possible, and it should be more space > efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962487#comment-13962487 ] Brock Noland commented on HIVE-1608: A couple of notes on this one: 1) Most of the test failures look to be related to the .q.out files being different (referencing TextFile output class not SequenceFile) 2) This change as-is would be backwards incompatible for INSERT OVERWRITE DIRECTORY users. Thus I think we need: 1) Leave the default of TextFile for INSERT OVERWRITE DIRECTORY 2) Update the .q.out files > use sequencefile as the default for storing intermediate results > > > Key: HIVE-1608 > URL: https://issues.apache.org/jira/browse/HIVE-1608 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Brock Noland > Fix For: 0.14.0 > > Attachments: HIVE-1608.patch > > > The only argument for having a text file for storing intermediate results > seems to be better debuggability. > But, tailing a sequence file is possible, and it should be more space > efficient -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961446#comment-13961446 ] Hive QA commented on HIVE-1608: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12638875/HIVE-1608.patch {color:red}ERROR:{color} -1 due to 487 failed/errored test(s), 5548 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7 org.apache.hadoop.hive.cli.TestCliDriver.testC
[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results
[ https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961224#comment-13961224 ] Edward Capriolo commented on HIVE-1608: --- If the sequence file is not compressed it is actually larger then the text file... > use sequencefile as the default for storing intermediate results > > > Key: HIVE-1608 > URL: https://issues.apache.org/jira/browse/HIVE-1608 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.7.0 >Reporter: Namit Jain >Assignee: Brock Noland > Fix For: 0.14.0 > > Attachments: HIVE-1608.patch > > > The only argument for having a text file for storing intermediate results > seems to be better debuggability. > But, tailing a sequence file is possible, and it should be more space > efficient -- This message was sent by Atlassian JIRA (v6.2#6252)