[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-18 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974496#comment-13974496
 ] 

Ashutosh Chauhan commented on HIVE-1608:


That is indeed useful. Let's make this change. For performance we may want to 
choose a binary format later, but for now supporting newline characters in 
columns is a welcome change. We have to make sure it's backward compatible for 
the INSERT OVERWRITE case, though.

> use sequencefile as the default for storing intermediate results
> 
>
> Key: HIVE-1608
> URL: https://issues.apache.org/jira/browse/HIVE-1608
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.7.0
> Reporter: Namit Jain
> Assignee: Brock Noland
> Fix For: 0.14.0
>
> Attachments: HIVE-1608.patch
>
>
> The only argument for having a text file for storing intermediate results 
> seems to be better debuggability.
> But, tailing a sequence file is possible, and it should be more space 
> efficient



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-17 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973285#comment-13973285
 ] 

Brock Noland commented on HIVE-1608:


The big win here is that columns with newlines don't get screwed up by 
default. That is, they work out of the box.
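
To illustrate the newline point, here is a minimal Python sketch. It is a toy 
model, not Hive code and not the actual SequenceFile layout: the 
length-prefixed framing only stands in for a record-oriented container, and 
all function names are hypothetical.

```python
import struct

# Three logical rows; the second has a newline inside a column value.
rows = ["1\thello", "2\tline one\nline two", "3\tbye"]

def write_text(rows):
    # Text-style framing: the newline character is the record delimiter.
    return "".join(r + "\n" for r in rows).encode("utf-8")

def read_text(data):
    # Splitting on "\n" cannot tell a delimiter from an embedded newline.
    return data.decode("utf-8").split("\n")[:-1]

def write_framed(rows):
    # Record-style framing: 4-byte big-endian length, then the payload.
    out = bytearray()
    for r in rows:
        payload = r.encode("utf-8")
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)

def read_framed(data):
    # Read each record by its declared length; content is never inspected.
    result, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        pos += 4
        result.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    return result

text_rows = read_text(write_text(rows))        # 4 rows: the second row is split
framed_rows = read_framed(write_framed(rows))  # 3 rows: identical to the input
```

With text framing the reader sees four rows instead of three, while the 
length-prefixed reader returns the original rows unchanged.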


[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973210#comment-13973210
 ] 

Edward Capriolo commented on HIVE-1608:
---

It is not much. SequenceFile + none (codec) only adds some block information 
around the text. I still think SequenceFile by default is a good idea. It makes 
it easier to add compression later without sacrificing splittability.
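
A rough Python sketch of the size trade-off follows. It is a toy model under 
stated assumptions: records carry a 4-byte length prefix (the real 
SequenceFile layout differs), zlib stands in for whatever codec is configured, 
and the function names are hypothetical. Compressing fixed groups of records 
independently means any one block can be decoded without reading the others, 
which is the property that keeps a compressed file splittable.

```python
import struct
import zlib

def text_size(rows):
    # Text framing costs one delimiter byte per record.
    return sum(len(r.encode("utf-8")) + 1 for r in rows)

def frame(rows):
    # Toy record framing: 4-byte big-endian length prefix per record.
    out = bytearray()
    for r in rows:
        payload = r.encode("utf-8")
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)

def unframe(data):
    result, pos = [], 0
    while pos < len(data):
        (n,) = struct.unpack_from(">I", data, pos)
        pos += 4
        result.append(data[pos:pos + n].decode("utf-8"))
        pos += n
    return result

def compressed_blocks(rows, rows_per_block):
    # Each block is compressed on its own, so it can be read in isolation.
    return [
        zlib.compress(frame(rows[i:i + rows_per_block]))
        for i in range(0, len(rows), rows_per_block)
    ]

def read_block(block):
    return unframe(zlib.decompress(block))

sample = [f"{i}\tsome repeated value" for i in range(1000)]
uncompressed_framed = len(frame(sample))
blocks = compressed_blocks(sample, 100)
compressed_total = sum(len(b) for b in blocks)
```

On this repetitive sample the uncompressed framed bytes exceed the text size, 
while the independently compressed blocks come out smaller than the text.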


[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-07 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13962487#comment-13962487
 ] 

Brock Noland commented on HIVE-1608:


A couple of notes on this one:

1) Most of the test failures look to be caused by the .q.out files being 
different (they reference the TextFile output class, not SequenceFile).
2) This change as-is would be backwards incompatible for INSERT OVERWRITE 
DIRECTORY users.

Thus I think we need to:
1) Leave the default of TextFile for INSERT OVERWRITE DIRECTORY.
2) Update the .q.out files.


[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-06 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961446#comment-13961446
 ] 

Hive QA commented on HIVE-1608:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12638875/HIVE-1608.patch

{color:red}ERROR:{color} -1 due to 487 failed/errored test(s), 5548 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alias_casted_column
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_allcolref_in_udf
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_filter
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_join
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_part
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_select
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_annotate_stats_union
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ansi_sql_arithmetic
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join0
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join13
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join16
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join18_multi_distinct
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join20
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join21
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join22
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join23
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join24
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join27
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join28
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join29
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join30
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join31
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join32
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_without_localtask
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_smb_mapjoin_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_14
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_15
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_binarysortable_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_groupby
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucket_map_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
org.apache.hadoop.hive.cli.TestCliDriver.testC

[jira] [Commented] (HIVE-1608) use sequencefile as the default for storing intermediate results

2014-04-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13961224#comment-13961224
 ] 

Edward Capriolo commented on HIVE-1608:
---

If the sequence file is not compressed, it is actually larger than the text 
file...
