[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby

2013-02-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-2340:
-

Attachment: HIVE-2340.12.patch

[~navis]: Thanks, that pointer helped. The column pruning did indeed not carry 
over some of the columns in the colExprMap. HIVE-2339 is in, but it was 
missing handling of the KEY.* columns.

I've also looked at infer_bucket_sort and see what you're saying. Seems ok to 
have additional sort columns/buckets as long as the ones that are explicitly 
asked for are there. I've updated the golden file for that test.

Running full test suite on .12 now.

 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
 HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, HIVE-2340.D1209.11.patch, 
 HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, 
 HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing an optimizer for JOIN-GBY, try to implement an RS-GBY 
 optimizer (cluster-by following group-by).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-2339) Preserve RS key columns in columnExprMap after CP optimization, which might be useful to other optimizers

2013-02-12 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576683#comment-13576683
 ] 

Yin Huai commented on HIVE-2339:


Should we reopen this one? From the discussion on HIVE-2340, it seems HIVE-1989 
has not completely resolved the issue. Also, HIVE-2206 needs the mapping of 
column names to track common keys among ReduceSinkOperators. Since the mapping 
for key column names is missing, the current patch for HIVE-2206 cannot detect 
common keys when hive.map.aggr=false. hive.map.aggr=true is not a problem, 
since I still generate a reduce-side aggregation that is not in the plan tree 
and thus does not go through CP optimization.

 Preserve RS key columns in columnExprMap after CP optimization, which might 
 be useful to other optimizers
 -

 Key: HIVE-2339
 URL: https://issues.apache.org/jira/browse/HIVE-2339
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Fix For: 0.8.0

 Attachments: HIVE-2339.1.patch


 In ColumnPrunerProcFactory#pruneReduceSinkOperator, only VALUE parts are 
 retained from columnExprMap. Doesn't anyone want KEY parts to be retained, 
 too? In my case, it was very useful for backtracking column names and 
 removing RS in the *-RS-*-RS-GBY case.
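
 To make the difference concrete, here is a minimal Python sketch of the two 
 pruning behaviours. It is only an illustration: the function name 
 prune_col_expr_map, the keep_keys flag, and the sample column names are 
 assumptions made for this example, and it is not the Java code in 
 ColumnPrunerProcFactory.

```python
def prune_col_expr_map(col_expr_map, needed_columns, keep_keys):
    """Return a pruned copy of a column-name -> expression map.

    VALUE.* entries survive only if they are still needed.
    KEY.* entries survive only when keep_keys is True (hypothetical flag).
    """
    pruned = {}
    for name, expr in col_expr_map.items():
        if name.startswith("VALUE.") and name in needed_columns:
            pruned[name] = expr
        elif name.startswith("KEY.") and keep_keys:
            pruned[name] = expr
    return pruned


# Hypothetical map for a ReduceSink with one key and two value columns.
col_expr_map = {
    "KEY._col0": "src.value",
    "VALUE._col0": "src.key",
    "VALUE._col1": "src.ds",
}
needed = {"VALUE._col0"}

values_only = prune_col_expr_map(col_expr_map, needed, keep_keys=False)
with_keys = prune_col_expr_map(col_expr_map, needed, keep_keys=True)
```

 With keep_keys=False the entry for KEY._col0 is gone, so a later optimizer 
 cannot trace the key back to src.value; with keep_keys=True it can.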



[jira] [Commented] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-12 Thread Owen O'Malley (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576691#comment-13576691
 ] 

Owen O'Malley commented on HIVE-3874:
-

Kevin, I had some distractions at work, but I should get the patch uploaded 
today.

 Create a new Optimized Row Columnar file format for Hive
 

 Key: HIVE-3874
 URL: https://issues.apache.org/jira/browse/HIVE-3874
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Owen O'Malley
Assignee: Owen O'Malley
 Attachments: hive.3874.2.patch, OrcFileIntro.pptx, orc.tgz


 There are several limitations of the current RC File format that I'd like to 
 address by creating a new format:
 * each column value is stored as a binary blob, which means:
 ** the entire column value must be read, decompressed, and deserialized
 ** the file format can't use smarter type-specific compression
 ** push down filters can't be evaluated
 * the start of each row group needs to be found by scanning
 * user metadata can only be added to the file when the file is created
 * the file doesn't store the number of rows per file or row group
 * there is no mechanism for seeking to a particular row number, which is 
 required for external indexes.
 * there is no mechanism for storing lightweight indexes within the file to 
 enable push-down filters to skip entire row groups.
 * the type of the rows isn't stored in the file
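
 As an illustration of the lightweight-index point in the list above, the 
 following Python sketch keeps a (min, max) pair per row group and uses it to 
 skip groups that cannot match a filter. It is a toy model under stated 
 assumptions: the function names, the in-memory lists, and the single 
 "value > threshold" predicate are all made up for this example and say nothing 
 about the actual layout of the proposed file format.

```python
def build_index(row_groups):
    """Record (min, max) for each row group: a toy lightweight index."""
    return [(min(group), max(group)) for group in row_groups]


def scan_greater_than(row_groups, index, threshold):
    """Return values > threshold and the number of row groups actually read."""
    matches = []
    groups_read = 0
    for group, (low, high) in zip(row_groups, index):
        if high <= threshold:
            # No value in this group can satisfy value > threshold: skip it.
            continue
        groups_read += 1
        matches.extend(v for v in group if v > threshold)
    return matches, groups_read


# Three hypothetical row groups of one integer column.
row_groups = [[1, 5, 9], [12, 15, 18], [40, 42, 47]]
index = build_index(row_groups)
matches, groups_read = scan_greater_than(row_groups, index, 16)
```

 Here the first row group is never read, because its maximum (9) already rules 
 out a match.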



Hive-trunk-h0.21 - Build # 1967 - Still Failing

2013-02-12 Thread Apache Jenkins Server
Changes for Build #1964
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #1965

Changes for Build #1966

Changes for Build #1967



No tests ran.

The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1967)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1967/ to 
view the results.

Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #290

2013-02-12 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/290/

--
[...truncated 5492 lines...]
 [copy] Warning: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/pdk/src/test/resources
 does not exist.

init:
 [echo] Project: pdk

create-dirs:
 [echo] Project: builtins
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/classes
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test/src
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test/classes
[mkdir] Created dir: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/builtins/test/resources
 [copy] Warning: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/builtins/src/test/resources
 does not exist.

init:
 [echo] Project: builtins

jar:
 [echo] Project: hive

create-dirs:
 [echo] Project: shims
 [copy] Warning: 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/shims/src/test/resources
 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/ivy/ivysettings.xml
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/thrift/libthrift/0.7.0/libthrift-0.7.0.jar
 ...
[ivy:resolve]  (294kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] org.apache.thrift#libthrift;0.7.0!libthrift.jar 
(155ms)
[ivy:report] Processing 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml
 to 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/ivy/report/org.apache.hive-hive-shims-default.html

ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/shims/src/common/java;/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/shims/src/0.20/java
 against hadoop 0.20.2 
(/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/hadoopcore/hadoop-0.20.2)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/ivy/ivysettings.xml
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/com/google/guava/guava/r09/guava-r09.jar ...
[ivy:resolve] 
...
 (1117kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] com.google.guava#guava;r09!guava.jar (165ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.20.2/hadoop-core-0.20.2.jar
 ...
[ivy:resolve] 

 (2624kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-core;0.20.2!hadoop-core.jar (160ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-tools/0.20.2/hadoop-tools-0.20.2.jar
 ...
[ivy:resolve] ... (68kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-tools;0.20.2!hadoop-tools.jar (211ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-test/0.20.2/hadoop-test-0.20.2.jar
 ...
[ivy:resolve] 

 (1527kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 
org.apache.hadoop#hadoop-test;0.20.2!hadoop-test.jar (245ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/commons-cli/commons-cli/1.2/commons-cli-1.2.jar 
...
[ivy:resolve] ... (40kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] commons-cli#commons-cli;1.2!commons-cli.jar (32ms)
[ivy:resolve] downloading 
http://repo1.maven.org/maven2/xmlenc/xmlenc/0.52/xmlenc-0.52.jar ...
[ivy:resolve] .. (14kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] 

[jira] [Commented] (HIVE-2655) Ability to define functions in HQL

2013-02-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576734#comment-13576734
 ] 

Brock Noland commented on HIVE-2655:


Jonathan,

I haven't seen any updates to this JIRA in a while. Are you still working on 
it? If not, would you mind if I took it forward?

Brock

 Ability to define functions in HQL
 --

 Key: HIVE-2655
 URL: https://issues.apache.org/jira/browse/HIVE-2655
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Jonathan Perlow
Assignee: Jonathan Chang
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch


 Ability to create functions in HQL as a substitute for creating them in Java.
 Jonathan Chang requested I create this issue.



[jira] [Updated] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition

2013-02-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4009:
---

Status: Patch Available  (was: Open)

 CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
 --

 Key: HIVE-4009
 URL: https://issues.apache.org/jira/browse/HIVE-4009
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Brock Noland

 Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail 
 randomly when using LocalJobRunner.



[jira] [Updated] (HIVE-4009) CLI Tests fail randomly due to MapReduce LocalJobRunner race condition

2013-02-12 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4009:
---

Attachment: HIVE-4009-0.patch

 CLI Tests fail randomly due to MapReduce LocalJobRunner race condition
 --

 Key: HIVE-4009
 URL: https://issues.apache.org/jira/browse/HIVE-4009
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Brock Noland
 Attachments: HIVE-4009-0.patch


 Hadoop has a race condition MAPREDUCE-5001 which causes tests to fail 
 randomly when using LocalJobRunner.



[jira] [Commented] (HIVE-3911) udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is disabled.

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576811#comment-13576811
 ] 

Thiruvel Thirumoolan commented on HIVE-3911:


This also happens with all usages of NumericHistogram (the histogram_numeric() 
UDAF too). The algorithm works on doubles, and the order in which inputs reach 
it matters. If the order is different (as in this case), the results will be 
different. In Hadoop 20.x, the inputs go to the UDAF in the order they appear 
in the table. But in Hadoop-23, the input order is reversed, so the final 
output is also different. I have uploaded a patch which works fine for 
histogram_numeric() but fails with a small difference for 
udaf_percentile_approx. If there is a way to tune this in Hadoop-23, that 
should help.
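
The order sensitivity can be reproduced with a small sketch. This is a 
simplified streaming histogram in Python that merges the two closest bins 
whenever a fixed bin budget is exceeded. It is written only for illustration 
and is not Hive's NumericHistogram code; the function name build_histogram, 
the sample data, and the budget of 2 bins are assumptions.

```python
def build_histogram(values, max_bins):
    """Build a bounded histogram as a list of (centroid, count) pairs."""
    bins = []  # [centroid, count] pairs, kept sorted by centroid
    for v in values:
        bins.append([float(v), 1])
        bins.sort(key=lambda b: b[0])
        if len(bins) > max_bins:
            # Find the adjacent pair of bins with the smallest gap ...
            i = min(range(len(bins) - 1),
                    key=lambda j: bins[j + 1][0] - bins[j][0])
            (c1, n1), (c2, n2) = bins[i], bins[i + 1]
            # ... and replace the pair with its count-weighted average.
            bins[i:i + 2] = [[(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]]
    return [(c, n) for c, n in bins]


data = [0, 1, 3, 6, 10]
forward = build_histogram(data, max_bins=2)
backward = build_histogram(list(reversed(data)), max_bins=2)
```

Feeding the same five values in reverse order gives bin counts of 4 and 1 
instead of 3 and 2, so anything derived from the bins, such as an approximate 
percentile, shifts as well.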

 udaf_percentile_approx.q fails with Hadoop 0.23.5 when map-side aggr is 
 disabled.
 -

 Key: HIVE-3911
 URL: https://issues.apache.org/jira/browse/HIVE-3911
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
 Fix For: 0.11.0

 Attachments: HIVE-3911.patch


 I am running Hive10 unit tests against Hadoop 0.23.5 and 
 udaf_percentile_approx.q fails with a different value when map-side aggr is 
 disabled, and only when the 3rd argument to this UDAF is 100. It matches the 
 expected output when map-side aggr is enabled with the same arguments.
 This test passes when hadoop.version is 1.1.1 and fails when it's 0.23.x, 
 2.0.0-alpha, or 2.0.2-alpha.
 [junit] 20c20
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 47c47
 [junit] < 254.083331
 [junit] ---
 [junit] > 252.77
 [junit] 74c74
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]
 [junit] 101c101
 [junit] < [23.358,254.083331,477.0625,489.54667]
 [junit] ---
 [junit] > [24.07,252.77,476.9,487.82]



[jira] [Commented] (HIVE-3972) Support using multiple reducer for fetching order by results

2013-02-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576831#comment-13576831
 ] 

Ashutosh Chauhan commented on HIVE-3972:


[~navis] I agree HIVE-3562 is an orthogonal issue which will make what I am 
suggesting less of an issue, but there are still some cases. As is being 
discussed on HIVE-3562, consider the following query: 
{code}
select value, sum(key) as sum from src group by value order by value limit 10;
{code}
In this case, the limit can't be pushed into the map phase, so the HIVE-3562 
optimization won't kick in. With the patch as it currently stands on this jira, 
we will generate 1 MR job with multiple reducers and then do the order-by on 
the client in the Fetch task. Here, if you don't take advantage of the fact 
that there is a limit in the query, you might read millions of rows from hdfs, 
bring all of them into client memory, and then show just 10 to the user. If you 
instead take the limit into account and stop merging and reading as soon as you 
have seen 10 rows, you save on both hdfs IO and client memory. Make sense?
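
The idea of stopping the merge early can be sketched in a few lines of Python. 
This is only an illustration of a lazy k-way merge with a limit, not the Fetch 
task code from the patch; the helper names counting_reader and 
fetch_with_limit, the three in-memory reducer outputs, and the row contents 
are assumptions.

```python
import heapq
from itertools import islice


def counting_reader(rows, counter):
    """Yield rows one by one, counting how many were actually read."""
    for row in rows:
        counter["read"] += 1
        yield row


def fetch_with_limit(sorted_streams, limit):
    """Merge already-sorted streams lazily and stop after `limit` rows."""
    return list(islice(heapq.merge(*sorted_streams), limit))


counter = {"read": 0}
# Three hypothetical reducer outputs of 1000 rows each, sorted by value.
outputs = [
    [("val_%04d" % i, i) for i in range(r, 3000, 3)]
    for r in range(3)
]
streams = [counting_reader(rows, counter) for rows in outputs]
top = fetch_with_limit(streams, 10)
```

In this sketch only about a dozen of the 3000 rows are pulled from the 
streams, because heapq.merge reads one row ahead per stream and islice stops 
consuming once 10 rows have been produced.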

 Support using multiple reducer for fetching order by results
 

 Key: HIVE-3972
 URL: https://issues.apache.org/jira/browse/HIVE-3972
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3972.D8349.1.patch, HIVE-3972.D8349.2.patch, 
 HIVE-3972.D8349.3.patch


 Queries that fetch results and end with an order by clause make the final 
 MR job run with a single reducer, which can be too much work for one reducer. 
 For example, 
 {code}
 select value, sum(key) as sum from src group by value order by sum;
 {code}
 If the number of reducers is reasonable, multiple result files could be merged 
 into a single sorted stream at the fetcher level.



[jira] [Updated] (HIVE-4011) Sort Merge Join does not kick-in and runs locally

2013-02-12 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated HIVE-4011:


Summary: Sort Merge Join does not kick-in and runs locally  (was: Sort 
Merge Join does not kick-in)

 Sort Merge Join does not kick-in and runs locally
 -

 Key: HIVE-4011
 URL: https://issues.apache.org/jira/browse/HIVE-4011
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.10.0
 Environment: Linux
Reporter: Amir Youssefi
  Labels: joins, mapjoin

 After applying the settings required for Sort Merge Join, it does not kick in 
 and falls back to MapJoin with a local first step (on two bucketed and 
 partitioned tables).
 Ran into the issue on Hive 0.9 at large scale. To make sure the issue 
 persists, I ran it on Hive 0.10 with sample public data and regular storage 
 formats.
 More details:
 set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 select /*+ MAPJOIN(l) */
 l.stock_price_open lo,
 r.stock_price_open ro
 from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and 
 l.stock_symbol = r.stock_symbol and l.dte=r.dte)
 where ...
 DDL:
 (both tables)
 PARTITIONED BY (year string)
 CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
 STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
 also made sure we had:
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 Run logs and more info in attached file.



[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally

2013-02-12 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated HIVE-4011:


Summary: Sort Merge Join runs locally  (was: Sort Merge Join does not 
kick-in and runs locally)

 Sort Merge Join runs locally
 

 Key: HIVE-4011
 URL: https://issues.apache.org/jira/browse/HIVE-4011
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.10.0
 Environment: Linux
Reporter: Amir Youssefi
  Labels: joins, mapjoin

 After applying the settings required for Sort Merge Join, it does not kick in 
 and falls back to MapJoin with a local first step (on two bucketed and 
 partitioned tables).
 Ran into the issue on Hive 0.9 at large scale. To make sure the issue 
 persists, I ran it on Hive 0.10 with sample public data and regular storage 
 formats.
 More details:
 set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 select /*+ MAPJOIN(l) */
 l.stock_price_open lo,
 r.stock_price_open ro
 from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and 
 l.stock_symbol = r.stock_symbol and l.dte=r.dte)
 where ...
 DDL:
 (both tables)
 PARTITIONED BY (year string)
 CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
 STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
 also made sure we had:
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 Run logs and more info in attached file.



[jira] [Updated] (HIVE-4011) Sort Merge Join runs locally

2013-02-12 Thread Amir Youssefi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amir Youssefi updated HIVE-4011:


Attachment: SMJ-JIRA-4011.txt

 Sort Merge Join runs locally
 

 Key: HIVE-4011
 URL: https://issues.apache.org/jira/browse/HIVE-4011
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.9.0, 0.10.0
 Environment: Linux
Reporter: Amir Youssefi
  Labels: joins, mapjoin
 Attachments: SMJ-JIRA-4011.txt


 After applying the settings required for Sort Merge Join, it does not kick in 
 and falls back to MapJoin with a local first step (on two bucketed and 
 partitioned tables).
 Ran into the issue on Hive 0.9 at large scale. To make sure the issue 
 persists, I ran it on Hive 0.10 with sample public data and regular storage 
 formats.
 More details:
 set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 select /*+ MAPJOIN(l) */
 l.stock_price_open lo,
 r.stock_price_open ro
 from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and 
 l.stock_symbol = r.stock_symbol and l.dte=r.dte)
 where ...
 DDL:
 (both tables)
 PARTITIONED BY (year string)
 CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
 ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
 STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'
 also made sure we had:
 set hive.enforce.bucketing=true;
 set hive.enforce.sorting=true;
 Run logs and more info in attached file.



[jira] [Created] (HIVE-4011) Sort Merge Join does not kick-in

2013-02-12 Thread Amir Youssefi (JIRA)
Amir Youssefi created HIVE-4011:
---

 Summary: Sort Merge Join does not kick-in
 Key: HIVE-4011
 URL: https://issues.apache.org/jira/browse/HIVE-4011
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0, 0.9.0
 Environment: Linux
Reporter: Amir Youssefi


After applying the settings required for Sort Merge Join, it does not kick in 
and falls back to MapJoin with a local first step (on two bucketed and 
partitioned tables).

Ran into the issue on Hive 0.9 at large scale. To make sure the issue persists, 
I ran it on Hive 0.10 with sample public data and regular storage formats.

More details:

set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

select /*+ MAPJOIN(l) */
l.stock_price_open lo,
r.stock_price_open ro
from nyse_stocks_pcsb l JOIN nyse_stocks_pcsb_dup r ON (l.year = r.year and 
l.stock_symbol = r.stock_symbol and l.dte=r.dte)
where ...

DDL:

(both tables)
PARTITIONED BY (year string)
CLUSTERED BY (stock_symbol) SORTED BY (stock_symbol) INTO 4 BUCKETS
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'


also made sure we had:

set hive.enforce.bucketing=true;
set hive.enforce.sorting=true;

Run logs and more info in attached file.
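
For context on what the settings above are meant to enable: when both inputs 
are already sorted on the join key, matching rows can be joined in a single 
forward pass without building a hash table on either side. The Python sketch 
below shows only that merge step on one join key. It is not Hive's sort-merge 
bucket join operator, and the function name sort_merge_join and the sample 
rows are assumptions.

```python
def sort_merge_join(left, right):
    """Inner-join two lists of (key, value) pairs already sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows on the right that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left[i][0] == lk:
                for jj in range(j, j_end):
                    out.append((lk, left[i][1], right[jj][1]))
                i += 1
            j = j_end
    return out


# Hypothetical bucket contents, sorted by stock symbol.
left = [("A", 1), ("B", 2), ("B", 3), ("D", 4)]
right = [("B", 20), ("C", 30), ("D", 40), ("D", 41)]
joined = sort_merge_join(left, right)
```

Each input is read once from start to end, which is why the join can run on 
the mappers without a local hash-table step when it does kick in.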



[jira] [Commented] (HIVE-2655) Ability to define functions in HQL

2013-02-12 Thread Jonathan Chang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576870#comment-13576870
 ] 

Jonathan Chang commented on HIVE-2655:
--

Yeah, I haven't had a chance to work on this.  Looks like all that needs to be 
done at this point is unittests.  I would be more than happy to have you take 
this the rest of the way!

 Ability to define functions in HQL
 --

 Key: HIVE-2655
 URL: https://issues.apache.org/jira/browse/HIVE-2655
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Jonathan Perlow
Assignee: Jonathan Chang
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch


 Ability to create functions in HQL as a substitute for creating them in Java.
 Jonathan Chang requested I create this issue.



[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-02-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576873#comment-13576873
 ] 

Ashutosh Chauhan commented on HIVE-3403:


Makes sense. I am not suggesting we include all of that in this jira, but I 
wanted to make sure we are on the same page as to where we are heading.
Though, w.r.t. configs, I can see your point about adding more configs, but I 
still think optimization configs should be on by default. The whole point of a 
release is to ship a stable codebase. By definition, trunk is not considered 
stable (as in stable enough to make a release out of it), so the time between 
committing to trunk and releasing is for stabilizing the new codebase. But if 
the configs are off by default, bugs lurking in the new codebase will never be 
exposed.

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, 
 hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch, 
 hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, 
 hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, 
 hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.



[jira] [Commented] (HIVE-2655) Ability to define functions in HQL

2013-02-12 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576882#comment-13576882
 ] 

Brock Noland commented on HIVE-2655:


Hi,

OK great! Yes the patch required only minor rebasing. I'll work on the unit 
tests and then post the resulting patch here.

Thanks!
Brock

 Ability to define functions in HQL
 --

 Key: HIVE-2655
 URL: https://issues.apache.org/jira/browse/HIVE-2655
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Jonathan Perlow
Assignee: Jonathan Chang
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch


 Ability to create functions in HQL as a substitute for creating them in Java.
 Jonathan Chang requested I create this issue.



[jira] [Updated] (HIVE-4008) MiniMR tests fail with latest version of Hadoop 23

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-4008:
---

Attachment: HIVE-4008_branch10.patch

HIVE-4008_branch10.patch tested with hadoop 20.x and 23.x.

 MiniMR tests fail with latest version of Hadoop 23
 --

 Key: HIVE-4008
 URL: https://issues.apache.org/jira/browse/HIVE-4008
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.10.1

 Attachments: HIVE-4008_branch10.patch


 TestMinimrCliDriver and TestNegativeMinimrCliDriver run fine with 0.23.4 on 
 branch 10, but when I moved to 23.5 or a build of 23.6, they start to fail. 
 YARN-144 seems to be the reason; I will upload a patch soon for branch10.



[jira] [Updated] (HIVE-3252) Add environment context to metastore Thrift calls

2013-02-12 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3252:


   Resolution: Fixed
Fix Version/s: 0.11.0
 Assignee: Samuel Yuan  (was: John Reese)
   Status: Resolved  (was: Patch Available)

Committed, thanks Sam.

 Add environment context to metastore Thrift calls
 -

 Key: HIVE-3252
 URL: https://issues.apache.org/jira/browse/HIVE-3252
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: John Reese
Assignee: Samuel Yuan
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt


 Currently in the Hive Thrift metastore API, create_table, add_partition, 
 alter_table, and alter_partition have with_environment_context analogs. It 
 would be really useful to add similar methods for drop_partition, drop_table, 
 and append_partition.



[jira] [Created] (HIVE-4012) Unit test failures with Hadoop 23 due to HADOOP-8551

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HIVE-4012:
--

 Summary: Unit test failures with Hadoop 23 due to HADOOP-8551
 Key: HIVE-4012
 URL: https://issues.apache.org/jira/browse/HIVE-4012
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Thiruvel Thirumoolan
 Fix For: 0.11.0, 0.10.1


With HADOOP-8551 (>=23.3 or >=2.0.2-alpha), it's not possible to do a dfs -mkdir 
of foo/bar when foo does not exist. One has to use the '-p' option (not available 
in Hadoop 20.x). A bunch of our test cases rely on this feature, which was also 
used to make them interoperable with Windows (HIVE-3204). However, all these unit 
tests fail when using Hadoop >=23.3 or >=2.0.2-alpha. It's also not possible to 
use the '-p' option in the tests, as that's not supported in Hadoop 20.x.
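
One portable workaround, sketched here in Python purely as an illustration (the eventual patch may well do something different), is to issue one plain mkdir per path component so that no '-p' flag is needed. The helper name mkdir_steps is hypothetical, and the sketch assumes none of the directories exist yet.

```python
def mkdir_steps(path):
    """Return each directory to create, parents first, for the given path."""
    parts = [p for p in path.split("/") if p]
    prefix = "/" if path.startswith("/") else ""
    steps = []
    for i in range(1, len(parts) + 1):
        steps.append(prefix + "/".join(parts[:i]))
    return steps


# Each entry corresponds to one 'dfs -mkdir <dir>' call without '-p'.
for directory in mkdir_steps("foo/bar"):
    print("dfs -mkdir " + directory)
```

Creating foo first and foo/bar second avoids depending on either the old implicit-parent behavior or the newer '-p' option.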



[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-12 Thread Miles Shang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576944#comment-13576944
 ] 

Miles Shang commented on HIVE-4010:
---

Unit tests pass.

 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-12 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576946#comment-13576946
 ] 

Phabricator commented on HIVE-4010:
---

mshang has commented on the revision HIVE-4010 [jira] Failure finding iterate 
method with matching signature.

  Ran full unit test suite. Pass.

REVISION DETAIL
  https://reviews.facebook.net/D8517

To: JIRA, mshang


 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Commented] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-12 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576960#comment-13576960
 ] 

Phabricator commented on HIVE-4010:
---

jonchang has added reviewers to the revision HIVE-4010 [jira] Failure finding 
iterate method with matching signature.
Added Reviewers: kevinwilfong

  I'm not a Hive committer.

REVISION DETAIL
  https://reviews.facebook.net/D8517

To: JIRA, jonchang, kevinwilfong, mshang


 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Commented] (HIVE-3252) Add environment context to metastore Thrift calls

2013-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577003#comment-13577003
 ] 

Hudson commented on HIVE-3252:
--

Integrated in hive-trunk-hadoop1 #81 (See 
[https://builds.apache.org/job/hive-trunk-hadoop1/81/])
HIVE-3252. Add environment context to metastore Thrift calls. (Samuel Yuan 
via kevinwilfong) (Revision 1445309)

 Result = ABORTED
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1445309
Files : 
* /hive/trunk/metastore/if/hive_metastore.thrift
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* /hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java


 Add environment context to metastore Thrift calls
 -

 Key: HIVE-3252
 URL: https://issues.apache.org/jira/browse/HIVE-3252
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: John Reese
Assignee: Samuel Yuan
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt


 Currently in the Hive Thrift metastore API, create_table, add_partition, 
 alter_table, and alter_partition have with_environment_context analogs.  It would 
 be really useful to add similar methods for drop_partition, drop_table, and 
 append_partition.



[jira] [Updated] (HIVE-4012) Unit test failures with Hadoop 23 due to HADOOP-8551

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-4012:
---

Attachment: HIVE-4012_branch10.patch

 Unit test failures with Hadoop 23 due to HADOOP-8551
 

 Key: HIVE-4012
 URL: https://issues.apache.org/jira/browse/HIVE-4012
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Thiruvel Thirumoolan
 Fix For: 0.11.0, 0.10.1

 Attachments: HIVE-4012_branch10.patch


 With HADOOP-8551 (>=23.3 or >=2.0.2-alpha), it's not possible to do a dfs 
 -mkdir of foo/bar when foo does not exist. One has to use the '-p' option (not 
 available in Hadoop 20.x). A bunch of our test cases rely on this feature, 
 which was also used to make them interoperable with Windows (HIVE-3204). 
 However, all these unit tests fail when using Hadoop >=23.3 or >=2.0.2-alpha. 
 It's also not possible to use the '-p' option in the tests, as that's not 
 supported in Hadoop 20.x.



[jira] [Created] (HIVE-4013) Misc test failures on Hive10 with Hadoop 0.23.x

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)
Thiruvel Thirumoolan created HIVE-4013:
--

 Summary: Misc test failures on Hive10 with Hadoop 0.23.x
 Key: HIVE-4013
 URL: https://issues.apache.org/jira/browse/HIVE-4013
 Project: Hive
  Issue Type: Bug
Reporter: Thiruvel Thirumoolan
 Attachments: HIVE-4013_branch10.patch

The following tests fail with the latest builds of Hadoop 23 (tested with 0.23.5 
and a build of 0.23.6). The fix is mostly a matter of making the tests 
deterministic by adding ORDER BY to all the queries.

list_bucket_query_oneskew_3.q
list_bucket_query_multiskew_2.q
list_bucket_query_multiskew_3.q
list_bucket_query_multiskew_1.q
parenthesis_star_by.q
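
To illustrate why adding ORDER BY makes such tests deterministic, here is a small Python sketch (not Hive code; the row data and variable names are made up). Two runs return the same rows in different orders, so a row-by-row comparison against a golden file only succeeds reliably once the output is sorted.

```python
# Hypothetical query output from two runs; the same rows arrive in a different order.
run_on_hadoop_20 = [("238", "val_238"), ("86", "val_86"), ("311", "val_311")]
run_on_hadoop_23 = [("86", "val_86"), ("311", "val_311"), ("238", "val_238")]


def matches_golden(actual_rows, golden_rows):
    """Compare output to the golden rows one by one, as a golden-file diff would."""
    return actual_rows == golden_rows


# Without an ordering guarantee the comparison depends on execution details.
unordered_ok = matches_golden(run_on_hadoop_23, run_on_hadoop_20)

# With ORDER BY (modelled here by sorted()), both runs produce identical output.
ordered_ok = matches_golden(sorted(run_on_hadoop_23), sorted(run_on_hadoop_20))

print(unordered_ok, ordered_ok)
```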



[jira] [Updated] (HIVE-4013) Misc test failures on Hive10 with Hadoop 0.23.x

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-4013:
---

Attachment: HIVE-4013_branch10.patch

 Misc test failures on Hive10 with Hadoop 0.23.x
 ---

 Key: HIVE-4013
 URL: https://issues.apache.org/jira/browse/HIVE-4013
 Project: Hive
  Issue Type: Bug
Reporter: Thiruvel Thirumoolan
 Attachments: HIVE-4013_branch10.patch


 The following tests fail with the latest builds of Hadoop 23 (tested with 0.23.5 
 and a build of 0.23.6). The fix is mostly a matter of making the tests 
 deterministic by adding ORDER BY to all the queries.
 list_bucket_query_oneskew_3.q
 list_bucket_query_multiskew_2.q
 list_bucket_query_multiskew_3.q
 list_bucket_query_multiskew_1.q
 parenthesis_star_by.q



Hive-trunk-hadoop2 - Build # 119 - Still Failing

2013-02-12 Thread Apache Jenkins Server
Changes for Build #81

Changes for Build #82
[namit] HIVE-3927 Potential overflow with new RCFileCat column sizes options
(Kevin Wilfong via namit)


Changes for Build #83

Changes for Build #84
[cws] HIVE-3931. Add Oracle metastore upgrade script for 0.9 to 10.0
 (Prasad Mujumdar via cws)


Changes for Build #85

Changes for Build #86
[hashutosh] HIVE-3913 : Possible deadlock in ZK lock manager (Mikhail Bautin 
via Ashutosh Chauhan)

[hashutosh] HIVE-3833 : object inspectors should be initialized based on 
partition metadata (Namit Jain via Ashutosh Chauhan)


Changes for Build #87

Changes for Build #88
[namit] HIVE-3825 Add Operator level Hooks
(Pamela Vagata via namit)

[hashutosh] HIVE-3528 : Avro SerDe doesn't handle serializing Nullable types 
that require access to a Schema (Sean Busbey via Ashutosh Chauhan)

[namit] HIVE-3943 Skewed query fails if hdfs path has special characters
(Gang Tim Liu via namit)


Changes for Build #89
[namit] HIVE-3527 Allow CREATE TABLE LIKE command to take TBLPROPERTIES
(Kevin Wilfong via namit)

[namit] HIVE-3944 Make accept qfile argument for miniMR tests
(Navis via namit)


Changes for Build #90
[namit] HIVE-3912 table_access_keys_stats.q fails with hadoop 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3921 recursive_dir.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3923 join_filters_overlap.q fails on 0.23
(Sushanth Sowmyan via namit)

[namit] HIVE-3924 join_nullsafe.q fails on 0.23
(Sushanth Sowmyan via namit)

[hashutosh] Adding csv.txt file, left out from commit of 3528


Changes for Build #91

Changes for Build #92
[hashutosh] HIVE-3799 : Better error message if metalisteners or hookContext 
cannot be loaded/instantiated (Navis via Ashutosh Chauhan)

[hashutosh] HIVE-3947 : MiniMR test remains pending after test completion 
(Navis via Ashutosh Chauhan)


Changes for Build #93

Changes for Build #94
[kevinwilfong] HIVE-3903. Allow updating bucketing/sorting metadata of a 
partition through the CLI. (Samuel Yuan via kevinwilfong)


Changes for Build #95
[namit] HIVE-3873 lot of tests failing for hadoop 23
(Gang Tim Liu via namit)


Changes for Build #96
[hashutosh] Missed deleting empty file GenMRRedSink4.java while committing 3784

[hashutosh] HIVE-de-emphasize mapjoin hint (Namit Jain via Ashutosh Chauhan)


Changes for Build #97
[namit] HIVE-933 Infer bucketing/sorting properties
(Kevin Wilfong via namit)

[hashutosh] HIVE-3950 : Remove code for merging files via MR job (Ashutosh 
Chauhan, Reviewed by Namit Jain)


Changes for Build #98

Changes for Build #99
[kevinwilfong] HIVE-3940. Track columns accessed in each table in a query. 
(Samuel Yuan via kevinwilfong)


Changes for Build #100
[namit] HIVE-3778 Add MapJoinDesc.isBucketMapJoin() as part of explain plan
(Gang Tim Liu via namit)


Changes for Build #101

Changes for Build #102

Changes for Build #103

Changes for Build #104
[hashutosh] HIVE-3977 : Hive 0.10 postgres schema script is broken (Johnny 
Zhang via Ashutosh Chauhan)

[hashutosh] HIVE-3932 : Hive release tarballs don't contain PostgreSQL 
metastore scripts (Mark Grover via Ashutosh Chauhan)


Changes for Build #105
[hashutosh] HIVE-3918 : Normalize more CRLF line endings (Mark Grover via 
Ashutosh Chauhan)

[namit] HIVE-3917 Support noscan operation for analyze command
(Gang Tim Liu via namit)


Changes for Build #106
[namit] HIVE-3937 Hive Profiler
(Pamela Vagata via namit)

[hashutosh] HIVE-3571 : add a way to run a small unit quickly (Navis via 
Ashutosh Chauhan)

[hashutosh] HIVE-3956 : TestMetaStoreAuthorization always uses the same port 
(Navis via Ashutosh Chauhan)


Changes for Build #107

Changes for Build #108

Changes for Build #109

Changes for Build #110
[namit] HIVE-2839 Filters on outer join with mapjoin hint is not applied 
correctly
(Navis via namit)


Changes for Build #111

Changes for Build #112
[namit] HIVE-3998 Oracle metastore update script will fail when upgrading from 
0.9.0 to
0.10.0 (Jarek and Mark via namit)

[namit] HIVE-3999 Mysql metastore upgrade script will end up with different 
schema than
the full schema load (Jarek and Mark via namit)


Changes for Build #113

Changes for Build #114
[namit] HIVE-3995 PostgreSQL upgrade scripts are not valid
(Jarek and Mark via namit)


Changes for Build #115

Changes for Build #116
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #117

Changes for Build #118

Changes for Build #119



33 tests failed.
FAILED:  
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_query_oneskew_1

Error Message:
Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.

Stack Trace:
junit.framework.AssertionFailedError: Unexpected exception
See build/ql/tmp/hive.log, or try ant test ... -Dtest.silent=false to get 
more logs.
at junit.framework.Assert.fail(Assert.java:50)
at 

[jira] [Updated] (HIVE-3874) Create a new Optimized Row Columnar file format for Hive

2013-02-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3874:
--

Attachment: HIVE-3874.D8529.1.patch

omalley requested code review of HIVE-3874 [jira] Create a new Optimized Row 
Columnar file format for Hive.

Reviewers: JIRA

HIVE-3874. Create ORC File format.

There are several limitations of the current RC File format that I'd like to 
address by creating a new format:

* each column value is stored as a binary blob, which means:
  * the entire column value must be read, decompressed, and deserialized
  * the file format can't use smarter type-specific compression
  * push-down filters can't be evaluated
* the start of each row group needs to be found by scanning
* user metadata can only be added to the file when the file is created
* the file doesn't store the number of rows per file or row group
* there is no mechanism for seeking to a particular row number, which is 
  required for external indexes
* there is no mechanism for storing lightweight indexes within the file 
  to enable push-down filters to skip entire row groups
* the types of the rows aren't stored in the file
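
As a rough illustration of the point about lightweight indexes, the Python sketch below shows how per-row-group min/max statistics could let a push-down filter skip whole row groups. It is only a sketch under assumed names (RowGroupStats, groups_to_read) and made-up numbers; it does not describe the actual file layout in the patch.

```python
from collections import namedtuple

# Hypothetical per-row-group statistics for one integer column.
RowGroupStats = namedtuple(
    "RowGroupStats", ["first_row", "num_rows", "min_value", "max_value"]
)


def groups_to_read(stats, lower_bound):
    """Return the row groups that may contain values > lower_bound.

    A group whose max_value is <= lower_bound cannot match, so it can be
    skipped without reading or decompressing its data.
    """
    return [g for g in stats if g.max_value > lower_bound]


index = [
    RowGroupStats(first_row=0, num_rows=10000, min_value=1, max_value=50),
    RowGroupStats(first_row=10000, num_rows=10000, min_value=40, max_value=120),
    RowGroupStats(first_row=20000, num_rows=10000, min_value=200, max_value=900),
]

# Filter: WHERE col > 100. The first group cannot match and is skipped.
selected = groups_to_read(index, 100)
print([g.first_row for g in selected])
```

Recording first_row and num_rows per group in the same structure would also cover the row-count and seek-to-row points in the list.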

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8529

AFFECTED FILES
  build.properties
  build.xml
  ivy/libraries.properties
  ql/build.xml
  ql/ivy.xml
  ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldReader.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/BitFieldWriter.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/BooleanColumnStatistics.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/ColumnStatistics.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/ColumnStatisticsImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/CompressionCodec.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/CompressionKind.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/DoubleColumnStatistics.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/DynamicByteArray.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/DynamicIntArray.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/FileDump.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/InStream.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/IntegerColumnStatistics.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcFile.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcInputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcOutputFormat.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcSerde.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcStruct.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/OrcUnion.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/OutStream.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/PositionProvider.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/PositionRecorder.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/PositionedOutputStream.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/Reader.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/ReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/RecordReader.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/RecordReaderImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/RedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthByteReader.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthByteWriter.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthIntegerReader.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/RunLengthIntegerWriter.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/SerializationUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/SnappyCodec.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/StreamName.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/StringColumnStatistics.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/StringRedBlackTree.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/StripeInformation.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/Writer.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/WriterImpl.java
  ql/src/java/org/apache/hadoop/hive/ql/orc/ZlibCodec.java
  ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestBitFieldReader.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestDynamicArray.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestFileDump.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInStream.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcStruct.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthByteReader.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestRunLengthIntegerReader.java
  ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestSerializationUtils.java
  

[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby

2013-02-12 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-2340:
-

Attachment: HIVE-2340.13.patch

Clean bill of health on 12, except for incorrect golden files in 
TestParse_join2 and TestMinimrCliDriver_reduce_deduplicate. I've updated the 
golden files in patch .13.

 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
 HIVE-2340.13.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, 
 HIVE-2340.D1209.11.patch, HIVE-2340.D1209.6.patch, HIVE-2340.D1209.7.patch, 
 HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, testclidriver.txt


 Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
 optimizer(cluster-by following group-by).



[jira] [Commented] (HIVE-2655) Ability to define functions in HQL

2013-02-12 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577188#comment-13577188
 ] 

Phabricator commented on HIVE-2655:
---

brock has commented on the revision HIVE-2655 [jira] Ability to define 
functions in HQL.

  Regarding

  * Defining the same macro twice.
  * Dropping a macro that doesn't exist.

  I assume the behavior should be the same as for functions, which allow both of 
these behaviors. As such I wonder why they were listed under negative tests?

REVISION DETAIL
  https://reviews.facebook.net/D915

BRANCH
  macro

ARCANIST PROJECT
  hive

To: JIRA, jsichi, cwsteinbach, jonchang
Cc: jonchang, ikabiljo, brock


 Ability to define functions in HQL
 --

 Key: HIVE-2655
 URL: https://issues.apache.org/jira/browse/HIVE-2655
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Jonathan Perlow
Assignee: Jonathan Chang
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2655.D915.4.patch


 Ability to create functions in HQL as a substitute for creating them in Java.
 Jonathan Chang requested I create this issue.



[jira] [Commented] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-02-12 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577198#comment-13577198
 ] 

Vikram Dixit K commented on HIVE-3403:
--

In the patch, the auto_sortmerge_join_6.q is missing.


 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3403.10.patch, hive.3403.11.patch, 
 hive.3403.12.patch, hive.3403.13.patch, hive.3403.14.patch, 
 hive.3403.15.patch, hive.3403.16.patch, hive.3403.17.patch, 
 hive.3403.18.patch, hive.3403.19.patch, hive.3403.1.patch, 
 hive.3403.21.patch, hive.3403.22.patch, hive.3403.23.patch, 
 hive.3403.24.patch, hive.3403.25.patch, hive.3403.26.patch, 
 hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, hive.3403.5.patch, 
 hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.



[jira] [Resolved] (HIVE-3652) Join optimization for star schema

2013-02-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K resolved HIVE-3652.
--

   Resolution: Duplicate
Fix Version/s: 0.11.0

The work required for this JIRA was done as part of the map-join de-emphasis 
work in HIVE-3784. The query 

select /*+ MAPJOIN(b,c) */ from FACT a join DIM1 b on a.k1=b.k1 JOIN 
DIM2 c on b.k2=c.k2

runs in 1 MR job (based on the noConditionalTask.size).
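
As a rough sketch of the size check referred to above (written in Python for illustration, not taken from Hive's source): as documented for hive.auto.convert.join.noconditionaltask.size, an n-way join can be converted directly to a map-join when the combined size of n-1 of the tables is below the threshold. The function name, table sizes, and thresholds below are made up, and treating the largest table as the streamed one is an assumption.

```python
def can_convert_to_single_mapjoin(table_sizes, threshold_bytes):
    """Return True if all tables except the largest fit under the threshold.

    table_sizes maps a table alias to its size in bytes; the largest table
    is assumed to be the streamed (big) table.
    """
    if len(table_sizes) < 2:
        return False
    small_total = sum(table_sizes.values()) - max(table_sizes.values())
    return small_total < threshold_bytes


# Hypothetical sizes for the FACT/DIM1/DIM2 query above.
sizes = {"a": 50_000_000_000, "b": 4_000_000, "c": 3_000_000}

print(can_convert_to_single_mapjoin(sizes, 10_000_000))
print(can_convert_to_single_mapjoin(sizes, 5_000_000))
```

With the two dimension tables totalling 7,000,000 bytes, the first threshold allows a single map-side job and the second does not.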

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0


 Currently, if we join one fact table with multiple dimension tables, it 
 results in multiple mapreduce jobs, one for each join with a dimension table, 
 because the join would be on different keys for each dimension. 
 Usually all the dimension tables will be small and can fit into memory, so a 
 map-side join can be used to join them with the fact table.
 In this issue I want to look at optimizing such a query to generate a single 
 mapreduce job, so that the mapper loads the dimension tables into memory and 
 joins with the fact table on different keys as well.



[jira] [Created] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary

2013-02-12 Thread Vinod Kumar Vavilapalli (JIRA)
Vinod Kumar Vavilapalli created HIVE-4014:
-

 Summary: Hive+RCFile is not doing column pruning and reading much 
more data than necessary
 Key: HIVE-4014
 URL: https://issues.apache.org/jira/browse/HIVE-4014
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli


With even simple projection queries, I see that HDFS bytes read counter doesn't 
show any reduction in the amount of data read.



[jira] [Commented] (HIVE-4014) Hive+RCFile is not doing column pruning and reading much more data than necessary

2013-02-12 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577216#comment-13577216
 ] 

Vinod Kumar Vavilapalli commented on HIVE-4014:
---

I have already tracked it down and will upload a patch soon.

 Hive+RCFile is not doing column pruning and reading much more data than 
 necessary
 -

 Key: HIVE-4014
 URL: https://issues.apache.org/jira/browse/HIVE-4014
 Project: Hive
  Issue Type: Bug
Reporter: Vinod Kumar Vavilapalli
Assignee: Vinod Kumar Vavilapalli

 With even simple projection queries, I see that HDFS bytes read counter 
 doesn't show any reduction in the amount of data read.



[jira] [Created] (HIVE-4015) Add ORC file to the grammar as a file format

2013-02-12 Thread Owen O'Malley (JIRA)
Owen O'Malley created HIVE-4015:
---

 Summary: Add ORC file to the grammar as a file format
 Key: HIVE-4015
 URL: https://issues.apache.org/jira/browse/HIVE-4015
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley


It would be much more convenient for users if we enable them to use ORC as a 
file format in the HQL grammar. 



[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-02-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-3403:
-

Attachment: auto_sortmerge_join_1_modified.q

I ran a modified version of the auto_sortmerge_join_1.q file (attached to 
the JIRA) with a query in which 2 of the tables in a join are sorted and 
bucketed and the 3rd table is not sorted. I have enabled the auto map join 
convert config. I am seeing this exception:

FAILED: ClassCastException org.apache.hadoop.hive.ql.exec.SMBMapJoinOperator 
cannot be cast to org.apache.hadoop.hive.ql.exec.MapJoinOperator

I do not see the exception if I set the noConditionalTask.size to a size 
greater than the size of the 2 small tables (src1 and small_table), e.g. 500.

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: auto_sortmerge_join_1_modified.q, hive.3403.10.patch, 
 hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, 
 hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, 
 hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, 
 hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, 
 hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, 
 hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, 
 hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, 
 hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.



[jira] [Assigned] (HIVE-4015) Add ORC file to the grammar as a file format

2013-02-12 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley reassigned HIVE-4015:
---

Assignee: Owen O'Malley

 Add ORC file to the grammar as a file format
 

 Key: HIVE-4015
 URL: https://issues.apache.org/jira/browse/HIVE-4015
 Project: Hive
  Issue Type: Improvement
Reporter: Owen O'Malley
Assignee: Owen O'Malley

 It would be much more convenient for users if we enable them to use ORC as a 
 file format in the HQL grammar. 



[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-3996:
-

Status: Patch Available  (was: Open)

 Correctly enforce the memory limit on the multi-table map-join
 --

 Key: HIVE-3996
 URL: https://issues.apache.org/jira/browse/HIVE-3996
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-3996_2.patch, HIVE-3996.patch


 Currently with HIVE-3784, the joins are converted to map-joins based on 
 checks of the table size against the config variable: 
 hive.auto.convert.join.noconditionaltask.size. 
 However, the current implementation will also merge multiple mapjoin 
 operators into a single task regardless of whether the sum of the table sizes 
 will exceed the configured value.
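
To make the gap described above concrete, here is a minimal sketch contrasting a per-table size check with a check on the combined size. The class name, method names, table sizes, and threshold are all made up for illustration; this is not the code in the patch, only an assumption about the shape of the check.

```java
final class MapJoinSizeCheck {
    // Per-table check: true when every small table fits under the threshold on its own.
    static boolean eachFits(long[] tableSizes, long threshold) {
        for (long size : tableSizes) {
            if (size > threshold) {
                return false;
            }
        }
        return true;
    }

    // Combined check: true only when the running total of all small tables
    // stays within the threshold.
    static boolean sumFits(long[] tableSizes, long threshold) {
        long total = 0;
        for (long size : tableSizes) {
            total += size;
            if (total > threshold) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        long[] sizes = {300, 300};   // hypothetical sizes of two small tables
        long threshold = 500;        // hypothetical value of the size setting
        System.out.println("each table fits: " + eachFits(sizes, threshold));
        System.out.println("sum fits: " + sumFits(sizes, threshold));
    }
}
```

With these made-up numbers each table passes the individual check, yet merging both map-joins into one task would exceed the limit, which is the case the issue wants to catch.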



[jira] [Updated] (HIVE-3996) Correctly enforce the memory limit on the multi-table map-join

2013-02-12 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-3996:
-

Attachment: HIVE-3996_2.patch

Updated patch, which also improves the existing tests.

 Correctly enforce the memory limit on the multi-table map-join
 --

 Key: HIVE-3996
 URL: https://issues.apache.org/jira/browse/HIVE-3996
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.11.0
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-3996_2.patch, HIVE-3996.patch


 Currently with HIVE-3784, the joins are converted to map-joins based on 
 checks of the table size against the config variable: 
 hive.auto.convert.join.noconditionaltask.size. 
 However, the current implementation will also merge multiple mapjoin 
 operators into a single task regardless of whether the sum of the table sizes 
 will exceed the configured value.



Problem with Phabricator

2013-02-12 Thread Samuel Yuan
Hi all,

I recently found an issue with the installation of Phabricator used for code 
review (http://reviews.facebook.net). I reported it and was told that it can 
actually be fixed with an upgrade of Pygments to the latest release (see 
https://secure.phabricator.com/T2535). Is anyone familiar with how to go about 
doing that?

Thanks,
Sam


[jira] [Commented] (HIVE-3850) hour() function returns 12 hour clock value when using timestamp datatype

2013-02-12 Thread Anandha L Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577298#comment-13577298
 ] 

Anandha L Ranganathan commented on HIVE-3850:
-

Hello Arun,
Justification: You should read all the comments; the ticket was re-opened.
Also, I added the .q and .q.out files to the patch.

 


 hour() function returns 12 hour clock value when using timestamp datatype
 -

 Key: HIVE-3850
 URL: https://issues.apache.org/jira/browse/HIVE-3850
 Project: Hive
  Issue Type: Bug
  Components: UDF
Affects Versions: 0.9.0, 0.10.0
Reporter: Pieterjan Vriends
 Fix For: 0.11.0

 Attachments: hive-3850.patch, HIVE-3850.patch.txt


 UDFHour.java has two evaluate() functions: one that accepts a Text object 
 as its parameter and one that takes a TimestampWritable object. The first 
 returns the value of Calendar.HOUR_OF_DAY and the second that of 
 Calendar.HOUR. I couldn't find any information in the documentation on the 
 overload of the evaluate function, and I spent quite some time finding out 
 why my statement didn't return a 24-hour clock value.
 Shouldn't both functions return the same value?
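
For reference, a minimal sketch of the difference between the two java.util.Calendar fields named in the description. The class and method names are invented for illustration and are not taken from UDFHour.java; the sample time (15:30 UTC) is likewise arbitrary.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class HourFieldDemo {
    // Builds a calendar fixed at 2013-02-12 15:30:00 UTC so the result
    // does not depend on the machine's default time zone.
    static Calendar sample() {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        cal.set(2013, Calendar.FEBRUARY, 12, 15, 30, 0);
        return cal;
    }

    // Calendar.HOUR is the hour on a 12-hour clock (0-11).
    static int hour12(Calendar cal) {
        return cal.get(Calendar.HOUR);
    }

    // Calendar.HOUR_OF_DAY is the hour on a 24-hour clock (0-23).
    static int hour24(Calendar cal) {
        return cal.get(Calendar.HOUR_OF_DAY);
    }

    public static void main(String[] args) {
        Calendar cal = sample();
        System.out.println("HOUR        = " + hour12(cal)); // 3
        System.out.println("HOUR_OF_DAY = " + hour24(cal)); // 15
    }
}
```

For any time after noon the two fields disagree, which matches the behavior reported for the two evaluate() overloads.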



[jira] [Assigned] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-12 Thread Miles Shang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miles Shang reassigned HIVE-4010:
-

Assignee: Miles Shang

 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Assignee: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Updated] (HIVE-4010) Failure finding iterate method with matching signature

2013-02-12 Thread Miles Shang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Miles Shang updated HIVE-4010:
--

Status: Patch Available  (was: Open)

 Failure finding iterate method with matching signature
 --

 Key: HIVE-4010
 URL: https://issues.apache.org/jira/browse/HIVE-4010
 Project: Hive
  Issue Type: Bug
  Components: UDF
Reporter: Miles Shang
Assignee: Miles Shang
Priority: Minor
 Attachments: HIVE-4010.D8517.1.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 http://fburl.com/10467687



[jira] [Updated] (HIVE-3403) user should not specify mapjoin to perform sort-merge bucketed join

2013-02-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3403:
-

Status: Open  (was: Patch Available)

Thanks Vikram, I will take a look.

 user should not specify mapjoin to perform sort-merge bucketed join
 ---

 Key: HIVE-3403
 URL: https://issues.apache.org/jira/browse/HIVE-3403
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: auto_sortmerge_join_1_modified.q, hive.3403.10.patch, 
 hive.3403.11.patch, hive.3403.12.patch, hive.3403.13.patch, 
 hive.3403.14.patch, hive.3403.15.patch, hive.3403.16.patch, 
 hive.3403.17.patch, hive.3403.18.patch, hive.3403.19.patch, 
 hive.3403.1.patch, hive.3403.21.patch, hive.3403.22.patch, 
 hive.3403.23.patch, hive.3403.24.patch, hive.3403.25.patch, 
 hive.3403.26.patch, hive.3403.2.patch, hive.3403.3.patch, hive.3403.4.patch, 
 hive.3403.5.patch, hive.3403.6.patch, hive.3403.7.patch, hive.3403.8.patch, 
 hive.3403.9.patch


 Currently, in order to perform a sort merge bucketed join, the user needs
 to set hive.optimize.bucketmapjoin.sortedmerge to true, and also specify the 
 mapjoin hint.
 The user should not specify any hints.



[jira] [Assigned] (HIVE-4013) Misc test failures on Hive10 with Hadoop 0.23.x

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan reassigned HIVE-4013:
--

Assignee: Thiruvel Thirumoolan

 Misc test failures on Hive10 with Hadoop 0.23.x
 ---

 Key: HIVE-4013
 URL: https://issues.apache.org/jira/browse/HIVE-4013
 Project: Hive
  Issue Type: Bug
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Attachments: HIVE-4013_branch10.patch


 The following tests fail with the latest builds of Hadoop 23 (tested with 
 0.23.5 and a build of 0.23.6). The fix is mostly a matter of making the tests 
 deterministic by adding ORDER BY to all the queries.
 list_bucket_query_oneskew_3.q
 list_bucket_query_multiskew_2.q
 list_bucket_query_multiskew_3.q
 list_bucket_query_multiskew_1.q
 parenthesis_star_by.q



[jira] [Updated] (HIVE-4012) Unit test failures with Hadoop 23 due to HADOOP-8551

2013-02-12 Thread Thiruvel Thirumoolan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thiruvel Thirumoolan updated HIVE-4012:
---

Assignee: Thiruvel Thirumoolan

 Unit test failures with Hadoop 23 due to HADOOP-8551
 

 Key: HIVE-4012
 URL: https://issues.apache.org/jira/browse/HIVE-4012
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11.0
Reporter: Thiruvel Thirumoolan
Assignee: Thiruvel Thirumoolan
 Fix For: 0.11.0, 0.10.1

 Attachments: HIVE-4012_branch10.patch


 With HADOOP-8551 (>=23.3 or >=2.0.2-alpha), it's not possible to do a dfs 
 -mkdir of foo/bar when foo does not exist. One has to use the '-p' option (not 
 available in Hadoop 20.x). A bunch of our test cases rely on this feature and 
 this was to make it interoperable with Windows too (HIVE-3204). However, all 
 these unit tests fail when using Hadoop >=23.3 or >=2.0.2-alpha. It's also not 
 possible to use the '-p' option in the tests, as that's not supported in Hadoop 
 20.x.



[jira] [Commented] (HIVE-3951) Allow Decimal type columns in Regex Serde

2013-02-12 Thread Mark Grover (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577324#comment-13577324
 ] 

Mark Grover commented on HIVE-3951:
---

This patch is ready for review. Would anyone be willing to please review?

 Allow Decimal type columns in Regex Serde
 -

 Key: HIVE-3951
 URL: https://issues.apache.org/jira/browse/HIVE-3951
 Project: Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.10.0
Reporter: Mark Grover
Assignee: Mark Grover
 Fix For: 0.11.0

 Attachments: HIVE-3951.1.patch


 Decimal type in Hive was recently added by HIVE-2693. We should allow users 
 to create tables with decimal type columns when using Regex Serde. 
 HIVE-3004 did something similar for other primitive types.



[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer

2013-02-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577329#comment-13577329
 ] 

Namit Jain commented on HIVE-4007:
--

I agree, it is incompatible. I can change the existing SerDes in the Hive 
codebase, but there may be external SerDes out there, which I have
no control over.

We have to take this hit sometime.

 Create abstract classes for serializer and deserializer
 ---

 Key: HIVE-4007
 URL: https://issues.apache.org/jira/browse/HIVE-4007
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain

 Currently, it is very difficult to change the Serializer/Deserializer
 interface, since all the SerDes directly implement the interface.
 Instead, we should have abstract classes for implementing these interfaces.
 In case of an interface change, only the abstract class and the relevant 
 SerDe need to change.
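
The idea in the description can be sketched as follows. Everything here is hypothetical: RowSerializer, AbstractRowSerializer, and CsvRowSerializer are stand-ins invented for this example and are not Hive's actual Serializer/Deserializer API or the classes in the patch.

```java
import java.util.Properties;

final class SerdeSketch {
    public static void main(String[] args) {
        RowSerializer s = new CsvRowSerializer();
        s.initialize(new Properties());
        System.out.println(s.describe() + ": " + s.serialize(new String[] {"a", "b"}));
    }
}

// Hypothetical stand-in for a serializer interface.
interface RowSerializer {
    void initialize(Properties tableProps);

    String serialize(String[] fields);

    // Imagine this method was added to the interface at a later date.
    String describe();
}

// The abstract base class absorbs the interface change by supplying defaults,
// so concrete implementations that extend it keep compiling unchanged.
abstract class AbstractRowSerializer implements RowSerializer {
    @Override
    public void initialize(Properties tableProps) {
        // No-op by default; subclasses override when they need table properties.
    }

    @Override
    public String describe() {
        return getClass().getSimpleName();
    }
}

// A concrete implementation only has to provide what is specific to it.
final class CsvRowSerializer extends AbstractRowSerializer {
    @Override
    public String serialize(String[] fields) {
        return String.join(",", fields);
    }
}
```

Implementations that keep implementing the interface directly would still break when a method is added, which is the incompatibility for external SerDes discussed in this thread.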



[jira] [Updated] (HIVE-4007) Create abstract classes for serializer and deserializer

2013-02-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4007:
-

Attachment: hive.4007.1.patch

 Create abstract classes for serializer and deserializer
 ---

 Key: HIVE-4007
 URL: https://issues.apache.org/jira/browse/HIVE-4007
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4007.1.patch


 Currently, it is very difficult to change the Serializer/Deserializer
 interface, since all the SerDes directly implement the interface.
 Instead, we should have abstract classes for implementing these interfaces.
 In case of an interface change, only the abstract class and the relevant 
 SerDe need to change.



[jira] [Reopened] (HIVE-3652) Join optimization for star schema

2013-02-12 Thread Amareshwari Sriramadasu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amareshwari Sriramadasu reopened HIVE-3652:
---


When I ran the same query on the latest trunk with HIVE-3784 fixed, I see the 
following :

{noformat}
explain select /*+ MAPJOIN(b,c) */ * from fact a join dim1 b on a.k1=b.k1 JOIN dim2 c on a.k2=c.k2;
FAILED: SemanticException [Error 10227]: Not all clauses are supported with mapjoin hint. Please remove mapjoin hint.
{noformat}

When I set hive.auto.convert.join=true; and run the following :

{noformat}
explain select * from fact a join dim1 b on a.k1=b.k1 JOIN  dim2 c on a.k2=c.k2;

STAGE DEPENDENCIES:
  Stage-10 is a root stage , consists of Stage-13, Stage-14, Stage-1
  Stage-13 has a backup stage: Stage-1
  Stage-8 depends on stages: Stage-13
  Stage-7 depends on stages: Stage-1, Stage-8, Stage-9 , consists of Stage-11, Stage-12, Stage-2
  Stage-11 has a backup stage: Stage-2
  Stage-5 depends on stages: Stage-11
  Stage-12 has a backup stage: Stage-2
  Stage-6 depends on stages: Stage-12
  Stage-2
  Stage-14 has a backup stage: Stage-1
  Stage-9 depends on stages: Stage-14
  Stage-1
  Stage-0 is a root stage

{noformat}

And the above query launches two MR jobs. Correct me if I am doing anything 
wrong. 

Namit, can you confirm whether this is fixed in HIVE-3784, and whether there is 
any other way to run this?

Vikram, if you are seeing this fixed, can you please add tests if no code 
changes are required?

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0


 Currently, if we join one fact table with multiple dimension tables, it 
 results in multiple mapreduce jobs, one for each join with a dimension table, 
 because the join would be on different keys for each dimension. 
 Usually all the dimension tables will be small and can fit into memory, and so 
 a map-side join can be used to join with the fact table.
 In this issue I want to look at optimizing such a query to generate a single 
 mapreduce job so that the mapper loads the dimension tables into memory and 
 joins with the fact table on different keys as well.
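
As a rough illustration of the single-pass idea in the description, the sketch below keeps two small dimension tables in hash maps and joins each fact row against both, on different keys, in one loop. It is only an assumption about the general technique, not Hive's operator code: the class name, the row layout (k1, k2), inner-join semantics, and unique dimension keys are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class StarJoinSketch {
    // Joins every fact row {k1, k2} against dim1 (keyed by k1) and dim2 (keyed by k2)
    // in a single pass. Rows without a match in both dimensions are dropped (inner join).
    static List<String> join(List<int[]> fact,
                             Map<Integer, String> dim1,
                             Map<Integer, String> dim2) {
        List<String> out = new ArrayList<>();
        for (int[] row : fact) {
            String b = dim1.get(row[0]);
            String c = dim2.get(row[1]);
            if (b != null && c != null) {
                out.add(row[0] + "," + row[1] + "," + b + "," + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Small dimension tables held entirely in memory (made-up data).
        Map<Integer, String> dim1 = new HashMap<>();
        dim1.put(1, "a");
        dim1.put(2, "b");
        Map<Integer, String> dim2 = new HashMap<>();
        dim2.put(10, "x");

        // Fact rows are streamed once.
        List<int[]> fact = new ArrayList<>();
        fact.add(new int[] {1, 10});
        fact.add(new int[] {2, 20});
        fact.add(new int[] {3, 10});

        for (String line : join(fact, dim1, dim2)) {
            System.out.println(line);
        }
    }
}
```

Because both lookups happen while the fact row is in hand, no shuffle on k1 or k2 is needed, which is why a single job could suffice when the dimension tables fit in memory.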



[jira] [Commented] (HIVE-4007) Create abstract classes for serializer and deserializer

2013-02-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577345#comment-13577345
 ] 

Namit Jain commented on HIVE-4007:
--

https://reviews.facebook.net/D8541

 Create abstract classes for serializer and deserializer
 ---

 Key: HIVE-4007
 URL: https://issues.apache.org/jira/browse/HIVE-4007
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4007.1.patch


 Currently, it is very difficult to change the Serializer/Deserializer
 interface, since all the SerDes directly implement the interface.
 Instead, we should have abstract classes for implementing these interfaces.
 In case of an interface change, only the abstract class and the relevant 
 SerDe need to change.



[jira] [Created] (HIVE-4016) Remove init(fname) from TestParse.vm for each test

2013-02-12 Thread Navis (JIRA)
Navis created HIVE-4016:
---

 Summary: Remove init(fname) from TestParse.vm for each test
 Key: HIVE-4016
 URL: https://issues.apache.org/jira/browse/HIVE-4016
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Navis
Assignee: Navis
Priority: Trivial


TestParse does not change any configuration or data, which means calling the 
init() method before each test is not necessary. After removing it, test time 
dropped from 260 sec to 16 sec.
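
The effect can be pictured with a small sketch that counts how often an expensive setup step runs. This is not the TestParse.vm template; the class, the counter, and the init() stand-in are hypothetical and only show why shared setup is safe when tests do not modify configuration or data.

```java
final class InitOnceSketch {
    static int initCalls = 0;

    // Stands in for an expensive setup step.
    static void init() {
        initCalls++;
    }

    // Setup repeated before every test: cost grows with the number of tests.
    static int runWithPerTestInit(int tests) {
        initCalls = 0;
        for (int i = 0; i < tests; i++) {
            init();
            // ... test body that only reads the prepared state ...
        }
        return initCalls;
    }

    // Setup done once up front: valid only because no test changes the state.
    static int runWithSharedInit(int tests) {
        initCalls = 0;
        init();
        for (int i = 0; i < tests; i++) {
            // ... test body that only reads the prepared state ...
        }
        return initCalls;
    }

    public static void main(String[] args) {
        System.out.println("per-test init calls: " + runWithPerTestInit(5));
        System.out.println("shared init calls:   " + runWithSharedInit(5));
    }
}
```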



[jira] [Updated] (HIVE-4016) Remove init(fname) from TestParse.vm for each test

2013-02-12 Thread Navis (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Navis updated HIVE-4016:


Status: Patch Available  (was: Open)

 Remove init(fname) from TestParse.vm for each test
 --

 Key: HIVE-4016
 URL: https://issues.apache.org/jira/browse/HIVE-4016
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4016.D8547.1.patch


 TestParse does not change any configuration or data, which means calling the 
 init() method before each test is not necessary. After removing it, test time 
 dropped from 260 sec to 16 sec.



[jira] [Updated] (HIVE-4016) Remove init(fname) from TestParse.vm for each test

2013-02-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4016:
--

Attachment: HIVE-4016.D8547.1.patch

navis requested code review of HIVE-4016 [jira] Remove init(fname) from 
TestParse.vm for each test.

Reviewers: JIRA

HIVE-4016 Remove init(fname) from TestParse.vm for each test

TestParse does not change any configuration or data, which means calling the 
init() method before each test is not necessary. After removing it, test time 
dropped from 260 sec to 16 sec.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D8547

AFFECTED FILES
  ql/src/test/templates/TestParse.vm

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/20805/

To: JIRA, navis


 Remove init(fname) from TestParse.vm for each test
 --

 Key: HIVE-4016
 URL: https://issues.apache.org/jira/browse/HIVE-4016
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-4016.D8547.1.patch


 TestParse does not change any configuration or data, which means calling the 
 init() method before each test is not necessary. After removing it, test time 
 dropped from 260 sec to 16 sec.



[jira] [Created] (HIVE-4017) Can't close long running hive Query Statements

2013-02-12 Thread Kugathasan Abimaran (JIRA)
Kugathasan Abimaran created HIVE-4017:
-

 Summary: Can't close long running hive Query Statements
 Key: HIVE-4017
 URL: https://issues.apache.org/jira/browse/HIVE-4017
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
 Environment: Ubuntu 11.04
Reporter: Kugathasan Abimaran


Currently, we can't set the Hive query timeout period. Hive returns "Method not 
supported". Is there any way to stop long-running Hive query statements?



[jira] [Commented] (HIVE-3652) Join optimization for star schema

2013-02-12 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577353#comment-13577353
 ] 

Amareshwari Sriramadasu commented on HIVE-3652:
---

Even with hive.auto.convert.join.noconditionaltask set to true, I'm seeing two 
MR jobs getting launched.

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0


 Currently, if we join one fact table with multiple dimension tables, it 
 results in multiple mapreduce jobs, one for each join with a dimension table, 
 because the join would be on different keys for each dimension. 
 Usually all the dimension tables will be small and can fit into memory, and so 
 a map-side join can be used to join with the fact table.
 In this issue I want to look at optimizing such a query to generate a single 
 mapreduce job so that the mapper loads the dimension tables into memory and 
 joins with the fact table on different keys as well.



[jira] [Updated] (HIVE-4007) Create abstract classes for serializer and deserializer

2013-02-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-4007:
-

Attachment: hive.4007.2.patch

 Create abstract classes for serializer and deserializer
 ---

 Key: HIVE-4007
 URL: https://issues.apache.org/jira/browse/HIVE-4007
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.4007.1.patch, hive.4007.2.patch


 Currently, it is very difficult to change the Serializer/Deserializer
 interface, since all the SerDes directly implement the interface.
 Instead, we should have abstract classes for implementing these interfaces.
 In case of an interface change, only the abstract class and the relevant 
 SerDe need to change.



[jira] [Commented] (HIVE-3652) Join optimization for star schema

2013-02-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577365#comment-13577365
 ] 

Namit Jain commented on HIVE-3652:
--

Is your size threshold correct -- hive.auto.convert.join.noconditionaltask.size 
?

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0


 Currently, if we join one fact table with multiple dimension tables, it 
 results in multiple mapreduce jobs, one for each join with a dimension table, 
 because the join would be on different keys for each dimension. 
 Usually all the dimension tables will be small and can fit into memory, and so 
 a map-side join can be used to join with the fact table.
 In this issue I want to look at optimizing such a query to generate a single 
 mapreduce job so that the mapper loads the dimension tables into memory and 
 joins with the fact table on different keys as well.



[jira] [Commented] (HIVE-3652) Join optimization for star schema

2013-02-12 Thread Amareshwari Sriramadasu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577368#comment-13577368
 ] 

Amareshwari Sriramadasu commented on HIVE-3652:
---

bq. Is your size threshold correct – 
hive.auto.convert.join.noconditionaltask.size ?
Yes. The tables are very small. I tested with empty tables as well. I'm seeing 
the same behavior.

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0


 Currently, if we join one fact table with multiple dimension tables, it 
 results in multiple mapreduce jobs, one for each join with a dimension table, 
 because the join would be on different keys for each dimension. 
 Usually all the dimension tables will be small and can fit into memory, and so 
 a map-side join can be used to join with the fact table.
 In this issue I want to look at optimizing such a query to generate a single 
 mapreduce job so that the mapper loads the dimension tables into memory and 
 joins with the fact table on different keys as well.



[jira] [Commented] (HIVE-3252) Add environment context to metastore Thrift calls

2013-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577372#comment-13577372
 ] 

Hudson commented on HIVE-3252:
--

Integrated in Hive-trunk-h0.21 #1968 (See 
[https://builds.apache.org/job/Hive-trunk-h0.21/1968/])
HIVE-3252. Add environment context to metastore Thrift calls. (Samuel Yuan 
via kevinwilfong) (Revision 1445309)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1445309
Files : 
* /hive/trunk/metastore/if/hive_metastore.thrift
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* /hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java


 Add environment context to metastore Thrift calls
 -

 Key: HIVE-3252
 URL: https://issues.apache.org/jira/browse/HIVE-3252
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: John Reese
Assignee: Samuel Yuan
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt


 Currently in the Hive Thrift metastore API, create_table, add_partition, 
 alter_table, and alter_partition have with_environment_context analogs. It would 
 be really useful to add similar methods for drop_partition, drop_table, and 
 append_partition.



Hive-trunk-h0.21 - Build # 1968 - Still Failing

2013-02-12 Thread Apache Jenkins Server
Changes for Build #1964
[namit] HIVE-4001 Add o.a.h.h.serde.Constants for backward compatibility
(Navis via namit)


Changes for Build #1965

Changes for Build #1966

Changes for Build #1967

Changes for Build #1968
[kevinwilfong] HIVE-3252. Add environment context to metastore Thrift calls. 
(Samuel Yuan via kevinwilfong)




1 tests failed.
FAILED:  org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1

Error Message:
Forked Java VM exited abnormally. Please note the time in the report does not 
reflect the time until the VM exit.

Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please 
note the time in the report does not reflect the time until the VM exit.
    at net.sf.antcontrib.logic.ForTask.doSequentialIteration(ForTask.java:259)
    at net.sf.antcontrib.logic.ForTask.doToken(ForTask.java:268)
    at net.sf.antcontrib.logic.ForTask.doTheTasks(ForTask.java:299)
    at net.sf.antcontrib.logic.ForTask.execute(ForTask.java:244)




The Apache Jenkins build system has built Hive-trunk-h0.21 (build #1968)

Status: Still Failing

Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/1968/ to 
view the results.

[jira] [Commented] (HIVE-3252) Add environment context to metastore Thrift calls

2013-02-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577376#comment-13577376
 ] 

Hudson commented on HIVE-3252:
--

Integrated in Hive-trunk-hadoop2 #120 (See 
[https://builds.apache.org/job/Hive-trunk-hadoop2/120/])
HIVE-3252. Add environment context to metastore Thrift calls. (Samuel Yuan 
via kevinwilfong) (Revision 1445309)

 Result = FAILURE
kevinwilfong : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1445309
Files : 
* /hive/trunk/metastore/if/hive_metastore.thrift
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore.h
* /hive/trunk/metastore/src/gen/thrift/gen-cpp/ThriftHiveMetastore_server.skeleton.cpp
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Database.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/EnvironmentContext.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Index.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Partition.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/PrincipalPrivilegeSet.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Schema.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SerDeInfo.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/SkewedInfo.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/StorageDescriptor.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/Table.java
* /hive/trunk/metastore/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.java
* /hive/trunk/metastore/src/gen/thrift/gen-php/metastore/ThriftHiveMetastore.php
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore-remote
* /hive/trunk/metastore/src/gen/thrift/gen-py/hive_metastore/ThriftHiveMetastore.py
* /hive/trunk/metastore/src/gen/thrift/gen-rb/thrift_hive_metastore.rb
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
* /hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStoreClient.java
* /hive/trunk/metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStoreWithEnvironmentContext.java


 Add environment context to metastore Thrift calls
 -

 Key: HIVE-3252
 URL: https://issues.apache.org/jira/browse/HIVE-3252
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: John Reese
Assignee: Samuel Yuan
Priority: Minor
 Fix For: 0.11.0

 Attachments: HIVE-3252.1.patch.txt, HIVE-3252.2.patch.txt


 Currently, in the Hive Thrift metastore API, create_table, add_partition, 
 alter_table, and alter_partition have with_environment_context analogs.  It 
 would be really useful to add similar methods for drop_partition, drop_table, 
 and append_partition.



[jira] [Commented] (HIVE-3652) Join optimization for star schema

2013-02-12 Thread Vikram Dixit K (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13577391#comment-13577391
 ] 

Vikram Dixit K commented on HIVE-3652:
--

Hi Amareshwari,

If you look at test case join32.q, it is almost the same as the one you had 
posted. It launches only one MR task 
(http://svn.apache.org/viewvc/hive/trunk/ql/src/test/results/clientpositive/join32.q.out?view=markup).
I tried this on a fully installed cluster as well and I see only one task 
there too. It may also be worth checking whether HIVE-3996 makes a 
difference. Kindly correct me if I am wrong.

Thanks
Vikram.

 Join optimization for star schema
 -

 Key: HIVE-3652
 URL: https://issues.apache.org/jira/browse/HIVE-3652
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Amareshwari Sriramadasu
Assignee: Vikram Dixit K
 Fix For: 0.11.0


 Currently, if we join one fact table with multiple dimension tables, it 
 results in a separate mapreduce job for each join with a dimension table, 
 because the join is on a different key for each dimension. 
 Usually all the dimension tables are small and fit into memory, so a 
 map-side join can be used to join them with the fact table.
 In this issue I want to look at optimizing such queries to generate a single 
 mapreduce job, so that the mapper loads the dimension tables into memory and 
 joins them with the fact table on the different keys as well.



[jira] [Updated] (HIVE-2340) optimize orderby followed by a groupby

2013-02-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2340:
--

Attachment: HIVE-2340.D1209.12.patch

navis updated the revision HIVE-2340 [jira] optimize orderby followed by a 
groupby.

  1. Changed how new metadata (colExprMap, etc.) is produced in 
ColumnPrunerProcFactory.pruneReduceSinkOperator()
  - Values that are not retained are now removed from the RowResolver, 
colExprMap and schema (instead of creating new entities by adding the 
retained values)
  2. Changed the order of applying CP and PPD. PPD is now applied first and CP 
next (previously CP, then PPD)
  - CP removed some expr mappings that had not yet been propagated by PPD
  - Also removed schema pruning for FilterOperator, which did not seem right 
(it is not certain that TS will actually prune columns)
  3. Refactored to share the same code base in ExprNodeDescUtils, which was 
introduced by HIVE-2839
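
Item 1 above can be illustrated with a small Python sketch: entries that are
not retained are deleted from the existing structures rather than a new map
and schema being built from the retained ones. The function and the column
names are hypothetical and this is not the code in ColumnPrunerProcFactory;
plain dicts and lists stand in for colExprMap and the schema.

```python
from typing import Dict, List, Set


def prune_in_place(col_expr_map: Dict[str, str], schema: List[str], retained: Set[str]) -> None:
    """Remove every column that is not in `retained`, mutating both arguments."""
    # Iterate over a copy of the keys so deleting while looping is safe.
    for name in list(col_expr_map):
        if name not in retained:
            del col_expr_map[name]
    # Slice assignment keeps the same list object and preserves column order.
    schema[:] = [name for name in schema if name in retained]
```

One visible difference from rebuilding is that any existing reference to the
map or schema sees the pruned state afterwards. Whether that is the motivation
for the change is not stated in the message.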

  Will run full test tonight

Reviewers: JIRA

REVISION DETAIL
  https://reviews.facebook.net/D1209

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D1209?vs=27315&id=27669#toc

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  conf/hive-default.xml.template
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ColumnPrunerProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/MapJoinProcessor.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ReduceSinkDeDuplication.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinResolver.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SkewJoinProcFactory.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
  ql/src/java/org/apache/hadoop/hive/ql/plan/JoinDesc.java
  ql/src/test/queries/clientpositive/auto_join26.q
  ql/src/test/queries/clientpositive/groupby_distinct_samekey.q
  ql/src/test/queries/clientpositive/reduce_deduplicate.q
  ql/src/test/queries/clientpositive/reduce_deduplicate_extended.q
  ql/src/test/results/clientpositive/cluster.q.out
  ql/src/test/results/clientpositive/groupby2.q.out
  ql/src/test/results/clientpositive/groupby2_map_skew.q.out
  ql/src/test/results/clientpositive/groupby_cube1.q.out
  ql/src/test/results/clientpositive/groupby_distinct_samekey.q.out
  ql/src/test/results/clientpositive/groupby_rollup1.q.out
  ql/src/test/results/clientpositive/index_bitmap3.q.out
  ql/src/test/results/clientpositive/index_bitmap_auto.q.out
  ql/src/test/results/clientpositive/infer_bucket_sort.q.out
  ql/src/test/results/clientpositive/ppd2.q.out
  ql/src/test/results/clientpositive/ppd_gby_join.q.out
  ql/src/test/results/clientpositive/reduce_deduplicate.q.out
  ql/src/test/results/clientpositive/reduce_deduplicate_extended.q.out
  ql/src/test/results/clientpositive/semijoin.q.out
  ql/src/test/results/clientpositive/union24.q.out
  ql/src/test/results/compiler/plan/input2.q.xml
  ql/src/test/results/compiler/plan/input3.q.xml
  ql/src/test/results/compiler/plan/join1.q.xml
  ql/src/test/results/compiler/plan/join2.q.xml
  ql/src/test/results/compiler/plan/join3.q.xml
  ql/src/test/results/compiler/plan/sample1.q.xml
  ql/src/test/results/compiler/plan/sample2.q.xml
  ql/src/test/results/compiler/plan/sample3.q.xml
  ql/src/test/results/compiler/plan/sample4.q.xml
  ql/src/test/results/compiler/plan/sample5.q.xml
  ql/src/test/results/compiler/plan/sample6.q.xml
  ql/src/test/results/compiler/plan/sample7.q.xml

To: JIRA, navis
Cc: hagleitn, njain


 optimize orderby followed by a groupby
 --

 Key: HIVE-2340
 URL: https://issues.apache.org/jira/browse/HIVE-2340
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
  Labels: perfomance
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2340.D1209.5.patch, HIVE-2340.12.patch, 
 HIVE-2340.13.patch, HIVE-2340.1.patch.txt, HIVE-2340.D1209.10.patch, 
 HIVE-2340.D1209.11.patch, HIVE-2340.D1209.12.patch, HIVE-2340.D1209.6.patch, 
 HIVE-2340.D1209.7.patch, HIVE-2340.D1209.8.patch, HIVE-2340.D1209.9.patch, 
 testclidriver.txt


 Before implementing optimizer for JOIN-GBY, try to implement RS-GBY 
 optimizer(cluster-by following group-by).
