Top-K optimization

2012-11-19 Thread Sivaramakrishnan Narayanan
Hi All,

I'm a developer at Qubole (http://www.qubole.com) looking at Hadoop and Hive. 
In my past life, I was on the optimizer team of Greenplum Parallel Database. 
I'm a newbie to the Hive mailing list, so apologies for any missteps. I've done 
some searching in the Hive mailing list and JIRA and have not found any 
discussions around this topic - please feel free to redirect me to any old 
discussions I might've missed.

A class of queries we're interested in optimizing is top-k queries, i.e. 
queries of the form:

(1) SELECT x, y from T order by z limit 10

You can imagine a similar query with aggregates:

(2) SELECT x, y, count(*) as c from T group by x, y order by c desc limit 10

I'll continue my discussion with example (1) for simplicity. The way such a 
query is executed today, every mapper sorts all rows from T and writes them to 
local files. Reducers (in this example, a single reducer) read these files and 
merge them. The merged rows are fed to the limit operator, which stops after 
10 rows.

The change I'm proposing is a combination of Hive and Hadoop changes which will 
greatly improve the performance of such queries:

Hadoop change:
- New parameter map.sort.limitrecords which determines how many records 
each mapper in a job will send to every reducer
- When writing out local files after sorting, map-task stops after 
map.sort.limitrecords records for each reducer
- Effectively, each mapper sends out its top-K records

Hive change:
- Determining when the Top-K optimization is applicable and setting K 
in ReduceSinkDesc
- Passing the K value along to MapredWork
- ExecDriver sets map.sort.limitrecords before executing the job 
corresponding to the MapredWork

This change will reduce the amount of I/O that happens on the map side (writing 
only 10 rows per reducer as opposed to the entire table) and can have a big effect 
on performance. Furthermore, it is possible to make the sort on the mapper side 
a top-k sort which can further improve performance - but the deep pocket is 
really the I/O savings. In my experiments, I see a 5x performance improvement 
for such queries.
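
To illustrate why the per-mapper cut-off does not change the result, here is a 
minimal Java sketch. It is a simulation only, not Hadoop or Hive code: the class 
TopKShuffleSketch and its methods are made up for this example, rows are reduced 
to integer sort keys, and a single reducer is assumed as in example (1).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical simulation, not Hadoop code: each "mapper" is a list of sort keys.
class TopKShuffleSketch {

    // Sort one mapper's rows and keep only the first k, which is what the
    // proposed map.sort.limitrecords cut-off would do when writing the local file.
    static List<Integer> mapperOutput(List<Integer> rows, int k) {
        List<Integer> sorted = new ArrayList<Integer>(rows);
        Collections.sort(sorted);
        return new ArrayList<Integer>(sorted.subList(0, Math.min(k, sorted.size())));
    }

    // The single reducer merges whatever the mappers sent and applies LIMIT k.
    static List<Integer> reducerOutput(List<List<Integer>> mapperOutputs, int k) {
        List<Integer> merged = new ArrayList<Integer>();
        for (List<Integer> out : mapperOutputs) {
            merged.addAll(out);
        }
        Collections.sort(merged);
        return new ArrayList<Integer>(merged.subList(0, Math.min(k, merged.size())));
    }

    // Runs the whole pipeline with a per-mapper cut-off of k records.
    static List<Integer> topK(List<List<Integer>> splits, int k) {
        List<List<Integer>> outputs = new ArrayList<List<Integer>>();
        for (List<Integer> split : splits) {
            outputs.add(mapperOutput(split, k));
        }
        return reducerOutput(outputs, k);
    }
}
```

Any row in the global top-K is also in the top-K of the mapper that read it, so 
truncating each mapper's output to K records loses nothing the reducer needs.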

Please let me know if this is of general interest - I'll be happy to contribute 
this back to the community. I'll also be mailing the Hadoop mailing list about 
this.

Thanks
Siva

Re: Top-K optimization

2012-11-19 Thread Namit Jain
Hi Siva,


Take a look at https://issues.apache.org/jira/browse/HIVE-3562.

It is in my todo list, but I have not been able to review this.

I think this addresses a very similar problem. If so, can you also review the
above patch?


Thanks,
-namit





[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2012-11-19 Thread Sivaramakrishnan Narayanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500118#comment-13500118
 ] 

Sivaramakrishnan Narayanan commented on HIVE-3562:
--

I'm interested in this particular optimization. Let's say the table src has N 
rows and we're interested in the top-K. If the rows in src are in almost 
descending order and we want the ascending top-K (this is very likely when 
ordering by timestamps), then nearly every row is inserted at the front of the 
array and shifts the other entries, so the number of memcopies will be on the 
order of N * K. See code fragment:

{code}
+    public boolean isTopN(byte[] key) {
+      int index = Arrays.binarySearch(keys, key, C);
+      index = index < 0 ? -index - 1 : index;
+      if (index >= keys.length - 1) {
+        return false;
+      }
+      System.arraycopy(keys, index, keys, index + 1, keys.length - index - 1);
+      keys[index] = Arrays.copyOf(key, key.length);
+      return true;
+    }
+  }
{code}

You could use a linked list, but binary search is not an option in that case.
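
A rough way to see the N * K behaviour is to count the elements moved by 
arraycopy. The sketch below is a simplified, instrumented variant written for 
this comment, not the code from the patch: it uses long keys instead of byte[] 
keys with a comparator, and the class name SortedArrayTopK and the shifted 
counter are hypothetical.

```java
import java.util.Arrays;

// Hypothetical instrumented variant of the sorted-array approach, using long keys.
class SortedArrayTopK {
    private final long[] keys;   // the k smallest keys seen so far, ascending
    private int size = 0;        // number of valid entries in keys
    long shifted = 0;            // total number of elements moved by arraycopy

    SortedArrayTopK(int k) {
        keys = new long[k];
    }

    // Returns true if the key was inserted into the current top-k.
    boolean offer(long key) {
        int index = Arrays.binarySearch(keys, 0, size, key);
        index = index < 0 ? -index - 1 : index;
        if (index >= keys.length) {
            return false; // array is full and the key sorts after everything kept
        }
        // Entries from index onward move one slot right; when the array is
        // full, the last entry falls off the end.
        int toMove = Math.min(size, keys.length - 1) - index;
        System.arraycopy(keys, index, keys, index + 1, toMove);
        shifted += toMove;
        keys[index] = key;
        if (size < keys.length) {
            size++;
        }
        return true;
    }
}
```

With keys offered in descending order, almost every insert lands at index 0 and 
moves K - 1 elements. With ascending input, nothing is moved at all, because 
each key is either appended or rejected.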

An alternate approach to the problem is to use a combination of Hive and Hadoop 
changes.

Hadoop change:
* New parameter map.sort.limitrecords which determines how many records each 
mapper in a job will send to every reducer
* When writing out local files after sorting, map-task stops after 
map.sort.limitrecords records for each reducer
* Effectively, each mapper sends out its top-K records

Hive change:
* Determining when the Top-K optimization is applicable and setting K in 
ReduceSinkDesc
* Passing the K value along to MapredWork
* ExecDriver sets map.sort.limitrecords before executing the job corresponding 
to the MapredWork

This change will reduce the amount of I/O that happens on the map side (writing 
only 10 rows per reducer as opposed to the entire table) and can have a big effect 
on performance. Furthermore, it is possible to make the sort on the mapper side 
a top-k sort which can further improve performance - but the deep pocket is 
really the I/O savings. In my experiments, I see a 5x performance improvement 
for such queries.

 Some limit can be pushed down to map stage
 --

 Key: HIVE-3562
 URL: https://issues.apache.org/jira/browse/HIVE-3562
 Project: Hive
  Issue Type: Bug
Reporter: Navis
Assignee: Navis
Priority: Trivial
 Attachments: HIVE-3562.D5967.1.patch


 Queries with a limit clause (with a reasonably small limit), for example
 {noformat}
 select * from src order by key limit 10;
 {noformat}
 produce the operator tree
 TS-SEL-RS-EXT-LIMIT-FS
 But LIMIT can be partially evaluated in RS, reducing the amount of data shuffled:
 TS-SEL-RS(TOP-N)-EXT-LIMIT-FS

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3633) sort-merge join does not work with sub-queries

2012-11-19 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3633:
-

Attachment: hive.3633.5.patch

 sort-merge join does not work with sub-queries
 --

 Key: HIVE-3633
 URL: https://issues.apache.org/jira/browse/HIVE-3633
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3633.1.patch, hive.3633.2.patch, hive.3633.3.patch, 
 hive.3633.4.patch, hive.3633.5.patch


 Consider the following query:
 create table smb_bucket_1(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 create table smb_bucket_2(key int, value string) CLUSTERED BY (key) SORTED BY 
 (key) INTO 6 BUCKETS STORED AS TEXTFILE;
 -- load the above tables
 set hive.optimize.bucketmapjoin = true;
 set hive.optimize.bucketmapjoin.sortedmerge = true;
 set hive.input.format = 
 org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
 explain
 select count(*) from
 (
 select /*+mapjoin(a)*/ a.key as key1, b.key as key2, a.value as value1, 
 b.value as value2
 from smb_bucket_1 a join smb_bucket_2 b on a.key = b.key)
 subq;
 The above query does not use sort-merge join. Supporting this would be very 
 useful as we automatically convert queries to use the sorting and bucketing 
 properties for joins.



[jira] [Commented] (HIVE-3562) Some limit can be pushed down to map stage

2012-11-19 Thread Sivaramakrishnan Narayanan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500165#comment-13500165
 ] 

Sivaramakrishnan Narayanan commented on HIVE-3562:
--

Apologies, you can use a heap to maintain a top-k as opposed to an array or a 
linked list. 

You may also want to consider the case where the top-k do not fit in memory. 
One possibility would be to employ this optimization only if K is less than 
some threshold.

This approach has the advantage that it is a Hive-only change and does not 
depend on a Hadoop change. That is a pretty big plus.
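
For reference, a bounded heap could look roughly like the sketch below. This is 
an illustration under assumptions, not the patch: it keeps the K smallest long 
keys (ascending top-K), the class BoundedTopK and the maxK threshold parameter 
are hypothetical names, and a real version would compare serialized byte[] keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of the HIVE-3562 patch.
class BoundedTopK {
    private final int k;
    // Max-heap: the head is the largest key currently kept, i.e. the next to evict.
    private final PriorityQueue<Long> heap;

    BoundedTopK(int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.heap = new PriorityQueue<Long>(k, Collections.<Long>reverseOrder());
    }

    // Returns true if the key is kept among the k smallest keys seen so far.
    boolean offer(long key) {
        if (heap.size() < k) {
            heap.add(key);
            return true;
        }
        if (key >= heap.peek()) {
            return false; // not smaller than the current k-th smallest key
        }
        heap.poll();      // evict the largest kept key: O(log k), no array shifting
        heap.add(key);
        return true;
    }

    // The kept keys in ascending order.
    List<Long> sortedKeys() {
        List<Long> out = new ArrayList<Long>(heap);
        Collections.sort(out);
        return out;
    }

    // Assumed guard: only use the in-memory heap when k is under some threshold.
    static boolean applicable(int k, int maxK) {
        return k > 0 && k <= maxK;
    }
}
```

Each accepted key costs O(log K) regardless of input order, so the almost 
descending case no longer degrades to N * K copies.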



Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false #203

2012-11-19 Thread Apache Jenkins Server
See 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/

--
[...truncated 10343 lines...]

compile-test:
 [echo] Project: serde
[javac] Compiling 26 source files to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/serde/test/classes
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

create-dirs:
 [echo] Project: service
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/service/src/test/resources
 does not exist.

init:
 [echo] Project: service

ivy-init-settings:
 [echo] Project: service

ivy-resolve:
 [echo] Project: service
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-service-default.xml
 to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/ivy/report/org.apache.hive-hive-service-default.html

ivy-retrieve:
 [echo] Project: service

compile:
 [echo] Project: service

ivy-resolve-test:
 [echo] Project: service

ivy-retrieve-test:
 [echo] Project: service

compile-test:
 [echo] Project: service
[javac] Compiling 2 source files to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/service/test/classes

test:
 [echo] Project: hive

test-shims:
 [echo] Project: hive

test-conditions:
 [echo] Project: shims

gen-test:
 [echo] Project: shims

create-dirs:
 [echo] Project: shims
 [copy] Warning: 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/test/resources
 does not exist.

init:
 [echo] Project: shims

ivy-init-settings:
 [echo] Project: shims

ivy-resolve:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml
[ivy:report] Processing 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/ivy/resolution-cache/org.apache.hive-hive-shims-default.xml
 to 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/ivy/report/org.apache.hive-hive-shims-default.html

ivy-retrieve:
 [echo] Project: shims

compile:
 [echo] Project: shims
 [echo] Building shims 0.20

build_shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20/java
 against hadoop 0.20.2 
(https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/hadoopcore/hadoop-0.20.2)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.20S

build_shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.20S/java
 against hadoop 1.0.0 
(https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/hadoopcore/hadoop-1.0.0)

ivy-init-settings:
 [echo] Project: shims

ivy-resolve-hadoop-shim:
 [echo] Project: shims
[ivy:resolve] :: loading settings :: file = 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/ivy/ivysettings.xml

ivy-retrieve-hadoop-shim:
 [echo] Project: shims
 [echo] Building shims 0.23

build_shims:
 [echo] Project: shims
 [echo] Compiling 
https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/ws/hive/shims/src/common/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/common-secure/java;/home/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/hive/shims/src/0.23/java
 against hadoop 0.23.3 
(https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21-keepgoing=false/203/artifact/hive/build/hadoopcore/hadoop-0.23.3)

[jira] [Commented] (HIVE-3705) Adding authorization capability to the metastore

2012-11-19 Thread Rob Weltman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500375#comment-13500375
 ] 

Rob Weltman commented on HIVE-3705:
---

A new JIRA has been opened for the larger issues around the desired semantics 
of Hive authorization and ensuring they are enforced:

https://issues.apache.org/jira/browse/HIVE-3720


 Adding authorization capability to the metastore
 

 Key: HIVE-3705
 URL: https://issues.apache.org/jira/browse/HIVE-3705
 Project: Hive
  Issue Type: New Feature
  Components: Authorization, Metastore
Reporter: Sushanth Sowmyan
Assignee: Sushanth Sowmyan
 Attachments: HIVE-3705.D6681.1.patch, HIVE-3705.D6681.2.patch, 
 hive-backend-auth.git.patch, hivesec_investigation.pdf


 In an environment where multiple clients access a single metastore, and we 
 want to evolve hive security to a point where it's no longer simply 
 preventing users from shooting their own foot, we need to be able to 
 authorize metastore calls as well, instead of simply performing every 
 metastore api call that's made.



[jira] [Assigned] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-11-19 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata reassigned HIVE-3718:
---

Assignee: Pamela Vagata

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor





[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500469#comment-13500469
 ] 

Carl Steinbach commented on HIVE-2206:
--

@Yin: The correlation optimizer is only enabled for a small set of new 
CliDriver tests. If I enable the correlation optimizer by default, which of the 
existing CliDriver tests are expected to fail?

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which merges correlated MapReduce jobs (MR jobs) into a single MR job. The 
 idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The paper and 
 slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, it generates 
 a MapReduce job for every operation that may need to shuffle the data (e.g. 
 join and aggregation operations). However, such operations may involve the 
 correlations explained below, and correlated operations can be executed in a 
 single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR has job flow correlation (JFC) with one of its 
 child nodes if it has the same partition key as that child node.
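
 As a rough illustration of the three definitions, the checks can be written as 
 set and key comparisons. The Job class, its fields, and the method names below 
 are hypothetical and are not taken from the patch; partition keys are modelled 
 as plain strings.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of an MR job: its input relations and its partition key.
class CorrelationSketch {
    static final class Job {
        final Set<String> inputs;
        final String partitionKey;

        Job(String partitionKey, String... inputs) {
            this.partitionKey = partitionKey;
            this.inputs = new HashSet<String>(Arrays.asList(inputs));
        }
    }

    // IC: the input relation sets are not disjoint.
    static boolean hasInputCorrelation(Job a, Job b) {
        return !Collections.disjoint(a.inputs, b.inputs);
    }

    // TC: input correlation plus the same partition key.
    static boolean hasTransitCorrelation(Job a, Job b) {
        return hasInputCorrelation(a, b) && a.partitionKey.equals(b.partitionKey);
    }

    // JFC: a job has the same partition key as one of its child nodes.
    static boolean hasJobFlowCorrelation(Job job, Job child) {
        return job.partitionKey.equals(child.partitionKey);
    }
}
```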
 The current implementation of the correlation optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized 
 if it satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main 
 reasons are that it only needs to manipulate the query plan tree and it can 
 leverage the existing component for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 Several things can be done in the future to improve this optimizer. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs include 
 intermediate tables; and
 # Optimize queries involving self joins.
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc



[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-11-19 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: (was: HIVE-3718.1.patch.txt)

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor





[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-11-19 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: HIVE-3718.1.patch.txt

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt






[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-11-19 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Status: Patch Available  (was: Open)

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt






[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread David Inbar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500474#comment-13500474
 ] 

David Inbar commented on HIVE-2206:
---

I will be on vacation through Friday Nov 23rd, but will be checking email and 
voicemail periodically.

For all time-critical items, please call my mobile phone.

Many thanks,
David






[jira] [Commented] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-11-19 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500493#comment-13500493
 ] 

Kevin Wilfong commented on HIVE-3718:
-

+1

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt






[jira] [Updated] (HIVE-3647) map-side groupby wrongly due to HIVE-3432

2012-11-19 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3647:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed, thanks Namit.

 map-side groupby wrongly due to HIVE-3432
 -

 Key: HIVE-3647
 URL: https://issues.apache.org/jira/browse/HIVE-3647
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3647.1.patch, hive.3647.2.patch, hive.3647.3.patch, 
 hive.3647.4.patch, hive.3647.5.patch, hive.3647.6.patch, hive.3647.7.patch, 
 hive.3647.8.patch


 There seems to be a bug due to HIVE-3432.
 We are converting the group by to a map side group by after only looking at
 sorting columns. This can give wrong results if the data is sorted and
 bucketed by different columns.
 Add some tests for that scenario, verify and fix any issues.



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500500#comment-13500500
 ] 

Carl Steinbach commented on HIVE-3678:
--

The upgrade scripts look good to me. As for HIVE-3712 which is included in this 
patch, I have started to wonder if it would be better for the metastore DB to 
store the column stats values (e.g. min/max value, num trues/falses, 
min/max/avg length, etc) as a JSON text blob. This approach would make the code 
more portable by eliminating dependencies on specific DBs and will also make it 
easier to add new fields in the future. The big downside of this approach is 
that we won't be able to push down column stats filters on these fields, but 
I'm not convinced that this is a practical use case in the first place.
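
To make the trade-off concrete, here is a minimal sketch of the JSON approach, 
assuming hypothetical field names ("min", "max", "numNulls", "numDistinct"); 
it is not the metastore's actual schema or code. It shows how a field added in 
a later release can be read from old rows without a schema change.

```python
import json


def encode_stats(stats):
    """Serialize a column-stats dict to a compact JSON string (opaque to the DB)."""
    return json.dumps(stats, sort_keys=True, separators=(",", ":"))


def decode_stats(text, defaults=None):
    """Parse stored JSON; fields missing from older rows fall back to defaults."""
    merged = dict(defaults or {})
    merged.update(json.loads(text))
    return merged


# A row written before "numDistinct" existed (field names are hypothetical).
old_row = encode_stats({"min": 1, "max": 42, "numNulls": 0})

# A later release reads it and supplies a default for the new field.
stats = decode_stats(old_row, defaults={"numDistinct": None})
```

The cost noted above is visible here too: the DB only sees a string, so it 
cannot filter on "max" or any other individual field.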

 Add metastore upgrade scripts for column stats schema changes
 -

 Key: HIVE-3678
 URL: https://issues.apache.org/jira/browse/HIVE-3678
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Reporter: Shreepadma Venugopalan
Assignee: Shreepadma Venugopalan
 Fix For: 0.10.0

 Attachments: HIVE-3678.1.patch.txt


 Add upgrade script for column statistics schema changes for 
 Postgres/MySQL/Oracle/Derby



[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500499#comment-13500499
 ] 

Yin Huai commented on HIVE-2206:


[~cwsteinbach]
If the optimizer is enabled by default then, based on my last tests, only 
auto_join26.q is expected to fail, because its plan will be rewritten by the 
correlation optimizer. Apart from the changed query plan, the query result of 
auto_join26.q is correct. Also, once I finish HIVE-3671 (I am working on it 
right now), the failure of auto_join26.q should go away.

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.1.patch.txt, HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, 
 HIVE-2206.4.patch.txt, HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, 
 HIVE-2206.6.patch.txt, HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch


 This issue proposes a new logical optimizer called Correlation Optimizer, 
 which is used to merge correlated MapReduce jobs (MR jobs) into a single MR 
 job. The idea is based on YSmart (http://ysmart.cse.ohio-state.edu/). The 
 paper and slides of YSmart are linked at the bottom.
 Since Hive translates queries in a sentence-by-sentence fashion, it generates 
 a separate MapReduce job for every operation that may need to shuffle the 
 data (e.g. join and aggregation operations). However, such operations may 
 involve the correlations explained below, in which case they can be executed 
 in a single MR job.
 # Input Correlation: Multiple MR jobs have input correlation (IC) if their 
 input relation sets are not disjoint;
 # Transit Correlation: Multiple MR jobs have transit correlation (TC) if they 
 have not only input correlation, but also the same partition key;
 # Job Flow Correlation: An MR job has job flow correlation (JFC) with one of 
 its child nodes if it has the same partition key as that child node.
 The current implementation of the correlation optimizer only detects 
 correlations among MR jobs for reduce-side join operators and reduce-side 
 aggregation operators (not map-only aggregation). A query will be optimized 
 if it satisfies the following conditions.
 # There exists an MR job for a reduce-side join operator or reduce-side 
 aggregation operator which has JFC with all of its parent MR jobs (TCs will 
 also be exploited if JFC exists);
 # All input tables of those correlated MR jobs are original input tables (not 
 intermediate tables generated by sub-queries); and 
 # No self join is involved in those correlated MR jobs.
 The correlation optimizer is implemented as a logical optimizer. The main 
 reasons are that it only needs to manipulate the query plan tree and it can 
 leverage the existing component for generating MR jobs.
 The current implementation can serve as a framework for correlation-related 
 optimizations. I think that it is better than adding individual optimizers. 
 Several things can be done in the future to improve this optimizer. 
 Here are three examples.
 # Support queries that only involve TC;
 # Support queries in which the input tables of correlated MR jobs involve 
 intermediate tables; and 
 # Optimize queries involving self join. 
 References:
 Paper and presentation of YSmart.
 Paper: 
 http://www.cse.ohio-state.edu/hpcs/WWW/HTML/publications/papers/TR-11-7.pdf
 Slides: http://sdrv.ms/UpwJJc
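
 The three correlation definitions in the description above can be written as 
 simple predicates. This is only an illustrative sketch: the MRJob class, its 
 fields and the table names are hypothetical and do not correspond to classes 
 in the patch.

```python
from dataclasses import dataclass, field


@dataclass
class MRJob:
    name: str
    inputs: frozenset          # names of the input relations
    partition_key: tuple       # columns used to partition the map output
    children: list = field(default_factory=list)


def input_correlation(a, b):
    """IC: the input relation sets are not disjoint."""
    return bool(a.inputs & b.inputs)


def transit_correlation(a, b):
    """TC: input correlation plus the same partition key."""
    return input_correlation(a, b) and a.partition_key == b.partition_key


def job_flow_correlation(job, child):
    """JFC: the job has the same partition key as one of its child nodes."""
    return child in job.children and job.partition_key == child.partition_key


# Hypothetical plan: two aggregations over a shared table feed one join.
agg1 = MRJob("agg1", frozenset({"lineitem"}), ("l_orderkey",))
agg2 = MRJob("agg2", frozenset({"lineitem", "orders"}), ("l_orderkey",))
join = MRJob("join", frozenset({"agg1_out", "agg2_out"}), ("l_orderkey",),
             children=[agg1, agg2])
other = MRJob("other", frozenset({"part"}), ("p_partkey",))
```

 In this plan agg1 and agg2 have both IC and TC, and join has JFC with each of 
 them, so under the definitions above the three jobs are candidates for 
 merging; other shares neither inputs nor a partition key with them.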



[jira] [Updated] (HIVE-3718) Add check to determine whether partition can be dropped at Semantic Analysis time

2012-11-19 Thread Pamela Vagata (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pamela Vagata updated HIVE-3718:


Attachment: (was: HIVE-3718.1.patch.txt)

 Add check to determine whether partition can be dropped at Semantic Analysis 
 time
 -

 Key: HIVE-3718
 URL: https://issues.apache.org/jira/browse/HIVE-3718
 Project: Hive
  Issue Type: Task
  Components: CLI
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor
 Attachments: HIVE-3718.1.patch.txt






[jira] [Updated] (HIVE-3719) Improve HiveServer to support username/password authentication

2012-11-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-3719:
---

Assignee: Yu Gao

 Improve HiveServer to support username/password authentication
 --

 Key: HIVE-3719
 URL: https://issues.apache.org/jira/browse/HIVE-3719
 Project: Hive
  Issue Type: Improvement
  Components: Authentication, JDBC
Affects Versions: 0.9.0
Reporter: Yu Gao
Assignee: Yu Gao
  Labels: security

 The current HiveServer implementation (call it HiveServer1 to distinguish it 
 from HiveServer2, which is currently under development) has no authentication 
 mechanism for connecting clients, which means anyone can access it, e.g. 
 through the Hive JDBC driver, without any security control. The user and 
 password properties are simply ignored by the Hive JDBC driver and never 
 reach HiveServer1.
 It would be good to introduce authentication infrastructure to HiveServer1 
 and to extend the JDBC driver to support it, so that, together with the 
 existing authorization infrastructure, connections and operations from 
 applications that access HiveServer1 via the JDBC driver are under security 
 control.
 Although HiveServer2 has been under implementation for a while, this 
 improvement is needed to close a large security hole in HiveServer1, and it 
 would greatly benefit applications that use HiveServer1.



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-19 Thread Shreepadma Venugopalan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500526#comment-13500526
 ] 

Shreepadma Venugopalan commented on HIVE-3678:
--

With the changes from HIVE-3712, the column schema has *no* dependency on any 
specific DB: it uses simple data types that are supported across DBs. The 
primary motivation for the schema change in HIVE-3712 was to avoid storing 
column statistics fields as a BLOB. There are two problems with using a BLOB. 
a) BLOBs are designed to store large volumes of data, on the order of GBs, and 
are hence stored outside the row. A consequence of this design is that BLOBs 
don't perform well for storing small amounts of data. While some DBs such as 
Oracle inline small BLOBs, not all DBs do. BLOBs are the only practical choice 
for storing data whose size is not known in advance, but they are overkill for 
storing around 100 bytes of data. b) There is no uniform support across DB 
vendors and versions. Hence I don't really see the value in storing this as a 
JSON BLOB.




[jira] [Updated] (HIVE-3709) Stop storing default ConfVars in temp file

2012-11-19 Thread Carl Steinbach (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Steinbach updated HIVE-3709:
-

Status: Open  (was: Patch Available)

@Kevin: I still see errors in TestHiveServerSessions when I run the test 
individually:

% ant clean package test -Dtestcase=TestHiveServerSessions

test:
 [echo] Project: service
[junit] WARNING: multiple versions of ant detected in path for junit 
[junit]  jar:file:/Users/carl/.local/java/ant/lib/ant.jar!/org/apache/tools/ant/Project.class
[junit]  and jar:file:/Users/carl/Work/repos/hive-test/build/ivy/lib/hadoop0.20.shim/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
[junit] Running org.apache.hadoop.hive.service.TestHiveServerSessions
[junit] Hive history file=/Users/carl/Work/repos/hive-test/build/service/tmp/hive_job_log_carl_201211191056_789001489.txt
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 8.439 sec
[junit] Hive history file=/Users/carl/Work/repos/hive-test/build/service/tmp/hive_job_log_carl_201211191056_788616740.txt
[junit] [Fatal Error] :1:1: Content is not allowed in prolog.
[junit] [Fatal Error] :92:58: The element type name must be terminated by the matching end-tag /name.
[junit] Test org.apache.hadoop.hive.service.TestHiveServerSessions FAILED
  [for] /Users/carl/Work/repos/hive-test/service/build.xml: The following error occurred while executing this line:
  [for] /Users/carl/Work/repos/hive-test/build.xml:325: The following error occurred while executing this line:
  [for] /Users/carl/Work/repos/hive-test/build-common.xml:455: Tests failed!

BUILD FAILED
/Users/carl/Work/repos/hive-test/build.xml:320: Keepgoing execution: 1 of 12 iterations failed.


 Stop storing default ConfVars in temp file
 --

 Key: HIVE-3709
 URL: https://issues.apache.org/jira/browse/HIVE-3709
 Project: Hive
  Issue Type: Improvement
  Components: Configuration
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3709.1.patch.txt, HIVE-3709.2.patch.txt


 To work around issues with Hadoop's Configuration object, specifically its 
 addResource(InputStream), default configurations are written to a temp file 
 (I think HIVE-2362 introduced this).
 This, however, introduces the problem that once that file is deleted from 
 /tmp the client crashes.  This is particularly problematic for long-running 
 services like the metastore server.
 Writing a custom InputStream to deal with the problems in the Configuration 
 object should provide a workaround that does not introduce a time bomb 
 into Hive.
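
 The failure mode can be illustrated with a short Python analogy. This is not 
 Hadoop's Configuration API and not the approach taken in the attached patch; 
 the XML content and the key name hive.example.key are made up. It only 
 contrasts re-reading defaults from a temp file that has been deleted with 
 re-reading them from an in-memory stream.

```python
import io
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical default configuration, kept in memory for the life of the process.
DEFAULTS_XML = (
    b"<configuration><property>"
    b"<name>hive.example.key</name><value>1</value>"
    b"</property></configuration>"
)


def load_from_temp_file(path):
    """Re-read defaults from a file on disk; fails once the file is gone."""
    with open(path, "rb") as f:
        return ET.parse(f).getroot()


def load_from_memory(data=DEFAULTS_XML):
    """Re-read defaults from a fresh in-memory stream; no file is involved."""
    return ET.parse(io.BytesIO(data)).getroot()


# Write the defaults to a temp file, then delete it as a /tmp cleaner would.
fd, path = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(DEFAULTS_XML)
os.remove(path)

try:
    load_from_temp_file(path)
    temp_file_survived = True
except FileNotFoundError:
    temp_file_survived = False

root = load_from_memory()
```

 A long-running service that reloads its configuration would hit the first 
 case; the in-memory variant has nothing on disk to lose.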



[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500570#comment-13500570
 ] 

Carl Steinbach commented on HIVE-3678:
--

Sorry for the confusion. When I wrote blob I was trying to convey only that 
the field will be opaque to the DB (since it's a JSON struct), not that it will 
actually be stored in a BLOB column. If we store the JSON struct in a VARCHAR 
we have at least 4000 bytes to work with.
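
A quick way to sanity-check the size argument is to measure the encoded 
payload. The sketch below assumes the 4000-byte figure quoted above and uses 
made-up field names; actual VARCHAR limits depend on the DB and its 
configuration.

```python
import json

VARCHAR_LIMIT_BYTES = 4000  # figure from the comment above; real limits vary by DB


def fits_in_varchar(stats, limit=VARCHAR_LIMIT_BYTES):
    """Return True if the compact UTF-8 JSON encoding of stats fits in the column."""
    payload = json.dumps(stats, separators=(",", ":")).encode("utf-8")
    return len(payload) <= limit


# Hypothetical stats for one column.
small = {"min": 0, "max": 100, "avgLen": 12.5, "numTrues": 7, "numFalses": 3}
oversized = {"pad": "x" * 5000}
```

fits_in_varchar(small) holds comfortably, while the artificially padded value 
does not fit.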




Re: Review Request: HIVE-2206: add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7126/
---

(Updated Nov. 19, 2012, 7:51 p.m.)


Review request for hive.


Changes
---

The correlation optimizer now predicts which bottom-level join operators 
(those whose input tables are not intermediate tables) will be converted by 
auto join conversion, and ignores those join operators in its own optimization.


Description
---

This optimizer exploits intra-query correlations and merges multiple correlated 
MapReduce jobs into one job. I opened a new request since I have been working 
on hive-git.


This addresses bug HIVE-2206.
https://issues.apache.org/jira/browse/HIVE-2206


Diffs (updated)
-

  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 9fa9525 
  conf/hive-default.xml.template f332f3a 
  ql/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/ql/plan/api/OperatorType.java 7c4c413 
  ql/src/java/org/apache/hadoop/hive/ql/exec/BaseReduceSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationCompositeOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationLocalSimulativeReduceSinkOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/CorrelationReducerDispatchOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ExecReducer.java 18a9bd2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java 46daeb2 
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java 68302f8 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 0c22141 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ReduceSinkOperator.java 919a140 
  ql/src/java/org/apache/hadoop/hive/ql/exec/TableScanOperator.java 1469325 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizer.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/CorrelationOptimizerUtils.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java edde378 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/Optimizer.java d1555e2 
  ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 2bf284d 
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 330aa52 
  ql/src/java/org/apache/hadoop/hive/ql/plan/BaseReduceSinkDesc.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationCompositeDesc.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationLocalSimulativeReduceSinkDesc.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/CorrelationReducerDispatchDesc.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapredWork.java 5a9f064 
  ql/src/java/org/apache/hadoop/hive/ql/plan/ReduceSinkDesc.java b33d616 
  ql/src/java/org/apache/hadoop/hive/ql/plan/TableScanDesc.java 9a95efd 
  ql/src/test/org/apache/hadoop/hive/ql/exec/TestExecDriver.java 6f8bc47 
  ql/src/test/queries/clientpositive/correlationoptimizer1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/correlationoptimizer5.q PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/correlationoptimizer5.q.out PRE-CREATION 
  ql/src/test/results/compiler/plan/groupby1.q.xml cd0d6e4 
  ql/src/test/results/compiler/plan/groupby2.q.xml 7b07f02 
  ql/src/test/results/compiler/plan/groupby3.q.xml a6a1986 
  ql/src/test/results/compiler/plan/groupby5.q.xml 25e3583 

Diff: https://reviews.apache.org/r/7126/diff/


Testing
---

All tests pass.


Thanks,

Yin Huai



[jira] [Updated] (HIVE-3648) HiveMetaStoreFsImpl is not compatible with hadoop viewfs

2012-11-19 Thread Arup Malakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arup Malakar updated HIVE-3648:
---

Attachment: HIVE_3648_branch_0.patch
HIVE_3648_trunk_1.patch

Patch available for branch. Added one missing abstract method in 
HadoopShimsSecure class.

Updated trunk review: https://reviews.facebook.net/D6759
Branch review: https://reviews.facebook.net/D6801

Thanks,
Arup

 HiveMetaStoreFsImpl is not compatible with hadoop viewfs
 

 Key: HIVE-3648
 URL: https://issues.apache.org/jira/browse/HIVE-3648
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.9.0, 0.10.0
Reporter: Kihwal Lee
 Attachments: HIVE_3648_branch_0.patch, HIVE-3648-trunk-0.patch, 
 HIVE_3648_trunk_1.patch


 HiveMetaStoreFsImpl#deleteDir() method calls Trash#moveToTrash(). This may 
 not work when viewfs is used. It needs to call Trash#moveToAppropriateTrash() 
 instead.  Please note that this method is not available in hadoop versions 
 earlier than 0.23.



[jira] [Created] (HIVE-3721) ALTER TABLE ADD PARTS should check for valid partition spec and throw a SemanticException if part spec is not valid

2012-11-19 Thread Pamela Vagata (JIRA)
Pamela Vagata created HIVE-3721:
---

 Summary: ALTER TABLE ADD PARTS should check for valid partition 
spec and throw a SemanticException if part spec is not valid
 Key: HIVE-3721
 URL: https://issues.apache.org/jira/browse/HIVE-3721
 Project: Hive
  Issue Type: Task
Reporter: Pamela Vagata
Assignee: Pamela Vagata
Priority: Minor






[jira] [Updated] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-2206:
---

Attachment: HIVE-2206.19-r1410581.patch.txt

I just integrated HIVE-3671 into this patch. At the start, the correlation 
optimizer predicts whether a join operator will be converted by 
CommonJoinResolver; if so, it annotates that join operator and ignores it in 
the subsequent optimization. The prediction can only be made for join 
operators whose input tables are not intermediate tables. The prediction 
method is ported from CommonJoinResolver. Also, a test is added in 
correlationoptimizer1.q.

[~namit]
Please take a look at this patch. Let me know if you have any comment.
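
The predict-then-annotate step described above might be sketched as follows. 
Everything here is an assumption for illustration: the JoinOp class, the 
size-based rule and the threshold are hypothetical, and the actual criteria 
ported from CommonJoinResolver may differ.

```python
from dataclasses import dataclass


@dataclass
class JoinOp:
    name: str
    input_sizes: dict           # table name -> size in bytes
    intermediate_inputs: bool   # True if any input is an intermediate table
    skip_correlation: bool = False


def may_be_map_join(join, small_table_limit):
    """Hypothetical rule: every input except the largest fits under the limit."""
    if join.intermediate_inputs:
        return False            # prediction is only attempted on base tables
    sizes = sorted(join.input_sizes.values())
    return sum(sizes[:-1]) <= small_table_limit


def annotate(joins, small_table_limit):
    """Mark joins predicted to be converted; return the names left to optimize."""
    for j in joins:
        j.skip_correlation = may_be_map_join(j, small_table_limit)
    return [j.name for j in joins if not j.skip_correlation]


joins = [
    JoinOp("j1", {"big": 10**9, "dim": 10**3}, False),
    JoinOp("j2", {"a": 10**9, "b": 10**9}, False),
    JoinOp("j3", {"tmp": 10, "dim": 10}, True),
]
remaining = annotate(joins, small_table_limit=25_000_000)
```

In this example j1 is predicted to be converted and is skipped, while j2 (both 
inputs large) and j3 (intermediate inputs, so no prediction is made) remain 
visible to the correlation optimizer.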

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch





[jira] [Commented] (HIVE-3678) Add metastore upgrade scripts for column stats schema changes

2012-11-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500599#comment-13500599
 ] 

Ashutosh Chauhan commented on HIVE-3678:


I agree with Carl: making the schema easier to evolve, so that it's independent 
of the exact type, will be a win. We already have one such use case with 
BigDecimal support being added over on HIVE-2693. 

Also, the following looks like an unintentional change.
{code}
 -- Constraints for table PARTITION_KEYS
-ALTER TABLE PARTITION_KEYS ADD CONSTRAINT PARTITION_KEYS_FK1 FOREIGN KEY (TBL_ID) REFERENCES TBLS (TBL_ID) INITIALLY DEFERRED ;
+ALTER TABLE PARTITION_KEYS ADD CONSTRAINT PARTITION_KEYS_FK1 FOREIGN KEY (TBTB_ID) REFERENCES TBLS (TBL_ID) INITIALLY DEFERRED ;
{code}





[jira] [Commented] (HIVE-2206) add a new optimizer for query correlation discovery and optimization

2012-11-19 Thread Carl Steinbach (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13500626#comment-13500626
 ] 

Carl Steinbach commented on HIVE-2206:
--

I'm surprised that auto_join26 is the only test that fails due to different 
EXPLAIN output. Is that because this optimization doesn't affect the queries in 
most tests, or because we don't consistently call EXPLAIN in the tests?

What is preventing us from enabling this by default right now?

 add a new optimizer for query correlation discovery and optimization
 

 Key: HIVE-2206
 URL: https://issues.apache.org/jira/browse/HIVE-2206
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: He Yongqiang
Assignee: Yin Huai
 Attachments: HIVE-2206.10-r1384442.patch.txt, 
 HIVE-2206.11-r1385084.patch.txt, HIVE-2206.12-r1386996.patch.txt, 
 HIVE-2206.13-r1389072.patch.txt, HIVE-2206.14-r1389704.patch.txt, 
 HIVE-2206.15-r1392491.patch.txt, HIVE-2206.16-r1399936.patch.txt, 
 HIVE-2206.17-r1404933.patch.txt, HIVE-2206.18-r1407720.patch.txt, 
 HIVE-2206.19-r1410581.patch.txt, HIVE-2206.1.patch.txt, 
 HIVE-2206.2.patch.txt, HIVE-2206.3.patch.txt, HIVE-2206.4.patch.txt, 
 HIVE-2206.5-1.patch.txt, HIVE-2206.5.patch.txt, HIVE-2206.6.patch.txt, 
 HIVE-2206.7.patch.txt, HIVE-2206.8.r1224646.patch.txt, 
 HIVE-2206.8-r1237253.patch.txt, testQueries.2.q, YSmartPatchForHive.patch





Build failed in Jenkins: Hive-0.9.1-SNAPSHOT-h0.21 #203

2012-11-19 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hive-0.9.1-SNAPSHOT-h0.21/203/

--
[...truncated 36981 lines...]
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-19_12-44-29_760_9041461297608391868/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_402762564.txt
[junit] Copying file: 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] Copying data from 
file:/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt
[junit] Loading data to table default.testhivedrivertable
[junit] POSTHOOK: query: load data local inpath 
'/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/data/files/kv1.txt'
 into table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: select * from testhivedrivertable limit 10
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-19_12-44-33_658_2399849414089401271/-mr-1
[junit] POSTHOOK: query: select * from testhivedrivertable limit 10
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: 
file:/tmp/jenkins/hive_2012-11-19_12-44-33_658_2399849414089401271/-mr-1
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_1902789586.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (num int)
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: create table testhivedrivertable (num int)
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] PREHOOK: Input: default@testhivedrivertable
[junit] PREHOOK: Output: default@testhivedrivertable
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] POSTHOOK: Input: default@testhivedrivertable
[junit] POSTHOOK: Output: default@testhivedrivertable
[junit] OK
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_994263279.txt
[junit] Hive history 
file=/x1/jenkins/jenkins-slave/workspace/Hive-0.9.1-SNAPSHOT-h0.21/hive/build/service/tmp/hive_job_log_jenkins_201211191244_1983954224.txt
[junit] PREHOOK: query: drop table testhivedrivertable
[junit] PREHOOK: type: DROPTABLE
[junit] POSTHOOK: query: drop table testhivedrivertable
[junit] POSTHOOK: type: DROPTABLE
[junit] OK
[junit] PREHOOK: query: create table testhivedrivertable (key int, value 
string)
   

Re: hive 0.10 release

2012-11-19 Thread Ashutosh Chauhan
Another quick update. I have created a hive-0.10 branch. At this point,
HIVE-3678 is a blocker for the 0.10 release. There are a few other nice-to-have
issues, which I listed in my previous email. I will be happy to merge new
patches between now and the RC if folks request them and they are low risk.

Thanks,
Ashutosh
On Thu, Nov 15, 2012 at 2:29 PM, Ashutosh Chauhan hashut...@apache.org wrote:

 Good progress. Looks like folks are on board. I propose to cut the branch
 in the next couple of days. There are a few jiras which are patch ready which I
 want to get into the hive-0.10 release, including HIVE-3255, HIVE-2517,
 HIVE-3400, and HIVE-3678.
 Ed has already made a request for HIVE-3083. If folks have other patches
 they want to see in 0.10, please chime in.
 Also, a request to other committers to help review patches. There are
 quite a few in Patch Available state.

 Thanks,
 Ashutosh

 On Thu, Nov 8, 2012 at 3:22 PM, Owen O'Malley omal...@apache.org wrote:

  +1

  On Thu, Nov 8, 2012 at 3:18 PM, Carl Steinbach c...@cloudera.com wrote:

   +1

   On Wed, Nov 7, 2012 at 11:23 PM, Alexander Lorenz wget.n...@gmail.com wrote:

    +1, good karma

    On Nov 8, 2012, at 4:58 AM, Namit Jain nj...@fb.com wrote:

     +1 to the idea

     On 11/8/12 6:33 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

      That sounds good. I think this issue needs to be solved, as well as
      anything else that produces a bogus query result.

      https://issues.apache.org/jira/browse/HIVE-3083

      Edward

      On Wed, Nov 7, 2012 at 7:50 PM, Ashutosh Chauhan hashut...@apache.org wrote:

       Hi,

       It's been a while since we released 0.9, more than six months ago.
       All this while, a lot of action has happened, with various cool
       features landing in trunk. Additionally, I am looking forward to
       HiveServer2 landing in trunk. So, I propose that we cut the branch
       for 0.10 soon afterwards and then release it. Thoughts?

       Thanks,
       Ashutosh

    --
    Alexander Alten-Lorenz
    http://mapredit.blogspot.com
    German Hadoop LinkedIn Group: http://goo.gl/N8pCF


Re: hive 0.10 release

2012-11-19 Thread kulkarni.swar...@gmail.com
There are a couple of enhancements that I have been working on, mainly related
to the hive/hbase integration. It would be awesome if they could be included
in this release. None of them should really be high risk. I have submitted
patches for a few of them and will try to submit the others in the next
couple of days. Is there a specific deadline that I should be aiming for?

[1] https://issues.apache.org/jira/browse/HIVE-2599 (Patch Available)
[2] https://issues.apache.org/jira/browse/HIVE-3553 (Patch Available)
[3] https://issues.apache.org/jira/browse/HIVE-3211
[4] https://issues.apache.org/jira/browse/HIVE-3555
[5] https://issues.apache.org/jira/browse/HIVE-3725


-- 
Swarnim


Hive-trunk-h0.21 - Build # 1805 - Still Failing

2012-11-19 Thread Apache Jenkins Server
Changes for Build #1764
[kevinwilfong] HIVE-3610. Add a command Explain dependency ... (Sambavi 
Muthukrishnan via kevinwilfong)


Changes for Build #1765

Changes for Build #1766
[hashutosh] HIVE-3441 : testcases escape1,escape2 fail on windows (Thejas Nair 
via Ashutosh Chauhan)

[kevinwilfong] HIVE-3499. add tests to use bucketing metadata for partitions. 
(njain via kevinwilfong)


Changes for Build #1767
[kevinwilfong] HIVE-3276. optimize union sub-queries. (njain via kevinwilfong)


Changes for Build #1768

Changes for Build #1769

Changes for Build #1770
[namit] HIVE-3570 Add/fix facility to collect operator specific statistics in 
hive + add hash-in/hash-out
counter for GroupBy Optr (Satadru Pan via namit)

[namit] HIVE-3554 Hive List Bucketing - Query logic
(Gang Tim Liu via namit)

[cws] HIVE-3563. Drop database cascade fails when there are indexes on any 
tables (Prasad Mujumdar via cws)


Changes for Build #1771
[kevinwilfong] HIVE-3640. Reducer allocation is incorrect if enforce bucketing 
and mapred.reduce.tasks are both set. (Vighnesh Avadhani via kevinwilfong)


Changes for Build #1772

Changes for Build #1773

Changes for Build #1774

Changes for Build #1775
[namit] HIVE-3673 Sort merge join not used when join columns have different 
names
(Kevin Wilfong via namit)


Changes for Build #1776
[kevinwilfong] HIVE-3627. eclipse misses library: 
javolution-@javolution-version@.jar. (Gang Tim Liu via kevinwilfong)


Changes for Build #1777
[kevinwilfong] HIVE-3524. Storing certain Exception objects thrown in 
HiveMetaStore.java in MetaStoreEndFunctionContext. (Maheshwaran Srinivasan via 
kevinwilfong)

[cws] HIVE-1977. DESCRIBE TABLE syntax doesn't support specifying a database 
qualified table name (Zhenxiao Luo via cws)

[cws] HIVE-3674. Test case TestParse broken after recent checkin (Sambavi 
Muthukrishnan via cws)


Changes for Build #1778
[cws] HIVE-1362. Column level scalar valued statistics on Tables and Partitions 
(Shreepadma Venugopalan via cws)


Changes for Build #1779

Changes for Build #1780
[kevinwilfong] HIVE-3686. Fix compile errors introduced by the interaction of 
HIVE-1362 and HIVE-3524. (Shreepadma Venugopalan via kevinwilfong)


Changes for Build #1781
[namit] HIVE-3687 smb_mapjoin_13.q is nondeterministic
(Kevin Wilfong via namit)


Changes for Build #1782
[hashutosh] HIVE-2715: Upgrade Thrift dependency to 0.9.0 (Ashutosh Chauhan)


Changes for Build #1783
[kevinwilfong] HIVE-3654. block relative path access in hive. (njain via 
kevinwilfong)

[hashutosh] HIVE-3658 : Unable to generate the Hbase related unit tests using 
velocity templates on Windows (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3661 : Remove the Windows specific = related swizzle path 
changes from Proxy FileSystems (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3480 : Resource leak: Fix the file handle leaks in Symbolic 
 Symlink related input formats. (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1784
[kevinwilfong] HIVE-3675. NaN does not work correctly for round(n). (njain via 
kevinwilfong)

[cws] HIVE-3651. bucketmapjoin?.q tests fail with hadoop 0.23 (Prasad Mujumdar 
via cws)


Changes for Build #1785
[namit] HIVE-3613 Implement grouping_id function
(Ian Gorbachev via namit)

[namit] HIVE-3692 Update parallel test documentation
(Ivan Gorbachev via namit)

[namit] HIVE-3649 Hive List Bucketing - enhance DDL to specify list bucketing 
table
(Gang Tim Liu via namit)


Changes for Build #1786
[namit] HIVE-3696 Revert HIVE-3483 which causes performance regression
(Gang Tim Liu via namit)


Changes for Build #1787
[kevinwilfong] HIVE-3621. Make prompt in Hive CLI configurable. (Jingwei Lu via 
kevinwilfong)

[kevinwilfong] HIVE-3695. TestParse breaks due to HIVE-3675. (njain via 
kevinwilfong)


Changes for Build #1788
[kevinwilfong] HIVE-3557. Access to external URLs in hivetest.py. (Ivan 
Gorbachev via kevinwilfong)


Changes for Build #1789
[hashutosh] HIVE-3662 : TestHiveServer: testScratchDirShouldClearWhileStartup 
is failing on Windows (Kanna Karanam via Ashutosh Chauhan)

[hashutosh] HIVE-3659 : TestHiveHistory::testQueryloglocParentDirNotExist Test 
fails on Windows because of some resource leaks in ZK (Kanna Karanam via 
Ashutosh Chauhan)

[hashutosh] HIVE-3663 Unable to display the MR Job file path on Windows in case 
of MR job failures.  (Kanna Karanam via Ashutosh Chauhan)


Changes for Build #1790

Changes for Build #1791

Changes for Build #1792

Changes for Build #1793
[hashutosh] HIVE-3704 : name of some metastore scripts are not per convention 
(Ashutosh Chauhan)


Changes for Build #1794
[hashutosh] HIVE-3243 : ignore white space between entries of hive/hbase table 
mapping (Shengsheng Huang via Ashutosh Chauhan)

[hashutosh] HIVE-3215 : JobDebugger should use RunningJob.getTrackingURL 
(Bhushan Mandhani via Ashutosh Chauhan)


Changes for Build #1795
[cws] HIVE-3437. 0.23 compatibility: fix unit tests when building against 0.23 
(Chris Drome via cws)


[jira] [Commented] (HIVE-3722) Create index fails on CLI using remote metastore

2012-11-19 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500828#comment-13500828
 ] 

Namit Jain commented on HIVE-3722:
--

+1

 Create index fails on CLI using remote metastore
 

 Key: HIVE-3722
 URL: https://issues.apache.org/jira/browse/HIVE-3722
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Kevin Wilfong
 Attachments: HIVE-3722.1.patch.txt


 If the CLI uses a remote metastore and the user attempts to create an index 
 without a comment, it will fail with an NPE.



[jira] [Commented] (HIVE-3722) Create index fails on CLI using remote metastore

2012-11-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500834#comment-13500834
 ] 

Ashutosh Chauhan commented on HIVE-3722:


Kevin,
I am not sure if you have looked at the discussion on HIVE-2800. Adding a 
null-check may just be masking an underlying issue. I think it might be 
worthwhile to uncover it, since this Thrift nuisance (of null handling) may 
bite us again in the future.



[jira] [Commented] (HIVE-3722) Create index fails on CLI using remote metastore

2012-11-19 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500847#comment-13500847
 ] 

Kevin Wilfong commented on HIVE-3722:
-

Ashutosh, I missed that JIRA.  But based on THRIFT-1625 it sounds like we have 
to add a check to our code.



[jira] [Commented] (HIVE-3589) describe/show partition/show tblproperties command should accept database name

2012-11-19 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500857#comment-13500857
 ] 

Phabricator commented on HIVE-3589:
---

navis has commented on the revision HIVE-3589 [jira] describe/show 
partition/show tblproperties command should accept database name.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java:1802 fixed.
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:1407 I 
just split original method to two. Exception seemed for handling thrift errors 
and should be re-thrown to user.
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:1472 
agreed. I'll do it.
  ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java:1474 I 
always thought splitting with regex pattern for this kind of simple string is a 
bit too much. But if it's cleaner, I'll do it.
  ql/src/java/org/apache/hadoop/hive/ql/plan/DescTableDesc.java:38 ok.
  ql/src/java/org/apache/hadoop/hive/ql/plan/DescTableDesc.java:112 I'll check 
on that.
  ql/src/java/org/apache/hadoop/hive/ql/plan/ShowPartitionsDesc.java:64 ok.
  ql/src/java/org/apache/hadoop/hive/ql/plan/ShowTblPropertiesDesc.java:34 ok.
  ql/src/test/queries/clientpositive/describe_table.q:5 Yes, it was HIVE-3676. 
I'll add the test.

REVISION DETAIL
  https://reviews.facebook.net/D6075

BRANCH
  DPAL-1916

To: JIRA, cwsteinbach, navis


 describe/show partition/show tblproperties command should accept database name
 --

 Key: HIVE-3589
 URL: https://issues.apache.org/jira/browse/HIVE-3589
 Project: Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Affects Versions: 0.8.1
Reporter: Sujesh Chirackkal
Assignee: Navis
Priority: Minor
 Attachments: HIVE-3589.D6075.1.patch


 The describe command does not return the table details when called as 
 describe dbname.tablename.
 It throws the error "Table dbname not found".
 Ex: hive -e "describe masterdb.table1" throws the error
 "Table masterdb not found".
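
 The behaviour the issue asks for can be sketched as follows. This is only an 
 illustration in Python, not the code in the attached patch: the function 
 name split_qualified_name and the default_db parameter are hypothetical, and 
 the sketch deliberately ignores the fact that describe also accepts column 
 paths after the table name.

```python
def split_qualified_name(name, default_db="default"):
    """Split 'dbname.tablename' into (db, table).

    Hypothetical helper for illustration only. An unqualified name
    falls back to default_db instead of being treated as a database.
    """
    parts = name.split(".")
    if len(parts) == 1 and parts[0]:
        return default_db, parts[0]
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    raise ValueError(f"cannot parse table name: {name!r}")


if __name__ == "__main__":
    print(split_qualified_name("masterdb.table1"))
    print(split_qualified_name("table1"))
```

 With this split, "masterdb.table1" resolves to table table1 in database 
 masterdb rather than to a table named masterdb.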



[jira] [Updated] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type

2012-11-19 Thread Alexander Alten-Lorenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alten-Lorenz updated HIVE-3635:
-

Attachment: (was: HIVE-3635.patch)

  allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for 
 the boolean hive type
 ---

 Key: HIVE-3635
 URL: https://issues.apache.org/jira/browse/HIVE-3635
 Project: Hive
  Issue Type: Improvement
  Components: CLI
Affects Versions: 0.9.0
Reporter: Alexander Alten-Lorenz
Assignee: Alexander Alten-Lorenz
 Fix For: 0.10.0

 Attachments: HIVE-3635.patch


 interpret t as true and f as false for boolean types. PostgreSQL exports 
 represent it that way.



[jira] [Updated] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type

2012-11-19 Thread Alexander Alten-Lorenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alten-Lorenz updated HIVE-3635:
-

Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type

2012-11-19 Thread Alexander Alten-Lorenz (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alten-Lorenz updated HIVE-3635:
-

Attachment: HIVE-3635.patch



[jira] [Commented] (HIVE-3635) allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type

2012-11-19 Thread Alexander Alten-Lorenz (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500878#comment-13500878
 ] 

Alexander Alten-Lorenz commented on HIVE-3635:
--

Replaced available patch here with the newer one.



Re: Review Request: allow 't', 'T', '1', 'f', 'F', and '0' to be allowable true/false values for the boolean hive type

2012-11-19 Thread Alexander Alten-Lorenz

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7759/
---

(Updated Nov. 20, 2012, 7:11 a.m.)


Review request for hive.


Changes
---

indentation fixed


Description
---

interpret t as true and f as false for boolean types. PostgreSQL exports 
represent it that way
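
The mapping described above can be sketched roughly as below. This is a 
Python illustration of the idea, not the change made to LazyBoolean.java; the 
names parse_boolean, TRUE_LITERALS and FALSE_LITERALS are hypothetical, and 
returning None for unrecognised input (standing in for a NULL value) and 
matching case-insensitively are assumptions of the sketch.

```python
# Literals accepted as true/false. The single-character forms are the
# ones named in HIVE-3635; "true"/"false" are included for completeness.
TRUE_LITERALS = {"true", "t", "1"}
FALSE_LITERALS = {"false", "f", "0"}


def parse_boolean(text):
    """Return True or False for a recognised literal, else None.

    Matching is case-insensitive, so 't' and 'T' are treated alike.
    """
    if text is None:
        return None
    token = text.lower()
    if token in TRUE_LITERALS:
        return True
    if token in FALSE_LITERALS:
        return False
    return None


if __name__ == "__main__":
    for raw in ["t", "T", "1", "f", "F", "0", "yes"]:
        print(raw, parse_boolean(raw))
```

A PostgreSQL export that writes t and f for a boolean column would then load 
as true and false instead of as missing values.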


This addresses bug HIVE-3635.
https://issues.apache.org/jira/browse/HIVE-3635


Diffs (updated)
-

  serde/src/java/org/apache/hadoop/hive/serde2/lazy/LazyBoolean.java c741c3a 

Diff: https://reviews.apache.org/r/7759/diff/


Testing
---


Thanks,

Alexander Alten-Lorenz



[jira] [Updated] (HIVE-3073) Hive List Bucketing - DML support

2012-11-19 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3073:
---

Status: Patch Available  (was: Open)

Another patch. thanks

 Hive List Bucketing - DML support 
 --

 Key: HIVE-3073
 URL: https://issues.apache.org/jira/browse/HIVE-3073
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Affects Versions: 0.10.0
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3073.patch.12, HIVE-3073.patch.13, 
 HIVE-3073.patch.15


 If a Hive table column has skewed keys, query performance on non-skewed keys 
 is always impacted. The Hive List Bucketing feature will address it:
 https://cwiki.apache.org/Hive/listbucketing.html
 This jira issue will track DML change for the feature:
 1. single skewed column
 2. manual load data



[jira] [Updated] (HIVE-3073) Hive List Bucketing - DML support

2012-11-19 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3073:
---

Attachment: HIVE-3073.patch.15
