Review Request: HIVE-1948

2011-02-06 Thread Devaraj Das

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/398/
---

Review request for hive.


Summary
---

Adds audit logging to the metastore


This addresses bug HIVE-1948.
https://issues.apache.org/jira/browse/HIVE-1948


Diffs
-

  
http://svn.apache.org/repos/asf/hive/trunk/metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
 1067080 

Diff: https://reviews.apache.org/r/398/diff


Testing
---

Tested manually; verified that the audit log entries appear in the configured log file.


Thanks,

Devaraj



[jira] Commented: (HIVE-1948) Have audit logging in the Metastore

2011-02-06 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991189#comment-12991189
 ] 

Devaraj Das commented on HIVE-1948:
---

https://reviews.apache.org/r/398/ is the Review Board URL.

 Have audit logging in the Metastore
 ---

 Key: HIVE-1948
 URL: https://issues.apache.org/jira/browse/HIVE-1948
 Project: Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.7.0

 Attachments: audit-log.1.patch, audit-log.patch


 It would be good to have audit logging in the metastore, similar to Hadoop's 
 NameNode audit logging. This would allow administrators to dig into details 
 about which user performed metadata operations (like create/drop 
 tables/partitions) and from where (IP address).
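
For illustration, a minimal sketch of the kind of NameNode-style audit line
the description calls for. The ".audit" logger name, the log format, and the
logAuditEvent helper are assumptions for this sketch, not necessarily what
the HIVE-1948 patch does; a real metastore would take the user from the
Hadoop UGI and the IP from the client connection rather than from local
lookups:

import java.net.InetAddress;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical sketch, not the HIVE-1948 patch: one audit line per
// metadata operation, recording who ran it, from where, and what it was.
public class MetastoreAuditExample {

  // A dedicated ".audit" logger lets administrators route audit records
  // to their own file via an ordinary log4j appender configuration.
  private static final Log auditLog =
      LogFactory.getLog("org.apache.hadoop.hive.metastore.HiveMetaStore.audit");

  // In a real metastore this would be called from each Thrift handler
  // method (create_table, drop_partition, ...), with the user taken from
  // the Hadoop UGI and the IP from the client socket.
  static void logAuditEvent(String cmd) throws Exception {
    auditLog.info("ugi=" + System.getProperty("user.name")
        + "\tip=" + InetAddress.getLocalHost().getHostAddress()
        + "\tcmd=" + cmd);
  }

  public static void main(String[] args) throws Exception {
    logAuditEvent("create_table: default.t1");
    logAuditEvent("drop_table: default.t1");
  }
}

Routing the ".audit" logger to its own appender in log4j.properties is what
would make the audit records land in a dedicated, configurable log file.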

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Assigned: (HIVE-1900) a mapper should be able to span multiple partitions

2011-02-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain reassigned HIVE-1900:


Assignee: Namit Jain  (was: Krishna Kumar)

 a mapper should be able to span multiple partitions
 ---

 Key: HIVE-1900
 URL: https://issues.apache.org/jira/browse/HIVE-1900
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.1900.1.patch, hive.1900.2.patch


 Currently, a mapper only spans a single partition, which creates a problem
 in the presence of many small partitions (an increasingly common use case
 at Facebook). If the plan is the same, a mapper should be able to span
 files across multiple partitions.
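
As an illustration of the mechanism (not the HIVE-1900 patch itself),
Hadoop's CombineFileInputFormat already packs many small files, including
files from different partition directories, into one split served by a
single mapper. The class below is a hypothetical minimal wiring of it for
plain text input; the 128 MB cap is arbitrary:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.LineRecordReader;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.CombineFileInputFormat;
import org.apache.hadoop.mapred.lib.CombineFileRecordReader;
import org.apache.hadoop.mapred.lib.CombineFileSplit;

// Hypothetical sketch: pack files from many partitions into one split,
// so a single mapper can span partition directories.
public class MultiPartitionTextInputFormat
    extends CombineFileInputFormat<LongWritable, Text> {

  public MultiPartitionTextInputFormat() {
    // Cap the combined split so one mapper is not handed too much data.
    setMaxSplitSize(128L * 1024 * 1024);
  }

  @Override
  @SuppressWarnings("unchecked")
  public RecordReader<LongWritable, Text> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    // CombineFileRecordReader walks the chunks of the combined split,
    // handing each one to a fresh ChunkLineReader.
    return new CombineFileRecordReader<LongWritable, Text>(
        job, (CombineFileSplit) split, reporter, (Class) ChunkLineReader.class);
  }

  // Reads the idx-th file chunk of a combined split as plain text lines.
  public static class ChunkLineReader implements RecordReader<LongWritable, Text> {
    private final LineRecordReader reader;

    // This exact constructor signature is required by CombineFileRecordReader.
    public ChunkLineReader(CombineFileSplit split, Configuration conf,
        Reporter reporter, Integer idx) throws IOException {
      FileSplit part = new FileSplit(split.getPath(idx), split.getOffset(idx),
          split.getLength(idx), split.getLocations());
      reader = new LineRecordReader(conf, part);
    }

    public boolean next(LongWritable key, Text value) throws IOException {
      return reader.next(key, value);
    }
    public LongWritable createKey() { return reader.createKey(); }
    public Text createValue() { return reader.createValue(); }
    public long getPos() throws IOException { return reader.getPos(); }
    public float getProgress() throws IOException { return reader.getProgress(); }
    public void close() throws IOException { reader.close(); }
  }
}

Hive would additionally have to verify, per the description above, that the
combined partitions share the same plan before packing them into one split.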

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (HIVE-1961) Make Stats gathering more flexible with timeout and atomicity

2011-02-06 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-1961:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed. Thanks, Ning.

 Make Stats gathering more flexible with timeout and atomicity
 -

 Key: HIVE-1961
 URL: https://issues.apache.org/jira/browse/HIVE-1961
 Project: Hive
  Issue Type: Improvement
Reporter: Ning Zhang
Assignee: Ning Zhang
 Attachments: HIVE-1961.patch


 The JDBC-type stats publisher and aggregator use a hard-coded timeout
 value; we should make it configurable. We should also introduce a Hive
 variable to allow atomic stats gathering (metastore stats are updated only
 if all types of stats are made available).
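
As a sketch of the two knobs the description asks for, assuming the
property names hive.stats.jdbc.timeout and hive.stats.atomic (my reading of
the patch; the committed names may differ):

import org.apache.hadoop.hive.conf.HiveConf;

// Hedged sketch; the two property names below are assumptions, not
// confirmed against the committed HIVE-1961 patch.
public class StatsConfExample {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf();
    // Timeout (seconds) for the JDBC stats publisher/aggregator, instead
    // of the previously hard-coded value.
    conf.setInt("hive.stats.jdbc.timeout", 30);
    // Atomic mode: update metastore stats only if every kind of stat was
    // gathered successfully.
    conf.setBoolean("hive.stats.atomic", true);
    System.out.println(conf.get("hive.stats.jdbc.timeout"));
  }
}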

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




Build failed in Hudson: Hive-trunk-h0.20 #538

2011-02-06 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/538/changes

Changes:

[namit] HIVE-1961 Make Stats gathering more flexible with timeout and atomicity
(Ning Zhang via namit)

--
[...truncated 19174 lines...]
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt
[junit] Loading data to table srcbucket
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt'
 INTO TABLE srcbucket
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket
[junit] OK
[junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] PREHOOK: type: CREATETABLE
[junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) 
CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE
[junit] POSTHOOK: type: CREATETABLE
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt
[junit] Loading data to table srcbucket2
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt'
 INTO TABLE srcbucket2
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@srcbucket2
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt
[junit] Loading data to table src
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt'
 INTO TABLE src
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src
[junit] OK
[junit] PREHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] PREHOOK: type: LOAD
[junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt
[junit] Loading data to table src1
[junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt'
 INTO TABLE src1
[junit] POSTHOOK: type: LOAD
[junit] POSTHOOK: Output: default@src1

[jira] Updated: (HIVE-1803) Implement bitmap indexing in Hive

2011-02-06 Thread Marquis Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marquis Wang updated HIVE-1803:
---

Attachment: HIVE-1803.2.patch

We fixed the problem in BitmapCollectSet by looking at the PercentileApprox 
UDAF to figure out how to use an array as input to a UDAF.
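
For anyone hitting the same question: in a GenericUDAF, an array-typed
argument arrives wrapped in a ListObjectInspector. Below is a hypothetical,
self-contained evaluator (an element-counting aggregate, not the
BitmapCollectSet code) showing that pattern; the resolver/registration
boilerplate is omitted:

import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator;
import org.apache.hadoop.hive.serde2.objectinspector.ListObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.LongObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.apache.hadoop.io.LongWritable;

// Hypothetical evaluator: counts the total number of elements seen across
// all array arguments, illustrating the ListObjectInspector pattern.
public class GenericUDAFArrayElementCount extends GenericUDAFEvaluator {

  private ListObjectInspector inputOI;    // array argument (map side)
  private LongObjectInspector partialOI;  // partial counts (reduce side)

  @Override
  public ObjectInspector init(Mode m, ObjectInspector[] parameters)
      throws HiveException {
    super.init(m, parameters);
    if (m == Mode.PARTIAL1 || m == Mode.COMPLETE) {
      // The array-typed argument arrives as a ListObjectInspector.
      inputOI = (ListObjectInspector) parameters[0];
    } else {
      partialOI = (LongObjectInspector) parameters[0];
    }
    return PrimitiveObjectInspectorFactory.writableLongObjectInspector;
  }

  static class CountBuffer implements AggregationBuffer {
    long count;
  }

  @Override
  public AggregationBuffer getNewAggregationBuffer() throws HiveException {
    return new CountBuffer();
  }

  @Override
  public void reset(AggregationBuffer agg) throws HiveException {
    ((CountBuffer) agg).count = 0;
  }

  @Override
  public void iterate(AggregationBuffer agg, Object[] parameters)
      throws HiveException {
    Object array = parameters[0];
    if (array != null) {
      // getListLength()/getListElement() unpack the array argument;
      // here only its length is needed.
      ((CountBuffer) agg).count += inputOI.getListLength(array);
    }
  }

  @Override
  public Object terminatePartial(AggregationBuffer agg) throws HiveException {
    return new LongWritable(((CountBuffer) agg).count);
  }

  @Override
  public void merge(AggregationBuffer agg, Object partial) throws HiveException {
    if (partial != null) {
      ((CountBuffer) agg).count += partialOI.get(partial);
    }
  }

  @Override
  public Object terminate(AggregationBuffer agg) throws HiveException {
    return new LongWritable(((CountBuffer) agg).count);
  }
}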

This new patch is a working implementation of bitmap indexing. The new test, 
index_bitmap.q, shows how to use the index. However, I am unable to add the 
test itself; I get errors when I run 

ant test -Dtestcase=TestCliDriver -Dqfile=index_bitmap.q -Doverwrite=true 
-Dtest.silent=false

It says 


Exception: java.lang.RuntimeException: The table 
default__srcpart_srcpart_index_proj__ is an index table. Please do drop index 
instead.

with respect to the ALTER INDEX REBUILD line in the test.

We're not sure whether we're setting up the new test incorrectly, and we 
would appreciate any help.

While we work around that, we're also going to go ahead with a compressed 
bitmap, since this implementation does no compression.

 Implement bitmap indexing in Hive
 -

 Key: HIVE-1803
 URL: https://issues.apache.org/jira/browse/HIVE-1803
 Project: Hive
  Issue Type: New Feature
Reporter: Marquis Wang
Assignee: Marquis Wang
 Attachments: HIVE-1803.1.patch, HIVE-1803.2.patch, 
 bitmap_index_1.png, bitmap_index_2.png


 Implement bitmap index handler to complement compact indexing.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-1918) Add export/import facilities to the hive system

2011-02-06 Thread Paul Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991282#comment-12991282
 ] 

Paul Yang commented on HIVE-1918:
-

Looking at it as well.

 Add export/import facilities to the hive system
 ---

 Key: HIVE-1918
 URL: https://issues.apache.org/jira/browse/HIVE-1918
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Krishna Kumar
Assignee: Krishna Kumar
 Attachments: HIVE-1918.patch.1.txt, HIVE-1918.patch.2.txt, 
 HIVE-1918.patch.3.txt, HIVE-1918.patch.txt, hive-metastore-er.pdf


 This is an enhancement request to add export/import features to Hive.
 With this language extension, the user can export a table's data, which
 may live in multiple HDFS locations in the case of a partitioned table,
 together with its metadata, into a specified output location. That output
 location can then be moved to another Hadoop/Hive instance and imported
 there.
 This should work independently of the source and target metastore DBMSs
 used; for instance, between Derby and MySQL.
 For partitioned tables, exporting/importing a subset of the partitions
 must be supported.
 Howl will add more features on top of this: the ability to create and use
 the exported data even in the absence of Hive, using MR or Pig. Please see
 http://wiki.apache.org/pig/Howl/HowlImportExport for details.
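
A hedged sketch of how the proposed statements might be exercised through
Hive's embedded Driver. The EXPORT/IMPORT forms follow the Howl wiki page
above; the table name dept_emp and the paths are made up for illustration:

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

// Hedged sketch: drives the proposed statements through the embedded
// Driver; syntax may differ from the committed patch.
public class ExportImportExample {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf(SessionState.class);
    SessionState.start(conf);
    Driver driver = new Driver(conf);

    // Write the table's data and metadata to one output location...
    driver.run("EXPORT TABLE dept_emp TO '/tmp/exports/dept_emp'");

    // ...copy that location to the target cluster, then re-create the
    // table (metadata included) from it, regardless of metastore DBMS.
    driver.run("IMPORT FROM '/tmp/exports/dept_emp'");
  }
}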

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira