[jira] Commented: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983598#action_12983598 ] Mac Yang commented on HIVE-1862: Paul, thanks for the comment.
- I will add a check to catch the case where it's the last element in the path.
- Escaping the value does not work for the like operator. For example, the pattern p.*3 would match values like p13, but it would also match the value p1#, since escaping turns it into p1%23.
- I will leave get_partition_names_ps()/get_partitions_ps() as they are in trunk in the next patch.
Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt, HIVE-1862.2.patch.txt, invoke_runqry.sh, qry, qry-sch.Z, runqry HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and to discuss approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
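Mac's point about escaping and the like operator can be reproduced outside Hive. A minimal Python sketch (illustrative only, not metastore code) shows the pattern p.*3 correctly matching p13 and rejecting p1#, and then spuriously matching once p1# is percent-escaped to p1%23:

```python
# Illustration (plain Python, not metastore code) of why escaping partition
# values breaks LIKE-style pattern matching.
import re
from urllib.parse import quote

pattern = re.compile(r"p.*3$")  # the filter pattern "p.*3"

# Intended semantics: "p13" matches, "p1#" does not.
assert pattern.match("p13")
assert not pattern.match("p1#")

# But if the stored value is escaped, '#' becomes '%23' ...
escaped = quote("p1#")
assert escaped == "p1%23"

# ... and the escaped form ends in '3', so the pattern now matches it too.
assert pattern.match(escaped)
```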
[jira] Commented: (HIVE-1696) Add delegation token support to metastore
[ https://issues.apache.org/jira/browse/HIVE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983814#action_12983814 ] Devaraj Das commented on HIVE-1696: --- For the record, I'd like to mention that Pradeep Kamath did a lot of initial work on the patch. Thanks, Pradeep! Add delegation token support to metastore - Key: HIVE-1696 URL: https://issues.apache.org/jira/browse/HIVE-1696 Project: Hive Issue Type: Sub-task Components: Metastore, Security, Server Infrastructure Reporter: Todd Lipcon Assignee: Devaraj Das Fix For: 0.7.0 Attachments: hive-1696-1-with-gen-code.patch, hive-1696-1.patch, hive-1696-3-with-gen-code.patch, hive-1696-3.patch, hive-1696-4-with-gen-code.1.patch, hive-1696-4-with-gen-code.patch, hive-1696-4.patch, hive-1696-4.patch, hive_1696.patch, hive_1696.patch, hive_1696_no-thrift.patch As discussed in HIVE-842, kerberos authentication is only sufficient for authentication of a hive user client to the metastore. There are other cases where thrift calls need to be authenticated when the caller is running in an environment without kerberos credentials. For example, an MR task running as part of a hive job may want to report statistics to the metastore, or a job may be running within the context of Oozie or Hive Server. This JIRA is to implement support of delegation tokens for the metastore. The concept of a delegation token is borrowed from the Hadoop security design - the quick summary is that a kerberos-authenticated client may retrieve a binary token from the server. This token can then be passed to other clients which can use it to achieve authentication as the original user in lieu of a kerberos ticket. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
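The delegation-token flow described above can be sketched in a few lines. This is a toy (an HMAC over a user identifier), not the actual Hadoop/Hive token format or API — the real design adds renewal, expiry, and per-token sequence numbers — but it shows the core idea: a Kerberos-authenticated client obtains a token from the server, hands it to a process without credentials, and the server later verifies it by recomputing the MAC.

```python
# Toy sketch of the delegation-token idea (illustrative only; the real
# Hadoop/Hive token format, renewal, and expiry logic are more involved).
import hashlib, hmac, os

SERVER_KEY = os.urandom(32)  # secret held only by the metastore server

def issue_token(user: str):
    """Issued after the client has already authenticated via Kerberos."""
    ident = user.encode()
    mac = hmac.new(SERVER_KEY, ident, hashlib.sha256).digest()
    return ident, mac  # opaque pair the client can hand to other processes

def verify_token(ident: bytes, mac: bytes) -> str:
    """Lets a caller without Kerberos credentials act as the original user."""
    expected = hmac.new(SERVER_KEY, ident, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        raise PermissionError("invalid delegation token")
    return ident.decode()

# A Kerberos-authenticated client fetches a token and passes it to an MR task:
token = issue_token("alice")
assert verify_token(*token) == "alice"
```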
[jira] Assigned: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang reassigned HIVE-1862: --- Assignee: Mac Yang Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Assignee: Mac Yang Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt, HIVE-1862.2.patch.txt, invoke_runqry.sh, qry, qry-sch.Z, runqry HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and to discuss approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
how to get handle to the output of Hive query and know the control flow of the program
hi, Can someone please let me know how I can find out the control flow in Hive of a query that is executed? I want to know the modules called so that I can modify them for my purpose. Ultimately, I want a handle to the output of the Hive query so that it can be stored in a text file rather than just displayed. I am unable to figure it out by looking at the code. Please help. regards, Abhinav Narain
Re: Storage Handler using JDBC
Hi, Is there any feedback on this question? Thanks, Vijay On Jan 15, 2011, at 12:36 PM, Vijay tec...@gmail.com wrote: The storage handler mechanism seems like an excellent way to support mixing hive with a traditional database using a generic JDBC storage handler. While that may not always be the best thing to do, is there any work targeted at this integration? Are there any issues or problems preventing such an integration? Thanks, Vijay
subscription to hive dev list
hi, I am a student working on Hive. Please grant me access to the mailing list. regards, Abhinav narain
[jira] Updated: (HIVE-1920) DESCRIBE with comments is difficult to read
[ https://issues.apache.org/jira/browse/HIVE-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1920: Component/s: CLI DESCRIBE with comments is difficult to read --- Key: HIVE-1920 URL: https://issues.apache.org/jira/browse/HIVE-1920 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.7.0 Reporter: Paul Yang Assignee: Paul Yang When DESCRIBE is run, comments for columns are displayed next to the column type. A problem with this is that if the comment contains line breaks, it is difficult to differentiate the columns from the comments and is difficult to read. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1920) DESCRIBE with comments is difficult to read
DESCRIBE with comments is difficult to read --- Key: HIVE-1920 URL: https://issues.apache.org/jira/browse/HIVE-1920 Project: Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Paul Yang Assignee: Paul Yang When DESCRIBE is run, comments for columns are displayed next to the column type. A problem with this is that if the comment contains line breaks, it is difficult to differentiate the columns from the comments and is difficult to read. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
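The formatting problem in HIVE-1920 can be demonstrated with a toy formatter (hypothetical helper functions, not Hive's DESCRIBE implementation): a comment containing line breaks spills across rows, while escaping the embedded newlines keeps one line per column.

```python
# Toy illustration of the DESCRIBE readability problem (hypothetical
# formatter functions; not Hive's actual DESCRIBE code).
cols = [
    ("key", "int", "the join key"),
    ("value", "string", "raw value;\nmay contain\nline breaks"),
]

def describe_naive(cols):
    # One tab-separated row per column -- embedded newlines break the layout.
    return "\n".join(f"{name}\t{typ}\t{comment}" for name, typ, comment in cols)

def describe_readable(cols):
    # One possible fix: escape embedded newlines so each column stays on one line.
    rows = []
    for name, typ, comment in cols:
        flat = comment.replace("\n", "\\n")
        rows.append(f"{name}\t{typ}\t{flat}")
    return "\n".join(rows)

print(describe_naive(cols))     # two columns spill over four output lines
print(describe_readable(cols))  # one output line per column
```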
[jira] Updated: (HIVE-1918) Add export/import facilities to the hive system
[ https://issues.apache.org/jira/browse/HIVE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-1918: - Status: Open (was: Patch Available) Add export/import facilities to the hive system --- Key: HIVE-1918 URL: https://issues.apache.org/jira/browse/HIVE-1918 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Krishna Kumar Attachments: HIVE-1918.patch.txt This is an enhancement request to add export/import features to hive. With this language extension, the user can export the data of the table - which may be located in different hdfs locations in case of a partitioned table - as well as the metadata of the table into a specified output location. This output location can then be moved over to another hadoop/hive instance and imported there. This should work independently of the source and target metastore DBMS used; for instance, between derby and mysql. For partitioned tables, the ability to export/import a subset of the partitions must be supported. Howl will add more features on top of this: the ability to create/use the exported data even in the absence of hive, using MR or Pig. Please see http://wiki.apache.org/pig/Howl/HowlImportExport for these details. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
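One way to picture the proposed export layout (the file names and JSON structure below are invented for illustration; the real design is on the Howl wiki page linked in the issue): write the table metadata in a DBMS-neutral form next to a copy of the data gathered from the scattered partition locations, so the target metastore can rebuild the table regardless of which database backs it. A local directory stands in for HDFS here.

```python
# Hypothetical sketch of an export layout for HIVE-1918; names such as
# _metadata.json are invented for illustration. A local directory stands
# in for HDFS.
import json, pathlib, shutil, tempfile

def export_table(name, schema, partition_dirs, out_dir):
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Metadata travels as plain JSON, so the import side does not need to
    # share the source metastore DBMS (derby vs. mysql, etc.).
    (out / "_metadata.json").write_text(json.dumps(
        {"table": name, "schema": schema, "partitions": sorted(partition_dirs)}))
    # Data may live in several per-partition locations; gather a copy of each.
    for p in partition_dirs:
        shutil.copytree(p, out / "data" / pathlib.Path(p).name)
    return out

# Round-trip smoke test on a temporary directory:
tmp = pathlib.Path(tempfile.mkdtemp())
part = tmp / "ds=2011-01-19"
part.mkdir()
(part / "000000_0").write_text("238\tval_238\n")
exported = export_table("src_part", {"key": "int", "value": "string"},
                        [str(part)], tmp / "export")
assert (exported / "data" / "ds=2011-01-19" / "000000_0").exists()
```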
[jira] Updated: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mac Yang updated HIVE-1862: --- Attachment: HIVE-1862.3.patch.txt Incorporated Paul's feedback Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Assignee: Mac Yang Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt, HIVE-1862.2.patch.txt, HIVE-1862.3.patch.txt, invoke_runqry.sh, qry, qry-sch.Z, runqry HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and to discuss approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mac Yang updated HIVE-1862: --- Status: Patch Available (was: Open) Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Assignee: Mac Yang Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt, HIVE-1862.2.patch.txt, HIVE-1862.3.patch.txt, invoke_runqry.sh, qry, qry-sch.Z, runqry HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and to discuss approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Hive-trunk-h0.20 #495
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/495/ -- [...truncated 21356 lines...] [junit] POSTHOOK: Output: default@srcbucket [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt [junit] Loading data to table srcbucket [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket1.txt' INTO TABLE srcbucket [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket [junit] OK [junit] PREHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE [junit] PREHOOK: type: CREATETABLE [junit] POSTHOOK: query: CREATE TABLE srcbucket2(key int, value string) CLUSTERED BY (key) INTO 4 BUCKETS STORED AS TEXTFILE [junit] POSTHOOK: type: CREATETABLE [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket20.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD [junit] Copying data from 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket21.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket22.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt [junit] Loading data to table srcbucket2 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/srcbucket23.txt' INTO TABLE srcbucket2 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@srcbucket2 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt [junit] Loading data to table src [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 
'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv1.txt' INTO TABLE src [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] PREHOOK: type: LOAD [junit] Copying data from https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt [junit] Loading data to table src1 [junit] POSTHOOK: query: LOAD DATA LOCAL INPATH 'https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/data/files/kv3.txt' INTO TABLE src1 [junit] POSTHOOK: type: LOAD [junit] POSTHOOK: Output: default@src1 [junit] OK [junit] PREHOOK: query: LOAD DATA LOCAL INPATH
[jira] Commented: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12983908#action_12983908 ] Paul Yang commented on HIVE-1862: - +1 looks good. Will commit if tests pass. Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Assignee: Mac Yang Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt, HIVE-1862.2.patch.txt, HIVE-1862.3.patch.txt, invoke_runqry.sh, qry, qry-sch.Z, runqry HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and to discuss approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1915) authorization on database level is broken.
[ https://issues.apache.org/jira/browse/HIVE-1915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-1915: - Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed. Thanks Yongqiang! authorization on database level is broken. -- Key: HIVE-1915 URL: https://issues.apache.org/jira/browse/HIVE-1915 Project: Hive Issue Type: Bug Components: Metastore, Security Reporter: He Yongqiang Assignee: He Yongqiang Fix For: 0.7.0 Attachments: HIVE-1915-2.patch, HIVE-1915-3.patch, HIVE-1915.1.patch
CREATE DATABASE IF NOT EXISTS test_db COMMENT 'Hive test database';
SHOW DATABASES;
grant `drop` on DATABASE test_db to user hive_test_user;
grant `select` on DATABASE test_db to user hive_test_user;
show grant user hive_test_user on DATABASE test_db;
DROP DATABASE IF EXISTS test_db;
will fail. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Hive storage handler using JDBC
Hi, The storage handler mechanism seems like an excellent way to support mixing hive with a traditional database using a generic JDBC storage handler. While that may not always be the best thing to do, is there any work targeted at this integration? Are there any issues or problems preventing such an integration? Any ideas/suggestions for implementation are also welcome! P.S. I think I've been posting this to a wrong alias and never saw a response. Sorry if you've already seen it. Thanks, Vijay
[jira] Created: (HIVE-1921) Better error message when a non-essential job fails
Better error message when a non-essential job fails --- Key: HIVE-1921 URL: https://issues.apache.org/jira/browse/HIVE-1921 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.7.0 Reporter: Paul Yang Assignee: Paul Yang Priority: Minor To determine whether a join can be converted into a map-join, a task is launched to determine memory requirements. If the task fails, then a normal join must be performed. This is not an error, but the user sees a message like:
{code}
...
2011-01-19 02:48:51 Processing rows:180 Hashtable size: 179 Memory usage: 818546352 rate: 0.789
2011-01-19 02:48:57 Processing rows:190 Hashtable size: 189 Memory usage: 861746352 rate: 0.831
2011-01-19 02:49:05 Processing rows:200 Hashtable size: 199 Memory usage: 904921384 rate: 0.873
2011-01-19 02:49:12 Processing rows:210 Hashtable size: 209 Memory usage: 952382416 rate: 0.918
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
Launching Job 2 out of 2
...
{code}
The wording makes it seem as if something went wrong, which is not necessarily the case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
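The control flow behind that message is a primary task with a backup plan. A sketch (hypothetical function names, not Hive's actual Task classes) shows where a friendlier message would go:

```python
# Sketch of the primary/backup control flow behind the message above
# (hypothetical function names; not Hive's actual Task classes).
def local_map_join():
    # Stand-in for MapredLocalTask: hash table exceeds the memory limit.
    raise MemoryError("hash table exceeded the configured limit")

def common_join():
    # Stand-in for the MapRedTask backup: an ordinary reduce-side join.
    return "join-done"

def run_with_backup(primary, backup, log):
    try:
        return primary()
    except MemoryError:
        # This fallback is expected behavior, so the message should not be
        # worded as a failure the way "FAILED: Execution Error ..." is.
        log.append("Map-join not feasible; falling back to the common join.")
        return backup()

messages = []
assert run_with_backup(local_map_join, common_join, messages) == "join-done"
assert "falling back" in messages[0]
```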
Build failed in Hudson: Hive-trunk-h0.20 #497
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/497/changes Changes: [cws] HIVE-1915 Authorization on database level is broken (He Yongqiang via cws) -- [...truncated 7276 lines...] compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds deploy-ant-tasks: create-dirs: init: compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds jar: init: install-hadoopcore: install-hadoopcore-default: ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ivy/lib/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-retrieve-hadoop-source: [ivy:retrieve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ :: [ivy:retrieve] :: loading settings :: file = https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ivy/ivysettings.xml [ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#contrib;working@minerva [ivy:retrieve] confs: [default] [ivy:retrieve] found hadoop#core;0.20.0 in hadoop-source [ivy:retrieve] :: resolution report :: resolve 1183ms :: artifacts dl 1ms - | |modules|| artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| - | default | 1 | 0 | 0 | 0 || 1 | 0 | - [ivy:retrieve] :: retrieving :: org.apache.hadoop.hive#contrib [ivy:retrieve] confs: [default] [ivy:retrieve] 0 artifacts copied, 1 already retrieved (0kB/2ms) install-hadoopcore-internal: setup: compile: [echo] Compiling: hbase-handler [javac] 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build-common.xml:283: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds jar: [echo] Jar: hbase-handler test: test-shims: test-conditions: gen-test: create-dirs: compile-ant-tasks: create-dirs: init: compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds deploy-ant-tasks: create-dirs: init: compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds jar: init: compile: ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ivy/lib/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-retrieve-hadoop-source: [ivy:retrieve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ :: [ivy:retrieve] :: loading settings :: file = https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ivy/ivysettings.xml [ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#shims;working@minerva [ivy:retrieve] confs: [default] [ivy:retrieve] found hadoop#core;0.20.0 in hadoop-source [ivy:retrieve] found hadoop#core;0.20.3-CDH3-SNAPSHOT in hadoop-source [ivy:retrieve] :: resolution report :: resolve 2471ms :: artifacts dl 2ms - | |modules|| artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| - | default | 2 | 0 | 0 | 0 || 2 | 0 | - [ivy:retrieve] :: retrieving :: org.apache.hadoop.hive#shims [ivy:retrieve] confs: [default] [ivy:retrieve] 0 artifacts copied, 2 already retrieved 
(0kB/2ms) install-hadoopcore-internal: build_shims: [echo] Compiling shims against hadoop 0.20.0 (https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/hadoopcore/hadoop-0.20.0)
Re: Hive storage handler using JDBC
Hi Vijay, There's a JIRA ticket open for this feature here: https://issues.apache.org/jira/browse/HIVE-1555 Ed Capriolo recently implemented a Hive storage handler for Cassandra, and may be able to give you some more pointers. Thanks. Carl On Wed, Jan 19, 2011 at 3:04 PM, Vijay tec...@gmail.com wrote: Hi, The storage handler mechanism seems like an excellent way to support mixing hive with a traditional database using a generic JDBC storage handler. While that may not always be the best thing to do, is there any work targeted at this integration? Are there any issues or problems preventing such an integration? Any ideas/suggestions for implementation are also welcome! P.S. I think I've been posting this to a wrong alias and never saw a response. Sorry if you've already seen it. Thanks, Vijay
Re: subscription to hive dev list
Hi Abhinav, Please see this link for information about subscribing to the various Hive mailing lists: http://hive.apache.org/mailing_lists.html Thanks. Carl On Wed, Jan 19, 2011 at 3:56 AM, abhinav narain abhinavnarai...@gmail.com wrote: hi, I am student working on Hive. Please grant me access to the mailing list regards, Abhinav narain
[jira] Assigned: (HIVE-1900) a mapper should be able to span multiple partitions
[ https://issues.apache.org/jira/browse/HIVE-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain reassigned HIVE-1900: Assignee: Namit Jain (was: He Yongqiang) a mapper should be able to span multiple partitions --- Key: HIVE-1900 URL: https://issues.apache.org/jira/browse/HIVE-1900 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain Currently, a mapper only spans a single partition, which creates a problem in the presence of many small partitions (which is becoming a common use case at Facebook). If the plan is the same, a mapper should be able to span files across multiple partitions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
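The idea in HIVE-1900 can be pictured as packing files from many small partitions into a few shared splits. This is an illustrative first-fit sketch only, not Hive's eventual implementation (which would live in split generation, along the lines of a combine input format):

```python
# Illustrative first-fit packing of small partition files into shared
# mapper splits (a sketch of the idea only; not Hive's implementation).
def pack_splits(files, max_split_bytes):
    """files: list of (path, size_in_bytes). Returns a list of splits,
    each split being a list of paths handled by one mapper."""
    splits, sizes = [], []
    # Largest files first, then drop each file into the first split with room.
    for path, size in sorted(files, key=lambda f: -f[1]):
        for i, used in enumerate(sizes):
            if used + size <= max_split_bytes:
                splits[i].append(path)
                sizes[i] += size
                break
        else:
            splits.append([path])
            sizes.append(size)
    return splits

# Four tiny partitions collapse into two splits: two mappers instead of four.
files = [("ds=01/f", 40), ("ds=02/f", 30), ("ds=03/f", 30), ("ds=04/f", 20)]
splits = pack_splits(files, max_split_bytes=64)
assert len(splits) == 2
```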
Re: patch review process
+1 for option 2. In general, we as a community should be nice to all contributors, and should avoid doing things that make contributors uncomfortable, even if that requires some work from committers. This is especially true for new contributors; we need to be more patient with new people. It seems a free-style, contribution-focused environment would do more to encourage people to make contributions of different kinds. thanks -yongqiang On Wed, Jan 19, 2011 at 6:37 PM, Namit Jain nj...@fb.com wrote: It would be good to have a policy for submitting a new patch for review. If the patch is small, usually it is pretty easy to review. But, if it is large, a GUI like reviewboard (https://reviews.apache.org) makes it easy. So, going forward, I would like to propose either of the following. 1. All patches must go through reviewboard 2. If a contributor/reviewer creates a reviewboard request, all subsequent review requests should go through the reviewboard. I would personally vote for 2., since for small patches, we don’t really need a reviewboard. But, please vote, and based on that, we can come up with a policy. Let us know if you think of some other option. Thanks, -namit
[jira] Updated: (HIVE-1862) Revive partition filtering in the Hive MetaStore
[ https://issues.apache.org/jira/browse/HIVE-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Yang updated HIVE-1862: Resolution: Fixed Status: Resolved (was: Patch Available) Committed. Thanks Mac! Cool use of string manipulation. Hopefully, we'll find a workaround for those escaped partition names soon. Revive partition filtering in the Hive MetaStore Key: HIVE-1862 URL: https://issues.apache.org/jira/browse/HIVE-1862 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.7.0 Reporter: Devaraj Das Assignee: Mac Yang Fix For: 0.7.0 Attachments: HIVE-1862.1.patch.txt, HIVE-1862.2.patch.txt, HIVE-1862.3.patch.txt, invoke_runqry.sh, qry, qry-sch.Z, runqry HIVE-1853 downgraded the JDO version. This makes the feature of partition filtering in the metastore unusable. This jira is to keep track of the lost feature and to discuss approaches to bring it back. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Hive-trunk-h0.20 #498
See https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/498/changes Changes: [pauly] HIVE-1862 Revive partition filtering in the Hive MetaStore (Mac Yang via pauly) -- [...truncated 4238 lines...] jar: [echo] Jar: shims [jar] Building jar: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/shims/hive-shims-0.7.0-SNAPSHOT.jar create-dirs: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/common [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/common/classes [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/common/test [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/common/test/src [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/common/test/classes compile-ant-tasks: create-dirs: init: compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds deploy-ant-tasks: create-dirs: init: compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds jar: init: install-hadoopcore: install-hadoopcore-default: ivy-init-dirs: ivy-download: [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar [get] To: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/ivy/lib/ivy-2.1.0.jar [get] Not modified - so not downloaded ivy-probe-antlib: ivy-init-antlib: ivy-init: ivy-retrieve-hadoop-source: [ivy:retrieve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ :: [ivy:retrieve] :: loading settings :: file = 
https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ivy/ivysettings.xml [ivy:retrieve] :: resolving dependencies :: org.apache.hadoop.hive#common;working@minerva [ivy:retrieve] confs: [default] [ivy:retrieve] found hadoop#core;0.20.0 in hadoop-source [ivy:retrieve] :: resolution report :: resolve 1315ms :: artifacts dl 1ms - | |modules|| artifacts | | conf | number| search|dwnlded|evicted|| number|dwnlded| - | default | 1 | 0 | 0 | 0 || 1 | 0 | - [ivy:retrieve] :: retrieving :: org.apache.hadoop.hive#common [ivy:retrieve] confs: [default] [ivy:retrieve] 0 artifacts copied, 1 already retrieved (0kB/3ms) install-hadoopcore-internal: setup: compile: [echo] Compiling: common [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build-common.xml:283: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds [javac] Compiling 5 source files to https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/common/classes [javac] Note: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java uses or overrides a deprecated API. [javac] Note: Recompile with -Xlint:deprecation for details. 
jar: [echo] Jar: common [jar] Building jar: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/common/hive-common-0.7.0-SNAPSHOT.jar create-dirs: [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/serde [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/serde/classes [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/serde/test [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/serde/test/src [mkdir] Created dir: https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/build/serde/test/classes compile-ant-tasks: create-dirs: init: compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable builds deploy-ant-tasks: create-dirs: init: compile: [echo] Compiling: anttasks [javac] https://hudson.apache.org/hudson/job/Hive-trunk-h0.20/ws/hive/ant/build.xml:40: warning: 'includeantruntime' was not set, defaulting to build.sysclasspath=last; set to false for repeatable
Re: patch review process
The system that we have in place right now places all of the burden on the reviewer. If you want to look at a patch you have to download it, apply it to a clean workspace, view it using the diff viewer of your choice, and then copy your comments back to JIRA along with line numbers and code fragments in order to provide context for the author. If there's more than one reviewer, then everyone repeats these steps individually. From this perspective I think using ReviewBoard is a clear win. It eliminates the setup steps that are currently incumbent on the reviewer and consequently encourages more people to participate in the review process, which I think will result in higher quality code in the end. I think that the additional burden that ReviewBoard places on the contributor is very small (especially when compared to the effort invested in producing the patch in the first place) and can be mitigated by using tools like post-review ( http://www.reviewboard.org/docs/manual/dev/users/tools/post-review/). I'm +1 for option (1), meaning that I think people should be required to post a review request (or update an existing request) for every patch that they submit for review on JIRA. I also think excluding small patches from this requirement is a bad idea because rational people can disagree about what qualifies as a small patch and what does not, and I'd like people to make ReviewBoard a habit instead of something that they use occasionally. I think that Yongqiang's point about scaring away new contributors with lots of requirements is valid, and I'm more than willing to post a review request for a first (or second) time contributor, but in general it's important for the contributor to create the request since only the creator can update it. Thanks. Carl On Wed, Jan 19, 2011 at 6:48 PM, yongqiang he heyongqiang...@gmail.com wrote: +1 for option 2.
In general, we as a community should be nice to all contributors, and should avoid doing things that make contributors uncomfortable, even if that requires some work from committers. This is especially true for new contributors; we need to be more patient with new people. It seems a free-style, contribution-focused environment would be better for encouraging people to make more contributions of different kinds.

thanks
-yongqiang

On Wed, Jan 19, 2011 at 6:37 PM, Namit Jain nj...@fb.com wrote:

It would be good to have a policy for submitting a new patch for review. If the patch is small, usually it is pretty easy to review. But if it is large, a GUI like ReviewBoard (https://reviews.apache.org) makes it easy.

So, going forward, I would like to propose either of the following:

1. All patches must go through ReviewBoard.
2. If a contributor/reviewer creates a ReviewBoard request, all subsequent review requests should go through ReviewBoard.

I would personally vote for 2., since for small patches we don't really need ReviewBoard. But please vote, and based on that we can come up with a policy. Let us know if you think of some other option.

Thanks,
-namit
Re: Review Request: HIVE-1636: Implement SHOW TABLES {FROM | IN} db_name
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/323/
---

(Updated 2011-01-19 21:06:38.069779)

Review request for hive.

Changes
---

Adding some tests to show_tables.q and database.q

Summary
---

Review request for HIVE-1636. This implements the syntax SHOW TABLES [{FROM | IN} db_name] [table_pattern].

This addresses bug HIVE-1636.
https://issues.apache.org/jira/browse/HIVE-1636

Diffs (updated)
---

ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java 32c6e72
ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java df7e0f9
ql/src/java/org/apache/hadoop/hive/ql/parse/Hive.g 128f3a6
ql/src/java/org/apache/hadoop/hive/ql/plan/ShowTablesDesc.java ec9e933
ql/src/test/queries/clientpositive/database.q 2b6c911
ql/src/test/queries/clientpositive/show_tables.q 1fa78bf
ql/src/test/results/clientpositive/database.q.out a74f9ea
ql/src/test/results/clientpositive/show_tables.q.out 0bbd81b

Diff: https://reviews.apache.org/r/323/diff

Testing
---

Thanks,
Jonathan
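As a quick illustration of the syntax under review (the database and pattern names below are invented for the example, and the pattern form assumes Hive's existing SHOW TABLES wildcard support):

```sql
-- List all tables in a specific database
SHOW TABLES IN analytics;

-- Same, restricted to tables whose names match a pattern
SHOW TABLES FROM analytics 'page*';
```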
[jira] Updated: (HIVE-1636) Implement SHOW TABLES {FROM | IN} db_name
[ https://issues.apache.org/jira/browse/HIVE-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Natkins updated HIVE-1636:
---
Status: Patch Available (was: Open)

https://reviews.apache.org/r/323/diff/

Implement SHOW TABLES {FROM | IN} db_name
---
Key: HIVE-1636
URL: https://issues.apache.org/jira/browse/HIVE-1636
Project: Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Carl Steinbach
Assignee: Jonathan Natkins
Attachments: HIVE-1636.1.patch.txt, HIVE-1636.2.patch.txt

Make it possible to list the tables in a specific database using the following syntax borrowed from MySQL:

{noformat}
SHOW TABLES [{FROM|IN} db_name]
{noformat}

See http://dev.mysql.com/doc/refman/5.0/en/show-tables.html

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: The control flow of hive
Hi Abhinav,

If you are looking in the source, you should start by looking at the code in SemanticAnalyzer.java.

* For actual execution and result handling, take a look at the Driver class; it calls up the semantic analyzer, generates plans and tasks, and takes care of execution. You can follow the code flow from there.
* The metadata information is fetched in the getMetaData method in SemanticAnalyzer, which in turn uses BaseSemanticAnalyzer's db object to get metadata information. The protocol used to talk to the metastore is Thrift.
* Result fetching is done through the fetch operator. You can look at the EXPLAIN EXTENDED output for the query to see the parameters.

--
Sreekanth

On 1/20/11 11:38 AM, abhinav narain abhinavnarai...@gmail.com wrote:

Hi,

I have two questions to ask:

1. How does one access the metastore of Hive to retrieve the schema information from it? I can't find a file or other such thing in the source code.
2. How does one get a handle on the result that is produced after the query?

If someone can tell me where to look for the answers, that will also help.

regards,
Abhinav

--
Sreekanth Ramakrishnan
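The flow Sreekanth describes (Driver invokes the semantic analyzer, which produces tasks that are then executed) can be sketched in miniature. Everything below is a hand-rolled stand-in for illustration, not Hive's actual API:

```python
def parse(query):
    # Stand-in for the ANTLR parser generated from Hive.g;
    # the real Driver produces an AST at this step.
    return ("AST", query)

def analyze(ast):
    # Stand-in for semantic analysis: in Hive this resolves table
    # metadata via the metastore and emits a DAG of tasks.
    return ["MapRedTask", "FetchTask"]

def execute(tasks):
    # Stand-in for Driver.execute(): runs each task in order.
    return [task + ":done" for task in tasks]

def run_query(query):
    # Mirrors the overall flow: parse -> semantic analysis -> execution.
    return execute(analyze(parse(query)))

print(run_query("SELECT * FROM src"))  # ['MapRedTask:done', 'FetchTask:done']
```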
Re: The control flow of hive
Hi Abhinav,

1. How does one have access to metastore of Hive, to retrieve the schema information from it. I cant find a file or other such thing in source code.

Take a look at the IMetaStoreClient interface and the HiveMetaStoreClient class, but bear in mind that the fundamental definition of the MetaStore interface is contained in the Thrift IDL file located here: metastore/if/hive_metastore.thrift. IMetaStoreClient actually defines a wrapper interface around the code generated by the Thrift compiler based on the definitions in hive_metastore.thrift. You can also find some good code examples in TestHiveMetaStore.

2. How does one get the handle of the result that is produced after the query. If someone can tell about where to look for the answers, that will also help

Here are the relevant pieces of code that you should look at:

service/if/hive_service.thrift
service/src/java/org/apache/hadoop/hive/service/HiveServer.java
service/src/test/org/apache/hadoop/hive/service/TestHiveServer.java

The interface for executing queries and fetching results is defined in hive_service.thrift and consists of the following methods:

void execute(string query)
string fetchOne()
list<string> fetchN(i32 numRows)
list<string> fetchAll()

Since execute() does not return a query ID, the Thrift client is limited to executing/fetching the results of a single query at a time.

Hope this helps.

Carl
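The execute/fetch contract Carl lists can be exercised with a small stand-in client. The class below only fakes the Thrift-generated client (the real one is produced by the Thrift compiler from hive_service.thrift, and the fabricated rows are invented for the example), but the call pattern is the same:

```python
class FakeHiveClient:
    """Stand-in mimicking the execute/fetch interface from
    hive_service.thrift; not the real Thrift-generated client."""

    def __init__(self):
        self._rows = []

    def execute(self, query):
        # A real client sends the query to HiveServer; here we just
        # fabricate a result set so the fetch pattern can be shown.
        # Because execute() returns no query ID, only one query's
        # results can be pending at a time.
        self._rows = ["row1", "row2", "row3"]

    def fetchN(self, num_rows):
        # Return the next num_rows rows, consuming them.
        batch, self._rows = self._rows[:num_rows], self._rows[num_rows:]
        return batch

    def fetchAll(self):
        # Return all remaining rows.
        rows, self._rows = self._rows, []
        return rows

client = FakeHiveClient()
client.execute("SELECT * FROM src")
first = client.fetchN(2)   # ["row1", "row2"]
rest = client.fetchAll()   # ["row3"]
```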