Re: select count(*) from table;
If you have enabled statistics-based performance optimization and statistics have been gathered, the count comes from the metastore. If the underlying file format keeps in-file statistics (like ORC), the count comes from those. If it's just a plain vanilla text file format, Hive needs to run a job to get the count, which is the slowest of the three. On Tue, Mar 22, 2016 at 12:44 PM, Amey Barve <ameybarv...@gmail.com> wrote: > select count(*) from table; > > How does hive evaluate count(*) on a table? > > Does it return count by actually querying table, or directly return count > by consulting some statistics locally. > > For Hive's Text format it takes few seconds while Hive's Orc format takes > fraction of seconds. > > Regards, > Amey > -- Nitin Pawar
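The three paths described above can be sketched in HiveQL. The table name is hypothetical, and hive.compute.query.using.stats is a setting from Hive 0.13+, so treat this as an illustrative sketch rather than exact advice from the thread:

```sql
-- Gather table-level statistics (including the row count) into the metastore.
ANALYZE TABLE mytable COMPUTE STATISTICS;

-- Allow the planner to answer aggregates like count(*) directly from
-- metastore statistics instead of launching a job (Hive 0.13+).
set hive.compute.query.using.stats=true;

-- With current stats (or an ORC table whose file footers carry row counts),
-- this can return without a full scan; on plain text it still needs a job.
SELECT count(*) FROM mytable;
```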
Re: Possible bug loading data in Hive.
)) - Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1454)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1158)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:304)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:630)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:618)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:429)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1446)
... 16 more
Caused by: MetaException(message:The transaction for alter partition did not commit successfully.)
at org.apache.hadoop.hive.metastore.ObjectStore.alterPartition(ObjectStore.java:1927)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
at $Proxy0.alterPartition(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveAlterHandler.alterPartition(HiveAlterHandler.java:254)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:1816)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.rename_partition(HiveMetaStore.java:1788)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_partition(HiveMetaStore.java:1771)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_partition(HiveMetaStoreClient.java:834)
at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:425)
... 17 more
2014-06-08 20:16:34,852 ERROR ql.Driver (SessionState.java:printError(403)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
--
Fernando Agudo Tarancón
Big Data Software Engineer
Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain
http://www.bidoop.es
-- Nitin Pawar
Re: Possible bug loading data in Hive.
The error you see is with the hive metastore, and these issues were usually related to two things: 1) load on the metastore, 2) DataNucleus-related problems. For now, if possible, see if restarting the hive metastore resolves your issue. On Tue, Jun 10, 2014 at 3:27 PM, Fernando Agudo fag...@pragsis.com wrote: I have problems upgrading to hive-0.13 or 0.12 because this is in production. I only have this DataNucleus configuration:

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

Is this relevant for the problem? Thanks, On 10/06/14 10:53, Nitin Pawar wrote: Hive 0.9.0 with CDH4.1 --- This is a very old release. I would recommend upgrading to hive-0.13, or at least 0.12, and trying again. The error you are seeing is on loading data into a partition, where the metastore alter/add partition is failing. Can you try upgrading and see if that resolves your issue? If not, can you share your DataNucleus-related settings in hive? On Tue, Jun 10, 2014 at 2:16 PM, Fernando Agudo fag...@pragsis.com wrote: Hello, I'm working with Hive 0.9.0 on CDH4.1. I have a process which loads data into Hive every minute, creating the partition if necessary. I have been monitoring this process for three days and I noticed that there's a method (listStorageDescriptorsWithCD) whose execution time keeps increasing. On the first execution this method took about 15 milliseconds, and in the end (three days later) it took more than 3 seconds; after that, Hive throws an exception and starts working again. I have checked this method but haven't found anything suspicious. Could it be a bug?
2014-06-05 09:58:20,921 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2036)) - Executing listStorageDescriptorsWithCD
2014-06-05 09:58:20,928 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2045)) - Done executing query for listStorageDescriptorsWithCD
2014-06-08 20:15:33,867 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2036)) - Executing listStorageDescriptorsWithCD
2014-06-08 20:15:36,134 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2045)) - Done executing query for listStorageDescriptorsWithCD
2014-06-08 20:16:34,600 DEBUG metastore.ObjectStore (ObjectStore.java:removeUnusedColumnDescriptor(1989)) - execute removeUnusedColumnDescriptor
2014-06-08 20:16:34,600 DEBUG metastore.ObjectStore (ObjectStore.java:listStorageDescriptorsWithCD(2036)) - Executing listStorageDescriptorsWithCD
2014-06-08 20:16:34,805 ERROR metadata.Hive (Hive.java:getPartition(1453)) - org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter partition.
at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:429)
at org.apache.hadoop.hive.ql.metadata.Hive.getPartition(Hive.java:1446)
at org.apache.hadoop.hive.ql.metadata.Hive.loadPartition(Hive.java:1158)
at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:304)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1331)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1117)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:950)
at org.apache.hadoop.hive.service.HiveServer$HiveServerHandler.execute(HiveServer.java:191)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:630)
at org.apache.hadoop.hive.service.ThriftHive$Processor$execute.getResult(ThriftHive.java:618)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: MetaException(message:The transaction for alter partition did not commit successfully.)
at org.apache.hadoop.hive.metastore.ObjectStore.alterPartition(ObjectStore.java:1927)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
at $Proxy0
Re: Scheduling the next Hive Contributors Meeting
I am not a contributor but a spectator to what hive has been doing the last couple of years. I work out of India and would love to just sit back and listen to all the new upcoming things (if that's allowed) :) On Sat, Nov 9, 2013 at 1:08 AM, Brock Noland br...@cloudera.com wrote: Hi, Thanks Carl and Thejas! I would be attending remotely, so the webex or google hangout would be very much appreciated. Please let me know if there is anything I can do to help enable either a webex or hangout! The Apache Sentry (incubating)[1] community, which depends on Hive, would be interested in briefly describing the project to the Hive community and discussing how we can work together to move both projects forward! As a side note, there have been lively discussions on the integration of other incubating projects, so I'd just like to share that the changes Sentry is interested in are very small in scope and unlikely to cause disruption to the Hive community. Cheers! Brock [1] http://incubator.apache.org/projects/sentry.html On Fri, Nov 8, 2013 at 1:08 PM, Carl Steinbach c...@apache.org wrote: We're long overdue for a Hive Contributors Meeting. Thejas has offered to host the next meeting at Hortonworks on November 19th from 4-6pm. We will have a Google Hangout or Webex set up for people who wish to attend remotely. If you want to attend but can't because of a scheduling conflict, please let us know. If enough people fall into this category we will try to reschedule. Thanks. Carl -- Nitin Pawar
Re: Skip trash while dropping Hive table
On the hive CLI I normally set set fs.trash.interval=0; in my hiverc and use that. This setting is HDFS-related, and I would not recommend setting it in hdfs-site.xml, as it would then apply across all of HDFS, which is not desirable most of the time. On Tue, Nov 5, 2013 at 5:28 AM, Chu Tong chut...@altiscale.com wrote: Hi all, Is there an existing way to drop Hive tables without having the deleted files hitting trash? If not, can we add something similar to Hive for this? Thanks a lot. -- Nitin Pawar
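The suggestion above can be sketched as follows; the ~/.hiverc location and the table name are assumptions for illustration:

```sql
-- Placed in ~/.hiverc so it applies per CLI session rather than
-- cluster-wide in hdfs-site.xml; 0 disables the HDFS trash interval.
set fs.trash.interval=0;

-- Dropping a table in such a session deletes its files without
-- moving them into the HDFS trash directory first.
DROP TABLE IF EXISTS old_table;
```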
Re: Single Mapper - HIVE 0.11
What's the size of the table (in GBs)? What max and min split sizes have you provided? On Wed, Oct 9, 2013 at 10:28 PM, Gourav Sengupta gourav.had...@gmail.com wrote: Hi, I am trying to run a join using two tables stored in ORC file format. The first table has 34 million records and the second has around 300,000 records. Setting set hive.auto.convert.join=true makes the entire query run via a single mapper. If I set hive.auto.convert.join=false, then there are two mappers: the first one reads the second table, and then the entire large table goes through the second mapper. Is there something I am doing wrong? There are three nodes in the HADOOP cluster currently, and I was expecting that at least 6 mappers would be used. Thanks and Regards, Gourav -- Nitin Pawar
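For context, mapper parallelism for a join like this is usually steered with the split-size settings asked about above; the values and table names below are hypothetical, a sketch rather than a recommendation:

```sql
-- Keep the map-join conversion so the ~300k-row table is broadcast.
set hive.auto.convert.join=true;

-- Cap the split size so the 34M-row ORC table is divided across more
-- mappers; 64 MB here is an illustrative value only.
set mapred.max.split.size=67108864;
set mapred.min.split.size=1;

SELECT b.*, s.*
FROM big_table b
JOIN small_table s ON (b.join_key = s.join_key);
```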
[jira] [Created] (HIVE-5432) self join for a table with serde definition fails with classNotFoundException, single queries work fine
Nitin Pawar created HIVE-5432:
Summary: self join for a table with serde definition fails with ClassNotFoundException, single queries work fine
Key: HIVE-5432
URL: https://issues.apache.org/jira/browse/HIVE-5432
Project: Hive
Issue Type: Bug
Components: CLI
Affects Versions: 0.11.0
Environment: rhel6.4
Reporter: Nitin Pawar

Steps to reproduce:

hive> add jar /home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar;
Added /home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar to class path
Added resource: /home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar
hive> create table if not exists test(a string, b string) ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe';
OK
Time taken: 0.159 seconds
hive> load data local inpath '/tmp/1' overwrite into table test;
Copying data from file:/tmp/1
Copying file: file:/tmp/1
Loading data to table default.test
Table default.test stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 51, raw_data_size: 0]
OK
Time taken: 0.659 seconds
hive> select a from test;
Total MapReduce jobs = 1
Launching Job 1 out of 1
...
...
hive> select * from (select b from test where a='test') x join (select b from test where a='test1') y on (x.b = y.b);
Total MapReduce jobs = 1
setting HADOOP_USER_NAME hive
Execution log at: /tmp/hive/.log
java.lang.ClassNotFoundException: com.cloudera.hive.serde.JSONSerDe
Continuing ...
2013-10-03 05:13:00 Starting to launch local task to process map join; maximum memory = 1065484288 org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception nulljava.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.FetchOperator.getRowInspectorFromTable(FetchOperator.java:230) at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:595) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:406) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:290) at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:682) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) at org.apache.hadoop.hive.ql.exec.FetchOperator.getOutputObjectInspector(FetchOperator.java:631) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.initializeOperators(MapredLocalTask.java:406) at org.apache.hadoop.hive.ql.exec.MapredLocalTask.executeFromChildJVM(MapredLocalTask.java:290) at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:682) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:160) Execution failed with exit status: 2 Obtaining error information Task failed! Task ID: -- This message was sent by Atlassian JIRA (v6.1#6144)
Self join issue
Hi, I just raised a ticket for a self-join query on a table created with the json serde provided by cloudera. When I run a single query on the table, like select col from table where col='xyz', it works perfectly fine as a mapreduce job, but when I try to run a self-join query on the table, it says the serde is not found during query parsing. I have described the steps in detail on JIRA HIVE-5432 https://issues.apache.org/jira/browse/HIVE-5432. Can somebody tell me what's special about how a join query is parsed versus a standalone query? Due to this issue, I have to create temporary tables and make sure I clean them up myself after the jobs are over. Thanks, Nitin Pawar
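One workaround direction (an assumption on my part, not something verified in this thread) is to make the serde jar part of the auxiliary classpath rather than a per-session add jar, since the local task spawned for a map join runs in a child JVM that may not see session-added jars. hive.aux.jars.path is normally set in hive-site.xml or via HIVE_AUX_JARS_PATH before the CLI starts:

```sql
-- Hypothetical sketch: expose the serde jar (path from HIVE-5432) through
-- the auxiliary classpath so child JVMs can also load it.
-- In hive-site.xml:
--   <property>
--     <name>hive.aux.jars.path</name>
--     <value>file:///home/hive/udfs/hive-serdes-1.0-SNAPSHOT.jar</value>
--   </property>

-- The failing self join from the report, for reference:
SELECT *
FROM (SELECT b FROM test WHERE a = 'test')  x
JOIN (SELECT b FROM test WHERE a = 'test1') y
  ON (x.b = y.b);
```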
Re: Error - loading data into tables
Manickam, I am really not sure if hive supports federated namespaces yet. I have cc'd the dev list; maybe one of the core hive developers will be able to tell how to load data using hive on a federated hdfs. On Tue, Oct 1, 2013 at 12:59 PM, Manickam P manicka...@outlook.com wrote: Hi Pawar, I tried that option but it is not working. I have a federated HDFS cluster, and given below is my core-site.xml. I created the HDFS directory inside /home/storage/mount1 and tried to load the file; now I'm also getting the same error. Can you please tell me what mistake I'm making here? Because I don't have any clue.

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>viewfs:</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.default.link./home/storage/mount1</name>
    <value>hdfs://10.108.99.68:8020</value>
  </property>
  <property>
    <name>fs.viewfs.mounttable.default.link./home/storage/mount2</name>
    <value>hdfs://10.108.99.69:8020</value>
  </property>
</configuration>

Thanks, Manickam P -- Date: Mon, 30 Sep 2013 21:53:03 +0530 Subject: Re: Error - loading data into tables From: nitinpawar...@gmail.com To: u...@hive.apache.org Is this /home/storage/... a hdfs directory? I think it's a normal filesystem directory. Try running: load data local inpath '/home/storage/mount1/tabled.txt' INTO TABLE TEST; On Mon, Sep 30, 2013 at 7:13 PM, Manickam P manicka...@outlook.com wrote: Hi, I'm getting the below error while loading data into a hive table: return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. I used LOAD DATA INPATH '/home/storage/mount1/tabled.txt' INTO TABLE TEST; to insert into the table. Thanks, Manickam P -- Nitin Pawar -- Nitin Pawar
Re: Hive Issue
) at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:521)
... 44 more
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at java.net.Socket.connect(Socket.java:478)
at java.net.Socket.<init>(Socket.java:375)
at java.net.Socket.<init>(Socket.java:218)
at com.mysql.jdbc.StandardSocketFactory.connect(StandardSocketFactory.java:257)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:294)
... 63 more
Nested Throwables StackTrace:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1116)
at com.mysql.jdbc.MysqlIO.<init>(MysqlIO.java:344)
at com.mysql.jdbc.ConnectionImpl.coreConnect(ConnectionImpl.java:2332)
at com.mysql.jdbc.ConnectionImpl.connectOneTryOnly(ConnectionImpl.java:2369)
at com.mysql.jdbc.ConnectionImpl.createNewIO(ConnectionImpl.java:2153)
at com.mysql.jdbc.ConnectionImpl.<init>(ConnectionImpl.java:792)
at com.mysql.jdbc.JDBC4Connection.<init>(JDBC4Connection.java:47)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.ConnectionImpl.getInstance(ConnectionImpl.java:381)
at com.mysql.jdbc.NonRegisteringDriver.connect(NonRegisteringDriver.java:305)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.commons.dbcp.DriverManagerConnectionFactory.createConnection(DriverManagerConnectionFactory.java:75)
at org.apache.commons.dbcp.PoolableConnectionFactory.makeObject(PoolableConnectionFactory.java:582)
at org.apache.commons.pool.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:1148)
at org.apache.commons.dbcp.PoolingDataSource.getConnection(PoolingDataSource.java:106)
at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:521)
at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:290)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:593)
at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:300)
at org.datanucleus.ObjectManagerFactoryImpl.initialiseStoreManager(ObjectManagerFactoryImpl.java:161)
at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:583)
-- Nitin Pawar
Re: Last time request for cwiki update privileges
CONFIDENTIALITY NOTICE == This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator. -- Nitin Pawar
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
The mentioned flow is called when you have the unsecure mode of thrift metastore client-server connection, so one way to avoid this is to use the secure mode. Code:

public boolean process(final TProtocol in, final TProtocol out) throws TException {
    setIpAddress(in);
    ...
}

@Override
protected void setIpAddress(final TProtocol in) {
    TUGIContainingTransport ugiTrans = (TUGIContainingTransport) in.getTransport();
    Socket socket = ugiTrans.getSocket();
    if (socket != null) {
        setIpAddress(socket);
    }
}

From the above code snippet, it looks like the null case is not handled when getSocket returns null. Can you check what the ulimit setting on the server is? If it's set to the default, can you set it to unlimited and restart the hcat server? (This is just a wild guess.) Also, the getSocket documentation says that if the underlying TTransport is an instance of TSocket, it returns the Socket object which it contains; otherwise it returns null. So someone from the thrift gurus needs to tell us what's happening; I have no knowledge at this depth, but maybe Ashutosh or Thejas will be able to help. From the netstat CLOSE_WAIT, it looks like the hive metastore server has not closed the connection (do not know why yet); maybe the hive dev guys can help. Are there too many connections in CLOSE_WAIT state? On Tue, Jul 30, 2013 at 5:52 AM, agateaaa agate...@gmail.com wrote: Looking at the hive metastore server logs we see errors like these: 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
java.lang.NullPointerException at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) at approximately the same time as we see the timeout or connection reset errors. I don't know if this is the cause or a side effect of the connection timeout/connection reset errors. Does anybody have any pointers or suggestions? Thanks On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote: Thanks Nitin! We have a similar setup (identical hcatalog and hive server versions) on another production environment and don't see any errors (it's been running OK for a few months). Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive 0.10 soon. I did see that the last time we ran into this problem, doing a netstat -ntp | grep :1 showed the server was holding on to one socket connection in CLOSE_WAIT state for a long time (the hive metastore server is running on port 1). Don't know if that's relevant here or not. Can you suggest any hive configuration settings we can tweak, or networking tools/tips we can use to narrow this down? Thanks Agateaaa On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Is there any chance you can do an update on a test environment with hcat-0.5 and hive-0.11 (or 0.10) and see if you can reproduce the issue?
We used to see this error when there was load on the hcat server or some network issue connecting to the server (the second was a rare occurrence). On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote: Hi All: We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors. The hive metastore server has been allocated enough (12G) memory. This is a critical problem for us, and we would appreciate it if anyone has any pointers. We did add retry logic to our client, which seems to help, but I am just wondering how we can narrow down the root cause of this problem. Could this be a hiccup in networking which causes the hive server to get into an unresponsive state? Thanks Agateaaa Example connection reset error: === org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69
Re: HCatalog (from Hive 0.11) and Hadoop 2
There is a build scheduled on jenkins for hive trunk which is failing. I will give it a try on my local machine for hive-0.11; there is another build which does the ptests, but it is disabled due to lots of test case failures. https://builds.apache.org/job/Hive-trunk-hadoop2/ I will update you if I can build it. On Mon, Jul 29, 2013 at 8:07 PM, Rodrigo Trujillo rodrigo.truji...@linux.vnet.ibm.com wrote: Hi, is it possible to build Hive 0.11 and HCatalog with Hadoop 2 (2.0.4-alpha)? Regards, Rodrigo -- Nitin Pawar
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.net.SocketTimeoutException: Read timed out at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ... 31 more -- Nitin Pawar
Re: ant maven-build not working in trunk
I just tried a build (build = ant clean package) with both JDK versions: jdk7 on branch-0.10 with the patch from HIVE-3384 works, and jdk6 on trunk without any changes works. I also created a new redhat VM, installed Sun JDK 6u43, and tried it; it works too. When I try ant maven-build -Dmvn.publish.repo=local, it does fail with the make-pom target not existing. Alan has a JIRA on this: https://issues.apache.org/jira/browse/HIVE-4387. There is a patch available there for branch-0.11; I will try to build with that. On Thu, Jun 13, 2013 at 10:14 AM, amareshwari sriramdasu amareshw...@gmail.com wrote: Nitin, Hive does not compile with jdk7. You have to use jdk6 for compiling. On Wed, Jun 12, 2013 at 9:42 PM, Nitin Pawar nitinpawar...@gmail.com wrote: I tried the build on trunk; I did not hit the issue of make-pom, but I hit the issue of jdbc with jdk7. I will apply the patch and try again. On Wed, Jun 12, 2013 at 4:48 PM, amareshwari sriramdasu amareshw...@gmail.com wrote: Hello, ant maven-build -Dmvn.publish.repo=local fails to build hcatalog with the following error: /home/amareshwaris/hive/build.xml:121: The following error occurred while executing this line: /home/amareshwaris/hive/build.xml:123: The following error occurred while executing this line: Target "make-pom" does not exist in the project "hcatalog". Was curious to know if I'm the only one facing this, or is there any other way to publish maven artifacts locally? Thanks Amareshwari -- Nitin Pawar -- Nitin Pawar
Re: ant maven-build not working in trunk
I tried the build on trunk; I did not hit the issue of make-pom, but I hit the issue of jdbc with jdk7. I will apply the patch and try again. On Wed, Jun 12, 2013 at 4:48 PM, amareshwari sriramdasu amareshw...@gmail.com wrote: Hello, ant maven-build -Dmvn.publish.repo=local fails to build hcatalog with the following error: /home/amareshwaris/hive/build.xml:121: The following error occurred while executing this line: /home/amareshwaris/hive/build.xml:123: The following error occurred while executing this line: Target "make-pom" does not exist in the project "hcatalog". Was curious to know if I'm the only one facing this, or is there any other way to publish maven artifacts locally? Thanks Amareshwari -- Nitin Pawar
adding a new property for hive history file HIVE-1708
Hi Guys, I am trying to work on JIRA HIVE-1708 (https://issues.apache.org/jira/browse/HIVE-1708). I have added one property, HIVE_CLI_ENABLE_LOGGING, to enable or disable the history, and tested it. I am stuck at one point: what should the default value for HIVE_CLI_HISTORY_FILE_PATH be? Currently it is set via:

String historyDirectory = System.getProperty("user.home");
String historyFile = historyDirectory + File.separator + HISTORYFILE;

Any ideas on what the default path should be?

-- Nitin Pawar
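As a sketch of the fallback being asked about: if a configurable path property were introduced, resolution could prefer it and only fall back to the user's home directory when it is unset. The HIVE_HISTORY_DIR variable name below is hypothetical, not an actual Hive setting.

```shell
# Hypothetical fallback chain for the history file location:
# use an explicitly configured directory if present, else $HOME.
HISTORYFILE=".hivehistory"
HISTORY_DIR="${HIVE_HISTORY_DIR:-$HOME}"
# Full path the CLI would open for history:
echo "$HISTORY_DIR/$HISTORYFILE"
```

With HIVE_HISTORY_DIR unset this reproduces the current per-user default; setting it would give a distribution-independent location such as a directory under /tmp.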
Re: plugable partitioning
Whenever you create a partition in Hive, it needs to be registered with the metadata store. So the short answer is: partition information is looked up in the metadata store rather than in the actual source data.

Having a very large number of partitions does slow Hive down. I have normally not seen anyone using hourly partitions; you may want to look at adding a daily partition and bucketing by hour. But if you are adding data directly into partition directories, then there is no alternative to registering the partitions with the metadata store manually via ALTER TABLE. If you are using HCatalog as the metadata store, it does provide an API to register your partitions, so you can automate data loading and registration in a single flow. Others will correct me if I have made any wrong assumptions.

On Mon, Apr 15, 2013 at 8:15 PM, Steve Hoffman ste...@goofy.net wrote:
> Looking for some pointers on where the partitioning is figured out in the source when a query is executed. I'm investigating an alternative partitioning scheme based on date patterns (using external tables). The situation is that I have data being written to some HDFS root directory with a dated pattern (i.e. YYYY/MM/DD). Today I have to run an ALTER TABLE to insert this partition every day. It gets worse if you have hourly partitions. This seems like it could be described once (root + date partition pattern in the metastore). So I am looking for some pointers on where in the code this is currently handled.
> Thanks, Steve

-- Nitin Pawar
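The manual registration step described in the reply can be sketched as a small script that emits the ALTER TABLE statement for one day's directory. The table name and path layout here are hypothetical; in practice this is what a daily cron job would feed to the Hive CLI.

```shell
# Generate the DDL that registers one dated HDFS directory as a
# partition in the metastore (table name and paths are illustrative).
Y=2013; M=04; D=15
DDL="ALTER TABLE logs ADD IF NOT EXISTS PARTITION (dt='${Y}-${M}-${D}') LOCATION '/data/logs/${Y}/${M}/${D}';"
echo "$DDL"
```

With HCatalog, the same registration could instead go through its partition-registration API as part of the load job itself, avoiding the separate DDL step.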
[jira] [Commented] (HIVE-1708) make hive history file configurable
[ https://issues.apache.org/jira/browse/HIVE-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13630609#comment-13630609 ]

Nitin Pawar commented on HIVE-1708:
---
I did add a new setting to hive-site.xml, made some changes in the CLI code, and tested making the Hive history optional. I wanted to add one more property for the Hive history file path, but currently it is set to .hivehistory inside each individual user's home directory. If I have to retain this property, how do I keep a default value in hive-site.xml? Since all users have different home directories on different Linux distributions, how do we default the path? Can we change the file path to something like the log location, which resides inside /tmp? Is that an acceptable change?

> make hive history file configurable
> ---
> Key: HIVE-1708
> URL: https://issues.apache.org/jira/browse/HIVE-1708
> Project: Hive
> Issue Type: Improvement
> Components: Query Processor
> Reporter: Namit Jain
>
> Currently, it is derived from System.getProperty("user.home")/.hivehistory.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4231) Build fails with WrappedRuntimeException: Content is not allowed in prolog. when _JAVA_OPTIONS=-Dfile.encoding=UTF-8
[ https://issues.apache.org/jira/browse/HIVE-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13628789#comment-13628789 ]

Nitin Pawar commented on HIVE-4231:
---
Even I am running into the same issue when trying to build the Hive project. I have the same environment as Sho, but the OS is RHEL 6.3. The log says exactly the same thing, and in addition, the contents of the failed XML file are:

[root@localhost branch-0.10]# cat /root/apache/hive/branch-0.10/build/builtins/metadata/class-info.xml
ClassList
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hive/ql/exec/Description : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at org.apache.hive.pdk.FunctionExtractor.main(FunctionExtractor.java:27)
[root@localhost branch-0.10]#

> Build fails with WrappedRuntimeException: Content is not allowed in prolog.
> when _JAVA_OPTIONS=-Dfile.encoding=UTF-8
> Key: HIVE-4231
> URL: https://issues.apache.org/jira/browse/HIVE-4231
> Project: Hive
> Issue Type: Bug
> Affects Versions: 0.11.0
> Reporter: Sho Shimauchi
> Priority: Minor
>
> Build failed with the following error when I set _JAVA_OPTIONS to -Dfile.encoding=UTF-8:
> {code}
> extract-functions:
> [xslt] Processing /Users/sho/src/apache/hive/build/builtins/metadata/class-info.xml to /Users/sho/src/apache/hive/build/builtins/metadata/class-registration.sql
> [xslt] Loading stylesheet /Users/sho/src/apache/hive/pdk/scripts/class-registration.xsl
> [xslt] : Error! Content is not allowed in prolog.
> [xslt] : Error! com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Content is not allowed in prolog.
> [xslt] Failed to process /Users/sho/src/apache/hive/build/builtins/metadata/class-info.xml
>
> BUILD FAILED
> /Users/sho/src/apache/hive/build.xml:517: The following error occurred while executing this line:
> /Users/sho/src/apache/hive/builtins/build.xml:37: The following error occurred while executing this line:
> /Users/sho/src/apache/hive/pdk/scripts/build-plugin.xml:118: javax.xml.transform.TransformerException: javax.xml.transform.TransformerException: com.sun.org.apache.xml.internal.utils.WrappedRuntimeException: Content is not allowed in prolog.
> at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:735)
> at com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl.transform(TransformerImpl.java:336)
> at org.apache.tools.ant.taskdefs.optional.TraXLiaison.transform(TraXLiaison.java:194)
> at org.apache.tools.ant.taskdefs.XSLTProcess.process(XSLTProcess.java:852)
> at org.apache.tools.ant.taskdefs.XSLTProcess.execute(XSLTProcess.java:388)
> at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
> at org.apache.tools.ant.Task.perform(Task.java:348)
> at org.apache.tools.ant.Target.execute(Target.java:390)
> at org.apache.tools.ant.Target.performTasks(Target.java:411)
> at org.apache.tools.ant.Project.executeSortedTargets(Project.java:1399)
> at org.apache.tools.ant.helper.SingleCheckExecutor.executeTargets(SingleCheckExecutor.java:38)
> at org.apache.tools.ant.Project.executeTargets(Project.java:1251)
> at org.apache.tools.ant.taskdefs.Ant.execute(Ant.java:442)
> at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java
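The "Unsupported major.minor version 51.0" in the comment above means a class compiled for Java 7 (major version 51) was loaded by a Java 6 VM (which only accepts up to 50). The compiled-for version can be checked directly from bytes 7-8 of any class file; the sketch below fabricates an 8-byte header for demonstration, since real .class paths will vary per build.

```shell
# Fabricate a class-file header: magic CAFEBABE, minor 0, major 51
# (octal escapes: CA=312, FE=376, BA=272, BE=276, 0x33=063).
printf '\312\376\272\276\000\000\000\063' > /tmp/demo.class
# Read bytes 7-8 (the major version) and combine them.
MAJOR=$(od -An -t u1 -j 6 -N 2 /tmp/demo.class | head -n 1 | awk '{print $1 * 256 + $2}')
echo "$MAJOR"   # prints 51, i.e. compiled for Java 7
```

Running the same check against the real org/apache/hadoop/hive/ql/exec/Description class from the failing build would confirm which JDK compiled it.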
Hive compilation issues on branch-0.10 and trunk
Hello, I am trying to build Hive on both trunk and branch-0.10. I have tried both Sun JDK 6 and JDK 7, and with each version I run into a different issue: with JDK 6, the issue mentioned in HIVE-4231; with JDK 7, the issue mentioned in HIVE-3384. Can somebody please help out with this? What would be the recommended JDK version going forward for development activities? -- Nitin Pawar
Re: Hive compilation issues on branch-0.10 and trunk
Hi Mark, yes, I applied the patch and got it working with JDK 7. Can we continue using JDK 7? Thanks, Nitin

On Apr 11, 2013 8:48 PM, Mark Grover grover.markgro...@gmail.com wrote:
> Nitin, I have been able to build Hive trunk with JDK 1.6. Did you try the workaround listed in HIVE-4231?
> Mark
>
> On Thu, Apr 11, 2013 at 2:42 AM, Nitin Pawar nitinpawar...@gmail.com wrote:
>> Hello, I am trying to build Hive on both trunk and branch-0.10. I have tried both Sun JDK 6 and JDK 7, and with each version I run into a different issue: with JDK 6, the issue mentioned in HIVE-4231; with JDK 7, the issue mentioned in HIVE-3384. Can somebody please help out with this? What would be the recommended JDK version going forward for development activities?
>> -- Nitin Pawar
[jira] [Created] (HIVE-2980) Show a warning or an error when the data directory is empty or not existing
Nitin Pawar created HIVE-2980:
---
Summary: Show a warning or an error when the data directory is empty or not existing
Key: HIVE-2980
URL: https://issues.apache.org/jira/browse/HIVE-2980
Project: Hive
Issue Type: Improvement
Reporter: Nitin Pawar

It looks like a good idea to show a warning or an error when the data directory is missing or empty. This would help cut down debugging time, and it is also useful information to surface when data has been deleted.
[jira] [Created] (HIVE-2814) Can we have a feature to disable creating empty buckets on a larger number of buckets creates?
Can we have a feature to disable creating empty buckets when a large number of buckets is created?
---
Key: HIVE-2814
URL: https://issues.apache.org/jira/browse/HIVE-2814
Project: Hive
Issue Type: Bug
Reporter: Nitin Pawar
Priority: Minor

When we create buckets on larger datasets, it is rare for all partitions to need the same number of buckets, so we choose the largest number likely to cover them. This results in creating a lot of empty buckets, which can be an overhead for Hadoop as well as for Hive queries. It also takes a lot of time just to create the empty buckets. Is there a way to say: do not create empty buckets?
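The setup being described looks roughly like the DDL below: a table clustered into one fixed, large bucket count, so sparsely populated partitions end up with mostly empty bucket files. The table and column names are hypothetical; the statements are emitted as text rather than run against a metastore.

```shell
# Illustrative DDL for a bucketed, partitioned table where a fixed
# bucket count (256) applies to every partition, full or sparse.
DDL=$(cat <<'SQL'
CREATE TABLE events (id BIGINT, ts STRING)
PARTITIONED BY (dt STRING)
CLUSTERED BY (id) INTO 256 BUCKETS;
SET hive.enforce.bucketing = true;
SQL
)
echo "$DDL"
```

Because the bucket count is fixed per table, a partition with few rows still produces 256 output files, most of them empty, which is the overhead the issue asks to avoid.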