We have been struggling to get a reliable system working that interacts with
Hive over JDBC a lot. The pattern we see is that everything starts fine, but the
memory used by the Hive server process grows over time, and after some hundreds
of operations we start to see exceptions.
To make sure nothing stupid in our own code was causing this, I took the example
code from the wiki page for Hive 2 clients and put it in a loop. For us,
after about 80 runs we see exceptions like the one below.
2014-04-21 07:31:02,251 ERROR [pool-5-thread-5]: server.TThreadPoolServer (TThreadPoolServer.java:run(215)) - Error occurred during processing of message.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.thrift.transport.TTransportException
        at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
        at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
        at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:178)
        at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
        at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
        at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
        at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
        ... 4 more
This is also sometimes accompanied by out-of-memory exceptions.
The code on the wiki did not close statements. Adding stmt.close() in changes the
behaviour: instead of exceptions, things just lock up after a while, with
high CPU usage on the server.
This looks similar to HIVE-5296, but that was fixed in 0.12, so I assume it
should not be an issue in 0.13. The issues fixed in 0.13.1 don't seem to relate
to this either. The only way to get Hive back up and running is to restart it.
Before raising a JIRA I wanted to make sure I wasn't missing something, so any
suggestions would be greatly appreciated.
Full code as below.
import java.sql.*;

public class HiveOutOfMem {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        for (int i = 0; i < 100000; i++) {
            System.out.println("Run number " + i);
            run();
        }
    }

    /**
     * @throws SQLException
     */
    public static void run() throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // replace "hive" here with the name of the user the queries should run as
        Connection con = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "hive", "");
        Statement stmt = con.createStatement();
        String tableName = "testHiveDriverTable";
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create external table " + tableName + " (key int, value string)");
        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }
        // describe table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }
        // uncommenting this is the change that turns the exceptions
        // into the lock-up described above
        //stmt.close();
        con.close();
    }
}