Hi Prasad, Thanks for the detail review of it. I also figure out multiple connection would make sense. I create https://issues.apache.org/jira/browse/ZEPPELIN-487 and I will fix it.
Regards, Jongyoul On Sun, Dec 6, 2015 at 6:05 AM, Prasad Wagle <prasadwa...@gmail.com> wrote: > Hi Jongyoul, > > I ran experiments below and found that we need multiple connections for > parallel execution in Vertica. > > *Same Connection* > scala> val connection = DriverManager.getConnection(url, username, > password) > connection: java.sql.Connection = > com.vertica.jdbc.VerticaConnectionImpl@5d56ea0f > > scala> val thread1 = new Thread { > | override def run { > | val statement = connection.createStatement() > | println("Thread1: before execute at " + > Calendar.getInstance().getTime()) > | val resultSet = statement.executeQuery("SELECT sleep(10) as > col1, 1 as col2") > | resultSet.next() > | val col1 = resultSet.getString("col1") > | val col2 = resultSet.getString("col2") > | resultSet.next() > | println("Thread1: col " + col1 + " " + col2 + " at " + > Calendar.getInstance().getTime()) > | } > | } > thread1: Thread = Thread[Thread-3,5,main] > > scala> val thread2 = new Thread { > | override def run { > | val statement = connection.createStatement() > | println("Thread2: before execute at " + > Calendar.getInstance().getTime()) > | val resultSet = statement.executeQuery("SELECT sleep(10) as > col1, 2 as col2") > | resultSet.next() > | val col1 = resultSet.getString("col1") > | val col2 = resultSet.getString("col2") > | resultSet.next() > | println("Thread2: col " + col1 + " " + col2 + " at " + > Calendar.getInstance().getTime()) > | } > | } > thread2: Thread = Thread[Thread-4,5,main] > > scala> thread1.start > > scala> thread2.start > Thread1: before execute at Sat Dec 05 12:38:59 PST 2015 > Thread2: before execute at Sat Dec 05 12:38:59 PST 2015 > Thread1: col 0 1 at Sat Dec 05 12:39:19 PST 2015 > Thread2: col 0 2 at Sat Dec 05 12:39:19 PST 2015 > > *Two Connections* > > scala> val thread1 = new Thread { > | override def run { > | val connection = DriverManager.getConnection(url, username, > password) > | val statement = connection.createStatement() > | println("Thread1: before execute at " + > Calendar.getInstance().getTime()) > | val resultSet = statement.executeQuery("SELECT sleep(10) as > col1, 1 as col2") > | resultSet.next() > | val col1 = resultSet.getString("col1") > | val col2 = resultSet.getString("col2") > | println("Thread1: col " + col1 + " " + col2 + " at " + > Calendar.getInstance().getTime()) > | } > | } > thread1: Thread = Thread[Thread-3,5,main] > > scala> val thread2 = new Thread { > | override def run { > | val connection = DriverManager.getConnection(url, username, > password) > | val statement = connection.createStatement() > | println("Thread2: before execute at " + > Calendar.getInstance().getTime()) > | val resultSet = statement.executeQuery("SELECT sleep(10) as > col1, 2 as col2") > | resultSet.next() > | val col1 = resultSet.getString("col1") > | val col2 = resultSet.getString("col2") > | println("Thread2: col " + col1 + " " + col2 + " at " + > Calendar.getInstance().getTime()) > | } > | } > thread2: Thread = Thread[Thread-4,5,main] > > scala> thread1.start > > scala> thread2.start > > Thread2: before execute at Sat Dec 05 12:34:16 PST 2015 > Thread1: before execute at Sat Dec 05 12:34:16 PST 2015 > Thread1: col 0 1 at Sat Dec 05 12:34:26 PST 2015 > Thread2: col 0 2 at Sat Dec 05 12:34:26 PST 2015 > > > Does that make sense? > > Thanks for the great work! > > Prasad > > On Fri, Dec 4, 2015 at 6:18 PM, Jongyoul Lee <jongy...@gmail.com> wrote: > >> Hi Prasad, >> >> First of all, Thanks for testing multiple executions and documenting the >> bug. I'll fix the bug. Concerning the concurrent executions, that >> interpreter make a two statements from one connection. I'm not good at >> vertica, could you please confirm supporting multi statements from one >> connection and verify running multiple statements from one connection? >> Moreover, I appreciate the advice which is better strategy between multiple >> statements and multiple connections. I expect that second query is started >> before finishing the first one as you can see in your log, In your case, >> >> 1. the first query started >> 2. the second query started >> 3. the second query ended >> 4. the first query ended >> >> I, however why those queries finished after 2 minutes. I guess it's about >> vertica's behavior of jdbc, and it doesn't seems that vertica runs multiple >> statements in one connection. with this situation, even though two >> statements try to execute with parallel, sequentially executes internally. >> If it's true, I'll change to use multiple connections. >> >> Regards, >> Jongyoul >> >> On Sat, Dec 5, 2015 at 3:25 AM, Prasad Wagle <prasadwa...@gmail.com> >> wrote: >> >>> Hi Jongyoul, >>> >>> Thanks for merging ZEPPELIN-440. I downloaded changes from >>> https://github.com/apache/incubator-zeppelin/pull/455 and here's what I >>> found. >>> >>> When I ran two Vertica queries, I got the error below: >>> ERROR [2015-12-04 09:18:47,153] ({pool-2-thread-3} Job.java[run]:183) - >>> Job failed >>> java.lang.AbstractMethodError: >>> com.vertica.jdbc.VerticaConnectionImpl.isValid(I)Z >>> at >>> org.apache.zeppelin.hive.HiveInterpreter.getConnection(HiveInterpreter.java:163) >>> at >>> org.apache.zeppelin.hive.HiveInterpreter.getStatement(HiveInterpreter.java:196) >>> at >>> org.apache.zeppelin.hive.HiveInterpreter.executeSql(HiveInterpreter.java:208) >>> at >>> org.apache.zeppelin.hive.HiveInterpreter.interpret(HiveInterpreter.java:292) >>> at >>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>> at >>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>> at >>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300) >>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>> at >>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>> at >>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>> at >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>> at >>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>> at >>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>> at >>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>> at java.lang.Thread.run(Thread.java:744) >>> >>> When I removed the connection.isValid check, the queries ran without any >>> errors. >>> if (connection.isClosed()) { // || !connection.isValid(10)) { >>> >>> However, I noticed that even though the jobs for the two queries (select >>> sleep(60)) were started around the same time, they both finished 120 >>> seconds after they were started. I also looked at the sessions in Vertica >>> and found that 'select sleep(60), 4' was started after 'select sleep(60), >>> 3' completed. Is this behavior expected? Our goal is to prevent one long >>> Vertica query from delaying other Vertica queries. >>> >>> MacBook-Pro logs $ cat >>> zeppelin-interpreter-hive-pwagle-MacBook-Pro.local.log >>> >>> INFO [2015-12-04 09:28:44,545] ({Thread-0} >>> RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server >>> on port 54257 >>> >>> INFO [2015-12-04 09:28:44,818] ({pool-1-thread-3} >>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate >>> interpreter org.apache.zeppelin.hive.HiveInterpreter >>> >>> INFO [2015-12-04 09:28:44,862] ({pool-2-thread-2} >>> SchedulerFactory.java[jobStarted]:132) - Job >>> remoteInterpretJob_1449250124861 started by scheduler >>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>> >>> ERROR [2015-12-04 09:28:44,865] ({pool-2-thread-2} >>> HiveInterpreter.java[open]:128) - hive will be ignored. driver.hive and >>> hive.url is mandatory. >>> >>> INFO [2015-12-04 09:28:44,865] ({pool-2-thread-2} >>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command: >>> 'select sleep(60), 3' >>> >>> INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3} >>> SchedulerFactory.java[jobStarted]:132) - Job >>> remoteInterpretJob_1449250133312 started by scheduler >>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>> >>> INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3} >>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command: >>> 'select sleep(60), 4' >>> >>> INFO [2015-12-04 09:30:48,970] ({pool-2-thread-3} >>> SchedulerFactory.java[jobFinished]:138) - Job >>> remoteInterpretJob_1449250133312 finished by scheduler >>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>> >>> INFO [2015-12-04 09:30:48,970] ({pool-2-thread-2} >>> SchedulerFactory.java[jobFinished]:138) - Job >>> remoteInterpretJob_1449250124861 finished by scheduler >>> org.apache.zeppelin.hive.HiveInterpreter1131923628 >>> >>> >>> Thanks for your help! >>> >>> Prasad >>> >>> On Thu, Dec 3, 2015 at 9:12 PM, Jongyoul Lee <jongy...@gmail.com> wrote: >>> >>>> Hi Prasad, >>>> >>>> PostgresSqlInterpreter doesn't seems to support parallel execution. I >>>> merged ZEPPELIN-440 for supporting it today. Could you please use this >>>> interpreter? You can find docs and changes in >>>> https://github.com/apache/incubator-zeppelin/pull/455 >>>> >>>> Regards, >>>> Jongyoul >>>> >>>> On Fri, Dec 4, 2015 at 10:32 AM, Prasad Wagle <prasadwa...@gmail.com> >>>> wrote: >>>> >>>>> Hi, >>>>> >>>>> We are evaluating Zeppelin and are excited about its potential. Right >>>>> now, we are using it to access Presto and Vertica. We downloaded >>>>> https://github.com/apache/incubator-zeppelin/pull/68 for the Presto >>>>> interpreter and are using PostgreSqlInterpreter to access Vertica. >>>>> >>>>> The main issue we are facing is related to serial execution of >>>>> queries. Since the default scheduler is FIFOScheduler, when a Vertica >>>>> query >>>>> is running in one notebook, all other Vertica queries issued from the same >>>>> or different notes are pending. >>>>> >>>>> We changed the getScheduler method in PostgreSqlInterpreter to return >>>>> ParallelScheduler. >>>>> return SchedulerFactory.singleton().createOrGetParallelScheduler( >>>>> PostgreSqlInterpreter.class.getName() + this.hashCode(), >>>>> 2); >>>>> >>>>> But this resulted in the exception below. Is this expected behavior? >>>>> Does the PostgreSqlInterpreter have to be redesigned to support parallel >>>>> execution of queries? >>>>> >>>>> In zeppelin-interpreter-psql-pwagle-MacBook-Pro.local.log: >>>>> >>>>> INFO [2015-12-03 10:33:35,891] ({pool-1-thread-2} >>>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate >>>>> interpreter org.apache.zeppelin.postgresql.PostgreSqlInterpreter >>>>> INFO [2015-12-03 10:33:35,940] ({pool-2-thread-2} >>>>> SchedulerFactory.java[jobStarted]:132) - Job >>>>> remoteInterpretJob_1449167615939 started by scheduler >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>> INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2} >>>>> PostgreSqlInterpreter.java[open]:132) - Open psql connection! >>>>> INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2} >>>>> PostgreSqlInterpreter.java[close]:182) - Close psql connection! >>>>> INFO [2015-12-03 10:33:36,511] ({pool-2-thread-2} >>>>> SqlCompleter.java[getSqlKeywordsCompletions]:144) - JDBC >>>>> DriverName:/vertica.jar-sql.keywords >>>>> INFO [2015-12-03 10:33:37,779] ({pool-2-thread-3} >>>>> SchedulerFactory.java[jobStarted]:132) - Job >>>>> remoteInterpretJob_1449167617779 started by scheduler >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2} >>>>> PostgreSqlInterpreter.java[open]:153) - Successfully created psql >>>>> connection >>>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2} >>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command ' >>>>> select sleep(30), 1;' >>>>> INFO [2015-12-03 10:33:49,352] ({pool-2-thread-3} >>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command ' >>>>> select sleep(30), 2;' >>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-3} >>>>> PostgreSqlInterpreter.java[executeSql]:271) - Cannot run >>>>> select sleep(30), 2; >>>>> java.sql.SQLException: [Vertica][JDBC](12060) ResultSet object has >>>>> been closed. >>>>> at com.vertica.jdbc.SForwardResultSet.checkIfOpen(Unknown Source) >>>>> at com.vertica.jdbc.SForwardResultSet.next(Unknown Source) >>>>> at >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:235) >>>>> at >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289) >>>>> at >>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>>>> at >>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299) >>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>>> at >>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>>>> at >>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:744) >>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-2} Job.java[run]:183) >>>>> - Job failed >>>>> org.apache.zeppelin.interpreter.InterpreterException: >>>>> java.lang.NullPointerException >>>>> at >>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:61) >>>>> at >>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93) >>>>> at >>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299) >>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170) >>>>> at >>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157) >>>>> at >>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262) >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) >>>>> at >>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) >>>>> at >>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >>>>> at java.lang.Thread.run(Thread.java:744) >>>>> Caused by: java.lang.NullPointerException >>>>> at >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:262) >>>>> at >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289) >>>>> at >>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57) >>>>> ... 11 more >>>>> INFO [2015-12-03 10:34:49,489] ({pool-2-thread-3} >>>>> SchedulerFactory.java[jobFinished]:138) - Job >>>>> remoteInterpretJob_1449167617779 finished by scheduler >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>> INFO [2015-12-03 10:34:49,490] ({pool-2-thread-2} >>>>> SchedulerFactory.java[jobFinished]:138) - Job >>>>> remoteInterpretJob_1449167615939 finished by scheduler >>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485 >>>>> >>>>> >>>>> Thanks, >>>>> Prasad >>>>> >>>> >>>> >>>> >>>> -- >>>> 이종열, Jongyoul Lee, 李宗烈 >>>> http://madeng.net >>>> >>> >>> >> >> >> -- >> 이종열, Jongyoul Lee, 李宗烈 >> http://madeng.net >> > > > > -- > Prasad Wagle > 408.476.6261 | twitter.com/prasadwagle | prasadwa...@gmail.com | > www.linkedin.com/in/prasadwagle > -- 이종열, Jongyoul Lee, 李宗烈 http://madeng.net