Hi Prasad,

Thanks for the detail review of it. I also figure out multiple connection
would make sense. I create
https://issues.apache.org/jira/browse/ZEPPELIN-487 and I will fix it.

Regards,
Jongyoul

On Sun, Dec 6, 2015 at 6:05 AM, Prasad Wagle <prasadwa...@gmail.com> wrote:

> Hi Jongyoul,
>
> I ran experiments below and found that we need multiple connections for
> parallel execution in Vertica.
>
> *Same Connection*
> scala> val connection = DriverManager.getConnection(url, username,
> password)
> connection: java.sql.Connection =
> com.vertica.jdbc.VerticaConnectionImpl@5d56ea0f
>
> scala> val thread1 = new Thread {
>      |   override def run {
>      |     val statement = connection.createStatement()
>      |     println("Thread1: before execute at " +
> Calendar.getInstance().getTime())
>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
> col1, 1 as col2")
>      |     resultSet.next()
>      |     val col1 = resultSet.getString("col1")
>      |     val col2 = resultSet.getString("col2")
>      |     resultSet.next()
>      |     println("Thread1: col " + col1 + " " + col2 + " at " +
> Calendar.getInstance().getTime())
>      |   }
>      | }
> thread1: Thread = Thread[Thread-3,5,main]
>
> scala> val thread2 = new Thread {
>      |   override def run {
>      |     val statement = connection.createStatement()
>      |     println("Thread2: before execute at " +
> Calendar.getInstance().getTime())
>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
> col1, 2 as col2")
>      |     resultSet.next()
>      |     val col1 = resultSet.getString("col1")
>      |     val col2 = resultSet.getString("col2")
>      |     resultSet.next()
>      |     println("Thread2: col " + col1 + " " + col2 + " at " +
> Calendar.getInstance().getTime())
>      |   }
>      | }
> thread2: Thread = Thread[Thread-4,5,main]
>
> scala> thread1.start
>
> scala> thread2.start
> Thread1: before execute at Sat Dec 05 12:38:59 PST 2015
> Thread2: before execute at Sat Dec 05 12:38:59 PST 2015
> Thread1: col 0 1 at Sat Dec 05 12:39:19 PST 2015
> Thread2: col 0 2 at Sat Dec 05 12:39:19 PST 2015
>
> *Two Connections*
>
> scala> val thread1 = new Thread {
>      |   override def run {
>      |     val connection = DriverManager.getConnection(url, username,
> password)
>      |     val statement = connection.createStatement()
>      |     println("Thread1: before execute at " +
> Calendar.getInstance().getTime())
>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
> col1, 1 as col2")
>      |     resultSet.next()
>      |     val col1 = resultSet.getString("col1")
>      |     val col2 = resultSet.getString("col2")
>      |     println("Thread1: col " + col1 + " " + col2 + " at " +
> Calendar.getInstance().getTime())
>      |   }
>      | }
> thread1: Thread = Thread[Thread-3,5,main]
>
> scala> val thread2 = new Thread {
>      |   override def run {
>      |     val connection = DriverManager.getConnection(url, username,
> password)
>      |     val statement = connection.createStatement()
>      |     println("Thread2: before execute at " +
> Calendar.getInstance().getTime())
>      |     val resultSet = statement.executeQuery("SELECT sleep(10) as
> col1, 2 as col2")
>      |     resultSet.next()
>      |     val col1 = resultSet.getString("col1")
>      |     val col2 = resultSet.getString("col2")
>      |     println("Thread2: col " + col1 + " " + col2 + " at " +
> Calendar.getInstance().getTime())
>      |   }
>      | }
> thread2: Thread = Thread[Thread-4,5,main]
>
> scala> thread1.start
>
> scala> thread2.start
>
> Thread2: before execute at Sat Dec 05 12:34:16 PST 2015
> Thread1: before execute at Sat Dec 05 12:34:16 PST 2015
> Thread1: col 0 1 at Sat Dec 05 12:34:26 PST 2015
> Thread2: col 0 2 at Sat Dec 05 12:34:26 PST 2015
>
>
> Does that make sense?
>
> Thanks for the great work!
>
> Prasad
>
> On Fri, Dec 4, 2015 at 6:18 PM, Jongyoul Lee <jongy...@gmail.com> wrote:
>
>> Hi Prasad,
>>
>> First of all, Thanks for testing multiple executions and documenting the
>> bug. I'll fix the bug. Concerning the concurrent executions, that
>> interpreter make a two statements from one connection. I'm not good at
>> vertica, could you please confirm supporting multi statements from one
>> connection and verify running multiple statements from one connection?
>> Moreover, I appreciate the advice which is better strategy between multiple
>> statements and multiple connections. I expect that second query is started
>> before finishing the first one as you can see in your log, In your case,
>>
>> 1. the first query started
>> 2. the second query started
>> 3. the second query ended
>> 4. the first query ended
>>
>> I, however why those queries finished after 2 minutes. I guess it's about
>> vertica's behavior of jdbc, and it doesn't seems that vertica runs multiple
>> statements in one connection. with this situation, even though two
>> statements try to execute with parallel, sequentially executes internally.
>> If it's true, I'll change to use multiple connections.
>>
>> Regards,
>> Jongyoul
>>
>> On Sat, Dec 5, 2015 at 3:25 AM, Prasad Wagle <prasadwa...@gmail.com>
>> wrote:
>>
>>> Hi Jongyoul,
>>>
>>> Thanks for merging ZEPPELIN-440. I downloaded changes from
>>> https://github.com/apache/incubator-zeppelin/pull/455 and here's what I
>>> found.
>>>
>>> When I ran two Vertica queries, I got the error below:
>>> ERROR [2015-12-04 09:18:47,153] ({pool-2-thread-3} Job.java[run]:183) -
>>> Job failed
>>> java.lang.AbstractMethodError:
>>> com.vertica.jdbc.VerticaConnectionImpl.isValid(I)Z
>>>         at
>>> org.apache.zeppelin.hive.HiveInterpreter.getConnection(HiveInterpreter.java:163)
>>>         at
>>> org.apache.zeppelin.hive.HiveInterpreter.getStatement(HiveInterpreter.java:196)
>>>         at
>>> org.apache.zeppelin.hive.HiveInterpreter.executeSql(HiveInterpreter.java:208)
>>>         at
>>> org.apache.zeppelin.hive.HiveInterpreter.interpret(HiveInterpreter.java:292)
>>>         at
>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>         at
>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>         at
>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:300)
>>>         at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>         at
>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>         at
>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>         at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>         at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>         at
>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         at
>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         at java.lang.Thread.run(Thread.java:744)
>>>
>>> When I removed the connection.isValid check, the queries ran without any
>>> errors.
>>>       if (connection.isClosed()) { // || !connection.isValid(10)) {
>>>
>>> However, I noticed that even though the jobs for the two queries (select
>>> sleep(60)) were started around the same time, they both finished 120
>>> seconds after they were started. I also looked at the sessions in Vertica
>>> and found that 'select sleep(60), 4' was started after 'select sleep(60),
>>> 3' completed. Is this behavior expected? Our goal is to prevent one long
>>> Vertica query from delaying other Vertica queries.
>>>
>>> MacBook-Pro logs $ cat
>>> zeppelin-interpreter-hive-pwagle-MacBook-Pro.local.log
>>>
>>>  INFO [2015-12-04 09:28:44,545] ({Thread-0}
>>> RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server
>>> on port 54257
>>>
>>>  INFO [2015-12-04 09:28:44,818] ({pool-1-thread-3}
>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate
>>> interpreter org.apache.zeppelin.hive.HiveInterpreter
>>>
>>>  INFO [2015-12-04 09:28:44,862] ({pool-2-thread-2}
>>> SchedulerFactory.java[jobStarted]:132) - Job
>>> remoteInterpretJob_1449250124861 started by scheduler
>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>
>>> ERROR [2015-12-04 09:28:44,865] ({pool-2-thread-2}
>>> HiveInterpreter.java[open]:128) - hive will be ignored. driver.hive and
>>> hive.url is mandatory.
>>>
>>>  INFO [2015-12-04 09:28:44,865] ({pool-2-thread-2}
>>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command:
>>> 'select sleep(60), 3'
>>>
>>>  INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3}
>>> SchedulerFactory.java[jobStarted]:132) - Job
>>> remoteInterpretJob_1449250133312 started by scheduler
>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>
>>>  INFO [2015-12-04 09:28:53,312] ({pool-2-thread-3}
>>> HiveInterpreter.java[interpret]:290) - PropertyKey: vertica, SQL command:
>>> 'select sleep(60), 4'
>>>
>>>  INFO [2015-12-04 09:30:48,970] ({pool-2-thread-3}
>>> SchedulerFactory.java[jobFinished]:138) - Job
>>> remoteInterpretJob_1449250133312 finished by scheduler
>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>
>>>  INFO [2015-12-04 09:30:48,970] ({pool-2-thread-2}
>>> SchedulerFactory.java[jobFinished]:138) - Job
>>> remoteInterpretJob_1449250124861 finished by scheduler
>>> org.apache.zeppelin.hive.HiveInterpreter1131923628
>>>
>>>
>>> Thanks for your help!
>>>
>>> Prasad
>>>
>>> On Thu, Dec 3, 2015 at 9:12 PM, Jongyoul Lee <jongy...@gmail.com> wrote:
>>>
>>>> Hi Prasad,
>>>>
>>>> PostgresSqlInterpreter doesn't seems to support parallel execution. I
>>>> merged ZEPPELIN-440 for supporting it today. Could you please use  this
>>>> interpreter? You can find docs and changes in
>>>> https://github.com/apache/incubator-zeppelin/pull/455
>>>>
>>>> Regards,
>>>> Jongyoul
>>>>
>>>> On Fri, Dec 4, 2015 at 10:32 AM, Prasad Wagle <prasadwa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> We are evaluating Zeppelin and are excited about its potential. Right
>>>>> now, we are using it to access Presto and Vertica. We downloaded
>>>>> https://github.com/apache/incubator-zeppelin/pull/68 for the Presto
>>>>> interpreter and are using PostgreSqlInterpreter to access Vertica.
>>>>>
>>>>> The main issue we are facing is related to serial execution of
>>>>> queries. Since the default scheduler is FIFOScheduler, when a Vertica 
>>>>> query
>>>>> is running in one notebook, all other Vertica queries issued from the same
>>>>> or different notes are pending.
>>>>>
>>>>> We changed the getScheduler method in PostgreSqlInterpreter to return
>>>>> ParallelScheduler.
>>>>>     return SchedulerFactory.singleton().createOrGetParallelScheduler(
>>>>>             PostgreSqlInterpreter.class.getName() + this.hashCode(),
>>>>> 2);
>>>>>
>>>>> But this resulted in the exception below. Is this expected behavior?
>>>>> Does the PostgreSqlInterpreter have to be redesigned to support parallel
>>>>> execution of queries?
>>>>>
>>>>> In zeppelin-interpreter-psql-pwagle-MacBook-Pro.local.log:
>>>>>
>>>>>  INFO [2015-12-03 10:33:35,891] ({pool-1-thread-2}
>>>>> RemoteInterpreterServer.java[createInterpreter]:168) - Instantiate
>>>>> interpreter org.apache.zeppelin.postgresql.PostgreSqlInterpreter
>>>>>  INFO [2015-12-03 10:33:35,940] ({pool-2-thread-2}
>>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>>> remoteInterpretJob_1449167615939 started by scheduler
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>  INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2}
>>>>> PostgreSqlInterpreter.java[open]:132) - Open psql connection!
>>>>>  INFO [2015-12-03 10:33:35,942] ({pool-2-thread-2}
>>>>> PostgreSqlInterpreter.java[close]:182) - Close psql connection!
>>>>>  INFO [2015-12-03 10:33:36,511] ({pool-2-thread-2}
>>>>> SqlCompleter.java[getSqlKeywordsCompletions]:144) - JDBC
>>>>> DriverName:/vertica.jar-sql.keywords
>>>>>  INFO [2015-12-03 10:33:37,779] ({pool-2-thread-3}
>>>>> SchedulerFactory.java[jobStarted]:132) - Job
>>>>> remoteInterpretJob_1449167617779 started by scheduler
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2}
>>>>> PostgreSqlInterpreter.java[open]:153) - Successfully created psql 
>>>>> connection
>>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-2}
>>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command '
>>>>> select sleep(30), 1;'
>>>>>  INFO [2015-12-03 10:33:49,352] ({pool-2-thread-3}
>>>>> PostgreSqlInterpreter.java[interpret]:288) - Run SQL command '
>>>>> select sleep(30), 2;'
>>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-3}
>>>>> PostgreSqlInterpreter.java[executeSql]:271) - Cannot run
>>>>> select sleep(30), 2;
>>>>> java.sql.SQLException: [Vertica][JDBC](12060) ResultSet object has
>>>>> been closed.
>>>>> at com.vertica.jdbc.SForwardResultSet.checkIfOpen(Unknown Source)
>>>>> at com.vertica.jdbc.SForwardResultSet.next(Unknown Source)
>>>>> at
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:235)
>>>>> at
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289)
>>>>> at
>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>>> at
>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299)
>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>> at
>>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>>> at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>> ERROR [2015-12-03 10:34:49,482] ({pool-2-thread-2} Job.java[run]:183)
>>>>> - Job failed
>>>>> org.apache.zeppelin.interpreter.InterpreterException:
>>>>> java.lang.NullPointerException
>>>>> at
>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:61)
>>>>> at
>>>>> org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
>>>>> at
>>>>> org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:299)
>>>>> at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
>>>>> at
>>>>> org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:157)
>>>>> at
>>>>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
>>>>> at
>>>>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>>> at
>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>>> at java.lang.Thread.run(Thread.java:744)
>>>>> Caused by: java.lang.NullPointerException
>>>>> at
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.executeSql(PostgreSqlInterpreter.java:262)
>>>>> at
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter.interpret(PostgreSqlInterpreter.java:289)
>>>>> at
>>>>> org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
>>>>> ... 11 more
>>>>>  INFO [2015-12-03 10:34:49,489] ({pool-2-thread-3}
>>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>>> remoteInterpretJob_1449167617779 finished by scheduler
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>  INFO [2015-12-03 10:34:49,490] ({pool-2-thread-2}
>>>>> SchedulerFactory.java[jobFinished]:138) - Job
>>>>> remoteInterpretJob_1449167615939 finished by scheduler
>>>>> org.apache.zeppelin.postgresql.PostgreSqlInterpreter761108485
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Prasad
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> 이종열, Jongyoul Lee, 李宗烈
>>>> http://madeng.net
>>>>
>>>
>>>
>>
>>
>> --
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>>
>
>
>
> --
> Prasad Wagle
> 408.476.6261  |  twitter.com/prasadwagle | prasadwa...@gmail.com |
> www.linkedin.com/in/prasadwagle
>



-- 
이종열, Jongyoul Lee, 李宗烈
http://madeng.net

Reply via email to