Boris,

A lot of those tools, especially if they are geared toward simply
querying the DB, only call executeQuery() or execute() or something
like that, and even the younger NoSQL drivers will at least implement
those (for this very reason). However, our SQL processors are much
more powerful and complex, and we are aiming for high performance, so
we use the JDBC API calls that are designed for higher throughput,
with better control over fetch size, batch size, etc. We could trade
off performance for more flexibility, but in a flow-based system that
didn't seem prudent.
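
To make that concrete, here's a rough sketch in plain JDBC (purely
illustrative, not NiFi's actual processor code). The DataSource stands
in for whatever a DBCPConnectionPool would hand out, and the class and
method names are made up for the example:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.List;
import javax.sql.DataSource;

// Illustrative sketch only (not NiFi code).
public class JdbcThroughputSketch {

    // What a simple "run a query" tool needs:
    static void simpleQuery(DataSource ds, String sql) throws Exception {
        try (Connection conn = ds.getConnection();
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            while (rs.next()) { /* consume rows */ }
        }
    }

    // The extra knobs the SQL processors lean on for throughput:
    static void tunedQueryAndBatch(DataSource ds, String selectSql,
                                   String insertSql, List<String> values)
            throws Exception {
        try (Connection conn = ds.getConnection()) {
            try (Statement st = conn.createStatement()) {
                st.setQueryTimeout(30);   // not implemented by some drivers
                st.setFetchSize(10_000);  // rows fetched per round trip
                try (ResultSet rs = st.executeQuery(selectSql)) {
                    while (rs.next()) { /* consume rows */ }
                }
            }
            try (PreparedStatement ps = conn.prepareStatement(insertSql)) {
                for (String v : values) {
                    ps.setString(1, v);
                    ps.addBatch();
                }
                ps.executeBatch();  // batching, also missing from some drivers
            }
        }
    }
}

It's those extra calls (setQueryTimeout(), setFetchSize(),
addBatch()/executeBatch()) that give us the throughput and control, and
historically not every driver implements them.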

Having said that, it would be kind of cool to have a built-in UI tool
that could leverage a DBCPConnectionPool and allow you to run various
one-off SQL statements, like a stripped-down SquirrelSQL/DBeaver :)

Regards,
Matt

On Thu, Aug 16, 2018 at 2:19 PM Boris Tyukin <[email protected]> wrote:
>
> Interesting, thanks for the explanation, Matt. I did not realize it was so 
> complicated. When we just started with Big Data and were testing our BI tools 
> (Qlik and Oracle BI) and also ETL tools (Informatica and Alteryx), we were 
> just able to use Hive and Impala ODBC drivers to connect and run simple 
> SELECT statements. I also used Oracle SQL Developer and DBeaver to connect 
> using JDBC drivers. I assumed it would be similar with NiFi.
>
> On Thu, Aug 16, 2018 at 12:38 PM Matt Burgess <[email protected]> wrote:
>>
>> Boris,
>>
>> Historically, the Hive JDBC drivers (at least around 1.2.x) did not
>> support some of the JDBC API methods called from ExecuteSQL/PutSQL,
>> namely setQueryTimeout(), executeBatch(), etc. Also, the column names
>> are returned from the metadata with the table name prepended, so we'd
>> need special logic for the Hive case, and currently we don't have to
>> specify the database type in those processors. There have been some
>> Jiras/PRs around adding Hive support for the SQL processors, but they
>> haven't yet been incorporated. The Apache Hive driver still doesn't
>> support executeBatch(), so PutSQL and PutDatabaseRecord wouldn't be
>> able to call that. Again, we'd need a Hive database adapter and would
>> ask the specified adapter whether batching is supported, then go down
>> some other logic route if not. It should be doable, but it just hasn't
>> been finished yet.
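>>
>> To illustrate the kind of special-casing a Hive adapter would need,
>> here's a hypothetical sketch (made-up names, not code that exists in
>> NiFi today):
>>
>> import java.sql.ResultSetMetaData;
>> import java.sql.SQLException;
>>
>> // Hypothetical sketch, illustrative names only (not existing NiFi code).
>> public class HiveAdapterSketch {
>>
>>     // Hive's metadata can report labels like "mytable.col1"; a Hive-aware
>>     // adapter would strip the table prefix before using the column name.
>>     static String plainColumnName(ResultSetMetaData meta, int index)
>>             throws SQLException {
>>         String label = meta.getColumnLabel(index);
>>         int dot = label.indexOf('.');
>>         return dot >= 0 ? label.substring(dot + 1) : label;
>>     }
>>
>>     // A database adapter could also advertise whether batching is
>>     // supported, so PutSQL could fall back to statement-by-statement
>>     // execution when it isn't.
>>     interface DatabaseAdapter {
>>         boolean supportsBatching();
>>     }
>> }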
>>
>> Regards,
>> Matt
>>
>> On Thu, Aug 16, 2018 at 12:14 PM Boris Tyukin <[email protected]> wrote:
>> >
>> > Awesome, passing the property on the Impala connection string did the
>> > trick! Thanks, Ed!
>> >
>> > Does it still make sense to open a Jira to support multiple statements
>> > for non-Hive processors?
>> >
>> > Totally separate subject, since you mentioned Hive - is there a reason
>> > why I cannot connect to Hive and run basic SQL using ExecuteSQL/PutSQL?
>> > Not sure why dedicated processors exist just for Hive. Does it not work
>> > over a generic DBCP connection pool? We did some research yesterday when
>> > we were looking for options on how to run multiple queries, but because
>> > we are on CDH, the Hive jars included with NiFi do not work for us. It
>> > looks like we can compile a CDH-specific NiFi, but I am wondering why
>> > bother if we can just access Hive using the generic pool.
>> >
>> > On Thu, Aug 16, 2018 at 11:42 AM Ed B <[email protected]> wrote:
>> >>
>> >> Hi Boris,
>> >> True, multi-statement support has been added to SelectHiveQL, but not
>> >> to ExecuteSQL/PutSQL.
>> >> As a workaround, you could try specifying config params on the JDBC
>> >> connection string when you define the controller service:
>> >>
>> >> jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;...
>> >>
>> >> If you can use the Hive driver to access Impala:
>> >>  
>> >> jdbc:hive2://host:port/;principal=principal_name;Property1=Value;Property2=Value;...
>> >>
>> >> Some params will only work after the "?" in the URL:
>> >>  
>> >> jdbc:hive2://host:port/;principal=principal_name?Property1=Value;Property2=Value;...
>> >>
>> >> But if you really can use the Hive drivers, then you can use PutHiveQL
>> >> and SelectHiveQL. Both support multiple statements.
>> >>
>> >> Regards.
>> >> Ed.
>> >>
>> >> On Wed, Aug 15, 2018 at 11:11 AM Boris Tyukin <[email protected]> 
>> >> wrote:
>> >>>
>> >>> Hi guys,
>> >>>
>> >>> I need to issue a query like the one below on Impala. It works fine
>> >>> from impala-shell, but NiFi does not seem to like multiple statements
>> >>> like that.
>> >>>
>> >>> set max_row_size=7mb;
>> >>>
>> >>> create table blabla as
>> >>> select blabla
>> >>> from blablabla;
>> >>>
>> >>>
>> >>> I thought it was addressed in 1.7, but I got it confused with the Hive
>> >>> processors and this Jira:
>> >>> https://issues.apache.org/jira/browse/NIFI-5044
>> >>>
>> >>> Is there something already in the works to address it, or should I
>> >>> open a new Jira?
>> >>>
>> >>> For now, I am going to use a Groovy script, but if anyone knows a
>> >>> better workaround, please let me know.
>> >>>
>> >>> Boris
