Re: ExecuteSQL to support multiple statements

Boris Tyukin Mon, 20 Aug 2018 07:48:02 -0700

thanks for the explanation, Matt.

On Mon, Aug 20, 2018 at 9:19 AM Matt Burgess <[email protected]> wrote:


> Boris,
>
> A lot of those tools, especially if they are geared for simply
> querying the DB, only call executeQuery() or execute() or something
> like that, and even the younger NoSQL drivers will at least implement
> those (for this very reason). However our SQL processors are much more
> powerful and complex, plus we are aiming for high performance, so we
> use the JDBC API calls that are designed for higher throughput, with
> better control of fetch size, batch size, etc.  We could trade off
> performance for more flexibility, but in a flow-based system it didn't
> seem prudent.
>
> Having said that, it would be kind of cool to have a built-in UI tool
> that could leverage a DBCPConnectionPool and allow you to run various
> one-off SQL statements, like a stripped-down built-in
> SquirrelSQL/DBeaver :)
>
> Regards,
> Matt
>
> On Thu, Aug 16, 2018 at 2:19 PM Boris Tyukin <[email protected]>
> wrote:
> >
> > Interesting, thanks for the explanation, Matt. I did not realize it was
> so complicated. When we just started with Big Data and were testing our BI
> tools (Qlik and Oracle BI) and also ETL tools (Informatica and Alteryx), we
> were just able to use Hive and Impala ODBC drivers to connect and run
> simple SELECT statements. I also used Oracle SQL Developer and DBeaver to
> connect using jdbc drivers. I assumed it would be similar with NiFi.
> >
> > On Thu, Aug 16, 2018 at 12:38 PM Matt Burgess <[email protected]>
> wrote:
> >>
> >> Boris,
> >>
> >> Historically the Hive JDBC drivers (at least around 1.2.x) did not
> >> support some of the JDBC API methods called from ExecuteSQL/PutSQL,
> >> namely setQueryTimeout(), executeBatch(), etc. Also the column names
> >> are returned from the metadata with the table name prepended, so we'd
> >> need special logic for the Hive case, and currently we don't have to
> >> specify the database type in those processors. There have been some
> >> Jiras/PRs around adding Hive support for the SQL processors, but they
> >> haven't yet been incorporated. The Apache Hive driver still doesn't
> >> support executeBatch(), so PutSQL and PutDatabaseRecord wouldn't be
> >> able to call that. Again we'd need a Hive database adapter and would
> >> ask the specified adapter if batching is supported, then go along some
> >> other logic route if not. It should be doable, but just hasn't been
> >> finished yet.
> >>
> >> Regards,
> >> Matt
> >>
> >> On Thu, Aug 16, 2018 at 12:14 PM Boris Tyukin <[email protected]>
> wrote:
> >> >
> >> > awesome, passing property to impala connection string did the trick!
> thanks Ed!
> >> >
> >> > Does it still make sense to open Jira to support multiple statements
> for non-Hive processors?
> >> >
> >> > Totally separate subject, since you mentioned Hive - is there a
> reason why I cannot connect to Hive and run basic SQL using
> ExecuteSQL/PutSQL? Not sure why dedicated processors exist just for Hive.
> Does it not work over generic DBCP connection pool? We've done some
> research yesterday when we were looking for options on how to run multiple
> queries but because we are on CDH, Hive jars included with NiFi do not work
> for us. Looks like we can compile CDH specific NiFi but I am wondering why
> bother if we can just access Hive using generic pool
> >> >
> >> > On Thu, Aug 16, 2018 at 11:42 AM Ed B <[email protected]> wrote:
> >> >>
> >> >> Hi Boris,
> >> >> True, multi-statements support has been added to SelectHiveQL, but
> not to ExecuteSQL/PutSQL.
> >> >> As a workaround you could try to specify config params on JDBC
> connection string when you define controller service:
> >> >>
> >> >> jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;...
> >> >>
> >> >> If you can use Hive driver to access Impala:
> >> >>
> jdbc:hive2://host:port/;principal=principal_name;Property1=Value;Property2=Value;...
> >> >>
> >> >> Some params will be working after "?" in URL:
> >> >>
> jdbc:hive2://host:port/;principal=principal_name?Property1=Value;Property2=Value;...
> >> >>
> >> >> But, if you really can use Hive drivers, then you can use PutHiveQL
> and ExecuteHiveQL. Both support multi-statements.
> >> >>
> >> >> Regards.
> >> >> Ed.
> >> >>
> >> >> On Wed, Aug 15, 2018 at 11:11 AM Boris Tyukin <[email protected]>
> wrote:
> >> >>>
> >> >>> Hi guys,
> >> >>>
> >> >>> I need to issue a query like below on Impala. it works fine from
> impala-shell but NiFi seems not to like multiple statements like that.
> >> >>>
> >> >>> set max_row_size=7mb;
> >> >>>
> >> >>> create table blabla as
> >> >>> select blabla
> >> >>> from blablabla;
> >> >>>
> >> >>>
> >> >>> I thought it was addressed in 1.7 but I got it confused with Hive
> processors and this Jira
> >> >>> https://issues.apache.org/jira/browse/NIFI-5044
> >> >>>
> >> >>> Is there something in work already to address it or should I open a
> new Jira?
> >> >>>
> >> >>> For now, I am going to use Groovy script but if anyone knows a
> better workaround, please let me know.
> >> >>>
> >> >>> Boris
>

Re: ExecuteSQL to support multiple statements

Reply via email to