thanks for the explanation, Matt. On Mon, Aug 20, 2018 at 9:19 AM Matt Burgess <[email protected]> wrote:
> Boris, > > A lot of those tools, especially if they are geared for simply > querying the DB, only call executeQuery() or execute() or something > like that, and even the younger NoSQL drivers will at least implement > those (for this very reason). However our SQL processors are much more > powerful and complex, plus we are aiming for high performance, so we > use the JDBC API calls that are designed for higher throughput, with > better control of fetch size, batch size, etc. We could trade off > performance for more flexibility, but in a flow-based system it didn't > seem prudent. > > Having said that, it would be kind of cool to have a built-in UI tool > that could leverage a DBCPConnectionPool and allow you to run various > one-off SQL statements, like a stripped-down built-in > SquirrelSQL/DBeaver :) > > Regards, > Matt > > On Thu, Aug 16, 2018 at 2:19 PM Boris Tyukin <[email protected]> > wrote: > > > > Interesting, thanks for the explanation, Matt. I did not realize it was > so complicated. When we just started with Big Data and were testing our BI > tools (Qlik and Oracle BI) and also ETL tools (Informatica and Alteryx), we > were just able to use Hive and Impala ODBC drivers to connect and run > simple SELECT statements. I also used Oracle SQL Developer and DBeaver to > connect using jdbc drivers. I assumed it would be similar with NiFi. > > > > On Thu, Aug 16, 2018 at 12:38 PM Matt Burgess <[email protected]> > wrote: > >> > >> Boris, > >> > >> Historically the Hive JDBC drivers (at least around 1.2.x) did not > >> support some of the JDBC API methods called from ExecuteSQL/PutSQL, > >> namely setQueryTimeout(), executeBatch(), etc. Also the column names > >> are returned from the metadata with the table name prepended, so we'd > >> need special logic for the Hive case, and currently we don't have to > >> specify the database type in those processors. There have been some > >> Jiras/PRs around adding Hive support for the SQL processors, but they > >> haven't yet been incorporated. The Apache Hive driver still doesn't > >> support executeBatch(), so PutSQL and PutDatabaseRecord wouldn't be > >> able to call that. Again we'd need a Hive database adapter and would > >> ask the specified adapter if batching is supported, then go along some > >> other logic route if not. It should be doable, but just hasn't been > >> finished yet. > >> > >> Regards, > >> Matt > >> > >> On Thu, Aug 16, 2018 at 12:14 PM Boris Tyukin <[email protected]> > wrote: > >> > > >> > awesome, passing property to impala connection string did the trick! > thanks Ed! > >> > > >> > Does it still make sense to open Jira to support multiple statements > for non-Hive processors? > >> > > >> > Totally separate subject, since you mentioned Hive - is there a > reason why I cannot connect to Hive and run basic SQL using > ExecuteSQL/PutSQL? Not sure why dedicated processors exist just for Hive. > Does it not work over generic DBCP connection pool? We've done some > research yesterday when we were looking for options on how to run multiple > queries but because we are on CDH, Hive jars included with NiFi do not work > for us. Looks like we can compile CDH specific NiFi but I am wondering why > bother if we can just access Hive using generic pool > >> > > >> > On Thu, Aug 16, 2018 at 11:42 AM Ed B <[email protected]> wrote: > >> >> > >> >> Hi Boris, > >> >> True, multi-statements support has been added to SelectHiveQL, but > not to ExecuteSQL/PutSQL. > >> >> As a workaround you could try to specify config params on JDBC > connection string when you define controller service: > >> >> > >> >> jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;... > >> >> > >> >> If you can use Hive driver to access Impala: > >> >> > jdbc:hive2://host:port/;principal=principal_name;Property1=Value;Property2=Value;... > >> >> > >> >> Some params will be working after "?" in URL: > >> >> > jdbc:hive2://host:port/;principal=principal_name?Property1=Value;Property2=Value;... > >> >> > >> >> But, if you really can use Hive drivers, then you can use PutHiveQL > and ExecuteHiveQL. Both support multi-statements. > >> >> > >> >> Regards. > >> >> Ed. > >> >> > >> >> On Wed, Aug 15, 2018 at 11:11 AM Boris Tyukin <[email protected]> > wrote: > >> >>> > >> >>> Hi guys, > >> >>> > >> >>> I need to issue a query like below on Impala. it works fine from > impala-shell but NiFi seems not to like multiple statements like that. > >> >>> > >> >>> set max_row_size=7mb; > >> >>> > >> >>> create table blabla as > >> >>> select blabla > >> >>> from blablabla; > >> >>> > >> >>> > >> >>> I thought it was addressed in 1.7 but I got it confused with Hive > processors and this Jira > >> >>> https://issues.apache.org/jira/browse/NIFI-5044 > >> >>> > >> >>> Is there something in work already to address it or should I open a > new Jira? > >> >>> > >> >>> For now, I am going to use Groovy script but if anyone knows a > better workaround, please let me know. > >> >>> > >> >>> Boris >
