Interesting, thanks for the explanation, Matt. I did not realize it was so complicated. When we just started with Big Data and were testing our BI tools (Qlik and Oracle BI) and also ETL tools (Informatica and Alteryx), we were just able to use Hive and Impala ODBC drivers to connect and run simple SELECT statements. I also used Oracle SQL Developer and DBeaver to connect using jdbc drivers. I assumed it would be similar with NiFi.
On Thu, Aug 16, 2018 at 12:38 PM Matt Burgess <[email protected]> wrote: > Boris, > > Historically the Hive JDBC drivers (at least around 1.2.x) did not > support some of the JDBC API methods called from ExecuteSQL/PutSQL, > namely setQueryTimeout(), executeBatch(), etc. Also the column names > are returned from the metadata with the table name prepended, so we'd > need special logic for the Hive case, and currently we don't have to > specify the database type in those processors. There have been some > Jiras/PRs around adding Hive support for the SQL processors, but they > haven't yet been incorporated. The Apache Hive driver still doesn't > support executeBatch(), so PutSQL and PutDatabaseRecord wouldn't be > able to call that. Again we'd need a Hive database adapter and would > ask the specified adapter if batching is supported, then go along some > other logic route if not. It should be doable, but just hasn't been > finished yet. > > Regards, > Matt > > On Thu, Aug 16, 2018 at 12:14 PM Boris Tyukin <[email protected]> > wrote: > > > > awesome, passing property to impala connection string did the trick! > thanks Ed! > > > > Does it still make sense to open Jira to support multiple statements for > non-Hive processors? > > > > Totally separate subject, since you mentioned Hive - is there a reason > why I cannot connect to Hive and run basic SQL using ExecuteSQL/PutSQL? Not > sure why dedicated processors exist just for Hive. Does it not work over > generic DBCP connection pool? We've done some research yesterday when we > were looking for options on how to run multiple queries but because we are > on CDH, Hive jars included with NiFi do not work for us. Looks like we can > compile CDH specific NiFi but I am wondering why bother if we can just > access Hive using generic pool > > > > On Thu, Aug 16, 2018 at 11:42 AM Ed B <[email protected]> wrote: > >> > >> Hi Boris, > >> True, multi-statements support has been added to SelectHiveQL, but not > to ExecuteSQL/PutSQL. > >> As a workaround you could try to specify config params on JDBC > connection string when you define controller service: > >> > >> jdbc:impala://Host:Port[/Schema];Property1=Value;Property2=Value;... > >> > >> If you can use Hive driver to access Impala: > >> > jdbc:hive2://host:port/;principal=principal_name;Property1=Value;Property2=Value;... > >> > >> Some params will be working after "?" in URL: > >> > jdbc:hive2://host:port/;principal=principal_name?Property1=Value;Property2=Value;... > >> > >> But, if you really can use Hive drivers, then you can use PutHiveQL and > ExecuteHiveQL. Both support multi-statements. > >> > >> Regards. > >> Ed. > >> > >> On Wed, Aug 15, 2018 at 11:11 AM Boris Tyukin <[email protected]> > wrote: > >>> > >>> Hi guys, > >>> > >>> I need to issue a query like below on Impala. it works fine from > impala-shell but NiFi seems not to like multiple statements like that. > >>> > >>> set max_row_size=7mb; > >>> > >>> create table blabla as > >>> select blabla > >>> from blablabla; > >>> > >>> > >>> I thought it was addressed in 1.7 but I got it confused with Hive > processors and this Jira > >>> https://issues.apache.org/jira/browse/NIFI-5044 > >>> > >>> Is there something in work already to address it or should I open a > new Jira? > >>> > >>> For now, I am going to use Groovy script but if anyone knows a better > workaround, please let me know. > >>> > >>> Boris >
