Re: HiveRunner 3.2.1 released
This is awesome news! Well done!

/Pelle

On Thu, 31 May 2018 at 19:48, Mass Dosage wrote:
> We are pleased to announce the 3.2.1 release of HiveRunner
> <https://github.com/klarna/HiveRunner> (an open source framework for
> testing Hive queries using JUnit). The changes in this release are:
>
> - Fixed an issue where column names that differed only in case between a
>   data file and the table definition were treated as different columns #73
>   <https://github.com/klarna/HiveRunner/issues/73>.
> - Changed the way writable permissions are set on the JUnit temporary
>   folder, to make it compatible with Windows #63
>   <https://github.com/klarna/HiveRunner/issues/63>.
>
> The binary artifacts for the release are available in Maven Central
> <http://repo1.maven.org/maven2/com/klarna/hiverunner/3.2.1/>.
>
> We encourage everyone to upgrade to this new version.
>
> Thanks,
>
> Adrian
> (on behalf of the HiveRunner committers)

--
*Per Ullberg*
Datavault Tech Lead Odin (Uppsala)
Klarna Bank AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com
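For anyone upgrading, the release announced above can be pulled in from Maven Central with a test-scoped dependency along these lines (coordinates taken from the Maven Central link in the announcement; verify against your own build):

```xml
<!-- HiveRunner is a JUnit test framework, so test scope is appropriate. -->
<dependency>
  <groupId>com.klarna</groupId>
  <artifactId>hiverunner</artifactId>
  <version>3.2.1</version>
  <scope>test</scope>
</dependency>
```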
Re: Is 'application' a reserved word?
Make sure to backtick all table and column names and you won't get a problem
when upgrading. We also try to include keywords in our automated integration
tests, to find places where we've missed the backticks.

/Pelle

On Wed, 30 May 2018 at 20:07, Matt Burgess wrote:
> Edward,
>
> I hear that; "application" is a bit unfortunate as a reserved word as
> well. I wonder how many folks have data sets with a column named
> "application". We have that field in provenance data in NiFi; I
> discovered it was a reserved word when trying to create a flow to put
> NiFi provenance data into Hive for analysis.
>
> Regards,
> Matt
>
> On Wed, May 30, 2018 at 2:04 PM, Edward Capriolo wrote:
>> We got bit pretty hard when "exchange partitions" was added. How many
>> people in ad-tech work with exchanges? Everyone!
>>
>> On Wed, May 30, 2018 at 1:38 PM, Alan Gates wrote:
>>> It is. You can see the definitive list of keywords at
>>> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
>>> (Note this is for the master branch; you can switch the branch around to
>>> find the list for a particular release.) It would be good to file a JIRA
>>> on this so we fix the documentation.
>>>
>>> Alan.
>>>
>>> On Wed, May 30, 2018 at 7:48 AM Matt Burgess wrote:
>>>> I tried the following simple statement in beeline (Hive 3.0.0):
>>>>
>>>> create table app (application STRING);
>>>>
>>>> And got the following error:
>>>>
>>>> Error: Error while compiling statement: FAILED: ParseException line
>>>> 1:18 cannot recognize input near 'application' 'STRING' ')' in column
>>>> name or constraint (state=42000,code=4)
>>>>
>>>> I checked the wiki [1] but didn't see 'application' on the list of
>>>> reserved words. However, if I change the column name to anything else
>>>> (even 'applicatio') it works. Can someone confirm whether this is a
>>>> reserved word?
>>>>
>>>> Thanks in advance,
>>>> Matt
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords
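Matt's failing statement above works once the reserved column name is backticked, as Pelle suggests. A minimal illustration using Hive's quoted-identifier syntax:

```sql
-- Fails in Hive 3.0.0, where 'application' is a reserved word:
-- CREATE TABLE app (application STRING);

-- Works: backticks let reserved words be used as identifiers.
CREATE TABLE `app` (`application` STRING);
```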
Re: Unable to retrieve table metadata from hcatalog
And you're sure that you have a database and table in Hive matching
*oraclehadoop.bigtab*?

/Pelle

On Sat 13 May 2017 at 01:08 Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> I am using GoldenGate to send data in real time from an Oracle table to
> Hive.
>
> It tries to read the table metadata using HCat but fails as below:
>
> *ERROR 2017-05-12 22:45:52,700 [main] Unable to retrieve table matadata.
> Table : oraclehadoop.bigtab*
>
> *org.apache.hive.hcatalog.common.HCatException : 9001 : Exception occurred
> while processing HCat request : NoSuchObjectException while fetching
> table.. Cause : NoSuchObjectException(message:oraclehadoop.bigtab table not
> found)*
> at org.apache.hive.hcatalog.api.HCatClientHMSImpl.getTable(HCatClientHMSImpl.java:175)
> at oracle.goldengate.mdp.hive.HiveMetaDataProvider.resolve(HiveMetaDataProvider.java:91)
> at oracle.goldengate.datasource.metadata.provider.TargetMetaDataStore.retrieveMetaData(TargetMetaDataStore.java:73)
> at oracle.goldengate.datasource.UserExitDataSource.getMetaData(UserExitDataSource.java:2133)
> Caused by: NoSuchObjectException(message:oraclehadoop.bigtab table not found)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1885)
> at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1838)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:138)
> at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy13.get_table(Unknown Source)
> at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1228)
> at org.apache.hive.hcatalog.api.HCatClientHMSImpl.getTable(HCatClientHMSImpl.java:168)
>
> I am not that familiar with HCat. Any help will be appreciated.
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
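Before digging into the GoldenGate side, it is worth confirming from Hive itself that the database and table exist exactly as HCat will look them up, for example:

```sql
-- Check that the database and table HCat is asked for actually exist.
SHOW DATABASES LIKE 'oraclehadoop';
USE oraclehadoop;
SHOW TABLES LIKE 'bigtab';

-- If the table exists, this shows its location, SerDe, and column metadata.
DESCRIBE FORMATTED oraclehadoop.bigtab;
```

If these return nothing, the NoSuchObjectException is coming from the metastore rather than from GoldenGate's configuration.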
Re: Interrogating a uniontype
Could you write a UDF that parses it and returns a JSON object? From there
you can use the standard JSON support in Hive. I did something similar for
Erlang structs about 3 years ago. I actually kept them on file and wrote a
SerDe that exposed them as JSON objects.

regards
/Pelle

On Wed, Nov 23, 2016 at 6:40 PM, Elliot West <tea...@gmail.com> wrote:
> Ah, I see that this can't be done with an array as there is no type common
> to all union indexes. Perhaps a struct with one field per indexed type?
>
> On Wed, 23 Nov 2016 at 17:29, Elliot West <tea...@gmail.com> wrote:
>> Can anyone recommend a good approach for interrogating uniontype values
>> in HQL? I note that the documentation states that support for such
>> types is limited to 'look-at-only', which I take to mean that I may only
>> dump out the value in its entirety and cannot extract sub-elements. Using
>> the example below, is there any way I can get to field 'b' of union index
>> 3 to extract only the value "five"?
>>
>> {0:1}
>> {1:2.0}
>> {2:["three","four"]}
>> {3:{"a":5,"b":"five"}}
>>
>> If not possible with HQL, would it be possible to implement a UDF that
>> can explode the type into something more navigable, like an array,
>> struct, or map?
>>
>> Example when exploded as an array:
>>
>> [1,null,null,null]
>> [null,2.0,null,null]
>> [null,null,["three","four"],null]
>> [null,null,null,{"a":5,"b":"five"}]
>>
>> Has anyone done this?
>>
>> Thanks,
>>
>> Elliot.
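Pelle's UDF-to-JSON suggestion could look something like the sketch below. Here `union_to_json` is a hypothetical user-written UDF, not part of Hive; it would serialize a uniontype value into a JSON string, after which Hive's built-in `get_json_object` can navigate it:

```sql
-- union_to_json is hypothetical: a user-written UDF that renders a
-- uniontype value such as {3:{"a":5,"b":"five"}} as a plain JSON string.
ADD JAR /path/to/union-udfs.jar;
CREATE TEMPORARY FUNCTION union_to_json AS 'com.example.UnionToJsonUDF';

-- Hive's built-in get_json_object can then pick out sub-elements,
-- e.g. field 'b' of union index 3.
SELECT get_json_object(union_to_json(u), '$.b') FROM union_table;
```

The jar path, class name, and table name are all placeholders for illustration.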
Re: Connect metadata
Sorry, no help from me I'm afraid, but I have a similar question: could
someone provide me with a code snippet (preferably Java) that installs the
schema (through DataNucleus) on my empty metastore (Postgres)? I don't want
Hive to install the schema at startup, but rather do it explicitly myself.

regards
/Pelle

On Tue, Oct 25, 2016 at 9:09 AM, Rajendra Bhat <rajhalk...@gmail.com> wrote:
> Hi Team,
>
> I have configured only the metastore and started the metastore service,
> which I am using from Presto.
>
> I need to create a table in the metastore. How can I do that? I have not
> started the HiveServer service, because Hadoop is not installed on my
> system.
>
> --
> Thanks and Regards
>
> Rajendra Bhat
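For what it's worth, Hive ships a `schematool` command that initializes the metastore schema explicitly rather than at startup. A Java/DataNucleus route aside, the usual approach is along these lines (credentials and paths are placeholders; the JDBC connection details come from hive-site.xml):

```
# Initialize an empty Postgres metastore schema explicitly,
# instead of letting Hive create it at startup.
$HIVE_HOME/bin/schematool -dbType postgres -initSchema \
    -userName hiveuser -passWord hivepass

# Later, inspect the installed schema version:
$HIVE_HOME/bin/schematool -dbType postgres -info
```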
Re: Hive metadata on Hbase
What version of Hive are you running?

/Pelle

On Monday, October 24, 2016, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> @Per
>
> We run a fully transaction-enabled Hive meta DB on an Oracle DB.
>
> I don't have statistics now but will collect them from AWR reports, no
> problem.
>
> @Jörn,
>
> The primary reason Oracle was chosen is that the company has global
> licenses for Oracle + MSSQL + SAP, and they are classified as Enterprise
> Grade databases.
>
> None of MySQL and the others are classified as such, so they cannot be
> deployed in production.
>
> Besides, for us having the Hive metadata on Oracle makes sense, as our
> infrastructure does all the support, HA etc. for it, and they have trained
> DBAs to look after it 24x7.
>
> Admittedly we are now relying on HDFS itself, plus HBase as well, for
> persistent storage. So the situation might change.
>
> HTH
>
> Dr Mich Talebzadeh
>
> On 24 October 2016 at 06:46, Per Ullberg <per.ullb...@klarna.com> wrote:
>> I thought the main gain was to get ACID on Hive performant enough.
>>
>> @Mich: Do you run with ACID-enabled tables? How many
>> Create/Update/Deletes do you do per second?
>>
>> best regards
>> /Pelle
>>
>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>> I think the main gain is more about getting rid of a dedicated database,
>>> including maintenance and potential license cost.
>>> For really large clusters and a lot of users this might be even more
>>> beneficial. You can avoid clustering the database etc.
>>>
>>> On 24 Oct 2016, at 00:46, Mich Talebzadeh <mich.talebza...@gmail.com>
>>> wrote:
>>>
>>> A while back there were some notes on having the Hive metastore on HBase
>>> as opposed to conventional RDBMSs.
>>>
>>> I am currently involved in some hefty work with HBase and Phoenix for
>>> batch ingestion of trade data. As long as you define your HBase table
>>> through Phoenix, and with secondary Phoenix indexes on HBase, the speed
>>> is impressive.
>>>
>>> I am not sure how much having HBase as the Hive metastore is going to
>>> add to Hive performance. We use Oracle 12c as the Hive metastore, and
>>> the Hive database/schema is built on solid state disks. We have never
>>> had any issues with locks and concurrency.
>>>
>>> Therefore I am not sure what one is going to gain by having HBase as the
>>> Hive metastore? I trust that we can still use our existing schemas on
>>> Oracle.
>>>
>>> HTH
>>>
>>> Dr Mich Talebzadeh
Re: Hive metadata on Hbase
I thought the main gain was to get ACID on Hive performant enough.

@Mich: Do you run with ACID-enabled tables? How many Create/Update/Deletes
do you do per second?

best regards
/Pelle

On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke <jornfra...@gmail.com> wrote:
> I think the main gain is more about getting rid of a dedicated database,
> including maintenance and potential license cost.
> For really large clusters and a lot of users this might be even more
> beneficial. You can avoid clustering the database etc.
>
> On 24 Oct 2016, at 00:46, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
> A while back there were some notes on having the Hive metastore on HBase
> as opposed to conventional RDBMSs.
>
> I am currently involved in some hefty work with HBase and Phoenix for
> batch ingestion of trade data. As long as you define your HBase table
> through Phoenix, and with secondary Phoenix indexes on HBase, the speed
> is impressive.
>
> I am not sure how much having HBase as the Hive metastore is going to add
> to Hive performance. We use Oracle 12c as the Hive metastore, and the Hive
> database/schema is built on solid state disks. We have never had any
> issues with locks and concurrency.
>
> Therefore I am not sure what one is going to gain by having HBase as the
> Hive metastore? I trust that we can still use our existing schemas on
> Oracle.
>
> HTH
>
> Dr Mich Talebzadeh
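For reference, ACID tables of the kind Pelle asks about require transaction support to be switched on against the metastore. A typical minimal hive-site.xml fragment (property names from Hive's transaction support; values illustrative) looks like:

```xml
<!-- Enable the DbTxnManager so Hive can run ACID transactions. -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<!-- Run compactions from this metastore instance. -->
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```

The transaction and lock state behind these settings lives in the metastore database, which is why its write throughput matters for the ACID question above.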
Re: Unit testing macros
We ran into the same issue with Oozie Hive actions not sharing sessions. We
went for the concatenation approach: we have a Java action before the Hive
action that reads all HQL files from a specification and concatenates them
into one long HQL script. We also add some sugar at this point by setting
some custom Hive variables (like NOMINAL_TIMESTAMP, which allows users to
avoid the current_timestamp UDF in their scripts, etc.).

best regards
/Pelle

On Fri, Sep 30, 2016 at 12:12 PM, Staņislavs Rogozins
<stanislavs.rogoz...@gmail.com> wrote:
>> Have you taken a look at https://github.com/klarna/HiveRunner?
>
> I assume you're referring to this:
> https://github.com/klarna/HiveRunner/blob/ef14a1c181be863cec2278aeb732d9e21c38a2b3/src/test/java/com/klarna/hiverunner/MacroTest.java
>
> It appears to execute the macro creation statement as part of the session,
> using the Hive library's CLIService class, so doing the same during normal
> execution would require the application itself to be implemented in Java.
>
> On Fri, Sep 30, 2016 at 12:45 PM, Elliot West <tea...@gmail.com> wrote:
>> Hi,
>>
>> You can achieve this by storing the macro definition in a separate HQL
>> file and 'importing' it as needed. Unfortunately such imports are
>> interpreted by your Hive client, and the relevant command varies between
>> client implementations: '!run' in Beeline and 'SOURCE' in the Hive CLI. I
>> raised a proposal to create a unified command that is compatible across
>> clients, but this has yet to gain any traction:
>> https://issues.apache.org/jira/browse/HIVE-12703
>>
>> Elliot.
>
> Actually, I'm developing an Oozie workflow that executes HQL scripts as
> actions. It doesn't look like Oozie provides support for such features.
> So it would seem that my current options are either to use a shell action
> instead, or to do some magic with concatenation of scripts before the main
> action's execution.
> A first-class command like that sure would be nice.
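The variable-injection "sugar" Pelle describes can also be reproduced directly with Hive's `--hivevar` mechanism. A sketch, with a nominal-timestamp variable (variable, table, and column names are illustrative):

```sql
-- Invoked as, e.g.:
--   hive --hivevar NOMINAL_TIMESTAMP='2016-09-30 00:00:00' -f job.hql
-- The script then pins its notion of "now" to the injected value
-- instead of calling current_timestamp, which keeps runs reproducible
-- and makes the script testable with a fixed timestamp:
SELECT *
FROM events
WHERE event_time <= '${hivevar:NOMINAL_TIMESTAMP}';
```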
Re: Unit testing macros
Have you taken a look at https://github.com/klarna/HiveRunner?

regards
/Pelle

On Fri, Sep 30, 2016 at 11:23 AM, Staņislavs Rogozins
<stanislavs.rogoz...@gmail.com> wrote:
> The Unit Testing wiki page
> <https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL>
> suggests using macros to 'extract and reuse the expressions applied to
> columns' and says that they can be 'readily isolated for testing'.
> However, as far as I'm aware, right now only temporary macros can be
> created, which stop existing outside of the session where they were
> defined
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/DropMacro),
> thus making it necessary to include the macro in every HQL script that
> uses it. How is it possible to isolate them for testing?
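For concreteness, the temporary macros the wiki page refers to are created and dropped like this (the example macro is illustrative). Because they are session-scoped, the CREATE statement has to travel with every script that uses the macro, which is what makes isolating them awkward:

```sql
-- Session-scoped: the macro exists only until the session ends.
CREATE TEMPORARY MACRO clean_str(s STRING)
  TRIM(LOWER(COALESCE(s, '')));

SELECT clean_str(name) FROM customers;

DROP TEMPORARY MACRO clean_str;
```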
Re: Query consuming all resources
What Jörn said. We use the Capacity Scheduler to be able to give priority to
some user groups over others.

Regards
/Pelle

On Wednesday, September 28, 2016, Jörn Franke <jornfra...@gmail.com> wrote:
> You need to configure queues in YARN and use the Fair Scheduler. From your
> use case it looks like you also need to configure preemption.
>
> On 28 Sep 2016, at 00:52, Jose Rozanec <jose.roza...@mercadolibre.com>
> wrote:
>> Hi,
>>
>> We have a Hive cluster. We notice that some queries consume all
>> resources, which is not desirable to us, since we want to grant some
>> degree of parallelism to incoming ones: any incoming query should be
>> able to make at least some progress, not just wait for the big one to
>> finish.
>>
>> Is there a way to do so? We use Hive 2.1.0 with the Tez engine.
>>
>> Thank you in advance,
>>
>> Joze.
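A minimal capacity-scheduler.xml along the lines Pelle describes might split the cluster between two queues so that no single query can take everything (queue names and percentages are illustrative):

```xml
<!-- Two queues: long-running ETL gets 70% guaranteed capacity,
     ad-hoc queries get 30%, so big jobs cannot starve the rest. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>etl,adhoc</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.etl.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
<!-- Cap the ETL queue at 90% even when the cluster is otherwise idle,
     so some headroom always remains for incoming queries. -->
<property>
  <name>yarn.scheduler.capacity.root.etl.maximum-capacity</name>
  <value>90</value>
</property>
```

Queries are then routed with `set tez.queue.name=adhoc;` (or the equivalent job-submission setting), and elastic capacity plus preemption handle the case Jose describes.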