Re: HiveRunner 3.2.1 released

2018-05-31 Thread Per Ullberg
This is awesome news! Well Done!

/Pelle

On Thu, 31 May 2018 at 19:48, Mass Dosage  wrote:

> We are pleased to announce the 3.2.1 release of HiveRunner
> <https://github.com/klarna/HiveRunner> (an open source framework for
> testing Hive queries using JUnit). The changes in this release are:
>
>
>
>- Fixed an issue where a column name whose case in a data file differed
>from its case in the table definition was treated as a different column
>#73 <https://github.com/klarna/HiveRunner/issues/73>.
>
>
>- Changed the way writable permissions are set on the JUnit temporary
>folder, to make it compatible with Windows #63
><https://github.com/klarna/HiveRunner/issues/63>.
>
> The binary artifacts for the release are available in Maven Central
> <http://repo1.maven.org/maven2/com/klarna/hiverunner/3.2.1/>.
>
> We encourage everyone to upgrade to this new version.
>
> Thanks,
>
> Adrian
> (on behalf of the HiveRunner committers)
>
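For anyone who hasn't tried the framework, a minimal HiveRunner test looks
roughly like the following. This is a sketch following the project's README
for the 3.x line (double-check method names against the version you use);
the database, table, and row data are invented for illustration.

    import static org.junit.Assert.assertEquals;

    import com.klarna.hiverunner.HiveShell;
    import com.klarna.hiverunner.StandaloneHiveRunner;
    import com.klarna.hiverunner.annotations.HiveSQL;
    import java.util.Arrays;
    import java.util.List;
    import org.junit.Test;
    import org.junit.runner.RunWith;

    @RunWith(StandaloneHiveRunner.class)
    public class HelloHiveRunnerTest {

        // HiveRunner injects a shell backed by an embedded, throwaway metastore.
        @HiveSQL(files = {})
        private HiveShell shell;

        @Test
        public void readsBackInsertedRow() {
            shell.execute("CREATE DATABASE source_db");
            shell.execute("CREATE TABLE source_db.test_table (c0 STRING)");
            shell.insertInto("source_db", "test_table")
                 .withAllColumns()
                 .addRow("v1")
                 .commit();
            List<String> actual = shell.executeQuery("SELECT c0 FROM source_db.test_table");
            assertEquals(Arrays.asList("v1"), actual);
        }
    }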
-- 

*Per Ullberg*
Datavault Tech Lead
Odin (Uppsala)

Klarna Bank AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00 
Reg no: 556737-0431
klarna.com


Re: Is 'application' a reserved word?

2018-05-30 Thread Per Ullberg
Make sure to backtick all table and column names and you won’t run into
problems when upgrading. We also include keywords in our automated
integration tests to catch places where we’ve missed the backticks.
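
For instance, DDL kept in application code can quote every identifier up
front. A sketch over Hive's JDBC driver (the connection URL, table, and
columns are invented; the Hive JDBC driver is assumed to be on the
classpath):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class BacktickedIdentifiers {
        public static void main(String[] args) throws Exception {
            // Illustrative HiveServer2 URL; adjust host, port and database.
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:hive2://localhost:10000/default");
                 Statement stmt = conn.createStatement()) {
                // Backticked identifiers keep parsing even on Hive versions
                // where `application` and `exchange` became keywords.
                stmt.execute("CREATE TABLE `app` (`application` STRING, `exchange` STRING)");
                stmt.execute("SELECT `application`, `exchange` FROM `app`");
            }
        }
    }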

/Pelle


On Wed, 30 May 2018 at 20:07, Matt Burgess  wrote:

> Edward,
>
> I hear that; "application" is a bit unfortunate as a reserved word as
> well, and I wonder how many folks have data sets with a column named
> "application". We have that field in provenance data in NiFi; I
> discovered it was a reserved word when trying to create a flow to put
> NiFi provenance data into Hive for analysis.
>
> Regards,
> Matt
>
>
> On Wed, May 30, 2018 at 2:04 PM, Edward Capriolo 
> wrote:
> > We got bitten pretty hard when "exchange partitions" was added. How many
> > people in ad tech work with exchanges? Everyone!
> >
> > On Wed, May 30, 2018 at 1:38 PM, Alan Gates 
> wrote:
> >>
> >> It is.  You can see the definitive list of keywords at
> >>
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g
> >> (Note this is for the master branch; you can switch branches to find
> >> the list for a particular release.) It would be good to file a JIRA on
> >> this so we can fix the documentation.
> >>
> >> Alan.
> >>
> >> On Wed, May 30, 2018 at 7:48 AM Matt Burgess 
> wrote:
> >>>
> >>> I tried the following simple statement in beeline (Hive 3.0.0):
> >>>
> >>> create table app (application STRING);
> >>>
> >>> And got the following error:
> >>>
> >>> Error: Error while compiling statement: FAILED: ParseException line
> >>> 1:18 cannot recognize input near 'application' 'STRING' ')' in column
> >>> name or constraint (state=42000,code=4)
> >>>
> >>> I checked the Wiki [1] but didn't see 'application' on the list of
> >>> reserved words. However, if I change the column name to anything else
> >>> (even 'applicatio') it works. Can someone confirm whether this is a
> >>> reserved word?
> >>>
> >>> Thanks in advance,
> >>> Matt
> >>>
> >>> [1]
> >>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords
> >
> >
>
-- 

*Per Ullberg*
Datavault Tech Lead
Odin (Uppsala)

Klarna Bank AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00 
Reg no: 556737-0431
klarna.com


Re: Unable to retrieve table metadata from hcatalog

2017-05-12 Thread Per Ullberg
And you're sure that you have a database and table in Hive matching
*oraclehadoop.bigtab*?

/Pelle
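
One quick way to verify is to ask HCatalog the same question GoldenGate
does, via the HCatClient API that appears in the stack trace below. A
sketch (the metastore URI is a placeholder):

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hive.hcatalog.api.HCatClient;

    public class ListHCatTables {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Point at the same metastore that GoldenGate is configured to use.
            conf.set("hive.metastore.uris", "thrift://metastore-host:9083");
            HCatClient client = HCatClient.create(conf);
            try {
                // Lists the tables HCat can actually see in that database.
                List<String> tables = client.listTableNamesByPattern("oraclehadoop", "*");
                System.out.println("Tables in oraclehadoop: " + tables);
            } finally {
                client.close();
            }
        }
    }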

On Sat 13 May 2017 at 01:08 Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> I am using GoldenGate to send data real time from Oracle table to Hive.
>
> It tries to read table metadata using HCat but fails as below:
>
> *ERROR 2017-05-12 22:45:52,700 [main] Unable to retrieve table matadata.
> Table : oraclehadoop.bigtab*
>
> *org.apache.hive.hcatalog.common.HCatException : 9001 : Exception occurred
> while processing HCat request : NoSuchObjectException while fetching
> table.. Cause : NoSuchObjectException(message:oraclehadoop.bigtab table not
> found) **at
> org.apache.hive.hcatalog.api.HCatClientHMSImpl.getTable(HCatClientHMSImpl.java:175)*
>
> at
> oracle.goldengate.mdp.hive.HiveMetaDataProvider.resolve(HiveMetaDataProvider.java:91)
>
> at
> oracle.goldengate.datasource.metadata.provider.TargetMetaDataStore.retrieveMetaData(TargetMetaDataStore.java:73)
>
> at
> oracle.goldengate.datasource.UserExitDataSource.getMetaData(UserExitDataSource.java:2133)
>
> Caused by: NoSuchObjectException(message:oraclehadoop.bigtab table not
> found)
>
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table_core(HiveMetaStore.java:1885)
>
> at
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:1838)
>
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
> at java.lang.reflect.Method.invoke(Method.java:497)
>
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:138)
>
> at
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>
> at com.sun.proxy.$Proxy13.get_table(Unknown Source)
>
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:1228)
>
> at
> org.apache.hive.hcatalog.api.HCatClientHMSImpl.getTable(HCatClientHMSImpl.java:168)
>
>
>
> I am not that familiar with HCat. Any help will be appreciated.
>
>
> thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com


Re: Interrogating a uniontype

2016-11-23 Thread Per Ullberg
Could you write a UDF that parses it and returns a JSON object? From there
you can use the standard JSON support in Hive. I did something similar for
Erlang structs about 3 years ago. I actually kept them on file and wrote a
SerDe that exposed them as JSON objects.

regards
/Pelle
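
A sketch of what such a UDF could look like. This is illustrative rather
than a published implementation; it leans on Hive's SerDeUtils.getJSONString
to render the tagged value, producing strings like {"3":{"a":5,"b":"five"}}
that Hive's JSON functions can then dig into.

    import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
    import org.apache.hadoop.hive.ql.metadata.HiveException;
    import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
    import org.apache.hadoop.hive.serde2.SerDeUtils;
    import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.UnionObjectInspector;
    import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

    // union_to_json(u): renders a uniontype value as a JSON string keyed by tag.
    public class UnionToJson extends GenericUDF {

        private transient UnionObjectInspector unionOI;

        @Override
        public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
            if (args.length != 1 || args[0].getCategory() != ObjectInspector.Category.UNION) {
                throw new UDFArgumentException("union_to_json expects one uniontype argument");
            }
            unionOI = (UnionObjectInspector) args[0];
            return PrimitiveObjectInspectorFactory.javaStringObjectInspector;
        }

        @Override
        public Object evaluate(DeferredObject[] args) throws HiveException {
            Object union = args[0].get();
            if (union == null) {
                return null;
            }
            byte tag = unionOI.getTag(union);
            ObjectInspector fieldOI = unionOI.getObjectInspectors().get(tag);
            // SerDeUtils.getJSONString renders any Hive object as JSON text.
            return "{\"" + tag + "\":"
                    + SerDeUtils.getJSONString(unionOI.getField(union), fieldOI) + "}";
        }

        @Override
        public String getDisplayString(String[] children) {
            return "union_to_json(" + children[0] + ")";
        }
    }

Registered with CREATE TEMPORARY FUNCTION, the resulting strings can be
queried with Hive's built-in JSON support.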

On Wed, Nov 23, 2016 at 6:40 PM, Elliot West <tea...@gmail.com> wrote:

> Ah, I see that this can't be done with an array as there is no type common
> to all union indexes. Perhaps a struct with one field per indexed type?
>
> On Wed, 23 Nov 2016 at 17:29, Elliot West <tea...@gmail.com> wrote:
>
>> Can anyone recommend a good approach for interrogating uniontype values
>> in HQL? I note that the documentation states that the support for such
>> types is limited to 'look-at-only', which I assume to mean that I may only
>> dump out the value in its entirety, and not extract sub-elements. Using the
>> example below, is there any way I can get to field 'a' of union index 3 to
>> extract only the value 5?
>>
>> {0:1}
>> {1:2.0}
>> {2:["three","four"]}
>> {3:{"a":5,"b":"five"}}
>>
>>
>> If not possible with HQL, would it be possible to implement a UDF that
>> can explode the type into something more navigable, like an array, struct,
>> or map?
>>
>> Example when exploded as array:
>>
>>
>> [1,null,null,null]
>> [null,2.0,null,null]
>> [null,null,["three","four"],null]
>> [null,null,null,{"a":5,"b":"five"}]
>>
>> Has anyone done this?
>>
>> Thanks,
>>
>> Elliot.
>>
>>


-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com


Re: Connect metadata

2016-10-25 Thread Per Ullberg
Sorry, no help from me I'm afraid, but I have a similar question:

Could someone provide me with a code snippet (preferably Java) that
installs the schema (through DataNucleus) on my empty metastore (Postgres)?
I don't want Hive to install the schema at startup, but rather do it
explicitly myself.

regards
/Pelle
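
For what it's worth, one way to do this without letting DataNucleus
auto-create anything is to run the DDL script that ships with Hive
(scripts/metastore/upgrade/postgres/hive-schema-<version>.postgres.sql)
against the empty database yourself. A hedged JDBC sketch; the script path,
version, and connection details are placeholders:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class InstallMetastoreSchema {
        public static void main(String[] args) throws Exception {
            String script = new String(Files.readAllBytes(Paths.get(
                    "scripts/metastore/upgrade/postgres/hive-schema-1.2.0.postgres.sql")),
                    StandardCharsets.UTF_8);
            try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost:5432/metastore", "hive", "secret");
                 Statement stmt = conn.createStatement()) {
                // Naive statement split; adequate for the Hive-shipped scripts,
                // which contain no procedural blocks with embedded semicolons.
                for (String sql : script.split(";")) {
                    if (!sql.trim().isEmpty()) {
                        stmt.execute(sql);
                    }
                }
            }
        }
    }

Hive's bundled schematool (schematool -dbType postgres -initSchema) does
essentially the same job from the command line.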

On Tue, Oct 25, 2016 at 9:09 AM, Rajendra Bhat <rajhalk...@gmail.com> wrote:

> Hi Team,
>
> I have configured only the metastore and started the metastore service,
> which I am using with Presto.
>
> I need to create a table in the metastore. How can I do that? I have not
> started the HiveServer service, because Hadoop is not installed on my
> system.
>
> --
> Thanks and
> Regards
>
> Rajendra Bhat
>



-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com


Re: Hive metadata on Hbase

2016-10-24 Thread Per Ullberg
What version of Hive are you running?

/Pelle

On Monday, October 24, 2016, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> @Per
>
> We run a fully transaction-enabled Hive metastore database on Oracle.
>
> I don't have statistics right now but can collect them from AWR reports, no problem.
>
> @Jorn,
>
> The primary reason Oracle was chosen is that the company has global
> licenses for Oracle + MSSQL + SAP and they are classified as Enterprise
> Grade databases.
>
> Neither MySQL nor the others are classified as such, so they cannot be
> deployed in production.
>
> Besides, keeping the Hive metadata on Oracle makes sense for us, as our
> infrastructure team does all the support, HA etc. for it and has trained
> DBAs looking after it 24x7.
>
> Admittedly we are now relying on HDFS itself, plus HBase, for persistent
> storage, so the situation might change.
>
> HTH
>
>
>
>
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
> On 24 October 2016 at 06:46, Per Ullberg <per.ullb...@klarna.com> wrote:
>
>> I thought the main gain was to make ACID on Hive performant enough.
>>
>> @Mich: Do you run with ACID-enabled tables? How many
>> Create/Update/Deletes do you do per second?
>>
>> best regards
>> /Pelle
>>
>> On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> I think the main gain is more about getting rid of a dedicated database
>>> including maintenance and potential license cost.
>>> For really large clusters and a lot of users this might be even more
>>> beneficial. You can avoid clustering the database etc.
>>>
>>> On 24 Oct 2016, at 00:46, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>>
>>>
>>> A while back there were some notes on having the Hive metastore on HBase
>>> as opposed to a conventional RDBMS.
>>>
>>> I am currently involved with some hefty work with HBase and Phoenix for
>>> batch ingestion of trade data. As long as you define your HBase table
>>> through Phoenix, with secondary Phoenix indexes on HBase, the speed is
>>> impressive.
>>>
>>> I am not sure how much having HBase as the Hive metastore is going to add
>>> to Hive performance. We use Oracle 12c as the Hive metastore and the Hive
>>> database/schema is built on solid state disks. We have never had any
>>> issues with locking or concurrency.
>>>
>>> Therefore I am not sure what one would gain by having HBase as the Hive
>>> metastore. I trust that we can still use our existing schemas on
>>> Oracle.
>>>
>>> HTH
>>>
>>>
>>>
>>> Dr Mich Talebzadeh
>>>
>>>
>>>
>>> LinkedIn:
>>> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>>
>>>
>>>
>>> http://talebzadehmich.wordpress.com
>>>
>>>
>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>> any loss, damage or destruction of data or any other property which may
>>> arise from relying on this email's technical content is explicitly
>>> disclaimed. The author will in no case be liable for any monetary damages
>>> arising from such loss, damage or destruction.
>>>
>>>
>>>
>>>
>>
>>
>> --
>>
>> *Per Ullberg*
>> Data Vault Tech Lead
>> Odin Uppsala
>> +46 701612693
>>
>> Klarna AB (publ)
>> Sveavägen 46, 111 34 Stockholm
>> Tel: +46 8 120 120 00
>> Reg no: 556737-0431
>> klarna.com
>>
>>
>

-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com


Re: Hive metadata on Hbase

2016-10-23 Thread Per Ullberg
I thought the main gain was to make ACID on Hive performant enough.

@Mich: Do you run with ACID-enabled tables? How many Create/Update/Deletes
do you do per second?

best regards
/Pelle

On Mon, Oct 24, 2016 at 7:39 AM, Jörn Franke <jornfra...@gmail.com> wrote:

> I think the main gain is more about getting rid of a dedicated database
> including maintenance and potential license cost.
> For really large clusters and a lot of users this might be even more
> beneficial. You can avoid clustering the database etc.
>
> On 24 Oct 2016, at 00:46, Mich Talebzadeh <mich.talebza...@gmail.com>
> wrote:
>
>
> A while back there were some notes on having the Hive metastore on HBase
> as opposed to a conventional RDBMS.
>
> I am currently involved with some hefty work with HBase and Phoenix for
> batch ingestion of trade data. As long as you define your HBase table
> through Phoenix, with secondary Phoenix indexes on HBase, the speed is
> impressive.
>
> I am not sure how much having HBase as the Hive metastore is going to add
> to Hive performance. We use Oracle 12c as the Hive metastore and the Hive
> database/schema is built on solid state disks. We have never had any
> issues with locking or concurrency.
>
> Therefore I am not sure what one would gain by having HBase as the Hive
> metastore. I trust that we can still use our existing schemas on
> Oracle.
>
> HTH
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>


-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com


Re: Unit testing macros

2016-09-30 Thread Per Ullberg
We ran into the same issue with Oozie Hive actions not sharing sessions.
We went for the concatenation approach: a Java action before the Hive
action reads all HQL files from a specification and concatenates them into
one long HQL script. We also add some sugar at this point by setting custom
Hive variables (like NOMINAL_TIMESTAMP, which lets users avoid the
current_timestamp UDF in their scripts).

best regards
/Pelle
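
A hedged sketch of that concatenation step; the NOMINAL_TIMESTAMP variable
follows the description above, while the file names and class are invented:

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Arrays;
    import java.util.List;

    public class HqlConcatenator {
        public static void main(String[] args) throws IOException {
            // In a real setup the script list and the nominal run timestamp
            // would come from the workflow specification.
            List<Path> scripts = Arrays.asList(
                    Paths.get("etl/stage.hql"), Paths.get("etl/load.hql"));
            String nominalTimestamp = "2016-09-30 00:00:00";

            StringBuilder combined = new StringBuilder();
            // The "sugar": inject a stable, replayable timestamp so scripts
            // need not call a current-time UDF themselves.
            combined.append("SET hivevar:NOMINAL_TIMESTAMP='")
                    .append(nominalTimestamp).append("';\n");
            for (Path script : scripts) {
                combined.append(new String(Files.readAllBytes(script),
                        StandardCharsets.UTF_8)).append('\n');
            }
            Files.write(Paths.get("combined.hql"),
                    combined.toString().getBytes(StandardCharsets.UTF_8));
        }
    }

Scripts then reference ${hivevar:NOMINAL_TIMESTAMP} instead of calling
current_timestamp, which also makes reruns reproducible.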

On Fri, Sep 30, 2016 at 12:12 PM, Staņislavs Rogozins <
stanislavs.rogoz...@gmail.com> wrote:

> Have you taken a look at https://github.com/klarna/HiveRunner?
>
>
> I assume you're referring to this:
> https://github.com/klarna/HiveRunner/blob/ef14a1c181be863cec2278aeb732d9e21c38a2b3/src/test/java/com/klarna/hiverunner/MacroTest.java
>
> It appears to execute the macro creation statement as part of the session,
> using the Hive library's CLIService class, so doing the same during normal
> execution would require the application itself to be implemented in Java.
>
> On Fri, Sep 30, 2016 at 12:45 PM, Elliot West <tea...@gmail.com> wrote:
>
>> Hi,
>>
>> You can achieve this by storing the macro definition in a separate HQL
>> file and 'importing' it as needed. Unfortunately such imports are
>> interpreted by your Hive client and the relevant command varies between
>> client implementations: '!run' in Beeline and 'SOURCE' in the Hive CLI. I
>> raised a proposal to create a unified command that is compatible across
>> clients but this has yet to gain any traction:
>> https://issues.apache.org/jira/browse/HIVE-12703
>>
>> Elliot.
>>
>>
>>
> Actually, I'm developing an Oozie workflow that executes HQL scripts as
> Hive actions. It doesn't look like Oozie provides support for such features.
> So, it would seem that my current options are either to use a shell action
> instead or do some magic with concatenation of scripts before the main
> action's execution.
> A first-class command like that sure would be nice.
>
>


-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com


Re: Unit testing macros

2016-09-30 Thread Per Ullberg
Have you taken a look at https://github.com/klarna/HiveRunner?

regards
/Pelle

On Fri, Sep 30, 2016 at 11:23 AM, Staņislavs Rogozins <
stanislavs.rogoz...@gmail.com> wrote:

> The Unit testing wiki page
> <https://cwiki.apache.org/confluence/display/Hive/Unit+Testing+Hive+SQL>
> suggests using macros to 'extract and reuse the expressions applied to
> columns' and says that they can be 'readily isolated for testing'. However,
> as far as I'm aware, right now only temporary macros can be created, which
> stop existing outside the session where they were defined
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/DropMacro),
> thus making it necessary to include the macro in every HQL script that
> uses it. How is it possible to isolate them for testing?
>
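
With HiveRunner (suggested above), the macro can be created inside the test
session itself and exercised directly. A sketch in the spirit of the
project's MacroTest; the macro and expected value are invented:

    import static org.junit.Assert.assertEquals;

    import com.klarna.hiverunner.HiveShell;
    import com.klarna.hiverunner.StandaloneHiveRunner;
    import com.klarna.hiverunner.annotations.HiveSQL;
    import java.util.Arrays;
    import org.junit.Test;
    import org.junit.runner.RunWith;

    @RunWith(StandaloneHiveRunner.class)
    public class MacroIsolationTest {

        @HiveSQL(files = {})
        private HiveShell shell;

        @Test
        public void macroNormalisesItsInput() {
            // The temporary macro lives only in this test's session, which is
            // exactly what lets it be tested in isolation.
            shell.execute("CREATE TEMPORARY MACRO trim_lower(s STRING) lower(trim(s))");
            assertEquals(Arrays.asList("foo"),
                    shell.executeQuery("SELECT trim_lower('  Foo  ')"));
        }
    }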



-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com


Re: Query consuming all resources

2016-09-28 Thread Per Ullberg
What Jörn said. We use the capacity scheduler to give some user groups
priority over others.

Regards
/Pelle

On Wednesday, September 28, 2016, Jörn Franke <jornfra...@gmail.com> wrote:

> You need to configure queues in YARN and use the fair scheduler. From your
> use case it looks like you also need to configure preemption.
>
> > On 28 Sep 2016, at 00:52, Jose Rozanec <jose.roza...@mercadolibre.com
> <javascript:;>> wrote:
> >
> > Hi,
> >
> > We have a Hive cluster. We notice that some queries consume all
> > resources, which is not desirable to us, since we want to grant some
> > degree of parallelism to incoming ones: any incoming query should be able
> > to make at least some progress, not just wait for the big one to finish.
> >
> > Is there a way to do this? We use Hive 2.1.0 with the Tez engine.
> >
> > Thank you in advance,
> >
> > Joze.
>
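
As a concrete starting point, a hedged sketch of a two-queue
capacity-scheduler.xml; the queue names and percentages are invented, and
preemption additionally needs yarn.resourcemanager.scheduler.monitor.enable
set to true in yarn-site.xml:

    <configuration>
      <!-- Two queues under root; capacities must sum to 100. -->
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>etl,adhoc</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.etl.capacity</name>
        <value>70</value>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
        <value>30</value>
      </property>
      <!-- Cap the heavy queue so ad-hoc queries can always make progress. -->
      <property>
        <name>yarn.scheduler.capacity.root.etl.maximum-capacity</name>
        <value>80</value>
      </property>
    </configuration>

Hive-on-Tez sessions can then be routed to a queue with, for example,
SET tez.queue.name=adhoc;.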


-- 

*Per Ullberg*
Data Vault Tech Lead
Odin Uppsala
+46 701612693

Klarna AB (publ)
Sveavägen 46, 111 34 Stockholm
Tel: +46 8 120 120 00
Reg no: 556737-0431
klarna.com