Re: Trouble trying to get started with hive

2016-07-11 Thread Jörn Franke
Please use a Hadoop distribution to avoid these configuration issues (in the beginning). > On 05 Jul 2016, at 12:06, Kari Pahula wrote: > > Hi. I'm trying to familiarize myself with Hadoop and various projects related > to it. > > I've been following >

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
In my test I did a like-for-like comparison, keeping the setup the same, namely: 1. The table was a Parquet table of 100 million rows 2. The same setup was used for both Hive on Spark and Hive on MR 3. Spark was very impressive compared to MR on this particular test. Just to see any issues I

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Gopal Vijayaraghavan
> Status: Finished successfully in 14.12 seconds > OK > 1 > Time taken: 14.38 seconds, Fetched: 1 row(s) That might be an improvement over MR, but that still feels far too slow. Parquet numbers are in general bad in Hive, but that's because the Parquet reader gets no actual love from

Re: Trouble trying to get started with hive

2016-07-11 Thread Shannon Ladymon
Kari, Perhaps the Getting Started doc should be updated. What information did you find missing/incorrect? Have you been able to get it working? -Shannon On Tue, Jul 5, 2016 at 3:06 AM, Kari Pahula wrote: > Hi. I'm trying to familiarize myself with Hadoop and various

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
Another point with Hive on Spark and Hive on Tez + LLAP, I am thinking out loud :) 1. I am using Hive on Spark and I have a table of, say, 10GB with 100 users concurrently accessing the same partition of an ORC table (last one hour or so) 2. Spark takes data and puts it in memory. I gather

Hive Metastore on Amazon Aurora

2016-07-11 Thread Elliot West
Hello, Is anyone running the Hive metastore database on Amazon Aurora (https://aws.amazon.com/rds/aurora/details/)? My expectation is that it should work nicely as it is derived from MySQL, but I'd be keen to hear of users' experiences with this setup. Many thanks, Elliot.

Re: Hive Metastore on Amazon Aurora

2016-07-11 Thread Mich Talebzadeh
Hi Elliot, Am I correct that you want to put your Hive metastore on Amazon? Is the metastore (database/schema) sitting on MySQL, and do you want to migrate your MySQL to the cloud now? Two questions that need to be verified: 1. How big is your current metadata? 2. Do you do a lot of

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Ashok Kumar
Hi Mich, Regarding your recent presentation in London on this topic, "Running Spark on Hive or Hive on Spark": have you made any more interesting findings that you would like to bring up? If Hive is offering both Spark and Tez in addition to MR, what is stopping one from using Spark? I still don't get why Tez + LLAP

Re: Hive Metastore on Amazon Aurora

2016-07-11 Thread Elliot West
Hi Mich, Correct. We have proofs of concept up and running with both MySQL on RDS and Aurora. We'd be keen to hear of others' experiences with Aurora in a Hive metastore database role, primarily as a sanity check. In answer to your specific points: 1. 30GB 2. We don't intend to use ACID

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
The presentation will go deeper into the topic. Otherwise, some thoughts of mine. Feel free to comment or criticise :) 1. I am a member of the Spark, Hive and Tez user groups plus one or two others 2. Spark is by far the biggest in terms of community interaction 3. Tez, typically one thread in

Fast database with writes per second and horizontal scaling

2016-07-11 Thread Ashok Kumar
Hi Gurus, Advice appreciated from Hive gurus. My colleague has been using Cassandra. However, he says it is too slow and not user friendly. MongoDB as a document database is pretty neat but not fast enough. My main concern is fast writes per second and good scaling. Hive on Spark or Tez? How about

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Michael Segel
Just a clarification. Tez is ‘vendor’ independent. ;-) Yeah… I know… Anyone can support it. Only Hortonworks has stacked the deck in their favor. Drill could be in the same boat, although there are now more committers who are not working for MapR. I’m not sure who outside of HW is

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Mich Talebzadeh
Appreciate all the comments. Hive on Spark: Spark runs as an execution engine and is only used when you query Hive. Otherwise it is not running. I run it in YARN client mode. Let me show you an example. In hive-site.xml, set the execution engine to spark. It requires some configuration

Re: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-11 Thread Jörn Franke
I think LLAP should in the future be a general component, so LLAP + Spark can make sense. I see Tez and Spark not as competitors; they have different purposes. Hive + Tez + LLAP is not the same as Hive + Spark. I think it goes beyond that for interactive queries. Tez - you should use a