Please use a Hadoop distribution to avoid these configuration issues (in the
beginning).
> On 05 Jul 2016, at 12:06, Kari Pahula wrote:
>
> Hi. I'm trying to familiarize myself with Hadoop and various projects related
> to it.
>
> I've been following
>
In my test I compared like for like, keeping the setup the same, namely:
1. The table was a Parquet table of 100 million rows
2. The same setup was used for both Hive on Spark and Hive on MR
3. Spark was very impressive compared to MR on this particular test.
Just to see any issues I
> Status: Finished successfully in 14.12 seconds
> OK
> 1
> Time taken: 14.38 seconds, Fetched: 1 row(s)
That might be an improvement over MR, but that still feels far too slow.
Parquet numbers are in general bad in Hive, but that's because the Parquet
reader gets no actual love from
Kari,
Perhaps the Getting Started doc should be updated. What information did
you find missing/incorrect? Have you been able to get it working?
-Shannon
On Tue, Jul 5, 2016 at 3:06 AM, Kari Pahula wrote:
> Hi. I'm trying to familiarize myself with Hadoop and various
Another point on Hive on Spark and Hive on Tez + LLAP. I am thinking out loud
:)
1. I am using Hive on Spark and I have a table of, say, 10GB with 100
users concurrently accessing the same partition of an ORC table (last one
hour or so)
2. Spark takes the data and puts it in memory. I gather
Hello,
Is anyone running the Hive metastore database on Amazon Aurora
(https://aws.amazon.com/rds/aurora/details/)? My expectation is that it
should work nicely as it is derived from MySQL, but I'd be keen to hear of
users' experiences with this setup.
Many thanks,
Elliot.
Hi Elliot,
Am I correct that you want to put your Hive metastore on Amazon? Is the
metastore (database/schema) sitting on MySQL, and do you want to migrate
your MySQL to the cloud now?
Two questions that need to be verified:
1. How big is your current metadata?
2. Do you do a lot of
Hi Mich,
Your recent presentation in London on this topic "Running Spark on Hive or Hive
on Spark"
Have you made any more interesting findings that you would like to bring up?
If Hive is offering both Spark and Tez in addition to MR, what is stopping one
from using Spark? I still don't get why TEZ + LLAP
to use Spark? I still don't get why TEZ + LLAP
Hi Mich,
Correct. We have proof of concepts up and running with both MySQL on RDS
and Aurora. We'd be keen to hear of experiences of others with Aurora in a
Hive metastore database role, primarily as a sanity check. In answer to
your specific points:
1. 30GB
2. We don't intend to use ACID
The presentation will go deeper into the topic. Otherwise, some thoughts of
mine. Feel free to comment or criticise :)
1. I am a member of the Spark, Hive and Tez user groups plus one or two others
2. Spark is by far the biggest in terms of community interaction
3. Tez, typically one thread in
Hi Gurus,
Advice appreciated from Hive gurus.
My colleague has been using Cassandra. However, he says it is too slow and not
user friendly. MongoDB as a document database is pretty neat but not fast enough.
My main concern is fast writes per second and good scaling.
Hive on Spark or Tez?
How about
Just a clarification.
Tez is ‘vendor’ independent. ;-)
Yeah… I know… Anyone can support it. Only Hortonworks has stacked the deck in
their favor.
Drill could be in the same boat, although there are now more committers who are not
working for MapR. I’m not sure who outside of HW is
Appreciate all the comments.
Hive on Spark. Spark runs as an execution engine and is only used when you
query Hive. Otherwise it is not running. I run it in YARN client mode. Let
me show you an example
In hive-site.xml, set the execution engine to spark. It requires
some configuration
I think LLAP should in the future be a general component, so LLAP + Spark can
make sense. I see Tez and Spark not as competitors; they have different
purposes. Hive+Tez+LLAP is not the same as Hive+Spark. I think it goes beyond
that for interactive queries.
Tez - you should use a