@Jörn: If I understood correctly, even later versions of Hive won't be able to handle these kinds of workloads?
On Wed, Aug 24, 2016 at 1:26 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> I think Hive, especially these old versions, has not been designed for
> this. Why not store them in HBase and run an Oozie job regularly that puts
> them all into Hive/ORC or Parquet in a bulk job?
>
> On 24 Aug 2016, at 09:35, Joel Victor <joelsvic...@gmail.com> wrote:
>
> Currently I am using Apache Hive 0.14, which ships with HDP 2.2. We are
> trying to perform streaming ingestion with it.
> We are using the Storm Hive bolt and we have 7 tables into which we are
> trying to insert. The RPS (requests per second) of our bolts ranges from
> 5000 to 7000, and our commit policies are configured accordingly, i.e. 100k
> events or 15 seconds.
>
> We see many commitTxn exceptions due to serialization errors in the
> metastore (we are using PostgreSQL 9.5 as the metastore).
> The serialization errors cause the topology to start lagging in terms of
> events processed, as it retries the batches that have failed.
>
> I have already backported HIVE-10500
> <https://issues.apache.org/jira/browse/HIVE-10500> to 0.14, and there
> isn't much improvement.
> I went through most of the JIRAs about transactions and found the
> following: HIVE-11948 <https://issues.apache.org/jira/browse/HIVE-11948>,
> HIVE-13013 <https://issues.apache.org/jira/browse/HIVE-13013>. I would
> like to backport them to 0.14.
> Going through the patches gives me the impression that I mostly need to
> update the queries and the transaction isolation levels.
> Do these patches also require me to update the schema in the metastore?
> Please also let me know if there are any other patches that I missed.
>
> I would also like to know whether Apache Hive can handle inserts to the
> same/different tables concurrently from multiple clients in 1.2.1 or later
> versions without many serialization errors in the Hive metastore.
>
> -Joel
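For anyone reading this thread later: the commit policy Joel describes above ("100k events or 15 seconds") roughly maps onto the batch size and tick-tuple interval of the storm-hive HiveOptions. Below is a minimal sketch, not Joel's actual topology; the metastore URI, database, table and column names are placeholders, and the package names are the Storm 1.x ones (older releases use backtype.storm.tuple.Fields instead).

import org.apache.storm.hive.bolt.HiveBolt;
import org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper;
import org.apache.storm.hive.common.HiveOptions;
import org.apache.storm.tuple.Fields;

public class HiveBoltConfigSketch {
    public static HiveBolt buildHiveBolt() {
        // Placeholder column list; a real topology would use its own fields.
        DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                .withColumnFields(new Fields("id", "event_time", "payload"));

        HiveOptions options = new HiveOptions(
                "thrift://metastore-host:9083", // placeholder metastore URI
                "default",                      // placeholder database
                "events",                       // placeholder table
                mapper)
                .withBatchSize(100000)          // flush a transaction batch at ~100k events...
                .withTickTupleIntervalSecs(15)  // ...or force a flush every 15 seconds
                .withTxnsPerBatch(2);           // transactions requested per streaming batch

        return new HiveBolt(options);
    }
}

With seven such bolts committing concurrently, each flush ends in a commitTxn call against the metastore, which is where the PostgreSQL serialization failures discussed above show up.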