@Jörn: If I understood correctly, even later versions of Hive won't be able to handle these kinds of workloads?
On Wed, Aug 24, 2016 at 1:26 PM, Jörn Franke <jornfra...@gmail.com> wrote:
> I think Hive, especially these old versions, has not been designed for
> this. Why not store them in HBase and run an Oozie job regularly that puts
> them all into Hive/ORC or Parquet in a bulk job?
>
> On 24 Aug 2016, at 09:35, Joel Victor <joelsvic...@gmail.com> wrote:
>
> Currently I am using Apache Hive 0.14, which ships with HDP 2.2. We are
> trying to perform streaming ingestion with it.
> We are using the Storm Hive bolt and we have 7 tables into which we are
> trying to insert. The RPS (requests per second) of our bolts ranges from
> 5000 to 7000, and our commit policies are configured accordingly, i.e. 100k
> events or 15 seconds.
>
> We see many commitTxn exceptions due to serialization errors in the
> metastore (we are using PostgreSQL 9.5 as the metastore).
> The serialization errors cause the topology to start lagging in terms of
> events processed, as it retries the batches that have failed.
>
> I have already backported HIVE-10500
> <https://issues.apache.org/jira/browse/HIVE-10500> to 0.14, and there
> isn't much improvement.
> I went through most of the JIRAs about transactions and found the
> following: HIVE-11948 <https://issues.apache.org/jira/browse/HIVE-11948>,
> HIVE-13013 <https://issues.apache.org/jira/browse/HIVE-13013>. I would
> like to backport them to 0.14.
> Going through the patches gives me the impression that I mostly need to
> update the queries and the transaction isolation levels.
> Do these patches also require me to update the schema in the metastore?
> Please also let me know if there are any other patches that I missed.
>
> I would also like to know whether Apache Hive can handle inserts to the
> same/different tables concurrently from multiple clients in 1.2.1 or later
> versions without many serialization errors in the Hive metastore.
>
> -Joel
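For anyone reading this thread later: the commit policy Joel describes above ("100k events or 15 seconds") roughly maps onto the batch size and tick-tuple interval of the storm-hive HiveOptions. Below is a minimal sketch, not Joel's actual topology; the metastore URI, database, table and column names are placeholders, and the package names are the Storm 1.x ones (older releases use backtype.storm.tuple.Fields instead).

import org.apache.storm.hive.bolt.HiveBolt;
import org.apache.storm.hive.bolt.mapper.DelimitedRecordHiveMapper;
import org.apache.storm.hive.common.HiveOptions;
import org.apache.storm.tuple.Fields;

public class HiveBoltConfigSketch {
    public static HiveBolt buildHiveBolt() {
        // Placeholder column list; a real topology would use its own fields.
        DelimitedRecordHiveMapper mapper = new DelimitedRecordHiveMapper()
                .withColumnFields(new Fields("id", "event_time", "payload"));

        HiveOptions options = new HiveOptions(
                "thrift://metastore-host:9083", // placeholder metastore URI
                "default",                      // placeholder database
                "events",                       // placeholder table
                mapper)
                .withBatchSize(100000)          // flush a transaction batch at ~100k events...
                .withTickTupleIntervalSecs(15)  // ...or force a flush every 15 seconds
                .withTxnsPerBatch(2);           // transactions requested per streaming batch

        return new HiveBolt(options);
    }
}

With seven such bolts committing concurrently, each flush ends in a commitTxn call against the metastore, which is where the PostgreSQL serialization failures discussed above show up.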