I think Hive, especially these old versions, was not designed for this. Why not store the events in HBase and run an Oozie job regularly that loads them all into Hive (ORC or Parquet) as a bulk job?
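
To make the bulk-job idea a bit more concrete, here is a rough sketch of the kind of statement the scheduled job might run once per day. Everything named here is an assumption for illustration only: `events_hbase` stands for a hypothetical Hive table mapped onto HBase through the HBase storage handler, `events_orc` for a hypothetical ORC table partitioned by `dt`, and the columns `event_id`, `payload` and `event_date` are made up. The Python helper only builds the HiveQL string; how it gets submitted (an Oozie Hive action, beeline, etc.) is left open.

```python
from datetime import date


def build_bulk_load_sql(staging_table: str, target_table: str, day: date) -> str:
    """Build a HiveQL statement that copies one day of rows in a single bulk insert.

    All table and column names are hypothetical placeholders.
    """
    day_str = day.isoformat()  # e.g. '2016-08-24'
    return (
        f"INSERT INTO TABLE {target_table} PARTITION (dt='{day_str}') "
        f"SELECT event_id, payload FROM {staging_table} "
        f"WHERE event_date = '{day_str}'"
    )


# One statement per day replaces many small streaming commits.
sql = build_bulk_load_sql("events_hbase", "events_orc", date(2016, 8, 24))
print(sql)
```

The point of the design is that the metastore sees one transaction per scheduled run rather than one per 100k events or 15 seconds per bolt, which should take most of the pressure off commitTxn.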

> On 24 Aug 2016, at 09:35, Joel Victor <[email protected]> wrote:
>
> Currently I am using Apache Hive 0.14 that ships with HDP 2.2. We are trying
> to perform streaming ingestion with it.
> We are using the Storm Hive bolt and we have 7 tables in which we are trying
> to insert. The RPS (requests per second) of our bolts ranges from 7000 to
> 5000 and our commit policies are configured accordingly, i.e. 100k events or 15
> seconds.
>
> We see that there are many commitTxn exceptions due to serialization errors
> in the metastore (we are using PostgreSQL 9.5 as metastore).
> The serialization errors will cause the topology to start lagging in terms of
> events processed as it will try to reprocess the batches that have failed.
>
> I have already backported HIVE-10500 to 0.14 and there isn't much
> improvement.
> I went through most of the JIRAs about transactions and I found the following:
> HIVE-11948, HIVE-13013. I would like to backport them to 0.14.
> Going through the patches gives me the impression that I mostly need to update
> the queries and transaction isolation levels.
> Do these patches also require me to update the schema in the metastore?
> Please also let me know if there are any other patches that I missed.
>
> I would also like to know whether Apache Hive can handle inserts to the
> same/different tables concurrently from multiple clients in 1.2.1 or later
> versions without many serialization errors in the Hive metastore?
>
> -Joel
