To store the processed records I am using HiveBolt in a Storm topology, configured with the following options (Flux YAML):
- id: "MyHiveOptions"
  className: "org.apache.storm.hive.common.HiveOptions"
  constructorArgs:
    - "${metastore.uri}"  # metaStoreURI
    - "${hive.database}"  # databaseName
    - "${hive.table}"     # tableName
  configMethods:
    - name: "withTxnsPerBatch"
      args:
        - 2
    - name: "withBatchSize"
      args:
        - 100
    - name: "withIdleTimeout"
      args:
        - 2      # default value 0
    - name: "withMaxOpenConnections"
      args:
        - 200    # default value 500
    - name: "withCallTimeout"
      args:
        - 30000  # default value 10000
    - name: "withHeartBeatInterval"
      args:
        - 240    # default value 240
Transactions are missing in Hive because the batch is not completed before the records are flushed. (For example: 1330 records are processed, but only 1200 records end up in Hive, so 130 records are missing.)
How can I overcome this situation? How can I make sure a partially filled batch is committed, so that the transaction is triggered and the records are stored in Hive?
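One direction I have been looking at (not yet verified) is enabling tick tuples for the topology, on the assumption that the HiveBolt version in use flushes its open transaction batches when a tick tuple arrives (the behaviour added under STORM-1305). The interval of 15 seconds below is only an illustrative value:

```yaml
# Topology-level setting (e.g. in the Flux "config" section):
# send a tick tuple to every bolt every 15 seconds, so that
# HiveBolt can flush a partially filled batch instead of
# holding it open until withBatchSize records arrive.
config:
  topology.tick.tuple.freq.secs: 15
```

Would this be the right way to force the remaining records into Hive, or is there a dedicated HiveOptions setting for it?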
Topology: Kafka-Spout --> DataProcessingBolt
          DataProcessingBolt --> HiveBolt (Sink)
          DataProcessingBolt --> JdbcBolt (Sink)
--
Thanks and Regards,
Harshit Raikar