Btw, this is not Hive-specific; the same applies to other relational database systems, such as Oracle Exadata.
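To make the sorting point concrete, here is a minimal Python sketch of how a min/max index skips blocks. It is only an illustration under simplifying assumptions, not how ORC, Parquet, or Exadata implement it: the block size, the helper names `build_blocks` and `blocks_to_read`, and the integer data are all made up for the example.

```python
import random

def build_blocks(values, block_size):
    """Split values into fixed-size blocks and keep (min, max, rows) per block."""
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]
    return [(min(b), max(b), b) for b in blocks]

def blocks_to_read(blocks, key):
    """Count blocks whose [min, max] range could contain key (the rest are skipped)."""
    return sum(1 for lo, hi, _ in blocks if lo <= key <= hi)

random.seed(0)
values = list(range(10_000))   # hypothetical column values
shuffled = values[:]
random.shuffle(shuffled)

sorted_blocks = build_blocks(values, 1_000)      # data sorted on the filter column
unsorted_blocks = build_blocks(shuffled, 1_000)  # same data in random order

# Sorted data: the min/max ranges do not overlap, so one block is read.
print("sorted:  ", blocks_to_read(sorted_blocks, 4_242))
# Unsorted data: the ranges overlap heavily, so few blocks can be skipped.
print("unsorted:", blocks_to_read(unsorted_blocks, 4_242))
```

With sorted data the lookup touches a single block; with the same data unsorted, the per-block ranges overlap and the index prunes very little, which is why the sort order matters.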

> On 05 Jan 2016, at 20:57, Jörn Franke <jornfra...@gmail.com> wrote:
>
> You can still use execution Engine mr for maintaining the index. Indeed with
> the ORC or parquet format there are min/max indexes and bloom filters, but
> you need to sort your data appropriately to benefit from performance.
> Alternatively you can create redundant tables sorted in different order.
> The "traditional" indexes can still make sense for data not in Orc or parquet
> format.
> Keep in mind that for warehouse scenarios there are many other optimization
> methods in Hive.
>
>> On 05 Jan 2016, at 19:17, Ting(Goden) Yao <t...@pivotal.io> wrote:
>>
>> Hi,
>>
>> We hit an issue when doing Hive testing to rebuild index on Tez.
>> We were told by our Hadoop distro vendor that it's not recommended (or
>> should avoid) using index with Hive.
>>
>> But I don't see an official message on Hive wiki or documentation.
>> Can someone confirm that so we'll ask our users to avoid indexing.
>>
>> Thanks.
>> -Goden
>>
>> ==Exceptions (if you're interested in details) ==
>> Exception:
>>
>> 2015-12-08 22:55:30,263 FATAL [AsyncDispatcher event handler] event.AsyncDispatcher: Error in dispatcher thread
>> org.apache.tez.dag.api.TezUncheckedException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
>>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:80)
>>     at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:98)
>>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:137)
>>     at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:114)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:3943)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl.access$3900(VertexImpl.java:180)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:2956)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2906)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2887)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
>>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
>>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
>>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>>     at java.lang.Thread.run(Thread.java:745)
>> Caused by: java.lang.reflect.InvocationTargetException
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>>     at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>>     at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:69)
>>     ... 20 more
>> Caused by: java.lang.NullPointerException
>>     at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.initialize(DynamicPartitionPruner.java:154)
>>     at org.apache.hadoop.hive.ql.exec.tez.DynamicPartitionPruner.<init>(DynamicPartitionPruner.java:110)
>>     at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:95)
>>     ... 25 more
>> 2015-12-08 22:55:30,266 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Can't handle Invalid event V_START on vertex Map 1 with vertexId vertex_1449613300943_0002_1_00 at current state NEW
>> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: V_START at NEW
>>     at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:305)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>>     at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>>     at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1556)
>>     at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:179)
>>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1764)
>>     at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1750)
>>     at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
>>     at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
>>     at java.lang.Thread.run(Thread.java:745)
>> 2015-12-08 22:55:30,267 ERROR [AsyncDispatcher event handler] impl.VertexImpl: Invalid event V_INTERNAL_ERROR on Vert