On Thu, Oct 25, 2012 at 8:31 AM, Nick maillard <[email protected]> wrote:
> Hi jean-Daniel
>
> Ok I'll set it in the env, thanks for the advice.
> Are there other libs I might need to add?

The usual client libs... it doesn't seem like we documented them anywhere... it's pretty much what you have in there now.

> Could I just tell hive to use its lib directory or hbase's lib directory in
> its classpath in some way?

That's a question for the hive ML.

> I could just set it in the bashrc but that's not very elegant.

I really meant that you should use HIVE_AUX_JARS_PATH in hive-env.sh

>
> Another thing, I am testing my 3 machine hadoop cluster.
> I have queried 'select * from myTestTable' which has 1719428 entries.
> The 7 map tasks and 1 reducer took almost 5 minutes to compute, am I right to
> think it is a little slow?

You have a 1-2 minute overhead in there because you are using MapReduce. Beyond that, one should usually set hbase.client.scanner.caching to a better value than 1. It's a client-side setting, so hive needs to have it. But everything will seem slow when using MR on such a small dataset; a single client running a scan would be faster in this case.

> How could I make this go faster, more map tasks, more nodes?

Is select count(*) really the use case you want to optimize? Have you read this? http://hbase.apache.org/book.html#performance

>
> True, I would never scan a whole table usually, but I could easily have queries
> that MR over a set of this size.
>
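For reference, the two settings mentioned above might look roughly like this. It is only a sketch: the jar paths are placeholders rather than paths from this thread, the comma-separated form of HIVE_AUX_JARS_PATH is an assumption about how your Hive version reads it, 1000 is just an illustrative caching value, and it assumes Hive passes the property through to the HBase client.

```shell
# hive-env.sh fragment (sketch). The jar paths are placeholders; point them
# at the client jars your install actually ships.
export HIVE_AUX_JARS_PATH=/path/to/hive-hbase-handler.jar,/path/to/hbase.jar,/path/to/zookeeper.jar

# Raise scanner caching for a single Hive invocation. 1000 is an illustrative
# value, not a recommendation from this thread.
hive --hiveconf hbase.client.scanner.caching=1000 \
     -e 'select count(*) from myTestTable'
```

Setting the property per invocation keeps the change local to the test query; it could equally go in the site configuration that hive reads if it should apply everywhere.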
