Hi Prajakta, I get it now. But are there plans to support groupbys of that format, or would I be able to write a UDF of sorts to do that?
Also, could you please give me an example of accelerating equi-joins with indices, I haven't been able to get that to work either :( Regards, -- Rohan Monga On Wed, Sep 21, 2011 at 6:15 PM, Prajakta Kalmegh <pkalm...@gmail.com> wrote: > Hi Rohan > As of now, we support queries with aggregate function count on columns which > are the same as index key column and are used in group-by construct. For > example, > CREATE TABLE tbl(key int, value int); > CREATE INDEX tbl_key_idx ON TABLE tbl(key) AS > 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED > REBUILD IDXPROPERTIES("AGGREGATES"="count(key)"); > ALTER INDEX tbl_key_idx ON tbl REBUILD; > It is valid to use the following query: > select key, count(key) from tbl where key < 1000 group by key; > But not valid to use: > select value, count(key) from tbl group by value; (I think this is the same > as your case) > As of now, you cannot use the aggregate index in case of join queries. > For the semantic analysis error, I apologize for the typo in the code. It > worked for us as it was uniform in all the classes :) > You forgot to change it in a few more instances and hence the semantic > error. The aggregate index itself creates the column references with a > 'count_Of..' in it. We have fixed the issue and should be updated in the > trunk soon. Thanks for noticing it though. :) > Regards, > Prajakta > > > > > > > > On Wed, Sep 21, 2011 at 12:28 AM, rohan monga <monga.ro...@gmail.com> wrote: >> >> Hi Prajakta, >> Thanks, I was able to generate a plan which was using indices by >> following your advise. >> However, I want to group on one key and count the others >> "select count(id2) from table_t1 group by id1", would that not be possible >> ? >> >> Also, could you give me an example of using indices to speed up joins? >> >> And for the semantic analysis thing, I sent the error in my first >> email, here are the lines that I modified to get it to work. Basically >> changed "_count_Of" to "_count_of" >> >> Regards, >> -- >> Rohan Monga >> >> >> >> On Wed, Sep 21, 2011 at 4:22 AM, Prajakta Kalmegh <pkalm...@gmail.com> >> wrote: >> > Hi Rohan >> > I run your queries on the same version from trunk and did not get any >> > errors. Not sure why you are getting a semantic analysis error. It would >> > be >> > good if you could send me a snapshot of the error message. >> > About the second issue of getting the same plan even if you >> > set hive.optimize.index.groupby to true or false, we do not optimize for >> > cases where the count (aggregate functions) are applied on keys other >> > than >> > the group-by keys. Since you are trying to get a count of id2 and have >> > the >> > index key (as well as the group-by key) on id1, the optimization is not >> > applied. Hence the same plan. >> > Hope this helps. Please let me know if you have any questions. >> > Regards, >> > Prajakta >> > >> > >> >> From: rohan monga <monga.ro...@gmail.com> >> >> Date: September 19, 2011 11:26:29 PM PDT >> >> To: <user@hive.apache.org> >> >> Subject: problems with indices >> >> Reply-To: <user@hive.apache.org> >> >> >> >> Hi, >> >> I have a table and index that look like >> >> >> >> <snip> >> >> CREATE TABLE table_t1(id1 int, id2 int) >> >> CREATE INDEX table_t1_idx ON TABLE table_t1(id1) AS >> >> 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED >> >> REBUILD IDXPROPERTIES("AGGREGATES"="count(id2)"); >> >> ALTER INDEX table_t1_idx ON table_t1 REBUILD; >> >> </snip> >> >> >> >> Now, I am seeing 2 problems with the query 'set >> >> hive.optimize.index.groupby=false;select count(id2) from table_t1 >> >> group by id1', >> >> firstly, >> >> >> >> <snip> >> >> FAILED: Error in semantic analysis: Line 1:11 Invalid table alias or >> >> column reference '`_count_Of_id2`': (possible column names are: id1, >> >> _bucketname, _offsets, _count_of_id2) >> >> </snip> >> >> >> >> So, I assumed that it was a typo or something, and I change all 'Of' >> >> to 'of' in the relevant places in the code. >> >> >> >> Then I ran the query again, with the option set to true and then to >> >> false, the run times were almost the same. Also 'explain' on the >> >> queries show identical query plans. >> >> I am using hive revision 1172989 from trunk >> >> >> >> Could someone help me with this? >> >> >> >> Regards, >> >> >> >> -- >> >> Rohan Monga >> > >> > >> > > >