[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405701#comment-17405701 ]

Jimi commented on KUDU-2038:

When will this feature be released?

> Add b-tree or inverted index on value field
> -------------------------------------------
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
> Issue Type: Task
> Reporter: Yi Guolei
> Priority: Major
> Labels: performance, roadmap-candidate
>
> Do we have a plan to add an index on arbitrary (non-primary-key) columns? Currently
> Kudu does not have a b-tree or inverted index on columns. So if a
> query filters on such a column, Kudu has to scan all the data in all
> rowsets.
> For example, for select * from table where salary > 1 and age < 40, the bloom
> filter or min/max index will have no effect, and Kudu has to scan all the data in
> all rowsets. But if Kudu had an inverted index, the query would be much faster.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238400#comment-17238400 ]

Andrew Wong commented on KUDU-2038:

There is a patch for bitmap indexing out, but I don't think it is being actively worked on right now: https://gerrit.cloudera.org/c/11722/. It is something that I have wanted to revisit, but haven't had the time to prioritize recently.

KUDU-3033 is another ticket that I think would be really helpful for reducing IO for selective predicates, but again I'm unaware of anyone working on it.

If you're interested in picking up either feature, I'd be happy to help design and review.
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237257#comment-17237257 ]

Redriver commented on KUDU-2038:

What is the current status? This would be a very useful feature for fast scans.
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963597#comment-16963597 ]

LiFu He commented on KUDU-2038:

It looks like Procella, from YouTube, supports secondary indexes and cites Roaring as an example.

[Procella|http://www.vldb.org/pvldb/vol12/p2022-chattopadhyay.pdf]
* 2.2.2 Metadata Storage
* 3.2 Data format -> Supports storing inverted indexes.
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951174#comment-16951174 ]

Adar Dembo commented on KUDU-2038:

bq. Is this issue duplicate with KUDU-2613 ?

Yes. I closed that JIRA since it's newer than this one; when we find duplicates, we try to keep the oldest (or at least the most informative) of the two JIRAs.
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950792#comment-16950792 ]

Finch Jiang commented on KUDU-2038:

Is this issue a duplicate of [KUDU-2613|https://issues.apache.org/jira/browse/KUDU-2613]?
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221172#comment-16221172 ]

Todd Lipcon commented on KUDU-2038:

Sure, I think something like BRIN could also work. Again, though, the design would have to account for updates potentially invalidating (or updating) indexes.
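To make the BRIN idea concrete, here is a minimal sketch of a block range index that keeps a min/max summary per range of rows and widens the summary when a row is updated. It is an illustration only: `BlockRangeIndex`, `on_update`, and `candidate_ranges` are hypothetical names, not Kudu or PostgreSQL APIs, and widening on update is just one of the two options (updating versus invalidating) mentioned in the comment.

```python
class BlockRangeIndex:
    """Hypothetical BRIN-style index: one [min, max] summary per range of rows."""

    def __init__(self, values, rows_per_range):
        self.rows_per_range = rows_per_range
        self.summaries = []
        for start in range(0, len(values), rows_per_range):
            chunk = values[start:start + rows_per_range]
            self.summaries.append([min(chunk), max(chunk)])

    def on_update(self, row, new_value):
        """Widen the covering summary so it stays conservative after an update."""
        summary = self.summaries[row // self.rows_per_range]
        summary[0] = min(summary[0], new_value)
        summary[1] = max(summary[1], new_value)

    def candidate_ranges(self, low, high):
        """Return the ranges that may contain a value in [low, high]; the rest can be skipped."""
        return [
            i for i, (range_min, range_max) in enumerate(self.summaries)
            if range_max >= low and range_min <= high
        ]
```

A summary widened this way never produces false negatives, but it only prunes well when column values correlate with physical row order, so repeated updates gradually make it less selective until it is rebuilt.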
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220011#comment-16220011 ]

HeLifu commented on KUDU-2038:

How about BRIN? https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#BRIN_Indexes
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052465#comment-16052465 ]

Yi Guolei commented on KUDU-2038:

I understand the read process. But why not apply the mutations to the bitmap index to get a new index, and then use the new bitmap index to evaluate the predicate? I think that would be more efficient.
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052063#comment-16052063 ]

Andrew Wong commented on KUDU-2038:

I think the above placement (storing the index as a part of the base data) makes sense. Applying mutations after reading might be tricky though. Currently, if there are mutations, 1) we read a block of rows from disk, 2) then apply deltas to that block, and 3) finally evaluate the predicate against each row. The benefit with a bitmap index is that we can avoid 1) and do 3) without reading the row data into memory, and at the moment, 2) _must_ come before 3). Invalidating the index in the presence of mutations, as Todd mentioned, is probably the simplest solution.

Another thing to keep in mind is that building the bitmap index isn't trivial since it essentially requires reading the entire table. If starting from a brand new cluster, this wouldn't be so bad. If data already exists in the table and an index is created, there will likely be some time between the initial call to the CreateBitmapIndex() and the bitmap actually being ready. If the bitmap index can be specified at tablet creation, this probably isn't an issue.
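The ordering constraint in the comment above (deltas must be applied before the predicate is evaluated) can be illustrated with a small sketch. This is a toy model, not Kudu code: a block is a plain list of column values, a delta is a hypothetical `(row_index, new_value)` pair, and both function names are invented for the example.

```python
def scan_with_deltas(base_block, deltas, predicate):
    """Steps 1-3 from the comment: read a block, apply deltas, then evaluate the predicate."""
    rows = list(base_block)            # 1) read a block of rows (copied so the base stays intact)
    for row_idx, new_value in deltas:  # 2) apply deltas to that block
        rows[row_idx] = new_value
    return [i for i, value in enumerate(rows) if predicate(value)]  # 3) evaluate the predicate


def scan_predicate_first(base_block, deltas, predicate):
    """Incorrect ordering, shown only for contrast: the predicate sees stale base values."""
    return [i for i, value in enumerate(base_block) if predicate(value)]
```

With base values `[30, 45, 38, 50]`, deltas `[(1, 35), (2, 41)]`, and the predicate `age < 40`, the first function selects rows 0 and 1 while the second selects rows 0 and 2. An index built only over the base data answers like the second function, which is why it has to be either patched or invalidated when mutations exist.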
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051545#comment-16051545 ]

Yi Guolei commented on KUDU-2038:

What about this approach:
1. Add an index (maybe a bitmap index) only on the base cfile; the bitmap is immutable.
2. While reading, apply mutations to the index according to the query timestamp.
3. Use the new index to query the data.

Most of the time there are only a few mutations, so step 2 is very quick. We could also cache the index to accelerate upcoming queries.
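The three steps proposed above can be sketched as follows. This is a toy model under stated assumptions, not a Kudu design: bitmaps are Python integers used as bitsets, mutations are hypothetical `(timestamp, row, new_value)` tuples sorted by timestamp, and all function names are invented for the example.

```python
def build_base_bitmaps(base_column):
    """Step 1: build one immutable bitset (a Python int) per distinct value in the base data."""
    bitmaps = {}
    for row, value in enumerate(base_column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def bitmap_at(base_bitmaps, mutations, value, snapshot_ts):
    """Step 2: derive the bitset for `value` as of `snapshot_ts` without touching the base."""
    bits = base_bitmaps.get(value, 0)
    for ts, row, new_value in mutations:  # assumed sorted by timestamp
        if ts > snapshot_ts:
            break
        if new_value == value:
            bits |= 1 << row      # the row now holds the value
        else:
            bits &= ~(1 << row)   # the row no longer holds the value
    return bits


def rows_of(bits):
    """Step 3: turn the patched bitset into matching row ids."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]
```

The cost of step 2 grows with the number of mutations up to the snapshot, which matches the comment's assumption that it is cheap when mutations are few; caching the patched bitset per snapshot would be an optional addition on top of this sketch.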
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045297#comment-16045297 ]

Todd Lipcon commented on KUDU-2038:

I think bitmap indexing is probably more appropriate for typical datawarehouse applications rather than b-tree. One thing to keep in mind, though, is that the design would need to either support mutability (which is a bit tricky on bitmap indexes) or incorporate some kind of feature that users could mark a column as immutable, and only allow indexing immutable columns.

Another option would be to invalidate indexes after mutations in a given cfile, so that some portion of the index might be "inactive" at a given time due to mutations, but the assumption is that most portions would be active.

I don't know of anyone currently working on this, but long term it is an appealing idea.
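The invalidation option described above can be sketched with a bitmap index split into sections, where a mutation marks its section inactive and queries fall back to scanning only that section. Everything here is hypothetical (the class, the method names, and the fixed `BLOCK_SIZE` granularity); the comment itself suggests invalidating per cfile.

```python
BLOCK_SIZE = 4  # rows per index section, chosen arbitrarily for illustration


class SectionedBitmapIndex:
    """Hypothetical equality bitmap index whose sections can be invalidated independently."""

    def __init__(self, column):
        self.column = list(column)  # stands in for the stored column data
        self.num_sections = (len(self.column) + BLOCK_SIZE - 1) // BLOCK_SIZE
        self.active = [True] * self.num_sections
        self.bitmaps = {}  # value -> list with one bitset (Python int) per section
        for row, value in enumerate(self.column):
            sections = self.bitmaps.setdefault(value, [0] * self.num_sections)
            sections[row // BLOCK_SIZE] |= 1 << (row % BLOCK_SIZE)

    def update(self, row, new_value):
        """Apply a mutation and mark the covering section inactive instead of rewriting it."""
        self.column[row] = new_value
        self.active[row // BLOCK_SIZE] = False

    def rows_equal(self, value):
        """Use the index for active sections and scan the data for inactive ones."""
        matches = []
        for section in range(self.num_sections):
            start = section * BLOCK_SIZE
            if self.active[section]:
                bits = self.bitmaps[value][section] if value in self.bitmaps else 0
                matches.extend(start + i for i in range(BLOCK_SIZE) if bits >> i & 1)
            else:
                end = min(start + BLOCK_SIZE, len(self.column))
                matches.extend(r for r in range(start, end) if self.column[r] == value)
        return matches
```

The sketch stays correct after any number of updates because inactive sections are always scanned; the index only pays off while most sections remain active, which is the assumption stated in the comment.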
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044104#comment-16044104 ]

Yi Guolei commented on KUDU-2038:

I don't think partitioning solves the problem. Partitioning splits data across different tablets. What I am asking for is a way to speed up searches within a single tablet.
[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field
[ https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044018#comment-16044018 ]

lifu commented on KUDU-2038:

Maybe range partitioning on salary and age could effectively reduce the amount of data scanned.