[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2021-08-27 Thread Jimi (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17405701#comment-17405701
 ] 

Jimi commented on KUDU-2038:


When will this feature be released?

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Task
>Reporter: Yi Guolei
>Priority: Major
>  Labels: performance, roadmap-candidate
>
> Do we have a plan to add indexes on non-primary-key columns? Currently,
> Kudu does not have B-tree or inverted indexes on columns. As a result, if a
> query filters on such a column, Kudu has to scan all of the data in all
> rowsets.
> For example, for select * from table where salary > 1 and age < 40, the bloom
> filter and min/max index have no effect, so Kudu has to scan all of the data in
> all rowsets. If Kudu had an inverted index, the query would be much faster.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2020-11-24 Thread Andrew Wong (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238400#comment-17238400
 ] 

Andrew Wong commented on KUDU-2038:
---

There is a patch for bitmap indexing out, but I don't think it is being 
actively worked on right now: https://gerrit.cloudera.org/c/11722/. It is 
something that I have wanted to revisit, but haven't had the time to prioritize 
recently.

KUDU-3033 is another ticket that I think would be really helpful for reducing 
IO for selective predicates, but again I'm unaware of anyone working on it. If 
you're interested in picking up either feature, I'd be happy to help design and 
review.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Task
>Reporter: Yi Guolei
>Priority: Major
>  Labels: roadmap-candidate


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2020-11-23 Thread Redriver (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17237257#comment-17237257
 ] 

Redriver commented on KUDU-2038:


What is the current status? This would be a very useful feature for fast scans.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Task
>Reporter: Yi Guolei
>Priority: Major
>  Labels: roadmap-candidate


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2019-10-30 Thread LiFu He (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16963597#comment-16963597
 ] 

LiFu He commented on KUDU-2038:
---

It looks like Procella, from YouTube, supports secondary indexes and uses 
Roaring as an example.

[Procella|http://www.vldb.org/pvldb/vol12/p2022-chattopadhyay.pdf]
 * 2.2.2 Metadata Storage
 * 3.2 Data format -> Supports storing inverted indexes.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Task
>Reporter: Yi Guolei
>Priority: Major
>  Labels: roadmap-candidate


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2019-10-14 Thread Adar Dembo (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16951174#comment-16951174
 ] 

Adar Dembo commented on KUDU-2038:
--

bq. Is this issue a duplicate of KUDU-2613?

Yes. I closed that JIRA since it's newer than this one; when we find 
duplicates, we try to keep the oldest (or at least the most informative) of the 
two JIRAs.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Task
>Reporter: Yi Guolei
>Priority: Major
>  Labels: roadmap-candidate


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2019-10-14 Thread Finch Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16950792#comment-16950792
 ] 

Finch Jiang commented on KUDU-2038:
---

Is this issue a duplicate of 
[KUDU-2613|https://issues.apache.org/jira/browse/KUDU-2613]?

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei
>Priority: Major


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-10-26 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221172#comment-16221172
 ] 

Todd Lipcon commented on KUDU-2038:
---

Sure, I think something like BRIN could also work. Again, though, the design 
would have to account for updates potentially invalidating (or updating) 
indexes.
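
For readers unfamiliar with BRIN, the idea can be sketched roughly as follows. This is a toy Python illustration, not Kudu or PostgreSQL code, and the class and method names are hypothetical. It keeps a (min, max) summary per block of rows, skips blocks whose range cannot satisfy the predicate, and, as one possible way to handle the update concern raised above, drops a block's summary when a row in it changes so that the block is always read.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class BlockRangeIndex:
    """Toy BRIN-style index: one (min, max) summary per fixed-size block."""
    values: List[int]
    block_size: int = 4

    def __post_init__(self) -> None:
        # A summary of None means "unknown range": the block must be read.
        self.summaries: List[Optional[Tuple[int, int]]] = []
        for start in range(0, len(self.values), self.block_size):
            block = self.values[start:start + self.block_size]
            self.summaries.append((min(block), max(block)))

    def update(self, row: int, value: int) -> None:
        """Change a value and invalidate the summary of the block holding it."""
        self.values[row] = value
        self.summaries[row // self.block_size] = None

    def scan_greater_than(self, threshold: int) -> Tuple[List[int], int]:
        """Return (matching row ids, number of blocks actually read)."""
        matches: List[int] = []
        blocks_read = 0
        for block_no, summary in enumerate(self.summaries):
            # Skip the block only if its known maximum cannot match.
            if summary is not None and summary[1] <= threshold:
                continue
            blocks_read += 1
            start = block_no * self.block_size
            end = min(start + self.block_size, len(self.values))
            for row in range(start, end):
                if self.values[row] > threshold:
                    matches.append(row)
        return matches, blocks_read
```

Such summaries only help when values are clustered within blocks; on randomly ordered data almost every block's range overlaps the predicate and little is skipped.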

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-10-25 Thread HeLifu (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16220011#comment-16220011
 ] 

HeLifu commented on KUDU-2038:
--

How about BRIN? [BRIN 
indexes|https://wiki.postgresql.org/wiki/What's_new_in_PostgreSQL_9.5#BRIN_Indexes]

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-06-16 Thread Yi Guolei (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052465#comment-16052465
 ] 

Yi Guolei commented on KUDU-2038:
-

I understand the read path. But why not apply the mutations to the bitmap index 
to produce a new index, and then use the new bitmap index to evaluate the 
predicate? I think that would be more efficient. 

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-06-16 Thread Andrew Wong (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052063#comment-16052063
 ] 

Andrew Wong commented on KUDU-2038:
---

I think the above placement (storing the index as a part of the base data) 
makes sense.

Applying mutations after reading might be tricky though. Currently, if there 
are mutations, 1) we read a block of rows from disk, 2) then apply deltas to 
that block, and 3) finally evaluate the predicate against each row. The benefit 
with a bitmap index is that we can avoid 1) and do 3) without reading the row 
data into memory, and at the moment, 2) _must_ come before 3). Invalidating the 
index in the presence of mutations, as Todd mentioned, is probably the simplest 
solution.
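
To make the ordering problem concrete, here is a minimal Python sketch. It is not Kudu code: the function names, the dict-based rows and deltas, and the use of Python integers as bitmaps are all illustrative assumptions. It contrasts the three-step scan described above with evaluating the predicate purely from per-column bitmaps built over the base data.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Row = Dict[str, int]


def scan_with_deltas(base_rows: List[Row],
                     deltas: Dict[int, Row],
                     predicate: Callable[[Row], bool]) -> List[int]:
    """The scan path as described: 1) read, 2) apply deltas, 3) evaluate."""
    matches = []
    for row_id, base in enumerate(base_rows):       # 1) read base data
        row = {**base, **deltas.get(row_id, {})}    # 2) apply deltas
        if predicate(row):                          # 3) evaluate predicate
            matches.append(row_id)
    return matches


def build_bitmap_index(base_rows: List[Row], column: str) -> Dict[int, int]:
    """Map each distinct value to an int whose bit i is set if row i has it."""
    index: Dict[int, int] = defaultdict(int)
    for row_id, row in enumerate(base_rows):
        index[row[column]] |= 1 << row_id
    return dict(index)


def bitmap_where(index: Dict[int, int],
                 value_matches: Callable[[int], bool]) -> int:
    """OR together the bitmaps of every value that satisfies the condition."""
    result = 0
    for value, bitmap in index.items():
        if value_matches(value):
            result |= bitmap
    return result


def bitmap_to_row_ids(bitmap: int) -> List[int]:
    """List the positions of the set bits, lowest first."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

With no deltas, ANDing `bitmap_where` results for two columns gives the same row ids as `scan_with_deltas` without touching the row data. Once a delta changes an indexed column, the bitmaps built from the base data can return rows that no longer match, which is why step 2 has to come before step 3 unless the index is invalidated or patched.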

Another thing to keep in mind is that building the bitmap index isn't trivial 
since it essentially requires reading the entire table. If starting from a 
brand new cluster, this wouldn't be so bad. If data already exists in the table 
and an index is created, there will likely be some time between the initial 
call to CreateBitmapIndex() and the bitmap actually being ready. If the 
bitmap index can be specified at tablet creation, this probably isn't an issue.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-06-16 Thread Yi Guolei (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16051545#comment-16051545
 ] 

Yi Guolei commented on KUDU-2038:
-

What about this approach:
1. Add an index (maybe a bitmap index) only on the base cfile, and keep the 
bitmap immutable. 
2. While reading, apply mutations to the index according to the query timestamp.
3. Use the resulting index to query the data.
Most of the time there are only a few mutations, so step 2 is very quick. 
We could also cache the index to accelerate subsequent queries.
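
The three steps above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `(timestamp, row_id, new_value)` mutation tuples, the function names, and Python integers standing in for bitmaps. None of it is Kudu's actual delta format.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical delta representation: (timestamp, row_id, new_value).
Mutation = Tuple[int, int, int]


def build_base_index(values: List[int]) -> Dict[int, int]:
    """Step 1: immutable bitmap index over the base data (value -> bitmap)."""
    index: Dict[int, int] = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def index_as_of(base_index: Dict[int, int],
                mutations: List[Mutation],
                query_ts: int) -> Dict[int, int]:
    """Step 2: patched copy of the index with mutations up to query_ts."""
    patched = dict(base_index)  # the base index itself is never modified
    for ts, row_id, new_value in sorted(mutations):
        if ts > query_ts:
            break
        bit = 1 << row_id
        # Clear the row from whichever value currently holds it. A real
        # implementation would know the old value instead of checking all.
        for value in list(patched):
            patched[value] &= ~bit
        patched[new_value] = patched.get(new_value, 0) | bit
    return {value: bits for value, bits in patched.items() if bits}


def rows_matching(index: Dict[int, int],
                  value_matches: Callable[[int], bool]) -> List[int]:
    """Step 3: evaluate a predicate using only the (patched) index."""
    bitmap = 0
    for value, bits in index.items():
        if value_matches(value):
            bitmap |= bits
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]
```

The patched index returned by `index_as_of` is the object that could be cached per query timestamp. The cost of step 2 grows with the number of mutations, so this only pays off under the stated assumption that mutations are few.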

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-06-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045297#comment-16045297
 ] 

Todd Lipcon commented on KUDU-2038:
---

I think bitmap indexing is probably more appropriate than b-tree for typical 
data warehouse applications.

One thing to keep in mind, though, is that the design would need to either 
support mutability (which is a bit tricky on bitmap indexes) or incorporate 
some kind of feature that lets users mark a column as immutable, and only 
allow indexing immutable columns. Another option would be to invalidate indexes 
after mutations in a given cfile, so that some portion of the index might be 
"inactive" at a given time due to mutations, but the assumption is that most 
portions would be active.
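
The invalidation option might be sketched like this in Python. It is a toy model under loud assumptions: `IndexedChunk` is a hypothetical stand-in for one cfile, and mutating values in place is a simplification that does not reflect how Kudu stores deltas. Each chunk answers from its bitmap index while the index is active and falls back to scanning its values once a mutation has invalidated it.

```python
from typing import Callable, Dict, List, Tuple


class IndexedChunk:
    """Toy stand-in for one cfile: column values plus a bitmap index."""

    def __init__(self, values: List[int]) -> None:
        self.values = list(values)
        self.index: Dict[int, int] = {}
        for pos, value in enumerate(self.values):
            self.index[value] = self.index.get(value, 0) | (1 << pos)
        self.index_active = True

    def mutate(self, pos: int, new_value: int) -> None:
        """Apply a mutation and mark this chunk's index as inactive."""
        self.values[pos] = new_value
        self.index_active = False

    def matching_positions(self,
                           value_matches: Callable[[int], bool]) -> List[int]:
        if self.index_active:
            # Fast path: combine bitmaps without looking at the values.
            bitmap = 0
            for value, bits in self.index.items():
                if value_matches(value):
                    bitmap |= bits
            return [i for i in range(len(self.values)) if (bitmap >> i) & 1]
        # Fallback: the index is stale, so scan the values directly.
        return [i for i, v in enumerate(self.values) if value_matches(v)]


def scan(chunks: List[IndexedChunk],
         value_matches: Callable[[int], bool]
         ) -> Tuple[List[Tuple[int, int]], int]:
    """Return (chunk number, position) hits and how many chunks used an index."""
    hits: List[Tuple[int, int]] = []
    indexed = 0
    for number, chunk in enumerate(chunks):
        indexed += chunk.index_active
        hits.extend((number, p) for p in chunk.matching_positions(value_matches))
    return hits, indexed
```

Results stay correct either way; only the share of chunks served from an index shrinks as mutations accumulate, which matches the assumption that most portions remain active.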

I don't know of anyone currently working on this, but long term it is an 
appealing idea.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-06-09 Thread Yi Guolei (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044104#comment-16044104
 ] 

Yi Guolei commented on KUDU-2038:
-

I don't think partitioning solves the problem. Partitioning splits data 
across different tablets. What I am asking for is a way to speed up searches 
within a single tablet.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei


[jira] [Commented] (KUDU-2038) Add b-tree or inverted index on value field

2017-06-08 Thread lifu (JIRA)

[ 
https://issues.apache.org/jira/browse/KUDU-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044018#comment-16044018
 ] 

lifu commented on KUDU-2038:


Maybe range partitioning on salary and age could effectively reduce the amount 
of data scanned.

> Add b-tree or inverted index on value field
> ---
>
> Key: KUDU-2038
> URL: https://issues.apache.org/jira/browse/KUDU-2038
> Project: Kudu
>  Issue Type: Wish
>Reporter: Yi Guolei