[https://issues.apache.org/jira/browse/IOTDB-544?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17051779#comment-17051779]

Zesong Sun commented on IOTDB-544:
----------------------------------

Hi, 

I'm quite interested in optimizing the index and summary information, and I'd
like to do my best to contribute to it.

Thanks,
Sun Zesong

> Apache IoTDB integration with more powerful aggregation index
> -------------------------------------------------------------
>
>                 Key: IOTDB-544
>                 URL: https://issues.apache.org/jira/browse/IOTDB-544
>             Project: Apache IoTDB
>          Issue Type: Wish
>          Components: Core/Engine
>            Reporter: Xiangdong Huang
>            Priority: Major
>              Labels: IoTDB, gsoc2020, mentor
>
> IoTDB is a highly efficient time series database that supports high-speed
> query processing, including aggregation queries.
> Currently, IoTDB pre-calculates aggregation info, also called summary info
> (sum, count, max_time, min_time, max_value, min_value), for each page
> and each Chunk. This info is helpful for aggregation operations and some
> query filters. For example, if the query filter is value > 10 and the max
> value of a page is 9, we can skip the page. As another example, if the query
> is select max(value) and the max values of 3 chunks are 5, 10 and 20, then
> max(value) is 20.
> However, there are two drawbacks:
> 1. The summary info only reduces the amount of data that needs to be scanned
> to 1/k (supposing each page has k data points), so the time complexity is
> still O(N). If we store long historical data, e.g., 2 years of data sampled
> at 500 kHz (about 3.2 × 10^13 points), an aggregation operation may still be
> time-consuming. So a tree-based index that reduces the time complexity from
> O(N) to O(log N) is a good choice. Some basic ideas have been published in
> [1], but that approach can only handle data with a fixed frequency. So
> improving it and implementing it in IoTDB is a worthwhile goal.
> 2. The summary info cannot help evaluate a query like where value > 8 when
> the max value is 10. If we enrich the summary info, e.g., by storing a
> histogram of the data, we can use the histogram to estimate how many points
> the query will return.
> This proposal is mainly about adding an index to speed up aggregation
> queries. Beyond that, making the summary info more useful would be a
> further improvement.
> Note that the premise is that insertion speed must not slow down
> too much!
> You should know:
>  • IoTDB query process
>  • TsFile structure and organization
>  • Basic index knowledge
>  • Java 
> difficulty: Major
>  mentors:
>  h...@apache.org
> Reference:
> [1] https://www.sciencedirect.com/science/article/pii/S0306437918305489

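The per-page and per-Chunk summary info described in the issue can be sketched in Java as below. `PageSummary` and `SummaryQueries` are hypothetical names for illustration. They are not IoTDB's real TsFile statistics API. The sketch shows only the two uses the issue mentions: skipping a page for a `value > x` filter, and answering `max(value)` from chunk summaries without reading points.

```java
import java.util.List;

// Hypothetical per-page (or per-chunk) summary holding the six fields listed in the issue.
// An illustration only, not IoTDB's actual TsFile statistics classes.
final class PageSummary {
    final long count;
    final double sum;
    final long minTime;
    final long maxTime;
    final double minValue;
    final double maxValue;

    private PageSummary(long count, double sum, long minTime, long maxTime,
                        double minValue, double maxValue) {
        this.count = count;
        this.sum = sum;
        this.minTime = minTime;
        this.maxTime = maxTime;
        this.minValue = minValue;
        this.maxValue = maxValue;
    }

    // Computes the summary once, at write time, from a page's points.
    // Assumes times.length == values.length and at least one point.
    static PageSummary of(long[] times, double[] values) {
        double sum = 0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        long minT = Long.MAX_VALUE;
        long maxT = Long.MIN_VALUE;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            min = Math.min(min, values[i]);
            max = Math.max(max, values[i]);
            minT = Math.min(minT, times[i]);
            maxT = Math.max(maxT, times[i]);
        }
        return new PageSummary(values.length, sum, minT, maxT, min, max);
    }

    // For a filter "value > threshold": if even the max value fails it, skip the page unread.
    boolean canSkipGreaterThan(double threshold) {
        return maxValue <= threshold;
    }
}

final class SummaryQueries {
    // select max(value): answered from the summaries alone, without decoding any points.
    static double maxValue(List<PageSummary> summaries) {
        double max = Double.NEGATIVE_INFINITY;
        for (PageSummary s : summaries) {
            max = Math.max(max, s.maxValue);
        }
        return max;
    }
}
```

With the issue's numbers, a page whose max is 9 is skipped for `value > 10`. Chunk maxima of 5, 10 and 20 give max(value) = 20.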

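For drawback 1, one generic way to avoid touching every page is a segment tree built over the per-page summaries. Each internal node stores the merged summary of its children, so an aggregation over any contiguous run of P pages visits O(log P) nodes instead of all P pages. This is a standard segment tree, not the method of [1]. The names `PageAgg` and `PageSegmentTree` are hypothetical. The sketch assumes that a time range has already been mapped to page indices elsewhere, e.g. by binary search on page start times.

```java
// Hypothetical merged summary of one page or of a run of adjacent pages.
final class PageAgg {
    static final PageAgg EMPTY = new PageAgg(0, 0.0, Double.NEGATIVE_INFINITY);

    final long count;
    final double sum;
    final double max;

    PageAgg(long count, double sum, double max) {
        this.count = count;
        this.sum = sum;
        this.max = max;
    }

    // Summaries of adjacent ranges combine without looking at raw points.
    PageAgg merge(PageAgg other) {
        return new PageAgg(count + other.count, sum + other.sum, Math.max(max, other.max));
    }
}

// Standard array-based segment tree over per-page summaries (assumes at least one page).
final class PageSegmentTree {
    private final int n;
    private final PageAgg[] tree;

    PageSegmentTree(PageAgg[] pages) {
        n = pages.length;
        tree = new PageAgg[4 * n];
        build(1, 0, n - 1, pages);
    }

    private void build(int node, int lo, int hi, PageAgg[] pages) {
        if (lo == hi) {
            tree[node] = pages[lo];
            return;
        }
        int mid = (lo + hi) >>> 1;
        build(2 * node, lo, mid, pages);
        build(2 * node + 1, mid + 1, hi, pages);
        tree[node] = tree[2 * node].merge(tree[2 * node + 1]);
    }

    // Aggregate over pages from..to, inclusive.
    PageAgg query(int from, int to) {
        return query(1, 0, n - 1, from, to);
    }

    private PageAgg query(int node, int lo, int hi, int from, int to) {
        if (to < lo || hi < from) {
            return PageAgg.EMPTY;            // no overlap with the requested range
        }
        if (from <= lo && hi <= to) {
            return tree[node];               // fully covered: reuse the stored summary
        }
        int mid = (lo + hi) >>> 1;
        return query(2 * node, lo, mid, from, to)
                .merge(query(2 * node + 1, mid + 1, hi, from, to));
    }
}
```

This static version is rebuilt from scratch. Since the issue requires that insertion speed not suffer too much, a real index would need an append-friendly variant that only updates the path from a new leaf to the root.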

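For drawback 2, here is a minimal sketch of how a per-page histogram could estimate how many points satisfy `value > threshold`. It assumes a hypothetical equi-width histogram (`ValueHistogram`) and values spread uniformly inside each bucket. The result is only an estimate. An exact count still requires reading the page.

```java
// Hypothetical equi-width histogram for one page's values.
// Bucket i covers [min + i*width, min + (i+1)*width); the page max lands in the last bucket.
final class ValueHistogram {
    private final double min;
    private final double width;
    private final long[] bucketCounts;

    ValueHistogram(double min, double max, int buckets, double[] values) {
        this.min = min;
        this.width = (max - min) / buckets;
        this.bucketCounts = new long[buckets];
        for (double v : values) {
            bucketCounts[bucketOf(v)]++;
        }
    }

    private int bucketOf(double v) {
        if (width == 0) {
            return 0;                        // all values are equal
        }
        int b = (int) ((v - min) / width);
        return Math.min(Math.max(b, 0), bucketCounts.length - 1);
    }

    // Estimated number of points with value > threshold, assuming values are
    // spread uniformly inside each bucket.
    double estimateGreaterThan(double threshold) {
        double estimate = 0;
        for (int i = 0; i < bucketCounts.length; i++) {
            double lo = min + i * width;
            double hi = lo + width;
            if (threshold < lo) {
                estimate += bucketCounts[i];                            // whole bucket qualifies
            } else if (threshold < hi) {
                estimate += bucketCounts[i] * (hi - threshold) / width; // partial overlap
            }
        }
        return estimate;
    }
}
```

Take the values 0, 1, ..., 10 in 5 buckets of width 2. Here `estimateGreaterThan(8)` returns 3.0, while the true count is 2. The bucket [8, 10] also holds the point 8 itself, and the uniform assumption cannot separate it out. Finer buckets trade extra storage per page for accuracy.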
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
