Re: [Proposal] data filtering and aggregation with tags

2022-06-24 Thread Jialin Qiao
Hi,

Thanks for proposing this! Extending tags to support data queries is something
we have wanted for a long time :)

There are some aspects to consider:

We attach tags to timeseries, whereas InfluxDB and other TSDBs attach tags to
devices. This difference may not be a problem for insertion, since tags on
timeseries can cover tags on devices.
However, it does affect queries...
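A minimal Python sketch of the point above, under stated assumptions: device-level tags (the InfluxDB-style model) can be emulated at insertion time by copying the same tags onto every timeseries of a device, but once tags live on individual timeseries they can diverge, and a device-aligned view then has no single tag value to show. The paths, tag values, and both helper functions are hypothetical and are not IoTDB APIs.

```python
from typing import Dict, List, Optional

Tags = Dict[str, str]


def tag_series_like_device(device: str, measurements: List[str], device_tags: Tags) -> Dict[str, Tags]:
    """Emulate device-level tags by giving every timeseries of the device its own copy."""
    return {f"{device}.{m}": dict(device_tags) for m in measurements}


def device_view_tags(series_tags: Dict[str, Tags]) -> Optional[Tags]:
    """Return the tags shared by all timeseries of one device, or None if they disagree.

    A device-aligned result row has one value per tag column, so it is only
    well defined when every timeseries of the device carries identical tags.
    """
    tag_sets = list(series_tags.values())
    if not tag_sets:
        return None
    first = tag_sets[0]
    return dict(first) if all(t == first for t in tag_sets[1:]) else None


if __name__ == "__main__":
    # Hypothetical device and tag values.
    series = tag_series_like_device("root.cn.zone_a.host1", ["cpu_util", "mem_util"], {"OS": "Ubuntu 20.04"})
    print(device_view_tags(series))  # {'OS': 'Ubuntu 20.04'}

    # Per-timeseries tags can diverge, which a device-level view cannot represent directly.
    series["root.cn.zone_a.host1.mem_util"]["OS"] = "CentOS 7"
    print(device_view_tags(series))  # None
```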

For InfluxDB, the table format is
[time, tag1, tag2, tag3, field1, field2, field3]

so users can write: select tag1, tag2, field1, field2 from measurement where
tag=xx

For us, which view should we expose to users for writing SQL?

The same as InfluxDB, or this one: [time, tag1, tag2, tag3, fieldname, value]?

This is a big topic. Maybe we should prepare some example result formats to
compare.
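To make the two candidate views easier to compare, here is a small Python sketch that unpivots rows from the InfluxDB-style wide layout [time, tag1, tag2, tag3, field1, field2, field3] into the narrow layout [time, tag1, tag2, tag3, fieldname, value]. Column names and data are placeholders, and `wide_to_narrow` is a hypothetical illustration, not part of any IoTDB API.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def wide_to_narrow(rows: List[Row], tag_keys: List[str], field_keys: List[str]) -> List[Row]:
    """Unpivot [time, tags..., fields...] rows into [time, tags..., fieldname, value] rows."""
    narrow: List[Row] = []
    for row in rows:
        for field in field_keys:
            value = row.get(field)
            if value is None:
                continue  # a wide row may have no value for this field at this timestamp
            out: Row = {"time": row["time"]}
            out.update({key: row[key] for key in tag_keys})
            out["fieldname"] = field
            out["value"] = value
            narrow.append(out)
    return narrow


if __name__ == "__main__":
    # Placeholder data: one wide row with a missing field3 value.
    wide = [
        {"time": 1, "tag1": "a", "tag2": "b", "tag3": "c", "field1": 1.0, "field2": 2.0, "field3": None},
    ]
    for r in wide_to_narrow(wide, ["tag1", "tag2", "tag3"], ["field1", "field2", "field3"]):
        print(r)
```

One possible consideration when choosing between them: in the narrow layout each row corresponds to exactly one timeseries, so tags attached to individual timeseries fit without conflict, whereas the wide layout implicitly assumes that all fields in a row share the same tag values.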

Thanks,
—
Jialin Qiao
Apache IoTDB PMC


Eric Pai wrote on Fri, Jun 24, 2022 at 16:48:

> Great! Let's wake up those tags!
>
> On 2022/6/24 16:42, "Wz" wrote:
>
> Hi guys,
>
> To handle multidimensional queries, we plan to implement data
> filtering and aggregation on top of the MPP framework. Here's a description
> of the scenario:
> https://apache-iotdb.feishu.cn/docx/doxcnOfxK6kYK159X86gxBVFwts
>
> Any suggestions are welcome. Thanks, Zhong Wang


Re: [Proposal] data filtering and aggregation with tags

2022-06-24 Thread Eric Pai
Great! Let's wake up those tags!

On 2022/6/24 16:42, "Wz" wrote:

Hi guys,

To handle multidimensional queries, we plan to implement data filtering and
aggregation on top of the MPP framework. Here's a description of the scenario:
https://apache-iotdb.feishu.cn/docx/doxcnOfxK6kYK159X86gxBVFwts

Any suggestions are welcome. Thanks, Zhong Wang

[Proposal] data filtering and aggregation with tags

2022-06-24 Thread Wz
Hi guys,

To handle multidimensional queries, we plan to implement data filtering and
aggregation on top of the MPP framework. Here's a description of the scenario:
https://apache-iotdb.feishu.cn/docx/doxcnOfxK6kYK159X86gxBVFwts

Any suggestions are welcome. Thanks, Zhong Wang

Re: Data filtering and aggregation with tags

2022-06-19 Thread Xiangdong Huang
+1, this feature is useful.

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东


Eric Pai wrote on Thu, Jun 16, 2022 at 17:45:

> Good idea! If we can make use of tags not only in metadata but also in data
> queries, we can greatly enrich data analysis capabilities and help the
> business layer achieve more than before. However, since the query grammar
> may become more complicated, we should also keep ease of use in mind when
> designing the SQL.
>
> On 2022/6/16 17:39, "Wz" wrote:
>
> Hi guys,
>
> I believe using tags for data filtering and aggregation is a common
> need. Putting all the attributes into the path is not a good idea, because
> it makes the path extremely long and slows down MTree searches, so we store
> some of the attributes as tags instead. But that doesn't mean tags are
> unimportant.
>
> Let's take the following ECS management scenario as an example. IoTDB
> stores the cpu_util of each ECS instance. In addition, an ECS instance has
> static attributes such as region_id, available_zone, hostname, CPU, memory,
> storage, and OS. Since CPU, memory, and storage are numbers and OS is a
> string containing white space, they are stored as tags, while the other
> attributes are stored as levels in the path, like
> root.${region_id}.${available_zone}.${hostname}.cpu_util.
>
> Let's say some ECS instances had abnormally high cpu_util in the last
> hour, and we want to know whether the problem is caused by a certain OS
> version. The query could look like this:
>
>   SELECT OS, COUNT(cpu_util) FROM root.** WHERE cpu_util > 95.0
>   GROUP BY TAG OS ALIGN BY DEVICE
>
> With the ability to filter and aggregate data by tags, IoTDB can become
> more powerful for analytical processing. What do you think?
>
> Any suggestions are welcome :D
>
> Zhong Wang,
>
> Alibaba Group


Re: Data filtering and aggregation with tags

2022-06-16 Thread Eric Pai
Good idea! If we can make use of tags not only in metadata but also in data
queries, we can greatly enrich data analysis capabilities and help the
business layer achieve more than before. However, since the query grammar may
become more complicated, we should also keep ease of use in mind when
designing the SQL.

On 2022/6/16 17:39, "Wz" wrote:

Hi guys,

I believe using tags for data filtering and aggregation is a common need.
Putting all the attributes into the path is not a good idea, because it makes
the path extremely long and slows down MTree searches, so we store some of the
attributes as tags instead. But that doesn't mean tags are unimportant.

Let's take the following ECS management scenario as an example. IoTDB stores
the cpu_util of each ECS instance. In addition, an ECS instance has static
attributes such as region_id, available_zone, hostname, CPU, memory, storage,
and OS. Since CPU, memory, and storage are numbers and OS is a string
containing white space, they are stored as tags, while the other attributes
are stored as levels in the path, like
root.${region_id}.${available_zone}.${hostname}.cpu_util.

Let's say some ECS instances had abnormally high cpu_util in the last hour,
and we want to know whether the problem is caused by a certain OS version. The
query could look like this:

  SELECT OS, COUNT(cpu_util) FROM root.** WHERE cpu_util > 95.0
  GROUP BY TAG OS ALIGN BY DEVICE

With the ability to filter and aggregate data by tags, IoTDB can become more
powerful for analytical processing. What do you think?

Any suggestions are welcome :D

Zhong Wang,

Alibaba Group



Data filtering and aggregation with tags

2022-06-16 Thread Wz
Hi guys,

I believe using tags for data filtering and aggregation is a common need.
Putting all the attributes into the path is not a good idea, because it makes
the path extremely long and slows down MTree searches, so we store some of the
attributes as tags instead. But that doesn't mean tags are unimportant.

Let's take the following ECS management scenario as an example. IoTDB stores
the cpu_util of each ECS instance. In addition, an ECS instance has static
attributes such as region_id, available_zone, hostname, CPU, memory, storage,
and OS. Since CPU, memory, and storage are numbers and OS is a string
containing white space, they are stored as tags, while the other attributes
are stored as levels in the path, like
root.${region_id}.${available_zone}.${hostname}.cpu_util.
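A minimal Python sketch of the layout described above, using hypothetical attribute values: region_id, available_zone, and hostname become path levels, while CPU, memory, storage, and OS become tags of the cpu_util timeseries. The helper `ecs_cpu_timeseries` is hypothetical, and its whitespace check is an assumption of this sketch that mirrors the stated reason for keeping OS out of the path; it is not taken from IoTDB's path rules.

```python
from typing import Dict, Tuple

PATH_ATTRIBUTES = ("region_id", "available_zone", "hostname")
TAG_ATTRIBUTES = ("CPU", "memory", "storage", "OS")


def ecs_cpu_timeseries(instance: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Split an ECS instance's static attributes into a timeseries path and its tags."""
    levels = [instance[attr] for attr in PATH_ATTRIBUTES]
    for level in levels:
        # Assumption for this sketch: path levels must not contain whitespace.
        if any(ch.isspace() for ch in level):
            raise ValueError(f"path level {level!r} contains whitespace; store it as a tag instead")
    path = "root." + ".".join(levels) + ".cpu_util"
    tags = {attr: instance[attr] for attr in TAG_ATTRIBUTES}
    return path, tags


if __name__ == "__main__":
    # Hypothetical instance attributes.
    instance = {
        "region_id": "cn_hangzhou",
        "available_zone": "zone_h",
        "hostname": "host_001",
        "CPU": "8",
        "memory": "32",
        "storage": "100",
        "OS": "Ubuntu 20.04",
    }
    path, tags = ecs_cpu_timeseries(instance)
    print(path)  # root.cn_hangzhou.zone_h.host_001.cpu_util
    print(tags)  # {'CPU': '8', 'memory': '32', 'storage': '100', 'OS': 'Ubuntu 20.04'}
```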

Let's say some ECS instances had abnormally high cpu_util in the last hour,
and we want to know whether the problem is caused by a certain OS version. The
query could look like this:

  SELECT OS, COUNT(cpu_util) FROM root.** WHERE cpu_util > 95.0
  GROUP BY TAG OS ALIGN BY DEVICE
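One plausible reading of this query's semantics, sketched in Python: keep only points with cpu_util > 95.0 (the WHERE clause is applied before grouping), then count the remaining points per value of the OS tag across all matching timeseries. The in-memory data model, the example values, the helper `count_by_tag`, and the choice to group series that lack the tag under None are all assumptions of this sketch, not part of the proposal.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for IoTDB data: timeseries path -> (tags, [(timestamp_ms, value), ...]).
SeriesData = Dict[str, Tuple[Dict[str, str], List[Tuple[int, float]]]]


def count_by_tag(data: SeriesData, tag_key: str, threshold: float) -> Dict[Optional[str], int]:
    """COUNT of points with value > threshold, grouped by the value of tag_key.

    Filtering happens before grouping, so a tag value whose series have no
    matching points does not appear in the result, as with SQL WHERE + GROUP BY.
    """
    counts: Counter = Counter()
    for tags, points in data.values():
        matched = sum(1 for _, value in points if value > threshold)
        if matched:
            counts[tags.get(tag_key)] += matched
    return dict(counts)


if __name__ == "__main__":
    data: SeriesData = {
        "root.cn_hangzhou.zone_h.host_001.cpu_util": ({"OS": "Ubuntu 20.04"}, [(0, 97.5), (60_000, 40.0)]),
        "root.cn_hangzhou.zone_h.host_002.cpu_util": ({"OS": "Ubuntu 20.04"}, [(0, 99.0)]),
        "root.cn_hangzhou.zone_i.host_003.cpu_util": ({"OS": "CentOS 7"}, [(0, 12.0)]),
    }
    print(count_by_tag(data, "OS", 95.0))  # {'Ubuntu 20.04': 2}
```

The SQL in the proposal has no time predicate, so this sketch ignores timestamps as well; restricting the result to the last hour, as the scenario describes, would require an additional time filter.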

With the ability to filter and aggregate data by tags, IoTDB can become more
powerful for analytical processing. What do you think?

Any suggestions are welcome :D

Zhong Wang,

Alibaba Group