Re: Flush function in cluster

2022-05-26 Thread Yuhua Ren
+1


Best,
Yuhua Ren


YuhuaRen
2452431...@qq.com

----
"dev"

https://issues.apache.org/jira/browse/IOTDB-3099

Jialin Qiao
Apache IoTDB PMC

Re: Flush function in cluster

2022-05-26 Thread Haonan Hou
+1 

Best,
Haonan Hou

> On May 26, 2022, at 10:07 PM, Jialin Qiao  wrote:
> 
> Hi,
> 
> We need to support specifying no sg. How about:
> 
> FLUSH [(,)*] [ON (LOCAL|CLUSTER)]
> 
> Some examples:
> 
> flush root.sg1【flush root.sg1 on current datanode】
> flush root.sg1 on local 【flush root.sg1 on current datanode】
> flush root.sg1 on cluster  【flush root.sg1 on all datanodes】
> flush on cluster 【flush all sgs on all datanodes】
> flush on local  【flush all sgs on current datanode】
> flush 【flush all sgs on current datanode】
> 
> In the standalone version, "on cluster" will be rejected.
> 
> Thanks,
> —
> Jialin Qiao
> Apache IoTDB PMC
> 
> 
> Xiangdong Huang wrote on Mon, May 23, 2022 at 21:25:
> 
>> OK... SQL should look like a complete sentence..  So, how about "FLUSH
>> (,)* [ON  LOCAL, CLUSTER]"
>> If [ON LOCAL] is omitted, then it just flushes locally.
>> 
>> ---
>> Xiangdong Huang
>> School of Software, Tsinghua University
>> 
>> 黄向东
>> 清华大学 软件学院
>> 
>> 
>> Eric Pai wrote on Mon, May 23, 2022 at 11:53:
>> 
>>> As we want to define the SQL grammar, it's not a good choice to use Unix
>>> command line style syntax.
>>> 
>>> On 2022/5/23 11:42, “Xiangdong Huang” wrote:
>>> 
>>>how about:  flush [, ] [--all-nodes] [-node ]
>>> 
>>>omitting []  means flush all sgs.
>>>--all-nodes means flush on each node
>>>-node  means flush on the given node
>>>omitting [-node ] and [--all-nodes] equals [-node 127.0.0.1]
>>>--all-nodes and -node are mutually exclusive
>>> 
>>>Best,
>>>---
>>>Xiangdong Huang
>>>School of Software, Tsinghua University
>>> 
>>> 黄向东
>>>清华大学 软件学院
>>> 
>>> 
>>>Eric Pai wrote on Mon, May 23, 2022 at 11:27:
>>> 
>>>> +1. It's not necessary to provide two different syntaxes with the same
>>>> meaning. Just define the most suitable one.
>>>>
>>>> On 2022/5/23 11:22, “Haonan Hou” wrote:
>>>>
>>>> Hi,
>>>>
>>>> +1 for `FLUSH ALL` syntax.
>>>>
>>>> `FLUSH` and `FLUSH sg` are the existing syntax of the current
>>>> standalone version.
>>>> If we execute `FLUSH ALL` on standalone IoTDB, it can be equal to
>>>> the `Flush` command.
>>>> `flush cluster` sounds meaningless for standalone IoTDB.
>>>>
>>>> Best,
>>>> Haonan Hou
>>>>
>>>> > On May 23, 2022, at 11:07 AM, Jialin Qiao <qiaojia...@apache.org> wrote:
>>>> >
>>>> > Hi,
>>>> >
>>>> > Flush is a frequently used command in IoTDB, which flushes memtables
>>>> > to disk and closes all tsfiles.
>>>> >
>>>> > In the new cluster, we need to redefine this function [1].
>>>> >
>>>> > * flush: flushing current datanode
>>>> >
>>>> > * flush all/cluster: flushing all datanodes
>>>> >
>>>> > * flush sg: flush all DataRegions of a storage group
>>>> >
>>>> >
>>>> > What do you think?
>>>> >
>>>> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
>>>> >
>>>> > —
>>>> > Jialin Qiao
>>>> > Apache IoTDB PMC
>>> 
>>> 
>> 
> 



Re: Flush function in cluster

2022-05-26 Thread Jialin Qiao
Hi,

We need to support specifying no sg. How about:

FLUSH [(,)*] [ON (LOCAL|CLUSTER)]

Some examples:

flush root.sg1【flush root.sg1 on current datanode】
flush root.sg1 on local 【flush root.sg1 on current datanode】
flush root.sg1 on cluster  【flush root.sg1 on all datanodes】
flush on cluster 【flush all sgs on all datanodes】
flush on local  【flush all sgs on current datanode】
flush 【flush all sgs on current datanode】

In the standalone version, "on cluster" will be rejected.
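To make the proposed defaults concrete, here is a minimal Java sketch (not IoTDB's actual parser) of how a FLUSH statement could be resolved into a list of storage groups and a scope, following the examples above: no storage group means all storage groups, an omitted ON clause means LOCAL, and ON CLUSTER is rejected in the standalone version. The class and method names are hypothetical, the comma/whitespace tokenization is a simplification of a real grammar, and it assumes storage group paths start with `root.`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the proposed FLUSH semantics; not IoTDB's real parser.
class FlushStatementSketch {

    enum Scope { LOCAL, CLUSTER }

    static final class FlushPlan {
        final List<String> storageGroups; // empty list = all storage groups
        final Scope scope;

        FlushPlan(List<String> storageGroups, Scope scope) {
            this.storageGroups = Collections.unmodifiableList(storageGroups);
            this.scope = scope;
        }
    }

    static FlushPlan resolve(String sql, boolean standalone) {
        String s = sql.trim();
        if (s.endsWith(";")) {
            s = s.substring(0, s.length() - 1).trim();
        }
        String[] head = s.split("\\s+", 2);
        if (!head[0].equalsIgnoreCase("FLUSH")) {
            throw new IllegalArgumentException("not a FLUSH statement: " + sql);
        }
        // Simplified tokenization: storage groups separated by commas and/or spaces.
        List<String> tokens = new ArrayList<>();
        if (head.length > 1) {
            for (String t : head[1].split("[\\s,]+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
        }
        Scope scope = Scope.LOCAL; // an omitted ON clause means the current datanode
        int n = tokens.size();
        if (n >= 2 && tokens.get(n - 2).equalsIgnoreCase("ON")) {
            String target = tokens.get(n - 1).toUpperCase(Locale.ROOT);
            if (!target.equals("LOCAL") && !target.equals("CLUSTER")) {
                throw new IllegalArgumentException("expected LOCAL or CLUSTER after ON: " + target);
            }
            scope = Scope.valueOf(target);
            tokens = new ArrayList<>(tokens.subList(0, n - 2));
        }
        for (String sg : tokens) {
            if (!sg.startsWith("root.")) { // assumption: storage group paths start with "root."
                throw new IllegalArgumentException("unexpected token: " + sg);
            }
        }
        if (standalone && scope == Scope.CLUSTER) {
            throw new UnsupportedOperationException("ON CLUSTER is rejected in the standalone version");
        }
        return new FlushPlan(tokens, scope);
    }

    public static void main(String[] args) {
        String[] examples = {
            "flush root.sg1", "flush root.sg1 on local", "flush root.sg1 on cluster",
            "flush on cluster", "flush on local", "flush"
        };
        for (String sql : examples) {
            FlushPlan plan = resolve(sql, false);
            String what = plan.storageGroups.isEmpty() ? "all sgs" : String.join(", ", plan.storageGroups);
            String where = plan.scope == Scope.CLUSTER ? "all datanodes" : "current datanode";
            System.out.println(sql + " -> flush " + what + " on " + where);
        }
    }
}
```

Running `main` prints one line per example, e.g. `flush on cluster -> flush all sgs on all datanodes`, which mirrors the bracketed explanations in the proposal.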

Thanks,
—
Jialin Qiao
Apache IoTDB PMC


Xiangdong Huang wrote on Mon, May 23, 2022 at 21:25:

> OK... SQL should look like a complete sentence..  So, how about "FLUSH
>  (,)* [ON  LOCAL, CLUSTER]"
> If [ON LOCAL] is omitted, then it just flushes locally.
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> Eric Pai wrote on Mon, May 23, 2022 at 11:53:
>
> > As we want to define the SQL grammar, it's not a good choice to use Unix
> > command line style syntax.
> >
> > On 2022/5/23 11:42, “Xiangdong Huang” wrote:
> >
> > how about:  flush [, ] [--all-nodes] [-node ]
> >
> > omitting []  means flush all sgs.
> > --all-nodes means flush on each node
> > -node  means flush on the given node
> > omitting [-node ] and [--all-nodes] equals [-node 127.0.0.1]
> > --all-nodes and -node are mutually exclusive
> >
> > Best,
> > ---
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > Eric Pai wrote on Mon, May 23, 2022 at 11:27:
> >
> > > +1. It's not necessary to provide two different syntaxes with the same
> > > meaning. Just define the most suitable one.
> > >
> > > On 2022/5/23 11:22, “Haonan Hou” wrote:
> > >
> > > Hi,
> > >
> > > +1 for `FLUSH ALL` syntax.
> > >
> > > `FLUSH` and `FLUSH sg` are the existing syntax of the current
> > > standalone version.
> > > If we execute `FLUSH ALL` on standalone IoTDB, it can be equal to
> > > the `Flush` command.
> > > `flush cluster` sounds meaningless for standalone IoTDB.
> > >
> > > Best,
> > > Haonan Hou
> > >
> > > > On May 23, 2022, at 11:07 AM, Jialin Qiao <qiaojia...@apache.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Flush is a frequently used command in IoTDB, which flushes memtables
> > > > to disk and closes all tsfiles.
> > > >
> > > > In the new cluster, we need to redefine this function [1].
> > > >
> > > > * flush: flushing current datanode
> > > >
> > > > * flush all/cluster: flushing all datanodes
> > > >
> > > > * flush sg: flush all DataRegions of a storage group
> > > >
> > > >
> > > > What do you think?
> > > >
> > > > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> > > >
> > > > —
> > > > Jialin Qiao
> > > > Apache IoTDB PMC
> > >
> > >
> > >
> >
> >
>


Re: Flush function in cluster

2022-05-23 Thread Xiangdong Huang
OK... SQL should look like a complete sentence..  So, how about "FLUSH
 (,)* [ON  LOCAL, CLUSTER]"
If [ON LOCAL] is omitted, then it just flushes locally.

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Eric Pai wrote on Mon, May 23, 2022 at 11:53:

> As we want to define the SQL grammar, it's not a good choice to use Unix
> command line style syntax.
>
> On 2022/5/23 11:42, “Xiangdong Huang” wrote:
>
> how about:  flush [, ] [--all-nodes] [-node ]
>
> omitting []  means flush all sgs.
> --all-nodes means flush on each node
> -node  means flush on the given node
> omitting [-node ] and [--all-nodes] equals [-node 127.0.0.1]
> --all-nodes and -node are mutually exclusive
>
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> Eric Pai wrote on Mon, May 23, 2022 at 11:27:
>
> > +1. It's not necessary to provide two different syntaxes with the same
> > meaning. Just define the most suitable one.
> >
> > On 2022/5/23 11:22, “Haonan Hou” wrote:
> >
> > Hi,
> >
> > +1 for `FLUSH ALL` syntax.
> >
> > `FLUSH` and `FLUSH sg` are the existing syntax of the current
> > standalone version.
> > If we execute `FLUSH ALL` on standalone IoTDB, it can be equal to
> > the `Flush` command.
> > `flush cluster` sounds meaningless for standalone IoTDB.
> >
> > Best,
> > Haonan Hou
> >
> > > On May 23, 2022, at 11:07 AM, Jialin Qiao <qiaojia...@apache.org> wrote:
> > >
> > > Hi,
> > >
> > > Flush is a frequently used command in IoTDB, which flushes memtables
> > > to disk and closes all tsfiles.
> > >
> > > In the new cluster, we need to redefine this function [1].
> > >
> > > * flush: flushing current datanode
> > >
> > > * flush all/cluster: flushing all datanodes
> > >
> > > * flush sg: flush all DataRegions of a storage group
> > >
> > >
> > > What do you think?
> > >
> > > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> > >
> > > —
> > > Jialin Qiao
> > > Apache IoTDB PMC
> >
> >
> >
>
>


Re: Re: Flush function in cluster

2022-05-23 Thread Zhou Yifu
Hi all,
According to what you discussed earlier, my understanding is that this operation
should currently behave the same as in the previous version and is mainly used for
debugging? If it is mainly used for debugging, I think it is OK to redefine it
and add more detail to this operation. But if we want this operation to be a
frequently used command in the new cluster version, I recommend being very
careful and waiting for this command to become stable before releasing it. In the
previous cluster version, I remember flush had some bugs that were hard for us
to recover from. For now, maybe we can add some warning notes about this command
to the user guide and tell users to use it with caution.

Thanks,
Yifu Zhou

From: Jialin Qiao<mailto:qiaojia...@apache.org>
Sent: May 23, 2022 12:51
To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
Subject: Re: Re: Flush function in cluster

Hi,

flush could be used in the following scenarios:

1. Test the compression ratio: A user writes some data into IoTDB and wants
to get the compression ratio, so he needs to run flush to clear the wal.
2. DBA wants to debug a datanode to see if the bug is from the memtable or
TsFile.
3. When IoTDB developers write ITs (integration tests), flush helps build different cases.

As for show datanodes and similar commands, they could only be used by the root user.

Thanks,
―
Jialin Qiao
Apache IoTDB PMC


jianyun cheng wrote on Mon, May 23, 2022 at 12:10:

> Who can execute the flush operation?
>
> This is a very dangerous operation which may block data ingestion, so the
> permissions for such commands are very important: in my opinion, only the DBA
> should be allowed to execute them. The same limitation should apply to other
> similar OP commands, such as listing cluster data/config nodes, showing the cluster
> configuration, and showing the region set on some data nodes, once we have them.
> These commands are very helpful for the DBA to know the cluster status and
> should not be run by any other users.
>
> It’s better to separate such OP commands and data operation commands.
>
> --
> Jianyun Cheng
> Thanks
>
> From: Jialin Qiao<mailto:qiaojia...@apache.org>
> Sent: Monday, May 23, 2022 11:55 AM
> To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
> Subject: Re: Re: Flush function in cluster
>
> Hi,
>
> In the previous version, flush was mainly used for debugging.
> Indeed, we want to flush before shutdown to accelerate restarting;
> this could be bound into stop-server.sh.
>
> In the data region, flush could be seen as a read operation: there is no need
> for all replicas to have the same data format (WAL or TsFile), as long as
> they have the same data points.
>
> Thanks,
> ―
> Jialin Qiao
> Apache IoTDB PMC
>
>
> 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:47:
>
> > "flush can reduce memory and speed up the restart process": this assumes
> > that all copies have been flushed synchronously, so we can ensure that the
> > data files are logically consistent at this point.
> >
> > Flushing a datanode should be part of releasing resources before the node
> > is shut down (but this does not guarantee that all copies are logically
> > consistent at this point). For example, shutdownHook performs disk
> > flushing and resource release by default. If we need to provide a flush
> > command for this scenario, is it perhaps because our node shutdown
> > operation is incomplete?
> >
> > BR,
> > ---
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Xiangdong Huang
> > Sent: May 23, 2022 11:37
> > To: dev
> > Subject: Re: Flush function in cluster
> >
> > I think distinguishing between flushing one node and flushing the whole
> > cluster is meaningful.
> >
> > As you said, flush can reduce memory and speed up the restart process.
> > So what if the DBA just wants to restart one node?
> >
> > However, the default behavior can be discussed: flush on one node by
> > default or on the whole cluster by default.
> >
> > ---
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:28:
> >
> > > Sorry, I don't understand what the purpose and use of flushing current
> > > datanode is.
> > >
> > > IMO, flush all should mean that all storage groups are flushed; in
> > > other words, flush sg is a subset of flush all.
> > >
> > > For users, the distributed architecture is a black box, while SG is an
> > > exposed structure.
> > > Therefore, for cli commands, there is no need to be aware of the
>

Re: Re: Flush function in cluster

2022-05-22 Thread Jialin Qiao
Hi,

flush could be used in the following scenarios:

1. Test the compression ratio: A user writes some data into IoTDB and wants
to get the compression ratio, so he needs to run flush to clear the wal.
2. DBA wants to debug a datanode to see if the bug is from the memtable or
TsFile.
3. When IoTDB developers write ITs (integration tests), flush helps build different cases.
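Scenario 1 can be made concrete with a small Java sketch of the arithmetic. It assumes the raw size is estimated as an INT64 timestamp plus a DOUBLE value (16 bytes) per point and that flushed data sits in files ending in `.tsfile` under some data directory; the per-point size, the directory layout, and the helper names are assumptions for illustration. Before a flush, recently written points still live in the memtable and WAL, so TsFile sizes alone would not reflect everything that was written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical helper for estimating a compression ratio after FLUSH.
class CompressionRatioSketch {
    // Assumption: each point is an INT64 timestamp plus a DOUBLE value.
    static final int RAW_BYTES_PER_POINT = Long.BYTES + Double.BYTES; // 16

    // Sums the sizes of all *.tsfile files under dataDir (directory layout is an assumption).
    static long tsFileBytes(Path dataDir) throws IOException {
        try (Stream<Path> paths = Files.walk(dataDir)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".tsfile"))
                    .mapToLong(p -> {
                        try {
                            return Files.size(p);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    })
                    .sum();
        }
    }

    // Estimated raw (uncompressed) size divided by on-disk TsFile size.
    static double compressionRatio(long points, long tsFileBytes) {
        if (tsFileBytes <= 0) {
            throw new IllegalArgumentException("no TsFile data on disk yet; run flush first");
        }
        return (double) points * RAW_BYTES_PER_POINT / tsFileBytes;
    }
}
```

For example, 1,000,000 points occupying 2,000,000 bytes of TsFiles give an estimated ratio of 16,000,000 / 2,000,000 = 8.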

As for show datanodes and similar commands, they could only be used by the root user.
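The root-only rule mentioned here, together with the suggestion to keep OP commands separate from data commands, could be expressed as a simple guard. This is a hypothetical Java illustration: the statement categories and the set of commands treated as OP commands are assumptions, not IoTDB's authorization model.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical guard: OP (operations) commands are reserved for the root user.
class OpCommandGuard {
    enum StatementType { INSERT, QUERY, FLUSH, SHOW_DATANODES, SHOW_CONFIGNODES, SHOW_REGIONS }

    // Assumption: which statement types count as OP commands.
    static final Set<StatementType> OP_COMMANDS = EnumSet.of(
            StatementType.FLUSH,
            StatementType.SHOW_DATANODES,
            StatementType.SHOW_CONFIGNODES,
            StatementType.SHOW_REGIONS);

    static final String ROOT_USER = "root";

    // This guard does not restrict data commands (their normal permission
    // checks are not modeled here); OP commands are allowed only for root.
    static boolean isAllowed(String user, StatementType type) {
        return !OP_COMMANDS.contains(type) || ROOT_USER.equals(user);
    }
}
```

Keeping the OP set in one place makes it easy to extend when new cluster-inspection commands are added.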

Thanks,
—
Jialin Qiao
Apache IoTDB PMC


jianyun cheng wrote on Mon, May 23, 2022 at 12:10:

> Who can execute the flush operation?
>
> This is a very dangerous operation which may block data ingestion, so the
> permissions for such commands are very important: in my opinion, only the DBA
> should be allowed to execute them. The same limitation should apply to other
> similar OP commands, such as listing cluster data/config nodes, showing the cluster
> configuration, and showing the region set on some data nodes, once we have them.
> These commands are very helpful for the DBA to know the cluster status and
> should not be run by any other users.
>
> It’s better to separate such OP commands and data operation commands.
>
> --
> Jianyun Cheng
> Thanks
>
> From: Jialin Qiao<mailto:qiaojia...@apache.org>
> Sent: Monday, May 23, 2022 11:55 AM
> To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
> Subject: Re: Re: Flush function in cluster
>
> Hi,
>
> In the previous version, flush was mainly used for debugging.
> Indeed, we want to flush before shutdown to accelerate restarting;
> this could be bound into stop-server.sh.
>
> In the data region, flush could be seen as a read operation: there is no need
> for all replicas to have the same data format (WAL or TsFile), as long as
> they have the same data points.
>
> Thanks,
> —
> Jialin Qiao
> Apache IoTDB PMC
>
>
> 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:47:
>
> > "flush can reduce memory and speed up the restart process": this assumes
> > that all copies have been flushed synchronously, so we can ensure that the
> > data files are logically consistent at this point.
> >
> > Flushing a datanode should be part of releasing resources before the node
> > is shut down (but this does not guarantee that all copies are logically
> > consistent at this point). For example, shutdownHook performs disk
> > flushing and resource release by default. If we need to provide a flush
> > command for this scenario, is it perhaps because our node shutdown
> > operation is incomplete?
> >
> > BR,
> > ---
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Xiangdong Huang
> > Sent: May 23, 2022 11:37
> > To: dev
> > Subject: Re: Flush function in cluster
> >
> > I think distinguishing between flushing one node and flushing the whole
> > cluster is meaningful.
> >
> > As you said, flush can reduce memory and speed up the restart process.
> > So what if the DBA just wants to restart one node?
> >
> > However, the default behavior can be discussed: flush on one node by
> > default or on the whole cluster by default.
> >
> > ---
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:28:
> >
> > > Sorry, I don't understand what the purpose and use of flushing current
> > > datanode is.
> > >
> > > IMO, flush all should mean that all storage groups are flushed; in
> > > other words, flush sg is a subset of flush all.
> > >
> > > For users, the distributed architecture is a black box, while SG is an
> > > exposed structure.
> > > Therefore, for cli commands, there is no need to be aware of the
> > > relationship between the datanode and the self-created SG.
> > >
> > > In addition, the Flush operation may speed up our restart recovery
> > > process. For example, when we flush an SG successfully, we can label
> > > the associated data files to indicate that all copies are consistent
> > > at that point in time(here are flush and write priorities). During the
> > > next restart, we can use this flag to quickly skip the verification
> step.
> > >
> > > In summary, here are my questions and thoughts:
> > > 1. Is it necessary to flush a dataNode? What are the benefits of this?
> > > 2. Can the Flush operation affect the consensus group or WAL for a
> > > quick restart?
> > >
> > > BR,
> > > 

Re: Re: Re: Flush function in cluster

2022-05-22 Thread Jialin Qiao
Hi,

We cannot ensure that all replicas have the same tsfiles: besides user
flushes, the storage engine automatically flushes memtables according to its memory
usage, and we cannot guarantee that different nodes have the same memory.
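A toy Java model of this point, assuming auto flush is triggered by a per-replica point-count threshold as a stand-in for memory usage (real flush policies depend on memory accounting, not point counts): two replicas apply the same writes, end up with different TsFile boundaries, yet hold the same data points.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: a replica auto-flushes its memtable whenever it holds `threshold` points.
class ReplicaFlushSketch {
    static List<List<Long>> applyWrites(List<Long> writes, int threshold) {
        List<List<Long>> tsFiles = new ArrayList<>();
        List<Long> memtable = new ArrayList<>();
        for (Long point : writes) {
            memtable.add(point);
            if (memtable.size() >= threshold) { // stand-in for memory pressure
                tsFiles.add(memtable);
                memtable = new ArrayList<>();
            }
        }
        if (!memtable.isEmpty()) {
            tsFiles.add(memtable); // a final flush seals whatever is left
        }
        return tsFiles;
    }

    // Concatenates all files in order: the replica's logical content.
    static List<Long> allPoints(List<List<Long>> tsFiles) {
        List<Long> points = new ArrayList<>();
        for (List<Long> file : tsFiles) {
            points.addAll(file);
        }
        return points;
    }

    public static void main(String[] args) {
        List<Long> writes = new ArrayList<>();
        for (long t = 1; t <= 10; t++) {
            writes.add(t);
        }
        List<List<Long>> replicaA = applyWrites(writes, 3);
        List<List<Long>> replicaB = applyWrites(writes, 4);
        System.out.println("A files: " + replicaA); // [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]
        System.out.println("B files: " + replicaB); // [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
        System.out.println("same points: " + allPoints(replicaA).equals(allPoints(replicaB))); // true
    }
}
```

Comparing replicas by their points rather than by file layout is what makes flush behave like a read operation from the consistency point of view.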

As for accelerating restart and catch-up in the cluster, this is the
responsibility of the consensus layer's snapshot, not of the
user flush.
A snapshot is a per-replica behavior: call flush on the storage engine, then
record the tsfiles.
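The snapshot behavior described here (flush the storage engine, then record the tsfiles) might look roughly like the following Java sketch. `StorageEngine`, its methods, and the in-memory implementation are hypothetical stand-ins for illustration, not IoTDB's or any consensus library's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical per-replica snapshot: flush first, then record the sealed TsFiles.
class SnapshotSketch {
    interface StorageEngine {
        void flush();                 // seal memtables into TsFiles
        List<String> sealedTsFiles(); // TsFiles that are closed on this replica
    }

    static List<String> takeSnapshot(StorageEngine engine) {
        engine.flush(); // make sure no data is left only in memtables/WAL
        return Collections.unmodifiableList(new ArrayList<>(engine.sealedTsFiles()));
    }

    // Toy in-memory engine used to exercise takeSnapshot.
    static final class InMemoryEngine implements StorageEngine {
        private final List<String> sealed = new ArrayList<>();
        private int memtablePoints;
        private int fileCounter;

        void write(int points) {
            memtablePoints += points;
        }

        @Override
        public void flush() {
            if (memtablePoints > 0) {
                sealed.add("file-" + (++fileCounter) + ".tsfile");
                memtablePoints = 0;
            }
        }

        @Override
        public List<String> sealedTsFiles() {
            return sealed;
        }

        int unflushedPoints() {
            return memtablePoints;
        }
    }
}
```

The snapshot is a copy of the file list, so later flushes on the same replica do not change an already-taken snapshot.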

Thanks,
—
Jialin Qiao
Apache IoTDB PMC


李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 12:04:

> In fact, this is because we cannot compare tsFiles to determine whether
> the replica data is consistent.
>
> If a user flush ensures that all copies are flushed, then at the next
> restart we only need to check whether the operations after this flush are
> consistent and update them.
>
> Otherwise, when a follower lags far behind the leader and we need to
> catch up via tsfiles, is there a copy of all the data files?
>
> BR,
> ---
> Sijia Li
>
>
> -----Original Message-----
> From: Xiangdong Huang
> Sent: May 23, 2022 11:52
> To: dev
> Subject: Re: Re: Flush function in cluster
>
> > "flush can reduce memory and speed up the restart process": this
> > assumes that all copies have been flushed synchronously, so we can ensure
> > that the data files are logically consistent at this point.
>
> Sorry, maybe I am behind on the current cluster design.
> Do we need "all copies have been flushed synchronously, so we can ensure
> that the data files are logically consistent at this point"? Why? Because
> of the Raft protocol?
>
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:47:
>
> > "flush can reduce memory and speed up the restart process": this assumes
> > that all copies have been flushed synchronously, so we can ensure that the
> > data files are logically consistent at this point.
> >
> > Flushing a datanode should be part of releasing resources before the node
> > is shut down (but this does not guarantee that all copies are logically
> > consistent at this point). For example, shutdownHook performs disk
> > flushing and resource release by default. If we need to provide a flush
> > command for this scenario, is it perhaps because our node shutdown
> > operation is incomplete?
> >
> > BR,
> > ---
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Xiangdong Huang
> > Sent: May 23, 2022 11:37
> > To: dev
> > Subject: Re: Flush function in cluster
> >
> > I think distinguishing between flushing one node and flushing the whole
> > cluster is meaningful.
> >
> > As you said, flush can reduce memory and speed up the restart process.
> > So what if the DBA just wants to restart one node?
> >
> > However, the default behavior can be discussed: flush on one node by
> > default or on the whole cluster by default.
> >
> > ---
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:28:
> >
> > > Sorry, I don't understand what the purpose and use of flushing
> > > current datanode is.
> > >
> > > IMO, flush all should mean that all storage groups are flushed;
> > > in other words, flush sg is a subset of flush all.
> > >
> > > For users, the distributed architecture is a black box, while SG is an
> > > exposed structure.
> > > Therefore, for cli commands, there is no need to be aware of the
> > > relationship between the datanode and the self-created SG.
> > >
> > > In addition, the Flush operation may speed up our restart recovery
> > > process. For example, when we flush an SG successfully, we can label
> > > the associated data files to indicate that all copies are consistent
> > > at that point in time(here are flush and write priorities). During
> > > the next restart, we can use this flag to quickly skip the
> verification step.
> > >
> > > In summary, here are my questions and thoughts:
> > > 1. Is it necessary to flush a dataNode? What are the benefits of this?
> > > 2. Can the Flush operation affect the consensus group or WAL for a
> > > quick restart?
> > >
> > > BR,
> > > ---
> > > Sijia Li
> > >
> > >
> > > -----Original Message-----
> > > From: Jialin Qiao
> > > Sent: May 23, 2022 11:07
> > > To: dev@iotdb.apache.org
> > > Subject: Flush function in cluster
> > >
> > > Hi,
> > >
> > > Flush is a frequently used command in IoTDB, which flushes memtables
> > > to disk and closes all tsfiles.
> > >
> > > In the new cluster, we need to redefine this function [1].
> > >
> > > * flush: flushing current datanode
> > >
> > > * flush all/cluster: flushing all datanodes
> > >
> > > * flush sg: flush all DataRegions of a storage group
> > >
> > >
> > > What do you think?
> > >
> > > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> > >
> > > —
> > > Jialin Qiao
> > > Apache IoTDB PMC
> > >
> >
>


RE: Re: Flush function in cluster

2022-05-22 Thread jianyun cheng
Who can execute the flush operation?

This is a very dangerous operation which may block data ingestion, so the
permissions for such commands are very important: in my opinion, only the DBA
should be allowed to execute them. The same limitation should apply to other
similar OP commands, such as listing cluster data/config nodes, showing the cluster
configuration, and showing the region set on some data nodes, once we have them.
These commands are very helpful for the DBA to know the cluster status and
should not be run by any other users.

It’s better to separate such OP commands and data operation commands.

--
Jianyun Cheng
Thanks

From: Jialin Qiao<mailto:qiaojia...@apache.org>
Sent: Monday, May 23, 2022 11:55 AM
To: dev@iotdb.apache.org<mailto:dev@iotdb.apache.org>
Subject: Re: Re: Flush function in cluster

Hi,

In the previous version, flush was mainly used for debugging.
Indeed, we want to flush before shutdown to accelerate restarting;
this could be bound into stop-server.sh.

In the data region, flush could be seen as a read operation: there is no need
for all replicas to have the same data format (WAL or TsFile), as long as
they have the same data points.

Thanks,
―
Jialin Qiao
Apache IoTDB PMC


李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:47:

> "flush can reduce memory and speed up the restart process": this assumes
> that all copies have been flushed synchronously, so we can ensure that the
> data files are logically consistent at this point.
>
> Flushing a datanode should be part of releasing resources before the node
> is shut down (but this does not guarantee that all copies are logically
> consistent at this point). For example, shutdownHook performs disk
> flushing and resource release by default. If we need to provide a flush
> command for this scenario, is it perhaps because our node shutdown
> operation is incomplete?
>
> BR,
> ---
> Sijia Li
>
>
> -邮件原件-
> 发件人: Xiangdong Huang 
> 发送时间: 2022年5月23日 11:37
> 收件人: dev 
> 主题: Re: Flush function in cluster
>
> I think distinguishing between flushing one node and flushing the whole
> cluster is meaningful.
>
> As you said, flush can reduce memory and speed up the restart process.
> So what if the DBA just wants to restart one node?
>
> However, the default behavior can be discussed: flush on one node by
> default or on the whole cluster by default.
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:28:
>
> > Sorry, I don't understand what the purpose and use of flushing current
> > datanode is.
> >
> > IMO, flush all should mean that all storage groups are flushed; in
> > other words, flush sg is a subset of flush all.
> >
> > For users, the distributed architecture is a black box, while SG is an exposed structure.
> > Therefore, for cli commands, there is no need to be aware of the
> > relationship between the datanode and the self-created SG.
> >
> > In addition, the Flush operation may speed up our restart recovery
> > process. For example, when we flush an SG successfully, we can label
> > the associated data files to indicate that all copies are consistent
> > at that point in time(here are flush and write priorities). During the
> > next restart, we can use this flag to quickly skip the verification step.
> >
> > In summary, here are my questions and thoughts:
> > 1. Is it necessary to flush a dataNode? What are the benefits of this?
> > 2. Can the Flush operation affect the consensus group or WAL for a
> > quick restart?
> >
> > BR,
> > ---
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Jialin Qiao
> > Sent: May 23, 2022 11:07
> > To: dev@iotdb.apache.org
> > Subject: Flush function in cluster
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB, which flushes memtables
> > to disk and closes all tsfiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flushing current datanode
> >
> > * flush all/cluster: flushing all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > ―
> > Jialin Qiao
> > Apache IoTDB PMC
> >
>



Re: Re: Re: Flush function in cluster

2022-05-22 Thread 李思佳
In fact, this is because we cannot compare tsFiles to determine whether the 
replica data is consistent.

If a user flush ensures that all copies are flushed, then at the next restart
we only need to check whether the operations after this flush are consistent and
update them.

Otherwise, when a follower lags far behind the leader and we need to catch up
via tsfiles, is there a copy of all the data files?
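The proposal in this message can be sketched in Java as follows, assuming each replica keeps an ordered operation log and that a user flush confirmed on every replica records the last log index it covers (`flushedIndex`, a hypothetical name). On restart, only entries after that index are compared with the leader. This illustrates Sijia Li's proposal, not the design adopted in the thread; Jialin Qiao's reply argues that restart acceleration belongs to the consensus layer's snapshot instead.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical sketch of the proposal: verify only log entries after a flush
// that is known to have completed on all replicas.
class FlushMarkRecoverySketch {
    // Returns the first index > flushedIndex where the two logs differ, or -1 if they match.
    // Entries at or before flushedIndex are assumed consistent and are skipped.
    static int firstDivergence(List<String> localLog, List<String> leaderLog, int flushedIndex) {
        int end = Math.max(localLog.size(), leaderLog.size());
        for (int i = Math.max(0, flushedIndex + 1); i < end; i++) {
            String local = i < localLog.size() ? localLog.get(i) : null;
            String leader = i < leaderLog.size() ? leaderLog.get(i) : null;
            if (!Objects.equals(local, leader)) {
                return i; // this replica must catch up from here
            }
        }
        return -1;
    }
}
```

Passing `flushedIndex = -1` falls back to comparing the whole log, which is the case the proposal is trying to avoid.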

BR,
---
Sijia Li


-----Original Message-----
From: Xiangdong Huang
Sent: May 23, 2022 11:52
To: dev
Subject: Re: Re: Flush function in cluster

> "flush can reduce memory and speed up the restart process": this
> assumes that all copies have been flushed synchronously, so we can ensure that
> the data files are logically consistent at this point.

Sorry, maybe I am behind on the current cluster design.
Do we need "all copies have been flushed synchronously, so we can ensure that
the data files are logically consistent at this point"? Why? Because of the
Raft protocol?


---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:47:

> "flush can reduce memory and speed up the restart process": this
> assumes that all copies have been flushed synchronously, so we can
> ensure that the data files are logically consistent at this point.
>
> Flushing a datanode should be part of releasing resources before the
> node is shut down (but this does not guarantee that all copies are
> logically consistent at this point). For example, shutdownHook performs
> disk flushing and resource release by default. If we need to provide a
> flush command for this scenario, is it perhaps because our node shutdown
> operation is incomplete?
>
> BR,
> ---
> Sijia Li
>
>
> -----Original Message-----
> From: Xiangdong Huang
> Sent: May 23, 2022 11:37
> To: dev
> Subject: Re: Flush function in cluster
>
> I think distinguishing between flushing one node and flushing the whole
> cluster is meaningful.
>
> As you said, flush can reduce memory and speed up the restart process.
> So what if the DBA just wants to restart one node?
>
> However, the default behavior can be discussed: flush on one node by 
> default or on the whole cluster by default.
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> 李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:28:
>
> > Sorry, I don't understand what the purpose and use of flushing 
> > current datanode is.
> >
> > IMO, flush all should mean that all storage groups are flushed;
> > in other words, flush sg is a subset of flush all.
> >
> > For users, the distributed architecture is a black box, while SG is an exposed structure.
> > Therefore, for cli commands, there is no need to be aware of the 
> > relationship between the datanode and the self-created SG.
> >
> > In addition, the Flush operation may speed up our restart recovery 
> > process. For example, when we flush an SG successfully, we can label 
> > the associated data files to indicate that all copies are consistent 
> > at that point in time(here are flush and write priorities). During 
> > the next restart, we can use this flag to quickly skip the verification 
> > step.
> >
> > In summary, here are my questions and thoughts:
> > 1. Is it necessary to flush a dataNode? What are the benefits of this?
> > 2. Can the Flush operation affect the consensus group or WAL for a 
> > quick restart?
> >
> > BR,
> > ---
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Jialin Qiao
> > Sent: May 23, 2022 11:07
> > To: dev@iotdb.apache.org
> > Subject: Flush function in cluster
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB, which flushes memtables
> > to disk and closes all tsfiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flushing current datanode
> >
> > * flush all/cluster: flushing all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —
> > Jialin Qiao
> > Apache IoTDB PMC
> >
>


Re: Re: Flush function in cluster

2022-05-22 Thread Jialin Qiao
Hi,

In the previous version, flush was mainly used for debugging.
Indeed, we want to flush before shutdown to accelerate restarting;
this could be bound into stop-server.sh.

In the data region, flush could be seen as a read operation: there is no need
for all replicas to have the same data format (WAL or TsFile), as long as
they have the same data points.

Thanks,
—
Jialin Qiao
Apache IoTDB PMC


李思佳 (Sijia Li) wrote on Mon, May 23, 2022 at 11:47:

> "flush can reduce memory and speed up the restart process": this assumes
> that all copies have been flushed synchronously, so we can ensure that the
> data files are logically consistent at this point.
>
> Flushing a datanode should be part of releasing resources before the node
> is shut down (but this does not guarantee that all copies are logically
> consistent at this point). For example, shutdownHook performs disk
> flushing and resource release by default. If we need to provide a flush
> command for this scenario, is it perhaps because our node shutdown
> operation is incomplete?
>
> BR,
> ---
> Sijia Li
>
>
> -----Original Message-----
> From: Xiangdong Huang
> Sent: May 23, 2022 11:37
> To: dev
> Subject: Re: Flush function in cluster
>
> I think distinguishing between flushing one node and flushing the whole
> cluster makes sense.
>
> As you said, flush can reduce memory and speed up the restart process. So,
> what if the DBA just wants to restart one node?
>
> However, the default behavior is open for discussion: flush on one node by
> default, or on the whole cluster by default.
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> 李思佳 wrote on Monday, May 23, 2022 at 11:28:
>
> > Sorry, I don't understand the purpose and use of flushing the current
> > datanode.
> >
> > IMO, flush all should mean that all storage groups are flushed; in other
> > words, flush sg is a subset of flush all.
> >
> > For users, the distributed system is a black box, while SGs are exposed
> > to them. Therefore, CLI commands do not need to be aware of the
> > relationship between datanodes and user-created SGs.
> >
> > In addition, the Flush operation may speed up our restart recovery
> > process. For example, when we flush an SG successfully, we can label
> > the associated data files to indicate that all copies are consistent
> > at that point in time (this involves the priority between flush and
> > write). During the next restart, we can use this flag to quickly skip
> > the verification step.
> >
> > In summary, here are my questions and thoughts:
> > 1. Is it necessary to flush a DataNode? What are the benefits of this?
> > 2. Can the Flush operation affect the consensus group or WAL for a
> > quick restart?
> >
> > BR,
> > ---
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Jialin Qiao 
> > Sent: May 23, 2022 11:07
> > To: dev@iotdb.apache.org
> > Subject: Flush function in cluster
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB, which flushes memtables
> > to disk and closes all TsFiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flush the current datanode
> >
> > * flush all/cluster: flush all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —
> > Jialin Qiao
> > Apache IoTDB PMC
> >
>


Re: Flush function in cluster

2022-05-22 Thread Eric Pai
As we want to define the SQL grammar, it's not a good choice to use Unix 
command line style syntax.

On 2022/5/23 11:42, “Xiangdong Huang” wrote:

how about:  flush [, ] [--all-nodes] [-node ]

omitting []  means flush all sgs.
--all-nodes means flush on each node
-node  means flush on the given node
omitting [-node ] and [--all-nodes] equals [-node 127.0.0.1]
--all-nodes and -node are mutually exclusive

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Eric Pai wrote on Monday, May 23, 2022 at 11:27:

> +1. There is no need for two different syntaxes with the same meaning.
> Just define the most suitable one.
>
> On 2022/5/23 11:22, “Haonan Hou” wrote:
>
> Hi,
>
> +1 for `FLUSH ALL` syntax.
>
> `FLUSH` and `FLUSH sg` are the existing syntax of the current
> standalone version.
> If we execute `FLUSH ALL` on standalone IoTDB, it can be equivalent to
> the `FLUSH` command.
> `flush cluster` sounds meaningless for standalone IoTDB.
>
> Best,
> Haonan Hou
>
> > On May 23, 2022, at 11:07 AM, Jialin Qiao wrote:
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB, which flushes memtables
> > to disk and closes all TsFiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flush the current datanode
> >
> > * flush all/cluster: flush all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —
> > Jialin Qiao
> > Apache IoTDB PMC
>
>
>



Re: Re: Flush function in cluster

2022-05-22 Thread Xiangdong Huang
> "flush can reduce memory and speed up the restart process": this assumes
> that all copies have been flushed synchronously, so we can ensure that the
> data files are logically consistent at this point.

Sorry, maybe I am lagging behind the current cluster design.
Do we need "all copies have been flushed synchronously, so we can ensure
that the data files are logically consistent at this point"? Why? Because
of the Raft protocol?


---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


李思佳 wrote on Monday, May 23, 2022 at 11:47:

> "flush can reduce memory and speed up the restart process": this assumes
> that all copies have been flushed synchronously, so we can ensure that the
> data files are logically consistent at this point.
>
> Flushing a datanode should be part of the resource release process before
> the node shuts down (but this does not guarantee that all copies are
> logically consistent at this point). For example, the shutdownHook already
> flushes data to disk and releases resources by default. If we need to
> provide a flush command for this scenario, is it perhaps because our node
> shutdown operation is incomplete?
>
> BR,
> ---
> Sijia Li
>
>
> -----Original Message-----
> From: Xiangdong Huang 
> Sent: May 23, 2022 11:37
> To: dev 
> Subject: Re: Flush function in cluster
>
> I think distinguishing between flushing one node and flushing the whole
> cluster makes sense.
>
> As you said, flush can reduce memory and speed up the restart process. So,
> what if the DBA just wants to restart one node?
>
> However, the default behavior is open for discussion: flush on one node by
> default, or on the whole cluster by default.
>
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> 李思佳 wrote on Monday, May 23, 2022 at 11:28:
>
> > Sorry, I don't understand the purpose and use of flushing the current
> > datanode.
> >
> > IMO, flush all should mean that all storage groups are flushed; in other
> > words, flush sg is a subset of flush all.
> >
> > For users, the distributed system is a black box, while SGs are exposed
> > to them. Therefore, CLI commands do not need to be aware of the
> > relationship between datanodes and user-created SGs.
> >
> > In addition, the Flush operation may speed up our restart recovery
> > process. For example, when we flush an SG successfully, we can label
> > the associated data files to indicate that all copies are consistent
> > at that point in time (this involves the priority between flush and
> > write). During the next restart, we can use this flag to quickly skip
> > the verification step.
> >
> > In summary, here are my questions and thoughts:
> > 1. Is it necessary to flush a DataNode? What are the benefits of this?
> > 2. Can the Flush operation affect the consensus group or WAL for a
> > quick restart?
> >
> > BR,
> > ---
> > Sijia Li
> >
> >
> > -----Original Message-----
> > From: Jialin Qiao 
> > Sent: May 23, 2022 11:07
> > To: dev@iotdb.apache.org
> > Subject: Flush function in cluster
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB, which flushes memtables
> > to disk and closes all TsFiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flush the current datanode
> >
> > * flush all/cluster: flush all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —
> > Jialin Qiao
> > Apache IoTDB PMC
> >
>


RE: Re: Flush function in cluster

2022-05-22 Thread 李思佳
"flush can reduce memory and speed up the restart process": this assumes
that all copies have been flushed synchronously, so we can ensure that the
data files are logically consistent at this point.

Flushing a datanode should be part of the resource release process before
the node shuts down (but this does not guarantee that all copies are
logically consistent at this point). For example, the shutdownHook already
flushes data to disk and releases resources by default. If we need to
provide a flush command for this scenario, is it perhaps because our node
shutdown operation is incomplete?

BR,
---
Sijia Li


-----Original Message-----
From: Xiangdong Huang 
Sent: May 23, 2022 11:37
To: dev 
Subject: Re: Flush function in cluster

I think distinguishing between flushing one node and flushing the whole
cluster makes sense.

As you said, flush can reduce memory and speed up the restart process. So,
what if the DBA just wants to restart one node?

However, the default behavior is open for discussion: flush on one node by
default, or on the whole cluster by default.

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


李思佳 wrote on Monday, May 23, 2022 at 11:28:

> Sorry, I don't understand the purpose and use of flushing the current
> datanode.
>
> IMO, flush all should mean that all storage groups are flushed; in other
> words, flush sg is a subset of flush all.
>
> For users, the distributed system is a black box, while SGs are exposed
> to them. Therefore, CLI commands do not need to be aware of the
> relationship between datanodes and user-created SGs.
>
> In addition, the Flush operation may speed up our restart recovery
> process. For example, when we flush an SG successfully, we can label
> the associated data files to indicate that all copies are consistent
> at that point in time (this involves the priority between flush and
> write). During the next restart, we can use this flag to quickly skip
> the verification step.
>
> In summary, here are my questions and thoughts:
> 1. Is it necessary to flush a DataNode? What are the benefits of this?
> 2. Can the Flush operation affect the consensus group or WAL for a 
> quick restart?
>
> BR,
> ---
> Sijia Li
>
>
> -----Original Message-----
> From: Jialin Qiao 
> Sent: May 23, 2022 11:07
> To: dev@iotdb.apache.org
> Subject: Flush function in cluster
>
> Hi,
>
> Flush is a frequently used command in IoTDB, which flushes memtables
> to disk and closes all TsFiles.
>
> In the new cluster, we need to redefine this function [1].
>
> * flush: flush the current datanode
>
> * flush all/cluster: flush all datanodes
>
> * flush sg: flush all DataRegions of a storage group
>
>
> What do you think?
>
> [1] https://issues.apache.org/jira/browse/IOTDB-3099
>
> —
> Jialin Qiao
> Apache IoTDB PMC
>


Re: Flush function in cluster

2022-05-22 Thread Xiangdong Huang
how about:  flush [, ] [--all-nodes] [-node ]

omitting []  means flush all sgs.
--all-nodes means flush on each node
-node  means flush on the given node
omitting [-node ] and [--all-nodes] equals [-node 127.0.0.1]
--all-nodes and -node are mutually exclusive
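A minimal Python sketch of the option rules above, assuming whitespace-separated tokens and comma-separated storage group names. `parse_cli_flush`, `CliFlush` and `DEFAULT_NODE` are hypothetical names, not an IoTDB API; the sketch only encodes the stated defaults and the mutual-exclusion rule, for a command-line style that was questioned in this thread as not SQL-like.

```python
from dataclasses import dataclass
from typing import List, Optional

DEFAULT_NODE = "127.0.0.1"  # default from the proposal when no node option is given


@dataclass
class CliFlush:
    """Hypothetical parse result for the command-line style proposal."""
    storage_groups: Optional[List[str]]  # None means "all storage groups"
    all_nodes: bool                      # True when --all-nodes is given
    node: Optional[str]                  # target node when all_nodes is False


def parse_cli_flush(command: str) -> CliFlush:
    tokens = command.split()
    if not tokens or tokens[0].lower() != "flush":
        raise ValueError("not a flush command")
    groups: List[str] = []
    all_nodes = False
    node: Optional[str] = None
    i = 1
    while i < len(tokens):
        tok = tokens[i]
        if tok == "--all-nodes":
            all_nodes = True
        elif tok == "-node":
            if i + 1 >= len(tokens):
                raise ValueError("-node requires a node address")
            node = tokens[i + 1]
            i += 1  # skip the address we just consumed
        else:
            # storage group names, possibly written as "sg1, sg2"
            groups.extend(g for g in tok.split(",") if g)
        i += 1
    if all_nodes and node is not None:
        raise ValueError("--all-nodes and -node are mutually exclusive")
    if not all_nodes and node is None:
        node = DEFAULT_NODE  # omitting both options means "-node 127.0.0.1"
    return CliFlush(groups or None, all_nodes, node)


print(parse_cli_flush("flush root.sg1, root.sg2 -node 10.0.0.2"))
# CliFlush(storage_groups=['root.sg1', 'root.sg2'], all_nodes=False, node='10.0.0.2')
```

Rejecting the conflicting options at parse time keeps the error next to the user's input instead of surfacing it after the request has been routed.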

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Eric Pai wrote on Monday, May 23, 2022 at 11:27:

> +1. There is no need for two different syntaxes with the same meaning.
> Just define the most suitable one.
>
> On 2022/5/23 11:22, “Haonan Hou” wrote:
>
> Hi,
>
> +1 for `FLUSH ALL` syntax.
>
> `FLUSH` and `FLUSH sg` are the existing syntax of the current
> standalone version.
> If we execute `FLUSH ALL` on standalone IoTDB, it can be equivalent to
> the `FLUSH` command.
> `flush cluster` sounds meaningless for standalone IoTDB.
>
> Best,
> Haonan Hou
>
> > On May 23, 2022, at 11:07 AM, Jialin Qiao wrote:
> >
> > Hi,
> >
> > Flush is a frequently used command in IoTDB, which flushes memtables
> > to disk and closes all TsFiles.
> >
> > In the new cluster, we need to redefine this function [1].
> >
> > * flush: flush the current datanode
> >
> > * flush all/cluster: flush all datanodes
> >
> > * flush sg: flush all DataRegions of a storage group
> >
> >
> > What do you think?
> >
> > [1] https://issues.apache.org/jira/browse/IOTDB-3099
> >
> > —
> > Jialin Qiao
> > Apache IoTDB PMC
>
>
>


Re: Flush function in cluster

2022-05-22 Thread Xiangdong Huang
I think distinguishing between flushing one node and flushing the whole
cluster makes sense.

As you said, flush can reduce memory and speed up the restart process. So,
what if the DBA just wants to restart one node?

However, the default behavior is open for discussion: flush on one node by
default, or on the whole cluster by default.

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


李思佳 wrote on Monday, May 23, 2022 at 11:28:

> Sorry, I don't understand the purpose and use of flushing the current
> datanode.
>
> IMO, flush all should mean that all storage groups are flushed; in other
> words, flush sg is a subset of flush all.
>
> For users, the distributed system is a black box, while SGs are exposed
> to them. Therefore, CLI commands do not need to be aware of the
> relationship between datanodes and user-created SGs.
>
> In addition, the Flush operation may speed up our restart recovery
> process. For example, when we flush an SG successfully, we can label the
> associated data files to indicate that all copies are consistent at that
> point in time (this involves the priority between flush and write). During
> the next restart, we can use this flag to quickly skip the verification
> step.
>
> In summary, here are my questions and thoughts:
> 1. Is it necessary to flush a DataNode? What are the benefits of this?
> 2. Can the Flush operation affect the consensus group or WAL for a quick
> restart?
>
> BR,
> ---
> Sijia Li
>
>
> -----Original Message-----
> From: Jialin Qiao 
> Sent: May 23, 2022 11:07
> To: dev@iotdb.apache.org
> Subject: Flush function in cluster
>
> Hi,
>
> Flush is a frequently used command in IoTDB, which flushes memtables to
> disk and closes all TsFiles.
>
> In the new cluster, we need to redefine this function [1].
>
> * flush: flush the current datanode
>
> * flush all/cluster: flush all datanodes
>
> * flush sg: flush all DataRegions of a storage group
>
>
> What do you think?
>
> [1] https://issues.apache.org/jira/browse/IOTDB-3099
>
> —
> Jialin Qiao
> Apache IoTDB PMC
>


RE: Flush function in cluster

2022-05-22 Thread 李思佳
In addition, if flushing the current datanode is not supported, can our
commands keep the previous rules, such as flush and flush SG?

BR,
---
Sijia Li


-----Original Message-----
From: 李思佳 
Sent: May 23, 2022 11:28
To: dev@iotdb.apache.org
Subject: RE: Flush function in cluster

Sorry, I don't understand the purpose and use of flushing the current
datanode.

IMO, flush all should mean that all storage groups are flushed; in other
words, flush sg is a subset of flush all.

For users, the distributed system is a black box, while SGs are exposed to
them. Therefore, CLI commands do not need to be aware of the relationship
between datanodes and user-created SGs.

In addition, the Flush operation may speed up our restart recovery process. For
example, when we flush an SG successfully, we can label the associated data
files to indicate that all copies are consistent at that point in time (this
involves the priority between flush and write). During the next restart, we can
use this flag to quickly skip the verification step.

In summary, here are my questions and thoughts:
1. Is it necessary to flush a DataNode? What are the benefits of this?
2. Can the Flush operation affect the consensus group or WAL for a quick 
restart?  

BR,
---
Sijia Li


-----Original Message-----
From: Jialin Qiao 
Sent: May 23, 2022 11:07
To: dev@iotdb.apache.org
Subject: Flush function in cluster

Hi,

Flush is a frequently used command in IoTDB, which flushes memtables to disk
and closes all TsFiles.

In the new cluster, we need to redefine this function [1].

* flush: flush the current datanode

* flush all/cluster: flush all datanodes

* flush sg: flush all DataRegions of a storage group


What do you think?

[1] https://issues.apache.org/jira/browse/IOTDB-3099

—
Jialin Qiao
Apache IoTDB PMC


RE: Flush function in cluster

2022-05-22 Thread 李思佳
Sorry, I don't understand the purpose and use of flushing the current
datanode.

IMO, flush all should mean that all storage groups are flushed; in other
words, flush sg is a subset of flush all.

For users, the distributed system is a black box, while SGs are exposed to
them. Therefore, CLI commands do not need to be aware of the relationship
between datanodes and user-created SGs.

In addition, the Flush operation may speed up our restart recovery process. For
example, when we flush an SG successfully, we can label the associated data
files to indicate that all copies are consistent at that point in time (this
involves the priority between flush and write). During the next restart, we can
use this flag to quickly skip the verification step.

In summary, here are my questions and thoughts:
1. Is it necessary to flush a DataNode? What are the benefits of this?
2. Can the Flush operation affect the consensus group or WAL for a quick 
restart?  
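A hypothetical Python sketch of the labelling idea above: flushed files are marked as consistent only when every replica acknowledges the flush, and on restart only unmarked files are verified. `DataFile`, `mark_after_flush` and `files_to_verify` are illustrative names, not IoTDB APIs, and the sketch deliberately ignores the flush/write ordering question raised in the parenthetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class DataFile:
    name: str
    consistent: bool = False  # hypothetical label: all replicas flushed this file


def mark_after_flush(files: List[DataFile], acked: Set[str], replicas: Set[str]) -> bool:
    """Label the flushed files only if every replica acknowledged the flush."""
    if acked != replicas:
        return False  # some copy may still differ, so leave the files unlabelled
    for f in files:
        f.consistent = True
    return True


def files_to_verify(files: List[DataFile]) -> List[str]:
    """On restart, only files without the label go through verification."""
    return [f.name for f in files if not f.consistent]


files = [DataFile("a.tsfile"), DataFile("b.tsfile")]
mark_after_flush(files[:1], {"r1", "r2", "r3"}, {"r1", "r2", "r3"})
print(files_to_verify(files))  # ['b.tsfile']
```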

BR,
---
Sijia Li


-----Original Message-----
From: Jialin Qiao 
Sent: May 23, 2022 11:07
To: dev@iotdb.apache.org
Subject: Flush function in cluster

Hi,

Flush is a frequently used command in IoTDB, which flushes memtables to disk
and closes all TsFiles.

In the new cluster, we need to redefine this function [1].

* flush: flush the current datanode

* flush all/cluster: flush all datanodes

* flush sg: flush all DataRegions of a storage group


What do you think?

[1] https://issues.apache.org/jira/browse/IOTDB-3099

—
Jialin Qiao
Apache IoTDB PMC


Re: Flush function in cluster

2022-05-22 Thread Eric Pai
+1. There is no need for two different syntaxes with the same meaning; just
define the most suitable one.

On 2022/5/23 11:22, “Haonan Hou” wrote:

Hi,

+1 for `FLUSH ALL` syntax.

`FLUSH` and `FLUSH sg` are the existing syntax of the current standalone 
version.
If we execute `FLUSH ALL` on standalone IoTDB, it can be equivalent to the
`FLUSH` command.
`flush cluster` sounds meaningless for standalone IoTDB.

Best,
Haonan Hou

> On May 23, 2022, at 11:07 AM, Jialin Qiao  wrote:
> 
> Hi,
> 
> Flush is a frequently used command in IoTDB, which flushes memtables to
> disk and closes all TsFiles.
> 
> In the new cluster, we need to redefine this function [1].
> 
> * flush: flush the current datanode
> 
> * flush all/cluster: flush all datanodes
> 
> * flush sg: flush all DataRegions of a storage group
> 
> 
> What do you think?
> 
> [1] https://issues.apache.org/jira/browse/IOTDB-3099
> 
> —
> Jialin Qiao
> Apache IoTDB PMC




Re: Flush function in cluster

2022-05-22 Thread Haonan Hou
Hi,

+1 for `FLUSH ALL` syntax.

`FLUSH` and `FLUSH sg` are the existing syntax of the current standalone 
version.
If we execute `FLUSH ALL` on standalone IoTDB, it can be equivalent to the
`FLUSH` command.
`flush cluster` sounds meaningless for standalone IoTDB.

Best,
Haonan Hou

> On May 23, 2022, at 11:07 AM, Jialin Qiao  wrote:
> 
> Hi,
> 
> Flush is a frequently used command in IoTDB, which flushes memtables to
> disk and closes all TsFiles.
> 
> In the new cluster, we need to redefine this function [1].
> 
> * flush: flush the current datanode
> 
> * flush all/cluster: flush all datanodes
> 
> * flush sg: flush all DataRegions of a storage group
> 
> 
> What do you think?
> 
> [1] https://issues.apache.org/jira/browse/IOTDB-3099
> 
> —
> Jialin Qiao
> Apache IoTDB PMC



Re: Flush function in cluster

2022-05-22 Thread HW-Chao Wang
Good job, we need to define the syntax first…



---Original---
From: "Jialin Qiao"
https://issues.apache.org/jira/browse/IOTDB-3099

—
Jialin Qiao
Apache IoTDB PMC

Re: Flush function in cluster

2022-05-22 Thread Eric Pai
ALL and CLUSTER may become keywords later, so we should use FLUSH `all` and
FLUSH `cluster` instead.
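The keyword/identifier distinction suggested here can be illustrated with a toy tokenizer: bare words that match a reserved word become keywords, while backquoted text is always an identifier, so a storage group named "all" stays reachable. This is a hypothetical Python sketch, not IoTDB's actual lexer; the keyword set and the `tokenize` name are assumptions.

```python
import re
from typing import List, Tuple

# Assumed reserved words; the thread only says ALL and CLUSTER may become keywords.
KEYWORDS = {"FLUSH", "ALL", "CLUSTER"}

# Alternatives: backquoted identifier, bare word (dots allowed for paths), comma, any other char.
TOKEN_RE = re.compile(r"`([^`]*)`|([A-Za-z_][A-Za-z0-9_.]*)|(,)|(\S)")


def tokenize(sql: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs, where kind is KEYWORD, IDENT or PUNCT."""
    tokens: List[Tuple[str, str]] = []
    for m in TOKEN_RE.finditer(sql):
        quoted, word, comma, other = m.groups()
        if quoted is not None:
            # Backquoted text is always an identifier, even if it spells a keyword.
            tokens.append(("IDENT", quoted))
        elif word is not None:
            upper = word.upper()
            tokens.append(("KEYWORD", upper) if upper in KEYWORDS else ("IDENT", word))
        else:
            tokens.append(("PUNCT", comma or other))
    return tokens


print(tokenize("FLUSH ALL"))    # [('KEYWORD', 'FLUSH'), ('KEYWORD', 'ALL')]
print(tokenize("flush `all`"))  # [('KEYWORD', 'FLUSH'), ('IDENT', 'all')]
```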

On 2022/5/23 11:16, “Xiangdong Huang” wrote:

What if there is an sg called "all" or "cluster"?
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Jialin Qiao wrote on Monday, May 23, 2022 at 11:07:

> Hi,
>
> Flush is a frequently used command in IoTDB, which flushes memtables to
> disk and closes all TsFiles.
>
> In the new cluster, we need to redefine this function [1].
>
> * flush: flush the current datanode
>
> * flush all/cluster: flush all datanodes
>
> * flush sg: flush all DataRegions of a storage group
>
>
> What do you think?
>
> [1] https://issues.apache.org/jira/browse/IOTDB-3099
>
> —
> Jialin Qiao
> Apache IoTDB PMC
>



Re: Flush function in cluster

2022-05-22 Thread Xiangdong Huang
What if there is an sg called "all" or "cluster"?
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Jialin Qiao wrote on Monday, May 23, 2022 at 11:07:

> Hi,
>
> Flush is a frequently used command in IoTDB, which flushes memtables to
> disk and closes all TsFiles.
>
> In the new cluster, we need to redefine this function [1].
>
> * flush: flush the current datanode
>
> * flush all/cluster: flush all datanodes
>
> * flush sg: flush all DataRegions of a storage group
>
>
> What do you think?
>
> [1] https://issues.apache.org/jira/browse/IOTDB-3099
>
> —
> Jialin Qiao
> Apache IoTDB PMC
>


Flush function in cluster

2022-05-22 Thread Jialin Qiao
Hi,

Flush is a frequently used command in IoTDB, which flushes memtables to
disk and closes all TsFiles.

In the new cluster, we need to redefine this function [1].

* flush: flush the current datanode

* flush all/cluster: flush all datanodes

* flush sg: flush all DataRegions of a storage group
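The three forms listed above can be sketched as a small planning function that decides which datanodes must flush and which storage groups are affected. This is a minimal Python illustration under stated assumptions, not IoTDB code: `plan_flush`, `FlushPlan` and the `region_locations` map (storage group to the datanodes hosting its DataRegions) are hypothetical stand-ins for whatever metadata the cluster actually keeps.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FlushPlan:
    """Hypothetical plan: which datanodes flush, and which storage groups."""
    target_nodes: List[str]
    storage_groups: Optional[List[str]]  # None means "every storage group"


def plan_flush(statement: str,
               current_node: str,
               all_nodes: List[str],
               region_locations: Dict[str, List[str]]) -> FlushPlan:
    """Map the three proposed FLUSH forms onto a FlushPlan.

    region_locations maps a storage group to the datanodes hosting its
    DataRegions (an assumed stand-in for real cluster metadata).
    """
    tokens = statement.strip().rstrip(";").split(maxsplit=1)
    if not tokens or tokens[0].lower() != "flush":
        raise ValueError("not a FLUSH statement")
    if len(tokens) == 1:
        # "flush": only the datanode the client is connected to
        return FlushPlan([current_node], None)
    arg = tokens[1].strip()
    if arg.lower() in ("all", "cluster"):
        # "flush all" / "flush cluster": every datanode, every storage group
        return FlushPlan(list(all_nodes), None)
    # "flush sg[, sg]*": every datanode hosting a DataRegion of those groups
    groups = [g.strip() for g in arg.split(",") if g.strip()]
    nodes = sorted({n for g in groups for n in region_locations.get(g, [])})
    return FlushPlan(nodes, groups)


nodes = ["dn1", "dn2", "dn3"]
locations = {"root.sg1": ["dn2", "dn1"], "root.sg2": ["dn3"]}
print(plan_flush("flush", "dn1", nodes, locations))
# FlushPlan(target_nodes=['dn1'], storage_groups=None)
print(plan_flush("flush root.sg1", "dn1", nodes, locations))
# FlushPlan(target_nodes=['dn1', 'dn2'], storage_groups=['root.sg1'])
```

Note that this naive keyword check treats a storage group literally named "all" or "cluster" as the keyword, which is exactly the ambiguity raised in the replies.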


What do you think?

[1] https://issues.apache.org/jira/browse/IOTDB-3099

—
Jialin Qiao
Apache IoTDB PMC