[GitHub] [incubator-iotdb] qiaojialin commented on a change in pull request #1101: [IOTDB-611] Add documents introducing Data Query design fundamentals

GitBox Mon, 27 Apr 2020 05:35:47 -0700


qiaojialin commented on a change in pull request #1101:
URL: https://github.com/apache/incubator-iotdb/pull/1101#discussion_r415670440




##########
File path: docs/zh/SystemDesign/5-DataQuery/2-QueryFundamentals.md
##########
@@ -0,0 +1,69 @@
+<!--
+
+    Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+    
+        http://www.apache.org/licenses/LICENSE-2.0
+    
+    Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+-->
+
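+# Query Fundamentals
+
+## Sequence and Unsequence TsFiles
+
+While data is being inserted into a time series, sequence and unsequence TsFiles are produced depending on the timestamps of the inserted data. If data is inserted in increasing timestamp order, only sequence files are produced. Once a sequence file has been written to disk, any newly written data whose timestamp is earlier than the maximum timestamp in that sequence file produces an unsequence file.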

Review comment:
       ```suggestion
   
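While data is being inserted into a device, sequence and unsequence TsFiles are produced depending on the timestamps of the inserted data. If data is inserted in increasing timestamp order, only sequence files are produced. Once sequence data has been written to disk, any newly written data whose timestamp is earlier than the maximum timestamp in the sequence file produces an unsequence file.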
   ```
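The insertion rule discussed in this suggestion can be pictured with a small routing sketch. It is a hypothetical illustration, not IoTDB code: the class `SeqUnseqSketch`, its per-device `maxFlushedTime` map and its method names are assumptions, and it treats a timestamp equal to the flushed maximum as in order because the text only says "earlier than".

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: decide whether a new point can go to a sequence TsFile or
 * must go to an unsequence TsFile, based on the largest timestamp already flushed
 * for the same device. Not IoTDB's actual implementation.
 */
class SeqUnseqSketch {
  // Largest timestamp flushed to sequence files, tracked per device (assumed bookkeeping).
  private final Map<String, Long> maxFlushedTime = new HashMap<>();

  /** Returns true if the point can be appended to a sequence file. */
  boolean isSequence(String device, long timestamp) {
    Long max = maxFlushedTime.get(device);
    // Nothing flushed yet for this device: any timestamp is in order.
    // Assumption: only a timestamp strictly earlier than the flushed maximum is out of order.
    return max == null || timestamp >= max;
  }

  /** Records that sequence data up to maxTime has been flushed for the device. */
  void onFlush(String device, long maxTime) {
    maxFlushedTime.merge(device, maxTime, Long::max);
  }

  public static void main(String[] args) {
    SeqUnseqSketch router = new SeqUnseqSketch();
    System.out.println(router.isSequence("root.sg.d1", 10));  // true: nothing flushed yet
    router.onFlush("root.sg.d1", 100);
    System.out.println(router.isSequence("root.sg.d1", 150)); // true: after the flushed maximum
    System.out.println(router.isSequence("root.sg.d1", 50));  // false: goes to an unsequence file
    System.out.println(router.isSequence("root.sg.d2", 50));  // true: devices are tracked separately
  }
}
```

Tracking the maximum per device rather than per file matches the suggestion's change from "time series" to "device" above.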

##########
File path: docs/zh/SystemDesign/5-DataQuery/2-QueryFundamentals.md
##########
@@ -0,0 +1,69 @@
+
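+IoTDB stores sequence and unsequence files separately, under the data/sequence and data/unsequence directories. During a query, the data in sequence and unsequence files is also processed separately. We always use the `getQueryDataSource()` method in `QueryResourceManager.java` to obtain, from the full path of a time series, the sequence and unsequence files that store that series.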
+
+
+## General Process of Reading a TsFile
+
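+The structure of each level of a TsFile was introduced in the earlier [1-TsFile](/#/SystemDesign/progress/chap1/sec1) document. Reading a time series requires expanding these levels one by one: TsFileResource -> TimeseriesMetadata -> ChunkMetadata -> IPageReader -> BatchData.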
+
+The file-reading utility methods are in
+`org.apache.iotdb.db.utils.FileLoaderUtils`:
+
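+* `loadTimeSeriesMetadata()` reads the timeseriesMetadata of a given time series from a TsFileResource. It also accepts a timestamp Filter to ensure that only timeseriesMetadata satisfying the condition is returned; if no timeseriesMetadata satisfies it, null is returned.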

Review comment:
       ```suggestion
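   * `loadTimeSeriesMetadata()` reads the TimeseriesMetadata of a given time series from a TsFileResource. It also accepts a timestamp Filter to ensure that only TimeseriesMetadata satisfying the condition is returned; if no TimeseriesMetadata satisfies it, null is returned.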
   ```
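The contract of `loadTimeSeriesMetadata()` described here (return the metadata only if its time range can satisfy the Filter, otherwise null) can be sketched as follows. `SeriesMetadata` and `TimeRange` are hypothetical stand-ins for TimeseriesMetadata and a timestamp Filter; the real signatures in `FileLoaderUtils` are not reproduced, and a real Filter can express more than a closed interval.

```java
/**
 * Hypothetical sketch of metadata-level pruning: a metadata entry whose
 * [startTime, endTime] statistics cannot satisfy the time filter is dropped
 * (null is returned), mirroring the contract described for loadTimeSeriesMetadata().
 */
class MetadataPruningSketch {

  /** Stand-in for TimeseriesMetadata: only the time statistics matter here. */
  static final class SeriesMetadata {
    final String path;
    final long startTime;
    final long endTime;

    SeriesMetadata(String path, long startTime, long endTime) {
      this.path = path;
      this.startTime = startTime;
      this.endTime = endTime;
    }
  }

  /** Stand-in for a timestamp Filter: a closed interval [lower, upper]. */
  static final class TimeRange {
    final long lower;
    final long upper;

    TimeRange(long lower, long upper) {
      this.lower = lower;
      this.upper = upper;
    }

    /** True if some timestamp in [start, end] may satisfy this filter. */
    boolean mayOverlap(long start, long end) {
      return start <= upper && end >= lower;
    }
  }

  /** Returns the metadata if it may contain matching points, otherwise null. */
  static SeriesMetadata filterMetadata(SeriesMetadata metadata, TimeRange filter) {
    if (metadata == null) {
      return null; // the file holds no data for this series
    }
    if (filter != null && !filter.mayOverlap(metadata.startTime, metadata.endTime)) {
      return null; // statistics prove no point can match, so the file can be skipped
    }
    return metadata;
  }

  public static void main(String[] args) {
    SeriesMetadata m = new SeriesMetadata("root.sg.d1.s1", 100, 200);
    System.out.println(filterMetadata(m, new TimeRange(150, 300)) != null); // true
    System.out.println(filterMetadata(m, new TimeRange(201, 300)) != null); // false
    System.out.println(filterMetadata(m, null) != null);                   // true: no filter
  }
}
```

Returning null instead of an empty object lets the caller skip the whole file with a single check before touching any ChunkMetadata.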

##########
File path: docs/zh/SystemDesign/5-DataQuery/2-QueryFundamentals.md
##########
@@ -0,0 +1,69 @@
+* `loadChunkMetadataList()` obtains the list of all chunkMetadata contained in this timeseries.

Review comment:
       ```suggestion
   * `loadChunkMetadataList()` obtains the list of all ChunkMetadata contained in this timeseries.
   ```
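The level-by-level expansion TsFileResource -> TimeseriesMetadata -> ChunkMetadata -> page -> BatchData can be modeled with plain in-memory classes. This is a hypothetical sketch: `FileResource`, `Chunk` and `Page` are illustrative names, the map lookup merely stands in for `loadTimeSeriesMetadata()` plus `loadChunkMetadataList()`, and no decoding or filtering is performed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical in-memory model of the read path
 * TsFileResource -> TimeseriesMetadata -> ChunkMetadata -> page -> batch of points.
 * None of these classes are IoTDB's; they only show the level-by-level expansion.
 */
class ReadPathSketch {
  static final class Page {
    final long[] times;
    Page(long... times) { this.times = times; }
  }

  static final class Chunk {
    final List<Page> pages;
    Chunk(List<Page> pages) { this.pages = pages; }
  }

  /** One file: series path -> chunks of that series (plays the TimeseriesMetadata level). */
  static final class FileResource {
    final Map<String, List<Chunk>> chunksBySeries;
    FileResource(Map<String, List<Chunk>> chunksBySeries) { this.chunksBySeries = chunksBySeries; }
  }

  /** Expands every level for one series and returns all timestamps in storage order. */
  static List<Long> readSeries(List<FileResource> files, String path) {
    List<Long> result = new ArrayList<>();
    for (FileResource file : files) {
      // Analogous to loadTimeSeriesMetadata() followed by loadChunkMetadataList().
      List<Chunk> chunks = file.chunksBySeries.get(path);
      if (chunks == null) {
        continue; // this file holds no data for the series
      }
      for (Chunk chunk : chunks) {
        for (Page page : chunk.pages) {  // analogous to loadPageReaderList()
          for (long t : page.times) {    // analogous to iterating a BatchData
            result.add(t);
          }
        }
      }
    }
    return result;
  }

  public static void main(String[] args) {
    Chunk c1 = new Chunk(List.of(new Page(1, 2), new Page(3)));
    Chunk c2 = new Chunk(List.of(new Page(5, 8)));
    FileResource f1 = new FileResource(Map.of("root.sg.d1.s1", List.of(c1, c2)));
    FileResource f2 = new FileResource(Map.of("root.sg.d1.s2", List.of(c1)));
    System.out.println(readSeries(List.of(f1, f2), "root.sg.d1.s1")); // [1, 2, 3, 5, 8]
  }
}
```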

##########
File path: docs/zh/SystemDesign/5-DataQuery/2-QueryFundamentals.md
##########
@@ -0,0 +1,69 @@
+* `loadPageReaderList()` reads the list of all pages contained in a chunkMetadata, which are then accessed through a pageReader.
+
+All of the above methods for reading time series data have to handle two cases: reading data in memory and reading data on disk.
+
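+Reading in-memory data means reading data that is in a memtable but has not yet been flushed to disk. For example, `loadTimeSeriesMetadata()` uses `TsFileResource.getTimeSeriesMetadata()` to obtain a timeseriesMetadata that has not yet been sealed. Once this timeseriesMetadata has been flushed to disk, its data can only be read by accessing the disk. The classes that read metadata from disk and from memory are DiskChunkMetadataLoader and MemChunkMetadataLoader, respectively.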
+
+`loadPageReaderList()` handles page data the same way, using two helper classes: MemChunkLoader and DiskChunkLoader.
+
+
+
+## Data Characteristics of Sequence and Unsequence Files
+
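+Data is distributed differently within sequence files and unsequence files.
+In a sequence file, the ChunkMetadata contained in a TimeseriesMetadata is also ordered: if they are stored in the order chunkMetadata1, chunkMetadata2, then chunkMetadata1.endTime <= chunkMetadata2.startTime is guaranteed.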
+
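+In an unsequence file, the ChunkMetadata contained in a TimeseriesMetadata is unordered. The data covered by multiple Chunks in an unsequence file may overlap, and may also overlap with chunk data in sequence files.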
+
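+The Page data inside each Chunk is always ordered, whether the Chunk belongs to a sequence file or an unsequence file. That is, the maximum timestamp of a page is no greater than the minimum timestamp of the next page. Queries can therefore take full advantage of this ordering and use statistics to filter Page data in advance.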
+
+
+
+## Handling Data Modifications in Queries

Review comment:
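       3-ModificationHandle hasn't been deleted yet, right? Its number now conflicts with 3-SeriesReader. Either move the content of ModificationHandle here, or shift ModificationHandle and the documents after it down by one.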
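The ordering properties listed in the hunk above (ordered ChunkMetadata in sequence files, possible overlap in unsequence files, ordered Pages inside every Chunk) can be checked with simple interval logic. A minimal sketch, assuming each ChunkMetadata or page statistics entry is reduced to a hypothetical `Interval(start, end)`; `pagesToRead` illustrates why page ordering allows a scan to stop early and is not IoTDB's actual pruning code.

```java
import java.util.List;

/**
 * Hypothetical sketch of the ordering properties of sequence/unsequence data, using
 * plain [start, end] intervals as stand-ins for ChunkMetadata or page statistics.
 */
class OrderingSketch {
  static final class Interval {
    final long start;
    final long end;
    Interval(long start, long end) { this.start = start; this.end = end; }
  }

  /** Sequence-file invariant: each chunk ends no later than the next one starts. */
  static boolean isSequenceOrdered(List<Interval> chunks) {
    for (int i = 1; i < chunks.size(); i++) {
      if (chunks.get(i - 1).end > chunks.get(i).start) {
        return false;
      }
    }
    return true;
  }

  /** Two time ranges overlap if they intersect; overlapping data must be merged when read. */
  static boolean overlaps(Interval a, Interval b) {
    return a.start <= b.end && b.start <= a.end;
  }

  /**
   * Pages inside a chunk are time-ordered, so once a page starts after the query's
   * upper bound every later page does too and the scan can stop.
   * Returns how many pages may contain points in [queryStart, queryEnd].
   */
  static int pagesToRead(List<Interval> pages, long queryStart, long queryEnd) {
    int count = 0;
    for (Interval page : pages) {
      if (page.start > queryEnd) {
        break; // all remaining pages start even later
      }
      if (page.end >= queryStart) {
        count++; // statistics say this page may contain matching points
      }
    }
    return count;
  }

  public static void main(String[] args) {
    List<Interval> seqChunks = List.of(new Interval(1, 10), new Interval(11, 20));
    System.out.println(isSequenceOrdered(seqChunks));          // true
    Interval unseqChunk = new Interval(5, 15);
    System.out.println(overlaps(unseqChunk, seqChunks.get(0))); // true: 5..15 intersects 1..10
    List<Interval> pages = List.of(new Interval(1, 9), new Interval(10, 19), new Interval(20, 29));
    System.out.println(pagesToRead(pages, 12, 15));            // 1: only the 10..19 page
  }
}
```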

##########
File path: docs/zh/SystemDesign/5-DataQuery/2-QueryFundamentals.md
##########
@@ -0,0 +1,69 @@
+* `loadPageReaderList()` reads the list of all pages contained in a chunkMetadata, which are then accessed through a pageReader.

Review comment:
       ```suggestion
   * `loadPageReaderList()` reads the list of all pages contained in the Chunk that a ChunkMetadata points to, which are then accessed through a PageReader.
   ```
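The memory/disk split described for `loadPageReaderList()` (MemChunkLoader vs DiskChunkLoader) comes down to one interface with two data sources. In this hypothetical sketch a `List<Long>` plays the unflushed memtable and a `ByteBuffer` of longs plays the flushed chunk; `ChunkLoader`, `MemLoader`, `DiskLoader` and `flush` are illustrative names, not the real classes' API, and the byte layout is invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the memory/disk split: callers ask one interface for a
 * chunk's timestamps, and the implementation reads either the unflushed memtable
 * or the serialized bytes on "disk". Not the real MemChunkLoader/DiskChunkLoader API.
 */
class LoaderSketch {
  interface ChunkLoader {
    List<Long> loadTimes();
  }

  /** Data still in the memtable: copied out directly, no deserialization. */
  static final class MemLoader implements ChunkLoader {
    private final List<Long> memtableTimes;
    MemLoader(List<Long> memtableTimes) { this.memtableTimes = memtableTimes; }

    public List<Long> loadTimes() {
      return new ArrayList<>(memtableTimes);
    }
  }

  /** Flushed data: decoded from its serialized form (a ByteBuffer stands in for the file). */
  static final class DiskLoader implements ChunkLoader {
    private final ByteBuffer chunkBytes;
    DiskLoader(ByteBuffer chunkBytes) { this.chunkBytes = chunkBytes; }

    public List<Long> loadTimes() {
      ByteBuffer buf = chunkBytes.duplicate(); // independent position, so it can be re-read
      List<Long> times = new ArrayList<>();
      while (buf.remaining() >= Long.BYTES) {
        times.add(buf.getLong());
      }
      return times;
    }
  }

  /** Serializes timestamps in the layout DiskLoader expects (sketch format only). */
  static ByteBuffer flush(List<Long> times) {
    ByteBuffer buf = ByteBuffer.allocate(times.size() * Long.BYTES);
    for (long t : times) {
      buf.putLong(t);
    }
    buf.flip(); // switch from writing to reading
    return buf;
  }

  public static void main(String[] args) {
    List<Long> memtable = List.of(3L, 4L, 7L);
    ChunkLoader beforeFlush = new MemLoader(memtable);
    ChunkLoader afterFlush = new DiskLoader(flush(memtable));
    // Same data, different source: query code does not need to know which one it got.
    System.out.println(beforeFlush.loadTimes()); // [3, 4, 7]
    System.out.println(afterFlush.loadTimes());  // [3, 4, 7]
  }
}
```

Hiding the source behind one interface is what lets the rest of the read path stay unchanged when a memtable is flushed.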




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
