[jira] [Created] (IOTDB-562) Apache IoTDB raft log persistence in the distributed version
Tian Jiang created IOTDB-562: Summary: Apache IoTDB raft log persistence in the distributed version Key: IOTDB-562 URL: https://issues.apache.org/jira/browse/IOTDB-562 Project: Apache IoTDB Issue Type: Improvement Components: Core/Cluster Reporter: Tian Jiang Assignee: Kaifeng Xue IoTDB is a highly efficient time series database that supports high-speed query processing, including aggregation queries. Currently, IoTDB supports a shared-nothing cluster that uses the raft mechanism and raft logs to communicate among all nodes, so raft logs are very important for communication, consistency, and fail-over. However, the logs are currently stored only in memory, which means they are lost when a node goes down and then recovers. In addition, the raft logs may overlap with the current WAL, so some log writes may be unnecessary. Two improvements to the raft logs are therefore needed: 1. Store the raft logs on a durable medium such as a disk. You need to design a serializable form of the logs and then write them to disk. 2. Find a way to use raft logs in the IoTDB recovery process, so that we write only raft logs rather than both raft logs and the WAL. This avoids unnecessary log writes and improves insertion performance. This proposal is mainly about improving raft logs in clustered IoTDB. Besides, if we can make the summary info more useful, that would be even better. Note that the premise is that the raft log writing process must not be slowed down too much, which means the serializable form should be efficient enough. You should know: • IoTDB cluster structure • IoTDB WAL • IoTDB insertion process • Raft • Java difficulty: Major mentors: jt2594...@163.com -- This message was sent by Atlassian Jira (v8.3.4#803005)
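As a rough illustration of the "serializable form" asked for in IOTDB-562, the sketch below encodes a log entry into a fixed binary layout and decodes it again. The class `RaftLogEntry` and its fields (`index`, `term`, `payload`) are assumptions made for this example, not IoTDB's actual classes.

```java
import java.nio.ByteBuffer;

// Hypothetical log entry; the field names are assumptions, not IoTDB's real API.
final class RaftLogEntry {
  final long index;     // position of the entry in the raft log
  final long term;      // raft term in which the entry was created
  final byte[] payload; // already-serialized operation, e.g. an insertion

  RaftLogEntry(long index, long term, byte[] payload) {
    this.index = index;
    this.term = term;
    this.payload = payload;
  }

  // Layout: index (8 bytes) | term (8 bytes) | payload length (4 bytes) | payload.
  ByteBuffer serialize() {
    ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * 2 + Integer.BYTES + payload.length);
    buffer.putLong(index).putLong(term).putInt(payload.length).put(payload);
    buffer.flip(); // make the buffer readable from the start
    return buffer;
  }

  // Reads one entry back from a buffer produced by serialize().
  static RaftLogEntry deserialize(ByteBuffer buffer) {
    long index = buffer.getLong();
    long term = buffer.getLong();
    byte[] payload = new byte[buffer.getInt()];
    buffer.get(payload);
    return new RaftLogEntry(index, term, payload);
  }
}
```

The length prefix lets several entries be appended to one file and read back one by one; whether a checksum or batching is also needed is left open here.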
[jira] [Created] (IOTDB-557) Use generalized "and" and "or"
Tian Jiang created IOTDB-557: Summary: Use generalized "and" and "or" Key: IOTDB-557 URL: https://issues.apache.org/jira/browse/IOTDB-557 Project: Apache IoTDB Issue Type: Improvement Components: Planner/SQLOptimizer Reporter: Tian Jiang We only use binary "and" and "or" in expression construction. As a result, a filter like "root.group1.*.s1 > 1" (where * ranges from d1 to d100) produces a filter tree of over 100 nodes, which is a great waste when there are even more series. By using generalized "and" and "or", we can replace such a filter tree with just one filter node, which I think could relieve the Java GC a lot.
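One possible reading of a generalized "and" is an n-ary node that holds any number of children, so the chain of binary nodes disappears. The sketch below shows that reading only; `RowFilter`, `GreaterThan`, and `GeneralizedAnd` are hypothetical names, and a row is modeled as one `long` value per series.

```java
import java.util.List;

// Hypothetical filter interface: a row holds one value per queried series.
interface RowFilter {
  boolean satisfy(long[] row);
}

// Leaf condition "series at the given column > threshold".
final class GreaterThan implements RowFilter {
  private final int column;
  private final long threshold;

  GreaterThan(int column, long threshold) {
    this.column = column;
    this.threshold = threshold;
  }

  @Override
  public boolean satisfy(long[] row) {
    return row[column] > threshold;
  }
}

// N-ary conjunction: one node for all children instead of n - 1 binary "and" nodes.
final class GeneralizedAnd implements RowFilter {
  private final List<RowFilter> children;

  GeneralizedAnd(List<RowFilter> children) {
    this.children = children;
  }

  @Override
  public boolean satisfy(long[] row) {
    for (RowFilter child : children) {
      if (!child.satisfy(row)) {
        return false; // short-circuit on the first failing child
      }
    }
    return true;
  }
}
```

A generalized "or" would mirror this class and return `true` on the first satisfied child.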
[jira] [Created] (IOTDB-552) Restrictions of predicates in ALIGN_BY_DEVICE statements are not well stated
Tian Jiang created IOTDB-552: Summary: Restrictions of predicates in ALIGN_BY_DEVICE statements are not well stated Key: IOTDB-552 URL: https://issues.apache.org/jira/browse/IOTDB-552 Project: Apache IoTDB Issue Type: Bug Components: Document Affects Versions: master branch Reporter: Tian Jiang Fix For: master branch Although value predicates are newly allowed in ALIGN_BY_DEVICE statements, their restrictions are not well stated in the documents, for example, that wildcards cannot be used in them. The documents should be updated with those restrictions.
[jira] [Created] (IOTDB-551) Prefix in predicates
Tian Jiang created IOTDB-551: Summary: Prefix in predicates Key: IOTDB-551 URL: https://issues.apache.org/jira/browse/IOTDB-551 Project: Apache IoTDB Issue Type: Bug Components: Planner/SQLParser Affects Versions: master branch Reporter: Tian Jiang Attachments: image-2020-03-10-16-45-36-508.png, image-2020-03-10-16-51-36-084.png When I looked into the SQL definitions (the antlr file), I found that prefixes are allowed in predicates. !image-2020-03-10-16-45-36-508.png|thumbnail! It is weird because I think it would be difficult to define "WHERE root.group1.device1 > 100". And when I tried to query with such a predicate, I got a "no such timeseries". !image-2020-03-10-16-51-36-084.png|thumbnail! So this grammar had better be corrected. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager
[ https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-439. Resolution: Fixed > [Distributed] Incorrect Snapshot implementation and LogManager > -- > > Key: IOTDB-439 > URL: https://issues.apache.org/jira/browse/IOTDB-439 > Project: Apache IoTDB > Issue Type: Sub-task >Reporter: Xiangdong Huang >Priority: Major > > I read the log/snapshot and manage packages in current cluster_new branch, > and have some questions: > 1. PartitionedSnapshotLogManager and FilePartitionedSnapshotLogManager are > incorrect as > a. they still store log into memory while the JavaDoc says they do not > store data in memory. > b. When doing snapshot, do they need to consider the part of the log in > memory? > > 2. Current LogManager is not thread-safety. The caller (i.e., RaftMember) > uses sync keyword to guarantee that for each call. > a. a better design? > b. is there any performance problem? as all operations are serialization. > > 3. Consider the Raft Protocol, don't we need APIs like > `removeLogFrom(startIndex)` in LogManager? see the case of Figure 7 in Raft > paper [1] > > [1] [https://raft.github.io/raft.pdf] > > [~jt2594838] may know clearly about current implementation. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-534) [Distributed] Query coordinating
Tian Jiang created IOTDB-534: Summary: [Distributed] Query coordinating Key: IOTDB-534 URL: https://issues.apache.org/jira/browse/IOTDB-534 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang When using multiple replicas, it is vital for query performance that queries are properly coordinated, i.e., for each query, the best replicas are found to execute it so that the overall workload is balanced and the caches (if any) are utilized as much as possible. To establish an effective query coordination mechanism, one must decide which status of a node is relevant to query performance, such as its CPU usage, disk usage, memory usage, network usage, and so on, and then build a model based on the collected information to determine which node is the best for a query.
[jira] [Created] (IOTDB-532) [Distributed] Enabling parallel processing within a data group
Tian Jiang created IOTDB-532: Summary: [Distributed] Enabling parallel processing within a data group Key: IOTDB-532 URL: https://issues.apache.org/jira/browse/IOTDB-532 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang In the present implementation, the logs in a data group are executed serially, which means that a data group can serve only one client at a time. To increase concurrency, a data group should be able to process multiple client requests simultaneously. To do this, the following changes should be made: The log manager should be locked only when getting a new index. When a log fails, the logs after it should also be removed. Internal retries should be added to overcome temporary network failures or thread switches that cause logs with larger indices to arrive first.
[jira] [Created] (IOTDB-526) [Distributed]Support metadata queries
Tian Jiang created IOTDB-526: Summary: [Distributed]Support metadata queries Key: IOTDB-526 URL: https://issues.apache.org/jira/browse/IOTDB-526 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Assignee: Tian Jiang Fix For: master branch Metadata queries, like "getNodeList", "getPathNextChildern", "getTimeseriesSchema", are currently unsupported. The point is that the paths being queried may contain wildcards (*) or may be prefix paths, which makes it a little hard to figure out which data groups the query should be sent to. The simplest way may be to broadcast the query and merge the results, which is clearly less efficient. I hope you can come up with a better idea to resolve this.
[jira] [Created] (IOTDB-522) Aggregation result should be serializable
Tian Jiang created IOTDB-522: Summary: Aggregation result should be serializable Key: IOTDB-522 URL: https://issues.apache.org/jira/browse/IOTDB-522 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Fix For: master branch As the coordinator node in the distributed version should gather the aggregation results from other nodes and merge them, the AggregationResults must be serializable for the nodes to transfer them. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-520) Result of IBatchReader should not cross partition
Tian Jiang created IOTDB-520: Summary: Result of IBatchReader should not cross partition Key: IOTDB-520 URL: https://issues.apache.org/jira/browse/IOTDB-520 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Fix For: master branch Situation: Assume daily partitioning. Node A manages the data of Day1,3,5 and Node B manages the data of Day2,4. In the current implementation, when the coordinator node fetches a batch from Node A, the batch may contain data of Day1,3, while the batch from Node B contains data of Day2. As a result, the coordinator node must merge the two batches to obtain an ordered batch. But if batches never cross a partition border, the coordinator node can simply return the batches without merging them, using a heap that compares the first element of each batch, which could reduce the merging overhead.
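The heap idea from IOTDB-520 can be sketched as follows, assuming each batch is a sorted array of timestamps that stays within a single partition. The class `PartitionBatchOrdering` and the array representation are hypothetical; a real reader would pull batches lazily rather than hold them all in a list.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class PartitionBatchOrdering {
  // Orders whole batches by their first timestamp. Because no batch crosses a
  // partition border, the batches can be returned one after another without
  // merging their elements.
  static List<long[]> orderBatches(List<long[]> batches) {
    PriorityQueue<long[]> heap =
        new PriorityQueue<>(Comparator.comparingLong((long[] batch) -> batch[0]));
    for (long[] batch : batches) {
      if (batch.length > 0) {
        heap.add(batch); // empty batches carry no data and are skipped
      }
    }
    List<long[]> ordered = new ArrayList<>();
    while (!heap.isEmpty()) {
      ordered.add(heap.poll());
    }
    return ordered;
  }
}
```

Only one comparison per batch is needed, instead of one per element as in a full merge.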
[jira] [Created] (IOTDB-514) To support aggregation in the distributed version
Tian Jiang created IOTDB-514: Summary: To support aggregation in the distributed version Key: IOTDB-514 URL: https://issues.apache.org/jira/browse/IOTDB-514 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Fix For: master branch The current aggregation cannot satisfy the needs of the distributed version. To be specific, two points should be satisfied: 1. The FirstValueAggrResult and LastValueAggrResult should also contain a timestamp. Without a timestamp, the coordinator node cannot tell which node's result is the true first/last. 2. An AggrResult should be able to merge with another. When we get AggrResults from all nodes that participate in the query, these results should be merged to generate a new result. To resolve the issue, you should: 1. Add a field `timestamp` to FirstValueAggrResult and LastValueAggrResult. 2. Add an abstract method `merge(AggregateResult another)` to AggregateResult and implement it properly.
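A minimal sketch of the two steps proposed in IOTDB-514 is shown below. The class names follow the issue text, but the bodies, the `value` and `hasResult` fields, and the `double` value type are assumptions for illustration only.

```java
// Stand-ins for the classes named in the issue; the real classes may differ.
abstract class AggregateResult {
  abstract void merge(AggregateResult another);
}

final class FirstValueAggrResult extends AggregateResult {
  long timestamp;    // time of the first value seen so far
  double value;      // the first value itself (type is an assumption)
  boolean hasResult; // false until any value has been recorded

  FirstValueAggrResult() {}

  FirstValueAggrResult(long timestamp, double value) {
    this.timestamp = timestamp;
    this.value = value;
    this.hasResult = true;
  }

  // Keeps whichever of the two results has the earlier timestamp.
  @Override
  void merge(AggregateResult another) {
    FirstValueAggrResult other = (FirstValueAggrResult) another;
    if (other.hasResult && (!hasResult || other.timestamp < timestamp)) {
      timestamp = other.timestamp;
      value = other.value;
      hasResult = true;
    }
  }
}
```

A last-value result would use the same shape with the comparison reversed.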
[jira] [Closed] (IOTDB-505) Add TsFileFilter in series reader to support distributed query
[ https://issues.apache.org/jira/browse/IOTDB-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-505. Resolution: Fixed > Add TsFileFilter in series reader to support distributed query > -- > > Key: IOTDB-505 > URL: https://issues.apache.org/jira/browse/IOTDB-505 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Tian Jiang >Assignee: Tian Jiang >Priority: Major > Labels: distributed, filter, query > Fix For: master branch > > > In the distributed version of IoTDB, data of different data groups of a node > is mixed together in an IoTDB instance. As a result, when querying the data > of one group, data of other groups will also be queried, which is not desired. > To resolve this, we need to add a TsFileFilter in series readers that will > filter the TsFiles accordingly, so that the unwanted data will not be queried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-511) Default directories are not platform-independent
Tian Jiang created IOTDB-511: Summary: Default directories are not platform-independent Key: IOTDB-511 URL: https://issues.apache.org/jira/browse/IOTDB-511 Project: Apache IoTDB Issue Type: Bug Reporter: Tian Jiang Assignee: Tian Jiang Fix For: master branch Attachments: image-2020-02-24-10-27-21-348.png, image-2020-02-24-10-28-48-113.png !image-2020-02-24-10-27-21-348.png|thumbnail! The default directories in IoTDBConfig use a specific path separator, which may cause trouble on some platforms. !image-2020-02-24-10-28-48-113.png|thumbnail!
[jira] [Created] (IOTDB-505) Add TsFileFilter in series reader to support distributed query
Tian Jiang created IOTDB-505: Summary: Add TsFileFilter in series reader to support distributed query Key: IOTDB-505 URL: https://issues.apache.org/jira/browse/IOTDB-505 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang In the distributed version of IoTDB, data of different data groups of a node is mixed together in an IoTDB instance. As a result, when querying the data of one group, data of other groups will also be queried, which is not desired. To resolve this, we need to add a TsFileFilter in series readers that will filter the TsFiles accordingly, so that the unwanted data will not be queried. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-504) Confusing implementations of next()
Tian Jiang created IOTDB-504: Summary: Confusing implementations of next() Key: IOTDB-504 URL: https://issues.apache.org/jira/browse/IOTDB-504 Project: Apache IoTDB Issue Type: Bug Affects Versions: master branch Reporter: Tian Jiang Attachments: image-2020-02-20-17-54-01-852.png Some implementations of `next()` in readers are confusing: calling `next()` without calling `hasNext()` first does not move the cursor to the next element, which goes against common practice. See the attachment for one example.
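For comparison, the conventional contract that IOTDB-504 refers to looks like the sketch below: `hasNext()` has no side effects and `next()` always advances the cursor. `ArrayReader` is a hypothetical example class, not one of the IoTDB readers.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical reader over an in-memory array.
final class ArrayReader implements Iterator<Long> {
  private final long[] values;
  private int cursor = 0;

  ArrayReader(long[] values) {
    this.values = values;
  }

  // Pure check: calling it any number of times does not move the cursor.
  @Override
  public boolean hasNext() {
    return cursor < values.length;
  }

  // Always returns the current element and advances, whether or not
  // hasNext() was called beforehand.
  @Override
  public Long next() {
    if (cursor >= values.length) {
      throw new NoSuchElementException();
    }
    return values[cursor++];
  }
}
```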
[jira] [Created] (IOTDB-460) [Distributed]Remove data of outdated slots
Tian Jiang created IOTDB-460: Summary: [Distributed]Remove data of outdated slots Key: IOTDB-460 URL: https://issues.apache.org/jira/browse/IOTDB-460 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang During node addition/removal, the slots managed by a node change. However, the data of the affected slots cannot be removed yet because the new holders of the slots will pull such data to themselves. As the data pulling is issued by the new holders randomly, it is hard for the previous holders to find out when the data is no longer needed. As a result, a previous holder cannot delete the local data with confidence. We need a way for the previous holders to know when the data has been replicated to the new holders so that they can remove the local data.
[jira] [Created] (IOTDB-451) [Distributed]Recovery of snapshot pulling
Tian Jiang created IOTDB-451: Summary: [Distributed]Recovery of snapshot pulling Key: IOTDB-451 URL: https://issues.apache.org/jira/browse/IOTDB-451 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang After the addition/removal of a node, snapshots of slots are pulled from the previous holders to the new holders. In case that the new holders are down and restarted, it would be better to restart the pulling from a breakpoint instead of starting over. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IOTDB-372) [Distributed] Support node deletion.
[ https://issues.apache.org/jira/browse/IOTDB-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029556#comment-17029556 ] Tian Jiang commented on IOTDB-372: -- https://www.processon.com/diagraming/5e37c840e4b006a43aea52cb Procedure design. > [Distributed] Support node deletion. > > > Key: IOTDB-372 > URL: https://issues.apache.org/jira/browse/IOTDB-372 > Project: Apache IoTDB > Issue Type: New Feature >Reporter: Tian Jiang >Priority: Major > Labels: distributed > > Currently, only node addition is supported, to take a step toward scaling > even auto-scaling, node deletion. Node deletion is no simple reversion of > node addition, it should be carefully designed, discussed and verified. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager
[ https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029513#comment-17029513 ] Tian Jiang commented on IOTDB-439: -- 3. Currently, the next log is blocked until the previous log is either timed out or committed, because high concurrency is mainly achieved through partitioning. As operations in the same partition are basically serialized, this may not be a big issue, so the current method "replaceLastLog" is enough. Holding multiple uncommitted logs remains a future optimization. > [Distributed] Incorrect Snapshot implementation and LogManager > -- > > Key: IOTDB-439 > URL: https://issues.apache.org/jira/browse/IOTDB-439 > Project: Apache IoTDB > Issue Type: Sub-task >Reporter: Xiangdong Huang >Priority: Major > > I read the log/snapshot and manage packages in current cluster_new branch, > and have some questions: > 1. PartitionedSnapshotLogManager and FilePartitionedSnapshotLogManager are > incorrect as > a. they still store log into memory while the JavaDoc says they do not > store data in memory. > b. When doing snapshot, do they need to consider the part of the log in > memory? > > 2. Current LogManager is not thread-safety. The caller (i.e., RaftMember) > uses sync keyword to guarantee that for each call. > a. a better design? > b. is there any performance problem? as all operations are serialization. > > 3. Consider the Raft Protocol, don't we need APIs like > `removeLogFrom(startIndex)` in LogManager? see the case of Figure 7 in Raft > paper [1] > > [1] [https://raft.github.io/raft.pdf] > > [~jt2594838] may know clearly about current implementation.
[jira] [Commented] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager
[ https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029508#comment-17029508 ] Tian Jiang commented on IOTDB-439: -- 2.a. If you have concrete advice, it is welcome. But using "synchronized" can minimize the scope being locked, and as far as I can see, there is no problem with it. 2.b. Performance concerns are currently set aside, and there is no evidence supporting them. For correctness, synchronization is necessary. Tightening the scope that needs to be synchronized may be a good optimization, but it is too early for that now. > [Distributed] Incorrect Snapshot implementation and LogManager > -- > > Key: IOTDB-439 > URL: https://issues.apache.org/jira/browse/IOTDB-439 > Project: Apache IoTDB > Issue Type: Sub-task >Reporter: Xiangdong Huang >Priority: Major > > I read the log/snapshot and manage packages in current cluster_new branch, > and have some questions: > 1. PartitionedSnapshotLogManager and FilePartitionedSnapshotLogManager are > incorrect as > a. they still store log into memory while the JavaDoc says they do not > store data in memory. > b. When doing snapshot, do they need to consider the part of the log in > memory? > > 2. Current LogManager is not thread-safety. The caller (i.e., RaftMember) > uses sync keyword to guarantee that for each call. > a. a better design? > b. is there any performance problem? as all operations are serialization. > > 3. Consider the Raft Protocol, don't we need APIs like > `removeLogFrom(startIndex)` in LogManager? see the case of Figure 7 in Raft > paper [1] > > [1] [https://raft.github.io/raft.pdf] > > [~jt2594838] may know clearly about current implementation.
[jira] [Commented] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager
[ https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029502#comment-17029502 ] Tian Jiang commented on IOTDB-439: -- 1.a. It is saying the committed logs do not have to be stored in the memory while storing them may bring some benefit for catching up, but it is not necessary. Please compare it with PartitionedSnapshotLogManager carefully for a better understanding. 1.b. Committed logs are covered by snapshots and non-committed logs do not concern snapshot. They are already considered. So I do not know why you call it "incorrect". > [Distributed] Incorrect Snapshot implementation and LogManager > -- > > Key: IOTDB-439 > URL: https://issues.apache.org/jira/browse/IOTDB-439 > Project: Apache IoTDB > Issue Type: Sub-task >Reporter: Xiangdong Huang >Priority: Major > > I read the log/snapshot and manage packages in current cluster_new branch, > and have some questions: > 1. PartitionedSnapshotLogManager and FilePartitionedSnapshotLogManager are > incorrect as > a. they still store log into memory while the JavaDoc says they do not > store data in memory. > b. When doing snapshot, do they need to consider the part of the log in > memory? > > 2. Current LogManager is not thread-safety. The caller (i.e., RaftMember) > uses sync keyword to guarantee that for each call. > a. a better design? > b. is there any performance problem? as all operations are serialization. > > 3. Consider the Raft Protocol, don't we need APIs like > `removeLogFrom(startIndex)` in LogManager? see the case of Figure 7 in Raft > paper [1] > > [1] [https://raft.github.io/raft.pdf] > > [~jt2594838] may know clearly about current implementation. > > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-422) Close current files before merge.
Tian Jiang created IOTDB-422: Summary: Close current files before merge. Key: IOTDB-422 URL: https://issues.apache.org/jira/browse/IOTDB-422 Project: Apache IoTDB Issue Type: Bug Affects Versions: 0.10.0-SNAPSHOT Reporter: Tian Jiang Fix For: 0.10.0-SNAPSHOT If some unseq file overlaps the unsealed seq file and a merge is triggered, the overlapped data may not be able to be merged into the right file. To resolve this, the files should be closed before a merge starts. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-420) Avoid flush encoding task dying silently.
Tian Jiang created IOTDB-420: Summary: Avoid flush encoding task dying silently. Key: IOTDB-420 URL: https://issues.apache.org/jira/browse/IOTDB-420 Project: Apache IoTDB Issue Type: Bug Affects Versions: 0.10.0-SNAPSHOT Reporter: Tian Jiang Fix For: 0.10.0-SNAPSHOT If a runtime exception is thrown in an encoding sub-task, the sub-task dies silently and prevents the io task from ending. To avoid this, the future of the encoding task should be retrieved before that of the io task.
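The ordering described in IOTDB-420 can be illustrated with a small producer/consumer sketch. Everything here (`FlushSketch`, the queue of strings, the `END` marker) is hypothetical and only models the situation: the IO task waits for an end marker that a failed encoding task never sends, so the encoding future is checked first.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

final class FlushSketch {
  private static final String END = "END"; // marker telling the IO task to stop

  // Returns null on success, or the message of the exception thrown by a task.
  static String flush(boolean failEncoding) {
    BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    ExecutorService pool = Executors.newFixedThreadPool(2);
    Future<?> encoding = pool.submit(() -> {
      if (failEncoding) {
        throw new IllegalStateException("encoding failed");
      }
      queue.add("chunk");
      queue.add(END);
    });
    Future<?> io = pool.submit(() -> {
      while (!END.equals(queue.take())) {
        // a real IO task would write the chunk to disk here
      }
      return null;
    });
    try {
      encoding.get(); // checked first, so an encoding failure surfaces here
      io.get();
      return null;
    } catch (ExecutionException e) {
      io.cancel(true); // the IO task would otherwise wait for END forever
      return e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Had `io.get()` been called first, the caller would block indefinitely in the failure case because the end marker never arrives.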
[jira] [Closed] (IOTDB-412) Paths are not correctly deduplicated
[ https://issues.apache.org/jira/browse/IOTDB-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-412. Resolution: Fixed > Paths are not correctly deduplicated > > > Key: IOTDB-412 > URL: https://issues.apache.org/jira/browse/IOTDB-412 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 0.10.0-SNAPSHOT >Reporter: Tian Jiang >Assignee: atoildw >Priority: Major > Labels: pull-request-available, query > Fix For: 0.10.0-SNAPSHOT > > Attachments: Paths are duplicated in GroupByPlan.docx > > Time Spent: 20m > Remaining Estimate: 0h > > Please check the attachment for details. I am not sure if other plans have > the same problem, those who take over this should have a look. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IOTDB-412) Paths are not correctly deduplicated
[ https://issues.apache.org/jira/browse/IOTDB-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012563#comment-17012563 ] Tian Jiang commented on IOTDB-412: -- Somehow I cannot assign this to you on Jira, but I will keep in mind that you are working on it, thanks. > Paths are not correctly deduplicated > > > Key: IOTDB-412 > URL: https://issues.apache.org/jira/browse/IOTDB-412 > Project: Apache IoTDB > Issue Type: Bug >Affects Versions: 0.10.0-SNAPSHOT >Reporter: Tian Jiang >Priority: Major > Labels: query > Fix For: 0.10.0-SNAPSHOT > > Attachments: Paths are duplicated in GroupByPlan.docx > > > Please check the attachment for details. I am not sure if other plans have > the same problem, those who take over this should have a look. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-412) Paths are not correctly deduplicated
Tian Jiang created IOTDB-412: Summary: Paths are not correctly deduplicated Key: IOTDB-412 URL: https://issues.apache.org/jira/browse/IOTDB-412 Project: Apache IoTDB Issue Type: Bug Reporter: Tian Jiang Fix For: 0.10.0-SNAPSHOT Attachments: Paths are duplicated in GroupByPlan.docx Please check the attachment for details. I am not sure if other plans have the same problem, those who take over this should have a look. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IOTDB-352) [Distributed] Recognize and skip duplicated files in a snapshot
[ https://issues.apache.org/jira/browse/IOTDB-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005984#comment-17005984 ] Tian Jiang commented on IOTDB-352: -- This is the current solution, it is not perfect. Any suggestions or new ideas are welcomed. > [Distributed] Recognize and skip duplicated files in a snapshot > --- > > Key: IOTDB-352 > URL: https://issues.apache.org/jira/browse/IOTDB-352 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Tian Jiang >Priority: Major > Labels: distributed > > By the naming of TsFiles in IoTDB, the files with the same data may have > different names on different nodes. When such files are sent through > snapshots, the receiver is unable to tell whether the file already exists > locally or not, so it will blindly load the file as an unsequential one (if > it does overlap any existing files), which will waste a lot of system > resources. > How can we figure out if we already have one file or not? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (IOTDB-352) [Distributed] Recognize and skip duplicated files in a snapshot
[ https://issues.apache.org/jira/browse/IOTDB-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005983#comment-17005983 ] Tian Jiang commented on IOTDB-352: -- Adding md5 is not helpful in this issue, it may be used to check the file integrity of files during file transfers, but that is another issue. > [Distributed] Recognize and skip duplicated files in a snapshot > --- > > Key: IOTDB-352 > URL: https://issues.apache.org/jira/browse/IOTDB-352 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Tian Jiang >Priority: Major > Labels: distributed > > By the naming of TsFiles in IoTDB, the files with the same data may have > different names on different nodes. When such files are sent through > snapshots, the receiver is unable to tell whether the file already exists > locally or not, so it will blindly load the file as an unsequential one (if > it does overlap any existing files), which will waste a lot of system > resources. > How can we figure out if we already have one file or not? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-385) Bloom Filter for time ranges
Tian Jiang created IOTDB-385: Summary: Bloom Filter for time ranges Key: IOTDB-385 URL: https://issues.apache.org/jira/browse/IOTDB-385 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Situation: Device1 generates data at 1 pm, 5 pm, and 8 pm; Device2 generates data at 1 pm, 7 pm, and 8 pm. The query is "SELECT * FROM Device1, Device2 WHERE 13:00:00 < time < 18:00:00". It is clear that Device2 has no data satisfying the condition, but we still need to query it since we currently record only startTime and endTime for each device. Solution: For each device, assuming its startTime is t_s, each timestamp t_d >= t_s can be mapped to a time range id using id = ceiling((t_d - t_s) / interval_length), where interval_length is 1 hour in the above example. With this id, a bloom filter (or maybe another kind of filter) can be built to tell whether we truly have data satisfying the time condition.
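The id formula from IOTDB-385 can be tried out with the sketch below. To keep it short and deterministic, a plain `HashSet` of ids stands in for the Bloom filter, so this is not a Bloom filter implementation; `TimeRangeIndex` and its methods are hypothetical, and timestamps are assumed to be integers no smaller than `startTime`.

```java
import java.util.HashSet;
import java.util.Set;

final class TimeRangeIndex {
  private final long startTime;      // t_s in the issue
  private final long intervalLength; // length of one time range
  private final Set<Long> ids = new HashSet<>(); // stand-in for a Bloom filter

  TimeRangeIndex(long startTime, long intervalLength) {
    this.startTime = startTime;
    this.intervalLength = intervalLength;
  }

  // id = ceiling((t_d - t_s) / interval_length), valid for t_d >= t_s.
  long idOf(long timestamp) {
    long offset = timestamp - startTime;
    return (offset + intervalLength - 1) / intervalLength;
  }

  void add(long timestamp) {
    ids.add(idOf(timestamp));
  }

  // Returns false only when no recorded timestamp can lie in the open
  // range (lower, upper); true means the device still has to be queried.
  boolean mayContain(long lower, long upper) {
    long first = Math.max(lower + 1, startTime);
    long last = upper - 1;
    if (first > last) {
      return false;
    }
    for (long id = idOf(first); id <= idOf(last); id++) {
      if (ids.contains(id)) {
        return true;
      }
    }
    return false;
  }
}
```

With timestamps in minutes of the day and a 60-minute interval, Device1 (13:00, 17:00, 20:00) yields ids 0, 4, 7 and Device2 (13:00, 19:00, 20:00) yields ids 0, 6, 7; the query range 13:00 to 18:00 covers ids 1 to 5, so only Device1 needs to be read.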
[jira] [Created] (IOTDB-373) [Distributed] Query coordinating
Tian Jiang created IOTDB-373: Summary: [Distributed] Query coordinating Key: IOTDB-373 URL: https://issues.apache.org/jira/browse/IOTDB-373 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang When using more than one replica, query options become richer and more complicated. To ensure load balancing, we may have to choose the node with the lowest load to perform the query, which requires knowing the load of each node and a formula to rank the nodes based on their status. We can also issue the same query to multiple replicas and return the fastest response to the user, as MapReduce does. But this may waste resources unless we support quick query cancellation. In short, we should decide which replica(s) serve a query and what information we need to collect to make the decision.
[jira] [Created] (IOTDB-372) [Distributed] Support node deletion.
Tian Jiang created IOTDB-372: Summary: [Distributed] Support node deletion. Key: IOTDB-372 URL: https://issues.apache.org/jira/browse/IOTDB-372 Project: Apache IoTDB Issue Type: New Feature Reporter: Tian Jiang Currently, only node addition is supported. To take a step toward scaling, even auto-scaling, node deletion should also be supported. Node deletion is not a simple reversal of node addition; it should be carefully designed, discussed, and verified.
[jira] [Closed] (IOTDB-361) Refactor session management by inducing sessionId
[ https://issues.apache.org/jira/browse/IOTDB-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-361. Fix Version/s: 0.10.0-SNAPSHOT Assignee: Tian Jiang Resolution: Fixed > Refactor session management by inducing sessionId > - > > Key: IOTDB-361 > URL: https://issues.apache.org/jira/browse/IOTDB-361 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Tian Jiang >Assignee: Tian Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.10.0-SNAPSHOT > > Time Spent: 20m > Remaining Estimate: 0h > > We are using ThreadLocals in TSServiceImpl to distinguish different clients, > which rely on the underlined server pool to provide a thread for each client > and blocks us from using more efficient pooling techs. > To resolve this, each client should be given a sessionId (or you may call it > clientId) as an identifier to replace the usages of ThreadLocal. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-361) Refactor session management by inducing sessionId
Tian Jiang created IOTDB-361: Summary: Refactor session management by inducing sessionId Key: IOTDB-361 URL: https://issues.apache.org/jira/browse/IOTDB-361 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang We are using ThreadLocals in TSServiceImpl to distinguish different clients, which relies on the underlying server pool to provide a thread for each client and blocks us from using more efficient pooling techniques. To resolve this, each client should be given a sessionId (or you may call it clientId) as an identifier to replace the usages of ThreadLocal.
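One way to picture the sessionId approach from IOTDB-361 is a registry keyed by an explicit id instead of by the serving thread. The sketch below is an assumption about how that could look; `SessionManager` and the per-session username are illustrative and not taken from TSServiceImpl.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical registry: per-client state is looked up by session id,
// so any pooled thread can serve any client.
final class SessionManager {
  private final AtomicLong nextId = new AtomicLong();
  private final Map<Long, String> usernames = new ConcurrentHashMap<>();

  // Creates a new session and returns the id the client sends with each request.
  long openSession(String username) {
    long sessionId = nextId.incrementAndGet();
    usernames.put(sessionId, username);
    return sessionId;
  }

  // Returns the username of the session, or null if the session is unknown.
  String usernameOf(long sessionId) {
    return usernames.get(sessionId);
  }

  void closeSession(long sessionId) {
    usernames.remove(sessionId);
  }
}
```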
[jira] [Created] (IOTDB-355) [Distributed] Start-up checks
Tian Jiang created IOTDB-355: Summary: [Distributed] Start-up checks Key: IOTDB-355 URL: https://issues.apache.org/jira/browse/IOTDB-355 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang A node should check the following items before it is set up: The size of seed-nodes should be no less than the quorum. When a node joins the cluster or the seed-nodes are trying to form the initial cluster: Configurations like partition interval, hash salt, replication number should be the same for all nodes. -- This message was sent by Atlassian Jira (v8.3.4#803005)
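The two checks listed in IOTDB-355 could look roughly like the sketch below. Everything here is an assumption for illustration: the class names, the field types, and the choice to pass the quorum in as a parameter, since the issue does not define how the quorum is computed.

```java
// Hypothetical sketch of the start-up checks; not the actual IoTDB implementation.
final class StartUpCheck {
  // Hypothetical holder for the configurations that must agree across nodes.
  static final class ClusterConfig {
    final long partitionInterval;
    final int hashSalt;
    final int replicationNum;

    ClusterConfig(long partitionInterval, int hashSalt, int replicationNum) {
      this.partitionInterval = partitionInterval;
      this.hashSalt = hashSalt;
      this.replicationNum = replicationNum;
    }
  }

  // Check 1: the number of seed nodes should be no less than the quorum.
  static boolean enoughSeedNodes(int seedNodeCount, int quorum) {
    return seedNodeCount >= quorum;
  }

  // Check 2: partition interval, hash salt and replication number must match.
  static boolean sameConfig(ClusterConfig local, ClusterConfig remote) {
    return local.partitionInterval == remote.partitionInterval
        && local.hashSalt == remote.hashSalt
        && local.replicationNum == remote.replicationNum;
  }
}
```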
[jira] [Created] (IOTDB-353) [Distributed] Validate files in snapshots
Tian Jiang created IOTDB-353: Summary: [Distributed] Validate files in snapshots Key: IOTDB-353 URL: https://issues.apache.org/jira/browse/IOTDB-353 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang After a node pulls a file from a remote node in the snapshot, it does not check the integrity of this file. The file should be validated using md5 or other verification methods to avoid file corruption due to bad network or anything. -- This message was sent by Atlassian Jira (v8.3.4#803005)
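As a minimal sketch of the validation suggested in IOTDB-353, the receiver could compute the MD5 digest of the pulled file with the JDK's MessageDigest and compare it with a digest sent by the remote node. The class name FileChecksum and the assumption that the sender ships a hex digest alongside the file are illustrative only.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: verifies a pulled file against an expected MD5 digest.
final class FileChecksum {
  // Streams the file through MD5 and returns the digest as lowercase hex.
  static String md5Hex(Path file) {
    try {
      MessageDigest digest = MessageDigest.getInstance("MD5");
      try (InputStream in = Files.newInputStream(file)) {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
          digest.update(buffer, 0, read);
        }
      }
      StringBuilder hex = new StringBuilder();
      for (byte b : digest.digest()) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 is not available", e);
    }
  }

  // True if the file's digest equals the digest reported by the sender.
  static boolean matches(Path file, String expectedHex) {
    return md5Hex(file).equalsIgnoreCase(expectedHex);
  }
}
```

MD5 is adequate for detecting accidental corruption; if tampering were a concern, a stronger digest such as SHA-256 could be swapped in through the same API.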
[jira] [Created] (IOTDB-352) [Distributed] Recognize and skip duplicated files in a snapshot
Tian Jiang created IOTDB-352: Summary: [Distributed] Recognize and skip duplicated files in a snapshot Key: IOTDB-352 URL: https://issues.apache.org/jira/browse/IOTDB-352 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang By the naming of TsFiles in IoTDB, the files with the same data may have different names on different nodes. When such files are sent through snapshots, the receiver is unable to tell whether the file already exists locally or not, so it will blindly load the file as an unsequential one (if it does overlap any existing files), which will waste a lot of system resources. How can we figure out if we already have one file or not? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-351) [Distributed] Serialize the raft logs
Tian Jiang created IOTDB-351: Summary: [Distributed] Serialize the raft logs Key: IOTDB-351 URL: https://issues.apache.org/jira/browse/IOTDB-351 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang The raft logs are only memory-resident now; if all nodes in a group crash, the logs will be lost permanently, so the logs should be persisted to storage according to a certain strategy. Moreover, it is interesting how raft logs could interact with or even replace the existing WALs in IoTDB. They are currently independent to decouple the design of the distributed version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
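One possible serializable form for a raft log entry is a fixed header followed by a length-prefixed payload. The sketch below is an assumption for illustration: the RaftLogEntry class, its three fields, and the byte layout are not taken from the IoTDB code base, and the persistence strategy (when to flush, how to truncate) is left open as in the issue.

```java
import java.nio.ByteBuffer;

// Hypothetical raft log entry with a simple binary layout:
// [index: 8 bytes][term: 8 bytes][payload length: 4 bytes][payload bytes]
final class RaftLogEntry {
  final long index;
  final long term;
  final byte[] payload;

  RaftLogEntry(long index, long term, byte[] payload) {
    this.index = index;
    this.term = term;
    this.payload = payload;
  }

  // Writes the entry into a new buffer that is ready to be read or written to disk.
  ByteBuffer serialize() {
    ByteBuffer buffer =
        ByteBuffer.allocate(Long.BYTES * 2 + Integer.BYTES + payload.length);
    buffer.putLong(index).putLong(term).putInt(payload.length).put(payload);
    buffer.flip();
    return buffer;
  }

  // Reads one entry from the buffer's current position.
  static RaftLogEntry deserialize(ByteBuffer buffer) {
    long index = buffer.getLong();
    long term = buffer.getLong();
    byte[] payload = new byte[buffer.getInt()];
    buffer.get(payload);
    return new RaftLogEntry(index, term, payload);
  }
}
```

The length prefix lets several entries be appended to the same file and read back one after another.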
[jira] [Created] (IOTDB-350) [Distributed] Integrate with time partitioning of data
Tian Jiang created IOTDB-350: Summary: [Distributed] Integrate with time partitioning of data Key: IOTDB-350 URL: https://issues.apache.org/jira/browse/IOTDB-350 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang When time partitioning of data is supported in the standalone IoTDB, the distributed version should integrate with this feature and partition data using the same granularity as IoTDB's. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-349) [Distributed] Incrementally update snapshot
Tian Jiang created IOTDB-349: Summary: [Distributed] Incrementally update snapshot Key: IOTDB-349 URL: https://issues.apache.org/jira/browse/IOTDB-349 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Currently, the snapshot is fully recalculated when takeSnapshot() is called. This does not matter much since snapshot-taking is relatively rare and the snapshot mainly concerns the list of the data files and timeseries schemas, not including the data. Still, using an incremental strategy would help to reduce meta tree traversing and file listing. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-348) [Distributed] Support more non-query operations (log types)
Tian Jiang created IOTDB-348: Summary: [Distributed] Support more non-query operations (log types) Key: IOTDB-348 URL: https://issues.apache.org/jira/browse/IOTDB-348 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Currently supported operations: create storage group, create timeseries, single row insertion. Please link to and reply to this issue if you add any new functionality. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IOTDB-322) Thrift should be upgraded
[ https://issues.apache.org/jira/browse/IOTDB-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-322. > Thrift should be upgraded > - > > Key: IOTDB-322 > URL: https://issues.apache.org/jira/browse/IOTDB-322 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Tian Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > Attachments: image-2019-11-26-12-08-26-149.png > > Time Spent: 20m > Remaining Estimate: 0h > > The current thrift version (0.9.3) has a bug that mistakenly converts the > TApplicationException to TBase, but TApplicationException does not extend > TBase. > !image-2019-11-26-12-08-26-149.png|thumbnail! > Upgrading to 0.10.0 or higher will fix this problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IOTDB-276) Inconsistent ways of judging the nullness of a Field
[ https://issues.apache.org/jira/browse/IOTDB-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-276. > Inconsistent ways of judging the nullness of a Field > > > Key: IOTDB-276 > URL: https://issues.apache.org/jira/browse/IOTDB-276 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Tian Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0 > > Attachments: image-2019-10-29-11-43-40-180.png, > image-2019-10-29-11-45-53-763.png > > Time Spent: 20m > Remaining Estimate: 0h > > Several places use `dataType == null` to judge whether a field is null, > although there is a field `isNull` that better suits this job. > The inconsistent usage may lead to a case where one sets `isNull` to true but > finds that the displayed result is not null. > !image-2019-10-29-11-43-40-180.png|thumbnail! > !image-2019-10-29-11-45-53-763.png|thumbnail! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-322) Thrift should be upgraded
Tian Jiang created IOTDB-322: Summary: Thrift should be upgraded Key: IOTDB-322 URL: https://issues.apache.org/jira/browse/IOTDB-322 Project: Apache IoTDB Issue Type: Bug Reporter: Tian Jiang Attachments: image-2019-11-26-12-08-26-149.png The current thrift version (0.9.3) has a bug that mistakenly converts the TApplicationException to TBase, but TApplicationException does not extend TBase. !image-2019-11-26-12-08-26-149.png|thumbnail! Upgrading to 0.10.0 or higher will fix this problem. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-314) Partition the data in a storage group by time.
Tian Jiang created IOTDB-314: Summary: Partition the data in a storage group by time. Key: IOTDB-314 URL: https://issues.apache.org/jira/browse/IOTDB-314 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang In many analytic applications, reports are generated daily, weekly or monthly. If the data files are naturally partitioned by such intervals, such applications will be able to find the target data more easily. Other functionalities like daily replication or transfers also benefit from this. As a result, we should support embedded storage-group-level time partitioning in IoTDB, so that no TsFile generated by IoTDB contains data spanning more than a configurable interval (e.g. a day). By the way, this is also the fundamental support needed by the distributed version. -- This message was sent by Atlassian Jira (v8.3.4#803005)
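A simple way to realize such partitioning is to map each timestamp to a partition id by integer division with the configured interval, and to start a new TsFile whenever the id changes. The sketch below only shows that mapping; the class name, the millisecond unit, and the use of floor division for negative timestamps are assumptions, not details given in the issue.

```java
// Hypothetical sketch: maps timestamps to time partitions of a fixed interval.
final class TimePartition {
  // Returns the id of the partition that contains the timestamp.
  // Math.floorDiv keeps partitions contiguous for negative timestamps as well.
  static long partitionOf(long timestampMs, long intervalMs) {
    if (intervalMs <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    return Math.floorDiv(timestampMs, intervalMs);
  }

  // True if both points may be written to the same TsFile.
  static boolean samePartition(long t1, long t2, long intervalMs) {
    return partitionOf(t1, intervalMs) == partitionOf(t2, intervalMs);
  }
}
```

With a one-day interval (86,400,000 ms), the timestamps 0 and 86,399,999 fall into partition 0, while 86,400,000 starts partition 1.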
[jira] [Closed] (IOTDB-208) Add bloom filters to TsFile
[ https://issues.apache.org/jira/browse/IOTDB-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-208. > Add bloom filters to TsFile > --- > > Key: IOTDB-208 > URL: https://issues.apache.org/jira/browse/IOTDB-208 > Project: Apache IoTDB > Issue Type: New Feature >Reporter: Tian Jiang >Priority: Minor > Labels: pull-request-available > Fix For: 0.9.0-SNAPSHOT > > Time Spent: 20m > Remaining Estimate: 0h > > The recent readings remind me that the bloom filter is standard equipment in > K-VDBs. Although IoTDB is not one of them (at least not typically), the bloom > filter still helps a lot in various situations. For example, our recent > experiments gave us an illusion that the time series in a storage group > remains unchanged. However, that is not the case. > Naturally, in real situations, the number of time series grows over time, due > to reasons like adding new gears. The old files do not contain such a time > series. Without the help of bloom filters, we have to check each old file > only to find that there is no such time series. To my knowledge, this may > take a lot of time. > So, I suggest we add a bloom filter (or some more efficient one) to each > TsFile to help skip unwanted files. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IOTDB-262) CachedPriortityMergeReader fails to deduplicate some elements
[ https://issues.apache.org/jira/browse/IOTDB-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-262. > CachedPriortityMergeReader fails to deduplicate some elements > - > > Key: IOTDB-262 > URL: https://issues.apache.org/jira/browse/IOTDB-262 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Tian Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.9.0-SNAPSHOT > > Attachments: duplicated (1).png > > Time Spent: 20m > Remaining Estimate: 0h > > CachedPriortityMergeReader fails to deduplicate the element at the end of the > cache. The picture in the attachment explains this. > I plan to record the last timestamp to help to deduplicate such elements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-276) Inconsistent ways of judging the nullness of a Field
Tian Jiang created IOTDB-276: Summary: Inconsistent ways of judging the nullness of a Field Key: IOTDB-276 URL: https://issues.apache.org/jira/browse/IOTDB-276 Project: Apache IoTDB Issue Type: Bug Reporter: Tian Jiang Attachments: image-2019-10-29-11-43-40-180.png, image-2019-10-29-11-45-53-763.png Several places use `dataType == null` to judge whether a field is null, although there is a field `isNull` that better suits this job. The inconsistent usage may lead to a case where one sets `isNull` to true but finds that the displayed result is not null. !image-2019-10-29-11-43-40-180.png|thumbnail! !image-2019-10-29-11-45-53-763.png|thumbnail! -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-262) CachedPriortityMergeReader fails to deduplicate some elements
Tian Jiang created IOTDB-262: Summary: CachedPriortityMergeReader fails to deduplicate some elements Key: IOTDB-262 URL: https://issues.apache.org/jira/browse/IOTDB-262 Project: Apache IoTDB Issue Type: Bug Reporter: Tian Jiang Attachments: duplicated (1).png CachedPriortityMergeReader fails to deduplicate the element at the end of the cache. The picture in the attachment explains this. I plan to record the last timestamp to help to deduplicate such elements. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IOTDB-13) Support batched ingestion
[ https://issues.apache.org/jira/browse/IOTDB-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-13. --- > Support batched ingestion > - > > Key: IOTDB-13 > URL: https://issues.apache.org/jira/browse/IOTDB-13 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Tian Jiang >Assignee: Yanzhe An >Priority: Minor > Labels: pull-request-available > Fix For: 0.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Our current insertion interface is based on building one TsRecord for a > timestamp. This limits our capability when ingesting a large amount of > pre-generated data. > We need a specifically designed batch load interface to improve our performance > when loading, say, historical data. For example, the size checks of multiple > PageWriters can be reduced to one in a batched fashion. Moreover, when the > schema of the data is static, we can use primitive arrays instead of Lists, > which may improve the performance greatly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Closed] (IOTDB-143) Support merge
[ https://issues.apache.org/jira/browse/IOTDB-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-143. > Support merge > - > > Key: IOTDB-143 > URL: https://issues.apache.org/jira/browse/IOTDB-143 > Project: Apache IoTDB > Issue Type: New Feature >Reporter: Tian Jiang >Priority: Major > Labels: pull-request-available > Fix For: 0.8.0 > > Time Spent: 20m > Remaining Estimate: 0h > > Merge (or compaction) is an important feature of LSM or LSM-like systems and > IoTDB depends on it to put the data in the sequential files and unsequential > files together and make them ordered and non-duplicated again. > Merged data files provide better locality and potentially higher compression > rate (for some of the missing values are supplemented). While merging > interacts with many aspects of IoTDB like ingestion and query, finding an > effective implementation may be rather difficult. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (IOTDB-208) Add bloom filters to TsFile
Tian Jiang created IOTDB-208: Summary: Add bloom filters to TsFile Key: IOTDB-208 URL: https://issues.apache.org/jira/browse/IOTDB-208 Project: Apache IoTDB Issue Type: New Feature Reporter: Tian Jiang The recent readings remind me that the bloom filter is standard equipment in K-VDBs. Although IoTDB is not one of them (at least not typically), the bloom filter still helps a lot in various situations. For example, our recent experiments gave us an illusion that the time series in a storage group remains unchanged. However, that is not the case. Naturally, in real situations, the number of time series grows over time, due to reasons like adding new gears. The old files do not contain such a time series. Without the help of bloom filters, we have to check each old file only to find that there is no such time series. To my knowledge, this may take a lot of time. So, I suggest we add a bloom filter (or some more efficient one) to each TsFile to help skip unwanted files. -- This message was sent by Atlassian Jira (v8.3.2#803003)
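To make the suggestion in IOTDB-208 concrete, here is a minimal Bloom filter sketch over series paths. It is not the filter that ended up in TsFile: the class name, the double-hashing scheme (String.hashCode combined with FNV-1a), and the sizing parameters are all assumptions chosen for brevity.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical per-file Bloom filter: "false" means the series is definitely absent,
// "true" means the file may contain it and must be checked.
final class SeriesBloomFilter {
  private final BitSet bits;
  private final int size;
  private final int hashCount;

  SeriesBloomFilter(int size, int hashCount) {
    if (size <= 0 || hashCount <= 0) {
      throw new IllegalArgumentException("size and hashCount must be positive");
    }
    this.bits = new BitSet(size);
    this.size = size;
    this.hashCount = hashCount;
  }

  void add(String seriesPath) {
    for (int i = 0; i < hashCount; i++) {
      bits.set(position(seriesPath, i));
    }
  }

  boolean mightContain(String seriesPath) {
    for (int i = 0; i < hashCount; i++) {
      if (!bits.get(position(seriesPath, i))) {
        return false;
      }
    }
    return true;
  }

  // Double hashing: the i-th position is derived from two base hashes.
  private int position(String seriesPath, int i) {
    int h1 = seriesPath.hashCode();
    int h2 = fnv1a(seriesPath);
    return Math.floorMod(h1 + i * h2, size);
  }

  // 32-bit FNV-1a over the UTF-8 bytes of the path.
  private static int fnv1a(String s) {
    int hash = 0x811c9dc5;
    for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
      hash ^= (b & 0xff);
      hash *= 0x01000193;
    }
    return hash;
  }
}
```

A query for a newly added series can then skip every old file whose filter returns false, at the cost of occasionally opening a file on a false positive.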
[jira] [Created] (IOTDB-163) Support create device template and create device.
Tian Jiang created IOTDB-163: Summary: Support create device template and create device. Key: IOTDB-163 URL: https://issues.apache.org/jira/browse/IOTDB-163 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang In the present version, it is a little troublesome to create a set of timeseries that have the same measurements. On the other hand, although we use the concept "device" in the code, it is not properly abstracted. Expected usage: Using IoTDB in a more relational way: `CREATE DEVICE TEMPLATE vehicle (speed DOUBLE PLAIN, direction DOUBLE PLAIN, temperature DOUBLE PLAIN, fuel DOUBLE PLAIN)` If all datatypes (or encodings) are the same, you can write the equivalent form: `CREATE DEVICE TEMPLATE vehicle MEASUREMENTS (speed, direction, temperature, fuel) DATATYPE DOUBLE ENCODING PLAIN` Then you will be able to create time series in an easier way: `CREATE DEVICE (vehicle) root.sg1.vehicle1` Which equals: `CREATE TIMESERIES root.sg1.vehicle1.speed WITH DATATYPE=DOUBLE,ENCODING=PLAIN` `CREATE TIMESERIES root.sg1.vehicle1.direction WITH DATATYPE=DOUBLE,ENCODING=PLAIN` `CREATE TIMESERIES root.sg1.vehicle1.fuel WITH DATATYPE=DOUBLE,ENCODING=PLAIN` `CREATE TIMESERIES root.sg1.vehicle1.temperature WITH DATATYPE=DOUBLE,ENCODING=PLAIN` I hope this will narrow the gap between using IoTDB and traditional relational databases. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IOTDB-162) Fix the semantics of hasNext() and next().
Tian Jiang created IOTDB-162: Summary: Fix the semantics of hasNext() and next(). Key: IOTDB-162 URL: https://issues.apache.org/jira/browse/IOTDB-162 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Attachments: image-2019-08-14-09-50-05-929.png Some definitions of hasNext() and next() are misleading. For example, the following actually means hasCurrent rather than hasNext: when curIdx = timeLength - 1, it will return true although there is no next value. Such definitions conflict with the hasNext() and next() defined and widely used in Java Iterator, and cause confusion for those who are not so familiar with the code. !image-2019-08-14-09-50-05-929.png! -- This message was sent by Atlassian JIRA (v7.6.14#76016)
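For reference, the java.util.Iterator contract expects hasNext() to report whether a following call to next() will succeed. A minimal sketch that follows this contract over an array of timestamps is given below; the class name and the nextIdx field are hypothetical and do not reproduce the code shown in the attached image.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical iterator over timestamps that follows the java.util.Iterator contract.
final class TimeArrayIterator implements Iterator<Long> {
  private final long[] times;
  // Index of the element that the next call to next() will return.
  private int nextIdx = 0;

  TimeArrayIterator(long[] times) {
    this.times = times;
  }

  // True only if next() can still return an element.
  @Override
  public boolean hasNext() {
    return nextIdx < times.length;
  }

  // Returns the element at nextIdx and advances past it.
  @Override
  public Long next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return times[nextIdx++];
  }
}
```

Keeping the cursor on the element to be returned, rather than on the element last returned, avoids the off-by-one at the end of the array that the issue describes.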
[jira] [Closed] (IOTDB-95) Keep stack traces when handling an Exception.
[ https://issues.apache.org/jira/browse/IOTDB-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-95. --- Resolution: Fixed > Keep stack traces when handling an Exception. > - > > Key: IOTDB-95 > URL: https://issues.apache.org/jira/browse/IOTDB-95 > Project: Apache IoTDB > Issue Type: Improvement >Reporter: Tian Jiang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > Currently, in some handlings of exceptions, the stack traces of the > exceptions are omitted, which significantly increases the difficulty of > locating problems precisely. > To provide more useful information for debugging, the stack traces should be > kept until they are logged at the top level. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IOTDB-143) Support merge
Tian Jiang created IOTDB-143: Summary: Support merge Key: IOTDB-143 URL: https://issues.apache.org/jira/browse/IOTDB-143 Project: Apache IoTDB Issue Type: New Feature Reporter: Tian Jiang Merge (or compaction) is an important feature of LSM or LSM-like systems and IoTDB depends on it to put the data in the sequential files and unsequential files together and make them ordered and non-duplicated again. Merged data files provide better locality and potentially higher compression rate (for some of the missing values are supplemented). While merging interacts with many aspects of IoTDB like ingestion and query, finding an effective implementation may be rather difficult. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (IOTDB-122) Support prepared insertion
[ https://issues.apache.org/jira/browse/IOTDB-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-122. > Support prepared insertion > -- > > Key: IOTDB-122 > URL: https://issues.apache.org/jira/browse/IOTDB-122 > Project: Apache IoTDB > Issue Type: New Feature >Reporter: Tian Jiang >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > As some have mentioned, the SQL parser (antlr) may consume about 40% of the time in > ingestion, especially when small SQL statements are sent frequently. Luckily, > IoTDB insertion statements are currently all alike and simple; there are 4 > meaningful parts in such statements: deviceId, measurements, values and time. For > such a simple structure, using tools like antlr may be just too heavy. > Intuitively, PreparedStatement in the standard JDBC interface can be used > for relieving parsing overhead when statements are similar. I will describe how > PreparedStatement would work as follows (this is still left to be implemented): > 1. The user wants to create a prepared insert statement and calls > `connection.prepareStatement("Insert")`; > 2. The connection matches the parameter string with some templates, finds out > it is an insertion and returns an IoTDBPreparedInsertStatement pStmt. > 3. The user calls `pStmt.setDevice("root.device1"); pStmt.setTime(100); > pStmt.setMeasurements(measurementArray); pStmt.setValues(valueArray);` to set > parameters for the next insertion. > 4. The user calls `pStmt.execute()` to execute an insertion. > 5. The PreparedInsertStatement creates a TSInsertionReq, puts deviceId, > measurements, values and time into this request and sends this request to the > server. > 6. The server receives the request, extracts parameters from the request, > executes an insertion directly through the database engine and returns a > TSInsertionResp to the user. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Closed] (IOTDB-107) WAL log node is missing after recovery.
[ https://issues.apache.org/jira/browse/IOTDB-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tian Jiang closed IOTDB-107. > WAL log node is missing after recovery. > --- > > Key: IOTDB-107 > URL: https://issues.apache.org/jira/browse/IOTDB-107 > Project: Apache IoTDB > Issue Type: Bug >Reporter: Tian Jiang >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > During recovery, the WAL is disabled because the recovery process aims to > eventually remove all WALs. As a result, BufferWriteProcessors and > OverflowProcessors are created without a WAL log node. When recovery is over, > even if the WAL is enabled, it cannot function correctly because the WAL log > nodes are missing. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (IOTDB-122) Support prepared insertion
Tian Jiang created IOTDB-122: Summary: Support prepared insertion Key: IOTDB-122 URL: https://issues.apache.org/jira/browse/IOTDB-122 Project: Apache IoTDB Issue Type: New Feature Reporter: Tian Jiang As some have mentioned, the SQL parser (antlr) may consume about 40% of the time in ingestion, especially when small SQL statements are sent frequently. Luckily, IoTDB insertion statements are currently all alike and simple; there are 4 meaningful parts in such statements: deviceId, measurements, values and time. For such a simple structure, using tools like antlr may be just too heavy. Intuitively, PreparedStatement in the standard JDBC interface can be used for relieving parsing overhead when statements are similar. I will describe how PreparedStatement would work as follows (this is still left to be implemented): 1. The user wants to create a prepared insert statement and calls `connection.prepareStatement("Insert")`; 2. The connection matches the parameter string with some templates, finds out it is an insertion and returns an IoTDBPreparedInsertStatement pStmt. 3. The user calls `pStmt.setDevice("root.device1"); pStmt.setTime(100); pStmt.setMeasurements(measurementArray); pStmt.setValues(valueArray);` to set parameters for the next insertion. 4. The user calls `pStmt.execute()` to execute an insertion. 5. The PreparedInsertStatement creates a TSInsertionReq, puts deviceId, measurements, values and time into this request and sends this request to the server. 6. The server receives the request, extracts parameters from the request, executes an insertion directly through the database engine and returns a TSInsertionResp to the user. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
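Steps 3 to 5 above can be sketched on the client side as a statement object that collects the four parts and packs them into a request without any SQL parsing. The classes below are stand-ins: PreparedInsertSketch and InsertionRequest are hypothetical names used here in place of the proposed IoTDBPreparedInsertStatement and TSInsertionReq, and sending the request to the server is left out.

```java
// Hypothetical client-side sketch of a prepared insertion; no SQL text is parsed.
final class PreparedInsertSketch {
  // Stand-in for the proposed TSInsertionReq.
  static final class InsertionRequest {
    final String deviceId;
    final long time;
    final String[] measurements;
    final String[] values;

    InsertionRequest(String deviceId, long time, String[] measurements, String[] values) {
      this.deviceId = deviceId;
      this.time = time;
      this.measurements = measurements;
      this.values = values;
    }
  }

  private String deviceId;
  private long time;
  private String[] measurements;
  private String[] values;

  void setDevice(String deviceId) { this.deviceId = deviceId; }

  void setTime(long time) { this.time = time; }

  void setMeasurements(String[] measurements) { this.measurements = measurements; }

  void setValues(String[] values) { this.values = values; }

  // Packs the current parameters into a request; a real execute() would send it.
  InsertionRequest buildRequest() {
    if (deviceId == null || measurements == null || values == null) {
      throw new IllegalStateException("device, measurements and values must be set");
    }
    if (measurements.length != values.length) {
      throw new IllegalStateException("measurements and values differ in length");
    }
    return new InsertionRequest(deviceId, time, measurements.clone(), values.clone());
  }
}
```

The same statement object can be reused for the next row by calling the setters again, which is where the saving over re-parsing each SQL string would come from.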
[jira] [Created] (IOTDB-107) WAL log node is missing after recovery.
Tian Jiang created IOTDB-107: Summary: WAL log node is missing after recovery. Key: IOTDB-107 URL: https://issues.apache.org/jira/browse/IOTDB-107 Project: Apache IoTDB Issue Type: Bug Reporter: Tian Jiang During recovery, the WAL is disabled because the recovery process aims to eventually remove all WALs. As a result, BufferWriteProcessors and OverflowProcessors are created without a WAL log node. When recovery is over, even if the WAL is enabled, it cannot function correctly because the WAL log nodes are missing. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (IOTDB-95) Keep stack traces when handling an Exception.
Tian Jiang created IOTDB-95: --- Summary: Keep stack traces when handling an Exception. Key: IOTDB-95 URL: https://issues.apache.org/jira/browse/IOTDB-95 Project: Apache IoTDB Issue Type: Improvement Reporter: Tian Jiang Currently, in some handlings of exceptions, the stack traces of the exceptions are omitted, which significantly increases the difficulty of locating problems precisely. To provide more useful information for debugging, the stack traces should be kept until they are logged at the top level. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
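In Java, the usual way to keep a stack trace while translating an exception is to pass the original exception as the cause of the new one, so the top-level logger can print the whole chain. The sketch below illustrates this with made-up names (StackTraceKeeping, StorageException, lowLevel, midLevel); it does not refer to any particular IoTDB class.

```java
import java.io.IOException;

// Hypothetical sketch: wrap low-level exceptions without dropping their stack traces.
final class StackTraceKeeping {
  // Hypothetical higher-level exception that always carries its cause.
  static final class StorageException extends Exception {
    StorageException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Simulates a failing low-level operation.
  static void lowLevel() throws IOException {
    throw new IOException("disk failure");
  }

  // Translates the exception but keeps the original as the cause,
  // instead of `throw new StorageException(e.getMessage(), null)`.
  static void midLevel() throws StorageException {
    try {
      lowLevel();
    } catch (IOException e) {
      throw new StorageException("flush failed", e);
    }
  }
}
```

At the top level, logging the caught StorageException together with the exception object (rather than only its message) prints the "Caused by" section that points at the original failure.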