[jira] [Created] (IOTDB-562) Apache IoTDB raft log persistence in the distributed version

2020-03-18 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-562:


 Summary: Apache IoTDB raft log persistence in the distributed 
version
 Key: IOTDB-562
 URL: https://issues.apache.org/jira/browse/IOTDB-562
 Project: Apache IoTDB
  Issue Type: Improvement
  Components: Core/Cluster
Reporter: Tian Jiang
Assignee: Kaifeng Xue


IoTDB is a highly efficient time series database that supports high-speed 
query processing, including aggregation queries.

Currently, IoTDB supports a shared-nothing cluster that uses the Raft 
mechanism and Raft logs to communicate among all nodes, so Raft logs are very 
important for communication, consistency, and fail-over.

However, the current logs are stored only in memory, which means they will be 
lost when a node goes down and then recovers. In addition, the Raft logs may 
overlap with the current WAL, which means we may be doing some unnecessary log 
writing.

So two improvements to the Raft logs need to be made:

1. Store the Raft logs on a durable medium such as a disk. You need to design 
a serializable form of the logs and then write them to disk.

2. Find a way of using the Raft logs in the IoTDB recovery process. That means 
we write only Raft logs rather than both Raft logs and WAL. This will avoid 
some unnecessary log writing and improve insertion performance.

This proposal is mainly about improving Raft logs in clustered IoTDB. Besides, 
if we can make the summary info more useful, that would be even better.

Note that the premise is that the Raft log writing process should not be 
slowed down too much. That means the serializable form should be efficient 
enough.
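
A minimal sketch of what a compact serializable form could look like, assuming 
a log entry carries only an index, a term, and an opaque payload. The class 
name SimpleRaftLog and the byte layout are illustrative assumptions, not the 
actual IoTDB log format.

```java
import java.nio.ByteBuffer;

// Hypothetical log entry; the real IoTDB log classes may differ.
final class SimpleRaftLog {
  final long index;
  final long term;
  final byte[] payload;

  SimpleRaftLog(long index, long term, byte[] payload) {
    this.index = index;
    this.term = term;
    this.payload = payload;
  }

  // Assumed layout: index (8 bytes) | term (8 bytes) | payload length (4 bytes) | payload.
  ByteBuffer serialize() {
    ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES * 2 + Integer.BYTES + payload.length);
    buffer.putLong(index).putLong(term).putInt(payload.length).put(payload);
    buffer.flip(); // make the buffer readable from the start
    return buffer;
  }

  // Reads one entry from the current position of the buffer.
  static SimpleRaftLog deserialize(ByteBuffer buffer) {
    long index = buffer.getLong();
    long term = buffer.getLong();
    byte[] payload = new byte[buffer.getInt()];
    buffer.get(payload);
    return new SimpleRaftLog(index, term, payload);
  }
}
```

The returned buffer could then be appended to a file through a FileChannel, 
with force() called at whatever interval the durability requirement allows.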

You should know:
• IoTDB cluster structure
• IoTDB WAL
• IoTDB insertion process
• Raft
• Java 

difficulty: Major
mentors: 
jt2594...@163.com



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (IOTDB-557) Use generalized "and" and "or"

2020-03-11 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-557:


 Summary: Use generalized "and" and "or"
 Key: IOTDB-557
 URL: https://issues.apache.org/jira/browse/IOTDB-557
 Project: Apache IoTDB
  Issue Type: Improvement
  Components: Planner/SQLOptimizer
Reporter: Tian Jiang


We only use binary "and" and "or" in expression construction. As a result, a 
filter like "root.group1.*.s1 > 1" (* ranges from d1 to d100) will result in a 
filter tree of over 100 nodes, which could be a great waste when there are even 
more series. 

By using generalized "and" and "or", we can replace such a filter tree with 
just one filter node, which I think could relieve the Java GC a lot.
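
As a rough illustration, an n-ary node can hold all child conditions in one 
list instead of a chain of binary nodes. GeneralAnd and GeneralOr are 
hypothetical names built on java.util.function.Predicate; they are not the 
IoTDB filter classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical n-ary "and": one node holds any number of child filters.
final class GeneralAnd<T> implements Predicate<T> {
  private final List<Predicate<T>> children;

  GeneralAnd(List<Predicate<T>> children) {
    this.children = new ArrayList<>(children);
  }

  @Override
  public boolean test(T row) {
    for (Predicate<T> child : children) {
      if (!child.test(row)) {
        return false; // short-circuit on the first failing child
      }
    }
    return true;
  }
}

// Hypothetical n-ary "or", symmetric to GeneralAnd.
final class GeneralOr<T> implements Predicate<T> {
  private final List<Predicate<T>> children;

  GeneralOr(List<Predicate<T>> children) {
    this.children = new ArrayList<>(children);
  }

  @Override
  public boolean test(T row) {
    for (Predicate<T> child : children) {
      if (child.test(row)) {
        return true; // short-circuit on the first passing child
      }
    }
    return false;
  }
}
```

With 100 series, the 99 intermediate binary nodes collapse into a single 
GeneralAnd; the 100 leaf comparisons remain.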





[jira] [Created] (IOTDB-552) Restrictions of predicates in ALIGN_BY_DEVICE statements are not well stated

2020-03-10 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-552:


 Summary: Restrictions of predicates in ALIGN_BY_DEVICE statements 
are not well stated
 Key: IOTDB-552
 URL: https://issues.apache.org/jira/browse/IOTDB-552
 Project: Apache IoTDB
  Issue Type: Bug
  Components: Document
Affects Versions: master branch
Reporter: Tian Jiang
 Fix For: master branch


Although value predicates are newly allowed in ALIGN_BY_DEVICE statements, 
their restrictions are not well stated in the documents, for example that 
wildcards cannot be used in them. 

So the documents should be updated with those restrictions.





[jira] [Created] (IOTDB-551) Prefix in predicates

2020-03-10 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-551:


 Summary: Prefix in predicates
 Key: IOTDB-551
 URL: https://issues.apache.org/jira/browse/IOTDB-551
 Project: Apache IoTDB
  Issue Type: Bug
  Components: Planner/SQLParser
Affects Versions: master branch
Reporter: Tian Jiang
 Attachments: image-2020-03-10-16-45-36-508.png, 
image-2020-03-10-16-51-36-084.png

When I looked into the SQL definitions (the antlr file), I found that prefixes 
are allowed in predicates. 
 !image-2020-03-10-16-45-36-508.png|thumbnail! 
It is weird because I think it would be difficult to define "WHERE 
root.group1.device1 > 100". And when I tried to query with such a predicate, I 
got a "no such timeseries".
 !image-2020-03-10-16-51-36-084.png|thumbnail! 
So this grammar had better be corrected.





[jira] [Closed] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager

2020-03-02 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-439.

Resolution: Fixed

> [Distributed] Incorrect Snapshot implementation and LogManager
> --
>
> Key: IOTDB-439
> URL: https://issues.apache.org/jira/browse/IOTDB-439
> Project: Apache IoTDB
>  Issue Type: Sub-task
>Reporter: Xiangdong Huang
>Priority: Major
>
> I read the log/snapshot and manage packages in the current cluster_new branch, 
> and have some questions:
> 1. PartitionedSnapshotLogManager and FilePartitionedSnapshotLogManager are 
> incorrect as
>    a. they still store logs in memory while the JavaDoc says they do not 
> store data in memory.
>    b. When taking a snapshot, do they need to consider the part of the log in 
> memory?
>  
> 2. The current LogManager is not thread-safe. The caller (i.e., RaftMember) 
> uses the synchronized keyword to guarantee that for each call. 
>   a. Is there a better design?
>   b. Is there any performance problem, as all operations are serialized?
>  
> 3. Considering the Raft protocol, don't we need APIs like 
> `removeLogFrom(startIndex)` in LogManager? See the case of Figure 7 in the 
> Raft paper [1] 
>  
> [1] [https://raft.github.io/raft.pdf]
>  
> [~jt2594838] may know clearly about current implementation.
>  
>  
>  





[jira] [Created] (IOTDB-534) [Distributed] Query coordinating

2020-03-02 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-534:


 Summary: [Distributed] Query coordinating
 Key: IOTDB-534
 URL: https://issues.apache.org/jira/browse/IOTDB-534
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


When using multiple replicas, it is vital for query performance that the 
queries are properly coordinated, i.e., for each query, find the best replicas 
to execute it so that the overall workload is balanced and the caches (if 
any) are utilized as fully as possible.

To establish an effective query coordination mechanism, one must decide which 
statuses of a node are relevant to query performance, such as its CPU usage, 
disk usage, memory usage, network usage and so on, and then build a model based 
on the collected information to determine which node is the best for a query.
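
One very simple model, shown only as a sketch: treat each usage as a fraction 
in [0, 1], average them, and pick the replica with the lowest score. The class 
names and the unweighted mean are placeholders; a real model would need to be 
chosen from measurements.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

final class ReplicaRanker {
  // Hypothetical status snapshot; each usage is a fraction in [0, 1].
  static final class NodeStatus {
    final String node;
    final double cpu;
    final double disk;
    final double memory;
    final double network;

    NodeStatus(String node, double cpu, double disk, double memory, double network) {
      this.node = node;
      this.cpu = cpu;
      this.disk = disk;
      this.memory = memory;
      this.network = network;
    }
  }

  // Placeholder model: an unweighted mean of the four usages.
  static double load(NodeStatus s) {
    return (s.cpu + s.disk + s.memory + s.network) / 4.0;
  }

  // Returns the replica with the lowest load, or empty if there are none.
  static Optional<NodeStatus> leastLoaded(List<NodeStatus> replicas) {
    return replicas.stream().min(Comparator.comparingDouble(ReplicaRanker::load));
  }
}
```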





[jira] [Created] (IOTDB-532) [Distributed] Enabling parallel processing within a data group

2020-03-01 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-532:


 Summary: [Distributed] Enabling parallel processing within a data 
group
 Key: IOTDB-532
 URL: https://issues.apache.org/jira/browse/IOTDB-532
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


In the present implementation, the logs in a data group are executed serially, 
which means that a data group can serve only one client at a time. To increase 
concurrency, the data group should be able to process multiple client requests 
simultaneously.
In order to do this, the following changes should be made:

1. The log manager should be locked only when getting a new index.
2. When a log fails, the logs after it should also be removed.
3. Internal retries should be added to overcome temporary network failures or 
the thread being switched out, either of which can cause logs with larger 
indices to arrive ahead.
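
The first two changes could be sketched as follows. ParallelLogBuffer is a 
hypothetical class, logs are plain strings for brevity, and resetting the index 
after a removal as well as the retry logic are deliberately left out.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical log buffer: index allocation is the only point of contention.
final class ParallelLogBuffer {
  private final AtomicLong lastIndex = new AtomicLong(-1);
  private final ConcurrentSkipListMap<Long, String> uncommitted = new ConcurrentSkipListMap<>();

  // Reserve the next index atomically and register the log under it.
  long append(String log) {
    long index = lastIndex.incrementAndGet();
    uncommitted.put(index, log);
    return index;
  }

  // When the log at failedIndex fails, drop it and every log after it.
  void removeFrom(long failedIndex) {
    uncommitted.tailMap(failedIndex, true).clear();
  }

  int size() {
    return uncommitted.size();
  }
}
```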





[jira] [Created] (IOTDB-526) [Distributed]Support metadata queries

2020-02-27 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-526:


 Summary: [Distributed]Support metadata queries
 Key: IOTDB-526
 URL: https://issues.apache.org/jira/browse/IOTDB-526
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang
Assignee: Tian Jiang
 Fix For: master branch


Metadata queries, like "getNodeList", "getPathNextChildern", 
"getTimeseriesSchema" are currently unsupported. The point is that the paths 
being queried may contain wildcards (*) or they may be prefix paths, which makes 
it a little hard to figure out which data groups to send the query to. 

The simplest way may be performing a broadcast and merging the results, which 
is clearly less efficient. I am hoping you can come up with a better idea to 
resolve this.





[jira] [Created] (IOTDB-522) Aggregation result should be serializable

2020-02-25 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-522:


 Summary: Aggregation result should be serializable
 Key: IOTDB-522
 URL: https://issues.apache.org/jira/browse/IOTDB-522
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang
 Fix For: master branch


As the coordinator node in the distributed version should gather the 
aggregation results from other nodes and merge them, the AggregationResults 
must be serializable for the nodes to transfer them.





[jira] [Created] (IOTDB-520) Result of IBatchReader should not cross partition

2020-02-24 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-520:


 Summary: Result of IBatchReader should not cross partition
 Key: IOTDB-520
 URL: https://issues.apache.org/jira/browse/IOTDB-520
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang
 Fix For: master branch


Situation:
Assuming daily partitioning.
Node A manages the data of Day1,3,5 and Node B manages the data of Day2,4. In 
the current implementation, when the coordinator node fetches a batch from Node 
A, the batch may contain data of Day1,3 while the batch from Node B contains 
data of Day2. As a result, the coordinator node must merge the two batches to 
obtain an ordered batch.

But if the batches never cross the partition border, the coordinator node will 
be able to return the batches without merging them, using a heap that compares 
only the first element of each batch, which could reduce the merging overhead. 
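
A small sketch of the idea, assuming every batch is a sorted array of 
timestamps that lies within a single partition and that all batches are 
available at once. BatchOrdering is a hypothetical helper, not the IBatchReader 
API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class BatchOrdering {
  // Batches are sorted and never cross a partition border, so ordering whole
  // batches by their first timestamp gives an ordered result without a
  // point-by-point merge.
  static List<Long> concatInOrder(List<long[]> batches) {
    PriorityQueue<long[]> heap =
        new PriorityQueue<>(Comparator.comparingLong((long[] batch) -> batch[0]));
    for (long[] batch : batches) {
      if (batch.length > 0) {
        heap.add(batch);
      }
    }
    List<Long> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      for (long timestamp : heap.poll()) {
        result.add(timestamp);
      }
    }
    return result;
  }
}
```

The heap holds one entry per batch rather than one per data point, which is 
where the saving would come from.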





[jira] [Created] (IOTDB-514) To support aggregation in the distributed version

2020-02-24 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-514:


 Summary: To support aggregation in the distributed version
 Key: IOTDB-514
 URL: https://issues.apache.org/jira/browse/IOTDB-514
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang
 Fix For: master branch


The current aggregation cannot satisfy the needs of the distributed version. To 
be specific, there are two points that should be satisfied:
1. The FirstValueAggrResult and LastValueAggrResult should also contain a 
timestamp.
Without a timestamp, the coordinator node cannot tell which node's result 
is the true first/last.
2. An AggrResult should be able to merge with another. When we get AggrResults 
from all nodes that participate in the query, these results should be merged to 
generate a new result.

To resolve the issue, you should:
1. Add a field `timestamp` in FirstValueAggrResult and LastValueAggrResult
2. Add an abstract method `merge(AggregateResult another)` in AggregateResult 
and implement it properly.
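
The two steps might look roughly like this for the first-value case. 
MergeableResult and FirstValueResult are hypothetical stand-ins for 
AggregateResult and FirstValueAggrResult, and values are assumed to be doubles.

```java
// Hypothetical stand-in for AggregateResult.
abstract class MergeableResult {
  abstract void merge(MergeableResult another);
}

// Hypothetical stand-in for FirstValueAggrResult, carrying a timestamp.
final class FirstValueResult extends MergeableResult {
  private boolean hasValue;
  private long timestamp;
  private double value;

  // Keep the point with the smallest timestamp seen so far.
  void update(long timestamp, double value) {
    if (!hasValue || timestamp < this.timestamp) {
      this.hasValue = true;
      this.timestamp = timestamp;
      this.value = value;
    }
  }

  // Merging is just an update with the other node's earliest point.
  @Override
  void merge(MergeableResult another) {
    FirstValueResult other = (FirstValueResult) another;
    if (other.hasValue) {
      update(other.timestamp, other.value);
    }
  }

  long getTimestamp() {
    return timestamp;
  }

  double getValue() {
    return value;
  }
}
```

A last-value result would be the mirror image, keeping the largest timestamp.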





[jira] [Closed] (IOTDB-505) Add TsFileFilter in series reader to support distributed query

2020-02-23 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-505.

Resolution: Fixed

> Add TsFileFilter in series reader to support distributed query
> --
>
> Key: IOTDB-505
> URL: https://issues.apache.org/jira/browse/IOTDB-505
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Tian Jiang
>Assignee: Tian Jiang
>Priority: Major
>  Labels: distributed, filter, query
> Fix For: master branch
>
>
> In the distributed version of IoTDB, data of different data groups of a node 
> is mixed together in an IoTDB instance. As a result, when querying the data 
> of one group, data of other groups will also be queried, which is not desired.
> To resolve this, we need to add a TsFileFilter in series readers that will 
> filter the TsFiles accordingly, so that the unwanted data will not be queried.





[jira] [Created] (IOTDB-511) Default directories are not platform-independent

2020-02-23 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-511:


 Summary: Default directories are not platform-independent
 Key: IOTDB-511
 URL: https://issues.apache.org/jira/browse/IOTDB-511
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Tian Jiang
Assignee: Tian Jiang
 Fix For: master branch
 Attachments: image-2020-02-24-10-27-21-348.png, 
image-2020-02-24-10-28-48-113.png

 !image-2020-02-24-10-27-21-348.png|thumbnail! 
The default directories in IoTDBConfig use a specific path separator, 
which may cause trouble on some platforms.
 !image-2020-02-24-10-28-48-113.png|thumbnail! 
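
One way to avoid a hard-coded separator, as a sketch: build the defaults with 
java.nio.file.Paths, which inserts the separator of the running platform. The 
helper name DefaultDirs and the directory names in the usage note are made up 
for illustration.

```java
import java.nio.file.Paths;

final class DefaultDirs {
  // Joins path segments with the separator of the current platform.
  static String join(String first, String... more) {
    return Paths.get(first, more).toString();
  }
}
```

For example, DefaultDirs.join("data", "wal") gives "data/wal" on Linux and 
"data\wal" on Windows.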





[jira] [Created] (IOTDB-505) Add TsFileFilter in series reader to support distributed query

2020-02-20 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-505:


 Summary: Add TsFileFilter in series reader to support distributed 
query
 Key: IOTDB-505
 URL: https://issues.apache.org/jira/browse/IOTDB-505
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


In the distributed version of IoTDB, data of different data groups of a node is 
mixed together in an IoTDB instance. As a result, when querying the data of one 
group, data of other groups will also be queried, which is not desired.

To resolve this, we need to add a TsFileFilter in series readers that will 
filter the TsFiles accordingly, so that the unwanted data will not be queried.
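
A minimal sketch of the filtering step, with files represented by their paths. 
FileFilterSketch and FilteredFileList are hypothetical names, and how a file is 
mapped to its data group is left to the filter implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for TsFileFilter; the real interface may differ.
interface FileFilterSketch {
  // Returns true if the file does not belong to the queried data group.
  boolean shouldSkip(String tsFilePath);
}

final class FilteredFileList {
  // Keeps only the files the filter does not reject; a null filter keeps all.
  static List<String> select(List<String> tsFilePaths, FileFilterSketch filter) {
    List<String> kept = new ArrayList<>();
    for (String path : tsFilePaths) {
      if (filter == null || !filter.shouldSkip(path)) {
        kept.add(path);
      }
    }
    return kept;
  }
}
```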





[jira] [Created] (IOTDB-504) Confusing implementations of next()

2020-02-20 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-504:


 Summary: Confusing implementations of next()
 Key: IOTDB-504
 URL: https://issues.apache.org/jira/browse/IOTDB-504
 Project: Apache IoTDB
  Issue Type: Bug
Affects Versions: master branch
Reporter: Tian Jiang
 Attachments: image-2020-02-20-17-54-01-852.png

Some implementations of `next()` in readers are confusing: calling `next()` 
without first calling `hasNext()` will not move the cursor to the next element, 
which goes against common practice. See the attachment for one example.
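
For comparison, the conventional contract can be sketched with 
java.util.Iterator: `hasNext()` has no side effects and `next()` always 
advances. ArrayReader is a hypothetical example, not one of the IoTDB readers.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical reader over a fixed array of values.
final class ArrayReader implements Iterator<Long> {
  private final long[] values;
  private int cursor;

  ArrayReader(long... values) {
    this.values = values;
  }

  // Pure check: calling it any number of times does not move the cursor.
  @Override
  public boolean hasNext() {
    return cursor < values.length;
  }

  // Always advances, whether or not hasNext() was called first.
  @Override
  public Long next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return values[cursor++];
  }
}
```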





[jira] [Created] (IOTDB-460) [Distributed]Remove data of outdated slots

2020-02-05 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-460:


 Summary: [Distributed]Remove data of outdated slots
 Key: IOTDB-460
 URL: https://issues.apache.org/jira/browse/IOTDB-460
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


In node addition/removal, the slots managed by a node will change. However, the 
data of corresponding slots cannot be removed yet because the new holders of 
the slots will pull such data to themselves. As the data pulling is issued by 
the new holders randomly, it is hard for the previous holders to find out when 
the data will no longer be needed. As a result, the previous holder cannot 
delete the local data with confidence.

It will be necessary to find a way for the previous holders to know when the 
data has been replicated to the new holders so that they will be able to remove 
the local data.





[jira] [Created] (IOTDB-451) [Distributed]Recovery of snapshot pulling

2020-02-04 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-451:


 Summary: [Distributed]Recovery of snapshot pulling
 Key: IOTDB-451
 URL: https://issues.apache.org/jira/browse/IOTDB-451
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


After the addition/removal of a node, snapshots of slots are pulled from the 
previous holders to the new holders. If the new holders go down and are 
restarted, it would be better to resume the pulling from a breakpoint instead 
of starting over.





[jira] [Commented] (IOTDB-372) [Distributed] Support node deletion.

2020-02-03 Thread Tian Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029556#comment-17029556
 ] 

Tian Jiang commented on IOTDB-372:
--

https://www.processon.com/diagraming/5e37c840e4b006a43aea52cb
Procedure design.

> [Distributed] Support node deletion.
> 
>
> Key: IOTDB-372
> URL: https://issues.apache.org/jira/browse/IOTDB-372
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: Tian Jiang
>Priority: Major
>  Labels: distributed
>
> Currently, only node addition is supported. To take a step toward scaling, or 
> even auto-scaling, node deletion is needed as well. Node deletion is no simple 
> reversal of node addition; it should be carefully designed, discussed and 
> verified.





[jira] [Commented] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager

2020-02-03 Thread Tian Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029513#comment-17029513
 ] 

Tian Jiang commented on IOTDB-439:
--

3. Currently, before one log is either timed out or committed, the next log is 
blocked as the high concurrency is mainly supported by partitioning. As 
operations in the same partition are basically serialized, this may not be a 
big issue. So the current method "replaceLastLog" is enough.
That said, holding multiple uncommitted logs remains a possible future 
optimization.






[jira] [Commented] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager

2020-02-03 Thread Tian Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029508#comment-17029508
 ] 

Tian Jiang commented on IOTDB-439:
--

2.a. If you have concrete advice, it is welcome. But using "synchronized" can 
minimize the scope being locked, and as far as I can see, there is no problem.

2.b. Performance concerns are currently left aside, and there is no evidence 
of a performance problem. And for correctness, synchronization is necessary. 
Tightening the scope that needs to be synchronized may be a good optimization, 
but it is too early for now.






[jira] [Commented] (IOTDB-439) [Distributed] Incorrect Snapshot implementation and LogManager

2020-02-03 Thread Tian Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17029502#comment-17029502
 ] 

Tian Jiang commented on IOTDB-439:
--

1.a. The JavaDoc says that the committed logs do not have to be stored in 
memory; storing them may bring some benefit for catching up, but it is not 
necessary. Please compare it with PartitionedSnapshotLogManager carefully for a 
better understanding.

1.b. Committed logs are covered by snapshots and uncommitted logs are not 
relevant to snapshots. They are already considered.

So I do not know why you call it "incorrect".
 






[jira] [Created] (IOTDB-422) Close current files before merge.

2020-01-14 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-422:


 Summary: Close current files before merge.
 Key: IOTDB-422
 URL: https://issues.apache.org/jira/browse/IOTDB-422
 Project: Apache IoTDB
  Issue Type: Bug
Affects Versions: 0.10.0-SNAPSHOT
Reporter: Tian Jiang
 Fix For: 0.10.0-SNAPSHOT


If some unseq file overlaps the unsealed seq file and a merge is triggered, the 
overlapped data may not be able to be merged into the right file.

To resolve this, the files should be closed before a merge starts.





[jira] [Created] (IOTDB-420) Avoid flush encoding task dying silently.

2020-01-13 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-420:


 Summary: Avoid flush encoding task dying silently.
 Key: IOTDB-420
 URL: https://issues.apache.org/jira/browse/IOTDB-420
 Project: Apache IoTDB
  Issue Type: Bug
Affects Versions: 0.10.0-SNAPSHOT
Reporter: Tian Jiang
 Fix For: 0.10.0-SNAPSHOT


If a runtime exception is thrown in an encoding sub-task, it will die silently 
and prevent the io task from ending.

To avoid this, the future of the encoding task should be retrieved before that 
of the io task.
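
A sketch of that ordering with plain java.util.concurrent futures. FlushWaiter 
is a hypothetical helper, and cancelling the io task on failure is an added 
assumption, not something stated in this issue.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

final class FlushWaiter {
  // Wait for the encoding task first so that its failure surfaces at once.
  // If it failed, cancel the io task (assumption), which would otherwise keep
  // waiting for input that never arrives.
  static void await(Future<?> encodingFuture, Future<?> ioFuture)
      throws InterruptedException, ExecutionException {
    try {
      encodingFuture.get();
    } catch (ExecutionException e) {
      ioFuture.cancel(true);
      throw e;
    }
    ioFuture.get();
  }
}
```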





[jira] [Closed] (IOTDB-412) Paths are not correctly deduplicated

2020-01-10 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-412.

Resolution: Fixed

> Paths are not correctly deduplicated
> 
>
> Key: IOTDB-412
> URL: https://issues.apache.org/jira/browse/IOTDB-412
> Project: Apache IoTDB
>  Issue Type: Bug
>Affects Versions: 0.10.0-SNAPSHOT
>Reporter: Tian Jiang
>Assignee: atoildw
>Priority: Major
>  Labels: pull-request-available, query
> Fix For: 0.10.0-SNAPSHOT
>
> Attachments: Paths are duplicated in GroupByPlan.docx
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Please check the attachment for details. I am not sure if other plans have 
> the same problem, those who take over this should have a look.





[jira] [Commented] (IOTDB-412) Paths are not correctly deduplicated

2020-01-10 Thread Tian Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012563#comment-17012563
 ] 

Tian Jiang commented on IOTDB-412:
--

Somehow I cannot assign this to you on Jira, but I will keep in mind that you 
are working on it, thanks.

> Paths are not correctly deduplicated
> 
>
> Key: IOTDB-412
> URL: https://issues.apache.org/jira/browse/IOTDB-412
> Project: Apache IoTDB
>  Issue Type: Bug
>Affects Versions: 0.10.0-SNAPSHOT
>Reporter: Tian Jiang
>Priority: Major
>  Labels: query
> Fix For: 0.10.0-SNAPSHOT
>
> Attachments: Paths are duplicated in GroupByPlan.docx
>
>
> Please check the attachment for details. I am not sure if other plans have 
> the same problem, those who take over this should have a look.





[jira] [Created] (IOTDB-412) Paths are not correctly deduplicated

2020-01-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-412:


 Summary: Paths are not correctly deduplicated
 Key: IOTDB-412
 URL: https://issues.apache.org/jira/browse/IOTDB-412
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Tian Jiang
 Fix For: 0.10.0-SNAPSHOT
 Attachments: Paths are duplicated in GroupByPlan.docx

Please check the attachment for details. I am not sure if other plans have the 
same problem, those who take over this should have a look.





[jira] [Commented] (IOTDB-352) [Distributed] Recognize and skip duplicated files in a snapshot

2019-12-31 Thread Tian Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005984#comment-17005984
 ] 

Tian Jiang commented on IOTDB-352:
--

This is the current solution; it is not perfect. Any suggestions or new ideas 
are welcome.

> [Distributed] Recognize and skip duplicated files in a snapshot
> ---
>
> Key: IOTDB-352
> URL: https://issues.apache.org/jira/browse/IOTDB-352
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Tian Jiang
>Priority: Major
>  Labels: distributed
>
> By the naming of TsFiles in IoTDB, the files with the same data may have 
> different names on different nodes. When such files are sent through 
> snapshots, the receiver is unable to tell whether the file already exists 
> locally or not, so it will blindly load the file as an unsequential one (if 
> it does overlap any existing files), which will waste a lot of system 
> resources.
> How can we figure out if we already have one file or not?





[jira] [Commented] (IOTDB-352) [Distributed] Recognize and skip duplicated files in a snapshot

2019-12-31 Thread Tian Jiang (Jira)


[ 
https://issues.apache.org/jira/browse/IOTDB-352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17005983#comment-17005983
 ] 

Tian Jiang commented on IOTDB-352:
--

Adding md5 is not helpful for this issue. It may be used to check the 
integrity of files during file transfers, but that is another issue.






[jira] [Created] (IOTDB-385) Bloom Filter for time ranges

2019-12-23 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-385:


 Summary: Bloom Filter for time ranges
 Key: IOTDB-385
 URL: https://issues.apache.org/jira/browse/IOTDB-385
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


Situation:
Device1 generates data at 1 pm, 5 pm, and 8 pm; Device2 generates data at 1 pm, 
7 pm, and 8 pm. The query is "SELECT * FROM Device1, Device2 WHERE 13:00:00 < 
time < 18:00:00".

It is clear that Device2 has no data satisfying the condition, but we still 
need to query it since we currently only record startTime and endTime for each 
device.

Solution:
For each device, assuming its startTime is t_s, each timestamp t_d >= t_s can 
be mapped to a time range id using: id = ceiling((t_d - t_s) / 
interval_length), where interval_length is 1 hour in the above example. 
With this id, a bloom filter (or maybe another filter) can be built to tell 
whether we truly have data satisfying the time condition.
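
The id computation can be sketched as below. For simplicity an exact bitmap 
(java.util.BitSet) stands in for the bloom filter, timestamps are integers, and 
TimeRangeIndex is a hypothetical class; a bloom filter would replace the bitmap 
when the ids are sparse or unbounded.

```java
import java.util.BitSet;

// Hypothetical per-device index of occupied time range ids.
final class TimeRangeIndex {
  private final long startTime;
  private final long intervalLength;
  private final BitSet ids = new BitSet();

  TimeRangeIndex(long startTime, long intervalLength) {
    this.startTime = startTime;
    this.intervalLength = intervalLength;
  }

  // id = ceiling((t - startTime) / intervalLength), valid for t >= startTime.
  int idOf(long t) {
    return (int) ((t - startTime + intervalLength - 1) / intervalLength);
  }

  void add(long t) {
    ids.set(idOf(t));
  }

  // True if some recorded timestamp may fall in the open range (lower, upper).
  boolean mayContain(long lowerExclusive, long upperExclusive) {
    long lo = Math.max(lowerExclusive + 1, startTime);
    long hi = upperExclusive - 1;
    if (lo > hi) {
      return false;
    }
    int hit = ids.nextSetBit(idOf(lo));
    return hit >= 0 && hit <= idOf(hi);
  }
}
```

With hours as the time unit, Device2 (13, 19, 20) occupies ids 0, 6 and 7, 
while the query range (13, 18) covers ids 1 to 4, so Device2 can be skipped. 
Device1 (13, 17, 20) occupies id 4 and still has to be read.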





[jira] [Created] (IOTDB-373) [Distributed] Query coordinating

2019-12-17 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-373:


 Summary: [Distributed] Query coordinating
 Key: IOTDB-373
 URL: https://issues.apache.org/jira/browse/IOTDB-373
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


When using more than one replica, query options become richer and more 
complicated. To ensure load balancing, we may have to choose the node with the 
lowest load to perform the query, which requires knowing the load of each node 
and a formula to rank the nodes based on their status.

We can also issue the same query to multiple replicas and return the fastest 
response to the user, as MapReduce does. But this may waste resources unless we 
support quick query cancellation.
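
As a local illustration of the "fastest response wins" idea, 
ExecutorService.invokeAny returns the first successful result and cancels the 
tasks that have not finished. FastestReplica is a hypothetical wrapper; in a 
cluster the cancellation would also have to reach the remote nodes, which this 
sketch does not cover.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;

final class FastestReplica {
  // Runs the same query against every replica and returns the first result
  // that completes successfully; unfinished tasks are cancelled on return.
  static <T> T query(ExecutorService pool, List<Callable<T>> replicaQueries)
      throws InterruptedException, ExecutionException {
    return pool.invokeAny(replicaQueries);
  }
}
```

Cancellation here works through thread interruption, so a query task only 
stops early if it responds to being interrupted.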

In short, we should decide which replica(s) should serve a query and what 
information we need to collect to make the decision.





[jira] [Created] (IOTDB-372) [Distributed] Support node deletion.

2019-12-17 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-372:


 Summary: [Distributed] Support node deletion.
 Key: IOTDB-372
 URL: https://issues.apache.org/jira/browse/IOTDB-372
 Project: Apache IoTDB
  Issue Type: New Feature
Reporter: Tian Jiang


Currently, only node addition is supported. To take a step toward scaling, or 
even auto-scaling, node deletion must also be supported. Node deletion is not 
a simple reversal of node addition; it should be carefully designed, discussed 
and verified.





[jira] [Closed] (IOTDB-361) Refactor session management by inducing sessionId

2019-12-16 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-361.

Fix Version/s: 0.10.0-SNAPSHOT
 Assignee: Tian Jiang
   Resolution: Fixed

> Refactor session management by inducing sessionId
> -
>
> Key: IOTDB-361
> URL: https://issues.apache.org/jira/browse/IOTDB-361
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Tian Jiang
>Assignee: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0-SNAPSHOT
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> We are using ThreadLocals in TSServiceImpl to distinguish different clients, 
> which relies on the underlying server pool to provide a thread for each 
> client and blocks us from using more efficient pooling techniques.
> To resolve this, each client should be given a sessionId (or you may call it 
> clientId) as an identifier to replace the usages of ThreadLocal.





[jira] [Created] (IOTDB-361) Refactor session management by inducing sessionId

2019-12-11 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-361:


 Summary: Refactor session management by inducing sessionId
 Key: IOTDB-361
 URL: https://issues.apache.org/jira/browse/IOTDB-361
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


We are using ThreadLocals in TSServiceImpl to distinguish different clients, 
which relies on the underlying server pool to provide a thread for each client 
and blocks us from using more efficient pooling techniques.

To resolve this, each client should be given a sessionId (or you may call it 
clientId) as an identifier to replace the usages of ThreadLocal.
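
A rough sketch of what keying client state by sessionId could look like. The SessionManager and SessionState names are hypothetical and not taken from the IoTDB code base:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical registry that replaces per-thread state with per-session state. */
final class SessionManager {
    /** Whatever used to live in the ThreadLocals would move here. */
    static final class SessionState {
        final String username;

        SessionState(String username) {
            this.username = username;
        }
    }

    private final AtomicLong nextId = new AtomicLong(1);
    private final Map<Long, SessionState> sessions = new ConcurrentHashMap<>();

    /** Registers a client and returns the id it must send with each request. */
    public long openSession(String username) {
        long id = nextId.getAndIncrement();
        sessions.put(id, new SessionState(username));
        return id;
    }

    /** Returns the state of the session, or null if the id is unknown. */
    public SessionState get(long sessionId) {
        return sessions.get(sessionId);
    }

    /** Returns true if the session existed and was removed. */
    public boolean closeSession(long sessionId) {
        return sessions.remove(sessionId) != null;
    }
}
```

Because the state is looked up by id rather than by thread, any worker thread can serve any request, which is what allows a different pooling strategy.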





[jira] [Created] (IOTDB-355) [Distributed] Start-up checks

2019-12-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-355:


 Summary: [Distributed] Start-up checks
 Key: IOTDB-355
 URL: https://issues.apache.org/jira/browse/IOTDB-355
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


A node should check the following items before it is set up:
• The number of seed nodes should be no less than the quorum.

When a node joins the cluster or the seed nodes are trying to form the initial 
cluster:
• Configurations like partition interval, hash salt and replication number 
should be the same for all nodes.
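
The two checks could be sketched as follows. The class and field names are hypothetical, and the quorum is assumed here to be a majority of the replication number, which the issue does not specify:

```java
import java.util.Objects;

final class StartUpChecks {
    /** Hypothetical holder for the settings that must match across nodes. */
    static final class ClusterConfig {
        final long partitionInterval;
        final String hashSalt;
        final int replicationNum;

        ClusterConfig(long partitionInterval, String hashSalt, int replicationNum) {
            this.partitionInterval = partitionInterval;
            this.hashSalt = hashSalt;
            this.replicationNum = replicationNum;
        }

        /** True if a node with this config may join a cluster using the other config. */
        boolean isCompatibleWith(ClusterConfig other) {
            return partitionInterval == other.partitionInterval
                    && Objects.equals(hashSalt, other.hashSalt)
                    && replicationNum == other.replicationNum;
        }
    }

    /** Assumption: quorum means a majority of the replication number. */
    static boolean hasEnoughSeedNodes(int seedNodeCount, int replicationNum) {
        int quorum = replicationNum / 2 + 1;
        return seedNodeCount >= quorum;
    }
}
```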





[jira] [Created] (IOTDB-353) [Distributed] Validate files in snapshots

2019-12-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-353:


 Summary: [Distributed] Validate files in snapshots
 Key: IOTDB-353
 URL: https://issues.apache.org/jira/browse/IOTDB-353
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


After a node pulls a file in a snapshot from a remote node, it does not check 
the integrity of this file. The file should be validated using md5 or another 
verification method to detect corruption caused by a bad network or other 
faults.
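
A minimal sketch of md5-based validation using the JDK's MessageDigest. The FileChecksum class and its method names are hypothetical; how the expected digest travels with the snapshot is left open:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FileChecksum {
    private static MessageDigest newMd5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    /** Digest of an in-memory buffer. */
    static String md5Hex(byte[] data) {
        return toHex(newMd5().digest(data));
    }

    /** Digest of a file, streamed so large files are not loaded into memory. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md = newMd5();
        byte[] buf = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    /** True if the pulled file matches the digest reported by the sender. */
    static boolean matches(Path pulledFile, String expectedMd5Hex) throws IOException {
        return md5Hex(pulledFile).equalsIgnoreCase(expectedMd5Hex);
    }
}
```

The sender would compute the digest before transfer and the receiver would call matches() after the pull, re-pulling the file on a mismatch.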





[jira] [Created] (IOTDB-352) [Distributed] Recognize and skip duplicated files in a snapshot

2019-12-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-352:


 Summary: [Distributed] Recognize and skip duplicated files in a 
snapshot
 Key: IOTDB-352
 URL: https://issues.apache.org/jira/browse/IOTDB-352
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


By the naming of TsFiles in IoTDB, the files with the same data may have 
different names on different nodes. When such files are sent through snapshots, 
the receiver is unable to tell whether the file already exists locally or not, 
so it will blindly load the file as an unsequential one (if it does overlap any 
existing files), which will waste a lot of system resources.

How can we figure out whether we already have a given file?





[jira] [Created] (IOTDB-351) [Distributed] Serialize the raft logs

2019-12-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-351:


 Summary: [Distributed] Serialize the raft logs
 Key: IOTDB-351
 URL: https://issues.apache.org/jira/browse/IOTDB-351
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


The raft logs are currently only memory-resident. If all nodes in a group 
crash, the logs will be lost permanently, so the logs should be persisted to 
storage according to a certain strategy.
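
One possible serializable form, shown purely as a sketch. The RaftLogEntry layout (index, term, length-prefixed payload) is an assumption, and the strategy for when to flush entries to disk is not covered:

```java
import java.nio.ByteBuffer;

/** Hypothetical raft log entry with a simple fixed-header binary layout. */
final class RaftLogEntry {
    final long index;
    final long term;
    final byte[] payload;

    RaftLogEntry(long index, long term, byte[] payload) {
        this.index = index;
        this.term = term;
        this.payload = payload;
    }

    /** Layout: index (8 bytes), term (8 bytes), payload length (4 bytes), payload. */
    ByteBuffer serialize() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES * 2 + Integer.BYTES + payload.length);
        buf.putLong(index).putLong(term).putInt(payload.length).put(payload);
        buf.flip(); // switch the buffer from writing to reading
        return buf;
    }

    /** Reads one entry starting at the buffer's current position. */
    static RaftLogEntry deserialize(ByteBuffer buf) {
        long index = buf.getLong();
        long term = buf.getLong();
        byte[] payload = new byte[buf.getInt()];
        buf.get(payload);
        return new RaftLogEntry(index, term, payload);
    }
}
```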

Moreover, it is worth exploring how raft logs could interact with, or even 
replace, the existing WALs in IoTDB. They are currently independent in order 
to decouple the design of the distributed version.





[jira] [Created] (IOTDB-350) [Distributed] Integrate with time partitioning of data

2019-12-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-350:


 Summary: [Distributed] Integrate with time partitioning of data
 Key: IOTDB-350
 URL: https://issues.apache.org/jira/browse/IOTDB-350
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


When time partitioning of data is supported in the standalone IoTDB, the 
distributed version should integrate with this feature and partition data using 
the same granularity as IoTDB's.





[jira] [Created] (IOTDB-349) [Distributed] Incrementally update snapshot

2019-12-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-349:


 Summary: [Distributed] Incrementally update snapshot
 Key: IOTDB-349
 URL: https://issues.apache.org/jira/browse/IOTDB-349
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


Currently, the snapshot is fully recalculated when takeSnapshot() is called. 
This does not matter much, since snapshot-taking is relatively rare and the 
snapshot mainly concerns the list of data files and timeseries schemas, not 
the data itself.

Still, using an incremental strategy would help to reduce meta tree traversing 
and file listing.





[jira] [Created] (IOTDB-348) [Distributed] Support more non-query operations (log types)

2019-12-09 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-348:


 Summary: [Distributed] Support more non-query operations (log 
types)
 Key: IOTDB-348
 URL: https://issues.apache.org/jira/browse/IOTDB-348
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


Currently supported operations:
• create storage group
• create timeseries
• single row insertion

Please link to and reply to this issue if you add any new functionality.





[jira] [Closed] (IOTDB-322) Thrift should be upgraded

2019-12-09 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-322.


> Thrift should be upgraded
> -
>
> Key: IOTDB-322
> URL: https://issues.apache.org/jira/browse/IOTDB-322
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2019-11-26-12-08-26-149.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The current thrift version (0.9.3) has a bug that mistakenly converts the 
> TApplicationException to TBase, but TApplicationException does not extend 
> TBase.
>  !image-2019-11-26-12-08-26-149.png|thumbnail! 
> Upgrading to 0.10.0 or higher will fix this problem.





[jira] [Closed] (IOTDB-276) Inconsistent ways of judging the nullness of a Field

2019-12-09 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-276.


> Inconsistent ways of judging the nullness of a Field
> 
>
> Key: IOTDB-276
> URL: https://issues.apache.org/jira/browse/IOTDB-276
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
> Attachments: image-2019-10-29-11-43-40-180.png, 
> image-2019-10-29-11-45-53-763.png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Several places use `dataType == null` to judge whether a field is null or 
> not, while there is a field `isNull` which better suits this job.
> The inconsistent usage may lead to a situation where one sets `isNull` to 
> true but finds that the displayed result is not null.
>  !image-2019-10-29-11-43-40-180.png|thumbnail! 
>  !image-2019-10-29-11-45-53-763.png|thumbnail! 





[jira] [Created] (IOTDB-322) Thrift should be upgraded

2019-11-25 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-322:


 Summary: Thrift should be upgraded
 Key: IOTDB-322
 URL: https://issues.apache.org/jira/browse/IOTDB-322
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Tian Jiang
 Attachments: image-2019-11-26-12-08-26-149.png

The current thrift version (0.9.3) has a bug that mistakenly converts the 
TApplicationException to TBase, but TApplicationException does not extend TBase.
 !image-2019-11-26-12-08-26-149.png|thumbnail! 

Upgrading to 0.10.0 or higher will fix this problem.





[jira] [Created] (IOTDB-314) Partition the data in a storage group by time.

2019-11-21 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-314:


 Summary: Partition the data in a storage group by time.
 Key: IOTDB-314
 URL: https://issues.apache.org/jira/browse/IOTDB-314
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


In many analytic applications, reports are generated daily, weekly or monthly. 
If the data files are naturally partitioned by such intervals, such 
applications will be able to find the target data more easily. Other 
functionalities like daily replication or transfers also benefit from this.

As a result, we should support embedded storage-group-level time partitioning 
in IoTDB, which ensures that no TsFile generated by IoTDB contains data 
spanning more than a configurable interval (e.g. a day).
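
A small sketch of how a timestamp could be mapped to a partition, assuming fixed-length partitions aligned to timestamp 0. The TimePartition class is hypothetical:

```java
/** Hypothetical helper mapping timestamps to fixed-length time partitions. */
final class TimePartition {
    /**
     * Partitions are [k * interval, (k + 1) * interval). floorDiv keeps
     * negative timestamps in the correct partition instead of rounding
     * toward zero.
     */
    static long partitionId(long timestamp, long partitionInterval) {
        if (partitionInterval <= 0) {
            throw new IllegalArgumentException("partitionInterval must be positive");
        }
        return Math.floorDiv(timestamp, partitionInterval);
    }

    /** Two points may share a TsFile only if they fall in the same partition. */
    static boolean samePartition(long t1, long t2, long partitionInterval) {
        return partitionId(t1, partitionInterval) == partitionId(t2, partitionInterval);
    }
}
```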

By the way, this is also the fundamental support needed by the distributed 
version.





[jira] [Closed] (IOTDB-208) Add bloom filters to TsFile

2019-10-31 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-208.


> Add bloom filters to TsFile
> ---
>
> Key: IOTDB-208
> URL: https://issues.apache.org/jira/browse/IOTDB-208
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: Tian Jiang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0-SNAPSHOT
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> My recent reading reminded me that the bloom filter is standard equipment in 
> key-value databases. Although IoTDB is not one of them (at least not 
> typically), a bloom filter still helps a lot in various situations. For 
> example, our recent experiments gave us the illusion that the set of time 
> series in a storage group remains unchanged. However, that is not the case.
> Naturally, in real situations, the number of time series grows over time, 
> for reasons like adding new equipment. The old files do not contain such new 
> time series. Without the help of bloom filters, we have to check each old 
> file only to find that it contains no such time series. As far as I know, 
> this may take a lot of time.
> So, I suggest we add a bloom filter (or some more efficient filter) to each 
> TsFile to help skip unwanted files.





[jira] [Closed] (IOTDB-262) CachedPriortityMergeReader fails to deduplicate some elements

2019-10-31 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-262.


> CachedPriortityMergeReader fails to deduplicate some elements
> -
>
> Key: IOTDB-262
> URL: https://issues.apache.org/jira/browse/IOTDB-262
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0-SNAPSHOT
>
> Attachments: duplicated (1).png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CachedPriortityMergeReader fails to deduplicate the element at the end of the 
> cache. The picture in the attachment explains this.
> I plan to record the last timestamp to help to deduplicate such elements.





[jira] [Created] (IOTDB-276) Inconsistent ways of judging the nullness of a Field

2019-10-28 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-276:


 Summary: Inconsistent ways of judging the nullness of a Field
 Key: IOTDB-276
 URL: https://issues.apache.org/jira/browse/IOTDB-276
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Tian Jiang
 Attachments: image-2019-10-29-11-43-40-180.png, 
image-2019-10-29-11-45-53-763.png

Several places use `dataType == null` to judge whether a field is null or not, 
while there is a field `isNull` which better suits this job.
The inconsistent usage may lead to a situation where one sets `isNull` to true 
but finds that the displayed result is not null.
 !image-2019-10-29-11-43-40-180.png|thumbnail! 
 !image-2019-10-29-11-45-53-763.png|thumbnail! 





[jira] [Created] (IOTDB-262) CachedPriortityMergeReader fails to deduplicate some elements

2019-10-20 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-262:


 Summary: CachedPriortityMergeReader fails to deduplicate some 
elements
 Key: IOTDB-262
 URL: https://issues.apache.org/jira/browse/IOTDB-262
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Tian Jiang
 Attachments: duplicated (1).png

CachedPriortityMergeReader fails to deduplicate the element at the end of the 
cache. The picture in the attachment explains this.

I plan to record the last timestamp to help to deduplicate such elements.
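
The idea of remembering the last emitted timestamp can be sketched as below. This is not the actual reader code; the class name and the batch representation are assumptions, and the input is assumed to be sorted by time across batches:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of deduplicating across cache boundaries. */
final class LastTimestampDedup {
    /**
     * Each array models one cache fill. A duplicate of the element at the
     * end of one cache shows up at the start of the next, so the last
     * emitted timestamp is kept across batches.
     */
    static List<Long> dedup(List<long[]> batches) {
        List<Long> out = new ArrayList<>();
        boolean hasLast = false;
        long last = 0;
        for (long[] batch : batches) {
            for (long t : batch) {
                if (hasLast && t == last) {
                    continue; // same timestamp as the previous output: skip
                }
                out.add(t);
                last = t;
                hasLast = true;
            }
        }
        return out;
    }
}
```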






[jira] [Closed] (IOTDB-13) Support batched ingestion

2019-10-07 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-13.
---

> Support batched ingestion
> -
>
> Key: IOTDB-13
> URL: https://issues.apache.org/jira/browse/IOTDB-13
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Tian Jiang
>Assignee: Yanzhe An
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Our current insertion interface is based on building one TsRecord for a 
> timestamp. This limits our capability when ingesting a large amount of 
> pre-generated data.
> We need a specifically designed batch-load interface to improve our 
> performance when loading, say, historical data. For example, the size checks 
> of multiple PageWriters can be reduced to one in a batched fashion. Moreover, 
> when the schema of the data is static, we can use primitive arrays instead of 
> Lists, which may improve performance greatly.





[jira] [Closed] (IOTDB-143) Support merge

2019-10-07 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-143.


> Support merge
> -
>
> Key: IOTDB-143
> URL: https://issues.apache.org/jira/browse/IOTDB-143
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.8.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Merge (or compaction) is an important feature of LSM or LSM-like systems, and 
> IoTDB depends on it to combine the data in the sequential and unsequential 
> files and make it ordered and non-duplicated again.
> Merged data files provide better locality and a potentially higher 
> compression rate (because some of the missing values are filled in). Because 
> merging interacts with many aspects of IoTDB, like ingestion and query, 
> finding an effective implementation may be rather difficult.





[jira] [Created] (IOTDB-208) Add bloom filters to TsFile

2019-09-10 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-208:


 Summary: Add bloom filters to TsFile
 Key: IOTDB-208
 URL: https://issues.apache.org/jira/browse/IOTDB-208
 Project: Apache IoTDB
  Issue Type: New Feature
Reporter: Tian Jiang


My recent reading reminded me that the bloom filter is standard equipment in 
key-value databases. Although IoTDB is not one of them (at least not 
typically), a bloom filter still helps a lot in various situations. For 
example, our recent experiments gave us the illusion that the set of time 
series in a storage group remains unchanged. However, that is not the case.

Naturally, in real situations, the number of time series grows over time, for 
reasons like adding new equipment. The old files do not contain such new time 
series. Without the help of bloom filters, we have to check each old file only 
to find that it contains no such time series. As far as I know, this may take 
a lot of time.

So, I suggest we add a bloom filter (or some more efficient filter) to each 
TsFile to help skip unwanted files.
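
To make the proposal concrete, here is a deliberately simple bloom filter over time series paths. It is a sketch, not the TsFile implementation: the class name, the sizing parameters and the hash derivation (from String.hashCode) are all assumptions:

```java
import java.util.BitSet;

/** Hypothetical per-file filter over the time series paths the file contains. */
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    public SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** i-th bit position for a key, derived from two hashes (double hashing). */
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // second hash, forced to be odd
        return Math.floorMod(h1 + i * h2, size);
    }

    public void add(String seriesPath) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(seriesPath, i));
        }
    }

    /** false means the file definitely lacks the series; true means "maybe". */
    public boolean mightContain(String seriesPath) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(seriesPath, i))) {
                return false;
            }
        }
        return true;
    }
}
```

A query would consult the filter of each old file first and open only the files for which mightContain() returns true.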



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (IOTDB-163) Support create device template and create device.

2019-08-13 Thread Tian Jiang (JIRA)
Tian Jiang created IOTDB-163:


 Summary: Support create device template and create device.
 Key: IOTDB-163
 URL: https://issues.apache.org/jira/browse/IOTDB-163
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


In the present version, it is a little troublesome to create a set of 
timeseries that have the same measurements. On the other hand, although we use 
the concept of a "device" in the code, it is not properly abstracted.

Expected usage:

Using IoTDB in a more relational way:

CREATE DEVICE TEMPLATE vehicle (speed DOUBLE PLAIN, direction DOUBLE PLAIN, 
temperature DOUBLE PLAIN, fuel DOUBLE PLAIN)

If all datatypes (or encodings) are the same, you can write the equivalent 
form:

CREATE DEVICE TEMPLATE vehicle MEASUREMENTS (speed, direction, temperature, 
fuel) DATATYPE DOUBLE ENCODING PLAIN

Then you will be able to create time series in an easier way:

CREATE DEVICE (vehicle) root.sg1.vehicle1

This is equivalent to:

CREATE TIMESERIES root.sg1.vehicle1.speed WITH DATATYPE=DOUBLE,ENCODING=PLAIN

CREATE TIMESERIES root.sg1.vehicle1.direction WITH 
DATATYPE=DOUBLE,ENCODING=PLAIN

CREATE TIMESERIES root.sg1.vehicle1.fuel WITH DATATYPE=DOUBLE,ENCODING=PLAIN

CREATE TIMESERIES root.sg1.vehicle1.temperature WITH 
DATATYPE=DOUBLE,ENCODING=PLAIN

I hope this will narrow the gap between using IoTDB and using traditional 
relational databases.
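
The expansion from a template to individual statements could be sketched like this. DeviceTemplate is a hypothetical class written for illustration; it only generates the CREATE TIMESERIES text and does not touch any real IoTDB API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory device template. */
final class DeviceTemplate {
    // measurement name -> {datatype, encoding}; insertion order is preserved
    private final Map<String, String[]> measurements = new LinkedHashMap<>();

    public DeviceTemplate add(String measurement, String dataType, String encoding) {
        measurements.put(measurement, new String[] {dataType, encoding});
        return this;
    }

    /** Expands "CREATE DEVICE (template) path" into one statement per measurement. */
    public List<String> expand(String devicePath) {
        List<String> statements = new ArrayList<>();
        for (Map.Entry<String, String[]> e : measurements.entrySet()) {
            statements.add("CREATE TIMESERIES " + devicePath + "." + e.getKey()
                    + " WITH DATATYPE=" + e.getValue()[0]
                    + ",ENCODING=" + e.getValue()[1]);
        }
        return statements;
    }
}
```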



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Created] (IOTDB-162) Fix the semantics of hasNext() and next().

2019-08-13 Thread Tian Jiang (JIRA)
Tian Jiang created IOTDB-162:


 Summary: Fix the semantics of hasNext() and next().
 Key: IOTDB-162
 URL: https://issues.apache.org/jira/browse/IOTDB-162
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang
 Attachments: image-2019-08-14-09-50-05-929.png

Some definitions of hasNext() and next() are misleading. For example, the 
following actually means hasCurrent rather than hasNext: when curIdx = 
timeLength - 1, it returns true although there is actually no next value.

Such definitions conflict with the hasNext() and next() contract defined and 
widely used in Java's Iterator, and confuse those who are not so familiar with 
the code.

!image-2019-08-14-09-50-05-929.png!
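
For comparison, an iterator that follows the java.util.Iterator contract might look like the sketch below. The class is hypothetical and is not the code shown in the screenshot; the index here points at the element next() will return, so hasNext() is true exactly when such an element exists:

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical iterator over an array of timestamps. */
final class TimeArrayIterator implements Iterator<Long> {
    private final long[] times;
    private int nextIdx = 0; // index of the element next() will return

    public TimeArrayIterator(long[] times) {
        this.times = times;
    }

    @Override
    public boolean hasNext() {
        return nextIdx < times.length;
    }

    @Override
    public Long next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return times[nextIdx++];
    }
}
```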





[jira] [Closed] (IOTDB-95) Keep stack traces when handling an Exception.

2019-08-12 Thread Tian Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IOTDB-95?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-95.
---
Resolution: Fixed

> Keep stack traces when handling an Exception.
> -
>
> Key: IOTDB-95
> URL: https://issues.apache.org/jira/browse/IOTDB-95
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, some exception handlers drop the stack traces of the exceptions, 
> which makes it significantly harder to locate problems precisely.
> To provide more useful information for debugging, the stack traces should be 
> kept until they are logged at the top level.





[jira] [Created] (IOTDB-143) Support merge

2019-07-17 Thread Tian Jiang (JIRA)
Tian Jiang created IOTDB-143:


 Summary: Support merge
 Key: IOTDB-143
 URL: https://issues.apache.org/jira/browse/IOTDB-143
 Project: Apache IoTDB
  Issue Type: New Feature
Reporter: Tian Jiang


Merge (or compaction) is an important feature of LSM or LSM-like systems, and 
IoTDB depends on it to combine the data in the sequential and unsequential 
files and make it ordered and non-duplicated again.

Merged data files provide better locality and a potentially higher compression 
rate (because some of the missing values are filled in). Because merging 
interacts with many aspects of IoTDB, like ingestion and query, finding an 
effective implementation may be rather difficult.





[jira] [Closed] (IOTDB-122) Support prepared insertion

2019-07-17 Thread Tian Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IOTDB-122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-122.


> Support prepared insertion
> --
>
> Key: IOTDB-122
> URL: https://issues.apache.org/jira/browse/IOTDB-122
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As some have mentioned, the SQL parser (antlr) may consume about 40% of the 
> time in ingestion, especially when small SQL statements are sent frequently. 
> Luckily, IoTDB insertion statements are currently all alike and simple; each 
> has 4 meaningful parts: deviceId, measurements, values and time. For such a 
> simple structure, using tools like antlr may be just too heavy.
> Intuitively, PreparedStatement in the standard JDBC interface can be used to 
> relieve the parsing overhead when statements are similar. I will describe how 
> PreparedStatement would work as follows (this is still left to be 
> implemented):
> 1. The user wants to create a prepared insert statement and calls 
> `connection.prepareStatement("Insert")`.
> 2. The connection matches the parameter string with some templates, finds out 
> it is an insertion and returns an IoTDBPreparedInsertStatement pStmt.
> 3. The user calls `pStmt.setDevice("root.device1"); pStmt.setTime(100); 
> pStmt.setMeasurements(measurementArray); pStmt.setValues(valueArray);` to set 
> parameters for the next insertion.
> 4. The user calls `pStmt.execute()` to execute an insertion.
> 5. The PreparedInsertStatement creates a TSInsertionReq, puts deviceId, 
> measurements, values and time into this request and sends this request to the 
> server.
> 6. The server receives the request, extracts parameters from the request, 
> executes an insertion directly through the database engine and returns a 
> TSInsertionResp to the user.





[jira] [Closed] (IOTDB-107) WAL log node is missing after recovery.

2019-07-16 Thread Tian Jiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IOTDB-107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-107.


> WAL log node is missing after recovery.
> ---
>
> Key: IOTDB-107
> URL: https://issues.apache.org/jira/browse/IOTDB-107
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> During recovery, the WAL is disabled because the recovery process aims to 
> eventually remove all WALs. As a result, BufferWriteProcessors and 
> OverflowProcessors are created without WAL log nodes. When recovery is over, 
> even if the WAL is enabled, it cannot function correctly because the WAL log 
> nodes are missing.





[jira] [Created] (IOTDB-122) Support prepared insertion

2019-06-25 Thread Tian Jiang (JIRA)
Tian Jiang created IOTDB-122:


 Summary: Support prepared insertion
 Key: IOTDB-122
 URL: https://issues.apache.org/jira/browse/IOTDB-122
 Project: Apache IoTDB
  Issue Type: New Feature
Reporter: Tian Jiang


As some have mentioned, the SQL parser (antlr) may consume about 40% of the 
time in ingestion, especially when small SQL statements are sent frequently. 
Luckily, IoTDB insertion statements are currently all alike and simple; each 
has 4 meaningful parts: deviceId, measurements, values and time. For such a 
simple structure, using tools like antlr may be just too heavy.

Intuitively, PreparedStatement in the standard JDBC interface can be used to 
relieve the parsing overhead when statements are similar. I will describe how 
PreparedStatement would work as follows (this is still left to be implemented):

1. The user wants to create a prepared insert statement and calls 
`connection.prepareStatement("Insert")`.
2. The connection matches the parameter string with some templates, finds out 
it is an insertion and returns an IoTDBPreparedInsertStatement pStmt.
3. The user calls `pStmt.setDevice("root.device1"); pStmt.setTime(100); 
pStmt.setMeasurements(measurementArray); pStmt.setValues(valueArray);` to set 
parameters for the next insertion.
4. The user calls `pStmt.execute()` to execute an insertion.
5. The PreparedInsertStatement creates a TSInsertionReq, puts deviceId, 
measurements, values and time into this request and sends this request to the 
server.
6. The server receives the request, extracts parameters from the request, 
executes an insertion directly through the database engine and returns a 
TSInsertionResp to the user.
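
Steps 3 to 5 can be illustrated with a self-contained stand-in. PreparedInsertSketch and InsertionRequest are hypothetical classes that mimic the proposed IoTDBPreparedInsertStatement and TSInsertionReq; they are not the real types, and sending the request to the server is omitted:

```java
import java.util.List;

/** Hypothetical stand-in for the proposed prepared insert statement. */
final class PreparedInsertSketch {
    /** Stand-in for the request object; carries the 4 parts, no SQL text. */
    static final class InsertionRequest {
        final String deviceId;
        final long time;
        final List<String> measurements;
        final List<String> values;

        InsertionRequest(String deviceId, long time,
                         List<String> measurements, List<String> values) {
            this.deviceId = deviceId;
            this.time = time;
            this.measurements = measurements;
            this.values = values;
        }
    }

    private String deviceId;
    private long time;
    private List<String> measurements;
    private List<String> values;

    public void setDevice(String deviceId) { this.deviceId = deviceId; }
    public void setTime(long time) { this.time = time; }
    public void setMeasurements(List<String> measurements) { this.measurements = measurements; }
    public void setValues(List<String> values) { this.values = values; }

    /** Builds the request directly from the parameters, with no parsing step. */
    public InsertionRequest buildRequest() {
        if (deviceId == null || measurements == null || values == null) {
            throw new IllegalStateException("parameters not set");
        }
        if (measurements.size() != values.size()) {
            throw new IllegalArgumentException("measurements and values differ in size");
        }
        return new InsertionRequest(deviceId, time, measurements, values);
    }
}
```

The point of the design is visible in buildRequest(): the server receives structured fields, so the parser is never invoked on the insertion path.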



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IOTDB-107) WAL log node is missing after recovery.

2019-05-30 Thread Tian Jiang (JIRA)
Tian Jiang created IOTDB-107:


 Summary: WAL log node is missing after recovery.
 Key: IOTDB-107
 URL: https://issues.apache.org/jira/browse/IOTDB-107
 Project: Apache IoTDB
  Issue Type: Bug
Reporter: Tian Jiang


During recovery, the WAL is disabled because the recovery process aims to 
eventually remove all WALs. As a result, BufferWriteProcessors and 
OverflowProcessors are created without WAL log nodes. When recovery is over, 
even if the WAL is enabled, it cannot function correctly because the WAL log 
nodes are missing.





[jira] [Created] (IOTDB-95) Keep stack traces when handling an Exception.

2019-05-23 Thread Tian Jiang (JIRA)
Tian Jiang created IOTDB-95:
---

 Summary: Keep stack traces when handling an Exception.
 Key: IOTDB-95
 URL: https://issues.apache.org/jira/browse/IOTDB-95
 Project: Apache IoTDB
  Issue Type: Improvement
Reporter: Tian Jiang


Currently, some exception handlers drop the stack traces of the exceptions, 
which makes it significantly harder to locate problems precisely.

To provide more useful information for debugging, the stack traces should be 
kept until they are logged at the top level.
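
In Java this usually comes down to passing the original exception as the cause when wrapping it. A minimal sketch, with hypothetical class and method names:

```java
import java.io.IOException;

final class StackTraceKeeping {
    /** Hypothetical higher-level exception that keeps its cause. */
    static final class StorageException extends Exception {
        StorageException(String message, Throwable cause) {
            super(message, cause); // the cause carries the original stack trace
        }
    }

    /** Stand-in for a low-level operation that fails. */
    static void readFile() throws IOException {
        throw new IOException("disk error");
    }

    static void load() throws StorageException {
        try {
            readFile();
        } catch (IOException e) {
            // Wrapping with the cause, rather than only e.getMessage(),
            // keeps the full trace available to the top-level logger.
            throw new StorageException("failed to load file", e);
        }
    }
}
```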


