Closed and labeled some issues

2019-10-31 Thread Tian Jiang


Greetings,


@黄向东 and I have gone through the issue list and closed or resolved some 
issues, so the remaining issues should be more up to date. We also labeled 
the issues by estimated difficulty: "easy-fix", "medium", "little-hard", 
"hard", and "experimental".


If you want to contribute to IoTDB, you may choose one of these labeled issues 
and try fixing it. Be sure to assign the issue to yourself or at least leave a 
comment that you are working on it.


Best,


Tian Jiang

[jira] [Closed] (IOTDB-148) modify start-*.sh to adapt the new jar files position

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-148.
-
Resolution: Fixed

> modify start-*.sh to adapt the new jar files position
> -
>
> Key: IOTDB-148
> URL: https://issues.apache.org/jira/browse/IOTDB-148
> Project: Apache IoTDB
>  Issue Type: Sub-task
>Reporter: xiangdong Huang
>Priority: Major
>
> Hi, as discussed in the mailing list, the new releasing structure is:
> .
> ├──  LICENSE
> ├──  NOTICE
> ├──  RELEASE_NOTES
> │
> ├──  bin
> │      ├──  start-client.bat
> │      ├──  start-client.sh
> │      ├──  start-server.bat
> │      ├──  start-server.sh
> │      ├──  stop-server.bat
> │      └──  stop-server.sh
> │
> ├──  conf
> │      ├──  iotdb-engine.properties
> │      ├──  iotdb-env.bat
> │      ├──  iotdb-env.sh
> │      ├──  iotdb-sync-client.properties
> │      ├──  logback.xml
> │      └──  tsfile-format.properties
> │
> ├──  lib
> │      └──  *.jar
> │
> ├──  licenses
> │      └──  *.license
> │
> └──  tools
>        ├──  export-csv.bat
>        ├──  export-csv.sh
>        ├──  import-csv.bat
>        ├──  import-csv.sh
>        ├──  start-WalChecker.bat
>        ├──  start-WalChecker.sh
>        ├──  memory-tool.bat
>        ├──  memory-tool.sh
>        ├──  start-sync-client.bat
>        ├──  start-sync-client.sh
>        ├──  stop-sync-client.bat
>        └──  stop-sync-client.sh
>  
> So, we need to modify start-server.sh, start-client.sh, etc. to fit the 
> above structure. The main difference is that the -classpath changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (IOTDB-132) Add side navigator of documents

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-132.
-
Resolution: Fixed

> Add side navigator of documents
> ---
>
> Key: IOTDB-132
> URL: https://issues.apache.org/jira/browse/IOTDB-132
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Zesong Sun
>Priority: Major
>
> Since the navigator on top of the documents was changed into an "Overview" part in 
> [PR229|https://github.com/apache/incubator-iotdb/pull/229], a new navigator 
> should be added on the left side, which is more convenient and also consistent 
> with the User Guide documents.





[jira] [Closed] (IOTDB-40) Supplement TsFile API document for users

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-40?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-40.


> Supplement TsFile API document for users
> 
>
> Key: IOTDB-40
> URL: https://issues.apache.org/jira/browse/IOTDB-40
> Project: Apache IoTDB
>  Issue Type: Task
>Reporter: xiangdong Huang
>Assignee: EJTTianyu
>Priority: Major
>
> There is no documentation for the TsFile API on the iotdb.apache.org website, and we 
> do need it. 
>  
> The older version can be found at [https://github.com/thulab/tsfile/wiki]
>  





[jira] [Closed] (IOTDB-253) Time expression

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-253.
-

> Time expression
> ---
>
> Key: IOTDB-253
> URL: https://issues.apache.org/jira/browse/IOTDB-253
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: suyue
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> In some real-time display systems, applications need to constantly query the 
> latest period of data, for example, the data for the last 5 minutes. 
> However, IoTDB currently does not support time operations based on now().
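
As a rough illustration of what the request amounts to, a client can resolve "the last 5 minutes" into absolute bounds itself. The sketch below is not IoTDB code: the class RelativeTimeRange, its methods, and the column name "time" are assumptions made for this example only.

```java
import java.time.Duration;

// Hypothetical helper, not part of IoTDB: resolves "the last N minutes"
// into absolute millisecond bounds that a client could put in a WHERE clause.
final class RelativeTimeRange {
    final long startMillis;
    final long endMillis;

    private RelativeTimeRange(long startMillis, long endMillis) {
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    // nowMillis is passed in so the computation is deterministic and testable.
    static RelativeTimeRange lastWindow(long nowMillis, Duration window) {
        return new RelativeTimeRange(nowMillis - window.toMillis(), nowMillis);
    }

    // Builds a filter fragment; the column name "time" is an assumption.
    String toWhereClause() {
        return "time >= " + startMillis + " and time <= " + endMillis;
    }
}
```

A caller would pass System.currentTimeMillis() as nowMillis. Support for now() on the server side would remove the need for this client-side step.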





[jira] [Closed] (IOTDB-158) iotdb metrics service

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-158.
-
Resolution: Fixed

> iotdb metrics service
> -
>
> Key: IOTDB-158
> URL: https://issues.apache.org/jira/browse/IOTDB-158
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: 穆喜军
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2019-08-05-16-36-57-434.png
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Can we add a web page like this?
> We can look at the execution history and time of the SQL
> !image-2019-08-05-16-36-57-434.png|width=613,height=318!
>  
>  
>  





[jira] [Closed] (IOTDB-136) Improve the new encoding method for irregular data

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-136.
-
Resolution: Fixed

> Improve the new encoding method for irregular data
> ---
>
> Key: IOTDB-136
> URL: https://issues.apache.org/jira/browse/IOTDB-136
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: Tsung-Han Tsai
>Priority: Minor
>
> The new encoding method has been implemented and gives a higher compression 
> ratio on regular data than the original one (DeltaBinaryEncoder).
> However, we still need to find a better way to encode irregular data in 
> order to improve its compression ratio.
> Some discussion about dealing with irregular data is in this mail: 
> [https://mail-archives.apache.org/mod_mbox/iotdb-dev/201907.mbox/%3CCAK7Y4CreCv2%2B6LBuGpURWMrx3C6PfPOGHnd3594qMq6KrYywZg%40mail.gmail.com%3E]





[jira] [Closed] (IOTDB-199) A simple tool to visualize logs

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-199.
-
Resolution: Fixed

> A simple tool to visualize logs
> ---
>
> Key: IOTDB-199
> URL: https://issues.apache.org/jira/browse/IOTDB-199
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: Tian Jiang
>Priority: Minor
>  Labels: pull-request-available
> Attachments: IoTDBRuntimReport.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Usually, the very first thing we do on finding a bug is to search the 
> logs. Logs play a vital role in debugging, especially in environments where 
> attaching a debugger is impossible. In such circumstances, logs may 
> be the only source of information for the developers.
>  
> A single log, which is just a string, is easy to understand. But 
> when it comes to mining information from thousands of logs or even more, 
> getting lost is nearly unavoidable, since humans have a much more limited memory 
> for exact details than computers. From time to time, I forget what I 
> have read and must go back to review the previous logs; as a result, 
> progress is very slow. Reading several strings is easy, but when we 
> have thousands, there must be some better way to present them than raw text.
>  
> So, I keep thinking it would be much better if we could turn the logs into 
> plots. Of course there must be some existing tools, but they are often powerful 
> but too heavy (like Kibana), or specialized for web or other logs (like 
> LogStalgia). Having a fantastic web interface is great, but a simple and 
> handy tool suits us better. What I want is something lightweight, stand-alone, 
> and highly customizable.
>  
> As a result, I developed a simple tool that can visualize (plot) logs 
> generated by IoTDB (with some modification, it can be applied to other types 
> of logs, too) and generate reports. I designed a simple GUI which provides 
> full functionality and a command line tool to quickly generate reports. The 
> attachment contains an example report I generated from one of my experiments, 
> which reveals interesting things like how the size of memtables converges over 
> time.
>  
> I may have missed some tools that are more powerful or easier to use. If you 
> know any, please inform me and I shall see what I can learn from them.





[jira] [Closed] (IOTDB-117) Add documentation about sync module

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-117.
-

> Add documentation about sync module
> ---
>
> Key: IOTDB-117
> URL: https://issues.apache.org/jira/browse/IOTDB-117
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Tianan Li
>Priority: Minor
>  Labels: pull-request-available, sync, tool
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, the sync tools have not yet been fully tested and are temporarily 
> unstable, so there is no public documentation. But it is necessary to have 
> documentation on how to use the tool. We hope that anyone who is 
> interested in this tool will test it, send us feedback, and help 
> improve it.





[jira] [Closed] (IOTDB-83) Add progress bar for import/export script

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-83.


> Add progress bar for import/export script
> 
>
> Key: IOTDB-83
> URL: https://issues.apache.org/jira/browse/IOTDB-83
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: XuYi
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Could you please add a progress bar for the import and export scripts in the 
> iotdb-cli folder? Importing and exporting large files takes a long time, and 
> users would like to know the progress.
>  
> By Sunny Wenhui Wu





[jira] [Closed] (IOTDB-66) choose suitable Reader/Writer automatically

2019-10-31 Thread xiangdong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-66?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

xiangdong Huang closed IOTDB-66.

Resolution: Fixed

If users do not know which one to choose, they can use TsFileRestorableReader and 
NativeRestorableIOWriter.

> choose suitable Reader/Writer automatically
> ---
>
> Key: IOTDB-66
> URL: https://issues.apache.org/jira/browse/IOTDB-66
> Project: Apache IoTDB
>  Issue Type: Task
>Reporter: xiangdong Huang
>Priority: Minor
>
> Now we have TsFileSequenceReader and TsFileWriter, which can read data from a 
> complete TsFile and write data to a new file.
> We also have TsFileRestorableReader and NativeRestorableIOWriter, for reading 
> data from an incomplete TsFile and writing data into an existing but incomplete 
> TsFile.
> A better way is not to make users choose which one to use, and just 
> give them the reader/writer that suits their file.
>  





[jira] [Closed] (IOTDB-267) Reduce IO during the process of deserializing chunk header

2019-10-31 Thread suyue (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

suyue closed IOTDB-267.
---
Resolution: Fixed

> Reduce IO during the process of deserializing chunk header
> --
>
> Key: IOTDB-267
> URL: https://issues.apache.org/jira/browse/IOTDB-267
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: suyue
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When executing a query, IoTDB needs to read ChunkMetaData first and then use 
> it to read ChunkHeader and ChunkData. Currently, deserializing the chunk 
> header requires two disk read operations: one reads the chunk header length, 
> the other reads the chunk header content. So we are considering reducing the IO 
> operations in chunk header deserialization.
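
To make the two-reads-versus-one idea concrete, here is a minimal sketch. It is not the TsFile reader: the layout (a 4-byte length followed by the payload), the class HeaderReadSketch, and the size estimate passed to readOneStep are all assumptions for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative only: a length-prefixed header read with two reads versus one.
// The layout (4-byte length, then payload) is an assumption for this sketch.
final class HeaderReadSketch {
    // Simulated disk that counts how many separate read calls it serves.
    static final class CountingSource {
        private final byte[] data;
        int reads = 0;

        CountingSource(byte[] data) {
            this.data = data;
        }

        ByteBuffer read(int offset, int length) {
            reads++;
            int end = Math.min(data.length, offset + length);
            return ByteBuffer.wrap(data, offset, end - offset).slice();
        }
    }

    // Two reads: first the length prefix, then the payload.
    static String readTwoStep(CountingSource src, int offset) {
        int len = src.read(offset, Integer.BYTES).getInt();
        ByteBuffer body = src.read(offset + Integer.BYTES, len);
        return decode(body, len);
    }

    // One read: fetch an estimated upper bound and parse both parts from memory.
    // A second read happens only if the estimate turns out to be too small.
    static String readOneStep(CountingSource src, int offset, int estimatedSize) {
        ByteBuffer buf = src.read(offset, Integer.BYTES + estimatedSize);
        int len = buf.getInt();
        if (buf.remaining() < len) {
            buf = src.read(offset + Integer.BYTES, len);
        }
        return decode(buf, len);
    }

    private static String decode(ByteBuffer buf, int len) {
        byte[] bytes = new byte[len];
        buf.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Writes the assumed layout: 4-byte big-endian length, then UTF-8 payload.
    static byte[] encode(String header) {
        byte[] payload = header.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }
}
```

The trade-off is that the single read may fetch a few bytes more than needed, which is usually cheaper than a second disk access.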





[jira] [Closed] (IOTDB-208) Add bloom filters to TsFile

2019-10-31 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-208.


> Add bloom filters to TsFile
> ---
>
> Key: IOTDB-208
> URL: https://issues.apache.org/jira/browse/IOTDB-208
> Project: Apache IoTDB
>  Issue Type: New Feature
>Reporter: Tian Jiang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.9.0-SNAPSHOT
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> My recent reading reminded me that the bloom filter is standard equipment in 
> key-value databases. Although IoTDB is not one of them (at least not typically), the bloom 
> filter still helps a lot in various situations. For example, our recent 
> experiments gave us the illusion that the set of time series in a storage group 
> remains unchanged. However, that is not the case.
> Naturally, in real situations, the number of time series grows over time, for 
> reasons like adding new equipment. The old files do not contain the newly added time 
> series. Without the help of bloom filters, we have to check each old file 
> only to find that there is no such time series. To my knowledge, this may 
> take a lot of time.
> So, I suggest we add a bloom filter (or some more efficient structure) to each 
> TsFile to help skip unwanted files.
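
For readers unfamiliar with the structure, a minimal Bloom filter can be sketched as follows. The class name SeriesBloomFilter, the bit-array size, and the hash mixing are assumptions for illustration, not the parameters or hash functions TsFile uses.

```java
import java.util.BitSet;

// A minimal Bloom filter sketch; size and hash mixing are illustrative
// assumptions, not the parameters or hash functions used by TsFile.
final class SeriesBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    SeriesBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    void add(String seriesPath) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(seriesPath, i));
        }
    }

    // false => the series is definitely absent, so the file can be skipped.
    // true  => the series may be present (false positives are possible).
    boolean mightContain(String seriesPath) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(seriesPath, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: derive the i-th index from two base hashes.
    private int index(String key, int i) {
        int h1 = key.hashCode();
        int h2 = (h1 >>> 16) | 1; // forced odd so successive probes differ
        return Math.floorMod(h1 + i * h2, size);
    }
}
```

A reader would build one filter per file from the series it contains and call mightContain before opening the file. A false answer is always safe to trust, while a true answer still requires checking the file.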





[jira] [Closed] (IOTDB-113) Use intern string to reduce memory usage

2019-10-31 Thread Jialin Qiao (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jialin Qiao closed IOTDB-113.
-
Fix Version/s: 0.9.0
   Resolution: Fixed

> Use intern string to reduce memory usage
> 
>
> Key: IOTDB-113
> URL: https://issues.apache.org/jira/browse/IOTDB-113
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Jialin Qiao
>Priority: Minor
> Fix For: 0.9.0
>
>
> Each time series is represented by a device id and a measurement name, which 
> are two strings. The memory usage may be large when there are many time 
> series.
> One possible improvement is using String.intern(), which is designed to 
> reduce memory usage and improve performance.
> A blog about String.intern() is here (in Chinese):
> https://blog.csdn.net/SEU_Calvin/article/details/52291082
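
The effect of String.intern() can be seen in a small sketch. The class InternSketch and its helpers are hypothetical and exist only to show that equal strings built at runtime are distinct objects until they are interned.

```java
// Illustrative only: shows that String.intern() returns one canonical instance
// for equal strings, so many series can share a single copy of each name.
final class InternSketch {
    // Builds a fresh String object at runtime (not a compile-time constant).
    static String fresh(String prefix, int id) {
        return new StringBuilder(prefix).append(id).toString();
    }

    static boolean sameInstance(String a, String b) {
        return a == b; // reference comparison on purpose
    }
}
```

Two calls to fresh("root.sg.d", 1) give equal but separate objects, while their intern() results are the same instance, so only one copy needs to stay reachable.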





[jira] [Closed] (IOTDB-181) Remove "first_value" and "last_value" in TsDigest

2019-10-31 Thread Jialin Qiao (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jialin Qiao closed IOTDB-181.
-
Resolution: Later

> Remove "first_value" and "last_value" in TsDigest
> -
>
> Key: IOTDB-181
> URL: https://issues.apache.org/jira/browse/IOTDB-181
> Project: Apache IoTDB
>  Issue Type: Improvement
>Reporter: Jialin Qiao
>Priority: Minor
>
> In TsDigest, we have five statistics: min_value, max_value, sum_value, 
> first_value, last_value.
> When filtering pages or chunks, we use min_value and max_value for a filter 
> like "s1>10" or "s1<20".
> The sum value can be used in an aggregation query like "select sum(s1) from root...".
> However, I cannot come up with a scenario that uses 
> "first_value" and "last_value" to filter data, and they are never used at 
> all...
> So why not remove them?
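
How min_value, max_value, and sum_value are used can be sketched as below. ChunkStats is a hypothetical stand-in whose field names follow the issue text; it is not the TsDigest class.

```java
// Hypothetical stand-in for per-chunk statistics; field names mirror the
// issue text, but this is not the TsDigest class itself.
final class ChunkStats {
    final double minValue;
    final double maxValue;
    final double sumValue;

    ChunkStats(double minValue, double maxValue, double sumValue) {
        this.minValue = minValue;
        this.maxValue = maxValue;
        this.sumValue = sumValue;
    }

    // For a filter "value > threshold": nothing can match if max <= threshold.
    boolean canSkipForGreaterThan(double threshold) {
        return maxValue <= threshold;
    }

    // For a filter "value < threshold": nothing can match if min >= threshold.
    boolean canSkipForLessThan(double threshold) {
        return minValue >= threshold;
    }

    // sum(s1) over whole chunks can be answered from the statistics alone.
    static double totalSum(ChunkStats... chunks) {
        double total = 0;
        for (ChunkStats c : chunks) {
            total += c.sumValue;
        }
        return total;
    }
}
```

Neither check has an obvious counterpart that needs the first or last value of a chunk, which is the point the issue raises.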





[jira] [Closed] (IOTDB-262) CachedPriortityMergeReader fails to deduplicate some elements

2019-10-31 Thread Tian Jiang (Jira)


 [ 
https://issues.apache.org/jira/browse/IOTDB-262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tian Jiang closed IOTDB-262.


> CachedPriortityMergeReader fails to deduplicate some elements
> -
>
> Key: IOTDB-262
> URL: https://issues.apache.org/jira/browse/IOTDB-262
> Project: Apache IoTDB
>  Issue Type: Bug
>Reporter: Tian Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0-SNAPSHOT
>
> Attachments: duplicated (1).png
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> CachedPriortityMergeReader fails to deduplicate the element at the end of the 
> cache. The picture in the attachment explains this.
> I plan to record the last timestamp to help to deduplicate such elements.
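
The planned fix can be illustrated with a simplified sketch. This is not the reader's actual code: LastTimestampDedup is a hypothetical class, and it works on bare timestamps in time-sorted batches to show why remembering the last emitted timestamp catches a duplicate that straddles two batches.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of the proposed fix, not the reader's real code:
// remember the last emitted timestamp so that a duplicate arriving in a
// later batch is still dropped.
final class LastTimestampDedup {
    private long lastTimestamp = Long.MIN_VALUE;
    private boolean emittedAny = false;

    // Each batch is sorted by time; returns the timestamps that survive.
    List<Long> filter(long[] batch) {
        List<Long> out = new ArrayList<>();
        for (long t : batch) {
            if (emittedAny && t == lastTimestamp) {
                continue; // duplicate of the previous element, possibly from an earlier batch
            }
            out.add(t);
            lastTimestamp = t;
            emittedAny = true;
        }
        return out;
    }
}
```

Because the state lives across calls, a batch that starts with the timestamp the previous batch ended on does not emit it again.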





Is there anything that needs to be developed now?

2019-10-31 Thread Jack Tsai
Hi,

I recently finished the function for querying timeseries and devices. I am 
glad to see that IoTDB keeps publishing new releases these days.

Is there anything I could help with now? Maybe implementing some functions that are 
needed for the next release.

Best regards,
Jack Tsai


Re: [DISCUSS] Release 0.9.0

2019-10-31 Thread Jialin Qiao
Hi Tianan Li and Lei Rui,

I will update the change list according to your feedback.

Thanks,
--
Jialin Qiao
School of Software, Tsinghua University

乔嘉林
清华大学 软件学院

> -----Original Message-----
> From: "Lei Rui" 
> Sent: 2019-10-30 14:30:41 (Wednesday)
> To: "dev@iotdb.apache.org" 
> Cc: 
> Subject: Re:  [DISCUSS] Release 0.9.0
> 
> Correction: there are two merged PRs linked to "[IOTDB-165][TsFile] Fix a example". 
> One is "[IOTDB-165][TsFile] Fix a example", and the other is 
> "[IOTDB-165][TsFile] Delete a current version and add a number version and an 
> exception". Actually, the IOTDB-165 JIRA title is "compatibility 
> consideration", not "Fix a example".
> 
> 
> On 10/30/2019 14:22,Lei Rui wrote:
> You should remove changes already listed in former versions. For example, 
> "IOTDB-172 fix a bug of TsFileResource updateTime" has already been listed in 
> 0.8.1.
> 
> 
> Besides, "[IOTDB-165][TsFile] Fix a example" does not only fix a bug but 
> also changes the TsFile structure,
> which is unfortunately reflected neither in the JIRA issue title nor in the PR 
> name.
> From this example, we can also see that it is necessary to name our PRs as 
> accurately as possible.
> And the release manager had better double-check whether there is any inconsistency between 
> the JIRA issue and the linked PR.
> 
> 
> Lei Rui
> On 10/29/2019 17:11,Jialin Qiao wrote:
> Hi,
> 
> 
> As 0.8.1 has been released, the release of 0.9.0 can start.
> 
> 
> I would like to be the release manager for 0.9.0, and I have gathered the change 
> list:
> 
> 
> 
> 
> ## New Features
> 
> 
> * IOTDB-143 Compaction of data file
> * IOTDB-205 Support storage-group-level Time To Live (TTL)
> * IOTDB-226 Hive connect TsFile
> * IOTDB-188 Delete storage group
> * IOTDB-253 support time expression
> * IOTDB-239 Add interface for showing devices
> * IOTDB-249 enable lowercase in create_timeseries sql
> * IOTDB-203 add "group by device" function for narrow table display
> * IOTDB-193 Create schema automatically when inserting
> * IOTDB-241 Add query and non query interface in session
> * IOTDB-223 Add a TsFile sketch tool
> * IOTDB-158 add metrics web service
> * IOTDB-234 Refactor TsFile storage on HDFS
> * IOTDB-199 Add a log visualization tool
> * IOTDB-174 Add interfaces for querying device or timeseries number
> * IOTDB-173 add batch write interface in session
> * IOTDB-151 support number format in timeseries path
> * Spark-iotdb-connector
> * generate cpp, go, and python thrift files under service-rpc
> * display cache hit rate through jconsole
> * support time < 0: Fix initial value of minTimestamp to Long.MIN_VALUE in 
> ChunkBuffer
> * Add interface (Delete timeseries) in session
> * Add a tool to print tsfileResources (each device's start and end time)
> * Support watermark feature
> * Add micro and nano timestamp precision
> 
> 
> ## Incompatible changes
> 
> 
> * RPC is incompatible: you cannot use client-0.8.0 to connect to 
> server-0.9.0 or use client-0.9.0 to connect to server-0.8.0.
> * Server is backward compatible: server-0.9.0 can run on a data folder from 
> 0.8.0. The data files will be upgraded in the background.
> 
> 
> https://github.com/apache/incubator-iotdb/pull/467
> 
> 
> ## Miscellaneous changes
> 
> 
> * IOTDB-258 Add documents for Query History Visualization Tool and Shared 
> Storage Architecture
> * IOTDB-233 keep metadata plan clear
> * IOTDB-267 reduce IO operations in deserializing chunk header
> * IOTDB-265 Re-adjust the threshold size of memtable
> * IOTDB-251 improve TSQueryDataSet structure in RPC
> * IOTDB-221 Add a python client example
> * IOTDB-180 Get rid of JSON format in "show timeseries"
> * IOTDB-161 Add ErrorCode of different response errors
> * IOTDB-160 External sort
> * IOTDB-153 further limit fetchSize to speed up LIMIT query
> 
> 
> * reconstruct antlrv3 grammar to improve performance
> * Tooling for release
> * Modified Decoder and SequenceReader to support old version of TsFile
> * Remove the JDK constraint of JDK 8 and 11
> 
> 
> ## Known Issues
> 
> 
> * IOTDB-20 Need to support UPDATE
> 
> 
> ## Bug Fixes
> 
> 
> * IOTDB-266 NullPointerException when reading non-existent devices using 
> ReadOnlyTsFile
> * IOTDB-264 restart failure due to WAL replay error
> * IOTDB-165 [TsFile] Fix a example
> * IOTDB-240 fix unknown time series in where clause
> * IOTDB-244 fix bug when querying with duplicated columns
> * IOTDB-174 Fix querying timeseries interface cannot make a query by the 
> specified path prefix
> * IOTDB-195 using String.getBytes(utf-8).length to replace string.length() in 
> ChunkGroupMetadata for supporting Chinese
> * IOTDB-211 use "%IOTDB_HOME%\lib\*" to refer to all .jar files in the 
> directory in start-server.bat
> * IOTDB-172 fix a bug of TsFileResource updateTime
> * fix start-walchecker scripts for letting user define the wal folder
> 
> 
> Is anything missing?
> 
> 
> 
> 
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
> 
> 乔嘉林
> 清华大学 软件学院


Re: [DISCUSS][FEATURE] Rich Datatypes API

2019-10-31 Thread Julian Feinauer
Hi,

I agree with your interpretation. It's just another layer with a different 
interpretation.
So the idea would be to provide a separate API initially to experiment a bit 
and probably add it to the "core" API eventually,
so that type resolution always checks whether the type is primitive or 
logical.

I mainly wanted to get your ideas and feedback on that, and to hear whether you can 
imagine use cases for it.
We would need something like "NaN" quite often in our use cases, and I would 
also like to use a "string" mapping for "ON/OFF" rather than true/false, as it 
makes it easier to interpret the data later on.

Julian
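
As a rough sketch of the "interpretation layer" idea from this thread, the scaling formula a * x + b can sit on top of a stored integer series. The class LinearScale and its method names are hypothetical; a = 1/100 and b = 3 are the values from the voltage example quoted below.

```java
// Sketch of a logical type layered on a primitive integer series.
// Class and method names are hypothetical; a = 1/100 and b = 3 are the
// values used in the voltage example of this thread.
final class LinearScale {
    private final double a;
    private final double b;

    LinearScale(double a, double b) {
        this.a = a;
        this.b = b;
    }

    // Physical value -> stored integer, rounded to the nearest step.
    int encode(double physical) {
        return (int) Math.round((physical - b) / a);
    }

    // Stored integer -> physical value.
    double decode(int stored) {
        return a * stored + b;
    }
}
```

With a = 1/100 and b = 3, a reading of 3 V is stored as 0, 3.01 V as 1, and 4.2 V as 120, so only small integers reach the file and the formula is applied on reading.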

On 31.10.19, 05:39, "Xiangdong Huang" wrote:

Hi,

> You can look at how avro handles non primitive types (they call it
> LogicalTypes) here:
> https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types

Yes, I read some materials about LogicalTypes. It looks like a nickname for
a data type, with some new interpretation. E.g., a byte array data type can
be called a Decimal, while the interpretation relies on how the user defines
the precision and scale.

Using this kind of implementation is also OK, I think.

So, would you like to provide the interface in the IoTDB layer (so users
operate on data with SQL), or on top of the TsFile layer (so users operate on
data with the TsFile API)?

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Julian Feinauer wrote on Wed, Oct 30, 2019 at 5:59 PM:

> Hi,
>
> in fact, in the MDF spec it is mostly not for compression (that’s a nice
> side effect) but rather for being able to really express the (physical)
> content of a signal.
> So my initial idea was to implement it as an optional layer on top of the
> current tsfile which does the "interpretation", because in the tsfile it's
> always just a "primitive" series that is stored.
>
> So the idea would be to store some metadata (like a formula, lookup table,
> ...) on creation and use that on reading, but only optionally.
> You can look at how avro handles non-primitive types (they call it
> LogicalTypes) here:
> https://avro.apache.org/docs/1.8.1/spec.html#Logical+Types
> This is similar to my idea.
>
> Julian
>
> On 29.10.19, 14:40, "Xiangdong Huang" wrote:
>
> Hi,
>
> > Then it's most efficient to store integers and a formula like a * x + b
> > with e.g. b = 3 and a = 1/100.
> > So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> > So we only store 0 to 120 and no decimals and stuff, which would be very
> > easily compressible, I think.
>
> Good idea! Two thumbs up for that.
>
> But for cases like the above, implementing a new encoding method is better
> than a new data type.
>
> E.g., create time series root.a.b.voltage with encoding =
> linear_transformation, encoding_parameter = "describe the function like
> y=a * x + b", and datatype = INT.
>
> "linear_transformation" is the new encoding method.
>
> Now I get two cases from the discussion: one is like Optional data, and the
> other is data that can be transformed.
> So, do we want to support the above two, or find a more general data type
> for "rich data type" (can the MDF file provide some inspiration)?
>
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> Julian Feinauer wrote on Tue, Oct 29, 2019 at 8:26 PM:
>
> > Hi Xiangdong,
> >
> > to your second question:
> > The use case is the other way round.
> > We know that we measure e.g. a voltage between 3V and 4.2V with a
> > precision of 0.01 or something.
> > Then it's most efficient to store integers and a formula like a * x + b
> > with e.g. b = 3 and a = 1/100.
> > So 3V would be stored as x = 0, 3.01V -> x = 1, ... 4.2V as x = 120.
> > So we only store 0 to 120 and no decimals and stuff, which would be very
> > easily compressible, I think.
> >
> > Julian
> >
> > On 29.10.19, 07:13, "Xiangdong Huang" wrote:
> >
> > Hi,
> >
> > > In Java we could model it as a variable Optional<> x which could be
> > > null, Optional.empty(), Optional.of(true), Optional.of(false).
> >
> > It makes sense. And using a new data type to achieve it in IoTDB is OK.
> >
> > > Or scale formulas like a*x+b which allows to leverage the
> precision
> > even
> >