Re: Losing privileges on Travis

2019-11-04 Thread
Hi, 

It turns out that if you change your browser (from Chrome to Safari in my 
case), things go back to normal. I don’t know whether to blame the browser or 
Travis, but you can try another browser if the same problem occurs.

Best,
Tian Jiang

> On Oct 22, 2019, at 12:50 PM, Tian Jiang wrote:
> 
> Hi,
> 
> As you can see from the picture, there was no restart button at all. However, 
> that was yesterday, and I just found that I can restart now. I am not sure 
> whether it was a bug in Travis or something else; I just hope it won't happen again.
> 
> Best,
> Tian Jiang
> 
> On 10/22/2019 12:28, Xiangdong Huang <saint...@gmail.com> wrote: 
> Hi Tian Jiang,
> 
> I do not think so. It seems that Travis changed its UI.
> 
> So I also cannot restart a single job in the job list view of a build.
> But if you click a job to see its details, you will find a button to restart
> the job.
> 
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
> 黄向东
> 清华大学 软件学院
> 
> 
> Tian Jiang wrote on Mon, Oct 21, 2019 at 2:25 PM:
> 
> Hi,
> 
> 
> Well, things are getting a little better. I can restart a build now, but I
> still can't restart a single job.
> 
> 
> Tian Jiang
> 
> 
> On 10/21/2019 13:20, Xiangdong Huang wrote:
> Hi,
> 
> Ah? I think I still have the privilege...
> 
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
> 黄向东
> 清华大学 软件学院
> 
> 
> 江天 (Tian Jiang) wrote on Mon, Oct 21, 2019 at 5:54 AM:
> 
> Hi,
> 
> I just found that I have lost some privileges on Travis; that is, I cannot
> restart any builds or jobs.
> 
> Has anyone encountered the same problem, or does anyone have a solution?
> 
> Thanks,
> 
> Tian Jiang
> 
> 
> 



Losing privileges on Travis

2019-10-20 Thread
Hi, 

I just found that I have lost some privileges on Travis; that is, I cannot restart 
any builds or jobs.

Has anyone encountered the same problem, or does anyone have a solution?

Thanks,

Tian Jiang


Re: consider introduce error code in client-side

2019-08-13 Thread
Well, I prefer defining new Exception classes over ugly error codes; it makes 
better use of OOP. 
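
To illustrate, a minimal sketch of the exception-class approach; none of these classes exist in IoTDB yet, so all names here are hypothetical:

```java
// Hypothetical exception hierarchy; these classes do not exist in IoTDB yet.
class IoTDBException extends Exception {
  IoTDBException(String message) {
    super(message);
  }
}

class TimeseriesNotExistException extends IoTDBException {
  TimeseriesNotExistException(String path) {
    super("Timeseries " + path + " does not exist");
  }
}

class Caller {
  // writeData() and registerTimeSeries() stand in for real client calls.
  void insertWithAutoRegister() throws IoTDBException {
    try {
      writeData();
    } catch (TimeseriesNotExistException e) {
      // No string matching on the message; the exception type tells us what went wrong.
      registerTimeSeries();
      writeData();
    }
  }

  void writeData() throws IoTDBException {}
  void registerTimeSeries() {}
}
```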

Tian Jiang

> On Aug 13, 2019, at 4:00 PM, Xiangdong Huang wrote:
> 
> Hi,
> 
> When using IoTDB-JDBC to operate on data, I realized that it is not convenient
> without error codes.
> 
> A typical application is registering a new time series automatically when the
> received data belongs to a time series that does not exist yet. Currently, I have
> to check whether the error message contains the keyword "exist"...
> 
> I list 4 useful error codes according to my experience of using IoTDB, see
> https://issues.apache.org/jira/browse/IOTDB-161
> 
> It is an easy task compared to other issues. Knowing how to use Thrift
> will help in finishing the feature.
> 
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
> 黄向东
> 清华大学 软件学院




Re: [jira] [Created] (IOTDB-161) ERROR code is needed.

2019-08-13 Thread
Well, I prefer defining new Exception classes over ugly error codes; it makes 
better use of OOP.

Tian Jiang

> On Aug 13, 2019, at 3:45 PM, xiangdong Huang (JIRA) wrote:
> 
> xiangdong Huang created IOTDB-161:
> -
> 
> Summary: ERROR code is needed.
> Key: IOTDB-161
> URL: https://issues.apache.org/jira/browse/IOTDB-161
> Project: Apache IoTDB
>  Issue Type: Task
>Reporter: xiangdong Huang
> 
> 
> Now I think it is time to introduce error codes.
> 
> For example, as IoTDB requires registering the time series before 
> writing data, one kind of solution is:
> 
> ```
> try {
>   writeData();
> } catch (SQLException e) {
>   // The most common case is that the time series does not exist,
>   // but checking the content of the error message is not efficient.
>   if (e.getMessage().contains("exist")) {
>     registerTimeSeries();
>     // write the data once again
>     writeData();
>   }
> }
> ```
> 
> If we have error codes, then we do not need to write ugly code like 
> `if (e.getMessage().contains("exist"))` any more.
> 
>  
> 
> Some needed error codes that I can think of include:
> * create time series failed because there is no related storage group for the 
> given time series;
> * insert/query failed because the time series does not exist;
> * insert/query failed because the value format is incorrect;
> * SQL parse error.
> 
> 
> 
> --
> This message was sent by Atlassian JIRA
> (v7.6.14#76016)




Re: Binary Release of IoTDB

2019-07-19 Thread
Hi,

Structure 4 looks cool; I shall stand by it. (I am not forced to say that.)

Tian Jiang

> On Jul 19, 2019, at 2:40 PM, Julian Feinauer wrote:
> 
> Hi,
> 
> I like it, but I would perhaps rename bin to sbin or scripts.
> But I am also fine with it as it is.
> 
> I think it's excellent that the community adapts more and more to the Apache 
> way. Good job everyone!
> 
> Julian
> 
> Sent from my mobile phone
> 
> 
>  Original message 
> Subject: Re: Binary Release of IoTDB
> From: Jialin Qiao
> To: dev@iotdb.apache.org
> Cc:
> 
> Hi,
> 
> Nothing good comes easily :)
> 
> After reorganizing the tools, deduplicating the scripts and removing the 
> grafana-related folders, here comes structure 4:
> 
> (Structure 4):
> .
> ├──  LICENSE
> ├──  NOTICE
> ├──  RELEASE_NOTES
> │
> ├──  bin
> │  ├──  start-client.bat
> │  ├──  start-client.sh
> │  ├──  start-server.bat
> │  ├──  start-server.sh
> │  ├──  stop-server.bat
> │  └──  stop-server.sh
> │
> ├──  conf
> │  ├──  iotdb-engine.properties
> │  ├──  iotdb-env.bat
> │  ├──  iotdb-env.sh
> │  ├──  iotdb-sync-client.properties
> │  ├──  logback.xml
> │  └──  tsfile-format.properties
> │
> ├──  lib
> │  └──  *.jar
> │
> ├──  licenses
> │  └──  *.license
> │
> └──  tools
>   ├──  export-csv.bat
>   ├──  export-csv.sh
>   ├──  import-csv.bat
>   ├──  import-csv.sh
>   ├──  start-WalChecker.bat
>   ├──  start-WalChecker.sh
>   ├──  memory-tool.bat
>   ├──  memory-tool.sh
>   ├──  start-sync-client.bat
>   ├──  start-sync-client.sh
>   ├──  stop-sync-client.bat
>   └──  stop-sync-client.sh
> 
> 
> Thanks,
> --
> Jialin Qiao
> School of Software, Tsinghua University
> 
> 乔嘉林
> 清华大学 软件学院
> 
>> -Original Message-
>> From: "Xiangdong Huang" 
>> Sent: 2019-07-18 20:13:52 (Thursday)
>> To: dev@iotdb.apache.org
>> Cc:
>> Subject: Re: Binary Release of IoTDB
>> 
>> Hi all,
>> 
>> So we have reached a consensus :).
>> 
>> Best,
>> ---
>> Xiangdong Huang
>> School of Software, Tsinghua University
>> 
>> 黄向东
>> 清华大学 软件学院
>> 
>> 
>> Julian Feinauer wrote on Thu, Jul 18, 2019 at 4:55 PM:
>> 
>>> I also agree with structure 3.
>>> 
>>> On 18.07.19, 10:39, "Xiangdong Huang" wrote:
>>> 
>>>+1 for structure 3.
>>> 
>>> But it needs some work to fix the current scripts (both the start-*.sh
>>> scripts and the maven package plugin)...
>>> 
>>> By the way, I find that now, when we are using `mvn package` with the
>>> "maven-dependency-plugin:copy-dependencies" plugin, the jars whose
>>> scope should be "test" are copied into the lib folder... (e.g.,
>>> powermock-*.jar)
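
For reference, a possible fix (untested sketch): the copy-dependencies goal supports an includeScope parameter, and limiting it to the runtime scope should keep test-only jars such as powermock out of the lib folder. Where exactly this execution sits in the distribution pom is an assumption:

```xml
<!-- Sketch only: a copy-dependencies execution limited to runtime scope. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <id>copy-dependencies</id>
      <phase>package</phase>
      <goals>
        <goal>copy-dependencies</goal>
      </goals>
      <configuration>
        <outputDirectory>${project.build.directory}/lib</outputDirectory>
        <!-- "runtime" covers compile+runtime scopes and excludes test/provided -->
        <includeScope>runtime</includeScope>
      </configuration>
    </execution>
  </executions>
</plugin>
```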
>>> 
>>>Best,
>>>---
>>>Xiangdong Huang
>>>School of Software, Tsinghua University
>>> 
>>> 黄向东
>>>清华大学 软件学院
>>> 
>>> 
>>> Jialin Qiao wrote on Thu, Jul 18, 2019 at 11:38 AM:
>>> 
 Hi,
 
 I think structure 3 is better than 1 and 2.
 
 However, what makes the bin folder a mess is the tool scripts.
 
 Could we move the WalChecker and sync-related scripts from "server" to a
 folder named "tools", and also the csv import/export scripts from "client"
 to "tools"?
 
 Besides, it's better to combine "run-client.bat" with "start-client.bat".
 
 Best
 --
 Jialin Qiao
 School of Software, Tsinghua University
 
 乔嘉林
 清华大学 软件学院
 
> -Original Message-
> From: "RUI, LEI" <1010953...@qq.com>
> Sent: 2019-07-18 11:24:50 (Thursday)
> To: dev 
> Cc:
> Subject: Re: Binary Release of IoTDB
> 
> Hi, I'm here to suggest another structure like this :)
> 
> 
> (Structure 3):
> .
> ├── LICENSE
> ├── NOTICE
> ├── changes.txt
> │
> ├── bin
> │   ├── client
> │   │   ├── export-csv.bat
> │   │   ├── export-csv.sh
> │   │   ├── import-csv.bat
> │   │   ├── import-csv.sh
> │   │   ├── run-client.bat
> │   │   ├── start-client.bat
> │   │   └── start-client.sh
> │   └── server
> │├── start-WalChecker.bat
> │├── start-WalChecker.sh
> │├── start-server.bat
> │├── start-server.sh
> │├── start-sync-client.bat
> │├── start-sync-client.sh
> │├── stop-server.bat
> │├── stop-server.sh
> │├── stop-sync-client.bat
> │└── stop-sync-client.sh
> │
> ├── conf
> │   ├── error_info_cn.properties
> │   ├── error_info_en.properties
> │   ├── iotdb-engine.properties
> │   ├── iotdb-env.bat
> │   ├── iotdb-env.sh
> │   ├── iotdb-sync-client.properties
> │   ├── logback.xml
> │   └── tsfile-format.properties
> │
> ├──  lib
> │   ├── client
> │   │   └── *.jar
> │   ├── server
> │   │   └── *.jar
> │   └── common
> │       └── *.jar
> │
> ├── licenses
> │   └── LICENCES
> │
> └── grafana-connector
> ├── bin
> │   ├── 

Re: Binary Release of IoTDB

2019-07-17 Thread
Hi,

I prefer the second. I feel frustrated when I open the bin folder only to 
find everything crowded into it. With the names of the scripts largely alike, 
searching for a specific one may take more time. And I really don't think a few 
duplicated libraries matter. We are in the 2010s; tens of MBs do not count.

I also think we can use a customizable download link to provide multiple choices 
for the users, so they may choose which extension binaries they want.

Thanks

Tian Jiang


Re: About the design and development of merge.

2019-07-17 Thread
Hi,

The merge is functional on branch dev_merge, and the related PR is at 
https://github.com/apache/incubator-iotdb/pull/258. If you are interested, 
feel free to give any advice or comments.

Thanks

Tian Jiang

Re: What Is a Good Git Workflow?

2019-07-11 Thread
I prefer not to squash, too. In my opinion, commit logs give the reasons for 
everything you have done (even when you just fixed a typo), so it does not make 
sense to delete any of them. 

If someone wants a straightforward summary of a branch or PR, I would 
suggest the committer(s) summarize it on the mailing list or on Confluence. I think 
this helps those who wish to learn about the commits better than searching the 
commit history.

Well, actually I doubt anyone would look into the commit logs carefully, since 
they are seldom easy to understand if you are not the committer.
Tian Jiang


Re: About the design and development of merge.

2019-07-08 Thread
A1: Yes. However, in IoTDB queries the order of chunks is not that important, 
as a simple sort by startTime can make them ordered again once they are loaded 
into memory.

A2: Yes, but not necessarily. We handle time series one by one to minimize the 
memory burden, but if memory is abundant, we can handle several time series at 
the same time. Nevertheless, handling multiple time series at the same time does 
not seem to bring any significant advantage. Eventually all data in the 
unsequential file is merged, so no marks need to be left. For the case of system 
failure, I have also designed a merge log for fast recovery, which I will explain 
in the next mail.
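
To make A1 concrete, a minimal sketch of restoring time order after loading; the ChunkMetaData type here is a stand-in for whatever the query reader actually loads:

```java
import java.util.Comparator;
import java.util.List;

class ChunkOrderRestorer {
  // Stand-in for the real chunk metadata type loaded by the query reader.
  interface ChunkMetaData {
    long getStartTime();
  }

  static void restoreTimeOrder(List<ChunkMetaData> chunks) {
    // A merged file may hold chunks out of start-time order; one in-memory
    // sort per series restores the order the query engine expects.
    chunks.sort(Comparator.comparingLong(ChunkMetaData::getStartTime));
  }
}
```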

> On Jul 8, 2019, at 4:10 PM, Xiangdong Huang wrote:
> 
> Hi,
> 
> Q1: You change the order of Chunks in a TsFile. Does that break the 
> time-ordering characteristic?
> 
> Q2: Do you mean handling time series one by one (an unseq file may consist 
> of many devices and measurements)? Do we need to make some marks on an unseq 
> file if only the data of a part of the devices is merged?
> 
> Best.
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
>  黄向东
> 清华大学 软件学院
> 
> 
> 江天 (Tian Jiang) <jt2594...@163.com> wrote on Mon, Jul 8, 2019 at 3:49 PM:
> I have worked out a simple solution for merge. The following picture explains 
> the basic procedure of a merge; if you cannot see the picture, please visit 
> http://assets.processon.com/chart_image/5d22a081e4b0878e40a8a4ae.png. The 
> procedure consists of 6 steps:
> 1. Build a MergeReader over the unsequential files.
> 2. Read data from the MergeReader in time-ascending order.
> 3. For each unsequential data point, find its corresponding chunk in the 
> sequential file.
> 4. Merge unsequential data points with their corresponding sequential chunk to 
> generate a new chunk in a new file.
> 5.1 If unmerged chunks are the minority, append the unmerged chunks to the new 
> file and generate metadata for it.
> 5.2 If merged chunks are the minority, remove the metadata in the old file, 
> append the merged chunks to the old file and re-generate metadata for it, 
> ignoring the chunks that have been merged in the old file.
> 6. Use the new file to replace the old file and remove the useless files.
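
For readers skimming the thread, a pseudocode-level sketch of the 6-step procedure quoted above; every type and method in it is a hypothetical stand-in, and step 1 (building the MergeReader) and the step 5.2 variant are omitted for brevity:

```java
import java.io.IOException;
import java.util.List;

// Pseudocode-level sketch of the quoted 6-step merge; every type below is a
// hypothetical stand-in, not the real IoTDB merge code.
class MergeSketch {
  interface DataPoint { long getTime(); }
  interface Chunk { long getEndTime(); }
  interface SeqFile { List<Chunk> chunks(); }
  interface MergeReader { List<DataPoint> pollUpTo(long time); }  // step 2: time-ascending
  interface NewFileWriter {
    void appendUnmodified(Chunk chunk);
    void writeMergedChunk(Chunk chunk, List<DataPoint> unseqPoints);
    void sealAndReplace(SeqFile oldFile) throws IOException;
  }

  void mergeOneSeries(MergeReader unseqReader, SeqFile seqFile, NewFileWriter newFile)
      throws IOException {
    for (Chunk seqChunk : seqFile.chunks()) {
      // Steps 2-3: unsequential points up to this chunk's end time overlap it.
      List<DataPoint> overlap = unseqReader.pollUpTo(seqChunk.getEndTime());
      if (overlap.isEmpty()) {
        newFile.appendUnmodified(seqChunk);          // step 5.1: keep unmerged chunk
      } else {
        newFile.writeMergedChunk(seqChunk, overlap); // step 4: rewrite merged chunk
      }
    }
    newFile.sealAndReplace(seqFile);                 // step 6: swap files, drop leftovers
  }
}
```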



About the design and development of merge.

2019-07-08 Thread
Greetings,

Although the new storage engine is online, some old functionalities have been lost, 
for example, merge. As you may know, the current storage engine consists of 
two kinds of data files: sequential files and unsequential files. A data point 
with a specified timestamp, say t0, may occur zero or one time in the sequential 
files and zero or many times in the unsequential files. During a query, all data with 
the same timestamp are read from the files and only the newest version (the 
latest inserted one) is returned as a result. This means that the out-dated 
data in old files may degrade query performance. Besides, keeping data 
in different files incurs more disk seeks in a query, which significantly hurts 
performance.
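
To illustrate the version rule just described, a minimal sketch of "latest version wins" deduplication at query time; TimeValuePair and its accessors are stand-ins rather than the real reader API:

```java
import java.util.Map;
import java.util.TreeMap;

class QueryDedupSketch {
  // Stand-in for the real point representation; version = insertion order.
  interface TimeValuePair {
    long getTimestamp();
    long getVersion();
  }

  // Collapse points gathered from all files so that each timestamp keeps
  // only its most recently inserted value.
  static Map<Long, TimeValuePair> deduplicate(Iterable<TimeValuePair> pointsFromAllFiles) {
    Map<Long, TimeValuePair> result = new TreeMap<>();  // ordered by timestamp
    for (TimeValuePair p : pointsFromAllFiles) {
      result.merge(p.getTimestamp(), p,
          (oldP, newP) -> newP.getVersion() > oldP.getVersion() ? newP : oldP);
    }
    return result;
  }
}
```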

To avoid these disadvantages of keeping data disorderly in different files, we 
introduce a process called merge (also called compaction in other LSM systems) 
to read and rewrite the data in multiple time-overlapping files into a new file that 
preserves better time order and contains no duplicated data.

Providing an efficient way to make data more compact is no easy task. If you 
are interested or have studied compaction in some LSM systems, please join the 
discussion in this thread and give us your advice.

Many thanks,

Tian Jiang


Re: About prepared statement

2019-06-25 Thread
The related pull request is at 
https://github.com/apache/incubator-iotdb/pull/206. I would be glad if 
someone could review it for me.
A naive test showed that when executing 100 insertions, the prepared statement 
cost 41094ms while the normal statement cost 54035ms, which is a significant 
improvement.
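
For reviewers, a usage sketch matching the design quoted below; IoTDBPreparedInsertStatement and its setters are provisional names from the proposal, not a released API:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class PreparedInsertExample {
  // Minimal stub mirroring the proposed interface (names from the design below).
  interface IoTDBPreparedInsertStatement extends java.sql.PreparedStatement {
    void setDevice(String deviceId) throws SQLException;
    void setMeasurements(String[] measurements) throws SQLException;
    void setValues(String[] values) throws SQLException;
    void setTime(long time) throws SQLException;
  }

  static void insertLoop(String url, String user, String password) throws SQLException {
    try (Connection connection = DriverManager.getConnection(url, user, password)) {
      IoTDBPreparedInsertStatement pStmt =
          (IoTDBPreparedInsertStatement) connection.prepareStatement("Insert");
      pStmt.setDevice("root.device1");
      pStmt.setMeasurements(new String[]{"s1", "s2"});
      for (long time = 0; time < 100; time++) {
        pStmt.setTime(time);
        pStmt.setValues(new String[]{"1.2", "true"});
        pStmt.execute();  // one TSInsertionReq per call, no SQL parsing involved
      }
    }
  }
}
```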

> On Jun 25, 2019, at 10:58 PM, 江天 (Tian Jiang) wrote:
> 
> I have created an issue, IOTDB-122 (https://issues.apache.org/jira/browse/IOTDB-122); 
> you may find it at 
> https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB-122?filter=allopenissues.
> 
> 
>> On Jun 25, 2019, at 9:17 PM, 江天 (Tian Jiang) wrote:
>> 
>> Following up on the preceding mail,
>> 
>> Anyone interested is welcome to join this thread, discuss this new 
>> feature, and give advice. If no one objects, I would like to start 
>> developing this feature.
>> 
>> Best regards,
>> 
>> Tian Jiang
>> 
>>> On Jun 25, 2019, at 8:41 PM, 江天 (Tian Jiang) wrote:
>>> 
>>> Hi,
>>> 
>>> As some have mentioned, the SQL parser (Antlr) may consume about 40% of the 
>>> time in ingestion, especially when small SQL statements are sent frequently. 
>>> Luckily, IoTDB insertion statements are currently all alike and simple; there 
>>> are 4 meaningful parts in such a statement: deviceId, measurements, values and 
>>> time. For such a simple structure, using tools like Antlr may just be too heavy.
>>> 
>>> Intuitively, the PreparedStatement of the standard JDBC interface can be used 
>>> to relieve the parsing overhead when the SQL statements are similar. I will 
>>> describe how PreparedStatement works as follows (this is still to be implemented):
>>> 
>>> 1. The user wants to create a prepared insert statement and calls 
>>> `connection.prepareStatement("Insert")`;
>>> 2. The connection matches the parameter string against some templates, finds 
>>> out that it is an insertion, and returns an IoTDBPreparedInsertStatement pStmt.
>>> 3. The user calls `pStmt.setDevice("root.device1"); pStmt.setTime(100); 
>>> pStmt.setMeasurements(measurementArray); pStmt.setValues(valueArray);` to 
>>> set the parameters for the next insertion.
>>> 4. The user calls `pStmt.execute()` to execute an insertion.
>>> 5. The PreparedInsertStatement creates a TSInsertionReq, puts the deviceId, 
>>> measurements, values and time into the request, and sends it to the server.
>>> 6. The server receives the request, extracts the parameters, executes the 
>>> insertion directly through the database engine, and returns a TSInsertionResp 
>>> to the user.
>> 
> 



Re: About prepared statement

2019-06-25 Thread
I have created an issue, IOTDB-122 (https://issues.apache.org/jira/browse/IOTDB-122); 
you may find it at 
https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB-122?filter=allopenissues.


> On Jun 25, 2019, at 9:17 PM, 江天 (Tian Jiang) wrote:
> 
> Following up on the preceding mail,
> 
> Anyone interested is welcome to join this thread, discuss this new 
> feature, and give advice. If no one objects, I would like to start 
> developing this feature.
> 
> Best regards,
> 
> Tian Jiang
> 
>> On Jun 25, 2019, at 8:41 PM, 江天 (Tian Jiang) wrote:
>> 
>> Hi,
>> 
>> As some have mentioned, the SQL parser (Antlr) may consume about 40% of the 
>> time in ingestion, especially when small SQL statements are sent frequently. 
>> Luckily, IoTDB insertion statements are currently all alike and simple; there 
>> are 4 meaningful parts in such a statement: deviceId, measurements, values and 
>> time. For such a simple structure, using tools like Antlr may just be too heavy.
>> 
>> Intuitively, the PreparedStatement of the standard JDBC interface can be used 
>> to relieve the parsing overhead when the SQL statements are similar. I will 
>> describe how PreparedStatement works as follows (this is still to be implemented):
>> 
>> 1. The user wants to create a prepared insert statement and calls 
>> `connection.prepareStatement("Insert")`;
>> 2. The connection matches the parameter string against some templates, finds 
>> out that it is an insertion, and returns an IoTDBPreparedInsertStatement pStmt.
>> 3. The user calls `pStmt.setDevice("root.device1"); pStmt.setTime(100); 
>> pStmt.setMeasurements(measurementArray); pStmt.setValues(valueArray);` to 
>> set the parameters for the next insertion.
>> 4. The user calls `pStmt.execute()` to execute an insertion.
>> 5. The PreparedInsertStatement creates a TSInsertionReq, puts the deviceId, 
>> measurements, values and time into the request, and sends it to the server.
>> 6. The server receives the request, extracts the parameters, executes the 
>> insertion directly through the database engine, and returns a TSInsertionResp 
>> to the user.
> 



Re: About prepared statement

2019-06-25 Thread
Following up on the preceding mail,

Anyone interested is welcome to join this thread, discuss this new 
feature, and give advice. If no one objects, I would like to start 
developing this feature.

Best regards,

Tian Jiang

> On Jun 25, 2019, at 8:41 PM, 江天 (Tian Jiang) wrote:
> 
> Hi,
> 
> As some have mentioned, the SQL parser (Antlr) may consume about 40% of the 
> time in ingestion, especially when small SQL statements are sent frequently. 
> Luckily, IoTDB insertion statements are currently all alike and simple; there 
> are 4 meaningful parts in such a statement: deviceId, measurements, values and 
> time. For such a simple structure, using tools like Antlr may just be too heavy.
> 
> Intuitively, the PreparedStatement of the standard JDBC interface can be used 
> to relieve the parsing overhead when the SQL statements are similar. I will 
> describe how PreparedStatement works as follows (this is still to be implemented):
> 
> 1. The user wants to create a prepared insert statement and calls 
> `connection.prepareStatement("Insert")`;
> 2. The connection matches the parameter string against some templates, finds 
> out that it is an insertion, and returns an IoTDBPreparedInsertStatement pStmt.
> 3. The user calls `pStmt.setDevice("root.device1"); pStmt.setTime(100); 
> pStmt.setMeasurements(measurementArray); pStmt.setValues(valueArray);` to 
> set the parameters for the next insertion.
> 4. The user calls `pStmt.execute()` to execute an insertion.
> 5. The PreparedInsertStatement creates a TSInsertionReq, puts the deviceId, 
> measurements, values and time into the request, and sends it to the server.
> 6. The server receives the request, extracts the parameters, executes the 
> insertion directly through the database engine, and returns a TSInsertionResp 
> to the user.




Having trouble testing recovery after abnormal exits.

2019-06-23 Thread
Hi,

Now that the Write Ahead Log (WAL) is back on its feet again, we are having 
difficulties testing it in an integrated context. Of course we can test it by 
hand, but automated solutions are always better. However, we lack experience in 
testing the recovery of a system after crashes, and the JUnit framework does not 
seem to support this satisfactorily.

Does anyone know of frameworks or tools that can help us automatically test 
whether a system successfully recovers from a failure?

Best regards,

Tian Jiang


Re: Out-of-Memory Analysis- reformat for reading

2019-04-21 Thread
Good summary; I would like to make some minor supplements.

1. I think the data of one series can be discarded as soon as its corresponding 
Chunk is generated, so you do not need to bother redesigning it; a simple 
optimization may do the trick.

2. Yes, as long as you can estimate how much memory is occupied by a query. 
But I suggest we focus on insertion first.

3. OF COURSE NOT. Accounting for all intermediate objects and object overhead 
has a long way to go, but we can adjust it little by little.

4. I have to point out that you may have a BufferWrite working memtable, a 
BufferWrite flushing memtable, an Overflow working memtable and an Overflow 
flushing memtable at the same time, so simply doubling may not be enough for 
the estimation (see the sketch below).

> On Apr 22, 2019, at 12:10 PM, kangr15 wrote:
> 
> Hi all:
> Sorry for the text format; the following is reorganized:
> 1. Flushing to disk may double the memory cost: a storage group maintains a 
> list of ChunkGroups in memory, which will be flushed to disk when its occupied 
> memory exceeds the threshold (128MB by default).
> In the current implementation, when starting to flush data, a ChunkGroup is 
> encoded in memory and thereby a new byte array is kept in memory. Only after all 
> ChunkGroups have been encoded in memory can their corresponding byte arrays 
> be released together. Since the byte arrays have a comparable size to the 
> original data (0.5× to 1×), the above strategy may double the memory in the 
> worst case.
> Solution: we need to redesign the flush strategy. In TsFile, a Page is 
> the minimal flush unit, where a ChunkGroup contains several Chunks and a 
> Chunk contains several Pages. Once a page is encoded into a byte array, we 
> can flush the byte array to disk and then release it. In this case, the extra 
> memory is at most one page size (64KB by default). This modification involves a 
> list of cascading changes, including the metadata format and the writing 
> process (see the sketch after these points).
> 
> 2. Memory control strategy: we need to redesign the memory control 
> strategy, for example, assigning 60% of the memory to the writing process and 
> 30% to the querying process. The writing memory includes the memory table 
> and the flush process. When an Insert arrives, if its required memory exceeds 
> TotalMem * 0.6 - MemTableUsage - FlushUsage, the Insert will be rejected.
> 3. Are the memory statistics accurate? In the current code, the memory usage 
> of a TSRecord Java object, corresponding to an Insert SQL, is calculated by 
> summing its DataPoints. E.g., for "insert into root.a.b.c(timestamp, v1, v2) 
> values(1L, true, 1.2f)", its usage is 8 + 1 + 4 = 13, which ignores the size of 
> the object header and other overhead. The memory statistics need to be 
> redesigned carefully.
> 4. Is there still a memory leak? As shown in the log of the last crash due 
> to an out-of-memory exception, we found that the actual JVM memory was 18G, 
> whereas our memory statistics module only counted 8G. Besides the inaccuracy 
> mentioned in point 3, we suspect there are still memory leaks or other potential 
> problems. We will continue to debug it.
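
Regarding point 1 above, a rough sketch of the page-wise flush idea; Page, encodePage() and the stream wiring are hypothetical stand-ins for the real TsFile writer:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Rough sketch of page-wise flushing; all types here are stand-ins.
class PageWiseFlushSketch {
  interface Page { }

  void flushChunk(List<Page> pages, OutputStream out) throws IOException {
    for (Page page : pages) {
      byte[] encoded = encodePage(page);  // at most one ~64KB page buffered
      out.write(encoded);
      // The byte array becomes garbage right here, instead of staying alive
      // until every ChunkGroup of the storage group has been encoded.
    }
  }

  byte[] encodePage(Page page) {
    return new byte[0];  // placeholder for the real encoder
  }
}
```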
> 
> 
> 
> 
> —
> 顺颂时祺
> 康荣
> 清华大学软件学院
> —
> Best Regards,
> Rong Kang
> School of Software, Tsinghua University
> 

Re: [External] Re: Commit tsfile-go

2019-04-02 Thread
Yes, I think the main differences are: 
1. The header of a ChunkGroup (called RowGroup previously) is moved to the tail 
of its data and becomes a footer.
2. As a consequence of 1, there is a marker before each ChunkHeader and 
ChunkGroupFooter, and there is also a marker before the index at the tail of 
the file, indicating that the data section is over. 
3. The index structure at the tail of a TsFile also changes.
Please point it out if I missed anything.
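
To summarize, a sketch of the write order implied by the three differences; every type, method and marker value below is hypothetical and not taken from the published format spec:

```java
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical sketch of the write order; marker values are invented.
class TsFileLayoutSketch {
  static final byte MARKER_CHUNK_GROUP_FOOTER = 0;
  static final byte MARKER_CHUNK_HEADER = 1;
  static final byte MARKER_SEPARATOR = 2;  // "the data section is over"

  interface Serializable { void serializeTo(DataOutputStream out) throws IOException; }
  interface Chunk {
    Serializable header();
    void writePagesTo(DataOutputStream out) throws IOException;
  }
  interface ChunkGroup {
    Iterable<Chunk> chunks();
    Serializable footer();
  }

  void writeChunkGroup(ChunkGroup group, DataOutputStream out) throws IOException {
    for (Chunk chunk : group.chunks()) {
      out.writeByte(MARKER_CHUNK_HEADER);      // difference 2: marker before each header
      chunk.header().serializeTo(out);
      chunk.writePagesTo(out);
    }
    out.writeByte(MARKER_CHUNK_GROUP_FOOTER);  // difference 1: header became a footer
    group.footer().serializeTo(out);
    // Differences 2 and 3: after the last ChunkGroup, MARKER_SEPARATOR precedes
    // the rewritten tail index.
  }
}
```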

> On Apr 2, 2019, at 5:10 PM, Lyndon Dong5 Li wrote:
> 
> Hi,
> 
> After investigation, I found that the tsfile format of the Go version is almost 
> the same as the latest TsFile format of the Java version, but unfortunately there 
> are some differences between them: they have almost the same meta object 
> structures, but several attributes are not exactly the same.
> 
> Maybe we can eliminate these differences together if necessary.
> 
> 
> -Original Message-
> From: Xiangdong Huang  
> Sent: April 1, 2019, 23:03
> To: dev@iotdb.apache.org
> Subject: [External] Re: Commit tsfile-go
> 
> Hi,
> 
> Glad to see that Lenovo has made this module open source.
> 
> I want to confirm: does this Go version follow the latest TsFile format?
> See https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
> 
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
> 黄向东
> 清华大学 软件学院
> 
> 
> Lyndon Dong5 Li wrote on Mon, Apr 1, 2019 at 7:33 PM:
> 
>> Hi,
>> 
>> I'm Lidong from Lenovo. I've just created a pull request to commit the 
>> project 'tsfile-go' to branch 'master'.
>> Tsfile-go is a Golang version of tsfile (based on branch 'thanos'), 
>> which was developed by Lenovo & Tsinghua. We implemented all of the 
>> features in 'tsfile', including:
>> 1. Writing ts data to a tsfile
>> 2. Reading/querying ts data from an existing file
>> 3. Encoding/decoding with RLE/TS_2DIFF/GORILLA/PLAIN
>> 4. Compression/decompression with snappy
>> 
>> We have now applied tsfile-go to the Lenovo IoT platform, for the purpose of 
>> caching & compressing ts data on edge devices.
>> 
>> Best regards
>> ---
>> Lidong
>> LCIG Big Data BU, Lenovo
>> 
>> 




Re: How to prevent GitBox from sending too many commit notifications

2019-01-24 Thread
I like the idea of being highly transparent, but I hope to learn some tricks for 
doing this. Do we have to paste the code into the mails, or just tell someone 
which line in which file has what problem? Both of these seem tiring.

> On Jan 25, 2019, at 1:08 AM, Justin Mclean wrote:
> 
> Hi,
> 
>> No matter what you guys decide to do, I would strongly suggest taking 
>> discussions here to the list and not having discussions in the GitHub 
>> code-review tool.
> 
> A very big +1 to this; discussion works best for the whole community if it is 
> on the list.
> 
> Thanks,
> Justin




How to prevent GitBox from sending too many commit notifications

2019-01-24 Thread
Hello,

I am driven crazy because every time someone commits to incubator-iotdb, I 
receive a mail from GitBox. This is just too annoying. Could someone please 
tell me how to stop GitBox from sending commit notifications, or make it send 
them to comm...@iotdb.apache.org instead of dev@iotdb.apache.org?


Thanks


Tian Jiang
2019.1.24
