Re: [IoTDB-164] Convert Path(String) to ID(Long)

2019-09-10 Thread Lei Rui
Hi,


I am also confused about the "six times worse" when creating timeseries.


Could you open a draft pull request against IoTDB so that we can take a look at
your implementation?


Sincerely,
Lei Rui

Re: How to use the new table schema of Spark-Connector

2019-09-10 Thread Lei Rui
Hi,
 
The website page http://iotdb.apache.org/#/Tools/Spark does not link to the
latest version of the document:
https://github.com/apache/incubator-iotdb/blob/master/docs/Documentation/UserGuide/9-Tools-spark.md
I'm afraid the website is not built from the master branch. Can someone
confirm that?


Yes, I think adding a Java example would be user-friendly.


Sincerely,
Lei Rui 



[IoTDB-164] Convert Path(String) to ID(Long)

2019-09-10 Thread 安彦哲
Hi,

I'm trying to solve
[IoTDB-164](https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB-164?filter=allopenissues).
To accomplish the task, I've modified the structure of MTree and replaced
Path(String) with ID(Long) in several Maps, including:

- latestTimeForEachDevice and latestFlushedTimeForEachDevice in
  org.apache.iotdb.db.engine.storagegroup.StorageGroupProcessor.java
- memTableMap in org.apache.iotdb.db.engine.memtable.AbstractMemTable.java
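To make the idea concrete, here is a minimal sketch of how a path string could be mapped to a long ID. The class PathIdRegistry, the method idOf, and the path "root.sg1.d1" are hypothetical names used only for illustration; this is not the code from the modified branch, and the real change to MTree may work quite differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of IoTDB: assigns a stable long ID to each path string.
final class PathIdRegistry {
  private final Map<String, Long> idByPath = new ConcurrentHashMap<>();
  private final AtomicLong nextId = new AtomicLong(0);

  // Returns the ID already assigned to the path, or assigns the next free one.
  long idOf(String path) {
    return idByPath.computeIfAbsent(path, p -> nextId.getAndIncrement());
  }

  public static void main(String[] args) {
    PathIdRegistry registry = new PathIdRegistry();
    // A map such as latestTimeForEachDevice would then be keyed by Long instead of String.
    Map<Long, Long> latestTimeForEachDevice = new ConcurrentHashMap<>();
    long deviceId = registry.idOf("root.sg1.d1"); // hypothetical device path
    latestTimeForEachDevice.put(deviceId, 1568160000000L);
    System.out.println(deviceId + " -> " + latestTimeForEachDevice.get(deviceId));
  }
}
```

In a scheme like this, every lookup by path pays for one extra map access to resolve the ID, while maps keyed by Long avoid hashing and comparing long strings.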




Then, I carried out a simple experiment:

1. Create 200,000 timeseries in the same storage group.
2. Insert 10 tuples into each timeseries.




The results show that:

Step 1: the original version takes 25138.8 ms while the modified one takes
177753 ms. In other words, creating timeseries takes about seven times as long
in the modified version, i.e. it performs roughly six times worse than the
original.

Step 2: the original version takes 213662.8 ms while the modified one takes
194271.2 ms. In other words, the modified version performs about nine percent
better than the original when inserting values.
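The ratios can be rechecked from the reported timings. The sketch below only redoes the arithmetic; the class and method names are made up for this example. With the numbers above, the creation step comes out at about 7.07 times as long, and the insertion step at about 9.1 percent less time.

```java
// Recomputes the ratios from the timings reported above (all values in milliseconds).
final class TimingRatios {
  // How many times as long the modified version takes compared with the original.
  static double slowdown(double originalMs, double modifiedMs) {
    return modifiedMs / originalMs;
  }

  // Percentage of time saved by the modified version relative to the original.
  static double percentSaved(double originalMs, double modifiedMs) {
    return (originalMs - modifiedMs) / originalMs * 100.0;
  }

  public static void main(String[] args) {
    System.out.printf("create: %.2fx as long%n", slowdown(25138.8, 177753.0));
    System.out.printf("insert: %.1f%% less time%n", percentSaved(213662.8, 194271.2));
  }
}
```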




In addition, we don't usually create that many timeseries, while insertion is
the most frequently used operation in IoTDB. I'm not sure whether we should
adopt the modified version. If anyone knows more about this topic, please let
me know.




Best,
---
Yanzhe An (安彦哲)
School of Software, Tsinghua University
thss15_a...@163.com

Re: How to use the new table schema of Spark-Connector

2019-09-10 Thread Kaifeng Xue
Hi,
Thanks for trying this new feature. The documentation for this feature is in
/docs/Documentation/UserGuide/9-Tools-spark, but only in Scala. Should I add a
Java version? The two will be quite similar.
Best,
Kaifeng Xue





Re: How to use the new table schema of Spark-Connector

2019-09-10 Thread Xiangdong Huang
Hi,

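OK, I read the source code and found that a new option called "form" has been
added.

If we want to use the wide table, write Java code like:

```
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
// `spark` is an existing SparkSession; no "form" option is set for the wide table.
Dataset<Row> df = spark.read().format("org.apache.iotdb.tsfile").load("TSFILE PATH");
df.show();
```

If we want to use the narrow table, write Java code like:
```
// Same imports and SparkSession as above; "narrow_form" selects the narrow table.
Dataset<Row> df =
    spark.read().format("org.apache.iotdb.tsfile").option("form", "narrow_form").load("TSFILE PATH");
df.show();
```
I think this should be described in the documentation.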

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院




How to use the new table schema of Spark-Connector

2019-09-10 Thread Xiangdong Huang
Hi,

PR [1] has been merged, but I cannot find out how to use the new feature
from iotdb.apache.org or even from spark/Readme.md.

What is more, even after reading the introduction to this PR, I can only see
how to use it in Scala. How do I use it in Java?

[1] https://github.com/apache/incubator-iotdb/pull/303

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Re:Re:Solving jira problem (IOTDB-180) Get rid of JSON format in "show timeseries"

2019-09-10 Thread thss15_yit
I have submitted a pull request for this issue [IOTDB-180].
The link to the pull request is
https://github.com/apache/incubator-iotdb/pull/387
Thanks for reviewing it.


Tao Yi

At 2019-09-09 11:30:25, "thss15_yit"  wrote:
>The JIRA link of this issue is
>https://issues.apache.org/jira/projects/IOTDB/issues/IOTDB-180?filter=allopenissues
>
>At 2019-09-09 11:21:23, "thss15_yit" wrote:
>>Hi,
>>I have been working on JIRA issue [IOTDB-180 get rid of JSON format in
>> "show timeseries"] these days.
>>My plan for this issue is to merge the execution of the statement
>> "show timeseries" into that of "show timeseries ", use the functions of
>> "show timeseries " to output the data in table format, and then remove
>> the JSON-format functions that are no longer needed.
>>
>>
>>Tao Yi 


Re: Add bloom filters to TsFile

2019-09-10 Thread Julian Feinauer
Hi,

I like the idea. I'm just adding Claude here as we talked yesterday about a 
bloom filter implementation he has already done.

@cla...@apache.org what do you think? : )

Julian



Re: (IOTDB-209) Improvement on the Hadoop module

2019-09-10 Thread Yuan Tian
Hi,
I'm working on this issue. My plan is to use the query method that takes the
partitionStartOffset and partitionEndOffset parameters, and to store the
startOffset and endOffset in the TsInputSplit.


Best,
--
Yuan Tian
School of Software, Tsinghua University

田原
清华大学 软件学院




[jira] [Created] (IOTDB-209) Improvement on the Hadoop module

2019-09-10 Thread Yuan Tian (Jira)
Yuan Tian created IOTDB-209:
---

     Summary: Improvement on the Hadoop module
         Key: IOTDB-209
         URL: https://issues.apache.org/jira/browse/IOTDB-209
     Project: Apache IoTDB
  Issue Type: Improvement
    Reporter: Yuan Tian


The Hadoop module has not been kept up to date with the TsFile module, so it
contains a lot of legacy code. I hope the Hadoop module can be maintained
again, so that we can run Hadoop map-reduce jobs on TsFiles.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Add bloom filters to TsFile

2019-09-10 Thread Tian Jiang


Greetings,


My recent reading reminded me that the Bloom filter is standard equipment in
key-value databases. Although IoTDB is not one of them (at least not
typically), a Bloom filter still helps a lot in various situations. For
example, our recent experiments gave us the impression that the set of time
series in a storage group remains unchanged. However, that is not the case.


Naturally, in real situations, the number of time series grows over time, for
reasons such as adding new equipment. The old files do not contain these new
time series. Without the help of Bloom filters, we have to check each old file
only to find that there is no such time series. To my knowledge, this may take
a lot of time.


So, I suggest we add a Bloom filter (or a more efficient structure) to each
TsFile to help skip unwanted files.
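For readers unfamiliar with the technique, here is a minimal Bloom filter sketch. It shows only the general idea, not a proposed TsFile format: the class SimpleBloomFilter, its sizes, the simple double-hashing scheme, and the series paths are all assumptions made for this example.

```java
import java.util.BitSet;

// Minimal Bloom filter sketch; illustrates the general technique, not a TsFile design.
final class SimpleBloomFilter {
  private final BitSet bits;
  private final int size;
  private final int hashCount;

  SimpleBloomFilter(int size, int hashCount) {
    this.bits = new BitSet(size);
    this.size = size;
    this.hashCount = hashCount;
  }

  // Derives the i-th bit position from String.hashCode() and a step taken from its upper bits.
  private int position(String key, int i) {
    int h1 = key.hashCode();
    int h2 = (h1 >>> 16) | 1; // odd, non-negative step so successive probes differ
    return Math.floorMod(h1 + i * h2, size);
  }

  void add(String seriesPath) {
    for (int i = 0; i < hashCount; i++) {
      bits.set(position(seriesPath, i));
    }
  }

  // false means "definitely absent"; true means "possibly present".
  boolean mightContain(String seriesPath) {
    for (int i = 0; i < hashCount; i++) {
      if (!bits.get(position(seriesPath, i))) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    SimpleBloomFilter fileFilter = new SimpleBloomFilter(1 << 16, 3);
    fileFilter.add("root.sg1.d1.s1"); // hypothetical series written to this file
    // A query for a series created later can skip the file whenever this prints false.
    System.out.println(fileFilter.mightContain("root.sg1.d2.s9"));
  }
}
```

A filter like this never reports a series as absent when it was added, so skipping a file on a false answer is safe; a true answer still requires checking the file itself.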


Tian Jiang
jt2594...@163.com

[jira] [Created] (IOTDB-208) Add bloom filters to TsFile

2019-09-10 Thread Tian Jiang (Jira)
Tian Jiang created IOTDB-208:


     Summary: Add bloom filters to TsFile
         Key: IOTDB-208
         URL: https://issues.apache.org/jira/browse/IOTDB-208
     Project: Apache IoTDB
  Issue Type: New Feature
    Reporter: Tian Jiang


My recent reading reminded me that the Bloom filter is standard equipment in
key-value databases. Although IoTDB is not one of them (at least not
typically), a Bloom filter still helps a lot in various situations. For
example, our recent experiments gave us the impression that the set of time
series in a storage group remains unchanged. However, that is not the case.

Naturally, in real situations, the number of time series grows over time, for
reasons such as adding new equipment. The old files do not contain these new
time series. Without the help of Bloom filters, we have to check each old file
only to find that there is no such time series. To my knowledge, this may take
a lot of time.

So, I suggest we add a Bloom filter (or a more efficient structure) to each
TsFile to help skip unwanted files.





[jira] [Created] (IOTDB-207) Bug about the method query(QueryExpression queryExpression, long partitionStartOffset, long partitionEndOffset) in ReadOnlyTsFile

2019-09-10 Thread Yuan Tian (Jira)
Yuan Tian created IOTDB-207:
---

     Summary: Bug about the method query(QueryExpression queryExpression, long partitionStartOffset, long partitionEndOffset) in ReadOnlyTsFile
         Key: IOTDB-207
         URL: https://issues.apache.org/jira/browse/IOTDB-207
     Project: Apache IoTDB
  Issue Type: Bug
    Reporter: Yuan Tian


When the method query(QueryExpression queryExpression, long
partitionStartOffset, long partitionEndOffset) in ReadOnlyTsFile is called on a
TsFile that contains only one chunkGroup, with ‘partitionStartOffset’ set to
the startOffset of the chunkGroup and ‘partitionEndOffset’ set to the endOffset
of the chunkGroup, it should return a queryDataset that contains all the data.
However, some data is lost.

More specifically, in my case, I used TsFileWriteWithRowBatch in the example
module to create a TsFile with only one device, ten sensors and 1 million rows
in total. When reading it back with the method as described above, I got only
999424 rows.
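A quick arithmetic check on these numbers, offered as an observation rather than a diagnosis: 576 rows are missing, and 999424 is an exact multiple of 1024 (976 x 1024). Whether 1024 corresponds to any batch or page size in the read path is an assumption that this report does not confirm. The class and method names below are made up for the example.

```java
// Arithmetic check on the row counts reported above; not a reproduction of the bug.
final class MissingRowCheck {
  // Number of rows that were written but not returned by the query.
  static long missing(long expectedRows, long actualRows) {
    return expectedRows - actualRows;
  }

  // True when the value is an exact multiple of the given unit.
  static boolean isMultipleOf(long value, long unit) {
    return value % unit == 0;
  }

  public static void main(String[] args) {
    long expected = 1_000_000L;
    long actual = 999_424L;
    System.out.println("missing rows: " + missing(expected, actual));
    // 1024 is only a guess at a batch size; the report does not state one.
    System.out.println("multiple of 1024: " + isMultipleOf(actual, 1024L));
  }
}
```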


