Re: AW: Operation and robustness of iotDB

2019-03-07 Thread Xu yi
Hi,

In my opinion, different measurements use their own timestamps even though they
are grouped into one chunk group; they don't share timestamps with each other.
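
For illustration, a minimal sketch of that reading (assuming the TSRecord/TsFileWriter
API of the tsfile write module; package and method names may differ), where two
measurements of the same device are written at independent timestamps and end up in
the same chunk group:

```
import java.io.File;

import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
import org.apache.iotdb.tsfile.write.TsFileWriter;
import org.apache.iotdb.tsfile.write.record.TSRecord;
import org.apache.iotdb.tsfile.write.record.datapoint.FloatDataPoint;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;

public class IndependentTimestampsSketch {
  public static void main(String[] args) throws Exception {
    TsFileWriter writer = new TsFileWriter(new File("demo.tsfile"));
    writer.addMeasurement(new MeasurementSchema("m1", TSDataType.FLOAT, TSEncoding.RLE));
    writer.addMeasurement(new MeasurementSchema("m2", TSDataType.FLOAT, TSEncoding.RLE));

    // d1.m1 gets points at t=1,2,3 while d1.m2 only gets a point at t=2:
    // both chunks land in the same chunk group, each with its own timestamps.
    TSRecord r1 = new TSRecord(1, "d1");
    r1.addTuple(new FloatDataPoint("m1", 1.0f));
    writer.write(r1);

    TSRecord r2 = new TSRecord(2, "d1");
    r2.addTuple(new FloatDataPoint("m1", 1.1f));
    r2.addTuple(new FloatDataPoint("m2", 20.0f));
    writer.write(r2);

    TSRecord r3 = new TSRecord(3, "d1");
    r3.addTuple(new FloatDataPoint("m1", 1.2f));
    writer.write(r3);

    writer.close();
  }
}
```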

What do you think of this, @xiangdong?

Thanks 
XuYi 

Sent from my iPhone

On 2019/03/08 at 1:41, Julian Feinauer wrote:

> Hi,
> 
> Yes this is what I meant.
> 
> Julian
> 
> Sent from my mobile phone
> 
> 
> Original Message
> Subject: Re: Operation and robustness of iotDB
> From: 徐毅
> To: dev@iotdb.apache.org
> Cc:
> 
> Hi,
> In the definition of ChunkGroup, what is the meaning of 'share one time
> signal'? Do these measurements share the same timestamps?
> 
> 
> Thanks
> XuYi
> On 3/8/2019 01:11, Julian Feinauer wrote:
> Hey Xiangdong,
> hey all,
> 
> I like the documentation a lot.
> The only thing I'm a bit unsure about is the names (as there is no
> clarification).
> So, before I update it with any wrong information, I would like to ensure
> that I have the correct understanding.
> 
> I assume that most naming is similar to Parquet.
> 
> Page - contains one measurement; the smallest unit of compression
> Chunk - a collection of multiple Pages, still for one measurement
> ChunkGroup - a collection of Chunks which share one time signal (one Chunk
> for each measurement)
> 
> Is this correct?
> 
> Julian
> 
> On 05.03.19 at 12:26, "Xiangdong Huang" wrote:
> 
> Hi,
> 
> 1. We have a document that introduces it:
> https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
> 
> 2. The new API for recovering data is almost done. I am writing the UTs
> now. Maybe I can submit a PR tonight (if everything is fine...)
> 
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
> 黄向东
> 清华大学 软件学院
> 
> 
> Julian Feinauer wrote on Tuesday, March 5, 2019 at 6:00 PM:
> 
> Hi Xiangdong,
> 
> that sounds excellent.
> Do you have a short overview of how the file format is designed on disk?
> I know that it's somewhat similar to Parquet, but I did not find more
> details.
> Basically, what would suffice for us would be something like skipping an
> invalid column group (or whatever you call it) and going on with the next.
> 
> Julian
> 
> On 04.03.19 at 13:21, "Xiangdong Huang" wrote:
> 
> Hi,
> 
> If so, I think I need to add a new API that allows you to continue writing
> data into an existing TsFile that was not closed correctly. Then everything
> is fine for you :D
> 
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
> 
> 黄向东
> 清华大学 软件学院
> 
> 
> Julian Feinauer wrote on Monday, March 4, 2019 at 8:08 PM:
> 
> Hey Xiangdong,
> 
> thanks for the great explanation.
> And in fact, I agree with you that it would be best if we start to play
> around with it and report all our findings and wishes back to this list
> (in fact, that proved to be beneficial in plc4x as well).
>
> You confirmed my thoughts about the two "levels" of APIs (DB and file),
> and the file API is exactly what we were looking for for our use case,
> as we do not care much about data loss (when an edge device fails, it's...
> gone).
> The crucial point for us is that no corrupt files can be generated.
> This means I'm fine if the last data submitted is lost, but I'm not fine
> if we can get into a situation where the last data file is completely lost
> (well, perhaps this could be acceptable).
>
> @tim: Perhaps it's best if you give some more information to Xiangdong
> about our idea; we can also point to our current code on GitHub.
> 
> Julian
> 
> On 04.03.19 at 13:03, "Xiangdong Huang" wrote:
> 
> Hi,
> 
> TsFile API is not deprecated. In fact, it is designed for this scenario
> and for MapReduce/Spark computing.
>
> If you just use the Reader and Writer APIs, there are some things you need
> to know:
>
> Let's suppose your block size is x bytes (tsfile-format.properties:
> group_size_in_byte).
>
> 1. If you write data and a shutdown occurs, then all data that has been
> flushed to disk is OK, and you can read it (the class
> org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you need to
> change it a little; I think I can write an example).
>
> 2. Actually, TsFile has the ability to let you continue writing data at
> the end of the incomplete file. However, we do not provide this API now...
> If needed, I can add the API.
>
> 3. In this scenario, you will lose at most x bytes of data. If you cannot
> accept that, something like a WAL is needed. (It is not very complex, but
> I am not sure whether it should be a built-in function of TsFile.)
>
> So far, we can consider the TsFile API suitable for your scenario (even
> though we need to add a little more API if you desire). And you get the
> ability to compress data and to query data from the TsFile rather than
> scanning it from head to tail.
>
> However, TsFile has one constraint: you cannot write out-of-order data
> into a TsFile, otherwise the query API may return incomplete results.
> But I think it is ok

AW: Operation and robustness of iotDB

2019-03-07 Thread Julian Feinauer
Hi,

Yes this is what I meant.

Julian

Sent from my mobile phone


Original Message
Subject: Re: Operation and robustness of iotDB
From: 徐毅
To: dev@iotdb.apache.org
Cc:

Hi,
In the definition of ChunkGroup, what is the meaning of 'share one time
signal'? Do these measurements share the same timestamps?


Thanks
XuYi
On 3/8/2019 01:11, Julian Feinauer wrote:
Hey Xiangdong,
hey all,

I like the documentation a lot.
The only thing I'm a bit unsure about is the names (as there is no
clarification).
So, before I update it with any wrong information, I would like to ensure
that I have the correct understanding.

I assume that most naming is similar to Parquet.

Page - contains one measurement; the smallest unit of compression
Chunk - a collection of multiple Pages, still for one measurement
ChunkGroup - a collection of Chunks which share one time signal (one Chunk
for each measurement)

Is this correct?

Julian

On 05.03.19 at 12:26, "Xiangdong Huang" wrote:

Hi,

1. We have a document that introduces it:
https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format

2. The new API for recovering data is almost done. I am writing the UTs
now. Maybe I can submit a PR tonight (if everything is fine...)

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


Julian Feinauer wrote on Tuesday, March 5, 2019 at 6:00 PM:

Hi Xiangdong,

that sounds excellent.
Do you have a short overview of how the file format is designed on disk?
I know that it's somewhat similar to Parquet, but I did not find more
details.
Basically, what would suffice for us would be something like skipping an
invalid column group (or whatever you call it) and going on with the next.

Julian

On 04.03.19 at 13:21, "Xiangdong Huang" wrote:

Hi,

If so, I think I need to add a new API that allows you to continue writing
data into an existing TsFile that was not closed correctly. Then everything
is fine for you :D

Best,
---
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


Julian Feinauer wrote on Monday, March 4, 2019 at 8:08 PM:

Hey Xiangdong,

thanks for the great explanation.
And in fact, I agree with you that it would be best if we start to play
around with it and report all our findings and wishes back to this list
(in fact, that proved to be beneficial in plc4x as well).

You confirmed my thoughts about the two "levels" of APIs (DB and file),
and the file API is exactly what we were looking for for our use case,
as we do not care much about data loss (when an edge device fails, it's...
gone).
The crucial point for us is that no corrupt files can be generated.
This means I'm fine if the last data submitted is lost, but I'm not fine
if we can get into a situation where the last data file is completely lost
(well, perhaps this could be acceptable).

@tim: Perhaps it's best if you give some more information to Xiangdong
about our idea; we can also point to our current code on GitHub.

Julian

On 04.03.19 at 13:03, "Xiangdong Huang" wrote:

Hi,

TsFile API is not deprecated. In fact, it is designed for this scenario
and for MapReduce/Spark computing.

If you just use the Reader and Writer APIs, there are some things you need
to know:

Let's suppose your block size is x bytes (tsfile-format.properties:
group_size_in_byte).

1. If you write data and a shutdown occurs, then all data that has been
flushed to disk is OK, and you can read it (the class
org.apache.iotdb.tsfile.TsFileSequenceRead is an example, but you need to
change it a little; I think I can write an example).

2. Actually, TsFile has the ability to let you continue writing data at
the end of the incomplete file. However, we do not provide this API now...
If needed, I can add the API.

3. In this scenario, you will lose at most x bytes of data. If you cannot
accept that, something like a WAL is needed. (It is not very complex, but
I am not sure whether it should be a built-in function of TsFile.)

So far, we can consider the TsFile API suitable for your scenario (even
though we need to add a little more API if you desire). And you get the
ability to compress data and to query data from the TsFile rather than
scanning it from head to tail.
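
As a rough sketch of what "query rather than scan" can look like with the read API
(assuming the ReadOnlyTsFile/QueryExpression classes of the tsfile read module; the
class names, the single-string Path syntax, and the file name are assumptions):

```
import java.util.ArrayList;
import java.util.List;

import org.apache.iotdb.tsfile.read.ReadOnlyTsFile;
import org.apache.iotdb.tsfile.read.TsFileSequenceReader;
import org.apache.iotdb.tsfile.read.common.Path;
import org.apache.iotdb.tsfile.read.expression.QueryExpression;
import org.apache.iotdb.tsfile.read.query.dataset.QueryDataSet;

public class QuerySketch {
  public static void main(String[] args) throws Exception {
    // Open the file and ask only for the series we care about; the reader uses
    // the file's metadata to locate the relevant chunks instead of scanning
    // the whole file from head to tail.
    TsFileSequenceReader reader = new TsFileSequenceReader("demo.tsfile");
    ReadOnlyTsFile readTsFile = new ReadOnlyTsFile(reader);

    List<Path> paths = new ArrayList<>();
    paths.add(new Path("d1.m1"));

    QueryExpression expression = QueryExpression.create(paths, null); // no filter
    QueryDataSet dataSet = readTsFile.query(expression);
    while (dataSet.hasNext()) {
      System.out.println(dataSet.next()); // one row per timestamp of d1.m1
    }
    reader.close();
  }
}
```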

However, TsFile has one constraint: you cannot write out-of-order data
into a TsFile, otherwise the query API may return incomplete results.
But I think that is OK for real applications, because I do not think
a device generates out-of-order data.

For example, if you write two devices' data into one TsFile, it is OK
if you write data like:
- d1.t1, d1.t2, d2.t1, d2.t2, d2.t3, d1.t4, d1.t5
or:
- d1.m1.t1, d1.m1.t2, d1.m2.t1, d1.m2.t2, d2.m1.t1 ...

But you cannot write data like:
- d1.m1.t2, d1.m1.t1 ...
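
A short sketch of that rule in code (assuming the TSRecord/TsFileWriter write API;
device, measurement, and file names are illustrative):

```
import java.io.File;

import org.apache.iotdb.tsfile.file.metadata.enums.TSDataType;
import org.apache.iotdb.tsfile.file.metadata.enums.TSEncoding;
import org.apache.iotdb.tsfile.write.TsFileWriter;
import org.apache.iotdb.tsfile.write.record.TSRecord;
import org.apache.iotdb.tsfile.write.record.datapoint.FloatDataPoint;
import org.apache.iotdb.tsfile.write.schema.MeasurementSchema;

public class WriteOrderSketch {
  // Write one value of measurement m1 for the given device at time t.
  static void write(TsFileWriter w, String device, long t) throws Exception {
    TSRecord record = new TSRecord(t, device);
    record.addTuple(new FloatDataPoint("m1", 1.0f));
    w.write(record);
  }

  public static void main(String[] args) throws Exception {
    TsFileWriter w = new TsFileWriter(new File("order-demo.tsfile"));
    w.addMeasurement(new MeasurementSchema("m1", TSDataType.FLOAT, TSEncoding.RLE));

    write(w, "d1", 1);    // d1.m1.t1
    write(w, "d1", 2);    // d1.m1.t2 -- time moves forward within d1.m1: ok
    write(w, "d2", 1);    // d2.m1.t1 -- interleaving another device: ok
    write(w, "d1", 4);    // d1.m1.t4 -- still increasing within d1.m1: ok
    // write(w, "d1", 3); // d1.m1.t3 after d1.m1.t4 would be out of order: not allowed
    w.close();
  }
}
```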

I think this is a good chance to improve TsFile and make it more suitable
for real applications, so please do not hesitate to tell me more about
what you think TsFile should have.

Best,
---
Xiangdong Huang
School 

AW: Operation and robustness of iotDB

2019-03-05 Thread Julian Feinauer
Hi Xiangdong,

Great work!
I'll try to go through your code to understand the internals better and
perhaps do a simulation test with hard JVM exits.
If everything goes well, we will incorporate it ASAP in our test code and
give you feedback!

Thank you!
Julian

Sent from my mobile phone


Original Message
Subject: Re: Operation and robustness of iotDB
From: Xiangdong Huang
To: dev@iotdb.apache.org
Cc:

Hi,

I have added a new TsFileIOWriter, which supports recovering data from a
broken TsFile (I mean, a TsFile that was not closed correctly). The PR is
https://github.com/apache/incubator-iotdb/pull/87.

You can use the new feature like this:
```
NativeRestorableIOWriter rWriter = new NativeRestorableIOWriter(file);
TsFileWriter writer = new TsFileWriter(rWriter);
```
1. If the file is a complete TsFile (e.g., the FileMetadata has been
persisted into the file and the tail magic string is correct), then you
cannot write data into this TsFile anymore (actually I could remove all the
FileMetadata and then let you continue writing, but it needs more coding
work...); `TsFileWriter writer = new TsFileWriter(rWriter);` will throw an
IOException. You can check this before you create the TsFileWriter by using
`rWriter.canWrite()`.
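
A small sketch of that check, building on the snippet above (the package paths and
file name are assumptions; only NativeRestorableIOWriter, TsFileWriter, and
canWrite() come from the PR):

```
import java.io.File;

import org.apache.iotdb.tsfile.write.TsFileWriter;
import org.apache.iotdb.tsfile.write.writer.NativeRestorableIOWriter;

public class RecoverOrSkipSketch {
  public static void main(String[] args) throws Exception {
    File file = new File("data/device_1.tsfile"); // hypothetical path
    NativeRestorableIOWriter rWriter = new NativeRestorableIOWriter(file);

    if (rWriter.canWrite()) {
      // The file is incomplete: keep appending where it left off.
      TsFileWriter writer = new TsFileWriter(rWriter);
      // ... write new records here, then close() to seal the file.
      writer.close();
    } else {
      // The file already has its FileMetadata and tail magic string, so
      // new TsFileWriter(rWriter) would throw; start a new file instead.
    }
  }
}
```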

2. ChunkGroup is the basic unit that can be recovered. That is, if a
ChunkGroup is complete, i.e., its chunkGroupFooter is complete, I will keep
it; otherwise I will truncate the chunk group. However, I cannot always
tell whether a chunkGroupFooter is complete: if only one byte is lost, the
deserialize method does not throw an exception, because something like
`deserializeToInt(inputstream)` will silently use the remaining 3 bytes to
build the integer. So, when I say "a ChunkGroup is complete", I mean that
not only is its chunkGroupFooter serialized successfully, but also at least
one more byte has been serialized into the file after the footer.
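
As an aside, a tiny self-contained illustration (plain JDK, not IoTDB code) of how
such a silent truncation can happen: InputStream.read(byte[]) returns fewer bytes
than requested instead of throwing, so a naive int conversion quietly pads with a
zero byte.

```
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class ShortReadDemo {
  public static void main(String[] args) throws IOException {
    byte[] truncated = {0x00, 0x00, 0x01};       // only 3 of an int's 4 bytes survived
    ByteArrayInputStream in = new ByteArrayInputStream(truncated);

    byte[] buf = new byte[4];
    int n = in.read(buf);                         // n == 3, no exception is thrown
    int value = ((buf[0] & 0xFF) << 24) | ((buf[1] & 0xFF) << 16)
              | ((buf[2] & 0xFF) << 8)  | (buf[3] & 0xFF);

    // Prints "read 3 bytes, decoded 256" -- a wrong value, but no error to catch.
    System.out.println("read " + n + " bytes, decoded " + value);
  }
}
```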

3. I have tested many cases, such as: only the magic head string is
persisted, only a ChunkHeader is persisted, the first ChunkHeader is not
persisted completely, only some Chunks are persisted, one ChunkGroupFooter
is persisted, two ChunkGroupFooters are persisted, a complete TsFile is
persisted, etc. (`NativeRestorableIOWriterTest.java`; I hope I have covered
all cases.)

Hope TsFile can be used in edge device management applications ASAP!

Now please enjoy the new feature.

---
Xiangdong Huang
School of Software, Tsinghua University

 黄向东
清华大学 软件学院


Julian Feinauer wrote on Tuesday, March 5, 2019 at 8:10 PM:

> Hey,
>
> thank you for the link... I did not know of this... this is exactly what
> I was looking for!
>
> Julian
>
> PS.: Looking forward to your PR : )
>
> On 05.03.19 at 12:26, "Xiangdong Huang" wrote:
>
> Hi,
>
> 1. We have a document that introduces it:
> https://cwiki.apache.org/confluence/display/IOTDB/TsFile+Format
>
> 2. The new API for recovering data is almost done. I am writing the UTs
> now. Maybe I can submit a PR tonight (if everything is fine...)
>
> Best,
> ---
> Xiangdong Huang
> School of Software, Tsinghua University
>
>  黄向东
> 清华大学 软件学院
>
>
> Julian Feinauer wrote on Tuesday, March 5, 2019 at 6:00 PM:
>
> > Hi Xiangdong,
> >
> > that sounds excellent.
> > Do you have a short overview of how the file format is designed on
> > disk?
> > I know that it's somewhat similar to Parquet, but I did not find more
> > details.
> > Basically, what would suffice for us would be something like skipping
> > an invalid column group (or whatever you call it) and going on with
> > the next.
> >
> > Julian
> >
> > On 04.03.19 at 13:21, "Xiangdong Huang" wrote:
> >
> > Hi,
> >
> > If so, I think I need to add a new API that allows you to continue
> > writing data into an existing TsFile that was not closed correctly.
> > Then everything is fine for you :D
> >
> > Best,
> > ---
> > Xiangdong Huang
> > School of Software, Tsinghua University
> >
> >  黄向东
> > 清华大学 软件学院
> >
> >
> > Julian Feinauer wrote on Monday, March 4, 2019 at 8:08 PM:
> >
> > > Hey Xiangdong,
> > >
> > > thanks for the great explanation.
> > > And in fact, I agree with you that it would be best if we start to
> > > play around with it and report all our findings and wishes back to
> > > this list (in fact, that proved to be beneficial in plc4x as well).
> > >
> > > You confirmed my thoughts about the two "levels" of APIs (DB and
> > > file), and the file API is exactly what we were looking for for our
> > > use case, as we do not care much about data loss (when an edge device
> > > fails, it's... gone).
> > > The crucial point for us is that no corrupt files can be
>