Re: Storing data with long history of versions

Ted Yu Mon, 10 Nov 2014 07:55:47 -0800

See this recent thread: http://search-hadoop.com/m/DHED4pDVFG1


Before a major compaction, the query may have to fetch data from multiple
HFiles. This would be slower than fetching data from a single file. As for
how large the difference in query time is, you can run the query on your
own data to get a more concrete idea.
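
As a rough illustration of the point above, here is a minimal Python sketch
(not HBase code) of a read that has to merge several sorted files, compared
with a read after everything has been rewritten into one file. The cell
layout, the date-style timestamps, and the names read_versions and
major_compact are all made up for this example; the real read path also
involves the MemStore, block cache, and Bloom filters, none of which are
modeled here.

```python
import heapq

# Each hypothetical "store file" is a list of cells sorted by
# (row, qualifier, newest timestamp first).
# A cell is modeled as (row, qualifier, timestamp, value).

def sort_key(cell):
    row, qualifier, timestamp, _value = cell
    # Negate the timestamp so newer versions sort first within a column.
    return (row, qualifier, -timestamp)

def read_versions(store_files, row, qualifier):
    """Merge all store files and return every version of one column, newest first."""
    merged = heapq.merge(*store_files, key=sort_key)
    return [(ts, value) for (r, q, ts, value) in merged
            if r == row and q == qualifier]

def major_compact(store_files):
    """Rewrite all store files into a single sorted file."""
    return [list(heapq.merge(*store_files, key=sort_key))]

# Three flushes, each producing its own file with a newer version of the cell.
files = [
    [("device1", "value", 20141108, b"a")],
    [("device1", "value", 20141109, b"b")],
    [("device1", "value", 20141110, b"c"),
     ("device2", "value", 20141110, b"x")],
]

before = read_versions(files, "device1", "value")                # touches 3 files
after = read_versions(major_compact(files), "device1", "value")  # touches 1 file
```

Both reads return the same versions; the difference is only in how many
files have to be opened and merged, which is what you would want to time on
your own data.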

Cheers

On Mon, Nov 10, 2014 at 7:45 AM, Bill Q <[email protected]> wrote:

> Hi Ted,
> Thanks a lot.
>
> When would it break? Would you please give some details of why the size
> would be a decision factor?
>
> I will probably have 10 cells that get daily updates. The rest of the cells
> in the column family will only have a handful of versions. So, the cells in
> the same column family will be very skewed in terms of number of versions.
>
> And before a major compaction, if I try to grab all the versions of the
> cell, will there be any performance issue? I plan to run a batch process on
> hundreds of thousands of devices, pulling out all the versions of those
> few cells.
>
> On Monday, November 10, 2014, Ted Yu <[email protected]> wrote:
>
> > Half a million timestamps at 20 bytes per cell comes to about 10MB.
> > That should be fine for your client.
> >
> > Cheers
> >
> > On Mon, Nov 10, 2014 at 7:23 AM, Bill Q <[email protected]> wrote:
> >
> > > Hi Ted,
> > > Thanks a lot for the reply.
> > >
> > > For #1, the size of the value alone will be around 20 bytes for each
> > > cell. And there will be hundreds of thousands of timestamps per cell,
> > > but not millions. Any suggestions?
> > >
> > > Many thanks.
> > >
> > >
> > > Cao
> > >
> > > On Monday, November 10, 2014, Ted Yu <[email protected]> wrote:
> > >
> > > > For #1, what's the expected size of the data you want to store?
> > > >
> > > > For #2, the new data inserted under column:value with a newer
> > > > timestamp would be stored in a different HFile. Old and new data
> > > > would be consolidated after major compaction.
> > > >
> > > > Cheers
> > > >
> > > > On Mon, Nov 10, 2014 at 6:21 AM, Bill Q <[email protected]> wrote:
> > > >
> > > > > Hi,
> > > > > I am designing a schema to store time series data for each device.
> > > > > And I have a couple of questions that I am not quite sure about.
> > > > >
> > > > > 1. *Is there any downside to storing the data in the same
> > > > > columnfamily:column with a long history of custom timestamps?*
> > > > >
> > > > > For example, I have historical daily data for a device. I would like
> > > > > to use only one column qualifier to store it, with a custom
> > > > > timestamp that is the date the data was collected. So, when I query
> > > > > the data, I can easily pull all the time series data for this
> > > > > particular device in one scan.
> > > > >
> > > > > 2. *After a storefile is finalized and becomes immutable, what would
> > > > > happen when someone updates the row?*
> > > > >
> > > > > For example, if I insert a new column:value with a newer timestamp
> > > > > into the same row:columnfamily, where is this new key/value pair
> > > > > going to sit in HDFS? Is it close to the previous K/V pairs in the
> > > > > storefile?
> > > > >
> > > > >
> > > > > Many thanks.
> > > > >
> > > > >
> > > > > Bill
> > > > >
> > > >
> > >
> > >
> > > --
> > > Many thanks.
> > >
> > >
> > > Bill
> > >
> >
>
>
> --
> Many thanks.
>
>
> Bill
>
