I used the shell to create the table, which explains why it only
stored 3 versions. I will switch to using the Java API to create the
tables. Another question: for my prototype, I am currently sinking all
data into the same table. Is there a heavy cost to creating a new
instance of HTable?
My code may look like this:
for (String tableName : tableList) {
  List<Put> list = ...;
  // opens a new HTable (with a new configuration) for every table name
  HTable hbase = new HTable(new HBaseConfiguration(), tableName);
  hbase.put(list);
}
Or should I keep the HTable instances in a hash map and reuse them later?
regards,
Eric
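A minimal sketch of the "keep them in a hash and reuse them" option, assuming a single writer thread (an HTable instance should not be shared across threads for writes). To keep it runnable without a cluster, `TableHandleCache` and its `opener` function are hypothetical stand-ins: in real code the handle type would be HTable and the opener something like `name -> new HTable(conf, name)` with one shared HBaseConfiguration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache of per-table handles; T stands in for HTable.
// Not thread-safe: intended for a single writer thread.
class TableHandleCache<T> {
    private final Map<String, T> handles = new HashMap<>();
    // Hypothetical opener, e.g. name -> new HTable(conf, name) in real code.
    private final Function<String, T> opener;
    private int opens = 0;

    TableHandleCache(Function<String, T> opener) {
        this.opener = opener;
    }

    // Returns the cached handle for tableName, opening it on first use only.
    T get(String tableName) {
        return handles.computeIfAbsent(tableName, name -> {
            opens++;
            return opener.apply(name);
        });
    }

    // How many times the opener actually ran.
    int openCount() {
        return opens;
    }

    public static void main(String[] args) {
        TableHandleCache<String> cache =
                new TableHandleCache<>(name -> "handle:" + name);
        String[] batches = {"t1", "t2", "t1", "t1", "t2"};
        for (String table : batches) {
            cache.get(table); // a real writer would then call put(list) on the handle
        }
        System.out.println(cache.openCount()); // prints 2
    }
}
```

Each table is opened once no matter how many batches are written to it; how much that saves depends on what opening a table costs in a given deployment, which this sketch does not measure.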
On Sat, Jul 3, 2010 at 5:43 PM, Jonathan Gray <[email protected]> wrote:
> Have you looked at Scan.setMaxVersions(int)? Is that what you're looking for?
>
> Also, when you created the table, it had a default max of three versions.
> Did you use the java API or the shell to create your table?
>
> HColumnDescriptor.setMaxVersions(int) is what you want to set when you create
> the table initially. To keep all versions, use
> setMaxVersions(Integer.MAX_VALUE).
>
> JG
>
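The max-versions point can be illustrated with a toy model (not HBase code) of a single column: the 60 per-minute timestamps of the hour-long row described in the quoted message are written once with a three-version limit (the default) and once with `Integer.MAX_VALUE`. `VersionedCell` and its methods are hypothetical; in the real API the limit is set with HColumnDescriptor.setMaxVersions(int) when the table is created. The model simplifies by trimming excess versions eagerly on every write and by keeping one value per timestamp, whereas HBase, roughly, filters excess versions on reads and discards them during compactions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of one column's versions under a max-versions limit.
class VersionedCell {
    private final int maxVersions;
    // timestamp -> value, iterated newest first
    private final NavigableMap<Long, String> versions =
            new TreeMap<>(Collections.reverseOrder());

    VersionedCell(int maxVersions) {
        this.maxVersions = maxVersions;
    }

    void put(long timestamp, String value) {
        versions.put(timestamp, value); // same timestamp: value is replaced (simplification)
        // Drop the oldest versions beyond the limit (last entry = smallest timestamp).
        while (versions.size() > maxVersions) {
            versions.pollLastEntry();
        }
    }

    // Timestamps in the half-open range [startTs, endTs), newest first.
    List<Long> timestampsInRange(long startTs, long endTs) {
        List<Long> out = new ArrayList<>();
        for (long ts : versions.keySet()) {
            if (ts >= startTs && ts < endTs) {
                out.add(ts);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long hour = 1278187200000L; // the row's hour, taken from the thread
        long minute = 60_000L;
        VersionedCell defaults = new VersionedCell(3);
        VersionedCell unlimited = new VersionedCell(Integer.MAX_VALUE);
        for (int i = 0; i < 60; i++) {
            defaults.put(hour + i * minute, "v" + i);
            unlimited.put(hour + i * minute, "v" + i);
        }
        long end = hour + 60 * minute;
        System.out.println(defaults.timestampsInRange(hour, end).size() + " "
                + unlimited.timestampsInRange(hour, end).size()); // prints "3 60"
    }
}
```

With the default limit only the three newest minutes survive, which matches the symptom reported in the thread; with `Integer.MAX_VALUE` all 60 remain.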
>> -----Original Message-----
>> From: Eric Yang [mailto:[email protected]]
>> Sent: Saturday, July 03, 2010 4:19 PM
>> To: [email protected]
>> Subject: Re: stargate retrieve multiple version of a cell
>>
>> Hi Jonathan,
>>
>> I am trying to store large time-series data. I am using a row to
>> group one hour's data. My row contains 60 timestamps, and each
>> timestamp has various cell values. I am hoping this will produce rows
>> that are not too wide and a table that is somewhat shorter. I am fine
>> with unordered versions, as long as I get the timestamps back when
>> data is retrieved for a timestamp range. When I scan for the cell, I
>> only get the most recent three versions of the cell.
>>
>> This was tested on HBase 0.20.5 and Hadoop 0.20.2.
>>
>> regards,
>> Eric
>>
>>
>>
>> On Sat, Jul 3, 2010 at 2:34 PM, Jonathan Gray <[email protected]> wrote:
>> > What exactly are you trying to do with the timestamp? Currently even
>> > duplicates are retained and returned, but the order is not guaranteed
>> > (though we are working on this).
>> >
>> > The behavior depends only on the time/order of operations; there is
>> > no difference when using different clients (not counting effects of
>> > write buffering).
>> >
>> > JG
>> >
>> >> -----Original Message-----
>> >> From: Eric Yang [mailto:[email protected]]
>> >> Sent: Saturday, July 03, 2010 2:32 PM
>> >> To: [email protected]
>> >> Subject: Re: stargate retrieve multiple version of a cell
>> >>
>> >> I think I just found the answer to my own question. It was not
>> >> Stargate's problem. The data was not stored in HBase as I expected
>> >> it to be. This raised a more basic question:
>> >>
>> >> I am storing data like this:
>> >>
>> >> Put row1, cf1:c1: 0, timestamp: 10
>> >> Put row1, cf1:c2: 10, timestamp: 10
>> >> Put row1, cf1:c2: 15, timestamp: 20
>> >> Put row1, cf1:c1: 1, timestamp: 20
>> >>
>> >> I am updating individual columns by timestamp, and repeating this
>> >> 60 times for each of the columns. This is all executed by the same
>> >> client. When I scan for "row1, c2", would I get 60 different values,
>> >> one for each timestamp?
>> >>
>> >> What would happen if these kinds of updates were applied by
>> >> different HBase clients?
>> >>
>> >> regards,
>> >> Eric
>> >>
>> >> On Sat, Jul 3, 2010 at 1:56 PM, Eric Yang <[email protected]> wrote:
>> >> > Hi all,
>> >> >
>> >> > I am trying to use Stargate to get multiple versions of a cell, and
>> >> > my query looks like this:
>> >> >
>> >> > http://localhost:9090/chukwa/1278180000000-Eric-Yangs-iMac.local/Hadoop_dfs_namenode:CreateFileOps/1278183540000/1278189900000
>> >> >
>> >> > table name: chukwa
>> >> > row: 1278187200000-Eric-Yangs-iMac.local
>> >> > column: Hadoop_dfs_namenode:CreateFileOps
>> >> > start-timestamp: 1278183540000
>> >> > end-timestamp: 1278189900000
>> >> >
>> >> > It only shows me the most recent 3 versions, but not all the
>> >> > versions in this time range. Is this the right syntax? What am I
>> >> > doing wrong?
>> >> > Thanks
>> >> >
>> >> > regards,
>> >> > Eric
>> >> >
>> >
>