[jira] [Commented] (HBASE-7645) put without timestamp duplicates the record/row

2013-01-22 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559780#comment-13559780
 ] 

Anoop Sam John commented on HBASE-7645:
---

When you put data without specifying a timestamp, HBase assigns one for the 
insert: the system time at the region server (RS). So when you put the same 
data set again, each cell gets a new timestamp, which makes it a new version. 
This is expected behavior.
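
To make the versioning rule concrete, here is a minimal sketch in plain Java. It is a toy model, not HBase code: `VersionedCell` and its injected clock are hypothetical names standing in for a single cell and the region server's system time. It only illustrates that versions are keyed by timestamp, so the same value written at two different times yields two versions, while a repeated put at the same explicit timestamp replaces the existing one.

```java
import java.util.TreeMap;
import java.util.function.LongSupplier;

/** Toy model of a single cell (hypothetical, not HBase code). */
final class VersionedCell {
    // timestamp -> value: the value plays no part in the key
    private final TreeMap<Long, String> versions = new TreeMap<>();
    // stands in for the region server's system time
    private final LongSupplier serverClock;

    VersionedCell(LongSupplier serverClock) {
        this.serverClock = serverClock;
    }

    /** Put without a timestamp: the "server" assigns its current time. */
    void put(String value) {
        put(serverClock.getAsLong(), value);
    }

    /** Put with an explicit timestamp: an existing version at ts is replaced. */
    void put(long ts, String value) {
        versions.put(ts, value);
    }

    int versionCount() {
        return versions.size();
    }
}

final class VersionedCellDemo {
    public static void main(String[] args) {
        long[] now = {1358790515451L};               // yesterday's run
        VersionedCell cell = new VersionedCell(() -> now[0]);
        cell.put("2011-12-21 18:07:38.0");
        now[0] = 1358853505756L;                     // today's run
        cell.put("2011-12-21 18:07:38.0");
        System.out.println(cell.versionCount());     // prints 2

        VersionedCell pinned = new VersionedCell(() -> now[0]);
        pinned.put(1L, "2011-12-21 18:07:38.0");     // explicit timestamp
        pinned.put(1L, "2011-12-21 18:07:38.0");     // same timestamp again
        System.out.println(pinned.versionCount());   // prints 1
    }
}
```

In this model, a client that wants re-imports to be idempotent could derive the timestamp from the data itself instead of leaving it to the server clock.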

 put without timestamp duplicates the record/row
 ---

 Key: HBASE-7645
 URL: https://issues.apache.org/jira/browse/HBASE-7645
 Project: HBase
  Issue Type: Brainstorming
  Components: Client
Reporter: Guido Serra aka Zeph

 If I run Sqoop a couple of times on the same dataset, outputting to HBase,
 I end up with duplicated data:
 {code}
 hbase(main):030:0> get 'dump_HKFAS.sales_order', '1', {COLUMN => 'mysql:created_at', VERSIONS => 4}
 COLUMN                CELL
 mysql:created_at      timestamp=1358853505756, value=2011-12-21 18:07:38.0
 mysql:created_at      timestamp=1358790515451, value=2011-12-21 18:07:38.0
 2 row(s) in 0.0040 seconds
 today's sqoop run
 hbase(main):031:0> Date.new(1358853505756).toString()
 => Tue Jan 22 11:18:25 UTC 2013
 yesterday's sqoop run
 hbase(main):032:0> Date.new(1358790515451).toString()
 => Mon Jan 21 17:48:35 UTC 2013
 {code}
 Is it by design, or a bug, that Put.add() writes the KeyValue without checking 
 whether anything other than the timestamp has changed?
 What is the idea behind this? Is Sqoop (the client application) supposed to 
 read the current value before issuing an add() call?
 From trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/Put.java:
 {code}
   public Put add(byte [] family, byte [] qualifier, byte [] value) {
     return add(family, qualifier, this.ts, value);
   }
   public Put add(byte [] family, byte [] qualifier, long ts, byte [] value) {
     List<KeyValue> list = getKeyValueList(family);
     KeyValue kv = createPutKeyValue(family, qualifier, ts, value);
     list.add(kv);
     familyMap.put(kv.getFamily(), list);
     return this;
   }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7645) put without timestamp duplicates the record/row

2013-01-22 Thread Guido Serra aka Zeph (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13559783#comment-13559783
 ] 

Guido Serra aka Zeph commented on HBASE-7645:
-

[~anoopsamjohn] uh, ok... so that is expected then. Thanks for clarifying :)

I'll handle it on client side then
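
One way such client-side handling could look is a read-before-write check: fetch the latest value and skip the put when nothing has changed. The sketch below is only an illustration against an in-memory stand-in for one cell; `SkipUnchangedCell` and `putIfChanged` are made-up names, not part of the HBase or Sqoop API.

```java
import java.util.Objects;
import java.util.TreeMap;

/** In-memory stand-in for one cell (hypothetical, not HBase code). */
final class SkipUnchangedCell {
    // timestamp -> value, newest version last
    private final TreeMap<Long, String> versions = new TreeMap<>();

    /**
     * Writes a new version only if the value differs from the latest one.
     * Returns true when a version was actually written.
     */
    boolean putIfChanged(long ts, String value) {
        String latest = versions.isEmpty() ? null : versions.lastEntry().getValue();
        if (Objects.equals(latest, value)) {
            return false; // unchanged: do not create another version
        }
        versions.put(ts, value);
        return true;
    }

    int versionCount() {
        return versions.size();
    }
}

final class SkipUnchangedDemo {
    public static void main(String[] args) {
        SkipUnchangedCell cell = new SkipUnchangedCell();
        cell.putIfChanged(1358790515451L, "2011-12-21 18:07:38.0"); // first import
        cell.putIfChanged(1358853505756L, "2011-12-21 18:07:38.0"); // re-import, skipped
        System.out.println(cell.versionCount()); // prints 1
    }
}
```

Against a real table this would cost an extra read per cell, and because the read and the write are separate steps, concurrent writers could still race.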

 put without timestamp duplicates the record/row
 ---

 Key: HBASE-7645
 URL: https://issues.apache.org/jira/browse/HBASE-7645
 Project: HBase
  Issue Type: Brainstorming
  Components: Client
Reporter: Guido Serra aka Zeph
Priority: Trivial




[jira] [Commented] (HBASE-7645) put without timestamp duplicates the record/row

2013-01-22 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13560207#comment-13560207
 ] 

Enis Soztutar commented on HBASE-7645:
--

You can also set VERSIONS=1 in the column family descriptor. HBase will then 
keep no more than one version of each cell.
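
The effect of capping the number of versions can be pictured with a small in-memory model. This is a hedged sketch, not HBase's implementation: `BoundedVersionCell` is a hypothetical class that trims eagerly on every put, whereas HBase, as far as I know, discards excess versions later, during compaction.

```java
import java.util.TreeMap;

/** Toy cell that retains at most maxVersions versions (hypothetical). */
final class BoundedVersionCell {
    private final int maxVersions;
    // timestamp -> value, oldest version first
    private final TreeMap<Long, String> versions = new TreeMap<>();

    BoundedVersionCell(int maxVersions) {
        if (maxVersions < 1) {
            throw new IllegalArgumentException("maxVersions must be >= 1");
        }
        this.maxVersions = maxVersions;
    }

    void put(long ts, String value) {
        versions.put(ts, value);
        // drop the oldest versions until the cap is respected
        while (versions.size() > maxVersions) {
            versions.pollFirstEntry();
        }
    }

    int versionCount() {
        return versions.size();
    }

    long latestTimestamp() {
        return versions.lastKey();
    }
}

final class BoundedVersionDemo {
    public static void main(String[] args) {
        BoundedVersionCell cell = new BoundedVersionCell(1);
        cell.put(1358790515451L, "2011-12-21 18:07:38.0"); // yesterday's run
        cell.put(1358853505756L, "2011-12-21 18:07:38.0"); // today's run
        System.out.println(cell.versionCount());    // prints 1
        System.out.println(cell.latestTimestamp()); // prints 1358853505756
    }
}
```

Note that in this model the re-import still rewrites the cell with a newer timestamp; only the older copy stops being kept.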

