[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877121#action_12877121 ]

Jean-Daniel Cryans commented on HBASE-50:
-----------------------------------------

bq. Yes. That sounds good. I will implement another LogCleanerDelegate, say ReferenceLogCleaner or SnapshotLogCleaner.

The latter. Some refactoring could be done on how to chain multiple delegates without doing a bunch of ifs in the code. Could be in the scope of another jira.

bq. Do you archive any other files besides log files, say HFiles?

AFAIK, no.

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
>
> Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HBASE-2069) Use the new 'visible' length feature added by hdfs-814
[ https://issues.apache.org/jira/browse/HBASE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-2069.
----------------------------------
    Resolution: Fixed

Fixed by HDFS append support.

> Use the new 'visible' length feature added by hdfs-814
> ------------------------------------------------------
>
>                 Key: HBASE-2069
>                 URL: https://issues.apache.org/jira/browse/HBASE-2069
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.21.0
[jira] Assigned: (HBASE-1025) Reconstruction log playback has no bounds on memory used
[ https://issues.apache.org/jira/browse/HBASE-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray reassigned HBASE-1025:
------------------------------------
    Assignee: Kannan Muthukkaruppan

> Reconstruction log playback has no bounds on memory used
> ---------------------------------------------------------
>
>                 Key: HBASE-1025
>                 URL: https://issues.apache.org/jira/browse/HBASE-1025
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Kannan Muthukkaruppan
>             Fix For: 0.21.0
>
> Makes a TreeMap and just keeps adding edits without regard for the size of the edits applied; could cause an OOME. (I've not seen a definitive case, though I have seen an OOME around the time of a reconstruction log replay -- perhaps this was the straw that broke the flea's antlers?)
[jira] Commented: (HBASE-2468) Improvements to prewarm META cache on clients
[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877244#action_12877244 ]

HBase Review Board commented on HBASE-2468:
-------------------------------------------

Message from: Mingjie Lai <mjla...@gmail.com>

bq. On 2010-06-07 14:23:42, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 96
bq. http://review.hbase.org/r/98/diff/5/?file=944#file944line96
bq.
bq. getRowOrBefore is an expensive call. Are we sure we are not calling this too often?

I agree it is an expensive call. However, I don't think it would bring any performance penalty for existing and potential use cases:

Use case 1 -- existing MetaScanner users: since this method is newly added, existing users won't be affected.

Use case 2 -- hbase clients when locating a region:
1) If prefetch is on, the client calls this MetaScanner with a [table + row] combination, which calls getRowOrBefore() to get the current region info, then a number of following regions from meta. After that, the client can get the region info directly from cache.
2) If prefetch is disabled (current behavior), it eventually calls the similar method getClosestRowBefore() to get the desired region.

So whether prefetch is on or not, getRowOrBefore() (or getClosestRowBefore()) is eventually called. The only difference is whether to scan the following regions from meta or not. For future MetaScanner users which scan from one region with a desired user table row, it has to take the effort since that is the expected behavior.

- Mingjie

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/98/#review144
---

> Improvements to prewarm META cache on clients
> ---------------------------------------------
>
>                 Key: HBASE-2468
>                 URL: https://issues.apache.org/jira/browse/HBASE-2468
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Todd Lipcon
>            Assignee: Mingjie Lai
>             Fix For: 0.21.0
>         Attachments: HBASE-2468-trunk.patch
>
> A couple of different use cases cause storms of reads to META during startup. For example, a large MR job will cause each map task to hit meta since it starts with an empty cache. A couple of possible improvements have been proposed:
> - MR jobs could ship a copy of META for the table in the DistributedCache
> - Clients could prewarm the cache by doing a large scan of all the meta for the table instead of random reads for each miss
> - Each miss could fetch ahead some number of rows in META
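The third proposal (fetch ahead on each miss) can be illustrated with a hedged sketch in plain Java. This is not HBase code: a `TreeMap` of region start keys stands in for META, the class name and constants are invented for illustration, and `floorEntry` models the "greatest start key at or before the row" lookup that region location performs.

```java
import java.util.*;

// Sketch of prefetch-on-miss: a cache miss fetches the matching region plus
// the next few META rows in one scan, so nearby misses hit the local cache.
public class PrefetchOnMissSketch {
    static final int PREFETCH = 3;                 // rows pulled per META read (illustrative)
    final TreeMap<String, String> meta;            // startKey -> server location
    final TreeMap<String, String> cache = new TreeMap<>();
    int metaReads = 0;

    PrefetchOnMissSketch(TreeMap<String, String> meta) { this.meta = meta; }

    String locate(String row) {
        Map.Entry<String, String> hit = cache.floorEntry(row);
        if (hit != null) return hit.getValue();    // served locally, no META read
        metaReads++;                               // one round trip to META...
        Map.Entry<String, String> e = meta.floorEntry(row);
        // ...that copies the matching region plus the following regions.
        int copied = 0;
        for (Map.Entry<String, String> m : meta.tailMap(e.getKey()).entrySet()) {
            if (copied++ == PREFETCH) break;
            cache.put(m.getKey(), m.getValue());
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> meta = new TreeMap<>();
        meta.put("", "rs1"); meta.put("k2", "rs2"); meta.put("k4", "rs3");
        PrefetchOnMissSketch client = new PrefetchOnMissSketch(meta);
        client.locate("k1");                       // miss: one META read, 3 rows cached
        client.locate("k3");                       // cache hit
        client.locate("k5");                       // cache hit
        System.out.println(client.metaReads);      // 1
    }
}
```

Three lookups cost a single META round trip instead of three, which is the whole point of the proposal.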
[jira] Commented: (HBASE-2468) Improvements to prewarm META cache on clients
[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877251#action_12877251 ]

HBase Review Board commented on HBASE-2468:
-------------------------------------------

Message from: Mingjie Lai <mjla...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/98/
---

(Updated 2010-06-09 15:50:59.084657)

Review request for hbase, Todd Lipcon and stack.

Changes
-------
@St^ack: please see my comments for your feedback regarding the getRowOrBefore() issue. Invited Todd as a reviewer.

Summary
-------
HBASE-2468: Improvements to prewarm META cache on clients. Changes:

1. Add new HTable methods which support region info de/serialization and region cache prewarm:
- void serializeRegionInfo(): clients can perform a large scan of all the meta for the table and serialize the meta to a file. An MR job can ship a copy of the meta for the table in the DistributedCache.
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): an MR job can deserialize the region info from the DistributedCache.
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): an MR job can prewarm the local region cache with the deserialized region info.

2. For each client, each region cache read-miss can trigger a read-ahead of some number of rows in META. This option can be turned on and off for one particular table.

This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468

Diffs
-----
src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092
src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb
src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07
src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a

Diff: http://review.hbase.org/r/98/diff

Testing
-------
Unit tests passed locally for me.

Thanks,
Mingjie
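The serialize/ship/deserialize round trip described in point 1 can be sketched in plain Java. This is an illustration, not the patch's code: plain strings stand in for HRegionInfo and HServerAddress, the class and method bodies are invented, and only the shape of the workflow (launcher writes a file for the DistributedCache, each task reads it back to warm its cache) follows the summary above.

```java
import java.io.*;
import java.util.*;

// Sketch of the proposed prewarm workflow: the job launcher serializes a
// snapshot of the region map to a file, and each MR task deserializes it,
// avoiding a storm of META lookups from every task's cold cache.
public class RegionCachePrewarmSketch {

    // Launcher side: write each (regionStartKey -> serverAddress) pair.
    static void serializeRegionInfo(Map<String, String> regions, File out) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(out))) {
            dos.writeInt(regions.size());
            for (Map.Entry<String, String> e : regions.entrySet()) {
                dos.writeUTF(e.getKey());
                dos.writeUTF(e.getValue());
            }
        }
    }

    // Task side: read the map back; the result would seed the local region cache.
    static Map<String, String> deserializeRegionInfo(File in) throws IOException {
        Map<String, String> regions = new TreeMap<>();
        try (DataInputStream dis = new DataInputStream(new FileInputStream(in))) {
            int n = dis.readInt();
            for (int i = 0; i < n; i++) {
                regions.put(dis.readUTF(), dis.readUTF());
            }
        }
        return regions;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> regions = new TreeMap<>();
        regions.put("", "rs1:60020");          // first region: empty start key
        regions.put("row5000", "rs2:60020");
        File f = File.createTempFile("regions", ".bin");
        f.deleteOnExit();
        serializeRegionInfo(regions, f);
        System.out.println(deserializeRegionInfo(f).equals(regions)); // true: round trip preserves the map
    }
}
```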
[jira] Commented: (HBASE-2468) Improvements to prewarm META cache on clients
[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877281#action_12877281 ]

HBase Review Board commented on HBASE-2468:
-------------------------------------------

Message from: Todd Lipcon <t...@cloudera.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/98/#review165
---

Looking good! Just a few notes.

src/main/java/org/apache/hadoop/hbase/client/HConnection.java
http://review.hbase.org/r/98/#comment813
I thought we were collapsing these two calls into setRegionCachePrefetchEnabled(tableName, enabled)?

src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
http://review.hbase.org/r/98/#comment816
I don't entirely understand why we key these hashes by integer, but it seems like you're following the status quo, so it doesn't need to be addressed in this patch.

src/main/java/org/apache/hadoop/hbase/client/HTable.java
http://review.hbase.org/r/98/#comment822
I still don't quite understand the logic behind why these should be static. Previously you pointed to the enable/disable calls, but those are more like admin calls, not calls that affect client behavior. Anyone else have an opinion?

src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
http://review.hbase.org/r/98/#comment823
I think this should be Math.max(rowLimit, configuration.getInt(...)) - if we only want to scan 5 rows, we don't want the scanner to prefetch 100 for us.

- Todd
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877280#action_12877280 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Jeff Hammerbacher <jeff.hammerbac...@gmail.com>

bq. On 2010-06-09 17:54:20, Ryan Rawson wrote:
bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 111
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line111
bq.
bq. Do we need to make these fields nullable? Usually they are true/false in the Java code. Is this some semi-mechanical translation from a Java API?

I use the same Avro record for table creation and modification as well as description. For create table, I want the fields to be nullable because the user should not have to specify a value.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 94
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line94
bq.
bq. The compression can never be null, because NONE is the catch-all here. Same as below.

I use the same record for family creation, modification, and description. Avro currently doesn't have default values on write, so making this field nullable means we can do smart things if the user doesn't specify a compression algorithm during family creation.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 78
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line78
bq.
bq. Same as deadServerNames.

Yeah, I should make these 0-length arrays.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 73
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line73
bq.
bq. Couldn't you use an empty string if there are no dead server names? I'm not sure if arrays can be 0 length in Avro :-)

Will make a 0-length array.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 66
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line66
bq.
bq. Technically the serverName is the serverAddress + startCode... in the Java code it isn't fully exposed. Not sure what we want to do here, but this is probably fine as is.

Yeah, since Avro records don't have methods, you can think of this field as a materialization of the Java logic.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 34
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line34
bq.
bq. You can probably just use 'hostname' and 'port'. There was a recent patch in trunk that is attempting to get rid of IP addresses (they cause issues when they don't align with DNS names, etc.) and generally move us to a DNS-name world.

Let me know what you want me to do here. I was just copying the fields directly from the Java objects.

- Jeff

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review164
---

> new connector for Avro RPC access to HBase cluster
> --------------------------------------------------
>
>                 Key: HBASE-2400
>                 URL: https://issues.apache.org/jira/browse/HBASE-2400
>             Project: HBase
>          Issue Type: Task
>          Components: avro
>            Reporter: Andrew Purtell
>            Priority: Minor
>         Attachments: HBASE-2400-v0.patch
>
> Build a new connector contrib architecturally equivalent to the Thrift connector, but using Avro serialization and associated transport and RPC server work. Support AAA (audit, authentication, authorization).
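The nullable-field pattern under discussion can be sketched in Avro IDL (the genavro syntax). This is an illustrative fragment, not the patch's actual schema: the record and field names loosely follow the discussion, and the exact definitions live in hbase.genavro.

```
record AFamilyDescriptor {
  bytes name;
  // A union with null lets the same record serve create, modify, and
  // describe: a null compression on create means "server picks the default",
  // since Avro (at the time) had no default values on write.
  union { null, ACompressionAlgorithm } compression;
  union { null, boolean } inMemory;
  union { null, int } maxVersions;
}
```

The trade-off Ryan raises is that on the read/describe path these fields are never actually null, so the union widens the schema beyond what that path needs.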
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877283#action_12877283 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Ryan Rawson <ryano...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review168
---

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment824
In this case we have to distinguish between 'give me family X' and 'give me family X, 0-length qualifier', which are in fact different queries and are both representable in the standard Get Java API. The Java code does this by using a map of a map in the Get object:

Map<byte[], Set<byte[]>> familyMap;

where the key is the family and the value is the set of qualifiers for that family. If you want to get a whole family, the code uses null as the Set value. For the Avro API we don't have to do it the same way, but we need to know the difference between those queries. Perhaps an AColumn with 'family = foo, qualifier = null' can be 'give me the family', and 'family = foo, qualifier = 0-length bytes' can be the other?

- Ryan
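The null-set vs. zero-length-qualifier distinction Ryan describes can be demonstrated with a minimal sketch in plain Java. This is not HBase's actual Get class: strings stand in for byte[], and the class and method names are invented; only the familyMap convention itself follows the comment above.

```java
import java.util.*;

// Sketch of the familyMap convention: a null qualifier set means "the whole
// family", while a set containing a zero-length qualifier selects only the
// cell whose qualifier is the empty byte string.
public class FamilyMapSketch {
    // family -> set of qualifiers; a null set selects every qualifier
    private final Map<String, Set<String>> familyMap = new TreeMap<>();

    void addFamily(String family) {
        familyMap.put(family, null);               // whole-family request
    }

    void addColumn(String family, String qualifier) {
        familyMap.computeIfAbsent(family, k -> new TreeSet<>()).add(qualifier);
    }

    boolean matches(String family, String qualifier) {
        if (!familyMap.containsKey(family)) return false;
        Set<String> quals = familyMap.get(family);
        return quals == null || quals.contains(qualifier);
    }

    public static void main(String[] args) {
        FamilyMapSketch wholeFamily = new FamilyMapSketch();
        wholeFamily.addFamily("foo");
        System.out.println(wholeFamily.matches("foo", "anything"));    // true

        FamilyMapSketch emptyQualifier = new FamilyMapSketch();
        emptyQualifier.addColumn("foo", "");       // 0-length qualifier only
        System.out.println(emptyQualifier.matches("foo", ""));         // true
        System.out.println(emptyQualifier.matches("foo", "anything")); // false
    }
}
```

Both requests name family "foo", but they select different cells, which is why the Avro schema needs some way to encode the difference.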
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877285#action_12877285 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Jeff Hammerbacher <jeff.hammerbac...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/
---

(Updated 2010-06-09 18:22:12.245370)

Review request for hbase.

Changes
-------
Addressed Ryan's requests.

Summary
-------
Initial patch; some javadoc and tests missing, but I wanted to get some initial feedback on the approach. My apologies for sticking a patch on the JIRA before the review. I should have read further on HowToContribute.

This addresses bug HBASE-2400.

Diffs (updated)
---------------
trunk/bin/hbase 953193
trunk/pom.xml 953193
trunk/src/main/java/org/apache/hadoop/hbase/avro/AvroServer.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/AvroUtil.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AAlreadyExists.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AClusterStatus.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AColumn.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AColumnFamilyDescriptor.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AColumnValue.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ACompressionAlgorithm.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ADelete.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AFamilyDescriptor.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AGet.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AIOError.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AIllegalArgument.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AMasterNotRunning.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/APut.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ARegionLoad.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AResult.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AResultEntry.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AScan.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AServerAddress.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AServerInfo.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AServerLoad.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ATableDescriptor.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ATableExists.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ATimeRange.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/HBase.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/IOError.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/TCell.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.avpr PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/package.html PRE-CREATION
trunk/src/test/java/org/apache/hadoop/hbase/avro/TestAvroServer.java PRE-CREATION

Diff: http://review.hbase.org/r/128/diff

Testing
-------

Thanks,
Jeff
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877294#action_12877294 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Ryan Rawson <ryano...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review171
---

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.avpr
http://review.hbase.org/r/128/#comment832
If this is generated from the genavro, is it possible to get a Maven rule to generate it? Or is that not ready yet?

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment827
Does it make sense to reuse AColumn here?

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment828
The Java API gets its speed by essentially taking a Result, which is an array of KeyValue, which are just byte arrays, and serializing it all as one large array. On the client side, the client reads the entire array and then builds the KeyValues that provide a view onto this one array. I don't know how this performance improvement could be done in this Avro interface, but I thought I'd bring it up for reference.

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment829
It would be nice to collapse AResultEntry and AColumnValue; they seem to be almost the same thing.

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment830
Technically speaking, getRowOrBefore() isn't a 'public' method; it is supposed to be used mostly for META support, and I think we are trending towards 'don't use for general purpose'.

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment831
These APIs are good, but I'm wondering if you'd be open to a new experimental scanner API we have been interested in for the base RPC... essentially, right now you need 3 RPC calls even to retrieve a small amount of data. What would an API look like that lets you open, get rows, and have implicit closes if you hit the end of the scan within your 'number of records' parameter? We'd still have explicit closes for premature client-driven scan terminations, but if your goal is to scan to the end, then why do an explicit close? Also, why not have the 'open' also start to return data? The returned value would probably have to be a struct. This is more of an optional exercise, so if you don't feel the need, it's fine.

- Ryan
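The "open also returns data, implicit close at end-of-scan" idea above can be sketched in plain Java. Everything here is hypothetical: the class, the OpenResult struct, and the -1 sentinel are invented for illustration, with a List standing in for a region server and each method call modeling one RPC round trip.

```java
import java.util.*;

// Sketch of a combined open-and-fetch scanner call: a short scan costs one
// round trip instead of the usual open / next / close sequence.
public class ScannerApiSketch {
    private final List<String> rows;   // rows held by the "server"
    private int rpcCount = 0;

    ScannerApiSketch(List<String> rows) { this.rows = rows; }

    // The struct a combined call would return: a scanner handle plus the
    // first batch. A handle of -1 means the server closed the scanner
    // implicitly because the batch reached the end of the scan.
    static final class OpenResult {
        final int scannerId;
        final List<String> batch;
        OpenResult(int scannerId, List<String> batch) {
            this.scannerId = scannerId;
            this.batch = batch;
        }
    }

    // One RPC: open the scanner AND return up to numRows rows.
    OpenResult openWithData(int numRows) {
        rpcCount++;
        List<String> batch = new ArrayList<>(rows.subList(0, Math.min(numRows, rows.size())));
        boolean exhausted = batch.size() == rows.size();
        return new OpenResult(exhausted ? -1 : 1, batch);
    }

    int rpcCount() { return rpcCount; }

    public static void main(String[] args) {
        ScannerApiSketch server = new ScannerApiSketch(Arrays.asList("r1", "r2", "r3"));
        OpenResult r = server.openWithData(10);   // ask for more rows than exist
        System.out.println(r.batch);              // [r1, r2, r3]
        System.out.println(r.scannerId);          // -1: closed implicitly
        System.out.println(server.rpcCount());    // 1 round trip, not 3
    }
}
```

An explicit close call would still exist for the case where the client abandons a scan before reaching the end.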
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877296#action_12877296 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Jeff Hammerbacher <jeff.hammerbac...@gmail.com>

bq. On 2010-06-09 19:24:25, Ryan Rawson wrote:
bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.avpr, line 1
bq. http://review.hbase.org/r/128/diff/3/?file=1190#file1190line1
bq.
bq. If this is generated from the genavro, is it possible to get a Maven rule to generate this? Or is that not ready yet?

Yes, this should definitely be done during the build. See https://issues.apache.org/jira/browse/AVRO-572.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 155
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line155
bq.
bq. The Java API gets its speed by essentially taking a Result, which is an array of KeyValue, which are just byte arrays, and serializing it all as one large array. On the client side, the client reads the entire array and then builds the KeyValues that provide a view onto this one array. I don't know how this performance improvement could be done in this Avro interface, but I thought I'd bring it up for reference.

My comment here is not about performance considerations; it's about concision, and it relates to your previous comment (on line 140): AColumn, AResultEntry, and AColumnValue all do approximately the same thing. I could make the fields nullable and use one Avro record for each. Pro: I have fewer generated classes. Con: the generated class I have is less task-specific. To be honest, since there are not a lot of Avro services out there, it's hard to say which is the best practice. I'm happy to take feedback, but I decided that being more verbose with my number of objects was better.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 156
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line156
bq.
bq. It would be nice to collapse AResultEntry and AColumnValue; they seem to be almost the same thing.

(see above comment)

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 268
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line268
bq.
bq. These APIs are good, but I'm wondering if you'd be open to a new experimental scanner API we have been interested in for the base RPC... essentially, right now you need 3 RPC calls even to retrieve a small amount of data. What would an API look like that lets you open, get rows, and have implicit closes if you hit the end of the scan within your 'number of records' parameter? We'd still have explicit closes for premature client-driven scan terminations, but if your goal is to scan to the end, then why do an explicit close? Also, why not have the 'open' also start to return data? The returned value would probably have to be a struct. This is more of an optional exercise, so if you don't feel the need, it's fine.

Yeah, that would be nice; you could return (int scannerId, bytes[] row, resultScanner result). In the Python client, I don't expose open/close; the Python clients just scan.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 230
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line230
bq.
bq. Technically speaking, getRowOrBefore() isn't a 'public' method; it is supposed to be used mostly for META support, and I think we are trending towards 'don't use for general purpose'.

Noted. I will remove the comment.

- Jeff

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review171
---