[jira] Commented: (HBASE-50) Snapshot of table
[ https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877121#action_12877121 ]

Jean-Daniel Cryans commented on HBASE-50:
-----------------------------------------

bq. Yes. That sounds good. I will implement another LogCleanerDelegate, say ReferenceLogCleaner or SnapshotLogCleaner.

The latter. Some refactoring could be done on how to chain multiple delegates without doing a bunch of ifs in the code. Could be in the scope of another jira.

bq. Do you archive any other files besides log files, say HFiles?

AFAIK, no.

> Snapshot of table
> -----------------
>
>                 Key: HBASE-50
>                 URL: https://issues.apache.org/jira/browse/HBASE-50
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Billy Pearson
>            Assignee: Li Chongxin
>            Priority: Minor
>         Attachments: HBase Snapshot Design Report V2.pdf, snapshot-src.zip
>
> Having an option to take a snapshot of a table would be very useful in production. What I would like this option to do is merge all the data into one or more files stored in the same folder on the DFS. This way we could save data in case of a software bug in Hadoop or user code. The other advantage would be the ability to export a table to multiple locations. Say I had a read-only table that must be online. I could take a snapshot of it when needed, export it to a separate data center, and have it loaded there; then I would have it online at multiple data centers for load balancing and failover. I understand that Hadoop removes the need for backups to protect against failed servers, but this does not protect us from software bugs that might delete or alter data in ways we did not plan. We should have a way to roll back a dataset.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (HBASE-2069) Use the new 'visible' length feature added by hdfs-814
[ https://issues.apache.org/jira/browse/HBASE-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray resolved HBASE-2069.
----------------------------------
    Resolution: Fixed

Fixed by HDFS append support.

> Use the new 'visible' length feature added by hdfs-814
> ------------------------------------------------------
>
>                 Key: HBASE-2069
>                 URL: https://issues.apache.org/jira/browse/HBASE-2069
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>             Fix For: 0.21.0
[jira] Assigned: (HBASE-1025) Reconstruction log playback has no bounds on memory used
[ https://issues.apache.org/jira/browse/HBASE-1025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Gray reassigned HBASE-1025:
------------------------------------
    Assignee: Kannan Muthukkaruppan

> Reconstruction log playback has no bounds on memory used
> ---------------------------------------------------------
>
>                 Key: HBASE-1025
>                 URL: https://issues.apache.org/jira/browse/HBASE-1025
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Kannan Muthukkaruppan
>             Fix For: 0.21.0
>
> Makes a TreeMap and just keeps adding edits without regard for the size of the edits applied; could cause an OOME. (I've not seen a definitive case, though I have seen an OOME around the time of a reconstruction log replay -- perhaps this was the straw that broke the flea's antlers?)
[jira] Commented: (HBASE-2468) Improvements to prewarm META cache on clients
[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877244#action_12877244 ]

HBase Review Board commented on HBASE-2468:
-------------------------------------------

Message from: Mingjie Lai <mjla...@gmail.com>

bq. On 2010-06-07 14:23:42, stack wrote:
bq. src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java, line 96
bq. http://review.hbase.org/r/98/diff/5/?file=944#file944line96
bq.
bq. getRowOrBefore is an expensive call. Are we sure we are not calling this too often?

I agree it is an expensive call. However, I don't think it would bring any performance penalty for existing and potential use cases:

Use case 1 -- existing MetaScanner users: since this method is newly added, existing users won't be affected.

Use case 2 -- hbase clients when locating a region:
1) If prefetch is on, the client calls this MetaScanner with a [table + row] combination, which calls getRowOrBefore() to get the current region info, then a number of following regions from meta. After that, the client can get the region info directly from cache.
2) If prefetch is disabled (current behavior), it eventually calls the similar method getClosestRowBefore() to get the desired region.

So whether prefetch is on or not, getRowOrBefore() (or getClosestRowBefore()) is eventually called. The only difference is whether to scan the following regions from meta or not. For future MetaScanner users which scan from one region with a desired user table row, it has to take the effort since that is the expected behavior.

- Mingjie

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/98/#review144
---

> Improvements to prewarm META cache on clients
> ---------------------------------------------
>
>                 Key: HBASE-2468
>                 URL: https://issues.apache.org/jira/browse/HBASE-2468
>             Project: HBase
>          Issue Type: Improvement
>          Components: client
>            Reporter: Todd Lipcon
>            Assignee: Mingjie Lai
>             Fix For: 0.21.0
>         Attachments: HBASE-2468-trunk.patch
>
> A couple of different use cases cause storms of reads to META during startup. For example, a large MR job will cause each map task to hit meta since it starts with an empty cache. A couple of possible improvements have been proposed:
> - MR jobs could ship a copy of META for the table in the DistributedCache
> - Clients could prewarm the cache by doing a large scan of all the meta for the table instead of random reads for each miss
> - Each miss could fetch ahead some number of rows in META
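The third proposal (fetch ahead on each miss) can be illustrated with a hedged sketch in plain Java. This is not HBase code: a `TreeMap` of region start keys stands in for META, the class name and constants are invented for illustration, and `floorEntry` models the "greatest start key at or before the row" lookup that region location performs.

```java
import java.util.*;

// Sketch of prefetch-on-miss: a cache miss fetches the matching region plus
// the next few META rows in one scan, so nearby misses hit the local cache.
public class PrefetchOnMissSketch {
    static final int PREFETCH = 3;                 // rows pulled per META read (illustrative)
    final TreeMap<String, String> meta;            // startKey -> server location
    final TreeMap<String, String> cache = new TreeMap<>();
    int metaReads = 0;

    PrefetchOnMissSketch(TreeMap<String, String> meta) { this.meta = meta; }

    String locate(String row) {
        Map.Entry<String, String> hit = cache.floorEntry(row);
        if (hit != null) return hit.getValue();    // served locally, no META read
        metaReads++;                               // one round trip to META...
        Map.Entry<String, String> e = meta.floorEntry(row);
        // ...that copies the matching region plus the following regions.
        int copied = 0;
        for (Map.Entry<String, String> m : meta.tailMap(e.getKey()).entrySet()) {
            if (copied++ == PREFETCH) break;
            cache.put(m.getKey(), m.getValue());
        }
        return e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> meta = new TreeMap<>();
        meta.put("", "rs1"); meta.put("k2", "rs2"); meta.put("k4", "rs3");
        PrefetchOnMissSketch client = new PrefetchOnMissSketch(meta);
        client.locate("k1");                       // miss: one META read, 3 rows cached
        client.locate("k3");                       // cache hit
        client.locate("k5");                       // cache hit
        System.out.println(client.metaReads);      // 1
    }
}
```

Three lookups cost a single META round trip instead of three, which is the whole point of the proposal.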
[jira] Commented: (HBASE-2468) Improvements to prewarm META cache on clients
[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877251#action_12877251 ]

HBase Review Board commented on HBASE-2468:
-------------------------------------------

Message from: Mingjie Lai <mjla...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/98/
---

(Updated 2010-06-09 15:50:59.084657)

Review request for hbase, Todd Lipcon and stack.

Changes
-------
@St^ack: please see my comments for your feedback regarding the getRowOrBefore() issue. Invited Todd as a reviewer.

Summary
-------
HBASE-2468: Improvements to prewarm META cache on clients. Changes:

1. Add new HTable methods which support region info de/serialization and region cache prewarm:
- void serializeRegionInfo(): clients can perform a large scan of all the meta for the table and serialize the meta to a file. An MR job can ship a copy of the meta for the table in the DistributedCache.
- Map<HRegionInfo, HServerAddress> deserializeRegionInfo(): an MR job can deserialize the region info from the DistributedCache.
- prewarmRegionCache(Map<HRegionInfo, HServerAddress> regionMap): an MR job can prewarm the local region cache with the deserialized region info.

2. For each client, each region cache read-miss can trigger a read-ahead of some number of rows in META. This option can be turned on and off for one particular table.

This addresses bug HBASE-2468.
http://issues.apache.org/jira/browse/HBASE-2468

Diffs
-----
src/main/java/org/apache/hadoop/hbase/client/HConnection.java 853164d
src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java ed18092
src/main/java/org/apache/hadoop/hbase/client/HTable.java 7ec95cb
src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java d3a0c07
src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java 95e494a

Diff: http://review.hbase.org/r/98/diff

Testing
-------
Unit tests passed locally for me.

Thanks,
Mingjie
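The serialize/ship/deserialize round trip described in point 1 can be sketched in plain Java. This is an illustration, not the patch's code: plain strings stand in for HRegionInfo and HServerAddress, the class and method bodies are invented, and only the shape of the workflow (launcher writes a file for the DistributedCache, each task reads it back to warm its cache) follows the summary above.

```java
import java.io.*;
import java.util.*;

// Sketch of the proposed prewarm workflow: the job launcher serializes a
// snapshot of the region map to a file, and each MR task deserializes it,
// avoiding a storm of META lookups from every task's cold cache.
public class RegionCachePrewarmSketch {

    // Launcher side: write each (regionStartKey -> serverAddress) pair.
    static void serializeRegionInfo(Map<String, String> regions, File out) throws IOException {
        try (DataOutputStream dos = new DataOutputStream(new FileOutputStream(out))) {
            dos.writeInt(regions.size());
            for (Map.Entry<String, String> e : regions.entrySet()) {
                dos.writeUTF(e.getKey());
                dos.writeUTF(e.getValue());
            }
        }
    }

    // Task side: read the map back; the result would seed the local region cache.
    static Map<String, String> deserializeRegionInfo(File in) throws IOException {
        Map<String, String> regions = new TreeMap<>();
        try (DataInputStream dis = new DataInputStream(new FileInputStream(in))) {
            int n = dis.readInt();
            for (int i = 0; i < n; i++) {
                regions.put(dis.readUTF(), dis.readUTF());
            }
        }
        return regions;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> regions = new TreeMap<>();
        regions.put("", "rs1:60020");          // first region: empty start key
        regions.put("row5000", "rs2:60020");
        File f = File.createTempFile("regions", ".bin");
        f.deleteOnExit();
        serializeRegionInfo(regions, f);
        System.out.println(deserializeRegionInfo(f).equals(regions)); // true: round trip preserves the map
    }
}
```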
[jira] Commented: (HBASE-2468) Improvements to prewarm META cache on clients
[ https://issues.apache.org/jira/browse/HBASE-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877281#action_12877281 ]

HBase Review Board commented on HBASE-2468:
-------------------------------------------

Message from: Todd Lipcon <t...@cloudera.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/98/#review165
---

Looking good! Just a few notes.

src/main/java/org/apache/hadoop/hbase/client/HConnection.java
http://review.hbase.org/r/98/#comment813
I thought we were collapsing these two calls into setRegionCachePrefetchEnabled(tableName, enabled)?

src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
http://review.hbase.org/r/98/#comment816
I don't entirely understand why we key these hashes by integer, but it seems like you're following the status quo, so it doesn't need to be addressed in this patch.

src/main/java/org/apache/hadoop/hbase/client/HTable.java
http://review.hbase.org/r/98/#comment822
I still don't quite understand the logic behind why these should be static. Previously you pointed to the enable/disable calls, but those are more like admin calls, not calls that affect client behavior. Anyone else have an opinion?

src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
http://review.hbase.org/r/98/#comment823
I think this should be Math.max(rowLimit, configuration.getInt(...)) - if we only want to scan 5 rows, we don't want the scanner to prefetch 100 for us.

- Todd
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877280#action_12877280 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Jeff Hammerbacher <jeff.hammerbac...@gmail.com>

bq. On 2010-06-09 17:54:20, Ryan Rawson wrote:
bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 111
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line111
bq.
bq. Do we need to make these fields nullable? Usually they are true/false in the Java code. Is this some semi-mechanical translation from a Java API?

I use the same Avro record for table creation and modification as well as description. For create table, I want the fields to be nullable because the user should not have to specify a value.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 94
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line94
bq.
bq. The compression can never be null, because NONE is the catch-all here. Same as below.

I use the same record for family creation, modification, and description. Avro currently doesn't have default values on write, so making this field nullable means we can do smart things if the user doesn't specify a compression algorithm during family creation.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 78
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line78
bq.
bq. Same as deadServerNames.

Yeah, I should make these 0-length arrays.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 73
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line73
bq.
bq. Couldn't you use an empty string if there are no dead server names? I'm not sure if arrays can be 0 length in Avro :-)

Will make a 0-length array.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 66
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line66
bq.
bq. Technically the serverName is the serverAddress + startCode... in the Java code it isn't fully exposed. Not sure what we want to do here, but this is probably fine as is.

Yeah, since Avro records don't have methods, you can think of this field as a materialization of the Java logic.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 34
bq. http://review.hbase.org/r/128/diff/2/?file=1157#file1157line34
bq.
bq. You can probably just use 'hostname' and 'port'. There was a recent patch in trunk that is attempting to get rid of IP addresses (they cause issues when they don't align with DNS names, etc.) and generally move us to a DNS-name world.

Let me know what you want me to do here. I was just copying the fields directly from the Java objects.

- Jeff

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review164
---

> new connector for Avro RPC access to HBase cluster
> --------------------------------------------------
>
>                 Key: HBASE-2400
>                 URL: https://issues.apache.org/jira/browse/HBASE-2400
>             Project: HBase
>          Issue Type: Task
>          Components: avro
>            Reporter: Andrew Purtell
>            Priority: Minor
>         Attachments: HBASE-2400-v0.patch
>
> Build a new connector contrib architecturally equivalent to the Thrift connector, but using Avro serialization and associated transport and RPC server work. Support AAA (audit, authentication, authorization).
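The nullable-field pattern under discussion can be sketched in Avro IDL (the genavro syntax). This is an illustrative fragment, not the patch's actual schema: the record and field names loosely follow the discussion, and the exact definitions live in hbase.genavro.

```
record AFamilyDescriptor {
  bytes name;
  // A union with null lets the same record serve create, modify, and
  // describe: a null compression on create means "server picks the default",
  // since Avro (at the time) had no default values on write.
  union { null, ACompressionAlgorithm } compression;
  union { null, boolean } inMemory;
  union { null, int } maxVersions;
}
```

The trade-off Ryan raises is that on the read/describe path these fields are never actually null, so the union widens the schema beyond what that path needs.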
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877283#action_12877283 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Ryan Rawson <ryano...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review168
---

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment824
In this case we have to distinguish between 'give me family X' and 'give me family X, 0-length qualifier', which are in fact different queries and are both representable in the standard Get Java API. The Java code does this by using a map of a map in the Get object:

Map<byte[], Set<byte[]>> familyMap;

where the key is the family and the value is the set of qualifiers for that family. If you want to get a whole family, the code uses null as the Set value. For the Avro API we don't have to do it the same way, but we need to know the difference between those queries. Perhaps an AColumn with 'family = foo, qualifier = null' can be 'give me the family', and 'family = foo, qualifier = 0-length bytes' can be the other?

- Ryan
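The null-set vs. zero-length-qualifier distinction Ryan describes can be demonstrated with a minimal sketch in plain Java. This is not HBase's actual Get class: strings stand in for byte[], and the class and method names are invented; only the familyMap convention itself follows the comment above.

```java
import java.util.*;

// Sketch of the familyMap convention: a null qualifier set means "the whole
// family", while a set containing a zero-length qualifier selects only the
// cell whose qualifier is the empty byte string.
public class FamilyMapSketch {
    // family -> set of qualifiers; a null set selects every qualifier
    private final Map<String, Set<String>> familyMap = new TreeMap<>();

    void addFamily(String family) {
        familyMap.put(family, null);               // whole-family request
    }

    void addColumn(String family, String qualifier) {
        familyMap.computeIfAbsent(family, k -> new TreeSet<>()).add(qualifier);
    }

    boolean matches(String family, String qualifier) {
        if (!familyMap.containsKey(family)) return false;
        Set<String> quals = familyMap.get(family);
        return quals == null || quals.contains(qualifier);
    }

    public static void main(String[] args) {
        FamilyMapSketch wholeFamily = new FamilyMapSketch();
        wholeFamily.addFamily("foo");
        System.out.println(wholeFamily.matches("foo", "anything"));    // true

        FamilyMapSketch emptyQualifier = new FamilyMapSketch();
        emptyQualifier.addColumn("foo", "");       // 0-length qualifier only
        System.out.println(emptyQualifier.matches("foo", ""));         // true
        System.out.println(emptyQualifier.matches("foo", "anything")); // false
    }
}
```

Both requests name family "foo", but they select different cells, which is why the Avro schema needs some way to encode the difference.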
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877285#action_12877285 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Jeff Hammerbacher <jeff.hammerbac...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/
---

(Updated 2010-06-09 18:22:12.245370)

Review request for hbase.

Changes
-------
Addressed Ryan's requests.

Summary
-------
Initial patch; some javadoc and tests missing, but I wanted to get some initial feedback on the approach. My apologies for sticking a patch on the JIRA before the review. I should have read further on HowToContribute.

This addresses bug HBASE-2400.

Diffs (updated)
---------------
trunk/bin/hbase 953193
trunk/pom.xml 953193
trunk/src/main/java/org/apache/hadoop/hbase/avro/AvroServer.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/AvroUtil.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AAlreadyExists.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AClusterStatus.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AColumn.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AColumnFamilyDescriptor.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AColumnValue.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ACompressionAlgorithm.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ADelete.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AFamilyDescriptor.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AGet.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AIOError.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AIllegalArgument.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AMasterNotRunning.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/APut.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ARegionLoad.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AResult.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AResultEntry.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AScan.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AServerAddress.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AServerInfo.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/AServerLoad.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ATableDescriptor.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ATableExists.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/ATimeRange.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/HBase.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/IOError.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/generated/TCell.java PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.avpr PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro PRE-CREATION
trunk/src/main/java/org/apache/hadoop/hbase/avro/package.html PRE-CREATION
trunk/src/test/java/org/apache/hadoop/hbase/avro/TestAvroServer.java PRE-CREATION

Diff: http://review.hbase.org/r/128/diff

Testing
-------

Thanks,
Jeff
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877294#action_12877294 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Ryan Rawson <ryano...@gmail.com>

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review171
---

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.avpr
http://review.hbase.org/r/128/#comment832
If this is generated from the genavro, is it possible to get a Maven rule to generate it? Or is that not ready yet?

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment827
Does it make sense to reuse AColumn here?

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment828
The Java API gets its speed by essentially taking a Result, which is an array of KeyValue, which are just byte arrays, and serializing it all as one large array. On the client side, the client reads the entire array and then builds the KeyValues that provide a view onto this one array. I don't know how this performance improvement could be done in this Avro interface, but I thought I'd bring it up for reference.

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment829
It would be nice to collapse AResultEntry and AColumnValue; they seem to be almost the same thing.

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment830
Technically speaking, getRowOrBefore() isn't a 'public' method; it is supposed to be used mostly for META support, and I think we are trending towards 'don't use for general purpose'.

trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro
http://review.hbase.org/r/128/#comment831
These APIs are good, but I'm wondering if you'd be open to a new experimental scanner API we have been interested in for the base RPC... essentially, right now you need 3 RPC calls even to retrieve a small amount of data. What would an API look like that lets you open, get rows, and have implicit closes if you hit the end of the scan within your 'number of records' parameter? We'd still have explicit closes for premature client-driven scan terminations, but if your goal is to scan to the end, then why do an explicit close? Also, why not have the 'open' also start to return data? The returned value would probably have to be a struct. This is more of an optional exercise, so if you don't feel the need, it's fine.

- Ryan
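The "open also returns data, implicit close at end-of-scan" idea above can be sketched in plain Java. Everything here is hypothetical: the class, the OpenResult struct, and the -1 sentinel are invented for illustration, with a List standing in for a region server and each method call modeling one RPC round trip.

```java
import java.util.*;

// Sketch of a combined open-and-fetch scanner call: a short scan costs one
// round trip instead of the usual open / next / close sequence.
public class ScannerApiSketch {
    private final List<String> rows;   // rows held by the "server"
    private int rpcCount = 0;

    ScannerApiSketch(List<String> rows) { this.rows = rows; }

    // The struct a combined call would return: a scanner handle plus the
    // first batch. A handle of -1 means the server closed the scanner
    // implicitly because the batch reached the end of the scan.
    static final class OpenResult {
        final int scannerId;
        final List<String> batch;
        OpenResult(int scannerId, List<String> batch) {
            this.scannerId = scannerId;
            this.batch = batch;
        }
    }

    // One RPC: open the scanner AND return up to numRows rows.
    OpenResult openWithData(int numRows) {
        rpcCount++;
        List<String> batch = new ArrayList<>(rows.subList(0, Math.min(numRows, rows.size())));
        boolean exhausted = batch.size() == rows.size();
        return new OpenResult(exhausted ? -1 : 1, batch);
    }

    int rpcCount() { return rpcCount; }

    public static void main(String[] args) {
        ScannerApiSketch server = new ScannerApiSketch(Arrays.asList("r1", "r2", "r3"));
        OpenResult r = server.openWithData(10);   // ask for more rows than exist
        System.out.println(r.batch);              // [r1, r2, r3]
        System.out.println(r.scannerId);          // -1: closed implicitly
        System.out.println(server.rpcCount());    // 1 round trip, not 3
    }
}
```

An explicit close call would still exist for the case where the client abandons a scan before reaching the end.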
[jira] Commented: (HBASE-2400) new connector for Avro RPC access to HBase cluster
[ https://issues.apache.org/jira/browse/HBASE-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877296#action_12877296 ]

HBase Review Board commented on HBASE-2400:
-------------------------------------------

Message from: Jeff Hammerbacher <jeff.hammerbac...@gmail.com>

bq. On 2010-06-09 19:24:25, Ryan Rawson wrote:
bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.avpr, line 1
bq. http://review.hbase.org/r/128/diff/3/?file=1190#file1190line1
bq.
bq. If this is generated from the genavro, is it possible to get a Maven rule to generate this? Or is that not ready yet?

Yes, this should definitely be done during the build. See https://issues.apache.org/jira/browse/AVRO-572.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 155
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line155
bq.
bq. The Java API gets its speed by essentially taking a Result, which is an array of KeyValue, which are just byte arrays, and serializing it all as one large array. On the client side, the client reads the entire array and then builds the KeyValues that provide a view onto this one array. I don't know how this performance improvement could be done in this Avro interface, but I thought I'd bring it up for reference.

My comment here is not about performance considerations; it's about concision, and it relates to your previous comment (on line 140): AColumn, AResultEntry, and AColumnValue all do approximately the same thing. I could make the fields nullable and use one Avro record for each. Pro: I have fewer generated classes. Con: the generated class I have is less task-specific. To be honest, since there are not a lot of Avro services out there, it's hard to say which is the best practice. I'm happy to take feedback, but I decided that being more verbose with my number of objects was better.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 156
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line156
bq.
bq. It would be nice to collapse AResultEntry and AColumnValue; they seem to be almost the same thing.

(see above comment)

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 268
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line268
bq.
bq. These APIs are good, but I'm wondering if you'd be open to a new experimental scanner API we have been interested in for the base RPC... essentially, right now you need 3 RPC calls even to retrieve a small amount of data. What would an API look like that lets you open, get rows, and have implicit closes if you hit the end of the scan within your 'number of records' parameter? We'd still have explicit closes for premature client-driven scan terminations, but if your goal is to scan to the end, then why do an explicit close? Also, why not have the 'open' also start to return data? The returned value would probably have to be a struct. This is more of an optional exercise, so if you don't feel the need, it's fine.

Yeah, that would be nice; you could return (int scannerId, bytes[] row, resultScanner result). In the Python client, I don't expose open/close; the Python clients just scan.

bq. trunk/src/main/java/org/apache/hadoop/hbase/avro/hbase.genavro, line 230
bq. http://review.hbase.org/r/128/diff/3/?file=1191#file1191line230
bq.
bq. Technically speaking, getRowOrBefore() isn't a 'public' method; it is supposed to be used mostly for META support, and I think we are trending towards 'don't use for general purpose'.

Noted. I will remove the comment.

- Jeff

---
This is an automatically generated e-mail. To reply, visit: http://review.hbase.org/r/128/#review171
---