Issues running a large MapReduce job over a complete HBase table

2010-12-06 Thread Gabriel Reid
Hi, We're currently running into issues with running a MapReduce job over a complete HBase table - we can't seem to find a balance between setting dfs.datanode.max.xcievers too low (and getting "xceiverCount X exceeds the limit of concurrent xcievers" errors) and getting OutOfMemoryErrors on the datanodes.
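For context, that limit is raised in hdfs-site.xml on each datanode; a minimal sketch, with the value purely illustrative (as the follow-up below notes, setting it unrealistically high is what brought on the OOMEs):

    <!-- hdfs-site.xml; 4096 is illustrative, tune against datanode heap -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>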

Error while creating a table with compression enabled

2010-12-06 Thread Amandeep Khurana
The command I'm running on the shell: create 'table', {NAME => 'fam', COMPRESSION => 'GZ'} or create 'table', {NAME => 'fam', COMPRESSION => 'LZO'} Here's the error: ERROR: cannot convert instance of class org.jruby.RubyString to class org.apache.hadoop.hbase.io.hfile.Compression$Algorithm Any idea?

asked for some examples for pagefilter,and array design

2010-12-06 Thread 梁景明
Hi, are there any examples for PageFilter? Here is my app case: person A and his friends, one column per friend:

    id  column    value
    A   friend:B  B
    A   friend:C  C
    A   friend:D  D
    A   friend:Z  Z

and I want to get A's friends from D to G, i.e. starting from index 2.

Re: Command line integration question

2010-12-06 Thread Lars George
Hi Dmitriy, I think you sent this to the wrong list? You sent it to hbase-user, but this is a Mahout-related question. Please check. Lars On Mon, Dec 6, 2010 at 12:17 AM, Dmitriy Lyubimov dlie...@gmail.com wrote: Dear all, I am testing the command line integration for the SSVD patch in hadoop

Re: something wrong with hbase mapreduce

2010-12-06 Thread Lars George
Understood. It would have been nice to know whether this would have worked once the major compaction had completed, just to check what the issue is. Setting values with absolute timestamps is always difficult; there are simply a few architectural issues that need to be handled before this works for

Re: Newbie question about scan filters

2010-12-06 Thread Lars George
Hi Jiajun, Sure, why not? What are you trying to achieve? Lars On Mon, Dec 6, 2010 at 3:19 AM, 陈加俊 cjjvict...@gmail.com wrote: Hi, can I use the scan filter? The HBase that we use is version 0.20.6. jiajun On Mon, Dec 6, 2010 at 2:05 AM, Lars George lars.geo...@gmail.com wrote: Hi

Re: Restricting insert/update in HBase

2010-12-06 Thread Lars George
Hi Hari, What you are asking for is transactions. I'd say try to avoid it. HBase can only guarantee atomicity at the row level, so if you want something across tables and rows then you need to use, for example, ZooKeeper to implement a transactional support system. There is also THBase, which gives
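To make the row-level guarantee concrete, here is a minimal sketch against the client API of that era (table, family, and column names are made up): all cells in a single Put target one row, so HBase applies them atomically.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    HTable table = new HTable(new HBaseConfiguration(), "accounts");
    Put put = new Put(Bytes.toBytes("user42"));
    put.add(Bytes.toBytes("data"), Bytes.toBytes("balance"), Bytes.toBytes("100"));
    put.add(Bytes.toBytes("data"), Bytes.toBytes("status"), Bytes.toBytes("active"));
    table.put(put); // atomic within this one row; anything spanning rows needs external coordination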

Re: blocked when creating HTable

2010-12-06 Thread Lars George
Hi Exception, For starters, the logs say you are asking the wrong ZooKeeper node (localhost) for the HBase details, and your config has:

    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>dev32</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>localhost</value>
    </property>

Re: Error while creating a table with compression enabled

2010-12-06 Thread Lars George
Hi AK, This issue? https://issues.apache.org/jira/browse/HBASE-3310 Lars On Mon, Dec 6, 2010 at 9:17 AM, Amandeep Khurana ama...@gmail.com wrote: The command I'm running on the shell: create 'table', {NAME => 'fam', COMPRESSION => 'GZ'} or create 'table', {NAME => 'fam', COMPRESSION => 'LZO'}

Re: Error while creating a table with compression enabled

2010-12-06 Thread Amandeep Khurana
Seems like it. Let me try the patch. -AK On Dec 6, 2010, at 12:36 PM, Lars George lars.geo...@gmail.com wrote: Hi AK, This issue? https://issues.apache.org/jira/browse/HBASE-3310 Lars On Mon, Dec 6, 2010 at 9:17 AM, Amandeep Khurana ama...@gmail.com wrote: The command I'm running on

Re: Issues running a large MapReduce job over a complete HBase table

2010-12-06 Thread Gabriel Reid
Hi Lars, All of the max heap sizes are left at their default values (i.e. 1000 MB). The OOMEs that I encountered on the datanodes occurred only when I set dfs.datanode.max.xcievers unrealistically high (8192) in an effort to escape the "xceiverCount X exceeds the limit of concurrent xcievers" errors.

row level transaction and synchronous replication?

2010-12-06 Thread Hiller, Dean (Contractor)
Does Hadoop support synchronous replication with a row-level transaction? I.e., I want to update the entity and make sure the backup node for that data is updated by the time my call to put that row-level transactional entity returns. Thanks, Dean

Re: row level transaction and synchronous replication?

2010-12-06 Thread Steven Noels
On Mon, Dec 6, 2010 at 4:15 PM, Hiller, Dean (Contractor) dean.hil...@broadridge.com wrote: Does Hadoop support synchronous replication with a row-level transaction? I.e., I want to update the entity and make sure the backup node for that data is updated upon return of my call to put that row

Re: Command line integration question

2010-12-06 Thread Dmitriy Lyubimov
Yes, it was meant to be Mahout's. Honest error, sorry. Apologies for brevity. Sent from my Android. -Dmitriy On Dec 6, 2010 3:08 AM, Lars George lars.geo...@gmail.com wrote:

Best Practices Adding Rows

2010-12-06 Thread Peter Haidinyak
Hi, I have to enter log data into HBase, and we will need to query the data by Date:Hour. I am using 'Date|Hour|Incrementing Counter' as the row id. Is there an easy way to request the starting and stopping rows in a scan using something similar to 'like'? Scan 'T1', {STARTROW => 'like 20101201|14'}

Re: Best Practices Adding Rows

2010-12-06 Thread Todd Lipcon
Hi Peter, You can set the start row to '20101201|14' and the end row to '20101201|15' using the scanner API: http://archive.cloudera.com/cdh/3/hbase/apidocs/org/apache/hadoop/hbase/client/Scan.html#setStartRow(byte[])
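In client code that might look like the following sketch (table is assumed to be an open HTable; note the stop row is exclusive, so '20101201|15' covers all of hour 14):

    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("20101201|14")); // inclusive
    scan.setStopRow(Bytes.toBytes("20101201|15"));  // exclusive
    ResultScanner scanner = table.getScanner(scan);
    for (Result r : scanner) {
      // every row key returned here starts with "20101201|14"
    }
    scanner.close();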

Running Fully Distributed Operation in a single Machine

2010-12-06 Thread Thanh Do
Hi all, Is it possible to run a fully distributed operation on a single machine, by playing around with config files and the port parameter? Anyone ever try that? Thanks much, Thanh

Re: Running Fully Distributed Operation in a single Machine

2010-12-06 Thread Stack
What J-D said. Here's some more help if you need it: https://hudson.apache.org/hudson/view/G-L/view/HBase/job/hbase-0.90/ws/trunk/target/site/notsoquick.html#d0e427 On Mon, Dec 6, 2010 at 9:36 AM, Jean-Daniel Cryans jdcry...@apache.org wrote: Just set hbase.cluster.distributed to true, point
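The key switch lives in hbase-site.xml; a minimal sketch (hostname, port, and path are placeholders, not a tested single-machine layout):

    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://localhost:9000/hbase</value>
    </property>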

Re: Importing to HBase from Java problem

2010-12-06 Thread Stack
That looks like a mismatch between client and server HBase versions. Ensure you have the same version running all over your cluster. Make sure you don't have a mix of 0.20.x and 0.89... or 0.90 release candidates. You seem to be feeling your way. Have you seen

Re: Scan performance in version 0.20.3

2010-12-06 Thread Stack
On Mon, Dec 6, 2010 at 5:44 AM, Lior Schachter li...@infolinks.com wrote: Hi all, I would like to speed up my scans and noticed these two methods on org.apache.hadoop.hbase.client.Scan: 1. setCacheBlocks - this is whether we should add blocks to the server-side block cache as we scan (Follow
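For a one-off full scan (e.g. from MapReduce) the usual pattern is to disable block caching and raise the number of rows fetched per RPC; a sketch, with the caching value purely illustrative:

    import org.apache.hadoop.hbase.client.Scan;

    Scan scan = new Scan();
    scan.setCacheBlocks(false); // don't churn the server-side block cache with a one-off scan
    scan.setCaching(500);       // rows fetched per RPC round trip; tune to row size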

RE: Maps sharing a common table.

2010-12-06 Thread Michael Segel
Date: Mon, 6 Dec 2010 10:03:02 -0800 Subject: Re: Maps sharing a common table. From: jdcry...@apache.org To: user@hbase.apache.org You need to instantiate 1 HTable per map task, then reuse it for every map() invocation. Sharing the actual object between JVMs isn't what you want to do.
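A sketch of that pattern (class and table names are made up; the Context-based mapreduce API of the time assumed):

    import java.io.IOException;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;

    public class SharedTableMapper extends TableMapper<ImmutableBytesWritable, Result> {
      private HTable sideTable; // one instance per map task

      @Override
      protected void setup(Context context) throws IOException {
        // created once per task, then reused by every map() call below
        sideTable = new HTable(new HBaseConfiguration(), "shared_table");
      }

      @Override
      protected void map(ImmutableBytesWritable key, Result value, Context context) {
        // use sideTable here; never share the instance across JVMs
      }
    }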

RE: row level transaction and synchronous replication?

2010-12-06 Thread Hiller, Dean (Contractor)
Yes, I meant HBase. I am just getting into this, so I was confused, and am clear now. Thanks, Dean -Original Message- From: Steven Noels [mailto:stev...@outerthought.org] Sent: Monday, December 06, 2010 9:01 AM To: user Subject: Re: row level transaction and synchronous replication? On

serialized objects as strings or as object? data corruption?

2010-12-06 Thread Hiller, Dean (Contractor)
Is there a good tool out there for serializing a Java entity to HBase? If I have an Account, and then have a List<Activities> in the account, I preferably want to serialize it all as strings so data corruption issues can be fixed more easily, independent of the objects. Or do I just create

Re: blocked when creating HTable

2010-12-06 Thread exception qin
Hi George, thanks for your reply, and sorry for the silly mistake. I changed the hbase-site.xml to this:

    <configuration>
      <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
      </property>
      <property>
        <name>hbase.rootdir</name>

Re: asked for some examples for pagefilter,and array design

2010-12-06 Thread 梁景明
Thanks very much. The ColumnPaginationFilter in the example can do what I needed. The second question is how to model a one-to-many relationship as an array list. The operations I want on the array list are previous(), next(), get(index), data(from x to y), and size(). So my array list looks like this.
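For reference, a sketch of the ColumnPaginationFilter pattern applied to the original question (row key and names as in that example; table assumed to be an open HTable): offset 2 with limit 4 skips friends B and C and returns D through G, given columns sorting B, C, D, ...

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.filter.ColumnPaginationFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    Get get = new Get(Bytes.toBytes("A"));
    get.setFilter(new ColumnPaginationFilter(4, 2)); // limit 4 columns, skip the first 2
    Result result = table.get(get);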

Re: How to access all versions of a particular column?

2010-12-06 Thread Stack
You have to 'Get' all or N versions. See how http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/client/Scan.html and http://hbase.apache.org/docs/r0.89.20100924/apidocs/org/apache/hadoop/hbase/client/Get.html allow you to stipulate how many versions to return (You can also
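A sketch of the Get variant (row key and version count purely illustrative; table assumed to be an open HTable):

    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    Get get = new Get(Bytes.toBytes("rowkey"));
    get.setMaxVersions(3); // up to 3 versions per column; the no-arg form asks for all of them
    Result result = table.get(get);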

Re: Issues running a large MapReduce job over a complete HBase table

2010-12-06 Thread Gabriel Reid
Hi St.Ack, The cluster is a set of 5 machines, each with 3GB of RAM and 1TB of storage. One machine is doing duty as Namenode, HBase Master, HBase Regionserver, Datanode, Job Tracker, and Task Tracker, while the other four are all Datanodes, Regionservers, and Task Trackers. I have a similar

Re: How to access all versions of a particular column?

2010-12-06 Thread Ryan Rawson
We try to be optimal while scanning through large rows with many versions that are not wanted, but nothing is as optimal as storing only the minimal data and retrieving that. We'd love to hear about any benchmarks or speed runs you have. Regards, -ryan On Mon, Dec 6, 2010 at 11:15 PM, Hari Sreekumar