Re: Expert suggestion needed to create table in Hbase - Banking

2012-11-28 Thread Nick Dimiduk
Hi Rams, Can you explain in more detail how you will be accessing this data? Thanks, Nick On Tue, Nov 27, 2012 at 8:44 PM, Ramasubramanian ramasubramanian.naraya...@gmail.com wrote: Hi, Thanks!! Can someone help in suggesting what is the best rowkey that we can use in this scenario.

Re: Aggregation while Bulk Loading into HBase

2012-11-28 Thread Nick Dimiduk
Why don't you aggregate these data in a preprocessing step... like a map-reduce job? You can then load the output of that work directly into HBase. -n On Wed, Nov 28, 2012 at 5:37 AM, Narayanan K knarayana...@gmail.com wrote: Hi all, I have a scenario where I need to do aggregation while

Re: Using doubles and longs as ordering row values

2012-11-29 Thread Nick Dimiduk
I've used orderly a little, it works pretty well. There are some edge cases, particularly around null values. I'm not sure of the upstream status either. Check my git log; I haven't done much to this library. It is generally quite useful, so I don't mind maintaining it if you have patches for bug

Re: Using doubles and longs as ordering row values

2012-11-29 Thread Nick Dimiduk
On Thu, Nov 29, 2012 at 3:00 PM, David Koch ogd...@googlemail.com wrote: I am having a similar issue, only I need to preserve the order of qualifiers which are serialized signed longs - rather than row keys. Orderly is not rowkey specific. You can use its serialization anywhere. -n

Re: Reg:delete performance on HBase table

2012-12-05 Thread Nick Dimiduk
On Wed, Dec 5, 2012 at 7:46 AM, Doug Meil doug.m...@explorysmedical.comwrote: You probably want to read this section on the RefGuide about deleting from HBase. http://hbase.apache.org/book.html#perf.deleting So hold on. From the guide: 11.9.2. Delete RPC Behavior Be aware that

Re: loss znode

2012-12-13 Thread Nick Dimiduk
Hi Christophe, You need to update all of your configuration, as per the pseudo-distributed instructions [0]. Specifically, I believe you're missing the hbase.cluster.distributed property. -n [0]: http://hbase.apache.org/book/standalone_dist.html#pseudo On Thu, Dec 13, 2012 at 5:09 AM, Zbierski

Re: Roll of hbase.tmp.dir in HBase

2012-12-17 Thread Nick Dimiduk
This directory is used by the RegionServers during compactions to store intermediate data. See: $ git grep 'hbase.tmp.dir' hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/CompactionTool.java: private final static String CONF_TMP_DIR = hbase.tmp.dir;

Re: Roll of hbase.tmp.dir in HBase

2012-12-17 Thread Nick Dimiduk
On Mon, Dec 17, 2012 at 5:20 PM, anil gupta anilgupt...@gmail.com wrote: @Nick: I am using HBase 0.92.1, CompactionTool.java is part of HBase 0.96 as per https://issues.apache.org/jira/browse/HBASE-7253. Fair enough; I grepped against trunk. I have 10 disks on my slave node that will

Re: HBase 0.94 security configurations

2012-12-18 Thread Nick Dimiduk
Are you using secure HBase? Don't -- it'll only get in the way for a simple example. Is the master running? Be sure to run ./bin/start-hbase.sh from the directory where you unpacked the tgz. You can omit the conf.set(...) business from your code. By default, the configuration will point to local

Re: HBase - Secondary Index

2012-12-18 Thread Nick Dimiduk
Hi Anoop, Your presentation has garnered quite a bit of community interest. Have you considered providing your implementation to the community, perhaps in an HBase-contrib module? Thanks, Nick On Tue, Dec 4, 2012 at 12:10 AM, Anoop Sam John anoo...@huawei.com wrote: Hi All Last

Re: HBase 0.94 security configurations

2012-12-18 Thread Nick Dimiduk
, Nick Dimiduk ndimi...@gmail.com wrote: Are you using secure HBase? Don't -- it'll only get in the way for a simple example. Is the master running? Be sure to run ./bin/start-hbase.sh from the directory where you unpacked the tgz. You can omit the conf.set(...) business from your code

Re: HBase 0.94 security configurations

2012-12-18 Thread Nick Dimiduk
: [note]]http://www.quotationspage.com/quote/2771.html#note *Calvin Coolidge* On Tue, Dec 18, 2012 at 3:19 AM, Nick Dimiduk ndimi...@gmail.com wrote: Are you using secure HBase? Don't -- it'll only get in the way for a simple example. Is the master running? Be sure to run ./bin

Re: Is it necessary to set MD5 on rowkey?

2012-12-19 Thread Nick Dimiduk
On Wed, Dec 19, 2012 at 1:26 PM, David Arthur mum...@gmail.com wrote: Let's say you want to decompose a url into domain and path to include in your row key. You could of course just use the url as the key, but you will see hotspotting since most will start with http. Doesn't the original

Re: Drivers for ADO.net

2013-01-14 Thread Nick Dimiduk
Hi Prabhjot, Right now, your only choices for HBase directly from a .NET client are the Thrift or REST gateways. You can find documentation about those interfaces here [0] and here [1], respectively. Best of luck, -n [0]: http://hbase.apache.org/book/thrift.html [1]:

Re: MapReduce job over HBase fails

2013-02-26 Thread Nick Dimiduk
Hi Bhushan, Yes, MapReduce is supported over HBase for all releases in recent memory. Please provide the relevant stacktrace of the error you're seeing. How are you interacting with HBase from MapReduce -- for online reading, writing, I recommend you make use of TableMapper, TableReducer. Be
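
A minimal sketch of the TableMapper wiring recommended here, written as a map-only row-counting job; the table name, scan settings, and counter are illustrative assumptions, not details from the original thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class RowCountSketch {

      // Map-only: each mapper scans a slice of the table and bumps a counter per row.
      static class CountMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result row, Context ctx) {
          ctx.getCounter("sketch", "rows").increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "row-count-sketch");
        job.setJarByClass(RowCountSketch.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // fetch rows in bigger batches for MR scans
        scan.setCacheBlocks(false); // don't churn the RegionServer block cache

        // "my_table" is a placeholder; substitute the real table name.
        TableMapReduceUtil.initTableMapperJob(
            "my_table", scan, CountMapper.class,
            NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);                         // map-only, no TableReducer
        job.setOutputFormatClass(NullOutputFormat.class); // counters only, no output files
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }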

Re: Map Reduce with multiple scans

2013-02-26 Thread Nick Dimiduk
Hi Paul, You want to run multiple scans so that you can filter the previous scan results? Am I correct in my understanding of your objective? First, I suggest you use the PrefixFilter [0] instead of constructing the rowkey prefix manually. This looks something like: byte[] md5Key = Utils.md5(
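
A rough sketch of the PrefixFilter suggestion, assuming a hypothetical rowkey prefix (the MD5-based key construction from the thread is not reproduced here).

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    class PrefixScanSketch {
      // Builds a Scan restricted to a single rowkey prefix.
      static Scan scanForPrefix(byte[] prefix) {
        Scan scan = new Scan();
        scan.setStartRow(prefix);                 // jump straight to the first matching row
        scan.setFilter(new PrefixFilter(prefix)); // stop returning rows once they leave the prefix
        return scan;
      }

      static Scan example() {
        return scanForPrefix(Bytes.toBytes("user123|")); // placeholder prefix
      }
    }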

Re: HBase type support

2013-03-15 Thread Nick Dimiduk
Nick, As an HBase user I would welcome this addition. In addition to the proposed list of datatypes, a UUID/GUID type would also be nice to have. Regards, /David On Wed, Mar 13, 2013 at 5:42 PM, Nick Dimiduk ndimi...@gmail.com wrote: Hi all, I'd like to draw your attention to HBASE-8089

Re: HBase type support

2013-03-15 Thread Nick Dimiduk
I'm talking about MD5, SHA1, etc. It's something explicitly mentioned in HBASE-7221. On Fri, Mar 15, 2013 at 10:55 AM, James Taylor jtay...@salesforce.comwrote: Hi Nick, What do you mean by hashing algorithms? Thanks, James On 03/15/2013 10:11 AM, Nick Dimiduk wrote: Hi David, Native

Re: HBase type support

2013-03-19 Thread Nick Dimiduk
no problem with shipping with support for some hashing strategies if users demand, but I don't think it's a design approach we should encourage. Thanks, Nick From: Nick Dimiduk ndimi...@gmail.com To: user@hbase.apache.org Sent: Friday, March 15, 2013 10:57 AM

Re: HBase type support

2013-03-19 Thread Nick Dimiduk
is a set of utilities that take the burden of correct serialization off of user code. I request that you please read the proposal in its entirety before commenting further. Thanks, Nick On Mar 15, 2013, at 10:06 AM, Nick Dimiduk ndimi...@gmail.com wrote: On Fri, Mar 15, 2013 at 5:25 AM, Michel

Re: HBase type support

2013-03-19 Thread Nick Dimiduk
, just my $0.02. -- Lars From: Nick Dimiduk ndimi...@gmail.com To: user@hbase.apache.org Sent: Friday, March 15, 2013 10:57 AM Subject: Re: HBase type support I'm talking about MD5, SHA1, etc. It's something explicitly mentioned in HBASE-7221

Re: HBase type support

2013-03-19 Thread Nick Dimiduk
, but can be used to implement them. As usual, just my $0.02. -- Lars From: Nick Dimiduk ndimi...@gmail.com To: user@hbase.apache.org Sent: Friday, March 15, 2013 10:57 AM Subject: Re: HBase type support I'm talking about MD5, SHA1, etc

HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
Heya, Thinking about data types and serialization. I think null support is an important characteristic for the serialized representations, especially when considering the compound type. However, doing so is directly incompatible with fixed-width representations for numerics. For instance, if we

Re: HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
I'd rather see something like losing MIN_VALUE and keeping fixed width. On 4/1/13 2:00 PM, Nick Dimiduk ndimi...@gmail.com wrote: Heya, Thinking about data types and serialization. I think null support is an important characteristic for the serialized representations
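
A toy illustration of the trade-off being discussed, assuming one chooses to keep the fixed 4-byte width and sacrifice Integer.MIN_VALUE as a NULL sentinel; this sketch ignores the separate question of sort-order-preserving byte encodings.

    import org.apache.hadoop.hbase.util.Bytes;

    class NullableInt32Sketch {
      // Reserve Integer.MIN_VALUE as the NULL marker so the encoding stays 4 bytes wide.
      static byte[] encode(Integer value) {
        if (value != null && value == Integer.MIN_VALUE) {
          throw new IllegalArgumentException("MIN_VALUE is reserved for NULL");
        }
        return Bytes.toBytes(value == null ? Integer.MIN_VALUE : value);
      }

      static Integer decode(byte[] encoded) {
        int v = Bytes.toInt(encoded);
        return v == Integer.MIN_VALUE ? null : v;
      }
    }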

Re: HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
. That is, you'd rather be able to represent NULL than -2^31? On 04/01/2013 01:32 PM, Nick Dimiduk wrote: Thanks for the thoughtful response (and code!). I'm thinking I will press forward with a base implementation that does not support nulls. The idea is to provide an extensible set

Re: HBase Types: Explicit Null Support

2013-04-01 Thread Nick Dimiduk
Furthermore, is it more important to support null values than to squeeze all representations into minimum size (4 bytes for int32, etc.)? On Apr 1, 2013 4:41 PM, Nick Dimiduk ndimi...@gmail.com wrote: On Mon, Apr 1, 2013 at 4:31 PM, James Taylor jtay...@salesforce.comwrote: From the SQL perspective

Re: HBase Types: Explicit Null Support

2013-04-02 Thread Nick Dimiduk
as allowing SQL semantics. On Mon, Apr 1, 2013 at 7:26 PM, Nick Dimiduk ndimi...@gmail.com wrote: Furthermore, is it more important to support null values than to squeeze all representations into minimum size (4 bytes for int32, etc.)? On Apr 1, 2013 4:41 PM, Nick Dimiduk ndimi...@gmail.com wrote

Re: Adding String offset for ColumnPaginationFilter

2013-04-04 Thread Nick Dimiduk
+1 Wouldn't offset be a family:qualifier instead of a String? Please consider adding two interfaces: a version which exposes the state externally (as you've described) and another that encapsulates the state handling on the user's behalf. The former is useful for exposing over stateless

Re: HBase Types: Explicit Null Support

2013-04-04 Thread Nick Dimiduk
the prevalence of NULL in SQL. Thanks, Nick On Tue, Apr 2, 2013 at 9:40 AM, Nick Dimiduk ndimi...@gmail.com wrote: I agree that a user-extensible interface is a required feature here. Personally, I'd love to ship a set of standard GIS tools on HBase. Let's keep in mind, though, that SQL

Re: HBase bulk load through co-processors

2012-07-06 Thread Nick Dimiduk
Sever, I presume you're loading your data via online Puts via the MR job (as opposed to generating HFiles). What are you hoping to gain from a coprocessor implementation vs the 6 MR jobs? Have you pre-split your tables? Can the RegionServer(s) handle all the concurrent mappers? -n On Mon, Jul
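
A sketch of pre-splitting a table at creation time using the 0.94-era admin API; the table name, column family, and split range are assumptions for illustration, not details from the thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    class PresplitSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          HTableDescriptor desc = new HTableDescriptor("bulk_target"); // placeholder name
          desc.addFamily(new HColumnDescriptor("f"));                  // placeholder family
          // Pre-create 16 regions across an assumed two-hex-character key prefix so
          // concurrent mappers spread their Puts instead of hammering a single region.
          admin.createTable(desc, Bytes.toBytes("00"), Bytes.toBytes("ff"), 16);
        } finally {
          admin.close();
        }
      }
    }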

Region number and allocation advice

2012-07-06 Thread Nick Dimiduk
Heya, I'm looking for more detailed advice about how many regions a table should run. Disabling automatic splits (often hand-in-hand with disabling automatic compactions) is often described as advanced practice, at least when guaranteeing latency SLAs. Which begs the question: how many regions

Re: HBase bulk load through co-processors

2012-07-09 Thread Nick Dimiduk
other? In this case, the coprocessor in your RS is acting like any other HBase client. Puts will write from the coproc to the target RS like a normal write. That is, of course, assuming I understand your implementation. -n On Fri, Jul 6, 2012 at 10:16 PM, Nick Dimiduk ndimi...@gmail.com wrote

Re: some problem about HBase index

2013-06-14 Thread Nick Dimiduk
Please help us understand. My read of your question is that you want to query for students in two ways: - get the student with name 'jean' - get the student with id 12 With the rowkey as student name, only the first query is possible with a simple GET. Likewise, with the rowkey as student id,
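
A minimal sketch of the first access pattern (a simple GET when the rowkey is the student name); the table and column names here are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    class StudentLookupSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "students");  // assumed table keyed by name
        try {
          Get get = new Get(Bytes.toBytes("jean"));   // rowkey == student name
          Result result = table.get(get);
          byte[] id = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("id"));
          System.out.println("student id: " + Bytes.toString(id));
          // Looking up by id against this same table would need a full scan,
          // or a second id-keyed table acting as an index.
        } finally {
          table.close();
        }
      }
    }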

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

2013-07-02 Thread Nick Dimiduk
Hi Joarder, I think you're slightly confused about the impact of using a hashed (or sometimes called salted) prefix for your rowkeys. This strategy for rowkey design has an impact on the logical ordering of your data, not necessarily the physical distribution of your data. In HBase, these are
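
For reference, a small sketch of what a salted (hashed-prefix) rowkey typically looks like; the bucket count is an assumption for illustration. Reads by natural key then have to fan out across all buckets, which is the usual cost of this approach.

    import org.apache.hadoop.hbase.util.Bytes;

    class SaltedKeySketch {
      static final int BUCKETS = 16; // assumed bucket count

      // Prepend a deterministic one-byte salt so consecutive natural keys land in
      // different parts of the logical keyspace (and typically different regions).
      static byte[] saltedKey(byte[] naturalKey) {
        byte salt = (byte) ((Bytes.hashCode(naturalKey) & 0x7fffffff) % BUCKETS);
        return Bytes.add(new byte[] { salt }, naturalKey);
      }
    }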

Re: 60000 millis timeout while waiting for channel to be ready for read

2013-07-29 Thread Nick Dimiduk
Hi Shapoor, Moving the conversation to the users list. Have you solved your issue? Sorry you haven't gotten a response sooner -- I think everyone is working overtime to get 0.96 released. I'm assuming each put is independent of the others. You're not putting 100mm times to the same row, are

Re: Chocolatey package for Windows

2013-08-20 Thread Nick Dimiduk
Hi Andrew, I don't think the homebrew recipes are managed by an HBase developer. Rather, someone in the community has taken it upon themselves to provide the project through brew. Likewise, the Apache HBase project does not provide RPM or DEB packages, but you're likely to find them if you look

Re: Please welcome our newest committer, Nick Dimiduk

2013-09-10 Thread Nick Dimiduk
Thank you everyone! On Tue, Sep 10, 2013 at 3:54 PM, Enis Söztutar e...@apache.org wrote: Hi, Please join me in welcoming Nick as our new addition to the list of committers. Nick is exceptionally good with user-facing issues, and has done major contributions in mapreduce related areas,

Re: Please welcome our newest committer, Rajeshbabu Chintaguntla

2013-09-11 Thread Nick Dimiduk
Nice one Rajesh! On Wed, Sep 11, 2013 at 9:17 AM, ramkrishna vasudevan ramkrishna.s.vasude...@gmail.com wrote: Hi All, Please join me in welcoming Rajeshbabu (Rajesh) as our new HBase committer. Rajesh has been there for more than a year and has been solving some very good bugs around the

Re: Frequent fail of bulkload

2013-09-18 Thread Nick Dimiduk
Did you ever find a resolution to this issue? Thanks, Nick On Thu, Apr 4, 2013 at 12:56 AM, vbogdanovsky vbogdanov...@griddynamics.com wrote: I have hfiles after MR-job and when I import them to my table I often get Exceptions like this: ==**==

Attention users of the REST Gateway

2013-09-27 Thread Nick Dimiduk
Hi there, Are you using the REST Gateway with JSON serialization? How are you forming your queries? Do you use Jersey's mapped notation (with the '@' prepended to attribute names)? Please have a look at the recent comments [0] on HBASE-9435 and weigh in. Thanks! Nick [0]:

Re: Bulk load from OSGi running client

2013-09-27 Thread Nick Dimiduk
Hi Amit, Would you be able to open a ticket summarizing your findings? Can you provide a sample project that demonstrates the behavior you're seeing? We could use that to provide a fix and, I hope, some kind of unit or integration test. Thanks, Nick On Sun, Sep 22, 2013 at 6:10 AM, Amit Sela

Re: exportSnapshot MISSING_FILES

2013-10-10 Thread Nick Dimiduk
On Tue, Oct 8, 2013 at 6:35 AM, Adrian Sandulescu sandulescu.adr...@gmail.com wrote: Imports work great, but only when using the s3n:// protocol (which means an HFile limit of 5GB). Are you using Apache Hadoop or an EMR build? From what I recall, EMR ships a customized s3n implementation

Re: Spatial data posting in HBase

2013-10-10 Thread Nick Dimiduk
Hi there, Just like other data modeling questions in HBase, how to store your spatial data will depend on how you want to access it. Are you focused on update performance or read queries? Are you accessing data based on point-in-space (ie, match lng,lat within a level of accuracy), spatial extent

Re: exportSnapshot MISSING_FILES

2013-10-11 Thread Nick Dimiduk
On Fri, Oct 11, 2013 at 5:27 AM, Adrian Sandulescu sandulescu.adr...@gmail.com wrote: It's Apache Hadoop, but I see multi-part upload is in the works for this as well. https://issues.apache.org/jira/browse/HADOOP-9454 I didn't know about this ticket. That's a very good thing to have in

Re: Spatial data posting in HBase

2013-10-12 Thread Nick Dimiduk
You can treat a geohash of a fixed precision as a tile and calculate the neighbors of that tile. This is precisely what I did in the chapter in HBaseIA. In that way, it's no different than a tile system. On Sat, Oct 12, 2013 at 11:33 AM, Michael Segel michael_se...@hotmail.comwrote: Adrien,

Re: Thread local readpoint would be dropped from MultiVersionConsistencyControl

2013-10-19 Thread Nick Dimiduk
+user On Saturday, October 19, 2013, Ted Yu wrote: Hi, In HBASE-9754 , there is refactoring which drops perThreadReadPoint from MultiVersionConsistencyControl. There're two reasons for this: 1. Using ThreadLocal in hot path affects performance 2. The readpoint is passed from

Re: TableMapReduceUtil, multiple scan objects, how to identify originating scan object in mapper

2013-10-21 Thread Nick Dimiduk
Hi Jim, I don't see an obvious way to gain access to this information. If you don't find a clever way to get at this, would you mind opening a ticket for this feature request? Thanks, Nick On Mon, Oct 21, 2013 at 9:44 AM, Jim Holloway jim.hollo...@windstream.netwrote: Hello, I’m using the

Re: Hbase - RegionServers used

2013-11-11 Thread Nick Dimiduk
dev to bcc. Hi Andrea, Welcome to HBase! You'll have a larger pool of people to answer your questions if you take them first to the user mailing list. I'm a student in computer science and I'm trying to use Hbase above an HDFS to show some performance of the system. I'd like to know how works

Re: control HBase stop / start from supervisord

2014-01-07 Thread Nick Dimiduk
Moving to HBase-user Hi Mathan, Have a look at hbase-daemon.sh in $HBASE_HOME/bin. It wraps up the details necessary for building the java command used to start an HBase process (it also does some other house-cleaning like log rotation, which you probably also want). Use it to start and stop

Re: Off-heap block cache fails in 0.94.6

2014-02-18 Thread Nick Dimiduk
Hi Dean, Any chance you've tested Ram's patch? Does it work for you? Thanks, Nick On Mon, Jan 27, 2014 at 8:28 AM, Dean hikeonp...@gmail.com wrote: Hi Ram, We'll give it a shot, thanks! -Dean

Re: HBase- Hive Integration

2014-03-14 Thread Nick Dimiduk
hbase-dev to bcc; adding hive-user. this is a question for the user lists, and more for Hive's than HBase, as HBaseStorageHandler is code in Hive project, not HBase. Hi Sai, You are embarking into a brave world. Because your aim is the interop across these different Apache projects, I highly

Re: Eventual consistency in Hbase

2014-03-17 Thread Nick Dimiduk
Consistency between replicated clusters (across data centers) is asynchronous and can also be considered eventual. From within a single cluster, only a single node is responsible for serving a given range of data, so there is only one, consistent, view on that data. This model can be relaxed via

Call for Lightning Talks, Hadoop Summit HBase BoF

2014-05-13 Thread Nick Dimiduk
Hi HBasers! Subash and I are organizing the HBase Birds of a Feather (BoF) session at Hadoop Summit San Jose this year. We're looking for 4-5 brave souls willing to standup for 15 minutes and tell the community what's working for them and what isn't. Have a story about how this particular feature

Re: custom filter which extends Filter directly

2014-05-15 Thread Nick Dimiduk
+hbase-user On Tue, May 13, 2014 at 7:57 PM, Ted Yu yuzhih...@gmail.com wrote: To be a bit more specific (Filter is an interface in 0.94): If you use 0.96+ releases and your filter extends Filter directly, I would be curious to know your use case. Thanks On Tue, May 6, 2014 at 11:25 AM,

Re: Call for Lightning Talks, Hadoop Summit HBase BoF

2014-05-15 Thread Nick Dimiduk
Just to be clear, this is not a call for vendor pitches. This is a venue for HBase users, operators, and developers to intermingle, share stories, and storm new ideas. On Tue, May 13, 2014 at 11:40 AM, Nick Dimiduk ndimi...@gmail.com wrote: Hi HBasers! Subash and I are organizing the HBase

Re: hfile 9.4 to 9.6

2014-07-22 Thread Nick Dimiduk
Hi Guangle, Please have a look at the online book, there's a section on upgrades. Also, please consider upgrading to 0.98. The 0.96 line is in minimum maintenance mode and 0.98 is considered the stable/production line. http://hbase.apache.org/book.html#upgrading Thanks, Nick On Tue, Jul 22,

Re: Calculate number of records in write buffer

2014-07-30 Thread Nick Dimiduk
On Wed, Jul 30, 2014 at 3:34 AM, varshar varsha.raveend...@gmail.com wrote: The WriteRequestCount metric is incremented by only 1 for one batch insert and not by the number of records inserted. I think you're correct, this is a bug. Do you mind filing a ticket? HRegion.doMiniBatchMutation

Re: hbase and hadoop (for normal hdfs) cluster together?

2014-07-31 Thread Nick Dimiduk
Hi Wilm, What else will this cluster do? Are you planning to run MR against the data here? If this cluster is dedicated to your application and you have enough IO capacity to support all application needs on the cluster, I see no reason to run two clusters. The reason we recommend against

Re: Hbase Mapreduce API - Reduce to a file is not working properly.

2014-07-31 Thread Nick Dimiduk
Hi Parkirat, I don't follow the reducer problem you're having. Can you post your code that configures the job? I assume you're using TableMapReduceUtil someplace. Your reducer is removing duplicate values? Sounds like you need to update its logic to only emit a value once. Pastebin-ing your
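
A minimal sketch of a reducer that emits each distinct value only once per key, assuming Text key/value types (the actual types in the original job aren't shown in the thread).

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    class DedupReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context ctx)
          throws IOException, InterruptedException {
        Set<String> seen = new HashSet<String>();
        for (Text value : values) {
          if (seen.add(value.toString())) { // add() returns false for duplicates
            ctx.write(key, value);
          }
        }
      }
    }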

Re: HBase + PySpark

2014-07-31 Thread Nick Dimiduk
+1 for happybase On Thu, Jul 31, 2014 at 2:47 PM, Stack st...@duboce.net wrote: On Thu, Jul 31, 2014 at 11:47 AM, Esteban Gutierrez este...@cloudera.com wrote: Hello, Have you tried to use the Thrift bindings for Python? An example can be found under the hbase-examples directory:

Re: Best practice for writing to HFileOutputFormat(2) with multiple Column Families

2014-08-01 Thread Nick Dimiduk
You're asking whether it's more time efficient to do a single universal sort of all the data vs first doing a group by cf and sorting each group individually? That sounds like a question more appropriate for the Spark user list. -n On Wed, Jul 30, 2014 at 8:01 PM, Jianshi Huang

Re: Unable to build Oozie on Hbase 0.98.3

2014-08-01 Thread Nick Dimiduk
Your maven output says build success at the end. Anyway, this is more of an Oozie list question than HBase. 0.94 and 0.98 have some API incompatibilities, so Oozie will need to address those in its own code. -n On Fri, Aug 1, 2014 at 4:59 PM, hima bindu battahimabi...@gmail.com wrote: Hi

Re: Calculate number of records in write buffer

2014-08-04 Thread Nick Dimiduk
Nicolas's patch there on 11353 includes a nice test. A good way to start verifying what's broken and what isn't would be to add a similar test for the metric you're looking at (if it doesn't already exist). I assume region-level metrics are reported as aggregate in the server-level metrics, but I

Re: How to turn on tracing on HBase?

2014-08-07 Thread Nick Dimiduk
You can also have a look at PerformanceEvaluation tool on master (and probably branch-1 and 0.98, but I don't recall exactly). See what it does on the client side to enable tracing. IIRC, the param is --traceRate. -n On Thursday, August 7, 2014, Ted Yu yuzhih...@gmail.com wrote: See

Re: Hbase Read/Write poor performance. Please help!

2014-08-13 Thread Nick Dimiduk
Have you looked at the performance guidelines in our online book? http://hbase.apache.org/book.html#performance http://hbase.apache.org/book.html#casestudies.perftroub On Wed, Aug 13, 2014 at 8:43 AM, Pradeep Gollakota pradeep...@gmail.com wrote: Can you post the client code you're using to

Re: blockcache usage

2014-08-14 Thread Nick Dimiduk
I'm not aware of this specific experiment. You might have a look at our HeapSize interface and its implementations for things like HFileBlock. On Tue, Aug 12, 2014 at 11:05 PM, abhishek1015 abhishek1...@gmail.com wrote: Hello everyone, I am wondering if someone has experimentally

Re: Four nodes HBase cluster configuration

2014-08-14 Thread Nick Dimiduk
This is a fine role assignment if HA is not required. For true HBase HA you'll need at least HA namenode, multiple HBase masters, and a zookeeper quorum. On Thursday, August 14, 2014, Dongsu Lee dongsulee2...@gmail.com wrote: Hi, Could you help me to find a guideline or recommendation for

Re: how to debug hbase standalone?

2014-08-15 Thread Nick Dimiduk
I debug a local hbase out of my git checkout by attaching to the remote process. I edit hbase-env.sh (there are lines you can uncomment) to start the processes with debugging enabled. Then from my dev environment (IntelliJ), I point it at the port and everything works. I imagine it would work

Re: Scan output to file on each regserver node?

2014-08-19 Thread Nick Dimiduk
This sounds an awful lot like a map-only MR job... With Hadoop Streaming, you should be able to achieve your goal of piping to an arbitrary process. On Tue, Aug 19, 2014 at 4:26 PM, Demai Ni nid...@gmail.com wrote: Dear experts , I understand that I can do a simple command like: echo scan

Re: Scan output to file on each regserver node?

2014-08-19 Thread Nick Dimiduk
. What do you think about AggregationClient? It is carried out at region/region server level, maybe instead do a count/min/avg, a method can be used to write the data out to local file system? Demai on the run On Aug 19, 2014, at 5:04 PM, Nick Dimiduk ndimi...@gmail.com wrote: This sounds

Shout-out for Misty

2014-08-19 Thread Nick Dimiduk
Our docs are getting a lot of love lately, courtesy of one Misty Stanley-Jones. As someone who joined this community by way of documentation, I'd like to say: Thank you, Misty! -n

Re: performance of block cache

2014-08-20 Thread Nick Dimiduk
Hi Zhaojie, I'm responsible for this particular bit of work. One thing to note in these experiments is that I did not control explicitly for OS caching. I ran warmup workloads before collecting measurements, but because the amount of RAM on the machine is fixed, it's impact of OS cache is

Re: performance of block cache

2014-08-21 Thread Nick Dimiduk
. You can have a look if you are interested in. 2014-08-21 1:48 GMT+08:00 Nick Dimiduk ndimi...@gmail.com: Hi Zhaojie, I'm responsible for this particular bit of work. One thing to note in these experiments is that I did not control explicitly for OS caching. I ran warmup workloads before

Re: Writing Custom - KeyComparator !!!

2014-08-27 Thread Nick Dimiduk
You might also have a look at using OrderedBytes [0] instead of Bytes for encoding your values to byte[]. This is the kind of use-case those encoders are intended to support. Thanks, Nick [0]: https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/OrderedBytes.html On Wed, Aug 27, 2014
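
A small sketch of OrderedBytes round-tripping a long. The helper class name (SimplePositionedByteRange here) varies slightly across HBase versions, so treat this as approximate.

    import org.apache.hadoop.hbase.util.Order;
    import org.apache.hadoop.hbase.util.OrderedBytes;
    import org.apache.hadoop.hbase.util.PositionedByteRange;
    import org.apache.hadoop.hbase.util.SimplePositionedByteRange;

    class OrderedBytesSketch {
      public static void main(String[] args) {
        // Encode a long so the unsigned byte-wise sort of the result matches the
        // numeric order of the values -- something Bytes.toBytes(long) does not
        // give you for negative numbers.
        PositionedByteRange buf = new SimplePositionedByteRange(9); // int64 encoding fits in 9 bytes
        OrderedBytes.encodeInt64(buf, -42L, Order.ASCENDING);

        buf.setPosition(0);
        System.out.println(OrderedBytes.decodeInt64(buf)); // -42
      }
    }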

Re: performance of block cache

2014-09-14 Thread Nick Dimiduk
? If not, what kind of things should I take care? 2014-08-22 7:06 GMT+08:00 Nick Dimiduk ndimi...@gmail.com: I'm familiar with Stack's work too, but thanks for pointing it out :) On Wed, Aug 20, 2014 at 8:19 PM, 牛兆捷 nzjem...@gmail.com wrote: Hi Nick: Yes, I am

Re: performance of block cache

2014-09-16 Thread Nick Dimiduk
Replying to this thread is getting bounced as spam. Here's the reply I sent yesterday. On Mon, Sep 15, 2014 at 7:52 PM, Nick Dimiduk ndimi...@gmail.com wrote: The explicit JAVA_HOME requirement is new via HBASE-11534. On Mon, Sep 15, 2014 at 3:16 AM, 牛兆捷 nzjem...@gmail.com wrote: It works

Re: HBase Rest decoding responses

2014-09-18 Thread Nick Dimiduk
You'll need to grab the payload and run it through a base64 decoder. How exactly you do this will depend on your client environment. I demonstrate how to do this in HBase In Action from the shell using `base64 --decode` --- Thanks, Nick n10k.com hbaseinaction.com On Thu, Sep 18, 2014 at 6:16 AM,
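
A minimal sketch of decoding such a payload client-side with Java 8's java.util.Base64; the encoded string below is a made-up placeholder, not real gateway output.

    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    class RestValueDecodeSketch {
      public static void main(String[] args) {
        // The REST gateway base64-encodes row keys, qualifiers, and cell values
        // inside its JSON/XML responses; decode them before interpreting.
        String encodedValue = "aGVsbG8gaGJhc2U=";                   // placeholder payload
        byte[] raw = Base64.getDecoder().decode(encodedValue);
        System.out.println(new String(raw, StandardCharsets.UTF_8)); // "hello hbase"
      }
    }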

Re: are column qualifiers safe as user inputed values?

2014-09-30 Thread Nick Dimiduk
This depends more on your parsing code than on HBase. All values are converted into byte[]'s for HBase. Once your code has parsed the user input and generated the byte[], there's no place for ambiguity on the HBase side. On Tue, Sep 30, 2014 at 5:19 PM, Ted r6squee...@gmail.com wrote: Hi I'm

Re: Recovering hbase after a failure

2014-10-02 Thread Nick Dimiduk
Hi Ron, Yikes! Do you have any basic metrics regarding the amount of data in the system -- size of store files before the incident, number of records, etc.? You could sift through the HDFS audit log and see if any files that were there previously have not been restored. -n On Thu, Oct 2, 2014 at

Re: How to make a given table spread evenly across the cluster

2014-10-02 Thread Nick Dimiduk
Is the balancer running? If it's running but unable to act, you should see some explanation in the Master logs. On Wed, Oct 1, 2014 at 8:27 PM, Tao Xiao xiaotao.cs@gmail.com wrote: Hi all, I have a HBase table containing 40 million records. Checking HBase UI, I could see that this table

Re: HBase read performance

2014-10-02 Thread Nick Dimiduk
Do check again on the heap size of the region servers. The default unconfigured size is 1G; too small for much of anything. Check your RS logs -- look for lines produced by the JVMPauseMonitor thread. They usually correlate with long GC pauses or other process-freeze events. Get is implemented as

Re: Recovering hbase after a failure

2014-10-02 Thread Nick Dimiduk
some rowcounter's too. Feels like we got off easy. Ron -Original Message- From: Nick Dimiduk [mailto:ndimi...@gmail.com] Sent: Thursday, October 02, 2014 1:27 PM To: hbase-user Subject: Re: Recovering hbase after a failure Hi Ron, Yikes! Do you have any basic

Re: Recovering hbase after a failure

2014-10-02 Thread Nick Dimiduk
Ah yes, of course there is. On Thu, Oct 2, 2014 at 12:11 PM, Andrew Purtell andrew.purt...@gmail.com wrote: Is there not the WAL to handle a failed flush? On Oct 2, 2014, at 11:39 AM, Nick Dimiduk ndimi...@gmail.com wrote: In this case, didn't the RS creating the directories

Re: QualifierFilter with Stargate

2014-10-08 Thread Nick Dimiduk
Hi Anil, Stargate has two scanner implementations: stateful and stateless. Which one are you using? The stateful scanner has long supported QualifierFilter. See the ScannerModel class [0], it has a FilterModel component that you'll need to populate when you create the scanner. Stateless scanner

Re: QualifierFilter with Stargate

2014-10-09 Thread Nick Dimiduk
am going to do gets. So, i should be good with Stateful also. Thanks Nick for the helpful link. Will give this a try soon. Thanks, Anil Gupta On Wed, Oct 8, 2014 at 3:08 PM, Nick Dimiduk ndimi...@gmail.com wrote: Hi Anil, Stargate has two scanner implementations: stateful

Re: HBase read performance

2014-10-10 Thread Nick Dimiduk
Hang on, how are you using 11G total memory? m1.large only has 7.5G total RAM. On Fri, Oct 10, 2014 at 2:56 PM, Nick Dimiduk ndimi...@gmail.com wrote: ByteBuffer position math errors makes me suspect #1 cacheonwrite and #2 bucketcache (and #3 their use in combination ;) ) 11G memory

Re: hbase-client Put serialization exception

2014-10-16 Thread Nick Dimiduk
Can you confirm that you're using the same version of hbase in your project dependencies as with your runtime system? Seems like you might have some 0.94 mixed in somewhere. On Thu, Oct 16, 2014 at 2:57 PM, Ted Yu yuzhih...@gmail.com wrote: Do you have more information about the

Re: issue about migrate hbase table from 0.94 to 0.98

2014-10-19 Thread Nick Dimiduk
To be clear, you have two clusters, and you're interested in moving a table's data but not necessarily its operational load? There is probably a migration path involving table snapshots, though I've not tried it myself. It would look something like the following: - major compact the table to

Re: Using parquet

2014-10-20 Thread Nick Dimiduk
Not currently. HBase uses its own file format that makes different assumptions than parquet. Instead, HBase supports its own format optimizations, such as block encodings and compression. I would be interested in an exercise to see what things are necessary for HBase to support a columnar format

Re: s3n with hbase

2014-10-31 Thread Nick Dimiduk
Please don't do this. S3 is not a strongly consistent filesystem. HBase will not be happy there. Better to run on HDFS and to snapshots/copytable backup, restore to S3. On Fri, Oct 31, 2014 at 4:53 PM, Khaled Elmeleegy kd...@hotmail.com wrote: Hi, I am trying to use hbase with s3, using s3n,

Re: s3n with hbase

2014-11-01 Thread Nick Dimiduk
It's a reliability/stability problem. the S3 implementation of the FS doesn't provide the characteristics we rely on because S3 doesn't have these characteristics. It may be that there are improvements to be made in the s3 or s3n drivers, but I believe there's a fundamental difference in the

Re: Hbase Unusable after auto split to 1024 regions

2014-11-06 Thread Nick Dimiduk
I've been doing some testing with ITMTTR recently in ec2 with m1.l and m1.xl instances. Debug level logging seems to produce 20-30 messages/sec on the RS. I have noticed pauses in the log entries that last anywhere from 30-120 seconds. I have no explanation for the pauses other than the

Re: Hbase Unusable after auto split to 1024 regions

2014-11-06 Thread Nick Dimiduk
One other thought: you might try tracing your requests to see where the slowness happens. Recent versions of PerformanceEvaluation support this feature and can be used directly or as an example for adding tracing to your application. On Thursday, November 6, 2014, Pere Kyle p...@whisper.sh wrote:

Re: Logging for HBase tests

2014-11-15 Thread Nick Dimiduk
This stuff lands by default under module/target/surefire-reports/testName.xml. The content is only there when a test fails. I don't know how to make surefire always preserve the run logs, so when I need to check them I just add an Assert.fail() to the end of the test. This trick of overriding the
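
A tiny sketch of the Assert.fail() trick described here, assuming a JUnit 4 test.

    import org.junit.Assert;
    import org.junit.Test;

    public class PreserveLogsSketch {
      @Test
      public void somethingUnderInvestigation() {
        // ... exercise the code being debugged ...
        // Temporary: force a failure so surefire keeps the full run log under
        // module/target/surefire-reports/. Remove before committing.
        Assert.fail("intentional failure to preserve surefire output");
      }
    }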

Re: Replacing a full Row content in HBase

2014-11-20 Thread Nick Dimiduk
What does unpredictable results mean? If you know all the existing qualifiers, just provide new values of all of them in a single put. If you don't, you can use a delete family marker to clear visibility of an entire family. I think you'll need to do this separately from writing the new values.
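
A rough sketch of the delete-family-then-put sequence described here, using the pre-1.0 client API; table, family, and values are placeholders.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    class ReplaceRowSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "t");            // placeholder table
        try {
          byte[] row = Bytes.toBytes("row1");
          byte[] fam = Bytes.toBytes("f");

          // 1. Hide everything currently stored in this family for the row.
          Delete wipe = new Delete(row);
          wipe.deleteFamily(fam);                        // addFamily(fam) on 1.0+ clients
          table.delete(wipe);

          // 2. Write the replacement cells. Caveat: if a put lands with the same
          //    millisecond timestamp as the delete marker, the marker masks it.
          Put replacement = new Put(row);
          replacement.add(fam, Bytes.toBytes("q1"), Bytes.toBytes("new value"));
          table.put(replacement);
        } finally {
          table.close();
        }
      }
    }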

Re: Replacing a full Row content in HBase

2014-11-20 Thread Nick Dimiduk
On Thu, Nov 20, 2014 at 11:40 AM, Nick Dimiduk ndimi...@gmail.com wrote: What does unpredictable results mean? If you know all the existing qualifiers, just provide new values of all of them in a single put. If you don't, you can use a delete family marker to clear visibility of an entire

Re: Replacing a full Row content in HBase

2014-11-20 Thread Nick Dimiduk
); admin.deleteTable(NAME); } } On Thu, Nov 20, 2014 at 1:48 PM, Nick Dimiduk ndimi...@gmail.com wrote: Attachements are filtered from the list. Please include a link if you'd like to share some attachment. On Thu, Nov 20, 2014 at 11:46 AM, Sznajder ForMailingList

Re: Replacing a full Row content in HBase

2014-11-20 Thread Nick Dimiduk
to printTableContent(). On Thu, Nov 20, 2014 at 3:29 PM, Nick Dimiduk ndimi...@gmail.com wrote: Are you flushing the edits so that they're actually written to the server before you send the gets? On Thu, Nov 20, 2014 at 2:43 PM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote: Sure Here

Re: how to explain read/write performance change after modifying the hfile.block.cache.size?

2014-11-21 Thread Nick Dimiduk
400mb blockcache? Ouch. What's your hbase-env.sh? Have you configured a heap size? My guess is you're using the unconfigured default of 1G. Should be at least 8G, and maybe more like 30G with this kind of host. How many users are sharing it and with what kinds of tasks? If there's no IO
