Re: Loader for small files

2013-02-11 Thread Something Something
Sorry.. Moving 'hbase' mailing list to BCC 'cause this is not related to HBase. Adding 'hadoop' user group. On Mon, Feb 11, 2013 at 10:22 AM, Something Something mailinglist...@gmail.com wrote: Hello, We are running into performance issues with Pig/Hadoop because our input files are small

Re: Trailer 'header' is wrong; does the trailer size match content

2012-05-18 Thread Something Something
Anybody? Alrighty then.. back to more debugging -:) On Thu, May 17, 2012 at 5:06 PM, Something Something mailinglist...@gmail.com wrote: HBase Version: hbase-0.90.4-cdh3u3 Hadoop Version: hadoop-0.20.2-cdh3u2 12/05/17 16:37:47 ERROR mapreduce.LoadIncrementalHFiles: IOException during
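
The error comes out of the load half of the bulk-load workflow. For reference, a minimal sketch of that step against the HBase 0.90 client API the thread is using (the table name and HFile directory are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoad {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "myTable");            // pre-created target table
            LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
            loader.doBulkLoad(new Path("/user/me/hfiles"), table); // moves HFiles into the regions
        }
    }

A trailer-size mismatch at this point usually means the HFiles under that directory were not written to completion, which is why the debugging in the original thread circles around the writing job.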

Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Something Something
Hello, I keep getting this message while running the 'completebulkload' process. I tried the following solutions that I came across while Googling for this error: 1) setReduceSpeculativeExecution(true) 2) Made sure that none of the tasks are failing. 3) The HFileOutput job runs
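
Half-written HFiles left behind by killed or speculative reduce attempts are a common source of bad trailers, which is what fix 1) is aiming at. A minimal sketch of the writing-job setup on HBase 0.90, with speculative reduces off (table name and output path are hypothetical; the mapper that emits the row keys and Puts is elided):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class HFileWriterJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            Job job = new Job(conf, "hfile-writer");
            job.setReduceSpeculativeExecution(false);   // no duplicate, half-written HFiles
            HTable table = new HTable(conf, "myTable");
            // wires up TotalOrderPartitioner, the sorting reducer, and one reducer per region
            HFileOutputFormat.configureIncrementalLoad(job, table);
            FileOutputFormat.setOutputPath(job, new Path("/user/me/hfiles"));
            // job.setMapperClass(...), input format, etc. elided
            job.waitForCompletion(true);
        }
    }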

Re: Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Something Something
the complete message? What HBase version are you using? On Thu, May 17, 2012 at 4:48 PM, Something Something mailinglist...@gmail.com wrote: Hello, I keep getting this message while running the 'completebulkload' process. I tried the following solutions that I came across while Googling

Re: HBase Performance Improvements?

2012-05-16 Thread Something Something
enough to not have to go that far. On Thu, May 10, 2012 at 11:55 AM, Something Something mailinglist...@gmail.com wrote: I am beginning to get a sinking feeling about this :( But I won't give up! Problem is that when I use one Reducer the job runs for a long time. I killed it after

Re: MR job for creating splits

2012-05-13 Thread Something Something
in a row until the size reached a certain limit. On Sat, May 12, 2012 at 7:21 PM, Something Something mailinglist...@gmail.com wrote: Hello, This is really a MapReduce question, but the output from this will be used to create regions for an HBase table. Here's what I want to do
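
Once the split points are computed, they feed straight into table creation. A sketch of that last step with the 0.90 admin API (the table, family, and boundary keys are hypothetical stand-ins for the MR job's output):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws Exception {
            HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
            HTableDescriptor desc = new HTableDescriptor("myTable");
            desc.addFamily(new HColumnDescriptor("Info"));
            byte[][] splitKeys = {   // the region boundaries the MR job produced
                Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t")
            };
            admin.createTable(desc, splitKeys);   // the table starts life with four regions
        }
    }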

Re: MR job for creating splits

2012-05-13 Thread Something Something
). On Sun, May 13, 2012 at 2:11 AM, Something Something mailinglist...@gmail.com wrote: Is there no way to find out inside a single reducer how many records were created by all the Mappers? I tried several ways but nothing works. For example, I tried this: reporter.getCounter
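
Mapper counters are only dependably readable once a job has finished, which is why polling them from inside a reducer keeps coming up empty. A hedged sketch of the usual two-pass workaround (the enum and the property name are hypothetical):

    import org.apache.hadoop.mapreduce.Job;

    public class TwoPassDriver {
        // each mapper calls context.getCounter(Stats.RECORDS).increment(1) per record
        public enum Stats { RECORDS }

        static void run(Job countJob, Job mainJob) throws Exception {
            countJob.waitForCompletion(true);     // pass 1: just count
            long total = countJob.getCounters()
                                 .findCounter(Stats.RECORDS).getValue();
            // pass 2 reads this via context.getConfiguration().getLong("total.records", -1)
            mainJob.getConfiguration().setLong("total.records", total);
            mainJob.waitForCompletion(true);
        }
    }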

Re: HBase Performance Improvements?

2012-05-10 Thread Something Something
took less than an hour, with the bulk load part only taking 15 minutes. Much better! On Wed, May 9, 2012 at 11:08 AM, Something Something mailinglist...@gmail.com wrote: Hey Oliver, Thanks a billion for the response -:) I will take any code you can provide even if it's a hack! I

Re: HBase Performance Improvements?

2012-05-10 Thread Something Something
. Secondary sort is not necessary unless the order of the values matters to you. In this case (with the row key as the reducer key), I don't think that matters. On Thu, May 10, 2012 at 3:22 AM, Something Something mailinglist...@gmail.com wrote: Thank you Tim Bryan for the responses. Sorry

HBase Performance Improvements?

2012-05-09 Thread Something Something
I ran the following MR job that reads AVRO files and puts them into HBase. The files have tons of data (billions of records). We have a fairly decent-sized cluster. When I ran this MR job, it brought down HBase. When I commented out the Puts on HBase, the job completed in 45 seconds (yes, that's seconds).
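
For anyone hitting the same wall, the client-side knobs that usually get tried first, before giving up on Puts in favor of bulk load, look roughly like this on the 0.90 API (table, key, and values are hypothetical; writeToWAL(false) trades durability for speed):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FastPuts {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "myTable");
            table.setAutoFlush(false);                  // buffer Puts client-side
            table.setWriteBufferSize(8 * 1024 * 1024);  // e.g. ship ~8 MB per flush
            Put put = new Put(Bytes.toBytes("row-1"));
            put.setWriteToWAL(false);                   // skips the write-ahead log - risky
            put.add(Bytes.toBytes("Info"), Bytes.toBytes("col"), Bytes.toBytes("val"));
            table.put(put);
            // ... many more puts ...
            table.flushCommits();                       // push whatever is still buffered
        }
    }

As the rest of the thread shows, at billions of rows even this tends to lose to HFileOutputFormat plus completebulkload.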

Re: HBase Performance Improvements?

2012-05-09 Thread Something Something
that I eventually moved into HBase with completebulkload. I haven't committed my class anywhere because it's a pretty ugly hack, but I'm happy to share it with you as a starting point. Doing billions of puts will just drive you crazy. Cheers, Oliver On 2012-05-09, at 4:51 PM, Something

Fwd: org.apache.hadoop.conf.Configuration - error parsing conf file

2012-03-08 Thread Something Something
-- Forwarded message -- From: Something Something mailinglist...@gmail.com Date: Thu, Mar 8, 2012 at 8:43 AM Subject: Re: org.apache.hadoop.conf.Configuration - error parsing conf file To: u...@pig.apache.org, manishbh...@rocketmail.com *Stack*: Explicit message would be one

Re: org.apache.hadoop.conf.Configuration - error parsing conf file

2012-03-08 Thread Something Something
* to turn it off. */ public synchronized void setQuietMode(boolean quietmode) { this.quietmode = quietmode; } Can someone tell me how to force a call to this? Apologies in advance for my dumbness. On Wed, Mar 7, 2012 at 10:30 PM, Something Something mailinglist...@gmail.com wrote
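
Since setQuietMode is public on org.apache.hadoop.conf.Configuration, no forcing is needed; calling it on the instance before the configuration is first read should do it. A minimal sketch:

    import org.apache.hadoop.conf.Configuration;

    public class LoudConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setQuietMode(false);     // parse failures now log instead of staying silent
            conf.get("fs.default.name");  // first get() loads the resources and surfaces the error
        }
    }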

org.apache.hadoop.conf.Configuration - error parsing conf file

2012-03-07 Thread Something Something
Hello, I am using: hadoop-0.20.2-cdh3u2, hbase-0.90.4-cdh3u3, pig-0.8.1-cdh3u3 I have successfully loaded data into HBase tables (implying my Hadoop/HBase setup is good). I can look at the data using the HBase shell. Now I am trying to read data from HBase via a Pig script. My test script looks

Couple of schema design questions

2012-02-26 Thread Something Something
Trying to design an HBase schema for a log processing application. We will get new logs every day. 1) We are thinking we will keep data for each day in separate tables. The table names would be something like XYZ-2012-02-26 etc. There will be at most 4 tables for each day. Pros: Other
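
The usual alternative worth weighing against a table-per-day layout is a single table whose row key leads with the date, so each day is a contiguous key range and a start/stop-row Scan stands in for picking a table. A sketch (the key layout and id suffix are hypothetical):

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DayKeyedPut {
        public static void main(String[] args) {
            String day = "2012-02-26";
            String eventId = "abc123";   // hypothetical unique suffix per log record
            byte[] rowKey = Bytes.toBytes(day + "|" + eventId); // day-first: one range per day
            Put put = new Put(rowKey);
        }
    }

The trade-off: a date prefix funnels each day's writes into one region at a time, so this layout is usually paired with some salting or bucketing of the prefix.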

Re: Couple of schema design questions

2012-02-26 Thread Something Something
is your access pattern (both reads and writes) and sorting requirements. thanks On Sun, Feb 26, 2012 at 10:24 PM, Something Something mailinglist...@gmail.com wrote: Trying to design a HBase schema for a log processing application. We will get new logs every day. 1) We are thinking we

Starting Map Reduce Job on EC2

2012-01-15 Thread Something Something
Hello, Our Hadoop cluster is set up on EC2, but our client machine, which will trigger the M/R job, is in our data center. I am trying to start an M/R job from our client machine, but getting this: 00:01:16.885 [pool-6-thread-1] INFO org.apache.hadoop.ipc.Client - Retrying connect to server:
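
That retry loop is typically the client resolving the cluster's EC2-internal hostnames, or the security group blocking the NameNode/JobTracker ports from outside. Whatever the fix, the client-side configuration has to name addresses reachable from the data center; a hedged sketch (hostnames and ports are hypothetical):

    import org.apache.hadoop.conf.Configuration;

    public class RemoteClientConf {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // must resolve and be reachable from the data center, not just inside EC2
            conf.set("fs.default.name", "hdfs://ec2-203-0-113-1.compute-1.amazonaws.com:8020");
            conf.set("mapred.job.tracker", "ec2-203-0-113-1.compute-1.amazonaws.com:8021");
        }
    }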

Re: HBase Vs CitrusLeaf?

2011-09-08 Thread Something Something
: Re: HBase Vs CitrusLeaf? On Sep 06, Something Something wrote: Anyway, before I spent a lot of time on it, I thought I should check if anyone has compared HBase against CitrusLeaf. If you've, I would greatly appreciate it if you would share your experiences. Disclaimer: I was an early

HBase as a replacement for Netezza?

2011-09-08 Thread Something Something
By no means am I a Netezza expert, but my manager seems to believe that our existing Netezza-based system can be replaced with a NoSQL (key/value) type of database. If anyone has done a Netezza-to-HBase migration, please share your experiences. As always, I greatly appreciate the help.

HBase Vs CitrusLeaf?

2011-09-06 Thread Something Something
I am a HUGE fan of HBase, but our management team wants us to evaluate CitrusLeaf (http://citrusleaf.net/index.php). I have NO idea why! Our management claims that CitrusLeaf is (got to be) faster because it's written in C++. Trying to find if there's any truth to that. Anyway, before I spent

Transaction Management in HBase?

2011-06-12 Thread Something Something
What's the best way of implementing transaction management in HBase? I have a use case in which I update multiple tables. If for some reason an update fails on the 2nd table, I would like to rollback changes to the first table. A quick Google search got me to this document:
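
HBase itself only guarantees atomicity within a single row, so there is no built-in rollback across tables; that has to be a compensating write issued by the application. The strongest primitive the 0.90 client offers is the row-level check-and-put, sketched here (table, row, and values are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CheckedWrite {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "orders");
            byte[] row = Bytes.toBytes("order-42");
            Put put = new Put(row);
            put.add(Bytes.toBytes("Info"), Bytes.toBytes("status"), Bytes.toBytes("done"));
            boolean applied = table.checkAndPut(
                row, Bytes.toBytes("Info"), Bytes.toBytes("status"),
                Bytes.toBytes("pending"),  // expected current value
                put);                      // applied atomically only if the check passes
            // if a follow-up write to a second table fails, undoing this one is application work
        }
    }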

Starting Hadoop/HBase cluster on Rackspace

2011-05-31 Thread Something Something
Hello, Are there scripts available to create an HBase cluster on Rackspace - like there are for Amazon EC2? A quick Google search didn't come up with anything useful. Any help in this regard would be greatly appreciated. Thanks. - Ajay

Designing table with auto increment key

2011-02-13 Thread Something Something
Hello, Can you please tell me if this is the proper way of designing a table that has an auto-increment key? If there's a better way, please let me know that as well. After reading the mail archives, I learned that the best way is to use the 'incrementColumnValue' method of HTable. So
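
For reference, the incrementColumnValue call the archives point at boils down to one atomic server-side bump; everything around it (the table, row, and column below) is a hypothetical naming convention:

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class NextId {
        public static void main(String[] args) throws Exception {
            HTable counters = new HTable(HBaseConfiguration.create(), "counters");
            long nextId = counters.incrementColumnValue(
                Bytes.toBytes("users"),  // one counter row per table being keyed
                Bytes.toBytes("Info"),
                Bytes.toBytes("id"),
                1L);                     // atomic increment on the region server
            // nextId can serve as the new row's key; note sequential keys hot-spot one region
        }
    }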

Re: Fastest way to read only the keys of a HTable?

2011-02-03 Thread Something Something
to visualize counting process yield(count, String.from_java_bytes(row.getRow)) end # Return the counter return count end St.Ack On Thu, Feb 3, 2011 at 6:47 AM, Something Something mailinglist...@gmail.com wrote: Thanks. So I will add this... scan.setFilter

HBase as a backend for GUI app?

2011-02-03 Thread Something Something
Is it advisable to use HBase as a backend for a GUI app or is HBase more for storing huge amounts of data used mainly for data analysis in non-online/batch mode? In other words, after storing data on HBase do most people extract the summary and store it in a SQL database for quick retrieval by

Re: Fastest way to read only the keys of a HTable?

2011-02-03 Thread Something Something
of rows you want to pre-fetch per RPC. Setting it to 2 is already 2x better than the default. J-D On Thu, Feb 3, 2011 at 1:35 PM, Something Something mailinglist...@gmail.com wrote: After adding the following line: scan.addFamily(Bytes.toBytes("Info")); performance improved dramatically
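
The two lines together, for anyone landing here from a search: restricting the scan to one family cuts the data read, and scanner caching cuts the number of round trips (the family name is from the thread; 100 is an arbitrary example):

    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    public class KeyScanSetup {
        public static void main(String[] args) {
            Scan scan = new Scan();
            scan.addFamily(Bytes.toBytes("Info")); // only ship the one family needed
            scan.setCaching(100);                  // 100 rows per RPC instead of the default 1
        }
    }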

Re: HBase as a backend for GUI app?

2011-02-03 Thread Something Something
On Thu, Feb 3, 2011 at 2:48 PM, Something Something mailinglist...@gmail.com wrote: Is it advisable to use HBase as a backend for a GUI app or is HBase more for storing huge amounts of data used mainly for data analysis in non-online/batch mode? In other words, after storing data

Re: HBase as a backend for GUI app?

2011-02-03 Thread Something Something
is around 8 seconds. This was my first try at HBase and my next rev. will be much better. -Pete PS At least you could use your name. Something Something mailinglist...@gmail.com wrote: = Is it advisable to use HBase as a backend for a GUI app or is HBase more

Re: Tables rows disappear

2011-02-02 Thread Something Something
Stack - Any thoughts on this? On Mon, Jan 31, 2011 at 6:27 PM, Something Something mailinglist...@gmail.com wrote: 1) Version numbers: hadoop-0.20.2 hbase-0.20.6 2) autoFlush to 'true' works, but wouldn't that slow down the insertion process? 3) Here's how I had set it up: In my
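
On point 2): the usual middle ground is to keep autoFlush off for speed but flush the client-side buffer before shutting anything down, since buffered Puts exist only in the client until then. A sketch (the table name is hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;

    public class BufferedPuts {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "myTable");
            table.setAutoFlush(false);   // keep the fast buffered writes
            // ... many table.put(...) calls ...
            table.flushCommits();        // buffered rows "disappear" if this never runs
            table.close();
        }
    }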

Fastest way to read only the keys of a HTable?

2011-02-02 Thread Something Something
I want to read only the keys in a table. I tried this... try { HTable table = new HTable("myTable"); Scan scan = new Scan(); scan.addFamily(Bytes.toBytes("Info")); ResultScanner scanner = table.getScanner(scan); Result result = scanner.next(); while (result != null) { so on...

Re: Fastest way to read only the keys of a HTable?

2011-02-02 Thread Something Something
://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html St.Ack On Thu, Feb 3, 2011 at 6:01 AM, Something Something mailinglist...@gmail.com wrote: I want to read only the keys in a table. I tried this... try { HTable table = new HTable("myTable"); Scan scan
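
Spelled out, the filter the reply links to turns this into a keys-only pass: the server returns a single KeyValue per row and skips the rest. A sketch against the 0.90 client (the table name is hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.FirstKeyOnlyFilter;

    public class KeysOnlyScan {
        public static void main(String[] args) throws Exception {
            HTable table = new HTable(HBaseConfiguration.create(), "myTable");
            Scan scan = new Scan();
            scan.setFilter(new FirstKeyOnlyFilter()); // one KeyValue per row, filtered server-side
            ResultScanner scanner = table.getScanner(scan);
            for (Result result : scanner) {
                byte[] rowKey = result.getRow();      // the only part being read
            }
            scanner.close();
        }
    }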

Re: Tables rows disappear

2011-01-31 Thread Something Something
, Something Something mailinglist...@gmail.com wrote: Apologies for my dumbness. I know it's some property that I am not setting correctly. But every time I stop and start HBase and Hadoop, I either lose all my tables or lose rows in tables in HBase. Here's what various files contain: *core
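
A frequent cause of exactly this symptom is the stock configuration, which roots HBase under /tmp and so loses everything the OS cleans up between restarts. The usual fix is an hbase-site.xml entry pointing hbase.rootdir at HDFS (the namenode host and port below are hypothetical):

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://namenode:8020/hbase</value>
    </property>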

Re: Bytes.toString(value)) returns empty string

2011-01-21 Thread Something Something
supposed to exist. J-D On Thu, Jan 20, 2011 at 11:52 PM, Something Something mailinglist...@gmail.com wrote: I have a column that looks like this under hbase shell: column=Request:placement, timestamp=1295593730949, value=specific.ea.tracking.promo.deadspace2 In my code I have

Bytes.toString(value)) returns empty string

2011-01-20 Thread Something Something
I have a column that looks like this under hbase shell: column=Request:placement, timestamp=1295593730949, value=specific.ea.tracking.promo.deadspace2 In my code I have something like this... byte[] value = result.getValue(Bytes.toBytes("Request"), Bytes.toBytes("placement"));
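
A hedged note on the API here: getValue returns null, not an empty string, when the family/qualifier pair is absent (both names are case-sensitive, byte-for-byte matches), so an empty string usually means the cell was found but holds zero bytes. Checking before converting separates the two cases; this fragment continues the quoted code, so result is the Result from the snippet above:

    import org.apache.hadoop.hbase.util.Bytes;

    byte[] value = result.getValue(Bytes.toBytes("Request"), Bytes.toBytes("placement"));
    if (value == null) {
        System.out.println("column not in this Result - check the Scan/Get coordinates");
    } else {
        System.out.println("value = '" + Bytes.toString(value) + "' (" + value.length + " bytes)");
    }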