RE: HBase file encryption, inconsistencies observed and data loss

2014-07-30 Thread Kiran Kumar.M.R

Calculate number of records in write buffer

2014-07-30 Thread varshar
Hi, We are writing a billion records into HBase using multiple clients. Each client is multithreaded. Autoflush is set to false and the write buffer size is 12MB. The WriteRequestCount metric is incremented by only 1 per batch insert, not by the number of records inserted. Is there any
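To see why a per-request counter can move by one per flush rather than per record, here is a simplified model of a client-side write buffer with autoflush off. The class name and the 1KB record size are assumptions for illustration; this is not HBase client code, and a real client additionally splits each flush into one request per region server.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a client-side write buffer with autoflush off.
// Not HBase code: records are represented only by their size in bytes.
class WriteBufferModel {
    private final long bufferSizeBytes;
    private final List<Integer> pending = new ArrayList<>();
    private long pendingBytes = 0;
    private int batchesSent = 0;
    private int recordsSent = 0;

    WriteBufferModel(long bufferSizeBytes) {
        this.bufferSizeBytes = bufferSizeBytes;
    }

    // Buffer one record; send everything as a single batch once the threshold is reached.
    void put(int recordBytes) {
        pending.add(recordBytes);
        pendingBytes += recordBytes;
        if (pendingBytes >= bufferSizeBytes) {
            flush();
        }
    }

    // One flush = one batch, no matter how many records it carries.
    // (A real client would send one multi-request per region server.)
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        batchesSent++;
        recordsSent += pending.size();
        pending.clear();
        pendingBytes = 0;
    }

    int recordsInBuffer() { return pending.size(); }
    int batchesSent() { return batchesSent; }
    int recordsSent() { return recordsSent; }

    public static void main(String[] args) {
        WriteBufferModel buf = new WriteBufferModel(12L * 1024 * 1024); // 12MB, as in the post
        for (int i = 0; i < 100_000; i++) {
            buf.put(1024); // assumed record size: 1KB
        }
        System.out.println("still buffered: " + buf.recordsInBuffer());
        buf.flush();
        System.out.println(buf.batchesSent() + " batches carried " + buf.recordsSent() + " records");
    }
}
```

With these assumed numbers each full flush carries 12,288 records (12MB / 1KB), so a counter that moves once per batch understates the record count by roughly that factor.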

Could not resolve the DNS name of slave2:60020

2014-07-30 Thread Chandrashekhar Kotekar
I have an HBase cluster on AWS. I have written a few REST services which are supposed to connect to this HBase cluster and get some data. My configuration is as follows: 1. Java code, Eclipse, Tomcat running on my desktop 2. HBase cluster, Hadoop cluster sitting on AWS 3. Can connect to

Re: Could not resolve the DNS name of slave2:60020

2014-07-30 Thread Jean-Marc Spaggiari
Hi Chandrash, What do you have in your /etc/hosts? Can you also share the piece of code where you are doing the connection to HBase? Thanks, JM 2014-07-30 7:34 GMT-04:00 Chandrashekhar Kotekar shekhar.kote...@gmail.com : I have a HBase cluster on AWS. I have written few REST services which
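The client learns region server locations as hostname:port pairs (here slave2:60020) and has to resolve those names on its own machine, which is why the /etc/hosts question matters. A minimal stdlib check, run on the desktop where Tomcat runs, shows whether the JVM's resolver can resolve them; the class name and the default host list are assumptions.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

class ResolveCheck {
    // Try to resolve one hostname with the same JVM resolver the client uses.
    static boolean resolves(String host) {
        try {
            InetAddress addr = InetAddress.getByName(host);
            System.out.println(host + " -> " + addr.getHostAddress());
            return true;
        } catch (UnknownHostException e) {
            System.out.println(host + " -> not resolvable from this machine");
            return false;
        }
    }

    public static void main(String[] args) {
        // Default list is an assumption; pass the real region server names as arguments.
        String[] hosts = args.length > 0 ? args : new String[] {"slave2"};
        for (String h : hosts) {
            resolves(h);
        }
    }
}
```

If slave2 is not resolvable, one common workaround is to map the region server hostnames to reachable addresses in the desktop's /etc/hosts, assuming the relevant ports are open to it.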

RE: HBase file encryption, inconsistencies observed and data loss

2014-07-30 Thread Kiran Kumar.M.R
Hi, After step 4 (i.e., disabling WAL encryption, removing SecureProtobufReader/Writer and restarting), reading the encrypted WALs fails, mainly due to an EOFException in BaseDecoder. This is not treated as an error, and these WALs are moved to /oldWALs. The following is observed in the log files:

Re: HBase file encryption, inconsistencies observed and data loss

2014-07-30 Thread Ted Yu
Looking at HLogSplitter#getNextLogLine():
    try {
      return in.next();
    } catch (EOFException eof) {
      // truncated files are expected if a RS crashes (see HBASE-2643)
      LOG.info("EOF from hlog " + path + ". continuing");
      return null;
    }
The EOFException is not treated as
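The effect can be reproduced without HBase: if a reader maps every EOFException to "end of file, continue", a record cut off part-way is dropped with no error. The same applies to bytes the reader cannot parse, such as the encrypted WAL entries described earlier in the thread. The sketch uses a hypothetical length-prefixed format (DataOutputStream.writeUTF/readUTF) and only mimics the shape of the quoted getNextLogLine(); it is not HLogSplitter code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EofTolerantReader {
    // Hypothetical record format: DataOutputStream.writeUTF (2-byte length + bytes).
    static byte[] write(String... records) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (String r : records) {
            out.writeUTF(r);
        }
        out.flush();
        return bos.toByteArray();
    }

    // Same shape as the quoted getNextLogLine(): any EOFException means "stop here".
    static String next(DataInputStream in) throws IOException {
        try {
            return in.readUTF();
        } catch (EOFException eof) {
            // a clean end AND a record cut off half-way both land here
            return null;
        }
    }

    static List<String> readAll(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> records = new ArrayList<>();
        String rec;
        while ((rec = next(in)) != null) {
            records.add(rec);
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] full = write("edit-1", "edit-2", "edit-3");
        byte[] truncated = Arrays.copyOf(full, full.length - 3); // cut into the last record
        System.out.println(readAll(full));      // all three edits
        System.out.println(readAll(truncated)); // last edit silently missing, no error
    }
}
```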

Re: HBase file encryption, inconsistencies observed and data loss

2014-07-30 Thread Ted Yu
In BaseDecoder#rethrowEofException():
    if (!isEof) throw ioEx;
    LOG.error("Partial cell read caused by EOF: " + ioEx);
    EOFException eofEx = new EOFException("Partial cell read");
    eofEx.initCause(ioEx);
    throw eofEx;
Throwing EOFException would not propagate the Partial cell
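The concern here is that converting the partial-cell error into an EOFException lets an EOF-tolerant caller swallow it. One way to keep the two cases apart, sketched as an assumption and not as the change proposed in HBASE-11620 (the thread does not show that proposal), is to return null only when EOF falls exactly on a record boundary and to throw an IOException that is not an EOFException for a partial read. The record format and the PartialRecordException type are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class BoundaryAwareReader {
    // Hypothetical exception: deliberately NOT an EOFException subclass, so a
    // caller that catches EOFException ("truncated file, continue") cannot swallow it.
    static class PartialRecordException extends IOException {
        PartialRecordException(String msg, Throwable cause) {
            super(msg, cause);
        }
    }

    // Assumed record format: 2-byte length prefix followed by UTF-8 bytes.
    static byte[] write(String... records) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (String r : records) {
            byte[] b = r.getBytes(StandardCharsets.UTF_8);
            out.writeShort(b.length);
            out.write(b);
        }
        out.flush();
        return bos.toByteArray();
    }

    // Returns null only for a clean EOF at a record boundary.
    static String next(DataInputStream in) throws IOException {
        int hi = in.read();
        if (hi == -1) {
            return null; // nothing of the next record was read: genuine end of data
        }
        int lo = in.read();
        if (lo == -1) {
            throw new PartialRecordException("EOF inside length prefix", null);
        }
        byte[] buf = new byte[(hi << 8) | lo];
        try {
            in.readFully(buf);
        } catch (EOFException eof) {
            throw new PartialRecordException("Partial record read", eof);
        }
        return new String(buf, StandardCharsets.UTF_8);
    }

    // An EOF-tolerant caller: it still swallows EOFException, but a partial
    // record now surfaces as PartialRecordException and propagates.
    static int countRecords(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = 0;
        try {
            while (next(in) != null) {
                n++;
            }
        } catch (EOFException eof) {
            // treated as "truncated file", as in the quoted getNextLogLine()
        }
        return n;
    }
}
```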

Re: HBase file encryption, inconsistencies observed and data loss

2014-07-30 Thread Andrew Purtell
Let's take this to JIRA. On Wed, Jul 30, 2014 at 12:50 PM, Ted Yu yuzhih...@gmail.com wrote: In BaseDecoder#rethrowEofException(): if (!isEof) throw ioEx; LOG.error("Partial cell read caused by EOF: " + ioEx); EOFException eofEx = new EOFException("Partial cell read");

Re: HBase file encryption, inconsistencies observed and data loss

2014-07-30 Thread Ted Yu
I logged HBASE-11620 for this issue. If my proposal is accepted, I can provide a patch. Cheers On Wed, Jul 30, 2014 at 12:56 PM, Andrew Purtell apurt...@apache.org wrote: Let's take this to JIRA On Wed, Jul 30, 2014 at 12:50 PM, Ted Yu yuzhih...@gmail.com wrote: In

Hbase / How to Migrate

2014-07-30 Thread Colin Kincaid Williams
I'm preparing to migrate an HBase database between clusters, but using the same HBase version, 0.92.1-cdh4.1.3. I found that the document http://wiki.apache.org/hadoop/Hbase/HowToMigrate had a link to an older document about the design http://wiki.apache.org/hadoop/Hbase/Migration . In that document was

Re: Calculate number of records in write buffer

2014-07-30 Thread Nick Dimiduk
On Wed, Jul 30, 2014 at 3:34 AM, varshar varsha.raveend...@gmail.com wrote: The WriteRequestCount metric is incremented by only 1 for one batch insert and not by the number of records inserted. I think you're correct, this is a bug. Do you mind filing a ticket? HRegion.doMiniBatchMutation
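The discrepancy being discussed is between bumping a write-request metric once per batch and bumping it once per mutation in the batch. The hypothetical counter below only contrasts the two choices; it is not HRegion.doMiniBatchMutation or the real metrics code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counter, not HRegion code: contrasts bumping a write-request
// metric once per batch with bumping it once per mutation in the batch.
class WriteRequestCounter {
    private final AtomicLong perBatch = new AtomicLong();
    private final AtomicLong perMutation = new AtomicLong();

    void onBatch(List<String> mutations) {
        perBatch.incrementAndGet();               // behaviour reported in the thread
        perMutation.addAndGet(mutations.size());  // what the poster expected to see
    }

    long perBatch() { return perBatch.get(); }
    long perMutation() { return perMutation.get(); }

    public static void main(String[] args) {
        WriteRequestCounter c = new WriteRequestCounter();
        c.onBatch(Arrays.asList("put-1", "put-2", "put-3"));
        c.onBatch(Arrays.asList("put-4"));
        System.out.println("per batch: " + c.perBatch() + ", per mutation: " + c.perMutation());
    }
}
```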

Hbase MR Job with 2 OutputForm classes possible?

2014-07-30 Thread Thomas Kwan
Hi there, I have an HBase MR job that reads data from HDFS, does an HBase Get, and then does some data transformation. Then I need to put the data back into HBase as well as write data to an HDFS file directory (so I can import it back into Hive). The current job creation logic is similar to the

Re: Hbase MR Job with 2 OutputForm classes possible?

2014-07-30 Thread Shahab Yunus
There is a trick: you can use MultipleOutputs with TableMapReduceUtil. In the Reducer you can write to the desired outputs on HDFS using MultipleOutputs, and the HBase util will do its work as is. The only caveat is that you will have to commit the files that you have written using MultipleOutputs
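As a generic illustration of what "committing" side files usually means: write them to a private temporary location and move them into the final directory in one step only after the task has succeeded. This is plain java.nio, not Hadoop's OutputCommitter or FileSystem API, and the directory and file names (_temporary/attempt_0, part-r-00000, hive_import) are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

class SideFileCommit {
    // Write a task's side output into its private attempt directory.
    static Path writeAttempt(Path attemptDir, String fileName, List<String> lines) throws IOException {
        Files.createDirectories(attemptDir);
        return Files.write(attemptDir.resolve(fileName), lines, StandardCharsets.UTF_8);
    }

    // "Commit": move the finished file into the final directory in one step,
    // so readers of finalDir never see a half-written file.
    static Path commit(Path attemptFile, Path finalDir) throws IOException {
        Files.createDirectories(finalDir);
        Path target = finalDir.resolve(attemptFile.getFileName());
        return Files.move(attemptFile, target, StandardCopyOption.ATOMIC_MOVE);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("side-output");
        Path written = writeAttempt(root.resolve("_temporary/attempt_0"), "part-r-00000",
                Arrays.asList("row1\tvalue1", "row2\tvalue2"));
        Path committed = commit(written, root.resolve("hive_import"));
        System.out.println("committed " + committed + ", temp copy gone: " + !Files.exists(written));
    }
}
```

The atomic move assumes both directories live on the same filesystem; if a task fails before commit, its attempt directory can simply be deleted.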

Can I put all columns into a row key?

2014-07-30 Thread yl wu
Hi all, I am trying to design the row key for a table. Our application will perform many queries on columns. My question is: is it a good idea to put all column values into the row key? Then I could use filters like FuzzyRowFilter to get rows and parse the values directly from the row keys.
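Packing column values into the row key only works with FuzzyRowFilter-style matching if every field sits at a fixed offset. A stdlib-only sketch of such a key, assuming a hypothetical layout [userId: 4-byte int][eventType: 1 byte][timestamp: 8-byte long], with values decoded straight from the key and a matcher that follows FuzzyRowFilter's documented mask convention as we understand it (0 = byte must match, 1 = any byte). It does not use the HBase filter classes.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical fixed-width row key: [userId: 4-byte int][eventType: 1 byte][timestamp: 8-byte long].
// Fixed widths keep every field at a known offset, which a FuzzyRowFilter-style mask needs.
class CompositeKey {
    static final int LENGTH = 4 + 1 + 8;

    static byte[] encode(int userId, byte eventType, long timestamp) {
        return ByteBuffer.allocate(LENGTH).putInt(userId).put(eventType).putLong(timestamp).array();
    }

    // Decoding is just reading at fixed offsets; no column read is needed.
    static int userId(byte[] key) { return ByteBuffer.wrap(key).getInt(0); }
    static byte eventType(byte[] key) { return key[4]; }
    static long timestamp(byte[] key) { return ByteBuffer.wrap(key).getLong(5); }

    // Mask that fixes only the eventType byte: 0 = must match, 1 = any byte.
    static byte[] eventTypeOnlyMask() {
        byte[] mask = new byte[LENGTH];
        Arrays.fill(mask, (byte) 1);
        mask[4] = 0;
        return mask;
    }

    static boolean fuzzyMatch(byte[] key, byte[] pattern, byte[] mask) {
        if (key.length < pattern.length) {
            return false;
        }
        for (int i = 0; i < pattern.length; i++) {
            if (mask[i] == 0 && key[i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Two caveats follow from this design: row keys are immutable, so changing any embedded value means writing a new row and deleting the old one; and big-endian signed ints sort in numeric order under unsigned byte comparison only when they are non-negative.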

Re: Completebulkload with namespace option?

2014-07-30 Thread Jianshi Huang
Created a Jira issue. https://issues.apache.org/jira/browse/HBASE-11622 On Tue, Jul 29, 2014 at 11:46 PM, Bharath Vissapragada bhara...@cloudera.com wrote: Appears to be a bug. It should be TableName.valueOf(...) or something similar. Mind filing a jira? On Tue, Jul 29, 2014 at 12:22 PM,
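The suggested fix is to go through TableName.valueOf(...) rather than treating the argument as a plain table name. To show what that parsing involves, here is a simplified stdlib-only sketch of the "namespace:qualifier" convention (a name with no namespace lives in "default"). The class is hypothetical, skips the character validation the real TableName performs, and is not HBase code.

```java
// Hypothetical helper illustrating the "namespace:qualifier" table-name
// convention; in HBase itself this parsing is done by TableName.valueOf(...).
class NsTableName {
    static final String DEFAULT_NAMESPACE = "default";
    static final char DELIMITER = ':';

    final String namespace;
    final String qualifier;

    private NsTableName(String namespace, String qualifier) {
        this.namespace = namespace;
        this.qualifier = qualifier;
    }

    static NsTableName parse(String name) {
        int idx = name.indexOf(DELIMITER);
        if (idx < 0) {
            return new NsTableName(DEFAULT_NAMESPACE, name); // no namespace given
        }
        if (idx == 0 || idx == name.length() - 1 || name.indexOf(DELIMITER, idx + 1) >= 0) {
            throw new IllegalArgumentException("Malformed table name: " + name);
        }
        return new NsTableName(name.substring(0, idx), name.substring(idx + 1));
    }

    @Override
    public String toString() {
        return namespace + DELIMITER + qualifier;
    }

    public static void main(String[] args) {
        System.out.println(parse("ns1:events"));
        System.out.println(parse("events"));
    }
}
```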

Re: Completebulkload with namespace option?

2014-07-30 Thread Ted Yu
Matteo acted very fast - this has been fixed by HBASE-11609 Cheers On Wed, Jul 30, 2014 at 7:02 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Created a Jira issue. https://issues.apache.org/jira/browse/HBASE-11622 On Tue, Jul 29, 2014 at 11:46 PM, Bharath Vissapragada

Re: Completebulkload with namespace option?

2014-07-30 Thread Jianshi Huang
Wow, thanks! :) On Thu, Jul 31, 2014 at 10:07 AM, Ted Yu yuzhih...@gmail.com wrote: Matteo acted very fast - this has been fixed by HBASE-11609 Cheers On Wed, Jul 30, 2014 at 7:02 PM, Jianshi Huang jianshi.hu...@gmail.com wrote: Created a Jira issue.

Best practice for writing to HFileOutputFormat(2) with multiple Column Families

2014-07-30 Thread Jianshi Huang
I need to take a 2TB dataset and explode it into 4 column families. The resulting dataset is likely to be 20TB or more. I'm currently using Spark, so I sorted the (rk, cf, cq) myself. It's huge and I'm considering how to optimize it. My question is: Should I sort and write each column family
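Each HFile holds a single column family and must be written in sorted key order, so one possible shape for the 4-family case is to split the exploded cells by family and sort each group by (row, qualifier) under unsigned byte order, which is how HBase compares row keys and qualifiers. This is a stdlib-only sketch with a hypothetical Cell type; it does not use HFileOutputFormat or Spark and does not settle which approach scales better at 20TB.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerFamilySort {
    // Hypothetical cell type; values omitted for brevity.
    static final class Cell {
        final byte[] row;
        final String family;
        final byte[] qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row.getBytes(StandardCharsets.UTF_8);
            this.family = family;
            this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public String toString() {
            return new String(row, StandardCharsets.UTF_8) + "/" + family + ":"
                    + new String(qualifier, StandardCharsets.UTF_8);
        }
    }

    // Lexicographic comparison treating bytes as unsigned (0x00..0xFF).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Split cells into one list per family, each sorted by (row, qualifier).
    static Map<String, List<Cell>> groupAndSort(List<Cell> cells) {
        Map<String, List<Cell>> byFamily = new TreeMap<>();
        for (Cell c : cells) {
            byFamily.computeIfAbsent(c.family, k -> new ArrayList<>()).add(c);
        }
        for (List<Cell> group : byFamily.values()) {
            group.sort((a, b) -> {
                int cmp = compareUnsigned(a.row, b.row);
                return cmp != 0 ? cmp : compareUnsigned(a.qualifier, b.qualifier);
            });
        }
        return byFamily;
    }
}
```

Real HFile ordering continues past the qualifier (timestamp descending, then type); the sketch stops at (row, qualifier) because the post only mentions (rk, cf, cq).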

Re: Calculate number of records in write buffer

2014-07-30 Thread varshar
Hi, I found this ticket for a wrong write request count, but I am not sure if it is related to the same issue: https://issues.apache.org/jira/browse/HBASE-11353 We are using HBase version 0.98.0.2 and the fix for the above ticket is in