Re: Spark HBase Bulk load using HFileFormat

2016-07-13 Thread Ted Yu
Can you show the code inside saveASHFile? Maybe the partitions of the RDD need to be sorted (for the 1st issue). Cheers. On Wed, Jul 13, 2016 at 4:29 PM, yeshwanth kumar wrote: > Hi, I am doing a bulk load into HBase as HFileFormat, by > using saveAsNewAPIHadoopFile > > i am

Spark HBase Bulk load using HFileFormat

2016-07-13 Thread yeshwanth kumar
Hi, I am doing a bulk load into HBase as HFileFormat, using saveAsNewAPIHadoopFile. I am on HBase 1.2.0-cdh5.7.0 and Spark 1.6. When I try to write, I am getting an exception: java.io.IOException: Added a key not lexically larger than previous. The following is the code snippet: case class
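The "Added a key not lexically larger than previous" error arises because HFiles must be written with row keys in strictly increasing unsigned lexicographic byte order. A minimal, self-contained sketch of that ordering check (plain Scala, no Spark or HBase dependencies; the object and method names here are hypothetical, chosen to mirror the behavior of HBase's `Bytes.compareTo` and the HFile writer's sanity check):

```scala
// Sketch, assuming HBase's unsigned lexicographic byte ordering:
// out-of-order keys trip the writer's check; sorting them first does not.
object RowKeyOrdering {
  // Unsigned lexicographic comparison of byte arrays, the order HFiles require.
  def compareBytes(a: Array[Byte], b: Array[Byte]): Int = {
    val len = math.min(a.length, b.length)
    var i = 0
    while (i < len) {
      val cmp = (a(i) & 0xff) - (b(i) & 0xff) // compare bytes as unsigned
      if (cmp != 0) return cmp
      i += 1
    }
    a.length - b.length // shorter array sorts first when one is a prefix
  }

  // Mimics the writer's check: each key must be lexically larger than the previous.
  def isStrictlySorted(keys: Seq[Array[Byte]]): Boolean =
    keys.sliding(2).forall {
      case Seq(prev, next) => compareBytes(prev, next) < 0
      case _               => true
    }

  def main(args: Array[String]): Unit = {
    val keys = Seq("row-10", "row-2", "row-1").map(_.getBytes("UTF-8"))
    println(isStrictlySorted(keys))   // unsorted input: would trigger the IOException
    val sorted = keys.sortWith((a, b) => compareBytes(a, b) < 0)
    println(isStrictlySorted(sorted)) // sorted input: safe to write
  }
}
```

In an actual Spark job, the usual fix (in line with Ted's suggestion above) is to sort the `(ImmutableBytesWritable, KeyValue)` pairs by key before calling `saveAsNewAPIHadoopFile`, e.g. via `sortByKey` or `repartitionAndSortWithinPartitions`, so each partition emits keys in this order.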

Re: is possible to create multiple TableSplit per region?

2016-07-13 Thread Billy Watson
I agree. I'm not an expert, though; I do more Pig jobs than anything. Anyone else on the thread have more experience creating MR jobs on HBase data? On Wednesday, July 13, 2016, Frank Luo wrote: > It will work, but it is a pretty awkward way to create more mappers.

RE: Re:is possible to create multiple TableSplit per region?

2016-07-13 Thread Frank Luo
It will work, but it is a pretty awkward way to create more mappers. From: Billy Watson [mailto:williamrwat...@gmail.com] Sent: Wednesday, July 13, 2016 3:57 PM To: Frank Luo Cc: user@hbase.apache.org Subject: Re: Re:is possible to create multiple TableSplit per region? It

Re: Re:is possible to create multiple TableSplit per region?

2016-07-13 Thread Billy Watson
It seems like it might be faster, then, to consider a map job followed by another map job. Or, depending on the web-service calls, maybe a combine step? William Watson Lead Software Engineer On Wed, Jul 13, 2016 at 4:40 PM, Frank Luo wrote: > It makes a number of web-service

RE: Re:is possible to create multiple TableSplit per region?

2016-07-13 Thread Frank Luo
It makes a number of web-service calls. From: Billy Watson [mailto:williamrwat...@gmail.com] Sent: Wednesday, July 13, 2016 3:27 PM To: user@hbase.apache.org Cc: Frank Luo Subject: Re: Re:is possible to create multiple TableSplit per region? What do you mean by "heavy work

Re: Re:is possible to create multiple TableSplit per region?

2016-07-13 Thread Billy Watson
What do you mean by "heavy work downstream"? I think the mailing list might need a *few* more details to help out better. William Watson On Wed, Jul 13, 2016 at 12:32 PM, Frank Luo wrote: > Thanks for the prompt reply, Lu. > > It is true that having a smaller region file

RE: Re:is possible to create multiple TableSplit per region?

2016-07-13 Thread Frank Luo
Thanks for the prompt reply, Lu. It is true that having a smaller region file size can solve the problem, but it also has side effects. For example, the total number of regions can easily double or triple, and I am already facing a challenge of having too many regions per server. So I cannot go

Re:is possible to create multiple TableSplit per region?

2016-07-13 Thread 陆巍
Here is an archived mail: http://mail-archives.apache.org/mod_mbox/hbase-user/201303.mbox/%3cblu0-smtp19115a8967869d6cf0d49ef8f...@phx.gbl%3E At 2016-07-13 23:20:28, "Frank Luo" wrote: >We have mapper-only jobs operating on the result of a Scan. Because of heavy >work

is possible to create multiple TableSplit per region?

2016-07-13 Thread Frank Luo
We have mapper-only jobs operating on the result of a Scan. Because of the heavy work downstream, the mappers run fairly slowly. So I am wondering if there is a way to create multiple TableSplits on one region, so that multiple mappers can be created to work on different pieces of data in the region. I
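One common approach to Frank's question is a custom input format that subdivides each region's `[startKey, endKey)` row-key range into several sub-ranges, emitting one split per sub-range. The range arithmetic can be sketched in plain Scala (no HBase dependency; `RegionSubSplitter` and its helpers are hypothetical names, and keys are treated as fixed-width big-endian unsigned integers, similar in spirit to HBase's `Bytes.split`):

```scala
// Sketch: cut one region's [startKey, endKey) row-key range into n contiguous
// sub-ranges, so a custom getSplits() could emit several splits per region.
object RegionSubSplitter {
  // Interpret a key as an unsigned big-endian integer, right-padded to `width` bytes.
  def toBigInt(key: Array[Byte], width: Int): BigInt =
    BigInt(1, key.padTo(width, 0.toByte))

  // Convert back to a fixed-width byte array.
  def toBytes(v: BigInt, width: Int): Array[Byte] = {
    val raw = v.toByteArray.dropWhile(_ == 0) // drop sign/leading zero bytes
    Array.fill[Byte](width - raw.length)(0) ++ raw
  }

  // Returns n contiguous (start, end) pairs exactly covering [startKey, endKey).
  def subSplits(startKey: Array[Byte], endKey: Array[Byte],
                n: Int): Seq[(Array[Byte], Array[Byte])] = {
    val width = math.max(startKey.length, endKey.length)
    val (lo, hi) = (toBigInt(startKey, width), toBigInt(endKey, width))
    val bounds = (0 to n).map(i => lo + (hi - lo) * i / n) // n + 1 evenly spaced cut points
    bounds.sliding(2).map { case Seq(a, b) => (toBytes(a, width), toBytes(b, width)) }.toSeq
  }
}
```

In a real MR job, this logic would live in a subclass of `TableInputFormat` whose overridden `getSplits` expands each per-region split into one `TableSplit` per sub-range, giving more mappers per region without changing the region size.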

unable to write data to hbase without any error

2016-07-13 Thread 罗辉
Hello guys, I have a Spark SQL app which writes some data to HBase; however, the app hangs without any exception or error. Here is my code: //code base: https://hbase.apache.org/book.html#scala val sparkMasterUrlDev = "spark://master60:7077" val sparkMasterUrlLocal = "local[2]"