Re: Phoenix as a source for Spark processing

2018-03-15 Thread Josh Elser
t the target table will contain about 100 million records. HBase has 14 region servers, and both tables are salted with SALT_BUCKETS=42. The Spark job runs via YARN. -----Original Message----- From: Josh Elser [mailto:els...@apache.org] Sent: Monday, March 5, 2018 9:14 PM To: user@phoenix.apache.o

RE: Phoenix as a source for Spark processing

2018-03-15 Thread Stepan Migunov
-----Original Message----- From: Josh Elser [mailto:els...@apache.org] Sent: Friday, March 9, 2018 2:17 AM To: user@phoenix.apache.org Subject: Re: Phoenix as a source for Spark processing How large is each row in this case? Or, better yet, how large is the table in HBase? You're spreadi

Re: Phoenix as a source for Spark processing

2018-03-08 Thread Josh Elser
s. HBase has 14 region servers, and both tables are salted with SALT_BUCKETS=42. The Spark job runs via YARN. -----Original Message----- From: Josh Elser [mailto:els...@apache.org] Sent: Monday, March 5, 2018 9:14 PM To: user@phoenix.apache.org Subject: Re: Phoenix as a source for Spark processing

Re: Phoenix as a source for Spark processing

2018-03-08 Thread Josh Elser
I would guess that Hive would always be capable of outperforming what HBase/Phoenix can do for this type of workload (bulk transformation). That said, I'm not ready to tell you that you can't get the Phoenix-Spark integration to perform better. See the other thread where you provide more details

Re: Phoenix as a source for Spark processing

2018-03-07 Thread Stepan Migunov
Some more details... We have run some simple tests to compare the read/write capabilities of Spark+Hive and Spark+Phoenix, and we now have the following results: Copy table (without any transformations, about 800 million records): Hive (TEZ) - 752 sec Spark: From Hive to Hive: 2463 sec From Phoenix to H

RE: Phoenix as a source for Spark processing

2018-03-05 Thread Stepan Migunov
nt: Monday, March 5, 2018 9:14 PM To: user@phoenix.apache.org Subject: Re: Phoenix as a source for Spark processing Hi Stepan, Can you better ballpark the Phoenix-Spark performance you've seen (e.g. how much hardware do you have, how many spark executors did you use, how many region serve

Re: Phoenix as a source for Spark processing

2018-03-05 Thread Josh Elser
Hi Stepan, Can you better ballpark the Phoenix-Spark performance you've seen (e.g. how much hardware do you have, how many spark executors did you use, how many region servers)? Also, what versions of software are you using? I don't think there are any firm guidelines on how you can solve thi