Re: sc.phoenixTableAsRDD number of initial partitions

2016-10-14 Thread Ciureanu Constantin
Then please post a small part of your code (the one reading from Phoenix and processing the RDD contents).

2016-10-14 11:12 GMT+02:00 Antonio Murgia:
> For the record, autocommit was set to true.
>
> On 10/14/2016 10:08 AM, James Taylor wrote:
> >
> > On Fri, Oct 14, 2016 at 12:37 AM, Antonio Murgia
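[For context, a minimal sketch of the kind of read-and-process code being requested, using the phoenix-spark integration; the table name, columns, and ZooKeeper URL below are placeholders, not taken from the thread:]

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.phoenix.spark._ // adds phoenixTableAsRDD to SparkContext

    val sc = new SparkContext(new SparkConf().setAppName("phoenix-read"))

    // Hypothetical table and columns; zkUrl points at the HBase ZooKeeper quorum.
    val rdd = sc.phoenixTableAsRDD(
      "SOURCE_TABLE",
      Seq("ID", "PAYLOAD"),
      zkUrl = Some("zk-host:2181")
    )

    // Each row arrives as a Map[String, AnyRef]; process it without
    // collecting the whole RDD to the driver.
    val distinctIds = rdd.map(row => row("ID")).distinct().count()
    println(s"distinct ids: $distinctIds")

[The initial number of partitions of such an RDD, the subject of this thread, is determined by the input splits Phoenix computes for the underlying scan, not by a Spark-side setting.]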

Re: sc.phoenixTableAsRDD number of initial partitions

2016-10-14 Thread Antonio Murgia
For the record, autocommit was set to true.

On 10/14/2016 10:08 AM, James Taylor wrote:
> On Fri, Oct 14, 2016 at 12:37 AM, Antonio Murgia <antonio.mur...@eng.it> wrote:
>> We tried with an Upsert from select, but we ran into some memory issue
>> on the Phoenix side. Do you have any suggestion to perform something like that?

Re: sc.phoenixTableAsRDD number of initial partitions

2016-10-14 Thread Antonio Murgia
I know the Spark doc is really comprehensive; I have read it many times over the last 2 years, and I know how to check how Spark uses its memory and how to tweak it (e.g. using more or less memory for caching). I'll try telling it not to use any memory to cache the RDD, since I'm not caching at all. Please don
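[A minimal sketch of what "not using any memory to cache the RDD" could look like at the configuration level, assuming Spark 1.6+ unified memory management; the value is illustrative, not from the thread:]

    import org.apache.spark.{SparkConf, SparkContext}

    // Since nothing is cached, shrink the storage region that is immune to
    // eviction and leave the rest of the unified pool to execution.
    val conf = new SparkConf()
      .setAppName("phoenix-table-copy")
      .set("spark.memory.storageFraction", "0.1") // default is 0.5

    val sc = new SparkContext(conf)

[Execution can already borrow unused storage memory, so with nothing cached this is mostly a no-op; it only shrinks the portion of cached blocks protected from eviction, but it makes the intent explicit.]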

Re: sc.phoenixTableAsRDD number of initial partitions

2016-10-14 Thread James Taylor
On Fri, Oct 14, 2016 at 12:37 AM, Antonio Murgia wrote:
> We tried with an Upsert from select, but we ran into some memory issue
> on the Phoenix side.
>
> Do you have any suggestion to perform something like that?

You can try setting auto commit to true on the connection before you perform the UPSERT SELECT.
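[A hedged sketch of that suggestion over the Phoenix JDBC driver; the connection URL and table names are made up for illustration. With auto-commit on, Phoenix can commit incrementally as the scan runs, and in some cases execute the UPSERT SELECT entirely server-side, instead of buffering the whole mutation list on the client:]

    import java.sql.DriverManager

    // Hypothetical ZooKeeper quorum and tables.
    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    try {
      // Commit as the scan proceeds rather than accumulating mutations client-side.
      conn.setAutoCommit(true)
      val stmt = conn.createStatement()
      stmt.executeUpdate("UPSERT INTO TARGET_TABLE SELECT * FROM SOURCE_TABLE")
      stmt.close()
    } finally {
      conn.close()
    }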

Re: sc.phoenixTableAsRDD number of initial partitions

2016-10-14 Thread Mich Talebzadeh
"I do know how Spark in general works, and how it stores data in memory etc. It's been almost 2 years that I work on it. So I'm definetely not collecting the whole rdd in memory ;)" Spark doc is a good start. To see how spark memory is utilised look at Spark UI on :4040 by default under storage t

Re: sc.phoenixTableAsRDD number of initial partitions

2016-10-14 Thread Antonio Murgia
Hi Constantin, thank you for your reply. I do know how Spark in general works, and how it stores data in memory etc. It's been almost 2 years that I've worked on it, so I'm definitely not collecting the whole RDD in memory ;) Our "maintenance use case" is the following: copying the whole content

Re: sc.phoenixTableAsRDD number of initial partitions

2016-10-13 Thread Ciureanu Constantin
Hi Antonio, reading the whole table is not a good use case for Phoenix / HBase or any DB. You should never, ever store the whole content read from the DB / disk in memory; that's definitely wrong. Spark doesn't do that by itself, no matter what "they" told you it does in order to be fast
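[A sketch of the partition-at-a-time style this advice implies: rows stream through each executor and are written out in bounded batches, so the whole table never sits in memory at once. It reuses the rdd of Map[String, AnyRef] rows from the read sketch earlier in this thread; the sink, batch size, and names are hypothetical:]

    import java.sql.DriverManager

    rdd.foreachPartition { rows =>
      // One JDBC connection per partition; commit in bounded batches so the
      // client-side mutation buffer never grows past ~1000 rows.
      val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
      conn.setAutoCommit(false)
      val ps = conn.prepareStatement(
        "UPSERT INTO TARGET_TABLE (ID, PAYLOAD) VALUES (?, ?)")
      try {
        var n = 0
        rows.foreach { row =>
          ps.setObject(1, row("ID"))
          ps.setObject(2, row("PAYLOAD"))
          ps.executeUpdate()
          n += 1
          if (n % 1000 == 0) conn.commit()
        }
        conn.commit()
      } finally {
        ps.close()
        conn.close()
      }
    }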