Re: Hash join confusion

2016-09-28 Thread Sumit Nigam
Thanks, Maryann. Yes, let me switch to the sort-merge join, because the other query uses many more columns. Also, if I just change the hint to use sort-merge join, would that be enough, or do I need to sort both the driving query and the subquery with the same ORDER BY? As an aside, is there a document
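
[Editor's note] For context, this is roughly what switching the hint looks like; a minimal Scala/JDBC sketch where the connection string, table, and column names are all hypothetical (the /*+ USE_SORT_MERGE_JOIN */ hint itself is the documented Phoenix hint):

    // Hypothetical sketch: forcing a sort-merge join instead of the default hash join.
    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    val rs = conn.createStatement().executeQuery(
      """SELECT /*+ USE_SORT_MERGE_JOIN */ o.order_id, i.name
        |FROM orders o
        |JOIN items i ON o.item_id = i.item_id""".stripMargin)
    while (rs.next()) println(rs.getString(2))
    conn.close()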

Re: Hash join confusion

2016-09-28 Thread Maryann Xue
Thank you Sumit, for trying this out! So right now it's very clear that the table to be cached IS too big, so there should be no point in using a hash join in this case. Is the other table much smaller, or is it about the same size or even bigger? If it's considerably smaller, you can probably rewrite

Re: Hash join confusion

2016-09-28 Thread Sumit Nigam
Thank you, Maryann. I am not using multi-tenancy for these tables. Increasing phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size config just delayed the error. I have also started seeing a memory problem - Caused by:
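
[Editor's note] For reference, these are the two server-cache settings in play; a hedged hbase-site.xml sketch where the property names are the real Phoenix ones but the values are purely illustrative, not a recommendation:

    <!-- How long a hash-join cache entry may live on the region server (ms). -->
    <property>
      <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
      <value>60000</value>
    </property>
    <!-- Cap on the size of the subquery result sent to be cached (bytes). -->
    <property>
      <name>phoenix.query.maxServerCacheBytes</name>
      <value>209715200</value>
    </property>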

Re: bulk-delete spark phoenix

2016-09-28 Thread Josh Mahonin
Hi Fabio, You could probably just execute a regular DELETE query from a JDBC call, which is generally safe to do either from the Spark driver or within an executor. As long as auto-commit is enabled, it's an entirely server-side operation: https://phoenix.apache.org/language/#delete Josh On
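
[Editor's note] A minimal Scala sketch of what Josh describes; the connection string, table, and predicate are placeholders, and setAutoCommit(true) is what keeps the delete fully server-side:

    // Hedged sketch: bulk delete through plain JDBC, runnable from the Spark
    // driver or inside an executor. All names below are hypothetical.
    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    conn.setAutoCommit(true)  // with auto-commit on, Phoenix deletes server-side
    val deleted = conn.createStatement()
      .executeUpdate("DELETE FROM EVENTS WHERE CREATED_DATE < DATE '2016-01-01'")
    println(s"$deleted rows deleted")
    conn.close()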

bulk-delete spark phoenix

2016-09-28 Thread fabio ferrante
Hi, I would like to perform a bulk delete to HBase using Apache Phoenix from Spark. Using the Phoenix-Spark plugin I can successfully perform a bulk load using the saveToPhoenix method from PhoenixRDD, but how can I perform a bulk delete? There isn't a deleteFromPhoenix method in PhoenixRDD. Is that

Re: Hash join confusion

2016-09-28 Thread Maryann Xue
Yes, Sumit, the sub-query will get cached in a hash join. Are you using multi-tenancy for these tables? If yes, you might want to check out Phoenix 4.7 or 4.8, since a related bug fix went into the 4.7 release.

Re: bulk-upsert spark phoenix

2016-09-28 Thread Josh Mahonin
Hi Antonio, Certainly, a JIRA ticket with a patch would be fantastic. Thanks! Josh On Wed, Sep 28, 2016 at 12:08 PM, Antonio Murgia wrote: > Thank you very much for your insights Josh, if I decide to develop a small > Phoenix Library that does, through Spark, what the

Re: bulk-upsert spark phoenix

2016-09-28 Thread Antonio Murgia
Thank you very much for your insights Josh, if I decide to develop a small Phoenix Library that does, through Spark, what the CSV loader does, I'll surely write to the mailing list, or open a Jira, or maybe even open a PR, right? Thank you again #A.M. On 09/28/2016 05:10 PM, Josh Mahonin

Re: bulk-upsert spark phoenix

2016-09-28 Thread Josh Mahonin
Hi Antonio, You're correct, the phoenix-spark output uses the Phoenix Hadoop OutputFormat under the hood, which effectively does a parallel, batch JDBC upsert. It should scale depending on the number of Spark executors, RDD/DataFrame parallelism, and number of HBase RegionServers, though
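
[Editor's note] For reference, a sketch of the phoenix-spark save path described above, using the documented saveToPhoenix call on an RDD of tuples; the table name, columns, and ZooKeeper quorum are hypothetical:

    // Each Spark partition writes through PhoenixOutputFormat, i.e. a
    // parallel, batched JDBC upsert. Names below are placeholders.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.phoenix.spark._

    val sc = new SparkContext(new SparkConf().setAppName("phoenix-upsert"))
    val rows = sc.parallelize(Seq((1L, "foo"), (2L, "bar")))
    rows.saveToPhoenix("OUTPUT_TABLE", Seq("ID", "COL1"),
      zkUrl = Some("zk-host:2181"))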

Re: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-28 Thread Gabriel Reid
Hi Ravi, It looks like those log file entries you posted are from a MapReduce task. Could you post the output of the command that you're using to start the actual job (i.e. the console output of "hadoop jar ...")? - Gabriel On Wed, Sep 28, 2016 at 1:49 PM, Ravi Kumar Bommada
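
[Editor's note] For context, a typical invocation of the Phoenix CSV bulk load tool looks like the following; the jar version, table name, and input path are illustrative (the tool is documented at https://phoenix.apache.org/bulk_dataload.html):

    hadoop jar phoenix-4.8.0-HBase-1.1-client.jar \
      org.apache.phoenix.mapreduce.CsvBulkLoadTool \
      --table EXAMPLE \
      --input /data/example.csv

The job's final step hands the generated HFiles to HBase (via LoadIncrementalHFiles), which is the step that appears not to complete in the thread below.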

Loading via MapReduce, Not Moving HFiles to HBase

2016-09-28 Thread Ravi Kumar Bommada
Hi All, I'm trying to load data via Phoenix MapReduce, referring to the screen below: HFiles are getting created (176 HFiles of 300 MB each), but after that the files are not moving to HBase, i.e. when I query HBase I'm not able to see the data. According to the logs

Recall: Loading via MapReduce, Not Moving HFiles to HBase

2016-09-28 Thread Ravi Kumar Bommada
Ravi Kumar Bommada would like to recall the message, "Loading via MapReduce, Not Moving HFiles to HBase".

Re: Phoenix ResultSet.next() takes a long time for first row

2016-09-28 Thread Ankit Singhal
Sorry Sasi, missed your last mails. It seems that you have one region in the table, or the query is touching one region because of the monotonically increasing key ['MK00100','YOU',4]. The varying performance is because you may have filters which are aggressive and skip lots of rows in between (*0* (7965
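
[Editor's note] One way to check whether a query is confined to a single region, and how aggressively it filters, is to look at the plan; a hedged sketch using Phoenix's EXPLAIN over JDBC, where the connection string and query are placeholders:

    // Print the query plan; the output shows the scan type (FULL vs RANGE
    // SCAN), the parallelism (chunks), and any server-side filters.
    import java.sql.DriverManager

    val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
    val rs = conn.createStatement()
      .executeQuery("EXPLAIN SELECT * FROM T WHERE PK1 = 'MK00100' AND PK2 = 'YOU'")
    while (rs.next()) println(rs.getString(1))
    conn.close()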

Phoenix Bulkload Mapreduce Perfomance Issue

2016-09-28 Thread Ravi Kumar Bommada
Hi All, I'm running Phoenix MapReduce with 20 input files, each 100 MB in size. All map and reduce tasks completed in 45 minutes except the last reduce task, which is taking 3 hours to complete. Please suggest how I can optimize this. Regards, Ravi Kumar B Mob: +91