Thanks Maryann.
Yes, let me switch to sort-merge join, because the other query uses many more
columns. Also, if I just change the hint to use sort-merge join, would that be
enough, or do I need to sort both the driving query and the subquery with the
same ORDER BY for the merge to work?
As an aside, is there a document
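For reference, a sketch of what the hint swap might look like, with hypothetical table names (ORDERS and ITEMS are placeholders, not from this thread). As far as I understand, Phoenix sorts both sides of a sort-merge join itself, so no extra ORDER BY should be needed in the query:

```sql
-- Default behavior builds a hash table from one side and broadcasts it to the
-- region servers. The hint below asks Phoenix to use a sort-merge join instead.
SELECT /*+ USE_SORT_MERGE_JOIN */ o.order_id, i.name
FROM ORDERS o
JOIN ITEMS i ON o.item_id = i.item_id;
```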
Thank you Sumit, for trying this out! So right now it's very clear that the
table to be cached IS too big, so there should be no point in using a hash
join in this case. Is the other table much smaller, or is it about the same
size or even bigger? If it's considerably smaller, you can probably rewrite
Thank you Maryann.
I am not using multi-tenancy for these tables. Increasing
phoenix.coprocessor.maxServerCacheTimeToLiveMs and the corresponding cache size
config just delayed the error.
I have also started seeing some memory problems -
Caused by:
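For context, the two settings mentioned above usually go in the client/server hbase-site.xml. A sketch with illustrative values; the size property name here is my assumption for the "corresponding cache size config":

```xml
<!-- hbase-site.xml: values are illustrative, not recommendations -->
<property>
  <!-- how long the server keeps the hash-join cache alive -->
  <name>phoenix.coprocessor.maxServerCacheTimeToLiveMs</name>
  <value>60000</value>
</property>
<property>
  <!-- assumed to be the cache size config referred to above -->
  <name>phoenix.query.maxServerCacheBytes</name>
  <value>209715200</value> <!-- ~200 MB -->
</property>
```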
Hi Fabio,
You could probably just execute a regular DELETE query from a JDBC call,
which is generally safe to do either from the Spark driver or within an
executor. As long as auto-commit is enabled, it's an entirely server-side
operation: https://phoenix.apache.org/language/#delete
Josh
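A minimal sketch of that suggestion, assuming the Phoenix JDBC driver is on the classpath; the connection URL, table name, and WHERE clause are placeholders:

```scala
import java.sql.DriverManager

// Hypothetical ZooKeeper quorum and table; substitute your own.
val conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181")
try {
  // With auto-commit on, the DELETE runs server side: rows are not pulled
  // back to the client to be re-submitted as delete mutations.
  conn.setAutoCommit(true)
  val stmt = conn.createStatement()
  val deleted =
    stmt.executeUpdate("DELETE FROM MY_TABLE WHERE CREATED_DATE < CURRENT_DATE() - 30")
  stmt.close()
} finally {
  conn.close()
}
```

The same snippet works unchanged inside a Spark executor (e.g. in a `foreachPartition`), since it relies only on plain JDBC.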
Hi,
I would like to perform a bulk delete on HBase using Apache Phoenix from
Spark. Using the Phoenix-Spark plugin I can successfully perform a bulk load
using the saveToPhoenix method from PhoenixRDD, but how can I perform a bulk
delete? There isn't a deleteFromPhoenix method in PhoenixRDD. Is that
Yes, Sumit, the sub-query will get cached in a hash join. Are you using
multi-tenancy for these tables? If yes, you might want to check out Phoenix
4.7 or 4.8, since a related bug fix got into the 4.7 release.
Hi Antonio,
Certainly, a JIRA ticket with a patch would be fantastic.
Thanks!
Josh
On Wed, Sep 28, 2016 at 12:08 PM, Antonio Murgia wrote:
> Thank you very much for your insights Josh, if I decide to develop a small
> Phoenix Library that does, through Spark, what the
Thank you very much for your insights Josh, if I decide to develop a
small Phoenix Library that does, through Spark, what the CSV loader
does, I'll surely write to the mailing list, or open a Jira, or maybe
even open a PR, right?
Thank you again
#A.M.
On 09/28/2016 05:10 PM, Josh Mahonin wrote:
Hi Antonio,
You're correct, the phoenix-spark output uses the Phoenix Hadoop
OutputFormat under the hood, which effectively does a parallel, batch JDBC
upsert. It should scale depending on the number of Spark executors,
RDD/DataFrame parallelism, and number of HBase RegionServers, though
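As a concrete sketch of that write path (the target table and zkUrl are placeholders, and this assumes the phoenix-spark artifact is on the classpath):

```scala
import org.apache.phoenix.spark._

// Each RDD partition opens its own batched JDBC upsert via the Phoenix
// OutputFormat, so write parallelism follows the RDD's partition count.
sc.parallelize(Seq((1L, "foo"), (2L, "bar")))
  .saveToPhoenix(
    "OUTPUT_TABLE",        // hypothetical target table
    Seq("ID", "COL1"),     // column names, in tuple order
    zkUrl = Some("zk-host:2181")
  )
```

Repartitioning the RDD before the save is the usual knob for raising or throttling write parallelism.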
Hi Ravi,
It looks like those log file entries you posted are from a mapreduce task.
Could you post the output of the command that you're using to start the
actual job (i.e. the console output of "hadoop jar ...")?
- Gabriel
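For reference, the invocation Gabriel is asking about typically looks something like the following; the jar name, table, input path, and quorum are placeholders:

```shell
# Phoenix CSV bulk load: a MapReduce job writes HFiles, then hands them to
# HBase's bulk-load machinery, which moves them into the table's regions.
hadoop jar phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table MY_TABLE \
  --input /hdfs/path/to/data.csv \
  --zookeeper zk-host:2181
```

If the job exits before that final bulk-load step completes, the HFiles get written but never moved into HBase, which would match the symptom in the quoted message.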
On Wed, Sep 28, 2016 at 1:49 PM, Ravi Kumar Bommada
Hi All,
I'm trying to load data via Phoenix MapReduce, referring to the screen below:
HFiles are getting created (each HFile is about 300 MB in size, and there are
176 of them), but after that the files are not moving to HBase, i.e. when I'm
querying HBase I'm not able to see the data. According to the logs
Ravi Kumar Bommada would like to recall the message, "Loading via MapReduce,
Not Moving HFiles to HBase".
Sorry Sasi, missed your last mails.
It seems that you have one region in the table, or the query touches only one
region, because of the monotonically increasing key ['MK00100','YOU',4].
The varying performance is because you may have filters which are aggressive
and skip lots of rows in between (*0* (7965
Hi All,
I'm running Phoenix MapReduce with 20 input files, each of size 100 MB. All
map and reduce tasks completed in 45 minutes, except the last reduce task,
which is taking 3 hours to complete.
Please suggest how I can optimize this.
Regards,
Ravi Kumar B
Mob: +91