Re: Direct HBase vs. Phoenix query performance

2018-03-21 Thread Marcell Ortutay
Thanks James! I've made a JIRA ticket here: https://issues.apache.org/jira/projects/PHOENIX/issues/PHOENIX-4666 This is a priority for us at 23andMe as it substantially affects some of our queries, so we'd be happy to provide a patch if Phoenix maintainers are able to provide some guidance on the

Re: Direct HBase vs. Phoenix query performance

2018-03-15 Thread James Taylor
Hi Marcell, Yes, that's correct - the cache we build for the RHS is only kept around while the join query is being executed. It'd be interesting to explore keeping the cache around longer for cases like yours (and probably not too difficult). We'd need to keep a map that maps the RHS query to its

Re: Direct HBase vs. Phoenix query performance

2018-03-14 Thread Marcell Ortutay
A quick update--I did some inspection of the Phoenix codebase, and it looks like my understanding of the coprocessor cache was incorrect. I thought it was meant to be used across queries, e.g. that the RHS of the join would be saved for subsequent queries. In fact this is not the case, the

Re: Direct HBase vs. Phoenix query performance

2018-03-13 Thread Marcell Ortutay
Hi James, Thanks for the tips. Our row keys are (I think) reasonably optimized. I've made a gist which is an anonymized version of the query, and it indicates which conditions are / are not part of the PK. It is here: https://gist.github.com/ortutay23andme/12f03767db13343ee797c328a4d78c9c I

Re: Direct HBase vs. Phoenix query performance

2018-03-08 Thread James Taylor
Hi Marcell, It'd be helpful to see the table DDL and the query too along with an idea of how many regions might be involved in the query. If a query is a commonly run query, usually you'll design the row key around optimizing it. If you have other, simpler queries that have determined your row

Direct HBase vs. Phoenix query performance

2018-03-08 Thread Marcell Ortutay
Hi, I am using Phoenix at my company for a large query that is meant to be run in real time as part of our application. The query involves several aggregations, anti-joins, and an inner query. Here is the (anonymized) query plan:

Re: Phoenix query performance

2017-02-23 Thread Pradheep Shanmugam
Why can't you reduce your query to select msbo1.PARENTID from msbo_phoenix_comp_rowkey where msbo1.PARENTTYPE = 'SHIPMENT' and msbo1

Re: Phoenix query performance

2017-02-22 Thread Arvind S
Why can't you reduce your query to select msbo1.PARENTID from msbo_phoenix_comp_rowkey where msbo1.PARENTTYPE = 'SHIPMENT' and msbo1.OWNERORGID = 100 and msbo1.MILESTONETYPEID != 19661 and msbo1.PARENTREFERENCETIME between 1479964000 and 1480464000 group by msbo1.PARENTID order

Re: Phoenix query performance

2017-02-22 Thread Maryann Xue
> Hi Pradhe

Re: Phoenix query performance

2017-02-22 Thread Pradheep Shanmugam
Hi Pradheep, Thank you for posting the query and the log file

Re: Phoenix query performance

2017-02-22 Thread Maryann Xue
Hi Pradheep, Thank you for posting the query and the log file! There are two things going on on the server side at the same time here. I think it'd be a good idea to isolate the problem first. So a few questions: 1. When you say data size went from "< 1M" to 30M, did the data from both LHS and

Phoenix query performance

2016-04-03 Thread Sumit Nigam
Hi, I was benchmarking some Phoenix queries with different compaction level tuning. A strange thing is observed when there is a huge number of HFiles on disk. The queries not returning any data (resultset size 0) execute very quickly (5-10 ms or so) but just doing a rs.next() on result