Re: Getting too many open files during table scan

2017-06-23 Thread Michael Young
Sergey, thanks for this tip! Since our client data volume varies a lot from site to site, would splitting only on the first letters of client_id lead to some regions being much larger than others? Or does Phoenix distribute fairly across the different region servers? Would this continue to

Re: Getting too many open files during table scan

2017-06-23 Thread Sergey Soldatov
You may check the "Are there any tips for optimizing Phoenix?" section of the Apache Phoenix FAQ at https://phoenix.apache.org/faq.html. It explains how to pre-split a table. In your case you may split on the first letters of client_id. When we are talking about monotonous data, we usually mean the primary key
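
For illustration, a pre-split along the lines the FAQ describes could look like the sketch below. The table, columns, and split points here are hypothetical; real split points should come from the observed distribution of client_id values so regions end up roughly even in size.

    CREATE TABLE client_events (
        client_id   VARCHAR NOT NULL,
        event_date  DATE NOT NULL,
        payload     VARCHAR
        CONSTRAINT pk PRIMARY KEY (client_id, event_date))
        -- pre-split on leading letters of client_id so writes and scans are
        -- spread across several regions from the start, without salting
        SPLIT ON ('d', 'h', 'l', 'p', 't');

Each split point becomes a region boundary, so client_ids starting with 'a'..'c' land in the first region, 'd'..'g' in the next, and so on.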

Re: Getting too many open files during table scan

2017-06-23 Thread Michael Young
>>Don't you have any other column which is obligatory in queries during reading but not monotonous with ingestion? We have several columns used in typical query WHERE clauses (like userID='abc', or specific user attributes and data types). However, there are a number of columns which are monotonous
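
As a sketch of the row-key design being discussed here (table and column names are illustrative only), leading the primary key with an obligatory, non-monotonous column such as the user id keeps ingestion spread across regions, while date-range queries for a single user remain small range scans:

    CREATE TABLE user_metrics (
        user_id     VARCHAR NOT NULL,  -- obligatory in queries, not monotonous
        event_date  DATE NOT NULL,     -- monotonous, but no longer the leading key column
        txn_count   BIGINT
        CONSTRAINT pk PRIMARY KEY (user_id, event_date));

    -- a typical query touches only one user's contiguous slice of the key space
    SELECT SUM(txn_count)
    FROM user_metrics
    WHERE user_id = 'abc'
      AND event_date BETWEEN TO_DATE('2017-06-01', 'yyyy-MM-dd')
                         AND TO_DATE('2017-06-22', 'yyyy-MM-dd');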

Re: Getting too many open files during table scan

2017-06-23 Thread Ankit Singhal
bq. A leading date column is in our schema model. Don't you have any other column which is obligatory in queries during reading but not monotonous with ingestion? A pre-split can help you avoid hot-spotting. For parallelism/performance comparison, have you tried running a query on a

Re: Getting too many open files during table scan

2017-06-22 Thread Michael Young
We started with no salt buckets, but the performance was terrible in our testing. A leading date column is in our schema model. We don't seem to be getting hotspotting after salting. Date range scans are very common, as are slice-and-dice queries on many dimension columns. We have tested with a range

Re: Getting too many open files during table scan

2017-06-22 Thread James Taylor
My recommendation: don't use salt buckets unless you have a monotonically increasing row key, for example one that leads with the current date/time. Otherwise you'll be putting more load (# of salt buckets times more load in the worst case) on bread-and-butter small-range-scan Phoenix queries. Thanks, James
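
For comparison, salting is a table option set at creation time. A minimal sketch with hypothetical names follows; the bucket count of 20 is only an example, and per the advice above it is worth reaching for only when the row key really is monotonically increasing:

    -- the row key leads with a timestamp, so without salting all new writes
    -- would hit the last region; SALT_BUCKETS spreads writes across buckets,
    -- at the cost of fanning every query out to all of them
    CREATE TABLE raw_events (
        event_time  TIMESTAMP NOT NULL,
        event_id    VARCHAR NOT NULL,
        payload     VARCHAR
        CONSTRAINT pk PRIMARY KEY (event_time, event_id))
        SALT_BUCKETS = 20;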

Re: Getting too many open files during table scan

2017-06-22 Thread Michael Young
The ulimit open files was only 1024 for the user executing the query. After increasing it, the queries behave better. How can we tell if we need to reduce/increase the number of salt buckets? Our team set this based on read/write performance using data volume and expected queries to be run by

Re: Getting too many open files during table scan

2017-06-20 Thread Josh Elser
I think this is more of an issue of your 78 salt buckets than the width of your table. Each chunk, running in parallel, is spilling incremental counts to disk. I'd check your ulimit settings on the node which you run this query from and try to increase the number of open files allowed before