Re: LIMIT issue of SparkSQL

2016-10-29 Thread Asher Krim
We have also found LIMIT to take an unacceptable amount of time when reading Parquet-formatted data from S3. LIMIT was not strictly needed for our use case, so we worked around it. -- Asher Krim Senior Software Engineer

Re: LIMIT issue of SparkSQL

2016-10-28 Thread Liz Bai
Sorry for the late reply. The size of the raw data is 20G and it is composed of two columns. We generated it by this. The test queries are very simple: 1) select ColA from Table limit 1; 2) select ColA from Table
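
[A minimal sketch of the test setup described above; the Parquet path and view name are placeholders, and the generator link from the original post was elided, so the data layout is assumed:]

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder().appName("limit-test").getOrCreate()

  // Register the ~20G two-column Parquet dataset as a temp view.
  spark.read.parquet("/data/limit-test").createOrReplaceTempView("test_table")

  // Query 1: in principle this can stop after producing a single row.
  spark.sql("select ColA from test_table limit 1").show()

  // Query 2: the full-scan baseline the thread compares against.
  spark.sql("select ColA from test_table").count()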

Re: LIMIT issue of SparkSQL

2016-10-24 Thread Michael Armbrust
It is not about limits on specific tables; we do support that. The case I'm describing involves pushing limits across system boundaries. It is certainly possible to do this, but the current datasource API does not provide this information (other than the implicit limit that is pushed down to the

Re: LIMIT issue of SparkSQL

2016-10-24 Thread Mich Talebzadeh
This is an interesting point. As far as I know, in practically all RDBMSs (Oracle, SAP, etc.), LIMIT affects the collection part of the result set. The query itself, which may involve multiple joins on multiple underlying tables, is executed in full to produce that result set. To limit the actual
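
[The same behavior can be seen in Spark's physical plan; a hedged sketch, assuming two registered views with made-up names:]

  // The join is planned in full; the limit appears as a separate
  // CollectLimit/GlobalLimit operator sitting on top of it.
  val joined = spark.sql(
    "select a.id, b.val from left_t a join right_t b on a.id = b.id limit 10")
  joined.explain()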

Re: LIMIT issue of SparkSQL

2016-10-23 Thread Michael Armbrust
- dev + user Can you give more info about the query? Maybe a full explain()? Are you using a datasource like JDBC? The API does not currently push down limits, but the documentation talks about how you can use a query instead of a table if that is what you are looking to do.
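
[For JDBC sources, the "query instead of a table" approach mentioned above looks roughly like the following sketch; the URL, credentials, and table name are placeholders:]

  import java.util.Properties

  val props = new Properties()
  props.setProperty("user", "spark")
  props.setProperty("password", "***")

  // The aliased subquery executes inside the database itself, so only
  // 1000 rows ever cross the wire instead of the whole table.
  val df = spark.read.jdbc(
    "jdbc:postgresql://dbhost:5432/mydb",
    "(select ColA from big_table limit 1000) as t",
    props)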