https://github.com/databricks/spark-avro
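For example, loading Avro data with spark-avro and querying it could look roughly like this (a sketch only, not from the thread: the path, table, and column names are made up, and the exact load call depends on your Spark / spark-avro versions):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext, e.g. from spark-shell

// Spark 1.3-style data source load; newer releases use
// sqlContext.read.format("com.databricks.spark.avro").load(...) instead.
val df = sqlContext.load("/path/to/events.avro", "com.databricks.spark.avro")

df.registerTempTable("tableX")
sqlContext.sql("SELECT * FROM tableX WHERE attribute1 < 5").show()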
On Tue, Apr 21, 2015 at 3:09 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:
Thanks Michael!
I have tried applying my schema programmatically but I didn't get any
improvement in performance :(
Could you point me to some code examples using Avro please?
Many thanks again!
Renato M.
2015-04-21 20:45 GMT+02:00 Michael Armbrust :
Here is an example using rows directly:
https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#programmatically-specifying-the-schema
Avro or Parquet input would likely give you the best performance.
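Roughly what that section of the guide shows, adapted to this thread (a sketch; the input file, table, and column names are invented):

import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext

// Declare the schema explicitly instead of inferring it from a JavaBean.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("attribute1", IntegerType, nullable = true)))

// Build Rows straight from the raw input; no bean conversion is involved.
val rowRDD = sc.textFile("people.txt")
  .map(_.split(","))
  .map(p => Row(p(0), p(1).trim.toInt))

val df = sqlContext.createDataFrame(rowRDD, schema)
df.registerTempTable("tableX")
sqlContext.sql("SELECT name FROM tableX WHERE attribute1 < 5").show()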
On Tue, Apr 21, 2015 at 4:28 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com
Thanks for the hints guys! Much appreciated!
Even if I just do something like:
"Select * from tableX where attribute1 < 5"
I see similar behaviour.
@Michael
Could you point me to any sample code that uses Spark's Rows? We are at a
phase where we can actually change our JavaBeans for something
There is a cost to converting from JavaBeans to Rows and this code path has
not been optimized. That is likely what you are seeing.
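For context, the bean-based path being referred to looks roughly like this (a sketch with a hypothetical Person bean, for illustration only):

import scala.beans.BeanProperty
import org.apache.spark.sql.SQLContext

// Hypothetical JavaBean-style class.
class Person extends Serializable {
  @BeanProperty var name: String = _
  @BeanProperty var attribute1: Int = _
}

val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext

val people = sc.textFile("people.txt").map { line =>
  val parts = line.split(",")
  val p = new Person
  p.setName(parts(0))
  p.setAttribute1(parts(1).trim.toInt)
  p
}

// Each Person is converted to a Row via reflection here; that is the
// unoptimized conversion cost mentioned above. Building Rows directly,
// as in the schema example earlier in the thread, skips that step.
val df = sqlContext.createDataFrame(people, classOf[Person])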
On Mon, Apr 20, 2015 at 3:55 PM, ayan guha wrote:
SparkSQL optimizes better by column pruning and predicate pushdown,
primarily. Here you are not taking advantage of either.
I am curious to know what goes into your filter function, as you are not
using a filter on the SQL side.
Best
Ayan
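To illustrate Ayan's point (a sketch; table and column names are invented, and it assumes tableX was registered from a columnar source such as Parquet):

// sqlContext as created in the earlier examples.
// Filter expressed on the SQL side: Catalyst sees both the projection and the
// predicate, so it can prune columns and push the predicate down to the source.
val viaSql = sqlContext.sql(
  "SELECT name FROM tableX WHERE attribute1 BETWEEN 0 AND 5")

// Opaque filter function on the RDD: every column of every row is materialized
// as objects first, and the lambda cannot be pushed down or used for pruning.
val viaRddFilter = sqlContext.table("tableX").rdd
  .filter { row =>
    val a = row.getInt(1)  // attribute1 assumed to be at ordinal 1
    a >= 0 && a <= 5
  }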
On 21 Apr 2015 08:05, "Renato Marroquín Mogrovejo" <
renatoj.mar
Does anybody have an idea? A clue? A hint?
Thanks!
Renato M.
2015-04-20 9:31 GMT+02:00 Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com>:
> Hi all,
>
> I have a simple query "Select * from tableX where attribute1 between 0 and
> 5" that I run over a Kryo file with four partitions that e
On 1/27/15 11:38 AM, Manoj Samel wrote:
Spark 1.2, no Hive, prefer not to use HiveContext to avoid metastore_db.
The use case is a Spark YARN app that will start and serve as a query server
for multiple users, i.e. always up and running. At startup, there is an option
to cache data and also pre-compute some r
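A minimal sketch of that kind of startup with a plain SQLContext (so no metastore_db is created); the path, table name, and pre-computed aggregate are placeholders:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)  // sc: the long-running app's SparkContext

// At startup: register the data and pull it into Spark SQL's in-memory
// columnar cache so that user queries hit cached data.
val df = sqlContext.parquetFile("/data/tableX")
df.registerTempTable("tableX")
sqlContext.cacheTable("tableX")

// Optionally pre-compute a common aggregate and register it as another table.
sqlContext.sql("SELECT attribute1, COUNT(*) AS cnt FROM tableX GROUP BY attribute1")
  .registerTempTable("tableX_counts")

// Later, for each incoming user query against the same shared context:
val result = sqlContext.sql("SELECT * FROM tableX WHERE attribute1 < 5").collect()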
I did some simple experiments with Impala and Spark, and Impala came out ahead.
But it's also less flexible: it couldn't handle irregular schemas, didn't support
JSON, and so on.
On 01.11.2014, at 02:20, Soumya Simanta wrote:
I agree. My personal experience with Spark core is that it performs really
well once you tune it properly.
As far as I understand, Spark SQL under the hood performs many of these
optimizations (ordering of Spark operations) and uses a more efficient storage
format. Is this assumption correct?
Has anyone
We have seen all kinds of results published that often contradict each other.
My take is that the authors often know more tricks about how to tune their
own/familiar products than the others, so the product in focus is tuned for
ideal performance while the competitors are not. The authors are no