It depends on several things:

1) What is your data format? CSV (text) or ORC/Parquet?
2) Do you have a data warehouse to summarize/cluster your data?


If your data is plain text, or you are querying the raw data, it is going to be 
slow; Spark cannot do much to optimize that kind of job.
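
For example, a minimal sketch of converting a CSV extract to Parquet once and 
caching it, so that repeated aggregations do not re-parse text (the paths and 
the SparkSession variable named spark below are hypothetical):

// Minimal sketch, assuming a running SparkSession named spark.
// All paths are hypothetical examples.
val csvDf = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/products.csv")

// Write the data once as Parquet: a compressed, columnar format that Spark
// can scan and filter much faster than raw text.
csvDf.write.mode("overwrite").parquet("/data/products_parquet")

// Re-read the Parquet copy and cache it if the same table is queried repeatedly.
val products = spark.read.parquet("/data/products_parquet").cache()
products.createOrReplaceTempView("products")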




> On Dec 2, 2015, at 9:21 AM, Andrés Ivaldi <[email protected]> wrote:
> 
> Mark, we have an application that uses data from different kinds of sources, 
> and we built an engine able to handle that, but it can't scale to big data 
> (we could make it scale, but that would be too time-expensive), it has no 
> machine learning module, etc. We came across Spark, and it looks like it has 
> everything we need; actually, it does, but we need very low latency right 
> now, and when we did some testing it took too long to get the same kind of 
> results, always compared against the RDBMS that is our primary source. 
> 
> So, we want to expand our sources to CSV, web services, big data, etc. We can 
> either extend our engine or use something like Spark, which gives us the 
> power of clustering, access to different kinds of sources, streaming, machine 
> learning, easy extensibility, and so on. 
> 
> On Tue, Dec 1, 2015 at 9:36 PM, Mark Hamstra <[email protected]> wrote:
> I'd ask another question first: If your SQL query can be executed in a 
> performant fashion against a conventional (RDBMS?) database, why are you 
> trying to use Spark?  How you answer that question will be the key to 
> deciding among the engineering design tradeoffs to effectively use Spark or 
> some other solution.
> 
> On Tue, Dec 1, 2015 at 4:23 PM, Andrés Ivaldi <[email protected]> wrote:
> OK, so the latency problem arises because I'm using SQL as the source? 
> How about CSV, Hive, or another source?
> 
> On Tue, Dec 1, 2015 at 9:18 PM, Mark Hamstra <[email protected]> wrote:
> It is not designed for interactive queries.
> 
> You might want to ask the designers of Spark, Spark SQL, and particularly 
> some things built on top of Spark (such as BlinkDB) about their intent with 
> regard to interactive queries.  Interactive queries are not the only designed 
> use of Spark, but it is going too far to claim that Spark is not designed at 
> all to handle interactive queries.
> 
> That being said, I think that you are correct to question the wisdom of 
> expecting lowest-latency query response from Spark using SQL (sic, presumably 
> an RDBMS is intended) as the datastore.
> 
> On Tue, Dec 1, 2015 at 4:05 PM, Jörn Franke <[email protected]> wrote:
> Hmm, it will never be faster than the database if you use that SQL database 
> as the underlying storage. Spark is (currently) an in-memory batch engine for 
> iterative machine learning workloads. It is not designed for interactive 
> queries. 
> Currently Hive is moving in the direction of interactive queries. 
> Alternatives are Phoenix on HBase, or Impala.
> 
> On 01 Dec 2015, at 21:58, Andrés Ivaldi <[email protected]> wrote:
> 
>> Yes, 
>> the use case would be:
>> run Spark in a service (I didn't investigate this yet); through API calls to 
>> this service we perform some aggregations over data in SQL. We are already 
>> doing this with an in-house development.
>> 
>> Nothing complicated; for instance, a table with Product, Product Family, 
>> cost, price, etc., with columns acting as dimensions and measures.
>> 
>> I want Spark to query that table and perform a kind of rollup, with cost 
>> as the measure and Product, Product Family as the dimensions.
>> 
>> Only 3 columns; it takes about 20s to perform that query and the 
>> aggregation, while the same query directly against the database, grouping on 
>> those columns, takes about 1s. 
>> 
>> regards
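
A minimal sketch of a rollup like the one described above, using Spark's 
DataFrame API (the DataFrame name products and the columns product, 
product_family, and cost are hypothetical):

// Minimal sketch, assuming a DataFrame named products with hypothetical
// columns product, product_family, and cost.
import org.apache.spark.sql.functions.sum

val rolled = products
  .rollup("product_family", "product")   // subtotals per family plus a grand total
  .agg(sum("cost").as("total_cost"))

rolled.show()

// Roughly the same thing in Spark SQL, assuming a temp view named products:
//   SELECT product_family, product, SUM(cost) AS total_cost
//   FROM products
//   GROUP BY ROLLUP (product_family, product)
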
>> 
>> 
>> 
>> On Tue, Dec 1, 2015 at 5:38 PM, Jörn Franke <[email protected]> wrote:
>> Can you elaborate more on the use case?
>> 
>> > On 01 Dec 2015, at 20:51, Andrés Ivaldi <[email protected]> wrote:
>> >
>> > Hi,
>> >
>> > I'd like to use Spark to perform some transformations over data stored 
>> > in SQL, but I need low latency. I'm doing some tests, and both Spark 
>> > context creation and querying the data over SQL take too long.
>> >
>> > Any ideas for speeding up the process?
>> >
>> > regards.
>> >
>> > --
>> > Ing. Ivaldi Andres
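
One common way to attack both costs is to keep a single, long-lived 
SparkSession in the service and reuse it for every request, reading the table 
through the JDBC data source. A minimal sketch (the connection URL, table 
name, and credentials are hypothetical placeholders):

// Minimal sketch: create the session once when the service starts and reuse
// it for every API call, instead of paying the context-creation cost per query.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("aggregation-service")
  .getOrCreate()

// Read the table via the JDBC data source; the URL, table, and credentials
// below are hypothetical placeholders.
val products = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/sales")
  .option("dbtable", "products")
  .option("user", "spark_user")
  .option("password", "secret")
  .load()
  .cache()   // keep the table in memory across repeated aggregations
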
>> 
>> 
>> 
>> -- 
>> Ing. Ivaldi Andres
> 
> 
> 
> 
> -- 
> Ing. Ivaldi Andres
> 
> 
> 
> 
> -- 
> Ing. Ivaldi Andres
