Does Spark SQL support subqueries?

2015-07-13 Thread Louis Hust
Hi all, I am using Spark 1.4 and find that some SQL is not supported, especially subqueries: subqueries in select items, in the WHERE clause, and in predicate conditions. So I want to know whether Spark supports subqueries or whether I am using Spark SQL the wrong way. If subqueries are not supported, is there a plan
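
For context, a minimal sketch of the subquery shapes in question, written against hypothetical customer/orders temp tables (the names are placeholders, not from the original post). As far as I recall, Spark 1.4 rejects both forms at analysis time, and support for uncorrelated IN and scalar subqueries only arrived around Spark 2.0:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("subquery-demo").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    // Assume hypothetical customer/orders DataFrames have been registered, e.g.
    //   customerDF.registerTempTable("customer"); ordersDF.registerTempTable("orders")

    // Subquery in an IN predicate (WHERE clause) -- fails to analyze on 1.4:
    sqlContext.sql(
      """SELECT o_orderkey FROM orders
        |WHERE o_custkey IN (SELECT c_custkey FROM customer WHERE c_nationkey = 1)""".stripMargin)

    // Scalar subquery in the select list -- also unsupported on 1.4:
    sqlContext.sql(
      "SELECT o_orderkey, (SELECT MAX(c_acctbal) FROM customer) AS max_bal FROM orders")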

Is Spark suitable for real-time queries?

2015-07-22 Thread Louis Hust
Hi all, I am using the Spark jar in standalone mode to fetch data from different MySQL instances and perform some actions, but I found the elapsed time is at the second level. So I want to know whether a Spark job is suitable for real-time queries that need microsecond latency?

Re: Is Spark suitable for real-time queries?

2015-07-22 Thread Louis Hust
second level. Robin > On 22 Jul 2015, at 11:14, Louis Hust wrote: > Hi all, I am using the Spark jar in standalone mode to fetch data from different MySQL instances and perform some actions, but I found the time is at the second level.

Re: Is Spark suitable for real-time queries?

2015-07-22 Thread Louis Hust
sc.textFile("LICENSE").filter(_ contains "Spark").count takes less than a second the first time I run it and is instantaneous on every subsequent run. What code are you running? > On 22 Jul 2015, at 12:34, Louis Hust wrote: > I

Spark is much slower than direct MySQL access

2015-07-26 Thread Louis Hust
Hi all, I am using a Spark DataFrame to fetch a small table from MySQL, and I found it costs much more than accessing MySQL directly using JDBC. The time cost for Spark is about 2033ms, while direct access takes about 16ms. Code can be found at: https://github.com/louishust/sparkDemo/blob/master/src/main/java/Di
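
The linked demo itself is not reproduced here; the sketch below only illustrates the shape of such a comparison (connection URL, credentials and table name are placeholders, not the poster's code) and why the DataFrame path is expected to be slower on a tiny table:

    import java.sql.DriverManager
    import java.util.Properties
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Crude wall-clock timer, for illustration only.
    def time[T](label: String)(body: => T): T = {
      val start = System.nanoTime()
      val result = body
      println(s"$label: ${(System.nanoTime() - start) / 1e6} ms")
      result
    }

    // Direct JDBC: one connection, one statement, no job scheduling involved.
    time("direct JDBC") {
      val conn = DriverManager.getConnection("jdbc:mysql://db-host:3306/test", "app", "secret")
      try {
        val rs = conn.createStatement().executeQuery("SELECT COUNT(*) FROM small_table")
        rs.next(); rs.getLong(1)
      } finally conn.close()
    }

    // Spark DataFrame over the same table: the query is identical, but the job
    // also pays for planning, task scheduling and launch, which dominates when
    // the table is tiny.
    val sqlContext = new SQLContext(
      new SparkContext(new SparkConf().setAppName("jdbc-compare").setMaster("local[*]")))
    val props = new Properties()
    props.setProperty("user", "app")
    props.setProperty("password", "secret")
    time("Spark DataFrame") {
      sqlContext.read.jdbc("jdbc:mysql://db-host:3306/test", "small_table", props).count()
    }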

Re: Spark is much slower than direct MySQL access

2015-07-26 Thread Louis Hust
it's possible because the overhead of Spark dominates for small queries. Best Regards, Shixiong Zhu > 2015-07-26 15:56 GMT+08:00 Jerrick Hoang: > How big is the dataset? How complicated is the query? > On Sun, Jul 26, 2015 at 12:47 AM Louis Hus

Re: Spark is much slower than direct MySQL access

2015-07-26 Thread Louis Hust
Shixiong Zhu > 2015-07-26 16:16 GMT+08:00 Louis Hust: > Look at the given url. Code can be found at: https://github.com/louishust/sparkDemo/blob/master/src/main/java/DirectQueryTest.java > 2015-07-26 16:14 GMT+08:00 Sh

Re: Spark is much slower than direct MySQL access

2015-07-26 Thread Louis Hust
leverage the distributed in-memory engine of Spark. Paolo (Sent from my Windows Phone) > From: Louis Hust > Sent: 26/07/2015 10:28 > To: Shixiong Zhu > Cc: Jerrick Hoang; user@spark.apache.org > Subject: Re: Spark is m

OOM when extracting big data from MySQL using JDBC

2018-04-01 Thread Louis Hust
Hi all, we deploy Spark SQL in standalone mode without HDFS on one machine with 256G RAM and 64 cores. The Spark session props are like below: SparkSession.builder().appName("MYAPP").config("spark.sql.crossJoin.enabled", "true").config("spark.executor.memory", th
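
One common way to keep a large extract from overwhelming a single executor is to partition the JDBC read instead of pulling the whole table through one connection. A minimal sketch under assumed settings (URL, credentials, table and column names, bounds, and the memory value are placeholders, not the poster's configuration):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("MYAPP")
      .config("spark.sql.crossJoin.enabled", "true")
      .config("spark.executor.memory", "64g") // placeholder value
      .getOrCreate()

    // Without partitioning options the JDBC source reads the entire table
    // through a single connection into a single partition, which is where
    // large extracts usually run out of memory. Splitting on a numeric
    // column spreads the rows across many smaller tasks.
    val orders = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/tpch") // placeholder URL
      .option("dbtable", "orders")
      .option("user", "app")
      .option("password", "secret")
      .option("partitionColumn", "o_orderkey")
      .option("lowerBound", "1")
      .option("upperBound", "6000000")
      .option("numPartitions", "64")
      .option("fetchsize", "10000")
      .load()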

How to use disk instead of just InMemoryRelation when using the JDBC data source in Spark SQL?

2018-04-10 Thread Louis Hust
We want to extract data from MySQL and do the calculation in Spark SQL. The SQL explain looks like below:
== Parsed Logical Plan ==
'Sort ['revenue DESC NULLS LAST], true
+- 'Aggregate ['n_name], ['n_name, 'SUM(('l_extendedprice * (1 - 'l_discount))) AS revenue#329]
   +- 'Filter ('c_custkey = 'o_cu
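
If the goal is to let cached JDBC data overflow to local disk rather than sit only in memory, one option is to persist the DataFrame with an explicit storage level instead of relying on the default cache. A minimal sketch, again with placeholder connection details and a hypothetical lineitem table (not the poster's code):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    val spark = SparkSession.builder().appName("jdbc-disk-demo").getOrCreate()

    val lineitem = spark.read.format("jdbc")
      .option("url", "jdbc:mysql://db-host:3306/tpch") // placeholder URL
      .option("dbtable", "lineitem")
      .option("user", "app")
      .option("password", "secret")
      .load()

    // MEMORY_AND_DISK lets cached partitions spill to local disk instead of
    // being dropped and re-read from MySQL when memory runs short;
    // DISK_ONLY keeps the cached batches on disk entirely.
    lineitem.persist(StorageLevel.MEMORY_AND_DISK)
    lineitem.createOrReplaceTempView("lineitem")

    spark.sql("SELECT SUM(l_extendedprice * (1 - l_discount)) AS revenue FROM lineitem").show()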

How to use disk instead of just InMemoryRelation when using the JDBC data source in Spark SQL?

2018-04-11 Thread Louis Hust
We want to extract data from MySQL and do the calculation in Spark SQL. The SQL explain looks like below:
REGIONKEY#177, N_COMMENT#178] PushedFilters: [], ReadSchema: struct
+- *(20) Sort [r_regionkey#203 ASC NULLS FIRST], false, 0
   +- Exchange(coordinator id: 266374831) ha