Hi Rajeshwar Gaini,

dbtable can be any valid SQL query; simply define it as a subquery, something like:

val query = "(SELECT country, count(*) FROM customer GROUP BY country) AS x"
val df1 = sqlContext.read
  .format("jdbc")
  .option("url", url)
  .option("user", username)
  .option("password", pwd)
  .option("driver", "driverClassNameHere")
  .option("dbtable", query)
  .load()

Not sure if that's what you're looking for or not. HTH.

-Todd

On Mon, Jan 11, 2016 at 3:47 AM, Gaini Rajeshwar <raja.rajeshwar2...@gmail.com> wrote:

> There is no problem with the SQL read. When I do the following it works
> fine.
>
> val dataframe1 = sqlContext.load("jdbc", Map("url" ->
>   "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
>   "dbtable" -> "customer"))
>
> dataframe1.filter("country = 'BA'").show()
>
> On Mon, Jan 11, 2016 at 1:41 PM, Xingchi Wang <regrec...@gmail.com> wrote:
>
>> The error happened at "Lost task 0.0 in stage 0.0". I think it is not a
>> "groupBy" problem; it is an issue with the SQL read of the "customer"
>> table. Please check the JDBC link and whether the data is loaded
>> successfully.
>>
>> Thanks,
>> Xingchi
>>
>> 2016-01-11 15:43 GMT+08:00 Gaini Rajeshwar <raja.rajeshwar2...@gmail.com>:
>>
>>> Hi All,
>>>
>>> I have a table named customer (customer_id, event, country, ...) in a
>>> PostgreSQL database. The table has more than 100 million rows.
>>>
>>> I want to know the number of events from each country. To achieve that
>>> I am doing a groupBy using Spark, as follows:
>>>
>>> val dataframe1 = sqlContext.load("jdbc", Map("url" ->
>>>   "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres",
>>>   "dbtable" -> "customer"))
>>>
>>> dataframe1.groupBy("country").count().show()
>>>
>>> The above code seems to fetch the complete customer table before doing
>>> the groupBy.
>>> Because of that, it throws the following error:
>>>
>>> 16/01/11 12:49:04 WARN HeartbeatReceiver: Removing executor 0 with no
>>> recent heartbeats: 170758 ms exceeds timeout 120000 ms
>>> 16/01/11 12:49:04 ERROR TaskSchedulerImpl: Lost executor 0 on
>>> 10.2.12.59: Executor heartbeat timed out after 170758 ms
>>> 16/01/11 12:49:04 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID
>>> 0, 10.2.12.59): ExecutorLostFailure (executor 0 exited caused by one of
>>> the running tasks) Reason: Executor heartbeat timed out after 170758 ms
>>>
>>> I am using Spark 1.6.0.
>>>
>>> Is there any way I can solve this?
>>>
>>> Thanks,
>>> Rajeshwar Gaini.
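Putting the suggestions in this thread together: the heartbeat timeouts are consistent with Spark pulling the whole 100-million-row table through a single JDBC partition. A sketch of both remedies follows, assuming Spark 1.6's DataFrameReader JDBC options; the partitioning bounds and the use of customer_id as the partition column are illustrative assumptions, not something confirmed in the thread.

```scala
val url = "jdbc:postgresql://localhost/customerlogs?user=postgres&password=postgres"

// Remedy 1 (Todd's suggestion): push the aggregation down to PostgreSQL by
// passing a subquery as dbtable, so only the per-country counts cross the wire.
val query = "(SELECT country, count(*) AS cnt FROM customer GROUP BY country) AS t"
val counts = sqlContext.read
  .format("jdbc")
  .option("url", url)
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", query)
  .load()
counts.show()

// Remedy 2: if the full table really is needed in Spark, split the scan across
// executors with a partitioned read. This needs a numeric column (customer_id
// is assumed here) and rough lower/upper bounds for it.
val full = sqlContext.read
  .format("jdbc")
  .option("url", url)
  .option("driver", "org.postgresql.Driver")
  .option("dbtable", "customer")
  .option("partitionColumn", "customer_id")  // assumed numeric key
  .option("lowerBound", "1")                 // illustrative bounds
  .option("upperBound", "100000000")
  .option("numPartitions", "16")
  .load()
full.groupBy("country").count().show()
```

With the partitioned read, Spark issues 16 range-bounded queries instead of one unbounded scan, so no single task holds the whole table and the executor is far less likely to miss heartbeats.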