Re: General Question (Spark Hive integration)

2016-01-21 Thread Bala
Thanks for the response, Silvio. My table is not partitioned because my filter 
column is the primary key, and I guess we can't partition on a primary-key 
column. The table has 600 million rows, and when I query a single record it 
seems to load the whole table by default, taking a long time just to return 
one record. Please suggest anything I can tune here; I have a 5-node Spark 
cluster.

Bala
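
A hedged suggestion for the point-lookup case above: since partitioning on a unique key isn't practical, one option is to keep the table sorted by the lookup key in a columnar format such as Parquet, so that row-group min/max statistics let the filter skip most of the data. This is only a sketch — the table and column names (`employee`, `empid`) are illustrative, not from the thread, and it assumes the data can be rewritten:

```scala
// Sketch: rewrite the table sorted by the lookup key so that Parquet
// row-group min/max statistics can prune most of the data on read.
// "employee" and "empid" are hypothetical names.
// On Spark 1.x use a HiveContext as the sqlContext.
val df = sqlContext.table("employee")

df.sort("empid")
  .write
  .format("parquet")
  .saveAsTable("employee_by_empid")

// A point lookup can now skip row groups whose [min, max] key range
// doesn't contain the value, instead of scanning all 600M rows.
// (Older versions may need spark.sql.parquet.filterPushdown=true.)
sqlContext.sql("SELECT * FROM employee_by_empid WHERE empid = 12345").show()
```
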

> On Jan 21, 2016, at 7:07 PM, Silvio Fiorito wrote:
> 
> Also, just to clarify it doesn’t read the whole table into memory unless you 
> specifically cache it.


Re: General Question (Spark Hive integration)

2016-01-21 Thread Silvio Fiorito
Also, just to clarify: it doesn’t read the whole table into memory unless you 
specifically cache it.
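
For completeness, a minimal sketch of the explicit caching being referred to (the table name is illustrative; on Spark 1.x the `sqlContext` would be a HiveContext):

```scala
// Nothing is pinned in memory until you ask for it.
val employees = sqlContext.table("employee")

// Explicitly cache the table; subsequent queries read from memory.
employees.cache()
employees.count()   // an action, which materializes the cache

// Without cache()/persist(), each query re-reads from storage
// and applies its filters as part of the scan.
```
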



Re: General Question (Spark Hive integration)

2016-01-21 Thread Silvio Fiorito
Hi Bala,

It depends on how your Hive table is configured. If you used partitioning and 
you are filtering on a partition column then it will only load the relevant 
partitions. If, however, you’re filtering on a non-partitioned column then it 
will have to read all the data and then filter as part of the Spark job.
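
The two cases above can be sketched as follows. This assumes a Hive table partitioned by `deptno` and a HiveContext (Spark 1.x) or Hive-enabled session; all names are hypothetical:

```scala
// Hive DDL via Spark SQL: a table partitioned by deptno.
sqlContext.sql("""
  CREATE TABLE employee (empid INT, name STRING)
  PARTITIONED BY (deptno INT)
  STORED AS PARQUET
""")

// Filter on the partition column: only the deptno=10 directory is read.
val pruned = sqlContext.sql("SELECT * FROM employee WHERE deptno = 10")

// Filter on a non-partition column: every partition must be scanned,
// and the filter runs inside the Spark job.
val scanned = sqlContext.sql("SELECT * FROM employee WHERE name = 'Bala'")
```
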

Thanks,
Silvio


General Question (Spark Hive integration)

2016-01-21 Thread Balaraju.Kagidala Kagidala
Hi,

I have a simple question regarding Spark's Hive integration with DataFrames.

When we query a table, does Spark load the whole table into memory and apply
the filter on top of it, or does it load only the data with the filter
applied?

For example, with the query 'select * from employee where deptno=10', does my
RDD load the whole employee table into memory and apply the filter, or will it
load only the department 10 data?
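
One way to see which behavior a query gets is to ask Spark for the physical plan — a sketch only, using the hypothetical table from the question:

```scala
// explain() prints the physical plan without running the query.
// A filter that is pushed into the table scan (newer versions label
// these "PushedFilters" / "PartitionFilters" on the scan operator)
// is applied at read time; a filter shown as a separate Filter step
// above the scan means the data is read first and filtered afterwards.
sqlContext.sql("SELECT * FROM employee WHERE deptno = 10").explain()
```
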


Thanks
Bala