You need the table in an efficient format, such as ORC or Parquet. Have the 
table sorted appropriately (hint: by the most discriminating column in the 
WHERE clause). Do not use SAN storage or virtualization for the slave nodes.
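
For illustration, here is a minimal sketch of such a conversion with Spark 1.x 
DataFrames (the table name, sort column, and target table name are all 
hypothetical placeholders):

    import org.apache.spark.sql.hive.HiveContext

    // Assumes a Spark build with Hive support; all names are placeholders.
    val hiveContext = new HiveContext(sc)

    // Sort by the column most often used in WHERE clauses so the min/max
    // statistics of each Parquet row group can be used to skip data,
    // then persist the table in a columnar format.
    hiveContext.table("my_table")
      .sort("customer_id")
      .write
      .format("parquet")
      .saveAsTable("my_table_parquet")

A point lookup such as WHERE customer_id = ... can then read only the row 
groups whose statistics match the predicate, instead of scanning the full 
450 GB.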

Can you please post your query?

I always recommend avoiding single-row updates where possible. They are very 
inefficient in analytics scenarios. To some extent this also holds in the 
traditional database world (depending on the use case, of course).
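
If the updates arrive in batches, one alternative is to apply the whole change 
set in a single set-based pass and rewrite the output. A minimal sketch, 
assuming the changed rows are staged in their own table (all table and column 
names are hypothetical):

    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._

    val base    = hiveContext.table("my_table_parquet") // current data
    val changes = hiveContext.table("staged_changes")   // replacement rows, same schema

    // Drop base rows that have a replacement, then append the changes:
    // one set-based pass instead of many single-row round trips.
    val updated = base
      .join(changes.select($"customer_id".as("chg_id")),
            $"customer_id" === $"chg_id", "left_outer")
      .filter($"chg_id".isNull)
      .drop("chg_id")
      .unionAll(changes)

    updated.write
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("my_table_parquet_v2")

Writing to a new table and swapping it in avoids update-in-place entirely, 
which fits the append-only storage formats Spark works best with.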

> On 07 Jan 2016, at 05:47, Balaraju.Kagidala Kagidala 
> <balaraju.kagid...@gmail.com> wrote:
> 
> Hi,
> 
> I am a new user to Spark. I am trying to use Spark to process a large amount 
> of Hive data using Spark DataFrames.
> 
> 
> I have a 5-node Spark cluster, each node with 30 GB of memory. I want to 
> process a Hive table with 450 GB of data using DataFrames. Fetching a single 
> row from the Hive table takes 36 minutes. Please suggest what is wrong here; 
> any help is appreciated.
> 
> 
> Thanks
> Bala
> 
> 
