You need the table in an efficient columnar format, such as ORC or Parquet. Have the
table sorted appropriately (hint: sort on the most discriminating column in the WHERE
clause). Do not use SAN storage or virtualization for the slave nodes.
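As a sketch of that advice, assuming a hypothetical source table `events_raw` and a frequently filtered column `event_date` (placeholder names, not from this thread), the Hive/Spark SQL could look like:

```sql
-- Placeholder names: events_raw, events_parquet, event_date.
-- Storing the data as Parquet, sorted on the most selective filter
-- column, lets readers use column statistics to skip irrelevant
-- row groups instead of scanning the whole table.
CREATE TABLE events_parquet STORED AS PARQUET AS
SELECT *
FROM events_raw
SORT BY event_date;
```

The same idea applies to ORC (`STORED AS ORC`); the point is an efficient columnar layout plus a sort order that matches the common predicates.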
Can you please post your query?
I always recommend avoiding single updates where p
It depends on how you fetch the single row. Is your query complex?
On Thu, Jan 7, 2016 at 12:47 PM, Balaraju.Kagidala Kagidala <
balaraju.kagid...@gmail.com> wrote:
Hi,
I am a new user of Spark. I am trying to use Spark to process huge Hive
data using Spark DataFrames.
I have a 5-node Spark cluster, each node with 30 GB of memory. I want to
process a Hive table with 450 GB of data using DataFrames. Fetching a
single row from the Hive table takes 36 minutes. Please suggest me w
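For context, a point lookup of the kind described here (using hypothetical names `events_parquet` and `event_date`, not taken from the thread) might be:

```sql
-- Placeholder names. If the table is stored as Parquet/ORC and sorted
-- on event_date, this predicate can be pushed down to the file readers,
-- so only a handful of row groups are scanned rather than all 450 GB.
SELECT *
FROM events_parquet
WHERE event_date = '2016-01-07'
LIMIT 1;
```

If the table is instead a plain text/sequence-file Hive table, every executor must scan its full share of the 450 GB even for a single-row result, which is consistent with a 36-minute lookup.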