Hi,

Yes, I tried fetching around 40 million rows. It took a while, but it
completed. I'll try the Avro approach.

How do I break the select into multiple parts? Can you briefly explain the
partition-based flow to start with?

Thanks,

Mohit

From: Shawn Weeks <[email protected]> 
Sent: 27 June 2018 18:51
To: [email protected]
Subject: Re: SelectHiveQl gets stuck when query table containing 12 Billion rows

It's probably not stuck doing nothing. Using a JDBC connection to fetch 12
billion rows is going to be painful no matter what you do. At those sizes
you're probably better off having Hive create a temporary table in Avro
format and then consuming the Avro files from HDFS into NiFi. The largest
number of rows I've pulled into NiFi via JDBC in a single query is around
10-20 million, and that took a long time. You can also try breaking the
select into multiple parts and running them simultaneously. I've done
something similar: I first ran a query to get all of the partitions and then
executed a select for each partition in parallel.
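
A minimal sketch of the partition-by-partition approach described above,
written in Python purely for illustration. It assumes the partition list
comes from Hive's SHOW PARTITIONS output, with lines like
dt=2018-06-01/region=us. It also assumes that run_query is a hypothetical
callable wrapping whatever Hive client you use. The table name "sales" and
the worker count are placeholders. The code only builds one SELECT per
partition and fans those queries out over a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar
from urllib.parse import unquote

T = TypeVar("T")


def parse_partition(spec: str) -> Dict[str, str]:
    """Split a SHOW PARTITIONS line such as 'dt=2018-06-01/region=us' into {column: value}."""
    columns: Dict[str, str] = {}
    for pair in spec.strip().split("/"):
        key, _, value = pair.partition("=")
        # Hive percent-escapes special characters (including '/') in partition values.
        columns[key] = unquote(value)
    return columns


def partition_query(table: str, spec: str) -> str:
    """Build a SELECT that reads exactly one partition (values assumed to contain no quotes)."""
    conditions = " AND ".join(
        f"{col} = '{val}'" for col, val in parse_partition(spec).items()
    )
    return f"SELECT * FROM {table} WHERE {conditions}"


def fetch_partitions_in_parallel(
    table: str,
    partitions: List[str],
    run_query: Callable[[str], T],  # hypothetical: wraps your Hive client
    max_workers: int = 4,
) -> List[T]:
    """Run one query per partition concurrently; results keep the input order."""
    queries = [partition_query(table, p) for p in partitions]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_query, queries))


if __name__ == "__main__":
    # Hypothetical table and partitions; run_query just echoes the SQL here.
    for sql in fetch_partitions_in_parallel(
        "sales",
        ["dt=2018-06-01/region=us", "dt=2018-06-02/region=us"],
        run_query=lambda q: q,
    ):
        print(sql)
```

Each generated query covers a single partition, so each JDBC fetch stays
much smaller than one query over the whole table. How many to run at once
(max_workers here) is a judgment call that depends on what your Hive
cluster can handle.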

Thanks

Shawn

  _____  

From: Mohit <[email protected]>
Sent: Wednesday, June 27, 2018 8:14:25 AM
To: [email protected]
Subject: SelectHiveQl gets stuck when query table containing 12 Billion rows

Hi all,

I'm trying to fetch data from Hive using SelectHiveQL. It works fine for
small to medium-sized tables, but when I try to fetch data from a large
table with around 12 billion rows, it gets stuck for hours without doing
anything. I have set the Max Rows Per Flow File property to 10 million.

We have a 4-node NiFi cluster with 150 GB of RAM on each node.

Is there any configuration that needs to be changed to make this work?

Regards,

Mohit