Thanks Shawn, I followed a similar approach.
Regards,
Mohit

From: Shawn Weeks <[email protected]>
Sent: 27 June 2018 19:22
To: [email protected]
Subject: Re: SelectHiveQl gets stuck when query table containning 12 Billion rows

Well, to get the partitions you can execute a 'show partitions table_name', then you can use SplitRecord with an AvroReader and JSON Writer to generate a flow file per partition. That flow file can then be read with EvaluateJsonPath to pull the partition_name into an attribute on the flow file. Then finally, use a ReplaceText to actually write out the select statement, substituting the partition variable.

Thanks
Shawn
_____
From: Mohit <[email protected]>
Sent: Wednesday, June 27, 2018 8:40:20 AM
To: [email protected]
Subject: RE: SelectHiveQl gets stuck when query table containning 12 Billion rows

Hi,

Yes, I tried to fetch around 40 million rows, which took time but did complete. I'll try the Avro approach. How do I break the select into multiple parts? Can you briefly explain the partition flow to start with?

Thanks,
Mohit

From: Shawn Weeks <[email protected]>
Sent: 27 June 2018 18:51
To: [email protected]
Subject: Re: SelectHiveQl gets stuck when query table containning 12 Billion rows

It's probably not stuck doing nothing; using a JDBC connection to fetch 12 billion rows is going to be painful no matter what you do. At those kinds of sizes you're probably better off having Hive create a temporary table in Avro format and then consuming the Avro files from HDFS into NiFi. The largest number of rows I've pulled into NiFi via JDBC in a single query is around 10-20 million, and that took a long time. You can also try breaking the select into multiple parts and running them simultaneously. I've done something similar where I first ran a query to get all of the partitions and then executed a select for each partition in parallel.

Thanks
Shawn
_____
From: Mohit <[email protected]>
Sent: Wednesday, June 27, 2018 8:14:25 AM
To: [email protected]
Subject: SelectHiveQl gets stuck when query table containning 12 Billion rows

Hi all,

I'm trying to fetch data from Hive using SelectHiveQL. It works fine for small to medium-sized tables, but when I try to fetch data from a large table with around 12 billion rows, it gets stuck for hours and does nothing. I have set the Max Rows Per Flow File property to 10 million. We have a 4-node NiFi cluster with 150 GB of RAM on each node. Is there any configuration that needs to be changed to make this work?

Regards,
Mohit
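
For reference, the per-partition approach Shawn describes in this thread (list the partitions, then emit one select per partition) can be sketched outside NiFi in a few lines of Python. This is only an illustration of the string substitution that the ReplaceText step would perform, not the flow used by either poster. The table name `events`, the partition columns, and the helper names are hypothetical. The sketch assumes partition names in the `col=value/col=value` form that `show partitions` prints, assumes no URL-encoded characters in the values, and compares every value as a quoted string.

```python
from typing import List


def partition_predicate(partition_name: str) -> str:
    """Convert one 'show partitions' row into a WHERE-clause predicate.

    Assumes the 'col=value/col=value' layout and plain (not URL-encoded)
    values; every value is compared as a quoted string.
    """
    clauses = []
    for part in partition_name.split("/"):
        column, _, value = part.partition("=")
        escaped = value.replace("'", "\\'")  # escape embedded single quotes
        clauses.append(f"`{column}` = '{escaped}'")
    return " AND ".join(clauses)


def build_partition_queries(table: str, partition_names: List[str]) -> List[str]:
    """Build one SELECT statement per partition (hypothetical helper)."""
    return [
        f"SELECT * FROM {table} WHERE {partition_predicate(name)}"
        for name in partition_names
    ]


if __name__ == "__main__":
    # Hypothetical partition names, as 'show partitions events' might list them.
    partitions = ["dt=2018-06-26/region=us", "dt=2018-06-27/region=us"]
    for query in build_partition_queries("events", partitions):
        print(query)
```

Each generated statement can then be run as its own query, so several partitions are fetched in parallel instead of pulling all rows through a single JDBC result set.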
