I wrote a small Java app that uses JDBC to run a Hive query.
The Hive table I'm querying has more than 30 million rows, and I want to
pull them all back to verify the data. If I run a simple "SELECT * FROM
<table>" with a fetch size of 30,000, the fetch size is not honored: the
driver appears to try to bring back all 30+ million rows at once, which is
definitely not going to work. If I add a LIMIT to the SQL, like "SELECT *
FROM <table> LIMIT 9999999", the fetch size is honored just fine.
However, with the LIMIT in place the query does not run as a MapReduce job;
it seems to stream the data back instead. Is this how it's supposed to
work? I'm new to the Hadoop ecosystem, and I'm really just trying to figure
out the best way to bring this data back in chunks. Maybe I'm going
about this all wrong?
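
A minimal sketch of the setup described above, to make the question
concrete. It is not the actual app: the connection URL (which assumes a
HiveServer2-style driver), the user name, the table name "my_table", and
the class and method names are all placeholders. Note that in the JDBC API
setFetchSize is only a hint to the driver, so whether it is honored depends
on the driver in use.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

class HiveChunkedRead {

    // Walks the result set one row at a time and returns how many rows were seen.
    // If the driver honors the fetch size, rows arrive from the server in batches
    // behind the scenes; this loop looks the same either way.
    static long countRows(ResultSet rs) throws SQLException {
        long rows = 0;
        while (rs.next()) {
            rows++; // per-row verification would go here, e.g. rs.getString(1)
        }
        return rows;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder connection details (assumptions, not from the original app).
        String url = "jdbc:hive2://localhost:10000/default";
        String table = "my_table";

        try (Connection conn = DriverManager.getConnection(url, "user", "");
             Statement stmt = conn.createStatement()) {
            // Ask the driver to fetch 30,000 rows per round trip (a hint only).
            stmt.setFetchSize(30000);
            try (ResultSet rs = stmt.executeQuery("SELECT * FROM " + table)) {
                System.out.println("rows read: " + countRows(rs));
            }
        }
    }
}
```

Running main requires a reachable Hive server and the Hive JDBC driver on
the classpath; countRows itself only depends on the standard java.sql
interfaces.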