Hi Zhang. Thank you for your response 

While your answer clarifies my confusion with `CollectLimit` it still does
not clarify what is the recommended way to extract large amounts of data
(but not all the records) from a source and maintain a high level of
parallelism. 

For example , at some instances trying to extract 1 million records from a
table with over 100M records , I see my cluster using 1-2 cores out of the
hundreds that I have available. 



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: [email protected]

Reply via email to