Hi Zhang. Thank you for your response. While your answer clarifies my confusion about `CollectLimit`, it still does not clarify the recommended way to extract a large amount of data (but not all the records) from a source while maintaining a high level of parallelism.
For example, in some cases when I try to extract 1 million records from a table with over 100M records, I see my cluster using only 1-2 cores out of the hundreds that I have available.
