I use Spark to generate data, and then we use Hive/Pig/Presto/Spark to analyze
it. However, even when I use bucketBy and sortBy with a bucket number in
Spark, the number of result files Spark generates under each partition is
always far greater than the bucket number, so Presto cannot recognize the
buckets. How can I control the number of output files in Spark?
Unfortunately, I have not found any way to do that.
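
To make the symptom concrete, here is a rough model in plain Python (not
Spark code). It assumes each write task emits one file for every
(partition, bucket) pair it holds rows for, and it uses a simple modulo in
place of Spark's real bucket hash. The function name, the sample rows, and
the task layouts are all made up for illustration.

```python
from collections import defaultdict

def count_output_files(tasks, num_buckets):
    """Count files per partition, assuming one file per (partition, bucket) per task."""
    files = defaultdict(int)
    for rows in tasks:
        # Distinct (partition, bucket) pairs seen by this task; each becomes one file.
        # Assumption: bucket id is key % num_buckets (a stand-in for the real hash).
        seen = {(part, key % num_buckets) for part, key in rows}
        for part, _bucket in seen:
            files[part] += 1
    return dict(files)

NUM_BUCKETS = 4
# Hypothetical data: 40 rows, all in one date partition.
rows = [("2019-01-01", k) for k in range(40)]

# Rows spread round-robin over 10 tasks: every task holds more than one bucket.
scattered = [rows[i::10] for i in range(10)]
# Rows grouped so that each task holds exactly one bucket.
grouped = [[r for r in rows if r[1] % NUM_BUCKETS == b] for b in range(NUM_BUCKETS)]

scattered_files = count_output_files(scattered, NUM_BUCKETS)
grouped_files = count_output_files(grouped, NUM_BUCKETS)
print(scattered_files, grouped_files)
```

Under these assumptions the scattered layout produces far more files than
buckets in the partition, while the grouped layout produces exactly one file
per bucket, which is the layout I would like Spark to write.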
Adam - App Annie Ops
Phone: +86 18610024053