Hi People, I'm using Java with Kafka and Spark Streaming, and saving the result files to HDFS.
As I understand it, Spark Streaming writes every processed message or event to its own HDFS file. The reason for creating one file per message or event may be to ensure fault tolerance. Is there any way for Spark to handle this small-file problem, or do I need to append the small files into a bigger file myself and then insert that into HDFS? I appreciate your time and suggestions.
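For context, one approach I've seen discussed is calling `coalesce` (or `repartition`) on each batch RDD before writing, so each micro-batch produces fewer part files instead of many tiny ones. The other option I mention above, compacting small files into one bigger file before loading into HDFS, could look roughly like this local sketch using only the standard library (the `part-*` naming and the directory layout here are my assumptions, not anything Spark-specific):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MergeSmallFiles {
    // Append every small part file in a batch directory into one merged file,
    // so only the compacted file would need to be uploaded to HDFS.
    static long merge(Path batchDir, Path merged) throws IOException {
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(batchDir, "part-*")) {
            for (Path part : parts) {
                Files.write(merged, Files.readAllBytes(part),
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            }
        }
        return Files.size(merged);
    }

    public static void main(String[] args) throws IOException {
        // Simulate one micro-batch output directory with two tiny part files.
        Path dir = Files.createTempDirectory("batch");
        Files.write(dir.resolve("part-00000"), "event-1\n".getBytes());
        Files.write(dir.resolve("part-00001"), "event-2\n".getBytes());
        Path merged = dir.resolve("merged.txt");
        System.out.println("merged bytes: " + merge(dir, merged));
    }
}
```

This is only a local illustration of the compaction idea; in a real pipeline the merge would run against HDFS paths (or be avoided entirely by coalescing partitions before the write).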