user@accumulo,

I was working with the Wikipedia Accumulo ingest examples, trying to make the
ingest of a single archive file as fast as ingesting multiple archives in
parallel. I increased the number of splits the job made of the single archive
so that all the tablet servers could work on ingesting it at the same time.
What I noticed, however, was that having every server work on the same file
was still not nearly as fast as using multiple ingest files. Could you offer
some insight into the design of the Wikipedia ingest that would explain this
behavior?
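For context, the way I forced more splits was roughly as follows. This is a
sketch, assuming the example job honors the standard Hadoop FileInputFormat
split-size properties; the jar name, main class, and input/output paths below
are placeholders for my actual invocation, not the example's exact names:

```shell
# Cap the maximum split size (here ~64 MB) so the single archive is
# divided into more input splits, one per map task.
# WikipediaIngestJob, the jar name, and the paths are placeholders.
hadoop jar wikisearch-ingest.jar WikipediaIngestJob \
  -Dmapreduce.input.fileinputformat.split.maxsize=67108864 \
  /input/enwiki-single-archive /output
```

Even with the split count raised this way, the single-file job stayed well
behind the multi-file runs.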


Thank you for your time,
Patrick Lynch
