Hello, I am investigating how well Accumulo handles MapReduce jobs, and I would like to hear about any known issues from anyone running MapReduce with Accumulo as both source and sink. Specifically, I have questions about the following:
Assume the cluster has 50 nodes, Accumulo is running on three nodes, and Solr is on three nodes.

1. How many concurrent mutations can Accumulo handle? More details on how this works would be extremely helpful.
2. Is there a delay between when the MapReduce job writes data to a table and when that data is available for reads?
3. How are concurrent mutations to the same row handled (say, from different mappers/reducers), given that Accumulo isn't transactional?
4. I am trying to Solr-index some Accumulo data. Are there any known issues on the Accumulo end? On the Solr end? How does using one shard vs. multiple shards affect the MR job?
5. Should I have more Accumulo/Solr nodes (i.e., an instance on each node in the cluster)? Is that necessary? Are there workarounds?
6. Normally I have log4j statements all over the Java job. Can I still use them with MapReduce?

I apologize if any of these questions do not belong on this mailing list (and please point me to where I can ask them, if possible). I am trying to gather enough information to decide whether this is a good approach for me and what level of effort it would require, so I realize this is a lot of questions. I very much appreciate any and all feedback. Thank you in advance for your time!
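For context, here is roughly the job wiring I have in mind. All instance, table, and credential names below are placeholders, and the setup calls are from my reading of the 1.5-era `AccumuloInputFormat`/`AccumuloOutputFormat` MapReduce API, so please correct me if they have changed in your version:

```java
// Configuration-only sketch: Accumulo as both the MapReduce source and sink.
// "myInstance", the ZooKeeper quorum, credentials, and table names are placeholders.
Job job = Job.getInstance(conf, "accumulo-to-solr");

job.setInputFormatClass(AccumuloInputFormat.class);
AccumuloInputFormat.setZooKeeperInstance(job, "myInstance", "zk1,zk2,zk3");
AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
AccumuloInputFormat.setInputTableName(job, "source_table");

job.setOutputFormatClass(AccumuloOutputFormat.class);
AccumuloOutputFormat.setZooKeeperInstance(job, "myInstance", "zk1,zk2,zk3");
AccumuloOutputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
// Mutations written with a null key go to this default table.
AccumuloOutputFormat.setDefaultTableName(job, "sink_table");
AccumuloOutputFormat.setCreateTables(job, true);
```

My mappers/reducers would then emit `Mutation` objects via `context.write(null, mutation)`, which is where questions 1-3 about mutation throughput, read visibility, and same-row concurrency come from.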
