Thanks for the answers. I'm trying to really make sense of NoSQL, and HBase in particular. The software has a lot of loopholes and I'm still fighting the compaction storm issue, so right now I would not say HBase is fast when it comes to writing.
But my post was more about NoSQL schema thinking. After so long on SQL schemas, it takes a while to stop thinking that way, both in terms of schema and in terms of questions (or interaction, if you'd rather). Unlike with SQL, I cannot design a logical model for the data and figure out later what I'll want out of it. In my case I stated 10 TB, but this is very likely to grow since it is only the starting scenario. I believe a 30-minute latency before ingesting logs is not an issue; however, the questions to HBase must be answered in real time. I have been playing with my questions to see how they can fit into a rowkey and/or column families, but since they differ in nature and purpose, I ended up assuming they would need a number of different HBase tables to cover the scope of questions: one table for every one to three questions. The questions have joins and filters embedded in them.

My post was about getting your insight on how you would approach this type of problem and what your schemas might be; overall, how to switch from an SQL vision to a NoSQL vision. Using coprocessors to build a couple of tables on the fly for all the questions is an interesting approach. As for running MapReduce over the logs, however, I'm afraid the performance would be too slow: I was hoping to answer in milliseconds if possible. But this might be me being new and not evaluating correctly.
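To make the "one table per question" idea concrete, here is a minimal sketch under stated assumptions. It assumes two hypothetical questions ("latest events for host X" and "latest events with status code N") and uses an in-memory sorted list as a toy stand-in for an HBase table, since HBase stores rows sorted by rowkey bytes and answers prefix scans efficiently. The join/filter is replaced by denormalizing at write time: each event is written once per question table, with a rowkey built so the answer is a short prefix scan. The reversed-timestamp trick makes the newest rows sort first. All names here (`SortedTable`, `ingest`, `latest_for_host`, and so on) are hypothetical and not part of any HBase API.

```python
import struct
from bisect import bisect_left

# Largest signed 64-bit value; subtracting a timestamp from it makes newer rows sort first.
MAX_TS = 2**63 - 1


def reversed_ts(ts_millis: int) -> bytes:
    # Big-endian packing keeps byte order equal to numeric order for non-negative values.
    return struct.pack(">q", MAX_TS - ts_millis)


def host_time_key(host: str, ts_millis: int) -> bytes:
    # Rowkey for the question "latest events for host X".
    # Assumes host names never contain a NUL byte (used as the separator).
    return host.encode("utf-8") + b"\x00" + reversed_ts(ts_millis)


def status_prefix(status: int) -> bytes:
    # Fixed-width 2-byte prefix so every status code groups together under a scan.
    return struct.pack(">H", status)


def status_time_key(status: int, ts_millis: int) -> bytes:
    # Rowkey for the question "latest events with status code N".
    return status_prefix(status) + reversed_ts(ts_millis)


class SortedTable:
    """Toy in-memory stand-in for an HBase table: rows kept sorted by raw rowkey bytes."""

    def __init__(self) -> None:
        self._keys: list[bytes] = []
        self._rows: dict[bytes, dict] = {}

    def put(self, key: bytes, row: dict) -> None:
        # A repeated rowkey replaces the stored row (simplification: HBase keeps cell versions).
        if key not in self._rows:
            self._keys.insert(bisect_left(self._keys, key), key)
        self._rows[key] = row

    def prefix_scan(self, prefix: bytes, limit: int) -> list[dict]:
        # Seek to the first key >= prefix, then read forward while the prefix still matches.
        out = []
        i = bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix) and len(out) < limit:
            out.append(self._rows[self._keys[i]])
            i += 1
        return out


# One table per question (hypothetical table names).
events_by_host = SortedTable()
events_by_status = SortedTable()


def ingest(event: dict) -> None:
    # Denormalize at write time: the same event goes into each question's table,
    # so no join or filter is needed at read time.
    events_by_host.put(host_time_key(event["host"], event["ts"]), event)
    events_by_status.put(status_time_key(event["status"], event["ts"]), event)


def latest_for_host(host: str, limit: int = 10) -> list[dict]:
    return events_by_host.prefix_scan(host.encode("utf-8") + b"\x00", limit)


def latest_with_status(status: int, limit: int = 10) -> list[dict]:
    return events_by_status.prefix_scan(status_prefix(status), limit)
```

In a real cluster each `SortedTable` would be its own HBase table and `prefix_scan` a `Scan` bounded to that prefix. The trade-off is extra storage and write amplification in exchange for reads that touch only a few adjacent rows. Note that a rowkey led by a low-cardinality value such as a status code can concentrate writes on one region, so salting or a different leading field may be needed at 10 TB and beyond.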
