Hi! We currently use HBase as our primary data store for various kinds of event-like data. On top of that, we use Shark to aggregate this data and keep it in memory for fast access. Since we use no HBase-specific functionality other than Puts, a discussion came up about whether it is worth running an additional set of components on top of HDFS instead of just writing to HDFS directly.
Is there any overview of the implications of doing that, beyond things like having to manage the file structure ourselves? What is the real advantage of Spark on HBase over Spark on HDFS?

Best,
Philip
