spark can definitely answer queries like "give me all transactions with property x" very quickly, and you can put an http query server in front of it and run queries concurrently.
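to make the "scan with a predicate, many queries at once" idea concrete, here is a rough plain-python sketch. it is not spark code: the `Transaction` layout, the `SNAPSHOT` data, and the `scan` / `run_concurrently` helpers are all made up for illustration. the only point is that concurrent readers are safe when the data they scan is never mutated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Transaction(NamedTuple):
    # hypothetical record layout, not taken from the original question
    ticker: str
    hour: int      # hour of the trading day, 0-23
    price: float


# an immutable snapshot of records, standing in for a cached dataset
SNAPSHOT: Tuple[Transaction, ...] = (
    Transaction("GOOG", 13, 520.0),
    Transaction("AAPL", 13, 97.5),
    Transaction("GOOG", 14, 521.5),
    Transaction("GOOG", 13, 519.0),
)


def scan(predicate: Callable[[Transaction], bool]) -> List[Transaction]:
    """full scan: test every record in the snapshot against the predicate."""
    return [t for t in SNAPSHOT if predicate(t)]


def run_concurrently(
    predicates: Sequence[Callable[[Transaction], bool]],
) -> List[List[Transaction]]:
    """run several scans in parallel threads; results come back in input order.

    no locking is needed because the snapshot is never modified.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(scan, predicates))
```

an http handler would call something like `scan(lambda t: t.ticker == "GOOG" and t.hour == 13)` per request. note that every query touches every record, which is fine for scans but is exactly why point lookups are not the strong suit here.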
but spark does not support inserts, updates, or fast random access lookups. this is because RDDs are immutable and designed for batch operations. so it supports appends (combining RDDs) and (very fast) full scan queries.

now you can make the batch appends very frequent (like every second). if that is of interest, take a look at spark streaming, which does just that.

why not have kafka stream directly into hbase? then hbase can answer point queries or certain highly structured queries, and spark streaming listening to kafka could answer complex "freeform" queries on a recent window.

best, koert

On Oct 19, 2014 5:35 PM, "kc66" <kahchan...@yahoo.com> wrote:

> I am very new to Spark.
> I am working on a project that involves reading stock transactions off a
> number of TCP connections and
> 1. periodically (once every few hours) uploads the transaction records to
>    HBase
> 2. maintains the records that are not yet written into HBase and acts as an
>    HTTP query server for these records. An example of a query would be to
>    return all transactions between 1-2pm for Google stocks for the current
>    trading day.
>
> I am thinking of using Kafka to receive all the transaction records. Spark
> will be the consumer of the Kafka output.
>
> In particular, I need to create an RDD hashmap with a string (stock ticker
> symbol) as key and a list (or vector) of transaction records as data.
> This RDD needs to be "thread (or process) safe" since different threads and
> processes will be reading and modifying it. I need insertion, deletion, and
> lookup to be fast.
> Is this something that can be done with Spark, and is Spark the right tool
> to use in terms of latency and throughput?
>
> Pardon me if I don't know what I am talking about. All of this is very new
> to me.
> Thanks!
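p.s. a rough sketch of the "frequent batch appends plus queries on a recent window" idea from the reply, again in plain python rather than the spark streaming api. `RecentWindow`, its `max_batches` parameter (assumed to be at least 1), and the `Transaction` fields are hypothetical names chosen for illustration. each batch is immutable once appended, appending builds a new tuple of batches instead of modifying records in place, and old batches simply fall out of the window.

```python
from typing import List, NamedTuple, Tuple


class Transaction(NamedTuple):
    # hypothetical record layout
    ticker: str
    timestamp: int   # seconds since the start of the trading day
    price: float


Batch = Tuple[Transaction, ...]


class RecentWindow:
    """keeps only the most recent `max_batches` immutable batches."""

    def __init__(self, max_batches: int) -> None:
        self.max_batches = max_batches  # assumed >= 1
        self.batches: Tuple[Batch, ...] = ()

    def append(self, records: List[Transaction]) -> None:
        # append-only: freeze the new batch and rebuild the tuple of batches,
        # keeping just the newest `max_batches` of them
        combined = self.batches + (tuple(records),)
        self.batches = combined[-self.max_batches:]

    def query(self, ticker: str, start: int, end: int) -> List[Transaction]:
        # full scan over every batch still in the window; start is inclusive,
        # end is exclusive
        return [
            t
            for batch in self.batches
            for t in batch
            if t.ticker == ticker and start <= t.timestamp < end
        ]
```

there is no per-record insert, update, or delete here, which mirrors the trade-off described above: anything older than the window, and any point lookup, would be answered by hbase instead.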