Thank you both.
So am I correct that Spark fits within the application tier in an N-tier
architecture?
On Tuesday, 29 March 2016, 23:50, Alexander Pivovarov
<[email protected]> wrote:
Spark is a distributed data processing engine plus a distributed in-memory /
disk data cache.
spark-jobserver provides a REST API to your Spark applications. It allows you to
submit jobs to Spark and get results in sync or async mode.
It can also create a long-running Spark context to cache RDDs in memory under a
name (namedRDD) and then use them to serve requests from multiple users. Because
the RDD is in memory, responses should be super fast (seconds).
https://github.com/spark-jobserver/spark-jobserver
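To make the sync / async submission concrete, here is a minimal Python sketch that builds the job-submission URL. It assumes the /jobs endpoint and the appName, classPath, context and sync query parameters as described in the spark-jobserver README; the host, port, application name, class path and context name below are placeholders, and the helper names are hypothetical. Please check the README for the version you run before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def jobserver_submit_url(base_url, app_name, class_path, context=None, sync=False):
    """Build the URL used to POST a job to spark-jobserver.

    Assumption: the server exposes /jobs with appName, classPath,
    context and sync query parameters, as in the project README.
    """
    params = {"appName": app_name, "classPath": class_path}
    if context is not None:
        # Reuse a long-running context so cached (named) RDDs are visible.
        params["context"] = context
    if sync:
        # Block until the job finishes and return the result in the response.
        params["sync"] = "true"
    return base_url.rstrip("/") + "/jobs?" + urlencode(params)


def submit_job(url, job_config):
    """POST the job configuration text and return the raw response body.

    Hypothetical helper: needs a running job server, so it is defined
    here but not called.
    """
    request = Request(url, data=job_config.encode("utf-8"), method="POST")
    with urlopen(request) as response:
        return response.read().decode("utf-8")


# Placeholder values, not taken from any real deployment.
url = jobserver_submit_url(
    "http://localhost:8090",
    "test",
    "spark.jobserver.WordCountExample",
    context="shared-context",
    sync=True,
)
print(url)
```

Leaving out sync gives the async mode: the server should answer with a job id that you poll later, while passing the same context name on every request is what lets several users hit the same cached RDD.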
On Tue, Mar 29, 2016 at 2:50 PM, Mich Talebzadeh <[email protected]>
wrote:
Interesting question.
The most widely used application of N-tier is the traditional three-tier
architecture that has been the backbone of client-server architecture, with a
presentation layer, an application layer and a data layer. This separation is
primarily for performance, scalability and maintenance. The most profound change
that the Big Data space has introduced to N-tier architecture is the concept of
horizontal scaling, as opposed to the previous tiers, which relied on vertical
scaling. HDFS is an example of horizontal scaling at the data tier, by adding
more JBODs to storage. Similarly, adding more nodes to a Spark cluster should
result in better performance.
Bear in mind that these tiers are defined at the logical level, which means that
there may or may not be as many physical layers. For example, multiple virtual
servers can be hosted on the same physical server.
With regard to Spark, it is effectively a powerful query tool that sits
between the presentation layer (say Tableau) and HDFS or Hive, as you
alluded. In that sense you can think of Spark as part of the application layer
that communicates with the backend via a number of protocols, including
standard JDBC. The line is rather blurred here as to whether Spark is a
database or a query tool. IMO it is a query tool, in the sense that Spark by
itself does not have its own storage concept or metastore. Thus it relies on
others to provide that service.
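As a rough illustration of Spark acting as the middle tier over JDBC, the Python sketch below reads a table from a backend database through Spark's JDBC data source. The option names (url, dbtable, user, password, driver) are the documented JDBC data source options; the connection string, table name, credentials and helper names are made-up placeholders. The load step needs pyspark and an existing session (SparkSession, or SQLContext in Spark 1.x), so it is defined but not called here.

```python
def jdbc_options(url, table, user, password, driver=None):
    """Collect the options Spark's JDBC data source expects."""
    options = {
        "url": url,        # JDBC connection string of the data tier
        "dbtable": table,  # table (or subquery) to expose as a DataFrame
        "user": user,
        "password": password,
    }
    if driver:
        # Fully qualified JDBC driver class; the jar must be on Spark's classpath.
        options["driver"] = driver
    return options


def load_table(session, options):
    """Read the table through Spark, which stores nothing itself.

    `session` is an existing SparkSession (or SQLContext); Spark only
    queries the backend and hands a DataFrame to the layer above.
    """
    return session.read.format("jdbc").options(**options).load()


# Placeholder connection details, not a real system.
opts = jdbc_options(
    "jdbc:postgresql://dbhost:5432/sales",
    "public.orders",
    "report_user",
    "secret",
    driver="org.postgresql.Driver",
)
print(sorted(opts))
```

The point of the sketch is that all storage details live in the options, i.e. in the data tier, while Spark only does the querying and processing.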
HTH
Dr Mich Talebzadeh LinkedIn
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
On 29 March 2016 at 22:07, Ashok Kumar <[email protected]> wrote:
Experts,
One of the terms I hear used is N-tier architecture within Big Data, for
availability, performance etc. I also hear that Spark, by means of its query
engine and in-memory caching, fits into the middle tier (application layer),
with HDFS and Hive perhaps providing the data tier. Can someone elaborate on the
role of Spark here? For example, a Scala program that we write uses JDBC to talk
to databases, so in that sense is Spark a middle-tier application?
I hope that someone can clarify this and, if so, say what the best practice
would be for using Spark as the middle tier within Big Data.
Thanks