Alaa,

One option is to use Spark as a cache: import the subset of data from
HBase/Phoenix that fits in memory, and use JdbcRDD to fetch additional data
on a cache miss. The front end can be built with PySpark and Flask, either
as a REST API that translates JSON requests into Spark SQL, or one that
simply lets the user post Spark SQL queries directly.
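A rough sketch of the second option, using Spark's DataFrame JDBC reader
rather than the lower-level JdbcRDD mentioned above: one long-lived
SparkSession pulls a subset of a Phoenix table into memory once, and a small
Flask endpoint accepts Spark SQL posted by the user. The Phoenix JDBC URL,
table name, and port are placeholders, and the Phoenix JDBC driver would need
to be on the Spark classpath.

from flask import Flask, request, jsonify
from pyspark.sql import SparkSession

app = Flask(__name__)

# One SparkSession for the lifetime of the web app, so the JDBC pull and the
# cached data are reused across requests instead of rebuilt per query.
spark = SparkSession.builder.appName("phoenix-sql-frontend").getOrCreate()

# Pull the subset of the Phoenix table that fits in memory and cache it.
# URL, driver class, and table name below are placeholders for illustration.
events = (spark.read
          .format("jdbc")
          .option("url", "jdbc:phoenix:zk-host:2181")
          .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
          .option("dbtable", "EVENTS")
          .load()
          .cache())
events.createOrReplaceTempView("events")

@app.route("/sql", methods=["POST"])
def run_sql():
    # The client posts {"query": "..."}; rows come back as JSON.
    query = request.get_json(force=True)["query"]
    rows = spark.sql(query).limit(1000).collect()
    return jsonify({"rows": [row.asDict() for row in rows]})

if __name__ == "__main__":
    app.run(port=5000)

Cache misses could be handled the same way as the initial load: issue another
JDBC read with a narrower predicate and union or cache the result, rather
than reopening a connection per request.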

On Sun, Nov 23, 2014 at 3:37 PM, Alaa Ali <contact.a...@gmail.com> wrote:

> Hello. Okay, so I'm working on a project to run analytic processing using
> Spark or PySpark. Right now, I connect to the shell and execute my
> commands. The very first part of my commands is: create an SQL JDBC
> connection and cursor to pull from Apache Phoenix, do some processing on
> the returned data, and spit out some output. I want to create a web "GUI"
> tool where I can play around with which SQL query is executed for my
> analysis.
>
> I know that I can write my whole Spark program, use spark-submit, and
> have it accept an argument that is the SQL query I want to execute, but this
> means that every time I submit, an SQL connection is created, the query is
> run, processing is done, the output is printed, the program closes, the SQL
> connection closes, and then the whole thing repeats if I want to run another
> query right away. That will probably be very slow. Is there a way to keep
> the SQL connection "working" in the backend, for example, so that all I have
> to do is supply a query from my GUI tool, which then takes it, runs it, and
> displays the output? I just want the big picture and a broad overview of how
> I would go about doing this and what additional technology to use, and I'll
> dig up the rest.
>
> Regards,
> Alaa Ali
>