Presto has slightly lower latency than Spark, but I've found that it gets stuck
on some edge cases.
If you are on AWS, then the simplest solution is to use Athena. Athena is built
on Presto, has a JDBC driver, and is serverless, so you avoid the headache of
managing infrastructure.
On 2/18/21, 3:32 PM, "Scott Ribe" wrote:
> On Feb 18, 2021, at 12:52 PM, Jeff Evans wrote:
>
> It sounds like the tool you're after, then, is a distributed SQL engine like
> Presto. But I could be totally misunderstanding what you're trying to do.
Presto may well be a longer-term solution as our use grows. For now, a simple
data
> On Feb 18, 2021, at 1:13 PM, Lalwani, Jayesh wrote:
>
> Have you tried any of those? Where are you getting stuck?
Thanks! The third one in your list I had not found, and it seems to fill in what
I was missing (CREATE EXTERNAL TABLE).
I'd found the first two, but they only got me creating an
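
For reference, the CREATE EXTERNAL TABLE step mentioned above can be sketched in Spark SQL roughly as follows; the table name and location are hypothetical placeholders, and the Hive-style form assumes Spark was started with Hive support:

    -- Hive-style external table over an existing Parquet directory
    CREATE EXTERNAL TABLE events
    STORED AS PARQUET
    LOCATION '/data/parquet/events';

    -- or, equivalently, the native Spark SQL syntax
    CREATE TABLE events
    USING PARQUET
    LOCATION '/data/parquet/events';

Either form registers the existing files as a queryable table without copying the data.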
There are several step-by-step guides that you can find online by googling:
https://spark.apache.org/docs/latest/sql-distributed-sql-engine.html
https://jaceklaskowski.gitbooks.io/mastering-spark-sql/content/spark-sql-thrift-server.html
https://medium.com/@saipeddy/setting-up-a-thrift-server-4eb0c5
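
The guides above boil down to something like the following, run from the root of a standard Spark distribution (localhost:10000 is the Thrift server's default listen address; your master URL may differ):

    # Start the Spark Thrift (HiveServer2-compatible) JDBC server
    ./sbin/start-thriftserver.sh --master local[*]

    # Connect over JDBC with the bundled beeline client
    ./bin/beeline -u jdbc:hive2://localhost:10000

Any JDBC client that speaks the HiveServer2 protocol can connect the same way.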
It sounds like the tool you're after, then, is a distributed SQL engine
like Presto. But I could be totally misunderstanding what you're trying to
do.
On Thu, Feb 18, 2021 at 1:48 PM Scott Ribe wrote:
> I have a client side piece that needs access via JDBC.
>
> > On Feb 18, 2021, at 12:45 PM, Jeff Evans wrote:
I have a client side piece that needs access via JDBC.
> On Feb 18, 2021, at 12:45 PM, Jeff Evans wrote:
>
> If the data is already in Parquet files, I don't see any reason to involve
> JDBC at all. You can read Parquet files directly into a DataFrame.
> https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
If the data is already in Parquet files, I don't see any reason to involve
JDBC at all. You can read Parquet files directly into a DataFrame.
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html
On Thu, Feb 18, 2021 at 1:42 PM Scott Ribe wrote:
> I need a little help figuring out
I need a little help figuring out how some pieces fit together. I have some
tables in parquet files, and I want to access them using SQL over JDBC. I
gather that I need to run the thrift server, but how do I configure it to load
my files into datasets and expose views?
The context is this: tryi