Hi All,
When I try to run Spark SQL in standalone mode it appears to be missing
the Parquet jar; I have to pass it with --jars, and that works:
sbin/start-thriftserver.sh --jars lib/parquet-hive-bundle-1.6.0.jar
--driver-memory 28g --master local[10]
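(For reference, an alternative to passing --jars on every launch is to list the jar in conf/spark-defaults.conf. This is only a sketch, assuming a standard Spark layout; the spark.jars property takes a comma-separated list of jars to put on the driver and executor classpaths:)

```
# conf/spark-defaults.conf -- sketch; path is relative to where the
# thrift server is started (an absolute path is safer)
spark.jars    lib/parquet-hive-bundle-1.6.0.jar
```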
Any ideas on why? I downloaded the one pre-buil
How big is the data set? Does it work when you copy it to HDFS?
-Manu
On Mon, Sep 8, 2014 at 2:58 PM, Jim Carroll wrote:
> Hello all,
>
> I've been wrestling with this problem all day and any suggestions would be
> greatly appreciated.
>
> I'm trying to test reading a Parquet file that's store
Hi,
Let me start with: I am new to Spark (be gentle).
I have a large data set in Parquet (~1.5B rows, 900 columns).
Currently Impala takes ~1-2 seconds for the queries, while Spark SQL
takes ~30 seconds.
Here is what I am currently doing.
I launch with SPARK_MEM=6g spark-shell
val sqlContex
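(For context, a session like the truncated one above would typically continue along these lines. This is only a sketch assuming the Spark 1.x SQLContext API as it stood around the 1.1 release; the file path and table name are hypothetical, and it needs a running spark-shell, which provides `sc`:)

```scala
// Sketch of a Spark 1.x spark-shell session querying Parquet.
// `sc` (the SparkContext) is provided by the shell; the path and
// table name below are hypothetical.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val data = sqlContext.parquetFile("hdfs:///data/wide_table.parquet")
data.registerTempTable("wide_table")  // registerAsTable in Spark 1.0
sqlContext.sql("SELECT COUNT(*) FROM wide_table").collect()
```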