Hi All,
When I try to run Spark SQL in standalone mode it appears to be missing
the Parquet jar; I have to pass it with --jars, and that works:
sbin/start-thriftserver.sh --jars lib/parquet-hive-bundle-1.6.0.jar
--driver-memory 28g --master local[10]
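(For reference, an alternative to passing --jars on every launch is to list the jar in conf/spark-defaults.conf. This is only a sketch, assuming a standard Spark layout; the spark.jars property takes a comma-separated list of jars to put on the driver and executor classpaths:)

```
# conf/spark-defaults.conf -- sketch; path is relative to where the
# thrift server is started (an absolute path is safer)
spark.jars    lib/parquet-hive-bundle-1.6.0.jar
```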
Any ideas on why? I downloaded the one pre-buil
How big is the data set? Does it work when you copy it to HDFS?
-Manu
On Mon, Sep 8, 2014 at 2:58 PM, Jim Carroll wrote:
> Hello all,
>
> I've been wrestling with this problem all day and any suggestions would be
> greatly appreciated.
>
> I'm trying to test reading a Parquet file that's store
Hi,
Let me start with: I am new to Spark (be gentle).
I have a large data set in Parquet (~1.5B rows, 900 columns).
Currently Impala takes ~1-2 seconds for the queries, while Spark SQL
takes ~30 seconds.
Here is what I am currently doing.
I launch with SPARK_MEM=6g spark-shell
val sqlContex
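(For context, a session like the truncated one above would typically continue along these lines. This is only a sketch assuming the Spark 1.x SQLContext API as it stood around the 1.1 release; the file path and table name are hypothetical, and it needs a running spark-shell, which provides `sc`:)

```scala
// Sketch of a Spark 1.x spark-shell session querying Parquet.
// `sc` (the SparkContext) is provided by the shell; the path and
// table name below are hypothetical.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val data = sqlContext.parquetFile("hdfs:///data/wide_table.parquet")
data.registerTempTable("wide_table")  // registerAsTable in Spark 1.0
sqlContext.sql("SELECT COUNT(*) FROM wide_table").collect()
```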