You can set spark.sql.inMemoryColumnarStorage.compressed to true. This property is already set to true by default in the master branch and branch-1.2.
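As a sketch of how the property above would be applied per-session (the property name is the one quoted in this reply; whether your session needs it depends on the Spark version you run):

```sql
-- enable compressed in-memory columnar storage for this session
SET spark.sql.inMemoryColumnarStorage.compressed=true;
```

In a Beeline or spark-sql session this takes effect for subsequent CACHE TABLE statements in the same session.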
On 11/13/14 7:16 AM, Sadhan Sood wrote:
We noticed while caching data from our hive tables, which contain data in compressed sequence file format, that it gets uncompressed in memory when getting cached.
We are running spark on yarn with 1TB of combined memory, and when trying to cache a table partition (which is 100G), we are seeing a lot of failed collect stages in the UI, and the caching never succeeds. Because of the failed collects, it seems like the mapPartitions stages keep getting resubmitted. We have more than
(Logging.scala:logError(75)) - Asked to remove non-existent executor 372
2014-11-12 19:11:21,655 INFO scheduler.DAGScheduler (Logging.scala:logInfo(59)) - Executor lost: 372 (epoch 3)
On Wed, Nov 12, 2014 at 12:31 PM, Sadhan Sood sadhan.s...@gmail.com wrote:
We are running spark on yarn with combined memory 1TB and keep hitting "Missing an output location for shuffle 0".
The data is an lzo compressed sequence file with compressed size ~26G. Is there a way to understand why the shuffle keeps failing for one partition? I believe we have enough memory to store the uncompressed data in memory.
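The claim that the uncompressed data fits in memory can be sanity-checked with a rough calculation. The LZO expansion factor below is an assumed placeholder, not a value measured on this cluster:

```python
# Back-of-envelope check: does the uncompressed table fit in cluster memory?
compressed_gb = 26          # compressed input size reported in this thread
cluster_memory_gb = 1024    # ~1TB combined YARN memory
assumed_lzo_ratio = 4       # ASSUMPTION: hypothetical expansion factor for LZO

uncompressed_gb = compressed_gb * assumed_lzo_ratio
fits = uncompressed_gb < cluster_memory_gb
print(uncompressed_gb, fits)  # 104 True
```

Even under this assumed ratio the uncompressed data is an order of magnitude smaller than cluster memory, which supports the poster's belief, though the real ratio depends on the data.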
On Wed, Nov 12, 2014 at 2:50 PM, Sadhan Sood sadhan.s...@gmail.com wrote:
We noticed while caching data from our hive tables, which contain data in compressed sequence file format, that it gets uncompressed in memory when getting cached. Is there a way to turn this off and cache the compressed data as is?
It sounds like the Thrift server didn't start successfully because HiveServer2 occupied the port, and your Beeline session was probably linked against HiveServer2.
Cheng
On 11/11/14 8:29 AM, Sadhan Sood wrote:
I was testing out the spark thrift jdbc server by running a simple query in the beeline client. The spark itself is running on a yarn cluster.
While testing SparkSQL on top of our Hive metastore, we were trying to
cache the data for one partition of the table in memory like this:
CACHE TABLE xyz_20141029 AS SELECT * FROM xyz where date_prefix = 20141029
Table xyz is a hive table which is partitioned by date_prefix. The data is
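The caching pattern being discussed can be sketched end to end; the table and partition names are the ones from the example above, and releasing the cached copy afterwards is an added suggestion, not part of the original question:

```sql
-- cache one date partition of the Hive table as a new in-memory table
CACHE TABLE xyz_20141029 AS SELECT * FROM xyz WHERE date_prefix = 20141029;

-- queries against the new name now hit the cached columnar data
SELECT COUNT(*) FROM xyz_20141029;

-- release the memory when done
UNCACHE TABLE xyz_20141029;
```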
Getting an exception while trying to build spark in spark-core:
[ERROR]
while compiling:
/Users/dev/tellapart_spark/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
during phase: typer
library version: version 2.10.4
compiler version: version 2.10.4
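A compiler crash during the typer phase can sometimes be caused by stale build output rather than the source itself, so one low-cost first step is a clean rebuild using Spark's documented Maven build (this is a general suggestion, not a confirmed fix for this crash):

```shell
# wipe previous build output and recompile, skipping tests for speed
mvn -DskipTests clean package
```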
are broken, too. Based on the
Jenkins logs, I think that this pull request may have broken things
(although I'm not sure why):
https://github.com/apache/spark/pull/3030#issuecomment-62436181
On Mon, Nov 10, 2014 at 1:42 PM, Sadhan Sood sadhan.s...@gmail.com
wrote:
Getting an exception while trying to build spark in spark-core:
I was testing out the spark thrift jdbc server by running a simple query in
the beeline client. The spark itself is running on a yarn cluster.
However, when I run a query in beeline, I see no running jobs in the spark UI (completely empty), and the yarn UI seems to indicate that the submitted query
We want to run multiple instances of the spark sql cli on our yarn cluster, with each instance used by a different user. This would be non-optimal if each user brings up a different cli, given how spark works on yarn by running executor processes (and hence consuming resources) on worker nodes.
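One way to bound what each per-user CLI consumes is to start it with explicit YARN resource limits. A sketch using the standard launcher flags; the sizing numbers are placeholders, not recommendations:

```shell
# Start one spark-sql CLI per user with a bounded footprint on the YARN cluster.
# 4 executors x 8g x 2 cores is an assumed example sizing.
spark-sql \
  --master yarn \
  --num-executors 4 \
  --executor-memory 8g \
  --executor-cores 2
```

This does not remove the per-user overhead of a separate driver and executor set, which is the non-optimality the poster describes; it only caps it.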