Hi,
I'm building a Spark application in which I load some data from an
Elasticsearch cluster (using the latest elasticsearch-hadoop connector) and
then perform some calculations on the Spark cluster.
In one case, I call collect on the RDD as soon as it is created (loaded from
ES).
However, it is
Hi,
I've been researching Spark for a couple of months now, and I strongly
believe it can solve our problem.
We are developing an application that allows the user to analyze various
sources of information. We are dealing with non-technical users, so simply
giving them an interactive shell won't
the second RDD contains another subset.
Is there a map-like API that could do this trick?
BTW - I know that one can iteratively build multiple flows that would each
call map and select the proper columns. Is there any faster way?
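For reference, the multiple-flows workaround amounts to one map-style pass per output subset. Outside Spark, the per-row column selection can be sketched in plain Python (the columns and data here are hypothetical illustration values; in Spark each comprehension would be an rdd.map call):

```python
# Sketch: derive two datasets (plain lists standing in for RDDs) from
# one source, each keeping a different subset of columns, using one
# map-style pass per output. Column layout (name, age, country) is
# a hypothetical example.

rows = [
    ("alice", 30, "IL"),
    ("bob",   25, "US"),
]

# In Spark: rdd.map(lambda r: (r[0], r[1])) and rdd.map(lambda r: (r[2],));
# each map produces an independent derived dataset.
subset_a = [(name, age) for name, age, _ in rows]   # columns 0-1
subset_b = [(country,) for _, _, country in rows]   # column 2

print(subset_a)  # [('alice', 30), ('bob', 25)]
print(subset_b)  # [('IL',), ('US',)]
```

There is no single built-in call that fans one RDD out into several in one pass; each derived dataset is its own transformation, though Spark will not recompute the source if it is cached.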
Sagi
Hi spark-users,
When I use spark-sql or beeline to query a large dataset, the query result
can sometimes cause a driver OOM.
So I wonder: is there a config property in Spark SQL to limit the maximum
returned result size (without a LIMIT clause in the SQL query)?
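For what it's worth, Spark does expose a driver-side cap, spark.driver.maxResultSize (available in later releases than the one discussed here), which aborts the job cleanly when the total serialized results exceed the limit instead of letting the driver OOM. A sketch of setting it:

```
# spark-defaults.conf -- cap the total serialized result size
# fetched back to the driver; jobs exceeding it fail with an
# error rather than crashing the driver with an OOM.
spark.driver.maxResultSize  2g
```

The same property can be passed per session, e.g. spark-sql --conf spark.driver.maxResultSize=2g.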
For example, before the select query, I run the
related to serialization support in Python, but
I am just guessing.
Any help is appreciated,
Sagi
14/06/19 08:35:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
14/06/19 08:35:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
If you saw an exception message like the one mentioned in the JIRA
https://issues.apache.org/jira/browse/SPARK-1886 in the worker's log
file, you are welcome to try https://github.com/apache/spark/pull/827
On Wed, May 21, 2014 at 11:21 AM, Josh Marcus wrote:
> Aaron:
>
> I see this in the Master's l
/machines ?
Sagi
I would check the DNS settings.
Akka seems to pick up its configuration from the FQDN on my system.
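A quick way to see which name the machine would resolve to is to compare the short hostname with its fully qualified form; a plain-Python sketch (Akka itself resolves the name through Java's InetAddress, not this code, but the underlying DNS/hosts lookup is the same):

```python
import socket

# The machine's short hostname as reported by the OS.
short_name = socket.gethostname()

# The fully qualified domain name that resolving it yields.
# A mismatch between this and what /etc/hosts or DNS says is a
# common cause of Akka binding to (or advertising) the wrong address.
fqdn = socket.getfqdn()

print(short_name)
print(fqdn)
```

If the FQDN looks wrong here, fixing /etc/hosts or the DNS record usually fixes the Akka binding as well.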
Sagi
From: Hahn Jiang [mailto:hahn.jiang@gmail.com]
Sent: Friday, April 11, 2014 10:56 AM
To: user
Subject: Error when I use spark-streaming
Hi all,
When I run Spark Streaming using NetworkWordCount