Spark hangs on collect (stuck on scheduler delay)

2015-08-16 Thread Sagi r
Hi, I'm building a Spark application in which I load some data from an Elasticsearch cluster (using the latest elasticsearch-hadoop connector) and continue to perform some calculations on the spark cluster. In one case, I use collect on the RDD as soon as it is created (loaded from ES). However, it is

Spark application with a RESTful API

2015-07-06 Thread Sagi r
Hi, I've been researching spark for a couple of months now, and I strongly believe it can solve our problem. We are developing an application that allows the user to analyze various sources of information. We are dealing with non-technical users, so simply giving them an interactive shell won't

Split RDD along columns

2015-01-29 Thread Schein, Sagi
the second RDD contains another subset. Is there a map-like API that could do this trick? BTW - I know that one can iteratively build multiple flows that would call map and select the proper columns. Is there any faster way? Sagi
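The iterative approach the poster describes (one `map` per column subset) can be sketched as follows. This is a minimal illustration using plain Python lists in place of RDDs, with hypothetical column groupings; there is no single RDD operation that returns two RDDs in one pass, so the usual pattern is to `cache()` the source and derive each subset with its own `map`.

```python
# Plain-Python sketch of splitting rows into two column subsets,
# mirroring rdd.map(lambda row: ...) called once per subset.
rows = [(1, "a", 3.0), (2, "b", 6.0)]  # stand-in for an RDD of tuples

# Hypothetical column groupings for the two derived datasets.
left_cols = (0, 1)   # -> (id, name)
right_cols = (0, 2)  # -> (id, value)

def project(row, cols):
    """Select the given column indices from a row tuple."""
    return tuple(row[i] for i in cols)

left = [project(r, left_cols) for r in rows]    # first rdd.map(...)
right = [project(r, right_cols) for r in rows]  # second rdd.map(...)

print(left)   # [(1, 'a'), (2, 'b')]
print(right)  # [(1, 3.0), (2, 6.0)]
```

In Spark itself, caching the source RDD before the two `map` calls avoids recomputing the input for each projection, which is usually the main cost of the iterative approach.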

Is there a way to limit the sql query result size?

2014-11-06 Thread sagi
Hi spark-users, When I use spark-sql or beeline to query a large dataset, sometimes the query result may cause driver OOM. So I wonder whether there is a config property in Spark SQL to limit the max return result size (without a LIMIT clause in the sql query)? For example, before the select query, I run the
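Later Spark releases added a driver-side cap that addresses this directly: `spark.driver.maxResultSize` (introduced in Spark 1.2, after this thread) aborts a job whose serialized results exceed the limit instead of letting the driver OOM. A sketch for `spark-defaults.conf`, with an assumed 2g limit:

```properties
# spark-defaults.conf -- cap total serialized results collected to the driver.
# A job exceeding this limit is aborted rather than crashing the driver.
spark.driver.maxResultSize  2g
```

The same property can also be set per session (for example via `SET spark.driver.maxResultSize=2g` in spark-sql).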

python worker crash in spark 1.0

2014-06-18 Thread Schein, Sagi
related to serialization support in python but I am just guessing. Any help is appreciated, Sagi 14/06/19 08:35:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 14/06/19 08:35:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 1

Re: advice on maintaining a production spark cluster?

2014-05-21 Thread sagi
if you see an exception message like the one mentioned in the JIRA https://issues.apache.org/jira/browse/SPARK-1886 in the worker's log file, you are welcome to try https://github.com/apache/spark/pull/827 On Wed, May 21, 2014 at 11:21 AM, Josh Marcus wrote: > Aaron: > > I see this in the Master's l

moving SparkContext around

2014-04-13 Thread Schein, Sagi
/machines ? Sagi

RE: Error when I use spark-streaming

2014-04-11 Thread Schein, Sagi
I would check the DNS setting. Akka seems to pick configuration from FQDN on my system Sagi From: Hahn Jiang [mailto:hahn.jiang@gmail.com] Sent: Friday, April 11, 2014 10:56 AM To: user Subject: Error when I use spark-streaming hi all, When I run spark-streaming use NetworkWordCount
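The DNS/FQDN advice above usually comes down to pinning the address Spark (and its Akka layer, in these versions) binds to, so the hostname workers resolve matches the one the driver advertises. A hedged sketch for `conf/spark-env.sh`, with a hypothetical address:

```shell
# spark-env.sh -- pin the local bind address so Akka does not pick a
# hostname that other nodes cannot resolve (address is a placeholder).
export SPARK_LOCAL_IP=192.168.1.10
```

This is only one common fix; if the FQDN itself is wrong, correcting `/etc/hosts` or the DNS record is the cleaner solution.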