Hi
I've observed inconsistent behaviour in .saveAsTextFile.
Up until version 1.3 it was possible to save RDDs as Snappy-compressed
files with the invocation
rdd.saveAsTextFile(targetFile)
but after upgrading to 1.4 this no longer works. I now need to specify a
codec explicitly:
rdd.saveAsText
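For reference, saveAsTextFile has a two-argument overload that takes a Hadoop compression codec class. A minimal sketch, assuming the Hadoop native Snappy libraries are available on the cluster; the app name and paths are made up for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.io.compress.SnappyCodec

// Hypothetical app name and output path, for illustration only.
val sc = new SparkContext(new SparkConf().setAppName("snappy-save").setMaster("local[*]"))
val rdd = sc.parallelize(Seq("a", "b", "c"))

// Passing the codec class explicitly writes Snappy-compressed part files,
// independent of the cluster's Hadoop output-compression defaults.
rdd.saveAsTextFile("/tmp/snappy-out", classOf[SnappyCodec])
sc.stop()
```

If the 1.3 behaviour came from cluster-wide Hadoop defaults (mapred output compression settings), passing the codec explicitly like this sidesteps that dependency.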
On Thu, 12 Mar 2015 00:48:12 -0700
d34th4ck3r wrote:
> I'm trying to use Neo4j with Apache Spark Streaming, but serializability
> is proving to be an issue.
>
> Basically, I want Apache Spark to parse and bundle my data in real
> time. After the data has been bundled, it should be stored in the
>
On Wed, 11 Mar 2015 11:19:56 +0100
Marcin Cylke wrote:
> Hi
>
> I'm trying to do a join of two datasets: 800GB with ~50MB.
The job finishes if I set spark.yarn.executor.memoryOverhead to 2048MB.
If it is around 1000MB, it fails with "executor lost" errors.
My spark s
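With sides this asymmetric (800GB vs ~50MB), a common alternative to raising memoryOverhead is a map-side join: collect the small dataset on the driver and broadcast it, so the large side is never shuffled. A hedged sketch, with illustrative paths and an assumed tab-separated key format (not the poster's actual data layout):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("map-side-join"))

// Illustrative inputs: assume the first tab-separated field is the join key.
val large = sc.textFile("hdfs:///data/large").map(l => (l.split("\t")(0), l))
val small = sc.textFile("hdfs:///data/small")
  .map(l => (l.split("\t")(0), l))
  .collectAsMap()

// Ship the ~50MB side to every executor once; the join happens map-side,
// so the 800GB side is never shuffled across the network.
val smallBC = sc.broadcast(small)
val joined = large.flatMap { case (k, v) =>
  smallBC.value.get(k).map(s => (k, (v, s)))
}
```

This only works while the small side comfortably fits in driver and executor memory, which ~50MB should.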
Hi
I'm trying to do a join of two datasets: 800GB with ~50MB.
My code looks like this:
private def parseClickEventLine(line: String,
                                jsonFormatBC: Broadcast[LazyJsonFormat]): ClickEvent = {
  val json = line.parseJson.asJsObject
  val eventJson = if (json.fields.contains("recommendationId
Hi
We're using Spark in our app's unit tests. The tests start a Spark
context with "local[*]", and the test time is now 178 seconds on Spark 1.2
instead of 41 seconds on 1.0.
We are using the Spark version from Cloudera CDH (1.2.0-cdh5.3.1).
Could you give some hints as to what could cause that, and where to sea
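I can't say what changed between 1.0 and 1.2 without profiling, but two settings that commonly dominate per-suite startup cost are worth checking. A hedged sketch of a test-oriented SparkConf (the setting names exist; that they explain this particular regression is only an assumption):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setMaster("local[2]")             // cap parallelism; local[*] can oversubscribe CI machines
  .setAppName("unit-tests")
  .set("spark.ui.enabled", "false")  // skip starting the web UI for every suite
val sc = new SparkContext(conf)
// ... run test assertions against sc, then:
sc.stop()
```

Reusing one SparkContext across suites, instead of creating and stopping one per test, is often a bigger win than either setting.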
On Thu, 15 May 2014 09:44:35 -0700
Marcelo Vanzin wrote:
> These are actually not worrisome; that's just the HDFS client doing
> its own thing to support HA. It probably picked the "wrong" NN to try
> first, and got the "NN in standby" exception, which it logs. Then it
> tries the other NN and th
Hi
I'm running Spark 0.9.1 on a Hadoop cluster (CDH 4.2.1) with YARN.
I have a job that performs a few transformations on a given file and joins
that file with some other.
The job itself finishes with success, however some tasks fail and then
succeed after a rerun.
During the development
On Tue, 22 Apr 2014 12:28:15 +0200
Marcin Cylke wrote:
> Hi
>
> I have a Spark job that reads files from HDFS, does some pretty basic
> transformations, then writes the result to some other location on HDFS.
>
> I'm running this job with spark-0.9.1-rc3, on Hadoop Yarn with
>
Hi
I have a Spark job that reads files from HDFS, does some pretty basic
transformations, then writes the result to some other location on HDFS.
I'm running this job with spark-0.9.1-rc3, on Hadoop Yarn with Kerberos
security enabled.
One of my approaches to fixing this issue was changing SparkConf, s