Awesome, thanks for the PR Koert!
/Anders
On Thu, Dec 17, 2015 at 10:22 PM Prasad Ravilla wrote:
> Thanks, Koert.
>
> Regards,
> Prasad.
>
> From: Koert Kuipers
> Date: Thursday, December 17, 2015 at 1:06 PM
> To: Prasad Ravilla
> Cc: Anders Arpteg, user
>
>
iently. AvroRelation should just pass the paths (comma separated).
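For illustration, a minimal sketch of what a single read over several Avro directories could look like with the com.databricks.spark.avro reader, assuming the relation accepts a comma-separated path string as suggested; the paths are made up:

// Hypothetical example: one load call over multiple Avro directories,
// assuming comma-separated paths are split by the reader (the point of the PR above).
// sqlContext: an existing org.apache.spark.sql.SQLContext (Spark 1.5).
val df = sqlContext.read
  .format("com.databricks.spark.avro")
  .load("hdfs:///data/events/2015-12-01,hdfs:///data/events/2015-12-02")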
>>
>>
>>
>>
>> On Thu, Oct 22, 2015 at 1:37 PM, Anders Arpteg
>> wrote:
>>
>>> Yes, seems unnecessary. I actually tried patching the
>>> com.databricks.spark.avro reader
Thu, Sep 24, 2015 at 1:24 PM, Anders Arpteg wrote:
>
>> Hi,
>>
>> Running Spark 1.5.0 in yarn-client mode, and am curious why there are
>> so many broadcasts being done when loading datasets with a large number
>> of partitions/files. Have datasets with thousands
Hi,
Received the following error when reading an Avro source with Spark 1.5.0
and the com.databricks.spark.avro reader. In the data source, there is one
nested field named "UserActivity.history.activity" and another named
"UserActivity.activity". This seems to be the reason for the execption,
sinc
Hi,
Running Spark 1.5.0 in yarn-client mode, and am curious why there are so
many broadcasts being done when loading datasets with a large number of
partitions/files. Have datasets with thousands of partitions, i.e. HDFS
files in the Avro folder, and sometimes loading hundreds of these large
datasets
Ok, thanks Reynold. When I tested dynamic allocation with Spark 1.4, it
complained that it was not Tungsten compliant. Let's hope it works
with 1.5 then!
On Tue, Sep 8, 2015 at 5:49 AM Reynold Xin wrote:
>
> On Wed, Sep 2, 2015 at 12:03 AM, Anders Arpteg wrote:
>
>>
Liu wrote:
> Thanks for the confirmation. Since tungsten-sort is not the default
> ShuffleManager, this fix will not block the 1.5 release; it may be in
> 1.5.1.
>
> BTW, what is the difference between the sort and tungsten-sort
> ShuffleManagers for this large job?
>
> On Tue, Sep 1, 201
had sent out a PR [1] to fix 2), could you help to test that?
>
> [1] https://github.com/apache/spark/pull/8543
>
> On Mon, Aug 31, 2015 at 12:34 PM, Anders Arpteg
> wrote:
> > Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
> > mana
Was trying out 1.5 rc2 and noticed some issues with the Tungsten shuffle
manager. One problem occurred when using the com.databricks.spark.avro reader,
where error (1) was received; see the stack trace below. The problem does not
occur with the "sort" shuffle manager.
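For reference, a minimal sketch of how the shuffle manager is selected via configuration; spark.shuffle.manager is the relevant setting here, the rest of the conf is illustrative:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-manager-example")            // illustrative app name
  .set("spark.shuffle.manager", "sort")             // the manager that did not show the problem
  // .set("spark.shuffle.manager", "tungsten-sort") // the manager that triggered error (1)
val sc = new SparkContext(conf)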
Another problem was in a large complex
> On 6/25/15 5:52 PM, Anders Arpteg wrote:
>
> Yes, both the driver and the executors. Works a little bit better with
> more space, but still a leak that will cause failure after a number of
> reads. There are about 700 different data sources that need to be loaded,
> lots of data...
try increasing the perm gen for the driver?
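A sketch of that perm gen suggestion, assuming a pre-Java-8 JVM (where MaxPermSize applies); the sizes are guesses:

import org.apache.spark.SparkConf

// Raise the permanent generation for the driver and executors (sizes are guesses).
// Note: in yarn-client mode the driver JVM is already running when the conf is read,
// so the driver option would instead be passed on the command line, e.g.
//   spark-submit --driver-java-options "-XX:MaxPermSize=512m" ...
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=512m")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=256m")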
>
> Regards
> Sab
> On 24-Jun-2015 4:40 pm, "Anders Arpteg" wrote:
>
>> When reading large (and many) datasets with the Spark 1.4.0 DataFrames
>> parquet reader (the org.apache.spark.sql.parquet format)
When reading large (and many) datasets with the Spark 1.4.0 DataFrames
parquet reader (the org.apache.spark.sql.parquet format), the following
exceptions are thrown:
Exception in thread "task-result-getter-0"
Exception: java.lang.OutOfMemoryError thrown from the
UncaughtExceptionHandler in thread
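For context, a minimal sketch of how such a read is typically issued with the 1.4 DataFrames API; the path is hypothetical:

import org.apache.spark.sql.SQLContext

// sc: an existing SparkContext; the HDFS path below is made up.
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.parquet("hdfs:///warehouse/some_large_dataset")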
Jun 2, 2015 at 8:45 PM, Yin Huai wrote:
> Does it happen every time you read a parquet source?
>
> On Tue, Jun 2, 2015 at 3:42 AM, Anders Arpteg wrote:
>
>> The log is from the log aggregation tool (hortonworks, "yarn logs ..."),
>> so both executors and driver. I
Best Regards,
> Shixiong Zhu
>
> 2015-06-02 17:11 GMT+08:00 Anders Arpteg :
>
>> Just compiled Spark 1.4.0-rc3 for Yarn 2.2 and tried running a job that
>> worked fine for Spark 1.3. The job starts on the cluster (yarn-cluster
>> mode), initial stage starts, but the job f
Just compiled Spark 1.4.0-rc3 for Yarn 2.2 and tried running a job that
worked fine with Spark 1.3. The job starts on the cluster (yarn-cluster
mode), the initial stage starts, but the job fails before any task succeeds
with the following error. Any hints?
[ERROR] [06/02/2015 09:05:36.962] [Executor ta
or fifo scheduler without multi
> resource scheduling by any chance?
>
> On Thu, Feb 12, 2015 at 1:51 PM, Anders Arpteg wrote:
>
>> The NM logs only seem to contain something similar to the following. Nothing
>> else in the same time range. Any help?
>
storage levels - both of them are having the same
>> problems. I notice large shuffle files (30-40gb) that only seem to spill a
>> few hundred mb.
>>
>> On Mon, Feb 23, 2015 at 4:28 PM, Anders Arpteg
>> wrote:
>>
>>> Sounds very similar to what I ex
1/5th of the data just fine. The only thing that's
> pointing me towards a memory issue is that it seems to be happening in the
> same stages each time and when I lower the memory that each executor has
> allocated it happens in earlier stages but I can't seem to find anything
> t
over 1.3TB of memory
>> allocated for the application. I was thinking perhaps it was possible that
>> a single executor was getting a single or a couple large partitions but
>> shouldn't the disk persistence kick in at that point?
>>
>> On Sat, Feb 21, 2015 at 11:20
For large jobs, the following error message is shown, which seems to indicate
that shuffle files are missing for some reason. It's a rather large job
with many partitions. If the data size is reduced, the problem disappears.
I'm running a build from Spark master post 1.2 (built on 2015-01-16) and
run
re you able to find any of the container logs? Is the
> NodeManager launching containers and reporting some exit code?
>
> -Sandy
>
> On Thu, Feb 12, 2015 at 1:21 PM, Anders Arpteg wrote:
>
>> No, not submitting from Windows, from a Debian distribution. Had a quick
>> lo
y YARN to execute the container and even run
> manually to trace at what line the error has occurred.
>
> BTW are you submitting the job from windows?
>
> On Thu, Feb 12, 2015, 3:34 PM Anders Arpteg wrote:
>
>> Interesting to hear that it works for you. Are you using Yarn 2.2
>
> -Sandy
>
> On Wed, Feb 11, 2015 at 1:28 PM, Anders Arpteg wrote:
>
>> Hi,
>>
>> Compiled the latest master of Spark yesterday (2015-02-10) for Hadoop
>> 2.2 and failed executing jobs in yarn-cluster mode for that build. Works
>> successfully with spa
Hi,
Compiled the latest master of Spark yesterday (2015-02-10) for Hadoop 2.2
and failed to execute jobs in yarn-cluster mode with that build. It works
successfully with Spark 1.2 (and also with master from 2015-01-16), so something
has changed since then that prevents the job from receiving any executors
o
second time the app gets launched.
On Thu, Jan 15, 2015 at 3:01 PM, Anders Arpteg wrote:
> Found a setting that seems to fix this problem, but it does not seem to
> be available until Spark 1.3. See
> https://issues.apache.org/jira/browse/SPARK-2165
>
> However, glad to see that work is
Found a setting that seems to fix this problem, but it does not seem to be
available until Spark 1.3. See
https://issues.apache.org/jira/browse/SPARK-2165
However, glad to see that work is being done on the issue.
On Tue, Jan 13, 2015 at 8:00 PM, Anders Arpteg wrote:
> Yes Andrew, I am. Tr
3:29 AM, Sven Krasser wrote:
> Anders,
>
> This could be related to this open ticket:
> https://issues.apache.org/jira/browse/SPARK-5077. A call to coalesce()
> also fixed that for us as a stopgap.
>
> Best,
> -Sven
>
>
> On Mon, Jan 12, 2015 at 10:18 AM, Ande
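A minimal sketch of the coalesce() stopgap Sven mentions above; the input, partition count, and output path are made up:

// Reduce the number of partitions before the failing stage (stopgap per SPARK-5077).
// sc: an existing SparkContext; paths and the target count of 200 are hypothetical.
val someRdd = sc.textFile("hdfs:///input/large_dataset")
val compacted = someRdd.coalesce(200)
compacted.saveAsTextFile("hdfs:///output/compacted")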
Yes Andrew, I am. Tried setting spark.yarn.applicationMaster.waitTries to 1
(thanks Sean), but with no luck. Any ideas?
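For reference, a sketch of how that setting is typically applied; the rest of the conf is illustrative:

import org.apache.spark.SparkConf

// Equivalent to passing --conf spark.yarn.applicationMaster.waitTries=1 to spark-submit.
val conf = new SparkConf()
  .set("spark.yarn.applicationMaster.waitTries", "1")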
On Tue, Jan 13, 2015 at 7:58 PM, Andrew Or wrote:
> Hi Anders, are you using YARN by any chance?
>
> 2015-01-13 0:32 GMT-08:00 Anders Arpteg :
>
> Since start
Since I started using Spark 1.2, I've experienced an annoying issue with
failing apps that get executed twice. I'm not talking about tasks inside a
job, which should be executed multiple times before failing the whole app.
I'm talking about the whole app, which seems to close the previous Spark
context
n 12, 2015 at 6:32 AM, Sandy Ryza wrote:
> Hi Anders,
>
> Have you checked your NodeManager logs to make sure YARN isn't killing
> executors for exceeding memory limits?
>
> -Sandy
>
> On Tue, Jan 6, 2015 at 8:20 AM, Anders Arpteg wrote:
>
>> Hey,
>>
gt; collection of dates and invoking a Spark operation for each. Simply
> write "dateList.par.map(...)" to make the local map proceed in
> parallel. It should invoke the Spark jobs simultaneously.
>
> On Fri, Jan 9, 2015 at 10:46 AM, Anders Arpteg wrote:
> > Hey,
>
Hey,
Let's say we have multiple independent jobs that each transform some data
and store it in distinct HDFS locations. Is there a nice way to run them in
parallel? See the following pseudo code snippet:
dateList.map(date =>
sc.hdfsFile(date).map(transform).saveAsHadoopFile(date))
It's unfortunate i
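A minimal, self-contained sketch of the dateList.par.map suggestion from the reply above; the dates, paths, and transform function are stand-ins for the original pseudo code:

import org.apache.spark.{SparkConf, SparkContext}

object ParallelDateJobs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("parallel-date-jobs"))

    // Hypothetical dates, paths and transform; replace with the real ones.
    val dateList = Seq("2015-01-01", "2015-01-02", "2015-01-03")
    val transform = (line: String) => line.toUpperCase

    // .par makes the local collection parallel, so the independent Spark jobs
    // are submitted concurrently from separate threads (foreach here since the
    // results are not needed; .par.map works the same way).
    dateList.par.foreach { date =>
      sc.textFile(s"hdfs:///input/$date")
        .map(transform)
        .saveAsTextFile(s"hdfs:///output/$date")
    }

    sc.stop()
  }
}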
Hey,
I have a job that keeps failing if too much data is processed, and I can't
see how to get it working. I've tried repartitioning with more partitions
and increasing the amount of memory for the executors (now about 12G and 400
executors). Here is a snippet of the first part of the code, which succ
>> instructions described in the docs.
>>
>> Thanks,
>> - Tsuyoshi
>>
>> On Sat, Dec 27, 2014 at 11:06 PM, Anders Arpteg
>> wrote:
>> > Hey,
>> >
>> > Tried to get the new spark.dynamicAllocation.enabled feature working on
>> Yarn
Hey,
Tried to get the new spark.dynamicAllocation.enabled feature working on
Yarn (Hadoop 2.2), but am unsuccessful so far. I've tested with the
following settings:
conf
.set("spark.dynamicAllocation.enabled", "true")
.set("spark.shuffle.service.enabled", "true")
.se
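For reference, a sketch of a fuller dynamic allocation setup on YARN; everything beyond the two settings shown above (including all values) is an assumption, and the external shuffle service also has to be enabled in the NodeManagers:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.shuffle.service.enabled", "true")        // requires the YARN external shuffle service
  .set("spark.dynamicAllocation.minExecutors", "2")    // assumed value
  .set("spark.dynamicAllocation.maxExecutors", "100")  // assumed value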