how to investigate skew and DataFrames and RangePartitioner

2016-06-13 Thread Peter Halliday
does one achieve this now. Peter Halliday - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: Error writing parquet to S3

2016-06-10 Thread Peter Halliday
Has anyone else seen this before? When I saw it previously there was an OOM, but that doesn’t seem to be the case this time. Of course, I’m not sure how large the file that triggered this was, either. Peter > On Jun 9, 2016, at 9:00 PM, Peter Halliday <pjh...@cornell.edu> wrote: > > I’m not 100% sure

Error writing parquet to S3

2016-06-09 Thread Peter Halliday
I’m not 100% sure why I’m getting this. I don’t see any errors before this at all. I’m not sure how to diagnose this. Peter Halliday [2016-06-10 01:46:05,282] WARN org.apache.spark.scheduler.TaskSetManager [task-result-getter-2hread] - Lost task 3737.0 in stage 2.0 (TID 10585, ip-172-16

UnsupportedOperationException: converting from RDD to DataSets on 1.6.1

2016-06-08 Thread Peter Halliday
I have some RDD-based code that was producing an OOM during shuffle. So, at the direction of a member of Databricks, I started converting it to Datasets. However, once we did, we started getting an error that seems to object to something within one of our case classes. Peter Halliday [2016-06-08 19

Re: EMR Spark log4j and metrics

2016-04-15 Thread Peter Halliday
Can anyone confirm whether Spark on YARN is the problem here, or is it how AWS has put it together? I'm wondering whether Spark on YARN has problems with configuration files for the workers and driver. Peter Halliday On Thu, Apr 14, 2016 at 1:09 PM, Peter Halliday <pjh...@cornell.edu>

Re: EMR Spark log4j and metrics

2016-04-14 Thread Peter Halliday
see evidence that the configuration files are read from or used after they are pushed On Wed, Apr 13, 2016 at 11:22 AM, Peter Halliday <pjh...@cornell.edu> wrote: > I have an existing cluster that I stand up via Docker images and > CloudFormation Templates on AWS. We are moving to

EMR Spark log4j and metrics

2016-04-13 Thread Peter Halliday
to a jar that’s sent via --jars to spark-submit. Peter Halliday

FileAlreadyExistsException and Streaming context

2016-03-08 Thread Peter Halliday
the stack trace: http://pastebin.com/AqBFXkga Peter Halliday

Re: Get rid of FileAlreadyExistsError

2016-03-01 Thread Peter Halliday
> Have you tried spark.hadoop.validateOutputSpecs? > > On 01-Mar-2016 9:43 pm, "Peter Halliday" <pjh...@cornell.edu> wrote: > http://pastebin.com/vbbFzyzb > > The problem seems to be t

Re: Get rid of FileAlreadyExistsError

2016-03-01 Thread Peter Halliday
, but no plans on changing this. I’m surprised not to see this fixed yet. Peter Halliday > On Mar 1, 2016, at 10:01 AM, Ted Yu <yuzhih...@gmail.com> wrote: > > Do you mind pastebin'ning the stack trace with the error so that we know > which part of the code is under discus

Get rid of FileAlreadyExistsError

2016-03-01 Thread Peter Halliday
the 1.5.1 version of this code doesn’t allow for this to be passed in. Is that correct? Peter Halliday