You can get some more insight by using the Spark history server
(http://spark.apache.org/docs/latest/monitoring.html); it can show you
which task is failing, along with other information that might help you
debug the issue.
On 05/10/2016 19:00, Babak Alipour wrote:
> The issue seems to lie in the RangePartitioner trying to create equal ranges. [1]
The issue seems to lie in the RangePartitioner trying to create equal
ranges. [1]
[1] https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/RangePartitioner.html
The *Double* values I'm trying to sort are mostly in the range [0,1] (~70%
of the data, which is roughly 1 billion records).
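To see how lopsided the column really is, something like this quick check (a minimal sketch; the SparkSession `spark`, table, and field names are stand-ins for the real ones) would show whether the RangePartitioner has any usable split points:

~~~python
# Minimal sketch: inspect the distribution of the sort column.
# Heavy clustering in [0, 1] shows up as nearly identical quantiles.
df = spark.table("MY_TABLE").select("my_field")

# Approximate quantiles with a 0.001 relative error.
quantiles = df.approxQuantile("my_field", [0.5, 0.75, 0.9, 0.99], 0.001)
print(quantiles)
~~~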
Thanks Vadim for sharing your experience, but I have tried a multi-JVM setup
(2 workers) and various sizes for spark.executor.memory (8g, 16g, 20g, 32g,
64g) and spark.executor.cores (2-4), with the same error all along.
As for the files, these are all .snappy.parquet files, resulting from
inserting some data […]
Oh, and try to run even smaller executors, i.e. with
`spark.executor.memory` <= 16GiB. I wonder what result you're going to get.
On Sun, Oct 2, 2016 at 1:24 AM, Vadim Semenov wrote:
> > Do you mean running a multi-JVM 'cluster' on the single machine?
> Yes, that's what I suggested.
> Do you mean running a multi-JVM 'cluster' on the single machine?
Yes, that's what I suggested.
You can get some information here:
http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
To add one more note, I tried running more, smaller executors, each with
32-64g of memory and executor.cores set to 2-4 (with 2 workers as well),
and I'm still getting the same exception:
java.lang.IllegalArgumentException: Cannot allocate a page with more than
17179869176 bytes
    at org.apache.spark.mem[…]
Do you mean running a multi-JVM 'cluster' on the single machine? How would
that affect performance/memory consumption? If a multi-JVM setup can handle
such a large input, then why can't a single JVM break the job down into
smaller tasks?
I also found that SPARK-9411 mentions making the page_size configurable.
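For what it's worth, 17179869176 is exactly (2^31 - 1) * 8, which looks like the hard cap on a single memory page, so the sorter seems to be requesting more than one page can ever hold. A minimal sketch of experimenting with the knob SPARK-9411 describes (`spark.buffer.pageSize` is undocumented, and 64m is only an assumed value to test with):

~~~python
# Hedged sketch: SPARK-9411 made Tungsten page sizes configurable.
# `spark.buffer.pageSize` is undocumented; 64m is an assumption to
# experiment with, not a known-good value.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("page-size-experiment")         # hypothetical app name
         .config("spark.buffer.pageSize", "64m")  # smaller pages per allocation
         .getOrCreate())
~~~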
Run more, smaller executors: change `spark.executor.memory` to 32g and
`spark.executor.cores` to 2-4, for example.
Changing the driver's memory won't help, because the driver doesn't
participate in execution.
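For example (a minimal sketch; the app name is made up, and the values are starting points to sweep rather than known-good settings):

~~~python
from pyspark.sql import SparkSession

# More, smaller executors: less memory and fewer cores per executor means
# less data per task, so no single allocation has to be enormous.
spark = (SparkSession.builder
         .appName("sort-large-column")            # hypothetical app name
         .config("spark.executor.memory", "32g")
         .config("spark.executor.cores", "2")
         .getOrCreate())
~~~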
On Fri, Sep 30, 2016 at 2:58 PM, Babak Alipour wrote:
> Thank you for your replies.
Thank you for your replies.
@Mich, using LIMIT 100 in the query prevents the exception, but given that
there's enough memory, I don't think this should happen even without LIMIT.
@Vadim, here's the full stack trace:
Caused by: java.lang.IllegalArgumentException: Cannot allocate a page with
more than 17179869176 bytes […]
Can you post the whole exception stack trace?
What are your executor memory settings?
Right now I assume that it happens in UnsafeExternalRowSorter ->
UnsafeExternalSorter.insertRecord.
Running more executors with lower `spark.executor.memory` should help.
On Fri, Sep 30, 2016 at 12:57 PM, Babak Alipour wrote:
What will happen if you LIMIT the result set to 100 rows only -- SELECT
<field> FROM <table> ORDER BY <field> DESC LIMIT 100? Will that work?
How about running the whole query WITHOUT order by?
HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
Greetings everyone,
I'm trying to read a single field of a Hive table stored as Parquet in
Spark (the entire table is ~140GB; this single field should be just a few
GB) and look at the sorted output using the following:
sql("SELECT " + field + " FROM MY_TABLE ORDER BY " + field + " DESC")
​But
Hi,
I need to sort a dataframe and retrieve the bounds of each partition.
dataframe.sort() uses range partitioning in the physical plan; I need to
retrieve those partition bounds.
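Is something like the sketch below the right approach? (Assuming a DataFrame `df` with a sortable column named `value`; both names are made up.) Since sort() range-partitions the data, the first and last rows of each partition should be its bounds:

~~~python
# Minimal sketch: after sort(), each partition holds a contiguous range,
# so its first and last rows are the partition's lower and upper bounds.
def partition_bounds(index, rows):
    first = last = None
    for row in rows:
        if first is None:
            first = row["value"]
        last = row["value"]
    if first is not None:           # skip empty partitions
        yield (index, first, last)  # (partition id, lower, upper)

sorted_df = df.sort("value")
bounds = sorted_df.rdd.mapPartitionsWithIndex(partition_bounds).collect()
~~~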
Many thanks for your help.
Thanks Davies, after I did a coalesce(1) to save it as a single parquet
file, I was able to get head() to return the correct order.
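In case it helps anyone else, the fix amounts to something like this (a minimal sketch; the column name and output path are made up). Note that coalesce(1) funnels the write through a single task, so it only makes sense when the output fits comfortably on one node:

~~~python
# Minimal sketch: one partition -> one parquet file, so the global sort
# order survives the write.
(df.sort(df["value"].desc())
   .coalesce(1)                     # single partition, single output file
   .write.mode("overwrite")
   .parquet("/tmp/sorted_single"))  # hypothetical output path
~~~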
On Sun, May 8, 2016 at 12:29 AM, Davies Liu wrote:
> When you have multiple parquet files, the order of all the rows in
> them is not defined.
When you have multiple parquet files, the order of all the rows in
them is not defined.
On Sat, May 7, 2016 at 11:48 PM, Buntu Dev wrote:
> I'm using the pyspark dataframe api to sort by a specific column and then
> save the dataframe as a parquet file. But the resulting parquet file
> doesn't seem to be sorted.
I'm using the pyspark dataframe api to sort by a specific column and then
save the dataframe as a parquet file. But the resulting parquet file
doesn't seem to be sorted.
Applying sort and doing head() on the results shows the correct rows,
sorted by the 'value' column in descending order.