Ok. We'll try using it in a test cluster running 1.2.

On 16-Dec-2014 1:36 am, "Xiangrui Meng" <men...@gmail.com> wrote:
Unfortunately, it will depend on the Sorter API in 1.2. -Xiangrui

On Mon, Dec 15, 2014 at 11:48 AM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
> Hi Xiangrui,
>
> The block size limit was encountered even with a reduced number of item blocks,
> as you had expected. I'm wondering if I could try the new implementation as
> a standalone library against a 1.1 deployment. Does it have dependencies on
> any core APIs in the current master?
>
> Thanks,
> Bharath
>
> On Wed, Dec 3, 2014 at 10:10 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
>>
>> Thanks Xiangrui. I'll try out setting a smaller number of item blocks. And
>> yes, I've been following the JIRA for the new ALS implementation. I'll try
>> it out when it's ready for testing.
>>
>> On Wed, Dec 3, 2014 at 4:24 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>> Hi Bharath,
>>>
>>> You can try setting a smaller number of item blocks in this case. 1200 is
>>> definitely too large for ALS. Please try 30 or even smaller. I'm not
>>> sure whether this will solve the problem, because you have 100 items
>>> connected with 10^8 users. There is a JIRA for this issue:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-3735
>>>
>>> which I will try to implement in 1.3. I'll ping you when it is ready.
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Tue, Dec 2, 2014 at 10:40 AM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
>>> > Yes, the issue appears to be due to the 2GB block size limitation. I am
>>> > hence looking for (user, product) block sizing suggestions to work
>>> > around the block size limitation.
>>> >
>>> > On Sun, Nov 30, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >>
>>> >> (It won't be that, since you see that the error occurs when reading a
>>> >> block from disk. I think this is an instance of the 2GB block size
>>> >> limitation.)
>>> >>
>>> >> On Sun, Nov 30, 2014 at 4:36 AM, Ganelin, Ilya
>>> >> <ilya.gane...@capitalone.com> wrote:
>>> >> > Hi Bharath – I'm unsure if this is your problem, but the
>>> >> > MatrixFactorizationModel in MLlib, which is the underlying component
>>> >> > for ALS, expects your user/product fields to be integers. Specifically,
>>> >> > the input to ALS is an RDD[Rating], and Rating is an (Int, Int, Double).
>>> >> > I am wondering if perhaps one of your identifiers exceeds MAX_INT;
>>> >> > could you write a quick check for that?
>>> >> >
>>> >> > I have been running a very similar use case to yours (with more
>>> >> > constrained hardware resources) and I haven't seen this exact problem,
>>> >> > but I'm sure we've seen similar issues. Please let me know if you have
>>> >> > other questions.
>>> >> >
>>> >> > From: Bharath Ravi Kumar <reachb...@gmail.com>
>>> >> > Date: Thursday, November 27, 2014 at 1:30 PM
>>> >> > To: "user@spark.apache.org" <user@spark.apache.org>
>>> >> > Subject: ALS failure with size > Integer.MAX_VALUE
>>> >> >
>>> >> > We're training a recommender with ALS in mllib 1.1 against a dataset of
>>> >> > 150M users and 4.5K items, with the total number of training records
>>> >> > being 1.2 billion (~30GB of data). The input data is spread across 1200
>>> >> > partitions on HDFS. For the training, rank=10, and we've configured
>>> >> > {number of user data blocks = number of item data blocks}. The number
>>> >> > of user/item blocks was varied between 50 and 1200. Irrespective of
>>> >> > the block count (e.g. at 1200 blocks each), there are at least a couple
>>> >> > of tasks that end up shuffle reading > 9.7G each in the aggregate stage
>>> >> > (ALS.scala:337) and failing with the following exception:
>>> >> >
>>> >> > java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>>> >> >     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
>>> >> >     at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:108)
>>> >> >     at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:124)
>>> >> >     at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:332)
>>> >> >     at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:204)
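---

For reference, a minimal sketch of the block-count suggestion made earlier in the thread ("try 30 or even smaller"), assuming the mllib 1.1 recommendation API, in which a single setBlocks value controls both user and item block counts. The input path, parsing, iteration count, and lambda below are placeholders, not values taken from this thread.

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsWithFewerBlocks {
  // Train ALS with far fewer user/item blocks than 1200 so that each
  // shuffled block stays well under the 2GB limit.
  def train(sc: SparkContext) = {
    val ratings = sc.textFile("hdfs:///path/to/ratings")   // placeholder path
      .map(_.split(','))
      .map { case Array(user, item, rating) =>
        Rating(user.toInt, item.toInt, rating.toDouble)
      }

    new ALS()
      .setRank(10)        // rank used in the thread
      .setIterations(10)  // placeholder
      .setLambda(0.01)    // placeholder
      .setBlocks(30)      // far fewer blocks than 1200, per the suggestion above
      .run(ratings)
  }
}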
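And a sketch of the "quick check" for overflowing identifiers suggested above: since mllib's Rating is (Int, Int, Double), it simply counts raw records whose user or item ID does not fit in an Int. The comma-separated "user,item,rating" layout and the HDFS path are assumptions.

import org.apache.spark.SparkContext

object IdRangeCheck {
  // Counts input records whose user or item identifier exceeds Int.MaxValue.
  def overflowCount(sc: SparkContext): Long = {
    sc.textFile("hdfs:///path/to/ratings")   // placeholder path
      .map(_.split(','))
      .filter { fields =>
        val user = fields(0).trim.toLong
        val item = fields(1).trim.toLong
        user > Int.MaxValue || item > Int.MaxValue
      }
      .count()
  }
}

If this returns anything other than zero, the data cannot be fed to ALS as-is, independent of the 2GB block size limitation.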