Ok. We'll try using it in a test cluster running 1.2.

On 16-Dec-2014 1:36 am, "Xiangrui Meng" <men...@gmail.com> wrote:
Unfortunately, it will depend on the Sorter API in 1.2. -Xiangrui

On Mon, Dec 15, 2014 at 11:48 AM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
> Hi Xiangrui,
>
> The block size limit was encountered even with a reduced number of item blocks,
> as you had expected. I'm wondering if I could try the new implementation as
> a standalone library against a 1.1 deployment. Does it have dependencies on
> any core APIs in the current master?
>
> Thanks,
> Bharath
>
> On Wed, Dec 3, 2014 at 10:10 PM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
>>
>> Thanks Xiangrui. I'll try out setting a smaller number of item blocks. And
>> yes, I've been following the JIRA for the new ALS implementation. I'll try
>> it out when it's ready for testing.
>>
>> On Wed, Dec 3, 2014 at 4:24 AM, Xiangrui Meng <men...@gmail.com> wrote:
>>>
>>> Hi Bharath,
>>>
>>> You can try setting a smaller number of item blocks in this case. 1200 is
>>> definitely too large for ALS. Please try 30 or even smaller. I'm not
>>> sure whether this will solve the problem, because you have 100 items
>>> connected with 10^8 users. There is a JIRA for this issue:
>>>
>>> https://issues.apache.org/jira/browse/SPARK-3735
>>>
>>> which I will try to implement in 1.3. I'll ping you when it is ready.
>>>
>>> Best,
>>> Xiangrui
>>>
>>> On Tue, Dec 2, 2014 at 10:40 AM, Bharath Ravi Kumar <reachb...@gmail.com> wrote:
>>> > Yes, the issue appears to be due to the 2GB block size limitation. I am
>>> > hence looking for (user, product) block sizing suggestions to work
>>> > around the block size limitation.
>>> >
>>> > On Sun, Nov 30, 2014 at 3:01 PM, Sean Owen <so...@cloudera.com> wrote:
>>> >>
>>> >> (It won't be that, since you see that the error occurs when reading a
>>> >> block from disk. I think this is an instance of the 2GB block size
>>> >> limitation.)
>>> >>
>>> >> On Sun, Nov 30, 2014 at 4:36 AM, Ganelin, Ilya
>>> >> <ilya.gane...@capitalone.com> wrote:
>>> >> > Hi Bharath – I'm unsure if this is your problem, but the
>>> >> > MatrixFactorizationModel in MLlib, which is the underlying component
>>> >> > for ALS, expects your user/product fields to be integers. Specifically,
>>> >> > the input to ALS is an RDD[Rating], and Rating is an (Int, Int, Double).
>>> >> > I am wondering if perhaps one of your identifiers exceeds MAX_INT;
>>> >> > could you write a quick check for that?
>>> >> >
>>> >> > I have been running a very similar use case to yours (with more
>>> >> > constrained hardware resources) and I haven't seen this exact problem,
>>> >> > but I'm sure we've seen similar issues. Please let me know if you have
>>> >> > other questions.
>>> >> >
>>> >> > From: Bharath Ravi Kumar <reachb...@gmail.com>
>>> >> > Date: Thursday, November 27, 2014 at 1:30 PM
>>> >> > To: "user@spark.apache.org" <user@spark.apache.org>
>>> >> > Subject: ALS failure with size > Integer.MAX_VALUE
>>> >> >
>>> >> > We're training a recommender with ALS in mllib 1.1 against a dataset of
>>> >> > 150M users and 4.5K items, with the total number of training records
>>> >> > being 1.2 billion (~30GB of data). The input data is spread across 1200
>>> >> > partitions on HDFS. For the training, rank=10, and we've configured
>>> >> > {number of user data blocks = number of item data blocks}. The number
>>> >> > of user/item blocks was varied between 50 and 1200. Irrespective of
>>> >> > the block count (e.g. at 1200 blocks each), there are at least a couple
>>> >> > of tasks that end up shuffle reading > 9.7G each in the aggregate stage
>>> >> > (ALS.scala:337) and failing with the following exception:
>>> >> >
>>> >> > java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
>>> >> >     at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:745)
>>> >> >     at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:108)
>>> >> >     at org.apache.spark.storage.DiskStore.getValues(DiskStore.scala:124)
>>> >> >     at org.apache.spark.storage.BlockManager.getLocalFromDisk(BlockManager.scala:332)
>>> >> >     at org.apache.spark.storage.BlockFetcherIterator$BasicBlockFetcherIterator$$anonfun$getLocalBlocks$1.apply(BlockFetcherIterator.scala:204)
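---

For reference, a minimal sketch of the block-count suggestion made earlier in the thread ("try 30 or even smaller"), assuming the mllib 1.1 recommendation API, in which a single setBlocks value controls both user and item block counts. The input path, parsing, iteration count, and lambda below are placeholders, not values taken from this thread.

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsWithFewerBlocks {
  // Train ALS with far fewer user/item blocks than 1200 so that each
  // shuffled block stays well under the 2GB limit.
  def train(sc: SparkContext) = {
    val ratings = sc.textFile("hdfs:///path/to/ratings")   // placeholder path
      .map(_.split(','))
      .map { case Array(user, item, rating) =>
        Rating(user.toInt, item.toInt, rating.toDouble)
      }

    new ALS()
      .setRank(10)        // rank used in the thread
      .setIterations(10)  // placeholder
      .setLambda(0.01)    // placeholder
      .setBlocks(30)      // far fewer blocks than 1200, per the suggestion above
      .run(ratings)
  }
}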
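And a sketch of the "quick check" for overflowing identifiers suggested above: since mllib's Rating is (Int, Int, Double), it simply counts raw records whose user or item ID does not fit in an Int. The comma-separated "user,item,rating" layout and the HDFS path are assumptions.

import org.apache.spark.SparkContext

object IdRangeCheck {
  // Counts input records whose user or item identifier exceeds Int.MaxValue.
  def overflowCount(sc: SparkContext): Long = {
    sc.textFile("hdfs:///path/to/ratings")   // placeholder path
      .map(_.split(','))
      .filter { fields =>
        val user = fields(0).trim.toLong
        val item = fields(1).trim.toLong
        user > Int.MaxValue || item > Int.MaxValue
      }
      .count()
  }
}

If this returns anything other than zero, the data cannot be fed to ALS as-is, independent of the 2GB block size limitation.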