[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

JoshRosen Tue, 28 Apr 2015 15:38:13 -0700

Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5725#discussion_r29295786
  
    --- Diff: unsafe/src/main/java/org/apache/spark/unsafe/memory/TaskMemoryManager.java ---
    @@ -0,0 +1,237 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.unsafe.memory;
    +
    +import java.util.*;
    +
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +/**
    + * Manages the memory allocated by an individual task.
    + * <p>
    + * Most of the complexity in this class deals with encoding of off-heap addresses into 64-bit longs.
    + * In off-heap mode, memory can be directly addressed with 64-bit longs. In on-heap mode, memory is
    + * addressed by the combination of a base Object reference and a 64-bit offset within that object.
    + * This is a problem when we want to store pointers to data structures inside of other structures,
    + * such as record pointers inside hashmaps or sorting buffers. Even if we decided to use 128 bits
    + * to address memory, we can't just store the address of the base object since it's not guaranteed
    + * to remain stable as the heap gets reorganized due to GC.
    + * <p>
    + * Instead, we use the following approach to encode record pointers in 64-bit longs: for off-heap
    + * mode, just store the raw address, and for on-heap mode use the upper 13 bits of the address to
    + * store a "page number" and the lower 51 bits to store an offset within this page. These page
    + * numbers are used to index into a "page table" array inside of the MemoryManager in order to
    + * retrieve the base object.
    + * <p>
    + * This allows us to address 8192 pages. In on-heap mode, the maximum page size is limited by the
    + * maximum size of a long[] array, allowing us to address 8192 * (2^31 - 1) * 8 bytes, which is
    + * approximately 140 terabytes of memory.
    + */
    +public final class TaskMemoryManager {
    +
    +  private final Logger logger = LoggerFactory.getLogger(TaskMemoryManager.class);
    +
    +  /**
    +   * The number of entries in the page table.
    +   */
    +  private static final int PAGE_TABLE_SIZE = 1 << 13;
    +
    +  /** Bit mask for the lower 51 bits of a long. */
    +  private static final long MASK_LONG_LOWER_51_BITS = 0x7FFFFFFFFFFFFL;
    +
    +  /** Bit mask for the upper 13 bits of a long. */
    +  private static final long MASK_LONG_UPPER_13_BITS = ~MASK_LONG_LOWER_51_BITS;
    +
    +  /**
    +   * Similar to an operating system's page table, this array maps page numbers into base object
    +   * pointers, allowing us to translate between the hashtable's internal 64-bit address
    +   * representation and the baseObject+offset representation which we use to support both on- and
    +   * off-heap addresses. When using an off-heap allocator, every entry in this map will be `null`.
    +   * When using an on-heap allocator, the entries in this map will point to pages' base objects.
    +   * Entries are added to this map as new data pages are allocated.
    +   */
    +  private final MemoryBlock[] pageTable = new MemoryBlock[PAGE_TABLE_SIZE];
    +
    +  /**
    +   * Bitmap for tracking allocated pages; a clear bit marks a free page table slot.
    +   */
    +  private final BitSet allocatedPages = new BitSet(PAGE_TABLE_SIZE);
    +
    +  /**
    +   * Tracks memory allocated with {@link TaskMemoryManager#allocate(long)}, used to detect / clean
    +   * up leaked memory.
    +   */
    +  private final HashSet<MemoryBlock> allocatedNonPageMemory = new HashSet<MemoryBlock>();
    --- End diff --
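
For readers following the Javadoc in the diff above, here is a minimal sketch of the on-heap pointer encoding it describes: a 13-bit page number in the upper bits and a 51-bit offset in the lower bits. The class and method names (`PageAddressSketch`, `encode`, `decodePageNumber`, `decodeOffset`) are hypothetical and are not taken from the pull request; only the bit widths and the mask value come from the diff.

```java
// Hypothetical illustration; not the code from the pull request.
final class PageAddressSketch {
  // Bit widths taken from the diff: 13 bits of page number, 51 bits of offset.
  static final int PAGE_NUMBER_BITS = 13;
  static final int OFFSET_BITS = 64 - PAGE_NUMBER_BITS;
  static final long MASK_LONG_LOWER_51_BITS = 0x7FFFFFFFFFFFFL;

  /** Packs a page number into the upper 13 bits and an offset into the lower 51 bits. */
  static long encode(int pageNumber, long offsetInPage) {
    if (pageNumber < 0 || pageNumber >= (1 << PAGE_NUMBER_BITS)) {
      throw new IllegalArgumentException("page number out of range: " + pageNumber);
    }
    if ((offsetInPage & ~MASK_LONG_LOWER_51_BITS) != 0) {
      throw new IllegalArgumentException("offset does not fit in 51 bits: " + offsetInPage);
    }
    return (((long) pageNumber) << OFFSET_BITS) | offsetInPage;
  }

  /** Recovers the page number; the unsigned shift avoids sign extension. */
  static int decodePageNumber(long address) {
    return (int) (address >>> OFFSET_BITS);
  }

  /** Recovers the offset within the page. */
  static long decodeOffset(long address) {
    return address & MASK_LONG_LOWER_51_BITS;
  }
}
```

In this sketch the decoded page number would then index into something like the `pageTable` array to retrieve the base object, while the decoded offset locates the record within that object.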
    
    Here, I've used a HashSet (`allocatedNonPageMemory`) to track non-page
    allocations. I don't think this should be _too_ expensive, because we only
    make these allocations in a couple of places: long arrays and bitsets, which
    we might want to always keep on-heap anyway (in which case we wouldn't need
    to involve the allocator). I didn't implement equals() or hashCode() for
    MemoryBlock, but I think that should be okay; object identity is fine for
    our purposes here.
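
To illustrate the identity point, here is a minimal sketch of leak tracking with a `HashSet` whose elements do not override `equals()` or `hashCode()`. `Block` is a hypothetical stand-in for `MemoryBlock`, and `IdentityTrackingSketch` with its methods is invented for this example; none of it is the code from the pull request.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; not the code from the pull request.
final class IdentityTrackingSketch {
  /** Stand-in for MemoryBlock: inherits Object's identity-based equals()/hashCode(). */
  static final class Block {
    final long size;
    Block(long size) { this.size = size; }
  }

  private final Set<Block> allocated = new HashSet<Block>();

  /** Allocates a block and records it so that leaks can be detected later. */
  Block allocate(long size) {
    Block block = new Block(size);
    allocated.add(block);
    return block;
  }

  /** Returns true only if this exact object was handed out by allocate(). */
  boolean free(Block block) {
    return allocated.remove(block);
  }

  /** Number of blocks not yet freed; non-zero at task end would indicate a leak. */
  int outstanding() {
    return allocated.size();
  }
}
```

Because the default `equals()` compares references, two blocks of the same size remain distinct entries, and freeing a block that was never handed out by this tracker is rejected rather than matching a look-alike.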



