[GitHub] spark pull request: [SPARK-6190][core] create LargeByteBuffer for ...

squito Tue, 02 Jun 2015 13:06:01 -0700

Github user squito commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5400#discussion_r31564799
  
    --- Diff: network/common/src/main/java/org/apache/spark/network/buffer/WrappedLargeByteBuffer.java ---
    @@ -0,0 +1,280 @@
    +/*
    +* Licensed to the Apache Software Foundation (ASF) under one or more
    +* contributor license agreements.  See the NOTICE file distributed with
    +* this work for additional information regarding copyright ownership.
    +* The ASF licenses this file to You under the Apache License, Version 2.0
    +* (the "License"); you may not use this file except in compliance with
    +* the License.  You may obtain a copy of the License at
    +*
    +*    http://www.apache.org/licenses/LICENSE-2.0
    +*
    +* Unless required by applicable law or agreed to in writing, software
    +* distributed under the License is distributed on an "AS IS" BASIS,
    +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    +* See the License for the specific language governing permissions and
    +* limitations under the License.
    +*/
    +package org.apache.spark.network.buffer;
    +
    +import java.io.IOException;
    +import java.nio.BufferUnderflowException;
    +import java.nio.ByteBuffer;
    +import java.nio.channels.WritableByteChannel;
    +import java.util.Arrays;
    +import java.util.List;
    +
    +import com.google.common.annotations.VisibleForTesting;
    +import sun.nio.ch.DirectBuffer;
    +
    +/**
    + * A {@link org.apache.spark.network.buffer.LargeByteBuffer} which may contain multiple
    + * {@link java.nio.ByteBuffer}s.  In order to support <code>asByteBuffer</code>, all
    + * of the underlying ByteBuffers must have size equal to
    + * {@link org.apache.spark.network.buffer.LargeByteBufferHelper#MAX_CHUNK_SIZE} (except that last
    + * one).  The underlying ByteBuffers may be on-heap, direct, or memory-mapped.
    + */
    +public class WrappedLargeByteBuffer implements LargeByteBuffer {
    +
    +  @VisibleForTesting
    +  final ByteBuffer[] underlying;
    +
    +  private final long size;
    +  /**
    +   * each sub-ByteBuffer (except for the last one) must be exactly this size.  Note that this
    +   * class *really* expects this to be LargeByteBufferHelper.MAX_CHUNK_SIZE.  The only reason it isn't
    +   * is so that we can do tests without creating ginormous buffers.  Public methods force it to
    +   * be LargeByteBufferHelper.MAX_CHUNK_SIZE
    +   */
    +  private final int subBufferSize;
    +  private long _pos;
    +  @VisibleForTesting
    +  int currentBufferIdx;
    +  @VisibleForTesting
    +  ByteBuffer currentBuffer;
    +
    +  /**
    +   * Construct a WrappedLargeByteBuffer from the given ByteBuffers.  Each of the ByteBuffers must
    +   * have size equal to {@link org.apache.spark.network.buffer.LargeByteBufferHelper#MAX_CHUNK_SIZE}
    +   * except for the final one.  The buffers are <code>duplicate</code>d, so the position of the
    +   * given buffers and the returned buffer will be independent, though the underlying data will be
    +   * shared.  The constructed buffer will always have position == 0.
    +   */
    +  public WrappedLargeByteBuffer(ByteBuffer[] underlying) {
    +    this(underlying, LargeByteBufferHelper.MAX_CHUNK_SIZE);
    +  }
    +
    +  /**
    +   * you do **not** want to call this version.  It leads to a buffer which doesn't properly
    +   * support {@link #asByteBuffer}.  The only reason it exists is to we can have tests which
    +   * don't require 2GB of memory
    +   *
    +   * @param underlying
    +   * @param subBufferSize
    +   */
    +  @VisibleForTesting
    +  WrappedLargeByteBuffer(ByteBuffer[] underlying, int subBufferSize) {
    --- End diff --
    
    The issue is that `asByteBuffer` will not be implemented correctly if the first sub buffer is smaller than `LargeByteBufferHelper.MAX_CHUNK_SIZE`.  It would actually be fine if the rest of the sub buffers had other sizes, but that struck me as just making it more confusing without any real benefit (after you force the first sub buffer to be 2GB, I don't see much gain in letting the other buffers be smaller).
    
    This is part of what I was getting at with an earlier comment, about how `asByteBuffer` really makes things a little ugly.  Say replication doesn't support blocks that are over 2GB, so it needs to use `asByteBuffer`.  If you let the sub buffers be smaller, then say you try to replicate a block that is 1GB -- that replication should be allowed, so you'll need to do a full copy of all the data to get it into one `ByteBuffer`.
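
To make the trade-off concrete, here is a rough sketch (not the code in this PR) of what an `asByteBuffer` would have to do if sub buffers were allowed to be smaller than the chunk size.  `AsByteBufferSketch` and `CHUNK` are hypothetical names; `CHUNK` is a tiny stand-in for both `LargeByteBufferHelper.MAX_CHUNK_SIZE` and the largest size a single `ByteBuffer` can hold, so that it runs without 2GB of memory.

```java
import java.nio.ByteBuffer;

final class AsByteBufferSketch {
  // Hypothetical stand-in for LargeByteBufferHelper.MAX_CHUNK_SIZE (assumption:
  // it is also the largest size we are willing to put in one ByteBuffer).
  static final int CHUNK = 8;

  /** Returns a single-buffer view of the chunks, copying only when it has to. */
  static ByteBuffer asByteBuffer(ByteBuffer[] chunks) {
    long total = 0;
    for (ByteBuffer b : chunks) {
      total += b.remaining();
    }
    if (total > CHUNK) {
      throw new IllegalStateException("too large for one ByteBuffer: " + total);
    }
    if (chunks.length == 1) {
      // Zero-copy: share the data, but keep position/limit independent.
      return chunks[0].duplicate();
    }
    // Several chunks that still fit in one buffer: this only happens when the
    // earlier chunks are smaller than CHUNK, and it costs a full copy.
    ByteBuffer out = ByteBuffer.allocate((int) total);
    for (ByteBuffer b : chunks) {
      out.put(b.duplicate());
    }
    out.flip();
    return out;
  }
}
```

If every sub buffer except the last is exactly `CHUNK` bytes, any data that fits in one `ByteBuffer` arrives as a single chunk, so the copying branch should never be needed.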
    
    OTOH, this forces `LargeByteBufferOutputStream.largeBuffer` to do a full copy -- but at least that is no worse than what we are doing already to get a `ByteBuffer` from a `ByteArrayOutputStream`.
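
For reference, the existing cost being compared against looks roughly like this.  It is a sketch that assumes the current path goes through `toByteArray()`; `StreamToBufferSketch` is a hypothetical name, not a class in Spark.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

final class StreamToBufferSketch {
  /** Turns the bytes written so far into a ByteBuffer, at the cost of one full copy. */
  static ByteBuffer toByteBuffer(ByteArrayOutputStream out) {
    // toByteArray() allocates a new array and copies the stream's contents into it.
    byte[] copy = out.toByteArray();
    // wrap() does not copy again; the buffer is backed by the array above.
    return ByteBuffer.wrap(copy);
  }
}
```

The copy happens in `toByteArray()`, so the resulting buffer is unaffected by later writes to the stream.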


