[GitHub] spark issue #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Blockmana...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/20667 @cloud-fan I just found the commit log below: `Modified StorageLevel and BlockManagerId to cache common objects and use cached object while deserializing.` I can't figure out why we need the cache, since I think the cache miss rate may be high? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
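The cache that commit log refers to is an interning map: deserialization returns one canonical instance per distinct id, so many equal `BlockManagerId`s arriving in block status messages share a single object. A minimal sketch of the pattern (the `Id` record and method names here are illustrative stand-ins, not Spark's API):

```java
import java.util.concurrent.ConcurrentHashMap;

public class InternDemo {
    // Hypothetical stand-in for BlockManagerId: a small, frequently
    // deserialized value object with value-based equality.
    record Id(String host, int port) {}

    static final ConcurrentHashMap<Id, Id> CACHE = new ConcurrentHashMap<>();

    // Called at deserialization sites: returns the cached canonical
    // instance if one exists, otherwise registers this one.
    static Id intern(Id id) {
        Id prev = CACHE.putIfAbsent(id, id);
        return prev == null ? id : prev;
    }

    public static void main(String[] args) {
        Id a = intern(new Id("host1", 7077));
        Id b = intern(new Id("host1", 7077)); // equal, but a fresh object
        System.out.println(a == b); // true: both refer to one canonical instance
    }
}
```

On the miss-rate question: only the first sighting of each distinct id misses; every later deserialization of the same id is a hit, which is the common case when many blocks live on a small number of executors.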
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20675 Merged build finished. Test FAILed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87664/testReport)** for PR 20382 at commit [`fd890ad`](https://github.com/apache/spark/commit/fd890ad837bb7068c70a27921d67af1c3fe65350). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20675 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87665/ Test FAILed.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20675 **[Test build #87665 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87665/testReport)** for PR 20675 at commit [`21f574e`](https://github.com/apache/spark/commit/21f574e2a3ad3c8e68b92776d2a141d7fcb90502). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20666 **[Test build #87663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87663/testReport)** for PR 20666 at commit [`1d03d3b`](https://github.com/apache/spark/commit/1d03d3b248821a05dfd2751eeb0c8b657ebc9073). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20666 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87663/ Test FAILed.
[GitHub] spark issue #20666: [SPARK-23448][SQL] Clarify JSON and CSV parser behavior ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20666 Merged build finished. Test FAILed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test FAILed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87664/ Test FAILed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/20382 Jenkins, retest this please.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/20675 retest this please
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1055/ Test PASSed.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20675 Merged build finished. Test PASSed.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1054/ Test PASSed.
[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...
Github user caneGuy commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170516775 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true) - def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = { + def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean = false): BlockManagerId = + { blockManagerIdCache.putIfAbsent(id, id) -blockManagerIdCache.get(id) +val blockManagerId = blockManagerIdCache.get(id) +if (clearOldValues) { + blockManagerIdCache.clearOldValues(System.currentTimeMillis - Utils.timeStringAsMs("10d")) --- End diff -- @Ngone51 Thanks. I also thought about removing entries when we delete a block. In this case, it is history replaying that triggers the problem, and we do not actually delete any block. Maybe using a `weakreference` would be better, as @jiangxb1987 mentioned? WDYT? Thanks again!
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87667 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87667/testReport)** for PR 20382 at commit [`fd890ad`](https://github.com/apache/spark/commit/fd890ad837bb7068c70a27921d67af1c3fe65350).
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20675 **[Test build #87666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87666/testReport)** for PR 20675 at commit [`21f574e`](https://github.com/apache/spark/commit/21f574e2a3ad3c8e68b92776d2a141d7fcb90502).
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r170519540 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java --- @@ -19,15 +19,24 @@ import org.apache.spark.unsafe.Platform; +import java.lang.reflect.InvocationTargetException; +import java.lang.reflect.Method; +import java.nio.ByteBuffer; + +import sun.nio.ch.DirectBuffer; + /** * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate off-heap memory. */ public class UnsafeMemoryAllocator implements MemoryAllocator { @Override - public MemoryBlock allocate(long size) throws OutOfMemoryError { + public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError { +// No usage of DirectByteBuffer.allocateDirect is current design --- End diff -- Since the previous implementation used `DirectByteBuffer.allocateDirect`, I left a reference to record my decision. I will remove this.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r170519752 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -45,38 +45,135 @@ */ public static final int FREED_IN_ALLOCATOR_PAGE_NUMBER = -3; - private final long length; + @Nullable + protected Object obj; + + protected long offset; + + protected long length; /** * Optional page number; used when this MemoryBlock represents a page allocated by a - * TaskMemoryManager. This field is public so that it can be modified by the TaskMemoryManager, - * which lives in a different package. + * TaskMemoryManager. This field can be updated using setPageNumber method so that + * this can be modified by the TaskMemoryManager, which lives in a different package. */ - public int pageNumber = NO_PAGE_NUMBER; + private int pageNumber = NO_PAGE_NUMBER; public MemoryBlock(@Nullable Object obj, long offset, long length) { -super(obj, offset); +this.obj = obj; +this.offset = offset; this.length = length; } + public MemoryBlock() { +this(null, 0, 0); + } + + public final Object getBaseObject() { +return obj; + } + + public final long getBaseOffset() { +return offset; + } + + public void resetObjAndOffset() { +this.obj = null; +this.offset = 0; + } + /** * Returns the size of the memory block. */ - public long size() { + public final long size() { return length; } - /** - * Creates a memory block pointing to the memory used by the long array. - */ - public static MemoryBlock fromLongArray(final long[] array) { -return new MemoryBlock(array, Platform.LONG_ARRAY_OFFSET, array.length * 8L); + public final void setPageNumber(int pageNum) { +pageNumber = pageNum; + } + + public final int getPageNumber() { +return pageNumber; } /** * Fills the memory block with the specified byte value. */ - public void fill(byte value) { Platform.setMemory(obj, offset, length, value); } + + /** + * Instantiate MemoryBlock for given object type with new offset + */ + public final static MemoryBlock allocateFromObject(Object obj, long offset, long length) { +MemoryBlock mb = null; +if (obj instanceof byte[]) { + byte[] array = (byte[])obj; + mb = new ByteArrayMemoryBlock(array, offset, length); +} else if (obj instanceof long[]) { + long[] array = (long[])obj; + mb = new OnHeapMemoryBlock(array, offset, length); +} else if (obj == null) { + // we assume that to pass null pointer means off-heap + mb = new OffHeapMemoryBlock(offset, length); +} else { + throw new UnsupportedOperationException(obj.getClass() + " is not supported now"); +} +return mb; + } + + /** + * Instantiate the same type of MemoryBlock with new offset and size + */ + public abstract MemoryBlock allocate(long offset, long size); + + + public abstract int getInt(long offset); + + public abstract void putInt(long offset, int value); + + public abstract boolean getBoolean(long offset); + + public abstract void putBoolean(long offset, boolean value); + + public abstract byte getByte(long offset); + + public abstract void putByte(long offset, byte value); + + public abstract short getShort(long offset); + + public abstract void putShort(long offset, short value); + + public abstract long getLong(long offset); + + public abstract void putLong(long offset, long value); + + public abstract float getFloat(long offset); + + public abstract void putFloat(long offset, float value); + + public abstract double getDouble(long offset); + + public abstract void putDouble(long offset, double value); + + public static final void copyMemory( + MemoryBlock src, long srcOffset, MemoryBlock dst, long dstOffset, long length) { +Platform.copyMemory(src.getBaseObject(), src.getBaseOffset() + srcOffset, +dst.getBaseObject(), dst.getBaseOffset() + dstOffset, length); + } + + public static final void copyMemory(MemoryBlock src, MemoryBlock dst, long length) { +Platform.copyMemory(src.getBaseObject(), src.getBaseOffset(), + dst.getBaseObject(), dst.getBaseOffset(), length); + } + + public final void copyFrom(Object src, long srcOffset, long dstOffset, long length)
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r170520696 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java --- @@ -50,12 +52,11 @@ // These are only updated by readExternal() or read() @Nonnull - private Object base; - private long offset; + private MemoryBlock base; private int numBytes; --- End diff -- Yeah, since I am adopting `MemoryBlock` incrementally, some fields still remain duplicated. I think that `UTF8String.length` can be projected onto `MemoryBlock.length`, since `UTF8String` seems immutable.
[GitHub] spark pull request #20667: [SPARK-23508][CORE] Use timeStampedHashMap for Bl...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20667#discussion_r170520957 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerId.scala --- @@ -132,10 +133,15 @@ private[spark] object BlockManagerId { getCachedBlockManagerId(obj) } - val blockManagerIdCache = new ConcurrentHashMap[BlockManagerId, BlockManagerId]() + val blockManagerIdCache = new TimeStampedHashMap[BlockManagerId, BlockManagerId](true) - def getCachedBlockManagerId(id: BlockManagerId): BlockManagerId = { + def getCachedBlockManagerId(id: BlockManagerId, clearOldValues: Boolean = false): BlockManagerId = + { blockManagerIdCache.putIfAbsent(id, id) -blockManagerIdCache.get(id) +val blockManagerId = blockManagerIdCache.get(id) +if (clearOldValues) { + blockManagerIdCache.clearOldValues(System.currentTimeMillis - Utils.timeStringAsMs("10d")) --- End diff -- Can we simply use `com.google.common.cache.Cache`, which has a size limit, so we don't need to worry about OOM?
[GitHub] spark issue #20667: [SPARK-23508][CORE] Use softreference for BlockmanagerId...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20667 Had an offline chat with @cloud-fan and we feel `com.google.common.cache.Cache` should be used here. You can find an example at `CodeGenerator.cache`.
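The `CodeGenerator.cache` example uses guava's `CacheBuilder.newBuilder().maximumSize(...)`, which bounds the map by evicting old entries. A stdlib-only sketch of that bounded-cache idea (not guava's implementation, and without guava's concurrency features), using an access-order `LinkedHashMap` as an LRU:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A tiny LRU cache capped at maxSize entries: once the cap is exceeded,
// the least-recently-used entry is evicted, so the cache can never grow
// without bound the way an uncapped ConcurrentHashMap can.
public class BoundedCache<K, V> {
    private final int maxSize;
    private final Map<K, V> map;

    public BoundedCache(int maxSize) {
        this.maxSize = maxSize;
        this.map = new LinkedHashMap<>(16, 0.75f, /* accessOrder = */ true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > BoundedCache.this.maxSize; // evict LRU entry
            }
        };
    }

    public synchronized V getOrPut(K key, V value) {
        V existing = map.get(key);
        if (existing != null) return existing; // cache hit: reuse instance
        map.put(key, value);
        return value;
    }

    public synchronized int size() { return map.size(); }

    public static void main(String[] args) {
        BoundedCache<Integer, String> c = new BoundedCache<>(2);
        c.getOrPut(1, "a"); c.getOrPut(2, "b"); c.getOrPut(3, "c");
        System.out.println(c.size()); // prints 2: oldest entry was evicted
    }
}
```

Guava's `Cache` additionally offers time-based expiry (`expireAfterAccess`), which covers the history-replay case without a manual `clearOldValues` call.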
[GitHub] spark issue #20667: [SPARK-23508][CORE] Use softreference for BlockmanagerId...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/20667 Nice @jiangxb1987 @cloud-fan I will modify it later. Thanks!
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611 **[Test build #87668 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87668/testReport)** for PR 20611 at commit [`5e9b6a5`](https://github.com/apache/spark/commit/5e9b6a5cb9568b49cef9508e0b2dc216e0d6c1ef).
[GitHub] spark pull request #20670: [SPARK-23405] Add constranits
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r170529019 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan => */ lazy val constraints: ExpressionSet = { if (conf.constraintPropagationEnabled) { + var relevantOutPutSet: AttributeSet = outputSet + constraints.foreach { +case eq @ EqualTo(l: Attribute, r: Attribute) => + if (l.references.subsetOf(relevantOutPutSet) --- End diff -- You can avoid computing each `subsetOf` twice here.
[GitHub] spark pull request #20670: [SPARK-23405] Add constranits
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r170528989 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan => */ lazy val constraints: ExpressionSet = { if (conf.constraintPropagationEnabled) { + var relevantOutPutSet: AttributeSet = outputSet + constraints.foreach { +case eq @ EqualTo(l: Attribute, r: Attribute) => --- End diff -- `eq` isn't used
[GitHub] spark pull request #20662: [SPARK-23475][UI][BACKPORT-2.3] Show also skipped...
Github user mgaido91 closed the pull request at: https://github.com/apache/spark/pull/20662
[GitHub] spark pull request #20670: [SPARK-23405] Add constranits
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/20670#discussion_r170529062 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan => */ lazy val constraints: ExpressionSet = { if (conf.constraintPropagationEnabled) { + var relevantOutPutSet: AttributeSet = outputSet + constraints.foreach { +case eq @ EqualTo(l: Attribute, r: Attribute) => + if (l.references.subsetOf(relevantOutPutSet) +&& !r.references.subsetOf(relevantOutPutSet)) { +relevantOutPutSet = relevantOutPutSet.++(r.references) --- End diff -- Use ` ++ ` syntax, rather than writing it as a method invocation.
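Putting the review comments together, a sketch of the reworked loop, with Spark's `Attribute`/`AttributeSet` simplified to strings and string sets (in the actual Scala code the last fix would be spelled as infix `relevantOutPutSet ++ r.references`, and the unused `eq @` binding dropped):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ConstraintDemo {
    // Stand-in for catalyst's EqualTo(l: Attribute, r: Attribute).
    record EqualTo(String left, String right) {}

    // Grows the relevant attribute set from equality constraints,
    // testing membership once per side instead of twice.
    static Set<String> expand(Set<String> outputSet, List<EqualTo> constraints) {
        Set<String> relevant = new HashSet<>(outputSet);
        for (EqualTo c : constraints) {
            boolean leftIn = relevant.contains(c.left());   // computed once
            boolean rightIn = relevant.contains(c.right()); // computed once
            if (leftIn && !rightIn) relevant.add(c.right());
            else if (rightIn && !leftIn) relevant.add(c.left());
        }
        return relevant;
    }

    public static void main(String[] args) {
        Set<String> out = expand(Set.of("a"), List.of(new EqualTo("a", "b")));
        System.out.println(out.contains("b")); // true: "b" pulled in via a = b
    }
}
```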
[GitHub] spark issue #18555: [SPARK-21353][CORE]add checkValue in spark.internal.conf...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18555 Can one of the admins verify this patch?
[GitHub] spark pull request #20663: [SPARK-23501][UI] Refactor AllStagesPage in order...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20663#discussion_r170532005 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala --- @@ -143,76 +72,105 @@ private[ui] class AllStagesPage(parent: StagesTab) extends WebUIPage("") { Seq.empty[Node] } } -if (shouldShowActiveStages) { - content ++= - - - -Active Stages ({activeStages.size}) - - ++ - - {activeStagesTable.toNodeSeq} - -} -if (shouldShowPendingStages) { - content ++= - - - -Pending Stages ({pendingStages.size}) - - ++ - - {pendingStagesTable.toNodeSeq} - + +tables.flatten.foreach(content ++= _) + +UIUtils.headerSparkPage("Stages for All Jobs", content, parent) + } + + def summaryAndTableForStatus( + status: StageStatus, + request: HttpServletRequest): (Option[Elem], Option[NodeSeq]) = { +val stages = if (status == StageStatus.FAILED) { + allStages.filter(_.status == status).reverse +} else { + allStages.filter(_.status == status) } -if (shouldShowCompletedStages) { - content ++= - - - -Completed Stages ({completedStageNumStr}) - - ++ - - {completedStagesTable.toNodeSeq} - + +if (stages.isEmpty) { + (None, None) +} else { + val killEnabled = status == StageStatus.ACTIVE && parent.killEnabled + val isFailedStage = status == StageStatus.FAILED + + val stagesTable = +new StageTableBase(parent.store, request, stages, tableHeaderID(status), stageTag(status), + parent.basePath, subPath, parent.isFairScheduler, killEnabled, isFailedStage) + val stagesSize = stages.size + (Some(summary(status, stagesSize)), Some(table(status, stagesTable, stagesSize))) } -if (shouldShowSkippedStages) { - content ++= - - - -Skipped Stages ({skippedStages.size}) - - ++ - - {skippedStagesTable.toNodeSeq} - + } + + private def tableHeaderID(status: StageStatus): String = status match { +case StageStatus.ACTIVE => "active" +case StageStatus.COMPLETE => "completed" +case StageStatus.FAILED => "failed" +case StageStatus.PENDING => "pending" +case StageStatus.SKIPPED => "skipped" + } + + private def stageTag(status: StageStatus): String = status match { +case StageStatus.ACTIVE => "activeStage" +case StageStatus.COMPLETE => "completedStage" +case StageStatus.FAILED => "failedStage" +case StageStatus.PENDING => "pendingStage" +case StageStatus.SKIPPED => "skippedStage" + } + + private def headerDescription(status: StageStatus): String = status match { +case StageStatus.ACTIVE => "Active" +case StageStatus.COMPLETE => "Completed" +case StageStatus.FAILED => "Failed" +case StageStatus.PENDING => "Pending" +case StageStatus.SKIPPED => "Skipped" + } + + private def classSuffix(status: StageStatus): String = status match { +case StageStatus.ACTIVE => "ActiveStages" +case StageStatus.COMPLETE => "CompletedStages" +case StageStatus.FAILED => "FailedStages" +case StageStatus.PENDING => "PendingStages" +case StageStatus.SKIPPED => "SkippedStages" + } + + private def summaryContent(status: StageStatus, size: Int): String = { +if (status == StageStatus.COMPLETE +&& appSummary.numCompletedStages != size) { + s"${appSummary.numCompletedStages}, only showing $size" +} else { + s"$size" } -if (shouldShowFailedStages) { - content ++= - - - -Failed Stages ({numFailedStages}) - - ++ - - {failedStagesTable.toNodeSeq} - + } + + private def summary(status: StageStatus, size: Int): Elem = { +val summary = + + + {headerDescription(status)} Stages: + +{summaryContent(status, size)} + + +if (status == StageStatus.COMPLETE) { + summary % Attribute(None, "id", Text("completed-summary"), Null) --- End diff -- yes, I realized it while doing the refactor. It was a copy-and-paste mistake.
[GitHub] spark pull request #20663: [SPARK-23501][UI] Refactor AllStagesPage in order...
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/20663#discussion_r170532241 --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllStagesPage.scala --- @@ -143,76 +72,105 @@ private[ui] class AllStagesPage(parent: StagesTab) extends WebUIPage("") { Seq.empty[Node] } } -if (shouldShowActiveStages) { - content ++= - - - -Active Stages ({activeStages.size}) - - ++ - - {activeStagesTable.toNodeSeq} - -} -if (shouldShowPendingStages) { - content ++= - - - -Pending Stages ({pendingStages.size}) - - ++ - - {pendingStagesTable.toNodeSeq} - + +tables.flatten.foreach(content ++= _) --- End diff -- that won't really work, since `tables.flatten` is a `Seq[NodeSeq]`. But I'll try and do something similar.
[GitHub] spark issue #20667: [SPARK-23508][CORE] Fix BlockmanagerId in case blockMana...
Github user caneGuy commented on the issue: https://github.com/apache/spark/pull/20667 Update @jiangxb1987
[GitHub] spark pull request #20673: [SPARK-23515] Use input/output streams for large ...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20673#discussion_r170537137 --- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala --- @@ -100,7 +102,18 @@ private[spark] object JsonProtocol { executorMetricsUpdateToJson(metricsUpdate) case blockUpdate: SparkListenerBlockUpdated => blockUpdateToJson(blockUpdate) - case _ => parse(mapper.writeValueAsString(event)) + case _ => +// Use piped streams to avoid extra memory consumption +val outputStream = new PipedOutputStream() +val inputStream = new PipedInputStream(outputStream) +try { + mapper.writeValue(outputStream, event) --- End diff -- Wait wait .. does this lazily work for sure? Can we add a test (or a manual test in the PR description) that reads some more data (maybe more than the buffer size in that pipe)?
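The buffer-size concern can be probed directly: a pipe has a small fixed internal buffer (1024 bytes below), so a `write` larger than that blocks until a reader drains the pipe. That is why the writer must run on its own thread; a single thread that writes everything before reading, as in the quoted diff, deadlocks once the buffer fills. A self-contained sketch of that behavior, not the PR's code:

```java
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeDemo {
    // Pushes 10 KiB through a 1 KiB pipe: the writer thread blocks each
    // time the buffer is full and resumes as the reader drains it.
    static int pump() throws Exception {
        PipedOutputStream out = new PipedOutputStream();
        PipedInputStream in = new PipedInputStream(out, 1024); // pipe buffer size

        byte[] payload = new byte[10 * 1024]; // well beyond the pipe buffer

        Thread writer = new Thread(() -> {
            try (out) {
                out.write(payload); // blocks whenever the buffer is full
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        writer.start();

        int total = 0;
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) { // EOF once the writer closes
            total += n;
        }
        writer.join();
        return total;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(pump()); // prints 10240: every byte arrived
    }
}
```

Moving the `writer.start()` call before the reads (rather than writing and reading on one thread) is the essential design choice here.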
[GitHub] spark pull request #20676: [SPARK-23516][CORE] It is unnecessary to transfer...
GitHub user 10110346 opened a pull request: https://github.com/apache/spark/pull/20676 [SPARK-23516][CORE] It is unnecessary to transfer unroll memory to storage memory ## What changes were proposed in this pull request? In fact, unroll memory is also storage memory, so I think it is unnecessary to actually release unroll memory and then acquire storage memory again. ## How was this patch tested? Existing unit tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/10110346/spark notreleasemem Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20676.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20676 commit 0574be52a46ff29253c432f1c71ded041d991351 Author: liuxian Date: 2018-02-26T09:38:30Z fix
[GitHub] spark pull request #19033: [SPARK-21811][SQL]Inconsistency when finding the ...
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19033#discussion_r170547020 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -150,9 +150,27 @@ object TypeCoercion { } private def findWiderCommonType(types: Seq[DataType]): Option[DataType] = { +var awaitingString = false --- End diff -- Maybe we should make a pass over all dataTypes here, and see whether there exists a `StringType` and all other dataTypes can be promoted to `StringType`; in that case we can return `StringType` by default instead of `None`, WDYT?
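A toy model of the proposed single pass, with a stand-in enum instead of catalyst's `DataType` (which types promote to `StringType` is an assumption made for the sketch, not the real coercion rules):

```java
import java.util.List;
import java.util.Optional;

public class WidenDemo {
    enum DataType { STRING, INT, BINARY }

    // Assumption for this sketch: INT promotes to STRING, BINARY does not.
    static boolean promotableToString(DataType t) {
        return t == DataType.STRING || t == DataType.INT;
    }

    // One pass over the input types: if StringType occurs and every other
    // type can be promoted to it, default to StringType instead of giving up.
    static Optional<DataType> findWiderCommonType(List<DataType> types) {
        if (types.contains(DataType.STRING)
                && types.stream().allMatch(WidenDemo::promotableToString)) {
            return Optional.of(DataType.STRING);
        }
        if (types.stream().distinct().count() == 1) {
            return Optional.of(types.get(0));
        }
        return Optional.empty(); // the real rule tries other promotions first
    }

    public static void main(String[] args) {
        // STRING wins when everything promotes to it...
        System.out.println(findWiderCommonType(List.of(DataType.INT, DataType.STRING)));
        // ...but not when some type (BINARY here) cannot be promoted.
        System.out.println(findWiderCommonType(List.of(DataType.BINARY, DataType.STRING)));
    }
}
```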
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87667/ Test PASSed.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20382 **[Test build #87667 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87667/testReport)** for PR 20382 at commit [`fd890ad`](https://github.com/apache/spark/commit/fd890ad837bb7068c70a27921d67af1c3fe65350). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20382: [SPARK-23097][SQL][SS] Migrate text socket source to V2
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20382 Merged build finished. Test PASSed.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20675 **[Test build #87666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87666/testReport)** for PR 20675 at commit [`21f574e`](https://github.com/apache/spark/commit/21f574e2a3ad3c8e68b92776d2a141d7fcb90502). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20675 Merged build finished. Test PASSed.
[GitHub] spark issue #20675: [SPARK-23033][SS][Follow Up] Task level retry for contin...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20675 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87666/ Test PASSed.
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20670

Good catch! This is a real problem, but the fix looks hacky. By definition, I think `plan.constraints` should only include constraints that refer to `plan.output`, as that's the promise a plan can make to its parent. However, join is special: `Join.condition` can refer to both join sides, and we add the constraints to `Join.condition`, which means we are in a sense making a promise to Join's children, not its parent. My proposal:
```
lazy val constraints: ExpressionSet = {
  if (conf.constraintPropagationEnabled) {
    allConstraints.filter { c =>
      c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
    }
  } else {
    ExpressionSet(Set.empty)
  }
}

lazy val allConstraints = ExpressionSet(validConstraints
  .union(inferAdditionalConstraints(validConstraints))
  .union(constructIsNotNullConstraints(validConstraints)))
```
Then we can call `plan.allConstraints` when inferring constraints for join.
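The split proposed above can be sketched with a simplified, Spark-free model. The `Plan` class below and the sets-of-attributes standing in for constraints are illustrative, not the actual Catalyst types:

```python
# Hypothetical, simplified model of the proposal: all_constraints keeps every
# inferred predicate, while constraints exposes only those whose references
# fall within the plan's own output (the promise made to the parent).
class Plan:
    def __init__(self, output_set, valid_constraints):
        self.output_set = set(output_set)
        # Each constraint is modeled as a frozenset of referenced attributes.
        self.all_constraints = set(valid_constraints)

    @property
    def constraints(self):
        # Only non-empty constraints fully covered by this plan's output.
        return {c for c in self.all_constraints
                if c and c <= self.output_set}

# A join-like node outputs only 'a' but still knows a predicate over the
# child attribute 'b'; that extra fact stays visible via all_constraints.
plan = Plan({"a"}, [frozenset({"a"}), frozenset({"b"})])
print(len(plan.constraints))       # 1: only the constraint over 'a'
print(len(plan.all_constraints))   # 2: includes the child-side constraint
```

The point of the split is that `all_constraints` remains available for join inference while `constraints` stays a safe promise to the parent.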
[GitHub] spark pull request #20624: [SPARK-23445] ColumnStat refactoring
Github user juliuszsompolski commented on a diff in the pull request: https://github.com/apache/spark/pull/20624#discussion_r170567874

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/StarSchemaDetection.scala ---
@@ -187,11 +187,11 @@ object StarSchemaDetection extends PredicateHelper {
       stats.rowCount match {
         case Some(rowCount) if rowCount >= 0 =>
           if (stats.attributeStats.nonEmpty && stats.attributeStats.contains(col)) {
-            val colStats = stats.attributeStats.get(col)
-            if (colStats.get.nullCount > 0) {
+            val colStats = stats.attributeStats.get(col).get
+            if (!colStats.hasCountStats || colStats.nullCount.get > 0) {
--- End diff --

`hasCountStats == distinctCount.isDefined && nullCount.isDefined`, so if evaluation reaches the second operand of the `||`, then `hasCountStats == true`, which implies `nullCount.isDefined`.
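The short-circuit argument above can be demonstrated with a small stand-alone model (a hypothetical class, not the actual `ColumnStat`):

```python
# Simplified model of the check discussed above: has_count_stats mirrors
# `hasCountStats == distinctCount.isDefined && nullCount.isDefined`, so the
# `or` only evaluates null_count when both stats are set.
class ColumnStat:
    def __init__(self, distinct_count=None, null_count=None):
        self.distinct_count = distinct_count
        self.null_count = null_count

    @property
    def has_count_stats(self):
        return self.distinct_count is not None and self.null_count is not None

def may_have_nulls(col_stats):
    # Short-circuit: the right operand runs only when has_count_stats is
    # True, which guarantees null_count is not None (no unsafe access).
    return not col_stats.has_count_stats or col_stats.null_count > 0

print(may_have_nulls(ColumnStat()))       # True: no stats, assume nulls
print(may_have_nulls(ColumnStat(10, 0)))  # False: stats say zero nulls
print(may_have_nulls(ColumnStat(10, 3)))  # True: stats report nulls
```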
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20611

**[Test build #87668 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87668/testReport)** for PR 20611 at commit [`5e9b6a5`](https://github.com/apache/spark/commit/5e9b6a5cb9568b49cef9508e0b2dc216e0d6c1ef).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Merged build finished. Test PASSed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20611 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87668/ Test PASSed.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20624 **[Test build #87671 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87671/testReport)** for PR 20624 at commit [`a006bab`](https://github.com/apache/spark/commit/a006bab64420b24d9aff242902d0145892879691).
[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20676 Merged build finished. Test PASSed.
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20663 **[Test build #87670 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87670/testReport)** for PR 20663 at commit [`9a8e032`](https://github.com/apache/spark/commit/9a8e032f6d46cb181e8b775812acdda8805888a8).
[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20676 **[Test build #87669 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87669/testReport)** for PR 20676 at commit [`0574be5`](https://github.com/apache/spark/commit/0574be52a46ff29253c432f1c71ded041d991351).
[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20676 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1056/ Test PASSed.
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20663 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1057/ Test PASSed.
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20663 Merged build finished. Test PASSed.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20624 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1058/ Test PASSed.
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20624 Merged build finished. Test PASSed.
[GitHub] spark issue #20449: [SPARK-23040][CORE]: Returns interruptible iterator for ...
Github user advancedxy commented on the issue: https://github.com/apache/spark/pull/20449 @cloud-fan I have updated the comments and fixed the style issues (they were previously auto-formatted by IntelliJ).
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @gatorsmile The build passed. Please review and let me know if you have any suggestions. Thanks
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20670 I agree with what @cloud-fan proposed: have constraints for both a plan and its children. However, that requires a relatively wider change as well as a fine set of test cases, so please don't hesitate to ask for help if you run into any issues while working on this.
[GitHub] spark issue #20670: [SPARK-23405] Add constranits
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20670

Also, a better title for this PR would be:
```
Generate additional constraints for Join's children
```
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20611 QQ: how does Hive behave on the same or a similar SQL command?
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20663 @gengliangwang Mind taking a look?
[GitHub] spark pull request #20677: Event time can't be greater then processing time....
GitHub user deil87 opened a pull request: https://github.com/apache/spark/pull/20677

Event time can't be greater than processing time. 12:21, owl. Mistake on image.

## What changes were proposed in this pull request?

There is an error in the image: the 12:21 point for the owl can't have a 12:19 processing time. Consequently, the new watermark should be computed from other data (12:15, cat). The donkey will still be rejected, I suppose, but I'm not sure whether those watermarks are inclusive or exclusive. If the watermark is set to 12:05, then I guess the intermediate state for the window 11:55 (inclusive) to 12:05 (exclusive) could be cleared, and there is no place for the donkey anymore. You need to change the image, and depending on how you change it you may need to adjust the text.

## How was this patch tested?

No tests for docs.

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/deil87/spark master

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20677.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20677

commit 1bf1017db1e484d0ad85c14b166961ba35b08cc8
Author: Spiridonov Andrey
Date: 2018-02-26T14:30:45Z

    Event time can't be greater than processing time. 12:21, owl. Mistake on image.
[GitHub] spark pull request #20664: [SPARK-23496][CORE] Locality of coalesced partiti...
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/20664#discussion_r170611946

--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext {
     }.collect()
   }
 
+  test("SPARK-23496: order of input partitions can result in severe skew in coalesce") {
+    val numInputPartitions = 100
+    val numCoalescedPartitions = 50
+    val locations = Array("locA", "locB")
+
+    val inputRDD = sc.makeRDD(Range(0, numInputPartitions).toArray[Int], numInputPartitions)
+    assert(inputRDD.getNumPartitions == numInputPartitions)
+
+    val locationPrefRDD = new LocationPrefRDD(inputRDD, { (p: Partition) =>
+      if (p.index < numCoalescedPartitions) {
+        Seq(locations(0))
+      } else {
+        Seq(locations(1))
+      }
+    })
+    val coalescedRDD = new CoalescedRDD(locationPrefRDD, numCoalescedPartitions)
+
+    val numPartsPerLocation = coalescedRDD
+      .getPartitions
+      .map(coalescedRDD.getPreferredLocations(_).head)
+      .groupBy(identity)
+      .mapValues(_.size)
+
+    // Without the fix these would be:
+    // numPartsPerLocation(locations(0)) == numCoalescedPartitions - 1
+    // numPartsPerLocation(locations(1)) == 1
+    assert(numPartsPerLocation(locations(0)) > 0.4 * numCoalescedPartitions)
--- End diff --

Added a comment about flakiness & fixed the seed.
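The balance check the test above performs can be modeled without Spark. The location names below are illustrative, and the 0.4 threshold mirrors the assertion in the diff:

```python
from collections import Counter

# Spark-free sketch of what the test measures: given the preferred location
# of each coalesced partition, count partitions per location and require
# that no location holds less than 40% of the partitions.
def parts_per_location(preferred_locations):
    return Counter(preferred_locations)

NUM_COALESCED = 50

# A balanced assignment across two hypothetical hosts passes the check.
balanced = ["locA", "locB"] * (NUM_COALESCED // 2)
counts = parts_per_location(balanced)
assert counts["locA"] > 0.4 * NUM_COALESCED
assert counts["locB"] > 0.4 * NUM_COALESCED

# The skewed outcome the fix prevents (49 vs. 1) fails the same check.
skewed = ["locA"] * (NUM_COALESCED - 1) + ["locB"]
counts = parts_per_location(skewed)
assert not (counts["locB"] > 0.4 * NUM_COALESCED)
```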
[GitHub] spark issue #20677: Event time can't be greater then processing time. 12:21,...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20677 Can one of the admins verify this patch?
[GitHub] spark pull request #20664: [SPARK-23496][CORE] Locality of coalesced partiti...
Github user ala commented on a diff in the pull request: https://github.com/apache/spark/pull/20664#discussion_r170612006

--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -1129,6 +1129,36 @@ class RDDSuite extends SparkFunSuite with SharedSparkContext {
     }.collect()
   }
 
+  test("SPARK-23496: order of input partitions can result in severe skew in coalesce") {
+    val numInputPartitions = 100
+    val numCoalescedPartitions = 50
+    val locations = Array("locA", "locB")
+
+    val inputRDD = sc.makeRDD(Range(0, numInputPartitions).toArray[Int], numInputPartitions)
+    assert(inputRDD.getNumPartitions == numInputPartitions)
+
+    val locationPrefRDD = new LocationPrefRDD(inputRDD, { (p: Partition) =>
+      if (p.index < numCoalescedPartitions) {
+        Seq(locations(0))
+      } else {
+        Seq(locations(1))
+      }
+    })
+    val coalescedRDD = new CoalescedRDD(locationPrefRDD, numCoalescedPartitions)
+
+    val numPartsPerLocation = coalescedRDD
+      .getPartitions
+      .map(coalescedRDD.getPreferredLocations(_).head)
+      .groupBy(identity)
+      .mapValues(_.size)
+
+    // Without the fix these would be:
--- End diff --

Done.
[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20664 **[Test build #87672 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87672/testReport)** for PR 20664 at commit [`0512736`](https://github.com/apache/spark/commit/051273651cd65b9eca568b37c79b50342a7f69c2).
[GitHub] spark issue #20677: Event time can't be greater then processing time. 12:21,...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20677 Can one of the admins verify this patch?
[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20676

**[Test build #87669 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87669/testReport)** for PR 20676 at commit [`0574be5`](https://github.com/apache/spark/commit/0574be52a46ff29253c432f1c71ded041d991351).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20664 Merged build finished. Test PASSed.
[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20676 Merged build finished. Test FAILed.
[GitHub] spark issue #20676: [SPARK-23516][CORE] It is unnecessary to transfer unroll...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20676 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87669/ Test FAILed.
[GitHub] spark issue #20664: [SPARK-23496][CORE] Locality of coalesced partitions can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20664 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1059/ Test PASSed.
[GitHub] spark issue #20611: [SPARK-23425][SQL]When wild card is been used in load co...
Github user sujith71955 commented on the issue: https://github.com/apache/spark/pull/20611 @jiangxb1987 Yes, Hive supports this way of loading data. Suppose we have many files that start with the same naming convention; a user will prefer to use such queries, where they can take advantage of wild cards, e.g. `load data inpath 'hdfs://hacluster/user/ext*' into table t1`
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/20678

[SPARK-23380][PYTHON] Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame

## What changes were proposed in this pull request?

This PR adds a configuration to control the fallback of Arrow optimization for `toPandas` and `createDataFrame` with Pandas DataFrame.

## How was this patch tested?

Manually tested and unit tests added. You can test this by:

**`createDataFrame`**

```python
spark.conf.set("spark.sql.execution.arrow.enabled", False)
pdf = spark.createDataFrame([[{'a': 1}]]).toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", True)
spark.createDataFrame(pdf, "a: map")
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", False)
pdf = spark.createDataFrame([[{'a': 1}]]).toPandas()
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", False)
spark.createDataFrame(pdf, "a: map")
```

**`toPandas`**

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", True)
spark.createDataFrame([[{'a': 1}]]).toPandas()
```

```python
spark.conf.set("spark.sql.execution.arrow.enabled", True)
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", False)
spark.createDataFrame([[{'a': 1}]]).toPandas()
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/HyukjinKwon/spark SPARK-23380-conf

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20678.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20678

commit ff9d38b691bd54c073db3b55983564cfbb0d903e
Author: hyukjinkwon
Date: 2018-02-26T15:02:55Z

    Adds a conf for Arrow fallback in toPandas/createDataFrame with Pandas DataFrame
[GitHub] spark pull request #20567: [SPARK-23380][PYTHON] Make toPandas fallback to n...
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/20567
[GitHub] spark issue #20567: [SPARK-23380][PYTHON] Make toPandas fallback to non-Arro...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20567 I just opened another PR for adding a configuration - https://github.com/apache/spark/pull/20678. Let me close this one.
[GitHub] spark pull request #19222: [SPARK-10399][CORE][SQL] Introduce multiple Memor...
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19222#discussion_r170622554

--- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/UnsafeMemoryAllocator.java ---
@@ -19,15 +19,24 @@
 
 import org.apache.spark.unsafe.Platform;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.nio.ByteBuffer;
+
+import sun.nio.ch.DirectBuffer;
+
 /**
  * A simple {@link MemoryAllocator} that uses {@code Unsafe} to allocate off-heap memory.
  */
 public class UnsafeMemoryAllocator implements MemoryAllocator {
 
   @Override
-  public MemoryBlock allocate(long size) throws OutOfMemoryError {
+  public OffHeapMemoryBlock allocate(long size) throws OutOfMemoryError {
+    // No usage of DirectByteBuffer.allocateDirect is current design
--- End diff --

Let me put this comment here:
```
No usage of DirectByteBuffer.allocateDirect is current design
Platform.allocateMemory is used here.
http://downloads.typesafe.com/website/presentations/ScalaDaysSF2015/T4_Xin_Performance_Optimization.pdf#page=26
```
[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20678 Merged build finished. Test PASSed.
[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1060/ Test PASSed.
[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20678 **[Test build #87673 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87673/testReport)** for PR 20678 at commit [`ff9d38b`](https://github.com/apache/spark/commit/ff9d38b691bd54c073db3b55983564cfbb0d903e).
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20663

**[Test build #87670 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87670/testReport)** for PR 20663 at commit [`9a8e032`](https://github.com/apache/spark/commit/9a8e032f6d46cb181e8b775812acdda8805888a8).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallb...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20678#discussion_r170623237

--- Diff: python/pyspark/sql/dataframe.py ---
@@ -1986,55 +1986,89 @@ def toPandas(self):
                 timezone = None
 
         if self.sql_ctx.getConf("spark.sql.execution.arrow.enabled", "false").lower() == "true":
+            _should_fallback = False
             try:
-                from pyspark.sql.types import _check_dataframe_convert_date, \
-                    _check_dataframe_localize_timestamps, to_arrow_schema
+                from pyspark.sql.types import to_arrow_schema
                 from pyspark.sql.utils import require_minimum_pyarrow_version
+
                 require_minimum_pyarrow_version()
-                import pyarrow
                 to_arrow_schema(self.schema)
-                tables = self._collectAsArrow()
-                if tables:
-                    table = pyarrow.concat_tables(tables)
-                    pdf = table.to_pandas()
-                    pdf = _check_dataframe_convert_date(pdf, self.schema)
-                    return _check_dataframe_localize_timestamps(pdf, timezone)
-                else:
-                    return pd.DataFrame.from_records([], columns=self.columns)
             except Exception as e:
-                msg = (
-                    "Note: toPandas attempted Arrow optimization because "
-                    "'spark.sql.execution.arrow.enabled' is set to true. Please set it to false "
-                    "to disable this.")
-                raise RuntimeError("%s\n%s" % (_exception_message(e), msg))
-        else:
-            pdf = pd.DataFrame.from_records(self.collect(), columns=self.columns)
-            dtype = {}
+                if self.sql_ctx.getConf("spark.sql.execution.arrow.fallback.enabled", "false") \
+                        .lower() == "true":
+                    msg = (
+                        "toPandas attempted Arrow optimization because "
+                        "'spark.sql.execution.arrow.enabled' is set to true; however, "
+                        "failed by the reason below:\n"
+                        "  %s\n"
+                        "Attempts non-optimization as "
+                        "'spark.sql.execution.arrow.fallback.enabled' is set to "
+                        "true." % _exception_message(e))
+                    warnings.warn(msg)
+                    _should_fallback = True
+                else:
+                    msg = (
+                        "toPandas attempted Arrow optimization because "
+                        "'spark.sql.execution.arrow.enabled' is set to true; however, "
+                        "failed by the reason below:\n"
+                        "  %s\n"
+                        "For fallback to non-optimization automatically, please set true to "
+                        "'spark.sql.execution.arrow.fallback.enabled'." % _exception_message(e))
+                    raise RuntimeError(msg)
+
+            if not _should_fallback:
+                try:
+                    from pyspark.sql.types import _check_dataframe_convert_date, \
+                        _check_dataframe_localize_timestamps
+                    import pyarrow
+
+                    tables = self._collectAsArrow()
+                    if tables:
+                        table = pyarrow.concat_tables(tables)
+                        pdf = table.to_pandas()
+                        pdf = _check_dataframe_convert_date(pdf, self.schema)
+                        return _check_dataframe_localize_timestamps(pdf, timezone)
+                    else:
+                        return pd.DataFrame.from_records([], columns=self.columns)
+                except Exception as e:
+                    # We might have to allow fallback here as well but multiple Spark jobs can
+                    # be executed. So, simply fail in this case for now.
+                    msg = (
+                        "toPandas attempted Arrow optimization because "
+                        "'spark.sql.execution.arrow.enabled' is set to true; however, "
+                        "failed unexpectedly:\n"
+                        "  %s" % _exception_message(e))
+                    raise RuntimeError(msg)
+
+        # Below is toPandas without Arrow optimization.
--- End diff --

Likewise, the change from here on is due to the removed `else:` block.
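The control flow in the diff above amounts to a try/except with a config-controlled fallback. A minimal Spark-free sketch, where `arrow_path` and `plain_path` are illustrative stand-ins for pyspark's internal Arrow and row-based collection paths, not real APIs:

```python
import warnings

def to_pandas(df, arrow_enabled, fallback_enabled, arrow_path, plain_path):
    # Try the optimized path first; on failure either warn and fall back
    # or raise, depending on the fallback config flag.
    if arrow_enabled:
        try:
            return arrow_path(df)
        except Exception as e:
            if not fallback_enabled:
                raise RuntimeError(
                    "Arrow optimization failed: %s. Enable the fallback "
                    "config to fall back automatically." % e)
            warnings.warn("Arrow failed (%s); falling back." % e)
    # Non-optimized path: reached when Arrow is off or fallback triggered.
    return plain_path(df)

def broken(df):
    raise ValueError("unsupported type")

print(to_pandas("df", True, True, broken, lambda d: "plain"))    # plain
print(to_pandas("df", True, True, lambda d: "arrow", None))      # arrow
```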
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20663 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87670/ Test FAILed.
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20663 Merged build finished. Test FAILed.
[GitHub] spark issue #20640: [SPARK-19755][Mesos] Blacklist is always active for Meso...
Github user IgorBerman commented on the issue: https://github.com/apache/spark/pull/20640

@skonto, @susanxhuynh, @squito So let's agree that:
1. I'll revert the log when there is some failure, and reword it to something without "blacklisting".
2. The blacklisting itself will be moved to BlacklistTracker (as it is now).

Bottom line, the only thing missing is adding a log in case of failure (but without counting the number of failures, etc.). WDYT?
[GitHub] spark issue #20677: Event time can't be greater then processing time. 12:21,...
Github user deil87 commented on the issue: https://github.com/apache/spark/pull/20677 This patch needs to be changed before it can be merged. The mistake is in an image, and I don't have the sources for the image to change it nicely. Admins should verify and suggest next actions.
[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20678 Merged build finished. Test PASSed.
[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20678 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1061/ Test PASSed.
[GitHub] spark issue #20678: [SPARK-23380][PYTHON] Adds a conf for Arrow fallback in ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20678 **[Test build #87674 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87674/testReport)** for PR 20678 at commit [`7f87d25`](https://github.com/apache/spark/commit/7f87d2537488ca03c926f4d9c6318451c688ebe5).
[GitHub] spark pull request #20679: [SPARK-23514] Use SessionState.newHadoopConf() to...
GitHub user juliuszsompolski opened a pull request: https://github.com/apache/spark/pull/20679 [SPARK-23514] Use SessionState.newHadoopConf() to propagate hadoop configs set in SQLConf. ## What changes were proposed in this pull request? A few places in `spark-sql` were using `sc.hadoopConfiguration` directly. They should be using `sessionState.newHadoopConf()` to blend in configs that were set through `SQLConf`. Also, for better UX, for these configs blended in from `SQLConf`, we should consider removing the `spark.hadoop` prefix, so that the settings are recognized whether or not they were specified with that prefix by the user. ## How was this patch tested? Tested that AlterTableRecoverPartitions now correctly recognizes settings that are passed in to the FileSystem through SQLConf. You can merge this pull request into a Git repository by running: $ git pull https://github.com/juliuszsompolski/apache-spark SPARK-23514 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/20679.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #20679 commit 2c070fcd053acb47d8a8c3214d67e106b5683376 Author: Juliusz Sompolski Date: 2018-02-26T15:13:23Z spark-23514
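The pattern the PR description proposes can be sketched roughly as follows. This is an illustrative sketch, not the actual Spark source; in real Spark `sessionState` is `private[sql]`, so user code cannot call it directly.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession

// Hypothetical helper: a SQL command that needs a Hadoop Configuration should
// derive it from the session state rather than from the shared SparkContext,
// so that per-session SQLConf overrides are blended in.
def hadoopConfFor(spark: SparkSession): Configuration = {
  // Before: spark.sparkContext.hadoopConfiguration -- ignores SQLConf settings.
  // After: newHadoopConf() returns a fresh copy with SQLConf entries applied.
  spark.sessionState.newHadoopConf()
}
```

With this in place, a setting made via `spark.conf.set("spark.hadoop.fs.s3a.access.key", ...)` would be visible to commands like `ALTER TABLE ... RECOVER PARTITIONS` that hit the FileSystem.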
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user juliuszsompolski commented on the issue: https://github.com/apache/spark/pull/20679 cc @gatorsmile @rxin We had a chat about whether to implement something that would catch direct misuses of sc.hadoopConfiguration in the sql module, but it seems that it's not very common, so maybe just fixing it where it happened is enough. @liancheng suggested stripping the "spark.hadoop" prefix for more compatibility with users who specify or don't specify that prefix.
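The prefix-stripping idea mentioned above amounts to something like the following standalone sketch (not Spark's actual implementation; the Hadoop conf is modeled as a plain mutable map for illustration):

```scala
import scala.collection.mutable

// Illustrative only: copy SQLConf-level settings into a Hadoop-style
// key/value map, accepting keys both with and without the "spark.hadoop."
// prefix so users can write either form.
def blendIntoHadoopConf(
    sqlConfSettings: Map[String, String],
    hadoopConf: mutable.Map[String, String]): Unit = {
  sqlConfSettings.foreach { case (key, value) =>
    // stripPrefix is a no-op when the prefix is absent, so both
    // "spark.hadoop.fs.defaultFS" and "fs.defaultFS" land on the same key.
    hadoopConf(key.stripPrefix("spark.hadoop.")) = value
  }
}
```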
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20624 **[Test build #87671 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87671/testReport)** for PR 20624 at commit [`a006bab`](https://github.com/apache/spark/commit/a006bab64420b24d9aff242902d0145892879691). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20679 Merged build finished. Test PASSed.
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20679 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1062/ Test PASSed.
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20663 Jenkins, retest this please
[GitHub] spark issue #20679: [SPARK-23514] Use SessionState.newHadoopConf() to propag...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20679 **[Test build #87675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87675/testReport)** for PR 20679 at commit [`2c070fc`](https://github.com/apache/spark/commit/2c070fcd053acb47d8a8c3214d67e106b5683376).
[GitHub] spark issue #20663: [SPARK-23501][UI] Refactor AllStagesPage in order to avo...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20663 **[Test build #87676 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87676/testReport)** for PR 20663 at commit [`9a8e032`](https://github.com/apache/spark/commit/9a8e032f6d46cb181e8b775812acdda8805888a8).
[GitHub] spark issue #20624: [SPARK-23445] ColumnStat refactoring
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20624 Merged build finished. Test PASSed.