from:"kiszk"

[GitHub] spark issue #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggregate.row....

2018-08-18 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21931
  
LGTM, cc @cloud-fan @hvanhovell 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-18 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21087: [SPARK-23997][SQL] Configurable maximum number of...

2018-08-18 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21087#discussion_r211080067
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala
 ---
@@ -164,9 +165,12 @@ case class BucketSpec(
 numBuckets: Int,
 bucketColumnNames: Seq[String],
 sortColumnNames: Seq[String]) {
-  if (numBuckets <= 0 || numBuckets >= 10) {
+  def conf: SQLConf = SQLConf.get
+
+  if (numBuckets <= 0 || numBuckets > conf.bucketingMaxBuckets) {
--- End diff --

Since the condition is changed from `>` to `>=`, there is inconsistent 
between the condition and the error message.

If this condition is true, the message is like `... but less than or equal 
to bucketing.maxBuckets ...`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-08-18 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21860
  
cc @hvanhovell


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-17 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20637
  
cc @ueshin @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-17 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20637
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22135: [SPARK-25093][SQL] Avoid recompiling regexp for comments...

2018-08-17 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22135
  
SGTM, but can we address the similar issues at once?
Even under `src/main/...`, we can see this pattern at several places.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

2018-08-16 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22125
  
@KraFusion Sorry, I overlooked another PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22048: [SPARK-25108][SQL] Fix the show method to display the wi...

2018-08-16 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22048
  
Thank you for creating a JIRA entry and for putting the result. The test 
case is not available yet.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22126: [SPARK-23938][SQL][FOLLOW-UP][TEST] Nullabilities of val...

2018-08-16 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22126
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22125: [DOCS] Fix cloud-integration.md Typo

2018-08-16 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22125
  
Thanks, would it possible to address similar issues? For example, in 
`configurations.md`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-16 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21537
  
@gatorsmile Thank you for your reply. Could you elaborate on your 
suggestion?
>A general suggestion. To avoid introducing the regressions, how about 
implementing a new one without changing the existing one? We can easily switch 
to the new one when it becomes stable.

Does it mean to work in a particular branch or to work in a fork repository 
until its implementation has been completed ? 



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-15 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21537
  
For 2. and 3., it is harder to say my opinion in the comment. Let me say 
short comments at first.

For 2., if I remember correctly, @viirya once wrote the API document in a 
JIRA entry. it would be good to update and add some thoughts about design as a 
first step. I understand that it is hard to keep the up-to-date design 
document, in particular, during the open-source development. This is because we 
have a lot of excellent comments in a PR. 

For 3., at the early implementation of SQL codegen (i.e. use `s""` to 
represent code), I thought there are two problems.
1. lack of type of an expression (in other words, `ExprCode` did not have 
the type of `value`)
2. lack of a structure of statements

Now, we meet a problem that it is hard to rewrite a method argument due to 
problem 1. In my understanding, the current effort led by @viirya is trying to 
resolve problem 1.
It is hard to rewrite a set of statements due to Problem 2. To resolve 
problem 2, we need more effort. In my opinion, we are addressing them step by 
step.

Of course, it would be happy for me to co-lead a discussion of the IR 
design for 2.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21537: [SPARK-24505][SQL] Convert strings in codegen to blocks:...

2018-08-15 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21537
  
Thank for involving me in an important thread. I was busy this morning in 
Japan.

I think there are three topics in the thread.
1. Merge or revert this PR
2. Design document
3. IR design

For 1., in short, my opinion is likely to revert this PR from the view like 
a release manager.

As we know, it is a time to cut a release branch. This PR partially 
introduce a new representation. If there would be a bug in code generation at 
Spark 2.4, it may introduce a complication of debugging and fixing.
As @mgaido91 pointed out, #20043 and #21026 have been merged. I think that 
they are a kind of refactoring (e.g. change the representation of literal, 
class, and so on). Thus, these two PRs can be there. However, this PR 
introduces a mixture of representation `s""` and `code""`.

WDYT?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...

2018-08-15 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21912
  
cc @ueshin


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22110: [SPARK-25122][SQL] Deduplication of supports equa...

2018-08-15 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22110#discussion_r210282476
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/types/AbstractDataType.scala 
---
@@ -115,6 +115,8 @@ protected[sql] abstract class AtomicType extends 
DataType {
   private[sql] type InternalType
   private[sql] val tag: TypeTag[InternalType]
   private[sql] val ordering: Ordering[InternalType]
+
+  private[spark] override def supportsEquals: Boolean = true
--- End diff --

+1


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22103#discussion_r210152479
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
   try {
 val cf = new ClassFile(new ByteArrayInputStream(classBytes))
 val stats = cf.methodInfos.asScala.flatMap { method =>
-  method.getAttributes().filter(_.getClass.getName == 
codeAttr.getName).map { a =>
+  method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
--- End diff --

Yeah, I know The current comparison is more strict. Although the previous 
comparison was only for name, the current comparison is for a pair of class 
loader and name. 

I worried whether the strictness may change behavior.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22105: [SPARK-25115] [Core] Eliminate extra memory copy ...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22105#discussion_r210056504
  
--- Diff: 
common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java
 ---
@@ -140,8 +140,24 @@ private int copyByteBuf(ByteBuf buf, 
WritableByteChannel target) throws IOExcept
 // SPARK-24578: cap the sub-region's size of returned nio buffer to 
improve the performance
 // for the case that the passed-in buffer has too many components.
 int length = Math.min(buf.readableBytes(), NIO_BUFFER_LIMIT);
-ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
-int written = target.write(buffer);
+// If the ByteBuf holds more then one ByteBuffer we should better call 
nioBuffers(...)
+// to eliminate extra memory copies.
+int written = 0;
+if (buf.nioBufferCount() == 1) {
+  ByteBuffer buffer = buf.nioBuffer(buf.readerIndex(), length);
+  written = target.write(buffer);
+} else {
+  ByteBuffer[] buffers = buf.nioBuffers(buf.readerIndex(), length);
+  for (ByteBuffer buffer: buffers) {
+int remaining = buffer.remaining();
+int w = target.write(buffer);
+written += w;
--- End diff --

Can we guarantee `written` does not overflow while we accumulate `int` 
values? 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22101: [SPARK-25114][Core] Fix RecordBinaryComparator when subt...

2018-08-14 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22101
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22103#discussion_r209995159
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
   try {
 val cf = new ClassFile(new ByteArrayInputStream(classBytes))
 val stats = cf.methodInfos.asScala.flatMap { method =>
-  method.getAttributes().filter(_.getClass.getName == 
codeAttr.getName).map { a =>
+  method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
--- End diff --

Why do we need change this condition?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22103: [SPARK-25113][SQL] Add logging to CodeGenerator w...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22103#discussion_r209993118
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -1385,9 +1386,15 @@ object CodeGenerator extends Logging {
   try {
 val cf = new ClassFile(new ByteArrayInputStream(classBytes))
 val stats = cf.methodInfos.asScala.flatMap { method =>
-  method.getAttributes().filter(_.getClass.getName == 
codeAttr.getName).map { a =>
+  method.getAttributes().filter(_.getClass eq codeAttr).map { a =>
 val byteCodeSize = 
codeAttrField.get(a).asInstanceOf[Array[Byte]].length
 
CodegenMetrics.METRIC_GENERATED_METHOD_BYTECODE_SIZE.update(byteCodeSize)
+
+if (byteCodeSize > DEFAULT_JVM_HUGE_METHOD_LIMIT) {
+  logInfo("Generated method too long to be JIT compiled: " +
--- End diff --

nit: `Generated method is too long ...`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19222: [SPARK-10399][SPARK-23879][CORE][SQL] Introduce multiple...

2018-08-14 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/19222
  
ping @rednaxelafx


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209878827
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,98 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
--- End diff --

I think that it would be good to have a utility method to generate this 
call. WDYT?
On the other hand, I agree that we have a method `setArrayElementFunc` or 
`setArrayElement`. Thus, we can split this method into two methods that have 
only one return value.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209877426
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/ArrayData.scala 
---
@@ -34,6 +36,37 @@ object ArrayData {
 case a: Array[Double] => UnsafeArrayData.fromPrimitiveArray(a)
 case other => new GenericArrayData(other)
   }
+
+
+  /**
+   * Allocate [[UnsafeArrayData]] or [[GenericArrayData]] based on given 
parameters.
+   *
+   * @param elementSize a size of an element in bytes
+   * @param numElements the number of elements the array should contain
+   * @param isPrimitiveType whether the type of an element is primitive 
type
+   * @param additionalErrorMessage string to include in the error message
+   */
+  def allocateArrayData(
+  elementSize: Int,
+  numElements : Long,
+  isPrimitiveType: Boolean,
+  additionalErrorMessage: String) : ArrayData = {
+val arraySize = 
UnsafeArrayData.calculateSizeOfUnderlyingByteArray(numElements, elementSize)
+if (isPrimitiveType && 
!UnsafeArrayData.shouldUseGenericArrayData(elementSize, numElements)) {
--- End diff --

When `UnsafeArrayData` can be used, `GenericArrayData` is also used. 
However, if the element size is large, `GenericArrayData` should be used. But, 
`UnsafeArrayData` cannot be used.

Thus, I think that it would be good to use the current name 
`shouldUseGenericArrayData`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22101: [SPARK-23207][Core][FOLLOWUP] Fix RecordBinaryCom...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22101#discussion_r209863964
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java
 ---
@@ -42,16 +42,16 @@ public int compare(
   while ((leftOff + i) % 8 != 0 && i < leftLen) {
 res = (Platform.getByte(leftObj, leftOff + i) & 0xff) -
 (Platform.getByte(rightObj, rightOff + i) & 0xff);
-if (res != 0) return res;
+if (res != 0) return (int) res;
 i += 1;
   }
 }
 // for architectures that support unaligned accesses, chew it up 8 
bytes at a time
 if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + 
i) % 8 == 0))) {
   while (i <= leftLen - 8) {
-res = (int) ((Platform.getLong(leftObj, leftOff + i) -
-Platform.getLong(rightObj, rightOff + i)) % 
Integer.MAX_VALUE);
-if (res != 0) return res;
+res = Platform.getLong(leftObj, leftOff + i) -
+Platform.getLong(rightObj, rightOff + i);
+if (res != 0) return res > 0 ? 1 : -1;
--- End diff --

+1 for no subtraction.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22101: [SPARK-23207][Core][FOLLOWUP] Fix RecordBinaryCom...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22101#discussion_r209855589
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java
 ---
@@ -60,7 +60,7 @@ public int compare(
 while (i < leftLen) {
   res = (Platform.getByte(leftObj, leftOff + i) & 0xff) -
   (Platform.getByte(rightObj, rightOff + i) & 0xff);
-  if (res != 0) return res;
+  if (res != 0) return (int) res;
--- End diff --

ditto


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22101: [SPARK-23207][Core][FOLLOWUP] Fix RecordBinaryCom...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22101#discussion_r209855528
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java
 ---
@@ -42,16 +42,16 @@ public int compare(
   while ((leftOff + i) % 8 != 0 && i < leftLen) {
 res = (Platform.getByte(leftObj, leftOff + i) & 0xff) -
 (Platform.getByte(rightObj, rightOff + i) & 0xff);
-if (res != 0) return res;
+if (res != 0) return (int) res;
--- End diff --

How about the following change to minimize and localize the change?
```
int res = (Platform.getByte(leftObj, leftOff + i) & 0xff) -
 (Platform.getByte(rightObj, rightOff + i) & 0xff);
if (res != 0) return res;
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22101: [SPARK-23207][Core][FOLLOWUP] Fix RecordBinaryCom...

2018-08-14 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22101#discussion_r209855135
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/execution/RecordBinaryComparator.java
 ---
@@ -42,16 +42,16 @@ public int compare(
   while ((leftOff + i) % 8 != 0 && i < leftLen) {
 res = (Platform.getByte(leftObj, leftOff + i) & 0xff) -
 (Platform.getByte(rightObj, rightOff + i) & 0xff);
-if (res != 0) return res;
+if (res != 0) return (int) res;
 i += 1;
   }
 }
 // for architectures that support unaligned accesses, chew it up 8 
bytes at a time
 if (Platform.unaligned() || (((leftOff + i) % 8 == 0) && ((rightOff + 
i) % 8 == 0))) {
   while (i <= leftLen - 8) {
-res = (int) ((Platform.getLong(leftObj, leftOff + i) -
-Platform.getLong(rightObj, rightOff + i)) % 
Integer.MAX_VALUE);
-if (res != 0) return res;
+res = Platform.getLong(leftObj, leftOff + i) -
+Platform.getLong(rightObj, rightOff + i);
+if (res != 0) return res > 0 ? 1 : -1;
--- End diff --

How about the following change to minimize and localize the change?
```
long res = Platform.getLong(leftObj, leftOff + i) -
 Platform.getLong(rightObj, rightOff + i);
if (res != 0) return res > 0 ? 1 : -1;
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22001: [SPARK-24819][CORE] Fail fast when no enough slots to la...

2018-08-13 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22001
  
Just curious.

It is very interesting to me since the recent three tries consistently 
cause a timeout failure at the same test.
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94687
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94705
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94716

In addition, other PRs look successful without timeout.

```
[info] - abort the job if total size of results is too large (1 second, 122 
milliseconds)
Exception in thread "task-result-getter-3" java.lang.Error: 
java.lang.InterruptedException
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:206)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:222)
at 
scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
at 
org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:115)
at 
org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:701)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply$mcV$sp(TaskResultGetter.scala:82)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3$$anonfun$run$1.apply(TaskResultGetter.scala:63)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1932)
at 
org.apache.spark.scheduler.TaskResultGetter$$anon$3.run(TaskResultGetter.scala:62)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
... 2 more
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...

2018-08-13 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21912
  
cc @ueshin @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

2018-08-13 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22048
  
@xuejianbest Could you please create a JIRA entry and add test cases to 
this PR?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-13 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209692496
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -43,25 +43,29 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 case _ => false
   }
 
-  // TODO: if the nullability of field is correct, we can use it to save 
null check.
   private def writeStructToBuffer(
   ctx: CodegenContext,
   input: String,
   index: String,
-  fieldTypes: Seq[DataType],
+  fieldTypeAndNullables: Seq[(DataType, Boolean)],
--- End diff --

@cloud-fan I found this case class `case class Schema(dataType: DataType, 
nullable: Boolean)` at two places.
1. `ScalaReflection.Schema`
1. `SchemaConverters.SchemaType`

Do we use one of them? Or do we define `org.apache.spark.sql.types.Schema`?



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createU...

2018-08-13 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21912#discussion_r209643599
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
 ---
@@ -735,70 +735,100 @@ class CodegenContext {
   }
 
   /**
-   * Generates code creating a [[UnsafeArrayData]].
+   * Generates code creating a [[UnsafeArrayData]] or [[GenericArrayData]] 
based on
+   * given parameters.
*
* @param arrayName name of the array to create
+   * @param elementType data type of the elements in source array
* @param numElements code representing the number of elements the array 
should contain
-   * @param elementType data type of the elements in the array
* @param additionalErrorMessage string to include in the error message
+   * @param elementSize optional value which shows the size of an element 
of the allocated
+   *[[UnsafeArrayData]] or [[GenericArrayData]]
+   *
+   * @return code representing the allocation of [[ArrayData]]
+   * code representing a setter of an assignment for the generated 
array
*/
-  def createUnsafeArray(
+  def createArrayData(
   arrayName: String,
-  numElements: String,
   elementType: DataType,
-  additionalErrorMessage: String): String = {
-val arraySize = freshName("size")
-val arrayBytes = freshName("arrayBytes")
+  numElements: String,
+  additionalErrorMessage: String,
+  elementSize: Option[Int] = None): (String, String) = {
--- End diff --

Yeah, they are exclusive. However, when `elementType` is not used, the 
integer value is required.

`MapEntries` stores `UnsafeRow` instead of primitive data. 
[This](https://github.com/apache/spark/blob/c15630014c7ef850fa21c1247fa5edd2b1bac81b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala#L405)
 requires non-fixed size unlike `elementType.defaultSize`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-12 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22083: [SQL][Test][Minor] Add missing codes to ParquetCo...

2018-08-12 Thread kiszk

GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/22083

[SQL][Test][Minor] Add missing codes to 
ParquetCompressionCodecPrecedenceSuite

## What changes were proposed in this pull request?

This PR adds codes to ``"Test `spark.sql.parquet.compression.codec` 
config"` in `ParquetCompressionCodecPrecedenceSuite`.

## How was this patch tested?

Existing UTs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark 
ParquetCompressionCodecPrecedenceSuite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22083.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22083


commit 974224b495fc8155b9a6a73673ea478ccdb164d6
Author: Kazuaki Ishizaki 
Date:   2018-08-12T07:08:42Z

update




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22082: [SPARK-24420][Build][FOLLOW-UP] Upgrade ASM6 APIs

2018-08-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22082
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22007
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-11 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22007
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-10 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209416053
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -43,25 +43,29 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 case _ => false
   }
 
-  // TODO: if the nullability of field is correct, we can use it to save 
null check.
   private def writeStructToBuffer(
   ctx: CodegenContext,
   input: String,
   index: String,
-  fieldTypes: Seq[DataType],
+  fieldTypeAndNullables: Seq[(DataType, Boolean)],
--- End diff --

@cloud-fan What name of the case class do you suggest?  `DataTypeNullable`, 
or others?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22068
  
Thanks. BTW, I found another instance in test, not in doc.

Do we address this in this PR?  Or, do we address in another PR?
@HyukjinKwon  WDYT ?

```
class ParquetCompressionCodecPrecedenceSuite extends ParquetTest with 
SharedSQLContext {
  test("Test `spark.sql.parquet.compression.codec` config") {
Seq("NONE", "UNCOMPRESSED", "SNAPPY", "GZIP", "LZO").foreach { c =>
  withSQLConf(SQLConf.PARQUET_COMPRESSION.key -> c) {
val expected = if (c == "NONE") "UNCOMPRESSED" else c
val option = new ParquetOptions(Map.empty[String, String], 
spark.sessionState.conf)
assert(option.compressionCodecClassName == expected)
  }
}
  }
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22016: Fix typos

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22016
  
#22070 addresses more typo.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22070: Fix typos detected by github.com/client9/misspell

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22070
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22067: [SPARK-25084][SQL] distribute by on multiple columns may...

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22067
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22069: [MINOR][DOC] Fix Java example code in Column's comments

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22069
  
Do we need to update the following similar examples, too?

Column.scala
```
   * {{{
   *   // Example: encoding gender string column into integer.
   *
   *   // Scala:
   *   people.select(when(people("gender") === "male", 0)
   * .when(people("gender") === "female", 1)
   * .otherwise(2))
   *
   *   // Java:
   *   people.select(when(col("gender").equalTo("male"), 0)
   * .when(col("gender").equalTo("female"), 1)
   * .otherwise(2))
   * }}}
```

functions.scala
```
   * {{{
   *   // Example: encoding gender string column into integer.
   *
   *   // Scala:
   *   people.select(when(people("gender") === "male", 0)
   * .when(people("gender") === "female", 1)
   * .otherwise(2))
   *
   *   // Java:
   *   people.select(when(col("gender").equalTo("male"), 0)
   * .when(col("gender").equalTo("female"), 1)
   * .otherwise(2))
   * }}}
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22068: [MINOR][DOC]Add missing compression codec .

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22068
  
Would it be better to update a comment in `DataFrameWriter.scala`, too?

```
   * `compression` (default is the value specified in 
`spark.sql.parquet.compression.codec`):
   * compression codec to use when saving to file. This can be one of the 
known case-insensitive
   * shorten names(`none`, `snappy`, `gzip`, and `lzo`). This will override
   * `spark.sql.parquet.compression.codec`.
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-10 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209180525
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -43,25 +43,29 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 case _ => false
   }
 
-  // TODO: if the nullability of field is correct, we can use it to save 
null check.
   private def writeStructToBuffer(
   ctx: CodegenContext,
   input: String,
   index: String,
-  fieldTypes: Seq[DataType],
+  fieldTypeAndNullables: Seq[(DataType, Boolean)],
--- End diff --

I think that it would be good since it is used at `JavaTypeInference` and 
`higherOrderFunctions`.
cc @ueshin


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-10 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r209178573
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -170,6 +174,23 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
 
 val element = CodeGenerator.getValue(tmpInput, et, index)
 
+val primitiveTypeName = if (CodeGenerator.isPrimitiveType(jt)) {
--- End diff --

good catch


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-10 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21994: [SPARK-24529][Build][test-maven][follow-up] Add s...

2018-08-09 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21994#discussion_r209147728
  
--- Diff: pom.xml ---
@@ -2609,6 +2609,28 @@
   
 
   
+  
+com.github.spotbugs
+spotbugs-maven-plugin
--- End diff --

let me check the elapsed time on my environment. +1 for holding on for now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
I think that this is not a data correctness issue. This may cause 
unexpected program abort due to hardware memory access error.
BTW, it would be good to backport it to increase stability.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20637
  
The failure of 
`org.apache.spark.sql.catalyst.expressions.JsonExpressionsSuite.from_json 
missing fields` is due to passing `null` while the schema has `nullable=false`.

This inconsistency is agreed in the discussion at 
[SPARK-23173](https://issues.apache.org/jira/browse/SPARK-23173). 
`Assume that each field in schema passed to from_json is nullable, and 
ignore the nullability information set in the passed schema.`

When `spark.sql.fromJsonForceNullableSchema=false`, I think that a test is 
invalid to pass `nullable=false` in the corresponding schema to the missing 
field.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
cc @hvanhovell


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22059: [SPARK-25036][SQL] Avoid discarding unmoored doc comment...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22059
  
cc @srowen @ueshin @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22059: [SPARK-25036][SQL] Avoid discarding unmoored doc comment...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22059
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
cc @cloud-fan


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22058: [SPARK-25036][SQL][FOLLOW-UP] Avoid match may not be exh...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22058
  
I think that this is the last one with the following command. But, I would 
like to confirm this with @ueshin.

`build/sbt -Pscala-2.12 -Phadoop-2.6 -Pkubernetes -Phive -Pmesos 
-Phive-thriftserver -Pflume  -Pkinesis-asl  -Pyarn package`



---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22053: [SPARK-25069][CORE]Using UnsafeAlignedOffset to make the...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22053
  
Good catch
LGTM with a comment: Would it be better to update comments regarding `4 
bytes` with `4 or 8 bytes` in `UnsafeExternalSorter.java`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22055: [MINOR][BUILD] Update Jetty to 9.3.24.v20180605

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22055
  
LGTM
These changes are not huge. It looks non-risky.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22059: [SPARK-25036][SQL] Avoid discarding unmoored doc ...

2018-08-09 Thread kiszk

GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/22059

[SPARK-25036][SQL] Avoid discarding unmoored doc comment in Scala-2.12. 

## What changes were proposed in this pull request?

This PR avoid the following compilation error using sbt in Scala-2.12.

```
[error] [warn] 
/home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:410:
 discarding unmoored doc comment
[error] [warn] /**
[error] [warn] 
[error] [warn] 
/home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala:441:
 discarding unmoored doc comment
[error] [warn] /**
[error] [warn] 
...
[error] [warn] 
/home/ishizaki/Spark/PR/scala212/spark/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala:440:
 discarding unmoored doc comment
[error] [warn] /**
[error] [warn] 
```

## How was this patch tested?

Existing UTs


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-25036d

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22059.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22059


commit 70ec11e5b85100b0b10257f7c0846b63af1ba494
Author: Kazuaki Ishizaki 
Date:   2018-08-09T17:11:30Z

initial commit




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22058: [SPARK-25036][SQL][FOLLOW-UP] Avoid match may not be exh...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22058
  
cc @ueshin @srowen @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22058: [SPARK-25036][SQL][FOLLOW-UP] Avoid match may not...

2018-08-09 Thread kiszk

GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/22058

[SPARK-25036][SQL][FOLLOW-UP] Avoid match may not be exhaustive in 
Scala-2.12.

## What changes were proposed in this pull request?

This is a follow-up pr of #22014 and #22039

We still have some more compilation errors in mllib with scala-2.12 with 
sbt:

```
[error] [warn] 
/home/ishizaki/Spark/PR/scala212/spark/mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala:116:
 match may not be exhaustive.
[error] It would fail on the following inputs: ("silhouette", _), (_, 
"cosine"), (_, "squaredEuclidean"), (_, String()), (_, _)
[error] [warn] ($(metricName), $(distanceMeasure)) match {
[error] [warn] 
```

## How was this patch tested?

Existing UTs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-25036c

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22058.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22058


commit 6a0ee38c53d8a53d219bfec8cad9953bc9571e0c
Author: Kazuaki Ishizaki 
Date:   2018-08-09T16:29:40Z

initial commit

commit 5655d83bd2c9d6ca872c371bff421f69409b6d0b
Author: Kazuaki Ishizaki 
Date:   2018-08-09T16:45:23Z

make the change one-liner




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20184: [SPARK-22987][Core] UnsafeExternalSorter cases OO...

2018-08-09 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20184#discussion_r208993136
  
--- Diff: 
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
 ---
@@ -116,13 +138,18 @@ public void loadNext() throws IOException {
 if (taskContext != null) {
   taskContext.killTaskIfInterrupted();
 }
-recordLength = din.readInt();
-keyPrefix = din.readLong();
-if (recordLength > arr.length) {
-  arr = new byte[recordLength];
+// check if the reader is closed to prevent reopen the in and din.
+if (!hasNext()) {
+  throw new IndexOutOfBoundsException("Can not load next item when 
UnsafeSorterSpillReader is closed.");
+}
+recordLength = getDin().readInt();
+keyPrefix = getDin().readLong();
+int arrLength = Math.max(1024 * 1024, recordLength);
+if (arrLength > arr.length) {
+  arr = new byte[arrLength];
   baseObject = arr;
 }
-ByteStreams.readFully(in, arr, 0, recordLength);
+ByteStreams.readFully(getIn(), arr, 0, recordLength);
--- End diff --

Is it fine if `recordLength` is greater than `1024 * 1024`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-09 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22001#discussion_r208950122
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -929,6 +955,28 @@ class DAGScheduler(
   // HadoopRDD whose underlying HDFS files have been deleted.
   finalStage = createResultStage(finalRDD, func, partitions, jobId, 
callSite)
 } catch {
+  case e: Exception if e.getMessage ==
+  
DAGScheduler.ERROR_MESSAGE_BARRIER_REQUIRE_MORE_SLOTS_THAN_CURRENT_TOTAL_NUMBER 
=>
+logWarning("The job requires to run a barrier stage that requires 
more slots than the " +
+  "total number of slots in the cluster currently.")
+jobIdToNumTasksCheckFailures.putIfAbsent(jobId, 0)
+val numCheckFailures = jobIdToNumTasksCheckFailures.get(jobId) + 1
--- End diff --

Is it OK while this increment is not atomic?
In the following scenario, the value may not be correct
1. We assume `jobIdToNumTasksCheckFailures(jobId) = 1`
1. Thread A executes L963, then `numCheckFailures = 2`
1. Thread B executes L963, then `numCheckFailures = 2`
1. Thread B executes L964 and L965, then 
`jobIdToNumTasksCheckFailures(jobId)` has 2.
1. Thread A executes L964 and L965, then 
`jobIdToNumTasksCheckFailures(jobId)` has 2.

Since two threads detected failure, we expect `3`. But, it is `2`.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-09 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22001#discussion_r208947201
  
--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
---
@@ -203,6 +203,17 @@ class DAGScheduler(
 sc.getConf.getInt("spark.stage.maxConsecutiveAttempts",
   DAGScheduler.DEFAULT_MAX_CONSECUTIVE_STAGE_ATTEMPTS)
 
+  /**
+   * Number of max concurrent tasks check failures for each job.
+   */
+  private[scheduler] val jobIdToNumTasksCheckFailures = new 
ConcurrentHashMap[Int, Int]
+
+  /**
+   * Time in seconds to wait between a max concurrent tasks check failure 
and the next check.
--- End diff --

nit: `a max` -> `max`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-09 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22001#discussion_r208946523
  
--- Diff: 
core/src/main/scala/org/apache/spark/internal/config/package.scala ---
@@ -577,4 +577,17 @@ package object config {
   .timeConf(TimeUnit.SECONDS)
   .checkValue(v => v > 0, "The value should be a positive time value.")
   .createWithDefaultString("365d")
+
+  private[spark] val BARRIER_MAX_CONCURRENT_TASKS_CHECK_INTERVAL =
+
ConfigBuilder("spark.scheduler.barrier.maxConcurrentTasksCheck.interval")
+  .doc("Time in seconds to wait between a max concurrent tasks check 
failure and the next " +
--- End diff --

nit: `a max` -> `max`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22001: [SPARK-24819][CORE] Fail fast when no enough slot...

2018-08-09 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22001#discussion_r208945843
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -1602,6 +1602,15 @@ class SparkContext(config: SparkConf) extends 
Logging {
 }
   }
 
+  /**
+   * Get the max number of tasks that can be concurrent launched currently.
--- End diff --

How about like this?
```
 * Get the max number of tasks that can be concurrently launched when the 
method is called.
 * Note that please don't cache the value returned by this method, because 
the number can be
 * changed due to adding/removing executors. 
```


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22055: [MINOR][BUILD] Update Jetty to 9.3.24.v20180605

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22055
  
Release notes

[9.3.21](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.3.21.v20170918)

[9.3.22](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.3.22.v20171030)

[9.3.23](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.3.23.v20180228)

[9.3.24](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.3.24.v20180605)


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21505: [SPARK-24457][SQL] Improving performance of stringToTime...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21505
  
gentle ping @ssonker 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-09 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22044#discussion_r208862917
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/collectionOperations.scala
 ---
@@ -3410,6 +3410,28 @@ case class ArrayDistinct(child: Expression)
 case _ => false
   }
 
+  @transient protected lazy val canUseSpecializedHashSet = elementType 
match {
--- End diff --

We can do so. To minimize the changes due to remaining time for cutting, I 
would like to do this in another PR #21912.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistinct

2018-08-09 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22044
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21993: [SPARK-24983][Catalyst] Add configuration for max...

2018-08-08 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21993#discussion_r208822706
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -631,19 +631,26 @@ object ColumnPruning extends Rule[LogicalPlan] {
 object CollapseProject extends Rule[LogicalPlan] {
 
   def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
-case p1 @ Project(_, p2: Project) =>
-  if (haveCommonNonDeterministicOutput(p1.projectList, 
p2.projectList)) {
-p1
-  } else {
-p2.copy(projectList = buildCleanedProjectList(p1.projectList, 
p2.projectList))
-  }
+case p1@Project(_, p2: Project) =>
--- End diff --

nit: Do we need to change this line? We can keep this line as is.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22048
  
I created [a PR](https://github.com/apache/spark/pull/16086) to solve the 
same problem.
Can this PR handle [East Asian 
Width](http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt) 
correctly?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22048: Fix the show method to display the wide character alignm...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22048
  
Since this change does not look minor, could you please create a JIRA entry?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistinct

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22044
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21859: [SPARK-24900][SQL]Speed up sort when the dataset is smal...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21859
  
Good point. [These 
failures](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/94431/testReport/)
 may show that it affects other places.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20636: [SPARK-23415][SQL][TEST] Make behavior of BufferHolderSp...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20636
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22040: [Minor][Doc] Fix typo

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22040
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistinct

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22044
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22007
  
This change looks non-risky to me.
cc @swoen @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22044: [SPARK-23912][SQL][Followup] Refactor ArrayDistin...

2018-08-08 Thread kiszk

GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/22044

[SPARK-23912][SQL][Followup] Refactor ArrayDistinct

## What changes were proposed in this pull request?

This PR simplified code generation for `ArrayDistinct`. #21966 enabled code 
generation only if the type can be specialized by the hash set. This PR follows 
this strategy.

Optimization of null handling will be implemented in #21912.

## How was this patch tested?

Existing UTs

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark SPARK-23912-follow

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22044


commit 9d0cb100a85dc77569e1a62b83e0461ee2c33ddb
Author: Kazuaki Ishizaki 
Date:   2018-08-08T18:43:51Z

stop code generation when elementTypeSupportEquals is false




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #20637: [SPARK-23466][SQL] Remove redundant null checks in gener...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/20637
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-08 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r208677733
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -142,7 +143,7 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
   case _ => s"$rowWriter.write($index, ${input.value});"
 }
 
-if (input.isNull == "false") {
+if (input.isNull == "false" || !nullable) {
--- End diff --

good catch, thanks


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #20637: [SPARK-23466][SQL] Remove redundant null checks i...

2018-08-08 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20637#discussion_r208677695
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala
 ---
@@ -308,10 +319,10 @@ object GenerateUnsafeProjection extends 
CodeGenerator[Seq[Expression], UnsafePro
   expressions: Seq[Expression],
   useSubexprElimination: Boolean = false): ExprCode = {
 val exprEvals = ctx.generateExpressions(expressions, 
useSubexprElimination)
-val exprTypes = expressions.map(_.dataType)
+val exprTypeAndNullables = expressions.map(e => (e.dataType, 
e.nullable))
 
-val numVarLenFields = exprTypes.count {
-  case dt if UnsafeRow.isFixedLength(dt) => false
+val numVarLenFields = exprTypeAndNullables.count {
+  case (dt, _) if UnsafeRow.isFixedLength(dt) => false
--- End diff --

sure


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22039: [SPARK-25036][SQL][FOLLOW-UP] Avoid match may not...

2018-08-08 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22039#discussion_r208666588
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/SymmetricHashJoinStateManager.scala
 ---
@@ -273,6 +273,9 @@ class SymmetricHashJoinStateManager(
   s.copy(desc = newDesc(desc)) -> value
 case (s @ StateStoreCustomTimingMetric(_, desc), value) =>
   s.copy(desc = newDesc(desc)) -> value
+case (s, _) =>
+  throw new IllegalArgumentException(
+  s"Unknown state store custom metric is found at metrics: $s")
--- End diff --

nit: 2 more spaces?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22039: [SPARK-25036][SQL][FOLLOW-UP] Avoid match may not...

2018-08-08 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/22039#discussion_r208666245
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala
 ---
@@ -394,6 +394,9 @@ class MicroBatchExecution(
   case (src: Source, off) => src.commit(off)
   case (reader: MicroBatchReader, off) =>
 reader.commit(reader.deserializeOffset(off.json))
+  case (src, _) =>
+throw new IllegalArgumentException(
+  s"Unknows source is found at constructNextBatch: $src")
--- End diff --

nit: `Unknows` -> `Unknown`


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22041: [SPARK-25058][SQL] Use Block.isEmpty/nonEmpty to check w...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22041
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22040: [Minor][Doc] Fix typo

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22040
  
cc @HyukjinKwon 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #22040: [Minor][Doc] Fix typo

2018-08-08 Thread kiszk

GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/22040

[Minor][Doc] Fix typo

## What changes were proposed in this pull request?

This PR fixes typo regarding `auxiliary verb + verb[s]`.

## How was this patch tested?

N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kiszk/spark spellcheck1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22040.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22040


commit f9f5feb6a82db36dd0bf6b2f2a18ded0a7d22d39
Author: Kazuaki Ishizaki 
Date:   2018-08-08T17:05:47Z

fix typos

commit aa4e530986a8d64cb76c0715ca8576f19c8ab8f9
Author: Kazuaki Ishizaki 
Date:   2018-08-08T17:09:35Z

fix typo




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21956: [MINOR][DOCS] Fix grammatical error in SortShuffleManage...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21956
  
@HyukjinKwon sure, I will open a PR.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22007: [SPARK-25033] Bump Apache commons.{httpclient, httpcore}

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22007
  
@fokko Do we need to update files under `dev/deps`?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21912: [SPARK-24962][SQL] Refactor CodeGenerator.createUnsafeAr...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21912
  
cc @ueshin 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #22035: [SPARK-23911][SQL][FOLLOW-UP] Fix examples of aggregate ...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22035
  
LGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21860: [SPARK-24901][SQL]Merge the codegen of RegularHashMap an...

2018-08-08 Thread kiszk

Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/21860
  
retest this please


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #21931: [SPARK-24978][SQL]Add spark.sql.fast.hash.aggrega...

2018-08-08 Thread kiszk

Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/21931#discussion_r208561981
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/AggregateBenchmark.scala
 ---
@@ -366,6 +366,43 @@ class AggregateBenchmark extends BenchmarkBase {
  */
   }
 
+  ignore("capacity for fast hash aggregate") {
+val N = 20 << 20
+val M = 1 << 19
+
+val benchmark = new Benchmark("Aggregate w multiple keys", N)
+sparkSession.range(N)
+  .selectExpr(
+"id",
+s"(id % $M) as k1",
+s"cast(id % $M as int) as k2",
+s"cast(id % $M as double) as k3",
+s"cast(id % $M as float) as k4") .createOrReplaceTempView("test")
+
+def f(): Unit = sparkSession.sql("select k1, k2, k3, k4, sum(k1), 
sum(k2), sum(k3), " +
+  "sum(k4) from test group by k1, k2, k3, k4").collect()
+
+benchmark.addCase(s"fasthash = default") { iter =>
+  
sparkSession.conf.set("spark.sql.codegen.aggregate.map.row.capacitybit", "16")
+  f()
+}
+
+benchmark.addCase(s"fasthash = config") { iter =>
+  
sparkSession.conf.set("spark.sql.codegen.aggregate.map.row.capacitybit", "20")
+  f()
+}
+
+benchmark.run()
+
+/*
+Java HotSpot(TM) 64-Bit Server VM 1.8.0_60-b27 on Windows 7 6.1
+Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
+Aggregate w multiple keys:   Best/Avg Time(ms)
Rate(M/s)   Per Row(ns)   Relative
--- End diff --

nit: we need to reduce # of characters per line up to 100. IIUC, the number 
is more than 100.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

< 1 2 3 4 5 6 7 8 9 10 >

401 - 500 of 3564 matches

Mail list logo