[GitHub] spark pull request #13911: [SPARK-16215][SQL] Reduce runtime overhead of a p...

2016-11-08 Thread kiszk
GitHub user kiszk closed the pull request at:

https://github.com/apache/spark/pull/13911





[GitHub] spark pull request #13911: [SPARK-16215][SQL] Reduce runtime overhead of a p...

2016-06-25 Thread kiszk
GitHub user kiszk opened a pull request:

https://github.com/apache/spark/pull/13911

[SPARK-16215][SQL] Reduce runtime overhead of a program that writes a 
primitive array in DataFrame/Dataset

## What changes were proposed in this pull request?

This PR optimizes the generated projection code for a primitive-type array. 
Although a primitive-type array requires no null checks and has a 
contiguous data region, the currently generated code performs a null check 
and a copy for each element (lines 075-082 in "Generated code before 
applying this PR"). This PR makes the following changes:

1. Eliminate null checks for each array element
2. Perform bulk data copy by using ```Platform.copy```
3. Eliminate primitive array allocation in ```GenericArrayData``` when 
https://github.com/apache/spark/pull/13758 is merged
4. Eliminate setting sparse index for ```UnsafeArrayData``` when 
https://github.com/apache/spark/pull/13680 is merged

These changes are implemented in the helper method 
```UnsafeArrayWriter.writePrimitiveArray()``` (line 075 in 
"Generated code after applying this PR").

For now, items 3 and 4 are not enabled, but the code is ready.
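
Items 1 and 2 can be illustrated outside Spark with a minimal sketch. It is not the PR's implementation: `java.nio.ByteBuffer` stands in for Spark's row buffer, and the class and method names (`PrimitiveArrayWriteSketch`, `writeElementWise`, `writeBulk`) are hypothetical. It only contrasts a per-element loop with null checks against a single bulk copy of a primitive `double[]`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only; ByteBuffer stands in for Spark's unsafe buffer.
final class PrimitiveArrayWriteSketch {

    // Element-wise path: one null check and one write per element,
    // analogous to the loop at lines 075-082 of the "before" code.
    static byte[] writeElementWise(Double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                // Placeholder: real code would set a null bit instead.
                buf.putDouble(i * Double.BYTES, 0.0);
            } else {
                buf.putDouble(i * Double.BYTES, values[i]);
            }
        }
        return buf.array();
    }

    // Bulk path: a primitive double[] cannot contain nulls and is contiguous,
    // so the whole array is copied with a single bulk put.
    static byte[] writeBulk(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES)
                                   .order(ByteOrder.nativeOrder());
        buf.asDoubleBuffer().put(values);
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] a = writeElementWise(new Double[] {1.1, 3.3});
        byte[] b = writeBulk(new double[] {1.1, 3.3});
        System.out.println(Arrays.equals(a, b)); // both paths produce the same bytes
    }
}
```

Both paths produce the same bytes for null-free input; the bulk path simply skips the per-element branch and the per-element write call.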


An example program
```
val df = sparkContext.parallelize(Seq(0.0d, 1.0d), 1).toDF
df.selectExpr("Array(value + 1.1d, value + 2.2d)").collect
```

Generated code before applying this PR
```java
/* 028 */   protected void processNext() throws java.io.IOException {
/* 029 */     while (inputadapter_input.hasNext()) {
/* 030 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
/* 031 */       double inputadapter_value = inputadapter_row.getDouble(0);
/* 032 */
/* 033 */       final boolean project_isNull = false;
/* 034 */       this.project_values = new Object[2];
/* 035 */       double project_value1 = -1.0;
/* 036 */       project_value1 = inputadapter_value + 1.1D;
/* 037 */       if (false) {
/* 038 */         project_values[0] = null;
/* 039 */       } else {
/* 040 */         project_values[0] = project_value1;
/* 041 */       }
/* 042 */
/* 043 */       double project_value4 = -1.0;
/* 044 */       project_value4 = inputadapter_value + 2.2D;
/* 045 */       if (false) {
/* 046 */         project_values[1] = null;
/* 047 */       } else {
/* 048 */         project_values[1] = project_value4;
/* 049 */       }
/* 050 */
/* 051 */       final ArrayData project_value = new org.apache.spark.sql.catalyst.util.GenericArrayData(project_values);
/* 052 */       this.project_values = null;
/* 053 */       project_holder.reset();
/* 054 */
/* 055 */       project_rowWriter.zeroOutNullBytes();
/* 056 */
/* 057 */       if (project_isNull) {
/* 058 */         project_rowWriter.setNullAt(0);
/* 059 */       } else {
/* 060 */         // Remember the current cursor so that we can calculate how many bytes are
/* 061 */         // written later.
/* 062 */         final int project_tmpCursor = project_holder.cursor;
/* 063 */
/* 064 */         if (project_value instanceof UnsafeArrayData) {
/* 065 */           final int project_sizeInBytes = ((UnsafeArrayData) project_value).getSizeInBytes();
/* 066 */           // grow the global buffer before writing data.
/* 067 */           project_holder.grow(project_sizeInBytes);
/* 068 */           ((UnsafeArrayData) project_value).writeToMemory(project_holder.buffer, project_holder.cursor);
/* 069 */           project_holder.cursor += project_sizeInBytes;
/* 070 */
/* 071 */         } else {
/* 072 */           final int project_numElements = project_value.numElements();
/* 073 */           project_arrayWriter.initialize(project_holder, project_numElements, 8);
/* 074 */
/* 075 */           for (int project_index = 0; project_index < project_numElements; project_index++) {
/* 076 */             if (project_value.isNullAt(project_index)) {
/* 077 */               project_arrayWriter.setNullAt(project_index);
/* 078 */             } else {
/* 079 */               final double project_element = project_value.getDouble(project_index);
/* 080 */               project_arrayWriter.write(project_index, project_element);
/* 081 */             }
/* 082 */           }
/* 083 */
/* 084 */         }
/* 085 */
/* 086 */         project_rowWriter.setOffsetAndSize(0, project_tmpCursor, project_holder.cursor - project_tmpCursor);
/* 087 */         project_rowWriter.alignToWords(project_holder.cursor - project_tmpCursor);
/* 088 */       }
/* 089 */       project_result.setTotalSize(project_holder.totalSize());
/* 090 */       append(project_result);
/* 091 */       if (shouldStop()) return;
/* 092 */     }
/* 093 */   }
/* 094 */ }
```

Generated code after applying this PR
```java
/* 028 */