Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10289#discussion_r47920442
--- Diff:
core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala
---
@@ -235,7 +235,7 @@ class ExternalSorterSuite extends
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10289#discussion_r47922285
--- Diff:
mllib/src/test/scala/org/apache/spark/mllib/util/MLlibTestSparkContext.scala ---
@@ -38,12 +38,15 @@ trait MLlibTestSparkContext extends
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10289#discussion_r47920869
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala ---
@@ -116,7 +120,7 @@ abstract class BaseYarnClusterSuite
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/10289
[SPARK-12311][CORE] Restore previous value of "os.arch" property in test
suites after forcing to set specific value to "os.arch" property
Restore the original value of os.arch
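The save-and-restore pattern this PR describes can be sketched as follows; `withSystemProperty` is an illustrative helper written for this sketch, not the actual Spark test code:

```scala
// Minimal sketch of restoring a JVM system property (e.g. "os.arch") after a
// test forces it to a specific value. Assumption: helper name and structure
// are illustrative, not taken from the Spark test suites.
def withSystemProperty[T](key: String, value: String)(body: => T): T = {
  val original = System.getProperty(key) // may be null if the key was unset
  System.setProperty(key, value)
  try body
  finally {
    // Restore the previous state exactly: clear if it was unset, else reset.
    if (original == null) System.clearProperty(key)
    else System.setProperty(key, original)
  }
}

val before = System.getProperty("os.arch")
withSystemProperty("os.arch", "amd64") {
  assert(System.getProperty("os.arch") == "amd64")
}
// After the block, the original value is back.
assert(System.getProperty("os.arch") == before)
```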
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10289#issuecomment-164779119
@srowen, OK, I will correct this pattern everywhere.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10289#issuecomment-164802842
There are two potential bugs:
1. A superclass method is not called in `beforeEach()`,
`afterEach()`, `beforeAll()`, and `afterAll()`
2. Although
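Bug 1 above (a subclass overriding a setup hook without chaining to the superclass) can be sketched as follows; the suite names and the `log` buffer are illustrative stand-ins, not ScalaTest's real API:

```scala
import scala.collection.mutable.ArrayBuffer

// Records which setup steps actually ran.
val log = ArrayBuffer[String]()

trait BaseSuite {
  def beforeAll(): Unit = { log += "base setup" }
}

// Buggy: overrides beforeAll() without calling super, so the base
// class's setup is silently skipped.
class BuggySuite extends BaseSuite {
  override def beforeAll(): Unit = { log += "child setup" }
}

// Fixed: chains to the superclass first, then adds its own setup.
class FixedSuite extends BaseSuite {
  override def beforeAll(): Unit = { super.beforeAll(); log += "child setup" }
}

log.clear(); new BuggySuite().beforeAll()
assert(log.toList == List("child setup"))            // base setup missing
log.clear(); new FixedSuite().beforeAll()
assert(log.toList == List("base setup", "child setup"))
```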
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10524#issuecomment-168571357
I see. Sounds good. I will reformat them.
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10524#issuecomment-168872227
Jenkins, retest this please
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10524#issuecomment-168127380
@yhuai, I think that it will work. How many characters per line do you
think are preferable?
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10488#issuecomment-167848836
@yhuai, do you mean that I should update all of the string concatenations in
@ExpressionDescription to use multi-line string literals, rather than only the
original one
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10488#issuecomment-167853136
I see. I will create another JIRA entry to update other usages.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10589#discussion_r48960725
--- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10589#discussion_r48922143
--- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10628#discussion_r49050418
--- Diff:
sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
---
@@ -0,0 +1,162 @@
+package
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10524#issuecomment-168038054
Jenkins, retest this please.
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10593#discussion_r48920875
--- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10589#discussion_r48921242
--- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala ---
@@ -0,0 +1,102 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10289#discussion_r48209450
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala ---
@@ -43,12 +45,21 @@ import org.apache.spark.util.Utils
class
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10289#discussion_r48209783
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala ---
@@ -116,7 +120,7 @@ abstract class BaseYarnClusterSuite
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10289#issuecomment-166765714
Can I retest this? A timeout may have occurred in pyspark.
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/10488
[SPARK-12530][Build] Fix build break at Spark-Master-Maven-Snapshots from
#1293
A compilation error is caused by string concatenations that are not
constant.
Use a raw string literal to avoid
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10488#issuecomment-167418794
Thanks for letting me know. I will check them.
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10488#issuecomment-167424111
@hvanhovell, the following two conditions together cause this compilation
failure:
1. Using more than one string concatenation
2. Using the Scala 2.11 compiler (open
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10488#discussion_r48472393
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -57,9 +57,10 @@ case class Md5(child: Expression) extends
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10488#discussion_r48498744
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -57,9 +57,10 @@ case class Md5(child: Expression) extends
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10488#discussion_r48500799
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -57,9 +57,10 @@ case class Md5(child: Expression) extends
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10289#discussion_r48238449
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala ---
@@ -116,7 +120,7 @@ abstract class BaseYarnClusterSuite
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10289#discussion_r48238546
--- Diff:
yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala ---
@@ -43,12 +45,21 @@ import org.apache.spark.util.Utils
class
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/10463
[SPARK-12502][Build][Python] Script /dev/run-tests fails when IBM Java is
used
Fix an exception with the IBM JDK by removing the update field from a
JavaVersion tuple. This is because the IBM JDK does not have
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10289#issuecomment-165960572
I will fix this merge issue this weekend.
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/10524
[SPARK-12580][SQL] Remove string concatenations from usage and extended in
@ExpressionDescription
Use multi-line string literals for @ExpressionDescription with ``//
scalastyle:off line.size.limit
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10524#issuecomment-167957361
@yhuai, are these changes fine with the policy that you proposed? I would
appreciate it if you would check them.
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/10524#issuecomment-167996508
On Jenkins with Scala 2.11, we can wrap only once in a source file (i.e. only
one string concatenation). Using more than one concatenation causes a
compilation error
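A hedged sketch of the workaround discussed in this thread: replace several constant concatenations in annotation arguments with one multi-line string literal. `ExpressionDescription` below is a simplified stand-in for Spark's real annotation, and the usage text is illustrative:

```scala
// Simplified stand-in for Spark's @ExpressionDescription annotation
// (the real one lives in org.apache.spark.sql.catalyst.expressions).
class ExpressionDescription(usage: String, extended: String)
  extends scala.annotation.StaticAnnotation

// Fragile under Scala 2.11 (per the thread): several "+" concatenations
// in annotation arguments could trip the compilation failure, e.g.
//   usage = "_FUNC_(a) - line one " + "line two " + "line three"

// Robust: a single multi-line string literal, no concatenation at all.
// scalastyle:off line.size.limit
@ExpressionDescription(
  usage = "_FUNC_(expr) - Returns an MD5 128-bit checksum as a hex string of expr.",
  extended = """
    > SELECT _FUNC_('Spark');
     '8cde774d6f7333752ed72cacddb05126'
  """)
case class Md5Like(child: Any)
// scalastyle:on line.size.limit

assert(Md5Like("x").child == "x")
```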
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10488#discussion_r48534614
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -57,9 +57,10 @@ case class Md5(child: Expression) extends
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/10488#discussion_r48534869
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -57,9 +57,10 @@ case class Md5(child: Expression) extends
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13505#discussion_r65811458
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
---
@@ -86,11 +86,31 @@ package object expressions
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13439
Oh, you are right. IMHO, it is too complex to introduce new implementation
classes only for a column vector with the same value in all of the rows.
To introduce compression schemes, as implemented
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13539
test this please
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13439
Can we implement this feature simply by
- using ```ColumnarBatch.allocate(StructType schema, MemoryMode memMode,
int maxRows)``` with ```maxRows=1```
- not introducing
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/11301#discussion_r66726703
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala ---
@@ -37,6 +39,16 @@ private[sql] object Column {
def apply(expr: Expression
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13589#discussion_r66568805
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala
---
@@ -490,6 +490,7 @@ class CodegenContext
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13505#discussion_r65804276
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
---
@@ -86,11 +86,31 @@ package object expressions
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/13663
[SPARK-15950] Eliminate unreachable code at projection for
complex types
## What changes were proposed in this pull request?
This PR eliminates unreachable code at projection
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13663
@cloud-fan and @davies , thank you for your comments.
A global variable is used for 1. I will address 1. in another PR, using an
approach that does not require a global variable. This PR will focus
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/13680
[SPARK-15962][SQL] Introduce additional implementation with a dense format
for UnsafeArrayData
## What changes were proposed in this pull request?
This PR introduces two implementations
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67209280
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -60,18 +60,24 @@ case class CreateArray
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13505#discussion_r65804344
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala
---
@@ -86,11 +86,31 @@ package object expressions
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13472#discussion_r65804523
--- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala ---
@@ -33,18 +38,37 @@ import org.apache.commons.lang3.SystemUtils
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13472#discussion_r65804521
--- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala ---
@@ -33,18 +38,37 @@ import org.apache.commons.lang3.SystemUtils
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13439
Can we have a benchmark program to show performance improvement?
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13459
Jenkins, test this please
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67849394
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -26,6 +27,55 @@ import
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67849962
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +164,415 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67850981
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +164,415 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67993179
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -23,7 +23,60 @@ import
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13758
@hvanhovell , I added [a
file](https://github.com/kiszk/spark/blob/133d4c0085b5ca2f20870c05d077e25d8715e07a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13758
@hvanhovell, yes, it is a good idea. Actually, I wrote a benchmark program
```org.apache.spark.sql.catalyst.util.GenericArrayBenchmark``` (not committed
yet). An issue in my environment is that I
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67996014
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala
---
@@ -112,8 +112,8 @@ class CodeGenerationSuite
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67995936
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +196,414 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67995993
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +196,414 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67997326
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +164,415 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r68004874
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayDataBenchmark.scala
---
@@ -0,0 +1,188 @@
+/*
+ * Licensed
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r68004837
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala
---
@@ -159,17 +159,17 @@ object CatalystTypeConverters
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r68007482
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +164,415 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r67994789
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +196,414 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r68015184
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayDataBenchmark.scala
---
@@ -0,0 +1,188 @@
+/*
+ * Licensed
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13758#discussion_r68039256
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala
---
@@ -142,3 +196,414 @@ class GenericArrayData(val array
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67315501
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -26,6 +26,38 @@ import
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/13704
[SPARK-15985][SQL] Reduce runtime overhead of a program that reads a
primitive array in Dataset
## What changes were proposed in this pull request?
This PR reduces runtime overhead
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67315144
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -71,7 +71,8 @@ case class CreateArray
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67213607
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -60,18 +60,24 @@ case class CreateArray
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67863392
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -26,6 +27,55 @@ import
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13663
@cloud-fan , @davies , I would appreciate it if you would look at this
again. I think that I had addressed all of your comments.
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/13758
[SPARK-16043][SQL] Prepare GenericArrayData implementation specialized for
a primitive array
## What changes were proposed in this pull request?
This PR addresses a ToDo
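The specialization idea can be sketched as below; the class names are illustrative, not the actual Spark implementation. The point is that a primitive-backed variant reads elements without boxing:

```scala
// Illustrative sketch (assumption: simplified stand-ins, not Spark's
// GenericArrayData) of specializing array data for a primitive element type.
trait ArrayDataLike {
  def numElements(): Int
  def getInt(ordinal: Int): Int
}

// Generic version: every element lives boxed in Array[Any], so each read
// pays an unboxing cast.
class BoxedArrayData(values: Array[Any]) extends ArrayDataLike {
  def numElements(): Int = values.length
  def getInt(ordinal: Int): Int = values(ordinal).asInstanceOf[Int]
}

// Specialized version: backed directly by Array[Int], no boxing at all.
class IntArrayData(values: Array[Int]) extends ArrayDataLike {
  def numElements(): Int = values.length
  def getInt(ordinal: Int): Int = values(ordinal)
}

val a: ArrayDataLike = new IntArrayData(Array(1, 2, 3))
assert((0 until a.numElements()).map(a.getInt).sum == 6)
```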
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/13757
[SPARK-16042][SQL] Eliminate nullcheck code at projection for an array type
## What changes were proposed in this pull request?
This PR eliminates nullcheck code at projection for an array
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
It is OK to always keep ```[null bits]```.
One question: does this format keep fixed space for ```[values]```? I mean,
if ```[null bit]``` is true, the corresponding element in ```[value
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
Good to hear. I will make an implementation with a single format. If I
meet any issues, I will raise them here.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
@cloud-fan thank you for your good comment. I also read the [previous
proposal](https://github.com/apache/spark/pull/12640#discussion_r61539393).
I would love to have only a single format (or implementation
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13758
@cloud-fan , thank you for your comment. About conversion between a primitive
array and an unsafe array, I think we are on the same page. The motivation of
my recent PRs is to reduce the overhead of changing
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
@cloud-fan , I have one question about the null field. Should we put zero into
the field at the position where ```setNullAt()``` is called, as
```UnsafeRow```
[does](https://github.com/apache
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
One potential performance issue is that we always have to clear all of the
null bits at ```UnsafeArrayWriter.initialize()```. This is because
```holder.buffer``` is reused for each row. If one row has
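The clearing cost described above can be sketched like this; `NullBits` is an illustrative stand-in for the null-bit region of `UnsafeArrayWriter`'s reused buffer, not the real class:

```scala
// Sketch of per-row null-bit clearing on a reused buffer. Assumption:
// names and layout are illustrative, not Spark's UnsafeArrayWriter.
class NullBits(numElements: Int) {
  // One bit per element, packed into 64-bit words.
  private val words = new Array[Long]((numElements + 63) / 64)

  // The cost the comment describes: because the buffer is reused for each
  // row, every initialize() must zero the whole null-bit region.
  def initialize(): Unit = java.util.Arrays.fill(words, 0L)

  def setNull(i: Int): Unit = words(i / 64) |= (1L << (i % 64))
  def isNull(i: Int): Boolean = (words(i / 64) & (1L << (i % 64))) != 0
}

val bits = new NullBits(100)
bits.setNull(70)
assert(bits.isNull(70))
bits.initialize()          // reuse the buffer for the next row
assert(!bits.isNull(70))   // stale bit from the previous row is gone
```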
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67386476
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -60,18 +60,24 @@ case class CreateArray
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13663#discussion_r67272226
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -60,18 +60,24 @@ case class CreateArray
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
I see. I assumed that the virtual call would be devirtualized by declaring the
method ```final``` and by the JIT compiler optimistically propagating type
information. Would it be better to add a flag like
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13758
@hvanhovell , @cloud-fan , could you review this?
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
@cloud-fan , for the first issue, we are on the same page; your proposal is
one of the possible solutions I was considering. I will do that.
For the second issue, it seems to be a design choice
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13704
@cloud-fan I think ```value#63``` is ```UnsafeArrayData```.
When I ran a DataFrame program, I got the following trees. Since DataFrame
operations access data in UnsafeArrayData, I think
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13899
@andrewor14 Looks interesting.
I created two PRs that generate code similar to [your
code](https://gist.github.com/andrewor14/7ce4c37a3c6bcd5cc2b6b16c861859e9). My
PRs use current
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13758
Jenkins, retest this please
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/13909
[SPARK-16213][SQL] Reduce runtime overhead of a program that creates a
primitive array in DataFrame
## What changes were proposed in this pull request?
This PR reduces runtime overhead
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
@rxin thank you for your comment. As you said, holistic view is important.
This PR is not only for machine learning.
This PR has another use case for improving projection of an array in any
GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/13911
[SPARK-16215][SQL] Reduce runtime overhead of a program that writes a
primitive array in DataFrame/Dataset
## What changes were proposed in this pull request?
This PR optimizes generated
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13680
@cloud-fan and @hvanhovell thank you for your comments. Based on your
comments, I implemented ```UnsafeArrayData``` as one implementation with
explicit clearing of ```null bits
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13663
@cloud-fan , @davies , I would appreciate it if you would look at this
again.
Github user kiszk commented on the issue:
https://github.com/apache/spark/pull/13758
@hvanhovell and @cloud-fan This ```GenericArrayData``` can be used in
generated code for a program with a primitive array written in DataFrame or
Dataset. I newly added a DataFrame benchmark program
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13909#discussion_r68519480
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -26,6 +26,20 @@ import
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13909#discussion_r68519507
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala ---
@@ -26,6 +26,20 @@ import
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13909#discussion_r68519430
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -51,27 +51,52 @@ case class CreateArray
Github user kiszk commented on a diff in the pull request:
https://github.com/apache/spark/pull/13909#discussion_r68519450
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
---
@@ -51,27 +51,52 @@ case class CreateArray
Github user kiszk commented on the pull request:
https://github.com/apache/spark/pull/11146#issuecomment-182347108
@nongli Is there some kind of design doc on the ColumnarBatch? I am
planning to make PRs for columnar storage and its computations with
DataFrame/Dataset.
We
1 - 100 of 3561 matches