[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-17 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10289#discussion_r47920442 --- Diff: core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala --- @@ -235,7 +235,7 @@ class ExternalSorterSuite extends

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-17 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10289#discussion_r47922285 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/util/MLlibTestSparkContext.scala --- @@ -38,12 +38,15 @@ trait MLlibTestSparkContext extends

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-17 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10289#discussion_r47920869 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -116,7 +120,7 @@ abstract class BaseYarnClusterSuite

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-13 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/10289 [SPARK-12311][CORE] Restore previous value of "os.arch" property in test suites after forcing to set specific value to "os.arch" property Restore the original value of os.arch

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-15 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10289#issuecomment-164779119 @srowen, OK, I will correct this pattern everywhere. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-15 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10289#issuecomment-164802842 There are two potential bugs: 1. A method of the super class is not called in `beforeEach()`, `afterEach()`, `beforeAll()`, and `afterAll()` 2. Although
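The two concerns raised in this thread (restoring a property that a test forces to a specific value, and calling the super class fixture methods) can be sketched as a language-neutral pattern. The following is a minimal Python analogue, not the Scala code from the pull request: `PROPERTIES` is a hypothetical stand-in for JVM system properties, and `BaseSuite` and `ArchSuite` are illustrative names.

```python
import unittest

# Hypothetical stand-in for JVM system properties; purely illustrative.
PROPERTIES = {"os.arch": "amd64"}


class BaseSuite(unittest.TestCase):
    """Parent fixture whose setUp/tearDown must not be skipped."""

    def setUp(self):
        super().setUp()
        self.base_ready = True

    def tearDown(self):
        self.base_ready = False
        super().tearDown()


class ArchSuite(BaseSuite):
    def setUp(self):
        super().setUp()  # run the parent fixture first
        # Remember the previous value before forcing a specific one.
        self._saved_arch = PROPERTIES.get("os.arch")
        PROPERTIES["os.arch"] = "fake-arch"

    def tearDown(self):
        try:
            # Restore the previous value, or remove the key if it was unset.
            if self._saved_arch is None:
                PROPERTIES.pop("os.arch", None)
            else:
                PROPERTIES["os.arch"] = self._saved_arch
        finally:
            super().tearDown()  # always run the parent cleanup

    def test_forced_arch(self):
        self.assertTrue(self.base_ready)
        self.assertEqual(PROPERTIES["os.arch"], "fake-arch")
```

The `try`/`finally` in `tearDown` is a design choice in this sketch: the parent cleanup still runs even if restoring the property raises.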

[GitHub] spark pull request: [SPARK-12580][SQL] Remove string concatenation...

2016-01-03 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10524#issuecomment-168571357 I see. Sounds good. I will reformat them.

[GitHub] spark pull request: [SPARK-12580][SQL] Remove string concatenation...

2016-01-04 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10524#issuecomment-168872227 Jenkins, retest this please

[GitHub] spark pull request: [SPARK-12580][SQL] Remove string concatenation...

2015-12-30 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10524#issuecomment-168127380 @yhuai, I think that I will work. How many characters do you think are preferable to allow on a line?

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-29 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167848836 @yhuai, do you mean that I should update all of the string concatenations in @ExpressionDescription by using multi-line string literals rather than only the original one

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-29 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167853136 I see. I will create another JIRA entry to update other usages.

[GitHub] spark pull request: [SPARK-12640][SQL] Add simple benchmarking uti...

2016-01-06 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10589#discussion_r48960725 --- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request: [SPARK-12640][SQL] Add simple benchmarking uti...

2016-01-05 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10589#discussion_r48922143 --- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request: [SPARK-12635][SQL] Add ColumnarBatch, an in me...

2016-01-07 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10628#discussion_r49050418 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java --- @@ -0,0 +1,162 @@ +package

[GitHub] spark pull request: [SPARK-12580][SQL] Remove string concatenation...

2015-12-30 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10524#issuecomment-168038054 Jenkins, retest this please.

[GitHub] spark pull request: [SPARK-12644][SQL] Update parquet reader to be...

2016-01-05 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10593#discussion_r48920875 --- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request: [SPARK-12640][SQL] Add simple benchmarking uti...

2016-01-05 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10589#discussion_r48921242 --- Diff: core/src/test/scala/org/apache/spark/Benchmark.scala --- @@ -0,0 +1,102 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10289#discussion_r48209450 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -43,12 +45,21 @@ import org.apache.spark.util.Utils class

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10289#discussion_r48209783 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -116,7 +120,7 @@ abstract class BaseYarnClusterSuite

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-22 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10289#issuecomment-166765714 Can I retest this? Timeout may occur in pyspark.

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-27 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/10488 [SPARK-12530][Build] Fix build break at Spark-Master-Maven-Snapshots from #1293 Compilation error caused by string concatenations that are not constant. Use raw string literal to avoid

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-27 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167418794 Thanks for letting me know about them. I will check them.

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-27 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10488#issuecomment-167424111 @hvanhovell, the combination of the following two conditions causes this compilation failure: 1. Use more than one string concatenation 2. Use the Scala 2.11 compiler (open

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-28 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48472393 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-28 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48498744 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-28 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48500799 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10289#discussion_r48238449 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/BaseYarnClusterSuite.scala --- @@ -116,7 +120,7 @@ abstract class BaseYarnClusterSuite

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10289#discussion_r48238546 --- Diff: yarn/src/test/scala/org/apache/spark/deploy/yarn/ClientSuite.scala --- @@ -43,12 +45,21 @@ import org.apache.spark.util.Utils class

[GitHub] spark pull request: [SPARK-12502][Build][Python] Script /dev/run-t...

2015-12-23 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/10463 [SPARK-12502][Build][Python] Script /dev/run-tests fails when IBM Java is used Fix an exception with IBM JDK by removing the update field from a JavaVersion tuple. This is because IBM JDK does not have
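A minimal sketch of the idea described in this pull request: parse a Java version without relying on an update field. This is not the actual `/dev/run-tests` code. The function name `parse_java_version`, the three-field `JavaVersion` tuple, and the sample version strings are assumptions made for illustration, including the assumption that some version strings carry no `_NN` update suffix.

```python
import re
from collections import namedtuple

# Hypothetical tuple for illustration: it has no `update` field.
JavaVersion = namedtuple("JavaVersion", ["major", "minor", "patch"])


def parse_java_version(version_output):
    """Extract major.minor.patch from `java -version`-style text.

    Any trailing update suffix such as `_66` is ignored, so strings
    with and without it parse to the same tuple shape.
    """
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % version_output)
    return JavaVersion(*(int(part) for part in match.groups()))
```

Because the pattern never requires the suffix, a caller comparing versions only needs the three fields that every matching string provides.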

[GitHub] spark pull request: [SPARK-12311][CORE] Restore previous value of ...

2015-12-19 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10289#issuecomment-165960572 I will fix this merge issue this weekend.

[GitHub] spark pull request: [SPARK-12580][SQL] Remove string concatenation...

2015-12-30 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/10524 [SPARK-12580][SQL] Remove string concatenations from usage and extended in @ExpressionDescription Use multi-line string literals for @ExpressionDescription with ``// scalastyle:off line.size.limit

[GitHub] spark pull request: [SPARK-12580][SQL] Remove string concatenation...

2015-12-30 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10524#issuecomment-167957361 @yhuai, are these changes fine with the policy that you proposed? I would appreciate it if you would check them.

[GitHub] spark pull request: [SPARK-12580][SQL] Remove string concatenation...

2015-12-30 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/10524#issuecomment-167996508 On Jenkins with scala-2.11, we can wrap once in a source file (i.e. only one string concatenation). Using more than one concatenation causes a compilation error

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-29 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48534614 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends

[GitHub] spark pull request: [SPARK-12530][Build] Fix build break at Spark-...

2015-12-29 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/10488#discussion_r48534869 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -57,9 +57,10 @@ case class Md5(child: Expression) extends

[GitHub] spark pull request #13505: [SPARK-15764][SQL] Replace N^2 loop in BindRefere...

2016-06-04 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13505#discussion_r65811458 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala --- @@ -86,11 +86,31 @@ package object expressions

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-06 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13439 Oh, you are right. IMHO, it is too complex to introduce new implementation classes only for a column vector with the same value in all of the rows. To introduce compression schemes, as implemented

[GitHub] spark issue #13539: [SPARK-15795] [SQL] Enable more optimizations in whole s...

2016-06-07 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13539 test this please

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-06 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13439 Can we implement this feature simply by - using ```ColumnarBatch.allocate(StructType schema, MemoryMode memMode, int maxRows)``` with ```maxRows=1``` - not introducing

[GitHub] spark pull request #11301: [SPARK-13432][SQL] add the source file name and l...

2016-06-12 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/11301#discussion_r66726703 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Column.scala --- @@ -37,6 +39,16 @@ private[sql] object Column { def apply(expr: Expression

[GitHub] spark pull request #13589: [SPARK-15822][SPARK-15825][SQL] Fix SMJ Segfault/...

2016-06-10 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13589#discussion_r66568805 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -490,6 +490,7 @@ class CodegenContext

[GitHub] spark pull request #13505: [SPARK-15764][SQL] Replace N^2 loop in BindRefere...

2016-06-04 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13505#discussion_r65804276 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala --- @@ -86,11 +86,31 @@ package object expressions

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code...

2016-06-14 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/13663 [SPARK-15950][SQL] Eliminate unreachable code at projection for complex types ## What changes were proposed in this pull request? This PR eliminates unreachable code at projection

[GitHub] spark issue #13663: [SPARK-15950][SQL] Eliminate unreachable code at project...

2016-06-14 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13663 @cloud-fan and @davies , thank you for your comments. A global variable is used for 1. I will address 1. by using another approach without using a global variable in another PR. This PR will focus

[GitHub] spark pull request #13680: [SPARK-15962][SQL] Introduce additonal implementa...

2016-06-15 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/13680 [SPARK-15962][SQL] Introduce additonal implementation with a dense format for UnsafeArrayData ## What changes were proposed in this pull request? This PR introduces two implementations

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-15 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67209280 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -60,18 +60,24 @@ case class CreateArray

[GitHub] spark pull request #13505: [SPARK-15764][SQL] Replace N^2 loop in BindRefere...

2016-06-04 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13505#discussion_r65804344 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/package.scala --- @@ -86,11 +86,31 @@ package object expressions

[GitHub] spark pull request #13472: [SPARK-15735] Allow specifying min time to run in...

2016-06-04 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13472#discussion_r65804523 --- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala --- @@ -33,18 +38,37 @@ import org.apache.commons.lang3.SystemUtils

[GitHub] spark pull request #13472: [SPARK-15735] Allow specifying min time to run in...

2016-06-04 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13472#discussion_r65804521 --- Diff: core/src/main/scala/org/apache/spark/util/Benchmark.scala --- @@ -33,18 +38,37 @@ import org.apache.commons.lang3.SystemUtils

[GitHub] spark issue #13439: [SPARK-15701][SQL] Constant ColumnVector only needs to p...

2016-06-01 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13439 Can we have a benchmark program to show performance improvement?

[GitHub] spark issue #13459: [SPARK-15726] [SQL] Make DatasetBenchmark fairer among D...

2016-06-01 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13459 Jenkins, test this please

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67849394 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala --- @@ -26,6 +27,55 @@ import

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67849962 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +164,415 @@ class GenericArrayData(val array

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67850981 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +164,415 @@ class GenericArrayData(val array

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67993179 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -23,7 +23,60 @@ import

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-21 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 @hvanhovell , I added [a file](https://github.com/kiszk/spark/blob/133d4c0085b5ca2f20870c05d077e25d8715e07a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-21 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 @hvanhovell, yes, it is good idea. Actually, I wrote a benchmark program ```org.apache.spark.sql.catalyst.util.GenericArrayBenchmark``` (not committed yet). An issue in my environment is that I

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67996014 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -112,8 +112,8 @@ class CodeGenerationSuite

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67995936 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +196,414 @@ class GenericArrayData(val array

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67995993 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +196,414 @@ class GenericArrayData(val array

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67997326 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +164,415 @@ class GenericArrayData(val array

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r68004874 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayDataBenchmark.scala --- @@ -0,0 +1,188 @@ +/* + * Licensed

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r68004837 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala --- @@ -159,17 +159,17 @@ object CatalystTypeConverters

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r68007482 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +164,415 @@ class GenericArrayData(val array

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r67994789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +196,414 @@ class GenericArrayData(val array

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r68015184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayDataBenchmark.scala --- @@ -0,0 +1,188 @@ +/* + * Licensed

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-22 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13758#discussion_r68039256 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/GenericArrayData.scala --- @@ -142,3 +196,414 @@ class GenericArrayData(val array

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-16 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67315501 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala --- @@ -26,6 +26,38 @@ import

[GitHub] spark pull request #13704: [SPARK-15985][SQL] Reduce runtime overhead of a p...

2016-06-16 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/13704 [SPARK-15985][SQL] Reduce runtime overhead of a program that reads a primitive array in Dataset ## What changes were proposed in this pull request? This PR reduces runtime overhead

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-16 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67315144 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -71,7 +71,8 @@ case class CreateArray

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-15 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67213607 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -60,18 +60,24 @@ case class CreateArray

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-21 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67863392 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala --- @@ -26,6 +27,55 @@ import

[GitHub] spark issue #13663: [SPARK-15950][SQL] Eliminate unreachable code at project...

2016-06-18 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13663 @cloud-fan , @davies , I would appreciate it if you would look at this again. I think that I have addressed all of your comments.

[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-06-18 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/13758 [SPARK-16043][SQL] Prepare GenericArrayData implementation specialized for a primitive array ## What changes were proposed in this pull request? This PR addresses a ToDo

[GitHub] spark pull request #13757: [SPARK-16042][SQL] Eliminate nullcheck code at pr...

2016-06-18 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/13757 [SPARK-16042][SQL] Eliminate nullcheck code at projection for an array type ## What changes were proposed in this pull request? This PR eliminates nullcheck code at projection for an array

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...

2016-06-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 It is OK to always keep ```[null bits]```. One question: Does this format keep fixed space for ```[values]```? I mean if ```[null bit]``` is true, the corresponding element in ```[value

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...

2016-06-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 Good to hear. I will implement a single format. If I run into any issues, I will raise them here.

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...

2016-06-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 @cloud-fan thank you for your good comment. I also read the [previous proposal](https://github.com/apache/spark/pull/12640#discussion_r61539393). I would love to have only a single format (or implementation

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-22 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 @cloud-fan , thank you for your comment. About conversion between primitive array and unsafe array, I think we are on the same page. The motivation of my recent PRs is to reduce overhead to change

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...

2016-06-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 @cloud-fan , I have one question about the null field. Should we put zero into the field at the position where ```setNullAt()``` is called, as ```UnsafeRow``` [does](https://github.com/apache

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...

2016-06-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 One potential performance issue is that we always have to clear all of the null bits at ```UnsafeArrayWriter.initialize()```. This is because ```holder.buffer``` is reused for each row. If one row has
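Why a reused buffer forces the null bits to be cleared can be sketched with a toy bitmap. This is a Python illustration under stated assumptions, not Spark's `UnsafeArrayWriter`: the one-bit-per-element layout starting at byte 0 and the helper names `initialize`, `set_null_at`, and `is_null_at` are hypothetical.

```python
def null_bytes_for(num_elements):
    """Bytes needed for one null bit per element (assumed layout)."""
    return (num_elements + 7) // 8


def initialize(buffer, num_elements):
    """Clear every null bit; needed because `buffer` is reused across rows."""
    for i in range(null_bytes_for(num_elements)):
        buffer[i] = 0


def set_null_at(buffer, index):
    """Mark element `index` as null."""
    buffer[index // 8] |= 1 << (index % 8)


def is_null_at(buffer, index):
    """Return True if element `index` is marked null."""
    return bool((buffer[index // 8] >> (index % 8)) & 1)
```

In this sketch, if one row marks element 3 as null and the next row skips `initialize`, `is_null_at(buffer, 3)` still reports `True` for the new row. The clearing loop costs time proportional to the number of null-bit bytes on every row, which is the overhead the comment points at.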

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-16 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67386476 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -60,18 +60,24 @@ case class CreateArray

[GitHub] spark pull request #13663: [SPARK-15950][SQL] Eliminate unreachable code at ...

2016-06-15 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13663#discussion_r67272226 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -60,18 +60,24 @@ case class CreateArray

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additonal implementation wi...

2016-06-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 I see. I assumed that the virtual call would be devirtualized by declaring the method ```final``` and by optimistically propagating type information in the JIT compiler. Would it be better to add a flag like

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 @hvanhovell , @cloud-fan , could you review this?

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce additional implementation wi...

2016-06-23 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 @cloud-fan , for the first issue, we are on the same page. Your proposal is one of the possible solutions I have been considering. I will do that. For the second issue, it seems to be a design choice

[GitHub] spark issue #13704: [SPARK-15985][SQL] Reduce runtime overhead of a program ...

2016-06-27 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13704 @cloud-fan I think ```value#63``` is ```UnsafeArrayData```. When I ran a DataFrame program, I got the following trees. Since operations for DataFrame access data in UnsafeArrayData, I think

[GitHub] spark issue #13899: [SPARK-16196][SQL] Codegen in-memory scan with ColumnarB...

2016-06-25 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13899 @andrewor14 Looks interesting. I created two PRs that generate code similar to [your code](https://gist.github.com/andrewor14/7ce4c37a3c6bcd5cc2b6b16c861859e9). My PRs use current

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 Jenkins, retest this please

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-25 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/13909 [SPARK-16213][SQL] Reduce runtime overhead of a program that creates a primitive array in DataFrame ## What changes were proposed in this pull request? This PR reduces runtime overhead

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-26 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 @rxin thank you for your comment. As you said, a holistic view is important. This PR is not only for machine learning. It has another use case: improving projection of an array in any

[GitHub] spark pull request #13911: [SPARK-16215][SQL] Reduce runtime overhead of a p...

2016-06-25 Thread kiszk
GitHub user kiszk opened a pull request: https://github.com/apache/spark/pull/13911 [SPARK-16215][SQL] Reduce runtime overhead of a program that writes a primitive array in DataFrame/Dataset ## What changes were proposed in this pull request? This PR optimizes generate

[GitHub] spark issue #13680: [SPARK-15962][SQL] Introduce implementation with a dense...

2016-06-26 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13680 @cloud-fan and @hvanhovell thank you for your comments. Based on your comments, I implemented ```UnsafeArrayData``` by using one implementation with explicit clearing ```null bits

[GitHub] spark issue #13663: [SPARK-15950][SQL] Eliminate unreachable code at project...

2016-06-24 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13663 @cloud-fan , @davies , I would appreciate it if you would look at this again.

[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-06-25 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/13758 @hvanhovell and @cloud-fan This ```GenericArrayData``` can be used in generated code for a program with a primitive array written in DataFrame or Dataset. I added a new DataFrame benchmark program

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r68519480 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala --- @@ -26,6 +26,20 @@ import

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r68519507 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameComplexTypeSuite.scala --- @@ -26,6 +26,20 @@ import

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r68519430 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -51,27 +51,52 @@ case class CreateArray

[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-06-26 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/13909#discussion_r68519450 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -51,27 +51,52 @@ case class CreateArray

[GitHub] spark pull request: [SPARK-13255][SQL] Integrate vectorized parque...

2016-02-10 Thread kiszk
Github user kiszk commented on the pull request: https://github.com/apache/spark/pull/11146#issuecomment-182347108 @nongli Is there some kind of design doc on the ColumnarBatch? I am planning to make PRs for columnar storage and its computations with DataFrame/Dataset. We
