[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83761710 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java --- @@ -0,0 +1,318 @@ +/* + * Licensed

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83757435 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java --- @@ -0,0 +1,318 @@ +/* + * Licensed

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83756988 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java --- @@ -0,0 +1,189 @@ +/* + * Licensed

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83753744 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java --- @@ -0,0 +1,189 @@ +/* + * Licensed

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83759570 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java --- @@ -0,0 +1,318 @@ +/* + * Licensed

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83752300 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java --- @@ -0,0 +1,189 @@ +/* + * Licensed

[GitHub] spark pull request #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/13775#discussion_r83757105 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java --- @@ -0,0 +1,189 @@ +/* + * Licensed

[GitHub] spark issue #13775: [SPARK-16060][SQL] Vectorized Orc reader

2016-10-17 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/13775 Earlier this year I spent some time trying out Presto's ORC reader with Spark. In a standalone benchmark, Presto's ORC reader is 3x faster than the one in Hive. My experimental

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-10-16 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes

[GitHub] spark issue #15272: [SPARK-17698] [SQL] Join predicates should not contain f...

2016-10-15 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15272 @cloud-fan + @rxin : Fixed the test case. Ready for review.

[GitHub] spark issue #15272: [SPARK-17698] [SQL] Join predicates should not contain f...

2016-10-15 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15272 Jenkins test this please

[GitHub] spark issue #15272: [SPARK-17698] [SQL] Join predicates should not contain f...

2016-10-13 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15272 @rxin : Yes. I looked at it but could not find the root cause. I have been busy with other stuff so could not invest more time. I plan to get this fixed over the weekend.

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-10-13 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 @rxin : Yes. I think I know why it's happening and will get back with a fix over the weekend.

[GitHub] spark issue #15272: [SPARK-17698] [SQL] Join predicates should not contain f...

2016-10-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15272 Jenkins test this please

[GitHub] spark pull request #15272: [SPARK-17698] [SQL] Join predicates should not co...

2016-10-10 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15272#discussion_r82716898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -88,7 +88,7 @@ trait PredicateHelper

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-10-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 jenkins test this please. The failed test from the earlier run was in KafkaSourceStressSuite, which I don't see as related to this PR.

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-10-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 Jenkins test this please

[GitHub] spark issue #15300: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2016-10-06 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15300 @hvanhovell , @cloud-fan : Can you please review this PR ?

[GitHub] spark issue #15300: [SPARK-17729] [SQL] Enable creating hive bucketed tables

2016-09-29 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15300 cc @hvanhovell , @cloud-fan for review

[GitHub] spark pull request #15300: [SPARK-17729] [SQL] Enable creating hive bucketed...

2016-09-29 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15300 [SPARK-17729] [SQL] Enable creating hive bucketed tables ## What changes were proposed in this pull request? Hive allows inserting data into a bucketed table without guaranteeing bucketed

[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...

2016-09-29 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15047 jenkins retest this please

[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...

2016-09-28 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15047 jenkins retest this please

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-28 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r81014894 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/HashByteArrayBenchmark.scala --- @@ -59,90 +59,110 @@ object HashByteArrayBenchmark

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-28 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r80848744 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/HiveHasher.java --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-28 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r80848767 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -276,6 +276,97 @@ abstract class HashExpression[E

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-28 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r80848722 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/hash/HiveHasher.java --- @@ -0,0 +1,66 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-28 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r80848979 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -559,3 +607,219 @@ case class CurrentDatabase

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-28 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r80848863 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -559,3 +607,219 @@ case class CurrentDatabase

[GitHub] spark pull request #15272: [SPARK-17698] [SQL] Join predicates should not co...

2016-09-27 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15272 [SPARK-17698] [SQL] Join predicates should not contain filter clauses ## What changes were proposed in this pull request? Jira : https://issues.apache.org/jira/browse/SPARK-17698

[GitHub] spark pull request #15229: [SPARK-17654] [SQL] Propagate bucketing informati...

2016-09-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15229 [SPARK-17654] [SQL] Propagate bucketing information for Hive tables to / from Catalog ## What changes were proposed in this pull request? Currently Spark does not respect bucketing

[GitHub] spark pull request #15228: [SPARK-17654] [SQL] Propagate bucketing informati...

2016-09-23 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/15228

[GitHub] spark pull request #15228: [SPARK-17654] [SQL] Propagate bucketing informati...

2016-09-23 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15228 [SPARK-17654] [SQL] Propagate bucketing information for Hive tables to / from Catalog ## What changes were proposed in this pull request? Currently Spark does not respect bucketing

[GitHub] spark pull request #15226: [SPARK-17649][CORE] Log how many Spark events got...

2016-09-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15226#discussion_r80350179 --- Diff: core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala --- @@ -117,6 +124,24 @@ private[spark] abstract class

[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...

2016-09-19 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15047 @hvanhovell Done with all changes. Ready for review.

[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...

2016-09-15 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15047 @rxin : I could, but the test case depends on a few Hive classes for validation. I could either (keep the test case in sql/hive and move HiveHash to sql/catalyst) OR (move both to sql/catalyst

[GitHub] spark issue #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should...

2016-09-14 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15013 @zsxwing : ping

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-09-14 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 ping !!

[GitHub] spark issue #15040: [WIP] [SPARK-17487] [SQL] Configurable bucketing info ex...

2016-09-13 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15040 @cloud-fan : Ok. Looks like "add a field in CatalogTable" option won't be viable then. So should I move on with your advice of "boolean flag to indicate it's a spark native bu

[GitHub] spark issue #15040: [WIP] [SPARK-17487] [SQL] Configurable bucketing info ex...

2016-09-12 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15040 @cloud-fan : Would it be ok to add a field in CatalogTable to indicate if a table is from Hive ? For Hive tables, the hashing function also needs to be different while doing bucketing so having

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r78299323 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveHash.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-11 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15047#discussion_r78299106 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveHash.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #15040: [WIP] [SPARK-17487] [SQL] Configurable bucketing info ex...

2016-09-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15040 @cloud-fan : cc'ing you as you have a lot of context about bucketing in Spark. I am looking for early feedback on the approach taken in this change. I have included details in the PR description

[GitHub] spark issue #15047: [SPARK-17495] [SQL] Add Hash capability semantically equ...

2016-09-10 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15047 @rxin : can you recommend someone to review this PR ?

[GitHub] spark pull request #15047: [SPARK-17495] [SQL] Add Hash capability semantica...

2016-09-10 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15047 [SPARK-17495] [SQL] Add Hash capability semantically equivalent to Hive's ## What changes were proposed in this pull request? Jira : https://issues.apache.org/jira/browse/SPARK-17495

[GitHub] spark pull request #15040: [WIP] [SPARK-17487] [SQL] Configurable bucketing...

2016-09-09 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15040 [WIP] [SPARK-17487] [SQL] Configurable bucketing info extraction ## What changes were proposed in this pull request? I am looking for early feedback on the approach taken in this change

[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-09-09 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14864#discussion_r78207768 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -61,6 +62,51 @@ class JoinSuite extends QueryTest with SharedSQLContext

[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-09-09 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14864#discussion_r78207613 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala --- @@ -61,6 +62,51 @@ class JoinSuite extends QueryTest with SharedSQLContext

[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14864#discussion_r78129784 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -156,24 +155,57 @@ case class FileSourceScanExec

[GitHub] spark issue #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15013 Done with all changes. Ready for review.

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14702#discussion_r78115155 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala --- @@ -0,0 +1,313 @@ +/* + * Licensed

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14702#discussion_r78115022 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala --- @@ -0,0 +1,313 @@ +/* + * Licensed

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14702#discussion_r78115006 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala --- @@ -0,0 +1,313 @@ +/* + * Licensed

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14702#discussion_r78114829 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala --- @@ -0,0 +1,313 @@ +/* + * Licensed

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14702#discussion_r78114838 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala --- @@ -0,0 +1,313 @@ +/* + * Licensed

[GitHub] spark pull request #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15013#discussion_r78109991 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -199,6 +199,9 @@ private[spark] class BlockManager

[GitHub] spark pull request #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/15013#discussion_r78109911 --- Diff: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala --- @@ -148,12 +149,32 @@ private[spark] class

[GitHub] spark issue #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should...

2016-09-08 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/15013 cc @zsxwing for review

[GitHub] spark pull request #15013: [SPARK-17451] [CORE] CoarseGrainedExecutorBackend...

2016-09-08 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/15013 [SPARK-17451] [CORE] CoarseGrainedExecutorBackend should inform driver before self-kill ## What changes were proposed in this pull request? Jira : https://issues.apache.org/jira

[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-09-03 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14864#discussion_r77438866 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -156,24 +156,57 @@ case class FileSourceScanExec

[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-09-03 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14864 @cloud-fan : Thanks !! Did the change. Jenkins, test this please

[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-09-02 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14864 @cloud-fan : Sounds good to me. I tried doing that but got a `Task not serializable: java.io.NotSerializableException: org.apache.hadoop.fs.LocatedFileStatus`. This is because the new

[GitHub] spark issue #14920: [SPARK-17271] [SQL] Planner adds un-necessary Sort even ...

2016-09-01 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14920 Thanks @hvanhovell !!

[GitHub] spark pull request #14920: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-09-01 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/14920

[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-09-01 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14864#discussion_r77205135 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -156,24 +156,56 @@ case class FileSourceScanExec

[GitHub] spark issue #14920: [SPARK-17271] [SQL] Planner adds un-necessary Sort even ...

2016-09-01 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14920 ok to test

[GitHub] spark pull request #14920: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-09-01 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14920 [SPARK-17271] [SQL] Planner adds un-necessary Sort even if child orde… ## What changes were proposed in this pull request? Jira : https://issues.apache.org/jira/browse/SPARK-17271

[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-09-01 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14864#discussion_r77118552 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -156,24 +156,56 @@ case class FileSourceScanExec

[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-08-31 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14841#discussion_r77117090 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala --- @@ -61,6 +61,9 @@ case class SortOrder(child

[GitHub] spark issue #14910: [SPARK-17271] [SQL] Remove redundant `semanticEquals()` ...

2016-08-31 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14910 ok to test

[GitHub] spark pull request #14910: [SPARK-17271] [SQL] Remove redundant `semanticEqu...

2016-08-31 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14910 [SPARK-17271] [SQL] Remove redundant `semanticEquals()` from `SortOrder` ## What changes were proposed in this pull request? Removing `semanticEquals()` from `SortOrder` because it can

[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-08-31 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14841#discussion_r77113690 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala --- @@ -61,6 +61,9 @@ case class SortOrder(child

[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-08-31 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14864 @cloud-fan : I have taken care of that case in the PR (see L175 to L185). The sort ordering will only be used when every bucket has a single file. In subsequent PRs I plan to extend this so

[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-08-29 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14864 cc @rxin , @cloud-fan for review

[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-08-29 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14864 Jenkins test this please. The last run had a JVM crash.

[GitHub] spark issue #13231: [SPARK-15453] [SQL] Sort Merge Join to use bucketing met...

2016-08-29 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/13231 Continuing this work in a new PR : https://github.com/apache/spark/pull/14864

[GitHub] spark issue #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract `outpu...

2016-08-29 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14864 ok to test

[GitHub] spark pull request #14864: [SPARK-15453] [SQL] FileSourceScanExec to extract...

2016-08-29 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14864 [SPARK-15453] [SQL] FileSourceScanExec to extract `outputOrdering` information ## What changes were proposed in this pull request? Extracting sort ordering information

[GitHub] spark issue #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sort even ...

2016-08-26 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14841 cc'ing @rxin and @hvanhovell for review

[GitHub] spark issue #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sort even ...

2016-08-26 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14841 ok to test

[GitHub] spark pull request #14841: [SPARK-17271] [SQL] Planner adds un-necessary Sor...

2016-08-26 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14841 [SPARK-17271] [SQL] Planner adds un-necessary Sort even if child ordering is semantically same as required ordering ## What changes were proposed in this pull request? Jira : https

[GitHub] spark issue #13231: [SPARK-15453] [SQL] Sort Merge Join to use bucketing met...

2016-08-26 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/13231 @viirya : I spent some time on this today and got a working version : https://github.com/tejasapatil/spark/commit/a17b167a8996b494480eb6917acd60eea4b09a17 I need to polish

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-08-23 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 @rxin : I have updated the description to include more info on changes done and future todos

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-08-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14702#discussion_r7591 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ScriptTransformationExec.scala --- @@ -0,0 +1,312 @@ +/* + * Licensed

[GitHub] spark pull request #14726: [SPARK-16862] Configurable buffer size in `Unsafe...

2016-08-23 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14726#discussion_r75960177 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java --- @@ -22,15 +22,21 @@ import

[GitHub] spark pull request #14726: [SPARK-16862] Configurable buffer size in `Unsafe...

2016-08-19 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14726#discussion_r75573049 --- Diff: core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java --- @@ -22,15 +22,21 @@ import

[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...

2016-08-19 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14475 Continuing to https://github.com/apache/spark/pull/14726

[GitHub] spark pull request #14726: [SPARK-16862] Configurable buffer size in `Unsafe...

2016-08-19 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14726 [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader` ## What changes were proposed in this pull request? Jira: https://issues.apache.org/jira/browse/SPARK-16862

[GitHub] spark pull request #14475: [SPARK-16862] Configurable buffer size in `Unsafe...

2016-08-19 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/14475

[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...

2016-08-19 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14475 Yeah. I have been stuck with other things, so I could not clean it up. Will try again. In the worst case, I will close this PR and send a new one.

[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...

2016-08-18 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14475 cc @rxin : who would be the best person to review this PR ?

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-08-18 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14702 cc @rxin : who would be the best person to review this PR ?

[GitHub] spark issue #14498: [SPARK-16904] [SQL] Removal of Hive Built-in Hash Functi...

2016-08-18 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14498 Is Spark's hashing function semantically equivalent to Hive's ? AFAIK, it's not. I think it would be better to have a mode to be able to use Hive's hash method. e.g. case when this would

[GitHub] spark issue #14475: [SPARK-16862] Configurable buffer size in `UnsafeSorterS...

2016-08-18 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14475 ping !!!

[GitHub] spark pull request #14702: [SPARK-15694] Implement ScriptTransformation in s...

2016-08-18 Thread tejasapatil
GitHub user tejasapatil opened a pull request: https://github.com/apache/spark/pull/14702 [SPARK-15694] Implement ScriptTransformation in sql/core ## What changes were proposed in this pull request? Added `ScriptTransformationExec` which would run script operator in SQL

[GitHub] spark issue #14537: [SPARK-16948][SQL] Querying empty partitioned orc tables...

2016-08-12 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14537 LGTM

[GitHub] spark pull request #14475: [SPARK-16862] Configurable buffer size in `Unsafe...

2016-08-09 Thread tejasapatil
GitHub user tejasapatil reopened a pull request: https://github.com/apache/spark/pull/14475 [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader` ## What changes were proposed in this pull request? Jira: https://issues.apache.org/jira/browse/SPARK-16862

[GitHub] spark pull request #14475: [SPARK-16862] Configurable buffer size in `Unsafe...

2016-08-09 Thread tejasapatil
Github user tejasapatil closed the pull request at: https://github.com/apache/spark/pull/14475

[GitHub] spark pull request #14507: [SPARK-16919] Configurable update interval for co...

2016-08-05 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request: https://github.com/apache/spark/pull/14507#discussion_r73741019 --- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala --- @@ -32,9 +32,14 @@ private[spark] class ConsoleProgressBar(sc

[GitHub] spark issue #14507: [SPARK-16919] Configurable update interval for console p...

2016-08-05 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14507 Hive has `hive.querylog.plan.progress.interval` for the same purpose: https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration . Given that it's mostly used for batch

[GitHub] spark issue #14507: [SPARK-16919] Configurable update interval for console p...

2016-08-05 Thread tejasapatil
Github user tejasapatil commented on the issue: https://github.com/apache/spark/pull/14507 For batch jobs running for say ~10 hours, with 3 sec frequency, there would be 18k lines from the progress bar. That sounds like a lot. In Hadoop land they used to have 3 sec but it was made
