Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/13775#discussion_r83761710
--- Diff:
sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java
---
@@ -0,0 +1,318 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/13775#discussion_r83757435
--- Diff:
sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java
---
@@ -0,0 +1,318 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/13775#discussion_r83756988
--- Diff:
sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java
---
@@ -0,0 +1,189 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/13775#discussion_r83753744
--- Diff:
sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java
---
@@ -0,0 +1,189 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/13775#discussion_r83759570
--- Diff:
sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/VectorizedSparkOrcNewRecordReader.java
---
@@ -0,0 +1,318 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/13775#discussion_r83752300
--- Diff:
sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java
---
@@ -0,0 +1,189 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/13775#discussion_r83757105
--- Diff:
sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/orc/SparkVectorizedOrcRecordReader.java
---
@@ -0,0 +1,189 @@
+/*
+ * Licensed
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/13775
Earlier this year I spent some time trying out Presto's ORC reader with
Spark.
In a standalone benchmark, Presto's ORC reader is 3x faster than the one in
Hive. My experimental
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14702
retest this please
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15272
@cloud-fan + @rxin : Fixed the test case. Ready for review.
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15272
Jenkins test this please
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15272
@rxin : Yes. I looked at it but could not find the root cause. I have been
busy with other stuff, so I could not invest more time. I plan to get this fixed
over the weekend.
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14702
@rxin : Yes. I think I know why it's happening and will get back with a fix
over the weekend.
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15272
Jenkins test this please
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15272#discussion_r82716898
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala
---
@@ -88,7 +88,7 @@ trait PredicateHelper
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14702
jenkins test this please.
The failed test from the earlier run was in KafkaSourceStressSuite, which
does not appear to be related to this PR.
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14702
Jenkins test this please
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15300
@hvanhovell , @cloud-fan : Can you please review this PR ?
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15300
cc @hvanhovell , @cloud-fan for review
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15300
[SPARK-17729] [SQL] Enable creating hive bucketed tables
## What changes were proposed in this pull request?
Hive allows inserting data into a bucketed table without guaranteeing bucketed
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15047
jenkins retest this please
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15047
jenkins retest this please
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r81014894
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/HashByteArrayBenchmark.scala
---
@@ -59,90 +59,110 @@ object HashByteArrayBenchmark
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r80848744
--- Diff:
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/HiveHasher.java ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r80848767
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -276,6 +276,97 @@ abstract class HashExpression[E
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r80848722
--- Diff:
common/unsafe/src/main/java/org/apache/spark/unsafe/hash/HiveHasher.java ---
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r80848979
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -559,3 +607,219 @@ case class CurrentDatabase
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r80848863
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
---
@@ -559,3 +607,219 @@ case class CurrentDatabase
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15272
[SPARK-17698] [SQL] Join predicates should not contain filter clauses
## What changes were proposed in this pull request?
Jira : https://issues.apache.org/jira/browse/SPARK-17698
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15229
[SPARK-17654] [SQL] Propagate bucketing information for Hive tables to /
from Catalog
## What changes were proposed in this pull request?
Currently Spark does not respect bucketing
Github user tejasapatil closed the pull request at:
https://github.com/apache/spark/pull/15228
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15228
[SPARK-17654] [SQL] Propagate bucketing information for Hive tables to /
from Catalog
## What changes were proposed in this pull request?
Currently Spark does not respect bucketing
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15226#discussion_r80350179
--- Diff:
core/src/main/scala/org/apache/spark/util/AsynchronousListenerBus.scala ---
@@ -117,6 +124,24 @@ private[spark] abstract class
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15047
@hvanhovell Done with all changes. Ready for review.
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15047
@rxin : I could, but the test case depends on a few Hive classes for
validation. I could either (keep the test case in sql/hive and move HiveHash to
sql/catalyst) OR (move both to sql/catalyst
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15013
@zsxwing : ping
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14702
ping !!
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15040
@cloud-fan : Ok. Looks like "add a field in CatalogTable" option won't be
viable then. So should I move on with your advice of "boolean flag to indicate
it's a spark native bu
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15040
@cloud-fan : Would it be ok to add a field in CatalogTable to indicate if a
table is from Hive ? For Hive tables, the hashing function also needs to be
different while doing bucketing so having
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r78299323
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveHash.scala
---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15047#discussion_r78299106
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveHash.scala
---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15040
@cloud-fan : cc'ing you as you have a lot of context about bucketing in
Spark. I am looking for early feedback on the approach of this change. I have
included details in the PR description
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15047
@rxin : can you recommend someone to review this PR ?
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15047
[SPARK-17495] [SQL] Add Hash capability semantically equivalent to Hive's
## What changes were proposed in this pull request?
Jira : https://issues.apache.org/jira/browse/SPARK-17495
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15040
[WIP] [SPARK-17487] [SQL] Configurable bucketing info extraction
## What changes were proposed in this pull request?
I am looking for early feedback on the approach of this change
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14864#discussion_r78207768
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -61,6 +62,51 @@ class JoinSuite extends QueryTest with SharedSQLContext
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14864#discussion_r78207613
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -61,6 +62,51 @@ class JoinSuite extends QueryTest with SharedSQLContext
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14864#discussion_r78129784
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -156,24 +155,57 @@ case class FileSourceScanExec
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15013
Done with all changes. Ready for review.
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14702#discussion_r78115155
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
---
@@ -0,0 +1,313 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14702#discussion_r78115022
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
---
@@ -0,0 +1,313 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14702#discussion_r78115006
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
---
@@ -0,0 +1,313 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14702#discussion_r78114829
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
---
@@ -0,0 +1,313 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14702#discussion_r78114838
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/script/ScriptTransformationExec.scala
---
@@ -0,0 +1,313 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15013#discussion_r78109991
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
---
@@ -199,6 +199,9 @@ private[spark] class BlockManager
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/15013#discussion_r78109911
--- Diff:
core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
---
@@ -148,12 +149,32 @@ private[spark] class
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/15013
cc @zsxwing for review
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/15013
[SPARK-17451] [CORE] CoarseGrainedExecutorBackend should inform driver
before self-kill
## What changes were proposed in this pull request?
Jira : https://issues.apache.org/jira
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14864#discussion_r77438866
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -156,24 +156,57 @@ case class FileSourceScanExec
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14864
@cloud-fan : Thanks !! Did the change.
Jenkins, test this please
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14864
@cloud-fan : Sounds good to me.
I tried doing that but got a `Task not serializable:
java.io.NotSerializableException: org.apache.hadoop.fs.LocatedFileStatus`. This
is because the new
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14920
Thanks @hvanhovell !!
Github user tejasapatil closed the pull request at:
https://github.com/apache/spark/pull/14920
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14864#discussion_r77205135
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -156,24 +156,56 @@ case class FileSourceScanExec
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14920
ok to test
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14920
[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child orde…
## What changes were proposed in this pull request?
Jira : https://issues.apache.org/jira/browse/SPARK-17271
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14864#discussion_r77118552
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala
---
@@ -156,24 +156,56 @@ case class FileSourceScanExec
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14841#discussion_r77117090
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
---
@@ -61,6 +61,9 @@ case class SortOrder(child
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14910
ok to test
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14910
[SPARK-17271] [SQL] Remove redundant `semanticEquals()` from `SortOrder`
## What changes were proposed in this pull request?
Removing `semanticEquals()` from `SortOrder` because it can
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14841#discussion_r77113690
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/SortOrder.scala
---
@@ -61,6 +61,9 @@ case class SortOrder(child
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14864
@cloud-fan : I have taken care of that case in the PR (see L175 to L185).
The sort ordering will only be used when every bucket has a single file. In
subsequent PRs I plan to extend this so
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14864
cc @rxin , @cloud-fan for review
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14864
Jenkins test this please. The last run had a JVM crash
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/13231
Continuing this work in a new PR :
https://github.com/apache/spark/pull/14864
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14864
ok to test
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14864
[SPARK-15453] [SQL] FileSourceScanExec to extract `outputOrdering`
information
## What changes were proposed in this pull request?
Extracting sort ordering information
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14841
cc'ing @rxin and @hvanhovell for review
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14841
ok to test
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14841
[SPARK-17271] [SQL] Planner adds un-necessary Sort even if child ordering
is semantically same as required ordering
## What changes were proposed in this pull request?
Jira : https
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/13231
@viirya : I spent some time on this today and got a working version :
https://github.com/tejasapatil/spark/commit/a17b167a8996b494480eb6917acd60eea4b09a17
I need to polish
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14702
@rxin : I have updated the description to include more info on the changes
done and future TODOs
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14702#discussion_r7591
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ScriptTransformationExec.scala
---
@@ -0,0 +1,312 @@
+/*
+ * Licensed
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14726#discussion_r75960177
--- Diff:
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
---
@@ -22,15 +22,21 @@
import
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14726#discussion_r75573049
--- Diff:
core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeSorterSpillReader.java
---
@@ -22,15 +22,21 @@
import
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14475
Continuing to https://github.com/apache/spark/pull/14726
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14726
[SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`
## What changes were proposed in this pull request?
Jira: https://issues.apache.org/jira/browse/SPARK-16862
Github user tejasapatil closed the pull request at:
https://github.com/apache/spark/pull/14475
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14475
Yeah. I have been stuck with other things, so I could not clean it up. I will
try again. In the worst case I will close this PR and send a new one.
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14475
cc @rxin : who would be the best person to review this PR ?
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14702
cc @rxin : who would be the best person to review this PR ?
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14498
Is Spark's hashing function semantically equivalent to Hive's ? AFAIK, it's
not. I think it would be better to have a mode to be able to use Hive's hash
method. e.g. a case when this would
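To make the hashing question above concrete, here is a minimal sketch of Hive-style hashing and bucket assignment. It rests on assumptions about Hive's behaviour that are not stated in this thread: that an int column hashes to its own value, that a string hashes as `31 * h + b` over its UTF-8 bytes, and that the bucket id is `(hash & Integer.MAX_VALUE) % numBuckets`. The class and method names are hypothetical. Spark's native bucketing uses a Murmur3-based hash instead, so the same row would generally be assigned to a different bucket.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the code from any of the PRs listed here.
public class HiveStyleHashSketch {

    // Assumption: Hive hashes an int column to the value itself.
    static int hashInt(int value) {
        return value;
    }

    // Assumption: Hive hashes a string by folding its UTF-8 bytes with multiplier 31.
    static int hashString(String value) {
        int h = 0;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h = 31 * h + b;
        }
        return h;
    }

    // Assumption: the sign bit is cleared before taking the modulus,
    // so the bucket id is always in [0, numBuckets).
    static int bucketId(int hash, int numBuckets) {
        return (hash & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketId(hashInt(42), 8));
        System.out.println(bucketId(hashString("spark"), 8));
    }
}
```

Because the two hash families disagree, data written with one scheme cannot be treated as bucketed under the other, which is why a Hive-compatible mode matters.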
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14475
ping !!!
GitHub user tejasapatil opened a pull request:
https://github.com/apache/spark/pull/14702
[SPARK-15694] Implement ScriptTransformation in sql/core
## What changes were proposed in this pull request?
Added `ScriptTransformationExec` which would run script operator in SQL
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14537
LGTM
GitHub user tejasapatil reopened a pull request:
https://github.com/apache/spark/pull/14475
[SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`
## What changes were proposed in this pull request?
Jira: https://issues.apache.org/jira/browse/SPARK-16862
Github user tejasapatil closed the pull request at:
https://github.com/apache/spark/pull/14475
Github user tejasapatil commented on a diff in the pull request:
https://github.com/apache/spark/pull/14507#discussion_r73741019
--- Diff: core/src/main/scala/org/apache/spark/ui/ConsoleProgressBar.scala
---
@@ -32,9 +32,14 @@ private[spark] class ConsoleProgressBar(sc
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14507
Hive has `hive.querylog.plan.progress.interval` for the same purpose:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration .
Given that it's mostly used for batch
Github user tejasapatil commented on the issue:
https://github.com/apache/spark/pull/14507
For batch jobs running for, say, ~10 hours, with a 3 sec frequency, there would
be about 12k lines from the progress bar. That sounds like a lot. In Hadoop land they
used to have 3 sec but it was made
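A quick way to check the log-volume estimate is to divide the job duration by the update interval. The sketch below uses hypothetical names and assumes the progress bar emits exactly one line per interval.

```java
// Hypothetical helper for estimating progress-bar output volume.
public class ProgressLineEstimate {

    // Number of progress lines for a job of the given length,
    // assuming one line is printed every intervalSeconds.
    static long progressLines(long jobHours, long intervalSeconds) {
        return jobHours * 3600L / intervalSeconds;
    }

    public static void main(String[] args) {
        System.out.println(progressLines(10, 3));   // 10-hour job, 3 s interval
        System.out.println(progressLines(10, 60));  // same job, 60 s interval
    }
}
```

A 10-hour job at a 3-second interval yields 12,000 lines, while a 60-second interval brings that down to 600.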