Github user zhzhan closed the pull request at:
https://github.com/apache/spark/pull/15541
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/20480
[SPARK-23306] Fix the OOM caused by contention
## What changes were proposed in this pull request?
There is a race condition in TaskMemoryManager, which may cause OOM.
The memory
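The snippet above describes a race in memory bookkeeping. As a minimal, hypothetical sketch (none of these names are Spark's actual API), the hazard is a check-then-act sequence: two threads can both pass the capacity check and over-allocate. Guarding the check and the update with a single monitor keeps the pool from over-committing:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Spark's TaskMemoryManager: the capacity check
// and the usage update happen atomically under one lock, so no interleaving
// of threads can push `used` past `capacity`.
class MemoryPool {
    private final long capacity;
    private long used = 0;

    MemoryPool(long capacity) { this.capacity = capacity; }

    // Safe: check and update are one atomic step under the monitor.
    synchronized boolean tryAcquire(long bytes) {
        if (used + bytes > capacity) {
            return false; // caller must spill or fail instead of overshooting
        }
        used += bytes;
        return true;
    }

    synchronized long used() { return used; }
}

public class Main {
    public static void main(String[] args) throws InterruptedException {
        MemoryPool pool = new MemoryPool(100);
        AtomicLong granted = new AtomicLong();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100; j++) {
                    if (pool.tryAcquire(7)) granted.addAndGet(7);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) t.join();
        // The invariant holds no matter how the threads interleave.
        System.out.println(granted.get() <= 100);
        System.out.println(granted.get() == pool.used());
    }
}
```

An unsynchronized `tryAcquire` would occasionally grant more than the capacity, which in a memory manager shows up as OOM.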
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/17180
retest it please.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/17180
Will fix the unit test.
Github user zhzhan closed the pull request at:
https://github.com/apache/spark/pull/18694
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/18694
Closing the PR; will work on adding a close interface for the iterator used
in Spark SQL to remove the extra overhead.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/17180
The test failure is caused by calling a method on the map after
`destructiveIterator()` has been called.
That is illegal by definition.
https://github.com/apache/spark/blob/master/core/src
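The contract described here can be sketched as follows; `DestructiveMap` is a hypothetical stand-in, not Spark's `BytesToBytesMap`. Once `destructiveIterator()` hands the backing storage off to the iterator, any further access to the map fails fast:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of the contract: creating the destructive iterator
// transfers ownership of the entries, so the map itself becomes invalid.
class DestructiveMap<K, V> {
    private Map<K, V> entries = new HashMap<>();

    void put(K k, V v) { checkValid(); entries.put(k, v); }

    V get(K k) { checkValid(); return entries.get(k); }

    // Transfers ownership of the entries and invalidates the map.
    Iterator<Map.Entry<K, V>> destructiveIterator() {
        checkValid();
        Map<K, V> handedOff = entries;
        entries = null; // the map can no longer be queried
        return handedOff.entrySet().iterator();
    }

    private void checkValid() {
        if (entries == null) {
            throw new IllegalStateException("map already destroyed");
        }
    }
}

public class Main {
    public static void main(String[] args) {
        DestructiveMap<String, Integer> m = new DestructiveMap<>();
        m.put("a", 1);
        System.out.println(m.get("a")); // fine before destruction
        m.destructiveIterator();
        try {
            m.get("a"); // illegal: the backing storage is gone
        } catch (IllegalStateException e) {
            System.out.println("illegal: " + e.getMessage());
        }
    }
}
```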
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/17180
Per review comments, release the longArray on destructive iterator creation.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/18694
Currently the patch helps scenarios such as Join(A, Join(B, C)). It is
critical for us because we have some internal development in which each stage
may consist of tens of sort operators. We
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/18694
If it is assumed that the pipeline is as simple as one stage with only one
operator that needs to spill, you are right. But if the pipeline is more complex,
for example multiple operators need to spill
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/18694
The cleanup hook is used after the task is done. The diff solves the leak for
SortMergeJoin only and does not apply to the limit case. Limit is another
special case and needs to be taken care of separately
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/18694#discussion_r128683903
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
---
@@ -649,6 +660,11 @@ private[joins] class
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/18694
The memory leak happens in the following scenario. For example, in an inner join,
when the left side is exhausted, we stop advancing the right side. Because the
right side has not reached the end, the memory
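The leak pattern described above can be sketched with a hypothetical `SpillableBuffer` (not the real SortMergeJoinExec code): a buffer that frees itself only when fully drained leaks when the join exits early, so it must be released explicitly:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: a spillable buffer normally frees itself once its
// iterator is exhausted. When the join stops early because the other side
// ran out, the buffer is never drained and must be closed explicitly.
class SpillableBuffer implements AutoCloseable {
    private List<String> rows;

    SpillableBuffer(List<String> rows) { this.rows = rows; }

    Iterator<String> iterator() { return rows.iterator(); }

    boolean released() { return rows == null; }

    @Override
    public void close() { rows = null; } // release the buffered memory
}

public class Main {
    public static void main(String[] args) {
        List<String> left = Arrays.asList("a"); // exhausted quickly
        SpillableBuffer right =
            new SpillableBuffer(Arrays.asList("a", "b", "c"));

        Iterator<String> l = left.iterator();
        Iterator<String> r = right.iterator();
        while (l.hasNext() && r.hasNext()) { // inner-join style loop
            l.next();
            r.next();
        }
        // The right side was not drained, so nothing auto-released it.
        System.out.println(right.released());
        right.close(); // explicit release on early exit fixes the leak
        System.out.println(right.released());
    }
}
```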
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/18694#discussion_r128679491
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala
---
@@ -649,6 +660,11 @@ private[joins] class
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/18694
[SPARK-21492][SQL] Memory leak in SortMergeJoin
## What changes were proposed in this pull request?
Fix the memory leak in SortMergeJoin
## How was this patch tested?
Relies on existing
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/17180
[SPARK-19839][Core] Release longArray in BytesToBytesMap
## What changes were proposed in this pull request?
When BytesToBytesMap spills, its longArray should be released. Otherwise,
it may
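A minimal sketch of the idea, using a hypothetical `SpillingMap` rather than the real `BytesToBytesMap`: once the records are spilled, the large in-memory pointer array serves no further purpose, so dropping the reference lets its memory be reclaimed:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: on spill, the records go to disk (stand-in: a list)
// and the big in-memory array is released instead of being kept alive.
class SpillingMap {
    private long[] longArray = new long[1 << 20]; // pointer/prefix array
    private final List<long[]> spillFiles = new ArrayList<>();

    boolean inMemory() { return longArray != null; }

    void spill() {
        spillFiles.add(longArray.clone()); // stand-in for writing to disk
        longArray = null;                  // the fix: release the array
    }

    int spillCount() { return spillFiles.size(); }
}

public class Main {
    public static void main(String[] args) {
        SpillingMap map = new SpillingMap();
        System.out.println(map.inMemory()); // true before spilling
        map.spill();
        System.out.println(map.inMemory()); // false: array released
        System.out.println(map.spillCount());
    }
}
```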
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/17155
@gatorsmile Thanks for reviewing this. I am thinking about the logic again. On
the surface, the logic may be correct, since in the join the left and right
keys should be the same type. Will close the PR
Github user zhzhan closed the pull request at:
https://github.com/apache/spark/pull/17155
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/17155
[SPARK-19815][SQL] Not orderable should be applied to the right key instead of
the left key
## What changes were proposed in this pull request?
Change the orderable condition.
## How
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16909
@hvanhovell @davies Correct me if I am wrong. My understanding is that the
following code will go through all matching rows on the right side, and put them
into the BufferedRowIterator. If there is OOM
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16909
@tejasapatil Do you want to fix the BufferedRowIterator for
WholeStageCodegenExec as well? As for inner join, the LinkedList currentRows
would cause the same issue, as it buffers the rows from the inner
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16068#discussion_r91570259
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
---
@@ -487,6 +489,26 @@ class HiveUDFSuite extends QueryTest
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16068#discussion_r91569919
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
---
@@ -487,6 +489,26 @@ class HiveUDFSuite extends QueryTest
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16068#discussion_r91142141
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
---
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16068#discussion_r91026585
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
---
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16068#discussion_r91026433
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
---
@@ -487,6 +488,52 @@ class HiveUDFSuite extends QueryTest
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/16068#discussion_r90763121
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveUDFSuite.scala
---
@@ -487,6 +488,29 @@ class HiveUDFSuite extends QueryTest
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16068
@gatorsmile we cannot use deterministic = true/false, as there are
existing UDFs with deterministic set to true but stateful set to true as well.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16068
My understanding is that a non-deterministic UDF does not need to be
stateful, but a stateful UDF has to be non-deterministic.
Here are the comments in Hive regarding this property
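The distinction can be illustrated with a hypothetical row-sequence UDF (a classic stateful Hive UDF): because it carries mutable state across invocations, identical inputs yield different outputs, so the optimizer must not treat it as deterministic (no reordering, duplication, or pushdown):

```java
// Hypothetical sketch: a stateful "UDF" keeps a counter across rows, so
// two calls with the same (empty) input return different results. That is
// exactly why a stateful function cannot be marked deterministic.
class RowSequenceUdf {
    private long counter = 0; // state kept across invocations

    long evaluate() { return ++counter; }
}

public class Main {
    public static void main(String[] args) {
        RowSequenceUdf udf = new RowSequenceUdf();
        // Same input every time, different output every time.
        System.out.println(udf.evaluate());
        System.out.println(udf.evaluate());
        System.out.println(udf.evaluate());
    }
}
```

The converse does not hold: a non-deterministic function such as rand() carries no state at all, which matches the asymmetry described above.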
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16068
@hvanhovell Would you like to take a look and let me know if you have any
concerns?
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16068
@hvanhovell Thanks for looking at this. We have a large number of UDFs that
have this issue. For example, the UDF gives different results with different
partitioning/sorting, but the UDF is pushed down before
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/16068
retest it please
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/16068
Stateful UDF should be nondeterministic
## What changes were proposed in this pull request?
Make stateful UDFs nondeterministic
## How was this patch tested?
Mainly
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15541
@rxin Thanks for the feedback regarding the TaskAssigner API. The current
API is designed based on the current logic of TaskSchedulerImpl, where the
scheduler takes many rounds to assign the tasks
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r85985739
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,232 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15541
@rxin Would you like to take a look and let me know if you have any
concerns? Thanks.
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84621076
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -250,24 +251,24 @@ private[spark] class TaskSchedulerImpl
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84619879
--- Diff:
core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
@@ -250,24 +251,24 @@ private[spark] class TaskSchedulerImpl
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84619023
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84424034
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15541
@gatorsmile I didn't see your new comments
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15541
@rxin Can you please take a look, and let me know if you have any concern?
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84158910
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84129486
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---
@@ -109,6 +108,85 @@ class TaskSchedulerImplSuite extends
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84119714
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84002685
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84002480
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/TaskSchedulerImplSuite.scala ---
@@ -109,6 +108,85 @@ class TaskSchedulerImplSuite extends
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84002353
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r84002236
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r83999756
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r83998058
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15541#discussion_r83997070
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,233 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15541
@rxin @gatorsmile Can you please take a look and kindly provide your
comments?
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/15541
[SPARK-17637][Scheduler] Packed scheduling for Spark tasks across executors
## What changes were proposed in this pull request?
Restructure the code and implement two new task assigners
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
@wangmiao1981 Thanks for reviewing this. I will open another PR solving
these comments soon.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
@rxin Thanks a lot for the detailed review. I will update the patch.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
@mridulm Thanks for reviewing this.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
@mridulm You are right. This patch is mainly for jobs that have multiple
stages, which is very common in production pipelines. As you mentioned, if there
is a shuffle involved
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
@mridulm Thanks for the comments. Your concern regarding locality is
right. The patch does not change this behavior, which gives priority to
locality preference. But if multiple executors
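A hedged sketch of the packing policy under discussion (the names are illustrative, not Spark's API): after locality preferences are honored, a packed assigner fills the busiest executor that still has capacity, so lightly used executors can drain and be released:

```java
// Hypothetical sketch of "packed" task assignment: among executors that can
// still take a task, pick the one with the FEWEST free cores. Round-robin
// would instead spread tasks evenly across all executors.
public class Main {
    static int packedPick(int[] freeCores) {
        int best = -1;
        for (int i = 0; i < freeCores.length; i++) {
            if (freeCores[i] > 0 && (best < 0 || freeCores[i] < freeCores[best])) {
                best = i; // busiest executor that still has room
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] freeCores = {4, 1, 3}; // free cores per executor
        StringBuilder order = new StringBuilder();
        for (int t = 0; t < 4; t++) {
            int e = packedPick(freeCores);
            freeCores[e]--;
            order.append(e);
        }
        // Executor 1 (1 free core) fills first, then executor 2 drains.
        System.out.println(order);
    }
}
```

Under this policy executor 0 stays untouched for all four tasks, which is exactly the property that lets idle executors be reclaimed.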
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15218#discussion_r82321008
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/15218#discussion_r82290564
--- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskAssigner.scala
---
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
@mridulm Thanks for reviewing this. I will wait for a while in case there are
more comments before addressing them.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
@gatorsmile Thanks. #65832 is the latest one which does not have the same
failure.
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
retest please
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15218
Failed in DirectKafkaStreamSuite. It should have nothing to do with the
patch.
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/15218
[SPARK-17637][Scheduler] Packed scheduling for Spark tasks across executors
## What changes were proposed in this pull request?
Restructure the code and implement two new task assigners
Github user zhzhan commented on the issue:
https://github.com/apache/spark/pull/15080
@srowen Thanks for reviewing this. Any suggestions to improve it are
welcome. Not being able to locate the debug logs quickly in production
bothers us a lot.
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/15080
SPARK-17526: add log links in job failures
## What changes were proposed in this pull request?
Add the executor log links with the job failure message on Spark UI and
Console
## How
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/13322#issuecomment-222432192
My understanding is that this newly added hidden column is mainly for serdes
of objects to/from rows. How would you leverage it to solve the outer join case
where the null
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/10375#issuecomment-165844369
Any test cases to make sure it works as expected? Do you mind making ORC
PPD enabled by default, or using another JIRA?
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/9553#discussion_r44243045
--- Diff: repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala
---
@@ -78,16 +79,21 @@ object Main extends Logging
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/9553#issuecomment-154943558
This new configuration needs a documentation update.
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/9553#discussion_r44242983
--- Diff:
repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -132,6 +132,7 @@ class SparkILoop(
@DeveloperApi
var
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/9232#discussion_r43204781
--- Diff:
yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnSparkHadoopUtil.scala ---
@@ -142,6 +145,97 @@ class YarnSparkHadoopUtil extends
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/8799#issuecomment-141355072
LGTM Thanks for fixing this.
Github user zhzhan closed the pull request at:
https://github.com/apache/spark/pull/8783
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/8783#issuecomment-141215436
@liancheng Thanks for the review. Since
https://github.com/apache/spark/pull/8799 has been opened, which also fixes
another issue, I will close this one.
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/8783
[SPARK-10623][SQL] Fix the predicate pushdown construction
The predicate pushdown is not working because the construction is wrong.
Fix it with startAnd/end
You can merge this pull request
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/8783#issuecomment-140941698
@liancheng @marmbrus Can you help to review it?
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/8547#issuecomment-136610247
Adding a PartitionValues.empty does not cover all problems. Will close
this PR and investigate other approaches.
Github user zhzhan closed the pull request at:
https://github.com/apache/spark/pull/8547
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/8547#discussion_r38391122
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -436,7 +436,8 @@ abstract class HadoopFsRelation
private[sql
GitHub user zhzhan opened a pull request:
https://github.com/apache/spark/pull/8547
[SPARK-10304][SQL] Throw an error when the table directory is invalid
Throw an error if the directory of a table is invalid; validity requires that
either all files in the directory are partitioned, or none
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/7520#issuecomment-135323382
@viirya I took a quick second look at the issue. As @chenghao-intel
mentioned, normalizing the name (to lower case) is the default behavior.
Should we fix
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/7520#issuecomment-135324209
Also we need to change
`private lazy val nameToField: Map[String, StructField] = fields.map(f => f.name.toLowerCase -> f).toMap`
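For illustration, a hypothetical Java analogue of that lookup map: keying by the lower-cased field name makes resolution case-insensitive, which matches the default name normalization discussed in the thread (field names here are made up):

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    // Minimal stand-in for a StructField-like pair of name and type.
    static class Field {
        final String name;
        final String type;
        Field(String name, String type) { this.name = name; this.type = type; }
    }

    public static void main(String[] args) {
        Field[] fields = { new Field("UserId", "int"), new Field("Name", "string") };

        // Java analogue of the Scala one-liner:
        //   fields.map(f => f.name.toLowerCase -> f).toMap
        Map<String, Field> nameToField = new HashMap<>();
        for (Field f : fields) {
            nameToField.put(f.name.toLowerCase(), f);
        }

        // Lookups must also lower-case the requested name, otherwise a
        // mixed-case query silently misses the entry.
        System.out.println(nameToField.get("USERID".toLowerCase()).type);
        System.out.println(nameToField.containsKey("Name".toLowerCase()));
    }
}
```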
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/7520#issuecomment-135332239
@liancheng has more insight on this part.
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/8033#issuecomment-131959546
@srowen It seems that the mapping got messed up; I don't have a clue
yet and didn't find any obvious reason why the patch can break the test. I will
dig more
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/8033#issuecomment-131960610
@srowen Probably you can revert the change in
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/7520#issuecomment-127465813
LGTM. Will let @liancheng take a final look.
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/7520#discussion_r35396110
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
@@ -86,19 +86,10 @@ private[orc] class OrcOutputWriter
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/7520#discussion_r35395830
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
@@ -120,15 +111,11 @@ private[orc] class OrcOutputWriter
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/7520#issuecomment-124335510
LGTM with the comments answered or resolved.
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/7520#discussion_r35337326
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
@@ -85,18 +85,11 @@ private[orc] class OrcOutputWriter
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/7200#issuecomment-118190051
some minor comments. Overall, LGTM
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/7200#issuecomment-118187344
@liancheng Because in Spark, we will not create the ORC file if there are no
records. It only happens with ORC files created by Hive, right?
Github user zhzhan commented on a diff in the pull request:
https://github.com/apache/spark/pull/7200#discussion_r33831074
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcFileOperator.scala ---
@@ -24,30 +24,58 @@ import
org.apache.hadoop.hive.serde2
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/4064#issuecomment-112875139
@WangTaoTheTonic The problem happens with spark-1.3 and hadoop-2.6 in a
Kerberos cluster. With hive-0.14 support, I suppose the problem may be gone,
but I didn't verify
Github user zhzhan closed the pull request at:
https://github.com/apache/spark/pull/4064
Github user zhzhan closed the pull request at:
https://github.com/apache/spark/pull/5637
Github user zhzhan commented on the pull request:
https://github.com/apache/spark/pull/5637#issuecomment-112615055
Closing this PR, as it may be outdated relative to the latest Spark upstream
and not working.