Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/20823
ok to test
---
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/22550
Closing this one since another PR is working on this.
Github user merlintang closed the pull request at:
https://github.com/apache/spark/pull/22550
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/22598
@gaborgsomogyi thanks for your PR, I am going through the details and testing
on my local machine.
GitHub user merlintang opened a pull request:
https://github.com/apache/spark/pull/22550
[SPARK-25501] Kafka delegation token support
## What changes were proposed in this pull request?
Kafka is going to support delegation tokens, so Spark needs to read the
delegation token
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/21455
@gabor. These fields are important for us to understand the Spark Kafka
streaming data, like the topic name. We can use this information to track
the system status.
On Tue, Jun
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/20823
@holdenk can you look at this PR? thanks in advance.
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/21455
@jerryshao Actually, we cannot use reflection to get this field
information.
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/21504#discussion_r193911087
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
---
@@ -55,6 +56,11 @@ class StreamingQueryManager private
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/20823
@jmwdpk can you update this PR, since there is a conflict? I have updated this
PR. https://github.com/merlintang/spark/commits/SPARK-23674
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/21455
@jerryshao can you review this minor update?
GitHub user merlintang opened a pull request:
https://github.com/apache/spark/pull/21455
[SPARK-24093][DStream][Minor] Make some fields of KafkaStreamWriter/InternalRowMicroBatchWriter visible to outside of the classes
## What changes were proposed in this pull
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
@jerryshao can you backport this to branch 2.2 as well?
thanks
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
@jerryshao and @steveloughran thanks for your comments and review.
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
@steveloughran can you review the added system test cases?
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
My local test is OK. I will set up a system test and update this soon.
Sorry about the delay.
On Tue, Jan 2, 2018 at 3:42 PM, Marcelo Vanzin <notificati...@github.com>
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
I am so sorry for the delay on the testing function; I will update it soon.
On Thu, Dec 14, 2017 at 12:55 PM, UCB AMPLab <notificati...@github.com>
wrote:
> Can one of t
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
I have added this test case for the URI comparison based on Steve's
comments. I have tested it in my local VM, and it passes the test.
Meanwhile, for the hdfs://namenode1/path1 hdfs
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
@jerryshao yes, hdfs://us...@nn1.com:8020 and hdfs://us...@nn1.com:8020
would be considered two filesystems, since the authority information should be
taken into consideration. That is why need
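The authority point can be made concrete with plain `java.net.URI`. The sketch below is an illustration of the comparison rule being discussed (same scheme plus same full authority, including user info, host, and port), with hypothetical helper names; it is not Spark's actual `compareFs` code:

```java
import java.net.URI;

public class FsCompare {
    // Two filesystem URIs refer to the same filesystem only when both the
    // scheme and the full authority (user info + host + port) match.
    public static boolean sameFileSystem(URI a, URI b) {
        if (!eq(a.getScheme(), b.getScheme())) return false;
        return eq(a.getAuthority(), b.getAuthority());
    }

    private static boolean eq(String x, String y) {
        // Null-safe, case-insensitive comparison (host names are case-insensitive).
        if (x == null || y == null) return x == y;
        return x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        URI jar = URI.create("hdfs://alice@nn1.com:8020/app/spark.jar");
        URI fs = URI.create("hdfs://bob@nn1.com:8020/");
        // Different user info => different authority => treated as two filesystems.
        System.out.println(sameFileSystem(jar, fs));
        // Same scheme and authority, different paths => same filesystem.
        System.out.println(sameFileSystem(jar, URI.create("hdfs://alice@nn1.com:8020/tmp")));
    }
}
```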
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/19885#discussion_r154827513
--- Diff:
resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
---
@@ -1428,6 +1428,12 @@ private object Client extends
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/19885
@jerryshao can you review this patch?
GitHub user merlintang opened a pull request:
https://github.com/apache/spark/pull/19885
[SPARK-22587] Spark job fails if fs.defaultFS and application jar are different url
## What changes were proposed in this pull request?
Two filesystems comparing does
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16165
@markhamstra Thanks all.
BTW: what if there are many redundant in-progress files on disk that
impact system performance?
---
If your project is set up for it, you can reply
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16165
@vanzin sorry, I meant 2.1.1
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16165
should we backport this into 2.1? @vanzin
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/17092
@Yunni I tested this patch locally and it works, but I have one idea to
improve it. We can discuss it in another ticket.
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/17092
@Yunni ok, let us discuss the further optimization step in another ticket.
The current patch LGTM.
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16965
@Yunni thanks; where I mention L, it is the number of hash tables.
This way, the memory usage would be O(L*N). The approximate NN search
cost in one partition is O(L*N'), where N
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16965
@Yunni Ok, if we want to move this along quicker, we can keep the current AND-OR
implementation.
(2)(3) You mention that you explode the inner table (dataset). Does it mean
for each tuple
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16965
@Yunni Yes, we can use AND-OR to increase the probability by increasing
numHashTables and numHashFunctions. For further user extension, if
users have a hash function with lower
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16965
@Yunni I agree with you that the current NN search and join both use the
AND-OR. We can discuss how to use OR-AND for those two searches as well.
For the OR-AND option
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16965
It seems this patch provides AND-OR amplification. Can we provide an
option for users to choose OR-AND amplification as well?
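The AND-OR vs OR-AND trade-off discussed above can be sketched with the standard LSH collision-probability formulas. In this hypothetical sketch, `b` plays the role of numHashTables and `r` of numHashFunctions: AND-OR collision probability is 1 - (1 - p^r)^b, OR-AND is (1 - (1-p)^r)^b, where p is the base collision probability of one hash function:

```java
public class LshAmplification {
    // AND-OR: b hash tables, each concatenating r hash functions. A pair
    // collides if it matches on ALL r functions in AT LEAST one table.
    public static double andOr(double p, int r, int b) {
        return 1.0 - Math.pow(1.0 - Math.pow(p, r), b);
    }

    // OR-AND: a pair collides if, in EVERY one of the b bands, it matches
    // on AT LEAST one of the r functions.
    public static double orAnd(double p, int r, int b) {
        return Math.pow(1.0 - Math.pow(1.0 - p, r), b);
    }

    public static void main(String[] args) {
        double p = 0.8;
        int r = 4, b = 8;
        // AND-OR pushes already-high probabilities toward 1 (good for high-similarity pairs).
        System.out.printf("AND-OR: %.4f%n", andOr(p, r, b));
        // OR-AND pushes low probabilities toward 0 (suppresses false positives).
        System.out.printf("OR-AND: %.4f%n", orAnd(0.2, r, b));
    }
}
```

This also illustrates the memory remark earlier in the thread: with L (= b) hash tables over N items, each item is stored once per table, giving O(L*N) memory.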
Github user merlintang closed the pull request at:
https://github.com/apache/spark/pull/15819
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
Many thanks, Xiao. I learned a lot.
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94906952
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94727237
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94727256
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +219,37 @@ class VersionsSuite extends SparkFunSuite
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94727246
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +63,63 @@ case class InsertIntoHiveTable
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@gatorsmile can you retest the patch so we can merge? Sorry to ping you
multiple times; several users are asking about this.
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94361979
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94359244
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94351849
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +63,63 @@ case class InsertIntoHiveTable
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r94351862
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala ---
@@ -216,5 +218,37 @@ class VersionsSuite extends SparkFunSuite
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@gatorsmile I have backported the test case in #16339 with a small
modification, because the "INSERT OVERWRITE TABLE tab SELECT '$i'" will bring
the issue from the Hive side e.
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
yes, let me backport the test cases for checking the staging file.
On Thu, Dec 29, 2016 at 10:11 PM, Xiao Li <notificati...@github.com> wrote:
> Is that possible to
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
Thanks, Wenchen. I have backported the code from #16339 here, and I have
tested it locally. Can you review and verify?
On Sun, Dec 25, 2016 at 11:04 PM, Wenchen Fan <notific
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@gatorsmile Great! Thanks so much, because I was pinged multiple times about
this bug. :)
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@cloud-fan @gatorsmile I have backported the code from #16134; can you
verify and backport this to Spark 1.6.x?
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@gatorsmile one more customer is running into this issue in Spark
1.6.x. I backported the code from #16134 here and tested it manually. Please verify.
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16134
This patch is related to the patch #15819 for Spark 1.6. In #15819, I
can add the code from this patch (#16134) now; then we can fix the staging
file issues in Spark 1.6.x
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/16134
+1 backport to spark 1.6.x
On Thu, Dec 15, 2016 at 8:14 AM, Xiao Li <notificati...@github.com> wrote:
> The staging directory and files will not be removed when user
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@cloud-fan @gatorsmile this patch is related to #16134, and it seems #16134
will be merged soon. Meanwhile, should we backport #16104 into 1.6.x? Please
advise. Otherwise, I will just backport #16134
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/16134#discussion_r92244682
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -328,6 +332,15 @@ case class InsertIntoHiveTable
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
Great, once #16134 <https://github.com/apache/spark/pull/16134> is
done, we can backport them together.
On Tue, Dec 13, 2016 at 12:18 AM, Wenchen Fan <notificati...@g
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@gatorsmile what is going on with this patch? This is backport code, so can
you merge this patch into 1.6.x? More than one user is running into this
issue in Spark 1.6.x.
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
Did you exit the Spark shell? I have tested this, and the staging file
is removed after we exit the Spark shell under Spark 2.0.x.
Meanwhile, the staging files are used
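The lifecycle described here (staging files cleaned up when the shell exits) can be sketched with a JVM shutdown hook. This is an illustrative stand-in with hypothetical helper names using `java.io.File`, not Spark's actual InsertIntoHiveTable cleanup code:

```java
import java.io.File;

public class StagingCleanup {
    // Sketch: register the .hive-staging directory for deletion when the
    // JVM (e.g. the spark-shell process) exits.
    public static void deleteOnExit(File stagingDir) {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> deleteRecursively(stagingDir)));
    }

    // Delete a directory tree bottom-up; returns true if the root was deleted.
    public static boolean deleteRecursively(File f) {
        File[] children = f.listFiles();
        if (children != null) {
            for (File c : children) deleteRecursively(c);
        }
        return f.delete();
    }

    public static void main(String[] args) throws Exception {
        File staging = new File(System.getProperty("java.io.tmpdir"), ".hive-staging_demo");
        staging.mkdirs();
        new File(staging, "part-00000").createNewFile();
        deleteOnExit(staging); // cleaned up only when the shell/JVM exits
        System.out.println(staging.exists());
    }
}
```

This matches the observed behavior in the thread: the staging directory lingers while the shell is running and disappears once the session ends.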
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/13670
@kishorvpatil
you provided the function allexecutors, which returns the dead
and active executor information.
For the document
http://spark.apache.org/docs/latest
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@cloud-fan this is related to this PR in the 2.0.x
https://github.com/apache/spark/pull/12770
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
Ok.
On Sun, Dec 4, 2016 at 6:25 PM, Reynold Xin <notificati...@github.com>
wrote:
> We have stopped making new releases for 1.5 so it makes no sense to
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
This bug is related to 1.5.x as well as 1.6.x; please backport to 1.5.x as
well.
On Sun, Dec 4, 2016 at 6:20 PM, Reynold Xin <notificati...@github.com>
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
it is updated.
On Sun, Dec 4, 2016 at 11:23 AM, Xiao Li <notificati...@github.com> wrote:
> @merlintang <https://github.com/merlintang> Could you please add
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
Yes, exactly. This patch is only for Spark 1.x. What I proposed here is that
we need to use the code from Spark 2.0.x to fix the bug in Spark 1.x. You can
see this message from my previous
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r88778830
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r88778781
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
@cloud-fan @rxin can you review this code? Several customers are
complaining about the Hive-generated empty staging files in HDFS.
Github user merlintang commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r88345264
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable
Github user merlintang commented on the issue:
https://github.com/apache/spark/pull/15819
Actually, I do not have a unit test, but the code listed below (same as we
posted in the JIRA) can reproduce this bug.
The related code would be this way:
val sqlContext = new
GitHub user merlintang opened a pull request:
https://github.com/apache/spark/pull/15819
[SPARK-18372][SQL] Staging directory fails to be removed
## What changes were proposed in this pull request?
This fix is related to the bug:
https://issues.apache.org/jira/browse/SPARK