Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
@rxin, Thanks for this. I think we can surely go with creating one partition
per block (default block size 128 MB). Also, would you be able to point me to
more detail about the second concern (the cost of opening a
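The block-plus-open-cost heuristic being discussed can be sketched in plain Scala. The names, defaults, and structure below are my assumptions modelled on Spark SQL's file-scan logic, not the PR's actual code:

```scala
// Sketch of the split-size heuristic, modelled on Spark SQL's file scan.
// All names and default values here are assumptions for illustration.
object SplitSizeSketch {
  def maxSplitBytes(
      totalBytes: Long,                                 // sum of all file sizes
      fileCount: Long,                                  // number of files to read
      defaultParallelism: Int,                          // e.g. sc.defaultParallelism
      defaultMaxSplitBytes: Long = 128L * 1024 * 1024,  // one default HDFS block
      openCostInBytes: Long = 4L * 1024 * 1024): Long = {
    // Pad each file with the estimated cost of opening it, aim for one split
    // per core, then clamp the result to [openCostInBytes, defaultMaxSplitBytes].
    val bytesPerCore = (totalBytes + fileCount * openCostInBytes) / defaultParallelism
    math.min(defaultMaxSplitBytes, math.max(openCostInBytes, bytesPerCore))
  }
}
```

With many tiny files the open-cost padding keeps splits from becoming absurdly small, while for one huge file the 128 MB cap keeps each partition near one block.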
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
@rxin, Support for openCostInBytes, similar to SQL, has now been added
for the affected binaryFiles case. May I ask for your review and
suggestions on taking this forward?
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
cc @srowen @hvanhovell @vanzin @skyluc @kmader @zsxwing @datafarmer
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
Hi @srowen,
The Spark SQL property/algorithm that the binaryFiles partition calculation
now implements is the property "spark.files.openCostInBytes" (in
org.apache.spark.sql.intern
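To illustrate what openCostInBytes buys here (my own hypothetical sketch, not the PR's code): padding each file by the open cost before bin-packing files into splits bounds how many tiny partitions can be created:

```scala
// Illustrative bin-packing of files into partitions, padding each file by
// openCostInBytes. A hypothetical sketch, not the actual Spark implementation.
object PartitionCountSketch {
  def estimatePartitions(
      fileSizes: Seq[Long],
      maxSplitBytes: Long,
      openCostInBytes: Long): Int = {
    var partitions = 0
    var currentBytes = 0L
    fileSizes.foreach { size =>
      val padded = size + openCostInBytes
      // Close the current partition when the next padded file would overflow it.
      if (currentBytes > 0 && currentBytes + padded > maxSplitBytes) {
        partitions += 1
        currentBytes = 0L
      }
      currentBytes += padded
    }
    if (currentBytes > 0) partitions += 1
    partitions
  }
}
```

For example, ten 1 MB files with a 4 MB open cost and 16 MB splits pack three padded files per partition instead of landing all ten in one split or one each.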
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
/**
* Create an RDD for non-bucketed reads.
* The bucketed variant of this function is [[createBucketedReadRDD]].
*
* @param readFile a function to read each (part of a
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
@srowen @rxin @zsxwing Can you please have a look and advise.
Thanks
Github user fidato13 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r87676512
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable
Github user fidato13 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r87676473
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable
Github user fidato13 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r88778929
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable
Github user fidato13 commented on a diff in the pull request:
https://github.com/apache/spark/pull/15819#discussion_r88778913
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
---
@@ -54,6 +61,61 @@ case class InsertIntoHiveTable
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
If anyone can advise that the changes are not required, I will close the PR.
@srowen @hvanhovell @rxin @vanzin @skyluc @kmader @zsxwing @datafarmer
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
ping @rxin @srowen @zsxwing Can you please have a look and advise Jenkins
to retest? The latest commit fixed "SPARK-12527, it is discarded unused.
You may specify targets with
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
ping @rxin @srowen @zsxwing The build and tests have passed. Request you to
merge as you get time. Thanks!
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
ping @rxin @srowen @zsxwing The build and tests have passed. Request you to
have a look and merge as you get time. Thanks!
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
Sure, I had tried to involve everyone with previous commits on the file
(as mentioned in the guideline). I will remove all but Reynold from
the list. Thanks.
GitHub user fidato13 opened a pull request:
https://github.com/apache/spark/pull/15327
[SPARK-16575] [spark core] partition calculation mismatch with
sc.binaryFiles
## What changes were proposed in this pull request?
This pull request addresses the critical bug SPARK-16575
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
Hi @rxin,
Yes, I agree, and I believe SQL does take it into account. Should we proceed
with this pull request, as it does give the correct partitions for other
components, which may be helpful
Github user fidato13 commented on the issue:
https://github.com/apache/spark/pull/15327
@rxin Yes, it makes perfect sense not to create a partition per file.
Looking at the code in PortableDataStream.setMinPartitions:
val maxSplitSize = math.ceil(totalLen / math.max
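For context, the pre-fix calculation quoted above reduces to something like the following (a paraphrase of the truncated snippet; the surrounding code is elided in this digest):

```scala
// Paraphrase of the quoted pre-fix logic: the split size is simply the total
// input length divided by minPartitions, ignoring both the block size and
// the cost of opening files.
object OldSplitSizeSketch {
  def maxSplitSize(totalLen: Long, minPartitions: Int): Long =
    math.ceil(totalLen / math.max(minPartitions, 1.0)).toLong
}
```

So with a small minPartitions (e.g. 2), a 1 GB input yields 512 MB splits, far larger than one block, which illustrates the kind of mismatch the PR title describes.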