Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193921325
if it did then it was not always in the apis i think? i remember the apis
having paths: Seq[String] instead of files: Seq[FileStatus]. by explicitly
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193902037
@koertkuipers improving the efficiency of working with large files was
certainly a goal in this refactoring and this API is definitely not done yet.
That said, I'm
Github user koertkuipers commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193877543
i believe the need to pass all files along (e.g. inputFiles:
Array[FileStatus]) instead of just the input paths came from the need to cache
it so that stuff
Github user tedyu commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55372864
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala
---
@@ -1,265 +0,0 @@
-/*
- * Licensed to the Apache Software
Github user tedyu commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55318504
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -465,214 +379,165 @@ abstract class OutputWriter {
}
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193511261
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193511260
Merged build finished. Test FAILed.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193510810
**[Test build #52582 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52582/consoleFull)**
for PR 11509 at commit
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/11509
Github user rxin commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193498704
Going to merge this in master.
We should rename HiveFileCatalog to MetastoreFileCatalog. cc @andrewor14
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193493845
Merged build finished. Test PASSed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193493406
**[Test build #52590 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52590/consoleFull)**
for PR 11509 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193493854
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193431891
**[Test build #52590 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52590/consoleFull)**
for PR 11509 at commit
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55260105
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
---
@@ -17,32 +17,153 @@
package
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-193415690
**[Test build #52582 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52582/consoleFull)**
for PR 11509 at commit
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55257029
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -226,16 +226,17 @@ private[sql] object PhysicalRDD {
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55256446
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala
---
@@ -1,265 +0,0 @@
-/*
- * Licensed to the Apache
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55254873
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -147,6 +147,13 @@ case class CreateMetastoreDataSource(
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55254593
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -366,13 +366,6 @@ final class DataFrameWriter private[sql](df:
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55254454
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -88,7 +88,8 @@ class LibSVMRelationSuite extends
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55253851
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -167,22 +117,63 @@ class DefaultSource extends
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55253572
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -348,7 +348,7 @@ class OrcQuerySuite extends QueryTest with
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55252933
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala
---
@@ -278,26 +298,61 @@ object ResolvedDataSource
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55252685
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala
---
@@ -101,45 +111,28 @@ private[sql] case
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55216537
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -226,16 +226,17 @@ private[sql] object PhysicalRDD {
Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55213849
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -88,7 +88,8 @@ class LibSVMRelationSuite extends
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55212994
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/sources/SimpleTextRelation.scala
---
@@ -1,265 +0,0 @@
-/*
- * Licensed to the Apache
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55211124
--- Diff:
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -147,6 +147,13 @@ case class CreateMetastoreDataSource(
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55210119
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -465,214 +379,168 @@ abstract class OutputWriter {
}
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55206478
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala ---
@@ -366,13 +366,6 @@ final class DataFrameWriter private[sql](df:
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55205874
--- Diff:
mllib/src/test/scala/org/apache/spark/ml/source/libsvm/LibSVMRelationSuite.scala
---
@@ -88,7 +88,8 @@ class LibSVMRelationSuite extends
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55204271
--- Diff:
mllib/src/main/scala/org/apache/spark/ml/source/libsvm/LibSVMRelation.scala ---
@@ -167,22 +117,63 @@ class DefaultSource extends
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192604670
Did one pass on this, looks great! All the comments are minor, it's fine
to be addressed later.
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117589
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/hive/orc/OrcQuerySuite.scala ---
@@ -348,7 +348,7 @@ class OrcQuerySuite extends QueryTest with
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117518
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -465,214 +379,168 @@ abstract class OutputWriter {
}
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117452
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/DefaultSource.scala
---
@@ -17,32 +17,153 @@
package
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117341
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala
---
@@ -351,8 +354,8 @@ private[sql] class
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117300
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala
---
@@ -278,26 +298,61 @@ object ResolvedDataSource
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117253
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala
---
@@ -92,19 +96,61 @@ object ResolvedDataSource
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55117222
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala
---
@@ -101,45 +111,28 @@ private[sql] case
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192558309
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192558302
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192557781
**[Test build #52498 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52498/consoleFull)**
for PR 11509 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192543318
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192543317
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192543057
**[Test build #52493 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52493/consoleFull)**
for PR 11509 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192531809
**[Test build #52498 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52498/consoleFull)**
for PR 11509 at commit
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192514145
**[Test build #52493 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52493/consoleFull)**
for PR 11509 at commit
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55096068
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala
---
@@ -58,18 +57,29 @@ import
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55086715
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala
---
@@ -58,18 +57,29 @@ import
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55082206
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala
---
@@ -58,18 +57,29 @@ import
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55082136
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
---
@@ -246,8 +116,10 @@ object CSVRelation extends
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55081339
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -464,215 +378,140 @@ abstract class OutputWriter {
}
}
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192441294
@davies agree, we should have a default internalScan that delegates to
external version while doing the `Row` => `InternalRow`. We can then make that
method
Github user nongli commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55077018
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -464,215 +378,140 @@ abstract class OutputWriter {
}
}
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55076629
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -173,16 +173,17 @@ private[sql] object PhysicalRDD {
Github user nongli commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55076660
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVRelation.scala
---
@@ -246,8 +116,10 @@ object CSVRelation extends
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55076507
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -103,7 +103,7 @@ object DataType {
/** Given the string
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55075899
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala
---
@@ -92,19 +96,61 @@ object ResolvedDataSource
Github user marmbrus commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55075595
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/sources/CommitFailureTestRelationSuite.scala
---
@@ -1,104 +0,0 @@
-/*
- * Licensed to the
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55072046
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala ---
@@ -464,215 +378,140 @@ abstract class OutputWriter {
}
}
Github user davies commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192406483
@marmbrus InternalRow is not a public API, so we will have buildScan() to
return an RDD of Row for external libraries?
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55071018
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala
---
@@ -58,18 +57,29 @@ import
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55070549
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala ---
@@ -173,16 +173,17 @@ private[sql] object PhysicalRDD {
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55069011
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala ---
@@ -103,7 +103,7 @@ object DataType {
/** Given the string
Github user zsxwing commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55067756
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ResolvedDataSource.scala
---
@@ -92,19 +96,61 @@ object ResolvedDataSource
Github user davies commented on a diff in the pull request:
https://github.com/apache/spark/pull/11509#discussion_r55067502
--- Diff:
sql/hive/src/test/scala/org/apache/spark/sql/sources/CommitFailureTestRelationSuite.scala
---
@@ -1,104 +0,0 @@
-/*
- * Licensed to the
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192081129
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192081106
**[Test build #52439 has
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52439/consoleFull)**
for PR 11509 at commit
Github user AmplabJenkins commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192081128
Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192076770
**[Test build #52439 has
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52439/consoleFull)**
for PR 11509 at commit
Github user marmbrus commented on the pull request:
https://github.com/apache/spark/pull/11509#issuecomment-192076357
@rxin @nongli @cloud-fan @liancheng @yhuai
GitHub user marmbrus opened a pull request:
https://github.com/apache/spark/pull/11509
[SPARK-13665][SQL] Separate the concerns of HadoopFsRelation
`HadoopFsRelation` is used for reading most files into Spark SQL. However
today this class mixes the concerns of file management,