GitHub user xwu0226 opened a pull request:
https://github.com/apache/spark/pull/13212
[SPARK-15431][SQL] Support LIST FILE(s)|JAR(s) command natively
## What changes were proposed in this pull request?
Currently, the command "ADD FILE|JAR <filepath|jarpath>" is supported natively
in SparkSQL. However, the files/jars it adds cannot be looked up with the
"LIST FILE(s)|JAR(s)" command: in Spark-SQL the LIST command is passed to the
Hive command processor, and in Spark-shell it is not supported at all. As a
result, users have no way to find out which files/jars have been added to the
Spark context.
For reference, see [Hive commands](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli).
This PR adds native support for the following commands:
`LIST (FILE[s] [filepath ...] | JAR[s] [jarfile ...])`
### For example:
##### LIST FILE(s)
```
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt")
res1: org.apache.spark.sql.DataFrame = []
scala> spark.sql("add file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt")
res2: org.apache.spark.sql.DataFrame = []
scala> spark.sql("list file hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
+----------------------------------------------+
scala> spark.sql("list files").show(false)
+----------------------------------------------+
|result                                        |
+----------------------------------------------+
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test1.txt|
|hdfs://bdavm009.svl.ibm.com:8020/tmp/test.txt |
+----------------------------------------------+
```
##### LIST JAR(s)
```
scala> spark.sql("add jar /Users/xinwu/spark/core/src/test/resources/TestUDTF.jar")
res9: org.apache.spark.sql.DataFrame = [result: int]
scala> spark.sql("list jar TestUDTF.jar").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+
scala> spark.sql("list jars").show(false)
+---------------------------------------------+
|result                                       |
+---------------------------------------------+
|spark://192.168.1.234:50131/jars/TestUDTF.jar|
+---------------------------------------------+
```
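The sessions above suggest that LIST with no arguments returns every added resource, while LIST with arguments returns only the matching entries (note that `list jar TestUDTF.jar` matches the full `spark://` URL). A minimal, Spark-free sketch of that behaviour follows. The names `ListResourceSketch`, `baseName`, and `listResources` are hypothetical, and matching by file name is an assumption drawn from the examples, not the matching logic implemented in this PR.

```scala
// Illustrative model only; not the code in this PR.
object ListResourceSketch {
  // Hypothetical helper: the last segment of a path or URI string.
  def baseName(path: String): String =
    path.substring(path.lastIndexOf('/') + 1)

  // `added` stands in for the resources registered via ADD FILE|JAR;
  // `requested` stands in for the optional arguments given to LIST.
  def listResources(added: Seq[String], requested: Seq[String]): Seq[String] =
    if (requested.isEmpty) {
      // LIST FILES / LIST JARS with no arguments: return everything.
      added
    } else {
      // Assumption: an argument selects an entry when their file names agree.
      val wanted = requested.map(baseName).toSet
      added.filter(p => wanted.contains(baseName(p)))
    }
}
```

Under these assumptions, `ListResourceSketch.listResources(Seq("spark://192.168.1.234:50131/jars/TestUDTF.jar"), Seq("TestUDTF.jar"))` returns the single `spark://` entry, and an empty `requested` list returns `added` unchanged.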
## How was this patch tested?
New test cases were added covering the Spark-SQL, Spark-Shell, and
SparkContext API code paths.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/xwu0226/spark list_command
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13212.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13212
----
commit 3866e3dcbfbd9fe0e18ecde3b23bb14757e06a0c
Author: xin Wu <[email protected]>
Date: 2016-05-08T07:06:36Z
spark-15206 add testcases for distinct aggregate in having clause following
up PR12974
commit 951d3edc412ef3d6f77d70a4dd7dd7add966d7b1
Author: xin Wu <[email protected]>
Date: 2016-05-08T07:09:44Z
Revert "spark-15206 add testcases for distinct aggregate in having clause
following up PR12974"
This reverts commit 98a1f804d7343ba77731f9aa400c00f1a26c03fe.
commit 5b30cc3c0eb20c134e21942ef96a26e452f9171c
Author: xin Wu <[email protected]>
Date: 2016-05-17T22:09:57Z
adding spark native support for LIST FILES/JARS
commit 6396ec1591134ca3fd754a6a2684bc8b81218951
Author: xin Wu <[email protected]>
Date: 2016-05-17T22:52:31Z
update testcase
commit 79e97be7917d23f44f60cc857a471b14cb96831c
Author: xin Wu <[email protected]>
Date: 2016-05-19T07:07:02Z
support listing specific file(s)
commit a4dc6164ff51b428dae282aa90042758c4ae33d7
Author: Xin Wu <[email protected]>
Date: 2016-05-19T07:33:50Z
update testcases
commit 688c294060cb00cd6c387591bf700e58bdd3dba8
Author: Xin Wu <[email protected]>
Date: 2016-05-19T22:57:16Z
align with PR 13122
commit a0a76a3c5ff93dbf42f07bebd54b7a3514e87132
Author: Xin Wu <[email protected]>
Date: 2016-05-19T23:07:32Z
code style
commit 923988ac5d21e0c0afc6bf76d21a27e8f46f1246
Author: Xin Wu <[email protected]>
Date: 2016-05-19T23:11:36Z
code style
commit 21b092ab84b22abec93fde1fc1ca177db68d9f0d
Author: Xin Wu <[email protected]>
Date: 2016-05-20T04:16:26Z
update comments
----