subject:"\[GitHub\] spark pull request\: \[SPARK\-2706\]\[SQL\] Enable Spark to support Hive..."

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-28 Thread qiaohaijun

Github user qiaohaijun commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-60730251
  
I will try it


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-24 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-60426160
  
Okay, I've merged this to master. Will file a PR shortly to fix the tests.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2241


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-22 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-60148210
  
@marmbrus Thanks for the comments. 

Given that we have to support hive-0.12. There are two approaches I can 
think out to address the issue. 

1st: we can temporally make the compatibility test  as hive12 only in pom, 
and find a good way as followup to add corresponding compatibility test for 
hive0.13 in an elegant way. This approach also unblock some other jiras and 
build the foundation for further development  of hive 0.13 feature support. 

2nd: we can create a set of separate files for hive-0.13, e.g., 
compatibility suite, golden plan, golden answer, which may involve more than 
hundreds of files. In addition we need to change the current basic hive test 
code to adapt to different hive version. I think this approach may be a little 
rush, and also make the scope of this PR really big and hard to maintain.

I prefer the first approach, and opening followup jiras to address 
leftovers in a more granular way.

Please let know how you do think about it. If you have other options, 
please also  let me know.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-21 Thread scwf

Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59884283
  
@marmbrus in #2499, i reproduce the golden answer and changed some *.ql 
because of 0.13 changes, the tests passed in my local machine.
@zhzhan not get you, why to replace the query play?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-21 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59887284
  
@scwf I mean change some *.ql you already did. The problem is that it 
need to add another layer to take care of compatibility test suite. I have not 
found a good way to do it. I will thank again to see whether there is a simple 
way to make it work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-21 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59888325
  
@scwf I am wondering how do you handle the decimal support, since hive-0.13 
has new semantic for this type.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-21 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59889646
  
@marmbrus FYI: I ran the compatibility test, and so far the major 
outstanding issues include 1st:  decimal support, 2nd: udf7 and udf_round, 
which can be fixed, but I am not 100% sure it is the right way. Most of other 
failures are false positive and should be solved by regenerating golden answer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-21 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59962787
  
A few comments:  I'm not talking only about getting the current tests to 
pass, but upgrading the test set to include the new files.  Also, I hope to 
update the whitelist to include any new tests that are now passing.

I'm not particularly concerned about matching every small detail (i.e. if 
we don't need to match the empty comment field on metadata based on version).  
What is important is that we can connect to both Hive 1213 metastores and that 
queries run and return the correct answers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-20 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59837979
  
@scwf I did some basic functionality testing with you thrift patch, and it 
looks ok to me. By the way, because the 0.13.1 customized package is not 
available now, so I revert the pom back for testing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-20 Thread scwf

Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59839556
  
Thanks, if you have any comment, let me know:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-20 Thread scwf

Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59879567
  
We can reproduce the golden answer for hive 0.13 as i done in my closed PR, 
how about that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-20 Thread marmbrus

Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59880136
  
@scwf, which PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-20 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59880350
  
@scwf The golden answer is different in hive12 and hive13. We need some 
extra shim layer to handle that. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-20 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59880636
  
@marmbrus I think he refers to https://github.com/apache/spark/pull/2499


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-20 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59881210
  
@scwf Did you also replace the query plan for hive0.13 in your another PR? 
because I also saw some query plan changes in hive0.13.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-16 Thread scwf

Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18944242
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -288,23 +290,26 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   val cmd_trimmed: String = cmd.trim()
   val tokens: Array[String] = cmd_trimmed.split(\\s+)
   val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-  val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), 
hiveconf)
+  val proc: CommandProcessor = 
HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
 
   proc match {
 case driver: Driver =
-  driver.init()
-
-  val results = new JArrayList[String]
+  val results = HiveShim.createDriverResultsArray
   val response: CommandProcessorResponse = driver.run(cmd)
   // Throw an exception if there is an error in query processing.
   if (response.getResponseCode != 0) {
-driver.destroy()
+driver.close()
 throw new QueryExecutionException(response.getErrorMessage)
   }
   driver.setMaxRows(maxRows)
   driver.getResults(results)
-  driver.destroy()
-  results
+  driver.close()
+  results.map { r =
+r match {
+  case s: String = s
+  case o = o.toString
--- End diff --

Here ```r``` maybe a ```Array``` 
type(https://github.com/scwf/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchFormatter.java#L53-L64),
 we should cover that case, otherwise this will lead to console result printed 
as follows:
``` result
   [Object@5e41108b
```
   
And on the other hand i suggest that we should do some tests with this PR 
merged with #2685 to check the basic functionality


 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-16 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18969029
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -288,23 +290,26 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   val cmd_trimmed: String = cmd.trim()
   val tokens: Array[String] = cmd_trimmed.split(\\s+)
   val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-  val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), 
hiveconf)
+  val proc: CommandProcessor = 
HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
 
   proc match {
 case driver: Driver =
-  driver.init()
-
-  val results = new JArrayList[String]
+  val results = HiveShim.createDriverResultsArray
   val response: CommandProcessorResponse = driver.run(cmd)
   // Throw an exception if there is an error in query processing.
   if (response.getResponseCode != 0) {
-driver.destroy()
+driver.close()
 throw new QueryExecutionException(response.getErrorMessage)
   }
   driver.setMaxRows(maxRows)
   driver.getResults(results)
-  driver.destroy()
-  results
+  driver.close()
+  results.map { r =
+r match {
+  case s: String = s
+  case o = o.toString
--- End diff --

@scwf Thanks for the review. The reason I did this is that in hive testing 
code, I actually didn't find a case which the result value is not 
Array[String], and there the results is even initialized as Array[String]. For 
for the safety reason, I will change the code to process Array[Arrray[Object]].

This patch is independent with thrift server, but the thrift server patch 
should be verified after this one going to upstream, mainly due to pom file 
change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-14 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58994264
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21718/consoleFull)
 for   PR 2241 at commit 
[`cbb4691`](https://github.com/apache/spark/commit/cbb46911f881a0704fa16a36f2a362a930db6ade).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-14 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58995591
  
@marmbrus from a build perspective this LGTM with the caveat that right now 
it's only passing Hive compatibility for 0.12 tests and may require further 
modification to actually pass 0.13 tests. Up to you in terms of whether that 
blocks merging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-14 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59001081
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21718/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-14 Thread SparkQA

Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59001076
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21718/consoleFull)
 for   PR 2241 at commit 
[`cbb4691`](https://github.com/apache/spark/commit/cbb46911f881a0704fa16a36f2a362a930db6ade).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var 
compressed: Boolean)`
  * `class ShimFileSinkDesc(var dir: String, var tableInfo: TableDesc, var 
compressed: Boolean)`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-14 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-59145387
  
@marmbrus I actually quite exhaustively tested the code in both unit test 
and system test in sandbox, and real cluster, and didn't see major issues. 
Regarding the compatibility test, there are several test case failure due to 
some hive 0.13 internal behavior change, e.g, hive decimal. We can fix it in 
the follow up. In my point of view, it would be good to take a accumulative 
approach. The current patch does not have impact on existing hive 12 support, 
but enable the community to actively improve hive0.13 support.

Some instant benefits: 1st. Native parquet support, 2nd. some new UDFs in 
hive 0.13, and 3rd: better support for ORC as source, e.g., compression, 
predictor push down, etc.

Please let me know if you have any other concerns.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58849802
  
@zhzhan @scwf - I think this should be okay now for protobuf. We made some 
other changes this week updating the protobuf version to be based on protobuf 
2.5 instead of 2.4 in akka. So now throughout Spark we use this. Mind rebasing 
this? I think the protobuf issue will go away.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread scwf

Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58850172
  
Ok, thanks for that, i will also test it in 
https://github.com/apache/spark/pull/2685


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58853725
  
Hi @zhzhan and @scwf  - I made some changes to the build to simplify it a 
bit. I made a PR into your branch. I tested it locally compiling for 0.12 and 
0.13, but it would be good if you tested it as well to make sure it works.

https://github.com/zhzhan/spark/pull/1/files


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58854626
  
Note @scwf there are some TODO's in there that need to be addressed in your 
patch for JDBC.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58927724
  
Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18782903
  
--- Diff: docs/_config.yml ---
@@ -1,7 +1,5 @@
 highlighter: pygments
 markdown: kramdown
-gems:
-  - jekyll-redirect-from
--- End diff --

This particular change we should revert - I accidentally added this from my 
branch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58928624
  
@pwendell Thanks a lot for the help. But I have several concerns. 
1. Not sure how the new package is created. But hive-0.13.1 has native 
parquet support, and I remember I met binary compatibility issue when 
artifactIdparquet-hive-bundle/artifactId is included. So I exclude this one 
form hive-0.13.1.
2. Should Hive-0.13.1 is not compatible with hive-0.12.0. So hive-0.13.1 
does not support all compatibility test suite yet. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58929975
  
@zhzhan could you give a reproduction of the compatiblity issue? around 
parquet support? It didn't occur at compile time but maybe there is an issue at 
runtime.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58932508
  
@pwendell following is the trace from arquetMetastoreSuite, and it is 
caused by the parquet-hive-bundle

sbt/sbt -Phive-0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 test-only 
org.apache.spark.sql.parquet.ParquetMetastoreSuite

SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
11:11:50.877 ERROR hive.ql.exec.DDLTask: java.lang.NoSuchFieldError: 
doubleTypeInfo
at 
org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getObjectInspector(ArrayWritableObjectInspector.java:66)
at 
org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.init(ArrayWritableObjectInspector.java:59)
at 
org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe.initialize(ParquetHiveSerDe.java:113)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
at 
org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:288)
at 
org.apache.hadoop.hive.ql.metadata.Table.checkValidity(Table.java:194)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:597)
at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4189)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
at 
org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
at org.apache.spark.sql.hive.HiveContext.runHive(HiveContext.scala:297)
at 
org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:273)
at 
org.apache.spark.sql.hive.test.TestHiveContext.runSqlHive(TestHive.scala:93)
at 
org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult$lzycompute(NativeCommand.scala:35)
at 
org.apache.spark.sql.hive.execution.NativeCommand.sideEffectResult(NativeCommand.scala:35)
at 
org.apache.spark.sql.execution.Command$class.execute(commands.scala:44)
at 
org.apache.spark.sql.hive.execution.NativeCommand.execute(NativeCommand.scala:30)
at 
org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd$lzycompute(HiveContext.scala:361)
at 
org.apache.spark.sql.hive.HiveContext$QueryExecution.toRdd(HiveContext.scala:361)
at 
org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
at org.apache.spark.sql.SchemaRDD.init(SchemaRDD.scala:104)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58934179
  
I see - so what exactly is the problem - is the issue that the Parquet 
serde is not compatible across versions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58935643
  
I think because the bundle has the package 
org.apache.hadoop.hive.ql.io.parquet, which is also now natively located in 
hive, and results in package conflicts. The way I fixed it is to remove this 
dependency  parquet-hive-bundle in hive-0.13.1. With this, I didn't see any 
problem with parquet anymore. Please comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18786749
  
--- Diff: assembly/pom.xml ---
@@ -205,6 +211,21 @@
   /dependencies
 /profile
 profile
+  idhive-versions/id
--- End diff --

I think this can be removed - right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18786888
  
--- Diff: assembly/pom.xml ---
@@ -205,6 +211,21 @@
   /dependencies
 /profile
 profile
+  idhive-versions/id
--- End diff --

We already require the `-Phive` profile to add hive to the assembly, so I 
think that is sufficient.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58937829
  
Okay then how about this to keep it simple:

1. Add the `parquet-hive-bundle` dependency inside of `hive/pom.xml` inside 
of a block for the `hive-0.12-0` profile only.
2. Change the default values of `hive.version`, `hive.short.version`, etc. 
to 0.13.1
3. Update the instructions as follows:

```
# Apache Hadoop 2.4.X with Hive 13 support
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean 
package

# Apache Hadoop 2.4.X with Hive 12 support
 mvn -Pyarn -Phive-0.12.0 -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive 
-DskipTests clean package
```

The issue is that we can't rely on dynamic profile activation because it 
messes up people linking against the `spark-hive` artifact. Many build tools 
like SBT do not support loading profiles in poms from other projects.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58938511
  
@pwendell Sounds good to me. Will update the patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread AmplabJenkins

Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58938955
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amp.lab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21692/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58943420
  
Thanks! Could you elaborate more on the second point you were making? About 
the compatibility tests? Is Hive 0.13 no language-compatible with Hive 0.12?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58956296
  
@pwendell 
With this new patch, we actually have to change the test script to pass all 
hive related test suite.  Need the help from somebody familiar with the test 
infra to add -Phive-0.12.0. (is the change of dev/run-tests enough?)

Regarding the compatibility test, the query plan in hive0.13 is different 
from hive 0.12. In the meantime, there is some new feature and internal change, 
for example decimal support, GenericUDFRound etc. But their scope is much 
smaller than this one and we can fix it in the following up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58979166
  
@zhzhan yeah you can just change `dev/run-tests` to build with the hive 12 
profile for now where it currently has `-Phive`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18805288
  
--- Diff: sql/hive/pom.xml ---
@@ -155,6 +162,25 @@
 artifactIdscalatest-maven-plugin/artifactId
   /plugin
 
+  plugin
--- End diff --

The indentation seems off here. It might be my fault from my earlier patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58993065
  
Test this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-13 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-58993918
  
Hmmm Jenkins, test this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-06 Thread scwf

Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18496664
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -287,22 +289,20 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   val cmd_trimmed: String = cmd.trim()
   val tokens: Array[String] = cmd_trimmed.split(\\s+)
   val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-  val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), 
hiveconf)
+  val proc: CommandProcessor = 
HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
 
   proc match {
 case driver: Driver =
-  driver.init()
-
   val results = new JArrayList[String]
--- End diff --

Need consider compatibility here since ```driver.getResults(results)``` api 
changed in hive-0.13.1. You can refer my change 
there(https://github.com/scwf/spark/blob/hive-0.13.1-clean/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L295-L310)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18433803
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -80,8 +81,10 @@ class StatisticsSuite extends QueryTest with 
BeforeAndAfterAll {
 sql(INSERT INTO TABLE analyzeTable SELECT * FROM src).collect()
 sql(INSERT INTO TABLE analyzeTable SELECT * FROM src).collect()
 
-assert(queryTotalSize(analyzeTable) === defaultSizeInBytes)
-
+// TODO: How it works? needs to add it back for other hive version.
+if (HiveShim.version ==0.12.0) {
--- End diff --

I don't quite understand this part, but in my testing the new value is 
always 11624. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18433810
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
--- End diff --

Change the comments, and make the parser accept any decimal(precision, 
scale). But the gap is still there between hive0.13.1 and spark support.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18433827
  
--- Diff: 
sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.net.URI
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.`type`.HiveDecimal
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.stats.StatsSetupConst
+import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
+import org.apache.hadoop.{io = hadoopIo}
+import org.apache.hadoop.mapred.InputFormat
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.12.0.
+ */
+private[hive] object HiveShim {
+  val version = 0.12.0
+  val metastoreDecimal = decimal
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(serdeClass, inputFormatClass, outputFormatClass, 
properties)
--- End diff --

we need return TableDesc to MetastoreRelation.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18433838
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = 
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+}
+appendReadColumnNames(conf, names)
+  }
+
+  def getExternalTmpPath(context: Context, path: Path) = {
+context.getExternalTmpPath(path.toUri)
+  }
+
+  def getDataLocationPath(p: Partition) = p.getDataLocation
+
+  def getAllPartitionsOf(client: Hive, tbl: Table) =  
client.getAllPartitionsOf(tbl)
+
+  /*
+   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its 
member path is not.
+   * Fix it through wrapper.
+   * */
+  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
--- End diff --

I

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18433848
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = 
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+}
+appendReadColumnNames(conf, names)
+  }
+
+  def getExternalTmpPath(context: Context, path: Path) = {
+context.getExternalTmpPath(path.toUri)
+  }
+
+  def getDataLocationPath(p: Partition) = p.getDataLocation
+
+  def getAllPartitionsOf(client: Hive, tbl: Table) =  
client.getAllPartitionsOf(tbl)
+
+  /*
+   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its 
member path is not.
+   * Fix it through wrapper.
--- End diff --

Not sure how hive handle this issue, but FileSinkDesc is certainly not 
serializable with Path

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread scwf

Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18434383
  
--- Diff: pom.xml ---
@@ -1260,7 +1259,18 @@
 /dependency
   /dependencies
 /profile
-
+profile
+  idhive-default/id
+  activation
+property
+  name!hive.version/name
+/property
+  /activation
+  properties
+hive.version0.12.0/hive.version
+derby.version10.4.2.0/derby.version
+  /properties
+/profile
 profile
   idhive/id
--- End diff --

Do we need profile hive? Since we only support hive-0.12.0 and hive-0.13.1, 
i think we can provide two profile here is ok: profile hive-0.12 for version 
0.12.0 and profile hive-0.13 for version 0.13.1. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread scwf

Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57939048
  
I tested with @pwendell  shaded hive-0.13.1,  also has this problem:
Exception in thread main java.lang.ClassNotFoundException: 
com.google.protobuf_spark.GeneratedMessage

Hi @pwendell, i think we need regenerate OrcProto.java,Here is  
```org.spark_project.protobuf.ExtensionRegistry``` in

https://github.com/pwendell/hive/blob/0.13-shaded-protobuf/ql/src/gen/protobuf/gen-java/org/apache/hadoop/hive/ql/io/orc/OrcProto.java#L9

It should be ```com.google.protobuf_spark.ExtensionRegistry registry```?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread scwf

Github user scwf commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57939320
  
I made a shaded hive-0.13.1 version several days ago for 
testing(https://github.com/scwf/hive/tree/0.13.1-shaded).Hope it is useful:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-05 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18439490
  
--- Diff: pom.xml ---
@@ -1260,7 +1259,18 @@
 /dependency
   /dependencies
 /profile
-
+profile
+  idhive-default/id
+  activation
+property
+  name!hive.version/name
+/property
+  /activation
+  properties
+hive.version0.12.0/hive.version
+derby.version10.4.2.0/derby.version
+  /properties
+/profile
 profile
   idhive/id
--- End diff --

The original patch took this way, but with this new approach, it can 
support different hive versions without changing pom file further. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18429655
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = 
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+}
+appendReadColumnNames(conf, names)
+  }
+
+  def getExternalTmpPath(context: Context, path: Path) = {
+context.getExternalTmpPath(path.toUri)
+  }
+
+  def getDataLocationPath(p: Partition) = p.getDataLocation
+
+  def getAllPartitionsOf(client: Hive, tbl: Table) =  
client.getAllPartitionsOf(tbl)
+
+  /*
+   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its 
member path is not.
+   * Fix it through wrapper.
--- End diff --

I am pretty confused about it. I think Hive needs to serialize FileSinkDesc 
when the query plan

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18429712
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = 
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+}
+appendReadColumnNames(conf, names)
+  }
+
+  def getExternalTmpPath(context: Context, path: Path) = {
+context.getExternalTmpPath(path.toUri)
+  }
+
+  def getDataLocationPath(p: Partition) = p.getDataLocation
+
+  def getAllPartitionsOf(client: Hive, tbl: Table) =  
client.getAllPartitionsOf(tbl)
+
+  /*
+   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its 
member path is not.
+   * Fix it through wrapper.
+   * */
+  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
+var f = new

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18429736
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = 
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+}
+appendReadColumnNames(conf, names)
+  }
+
+  def getExternalTmpPath(context: Context, path: Path) = {
+context.getExternalTmpPath(path.toUri)
+  }
+
+  def getDataLocationPath(p: Partition) = p.getDataLocation
+
+  def getAllPartitionsOf(client: Hive, tbl: Table) =  
client.getAllPartitionsOf(tbl)
+
+  /*
+   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its 
member path is not.
+   * Fix it through wrapper.
+   * */
+  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
--- End diff --

If we

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18429781
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = 
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
--- End diff --

```
if () {
  ...
} else {
  ...
}
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18429783
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = 
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+}
+appendReadColumnNames(conf, names)
--- End diff --

Why no null and empty check at here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail:

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430042
  
--- Diff: 
sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.net.URI
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.`type`.HiveDecimal
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.stats.StatsSetupConst
+import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
+import org.apache.hadoop.{io = hadoopIo}
+import org.apache.hadoop.mapred.InputFormat
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.12.0.
+ */
+private[hive] object HiveShim {
+  val version = 0.12.0
+  val metastoreDecimal = decimal
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(serdeClass, inputFormatClass, outputFormatClass, 
properties)
--- End diff --

Is it necessary?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430235
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala 
---
@@ -369,6 +371,7 @@ class TestHiveContext(sc: SparkContext) extends 
HiveContext(sc) {
* tests.
*/
   protected val originalUdfs: JavaSet[String] = 
FunctionRegistry.getFunctionNames
+  HiveShim.createDefaultDBIfNeeded(this)
--- End diff --

Can you add a comment at here to explain why it is necessary?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430262
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala 
---
@@ -78,6 +79,7 @@ class TestHiveContext(sc: SparkContext) extends 
HiveContext(sc) {
   // For some hive test case which contain ${system:test.tmp.dir}
   System.setProperty(test.tmp.dir, testTempDir.getCanonicalPath)
 
+  CommandProcessorFactory.clean(hiveconf)
--- End diff --

Since it is a cleanup work, seems it is better to be placed after 
`System.clearProperty(spark.hostPort)`. Also, please add comment about what 
this call is doing and why it is needed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430322
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -80,8 +81,10 @@ class StatisticsSuite extends QueryTest with 
BeforeAndAfterAll {
 sql(INSERT INTO TABLE analyzeTable SELECT * FROM src).collect()
 sql(INSERT INTO TABLE analyzeTable SELECT * FROM src).collect()
 
-assert(queryTotalSize(analyzeTable) === defaultSizeInBytes)
-
+// TODO: How it works? needs to add it back for other hive version.
+if (HiveShim.version ==0.12.0) {
--- End diff --

For Hive 0.13, will table always be updated after `INSERT INTO`?

When we added this test, the table size was not updated with the `INSERT 
INTO` command.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430463
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
--- End diff --

Can you double check it? I am not sure DECIMAL in hive-0.12 is actually 
DECIMAL(10,0). From the code, seems precision is unbounded.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430480
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -212,7 +214,18 @@ private[hive] object HiveQl {
   /**
* Returns the AST for the given SQL string.
*/
-  def getAst(sql: String): ASTNode = ParseUtils.findRootNonNullToken((new 
ParseDriver).parse(sql))
+  def getAst(sql: String): ASTNode = {
+/*
+ * Context has to be passed in in hive0.13.1.
--- End diff --

in in


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430557
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
--- End diff --

Let's say we connect to a existing hive 0.13 metastore. If there is a 
decimal column with a user-defined precision and scale, will we see parsing 
error?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430590
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala
 ---
@@ -557,11 +557,14 @@ class HiveQuerySuite extends HiveComparisonTest {
   |WITH serdeproperties('s1'='9')
 .stripMargin)
 }
-sql(sADD JAR $testJar)
-sql(
-  ALTER TABLE alter1 SET SERDE 
'org.apache.hadoop.hive.serde2.TestSerDe'
-|WITH serdeproperties('s1'='9')
-  .stripMargin)
+// now only verify 0.12.0, and ignore other versions due to binary 
compatability
--- End diff --

Can you explain it a little bit more?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430600
  
--- Diff: 
sql/hive/v0.12.0/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.net.URI
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.`type`.HiveDecimal
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors._
+import org.apache.hadoop.hive.ql.stats.StatsSetupConst
+import org.apache.hadoop.hive.serde2.{Deserializer, ColumnProjectionUtils}
+import org.apache.hadoop.{io = hadoopIo}
+import org.apache.hadoop.mapred.InputFormat
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.12.0.
+ */
+private[hive] object HiveShim {
+  val version = 0.12.0
+  val metastoreDecimal = decimal
+
+  def getTableDesc(
+serdeClass: Class[_ : Deserializer],
+inputFormatClass: Class[_ : InputFormat[_, _]],
+outputFormatClass: Class[_],
+properties: Properties) = {
+new TableDesc(serdeClass, inputFormatClass, outputFormatClass, 
properties)
+  }
+
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+
+  def createDefaultDBIfNeeded(context: HiveContext) ={  }
+
+  /** The string used to denote an empty comments field in the schema. */
+  def getEmptyCommentsFieldValue = None
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd(0), conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+new HiveDecimal(bd)
+  }
+
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+ColumnProjectionUtils.appendReadColumnIDs(conf, ids)
+ColumnProjectionUtils.appendReadColumnNames(conf, names)
+  }
+
+  def getExternalTmpPath(context: Context, uri: URI): String = {
--- End diff --

It will be good to make the return type consistent.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18430657
  
--- Diff: sql/hive/pom.xml ---
@@ -119,6 +83,74 @@
 
   profiles
 profile
+  idhive-default/id
+  activation
+property
+  name!hive.version/name
--- End diff --

If we use modified hive dependencies, can we avoid this error and simplify 
pom changes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-04 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18431391
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import java.util.Properties
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.{Table, Hive, Partition}
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import org.apache.hadoop.hive.serde2.{ColumnProjectionUtils, Deserializer}
+import org.apache.hadoop.mapred.InputFormat
+import org.apache.spark.Logging
+import org.apache.hadoop.{io = hadoopIo}
+import scala.collection.JavaConversions._
+import scala.language.implicitConversions
+
+/**
+ * A compatibility layer for interacting with Hive version 0.13.1.
+ */
+private[hive] object HiveShim {
+  val version = 0.13.1
+  /*
+   * TODO: hive-0.13 support DECIMAL(precision, scale), DECIMAL in 
hive-0.12 is actually DECIMAL(10,0)
--- End diff --

Yeah I think you are right, it is unbounded in Hive 12.  Spark SQL also 
will use unbounded precision decimals internally, so when its not specified 
thats what we should assume.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-01 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57574745
  
@pwendell I think the packaging has some problem, probably in protobuf. I 
ran some test suite, but cannot go through. With the original package, the test 
is OK. Following are some example failure case.

sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 
test-only org.apache.spark.sql.hive.CachedTableSuite
Caused by: sbt.ForkMain$ForkError: 
com.google.protobuf_spark.GeneratedMessage
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 
test-only org.apache.spark.sql.hive.execution.HiveQuerySuite
[info]   ...
[info]   Cause: java.lang.ClassNotFoundException: 
com.google.protobuf_spark.GeneratedMessage
[info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 
test-only org.apache.spark.sql.parquet.ParquetMetastoreSuite
[info]   ...
[info]   Cause: java.lang.ClassNotFoundException: 
com.google.protobuf_spark.GeneratedMessage
[info]   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-10-01 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57580447
  
@yhuai I removed all unnecessary implicits to make it consistent, but have 
to keep wrapperToFileSinkDesc because HiveFileFormatUtils.getHiveRecordWriter 
needs FileSinkDesc type, and also it help to track the internal state change of 
FileSinkDesc.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-30 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57273157
  
Hey @zhzhan I've published a modified version of Hive 0.13 that we can link 
against. A few benefits is:

1. I fixed the hive-exec jar so it only contains hive packages and not a 
bunch of other code.
2. I'm using a shaded version of the protobuf dependency (otherwise, this 
intefers with protobuf found in older hadoop client versions).


https://oss.sonatype.org/content/repositories/orgspark-project-1077/org/spark-project/hive/

For now you'll need to add this repo to the build to use it, but if it all 
works I can just fully publish this to maven central.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-30 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57394068
  
@pwendell Thanks a lot. But I need some confirmation from you.
1. Is it long term decision to have customized hive 0.13 repo 
org.spark-project.hive
 
2. Do you want to support both hive 0.13.0 and 0.13.1, or just 0.13.0? 
My current patch is against hive0.13.1. The package you provided is 0.13.0. 
There is also API compatibility issue between 0.13.1 and 0.13.0.  I can provide 
a 0.13.0 patch, but this approach will also need a separate package for both 
0.13.0 and 0.1.31 if we want to support both.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-30 Thread pwendell

Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57420122
  
Hey @zhzhan thanks for pointing that out I didn't realize. I think it's 
fine to just support Hive 0.13.1... didn't realize there were major changes. 

I went ahead and published a modified version of 0.13.1 here:


https://oss.sonatype.org/content/repositories/orgspark-project-1079/org/spark-project/hive/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-29 Thread vanzin

Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18179030
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -284,24 +287,20 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   val cmd_trimmed: String = cmd.trim()
   val tokens: Array[String] = cmd_trimmed.split(\\s+)
   val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-  val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), 
hiveconf)
-
-  SessionState.start(sessionState)
+  val proc: CommandProcessor = 
HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
 
   proc match {
 case driver: Driver =
-  driver.init()
-
   val results = new JArrayList[String]
   val response: CommandProcessorResponse = driver.run(cmd)
   // Throw an exception if there is an error in query processing.
   if (response.getResponseCode != 0) {
-driver.destroy()
+driver.close()
 throw new QueryExecutionException(response.getErrorMessage)
   }
   driver.setMaxRows(maxRows)
   driver.getResults(results)
-  driver.destroy()
+  driver.close()
--- End diff --

A quick question since I ran into something similar when trying to run 
things on 0.13: can the driver be re-used after you call `close()`? Because I 
don't see the driver being removed from the CommandProcessorFactory cache, so 
another call to `runHive` will reuse the closed driver.

(I ran into this when running the `HiveFromSpark` example compiled against 
Hive 0.13, so it should be easy to check.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-29 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18186480
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -284,24 +287,20 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   val cmd_trimmed: String = cmd.trim()
   val tokens: Array[String] = cmd_trimmed.split(\\s+)
   val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-  val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), 
hiveconf)
-
-  SessionState.start(sessionState)
+  val proc: CommandProcessor = 
HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
 
   proc match {
 case driver: Driver =
-  driver.init()
-
   val results = new JArrayList[String]
   val response: CommandProcessorResponse = driver.run(cmd)
   // Throw an exception if there is an error in query processing.
   if (response.getResponseCode != 0) {
-driver.destroy()
+driver.close()
 throw new QueryExecutionException(response.getErrorMessage)
   }
   driver.setMaxRows(maxRows)
   driver.getResults(results)
-  driver.destroy()
+  driver.close()
--- End diff --

My understanding is that the driver is not removed from mapDriver of 
CommandProcessorFactory, and will be reused.

CommandProcessorFactory.clean(hiveConf) will destroy and remove the driver 
from mapDriver.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-29 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57242012
  
@zhzhan I think those implicits are not necessary. Can you change those?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-29 Thread yhuai

Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18195198
  
--- Diff: sql/hive/pom.xml ---
@@ -119,6 +83,74 @@
 
   profiles
 profile
+  idhive-default/id
+  activation
+property
+  name!hive.version/name
--- End diff --

What will happen if we have `-Dhive.version=0.12.0`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-29 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57264575
  
@yhuai Can you be more specific regarding the comments: I think those 
implicits are not necessary. Can you change those?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-29 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r18198152
  
--- Diff: sql/hive/pom.xml ---
@@ -119,6 +83,74 @@
 
   profiles
 profile
+  idhive-default/id
+  activation
+property
+  name!hive.version/name
--- End diff --

Due to special packaging requirements in hive 0.12.0.  This will cause 
compiling error. For future other versions, the flag is OK as long as the shim 
layer is provided.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-29 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57265563
  
I meant those implicit classes and methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-26 Thread yhuai

Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57011211
  
@zhzhan Thank you for updating the PR. Can you let me know how to setup 
local dependencies? I'd like to try it. Thanks:)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-26 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57040898
  
@yhuai Following cmd will build with 0.13.1 support. No extra setup 
required. Replace any place -Phive in the current building with 
-Dhive.version=0.13.1. Note that don't use -Phive and -Pthrift-server together 
with -Dhive.version
 
sbt/sbt -Dhive.version=0.13.1 -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 
assembly


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-26 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-57041188
  
@yhuai I tried some 0.13.1 feature, for example collection_list, native 
parquet file support, limited orc file support. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-22 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17833372
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import scala.language.implicitConversions
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import scala.collection.JavaConversions._
+import org.apache.spark.Logging
+import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.Partition
+import org.apache.hadoop.{io = hadoopIo}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.mapred.InputFormat
+import java.util.Properties
+import org.apache.hadoop.hive.serde2.Deserializer
+
+/*hive-0.13.1 support shimmer layer*/
+object HiveShim {
+  val version = 0.13.1
+  /*
+   * hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is 
actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+  def getTableDesc(serdeClass: Class[_ : Deserializer], inputFormatClass: 
Class[_ : InputFormat[_, _]], outputFormatClass: Class[_], properties: 
Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+  /*handle the difference in HiveQuerySuite*/
+  def getEmptyCommentsFieldValue = 
+
+  def convertCatalystString2Hive(s: String) = s
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   * */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   * */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+} else {
+  appendReadColumnNames(conf, names)
+}
+  }
+
+  /*
+   * Bug introdiced in hive-0.13. FileSinkDesc is serilizable, but its 
member path is not.
+   * Fix it through wrapper.
+   * */
+  implicit def wrapperToFileSinkDesc(w: ShimFileSinkDesc): FileSinkDesc = {
+var f = new FileSinkDesc(new Path(w.dir), w.tableInfo, w.compressed)
+f.setCompressed(w.compressed)
+f.setCompressCodec(w.compressCodec)
+f.setCompressType(w.compressType)
+f.setTableInfo(w.tableInfo)
+f
+  }
+
+  implicit class wrapperToPartition(p: Partition) {
+def getDataLocationPath: Path =

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-22 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17834283
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -39,6 +38,8 @@ import org.apache.spark.sql.columnar.InMemoryRelation
 import org.apache.spark.sql.hive.execution.HiveTableScan
 import org.apache.spark.util.Utils
 
+import org.apache.spark.sql.hive.HiveShim
+import org.apache.spark.sql.hive.HiveShim._
--- End diff --

Yes. It need the implicit methods.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-22 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17865634
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala
 ---
@@ -40,6 +39,9 @@ import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.expressions.Row
 import org.apache.spark.sql.execution.{SparkPlan, UnaryNode}
 import org.apache.spark.sql.hive.{HiveContext, MetastoreRelation, 
SparkHiveHadoopWriter}
+import org.apache.spark.sql.hive.{ShimFileSinkDesc = FileSinkDesc, 
ShimContext = Context}
+import org.apache.spark.sql.hive.HiveShim
+import org.apache.spark.sql.hive.HiveShim._
--- End diff --

It is for implicit conversion


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-21 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17832655
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -59,7 +60,7 @@ private[hive] class HiveMetastoreCatalog(hive: 
HiveContext) extends Catalog with
 val table = client.getTable(databaseName, tblName)
 val partitions: Seq[Partition] =
   if (table.isPartitioned) {
-client.getAllPartitionsForPruner(table).toSeq
+client.getAllPartitionsOf(table).toSeq
--- End diff --

This is just the name change in hive, no functionality change. Please refer 
to hive-5483


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-21 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17832668
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -292,24 +294,20 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   val cmd_trimmed: String = cmd.trim()
   val tokens: Array[String] = cmd_trimmed.split(\\s+)
   val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-  val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), 
hiveconf)
-
-  SessionState.start(sessionState)
+  val proc: CommandProcessor = 
HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
 
   proc match {
 case driver: Driver =
-  driver.init()
-
   val results = new JArrayList[String]
   val response: CommandProcessorResponse = driver.run(cmd)
   // Throw an exception if there is an error in query processing.
   if (response.getResponseCode != 0) {
-driver.destroy()
+driver.close
--- End diff --

I follow some example case in hive code base. If driver.destory is used, 
there are some very weird behavior. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-21 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17832786
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TestHive.scala 
---
@@ -77,7 +78,7 @@ class TestHiveContext(sc: SparkContext) extends 
HiveContext(sc) {
 
   // For some hive test case which contain ${system:test.tmp.dir}
   System.setProperty(test.tmp.dir, testTempDir.getCanonicalPath)
-
+  CommandProcessorFactory.clean(hiveconf);
--- End diff --

This is to cleanup. Otherwise, I observe memory leak in hive0.13.1. 
Following is the funciton.
  public static void clean(HiveConf conf) {
Driver drv = mapDrivers.get(conf);
if (drv != null) {
  drv.destroy();
}

mapDrivers.remove(conf);
  }



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-21 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17833036
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -140,7 +141,7 @@ class HadoopTableReader(
   filterOpt: Option[PathFilter]): RDD[Row] = {
 val hivePartitionRDDs = partitionToDeserializer.map { case (partition, 
partDeserializer) =
   val partDesc = Utilities.getPartitionDesc(partition)
-  val partPath = partition.getPartitionPath
+  val partPath = partition.getDataLocationPath
--- End diff --

You are right.  getPartitionPath is not in hive-0.13.1, but in trunk, and 
we have to deal with it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-21 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17833077
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import scala.language.implicitConversions
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import scala.collection.JavaConversions._
+import org.apache.spark.Logging
+import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.Partition
+import org.apache.hadoop.{io = hadoopIo}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.mapred.InputFormat
+import java.util.Properties
+import org.apache.hadoop.hive.serde2.Deserializer
+
+/*hive-0.13.1 support shimmer layer*/
+object HiveShim {
+  val version = 0.13.1
+  /*
+   * hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is 
actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+  def getTableDesc(serdeClass: Class[_ : Deserializer], inputFormatClass: 
Class[_ : InputFormat[_, _]], outputFormatClass: Class[_], properties: 
Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+  /*handle the difference in HiveQuerySuite*/
+  def getEmptyCommentsFieldValue = 
+
+  def convertCatalystString2Hive(s: String) = s
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   * */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
--- End diff --

I didn't see appendReadColumns handles empty strings, and met error in this 
case during testing.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-21 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17833159
  
--- Diff: sql/hive/pom.xml ---
@@ -119,6 +83,74 @@
 
   profiles
 profile
+  idhive-default/id
+  activation
+property
+  name!hive.version/name
+/property
+  /activation
+  dependencies
+dependency
+  groupIdcom.twitter/groupId
+  artifactIdparquet-hive-bundle/artifactId
+  version1.5.0/version
+/dependency
+dependency
+  groupIdorg.spark-project.hive/groupId
--- End diff --

For hive-0.12, because there is version conflicts in protobuf, the hive jar 
has to be shaded. It is OK for hive-0.13.1, but overall the problem has to be 
fixed in hive. I was told that the packaging issue may be fixed in hive 0.14. 
Because currently the primary hive version supported is hive 0.12, which use 
customized jar. After everything is sorted out, and spark moves to support more 
latest hive version, the pom should be changed accordingly and stabilized in 
this part. The current solution is not perfect, but it is hard to find a better 
way to solve this problem in one shot. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-21 Thread zhzhan

Github user zhzhan commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17833192
  
--- Diff: 
sql/hive/v0.13.1/src/main/scala/org/apache/spark/sql/hive/Shim.scala ---
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive
+
+import org.apache.hadoop.hive.ql.processors.CommandProcessorFactory
+import scala.language.implicitConversions
+import org.apache.hadoop.hive.ql.plan.{FileSinkDesc, TableDesc}
+import org.apache.hadoop.hive.common.`type`.{HiveDecimal}
+import scala.collection.JavaConversions._
+import org.apache.spark.Logging
+import org.apache.hadoop.hive.serde2.ColumnProjectionUtils
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.hive.ql.Context
+import org.apache.hadoop.hive.ql.metadata.Partition
+import org.apache.hadoop.{io = hadoopIo}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.common.StatsSetupConst
+import org.apache.hadoop.mapred.InputFormat
+import java.util.Properties
+import org.apache.hadoop.hive.serde2.Deserializer
+
+/*hive-0.13.1 support shimmer layer*/
+object HiveShim {
+  val version = 0.13.1
+  /*
+   * hive-0.13 support DECIMAL(precision, scale), DECIMAL in hive-0.12 is 
actually DECIMAL(10,0)
+   * Full support of new decimal feature need to be fixed in seperate PR.
+   */
+  val metastoreDecimal = decimal(10,0)
+  def getTableDesc(serdeClass: Class[_ : Deserializer], inputFormatClass: 
Class[_ : InputFormat[_, _]], outputFormatClass: Class[_], properties: 
Properties) = {
+new TableDesc(inputFormatClass, outputFormatClass, properties)
+  }
+  def getStatsSetupConstTotalSize = StatsSetupConst.TOTAL_SIZE
+  def createDefaultDBIfNeeded(context: HiveContext) ={
+context.runSqlHive(CREATE DATABASE default)
+context.runSqlHive(USE default)
+  }
+  /*handle the difference in HiveQuerySuite*/
+  def getEmptyCommentsFieldValue = 
+
+  def convertCatalystString2Hive(s: String) = s
+
+  def getCommandProcessor(cmd: Array[String], conf: HiveConf) =  {
+CommandProcessorFactory.get(cmd, conf)
+  }
+
+  def createDecimal(bd: java.math.BigDecimal): HiveDecimal = {
+HiveDecimal.create(bd)
+  }
+
+  /*
+   * This function in hive-0.13 become private, but we have to do this to 
walkaround hive bug
+   * */
+  private def appendReadColumnNames(conf: Configuration, cols: 
Seq[String]) {
+val old: String = 
conf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, )
+val result: StringBuilder = new StringBuilder(old)
+var first: Boolean = old.isEmpty
+
+for (col - cols) {
+  if (first) {
+first = false
+  }
+  else {
+result.append(',')
+  }
+  result.append(col)
+}
+conf.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, 
result.toString)
+  }
+
+  /*
+   * Cannot use ColumnProjectionUtils.appendReadColumns directly, if ids 
is null or empty
+   * */
+  def appendReadColumns(conf: Configuration, ids: Seq[Integer], names: 
Seq[String]) {
+if (ids != null  ids.size  0) {
+  ColumnProjectionUtils.appendReadColumns(conf, ids)
+} else {
+  appendReadColumnNames(conf, names)
--- End diff --

good catch


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-19 Thread gss2002

Github user gss2002 commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-56185334
  
We have been using this fix for a few weeks now against Hive 13. The only 
outstanding issue I see and this could be something larger is the fact that 
Spark Thrift service doesn't seem to support the hive.server2.enable.doAs = 
true. It doesn't set proxy user.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-19 Thread zhzhan

Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/2241#issuecomment-56212828
  
This patch does not include thrift patch, which will be fixed by other 
jiras, because I don't want the scope is too big.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-19 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17805241
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -170,13 +171,14 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
 
 val tableParameters = relation.hiveQlTable.getParameters
 val oldTotalSize =
-  
Option(tableParameters.get(StatsSetupConst.TOTAL_SIZE)).map(_.toLong).getOrElse(0L)
+  
Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize)).
--- End diff --

Don't orphan the `.`.  I'd format this as:

```scala
Option(tableParameters.get(HiveShim.getStatsSetupConstTotalSize))
  .map(_.toLong)
  .getOrElse(0L)
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request: [SPARK-2706][SQL] Enable Spark to support Hive...

2014-09-19 Thread marmbrus

Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/2241#discussion_r17805302
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -292,24 +294,20 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   val cmd_trimmed: String = cmd.trim()
   val tokens: Array[String] = cmd_trimmed.split(\\s+)
   val cmd_1: String = cmd_trimmed.substring(tokens(0).length()).trim()
-  val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), 
hiveconf)
-
-  SessionState.start(sessionState)
+  val proc: CommandProcessor = 
HiveShim.getCommandProcessor(Array(tokens(0)), hiveconf)
 
   proc match {
 case driver: Driver =
-  driver.init()
-
   val results = new JArrayList[String]
   val response: CommandProcessorResponse = driver.run(cmd)
   // Throw an exception if there is an error in query processing.
   if (response.getResponseCode != 0) {
-driver.destroy()
+driver.close
--- End diff --

Use `()` on all methods that have side-effects.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

1 2 >

1 - 100 of 163 matches

Mail list logo