Re: new JDBC server test cases seems failed ?
Noticed that Nan's PR is not related to SQL, but the JDBC test suites got executed. Then I checked the PRs of all the Jenkins builds that failed because of the JDBC suites, and it turns out that none of them touched SQL code. The JDBC code is only included in the assembly jar when the hive-thriftserver build profile is enabled. So it seems the root cause is a Maven build change that makes the JDBC suites always get executed, and they fail because the JDBC code isn't included in the assembly jar. This also explains why I can't reproduce it locally (I always enable the hive-thriftserver profile), and why, once the build fails, all JDBC suites fail together. Working on a patch to fix this. Thanks to Patrick for helping debug this!

On Jul 28, 2014, at 10:07 AM, Cheng Lian l...@databricks.com wrote: I'm looking into this, will fix it ASAP, sorry for the inconvenience.

On Jul 28, 2014, at 9:47 AM, Patrick Wendell pwend...@gmail.com wrote: I'm going to revert it again - Cheng, can you try to look into this? Thanks.

On Sun, Jul 27, 2014 at 6:06 PM, Nan Zhu zhunanmcg...@gmail.com wrote: it's 20 minutes ago https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17259/consoleFull -- Nan Zhu

On Sunday, July 27, 2014 at 8:53 PM, Michael Armbrust wrote: How recent is this? We've already reverted this patch once due to failing tests. It would be helpful to include a link to the failed build. If it's failing again we'll have to revert again.

On Sun, Jul 27, 2014 at 5:26 PM, Nan Zhu zhunanmcg...@gmail.com wrote: Hi, all It seems that the JDBC test cases are failing unexpectedly in Jenkins?
[info] - test query execution against a Hive Thrift server *** FAILED ***
[info]   java.sql.SQLException: Could not open connection to jdbc:hive2://localhost:45518/: java.net.ConnectException: Connection refused
[info]   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:146)
[info]   at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:123)
[info]   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
[info]   at java.sql.DriverManager.getConnection(DriverManager.java:571)
[info]   at java.sql.DriverManager.getConnection(DriverManager.java:215)
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.getConnection(HiveThriftServer2Suite.scala:131)
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.createStatement(HiveThriftServer2Suite.scala:134)
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite$$anonfun$1.apply$mcV$sp(HiveThriftServer2Suite.scala:110)
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite$$anonfun$1.apply(HiveThriftServer2Suite.scala:107)
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite$$anonfun$1.apply(HiveThriftServer2Suite.scala:107)
[info]   ...
[info] Cause: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
[info]   at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
[info]   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248)
[info]   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
[info]   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:144)
[info]   at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:123)
[info]   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
[info]   at java.sql.DriverManager.getConnection(DriverManager.java:571)
[info]   at java.sql.DriverManager.getConnection(DriverManager.java:215)
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.getConnection(HiveThriftServer2Suite.scala:131)
[info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.createStatement(HiveThriftServer2Suite.scala:134)
[info]   ...
[info] Cause: java.net.ConnectException: Connection refused
[info]   at java.net.PlainSocketImpl.socketConnect(Native Method)
[info]   at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
[info]   at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
[info]   at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
[info]   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
[info]   at java.net.Socket.connect(Socket.java:579)
[info]   at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
[info]   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248)
[info]   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
[info]   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:144)
[info]   ...
[info] CliSuite: Executing: create table hive_test1(key int, val string);, expecting output: OK
[warn] four warnings found
[warn] Note:
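For reference, Cheng's diagnosis hinges on the thriftserver code being gated behind a build profile. A build that actually includes the JDBC/thriftserver code in the assembly would look roughly like this (a sketch, using the profile name mentioned in the message; exact flags may vary by branch):

```shell
# Maven: enable the Hive and Thrift server profiles so the JDBC code
# lands in the assembly jar
mvn -Phive -Phive-thriftserver -DskipTests clean package

# sbt equivalent: pass the same profiles through to the build
sbt/sbt -Phive -Phive-thriftserver assembly
```

Without the hive-thriftserver profile, the suites can still run while the server classes are missing from the assembly, which matches the "Connection refused" failures above.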
package/assemble with local spark
Hi, How do you package an app against a locally modified Spark? It seems sbt would resolve the dependencies and use the official Spark release. Thank you! Larry
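One common approach (a sketch, not the only way): publish the modified Spark build into the local Ivy/Maven repository under its SNAPSHOT version, then have the application's build resolve that version. The version string mentioned below is illustrative; use whatever your checkout's build declares.

```shell
# From the modified Spark checkout: publish artifacts to ~/.ivy2/local
sbt/sbt publish-local

# Or, for Maven-based application builds, install into ~/.m2:
mvn -DskipTests clean install
```

Then, in the application's build.sbt, depend on the locally published snapshot, e.g. `libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0-SNAPSHOT"`; the local repository is consulted before remote resolvers, so the modified build wins over the official release.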
Re: Working Formula for Hive 0.13?
I found 0.13.1 artifacts in maven: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar

However, Spark uses a groupId of org.spark-project.hive, not org.apache.hive. Can someone tell me how it is supposed to work? Cheers

On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com wrote: I saw a note earlier, perhaps on the user list, that at least one person is using Hive 0.13. Anyone got a working build configuration for this version of Hive? Regards, - Steve

-- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: Fraud management system implementation
This sounds more like a user list (https://spark.apache.org/community.html) question. This is the dev list, where people discuss things related to contributing code to Spark.

On Mon, Jul 28, 2014 at 10:15 AM, jitendra shelar jitendra.shelar...@gmail.com wrote: Hi, I am new to spark. I am learning spark and scala. I had some queries.
1) Can somebody please tell me if it is possible to implement a credit card fraud management system using spark?
2) If yes, can somebody please guide me on how to proceed.
3) Shall I prefer Scala or Java for this implementation?
4) Please suggest some pointers related to Hidden Markov Models (HMM) and anomaly detection in data mining (using spark).
Thanks, Jitendra
Re: Working Formula for Hive 0.13?
Yes, it is published. As of previous versions, at least, hive-exec included all of its dependencies *in its artifact*, making it unusable as-is because it contained copies of dependencies that clash with versions present in other artifacts and can't be managed with Maven mechanisms. I am not sure why hive-exec was not published normally, with just its own classes. That's why it was copied into an artifact with just hive-exec code. You could do the same thing for hive-exec 0.13.1. Or maybe someone knows that it's published more 'normally' now.

I don't think hive-metastore is related to this question? I am no expert on the Hive artifacts, just remembering what the issue was initially in case it helps you get to a similar solution.

On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote: hive-exec (as of 0.13.1) is published here: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar Should a JIRA be opened so that the dependency on hive-metastore can be replaced by a dependency on hive-exec? Cheers

On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com wrote: The reason for org.spark-project.hive is that Spark relies on hive-exec, but the Hive project does not publish this artifact by itself, only with all its dependencies as an uber jar. Maybe that's been improved. If so, you need to point at the new hive-exec and perhaps sort out its dependencies manually in your build.

On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote: I found 0.13.1 artifacts in maven: http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar However, Spark uses a groupId of org.spark-project.hive, not org.apache.hive. Can someone tell me how it is supposed to work? Cheers

On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com wrote: I saw a note earlier, perhaps on the user list, that at least one person is using Hive 0.13. Anyone got a working build configuration for this version of Hive?
Regards, - Steve
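Concretely, the fork Sean describes is why Spark's own pom (as of the 1.0.x line) pulls Hive 0.12 under the org.spark-project.hive groupId, roughly along these lines (illustrative fragment; check the pom in your branch for the exact coordinates):

```xml
<!-- Re-published hive-exec containing only Hive's own classes,
     instead of the upstream uber jar with bundled dependencies -->
<dependency>
  <groupId>org.spark-project.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.12.0</version>
</dependency>
```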
Re: Working Formula for Hive 0.13?
Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still an uber jar. Right now I am facing the following error building against Hive 0.13.1:

[ERROR] Failed to execute goal on project spark-hive_2.10: Could not resolve dependencies for project org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following artifacts could not be resolved: org.spark-project.hive:hive-metastore:jar:0.13.1, org.spark-project.hive:hive-exec:jar:0.13.1, org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find org.spark-project.hive:hive-metastore:jar:0.13.1 in http://repo.maven.apache.org/maven2 was cached in the local repository, resolution will not be reattempted until the update interval of maven-repo has elapsed or updates are forced - [Help 1]

Some hint would be appreciated. Cheers
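One local workaround for that resolution failure, assuming you have re-packaged the 0.13.1 jars yourself (no such artifacts are published under org.spark-project.hive), is to install them into the local repository under the coordinates the build expects. The jar path below is hypothetical:

```shell
# Install a locally built, re-packaged hive-exec jar under the
# org.spark-project.hive coordinates that spark-hive_2.10 asks for
mvn install:install-file \
  -DgroupId=org.spark-project.hive \
  -DartifactId=hive-exec \
  -Dversion=0.13.1 \
  -Dpackaging=jar \
  -Dfile=/path/to/hive-exec-0.13.1-repackaged.jar
```

The same goal would need to be repeated for hive-metastore and hive-serde, since all three are listed as unresolved.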
Re: Working Formula for Hive 0.13?
It would be great if the hive team can fix that issue. If not, we'll have to continue forking our own version of Hive to change the way it publishes artifacts. - Patrick
Re: Working Formula for Hive 0.13?
Owen helped me find this: https://issues.apache.org/jira/browse/HIVE-7423

I guess this means that for Hive 0.14, Spark should be able to directly pull in hive-exec-core.jar. Cheers
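If HIVE-7423 lands as described, the slimmed-down jar would presumably be addressable via a Maven classifier, so a future build might depend on it directly. This is a sketch only; the exact classifier name and version are unverified:

```xml
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-exec</artifactId>
  <version>0.14.0</version>
  <!-- hypothetical 'core' classifier: hive-exec without bundled deps -->
  <classifier>core</classifier>
</dependency>
```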
Re: Working Formula for Hive 0.13?
Where and how is that fork being maintained? I'm not seeing an obviously correct branch or tag in the main ASF hive repo github mirror.
Re: Working Formula for Hive 0.13?
Yeah so we need a model for this (Mark - do you have any ideas?). I did this in a personal github repo. I just did it quickly because dependency issues were blocking the 1.0 release: https://github.com/pwendell/hive/tree/branch-0.12-shaded-protobuf

I think what we want is to have a semi-official github repo with an index of each of the shaded dependencies and what version is included in which branch. - Patrick
Re: Working Formula for Hive 0.13?
AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL against Hive 0.13 (or 0.13.1?) successfully. Basically they also re-packaged Hive 0.13, as the Spark team did. The slides of the talk haven't been released yet, though.
Re: Working Formula for Hive 0.13?
I've heard from Cloudera that there were hive internal changes between 0.12 and 0.13 that required code re-writing. Over time it might be possible for us to integrate with hive using API's that are more stable (this is the domain of Michael/Cheng/Yin more than me!). It would be interesting to see what the Hulu folks did. - Patrick

On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com wrote: …
Re: Working Formula for Hive 0.13?
Getting and maintaining our own branch in the main asf hive repo is a non-starter or isn't workable?

On Mon, Jul 28, 2014 at 10:17 AM, Patrick Wendell pwend...@gmail.com wrote: Yeah so we need a model for this (Mark - do you have any ideas?). I did this in a personal github repo. I just did it quickly because dependency issues were blocking the 1.0 release: https://github.com/pwendell/hive/tree/branch-0.12-shaded-protobuf I think what we want is to have a semi-official github repo with an index to each of the shaded dependencies and what version is included in which branch. - Patrick

On Mon, Jul 28, 2014 at 10:02 AM, Mark Hamstra m...@clearstorydata.com wrote: Where and how is that fork being maintained? I'm not seeing an obviously correct branch or tag in the main asf hive repo github mirror.

On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote: …
Re: Working Formula for Hive 0.13?
Exactly, forgot to mention the Hulu team also made changes to cope with those incompatibility issues, but they said that's relatively easy once the re-packaging work is done.

On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com wrote: …
Re: VertexPartition and ShippableVertexPartition
On Mon, Jul 28, 2014 at 4:29 AM, Larry Xiao xia...@sjtu.edu.cn wrote:

On 7/28/14, 3:41 PM, shijiaxin wrote: There is a VertexPartition in the EdgePartition, which is created by EdgePartitionBuilder.toEdgePartition, and there is also a ShippableVertexPartition in the VertexRDD. These two partitions have a lot of common things like index, data and BitSet; why is this necessary?

Is the VertexPartition in the EdgePartition the mirror cache part?

Yes, exactly. The primary copy of each vertex is stored in the VertexRDD using the index, values, and mask data structures, which together form a hash map. In addition, each partition of the VertexRDD stores the corresponding partition of the routing table to facilitate joining with the edges. The ShippableVertexPartition class encapsulates the vertex hash map along with a RoutingTablePartition. After joining the vertices with the edges, the edge partitions cache their adjacent vertices in the mirror cache. They use the VertexPartition for this, which provides only the hash map functionality and not the routing table.

Ankur http://www.ankurdave.com/
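Ankur's description of the index/values/mask trio forming a hash map can be pictured with a toy sketch. This is not GraphX's actual implementation (GraphX uses specialized open hash maps, and every name below is invented); a plain HashMap stands in for the index, but it shows how the three structures cooperate and how the mask lets a filtered view hide vertices without rebuilding the index:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for a GraphX-style vertex partition: the index maps vertex ids
// to slots, a parallel values array holds vertex attributes, and a BitSet
// mask records which slots are currently active.
class VertexPartitionSketch {
    private final Map<Long, Integer> index = new HashMap<>(); // vertex id -> slot
    private final double[] values;                            // attribute per slot
    private final BitSet mask;                                // active slots

    VertexPartitionSketch(long[] ids, double[] attrs) {
        values = attrs.clone();
        mask = new BitSet(ids.length);
        for (int i = 0; i < ids.length; i++) {
            index.put(ids[i], i);
            mask.set(i);
        }
    }

    // Look up an attribute: consult the index, then check the mask, so a
    // filtered partition can hide entries while sharing index and values.
    Double get(long vertexId) {
        Integer slot = index.get(vertexId);
        if (slot == null || !mask.get(slot)) {
            return null;
        }
        return values[slot];
    }

    // Hide a vertex by clearing its mask bit; index and values are untouched.
    void deactivate(long vertexId) {
        Integer slot = index.get(vertexId);
        if (slot != null) {
            mask.clear(slot);
        }
    }
}
```

The mask is what makes operations like filtering and joins cheap: derived partitions reuse the same index and values arrays and only flip bits.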
'Proper' Build Tool
Gents, It seems that until recently, building via sbt was a documented process in the 0.9 overview: http://spark.apache.org/docs/0.9.0/ The section on building mentions using sbt/sbt assembly. However, in the latest overview: http://spark.apache.org/docs/latest/index.html there's no mention of building with sbt. What's the recommended way to build? What are most people using in their daily workflow? Cheers, - SteveN
Re: 'Proper' Build Tool
Hi Steve, I had the opportunity to ask this question at the Summit to Andrew Orr. He mentioned that with 1.0 the recommended build tool is Maven; sbt is, however, still supported. You will notice that the dependencies are now completely handled within the Maven pom.xml: SparkBuild.scala/sbt reads the dependencies from the pom.xml. Andrew further suggested looking at make-distribution.sh to see the recommended way to create builds. Using mvn on the command line is fine, but the aforementioned script provides a framework/guideline to set things up properly.

2014-07-28 13:06 GMT-07:00 Steve Nunez snu...@hortonworks.com: …
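For quick reference, the build entry points discussed in this thread look roughly like this in a 1.0-era Spark checkout (flags shown are illustrative, not exhaustive):

```shell
# Packager-oriented build: wraps Maven and produces a distribution tarball
./make-distribution.sh --tgz

# Direct Maven build
mvn -DskipTests clean package

# sbt path (still supported for day-to-day development)
sbt/sbt assembly
```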
Re: Can I translate the documentations of Spark in Japanese?
Hi Yu, I could help translate the Spark documentation into Japanese. Please let me know if you need it. Best, Ken

On Mon, Jul 28, 2014 at 1:03 AM, Yu Ishikawa [via Apache Spark Developers List] ml-node+s1001551n7546...@n3.nabble.com wrote: Hello Patrick, Thank you for your reply. I checked some other projects in terms of i18n of documentation. For example, the documentation of the Apache HTTP server project supports i18n natively: https://github.com/apache/httpd/blob/trunk/docs%2Fmanual%2Findex.html But it seems that the Chinese documentation of Apache HBase is only linked from the top page of HBase. From: http://hbase.apache.org/ To: http://abloz.com/hbase/book.html I think that it is currently difficult to support i18n in the Apache Spark documentation. I suggest that I translate the documentation into Japanese on a GitHub page unofficially. If possible, would you please link from the translated documentation to the Apache Spark documentation. Regards, Yu

-- Kenichi Takagiwa - Keio University, Graduate School of Science and Technology, Department of Open and Environmental Systems, Faculty of Computer Science, Hiroaki Nishi Laboratory. Email: ugw.gi.wo...@gmail.com
Re: 'Proper' Build Tool
Yeah, for packagers we officially recommend using Maven. Spark's dependency graph is very complicated, and Maven and SBT use different conflict-resolution strategies, so we've opted to officially support Maven. SBT is still around, though, and it's used more often by day-to-day developers. - Patrick
Re: package/assemble with local spark
You can use publish-local in sbt. If you want to be more careful, you can give Spark a different version number and use that version number in your app.

On Mon, Jul 28, 2014 at 4:33 AM, Larry Xiao xia...@sjtu.edu.cn wrote: Hi, How do you package an app with a modified Spark? It seems sbt would resolve the dependencies and use the official Spark release. Thank you! Larry
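Concretely, the suggestion above might look like this: change the version in the Spark build, run `sbt/sbt publish-local`, then depend on that version from the app. The project name and the version string "1.1.0-custom" below are made up for illustration:

```scala
// Application build.sbt, assuming a locally modified Spark was published
// with `sbt publish-local` under the invented version "1.1.0-custom".
name := "my-spark-app"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0-custom"
```

Using a distinct version number ensures sbt resolves the locally published artifacts rather than the official release from Maven Central.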
Re: Working Formula for Hive 0.13?
After manually copying the hive 0.13.1 jars to the local maven repo, I got the following errors when building the spark-hive_2.10 module:

[ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: type mismatch; found: String required: Array[String]
[ERROR] val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf)
[ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60: value getAllPartitionsForPruner is not a member of org.apache.hadoop.hive.ql.metadata.Hive
[ERROR] client.getAllPartitionsForPruner(table).toSeq
[ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267: overloaded method constructor TableDesc with alternatives: (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2: Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc and ()org.apache.hadoop.hive.ql.plan.TableDesc cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer], Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in value tableDesc)(in value tableDesc)], java.util.Properties)
[ERROR] val tableDesc = new TableDesc(
[WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing with a stub.
[WARNING] Class org.antlr.runtime.Token not found - continuing with a stub.
[WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a stub.
[ERROR] while compiling: /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala during phase: typer library version: version 2.10.4 compiler version: version 2.10.4

The above shows incompatible changes between 0.12 and 0.13.1; e.g. the first error corresponds to the following method in CommandProcessorFactory: public static CommandProcessor get(String[] cmd, HiveConf conf) Cheers

On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com wrote: So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding the hive-exec jar to the spark-project repo? It doesn't look like there's a release date schedule for 0.14.

On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote: …
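The "manually copying hive 0.13.1 jars to local maven repo" step Ted mentions can be scripted with Maven's install-file goal. The coordinates below mirror the unresolved artifacts from the earlier resolution error; the jar file paths are hypothetical:

```shell
# Hypothetical jar paths; one invocation per missing artifact.
mvn install:install-file -Dfile=hive-exec-0.13.1.jar \
  -DgroupId=org.spark-project.hive -DartifactId=hive-exec \
  -Dversion=0.13.1 -Dpackaging=jar

mvn install:install-file -Dfile=hive-metastore-0.13.1.jar \
  -DgroupId=org.spark-project.hive -DartifactId=hive-metastore \
  -Dversion=0.13.1 -Dpackaging=jar
```

Note this only makes the artifacts resolvable; as the compile errors above show, the 0.12-to-0.13.1 API changes still have to be dealt with in the code.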
Re: Working Formula for Hive 0.13?
I was looking for a class where reflection-related code should reside. I found this but don't think it is the proper class for bridging differences between hive 0.12 and 0.13.1: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala Cheers

On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote: …
Re: Working Formula for Hive 0.13?
A few things: - When we upgrade to Hive 0.13.0, Patrick will likely republish the hive-exec jar just as we did for 0.12.0 - Since we have to tie into some pretty low level APIs it is unsurprising that the code doesn't just compile out of the box against 0.13.0 - ScalaReflection is for determining Schema from Scala classes, not reflection-based bridge code. Either way, it's unclear to me if there is any reason to use reflection to support multiple versions, instead of just upgrading to Hive 0.13.0 One question I have is: what is the goal of upgrading to Hive 0.13.0? Is it purely because you are having problems connecting to newer metastores? Are there some features you are hoping for? This will help me prioritize this effort. Michael On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote: I was looking for a class where reflection-related code should reside. I found this but don't think it is the proper class for bridging differences between hive 0.12 and 0.13.1: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala Cheers On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote: After manually copying hive 0.13.1 jars to the local maven repo, I got the following errors when building the spark-hive_2.10 module: [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182: type mismatch; found : String required: Array[String] [ERROR] val proc: CommandProcessor = CommandProcessorFactory.get(tokens(0), hiveconf) [ERROR] ^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60: value getAllPartitionsForPruner is not a member of org.apache.hadoop.hive.ql.metadata.Hive [ERROR] client.getAllPartitionsForPruner(table).toSeq [ERROR] ^ [ERROR] /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267: overloaded method constructor TableDesc with alternatives: (x$1: Class[_ : org.apache.hadoop.mapred.InputFormat[_, _]],x$2: Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc and ()org.apache.hadoop.hive.ql.plan.TableDesc cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer], Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in value tableDesc)(in value tableDesc)], java.util.Properties) [ERROR] val tableDesc = new TableDesc( [ERROR] ^ [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing with a stub. [WARNING] Class org.antlr.runtime.Token not found - continuing with a stub. [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a stub. [ERROR] while compiling: /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala during phase: typer library version: version 2.10.4 compiler version: version 2.10.4 The above shows incompatible changes between 0.12 and 0.13.1, e.g. the first error corresponds to the following method in CommandProcessorFactory: public static CommandProcessor get(String[] cmd, HiveConf conf) Cheers On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com wrote: So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding the hive-exec jar to the spark-project repo? It doesn't look like there's a release date schedule for 0.14. On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote: Exactly, forgot to mention that the Hulu team also made changes to cope with those incompatibility issues, but they said that's relatively easy once the re-packaging work is done.
On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com wrote: I've heard from Cloudera that there were hive internal changes between 0.12 and 0.13 that required code re-writing. Over time it might be possible for us to integrate with hive using APIs that are more stable (this is the domain of Michael/Cheng/Yin more than me!). It would be interesting to see what the Hulu folks did. - Patrick On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com wrote: AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL against Hive 0.13 (or 0.13.1?) successfully. Basically they also re-packaged Hive 0.13 as the Spark team did. The slides of the talk haven't been released yet though. On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote: Owen helped me find this: https://issues.apache.org/jira/browse/HIVE-7423 I guess this means that for Hive 0.14, Spark should be able to directly pull in hive-exec-core.jar Cheers On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote: It would be great if the hive team can fix
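The reflection-based bridging debated in this thread can be sketched in plain Java: probe for whichever get(...) signature the classpath provides and dispatch accordingly. The Factory class below is a hypothetical stand-in for Hive's CommandProcessorFactory (only a 0.13-style String[] overload is defined, so the sketch is self-contained); the probing technique, not the class, is the point.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for Hive's CommandProcessorFactory: Hive 0.12
// exposed get(String, ...), while Hive 0.13 changed it to get(String[], ...).
class Factory {
    public static String get(String[] cmd) { return "0.13-style:" + cmd[0]; }
}

public class HiveShim {
    // Calls whichever get(...) overload the class on the classpath provides,
    // falling back to the newer signature when the older one is absent.
    static String invokeGet(String token) throws Exception {
        Method m;
        try {
            m = Factory.class.getMethod("get", String.class);   // 0.12-style
            return (String) m.invoke(null, token);
        } catch (NoSuchMethodException e) {
            m = Factory.class.getMethod("get", String[].class); // 0.13-style
            // Cast to Object so the array is one argument, not varargs.
            return (String) m.invoke(null, (Object) new String[] { token });
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokeGet("select")); // prints 0.13-style:select
    }
}
```

If both Hive versions must be supported from one build, the Method lookup can be done once and cached in a shim object rather than probed on every call.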
Re: Working Formula for Hive 0.13?
The larger goal is to get a clean compile and test run in the environment I have to use. As near as I can tell, tests fail in parquet because parquet was only added in Hive 0.13. There could well be issues in later meta-stores, but one thing at a time... - SteveN On 7/28/14, 17:22, Michael Armbrust mich...@databricks.com wrote: A few things: - When we upgrade to Hive 0.13.0, Patrick will likely republish the hive-exec jar just as we did for 0.12.0 - Since we have to tie into some pretty low level APIs it is unsurprising that the code doesn't just compile out of the box against 0.13.0 - ScalaReflection is for determining Schema from Scala classes, not reflection-based bridge code. Either way, it's unclear to me if there is any reason to use reflection to support multiple versions, instead of just upgrading to Hive 0.13.0 One question I have is: what is the goal of upgrading to Hive 0.13.0? Is it purely because you are having problems connecting to newer metastores? Are there some features you are hoping for? This will help me prioritize this effort. Michael On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote: I was looking for a class where reflection-related code should reside.
Github mirroring is running behind
https://issues.apache.org/jira/browse/INFRA-8116 Just a heads up, the github mirroring is running behind. You can follow that JIRA to keep up to date on the fix. In the meantime you can use the Apache git itself: https://git-wip-us.apache.org/repos/asf/spark.git Some people have reported issues checking out Apache git as well, but it might work. - Patrick
Re: Working Formula for Hive 0.13?
bq. Either way it's unclear to me if there is any reason to use reflection to support multiple versions, instead of just upgrading to Hive 0.13.0 In which Spark release would this Hive upgrade take place? I agree it is cleaner to upgrade the Hive dependency vs. introducing reflection. Cheers On Mon, Jul 28, 2014 at 5:22 PM, Michael Armbrust mich...@databricks.com wrote: A few things: - When we upgrade to Hive 0.13.0, Patrick will likely republish the hive-exec jar just as we did for 0.12.0 - Since we have to tie into some pretty low level APIs it is unsurprising that the code doesn't just compile out of the box against 0.13.0 - ScalaReflection is for determining Schema from Scala classes, not reflection-based bridge code. Either way, it's unclear to me if there is any reason to use reflection to support multiple versions, instead of just upgrading to Hive 0.13.0 One question I have is: what is the goal of upgrading to Hive 0.13.0? Is it purely because you are having problems connecting to newer metastores? Are there some features you are hoping for? This will help me prioritize this effort. Michael On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote: I was looking for a class where reflection-related code should reside.
Re: [VOTE] Release Apache Spark 1.0.2 (RC1)
+1 Tested on standalone and yarn clusters 2014-07-28 14:59 GMT-07:00 Tathagata Das tathagata.das1...@gmail.com: Let me add my vote as well. Did some basic tests by running simple projects with various Spark modules. Tested checksums. +1 On Sun, Jul 27, 2014 at 4:52 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +1 Tested this on Mac OS X. Matei On Jul 25, 2014, at 4:08 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.2. This release fixes a number of bugs in Spark 1.0.1. Some of the notable ones are - SPARK-2452: Known issue in Spark 1.0.1 caused by attempted fix for SPARK-1199. The fix was reverted for 1.0.2. - SPARK-2576: NoClassDefFoundError when executing Spark QL query on HDFS CSV file. The full list is at http://s.apache.org/9NJ The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f The release files, including signatures, digests, etc can be found at: http://people.apache.org/~tdas/spark-1.0.2-rc1/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/tdas.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1024/ The documentation corresponding to this release can be found at: http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/ Please vote on releasing this package as Apache Spark 1.0.2! The vote is open until Tuesday, July 29, at 23:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.0.2 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/
Re: on shark, is tachyon less efficient than memory_only cache strategy ?
Hi Haoyuan, thanks for replying. 2014-07-21 16:29 GMT+08:00 Haoyuan Li haoyuan...@gmail.com: Qingyang, Aha. Got it. 800MB data is pretty small. Loading from Tachyon does have a bit of extra overhead. But it will have more benefit when the data size is larger. Also, if you store the table in Tachyon, you can have different shark servers to query the data at the same time. For more trade-off, please refer to this page: http://tachyon-project.org/Running-Shark-on-Tachyon.html Best, Haoyuan On Wed, Jul 16, 2014 at 12:06 AM, qingyang li liqingyang1...@gmail.com wrote: Let me describe my setup: I have 8 machines (24 cores, 16G memory per machine) running a Spark cluster and a Tachyon cluster. On Tachyon, I created one table containing 800M of data; when I run a SQL query on Shark it takes 2.43s, but when I create the same table in Spark memory and run the same SQL, it takes 1.56s. Data on Tachyon costs more time than data in Spark memory. Both have 150 map processes, 16-20 map processes per node. I think the reason is that when data is on Tachyon, Shark lets the Spark slave load data from the Tachyon slave on the same node. I have tried tuning some Shark and Tachyon configuration, but still cannot make the former faster than 2.43s. Does anyone have some ideas? By the way, my Tachyon block size is 1GB now and I want to change it. Will it work to set tachyon.user.default.block.size.byte=8M? If not, what does tachyon.user.default.block.size.byte mean? 2014-07-14 13:13 GMT+08:00 qingyang li liqingyang1...@gmail.com: Shark, thanks for replying. Let me clarify my question again. I create a table using: create table xxx1 tblproperties(shark.cache=tachyon) as select * from xxx2. When executing some SQL (for example, select * from xxx1) using Shark, Shark will read data into Shark's memory from Tachyon's memory. I think that if Shark always loads data from Tachyon each time we execute SQL, it is less efficient. 
Could we use some cache policy (such as CacheAllPolicy, FIFOCachePolicy, LRUCachePolicy) to cache data and avoid reading from Tachyon for each SQL query? 2014-07-14 2:47 GMT+08:00 Haoyuan Li haoyuan...@gmail.com: Qingyang, Are you asking Spark or Shark (The first email was Shark, the last email was Spark.)? Best, Haoyuan On Wed, Jul 9, 2014 at 7:40 PM, qingyang li liqingyang1...@gmail.com wrote: Could I set some cache policy to let Spark load data from Tachyon only once for all SQL queries? For example, by using CacheAllPolicy, FIFOCachePolicy, or LRUCachePolicy. I have tried those three policies, but they did not help. I think that if Spark always loads data for each SQL query, it will impact query speed; it will take more time than when the data are managed by Spark itself. 2014-07-09 1:19 GMT+08:00 Haoyuan Li haoyuan...@gmail.com: Yes. For Shark, two modes, shark.cache=tachyon and shark.cache=memory, have the same ser/de overhead. Shark loads data from outside of the process in Tachyon mode with the following benefits: - In-memory data sharing across multiple Shark instances (i.e. stronger isolation) - Instant recovery of in-memory tables - Reduced heap size = faster GC in Shark - If the table is larger than the memory size, only the hot columns will be cached in memory from http://tachyon-project.org/master/Running-Shark-on-Tachyon.html and https://github.com/amplab/shark/wiki/Running-Shark-with-Tachyon Haoyuan On Tue, Jul 8, 2014 at 9:58 AM, Aaron Davidson ilike...@gmail.com wrote: Shark's in-memory format is already serialized (it's compressed and column-based). On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan mri...@gmail.com wrote: You are ignoring serde costs :-) - Mridul On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson ilike...@gmail.com wrote: Tachyon should only be marginally less performant than memory_only, because we mmap the data from Tachyon's ramdisk. 
We do not have to, say, transfer the data over a pipe from Tachyon; we can directly read from the buffers in the same way that Shark reads from its in-memory columnar format. On Tue, Jul 8, 2014 at 1:18 AM, qingyang li liqingyang1...@gmail.com wrote: Hi, when I create a table, I can specify the cache strategy using shark.cache. I think
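On the block-size question in this thread: the .byte suffix in the property name suggests it takes a raw byte count rather than a suffixed size like 8M, so 8 MB would be written out in full. A sketch only; how the property is supplied (a properties file vs. a -D option in tachyon-env.sh) depends on the Tachyon version, so treat the mechanism as an assumption:

```
# Sketch: 8 MB block size expressed as a plain byte count (8 * 1024 * 1024);
# property name taken from the thread above, value format assumed to be bytes.
tachyon.user.default.block.size.byte=8388608
```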
Re: [VOTE] Release Apache Spark 1.0.2 (RC1)
+1 (non-binding) Tested this on Mac OS X. On Mon, Jul 28, 2014 at 6:52 PM, Andrew Or and...@databricks.com wrote: +1 Tested on standalone and yarn clusters
Re: [VOTE] Release Apache Spark 1.0.2 (RC1)
+1 Tested basic spark-shell and pyspark operations and MLlib examples on a Mac. On Mon, Jul 28, 2014 at 8:29 PM, Mubarak Seyed spark.devu...@gmail.com wrote: +1 (non-binding) Tested this on Mac OS X.
Re: [VOTE] Release Apache Spark 1.0.2 (RC1)
NOTICE and LICENSE files look good Hashes and sigs look good No executable in the source distribution Compile source and run standalone +1 - Henry On Fri, Jul 25, 2014 at 4:08 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.0.2.
Re: Github mirroring is running behind
Hi devs, I don't know if this is going to help, but if you can watch and vote on the ticket, it might help ASF INFRA prioritize and triage it faster: https://issues.apache.org/jira/browse/INFRA-8116 Please do. Thanks! On Mon, Jul 28, 2014 at 5:41 PM, Patrick Wendell pwend...@gmail.com wrote: https://issues.apache.org/jira/browse/INFRA-8116 Just a heads up, the github mirroring is running behind. You can follow that JIRA to keep up to date on the fix. In the meantime you can use the Apache git itself: https://git-wip-us.apache.org/repos/asf/spark.git Some people have reported issues checking out Apache git as well, but it might work. - Patrick