Re: new JDBC server test cases seems failed ?

2014-07-28 Thread Cheng Lian
Noticed that Nan’s PR is not related to SQL, but the JDBC test suites got 
executed. Then I checked the PRs of all those Jenkins builds that failed because of 
the JDBC suites, and it turns out that none of them touched SQL code. The JDBC 
code is only included in the assembly jar when the hive-thriftserver build 
profile is enabled. So it seems that the root cause is a Maven build 
change that makes the JDBC suites always get executed; they fail because the JDBC 
code isn't included in the assembly jar. This also explains why I can’t 
reproduce it locally (I always enable the hive-thriftserver profile), and why, 
once the build fails, all the JDBC suites fail together.

Working on a patch to fix this. Thanks to Patrick for helping debug this!
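
For reference, a build with the profile enabled should include the JDBC code
in the assembly jar, e.g. something like

    mvn -Phive -Phive-thriftserver -DskipTests package

(just a sketch; the exact Jenkins invocation may differ).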

On Jul 28, 2014, at 10:07 AM, Cheng Lian l...@databricks.com wrote:

 I’m looking into this, will fix this ASAP, sorry for the inconvenience.
 
 On Jul 28, 2014, at 9:47 AM, Patrick Wendell pwend...@gmail.com wrote:
 
 I'm going to revert it again - Cheng can you try to look into this? Thanks.
 
 On Sun, Jul 27, 2014 at 6:06 PM, Nan Zhu zhunanmcg...@gmail.com wrote:
 it was 20 minutes ago
 
 https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17259/consoleFull
 
 --
 Nan Zhu
 
 
 On Sunday, July 27, 2014 at 8:53 PM, Michael Armbrust wrote:
 
 How recent is this? We've already reverted this patch once due to failing
 tests. It would be helpful to include a link to the failed build. If it's
 failing again we'll have to revert again.
 
 
 On Sun, Jul 27, 2014 at 5:26 PM, Nan Zhu zhunanmcg...@gmail.com 
 (mailto:zhunanmcg...@gmail.com) wrote:
 
 Hi, all
 
 It seems that the JDBC test cases are failing unexpectedly in Jenkins?
 
 
 [info] - test query execution against a Hive Thrift server *** FAILED ***
 [info]   java.sql.SQLException: Could not open connection to
 [info]   jdbc:hive2://localhost:45518/: java.net.ConnectException: Connection refused
 [info]   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:146)
 [info]   at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:123)
 [info]   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
 [info]   at java.sql.DriverManager.getConnection(DriverManager.java:571)
 [info]   at java.sql.DriverManager.getConnection(DriverManager.java:215)
 [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.getConnection(HiveThriftServer2Suite.scala:131)
 [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.createStatement(HiveThriftServer2Suite.scala:134)
 [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite$$anonfun$1.apply$mcV$sp(HiveThriftServer2Suite.scala:110)
 [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite$$anonfun$1.apply(HiveThriftServer2Suite.scala:107)
 [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite$$anonfun$1.apply(HiveThriftServer2Suite.scala:107)
 [info]   ...
 [info]   Cause: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
 [info]   at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
 [info]   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248)
 [info]   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
 [info]   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:144)
 [info]   at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:123)
 [info]   at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
 [info]   at java.sql.DriverManager.getConnection(DriverManager.java:571)
 [info]   at java.sql.DriverManager.getConnection(DriverManager.java:215)
 [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.getConnection(HiveThriftServer2Suite.scala:131)
 [info]   at org.apache.spark.sql.hive.thriftserver.HiveThriftServer2Suite.createStatement(HiveThriftServer2Suite.scala:134)
 [info]   ...
 [info]   Cause: java.net.ConnectException: Connection refused
 [info]   at java.net.PlainSocketImpl.socketConnect(Native Method)
 [info]   at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
 [info]   at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
 [info]   at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
 [info]   at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 [info]   at java.net.Socket.connect(Socket.java:579)
 [info]   at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
 [info]   at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:248)
 [info]   at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
 [info]   at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:144)
 [info]   ...
 [info] CliSuite:
 Executing: create table hive_test1(key int, val string);, expecting output: OK
 [warn] four warnings found
 [warn] Note:
 

package/assemble with local spark

2014-07-28 Thread Larry Xiao

Hi,

How do you package an app with a modified Spark?

It seems sbt would resolve the dependencies and use the official Spark
release.


Thank you!

Larry


Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
I found 0.13.1 artifacts in maven:
http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar

However, Spark uses groupId of org.spark-project.hive, not org.apache.hive

Can someone tell me how it is supposed to work ?
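
(In sbt coordinates, the difference is between the artifact that exists on
Central:

    "org.apache.hive" % "hive-metastore" % "0.13.1"

and what Spark's build references, which as far as I can tell is currently
something like:

    "org.spark-project.hive" % "hive-metastore" % "0.12.0"

where the 0.12.0 version is my reading of the pom, i.e. an assumption.)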

Cheers


On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com wrote:

 I saw a note earlier, perhaps on the user list, that at least one person is
 using Hive 0.13. Anyone got a working build configuration for this version
 of Hive?

 Regards,
 - Steve






Re: Fraud management system implementation

2014-07-28 Thread Nicholas Chammas
This sounds more like a user list https://spark.apache.org/community.html
question. This is the dev list, where people discuss things related to
contributing code and such to Spark.


On Mon, Jul 28, 2014 at 10:15 AM, jitendra shelar 
jitendra.shelar...@gmail.com wrote:

 Hi,

 I am new to spark. I am learning spark and scala.

 I had some queries.

 1) Can somebody please tell me if it is possible to implement a credit
 card fraud management system using spark?
 2) If yes, can somebody please guide me on how to proceed?
 3) Should I prefer Scala or Java for this implementation?

 4) Please suggest some pointers related to the Hidden Markov Model
 (HMM) and anomaly detection in data mining (using spark).

 Thanks,
 Jitendra



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Sean Owen
Yes, it is published. As of previous versions, at least, hive-exec
included all of its dependencies *in its artifact*, making it unusable
as-is because it contained copies of dependencies that clash with
versions present in other artifacts, and can't be managed with Maven
mechanisms.

I am not sure why hive-exec was not published normally, with just its
own classes. That's why it was copied, into an artifact with just
hive-exec code.

You could do the same thing for hive-exec 0.13.1.
Or maybe someone knows that it's published more 'normally' now.
I don't think hive-metastore is related to this question?

I am no expert on the Hive artifacts, just remembering what the issue
was initially in case it helps you get to a similar solution.
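
To make the clash concrete, in an sbt build something like

    // sketch only: this exclude has no effect on the bundled copies, because
    // the protobuf classes live *inside* the hive-exec jar rather than being
    // declared as a separate, excludable artifact
    libraryDependencies +=
      ("org.apache.hive" % "hive-exec" % "0.13.1")
        .exclude("com.google.protobuf", "protobuf-java")

cannot remove the conflicting classes (protobuf is one example of a clashing
dependency). That is why a re-published, cleaned-up artifact was needed.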

On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
 hive-exec (as of 0.13.1) is published here:
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar

 Should a JIRA be opened so that dependency on hive-metastore can be
 replaced by dependency on hive-exec ?

 Cheers


 On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com wrote:

 The reason for org.spark-project.hive is that Spark relies on
 hive-exec, but the Hive project does not publish this artifact by
 itself, only with all its dependencies as an uber jar. Maybe that's
 been improved. If so, you need to point at the new hive-exec and
 perhaps sort out its dependencies manually in your build.

 On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote:
  I found 0.13.1 artifacts in maven:
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
 
  However, Spark uses groupId of org.spark-project.hive, not
 org.apache.hive
 
  Can someone tell me how it is supposed to work ?
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com
 wrote:
 
  I saw a note earlier, perhaps on the user list, that at least one
 person is
  using Hive 0.13. Anyone got a working build configuration for this
 version
  of Hive?
 
  Regards,
  - Steve
 
 
 
 



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still
uber jar.

Right now I am facing the following error building against Hive 0.13.1 :

[ERROR] Failed to execute goal on project spark-hive_2.10: Could not
resolve dependencies for project
org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
artifacts could not be resolved:
org.spark-project.hive:hive-metastore:jar:0.13.1,
org.spark-project.hive:hive-exec:jar:0.13.1,
org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
org.spark-project.hive:hive-metastore:jar:0.13.1 in
http://repo.maven.apache.org/maven2 was cached in the local repository,
resolution will not be reattempted until the update interval of maven-repo
has elapsed or updates are forced - [Help 1]
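
(I assume forcing a re-check, e.g. mvn -U or deleting
~/.m2/repository/org/spark-project/hive, would only get past the cached
failure; the 0.13.1 artifacts don't actually exist under that groupId, so
resolution would still fail.)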

Some hint would be appreciated.

Cheers


On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote:

 Yes, it is published. As of previous versions, at least, hive-exec
 included all of its dependencies *in its artifact*, making it unusable
 as-is because it contained copies of dependencies that clash with
 versions present in other artifacts, and can't be managed with Maven
 mechanisms.

 I am not sure why hive-exec was not published normally, with just its
 own classes. That's why it was copied, into an artifact with just
 hive-exec code.

 You could do the same thing for hive-exec 0.13.1.
 Or maybe someone knows that it's published more 'normally' now.
 I don't think hive-metastore is related to this question?

 I am no expert on the Hive artifacts, just remembering what the issue
 was initially in case it helps you get to a similar solution.

 On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
  hive-exec (as of 0.13.1) is published here:
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
 
  Should a JIRA be opened so that dependency on hive-metastore can be
  replaced by dependency on hive-exec ?
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com wrote:
 
  The reason for org.spark-project.hive is that Spark relies on
  hive-exec, but the Hive project does not publish this artifact by
  itself, only with all its dependencies as an uber jar. Maybe that's
  been improved. If so, you need to point at the new hive-exec and
  perhaps sort out its dependencies manually in your build.
 
  On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote:
   I found 0.13.1 artifacts in maven:
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
  
   However, Spark uses groupId of org.spark-project.hive, not
  org.apache.hive
  
   Can someone tell me how it is supposed to work ?
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com
  wrote:
  
   I saw a note earlier, perhaps on the user list, that at least one
  person is
   using Hive 0.13. Anyone got a working build configuration for this
  version
   of Hive?
  
   Regards,
   - Steve
  
  
  
  
 



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
It would be great if the hive team can fix that issue. If not, we'll
have to continue forking our own version of Hive to change the way it
publishes artifacts.

- Patrick

On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
 Talked with Owen offline. He confirmed that as of 0.13, hive-exec is still
 uber jar.

 Right now I am facing the following error building against Hive 0.13.1 :

 [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
 resolve dependencies for project
 org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
 artifacts could not be resolved:
 org.spark-project.hive:hive-metastore:jar:0.13.1,
 org.spark-project.hive:hive-exec:jar:0.13.1,
 org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
 org.spark-project.hive:hive-metastore:jar:0.13.1 in
 http://repo.maven.apache.org/maven2 was cached in the local repository,
 resolution will not be reattempted until the update interval of maven-repo
 has elapsed or updates are forced - [Help 1]

 Some hint would be appreciated.

 Cheers


 On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote:

 Yes, it is published. As of previous versions, at least, hive-exec
 included all of its dependencies *in its artifact*, making it unusable
 as-is because it contained copies of dependencies that clash with
 versions present in other artifacts, and can't be managed with Maven
 mechanisms.

 I am not sure why hive-exec was not published normally, with just its
 own classes. That's why it was copied, into an artifact with just
 hive-exec code.

 You could do the same thing for hive-exec 0.13.1.
 Or maybe someone knows that it's published more 'normally' now.
 I don't think hive-metastore is related to this question?

 I am no expert on the Hive artifacts, just remembering what the issue
 was initially in case it helps you get to a similar solution.

 On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
  hive-exec (as of 0.13.1) is published here:
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
 
  Should a JIRA be opened so that dependency on hive-metastore can be
  replaced by dependency on hive-exec ?
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com wrote:
 
  The reason for org.spark-project.hive is that Spark relies on
  hive-exec, but the Hive project does not publish this artifact by
  itself, only with all its dependencies as an uber jar. Maybe that's
  been improved. If so, you need to point at the new hive-exec and
  perhaps sort out its dependencies manually in your build.
 
  On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote:
   I found 0.13.1 artifacts in maven:
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
  
   However, Spark uses groupId of org.spark-project.hive, not
  org.apache.hive
  
   Can someone tell me how it is supposed to work ?
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez snu...@hortonworks.com
  wrote:
  
   I saw a note earlier, perhaps on the user list, that at least one
  person is
   using Hive 0.13. Anyone got a working build configuration for this
  version
   of Hive?
  
   Regards,
   - Steve
  
  
  
  
 



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
Owen helped me find this:
https://issues.apache.org/jira/browse/HIVE-7423

I guess this means that for Hive 0.14, Spark should be able to directly
pull in hive-exec-core.jar
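
In sbt terms I'd expect that to look something like

    // assumption: the core jar is published as a "core" classifier of hive-exec
    "org.apache.hive" % "hive-exec" % "0.14.0" classifier "core"

though the exact coordinates remain to be confirmed once 0.14 is out.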

Cheers


On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote:

 It would be great if the hive team can fix that issue. If not, we'll
 have to continue forking our own version of Hive to change the way it
 publishes artifacts.

 - Patrick

 On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
  Talked with Owen offline. He confirmed that as of 0.13, hive-exec is
 still
  uber jar.
 
  Right now I am facing the following error building against Hive 0.13.1 :
 
  [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
  resolve dependencies for project
  org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
  artifacts could not be resolved:
  org.spark-project.hive:hive-metastore:jar:0.13.1,
  org.spark-project.hive:hive-exec:jar:0.13.1,
  org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
  org.spark-project.hive:hive-metastore:jar:0.13.1 in
  http://repo.maven.apache.org/maven2 was cached in the local repository,
  resolution will not be reattempted until the update interval of
 maven-repo
  has elapsed or updates are forced - [Help 1]
 
  Some hint would be appreciated.
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote:
 
  Yes, it is published. As of previous versions, at least, hive-exec
  included all of its dependencies *in its artifact*, making it unusable
  as-is because it contained copies of dependencies that clash with
  versions present in other artifacts, and can't be managed with Maven
  mechanisms.
 
  I am not sure why hive-exec was not published normally, with just its
  own classes. That's why it was copied, into an artifact with just
  hive-exec code.
 
  You could do the same thing for hive-exec 0.13.1.
  Or maybe someone knows that it's published more 'normally' now.
  I don't think hive-metastore is related to this question?
 
  I am no expert on the Hive artifacts, just remembering what the issue
  was initially in case it helps you get to a similar solution.
 
  On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
   hive-exec (as of 0.13.1) is published here:
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
  
   Should a JIRA be opened so that dependency on hive-metastore can be
   replaced by dependency on hive-exec ?
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com
 wrote:
  
   The reason for org.spark-project.hive is that Spark relies on
   hive-exec, but the Hive project does not publish this artifact by
   itself, only with all its dependencies as an uber jar. Maybe that's
   been improved. If so, you need to point at the new hive-exec and
   perhaps sort out its dependencies manually in your build.
  
   On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote:
I found 0.13.1 artifacts in maven:
   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
   
However, Spark uses groupId of org.spark-project.hive, not
   org.apache.hive
   
Can someone tell me how it is supposed to work ?
   
Cheers
   
   
On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez 
 snu...@hortonworks.com
   wrote:
   
I saw a note earlier, perhaps on the user list, that at least one
   person is
using Hive 0.13. Anyone got a working build configuration for this
   version
of Hive?
   
Regards,
- Steve
   
   
   
   
  
 



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Mark Hamstra
Where and how is that fork being maintained?  I'm not seeing an obviously
correct branch or tag in the main asf hive repo & github mirror.


On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote:

 It would be great if the hive team can fix that issue. If not, we'll
 have to continue forking our own version of Hive to change the way it
 publishes artifacts.

 - Patrick

 On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
  Talked with Owen offline. He confirmed that as of 0.13, hive-exec is
 still
  uber jar.
 
  Right now I am facing the following error building against Hive 0.13.1 :
 
  [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
  resolve dependencies for project
  org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
  artifacts could not be resolved:
  org.spark-project.hive:hive-metastore:jar:0.13.1,
  org.spark-project.hive:hive-exec:jar:0.13.1,
  org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
  org.spark-project.hive:hive-metastore:jar:0.13.1 in
  http://repo.maven.apache.org/maven2 was cached in the local repository,
  resolution will not be reattempted until the update interval of
 maven-repo
  has elapsed or updates are forced - [Help 1]
 
  Some hint would be appreciated.
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote:
 
  Yes, it is published. As of previous versions, at least, hive-exec
  included all of its dependencies *in its artifact*, making it unusable
  as-is because it contained copies of dependencies that clash with
  versions present in other artifacts, and can't be managed with Maven
  mechanisms.
 
  I am not sure why hive-exec was not published normally, with just its
  own classes. That's why it was copied, into an artifact with just
  hive-exec code.
 
  You could do the same thing for hive-exec 0.13.1.
  Or maybe someone knows that it's published more 'normally' now.
  I don't think hive-metastore is related to this question?
 
  I am no expert on the Hive artifacts, just remembering what the issue
  was initially in case it helps you get to a similar solution.
 
  On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
   hive-exec (as of 0.13.1) is published here:
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
  
   Should a JIRA be opened so that dependency on hive-metastore can be
   replaced by dependency on hive-exec ?
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com
 wrote:
  
   The reason for org.spark-project.hive is that Spark relies on
   hive-exec, but the Hive project does not publish this artifact by
   itself, only with all its dependencies as an uber jar. Maybe that's
   been improved. If so, you need to point at the new hive-exec and
   perhaps sort out its dependencies manually in your build.
  
   On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote:
I found 0.13.1 artifacts in maven:
   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
   
However, Spark uses groupId of org.spark-project.hive, not
   org.apache.hive
   
Can someone tell me how it is supposed to work ?
   
Cheers
   
   
On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez 
 snu...@hortonworks.com
   wrote:
   
I saw a note earlier, perhaps on the user list, that at least one
   person is
using Hive 0.13. Anyone got a working build configuration for this
   version
of Hive?
   
Regards,
- Steve
   
   
   
   
  
 



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
Yeah so we need a model for this (Mark - do you have any ideas?). I
did this in a personal github repo. I just did it quickly because
dependency issues were blocking the 1.0 release:

https://github.com/pwendell/hive/tree/branch-0.12-shaded-protobuf

I think what we want is to have a semi-official github repo with an
index to each of the shaded dependencies and what version is included
in which branch.

- Patrick

On Mon, Jul 28, 2014 at 10:02 AM, Mark Hamstra m...@clearstorydata.com wrote:
 Where and how is that fork being maintained?  I'm not seeing an obviously
 correct branch or tag in the main asf hive repo & github mirror.


 On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com wrote:

 It would be great if the hive team can fix that issue. If not, we'll
 have to continue forking our own version of Hive to change the way it
 publishes artifacts.

 - Patrick

 On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
  Talked with Owen offline. He confirmed that as of 0.13, hive-exec is
 still
  uber jar.
 
  Right now I am facing the following error building against Hive 0.13.1 :
 
  [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
  resolve dependencies for project
  org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
  artifacts could not be resolved:
  org.spark-project.hive:hive-metastore:jar:0.13.1,
  org.spark-project.hive:hive-exec:jar:0.13.1,
  org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
  org.spark-project.hive:hive-metastore:jar:0.13.1 in
  http://repo.maven.apache.org/maven2 was cached in the local repository,
  resolution will not be reattempted until the update interval of
 maven-repo
  has elapsed or updates are forced - [Help 1]
 
  Some hint would be appreciated.
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote:
 
  Yes, it is published. As of previous versions, at least, hive-exec
  included all of its dependencies *in its artifact*, making it unusable
  as-is because it contained copies of dependencies that clash with
  versions present in other artifacts, and can't be managed with Maven
  mechanisms.
 
  I am not sure why hive-exec was not published normally, with just its
  own classes. That's why it was copied, into an artifact with just
  hive-exec code.
 
  You could do the same thing for hive-exec 0.13.1.
  Or maybe someone knows that it's published more 'normally' now.
  I don't think hive-metastore is related to this question?
 
  I am no expert on the Hive artifacts, just remembering what the issue
  was initially in case it helps you get to a similar solution.
 
  On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
   hive-exec (as of 0.13.1) is published here:
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
  
   Should a JIRA be opened so that dependency on hive-metastore can be
   replaced by dependency on hive-exec ?
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com
 wrote:
  
   The reason for org.spark-project.hive is that Spark relies on
   hive-exec, but the Hive project does not publish this artifact by
   itself, only with all its dependencies as an uber jar. Maybe that's
   been improved. If so, you need to point at the new hive-exec and
   perhaps sort out its dependencies manually in your build.
  
   On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com wrote:
I found 0.13.1 artifacts in maven:
   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
   
However, Spark uses groupId of org.spark-project.hive, not
   org.apache.hive
   
Can someone tell me how it is supposed to work ?
   
Cheers
   
   
On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez 
 snu...@hortonworks.com
   wrote:
   
I saw a note earlier, perhaps on the user list, that at least one
   person is
using Hive 0.13. Anyone got a working build configuration for this
   version
of Hive?
   
Regards,
- Steve
   
   
   
   
  
 



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Cheng Lian
AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL
against Hive 0.13 (or 0.13.1?) successfully. Basically, they also
re-packaged Hive 0.13 the way the Spark team did. The slides of the talk
haven't been released yet, though.


On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote:

 Owen helped me find this:
 https://issues.apache.org/jira/browse/HIVE-7423

 I guess this means that for Hive 0.14, Spark should be able to directly
 pull in hive-exec-core.jar

 Cheers


 On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  It would be great if the hive team can fix that issue. If not, we'll
  have to continue forking our own version of Hive to change the way it
  publishes artifacts.
 
  - Patrick
 
  On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
   Talked with Owen offline. He confirmed that as of 0.13, hive-exec is
  still
   uber jar.
  
   Right now I am facing the following error building against Hive 0.13.1
 :
  
   [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
   resolve dependencies for project
   org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
   artifacts could not be resolved:
   org.spark-project.hive:hive-metastore:jar:0.13.1,
   org.spark-project.hive:hive-exec:jar:0.13.1,
   org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
   org.spark-project.hive:hive-metastore:jar:0.13.1 in
   http://repo.maven.apache.org/maven2 was cached in the local
 repository,
   resolution will not be reattempted until the update interval of
  maven-repo
   has elapsed or updates are forced - [Help 1]
  
   Some hint would be appreciated.
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote:
  
   Yes, it is published. As of previous versions, at least, hive-exec
   included all of its dependencies *in its artifact*, making it unusable
   as-is because it contained copies of dependencies that clash with
   versions present in other artifacts, and can't be managed with Maven
   mechanisms.
  
   I am not sure why hive-exec was not published normally, with just its
   own classes. That's why it was copied, into an artifact with just
   hive-exec code.
  
   You could do the same thing for hive-exec 0.13.1.
   Or maybe someone knows that it's published more 'normally' now.
   I don't think hive-metastore is related to this question?
  
   I am no expert on the Hive artifacts, just remembering what the issue
   was initially in case it helps you get to a similar solution.
  
   On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
hive-exec (as of 0.13.1) is published here:
   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
   
Should a JIRA be opened so that dependency on hive-metastore can be
replaced by dependency on hive-exec ?
   
Cheers
   
   
On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com
  wrote:
   
The reason for org.spark-project.hive is that Spark relies on
hive-exec, but the Hive project does not publish this artifact by
itself, only with all its dependencies as an uber jar. Maybe that's
been improved. If so, you need to point at the new hive-exec and
perhaps sort out its dependencies manually in your build.
   
On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com
 wrote:
 I found 0.13.1 artifacts in maven:

   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar

 However, Spark uses groupId of org.spark-project.hive, not
org.apache.hive

 Can someone tell me how it is supposed to work ?

 Cheers


 On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez 
  snu...@hortonworks.com
wrote:

 I saw a note earlier, perhaps on the user list, that at least
 one
person is
 using Hive 0.13. Anyone got a working build configuration for
 this
version
 of Hive?

 Regards,
 - Steve




   
  
 



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Patrick Wendell
I've heard from Cloudera that there were hive internal changes between
0.12 and 0.13 that required code re-writing. Over time it might be
possible for us to integrate with hive using APIs that are more
stable (this is the domain of Michael/Cheng/Yin more than me!). It
would be interesting to see what the Hulu folks did.

- Patrick

On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com wrote:
 AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL
 against Hive 0.13 (or 0.13.1?) successfully. Basically, they also
 re-packaged Hive 0.13 the way the Spark team did. The slides of the talk
 haven't been released yet, though.


 On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote:

 Owen helped me find this:
 https://issues.apache.org/jira/browse/HIVE-7423

 I guess this means that for Hive 0.14, Spark should be able to directly
 pull in hive-exec-core.jar

 Cheers


 On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  It would be great if the hive team can fix that issue. If not, we'll
  have to continue forking our own version of Hive to change the way it
  publishes artifacts.
 
  - Patrick
 
  On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
   Talked with Owen offline. He confirmed that as of 0.13, hive-exec is
  still
   uber jar.
  
   Right now I am facing the following error building against Hive 0.13.1
 :
  
   [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
   resolve dependencies for project
   org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
   artifacts could not be resolved:
   org.spark-project.hive:hive-metastore:jar:0.13.1,
   org.spark-project.hive:hive-exec:jar:0.13.1,
   org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
   org.spark-project.hive:hive-metastore:jar:0.13.1 in
   http://repo.maven.apache.org/maven2 was cached in the local
 repository,
   resolution will not be reattempted until the update interval of
  maven-repo
   has elapsed or updates are forced - [Help 1]
  
   Some hint would be appreciated.
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com wrote:
  
   Yes, it is published. As of previous versions, at least, hive-exec
   included all of its dependencies *in its artifact*, making it unusable
   as-is because it contained copies of dependencies that clash with
   versions present in other artifacts, and can't be managed with Maven
   mechanisms.
  
   I am not sure why hive-exec was not published normally, with just its
   own classes. That's why it was copied, into an artifact with just
   hive-exec code.
  
   You could do the same thing for hive-exec 0.13.1.
   Or maybe someone knows that it's published more 'normally' now.
   I don't think hive-metastore is related to this question?
  
   I am no expert on the Hive artifacts, just remembering what the issue
   was initially in case it helps you get to a similar solution.
  
   On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
hive-exec (as of 0.13.1) is published here:
   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
   
Should a JIRA be opened so that dependency on hive-metastore can be
replaced by dependency on hive-exec ?
   
Cheers
   
   
On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com
  wrote:
   
The reason for org.spark-project.hive is that Spark relies on
hive-exec, but the Hive project does not publish this artifact by
itself, only with all its dependencies as an uber jar. Maybe that's
been improved. If so, you need to point at the new hive-exec and
perhaps sort out its dependencies manually in your build.
   
On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com
 wrote:
 I found 0.13.1 artifacts in maven:

   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar

 However, Spark uses groupId of org.spark-project.hive, not
org.apache.hive

 Can someone tell me how it is supposed to work ?

 Cheers


 On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez 
  snu...@hortonworks.com
wrote:

 I saw a note earlier, perhaps on the user list, that at least
 one
person is
 using Hive 0.13. Anyone got a working build configuration for
 this
version
 of Hive?

 Regards,
 - Steve




Re: Working Formula for Hive 0.13?

2014-07-28 Thread Mark Hamstra
Is getting and maintaining our own branch in the main asf hive repo a
non-starter, or would that be workable?


On Mon, Jul 28, 2014 at 10:17 AM, Patrick Wendell pwend...@gmail.com
wrote:

 Yeah so we need a model for this (Mark - do you have any ideas?). I
 did this in a personal github repo. I just did it quickly because
 dependency issues were blocking the 1.0 release:

 https://github.com/pwendell/hive/tree/branch-0.12-shaded-protobuf

 I think what we want is to have a semi-official github repo with an
 index to each of the shaded dependencies and what version is included
 in which branch.

 - Patrick

 On Mon, Jul 28, 2014 at 10:02 AM, Mark Hamstra m...@clearstorydata.com
 wrote:
  Where and how is that fork being maintained?  I'm not seeing an obviously
  correct branch or tag in the main asf hive repo & github mirror.
 
 
  On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  It would be great if the hive team can fix that issue. If not, we'll
  have to continue forking our own version of Hive to change the way it
  publishes artifacts.
 
  - Patrick
 
  On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
   Talked with Owen offline. He confirmed that as of 0.13, hive-exec is
  still
   uber jar.
  
   Right now I am facing the following error building against Hive
 0.13.1 :
  
   [ERROR] Failed to execute goal on project spark-hive_2.10: Could not
   resolve dependencies for project
   org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
   artifacts could not be resolved:
   org.spark-project.hive:hive-metastore:jar:0.13.1,
   org.spark-project.hive:hive-exec:jar:0.13.1,
   org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
   org.spark-project.hive:hive-metastore:jar:0.13.1 in
   http://repo.maven.apache.org/maven2 was cached in the local
 repository,
   resolution will not be reattempted until the update interval of
  maven-repo
   has elapsed or updates are forced - [Help 1]
  
   Some hint would be appreciated.
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com
 wrote:
  
   Yes, it is published. As of previous versions, at least, hive-exec
   included all of its dependencies *in its artifact*, making it
 unusable
   as-is because it contained copies of dependencies that clash with
   versions present in other artifacts, and can't be managed with Maven
   mechanisms.
  
   I am not sure why hive-exec was not published normally, with just its
   own classes. That's why it was copied, into an artifact with just
   hive-exec code.
  
   You could do the same thing for hive-exec 0.13.1.
   Or maybe someone knows that it's published more 'normally' now.
   I don't think hive-metastore is related to this question?
  
   I am no expert on the Hive artifacts, just remembering what the issue
   was initially in case it helps you get to a similar solution.
  
   On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com wrote:
hive-exec (as of 0.13.1) is published here:
   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar
   
Should a JIRA be opened so that dependency on hive-metastore can be
replaced by dependency on hive-exec ?
   
Cheers
   
   
On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com
  wrote:
   
The reason for org.spark-project.hive is that Spark relies on
hive-exec, but the Hive project does not publish this artifact by
itself, only with all its dependencies as an uber jar. Maybe
 that's
been improved. If so, you need to point at the new hive-exec and
perhaps sort out its dependencies manually in your build.
   
On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com
 wrote:
 I found 0.13.1 artifacts in maven:

   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar

 However, Spark uses groupId of org.spark-project.hive, not
org.apache.hive

 Can someone tell me how it is supposed to work ?

 Cheers


 On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez 
  snu...@hortonworks.com
wrote:

 I saw a note earlier, perhaps on the user list, that at least
 one
person is
 using Hive 0.13. Anyone got a working build configuration for
 this
version
 of Hive?

 Regards,
 - Steve




Re: Working Formula for Hive 0.13?

2014-07-28 Thread Cheng Lian
Exactly. Forgot to mention that the Hulu team also made changes to cope with
those incompatibility issues, but they said that's relatively easy once the
re-packaging work is done.


On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com wrote:

 I've heard from Cloudera that there were hive internal changes between
 0.12 and 0.13 that required code re-writing. Over time it might be
  possible for us to integrate with hive using APIs that are more
 stable (this is the domain of Michael/Cheng/Yin more than me!). It
 would be interesting to see what the Hulu folks did.

 - Patrick

 On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com
 wrote:
  AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL
   against Hive 0.13 (or 0.13.1?) successfully. Basically, they also
   re-packaged Hive 0.13 the way the Spark team did. The slides of the talk
   haven't been released yet, though.
 
 
  On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote:
 
  Owen helped me find this:
  https://issues.apache.org/jira/browse/HIVE-7423
 
  I guess this means that for Hive 0.14, Spark should be able to directly
  pull in hive-exec-core.jar
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com
  wrote:
 
   It would be great if the hive team can fix that issue. If not, we'll
   have to continue forking our own version of Hive to change the way it
   publishes artifacts.
  
   - Patrick
  
   On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com wrote:
Talked with Owen offline. He confirmed that as of 0.13, hive-exec is
   still
uber jar.
   
Right now I am facing the following error building against Hive
 0.13.1
  :
   
[ERROR] Failed to execute goal on project spark-hive_2.10: Could not
resolve dependencies for project
org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The following
artifacts could not be resolved:
org.spark-project.hive:hive-metastore:jar:0.13.1,
org.spark-project.hive:hive-exec:jar:0.13.1,
org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
org.spark-project.hive:hive-metastore:jar:0.13.1 in
http://repo.maven.apache.org/maven2 was cached in the local
  repository,
resolution will not be reattempted until the update interval of
   maven-repo
has elapsed or updates are forced - [Help 1]
   
Some hint would be appreciated.
   
Cheers
   
   
On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com
 wrote:
   
Yes, it is published. As of previous versions, at least, hive-exec
included all of its dependencies *in its artifact*, making it
 unusable
as-is because it contained copies of dependencies that clash with
versions present in other artifacts, and can't be managed with
 Maven
mechanisms.
   
I am not sure why hive-exec was not published normally, with just
 its
own classes. That's why it was copied, into an artifact with just
hive-exec code.
   
You could do the same thing for hive-exec 0.13.1.
Or maybe someone knows that it's published more 'normally' now.
I don't think hive-metastore is related to this question?
   
I am no expert on the Hive artifacts, just remembering what the
 issue
was initially in case it helps you get to a similar solution.
   
On Mon, Jul 28, 2014 at 4:47 PM, Ted Yu yuzhih...@gmail.com
 wrote:
 hive-exec (as of 0.13.1) is published here:

   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-exec%7C0.13.1%7Cjar

 Should a JIRA be opened so that dependency on hive-metastore can
 be
 replaced by dependency on hive-exec ?

 Cheers


 On Mon, Jul 28, 2014 at 8:26 AM, Sean Owen so...@cloudera.com
   wrote:

 The reason for org.spark-project.hive is that Spark relies on
 hive-exec, but the Hive project does not publish this artifact
 by
 itself, only with all its dependencies as an uber jar. Maybe
 that's
 been improved. If so, you need to point at the new hive-exec and
 perhaps sort out its dependencies manually in your build.

 On Mon, Jul 28, 2014 at 4:01 PM, Ted Yu yuzhih...@gmail.com
  wrote:
  I found 0.13.1 artifacts in maven:
 

   
  
 
 http://search.maven.org/#artifactdetails%7Corg.apache.hive%7Chive-metastore%7C0.13.1%7Cjar
 
  However, Spark uses groupId of org.spark-project.hive, not
 org.apache.hive
 
  Can someone tell me how it is supposed to work ?
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 7:44 AM, Steve Nunez 
   snu...@hortonworks.com
 wrote:
 
  I saw a note earlier, perhaps on the user list, that at least
  one
 person is
  using Hive 0.13. Anyone got a working build configuration for
  this
 version
  of Hive?
 
  Regards,
  - Steve
 
 
 

Re: VertexPartition and ShippableVertexPartition

2014-07-28 Thread Ankur Dave
On Mon, Jul 28, 2014 at 4:29 AM, Larry Xiao xia...@sjtu.edu.cn wrote:

 On 7/28/14, 3:41 PM, shijiaxin wrote:

 There is a VertexPartition in the EdgePartition, which is created by
 EdgePartitionBuilder.toEdgePartition, and there is also a
 ShippableVertexPartition in the VertexRDD. These two Partitions have a lot
 of common things like index, data and Bitset, why is this necessary?


There is a VertexPartition in the EdgePartition, which is created by

Is the VertexPartition in the EdgePartition the Mirror Cache part?


Yes, exactly. The primary copy of each vertex is stored in the VertexRDD
using the index, values, and mask data structures, which together form a
hash map. In addition, each partition of the VertexRDD stores the
corresponding partition of the routing table to facilitate joining with the
edges. The ShippableVertexPartition class encapsulates the vertex hash map
along with a RoutingTablePartition.

After joining the vertices with the edges, the edge partitions cache their
adjacent vertices in the mirror cache. They use the VertexPartition for
this, which provides only the hash map functionality and not the routing
table.
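
To make the shared structure concrete, here is a rough conceptual sketch (not
the actual GraphX classes, which use specialized primitive collections and a
BitSet rather than plain Scala types):

    // index maps a vertex id to a slot, values holds one attribute per slot,
    // and mask marks which slots are currently visible (enables cheap filtering)
    class SimpleVertexPartition[VD](
        index: Map[Long, Int],
        values: Array[VD],
        mask: Array[Boolean]) {
      def apply(vid: Long): VD = values(index(vid))
      def isDefined(vid: Long): Boolean = index.get(vid).exists(slot => mask(slot))
    }

A ShippableVertexPartition is this plus a RoutingTablePartition; the mirror
cache's VertexPartition is just the hash map itself.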

Ankur http://www.ankurdave.com/


'Proper' Build Tool

2014-07-28 Thread Steve Nunez
Gents,

It seems that until recently, building via sbt was a documented process in
the 0.9 overview:

http://spark.apache.org/docs/0.9.0/

The section on building mentions using sbt/sbt assembly. However in the
latest overview:

http://spark.apache.org/docs/latest/index.html

There's no mention of building with sbt.

What's the recommended way to build? What are most people using in their
daily workflow?

Cheers,
- SteveN







Re: 'Proper' Build Tool

2014-07-28 Thread Stephen Boesch
Hi Steve,
  I had the opportunity to ask this question at the Summit to Andrew Orr.
He mentioned that with 1.0 the recommended build tool is maven; sbt is,
however, still supported. You will notice that the dependencies are now
completely handled within the maven pom.xml: the SparkBuild.scala / sbt build
reads the dependencies from the pom.xml.

Andrew further suggested looking at make-distribution.sh to see the
recommended way to create builds. Using mvn on the command line is fine,
but the aforementioned script provides a framework/guideline for setting
things up properly.
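
For example, something like (the exact flags are from memory, so check the
script's usage output):

    ./make-distribution.sh --tgz --with-hive

which runs the recommended build and leaves a distributable tarball.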




2014-07-28 13:06 GMT-07:00 Steve Nunez snu...@hortonworks.com:

 Gents,

 It seems that until recently, building via sbt was a documented process in
 the 0.9 overview:

 http://spark.apache.org/docs/0.9.0/

 The section on building mentions using sbt/sbt assembly. However in the
 latest overview:

 http://spark.apache.org/docs/latest/index.html

 There's no mention of building with sbt.

 What's the recommended way to build? What are most people using in their
 daily workflow?

 Cheers,
 - SteveN








Re: Can I translate the documentations of Spark in Japanese?

2014-07-28 Thread giwa
Hi Yu,

I could help translate the Spark documentation into Japanese. Please let me
know if you need help.

Best,

Ken


On Mon, Jul 28, 2014 at 1:03 AM, Yu Ishikawa [via Apache Spark Developers
List] ml-node+s1001551n7546...@n3.nabble.com wrote:

 Hello Patrick,

 Thank you for your reply.
 I checked how some other projects handle i18n of their documentation.

 For example, the documentation of the Apache HTTP server project supports
 i18n natively.
 https://github.com/apache/httpd/blob/trunk/docs%2Fmanual%2Findex.html

 But it seems that the Chinese documentation of Apache HBase is only
 linked from the top page of HBase.
 From: http://hbase.apache.org/
 To: http://abloz.com/hbase/book.html

 I think that it is currently difficult to support i18n in the Apache Spark
 documentation.
 I suggest that I translate the documentation into Japanese on a GitHub
 page unofficially.
 If possible, would you please link from the translated documentation to
 the Apache Spark documentation?

 Regards,
 Yu





-- 

Kenichi Takagiwa
-
Keio University
Graduate School of Science and Technology
Department of Open and Environmental Systems
Faculty of Computer Science
Hiroaki Nishi Laboratory
Email: ugw.gi.wo...@gmail.com
Phone: +81-50-3575-6586





Re: 'Proper' Build Tool

2014-07-28 Thread Patrick Wendell
Yeah for packagers we officially recommend using maven. Spark's
dependency graph is very complicated and Maven and SBT use different
conflict resolution strategies, so we've opted to officially support
Maven.

SBT is still around though and it's used more often by day-to-day developers.

- Patrick


Re: package/assemble with local spark

2014-07-28 Thread Reynold Xin
You can use publish-local in sbt.

If you want to be more careful, you can give Spark a different version
number and use that version number in your app.
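
A sketch of what that looks like, assuming you bump the version in Spark's
build to a made-up string like 1.1.0-MYFORK before running publish-local
(which installs the artifacts under ~/.ivy2/local):

    // build.sbt of your application; sbt resolves the forked Spark
    // from the local ivy repository that publish-local populates
    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0-MYFORK"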



On Mon, Jul 28, 2014 at 4:33 AM, Larry Xiao xia...@sjtu.edu.cn wrote:

 Hi,

 How do you package an app with a modified Spark?

 It seems sbt would resolve the dependencies and use the official Spark
 release.

 Thank you!

 Larry



Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
After manually copying the Hive 0.13.1 jars to the local Maven repo, I got the
following errors when building the spark-hive_2.10 module:

[ERROR]
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182:
type mismatch;
 found   : String
 required: Array[String]
[ERROR]   val proc: CommandProcessor =
CommandProcessorFactory.get(tokens(0), hiveconf)
[ERROR]
 ^
[ERROR]
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60:
value getAllPartitionsForPruner is not a member of org.apache.
 hadoop.hive.ql.metadata.Hive
[ERROR] client.getAllPartitionsForPruner(table).toSeq
[ERROR]^
[ERROR]
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267:
overloaded method constructor TableDesc with alternatives:
  (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2:
Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
and
  ()org.apache.hadoop.hive.ql.plan.TableDesc
 cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer],
Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in
value tableDesc)(in   value tableDesc)], java.util.Properties)
[ERROR]   val tableDesc = new TableDesc(
[ERROR]   ^
[WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing
with a stub.
[WARNING] Class org.antlr.runtime.Token not found - continuing with a stub.
[WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a
stub.
[ERROR]
 while compiling:
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
during phase: typer
 library version: version 2.10.4
compiler version: version 2.10.4

The above shows incompatible changes between 0.12 and 0.13.1.
E.g., the first error corresponds to the following method
in CommandProcessorFactory:
  public static CommandProcessor get(String[] cmd, HiveConf conf)
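
If that is the extent of it, the call site in HiveContext presumably just
needs to wrap the command in an array. A sketch, untested against 0.13.1
(tokens and hiveconf are the names from the error output above):

    // Hive 0.12 signature: get(String cmd, HiveConf conf)
    // Hive 0.13 signature: get(String[] cmd, HiveConf conf)
    val proc: CommandProcessor =
      CommandProcessorFactory.get(Array(tokens(0)), hiveconf)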

Cheers


On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com wrote:

 So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding
 the hive-exec jar to the spark-project repo? It doesn't look like there's
 a release date schedule for 0.14.



 On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote:

 Exactly, forgot to mention Hulu team also made changes to cope with those
 incompatibility issues, but they said that's relatively easy once the
 re-packaging work is done.
 
 
 On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com
 wrote:
 
  I've heard from Cloudera that there were hive internal changes between
  0.12 and 0.13 that required code re-writing. Over time it might be
  possible for us to integrate with hive using API's that are more
  stable (this is the domain of Michael/Cheng/Yin more than me!). It
  would be interesting to see what the Hulu folks did.
 
  - Patrick
 
  On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com
  wrote:
    AFAIK, according to a recent talk, the Hulu team in China has built Spark SQL
   against Hive 0.13 (or 0.13.1?) successfully. Basically they also
   re-packaged Hive 0.13 as what the Spark team did. The slides of the
 talk
   hasn't been released yet though.
  
  
   On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote:
  
   Owen helped me find this:
   https://issues.apache.org/jira/browse/HIVE-7423
  
   I guess this means that for Hive 0.14, Spark should be able to
 directly
   pull in hive-exec-core.jar
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell pwend...@gmail.com
 
   wrote:
  
It would be great if the hive team can fix that issue. If not, we'll
have to continue forking our own version of Hive to change the way it
publishes artifacts.
   
- Patrick
   
On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com
 wrote:
 Talked with Owen offline. He confirmed that as of 0.13,
 hive-exec is
still
 uber jar.

 Right now I am facing the following error building against Hive 0.13.1:

 [ERROR] Failed to execute goal on project spark-hive_2.10: Could
 not
 resolve dependencies for project
 org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The
 following
 artifacts could not be resolved:
 org.spark-project.hive:hive-metastore:jar:0.13.1,
 org.spark-project.hive:hive-exec:jar:0.13.1,
 org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
 org.spark-project.hive:hive-metastore:jar:0.13.1 in
 http://repo.maven.apache.org/maven2 was cached in the local
   repository,
 resolution will not be reattempted until the update interval of
maven-repo
 has elapsed or updates are forced - [Help 1]

 Some hint would be appreciated.

 Cheers


 On Mon, Jul 28, 2014 at 9:15 AM, Sean Owen so...@cloudera.com
  wrote:

 Yes, it is published. As of previous versions, at least,
 hive-exec
 included all of its 

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
I was looking for a class where reflection-related code should reside.

I found this, but I don't think it is the proper class for bridging
differences between Hive 0.12 and 0.13.1:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala

Cheers


On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:

 After manually copying hive 0.13.1 jars to local maven repo, I got the
 following errors when building spark-hive_2.10 module :

 [ERROR]
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182:
 type mismatch;
  found   : String
  required: Array[String]
 [ERROR]   val proc: CommandProcessor =
 CommandProcessorFactory.get(tokens(0), hiveconf)
 [ERROR]
^
 [ERROR]
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60:
 value getAllPartitionsForPruner is not a member of org.apache.
  hadoop.hive.ql.metadata.Hive
 [ERROR] client.getAllPartitionsForPruner(table).toSeq
 [ERROR]^
 [ERROR]
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267:
 overloaded method constructor TableDesc with alternatives:
   (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2:
 Class[_],x$3: java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
 and
   ()org.apache.hadoop.hive.ql.plan.TableDesc
  cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer],
 Class[(some other)?0(in value tableDesc)(in value tableDesc)], Class[?0(in
 value tableDesc)(in   value tableDesc)], java.util.Properties)
 [ERROR]   val tableDesc = new TableDesc(
 [ERROR]   ^
 [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing
 with a stub.
 [WARNING] Class org.antlr.runtime.Token not found - continuing with a stub.
 [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a
 stub.
 [ERROR]
  while compiling:
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
 during phase: typer
  library version: version 2.10.4
 compiler version: version 2.10.4

 The above shows incompatible changes between 0.12 and 0.13.1
 e.g. the first error corresponds to the following method
 in CommandProcessorFactory :
   public static CommandProcessor get(String[] cmd, HiveConf conf)

 Cheers


 On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com
 wrote:

 So, do we have a short-term fix until Hive 0.14 comes out? Perhaps adding
 the hive-exec jar to the spark-project repo? It doesn't look like there's
 a release date schedule for 0.14.



 On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote:

 Exactly, forgot to mention Hulu team also made changes to cope with those
 incompatibility issues, but they said that's relatively easy once the
 re-packaging work is done.
 
 
 On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com

 wrote:
 
  I've heard from Cloudera that there were hive internal changes between
  0.12 and 0.13 that required code re-writing. Over time it might be
  possible for us to integrate with hive using API's that are more
  stable (this is the domain of Michael/Cheng/Yin more than me!). It
  would be interesting to see what the Hulu folks did.
 
  - Patrick
 
  On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com
  wrote:
    AFAIK, according to a recent talk, the Hulu team in China has built Spark
 SQL
   against Hive 0.13 (or 0.13.1?) successfully. Basically they also
   re-packaged Hive 0.13 as what the Spark team did. The slides of the
 talk
   hasn't been released yet though.
  
  
   On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com wrote:
  
   Owen helped me find this:
   https://issues.apache.org/jira/browse/HIVE-7423
  
   I guess this means that for Hive 0.14, Spark should be able to
 directly
   pull in hive-exec-core.jar
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
  
 It would be great if the hive team can fix that issue. If not, we'll
 have to continue forking our own version of Hive to change the way it
 publishes artifacts.
   
- Patrick
   
On Mon, Jul 28, 2014 at 9:34 AM, Ted Yu yuzhih...@gmail.com
 wrote:
 Talked with Owen offline. He confirmed that as of 0.13,
 hive-exec is
still
 uber jar.

 Right now I am facing the following error building against Hive
  0.13.1
   :

 [ERROR] Failed to execute goal on project spark-hive_2.10: Could
 not
 resolve dependencies for project
 org.apache.spark:spark-hive_2.10:jar:1.1.0-SNAPSHOT: The
 following
 artifacts could not be resolved:
 org.spark-project.hive:hive-metastore:jar:0.13.1,
 org.spark-project.hive:hive-exec:jar:0.13.1,
 org.spark-project.hive:hive-serde:jar:0.13.1: Failure to find
 org.spark-project.hive:hive-metastore:jar:0.13.1 in
 http://repo.maven.apache.org/maven2 was cached in the local
   

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Michael Armbrust
A few things:
 - When we upgrade to Hive 0.13.0, Patrick will likely republish the
hive-exec jar just as we did for 0.12.0
 - Since we have to tie into some pretty low-level APIs, it is unsurprising
that the code doesn't just compile out of the box against 0.13.0
 - ScalaReflection is for deriving a schema from Scala classes, not
reflection-based bridge code (see the sketch after this list). Either way,
it's unclear to me whether there is any reason to use reflection to support
multiple versions instead of just upgrading to Hive 0.13.0
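
For reference, what ScalaReflection actually backs is the case-class-to-schema
path. A sketch against the 1.0-era shell API (sc is the shell's SparkContext;
the input path is just the example file shipped with Spark):

    import org.apache.spark.sql.SQLContext

    case class Person(name: String, age: Int)

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Person] => SchemaRDD

    val people = sc.textFile("examples/src/main/resources/people.txt")
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))

    people.registerAsTable("people")  // schema derived via ScalaReflection
    sqlContext.sql("SELECT name FROM people WHERE age >= 13").collect()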

One question I have is: what is the goal of upgrading to Hive 0.13.0? Is
it purely because you are having problems connecting to newer metastores?
Are there some features you are hoping for? This will help me prioritize
this effort.

Michael


On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote:

 I was looking for a class where reflection-related code should reside.

 I found this but don't think it is the proper class for bridging
 differences between hive 0.12 and 0.13.1:

 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala

 Cheers


 On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:

  After manually copying hive 0.13.1 jars to local maven repo, I got the
  following errors when building spark-hive_2.10 module :
 
  [ERROR]
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182:
  type mismatch;
   found   : String
   required: Array[String]
  [ERROR]   val proc: CommandProcessor =
  CommandProcessorFactory.get(tokens(0), hiveconf)
  [ERROR]
 ^
  [ERROR]
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60:
  value getAllPartitionsForPruner is not a member of org.apache.
   hadoop.hive.ql.metadata.Hive
  [ERROR] client.getAllPartitionsForPruner(table).toSeq
  [ERROR]^
  [ERROR]
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267:
  overloaded method constructor TableDesc with alternatives:
    (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2:
  Class[_],x$3:
 java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
  and
()org.apache.hadoop.hive.ql.plan.TableDesc
   cannot be applied to (Class[org.apache.hadoop.hive.serde2.Deserializer],
  Class[(some other)?0(in value tableDesc)(in value tableDesc)],
 Class[?0(in
  value tableDesc)(in   value tableDesc)], java.util.Properties)
  [ERROR]   val tableDesc = new TableDesc(
  [ERROR]   ^
  [WARNING] Class org.antlr.runtime.tree.CommonTree not found - continuing
  with a stub.
  [WARNING] Class org.antlr.runtime.Token not found - continuing with a
 stub.
  [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing with a
  stub.
  [ERROR]
   while compiling:
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
  during phase: typer
   library version: version 2.10.4
  compiler version: version 2.10.4
 
  The above shows incompatible changes between 0.12 and 0.13.1
  e.g. the first error corresponds to the following method
  in CommandProcessorFactory :
public static CommandProcessor get(String[] cmd, HiveConf conf)
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com
  wrote:
 
  So, do we have a short-term fix until Hive 0.14 comes out? Perhaps
 adding
  the hive-exec jar to the spark-project repo? It doesn't look like
 there's
  a release date schedule for 0.14.
 
 
 
  On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote:
 
  Exactly, forgot to mention Hulu team also made changes to cope with
 those
  incompatibility issues, but they said that's relatively easy once the
  re-packaging work is done.
  
  
  On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com
 
  wrote:
  
   I've heard from Cloudera that there were hive internal changes
 between
   0.12 and 0.13 that required code re-writing. Over time it might be
   possible for us to integrate with hive using API's that are more
   stable (this is the domain of Michael/Cheng/Yin more than me!). It
   would be interesting to see what the Hulu folks did.
  
   - Patrick
  
   On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian lian.cs@gmail.com
   wrote:
 AFAIK, according to a recent talk, the Hulu team in China has built Spark
  SQL
against Hive 0.13 (or 0.13.1?) successfully. Basically they also
re-packaged Hive 0.13 as what the Spark team did. The slides of the
  talk
hasn't been released yet though.
   
   
On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com
 wrote:
   
Owen helped me find this:
https://issues.apache.org/jira/browse/HIVE-7423
   
I guess this means that for Hive 0.14, Spark should be able to
  directly
pull in hive-exec-core.jar
   
Cheers
   
   
On Mon, Jul 28, 2014 at 9:55 AM, Patrick Wendell 
  pwend...@gmail.com
wrote:
   
 It would be great if the hive team can fix 

Re: Working Formula for Hive 0.13?

2014-07-28 Thread Steve Nunez
The larger goal is to get a clean compile & test in the environment I have
to use. As near as I can tell, the tests fail in Parquet because Parquet
support was only added in Hive 0.13. There could well be issues in later
metastores, but one thing at a time...

- SteveN



On 7/28/14, 17:22, Michael Armbrust mich...@databricks.com wrote:

A few things:
 - When we upgrade to Hive 0.13.0, Patrick will likely republish the
hive-exec jar just as we did for 0.12.0
 - Since we have to tie into some pretty low level APIs it is unsurprising
that the code doesn't just compile out of the box against 0.13.0
 - ScalaReflection is for deriving a schema from Scala classes, not
reflection-based bridge code. Either way, it's unclear to me whether there is
any reason to use reflection to support multiple versions instead of just
upgrading to Hive 0.13.0

One question I have is: what is the goal of upgrading to Hive 0.13.0? Is
it purely because you are having problems connecting to newer metastores?
Are there some features you are hoping for? This will help me prioritize
this effort.

Michael


On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote:

 I was looking for a class where reflection-related code should reside.

 I found this but don't think it is the proper class for bridging
 differences between hive 0.12 and 0.13.1:

 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection
.scala

 Cheers


 On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:

  After manually copying hive 0.13.1 jars to local maven repo, I got the
  following errors when building spark-hive_2.10 module :
 
  [ERROR]
 
 
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveCon
text.scala:182:
  type mismatch;
   found   : String
   required: Array[String]
  [ERROR]   val proc: CommandProcessor =
  CommandProcessorFactory.get(tokens(0), hiveconf)
  [ERROR]
 ^
  [ERROR]
 
 
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMet
astoreCatalog.scala:60:
  value getAllPartitionsForPruner is not a member of org.apache.
   hadoop.hive.ql.metadata.Hive
  [ERROR] client.getAllPartitionsForPruner(table).toSeq
  [ERROR]^
  [ERROR]
 
 
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMet
astoreCatalog.scala:267:
  overloaded method constructor TableDesc with alternatives:
 (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2:
  Class[_],x$3:
 java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
  and
()org.apache.hadoop.hive.ql.plan.TableDesc
   cannot be applied to
(Class[org.apache.hadoop.hive.serde2.Deserializer],
  Class[(some other)?0(in value tableDesc)(in value tableDesc)],
 Class[?0(in
  value tableDesc)(in   value tableDesc)], java.util.Properties)
  [ERROR]   val tableDesc = new TableDesc(
  [ERROR]   ^
  [WARNING] Class org.antlr.runtime.tree.CommonTree not found -
continuing
  with a stub.
  [WARNING] Class org.antlr.runtime.Token not found - continuing with a
 stub.
  [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing
with a
  stub.
  [ERROR]
   while compiling:
 
 
/homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.
scala
  during phase: typer
   library version: version 2.10.4
  compiler version: version 2.10.4
 
  The above shows incompatible changes between 0.12 and 0.13.1
  e.g. the first error corresponds to the following method
  in CommandProcessorFactory :
public static CommandProcessor get(String[] cmd, HiveConf conf)
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com
  wrote:
 
  So, do we have a short-term fix until Hive 0.14 comes out? Perhaps
 adding
  the hive-exec jar to the spark-project repo? It doesn't look like
 there's
  a release date schedule for 0.14.
 
 
 
  On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote:
 
  Exactly, forgot to mention Hulu team also made changes to cope with
 those
  incompatibility issues, but they said that's relatively easy once
the
  re-packaging work is done.
  
  
  On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell
pwend...@gmail.com
 
  wrote:
  
   I've heard from Cloudera that there were hive internal changes
 between
   0.12 and 0.13 that required code re-writing. Over time it might be
   possible for us to integrate with hive using API's that are more
   stable (this is the domain of Michael/Cheng/Yin more than me!). It
   would be interesting to see what the Hulu folks did.
  
   - Patrick
  
   On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian
lian.cs@gmail.com
   wrote:
 AFAIK, according to a recent talk, the Hulu team in China has built
Spark
  SQL
against Hive 0.13 (or 0.13.1?) successfully. Basically they also
re-packaged Hive 0.13 as what the Spark team did. The slides of
the
  talk
hasn't been released yet though.
   
   
On Tue, Jul 29, 2014 at 1:01 AM, Ted Yu yuzhih...@gmail.com
 wrote:
   
Owen 

Github mirroring is running behind

2014-07-28 Thread Patrick Wendell
https://issues.apache.org/jira/browse/INFRA-8116

Just a heads up, the github mirroring is running behind. You can
follow that JIRA to keep up to date on the fix.

In the meantime you can use the Apache git repo itself:

https://git-wip-us.apache.org/repos/asf/spark.git

Some people have reported issues checking out Apache git as well, but
it might work.

- Patrick


Re: Working Formula for Hive 0.13?

2014-07-28 Thread Ted Yu
bq. Either way, it's unclear to me whether there is any reason to use
reflection to support multiple versions, instead of just upgrading to Hive
0.13.0

In which Spark release would this Hive upgrade take place?
I agree it is cleaner to upgrade the Hive dependency than to introduce
reflection.
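
For concreteness, the reflection route would presumably look something like
this hypothetical helper. A sketch only: the 0.13-side method name
getAllPartitionsOf and the returned collection types are my assumptions, not
verified against 0.13.1:

    import scala.collection.JavaConverters._
    import org.apache.hadoop.hive.ql.metadata.{Hive, Partition, Table}

    // Call whichever partition-listing method this Hive version provides
    // (0.12: getAllPartitionsForPruner, 0.13: getAllPartitionsOf).
    def allPartitions(client: Hive, table: Table): Seq[Partition] = {
      val m =
        try client.getClass.getMethod("getAllPartitionsOf", classOf[Table])
        catch {
          case _: NoSuchMethodException =>
            client.getClass.getMethod("getAllPartitionsForPruner", classOf[Table])
        }
      m.invoke(client, table).asInstanceOf[java.util.Set[Partition]].asScala.toSeq
    }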

Cheers


On Mon, Jul 28, 2014 at 5:22 PM, Michael Armbrust mich...@databricks.com
wrote:

 A few things:
  - When we upgrade to Hive 0.13.0, Patrick will likely republish the
 hive-exec jar just as we did for 0.12.0
  - Since we have to tie into some pretty low level APIs it is unsurprising
 that the code doesn't just compile out of the box against 0.13.0
  - ScalaReflection is for deriving a schema from Scala classes, not
 reflection-based bridge code. Either way, it's unclear to me whether there
 is any reason to use reflection to support multiple versions instead of
 just upgrading to Hive 0.13.0

  One question I have is: what is the goal of upgrading to Hive 0.13.0? Is
  it purely because you are having problems connecting to newer metastores?
  Are there some features you are hoping for? This will help me prioritize
  this effort.

 Michael


 On Mon, Jul 28, 2014 at 4:05 PM, Ted Yu yuzhih...@gmail.com wrote:

  I was looking for a class where reflection-related code should reside.
 
  I found this but don't think it is the proper class for bridging
  differences between hive 0.12 and 0.13.1:
 
 
 sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
 
  Cheers
 
 
  On Mon, Jul 28, 2014 at 3:41 PM, Ted Yu yuzhih...@gmail.com wrote:
 
   After manually copying hive 0.13.1 jars to local maven repo, I got the
   following errors when building spark-hive_2.10 module :
  
   [ERROR]
  
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala:182:
   type mismatch;
found   : String
required: Array[String]
   [ERROR]   val proc: CommandProcessor =
   CommandProcessorFactory.get(tokens(0), hiveconf)
   [ERROR]
  ^
   [ERROR]
  
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:60:
   value getAllPartitionsForPruner is not a member of org.apache.
hadoop.hive.ql.metadata.Hive
   [ERROR] client.getAllPartitionsForPruner(table).toSeq
   [ERROR]^
   [ERROR]
  
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala:267:
   overloaded method constructor TableDesc with alternatives:
  (x$1: Class[_ <: org.apache.hadoop.mapred.InputFormat[_, _]],x$2:
   Class[_],x$3:
  java.util.Properties)org.apache.hadoop.hive.ql.plan.TableDesc
   and
 ()org.apache.hadoop.hive.ql.plan.TableDesc
cannot be applied to
 (Class[org.apache.hadoop.hive.serde2.Deserializer],
   Class[(some other)?0(in value tableDesc)(in value tableDesc)],
  Class[?0(in
   value tableDesc)(in   value tableDesc)], java.util.Properties)
   [ERROR]   val tableDesc = new TableDesc(
   [ERROR]   ^
   [WARNING] Class org.antlr.runtime.tree.CommonTree not found -
 continuing
   with a stub.
   [WARNING] Class org.antlr.runtime.Token not found - continuing with a
  stub.
   [WARNING] Class org.antlr.runtime.tree.Tree not found - continuing
 with a
   stub.
   [ERROR]
while compiling:
  
 
 /homes/xx/spark/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala
   during phase: typer
library version: version 2.10.4
   compiler version: version 2.10.4
  
   The above shows incompatible changes between 0.12 and 0.13.1
   e.g. the first error corresponds to the following method
   in CommandProcessorFactory :
 public static CommandProcessor get(String[] cmd, HiveConf conf)
  
   Cheers
  
  
   On Mon, Jul 28, 2014 at 1:32 PM, Steve Nunez snu...@hortonworks.com
   wrote:
  
   So, do we have a short-term fix until Hive 0.14 comes out? Perhaps
  adding
    the hive-exec jar to the spark-project repo? It doesn't look like
   there's
   a release date schedule for 0.14.
  
  
  
   On 7/28/14, 10:50, Cheng Lian lian.cs@gmail.com wrote:
  
   Exactly, forgot to mention Hulu team also made changes to cope with
  those
    incompatibility issues, but they said that's relatively easy once the
   re-packaging work is done.
   
   
   On Tue, Jul 29, 2014 at 1:20 AM, Patrick Wendell pwend...@gmail.com
 
  
   wrote:
   
I've heard from Cloudera that there were hive internal changes
  between
0.12 and 0.13 that required code re-writing. Over time it might be
possible for us to integrate with hive using API's that are more
stable (this is the domain of Michael/Cheng/Yin more than me!). It
would be interesting to see what the Hulu folks did.
   
- Patrick
   
On Mon, Jul 28, 2014 at 10:16 AM, Cheng Lian 
 lian.cs@gmail.com
wrote:
  AFAIK, according to a recent talk, the Hulu team in China has built
 Spark
   SQL
 against Hive 0.13 (or 0.13.1?) successfully. Basically they also
 re-packaged Hive 0.13 as what the Spark team 

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Andrew Or
+1. Tested on standalone and YARN clusters.


2014-07-28 14:59 GMT-07:00 Tathagata Das tathagata.das1...@gmail.com:

 Let me add my vote as well.
 Did some basic tests by running simple projects with various Spark
 modules. Tested checksums.

 +1

 On Sun, Jul 27, 2014 at 4:52 PM, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  +1
 
  Tested this on Mac OS X.
 
  Matei
 
  On Jul 25, 2014, at 4:08 PM, Tathagata Das tathagata.das1...@gmail.com
 wrote:
 
  Please vote on releasing the following candidate as Apache Spark
 version 1.0.2.
 
  This release fixes a number of bugs in Spark 1.0.1.
  Some of the notable ones are
   - SPARK-2452: Known issue in Spark 1.0.1 caused by attempted fix for
  SPARK-1199. The fix was reverted for 1.0.2.
  - SPARK-2576: NoClassDefFoundError when executing Spark QL query on
  HDFS CSV file.
  The full list is at http://s.apache.org/9NJ
 
  The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f
 
  The release files, including signatures, digests, etc can be found at:
  http://people.apache.org/~tdas/spark-1.0.2-rc1/
 
  Release artifacts are signed with the following key:
  https://people.apache.org/keys/committer/tdas.asc
 
  The staging repository for this release can be found at:
  https://repository.apache.org/content/repositories/orgapachespark-1024/
 
  The documentation corresponding to this release can be found at:
  http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/
 
  Please vote on releasing this package as Apache Spark 1.0.2!
 
  The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
  a majority of at least 3 +1 PMC votes are cast.
  [ ] +1 Release this package as Apache Spark 1.0.2
  [ ] -1 Do not release this package because ...
 
  To learn more about Apache Spark, please see
  http://spark.apache.org/
 



Re: on shark, is tachyon less efficient than memory_only cache strategy ?

2014-07-28 Thread qingyang li
Hi Haoyuan, thanks for replying.


2014-07-21 16:29 GMT+08:00 Haoyuan Li haoyuan...@gmail.com:

 Qingyang,

 Aha. Got it.

 800 MB of data is pretty small. Loading from Tachyon does have a bit of
 extra overhead, but it has more benefit when the data size is larger. Also,
 if you store the table in Tachyon, you can have different Shark servers
 query the data at the same time. For more on the trade-offs, please refer
 to this page: http://tachyon-project.org/Running-Shark-on-Tachyon.html
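
 At the RDD level the same trade-off shows up as a storage-level choice. A
 sketch, assuming the Spark 1.x API where OFF_HEAP is Tachyon-backed, for
 some already-built RDD named data:

     import org.apache.spark.storage.StorageLevel

     data.persist(StorageLevel.MEMORY_ONLY)  // heap cache: fastest access,
                                             // tied to this app's executors
     // vs.
     // data.persist(StorageLevel.OFF_HEAP)  // Tachyon-backed: extra ser/de
     //                                      // cost, but shareable across
     //                                      // apps and survives restarts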

 Best,

 Haoyuan


 On Wed, Jul 16, 2014 at 12:06 AM, qingyang li liqingyang1...@gmail.com
 wrote:

  Let me describe my scenario:
  --
  I have a Spark cluster and a Tachyon cluster on 8 machines (24 cores,
  16 GB memory per machine). On Tachyon, I created a table containing 800 MB
  of data; when I run a SQL query on Shark it takes 2.43s, but when I create
  the same table in Spark memory, the same query takes 1.56s. So data on
  Tachyon costs more time than data in Spark memory. Both cases use 150 map
  tasks, 16-20 per node.
  I think the reason is that when the data is on Tachyon, Shark lets each
  Spark slave load data from the Tachyon slave on the same node.
  I have tried tuning the Shark and Tachyon configuration, but still cannot
  get the Tachyon case below 2.43s.
  Does anyone have some ideas?
 
  By the way, my Tachyon block size is 1 GB now and I want to reset it.
  Will setting tachyon.user.default.block.size.byte=8M work? If not, what
  does tachyon.user.default.block.size.byte mean?
 
  2014-07-14 13:13 GMT+08:00 qingyang li liqingyang1...@gmail.com:
 
   Shark, thanks for replying.
    Let me state my question again.
    --
    I create a table using: create table xxx1
    tblproperties(shark.cache=tachyon) as select * from xxx2
    When executing SQL (for example, select * from xxx1) in Shark,
    Shark reads the data into Shark's memory from Tachyon's memory.
    I think that if Shark always loads the data from Tachyon each time we
    execute SQL, it is less efficient.
    Could we use some cache policy (such as CacheAllPolicy, FIFOCachePolicy,
    or LRUCachePolicy) to cache the data and avoid reading it from Tachyon
    for each SQL query?
    --
   --
  
  
  
   2014-07-14 2:47 GMT+08:00 Haoyuan Li haoyuan...@gmail.com:
  
   Qingyang,
  
   Are you asking Spark or Shark (The first email was Shark, the last
  email
   was Spark.)?
  
   Best,
  
   Haoyuan
  
  
   On Wed, Jul 9, 2014 at 7:40 PM, qingyang li liqingyang1...@gmail.com
 
   wrote:
  
 Could I set some cache policy to let Spark load data from Tachyon only
 once for all SQL queries, for example CacheAllPolicy, FIFOCachePolicy, or
 LRUCachePolicy? I have tried those three policies, but they did not help.
 I think that if Spark always loads the data for each SQL query, it will
 hurt query speed; it will take more time than the case where the data is
 managed by Spark itself.
   
   
   
   
2014-07-09 1:19 GMT+08:00 Haoyuan Li haoyuan...@gmail.com:
   
  Yes. For Shark, the two modes shark.cache=tachyon and shark.cache=memory
  have the same ser/de overhead. In Tachyon mode, Shark loads data from
  outside of the process, with the following benefits:


 - In-memory data sharing across multiple Shark instances (i.e. stronger
   isolation)
 - Instant recovery of in-memory tables
 - Reduced heap size = faster GC in Shark
 - If the table is larger than the memory size, only the hot columns will
   be cached in memory

 from http://tachyon-project.org/master/Running-Shark-on-Tachyon.html
 and https://github.com/amplab/shark/wiki/Running-Shark-with-Tachyon

 Haoyuan


 On Tue, Jul 8, 2014 at 9:58 AM, Aaron Davidson 
 ilike...@gmail.com
wrote:

  Shark's in-memory format is already serialized (it's compressed
  and
  column-based).
 
 
  On Tue, Jul 8, 2014 at 9:50 AM, Mridul Muralidharan 
   mri...@gmail.com
  wrote:
 
   You are ignoring serde costs :-)
  
   - Mridul
  
   On Tue, Jul 8, 2014 at 8:48 PM, Aaron Davidson 
   ilike...@gmail.com
  wrote:
Tachyon should only be marginally less performant than memory_only,
because we mmap the data from Tachyon's ramdisk. We do not have to, say,
transfer the data over a pipe from Tachyon; we can directly read from the
buffers in the same way that Shark reads from its in-memory columnar
format.
   
   
   
On Tue, Jul 8, 2014 at 1:18 AM, qingyang li 
 liqingyang1...@gmail.com
wrote:
   
 Hi, when I create a table, I can specify the cache strategy using
 shark.cache,
i think 

Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Mubarak Seyed
+1 (non-binding)

Tested this on Mac OS X.


On Mon, Jul 28, 2014 at 6:52 PM, Andrew Or and...@databricks.com wrote:

 +1 Tested on standalone and yarn clusters


 2014-07-28 14:59 GMT-07:00 Tathagata Das tathagata.das1...@gmail.com:

  Let me add my vote as well.
  Did some basic tests by running simple projects with various Spark
  modules. Tested checksums.
 
  +1
 
  On Sun, Jul 27, 2014 at 4:52 PM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
   +1
  
   Tested this on Mac OS X.
  
   Matei
  
   On Jul 25, 2014, at 4:08 PM, Tathagata Das 
 tathagata.das1...@gmail.com
  wrote:
  
   Please vote on releasing the following candidate as Apache Spark
  version 1.0.2.
  
   This release fixes a number of bugs in Spark 1.0.1.
   Some of the notable ones are
    - SPARK-2452: Known issue in Spark 1.0.1 caused by attempted fix for
   SPARK-1199. The fix was reverted for 1.0.2.
   - SPARK-2576: NoClassDefFoundError when executing Spark QL query on
   HDFS CSV file.
   The full list is at http://s.apache.org/9NJ
  
   The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
  
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f
  
   The release files, including signatures, digests, etc can be found at:
   http://people.apache.org/~tdas/spark-1.0.2-rc1/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/tdas.asc
  
   The staging repository for this release can be found at:
  
 https://repository.apache.org/content/repositories/orgapachespark-1024/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/
  
   Please vote on releasing this package as Apache Spark 1.0.2!
  
   The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
   a majority of at least 3 +1 PMC votes are cast.
   [ ] +1 Release this package as Apache Spark 1.0.2
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
  
 



Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Xiangrui Meng
+1

Tested basic spark-shell and pyspark operations and MLlib examples on a Mac.

On Mon, Jul 28, 2014 at 8:29 PM, Mubarak Seyed spark.devu...@gmail.com wrote:
 +1 (non-binding)

 Tested this on Mac OS X.


 On Mon, Jul 28, 2014 at 6:52 PM, Andrew Or and...@databricks.com wrote:

 +1 Tested on standalone and yarn clusters


 2014-07-28 14:59 GMT-07:00 Tathagata Das tathagata.das1...@gmail.com:

  Let me add my vote as well.
  Did some basic tests by running simple projects with various Spark
  modules. Tested checksums.
 
  +1
 
  On Sun, Jul 27, 2014 at 4:52 PM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
   +1
  
   Tested this on Mac OS X.
  
   Matei
  
   On Jul 25, 2014, at 4:08 PM, Tathagata Das 
 tathagata.das1...@gmail.com
  wrote:
  
   Please vote on releasing the following candidate as Apache Spark
  version 1.0.2.
  
   This release fixes a number of bugs in Spark 1.0.1.
   Some of the notable ones are
    - SPARK-2452: Known issue in Spark 1.0.1 caused by attempted fix for
   SPARK-1199. The fix was reverted for 1.0.2.
   - SPARK-2576: NoClassDefFoundError when executing Spark QL query on
   HDFS CSV file.
   The full list is at http://s.apache.org/9NJ
  
   The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
  
 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f
  
   The release files, including signatures, digests, etc can be found at:
   http://people.apache.org/~tdas/spark-1.0.2-rc1/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/tdas.asc
  
   The staging repository for this release can be found at:
  
 https://repository.apache.org/content/repositories/orgapachespark-1024/
  
   The documentation corresponding to this release can be found at:
   http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/
  
   Please vote on releasing this package as Apache Spark 1.0.2!
  
   The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
   a majority of at least 3 +1 PMC votes are cast.
   [ ] +1 Release this package as Apache Spark 1.0.2
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
  
 



Re: [VOTE] Release Apache Spark 1.0.2 (RC1)

2014-07-28 Thread Henry Saputra
NOTICE and LICENSE files look good
Hashes and sigs look good
No executable in the source distribution
Compiled source and ran standalone

+1

- Henry

On Fri, Jul 25, 2014 at 4:08 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.0.2.

 This release fixes a number of bugs in Spark 1.0.1.
 Some of the notable ones are
  - SPARK-2452: Known issue in Spark 1.0.1 caused by attempted fix for
 SPARK-1199. The fix was reverted for 1.0.2.
 - SPARK-2576: NoClassDefFoundError when executing Spark QL query on
 HDFS CSV file.
 The full list is at http://s.apache.org/9NJ

 The tag to be voted on is v1.0.2-rc1 (commit 8fb6f00e):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=8fb6f00e195fb258f3f70f04756e07c259a2351f

 The release files, including signatures, digests, etc can be found at:
 http://people.apache.org/~tdas/spark-1.0.2-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/tdas.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1024/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~tdas/spark-1.0.2-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.0.2!

 The vote is open until Tuesday, July 29, at 23:00 UTC and passes if
 a majority of at least 3 +1 PMC votes are cast.
 [ ] +1 Release this package as Apache Spark 1.0.2
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/


Re: Github mirroring is running behind

2014-07-28 Thread Reynold Xin
Hi devs,

I don't know if this is going to help, but if you can watch & vote on the
ticket, it might help ASF INFRA prioritize and triage it faster:
https://issues.apache.org/jira/browse/INFRA-8116

Please do. Thanks!



On Mon, Jul 28, 2014 at 5:41 PM, Patrick Wendell pwend...@gmail.com wrote:

 https://issues.apache.org/jira/browse/INFRA-8116

 Just a heads up, the github mirroring is running behind. You can
 follow that JIRA to keep up to date on the fix.

 In the meantime you can use the Apache git repo itself:

 https://git-wip-us.apache.org/repos/asf/spark.git

 Some people have reported issues checking out Apache git as well, but
 it might work.

 - Patrick