Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Sean Owen
+1 I tested the source and Hadoop 2.4 release. Checksums and
signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
fail any more than usual.

FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
project and have encountered no problems.


I notice that the 1.1.0 release removes the CDH4-specific build, but
adds two MapR-specific builds. Compare with
https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
commented on the commit:
https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc

I'm in favor of removing all vendor-specific builds. This change
*looks* a bit funny as there was no JIRA (?) and appears to swap one
vendor for another. Of course there's nothing untoward going on, but
what was the reasoning? It's best avoided, and MapR already
distributes Spark just fine, no?

This is a gray area with ASF projects. I mention it as well because it
came up with Apache Flink recently
(http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
Another vendor rightly noted this could look like favoritism. They
changed to remove vendor releases.

On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.1.0!

 The tag to be voted on is v1.1.0-rc2 (commit 711aebb3):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1029/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.1.0!

 The vote is open until Monday, September 01, at 03:11 UTC and passes if
 a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.1.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == Regressions fixed since RC1 ==
 LZ4 compression issue: https://issues.apache.org/jira/browse/SPARK-3277

 == What justifies a -1 vote for this release? ==
 This vote is happening very late into the QA period compared with
 previous votes, so -1 votes should only occur for significant
 regressions from 1.0.2. Bugs already present in 1.0.X will not block
 this release.

 == What default changes should I be aware of? ==
 1. The default value of spark.io.compression.codec is now snappy
 -- Old behavior can be restored by switching to lzf

 2. PySpark now performs external spilling during aggregations.
 -- Old behavior can be restored by setting spark.shuffle.spill to false.
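 (A minimal sketch of restoring both pre-1.1 defaults from application code, assuming
 the values are set on SparkConf before the context is created; the keys and values
 are the ones named above.)

   import org.apache.spark.{SparkConf, SparkContext}

   // Sketch only: revert the two 1.1.0 default changes listed above.
   val conf = new SparkConf()
     .setAppName("pre-1.1-defaults")
     .set("spark.io.compression.codec", "lzf")  // the 1.1.0 default is snappy
     .set("spark.shuffle.spill", "false")       // turn external spilling back off

   val sc = new SparkContext(conf)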






Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
Hey Sean,

The reason there are no longer CDH-specific builds is that all newer
versions of CDH and HDP work with builds for the upstream Hadoop
projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and
the Hadoop-without-Hive (also 2.4) build.

For MapR - we can't officially post those artifacts on ASF web space
when we make the final release; we can only link to them as being
hosted by MapR specifically, since they use non-compatible licenses.
However, I felt that providing these during a testing period was
alright, with the goal of increasing test coverage. I couldn't find
any policy against posting these on personal web space during RC
voting. However, we can remove them if there is one.

Dropping CDH4 was more because it is now pretty old, but we can add it
back if people want. The binary packaging is a slightly separate
question from release votes, so I can always add more binary packages
whenever. And on this, my main concern is covering the most popular
Hadoop versions to lower the bar for users to build and test Spark.

- Patrick

On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com wrote:
 +1 I tested the source and Hadoop 2.4 release. Checksums and
 signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
 fail any more than usual.

 FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
 project and have encountered no problems.


 I notice that the 1.1.0 release removes the CDH4-specific build, but
 adds two MapR-specific builds. Compare with
 https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
 commented on the commit:
 https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc

 I'm in favor of removing all vendor-specific builds. This change
 *looks* a bit funny as there was no JIRA (?) and appears to swap one
 vendor for another. Of course there's nothing untoward going on, but
 what was the reasoning? It's best avoided, and MapR already
 distributes Spark just fine, no?

 This is a gray area with ASF projects. I mention it as well because it
 came up with Apache Flink recently
 (http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
 Another vendor rightly noted this could look like favoritism. They
 changed to remove vendor releases.

 On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.1.0!

 The tag to be voted on is v1.1.0-rc2 (commit 711aebb3):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1029/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.1.0!

 The vote is open until Monday, September 01, at 03:11 UTC and passes if
 a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.1.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == Regressions fixed since RC1 ==
 LZ4 compression issue: https://issues.apache.org/jira/browse/SPARK-3277

 == What justifies a -1 vote for this release? ==
 This vote is happening very late into the QA period compared with
 previous votes, so -1 votes should only occur for significant
 regressions from 1.0.2. Bugs already present in 1.0.X will not block
 this release.

 == What default changes should I be aware of? ==
 1. The default value of spark.io.compression.codec is now snappy
 -- Old behavior can be restored by switching to lzf

 2. PySpark now performs external spilling during aggregations.
 -- Old behavior can be restored by setting spark.shuffle.spill to false.






Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Sean Owen
(Copying my reply since I don't know if it goes to the mailing list)

Great, thanks for explaining the reasoning. You're saying these aren't
going into the final release? I think that moots any issue surrounding
distributing them then.

This is all I know of from the ASF:
https://community.apache.org/projectIndependence.html I don't read it
as expressly forbidding this kind of thing although you can see how it
bumps up against the spirit. There's not a bright line -- what about
Tomcat providing binaries compiled for Windows, for example? Does that
favor an OS vendor?

From this technical ASF perspective only the releases matter -- do
what you want with snapshots and RCs. The only issue there is maybe
releasing something different than was in the RC; is that at all
confusing? Just needs a note.

I think this theoretical issue doesn't exist if these binaries aren't
released, so I see no reason to not proceed.

The rest is a different question about whether you want to spend time
maintaining this profile and candidate. The vendor already manages
their build I think, and -- I don't know -- may even prefer not to
have a different special build floating around. There's also the
theoretical argument that this turns off other vendors from adopting
Spark if it's perceived to be too connected to other vendors. I'd like
to maximize Spark's distribution and there's some argument you do this
by not making vendor profiles. But as I say a different question to
just think about over time...

(oh and PS for my part I think it's a good thing that CDH4 binaries
were removed. I wasn't arguing for resurrecting them)

On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Sean,

 The reason there are no longer CDH-specific builds is that all newer
 versions of CDH and HDP work with builds for the upstream Hadoop
 projects. I dropped CDH4 in favor of a  newer Hadoop version (2.4) and
 the Hadoop-without-Hive (also 2.4) build.

 For MapR - we can't officially post those artifacts on ASF web space
 when we make the final release, we can only link to them as being
 hosted by MapR specifically since they use non-compatible licenses.
 However, I felt that providing these during a testing period was
 alright, with the goal of increasing test coverage. I couldn't find
 any policy against posting these on personal web space during RC
 voting. However, we can remove them if there is one.

 Dropping CDH4 was more because it is now pretty old, but we can add it
 back if people want. The binary packaging is a slightly separate
 question from release votes, so I can always add more binary packages
 whenever. And on this, my main concern is covering the most popular
 Hadoop versions to lower the bar for users to build and test Spark.

 - Patrick

 On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com wrote:
 +1 I tested the source and Hadoop 2.4 release. Checksums and
 signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
 fail any more than usual.

 FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
 project and have encountered no problems.


 I notice that the 1.1.0 release removes the CDH4-specific build, but
 adds two MapR-specific builds. Compare with
 https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
 commented on the commit:
 https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc

 I'm in favor of removing all vendor-specific builds. This change
 *looks* a bit funny as there was no JIRA (?) and appears to swap one
 vendor for another. Of course there's nothing untoward going on, but
 what was the reasoning? It's best avoided, and MapR already
 distributes Spark just fine, no?

 This is a gray area with ASF projects. I mention it as well because it
 came up with Apache Flink recently
 (http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
 Another vendor rightly noted this could look like favoritism. They
 changed to remove vendor releases.

 On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.1.0!

 The tag to be voted on is v1.1.0-rc2 (commit 711aebb3):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1029/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.1.0-rc2-docs/

 Please vote on releasing this package as Apache Spark 1.1.0!

 The vote is open 

Re: [Spark SQL] off-heap columnar store

2014-08-29 Thread Evan Chan

 The reason I'm asking about the columnar compressed format is that
 there are some problems for which Parquet is not practical.


 Can you elaborate?

Sure.

- Organization or co has no Hadoop, but significant investment in some
other NoSQL store.
- Need to efficiently add a new column to existing data
- Need to mark some existing rows as deleted or replace small bits of
existing data

For these use cases, it would be much more efficient and practical if
we didn't have to pull the data out of the original datastore and
convert it to Parquet first. Doing so adds significant latency and
causes Ops headaches in having to maintain HDFS. It would be great
to be able to load data directly into the columnar format, into the
InMemoryColumnarCache.
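
For reference, a minimal sketch of what the in-memory columnar path looks like
today (illustrative names only; in 1.1 the columnar format is populated by
caching a registered table rather than being loaded directly from an external
store, and an existing SparkContext sc is assumed):

  import org.apache.spark.sql.SQLContext

  case class Record(id: Long, value: String)

  val sqlContext = new SQLContext(sc)
  import sqlContext._   // brings in the RDD-to-SchemaRDD conversion

  // Pretend this RDD came from the external NoSQL store mentioned above.
  val rows = sc.parallelize(Seq(Record(1L, "a"), Record(2L, "b")))
  rows.registerTempTable("records")

  // Materializes the data in the compressed in-memory columnar format,
  // so later queries scan columns directly instead of re-reading the source.
  sqlContext.cacheTable("records")
  sqlContext.sql("SELECT COUNT(*) FROM records").collect()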




Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Matei Zaharia
Personally I'd actually consider putting CDH4 back if there are still users on 
it. It's always better to be inclusive, and the convenience of a one-click 
download is high. Do we have a sense on what % of CDH users still use CDH4?

Matei

On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com) wrote:

(Copying my reply since I don't know if it goes to the mailing list) 

Great, thanks for explaining the reasoning. You're saying these aren't 
going into the final release? I think that moots any issue surrounding 
distributing them then. 

This is all I know of from the ASF: 
https://community.apache.org/projectIndependence.html I don't read it 
as expressly forbidding this kind of thing although you can see how it 
bumps up against the spirit. There's not a bright line -- what about 
Tomcat providing binaries compiled for Windows for example? does that 
favor an OS vendor? 

From this technical ASF perspective only the releases matter -- do 
what you want with snapshots and RCs. The only issue there is maybe 
releasing something different than was in the RC; is that at all 
confusing? Just needs a note. 

I think this theoretical issue doesn't exist if these binaries aren't 
released, so I see no reason to not proceed. 

The rest is a different question about whether you want to spend time 
maintaining this profile and candidate. The vendor already manages 
their build I think and -- and I don't know -- may even prefer not to 
have a different special build floating around. There's also the 
theoretical argument that this turns off other vendors from adopting 
Spark if it's perceived to be too connected to other vendors. I'd like 
to maximize Spark's distribution and there's some argument you do this 
by not making vendor profiles. But as I say a different question to 
just think about over time... 

(oh and PS for my part I think it's a good thing that CDH4 binaries 
were removed. I wasn't arguing for resurrecting them) 

On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com wrote: 
 Hey Sean, 
 
 The reason there are no longer CDH-specific builds is that all newer 
 versions of CDH and HDP work with builds for the upstream Hadoop 
 projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and 
 the Hadoop-without-Hive (also 2.4) build. 
 
 For MapR - we can't officially post those artifacts on ASF web space 
 when we make the final release, we can only link to them as being 
 hosted by MapR specifically since they use non-compatible licenses. 
 However, I felt that providing these during a testing period was 
 alright, with the goal of increasing test coverage. I couldn't find 
 any policy against posting these on personal web space during RC 
 voting. However, we can remove them if there is one. 
 
 Dropping CDH4 was more because it is now pretty old, but we can add it 
 back if people want. The binary packaging is a slightly separate 
 question from release votes, so I can always add more binary packages 
 whenever. And on this, my main concern is covering the most popular 
 Hadoop versions to lower the bar for users to build and test Spark. 
 
 - Patrick 
 
 On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com wrote: 
 +1 I tested the source and Hadoop 2.4 release. Checksums and 
 signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't 
 fail any more than usual. 
 
 FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another 
 project and have encountered no problems. 
 
 
 I notice that the 1.1.0 release removes the CDH4-specific build, but 
 adds two MapR-specific builds. Compare with 
 https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I 
 commented on the commit: 
 https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
  
 
 I'm in favor of removing all vendor-specific builds. This change 
 *looks* a bit funny as there was no JIRA (?) and appears to swap one 
 vendor for another. Of course there's nothing untoward going on, but 
 what was the reasoning? It's best avoided, and MapR already 
 distributes Spark just fine, no? 
 
 This is a gray area with ASF projects. I mention it as well because it 
 came up with Apache Flink recently 
 (http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
  
 Another vendor rightly noted this could look like favoritism. They 
 changed to remove vendor releases. 
 
 On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell pwend...@gmail.com wrote: 
 Please vote on releasing the following candidate as Apache Spark version 
 1.1.0! 
 
 The tag to be voted on is v1.1.0-rc2 (commit 711aebb3): 
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=711aebb329ca28046396af1e34395a0df92b5327
  
 
 The release files, including signatures, digests, etc. can be found at: 
 http://people.apache.org/~pwendell/spark-1.1.0-rc2/ 
 
 Release artifacts are signed with the 

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
Yeah, we can't/won't post MapR binaries on the ASF web space for the
release. However, I have been linking to them (at their request) with
a clear identifier that it is an incompatible license and a 3rd party
build.

The only vendor specific build property we provide is compatibility
with different Hadoop FileSystem clients, since unfortunately there is
not a universally adopted client/server protocol. I think our goal has
always been to provide a path for using ASF Spark with
vendor-specific filesystems. Some vendors perform backports or
enhancements... and this of course we would never want to manage in
the upstream project.

In terms of vendor support for this approach - In the early days
Cloudera asked us to add CDH4 repository and more recently Pivotal and
MapR also asked us to allow linking against their hadoop-client
libraries. So we've added these based on direct requests from vendors.
Given the ubiquity of the Hadoop FileSystem API, it's hard for me to
imagine ruffling feathers by supporting this. But if we get feedback
in that direction over time we can of course consider a different
approach.

- Patrick



On Thu, Aug 28, 2014 at 11:30 PM, Sean Owen so...@cloudera.com wrote:
 (Copying my reply since I don't know if it goes to the mailing list)

 Great, thanks for explaining the reasoning. You're saying these aren't
 going into the final release? I think that moots any issue surrounding
 distributing them then.

 This is all I know of from the ASF:
 https://community.apache.org/projectIndependence.html I don't read it
 as expressly forbidding this kind of thing although you can see how it
 bumps up against the spirit. There's not a bright line -- what about
 Tomcat providing binaries compiled for Windows for example? does that
 favor an OS vendor?

 From this technical ASF perspective only the releases matter -- do
 what you want with snapshots and RCs. The only issue there is maybe
 releasing something different than was in the RC; is that at all
 confusing? Just needs a note.

 I think this theoretical issue doesn't exist if these binaries aren't
 released, so I see no reason to not proceed.

 The rest is a different question about whether you want to spend time
 maintaining this profile and candidate. The vendor already manages
 their build I think and -- and I don't know -- may even prefer not to
 have a different special build floating around. There's also the
 theoretical argument that this turns off other vendors from adopting
 Spark if it's perceived to be too connected to other vendors. I'd like
 to maximize Spark's distribution and there's some argument you do this
 by not making vendor profiles. But as I say a different question to
 just think about over time...

 (oh and PS for my part I think it's a good thing that CDH4 binaries
 were removed. I wasn't arguing for resurrecting them)

 On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Sean,

 The reason there are no longer CDH-specific builds is that all newer
 versions of CDH and HDP work with builds for the upstream Hadoop
 projects. I dropped CDH4 in favor of a  newer Hadoop version (2.4) and
 the Hadoop-without-Hive (also 2.4) build.

 For MapR - we can't officially post those artifacts on ASF web space
 when we make the final release, we can only link to them as being
 hosted by MapR specifically since they use non-compatible licenses.
 However, I felt that providing these during a testing period was
 alright, with the goal of increasing test coverage. I couldn't find
 any policy against posting these on personal web space during RC
 voting. However, we can remove them if there is one.

 Dropping CDH4 was more because it is now pretty old, but we can add it
 back if people want. The binary packaging is a slightly separate
 question from release votes, so I can always add more binary packages
 whenever. And on this, my main concern is covering the most popular
 Hadoop versions to lower the bar for users to build and test Spark.

 - Patrick

 On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com wrote:
 +1 I tested the source and Hadoop 2.4 release. Checksums and
 signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
 fail any more than usual.

 FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
 project and have encountered no problems.


 I notice that the 1.1.0 release removes the CDH4-specific build, but
 adds two MapR-specific builds. Compare with
 https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
 commented on the commit:
 https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc

 I'm in favor of removing all vendor-specific builds. This change
 *looks* a bit funny as there was no JIRA (?) and appears to swap one
 vendor for another. Of course there's nothing untoward going on, but
 what was the reasoning? It's best avoided, and MapR already
 distributes Spark just fine, no?

 This is a gray area with ASF projects. I 

RE: Working Formula for Hive 0.13?

2014-08-29 Thread Zhan Zhang
I have a preliminary patch against Spark 1.0.2, which is attached to SPARK-2706.
Now I am working on supporting both hive-0.12 and hive-0.13.1 in a
non-intrusive way (not breaking any existing hive-0.12 support when introducing
support for the new version). I will attach a proposal for solving the
multi-version support issue to SPARK-2706 soon.

Thanks.

Zhan Zhang



--




Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Sean Owen
On Fri, Aug 29, 2014 at 7:42 AM, Patrick Wendell pwend...@gmail.com wrote:
 In terms of vendor support for this approach - In the early days
 Cloudera asked us to add CDH4 repository and more recently Pivotal and
 MapR also asked us to allow linking against their hadoop-client
 libraries. So we've added these based on direct requests from vendors.
 Given the ubiquity of the Hadoop FileSystem API, it's hard for me to
 imagine ruffling feathers by supporting this. But if we get feedback
 in that direction over time we can of course consider a different
 approach.

By this, you mean that it's easy to control the Hadoop version in the
build and set it to some other vendor-specific release? Yes that seems
ideal. Making the build flexible, and adding the repository references
to pom.xml is part of enabling that -- to me, no question that's good.
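
(For reference, the same idea from an application's build -- a hedged sbt
sketch; the coordinates and repository URL are illustrative assumptions, not
something taken from this thread.)

  // build.sbt -- illustrative only
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.1.0",
    // swap in the vendor's hadoop-client instead of the plain Apache one
    "org.apache.hadoop" % "hadoop-client" % "2.3.0-cdh5.1.0"
  )

  // vendor artifacts live in the vendor's own repository, not Maven Central
  resolvers += "Cloudera repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"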

So you can always roll your own build for your cluster, if you need
to. I understand the role of the cdh4 / mapr3 / mapr4 binaries as just
a convenience.

But it's a convenience for people who...
- are installing Spark on a cluster (i.e. not an end user)
- that doesn't have it in their distro already
- whose distro isn't compatible with a plain vanilla Hadoop distro

That can't be many. CDH4.6+ is most of the installed CDH base and it
already has Spark. I thought MapR already had Spark built in. The
audience seems small enough, and the convenience relatively small
enough (is it hard to run the distribution script?) that it caused me
to ask whether it was worth bothering providing these, especially given
the possible ASF sensitivity.

I say crack on; you get my point.




Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Andrew Ash
FWIW we use CDH4 extensively and would very much appreciate having a
prebuilt version of Spark for it.

We're doing a CDH 4.4 to 4.7 upgrade across all the clusters now and have
plans for a 5.x transition after that.
On Aug 28, 2014 11:57 PM, Sean Owen so...@cloudera.com wrote:

 On Fri, Aug 29, 2014 at 7:42 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  In terms of vendor support for this approach - In the early days
  Cloudera asked us to add CDH4 repository and more recently Pivotal and
  MapR also asked us to allow linking against their hadoop-client
  libraries. So we've added these based on direct requests from vendors.
  Given the ubiquity of the Hadoop FileSystem API, it's hard for me to
  imagine ruffling feathers by supporting this. But if we get feedback
  in that direction over time we can of course consider a different
  approach.

 By this, you mean that it's easy to control the Hadoop version in the
 build and set it to some other vendor-specific release? Yes that seems
 ideal. Making the build flexible, and adding the repository references
 to pom.xml is part of enabling that -- to me, no question that's good.

 So you can always roll your own build for your cluster, if you need
 to. I understand the role of the cdh4 / mapr3 / mapr4 binaries as just
 a convenience.

 But it's a convenience for people who...
 - are installing Spark on a cluster (i.e. not an end user)
 - that doesn't have it in their distro already
 - whose distro isn't compatible with a plain vanilla Hadoop distro

 That can't be many. CDH4.6+ is most of the installed CDH base and it
 already has Spark. I thought MapR already had Spark built in. The
 audience seems small enough, and the convenience relatively small
 enough (is it hard to run the distribution script?) that it caused me
 to ask whether it was worth bothering providing these, especially give
 the possible ASF sensitivity.

 I say crack on; you get my point.





Running Spark On Yarn without Spark-Submit

2014-08-29 Thread Archit Thakur
Hi,

My requirement is to run Spark on Yarn without using the script
spark-submit.

I have a servlet and a Tomcat server. As and when a request comes in, it creates
a new SparkContext and keeps it alive for further requests. I am setting my
master in SparkConf

as sparkConf.setMaster("yarn-cluster")

but the request is stuck indefinitely.

This works when I set
sparkConf.setMaster("yarn-client")

I am not sure why it is not launching the job in yarn-cluster mode.
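
For illustration, the working (yarn-client) setup is roughly shaped like the
sketch below; the object and names are invented here, not taken from the actual
code:

  import org.apache.spark.{SparkConf, SparkContext}

  // one long-lived context, created lazily on the first request and then reused
  object SharedSparkContext {
    lazy val sc: SparkContext = {
      val conf = new SparkConf()
        .setAppName("servlet-backed-spark")
        .setMaster("yarn-client")   // the driver lives inside the Tomcat JVM
      new SparkContext(conf)
    }
  }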

Any thoughts?

Thanks and Regards,
Archit Thakur.


Re: Running Spark On Yarn without Spark-Submit

2014-08-29 Thread Archit Thakur
including u...@spark.apache.org.


On Fri, Aug 29, 2014 at 2:03 PM, Archit Thakur archit279tha...@gmail.com
wrote:

 Hi,

 My requirement is to run Spark on Yarn without using the script
 spark-submit.

 I have a servlet and a tomcat server. As and when request comes, it
 creates a new SC and keeps it alive for the further requests, I ma setting
 my master in sparkConf

 as sparkConf.setMaster(yarn-cluster)

 but the request is stuck indefinitely.

 This works when I set
 sparkConf.setMaster(yarn-client)

 I am not sure, why is it not launching job in yarn-cluster mode.

 Any thoughts?

 Thanks and Regards,
 Archit Thakur.






Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Koert Kuipers
i suspect there are more cdh4 than cdh5 clusters. most people plan to move
to cdh5 within say 6 months.


On Fri, Aug 29, 2014 at 3:57 AM, Andrew Ash and...@andrewash.com wrote:

 FWIW we use CDH4 extensively and would very much appreciate having a
 prebuilt version of Spark for it.

 We're doing a CDH 4.4 to 4.7 upgrade across all the clusters now and have
 plans for a 5.x transition after that.
 On Aug 28, 2014 11:57 PM, Sean Owen so...@cloudera.com wrote:

  On Fri, Aug 29, 2014 at 7:42 AM, Patrick Wendell pwend...@gmail.com
  wrote:
   In terms of vendor support for this approach - In the early days
   Cloudera asked us to add CDH4 repository and more recently Pivotal and
   MapR also asked us to allow linking against their hadoop-client
   libraries. So we've added these based on direct requests from vendors.
   Given the ubiquity of the Hadoop FileSystem API, it's hard for me to
   imagine ruffling feathers by supporting this. But if we get feedback
   in that direction over time we can of course consider a different
   approach.
 
  By this, you mean that it's easy to control the Hadoop version in the
  build and set it to some other vendor-specific release? Yes that seems
  ideal. Making the build flexible, and adding the repository references
  to pom.xml is part of enabling that -- to me, no question that's good.
 
  So you can always roll your own build for your cluster, if you need
  to. I understand the role of the cdh4 / mapr3 / mapr4 binaries as just
  a convenience.
 
  But it's a convenience for people who...
  - are installing Spark on a cluster (i.e. not an end user)
  - that doesn't have it in their distro already
  - whose distro isn't compatible with a plain vanilla Hadoop distro
 
  That can't be many. CDH4.6+ is most of the installed CDH base and it
  already has Spark. I thought MapR already had Spark built in. The
  audience seems small enough, and the convenience relatively small
  enough (is it hard to run the distribution script?) that it caused me
  to ask whether it was worth bothering providing these, especially give
  the possible ASF sensitivity.
 
  I say crack on; you get my point.
 
 
 



Re: Running Spark On Yarn without Spark-Submit

2014-08-29 Thread Chester @work
Archit,
 We are using yarn-cluster mode and calling Spark via the Client class
directly from the servlet server. It works fine.
 As for establishing a communication channel for further requests:
it should be possible with yarn-client, but not with yarn-cluster. In
yarn-client mode the Spark driver is outside the YARN cluster, so it can issue
more commands. In yarn-cluster mode, all programs, including the Spark driver,
run inside the YARN cluster, and there is no communication channel with the
client until the job finishes.

If your job is to keep the SparkContext alive and wait for other commands, then
it will just wait forever.

I am actually working on some improvements to this, experimenting in our
product; I will create PRs when I feel comfortable with the solution:

1) Change the Client API to allow the caller to know the YARN app resource
capacity before passing arguments
2) Add a YarnApplicationListener to the Client
3) Provide a communication channel between the application and the Spark YARN
client in the cluster.

#1 is not directly related to the communication channel discussed here.

#2 lets the application receive application life-cycle callbacks (app start,
end, in progress, failure, etc.) along with the YARN resource allocations.

I changed #1 and #2 in a forked Spark, and it has worked well on CDH5; I am
testing against 2.0.5-alpha as well.

For #3 I did not change Spark itself, as I am not sure of the best approach
yet. I put the change in the application runner which launches the Spark YARN
client in the cluster.

The runner in the YARN cluster gets the application's host and port information
from the passed configuration (args), then creates an Akka actor using the
SparkContext actor system and sends a handshake message to the caller outside
the cluster; after that you have a two-way communication channel.

With this approach, I can send Spark listener callbacks to the app, error
messages, app-level messages, etc.

The runner inside the cluster can also receive requests from outside the
cluster, such as stop.

We are not sure the Akka approach is the best, so I am still experimenting with
it. So far it does what we want.
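
To make the handshake concrete, a minimal illustrative Akka sketch of that
pattern (all names and messages are invented for this example; it is not the
actual implementation):

  import akka.actor.{Actor, ActorSystem, Props}

  // messages exchanged between the in-cluster runner and the outside caller
  case class Handshake(driverHost: String, driverPort: Int)
  case object StopRequest

  // runs inside the YARN cluster next to the driver; announces itself to the
  // caller whose actor path was passed in through the launch arguments
  class RunnerActor(callerPath: String) extends Actor {
    override def preStart(): Unit =
      context.actorSelection(callerPath) ! Handshake("driver-host", 4040)

    def receive = {
      case StopRequest =>
        // stop the SparkContext here, then shut the actor system down
        context.system.shutdown()
    }
  }

  // created from the runner's main(), using the driver's or a dedicated ActorSystem
  val system = ActorSystem("runner")
  val runner = system.actorOf(
    Props(classOf[RunnerActor], "akka.tcp://caller@client-host:2552/user/caller"),
    name = "runner")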

Hope this helps

Chester


Sent from my iPhone

 On Aug 29, 2014, at 2:36 AM, Archit Thakur archit279tha...@gmail.com wrote:
 
 including u...@spark.apache.org.
 
 
 On Fri, Aug 29, 2014 at 2:03 PM, Archit Thakur archit279tha...@gmail.com 
 wrote:
 Hi,
 
 My requirement is to run Spark on Yarn without using the script spark-submit.
 
 I have a servlet and a tomcat server. As and when request comes, it creates 
 a new SC and keeps it alive for the further requests, I ma setting my master 
 in sparkConf
 
 as sparkConf.setMaster(yarn-cluster)
 
 but the request is stuck indefinitely. 
 
 This works when I set
 sparkConf.setMaster(yarn-client)
 
 I am not sure, why is it not launching job in yarn-cluster mode.
 
 Any thoughts?
 
 Thanks and Regards,
 Archit Thakur. 
 


Re: emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-29 Thread shane knapp
reminder:   this is happening right now.  jenkins is currently in quiet
mode, and in ~30 minutes, will be briefly going down.


On Thu, Aug 28, 2014 at 1:03 PM, shane knapp skn...@berkeley.edu wrote:

 as with all software upgrades, sometimes things don't always work as
 expected.

 a recent change to stapler[1], to verbosely
 report NotExportableExceptions[2] is spamming our jenkins log file with
 stack traces, which is growing rather quickly (1.2G since 9am).  this has
 been reported to the jenkins jira[3], and a fix has been pushed and will be
 rolled out soon[4].

 this isn't affecting any builds, and jenkins is happily humming along.

 in the interim, so that we don't run out of disk space, i will be
 redirecting the jenkins logs tomorrow morning to /dev/null for the long
 weekend.

 once a real fix has been released, i will update any packages needed and
 redirect the logging back to the log file.

 other than a short downtime, this will have no user-facing impact.

 please let me know if you have any questions/concerns.

 thanks for your patience!

 shane the new guy  :)

 [1] -- https://wiki.jenkins-ci.org/display/JENKINS/Architecture
 [2] --
 https://github.com/stapler/stapler/commit/ed2cb8b04c1514377f3a8bfbd567f050a67c6e1c
 [3] --
 https://issues.jenkins-ci.org/browse/JENKINS-24458?focusedCommentId=209247
 [4] --
 https://github.com/stapler/stapler/commit/e2b39098ca1f61a58970b8a41a3ae79053cf30e3



Re: Jira tickets for starter tasks

2014-08-29 Thread Madhu
Cheng Lian-2 wrote
 You can just start the work :)

Given 100+ contributors, starting work without a JIRA issue assigned to you
could lead to duplication of effort by well meaning people that have no idea
they are working on the same issue. This does happen and I don't think it's
a good thing.

Just my $0.02



-
--
Madhu
https://www.linkedin.com/in/msiddalingaiah
--




Re: emergency jenkins restart, aug 29th, 730am-9am PDT -- plus a postmortem

2014-08-29 Thread shane knapp
this is done.


On Fri, Aug 29, 2014 at 7:32 AM, shane knapp skn...@berkeley.edu wrote:

 reminder:   this is happening right now.  jenkins is currently in quiet
 mode, and in ~30 minutes, will be briefly going down.


 On Thu, Aug 28, 2014 at 1:03 PM, shane knapp skn...@berkeley.edu wrote:

 as with all software upgrades, sometimes things don't always work as
 expected.

 a recent change to stapler[1], to verbosely
 report NotExportableExceptions[2] is spamming our jenkins log file with
 stack traces, which is growing rather quickly (1.2G since 9am).  this has
 been reported to the jenkins jira[3], and a fix has been pushed and will be
 rolled out soon[4].

 this isn't affecting any builds, and jenkins is happily humming along.

 in the interim, so that we don't run out of disk space, i will be
 redirecting the jenkins logs tommorow morning to /dev/null for the long
 weekend.

 once a real fix has been released, i will update any packages needed and
 redirect the logging back to the log file.

 other than a short downtime, this will have no user-facing impact.

 please let me know if you have any questions/concerns.

 thanks for your patience!

 shane the new guy  :)

 [1] -- https://wiki.jenkins-ci.org/display/JENKINS/Architecture
 [2] --
 https://github.com/stapler/stapler/commit/ed2cb8b04c1514377f3a8bfbd567f050a67c6e1c
 [3] --
 https://issues.jenkins-ci.org/browse/JENKINS-24458?focusedCommentId=209247
 [4] --
 https://github.com/stapler/stapler/commit/e2b39098ca1f61a58970b8a41a3ae79053cf30e3





Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Ye Xianjin
We just used CDH 4.7 for our production cluster. And I believe we won't use CDH 
5 in the next year.

Sent from my iPhone

 On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com wrote:
 
 Personally I'd actually consider putting CDH4 back if there are still users 
 on it. It's always better to be inclusive, and the convenience of a one-click 
 download is high. Do we have a sense on what % of CDH users still use CDH4?
 
 Matei
 
 On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com) wrote:
 
 (Copying my reply since I don't know if it goes to the mailing list) 
 
 Great, thanks for explaining the reasoning. You're saying these aren't 
 going into the final release? I think that moots any issue surrounding 
 distributing them then. 
 
 This is all I know of from the ASF: 
 https://community.apache.org/projectIndependence.html I don't read it 
 as expressly forbidding this kind of thing although you can see how it 
 bumps up against the spirit. There's not a bright line -- what about 
 Tomcat providing binaries compiled for Windows for example? does that 
 favor an OS vendor? 
 
 From this technical ASF perspective only the releases matter -- do 
 what you want with snapshots and RCs. The only issue there is maybe 
 releasing something different than was in the RC; is that at all 
 confusing? Just needs a note. 
 
 I think this theoretical issue doesn't exist if these binaries aren't 
 released, so I see no reason to not proceed. 
 
 The rest is a different question about whether you want to spend time 
 maintaining this profile and candidate. The vendor already manages 
 their build I think and -- and I don't know -- may even prefer not to 
 have a different special build floating around. There's also the 
 theoretical argument that this turns off other vendors from adopting 
 Spark if it's perceived to be too connected to other vendors. I'd like 
 to maximize Spark's distribution and there's some argument you do this 
 by not making vendor profiles. But as I say a different question to 
 just think about over time... 
 
 (oh and PS for my part I think it's a good thing that CDH4 binaries 
 were removed. I wasn't arguing for resurrecting them) 
 
 On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com wrote: 
 Hey Sean, 
 
 The reason there are no longer CDH-specific builds is that all newer 
 versions of CDH and HDP work with builds for the upstream Hadoop 
 projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and 
 the Hadoop-without-Hive (also 2.4) build. 
 
 For MapR - we can't officially post those artifacts on ASF web space 
 when we make the final release, we can only link to them as being 
 hosted by MapR specifically since they use non-compatible licenses. 
 However, I felt that providing these during a testing period was 
 alright, with the goal of increasing test coverage. I couldn't find 
 any policy against posting these on personal web space during RC 
 voting. However, we can remove them if there is one. 
 
 Dropping CDH4 was more because it is now pretty old, but we can add it 
 back if people want. The binary packaging is a slightly separate 
 question from release votes, so I can always add more binary packages 
 whenever. And on this, my main concern is covering the most popular 
 Hadoop versions to lower the bar for users to build and test Spark. 
 
 - Patrick 
 
 On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com wrote: 
 +1 I tested the source and Hadoop 2.4 release. Checksums and 
 signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't 
 fail any more than usual. 
 
 FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another 
 project and have encountered no problems. 
 
 
 I notice that the 1.1.0 release removes the CDH4-specific build, but 
 adds two MapR-specific builds. Compare with 
 https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I 
 commented on the commit: 
 https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
  
 
 I'm in favor of removing all vendor-specific builds. This change 
 *looks* a bit funny as there was no JIRA (?) and appears to swap one 
 vendor for another. Of course there's nothing untoward going on, but 
 what was the reasoning? It's best avoided, and MapR already 
 distributes Spark just fine, no? 
 
 This is a gray area with ASF projects. I mention it as well because it 
 came up with Apache Flink recently 
 (http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E)
 Another vendor rightly noted this could look like favoritism. They 
 changed to remove vendor releases. 
 
 On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell pwend...@gmail.com 
 wrote: 
 Please vote on releasing the following candidate as Apache Spark version 
 1.1.0! 
 
 The tag to be voted on is v1.1.0-rc2 (commit 711aebb3): 
 

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
Okay I'll plan to add cdh4 binary as well for the final release!

---
sent from my phone
On Aug 29, 2014 8:26 AM, Ye Xianjin advance...@gmail.com wrote:

 We just used CDH 4.7 for our production cluster. And I believe we won't
 use CDH 5 in the next year.

 Sent from my iPhone

  On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com wrote:
 
  Personally I'd actually consider putting CDH4 back if there are still
 users on it. It's always better to be inclusive, and the convenience of a
 one-click download is high. Do we have a sense on what % of CDH users still
 use CDH4?
 
  Matei
 
  On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com) wrote:
 
  (Copying my reply since I don't know if it goes to the mailing list)
 
  Great, thanks for explaining the reasoning. You're saying these aren't
  going into the final release? I think that moots any issue surrounding
  distributing them then.
 
  This is all I know of from the ASF:
  https://community.apache.org/projectIndependence.html I don't read it
  as expressly forbidding this kind of thing although you can see how it
  bumps up against the spirit. There's not a bright line -- what about
  Tomcat providing binaries compiled for Windows for example? does that
  favor an OS vendor?
 
  From this technical ASF perspective only the releases matter -- do
  what you want with snapshots and RCs. The only issue there is maybe
  releasing something different than was in the RC; is that at all
  confusing? Just needs a note.
 
  I think this theoretical issue doesn't exist if these binaries aren't
  released, so I see no reason to not proceed.
 
  The rest is a different question about whether you want to spend time
  maintaining this profile and candidate. The vendor already manages
  their build I think and -- and I don't know -- may even prefer not to
  have a different special build floating around. There's also the
  theoretical argument that this turns off other vendors from adopting
  Spark if it's perceived to be too connected to other vendors. I'd like
  to maximize Spark's distribution and there's some argument you do this
  by not making vendor profiles. But as I say a different question to
  just think about over time...
 
  (oh and PS for my part I think it's a good thing that CDH4 binaries
  were removed. I wasn't arguing for resurrecting them)
 
  On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  Hey Sean,
 
  The reason there are no longer CDH-specific builds is that all newer
  versions of CDH and HDP work with builds for the upstream Hadoop
  projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and
  the Hadoop-without-Hive (also 2.4) build.
 
  For MapR - we can't officially post those artifacts on ASF web space
  when we make the final release, we can only link to them as being
  hosted by MapR specifically since they use non-compatible licenses.
  However, I felt that providing these during a testing period was
  alright, with the goal of increasing test coverage. I couldn't find
  any policy against posting these on personal web space during RC
  voting. However, we can remove them if there is one.
 
  Dropping CDH4 was more because it is now pretty old, but we can add it
  back if people want. The binary packaging is a slightly separate
  question from release votes, so I can always add more binary packages
  whenever. And on this, my main concern is covering the most popular
  Hadoop versions to lower the bar for users to build and test Spark.
 
  - Patrick
 
  On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com
 wrote:
  +1 I tested the source and Hadoop 2.4 release. Checksums and
  signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
  fail any more than usual.
 
  FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
  project and have encountered no problems.
 
 
  I notice that the 1.1.0 release removes the CDH4-specific build, but
  adds two MapR-specific builds. Compare with
  https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
  commented on the commit:
 
 https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
 
  I'm in favor of removing all vendor-specific builds. This change
  *looks* a bit funny as there was no JIRA (?) and appears to swap one
  vendor for another. Of course there's nothing untoward going on, but
  what was the reasoning? It's best avoided, and MapR already
  distributes Spark just fine, no?
 
  This is a gray area with ASF projects. I mention it as well because it
  came up with Apache Flink recently
  (
 http://mail-archives.eu.apache.org/mod_mbox/incubator-flink-dev/201408.mbox/%3CCANC1h_u%3DN0YKFu3pDaEVYz5ZcQtjQnXEjQA2ReKmoS%2Bye7%3Do%3DA%40mail.gmail.com%3E
 )
  Another vendor rightly noted this could look like favoritism. They
  changed to remove vendor releases.
 
  On Fri, Aug 29, 2014 at 3:14 AM, Patrick Wendell pwend...@gmail.com
 wrote:
  Please vote on 

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Nicholas Chammas
There were several formatting and typographical errors in the SQL docs that
I've fixed in this PR https://github.com/apache/spark/pull/2201. Dunno if
we want to roll that into the release.


On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com
wrote:

 Okay I'll plan to add cdh4 binary as well for the final release!

 ---
 sent from my phone
 On Aug 29, 2014 8:26 AM, Ye Xianjin advance...@gmail.com wrote:

  We just used CDH 4.7 for our production cluster. And I believe we won't
  use CDH 5 in the next year.
 
  Sent from my iPhone
 
    On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com
 wrote:
  
   Personally I'd actually consider putting CDH4 back if there are still
  users on it. It's always better to be inclusive, and the convenience of a
  one-click download is high. Do we have a sense on what % of CDH users
 still
  use CDH4?
  
   Matei
  
   On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com)
 wrote:
  
   (Copying my reply since I don't know if it goes to the mailing list)
  
   Great, thanks for explaining the reasoning. You're saying these aren't
   going into the final release? I think that moots any issue surrounding
   distributing them then.
  
   This is all I know of from the ASF:
   https://community.apache.org/projectIndependence.html I don't read it
   as expressly forbidding this kind of thing although you can see how it
   bumps up against the spirit. There's not a bright line -- what about
   Tomcat providing binaries compiled for Windows for example? does that
   favor an OS vendor?
  
   From this technical ASF perspective only the releases matter -- do
   what you want with snapshots and RCs. The only issue there is maybe
   releasing something different than was in the RC; is that at all
   confusing? Just needs a note.
  
   I think this theoretical issue doesn't exist if these binaries aren't
   released, so I see no reason to not proceed.
  
   The rest is a different question about whether you want to spend time
   maintaining this profile and candidate. The vendor already manages
   their build I think and -- and I don't know -- may even prefer not to
   have a different special build floating around. There's also the
   theoretical argument that this turns off other vendors from adopting
   Spark if it's perceived to be too connected to other vendors. I'd like
   to maximize Spark's distribution and there's some argument you do this
   by not making vendor profiles. But as I say a different question to
   just think about over time...
  
   (oh and PS for my part I think it's a good thing that CDH4 binaries
   were removed. I wasn't arguing for resurrecting them)
  
   On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com
  wrote:
   Hey Sean,
  
   The reason there are no longer CDH-specific builds is that all newer
   versions of CDH and HDP work with builds for the upstream Hadoop
   projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and
   the Hadoop-without-Hive (also 2.4) build.
  
   For MapR - we can't officially post those artifacts on ASF web space
   when we make the final release, we can only link to them as being
   hosted by MapR specifically since they use non-compatible licenses.
   However, I felt that providing these during a testing period was
   alright, with the goal of increasing test coverage. I couldn't find
   any policy against posting these on personal web space during RC
   voting. However, we can remove them if there is one.
  
   Dropping CDH4 was more because it is now pretty old, but we can add it
   back if people want. The binary packaging is a slightly separate
   question from release votes, so I can always add more binary packages
   whenever. And on this, my main concern is covering the most popular
   Hadoop versions to lower the bar for users to build and test Spark.
  
   - Patrick
  
   On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com
  wrote:
   +1 I tested the source and Hadoop 2.4 release. Checksums and
   signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
   fail any more than usual.
  
   FWIW I've also been using the 1.1.0-SNAPSHOT for some time in another
   project and have encountered no problems.
  
  
   I notice that the 1.1.0 release removes the CDH4-specific build, but
   adds two MapR-specific builds. Compare with
   https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
   commented on the commit:
  
 
 https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
  
   I'm in favor of removing all vendor-specific builds. This change
   *looks* a bit funny as there was no JIRA (?) and appears to swap one
   vendor for another. Of course there's nothing untoward going on, but
   what was the reasoning? It's best avoided, and MapR already
   distributes Spark just fine, no?
  
   This is a gray area with ASF projects. I mention it as well because
 it
   came up with Apache Flink recently
  

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
Hey Nicholas,

Thanks for this, we can merge in doc changes outside of the actual
release timeline, so we'll make sure to loop those changes in before
we publish the final 1.1 docs.

- Patrick

On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 There were several formatting and typographical errors in the SQL docs that
 I've fixed in this PR. Dunno if we want to roll that into the release.


 On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Okay I'll plan to add cdh4 binary as well for the final release!

 ---
 sent from my phone
 On Aug 29, 2014 8:26 AM, Ye Xianjin advance...@gmail.com wrote:

  We just used CDH 4.7 for our production cluster. And I believe we won't
  use CDH 5 in the next year.
 
  Sent from my iPhone
 
    On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com
   wrote:
  
   Personally I'd actually consider putting CDH4 back if there are still
  users on it. It's always better to be inclusive, and the convenience of
  a
  one-click download is high. Do we have a sense on what % of CDH users
  still
  use CDH4?
  
   Matei
  
   On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com)
   wrote:
  
   (Copying my reply since I don't know if it goes to the mailing list)
  
   Great, thanks for explaining the reasoning. You're saying these aren't
   going into the final release? I think that moots any issue surrounding
   distributing them then.
  
   This is all I know of from the ASF:
   https://community.apache.org/projectIndependence.html I don't read it
   as expressly forbidding this kind of thing although you can see how it
   bumps up against the spirit. There's not a bright line -- what about
   Tomcat providing binaries compiled for Windows for example? does that
   favor an OS vendor?
  
   From this technical ASF perspective only the releases matter -- do
   what you want with snapshots and RCs. The only issue there is maybe
   releasing something different than was in the RC; is that at all
   confusing? Just needs a note.
  
   I think this theoretical issue doesn't exist if these binaries aren't
   released, so I see no reason to not proceed.
  
   The rest is a different question about whether you want to spend time
   maintaining this profile and candidate. The vendor already manages
   their build I think and -- and I don't know -- may even prefer not to
   have a different special build floating around. There's also the
   theoretical argument that this turns off other vendors from adopting
   Spark if it's perceived to be too connected to other vendors. I'd like
   to maximize Spark's distribution and there's some argument you do this
   by not making vendor profiles. But as I say a different question to
   just think about over time...
  
   (oh and PS for my part I think it's a good thing that CDH4 binaries
   were removed. I wasn't arguing for resurrecting them)
  
   On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell pwend...@gmail.com
  wrote:
   Hey Sean,
  
   The reason there are no longer CDH-specific builds is that all newer
   versions of CDH and HDP work with builds for the upstream Hadoop
   projects. I dropped CDH4 in favor of a newer Hadoop version (2.4) and
   the Hadoop-without-Hive (also 2.4) build.
  
   For MapR - we can't officially post those artifacts on ASF web space
   when we make the final release, we can only link to them as being
   hosted by MapR specifically since they use non-compatible licenses.
   However, I felt that providing these during a testing period was
   alright, with the goal of increasing test coverage. I couldn't find
   any policy against posting these on personal web space during RC
   voting. However, we can remove them if there is one.
  
   Dropping CDH4 was more because it is now pretty old, but we can add
   it
   back if people want. The binary packaging is a slightly separate
   question from release votes, so I can always add more binary packages
   whenever. And on this, my main concern is covering the most popular
   Hadoop versions to lower the bar for users to build and test Spark.
  
   - Patrick
  
   On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com
  wrote:
   +1 I tested the source and Hadoop 2.4 release. Checksums and
   signatures are OK. Compiles fine with Java 8 on OS X. Tests... don't
   fail any more than usual.
  
   FWIW I've also been using the 1.1.0-SNAPSHOT for some time in
   another
   project and have encountered no problems.
  
  
   I notice that the 1.1.0 release removes the CDH4-specific build, but
   adds two MapR-specific builds. Compare with
   https://dist.apache.org/repos/dist/release/spark/spark-1.0.2/ I
   commented on the commit:
  
 
  https://github.com/apache/spark/commit/ceb19830b88486faa87ff41e18d03ede713a73cc
  
   I'm in favor of removing all vendor-specific builds. This change
   *looks* a bit funny as there was no JIRA (?) and appears to swap one
   vendor for another. Of course 

Re: Jira tickets for starter tasks

2014-08-29 Thread Ron's Yahoo!
Hi Josh,
  Can you add me as well?

Thanks,
Ron

On Aug 28, 2014, at 3:56 PM, Josh Rosen rosenvi...@gmail.com wrote:

 A JIRA admin needs to add you to the “Contributors” role group in order to 
 allow you to assign issues to yourself.  I’ve added this email address to 
 that group, so you should be set!
 
 - Josh
 
 
 On August 28, 2014 at 3:52:57 PM, Bill Bejeck (bbej...@gmail.com) wrote:
 
 Hi,  
 
 How do I get a starter task jira ticket assigned to myself? Or do I just do  
 the work and issue a pull request with the associated jira number?  
 
 Thanks,  
 Bill  


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Jira tickets for starter tasks

2014-08-29 Thread Josh Rosen
Added you; you should be set!

If anyone else wants me to add them, please email me off-list so that we don’t 
end up flooding the dev list with replies. Thanks!


On August 29, 2014 at 10:03:41 AM, Ron's Yahoo! (zlgonza...@yahoo.com) wrote:

Hi Josh,  
Can you add me as well?  

Thanks,  
Ron  

On Aug 28, 2014, at 3:56 PM, Josh Rosen rosenvi...@gmail.com wrote:  

 A JIRA admin needs to add you to the “Contributors” role group in order to 
 allow you to assign issues to yourself. I’ve added this email address to that 
 group, so you should be set!  
  
 - Josh  
  
  
 On August 28, 2014 at 3:52:57 PM, Bill Bejeck (bbej...@gmail.com) wrote:  
  
 Hi,  
  
 How do I get a starter task jira ticket assigned to myself? Or do I just do  
 the work and issue a pull request with the associated jira number?  
  
 Thanks,  
 Bill  



Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Nicholas Chammas
[Let me know if I should be posting these comments in a different thread.]

Should the default Spark version in spark-ec2
https://github.com/apache/spark/blob/e1535ad3c6f7400f2b7915ea91da9c60510557ba/ec2/spark_ec2.py#L86
be updated for this release?

Nick


On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell pwend...@gmail.com
wrote:

 Hey Nicholas,

 Thanks for this, we can merge in doc changes outside of the actual
 release timeline, so we'll make sure to loop those changes in before
 we publish the final 1.1 docs.

 - Patrick

 On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
  There were several formatting and typographical errors in the SQL docs
 that
  I've fixed in this PR. Dunno if we want to roll that into the release.
 
 
  On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Okay I'll plan to add cdh4 binary as well for the final release!
 
  ---
  sent from my phone
  On Aug 29, 2014 8:26 AM, Ye Xianjin advance...@gmail.com wrote:
 
   We just used CDH 4.7 for our production cluster. And I believe we
 won't
   use CDH 5 in the next year.
  
   Sent from my iPhone
  
 On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com
wrote:
   
Personally I'd actually consider putting CDH4 back if there are
 still
   users on it. It's always better to be inclusive, and the convenience
 of
   a
   one-click download is high. Do we have a sense on what % of CDH users
   still
   use CDH4?
   
Matei
   
On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com)
wrote:
   
(Copying my reply since I don't know if it goes to the mailing list)
   
Great, thanks for explaining the reasoning. You're saying these
 aren't
going into the final release? I think that moots any issue
 surrounding
distributing them then.
   
This is all I know of from the ASF:
https://community.apache.org/projectIndependence.html I don't read
 it
as expressly forbidding this kind of thing although you can see how
 it
bumps up against the spirit. There's not a bright line -- what about
Tomcat providing binaries compiled for Windows for example? does
 that
favor an OS vendor?
   
From this technical ASF perspective only the releases matter -- do
what you want with snapshots and RCs. The only issue there is maybe
releasing something different than was in the RC; is that at all
confusing? Just needs a note.
   
I think this theoretical issue doesn't exist if these binaries
 aren't
released, so I see no reason to not proceed.
   
The rest is a different question about whether you want to spend
 time
maintaining this profile and candidate. The vendor already manages
their build I think and -- and I don't know -- may even prefer not
 to
have a different special build floating around. There's also the
theoretical argument that this turns off other vendors from adopting
Spark if it's perceived to be too connected to other vendors. I'd
 like
to maximize Spark's distribution and there's some argument you do
 this
by not making vendor profiles. But as I say a different question to
just think about over time...
   
(oh and PS for my part I think it's a good thing that CDH4 binaries
were removed. I wasn't arguing for resurrecting them)
   
On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell 
 pwend...@gmail.com
   wrote:
Hey Sean,
   
The reason there are no longer CDH-specific builds is that all
 newer
versions of CDH and HDP work with builds for the upstream Hadoop
projects. I dropped CDH4 in favor of a newer Hadoop version (2.4)
 and
the Hadoop-without-Hive (also 2.4) build.
   
For MapR - we can't officially post those artifacts on ASF web
 space
when we make the final release, we can only link to them as being
hosted by MapR specifically since they use non-compatible licenses.
However, I felt that providing these during a testing period was
alright, with the goal of increasing test coverage. I couldn't find
any policy against posting these on personal web space during RC
voting. However, we can remove them if there is one.
   
Dropping CDH4 was more because it is now pretty old, but we can add
it
back if people want. The binary packaging is a slightly separate
question from release votes, so I can always add more binary
 packages
whenever. And on this, my main concern is covering the most popular
Hadoop versions to lower the bar for users to build and test Spark.
   
- Patrick
   
On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com
   wrote:
+1 I tested the source and Hadoop 2.4 release. Checksums and
signatures are OK. Compiles fine with Java 8 on OS X. Tests...
 don't
fail any more than usual.
   
FWIW I've also been using the 1.1.0-SNAPSHOT for some time in
another
project and have encountered no problems.
   
   
I 

new jenkins plugin installed and ready for use

2014-08-29 Thread shane knapp
i have always found the 'Rebuild' plugin super useful:
https://wiki.jenkins-ci.org/display/JENKINS/Rebuild+Plugin

this is installed and enabled.  enjoy!

shane


Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Patrick Wendell
Oh darn - I missed this update. GRR, unfortunately I think this means
I'll need to cut a new RC. Thanks for catching this Nick.

On Fri, Aug 29, 2014 at 10:18 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 [Let me know if I should be posting these comments in a different thread.]

 Should the default Spark version in spark-ec2 be updated for this release?

 Nick



 On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Hey Nicholas,

 Thanks for this, we can merge in doc changes outside of the actual
 release timeline, so we'll make sure to loop those changes in before
 we publish the final 1.1 docs.

 - Patrick

 On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
  There were several formatting and typographical errors in the SQL docs
  that
  I've fixed in this PR. Dunno if we want to roll that into the release.
 
 
  On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Okay I'll plan to add cdh4 binary as well for the final release!
 
  ---
  sent from my phone
  On Aug 29, 2014 8:26 AM, Ye Xianjin advance...@gmail.com wrote:
 
   We just used CDH 4.7 for our production cluster. And I believe we
   won't
   use CDH 5 in the next year.
  
   Sent from my iPhone
  
 On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com
wrote:
   
Personally I'd actually consider putting CDH4 back if there are
still
   users on it. It's always better to be inclusive, and the convenience
   of
   a
   one-click download is high. Do we have a sense on what % of CDH users
   still
   use CDH4?
   
Matei
   
On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com)
wrote:
   
(Copying my reply since I don't know if it goes to the mailing
list)
   
Great, thanks for explaining the reasoning. You're saying these
aren't
going into the final release? I think that moots any issue
surrounding
distributing them then.
   
This is all I know of from the ASF:
https://community.apache.org/projectIndependence.html I don't read
it
as expressly forbidding this kind of thing although you can see how
it
bumps up against the spirit. There's not a bright line -- what
about
Tomcat providing binaries compiled for Windows for example? does
that
favor an OS vendor?
   
From this technical ASF perspective only the releases matter -- do
what you want with snapshots and RCs. The only issue there is maybe
releasing something different than was in the RC; is that at all
confusing? Just needs a note.
   
I think this theoretical issue doesn't exist if these binaries
aren't
released, so I see no reason to not proceed.
   
The rest is a different question about whether you want to spend
time
maintaining this profile and candidate. The vendor already manages
their build I think and -- and I don't know -- may even prefer not
to
have a different special build floating around. There's also the
theoretical argument that this turns off other vendors from
adopting
Spark if it's perceived to be too connected to other vendors. I'd
like
to maximize Spark's distribution and there's some argument you do
this
by not making vendor profiles. But as I say a different question to
just think about over time...
   
(oh and PS for my part I think it's a good thing that CDH4 binaries
were removed. I wasn't arguing for resurrecting them)
   
On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell
pwend...@gmail.com
   wrote:
Hey Sean,
   
The reason there are no longer CDH-specific builds is that all
newer
versions of CDH and HDP work with builds for the upstream Hadoop
projects. I dropped CDH4 in favor of a newer Hadoop version (2.4)
and
the Hadoop-without-Hive (also 2.4) build.
   
For MapR - we can't officially post those artifacts on ASF web
space
when we make the final release, we can only link to them as being
hosted by MapR specifically since they use non-compatible
licenses.
However, I felt that providing these during a testing period was
alright, with the goal of increasing test coverage. I couldn't
find
any policy against posting these on personal web space during RC
voting. However, we can remove them if there is one.
   
Dropping CDH4 was more because it is now pretty old, but we can
add
it
back if people want. The binary packaging is a slightly separate
question from release votes, so I can always add more binary
packages
whenever. And on this, my main concern is covering the most
popular
Hadoop versions to lower the bar for users to build and test
Spark.
   
- Patrick
   
On Thu, Aug 28, 2014 at 11:04 PM, Sean Owen so...@cloudera.com
   wrote:
+1 I tested the source and Hadoop 2.4 release. Checksums and
signatures are OK. Compiles fine with Java 

Re: Compile error with XML elements

2014-08-29 Thread Yi Tian
Hi, Devl!

I got the same problem.

You can try upgrading your Scala plugin to a version newer than 0.41.2.

It works on my Mac.

On Aug 12, 2014, at 15:19, Devl Devel devl.developm...@gmail.com wrote:

 When compiling the master checkout of Spark, the IntelliJ compile fails
 with:
 
Error:(45, 8) not found: value $scope
  <div class="row-fluid">
   ^
 which is caused by HTML elements in classes like HistoryPage.scala:
 
val content =
  <div class="row-fluid">
    <div class="span12">...
 
 How can I compile these classes that have HTML node elements in them?
 
 Thanks in advance.


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Marcelo Vanzin
In our internal projects we use this bit of code in the Maven POM to
create a properties file with build information (sorry for the messy
indentation). Then we have code somewhere that reads this properties
file and provides that info. This should mean we never have to change
version numbers in Scala/Java/Python code again.
:-)
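
The reading side can be as small as the sketch below; the object and file
names are hypothetical, it just loads the generated properties file from the
classpath:

    import java.util.Properties

    // Loads the generated build.info from the classpath and exposes its values.
    object BuildInfo {
      private val props = new Properties()
      private val in = getClass.getResourceAsStream("/build.info")
      if (in != null) try props.load(in) finally in.close()

      def version: String = props.getProperty("version", "unknown")
      def revision: String = props.getProperty("hash", "unknown")
    }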

Shouldn't be hard to do something like that in sbt (actually should be
much easier).


      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <version>1.6</version>
        <executions>
          <execution>
            <id>build-info</id>
            <phase>compile</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <target>
                <taskdef resource="net/sf/antcontrib/antcontrib.properties"
                         classpathref="maven.plugin.classpath"/>
                <if>
                  <not>
                    <isset property="build.hash"/>
                  </not>
                  <then>
                    <exec executable="git" outputproperty="build.hash">
                      <arg line="rev-parse HEAD"/>
                    </exec>
                  </then>
                </if>
                <echo>buildRevision: ${build.hash}</echo>
                <echo file="${build.info}"
                      message="version=${project.version}${line.separator}"/>
                <echo file="${build.info}" append="true"
                      message="hash=${build.hash}${line.separator}"/>
                <echo file="${build.info}" append="true"/>
              </target>
            </configuration>
          </execution>
        </executions>
        <dependencies>
          <dependency>
            <groupId>ant-contrib</groupId>
            <artifactId>ant-contrib</artifactId>
            <version>1.0b3</version>
            <exclusions>
              <exclusion>
                <groupId>ant</groupId>
                <artifactId>ant</artifactId>
              </exclusion>
            </exclusions>
          </dependency>
        </dependencies>
      </plugin>
    </plugins>
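
For comparison, a minimal sbt sketch of the same idea; the generated file
name and the use of resourceGenerators below are illustrative, not taken from
the Spark build, and it assumes sbt 0.13+:

    // build.sbt fragment -- a sketch only, with hypothetical names.
    // Generates a build.info properties file (version + git hash) at
    // compile time, mirroring the ant <exec>/<echo> tasks above.
    resourceGenerators in Compile += Def.task {
      val gitHash = scala.sys.process.Process("git rev-parse HEAD").!!.trim
      val out = (resourceManaged in Compile).value / "build.info"
      IO.write(out, s"version=${version.value}\nhash=$gitHash\n")
      Seq(out)  // generated resources end up on the runtime classpath
    }.taskValue

The generated file can then be read at runtime exactly like the Maven-generated
one above.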

On Fri, Aug 29, 2014 at 11:43 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
 Sounds good. As an FYI, we had this problem with the 1.0.2 release
 https://issues.apache.org/jira/browse/SPARK-3242. Is there perhaps some
 kind of automated check we can make to catch this for us in the future?
 Where would it go?
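
To illustrate the kind of automated check being asked about here, a rough
standalone Scala sketch follows; the file locations and the regexes are
assumptions for illustration only, nothing like this exists in the build:

    import scala.io.Source
    import scala.util.matching.Regex

    // Sketch: compare the project version in the root pom.xml with the
    // default version string in ec2/spark_ec2.py before cutting a release.
    object VersionConsistencyCheck {
      def firstGroup(path: String, pattern: Regex): Option[String] = {
        val src = Source.fromFile(path)
        try pattern.findFirstMatchIn(src.mkString).map(_.group(1))
        finally src.close()
      }

      def main(args: Array[String]): Unit = {
        // naively takes the first <version> tag it finds
        val pomVersion = firstGroup("pom.xml", """<version>([^<]+)</version>""".r)
        // any quoted x.y.z string in the ec2 script, assumed to be the default
        val ec2Default = firstGroup("ec2/spark_ec2.py", """["'](\d+\.\d+\.\d+)["']""".r)
        (pomVersion, ec2Default) match {
          case (Some(p), Some(e)) if p.startsWith(e) =>
            println(s"OK: pom.xml ($p) matches spark_ec2.py default ($e)")
          case other =>
            sys.error(s"version mismatch or value not found: $other")
        }
      }
    }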


 On Fri, Aug 29, 2014 at 2:18 PM, Patrick Wendell pwend...@gmail.com wrote:

 Oh darn - I missed this update. GRR, unfortunately I think this means
 I'll need to cut a new RC. Thanks for catching this Nick.

 On Fri, Aug 29, 2014 at 10:18 AM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
  [Let me know if I should be posting these comments in a different
 thread.]
 
  Should the default Spark version in spark-ec2 be updated for this
 release?
 
  Nick
 
 
 
  On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Hey Nicholas,
 
  Thanks for this, we can merge in doc changes outside of the actual
  release timeline, so we'll make sure to loop those changes in before
  we publish the final 1.1 docs.
 
  - Patrick
 
  On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   There were several formatting and typographical errors in the SQL docs
   that
   I've fixed in this PR. Dunno if we want to roll that into the release.
  
  
   On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com
 
   wrote:
  
   Okay I'll plan to add cdh4 binary as well for the final release!
  
   ---
   sent from my phone
   On Aug 29, 2014 8:26 AM, Ye Xianjin advance...@gmail.com wrote:
  
We just used CDH 4.7 for our production cluster. And I believe we
won't
use CDH 5 in the next year.
   
Sent from my iPhone
   
  On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 Personally I'd actually consider putting CDH4 back if there are
 still
users on it. It's always better to be inclusive, and the
 convenience
of
a
one-click download is high. Do we have a sense on what % of CDH
 users
still
use CDH4?

 Matei

 On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com
 )
 wrote:

 (Copying my reply since I don't know if it goes to the mailing
 list)

 Great, thanks for explaining the reasoning. You're saying these
 aren't
 going into the final release? I think that moots any issue
 surrounding
 distributing them then.

 This is all I know of from the ASF:
 https://community.apache.org/projectIndependence.html I don't
 read
 it
 as expressly forbidding this kind of thing although you can see
 how
 it
 bumps up against the spirit. There's not a bright line -- what
 about
 Tomcat providing binaries compiled for Windows for example? 

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Cheng Lian
Just noticed one thing: although --with-hive is deprecated by -Phive,
make-distribution.sh still relies on $SPARK_HIVE (which was controlled by
--with-hive) to determine whether to include datanucleus jar files. This
means we have to do something like SPARK_HIVE=true ./make-distribution.sh
... to enable Hive support. Otherwise datanucleus jars are not included in
lib/.

This issue is similar to SPARK-3234
https://issues.apache.org/jira/browse/SPARK-3234, both
SPARK_HADOOP_VERSION and SPARK_HIVE are controlled by some deprecated
command line options.


On Fri, Aug 29, 2014 at 11:18 AM, Patrick Wendell pwend...@gmail.com
wrote:

 Oh darn - I missed this update. GRR, unfortunately I think this means
 I'll need to cut a new RC. Thanks for catching this Nick.

 On Fri, Aug 29, 2014 at 10:18 AM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:
  [Let me know if I should be posting these comments in a different
 thread.]
 
  Should the default Spark version in spark-ec2 be updated for this
 release?
 
  Nick
 
 
 
  On Fri, Aug 29, 2014 at 12:55 PM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  Hey Nicholas,
 
  Thanks for this, we can merge in doc changes outside of the actual
  release timeline, so we'll make sure to loop those changes in before
  we publish the final 1.1 docs.
 
  - Patrick
 
  On Fri, Aug 29, 2014 at 9:24 AM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
   There were several formatting and typographical errors in the SQL docs
   that
   I've fixed in this PR. Dunno if we want to roll that into the release.
  
  
   On Fri, Aug 29, 2014 at 12:17 PM, Patrick Wendell pwend...@gmail.com
 
   wrote:
  
   Okay I'll plan to add cdh4 binary as well for the final release!
  
   ---
   sent from my phone
   On Aug 29, 2014 8:26 AM, Ye Xianjin advance...@gmail.com wrote:
  
We just used CDH 4.7 for our production cluster. And I believe we
won't
use CDH 5 in the next year.
   
Sent from my iPhone
   
  On August 29, 2014, at 14:39, Matei Zaharia matei.zaha...@gmail.com
 wrote:

 Personally I'd actually consider putting CDH4 back if there are
 still
users on it. It's always better to be inclusive, and the
 convenience
of
a
one-click download is high. Do we have a sense on what % of CDH
 users
still
use CDH4?

 Matei

 On August 28, 2014 at 11:31:13 PM, Sean Owen (so...@cloudera.com
 )
 wrote:

 (Copying my reply since I don't know if it goes to the mailing
 list)

 Great, thanks for explaining the reasoning. You're saying these
 aren't
 going into the final release? I think that moots any issue
 surrounding
 distributing them then.

 This is all I know of from the ASF:
 https://community.apache.org/projectIndependence.html I don't
 read
 it
 as expressly forbidding this kind of thing although you can see
 how
 it
 bumps up against the spirit. There's not a bright line -- what
 about
 Tomcat providing binaries compiled for Windows for example? does
 that
 favor an OS vendor?

 From this technical ASF perspective only the releases matter --
 do
 what you want with snapshots and RCs. The only issue there is
 maybe
 releasing something different than was in the RC; is that at all
 confusing? Just needs a note.

 I think this theoretical issue doesn't exist if these binaries
 aren't
 released, so I see no reason to not proceed.

 The rest is a different question about whether you want to spend
 time
 maintaining this profile and candidate. The vendor already
 manages
 their build I think and -- and I don't know -- may even prefer
 not
 to
 have a different special build floating around. There's also the
 theoretical argument that this turns off other vendors from
 adopting
 Spark if it's perceived to be too connected to other vendors. I'd
 like
 to maximize Spark's distribution and there's some argument you do
 this
 by not making vendor profiles. But as I say a different question
 to
 just think about over time...

 (oh and PS for my part I think it's a good thing that CDH4
 binaries
 were removed. I wasn't arguing for resurrecting them)

 On Fri, Aug 29, 2014 at 7:26 AM, Patrick Wendell
 pwend...@gmail.com
wrote:
 Hey Sean,

 The reason there are no longer CDH-specific builds is that all
 newer
 versions of CDH and HDP work with builds for the upstream Hadoop
 projects. I dropped CDH4 in favor of a newer Hadoop version
 (2.4)
 and
 the Hadoop-without-Hive (also 2.4) build.

 For MapR - we can't officially post those artifacts on ASF web
 space
 when we make the final release, we can only link to them as
 being
 hosted by MapR specifically since they use non-compatible
 licenses.
 However, I felt that providing these during a testing period was
 alright, 

Re: [VOTE] Release Apache Spark 1.1.0 (RC2)

2014-08-29 Thread Jeremy Freeman
+1. Validated several custom analysis pipelines on a private cluster in
standalone mode. Tested new PySpark support for arbitrary Hadoop input
formats, works great!

-- Jeremy



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-1-0-RC2-tp8107p8143.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Need to check approach for continuing development on Spark

2014-08-29 Thread smalpani
Hi,

We are developing a Spring-based app that uses Cassandra, calling the DataStax
APIs from Java to query it. An internal library is responsible for calling
Cassandra and other data sources such as RDS. From Spark we call several
client APIs, provided by the client jar, to perform operations on that data,
such as:
1. Reading data from S3 and inserting it into Cassandra by passing objects
through the API, which then stores them in Cassandra internally.
2. Fetching data from Cassandra through the API as objects, processing it to
generate metrics, and saving the results back to Cassandra through the APIs.
3. Calculating aggregates internally through those APIs and separating the
data into bands, etc.

The whole project is driven by Spring. Please let me know whether this
approach is reasonable.




--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/Need-to-check-approach-for-continuing-development-on-Spark-tp8142.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Compile error with XML elements

2014-08-29 Thread Patrick Wendell
In some cases IntelliJ's Scala compiler can't compile valid Scala
source files. Hopefully they fix (or have fixed) this in a newer
version.

- Patrick
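
For context, the kind of source involved looks roughly like the sketch below
(a made-up object, not the actual HistoryPage.scala); scalac accepts it, while
older IntelliJ Scala plugin versions flagged the XML literal with the
"not found: value $scope" error quoted below:

    import scala.xml.Node

    // A made-up example in the style of Spark's web UI pages: a Scala XML
    // literal that scalac compiles fine but older IntelliJ plugins rejected.
    object HistoryPageSketch {
      def render(appName: String): Seq[Node] = {
        val content =
          <div class="row-fluid">
            <div class="span12">{appName}</div>
          </div>
        content
      }
    }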

On Fri, Aug 29, 2014 at 11:38 AM, Yi Tian tianyi.asiai...@gmail.com wrote:
 Hi, Devl!

 I got the same problem.

 You can try upgrading your Scala plugin to a version newer than 0.41.2.

 It works on my Mac.

 On Aug 12, 2014, at 15:19, Devl Devel devl.developm...@gmail.com wrote:

 When compiling the master checkout of Spark, the IntelliJ compile fails
 with:

Error:(45, 8) not found: value $scope
  <div class="row-fluid">
   ^
 which is caused by HTML elements in classes like HistoryPage.scala:

val content =
  <div class="row-fluid">
    <div class="span12">...

 How can I compile these classes that have HTML node elements in them?

 Thanks in advance.


 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org