Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Hamstra
No, that isn't necessarily enough to be considered a blocker.  A blocker
would be something that would have large negative effects on a significant
number of people trying to run Spark.  Arguably, something that prevents a
minority of Spark developers from running unit tests on one OS does not
qualify.  That's not to say that we shouldn't fix this, but only that it
needn't block a 2.0.0 release.

On Wed, Jun 22, 2016 at 5:56 PM, Ulanov, Alexander  wrote:

> Spark unit tests fail on Windows in Spark 2.0. It can be considered a
> blocker since there are people who develop for Spark on Windows. The
> referenced issue is indeed Minor and has nothing to do with unit tests.
>
>
>
> *From:* Mark Hamstra [mailto:m...@clearstorydata.com]
> *Sent:* Wednesday, June 22, 2016 4:09 PM
> *To:* Marcelo Vanzin 
> *Cc:* Ulanov, Alexander ; Reynold Xin <
> r...@databricks.com>; dev@spark.apache.org
> *Subject:* Re: [VOTE] Release Apache Spark 2.0.0 (RC1)
>
>
>
> It's also marked as Minor, not Blocker.
>
>
>
> On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin 
> wrote:
>
> On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
>  wrote:
> > -1
> >
> > Spark Unit tests fail on Windows. Still not resolved, though marked as
> > resolved.
>
> To be pedantic, it's marked as a duplicate
> (https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
> necessarily mean that it's fixed.
>
>
>
>
> > https://issues.apache.org/jira/browse/SPARK-15893
> >
> > From: Reynold Xin [mailto:r...@databricks.com]
> > Sent: Tuesday, June 21, 2016 6:27 PM
> > To: dev@spark.apache.org
> > Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
> >
> >
> >
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and
> passes
> > if a majority of at least 3+1 PMC votes are cast.
> >
> >
> >
> > [ ] +1 Release this package as Apache Spark 2.0.0
> >
> > [ ] -1 Do not release this package because ...
> >
> >
> >
> >
> >
> > The tag to be voted on is v2.0.0-rc1
> > (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
> >
> >
> >
> > This release candidate resolves ~2400 issues:
> > https://s.apache.org/spark-2.0.0-rc1-jira
> >
> >
> >
> > The release files, including signatures, digests, etc. can be found at:
> >
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
> >
> >
> >
> > Release artifacts are signed with the following key:
> >
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> >
> >
> > The staging repository for this release can be found at:
> >
> > https://repository.apache.org/content/repositories/orgapachespark-1187/
> >
> >
> >
> > The documentation corresponding to this release can be found at:
> >
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
> >
> >
> >
> >
> >
> > ===
> >
> > == How can I help test this release? ==
> >
> > ===
> >
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 1.x.
> >
> >
> >
> > 
> >
> > == What justifies a -1 vote for this release? ==
> >
> > 
> >
> > Critical bugs impacting major functionalities.
> >
> >
> >
> > Bugs already present in 1.x, missing features, or bugs related to new
> > features will not necessarily block this release. Note that historically
> > Spark documentation has been published on the website separately from the
> > main release so we do not need to block the release due to documentation
> > errors either.
> >
> >
> >
> >
>
>
> --
> Marcelo
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
>


RE: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Ulanov, Alexander
Here is the fix https://github.com/apache/spark/pull/13868
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Wednesday, June 22, 2016 6:43 PM
To: Ulanov, Alexander 
Cc: Mark Hamstra; Marcelo Vanzin; dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

Alex - if you have access to a Windows box, can you fix the issue? I'm not sure
how many Spark contributors have Windows boxes.


On Wed, Jun 22, 2016 at 5:56 PM, Ulanov, Alexander wrote:
Spark unit tests fail on Windows in Spark 2.0. It can be considered a blocker
since there are people who develop for Spark on Windows. The referenced issue
is indeed Minor and has nothing to do with unit tests.

From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Wednesday, June 22, 2016 4:09 PM
To: Marcelo Vanzin
Cc: Ulanov, Alexander; Reynold Xin; dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

It's also marked as Minor, not Blocker.

On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin wrote:
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.

To be pedantic, it's marked as a duplicate
(https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
necessarily mean that it's fixed.



> https://issues.apache.org/jira/browse/SPARK-15893
>
> From: Reynold Xin [mailto:r...@databricks.com]
> Sent: Tuesday, June 21, 2016 6:27 PM
> To: dev@spark.apache.org
> Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
> if a majority of at least 3+1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache Spark 2.0.0
>
> [ ] -1 Do not release this package because ...
>
>
>
>
>
> The tag to be voted on is v2.0.0-rc1
> (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
>
>
>
> This release candidate resolves ~2400 issues:
> https://s.apache.org/spark-2.0.0-rc1-jira
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
>
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/pwendell.asc
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1187/
>
>
>
> The documentation corresponding to this release can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
>
>
>
>
>
> ===
>
> == How can I help test this release? ==
>
> ===
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
>
>
> 
>
> == What justifies a -1 vote for this release? ==
>
> 
>
> Critical bugs impacting major functionalities.
>
>
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
>
>

--
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org




Why did Spark 2.0 disallow ROW FORMAT and STORED AS (parquet | orc | avro etc.)?

2016-06-22 Thread linxi zeng
Hi All,
I have tried Spark SQL on the Spark branch-2.0 and encountered an
unexpected problem:

Operation not allowed: ROW FORMAT DELIMITED is only compatible with
'textfile', not 'orc'(line 1, pos 0)

The SQL is like:

CREATE TABLE IF NOT EXISTS test.test_orc
(
 ...
)
PARTITIONED BY (xxx)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
stored as orc

I found this JIRA: https://issues.apache.org/jira/browse/SPARK-15279,
but I still don't understand why.

By the way, this SQL works fine on Spark 1.4.
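
The likely rationale behind SPARK-15279: ROW FORMAT DELIMITED configures a
SerDe for delimited text, while ORC (like Parquet and Avro) defines its own
serialization, so the old Hive-based parser silently ignored the delimiters;
Spark 2.0's parser rejects the contradictory combination outright. Dropping
the ROW FORMAT clause should give an equivalent ORC table. A minimal sketch,
assuming a Hive-enabled Spark 2.0 SparkSession named `spark` and hypothetical
column names:

// ROW FORMAT DELIMITED only makes sense for text-backed tables; ORC stores
// its own binary layout, so the delimiters were dropped silently in 1.x.
spark.sql("""
  CREATE TABLE IF NOT EXISTS test.test_orc (
    id BIGINT,
    name STRING
  )
  PARTITIONED BY (dt STRING)
  STORED AS ORC
""")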


Re: Creating a Python port for a Scala Spark Project

2016-06-22 Thread Daniel Imberman
Thank you Holden, I look forward to watching your talk!

On Wed, Jun 22, 2016 at 7:12 PM Holden Karau  wrote:

> PySpark RDDs are (on the Java side) essentially RDDs of pickled objects
> and mostly (but not entirely) opaque to the JVM. It is possible (by using
> some internals) to pass a PySpark DataFrame to a Scala library (you may or
> may not find the talk I gave at Spark Summit useful
> https://www.youtube.com/watch?v=V6DkTVvy9vk as well as some of the Python
> examples in
> https://github.com/high-performance-spark/high-performance-spark-examples
> ). Good luck! :)
>
> On Wed, Jun 22, 2016 at 7:07 PM, Daniel Imberman <
> daniel.imber...@gmail.com> wrote:
>
>> Hi All,
>>
>> I've developed a Spark module in Scala that I would like to add a Python
>> port for. I want to be able to allow users to create a PySpark RDD and send
>> it to my system. I've been looking into the PySpark source code as well as
>> Py4J and was wondering if there has been anything like this implemented
>> before.
>>
>> Thank you
>>
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Re: Creating a Python port for a Scala Spark Project

2016-06-22 Thread Holden Karau
PySpark RDDs are (on the Java side) essentially RDDs of pickled objects
and mostly (but not entirely) opaque to the JVM. It is possible (by using
some internals) to pass a PySpark DataFrame to a Scala library (you may or
may not find the talk I gave at Spark Summit useful
https://www.youtube.com/watch?v=V6DkTVvy9vk as well as some of the Python
examples in
https://github.com/high-performance-spark/high-performance-spark-examples
). Good luck! :)

On Wed, Jun 22, 2016 at 7:07 PM, Daniel Imberman 
wrote:

> Hi All,
>
> I've developed a Spark module in Scala that I would like to add a Python
> port for. I want to be able to allow users to create a PySpark RDD and send
> it to my system. I've been looking into the PySpark source code as well as
> Py4J and was wondering if there has been anything like this implemented
> before.
>
> Thank you
>



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
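
A minimal sketch of the bridging pattern Holden describes, with hypothetical
names (com.example.MyScalaLib, process): expose a Scala entry point that takes
and returns DataFrames, since those cross the Py4J boundary cleanly, unlike
RDDs of pickled Python objects:

// Scala side: a hypothetical entry point reachable from PySpark via Py4J.
package com.example

import org.apache.spark.sql.DataFrame

object MyScalaLib {
  // Work in terms of DataFrames: the JVM can interpret their rows, whereas
  // an RDD coming from PySpark holds pickled bytes the JVM cannot decode.
  def process(df: DataFrame): DataFrame =
    df.filter("value IS NOT NULL")
}

On the Python side, a thin wrapper can unwrap and rewrap the JVM handle, e.g.
DataFrame(sc._jvm.com.example.MyScalaLib.process(df._jdf), sqlContext); note
that _jdf and _jvm are internal PySpark attributes, so this is a sketch built
on internals rather than a stable API.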


Creating a Python port for a Scala Spark Project

2016-06-22 Thread Daniel Imberman
Hi All,

I've developed a Spark module in Scala that I would like to add a Python
port for. I want to be able to allow users to create a PySpark RDD and send
it to my system. I've been looking into the PySpark source code as well as
Py4J and was wondering if there has been anything like this implemented
before.

Thank you


Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Reynold Xin
Alex - if you have access to a Windows box, can you fix the issue? I'm not
sure how many Spark contributors have Windows boxes.


On Wed, Jun 22, 2016 at 5:56 PM, Ulanov, Alexander  wrote:

> Spark unit tests fail on Windows in Spark 2.0. It can be considered a
> blocker since there are people who develop for Spark on Windows. The
> referenced issue is indeed Minor and has nothing to do with unit tests.
>
>
>
> *From:* Mark Hamstra [mailto:m...@clearstorydata.com]
> *Sent:* Wednesday, June 22, 2016 4:09 PM
> *To:* Marcelo Vanzin 
> *Cc:* Ulanov, Alexander ; Reynold Xin <
> r...@databricks.com>; dev@spark.apache.org
> *Subject:* Re: [VOTE] Release Apache Spark 2.0.0 (RC1)
>
>
>
> It's also marked as Minor, not Blocker.
>
>
>
> On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin 
> wrote:
>
> On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
>  wrote:
> > -1
> >
> > Spark Unit tests fail on Windows. Still not resolved, though marked as
> > resolved.
>
> To be pedantic, it's marked as a duplicate
> (https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
> necessarily mean that it's fixed.
>
>
>
>
> > https://issues.apache.org/jira/browse/SPARK-15893
> >
> > From: Reynold Xin [mailto:r...@databricks.com]
> > Sent: Tuesday, June 21, 2016 6:27 PM
> > To: dev@spark.apache.org
> > Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
> >
> >
> >
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and
> passes
> > if a majority of at least 3+1 PMC votes are cast.
> >
> >
> >
> > [ ] +1 Release this package as Apache Spark 2.0.0
> >
> > [ ] -1 Do not release this package because ...
> >
> >
> >
> >
> >
> > The tag to be voted on is v2.0.0-rc1
> > (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
> >
> >
> >
> > This release candidate resolves ~2400 issues:
> > https://s.apache.org/spark-2.0.0-rc1-jira
> >
> >
> >
> > The release files, including signatures, digests, etc. can be found at:
> >
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
> >
> >
> >
> > Release artifacts are signed with the following key:
> >
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> >
> >
> > The staging repository for this release can be found at:
> >
> > https://repository.apache.org/content/repositories/orgapachespark-1187/
> >
> >
> >
> > The documentation corresponding to this release can be found at:
> >
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
> >
> >
> >
> >
> >
> > ===
> >
> > == How can I help test this release? ==
> >
> > ===
> >
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 1.x.
> >
> >
> >
> > 
> >
> > == What justifies a -1 vote for this release? ==
> >
> > 
> >
> > Critical bugs impacting major functionalities.
> >
> >
> >
> > Bugs already present in 1.x, missing features, or bugs related to new
> > features will not necessarily block this release. Note that historically
> > Spark documentation has been published on the website separately from the
> > main release so we do not need to block the release due to documentation
> > errors either.
> >
> >
> >
> >
>
>
> --
> Marcelo
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>
>


RE: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Ulanov, Alexander
Spark unit tests fail on Windows in Spark 2.0. It can be considered a blocker
since there are people who develop for Spark on Windows. The referenced issue
is indeed Minor and has nothing to do with unit tests.

From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Wednesday, June 22, 2016 4:09 PM
To: Marcelo Vanzin 
Cc: Ulanov, Alexander; Reynold Xin; dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

It's also marked as Minor, not Blocker.

On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin wrote:
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.

To be pedantic, it's marked as a duplicate
(https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
necessarily mean that it's fixed.



> https://issues.apache.org/jira/browse/SPARK-15893
>
> From: Reynold Xin [mailto:r...@databricks.com]
> Sent: Tuesday, June 21, 2016 6:27 PM
> To: dev@spark.apache.org
> Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
> if a majority of at least 3+1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache Spark 2.0.0
>
> [ ] -1 Do not release this package because ...
>
>
>
>
>
> The tag to be voted on is v2.0.0-rc1
> (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
>
>
>
> This release candidate resolves ~2400 issues:
> https://s.apache.org/spark-2.0.0-rc1-jira
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
>
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/pwendell.asc
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1187/
>
>
>
> The documentation corresponding to this release can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
>
>
>
>
>
> ===
>
> == How can I help test this release? ==
>
> ===
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
>
>
> 
>
> == What justifies a -1 vote for this release? ==
>
> 
>
> Critical bugs impacting major functionalities.
>
>
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
>
>


--
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Hamstra
It's also marked as Minor, not Blocker.

On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin  wrote:

> On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
>  wrote:
> > -1
> >
> > Spark Unit tests fail on Windows. Still not resolved, though marked as
> > resolved.
>
> To be pedantic, it's marked as a duplicate
> (https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
> necessarily mean that it's fixed.
>
>
>
> > https://issues.apache.org/jira/browse/SPARK-15893
> >
> > From: Reynold Xin [mailto:r...@databricks.com]
> > Sent: Tuesday, June 21, 2016 6:27 PM
> > To: dev@spark.apache.org
> > Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
> >
> >
> >
> > Please vote on releasing the following candidate as Apache Spark version
> > 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and
> passes
> > if a majority of at least 3+1 PMC votes are cast.
> >
> >
> >
> > [ ] +1 Release this package as Apache Spark 2.0.0
> >
> > [ ] -1 Do not release this package because ...
> >
> >
> >
> >
> >
> > The tag to be voted on is v2.0.0-rc1
> > (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
> >
> >
> >
> > This release candidate resolves ~2400 issues:
> > https://s.apache.org/spark-2.0.0-rc1-jira
> >
> >
> >
> > The release files, including signatures, digests, etc. can be found at:
> >
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
> >
> >
> >
> > Release artifacts are signed with the following key:
> >
> > https://people.apache.org/keys/committer/pwendell.asc
> >
> >
> >
> > The staging repository for this release can be found at:
> >
> > https://repository.apache.org/content/repositories/orgapachespark-1187/
> >
> >
> >
> > The documentation corresponding to this release can be found at:
> >
> > http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
> >
> >
> >
> >
> >
> > ===
> >
> > == How can I help test this release? ==
> >
> > ===
> >
> > If you are a Spark user, you can help us test this release by taking an
> > existing Spark workload and running on this release candidate, then
> > reporting any regressions from 1.x.
> >
> >
> >
> > 
> >
> > == What justifies a -1 vote for this release? ==
> >
> > 
> >
> > Critical bugs impacting major functionalities.
> >
> >
> >
> > Bugs already present in 1.x, missing features, or bugs related to new
> > features will not necessarily block this release. Note that historically
> > Spark documentation has been published on the website separately from the
> > main release so we do not need to block the release due to documentation
> > errors either.
> >
> >
> >
> >
>
>
>
> --
> Marcelo
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Marcelo Vanzin
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
 wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.

To be pedantic, it's marked as a duplicate
(https://issues.apache.org/jira/browse/SPARK-15899), which doesn't
necessarily mean that it's fixed.



> https://issues.apache.org/jira/browse/SPARK-15893
>
> From: Reynold Xin [mailto:r...@databricks.com]
> Sent: Tuesday, June 21, 2016 6:27 PM
> To: dev@spark.apache.org
> Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
> if a majority of at least 3+1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache Spark 2.0.0
>
> [ ] -1 Do not release this package because ...
>
>
>
>
>
> The tag to be voted on is v2.0.0-rc1
> (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
>
>
>
> This release candidate resolves ~2400 issues:
> https://s.apache.org/spark-2.0.0-rc1-jira
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
>
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/pwendell.asc
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1187/
>
>
>
> The documentation corresponding to this release can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
>
>
>
>
>
> ===
>
> == How can I help test this release? ==
>
> ===
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
>
>
> 
>
> == What justifies a -1 vote for this release? ==
>
> 
>
> Critical bugs impacting major functionalities.
>
>
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
>
>



-- 
Marcelo

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Hamstra
SPARK-15893 is resolved as a duplicate of SPARK-15899.  SPARK-15899 is
Unresolved.

On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander  wrote:

> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.
>
> https://issues.apache.org/jira/browse/SPARK-15893
>
> *From:* Reynold Xin [mailto:r...@databricks.com]
> *Sent:* Tuesday, June 21, 2016 6:27 PM
> *To:* dev@spark.apache.org
> *Subject:* [VOTE] Release Apache Spark 2.0.0 (RC1)
>
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
> if a majority of at least 3+1 PMC votes are cast.
>
>
>
> [ ] +1 Release this package as Apache Spark 2.0.0
>
> [ ] -1 Do not release this package because ...
>
>
>
>
>
> The tag to be voted on is v2.0.0-rc1
> (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
>
>
>
> This release candidate resolves ~2400 issues:
> https://s.apache.org/spark-2.0.0-rc1-jira
>
>
>
> The release files, including signatures, digests, etc. can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
>
>
>
> Release artifacts are signed with the following key:
>
> https://people.apache.org/keys/committer/pwendell.asc
>
>
>
> The staging repository for this release can be found at:
>
> https://repository.apache.org/content/repositories/orgapachespark-1187/
>
>
>
> The documentation corresponding to this release can be found at:
>
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
>
>
>
>
>
> ===
>
> == How can I help test this release? ==
>
> ===
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
>
>
> 
>
> == What justifies a -1 vote for this release? ==
>
> 
>
> Critical bugs impacting major functionalities.
>
>
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
>
>
>


RE: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Ulanov, Alexander
-1
Spark Unit tests fail on Windows. Still not resolved, though marked as resolved.
https://issues.apache.org/jira/browse/SPARK-15893
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Tuesday, June 21, 2016 6:27 PM
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)

Please vote on releasing the following candidate as Apache Spark version 2.0.0. 
The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes if a 
majority of at least 3+1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc1 (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).

This release candidate resolves ~2400 issues: 
https://s.apache.org/spark-2.0.0-rc1-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1187/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/


===
== How can I help test this release? ==
===
If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions from 1.x.


== What justifies a -1 vote for this release? ==

Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new features 
will not necessarily block this release. Note that historically Spark 
documentation has been published on the website separately from the main 
release so we do not need to block the release due to documentation errors 
either.




Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sameer Agarwal
+1

On Wed, Jun 22, 2016 at 1:07 PM, Kousuke Saruta 
wrote:

> +1 (non-binding)
>
> On 2016/06/23 4:53, Reynold Xin wrote:
>
> +1 myself
>
>
> On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara <
> sean.mcnam...@webtrends.com> wrote:
>
>> +1
>>
>> On Jun 22, 2016, at 1:14 PM, Michael Armbrust 
>> wrote:
>>
>> +1
>>
>> On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly 
>> wrote:
>>
>>> +1
>>>
>>> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter < 
>>> timhun...@databricks.com> wrote:
>>>
 +1 This release passes all tests on the graphframes and tensorframes
 packages.

 On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger < 
 c...@koeninger.org> wrote:

> If we're considering backporting changes for the 0.8 kafka
> integration, I am sure there are people who would like to get
>
> https://issues.apache.org/jira/browse/SPARK-10963
>
> into 1.6.x as well
>
> On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen < 
> so...@cloudera.com> wrote:
> > Good call, probably worth back-porting, I'll try to do that. I don't
> > think it blocks a release, but would be good to get into a next RC if
> > any.
> >
> > On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins <
> robbin...@gmail.com> wrote:
> >> This has failed on our 1.6 stream builds regularly.
> >> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
> >>
> >> On Wed, 22 Jun 2016 at 11:15 Sean Owen < 
> so...@cloudera.com> wrote:
> >>>
> >>> Oops, one more in the "does anybody else see this" department:
> >>>
> >>> - offset recovery *** FAILED ***
> >>>   recoveredOffsetRanges.forall(((or:
> (org.apache.spark.streaming.Time,
> >>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
> >>>
> >>>
> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
> >>>
> >>>
> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
> >>>
> >>>
> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
> >>> was false Recovered ranges are not the same as the ones generated
> >>> (DirectKafkaStreamSuite.scala:301)
> >>>
> >>> This actually fails consistently for me too in the Kafka
> integration
> >>> code. Not timezone related, I think.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>

>>
>>
>
>


-- 
Sameer Agarwal
Software Engineer | Databricks Inc.
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Kousuke Saruta

+1 (non-binding)


On 2016/06/23 4:53, Reynold Xin wrote:

+1 myself


On Wed, Jun 22, 2016 at 12:19 PM, Sean McNamara wrote:

+1

On Jun 22, 2016, at 1:14 PM, Michael Armbrust wrote:

+1

On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly wrote:

+1

On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter wrote:

+1 This release passes all tests on the graphframes and
tensorframes packages.

On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote:

If we're considering backporting changes for the 0.8 kafka
integration, I am sure there are people who would like to get

https://issues.apache.org/jira/browse/SPARK-10963

into 1.6.x as well

On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen wrote:
> Good call, probably worth back-porting, I'll try to do that. I don't
> think it blocks a release, but would be good to get into a next RC if
> any.
>
> On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins wrote:
>> This has failed on our 1.6 stream builds regularly.
>> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
>>
>> On Wed, 22 Jun 2016 at 11:15 Sean Owen wrote:
>>>
>>> Oops, one more in the "does anybody else see this" department:
>>>
>>> - offset recovery *** FAILED ***
>>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
>>> was false Recovered ranges are not the same as the ones generated
>>> (DirectKafkaStreamSuite.scala:301)
>>>
>>> This actually fails consistently for me too in the Kafka integration
>>> code. Not timezone related, I think.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org

Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean McNamara
+1

On Jun 22, 2016, at 1:14 PM, Michael Armbrust wrote:

+1

On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly wrote:
+1

On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter wrote:
+1 This release passes all tests on the graphframes and tensorframes packages.

On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote:
If we're considering backporting changes for the 0.8 kafka
integration, I am sure there are people who would like to get

https://issues.apache.org/jira/browse/SPARK-10963

into 1.6.x as well

On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen wrote:
> Good call, probably worth back-porting, I'll try to do that. I don't
> think it blocks a release, but would be good to get into a next RC if
> any.
>
> On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins wrote:
>> This has failed on our 1.6 stream builds regularly.
>> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
>>
>> On Wed, 22 Jun 2016 at 11:15 Sean Owen wrote:
>>>
>>> Oops, one more in the "does anybody else see this" department:
>>>
>>> - offset recovery *** FAILED ***
>>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>>
>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>>
>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>>
>>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
>>> was false Recovered ranges are not the same as the ones generated
>>> (DirectKafkaStreamSuite.scala:301)
>>>
>>> This actually fails consistently for me too in the Kafka integration
>>> code. Not timezone related, I think.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org






Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Michael Armbrust
+1

On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly 
wrote:

> +1
>
> On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter 
> wrote:
>
>> +1 This release passes all tests on the graphframes and tensorframes
>> packages.
>>
>> On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger 
>> wrote:
>>
>>> If we're considering backporting changes for the 0.8 kafka
>>> integration, I am sure there are people who would like to get
>>>
>>> https://issues.apache.org/jira/browse/SPARK-10963
>>>
>>> into 1.6.x as well
>>>
>>> On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen  wrote:
>>> > Good call, probably worth back-porting, I'll try to do that. I don't
>>> > think it blocks a release, but would be good to get into a next RC if
>>> > any.
>>> >
>>> > On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins 
>>> wrote:
>>> >> This has failed on our 1.6 stream builds regularly.
>>> >> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in
>>> 2.0?
>>> >>
>>> >> On Wed, 22 Jun 2016 at 11:15 Sean Owen  wrote:
>>> >>>
>>> >>> Oops, one more in the "does anybody else see this" department:
>>> >>>
>>> >>> - offset recovery *** FAILED ***
>>> >>>   recoveredOffsetRanges.forall(((or:
>>> (org.apache.spark.streaming.Time,
>>> >>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>>> >>>
>>> >>>
>>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>>> >>>
>>> >>>
>>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>>> >>>
>>> >>>
>>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
>>> >>> was false Recovered ranges are not the same as the ones generated
>>> >>> (DirectKafkaStreamSuite.scala:301)
>>> >>>
>>> >>> This actually fails consistently for me too in the Kafka integration
>>> >>> code. Not timezone related, I think.
>>> >
>>> > -
>>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> > For additional commands, e-mail: dev-h...@spark.apache.org
>>> >
>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>
>>>
>>


Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Jonathan Kelly
+1

On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter 
wrote:

> +1 This release passes all tests on the graphframes and tensorframes
> packages.
>
> On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger 
> wrote:
>
>> If we're considering backporting changes for the 0.8 kafka
>> integration, I am sure there are people who would like to get
>>
>> https://issues.apache.org/jira/browse/SPARK-10963
>>
>> into 1.6.x as well
>>
>> On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen  wrote:
>> > Good call, probably worth back-porting, I'll try to do that. I don't
>> > think it blocks a release, but would be good to get into a next RC if
>> > any.
>> >
>> > On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins 
>> wrote:
>> >> This has failed on our 1.6 stream builds regularly.
>> >> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
>> >>
>> >> On Wed, 22 Jun 2016 at 11:15 Sean Owen  wrote:
>> >>>
>> >>> Oops, one more in the "does anybody else see this" department:
>> >>>
>> >>> - offset recovery *** FAILED ***
>> >>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
>> >>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
>> >>>
>> >>>
>> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
>> >>>
>> >>>
>> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
>> >>>
>> >>>
>> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
>> >>> was false Recovered ranges are not the same as the ones generated
>> >>> (DirectKafkaStreamSuite.scala:301)
>> >>>
>> >>> This actually fails consistently for me too in the Kafka integration
>> >>> code. Not timezone related, I think.
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: dev-h...@spark.apache.org
>> >
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>


Re: Question about Bloom Filter in Spark 2.0

2016-06-22 Thread Jörn Franke
You should see it at both levels: there is one bloom filter for ORC data and
one for in-memory data.

It is already a good step towards integrating the format and the in-memory
representation for columnar data.

> On 22 Jun 2016, at 14:01, BaiRan  wrote:
> 
> After building a bloom filter on existing data, does the Spark engine utilise
> the bloom filter during query processing?
> Is there any plan for predicate pushdown using bloom filters in ORC /
> Parquet?
> 
> Thanks
> Ran
>> On 22 Jun, 2016, at 10:48 am, Reynold Xin  wrote:
>> 
>> SPARK-12818 is about building a bloom filter on existing data. It has 
>> nothing to do with the ORC bloom filter, which can be used to do predicate 
>> pushdown.
>> 
>> 
>>> On Tue, Jun 21, 2016 at 7:45 PM, BaiRan  wrote:
>>> Hi all,
>>> 
>>> I have a question about the bloom filter implementation in SPARK-12818.
>>> If I have an ORC file with bloom filter metadata, how can I utilise it from
>>> Spark SQL?
>>> Thanks.
>>> 
>>> Best,
>>> Ran
> 
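
To make the two levels concrete: SPARK-12818 adds an application-side bloom
filter built from a DataFrame column, while ORC bloom filters live in file
metadata and are only consulted by the ORC reader during predicate pushdown.
A minimal sketch, assuming a Hive-enabled Spark 2.0 SparkSession named `spark`,
a DataFrame `df` with an `id` column, and an illustrative table name; whether
the bundled Hive/ORC reader actually exploits the file-level filter depends on
the ORC version, so the table property below is a request, not a guarantee:

// (1) SPARK-12818: build an in-memory bloom filter from existing data.
val bf = df.stat.bloomFilter("id", 1000000L, 0.03) // expected items, target FPP
val maybePresent = bf.mightContain(42L)            // false => definitely absent

// (2) ORC-level bloom filters are written into file metadata via a table
// property; the reader can then skip data when pushdown is enabled.
spark.sql("SET spark.sql.orc.filterPushdown=true")
spark.sql("""
  CREATE TABLE test_orc_bf (id BIGINT, name STRING)
  STORED AS ORC
  TBLPROPERTIES ('orc.bloom.filter.columns' = 'id')
""")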


Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Tim Hunter
+1 This release passes all tests on the graphframes and tensorframes
packages.

On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger  wrote:

> If we're considering backporting changes for the 0.8 kafka
> integration, I am sure there are people who would like to get
>
> https://issues.apache.org/jira/browse/SPARK-10963
>
> into 1.6.x as well
>
> On Wed, Jun 22, 2016 at 7:41 AM, Sean Owen  wrote:
> > Good call, probably worth back-porting, I'll try to do that. I don't
> > think it blocks a release, but would be good to get into a next RC if
> > any.
> >
> > On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins 
> wrote:
> >> This has failed on our 1.6 stream builds regularly.
> >> (https://issues.apache.org/jira/browse/SPARK-6005) looks fixed in 2.0?
> >>
> >> On Wed, 22 Jun 2016 at 11:15 Sean Owen  wrote:
> >>>
> >>> Oops, one more in the "does anybody else see this" department:
> >>>
> >>> - offset recovery *** FAILED ***
> >>>   recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
> >>> Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
> >>>
> >>>
> earlierOffsetRangesAsSets.contains(scala.Tuple2.apply[org.apache.spark.streaming.Time,
> >>>
> >>>
> scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
> >>>
> >>>
> scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2).toSet[org.apache.spark.streaming.kafka.OffsetRange]
> >>> was false Recovered ranges are not the same as the ones generated
> >>> (DirectKafkaStreamSuite.scala:301)
> >>>
> >>> This actually fails consistently for me too in the Kafka integration
> >>> code. Not timezone related, I think.
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > For additional commands, e-mail: dev-h...@spark.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Mark Grover
Yeah, I am +1 for including Kafka 0.10 integration as well. We had to wait
for Kafka 0.10 because there were incompatibilities between the Kafka 0.9
and 0.10 API. And, yes, the code for 0.8.0 remains unchanged so there
shouldn't be any regression for existing users. It's only new code for 0.10.

The comments about lacking Python support are correct, but I do think it's
unfair to block this particular PR on that without a wider policy of blocking
every PR on it.

On Wed, Jun 22, 2016 at 9:01 AM, Chris Fregly  wrote:

> +1 for 0.10 support.  this is huge.
>
> On Wed, Jun 22, 2016 at 8:17 AM, Cody Koeninger 
> wrote:
>
>> Luciano knows there are publicly available examples of how to use the
>> 0.10 connector, including TLS support, because he asked me about it
>> and I gave him a link
>>
>>
>> https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala
>>
>> If any committer at any time had said "I'd accept this PR, if only it
>> included X", I'd be happy to provide X.  Documentation updates and
>> python support for the 0.8 direct stream connector were done after the
>> original PR.
>>
>>
>>
>> On Wed, Jun 22, 2016 at 9:55 AM, Luciano Resende 
>> wrote:
>> >
>> >
>> > On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger 
>> wrote:
>> >>
>> >> As far as I know the only thing blocking it at this point is lack of
>> >> committer review / approval.
>> >>
>> >> It's technically adding a new feature after spark code-freeze, but it
>> >> doesn't change existing code, and the kafka project didn't release
>> >> 0.10 until the end of may.
>> >>
>> >
>> >
>> > To be fair with the Kafka 0.10 PR assessment:
>> >
>> > I was expecting a somewhat easy transition for customers from the 0.8
>> > connector to 0.10, but 0.10 seems to have been treated as a completely new
>> > extension. Also, there is no Python support, no samples in the PR
>> > demonstrating how to use security capabilities, and no documentation
>> > updates.
>> >
>> > Thanks
>> >
>> > --
>> > Luciano Resende
>> > http://twitter.com/lresende1975
>> > http://lresende.blogspot.com/
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>>
>
>
> --
> *Chris Fregly*
> Research Scientist @ PipelineIO
> San Francisco, CA
> pipeline.io
> advancedspark.com
>
>


[build system] jenkins process wedged, need to do restart

2016-06-22 Thread shane knapp
of course, on my first day back from vacation, i notice that the
jenkins process got wedged immediately upon my visiting the page.

one quick jenkins/httpd restart later and we're back up and building.
sorry for any inconvenience!

shane

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Chris Fregly
+1 for 0.10 support.  this is huge.

On Wed, Jun 22, 2016 at 8:17 AM, Cody Koeninger  wrote:

> Luciano knows there are publicly available examples of how to use the
> 0.10 connector, including TLS support, because he asked me about it
> and I gave him a link
>
>
> https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala
>
> If any committer at any time had said "I'd accept this PR, if only it
> included X", I'd be happy to provide X.  Documentation updates and
> python support for the 0.8 direct stream connector were done after the
> original PR.
>
>
>
> On Wed, Jun 22, 2016 at 9:55 AM, Luciano Resende 
> wrote:
> >
> >
> > On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger 
> wrote:
> >>
> >> As far as I know the only thing blocking it at this point is lack of
> >> committer review / approval.
> >>
> >> It's technically adding a new feature after spark code-freeze, but it
> >> doesn't change existing code, and the kafka project didn't release
> >> 0.10 until the end of may.
> >>
> >
> >
> > To be fair with the Kafka 0.10 PR assessment:
> >
> > I was expecting a somewhat easy transition for customers from the 0.8
> > connector to 0.10, but 0.10 seems to have been treated as a completely new
> > extension. Also, there is no Python support, no samples in the PR
> > demonstrating how to use security capabilities, and no documentation
> > updates.
> >
> > Thanks
> >
> > --
> > Luciano Resende
> > http://twitter.com/lresende1975
> > http://lresende.blogspot.com/
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


-- 
*Chris Fregly*
Research Scientist @ PipelineIO
San Francisco, CA
pipeline.io
advancedspark.com


Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Cody Koeninger
Luciano knows there are publicly available examples of how to use the
0.10 connector, including TLS support, because he asked me about it
and I gave him a link

https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala

If any committer at any time had said "I'd accept this PR, if only it
included X", I'd be happy to provide X.  Documentation updates and
python support for the 0.8 direct stream connector were done after the
original PR.



On Wed, Jun 22, 2016 at 9:55 AM, Luciano Resende  wrote:
>
>
> On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger  wrote:
>>
>> As far as I know the only thing blocking it at this point is lack of
>> committer review / approval.
>>
>> It's technically adding a new feature after spark code-freeze, but it
>> doesn't change existing code, and the kafka project didn't release
>> 0.10 until the end of may.
>>
>
>
> To be fair with the Kafka 0.10 PR assessment:
>
> I was expecting a somewhat easy transition for customers from the 0.8
> connector to 0.10, but 0.10 seems to have been treated as a completely new
> extension. Also, there is no Python support, no samples in the PR
> demonstrating how to use security capabilities, and no documentation updates.
>
> Thanks
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Luciano Resende
On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger  wrote:

> As far as I know the only thing blocking it at this point is lack of
> committer review / approval.
>
> It's technically adding a new feature after spark code-freeze, but it
> doesn't change existing code, and the kafka project didn't release
> 0.10 until the end of may.
>
>

To be fair with the Kafka 0.10 PR assessment:

I was expecting a somewhat easy transition for customers from the 0.8
connector to 0.10, but 0.10 seems to have been treated as a completely new
extension. Also, there is no Python support, no samples in the PR
demonstrating how to use security capabilities, and no documentation
updates.

Thanks

-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
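
For reference, a minimal sketch of what using the new spark-streaming-kafka-0-10
connector looks like, including the TLS settings discussed above; TLS is
configured through plain Kafka consumer properties, which the connector passes
through. Broker, topic, group, and truststore values are placeholders, and
`ssc` is assumed to be an existing StreamingContext:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

// Kafka 0.10 consumer configuration; the security settings are ordinary Kafka
// consumer properties, forwarded verbatim by the connector.
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker1:9093",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "security.protocol" -> "SSL",
  "ssl.truststore.location" -> "/path/to/truststore.jks",
  "ssl.truststore.password" -> "changeit"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, // an existing org.apache.spark.streaming.StreamingContext
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("example-topic"), kafkaParams)
)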


Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Sean Owen
Hm, I thought that was to be added for 2.0. Imran, I know you may have
been working alongside Mark on it; what do you think?

TD / Reynold would you object to it for 2.0?

On Wed, Jun 22, 2016 at 3:46 PM, Cody Koeninger  wrote:
> As far as I know the only thing blocking it at this point is lack of
> committer review / approval.
>
> It's technically adding a new feature after spark code-freeze, but it
> doesn't change existing code, and the kafka project didn't release
> 0.10 until the end of may.
>
>
> On Wed, Jun 22, 2016 at 9:39 AM, Sean Owen  wrote:
>> I profess ignorance again though I really should know by now, but,
>> what's opposing that? I personally thought this was going to be in 2.0
>> and didn't kind of notice it wasn't ...
>>
>> On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger  wrote:
>>> I don't have a vote, but I'd just like to reiterate that I think kafka
>>> 0.10 support should be added to a 2.0 release candidate; if not now,
>>> then well before release.
>>>
>>> - it's a completely standalone jar, so shouldn't break anyone who's
>>> using the existing 0.8 support
>>> - it's like the 5th highest voted open ticket, and has been open for months
>>> - Luciano has said multiple times that he wants to merge that PR into
>>> Bahir if it isn't in a RC for spark 2.0, which I think would confuse
>>> users and cause maintenance problems
>>>
>>> On Wed, Jun 22, 2016 at 12:38 AM, Sean Owen  wrote:
 While I'd officially -1 this while there are still many blockers, this
 should certainly be tested as usual, because they're mostly doc and
 "audit" type issues.

 On Wed, Jun 22, 2016 at 2:26 AM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and 
> passes
> if a majority of at least 3+1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc1
> (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).
>
> This release candidate resolves ~2400 issues:
> https://s.apache.org/spark-2.0.0-rc1-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1187/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/
>
>
> ===
> == How can I help test this release? ==
> ===
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> 
> == What justifies a -1 vote for this release? ==
> 
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Cody Koeninger
As far as I know the only thing blocking it at this point is lack of
committer review / approval.

It's technically adding a new feature after spark code-freeze, but it
doesn't change existing code, and the kafka project didn't release
0.10 until the end of may.


On Wed, Jun 22, 2016 at 9:39 AM, Sean Owen  wrote:
> I profess ignorance again though I really should know by now, but,
> what's opposing that? I personally thought this was going to be in 2.0
> and didn't kind of notice it wasn't ...
>
> On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger  wrote:
>> I don't have a vote, but I'd just like to reiterate that I think kafka
>> 0.10 support should be added to a 2.0 release candidate; if not now,
>> then well before release.
>>
>> - it's a completely standalone jar, so shouldn't break anyone who's
>> using the existing 0.8 support
>> - it's like the 5th highest voted open ticket, and has been open for months
>> - Luciano has said multiple times that he wants to merge that PR into
>> Bahir if it isn't in a RC for spark 2.0, which I think would confuse
>> users and cause maintenance problems
>>
>> On Wed, Jun 22, 2016 at 12:38 AM, Sean Owen  wrote:
>>> While I'd officially -1 this while there are still many blockers, this
>>> should certainly be tested as usual, because they're mostly doc and
>>> "audit" type issues.
>>>
>>> On Wed, Jun 22, 2016 at 2:26 AM, Reynold Xin  wrote:
 Please vote on releasing the following candidate as Apache Spark version
 2.0.0. The vote is open until Friday, June 24, 2016 at 19:00 PDT and passes
 if a majority of at least 3+1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.0.0
 [ ] -1 Do not release this package because ...


 The tag to be voted on is v2.0.0-rc1
 (0c66ca41afade6db73c9aeddd5aed6e5dcea90df).

 This release candidate resolves ~2400 issues:
 https://s.apache.org/spark-2.0.0-rc1-jira

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1187/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc1-docs/


 ===
 == How can I help test this release? ==
 ===
 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running on this release candidate, then
 reporting any regressions from 1.x.

 
 == What justifies a -1 vote for this release? ==
 
 Critical bugs impacting major functionalities.

 Bugs already present in 1.x, missing features, or bugs related to new
 features will not necessarily block this release. Note that historically
 Spark documentation has been published on the website separately from the
 main release so we do not need to block the release due to documentation
 errors either.


>>>
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Sean Owen
I profess ignorance again though I really should know by now, but,
what's opposing that? I personally thought this was going to be in 2.0
and didn't kind of notice it wasn't ...

On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger  wrote:
> I don't have a vote, but I'd just like to reiterate that I think kafka
> 0.10 support should be added to a 2.0 release candidate; if not now,
> then well before release.
>
> - it's a completely standalone jar, so shouldn't break anyone who's
> using the existing 0.8 support
> - it's like the 5th highest voted open ticket, and has been open for months
> - Luciano has said multiple times that he wants to merge that PR into
> Bahir if it isn't in a RC for spark 2.0, which I think would confuse
> users and cause maintenance problems
>
> On Wed, Jun 22, 2016 at 12:38 AM, Sean Owen  wrote:
>> While I'd officially -1 this while there are still many blockers, this
>> should certainly be tested as usual, because they're mostly doc and
>> "audit" type issues.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Nicholas Chammas
For the clueless (like me):

https://bahir.apache.org/#home

Apache Bahir provides extensions to distributed analytic platforms such as
Apache Spark.

Initially Apache Bahir will contain streaming connectors that were a part
of Apache Spark prior to version 2.0:

   - streaming-akka
   - streaming-mqtt
   - streaming-twitter
   - streaming-zeromq

The Apache Bahir community welcomes the proposal of new extensions.

Nick

On Wed, Jun 22, 2016 at 10:40 AM Sean Owen  wrote:

> I profess ignorance again, though I really should know by now: what's
> opposing that? I personally thought this was going to be in 2.0 and
> didn't notice that it wasn't ...
>
> On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger 
> wrote:
> > I don't have a vote, but I'd just like to reiterate that I think Kafka
> > 0.10 support should be added to a 2.0 release candidate; if not now,
> > then well before release.
> >
> > - it's a completely standalone jar, so it shouldn't break anyone who's
> > using the existing 0.8 support
> > - it's roughly the 5th-highest-voted open ticket, and it has been open
> > for months
> > - Luciano has said multiple times that he wants to merge that PR into
> > Bahir if it isn't in an RC for Spark 2.0, which I think would confuse
> > users and cause maintenance problems
> >
> > On Wed, Jun 22, 2016 at 12:38 AM, Sean Owen  wrote:
> >> While I'd officially -1 this as long as there are still many blockers,
> >> this should certainly be tested as usual, because they're mostly doc and
> >> "audit"-type issues.


Re: [VOTE] Release Apache Spark 2.0.0 (RC1)

2016-06-22 Thread Cody Koeninger
I don't have a vote, but I'd just like to reiterate that I think Kafka
0.10 support should be added to a 2.0 release candidate; if not now,
then well before release.

- it's a completely standalone jar, so it shouldn't break anyone who's
using the existing 0.8 support (see the dependency sketch after this list)
- it's roughly the 5th-highest-voted open ticket, and it has been open
for months
- Luciano has said multiple times that he wants to merge that PR into
Bahir if it isn't in an RC for Spark 2.0, which I think would confuse
users and cause maintenance problems
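
A minimal sketch of what "standalone jar" means for a build, assuming the
connector ships under the same naming convention as the existing modules
(these coordinates are illustrative, not final):

    // build.sbt: each Kafka connector is an independent artifact, so
    // adding the 0.10 module leaves existing 0.8 users untouched.
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-8"  % "2.0.0"
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.0.0"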

On Wed, Jun 22, 2016 at 12:38 AM, Sean Owen  wrote:
> While I'd officially -1 this as long as there are still many blockers,
> this should certainly be tested as usual, because they're mostly doc and
> "audit"-type issues.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Spark internal Logging trait potential thread unsafe

2016-06-22 Thread Prajwal Tuladhar
Created a JIRA issue https://issues.apache.org/jira/browse/SPARK-16131 and
PR @ https://github.com/apache/spark/pull/13842
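
For anyone following along, a minimal self-contained sketch of the lazy
val pattern proposed in the quoted message below (simplified: the real
trait also threads through initializeLogIfNecessary(), and logName is
replaced here by a class-name lookup):

    import org.slf4j.{Logger, LoggerFactory}

    trait Logging {
      // A Scala lazy val compiles down to synchronized, double-checked
      // initialization, so concurrent first accesses are safe without
      // any hand-rolled locking.
      @transient private lazy val log_ : Logger =
        LoggerFactory.getLogger(this.getClass.getName)

      protected def log: Logger = log_
    }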

On Fri, Jun 17, 2016 at 5:19 AM, Sean Owen  wrote:

> I think that's OK to change, yes. I don't see why it's necessary to
> init log_ the way it is now. initializeLogIfNecessary() has a purpose
> though.
>
> On Fri, Jun 17, 2016 at 2:39 AM, Prajwal Tuladhar 
> wrote:
> > Hi,
> >
> > The way the log instance inside the Logging trait is currently being
> > initialized doesn't seem to be thread safe [1]. The current
> > implementation only guarantees that initializeLogIfNecessary() runs
> > in a lazy + thread-safe way.
> >
> > Is there a reason why it can't be just: [2]
> >
> > @transient private lazy val log_ : Logger = {
> >   initializeLogIfNecessary(false)
> >   LoggerFactory.getLogger(logName)
> > }
> >
> >
> > And with that, initializeLogIfNecessary() can be called without
> > double-checked locking.
> >
> > --
> > --
> > Cheers,
> > Praj
> >
> > [1]
> >
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/internal/Logging.scala#L44-L50
> > [2]
> >
> https://github.com/apache/spark/blob/8ef3399aff04bf8b7ab294c0f55bcf195995842b/core/src/main/scala/org/apache/spark/internal/Logging.scala#L35
>



-- 
--
Cheers,
Praj


Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean Owen
Good call; probably worth back-porting, and I'll try to do that. I don't
think it blocks a release, but it would be good to get into the next RC,
if there is one.

On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins  wrote:
> This has failed regularly on our 1.6 stream builds
> (https://issues.apache.org/jira/browse/SPARK-6005); it looks fixed in 2.0?
>
> On Wed, 22 Jun 2016 at 11:15 Sean Owen  wrote:
>>
>> Oops, one more in the "does anybody else see this" department:
>>
>> - offset recovery *** FAILED ***
>>   recoveredOffsetRanges.forall(...) was false: Recovered ranges are not
>>   the same as the ones generated (DirectKafkaStreamSuite.scala:301)
>>
>> This actually fails consistently for me too in the Kafka integration
>> code. Not timezone related, I think.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Question about Bloom Filter in Spark 2.0

2016-06-22 Thread BaiRan
After building a bloom filter on existing data, does the Spark engine
utilise the bloom filter during query processing?
Is there any plan for predicate pushdown using bloom filters in ORC /
Parquet?

Thanks
Ran
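
(For reference, a minimal sketch of the SPARK-12818 API discussed below;
the table and column names are made up for illustration:)

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.util.sketch.BloomFilter

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.table("events")

    // Build a driver-side bloom filter over one column; the expected item
    // count and false-positive probability are the tuning knobs.
    val bf: BloomFilter = df.stat.bloomFilter("user_id", 1000000L, 0.03)
    bf.mightContain("user-42")  // probabilistic membership test
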
> On 22 Jun, 2016, at 10:48 am, Reynold Xin  wrote:
> 
> SPARK-12818 is about building a bloom filter on existing data. It has nothing 
> to do with the ORC bloom filter, which can be used to do predicate pushdown.
> 
> 
> On Tue, Jun 21, 2016 at 7:45 PM, BaiRan  wrote:
> Hi all,
> 
> I have a question about the bloom filter implementation in the SPARK-12818
> issue. If I have an ORC file with bloom filter metadata, how can I utilise
> it from Spark SQL?
> Thanks.
> 
> Best,
> Ran
> 



Spark Task failure with File segment length as negative

2016-06-22 Thread Priya Ch
Hi All,

I am running a Spark application on 1.8TB of data (which is stored as
Hive tables). I am reading the data using HiveContext and processing it.
The cluster has 5 nodes in total, with 25 cores and 250GB of memory per
node. I am launching the application with 25 executors, using 5 cores and
45GB per executor, and I have also set
spark.yarn.executor.memoryOverhead=2024.
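
(A sketch of the equivalent launch command, assuming YARN; the
application jar name is a placeholder:)

    spark-submit --master yarn \
      --num-executors 25 --executor-cores 5 --executor-memory 45g \
      --conf spark.yarn.executor.memoryOverhead=2024 \
      my-app.jar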

During the execution, tasks are lost and ShuffleMapTasks are re-submitted.
I am seeing tasks fail with the following message:

java.lang.IllegalArgumentException: requirement failed: File segment
length cannot be negative (got -27045427)
    at scala.Predef$.require(Predef.scala:233)
    at org.apache.spark.storage.FileSegment.<init>(FileSegment.scala:28)
    at org.apache.spark.storage.DiskBlockObjectWriter.fileSegment(DiskBlockObjectWriter.scala:220)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.writeSortedFile(ShuffleExternalSorter.java:184)
    at org.apache.spark.shuffle.sort.ShuffleExternalSorter.closeAndGetSpills(ShuffleExternalSorter.java:398)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.closeAndWriteOutput(UnsafeShuffleWriter.java:206)
    at org.apache.spark.shuffle.sort.UnsafeShuffleWriter.write(UnsafeShuffleWriter.java:166)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
I understand that this is because the shuffle block is > 2GB: the segment
length overflows the Int range, goes negative, and triggers the above
exception (see the sketch below).
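
(A minimal sketch of the suspected arithmetic; this is an illustration,
not the actual Spark code path:)

    // A length beyond Int.MaxValue (~2.1GB) wraps to a negative value
    // when truncated to Int.
    val blockLength: Long = 3L * 1024 * 1024 * 1024    // 3GB segment
    val truncated: Int = blockLength.toInt             // -1073741824
    require(truncated >= 0,
      s"File segment length cannot be negative (got $truncated)")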

Can someone throw light on this? What is the fix for it?

Thanks,
Padma CH


Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Pete Robbins
This has failed regularly on our 1.6 stream builds
(https://issues.apache.org/jira/browse/SPARK-6005); it looks fixed in 2.0?

On Wed, 22 Jun 2016 at 11:15 Sean Owen  wrote:

> Oops, one more in the "does anybody else see this" department:
>
> - offset recovery *** FAILED ***
>   recoveredOffsetRanges.forall(...) was false: Recovered ranges are not
>   the same as the ones generated (DirectKafkaStreamSuite.scala:301)
>
> This actually fails consistently for me too in the Kafka integration
> code. Not timezone related, I think.
>
> On Wed, Jun 22, 2016 at 9:02 AM, Sean Owen  wrote:
> > I'm fairly convinced that this error, and others that appear
> > timestamp-related, are an environment problem. This test and method
> > have been present for several Spark versions without change. I
> > reviewed the logic and it seems sound: it explicitly sets the time
> > zone correctly. I am not sure why it behaves differently on this
> > machine.
> >
> > I'd give a +1 to this release if nobody else is seeing errors like
> > this. The sigs, hashes, and other tests pass for me.
> >
> > On Tue, Jun 21, 2016 at 6:49 PM, Sean Owen  wrote:
> >> UIUtilsSuite:
> >> - formatBatchTime *** FAILED ***
> >>   "2015/05/14 [14]:04:40" did not equal "2015/05/14 [21]:04:40"
> >> (UIUtilsSuite.scala:73)
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
>
>


Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean Owen
Oops, one more in the "does anybody else see this" department:

- offset recovery *** FAILED ***
  recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
    Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
      earlierOffsetRangesAsSets.contains(
        scala.Tuple2.apply[org.apache.spark.streaming.Time,
          scala.collection.immutable.Set[org.apache.spark.streaming.kafka.OffsetRange]](or._1,
        scala.this.Predef.refArrayOps[org.apache.spark.streaming.kafka.OffsetRange](or._2)
          .toSet[org.apache.spark.streaming.kafka.OffsetRange]
  was false: Recovered ranges are not the same as the ones generated
  (DirectKafkaStreamSuite.scala:301)
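
(In readable form, the failing check asserts roughly the following; this
is a paraphrase using the test's names, not the actual suite code:)

    val ok = recoveredOffsetRanges.forall { case (time, ranges) =>
      earlierOffsetRangesAsSets.contains((time, ranges.toSet))
    }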

This actually fails consistently for me too in the Kafka integration
code. Not timezone related, I think.

On Wed, Jun 22, 2016 at 9:02 AM, Sean Owen  wrote:
> I'm fairly convinced that this error, and others that appear
> timestamp-related, are an environment problem. This test and method have
> been present for several Spark versions without change. I reviewed the
> logic and it seems sound: it explicitly sets the time zone correctly.
> I am not sure why it behaves differently on this machine.
>
> I'd give a +1 to this release if nobody else is seeing errors like
> this. The sigs, hashes, and other tests pass for me.
>
> On Tue, Jun 21, 2016 at 6:49 PM, Sean Owen  wrote:
>> UIUtilsSuite:
>> - formatBatchTime *** FAILED ***
>>   "2015/05/14 [14]:04:40" did not equal "2015/05/14 [21]:04:40"
>> (UIUtilsSuite.scala:73)

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.6.2 (RC2)

2016-06-22 Thread Sean Owen
I'm fairly convinced that this error, and others that appear
timestamp-related, are an environment problem. This test and method have
been present for several Spark versions without change. I reviewed the
logic and it seems sound: it explicitly sets the time zone correctly.
I am not sure why it behaves differently on this machine.

I'd give a +1 to this release if nobody else is seeing errors like
this. The sigs, hashes, and other tests pass for me.
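
(For illustration, a minimal sketch of why such a formatting test is
environment-sensitive; this is not the actual UIUtils code, but the
7-hour gap matches the [14] vs [21] diff quoted below:)

    import java.text.SimpleDateFormat
    import java.util.{Date, TimeZone}

    val fmt = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss")
    val instant = new Date(1431637480000L)  // 2015-05-14 21:04:40 UTC

    fmt.setTimeZone(TimeZone.getTimeZone("America/Los_Angeles"))
    fmt.format(instant)  // "2015/05/14 14:04:40" (PDT = UTC-7)
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"))
    fmt.format(instant)  // "2015/05/14 21:04:40"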

On Tue, Jun 21, 2016 at 6:49 PM, Sean Owen  wrote:
> UIUtilsSuite:
> - formatBatchTime *** FAILED ***
>   "2015/05/14 [14]:04:40" did not equal "2015/05/14 [21]:04:40"
> (UIUtilsSuite.scala:73)

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org