Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Wenchen Fan
see https://issues.apache.org/jira/browse/SPARK-19611

On Mon, Apr 24, 2017 at 2:22 PM, Holden Karau  wrote:

> What's the regression this fixed in 2.1 from 2.0?
>
> On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan 
> wrote:
>
>> IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will
>> scan all table files only once, and write the inferred schema back to the
>> metastore so that we don't need to do the schema inference again.
>>
>> So technically this will introduce a performance regression for the first
>> query, but compared to branch-2.0, it's not a performance regression. And
>> this patch fixed a regression in branch-2.1 for queries that run fine in
>> branch-2.0. Personally, I think we should keep INFER_AND_SAVE as the
>> default mode.
>>
>> + [Eric], what do you think?
>>
>>> On Sat, Apr 22, 2017 at 1:37 AM, Michael Armbrust wrote:
>>
>>> Thanks for pointing this out, Michael. Based on the conversation on
>>> the PR, this seems like a risky change to include in a release branch
>>> with a default other than NEVER_INFER.
>>>
>>> +Wenchen?  What do you think?
>>>
>>> On Thu, Apr 20, 2017 at 4:14 PM, Michael Allman 
>>> wrote:
>>>
 We've identified the cause of the change in behavior. It is related to
 the SQL conf key "spark.sql.hive.caseSensitiveInferenceMode". This key
 and its related functionality were absent from our previous build. The
 default setting in the current build was causing Spark to attempt to scan
 all table files during query analysis. Changing this setting to NEVER_INFER
 disabled this operation and resolved the issue we had.

 Michael


 On Apr 20, 2017, at 3:42 PM, Michael Allman 
 wrote:

 I want to caution that in testing a build from this morning's
 branch-2.1 we found that Hive partition pruning was not working. We found
 that Spark SQL was fetching all Hive table partitions for a very simple
 query whereas in a build from several weeks ago it was fetching only the
 required partitions. I cannot currently think of a reason for the
 regression outside of some difference between branch-2.1 from our previous
 build and branch-2.1 from this morning.

 That's all I know right now. We are actively investigating to find the
 root cause of this problem, and specifically whether this is a problem in
 the Spark codebase or not. I will report back when I have an answer to that
 question.

 Michael


 On Apr 18, 2017, at 11:59 AM, Michael Armbrust 
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00
 PST and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.1.1
 [ ] -1 Do not release this package because ...


 To learn more about Apache Spark, please see http://spark.apache.org/

 The tag to be voted on is v2.1.1-rc3 (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
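
 To double-check the tag locally, something like this should work (a sketch,
 assuming a local clone with the apache/spark repo as "origin"):

 $ git fetch origin tag v2.1.1-rc3
 $ git rev-parse 'v2.1.1-rc3^{commit}'
 2ed19cff2f6ab79a718526e5d16633412d8c4dd4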

 List of JIRA tickets resolved can be found with this filter.

 The release files, including signatures, digests, etc. can be found at:
 http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc
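
 A minimal signature check, assuming one of the binary artifacts and its .asc
 have been downloaded from the URL above (exact file names may differ):

 $ curl -O https://people.apache.org/keys/committer/pwendell.asc
 $ gpg --import pwendell.asc
 $ gpg --verify spark-2.1.1-bin-hadoop2.7.tgz.asc spark-2.1.1-bin-hadoop2.7.tgz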

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1230/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/


 *FAQ*

 *How can I help test this release?*

 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload, running it on this release candidate, and then
 reporting any regressions.
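
 As a bare-minimum sanity check (a sketch only; a real existing workload is
 far more valuable), something like this from a spark-shell started on the RC:

 val df = spark.range(0, 1000).selectExpr("id % 10 as k", "id as v")
 df.groupBy("k").sum("v").show()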

 *What should happen to JIRA tickets still targeting 2.1.1?*

 Committers should look at those and triage. Extremely important bug
 fixes, documentation, and API tweaks that impact compatibility should be
 worked on immediately. Everything else please retarget to 2.1.2 or 2.2.0.
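
 A JQL filter along these lines should surface them (assuming the "Target
 Version/s" field used by the Spark JIRA):

 project = SPARK AND "Target Version/s" = "2.1.1" AND resolution = Unresolved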

 *But my bug isn't fixed!??!*

 In order to make timely releases, we will typically not hold the
 release unless the bug in question is a regression from 2.1.0.

 *What happened to RC1?*

 There were issues with the release packaging, and as a result it was
 skipped.





Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Michael Armbrust
Yeah, I agree.

-1 (binding)

This vote fails, and I'll cut a new RC after #17749 is merged.

On Mon, Apr 24, 2017 at 12:18 PM, Eric Liang  wrote:

> -1 (non-binding)
>
> I also agree with using NEVER_INFER for 2.1.1. The migration cost is
> unexpected for a point release.
>
Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
Whoops, sorry, finger slipped on that last message.
It sounds like whatever we do is going to break some existing users (either
those relying on case-sensitive table schemas or those hit by the unexpected
scan).

Personally I agree with Michael Allman on this: I believe we should
use NEVER_INFER for 2.1.1.


Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
It

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Michael Allman
The trouble we ran into is that this upgrade was blocking access to our tables, 
and we didn't know why. This sounds like a kind of migration operation, but it 
was not apparent that this was the case. It took an expert examining a stack 
trace and source code to figure this out. Would a more naive end user be able 
to debug this issue? Maybe we're an unusual case, but our particular experience 
was pretty bad. I have my doubts that the schema inference on our largest 
tables would ever complete without throwing some kind of timeout (which we were 
in fact receiving) or the end user just giving up and killing our job. We ended 
up doing a rollback while we investigated the source of the issue. In our case, 
NEVER_INFER is clearly the best configuration. We're going to add that to our 
default configuration files.
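
For reference, the corresponding spark-defaults.conf entry would presumably be 
a single line:

spark.sql.hive.caseSensitiveInferenceMode  NEVER_INFER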

My expectation is that a minor point release is a pretty safe bug fix release. 
We were a bit hasty in not doing better due diligence pre-upgrade.

One suggestion the Spark team might consider is releasing 2.1.1 with 
NEVER_INFER and 2.2.0 with INFER_AND_SAVE. Clearly some kind of up-front 
migration note would help users identify this new behavior in 2.2.

Thanks,

Michael



Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
What's the regression this fixed in 2.1 from 2.0?


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-21 Thread Wenchen Fan
IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will
scan all table files only once, and write the inferred schema back to the
metastore so that we don't need to do the schema inference again.

So technically this will introduce a performance regression for the first
query, but compared to branch-2.0, it's not a performance regression. And
this patch fixed a regression in branch-2.1 for queries that run fine in
branch-2.0. Personally, I think we should keep INFER_AND_SAVE as the
default mode.

+ [Eric], what do you think?
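
As a concrete sketch of the trade-off (assuming a Spark 2.1.x build with Hive 
support; the mode is an ordinary SQL conf):

import org.apache.spark.sql.SparkSession

// INFER_AND_SAVE: pay the one-time scan of the table files, then persist the
// inferred case-sensitive schema back to the metastore for later sessions.
// NEVER_INFER: skip the scan entirely and trust the metastore schema as-is.
val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.sql.hive.caseSensitiveInferenceMode", "INFER_AND_SAVE")
  .getOrCreate()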



Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-21 Thread Michael Armbrust
Thanks for pointing this out, Michael. Based on the conversation on the PR,
this seems like a risky change to include in a release branch with a default
other than NEVER_INFER.

+Wenchen?  What do you think?



Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
We've identified the cause of the change in behavior. It is related to the SQL 
conf key "spark.sql.hive.caseSensitiveInferenceMode". This key and its related 
functionality were absent from our previous build. The default setting in the 
current build was causing Spark to attempt to scan all table files during query 
analysis. Changing this setting to NEVER_INFER disabled this operation and 
resolved the issue we had.

Michael
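
A sketch of that workaround from a live spark-shell (assuming the conf is 
settable at runtime; the same key can also go into spark-defaults.conf):

spark.conf.set("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")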




Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Michael Allman
I want to caution that in testing a build from this morning's branch-2.1 we 
found that Hive partition pruning was not working. We found that Spark SQL was 
fetching all Hive table partitions for a very simple query whereas in a build 
from several weeks ago it was fetching only the required partitions. I cannot 
currently think of a reason for the regression outside of some difference 
between branch-2.1 from our previous build and branch-2.1 from this morning.

That's all I know right now. We are actively investigating to find the root 
cause of this problem, and specifically whether this is a problem in the Spark 
codebase or not. I will report back when I have an answer to that question.

Michael





Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Nicholas Chammas
Steve,

I think you're a good person to ask about this. Is the below any cause for
concern? Or did I perhaps test this incorrectly?

Nick


On Tue, Apr 18, 2017 at 11:50 PM Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> I had trouble starting up a shell with the AWS package loaded
> (specifically, org.apache.hadoop:hadoop-aws:2.7.3):
>
>
> [NOT FOUND  ] 
> com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
>
>  local-m2-cache: tried
>
>   
> file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
>
> [NOT FOUND  ] 
> org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
>
>  local-m2-cache: tried
>
>   
> file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
>
> [NOT FOUND  ] 
> com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
>
>  local-m2-cache: tried
>
>   
> file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar
>
> ::
>
> ::  FAILED DOWNLOADS::
>
> :: ^ see resolution messages for details  ^ ::
>
> ::
>
> :: com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle)
>
> :: org.codehaus.jettison#jettison;1.1!jettison.jar(bundle)
>
> :: com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar
>
> :: com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle)
>
> ::
>
> Anyone know anything about this? I made sure to build Spark against the
> appropriate version of Hadoop.
>
> Nick
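
For context, the shell startup that produced the failures above was presumably 
along these lines (the exact invocation isn't shown in the thread):

$ ./bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3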


Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-20 Thread Denny Lee
+1 (non-binding)


>


Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Dong Joon Hyun
+1

I tested RC3 on CentOS 7.3.1611 / OpenJDK 1.8.0_121 / R 3.3.3
with `-Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver -Psparkr`.

At the end of the R tests, I saw `Had CRAN check errors; see logs.`,
but the tests passed and the log file looks good.

Bests,
Dongjoon.
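
Spelled out, that presumably corresponds to a build along the lines of:

$ build/mvn -Pyarn -Phadoop-2.7 -Pkinesis-asl -Phive -Phive-thriftserver \
    -Psparkr clean package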



Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Reynold Xin
+1



Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Marcelo Vanzin
+1 (non-binding).

Ran the hadoop-2.6 binary against our internal tests and things look good.

-- 
Marcelo




Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Kazuaki Ishizaki
+1 (non-binding)

I tested it on Ubuntu 16.04 and openjdk8 on ppc64le. All of the tests for 
core have passed.

$ java -version
openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 
1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)
$ build/mvn -DskipTests -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 
package install
$ build/mvn -Phive -Phive-thriftserver -Pyarn -Phadoop-2.7 test -pl core
...
Total number of tests run: 1788
Suites: completed 198, aborted 0
Tests: succeeded 1788, failed 0, canceled 4, ignored 8, pending 0
All tests passed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 16:38 min
[INFO] Finished at: 2017-04-19T18:17:43+09:00
[INFO] Final Memory: 56M/672M
[INFO] ------------------------------------------------------------------------

Regards,
Kazuaki Ishizaki,



From:   Michael Armbrust <mich...@databricks.com>
To: "dev@spark.apache.org" <dev@spark.apache.org>
Date:   2017/04/19 04:00
Subject:        [VOTE] Apache Spark 2.1.1 (RC3)



Please vote on releasing the following candidate as Apache Spark version 
2.1.1. The vote is open until Fri, April 21st, 2018 at 13:00 PST and 
passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.1
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.1-rc3 (
2ed19cff2f6ab79a718526e5d16633412d8c4dd4)

List of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1230/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/


FAQ

How can I help test this release?

If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then 
reporting any regressions.

What should happen to JIRA tickets still targeting 2.1.1?

Committers should look at those and triage. Extremely important bug fixes, 
documentation, and API tweaks that impact compatibility should be worked 
on immediately. For everything else, please retarget to 2.1.2 or 2.2.0.

But my bug isn't fixed!??!

In order to make timely releases, we will typically not hold the release 
unless the bug in question is a regression from 2.1.0.

What happened to RC1?

There were issues with the release packaging and as a result it was skipped.




Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-19 Thread Sean Owen
+1 from me -- this worked unusually smoothly on the first try.

Sigs and license and so forth look OK. Tests pass with Java 8, Ubuntu 17,
-Phive -Phadoop-2.7 -Pyarn.

I had to run the build with -Xss2m to get this test to pass, though the
failure may be specific to my environment:

- SPARK-16845: GeneratedClass$SpecificOrdering grows beyond 64 KB *** FAILED ***
  com.google.common.util.concurrent.ExecutionError: java.lang.StackOverflowError
  at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2261)
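
One way to apply that flag is via MAVEN_OPTS before launching the build;
this is only a sketch (the exact mechanism used above is not stated), and
if the failing suite runs in a forked test JVM the option may instead need
to go through the test runner's JVM arguments:

$ # Raise the JVM thread stack size to 2 MB for the Maven-launched build
$ export MAVEN_OPTS="-Xss2m"
$ build/mvn -Phive -Phadoop-2.7 -Pyarn test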


On Tue, Apr 18, 2017 at 7:59 PM Michael Armbrust wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.1
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.1-rc3 (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
>
> The list of JIRA tickets resolved can be found with this filter.
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1230/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.1?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. For everything else, please retarget to 2.1.2 or 2.2.0.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.0.
>
> *What happened to RC1?*
>
> There were issues with the release packaging and as a result it was skipped.
>


Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-18 Thread Nicholas Chammas
I had trouble starting up a shell with the AWS package loaded
(specifically, org.apache.hadoop:hadoop-aws:2.7.3):


[NOT FOUND  ] com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle) (0ms)
 local-m2-cache: tried
  file:/home/ec2-user/.m2/repository/com/sun/jersey/jersey-server/1.9/jersey-server-1.9.jar
[NOT FOUND  ] org.codehaus.jettison#jettison;1.1!jettison.jar(bundle) (1ms)
 local-m2-cache: tried
  file:/home/ec2-user/.m2/repository/org/codehaus/jettison/jettison/1.1/jettison-1.1.jar
[NOT FOUND  ] com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar (0ms)
 local-m2-cache: tried
  file:/home/ec2-user/.m2/repository/com/sun/xml/bind/jaxb-impl/2.2.3-1/jaxb-impl-2.2.3-1.jar

::::::::::::::::::::::::::::::::::::::::::::::
::              FAILED DOWNLOADS            ::
:: ^ see resolution messages for details  ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle)
:: org.codehaus.jettison#jettison;1.1!jettison.jar(bundle)
:: com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar
:: com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle)
::::::::::::::::::::::::::::::::::::::::::::::

Anyone know anything about this? I made sure to build Spark against the
appropriate version of Hadoop.

Nick
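
The exact shell invocation isn't shown above; a hypothetical reproduction
that pulls the same package through Ivy would be:

$ # Hypothetical reproduction: --packages resolves hadoop-aws and its
$ # transitive dependencies through Ivy at shell startup
$ ./bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.3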

On Tue, Apr 18, 2017 at 2:59 PM Michael Armbrust wrote:

Please vote on releasing the following candidate as Apache Spark version
> 2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and
> passes if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.1.1
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.1.1-rc3 (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)
>
> The list of JIRA tickets resolved can be found with this filter.
>
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1230/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/
>
>
> *FAQ*
>
> *How can I help test this release?*
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> *What should happen to JIRA tickets still targeting 2.1.1?*
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. For everything else, please retarget to 2.1.2 or 2.2.0.
>
> *But my bug isn't fixed!??!*
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.0.
>
> *What happened to RC1?*
>
> There were issues with the release packaging and as a result it was skipped.
>


[VOTE] Apache Spark 2.1.1 (RC3)

2017-04-18 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version
2.1.1. The vote is open until Fri, April 21st, 2017 at 13:00 PST and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.1.1
[ ] -1 Do not release this package because ...


To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.1.1-rc3 (2ed19cff2f6ab79a718526e5d16633412d8c4dd4)

The list of JIRA tickets resolved can be found with this filter.

The release files, including signatures, digests, etc. can be found at:
http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc
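
To check a downloaded artifact against that key, something along these
lines should work (the tarball name below is assumed from the usual naming
convention, not taken from the listing above):

$ wget https://people.apache.org/keys/committer/pwendell.asc
$ gpg --import pwendell.asc
$ # Verify the detached .asc signature shipped alongside the artifact
$ gpg --verify spark-2.1.1-bin-hadoop2.7.tgz.asc spark-2.1.1-bin-hadoop2.7.tgz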

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1230/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-docs/


*FAQ*

*How can I help test this release?*

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running it on this release candidate, then
reporting any regressions.
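
As a rough sketch, that can be as simple as unpacking the RC binaries and
pointing an existing job at them (the tarball name follows the usual
convention; the job class and jar below are placeholders):

$ wget http://home.apache.org/~pwendell/spark-releases/spark-2.1.1-rc3-bin/spark-2.1.1-bin-hadoop2.7.tgz
$ tar -xzf spark-2.1.1-bin-hadoop2.7.tgz
$ # Run an existing workload on the candidate and compare results and
$ # performance against the previous release
$ spark-2.1.1-bin-hadoop2.7/bin/spark-submit --class com.example.MyJob my-job.jar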

*What should happen to JIRA tickets still targeting 2.1.1?*

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. For everything else, please retarget to 2.1.2 or 2.2.0.

*But my bug isn't fixed!??!*

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.1.0.

*What happened to RC1?*

There were issues with the release packaging and as a result it was skipped.