[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl

2015-04-27 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/5715#discussion_r29126734
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -81,11 +81,38 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
-  override protected[sql] def executePlan(plan: LogicalPlan): 
this.QueryExecution =
-new this.QueryExecution(plan)
+  /* A catalyst metadata catalog that points to the Hive Metastore. */
+  @transient
+  override protected[sql] lazy val catalog = new 
HiveMetastoreCatalog(this) with OverrideCatalog
--- End diff --

reorder to keep catalog, functionRegistry, analyzer, and sqlParser together





[GitHub] spark pull request: [SPARK-6435] spark-shell --jars option does no...

2015-04-27 Thread tsudukim
Github user tsudukim commented on a diff in the pull request:

https://github.com/apache/spark/pull/5227#discussion_r29132469
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java ---
@@ -260,15 +260,14 @@ static String quoteForBatchScript(String arg) {
 quoted.append('"');
 break;
 
-  case '=':
--- End diff --

I've run `SparkLauncherSuite` on Windows and it's OK.
If double quotation marks are parsed properly, `=` inside double quotation marks 
does not need to be escaped.





[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-27 Thread selvinsource
Github user selvinsource commented on the pull request:

https://github.com/apache/spark/pull/3062#issuecomment-96516889
  
@mengxr for SVM, I manually tried what you suggested and it looks good.

I loaded the example below into JPMML and evaluated it as a classification map; 
indeed, the intercept on the NO category acts as the threshold when 
`normalizationMethod = none`.
Here is the example:
```
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <Header description="linear SVM: if predicted value &gt; 0, the outcome is positive, or negative otherwise">
        <Application name="Apache Spark MLlib" version="1.4.0-SNAPSHOT"/>
        <Timestamp>2015-04-27T06:58:22</Timestamp>
    </Header>
    <DataDictionary numberOfFields="10">
        <DataField name="field_0" optype="continuous" dataType="double"/>
        <DataField name="field_1" optype="continuous" dataType="double"/>
        <DataField name="field_2" optype="continuous" dataType="double"/>
        <DataField name="field_3" optype="continuous" dataType="double"/>
        <DataField name="field_4" optype="continuous" dataType="double"/>
        <DataField name="field_5" optype="continuous" dataType="double"/>
        <DataField name="field_6" optype="continuous" dataType="double"/>
        <DataField name="field_7" optype="continuous" dataType="double"/>
        <DataField name="field_8" optype="continuous" dataType="double"/>
        <DataField name="target" optype="categorical" dataType="string"/>
    </DataDictionary>
    <RegressionModel modelName="linear SVM: if predicted value &gt; 0, the outcome is positive, or negative otherwise" functionName="classification" normalizationMethod="none">
        <MiningSchema>
            <MiningField name="field_0" usageType="active"/>
            <MiningField name="field_1" usageType="active"/>
            <MiningField name="field_2" usageType="active"/>
            <MiningField name="field_3" usageType="active"/>
            <MiningField name="field_4" usageType="active"/>
            <MiningField name="field_5" usageType="active"/>
            <MiningField name="field_6" usageType="active"/>
            <MiningField name="field_7" usageType="active"/>
            <MiningField name="field_8" usageType="active"/>
            <MiningField name="target" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="-1.2973802920137774" targetCategory="1">
            <NumericPredictor name="field_0" coefficient="-0.0818303650185629"/>
            <NumericPredictor name="field_1" coefficient="0.5609579878511747"/>
            <NumericPredictor name="field_2" coefficient="0.1382792114252377"/>
            <NumericPredictor name="field_3" coefficient="0.07497131265977852"/>
            <NumericPredictor name="field_4" coefficient="-0.47760356523751296"/>
            <NumericPredictor name="field_5" coefficient="0.3817837986572615"/>
            <NumericPredictor name="field_6" coefficient="-0.23753782335208481"/>
            <NumericPredictor name="field_7" coefficient="0.2548602390316011"/>
            <NumericPredictor name="field_8" coefficient="-0.10271528637619945"/>
        </RegressionTable>
        <RegressionTable intercept="0.0" targetCategory="0"/>
    </RegressionModel>
</PMML>
```

However, I noticed that if the SVM model threshold is set to None, it simply 
outputs the margin (which is how the PMML exporter is implemented now).
My question is: should we support both? If `threshold = None`, export as 
regression (as implemented now); if `threshold != None`, export as binary 
classification (as you suggested). What do you think?
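
A minimal sketch of that branching, purely for illustration; the names `SvmLikeModel`, `exportRegression`, and `exportBinaryClassification` are hypothetical, not the actual exporter API:

```
object PmmlExportSketch {
  // Hypothetical model shape: weights, intercept, and an optional decision threshold.
  case class SvmLikeModel(weights: Array[Double], intercept: Double, threshold: Option[Double])

  def exportPmml(model: SvmLikeModel): String = model.threshold match {
    case None =>
      // No threshold: the prediction is the raw margin, so export as regression.
      exportRegression(model.weights, model.intercept)
    case Some(t) =>
      // Threshold set: export as binary classification; the threshold would be
      // encoded in the PMML (e.g. via the second RegressionTable's intercept
      // when normalizationMethod="none").
      exportBinaryClassification(model.weights, model.intercept, t)
  }

  // Illustrative stubs so the sketch compiles.
  private def exportRegression(w: Array[Double], b: Double): String =
    """<RegressionModel functionName="regression" ...>"""
  private def exportBinaryClassification(w: Array[Double], b: Double, t: Double): String =
    """<RegressionModel functionName="classification" normalizationMethod="none" ...>"""
}
```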









[GitHub] spark pull request: [SQL][Minor] fix java doc for DataFrame.agg

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5712#issuecomment-96516809
  
Jenkins, test this please.





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread mag-
Github user mag- commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96587264
  
Are you aware that all these regexp hacks will break when Hadoop changes its 
version to 3.0.0?






[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5717#issuecomment-96597976
  
  [Test build #30964 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30964/consoleFull)
 for   PR 5717 at commit 
[`fc862f4`](https://github.com/apache/spark/commit/fc862f421b5cdbac18535fa09a2af668a5fc74d9).





[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...

2015-04-27 Thread jerryshao
Github user jerryshao commented on the pull request:

https://github.com/apache/spark/pull/4723#issuecomment-96580759
  
Hi @davies and @tdas, I've hit a problem converting a Python `int` into a Java 
`Long`. The Java API in KafkaUtils requires the offset as a `Long`. This is 
simple for Python 2, since Python 2 has a built-in `long` type which py4j maps 
to a Java `Long` automatically, but Python 3 only has `int`, and py4j maps a 
Python `int` to a Java `Integer`, so I'm not sure how to support `Long` in Python 3.

A simple solution is to change the whole Java-Python interface to use `Integer`, 
but that may not support very large offsets. I'm not sure whether there is any 
other solution. Sorry for the dumb question, and thanks a lot in advance.
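
One possible workaround, sketched below purely as an illustration (the helper name is made up, and this is not necessarily what the PR ends up doing): accept `java.lang.Number` on the JVM side and widen it to `Long`, so that it works whether py4j delivers an `Integer` or a `Long`:

```
// Hypothetical JVM-side helper: tolerate whatever integral type py4j hands over.
object KafkaOffsetHelper {
  def toOffset(value: java.lang.Number): Long = value.longValue()

  def createOffsetRange(
      topic: String,
      partition: Int,
      fromOffset: java.lang.Number,
      untilOffset: java.lang.Number): (String, Int, Long, Long) = {
    (topic, partition, fromOffset.longValue(), untilOffset.longValue())
  }
}
```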





[GitHub] spark pull request: Add filter by location boundingbox in TwitterI...

2015-04-27 Thread yang0228
GitHub user yang0228 opened a pull request:

https://github.com/apache/spark/pull/5718

Add filter by location boundingbox in TwitterInputDStream.scala

The current TwitterInputDStream only filters by keywords.
A filter-by-location (bounding box) feature is needed.
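
Purely as an illustration of what such a filter could look like with Twitter4J (the coordinates below are placeholders, and this is not the code in this PR), assuming `FilterQuery.locations` takes a `double[][]` bounding box:

```
import twitter4j.FilterQuery

// Bounding box as (longitude, latitude) pairs: south-west corner first,
// then north-east corner. The values here are placeholders.
val boundingBox: Array[Array[Double]] = Array(
  Array(-122.75, 36.8),
  Array(-121.75, 37.8)
)

val locationFilter = new FilterQuery().locations(boundingBox)
```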

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/spark branch-1.3

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5718.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5718


commit f476108901c42ea61873f02dc2fee15896550d30
Author: q00251598 qiyad...@huawei.com
Date:   2015-03-02T18:13:11Z

[SPARK-5741][SQL] Support the path contains comma in HiveContext

When running ```select * from nzhang_part where hr = 'file,';```, it throws the 
exception ```java.lang.IllegalArgumentException: Can not create a Path from an 
empty string``` because the HDFS path contains a comma and 
FileInputFormat.setInputPaths splits paths on commas.
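
A small sketch of the failure mode (an illustration, not the patch itself): the comma-separated String overload of `setInputPaths` splits a partition directory such as `hr=file,` into two entries, one of them empty, while the `Path`-based overload leaves the name intact:

```
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

val conf = new JobConf()
val dir = "/warehouse/nzhang_part/ds=2010-08-15/hr=file,"  // hypothetical partition path

// String overload: the argument is treated as a comma-separated list, so the
// trailing comma produces an empty path and Path.checkPathArg() throws
// "Can not create a Path from an empty string".
// FileInputFormat.setInputPaths(conf, dir)

// Path overload: no comma splitting, so a directory whose name contains a
// comma is passed through as a single input path.
FileInputFormat.setInputPaths(conf, new Path(dir))
```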

### SQL
```
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

create table nzhang_part like srcpart;

insert overwrite table nzhang_part partition (ds='2010-08-15', hr) select 
key, value, hr from srcpart where ds='2008-04-08';

insert overwrite table nzhang_part partition (ds='2010-08-15', hr=11) 
select key, value from srcpart where ds='2008-04-08';

insert overwrite table nzhang_part partition (ds='2010-08-15', hr)
select * from (
select key, value, hr from srcpart where ds='2008-04-08'
union all
select '1' as key, '1' as value, 'file,' as hr from src limit 1) s;

select * from nzhang_part where hr = 'file,';
```

### Error Log
```
15/02/10 14:33:16 ERROR SparkSQLDriver: Failed in [select * from 
nzhang_part where hr = 'file,']
java.lang.IllegalArgumentException: Can not create a Path from an empty 
string
at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
at org.apache.hadoop.fs.Path.<init>(Path.java:135)
at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
at 
org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
at 
org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:251)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
at 
org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
at 
org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:172)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196)

Author: q00251598 qiyad...@huawei.com

Closes #4532 from watermen/SPARK-5741 and squashes the following commits:

9758ab1 [q00251598] fix bug
1db1a1c [q00251598] use setInputPaths(Job job, Path... inputPaths)
b788a72 [q00251598] change FileInputFormat.setInputPaths to jobConf.set and 
add test suite

(cherry picked from commit 9ce12aaf283a2793e719bdc956dd858922636e8d)
Signed-off-by: Michael Armbrust mich...@databricks.com

commit 4ffaf856882fb1f4a5bfc24e5a05c74ba950e282
Author: Yanbo Liang yblia...@gmail.com
Date:   2015-03-02T18:17:24Z

[SPARK-6080] [PySpark] correct LogisticRegressionWithLBFGS regType 
parameter for pyspark

Currently LogisticRegressionWithLBFGS in 
python/pyspark/mllib/classification.py invokes callMLlibFunc with a wrong 
regType parameter.
It was assigned str(regType), which translates None (Python) into the string 
"None" (Java/Scala). The right way is to translate None (Python) into null 
(Java/Scala), just as we do in LogisticRegressionWithSGD.

Author: Yanbo Liang yblia...@gmail.com

Closes #4831 from yanboliang/pyspark_classification and squashes the 
following commits:

12db65a [Yanbo Liang] correct LogisticRegressionWithLBFGS regType parameter 
for pyspark

(cherry picked from commit af2effdd7b54316af0c02e781911acfb148b962b)
Signed-off-by: Xiangrui Meng m...@databricks.com

commit 54ac243655d2eaf331d9f8fc43a8c1301803320b
Author: Paul Power paul.po...@peerside.com
Date:   2015-03-02T21:08:47Z

[DOCS] Refactored Dataframe join comment to use correct parameter ordering

The API signature for join requires the JoinType to be the third parameter. 
The code examples provided for join show JoinType being provided as the 2nd 
parameter, resulting in errors (i.e. df1.join(df2, 

[GitHub] spark pull request: [SPARK-6435] spark-shell --jars option does no...

2015-04-27 Thread tsudukim
Github user tsudukim commented on the pull request:

https://github.com/apache/spark/pull/5227#issuecomment-96598393
  
The problem I mentioned was that the spark-shell.cmd invoked by 
`SparkLauncherSuite` somehow failed to launch the test application.
It turned out to be caused by a limitation of Windows batch: a single command 
line must be shorter than 8192 characters. (The full classpath was long because 
I was working in a deeply nested folder.)

So I assume all issues are now cleared up. Sorry for my late response.





[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96611185
  
@mag- if you're talking about what I think you are, it was a temporary 
thing that's long since gone already 
https://github.com/apache/spark/pull/629/files





[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...

2015-04-27 Thread adrian-wang
GitHub user adrian-wang opened a pull request:

https://github.com/apache/spark/pull/5717

[SPARK-7165] [SQL] use sort merge join for outer join

This is an extended version of #5208.
In this patch, we introduce sort merge join not only for inner joins but also 
for left outer, right outer, and full outer joins.
Using sort merge join can resolve the OOMs that are common when memory becomes 
too small for joins of large tables.

Test cases are already available in SortMergeCompatibilitySuite, and we need 
to add some more in `JoinSuite` to test the join selection.
Also, this patch would benefit quite a lot from #3438.
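
Purely as an illustration of the idea (not the physical operator in this patch), a full outer merge of two key-sorted inputs with unique keys per side can be sketched as:

```
// Simplified full-outer sort-merge: both inputs are sorted by key and each key
// appears at most once per side; a real operator also has to handle duplicate
// keys, buffering, and spilling.
def fullOuterMerge[L, R](
    left: IndexedSeq[(Int, L)],
    right: IndexedSeq[(Int, R)]): Seq[(Int, Option[L], Option[R])] = {
  val out = scala.collection.mutable.ArrayBuffer.empty[(Int, Option[L], Option[R])]
  var i = 0
  var j = 0
  while (i < left.length || j < right.length) {
    if (j >= right.length || (i < left.length && left(i)._1 < right(j)._1)) {
      out += ((left(i)._1, Some(left(i)._2), None)); i += 1            // left row with no match
    } else if (i >= left.length || right(j)._1 < left(i)._1) {
      out += ((right(j)._1, None, Some(right(j)._2))); j += 1          // right row with no match
    } else {
      out += ((left(i)._1, Some(left(i)._2), Some(right(j)._2))); i += 1; j += 1  // matching keys
    }
  }
  out.toSeq
}
```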

/cc @chenghao-intel


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adrian-wang/spark outersmj

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5717


commit fc862f421b5cdbac18535fa09a2af668a5fc74d9
Author: Daoyuan Wang daoyuan.w...@intel.com
Date:   2015-04-27T09:40:55Z

use sort merge join for outer join







[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5715#issuecomment-96597552
  
  [Test build #30962 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30962/consoleFull)
 for   PR 5715 at commit 
[`f76a7b1`](https://github.com/apache/spark/commit/f76a7b1eb2cec2c922f8a82e3e67da03984e886e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7162][YARN]Launcher error in yarn-clien...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5716#issuecomment-96597549
  
  [Test build #30961 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30961/consoleFull)
 for   PR 5716 at commit 
[`b64564c`](https://github.com/apache/spark/commit/b64564c74248ef137ed3352e145735ce669bccf8).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-7162][YARN]Launcher error in yarn-clien...

2015-04-27 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5716#issuecomment-96613653
  
LGTM





[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5673#issuecomment-96792683
  
Jenkins, test this please.





[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5723#issuecomment-96798651
  
  [Test build #31067 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31067/consoleFull)
 for   PR 5723 at commit 
[`98bfe48`](https://github.com/apache/spark/commit/98bfe48d603c56f45945049b72a484686e2d0be2).





[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...

2015-04-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5377#discussion_r29190170
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java 
---
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.WritableByteChannel;
+import java.util.List;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelOutboundHandlerAdapter;
+import io.netty.channel.ChannelPromise;
+import io.netty.channel.FileRegion;
+import io.netty.handler.codec.MessageToMessageDecoder;
+import io.netty.util.AbstractReferenceCounted;
+import io.netty.util.ReferenceCountUtil;
+
+import org.apache.spark.network.util.ByteArrayWritableChannel;
+import org.apache.spark.network.util.NettyUtils;
+
+class SaslEncryption {
+
+  @VisibleForTesting
+  static final String ENCRYPTION_HANDLER_NAME = "saslEncryption";
+
+  /**
+   * Adds channel handlers that perform encryption / decryption of data 
using SASL.
+   *
+   * @param channel The channel.
+   * @param backend The SASL backend.
+   * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted 
blocks, to control
+   * memory usage.
+   */
+  static void addToChannel(
+  Channel channel,
+  SaslEncryptionBackend backend,
+  int maxOutboundBlockSize) {
+channel.pipeline()
+  .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, 
maxOutboundBlockSize))
+  .addFirst("saslDecryption", new DecryptionHandler(backend))
+  .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder());
+  }
+
+  private static class EncryptionHandler extends 
ChannelOutboundHandlerAdapter {
+
+private final int maxOutboundBlockSize;
+private final SaslEncryptionBackend backend;
+
+EncryptionHandler(SaslEncryptionBackend backend, int 
maxOutboundBlockSize) {
+  this.backend = backend;
+  this.maxOutboundBlockSize = maxOutboundBlockSize;
+}
+
+/**
+ * Wrap the incoming message in an implementation that will perform 
encryption lazily. This is
+ * needed to guarantee ordering of the outgoing encrypted packets - 
they need to be decrypted in
+ * the same order, and netty doesn't have an atomic 
ChannelHandlerContext.write() API, so it
+ * does not guarantee any ordering.
+ */
+@Override
+public void write(ChannelHandlerContext ctx, Object msg, 
ChannelPromise promise)
+  throws Exception {
+
+  ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), 
promise);
+}
+
+@Override
+public void handlerRemoved(ChannelHandlerContext ctx) throws Exception 
{
+  try {
+backend.dispose();
+  } finally {
+super.handlerRemoved(ctx);
+  }
+}
+
+  }
+
+  private static class DecryptionHandler extends 
MessageToMessageDecoder<ByteBuf> {
+
+private final SaslEncryptionBackend backend;
+
+DecryptionHandler(SaslEncryptionBackend backend) {
+  this.backend = backend;
+}
+
+@Override
+protected void decode(ChannelHandlerContext ctx, ByteBuf msg, 
List<Object> out)
+  throws Exception {
+
+  byte[] data;
+  int offset;
+  int length = msg.readableBytes();
+  if (msg.hasArray()) {
+data = msg.array();
+offset = msg.arrayOffset();
+  } else {
+data = new byte[length];
+msg.readBytes(data);
+offset = 0;
+   

[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...

2015-04-27 Thread His-name-is-Joof
Github user His-name-is-Joof commented on the pull request:

https://github.com/apache/spark/pull/5667#issuecomment-96819726
  
Joof
On Apr 27, 2015 2:41 PM, Shivaram Venkataraman notificati...@github.com
wrote:

 @His-name-is-Joof https://github.com/His-name-is-Joof -- Could you let
 me know what your JIRA username is ? I would like to assign this issue to
 your

 —
 Reply to this email directly or view it on GitHub
 https://github.com/apache/spark/pull/5667#issuecomment-96813694.







[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...

2015-04-27 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/4688#issuecomment-96819919
  
@harishreedharan, FYI - we are doing a feature code freeze for Spark 1.4 this 
Friday. I think this is really close, so hopefully we can get it in. Let me 
know if there are any questions or concerns about my comments.





[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0

2015-04-27 Thread punya
Github user punya commented on the pull request:

https://github.com/apache/spark/pull/5726#issuecomment-96794273
  
@srowen I'm sure it will :) I was using the PR to get Jenkins to figure out 
what tests actually break. At that point, I'll add [WIP] to the title and see 
if I can fix the tests.

(Please let me know if there's a different process you'd prefer.)





[GitHub] spark pull request: [SPARK-5891][ML] Add Binarizer ML Transformer

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5699#issuecomment-96797201
  
  [Test build #30972 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30972/consoleFull)
 for   PR 5699 at commit 
[`1682f8c`](https://github.com/apache/spark/commit/1682f8c05965ccbb34472c5d6e01166ce147f730).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0

2015-04-27 Thread punya
Github user punya closed the pull request at:

https://github.com/apache/spark/pull/5726





[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4783#issuecomment-96800849
  
  [Test build #31061 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31061/consoleFull)
 for   PR 4783 at commit 
[`db1e948`](https://github.com/apache/spark/commit/db1e948097b202573cac16a23a8bf22f1d6e2a5b).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6422][STREAMING] support customized act...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5098#issuecomment-96804955
  
  [Test build #31049 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31049/consoleFull)
 for   PR 5098 at commit 
[`4fe04ee`](https://github.com/apache/spark/commit/4fe04ee69cb9f70e2156f18aa59c13774e39a009).
 * This patch **passes all tests**.

 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5727#issuecomment-96813165
  
  [Test build #722 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/722/consoleFull)
 for   PR 5727 at commit 
[`1aae027`](https://github.com/apache/spark/commit/1aae027d640342ca7fb1146f72c9b62aea9f78c6).





[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...

2015-04-27 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/5667#issuecomment-96813694
  
@His-name-is-Joof -- Could you let me know what your JIRA username is? I 
would like to assign this issue to you.





[GitHub] spark pull request: [SPARK-7100][MLLib] Fix persisted RDD leak in ...

2015-04-27 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/5669#issuecomment-96816164
  
After discussing with @mengxr, I think we should not bother with the 
try-finally wrapper.  As mentioned above, the method should generally not fail, 
so the data will be unpersisted as needed.  If an exception is thrown, the data 
will be unpersisted whenever another RDD pushes it out of memory/disk, without 
undue harm to other jobs.

@jimfcarroll  Could you please update the PR to remove the try-finally 
wrapper, but keep the unpersist at the end?

Thanks for going through this discussion!
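
A minimal sketch of the agreed pattern, with hypothetical names and a trivial stand-in job, assuming the standard RDD persist/unpersist API:

```
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Persist the input, run the job, and unpersist at the end without a
// try-finally: if an exception is thrown mid-run, the cached blocks are
// simply evicted later under memory pressure.
def trainWithCaching(data: RDD[Double]): Double = {
  val handlePersistence = data.getStorageLevel == StorageLevel.NONE
  if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)

  val result = data.sum()  // stands in for the iterative training job

  if (handlePersistence) data.unpersist()
  result
}
```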





[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5727#issuecomment-96816185
  
  [Test build #31068 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31068/consoleFull)
 for   PR 5727 at commit 
[`1aae027`](https://github.com/apache/spark/commit/1aae027d640342ca7fb1146f72c9b62aea9f78c6).
 * This patch **fails MiMa tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6991] [SparkR] Adds support for zipPart...

2015-04-27 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/5568#issuecomment-96816065
  
Thanks @concretevitamin for the review. Will merge this after Jenkins passes





[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/4450#discussion_r29190908
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala ---
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+import java.io.OutputStream
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.storage.BlockObjectWriter
+
+/**
+ * A logical byte buffer that wraps a list of byte arrays. All the byte 
arrays have equal size. The
+ * advantage of this over a standard ArrayBuffer is that it can grow 
without claiming large amounts
+ * of memory and needing to copy the full contents.
+ */
+private[spark] class ChainedBuffer(chunkSize: Int) {
+  private val chunkSizeLog2 = (math.log(chunkSize) / math.log(2)).toInt
+  assert(math.pow(2, chunkSizeLog2).toInt == chunkSize)
+  private val chunks: ArrayBuffer[Array[Byte]] = new 
ArrayBuffer[Array[Byte]]()
+  private var _size: Int = _
+
+  /**
+   * Feed bytes from this buffer into a BlockObjectWriter.
+   *
+   * @param pos Offset in the buffer to read from.
+   * @param writer BlockObjectWriter to read into.
+   * @param len Number of bytes to read.
+   */
+  def read(pos: Int, writer: BlockObjectWriter, len: Int): Unit = {
+var chunkIndex = pos >> chunkSizeLog2
+var posInChunk = pos - (chunkIndex << chunkSizeLog2)
+var moved = 0
+while (moved < len) {
+  val toRead = math.min(len - moved, chunkSize - posInChunk)
+  writer.writeBytes(chunks(chunkIndex), posInChunk, toRead)
+  moved += toRead
+  chunkIndex += 1
+  posInChunk = 0
+}
+  }
+
+  /**
+   * Read bytes from this buffer into a byte array.
+   *
+   * @param pos Offset in the buffer to read from.
+   * @param bytes Byte array to read into.
+   * @param offs Offset in the byte array to read to.
+   * @param len Number of bytes to read.
+   */
+  def read(pos: Int, bytes: Array[Byte], offs: Int, len: Int): Unit = {
+var chunkIndex = pos >> chunkSizeLog2
+var posInChunk = pos - (chunkIndex << chunkSizeLog2)
+var moved = 0
+while (moved < len) {
+  val toRead = math.min(len - moved, chunkSize - posInChunk)
+  System.arraycopy(chunks(chunkIndex), posInChunk, bytes, offs + 
moved, toRead)
+  moved += toRead
+  chunkIndex += 1
+  posInChunk = 0
+}
+  }
+
+  /**
+   * Write bytes from a byte array into this buffer.
+   *
+   * @param pos Offset in the buffer to write to.
+   * @param bytes Byte array to write from.
+   * @param offs Offset in the byte array to write from.
+   * @param len Number of bytes to write.
+   */
+  def write(pos: Int, bytes: Array[Byte], offs: Int, len: Int): Unit = {
+// Grow if needed
+val endChunkIndex = (pos + len - 1) >> chunkSizeLog2
+while (endChunkIndex >= chunks.length) {
+  chunks += new Array[Byte](chunkSize)
+}
+
+var chunkIndex = pos >> chunkSizeLog2
+var posInChunk = pos - (chunkIndex << chunkSizeLog2)
+var moved = 0
+while (moved < len) {
+  val toWrite = math.min(len - moved, chunkSize - posInChunk)
+  System.arraycopy(bytes, offs + moved, chunks(chunkIndex), 
posInChunk, toWrite)
+  moved += toWrite
+  chunkIndex += 1
+  posInChunk = 0
+}
+
+_size = math.max(_size, pos + len)
+  }
+
+  /**
+   * Total size of buffer that can be written to without allocating 
additional memory.
+   */
+  def capacity: Int = chunks.size * chunkSize
+
+  /**
+   * Size of the logical buffer.
+   */
+  def size: Int = _size
+}
+
+/**
+ * Output stream that writes to a ChainedBuffer.
+ */
+private[spark] class ChainedBufferOutputStream(chainedBuffer: 
ChainedBuffer) extends OutputStream {
+  private 

[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5673#issuecomment-96828599
  
  [Test build #31066 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31066/consoleFull)
 for   PR 5673 at commit 
[`ab7c72b`](https://github.com/apache/spark/commit/ab7c72b486106a98bafb70b61125ed84f1d01cdd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `tachyon-0.6.4.jar`
   * `tachyon-client-0.6.4.jar`

 * This patch **removes the following dependencies:**
   * `tachyon-0.5.0.jar`
   * `tachyon-client-0.5.0.jar`






[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...

2015-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5673





[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0

2015-04-27 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/5726#issuecomment-96791158
  
... I doubt this passes tests. The problem is that you break compatibility 
with old versions of Hive; it's not this simple.





[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5726#issuecomment-96794694
  
  [Test build #31064 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31064/consoleFull)
 for   PR 5726 at commit 
[`310c315`](https://github.com/apache/spark/commit/310c3150c3c3c6c600d2049faba6b8349ba66f99).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **removes the following dependencies:**
   * `RoaringBitmap-0.4.5.jar`
   * `activation-1.1.jar`
   * `akka-actor_2.10-2.3.4-spark.jar`
   * `akka-remote_2.10-2.3.4-spark.jar`
   * `akka-slf4j_2.10-2.3.4-spark.jar`
   * `aopalliance-1.0.jar`
   * `arpack_combined_all-0.1.jar`
   * `avro-1.7.7.jar`
   * `breeze-macros_2.10-0.11.2.jar`
   * `breeze_2.10-0.11.2.jar`
   * `chill-java-0.5.0.jar`
   * `chill_2.10-0.5.0.jar`
   * `commons-beanutils-1.7.0.jar`
   * `commons-beanutils-core-1.8.0.jar`
   * `commons-cli-1.2.jar`
   * `commons-codec-1.10.jar`
   * `commons-collections-3.2.1.jar`
   * `commons-compress-1.4.1.jar`
   * `commons-configuration-1.6.jar`
   * `commons-digester-1.8.jar`
   * `commons-httpclient-3.1.jar`
   * `commons-io-2.1.jar`
   * `commons-lang-2.5.jar`
   * `commons-lang3-3.3.2.jar`
   * `commons-math-2.1.jar`
   * `commons-math3-3.4.1.jar`
   * `commons-net-2.2.jar`
   * `compress-lzf-1.0.0.jar`
   * `config-1.2.1.jar`
   * `core-1.1.2.jar`
   * `curator-client-2.4.0.jar`
   * `curator-framework-2.4.0.jar`
   * `curator-recipes-2.4.0.jar`
   * `gmbal-api-only-3.0.0-b023.jar`
   * `grizzly-framework-2.1.2.jar`
   * `grizzly-http-2.1.2.jar`
   * `grizzly-http-server-2.1.2.jar`
   * `grizzly-http-servlet-2.1.2.jar`
   * `grizzly-rcm-2.1.2.jar`
   * `groovy-all-2.3.7.jar`
   * `guava-14.0.1.jar`
   * `guice-3.0.jar`
   * `hadoop-annotations-2.2.0.jar`
   * `hadoop-auth-2.2.0.jar`
   * `hadoop-client-2.2.0.jar`
   * `hadoop-common-2.2.0.jar`
   * `hadoop-hdfs-2.2.0.jar`
   * `hadoop-mapreduce-client-app-2.2.0.jar`
   * `hadoop-mapreduce-client-common-2.2.0.jar`
   * `hadoop-mapreduce-client-core-2.2.0.jar`
   * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
   * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
   * `hadoop-yarn-api-2.2.0.jar`
   * `hadoop-yarn-client-2.2.0.jar`
   * `hadoop-yarn-common-2.2.0.jar`
   * `hadoop-yarn-server-common-2.2.0.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.4.0.jar`
   * `jackson-core-2.4.4.jar`
   * `jackson-core-asl-1.8.8.jar`
   * `jackson-databind-2.4.4.jar`
   * `jackson-jaxrs-1.8.8.jar`
   * `jackson-mapper-asl-1.8.8.jar`
   * `jackson-module-scala_2.10-2.4.4.jar`
   * `jackson-xc-1.8.8.jar`
   * `jansi-1.4.jar`
   * `javax.inject-1.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `javax.servlet-3.1.jar`
   * `javax.servlet-api-3.0.1.jar`
   * `jaxb-api-2.2.2.jar`
   * `jaxb-impl-2.2.3-1.jar`
   * `jcl-over-slf4j-1.7.10.jar`
   * `jersey-client-1.9.jar`
   * `jersey-core-1.9.jar`
   * `jersey-grizzly2-1.9.jar`
   * `jersey-guice-1.9.jar`
   * `jersey-json-1.9.jar`
   * `jersey-server-1.9.jar`
   * `jersey-test-framework-core-1.9.jar`
   * `jersey-test-framework-grizzly2-1.9.jar`
   * `jets3t-0.7.1.jar`
   * `jettison-1.1.jar`
   * `jetty-util-6.1.26.jar`
   * `jline-0.9.94.jar`
   * `jline-2.10.4.jar`
   * `jodd-core-3.6.3.jar`
   * `json4s-ast_2.10-3.2.10.jar`
   * `json4s-core_2.10-3.2.10.jar`
   * `json4s-jackson_2.10-3.2.10.jar`
   * `jsr305-1.3.9.jar`
   * `jtransforms-2.4.0.jar`
   * `jul-to-slf4j-1.7.10.jar`
   * `kryo-2.21.jar`
   * `log4j-1.2.17.jar`
   * `lz4-1.2.0.jar`
   * `management-api-3.0.0-b012.jar`
   * `mesos-0.21.0-shaded-protobuf.jar`
   * `metrics-core-3.1.0.jar`
   * `metrics-graphite-3.1.0.jar`
   * `metrics-json-3.1.0.jar`
   * `metrics-jvm-3.1.0.jar`
   * `minlog-1.2.jar`
   * `netty-3.8.0.Final.jar`
   * `netty-all-4.0.23.Final.jar`
   * `objenesis-1.2.jar`
   * `opencsv-2.3.jar`
   * `oro-2.0.8.jar`
   * `paranamer-2.6.jar`
   * `parquet-column-1.6.0rc3.jar`
   * `parquet-common-1.6.0rc3.jar`
   * `parquet-encoding-1.6.0rc3.jar`
   * `parquet-format-2.2.0-rc1.jar`
   * `parquet-generator-1.6.0rc3.jar`
   * `parquet-hadoop-1.6.0rc3.jar`
   * `parquet-jackson-1.6.0rc3.jar`
   * `protobuf-java-2.4.1.jar`
   * `protobuf-java-2.5.0-spark.jar`
   * `py4j-0.8.2.1.jar`
   * `pyrolite-2.0.1.jar`
   * `quasiquotes_2.10-2.0.1.jar`
   * `reflectasm-1.07-shaded.jar`
   * `scala-compiler-2.10.4.jar`
   * `scala-library-2.10.4.jar`
   * 

[GitHub] spark pull request: [SPARK-6746B] Refactor large functions in DAGS...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5396#issuecomment-96797161
  
  [Test build #31024 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31024/consoleFull)
 for   PR 5396 at commit 
[`f0dcc7b`](https://github.com/apache/spark/commit/f0dcc7b8b62e7cbb4608b5cc9f3e6fe865c87bd8).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5667#issuecomment-96800371
  
  [Test build #31060 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31060/consoleFull)
 for   PR 5667 at commit 
[`f8814a6`](https://github.com/apache/spark/commit/f8814a67436922342f89e54b8fc2ef24b63d1308).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.





[GitHub] spark pull request: [SPARK-6806] [SparkR] [Docs] Fill in SparkR ex...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5442#issuecomment-96811042
  
**[Test build #31015 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31015/consoleFull)**
 for PR 5442 at commit 
[`89684ce`](https://github.com/apache/spark/commit/89684ce59cfe4d989c2f36495d21ecb142c9881d)
 after a configured wait of `120m`.





[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...

2015-04-27 Thread JoshRosen
GitHub user JoshRosen opened a pull request:

https://github.com/apache/spark/pull/5727

[Build] Enable MiMa checks for launcher and sql projects

Now that 1.3 has been released, we should enable MiMa checks for the `sql` 
and `launcher` subprojects.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/JoshRosen/spark enable-more-mima-checks

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5727.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5727


commit 1aae027d640342ca7fb1146f72c9b62aea9f78c6
Author: Josh Rosen joshro...@databricks.com
Date:   2015-04-27T20:32:06Z

Enable MiMa checks for launcher and sql projects.







[GitHub] spark pull request: Spark-5854 personalized page rank

2015-04-27 Thread dwmclary
Github user dwmclary commented on the pull request:

https://github.com/apache/spark/pull/4774#issuecomment-96816965
  
@jegonzal does this algorithm look correct to you?





[GitHub] spark pull request: [SPARK-7007][core] Add a metric source for Exe...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5589#issuecomment-96819141
  
  [Test build #30999 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30999/consoleFull)
 for   PR 5589 at commit 
[`a6d5ec5`](https://github.com/apache/spark/commit/a6d5ec51caabdb900c5c1971bfe438957ee1a032).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `activation-1.1.jar`
   * `aopalliance-1.0.jar`
   * `avro-1.7.7.jar`
   * `breeze-macros_2.10-0.11.2.jar`
   * `breeze_2.10-0.11.2.jar`
   * `commons-cli-1.2.jar`
   * `commons-codec-1.10.jar`
   * `commons-compress-1.4.1.jar`
   * `commons-io-2.1.jar`
   * `commons-lang-2.5.jar`
   * `commons-math3-3.4.1.jar`
   * `gmbal-api-only-3.0.0-b023.jar`
   * `grizzly-framework-2.1.2.jar`
   * `grizzly-http-2.1.2.jar`
   * `grizzly-http-server-2.1.2.jar`
   * `grizzly-http-servlet-2.1.2.jar`
   * `grizzly-rcm-2.1.2.jar`
   * `guice-3.0.jar`
   * `hadoop-annotations-2.2.0.jar`
   * `hadoop-auth-2.2.0.jar`
   * `hadoop-client-2.2.0.jar`
   * `hadoop-common-2.2.0.jar`
   * `hadoop-hdfs-2.2.0.jar`
   * `hadoop-mapreduce-client-app-2.2.0.jar`
   * `hadoop-mapreduce-client-common-2.2.0.jar`
   * `hadoop-mapreduce-client-core-2.2.0.jar`
   * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
   * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
   * `hadoop-yarn-api-2.2.0.jar`
   * `hadoop-yarn-client-2.2.0.jar`
   * `hadoop-yarn-common-2.2.0.jar`
   * `hadoop-yarn-server-common-2.2.0.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.4.0.jar`
   * `jackson-core-2.4.4.jar`
   * `jackson-databind-2.4.4.jar`
   * `jackson-jaxrs-1.8.8.jar`
   * `jackson-module-scala_2.10-2.4.4.jar`
   * `jackson-xc-1.8.8.jar`
   * `javax.inject-1.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `javax.servlet-3.1.jar`
   * `javax.servlet-api-3.0.1.jar`
   * `jaxb-api-2.2.2.jar`
   * `jaxb-impl-2.2.3-1.jar`
   * `jersey-client-1.9.jar`
   * `jersey-core-1.9.jar`
   * `jersey-grizzly2-1.9.jar`
   * `jersey-guice-1.9.jar`
   * `jersey-json-1.9.jar`
   * `jersey-server-1.9.jar`
   * `jersey-test-framework-core-1.9.jar`
   * `jersey-test-framework-grizzly2-1.9.jar`
   * `jettison-1.1.jar`
   * `jetty-util-6.1.26.jar`
   * `jodd-core-3.6.3.jar`
   * `management-api-3.0.0-b012.jar`
   * `protobuf-java-2.4.1.jar`
   * `snappy-java-1.1.1.7.jar`
   * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-core_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-repl_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-sql_2.10-1.4.0-SNAPSHOT.jar`
   * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar`
   * `stax-api-1.0.1.jar`
   * `xz-1.0.jar`

 * This patch **removes the following dependencies:**
   * `breeze-macros_2.10-0.3.1.jar`
   * `breeze_2.10-0.10.jar`
   * `commons-codec-1.5.jar`
   * `commons-el-1.0.jar`
   * `commons-io-2.4.jar`
   * `commons-lang-2.4.jar`
   * `commons-math3-3.1.1.jar`
   * `hadoop-client-1.0.4.jar`
   * `hadoop-core-1.0.4.jar`
   * `hsqldb-1.8.0.10.jar`
   * `jackson-annotations-2.3.0.jar`
   * `jackson-core-2.3.0.jar`
   * `jackson-databind-2.3.0.jar`
   * `jblas-1.2.3.jar`
   * `snappy-java-1.1.1.6.jar`
   * `spark-bagel_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-catalyst_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-core_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-graphx_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-mllib_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-network-common_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-network-shuffle_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-repl_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-sql_2.10-1.3.0-SNAPSHOT.jar`
   * `spark-streaming_2.10-1.3.0-SNAPSHOT.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-04-27 Thread doctapp
Github user doctapp commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-96791733
  
@hellertime thanks for the info, I didn't catch that this isn't pre-installed 
with Mesos.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...

2015-04-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5377#discussion_r29188459
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java 
---
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.WritableByteChannel;
+import java.util.List;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelOutboundHandlerAdapter;
+import io.netty.channel.ChannelPromise;
+import io.netty.channel.FileRegion;
+import io.netty.handler.codec.MessageToMessageDecoder;
+import io.netty.util.AbstractReferenceCounted;
+import io.netty.util.ReferenceCountUtil;
+
+import org.apache.spark.network.util.ByteArrayWritableChannel;
+import org.apache.spark.network.util.NettyUtils;
+
+class SaslEncryption {
+
+  @VisibleForTesting
+  static final String ENCRYPTION_HANDLER_NAME = "saslEncryption";
+
+  /**
+   * Adds channel handlers that perform encryption / decryption of data 
using SASL.
+   *
+   * @param channel The channel.
+   * @param backend The SASL backend.
+   * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted 
blocks, to control
+   * memory usage.
+   */
+  static void addToChannel(
+  Channel channel,
+  SaslEncryptionBackend backend,
+  int maxOutboundBlockSize) {
+channel.pipeline()
+  .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, 
maxOutboundBlockSize))
+  .addFirst("saslDecryption", new DecryptionHandler(backend))
+  .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder());
+  }
+
+  private static class EncryptionHandler extends 
ChannelOutboundHandlerAdapter {
+
+private final int maxOutboundBlockSize;
+private final SaslEncryptionBackend backend;
+
+EncryptionHandler(SaslEncryptionBackend backend, int 
maxOutboundBlockSize) {
+  this.backend = backend;
+  this.maxOutboundBlockSize = maxOutboundBlockSize;
+}
+
+/**
+ * Wrap the incoming message in an implementation that will perform 
encryption lazily. This is
+ * needed to guarantee ordering of the outgoing encrypted packets - 
they need to be decrypted in
+ * the same order, and netty doesn't have an atomic 
ChannelHandlerContext.write() API, so it
+ * does not guarantee any ordering.
+ */
+@Override
+public void write(ChannelHandlerContext ctx, Object msg, 
ChannelPromise promise)
+  throws Exception {
+
+  ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), 
promise);
+}
+
+@Override
+public void handlerRemoved(ChannelHandlerContext ctx) throws Exception 
{
+  try {
+backend.dispose();
+  } finally {
+super.handlerRemoved(ctx);
+  }
+}
+
+  }
+
+  private static class DecryptionHandler extends 
MessageToMessageDecoder<ByteBuf> {
+
+private final SaslEncryptionBackend backend;
+
+DecryptionHandler(SaslEncryptionBackend backend) {
+  this.backend = backend;
+}
+
+@Override
+protected void decode(ChannelHandlerContext ctx, ByteBuf msg, 
List<Object> out)
+  throws Exception {
+
+  byte[] data;
+  int offset;
+  int length = msg.readableBytes();
+  if (msg.hasArray()) {
+data = msg.array();
+offset = msg.arrayOffset();
--- End diff --

It's unnecessary since `MessageToMessageDecoder` will release the input 
message when this 

[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...

2015-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5667


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5727#issuecomment-96813311
  
  [Test build #31068 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31068/consoleFull)
 for   PR 5727 at commit 
[`1aae027`](https://github.com/apache/spark/commit/1aae027d640342ca7fb1146f72c9b62aea9f78c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6991] [SparkR] Adds support for zipPart...

2015-04-27 Thread concretevitamin
Github user concretevitamin commented on the pull request:

https://github.com/apache/spark/pull/5568#issuecomment-96814334
  
LGTM. /cc @shivaram 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5725#issuecomment-96814367
  
  [Test build #31069 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31069/consoleFull)
 for   PR 5725 at commit 
[`0925847`](https://github.com/apache/spark/commit/092584701277394a704c7600c6a631326d7895c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...

2015-04-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5377#discussion_r29190765
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/sasl/SparkSaslServer.java 
---
@@ -60,13 +60,19 @@
   static final String DIGEST = "DIGEST-MD5";
 
   /**
-   * The quality of protection is just auth. This means that we are doing
-   * authentication only, we are not supporting integrity or privacy 
protection of the
-   * communication channel after authentication. This could be changed to 
be configurable
-   * in the future.
+   * QOP value that includes encryption.
+   */
+  static final String QOP_AUTH_CONF = "auth-conf";
+
+  /**
+   * QOP value that does not include encryption.
+   */
+  static final String QOP_AUTH = "auth";
+
+  /**
+   * Common SASL config properties for both client and server.
*/
   static final Map<String, String> SASL_PROPS = ImmutableMap.<String, 
String>builder()
-.put(Sasl.QOP, "auth")
 .put(Sasl.SERVER_AUTH, "true")
--- End diff --

I don't think it applies to the client. I'm also not sure whether it's 
needed at all, but I'll change the code so it's only set for the server.
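
To make the QOP values concrete: a small sketch, not the PR's code, of
assembling the `javax.security.sasl` properties so that encryption
(`auth-conf`) is preferred while plain authentication (`auth`) is still
accepted, with `Sasl.SERVER_AUTH` set only when building the server, as
discussed above:

```
// Sketch only, not the PR's code: QOP is a comma-separated preference list,
// so listing "auth-conf" first asks for encryption but still allows "auth".
import javax.security.sasl.Sasl
import scala.collection.JavaConverters._

def saslProps(isServer: Boolean, encrypt: Boolean): java.util.Map[String, String] = {
  val props = scala.collection.mutable.Map[String, String](
    Sasl.QOP -> (if (encrypt) "auth-conf,auth" else "auth")
  )
  if (isServer) {
    // Per the discussion above, only set this when building the server's properties.
    props += Sasl.SERVER_AUTH -> "true"
  }
  props.asJava
}
```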


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...

2015-04-27 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/4688#issuecomment-96824485
  
I am testing the changes right now. I will update this PR soon. Thanks 
@tgravescs!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/4450#discussion_r29192947
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala
 ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+import java.util.Comparator
+
+import org.apache.spark.storage.BlockObjectWriter
+
+/**
+ * A common interface for size-tracking collections of key-value pairs that
+ * - Have an associated partition for each key-value pair.
+ * - Support a memory-efficient sorted iterator
+ * - Support a WritablePartitionedIterator for writing the contents 
directly as bytes.
+ */
+private[spark] trait WritablePartitionedPairCollection[K, V] extends 
SizeTracker {
+  /**
+   * Insert a key-value pair with a partition into the collection
+   */
+  def insert(partition: Int, key: K, value: V): Unit
+
+  /**
+   * Estimate the collection's current memory usage in bytes.
+   */
+  def estimateSize(): Long
+
+  /**
+   * Iterate through the data in order of partition ID and then the given 
comparator. This may
+   * destroy the underlying collection.
+   */
+  def partitionedDestructiveSortedIterator(keyComparator: Comparator[K]): 
Iterator[((Int, K), V)]
--- End diff --

Agree this is an obnoxiously long name.  However, if we rename 
`partitionedDestructiveSortedIterator` to `partitionedIterator`, then we 
probably also want to rename `destructiveSortedWritablePartitionedIterator` to 
`writablePartitionedIterator`.  But a method named 
`writablePartitionedIterator` exists as well (which is not destructive or 
sorted).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-96828951
  
  [Test build #31071 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31071/consoleFull)
 for   PR 5694 at commit 
[`83e80ef`](https://github.com/apache/spark/commit/83e80ef4eec49dcee7c55900e4cbcf9b899aea65).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...

2015-04-27 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/5377#discussion_r29195262
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java 
---
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.WritableByteChannel;
+import java.util.List;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelOutboundHandlerAdapter;
+import io.netty.channel.ChannelPromise;
+import io.netty.channel.FileRegion;
+import io.netty.handler.codec.MessageToMessageDecoder;
+import io.netty.util.AbstractReferenceCounted;
+import io.netty.util.ReferenceCountUtil;
+
+import org.apache.spark.network.util.ByteArrayWritableChannel;
+import org.apache.spark.network.util.NettyUtils;
+
+class SaslEncryption {
+
+  @VisibleForTesting
+  static final String ENCRYPTION_HANDLER_NAME = "saslEncryption";
+
+  /**
+   * Adds channel handlers that perform encryption / decryption of data 
using SASL.
+   *
+   * @param channel The channel.
+   * @param backend The SASL backend.
+   * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted 
blocks, to control
+   * memory usage.
+   */
+  static void addToChannel(
+  Channel channel,
+  SaslEncryptionBackend backend,
+  int maxOutboundBlockSize) {
+channel.pipeline()
+  .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, 
maxOutboundBlockSize))
+  .addFirst("saslDecryption", new DecryptionHandler(backend))
+  .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder());
+  }
+
+  private static class EncryptionHandler extends 
ChannelOutboundHandlerAdapter {
+
+private final int maxOutboundBlockSize;
+private final SaslEncryptionBackend backend;
+
+EncryptionHandler(SaslEncryptionBackend backend, int 
maxOutboundBlockSize) {
+  this.backend = backend;
+  this.maxOutboundBlockSize = maxOutboundBlockSize;
+}
+
+/**
+ * Wrap the incoming message in an implementation that will perform 
encryption lazily. This is
+ * needed to guarantee ordering of the outgoing encrypted packets - 
they need to be decrypted in
+ * the same order, and netty doesn't have an atomic 
ChannelHandlerContext.write() API, so it
+ * does not guarantee any ordering.
+ */
+@Override
+public void write(ChannelHandlerContext ctx, Object msg, 
ChannelPromise promise)
+  throws Exception {
+
+  ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), 
promise);
+}
+
+@Override
+public void handlerRemoved(ChannelHandlerContext ctx) throws Exception 
{
+  try {
+backend.dispose();
+  } finally {
+super.handlerRemoved(ctx);
+  }
+}
+
+  }
+
+  private static class DecryptionHandler extends 
MessageToMessageDecoder<ByteBuf> {
+
+private final SaslEncryptionBackend backend;
+
+DecryptionHandler(SaslEncryptionBackend backend) {
+  this.backend = backend;
+}
+
+@Override
+protected void decode(ChannelHandlerContext ctx, ByteBuf msg, 
List<Object> out)
+  throws Exception {
+
+  byte[] data;
+  int offset;
+  int length = msg.readableBytes();
+  if (msg.hasArray()) {
+data = msg.array();
+offset = msg.arrayOffset();
--- End diff --

I see, it's just slightly odd that only one of the two cases moves msg's 
reader index. In 
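
A short sketch of the symmetry in question, assuming only Netty's public
`ByteBuf` API: `array()` exposes the backing array without touching the reader
index, while `readBytes` advances it, so the branch that uses the array can
advance explicitly:

```
// Sketch only: make both branches leave the reader index in the same place.
import io.netty.buffer.ByteBuf

def readable(msg: ByteBuf): (Array[Byte], Int, Int) = {
  val length = msg.readableBytes()
  if (msg.hasArray) {
    // Backing array is shared; the offset accounts for the current reader index.
    val data = msg.array()
    val offset = msg.arrayOffset() + msg.readerIndex()
    msg.skipBytes(length) // advance explicitly, mirroring the copy branch
    (data, offset, length)
  } else {
    val data = new Array[Byte](length)
    msg.readBytes(data) // copies and advances the reader index
    (data, 0, length)
  }
}
```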

[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5647#issuecomment-96792936
  
  [Test build #30986 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30986/consoleFull)
 for   PR 5647 at commit 
[`9903837`](https://github.com/apache/spark/commit/990383761841b444506e91f3052c2de3736d6052).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **removes the following dependencies:**
   * `RoaringBitmap-0.4.5.jar`
   * `activation-1.1.jar`
   * `akka-actor_2.10-2.3.4-spark.jar`
   * `akka-remote_2.10-2.3.4-spark.jar`
   * `akka-slf4j_2.10-2.3.4-spark.jar`
   * `aopalliance-1.0.jar`
   * `arpack_combined_all-0.1.jar`
   * `avro-1.7.7.jar`
   * `breeze-macros_2.10-0.11.2.jar`
   * `breeze_2.10-0.11.2.jar`
   * `chill-java-0.5.0.jar`
   * `chill_2.10-0.5.0.jar`
   * `commons-beanutils-1.7.0.jar`
   * `commons-beanutils-core-1.8.0.jar`
   * `commons-cli-1.2.jar`
   * `commons-codec-1.10.jar`
   * `commons-collections-3.2.1.jar`
   * `commons-compress-1.4.1.jar`
   * `commons-configuration-1.6.jar`
   * `commons-digester-1.8.jar`
   * `commons-httpclient-3.1.jar`
   * `commons-io-2.1.jar`
   * `commons-lang-2.5.jar`
   * `commons-lang3-3.3.2.jar`
   * `commons-math-2.1.jar`
   * `commons-math3-3.1.1.jar`
   * `commons-net-2.2.jar`
   * `compress-lzf-1.0.0.jar`
   * `config-1.2.1.jar`
   * `core-1.1.2.jar`
   * `curator-client-2.4.0.jar`
   * `curator-framework-2.4.0.jar`
   * `curator-recipes-2.4.0.jar`
   * `gmbal-api-only-3.0.0-b023.jar`
   * `grizzly-framework-2.1.2.jar`
   * `grizzly-http-2.1.2.jar`
   * `grizzly-http-server-2.1.2.jar`
   * `grizzly-http-servlet-2.1.2.jar`
   * `grizzly-rcm-2.1.2.jar`
   * `groovy-all-2.3.7.jar`
   * `guava-14.0.1.jar`
   * `guice-3.0.jar`
   * `hadoop-annotations-2.2.0.jar`
   * `hadoop-auth-2.2.0.jar`
   * `hadoop-client-2.2.0.jar`
   * `hadoop-common-2.2.0.jar`
   * `hadoop-hdfs-2.2.0.jar`
   * `hadoop-mapreduce-client-app-2.2.0.jar`
   * `hadoop-mapreduce-client-common-2.2.0.jar`
   * `hadoop-mapreduce-client-core-2.2.0.jar`
   * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
   * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
   * `hadoop-yarn-api-2.2.0.jar`
   * `hadoop-yarn-client-2.2.0.jar`
   * `hadoop-yarn-common-2.2.0.jar`
   * `hadoop-yarn-server-common-2.2.0.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.4.0.jar`
   * `jackson-core-2.4.4.jar`
   * `jackson-core-asl-1.8.8.jar`
   * `jackson-databind-2.4.4.jar`
   * `jackson-jaxrs-1.8.8.jar`
   * `jackson-mapper-asl-1.8.8.jar`
   * `jackson-module-scala_2.10-2.4.4.jar`
   * `jackson-xc-1.8.8.jar`
   * `jansi-1.4.jar`
   * `javax.inject-1.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `javax.servlet-3.1.jar`
   * `javax.servlet-api-3.0.1.jar`
   * `jaxb-api-2.2.2.jar`
   * `jaxb-impl-2.2.3-1.jar`
   * `jcl-over-slf4j-1.7.10.jar`
   * `jersey-client-1.9.jar`
   * `jersey-core-1.9.jar`
   * `jersey-grizzly2-1.9.jar`
   * `jersey-guice-1.9.jar`
   * `jersey-json-1.9.jar`
   * `jersey-server-1.9.jar`
   * `jersey-test-framework-core-1.9.jar`
   * `jersey-test-framework-grizzly2-1.9.jar`
   * `jets3t-0.7.1.jar`
   * `jettison-1.1.jar`
   * `jetty-util-6.1.26.jar`
   * `jline-0.9.94.jar`
   * `jline-2.10.4.jar`
   * `jodd-core-3.6.3.jar`
   * `json4s-ast_2.10-3.2.10.jar`
   * `json4s-core_2.10-3.2.10.jar`
   * `json4s-jackson_2.10-3.2.10.jar`
   * `jsr305-1.3.9.jar`
   * `jtransforms-2.4.0.jar`
   * `jul-to-slf4j-1.7.10.jar`
   * `kryo-2.21.jar`
   * `log4j-1.2.17.jar`
   * `lz4-1.2.0.jar`
   * `management-api-3.0.0-b012.jar`
   * `mesos-0.21.0-shaded-protobuf.jar`
   * `metrics-core-3.1.0.jar`
   * `metrics-graphite-3.1.0.jar`
   * `metrics-json-3.1.0.jar`
   * `metrics-jvm-3.1.0.jar`
   * `minlog-1.2.jar`
   * `netty-3.8.0.Final.jar`
   * `netty-all-4.0.23.Final.jar`
   * `objenesis-1.2.jar`
   * `opencsv-2.3.jar`
   * `oro-2.0.8.jar`
   * `paranamer-2.6.jar`
   * `parquet-column-1.6.0rc3.jar`
   * `parquet-common-1.6.0rc3.jar`
   * `parquet-encoding-1.6.0rc3.jar`
   * `parquet-format-2.2.0-rc1.jar`
   * `parquet-generator-1.6.0rc3.jar`
   * `parquet-hadoop-1.6.0rc3.jar`
   * `parquet-jackson-1.6.0rc3.jar`
   * `protobuf-java-2.4.1.jar`
   * `protobuf-java-2.5.0-spark.jar`
   * `py4j-0.8.2.1.jar`
   * `pyrolite-2.0.1.jar`
   * `quasiquotes_2.10-2.0.1.jar`
   * `reflectasm-1.07-shaded.jar`
   * `scala-compiler-2.10.4.jar`
   * `scala-library-2.10.4.jar`
   * 

[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5673#issuecomment-96792973
  
  [Test build #31066 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31066/consoleFull)
 for   PR 5673 at commit 
[`ab7c72b`](https://github.com/apache/spark/commit/ab7c72b486106a98bafb70b61125ed84f1d01cdd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6775] [SPARK-6776] [SQL] [WIP] Refactor...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5422#issuecomment-96801888
  
  [Test build #31021 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31021/consoleFull)
 for   PR 5422 at commit 
[`2529b76`](https://github.com/apache/spark/commit/2529b76141c068fe69a03e29ddc50ca496cd9dcd).
 * This patch **fails Spark unit tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `RoaringBitmap-0.4.5.jar`
   * `activation-1.1.jar`
   * `akka-actor_2.10-2.3.4-spark.jar`
   * `akka-remote_2.10-2.3.4-spark.jar`
   * `akka-slf4j_2.10-2.3.4-spark.jar`
   * `aopalliance-1.0.jar`
   * `arpack_combined_all-0.1.jar`
   * `avro-1.7.7.jar`
   * `breeze-macros_2.10-0.11.2.jar`
   * `breeze_2.10-0.11.2.jar`
   * `chill-java-0.5.0.jar`
   * `chill_2.10-0.5.0.jar`
   * `commons-beanutils-1.7.0.jar`
   * `commons-beanutils-core-1.8.0.jar`
   * `commons-cli-1.2.jar`
   * `commons-codec-1.10.jar`
   * `commons-collections-3.2.1.jar`
   * `commons-compress-1.4.1.jar`
   * `commons-configuration-1.6.jar`
   * `commons-digester-1.8.jar`
   * `commons-httpclient-3.1.jar`
   * `commons-io-2.1.jar`
   * `commons-lang-2.5.jar`
   * `commons-lang3-3.3.2.jar`
   * `commons-math-2.1.jar`
   * `commons-math3-3.1.1.jar`
   * `commons-net-2.2.jar`
   * `compress-lzf-1.0.0.jar`
   * `config-1.2.1.jar`
   * `core-1.1.2.jar`
   * `curator-client-2.4.0.jar`
   * `curator-framework-2.4.0.jar`
   * `curator-recipes-2.4.0.jar`
   * `gmbal-api-only-3.0.0-b023.jar`
   * `grizzly-framework-2.1.2.jar`
   * `grizzly-http-2.1.2.jar`
   * `grizzly-http-server-2.1.2.jar`
   * `grizzly-http-servlet-2.1.2.jar`
   * `grizzly-rcm-2.1.2.jar`
   * `groovy-all-2.3.7.jar`
   * `guava-14.0.1.jar`
   * `guice-3.0.jar`
   * `hadoop-annotations-2.2.0.jar`
   * `hadoop-auth-2.2.0.jar`
   * `hadoop-client-2.2.0.jar`
   * `hadoop-common-2.2.0.jar`
   * `hadoop-hdfs-2.2.0.jar`
   * `hadoop-mapreduce-client-app-2.2.0.jar`
   * `hadoop-mapreduce-client-common-2.2.0.jar`
   * `hadoop-mapreduce-client-core-2.2.0.jar`
   * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
   * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
   * `hadoop-yarn-api-2.2.0.jar`
   * `hadoop-yarn-client-2.2.0.jar`
   * `hadoop-yarn-common-2.2.0.jar`
   * `hadoop-yarn-server-common-2.2.0.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.4.0.jar`
   * `jackson-core-2.4.4.jar`
   * `jackson-core-asl-1.8.8.jar`
   * `jackson-databind-2.4.4.jar`
   * `jackson-jaxrs-1.8.8.jar`
   * `jackson-mapper-asl-1.8.8.jar`
   * `jackson-module-scala_2.10-2.4.4.jar`
   * `jackson-xc-1.8.8.jar`
   * `jansi-1.4.jar`
   * `javax.inject-1.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `javax.servlet-3.1.jar`
   * `javax.servlet-api-3.0.1.jar`
   * `jaxb-api-2.2.2.jar`
   * `jaxb-impl-2.2.3-1.jar`
   * `jcl-over-slf4j-1.7.10.jar`
   * `jersey-client-1.9.jar`
   * `jersey-core-1.9.jar`
   * `jersey-grizzly2-1.9.jar`
   * `jersey-guice-1.9.jar`
   * `jersey-json-1.9.jar`
   * `jersey-server-1.9.jar`
   * `jersey-test-framework-core-1.9.jar`
   * `jersey-test-framework-grizzly2-1.9.jar`
   * `jets3t-0.7.1.jar`
   * `jettison-1.1.jar`
   * `jetty-util-6.1.26.jar`
   * `jline-0.9.94.jar`
   * `jline-2.10.4.jar`
   * `jodd-core-3.6.3.jar`
   * `json4s-ast_2.10-3.2.10.jar`
   * `json4s-core_2.10-3.2.10.jar`
   * `json4s-jackson_2.10-3.2.10.jar`
   * `jsr305-1.3.9.jar`
   * `jtransforms-2.4.0.jar`
   * `jul-to-slf4j-1.7.10.jar`
   * `kryo-2.21.jar`
   * `log4j-1.2.17.jar`
   * `lz4-1.2.0.jar`
   * `management-api-3.0.0-b012.jar`
   * `mesos-0.21.0-shaded-protobuf.jar`
   * `metrics-core-3.1.0.jar`
   * `metrics-graphite-3.1.0.jar`
   * `metrics-json-3.1.0.jar`
   * `metrics-jvm-3.1.0.jar`
   * `minlog-1.2.jar`
   * `netty-3.8.0.Final.jar`
   * `netty-all-4.0.23.Final.jar`
   * `objenesis-1.2.jar`
   * `opencsv-2.3.jar`
   * `oro-2.0.8.jar`
   * `paranamer-2.6.jar`
   * `parquet-column-1.6.0rc3.jar`
   * `parquet-common-1.6.0rc3.jar`
   * `parquet-encoding-1.6.0rc3.jar`
   * `parquet-format-2.2.0-rc1.jar`
   * `parquet-generator-1.6.0rc3.jar`
   * `parquet-hadoop-1.6.0rc3.jar`
   * `parquet-jackson-1.6.0rc3.jar`
   * `protobuf-java-2.4.1.jar`
   * `protobuf-java-2.5.0-spark.jar`
   * `py4j-0.8.2.1.jar`
   * `pyrolite-2.0.1.jar`
   * `quasiquotes_2.10-2.0.1.jar`
   * `reflectasm-1.07-shaded.jar`
   * `scala-compiler-2.10.4.jar`
   * 

[GitHub] spark pull request: [SPARK-7138][Streaming] Add method to BlockGen...

2015-04-27 Thread tdas
Github user tdas commented on the pull request:

https://github.com/apache/spark/pull/5695#issuecomment-96808771
  
@huitseeker Yes, I want to use the existing rate limiter interface, so that 
any sort of rate control can be applied through that interface in the future.
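
A hypothetical sketch of what routing all rate control through a single
interface could look like; the class and method names below are illustrative,
not the actual BlockGenerator or receiver API:

```
// Hypothetical sketch; class and method names are illustrative only.
import com.google.common.util.concurrent.{RateLimiter => GuavaRateLimiter}

trait RateLimiter {
  /** Block until one more record may be pushed into the current block. */
  def waitToPush(): Unit
}

class FixedRateLimiter(recordsPerSecond: Double) extends RateLimiter {
  private val limiter = GuavaRateLimiter.create(recordsPerSecond)
  override def waitToPush(): Unit = limiter.acquire()
}

// A block generator (or any future rate-control strategy) would only ever call
// waitToPush(), so the policy behind it can change without touching callers.
```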


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...

2015-04-27 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/5377#discussion_r29189853
  
--- Diff: 
network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java 
---
@@ -0,0 +1,260 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.sasl;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.WritableByteChannel;
+import java.util.List;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Preconditions;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import io.netty.channel.Channel;
+import io.netty.channel.ChannelHandlerContext;
+import io.netty.channel.ChannelOutboundHandlerAdapter;
+import io.netty.channel.ChannelPromise;
+import io.netty.channel.FileRegion;
+import io.netty.handler.codec.MessageToMessageDecoder;
+import io.netty.util.AbstractReferenceCounted;
+import io.netty.util.ReferenceCountUtil;
+
+import org.apache.spark.network.util.ByteArrayWritableChannel;
+import org.apache.spark.network.util.NettyUtils;
+
+class SaslEncryption {
+
+  @VisibleForTesting
+  static final String ENCRYPTION_HANDLER_NAME = "saslEncryption";
+
+  /**
+   * Adds channel handlers that perform encryption / decryption of data 
using SASL.
+   *
+   * @param channel The channel.
+   * @param backend The SASL backend.
+   * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted 
blocks, to control
+   * memory usage.
+   */
+  static void addToChannel(
+  Channel channel,
+  SaslEncryptionBackend backend,
+  int maxOutboundBlockSize) {
+channel.pipeline()
+  .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, 
maxOutboundBlockSize))
+  .addFirst("saslDecryption", new DecryptionHandler(backend))
+  .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder());
+  }
+
+  private static class EncryptionHandler extends 
ChannelOutboundHandlerAdapter {
+
+private final int maxOutboundBlockSize;
+private final SaslEncryptionBackend backend;
+
+EncryptionHandler(SaslEncryptionBackend backend, int 
maxOutboundBlockSize) {
+  this.backend = backend;
+  this.maxOutboundBlockSize = maxOutboundBlockSize;
+}
+
+/**
+ * Wrap the incoming message in an implementation that will perform 
encryption lazily. This is
+ * needed to guarantee ordering of the outgoing encrypted packets - 
they need to be decrypted in
+ * the same order, and netty doesn't have an atomic 
ChannelHandlerContext.write() API, so it
+ * does not guarantee any ordering.
+ */
+@Override
+public void write(ChannelHandlerContext ctx, Object msg, 
ChannelPromise promise)
+  throws Exception {
+
+  ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), 
promise);
+}
+
+@Override
+public void handlerRemoved(ChannelHandlerContext ctx) throws Exception 
{
+  try {
+backend.dispose();
+  } finally {
+super.handlerRemoved(ctx);
+  }
+}
+
+  }
+
+  private static class DecryptionHandler extends 
MessageToMessageDecoder<ByteBuf> {
+
+private final SaslEncryptionBackend backend;
+
+DecryptionHandler(SaslEncryptionBackend backend) {
+  this.backend = backend;
+}
+
+@Override
+protected void decode(ChannelHandlerContext ctx, ByteBuf msg, 
List<Object> out)
+  throws Exception {
+
+  byte[] data;
+  int offset;
+  int length = msg.readableBytes();
+  if (msg.hasArray()) {
+data = msg.array();
+offset = msg.arrayOffset();
+  } else {
+data = new byte[length];
+msg.readBytes(data);
+offset = 0;
+   

[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...

2015-04-27 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/5727#issuecomment-96816752
  
It looks like SQL failed 11 MiMa checks, although all of them appear to be in 
test code or internal APIs (so we can just double-check, then add the 
proper excludes / annotations):

```
[info] spark-sql: found 13 potential binary incompatibilities (filtered 101)
[error]  * method 
checkAnalysis()org.apache.spark.sql.catalyst.analysis.CheckAnalysis in class 
org.apache.spark.sql.SQLContext does not have a correspondent in new version
[error]filter with: 
ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.sql.SQLContext.checkAnalysis")
[error]  * method children()scala.collection.immutable.Nil# in class 
org.apache.spark.sql.execution.ExecutedCommand has now a different result type; 
was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.ExecutedCommand.children")
[error]  * class org.apache.spark.sql.execution.AddExchange does not have a 
correspondent in new version
[error]filter with: 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.execution.AddExchange")
[error]  * method children()scala.collection.immutable.Nil# in class 
org.apache.spark.sql.execution.LogicalLocalTable has now a different result 
type; was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalLocalTable.children")
[error]  * method 
newInstance()org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation in 
class org.apache.spark.sql.execution.LogicalLocalTable has now a different 
result type; was: org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation, 
is now: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalLocalTable.newInstance")
[error]  * method children()scala.collection.immutable.Nil# in class 
org.apache.spark.sql.execution.PhysicalRDD has now a different result type; 
was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.PhysicalRDD.children")
[error]  * method children()scala.collection.immutable.Nil# in class 
org.apache.spark.sql.execution.LocalTableScan has now a different result type; 
was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LocalTableScan.children")
[error]  * object org.apache.spark.sql.execution.AddExchange does not have 
a correspondent in new version
[error]filter with: 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.execution.AddExchange$")
[error]  * method children()scala.collection.immutable.Nil# in class 
org.apache.spark.sql.execution.LogicalRDD has now a different result type; was: 
scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalRDD.children")
[error]  * method 
newInstance()org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation in 
class org.apache.spark.sql.execution.LogicalRDD has now a different result 
type; was: org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation, is 
now: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
[error]filter with: 
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalRDD.newInstance")
[error]  * class org.apache.spark.sql.parquet.ParquetTestData does not have 
a correspondent in new version
[error]filter with: 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.parquet.ParquetTestData")
[error]  * object org.apache.spark.sql.parquet.ParquetTestData does not 
have a correspondent in new version
[error]filter with: 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.parquet.ParquetTestData$")
[error]  * class org.apache.spark.sql.parquet.TestGroupWriteSupport does 
not have a correspondent in new version
[error]filter with: 
ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.parquet.TestGroupWriteSupport")
```
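
For the filters that turn out to be internal-only, they would typically be
registered in `project/MimaExcludes.scala`; a sketch of the shape (only two of
the suggested filters shown, and the surrounding structure is illustrative):

```
// Sketch: how the suggested filters would typically be registered; the
// surrounding object/Seq structure is illustrative of MimaExcludes.scala.
import com.typesafe.tools.mima.core._

object MimaExcludes {
  val excludes = Seq(
    ProblemFilters.exclude[MissingMethodProblem](
      "org.apache.spark.sql.SQLContext.checkAnalysis"),
    ProblemFilters.exclude[MissingClassProblem](
      "org.apache.spark.sql.execution.AddExchange")
  )
}
```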

`launcher` failed its checks because it couldn't find a spark-launcher JAR 
on Maven:

```
[info] spark-mllib: found 0 potential binary incompatibilities (filtered 
242)
sbt.ResolveException: unresolved dependency: 
org.apache.spark#spark-launcher_2.10;1.3.0: not found
at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:278)
at 

[GitHub] spark pull request: [SPARK-7100][MLLib] Fix persisted RDD leak in ...

2015-04-27 Thread jimfcarroll
Github user jimfcarroll commented on the pull request:

https://github.com/apache/spark/pull/5669#issuecomment-96816661
  
Your project. I'll downgrade it if you want. :-)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread helena
Github user helena commented on a diff in the pull request:

https://github.com/apache/spark/pull/5645#discussion_r29194115
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala
 ---
@@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
 logDebug(s"Read partition data of $this from block manager, block 
$blockId")
 iterator
   case None => // Data not found in Block Manager, grab it from write 
ahead log file
-val reader = new WriteAheadLogRandomReader(partition.segment.path, 
hadoopConf)
-val dataRead = reader.read(partition.segment)
-reader.close()
+var dataRead: ByteBuffer = null
--- End diff --

I feel dirty seeing nulls in scala
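
A minimal null-free sketch of the same read path, using only the reader calls
quoted in the diff above and the fields of the surrounding RDD (error handling
elided):

```
// Sketch: scope the reader in a helper and return the buffer directly,
// instead of pre-declaring `var dataRead: ByteBuffer = null`.
import java.nio.ByteBuffer

def readFromWriteAheadLog(): ByteBuffer = {
  val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
  try {
    reader.read(partition.segment)
  } finally {
    reader.close()
  }
}
```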


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6991] [SparkR] Adds support for zipPart...

2015-04-27 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/5568#issuecomment-96830292
  
Hmm this is weird - somehow the test results for this PR were never posted 
to github. 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31005/consoleFull
 reports a json parsing error

cc @shaneknapp 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...

2015-04-27 Thread brkyvz
Github user brkyvz commented on a diff in the pull request:

https://github.com/apache/spark/pull/5616#discussion_r29195806
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.UnresolvedException
+import org.apache.spark.sql.types._
+
+abstract class MathematicalExpression(name: String) extends 
UnaryExpression with Serializable { 
+  self: Product =>
+  type EvaluatedType = Any
+
+  override def dataType: DataType = DoubleType
+  override def foldable: Boolean = child.foldable
+  override def nullable: Boolean = true
+  override def toString: String = s"$name($child)"
+
+  lazy val numeric = child.dataType match {
+case n: NumericType => n.numeric.asInstanceOf[Numeric[Any]]
+case other => sys.error(s"Type $other does not support numeric 
operations")
+  }
+}
+
+abstract class MathematicalExpressionForDouble(f: Double => Double, name: 
String)
+  extends MathematicalExpression(name) { self: Product =>
+  override def eval(input: Row): Any = {
+val evalE = child.eval(input)
+if (evalE == null) {
+  null
+} else {
+  val result = f(numeric.toDouble(evalE)) 
+  if (result.isNaN) null
+  else result
+}
+  }
+}
+
+abstract class MathematicalExpressionForInt(f: Int => Int, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: Row): Any = {
+val evalE = child.eval(input)
+if (evalE == null) {
+  null
+} else {
+  f(numeric.toInt(evalE))
+}
+  }
+}
+
+abstract class MathematicalExpressionForFloat(f: Float => Float, name: 
String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def dataType: DataType = FloatType
+
+  override def eval(input: Row): Any = {
+val evalE = child.eval(input)
+if (evalE == null) {
+  null
+} else {
+  val result = f(numeric.toFloat(evalE))
+  if (result.isNaN) null
+  else result
+}
+  }
+}
+
+abstract class MathematicalExpressionForLong(f: Long => Long, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def dataType: DataType = LongType
+
+  override def eval(input: Row): Any = {
+val evalE = child.eval(input)
+if (evalE == null) {
+  null
+} else {
+  f(numeric.toLong(evalE))
+}
+  }
+}
+
+case class Sin(child: Expression) extends 
MathematicalExpressionForDouble(math.sin, "SIN")
+
+case class Asin(child: Expression) extends 
MathematicalExpressionForDouble(math.asin, "ASIN")
+
+case class Sinh(child: Expression) extends 
MathematicalExpressionForDouble(math.sinh, "SINH")
+
+case class Cos(child: Expression) extends 
MathematicalExpressionForDouble(math.cos, "COS")
+
+case class Acos(child: Expression) extends 
MathematicalExpressionForDouble(math.acos, "ACOS")
+
+case class Cosh(child: Expression) extends 
MathematicalExpressionForDouble(math.cosh, "COSH")
+
+case class Tan(child: Expression) extends 
MathematicalExpressionForDouble(math.tan, "TAN")
+
+case class Atan(child: Expression) extends 
MathematicalExpressionForDouble(math.atan, "ATAN")
+
+case class Tanh(child: Expression) extends 
MathematicalExpressionForDouble(math.tanh, "TANH")
+
+case class Ceil(child: Expression) extends 
MathematicalExpressionForDouble(math.ceil, "CEIL")
+
+case class Floor(child: Expression) extends 
MathematicalExpressionForDouble(math.floor, "FLOOR")
+
+case class Rint(child: Expression) extends 
MathematicalExpressionForDouble(math.rint, "ROUND")
+
+case class Cbrt(child: Expression) extends 

[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3074#issuecomment-96795918
  
  [Test build #721 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/721/consoleFull)
 for   PR 3074 at commit 
[`064101c`](https://github.com/apache/spark/commit/064101c0096eb44b7d91fa62bafa27756279aca2).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5636#issuecomment-96795847
  
  [Test build #30990 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30990/consoleFull)
 for   PR 5636 at commit 
[`0335b96`](https://github.com/apache/spark/commit/0335b967b4b1a91782b5a608220c9c3eeb0bf8e1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `tachyon-0.6.4.jar`
   * `tachyon-client-0.6.4.jar`

 * This patch **removes the following dependencies:**
   * `tachyon-0.5.0.jar`
   * `tachyon-client-0.5.0.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-04-27 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4435#discussion_r29190417
  
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status.api.v1
+
+import java.util.Date
+
+import scala.collection.Map
+
+import org.apache.spark.JobExecutionStatus
+
+class ApplicationInfo(
--- End diff --

this was intentional, to get them covered by MiMa.  I also think the goal 
is to provide more stability than implied by `@DeveloperApi`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0

2015-04-27 Thread punya
Github user punya commented on the pull request:

https://github.com/apache/spark/pull/5726#issuecomment-96796911
  
Closing in favor of work on 
https://issues.apache.org/jira/browse/SPARK-6906.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5005#issuecomment-96807849
  
**[Test build #31054 timed 
out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31054/consoleFull)**
 for PR 5005 at commit 
[`2e0603a`](https://github.com/apache/spark/commit/2e0603a0f94c51d3dae64883f2bd91f3080f9c7e)
 after a configured wait of `120m`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...

2015-04-27 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/4783#issuecomment-96815195
  
Thanks @advancedxy for fixing the tests. This change LGTM. @rxin @srowen 
Could you also take a final look at this ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...

2015-04-27 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/5727#issuecomment-96817556
  
Ah, `launcher` was only added in 1.4, so I'll put the MiMa exclude back.
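
In sbt terms, a module with nothing published yet simply gets an empty MiMa
baseline; sketched here with the sbt-mima-plugin key name, which is
illustrative rather than Spark's exact build code:

```
// Sketch, illustrative key name: a brand-new module has nothing published
// to diff against, so its MiMa baseline is left empty.
import com.typesafe.tools.mima.plugin.MimaKeys._

lazy val launcher = (project in file("launcher")).settings(
  mimaPreviousArtifacts := Set.empty
)
```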


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/4450#discussion_r29191148
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -740,15 +723,29 @@ private[spark] class ExternalSorter[K, V, C](
   in.close()
 }
   }
+} else if (spills.isEmpty && partitionWriters == null) {
+  // Case where we only have in-memory data
+  val collection = if (aggregator.isDefined) map else buffer
+  val it = 
collection.destructiveSortedWritablePartitionedIterator(comparator)
+  while (it.hasNext) {
+val writer = blockManager.getDiskWriter(
+  blockId, outputFile, ser, fileBufferSize, 
context.taskMetrics.shuffleWriteMetrics.get)
+val partitionId = it.nextPartition()
+while (it.hasNext && it.nextPartition() == partitionId) {
+  it.writeNext(writer)
+}
+writer.commitAndClose()
+val segment = writer.fileSegment()
+lengths(partitionId) = segment.length
+  }
 } else {
-  // Either we're not bypassing merge-sort or we have only in-memory 
data; get an iterator by
-  // partition and just write everything directly.
+  // Not bypassing merge-sort; get an iterator by partition and just 
write everything directly.
--- End diff --

That is correct.  So there's definitely room for performance optimization 
here, but I thought it would be easier as a followup.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7168] [BUILD] Update plugin versions in...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5720#issuecomment-96826279
  
  [Test build #31063 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31063/consoleFull)
 for   PR 5720 at commit 
[`98a8947`](https://github.com/apache/spark/commit/98a8947fbd62bc048e2462b37627966a7dfd9e11).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/5673#issuecomment-96829768
  
Thanks. Merging in master.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...

2015-04-27 Thread squito
Github user squito commented on a diff in the pull request:

https://github.com/apache/spark/pull/4435#discussion_r29183894
  
--- Diff: 
core/src/main/java/org/apache/spark/status/api/v1/TaskSorting.java ---
@@ -0,0 +1,45 @@
+package org.apache.spark.status.api.v1;/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.spark.status.api.EnumUtil;
+
+import java.util.HashSet;
+import java.util.Set;
+
+public enum TaskSorting {
--- End diff --

Unfortunately, Jersey requires them to be public -- I'll tag them with 
`@DeveloperApi`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5685#issuecomment-96796263
  
  [Test build #30977 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30977/consoleFull)
 for   PR 5685 at commit 
[`6d4d3f1`](https://github.com/apache/spark/commit/6d4d3f1ac8da883fb814613afec35900b078b751).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `tachyon-0.6.4.jar`
   * `tachyon-client-0.6.4.jar`

 * This patch **removes the following dependencies:**
   * `tachyon-0.5.0.jar`
   * `tachyon-client-0.5.0.jar`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/4450#discussion_r29191682
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala
 ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+import java.io.InputStream
+import java.nio.IntBuffer
+import java.util.Comparator
+
+import org.apache.spark.SparkEnv
+import org.apache.spark.serializer.{JavaSerializerInstance, 
SerializerInstance}
+import org.apache.spark.storage.BlockObjectWriter
+import org.apache.spark.util.collection.PartitionedSerializedPairBuffer._
+
+/**
+ * Append-only buffer of key-value pairs, each with a corresponding 
partition ID, that serializes
+ * its records upon insert and stores them as raw bytes.
+ *
+ * We use two data-structures to store the contents. The serialized 
records are stored in a
+ * ChainedBuffer that can expand gracefully as records are added. This 
buffer is accompanied by a
+ * metadata buffer that stores pointers into the data buffer as well as 
the partition ID of each
+ * record. Each entry in the metadata buffer takes up a fixed amount of 
space.
+ *
+ * Sorting the collection means swapping entries in the metadata buffer - 
the record buffer need not
+ * be modified at all. Storing the partition IDs in the metadata buffer 
means that comparisons can
+ * happen without following any pointers, which should minimize cache 
misses.
+ *
+ * Currently, only sorting by partition is supported.
+ *
+ * @param metaInitialRecords The initial number of entries in the metadata 
buffer.
+ * @param kvBlockSize The size of each byte buffer in the ChainedBuffer 
used to store the records.
+ * @param serializerInstance the serializer used for serializing inserted 
records.
+ */
+private[spark] class PartitionedSerializedPairBuffer[K, V](
+metaInitialRecords: Int,
+kvBlockSize: Int,
+serializerInstance: SerializerInstance = 
SparkEnv.get.serializer.newInstance)
+  extends WritablePartitionedPairCollection[K, V] {
+
+  if (serializerInstance.isInstanceOf[JavaSerializerInstance]) {
+throw new IllegalArgumentException("PartitionedSerializedPairBuffer does not support" +
+  " Java-serialized objects.")
+  }
+
+  private var metaBuffer = IntBuffer.allocate(metaInitialRecords * NMETA)
+
+  private val kvBuffer: ChainedBuffer = new ChainedBuffer(kvBlockSize)
+  private val kvOutputStream = new ChainedBufferOutputStream(kvBuffer)
+  private val kvSerializationStream = 
serializerInstance.serializeStream(kvOutputStream)
+
+  def insert(partition: Int, key: K, value: V): Unit = {
+if (metaBuffer.position == metaBuffer.capacity) {
+  growMetaBuffer()
+}
+
+val keyStart = kvBuffer.size
+if (keyStart < 0) {
+  throw new Exception(s"Can't grow buffer beyond ${1 << 31} bytes")
+}
+kvSerializationStream.writeObject[Any](key)
+kvSerializationStream.flush()
+val valueStart = kvBuffer.size
+kvSerializationStream.writeObject[Any](value)
+kvSerializationStream.flush()
+val valueEnd = kvBuffer.size
+
+metaBuffer.put(keyStart)
+metaBuffer.put(valueStart)
+metaBuffer.put(valueEnd)
+metaBuffer.put(partition)
+  }
+
+  /** Double the size of the array because we've reached capacity */
+  private def growMetaBuffer(): Unit = {
+if (metaBuffer.capacity * 4 >= (1 << 30)) {
+  // Doubling the capacity would create an array bigger than 
Int.MaxValue, so don't
+  throw new Exception(
+s"Can't grow buffer beyond ${(1 << 30) / (NMETA * 4)} elements")
+}
+val newMetaBuffer = IntBuffer.allocate(metaBuffer.capacity * 2)
+newMetaBuffer.put(metaBuffer.array)
+metaBuffer = newMetaBuffer
+  }
+
+  /** Iterate through the data in a given order. For this 
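To make the layout described in the scaladoc above concrete, here is a toy, 
self-contained sketch (illustrative names only, far simpler than the class in 
this diff and not code from the PR): record bytes accumulate in one buffer 
while a parallel metadata structure holds small fixed-size entries, and 
sorting by partition reorders only the metadata.

```scala
import java.io.ByteArrayOutputStream
import java.nio.charset.StandardCharsets

object MetadataSortSketch {
  // Serialized record bytes live here and are never moved once written.
  private val data = new ByteArrayOutputStream()
  // One fixed-size metadata entry per record: (byteOffset, byteLength, partitionId).
  private var meta = Vector.empty[(Int, Int, Int)]

  def insert(partition: Int, record: String): Unit = {
    val bytes = record.getBytes(StandardCharsets.UTF_8)
    meta :+= ((data.size(), bytes.length, partition))
    data.write(bytes)
  }

  def partitionSortedRecords(): Seq[(Int, String)] = {
    val bytes = data.toByteArray
    // Only the small metadata entries are reordered; comparisons need no pointer chasing.
    meta.sortBy(_._3).map { case (offset, length, partition) =>
      (partition, new String(bytes, offset, length, StandardCharsets.UTF_8))
    }
  }

  def main(args: Array[String]): Unit = {
    insert(2, "cherry"); insert(0, "apple"); insert(1, "banana")
    partitionSortedRecords().foreach { case (p, r) => println(s"partition=$p record=$r") }
  }
}
```

The real class keeps four ints per record (key start, value start, value end, 
partition ID) in an IntBuffer and serializes records with the configured 
SerializerInstance; the sketch keeps only an offset/length/partition triple to 
show the idea.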

[GitHub] spark pull request: [SPARK-6862][Streaming][WebUI] Add BatchPage t...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5473#issuecomment-96827274
  
  [Test build #31070 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31070/consoleFull)
 for   PR 5473 at commit 
[`cb62e4f`](https://github.com/apache/spark/commit/cb62e4fe27763a23f7c925fc7086d3f606dc7034).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...

2015-04-27 Thread helena
Github user helena commented on a diff in the pull request:

https://github.com/apache/spark/pull/5645#discussion_r29194436
  
--- Diff: 
streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala
 ---
@@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
 logDebug(sRead partition data of $this from block manager, block 
$blockId)
 iterator
   case None = // Data not found in Block Manager, grab it from write 
ahead log file
-val reader = new WriteAheadLogRandomReader(partition.segment.path, 
hadoopConf)
-val dataRead = reader.read(partition.segment)
-reader.close()
+var dataRead: ByteBuffer = null
--- End diff --

ByteBuffer.wrap(new byte[0])


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: add support for zipping a sequence of RDDs

2015-04-27 Thread mohitjaggi
Github user mohitjaggi commented on the pull request:

https://github.com/apache/spark/pull/2429#issuecomment-96791360
  
Closing on Sean's request. I have a workaround.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: add support for zipping a sequence of RDDs

2015-04-27 Thread mohitjaggi
Github user mohitjaggi closed the pull request at:

https://github.com/apache/spark/pull/2429


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5637#issuecomment-96796679
  
  [Test build #30989 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30989/consoleFull)
 for   PR 5637 at commit 
[`ab38c71`](https://github.com/apache/spark/commit/ab38c71356c23d63ca9f3990c8c0f0b8e8fc7976).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **removes the following dependencies:**
   * `RoaringBitmap-0.4.5.jar`
   * `activation-1.1.jar`
   * `akka-actor_2.10-2.3.4-spark.jar`
   * `akka-remote_2.10-2.3.4-spark.jar`
   * `akka-slf4j_2.10-2.3.4-spark.jar`
   * `aopalliance-1.0.jar`
   * `arpack_combined_all-0.1.jar`
   * `avro-1.7.7.jar`
   * `breeze-macros_2.10-0.11.2.jar`
   * `breeze_2.10-0.11.2.jar`
   * `chill-java-0.5.0.jar`
   * `chill_2.10-0.5.0.jar`
   * `commons-beanutils-1.7.0.jar`
   * `commons-beanutils-core-1.8.0.jar`
   * `commons-cli-1.2.jar`
   * `commons-codec-1.10.jar`
   * `commons-collections-3.2.1.jar`
   * `commons-compress-1.4.1.jar`
   * `commons-configuration-1.6.jar`
   * `commons-digester-1.8.jar`
   * `commons-httpclient-3.1.jar`
   * `commons-io-2.1.jar`
   * `commons-lang-2.5.jar`
   * `commons-lang3-3.3.2.jar`
   * `commons-math-2.1.jar`
   * `commons-math3-3.4.1.jar`
   * `commons-net-2.2.jar`
   * `compress-lzf-1.0.0.jar`
   * `config-1.2.1.jar`
   * `core-1.1.2.jar`
   * `curator-client-2.4.0.jar`
   * `curator-framework-2.4.0.jar`
   * `curator-recipes-2.4.0.jar`
   * `gmbal-api-only-3.0.0-b023.jar`
   * `grizzly-framework-2.1.2.jar`
   * `grizzly-http-2.1.2.jar`
   * `grizzly-http-server-2.1.2.jar`
   * `grizzly-http-servlet-2.1.2.jar`
   * `grizzly-rcm-2.1.2.jar`
   * `groovy-all-2.3.7.jar`
   * `guava-14.0.1.jar`
   * `guice-3.0.jar`
   * `hadoop-annotations-2.2.0.jar`
   * `hadoop-auth-2.2.0.jar`
   * `hadoop-client-2.2.0.jar`
   * `hadoop-common-2.2.0.jar`
   * `hadoop-hdfs-2.2.0.jar`
   * `hadoop-mapreduce-client-app-2.2.0.jar`
   * `hadoop-mapreduce-client-common-2.2.0.jar`
   * `hadoop-mapreduce-client-core-2.2.0.jar`
   * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
   * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
   * `hadoop-yarn-api-2.2.0.jar`
   * `hadoop-yarn-client-2.2.0.jar`
   * `hadoop-yarn-common-2.2.0.jar`
   * `hadoop-yarn-server-common-2.2.0.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.4.0.jar`
   * `jackson-core-2.4.4.jar`
   * `jackson-core-asl-1.8.8.jar`
   * `jackson-databind-2.4.4.jar`
   * `jackson-jaxrs-1.8.8.jar`
   * `jackson-mapper-asl-1.8.8.jar`
   * `jackson-module-scala_2.10-2.4.4.jar`
   * `jackson-xc-1.8.8.jar`
   * `jansi-1.4.jar`
   * `javax.inject-1.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `javax.servlet-3.1.jar`
   * `javax.servlet-api-3.0.1.jar`
   * `jaxb-api-2.2.2.jar`
   * `jaxb-impl-2.2.3-1.jar`
   * `jcl-over-slf4j-1.7.10.jar`
   * `jersey-client-1.9.jar`
   * `jersey-core-1.9.jar`
   * `jersey-grizzly2-1.9.jar`
   * `jersey-guice-1.9.jar`
   * `jersey-json-1.9.jar`
   * `jersey-server-1.9.jar`
   * `jersey-test-framework-core-1.9.jar`
   * `jersey-test-framework-grizzly2-1.9.jar`
   * `jets3t-0.7.1.jar`
   * `jettison-1.1.jar`
   * `jetty-util-6.1.26.jar`
   * `jline-0.9.94.jar`
   * `jline-2.10.4.jar`
   * `jodd-core-3.6.3.jar`
   * `json4s-ast_2.10-3.2.10.jar`
   * `json4s-core_2.10-3.2.10.jar`
   * `json4s-jackson_2.10-3.2.10.jar`
   * `jsr305-1.3.9.jar`
   * `jtransforms-2.4.0.jar`
   * `jul-to-slf4j-1.7.10.jar`
   * `kryo-2.21.jar`
   * `log4j-1.2.17.jar`
   * `lz4-1.2.0.jar`
   * `management-api-3.0.0-b012.jar`
   * `mesos-0.21.0-shaded-protobuf.jar`
   * `metrics-core-3.1.0.jar`
   * `metrics-graphite-3.1.0.jar`
   * `metrics-json-3.1.0.jar`
   * `metrics-jvm-3.1.0.jar`
   * `minlog-1.2.jar`
   * `netty-3.8.0.Final.jar`
   * `netty-all-4.0.23.Final.jar`
   * `objenesis-1.2.jar`
   * `opencsv-2.3.jar`
   * `oro-2.0.8.jar`
   * `paranamer-2.6.jar`
   * `parquet-column-1.6.0rc3.jar`
   * `parquet-common-1.6.0rc3.jar`
   * `parquet-encoding-1.6.0rc3.jar`
   * `parquet-format-2.2.0-rc1.jar`
   * `parquet-generator-1.6.0rc3.jar`
   * `parquet-hadoop-1.6.0rc3.jar`
   * `parquet-jackson-1.6.0rc3.jar`
   * `protobuf-java-2.4.1.jar`
   * `protobuf-java-2.5.0-spark.jar`
   * `py4j-0.8.2.1.jar`
   * `pyrolite-2.0.1.jar`
   * `quasiquotes_2.10-2.0.1.jar`
   * `reflectasm-1.07-shaded.jar`
   * `scala-compiler-2.10.4.jar`
   * `scala-library-2.10.4.jar`
   * 

[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...

2015-04-27 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/5667#issuecomment-96820582
  
@srowen Could you add Joof as a contributor on our JIRA? The assignee 
auto-complete doesn't seem to pick this up right now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...

2015-04-27 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/4450#discussion_r29193245
  
--- Diff: 
core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala
 ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+import java.util.Comparator
+
+import org.apache.spark.storage.BlockObjectWriter
+
+/**
+ * A common interface for size-tracking collections of key-value pairs that
+ * - Have an associated partition for each key-value pair.
+ * - Support a memory-efficient sorted iterator
+ * - Support a WritablePartitionedIterator for writing the contents 
directly as bytes.
+ */
+private[spark] trait WritablePartitionedPairCollection[K, V] extends 
SizeTracker {
+  /**
+   * Insert a key-value pair with a partition into the collection
+   */
+  def insert(partition: Int, key: K, value: V): Unit
+
+  /**
+   * Estimate the collection's current memory usage in bytes.
+   */
+  def estimateSize(): Long
+
+  /**
+   * Iterate through the data in order of partition ID and then the given 
comparator. This may
+   * destroy the underlying collection.
+   */
+  def partitionedDestructiveSortedIterator(keyComparator: Comparator[K]): 
Iterator[((Int, K), V)]
+
+  /**
+   * Iterate through the data and write out the elements instead of 
returning them. Records are
+   * returned in order of their partition ID and then the given comparator.
+   * This may destroy the underlying collection.
+   */
+  def destructiveSortedWritablePartitionedIterator(keyComparator: 
Comparator[K])
+: WritablePartitionedIterator = {
+
WritablePartitionedIterator.fromIterator(partitionedDestructiveSortedIterator(keyComparator))
+  }
+
+  /**
+   * Iterate through the data and write out the elements instead of 
returning them.
+   */
+  def writablePartitionedIterator(): WritablePartitionedIterator
+}
+
+private[spark] object WritablePartitionedPairCollection {
+  /**
+   * A comparator for (Int, K) pairs that orders them by only their 
partition ID.
+   */
+  def partitionComparator[K]: Comparator[(Int, K)] = new Comparator[(Int, 
K)] {
+override def compare(a: (Int, K), b: (Int, K)): Int = {
+  a._1 - b._1
+}
+  }
+
+  /**
+   * A comparator for (Int, K) pairs that orders them both by their 
partition ID and a key ordering.
+   */
+  def partitionKeyComparator[K](keyComparator: Comparator[K]): 
Comparator[(Int, K)] = {
+new Comparator[(Int, K)] {
+  override def compare(a: (Int, K), b: (Int, K)): Int = {
+val partitionDiff = a._1 - b._1
+if (partitionDiff != 0) {
+  partitionDiff
+} else {
+  keyComparator.compare(a._2, b._2)
+}
+  }
+}
+  }
+}
+
+/**
+ * Iterator that writes elements to a BlockObjectWriter instead of 
returning them. Each element
+ * has an associated partition.
+ */
+private[spark] trait WritablePartitionedIterator {
+  def writeNext(writer: BlockObjectWriter): Unit
+
+  def hasNext(): Boolean
+
+  def nextPartition(): Int
--- End diff --

`WritablePartitionedIterator` doesn't have a `next` method (just 
`writeNext`), so I don't think there would be anything to peek at.
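For readers following along, here is a self-contained sketch (toy types and 
names, not Spark's internal API) of the pattern being discussed: the upcoming 
element is buffered internally, so `nextPartition()` can be answered without 
exposing a `next()`/peek, and `writeNext()` pushes elements to a writer 
instead of returning them.

```scala
object WriteInsteadOfReturnSketch {
  trait Writer { def write(partition: Int, key: String, value: Int): Unit }

  trait PartitionedWriteIterator {
    def writeNext(w: Writer): Unit
    def hasNext: Boolean
    def nextPartition(): Int
  }

  def fromIterator(it: Iterator[((Int, String), Int)]): PartitionedWriteIterator =
    new PartitionedWriteIterator {
      // Buffer the next element up front; null marks exhaustion.
      private var cur: ((Int, String), Int) = if (it.hasNext) it.next() else null

      def writeNext(w: Writer): Unit = {
        val ((partition, key), value) = cur
        w.write(partition, key, value)
        cur = if (it.hasNext) it.next() else null
      }
      def hasNext: Boolean = cur != null
      def nextPartition(): Int = cur._1._1
    }

  def main(args: Array[String]): Unit = {
    // Already sorted by partition ID, as a partitioned destructive sorted iterator would be.
    val data = Iterator(((0, "a"), 1), ((0, "b"), 2), ((1, "c"), 3))
    val iter = fromIterator(data)
    val printer = new Writer {
      def write(p: Int, k: String, v: Int): Unit = println(s"partition=$p $k=$v")
    }
    // Group consecutive elements of the same partition, mirroring the
    // ExternalSorter write loop quoted earlier in this digest.
    while (iter.hasNext) {
      val partition = iter.nextPartition()
      while (iter.hasNext && iter.nextPartition() == partition) iter.writeNext(printer)
    }
  }
}
```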


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...

2015-04-27 Thread brennonyork
Github user brennonyork commented on the pull request:

https://github.com/apache/spark/pull/5694#issuecomment-96827964
  
jenkins, retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5723#issuecomment-96829203
  
  [Test build #31067 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31067/consoleFull)
 for   PR 5723 at commit 
[`98bfe48`](https://github.com/apache/spark/commit/98bfe48d603c56f45945049b72a484686e2d0be2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...

2015-04-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/5383#issuecomment-96516549
  
@liancheng @marmbrus  Any more comments?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7160][SQL] Support converting DataFrame...

2015-04-27 Thread rayortigas
GitHub user rayortigas opened a pull request:

https://github.com/apache/spark/pull/5713

[SPARK-7160][SQL] Support converting DataFrames to typed RDDs.

https://issues.apache.org/jira/browse/SPARK-7160
https://github.com/databricks/spark-csv/pull/52

cc:
@rxin (who made the original suggestion)
@vlyubin #5279
@punya #5578
@davies #5350
@marmbrus (ScalaReflection and more)


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rayortigas/spark df-to-typed-rdd

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/5713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #5713


commit add51b6ad8f0ffe0ed600917d4339a531da07750
Author: Ray Ortigas r...@linkedin.com
Date:   2015-04-27T06:27:50Z

[SPARK-7160][SQL] Support converting DataFrames to typed RDDs.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7158] [SQL] Fix bug of cached data cann...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5714#issuecomment-96552127
  
  [Test build #30963 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30963/consoleFull)
 for   PR 5714 at commit 
[`e2c4298`](https://github.com/apache/spark/commit/e2c429829e0525b72eaf0d2879d735ab75072c43).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6505][SQL]Remove the reflection call in...

2015-04-27 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/5660#issuecomment-96515766
  
Thanks for working on this! I'm merging this to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2015-04-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/4015#issuecomment-96516483
  
@liancheng @rxin @marmbrus can you trigger the unit test for me?

Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl

2015-04-27 Thread scwf
Github user scwf commented on a diff in the pull request:

https://github.com/apache/spark/pull/5715#discussion_r29126807
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -81,11 +81,38 @@ class HiveContext(sc: SparkContext) extends 
SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean =
 getConf("spark.sql.hive.convertCTAS", "false").toBoolean
 
-  override protected[sql] def executePlan(plan: LogicalPlan): 
this.QueryExecution =
-new this.QueryExecution(plan)
+  /* A catalyst metadata catalog that points to the Hive Metastore. */
+  @transient
+  override protected[sql] lazy val catalog = new 
HiveMetastoreCatalog(this) with OverrideCatalog
+
+  // Note that HiveUDFs will be overridden by functions registered in this 
context.
+  @transient
+  override protected[sql] lazy val functionRegistry =
+new HiveFunctionRegistry with OverrideFunctionRegistry {
+  def caseSensitive: Boolean = false
+}
 
+  /* An analyzer that uses the Hive metastore. */
   @transient
-  protected[sql] val ddlParserWithHiveQL = new 
DDLParser(HiveQl.parseSql(_))
--- End diff --

we do not need this: since we override sqlParser, we can inherit the ddlParser 
from SQLContext:

`protected[sql] val ddlParser = new DDLParser(sqlParser.parse(_))`
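A self-contained sketch (toy classes, not Spark's actual SQLContext and 
HiveContext) of why overriding sqlParser is enough: the parent's ddl parser is 
built from a lambda that looks up this.sqlParser at parse time, so virtual 
dispatch picks up the subclass's override.

```scala
class BaseSqlContext {
  protected def sqlParser: String => String = sql => s"base-plan($sql)"

  // The lambda looks up sqlParser when it is invoked, not when it is created.
  protected val ddlParser: String => String = ddl => sqlParser(ddl)

  def parseDdl(ddl: String): String = ddlParser(ddl)
}

class HiveLikeContext extends BaseSqlContext {
  override protected def sqlParser: String => String = sql => s"hive-plan($sql)"
}

object DdlParserInheritanceDemo {
  def main(args: Array[String]): Unit = {
    // The inherited ddlParser delegates to the overridden sqlParser.
    println(new HiveLikeContext().parseDdl("CREATE TABLE t (i INT)"))  // hive-plan(...)
  }
}
```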


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6888][SQL] Make the jdbc driver handlin...

2015-04-27 Thread rtreffer
Github user rtreffer commented on the pull request:

https://github.com/apache/spark/pull/#issuecomment-96573899
  
@marmbrus what should we do now? New PR?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-4705:[core] Write event logs of differen...

2015-04-27 Thread twinkle-sachdeva
Github user twinkle-sachdeva closed the pull request at:

https://github.com/apache/spark/pull/4845


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7155] [CORE] Allow newAPIHadoopFile to ...

2015-04-27 Thread yongtang
Github user yongtang commented on the pull request:

https://github.com/apache/spark/pull/5708#issuecomment-96524002
  
@srowen Thanks for the comment. I updated the pull request so that 
setInputPaths is used instead of addInputPaths. In addition to 
newAPIHadoopFile(), the calls to addInputPath inside wholeTextFiles() and 
binaryFiles() have also been replaced with setInputPaths, which should keep 
the behavior consistent across SparkContext.scala.

The unit test for this issue has also been updated to cover every method 
involved. Please let me know if there is anything else that needs to be taken 
care of.
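A hedged illustration of the difference being relied on here (not code from 
the PR; the paths and the printed property name are assumptions, and the API 
is Hadoop 2.x FileInputFormat): addInputPath appends to whatever input paths 
the job already has, while setInputPaths replaces them, so repeated 
configuration stays deterministic.

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

object InputPathsDemo {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance()
    FileInputFormat.addInputPath(job, new Path("/data/a"))
    FileInputFormat.addInputPath(job, new Path("/data/b"))  // appends: a and b are both set
    FileInputFormat.setInputPaths(job, "/data/c")            // replaces: only c remains
    println(job.getConfiguration.get("mapreduce.input.fileinputformat.inputdir"))
  }
}
```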


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export

2015-04-27 Thread selvinsource
Github user selvinsource commented on the pull request:

https://github.com/apache/spark/pull/3062#issuecomment-96544425
  
For binary logistic regression, using the same principle (the intercept acting 
as the threshold) and doing a bit of algebra, we could set:
`intercept = -ln(1/threshold - 1)`
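A minimal sketch (not code from the PR; the threshold and margins are 
arbitrary) that checks the algebra: sigmoid(margin) > threshold exactly when 
margin > -ln(1/threshold - 1), so exporting that value as the intercept 
reproduces the thresholded prediction.

```scala
object ThresholdAsInterceptCheck {
  private def sigmoid(margin: Double): Double = 1.0 / (1.0 + math.exp(-margin))

  def main(args: Array[String]): Unit = {
    val threshold = 0.7
    val interceptAsThreshold = -math.log(1.0 / threshold - 1.0)  // = ln(t / (1 - t))
    for (margin <- Seq(-2.0, -0.5, 0.0, 0.5, 1.0, 2.0)) {
      val bySigmoid = sigmoid(margin) > threshold   // threshold the probability
      val byMargin  = margin > interceptAsThreshold // threshold the raw margin
      assert(bySigmoid == byMargin, s"mismatch at margin $margin")
    }
    println(s"threshold $threshold maps to intercept $interceptAsThreshold")
  }
}
```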


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5715#issuecomment-96552130
  
  [Test build #30962 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30962/consoleFull)
 for   PR 5715 at commit 
[`f76a7b1`](https://github.com/apache/spark/commit/f76a7b1eb2cec2c922f8a82e3e67da03984e886e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7162][YARN]Launcher error in yarn-clien...

2015-04-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/5716#issuecomment-96552117
  
  [Test build #30961 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30961/consoleFull)
 for   PR 5716 at commit 
[`b64564c`](https://github.com/apache/spark/commit/b64564c74248ef137ed3352e145735ce669bccf8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...

2015-04-27 Thread twinkle-sachdeva
Github user twinkle-sachdeva commented on the pull request:

https://github.com/apache/spark/pull/5449#issuecomment-96515946
  
Hi @srowen ,

Please review the changes.

Thanks,



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4015#issuecomment-96516744
  
I think Jenkins is having some trouble right now.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...

2015-04-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4015#issuecomment-96516751
  
Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-6505][SQL]Remove the reflection call in...

2015-04-27 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/5660


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0

2015-04-27 Thread LuqmanSahaf
Github user LuqmanSahaf commented on the pull request:

https://github.com/apache/spark/pull/468#issuecomment-96522017
  
@darose I am facing the VerifyError you mentioned in one of the comments. 
Can you tell me how you solved that error?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


