[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/5715#discussion_r29126734 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -81,11 +81,38 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) { protected[sql] def convertCTAS: Boolean = getConf("spark.sql.hive.convertCTAS", "false").toBoolean - override protected[sql] def executePlan(plan: LogicalPlan): this.QueryExecution = -new this.QueryExecution(plan) + /* A catalyst metadata catalog that points to the Hive Metastore. */ + @transient + override protected[sql] lazy val catalog = new HiveMetastoreCatalog(this) with OverrideCatalog --- End diff -- reorder to keep catalog, functionRegistry, analyzer, and sqlParser together --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6435] spark-shell --jars option does no...
Github user tsudukim commented on a diff in the pull request: https://github.com/apache/spark/pull/5227#discussion_r29132469 --- Diff: launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java --- @@ -260,15 +260,14 @@ static String quoteForBatchScript(String arg) { quoted.append('"'); break; - case '=': --- End diff -- I've run `SparkLauncherSuite` on Windows and it's OK. If a double-quoted string is parsed properly, `=` inside double quotes does not need to be escaped.
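The quoting rule tsudukim describes (once an argument is wrapped in double quotes, `=` needs no separate escaping) can be sketched in Python. This is a hypothetical re-creation for illustration, not the actual `CommandBuilderUtils.quoteForBatchScript`:

```python
def quote_for_batch_script(arg):
    # Hypothetical sketch: wrap args containing special characters in double
    # quotes and double any embedded quotes; '=' inside the quotes then needs
    # no escaping, which is why the 'case =' branch could be removed.
    if not any(c in arg for c in ' "=;,'):
        return arg
    return '"' + arg.replace('"', '""') + '"'
```

For example, `quote_for_batch_script("a=b")` yields a double-quoted string rather than escaping the `=` itself.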
[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96516889 @mengxr for SVM, I manually tried what you suggested and it looks good. I loaded the example below in JPMML and evaluated it as a Classification map; indeed the intercept on the NO category acts as the threshold when `normalizationMethod = none`. Here is the example:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_2">
    <Header description="linear SVM: if predicted value > 0, the outcome is positive, or negative otherwise">
        <Application name="Apache Spark MLlib" version="1.4.0-SNAPSHOT"/>
        <Timestamp>2015-04-27T06:58:22</Timestamp>
    </Header>
    <DataDictionary numberOfFields="10">
        <DataField name="field_0" optype="continuous" dataType="double"/>
        <DataField name="field_1" optype="continuous" dataType="double"/>
        <DataField name="field_2" optype="continuous" dataType="double"/>
        <DataField name="field_3" optype="continuous" dataType="double"/>
        <DataField name="field_4" optype="continuous" dataType="double"/>
        <DataField name="field_5" optype="continuous" dataType="double"/>
        <DataField name="field_6" optype="continuous" dataType="double"/>
        <DataField name="field_7" optype="continuous" dataType="double"/>
        <DataField name="field_8" optype="continuous" dataType="double"/>
        <DataField name="target" optype="categorical" dataType="string"/>
    </DataDictionary>
    <RegressionModel modelName="linear SVM: if predicted value > 0, the outcome is positive, or negative otherwise" functionName="classification" normalizationMethod="none">
        <MiningSchema>
            <MiningField name="field_0" usageType="active"/>
            <MiningField name="field_1" usageType="active"/>
            <MiningField name="field_2" usageType="active"/>
            <MiningField name="field_3" usageType="active"/>
            <MiningField name="field_4" usageType="active"/>
            <MiningField name="field_5" usageType="active"/>
            <MiningField name="field_6" usageType="active"/>
            <MiningField name="field_7" usageType="active"/>
            <MiningField name="field_8" usageType="active"/>
            <MiningField name="target" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="-1.2973802920137774" targetCategory="1">
            <NumericPredictor name="field_0" coefficient="-0.0818303650185629"/>
            <NumericPredictor name="field_1" coefficient="0.5609579878511747"/>
            <NumericPredictor name="field_2" coefficient="0.1382792114252377"/>
            <NumericPredictor name="field_3" coefficient="0.07497131265977852"/>
            <NumericPredictor name="field_4" coefficient="-0.47760356523751296"/>
            <NumericPredictor name="field_5" coefficient="0.3817837986572615"/>
            <NumericPredictor name="field_6" coefficient="-0.23753782335208481"/>
            <NumericPredictor name="field_7" coefficient="0.2548602390316011"/>
            <NumericPredictor name="field_8" coefficient="-0.10271528637619945"/>
        </RegressionTable>
        <RegressionTable intercept="0.0" targetCategory="0"/>
    </RegressionModel>
</PMML>
```

However, I noticed that if the SVM model threshold is set to None, it simply displays the margin (which is how it is implemented now in the pmml exporter). My question is, should we support both? If `threshold = None`, export as regression (like it is implemented now); if `threshold != None`, export as binary classification (as you suggested). What do you think?
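The dual behavior selvinsource is proposing can be sketched with a hypothetical helper (not the actual exporter code): a cleared threshold means "export the raw margin as regression", otherwise export a binary classification model.

```python
def pmml_function_name(threshold):
    # Hypothetical sketch of the proposed rule: if the SVM threshold is
    # cleared (None), export the raw margin as a regression model;
    # otherwise export a binary classification model.
    return "regression" if threshold is None else "classification"
```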
[GitHub] spark pull request: [SQL][Minor] fix java doc for DataFrame.agg
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5712#issuecomment-96516809 Jenkins, test this please.
[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0
Github user mag- commented on the pull request: https://github.com/apache/spark/pull/468#issuecomment-96587264 Are you aware that all these regexp hacks will break when hadoop changes version to 3.0.0?
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5717#issuecomment-96597976 [Test build #30964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30964/consoleFull) for PR 5717 at commit [`fc862f4`](https://github.com/apache/spark/commit/fc862f421b5cdbac18535fa09a2af668a5fc74d9).
[GitHub] spark pull request: [SPARK-5946][Streaming] Add Python API for dir...
Github user jerryshao commented on the pull request: https://github.com/apache/spark/pull/4723#issuecomment-96580759 Hi @davies and @tdas , I met a problem converting a Python `int` into a Java `Long`. The Java API in KafkaUtils requires the offset as a `Long` type. This is simple for Python 2, since Python 2 has a built-in `long` type which py4j maps to a Java `Long` automatically, but Python 3 only has an `int` type, and py4j maps a Python `int` to a Java `Integer`; I'm not sure how to support `Long` in Python 3. A simple solution is to change all the Java-Python interfaces to use type `Integer`, but then they may not support very large offsets. I'm not sure whether there is any other solution. Sorry for the dumb question and thanks a lot in advance.
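The mismatch jerryshao describes can be made concrete with a quick check (assumption: py4j maps a Python 3 `int` to a Java `Integer`, whose maximum value is 2^31 - 1):

```python
JAVA_INTEGER_MAX = 2**31 - 1  # Java Integer.MAX_VALUE

# A realistic Kafka offset on a busy topic easily exceeds the Integer range,
# so narrowing the Java-side parameter to Integer would truncate it.
offset = 10**12
assert offset > JAVA_INTEGER_MAX
```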
[GitHub] spark pull request: Add filter by location boundingbox in TwitterI...
GitHub user yang0228 opened a pull request: https://github.com/apache/spark/pull/5718 Add filter by location boundingbox in TwitterInputDStream.scala The current TwitterInputDStream only filters by keywords; it needs a filter-by-location feature. You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/spark branch-1.3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5718.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5718 commit f476108901c42ea61873f02dc2fee15896550d30 Author: q00251598 qiyad...@huawei.com Date: 2015-03-02T18:13:11Z [SPARK-5741][SQL] Support the path contains comma in HiveContext When running ```select * from nzhang_part where hr = 'file,';```, it throws the exception ```java.lang.IllegalArgumentException: Can not create a Path from an empty string```, because the HDFS path contains a comma, and FileInputFormat.setInputPaths splits the path by comma.
### SQL

```
set hive.merge.mapfiles=true;
set hive.merge.mapredfiles=true;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table nzhang_part like srcpart;
insert overwrite table nzhang_part partition (ds='2010-08-15', hr) select key, value, hr from srcpart where ds='2008-04-08';
insert overwrite table nzhang_part partition (ds='2010-08-15', hr=11) select key, value from srcpart where ds='2008-04-08';
insert overwrite table nzhang_part partition (ds='2010-08-15', hr) select * from ( select key, value, hr from srcpart where ds='2008-04-08' union all select '1' as key, '1' as value, 'file,' as hr from src limit 1) s;
select * from nzhang_part where hr = 'file,';
```

### Error Log

```
15/02/10 14:33:16 ERROR SparkSQLDriver: Failed in [select * from nzhang_part where hr = 'file,']
java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
    at org.apache.hadoop.fs.Path.<init>(Path.java:135)
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
    at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:251)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:172)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196)
```

Author: q00251598 qiyad...@huawei.com Closes #4532 from watermen/SPARK-5741 and squashes the following commits: 9758ab1 [q00251598] fix bug
1db1a1c [q00251598] use setInputPaths(Job job, Path... inputPaths) b788a72 [q00251598] change FileInputFormat.setInputPaths to jobConf.set and add test suite (cherry picked from commit 9ce12aaf283a2793e719bdc956dd858922636e8d) Signed-off-by: Michael Armbrust mich...@databricks.com commit 4ffaf856882fb1f4a5bfc24e5a05c74ba950e282 Author: Yanbo Liang yblia...@gmail.com Date: 2015-03-02T18:17:24Z [SPARK-6080] [PySpark] correct LogisticRegressionWithLBFGS regType parameter for pyspark Currently LogisticRegressionWithLBFGS in python/pyspark/mllib/classification.py will invoke callMLlibFunc with a wrong regType parameter. It was assigned to str(regType), which translates None (Python) to "None" (Java/Scala). The right way is to translate None (Python) to null (Java/Scala), just as we did for LogisticRegressionWithSGD. Author: Yanbo Liang yblia...@gmail.com Closes #4831 from yanboliang/pyspark_classification and squashes the following commits: 12db65a [Yanbo Liang] correct LogisticRegressionWithLBFGS regType parameter for pyspark (cherry picked from commit af2effdd7b54316af0c02e781911acfb148b962b) Signed-off-by: Xiangrui Meng m...@databricks.com commit 54ac243655d2eaf331d9f8fc43a8c1301803320b Author: Paul Power paul.po...@peerside.com Date: 2015-03-02T21:08:47Z [DOCS] Refactored Dataframe join comment to use correct parameter ordering The API signature for join requires the JoinType to be the third parameter. The code examples provided for join show JoinType being provided as the 2nd parameter, resulting in errors (i.e. df1.join(df2,
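The root cause of the SPARK-5741 failure above can be reproduced with a tiny sketch of the comma-splitting behavior (mimicking `FileInputFormat.setInputPaths`, which treats its argument as a comma-separated list; the warehouse path below is illustrative):

```python
def split_input_paths(path_string):
    # FileInputFormat.setInputPaths splits its argument on commas, so a
    # partition directory whose value ends in a comma (hr='file,')
    # produces an empty trailing path entry.
    return path_string.split(",")

parts = split_input_paths("/warehouse/nzhang_part/ds=2010-08-15/hr=file,")
# parts[-1] is the empty string that triggers
# "Can not create a Path from an empty string"
```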
[GitHub] spark pull request: [SPARK-6435] spark-shell --jars option does no...
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/5227#issuecomment-96598393 The problem I mentioned was that the spark-shell.cmd called by `SparkLauncherSuite` somehow failed to launch the test application. It turned out to be caused by a limitation of Windows batch: a single command line must be shorter than 8192 characters. (The full path for the classpath was long because I worked in a deep folder.) So I assume all issues are now cleared up. Sorry for my late response.
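The limit tsudukim hit can be checked up front (assumption: cmd.exe accepts at most 8191 characters of command line, i.e. strictly fewer than 8192):

```python
WINDOWS_CMD_MAX = 8191  # cmd.exe command-line length limit

def fits_windows_cmd(command_parts):
    # Join the parts the way the shell sees them and compare against the
    # limit; a deep working directory inflates every classpath entry and
    # can blow past it, as happened in the SparkLauncherSuite run.
    return len(" ".join(command_parts)) <= WINDOWS_CMD_MAX
```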
[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/468#issuecomment-96611185 @mag- if you're talking about what I think you are, it was a temporary thing that's long since gone already https://github.com/apache/spark/pull/629/files
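The brittleness mag- warns about can be illustrated with a hypothetical version check (not the actual build code): a regex anchored on major version 2 silently fails once Hadoop moves to 3.x.

```python
import re

def is_supported_hadoop(version):
    # Hypothetical sketch of a brittle version regex: it hard-codes
    # the major version, so "3.0.0" never matches.
    return re.match(r"2\.\d+\.\d+", version) is not None
```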
[GitHub] spark pull request: [SPARK-7165] [SQL] use sort merge join for out...
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/5717 [SPARK-7165] [SQL] use sort merge join for outer join This is an extended version of #5208 In this patch, we are introducing sort merge join not only for inner joins, but also for left outer / right outer / full outer joins. Using sort merge join could resolve the OOM which is quite common when memory becomes too small for joins of large tables. Test cases are available in SortMergeCompatibilitySuite, and we need to add some more in `JoinSuite` to test the join selection. Also, this patch would benefit from #3438 quite a lot. /cc @chenghao-intel You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark outersmj Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5717.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5717 commit fc862f421b5cdbac18535fa09a2af668a5fc74d9 Author: Daoyuan Wang daoyuan.w...@intel.com Date: 2015-04-27T09:40:55Z use sort merge join for outer join
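The core idea behind the patch — sort both inputs by key, then merge with two cursors, emitting null-padded rows for unmatched keys — can be sketched as follows. This is a simplification that ignores duplicate keys, not Spark's implementation:

```python
def sort_merge_full_outer_join(left, right):
    # left and right are lists of (key, value) pairs with unique keys.
    left, right = sorted(left), sorted(right)
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk == rk:                              # matched row
            out.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
        elif lk < rk:                             # left outer row
            out.append((lk, left[i][1], None))
            i += 1
        else:                                     # right outer row
            out.append((rk, None, right[j][1]))
            j += 1
    # Drain whichever side still has rows; they are all unmatched.
    out.extend((k, v, None) for k, v in left[i:])
    out.extend((k, None, v) for k, v in right[j:])
    return out
```

Because each side is consumed in sorted order, memory use stays bounded by the current key group rather than a whole hashed relation, which is the OOM advantage mentioned in the description.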
[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5715#issuecomment-96597552 [Test build #30962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30962/consoleFull) for PR 5715 at commit [`f76a7b1`](https://github.com/apache/spark/commit/f76a7b1eb2cec2c922f8a82e3e67da03984e886e). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7162][YARN]Launcher error in yarn-clien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5716#issuecomment-96597549 [Test build #30961 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30961/consoleFull) for PR 5716 at commit [`b64564c`](https://github.com/apache/spark/commit/b64564c74248ef137ed3352e145735ce669bccf8). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7162][YARN]Launcher error in yarn-clien...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5716#issuecomment-96613653 LGTM
[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5673#issuecomment-96792683 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5723#issuecomment-96798651 [Test build #31067 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31067/consoleFull) for PR 5723 at commit [`98bfe48`](https://github.com/apache/spark/commit/98bfe48d603c56f45945049b72a484686e2d0be2).
[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5377#discussion_r29190170 --- Diff: network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java --- @@ -0,0 +1,260 @@

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.network.sasl;

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;
import java.util.List;

import com.google.common.annotations.VisibleForTesting;
import com.google.common.base.Preconditions;
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.Channel;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelOutboundHandlerAdapter;
import io.netty.channel.ChannelPromise;
import io.netty.channel.FileRegion;
import io.netty.handler.codec.MessageToMessageDecoder;
import io.netty.util.AbstractReferenceCounted;
import io.netty.util.ReferenceCountUtil;

import org.apache.spark.network.util.ByteArrayWritableChannel;
import org.apache.spark.network.util.NettyUtils;

class SaslEncryption {

  @VisibleForTesting
  static final String ENCRYPTION_HANDLER_NAME = "saslEncryption";

  /**
   * Adds channel handlers that perform encryption / decryption of data using SASL.
   *
   * @param channel The channel.
   * @param backend The SASL backend.
   * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted blocks, to control
   *                             memory usage.
   */
  static void addToChannel(
      Channel channel,
      SaslEncryptionBackend backend,
      int maxOutboundBlockSize) {
    channel.pipeline()
      .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, maxOutboundBlockSize))
      .addFirst("saslDecryption", new DecryptionHandler(backend))
      .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder());
  }

  private static class EncryptionHandler extends ChannelOutboundHandlerAdapter {

    private final int maxOutboundBlockSize;
    private final SaslEncryptionBackend backend;

    EncryptionHandler(SaslEncryptionBackend backend, int maxOutboundBlockSize) {
      this.backend = backend;
      this.maxOutboundBlockSize = maxOutboundBlockSize;
    }

    /**
     * Wrap the incoming message in an implementation that will perform encryption lazily. This is
     * needed to guarantee ordering of the outgoing encrypted packets - they need to be decrypted in
     * the same order, and netty doesn't have an atomic ChannelHandlerContext.write() API, so it
     * does not guarantee any ordering.
     */
    @Override
    public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise)
        throws Exception {

      ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), promise);
    }

    @Override
    public void handlerRemoved(ChannelHandlerContext ctx) throws Exception {
      try {
        backend.dispose();
      } finally {
        super.handlerRemoved(ctx);
      }
    }

  }

  private static class DecryptionHandler extends MessageToMessageDecoder<ByteBuf> {

    private final SaslEncryptionBackend backend;

    DecryptionHandler(SaslEncryptionBackend backend) {
      this.backend = backend;
    }

    @Override
    protected void decode(ChannelHandlerContext ctx, ByteBuf msg, List<Object> out)
        throws Exception {

      byte[] data;
      int offset;
      int length = msg.readableBytes();
      if (msg.hasArray()) {
        data = msg.array();
        offset = msg.arrayOffset();
      } else {
        data = new byte[length];
        msg.readBytes(data);
        offset = 0;
```
[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...
Github user His-name-is-Joof commented on the pull request: https://github.com/apache/spark/pull/5667#issuecomment-96819726 Joof On Apr 27, 2015 2:41 PM, Shivaram Venkataraman notificati...@github.com wrote: @His-name-is-Joof https://github.com/His-name-is-Joof -- Could you let me know what your JIRA username is? I would like to assign this issue to you. Reply to this email directly or view it on GitHub https://github.com/apache/spark/pull/5667#issuecomment-96813694.
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-96819919 @harishreedharan , fyi - we are doing a feature code freeze for Spark 1.4 this Friday. I think this is really close, so hopefully we can get it in. Let me know if there are any questions or concerns on my comments.
[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0
Github user punya commented on the pull request: https://github.com/apache/spark/pull/5726#issuecomment-96794273 @srowen I'm sure it will :) I was using the PR to get Jenkins to figure out what tests actually break. At that point, I'll add [WIP] to the title and see if I can fix the tests. (Please let me know if there's a different process you'd prefer.)
[GitHub] spark pull request: [SPARK-5891][ML] Add Binarizer ML Transformer
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5699#issuecomment-96797201 [Test build #30972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30972/consoleFull) for PR 5699 at commit [`1682f8c`](https://github.com/apache/spark/commit/1682f8c05965ccbb34472c5d6e01166ce147f730). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0
Github user punya closed the pull request at: https://github.com/apache/spark/pull/5726
[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-96800849 [Test build #31061 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31061/consoleFull) for PR 4783 at commit [`db1e948`](https://github.com/apache/spark/commit/db1e948097b202573cac16a23a8bf22f1d6e2a5b). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6422][STREAMING] support customized act...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5098#issuecomment-96804955 [Test build #31049 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31049/consoleFull) for PR 5098 at commit [`4fe04ee`](https://github.com/apache/spark/commit/4fe04ee69cb9f70e2156f18aa59c13774e39a009). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5727#issuecomment-96813165 [Test build #722 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/722/consoleFull) for PR 5727 at commit [`1aae027`](https://github.com/apache/spark/commit/1aae027d640342ca7fb1146f72c9b62aea9f78c6).
[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/5667#issuecomment-96813694 @His-name-is-Joof -- Could you let me know what your JIRA username is? I would like to assign this issue to you.
[GitHub] spark pull request: [SPARK-7100][MLLib] Fix persisted RDD leak in ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/5669#issuecomment-96816164 After discussing with @mengxr I think we should not bother with the try-finally wrapper. As mentioned above, the method should generally not fail, so the data will be unpersisted as needed. When an exception is thrown, the data will be unpersisted whenever another RDD pushes input out of memory/disk, without undue harm to other jobs. @jimfcarroll Could you please update the PR to remove the try-finally wrapper, but keep the unpersist at the end? Thanks for going through this discussion!
[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5727#issuecomment-96816185 [Test build #31068 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31068/consoleFull) for PR 5727 at commit [`1aae027`](https://github.com/apache/spark/commit/1aae027d640342ca7fb1146f72c9b62aea9f78c6). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6991] [SparkR] Adds support for zipPart...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/5568#issuecomment-96816065 Thanks @concretevitamin for the review. Will merge this after Jenkins passes.
[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29190908 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection + +import java.io.OutputStream + +import scala.collection.mutable.ArrayBuffer + +import org.apache.spark.storage.BlockObjectWriter + +/** + * A logical byte buffer that wraps a list of byte arrays. All the byte arrays have equal size. The + * advantage of this over a standard ArrayBuffer is that it can grow without claiming large amounts + * of memory and needing to copy the full contents. + */ +private[spark] class ChainedBuffer(chunkSize: Int) { + private val chunkSizeLog2 = (math.log(chunkSize) / math.log(2)).toInt + assert(math.pow(2, chunkSizeLog2).toInt == chunkSize) + private val chunks: ArrayBuffer[Array[Byte]] = new ArrayBuffer[Array[Byte]]() + private var _size: Int = _ + + /** + * Feed bytes from this buffer into a BlockObjectWriter. + * + * @param pos Offset in the buffer to read from. + * @param writer BlockObjectWriter to read into. + * @param len Number of bytes to read.
+ */ + def read(pos: Int, writer: BlockObjectWriter, len: Int): Unit = { +var chunkIndex = pos >> chunkSizeLog2 +var posInChunk = pos - (chunkIndex << chunkSizeLog2) +var moved = 0 +while (moved < len) { + val toRead = math.min(len - moved, chunkSize - posInChunk) + writer.writeBytes(chunks(chunkIndex), posInChunk, toRead) + moved += toRead + chunkIndex += 1 + posInChunk = 0 +} + } + + /** + * Read bytes from this buffer into a byte array. + * + * @param pos Offset in the buffer to read from. + * @param bytes Byte array to read into. + * @param offs Offset in the byte array to read to. + * @param len Number of bytes to read. + */ + def read(pos: Int, bytes: Array[Byte], offs: Int, len: Int): Unit = { +var chunkIndex = pos >> chunkSizeLog2 +var posInChunk = pos - (chunkIndex << chunkSizeLog2) +var moved = 0 +while (moved < len) { + val toRead = math.min(len - moved, chunkSize - posInChunk) + System.arraycopy(chunks(chunkIndex), posInChunk, bytes, offs + moved, toRead) + moved += toRead + chunkIndex += 1 + posInChunk = 0 +} + } + + /** + * Write bytes from a byte array into this buffer. + * + * @param pos Offset in the buffer to write to. + * @param bytes Byte array to write from. + * @param offs Offset in the byte array to write from. + * @param len Number of bytes to write. + */ + def write(pos: Int, bytes: Array[Byte], offs: Int, len: Int): Unit = { +// Grow if needed +val endChunkIndex = (pos + len - 1) >> chunkSizeLog2 +while (endChunkIndex >= chunks.length) { + chunks += new Array[Byte](chunkSize) +} + +var chunkIndex = pos >> chunkSizeLog2 +var posInChunk = pos - (chunkIndex << chunkSizeLog2) +var moved = 0 +while (moved < len) { + val toWrite = math.min(len - moved, chunkSize - posInChunk) + System.arraycopy(bytes, offs + moved, chunks(chunkIndex), posInChunk, toWrite) + moved += toWrite + chunkIndex += 1 + posInChunk = 0 +} + +_size = math.max(_size, pos + len) + } + + /** + * Total size of buffer that can be written to without allocating additional memory.
+ */ + def capacity: Int = chunks.size * chunkSize + + /** + * Size of the logical buffer. + */ + def size: Int = _size +} + +/** + * Output stream that writes to a ChainedBuffer. + */ +private[spark] class ChainedBufferOutputStream(chainedBuffer: ChainedBuffer) extends OutputStream { + private
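The chunk arithmetic in the `ChainedBuffer` diff above can be sketched outside of Spark. The class below is a hypothetical, simplified re-implementation (not Spark's actual code) showing the core trick: because the chunk size is a power of two, `pos >> chunkSizeLog2` gives the chunk index and `pos - (chunkIndex << chunkSizeLog2)` the offset within it, and growing the buffer only appends new fixed-size chunks rather than copying existing data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified sketch of a chained byte buffer backed by
// equally sized power-of-two chunks.
class MiniChainedBuffer {
    private final int chunkSize;
    private final int chunkSizeLog2;
    private final List<byte[]> chunks = new ArrayList<>();
    private int size = 0;

    MiniChainedBuffer(int chunkSize) {
        if (Integer.bitCount(chunkSize) != 1) {
            throw new IllegalArgumentException("chunkSize must be a power of 2");
        }
        this.chunkSize = chunkSize;
        this.chunkSizeLog2 = Integer.numberOfTrailingZeros(chunkSize);
    }

    void write(int pos, byte[] bytes, int offs, int len) {
        // Grow by appending chunks; existing data is never copied.
        int endChunkIndex = (pos + len - 1) >> chunkSizeLog2;
        while (endChunkIndex >= chunks.size()) {
            chunks.add(new byte[chunkSize]);
        }
        int chunkIndex = pos >> chunkSizeLog2;          // which chunk
        int posInChunk = pos - (chunkIndex << chunkSizeLog2); // offset inside it
        int moved = 0;
        while (moved < len) {
            int toWrite = Math.min(len - moved, chunkSize - posInChunk);
            System.arraycopy(bytes, offs + moved, chunks.get(chunkIndex), posInChunk, toWrite);
            moved += toWrite;
            chunkIndex++;
            posInChunk = 0; // subsequent chunks start at offset 0
        }
        size = Math.max(size, pos + len);
    }

    void read(int pos, byte[] bytes, int offs, int len) {
        int chunkIndex = pos >> chunkSizeLog2;
        int posInChunk = pos - (chunkIndex << chunkSizeLog2);
        int moved = 0;
        while (moved < len) {
            int toRead = Math.min(len - moved, chunkSize - posInChunk);
            System.arraycopy(chunks.get(chunkIndex), posInChunk, bytes, offs + moved, toRead);
            moved += toRead;
            chunkIndex++;
            posInChunk = 0;
        }
    }

    int size() { return size; }
    int capacity() { return chunks.size() * chunkSize; }
}
```

A write of six bytes into a buffer with `chunkSize = 4` spans two chunks, so `capacity()` becomes 8 while `size()` stays 6.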
[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5673#issuecomment-96828599 [Test build #31066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31066/consoleFull) for PR 5673 at commit [`ab7c72b`](https://github.com/apache/spark/commit/ab7c72b486106a98bafb70b61125ed84f1d01cdd). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch **adds the following new dependencies:** * `tachyon-0.6.4.jar` * `tachyon-client-0.6.4.jar` * This patch **removes the following dependencies:** * `tachyon-0.5.0.jar` * `tachyon-client-0.5.0.jar`
[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5673
[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5726#issuecomment-96791158 ... I doubt this passes tests. The problem is that you break compatibility with old versions of Hive; it's not this simple.
[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5726#issuecomment-96794694 [Test build #31064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31064/consoleFull) for PR 5726 at commit [`310c315`](https://github.com/apache/spark/commit/310c3150c3c3c6c600d2049faba6b8349ba66f99). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes. * This patch **removes the following dependencies:** * `RoaringBitmap-0.4.5.jar` * `activation-1.1.jar` * `akka-actor_2.10-2.3.4-spark.jar` * `akka-remote_2.10-2.3.4-spark.jar` * `akka-slf4j_2.10-2.3.4-spark.jar` * `aopalliance-1.0.jar` * `arpack_combined_all-0.1.jar` * `avro-1.7.7.jar` * `breeze-macros_2.10-0.11.2.jar` * `breeze_2.10-0.11.2.jar` * `chill-java-0.5.0.jar` * `chill_2.10-0.5.0.jar` * `commons-beanutils-1.7.0.jar` * `commons-beanutils-core-1.8.0.jar` * `commons-cli-1.2.jar` * `commons-codec-1.10.jar` * `commons-collections-3.2.1.jar` * `commons-compress-1.4.1.jar` * `commons-configuration-1.6.jar` * `commons-digester-1.8.jar` * `commons-httpclient-3.1.jar` * `commons-io-2.1.jar` * `commons-lang-2.5.jar` * `commons-lang3-3.3.2.jar` * `commons-math-2.1.jar` * `commons-math3-3.4.1.jar` * `commons-net-2.2.jar` * `compress-lzf-1.0.0.jar` * `config-1.2.1.jar` * `core-1.1.2.jar` * `curator-client-2.4.0.jar` * `curator-framework-2.4.0.jar` * `curator-recipes-2.4.0.jar` * `gmbal-api-only-3.0.0-b023.jar` * `grizzly-framework-2.1.2.jar` * `grizzly-http-2.1.2.jar` * `grizzly-http-server-2.1.2.jar` * `grizzly-http-servlet-2.1.2.jar` * `grizzly-rcm-2.1.2.jar` * `groovy-all-2.3.7.jar` * `guava-14.0.1.jar` * `guice-3.0.jar` * `hadoop-annotations-2.2.0.jar` * `hadoop-auth-2.2.0.jar` * `hadoop-client-2.2.0.jar` * `hadoop-common-2.2.0.jar` * `hadoop-hdfs-2.2.0.jar` * `hadoop-mapreduce-client-app-2.2.0.jar` * `hadoop-mapreduce-client-common-2.2.0.jar` * `hadoop-mapreduce-client-core-2.2.0.jar` * 
`hadoop-mapreduce-client-jobclient-2.2.0.jar` * `hadoop-mapreduce-client-shuffle-2.2.0.jar` * `hadoop-yarn-api-2.2.0.jar` * `hadoop-yarn-client-2.2.0.jar` * `hadoop-yarn-common-2.2.0.jar` * `hadoop-yarn-server-common-2.2.0.jar` * `ivy-2.4.0.jar` * `jackson-annotations-2.4.0.jar` * `jackson-core-2.4.4.jar` * `jackson-core-asl-1.8.8.jar` * `jackson-databind-2.4.4.jar` * `jackson-jaxrs-1.8.8.jar` * `jackson-mapper-asl-1.8.8.jar` * `jackson-module-scala_2.10-2.4.4.jar` * `jackson-xc-1.8.8.jar` * `jansi-1.4.jar` * `javax.inject-1.jar` * `javax.servlet-3.0.0.v201112011016.jar` * `javax.servlet-3.1.jar` * `javax.servlet-api-3.0.1.jar` * `jaxb-api-2.2.2.jar` * `jaxb-impl-2.2.3-1.jar` * `jcl-over-slf4j-1.7.10.jar` * `jersey-client-1.9.jar` * `jersey-core-1.9.jar` * `jersey-grizzly2-1.9.jar` * `jersey-guice-1.9.jar` * `jersey-json-1.9.jar` * `jersey-server-1.9.jar` * `jersey-test-framework-core-1.9.jar` * `jersey-test-framework-grizzly2-1.9.jar` * `jets3t-0.7.1.jar` * `jettison-1.1.jar` * `jetty-util-6.1.26.jar` * `jline-0.9.94.jar` * `jline-2.10.4.jar` * `jodd-core-3.6.3.jar` * `json4s-ast_2.10-3.2.10.jar` * `json4s-core_2.10-3.2.10.jar` * `json4s-jackson_2.10-3.2.10.jar` * `jsr305-1.3.9.jar` * `jtransforms-2.4.0.jar` * `jul-to-slf4j-1.7.10.jar` * `kryo-2.21.jar` * `log4j-1.2.17.jar` * `lz4-1.2.0.jar` * `management-api-3.0.0-b012.jar` * `mesos-0.21.0-shaded-protobuf.jar` * `metrics-core-3.1.0.jar` * `metrics-graphite-3.1.0.jar` * `metrics-json-3.1.0.jar` * `metrics-jvm-3.1.0.jar` * `minlog-1.2.jar` * `netty-3.8.0.Final.jar` * `netty-all-4.0.23.Final.jar` * `objenesis-1.2.jar` * `opencsv-2.3.jar` * `oro-2.0.8.jar` * `paranamer-2.6.jar` * `parquet-column-1.6.0rc3.jar` * `parquet-common-1.6.0rc3.jar` * `parquet-encoding-1.6.0rc3.jar` * `parquet-format-2.2.0-rc1.jar` * `parquet-generator-1.6.0rc3.jar` * `parquet-hadoop-1.6.0rc3.jar` * `parquet-jackson-1.6.0rc3.jar` * `protobuf-java-2.4.1.jar` * `protobuf-java-2.5.0-spark.jar` * `py4j-0.8.2.1.jar` * `pyrolite-2.0.1.jar` * 
`quasiquotes_2.10-2.0.1.jar` * `reflectasm-1.07-shaded.jar` * `scala-compiler-2.10.4.jar` * `scala-library-2.10.4.jar` *
[GitHub] spark pull request: [SPARK-6746B] Refactor large functions in DAGS...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5396#issuecomment-96797161 [Test build #31024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31024/consoleFull) for PR 5396 at commit [`f0dcc7b`](https://github.com/apache/spark/commit/f0dcc7b8b62e7cbb4608b5cc9f3e6fe865c87bd8). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5667#issuecomment-96800371 [Test build #31060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31060/consoleFull) for PR 5667 at commit [`f8814a6`](https://github.com/apache/spark/commit/f8814a67436922342f89e54b8fc2ef24b63d1308). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6806] [SparkR] [Docs] Fill in SparkR ex...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5442#issuecomment-96811042 **[Test build #31015 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31015/consoleFull)** for PR 5442 at commit [`89684ce`](https://github.com/apache/spark/commit/89684ce59cfe4d989c2f36495d21ecb142c9881d) after a configured wait of `120m`.
[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...
GitHub user JoshRosen opened a pull request: https://github.com/apache/spark/pull/5727 [Build] Enable MiMa checks for launcher and sql projects Now that 1.3 has been released, we should enable MiMa checks for the `sql` and `launcher` subprojects. You can merge this pull request into a Git repository by running: $ git pull https://github.com/JoshRosen/spark enable-more-mima-checks Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5727.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5727 commit 1aae027d640342ca7fb1146f72c9b62aea9f78c6 Author: Josh Rosen joshro...@databricks.com Date: 2015-04-27T20:32:06Z Enable MiMa checks for launcher and sql projects.
[GitHub] spark pull request: Spark-5854 personalized page rank
Github user dwmclary commented on the pull request: https://github.com/apache/spark/pull/4774#issuecomment-96816965 @jegonzal does this algorithm look correct to you?
[GitHub] spark pull request: [SPARK-7007][core] Add a metric source for Exe...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5589#issuecomment-96819141 [Test build #30999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30999/consoleFull) for PR 5589 at commit [`a6d5ec5`](https://github.com/apache/spark/commit/a6d5ec51caabdb900c5c1971bfe438957ee1a032). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. * This patch **adds the following new dependencies:** * `activation-1.1.jar` * `aopalliance-1.0.jar` * `avro-1.7.7.jar` * `breeze-macros_2.10-0.11.2.jar` * `breeze_2.10-0.11.2.jar` * `commons-cli-1.2.jar` * `commons-codec-1.10.jar` * `commons-compress-1.4.1.jar` * `commons-io-2.1.jar` * `commons-lang-2.5.jar` * `commons-math3-3.4.1.jar` * `gmbal-api-only-3.0.0-b023.jar` * `grizzly-framework-2.1.2.jar` * `grizzly-http-2.1.2.jar` * `grizzly-http-server-2.1.2.jar` * `grizzly-http-servlet-2.1.2.jar` * `grizzly-rcm-2.1.2.jar` * `guice-3.0.jar` * `hadoop-annotations-2.2.0.jar` * `hadoop-auth-2.2.0.jar` * `hadoop-client-2.2.0.jar` * `hadoop-common-2.2.0.jar` * `hadoop-hdfs-2.2.0.jar` * `hadoop-mapreduce-client-app-2.2.0.jar` * `hadoop-mapreduce-client-common-2.2.0.jar` * `hadoop-mapreduce-client-core-2.2.0.jar` * `hadoop-mapreduce-client-jobclient-2.2.0.jar` * `hadoop-mapreduce-client-shuffle-2.2.0.jar` * `hadoop-yarn-api-2.2.0.jar` * `hadoop-yarn-client-2.2.0.jar` * `hadoop-yarn-common-2.2.0.jar` * `hadoop-yarn-server-common-2.2.0.jar` * `ivy-2.4.0.jar` * `jackson-annotations-2.4.0.jar` * `jackson-core-2.4.4.jar` * `jackson-databind-2.4.4.jar` * `jackson-jaxrs-1.8.8.jar` * `jackson-module-scala_2.10-2.4.4.jar` * `jackson-xc-1.8.8.jar` * `javax.inject-1.jar` * `javax.servlet-3.0.0.v201112011016.jar` * `javax.servlet-3.1.jar` * `javax.servlet-api-3.0.1.jar` * `jaxb-api-2.2.2.jar` * `jaxb-impl-2.2.3-1.jar` * `jersey-client-1.9.jar` * `jersey-core-1.9.jar` * `jersey-grizzly2-1.9.jar` * `jersey-guice-1.9.jar` * 
`jersey-json-1.9.jar` * `jersey-server-1.9.jar` * `jersey-test-framework-core-1.9.jar` * `jersey-test-framework-grizzly2-1.9.jar` * `jettison-1.1.jar` * `jetty-util-6.1.26.jar` * `jodd-core-3.6.3.jar` * `management-api-3.0.0-b012.jar` * `protobuf-java-2.4.1.jar` * `snappy-java-1.1.1.7.jar` * `spark-bagel_2.10-1.4.0-SNAPSHOT.jar` * `spark-catalyst_2.10-1.4.0-SNAPSHOT.jar` * `spark-core_2.10-1.4.0-SNAPSHOT.jar` * `spark-graphx_2.10-1.4.0-SNAPSHOT.jar` * `spark-launcher_2.10-1.4.0-SNAPSHOT.jar` * `spark-mllib_2.10-1.4.0-SNAPSHOT.jar` * `spark-network-common_2.10-1.4.0-SNAPSHOT.jar` * `spark-network-shuffle_2.10-1.4.0-SNAPSHOT.jar` * `spark-repl_2.10-1.4.0-SNAPSHOT.jar` * `spark-sql_2.10-1.4.0-SNAPSHOT.jar` * `spark-streaming_2.10-1.4.0-SNAPSHOT.jar` * `stax-api-1.0.1.jar` * `xz-1.0.jar` * This patch **removes the following dependencies:** * `breeze-macros_2.10-0.3.1.jar` * `breeze_2.10-0.10.jar` * `commons-codec-1.5.jar` * `commons-el-1.0.jar` * `commons-io-2.4.jar` * `commons-lang-2.4.jar` * `commons-math3-3.1.1.jar` * `hadoop-client-1.0.4.jar` * `hadoop-core-1.0.4.jar` * `hsqldb-1.8.0.10.jar` * `jackson-annotations-2.3.0.jar` * `jackson-core-2.3.0.jar` * `jackson-databind-2.3.0.jar` * `jblas-1.2.3.jar` * `snappy-java-1.1.1.6.jar` * `spark-bagel_2.10-1.3.0-SNAPSHOT.jar` * `spark-catalyst_2.10-1.3.0-SNAPSHOT.jar` * `spark-core_2.10-1.3.0-SNAPSHOT.jar` * `spark-graphx_2.10-1.3.0-SNAPSHOT.jar` * `spark-mllib_2.10-1.3.0-SNAPSHOT.jar` * `spark-network-common_2.10-1.3.0-SNAPSHOT.jar` * `spark-network-shuffle_2.10-1.3.0-SNAPSHOT.jar` * `spark-repl_2.10-1.3.0-SNAPSHOT.jar` * `spark-sql_2.10-1.3.0-SNAPSHOT.jar` * `spark-streaming_2.10-1.3.0-SNAPSHOT.jar`
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user doctapp commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-96791733 @hellertime thanks for the info, didn't catch this wasn't pre-installed with mesos.
[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5377#discussion_r29188459 --- Diff: network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.spark.network.sasl; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.WritableByteChannel; +import java.util.List; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import io.netty.channel.Channel; +import io.netty.channel.ChannelHandlerContext; +import io.netty.channel.ChannelOutboundHandlerAdapter; +import io.netty.channel.ChannelPromise; +import io.netty.channel.FileRegion; +import io.netty.handler.codec.MessageToMessageDecoder; +import io.netty.util.AbstractReferenceCounted; +import io.netty.util.ReferenceCountUtil; + +import org.apache.spark.network.util.ByteArrayWritableChannel; +import org.apache.spark.network.util.NettyUtils; + +class SaslEncryption { + + @VisibleForTesting + static final String ENCRYPTION_HANDLER_NAME = "saslEncryption"; + + /** + * Adds channel handlers that perform encryption / decryption of data using SASL. + * + * @param channel The channel. + * @param backend The SASL backend. + * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted blocks, to control + * memory usage. + */ + static void addToChannel( + Channel channel, + SaslEncryptionBackend backend, + int maxOutboundBlockSize) { +channel.pipeline() + .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, maxOutboundBlockSize)) + .addFirst("saslDecryption", new DecryptionHandler(backend)) + .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder()); + } + + private static class EncryptionHandler extends ChannelOutboundHandlerAdapter { + +private final int maxOutboundBlockSize; +private final SaslEncryptionBackend backend; + +EncryptionHandler(SaslEncryptionBackend backend, int maxOutboundBlockSize) { + this.backend = backend; + this.maxOutboundBlockSize = maxOutboundBlockSize; +} + +/** + * Wrap the incoming message in an implementation that will perform encryption lazily.
This is + * needed to guarantee ordering of the outgoing encrypted packets - they need to be decrypted in + * the same order, and netty doesn't have an atomic ChannelHandlerContext.write() API, so it + * does not guarantee any ordering. + */ +@Override +public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) + throws Exception { + + ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), promise); +} + +@Override +public void handlerRemoved(ChannelHandlerContext ctx) throws Exception { + try { +backend.dispose(); + } finally { +super.handlerRemoved(ctx); + } +} + + } + + private static class DecryptionHandler extends MessageToMessageDecoder<ByteBuf> { + +private final SaslEncryptionBackend backend; + +DecryptionHandler(SaslEncryptionBackend backend) { + this.backend = backend; +} + +@Override +protected void decode(ChannelHandlerContext ctx, ByteBuf msg, List<Object> out) + throws Exception { + + byte[] data; + int offset; + int length = msg.readableBytes(); + if (msg.hasArray()) { +data = msg.array(); +offset = msg.arrayOffset(); --- End diff -- It's unnecessary since `MessageToMessageDecoder` will release the input message when this
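The `maxOutboundBlockSize` idea in the handler above can be illustrated in isolation. The splitter below is a hypothetical simplification (not Spark's `EncryptedMessage`): the outgoing payload is cut into bounded blocks that would then be encrypted one at a time, in order, so peak memory stays proportional to the block size rather than to the whole message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: bound the size of each outgoing block so encryption
// can proceed lazily, block by block, in order.
class BlockSplitter {
    static List<byte[]> splitIntoBlocks(byte[] payload, int maxBlockSize) {
        List<byte[]> blocks = new ArrayList<>();
        for (int offset = 0; offset < payload.length; offset += maxBlockSize) {
            int end = Math.min(offset + maxBlockSize, payload.length);
            // In the real handler each block would be passed through the SASL
            // backend's wrap(...) before being written to the channel; here we
            // just collect the plaintext blocks to show the chunking.
            blocks.add(Arrays.copyOfRange(payload, offset, end));
        }
        return blocks;
    }
}
```

A 10-byte payload with a 4-byte limit yields three blocks of sizes 4, 4, and 2, emitted in payload order, which is what preserves decryption ordering on the receiving side.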
[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5667
[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5727#issuecomment-96813311 [Test build #31068 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31068/consoleFull) for PR 5727 at commit [`1aae027`](https://github.com/apache/spark/commit/1aae027d640342ca7fb1146f72c9b62aea9f78c6).
[GitHub] spark pull request: [SPARK-6991] [SparkR] Adds support for zipPart...
Github user concretevitamin commented on the pull request: https://github.com/apache/spark/pull/5568#issuecomment-96814334 LGTM. /cc @shivaram
[GitHub] spark pull request: [SPARK-7076][SPARK-7077][SPARK-7080][SQL] Use ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5725#issuecomment-96814367 [Test build #31069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31069/consoleFull) for PR 5725 at commit [`0925847`](https://github.com/apache/spark/commit/092584701277394a704c7600c6a631326d7895c6).
[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5377#discussion_r29190765 --- Diff: network/common/src/main/java/org/apache/spark/network/sasl/SparkSaslServer.java --- @@ -60,13 +60,19 @@ static final String DIGEST = "DIGEST-MD5"; /** - * The quality of protection is just "auth". This means that we are doing - * authentication only, we are not supporting integrity or privacy protection of the - * communication channel after authentication. This could be changed to be configurable - * in the future. + * QOP value that includes encryption. + */ + static final String QOP_AUTH_CONF = "auth-conf"; + + /** + * QOP value that does not include encryption. + */ + static final String QOP_AUTH = "auth"; + + /** + * Common SASL config properties for both client and server. */ static final Map<String, String> SASL_PROPS = ImmutableMap.<String, String>builder() -.put(Sasl.QOP, "auth") .put(Sasl.SERVER_AUTH, "true") --- End diff -- I don't think it applies to the client. I'm also not sure whether it's needed at all, but I'll change the code so it's only set for the server.
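The choice between the two QOP values discussed in this diff can be sketched with plain JDK types. This is a hypothetical helper (the `SaslQop` class and `saslProps` method are illustrative names, not Spark's code); it only assumes the standard `javax.security.sasl.Sasl` property keys, and sets `Sasl.SERVER_AUTH` on the server side only, as the review comment suggests:

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.sasl.Sasl;

public class SaslQop {
    static final String QOP_AUTH_CONF = "auth-conf"; // authentication + encryption
    static final String QOP_AUTH = "auth";           // authentication only

    // Hypothetical helper: picks the QOP based on whether encryption is wanted,
    // and applies the server-only mutual-auth property.
    static Map<String, String> saslProps(boolean encrypt, boolean isServer) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, encrypt ? QOP_AUTH_CONF : QOP_AUTH);
        if (isServer) {
            props.put(Sasl.SERVER_AUTH, "true");
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(saslProps(true, true).get(Sasl.QOP)); // prints auth-conf
    }
}
```

A SASL mechanism negotiated with these properties would then do `Sasl.createSaslServer`/`createSaslClient` with the map, but that wiring is omitted here.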
[GitHub] spark pull request: [SPARK-5342][YARN] Allow long running Spark ap...
Github user harishreedharan commented on the pull request: https://github.com/apache/spark/pull/4688#issuecomment-96824485 I am testing the changes right now. I will update this PR soon. Thanks @tgravescs!
[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29192947 --- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *    http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection + +import java.util.Comparator + +import org.apache.spark.storage.BlockObjectWriter + +/** + * A common interface for size-tracking collections of key-value pairs that + * - Have an associated partition for each key-value pair. + * - Support a memory-efficient sorted iterator + * - Support a WritablePartitionedIterator for writing the contents directly as bytes. + */ +private[spark] trait WritablePartitionedPairCollection[K, V] extends SizeTracker { + /** + * Insert a key-value pair with a partition into the collection + */ + def insert(partition: Int, key: K, value: V): Unit + + /** + * Estimate the collection's current memory usage in bytes. + */ + def estimateSize(): Long + + /** + * Iterate through the data in order of partition ID and then the given comparator. This may + * destroy the underlying collection.
+ */ + def partitionedDestructiveSortedIterator(keyComparator: Comparator[K]): Iterator[((Int, K), V)] --- End diff -- Agree this is an obnoxiously long name. However, if we rename `partitionedDestructiveSortedIterator` to `partitionedIterator`, then we probably also want to rename `destructiveSortedWritablePartitionedIterator` to `writablePartitionedIterator`. But a method named `writablePartitionedIterator` exists as well (which is not destructive or sorted).
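The ordering contract of `partitionedDestructiveSortedIterator` described above — records come out sorted by partition ID first, then by the supplied key comparator within each partition — can be sketched with plain Java collections. `Entry` and `sortPartitioned` are illustrative names for this sketch, not Spark's types:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PartitionedOrder {
    // Minimal stand-in for a (partition, key) record.
    static class Entry {
        final int partition;
        final String key;
        Entry(int partition, String key) { this.partition = partition; this.key = key; }
    }

    // Sort by partition ID first, then by the caller-supplied key comparator
    // within a partition -- the same ordering the trait's iterator promises.
    static void sortPartitioned(List<Entry> entries, Comparator<String> keyComparator) {
        entries.sort(
            Comparator.<Entry>comparingInt(e -> e.partition)
                .thenComparing((Entry e) -> e.key, keyComparator));
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>(Arrays.asList(
            new Entry(1, "b"), new Entry(0, "z"), new Entry(1, "a")));
        sortPartitioned(entries, Comparator.naturalOrder());
        for (Entry e : entries) {
            System.out.println(e.partition + " " + e.key);
        }
    }
}
```

A destructive implementation would additionally release the underlying buffer as it iterates; this sketch only demonstrates the ordering.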
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-96828951 [Test build #31071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31071/consoleFull) for PR 5694 at commit [`83e80ef`](https://github.com/apache/spark/commit/83e80ef4eec49dcee7c55900e4cbcf9b899aea65).
[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/5377#discussion_r29195262 --- Diff: network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *    http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.spark.network.sasl; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.WritableByteChannel; +import java.util.List; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import io.netty.channel.Channel; +import io.netty.channel.ChannelHandlerContext; +import io.netty.channel.ChannelOutboundHandlerAdapter; +import io.netty.channel.ChannelPromise; +import io.netty.channel.FileRegion; +import io.netty.handler.codec.MessageToMessageDecoder; +import io.netty.util.AbstractReferenceCounted; +import io.netty.util.ReferenceCountUtil; + +import org.apache.spark.network.util.ByteArrayWritableChannel; +import org.apache.spark.network.util.NettyUtils; + +class SaslEncryption { + + @VisibleForTesting + static final String ENCRYPTION_HANDLER_NAME = "saslEncryption"; + + /** + * Adds channel handlers that perform encryption / decryption of data using SASL. + * + * @param channel The channel. + * @param backend The SASL backend. + * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted blocks, to control + * memory usage. + */ + static void addToChannel( + Channel channel, + SaslEncryptionBackend backend, + int maxOutboundBlockSize) { +channel.pipeline() + .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, maxOutboundBlockSize)) + .addFirst("saslDecryption", new DecryptionHandler(backend)) + .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder()); + } + + private static class EncryptionHandler extends ChannelOutboundHandlerAdapter { + +private final int maxOutboundBlockSize; +private final SaslEncryptionBackend backend; + +EncryptionHandler(SaslEncryptionBackend backend, int maxOutboundBlockSize) { + this.backend = backend; + this.maxOutboundBlockSize = maxOutboundBlockSize; +} + +/** + * Wrap the incoming message in an implementation that will perform encryption lazily.
This is + * needed to guarantee ordering of the outgoing encrypted packets - they need to be decrypted in + * the same order, and netty doesn't have an atomic ChannelHandlerContext.write() API, so it + * does not guarantee any ordering. + */ +@Override +public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) + throws Exception { + + ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), promise); +} + +@Override +public void handlerRemoved(ChannelHandlerContext ctx) throws Exception { + try { +backend.dispose(); + } finally { +super.handlerRemoved(ctx); + } +} + + } + + private static class DecryptionHandler extends MessageToMessageDecoder<ByteBuf> { + +private final SaslEncryptionBackend backend; + +DecryptionHandler(SaslEncryptionBackend backend) { + this.backend = backend; +} + +@Override +protected void decode(ChannelHandlerContext ctx, ByteBuf msg, List<Object> out) + throws Exception { + + byte[] data; + int offset; + int length = msg.readableBytes(); + if (msg.hasArray()) { +data = msg.array(); +offset = msg.arrayOffset(); --- End diff -- I see, it's just slightly odd that only one of the two cases moves msg's reader index. In
[GitHub] spark pull request: [SPARK-6612] [MLLib] [PySpark] Python KMeans p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5647#issuecomment-96792936 [Test build #30986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30986/consoleFull) for PR 5647 at commit [`9903837`](https://github.com/apache/spark/commit/990383761841b444506e91f3052c2de3736d6052). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch **removes the following dependencies:** * `RoaringBitmap-0.4.5.jar` * `activation-1.1.jar` * `akka-actor_2.10-2.3.4-spark.jar` * `akka-remote_2.10-2.3.4-spark.jar` * `akka-slf4j_2.10-2.3.4-spark.jar` * `aopalliance-1.0.jar` * `arpack_combined_all-0.1.jar` * `avro-1.7.7.jar` * `breeze-macros_2.10-0.11.2.jar` * `breeze_2.10-0.11.2.jar` * `chill-java-0.5.0.jar` * `chill_2.10-0.5.0.jar` * `commons-beanutils-1.7.0.jar` * `commons-beanutils-core-1.8.0.jar` * `commons-cli-1.2.jar` * `commons-codec-1.10.jar` * `commons-collections-3.2.1.jar` * `commons-compress-1.4.1.jar` * `commons-configuration-1.6.jar` * `commons-digester-1.8.jar` * `commons-httpclient-3.1.jar` * `commons-io-2.1.jar` * `commons-lang-2.5.jar` * `commons-lang3-3.3.2.jar` * `commons-math-2.1.jar` * `commons-math3-3.1.1.jar` * `commons-net-2.2.jar` * `compress-lzf-1.0.0.jar` * `config-1.2.1.jar` * `core-1.1.2.jar` * `curator-client-2.4.0.jar` * `curator-framework-2.4.0.jar` * `curator-recipes-2.4.0.jar` * `gmbal-api-only-3.0.0-b023.jar` * `grizzly-framework-2.1.2.jar` * `grizzly-http-2.1.2.jar` * `grizzly-http-server-2.1.2.jar` * `grizzly-http-servlet-2.1.2.jar` * `grizzly-rcm-2.1.2.jar` * `groovy-all-2.3.7.jar` * `guava-14.0.1.jar` * `guice-3.0.jar` * `hadoop-annotations-2.2.0.jar` * `hadoop-auth-2.2.0.jar` * `hadoop-client-2.2.0.jar` * `hadoop-common-2.2.0.jar` * `hadoop-hdfs-2.2.0.jar` * `hadoop-mapreduce-client-app-2.2.0.jar` * `hadoop-mapreduce-client-common-2.2.0.jar` * `hadoop-mapreduce-client-core-2.2.0.jar` * 
`hadoop-mapreduce-client-jobclient-2.2.0.jar` * `hadoop-mapreduce-client-shuffle-2.2.0.jar` * `hadoop-yarn-api-2.2.0.jar` * `hadoop-yarn-client-2.2.0.jar` * `hadoop-yarn-common-2.2.0.jar` * `hadoop-yarn-server-common-2.2.0.jar` * `ivy-2.4.0.jar` * `jackson-annotations-2.4.0.jar` * `jackson-core-2.4.4.jar` * `jackson-core-asl-1.8.8.jar` * `jackson-databind-2.4.4.jar` * `jackson-jaxrs-1.8.8.jar` * `jackson-mapper-asl-1.8.8.jar` * `jackson-module-scala_2.10-2.4.4.jar` * `jackson-xc-1.8.8.jar` * `jansi-1.4.jar` * `javax.inject-1.jar` * `javax.servlet-3.0.0.v201112011016.jar` * `javax.servlet-3.1.jar` * `javax.servlet-api-3.0.1.jar` * `jaxb-api-2.2.2.jar` * `jaxb-impl-2.2.3-1.jar` * `jcl-over-slf4j-1.7.10.jar` * `jersey-client-1.9.jar` * `jersey-core-1.9.jar` * `jersey-grizzly2-1.9.jar` * `jersey-guice-1.9.jar` * `jersey-json-1.9.jar` * `jersey-server-1.9.jar` * `jersey-test-framework-core-1.9.jar` * `jersey-test-framework-grizzly2-1.9.jar` * `jets3t-0.7.1.jar` * `jettison-1.1.jar` * `jetty-util-6.1.26.jar` * `jline-0.9.94.jar` * `jline-2.10.4.jar` * `jodd-core-3.6.3.jar` * `json4s-ast_2.10-3.2.10.jar` * `json4s-core_2.10-3.2.10.jar` * `json4s-jackson_2.10-3.2.10.jar` * `jsr305-1.3.9.jar` * `jtransforms-2.4.0.jar` * `jul-to-slf4j-1.7.10.jar` * `kryo-2.21.jar` * `log4j-1.2.17.jar` * `lz4-1.2.0.jar` * `management-api-3.0.0-b012.jar` * `mesos-0.21.0-shaded-protobuf.jar` * `metrics-core-3.1.0.jar` * `metrics-graphite-3.1.0.jar` * `metrics-json-3.1.0.jar` * `metrics-jvm-3.1.0.jar` * `minlog-1.2.jar` * `netty-3.8.0.Final.jar` * `netty-all-4.0.23.Final.jar` * `objenesis-1.2.jar` * `opencsv-2.3.jar` * `oro-2.0.8.jar` * `paranamer-2.6.jar` * `parquet-column-1.6.0rc3.jar` * `parquet-common-1.6.0rc3.jar` * `parquet-encoding-1.6.0rc3.jar` * `parquet-format-2.2.0-rc1.jar` * `parquet-generator-1.6.0rc3.jar` * `parquet-hadoop-1.6.0rc3.jar` * `parquet-jackson-1.6.0rc3.jar` * `protobuf-java-2.4.1.jar` * `protobuf-java-2.5.0-spark.jar` * `py4j-0.8.2.1.jar` * `pyrolite-2.0.1.jar` * 
`quasiquotes_2.10-2.0.1.jar` * `reflectasm-1.07-shaded.jar` * `scala-compiler-2.10.4.jar` * `scala-library-2.10.4.jar` *
[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5673#issuecomment-96792973 [Test build #31066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31066/consoleFull) for PR 5673 at commit [`ab7c72b`](https://github.com/apache/spark/commit/ab7c72b486106a98bafb70b61125ed84f1d01cdd).
[GitHub] spark pull request: [SPARK-6775] [SPARK-6776] [SQL] [WIP] Refactor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5422#issuecomment-96801888 [Test build #31021 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31021/consoleFull) for PR 5422 at commit [`2529b76`](https://github.com/apache/spark/commit/2529b76141c068fe69a03e29ddc50ca496cd9dcd). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds no public classes. * This patch **adds the following new dependencies:** * `RoaringBitmap-0.4.5.jar` * `activation-1.1.jar` * `akka-actor_2.10-2.3.4-spark.jar` * `akka-remote_2.10-2.3.4-spark.jar` * `akka-slf4j_2.10-2.3.4-spark.jar` * `aopalliance-1.0.jar` * `arpack_combined_all-0.1.jar` * `avro-1.7.7.jar` * `breeze-macros_2.10-0.11.2.jar` * `breeze_2.10-0.11.2.jar` * `chill-java-0.5.0.jar` * `chill_2.10-0.5.0.jar` * `commons-beanutils-1.7.0.jar` * `commons-beanutils-core-1.8.0.jar` * `commons-cli-1.2.jar` * `commons-codec-1.10.jar` * `commons-collections-3.2.1.jar` * `commons-compress-1.4.1.jar` * `commons-configuration-1.6.jar` * `commons-digester-1.8.jar` * `commons-httpclient-3.1.jar` * `commons-io-2.1.jar` * `commons-lang-2.5.jar` * `commons-lang3-3.3.2.jar` * `commons-math-2.1.jar` * `commons-math3-3.1.1.jar` * `commons-net-2.2.jar` * `compress-lzf-1.0.0.jar` * `config-1.2.1.jar` * `core-1.1.2.jar` * `curator-client-2.4.0.jar` * `curator-framework-2.4.0.jar` * `curator-recipes-2.4.0.jar` * `gmbal-api-only-3.0.0-b023.jar` * `grizzly-framework-2.1.2.jar` * `grizzly-http-2.1.2.jar` * `grizzly-http-server-2.1.2.jar` * `grizzly-http-servlet-2.1.2.jar` * `grizzly-rcm-2.1.2.jar` * `groovy-all-2.3.7.jar` * `guava-14.0.1.jar` * `guice-3.0.jar` * `hadoop-annotations-2.2.0.jar` * `hadoop-auth-2.2.0.jar` * `hadoop-client-2.2.0.jar` * `hadoop-common-2.2.0.jar` * `hadoop-hdfs-2.2.0.jar` * `hadoop-mapreduce-client-app-2.2.0.jar` * `hadoop-mapreduce-client-common-2.2.0.jar` * `hadoop-mapreduce-client-core-2.2.0.jar` * 
`hadoop-mapreduce-client-jobclient-2.2.0.jar` * `hadoop-mapreduce-client-shuffle-2.2.0.jar` * `hadoop-yarn-api-2.2.0.jar` * `hadoop-yarn-client-2.2.0.jar` * `hadoop-yarn-common-2.2.0.jar` * `hadoop-yarn-server-common-2.2.0.jar` * `ivy-2.4.0.jar` * `jackson-annotations-2.4.0.jar` * `jackson-core-2.4.4.jar` * `jackson-core-asl-1.8.8.jar` * `jackson-databind-2.4.4.jar` * `jackson-jaxrs-1.8.8.jar` * `jackson-mapper-asl-1.8.8.jar` * `jackson-module-scala_2.10-2.4.4.jar` * `jackson-xc-1.8.8.jar` * `jansi-1.4.jar` * `javax.inject-1.jar` * `javax.servlet-3.0.0.v201112011016.jar` * `javax.servlet-3.1.jar` * `javax.servlet-api-3.0.1.jar` * `jaxb-api-2.2.2.jar` * `jaxb-impl-2.2.3-1.jar` * `jcl-over-slf4j-1.7.10.jar` * `jersey-client-1.9.jar` * `jersey-core-1.9.jar` * `jersey-grizzly2-1.9.jar` * `jersey-guice-1.9.jar` * `jersey-json-1.9.jar` * `jersey-server-1.9.jar` * `jersey-test-framework-core-1.9.jar` * `jersey-test-framework-grizzly2-1.9.jar` * `jets3t-0.7.1.jar` * `jettison-1.1.jar` * `jetty-util-6.1.26.jar` * `jline-0.9.94.jar` * `jline-2.10.4.jar` * `jodd-core-3.6.3.jar` * `json4s-ast_2.10-3.2.10.jar` * `json4s-core_2.10-3.2.10.jar` * `json4s-jackson_2.10-3.2.10.jar` * `jsr305-1.3.9.jar` * `jtransforms-2.4.0.jar` * `jul-to-slf4j-1.7.10.jar` * `kryo-2.21.jar` * `log4j-1.2.17.jar` * `lz4-1.2.0.jar` * `management-api-3.0.0-b012.jar` * `mesos-0.21.0-shaded-protobuf.jar` * `metrics-core-3.1.0.jar` * `metrics-graphite-3.1.0.jar` * `metrics-json-3.1.0.jar` * `metrics-jvm-3.1.0.jar` * `minlog-1.2.jar` * `netty-3.8.0.Final.jar` * `netty-all-4.0.23.Final.jar` * `objenesis-1.2.jar` * `opencsv-2.3.jar` * `oro-2.0.8.jar` * `paranamer-2.6.jar` * `parquet-column-1.6.0rc3.jar` * `parquet-common-1.6.0rc3.jar` * `parquet-encoding-1.6.0rc3.jar` * `parquet-format-2.2.0-rc1.jar` * `parquet-generator-1.6.0rc3.jar` * `parquet-hadoop-1.6.0rc3.jar` * `parquet-jackson-1.6.0rc3.jar` * `protobuf-java-2.4.1.jar` * `protobuf-java-2.5.0-spark.jar` * `py4j-0.8.2.1.jar` * `pyrolite-2.0.1.jar` * 
`quasiquotes_2.10-2.0.1.jar` * `reflectasm-1.07-shaded.jar` * `scala-compiler-2.10.4.jar` *
[GitHub] spark pull request: [SPARK-7138][Streaming] Add method to BlockGen...
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/5695#issuecomment-96808771 @huitseeker Yes, I want to use the existing rate limiter interface, so that any rate control added in the future can be applied through that interface.
[GitHub] spark pull request: [SPARK-6229] Add SASL encryption to network li...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/5377#discussion_r29189853 --- Diff: network/common/src/main/java/org/apache/spark/network/sasl/SaslEncryption.java --- @@ -0,0 +1,260 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *    http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ + +package org.apache.spark.network.sasl; + +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.channels.WritableByteChannel; +import java.util.List; + +import com.google.common.annotations.VisibleForTesting; +import com.google.common.base.Preconditions; +import io.netty.buffer.ByteBuf; +import io.netty.buffer.Unpooled; +import io.netty.channel.Channel; +import io.netty.channel.ChannelHandlerContext; +import io.netty.channel.ChannelOutboundHandlerAdapter; +import io.netty.channel.ChannelPromise; +import io.netty.channel.FileRegion; +import io.netty.handler.codec.MessageToMessageDecoder; +import io.netty.util.AbstractReferenceCounted; +import io.netty.util.ReferenceCountUtil; + +import org.apache.spark.network.util.ByteArrayWritableChannel; +import org.apache.spark.network.util.NettyUtils; + +class SaslEncryption { + + @VisibleForTesting + static final String ENCRYPTION_HANDLER_NAME = "saslEncryption"; + + /** + * Adds channel handlers that perform encryption / decryption of data using SASL. + * + * @param channel The channel. + * @param backend The SASL backend. + * @param maxOutboundBlockSize Max size in bytes of outgoing encrypted blocks, to control + * memory usage. + */ + static void addToChannel( + Channel channel, + SaslEncryptionBackend backend, + int maxOutboundBlockSize) { +channel.pipeline() + .addFirst(ENCRYPTION_HANDLER_NAME, new EncryptionHandler(backend, maxOutboundBlockSize)) + .addFirst("saslDecryption", new DecryptionHandler(backend)) + .addFirst("saslFrameDecoder", NettyUtils.createFrameDecoder()); + } + + private static class EncryptionHandler extends ChannelOutboundHandlerAdapter { + +private final int maxOutboundBlockSize; +private final SaslEncryptionBackend backend; + +EncryptionHandler(SaslEncryptionBackend backend, int maxOutboundBlockSize) { + this.backend = backend; + this.maxOutboundBlockSize = maxOutboundBlockSize; +} + +/** + * Wrap the incoming message in an implementation that will perform encryption lazily.
This is + * needed to guarantee ordering of the outgoing encrypted packets - they need to be decrypted in + * the same order, and netty doesn't have an atomic ChannelHandlerContext.write() API, so it + * does not guarantee any ordering. + */ +@Override +public void write(ChannelHandlerContext ctx, Object msg, ChannelPromise promise) + throws Exception { + + ctx.write(new EncryptedMessage(backend, msg, maxOutboundBlockSize), promise); +} + +@Override +public void handlerRemoved(ChannelHandlerContext ctx) throws Exception { + try { +backend.dispose(); + } finally { +super.handlerRemoved(ctx); + } +} + + } + + private static class DecryptionHandler extends MessageToMessageDecoder<ByteBuf> { + +private final SaslEncryptionBackend backend; + +DecryptionHandler(SaslEncryptionBackend backend) { + this.backend = backend; +} + +@Override +protected void decode(ChannelHandlerContext ctx, ByteBuf msg, List<Object> out) + throws Exception { + + byte[] data; + int offset; + int length = msg.readableBytes(); + if (msg.hasArray()) { +data = msg.array(); +offset = msg.arrayOffset(); + } else { +data = new byte[length]; +msg.readBytes(data); +offset = 0; +
[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5727#issuecomment-96816752 It looks like SQL failed 11 MiMa checks, although it looks like all of them are in test code or internal APIs (so we can just double-check, then add the proper excludes / annotations):
```
[info] spark-sql: found 13 potential binary incompatibilities (filtered 101)
[error] * method checkAnalysis()org.apache.spark.sql.catalyst.analysis.CheckAnalysis in class org.apache.spark.sql.SQLContext does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.sql.SQLContext.checkAnalysis")
[error] * method children()scala.collection.immutable.Nil# in class org.apache.spark.sql.execution.ExecutedCommand has now a different result type; was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]   filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.ExecutedCommand.children")
[error] * class org.apache.spark.sql.execution.AddExchange does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.execution.AddExchange")
[error] * method children()scala.collection.immutable.Nil# in class org.apache.spark.sql.execution.LogicalLocalTable has now a different result type; was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]   filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalLocalTable.children")
[error] * method newInstance()org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation in class org.apache.spark.sql.execution.LogicalLocalTable has now a different result type; was: org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation, is now: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
[error]   filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalLocalTable.newInstance")
[error] * method children()scala.collection.immutable.Nil# in class org.apache.spark.sql.execution.PhysicalRDD has now a different result type; was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]   filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.PhysicalRDD.children")
[error] * method children()scala.collection.immutable.Nil# in class org.apache.spark.sql.execution.LocalTableScan has now a different result type; was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]   filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LocalTableScan.children")
[error] * object org.apache.spark.sql.execution.AddExchange does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.execution.AddExchange$")
[error] * method children()scala.collection.immutable.Nil# in class org.apache.spark.sql.execution.LogicalRDD has now a different result type; was: scala.collection.immutable.Nil#, is now: scala.collection.Seq
[error]   filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalRDD.children")
[error] * method newInstance()org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation in class org.apache.spark.sql.execution.LogicalRDD has now a different result type; was: org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation, is now: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
[error]   filter with: ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.execution.LogicalRDD.newInstance")
[error] * class org.apache.spark.sql.parquet.ParquetTestData does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.parquet.ParquetTestData")
[error] * object org.apache.spark.sql.parquet.ParquetTestData does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.parquet.ParquetTestData$")
[error] * class org.apache.spark.sql.parquet.TestGroupWriteSupport does not have a correspondent in new version
[error]   filter with: ProblemFilters.exclude[MissingClassProblem]("org.apache.spark.sql.parquet.TestGroupWriteSupport")
```
`launcher` failed its checks because it couldn't find a spark-launcher JAR on Maven:
```
[info] spark-mllib: found 0 potential binary incompatibilities (filtered 242)
sbt.ResolveException: unresolved dependency: org.apache.spark#spark-launcher_2.10;1.3.0: not found
  at sbt.IvyActions$.sbt$IvyActions$$resolve(IvyActions.scala:278)
  at
[GitHub] spark pull request: [SPARK-7100][MLLib] Fix persisted RDD leak in ...
Github user jimfcarroll commented on the pull request: https://github.com/apache/spark/pull/5669#issuecomment-96816661 Your project. I'll downgrade it if you want. :-)
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user helena commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29194115 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala --- @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag]( logDebug(s"Read partition data of $this from block manager, block $blockId") iterator case None => // Data not found in Block Manager, grab it from write ahead log file -val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf) -val dataRead = reader.read(partition.segment) -reader.close() +var dataRead: ByteBuffer = null --- End diff -- I feel dirty seeing nulls in scala
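The null-initialized `var` in this diff exists so the reader can be closed in a `finally` block. A null-free alternative is the loan pattern, sketched here in Java with try-with-resources (in the Scala original, `Option` or a `tryWithResource`-style helper would play the same role); `RandomReader` is a hypothetical stand-in for `WriteAheadLogRandomReader`:

```java
import java.nio.ByteBuffer;

public class ReadSegment {
    // Hypothetical stand-in for WriteAheadLogRandomReader: yields a fixed
    // buffer and must be closed after use.
    static class RandomReader implements AutoCloseable {
        ByteBuffer read() { return ByteBuffer.allocate(8); }
        @Override public void close() { /* release file handle */ }
    }

    // try-with-resources closes the reader on every path, so no local needs
    // to start out as null before the read succeeds.
    static ByteBuffer readSegment() {
        try (RandomReader reader = new RandomReader()) {
            return reader.read();
        }
    }

    public static void main(String[] args) {
        System.out.println(readSegment().capacity()); // prints 8
    }
}
```

The trade-off is the same either way: the resource's lifetime is scoped to the block, and the read result escapes it.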
[GitHub] spark pull request: [SPARK-6991] [SparkR] Adds support for zipPart...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/5568#issuecomment-96830292 Hmm this is weird - somehow the test results for this PR were never posted to github. https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31005/consoleFull reports a json parsing error cc @shaneknapp
[GitHub] spark pull request: [SPARK-6829] Added math functions for DataFram...
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/5616#discussion_r29195806
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/mathfunctions.scala ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions
+
+import org.apache.spark.sql.catalyst.analysis.UnresolvedException
+import org.apache.spark.sql.types._
+
+abstract class MathematicalExpression(name: String) extends UnaryExpression with Serializable {
+  self: Product =>
+  type EvaluatedType = Any
+
+  override def dataType: DataType = DoubleType
+  override def foldable: Boolean = child.foldable
+  override def nullable: Boolean = true
+  override def toString: String = s"$name($child)"
+
+  lazy val numeric = child.dataType match {
+    case n: NumericType => n.numeric.asInstanceOf[Numeric[Any]]
+    case other => sys.error(s"Type $other does not support numeric operations")
+  }
+}
+
+abstract class MathematicalExpressionForDouble(f: Double => Double, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) {
+      null
+    } else {
+      val result = f(numeric.toDouble(evalE))
+      if (result.isNaN) null
+      else result
+    }
+  }
+}
+
+abstract class MathematicalExpressionForInt(f: Int => Int, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+  override def dataType: DataType = IntegerType
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) {
+      null
+    } else {
+      f(numeric.toInt(evalE))
+    }
+  }
+}
+
+abstract class MathematicalExpressionForFloat(f: Float => Float, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def dataType: DataType = FloatType
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) {
+      null
+    } else {
+      val result = f(numeric.toFloat(evalE))
+      if (result.isNaN) null
+      else result
+    }
+  }
+}
+
+abstract class MathematicalExpressionForLong(f: Long => Long, name: String)
+  extends MathematicalExpression(name) { self: Product =>
+
+  override def dataType: DataType = LongType
+
+  override def eval(input: Row): Any = {
+    val evalE = child.eval(input)
+    if (evalE == null) {
+      null
+    } else {
+      f(numeric.toLong(evalE))
+    }
+  }
+}
+
+case class Sin(child: Expression) extends MathematicalExpressionForDouble(math.sin, "SIN")
+
+case class Asin(child: Expression) extends MathematicalExpressionForDouble(math.asin, "ASIN")
+
+case class Sinh(child: Expression) extends MathematicalExpressionForDouble(math.sinh, "SINH")
+
+case class Cos(child: Expression) extends MathematicalExpressionForDouble(math.cos, "COS")
+
+case class Acos(child: Expression) extends MathematicalExpressionForDouble(math.acos, "ACOS")
+
+case class Cosh(child: Expression) extends MathematicalExpressionForDouble(math.cosh, "COSH")
+
+case class Tan(child: Expression) extends MathematicalExpressionForDouble(math.tan, "TAN")
+
+case class Atan(child: Expression) extends MathematicalExpressionForDouble(math.atan, "ATAN")
+
+case class Tanh(child: Expression) extends MathematicalExpressionForDouble(math.tanh, "TANH")
+
+case class Ceil(child: Expression) extends MathematicalExpressionForDouble(math.ceil, "CEIL")
+
+case class Floor(child: Expression) extends MathematicalExpressionForDouble(math.floor, "FLOOR")
+
+case class Rint(child: Expression) extends MathematicalExpressionForDouble(math.rint, "ROUND")
+
+case class Cbrt(child: Expression) extends
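The null-handling convention in the diff above — a null child evaluates to null, and a NaN result is likewise mapped to null — can be sketched outside Catalyst as a plain function. This is an illustrative sketch, not the Catalyst API; `MathEvalSketch` and `evalUnary` are hypothetical names:

```scala
// Sketch (not Catalyst code) of the null-propagation rule used by
// MathematicalExpressionForDouble.eval in the diff:
// a null input yields null, and a NaN result is also normalized to null.
object MathEvalSketch {
  def evalUnary(f: Double => Double)(input: Any): Any = {
    if (input == null) {
      null
    } else {
      val result = f(input.asInstanceOf[Number].doubleValue())
      if (result.isNaN) null else result
    }
  }
}
```

For example, `MathEvalSketch.evalUnary(math.asin)(2.0)` comes back null, because `asin(2.0)` is NaN.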
[GitHub] spark pull request: [SPARK-2691][Mesos] Support for Mesos DockerIn...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3074#issuecomment-96795918 [Test build #721 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/721/consoleFull) for PR 3074 at commit [`064101c`](https://github.com/apache/spark/commit/064101c0096eb44b7d91fa62bafa27756279aca2).
[GitHub] spark pull request: [SPARK-5945] Spark should not retry a stage in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5636#issuecomment-96795847
[Test build #30990 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30990/consoleFull) for PR 5636 at commit [`0335b96`](https://github.com/apache/spark/commit/0335b967b4b1a91782b5a608220c9c3eeb0bf8e1).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `tachyon-0.6.4.jar`
   * `tachyon-client-0.6.4.jar`
 * This patch **removes the following dependencies:**
   * `tachyon-0.5.0.jar`
   * `tachyon-client-0.5.0.jar`
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r29190417
--- Diff: core/src/main/scala/org/apache/spark/status/api/v1/api.scala ---
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.status.api.v1
+
+import java.util.Date
+
+import scala.collection.Map
+
+import org.apache.spark.JobExecutionStatus
+
+class ApplicationInfo(
--- End diff --
This was intentional, to get them covered by MiMa. I also think the goal is to provide more stability than implied by `@DeveloperApi`.
[GitHub] spark pull request: [SPARK-7175] Upgrade to Hive 1.1.0
Github user punya commented on the pull request: https://github.com/apache/spark/pull/5726#issuecomment-96796911 Closing in favor of work on https://issues.apache.org/jira/browse/SPARK-6906.
[GitHub] spark pull request: [ML] SPARK-2426: Integrate Breeze NNLS with ML...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5005#issuecomment-96807849 **[Test build #31054 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31054/consoleFull)** for PR 5005 at commit [`2e0603a`](https://github.com/apache/spark/commit/2e0603a0f94c51d3dae64883f2bd91f3080f9c7e) after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-6030][CORE] Using simulated field layou...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/4783#issuecomment-96815195 Thanks @advancedxy for fixing the tests. This change LGTM. @rxin @srowen Could you also take a final look at this?
[GitHub] spark pull request: [Build] Enable MiMa checks for launcher and sq...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5727#issuecomment-96817556 Ah, `launcher` was only added in 1.4, so I'll put the MiMa exclude back.
[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29191148
--- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala ---
@@ -740,15 +723,29 @@ private[spark] class ExternalSorter[K, V, C](
          in.close()
        }
      }
+    } else if (spills.isEmpty && partitionWriters == null) {
+      // Case where we only have in-memory data
+      val collection = if (aggregator.isDefined) map else buffer
+      val it = collection.destructiveSortedWritablePartitionedIterator(comparator)
+      while (it.hasNext) {
+        val writer = blockManager.getDiskWriter(
+          blockId, outputFile, ser, fileBufferSize, context.taskMetrics.shuffleWriteMetrics.get)
+        val partitionId = it.nextPartition()
+        while (it.hasNext && it.nextPartition() == partitionId) {
+          it.writeNext(writer)
+        }
+        writer.commitAndClose()
+        val segment = writer.fileSegment()
+        lengths(partitionId) = segment.length
+      }
    } else {
-      // Either we're not bypassing merge-sort or we have only in-memory data; get an iterator by
-      // partition and just write everything directly.
+      // Not bypassing merge-sort; get an iterator by partition and just write everything directly.
--- End diff --
That is correct. So there's definitely room for performance optimization here, but I thought it would be easier as a followup.
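The in-memory branch in the diff above walks a partition-sorted iterator and flushes one contiguous run of records per partition, recording each segment's length. That control flow can be sketched with plain collections; `PartitionRunSketch` is a hypothetical stand-in, with string lengths playing the role of on-disk segment sizes:

```scala
// Sketch of the in-memory write path: records arrive already sorted by
// partition ID, so each contiguous run of equal IDs is "written" as one
// segment and its length recorded, mirroring lengths(partitionId) = segment.length.
object PartitionRunSketch {
  def segmentLengths(sorted: Seq[(Int, String)], numPartitions: Int): Array[Int] = {
    val lengths = new Array[Int](numPartitions)
    val it = sorted.iterator.buffered
    while (it.hasNext) {
      val pid = it.head._1          // partition of the current run
      var written = 0
      while (it.hasNext && it.head._1 == pid) {
        written += it.next()._2.length  // stand-in for bytes written
      }
      lengths(pid) = written
    }
    lengths
  }
}
```

A `BufferedIterator` lets the inner loop peek at the next partition ID without consuming the record, which is the property the real code gets from `nextPartition()`.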
[GitHub] spark pull request: [SPARK-7168] [BUILD] Update plugin versions in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5720#issuecomment-96826279
[Test build #31063 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31063/consoleFull) for PR 5720 at commit [`98a8947`](https://github.com/apache/spark/commit/98a8947fbd62bc048e2462b37627966a7dfd9e11).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch does not change any dependencies.
[GitHub] spark pull request: SPARK-7107 Add parameter for zookeeper.znode.p...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/5673#issuecomment-96829768 Thanks. Merging in master.
[GitHub] spark pull request: [SPARK-3454] [WIP] separate json endpoints for...
Github user squito commented on a diff in the pull request: https://github.com/apache/spark/pull/4435#discussion_r29183894
--- Diff: core/src/main/java/org/apache/spark/status/api/v1/TaskSorting.java ---
@@ -0,0 +1,45 @@
+package org.apache.spark.status.api.v1;/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import org.apache.spark.status.api.EnumUtil;
+
+import java.util.HashSet;
+import java.util.Set;
+
+public enum TaskSorting {
--- End diff --
Unfortunately, Jersey requires them to be public -- I'll tag with `@DeveloperApi`.
[GitHub] spark pull request: [SPARK-7120][SPARK-7121] Closure cleaner nesti...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5685#issuecomment-96796263
[Test build #30977 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30977/consoleFull) for PR 5685 at commit [`6d4d3f1`](https://github.com/apache/spark/commit/6d4d3f1ac8da883fb814613afec35900b078b751).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **adds the following new dependencies:**
   * `tachyon-0.6.4.jar`
   * `tachyon-client-0.6.4.jar`
 * This patch **removes the following dependencies:**
   * `tachyon-0.5.0.jar`
   * `tachyon-client-0.5.0.jar`
[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29191682
--- Diff: core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala ---
@@ -0,0 +1,254 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+import java.io.InputStream
+import java.nio.IntBuffer
+import java.util.Comparator
+
+import org.apache.spark.SparkEnv
+import org.apache.spark.serializer.{JavaSerializerInstance, SerializerInstance}
+import org.apache.spark.storage.BlockObjectWriter
+import org.apache.spark.util.collection.PartitionedSerializedPairBuffer._
+
+/**
+ * Append-only buffer of key-value pairs, each with a corresponding partition ID, that serializes
+ * its records upon insert and stores them as raw bytes.
+ *
+ * We use two data-structures to store the contents. The serialized records are stored in a
+ * ChainedBuffer that can expand gracefully as records are added. This buffer is accompanied by a
+ * metadata buffer that stores pointers into the data buffer as well as the partition ID of each
+ * record. Each entry in the metadata buffer takes up a fixed amount of space.
+ *
+ * Sorting the collection means swapping entries in the metadata buffer - the record buffer need not
+ * be modified at all. Storing the partition IDs in the metadata buffer means that comparisons can
+ * happen without following any pointers, which should minimize cache misses.
+ *
+ * Currently, only sorting by partition is supported.
+ *
+ * @param metaInitialRecords The initial number of entries in the metadata buffer.
+ * @param kvBlockSize The size of each byte buffer in the ChainedBuffer used to store the records.
+ * @param serializerInstance the serializer used for serializing inserted records.
+ */
+private[spark] class PartitionedSerializedPairBuffer[K, V](
+    metaInitialRecords: Int,
+    kvBlockSize: Int,
+    serializerInstance: SerializerInstance = SparkEnv.get.serializer.newInstance)
+  extends WritablePartitionedPairCollection[K, V] {
+
+  if (serializerInstance.isInstanceOf[JavaSerializerInstance]) {
+    throw new IllegalArgumentException("PartitionedSerializedPairBuffer does not support" +
+      " Java-serialized objects.")
+  }
+
+  private var metaBuffer = IntBuffer.allocate(metaInitialRecords * NMETA)
+
+  private val kvBuffer: ChainedBuffer = new ChainedBuffer(kvBlockSize)
+  private val kvOutputStream = new ChainedBufferOutputStream(kvBuffer)
+  private val kvSerializationStream = serializerInstance.serializeStream(kvOutputStream)
+
+  def insert(partition: Int, key: K, value: V): Unit = {
+    if (metaBuffer.position == metaBuffer.capacity) {
+      growMetaBuffer()
+    }
+
+    val keyStart = kvBuffer.size
+    if (keyStart < 0) {
+      throw new Exception(s"Can't grow buffer beyond ${1 << 31} bytes")
+    }
+    kvSerializationStream.writeObject[Any](key)
+    kvSerializationStream.flush()
+    val valueStart = kvBuffer.size
+    kvSerializationStream.writeObject[Any](value)
+    kvSerializationStream.flush()
+    val valueEnd = kvBuffer.size
+
+    metaBuffer.put(keyStart)
+    metaBuffer.put(valueStart)
+    metaBuffer.put(valueEnd)
+    metaBuffer.put(partition)
+  }
+
+  /** Double the size of the array because we've reached capacity */
+  private def growMetaBuffer(): Unit = {
+    if (metaBuffer.capacity * 4 >= (1 << 30)) {
+      // Doubling the capacity would create an array bigger than Int.MaxValue, so don't
+      throw new Exception(
+        s"Can't grow buffer beyond ${(1 << 30) / (NMETA * 4)} elements")
+    }
+    val newMetaBuffer = IntBuffer.allocate(metaBuffer.capacity * 2)
+    newMetaBuffer.put(metaBuffer.array)
+    metaBuffer = newMetaBuffer
+  }
+
+  /** Iterate through the data in a given order. For this
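The growth logic in the diff above (double the `IntBuffer`, copy the old contents, refuse to cross the `Int.MaxValue` boundary for the backing array) behaves like this standalone sketch. `MetaBufferSketch` is illustrative only, assuming the same four metadata ints per record as the diff:

```scala
import java.nio.IntBuffer

// Standalone sketch of growMetaBuffer from the diff: each record occupies
// NMETA ints, and capacity doubles until the backing int array would
// approach Int.MaxValue bytes.
object MetaBufferSketch {
  val NMETA = 4 // keyStart, valueStart, valueEnd, partition

  def grow(buf: IntBuffer): IntBuffer = {
    // 4 bytes per int; refuse to double past ~2^30 bytes, as the diff does
    require(buf.capacity * 4L < (1L << 30), "can't grow buffer further")
    val bigger = IntBuffer.allocate(buf.capacity * 2)
    buf.flip()       // switch the old buffer from writing to reading
    bigger.put(buf)  // copies written contents and advances the new cursor
    bigger
  }
}
```

Note the diff copies via `metaBuffer.array`, which also leaves the new buffer's position at the old capacity when the old buffer was full; the `flip`/`put` pair here has the same effect without exposing the backing array.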
[GitHub] spark pull request: [SPARK-6862][Streaming][WebUI] Add BatchPage t...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5473#issuecomment-96827274 [Test build #31070 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31070/consoleFull) for PR 5473 at commit [`cb62e4f`](https://github.com/apache/spark/commit/cb62e4fe27763a23f7c925fc7086d3f606dc7034).
[GitHub] spark pull request: [SPARK-7056][Streaming] Make the Write Ahead L...
Github user helena commented on a diff in the pull request: https://github.com/apache/spark/pull/5645#discussion_r29194436
--- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
@@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
        logDebug(s"Read partition data of $this from block manager, block $blockId")
        iterator
      case None => // Data not found in Block Manager, grab it from write ahead log file
-       val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
-       val dataRead = reader.read(partition.segment)
-       reader.close()
+       var dataRead: ByteBuffer = null
--- End diff --
`ByteBuffer.wrap(new byte[0])`
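A null-free version of the fallback both comments are circling — default to an empty buffer rather than `var dataRead: ByteBuffer = null` — might look like this sketch. `WalReadSketch` and `readSegment` are hypothetical stand-ins for the `WriteAheadLogRandomReader` call, not Spark API:

```scala
import java.nio.ByteBuffer

// Sketch of the Option-based alternative to a null-initialized var:
// the read either produces a buffer or None, and the caller supplies
// an empty-buffer default, as suggested in the review.
object WalReadSketch {
  def readOrEmpty(readSegment: () => Option[ByteBuffer]): ByteBuffer =
    readSegment().getOrElse(ByteBuffer.wrap(new Array[Byte](0)))
}
```

The `Option` keeps the "not read yet" state out of the type's value space entirely, which addresses the "nulls in Scala" objection above.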
[GitHub] spark pull request: add support for zipping a sequence of RDDs
Github user mohitjaggi commented on the pull request: https://github.com/apache/spark/pull/2429#issuecomment-96791360 Closing on Sean's request; I have a workaround.
[GitHub] spark pull request: add support for zipping a sequence of RDDs
Github user mohitjaggi closed the pull request at: https://github.com/apache/spark/pull/2429
[GitHub] spark pull request: [SPARK-7009] repackaging spark assembly jar wi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5637#issuecomment-96796679
[Test build #30989 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30989/consoleFull) for PR 5637 at commit [`ab38c71`](https://github.com/apache/spark/commit/ab38c71356c23d63ca9f3990c8c0f0b8e8fc7976).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.
 * This patch **removes the following dependencies:**
   * `RoaringBitmap-0.4.5.jar`
   * `activation-1.1.jar`
   * `akka-actor_2.10-2.3.4-spark.jar`
   * `akka-remote_2.10-2.3.4-spark.jar`
   * `akka-slf4j_2.10-2.3.4-spark.jar`
   * `aopalliance-1.0.jar`
   * `arpack_combined_all-0.1.jar`
   * `avro-1.7.7.jar`
   * `breeze-macros_2.10-0.11.2.jar`
   * `breeze_2.10-0.11.2.jar`
   * `chill-java-0.5.0.jar`
   * `chill_2.10-0.5.0.jar`
   * `commons-beanutils-1.7.0.jar`
   * `commons-beanutils-core-1.8.0.jar`
   * `commons-cli-1.2.jar`
   * `commons-codec-1.10.jar`
   * `commons-collections-3.2.1.jar`
   * `commons-compress-1.4.1.jar`
   * `commons-configuration-1.6.jar`
   * `commons-digester-1.8.jar`
   * `commons-httpclient-3.1.jar`
   * `commons-io-2.1.jar`
   * `commons-lang-2.5.jar`
   * `commons-lang3-3.3.2.jar`
   * `commons-math-2.1.jar`
   * `commons-math3-3.4.1.jar`
   * `commons-net-2.2.jar`
   * `compress-lzf-1.0.0.jar`
   * `config-1.2.1.jar`
   * `core-1.1.2.jar`
   * `curator-client-2.4.0.jar`
   * `curator-framework-2.4.0.jar`
   * `curator-recipes-2.4.0.jar`
   * `gmbal-api-only-3.0.0-b023.jar`
   * `grizzly-framework-2.1.2.jar`
   * `grizzly-http-2.1.2.jar`
   * `grizzly-http-server-2.1.2.jar`
   * `grizzly-http-servlet-2.1.2.jar`
   * `grizzly-rcm-2.1.2.jar`
   * `groovy-all-2.3.7.jar`
   * `guava-14.0.1.jar`
   * `guice-3.0.jar`
   * `hadoop-annotations-2.2.0.jar`
   * `hadoop-auth-2.2.0.jar`
   * `hadoop-client-2.2.0.jar`
   * `hadoop-common-2.2.0.jar`
   * `hadoop-hdfs-2.2.0.jar`
   * `hadoop-mapreduce-client-app-2.2.0.jar`
   * `hadoop-mapreduce-client-common-2.2.0.jar`
   * `hadoop-mapreduce-client-core-2.2.0.jar`
   * `hadoop-mapreduce-client-jobclient-2.2.0.jar`
   * `hadoop-mapreduce-client-shuffle-2.2.0.jar`
   * `hadoop-yarn-api-2.2.0.jar`
   * `hadoop-yarn-client-2.2.0.jar`
   * `hadoop-yarn-common-2.2.0.jar`
   * `hadoop-yarn-server-common-2.2.0.jar`
   * `ivy-2.4.0.jar`
   * `jackson-annotations-2.4.0.jar`
   * `jackson-core-2.4.4.jar`
   * `jackson-core-asl-1.8.8.jar`
   * `jackson-databind-2.4.4.jar`
   * `jackson-jaxrs-1.8.8.jar`
   * `jackson-mapper-asl-1.8.8.jar`
   * `jackson-module-scala_2.10-2.4.4.jar`
   * `jackson-xc-1.8.8.jar`
   * `jansi-1.4.jar`
   * `javax.inject-1.jar`
   * `javax.servlet-3.0.0.v201112011016.jar`
   * `javax.servlet-3.1.jar`
   * `javax.servlet-api-3.0.1.jar`
   * `jaxb-api-2.2.2.jar`
   * `jaxb-impl-2.2.3-1.jar`
   * `jcl-over-slf4j-1.7.10.jar`
   * `jersey-client-1.9.jar`
   * `jersey-core-1.9.jar`
   * `jersey-grizzly2-1.9.jar`
   * `jersey-guice-1.9.jar`
   * `jersey-json-1.9.jar`
   * `jersey-server-1.9.jar`
   * `jersey-test-framework-core-1.9.jar`
   * `jersey-test-framework-grizzly2-1.9.jar`
   * `jets3t-0.7.1.jar`
   * `jettison-1.1.jar`
   * `jetty-util-6.1.26.jar`
   * `jline-0.9.94.jar`
   * `jline-2.10.4.jar`
   * `jodd-core-3.6.3.jar`
   * `json4s-ast_2.10-3.2.10.jar`
   * `json4s-core_2.10-3.2.10.jar`
   * `json4s-jackson_2.10-3.2.10.jar`
   * `jsr305-1.3.9.jar`
   * `jtransforms-2.4.0.jar`
   * `jul-to-slf4j-1.7.10.jar`
   * `kryo-2.21.jar`
   * `log4j-1.2.17.jar`
   * `lz4-1.2.0.jar`
   * `management-api-3.0.0-b012.jar`
   * `mesos-0.21.0-shaded-protobuf.jar`
   * `metrics-core-3.1.0.jar`
   * `metrics-graphite-3.1.0.jar`
   * `metrics-json-3.1.0.jar`
   * `metrics-jvm-3.1.0.jar`
   * `minlog-1.2.jar`
   * `netty-3.8.0.Final.jar`
   * `netty-all-4.0.23.Final.jar`
   * `objenesis-1.2.jar`
   * `opencsv-2.3.jar`
   * `oro-2.0.8.jar`
   * `paranamer-2.6.jar`
   * `parquet-column-1.6.0rc3.jar`
   * `parquet-common-1.6.0rc3.jar`
   * `parquet-encoding-1.6.0rc3.jar`
   * `parquet-format-2.2.0-rc1.jar`
   * `parquet-generator-1.6.0rc3.jar`
   * `parquet-hadoop-1.6.0rc3.jar`
   * `parquet-jackson-1.6.0rc3.jar`
   * `protobuf-java-2.4.1.jar`
   * `protobuf-java-2.5.0-spark.jar`
   * `py4j-0.8.2.1.jar`
   * `pyrolite-2.0.1.jar`
   * `quasiquotes_2.10-2.0.1.jar`
   * `reflectasm-1.07-shaded.jar`
   * `scala-compiler-2.10.4.jar`
   * `scala-library-2.10.4.jar`
   *
[GitHub] spark pull request: [SPARK-6856] [R] Make RDD information more use...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/5667#issuecomment-96820582 @srowen Could you add Joof as a contributor on our JIRA? The assignee auto-complete doesn't seem to pick this up right now.
[GitHub] spark pull request: SPARK-4550. In sort-based shuffle, store map o...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/4450#discussion_r29193245
--- Diff: core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+import java.util.Comparator
+
+import org.apache.spark.storage.BlockObjectWriter
+
+/**
+ * A common interface for size-tracking collections of key-value pairs that
+ *  - Have an associated partition for each key-value pair.
+ *  - Support a memory-efficient sorted iterator
+ *  - Support a WritablePartitionedIterator for writing the contents directly as bytes.
+ */
+private[spark] trait WritablePartitionedPairCollection[K, V] extends SizeTracker {
+  /**
+   * Insert a key-value pair with a partition into the collection
+   */
+  def insert(partition: Int, key: K, value: V): Unit
+
+  /**
+   * Estimate the collection's current memory usage in bytes.
+   */
+  def estimateSize(): Long
+
+  /**
+   * Iterate through the data in order of partition ID and then the given comparator. This may
+   * destroy the underlying collection.
+   */
+  def partitionedDestructiveSortedIterator(keyComparator: Comparator[K]): Iterator[((Int, K), V)]
+
+  /**
+   * Iterate through the data and write out the elements instead of returning them. Records are
+   * returned in order of their partition ID and then the given comparator.
+   * This may destroy the underlying collection.
+   */
+  def destructiveSortedWritablePartitionedIterator(keyComparator: Comparator[K])
+    : WritablePartitionedIterator = {
+    WritablePartitionedIterator.fromIterator(partitionedDestructiveSortedIterator(keyComparator))
+  }
+
+  /**
+   * Iterate through the data and write out the elements instead of returning them.
+   */
+  def writablePartitionedIterator(): WritablePartitionedIterator
+}
+
+private[spark] object WritablePartitionedPairCollection {
+  /**
+   * A comparator for (Int, K) pairs that orders them by only their partition ID.
+   */
+  def partitionComparator[K]: Comparator[(Int, K)] = new Comparator[(Int, K)] {
+    override def compare(a: (Int, K), b: (Int, K)): Int = {
+      a._1 - b._1
+    }
+  }
+
+  /**
+   * A comparator for (Int, K) pairs that orders them both by their partition ID and a key ordering.
+   */
+  def partitionKeyComparator[K](keyComparator: Comparator[K]): Comparator[(Int, K)] = {
+    new Comparator[(Int, K)] {
+      override def compare(a: (Int, K), b: (Int, K)): Int = {
+        val partitionDiff = a._1 - b._1
+        if (partitionDiff != 0) {
+          partitionDiff
+        } else {
+          keyComparator.compare(a._2, b._2)
+        }
+      }
+    }
+  }
+}
+
+/**
+ * Iterator that writes elements to a BlockObjectWriter instead of returning them. Each element
+ * has an associated partition.
+ */
+private[spark] trait WritablePartitionedIterator {
+  def writeNext(writer: BlockObjectWriter): Unit
+
+  def hasNext(): Boolean
+
+  def nextPartition(): Int
--- End diff --
`WritablePartitionedIterator` doesn't have a `next` method (just `writeNext`), so I don't think there would be anything to peek at.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
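The ordering that `partitionKeyComparator` produces (partition ID first, then the given key comparator within each partition) can be sketched standalone; the names and data below are illustrative, not Spark code:

```scala
import java.util.Comparator

// Illustrative sketch (not Spark code): order (partitionId, key) pairs the way
// partitionKeyComparator does -- by partition ID first, then by the key comparator.
object PartitionKeyOrderingDemo {
  def main(args: Array[String]): Unit = {
    val keyComparator: Comparator[String] = Comparator.naturalOrder[String]()

    val pairs = Seq((1, "b"), (0, "z"), (1, "a"), (0, "y"))
    val sorted = pairs.sortWith { (a, b) =>
      if (a._1 != b._1) a._1 < b._1              // partition ID first
      else keyComparator.compare(a._2, b._2) < 0 // then the key ordering
    }
    // All partition-0 records precede all partition-1 records,
    // sorted by key within each partition:
    assert(sorted == Seq((0, "y"), (0, "z"), (1, "a"), (1, "b")))
  }
}
```

This ordering is what lets a shuffle writer stream records out one partition at a time.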
[GitHub] spark pull request: [SPARK-7017][Build][Project Infra]: Refactor d...
Github user brennonyork commented on the pull request: https://github.com/apache/spark/pull/5694#issuecomment-96827964 jenkins, retest this please
[GitHub] spark pull request: [SPARK-7174][Core] Move calling `TaskScheduler...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5723#issuecomment-96829203 [Test build #31067 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31067/consoleFull) for PR 5723 at commit [`98bfe48`](https://github.com/apache/spark/commit/98bfe48d603c56f45945049b72a484686e2d0be2).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-6734] [SQL] Add UDTF.close support in G...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/5383#issuecomment-96516549 @liancheng @marmbrus Any more comments?
[GitHub] spark pull request: [SPARK-7160][SQL] Support converting DataFrame...
GitHub user rayortigas opened a pull request: https://github.com/apache/spark/pull/5713
[SPARK-7160][SQL] Support converting DataFrames to typed RDDs.
https://issues.apache.org/jira/browse/SPARK-7160
https://github.com/databricks/spark-csv/pull/52
cc: @rxin (who made the original suggestion) @vlyubin #5279 @punya #5578 @davies #5350 @marmbrus (ScalaReflection and more)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/rayortigas/spark df-to-typed-rdd
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/5713.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #5713
commit add51b6ad8f0ffe0ed600917d4339a531da07750
Author: Ray Ortigas r...@linkedin.com
Date: 2015-04-27T06:27:50Z
[SPARK-7160][SQL] Support converting DataFrames to typed RDDs.
[GitHub] spark pull request: [SPARK-7158] [SQL] Fix bug of cached data cann...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5714#issuecomment-96552127 [Test build #30963 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30963/consoleFull) for PR 5714 at commit [`e2c4298`](https://github.com/apache/spark/commit/e2c429829e0525b72eaf0d2879d735ab75072c43).
[GitHub] spark pull request: [SPARK-6505][SQL]Remove the reflection call in...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5660#issuecomment-96515766 Thanks for working on this! I'm merging this to master.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-96516483 @liancheng @rxin @marmbrus can you trigger the unit test for me? Thanks.
[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl
Github user scwf commented on a diff in the pull request: https://github.com/apache/spark/pull/5715#discussion_r29126807 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala --- @@ -81,11 +81,38 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   protected[sql] def convertCTAS: Boolean = getConf("spark.sql.hive.convertCTAS", "false").toBoolean

-  override protected[sql] def executePlan(plan: LogicalPlan): this.QueryExecution =
-    new this.QueryExecution(plan)
+  /* A catalyst metadata catalog that points to the Hive Metastore. */
+  @transient
+  override protected[sql] lazy val catalog = new HiveMetastoreCatalog(this) with OverrideCatalog
+
+  // Note that HiveUDFs will be overridden by functions registered in this context.
+  @transient
+  override protected[sql] lazy val functionRegistry =
+    new HiveFunctionRegistry with OverrideFunctionRegistry {
+      def caseSensitive: Boolean = false
+    }
+
   /* An analyzer that uses the Hive metastore. */
   @transient
-  protected[sql] val ddlParserWithHiveQL = new DDLParser(HiveQl.parseSql(_))
--- End diff --
We do not need this: since we override sqlParser, we can inherit the ddlParser from SQLContext:
`protected[sql] val ddlParser = new DDLParser(sqlParser.parse(_))`
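The delegation scwf describes can be sketched with a toy (hypothetical types and names, not Spark's actual classes): a DDL parser built on top of whatever `sqlParser` is in scope handles the statements it recognizes and hands everything else to that parser, which is why overriding `sqlParser` alone is enough.

```scala
// Toy sketch of the parser-delegation pattern (hypothetical types, not Spark's API).
case class Plan(description: String)

class DdlParser(fallback: String => Plan) {
  def parse(sql: String): Plan =
    if (sql.trim.toUpperCase.startsWith("CREATE")) Plan("ddl: " + sql) // handled here
    else fallback(sql)                                                 // delegated
}

object DdlParserDemo {
  def main(args: Array[String]): Unit = {
    val sqlParser: String => Plan = s => Plan("sql: " + s)
    val ddlParser = new DdlParser(sqlParser)
    assert(ddlParser.parse("CREATE TABLE t (i INT)") == Plan("ddl: CREATE TABLE t (i INT)"))
    assert(ddlParser.parse("SELECT 1") == Plan("sql: SELECT 1"))
  }
}
```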
[GitHub] spark pull request: [SPARK-6888][SQL] Make the jdbc driver handlin...
Github user rtreffer commented on the pull request: https://github.com/apache/spark/pull/#issuecomment-96573899 @marmbrus what should we do now? New PR?
[GitHub] spark pull request: SPARK-4705:[core] Write event logs of differen...
Github user twinkle-sachdeva closed the pull request at: https://github.com/apache/spark/pull/4845
[GitHub] spark pull request: [SPARK-7155] [CORE] Allow newAPIHadoopFile to ...
Github user yongtang commented on the pull request: https://github.com/apache/spark/pull/5708#issuecomment-96524002 @srowen Thanks for the comment. I updated the pull request so that setInputPaths is used instead of addInputPaths. In addition to newAPIHadoopFile(), the instances of addInputPath inside wholeTextFiles() and binaryFiles() have also been updated to setInputPaths. That should make the behavior consistent across all of SparkContext.scala. The unit test for this issue has also been updated to cover every method involved. Please let me know if there is anything else that needs to be taken care of.
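The behavioral difference behind the change is that Hadoop's `FileInputFormat.addInputPath` appends to whatever input paths a job already carries, while `setInputPaths` replaces the whole list. A sketch (the paths are illustrative):

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat

object InputPathsSketch {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance()

    // addInputPath appends to any paths already configured on the job...
    FileInputFormat.addInputPath(job, new Path("/data/a"))
    FileInputFormat.addInputPath(job, new Path("/data/b"))  // job now has two input paths

    // ...whereas setInputPaths replaces the whole list, so the result does not
    // depend on what was configured before.
    FileInputFormat.setInputPaths(job, new Path("/data/c")) // job now has exactly one
  }
}
```

Using `setInputPaths` therefore gives the same result whether or not something was configured earlier, which is the consistency the comment is after.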
[GitHub] spark pull request: [SPARK-1406] Mllib pmml model export
Github user selvinsource commented on the pull request: https://github.com/apache/spark/pull/3062#issuecomment-96544425 For binary logistic regression, using the same principle (intercept as threshold), doing some maths, we could set: `intercept = -ln(1/threshold - 1)`
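The algebra behind that expression: under the logistic link, predicting positive when sigmoid(margin) >= t is the same as predicting positive when margin >= logit(t) = ln(t/(1-t)) = -ln(1/t - 1), so a probability threshold can be folded into the model's intercept. A quick standalone check (names are illustrative):

```scala
object ThresholdAsInterceptCheck {
  def sigmoid(m: Double): Double = 1.0 / (1.0 + math.exp(-m))

  // Margin cutoff equivalent to probability threshold t (the expression above).
  def logit(t: Double): Double = -math.log(1.0 / t - 1.0)

  def main(args: Array[String]): Unit = {
    val t = 0.7
    // sigmoid and logit are inverses: sigmoid(margin) >= t  <=>  margin >= logit(t)
    assert(math.abs(sigmoid(logit(t)) - t) < 1e-12)
    assert(sigmoid(logit(t) + 0.1) > t) // margin above the cutoff -> positive
    assert(sigmoid(logit(t) - 0.1) < t) // margin below the cutoff -> negative
  }
}
```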
[GitHub] spark pull request: [SPARK-7163] [SQL] minor refactory for HiveQl
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5715#issuecomment-96552130 [Test build #30962 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30962/consoleFull) for PR 5715 at commit [`f76a7b1`](https://github.com/apache/spark/commit/f76a7b1eb2cec2c922f8a82e3e67da03984e886e).
[GitHub] spark pull request: [SPARK-7162][YARN]Launcher error in yarn-clien...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5716#issuecomment-96552117 [Test build #30961 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30961/consoleFull) for PR 5716 at commit [`b64564c`](https://github.com/apache/spark/commit/b64564c74248ef137ed3352e145735ce669bccf8).
[GitHub] spark pull request: SPARK-6735:[YARN] Adding properties to disable...
Github user twinkle-sachdeva commented on the pull request: https://github.com/apache/spark/pull/5449#issuecomment-96515946 Hi @srowen, please review the changes. Thanks,
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-96516744 I think Jenkins is having some trouble right now.
[GitHub] spark pull request: [SPARK-5213] [SQL] Pluggable SQL Parser Suppor...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4015#issuecomment-96516751 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-6505][SQL]Remove the reflection call in...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5660
[GitHub] spark pull request: SPARK-1556: bump jets3t version to 0.9.0
Github user LuqmanSahaf commented on the pull request: https://github.com/apache/spark/pull/468#issuecomment-96522017 @darose I am facing the VerifyError you mentioned in one of the comments. Can you tell me how you solved that error?