[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1241#issuecomment-48569956 Actually it's still necessary once the PR gets merged. It basically compares all the public interfaces at bytecode level against 1.0, and see if any public interfaces changed. This is done in place to ensure API compatibility throughout the 1.x release cycle. Unfortunately, all the methods that are labeled "private[package_name]" loses the private visibility at bytecode level, so the compatibility checker is more conservative and we would need to manually mark these interface changes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14751335 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/csv/CsvRDD.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.csv + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.types._ +import org.apache.spark.sql.catalyst.expressions.{GenericMutableRow, AttributeReference, Row} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan} +import org.apache.spark.sql.Logging + +private[sql] object CsvRDD extends Logging { + + private[sql] def inferSchema( + csv: RDD[String], + delimiter: String, + quote: String, --- End diff -- is it ever possible for "quote" to be something that's longer than 1 char? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14751361 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -130,6 +131,47 @@ class SQLContext(@transient val sparkContext: SparkContext) new SchemaRDD(this, JsonRDD.inferSchema(json, samplingRatio)) /** + * Loads a CSV file (according to RFC 4180) and returns the result as a [[SchemaRDD]]. + * + * NOTE: If there are new line characters inside quoted fields this method may fail to + * parse correctly, because the two lines may be in different partitions. Use + * [[SQLContext#csvRDD]] to parse such files. + * + * @param path path to input file + * @param delimiter Optional delimiter (default is comma) + * @param quote Optional quote character or string (default is '"') + * @param header Optional flag to indicate first line of each file is the header + * (default is false) + */ + def csvFile(path: String, + delimiter: String = ",", + quote: String = "\"", --- End diff -- Is it possible to use a combination of single quotes and double quotes in the same file? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Clean up SparkKMeans example's code
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1352#issuecomment-48570213 Merging in master. Thanks! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2115: Stage kill link is too close to st...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1350#issuecomment-48570288 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP][SPARK-2179][SQL] Public API for DataType...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1346#issuecomment-48570378 @yhuai I haven't looked at the changes yet, but can you make sure the end API is usable in Java? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14752632 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/csv/CsvRDD.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.csv + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.types._ +import org.apache.spark.sql.catalyst.expressions.{GenericMutableRow, AttributeReference, Row} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan} +import org.apache.spark.sql.Logging + +private[sql] object CsvRDD extends Logging { + + private[sql] def inferSchema( + csv: RDD[String], + delimiter: String, + quote: String, --- End diff -- What are the examples of quotes that are more than 1 char long? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14752678 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -130,6 +131,47 @@ class SQLContext(@transient val sparkContext: SparkContext) new SchemaRDD(this, JsonRDD.inferSchema(json, samplingRatio)) /** + * Loads a CSV file (according to RFC 4180) and returns the result as a [[SchemaRDD]]. + * + * NOTE: If there are new line characters inside quoted fields this method may fail to + * parse correctly, because the two lines may be in different partitions. Use + * [[SQLContext#csvRDD]] to parse such files. + * + * @param path path to input file + * @param delimiter Optional delimiter (default is comma) + * @param quote Optional quote character or string (default is '"') + * @param header Optional flag to indicate first line of each file is the header + * (default is false) + */ + def csvFile(path: String, + delimiter: String = ",", + quote: String = "\"", --- End diff -- Ok if the standard dictates that only double quotes are allowed, it is actually confusing to allow arbitrary quotes (it is one thing to allow single quotes since they might be common, but I wouldn't allow arbitrary length quotes). This is adding unnecessary complexity to both the interface and the implementation. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14753118 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/csv/CsvRDD.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.csv + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.types._ +import org.apache.spark.sql.catalyst.expressions.{GenericMutableRow, AttributeReference, Row} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan} +import org.apache.spark.sql.Logging + +private[sql] object CsvRDD extends Logging { + + private[sql] def inferSchema( + csv: RDD[String], + delimiter: String, + quote: String, --- End diff -- Do people actually use this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14753240 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -130,6 +131,47 @@ class SQLContext(@transient val sparkContext: SparkContext) new SchemaRDD(this, JsonRDD.inferSchema(json, samplingRatio)) /** + * Loads a CSV file (according to RFC 4180) and returns the result as a [[SchemaRDD]]. + * + * NOTE: If there are new line characters inside quoted fields this method may fail to + * parse correctly, because the two lines may be in different partitions. Use + * [[SQLContext#csvRDD]] to parse such files. + * + * @param path path to input file + * @param delimiter Optional delimiter (default is comma) + * @param quote Optional quote character or string (default is '"') + * @param header Optional flag to indicate first line of each file is the header + * (default is false) + */ + def csvFile(path: String, + delimiter: String = ",", + quote: String = "\"", --- End diff -- Actually both Pandas and R only allow single character quotes. I think it make sense to allow changing quotes, but allowing multi character quotes is just plain confusing and also inconsistent with the standard / other tools. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14753286 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -130,6 +131,47 @@ class SQLContext(@transient val sparkContext: SparkContext) new SchemaRDD(this, JsonRDD.inferSchema(json, samplingRatio)) /** + * Loads a CSV file (according to RFC 4180) and returns the result as a [[SchemaRDD]]. + * + * NOTE: If there are new line characters inside quoted fields this method may fail to + * parse correctly, because the two lines may be in different partitions. Use + * [[SQLContext#csvRDD]] to parse such files. + * + * @param path path to input file + * @param delimiter Optional delimiter (default is comma) + * @param quote Optional quote character or string (default is '"') + * @param header Optional flag to indicate first line of each file is the header + * (default is false) + */ + def csvFile(path: String, + delimiter: String = ",", + quote: String = "\"", + header: Boolean = false): SchemaRDD = { +val csv = sparkContext.textFile(path) --- End diff -- textFile creates an RDD of lines. This actually doesn't work if a record contains new lines (inside quotes). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14753321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala --- @@ -130,6 +131,47 @@ class SQLContext(@transient val sparkContext: SparkContext) new SchemaRDD(this, JsonRDD.inferSchema(json, samplingRatio)) /** + * Loads a CSV file (according to RFC 4180) and returns the result as a [[SchemaRDD]]. + * + * NOTE: If there are new line characters inside quoted fields this method may fail to + * parse correctly, because the two lines may be in different partitions. Use + * [[SQLContext#csvRDD]] to parse such files. + * + * @param path path to input file + * @param delimiter Optional delimiter (default is comma) + * @param quote Optional quote character or string (default is '"') + * @param header Optional flag to indicate first line of each file is the header + * (default is false) + */ + def csvFile(path: String, + delimiter: String = ",", + quote: String = "\"", + header: Boolean = false): SchemaRDD = { +val csv = sparkContext.textFile(path) --- End diff -- To do this properly, we would either need a new input format that handles CSV line splits, or assemble the lines back from textFile. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14753430 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/csv/CsvRDD.scala --- @@ -0,0 +1,91 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.csv + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.types._ +import org.apache.spark.sql.catalyst.expressions.{GenericMutableRow, AttributeReference, Row} +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.{ExistingRdd, SparkLogicalPlan} +import org.apache.spark.sql.Logging + +private[sql] object CsvRDD extends Logging { + + private[sql] def inferSchema( + csv: RDD[String], + delimiter: String, + quote: String, + useHeader: Boolean): LogicalPlan = { + +// Constructing schema +// TODO: Infer types based on a sample and/or let user specify types/schema +val firstLine = csv.first() +// Assuming first row is representative and using it to determine number of fields +val firstRow = new CsvTokenizer(Seq(firstLine).iterator, delimiter, quote).next() +val header = if (useHeader) { + firstRow +} else { + firstRow.zipWithIndex.map { case (value, index) => s"V$index" } +} + +val schemaFields = header.map { fieldName => + StructField(fieldName.asInstanceOf[String], StringType, nullable = true) +} +val schema = StructType(schemaFields) + +val parsedCSV = csv.mapPartitions { iter => + // When using header, any input line that equals firstLine is assumed to be header + val csvIter = if (useHeader) { +iter.filter(_ != firstLine) + } else { +iter + } + val tokenIter = new CsvTokenizer(csvIter, delimiter, quote) + parseCSV(tokenIter, schema) +} + +SparkLogicalPlan(ExistingRdd(asAttributes(schema), parsedCSV)) + } + + private def castToType(value: Any, dataType: DataType): Any = dataType match { --- End diff -- We should consolidate all the type related functions in @yhuai's PR on public/schema type. We have many similar code blocks like this in Spark SQL, and many of them are missing stuff (e.g. this one misses Timestamp, Decimal, etc) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP] SPARK-2360: CSV import to SchemaRDDs
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1351#discussion_r14753602 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/csv/CsvTokenizer.scala --- @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.csv + +import scala.collection.mutable.ArrayBuffer + +/** + * Tokenizer based on RFC 4180 for comma separated values. + * It implements an iterator that returns each tokenized line as an Array[Any]. + */ +private[sql] class CsvTokenizer(inputIter: Iterator[String], --- End diff -- Can we use an existing tokenizer such as http://commons.apache.org/proper/commons-lang/javadocs/api-3.1/org/apache/commons/lang3/text/StrTokenizer.html ? We already depend on commons-lang in Spark. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2115: Stage kill link is too close to st...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1350#issuecomment-48576867 Thanks - merging this in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2393][SQL] Prototype implementation of ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1238#issuecomment-48577069 If this is no longer WIP, can you change the title to make it more descriptive? At the very least I wouldn't include the word "Prototype". --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP][SPARK-2179][SQL] Public API for DataType...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1346#discussion_r14754430 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala --- @@ -93,47 +92,56 @@ abstract class DataType { } def isPrimitive: Boolean = false + + def simpleString: String } -case object NullType extends DataType +case object NullType extends DataType { + def simpleString: String = "null" +} trait PrimitiveType extends DataType { override def isPrimitive = true } abstract class NativeType extends DataType { - type JvmType - @transient val tag: TypeTag[JvmType] - val ordering: Ordering[JvmType] + private[sql] type JvmType + @transient private[sql] val tag: TypeTag[JvmType] + private[sql] val ordering: Ordering[JvmType] - @transient val classTag = { + @transient private[sql] val classTag = { val mirror = runtimeMirror(Utils.getSparkClassLoader) ClassTag[JvmType](mirror.runtimeClass(tag.tpe)) } } case object StringType extends NativeType with PrimitiveType { - type JvmType = String - @transient lazy val tag = typeTag[JvmType] - val ordering = implicitly[Ordering[JvmType]] + private[sql] type JvmType = String + @transient private[sql] lazy val tag = typeTag[JvmType] + private[sql] val ordering = implicitly[Ordering[JvmType]] + def simpleString: String = "string" } --- End diff -- while you at it, add a blank line to separate each class --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [WIP][SPARK-2179][SQL] Public API for DataType...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1346#discussion_r14754724 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SchemaRDDLike.scala --- @@ -123,9 +123,12 @@ private[sql] trait SchemaRDDLike { def saveAsTable(tableName: String): Unit = sqlContext.executePlan(InsertIntoCreatedTable(None, tableName, logicalPlan)).toRdd + /** Returns the schema. */ + def schema: StructType = queryExecution.analyzed.schema + /** Returns the output schema in the tree format. */ - def schemaString: String = queryExecution.analyzed.schemaString + def formattedSchemaString: String = schema.formattedSchemaString --- End diff -- Do we have to change this API? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2427: Fix Scala examples that use the wr...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1353#discussion_r14755685 --- Diff: examples/src/main/scala/org/apache/spark/examples/SparkPageRank.scala --- @@ -31,8 +31,12 @@ import org.apache.spark.{SparkConf, SparkContext} */ object SparkPageRank { def main(args: Array[String]) { +if (args.length < 1) { + System.err.println("Usage: SparkPageRank ") + System.exit(1) +} val sparkConf = new SparkConf().setAppName("PageRank") -var iters = args(1).toInt +val iters = if (args.length > 0) args(1).toInt else 100 --- End diff -- 100 is pretty big. Can we set this to a small number, e.g. 10 or 20? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-2427: Fix Scala examples that use the wr...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1353#issuecomment-48640422 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1744#discussion_r15780210 --- Diff: dev/lint-python --- @@ -0,0 +1,63 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" +SPARK_ROOT_DIR="$(dirname $SCRIPT_DIR)" +PEP8_REPORT_PATH="$SPARK_ROOT_DIR/dev/pep8-report.txt" + +cd $SPARK_ROOT_DIR + +# Get pep8 at runtime so that we don't rely on it being installed on the build server. +#+ See: https://github.com/apache/spark/pull/1744#issuecomment-50982162 +#+ TODOs: +#+ - Dynamically determine latest release version of pep8 and use that. +#+ - Download this from a more reliable source. (GitHub raw can be flaky, apparently. (?)) +PEP8_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pep8.py" +PEP8_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/jcrocholl/pep8/1.5.7/pep8.py"; + +curl --silent -o "$PEP8_SCRIPT_PATH" "$PEP8_SCRIPT_REMOTE_PATH" +curl_status=$? + +if [ $curl_status -ne 0 ] +then +echo "Failed to download pep8.py from \"$PEP8_SCRIPT_REMOTE_PATH\"." +exit $curl_status +fi + + +# There is no need to write this output to a file +#+ first, but we do so so that the check status can +#+ be output before the report, like with the +#+ scalastyle and RAT checks. +python $PEP8_SCRIPT_PATH ./python --exclude="cloudpickle.py" \ +> "$PEP8_REPORT_PATH" +pep8_status=${PIPESTATUS[0]} #$? + +if [ $pep8_status -ne 0 ] +then --- End diff -- here too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1744#discussion_r15780205 --- Diff: dev/lint-python --- @@ -0,0 +1,63 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +SCRIPT_DIR="$( cd "$( dirname "$0" )" && pwd )" +SPARK_ROOT_DIR="$(dirname $SCRIPT_DIR)" +PEP8_REPORT_PATH="$SPARK_ROOT_DIR/dev/pep8-report.txt" + +cd $SPARK_ROOT_DIR + +# Get pep8 at runtime so that we don't rely on it being installed on the build server. +#+ See: https://github.com/apache/spark/pull/1744#issuecomment-50982162 +#+ TODOs: +#+ - Dynamically determine latest release version of pep8 and use that. +#+ - Download this from a more reliable source. (GitHub raw can be flaky, apparently. (?)) +PEP8_SCRIPT_PATH="$SPARK_ROOT_DIR/dev/pep8.py" +PEP8_SCRIPT_REMOTE_PATH="https://raw.githubusercontent.com/jcrocholl/pep8/1.5.7/pep8.py"; + +curl --silent -o "$PEP8_SCRIPT_PATH" "$PEP8_SCRIPT_REMOTE_PATH" +curl_status=$? + +if [ $curl_status -ne 0 ] --- End diff -- can we follow the rest of spark bash style to put then on the same line as the if? we can also reduce the level of indication there --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1772 [SPARK-2323] Exception in accumulator update should not crash DAGScheduler & SparkContext You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark accumulator-dagscheduler Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1772.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1772 commit 360b46d2cb5cf64137d08f5089588083dfe43771 Author: Reynold Xin Date: 2014-08-04T22:54:48Z [SPARK-2323] Exception in accumulator update should not crash DAGScheduler & SparkContext. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51132429 This doesn't actually weaken the semantics because previously it would just crash. Given the 1.1 timeline, it is best to use the current semantics, which is really "no guarantee" (Patrick's nicer way of putting it is "attempted at least once"). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51132704 Haha, I don't know if it responds to such complex phrases. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51132720 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51134330 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51141863 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2315] Implement drop, dropRight and dro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1254#issuecomment-51142915 I'm not sure what's happening. Maybe Jenkins is lazy today. We can retry tomorrow, and if it doesn't work, create a new PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51146388 Merging this in master. Thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2323] Exception in accumulator update s...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1772#issuecomment-51146442 Actually I merged it in master, branch-1.0. and branch-1.1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1780 [SPARK-2856] Decrease initial buffer size for Kryo to 64KB. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark kryo-init-size Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1780.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1780 commit 551b935c3db56cb214ebbea6922cc5b6d37d229a Author: Reynold Xin Date: 2014-08-05T05:52:05Z [SPARK-2856] Decrease initial buffer size for Kryo to 64KB. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1648#discussion_r15796222 --- Diff: sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala --- @@ -38,6 +39,7 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter { override def beforeAll() { TestHive.cacheTables = true +TestHive.set(SQLConf.SHUFFLE_PARTITIONS, "2") --- End diff -- We should keep it at 2 to speed up tests ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2503] Lower shuffle output buffer (spar...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1781 [SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 32KB. This can substantially reduce memory usage during shuffle. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-2503-spark.shuffle.file.buffer.kb Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1781.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1781 commit 1b8f72b387c10d53b54ae3a2e7eb6b9b72f14d36 Author: Reynold Xin Date: 2014-08-05T06:06:02Z [SPARK-2503] Lower shuffle output buffer (spark.shuffle.file.buffer.kb) to 32KB. This can substantially reduce memory usage during shuffle. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1780#discussion_r15797592 --- Diff: core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala --- @@ -47,7 +47,9 @@ class KryoSerializer(conf: SparkConf) with Logging with Serializable { - private val bufferSize = conf.getInt("spark.kryoserializer.buffer.mb", 2) * 1024 * 1024 + private val bufferSize = +(conf.getDouble("spark.kryoserializer.buffer.mb", 0.064) * 1024 * 1024).toInt --- End diff -- It's pretty obvious, isn't it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51164849 Maybe it was started before that fix got merged. Let's run this again, --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51164855 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2856] Decrease initial buffer size for ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1780#issuecomment-51164898 Thanks. Merging in master. @andrewor14 if you feel strongly about it, I can push a commit to add an one-line comment. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2859] Update url of Kryo project in rel...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1782#issuecomment-51165308 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2585] Remove special handling of Hadoop...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1648#issuecomment-51171274 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Set Spark SQL Hive compatibility test shuffle ...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1784 Set Spark SQL Hive compatibility test shuffle partitions to 2. This should improve test runtime because majority of the test runtime are scheduling and task overheads. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark sparksql-test-parallelism Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1784.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1784 commit 6e5ec73a5c069ba65ab093fb718b00b00dac3d36 Author: Reynold Xin Date: 2014-08-05T09:16:43Z Set Spark SQL Hive compatibility test shuffle partitions to 2. This should improve test runtime because majority of the test runtime are scheduling and task overheads. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2869 - Potential leak of Jdbc Connection...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1792#discussion_r15840899 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -70,20 +70,30 @@ class JdbcRDD[T: ClassTag]( override def compute(thePart: Partition, context: TaskContext) = new NextIterator[T] { context.addOnCompleteCallback{ () => closeIfNeeded() } val part = thePart.asInstanceOf[JdbcPartition] -val conn = getConnection() -val stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY) +var conn : Connection = _ +var stmt : PreparedStatement = _ +try { --- End diff -- Actually as I look at this again, I think the try is not necessary. Basically the first line in the constructor adds "closeIfNeeded" callback to the complete callback list. Then if any exception is thrown, even during the constructor, I think closeIfNeeded is called to close connections. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2869 - Fix tiny bug in JdbcRdd for closi...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1792#discussion_r15843358 --- Diff: core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala --- @@ -106,7 +106,7 @@ class JdbcRDD[T: ClassTag]( case e: Exception => logWarning("Exception closing statement", e) } try { -if (null != conn && ! stmt.isClosed()) conn.close() +if (null != conn && ! conn.isClosed()) conn.close() --- End diff -- while you are at it, do you mind adding curly braces for the close()? The spark coding style requires curly braces even for one-liner ifs. It's an oversight that we didn't do this for the ifs in this file. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51263909 He's probably en route somewhere because he woke up early this morning to catch a flight... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51263913 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51263924 Jenkins, add to whitelist. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51270610 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51273157 I think Jenkins is sick ... we can keep trying ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51273180 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2503] Lower shuffle output buffer (spar...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1781#issuecomment-51274284 Thanks. Merging in master & branch-1.1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2869 - Fix tiny bug in JdbcRdd for closi...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1792#issuecomment-51277474 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Tighten the visibility of various SQLConf meth...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1794 Tighten the visibility of various SQLConf methods and renamed setter/getters You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark sql-conf Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1794.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1794 commit 4b19d6ccd3544e02811361cf7cd06327a15335e5 Author: Reynold Xin Date: 2014-08-06T01:15:08Z Tighten the visibility of various SQLConf methods and renamed setter/getters. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2869 - Fix tiny bug in JdbcRdd for closi...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1792#issuecomment-51282959 Thanks. Merging this in master & branch-1.0 and branch-1.1. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2862] Use shorthand range notation to a...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1787#issuecomment-51283209 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2861] Fix Doc comment of histogram meth...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1786#issuecomment-51285191 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SQL] Tighten the visibility of various SQLCon...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1794#issuecomment-51289764 @concretevitamin --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51367875 This looks good to me. What do you think, @JoshRosen / @davies ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51373160 The line length is already 100 in this PR. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1799#discussion_r15897877 --- Diff: core/src/main/scala/org/apache/spark/SparkEnv.scala --- @@ -246,8 +250,13 @@ object SparkEnv extends Logging { "." } -val shuffleManager = instantiateClass[ShuffleManager]( - "spark.shuffle.manager", "org.apache.spark.shuffle.hash.HashShuffleManager") +// Let the user specify short names for shuffle managers +val shortShuffleMgrNames = Map( + "HASH" -> "org.apache.spark.shuffle.hash.HashShuffleManager", + "SORT" -> "org.apache.spark.shuffle.sort.SortShuffleManager") +val shuffleMgrName = conf.get("spark.shuffle.manager", "HASH") --- End diff -- can we make this case insensitive? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1799#discussion_r15898020 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -54,87 +55,36 @@ private[spark] class SortShuffleWriter[K, V, C]( /** Write a bunch of records to this task's output */ override def write(records: Iterator[_ <: Product2[K, V]]): Unit = { -// Get an iterator with the elements for each partition ID -val partitions: Iterator[(Int, Iterator[Product2[K, _]])] = { - if (dep.mapSideCombine) { -if (!dep.aggregator.isDefined) { - throw new IllegalStateException("Aggregator is empty for map-side combine") -} -sorter = new ExternalSorter[K, V, C]( - dep.aggregator, Some(dep.partitioner), dep.keyOrdering, dep.serializer) -sorter.write(records) -sorter.partitionedIterator - } else { -// In this case we pass neither an aggregator nor an ordering to the sorter, because we -// don't care whether the keys get sorted in each partition; that will be done on the -// reduce side if the operation being run is sortByKey. -sorter = new ExternalSorter[K, V, V]( - None, Some(dep.partitioner), None, dep.serializer) -sorter.write(records) -sorter.partitionedIterator +if (dep.mapSideCombine) { + if (!dep.aggregator.isDefined) { +throw new IllegalStateException("Aggregator is empty for map-side combine") } + sorter = new ExternalSorter[K, V, C]( +dep.aggregator, Some(dep.partitioner), dep.keyOrdering, dep.serializer) + sorter.insertAll(records) +} else { + // In this case we pass neither an aggregator nor an ordering to the sorter, because we + // don't care whether the keys get sorted in each partition; that will be done on the + // reduce side if the operation being run is sortByKey. + sorter = new ExternalSorter[K, V, V]( --- End diff -- I think this fits in one line without wrapping ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1799#discussion_r15898462 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -640,9 +713,122 @@ private[spark] class ExternalSorter[K, V, C]( */ def iterator: Iterator[Product2[K, C]] = partitionedIterator.flatMap(pair => pair._2) + /** + * Write all the data added into this ExternalSorter into a file in the disk store, creating + * an .index file for it as well with the offsets of each partition. This is called by the + * SortShuffleWriter and can go through an efficient path of just concatenating binary files + * if we decided to avoid merge-sorting. + * + * @param blockId block ID to write to. The index file will be blockId.name + ".index". + * @param context a TaskContext for a running Spark task, for us to update shuffle metrics. + * @return array of lengths, in bytes, of each partition of the file (used by map output tracker) + */ + def writePartitionedFile(blockId: BlockId, context: TaskContext): Array[Long] = { +val outputFile = blockManager.diskBlockManager.getFile(blockId) + +// Track location of each range in the output file +val offsets = new Array[Long](numPartitions + 1) +val lengths = new Array[Long](numPartitions) + +// Statistics +var totalBytes = 0L +var totalTime = 0L + +if (bypassMergeSort && partitionWriters != null) { --- End diff -- In the comment above partitionWriters, we say: "Array of file writers for each partition, used if bypassMergeSort is true" . This implies that when bypassMergeSort is true, partitionWriters wouldn't be null. Can you document the case when partitionWriters would be null? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1799#discussion_r15898826 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -640,9 +713,122 @@ private[spark] class ExternalSorter[K, V, C]( */ def iterator: Iterator[Product2[K, C]] = partitionedIterator.flatMap(pair => pair._2) + /** + * Write all the data added into this ExternalSorter into a file in the disk store, creating + * an .index file for it as well with the offsets of each partition. This is called by the + * SortShuffleWriter and can go through an efficient path of just concatenating binary files + * if we decided to avoid merge-sorting. + * + * @param blockId block ID to write to. The index file will be blockId.name + ".index". + * @param context a TaskContext for a running Spark task, for us to update shuffle metrics. + * @return array of lengths, in bytes, of each partition of the file (used by map output tracker) + */ + def writePartitionedFile(blockId: BlockId, context: TaskContext): Array[Long] = { +val outputFile = blockManager.diskBlockManager.getFile(blockId) + +// Track location of each range in the output file +val offsets = new Array[Long](numPartitions + 1) +val lengths = new Array[Long](numPartitions) + +// Statistics +var totalBytes = 0L +var totalTime = 0L + +if (bypassMergeSort && partitionWriters != null) { + // We decided to write separate files for each partition, so just concatenate them. To keep + // this simple we spill out the current in-memory collection so that everything is in files. + spillToPartitionFiles(if (aggregator.isDefined) map else buffer) + partitionWriters.foreach(_.commitAndClose()) + var out: FileOutputStream = null + var in: FileInputStream = null + try { +out = new FileOutputStream(outputFile) +for (i <- 0 until numPartitions) { + val file = partitionWriters(i).fileSegment().file --- End diff -- I find this part that uses fileSegments slightly convoluted. But we can deal with this later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2787: Make sort-based shuffle write file...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1799#discussion_r15898996 --- Diff: core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala --- @@ -120,6 +128,18 @@ private[spark] class ExternalSorter[K, V, C]( // How much of the shared memory pool this collection has claimed private var myMemoryThreshold = 0L + // If there are fewer than spark.shuffle.sort.bypassMergeThreshold partitions and we don't need + // local aggregation and sorting, write numPartitions files directly and just concatenate them + // at the end. This avoids doing serialization and deserialization twice to merge together the + // spilled files, which would happen with the normal code path. The downside is having multiple + // files open at a time and thus more memory allocated to buffers. + private val bypassMergeThreshold = conf.getInt("spark.shuffle.sort.bypassMergeThreshold", 200) + private[collection] val bypassMergeSort =// private[collection] for unit tests --- End diff -- You can use this in unit test: http://doc.scalatest.org/1.4.1/org/scalatest/PrivateMethodTester.html --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51388197 Ok I'm merging this in master. Thanks, @nchammas. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2627] [PySpark] have the build enforce ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1744#issuecomment-51388412 And branch-1.1 too. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2583] ConnectionManager error reporting
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1758#issuecomment-51388657 It actually failed deterministically on Josh's laptop. It could be a bug somewhere in the configuration or a racing condition in Spark. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: SPARK-2879 [BUILD] Use HTTPS to access Maven C...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1805#discussion_r15899974 --- Diff: pom.xml --- @@ -143,11 +143,11 @@ - maven-repo + central --- End diff -- any reason we call apache maven "central"? (the old name is confusing too) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15900503 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -113,8 +113,13 @@ class RangePartitioner[K : Ordering : ClassTag, V]( private var ordering = implicitly[Ordering[K]] // An array of upper bounds for the first (partitions - 1) partitions - private var rangeBounds: Array[K] = { -if (partitions <= 1) { + private var valRB: Array[K] = Array() --- End diff -- Can we perhaps make this thread safe? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15919352 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -222,7 +228,8 @@ class RangePartitioner[K : Ordering : ClassTag, V]( } @throws(classOf[IOException]) - private def readObject(in: ObjectInputStream) { + private def readObject(in: ObjectInputStream): Unit = this.synchronized { +if (valRB != null) return --- End diff -- Do we not want to deserialize valRB if it is not null? Are you worried rangeBounds might be called while the deserialization is happening? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-1021] Defer the data-driven computation...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1689#discussion_r15919599 --- Diff: core/src/main/scala/org/apache/spark/Partitioner.scala --- @@ -113,8 +113,12 @@ class RangePartitioner[K : Ordering : ClassTag, V]( private var ordering = implicitly[Ordering[K]] // An array of upper bounds for the first (partitions - 1) partitions - private var rangeBounds: Array[K] = { -if (partitions <= 1) { + @volatile private var valRB: Array[K] = null --- End diff -- Any idea on volatile's impact on read performance? rangeBounds is read multiple times in getPartition. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2887] fix bug of countApproxDistinct() ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1812#discussion_r15919681 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1004,7 +1004,7 @@ abstract class RDD[T: ClassTag]( }, (h1: HyperLogLogPlus, h2: HyperLogLogPlus) => { h1.addAll(h2) -h2 +h1 --- End diff -- ah can't believe I missed that --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: Branch 1.1
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1824#issuecomment-51434585 What is this pull request for? Do you mind closing it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2894] spark-shell doesn't accept flags
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1825#issuecomment-51434809 Jenkins, add to whitelist. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919794 --- Diff: core/src/main/scala/org/apache/spark/HttpServer.scala --- @@ -50,7 +50,7 @@ private[spark] class HttpServer(resourceBase: File, securityManager: SecurityMan if (server != null) { throw new ServerStateException("Server is already started") } else { - logInfo("Starting HTTP Server") + logDebug("Starting HTTP Server") --- End diff -- This logging message is useless to me... maybe we can remove it --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919800 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -146,7 +146,7 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging { setViewAcls(defaultAclUsers, sparkConf.get("spark.ui.view.acls", "")) private val secretKey = generateSecretKey() - logInfo("SecurityManager: authentication " + (if (authOn) "enabled" else "disabled") + + logDebug("SecurityManager: authentication " + (if (authOn) "enabled" else "disabled") + --- End diff -- This would be very useful. I'd keep it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919803 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -170,8 +170,8 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging { } private[spark] def setViewAcls(defaultUsers: Seq[String], allowedUsers: String) { -viewAcls = (defaultUsers ++ allowedUsers.split(',')).map(_.trim()).filter(!_.isEmpty).toSet -logInfo("Changing view acls to: " + viewAcls.mkString(",")) +viewAcls = (defaultUsers ++ allowedUsers.split(',')).map(_.trim()).filter(!_.isEmpty).toSet +logDebug("Changing view acls to: " + viewAcls.mkString(",")) --- End diff -- this too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919807 --- Diff: core/src/main/scala/org/apache/spark/SecurityManager.scala --- @@ -179,8 +179,8 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging { } private[spark] def setUIAcls(aclSetting: Boolean) { -uiAclsOn = aclSetting -logInfo("Changing acls enabled to: " + uiAclsOn) +uiAclsOn = aclSetting +logDebug("Changing acls enabled to: " + uiAclsOn) --- End diff -- and this one --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919824 --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala --- @@ -810,7 +810,7 @@ class SparkContext(config: SparkConf) extends Logging { // Fetch the file locally in case a job is executed using DAGScheduler.runLocally(). Utils.fetchFile(path, new File(SparkFiles.getRootDirectory()), conf, env.securityManager) -logInfo("Added file " + path + " at " + key + " with timestamp " + addedFiles(key)) +logDebug("Added file " + path + " at " + key + " with timestamp " + addedFiles(key)) --- End diff -- I'd keep all the ones in SparkContext in INFO. I think they are kind of important ... --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919835 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -167,7 +167,7 @@ private[spark] class Executor( SparkEnv.set(env) Thread.currentThread.setContextClassLoader(replClassLoader) val ser = SparkEnv.get.closureSerializer.newInstance() - logInfo(s"Running $taskName (TID $taskId)") + logDebug(s"Running $taskName (TID $taskId)") --- End diff -- let's keep this on. this is only chatty if you run it in single node. in distributed mode, it is very important information to log. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919838 --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala --- @@ -167,7 +167,7 @@ private[spark] class Executor( SparkEnv.set(env) Thread.currentThread.setContextClassLoader(replClassLoader) val ser = SparkEnv.get.closureSerializer.newInstance() - logInfo(s"Running $taskName (TID $taskId)") + logDebug(s"Running $taskName (TID $taskId)") --- End diff -- also the following two --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919852 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -188,7 +188,7 @@ class HadoopRDD[K, V]( val iter = new NextIterator[(K, V)] { val split = theSplit.asInstanceOf[HadoopPartition] - logInfo("Input split: " + split.inputSplit) + logDebug("Input split: " + split.inputSplit) --- End diff -- i'd leave this one as info too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919867 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -859,7 +859,7 @@ class DAGScheduler( return } - logInfo("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")") + logDebug("Submitting " + tasks.size + " missing tasks from " + stage + " (" + stage.rdd + ")") --- End diff -- i'd leave this one as info too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919860 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -535,12 +535,12 @@ class DAGScheduler( * Cancel a job that is running or waiting in the queue. */ def cancelJob(jobId: Int) { -logInfo("Asked to cancel job " + jobId) +logDebug("Asked to cancel job " + jobId) eventProcessActor ! JobCancelled(jobId) } def cancelJobGroup(groupId: String) { -logInfo("Asked to cancel job group " + groupId) +logDebug("Asked to cancel job group " + groupId) --- End diff -- i'd leave this one as info too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919859 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -535,12 +535,12 @@ class DAGScheduler( * Cancel a job that is running or waiting in the queue. */ def cancelJob(jobId: Int) { -logInfo("Asked to cancel job " + jobId) +logDebug("Asked to cancel job " + jobId) --- End diff -- i'd leave this one as info too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919873 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1086,7 +1086,7 @@ class DAGScheduler( handleJobCancellation(jobId, s"because Stage $stageId was cancelled") } case None => -logInfo("No active jobs to kill for Stage " + stageId) +logDebug("No active jobs to kill for Stage " + stageId) --- End diff -- i'd leave this one as info too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919874 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1117,7 +1117,7 @@ class DAGScheduler( failJobAndIndependentStages(job, s"Job aborted due to stage failure: $reason") } if (dependentJobs.isEmpty) { - logInfo("Ignoring failure of " + failedStage + " because all jobs depending on it are done") + logDebug("Ignoring failure of " + failedStage + " because all jobs depending on it are done") --- End diff -- i'd leave this one as info too --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919880 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -431,7 +431,7 @@ private[spark] class TaskSetManager( // a good proxy to task serialization time. // val timeTaken = clock.getTime() - startTime val taskName = s"task ${info.id} in stage ${taskSet.id}" - logInfo("Starting %s (TID %d, %s, %s, %d bytes)".format( + logDebug("Starting %s (TID %d, %s, %s, %d bytes)".format( --- End diff -- this is progress reporting, isn't it? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919888 --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala --- @@ -503,7 +503,7 @@ private[spark] class TaskSetManager( isZombie = true } } else { - logInfo("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id + + logDebug("Ignoring task-finished event for " + info.id + " in stage " + taskSet.id + --- End diff -- i would make this info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919916 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -908,7 +908,7 @@ private[spark] class BlockManager( * Remove all blocks belonging to the given broadcast. */ def removeBroadcast(broadcastId: Long, tellMaster: Boolean): Int = { -logInfo(s"Removing broadcast $broadcastId") +logDebug(s"Removing broadcast $broadcastId") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919915 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -898,7 +898,7 @@ private[spark] class BlockManager( */ def removeRdd(rddId: Int): Int = { // TODO: Avoid a linear scan by creating another mapping of RDD.id to blocks. -logInfo(s"Removing RDD $rddId") +logDebug(s"Removing RDD $rddId") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919909 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -518,12 +518,12 @@ private[spark] class BlockManager( def get(blockId: BlockId): Option[BlockResult] = { val local = getLocal(blockId) if (local.isDefined) { - logInfo(s"Found block $blockId locally") + logDebug(s"Found block $blockId locally") return local } val remote = getRemote(blockId) if (remote.isDefined) { - logInfo(s"Found block $blockId remotely") + logDebug(s"Found block $blockId remotely") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919913 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -856,7 +856,7 @@ private[spark] class BlockManager( // Drop to disk, if storage level requires if (level.useDisk && !diskStore.contains(blockId)) { - logInfo(s"Writing block $blockId to disk") + logDebug(s"Writing block $blockId to disk") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919907 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -518,12 +518,12 @@ private[spark] class BlockManager( def get(blockId: BlockId): Option[BlockResult] = { val local = getLocal(blockId) if (local.isDefined) { - logInfo(s"Found block $blockId locally") + logDebug(s"Found block $blockId locally") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919922 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala --- @@ -38,14 +38,14 @@ class BlockManagerMaster(var driverActor: ActorRef, conf: SparkConf) extends Log /** Remove a dead executor from the driver actor. This is only called on the driver side. */ def removeExecutor(execId: String) { tell(RemoveExecutor(execId)) -logInfo("Removed " + execId + " successfully in removeExecutor") +logDebug("Removed " + execId + " successfully in removeExecutor") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919918 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -920,7 +920,7 @@ private[spark] class BlockManager( * Remove a block from both memory and disk. */ def removeBlock(blockId: BlockId, tellMaster: Boolean = true): Unit = { -logInfo(s"Removing block $blockId") +logDebug(s"Removing block $blockId") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919910 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -837,7 +837,7 @@ private[spark] class BlockManager( blockId: BlockId, data: Either[Array[Any], ByteBuffer]): Option[BlockStatus] = { -logInfo(s"Dropping block $blockId from memory") +logDebug(s"Dropping block $blockId from memory") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919930 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala --- @@ -38,14 +38,14 @@ class BlockManagerMaster(var driverActor: ActorRef, conf: SparkConf) extends Log /** Remove a dead executor from the driver actor. This is only called on the driver side. */ def removeExecutor(execId: String) { tell(RemoveExecutor(execId)) -logInfo("Removed " + execId + " successfully in removeExecutor") +logDebug("Removed " + execId + " successfully in removeExecutor") } /** Register the BlockManager's id with the driver. */ def registerBlockManager(blockManagerId: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) { -logInfo("Trying to register BlockManager") +logDebug("Trying to register BlockManager") tell(RegisterBlockManager(blockManagerId, maxMemSize, slaveActor)) -logInfo("Registered BlockManager") +logDebug("Registered BlockManager") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919933 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala --- @@ -57,7 +57,7 @@ class BlockManagerMaster(var driverActor: ActorRef, conf: SparkConf) extends Log tachyonSize: Long): Boolean = { val res = askDriverWithReply[Boolean]( UpdateBlockInfo(blockManagerId, blockId, storageLevel, memSize, diskSize, tachyonSize)) -logInfo("Updated info of block " + blockId) +logDebug("Updated info of block " + blockId) --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919929 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala --- @@ -38,14 +38,14 @@ class BlockManagerMaster(var driverActor: ActorRef, conf: SparkConf) extends Log /** Remove a dead executor from the driver actor. This is only called on the driver side. */ def removeExecutor(execId: String) { tell(RemoveExecutor(execId)) -logInfo("Removed " + execId + " successfully in removeExecutor") +logDebug("Removed " + execId + " successfully in removeExecutor") } /** Register the BlockManager's id with the driver. */ def registerBlockManager(blockManagerId: BlockManagerId, maxMemSize: Long, slaveActor: ActorRef) { -logInfo("Trying to register BlockManager") +logDebug("Trying to register BlockManager") --- End diff -- i'd leave this as info --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [WIP] [SPARK-2655] Change logging level from I...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1776#discussion_r15919942 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -69,7 +69,7 @@ class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus def receive = { case RegisterBlockManager(blockManagerId, maxMemSize, slaveActor) => - logInfo("received a register") + logDebug("received a register") --- End diff -- i'd remove this and maybe just keep a general logDebug to log all the messages. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org