[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249372411
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala

    @@ -591,6 +591,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
           children.last.generateTreeString(
             depth + 1, lastChildren :+ true, append, verbose, prefix, addSuffix, maxFields)
         }
    +    append("")

Review comment: Moreover, if you are appending empty strings all over the place, then it would be good to make `append` drop these.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
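The reviewer's suggestion (make `append` a no-op for empty strings) can be sketched with a simplified, self-contained stand-in for Spark's `StringUtils.StringConcat`; the class below is illustrative, not the actual Spark code:

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for StringUtils.StringConcat, showing the suggested
// behavior: append drops null and empty strings, so callers may append ""
// freely without growing the buffer.
class StringConcat {
  private val strings = new ArrayBuffer[String]
  private var length: Int = 0

  def append(s: String): Unit = {
    if (s != null && s.nonEmpty) {
      strings += s
      length += s.length
    }
  }

  // One allocation sized to the accumulated length, then a single pass.
  override def toString: String = {
    val result = new java.lang.StringBuilder(length)
    strings.foreach(result.append)
    result.toString
  }
}
```

With this change, call sites such as the `append("")` in `generateTreeString` become harmless no-ops rather than buffer entries.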
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249371798
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala

    @@ -92,31 +95,57 @@ object StringUtils {
       /**
        * Concatenation of sequence of strings to final string with cheap append method
    -   * and one memory allocation for the final string.
    +   * and one memory allocation for the final string. Can also bound the final size of
    +   * the string.
        */
    -  class StringConcat {
    +  class StringConcat(val maxLength: Int = Integer.MAX_VALUE) {
         private val strings = new ArrayBuffer[String]
         private var length: Int = 0

    +    def atLimit: Boolean = length >= maxLength
    +
         /**
          * Appends a string and accumulates its length to allocate a string buffer for all
    -     * appended strings once in the toString method.
    +     * appended strings once in the toString method. Returns true if the string still
    +     * has room for further appends before it hits its max limit.
          */
    -    def append(s: String): Unit = {
    -      if (s != null) {
    +    def append(s: String): Boolean = {
    +      if (!atLimit && s != null) {
             strings.append(s)
             length += s.length
           }
    +      return !atLimit
         }

         /**
          * The method allocates memory for all appended strings, writes them to the memory and
          * returns concatenated string.
          */
         override def toString: String = {
    -      val result = new java.lang.StringBuilder(length)
    -      strings.foreach(result.append)
    +      val finalLength = Math.min(length, maxLength)

Review comment: Two things:
1. You could also avoid the complexity and just log the last string here.
2. If we go down this path, can you make sure the methods you are using are not making copies of the `strings` buffer? I am reasonably sure that `dropRight` does this.
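The bounded variant under discussion can be sketched as a self-contained class (named `BoundedStringConcat` here to avoid clashing with the real one; the truncation in `toString` is one possible implementation, not necessarily the PR's):

```scala
import scala.collection.mutable.ArrayBuffer

// Sketch of a size-bounded string concatenator: append stops accumulating
// once maxLength is reached, and toString truncates to at most maxLength
// characters without copying the strings buffer.
class BoundedStringConcat(val maxLength: Int = Integer.MAX_VALUE) {
  private val strings = new ArrayBuffer[String]
  private var length: Int = 0

  def atLimit: Boolean = length >= maxLength

  // Returns true if there is still room for further appends.
  def append(s: String): Boolean = {
    if (!atLimit && s != null) {
      strings += s
      length += s.length
    }
    !atLimit
  }

  override def toString: String = {
    val finalLength = Math.min(length, maxLength)
    val result = new java.lang.StringBuilder(finalLength)
    strings.foreach(result.append)
    // Trim the last appended string if it overshot the limit; this avoids
    // the dropRight-style copy the reviewer warns about.
    result.substring(0, finalLength)
  }
}
```

Note that the last `append` before the limit may still store a string that overshoots `maxLength`; only `toString` trims it back.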
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249370703
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala

    @@ -1625,6 +1625,14 @@ object SQLConf {
         .intConf
         .createWithDefault(25)

    +  val MAX_PLAN_STRING_LENGTH = buildConf("spark.sql.maxPlanLength")
    +    .doc("Maximum number of characters to output for a plan string. If the plan is " +
    +      "longer, further output will be truncated. The default setting always generates a full " +
    +      "plan. Set this to a lower value such as 8192 if plan strings are taking up too much " +
    +      "memory or are causing OutOfMemory errors in the driver or UI processes.")
    +    .intConf

Review comment: Can you check that the value set is > 0? You can use the `checkValue(i => i > 0 && i <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH, "...")` function for this. Also use `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` instead of `Int.MaxValue` as the default here.
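Spark's config DSL (`buildConf`, `checkValue`) is internal, but the set-time validation the reviewer asks for can be illustrated with a minimal hypothetical stand-in; `ConfEntry` and the `Integer.MAX_VALUE - 15` bound below are assumptions of this sketch, not Spark's actual `ConfigBuilder`:

```scala
// Mirrors what ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH is believed to be
// in Spark (Integer.MAX_VALUE - 15); treated as an assumption here.
val maxRoundedArrayLength: Int = Integer.MAX_VALUE - 15

// Hypothetical stand-in for ConfigBuilder: checkValue-style validation
// rejects a bad setting at set time rather than when the value is used.
case class ConfEntry(key: String, default: Int, check: Int => Boolean, errorMsg: String) {
  def set(value: Int): Int = {
    require(check(value), s"$key: $errorMsg (got $value)")
    value
  }
}

val maxPlanStringLength = ConfEntry(
  key = "spark.sql.maxPlanLength",
  default = maxRoundedArrayLength,
  check = i => i > 0 && i <= maxRoundedArrayLength,
  errorMsg = "value must be positive and no larger than the maximum array length")
```

Validating at set time means a misconfigured limit fails fast with a clear message instead of surfacing later as an allocation failure.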
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249369278
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala

    @@ -591,6 +591,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] extends Product {
           children.last.generateTreeString(
             depth + 1, lastChildren :+ true, append, verbose, prefix, addSuffix, maxFields)
         }
    +    append("")

Review comment: Are you using the boolean returned by `append` for control flow anywhere? Let's remove it for the initial PR; we can always address this in a follow-up.
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249369346
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala

    @@ -92,31 +95,57 @@ object StringUtils {
       /**
        * Concatenation of sequence of strings to final string with cheap append method
    -   * and one memory allocation for the final string.
    +   * and one memory allocation for the final string. Can also bound the final size of
    +   * the string.
        */
    -  class StringConcat {
    +  class StringConcat(val maxLength: Int = Integer.MAX_VALUE) {

Review comment: You generally cannot allocate an array of `Integer.MAX_VALUE` elements. Can you use `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` instead?
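The background to this comment: the JVM reserves a few header words per array, so attempting to allocate an array of exactly `Integer.MAX_VALUE` elements typically fails with an `OutOfMemoryError` even when heap is available. A sketch of how a slightly lower cap keeps buffer sizing safe (the `- 15` value mirrors what `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` is believed to be and is an assumption here):

```scala
// Practical upper bound for array allocation on the JVM; assumed to match
// Spark's ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH (Integer.MAX_VALUE - 15).
val maxRoundedArrayLength: Int = Integer.MAX_VALUE - 15

// Clamp a requested buffer size to the safe maximum before allocating,
// e.g. when sizing the StringBuilder in a bounded StringConcat.toString.
def safeBufferSize(requested: Long): Int =
  math.min(requested, maxRoundedArrayLength.toLong).toInt
```

Using this constant as the `maxLength` default keeps the default behavior "effectively unbounded" while remaining allocatable.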
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248838363
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala

    @@ -204,6 +204,27 @@ package object util extends Logging {
         truncatedString(seq, "", sep, "", maxFields)
       }

    +  /** Whether we have warned about plan string truncation yet. */
    +  private val planSizeWarningPrinted = new AtomicBoolean(false)
    +
    +  def withSizeLimitedWriter[T](writer: Writer)(f: (Writer) => T): Option[T] = {

Review comment: Where is this being used? It is not being used for any of the actual plan logging, right?
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248831742
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SizeLimitedWriter.scala

    @@ -0,0 +1,48 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.util
    +
    +import java.io.Writer
    +
    +class WriterSizeException(val extraChars: Long, val charLimit: Long) extends Exception(
    +  s"Writer reached limit of $charLimit characters. $extraChars extra characters ignored.")
    +
    +/**
    + * This class is used to control the size of generated writers. Guarantees that the total number
    + * of characters written will be less than the specified size.
    + *
    + * Checks size before writing and throws a WriterSizeException if the total size would exceed the
    + * limit.
    + */
    +class SizeLimitedWriter(underlying: Writer, charLimit: Long) extends Writer {
    +
    +  private var charsWritten: Long = 0
    +
    +  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    +    val charsToWrite = Math.min(charLimit - charsWritten, len).toInt
    +    underlying.write(cbuf, off, charsToWrite)
    +    charsWritten += charsToWrite
    +    if (charsToWrite < len) {
    +      throw new WriterSizeException(len - charsToWrite, charLimit)

Review comment: Please don't use exceptions for control flow; it is confusing and is likely to be broken by intermediate exception handlers. Can we do something a bit better here? One option would be to have the append function return `true`/`false` depending on whether we appended or not. Another would be to just have `treeString(..)` run its course without appending anything else.
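The reviewer's first alternative (report success with a boolean instead of throwing) can be sketched as a writer that silently truncates and exposes whether the limit was hit; `TruncatingWriter` and `writeChecked` are illustrative names, not the PR's code:

```scala
import java.io.{StringWriter, Writer}

// Sketch of a size-limited writer that truncates instead of throwing.
// Callers that care can check writeChecked's return value or atLimit to
// stop early; callers that don't simply see their overflow dropped.
class TruncatingWriter(underlying: Writer, charLimit: Long) extends Writer {
  private var charsWritten: Long = 0L

  def atLimit: Boolean = charsWritten >= charLimit

  // Returns true if the full buffer was written, false if truncated.
  def writeChecked(cbuf: Array[Char], off: Int, len: Int): Boolean = {
    val charsToWrite = math.min(charLimit - charsWritten, len.toLong).toInt
    if (charsToWrite > 0) {
      underlying.write(cbuf, off, charsToWrite)
      charsWritten += charsToWrite
    }
    charsToWrite == len
  }

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    // Drop the overflow instead of throwing, so intermediate exception
    // handlers never see a control-flow exception.
    writeChecked(cbuf, off, len)
  }

  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}
```

Because `java.io.Writer.write(String)` delegates to `write(char[], Int, Int)`, string writes are truncated the same way.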
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248836825
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala

    @@ -112,6 +113,14 @@ class QueryExecution(
         concat.toString
       }

    +  // private def writeOrError(writer: Writer)(f: Writer => Unit): Unit = {

Review comment: What is this?
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248829460
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SizeLimitedWriter.scala

    @@ -0,0 +1,48 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.util
    +
    +import java.io.Writer
    +
    +class WriterSizeException(val extraChars: Long, val charLimit: Long) extends Exception(

Review comment: Can we try to avoid defining a custom exception? This is pretty bad in combination with Python. Let's use a SparkException here.
[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248829491
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SizeLimitedWriter.scala

    @@ -0,0 +1,48 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.util
    +
    +import java.io.Writer
    +
    +class WriterSizeException(val extraChars: Long, val charLimit: Long) extends Exception(
    +  s"Writer reached limit of $charLimit characters. $extraChars extra characters ignored.")
    +
    +/**
    + * This class is used to control the size of generated writers. Guarantees that the total number
    + * of characters written will be less than the specified size.
    + *
    + * Checks size before writing and throws a WriterSizeException if the total size would exceed the
    + * limit.
    + */
    +class SizeLimitedWriter(underlying: Writer, charLimit: Long) extends Writer {

Review comment: +1