[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-21 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249372411
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
 ##
 @@ -591,6 +591,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] 
extends Product {
   children.last.generateTreeString(
 depth + 1, lastChildren :+ true, append, verbose, prefix, addSuffix, 
maxFields)
 }
+append("")
 
 Review comment:
  Moreover, if you are appending empty strings all over the place, then it 
would be good to make `append` drop these.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-21 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249371798
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ##
 @@ -92,31 +95,57 @@ object StringUtils {
 
   /**
* Concatenation of sequence of strings to final string with cheap append 
method
-   * and one memory allocation for the final string.
+   * and one memory allocation for the final string.  Can also bound the final 
size of
+   * the string.
*/
-  class StringConcat {
+  class StringConcat(val maxLength: Int = Integer.MAX_VALUE) {
 private val strings = new ArrayBuffer[String]
 private var length: Int = 0
 
+def atLimit: Boolean = length >= maxLength
+
 /**
  * Appends a string and accumulates its length to allocate a string buffer 
for all
- * appended strings once in the toString method.
+ * appended strings once in the toString method.  Returns true if the 
string still
+ * has room for further appends before it hits its max limit.
  */
-def append(s: String): Unit = {
-  if (s != null) {
+def append(s: String): Boolean = {
+  if (!atLimit && s != null) {
 strings.append(s)
 length += s.length
   }
+  return !atLimit
 }
 
 /**
  * The method allocates memory for all appended strings, writes them to 
the memory and
  * returns concatenated string.
  */
 override def toString: String = {
-  val result = new java.lang.StringBuilder(length)
-  strings.foreach(result.append)
+  val finalLength = Math.min(length, maxLength)
 
 Review comment:
   Two things:
   1. You could also avoid the complexity and just log the last string here. 
   2. If we go down this path, can you make sure the methods you are using are 
not making copies for the `strings` buffer. I am reasonably sure that 
`dropRight` does this.
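  For illustration, a minimal sketch of the kind of bounded concatenation the reviewer has in mind: truncate lazily in `toString` without copying or trimming the `strings` buffer (no `dropRight`), by tracking how many characters still fit. Names here are hypothetical, not the PR's final code.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical bounded string concatenation: appends are cheap, and
// truncation happens only in toString, without copying the buffer.
class BoundedConcat(val maxLength: Int = Int.MaxValue) {
  private val strings = new ArrayBuffer[String]
  private var length: Int = 0

  def append(s: String): Unit = {
    // Stop accumulating once the limit is reached; extra strings are dropped.
    if (s != null && length < maxLength) {
      strings += s
      length += s.length
    }
  }

  override def toString: String = {
    val finalLength = math.min(length, maxLength)
    val result = new java.lang.StringBuilder(finalLength)
    var remaining = finalLength
    val it = strings.iterator
    while (remaining > 0 && it.hasNext) {
      val s = it.next()
      if (s.length <= remaining) result.append(s)
      else result.append(s, 0, remaining) // trim only the last piece
      remaining -= math.min(s.length, remaining)
    }
    result.toString
  }
}
```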






[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-21 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249370703
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
 ##
 @@ -1625,6 +1625,14 @@ object SQLConf {
 .intConf
 .createWithDefault(25)
 
+  val MAX_PLAN_STRING_LENGTH = buildConf("spark.sql.maxPlanLength")
+.doc("Maximum number of characters to output for a plan string.  If the 
plan is " +
+  "longer, further output will be truncated.  The default setting always 
generates a full " +
+  "plan.  Set this to a lower value such as 8192 if plan strings are 
taking up too much " +
+  "memory or are causing OutOfMemory errors in the driver or UI 
processes.")
+.intConf
 
 Review comment:
   Can you check that the value set is > 0? You can use the `checkValue(i => i 
> 0 && i <= ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH, "...")` function for 
this.
   
   Also use `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` instead of 
`Int.MaxValue` as the default here.
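  A toy, runnable sketch of the suggested validation, outside Spark's internal config builder (assumption: `MaxRoundedArrayLength` mirrors Spark's `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH`, `Int.MaxValue - 15`, since the JVM cannot allocate arrays all the way up to `Int.MaxValue`):

```scala
// Mimics `.intConf.checkValue(i => i > 0 && i <= MAX_ROUNDED_ARRAY_LENGTH, "...")`
// with a plain require; buildConf itself is a Spark internal.
object PlanLengthConf {
  val MaxRoundedArrayLength: Int = Int.MaxValue - 15

  def validate(i: Int): Int = {
    require(
      i > 0 && i <= MaxRoundedArrayLength,
      s"Invalid value $i: must be positive and at most $MaxRoundedArrayLength")
    i
  }
}
```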





[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-21 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249369278
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala
 ##
 @@ -591,6 +591,7 @@ abstract class TreeNode[BaseType <: TreeNode[BaseType]] 
extends Product {
   children.last.generateTreeString(
 depth + 1, lastChildren :+ true, append, verbose, prefix, addSuffix, 
maxFields)
 }
+append("")
 
 Review comment:
  Are you using the boolean returned by `append` for control flow anywhere?
   
   Let's remove it for the initial PR; we can always address this in a 
follow-up.
   





[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-21 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r249369346
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala
 ##
 @@ -92,31 +95,57 @@ object StringUtils {
 
   /**
* Concatenation of sequence of strings to final string with cheap append 
method
-   * and one memory allocation for the final string.
+   * and one memory allocation for the final string.  Can also bound the final 
size of
+   * the string.
*/
-  class StringConcat {
+  class StringConcat(val maxLength: Int = Integer.MAX_VALUE) {
 
 Review comment:
  You generally cannot allocate an array of `Integer.MAX_VALUE` elements. Can 
you use `ByteArrayMethods.MAX_ROUNDED_ARRAY_LENGTH` instead?





[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-17 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248838363
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
 ##
 @@ -204,6 +204,27 @@ package object util extends Logging {
 truncatedString(seq, "", sep, "", maxFields)
   }
 
+  /** Whether we have warned about plan string truncation yet. */
+  private val planSizeWarningPrinted = new AtomicBoolean(false)
+
+  def withSizeLimitedWriter[T](writer: Writer)(f: (Writer) => T): Option[T] = {
 
 Review comment:
  Where is this being used? It is not being used for any of the actual plan 
logging, right?





[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-17 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248831742
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SizeLimitedWriter.scala
 ##
 @@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import java.io.Writer
+
+class WriterSizeException(val extraChars: Long, val charLimit: Long) extends 
Exception(
+  s"Writer reached limit of $charLimit characters.  $extraChars extra 
characters ignored.")
+
+/**
+ * This class is used to control the size of generated writers.  Guarantees 
that the total number
+ * of characters written will be less than the specified size.
+ *
+ * Checks size before writing and throws a WriterSizeException if the total 
size would exceed the
+ * limit.
+ */
+class SizeLimitedWriter(underlying: Writer, charLimit: Long) extends Writer {
+
+  private var charsWritten: Long = 0
+
+  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
+val charsToWrite = Math.min(charLimit - charsWritten, len).toInt
+underlying.write(cbuf, off, charsToWrite)
+charsWritten += charsToWrite
+if (charsToWrite < len) {
+  throw new WriterSizeException(len - charsToWrite, charLimit)
 
 Review comment:
  Please don't use exceptions for control flow; it is confusing and is likely 
to be broken by intermediate exception handlers. Can we do something a bit 
better here? One option would be to have the append function return 
`true`/`false` depending on whether we appended or not. Another one would be to 
just have `treeString(..)` run its course without appending anything else.
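  A sketch of the exception-free alternative the reviewer suggests: a size-limited writer that silently truncates and exposes a boolean, so callers can stop writing without catching anything. The class and method names are illustrative, not the PR's final code.

```scala
import java.io.Writer

// Hypothetical writer that truncates at charLimit instead of throwing.
// Callers can poll hasRoom to decide whether further writes are useful.
class TruncatingWriter(underlying: Writer, charLimit: Long) extends Writer {
  private var charsWritten: Long = 0

  // true while there is still room before the limit
  def hasRoom: Boolean = charsWritten < charLimit

  override def write(cbuf: Array[Char], off: Int, len: Int): Unit = {
    val toWrite = math.min(charLimit - charsWritten, len.toLong).toInt
    if (toWrite > 0) {
      underlying.write(cbuf, off, toWrite) // write only what fits
      charsWritten += toWrite
    }
  }

  override def flush(): Unit = underlying.flush()
  override def close(): Unit = underlying.close()
}
```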





[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-17 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248836825
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala
 ##
 @@ -112,6 +113,14 @@ class QueryExecution(
 concat.toString
   }
 
+//  private def writeOrError(writer: Writer)(f: Writer => Unit): Unit = {
 
 Review comment:
What is this?






[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-17 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248829460
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SizeLimitedWriter.scala
 ##
 @@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import java.io.Writer
+
+class WriterSizeException(val extraChars: Long, val charLimit: Long) extends 
Exception(
 
 Review comment:
  Can we try to avoid defining a custom exception? This is pretty bad in 
combination with Python. Let's use a SparkException here.





[GitHub] hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans

2019-01-17 Thread GitBox
hvanhovell commented on a change in pull request #23169: [SPARK-26103][SQL] 
Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#discussion_r248829491
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/SizeLimitedWriter.scala
 ##
 @@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.util
+
+import java.io.Writer
+
+class WriterSizeException(val extraChars: Long, val charLimit: Long) extends 
Exception(
+  s"Writer reached limit of $charLimit characters.  $extraChars extra 
characters ignored.")
+
+/**
+ * This class is used to control the size of generated writers.  Guarantees 
that the total number
+ * of characters written will be less than the specified size.
+ *
+ * Checks size before writing and throws a WriterSizeException if the total 
size would count the
+ * limit.
+ */
+class SizeLimitedWriter(underlying: Writer, charLimit: Long) extends Writer {
 
 Review comment:
   +1

