[GitHub] spark pull request #19224: [SPARK-20990][SQL] Read all JSON documents in fil...

attilapiros Tue, 16 Jan 2018 11:03:07 -0800

Github user attilapiros commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19224#discussion_r161854693
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala ---
    @@ -361,3 +361,78 @@ class JacksonParser(
         }
       }
     }
    +
    +object JacksonParser {
    +  private[spark] def splitDocuments(input: InputStream) = new Iterator[String] {
    +
    +    private implicit class JsonCharacter(char: Char) {
    +      def isJsonObjectFinished(endToken: Option[Char]): Boolean = {
    +        endToken match {
    +          case None => char == '}' || char == ']'
    +          case Some(x) => char == x
    +        }
    +      }
    +    }
    +    private var currentChar: Char = input.read().toChar
    +    private var previousToken: Option[Char] = None
    +    private var nextRecord = readNext
    +
    +    override def hasNext: Boolean = nextRecord.isDefined
    +
    +    override def next(): String = {
    +      if (!hasNext) {
    +        throw new NoSuchElementException("End of stream")
    +      }
    +      val curRecord = nextRecord.get
    +      nextRecord = readNext
    +      curRecord
    +    }
    +
    +    private def moveToNextChar() = {
    +      if (!currentChar.isWhitespace) {
    +        previousToken = Some(currentChar)
    +      }
    +      currentChar = input.read().toChar
    +    }
    +
    +    private def readJsonObject: Option[String] = {
    +      val endToken = currentChar match {
    +        case '{' => Some('}')
    +        case '[' => Some(']')
    +        case _ => None
    +      }
    +
    +      val sb = new StringBuilder()
    +      sb.append(currentChar)
    +      while (!currentChar.isJsonObjectFinished(endToken) && input.available() > 0) {
    +        moveToNextChar()
    +        currentChar match {
    +          case '{' | '[' =>
    --- End diff --
    
    I think the problem I presented can be solved by extending your
    solution, but as I see it you may have more problems in mind. It would be
    nice to collect all the cases that splitDocuments should handle, together
    with the expected result for each. Do you have time for this? Then maybe we
    can find the solution. I would be glad to help by reviewing and thinking it
    through together, or by solving parts of the problem.
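
To make "collecting the cases" concrete, here is a rough sketch of what such a list could look like. Everything in it is an assumption for illustration: the object name `SplitCases`, the chosen inputs, and the expected outputs are hypothetical, and `split` is a simple depth-counting splitter over a `String`, not the `splitDocuments` from this PR. It only handles top-level objects and arrays, counts `{`/`[` and `}`/`]` interchangeably, and drops a trailing unterminated document.

```scala
// Hypothetical sketch, NOT the PR's splitDocuments: a depth-counting splitter
// used only to pin down candidate cases and their expected results.
object SplitCases {

  // Splits concatenated top-level JSON objects/arrays into separate strings.
  // Tracks string literals and backslash escapes so that brackets inside
  // strings do not change the nesting depth.
  def split(input: String): List[String] = {
    val docs = List.newBuilder[String]
    val sb = new StringBuilder
    var depth = 0
    var inString = false
    var escaped = false
    for (c <- input) {
      if (depth > 0) sb.append(c)          // inside a document: keep every char
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' if depth > 0 => inString = true
        case '{' | '[' =>
          if (depth == 0) sb.append(c)     // opening char of a new document
          depth += 1
        case '}' | ']' if depth > 0 =>
          depth -= 1
          if (depth == 0) { docs += sb.toString; sb.clear() }
        case _ =>                          // characters between documents are skipped
      }
    }
    docs.result()                          // an unterminated tail is dropped
  }

  // Candidate cases: input -> expected documents (all assumed, open for discussion).
  val cases: List[(String, List[String])] = List(
    // two objects back to back
    """{"a":1}{"b":2}""" -> List("""{"a":1}""", """{"b":2}"""),
    // object and array separated by a newline
    "{\"a\":1}\n[1, 2]" -> List("{\"a\":1}", "[1, 2]"),
    // nested containers
    """{"a":{"b":[1,2]}} {"c":3}""" -> List("""{"a":{"b":[1,2]}}""", """{"c":3}"""),
    // brackets inside a string value
    """{"a":"}{"}""" -> List("""{"a":"}{"}"""),
    // escaped quote inside a string value
    """{"a":"\"}"}""" -> List("""{"a":"\"}"}"""),
    // truncated second document
    """{"a":1}{"b":""" -> List("""{"a":1}""")
  )
}
```

Each pair can then be checked with something like `SplitCases.cases.foreach { case (in, expected) => assert(SplitCases.split(in) == expected) }`. Whether these are the right expectations (for example, whether a truncated document should be dropped or reported) is exactly what the list is meant to settle.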


