Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/19651#discussion_r154408356
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFilters.scala
---
@@ -0,0 +1,210 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.orc
+
+import org.apache.orc.storage.ql.io.sarg.{PredicateLeaf, SearchArgument, SearchArgumentFactory}
+import org.apache.orc.storage.ql.io.sarg.SearchArgument.Builder
+import org.apache.orc.storage.serde2.io.HiveDecimalWritable
+
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.types._
+
+/**
+ * Helper object for building ORC `SearchArgument`s, which are used for ORC predicate push-down.
+ *
+ * Due to limitation of ORC `SearchArgument` builder, we had to end up with a pretty weird double-
+ * checking pattern when converting `And`/`Or`/`Not` filters.
+ *
+ * An ORC `SearchArgument` must be built in one pass using a single builder. For example, you can't
+ * build `a = 1` and `b = 2` first, and then combine them into `a = 1 AND b = 2`. This is quite
+ * different from the cases in Spark SQL or Parquet, where complex filters can be easily built using
+ * existing simpler ones.
+ *
+ * The annoying part is that, `SearchArgument` builder methods like `startAnd()`, `startOr()`, and
+ * `startNot()` mutate internal state of the builder instance. This forces us to translate all
+ * convertible filters with a single builder instance. However, before actually converting a filter,
+ * we've no idea whether it can be recognized by ORC or not. Thus, when an inconvertible filter is
+ * found, we may already end up with a builder whose internal state is inconsistent.
+ *
+ * For example, to convert an `And` filter with builder `b`, we call `b.startAnd()` first, and then
+ * try to convert its children. Say we convert `left` child successfully, but find that `right`
+ * child is inconvertible. Alas, `b.startAnd()` call can't be rolled back, and `b` is inconsistent
+ * now.
+ *
+ * The workaround employed here is that, for `And`/`Or`/`Not`, we first try to convert their
+ * children with brand new builders, and only do the actual conversion with the right builder
+ * instance when the children are proven to be convertible.
+ *
+ * P.S.: Hive seems to use `SearchArgument` together with `ExprNodeGenericFuncDesc` only. Usage of
+ * builder methods mentioned above can only be found in test code, where all tested filters are
+ * known to be convertible.
+ */
+private[orc] object OrcFilters {
--- End diff --
Yes. It's logically the same as the old version; only the API usage is updated
here.
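
For readers following the thread, the double-checking pattern described in the doc comment can be sketched roughly as below. This is a minimal, self-contained illustration and not the code in the PR: `ToyBuilder`, `ToyOrcFilters`, `Unsupported`, and the toy `Filter` types are hypothetical stand-ins for the ORC `SearchArgument` builder and Spark's `Filter` classes, and the string the builder produces is made up for the example.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy filter types; these are NOT Spark's org.apache.spark.sql.sources classes.
sealed trait Filter
case class EqualTo(attr: String, value: Int) extends Filter
case class Unsupported(desc: String) extends Filter // stands for a filter ORC cannot express
case class And(left: Filter, right: Filter) extends Filter
case class Or(left: Filter, right: Filter) extends Filter
case class Not(child: Filter) extends Filter

// Hypothetical stand-in for the ORC builder: every call mutates internal state,
// and there is no way to undo a start*() call once it has been made.
class ToyBuilder {
  private val tokens = ArrayBuffer.empty[String]
  def startAnd(): ToyBuilder = { tokens += "and("; this }
  def startOr(): ToyBuilder = { tokens += "or("; this }
  def startNot(): ToyBuilder = { tokens += "not("; this }
  def equalTo(attr: String, value: Int): ToyBuilder = { tokens += s"$attr=$value"; this }
  def end(): ToyBuilder = { tokens += ")"; this }
  def build(): String = tokens.mkString(" ")
}

object ToyOrcFilters {
  // Returns None when any part of the filter cannot be converted.
  def create(filter: Filter): Option[String] =
    build(filter, new ToyBuilder).map(_.build())

  private def build(filter: Filter, b: ToyBuilder): Option[ToyBuilder] = filter match {
    case And(l, r) =>
      for {
        // Dry run: check both children on throwaway builders first.
        _ <- build(l, new ToyBuilder)
        _ <- build(r, new ToyBuilder)
        // Only now touch the real builder, knowing both children convert.
        lhs <- build(l, b.startAnd())
        rhs <- build(r, lhs)
      } yield rhs.end()

    case Or(l, r) =>
      for {
        _ <- build(l, new ToyBuilder)
        _ <- build(r, new ToyBuilder)
        lhs <- build(l, b.startOr())
        rhs <- build(r, lhs)
      } yield rhs.end()

    case Not(child) =>
      for {
        _ <- build(child, new ToyBuilder)
        negated <- build(child, b.startNot())
      } yield negated.end()

    case EqualTo(attr, value) => Some(b.equalTo(attr, value))

    case Unsupported(_) => None // inconvertible: the caller's dry run stops here
  }
}
```

In this sketch, `ToyOrcFilters.create(And(EqualTo("a", 1), Unsupported("udf")))` yields `None` without `startAnd()` ever being called on the real builder, which is the point of the workaround. The cost is that each subtree is traversed once for the dry run and once for the actual conversion at every level of nesting.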
---