Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/1072#discussion_r13736713
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala ---
@@ -55,14 +66,26 @@ private[sql] case class InMemoryColumnarTableScan(
       cached.count()
       cached
     }
+}
+
+private[sql] case class InMemoryColumnarTableScan(
+    attributes: Seq[Attribute],
+    relation: InMemoryRelation)
+  extends LeafNode {
+
+  override def output: Seq[Attribute] = attributes

   override def execute() = {
-    cachedColumnBuffers.mapPartitions { iterator =>
+    relation.cachedColumnBuffers.mapPartitions { iterator =>
       val columnBuffers = iterator.next()
       assert(!iterator.hasNext)

       new Iterator[Row] {
-        val columnAccessors = columnBuffers.map(ColumnAccessor(_))
+        // Find the ordinals of the requested columns. If none are requested, use the first.
+        val requestedColumns =
+          if (attributes.isEmpty) Seq(0) else attributes.map(relation.output.indexOf(_))
--- End diff --
I'm not sure I understand this correctly: is it because we don't know the total
row count that we must scan at least one column here? If so, I think we can
simply record the row count while building the in-memory columnar byte arrays,
so that no column needs to be scanned at all.
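
A minimal sketch of the idea, in case it helps. Everything here (CachedBatch,
buildBatch, scan) is a hypothetical stand-in rather than the actual Spark
types: the row count is recorded once while the column buffers are filled, so
a scan that requests no columns can answer from the stored count alone.

    import java.nio.ByteBuffer

    // Hypothetical stand-in for one partition's cached payload: the column
    // buffers plus the row count recorded at build time.
    case class CachedBatch(columnBuffers: Array[ByteBuffer], rowCount: Int)

    object CountWithoutScan {
      // Build time: count rows while the column buffers are being filled,
      // instead of re-deriving the count later by decoding a column.
      def buildBatch(rows: Seq[Array[Any]], numColumns: Int): CachedBatch = {
        val buffers = Array.fill(numColumns)(ByteBuffer.allocate(1 << 16))
        var count = 0
        rows.foreach { row =>
          count += 1
          // ... append each field of `row` to its column buffer here ...
        }
        CachedBatch(buffers, count)
      }

      // Scan time: if no columns are requested (e.g. a bare COUNT(*)), emit
      // `rowCount` empty rows without touching any column buffer.
      def scan(batch: CachedBatch, requestedOrdinals: Seq[Int]): Iterator[Seq[Any]] =
        if (requestedOrdinals.isEmpty) Iterator.fill(batch.rowCount)(Seq.empty)
        else Iterator.empty // real code would decode the requested columns here
    }

Storing the count next to the buffers costs a few bytes per batch and would
remove the need for the Seq(0) fallback in the diff above.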