Weston Pace created ARROW-16410:
-----------------------------------

             Summary: [C++] Scanner -> ScanNode
                 Key: ARROW-16410
                 URL: https://issues.apache.org/jira/browse/ARROW-16410
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Weston Pace
            Assignee: Weston Pace


This is going to be a parent issue for a number of changes I'd like to make to 
the scanner.

 * I'd like to remove the concept of the scanner from the public API.
 * Related to that, ScanOptions will probably be split into two classes.  
QueryOptions will be the public-facing half, while ScanOptions will hold the 
options sent to the ScanNode.  Most users won't see ScanOptions, as it will be 
internal.
    * For example, QueryOptions will have {{batch_readahead}}, which represents 
how many "query engine batches" to read ahead.  Since files and the query 
engine have different ideas of what constitutes a "batch", the related property 
in ScanOptions will be {{rows_to_readahead}}.
    * Another example is projection.  In QueryOptions, projection is column 
selection as well as any custom projection expressions that a user wants to 
run.  In ScanOptions, "projection" is the desired list of columns and the 
output type for each column, which controls casting and inference.
 * Partially related (and partially unrelated) to the above two items, I would 
like to move the scanner away from AsyncGenerator and recast it as an execution 
engine node.
 * The {{Scanner}} class will become deprecated and eventually go away.  Some 
methods like {{Scanner::ToTable}} may move into a new {{QueryBuilder}} or 
{{Query}} object.
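
To make the proposed QueryOptions / ScanOptions split a bit more concrete, here 
is a rough sketch.  None of this is the actual Arrow API: the struct layouts, 
the {{ToScanOptions}} helper, the {{rows_per_batch}} parameter, and the idea 
that rows to read ahead is simply batches multiplied by rows per batch are all 
assumptions made for illustration.  Only the names {{QueryOptions}}, 
{{ScanOptions}}, {{batch_readahead}} and {{rows_to_readahead}} come from the 
description above.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical public-facing options, expressed in query-engine terms.
struct QueryOptions {
  // How many query-engine batches to read ahead (illustrative default).
  int32_t batch_readahead = 1;
  // Columns selected by the user. Custom projection expressions are
  // omitted from this sketch.
  std::vector<std::string> columns;
};

// Hypothetical internal options handed to the scan node.
struct ScanOptions {
  // Readahead expressed in rows, since a file "batch" and a query-engine
  // "batch" need not be the same size.
  int64_t rows_to_readahead = 0;
  // Desired columns paired with an output type name. An empty type name
  // stands in for "no cast requested" in this sketch.
  std::vector<std::pair<std::string, std::string>> projection;
};

// Assumed translation from the public options to the internal ones.
// rows_per_batch is a hypothetical parameter giving the number of rows in
// one query-engine batch.
ScanOptions ToScanOptions(const QueryOptions& query, int64_t rows_per_batch) {
  ScanOptions scan;
  scan.rows_to_readahead =
      static_cast<int64_t>(query.batch_readahead) * rows_per_batch;
  for (const auto& name : query.columns) {
    scan.projection.emplace_back(name, std::string());
  }
  return scan;
}
```

The point of the sketch is only that users would set options in query-engine 
units, and the translation into file-oriented units would happen internally 
before anything reaches the ScanNode.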




--
This message was sent by Atlassian Jira
(v8.20.7#820007)
