Re: [Datasource V2] Exception Handling for Catalogs - Naming Suggestions

2020-05-13 Thread Wenchen Fan
This looks a bit specific and maybe it's better to allow catalogs to
customize the error message, which is more general.

On Wed, May 13, 2020 at 12:16 AM Russell Spitzer 
wrote:

> Currently the way some actions work, we receive an error during analysis
> phase. For example, doing a "SELECT * FROM non_existent_table" will return
> an analysis exception as the NoSuchTableException is captured and replaced.
>
> Other actions like the "ShowNamespaceExec" call catalog methods directly
> and directly throw a NoSuchNamsSpaceException
>
> While I don't think the difference here is a big deal, It would be nice if
> we could have just one set of behaviors. My interest here was being able to
> throw custom NoSuchTable and NoSuchNameSpace exceptions
> which contained naming suggestions.
>
> For example
>
> *SELECT * from  catalog.keyspace.tuble*
>
> Could optionally return an analysis exception which suggested other names
> of tables (or keyspaces) which contained like
>
> "Could not find keyspace.tuble, found a near hit: keyspace.table"
>
> While I have the code to do this internally within my own catalog because
> of previous functionality,
>  I cannot surface this information to the user anymore in DSV2 because of
> the way the exceptions are handled.
>
>
> Given this, I'm wondering if we could wrap and rethrow the
> NoSuchExceptions or would it be better for catalogs to support an
> interfaces like
>
>
> trait SupportsTableNamingSuggestions extends TableCatalog {
>   val matchThreshold = 0.7
>   val matcher: (String, String) => Double = JaroWinkler // Or some other
> string comparison algorithm
>
>   /** Given an identifier, return a list of identifiers that it may have
> been mistaken for"
>   default def tableSuggestions(missingIdent: Identifier): Seq[Ident] {
> for (table <- getTables(ident.namespace) if
> matcher(missingIdent.table, ident.table) > matchThreshold) yield {
>  table
> }
>   }
> }
> // And another for namespaces
>
> Then on analysis failures, if the catalog supports this trait we could
> supplement the analysis failure with some helpful info. We could leave this
> as a trait for
> those implementations whose listing of tables or namespaces is costly.
>
>
> Thanks for your consideration, this is the kind of feature that I think is
> very useful to end users and we can add with pretty limited cost,
> Russ
>


[Datasource V2] Exception Handling for Catalogs - Naming Suggestions

2020-05-12 Thread Russell Spitzer
Currently the way some actions work, we receive an error during analysis
phase. For example, doing a "SELECT * FROM non_existent_table" will return
an analysis exception as the NoSuchTableException is captured and replaced.

Other actions like the "ShowNamespaceExec" call catalog methods directly
and directly throw a NoSuchNamsSpaceException

While I don't think the difference here is a big deal, It would be nice if
we could have just one set of behaviors. My interest here was being able to
throw custom NoSuchTable and NoSuchNameSpace exceptions
which contained naming suggestions.

For example

*SELECT * from  catalog.keyspace.tuble*

Could optionally return an analysis exception which suggested other names
of tables (or keyspaces) which contained like

"Could not find keyspace.tuble, found a near hit: keyspace.table"

While I have the code to do this internally within my own catalog because
of previous functionality,
 I cannot surface this information to the user anymore in DSV2 because of
the way the exceptions are handled.


Given this, I'm wondering if we could wrap and rethrow the NoSuchExceptions
or would it be better for catalogs to support an interfaces like


trait SupportsTableNamingSuggestions extends TableCatalog {
  val matchThreshold = 0.7
  val matcher: (String, String) => Double = JaroWinkler // Or some other
string comparison algorithm

  /** Given an identifier, return a list of identifiers that it may have
been mistaken for"
  default def tableSuggestions(missingIdent: Identifier): Seq[Ident] {
for (table <- getTables(ident.namespace) if matcher(missingIdent.table,
ident.table) > matchThreshold) yield {
 table
}
  }
}
// And another for namespaces

Then on analysis failures, if the catalog supports this trait we could
supplement the analysis failure with some helpful info. We could leave this
as a trait for
those implementations whose listing of tables or namespaces is costly.


Thanks for your consideration, this is the kind of feature that I think is
very useful to end users and we can add with pretty limited cost,
Russ