[
https://issues.apache.org/jira/browse/SQOOP-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128536#comment-13128536
]
Aaron Kimball commented on SQOOP-365:
-------------------------------------
This proposal looks like a good start! Here are some questions I have about it:
* One of the main advantages of Sqoop in its current form is its ease of
deployment by end-users. Like Pig, it can be installed on a client machine
without burdening cluster operators.
** How will we maintain this ease-of-deployment in the face of the web-based
app? Can/will Sqoop come with a self-contained server (e.g. Jetty?) to support
'localhost' execution of the web app?
** I like the idea of pre-defined connections. But will Sqoop still support the
ability to use the existing 'ad hoc' connection mechanism? For users who
already have a username/password they can use to connect to a database, it may
be useful for them to get started easily with their existing credentials,
without requiring an operator to configure a connection.
* Many production deployments count on running Sqoop in command-line mode using
the existing command-line arguments to specify the job. Will Sqoop2 be
backwards-compatible with these arguments?
* How and where does Sqoop store information about Connections, resource
limits, etc?
** How, if at all, do we guard against end-users starting a second Sqoop server
to get around resource limits? Are the resource limits and temporary locking
info, etc, stored in the target database itself? (If so, how do we guard
against stale locks?)
I also don't believe that it's productive for the command-line client to use
the REST API directly. Starting a server (even on localhost) as a pre-req for
running a command-line tool seems overly complicated to me.
I think a better architecture may be to define a number of Operations
internally. Each Operation can have a programmatic (Java) API that executes it.
Each Operation can also be bound to a REST API endpoint. But this way a user
can still simply run the command-line application without configuring an entire
server. The command-line app would run the Operation directly, as opposed to
running it in the address space of a separate process somewhere. This would
reduce the number of layers of complexity when debugging what goes wrong.
Involving the network (even loopback) where none is needed seems like asking
for trouble.
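To make the Operations idea above concrete, here is a minimal sketch. Every name in it ("Operation", "OperationRegistry", "CommandLineFrontEnd", the "/operations/..." path) is a hypothetical placeholder and does not come from the proposal. It only illustrates the shape: one Operation with a Java API, a path that a REST front end could bind it to, and a command-line front end that runs the same Operation in-process.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// All names below are hypothetical; none come from the Sqoop 2 proposal.

class OperationDemo {
    public static void main(String[] args) {
        OperationRegistry registry = new OperationRegistry();
        registry.register(new CheckConnectStringOperation());
        String[] demoArgs = {"check-connect-string", "connect=jdbc:mysql://localhost/db"};
        // The command-line path: no server, no network, same Operation.
        System.out.println(CommandLineFrontEnd.run(registry, demoArgs));
        // The path a REST front end could expose for the same Operation.
        System.out.println(registry.restPath("check-connect-string"));
    }
}

/** A unit of work with a programmatic (Java) API. */
interface Operation {
    String name();
    Map<String, String> execute(Map<String, String> params);
}

/** Toy operation: checks only that the connect string looks like a JDBC URL. */
class CheckConnectStringOperation implements Operation {
    @Override
    public String name() { return "check-connect-string"; }

    @Override
    public Map<String, String> execute(Map<String, String> params) {
        String connect = params.get("connect");
        boolean ok = connect != null && connect.startsWith("jdbc:");
        Map<String, String> result = new LinkedHashMap<>();
        result.put("status", ok ? "OK" : "ERROR");
        return result;
    }
}

/** Shared registry: both front ends look Operations up here. */
class OperationRegistry {
    private final Map<String, Operation> byName = new LinkedHashMap<>();

    void register(Operation op) { byName.put(op.name(), op); }

    Operation lookup(String name) {
        Operation op = byName.get(name);
        if (op == null) {
            throw new IllegalArgumentException("Unknown operation: " + name);
        }
        return op;
    }

    /** Path a REST front end could bind the same Operation to. */
    String restPath(String name) {
        return "/operations/" + lookup(name).name();
    }
}

/** Command-line front end: executes the Operation in-process, with no server. */
class CommandLineFrontEnd {
    /** Expects args[0] = operation name, followed by key=value parameters. */
    static Map<String, String> run(OperationRegistry registry, String[] args) {
        if (args.length == 0) {
            throw new IllegalArgumentException("Usage: <operation> [key=value ...]");
        }
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 1; i < args.length; i++) {
            int eq = args[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + args[i]);
            }
            // Split on the first '=' only, so values may themselves contain '='.
            params.put(args[i].substring(0, eq), args[i].substring(eq + 1));
        }
        return registry.lookup(args[0]).execute(params);
    }
}
```

In this shape a server would be one more caller of the registry's lookup, so a failure in an Operation can be reproduced from the command line without any network layer in between.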
Finally, on the front of API compatibility: Arvind, in an offline discussion,
we talked about having a separate API package of interfaces that would have
"api level" versioning (a la the Servlet API) that is distinct from the
implementation version. Is that still part of your vision for Sqoop 2? I don't
see it described in this proposal.
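As a rough illustration of what "api level" versioning could look like, here is a sketch that keeps an API level separate from the implementation version string. The class names, the version values, and the compatibility rule (same major level, runtime minor level at least the one required) are all assumptions made for the example; none of them was settled in the discussion or stated in the proposal.

```java
// Hypothetical sketch; names, values, and the compatibility rule are assumptions.

class ApiLevelDemo {
    public static void main(String[] args) {
        ApiLevel required = ApiLevel.parse("1.0"); // level a plugin was built against
        System.out.println("implementation " + RuntimeVersions.IMPLEMENTATION_VERSION
                + ", api level " + RuntimeVersions.API_LEVEL
                + ", supports " + required + ": "
                + RuntimeVersions.API_LEVEL.supports(required));
    }
}

/** An API level, versioned independently of any implementation release. */
final class ApiLevel {
    final int major;
    final int minor;

    ApiLevel(int major, int minor) {
        if (major < 0 || minor < 0) {
            throw new IllegalArgumentException("Levels must be non-negative");
        }
        this.major = major;
        this.minor = minor;
    }

    /** Parses "major.minor", for example "1.2". */
    static ApiLevel parse(String text) {
        String[] parts = text.split("\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected major.minor, got: " + text);
        }
        return new ApiLevel(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]));
    }

    /**
     * Assumed rule: this runtime level supports code built against 'required'
     * when the major levels match and this minor level is not older.
     */
    boolean supports(ApiLevel required) {
        return major == required.major && minor >= required.minor;
    }

    @Override
    public String toString() { return major + "." + minor; }
}

/** The two version numbers a runtime would report; values are made up. */
final class RuntimeVersions {
    static final ApiLevel API_LEVEL = new ApiLevel(1, 2);
    static final String IMPLEMENTATION_VERSION = "2.0.0-SNAPSHOT";

    private RuntimeVersions() { }
}
```

The point of the split is that the implementation version can move with every release while the API level changes only when the interfaces themselves change.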
I looked through the proposed source layout for this. Without a README
specifying what goes in which directories, it's hard for me to understand what
you're trying to accomplish. What's the "infra" project for?
I think based on what I said above about Operations, etc, there should be a
"libsqoop" project that corresponds to the guts of the project. The "server"
should just be a REST API implementation (perhaps w/ an embedded Jetty server,
but also perhaps deployable as a WAR on a fully-administered Tomcat instance)
that embeds libsqoop to perform the Operations. And the client, similarly, is a
thin command-line-arg parsing shell that embeds libsqoop to perform Operations
directly.
Is infra ~= libsqoop in this idea? Or is that about independent testing of
connectors, etc?
I think there should also be a plugin-api library (libsqoopapi?) which the
connector/*/ projects link against, rather than libsqoop itself. This API would
also be used by third-party SqoopTool implementations.
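The dependency direction I have in mind for the plugin API could be sketched as follows. The interface and class names are hypothetical, and a single file stands in for what would be three separate libraries; the comments mark which library each type would belong to under this assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch; none of these names come from the proposal.

class PluginApiDemo {
    public static void main(String[] args) {
        ConnectorResolver resolver = new ConnectorResolver();
        resolver.add(new ExampleMySqlConnector());
        System.out.println(resolver.resolve("jdbc:mysql://localhost/db"));
        System.out.println(resolver.resolve("jdbc:other://localhost/db"));
    }
}

// --- Would live in the plugin API library (the "libsqoopapi" idea). ---
/** The only type a connector needs to compile against. */
interface ConnectorPlugin {
    String name();
    boolean accepts(String connectString);
}

// --- Would live in a connector project, linked against the API only. ---
class ExampleMySqlConnector implements ConnectorPlugin {
    @Override
    public String name() { return "example-mysql"; }

    @Override
    public boolean accepts(String connectString) {
        return connectString != null && connectString.startsWith("jdbc:mysql:");
    }
}

// --- Would live in libsqoop, which sees connectors only through the API. ---
class ConnectorResolver {
    private final List<ConnectorPlugin> plugins = new ArrayList<>();

    void add(ConnectorPlugin plugin) { plugins.add(plugin); }

    /** Returns the name of the first registered plugin that accepts the string. */
    Optional<String> resolve(String connectString) {
        for (ConnectorPlugin plugin : plugins) {
            if (plugin.accepts(connectString)) {
                return Optional.of(plugin.name());
            }
        }
        return Optional.empty();
    }
}
```

Because the connector never references ConnectorResolver or anything else on the libsqoop side, the internals can change without forcing connectors or third-party tools to be rebuilt.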
This document's off to a great start -- this is definitely in line with the
next evolution of Sqoop as a first-class mechanism for getting data into
Hadoop. Looking forward to your answers!
Cheers,
Aaron
> Proposal for next major revision of Sqoop.
> ------------------------------------------
>
> Key: SQOOP-365
> URL: https://issues.apache.org/jira/browse/SQOOP-365
> Project: Sqoop
> Issue Type: Wish
> Reporter: Arvind Prabhakar
> Assignee: Arvind Prabhakar
> Attachments: sqoop2.tar.gz
>
>
> This issue tracks the design and development of the next major revision of
> Sqoop. The proposal has been articulated on the wiki at the following
> location:
> https://cwiki.apache.org/confluence/display/SQOOP/Sqoop+2
> Please review the proposal and add your comments to this JIRA.