Hi Sneh, I have tried the same scenario with S3 and got the following error:
17/02/10 04:09:04 ERROR tool.ImportTool: Imported Failed: Wrong FS: s3a://..., expected: hdfs://... I guess this is the same you have seen on your side. I have debugged the code and this error comes from the org.apache.sqoop.tool.ImportTool#initIncrementalConstraints method but I can't see a straightforward solution to it. Hadoop itself supports S3 and that is why some scenarios with S3 work but Sqoop does not have a full S3 support. If you think you can create a new feature request a JIRA I think the community will work more on S3 related features in the future. Regards, Szabolcs On Mon, Feb 6, 2017 at 9:52 PM, Anna Szonyi <[email protected]> wrote: > Hi Sneh, > > Currently the exclude tables is implemented with contains on the array of > tables: > for (String tableName : tables) { > > if (excludes.contains(tableName)) { > > System.out.println("Skipping table: " + tableName); > > } > ... > > So it currently doesn't work, however adding support for some sort of > wildcard wouldn't be too difficult. > If this is something you need, it might make sense to create a jira > <https://issues.apache.org/jira/browse/SQOOP/> for it, with your usecase. > > Thanks, > Anna > > > > On Sun, Feb 5, 2017 at 8:34 PM, Sneh <[email protected]> > wrote: > >> Hi Liz, >> >> I tried running the following command (create a job and then exec) to >> incremental fetch data to S3 (on AWS EMR cluster with EMRFS consistent >> view). >> sqoop job --create incre_reservation -- import --connect >> "jdbc:postgresql://rds-replica-hmssync.XXX.rds.amazonaws.com/hms" >> --username XXX --password XXX --table reservationbooking --incremental >> lastmodified --check-column modified_at --target-dir >> "s3://platform-poc/sqoop/reservation/incre" >> >> The error which I get says that FS should be HDFS and not S3. >> I came up with *alternate* approach to "delta fetch" the data to HDFS >> and then run merge command. >> >> I wanted to check if the "hop" to HDFS can be saved and direct merge >> could happen at S3. >> >> I got an another question, unrelated to the above: >> -> Is there a way I can use wildcards to exclude tables (without >> specifying the exact table names) while importing all the tables? >> >> Thanks for your time! >> >> >> Wishes, >> Sneh >> 8884383482 <(888)%20438-3482> >> >> On Fri, Feb 3, 2017 at 5:24 PM, Erzsebet Szilagyi < >> [email protected]> wrote: >> >>> Hi Sneh, >>> Could you give us a sample command that you are trying to run? >>> Thanks, >>> Liz >>> >>> On Thu, Jan 19, 2017 at 1:36 PM, Sneh <[email protected]> >>> wrote: >>> >>>> Dear Sqoop users, >>>> >>>> I've spawned an EMR cluster with Sqoop 1.4.6 and trying to "increment >>>> fetch" data from RDS to S3. >>>> The error I get is that FS should be HDFS and not S3. >>>> >>>> My EMR cluster is enabled for EMRFS consistent view. >>>> I am trying to build a pipeline from RDS to S3. Need help in direction >>>> to how to proceed when increment Sqoop job is unable to write to S3. >>>> >>>> Please help! >>>> >>>> >>>> Wishes, >>>> Sneh >>>> 8884383482 <(888)%20438-3482> >>>> >>>> >>>> <https://s3-ap-southeast-1.amazonaws.com/treebo-email/Great+Rates/sign.jpg> >>> >>> >>> >>> >>> -- >>> Erzsebet Szilagyi >>> Software Engineer >>> [image: www.cloudera.com] <http://www.cloudera.com> >>> >> >> >> >> <https://s3-ap-southeast-1.amazonaws.com/treebo-email/Great+Rates/sign.jpg> > > > -- Szabolcs Vasas Software Engineer <http://www.cloudera.com>
