Seeing as I'm now depending on this behavior, I nominate that this bug
be upgraded to a feature :-)

-Mark

On Fri, Oct 21, 2011 at 1:38 PM, arv...@cloudera.com
<arv...@cloudera.com> wrote:
> Glad it worked Mark!
>> And it looks like you don't have to do a hive import to use it.
> That sounds like a bug to me :)
> Arvind
>
> On Fri, Oct 21, 2011 at 9:41 AM, Mark Roddy <markro...@gmail.com> wrote:
>>
>> Thanks for the help Arvind.  The hive-drop-import-delims worked.  And
>> it looks like you don't have to do a hive import to use it.
>>
>> -Mark
>>
>>
>> On Fri, Oct 21, 2011 at 11:43 AM, Arvind Prabhakar <arv...@apache.org>
>> wrote:
>> > One workaround worth trying is to use the "--hive-drop-import-delims"
>> > option and do a hive import. With this option set, Sqoop will remove
>> > any newlines or ^A characters, which are the default delimiters used
>> > for Hive. After the import is done, you could copy the file out of
>> > Hive directly and use it in your application.
>> >
>> > Arvind
>> >
>> > On Fri, Oct 21, 2011 at 7:05 AM, Mark Roddy <markro...@gmail.com> wrote:
>> >> I used "--escaped-by \\" due to bash, so that "\" would be the escape
>> >> character used.  That works fine, I end up with \n and \t characters
>> >> escaped by '\'.
>> >>
>> >>
>> >> To put the problem more concretely, I have a single record from the db
>> >> with a field containing the following value:
>> >> "foo
>> >> bar baz
>> >> biz"
>> >>
>> >> Sqoop will spit out:
>> >> "foo\
>> >> bar baz\
>> >> biz"
>> >>
>> >>
>> >> Now if I run a map reduce job on this with the TextInputFormat, the
>> >> record will be terminated after "foo", not after "biz".  I did a little
>> >> digging and TextInputFormat uses LineRecordReader, which uses
>> >> LineReader, which, looking at the source, clearly does not honor the
>> >> escape char.  Is there a tool/input format/etc that will read from
>> >> HDFS and honor this?  It does not seem that M/R can do it out of the
>> >> box.  I can't find a way to get Pig to do it either.  I assume there
>> >> must be something that will honor the escape, but cannot find anything.
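To make the desired behavior concrete, here is a minimal Python sketch of a reader that does honor the escape character. It is not a Hadoop InputFormat, only an illustration of the splitting logic; the function name read_escaped_records is hypothetical, and it assumes the escape character makes the following character literal, whatever that character is.

```python
import io


def read_escaped_records(stream, escape="\\"):
    """Yield logical records from a text stream.

    An unescaped newline ends the record. A character preceded by the
    escape character is kept as data, so an escaped newline stays inside
    the record. Assumption: escape + X always means a literal X.
    """
    buf = []
    escaped = False
    # Read one character at a time until read() returns the empty string.
    for ch in iter(lambda: stream.read(1), ""):
        if escaped:
            buf.append(ch)      # literal character, even if it is a newline
            escaped = False
        elif ch == escape:
            escaped = True      # drop the escape char, remember its effect
        elif ch == "\n":
            yield "".join(buf)  # unescaped newline: record boundary
            buf = []
        else:
            buf.append(ch)
    if escaped:
        buf.append(escape)      # dangling escape at end of input: keep it
    if buf:
        yield "".join(buf)


# Usage: the three-line field from the example is read back as one record.
sample = io.StringIO('"foo\\\nbar baz\\\nbiz"\nnext record\n')
records = list(read_escaped_records(sample))
```

A real fix on the Hadoop side would have to apply the same rule inside a custom record reader, which also has to cope with input splits that begin in the middle of a record.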
>> >>
>> >>
>> >>
>> >> On Fri, Oct 21, 2011 at 5:26 AM, Alexander C.H. Lorenz
>> >> <wget.n...@googlemail.com> wrote:
>> >>> Hi Mark,
>> >>> --escaped-by \/ (backslash - slash) tells bash to escape the next
>> >>> character.
>> >>> (if I understood you right)
>> >>> - Alex
>> >>> On Fri, Oct 21, 2011 at 12:12 AM, Mark Roddy <markro...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> I'm moving free form data out of an RDBMS that has a lot of \n, \r\n,
>> >>>> and \t characters.
>> >>>>
>> >>>> I used "--escaped-by \\" (extra \ cause of bash), but I'm a little
>> >>>> confused about what to do with this data now.  I can't seem to find
>> >>>> any tools that will honor the '\' escape char.  TextInputFormat does
>> >>>> not seem to.
>> >>>>
>> >>>> I'm working on replacing an existing in-house tool w/sqoop that
>> >>>> replaces newlines with the literal string '\n'.  I'd be happy to do as
>> >>>> such but I don't see any way of doing so.
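As a sketch of the kind of replacement described here, the following Python function rewrites control characters as their two-character literal forms so that every record fits on one physical line. The function name literalize_whitespace is hypothetical. Handling \r and \t the same way as \n, and doubling existing backslashes so the result can be reversed, are assumptions; the in-house tool may not do either.

```python
def literalize_whitespace(value):
    """Replace newline, carriage return and tab with the literal text
    \\n, \\r and \\t.

    Backslashes already in the data are doubled first (an assumption),
    so a literal backslash followed by 'n' is not confused with an
    encoded newline.
    """
    return (
        value.replace("\\", "\\\\")  # must run before the other replacements
        .replace("\r", "\\r")
        .replace("\n", "\\n")
        .replace("\t", "\\t")
    )
```

The output contains no raw newline, so a line-oriented reader such as TextInputFormat would see exactly one record per line.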
>> >>>>
>> >>>> I'm sure I'm not the first person to run into this so I appreciate
>> >>>> any suggestions.
>> >>>>
>> >>>> -Mark
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Alexander Lorenz
>> >>> http://mapredit.blogspot.com
>> >>>
>> >>>
>> >>
>> >
>
>
