Ram, Having comma delimiter in the field is a separate issue on it's own. Barring that, you could use string length and sub string UDFs available in Hive to get rid of the enclosing characters. There is an example of that in the email thread I pointed out earlier.
You can see all available Hive UDFs at: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF Mark On Wed, Nov 28, 2012 at 8:37 AM, Ram Krishnamurthy < [email protected]> wrote: > Issue with option 2 is that the delimiter is also in one of the field. > exa. Charlotte, NC where comma is the delimiter so columns get messy. > Regardless. How would you strip out " in Hive QL? - 2a > > Thanks, > > > On Wed, Nov 28, 2012 at 11:17 AM, Mark Grover <[email protected] > > wrote: > >> Moving to [email protected] since this is an Apache Hive related >> question >> >> Hi Ram, >> Hive doesn't provide a way to do this out of the box. I have seen other >> people have the same request before (and more, e.g. use of escape >> characters to embed the enclosing characters, etc.) and have just created a >> Hive JIRA to discuss and implement this. >> https://issues.apache.org/jira/browse/HIVE-3751 >> >> However, there are several ways I can think of to work around this for >> now. They are: >> 1. Pre-process the files to get rid of the enclosing characters ("). This >> may or may not be scalable depending on your implementation, would be >> fairly simple to implement. >> 2. Create a staging Hive table with enclosing characters as a part of the >> data. Then, you could do one of 3 things with this staging table. >> 2a. Create another Hive table that gets populated by a Hive query on the >> staging table. The query basically strips out the enclosing characters. >> 2b. Create a view on the existing Hive staging table that presents the >> data after removing the enclosing characters. >> 2c. Use the existing RegexSerDe when creating the Hive staging table to >> get rid of the enclosing characters when serializing/deserializing the data >> from the staging table. >> 3. Create a custom informat or SerDe that supports enclosing characters >> in delimited fields. >> >> #3 above will be an answer to the JIRA I just created. So, the community >> would be grateful to you if you go that route and contribute it back. Of >> course, feel free to choose another option if that works better for you. >> >> For your reference, there is a similar thread that I had replied to at: >> http://mail-archives.apache.org/mod_mbox/hive-user/201204.mbox/%3CCAENxBwwrZrqBSJXtJHpqc_FfcZvwRMoaT9W7dR=jgtyjoxp...@mail.gmail.com%3E >> >> Good luck! >> Mark >> >> On Wed, Nov 28, 2012 at 7:25 AM, Ram Krishnamurthy < >> [email protected]> wrote: >> >>> I am having issues loading a delimited csv. here is a sample data >>> "1", "123456", " ", "Charlotte, NC" >>> >>> The delimiter is , >>> and fields are enclosed by " " >>> >>> Does anyone have syntax for loading this in Hive? >>> >>> Thanks, Ram >>> >>> >>> >> > > > -- > Thanks, > *Ram Krishnamurthy* > [email protected] > *Cell: 704-953-8125* > > > >
