Hi Nitin,
Thanks for your reply. We were under the impression that the codec would also be
responsible for the ORC format conversion.
However, per your reply, it seems that a conversion from plain CSV to ORC is
required before the Hive upload.
We got some leads from the following URLs:
Keshav,
Owen has provided the solution already. That's the easiest of the lot,
and it comes from the master who wrote ORC himself :)
To put it in simple words, what he has suggested is:
create a staging table based on the default text data format,
then load the data from the staging table into an ORC table.
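In HiveQL, the staging-then-convert approach might look roughly like this (the table names, columns, and HDFS path below are invented for illustration, not from the thread):

```sql
-- Staging table in Hive's default text format; schema is assumed
CREATE TABLE staging_events (id INT, name STRING, payload STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

-- Load the raw delimited file into the staging table as-is
LOAD DATA INPATH '/user/keshav/events.csv' INTO TABLE staging_events;

-- Final table stored as ORC
CREATE TABLE events_orc (id INT, name STRING, payload STRING)
STORED AS ORC;

-- Hive rewrites the rows into ORC files during this insert
INSERT OVERWRITE TABLE events_orc SELECT * FROM staging_events;
```

The point is that LOAD DATA only moves files, so the text-to-ORC conversion happens in the INSERT ... SELECT step.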
Hi,
I have a file which is delimited by a tab. Also, some fields in the
file contain a tab (\t) character or a newline (\n) character.
Is there any way to load this file using the Hive LOAD command? Or do I have to
use a custom MapReduce input format in Java?
If your data contains newline chars, it's better to write a custom MapReduce
job and convert the data into single-line records, removing all unwanted
chars from the fields as well, so there is just a single newline char per record.
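The core of such a cleanup job could look like the sketch below. It assumes (per Raj's later note about an escape character) that an embedded tab or newline inside a field is preceded by a backslash, while an unescaped newline ends the record; the function name and escaping convention are assumptions, not from the thread:

```python
import re

# A newline NOT preceded by a backslash ends a record; an escaped one
# (backslash + newline) is data embedded inside a field.
RECORD_END = re.compile(r'(?<!\\)\n')

def clean_records(text):
    """Split raw input on unescaped newlines, then replace the escaped
    tab/newline sequences with spaces so every record becomes one clean,
    tab-separated line."""
    rows = []
    for record in RECORD_END.split(text):
        if not record:
            continue
        cleaned = record.replace('\\\t', ' ').replace('\\\n', ' ')
        rows.append(cleaned)
    return rows
```

In a real MapReduce job this logic would sit in the mapper (with a custom InputFormat handling the record splitting), but the transformation itself is just this.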
On Sat, Sep 21, 2013 at 12:38 AM, Raj Hadoop hadoop...@yahoo.com wrote:
Please note that there is an escape character in the fields where the \t and \n
are present.
From: Raj Hadoop hadoop...@yahoo.com
To: Hive user@hive.apache.org
Sent: Friday, September 20, 2013 3:04 PM
Subject: How to load a \t \n file to Hive
Hi,
I have a file
Hi Nitin,
Thanks for the reply. I have a huge file in Unix.
As per the file definition, the file is a tab-separated file of fields. But I
am sure that within some fields I have newline characters.
How should I find such a record? It is a huge file. Is there some command?
Thanks,
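One way to locate such broken records is to count fields per line: a record split by an embedded newline shows up as two lines with too few tabs. A minimal sketch (the expected column count of 5 is an assumption; substitute the real schema's count):

```python
EXPECTED_FIELDS = 5  # assumption: replace with the actual column count

def suspect_lines(lines, expected=EXPECTED_FIELDS):
    """Yield (line_number, field_count) for every line whose tab-separated
    field count differs from the schema. A record broken by an embedded
    newline appears as two consecutive short lines."""
    for lineno, line in enumerate(lines, start=1):
        count = line.rstrip("\n").count("\t") + 1
        if count != expected:
            yield lineno, count
```

The same check as a shell one-liner: `awk -F'\t' 'NF != 5 {print NR, NF}' file` prints the line number and field count of each suspect line.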
Hi
One way we used to solve that problem is to transform the data when
you are creating/loading it; for example, we applied UrlEncode to each
field at create time.
Thanks,
Gabo.
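Gabo's per-field encoding idea can be sketched with Python's standard library (his team presumably used Java's URLEncoder; the function name and sample fields below are invented):

```python
from urllib.parse import quote

def encode_fields(fields):
    """Percent-encode every field at load time so embedded tabs and
    newlines can no longer collide with the file's delimiters."""
    # safe='' encodes everything outside the unreserved set,
    # so '\t' becomes '%09' and '\n' becomes '%0A'
    return [quote(f, safe='') for f in fields]

row = encode_fields(["plain", "has\tan embedded tab", "two\nlines"])
```

After encoding, the row can be joined with real tabs and newlines safely, and consumers decode each field on read.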
2013/9/20 Raj Hadoop hadoop...@yahoo.com
Hi Nitin,
Thanks for the reply. I have a huge file in Unix.
We have a small (3 GB / 280M rows) table with 435 partitions that is highly
skewed: one partition has nearly 200M rows, two others have nearly 40M apiece,
and the remaining 432 altogether hold less than 1% of the total table size.
So the skew is something to be addressed. However, even given that -
Another detail: ~400 mappers, 64 reducers
2013/9/20 Stephen Boesch java...@gmail.com
We have a small (3 GB / 280M rows) table with 435 partitions that is highly
skewed: one partition has nearly 200M rows, two others have nearly 40M apiece,
and the remaining 432 altogether hold less than 1%
Hi Gabo,
Are you suggesting using java.net.URLEncoder? Can you be more specific? I
have a lot of fields in the file; they are not only URL-related, and some
text fields contain newline characters.
Thanks,
Raj
From: Gabriel Eisbruch
Hi Raj,
UrlEncode is a good way to encode data and to be sure that you will encode
all the special chars (for example, \n will be encoded to %0A). It's not
necessary for the field to be a URL to encode it (you could use another
encoder, but we had very good results with UrlEncoder).
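The round trip Gabriel describes can be checked with the standard library (a sketch of the idea, not his actual code):

```python
from urllib.parse import quote, unquote

original = "line one\nline two\twith tab"

# Encode: '\n' -> '%0A' and '\t' -> '%09', so the stored value
# contains no delimiter characters at all.
encoded = quote(original, safe='')

# Decode on read: the original field comes back byte-for-byte.
decoded = unquote(encoded)
assert decoded == original
```

Because the encoding is lossless and reversible, any field type can be stored this way, not just URLs.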