Look at the block size concept in Hadoop and see if that is what you are 
looking for.
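
One caveat on block size: HDFS cuts a file into blocks at fixed byte offsets, so a block boundary can fall in the middle of a trace, and only the first block would carry the two headers. If block size alone turns out not to be enough, a minimal sketch of a pre-split on the local file system might look like the following. It is in Python and rests on assumptions that are not in the original message: every trace has the same length, each trace starts with a 240-byte trace header, there are no extended textual headers, and the binary header stores the samples-per-trace count and the sample format code as big-endian 16-bit integers at offsets 20 and 24 (the SEG-Y rev 1 layout). The names split_segy and trace_length are hypothetical helpers.

```python
import struct

TEXT_HEADER = 3200    # bytes, textual header (from the original message)
BINARY_HEADER = 400   # bytes, binary header (from the original message)
TRACE_HEADER = 240    # bytes per trace header; assumption (SEG-Y rev 1)

# Assumption: bytes per sample for the common SEG-Y rev 1 format codes.
BYTES_PER_SAMPLE = {1: 4, 2: 4, 3: 2, 5: 4, 8: 1}


def trace_length(binary_header):
    """Return the bytes per trace, assuming all traces have equal length."""
    # Assumption: samples-per-trace at offset 20 and format code at
    # offset 24 of the binary header, both big-endian unsigned 16-bit.
    (samples,) = struct.unpack(">H", binary_header[20:22])
    (fmt,) = struct.unpack(">H", binary_header[24:26])
    return TRACE_HEADER + samples * BYTES_PER_SAMPLE[fmt]


def split_segy(path, traces_per_part):
    """Write path.part0000, path.part0001, ... and return their names."""
    parts = []
    with open(path, "rb") as src:
        headers = src.read(TEXT_HEADER + BINARY_HEADER)
        size = trace_length(headers[TEXT_HEADER:])
        while True:
            # Reads a multiple of the trace length, so a well-formed
            # input is only ever cut on a trace boundary.
            chunk = src.read(size * traces_per_part)
            if not chunk:
                break
            name = "%s.part%04d" % (path, len(parts))
            with open(name, "wb") as dst:
                dst.write(headers)  # repeat both headers in every part
                dst.write(chunk)
            parts.append(name)
    return parts
```

Each part could then be copied with hadoop fs -copyFromLocal. Keeping a part no larger than the configured HDFS block size means it is stored as a single block, although which nodes hold that block is still decided by HDFS.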

On Jan 16, 2013, at 7:31 AM, Kaliyug Antagonist <[email protected]> 
wrote:

> I want to load a SegY file onto HDFS of a 3-node Apache Hadoop cluster.
> 
> To summarize, the SegY file consists of:
> 
> - 3200 bytes textual header
> - 400 bytes binary header
> - Variable bytes data
> 
> About 99.99% of the file's size comes from the variable bytes data, which 
> is a collection of thousands of contiguous traces. For any SegY file to make 
> sense, it must have the textual header + binary header + at least one trace 
> of data. What I want to achieve is to split a large SegY file across the 
> Hadoop cluster so that a smaller SegY file is available on each node for 
> local processing.
> 
> The scenario is as follows:
> 
> - The SegY file is large (above 10 GB) and resides on the local file system 
>   of the NameNode machine.
> - The file is to be split across the nodes in such a way that each node has 
>   a small SegY file with a strict structure: 3200 bytes textual header + 
>   400 bytes binary header + variable bytes data.
> 
> Obviously, I can't blindly use FSDataOutputStream or 
> hadoop fs -copyFromLocal, as this may not preserve the format required for 
> the chunks of the larger file.
> 
> Please guide me as to how I must proceed.
> 
> Thanks and regards!
