On Thu, Mar 31, 2016 at 6:42 PM, Arun Patel <arunp.bigd...@gmail.com> wrote:
> Since there are millions of files (with sizes from 1 MB to 15 MB), I would
> like to store them in a sequence file. How do I store the location of each
> of these files in HBase?
>
> I see lots of blogs and books talking about storing large files on HDFS and
> storing file paths in HBase. But I don't see any real examples. I was
> wondering if anybody has implemented this in production.

I don't know of any open implementation that I could point you at. There is
some consideration of what would be involved spanning HDFS and HBase in this
blog [1].

St.Ack

1. http://blog.cloudera.com/blog/2015/06/inside-apache-hbases-new-support-for-mobs/

> Looking forward to a reply from the community experts. Thanks.
>
> Regards,
> Arun
>
> On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
>> For #1, please take a look at
>> hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
>>
>> e.g. the following methods:
>>
>>     public DFSInputStream open(String src) throws IOException {
>>
>>     public HdfsDataOutputStream append(final String src, final int buffersize,
>>         EnumSet<CreateFlag> flag, final Progressable progress,
>>         final FileSystem.Statistics statistics) throws IOException {
>>
>> Cheers
>>
>> On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel <arunp.bigd...@gmail.com> wrote:
>>
>>> I would like to store large documents (over 100 MB) on HDFS and insert
>>> metadata in HBase.
>>>
>>> 1) Users will use the HBase REST API for PUT and GET requests for storing
>>> and retrieving documents. In this case, how do we PUT and GET documents
>>> to/from HDFS? What are the recommended ways of storing and accessing
>>> documents to/from HDFS that provide optimum performance?
>>>
>>> Can you please share any sample code, or a GitHub project?
>>>
>>> 2) What are the performance issues I need to know about?
>>>
>>> Regards,
>>> Arun
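The pattern discussed in the thread — document bytes on HDFS, with only the path and metadata kept in HBase — can be sketched as follows. So that the sketch runs standalone without a cluster, a local temp directory stands in for HDFS and a plain `HashMap` stands in for an HBase table; the names (`DocStoreSketch`, `putDocument`, `getDocument`) are illustrative inventions, not APIs from Hadoop or HBase.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class DocStoreSketch {

    // Stand-in for an HBase table: rowkey -> {qualifier -> value}.
    // In production this would be a Put/Get against a real table.
    static final Map<String, Map<String, String>> metaTable = new HashMap<>();

    // "PUT": write the document bytes to the filesystem (HDFS in
    // production), then record the path and metadata under the document id
    // (the HBase rowkey in a real setup).
    static void putDocument(Path dir, String docId, byte[] content) throws IOException {
        Path file = dir.resolve(docId + ".bin");
        Files.write(file, content);
        Map<String, String> row = new HashMap<>();
        row.put("path", file.toString());
        row.put("size", String.valueOf(content.length));
        metaTable.put(docId, row);
    }

    // "GET": look the path up in the metadata store, then read the bytes
    // back from the filesystem.
    static byte[] getDocument(String docId) throws IOException {
        Map<String, String> row = metaTable.get(docId);
        if (row == null) throw new IOException("unknown document: " + docId);
        return Files.readAllBytes(Paths.get(row.get("path")));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("docstore");
        putDocument(dir, "doc1", "hello".getBytes());
        byte[] back = getDocument("doc1");
        System.out.println(new String(back));                  // prints "hello"
        System.out.println(metaTable.get("doc1").get("size")); // prints "5"
    }
}
```

Against a real cluster, `Files.write`/`Files.readAllBytes` would be replaced by `FileSystem.create`/`FileSystem.open` from the Hadoop client (the `DFSClient` methods Ted points at sit underneath those), and the `HashMap` by an HBase `Put`/`Get`; the split itself — large bytes in the filesystem, small location row in the store — is what keeps the documents out of HBase's write path.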