Sorry, I meant XML files, not Hive files: will converting the XML files to some Avro format and storing them in Hive be fast?
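To make the question concrete, here is a rough sketch of the conversion step I have in mind. The `<sale>` layout, the field names and the schema are all made up for illustration, since the real XML looks different. It only flattens XML into rows shaped like an Avro record and runs a "top N for a year" count in memory. Writing actual Avro files and loading them into Hive is not shown.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical Avro-style record schema, kept as a plain dict.
# No Avro library is used here; this only fixes the field names and types.
SALE_SCHEMA = {
    "type": "record",
    "name": "Sale",
    "fields": [
        {"name": "year", "type": "int"},
        {"name": "product", "type": "string"},
        {"name": "manufacturer", "type": "string"},
        {"name": "store", "type": "string"},
    ],
}

# Map the schema's type names to Python conversions.
_CASTS = {"int": int, "string": str}


def xml_to_rows(xml_text):
    """Flatten every <sale> element into a dict with the schema's fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for sale in root.iter("sale"):
        row = {}
        for field in SALE_SCHEMA["fields"]:
            text = sale.findtext(field["name"])
            if text is None:
                raise ValueError("missing field: " + field["name"])
            row[field["name"]] = _CASTS[field["type"]](text.strip())
        rows.append(row)
    return rows


def top_n(rows, key, year, n=10):
    """Count the values of `key` for one year, most frequent first."""
    counts = Counter(r[key] for r in rows if r["year"] == year)
    return counts.most_common(n)


# Made-up sample input, assumed layout only.
SAMPLE = """<sales>
  <sale><year>2014</year><product>tv</product><manufacturer>acme</manufacturer><store>s1</store></sale>
  <sale><year>2014</year><product>tv</product><manufacturer>acme</manufacturer><store>s2</store></sale>
  <sale><year>2013</year><product>radio</product><manufacturer>bolt</manufacturer><store>s1</store></sale>
</sales>"""

if __name__ == "__main__":
    rows = xml_to_rows(SAMPLE)
    print(top_n(rows, "product", 2014))
```

On the cluster the counting would of course be a Hive query over the converted data rather than Python. The sketch is only meant to show the row shape I would expect after conversion.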
On Sat, Jan 3, 2015 at 9:59 PM, Shashidhar Rao <[email protected]> wrote:

> Hi,
>
> The exact number of files is not known, but it will run into millions of
> files, depending on the client's request; the client collects terabytes of
> XML data every day. Basically, storing is just one part. The main part will
> be how to query this data (aggregation, counts) and do some analytics over it.
> Fast retrieval is required, e.g. for a particular year, what are the
> top 10 products, top ten manufacturers and top ten stores.
>
> Will Hive be a better choice? And will converting these Hive files to
> some format work out?
>
> Thanks
> Shashi
>
> On Sat, Jan 3, 2015 at 9:44 PM, Wilm Schumacher <[email protected]>
> wrote:
>
>> Hi,
>>
>> how many XML files are you planning to store? Perhaps it is possible to
>> store them directly on HDFS and save metadata in HBase. This sounds
>> more reasonable to me.
>>
>> If the number of XML files is too large (millions and billions), then you
>> can use Hadoop map files to put files together, e.g. based on years or
>> months.
>>
>> Regards,
>>
>> Wilm
>>
>> On 03.01.2015 at 17:06, Shashidhar Rao wrote:
>> > Hi,
>> >
>> > Can someone help me by suggesting the best way to solve this use case?
>> >
>> > 1. XML files keep flowing in from an external system and need to be
>> > stored in HDFS.
>> > 2. These files can be stored directly using a NoSQL database, e.g. any
>> > NoSQL database with XML support, or
>> > 3. These files need to be processed and stored in one of the databases
>> > (HBase, Hive, etc.).
>> > 4. There won't be any updates, only reads. The data has to be retrieved
>> > based on some queries, and a dashboard has to be created, with bits of
>> > analytics.
>> >
>> > The XML files are huge and the expected number of nodes is around 12.
>> > I am stuck on the storage part: if I convert XML to JSON and store
>> > it in HBase, the processing from XML to JSON will be huge.
>> >
>> > It will be only reading and no updates.
>> >
>> > Please suggest how to store these XML files.
>> >
>> > Thanks
>> > Shashi
