Chiranjeevi,
- HDFS is a distributed system, so reads can be served from different nodes at the source.
- Not all databases are distributed. If your database server is not distributed, you may run into problems with parallel reads beyond a certain number of partitions (say, 4-5).
- Ready-to-use applications for these use cases are available at https://www.datatorrent.com/apphub/
- The source code for these apps is Apache licensed and available at https://github.com/datatorrent/app-templates
- I would suggest running some sample tests on the workloads you have in mind before making the decision.

Kindly share your results for the benefit of the community.

~ Yogi

On 25 January 2017 at 14:17, chiranjeevi vasupilli <[email protected]> wrote:

> Hi Team,
>
> Could you please provide some pointers on using a database vs. HDFS as the
> source of data for the DataTorrent tool?
>
> Currently we use HDFS as the source to read the data, and we would like to
> know the pros and cons if we switch to a database as the source system.
>
> Please suggest.
> --
> Yours,
> chiru
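
To make the point about parallel reads from a database concrete, here is a minimal sketch of how a table might be split into key ranges, one per reader partition. It is plain Java and not part of the Apex or DataTorrent API; the class name, the `events` table, and the numeric `id` key column are all assumptions made for illustration. Each partition would run its own range query, so every extra partition is another concurrent connection against the same single database server, which is why going beyond a handful of partitions may stop helping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an Apex or DataTorrent class.
final class PartitionedReadSketch {

    // Splits the inclusive key range [minId, maxId] into at most `partitions`
    // contiguous, non-overlapping inclusive sub-ranges of near-equal size.
    static List<long[]> splitRange(long minId, long maxId, int partitions) {
        if (partitions < 1 || maxId < minId) {
            throw new IllegalArgumentException("need partitions >= 1 and maxId >= minId");
        }
        long total = maxId - minId + 1;
        long parts = Math.min(partitions, total); // never more ranges than keys
        long base = total / parts;
        long extra = total % parts;               // first `extra` ranges get one more key
        List<long[]> ranges = new ArrayList<>();
        long start = minId;
        for (long i = 0; i < parts; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges.add(new long[] {start, start + size - 1});
            start += size;
        }
        return ranges;
    }

    // Builds the range query one reader partition would issue.
    // Table and column names are assumed, not taken from any real schema.
    static String toQuery(String table, String keyColumn, long[] range) {
        return "SELECT * FROM " + table + " WHERE " + keyColumn
                + " BETWEEN " + range[0] + " AND " + range[1];
    }

    public static void main(String[] args) {
        // Four reader partitions over ids 1..1000 of an assumed `events` table.
        for (long[] range : splitRange(1, 1000, 4)) {
            System.out.println(toQuery("events", "id", range));
        }
    }
}
```

Timing such a read with 1, 2, 4, and 8 partitions against your own database, and comparing it with the current HDFS read, would be one way to run the sample tests suggested above.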
