Hi,

I'm trying to process a very big input file (~70GB) with Giraph. I'm running the Giraph program on a 40-node Linux cluster, but the program just gets stuck after it reads in a small fraction of the input file. Although each node has 16GB of memory, it looks like only one node is reading the input file (which is on HDFS) into its memory.

Since the input file is so big, is there a way to scatter it across all the nodes so that each node reads in a fraction of the file and then starts processing the graph? Would it also help to split the single big input file into many smaller files and let each node read in one of them (keeping the overall structure of the graph intact, of course)? A rough sketch of what I mean by splitting is below.

Thanks!
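This is roughly what I had in mind for the splitting step. It is only a sketch with made-up paths and part count, and it assumes our input is a line-based vertex input format (one vertex per line), so cutting on line boundaries does not break the graph:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SplitGraphInput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical paths -- not the real ones on our cluster.
    Path bigInput = new Path("/user/suijian/graph/big-input.txt");
    int numParts = 40;  // e.g. one part file per worker node

    BufferedReader reader =
        new BufferedReader(new InputStreamReader(fs.open(bigInput)));
    PrintWriter[] writers = new PrintWriter[numParts];
    for (int i = 0; i < numParts; i++) {
      Path part = new Path("/user/suijian/graph/split/part-" + i);
      writers[i] = new PrintWriter(fs.create(part));
    }

    // Distribute whole lines (one vertex per line) round-robin across the
    // part files, so the graph itself is unchanged -- only the file layout is.
    String line;
    long lineNo = 0;
    while ((line = reader.readLine()) != null) {
      writers[(int) (lineNo++ % numParts)].println(line);
    }

    reader.close();
    for (PrintWriter w : writers) {
      w.close();
    }
  }
}

The idea would then be to point the job's vertex input path at the directory of part files instead of the single big file, hoping the parts get assigned to different workers.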
Best Regards, Suijian
