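You should try passing your Solr writer into rdd.foreachPartition() for maximum parallelism: each partition on each executor runs the function you pass in once, over all of that partition's records, so you can open one Solr client per partition and post its documents in batches while the partitions are processed in parallel.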
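A minimal sketch of the per-partition batching a Solr writer could do, assuming you want to post documents in batches rather than one file at a time. To keep it self-contained it uses only the JDK: the partition is a plain `java.util.Iterator<String>` (Spark's Java `foreachPartition` hands your function a `java.util.Iterator` over the partition's records), and the Solr call is replaced by a hypothetical `Consumer<List<String>>` sink. `PartitionWriter`, `writePartition`, and `batchSize` are illustrative names, not part of Spark or SolrJ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class PartitionWriter {

    /**
     * Drains one partition and hands its records to the sink in batches of
     * at most batchSize. Returns the number of batches sent.
     * In a real job the sink would wrap one Solr client per partition
     * (hypothetical here; this sketch only collects the batches).
     */
    static int writePartition(Iterator<String> partition, int batchSize,
                              Consumer<List<String>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> batch = new ArrayList<>(batchSize);
        int batchesSent = 0;
        while (partition.hasNext()) {
            batch.add(partition.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);                 // post a full batch
                batchesSent++;
                batch = new ArrayList<>(batchSize); // start a fresh batch
            }
        }
        if (!batch.isEmpty()) {                     // flush the final partial batch
            sink.accept(batch);
            batchesSent++;
        }
        return batchesSent;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("{\"id\":1}", "{\"id\":2}", "{\"id\":3}",
                                          "{\"id\":4}", "{\"id\":5}");
        List<List<String>> posted = new ArrayList<>();
        int n = writePartition(docs.iterator(), 2, posted::add);
        System.out.println(n + " batches: " + posted);
    }
}
```

Inside `rdd.foreachPartition(...)` on the `wholeTextFiles` RDD, the records are `Tuple2<String, String>` pairs, so you would take each `_2()` (the file content), and the sink would typically convert the batch to `SolrInputDocument`s and call SolrJ's `SolrClient.add(Collection<SolrInputDocument>)`. Creating the client once per partition instead of once per file is the main design point; the batch size is a tuning knob you would have to measure.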
HTH,
Duc

On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar <susheel2...@gmail.com> wrote:

> Any input/suggestions on parallelizing the operations below using Spark
> rather than Java thread pooling?
> - reading 100 thousand JSON files from the local file system
> - processing each file's content and submitting it to Solr as an input document
>
> Thanks,
> Susheel
>
> On Mon, Nov 16, 2015 at 5:44 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
>> Hello Spark Users,
>>
>> This is my first email to the Spark mailing list, and I am looking forward
>> to it. I have been working on Solr and in the past have used Java thread
>> pooling to parallelize Solr indexing using SolrJ.
>>
>> Now I am again working on indexing data, this time from JSON files (about
>> 100 thousand of them), and before I try parallelizing the operations using
>> Spark (reading each JSON file, posting its content to Solr) I wanted to
>> confirm my understanding.
>>
>> By reading the JSON files using wholeTextFiles and then posting the content
>> to Solr:
>>
>> - would this be similar to what I would achieve using Java multi-threading /
>>   thread pooling with the Executor framework?
>> - what other advantages would I get by using Spark (less code...)?
>> - how can we parallelize/batch this further? For example, in my Java
>>   multi-threaded version I not only parallelize the reading / data
>>   acquisition but also post in batches in parallel.
>>
>> Below is a code snippet to give you an idea of what I am thinking of
>> starting with. Please feel free to suggest improvements or correct my
>> understanding and the code structure below.
>>
>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaPairRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.api.java.function.VoidFunction;
>> import scala.Tuple2;
>>
>> SparkConf conf = new SparkConf().setAppName(appName).setMaster("local[8]");
>> JavaSparkContext sc = new JavaSparkContext(conf);
>>
>> // wholeTextFiles returns (file path, file content) pairs
>> JavaPairRDD<String, String> rdd = sc.wholeTextFiles("/../*.json");
>>
>> rdd.foreach(new VoidFunction<Tuple2<String, String>>() {
>>     @Override
>>     public void call(Tuple2<String, String> arg0) throws Exception {
>>         // post content to Solr
>>         String content = arg0._2();
>>         ...
>>         ...
>>     }
>> });
>>
>> Thanks,
>>
>> Susheel
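To relate the question to the thread-pool approach Susheel describes, here is a JDK-only analogy of roughly what `setMaster("local[8]")` plus a per-partition function amounts to on one machine: the inputs are split into partitions, and a fixed pool of 8 threads runs one task per partition, each posting its slice in batches. File reading and Solr posting are omitted; the hypothetical "post" step only counts documents, and `ThreadPoolIndexer`, `partition`, and `indexAll` are illustrative names. This is an analogy under those assumptions, not how Spark is implemented; Spark additionally handles task scheduling, retrying failed tasks, and running the same code on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadPoolIndexer {

    /** Splits items into numPartitions contiguous slices (like RDD partitions). */
    static <T> List<List<T>> partition(List<T> items, int numPartitions) {
        List<List<T>> parts = new ArrayList<>();
        int size = items.size();
        for (int p = 0; p < numPartitions; p++) {
            int from = (int) ((long) p * size / numPartitions);
            int to = (int) ((long) (p + 1) * size / numPartitions);
            parts.add(items.subList(from, to));
        }
        return parts;
    }

    /** Runs one task per partition on a fixed pool; returns total docs "posted". */
    static int indexAll(List<String> files, int threads, int numPartitions, int batchSize)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger posted = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<String> part : partition(files, numPartitions)) {
                futures.add(pool.submit(() -> {
                    // Each task would read its files and post them in batches;
                    // here a "post" just counts the documents in the batch.
                    for (int i = 0; i < part.size(); i += batchSize) {
                        List<String> batch = part.subList(i, Math.min(i + batchSize, part.size()));
                        posted.addAndGet(batch.size());
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each task and propagate any failure
            }
        } finally {
            pool.shutdown();
        }
        return posted.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            files.add("doc-" + i + ".json");
        }
        System.out.println(indexAll(files, 8, 16, 100));
    }
}
```

The code inside each task is essentially what you would move into `foreachPartition`; with Spark, the partitioning and the pool are provided for you, which is where most of the "less code" comes from.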