You should try passing your Solr writer into rdd.foreachPartition() for
maximum parallelism: the function you pass in runs once per partition, on
the executor that holds that partition.
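
To make the batching part concrete, here is a minimal sketch of the kind of
per-partition logic you could call from inside foreachPartition(). It uses
plain Java only, so it runs without Spark. The class name PartitionBatcher,
the method postInBatches, and the "sink" callback are all hypothetical names
made up for illustration; the sink stands in for whatever actually sends a
batch of documents to Solr.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Spark or SolrJ.
final class PartitionBatcher {

    // Drains one partition's iterator and hands its elements to the sink
    // in batches of at most batchSize. Returns the number of batches sent.
    static <T> int postInBatches(Iterator<T> docs, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (docs.hasNext()) {
            batch.add(docs.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batches++;
                // Start a fresh list so the sink may keep the one it received.
                batch = new ArrayList<>(batchSize);
            }
        }
        // Flush the final, possibly smaller, batch.
        if (!batch.isEmpty()) {
            sink.accept(batch);
            batches++;
        }
        return batches;
    }
}
```

In the Spark job you would call something like
rdd.values().foreachPartition(iter -> ...), create the Solr client once at the
top of that function, and pass iter to the helper with a sink that posts each
batch. The right batch size is an assumption you would need to tune for your
Solr setup.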

HTH,
Duc

On Tue, Nov 17, 2015 at 7:36 AM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Any input/suggestions on parallelizing the operations below using Spark
> rather than Java thread pooling?
> - reading hundreds of thousands of JSON files from the local file system
> - processing each file's content and submitting it to Solr as an input
>   document
>
> Thanks,
> Susheel
>
> On Mon, Nov 16, 2015 at 5:44 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
>> Hello Spark Users,
>>
>> This is my first email to the Spark mailing list, and I am looking
>> forward to the discussion. I have been working with Solr and have
>> previously used Java thread pooling to parallelize Solr indexing using
>> SolrJ.
>>
>> Now I am again working on indexing data, this time from JSON files
>> (hundreds of thousands of them). Before I try parallelizing the
>> operations using Spark (reading each JSON file and posting its content to
>> Solr), I wanted to confirm my understanding.
>>
>>
>> If I read the JSON files using wholeTextFiles and then post the content
>> to Solr:
>>
>> - Would this be similar to what I achieve with Java multi-threading /
>> thread pooling and the Executor framework?
>> - What additional advantages would I get by using Spark (less
>> code...)?
>> - How can we parallelize/batch this further? For example, in my Java
>> multi-threaded version I not only parallelize the reading / data
>> acquisition but also post in batches in parallel.
>>
>>
>> Below is a code snippet to give you an idea of how I am thinking of
>> starting. Please feel free to suggest changes or correct my understanding
>> and the code structure below.
>>
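>> import org.apache.spark.SparkConf;
>> import org.apache.spark.api.java.JavaPairRDD;
>> import org.apache.spark.api.java.JavaSparkContext;
>> import org.apache.spark.api.java.function.VoidFunction;
>> import scala.Tuple2;
>>
>> SparkConf conf = new SparkConf().setAppName(appName).setMaster("local[8]");
>> JavaSparkContext sc = new JavaSparkContext(conf);
>>
>> // Each element is a (file path, file content) pair.
>> JavaPairRDD<String,String> rdd = sc.wholeTextFiles("/../*.json");
>>
>> rdd.foreach(new VoidFunction<Tuple2<String,String>>() {
>>     // VoidFunction's single abstract method is call(), not post().
>>     @Override
>>     public void call(Tuple2<String, String> arg0) throws Exception {
>>         // post content to Solr
>>         String content = arg0._2();
>>         ...
>>         ...
>>     }
>> });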
>>
>>
>> Thanks,
>>
>> Susheel
>>
>
>
