> script to iterate and load the files via the post command.

Do you mean you load Parquet files over post? That sounds unbelievable … Or do you mean you created a Solr doc for each Parquet record in a partition and used SolrJ or some other Java lib to post the docs to Solr?
df.mapPartitions(p => {
  // batch the Parquet records, convert each batch to a Solr doc batch,
  // then send the batch to Solr via a Solr request
})

If you are sending raw Parquet to Solr I would love to learn more :) !

> On Aug 10, 2020, at 7:50 PM, Russell Jurney <russell.jur...@gmail.com> wrote:
>
> There are ways to load data directly from Spark to Solr but I didn't find
> any of them satisfactory so I just create enough Spark partitions with
> repartition() (increase partition count)/coalesce() (decrease partition
> count) that I get as many Parquet files as I want and then I use a bash
> script to iterate and load the files via the post command.
>
> Thanks,
> Russell Jurney @rjurney <http://twitter.com/rjurney>
> russell.jur...@gmail.com LI <http://linkedin.com/in/russelljurney> FB
> <http://facebook.com/jurney> datasyndrome.com
>
>
> On Fri, Aug 7, 2020 at 9:48 AM Jörn Franke <jornfra...@gmail.com> wrote:
>
>> DIH is deprecated and it will be removed from Solr. You may though still
>> be able to install it as a plug-in. However, AFAIK nobody maintains it. Do
>> not use it anymore.
>>
>> You can write a custom Spark data source that writes to Solr, or do it in
>> a Spark map step using SolrJ.
>> In both cases do not create 100s of executors, to avoid overloading Solr.
>>
>>
>>> On 07.08.2020 at 18:39, Kevin Van Lieshout <kevin.vanl...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> Is there any assistance around writing Parquet files from Spark to Solr shards,
>>> or is it possible to customize a DIH to import a Parquet file to a Solr shard?
>>> Let me know if this is possible, or the best workaround for this. Much
>>> appreciated, thanks
>>>
>>>
>>> Kevin VL
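
To make the per-partition batching idea from the mapPartitions line concrete, here is a minimal sketch in plain Scala. It uses only the standard library, so it runs without Spark or Solr. All the names (SolrBatching, Doc, sendInBatches, toDoc, send) are hypothetical and made up for illustration. The assumption is that in a real job this would be called from inside mapPartitions or foreachPartition, with toDoc building a SolrJ SolrInputDocument and send wrapping the SolrJ client's add call. Nothing here is taken from Russell's actual setup.

```scala
object SolrBatching {
  // Hypothetical stand-in for a Solr document: field name -> value.
  type Doc = Map[String, Any]

  /** Drains one partition's records in fixed-size batches.
    * Returns the number of documents handed to `send`.
    */
  def sendInBatches[A](records: Iterator[A], batchSize: Int)
                      (toDoc: A => Doc)
                      (send: Seq[Doc] => Unit): Int = {
    require(batchSize > 0, "batchSize must be positive")
    var sent = 0
    // grouped() yields batches lazily, so the whole partition is never
    // held in memory at once; the last batch may be smaller.
    records.grouped(batchSize).foreach { batch =>
      val docs = batch.map(toDoc)
      send(docs) // one request per batch instead of one per record
      sent += docs.size
    }
    sent
  }

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for one partition of Parquet records.
    val partition = Iterator.tabulate(5)(i => (i, s"title-$i"))

    val total = sendInBatches(partition, batchSize = 2)(
      { case (id, title) => Map("id" -> id.toString, "title" -> title) }
    )(docs => println(s"would send ${docs.size} docs"))

    println(s"sent $total docs")
  }
}
```

Keeping the sender as a plain function makes the batching testable without a cluster. The batch size and the number of partitions together bound how many concurrent requests hit Solr, which is the overloading concern raised further down the thread.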