Just confirmed that you do need to create the core directory before doing the SHARDSPLIT (at least with HDFS) - otherwise it fails, saying that it cannot find classes such as the cluster classes.
I've noticed that the disk usage on HDFS goes up when I do the split - for example, if I split a 100G shard, the index size goes up by 100G with the two new shards. Is this correct for HDFS operation?
Thank you!

-Joe

On Mon, Nov 17, 2014 at 7:12 PM, Joseph Obernberger <joseph.obernber...@gmail.com> wrote:

> Looks like the shard split failed, and only created one additional shard.
> I didn't allocate enough memory for 3x - since two additional shards needed
> to be created. I was allocating 20G for each shard, so in order to do the
> split, I needed to give 60G for the direct memory access. I've now
> switched it to 10G, and run the split - that works, but I still need to
> build the directories beforehand, otherwise I get the cannot find class
> problem.
>
> Here are my HDFS parameters:
> <directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
>   <bool name="solr.hdfs.blockcache.enabled">true</bool>
>   <int name="solr.hdfs.blockcache.slab.count">80</int>
>   <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
>   <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
>   <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
>   <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
>   <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
>   <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">64</int>
>   <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">512</int>
>   <str name="solr.hdfs.home">hdfs://nameservice1:8020/solr6</str>
>   <str name="solr.hdfs.confdir">/etc/hadoop/conf.cloudera.hdfs1</str>
> </directoryFactory>
>
> I did have the slab.count set to 160 before, and just didn't have the RAM
> to try this out. The split is now running and I see the amount of space
> going into the new shards is increasing. Looks like it's going to be
> overnight before it completes.
>
> -Joe
>
> On Mon, Nov 17, 2014 at 5:57 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> Tell us more about your HDFS stuff. Specifically, how
>> do you have your HDFSDirectoryFactory specified in
>> solrconfig.xml?
>>
>> Cause you shouldn't have to do things like create the
>> directory ahead of time I don't think.
>>
>> Best,
>> Erick
>>
>> On Mon, Nov 17, 2014 at 12:17 PM, Joseph Obernberger
>> <joseph.obernber...@gmail.com> wrote:
>> > Originally I had two shards on two machines - shard1 and shard2.
>> > I did a SHARDSPLIT on shard1.
>> > Now have shard1, shard2, and shard1_0
>> > If I select the core (COLLECT_shard1_0_replica1) and execute a query, I get
>> > all the docs OK, but if I specify &distrib=false, I get 0 documents.
>> >
>> > Under HDFS - when/how will the new core start to get data?
>> > Thank you!
>> >
>> > -Joe
>> >
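
For reference, the direct-memory figures mentioned in this thread (20G per shard, 60G for the split, then 10G after lowering slab.count) can be reproduced with a short calculation. This is only a rough sketch. It assumes each block cache block is 8 KB, the size given in the Solr reference guide, so one slab of 16384 blocks is 128 MB. It also assumes, as reported in the thread rather than confirmed elsewhere, that the parent core and each new sub-shard core allocate their own block cache while the split runs. The function names are made up for illustration.

```python
# Rough sizing of the Solr HDFS block cache from the solrconfig.xml values
# quoted in this thread. Assumption: one cache block is 8 KB.
BLOCK_SIZE_BYTES = 8 * 1024
GIB = 1024 ** 3


def slab_bytes(blocks_per_bank: int = 16384) -> int:
    """Size of one slab: blocksperbank * block size (128 MB by default)."""
    return blocks_per_bank * BLOCK_SIZE_BYTES


def cache_bytes(slab_count: int, blocks_per_bank: int = 16384) -> int:
    """Direct memory one core's block cache needs: slab.count * slab size."""
    return slab_count * slab_bytes(blocks_per_bank)


def direct_memory_for_split(slab_count: int, sub_shards: int = 2,
                            blocks_per_bank: int = 16384) -> int:
    """Memory needed while splitting, assuming (as observed in the thread)
    that the parent core and every sub-shard core hold their own cache."""
    return (1 + sub_shards) * cache_bytes(slab_count, blocks_per_bank)


if __name__ == "__main__":
    for slabs in (160, 80):
        per_core = cache_bytes(slabs) / GIB
        during_split = direct_memory_for_split(slabs) / GIB
        print(f"slab.count={slabs}: {per_core:.0f} GiB per core, "
              f"{during_split:.0f} GiB during a two-way split")
```

With slab.count at 160 this gives 20 GiB per core and 60 GiB during the split. With 80 it gives 10 GiB and 30 GiB. If direct memory allocation is enabled, the JVM's -XX:MaxDirectMemorySize setting would have to cover the larger number.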