Re: how to index 20 MB plain-text xml

2014-03-31 Thread primoz . skale
Hi!

I had the same issue with XML files. Even small XML files produced an OOM 
exception. I read that the way XML is parsed can sometimes blow up 
memory requirements to the point that Java runs out of heap. My solutions 
were:

1. Don't parse XML files
2. Parse only small XML files and hope for the best
3. Give Solr the largest possible amount of java heap size (and hope for 
the best)

But then again, one time I also got OOM exception with Word documents - it 
turned out that some user had pasted 400 MB worth of photos into a Word 
file.

Regards,

Primoz




From:   Floyd Wu floyd...@gmail.com
To: solr-user@lucene.apache.org
Date:   31.03.2014 08:18
Subject:Re: how to index 20 MB plain-text xml



Hi Alex,

Thanks for responding. Personally I don't want to feed these big XML files
to Solr, but the users want it.
I'll try your suggestions later.

Many thanks.

Floyd



2014-03-31 13:44 GMT+08:00 Alexandre Rafalovitch arafa...@gmail.com:

 Without digging too deep into why exactly this is happening, here are
 the general options:

 0. Are you actually committing? Check the messages in the logs and see
 if the records show up when you expect them to.
 1. Are you actually trying to feed a 20 MB file to Solr? Maybe it's the HTTP
 buffer that's blowing up? Try using stream.file instead (note the
 security warning though): http://wiki.apache.org/solr/ContentStream
 2. Split the file into smaller ones and commit each separately
 3. Set a hard auto-commit in solrconfig.xml based on the number of documents
 to flush in-memory structures to disk
 4. Switch to using DataImportHandler to pull from XML instead of pushing
 5. Increase the amount of memory given to Solr (-Xmx command line flag)
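For option 1, the stream.file approach might look like the sketch below; it assumes remote streaming has been enabled via enableRemoteStreaming in solrconfig.xml, and the file path is hypothetical:

```shell
# Let Solr read the 20 MB file from disk itself instead of pushing it
# through the HTTP request body.
curl "http://localhost:8983/solr/update?stream.file=/data/docs/big.xml&commit=true"
```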

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency

 On Mon, Mar 31, 2014 at 12:00 PM, Floyd Wu floyd...@gmail.com wrote:
  I have many plain-text XML files that I transform into the Solr XML format.
  But every time I send them to Solr, I hit an OOM exception.
  How do I configure Solr to eat these big XML files?
  Please guide me to a way. Thanks
 
  floyd




Re: Updating an entry in Solr

2013-11-13 Thread primoz . skale
Yes, that's correct. You can also update a document per field, but all 
fields need to be stored="true", because Solr (version >= 4.0) first gets 
your document from the index, creates a new document with the modified field, 
and adds it again to the index...
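A field-level (atomic) update request might look like this sketch; the field names are hypothetical, and every field in the schema is assumed to be stored="true":

```shell
# "set" replaces the value of one field; Solr fetches the stored document,
# applies the change, and re-indexes the whole document under the hood.
curl 'http://localhost:8983/solr/update?commit=true' \
     -H 'Content-Type: application/json' \
     -d '[{"id":"doc1","title":{"set":"New title"}}]'
```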

Primoz



From:   gohome190 gohome...@gmail.com
To: solr-user@lucene.apache.org
Date:   13.11.2013 14:39
Subject:Re: Updating an entry in Solr



Okay, so I've found in the Solr tutorial that if you do a POST command and
post a new entry with the same uniqueKey (in my case, id_) as an entry
already in the index, Solr will automatically replace it for you. That
seems to be what I need, right?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Updating-an-entry-in-Solr-tp4100674p4100675.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Adding a server to an existing SOLR cloud cluster

2013-11-11 Thread primoz . skale
Try manually creating shard replicas on the new server. I think the new 
server is only used automatically when you start your Solr server instance 
with the correct command line option (e.g. -DnumShards) - I never liked 
this kind of behaviour. 

The server is not present in the clusterstate.json file because it contains 
no replicas - but it is a live node, as you have already stated.
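Creating a replica manually via the cores API might look like this sketch; the host, core name, collection, and shard are hypothetical:

```shell
# Register a new core on the new server as a replica of an existing shard
curl "http://newserver:8080/solr/admin/cores?action=CREATE&name=collection1_replica2&collection=collection1&shard=shard1"
```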

Best regards,

Primoz



From:   ade-b adrian.bro...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.11.2013 14:48
Subject:Adding a server to an existing SOLR cloud cluster



Hi 

We have a SOLRCloud cluster of 3 solr servers (v4.5.0 running under 
tomcat)
with 1 shard. We added a new SOLR server (v4.5.1) by simply starting 
tomcat
and pointing it at the zookeeper ensemble used by the existing cluster. My
understanding was that this new server would handshake with zookeeper and
add itself as a replica to the existing cluster.

What has actually happened is that the server is in zookeeper's 
live_nodes,
but is not in the clusterstate.json file. It also does not have a
CORE/collection associated with it.

Any ideas? I assume I am missing a step. Do I have to manually create the
core on the new server?


Cheers
Ade



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275.html




Re: Adding a server to an existing SOLR cloud cluster

2013-11-11 Thread primoz . skale
According to the wiki pages it should, but I have not really tried it yet 
- I like to do the bookkeeping myself :)

I am sorry, but someone with more knowledge of Solr will have to answer 
your question.

Primoz



From:   ade-b adrian.bro...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.11.2013 15:44
Subject:Re: Adding a server to an existing SOLR cloud cluster



Thanks.

If I understand what you are saying, it should automatically register 
itself
with the existing cluster if we start SOLR with the correct command line
options. We tried adding the numShards option to the command line but 
still
get the same outcome.

We start the new SOLR server using 

/usr/bin/java
-Djava.util.logging.config.file=/mnt/ephemeral/apache-tomcat-7.0.47/conf/logging.properties
-Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -server
-Xms256m -Xmx1024m -XX:+DisableExplicitGC
-Dsolr.solr.home=/mnt/ephemeral/solr -Dport=8080 -DhostContext=solr
-DnumShards=1 -DzkClientTimeout=15000 -DzkHost=zk ip address
-Djava.endorsed.dirs=/mnt/ephemeral/apache-tomcat-7.0.47/endorsed 
-classpath
/mnt/ephemeral/apache-tomcat-7.0.47/bin/bootstrap.jar:/mnt/ephemeral/apache-tomcat-7.0.47/bin/tomcat-juli.jar
-Dcatalina.base=/mnt/ephemeral/apache-tomcat-7.0.47
-Dcatalina.home=/mnt/ephemeral/apache-tomcat-7.0.47
-Djava.io.tmpdir=/mnt/ephemeral/apache-tomcat-7.0.47/temp
org.apache.catalina.startup.Bootstrap start

Regards
Ade



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Adding-a-server-to-an-existing-SOLR-cloud-cluster-tp4100275p4100286.html




Re: A few questions about solr and tika

2013-10-18 Thread primoz . skale
Everything about Tika extraction is written under those links. Basically, 
what you need is the following:

1) a requestHandler for Tika in solrconfig.xml
2) keep all the fields in schema.xml that are needed for Tika (they are 
marked in the example schema.xml) and set those you don't need to 
indexed="false" and stored="false"
3) if you want to limit the returned fields in the query response, use the 
query parameter 'fl'.
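Point 1 (the request handler) might look like this in solrconfig.xml; this is a sketch modeled on the stock example config:

```xml
<!-- Tika-based extraction endpoint; extracted field names are lowercased,
     unknown fields get the "ignored_" prefix, and the main body text is
     mapped into the "text" field -->
<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="fmap.content">text</str>
  </lst>
</requestHandler>
```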

Primoz




From:   wonder a-wonde...@rambler.ru
To: solr-user@lucene.apache.org
Date:   17.10.2013 14:44
Subject:Re: A few questions about solr and tika



Thanks for the answer. If I don't want to store or index certain fields, I do:
<field name="links" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="link" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="img" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="iframe" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="area" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="map" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="pragma" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="expires" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="keywords" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->
<field name="stream_source_info" type="string" indexed="false" stored="false" multiValued="true"/><!-- remove unneeded TIKA field -->

The other questions are still open for me.


17.10.2013 14:26, primoz.sk...@policija.si wrote:
 Why don't you check these:

 - Content extraction with Apache Tika (
 http://www.youtube.com/watch?v=ifgFjAeTOws)
 - ExtractingRequestHandler (
 http://wiki.apache.org/solr/ExtractingRequestHandler)
 - Uploading Data with Solr Cell using Apache Tika (
 
https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Solr+Cell+using+Apache+Tika

 )

 Primož



 From:   wonder a-wonde...@rambler.ru
 To: solr-user@lucene.apache.org
 Date:   17.10.2013 12:23
 Subject:A few questions about solr and tika



 Hello everyone! Please tell me how and where to set Tika options in
 Solr? Where is the Tika config? I want to know how I can eliminate
 response attributes I don't need (such as links or images). Also I am
 interested in how I can get and index only metadata for several file 
formats?







Re: SolrCloud Performance Issue

2013-10-17 Thread primoz . skale
The query result cache hit rate might be low due to using NOW in bf. NOW is 
always translated to the current time, which of course changes from ms to ms... :)
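A common fix is to round NOW in the boost function so the computed value is stable across requests and identical queries can be answered from the queryResultCache; the granularity is a tuning choice:

```xml
<!-- NOW/DAY is constant within a day, so repeated queries produce
     identical cache keys instead of a new one every millisecond -->
<str name="bf">recip(ms(NOW/DAY,PublishDate),3.16e-11,1,1)^2.0</str>
```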

Primoz



From:   Shamik Bandopadhyay sham...@gmail.com
To: solr-user@lucene.apache.org
Date:   17.10.2013 00:14
Subject:SolrCloud Performance Issue



Hi,

  I'm in the process of transitioning to SolrCloud from a conventional
Master-Slave model. I'm using Solr 4.4 and have set up 2 shards with 1
replica each. I have a 3-node zookeeper ensemble. All the nodes are running
on AWS EC2 instances. Shards are on m1.xlarge and share a zookeeper instance
(mounted on a separate volume). 6 GB of memory is allocated to each Solr
instance.

I have around 10 million documents in the index. With the previous standalone
model, the queries averaged around 100 ms. The SolrCloud query responses have
been abysmal so far. The query response time is over 1000 ms, often reaching
2000 ms. I expected some slowdown due to additional servers, network
latency, etc., but this difference is really baffling. The hardware is
similar in both cases, except for the fact that a couple of the SolrCloud
nodes share zookeeper as well. m1.xlarge I/O is high, so that shouldn't be a
bottleneck either.

The other difference from the old setup is that I'm using the new
CloudSolrServer class, which holds the 3 zookeeper references for load
balancing. But I don't think it has any major impact, as queries
executed from the Solr admin query panel confirm the slowness.

Here is some of my configuration setup:

<autoCommit>
  <maxTime>3</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>


<maxBooleanClauses>1024</maxBooleanClauses>

<filterCache class="solr.FastLRUCache" size="16384" initialSize="4096"
             autowarmCount="4096"/>

<queryResultCache class="solr.LRUCache" size="16384" initialSize="8192"
                  autowarmCount="4096"/>

<documentCache class="solr.LRUCache" size="32768" initialSize="16384"
               autowarmCount="0"/>

<fieldValueCache class="solr.FastLRUCache" size="16384"
                 autowarmCount="8192" showItems="4096"/>

<enableLazyFieldLoading>true</enableLazyFieldLoading>

<queryResultWindowSize>200</queryResultWindowSize>

<queryResultMaxDocsCached>400</queryResultMaxDocsCached>



<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">line</str></lst>
    <lst><str name="q">xref</str></lst>
    <lst><str name="q">draw</str></lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">line</str></lst>
    <lst><str name="q">draw</str></lst>
    <lst><str name="q">line</str><str name="fq">language:english</str></lst>
    <lst><str name="q">line</str><str name="fq">Source2:documentation</str></lst>
    <lst><str name="q">line</str><str name="fq">Source2:CloudHelp</str></lst>
    <lst><str name="q">draw</str><str name="fq">language:english</str></lst>
    <lst><str name="q">draw</str><str name="fq">Source2:documentation</str></lst>
    <lst><str name="q">draw</str><str name="fq">Source2:CloudHelp</str></lst>
  </arr>
</listener>

<maxWarmingSearchers>2</maxWarmingSearchers>


The custom request handler:

<requestHandler name="/adskcloudhelp" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <float name="tie">0.01</float>
    <str name="wt">velocity</str>
    <str name="v.template">browse</str>
    <str name="v.contentType">text/html;charset=UTF-8</str>
    <str name="v.layout">layout</str>
    <str name="v.channel">cloudhelp</str>

    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">15</str>
    <str name="fl">id,url,Description,Source2,text,filetype,title,LastUpdateDate,PublishDate,ViewCount,TotalMessageCount,Solution,LastPostAuthor,Author,Duration,AuthorUrl,ThumbnailUrl,TopicId,score</str>
    <str name="qf">text^1.5 title^2 IndexTerm^.9 keywords^1.2 ADSKCommandSrch^2 ADSKContextId^1</str>
    <str name="bq">Source2:CloudHelp^3 Source2:youtube^0.85</str>
    <str name="bf">recip(ms(NOW,PublishDate),3.16e-11,1,1)^2.0</str>
    <str name="df">text</str>

    <str name="facet">on</str>
    <str name="facet.mincount">1</str>
    <str name="facet.limit">100</str>
    <str name="facet.field">language</str>
    <str name="facet.field">Source2</str>
    <str name="facet.field">DocumentationBook</str>
    <str name="facet.field">ADSKProductDisplay</str>
    <str name="facet.field">audience</str>

    <str 

Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
I also sometimes get null ranges when doing collections/cores API 
actions (CREATE and/or UNLOAD, etc.). In 4.4.0 that was not easily fixed 
because zkCli had problems with the "putfile" command, but in 4.5.0 it works 
OK. All you have to do is download clusterstate.json from ZK ("get 
/clusterstate.json"), fix the ranges to appropriate values, and upload the file 
back to ZK with zkCli. 

But why those null ranges happen at all is beyond me :)
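For reference, the zkCli round-trip might look like this with the zkcli script shipped in the Solr example distribution; the ZK address and paths are hypothetical:

```shell
# Fetch the cluster state, fix the null ranges by hand, then push it back
./cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd get /clusterstate.json > clusterstate.json
# ... edit clusterstate.json: replace "range":null with a real hash range ...
./cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /clusterstate.json clusterstate.json
```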

Primoz



From:   Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user@lucene.apache.org
Date:   16.10.2013 07:37
Subject:Re: Regarding Solr Cloud issue...



I'm sorry I am not able to reproduce this issue.

I started 5 solr-4.4 instances.
I copied example directory into example1, example2, example3 and example4
cd example; java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar
cd example1; java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
cd example2; java -Djetty.port=7575 -DzkHost=localhost:9983 -jar start.jar
cd example3; java -Djetty.port=7576 -DzkHost=localhost:9983 -jar start.jar
cd example4; java -Djetty.port=7577 -DzkHost=localhost:9983 -jar start.jar

After that I invoked:
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection51&numShards=5&replicationFactor=1


I can see all shards having non-null ranges in clusterstate.


On Tue, Oct 15, 2013 at 8:47 PM, Chris christu...@gmail.com wrote:

 Hi Shalin,.

 Thank you for your quick reply. I appreciate all the help.

 I started the solr cloud servers first...with 5 nodes.

 then i issued a command like below to create the shards -


 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1

 
 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4

 

 Please advice.

 Regards,
 Chris


 On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar 
 shalinman...@gmail.com wrote:

  How did you create these shards? Can you tell us how to reproduce the
  issue?
 
  Any shard in a collection with compositeId router should never have 
null
  ranges.
 
 
  On Tue, Oct 15, 2013 at 7:07 PM, Chris christu...@gmail.com wrote:
 
   Hi,
  
   I am using solr 4.4 as cloud. while creating shards i see that the 
last
   shard has range of null. i am not sure if this is a bug.
  
   I am stuck with having null value for the range in clusterstate.json
   (attached below)
  
    "shard5":{ "range":null, "state":"active", "replicas":{"core_node1":{
    "state":"active", "core":"Web_shard5_replica1",
    "node_name":"domain-name.com:1981_solr",
    "base_url":"http://domain-name.com:1981/solr", "leader":"true"}},
    "router":"compositeId"},
  
   I tried to use zookeeper cli to change this, but it was not able to. 
I
   tried to locate this file, but didn't find it anywhere.
  
   Can you please let me know how do i change the range from null to
  something
   meaningful? i have the range that i need, so if i can find the file,
  maybe
   i can change it manually.
  
   My next question is - can we have a catch all for ranges, i mean if
  things
   don't match any other range then insert in this shard..is this
 possible?
  
   Kindly advice.
   Chris
  
 
 
 
  --
  Regards,
  Shalin Shekhar Mangar.
 




-- 
Regards,
Shalin Shekhar Mangar.



Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-16 Thread primoz . skale
I will certainly try, but give me some time :)

Primoz



From:   Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user@lucene.apache.org
Date:   16.10.2013 07:05
Subject:Re: Cores with lot of folders with prefix index.XXX



I think that's an acceptable strategy. Can you put up a patch?


On Tue, Oct 15, 2013 at 2:32 PM, primoz.sk...@policija.si wrote:

 I have a question for the developers of Solr regarding the issue of
 left-over index folders when replication fails. Could this issue be
 resolved quickly if, when replication starts, Solr creates a flag file in
 the index. folder, and when replication ends (and commits) this file is
 deleted? In this case, if a server is restarted (or on a schedule), it could
 quickly scan all the index. folders and delete those (maybe not the
 last one, or those relevant to the index.properties file) that still
 *contain* a flag file and are thus unfinished and uncommitted.

 I have not really looked at the code yet, so I may have a different view on
 the workings of replication. Would the solution I described at least
 address this issue?

 Best regards,

 Primoz





 From:   primoz.sk...@policija.si
 To: solr-user@lucene.apache.org
 Date:   11.10.2013 12:46
 Subject:Re: Cores with lot of folders with prefix index.XXX



 Thanks, I guess I was wrong after all in my last post.

 Primož




 From:   Shalin Shekhar Mangar shalinman...@gmail.com
 To: solr-user@lucene.apache.org
 Date:   11.10.2013 12:43
 Subject:Re: Cores with lot of folders with prefix index.XXX



 There are open issues related to extra index.XXX folders lying around if
 replication/recovery fails. See
 https://issues.apache.org/jira/browse/SOLR-4506


 On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro
 yago.rive...@gmail.comwrote:

  The thread that you point to is about master/slave replication. Is this
  issue valid in a SolrCloud context?
 
  I checked index.properties, and indeed the variable index=index.X
  points to a folder; can the others be deleted without any scary side
 effects?
 
 
  --
  Yago Riveiro
  Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
  On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si 
wrote:
 
   Do you have a lot of failed replications? Maybe those folders have
   something to do with this (please see the last answer at
  
 

 
http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing



   ). If your disk space is valuable check index.properties file under
 data
   folder and try to determine which folders can be safely deleted.
  
   Primož
  
  
  
  
   From: Yago Riveiro yago.rive...@gmail.com (mailto:
  yago.rive...@gmail.com)
   To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org)
   Date: 11.10.2013 12:13
   Subject: Re: Cores with lot of folders with prefix index.XXX
  
  
  
    I have SSDs, therefore my space is like gold; I can have 30% of my
 space
    wasted in failed replications, or replications that are not cleaned up.
   
    The question for me is whether this is normal behaviour or a bug. If it
    is normal behaviour, I am in trouble, because an SSD with more than 512 GB
    is expensive.
  
   --
   Yago Riveiro
   Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
  
  
   On Friday, October 11, 2013 at 11:03 AM,
 primoz.sk...@policija.si(mailto:
  primoz.sk...@policija.si) wrote:
  
I think this is connected to replications being made? I also have
 quite
some of them but currently I am not worried :)
   
  
  
  
 
 
 


 --
 Regards,
 Shalin Shekhar Mangar.





-- 
Regards,
Shalin Shekhar Mangar.



Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
If I am not mistaken, the only way to create a new shard in a collection 
in 4.4.0 was to use the cores API. That worked fine for me until I used 
*other* cores API commands. Those usually produced null ranges. 

In 4.5.0 this is fixed with the newly added commands ("createshard" etc.) in 
the collections API, right?

Primoz



From:   Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user@lucene.apache.org
Date:   16.10.2013 09:06
Subject:Re: Regarding Solr Cloud issue...



Chris, can you post your complete clusterstate.json? Do all shards have a
null range? Also, did you issue any core admin CREATE commands apart from
the create collection api.

Primoz, I was able to reproduce this but by doing an illegal operation.
Suppose I create a collection with numShards=5 and then I issue a core
admin create command such as:
http://localhost:8983/solr/admin/cores?action=CREATE&name=xyz&collection=mycollection51&shard=shard6


Then a shard6 is added to the collection with a null range. This is a 
bug
because we should never allow such a core admin create to succeed anyway.
I'll open an issue.



On Wed, Oct 16, 2013 at 11:49 AM, primoz.sk...@policija.si wrote:

 I sometimes also do get null ranges when doing colletions/cores API
 actions CREATE or/and UNLOAD, etc... In 4.4.0 that was not easily fixed
 because zkCli had problems with putfile command, but in 4.5.0 it works
 OK. All you have to do is download clusterstate.json from ZK (get
 /clusterstate.json), fix ranges to appropriate values and upload the 
file
 back to ZK with zkCli.

 But why those null ranges happen at all is beyond me :)

 Primoz



 From:   Shalin Shekhar Mangar shalinman...@gmail.com
 To: solr-user@lucene.apache.org
 Date:   16.10.2013 07:37
 Subject:Re: Regarding Solr Cloud issue...



 I'm sorry I am not able to reproduce this issue.

 I started 5 solr-4.4 instances.
 I copied example directory into example1, example2, example3 and 
example4
 cd example; java -Dbootstrap_confdir=./solr/collection1/conf
 -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar
 cd example1; java -Djetty.port=7574 -DzkHost=localhost:9983 -jar 
start.jar
 cd example2; java -Djetty.port=7575 -DzkHost=localhost:9983 -jar 
start.jar
 cd example3; java -Djetty.port=7576 -DzkHost=localhost:9983 -jar 
start.jar
 cd example4; java -Djetty.port=7577 -DzkHost=localhost:9983 -jar 
start.jar

 After that I invoked:

 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection51&numShards=5&replicationFactor=1



 I can see all shards having non-null ranges in clusterstate.


 On Tue, Oct 15, 2013 at 8:47 PM, Chris christu...@gmail.com wrote:

  Hi Shalin,.
 
  Thank you for your quick reply. I appreciate all the help.
 
  I started the solr cloud servers first...with 5 nodes.
 
  then i issued a command like below to create the shards -
 
 
 

 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1


  
 

 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4


  
 
  Please advice.
 
  Regards,
  Chris
 
 
  On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar 
  shalinman...@gmail.com wrote:
 
   How did you create these shards? Can you tell us how to reproduce 
the
   issue?
  
   Any shard in a collection with compositeId router should never have
 null
   ranges.
  
  
   On Tue, Oct 15, 2013 at 7:07 PM, Chris christu...@gmail.com wrote:
  
Hi,
   
I am using solr 4.4 as cloud. while creating shards i see that the
 last
shard has range of null. i am not sure if this is a bug.
   
I am stuck with having null value for the range in 
clusterstate.json
(attached below)
   
 "shard5":{ "range":null, "state":"active",
  "replicas":{"core_node1":{
 "state":"active", "core":"Web_shard5_replica1",
 "node_name":"domain-name.com:1981_solr",
 "base_url":"http://domain-name.com:1981/solr", "leader":"true"}},
 "router":"compositeId"},
   
I tried to use zookeeper cli to change this, but it was not able 
to.
 I
tried to locate this file, but didn't find it anywhere.
   
Can you please let me know how do i change the range from null to
   something
meaningful? i have the range that i need, so if i can find the 
file,
   maybe
i can change it manually.
   
My next question is - can we have a catch all for ranges, i mean 
if
   things
don't match any other range then insert in this shard..is this
  possible?
   
Kindly advice.
Chris
   
  
  
  
   --
   Regards,
   Shalin Shekhar Mangar.
  
 



 --
 Regards,
 Shalin Shekhar Mangar.




-- 
Regards,
Shalin Shekhar Mangar.



Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
Yep, you are right - I only created extra replicas with the cores API. For a 
new shard I had to use the split shard command.

My apologies.

Primož



From:   Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user@lucene.apache.org
Date:   16.10.2013 10:45
Subject:Re: Regarding Solr Cloud issue...



If the initial collection was created with a numShards parameter (and hence 
the compositeId router), then there was no way to create a new logical shard. 
You can add replicas with the core admin API, but only to shards that already
exist. A new logical shard can only be created by splitting an existing one.

The createshard API also has the same limitation -- it cannot create a
shard for a collection with the compositeId router. It is supposed to be used
for collections with custom sharding (i.e. the implicit router). In such
collections, there is no concept of a hash range, and routing is done
explicitly by the user using the shards parameter in the request or by
sending the request to the target core/node directly.

So, in summary, attempting to add a new logical shard to a collection with
the compositeId router via CoreAdmin APIs is wrong, unsupported, and should
be disallowed. Adding replicas to existing logical shards is okay though.
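For comparison, a custom-sharded collection where explicit routing is legal might be set up like this sketch; the collection, shard, and core names are hypothetical:

```shell
# implicit router: shards are named up front, no hash ranges involved
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=logs&router.name=implicit&shards=shard1,shard2"
# route a document explicitly by posting to a core that hosts the target shard
curl "http://localhost:8983/solr/logs_shard1_replica1/update?commit=true" \
     -H 'Content-Type: application/xml' \
     -d '<add><doc><field name="id">1</field></doc></add>'
```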


On Wed, Oct 16, 2013 at 12:56 PM, primoz.sk...@policija.si wrote:

 If I am not mistaken the only way to create a new shard from a 
collection
 in 4.4.0 was to use cores API. That worked fine for me until I used
 *other* cores API commands. Those usually produced null ranges.

 In 4.5.0 this is fixed with newly added commands createshard etc. to 
the
 collections API, right?

 Primoz



 From:   Shalin Shekhar Mangar shalinman...@gmail.com
 To: solr-user@lucene.apache.org
 Date:   16.10.2013 09:06
 Subject:Re: Regarding Solr Cloud issue...



 Chris, can you post your complete clusterstate.json? Do all shards have 
a
 null range? Also, did you issue any core admin CREATE commands apart 
from
 the create collection api.

 Primoz, I was able to reproduce this but by doing an illegal operation.
 Suppose I create a collection with numShards=5 and then I issue a core
 admin create command such as:

 
http://localhost:8983/solr/admin/cores?action=CREATE&name=xyz&collection=mycollection51&shard=shard6



 Then a shard6 is added to the collection with a null range. This is a
 bug
 because we should never allow such a core admin create to succeed 
anyway.
 I'll open an issue.



 On Wed, Oct 16, 2013 at 11:49 AM, primoz.sk...@policija.si wrote:

  I sometimes also do get null ranges when doing colletions/cores API
  actions CREATE or/and UNLOAD, etc... In 4.4.0 that was not easily 
fixed
  because zkCli had problems with putfile command, but in 4.5.0 it 
works
  OK. All you have to do is download clusterstate.json from ZK (get
  /clusterstate.json), fix ranges to appropriate values and upload the
 file
  back to ZK with zkCli.
 
  But why those null ranges happen at all is beyond me :)
 
  Primoz
 
 
 
  From:   Shalin Shekhar Mangar shalinman...@gmail.com
  To: solr-user@lucene.apache.org
  Date:   16.10.2013 07:37
  Subject:Re: Regarding Solr Cloud issue...
 
 
 
  I'm sorry I am not able to reproduce this issue.
 
  I started 5 solr-4.4 instances.
  I copied example directory into example1, example2, example3 and
 example4
  cd example; java -Dbootstrap_confdir=./solr/collection1/conf
  -Dcollection.configName=myconf -DzkRun -DnumShards=1 -jar start.jar
  cd example1; java -Djetty.port=7574 -DzkHost=localhost:9983 -jar
 start.jar
  cd example2; java -Djetty.port=7575 -DzkHost=localhost:9983 -jar
 start.jar
  cd example3; java -Djetty.port=7576 -DzkHost=localhost:9983 -jar
 start.jar
  cd example4; java -Djetty.port=7577 -DzkHost=localhost:9983 -jar
 start.jar
 
  After that I invoked:
 
 

 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection51&numShards=5&replicationFactor=1


 
 
  I can see all shards having non-null ranges in clusterstate.
 
 
  On Tue, Oct 15, 2013 at 8:47 PM, Chris christu...@gmail.com wrote:
 
   Hi Shalin,.
  
   Thank you for your quick reply. I appreciate all the help.
  
   I started the solr cloud servers first...with 5 nodes.
  
   then i issued a command like below to create the shards -
  
  
  
 
 

 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=5&replicationFactor=1


 
   
  
 
 

 
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=4


 
   
  
   Please advice.
  
   Regards,
   Chris
  
  
   On Tue, Oct 15, 2013 at 8:07 PM, Shalin Shekhar Mangar 
   shalinman...@gmail.com wrote:
  
How did you create these shards? Can you tell us how to reproduce
 the
issue?
   
Any shard in a collection with compositeId router should never 
have
  null
ranges.
   
   
On Tue, Oct 15, 2013 at 7:07 PM, Chris christu...@gmail.com 
wrote:
   
 Hi,

 I am using solr 4.4 as cloud. while creating shards i 

Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
 Also, another issue that needs to be raised is the creation of cores 
from
 the core admin section of the gui, doesnt really work well, it 
creates
 files but then they do not work (again i am using 4.4)

From my experience, the core admin section of the GUI does not work well in 
the SolrCloud domain. If I am not mistaken, this was somehow fixed in 4.5.0, 
which behaves much better.

I would use only HTTP requests (cores and collections API) with 
SolrCloud, and would use the GUI only for viewing the state of the cluster 
and cores.

Primoz




Re: Regarding Solr Cloud issue...

2013-10-16 Thread primoz . skale
Hm, good question. I haven't really done any upgrading yet, because I just 
reinstall and reindex everything. I would replace the jars with the new ones 
(if needed - check the release notes for versions 4.4.0 and 4.5.0, where all 
the versions of external tools [Tika, Maven, etc.] are stated) and deploy the 
updated WAR file to the servlet container.

Primoz




From:   Chris christu...@gmail.com
To: solr-user solr-user@lucene.apache.org
Date:   16.10.2013 14:30
Subject:Re: Regarding Solr Cloud issue...



oh great. Thanks Primoz.

is there any simple way to do the upgrade to 4.5 without having to change
my configurations? update a few jar files etc?


On Wed, Oct 16, 2013 at 4:58 PM, primoz.sk...@policija.si wrote:

  Also, another issue that needs to be raised is the creation of cores
 from
  the core admin section of the gui, doesnt really work well, it
 creates
  files but then they do not work (again i am using 4.4)

 From my experience core admin section of the GUI does not work well in
 SolrCloud domain. If I am not mistaken this was somehow fixed in 4.5.0
 which acts much better.

 I would use only HTTP requests (cores and collections API) with
 SolrCloud and would use GUI only for viewing the state of cluster and
 cores.

 Primoz






Re: Error when i want to create a CORE

2013-10-16 Thread primoz . skale
Can you try with a directory path that contains *no* spaces?
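For example, placing the core under a path without spaces and creating it via the cores API might look like this sketch; the path and core name are hypothetical:

```shell
# A path like C:\solr\index1 avoids the spaces in "Documents and Settings"
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=index1&instanceDir=C:/solr/index1"
```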

Primoz



From:   raige regis...@gmail.com
To: solr-user@lucene.apache.org
Date:   16.10.2013 14:46
Subject:Error when i want to create a CORE



I installed Solr 4.5 on Windows and launch the example with the Jetty web
server. I have no problem with the collection1 core. But when I want to
create my own core, the server sends me this error: 
*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Could not load config file C:\Documents and
Settings\r.lucas\Bureau\Moteur\solr-4.5.0\example\solr\index1\solrconfig.xml*

could you help please



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Error-when-i-want-to-create-a-CORE-tp4095894.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: howto increase indexing speed?

2013-10-16 Thread primoz . skale
I think DIH uses only one CPU core (it is single-threaded) per handler 
instance. IMHO 300 docs/sec is quite good. If you want to use more cores 
you need to use SolrJ, or maybe several DIH handlers (and more CPU cores, 
of course).
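To make the parallel-feeding idea concrete, here is a rough Python sketch of the batching and fan-out logic only. The actual send is stubbed out; in a real feeder it would be an HTTP POST to Solr's /update handler or a SolrJ call, and all names here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def batches(docs, size):
    """Split the document stream into fixed-size batches."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def send_batch(batch):
    # Stub: a real feeder would POST the batch to Solr's /update
    # handler or hand it to a SolrJ client here.
    return len(batch)

def index_parallel(docs, batch_size=100, workers=4):
    """Fan batches out over a thread pool; return the doc count sent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, batches(docs, batch_size)))

# 1000 dummy documents pushed through 4 worker threads.
print(index_parallel([{"id": str(i)} for i in range(1000)]))  # 1000
```

If I remember correctly, SolrJ's ConcurrentUpdateSolrServer implements essentially this queue-plus-worker-threads pattern out of the box.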

Primoz



From:   Giovanni Bricconi giovanni.bricc...@banzai.it
To: solr-user solr-user@lucene.apache.org
Date:   16.10.2013 16:25
Subject:howto increase indexing speed?



I have a small Solr setup, not even on a physical machine but a VMware
virtual machine with a single CPU that reads data from a database using
DIH. The machine has no physical disks attached but stores data on a
NetApp NAS.

Currently this machine indexes 320 documents/sec, which is not bad, but we
plan to double the index and would like to keep nearly the same rate.

Doing some basic checks during indexing I have found with iostat that disk
usage is nearly 8% and the source database is keeping up fine; instead the
virtual CPU is 95% busy running Solr.

Now I could quite easily add another virtual CPU to the Solr box, but as
far as I know this won't help, because DIH doesn't work in parallel. Am I
wrong?

What would you do? Rewrite the feeding process, dropping DIH and using
SolrJ to feed data in parallel? Or would you keep DIH and switch to a
sharded configuration?

Thank you for any hints

Giovanni



Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-15 Thread primoz . skale
I have a question for the developers of Solr regarding the issue of 
left-over index folders when replication fails. Could this issue be 
resolved by having Solr create a flag file in the new index.* folder when 
replication starts, and delete it when replication ends (and commits)? In 
that case, on server restart (or on a schedule), Solr could quickly scan 
all the index.* folders and delete those that still *contain* the flag 
file (maybe excluding the last one, or those referenced by the 
index.properties file), since they are unfinished and uncommitted.

I have not really looked at the code yet so I may have a different view on 
the workings of replication. Would the solution I described at least 
address this issue?
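To sketch the mechanism I have in mind (purely illustrative Python, not Solr code; the flag-file name and directory layout are invented):

```python
import os
import shutil
import tempfile

# Hypothetical marker name -- Solr does not actually write such a file.
FLAG = "replication.in.progress"

def start_replication(index_dir):
    """Create the index dir and drop a flag file in it."""
    os.makedirs(index_dir, exist_ok=True)
    open(os.path.join(index_dir, FLAG), "w").close()

def finish_replication(index_dir):
    """On successful commit, remove the flag file."""
    os.remove(os.path.join(index_dir, FLAG))

def cleanup(data_dir, active):
    """Delete index.* dirs that still carry the flag, i.e. failed runs."""
    removed = []
    for name in os.listdir(data_dir):
        path = os.path.join(data_dir, name)
        if (name.startswith("index.") and name != active
                and os.path.exists(os.path.join(path, FLAG))):
            shutil.rmtree(path)
            removed.append(name)
    return sorted(removed)

# Demo: one completed and one failed replication in a throwaway dir.
data = tempfile.mkdtemp()
start_replication(os.path.join(data, "index.001"))
finish_replication(os.path.join(data, "index.001"))  # completed
start_replication(os.path.join(data, "index.002"))   # crashed mid-copy
print(cleanup(data, active="index.001"))  # ['index.002']
```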

Best regards,

Primoz





From:   primoz.sk...@policija.si
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:46
Subject:Re: Cores with lot of folders with prefix index.XXX



Thanks, I guess I was wrong after all in my last post.

Primož




From:   Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:43
Subject:Re: Cores with lot of folders with prefix index.XXX



There are open issues related to extra index.XXX folders lying around if
replication/recovery fails. See
https://issues.apache.org/jira/browse/SOLR-4506


On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro yago.rive...@gmail.com wrote:

 The thread that you point is about master / slave - replication, Is this
 issue valid on SolrCloud context?

 I check the index.properties and indeed the variable index=index.X
 point to a folder, the others can be deleted without any scary side 
effect?


 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote:

  Do you have a lot of failed replications? Maybe those folders have
  something to do with this (please see the last answer at
 
 
http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing


  ). If your disk space is valuable check index.properties file under 
data
  folder and try to determine which folders can be safely deleted.
 
  Primož
 
 
 
 
  From: Yago Riveiro yago.rive...@gmail.com
  To: solr-user@lucene.apache.org
  Date: 11.10.2013 12:13
  Subject: Re: Cores with lot of folders with prefix index.XXX
 
 
 
  I have ssd's therefor my space is like gold, I can have 30% of my 
space
  waste in failed replications, or replications that are not cleaned.
 
  The question for me is if this a normal behaviour or is a bug. If is a
  normal behaviour I have a trouble because a ssd with more than 512G is
  expensive.
 
  --
  Yago Riveiro
  Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
  On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote:
 
   I think this is connected to replications being made? I also have 
quite
   some of them but currently I am not worried :)
  
 
 
 





-- 
Regards,
Shalin Shekhar Mangar.




Re: Solr Cloud Basic Authentification

2013-10-11 Thread primoz . skale
For pre-4.x Solr (aka Solr 3.x) basic authentication works fine. Check 
this site: http://wiki.apache.org/solr/SolrSecurity

Even a master-slave replication architecture (*not* SolrCloud) works for 
me. There could be some problems with *cross-shard* queries etc., though 
(see SOLR-1861, SOLR-3421).

I know I haven't answered your question but hopefully I have given you 
some more information on the subject.

Best regards,

Primož




From:   maephisto my_sky...@yahoo.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 10:55
Subject:Solr Cloud Basic Authentification



I've deployed a SolrCloud cluster in Jetty 9 using Solr 4.4.0 and I would
like to add some basic authentication.
My question is: how can I provide the credentials so that they're used by
the Collections API when creating a new collection, or by ZK?

Are there any useful docs/wiki on this topic?
Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Cloud Basic Authentification

2013-10-11 Thread primoz . skale
One possible solution is to firewall access to the SolrCloud server(s). 
Only the proxy/load-balancing servers should have unrestricted access to 
the Solr infrastructure. Then you can implement basic/advanced 
authentication on the proxy/LB side.

Primož



From:   maephisto my_sky...@yahoo.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 11:17
Subject:Re: Solr Cloud Basic Authentification



Thank you!

I'm more interested in the SolrCloud architecture, with shards, shard
replicas, and distributed indexing and search.
These are the features I use and would like to protect with some basic
authentication.

I imagine that there must be a way to have this; otherwise anybody could
mess with or even drop my collection.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094911.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr Cloud Basic Authentification

2013-10-11 Thread primoz . skale
If you want to deploy basic authentication so that a login is required 
when creating collections, it is only a simple matter of constraining a 
URL pattern (e.g. /solr/admin/collections/*). Maybe this link will help: 
http://stackoverflow.com/questions/5323855/jetty-webserver-security/5332049#5332049

But keep in mind that intra-node requests in SolrCloud must also be 
authenticated (because the HTTP stack is used for them). If I understand 
correctly this is currently not possible.
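For reference, such a constraint in the solr webapp's web.xml might look roughly like this (a sketch only; the role and realm names are invented, and the URL pattern is relative to the webapp context, hence no /solr prefix):

```xml
<!-- Require a login for Collections API calls only (sketch). -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Collections API</web-resource-name>
    <url-pattern>/admin/collections/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>  <!-- hypothetical role -->
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>solr-realm</realm-name>  <!-- must match a container realm -->
</login-config>
```

The matching realm (users, passwords, roles) still has to be configured on the Jetty side, as described in the StackOverflow link above.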

Primož




From:   maephisto my_sky...@yahoo.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 11:25
Subject:Re: Solr Cloud Basic Authentification



Thank you,
but I'm afraid that wiki page does not cover my topic of interest.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Cloud-Basic-Authentification-tp4094903p4094915.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
I think this is connected to the replications being made. I also have 
quite a few of them, but currently I am not worried :)

Primož



From:   yriveiro yago.rive...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 11:54
Subject:Cores with lot of folders with prefix index.XXX



Hi,

I have some cores with lots of folders named index.X; my question is
why?

The collateral effect of this is shards that are 50% larger than their
replicas on other nodes.

Is there any way to delete these folders to free space?

Is it a bug?

/Yago



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Cores-with-lot-of-folders-with-prefix-index-XXX-tp4094920.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Do you have a lot of failed replications? Maybe those folders have 
something to do with this (please see the last answer at 
http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing
). If your disk space is valuable, check the index.properties file under 
the data folder and try to determine which folders can be safely deleted.
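For illustration, index.properties typically holds a single pointer of this shape (the timestamp here is invented); only the folder it names is the live index, so the other index.* folders are the candidates for deletion:

```properties
#index.properties
index=index.20131011103442749
```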

Primož




From:   Yago Riveiro yago.rive...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:13
Subject:Re: Cores with lot of folders with prefix index.XXX



I have SSDs, therefore my space is like gold; I can have 30% of my space 
wasted in failed replications, or replications that were not cleaned up. 

The question for me is whether this is normal behaviour or a bug. If it is 
normal behaviour I am in trouble, because an SSD with more than 512 GB is 
expensive.

-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote:

 I think this is connected to replications being made? I also have quite
 some of them but currently I am not worried :)





Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Honestly, I don't know for sure if you can delete them. Maybe make a 
backup, then delete them and see if it still works :)

Replication works differently in the SolrCloud world, as far as I know. I 
don't think there should be any additional index.* folders, because 
fallback does not work in SolrCloud (someone correct me if I am wrong!).

Primož



From:   Yago Riveiro yago.rive...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:36
Subject:Re: Cores with lot of folders with prefix index.XXX



The thread that you point to is about master/slave replication. Is this 
issue valid in a SolrCloud context? 

I checked index.properties, and indeed the variable index=index.X 
points to a folder; can the others be deleted without any scary side 
effect?


-- 
Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote:

 Do you have a lot of failed replications? Maybe those folders have 
 something to do with this (please see the last answer at 
 
http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing

 ). If your disk space is valuable check index.properties file under data 
 
 folder and try to determine which folders can be safely deleted.
 
 Primož
 
 
 
 
 From: Yago Riveiro yago.rive...@gmail.com
 To: solr-user@lucene.apache.org
 Date: 11.10.2013 12:13
 Subject: Re: Cores with lot of folders with prefix index.XXX
 
 
 
 I have ssd's therefor my space is like gold, I can have 30% of my space 
 waste in failed replications, or replications that are not cleaned. 
 
 The question for me is if this a normal behaviour or is a bug. If is a 
 normal behaviour I have a trouble because a ssd with more than 512G is 
 expensive.
 
 -- 
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
 On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote:
 
  I think this is connected to replications being made? I also have 
quite
  some of them but currently I am not worried :)
  
 
 
 





Re: Cores with lot of folders with prefix index.XXXXXXX

2013-10-11 Thread primoz . skale
Thanks, I guess I was wrong after all in my last post.

Primož




From:   Shalin Shekhar Mangar shalinman...@gmail.com
To: solr-user@lucene.apache.org
Date:   11.10.2013 12:43
Subject:Re: Cores with lot of folders with prefix index.XXX



There are open issues related to extra index.XXX folders lying around if
replication/recovery fails. See
https://issues.apache.org/jira/browse/SOLR-4506


On Fri, Oct 11, 2013 at 4:06 PM, Yago Riveiro yago.rive...@gmail.com wrote:

 The thread that you point is about master / slave - replication, Is this
 issue valid on SolrCloud context?

 I check the index.properties and indeed the variable index=index.X
 point to a folder, the others can be deleted without any scary side 
effect?


 --
 Yago Riveiro
 Sent with Sparrow (http://www.sparrowmailapp.com/?sig)


 On Friday, October 11, 2013 at 11:31 AM, primoz.sk...@policija.si wrote:

  Do you have a lot of failed replications? Maybe those folders have
  something to do with this (please see the last answer at
 
 
http://stackoverflow.com/questions/3145192/why-does-my-solr-slave-index-keep-growing

  ). If your disk space is valuable check index.properties file under 
data
  folder and try to determine which folders can be safely deleted.
 
  Primož
 
 
 
 
  From: Yago Riveiro yago.rive...@gmail.com
  To: solr-user@lucene.apache.org
  Date: 11.10.2013 12:13
  Subject: Re: Cores with lot of folders with prefix index.XXX
 
 
 
  I have ssd's therefor my space is like gold, I can have 30% of my 
space
  waste in failed replications, or replications that are not cleaned.
 
  The question for me is if this a normal behaviour or is a bug. If is a
  normal behaviour I have a trouble because a ssd with more than 512G is
  expensive.
 
  --
  Yago Riveiro
  Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
 
 
  On Friday, October 11, 2013 at 11:03 AM, primoz.sk...@policija.si wrote:
 
   I think this is connected to replications being made? I also have 
quite
   some of them but currently I am not worried :)
  
 
 
 





-- 
Regards,
Shalin Shekhar Mangar.



Re: Collection API wrong configuration

2013-10-09 Thread primoz . skale
Works fine at my end. I use Solr 4.5.0 on Windows 7. 

I tried:

zkcli.bat -cmd upconfig -zkhost localhost:9000 -d 
..\solr\collection2\conf -n my_custom_collection

java -Djetty.port=8001 -DzkHost=localhost:9000 -jar start.jar

and finally

http://localhost:8001/solr/admin/collections?action=CREATE&name=my_custom_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_custom_collection

If I open the newly created core/shard I can see the modified schema file 
under Schema.

Best regards,

Primož




From:   maephisto my_sky...@yahoo.com
To: solr-user@lucene.apache.org
Date:   09.10.2013 11:57
Subject:Collection API wrong configuration



I'm experimenting with SolrCloud using Solr 4.5.0 and the Collections API.

What I did was: 
1. Upload the configuration to ZK:
zkcli.sh -cmd upconfig -zkhost 127.0.0.1:8993 -d
solr/my_custom_collection/conf/ -n my_custom_collection
2. Create a collection using the API:
/admin/collections?action=CREATE&name=my_custom_collection&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=my_custom_config

The outcome of these actions seems to be that the collection's cores don't
use the my_custom_collection configuration but the example configuration.
Any idea why this is happening?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-tp4094319.html

Sent from the Solr - User mailing list archive at Nabble.com.



Re: Hardware dimension for new SolrCloud cluster

2013-10-08 Thread primoz . skale
I think Mr. Erickson summarized the issue of hardware sizing quite well in 
the following article:

http://searchhub.org/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best regards,

Primož




From:   Henrik Ossipoff Hansen h...@entertainment-trading.com
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date:   08.10.2013 14:59
Subject:Hardware dimension for new SolrCloud cluster



We're in the process of moving onto SolrCloud, and have gotten to the 
point where we are considering how to do our hardware setup.

We're limited to VMs running on our server cluster and storage system, so 
buying new physical servers is out of the question - the question is how 
we should dimension the new VMs.

Our document set is somewhat small: about 1.2 million orders (rising, of 
course), 75k products (divided over 5 countries, each of which will be 
its own collection/core), and a few million customers.

In our current master/slave setup, we only index the products, with each 
country taking up about 35 MB of disk space. We update the indexes more or 
less 8 times per hour (mostly not full reindexes, though, but atomic 
updates with new stock data, new prices, etc.).
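(For context, an atomic update of this kind is just a small JSON document POSTed to Solr's /update handler, along these lines; the id and field names are invented:)

```json
[
  {"id":    "prod-123",
   "stock": {"set": 42},
   "price": {"set": 19.95}}
]
```

Assuming the remaining fields are stored, Solr rewrites the document with just those fields changed instead of requiring the full document to be resent.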

Our upcoming order and customer indexes, however, will receive updates on 
the fly as they happen (soft commit), and we expect the same to be the 
case for products in the near future.

- For hardware, it's down to 1 or 2 cores - current master runs with 2 
cores
- RAM - currently our master runs with 6 GB only
- How much heap space should we allocate for max heap?

We currently plan on this setup:
- 1 machine for a simple loadbalancer
- 4 VMs totally for the Solr machines themselves (for both leaders and 
replicas, just one replica per shard is enough for our use case)
- A quorum of 3 ZKs

Question is - is this machine setup enough? And how exactly do we 
dimension the Solr machines?

Any help, pointers or resources will be much appreciated :)

Thank you!