Re: Concurrent indexing

2013-10-15 Thread maephisto
Thanks for the tip!

I must mention that I am using Solr 4.4.0 and this problem only appears when
I'm doing the indexing in the SolrCloud configuration deployed on standalone
Jetty 9.0.6.
When I do the same operations on a modified example in Solr 4.4.0 with
embedded Jetty, indexing to a simple core, I do not have this problem.

View this message in context:
Sent from the Solr - User mailing list archive at

Debugging update request

2013-10-15 Thread maephisto
As a follow-up to another thread, where I described how my SolrCloud sometimes
just stops accepting updates, I have a question: is there a way to debug or
analyze the update request? Verbose output or anything else?
When I'm in the situation above and use the tool to post a single doc, I get
no feedback; it just hangs and waits.
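A minimal client-side sketch for the hanging-request case, assuming curl is available and that the collection accepts JSON updates on its /update handler (Solr 4.x dispatches on the Content-Type header). The field names id and title, the base URL and the function names are placeholders, not taken from the thread. curl's -v prints the request and response headers, and --max-time makes the client give up instead of waiting forever, so a stuck update at least shows how far it got.

```sh
# Hypothetical helpers for debugging a single update request.

# Print a JSON update body containing exactly one document.
# $1 = id, $2 = title (placeholder field names; values are not JSON-escaped).
one_doc_json() {
  printf '[{"id":"%s","title":"%s"}]\n' "$1" "$2"
}

# Post that document verbosely, aborting after 30 seconds instead of hanging.
# $1 = Solr base URL (e.g. http://localhost:8983/solr), $2 = collection,
# $3 = id, $4 = title.
debug_post_one() {
  one_doc_json "$3" "$4" |
    curl -v --max-time 30 \
      -H 'Content-Type: application/json' \
      --data-binary @- \
      "$1/$2/update?commit=true"
}

# Example (not run here):
# debug_post_one http://localhost:8983/solr collection1 doc-1 "test doc"
```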

Re: Concurrent indexing

2013-10-15 Thread maephisto
Hi Chris!
Could you describe your problem? How similar is it to mine?
Also, which version of Solr are you encountering it on?

Concurrent indexing

2013-10-14 Thread maephisto

I have a collection (numShards=3, replicationFactor=2) split across 2 machines.
Since I have a huge amount of data to index, I would like to start multiple
instances of the same process to index data into Solr.
Is there any limitation or contraindication in this area?

The indexing client is custom-built by me and parses files (each instance
parses a different file), and the uniqueId is auto-generated.
Would a commit in one process also commit the uncommitted changes created by
another process?
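Commits in Solr act on the collection's index, not on the process or connection that sent the documents, so a commit issued by one indexing process also makes the other processes' pending documents visible. One option for running several indexers side by side is to have none of them send explicit commits and instead pass commitWithin on each batch, leaving the commit scheduling to Solr. A minimal sketch, assuming curl, JSON batch files written by the custom client, and placeholder URL, collection and function names:

```sh
# Sketch: several indexing processes post batches with commitWithin
# instead of each one issuing its own hard commit.

# Print the update URL for a collection; Solr is asked to commit the
# posted documents within $3 milliseconds.
update_url() {
  printf '%s/%s/update?commitWithin=%s\n' "$1" "$2" "$3"
}

# Post one JSON batch file (a JSON array of documents) produced by the client.
# $1 = base URL, $2 = collection, $3 = commitWithin in ms, $4 = batch file path.
post_batch() {
  curl -s --max-time 120 \
    -H 'Content-Type: application/json' \
    --data-binary @"$4" \
    "$(update_url "$1" "$2" "$3")"
}

# Each process could then run, e.g. (not run here):
# post_batch http://localhost:8983/solr collection1 10000 batch-0001.json
```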

Re: Concurrent indexing

2013-10-14 Thread maephisto
Thank you!

I was worried because I was experimenting with this system, and at some
point I was processing 2 big files and both indexing processes had added
about 750k docs when suddenly Solr simply refused to accept any more added
docs. Querying was working fine, but trying to add one more single doc would
get no response.
(I had no autocommit set up.)

It only came back to life when I restarted Jetty.
Any idea what went wrong? Is there a maximum number of docs that can be added?

Re: Multiple schemas in the same SolrCloud?

2013-10-11 Thread maephisto
My only doubt is whether I should upload a new set of configuration files under
the same configuration name, like so:

Initial configuration:
-zkhost localhost:9983 -cmd upconfig -confdir conf_initial/ -confname my_custom_config

and afterwards, to change it:
-zkhost localhost:9983 -cmd upconfig -confdir conf_changed/ -confname my_custom_config

Is this correct?
If so, what happens afterwards: will ZK distribute these changes to all cores
and reload them?

Solr Cloud Basic Authentication

2013-10-11 Thread maephisto
I've deployed a SolrCloud cluster in Jetty 9 using Solr 4.4.0 and I would
like to add some basic authentication.
My question is: how can I provide the credentials so that they're used by the
Collections API when creating a new collection, or by ZK?

Are there any useful docs or wiki pages on this topic?
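As far as I know, Solr 4.4 has no authentication of its own, so Basic authentication in this setup would be configured in the servlet container (Jetty) in front of Solr. On the client side it is plain HTTP Basic auth: the Authorization header carries base64 of "user:password". A minimal sketch with made-up credentials, a placeholder URL and a hypothetical function name; it only covers requests you send yourself, such as Collections API calls, and does not configure anything in Solr or ZooKeeper.

```sh
# Build the HTTP Basic Authorization header for a user/password pair.
# $1 = user, $2 = password (example values only).
basic_auth_header() {
  printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# curl can also build the same header itself with -u user:password.
# Example Collections API call (not run here; URL and names are placeholders):
# curl -H "$(basic_auth_header solr secret)" \
#   "http://localhost:8983/solr/admin/collections?action=RELOAD&name=collection1"
```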

Re: Solr Cloud Basic Authentication

2013-10-11 Thread maephisto
Thank you!

I'm more interested in the SolrCloud architecture, with shards, shard
replicas, and distributed indexing and search.
These are the features I use and would like to protect with some basic
authentication.

I imagine there must be a way to do this; otherwise anybody could
mess with or even drop my collection.

Re: Solr Cloud Basic Authentication

2013-10-11 Thread maephisto
Thank you,
but I'm afraid that wiki page does not cover the topic I'm interested in.

Re: Multiple schemas in the same SolrCloud?

2013-10-11 Thread maephisto
Upload the new configuration and then use the Collections API to reload your
collection.
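For the two-step procedure described in this thread (upconfig under the existing config name, then a reload), the reload is a single Collections API call; RELOAD reloads the cores of the named collection, which should then pick up the configuration currently stored in ZooKeeper. A minimal sketch with a placeholder base URL, collection name and a hypothetical function name:

```sh
# Print the Collections API URL that reloads every core of a collection.
# $1 = Solr base URL, $2 = collection name.
reload_url() {
  printf '%s/admin/collections?action=RELOAD&name=%s\n' "$1" "$2"
}

# After uploading the changed config under the same name (not run here):
# curl "$(reload_url http://localhost:8983/solr my_collection)"
```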

Re: Solr 4.4.0 on Ubuntu 10.04 with Jetty 6.1 from package Repository

2013-10-10 Thread maephisto
Tried it and it worked as expected with the latest version of Jetty (.0.6, if I
remember correctly) and Solr 4.4.0.
This tutorial should help you (it's verified by me and working):

Re: Multiple schemas in the same SolrCloud?

2013-10-10 Thread maephisto
On this topic: once you've uploaded your collection's configuration to ZK, how
can you update it?
Upload the new one with the same config name?

Collection API wrong configuration

2013-10-09 Thread maephisto
I'm experimenting with SolrCloud using Solr 4.5.0 and the Collections API.

What I did was:
1. Upload the configuration to ZK: -cmd upconfig -zkhost -d
solr/my_custom_collection/conf/ -n my_custom_collection
2. Create a collection using the API:

The outcome of these actions seems to be that the collection's cores don't use
the my_custom_collection configuration but the example configuration.
Any idea why this is happening?
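One thing worth checking in a case like this is whether the CREATE call names the uploaded configuration explicitly through the collection.configName parameter; without it, Solr chooses a config from ZooKeeper by its own rules. A minimal sketch with placeholder values and a hypothetical function name; the thread does not confirm whether passing the parameter avoids the 4.5.0 behaviour reported here.

```sh
# Print a Collections API CREATE URL that names the ZooKeeper config explicitly.
# $1 = base URL, $2 = collection name, $3 = numShards,
# $4 = replicationFactor, $5 = config name used with upconfig (-n).
create_url() {
  printf '%s/admin/collections?action=CREATE&name=%s&numShards=%s&replicationFactor=%s&collection.configName=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Example (not run here):
# curl "$(create_url http://localhost:8983/solr my_custom_collection 3 2 my_custom_collection)"
```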

Re: Collection API wrong configuration

2013-10-09 Thread maephisto
Using Solr 4.4.0, the same scenario behaves as expected.

Can anyone else try this, to check whether it only happens with 4.5.0 and, if
so, whether this is desired behaviour or a bug?

Re: Collection API wrong configuration

2013-10-09 Thread maephisto
Yes, the problem described in the ticket is the same one I'm running into.

Re: Can I use app specific document id as the document id that Solr uses for internal purposes?

2013-10-06 Thread maephisto
Just use
<field name="myCustomAppId" />
and don't worry about anything else.

ZooKeeper: What goes on the new node?

2013-10-04 Thread maephisto

Imagine a collection, collection1, with 3 shards, replicationFactor=2 and
maxShardsPerNode=2, hosted on three machines.
Then add a new collection, collection2, configured the same way.
Cool, so now we have three machines, each with 4 cores: a shard leader and a
replica for each of the two collections.

What happens if I add a new machine to the cluster? How will ZK decide what
goes there? Since the required number of replicas is already fulfilled, will it
add anything to the new node?


Updating an indexed vs. a non-indexed field

2013-10-04 Thread maephisto
Atomic updates were introduced in recent Solr versions.

I'm wondering whether updating a field that is
stored=true indexed=true
would be different from updating a field that is
stored=true indexed=false.

Would Solr reindex the doc only if the field is indexed, and not if
it's just a stored field?
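For reference, an atomic update that sets a single field has the same request body whether the field is indexed or only stored. To my knowledge, Solr 4.x applies it by rebuilding the whole document from its stored fields and reindexing that document, which is why atomic updates need the other fields to be stored. A minimal sketch of the request body, with a placeholder id, field, value and function name (values are not JSON-escaped):

```sh
# Print a JSON atomic-update body that sets one field of one document.
# $1 = document id, $2 = field name, $3 = new string value.
atomic_set_json() {
  printf '[{"id":"%s","%s":{"set":"%s"}}]\n' "$1" "$2" "$3"
}

# Example (not run here):
# atomic_set_json doc-1 title "New title" |
#   curl -H 'Content-Type: application/json' --data-binary @- \
#     "http://localhost:8983/solr/collection1/update?commit=true"
```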

Re: Updating an indexed vs. a non-indexed field

2013-10-04 Thread maephisto
Thank you!

My question is actually: what's the difference between updating an indexed field
and updating a non-indexed field? Will updating an indexed field trigger a
refresh of the Solr index, while updating a non-indexed one wouldn't?

SolrCloud Nodes infrastructure

2013-10-02 Thread maephisto

I have some foggy stuff on my goggles when I'm looking at SolrCloud; maybe
somebody can help me clear that up!

Let's assume I have a collection that's split into 3 shards. That means I
already have 3 nodes.
Then I also want some replication: replication factor = 2. That's 3 more nodes.
ZooKeeper is a must, so I'll give it 3 independent machines.

Now, my question is: how many machines would I need for those 6 nodes?
Is it possible to use just 3 machines, each with two Solr instances? Will ZK
then know not to assign a shard's replica to the same machine as the shard
leader?
Even more, is it possible to use just 3 big, mean machines, each with a ZK
instance and 2 Solr instances? Would I be sacrificing performance?
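The counts in this question can be checked with a little arithmetic. replicationFactor counts every copy of a shard, leader included, so 3 shards with replicationFactor=2 give 6 cores, or 2 per machine on 3 machines. A ZooKeeper ensemble stays available only while a majority of its servers is up, so 3 ZK servers tolerate 1 failure. A small sketch with hypothetical helper names:

```sh
# Total cores (shard copies) for a collection: numShards * replicationFactor.
cores_total() {
  echo $(( $1 * $2 ))
}

# Cores per machine when spread as evenly as possible: ceil(total / machines).
# $1 = numShards, $2 = replicationFactor, $3 = machines.
cores_per_machine() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# ZooKeeper servers that can fail while a majority remains: floor((n - 1) / 2).
zk_tolerated_failures() {
  echo $(( ($1 - 1) / 2 ))
}
```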


Re: SolrCloud Nodes infrastructure

2013-10-02 Thread maephisto
Thank you Shalin!

Let's assume that this configuration is all set up: 3 machines, each with 2
Solr instances and a ZK, and collection1 has 3 shards and 3 replicas.
What if I want to add one more collection, split into 3 shards with
replicationFactor=2?

How can I do this? Can it be done dynamically using the CoreAPI? Do I need
to go to each Solr instance and add the new core, or can it be somehow…

Dynamic analyzer settings change

2013-09-11 Thread maephisto
Let's take the following type definition and schema (borrowed from Rafal
Kuc's Solr 4 cookbook):

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

and schema:

<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="title" type="text" indexed="true" stored="true" />

The above analyzer will apply the SnowballPorterFilter with the English language.
But would it be possible to change the language to French during indexing
for some documents? If not, what would be the best solution for having the
same analyzer but with different languages, with the language being
determined at index time?


Re: Dynamic analyzer settings change

2013-09-11 Thread maephisto
Thanks, Erik!

I might have failed to mention something relevant. When querying Solr, I
wouldn't actually need to query all fields, but only the one corresponding
to the language picked by the user on the website. If he's using DE, then
the search should only apply to the text_de field.

What if I need to work with 50 different languages?
Then I would get a schema with 50 types and 50 fields (text_en, text_fr,
text_de, ...): won't this affect performance? Bigger documents, slower
queries?
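With one field per language as discussed (text_en, text_fr, text_de, ...), both sides only need the language code: the indexer writes the text into text_<lang>, and the search request points the query at the same field, for example through the standard query parser's df (default field) parameter. A minimal sketch; the helper names and the text_ prefix convention are assumptions, the query string is assumed to be URL-encoded already, and the text_<lang> fields must exist in the schema.

```sh
# Name of the per-language field, e.g. de -> text_de.
lang_field() {
  printf 'text_%s\n' "$1"
}

# Index-time body: put the text into the field for the document's language.
# $1 = id, $2 = language code, $3 = text (not JSON-escaped).
lang_doc_json() {
  printf '[{"id":"%s","%s":"%s"}]\n' "$1" "$(lang_field "$2")" "$3"
}

# Query-time URL: search only the field for the user's language.
# $1 = base URL, $2 = collection, $3 = language code, $4 = URL-encoded query.
lang_search_url() {
  printf '%s/%s/select?q=%s&df=%s\n' "$1" "$2" "$4" "$(lang_field "$3")"
}
```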

Re: Dynamic analyzer settings change

2013-09-11 Thread maephisto
Thanks Jack! Indeed, very nice examples in your book.

Inspired by them, here's a crazy idea: would it be possible to build a
custom processor chain that detects the language and uses it to apply
filters, like the aforementioned SnowballPorterFilter?
At the end, that would leave a document with two fields: text (with the
filtered content) and language (the one determined by the processor).
At search time, I would always append language=<user-selected language>.

Does this make sense? If so, would it affect performance at index time?
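A related stock feature, as far as I know: Solr 4.x ships a language-identification update processor (solr.LangDetectLanguageIdentifierUpdateProcessorFactory, configured with parameters such as langid.fl, langid.langField and langid.map) that can detect the language, store it in a field, and with langid.map route the text into a language-specific field such as text_de. A chain containing it is selected per request with the update.chain parameter. The chain name langid below is hypothetical and would have to be defined in solrconfig.xml; the function name is also a placeholder.

```sh
# Print an update URL that routes documents through a named update chain.
# $1 = base URL, $2 = collection, $3 = chain name from solrconfig.xml
# (e.g. a hypothetical "langid" chain containing the language identifier).
chain_update_url() {
  printf '%s/%s/update?update.chain=%s\n' "$1" "$2" "$3"
}

# Example (not run here):
# curl -H 'Content-Type: application/json' --data-binary @docs.json \
#   "$(chain_update_url http://localhost:8983/solr collection1 langid)"
```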

Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-04 Thread maephisto
Thanks Shawn!

Indeed, setting JAVA_OPTS and restarting Tomcat did the trick.
Currently I'm exploring and experimenting with SolrCloud, so I only used
one ZK.
For a production environment your suggestion would, of course, be mandatory.
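For reference, the fix confirmed here: service tomcat7 start does not hand extra -D flags to the JVM, so the ZooKeeper settings have to go into JAVA_OPTS. On Ubuntu's tomcat7 package that variable is usually set in /etc/default/tomcat7 (an assumption about the packaging; check your installation). ZK_IP is a placeholder for the ZooKeeper host address.

```sh
# Fragment for /etc/default/tomcat7 (assumed location); sourced by the init script.
# ZK_IP is a placeholder for the ZooKeeper host address.
JAVA_OPTS="$JAVA_OPTS -DzkHost=ZK_IP:2181 -DnumShards=3"
# Then restart Tomcat: sudo service tomcat7 restart
```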

Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread maephisto
I've set up a ZK instance and also deployed Solr in Tomcat7 on a different
instance in Amazon EC2.
Afterwards I tried starting Tomcat specifying the ZK host IP, like so:

sudo service tomcat7 start -DzkHost=zk ip:2181 -DnumShards=3

Solr loads fine, but is not in the cloud.

Any idea what I am doing wrong here?

Re: Starting Solr in Tomcat with specifying ZK host(s)

2013-09-03 Thread maephisto
When I deploy using Jetty, everything works fine, and the Solr
instance joins the cloud:

sudo java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkHost=zk ip:2181 -DnumShards=3 -jar

Re: Multiple replicas for specific shard

2013-08-28 Thread maephisto
Thanks Keith!

But could this be done dynamically?
Let's take the following example: a SolrCloud cluster with sports event
results split into three shards by category: a football shard, a golf shard
and a baseball shard. Each of these shards has a replica on a machine.
Then I realize that my football-related QPS grows dramatically, so I decide to
add 2 more replicas for the football shard, on two new machines.

How can I proceed in this situation?
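In Solr 4.x, one way to place a replica on a specific shard is to create a core on the new machine through the CoreAdmin API, passing the collection and shard parameters; the new core registers in ZooKeeper as a replica of that shard and syncs from its leader. A minimal sketch; the host, core, collection and shard names and the function name are placeholders (with custom sharding by category, the shard name is whatever the football shard was created as).

```sh
# Print a CoreAdmin CREATE URL that adds a core to an existing shard.
# $1 = base URL of the NEW node, $2 = new core name,
# $3 = collection, $4 = shard to replicate.
add_replica_core_url() {
  printf '%s/admin/cores?action=CREATE&name=%s&collection=%s&shard=%s\n' \
    "$1" "$2" "$3" "$4"
}

# Run once per new machine (not run here):
# curl "$(add_replica_core_url http://new-host-1:8983/solr results_football_r3 results football)"
```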

Re: Multiple replicas for specific shard

2013-08-28 Thread maephisto
Thanks Erik,
I think this answers my question

Multiple replicas for specific shard

2013-08-27 Thread maephisto

Imagine the following configuration: a SolrCloud cluster with 3 shards, a
replication factor of 2 and 6 nodes.
Now, if I add one more node to the cluster, ZK will automatically assign a
shard replica to it.

My question is: can I influence which of the shards gets replicated on the
new node? Can I have 5 replicas for one shard and just 1 for the others?

Thank you!
