Re: TermsComponent from deleted document

2011-09-10 Thread Manish Bafna
Which is preferable for autosuggest: using the TermsComponent or facets?

On Fri, Sep 9, 2011 at 10:33 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : http://wiki.apache.org/solr/TermsComponent states that TermsComponent will
 : return frequencies from deleted documents too.
 :
 : Is there any way to omit the deleted documents when getting the frequencies?

 not really -- until a deleted document is expunged by segment merging,
 it is still included in the term stats, which is what the TermsComponent
 looks at.

 If having 100% accurate term counts is really important to you, then you
 can optimize after doing any updates on your index - but there is
 obviously a performance tradeoff there.



 -Hoss
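A toy Java model of Hoss's explanation, under simplified assumptions: the class below is hypothetical and is not Lucene's IndexReader API. Each segment keeps the terms of its documents plus a set of deletion flags. The term count does not consult those flags, so a deleted document keeps being counted until a merge (which is what optimize forces) rewrites the segment without it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, not Lucene code: a segment is a list of documents
// (each a set of terms) plus a BitSet marking deleted documents.
final class ToySegmentedIndex {
    private final List<List<Set<String>>> segments = new ArrayList<>();
    private final List<BitSet> deletions = new ArrayList<>();

    /** Flushes a new segment holding the given documents and returns its number. */
    int addSegment(List<Set<String>> docs) {
        segments.add(new ArrayList<>(docs));
        deletions.add(new BitSet());
        return segments.size() - 1;
    }

    /** Deleting only flags the document; its terms stay in the segment. */
    void delete(int segment, int doc) {
        deletions.get(segment).set(doc);
    }

    /** Term statistic in the spirit of TermsComponent: deletion flags are ignored. */
    int docFreq(String term) {
        int count = 0;
        for (List<Set<String>> segment : segments) {
            for (Set<String> doc : segment) {
                if (doc.contains(term)) {
                    count++;
                }
            }
        }
        return count;
    }

    /** Merges everything into one segment, dropping flagged documents (like optimize). */
    void optimize() {
        List<Set<String>> live = new ArrayList<>();
        for (int s = 0; s < segments.size(); s++) {
            List<Set<String>> segment = segments.get(s);
            for (int d = 0; d < segment.size(); d++) {
                if (!deletions.get(s).get(d)) {
                    live.add(segment.get(d));
                }
            }
        }
        segments.clear();
        deletions.clear();
        addSegment(live);
    }

    public static void main(String[] args) {
        ToySegmentedIndex index = new ToySegmentedIndex();
        int seg = index.addSegment(List.of(Set.of("solr"), Set.of("solr", "lucene")));
        index.delete(seg, 0);
        System.out.println(index.docFreq("solr")); // 2: the deleted doc still counts
        index.optimize();
        System.out.println(index.docFreq("solr")); // 1: the merge expunged it
    }
}
```

The trade-off Hoss mentions shows up even here: optimize copies every live document into a new segment, which is trivial in the toy but expensive on a real index.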



Re: Sorting groups by numFound group size

2011-09-10 Thread Martijn v Groningen
Not yet. If you want, you can create an issue for sorting groups by numFound.

On 9 September 2011 18:49, O. Klein kl...@octoweb.nl wrote:

 I am also looking for a way to sort on numFound.

 Has an issue been created?

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Sorting-groups-by-numFound-group-size-tp3315740p3323420.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Kind regards,

Martijn van Groningen


Re: How to write this query?

2011-09-10 Thread crisfromnova
Hi,

key:value1^8 key:value2^4 key:value3^2 is correct.

Sorry for the badly written query.
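If the list of values comes from application code, a small helper can assemble the corrected query. This is a hypothetical Java sketch: it assumes the values need no query-parser escaping and that the default operator is OR, so each field:value^boost clause is optional and only raises the score of the documents that match it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: one optional, boosted clause per value of a single field.
// Values are NOT escaped; this assumes they contain no query-syntax characters.
final class BoostedQuery {
    static String build(String field, Map<String, Integer> boosts) {
        StringJoiner clauses = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : boosts.entrySet()) {
            clauses.add(field + ":" + e.getKey() + "^" + e.getValue());
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> boosts = new LinkedHashMap<>(); // keeps insertion order
        boosts.put("value1", 8);
        boosts.put("value2", 4);
        boosts.put("value3", 2);
        System.out.println(build("key", boosts)); // key:value1^8 key:value2^4 key:value3^2
    }
}
```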

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-write-this-query-tp3318577p3325033.html
Sent from the Solr - User mailing list archive at Nabble.com.


Master Slave Question

2011-09-10 Thread Jamie Johnson
Is it appropriate to query the master servers when replicating?  I ask
because there could be a case where we index, say, 50 documents to the
master, they have not yet been replicated, and a user asks for page 2;
that request could be sent to a slave and get 0 results.  Is there a way
to avoid this?  My thought was to not allow querying of the master, but
I'm not sure that this could be configured in Solr.


Re: Master Slave Question

2011-09-10 Thread Patrick Sauts
Real-time indexing (Solr 4), or decrease the replication poll interval and
autoCommit time.
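A back-of-the-envelope Java sketch of why lowering those two settings helps, under assumed timing rules that are not taken from Solr's source: a document added at time t is committed on the master at most autoCommit maxTime later, and a slave picks up that commit at its next poll tick. Transfer and warm-up time are ignored, so real lag will be somewhat larger. Jamie's page-2 scenario is a request that lands inside this window.

```java
// Assumed model, not Solr code: whole seconds, the slave polls at 0, p, 2p, ...,
// and a doc added at time t is committed no later than t + autoCommitMaxTime.
final class ReplicationLag {
    /** Worst-case time by which a slave sees a doc added at time t. */
    static long visibleOnSlaveBy(long t, long autoCommitMaxTime, long pollInterval) {
        if (pollInterval <= 0) {
            throw new IllegalArgumentException("pollInterval must be positive");
        }
        long committedBy = t + autoCommitMaxTime;
        // Round up to the next poll tick at or after the commit.
        return ((committedBy + pollInterval - 1) / pollInterval) * pollInterval;
    }

    /** Upper bound on the lag: one autoCommit window plus one poll interval. */
    static long maxLag(long autoCommitMaxTime, long pollInterval) {
        return autoCommitMaxTime + pollInterval;
    }

    public static void main(String[] args) {
        // 60 s poll (the 00:01:00 pollInterval in Yury's config in this thread)
        // and a hypothetical 15 s autoCommit maxTime.
        System.out.println(visibleOnSlaveBy(50, 15, 60)); // 120, i.e. 70 s after the add
        System.out.println(maxLag(15, 60));               // 75
        System.out.println(maxLag(5, 10));                // 15: shorter settings, smaller window
    }
}
```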




Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Pulkit Singhal
First of all, thanks everyone; your expertise and time are much appreciated.

@Jamie:
Great suggestion, I just have one small objection to it ... I wouldn't
want to mix the core's name with the collection's configName. Wouldn't
you also want to keep the two separate for clarity? What do you think
about that?

@Yury:
Overall, what you said makes sense and I'll roll with it. But FYI,
through experimentation I found out that collection=myconf does not
become the value for configName when I inspect ZooKeeper.jsp. Here's
an example of what shows up if I set up the solr.xml file but don't say
anything on the command line at startup:

myconf (v=0 children=1) configName=configuration1

But perhaps that's exactly what you are trying to warn me about. I'll
experiment more and get back.

- Pulkit

On Fri, Sep 9, 2011 at 10:17 PM, Jamie Johnson jej2...@gmail.com wrote:
 as a note you could change out the values in solr.xml to be as follows
 and pull these values from System Properties.

  <cores adminPath="/admin/cores" defaultCoreName="${collection.configName}">
    <core name="${collection.configName}" instanceDir="." shard="${shard}"/>
  </cores>

 unless someone says otherwise, but the quick tests I've run seem to
 work perfectly well with this setup.

 2011/9/9 Yury Kats yuryk...@yahoo.com:
 On 9/9/2011 6:54 PM, Pulkit Singhal wrote:
 Thanks Again.

 Another question:

 My solr.xml has:
   <cores adminPath="/admin/cores" defaultCoreName="master1">
     <core name="master1" instanceDir="." shard="shard1" collection="myconf"/>
   </cores>

 And I omitted -Dcollection.configName=myconf from the startup command
 because I felt that specifying collection=myconf should take care of
 that:
 cd /trunk/solr/example
 java -Dbootstrap_confdir=./solr/conf -Dslave=disabled -DzkRun -jar start.jar

 With this you are telling ZK to bootstrap a collection with the content of
 specific files, but you don't tell it what collection that should be.

 Hence you want the collection.configName parameter, and you want
 solr.xml to reference the same name in the 'collection' attribute for the
 cores, so that SolrCloud knows where to pull the configuration for that core from.






Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Pulkit Singhal
Yes now I'm sure that
a) collection=blah in solr.xml, and
b) -Dcollection.configName=myconf at cmd line
actually fill in values for two very different fields.

Here's why I say so:

Example config # 1:
<core name="master1" instanceDir="." shard="shard1" collection="scaleDeep"/>

Results in:
/collections (v=6 children=1)
  scaleDeep (v=0 children=1) configName=myconf

Example config # 2:
Results in:
/collections (v=6 children=1)
  scaleDeep (v=0 children=1) configName=scaleDeep

What do you think about that? I may be misinterpreting the results, so
please, please feel free to set me straight on this.

Also it would be nice if I knew the code well enough to just look @ it
and give an authoritative answer. Does anyone have that kind of
expertise? Reverse-engineering is getting a bit mundane.

Thanks!
- Pulkit








Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Pulkit Singhal
Sorry, a message got sent before I finished it; Ctrl+S is not save but
send... sigh!

Yes now I'm sure that
a) collection=blah in solr.xml, and
b) -Dcollection.configName=myconf at cmd line
actually fill in values for two very different fields.

Here's why I say so:

Example config # 1:
<core name="master1" instanceDir="." shard="shard1" collection="scaleDeep"/>
java -Dcollection.configName=myconf ... -DzkRun -jar start.jar

Results in:
/collections (v=6 children=1)
  scaleDeep (v=0 children=1) configName=myconf

Example config # 2:
<core name="master1" instanceDir="." shard="shard1" collection="scaleDeep"/>
java -Dcollection.configName=scaleDeep ... -DzkRun -jar start.jar

Results in:
/collections (v=6 children=1)
  scaleDeep (v=0 children=1) configName=scaleDeep

What do you think about that? I may be misinterpreting the results, so
please, please feel free to set me straight on this.

Also it would be nice if I knew the code well enough to just look @ it
and give an authoritative answer. Does anyone have that kind of
expertise? Reverse-engineering is getting a bit mundane.

Thanks!
- Pulkit









Re: TermsComponent from deleted document

2011-09-10 Thread Martijn v Groningen
I'd use the suggester:
http://wiki.apache.org/solr/Suggester

The suggester can give a collation. The TermsComponent can't do that.
The suggester builds on top of the spellchecking infrastructure, so it
should be easy to use if you're familiar with that.

Martijn
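For a single word, both approaches boil down to looking up indexed terms by prefix (TermsComponent exposes this as terms.prefix) and ranking them. The hypothetical Java sketch below uses a sorted map as a simplified stand-in; the Suggester uses its own lookup structures, and this sketch does not do the collation Martijn mentions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for prefix-based autosuggest; not the Solr Suggester.
final class PrefixSuggester {
    private final TreeMap<String, Integer> termFreqs = new TreeMap<>();

    /** Adds (or accumulates) an indexed term and its frequency. */
    void add(String term, int freq) {
        termFreqs.merge(term, freq, Integer::sum);
    }

    /** Up to `limit` terms starting with `prefix`, most frequent first. */
    List<String> suggest(String prefix, int limit) {
        List<Map.Entry<String, Integer>> matches = new ArrayList<>();
        // Terms sharing a prefix are contiguous in sorted order, so scan forward
        // from the prefix and stop at the first term that no longer matches.
        for (Map.Entry<String, Integer> e : termFreqs.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            matches.add(e);
        }
        // Highest frequency first; ties broken alphabetically.
        matches.sort(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, matches.size()); i++) {
            out.add(matches.get(i).getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        PrefixSuggester s = new PrefixSuggester();
        s.add("solr", 10);
        s.add("solrcloud", 3);
        s.add("solace", 5);
        s.add("lucene", 7);
        System.out.println(s.suggest("sol", 2)); // [solr, solace]
    }
}
```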

On 10 September 2011 08:37, Manish Bafna manish.bafna...@gmail.com wrote:

 Which is preferable for autosuggest: using the TermsComponent or facets?




--
Kind regards,

Martijn van Groningen


Full-search index for the database

2011-09-10 Thread Eugeny Balakhonov
I want to create a full-text search for my database.

That is, the search engine should look up a given string in all fields of my
database.

I have created a Solr configuration for extracting and indexing data from a
database.

Following the documentation in the schema.xml file, I have created a field for
the full-text search index:

<field name="TEXT" type="..." indexed="true" stored="true"
multiValued="true"/>

Also, I have added lines for copying the values of all fields into this
full-text field:

...

copyField source= dest=TEXT/

...

As a result, I can search across all fields of my database, but I can't
tell which field of a matching record contains the requested string.

The highlighting functionality just marks the string in the TEXT field, as
follows:

<lst name="highlighting">
  <lst name="431046.431344...8473633">
    <arr name="TEXT">
      <str>Any text any text <em>Test</em></str>
    </arr>
  </lst>
  <lst name="431046.431231...8476393">
    <arr name="TEXT">
      <str>Any text any text <em>Test</em></str>
    </arr>
  </lst>
</lst>

How can I create a full-text index that still lets me recognize the source
database field?

Thx a lot.

Eugeny



Re: Replication setup with SolrCloud/Zk

2011-09-10 Thread Pulkit Singhal
Hi Yury,

How do you manage to start the instances without any issues? The way I see
it, no matter which instance is started first, the slave will complain about
not being able to find its respective master because that instance hasn't been
started yet ... no?

Thanks,
- Pulkit

2011/5/17 Yury Kats yuryk...@yahoo.com

 On 5/17/2011 10:17 AM, Stefan Matheis wrote:
  Yury,
 
  perhaps Java params (like those used for this sample:
 
 http://wiki.apache.org/solr/SolrReplication#enable.2BAC8-disable_master.2BAC8-slave_in_a_node
 )
  can help you?

 Ah, thanks! It does seem to work!

 Cluster's solrconfig.xml (shared between all Solr instances and cores via
 SolrCloud/ZK):
 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="enable">${enable.master:false}</str>
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
   </lst>
   <lst name="slave">
     <str name="enable">${enable.slave:false}</str>
     <str name="pollInterval">00:01:00</str>
     <str name="masterUrl">http://${masterHost:xyz}/solr/master/replication</str>
   </lst>
 </requestHandler>
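The ${name:default} placeholders are what let one shared solrconfig.xml behave differently per core: a core that defines enable.master gets an active master section, and everything else falls back to false. A minimal Java sketch of that substitution rule (hypothetical code, not Solr's own property resolver; failing on a missing value with no default is this sketch's choice), fed with the properties of node 1's slave core from the solr.xml below:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of ${name} / ${name:default} substitution: use the
// property if it is set, else the text after the colon, else fail.
final class PropertySubstitution {
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^}:]+)(?::([^}]*))?\\}");

    static String resolve(String text, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                value = m.group(2); // default after the colon; null if none given
            }
            if (value == null) {
                throw new IllegalArgumentException("No value for ${" + m.group(1) + "}");
            }
            // quoteReplacement stops '$' or '\' in values from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Properties of the "slave" core on node 1.
        Map<String, String> node1Slave = Map.of("enable.slave", "true", "masterHost", "node2:8983");
        System.out.println(resolve("${enable.master:false}", node1Slave)); // false
        System.out.println(resolve("${enable.slave:false}", node1Slave));  // true
        System.out.println(resolve("http://${masterHost:xyz}/solr/master/replication", node1Slave));
        // http://node2:8983/solr/master/replication
    }
}
```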

 Node 1 solr.xml:
  <cores adminPath="/admin/cores" defaultCoreName="master">
    <core name="master" instanceDir="core1" shard="shard1" collection="myconf">
      <property name="enable.master" value="true" />
    </core>
    <core name="slave" instanceDir="core2" shard="shard2" collection="myconf">
      <property name="enable.slave" value="true" />
      <property name="masterHost" value="node2:8983" />
    </core>
  </cores>

 Node 2 solr.xml:
  <cores adminPath="/admin/cores" defaultCoreName="master">
    <core name="master" instanceDir="core1" shard="shard2" collection="myconf">
      <property name="enable.master" value="true" />
    </core>
    <core name="slave" instanceDir="core2" shard="shard1" collection="myconf">
      <property name="enable.slave" value="true" />
      <property name="masterHost" value="node1:8983" />
    </core>
  </cores>
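Yury's layout puts each shard's master and slave on different nodes. A small Java check of what that buys, with the placement copied from the two solr.xml files above (the class is hypothetical, for illustration only): losing either node leaves every shard with a searchable core, but leaves one shard without its master, so indexing into that shard stops until the node is back.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the two-node layout: node1 hosts the shard1 master and
// a shard2 slave; node2 hosts the shard2 master and a shard1 slave.
final class ShardPlacement {
    static final class Core {
        final String node;
        final String shard;
        final boolean master;

        Core(String node, String shard, boolean master) {
            this.node = node;
            this.shard = shard;
            this.master = master;
        }
    }

    static List<Core> twoNodeLayout() {
        return List.of(
                new Core("node1", "shard1", true),
                new Core("node1", "shard2", false),
                new Core("node2", "shard2", true),
                new Core("node2", "shard1", false));
    }

    /** True if every shard keeps a core (a master, if mastersOnly) on a surviving node. */
    private static boolean survives(List<Core> cores, Set<String> shards,
                                    String failedNode, boolean mastersOnly) {
        for (String shard : shards) {
            boolean covered = false;
            for (Core c : cores) {
                if (c.shard.equals(shard) && !c.node.equals(failedNode)
                        && (c.master || !mastersOnly)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                return false;
            }
        }
        return true;
    }

    static boolean searchSurvives(List<Core> cores, Set<String> shards, String failedNode) {
        return survives(cores, shards, failedNode, false);
    }

    static boolean indexingSurvives(List<Core> cores, Set<String> shards, String failedNode) {
        return survives(cores, shards, failedNode, true);
    }

    public static void main(String[] args) {
        Set<String> shards = Set.of("shard1", "shard2");
        for (String down : List.of("node1", "node2")) {
            System.out.println(down + " down: search=" + searchSurvives(twoNodeLayout(), shards, down)
                    + " indexing=" + indexingSurvives(twoNodeLayout(), shards, down));
        }
        // node1 down: search=true indexing=false
        // node2 down: search=true indexing=false
    }
}
```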




Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Mark Miller

On Sep 9, 2011, at 6:54 PM, Pulkit Singhal wrote:

 Thanks Again.
 
 Another question:
 
 My solr.xml has:
  <cores adminPath="/admin/cores" defaultCoreName="master1">
    <core name="master1" instanceDir="." shard="shard1" collection="myconf"/>
  </cores>
 
 And I omitted -Dcollection.configName=myconf from the startup command
 because I felt that specifying collection=myconf should take care of
 that:
 cd /trunk/solr/example
 java -Dbootstrap_confdir=./solr/conf -Dslave=disabled -DzkRun -jar start.jar

I don't think so? The CoreAdmin handler takes a collection= param to name the 
collection the SolrCore belongs to. This is different than setting the name of 
the config file set to use. If you don't specify anything, I believe it 
defaults to configuration1. The conf file set name has nothing to do with the 
collection name.

 
 But the zookeeper.jsp page doesn't seem to take any of that into
 effect and shows:
 /collections (v=6 children=1)
  collection1 (v=0 children=1) configName=configuration1
   shards (v=0 children=1)
    shard1 (v=0 children=1)
     tiklup-mac.local:8983_solr_ (v=0)
      node_name=tiklup-mac.local:8983_solr
      url=http://tiklup-mac.local:8983/solr/
 
 Then what is the point of naming the core and the collection?

The SolrCore name determines which URLs to use to work with that core. The
collection name determines which collection the SolrCore acts as a shard in.
The collection.configName is the name of the config file set you are uploading
- if you leave it out, it's called configuration1.
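To keep the three names Mark lists apart, here is a hypothetical Java model (illustration only, not SolrCloud code) fed with the values from Pulkit's experiments in this thread: the core name only shapes the URL, the collection attribute names the entry under /collections, and collection.configName, defaulting to configuration1, is the config set that entry points at. The /solr URL prefix is an assumption based on the URLs quoted in this thread.

```java
import java.util.Map;

// Hypothetical model of the three independent names; not SolrCloud code.
final class CloudNames {
    static final String DEFAULT_CONFIG_NAME = "configuration1"; // per Mark's reply

    /** Core name: the URL path the core is addressed under (assumes the /solr context). */
    static String coreUrlPath(String coreName) {
        return "/solr/" + coreName + "/";
    }

    /** Collection attribute + collection.configName: the entry shown under /collections. */
    static String collectionEntry(String collectionAttr, Map<String, String> sysProps) {
        String configName = sysProps.getOrDefault("collection.configName", DEFAULT_CONFIG_NAME);
        return collectionAttr + " configName=" + configName;
    }

    public static void main(String[] args) {
        // Reads -Dcollection.configName if it was passed to the JVM.
        String fromCmdLine = System.getProperty("collection.configName");
        Map<String, String> props = Map.of();
        if (fromCmdLine != null) {
            props = Map.of("collection.configName", fromCmdLine);
        }
        System.out.println(coreUrlPath("master1"));
        System.out.println(collectionEntry("scaleDeep", props));
        // without the -D flag:                  scaleDeep configName=configuration1
        // with -Dcollection.configName=myconf:  scaleDeep configName=myconf
    }
}
```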

 
 - Pulkit
 
 2011/9/9 Yury Kats yuryk...@yahoo.com:
 On 9/9/2011 10:52 AM, Pulkit Singhal wrote:
 Thank You Yury. After looking at your thread, there's something I must
 clarify: Is solr.xml not uploaded and held in ZooKeeper?
 
 Not as far as I understand. Cores are loaded/created by the local
 Solr server based on solr.xml and then registered with ZK, so that
 ZK knows what cores are out there and how they are organized in shards.
 
 
 because you have a slightly different config between Node 1 & 2:
 http://lucene.472066.n3.nabble.com/Replication-setup-with-SolrCloud-Zk-td2952602.html
 
 
 I have two shards, each shard having a master and a slave core.
 Cores are located so that master and slave are on different nodes.
 This protects search (but not indexing) from node failure.
 

- Mark Miller
lucidimagination.com
2011.lucene-eurocon.org | Oct 17-20 | Barcelona












Re: Replication setup with SolrCloud/Zk

2011-09-10 Thread Pulkit Singhal
Sorry, stupid question, now I see that the core still starts and the polling
process simply logs an error:

SEVERE: Master at: http://localhost:7574/solr/master2/replication is not available. Index fetch failed. Exception: Connection refused

I was able to set everything up with this thread's help, and wrote up detailed instructions here:
http://pulkitsinghal.blogspot.com/2011/09/multicore-master-slave-replication-in.html

Thanks,
- Pulkit






Nested documents

2011-09-10 Thread Andy
Hi,

Does Solr support nested documents? If not, is there any plan to add such a
feature?

Thanks.

Re: Replication setup with SolrCloud/Zk

2011-09-10 Thread Yury Kats
On 9/10/2011 3:54 PM, Pulkit Singhal wrote:
 Hi Yury,
 
 How do you manage to start the instances without any issues? The way I see
 it, no matter which instance is started first, the slave will complain about
 not being to find its respective master because that instance hasn't been
 started yet ... no?

Yes, but it's not a big deal. The slave polls periodically, so the next time
around, the master will be up.
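A toy Java simulation of Yury's point, with assumed timings (first poll at time 0, a fixed poll interval) and made-up log text rather than Solr's real messages: a failed poll is just logged, and the first poll that comes after the master has started succeeds, so start-up order only delays the first fetch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation with assumed timings; the log lines are made up, not Solr's.
final class SlavePolling {
    /** Returns the time (s) of the first successful fetch, recording each poll in log. */
    static long firstSuccessfulPoll(long masterUpAt, long pollInterval, List<String> log) {
        if (pollInterval <= 0) {
            throw new IllegalArgumentException("pollInterval must be positive");
        }
        for (long t = 0; ; t += pollInterval) {
            if (t >= masterUpAt) {
                log.add(t + "s: fetched index from master");
                return t;
            }
            log.add(t + "s: master not available, retrying at next poll");
        }
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        // Master comes up 90 s after the slave; the slave polls every 60 s.
        long ok = firstSuccessfulPoll(90, 60, log);
        log.forEach(System.out::println);
        System.out.println("first successful fetch at " + ok + "s"); // 120s
    }
}
```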


Re: Master Slave Question

2011-09-10 Thread Jamie Johnson
Is this feature on Trunk currently?

On Sat, Sep 10, 2011 at 12:26 PM, Patrick Sauts
patrick.via...@gmail.com wrote:
 Real Time indexing (solr 4) or decrease replication poll and auto commit
 time.





Re: Solr Cloud - is replication really a feature on the trunk?

2011-09-10 Thread Jamie Johnson
Thanks Mark, so perhaps a more appropriate config would be

<cores adminPath="/admin/cores" defaultCoreName="${coreName}">
  <core name="${coreName}" instanceDir="." shard="${shard}"
        collection="${collection}"/>
</cores>

which would require the collection and coreName to be specified as System
Properties.
