cores vs indices

2011-08-08 Thread Daniel Schobel
Can someone provide me with a succinct definition of what a solr core
is? Is there a one-to-one relationship of cores to solr indices or can
you have multiple indices per core?

Cheers,

Daniel


Re: cores vs indices

2011-08-08 Thread Dave Stuart
Hi Daniel,

Yes, there is a one-to-one relationship between Solr indices and cores. The 
one-to-many relationship comes when you look at the relationship between cores 
and Tomcat/Jetty webapp instances. This gives you the ability to clone, add 
and swap cores around.

See here for core manipulation functions:
http://wiki.apache.org/solr/CoreAdmin
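
A minimal multi-core setup (core names and paths below are illustrative, not 
from this thread) is declared in solr.xml, roughly like this for Solr 1.4/3.x:

```xml
<!-- Sketch of a solr.xml hosting two cores in one webapp instance.
     Each core is its own index; CoreAdmin manages them at adminPath. -->
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <core name="live" instanceDir="live"/>
    <core name="staging" instanceDir="staging"/>
  </cores>
</solr>
```

With a layout like this, a single CoreAdmin request such as 
action=SWAP&core=live&other=staging exchanges the two cores without 
restarting the webapp.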

Regards,


Dave


On 8 Aug 2011, at 04:35, Daniel Schobel wrote:

 Can someone provide me with a succinct definition of what a solr core
 is? Is there a one-to-one relationship of cores to solr indices or can
 you have multiple indices per core?
 
 Cheers,
 
 Daniel



Can Master push data to slave

2011-08-08 Thread Pawan Darira
Hi

I am using Solr 1.4 and doing a replication process where my slave is
pulling data from the master. I have 2 questions:

a. Can the master push data to the slave?
b. How can I make sure that a lock file is not created during replication?

Please help

thanks
Pawan


string cut-off filter?

2011-08-08 Thread Bernd Fehling

Hi list,

is there a string cut-off filter to limit the length
of a KeywordTokenized string?

So the string should not be dropped, only limited to a
certain length.

Regards
Bernd


Scoring using POJO/SolrJ

2011-08-08 Thread Kissue Kissue
Hi,

I am using the SolrJ client library and using a POJO with the @Field
annotation to index documents and to retrieve documents from the index. I
retrieve the documents from the index like so:

List<Item> beans = response.getBeans(Item.class);

Now in order to add the scores to the beans I added a field called score
with the @Field annotation, and the scores were then returned when I read
from the index.

Now when I am indexing, I get the error: ERROR:unknown field 'score'. I
guess this is because it expects score to be defined in my schema. Now I am
thinking that if I define this field in my schema then rather than returning
the document scores it might just go ahead and return actual values for the
field (null if I don't add a value).

How can I get around this problem?

Many thanks.


how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Li Li
hi all,
I read the Apache Solr 3.1 release notes today and found that
MMapDirectory is now the default implementation on 64-bit systems.
I am now using Solr 1.4 with a 64-bit JVM on Linux. How can I use
MMapDirectory? Will it improve performance?


Multiplexing TokenFilter for multi-language?

2011-08-08 Thread cnyee
Sorry if this has already been discussed, but I have already spent a couple
of days googling in vain

The problem:
- documents in multiple languages (us, de, fr, es).
- language is known (a team of editors determines the language manually, and
users are asked to specify language option for searching).

My intended approach:
- one index.
- a multiplexing token filter, a MultilingualSnowballFilterFactory that
instantiates a Snowball Stemmer for the appropriate language.
- language is a facet, to get rid of cross-language ambiguities with
multiple languages mixed in the same field.

The problem is how to communicate the language to the
MultilingualSnowballFilterFactory. Once the language is known, instantiating
the Snowball Stemmer for the right language is easy. I got a working version
attached below. 

My solution:
- append the language as the first token for the FilterFactory to pick up.
E.g. "es This is a Spanish document."
- this would mean I need to duplicate the fields - an original version for
storing, and a version with the language marker appended for indexing. E.g
description (indexed=false, stored=true), description_i (indexed=true,
stored=false).
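
The duplicate-field layout described above might look like this in schema.xml 
(field names follow the post; the "text_multilingual" type name is a 
hypothetical one wired to the MultilingualSnowballFilterFactory):

```xml
<!-- Stored original plus an indexed copy carrying the language marker. -->
<field name="description"   type="string"            indexed="false" stored="true"/>
<field name="description_i" type="text_multilingual" indexed="true"  stored="false"/>
```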

Is there a better way?

Many thanks in advance.

Yee

http://lucene.472066.n3.nabble.com/file/n3235341/MultilingualSnowballFilterFactory.java
MultilingualSnowballFilterFactory.java 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-tp3235341p3235341.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Dyer, James
If you want to try MMapDirectory with Solr 1.4, then copy the class 
org.apache.solr.core.MMapDirectoryFactory from 3.x or Trunk, and either add it 
to the .war file (you can just add it under src/java and re-package the war), 
or you can put it in its own .jar file in the lib directory under 
solr_home.  Then, in solrconfig.xml, add this entry under the root config 
element:

<directoryFactory class="org.apache.solr.core.MMapDirectoryFactory" />

I'm not sure if MMapDirectory will perform better for you with Linux over 
NIOFSDir.  I'm pretty sure in Trunk/4.0 it's the default for Windows and maybe 
Solaris.  In Windows, there is a definite advantage for using MMapDirectory on 
a 64-bit system.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Li Li [mailto:fancye...@gmail.com] 
Sent: Monday, August 08, 2011 4:09 AM
To: solr-user@lucene.apache.org
Subject: how to enable MMapDirectory in solr 1.4?

hi all,
I read Apache Solr 3.1 Released Note today and found that
MMapDirectory is now the default implementation in 64 bit Systems.
I am now using solr 1.4 with 64-bit jvm in Linux. how can I use
MMapDirectory? will it improve performance?


PositionIncrement gap and multi-valued fields.

2011-08-08 Thread Luis Cappa Banda
Hello!

I have a doubt about the behaviour of searching over field types that have
positionIncrementGap defined. For example, suppose that:


   1. We have a field called test defined as multi-valued and white space
   tokenized.
   2. The index has a single document with a test value:

<str>TEST1</str>
<str>AAA BBB</str>
<str>CCC DDD</str>
<str>EEE FFF</str>
<str>TEST2</str>


I read that positionIncrementGap defines the virtual space between the last
token of one field instance and the first token of the next instance
(source:
http://lucene.472066.n3.nabble.com/positionIncrementGap-in-schema-xml-td488338.html).
When it says "last token of one field instance", does that mean the last token
of the first entry of the multi-valued content? In the example above, that
would be TEST1.
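
For reference, the gap is declared on the field type in schema.xml; a sketch 
(the analyzer shown is illustrative, not from this thread):

```xml
<!-- positionIncrementGap inserts 100 virtual positions between successive
     values of a multi-valued field, so phrase/slop queries with a smaller
     slop cannot match across value boundaries. -->
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```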

Anyway, I've been doing some tests modifying the positionIncrementGap value
with high values and low values. Can anybody explain in detail what
implications a higher or lower value has on the Solr scoring algorithm? I
would like to understand how this value affects matching results in fields
and also the final score calculation (maybe a bigger gap implies more spaces
and a worse score when the value matches, etc.).

Thank you for reading so far!


Re: Weighted facet strings

2011-08-08 Thread Jonathan Rochkind
One kind of hacky way to accomplish some of those tasks involves 
creating a lot more Solr fields. (This kind of 'de-normalization' is 
often the answer to how to make Solr do something).


So facet fields are ordinarily not tokenized or normalized at all. But 
that doesn't work very well for matching query terms.  So if you want 
actual queries to match on these categories, you probably want an 
additional field that is tokenized/analyzed.  If you want to boost 
different category assignments differently, you probably want _multiple_ 
additional tokenized/analyzed fields.


So for instance, create separate analyzed fields for each category 
'weight', perhaps using the default 'text' analysis type.


category_text_weight_1
category_text_weight_2
etc

Then use dismax to query, include all those category_text_* fields in 
the 'qf', and boost the higher weight ones more than the lower weight ones.
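
The per-weight fields suggested here might be declared like this in schema.xml 
(names follow the post; the attribute values are assumptions):

```xml
<!-- One analyzed field per category weight; the dismax qf parameter then
     boosts the higher-weight fields harder than the lower-weight ones. -->
<field name="category_text_weight_1" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="category_text_weight_2" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="category_text_weight_3" type="text" indexed="true" stored="false" multiValued="true"/>
```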


That will handle a number of your use cases, but not all of them.

Your first two cases are the most problematic:

filter: category=some_category_name, query: *.* - Results should be 
scored by the above mentioned weight 


So Solr doesn't really work like that. Normally a filter does not affect 
the scoring of the actual results _at all_. But if you change the query to:


fq=category:some_category
q=some_category
defType=dismax
qf=category_text_weight_1 category_text_weight_2^10 
category_text_weight_3^20


THEN, with the multiple analyzed category_text_weight_* fields, as 
described above, I think it should do what you want. You may have to 
play with exactly what boost to give to each field.


But your second use case is still tricky.

Solr doesn't really do exactly what you ask, but by using this method I 
think you can figure out hacky ways to accomplish it.  I'm not sure if 
it will solve all of your use cases, but maybe this will give you a 
start to figuring it out.



On 8/5/2011 6:55 AM, Michael Lorz wrote:

Hi all,

I have documents which are (manually) tagged with categories. Each
category-document relation has a weight between 1 and 5:

5: document fits perfectly in this category,
.
.
1: document may be considered as belonging to this category.


I would now like to use this information with solr. At the moment, I don't use
the weight at all:

<field name="category" type="string" indexed="true" stored="true"
multiValued="true"/>

Both the category as well as the document body are specified as query fields
(<str name="qf"> in solrconfig.xml).


What I would like is the following:

- filter: category=some_category_name, query: *.*  - Results should be scored by
the above mentioned weight
- filter: category=some_category_name, query: some_keyword - Results should be
scored by a combination of the score of 'some_keyword' and the above mentioned
weight
- filter: none, query: some_category_name - Documents with category
'some_category_name' should be found as well as documents which contain the term
'some_category_name'. Results should be scored by a combination of the score of
'some_keyword' and the above mentioned weight


Do you have any ideas how this could be done?

Thanks in advance
Michi


Re: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Rich Cariens
We patched our 1.4.1 build with SOLR-1969
(https://issues.apache.org/jira/browse/SOLR-1969), making MMapDirectory
configurable, and realized a 64% search performance boost on our Linux hosts.

On Mon, Aug 8, 2011 at 10:05 AM, Dyer, James james.d...@ingrambook.com wrote:

 If you want to try MMapDirectory with Solr 1.4, then copy the class
 org.apache.solr.core.MMapDirectoryFactory from 3.x or Trunk, and either add
 it to the .war file (you can just add it under src/java and re-package the
 war), or you can put it in its own .jar file in the lib directory under
 solr_home.  Then, in solrconfig.xml, add this entry under the root
 config element:

 <directoryFactory class="org.apache.solr.core.MMapDirectoryFactory" />

 I'm not sure if MMapDirectory will perform better for you with Linux over
 NIOFSDir.  I'm pretty sure in Trunk/4.0 it's the default for Windows and
 maybe Solaris.  In Windows, there is a definite advantage for using
 MMapDirectory on a 64-bit system.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Li Li [mailto:fancye...@gmail.com]
 Sent: Monday, August 08, 2011 4:09 AM
 To: solr-user@lucene.apache.org
 Subject: how to enable MMapDirectory in solr 1.4?

 hi all,
I read Apache Solr 3.1 Released Note today and found that
 MMapDirectory is now the default implementation in 64 bit Systems.
I am now using solr 1.4 with 64-bit jvm in Linux. how can I use
 MMapDirectory? will it improve performance?



solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-08 Thread Ian Connor
Hi,

I have seen some of these errors come through from time to time. It looks
like:

/usr/lib/ruby/1.8/net/http.rb:1060:in
`request'\n/usr/lib/ruby/1.8/net/http.rb:845:in `post'

/usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in
`post'

/usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:151:in
`send'

/usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:174:in
`create_and_send_query'

/usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:92:in
`query'

It is as if the http object has gone away. Would it be good to create a new
one inside of the connection or is something more serious going on?
ubuntu 10.04
passenger 3.0.8
rails 2.3.11

-- 
Regards,

Ian Connor


Re: solr-ruby: Error undefined method `closed?' for nil:NilClass

2011-08-08 Thread Erik Hatcher
Ian -

What does your solr-ruby code look like?

Solr::Connection is light-weight, so you could just construct a new one of 
those for each request.  Are you keeping an instance around?

Erik


On Aug 8, 2011, at 12:03 , Ian Connor wrote:

 Hi,
 
 I have seen some of these errors come through from time to time. It looks
 like:
 
 /usr/lib/ruby/1.8/net/http.rb:1060:in
 `request'\n/usr/lib/ruby/1.8/net/http.rb:845:in `post'
 
 /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:158:in
 `post'
 
 /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:151:in
 `send'
 
 /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:174:in
 `create_and_send_query'
 
 /usr/lib/ruby/gems/1.8/gems/solr-ruby-0.0.8/lib/solr/connection.rb:92:in
 `query'
 
 It is as if the http object has gone away. Would it be good to create a new
 one inside of the connection or is something more serious going on?
 ubuntu 10.04
 passenger 3.0.8
 rails 2.3.11
 
 -- 
 Regards,
 
 Ian Connor



edismax configuration

2011-08-08 Thread Mark juszczec
Hello all

Can someone direct me to a link with config info in order to allow use of
the edismax QueryHandler?

Mark


is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to list some data based on a function I run,
specifically termfreq(post_text,'indie music'), and I am unable to do it
without passing data to the q parameter. Is it possible to get a sorted
list without searching for any terms?


Test failures on lucene_solr_3_3 and branch_3x

2011-08-08 Thread Shawn Heisey
I've got a consistent test failure on Solr source code checked out from 
svn.  The same thing happens with 3.3 and branch_3x.  I have information 
saved from the failures on branch_3x, which I have gotten to fail 
about a dozen times in a row.  It fails on a test called 
TestSqlEntityProcessorDelta, part of the dataimporthandler tests.  It is 
consistently reproducible in a shorter timeframe than normal with the 
following commandline:


ant test -Dtestcase=TestSqlEntityProcessorDelta

Comprehensive ant output here, from a full test run:

http://pastebin.com/eyAt8Qg8

Platform information:

[root@idxst0-a solr]# uname -a
Linux idxst0-a 2.6.18-238.12.1.el5.centos.plusxen #1 SMP Wed Jun 1 
11:57:54 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

[root@idxst0-a solr]# cat /etc/redhat-release
CentOS release 5.6 (Final)
[root@idxst0-a solr]# java -version
java version 1.6.0_26
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)

[root@idxst0-a yum.repos.d]# yum repolist
Loaded plugins: fastestmirror, protectbase
Loading mirror speeds from cached hostfile
 * addons: mirror.san.fastserv.com
 * base: mirrors.tummy.com
 * centosplus: mirror.san.fastserv.com
 * contrib: mirror.san.fastserv.com
 * epel: mirrors.xmission.com
 * extras: mirrors.xmission.com
 * jpackage-generic: jpackage.netmindz.net
 * jpackage-generic-nonfree: www.mirrorservice.org
 * jpackage-generic-nonfree-updates: www.mirrorservice.org
 * jpackage-generic-updates: jpackage.netmindz.net
 * jpackage-rhel: jpackage.netmindz.net
 * jpackage-rhel-updates: jpackage.netmindz.net
 * rpmforge: fr2.rpmfind.net
 * updates: mirrors.tummy.com



Re: is it possible to do a sort without query?

2011-08-08 Thread Alexei Martchenko
You can use the standard query parser and pass q=*:*

2011/8/8 Jason Toy jason...@gmail.com

 I am trying to list some data based on a function I run ,
 specifically  termfreq(post_text,'indie music')  and I am unable to do it
 without passing in data to the q paramater.  Is it possible to get a sorted
 list without searching for any terms?




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
Alexei, thank you, that does seem to work.

My sort results seem to be totally wrong though; I'm not sure if it's because
of my sort function or something else.

My query consists of:
sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
And I get back 4571232 hits.
None of the results have the phrase indie music anywhere in their data.
 Does termfreq not support phrases?
If not, how can I sort specifically by termfreq of a phrase?



On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko 
ale...@superdownloads.com.br wrote:

 You can use the standard query parser and pass q=*:*

 2011/8/8 Jason Toy jason...@gmail.com

  I am trying to list some data based on a function I run ,
  specifically  termfreq(post_text,'indie music')  and I am unable to do it
  without passing in data to the q paramater.  Is it possible to get a
 sorted
  list without searching for any terms?
 



 --

 *Alexei Martchenko* | *CEO* | Superdownloads
 ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533




-- 
- sent from my mobile
6176064373


solr 3.1, not indexing entire document?

2011-08-08 Thread dhastings
hi, i have my solr field text configured as per an earlier discussion:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100"
autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="1"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
      generateWordParts="1" generateNumberParts="1" catenateWords="0"
      catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


and for debugging purposes i am storing the text field as well, so:


   <field name="text" type="text" indexed="true" stored="true" />

now when i do a search against a document that i KNOW has a certain phrase,
in this case "official handbook of the Federal Government",

my query looks like:

<result name="response" numFound="0" start="0" maxScore="0.0"/><lst
name="debug"><str name="rawquerystring">id:062085.1 AND text:"official
handbook of the Federal Government"</str><str name="querystring">id:062085.1
AND text:"official handbook of the Federal Government"</str><str
name="parsedquery">+id:062085.1 +PhraseQuery(text:"official handbook of the
federal government")</str><str name="parsedquery_toString">+id:062085.1
+text:"official handbook of the federal government"</str>


i get 0 results. So when i search just for that id, i get the result:


way way at the end sure enough is the string

http://qihealing.net/doc.txt output 

is there a document size limit, or is it the fact that i'm sending to solr
using solrj and it's too large?






--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-1-not-indexing-entire-document-tp3236719p3236719.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: solr 3.1, not indexing entire document?

2011-08-08 Thread Markus Jelsma
Check your maxFieldLength setting.
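
In Solr 1.4/3.x this setting lives in solrconfig.xml and defaults to 10000 
tokens per field, which silently truncates very large documents at index time. 
Raising it lets the whole document be indexed (the value below is illustrative):

```xml
<maxFieldLength>2147483647</maxFieldLength>
```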

 hi, i have my solr field text configured as per earlier discussion:
 
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
 autoGeneratePhraseQueries="true">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
       generateWordParts="1" generateNumberParts="1" catenateWords="1"
       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.WordDelimiterFilterFactory"
       generateWordParts="1" generateNumberParts="1" catenateWords="0"
       catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 
 and for debugging purposes i am storing the text field as well, so:
 
 
    <field name="text" type="text" indexed="true" stored="true" />
 
 now when i do a search against a document, that i KNOW has a certain
 phrase, in this case official handbook of the Federal Government
 
 my query looks like:
 
 <result name="response" numFound="0" start="0" maxScore="0.0"/><lst
 name="debug"><str name="rawquerystring">id:062085.1 AND text:"official
 handbook of the Federal Government"</str><str
 name="querystring">id:062085.1 AND text:"official handbook of the Federal
 Government"</str><str
 name="parsedquery">+id:062085.1 +PhraseQuery(text:"official handbook of the
 federal government")</str><str name="parsedquery_toString">+id:062085.1
 +text:"official handbook of the federal government"</str>
 
 
 i get 0 results, so, when i search just for that id, and i get the result:
 
 
 way way at the end sure enough is the string
 
 http://qihealing.net/doc.txt output
 
 is there a document size limit or is it the fact that im sending to solr
 using solrj and its too large?
 
 
 
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/solr-3-1-not-indexing-entire-document-t
 p3236719p3236719.html Sent from the Solr - User mailing list archive at
 Nabble.com.


Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Yury Kats
On 8/8/2011 4:34 PM, Jason Toy wrote:
 Aelexei, thank you , that does seem to work.
 
 My sort results seem to be totally wrong though, I'm not sure if its because
 of my sort function or something else.
 
 My query consists of:
 sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
 And I get back 4571232 hits.

That would be the total number of docs, I guess.
Since your query is *:*, ie find everything.

 All the results don't have the phrase indie music anywhere in their data.

You are only sorting on termfreq of indie music, you are not querying
documents that contain it.


Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma

 Aelexei, thank you , that does seem to work.
 
 My sort results seem to be totally wrong though, I'm not sure if its
 because of my sort function or something else.
 
 My query consists of:
 sort=termfreq(all_lists_text,'indie+music')+desc&q=*:*&rows=100
 And I get back 4571232 hits.

That's normal, you issue a catch-all query. Sorting should work, but..

 All the results don't have the phrase indie music anywhere in their data.
  Does termfreq not support phrases?

No, it is TERM frequency and indie music is not one term. I don't know how 
this function parses your input but it might not understand your + escape and 
think it's one term consisting of exactly that.

 If not, how can I sort specifically by termfreq of a phrase?

You cannot. What you can do is index multiple terms as one term using the 
shingle filter. Take care, it can significantly increase your index size and 
number of unique terms.
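
As a plain-Java illustration of the idea (this is not Solr's 
ShingleFilterFactory, just a sketch of what 2-word shingling produces), each 
pair of adjacent tokens becomes a single term, so a phrase like "indie music" 
turns into one indexable term that a term-frequency function can count:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of 2-word shingling: adjacent token pairs become single terms.
public class ShingleSketch {
    public static List<String> bigrams(String[] tokens) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 1 < tokens.length; i++) {
            // Join each adjacent pair into one "term".
            out.add(tokens[i] + " " + tokens[i + 1]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams(new String[]{"best", "indie", "music", "blog"}));
        // prints [best indie, indie music, music blog]
    }
}
```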

 
 
 
 On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko 
 
 ale...@superdownloads.com.br wrote:
  You can use the standard query parser and pass q=*:*
  
  2011/8/8 Jason Toy jason...@gmail.com
  
   I am trying to list some data based on a function I run ,
   specifically  termfreq(post_text,'indie music')  and I am unable to do
   it without passing in data to the q paramater.  Is it possible to get
   a
  
  sorted
  
   list without searching for any terms?
  
  --
  
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
  5083.1018/5080.3535/5080.3533


Re: edismax configuration

2011-08-08 Thread Markus Jelsma
http://wiki.apache.org/solr/CommonQueryParameters#defType
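
edismax needs no special configuration in Solr 3.1+; pass defType=edismax on 
the request, or set it as a handler default. A sketch (handler name and qf 
fields are made up for illustration):

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^2 body</str>
  </lst>
</requestHandler>
```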

 Hello all
 
 Can someone direct me to a link with config info in order to allow use of
 the edismax QueryHandler?
 
 Mark


Re: edismax configuration

2011-08-08 Thread Mark juszczec
Got it.  Thank you.

I thought this was going to be much more difficult than it actually was.

Mark

On Mon, Aug 8, 2011 at 4:50 PM, Markus Jelsma markus.jel...@openindex.io wrote:

 http://wiki.apache.org/solr/CommonQueryParameters#defType

  Hello all
 
  Can someone direct me to a link with config info in order to allow use of
  the edismax QueryHandler?
 
  Mark



Re: PivotFaceting in solr 3.3

2011-08-08 Thread Erik Hatcher
As far as I know, there isn't a patch for pivot faceting for 3.x.  It'd require 
extracting the code from trunk and porting it.  Perhaps as easy as applying the 
diff from the pivot commit from trunk to the 3.x codebase?  (but probably not 
quite that easy)

Erik

On Aug 3, 2011, at 00:58 , Isha Garg wrote:

 Hi Pranav,
 
 I know pivot faceting is a feature in Solr 4.0, but what I want to know is 
 whether there is any patch that can make pivot faceting possible in Solr 3.3.
 Thanks!
 Isha
 
 
 On Wednesday 03 August 2011 10:23 AM, Pranav Prakash wrote:
 From what I know, this is a feature in Solr 4.0 marked as SOLR-792 in JIRA.
 Is this what you are looking for ?
 
 https://issues.apache.org/jira/browse/SOLR-792
 
 
 *Pranav Prakash*
 
 temet nosce
 
 Twitter: http://twitter.com/pranavprakash | Blog: http://blog.myblive.com |
 Google: http://www.google.com/profiles/pranny
 
 
 On Wed, Aug 3, 2011 at 10:16, Isha Garg isha.g...@orkash.com wrote:
 
   
 Hi All!
 
 Can anyone tell me which patch I should apply to solr 3.3 to enable pivot
 faceting in it.
 
 Thanks in advance!
 Isha garg
 
 
 
 
 
 
   
 



Re: string cut-off filter?

2011-08-08 Thread karsten-solr
Hi Bernd,

I also searched for such a filter but did not find it.

Best regards
  Karsten

P.S. I am using now this filter:

public class CutMaxLengthFilter extends TokenFilter {

public CutMaxLengthFilter(TokenStream in) {
this(in, DEFAULT_MAXLENGTH);
}

public CutMaxLengthFilter(TokenStream in, int maxLength) {
super(in);
this.maxLength = maxLength;
}

public static final int DEFAULT_MAXLENGTH = 15;
private final int maxLength;
private final CharTermAttribute termAtt = 
addAttribute(CharTermAttribute.class);

@Override
public final boolean incrementToken() throws IOException {
if (!input.incrementToken()) {
return false;
}
int length = termAtt.length();
if (maxLength > 0 && length > maxLength) {
termAtt.setLength(maxLength);
}
return true;
}
}

with this factory

public class CutMaxLengthFilterFactory extends BaseTokenFilterFactory {

private int maxLength;

@Override
public void init(Map<String, String> args) {
super.init(args);
maxLength = getInt("maxLength", 
CutMaxLengthFilter.DEFAULT_MAXLENGTH);
}

public TokenStream create(TokenStream input) {
return new CutMaxLengthFilter(input, maxLength);
}
}



 Original Message 
 Date: Mon, 08 Aug 2011 10:15:45 +0200
 From: Bernd Fehling bernd.fehl...@uni-bielefeld.de
 To: solr-user@lucene.apache.org
 Subject: string cut-off filter?

 Hi list,
 
 is there a string cut-off filter to limit the length
 of a KeywordTokenized string?
 
 So the string should not be dropped, only limited to a
 certain length.
 
 Regards
 Bernd


Re: Dispatching a query to multiple different cores

2011-08-08 Thread Erik Hatcher
You could use Solr's distributed (shards parameter) capability to do this.  
However, if you've got somewhat different schemas that isn't necessarily going 
to work properly.  Perhaps unify your schemas in order to facilitate this using 
Solr's distributed search feature?

Erik

On Aug 3, 2011, at 05:22 , Ahmed Boubaker wrote:

 Hello there!
 
 I have a multicore solr with 6 different simple cores and somewhat
 different schemas, and I defined another meta core which I would like to be a
 dispatcher: the requests are sent to the simple cores and results are
 aggregated before sending the results back to the user.
 
 Any idea or hints how can I achieve this?
 I am wondering whether writing custom SearchComponent or a custom
 SearchHandler are good entry points?
 Is it possible to access other SolrCores which are in the same container as
 the meta core?
 
 Many thanks for your help.
 
 Boubaker



Re: solr 3.1, not indexing entire document?

2011-08-08 Thread dhastings
that was it... thanks.  obviously the document is well over 2 MB.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-3-1-not-indexing-entire-document-tp3236719p3236773.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: string cut-off filter?

2011-08-08 Thread Markus Jelsma
There is none indeed, except using copyField and maxChars. Could you perhaps 
come up with some regex that matches the group of chars beyond the desired 
limit and replaces it with ''?

That would fit in a pattern replace char filter.

 Hi Bernd,
 
 I also searched for such a filter but did not found it.
 
 Best regards
   Karsten
 
 P.S. I am using now this filter:
 
 public class CutMaxLengthFilter extends TokenFilter {
 
   public CutMaxLengthFilter(TokenStream in) {
   this(in, DEFAULT_MAXLENGTH);
   }
 
   public CutMaxLengthFilter(TokenStream in, int maxLength) {
   super(in);
   this.maxLength = maxLength;
   }
 
   public static final int DEFAULT_MAXLENGTH = 15;
   private final int maxLength;
   private final CharTermAttribute termAtt =
 addAttribute(CharTermAttribute.class);
 
   @Override
   public final boolean incrementToken() throws IOException {
   if (!input.incrementToken()) {
   return false;
   }
   int length = termAtt.length();
   if (maxLength > 0 && length > maxLength) {
   termAtt.setLength(maxLength);
   }
   return true;
   }
 }
 
 with this factory
 
 public class CutMaxLengthFilterFactory extends BaseTokenFilterFactory {
 
   private int maxLength;
 
   @Override
   public void init(Map<String, String> args) {
   super.init(args);
   maxLength = getInt("maxLength", 
 CutMaxLengthFilter.DEFAULT_MAXLENGTH);
   }
 
   public TokenStream create(TokenStream input) {
   return new CutMaxLengthFilter(input, maxLength);
   }
 }
 
 
 
  Original Message 
 
  Date: Mon, 08 Aug 2011 10:15:45 +0200
  From: Bernd Fehling bernd.fehl...@uni-bielefeld.de
  To: solr-user@lucene.apache.org
  Subject: string cut-off filter?
  
  Hi list,
  
  is there a string cut-off filter to limit the length
  of a KeywordTokenized string?
  
  So the string should not be dropped, only limitited to a
  certain length.
  
  Regards
  Bernd


Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Fred Smith
Hi,
Currently we are in the process of figuring out how to deal with
millions of CSV files containing weather data (20+ million files). Each
file is about 500 bytes in size.
We want to calculate statistics on fields read from the file. For
example, the standard deviation of wind speed across all 20+ million files.
Processing speed isn't an important issue. The analysis routine can run
for days, if needed.

The StatsComponent(http://wiki.apache.org/solr/StatsComponent) for Solr
appears to be able to calculate the statistics we are interested in.

Will the StatsComponent in Solr do what we need with minimal configuration?
Can the StatsComponent be used on only a subset of the data? For
example, only look at data from certain months?
Are there other free programs out there that can parse and analyze 20+
million files?

We are still very new to Solr and really appreciate all your help.
Thanks,
Fred


Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
Aren't dismax queries able to search for phrases using the default
index (which is what I am using)? If I can already do phrase searches, I
don't understand why I would need to reindex to be able to access phrases
from a function.

On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma markus.jel...@openindex.io wrote:


  Alexei, thank you, that does seem to work.
 
  My sort results seem to be totally wrong though, I'm not sure if its
  because of my sort function or something else.
 
  My query consists of:
  sort=termfreq(all_lists_text,'indie+music')+descq=*:*rows=100
  And I get back 4571232 hits.

 That's normal, you issue a catch all query. Sorting should work but..

  All the results don't have the phrase indie music anywhere in their
 data.
   Does termfreq not support phrases?

 No, it is TERM frequency and indie music is not one term. I don't know how
 this function parses your input but it might not understand your + escape
 and
think it's one term consisting of exactly that.

  If not, how can I sort specifically by termfreq of a phrase?

 You cannot. What you can do is index multiple terms as one term using the
 shingle filter. Take care, it can significantly increase your index size
 and
 number of unique terms.

 
 
 
  On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko 
 
  ale...@superdownloads.com.br wrote:
   You can use the standard query parser and pass q=*:*
  
   2011/8/8 Jason Toy jason...@gmail.com
  
I am trying to list some data based on a function I run ,
specifically  termfreq(post_text,'indie music')  and I am unable to
 do
it without passing in data to the q parameter.  Is it possible to get
a
  
   sorted
  
list without searching for any terms?
  
   --
  
   *Alexei Martchenko* | *CEO* | Superdownloads
   ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
   5083.1018/5080.3535/5080.3533




-- 
- sent from my mobile
6176064373


Example Solr Config on EC2

2011-08-08 Thread Matt Shields
I'm looking for some examples of how to setup Solr on EC2.  The
configuration I'm looking for would have multiple nodes for redundancy.
 I've tested in-house with a single master and slave with replication
running in Tomcat on Windows Server 2003, but even if I have multiple slaves
the single master is a single point of failure.  Any suggestions or example
configurations?  The project I'm working on is a .NET setup, so ideally I'd
like to keep this search cluster on Windows Server, even though I prefer
Linux.

Matthew Shields
Owner
BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
Managed Services
www.beantownhost.com
www.sysadminvalley.com
www.jeeprally.com


Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Walter Underwood
This does not seem well matched to Solr. Solr and Lucene are optimized to show 
the best few matches, not every match.

I'd use Hadoop for this. Or MarkLogic, if you'd like to talk about that 
off-list.

wunder
Lead Engineer, MarkLogic

On Aug 8, 2011, at 1:59 PM, Fred Smith wrote:

 Hi,
 Currently we are in the process of figuring out how to deal with
 millions of CSV files containing weather data(20+ million files). Each
 file is about 500 bytes in size.
 We want to calculate statistics on fields read from the file. For
 example, the standard deviation of wind speed across all 20+ million files.
 Processing speed isn't an important issue. The analysis routine can run
 for days, if needed.
 
 The StatsComponent(http://wiki.apache.org/solr/StatsComponent) for Solr
 appears to be able to calculate the statistics we are interested in.
 
 Will the StatsComponent in Solr do what we need with minimal configuration?
 Can the StatsComponent only be used on a subset of the data? For
 example, only look at data from certain months?
 Are there other free programs out there that can parse and analyze 20+
 million files?
 
 We are still very new to Solr and really appreciate all your help.
 Thanks,
 Fred



Re: Dispatching a query to multiple different cores

2011-08-08 Thread Jonathan Rochkind
However, if you unify your schemas to do this, I'd consider whether you 
really want separate cores/shards in the first place.


If you want to search over all of them together, what are your reasons 
to put them in separate solr indexes in the first place?  Ordinarily, if 
you want to search over them all together, the best place to start is 
putting them in the same solr index.


Then, the distribution/sharding feature is generally your next step, 
only if you have so many documents that you need to shard for 
performance reasons. That is the intended use case of the 
distribution/sharding feature.


On 8/8/2011 4:54 PM, Erik Hatcher wrote:

You could use Solr's distributed (shards parameter) capability to do this.  
However, if you've got somewhat different schemas that isn't necessarily going 
to work properly.  Perhaps unify your schemas in order to facilitate this using 
Solr's distributed search feature?

Erik

On Aug 3, 2011, at 05:22 , Ahmed Boubaker wrote:


Hello there!

I have a multicore solr with 6 different simple cores and somewhat
different schemas and I defined another meta core which I would like to be a
dispatcher:  the requests are sent to simple cores and results are
aggregated before sending back the results to the user.

Any idea or hints how can I achieve this?
I am wondering whether writing custom SearchComponent or a custom
SearchHandler are good entry points?
Is it possible to access other SolrCores which are in the same container as
the meta core?

Many thanks for your help.

Boubaker




Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Markus Jelsma
 Hi,
 Currently we are in the process of figuring out how to deal with
 millions of CSV files containing weather data(20+ million files). Each
 file is about 500 bytes in size.
 We want to calculate statistics on fields read from the file. For
 example, the standard deviation of wind speed across all 20+ million files.
 Processing speed isn't an important issue. The analysis routine can run
 for days, if needed.
 
 The StatsComponent(http://wiki.apache.org/solr/StatsComponent) for Solr
 appears to be able to calculate the statistics we are interested in.
 
 Will the StatsComponent in Solr do what we need with minimal configuration?
 Can the StatsComponent only be used on a subset of the data? For
 example, only look at data from certain months?

If I remember correctly, it cannot.

 Are there other free programs out there that can parse and analyze 20+
 million files?

Yes, if analyzing data like yours is all you do (not search, which is Solr's 
power) then you're most likely much better off not using Solr and writing 
map/reduce programs for Apache Hadoop; it will analyze huge amounts of data. 
Hadoop can be quite difficult to start with so you can use the excellent Apache 
CouchDB database that supports map/reduce as well.

CouchDB is much easier to begin with. If you transform a sample of your data 
to the JSON format, install CouchDB, load your data, write a simple map/reduce 
function all in 8 hours. Loading and processing all the data will take a bit 
longer.

Cheers


 
 We are still very new to Solr and really appreciate all your help.
 Thanks,
 Fred


Re: Example Solr Config on EC2

2011-08-08 Thread Yury Kats
On 8/8/2011 5:03 PM, Matt Shields wrote:
 I'm looking for some examples of how to setup Solr on EC2.  The
 configuration I'm looking for would have multiple nodes for redundancy.
  I've tested in-house with a single master and slave with replication
 running in Tomcat on Windows Server 2003, but even if I have multiple slaves
 the single master is a single point of failure.  Any suggestions or example
 configurations?

This article describes various configurations:
http://www.lucidimagination.com/content/scaling-lucene-and-solr#d0e410


Re: csv responsewriter and numfound

2011-08-08 Thread Erik Hatcher
Great question.  But how would that get returned in the response?  

It is a drag that the header is lost when results are written in CSV, but there 
really isn't an obvious spot for that information to be returned.

Erik

On Aug 4, 2011, at 01:52 , Pooja Verlani wrote:

 Hi,
 
 Is there anyway to get numFound from csv response format? Some parameter?
 Or shall I change the code for csvResponseWriter for this?
 
 Thanks,
 Pooja



Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jonathan Rochkind

Dismax queries can. But

sort=termfreq(all_lists_text,'indie+music')

is not using dismax. Apparently the termfreq function cannot? I am not familiar 
with the termfreq function.

To understand why you'd need to reindex, you might want to read up on how 
lucene actually works, to get a basic understanding of how different indexing 
choices affect what is possible at query time. Lucene In Action is a pretty 
good book.



On 8/8/2011 5:02 PM, Jason Toy wrote:

Are not  Dismax queries able to search for phrases using the default
index(which is what I am using?) If I can already do phrase  searches, I
don't understand why I would need to reindex to be able to access phrases
from a function.

On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsmamarkus.jel...@openindex.iowrote:


Alexei, thank you, that does seem to work.

My sort results seem to be totally wrong though, I'm not sure if its
because of my sort function or something else.

My query consists of:
sort=termfreq(all_lists_text,'indie+music')+descq=*:*rows=100
And I get back 4571232 hits.

That's normal, you issue a catch all query. Sorting should work but..


All the results don't have the phrase indie music anywhere in their

data.

  Does termfreq not support phrases?

No, it is TERM frequency and indie music is not one term. I don't know how
this function parses your input but it might not understand your + escape
and
think it's one term consisting of exactly that.


If not, how can I sort specifically by termfreq of a phrase?

You cannot. What you can do is index multiple terms as one term using the
shingle filter. Take care, it can significantly increase your index size
and
number of unique terms.




On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko

ale...@superdownloads.com.br  wrote:

You can use the standard query parser and pass q=*:*

2011/8/8 Jason Toyjason...@gmail.com


I am trying to list some data based on a function I run ,
specifically  termfreq(post_text,'indie music')  and I am unable to

do

it without passing in data to the q parameter.  Is it possible to get
a

sorted


list without searching for any terms?

--

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533





Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma

 Are not  Dismax queries able to search for phrases using the default
 index(which is what I am using?) If I can already do phrase  searches, I
 don't understand why I would need to reindex to be able to access phrases
 from a function.

Executing a Lucene phrase query is not the same as term frequency (phrase != 
term). A phrase consists of multiple terms and Lucene has an inverted term 
index, not an inverted phrase index (unless you index your data that way).

 
 On Mon, Aug 8, 2011 at 1:49 PM, Markus Jelsma 
markus.jel...@openindex.iowrote:
   Alexei, thank you, that does seem to work.
   
   My sort results seem to be totally wrong though, I'm not sure if its
   because of my sort function or something else.
   
   My query consists of:
   sort=termfreq(all_lists_text,'indie+music')+descq=*:*rows=100
   And I get back 4571232 hits.
  
  That's normal, you issue a catch all query. Sorting should work but..
  
   All the results don't have the phrase indie music anywhere in their
  
  data.
  
Does termfreq not support phrases?
  
  No, it is TERM frequency and indie music is not one term. I don't know
  how this function parses your input but it might not understand your +
  escape and
  think it's one term consisting of exactly that.
  
   If not, how can I sort specifically by termfreq of a phrase?
  
  You cannot. What you can do is index multiple terms as one term using the
  shingle filter. Take care, it can significantly increase your index size
  and
  number of unique terms.
  
   On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko 
   
   ale...@superdownloads.com.br wrote:
You can use the standard query parser and pass q=*:*

2011/8/8 Jason Toy jason...@gmail.com

 I am trying to list some data based on a function I run ,
 specifically  termfreq(post_text,'indie music')  and I am unable to
  
  do
  
 it without passing in data to the q parameter.  Is it possible to
 get a

sorted

 list without searching for any terms?

--

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533
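Shingling, as suggested in this thread, joins adjacent terms into single indexable tokens, so a two-word phrase like "indie music" becomes one term with its own term frequency. A minimal plain-Java illustration of the idea (this is not Lucene's actual ShingleFilter, just a sketch of the bigram output it produces):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: join each pair of adjacent tokens into one bigram "shingle" term.
public class Bigrams {
    static List<String> bigramShingles(List<String> tokens) {
        List<String> shingles = new ArrayList<String>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            shingles.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return shingles;
    }

    public static void main(String[] args) {
        System.out.println(bigramShingles(Arrays.asList("best", "indie", "music")));
        // prints [best indie, indie music]
    }
}
```

With shingles indexed this way, termfreq over the shingled field can count a two-word phrase directly, at the cost of a larger index as Markus warns.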


Re: csv responsewriter and numfound

2011-08-08 Thread Yonik Seeley
On Mon, Aug 8, 2011 at 5:12 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 Great question.  But how would that get returned in the response?

 It is a drag that the header is lost when results are written in CSV, but 
 there really isn't an obvious spot for that information to be returned.

I guess a comment would be one option.

-Yonik
http://www.lucidimagination.com


Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Jonathan Rochkind

On 8/8/2011 5:10 PM, Markus Jelsma wrote:

Will the StatsComponent in Solr do what we need with minimal configuration?
Can the StatsComponent only be used on a subset of the data? For
example, only look at data from certain months?

If I remember correctly, it cannot.


Well, if you index things properly, you could use an fq to restrict to only 
certain months, and then use StatsComponent on top.


But I'd agree with others that Solr is probably not the best tool for 
this job. Solr's primary area of competency is text indexing and text 
search, not mathematical calculation. If you need a whole lot of text 
indexing and a little bit of math too, you might be able to get 
StatsComponent to do what you need, although you'll probably run into 
some tricky parts because this isn't really Solr's focus.


But if you need a whole bunch of math and no text indexing at all -- use 
a tool that has math rather than text search as its prime area of 
competency/focus, don't make things hard for yourself by using the wrong 
tool for the job.


(StatsComponent, incidentally, performs not-so-great on very large 
result sets, depending on what you ask it for).
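For what it's worth, the statistic Fred asks about (standard deviation of wind speed) needs only a single streaming pass, which is exactly what a map/reduce job or a plain standalone program does well. A sketch in plain Java using Welford's online algorithm, independent of Solr (parsing the 500-byte CSV files is omitted; the wind-speed values below are made up):

```java
// Streaming mean / sample standard deviation (Welford's algorithm):
// one pass, O(1) memory, suitable for 20+ million small inputs.
public class RunningStats {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double getMean() { return mean; }

    double sampleStdDev() {
        return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0;
    }

    public static void main(String[] args) {
        RunningStats windSpeed = new RunningStats();
        for (double v : new double[] {4.0, 8.0, 6.0}) {
            windSpeed.add(v);
        }
        System.out.println(windSpeed.getMean());       // prints 6.0
        System.out.println(windSpeed.sampleStdDev());  // prints 2.0
    }
}
```

The same accumulator structure maps directly onto a Hadoop reducer or a CouchDB reduce function.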


Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma

 Dismax queries can. But
 
 sort=termfreq(all_lists_text,'indie+music')
 
 is not using dismax. Apparently the termfreq function cannot? I am not
 familiar with the termfreq function.

It simply returns the TF of the given _term_ as it is indexed for the current 
document. 

Sorting on TF like this seems strange as by default queries are already sorted 
that way since TF plays a big role in the final score.

 
 To understand why you'd need to reindex, you might want to read up on how
 lucene actually works, to get a basic understanding of how different
 indexing choices affect what is possible at query time. Lucene In Action
 is a pretty good book.
 
 On 8/8/2011 5:02 PM, Jason Toy wrote:
  Are not  Dismax queries able to search for phrases using the default
  index(which is what I am using?) If I can already do phrase  searches, I
  don't understand why I would need to reindex to be able to access phrases
  from a function.
  
  On Mon, Aug 8, 2011 at 1:49 PM, Markus 
Jelsmamarkus.jel...@openindex.iowrote:
  Alexei, thank you, that does seem to work.
  
  My sort results seem to be totally wrong though, I'm not sure if its
  because of my sort function or something else.
  
  My query consists of:
  sort=termfreq(all_lists_text,'indie+music')+descq=*:*rows=100
  And I get back 4571232 hits.
  
  That's normal, you issue a catch all query. Sorting should work but..
  
  All the results don't have the phrase indie music anywhere in their
  
  data.
  
Does termfreq not support phrases?
  
  No, it is TERM frequency and indie music is not one term. I don't know
  how this function parses your input but it might not understand your +
  escape and
   think it's one term consisting of exactly that.
  
  If not, how can I sort specifically by termfreq of a phrase?
  
  You cannot. What you can do is index multiple terms as one term using
  the shingle filter. Take care, it can significantly increase your index
  size and
  number of unique terms.
  
  On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko
  
  ale...@superdownloads.com.br  wrote:
  You can use the standard query parser and pass q=*:*
  
  2011/8/8 Jason Toyjason...@gmail.com
  
  I am trying to list some data based on a function I run ,
  specifically  termfreq(post_text,'indie music')  and I am unable to
  
  do
  
  it without passing in data to the q parameter.  Is it possible to get
  a
  
  sorted
  
  list without searching for any terms?
  
  --
  
  *Alexei Martchenko* | *CEO* | Superdownloads
  ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
  5083.1018/5080.3535/5080.3533


Re: Can Master push data to slave

2011-08-08 Thread Markus Jelsma
Hi,

 Hi
 
 I am using Solr 1.4. and doing a replication process where my slave is
 pulling data from Master. I have 2 questions
 
 a. Can Master push data to slave

Not in current versions. Not sure about exotic patches for this.

 b. How to make sure that lock file is not created while replication

What do you mean? 

 
 Please help
 
 thanks
 Pawan


Re: Example Solr Config on EC2

2011-08-08 Thread mbohlig
Matthew,

Here's another resource:
http://www.lucidimagination.com/blog/2010/02/01/solr-shines-through-the-cloud-lucidworks-solr-on-ec2/


Michael Bohlig
Lucid Imagination



- Original Message 
From: Matt Shields m...@mattshields.org
To: solr-user@lucene.apache.org
Sent: Mon, August 8, 2011 2:03:20 PM
Subject: Example Solr Config on EC2

I'm looking for some examples of how to setup Solr on EC2.  The
configuration I'm looking for would have multiple nodes for redundancy.
I've tested in-house with a single master and slave with replication
running in Tomcat on Windows Server 2003, but even if I have multiple slaves
the single master is a single point of failure.  Any suggestions or example
configurations?  The project I'm working on is a .NET setup, so ideally I'd
like to keep this search cluster on Windows Server, even though I prefer
Linux.

Matthew Shields
Owner
BeanTown Host - Web Hosting, Domain Names, Dedicated Servers, Colocation,
Managed Services
www.beantownhost.com
www.sysadminvalley.com
www.jeeprally.com



Re: Can Solr with the StatsComponent analyze 20+ million files?

2011-08-08 Thread Fred Smith
Thank you Walter, Markus and Jonathan for your fast responses and help!
We will be looking into CouchDB (and Hadoop if necessary) to process our
data.
Thanks again,
Fred


Re: Is anobdy using lotsofcores feature in production?

2011-08-08 Thread Uomesh
Hi Shalin,

Does this mean that if I apply the patch mentioned at the link below, Solr still
does not support lots of cores?
https://issues.apache.org/jira/browse/SOLR-1293

Are you saying this is just a concept and the patch is not an
implementation? We are planning to use lots of core in our commerce system
to separate products for each client in search and provide customization for
each client. So could you please let us know if this is feasible, given that we
want to create around 500 cores and have around 8-10 load-balancing Solr
slaves?

Please let us know. Based on your feedback our approach will be decided.

Thanks & Regards,
Umesh



On Mon, Jul 25, 2011 at 3:36 AM, Markus Jelsma-2 [via Lucene] 
ml-node+3196893-77535491-416...@n3.nabble.com wrote:

 No, I missed something and interpreted the question as using a lot of cores.


  LotsOfCores does not exist as a feature. It is just a write-up, some jira

  issues and a couple of patches. Did I miss something?
 
  On Sun, Jul 24, 2011 at 8:26 PM, Markus Jelsma
 
  [hidden email] 
  http://user/SendEmail.jtp?type=nodenode=3196893i=0wrote:

   It works fine but you would want to keep an eye on additional overhead, cores
   `stealing` too much CPU from others, trouble with cores that merge
   segments stealing I/O and of course RAM. It can also result in quite a
   high number of
   open file descriptors.
  
   There are more, but these seem most common to me.
  
Hi,
   
Is anbody using lots of core feature in production? Is this feature
scalable. I have around 1000 core and want to use this feature. Will
  
   there
  
be any issue in production?
   
http://wiki.apache.org/solr/LotsOfCores
   
Thanks,
Umesh
   
--
  
View this message in context:
  
 http://lucene.472066.n3.nabble.com/Is-anobdy-using-lotsofcores-feature-in
   -
  
production-tp3193798p3193798.html Sent from the Solr - User mailing
list archive at Nabble.com.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Is-anobdy-using-lotsofcores-feature-in-production-tp3193798p3236957.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Can Master push data to slave

2011-08-08 Thread simon
You could configure a PostCommit event listener on the master which
would send a HTTP fetchindex request to the slave you want to carry
out replication  - see
http://wiki.apache.org/solr/SolrReplication#HTTP_API

But why do you want the master to push to the slave ?

-Simon
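The fetchindex command on the wiki page above is just an HTTP GET against the slave's replication handler, so the master-side hook only needs to build and fire that URL. A minimal plain-Java sketch (host, port, and core name are placeholder assumptions; the actual network call is left commented out so the sketch runs offline):

```java
// Sketch of triggering replication on a slave via the HTTP API.
// slave1.example.com / 8983 / core0 below are hypothetical values.
public class FetchIndexTrigger {
    static String fetchIndexUrl(String host, int port, String core) {
        return "http://" + host + ":" + port + "/solr/" + core
                + "/replication?command=fetchindex";
    }

    public static void main(String[] args) throws Exception {
        String url = fetchIndexUrl("slave1.example.com", 8983, "core0");
        System.out.println(url);
        // To fire it for real from a postCommit hook or external script:
        // new java.net.URL(url).openStream().close();
    }
}
```

On a single-core Solr 1.4 setup the core segment of the path would be dropped; check the replication handler path in your own solrconfig.xml.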

On Mon, Aug 8, 2011 at 5:26 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
 Hi,

 Hi

 I am using Solr 1.4. and doing a replication process where my slave is
 pulling data from Master. I have 2 questions

 a. Can Master push data to slave

 Not in current versions. Not sure about exotic patches for this.

 b. How to make sure that lock file is not created while replication

 What do you mean?


 Please help

 thanks
 Pawan





Re: Same id on two shards

2011-08-08 Thread simon
Only one should be returned, but it's non-deterministic. See
http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

-Simon

On Sat, Aug 6, 2011 at 6:27 AM, Pooja Verlani pooja.verl...@gmail.com wrote:
 Hi,

 We have a multicore solr with 6 cores. We merge the results using shards
 parameter or distrib handler.
 I have a problem, I might post one document on one of the cores and then
 post it after some days on another core, as I have a time-sliced multicore
 setup!

 The question is if I retrieve a document which is posted on both the shards,
 will solr return me only one document or both. And if only one document will
 be returned, which one?

 Regards,
 Pooja



Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Jason Toy
I am trying to test out and compare different sorts and scoring.

 When I use dismax to search for indie music
with: qf=all_lists_textq=indie+musicdefType=dismaxrows=100
I see some stuff that seems irrelevant, meaning in top results I see only
1 or 2 mentions of indie music, but when I look further down the list I do
see other docs that have more occurrences of indie music.
So I want to test by comparing the different queries against a
list of docs ranked specifically by the count of occurrences of the phrase
indie music

On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma markus.jel...@openindex.iowrote:


  Dismax queries can. But
 
  sort=termfreq(all_lists_text,'indie+music')
 
  is not using dismax. Apparently the termfreq function cannot? I am not
  familiar with the termfreq function.

 It simply returns the TF of the given _term_ as it is indexed for the
 current
 document.

 Sorting on TF like this seems strange as by default queries are already
 sorted
 that way since TF plays a big role in the final score.

 
  To understand why you'd need to reindex, you might want to read up on how
  lucene actually works, to get a basic understanding of how different
  indexing choices affect what is possible at query time. Lucene In Action
  is a pretty good book.
 
  On 8/8/2011 5:02 PM, Jason Toy wrote:
   Are not  Dismax queries able to search for phrases using the default
   index(which is what I am using?) If I can already do phrase  searches,
 I
   don't understand why I would need to reindex to be able to access
 phrases
   from a function.
  
   On Mon, Aug 8, 2011 at 1:49 PM, Markus
 Jelsmamarkus.jel...@openindex.iowrote:
   Alexei, thank you, that does seem to work.
  
   My sort results seem to be totally wrong though, I'm not sure if its
   because of my sort function or something else.
  
   My query consists of:
   sort=termfreq(all_lists_text,'indie+music')+descq=*:*rows=100
   And I get back 4571232 hits.
  
   That's normal, you issue a catch all query. Sorting should work but..
  
   All the results don't have the phrase indie music anywhere in their
  
   data.
  
 Does termfreq not support phrases?
  
   No, it is TERM frequency and indie music is not one term. I don't know
   how this function parses your input but it might not understand your +
   escape and
   think it's one term consisting of exactly that.
  
   If not, how can I sort specifically by termfreq of a phrase?
  
   You cannot. What you can do is index multiple terms as one term using
   the shingle filter. Take care, it can significantly increase your
 index
   size and
   number of unique terms.
  
   On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko
  
   ale...@superdownloads.com.br  wrote:
   You can use the standard query parser and pass q=*:*
  
   2011/8/8 Jason Toyjason...@gmail.com
  
   I am trying to list some data based on a function I run ,
   specifically  termfreq(post_text,'indie music')  and I am unable to
  
   do
  
   it without passing in data to the q parameter.  Is it possible to
 get
   a
  
   sorted
  
   list without searching for any terms?
  
   --
  
   *Alexei Martchenko* | *CEO* | Superdownloads
   ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
   5083.1018/5080.3535/5080.3533




-- 
- sent from my mobile
6176064373


Re: bug in termfreq? was Re: is it possible to do a sort without query?

2011-08-08 Thread Markus Jelsma
If you want to understand and debug the scoring you can use debugQuery=true 
to see how different documents score. Most of the time docs with both terms are 
on top of the result set unless norms are interfering.

To understand scoring you should check the Solr relevancy wiki, but the Lucene 
docs are much better, although very low level.

http://wiki.apache.org/solr/SolrRelevancyCookbook
http://lucene.apache.org/java/3_1_0/api/core/org/apache/lucene/search/Similarity.html

Your question is more a relevance question than one about the termfreq function. 
In short, don't use those kinds of functions if you don't yet understand 
similarity as described in the Lucene docs.
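If the goal really is "rank by number of occurrences of a literal phrase," one option outside Solr is to fetch candidate documents and count the phrase client-side. A minimal plain-Java sketch (no Solr dependency; this is an illustration of the counting step, not a recommendation over fixing the index):

```java
// Count non-overlapping-start occurrences of a literal phrase in a text,
// e.g. for re-ranking a fetched result page by phrase frequency.
public class PhraseCount {
    static int count(String text, String phrase) {
        int n = 0;
        for (int i = text.indexOf(phrase); i >= 0; i = text.indexOf(phrase, i + 1)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("indie music fans love indie music", "indie music")); // prints 2
    }
}
```

This ignores analysis (case folding, stemming) that Solr would apply at index time, so counts may differ from what a shingled field would report.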

 I am trying to test out and compare different sorts and scoring.
 
  When I use dismax to search for indie music
 with: qf=all_lists_textq=indie+musicdefType=dismaxrows=100
 I see some stuff that seems irrelevant, meaning in top results I see only
 1 or 2 mentions of indie music, but when I look further down the list I
 do see other docs that have more occurrences of indie music.
 So I want to test by comparing the different queries against a
 list of docs ranked specifically by the count of occurrences of the phrase
 indie music
 
 On Mon, Aug 8, 2011 at 2:19 PM, Markus Jelsma 
markus.jel...@openindex.iowrote:
   Dismax queries can. But
   
   sort=termfreq(all_lists_text,'indie+music')
   
   is not using dismax. Apparently the termfreq function cannot? I am not
   familiar with the termfreq function.
  
  It simply returns the TF of the given _term_ as it is indexed for the
  current
  document.
  
  Sorting on TF like this seems strange as by default queries are already
  sorted
  that way since TF plays a big role in the final score.
  
   To understand why you'd need to reindex, you might want to read up on
   how lucene actually works, to get a basic understanding of how
    different indexing choices affect what is possible at query time.
   Lucene In Action is a pretty good book.
   
   On 8/8/2011 5:02 PM, Jason Toy wrote:
Are not  Dismax queries able to search for phrases using the default
index(which is what I am using?) If I can already do phrase 
searches,
  
  I
  
don't understand why I would need to reindex to be able to access
  
  phrases
  
from a function.

On Mon, Aug 8, 2011 at 1:49 PM, Markus
  
  Jelsmamarkus.jel...@openindex.iowrote:
Alexei, thank you, that does seem to work.

My sort results seem to be totally wrong though, I'm not sure if
its because of my sort function or something else.

My query consists of:
sort=termfreq(all_lists_text,'indie+music')+descq=*:*rows=100
And I get back 4571232 hits.

That's normal, you issue a catch all query. Sorting should work
but..

All the results don't have the phrase indie music anywhere in
their

data.

  Does termfreq not support phrases?

No, it is TERM frequency and indie music is not one term. I don't
know how this function parses your input but it might not
understand your + escape and
think it's one term consisting of exactly that.

If not, how can I sort specifically by termfreq of a phrase?

You cannot. What you can do is index multiple terms as one term
using the shingle filter. Take care, it can significantly increase
your
  
  index
  
size and
number of unique terms.

On Mon, Aug 8, 2011 at 1:08 PM, Alexei Martchenko

ale...@superdownloads.com.br  wrote:
You can use the standard query parser and pass q=*:*

2011/8/8 Jason Toyjason...@gmail.com

I am trying to list some data based on a function I run ,
specifically  termfreq(post_text,'indie music')  and I am unable
to

do

it without passing in data to the q parameter.  Is it possible to
  
  get
  
a

sorted

list without searching for any terms?

--

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: Same id on two shards

2011-08-08 Thread Shawn Heisey

On 8/8/2011 4:07 PM, simon wrote:

Only one should be returned, but it's non-deterministic. See
http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations


I had heard it was based on which one responded first.  This is part of 
why we have a small index that contains the newest content and only 
distribute content to the other shards once a day.  The hope is that the 
small index (less than 1GB, fits into RAM on that virtual machine) will 
always respond faster than the other larger shards (over 18GB each).  Is 
this an incorrect assumption on our part?


The build system does do everything it can to ensure that periods of 
overlap are limited to the time it takes to commit a change across all 
of the shards, which should amount to just a few seconds once a day.  
There might be situations when the index gets out of whack and we have 
duplicate id values for a longer time period, but in practice it hasn't 
happened yet.


Thanks,
Shawn
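
The first-response-wins merge described in this thread can be sketched as
follows (an illustration of the semantics only, not Solr's actual
implementation; the "id" and "src" field names are assumptions):

```python
def merge_shard_responses(responses):
    """Merge document lists from shards in the order their responses
    arrived; the first occurrence of each unique id wins, so a
    duplicated id resolves to whichever shard answered first."""
    merged = {}
    for shard_docs in responses:  # ordered by response arrival
        for doc in shard_docs:
            merged.setdefault(doc["id"], doc)
    return list(merged.values())
```

Under this model, a small, fast shard that usually responds first will
usually win duplicates, which matches the assumption above, but nothing
guarantees it on every request.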



Re: merge factor performance

2011-08-08 Thread Erick Erickson
What version of Solr are you using? And how are you sending your
docs to Solr?

Bumping your JVM heap size and bumping the RAM buffer size
(ramBufferSizeMB) to 128M might also help..

How are you sending your docs to Solr? And where are you
getting them from? Are you sure that Solr is your problem or
is it your data acquisition? (hint, just comment out the call
to Solr if you're using SolrJ)...

Bottom line: There isn't much information to go on here...

And have you seen:
http://wiki.apache.org/solr/FAQ#How_can_indexing_be_accelerated.3F

Best
Erick

 also what about RAM Size (default is 32 MB) ?

 Which other factors we need to consider ?

 When should we consider optimize ?

 Any other deviation from default would help us in achieving the target.

 We are allocating JVM max heap size allocation 512 MB, default concurrent
 mark sweep is set for garbage collection.


 Thanks
 Naveen
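
The knobs discussed in this thread live in solrconfig.xml; a hedged sketch
for Solr 1.4, with illustrative values rather than recommendations
(ramBufferSizeMB defaults to 32):

```xml
<!-- Illustrative Solr 1.4 indexing settings; tune per workload. -->
<indexDefaults>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <!-- higher mergeFactor = faster indexing, more segments to search -->
  <mergeFactor>10</mergeFactor>
</indexDefaults>
```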







Re: MultiSearcher/ParallelSearcher - searching over multiple cores?

2011-08-08 Thread Erick Erickson
I think you'll have to make this go yourself, I don't see how to make
Solr do it for you. And even if it could, the scores aren't comparable,
so combining them for presentation to the user will be interesting

Best
Erick

On Thu, Aug 4, 2011 at 2:27 PM, Ralf Musick ra...@gmx.de wrote:
 Hi Erik,

 I have several types with different properties, but they are supposed to
 be combined into one search.
 Imagine a book with property title and a journal with property name.
 (The types in my project have, of course, more complex properties.)

 So I created a new core with combined searchfields: field name is indexed,
 title is indexed, some shared properties are indexed like id.
 Further an additional solr field type is created.
 Of course there are several indexers, one per type. A specific type indexer
 stores only the fields of that type, and additionally stores the type
 information, e.g. book.
 After indexing, all types are in the same core.

 To search over all types, the query has to look like that ((title: bla) and
 (type: book)) or ((name: bla) and (type: journal)).

 In the end you get books or journals sorted by boost factor, and you have
 the type information as a return field to distinguish the search results.

 I hope it is coherent.

 Thanks for your answer,
  Best Ralf








Re: Records skipped when using DataImportHandler

2011-08-08 Thread Erick Erickson
Spend some time in the admin/analysis page, that'll show you what
part of the analysis chain is doing what to your data. It'll save you a world
of headache...

But at a guess WordDelimiterFilterFactory is your culprit...

Best
Erick
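
If exact matches on the raw value are what's needed, a non-analyzed
string field sidesteps tokenization entirely; a hedged sketch against the
schema quoted in this thread (note that existing documents must be fully
reindexed after changing the field type):

```xml
<!-- string is a non-analyzed type: 91324 is indexed exactly as "91324" -->
<field name="zipCode" type="string" indexed="true" stored="true"/>
```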

On Thu, Aug 4, 2011 at 6:08 PM, anand sridhar anand.for...@gmail.com wrote:
 Ok. After analysis, I narrowed the reduced result set down to the fact that
 the zipcode field is not indexed 'as is', i.e. the zipcode field values are
 broken down into tokens and then stored. Hence, if there are 10 documents
 with zipcode fields varying from 91000-91009, the zipcode fields are not
 stored as 91000, 91001, etc.; instead, the most common recurrences are
 grouped together and stored as tokens, resulting in a reduced result set.
 The net effect is that I cannot search for a value like 91000, since it's
 not stored as-is.

 I suspect this has to do with the type of field the zipcode is associated
 with. Right now, zipcode is a field of type text_general, where the
 StandardTokenizerFactory may be breaking the values into tokens. However, I
 want to store them without tokenizing. What's the best field type to do
 this?

 I already explored the String fieldtype which is supposed to store the
 values as is, but I see that the values are still being tokenized.


 Thanks,
 Anand
 On Wed, Aug 3, 2011 at 7:24 PM, Erick Erickson erickerick...@gmail.com wrote:

 Sorry, I'm on a restricted machine so can't get the precise URL. But,
 there's a debug page for DIH that might allow you to see what the query
 actually returns. I'd guess one of two things:
  1) you aren't getting the number of rows you think.
  2) you aren't committing the documents you add.

 But that's just a guess.

 Best
 Erick
 On Aug 3, 2011 2:15 PM, anand sridhar anand.for...@gmail.com wrote:
  Hi,
  I am a newbie to Solr and have been trying to learn using
  DataImportHandler.
  I have a query in data-config.xml that fetches about 5 records when i
 fire
  it in SQL Query manager.
  However, when Solr does a full import, it is skipping 4 records and only
  importing 1 record.
  What could be the reason for that. ?
 
  My data-config.xml looks like this -
 
   <dataConfig>
     <dataSource type="JdbcDataSource"
                 name="GeoService"
                 driver="net.sourceforge.jtds.jdbc.Driver"
                 url="jdbc:jtds:sqlserver://10.168.50.104/ZipCodeLookup"
                 user="sa"
                 password="psiuser"/>
     <document>
       <entity name="city"
               query="select ll.cityId as id, ll.zip as zipCode, c.cityName as
                      cityName, st.stateName as state, ct.countryName as country
                      from latlonginfo ll, city c, state st, country ct
                      where ll.cityId = c.cityID and c.stateID = st.stateID
                      and st.countryID = ct.countryID
                      order by ll.areacode"
               dataSource="GeoService">
         <field column="zipCode" name="zipCode"/>
         <field column="cityName" name="cityName"/>
         <field column="state" name="state"/>
         <field column="country" name="country"/>
       </entity>
     </document>
   </dataConfig>
 
  My fields definition in schema.xml looks as below -
 
   <field name="CityName" type="text_general" indexed="true" stored="true"/>
   <field name="zipCode" type="text_general" indexed="true" stored="true"/>
   <field name="state" type="text_general" indexed="true" stored="true"/>
   <field name="country" type="text_general" indexed="true" stored="true"/>
 
   One observation I made was that the 1 record being indexed is the last
   record in the result set. I have verified that there are no duplicate
   records being retrieved.
 
  For eg, if the result set from Database is -
 
   zipcode  CityName    state  country
   -------  ----------  -----  -------
   91324    Northridge  CA     USA
   91325    Northridge  CA     USA
   91327    Northridge  CA     USA
   91328    Northridge  CA     USA
   91329    Northridge  CA     USA
   91330    Northridge  CA     USA
 
  The record being indexed is the last record all the time.
 
  Any suggestions are welcome.
 
  Thanks,
  Anand




Re: Same id on two shards

2011-08-08 Thread simon
I think the first one to respond is indeed the way it works, but
that's only deterministic up to a point (if your small index is in the
throes of a commit and everything required for a response happens to
be  cached on the larger shard ... who knows ?)

On Mon, Aug 8, 2011 at 7:10 PM, Shawn Heisey s...@elyograg.org wrote:
 On 8/8/2011 4:07 PM, simon wrote:

 Only one should be returned, but it's non-deterministic. See

 http://wiki.apache.org/solr/DistributedSearch#Distributed_Searching_Limitations

 I had heard it was based on which one responded first.  This is part of why
 we have a small index that contains the newest content and only distribute
 content to the other shards once a day.  The hope is that the small index
 (less than 1GB, fits into RAM on that virtual machine) will always respond
 faster than the other larger shards (over 18GB each).  Is this an incorrect
 assumption on our part?

 The build system does do everything it can to ensure that periods of overlap
 are limited to the time it takes to commit a change across all of the
 shards, which should amount to just a few seconds once a day.  There might
 be situations when the index gets out of whack and we have duplicate id
 values for a longer time period, but in practice it hasn't happened yet.

 Thanks,
 Shawn




Re: Suggestions for copying fields across cores...

2011-08-08 Thread Erick Erickson
Not that I know of. Separate cores are pretty distinct to Solr, so
you're probably
stuck with doing it by sending the request to each core...

Best
Erick

On Fri, Aug 5, 2011 at 5:51 PM, josh lucas j...@lucasjosh.com wrote:
 Is there a suggested way to copy data in fields to additional fields that 
 will only be in a different core?  Obviously I could index the data 
 separately and I could build that into my current indexing process but I'm 
 curious if there might be an easier, more automated way.

 Thanks!


 josh


Re: how to enable MMapDirectory in solr 1.4?

2011-08-08 Thread Li Li
thank you. I will try it.

On Mon, Aug 8, 2011 at 11:18 PM, Rich Cariens richcari...@gmail.com wrote:
 We patched our 1.4.1 build with SOLR-1969
 (https://issues.apache.org/jira/browse/SOLR-1969), making
 MMapDirectory configurable, and realized a 64% search performance
 boost on our Linux hosts.

 On Mon, Aug 8, 2011 at 10:05 AM, Dyer, James james.d...@ingrambook.com wrote:

 If you want to try MMapDirectory with Solr 1.4, then copy the class
 org.apache.solr.core.MMapDirectoryFactory from 3.x or Trunk, and either add
 it to the .war file (you can just add it under src/java and re-package the
 war), or you can put it in its own .jar file in the lib directory under
 solr_home.  Then, in solrconfig.xml, add this entry under the root
 config element:

  <directoryFactory class="org.apache.solr.core.MMapDirectoryFactory"/>

 I'm not sure if MMapDirectory will perform better for you with Linux over
 NIOFSDir.  I'm pretty sure in Trunk/4.0 it's the default for Windows and
 maybe Solaris.  In Windows, there is a definite advantage for using
 MMapDirectory on a 64-bit system.

 James Dyer
 E-Commerce Systems
 Ingram Content Group
 (615) 213-4311


 -Original Message-
 From: Li Li [mailto:fancye...@gmail.com]
 Sent: Monday, August 08, 2011 4:09 AM
 To: solr-user@lucene.apache.org
 Subject: how to enable MMapDirectory in solr 1.4?

 hi all,
    I read the Apache Solr 3.1 release notes today and found that
 MMapDirectory is now the default implementation in 64 bit Systems.
    I am now using solr 1.4 with 64-bit jvm in Linux. how can I use
 MMapDirectory? will it improve performance?