Re: Anyone else see this error when running unit tests?

2013-02-14 Thread Amit Nithian
Okay, so I think I found a solution. If you are a Maven user and don't
mind forcing the test codec to Lucene40, then do the following:

Add this to your pom.xml under the build > pluginManagement > plugins section:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-surefire-plugin</artifactId>
    <version>2.13</version>
    <configuration>
      <argLine>-Dtests.codec=Lucene40</argLine>
    </configuration>
  </plugin>


If you are running in Eclipse, simply add -Dtests.codec=Lucene40 as a VM
argument. The default test codec is set to random, which means there is a
possibility of picking Lucene3x if some random variable is < 2 and other
conditions are met. In my case, the test-framework jar must not always end up
ahead of the lucene one on the classpath (I don't control the classpath order,
and honestly that shouldn't be a requirement to run a test), so it periodically
bombed. This little fix seems to have helped, provided you don't care about
Lucene3x vs. Lucene40 for your tests (I am on Lucene40 so it's fine for me).

HTH!

Amit


On Mon, Feb 4, 2013 at 6:18 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Me too, it fails randomly with test classes. We use Solr4.0 for testing, no
 maven, only ant.
 --roman
 On 4 Feb 2013 20:48, Mike Schultz mike.schu...@gmail.com wrote:

  Yes.  Just today actually.  I had some unit test based on
  AbstractSolrTestCase which worked in 4.0 but in 4.1 they would fail
  intermittently with that error message.  The key to this behavior is
 found
  by looking at the code in the lucene class:
  TestRuleSetupAndRestoreClassEnv.
  I don't understand it completely but there are a number of random code
  paths
  through there.  The following helped me get around the problem, at least
 in
  the short term.
 
 
 
  @org.apache.lucene.util.LuceneTestCase.SuppressCodecs({"Lucene3x","Lucene40"})
  public class CoreLevelTest extends AbstractSolrTestCase {
 
  I also need to call this inside my setUp() method, in 4.0 this wasn't
  required.
   initCore("solrconfig.xml", "schema.xml", "/tmp/my-solr-home");
 
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/Anyone-else-see-this-error-when-running-unit-tests-tp4015034p4038472.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



Re: replication problems with solr4.1

2013-02-14 Thread Amit Nithian
I may be missing something but let me go back to your original statements:
1) You build the index once per week from scratch
2) You replicate this from master to slave.

My understanding of the way replication works is that it's meant to only
send along files that are new; if any files with the same name have different
sizes on the master and slave, that is treated as a corruption of sorts, and
the slave creates an index.<timestamp> directory and downloads the full index
into it. This, I think, explains your index.<timestamp> issue, although I'm not
sure why the old index/ directory isn't being deleted. This is why I was asking
about OS details, file system details, etc. (perhaps something else is locking
that directory and preventing Java from deleting it?).

The second issue is the index generation, which is governed by commits and
is reflected in the last few characters of the segments_XX file name. When the
slave downloads the index and copies the new files, it does a commit to force
a new searcher, which is why the slave's generation will be +1 from the
master's.

The index version is a timestamp, and it may be the case that the version
represents the point in time when the index was downloaded to the slave. In
general, these details shouldn't matter, because replication is only triggered
if the master's version > the slave's version, and the clocks that all servers
use are synced to some common clock.
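
If you want to sanity-check what each side reports, the replication handler
exposes this directly (assuming it is registered at the usual /replication
path), e.g.:

  http://master:8983/solr/replication?command=indexversion
  http://slave:8983/solr/replication?command=details

The first returns the version and generation the master advertises to its
slaves; "details" on the slave shows which master it polls and what it last
replicated.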

One caveat to my answer, however, is that I have yet to try 4.1 (it's next
on my TODO list), so maybe I'll run into the same problem :-) but I wanted to
provide some info as I just recently dug through the replication code to
understand it better myself.

Cheers
Amit


On Wed, Feb 13, 2013 at 11:57 PM, Bernd Fehling 
bernd.fehl...@uni-bielefeld.de wrote:

 OK, then index generation and index version can't be used to verify
 that the master and slave indexes are in sync.

 What else is possible?

 The strange thing is if master is 2 or more generations ahead of slave
 then it works!
 With your logic the slave must _always_ be one generation ahead of the
 master,
 because the slave replicates from master and then does an additional commit
 to recognize the changes on the slave.
 This implies that the slave acts as follows:
 - if the master is one generation ahead then do an additional commit
 - if the master is 2 or more generations ahead then do _no_ commit
 OR
 - if the master is 2 or more generations ahead then do a commit but don't
   change generation and version of index

 Can this be true?

 I would say not really.

 Regards
 Bernd


 Am 13.02.2013 20:38, schrieb Amit Nithian:
  Okay so then that should explain the generation difference of 1 between
 the
  master and slave
 
 
  On Wed, Feb 13, 2013 at 10:26 AM, Mark Miller markrmil...@gmail.com
 wrote:
 
 
  On Feb 13, 2013, at 1:17 PM, Amit Nithian anith...@gmail.com wrote:
 
  doesn't it do a commit to force solr to recognize the changes?
 
  yes.
 
  - Mark
 
 



RE: Why a phrase is getting searched against default fields in solr

2013-02-14 Thread Pragyanshis Pattanaik
It is returning me all the documents which contain the phrase, as it is
searching against the default field. My default field is set up like below:

<field name="SearchableField" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="Product-Name-*" dest="SearchableField"/>
<copyField source="Product-Description-*" dest="SearchableField"/>

I have defined SearchableField as the default field.
Thanks,
Pragyanshis
 Date: Wed, 13 Feb 2013 23:18:06 -0800
 From: iori...@yahoo.com
 Subject: Re: Why a phrase is getting searched against default fields in solr
 To: solr-user@lucene.apache.org
 
 Hi Pragyanshis,
 
 What happens when you remove bq parameter? 
 
 --- On Thu, 2/14/13, Pragyanshis Pattanaik pragyans...@outlook.com wrote:
 
  From: Pragyanshis Pattanaik pragyans...@outlook.com
  Subject: Why a phrase is getting searched against default fields in solr
  To: solr Forum solr-user@lucene.apache.org
  Date: Thursday, February 14, 2013, 8:24 AM
   Hi,
   This might be a very silly question but I want to know why
   this is happening. If I am using the edismax query parser in Solr
   and passing a query something like below:

   q=IPhone5&wt=xml&edismax=true&qf=Product-Name-0^100&bq=(Product-Rating-0%3A7^300+OR+Product-Rating-0%3A8^400+OR+Product-Rating-0%3A9^500+OR+Product-Rating-0%3A10^600+OR+Product-Rating-0%3A*)

   then why is it searching in the default fields? As I am
   specifying qf, it should search in the fields specified in the qf
   parameter and boost those documents which have a higher
   rating.
   Please correct me if my understanding is wrong. Note: I am
   using Solr 4.0 Alpha.
   Thanks, Pragyanshis
  


  

Re: Boost Specific Phrase

2013-02-14 Thread Ahmet Arslan

Hi Hemant,

I think your use case would be useful for relevancy tuning. It could be
implemented as either a SearchComponent or a QParserPlugin.

The edismax query parser has pf2 and pf3 parameters that can remedy this to
some degree.

Probably an edismax extension would be the best place to put it. Similar to
https://issues.apache.org/jira/browse/SOLR-4381
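
(For what it's worth, pf2/pf3 use the same syntax as pf; a rough sketch with
placeholder field names:

  pf2=title^10 description^5
  pf3=title^20

pf2 boosts documents in which adjacent word pairs from the user's query occur
together in those fields, and pf3 does the same for word triples, which gets
part of the way towards boosting 'project manager' as a unit.)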

 
--- On Thu, 2/14/13, Hemant Verma hemantverm...@gmail.com wrote:

 From: Hemant Verma hemantverm...@gmail.com
 Subject: Re: Boost Specific Phrase
 To: solr-user@lucene.apache.org
 Date: Thursday, February 14, 2013, 7:56 AM
 Thanks for the response.
 
 The pf parameter actually boosts the documents considering all search
 keywords mentioned in the main query, but I am looking for something
 which boosts the documents considering only a few search keywords from
 the user query.
 As per the example, the user query is (project manager in India with 2
 yrs experience) and my dictionary contains one entry, 'project manager',
 which specifies that if the user query contains 'project manager', then
 boost those documents which contain 'project manager' as an exact match.
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Boost-Specific-Phrase-tp4040188p4040371.html
 Sent from the Solr - User mailing list archive at
 Nabble.com.
 


How to protect Solr 4.1 Admin page?

2013-02-14 Thread Bayu Widyasanyata
Hi,

I'm sure it's an old question...
I just want to protect the Admin page (/solr) with Basic Authentication,
but I can't find a good answer out there yet.

I use Solr 4.1 with Apache Tomcat/7.0.35.

Could anyone give me some quick hints or links?

Thanks in advance!

-- 
wassalam,
[bayu]


Re: How to protect Solr 4.1 Admin page?

2013-02-14 Thread Gora Mohanty
On 14 February 2013 14:05, Bayu Widyasanyata bwidyasany...@gmail.com wrote:
 Hi,

 I'm sure it's an old question..
 I just want protecting Admin page (/solr) with Basic Authentication.
 But I can't found fine answer yet out there.

 I use Solr 4.1 with Apache Tomcat/7.0.35.
[...]

The easiest way to do this with Tomcat7 is:
1. Install the manager app, and set up roles in
conf/tomcat-users.xml
2. A UserDatabaseRealm should already be defined in
conf/server.xml
3. Depending on how you installed Solr, there should be a folder
like webapps/solr/WEB-INF/ . In that folder, edit web.xml, and
add <security-constraint> and <security-role> tags. The entries
for the latter should match the entries in step 1.

These links should be of help:
http://tomcat.apache.org/tomcat-7.0-doc/realm-howto.html
http://www.tomcatexpert.com/ask-the-experts/basic-auth-configuration-tomcat-7-https
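
For step 3, the web.xml additions typically look something like the sketch
below (the role name solr-admin is just an example; it has to match what you
put in tomcat-users.xml):

  <security-constraint>
    <web-resource-collection>
      <web-resource-name>Solr admin</web-resource-name>
      <url-pattern>/*</url-pattern>
    </web-resource-collection>
    <auth-constraint>
      <role-name>solr-admin</role-name>
    </auth-constraint>
  </security-constraint>
  <login-config>
    <auth-method>BASIC</auth-method>
  </login-config>
  <security-role>
    <role-name>solr-admin</role-name>
  </security-role>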

Regards,
Gora


Re: How to protect Solr 4.1 Admin page?

2013-02-14 Thread Bayu Widyasanyata
On Thu, Feb 14, 2013 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote:

 3. Depending on how you installed Solr, there should be a folder
 like webapps/solr/WEB-INF/ . In that folder, edit web.xml, and
 add security-constraint and security-role tags. The entries
 for the latter should match the entries in step 1.


One thing I can't find is the folder webapps/solr/WEB-INF/.
I installed the binary Solr distribution.
Might it not be created until the webapp is deployed or first accessed?
I'm not sure... :( since I am also new to Tomcat deployment.

Thanks,

-- 
wassalam,
[bayu]


RE: Why a phrase is getting searched against default fields in solr

2013-02-14 Thread Ahmet Arslan
Hi,

Instead of edismax=true, can you try defType=edismax?
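
i.e. your earlier query would become something along the lines of:

q=IPhone5&wt=xml&defType=edismax&qf=Product-Name-0^100&bq=(Product-Rating-0:7^300 OR Product-Rating-0:8^400 OR Product-Rating-0:9^500 OR Product-Rating-0:10^600 OR Product-Rating-0:*)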

ahmet

--- On Thu, 2/14/13, Pragyanshis Pattanaik pragyans...@outlook.com wrote:

 From: Pragyanshis Pattanaik pragyans...@outlook.com
 Subject: RE: Why a phrase is getting searched against default fields in solr
 To: solr Forum solr-user@lucene.apache.org
 Date: Thursday, February 14, 2013, 10:21 AM
 It is returning me all the documents
 which contains the phrase as it is searching against
 Defaultfield.my default field is like below
 field name=SearchableField type=text_general
 indexed=true stored=false
 multiValued=true/    copyField
 source=Product-Name-*
 dest=SearchableField/    copyField
 source=Product-Description-* dest=SearchableField/
 I have defined SearchableField as default field.
 Thanks,Pragyanshis
  Date: Wed, 13 Feb 2013 23:18:06 -0800
  From: iori...@yahoo.com
  Subject: Re: Why a phrase is getting searched against
 default fields in solr
  To: solr-user@lucene.apache.org
  
  Hi Pragyanshis,
  
  What happens when you remove bq parameter? 
  
  --- On Thu, 2/14/13, Pragyanshis Pattanaik pragyans...@outlook.com
 wrote:
  
   From: Pragyanshis Pattanaik pragyans...@outlook.com
   Subject: Why a phrase is getting searched against
 default fields in solr
   To: solr Forum solr-user@lucene.apache.org
   Date: Thursday, February 14, 2013, 8:24 AM
   Hi,
   This might be a very silly question but i want to
 know why
   this is happening.If i am using edismax query
 parser in solr
   and passing query something like below
   
      
  
 q=IPhone5wt=xmledismax=trueqf=Product-Name-0^100bq=(Product-Rating-0%3A7^300+OR+Product-Rating-0%3A8^400+OR+Product-Rating-0%3A9^500+OR+Product-Rating-0%3A10^600+OR+Product-Rating-0%3A*)
   Then why it is searching in default fields ?As i
 am
   specifying qf,it should search in the fields
 specified in qf
   parameter and boost those documents which has a
 higher
   rating.
   Please correct me if my understanding is
 wrong.Note:-I am
   using SOLR 4.0 Alpha
   Thanks,Pragyanshis    
           
             
     
     
 
       
  


Re: How to protect Solr 4.1 Admin page?

2013-02-14 Thread Gora Mohanty
On 14 February 2013 14:42, Bayu Widyasanyata bwidyasany...@gmail.com wrote:
 On Thu, Feb 14, 2013 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote:

 3. Depending on how you installed Solr, there should be a folder
 like webapps/solr/WEB-INF/ . In that folder, edit web.xml, and
 add security-constraint and security-role tags. The entries
 for the latter should match the entries in step 1.


 One thing that I'm not found is folder webapps/solr/WEB-INF/.
 I install binary Solr distribution.
 It might be not created when deployed or first accessed..??
 I'm not sure... :( since I also new on Tomcat deployment.

Presumably, you followed http://wiki.apache.org/solr/SolrTomcat :
"Copy the .war file dist/apache-solr-*.war into $SOLR_HOME as solr.war"
Instead, remove solr.war, and try adding it through the browser interface
of the Tomcat Web Application Manager, as described, e.g., in the
section "Deploying Solr with the Tomcat Manager" at
http://lucidworks.lucidimagination.com/display/solr/Running+Solr+on+Tomcat
You might need to change the entry for solr/home in
webapps/solr/WEB-INF/web.xml

I imagine there is a way of adding web.xml with the other mode
of installation, but I am not sure how to do that.

Regards,
Gora


How-to get date of indexing process

2013-02-14 Thread Miguel

Hi everybody

   I am looking for a way to get the date of the last indexing or
commit event that happened on my Solr server.

I found a possible solution by adding a timestamp field, for example:

<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

But I would like a solution that does not modify the schema of the Solr server.
I checked the statistics page but did not find a useful date there.

Any ideas.

Thanks




RE: How-to get date of indexing process

2013-02-14 Thread Markus Jelsma
See: admin/luke?show=index or the admin UI.
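
For example (assuming the default handler mappings), something like:

  http://localhost:8983/solr/admin/luke?show=index

returns an "index" section that includes a lastModified value, i.e. the time
of the last commit.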

 
 
-Original message-
 From:Miguel miguel.valen...@juntadeandalucia.es
 Sent: Thu 14-Feb-2013 10:45
 To: solr-user@lucene.apache.org
 Subject: How-to get date of indexing process
 
 Hi everybody
  
     I am looking for the way to get date of last indexing process or commit 
 event that it happened in my Solr server.
  I found a possible solution to add timestamp field , for example:
  
  
 
 field name=timestamp type=date indexed=true stored=true 
 default=NOW multiValued=false/
 
  
  But, I would like a solution without modify the schema of Solr server.
  I checked statistics page but I not found a useful date.
  
  Any ideas.
  
  Thanks  
  
  
  


Re: Solr 4.1.0 not using solrcore.properties ?

2013-02-14 Thread Daniel Rijkhof
James,

I'm not completely sure, and I have not tested the following:

entityname.last_index_time might also not be accessible...

Daniel

On Thu, Feb 14, 2013 at 12:47 AM, Daniel Rijkhof
daniel.rijk...@gmail.comwrote:

 James,

 I debugged it until I found where things go 'wrong'.

 Apparently the current implementation of VariableResolver does not allow the
 use of a period '.' in any variable/property key you want to use... it's
 reserved for namespaces.
 Personally I would really love to use a period in my variable/property key
 names, and see no reason why this should be an issue...

 So, using for example
 solr.dataimport.jdbcDriver=org.h2.Driver
 will not work

 using just:
 jdbcDriver=org.h2.Driver

 works fine...

 So i will rename all my properties... but took me hours to find out why
 something that used to work stopped working...

 I have never had problems of using periods in any properties
 file... apparently Solr is the only project that doesn't allow the use of
 periods...

 Even if this would be documented in a way that persons can find this
 documentation, i guess it would be better to just allow periods by changing
 the implementation of the VariableResolver just a little...

 00.43 now... off to bed.

 Let me know what you think,
 Daniel




 On Wed, Feb 13, 2013 at 6:45 PM, Dyer, James james.d...@ingramcontent.com
  wrote:

 The code that resolves variables in DIH was refactored extensively in
 4.1.0.  So if you've got a case where it does not resolve the variables
 properly, please give the details.  We can open a JIRA issue and get this
 fixed.

 James Dyer
 Ingram Content Group
 (615) 213-4311

 -Original Message-
 From: Daniel Rijkhof [mailto:daniel.rijk...@gmail.com]
 Sent: Wednesday, February 13, 2013 11:09 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 4.1.0 not using solrcore.properties ?

 I am looking at the source code of 4.1.0 and I cannot find any proof that
 Solr 4.1.0's DIH would actually use any properties from the
 solrcore.properties file.

 I did, however, find that Solr does load my solrcore.properties file...

 It's strange that this would have been changed,

 Does anybody have proof that it can still use properties defined in
 solrcore.properties within the DIH configuration?

 In that case, please reply...
 Daniel

 Daniel Rijkhof
 06 12 14 12 17


 On Wed, Feb 13, 2013 at 4:22 PM, Daniel Rijkhof daniel.rijk...@gmail.com
 wrote:

  I have the following problem:
 
  I'm upgrading from a nightly build 4.0.* to 4.1.0.
 
  My dataimport is configured with ${variables} which always worked fine,
  untill this upgrade.
 
  My solrcore.properties file seems to be ignored.
 
  Solr.xml:
   <?xml version="1.0" encoding="UTF-8" ?>

   <solr sharedLib="lib" persistent="true">
     <cores adminPath="/admin/cores" host="${host:}" hostPort="${jetty.port:}">
       <core default="true" name="hfselectdata" instanceDir="hfselectdata"/>
     </cores>
   </solr>
 
  and in solrhome/hfselectdata/conf/ is the file solrcore.properties.
 
  Anybody any suggestions?
 
  Greatly appreciated
  Daniel
 
 





Re: How-to get date of indexing process

2013-02-14 Thread Miguel

Thanks Markus

I didn't know about that page. It's all I needed.

Thanks again

On 14/02/2013 10:47, Markus Jelsma wrote:

See: admin/luke?show=index or the admin UI.

  
  
-Original message-

From:Miguel miguel.valen...@juntadeandalucia.es
Sent: Thu 14-Feb-2013 10:45
To: solr-user@lucene.apache.org
Subject: How-to get date of indexing process

Hi everybody
  
 I am looking for the way to get date of last indexing process or commit event that it happened in my Solr server.

  I found a possible solution to add timestamp field , for example:
  
  


field name=timestamp type=date indexed=true stored=true default=NOW 
multiValued=false/

  
  But, I would like a solution without modify the schema of Solr server.

  I checked statistics page but I not found a useful date.
  
  Any ideas.
  
  Thanks
  
  
  






Re: Why SolrInputDocument use a LinkedHashMap

2013-02-14 Thread Andre Bois-Crettez

Almost. I did not benchmark it, but tend to believe this, from
http://docs.oracle.com/javase/6/docs/api/java/util/LinkedHashMap.html :

"Iteration over the collection-views of a LinkedHashMap requires time
proportional to the size of the map, regardless of its capacity.
Iteration over a HashMap is likely to be more expensive, requiring time
proportional to its capacity."

André

On 02/13/2013 06:58 PM, knort wrote:

If the order is not important, using a HashMap offers the same fast
iteration on the fields but without having an extra LinkedList.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Why-SolrInputDocument-use-a-LinkedHashMap-tp4040195p4040260.html
Sent from the Solr - User mailing list archive at Nabble.com.


--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended exclusively
for their addressees. If you are not the intended recipient of this message,
please delete it and notify the sender.


RE: Why a phrase is getting searched against default fields in solr

2013-02-14 Thread Pragyanshis Pattanaik
Yes, I made some changes to the request handler. I added
<str name="defType">edismax</str>, removed the df field specified there, and
now it's working as I expected.
Thanks for the help, Ahmet.
Thanks for the help ahmet.

 Date: Thu, 14 Feb 2013 01:31:14 -0800
 From: iori...@yahoo.com
 Subject: RE: Why a phrase is getting searched against default fields in solr
 To: solr-user@lucene.apache.org
 
 Hi,
 
 instead of edismax=true can you try defType=edismax
 
 ahmet
 
 --- On Thu, 2/14/13, Pragyanshis Pattanaik pragyans...@outlook.com wrote:
 
  From: Pragyanshis Pattanaik pragyans...@outlook.com
  Subject: RE: Why a phrase is getting searched against default fields in solr
  To: solr Forum solr-user@lucene.apache.org
  Date: Thursday, February 14, 2013, 10:21 AM
  It is returning me all the documents
  which contains the phrase as it is searching against
  Defaultfield.my default field is like below
  field name=SearchableField type=text_general
  indexed=true stored=false
  multiValued=true/copyField
  source=Product-Name-*
  dest=SearchableField/copyField
  source=Product-Description-* dest=SearchableField/
  I have defined SearchableField as default field.
  Thanks,Pragyanshis
   Date: Wed, 13 Feb 2013 23:18:06 -0800
   From: iori...@yahoo.com
   Subject: Re: Why a phrase is getting searched against
  default fields in solr
   To: solr-user@lucene.apache.org
   
   Hi Pragyanshis,
   
   What happens when you remove bq parameter? 
   
   --- On Thu, 2/14/13, Pragyanshis Pattanaik pragyans...@outlook.com
  wrote:
   
From: Pragyanshis Pattanaik pragyans...@outlook.com
Subject: Why a phrase is getting searched against
  default fields in solr
To: solr Forum solr-user@lucene.apache.org
Date: Thursday, February 14, 2013, 8:24 AM
Hi,
This might be a very silly question but i want to
  know why
this is happening.If i am using edismax query
  parser in solr
and passing query something like below

   
   
  q=IPhone5wt=xmledismax=trueqf=Product-Name-0^100bq=(Product-Rating-0%3A7^300+OR+Product-Rating-0%3A8^400+OR+Product-Rating-0%3A9^500+OR+Product-Rating-0%3A10^600+OR+Product-Rating-0%3A*)
Then why it is searching in default fields ?As i
  am
specifying qf,it should search in the fields
  specified in qf
parameter and boost those documents which has a
  higher
rating.
Please correct me if my understanding is
  wrong.Note:-I am
using SOLR 4.0 Alpha
Thanks,Pragyanshis

  
  
  
  


  

Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread PeterKerk
On all products I have, I want to implement a price range filter.
Since this price range applies to the entire population and not to a
single product, my assumption was that it would not make sense to define
it within the "shopitem" entity, but rather under the "shopitems"
document. So that's what I did in my data-config below.

But now on these requests: 
http://localhost:8983/solr/tt-shop/dataimport?command=reload-config
http://localhost:8983/solr/tt-shop/dataimport?command=full-import

I get the error:
DataImportHandler started. Not Initialized. No commands can be run

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost:1433;databaseName=" user="**"
    password="*" />
  <document name="shopitems">
    <entity name="shopitem" pk="id" query="select * from products">
      <field name="id" column="ID" />
      <field name="prijs" column="prijs" />
      <field name="createdate" column="createdate" />
    </entity>
    <entity name="pricerange" query=";With Categorized as
      (Select
        CASE When prijs &lt;= 1000 Then '&lt;10'
             When prijs &gt; 1000 and prijs &lt;= 2500 Then '[10-25]'
             When prijs &gt; 2500 and prijs &lt;= 5000 Then '[25-50]'
             Else '&gt;50'
        END as PriceCategory From products)
      Select PriceCategory, Count(*) as Cnt From Categorized Group By PriceCategory">
    </entity>
  </document>
</dataConfig>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implement-price-range-filter-DataImportHandler-started-Not-Initialized-No-commands-can-be-run-tp4040418.html
Sent from the Solr - User mailing list archive at Nabble.com.


solr 4.1 spatial with JTS - spatial query withitin a WKT polygon contained within another query ...

2013-02-14 Thread Pires, Guilherme
Hello Everyone,

I've been integrating Solr 4.1 into a Web GIS solution and it's working great.
I have implemented JTS within Solr 4.1 and indexed thousands of WKT polygons
provided by an XML document generated by a GE GIS Core system. Everything seems
to be working out great.

Now I have a feature where I want to query Solr with
geo:intersects((POLYGON(... with a polygon too big to send via an xmlhttp
object. I'm getting an HTTP 505 error.

1.  Is there any other way of sending this huge string to Solr? (I've
tried GET and POST)
2.  This polygon was the result of a previous query, so is there a way of
querying inside a query? Something like ... fq=geo:intersects(another
query.spatialfield_with_the_wkt_polygon) ?
Thanks
Guilherme





Re: Index-time synonyms and trailing wildcard issue

2013-02-14 Thread Johannes Rodenwald
Hello Jack,

Thanks for your answer, it helped me gain a deeper understanding of what
happens at index time, and find a solution myself:

It seems that putting the synonym filter in both filter chains (index and
query), setting expand="false", and putting the desired synonym first in the
row does the trick.
Synonyms line (reversed order!):
orange, apfelsine

All documents containing "apfelsine" are now mapped to "orange", so there are
no more documents containing "apfelsine" that would match a wildcard query for
apfel*. ("Apfelsine" is a true synonym for "Orange" in German, meaning
"Chinese apple". Apfel = apple, which shouldn't match oranges.)
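
For reference, the field type ends up looking roughly like this (just a
sketch, the type and file names are only examples):

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

With expand="false", every term in the synonyms.txt line "orange, apfelsine"
is reduced to the first entry, "orange", on both the index and query side.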

Problem solved, thanks again for the help!

Johannes Rodenwald 

- Original Message -
From: Jack Krupansky j...@basetechnology.com
To: solr-user@lucene.apache.org
Sent: Wednesday, 13 February 2013 17:17:40
Subject: Re: Index-time synonyms and trailing wildcard issue

By doing synonyms at index time, you cause "apfelsin" to be added to
documents that contain only "orang", so of course documents that previously
only contained "orang" will now match for "apfelsin" or any term query that
matches "apfelsin", such as a wildcard. At query time, Lucene cannot tell
whether your original document contained "apfelsin" or if "apfelsin" was
added when the document was indexed due to an index-time synonym.

Solution: Either disable index time synonyms, or have a parallel field (via 
copyField) that does not have the index-time synonyms.

But... perhaps you should clarify what you really intend to happen with 
these pseudo-synonyms.

-- Jack Krupansky




JMX generation number is wrong

2013-02-14 Thread Aristedes Maniatis

I'm trying to monitor the state of a master-slave Solr 4.1 cluster. I can
easily get the generation number of the slaves using JMX like this:

solr/{corename}/org.apache.solr.handler.ReplicationHandler/generation

That works fine. However, on the master this number is always 1, which makes
it rather hard to check whether the slaves are lagging behind.

Is this a defect in the JMX properties in Solr, and should I file a Jira?


Ari


--
--
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A


get filterCache in Component

2013-02-14 Thread Markus Jelsma
Hi,

We need to get the filterCache in a Component but 
SolrIndexSearcher.getCache(String name) does not return it. It seems the 
filterCache is not added to cacheMap and can therefore not be returned.

SolrCache<Query,DocSet> filterCache =
rb.req.getSearcher().getCache("filterCache");

Will always return null. Can we get the filterCache via other means or should 
it be added to the cacheMap so getCache can return it?

Thanks,
Markus


Re: Most common query

2013-02-14 Thread Erick Erickson
If I'm understanding your question correctly, you have to build that out
yourself. Solr doesn't store the searches, nor the results.

Hmm, though if you keep the Solr logs around you can reconstruct the
queries from them, although it takes a bit of work. The other place would be
your servlet container logs, which should be able to store all the queries.


On Wed, Feb 13, 2013 at 10:27 AM, ROSENBERG, YOEL (YOEL)** CTR ** 
yoel.rosenb...@alcatel-lucent.com wrote:

  Hi,

 I have a question, hope you can help me.

 I would like to get a report, using the Solr admin tools, that returns all
 the searches made on the system between two dates.

 What is the correct way to do it?

 BR,

 Yoel

 Yoel Rosenberg
 ALCATEL-LUCENT
 Support Engineer
 T: +972 77 9088584
 M: +972 54 239 5204
 yoel.rosenb...@alcatel-lucent.com



Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Erick Erickson
One data point: I can comfortably index and search the Wikipedia dump (11M
articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
queries, but

Erick


On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro m...@mshapiro.net wrote:

 Excellent, thank you very much for the reply!

 On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

  Matthew Shapiro [m...@mshapiro.net] wrote:
 
   Sorry, I should clarify our current statistics.  First of all I meant
  183k
   documents (not 183, woops). Around 100k of those are full fledged html
   articles (not web pages but articles in our CMS with html content
 inside
   of them),
 
  If an article is around 10-30 pages (or the equivalent), this is still a
  small corpus.
 
   the rest of the data are more like key/value data records with a lot
   of attached meta data for searching.
 
  If the amount of unique categories (model, author, playtime, lix,
  favorite_band, year...) in the meta data is in the lower hundreds, you
  should be fine.
 
   Also, what I meant by search without a search term is that probably 80%
   (hard to confirm due to the lack of stats given by the GSA) of our
  searches
   are done on pure metadata clauses without any searching through the
  content
   itself,
 
  That clarifies a lot, thanks. So we have roughly speaking 4000*5
  queries/day ~= 14 queries/minute. Guessing wildly that your peak time
  traffic is about 5 times that, we end up with about 1 query/second. That
 is
  a very light load for the Solr installation we're discussing.
 
   so for example give me documents that have a content type of
   video, that are marked for client X, have a category of Y or Z, and was
   published to platform A, ordered by date published.
 
  That is a near-trivial query and you should get a reply very fast on
  modest hardware.
 
   The searches that use a search term are more like use the same query
  from the
   example as before, but find me all the documents that have the string
  My Video
   in it's title and description.
 
  Unless you experiment with fuzzy matches and phrase slop, this should
 also
  be fast. Ignoring analyzers, there is practically no difference between a
  meta data field and a larger content field in Solr.
 
  Your current search (guessing here) iterates all terms in the content
  fields and take a comparatively large penalty when a large document is
  encountered. The inversion of index in Solr means that the search terms
 are
  looked up in a dictionary and refers to the documents they belong to. The
  penalty for having thousands or millions of terms as compared to tens or
  hundreds in a field in an inverted index is very small.
 
  We're still in any random machine you've got available-land so I second
  Michael's suggestion.
 
  Regards,
  Toke Eskildsen



Re: Multi Core / On demand loading

2013-02-14 Thread Erick Erickson
I updated this page: http://wiki.apache.org/solr/CoreAdmin, look for
transientCacheSize and loadOnStartup. Be aware that this is somewhat in
flux, but anything you find please report!
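
In solr.xml terms it boils down to something like this sketch (attribute
values are only examples; check the wiki page for the current names since, as
said, this is in flux):

<solr persistent="true">
  <cores adminPath="/admin/cores" transientCacheSize="4">
    <core name="rarely-used" instanceDir="rarely-used" transient="true" loadOnStartup="false"/>
    <core name="always-on" instanceDir="always-on" transient="false" loadOnStartup="true"/>
  </cores>
</solr>

Cores marked transient="true" are the ones eligible for LRU unloading once
more than transientCacheSize of them are loaded, and loadOnStartup="false"
defers loading a core until the first request that needs it.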

Man, oh man, do I have a lot of documentation to do on all this once the
dust settles

Erick


On Wed, Feb 13, 2013 at 5:10 PM, Vinay B, vybe3...@gmail.com wrote:

 Amongst the highlights for the SOLR 4.1 release, I see
 Multi-core: On-demand core loading and LRU-based core unloading after
 reaching a user-specified maximum number.

 How is this configured and where should I be looking for a reference on
 this feature?

 Thanks



Re: Most common query

2013-02-14 Thread Ahmet Arslan
Hi,

If I am not mistaken, I saw some open JIRA issues about collecting queries and
calculating popular searches, etc.

Some commercial solutions exist:

http://sematext.com/search-analytics/index.html
http://soleami.com/blog/soleami-start_en.html


--- On Wed, 2/13/13, ROSENBERG, YOEL (YOEL)** CTR ** 
yoel.rosenb...@alcatel-lucent.com wrote:

From: ROSENBERG, YOEL (YOEL)** CTR ** yoel.rosenb...@alcatel-lucent.com
Subject: Most common query
To: solr-user@lucene.apache.org solr-user@lucene.apache.org
Date: Wednesday, February 13, 2013, 5:27 PM



Hi,

I have a question, hope you can help me.

I would like to get a report, using the Solr admin tools, that returns all
the searches made on the system between two dates.

What is the correct way to do it?

BR,

Yoel

Yoel Rosenberg
ALCATEL-LUCENT
Support Engineer
T: +972 77 9088584
M: +972 54 239 5204
yoel.rosenb...@alcatel-lucent.com



Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Jack Krupansky
That raises the question of how your average professional notebook computer 
(PC or Mac or Linux) compares to a garden-variety cloud server such as an 
Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document 
ingestion rate or how many documents you can load before load and/or query 
performance starts to fall off the cliff. Anybody have any numbers? I mean, 
is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel? 
(With all the usual caveats that it all depends and your mileage will 
vary.) But the intent would be for a similar workload on both (like loading 
the wikipedia dump.)


-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Thursday, February 14, 2013 7:31 AM
To: solr-user@lucene.apache.org
Subject: Re: What should focus be on hardware for solr servers?

One data point: I can comfortably index and search the Wikipedia dump (11M
articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
queries, but

Erick


On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro m...@mshapiro.net wrote:


Excellent, thank you very much for the reply!

On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen t...@statsbiblioteket.dk
wrote:

 Matthew Shapiro [m...@mshapiro.net] wrote:

  Sorry, I should clarify our current statistics.  First of all I meant
 183k
  documents (not 183, woops). Around 100k of those are full fledged html
  articles (not web pages but articles in our CMS with html content
inside
  of them),

 If an article is around 10-30 pages (or the equivalent), this is still a
 small corpus.

  the rest of the data are more like key/value data records with a lot
  of attached meta data for searching.

 If the amount of unique categories (model, author, playtime, lix,
 favorite_band, year...) in the meta data is in the lower hundreds, you
 should be fine.

  Also, what I meant by search without a search term is that probably 
  80%

  (hard to confirm due to the lack of stats given by the GSA) of our
 searches
  are done on pure metadata clauses without any searching through the
 content
  itself,

 That clarifies a lot, thanks. So we have roughly speaking 4000*5
 queries/day ~= 14 queries/minute. Guessing wildly that your peak time
 traffic is about 5 times that, we end up with about 1 query/second. That
is
 a very light load for the Solr installation we're discussing.

  so for example give me documents that have a content type of
  video, that are marked for client X, have a category of Y or Z, and 
  was

  published to platform A, ordered by date published.

 That is a near-trivial query and you should get a reply very fast on
 modest hardware.

  The searches that use a search term are more like use the same query
 from the
  example as before, but find me all the documents that have the string
 My Video
  in it's title and description.

 Unless you experiment with fuzzy matches and phrase slop, this should
also
 be fast. Ignoring analyzers, there is practically no difference between 
 a

 meta data field and a larger content field in Solr.

 Your current search (guessing here) iterates all terms in the content
 fields and take a comparatively large penalty when a large document is
 encountered. The inversion of index in Solr means that the search terms
are
 looked up in a dictionary and refers to the documents they belong to. 
 The

 penalty for having thousands or millions of terms as compared to tens or
 hundreds in a field in an inverted index is very small.

 We're still in any random machine you've got available-land so I 
 second

 Michael's suggestion.

 Regards,
 Toke Eskildsen





Re: Solr 4.1.0 not using solrcore.properties ?

2013-02-14 Thread Erick Erickson
Daniel:

It would be great if you would go ahead and edit the Wiki, all you have to
do is create a signon. Having just gone through the pain of figuring this
out, you're best positioned to know how to warn others!

Best
Erick


On Thu, Feb 14, 2013 at 4:56 AM, Daniel Rijkhof daniel.rijk...@gmail.comwrote:

 James,

 I'm not completely sure, and i have not tested the following:

 entityname.last_index_time might also not be accessible...

 Daniel

 On Thu, Feb 14, 2013 at 12:47 AM, Daniel Rijkhof
 daniel.rijk...@gmail.comwrote:

  James,
 
  I debugged it until I found where things go 'wrong'.
 
  Apparently the current implementation VariableResolver does not allow the
  use of a period '.' in any variable/property key you want to use... It's
  reserved for namespaces.
  Personally I would really love to use a period in my variable/property
 key
  names, and see no reason why this should be an issue...
 
  So, using for example
  solr.dataimport.jdbcDriver=org.h2.Driver
  will not work
 
  using just:
  jdbcDriver=org.h2.Driver
 
  works fine...
 
  So i will rename all my properties... but took me hours to find out why
  something that used to work stopped working...
 
  I have never had problems of using periods in any properties
  file... apparently Solr is the only project that doesn't allow the use of
  periods...
 
  Even if this would be documented in a way that persons can find this
  documentation, i guess it would be better to just allow periods by
 changing
  the implementation of the VariableResolver just a little...
 
  00.43 now... off to bed.
 
  Let me know what you think,
  Daniel
 
 
 
 
  On Wed, Feb 13, 2013 at 6:45 PM, Dyer, James 
 james.d...@ingramcontent.com
   wrote:
 
  The code that resolves variables in DIH was refactored extensively in
  4.1.0.  So if you've got a case where it does not resolve the variables
  properly, please give the details.  We can open a JIRA issue and get
 this
  fixed.
 
  James Dyer
  Ingram Content Group
  (615) 213-4311
 
  -Original Message-
  From: Daniel Rijkhof [mailto:daniel.rijk...@gmail.com]
  Sent: Wednesday, February 13, 2013 11:09 AM
  To: solr-user@lucene.apache.org
  Subject: Re: Solr 4.1.0 not using solrcore.properties ?
 
  I am looking at the source code of 4.1.0 and I cannot find any prove
 that
  solr 4.1.0's DIH would actually use any properties from the
  solrcore.properties file.
 
  I do however found that Solr does load my solrcore.properties file...
 
  It's strange that this would have been changed,
 
  Does anybody have prove it still can use properties defined in
  solrcore.properties within the DIH configuration?
 
  In that case, please reply...
  Daniel
 
  Daniel Rijkhof
  06 12 14 12 17
 
 
  On Wed, Feb 13, 2013 at 4:22 PM, Daniel Rijkhof 
 daniel.rijk...@gmail.com
  wrote:
 
   I have the following problem:
  
   I'm upgrading from a nightly build 4.0.* to 4.1.0.
  
   My dataimport is configured with ${variables} which always worked
 fine,
   untill this upgrade.
  
   My solrcore.properties file seems to be ignored.
  
   Solr.xml:
   ?xml version=1.0 encoding=UTF-8 ?
  
   solr sharedLib=lib persistent=true
 cores adminPath=/admin/cores host=${host:}
   hostPort=${jetty.port:}
   core default=true name=hfselectdata
  instanceDir=hfselectdata/
 /cores
   /solr
  
   and in solrhome/hfselectdata/conf/ is the file solrcore.properties.
  
   Anybody any suggestions?
  
   Greatly appreciated
   Daniel
  
  
 
 
 



Re: Multi Core / On demand loading

2013-02-14 Thread Erick Erickson
Almost forgot. Do be aware of
https://issues.apache.org/jira/browse/SOLR-4400. This came to light under
an absurd load of opening/closing transient cores, which only means it
won't show up until you go into production. The fix is on both trunk and 4x.




On Thu, Feb 14, 2013 at 7:46 AM, Erick Erickson erickerick...@gmail.comwrote:

 I updated this page: http://wiki.apache.org/solr/CoreAdmin, look for
 transientCacheSize and loadOnStartup. Be aware that this is somewhat in
 flux, but anything you find please report!

 Man, oh man, do I have a lot of documentation to do on all this once the
 dust settles

 Erick


 On Wed, Feb 13, 2013 at 5:10 PM, Vinay B, vybe3...@gmail.com wrote:

 Amongst the highlights for the SOLR 4.1 release, I see
 Multi-core: On-demand core loading and LRU-based core unloading after
 reaching a user-specified maximum number.

 How is this configured and where should I be looking for a reference on
 this feature?

 Thanks





Re: Combining Solr score with customized user ratings for a document

2013-02-14 Thread Á_____o
Well, thinking a bit more, the second solution is not practical.

If Solr retrieves, say, 1,000 documents, I would have to navigate through
ALL of them (maybe fewer, with some reasonable upper limit) to recalculate the
scores and reorder them according to the new score, although the Web App is
going to show just the first 20.

In other words, I would lose the benefit of Solr's (well, and most DBs')
row/offset feature of retrieving information in batches rather than the whole
result set, which may not even be seen by the user at all.

I'm now wondering if a custom implementation of a ValueSource + a
FunctionQuery is a solution to my problem...

Any hint?
Thanks!

Álvaro



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-tp4040200p4040444.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Maximum Number of Records In Index

2013-02-14 Thread Macroman
Partial updates are nothing as clever as I may have made them sound; it is just
changing a record value, for example a last name from Smith to Jones. That's
my partial update.

No errors at all in indexing. I have not yet checked the logs, but the DIH
output counts show no errors; here is an example:

<str name="Total Requests made to DataSource">2</str>
<str name="Total Rows Fetched">14823</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2013-02-14 07:00:30</str>
<str name="">Indexing completed. Added/Updated: 14823 documents. Deleted 0 documents.</str>
<str name="Committed">2013-02-14 07:19:59</str>
<str name="Optimized">2013-02-14 07:19:59</str>
<str name="Total Documents Processed">14823</str>
<str name="Time taken ">0:19:58.557</str>

Having analysed the Solr index this afternoon, I realised that I actually add
the date/time of when each record is indexed, so I did a quick Solr admin count
using
 record_date:[2000-02-14T00:00:00.000Z TO 2013-02-10T00:00:00.000Z]
This resulted in a count of 32,723 records indexed today, and when I add up
all the DIHs' Added/Updated counts it comes to 35,369 - weird!!! Now for the
total maths: yesterday's total index count was 13593885 and today it is
13598211, a difference of 4326. But I do need to take into account record
updates, so running the SQL from each of the DIH's sources in SQL Developer
purely to get counts, my counts total 31,789, which means only 3,000 to 4,000
are updates and the rest are all new.

So I will definitely say that records are being deleted, so I need to check the
logs as suggested. If no mention of deletions exists, my next question will be:
can I get a month-by-month breakdown on a Solr date field so I can monitor
records that drop off? One field that will definitely not change is the record
creation date from the source systems, which is part of the indexed record.
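
(As a side note, I'm wondering whether a range facet on that date field would
give me the monthly breakdown without any schema changes, something along the
lines of:

q=*:*&rows=0&facet=true&facet.range=record_date&facet.range.start=2012-01-01T00:00:00Z&facet.range.end=NOW&facet.range.gap=%2B1MONTH

I haven't tried it yet though.)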

this line ready for entering log details to see if any deletes occurred



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Maximum-Number-of-Records-In-Index-tp4038961p4040445.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MockAnalyzer in Lucene: attach stemmer or any custom filter?

2013-02-14 Thread Robert Muir
MockAnalyzer is really just MockTokenizer+MockTokenFilter+...

Instead, you just define your own analyzer chain using MockTokenizer.
This is the way all of Lucene's own analysis tests work, e.g.:
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/analysis/common/src/test/org/apache/lucene/analysis/en/TestEnglishMinimalStemFilter.java

On Thu, Feb 14, 2013 at 7:40 AM, Dmitry Kan solrexp...@gmail.com wrote:
 Hello,

 Asked a question on SO:

 http://stackoverflow.com/questions/14873207/mockanalyzer-in-lucene-attach-stemmer-or-any-custom-filter

 Is there a way to configure a stemmer or a custom filter with the
 MockAnalyzer class?
 Version: LUCENE_34

 Dmitry


RE: Solr 4.1.0 not using solrcore.properties ?

2013-02-14 Thread Dyer, James
Daniel,

This bug has already been recorded and hopefully will be fixed in time for 4.2. 
 See https://issues.apache.org/jira/browse/SOLR-4361 .

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Daniel Rijkhof [mailto:daniel.rijk...@gmail.com] 
Sent: Wednesday, February 13, 2013 5:47 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr 4.1.0 not using solrcore.properties ?

James,

I debugged it until I found where things go 'wrong'.

Apparently the current implementation VariableResolver does not allow the
use of a period '.' in any variable/property key you want to use... It's
reserved for namespaces.
Personally I would really love to use a period in my variable/property key
names, and see no reason why this should be an issue...

So, using for example
solr.dataimport.jdbcDriver=org.h2.Driver
will not work

using just:
jdbcDriver=org.h2.Driver

works fine...

So i will rename all my properties... but took me hours to find out why
something that used to work stopped working...

I have never had problems of using periods in any properties
file... apparently Solr is the only project that doesn't allow the use of
periods...

Even if this would be documented in a way that persons can find this
documentation, i guess it would be better to just allow periods by changing
the implementation of the VariableResolver just a little...

00.43 now... off to bed.

Let me know what you think,
Daniel




On Wed, Feb 13, 2013 at 6:45 PM, Dyer, James
james.d...@ingramcontent.comwrote:

 The code that resolves variables in DIH was refactored extensively in
 4.1.0.  So if you've got a case where it does not resolve the variables
 properly, please give the details.  We can open a JIRA issue and get this
 fixed.

 James Dyer
 Ingram Content Group
 (615) 213-4311

 -Original Message-
 From: Daniel Rijkhof [mailto:daniel.rijk...@gmail.com]
 Sent: Wednesday, February 13, 2013 11:09 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Solr 4.1.0 not using solrcore.properties ?

 I am looking at the source code of 4.1.0 and I cannot find any prove that
 solr 4.1.0's DIH would actually use any properties from the
 solrcore.properties file.

 I do however found that Solr does load my solrcore.properties file...

 It's strange that this would have been changed,

 Does anybody have prove it still can use properties defined in
 solrcore.properties within the DIH configuration?

 In that case, please reply...
 Daniel

 Daniel Rijkhof
 06 12 14 12 17


 On Wed, Feb 13, 2013 at 4:22 PM, Daniel Rijkhof daniel.rijk...@gmail.com
 wrote:

  I have the following problem:
 
  I'm upgrading from a nightly build 4.0.* to 4.1.0.
 
  My dataimport is configured with ${variables} which always worked fine,
  untill this upgrade.
 
  My solrcore.properties file seems to be ignored.
 
  Solr.xml:
  ?xml version=1.0 encoding=UTF-8 ?
 
  solr sharedLib=lib persistent=true
cores adminPath=/admin/cores host=${host:}
  hostPort=${jetty.port:}
  core default=true name=hfselectdata instanceDir=hfselectdata/
/cores
  /solr
 
  and in solrhome/hfselectdata/conf/ is the file solrcore.properties.
 
  Anybody any suggestions?
 
  Greatly appreciated
  Daniel
 
 





Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Michael Della Bitta
My dual-core, HT-enabled Dell Latitude from last year has this CPU:
model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
bogomips: 4988.65

An m3.xlarge reports:
model name : Intel(R) Xeon(R) CPU   E5645  @ 2.40GHz
bogomips : 4000.14

I tried running geekbench and phoronix-test-suite and failed at both...
Anybody have a favorite, free, CLI benchmarking suite?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Feb 14, 2013 at 8:10 AM, Jack Krupansky j...@basetechnology.com wrote:
 That raises the question of how your average professional notebook computer
 (PC or Mac or Linux) compares to a garden-variety cloud server such as an
 Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document
 ingestion rate or how many documents you can load before load and/or query
 performance starts to fall off the cliff. Anybody have any numbers? I mean,
 is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel?
 (With all the usual caveats that it all depends and your mileage will
 vary.) But the intent would be for a similar workload on both (like loading
 the wikipedia dump.)

 -- Jack Krupansky

 -Original Message- From: Erick Erickson
 Sent: Thursday, February 14, 2013 7:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: What should focus be on hardware for solr servers?


 One data point: I can comfortably index and search the Wikipedia dump (11M
 articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
 queries, but

 Erick


 On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro m...@mshapiro.net wrote:

 Excellent, thank you very much for the reply!

 On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

  Matthew Shapiro [m...@mshapiro.net] wrote:
 
   Sorry, I should clarify our current statistics.  First of all I meant
  183k
   documents (not 183, woops). Around 100k of those are full fledged html
   articles (not web pages but articles in our CMS with html content
 inside
   of them),
 
  If an article is around 10-30 pages (or the equivalent), this is still a
  small corpus.
 
   the rest of the data are more like key/value data records with a lot
   of attached meta data for searching.
 
  If the amount of unique categories (model, author, playtime, lix,
  favorite_band, year...) in the meta data is in the lower hundreds, you
  should be fine.
 
   Also, what I meant by search without a search term is that probably 
80%
   (hard to confirm due to the lack of stats given by the GSA) of our
  searches
   are done on pure metadata clauses without any searching through the
  content
   itself,
 
  That clarifies a lot, thanks. So we have roughly speaking 4000*5
  queries/day ~= 14 queries/minute. Guessing wildly that your peak time
  traffic is about 5 times that, we end up with about 1 query/second. That
 is
  a very light load for the Solr installation we're discussing.
 
   so for example give me documents that have a content type of
   video, that are marked for client X, have a category of Y or Z, and 
was
   published to platform A, ordered by date published.
 
  That is a near-trivial query and you should get a reply very fast on
  modest hardware.
 
   The searches that use a search term are more like use the same query
  from the
   example as before, but find me all the documents that have the string
  My Video
   in it's title and description.
 
  Unless you experiment with fuzzy matches and phrase slop, this should
 also
  be fast. Ignoring analyzers, there is practically no difference between
   a
  meta data field and a larger content field in Solr.
 
  Your current search (guessing here) iterates all terms in the content
  fields and take a comparatively large penalty when a large document is
  encountered. The inversion of index in Solr means that the search terms
 are
  looked up in a dictionary and refers to the documents they belong to. 
  The
  penalty for having thousands or millions of terms as compared to tens or
  hundreds in a field in an inverted index is very small.
 
  We're still in any random machine you've got available-land so I 
  second
  Michael's suggestion.
 
  Regards,
  Toke Eskildsen




Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Michael Della Bitta
Or perhaps we should develop our own, Solr-based benchmark...

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Feb 14, 2013 at 10:54 AM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 My dual-core, HT-enabled Dell Latitude from last year has this CPU:
 model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
 bogomips: 4988.65

 An m3.xlarge reports:
 model name : Intel(R) Xeon(R) CPU   E5645  @ 2.40GHz
 bogomips : 4000.14

 I tried running geekbench and phoronx-test-suite and failed at both...
 Anybody have a favorite, free, CLI benchmarking suite?

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Thu, Feb 14, 2013 at 8:10 AM, Jack Krupansky j...@basetechnology.com 
 wrote:
 That raises the question of how your average professional notebook computer
 (PC or Mac or Linux) compares to a garden-variety cloud server such as an
 Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document
 ingestion rate or how many documents you can load before load and/or query
 performance starts to fall off the cliff. Anybody have any numbers? I mean,
 is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel?
 (With all the usual caveats that it all depends and your mileage will
 vary.) But the intent would be for a similar workload on both (like loading
 the wikipedia dump.)

 -- Jack Krupansky

 -Original Message- From: Erick Erickson
 Sent: Thursday, February 14, 2013 7:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: What should focus be on hardware for solr servers?


 One data point: I can comfortably index and search the Wikipedia dump (11M
 articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
 queries, but

 Erick


 On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro m...@mshapiro.net wrote:

 Excellent, thank you very much for the reply!

 On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

  Matthew Shapiro [m...@mshapiro.net] wrote:
 
   Sorry, I should clarify our current statistics.  First of all I meant
  183k
   documents (not 183, woops). Around 100k of those are full fledged html
   articles (not web pages but articles in our CMS with html content
 inside
   of them),
 
  If an article is around 10-30 pages (or the equivalent), this is still a
  small corpus.
 
   the rest of the data are more like key/value data records with a lot
   of attached meta data for searching.
 
  If the amount of unique categories (model, author, playtime, lix,
  favorite_band, year...) in the meta data is in the lower hundreds, you
  should be fine.
 
   Also, what I meant by search without a search term is that probably 
80%
   (hard to confirm due to the lack of stats given by the GSA) of our
  searches
   are done on pure metadata clauses without any searching through the
  content
   itself,
 
  That clarifies a lot, thanks. So we have roughly speaking 4000*5
  queries/day ~= 14 queries/minute. Guessing wildly that your peak time
  traffic is about 5 times that, we end up with about 1 query/second. That
 is
  a very light load for the Solr installation we're discussing.
 
   so for example give me documents that have a content type of
   video, that are marked for client X, have a category of Y or Z, and 
was
   published to platform A, ordered by date published.
 
  That is a near-trivial query and you should get a reply very fast on
  modest hardware.
 
   The searches that use a search term are more like use the same query
  from the
   example as before, but find me all the documents that have the string
  My Video
   in it's title and description.
 
  Unless you experiment with fuzzy matches and phrase slop, this should
 also
  be fast. Ignoring analyzers, there is practically no difference between
   a
  meta data field and a larger content field in Solr.
 
  Your current search (guessing here) iterates all terms in the content
  fields and take a comparatively large penalty when a large document is
  encountered. The inversion of index in Solr means that the search terms
 are
  looked up in a dictionary and refers to the documents they belong to. 
  The
  penalty for having thousands or millions of terms as compared to tens or
  hundreds in a field in an inverted index is very small.
 
  We're still in any random machine you've got available-land so I 
  second
  Michael's suggestion.
 
  Regards,
  Toke Eskildsen




RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread Dyer, James
This looks like https://issues.apache.org/jira/browse/SOLR-2115 , which was 
fixed for 4.0-Alpha .

Basically, if you do not put a data-config.xml file in the defaults section 
in solrconfig.xml, or if your config file has any errors, you won't be able to 
use DIH unless you fix the problem and restart Solr.

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Thursday, February 14, 2013 5:02 AM
To: solr-user@lucene.apache.org
Subject: Implement price range filter: DataImportHandler started. Not 
Initialized. No commands can be run

I want to implement a price range filter on all my products.
Since this price range applies to the entire population and not to a
single product, my assumption was that it would not make sense to define
it within the shopitem entity, but rather directly under the shopitems
document. So that's what I did in my data-config below.

But now on these requests: 
http://localhost:8983/solr/tt-shop/dataimport?command=reload-config
http://localhost:8983/solr/tt-shop/dataimport?command=full-import

I get the error:
DataImportHandler started. Not Initialized. No commands can be run

<dataConfig>
  <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
    url="jdbc:sqlserver://localhost:1433;databaseName=" user="**"
    password="*" />
  <document name="shopitems">
    <entity name="shopitem" pk="id" query="select * from products">
      <field name="id" column="ID" />
      <field name="prijs" column="prijs" />
      <field name="createdate" column="createdate" />
    </entity>
    <entity name="&quot;pricerange&quot;" query="&quot;;With Categorized as
      (Select
       CASE When prijs &lt;= 1000 Then '&lt;10'
            When prijs &gt; 1000 and prijs &lt;= 2500 Then '[10-25]'
            When prijs &gt; 2500 and prijs &lt;= 5000 Then '[25-50]'
            Else '50'
       END as PriceCategory From products)
      Select PriceCategory, Count(*) as Cnt From Categorized Group By
      PriceCategory">
    </entity>
  </document>
</dataConfig>




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implement-price-range-filter-DataImportHandler-started-Not-Initialized-No-commands-can-be-run-tp4040418.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread PeterKerk
Ok, but I restarted solr several times and the issue still occurs. So my
guess is that the entity I added contains errors:

<entity name="&quot;pricerange&quot;" query="&quot;;With
Categorized as 
(Select 
 CASE When prijs &lt;= 1000 Then '&lt;10' 
      When prijs &gt; 1000 and prijs &lt;= 2500 Then '[10-25]' 
      When prijs &gt; 2500 and prijs &lt;= 5000 Then '[25-50]' 
      Else '50' 
 END as PriceCategory  From products) 
Select PriceCategory, Count(*) as Cnt From Categorized Group By 
PriceCategory">
</entity> 

Or are you saying that this code is correct and that the 4.0-Alpha release
will resolve my issue?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implement-price-range-filter-DataImportHandler-started-Not-Initialized-No-commands-can-be-run-tp4040418p4040483.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: compare two shards.

2013-02-14 Thread Paul
I do a brute-force regression test where I read all the documents from
shard 1 and compare them to documents in shard 2. I had to have all the
fields stored to do that, but in my case that doesn't change the size of
the index much.

So, in other words, I do a search for a page's worth of documents sorted by
the same thing and compare them, then get the next page and do the same.



On Tue, Feb 12, 2013 at 4:20 AM, stockii stock.jo...@googlemail.com wrote:

 hello.

 i want to compare two shards each other, because these shards should have
 the same index. but this isnt so =(
 so i want to find these documents, there are missing in one shard of my
 both
 shards.

 my ideas
 - distrubuted shard request on my nodes and fire a facet search on my
 unique-field. but the result of facet component isnt reversable =(

 - grouping. but its not working correctly i think so. no groups of the same
 uniquekey in the resultset.


 does anyone some better ideas?



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/compare-two-shards-tp4039887.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Walter Underwood
Just using a single CPU (log processing with Python), my MacBook Pro (2GHz 
Intel Core i7) is twice as fast as an m2.xlarge EC2 instance.

Laptop disks are slower than the EC2 disks.

EC2 is for quantity, not quality.

wunder

On Feb 14, 2013, at 5:10 AM, Jack Krupansky wrote:

 That raises the question of how your average professional notebook computer 
 (PC or Mac or Linux) compares to a garden-variety cloud server such as an 
 Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document 
 ingestion rate or how many documents you can load before load and/or query 
 performance starts to fall off the cliff. Anybody have any numbers? I mean, 
 is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel? 
 (With all the usual caveats that it all depends and your mileage will 
 vary.) But the intent would be for a similar workload on both (like loading 
 the wikipedia dump.)
 
 -- Jack Krupansky
 
 -Original Message- From: Erick Erickson
 Sent: Thursday, February 14, 2013 7:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: What should focus be on hardware for solr servers?
 
 One data point: I can comfortably index and search the Wikipedia dump (11M
 articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
 queries, but
 
 Erick
 
 
 On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro m...@mshapiro.net wrote:
 
 Excellent, thank you very much for the reply!
 
 On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:
 
  Matthew Shapiro [m...@mshapiro.net] wrote:
 
   Sorry, I should clarify our current statistics.  First of all I meant
  183k
   documents (not 183, woops). Around 100k of those are full fledged html
   articles (not web pages but articles in our CMS with html content
 inside
   of them),
 
  If an article is around 10-30 pages (or the equivalent), this is still a
  small corpus.
 
   the rest of the data are more like key/value data records with a lot
   of attached meta data for searching.
 
  If the amount of unique categories (model, author, playtime, lix,
  favorite_band, year...) in the meta data is in the lower hundreds, you
  should be fine.
 
   Also, what I meant by search without a search term is that probably   
   80%
   (hard to confirm due to the lack of stats given by the GSA) of our
  searches
   are done on pure metadata clauses without any searching through the
  content
   itself,
 
  That clarifies a lot, thanks. So we have roughly speaking 4000*5
  queries/day ~= 14 queries/minute. Guessing wildly that your peak time
  traffic is about 5 times that, we end up with about 1 query/second. That
 is
  a very light load for the Solr installation we're discussing.
 
   so for example give me documents that have a content type of
   video, that are marked for client X, have a category of Y or Z, and   
   was
   published to platform A, ordered by date published.
 
  That is a near-trivial query and you should get a reply very fast on
  modest hardware.
 
   The searches that use a search term are more like use the same query
  from the
   example as before, but find me all the documents that have the string
  My Video
   in it's title and description.
 
  Unless you experiment with fuzzy matches and phrase slop, this should
 also
  be fast. Ignoring analyzers, there is practically no difference between  a
  meta data field and a larger content field in Solr.
 
  Your current search (guessing here) iterates all terms in the content
  fields and take a comparatively large penalty when a large document is
  encountered. The inversion of index in Solr means that the search terms
 are
  looked up in a dictionary and refers to the documents they belong to.  The
  penalty for having thousands or millions of terms as compared to tens or
  hundreds in a field in an inverted index is very small.
 
  We're still in any random machine you've got available-land so I  second
  Michael's suggestion.
 
  Regards,
  Toke Eskildsen
 

--
Walter Underwood
wun...@wunderwood.org





RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread Dyer, James
No, you still have to fix problems with data-config.xml.  Just that prior to 
4.0-alpha, if you started Solr with a problem in the config, you had no way to 
fix it and refresh without restarting Solr (or at least doing a core 
reload).  With 4.0, you can fix your config file and just retry.

I think the problem might be the escaped quotes and ampersands.  Change it 
to...

<entity name="pricerange" query="With
Categorized as 
(Select 
 CASE When prijs &lt;= 1000 Then '&lt;10' 
      When prijs &gt; 1000 and prijs &lt;= 2500 Then '[10-25]' 
      When prijs &gt; 2500 and prijs &lt;= 5000 Then '[25-50]' 
      Else '50' 
 END as PriceCategory  From products) 
Select PriceCategory, Count(*) as Cnt From Categorized Group By 
PriceCategory">
</entity>

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: PeterKerk [mailto:vettepa...@hotmail.com] 
Sent: Thursday, February 14, 2013 10:01 AM
To: solr-user@lucene.apache.org
Subject: RE: Implement price range filter: DataImportHandler started. Not 
Initialized. No commands can be run

Ok, but I restarted solr several times and the issue still occurs. So my
guess is that the entity I added contains errors:

<entity name="&quot;pricerange&quot;" query="&quot;;With
Categorized as 
(Select 
 CASE When prijs &lt;= 1000 Then '&lt;10' 
      When prijs &gt; 1000 and prijs &lt;= 2500 Then '[10-25]' 
      When prijs &gt; 2500 and prijs &lt;= 5000 Then '[25-50]' 
      Else '50' 
 END as PriceCategory  From products) 
Select PriceCategory, Count(*) as Cnt From Categorized Group By 
PriceCategory">
</entity> 

Or are you saying that this code is correct and that the 4.0-Alpha release
will resolve my issue?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implement-price-range-filter-DataImportHandler-started-Not-Initialized-No-commands-can-be-run-tp4040418p4040483.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Michael Della Bitta
Just for sake of comparison, http://www.ec2instances.info/

At the low end, EC2 CPUs come in 1, 2, 2.5, and 3.25 unit sizes. A
m2.xlarge uses 3.25 unit CPUs, so one would have to step up to the
high storage, high IO, or cluster compute nodes to do better than that
at single threaded tasks.

Good thing Solr isn't single threaded, or my company would be bankrupt! :)


Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Feb 14, 2013 at 11:24 AM, Walter Underwood
wun...@wunderwood.org wrote:
 Just using a single CPU (log processing with Python), my MacBook Pro (2GHz 
 Intel Core i7) is twice as fast as an m2.xlarge EC2 instance.

 Laptop disks are slower than the EC2 disks.

 EC2 is for quantity, not quality.

 wunder

 On Feb 14, 2013, at 5:10 AM, Jack Krupansky wrote:

 That raises the question of how your average professional notebook computer 
 (PC or Mac or Linux) compares to a garden-variety cloud server such as an 
 Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document 
 ingestion rate or how many documents you can load before load and/or query 
 performance starts to fall off the cliff. Anybody have any numbers? I mean, 
 is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel? 
 (With all the usual caveats that it all depends and your mileage will 
 vary.) But the intent would be for a similar workload on both (like loading 
 the wikipedia dump.)

 -- Jack Krupansky

 -Original Message- From: Erick Erickson
 Sent: Thursday, February 14, 2013 7:31 AM
 To: solr-user@lucene.apache.org
 Subject: Re: What should focus be on hardware for solr servers?

 One data point: I can comfortably index and search the Wikipedia dump (11M
 articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
 queries, but

 Erick


 On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro m...@mshapiro.net wrote:

 Excellent, thank you very much for the reply!

 On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen t...@statsbiblioteket.dk
 wrote:

  Matthew Shapiro [m...@mshapiro.net] wrote:
 
   Sorry, I should clarify our current statistics.  First of all I meant
  183k
   documents (not 183, woops). Around 100k of those are full fledged html
   articles (not web pages but articles in our CMS with html content
 inside
   of them),
 
  If an article is around 10-30 pages (or the equivalent), this is still a
  small corpus.
 
   the rest of the data are more like key/value data records with a lot
   of attached meta data for searching.
 
  If the amount of unique categories (model, author, playtime, lix,
  favorite_band, year...) in the meta data is in the lower hundreds, you
  should be fine.
 
   Also, what I meant by search without a search term is that probably   
   80%
   (hard to confirm due to the lack of stats given by the GSA) of our
  searches
   are done on pure metadata clauses without any searching through the
  content
   itself,
 
  That clarifies a lot, thanks. So we have roughly speaking 4000*5
  queries/day ~= 14 queries/minute. Guessing wildly that your peak time
  traffic is about 5 times that, we end up with about 1 query/second. That
 is
  a very light load for the Solr installation we're discussing.
 
   so for example give me documents that have a content type of
   video, that are marked for client X, have a category of Y or Z, and   
   was
   published to platform A, ordered by date published.
 
  That is a near-trivial query and you should get a reply very fast on
  modest hardware.
 
   The searches that use a search term are more like use the same query
  from the
   example as before, but find me all the documents that have the string
  My Video
   in it's title and description.
 
  Unless you experiment with fuzzy matches and phrase slop, this should
 also
  be fast. Ignoring analyzers, there is practically no difference between  
  a
  meta data field and a larger content field in Solr.
 
  Your current search (guessing here) iterates all terms in the content
  fields and take a comparatively large penalty when a large document is
  encountered. The inversion of index in Solr means that the search terms
 are
  looked up in a dictionary and refers to the documents they belong to.  
  The
  penalty for having thousands or millions of terms as compared to tens or
  hundreds in a field in an inverted index is very small.
 
  We're still in any random machine you've got available-land so I  
  second
  Michael's suggestion.
 
  Regards,
  Toke Eskildsen


 --
 Walter Underwood
 wun...@wunderwood.org





Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Steve Rowe

On Feb 14, 2013, at 11:24 AM, Walter Underwood wun...@wunderwood.org wrote:
 Laptop disks are slower than the EC2 disks.

My laptop disk is an SSD.


Re: compare two shards.

2013-02-14 Thread Michael Della Bitta
If you can spare the load of a long request, I'd do an unsorted query
for everything, non-paged. I'd dump that into a line-per-row format
and use something like Apache Hive to do the analysis.

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Feb 12, 2013 at 4:20 AM, stockii stock.jo...@googlemail.com wrote:
 hello.

 i want to compare two shards each other, because these shards should have
 the same index. but this isnt so =(
 so i want to find these documents, there are missing in one shard of my both
 shards.

 my ideas
 - distrubuted shard request on my nodes and fire a facet search on my
 unique-field. but the result of facet component isnt reversable =(

 - grouping. but its not working correctly i think so. no groups of the same
 uniquekey in the resultset.


 does anyone some better ideas?



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/compare-two-shards-tp4039887.html
 Sent from the Solr - User mailing list archive at Nabble.com.
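
A minimal sketch of that dump-and-compare approach, assuming the uniqueKey
field is called id and using placeholder host and core names (the CSV
response writer keeps the dump in a line-per-row format):

  curl 'http://host1:8983/solr/shard1/select?q=*:*&fl=id&rows=100000000&wt=csv' | sort > shard1_ids.txt
  curl 'http://host2:8983/solr/shard2/select?q=*:*&fl=id&rows=100000000&wt=csv' | sort > shard2_ids.txt
  # lines appearing in only one file are the documents missing from the other shard
  comm -3 shard1_ids.txt shard2_ids.txt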


Re: Multi Core / On demand loading

2013-02-14 Thread vybe3142
Thanks,
We run SOLR 4.0 in production. Yesterday, I ported our configuration to 4.1
on my local workstation. I just looked at the SOLR-4400 fix versions and as
per the info, I might wait till 4.2 before porting. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multi-Core-On-demand-loading-tp4040341p4040498.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread PeterKerk
Ok, something went wrong with posting the code, since I did not escape the
quotes and ampersands.
I tried your code, but no luck.

Here's the original query I'm trying to execute. What characters do I need
to escape? I thought only the  and  characters?



Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Implement-price-range-filter-DataImportHandler-started-Not-Initialized-No-commands-can-be-run-tp4040418p4040499.html
Sent from the Solr - User mailing list archive at Nabble.com.
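
A minimal sketch of the entity with the SQL escaped for an XML attribute
value: & must be written as &amp; and < as &lt;, while > and single quotes
can stay literal. The field mappings below are made-up placeholders:

<entity name="pricerange" query="With Categorized as
    (Select
     CASE When prijs &lt;= 1000 Then '&lt;10'
          When prijs &gt; 1000 and prijs &lt;= 2500 Then '[10-25]'
          When prijs &gt; 2500 and prijs &lt;= 5000 Then '[25-50]'
          Else '50'
     END as PriceCategory From products)
    Select PriceCategory, Count(*) as Cnt From Categorized Group By PriceCategory">
  <field column="PriceCategory" name="pricecategory" />
  <field column="Cnt" name="cnt" />
</entity>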


How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Hi,

I don't want the field to be tokenized because Solr doesn't support sorting
on a tokenized field. In order to do case insensitive sorting I need to copy
a field to a lowercase but not tokenized field. How to define this?

I did below but it says I need to specify a tokenizer or a class for
analyzer. 

<fieldType name="text_lowercase" class="solr.TextField"
positionIncrementGap="100">
  <analyzer type="index">
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-define-a-lowercase-fieldtype-without-tokenizer-tp4040500.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Implement price range filter: DataImportHandler started. Not Initialized. No commands can be run

2013-02-14 Thread Steve Rowe
Hi Peter,

Your original query didn't make it to the mailing list.  You're experiencing 
a long-standing nabble bug: nabble eats code.  (I've told them about it a 
couple of times, but the problem persists, so I guess they're not interested in 
fixing it.)

My suggestion: don't use nabble for posting to mailing lists.  Or put code 
snippets up on a third-party text sharing facility, e.g. pastebin, github gist, 
etc.

Steve

On Feb 14, 2013, at 12:10 PM, PeterKerk vettepa...@hotmail.com wrote:

 Ok, something went wrong with posting the code,since I did not escape the
 quotes and ampersands.
 I tried your code, but nu luck.
 
 Here's the original query I'm trying to execute. What characters do I need
 to escape? I thought only the  and  characters?
 
 
 
 Thanks!
 
 
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Implement-price-range-filter-DataImportHandler-started-Not-Initialized-No-commands-can-be-run-tp4040418p4040499.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Upayavira
You can use a KeywordTokenizerFactory, which will tokenise into a single
term, and then do your lowercasing. Does that get you what you want?

Upayavira

On Thu, Feb 14, 2013, at 05:11 PM, Bing Hua wrote:
 Hi,
 
 I don't want the field to be tokenized because Solr doesn't support
 sorting
 on a tokenized field. In order to do case insensitive sorting I need to
 copy
 a field to a lowercase but not tokenized field. How to define this?
 
 I did below but it says I need to specify a tokenizer or a class for
 analyzer. 
 
 <fieldType name="text_lowercase" class="solr.TextField"
 positionIncrementGap="100">
   <analyzer type="index">
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
   <analyzer type="query">
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
   </fieldType>
 
 Thanks!
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/How-to-define-a-lowercase-fieldtype-without-tokenizer-tp4040500.html
 Sent from the Solr - User mailing list archive at Nabble.com.
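
A minimal sketch of such a field type, following that suggestion (the type
and field names are placeholders):

<fieldType name="lowercase_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_sort" type="lowercase_sort" indexed="true" stored="false"/>
<copyField source="name" dest="name_sort"/>

Sorting then uses the copy, e.g. sort=name_sort asc.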


Re: How to define a lowercase fieldtype without tokenizer

2013-02-14 Thread Bing Hua
Works perfectly. Thank you. I didn't know before that this tokenizer does nothing
:)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-define-a-lowercase-fieldtype-without-tokenizer-tp4040500p4040507.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Combining Solr score with customized user ratings for a document

2013-02-14 Thread Timothy Potter
Start by looking at Solr's external file field and
http://www.linkedin.com/profile/view?id=18807864trk=tab_pro

On Thu, Feb 14, 2013 at 6:24 AM, Á_o chachime...@yahoo.es wrote:
 Well, thinking a bit more, the second solution is not practical.

 If Solr retrieves, say, 1.000 documents, I would have to navigate through
 ALL (maybe less with some reasonable upper limit) of them to recalculate the
 scores and reorder them according to the new score although the Web App is
 going to show just the first 20.

 In other words, I would lose the benefits of Solr's (well, and most DB's)
 row/offset feature to retrieve information in batchs rather than the whole
 amount of results which may not be seen by the user at all.

 I'm now wondering if a custom implementation of a ValueSource + a
 FunctionQuery is a solution to my problem...

 Any hint?
 Thanks!

 Álvaro



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-tp4040200p4040444.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Combining Solr score with customized user ratings for a document

2013-02-14 Thread Timothy Potter
Oops - that's definitely not the link I meant to give ;-) Here's the
link from slideshare:

http://www.slideshare.net/thelabdude/boosting-documents-in-solr-lucene-revolution-2011

In there we used Mahout to calculate recommendation scores and then
loaded them using external file field.

Cheers,
Tim

On Thu, Feb 14, 2013 at 11:25 AM, Timothy Potter thelabd...@gmail.com wrote:
 Start by looking at Solr's external file field and
 http://www.linkedin.com/profile/view?id=18807864trk=tab_pro

 On Thu, Feb 14, 2013 at 6:24 AM, Á_o chachime...@yahoo.es wrote:
 Well, thinking a bit more, the second solution is not practical.

 If Solr retrieves, say, 1.000 documents, I would have to navigate through
 ALL (maybe less with some reasonable upper limit) of them to recalculate the
 scores and reorder them according to the new score although the Web App is
 going to show just the first 20.

 In other words, I would lose the benefits of Solr's (well, and most DB's)
 row/offset feature to retrieve information in batchs rather than the whole
 amount of results which may not be seen by the user at all.

 I'm now wondering if a custom implementation of a ValueSource + a
 FunctionQuery is a solution to my problem...

 Any hint?
 Thanks!

 Álvaro



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Combining-Solr-score-with-customized-user-ratings-for-a-document-tp4040200p4040444.html
 Sent from the Solr - User mailing list archive at Nabble.com.
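
A minimal sketch of the external file field setup (field, type, and file
names are placeholders). In schema.xml:

  <fieldType name="extRating" class="solr.ExternalFileField" keyField="id" defVal="0"/>
  <field name="user_rating" type="extRating" indexed="false" stored="false"/>

The per-document ratings live in a file in the index data directory, one
uniqueKey=value pair per line (e.g. data/external_user_rating.txt):

  doc1=4.5
  doc2=1.0

and the rating can then be folded into the relevancy score with a function
query, for example q={!boost b=user_rating}title:solr.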


RE: Can't determine Sort Order: 'prijs ASC', pos=5

2013-02-14 Thread Chris Hostetter

: I think the order needs to be in lowercase. Try asc instead of ASC.

Should be trivial to support uppercase ASC and DESC as well, not sure why 
no one thought of adding that before...

https://issues.apache.org/jira/browse/SOLR-4458

...patches welcome

-Hoss
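
Until then, a minimal example of the accepted form is simply the lowercase
keyword, e.g. sort=prijs asc (multiple clauses comma-separated:
sort=prijs asc,score desc).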


RE: What should focus be on hardware for solr servers?

2013-02-14 Thread Toke Eskildsen
Steve Rowe [sar...@gmail.com] wrote:
 On Feb 14, 2013, at 11:24 AM, Walter Underwood wun...@wunderwood.org wrote:
  Laptop disks are slower than the EC2 disks.

 My laptop disk is an SSD.

So it's not a disk? ...Sorry, couldn't resist.

Unfortunately Amazon only has two SSD-backed solutions and they are #3 and #2 
in terms of cost/hour (http://www.ec2instances.info/). To make matters worse, 
one of them has only 240GB of storage, which leaves the $3.10/hour for 2TB of 
SSD as the only choice right now.

At Berlin Buzzwords 2012 there was a very interesting talk about indexing 24 
billion tweets, with the clear conclusion that it was a lot cheaper to buy your 
own hardware (with SSDs) instead of going Amazon. At that point in time, for 
that kind of corpus yadda yadda. There's a recording at 
http://2012.berlinbuzzwords.de/sessions/you-know-search-querying-24-billion-records-900ms

Regards,
Toke Eskildsen

fastest way to rebuild Solr index

2013-02-14 Thread Mingfeng Yang
I have a few Solr indexes, each with 20-200 million documents, which were
indexed by querying multiple PostgreSQL databases.  If I rebuild the
indexes the same way, it would take a few months, because the PostgreSQL
queries are slow.

Now, I need to do the following changes to all indexes.
1. delete a couple fields from the Solr index
2. add a couple new fields
3. change the type of one field from string to int

Luckily, all fields were indexed and stored.  My plan is to query an old
index, get values for all fields, and then add them into the new index.

Any faster ways to build new indexes in my case?

Thanks,
Ming


Re: fastest way to rebuild Solr index

2013-02-14 Thread Shawn Heisey

On 2/14/2013 12:46 PM, Mingfeng Yang wrote:

I have a few Solr indexes, each with 20-200 millions documents, which were
indexed by querying multiple PostgreSQL databases.  If I do rebuild the
index by the same way, it would take a few months, because the PostgresSQL
query is slow.

Now, I need to do the following changes to all indexes.
1. delete a couple fields from the Solr index
2. add a couple new fields
3. change the type of one field from string to int

Luckily, all fields were indexed and stored.   My plan is to query an old
index, and get values for all fields, and then add them into new index.


Using the DataImportHandler with SolrEntityProcessor is probably your 
best bet.  I believe you would want to avoid updating the source index 
while using this.


http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

Thanks,
Shawn
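
A minimal sketch of such a data-config using SolrEntityProcessor (the URL,
entity name, and batch size are placeholders):

<dataConfig>
  <document>
    <entity name="reindex"
            processor="SolrEntityProcessor"
            url="http://oldhost:8983/solr/oldcore"
            query="*:*"
            rows="1000"
            fl="*"/>
  </document>
</dataConfig>

Fields that should be dropped can be left out of the fl list (or mapped
away), and the string-to-int change happens at re-index time as long as the
stored values parse as integers.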



Re: fastest way to rebuild Solr index

2013-02-14 Thread Mingfeng Yang
Shawn,

Awesome.  Exactly something I am looking for.

Thanks!
Ming


On Thu, Feb 14, 2013 at 12:00 PM, Shawn Heisey s...@elyograg.org wrote:

 On 2/14/2013 12:46 PM, Mingfeng Yang wrote:

 I have a few Solr indexes, each with 20-200 millions documents, which were
 indexed by querying multiple PostgreSQL databases.  If I do rebuild the
 index by the same way, it would take a few months, because the PostgresSQL
 query is slow.

 Now, I need to do the following changes to all indexes.
 1. delete a couple fields from the Solr index
 2. add a couple new fields
 3. change the type of one field from string to int

 Luckily, all fields were indexed and stored.   My plan is to query an
 old
 index, and get values for all fields, and then add them into new index.


 Using the DataImportHandler with SolrEntityProcessor is probably your best
 bet.  I believe you would want to avoid updating the source index while
 using this.

 http://wiki.apache.org/solr/DataImportHandler#SolrEntityProcessor

 Thanks,
 Shawn




Re: long QTime for big index

2013-02-14 Thread Mou
Just to close this discussion: we solved the problem by splitting the index.
It turned out that a distributed search across 12 cores is faster than
searching two cores.

All queries, the Tomcat configuration, and the JVM configuration remain the
same. Now queries are served in milliseconds.


On Thu, Jan 31, 2013 at 9:34 PM, Mou [via Lucene]
ml-node+s472066n4037870...@n3.nabble.com wrote:
 Thank you again.

 Unfortunately the index files will not fit in the RAM.I have to try using
 document cache. I am also moving my index to SSD again, we took our index
 off when fusion IO cards failed twice during indexing and index was
 corrupted.Now with the bios upgrade and new driver, it is supposed to be
 more reliable.

 Also I am going to look into the client app to verify that it is making
 proper query requests.

 Surprisingly when I used a much lower value than default for
 defaultconnectionperhost and maxconnectionperhost in solrmeter , it performs
 very well, the same queries return in less than one sec . I am not sure yet,
 need to run solrmeter with different heap size , with cache and without
 cache etc.

 
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4037870.html




--
View this message in context: 
http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4040535.html
Sent from the Solr - User mailing list archive at Nabble.com.
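
For reference, a distributed request over split cores uses the standard
shards parameter; a minimal sketch with placeholder host and core names:

  http://host:8983/solr/core1/select?q=foo&shards=host:8983/solr/core1,host:8983/solr/core2,host:8983/solr/core3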

Re: long QTime for big index

2013-02-14 Thread alxsss
Hi,

I am curious how many Linux boxes you have and how many cores are in 
each of them. It was my understanding that Solr puts in memory all 
documents found for a keyword, not the whole index. So why must it be faster 
with more cores, when the number of selected documents from many separate 
cores is the same as from one core? 

Thanks.
Alex.

 

 

 

-Original Message-
From: Mou mouna...@gmail.com
To: solr-user solr-user@lucene.apache.org
Sent: Thu, Feb 14, 2013 2:35 pm
Subject: Re: long QTime for big index


Just to close this discussion , we solved the problem by splitting the index.
It turned out that distributed search with 12 cores are faster than
searching two cores.

All queries ,tomcat configuration, jvm configuration remain same. Now
queries are served in milliseconds.


On Thu, Jan 31, 2013 at 9:34 PM, Mou [via Lucene]
ml-node+s472066n4037870...@n3.nabble.com wrote:
 Thank you again.

 Unfortunately the index files will not fit in the RAM.I have to try using
 document cache. I am also moving my index to SSD again, we took our index
 off when fusion IO cards failed twice during indexing and index was
 corrupted.Now with the bios upgrade and new driver, it is supposed to be
 more reliable.

 Also I am going to look into the client app to verify that it is making
 proper query requests.

 Surprisingly when I used a much lower value than default for
 defaultconnectionperhost and maxconnectionperhost in solrmeter , it performs
 very well, the same queries return in less than one sec . I am not sure yet,
 need to run solrmeter with different heap size , with cache and without
 cache etc.

 
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4037870.html




--
View this message in context: 
http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4040535.html
Sent from the Solr - User mailing list archive at Nabble.com.

 


Re: Solr 3.3.0 - Random CPU problem

2013-02-14 Thread federico.wachs
I took your advice, waited for the servers to go down then:

[ec2-user@zuk-solr-slave-02 ~]$ ps -wwwf -p 10131 
UIDPID  PPID  C STIME TTY  TIME CMD
tomcat   10131 1 17 23:00 ?00:03:13 /usr/sbin/sshd

This doesn't say much :(

What should I do now?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-3-3-0-Random-CPU-problem-tp4039969p4040548.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: long QTime for big index

2013-02-14 Thread Mou
We have two boxes; they are really nice servers: 32-core CPU, 192 GB of
memory, with both RAID arrays and Fusion-io cards. But each of them is
running two instances of Solr, one for indexing and the other for
searching. The search index is on the Fusion-io card.

Each instance has 11 cores plus a small core for making indexing almost realtime.

We have around 300 million documents and 250 GB on disk. They are all
metadata. Search queries are very diverse and do not repeat very
frequently; 40-60 qps. Before, we had two cores of 125 GB each on disk,
and Solr was taking a long time to get results from those two cores. CPU
use was 90%.

We never had a problem with indexing. 50% of all our docs get updated
every day, so the indexing rate is very high.




On Thu, Feb 14, 2013 at 4:20 PM, alxsss [via Lucene]
ml-node+s472066n4040545...@n3.nabble.com wrote:
 Hi,

 It is curious to know how many linux boxes do you have and how many cores in
 each of them. It was my understanding that solr puts in the memory all
 documents found for a keyword, not the whole index. So, why it must be
 faster with more cores, when number of selected documents from many separate
 cores  are the same as from one core?

 Thanks.
 Alex.







 -Original Message-
 From: Mou [hidden email]
 To: solr-user [hidden email]
 Sent: Thu, Feb 14, 2013 2:35 pm
 Subject: Re: long QTime for big index


 Just to close this discussion , we solved the problem by splitting the
 index.
 It turned out that distributed search with 12 cores are faster than
 searching two cores.

 All queries ,tomcat configuration, jvm configuration remain same. Now
 queries are served in milliseconds.


 On Thu, Jan 31, 2013 at 9:34 PM, Mou [via Lucene]
 [hidden email] wrote:

 Thank you again.

 Unfortunately the index files will not fit in the RAM.I have to try using
 document cache. I am also moving my index to SSD again, we took our index
 off when fusion IO cards failed twice during indexing and index was
 corrupted.Now with the bios upgrade and new driver, it is supposed to be
 more reliable.

 Also I am going to look into the client app to verify that it is making
 proper query requests.

 Surprisingly when I used a much lower value than default for
 defaultconnectionperhost and maxconnectionperhost in solrmeter , it
 performs
 very well, the same queries return in less than one sec . I am not sure
 yet,
 need to run solrmeter with different heap size , with cache and without
 cache etc.

 
 If you reply to this email, your message will be added to the discussion
 below:

 http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4037870.html




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4040535.html

 Sent from the Solr - User mailing list archive at Nabble.com.




 
 If you reply to this email, your message will be added to the discussion
 below:
 http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4040545.html




--
View this message in context: 
http://lucene.472066.n3.nabble.com/long-QTime-for-big-index-tp4037635p4040549.html
Sent from the Solr - User mailing list archive at Nabble.com.

Query question

2013-02-14 Thread dm_tim
Howdy,

I have a straightforward index that contains a name field. I am currently
taking a string of text, tokenizing it into individual strings and making a
query out of them all against the name field.

Note that the name field is split up by a whitespace tokenizer and a lower
case filter during indexing.

My query is working fine but I want to boost the score when multiple terms
match. So for example if I had an entry in my index that was originally
Valley Fair Mall and the string I was using to search was I'm shopping at
Valley Fair mall my query is currently being chopped into:
name:i'm~ name:shopping~ name:at~ name:valley~ name:fair~ name:mall~

Note that I use OR by default. 

So as I said, the search result I want is the one with the highest score,
but I was hoping to find a way to boost the score based on the number of
terms it finds (or matches well) so that I can differentiate between a close
match and one that is nowhere near. Any suggestions?

Regards,

T



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-question-tp4040559.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query question

2013-02-14 Thread Jack Krupansky
Use the edismax query parser and set the PF, PF2, and PF3 parameters so that 
adjacent pairs and triples of query terms will get phrase boosted.


See:
http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29
http://wiki.apache.org/solr/ExtendedDisMax#pf2_.28Phrase_bigram_fields.29

-- Jack Krupansky

-Original Message- 
From: dm_tim

Sent: Thursday, February 14, 2013 8:00 PM
To: solr-user@lucene.apache.org
Subject: Query question

Howdy,

I have a straight-forward index that contains a name field. I am currently
taking a string of text, tokenizing it into individual strings and making a
query out of them all against the name field.

Note that the name field is split up by a whitespace tokenizer and a lower
case filter during indexing.

My query is working fine but I want to boost the score when multiple terms
match. So for example if I had an entry in my index that was originally
Valley Fair Mall and the string I was using to search was I'm shopping at
Valley Fair mall my query is currently being chopped into:
name:i'm~ name:shopping~ name:at~ name:valley~ name:fair~ name:mall~

Note that I use OR by default.

So as I said, the search result I want is the one with the highest score,
but I was hoping to find a way to boost the score based on the number of
terms it finds (or matches well) so that I can differentiate between a close
match and nowhere near. Any suggestions?

Regards,

T



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Query-question-tp4040559.html
Sent from the Solr - User mailing list archive at Nabble.com. 
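
A minimal sketch of the request parameters for the name field from this
thread (the boost values are arbitrary):

  q=I'm shopping at Valley Fair mall
  defType=edismax
  qf=name
  pf=name^10
  pf2=name^5
  pf3=name^3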



Fetching the date based on lastupdate

2013-02-14 Thread ballusethuraman
Hi, 
  I have a field called 'lastUpdate' in my Solr index which contains the
last-updated date. Now I want to fetch the last 24 lastUpdate dates from that
field. How can I do this?
Querying the Solr server with the following URL fetches me the result:
http://localhost/solr/MC_10701_catalogEntry/q=lastUpdate:{* TO
NOW}&sort=lastUpdate desc

This URL fetches the last-updated dates in descending order.
Now I want only the last 24 records to be fetched. Is there any function in Solr
to do this? Please help me. Thanks in advance.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Fetching-the-date-based-on-lastupdate-tp4040564.html
Sent from the Solr - User mailing list archive at Nabble.com.
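
For reference, limiting the response to the first N documents of that sorted
query is done with the standard rows parameter (assuming the default /select
handler), e.g.:

  http://localhost/solr/MC_10701_catalogEntry/select?q=lastUpdate:{* TO NOW}&sort=lastUpdate desc&rows=24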


Re: What should focus be on hardware for solr servers?

2013-02-14 Thread Otis Gospodnetic
You could run Lucene benchmark stuff and compare. Or look at
ActionGenerator from Sematext on Github which you could also use for
performance testing and comparing.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Feb 14, 2013 10:56 AM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Or perhaps we should develop our own, Solr-based benchmark...

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Thu, Feb 14, 2013 at 10:54 AM, Michael Della Bitta
 michael.della.bi...@appinions.com wrote:
  My dual-core, HT-enabled Dell Latitude from last year has this CPU:
  model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
  bogomips: 4988.65
 
  An m3.xlarge reports:
  model name : Intel(R) Xeon(R) CPU   E5645  @ 2.40GHz
  bogomips : 4000.14
 
  I tried running geekbench and phoronix-test-suite and failed at both...
  Anybody have a favorite, free, CLI benchmarking suite?
 
  Michael Della Bitta
 
  
  Appinions
  18 East 41st Street, 2nd Floor
  New York, NY 10017-6271
 
  www.appinions.com
 
  Where Influence Isn’t a Game
 
 
  On Thu, Feb 14, 2013 at 8:10 AM, Jack Krupansky j...@basetechnology.com
 wrote:
  That raises the question of how your average professional notebook
 computer
  (PC or Mac or Linux) compares to a garden-variety cloud server such as
 an
  Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as
 document
  ingestion rate or how many documents you can load before load and/or
 query
  performance starts to fall off the cliff. Anybody have any numbers? I
 mean,
  is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough
 feel?
  (With all the usual caveats that it all depends and your mileage will
  vary.) But the intent would be for a similar workload on both (like
 loading
  the wikipedia dump.)
 
  -- Jack Krupansky
 
  -Original Message- From: Erick Erickson
  Sent: Thursday, February 14, 2013 7:31 AM
  To: solr-user@lucene.apache.org
  Subject: Re: What should focus be on hardware for solr servers?
 
 
  One data point: I can comfortably index and search the Wikipedia dump
 (11M
  articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
  queries, but
 
  Erick
 
 
  On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro m...@mshapiro.net
 wrote:
 
  Excellent, thank you very much for the reply!
 
  On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen 
 t...@statsbiblioteket.dk
  wrote:
 
   Matthew Shapiro [m...@mshapiro.net] wrote:
  
Sorry, I should clarify our current statistics.  First of all I
 meant
   183k
documents (not 183, woops). Around 100k of those are full fledged
 html
articles (not web pages but articles in our CMS with html content
  inside
of them),
  
   If an article is around 10-30 pages (or the equivalent), this is
 still a
   small corpus.
  
the rest of the data are more like key/value data records with a
 lot
of attached meta data for searching.
  
   If the amount of unique categories (model, author, playtime, lix,
   favorite_band, year...) in the meta data is in the lower hundreds,
 you
   should be fine.
  
Also, what I meant by search without a search term is that
 probably 
 80%
(hard to confirm due to the lack of stats given by the GSA) of our
   searches
are done on pure metadata clauses without any searching through the
   content
itself,
  
   That clarifies a lot, thanks. So we have roughly speaking 4000*5
   queries/day ~= 14 queries/minute. Guessing wildly that your peak time
   traffic is about 5 times that, we end up with about 1 query/second.
 That
  is
   a very light load for the Solr installation we're discussing.
  
so for example give me documents that have a content type of
video, that are marked for client X, have a category of Y or Z,
 and 
 was
published to platform A, ordered by date published.
  
   That is a near-trivial query and you should get a reply very fast on
   modest hardware.
  
The searches that use a search term are more like use the same
 query
   from the
example as before, but find me all the documents that have the
 string
   My Video
in it's title and description.
  
   Unless you experiment with fuzzy matches and phrase slop, this should
  also
   be fast. Ignoring analyzers, there is practically no difference
 between
a
   meta data field and a larger content field in Solr.
  
   Your current search (guessing here) iterates all terms in the content
   fields and take a comparatively large penalty when a large document
 is
   encountered. The inversion of index in Solr means that the search
 terms
  are
   looked up in a dictionary and refers to the documents they belong
 to. 
   The
   penalty for having thousands or millions of terms as compared to
 tens or
   hundreds in a field in an 

suggestions w.r.t Issue with Collections API in 4.1

2013-02-14 Thread Anirudha Jadhav
*1.empty Zookeeper*
*2.empty index directories for solr*
*3.empty solr.xml*
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores"
zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
hostContext="solr"></cores>
</solr>
*3.1 upload / link cfg in zookeeper for test collection*
*4*.* start 4 solr servers on different machines*
*5. Access server* : i see
There are no SolrCores running — for the current functionality we require
at least one SolrCore, sorry :) that's ok

*6. CREATE collection*
http://hostname:15000/solr/admin/collections?action=CREATE&name=test&numShards=1&replicationFactor=4

this creates one core on each server with one shard named
- test_shard1_replica1
- test_shard1_replica2
- test_shard1_replica3
- test_shard1_replica4
and persists it in solr.xml on each server.

*but why are these cores not started?* Even on server reboot, even
though solr.xml says loadOnStartup=true,
I still see the ERROR on the web admin UI:
There are no SolrCores running — for the current functionality we require
at least one SolrCore, sorry :)

I did try this once successfully, so I think I am missing something now.
I cannot see any severe errors in the log.

-- 
Anirudha P. Jadhav


How to make this work with SOLR ( LUCENE-2899 : Add OpenNLP Analysis capabilities as a module)

2013-02-14 Thread Vinay B,
I'm trying to explore Parts-Of-Speech tagging with SOLR. Firstly, am I
right in assuming that OpenNLP integration is the right direction in
which to proceed?

With respect to getting OpenNLP to work with SOLR (
http://wiki.apache.org/solr/OpenNLP ), I tried following the
instructions, only to be faced with an error complaining that
OpenNLPTokenizerFactory cannot be found. Upon researching the error,
I came across the issue
https://issues.apache.org/jira/browse/LUCENE-2899 , which indicates
that integration is not yet complete and the OpenNLP functionality is
only available via a patch (I'm running SOLR 4.1 locally).

I tried patching my SOLR 4.1 source, as well as a freshly downloaded
SOLR trunk, to no avail. I guess I just need some tips on how and what
to patch. I tried to patch the base directory as well as the lucene
directory. If there's something I need to hack in the patch, do let
me know.

Thanks

vinayb@blackbox ~/Downloads/solr-4.1.0/lucene $ pwd
/home/vinayb/Downloads/solr-4.1.0/lucene
vinayb@blackbox ~/Downloads/solr-4.1.0/lucene $ ls
analysis   BUILD.txtcodecsdemo  highlighter
JRE_VERSION_MIGRATION.txt  LUCENE-2899.patch  misc
queries  sandbox  suggest  tools
backwards  build.xmlcommon-build.xml  facet ivy-settings.xml
licenses   memory module-build.xml
queryparser  site SYSTEM_REQUIREMENTS.txt
benchmark  CHANGES.txt  core  grouping  join
LICENSE.txtMIGRATE.txtNOTICE.txt
README.txt   spatial  test-framework
vinayb@blackbox ~/Downloads/solr-4.1.0/lucene $ patch -p0 -i
LUCENE-2899.patch --dry-run
can't find file to patch at input line 5
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--
|diff --git dev-tools/eclipse/dot.classpath dev-tools/eclipse/dot.classpath
|index 1d2abc1..575b4f0 100644
|--- dev-tools/eclipse/dot.classpath
|+++ dev-tools/eclipse/dot.classpath
--
File to patch:
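
One thing that may be worth checking, judging by the paths in the patch
header (dev-tools/eclipse/..., with no a/ or b/ prefixes): the patch appears
to be rooted at the top of the checkout rather than at lucene/, so applying
it from the top-level directory with -p0 may get further, for example:

  cd ~/Downloads/solr-4.1.0
  patch -p0 --dry-run -i lucene/LUCENE-2899.patch

(The patch targets trunk, so some hunks may still fail against the 4.1
sources even when the paths resolve.)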


Re: suggestions w.r.t Issue with Collections API in 4.1

2013-02-14 Thread Mark Miller
I don't know - by chance, I'm actually doing about the same sequence of events 
right now with Solr 4.1, and the cores are running fine…

What do the logs say?

- Mark

On Feb 14, 2013, at 10:18 PM, Anirudha Jadhav aniru...@nyu.edu wrote:

 *1.empty Zookeeper*
 *2.empty index directories for solr*
 *3.empty solr.xml*
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="true">
   <cores adminPath="/admin/cores"
 zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}"
 hostContext="solr"></cores>
 </solr>
 *3.1 upload / link cfg in zookeeper for test collection*
 *4*.* start 4 solr servers on different machines*
 *5. Access server* : i see
 There are no SolrCores running — for the current functionality we require
 at least one SolrCore, sorry :) that's ok
 
 *6. CREATE collection*
 http://hostname:15000/solr/admin/collections?action=CREATE&name=test&numShards=1&replicationFactor=4
 
 this creates one core on each server with one shard named
 - test_shard1_replica1
 - test_shard1_replica2
 - test_shard1_replica3
 - test_shard1_replica4
 and persists it in solr.xml on each server.
 
 *but why are these core are not started?* and even on server reboot even
 though solr.xml says  loadOnStartup=true
 is still see ERROR on web admin UI
 There are no SolrCores running — for the current functionality we require
 at least one SolrCore, sorry :)
 
 I did try this once successfully and I think i am missing something now.
 Cannot see any errors in log that are severe
 
 -- 
 Anirudha P. Jadhav