Re: Problem using replication in 8/25/09 nightly build of 1.4
On Wed, Aug 26, 2009 at 11:53 PM, Ron Ellis r...@benetech.org wrote:

> Hi Everyone, When trying to utilize the new HTTP based replication built into Solr 1.4 I encounter a problem. When I view the replication admin page on the slave, all of the master values are null, i.e. Replicatable Index Version: null, Generation: null | Latest Index Version: null, Generation: null.

If the master has just been started, it has no index which can be replicated to the slave. If you do a commit on the master, then a replicatable index version will be shown on the slave and replication will proceed. Alternately, you can add the following to the master configuration:

<str name="replicateAfter">startup</str>

> Despite these missing values the two seem to be talking over HTTP successfully (if I shut down the master, the slave replication page starts exploding with an NPE).

The slave replication page should not show an NPE if the master is down. I'll look into it.

> When I hit http://solr/replication?command=indexversion&wt=xml I get the following...
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">13</int>
>   </lst>
>   <long name="indexversion">0</long>
>   <long name="generation">0</long>
> </response>
>
> However in the admin/replication UI on the master I see:
>
> Index Version: 1250525534711, Generation: 1778
>
> Any idea what I'm doing wrong or how I could begin to diagnose? I am using the 8/25 nightly build of Solr with the example solrconfig.xml provided. The only modifications to the config have been to uncomment the master/slave replication sections and remove the data directory location line so it falls back to solr.home/data. Also, if it's relevant, this index was originally created in Solr 1.3.

I think that should be fine. I assume both master and slave are the same Solr version, 1.4?

--
Regards,
Shalin Shekhar Mangar.
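For reference, a sketch of the two solrconfig.xml sections being discussed. The host name, port and poll interval are placeholders, not values from the thread; only the replicateAfter entries are the point here.

```xml
<!-- on the master: "startup" makes a freshly started master replicable
     before any commit has happened, which avoids the null-values symptom -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>

<!-- on the slave: masterUrl points at the master's replication handler -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```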
Re: Pattern matching in Solr
Hi, In schema.xml I am not able to find splitOnCaseChange="1". I am not looking for case sensitive search. Let me know what file you are referring to. I am looking for exact match search only. Moreover, for scenario 2, which link in the Solr wiki covers the KeywordTokenizerFactory and EdgeNGramFilterFactory?

Regards
Bhaskar

--- On Wed, 8/26/09, Avlesh Singh avl...@gmail.com wrote:

From: Avlesh Singh avl...@gmail.com
Subject: Re: Pattern matching in Solr
To: solr-user@lucene.apache.org
Date: Wednesday, August 26, 2009, 11:31 AM

You could have used your previous thread itself (http://www.lucidimagination.com/search/document/31c1ebcedd4442b/exact_pattern_search_in_solr), Bhaskar.

In your scenario one, you need an exact token match, right? You are getting the expected results if your field type is "text". Look for the WordDelimiterFilterFactory in the field type definition for the "text" field inside schema.xml. You'll find an attribute splitOnCaseChange="1". Because of this, "ChandarBhaskar" is converted into two tokens, "Chandar" and "Bhaskar", and hence the matches. You may choose to remove this attribute if the behaviour is not desired.

For your scenario two, you may want to look at the KeywordTokenizerFactory and EdgeNGramFilterFactory on the Solr wiki. Generally, for all such use cases, people create multiple fields in their schema storing the same data analyzed in different ways.

Cheers
Avlesh

On Wed, Aug 26, 2009 at 10:58 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote:

Hi, Can any one help me with the below scenario?

Scenario 1: Assume that I give "Google" as the input string. I am using Carrot with Solr; Carrot is for front end display purposes. The issue is: assuming I give BHASKAR as the input string, it should give me search results pertaining to BHASKAR only, i.e.

Select * from MASTER where name = 'Bhaskar';

Example: it should not display search results such as ChandarBhaskar or BhaskarC. It should display Bhaskar only.

Scenario 2:

Select * from MASTER where name like '%BHASKAR%';

It should display records containing the word BHASKAR. Ex: Bhaskar, ChandarBhaskar, BhaskarC, Bhaskarabc.

How to achieve Scenario 1 in Solr?

Regards
Bhaskar
Re: ${solr.abortOnConfigurationError:false} - does it default to false
On Thu, Aug 27, 2009 at 1:05 AM, Ryan McKinley ryan...@gmail.com wrote:

On Aug 26, 2009, at 3:33 PM, djain101 wrote:

I have one quick question... If solrconfig.xml says

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean abortOnConfigurationError defaults to false if it is not set as a system property?

correct

Should that be changed to be true by default in the example solrconfig?

--
Regards,
Shalin Shekhar Mangar.
Re: ${solr.abortOnConfigurationError:false} - does it default to false
On Thu, Aug 27, 2009 at 12:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

On Thu, Aug 27, 2009 at 1:05 AM, Ryan McKinley ryan...@gmail.com wrote:

On Aug 26, 2009, at 3:33 PM, djain101 wrote:

I have one quick question... If solrconfig.xml says

<abortOnConfigurationError>${solr.abortOnConfigurationError:false}</abortOnConfigurationError>

does it mean abortOnConfigurationError defaults to false if it is not set as a system property?

correct

Should that be changed to be true by default in the example solrconfig?

I just checked the 1.3 release. It was true by default. Somewhere in between, the default was changed. I think we should revert this change.

--
Regards,
Shalin Shekhar Mangar.
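A minimal illustration of the ${property:default} substitution syntax under discussion, with true restored as the default (as in the 1.3 example config):

```xml
<!-- ${name:default} expands to the JVM system property "name",
     falling back to the literal default when the property is unset -->
<abortOnConfigurationError>${solr.abortOnConfigurationError:true}</abortOnConfigurationError>
```

Starting the JVM with -Dsolr.abortOnConfigurationError=false would then override the default for that run.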
Re: Solr Replication
When you say a slice, do you mean one instance of Solr? So your JMX console is connecting to only one Solr?

On Thu, Aug 27, 2009 at 3:19 AM, J G skinny_joe...@hotmail.com wrote:

Thanks for the response. It's interesting because when I run jconsole all I can see is one ReplicationHandler JMX mbean. It looks like it is defaulting to the first slice it finds on its path. Is there any way to have multiple replication handlers, or at least obtain replication information on a per slice/instance basis via JMX, like how you can see attributes for each slice/instance via each replication admin JSP page? Thanks again.

From: noble.p...@corp.aol.com
Date: Wed, 26 Aug 2009 11:05:34 +0530
Subject: Re: Solr Replication
To: solr-user@lucene.apache.org

The ReplicationHandler is not enforced as a singleton, but for all practical purposes it is a singleton for one core. If an instance (a slice, as you say) is set up as a repeater, it can act as both a master and a slave. In the repeater setup, the configuration should be as follows:

MASTER
|_ SLAVE (I am a slave of MASTER)
|_ REPEATER (I am a slave of MASTER and master to my slaves)
   |_ REPEATER_SLAVE (slave of REPEATER)

The point is that REPEATER will have a slave section whose masterUrl points to MASTER, and REPEATER_SLAVE will have a slave section whose masterUrl points to REPEATER.

On Wed, Aug 26, 2009 at 12:40 AM, J G skinny_joe...@hotmail.com wrote:

Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler mbean to obtain some information about the master/slave configuration for replication. Is the replication handler mbean a singleton? I only see one mbean for the entire server, and it's picking an arbitrary slice to report on. So I'm curious whether every slice gets its own replication handler mbean? This is important because I have no way of knowing, in this specific server, any information about the other slices, in particular the master/slave value for the other slices.
Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be both a master and a slave, i.e. a repeater. I'm wondering how repeaters work, because let's say I have a slice named 'A', and the master is on server 1 and the slave is on server 2; then how are these two servers communicating to replicate? Looking at the JMX information I have in the MBean, both isSlave and isMaster are set to true for my repeater, so how does this Solr slice know whether it's the master or the slave? I'm a bit confused. Thanks.

--
Noble Paul | Principal Engineer | AOL | http://aol.com
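A hedged sketch of what a repeater's solrconfig.xml looks like, following the MASTER -> REPEATER -> REPEATER_SLAVE layout Noble describes above (host names are placeholders). The repeater carries both sections at once, which is why isMaster and isSlave both show true in JMX:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- master role: this instance serves its own slaves -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
  </lst>
  <!-- slave role: this instance pulls from the true master -->
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

A REPEATER_SLAVE would then have only a slave section, with masterUrl pointing at the repeater instead of the master.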
Trie Date question
Hello everyone,

After reading Grant's article about TrieRange capabilities on the Lucid blog I did some experimenting, but I have some trouble with the tdate type, and I was hoping that you guys could point me in the right direction. Basically, I index a regular Solr date field and use that for sorting and range queries today. For experimenting I added a tdate field, indexing it with the same data as in my other date field, but I'm obviously doing something wrong here, because the results coming back are completely different...

The definitions in my schema:

<field name="datetime" type="date" indexed="true" stored="false" omitNorms="true"/>
<field name="tdatetime" type="tdate" indexed="true" stored="false"/>

So if I do a query on my test index:

q=datetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=1031524 (don't worry about the ordering yet). Then, if I do the following on my trie date field:

q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY]

I get numFound=0. Where did I go wrong? (And yes, both fields are indexed with exactly the same data...) Thanks for any guidance here!

Cheers, Aleks

--
Aleksander M. Stensby
Lead Software Developer and System Architect
Integrasco A/S
www.integrasco.com
http://twitter.com/Integrasco
http://facebook.com/Integrasco

Please consider the environment before printing all or any of this e-mail
Solr project statistics
Hi, Where can I find general statistics about the Solr project? The only thing I found is statistics about the Lucene project at: http://people.apache.org/~vgritsenko/stats/projects/lucene.html#Downloads-N1008F Now the question is whether these numbers include all of Lucene's sub-projects (including Solr). If that's the case, is there a way to find out Solr's part in these numbers? Otherwise, are there any other publicly available statistics about Solr? Cheers, Uri
Re: Cannot get solr 1.3.0 to run properly with plesk 9.2.1 on CentOS
Guys, thanks to everyone who helped or tried to help me out with this issue.

After talking with a buddy of mine who uses Solr, he said that the XPath exception seemed familiar. It turns out that right at the bottom of the Solr wiki install page is a troubleshooting section with one entry... and it was regarding XPath. Tomcat did not have Xalan in its classpath, and the easiest way to fix that was to create a symlink to the file in the /usr/share/tomcat/shared/lib directory. My version of Xalan was located under /usr/share/java.

For future reference, point anyone complaining about this same issue (XPath etc.) to this page:

http://wiki.apache.org/solr/SolrTomcat#head-7fe06bf7aac41f6307f0290a2150b365227e1074

and at the bottom they will get the same instructions. Guys, again... thanks so much!

--Aaron

On Wed, Aug 26, 2009 at 8:47 PM, Fuad Efendi f...@efendi.ca wrote:

Looks like you totally ignored my previous post... Who is the vendor of this openjdk-1.6.0.0? Who is the vendor of the JVM which this JDK runs on? ... Such installs of Java are a total mess; you may have an incompatible Servlet API loaded by the bootstrap classloader before the Tomcat classes. First of all, please try to install the standard Java from Sun on your development box and run some samples...

> This is due to your tomcat instance not having the xalan jar file in the classpath

P.S. Don't rely on CentOS 'approved' Java libraries.
Re: HTML decoder is splitting tokens
Hello. Thanks for the hints. Still some trouble, though.

I added just the HTMLStripCharFilterFactory because, according to the documentation, it should also replace HTML entities. It did, but it still left a space after the entity, so I got two tokens from "G&uuml;nther". That seems like a bug?

Adding MappingCharFilterFactory in front of the HTML stripper (so that the latter will not see the entity) does work as expected. That is, until I try strings like 'use &lt;p&gt; to mark a paragraph', where the HTML stripper will then remove parts of the actual text. So this approach will not work.

Finally, I was happy that I could now use an arbitrary tokenizer with HTML input. The PatternTokenizer, however, seems to be using character offsets corresponding to the output of the char filters, and so the highlighting markers end up in the wrong place. Is that a bug, or a configuration issue?

Cheers, Anders.

Koji Sekiguchi wrote:

Hi Anders, Sorry, I don't know if this is a bug or a feature, but I'd like to show an alternate way if you'd like. In Solr trunk, HTMLStripWhitespaceTokenizerFactory is marked as deprecated. Instead, HTMLStripCharFilterFactory plus an arbitrary TokenizerFactory are encouraged. And I'd recommend you use MappingCharFilterFactory to convert character references to real characters. That is, you have:

<fieldType name="textHtml" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

where the contents of mapping.txt are:

"&uuml;" => "ü"
"&auml;" => "ä"
"&iuml;" => "ï"
"&euml;" => "ë"
"&ouml;" => "ö"

Then run analysis.jsp and see the result.

Thank you, Koji

Anders Melchiorsen wrote:

Hi. When indexing the string "G&uuml;nther" with HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two tokens, "Gü" and "nther". Is this a bug, or am I doing something wrong? (Using a Solr nightly from 2009-05-29.)

Anders.
Re: Pattern matching in Solr
> In schema.xml I am not able to find splitOnCaseChange="1".

Unless you have modified the stock field type definition of the "text" field in your core's schema.xml, you should be able to find this property set on the WordDelimiterFilterFactory. Read more here:

http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089

> Moreover, for scenario 2, which link in the Solr wiki covers the KeywordTokenizerFactory and EdgeNGramFilterFactory?

Google for these two.

Cheers
Avlesh

On Thu, Aug 27, 2009 at 12:21 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote:

Hi, In schema.xml I am not able to find splitOnCaseChange="1". I am not looking for case sensitive search. I am looking for exact match search only. ...
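To make the multi-field approach from this thread concrete, here is a hedged schema.xml sketch; all field and type names are invented for the example. The "exact" type answers scenario 1 (whole-string match only); the edge-n-gram type answers prefix queries, while a true SQL-LIKE "contains" match would need NGramFilterFactory instead:

```xml
<!-- scenario 1: KeywordTokenizer keeps the whole value as one token,
     so only the full string "bhaskar" matches -->
<fieldType name="exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- scenario 2 (prefix flavour): index edge n-grams so "bhas" matches
     "bhaskarabc"; query side stays un-grammed -->
<fieldType name="name_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="name_exact" type="exact" indexed="true" stored="false"/>
<field name="name_ngram" type="name_prefix" indexed="true" stored="false"/>
<copyField source="name" dest="name_exact"/>
<copyField source="name" dest="name_ngram"/>
```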
Re: Lucene Search Performance Analysis Workshop
Fuad -

http://www.lucidimagination.com/blog/2009/05/27/filtered-query-performance-increases-for-solr-14/

Use fq=filter instead, generally speaking.

Erik

On Aug 26, 2009, at 10:24 PM, Fuad Efendi wrote:

I am wondering... are the new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}? Why can't we improve Lucene then?

Fuad

P.S.
https://issues.apache.org/jira/browse/SOLR-1169
https://issues.apache.org/jira/browse/SOLR-1179

-----Original Message-----
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: August-26-09 8:50 PM
To: solr-user@lucene.apache.org
Subject: Fwd: Lucene Search Performance Analysis Workshop

While Andrzej's talk will focus on things at the Lucene layer, I'm sure there'll be some great tips and tricks useful to Solrians too. Andrzej is one of the sharpest folks I've met, and he's also a very impressive presenter. Tune in if you can.

Erik

Begin forwarded message:

From: Andrzej Bialecki a...@getopt.org
Date: August 26, 2009 5:44:40 PM EDT
To: java-u...@lucene.apache.org
Subject: Lucene Search Performance Analysis Workshop

Hi all, I am giving a free talk/workshop next week on how to analyze and improve Lucene search performance for native Lucene apps. If you've ever been challenged to get your Java Lucene search apps running faster, I think you might find the talk of interest.

Free online workshop: Thursday, September 3rd 2009, 11:00-11:30 AM PDT / 14:00-14:30 EDT. Follow this link to sign up:

http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About: Lucene Performance Workshop: Understanding Lucene Search Performance with Andrzej Bialecki. Experienced Java developers know how to use the Apache Lucene library to build powerful search applications natively in Java. LucidGaze for Lucene from Lucid Imagination, just released this week, provides a powerful utility for making transparent the underlying indexing and search operations, and analyzing their impact on search performance.

Agenda:
* Understanding sources of variability in Lucene search performance
* LucidGaze for Lucene APIs for performance statistics
* Applying LucidGaze for Lucene performance statistics to real-world performance problems

Join us for a free online workshop. Sign up via the link below:

http://www2.eventsvc.com/lucidimagination/event/ff97623d-3fd5-43ba-a69d-650dcb1d6bbc?trk=WR-SEP2009-AP

About the Presenter: Andrzej Bialecki, Apache Lucene PMC Member, is on the Lucid Imagination Technical Advisory Board; he also serves as the project lead for Nutch, and as a committer in the Lucene-java, Nutch and Hadoop projects. He has broad expertise across domains as diverse as information retrieval, systems architecture, embedded systems kernels, networking and business process/e-commerce modeling. He's also the author of the popular Luke index inspection utility. Andrzej holds a master's degree in Electronics from Warsaw Technical University, speaks four languages and programs in many, many more.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
Contact: info at sigram dot com
Re: Lucene Search Performance Analysis Workshop
On Aug 26, 2009, at 10:24 PM, Fuad Efendi wrote:

> I am wondering... are the new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}?

The new filtering features in Solr are just doing what Lucene started doing in 2.4, that is, using skipping when possible. It used to be the case in both Lucene and Solr that the filter was only ever applied after scoring but before insertion into the priority queue. That is now fixed.

> Why can't we improve Lucene then?
>
> Fuad
>
> P.S.
> https://issues.apache.org/jira/browse/SOLR-1169
> https://issues.apache.org/jira/browse/SOLR-1179

--
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Solr project statistics
On Aug 27, 2009, at 4:00 AM, Uri Boness wrote:

> Hi, Where can I find general statistics about the Solr project? The only thing I found is statistics about the Lucene project at: http://people.apache.org/~vgritsenko/stats/projects/lucene.html#Downloads-N1008F Now the question is whether these numbers include all of Lucene's sub-projects (including Solr). If that's the case, is there a way to find out Solr's part in these numbers? Otherwise, are there any other publicly available statistics about Solr?

Those are pretty much it. It is further complicated by the fact that the ASF has a really large mirroring system, which complicates the downloads picture quite a bit. Nor does it account for the many people using distributions from Ubuntu, etc.
Re: Lucene Search Performance Analysis Workshop
On Thu, Aug 27, 2009 at 6:30 AM, Grant Ingersoll gsing...@apache.org wrote:

> > I am wondering... are the new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}?
>
> The new filtering features in Solr are just doing what Lucene started doing in 2.4, that is, using skipping when possible. It used to be the case in both Lucene and Solr that the filter was only ever applied after scoring but before insertion into the priority queue. That is now fixed.

I think the performance of filtering can still be further improved, within Lucene... it's still very much a work in progress. E.g. if a filter is random access (e.g. RAM resident as a bit set), which I think is frequently the case for Solr (?), it ought to be applied just like we now apply deleted documents (LUCENE-1536 is open for this). This can result in sizable performance gains, especially for more complex queries and not-so-dense filters.

Mike
Announcing Dutch Lucene User Group
Hi, We started a new Lucene user group in The Netherlands. In the last couple of years we've noticed an increasing demand and interest in Lucene and Solr. We thought it was about time to have a centralized place where people can have open discussions, trainings, and periodic meet-ups to share knowledge and experience relating to these technologies. The website is up and running and you're welcome to register (even if you're not living in The Netherlands - the content is in English :-)). Check out: http://www.lucene-nl.org

Cheers, Uri
Thanks
Hello,

Earlier this year our company decided to (finally :)) upgrade our website to something a little faster/prettier/maintainable-er. After some research we decided on using Solr, and after indexing our data for the first time and trying some manual queries we were all amazed at the speed. This summer we started developing the new site, and today we've gone live. You can see the site running at http://www.mysecondhome.eu (I don't mean to advertise, so feel free not to buy a house).

I'd like to thank the people here for their help with lifting me from Solr-ignorance to Solr-seems-to-know-a-little-bit. We're running a nightly build of Solr 1.4 with SOLR-1240 applied for the dynamic facet count updates when using the sliders in the search screen.

Again, thank you, and if you have any suggestions or questions regarding our implementation, feel free to ask.

Regards, gwk
RE: Thanks
Hi Gwk,

It's a nice clean site, easy to use and seems very fast, well done! How well does it do in regards to SEO though? I noticed there's a lot of ajax going on in the background to help speed things up for the user (love the sliders), but it seems to be lacking structure for the search engines. I'm not sure if this is your intention or not, but you could massively increase the number of pages the crawlers see by extending your url rewrites to be a bit more static, i.e.

http://www.mysecondhome.co.uk/search/country/France#/s?s=date_desc&p=1&t=object&ta=[]&pmin=0&pmax=%3E&country[]=France&apmin=0&apmax=%3E&samin=0&samax=%3E

could become:

http://www.mysecondhome.co.uk/search/country/France/region/Auvergne/minprice/20/maxprice/3/page/2

This is what we do with our Solr implemented search system across all our sites, which in turn has increased general traffic and organic traffic (e.g. www.visordown.com, www.madeformums.com).

Cheers
Dave

-----Original Message-----
From: gwk [mailto:g...@eyefi.nl]
Sent: 27 August 2009 13:04
To: solr-user@lucene.apache.org
Subject: Thanks
Re: Thanks
Dave Searle wrote:

> It's a nice clean site, easy to use and seems very fast, well done! How well does it do in regards to SEO though? I noticed there's a lot of ajax going on in the background to help speed things up for the user (love the sliders), but it seems to be lacking structure for the search engines. I'm not sure if this is your intention or not, but you could massively increase the number of pages the crawlers see by extending your url rewrites to be a bit more static

Hi Dave,

Thanks for the reply. Actually, we did think about SEO: turn off javascript in your browser and you'll see the site still works (at least, it's supposed to). We added all the AJAXy interaction after we implemented the functionality to work without Javascript. So you'll get no fancy sliders, but two drop-downs to select a range.

Regards, gwk
Re: Trie Date question
I can't reproduce any problem. Are you using a recent nightly build? See the example schema of a recent nightly build for the correct way to define a Trie based field - the article / blog may be out of date. Here's what I used to test the example data:

http://localhost:8983/solr/select?q=manufacturedate_dt:[NOW/DAY-4YEAR%20TO%20NOW/DAY]

-Yonik
http://www.lucidimagination.com

On Thu, Aug 27, 2009 at 3:49 AM, Aleksander Stensby aleksander.sten...@integrasco.com wrote:

> ... then, if I do the following on my trie date field: q=tdatetime:[NOW/DAY-1YEAR TO NOW/DAY] I get numFound=0. Where did I go wrong? ...
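For comparison with the schema in the original question, this is a sketch of how the Solr 1.4 example schema defines its Trie date type; the precisionStep value is the example's default and can be tuned (smaller steps index more terms but speed up range queries):

```xml
<fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
           precisionStep="6" positionIncrementGap="0"/>

<field name="tdatetime" type="tdate" indexed="true" stored="false"/>
```

If the tdate type in a schema predates the current nightly's definition, range queries can silently return nothing, which matches the numFound=0 symptom described above.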
RE: JDWP Error
: JPDA/JDWP are for remote debugging of the Sun JVM...
: It shouldn't be SOLR related... check the configs of Resin...

right, it sounds like you probably already have another process that is listening on that port (an older execution of resin that was never shut down cleanly?) ...

: then, when we want to stop resin it doesn't work, any advice?

-Hoss
Optimal Cache Settings, complicated by regular commits
Hi all,

I'm trying to work out the optimum cache settings for our Solr server. I'll begin by outlining our usage:

Number of documents: approximately 25,000
Commit frequency: sometimes we do massive amounts of sequential commits; most of the time it's less frequent, but still several times an hour

We make heavy use of faceting and sorting, and the number of possible facets led to choosing a filterCache size of about 50,000.

The problem we have is that the default cache settings resulted in very low hit rates (less than 30% for documents, less than 1% for the filterCache), so we upped the cache sizes gradually until the hit rates were in the 80s-90s. Now we have the issue of commits being very slow (more than 5 seconds for a document), to the point where it causes a timeout elsewhere in our systems. This is made worse by the fact that committing seems to empty the cache; given that it takes about an hour to get the cache into a good state, this is obviously very problematic. Is there a way for commits to selectively empty the cache?

Any advice regarding the config would be appreciated. The server load is relatively low; ideally we're looking to minimize the response time rather than aim for CPU or memory efficiency.

Regards,
Andrew Ingram
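A commit always opens a new searcher with fresh caches, so Solr cannot selectively keep entries; the usual lever is autowarming, which replays the most-used entries from the old searcher's caches into the new one. A hedged solrconfig.xml sketch for the scenario above; the numbers are starting points to tune, not recommendations:

```xml
<!-- autowarmCount trades commit latency for post-commit hit rate:
     higher values mean slower commits but a warmer cache -->
<filterCache class="solr.LRUCache" size="50000"
             initialSize="10000" autowarmCount="5000"/>

<queryResultCache class="solr.LRUCache" size="10000"
                  initialSize="1000" autowarmCount="1000"/>

<!-- the documentCache cannot be autowarmed: internal doc ids
     change between searchers -->
<documentCache class="solr.LRUCache" size="25000"
               initialSize="5000" autowarmCount="0"/>
```

Since autowarming itself adds to commit time, very frequent commits usually mean lowering autowarmCount rather than raising it.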
Re: SortableFloatFieldSource not accessible? (1.3)
Yes it will. Thanks. On Wed, Aug 26, 2009 at 8:51 PM, Yonik Seeley yo...@lucidimagination.comwrote: SortableFloatField works in function queries... it's just that everyone goes through SortableFloatField.getValueSource() to create them. Will that work for you? -Yonik http://www.lucidimagination.com On Wed, Aug 26, 2009 at 6:23 PM, Christophe Bioccachristo...@openplaces.org wrote: The class SortableFloatFieldSource cannot be accessed from outside its package. So it can't be used as part of a FunctionQuery. Is there a workaround to this, or should I roll my own? Will it be fixed in 1.4?
how to selectively sort records keeping some at the bottom always.. ?
Hi, If I have documents of type a, b and c, and I sort by some criteria, let's say date, can I make documents of kind c always appear at the bottom? Effectively I want one kind of record to always appear at the bottom, since they don't have valid data, whether the sort is ascending or descending. Would a function query help here? Or is it even possible? Thanks Preetam
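One common workaround (a sketch, not something from this thread): index a boolean flag marking the documents that have valid data, and make it the primary sort key so the invalid kind always sinks, regardless of the direction of the secondary sort. The field name here is hypothetical:

```
sort=has_valid_data desc, date asc    (ascending date, type-c docs last)
sort=has_valid_data desc, date desc   (descending date, type-c docs still last)
```

This requires adding the flag at index time, but avoids any function-query trickery.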
Searching with or without diacritics
Hello, I started to use Solr only recently, using the ruby/rails sunspot-solr client. I use Solr on a Slovak/Czech data set and noticed one unwanted behaviour of the search. When the user searches for an expression or word which contains diacritics, letters like š, č, ť, ä, ô, ... the special characters are usually omitted in the search query. In this case Solr does not return records which contain the expression the user intended to find. How can I configure Solr so that it finds records containing special characters, even if they are written without the accents in the query? Some info about my Solr instance: Solr Specification Version: 1.3.0, Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47, Lucene Specification Version: 2.4-dev, Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16. Thanks for your help, regards, Georg
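One way to sketch this in schema.xml is to fold accents to their base letters at both index and query time. Note that ISOLatin1AccentFilterFactory (available in 1.3) only folds characters in the Latin-1 range, such as ä and ô; letters outside Latin-1 like š, č and ť need the later ASCIIFoldingFilterFactory or a custom character mapping:

```xml
<fieldType name="text_folded" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds Latin-1 accented characters to their ASCII base letters -->
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the same analyzer runs on both sides, "čaj" and "caj" in the query match the same indexed token (to the extent the filter covers the character).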
Re: Thanks
This looks great! Congratulations! Feel free to add your site to the Powered by Solr page at http://wiki.apache.org/solr/PublicServers On Thu, Aug 27, 2009 at 5:34 PM, gwk g...@eyefi.nl wrote: Hello, Earlier this year our company decided to (finally :)) upgrade our website to something a little faster/prettier/maintainable-er. After some research we decided on using Solr, and after indexing our data for the first time and trying some manual queries we were all amazed at the speed. This summer we started developing the new site and today we've gone live. You can see the site running at http://www.mysecondhome.eu (I don't mean to advertise, so feel free not to buy a house). I'd like to thank the people here for their help with lifting me from Solr-ignorance to Solr-seems-to-know-a-little-bit. We're running a nightly build of Solr 1.4 with SOLR-1240 applied for the dynamic facet count updates when using the sliders in the search screen. Again, thank you, and if you have any suggestions or questions regarding our implementation, feel free to ask. Regards, gwk -- Regards, Shalin Shekhar Mangar.
RE: Thanks
Great site (fast from Canada), multilingual, hope you will get millions of ads quickly and share your findings on SOLR faceting performance (don't forget about SOLR HTTP-caching support!) I am currently developing something similar in Canada, http://www.casaGURU.com (and hope to improve http://www.zoocasa.com) -Original Message- From: gwk [mailto:g...@eyefi.nl] Sent: August-27-09 8:04 AM To: solr-user@lucene.apache.org Subject: Thanks Hello, Earlier this year our company decided to (finally :)) upgrade our website to something a little faster/prettier/maintainable-er. After some research we decided on using Solr, and after indexing our data for the first time and trying some manual queries we were all amazed at the speed. This summer we started developing the new site and today we've gone live. You can see the site running at http://www.mysecondhome.eu (I don't mean to advertise, so feel free not to buy a house). I'd like to thank the people here for their help with lifting me from Solr-ignorance to Solr-seems-to-know-a-little-bit. We're running a nightly build of Solr 1.4 with SOLR-1240 applied for the dynamic facet count updates when using the sliders in the search screen. Again, thank you, and if you have any suggestions or questions regarding our implementation, feel free to ask. Regards, gwk
Distributed Search nightly delete
Hi All, I need to build a search system using Solr. I need to keep 30 days of data, which will be around 400GB. I will be using Distributed Search with masters/slaves (data will be published to each shard on a round-robin basis). My challenge is that I need to delete data older than 30 days (around 12GB) every night. Search volume will be very high on the current day's data as well as the last week's data. How many shards with masters/slaves should I have to handle the search load as well as the nightly deletes? Thanks in Advance. -- View this message in context: http://www.nabble.com/Distributed-Search---nightly-delete-tp25173735p25173735.html Sent from the Solr - User mailing list archive at Nabble.com.
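The nightly cleanup itself can be a delete-by-query posted to each shard's /update handler. A sketch, assuming each document carries an indexed date field named date:

```xml
<delete><query>date:[* TO NOW/DAY-30DAYS]</query></delete>
<commit/>
```

Delete-by-query only marks documents deleted; the disk space is reclaimed on segment merges or an optimize, so the nightly job usually needs to schedule that too, off-peak.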
Re: Solr admin url for example gives 404
:Try running ant example and then run Solr. right ... on a clean checkout, the solr.war needs to be built and copied to the example directory, otherwise you are just running an empty jetty server. do you see anything in example/webapps? : 1 get the latest Solr from svn (R 808058) : 2 run ant clean test (all tests pass) : 3 cd ./example : 4. start solr : $ java -jar start.jar -Hoss
Re: com.ctc.wstx.exc.WstxUnexpectedCharException error
: I have a valid xml document that begins: how are you inspecting the document? I suspect that what you actually have is a document containing the literal bytes R&D, but some tool you are using to view the document is displaying the & to you as &amp; ... OR ... your source document has the literal bytes R&amp;D in it, but some code you are using is parsing that as XML and then writing it (over the wire) to Solr as a string literal without re-encoding (R&D). Try running nc -l in place of Solr, and have your indexing code post to it -- then see what you get. Solr certainly doesn't have a problem with properly escaped ampersands, but it will complain about illegal XML escape sequences... $ java -Ddata=args -jar post.jar '<add><doc><field name="id">R&amp;D</field></doc></add>' SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing args to http://localhost:8983/solr/update.. SimplePostTool: COMMITting Solr index changes.. $ java -Ddata=args -jar post.jar '<add><doc><field name="id">R&D</field></doc></add>' SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing args to http://localhost:8983/solr/update.. SimplePostTool: FATAL: Solr returned an error: comctcwstxexcWstxLazyException_Unexpected_character__code_60_expected_a_semicolon_after_the_reference_for_entity_D__at_ -Hoss
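The safe fix on the indexing side is to re-encode field values before embedding them in the update XML. A sketch using Python's standard library (the helper name is illustrative, not from the thread):

```python
from xml.sax.saxutils import escape

def field_xml(name, value):
    """Render one <field> element, escaping &, < and > in the value."""
    return '<field name="%s">%s</field>' % (escape(name), escape(value))

print(field_xml("id", "R&D"))  # <field name="id">R&amp;D</field>
```

Any real XML library (or SolrJ) does this escaping for you; the bug usually comes from building update messages by string concatenation.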
Updating a solr record
I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
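A common workaround, assuming every field you care about is stored=true, is to read the document back from Solr, overlay the changed fields, and re-add it. A minimal sketch (the HTTP fetch/post plumbing is omitted; `stored_doc` stands in for the document as returned by a Solr query):

```python
def rebuild_doc(stored_doc, changes):
    """Overlay edited fields onto the stored copy of a Solr document.

    This only recovers fields marked stored="true" in schema.xml;
    indexed-only fields are lost and cannot be rebuilt this way.
    """
    doc = dict(stored_doc)  # copy what Solr returned for this document
    doc.update(changes)     # apply the small tweaks
    return doc

# Hypothetical document as returned by a query with fl=*
old = {"id": "doc1", "title": "Old title", "url": "http://example.com/a"}
new = rebuild_doc(old, {"title": "New title"})
# 'new' would then be re-posted to /update, replacing doc1 wholesale
```

The caveat raised later in this thread applies: anything indexed but not stored disappears on the round trip.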
Re: Updating a solr record
On Thu, Aug 27, 2009 at 1:27 PM, Eric Pughep...@opensourceconnections.com wrote: You can just query Solr, find the records that you want (including all the website data). Update them, and then send the entire record back. Correct me if I'm wrong, but I think you'd end up losing the fields that are indexed but not stored. -- http://www.linkedin.com/in/paultomblin
Re: Lucene Search Performance Analysis Workshop
Agreed, Solr uses random access bitsets everywhere, so I'm thinking this could be an improvement, or at least a great option to enable and try out. I'll update LUCENE-1536 so we can benchmark. On Thu, Aug 27, 2009 at 4:06 AM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, Aug 27, 2009 at 6:30 AM, Grant Ingersoll gsing...@apache.org wrote: I am wondering... are new SOLR filtering features faster than standard Lucene queries like {query} AND {filter}??? The new filtering features in Solr are just doing what Lucene started doing in 2.4, and that is using skipping when possible. It used to be the case in both Lucene and Solr that the filter was only ever applied after scoring but before insertion into the priority queue. That is now fixed. I think performance of filtering can still be further improved, within Lucene... it's still very much a work in progress. E.g. if a filter is random access (e.g. RAM resident as a bit set), which I think for Solr is frequently the case (?), it ought to be applied just like we now apply deleted documents (LUCENE-1536 is opened for this). This can result in sizable performance gains, especially for more complex queries and not-so-dense filters. Mike
Re: Problem using replication in 8/25/09 nightly build of 1.4
On Thu, Aug 27, 2009 at 12:27 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Wed, Aug 26, 2009 at 11:53 PM, Ron Ellis r...@benetech.org wrote: Hi Everyone, When trying to utilize the new HTTP based replication built into Solr 1.4 I encounter a problem. When I view the replication admin page on the slave, all of the master values are null, i.e. Replicatable Index Version: null, Generation: null | Latest Index Version: null, Generation: null. If the master has just been started, it has no index which can be replicated to the slave. If you do a commit on the master, then a replicatable index version will be shown on the slave and replication will proceed. Alternately, you can add the following to the master configuration: <str name="replicateAfter">startup</str> Despite these missing values the two seem to be talking over HTTP successfully (if I shut down the master, the slave replication page starts exploding with an NPE). The slave replication page should not show an NPE if the master is down. I'll look into it. This should be fixed in the trunk. When I hit http://solr/replication?command=indexversion&wt=xml I get the following... <response><lst name="responseHeader"><int name="status">0</int><int name="QTime">13</int></lst><long name="indexversion">0</long><long name="generation">0</long></response> However in the admin/replication UI on the master I see... Index Version: 1250525534711, Generation: 1778. Any idea what I'm doing wrong or how I could begin to diagnose? I am using the 8/25 nightly build of Solr with the example solrconfig.xml provided. The only modifications to the config have been to uncomment the master/slave replication sections and remove the data directory location line so it falls back to solr.home/data. Also, if it's relevant, this index was originally created in Solr 1.3. I think that should be fine. I assume both master and slave are the same Solr version 1.4? -- Regards, Shalin Shekhar Mangar. -- - Noble Paul | Principal Engineer | AOL | http://aol.com
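The master-side configuration being discussed would look something like this (a sketch of the Solr 1.4 ReplicationHandler config; replicateAfter startup exposes the index already on disk to slaves before any new commit happens, which avoids the null version values):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>
  </lst>
</requestHandler>
```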
Re: Updating a solr record
Eric Pugh wrote: Do you have to reindex? Are you meaning an optimize operation? You can do an update by just sending Solr a new record, and letting Solr deal with the removing and adding of the data. The problem is that I can't easily create the new record. There is some data that I no longer have access to, but did at the time I created the record to begin with. You can just query Solr, find the records that you want (including all the website data). Update them, and then send the entire record back. This is what I'd like to know how to do. I'll experiment with this, but I thought that I wouldn't be able to get back all the info I need to recreate the doc. Or am I missing something? Are these documents so huge that you don't want to pull back an entire record for some reason? I would like to get the record from solr because I just can't create the record the same way as I originally did. (Besides the time involved in crawling all those websites, some of them only allow us access for a limited amount of time, so to reindex, we need to call them up and schedule a time for them to whitelist us.) Eric On Thu, Aug 27, 2009 at 1:21 PM, Paul Rosenp...@performantsoftware.com wrote: I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) 
But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
extended documentation on analyzers
is there an online resource or a book that contains a thorough list of the tokenizers and filters available and their functionality? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters is very helpful, but I would like to go through additional filters to make sure I'm not reinventing the wheel by adding my own --joe
RE: How to reduce the Solr index size..
stored=true means that this piece of info will be stored in the filesystem, so your index will contain the 1MB of pure log PLUS some info related to indexing itself: terms, etc. Search speed is more important than index size... And note this: the message field contains the actual log text and is stored=true, so this field alone will account for the 1MB even if it were not indexed. -Original Message- From: Silent Surfer [mailto:silentsurfe...@yahoo.com] Sent: August-20-09 11:01 AM To: Solr User Subject: How to reduce the Solr index size.. Hi, I am a newbie to Solr. We recently started using Solr. We are using Solr to process server logs. We create indexes for each line of the logs, so that users can do a fine-grained search down to the second/ms. Now what we are observing is that the index being created is almost double the size of the actual logs, i.e. if the log size is say 1 MB, the index size is around 2 MB. Could anyone let us know what can be done to reduce the index size? Do we need to change any configuration, or delete any files which are created during the indexing process but are not required for searching? Our schema is as follows: <field name="pkey" type="string" indexed="true" stored="true" required="false"/> <field name="date" type="date" indexed="true" stored="true" omitNorms="true"/> <field name="level" type="string" indexed="true" stored="true"/> <field name="app" type="string" indexed="true" stored="true"/> <field name="server" type="string" indexed="true" stored="true"/> <field name="port" type="string" indexed="true" stored="true"/> <field name="class" type="string" indexed="true" stored="true"/> <field name="method" type="string" indexed="true" stored="true"/> <field name="filename" type="string" indexed="true" stored="true"/> <field name="linenumber" type="string" indexed="true" stored="true"/> <field name="message" type="text" indexed="true" stored="true"/> The message field holds the actual log text. Thanks, sS
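If index size matters more than convenience, two hedged options for the bulky message field (sketches only; compressed="true" on string/text fields was supported in this era of Solr, though deprecated in later versions):

```xml
<!-- Option 1: searchable but not retrievable from Solr;
     display the line from the original log file instead -->
<field name="message" type="text" indexed="true" stored="false"/>

<!-- Option 2: stored, but compressed on disk -->
<field name="message" type="text" indexed="true" stored="true"
       compressed="true"/>
```

Both trade retrieval convenience or CPU for disk space; reindexing is required after either change.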
Re: Solr project statisitics
Hmmm.. I see, too bad. So, here's a crazy question: if you had to guess, how much of these numbers come from Solr nowadays (compared to lucene java and the other related projects)? (I know.. it is a crazy question, but I had to ask :-)) Grant Ingersoll wrote: On Aug 27, 2009, at 4:00 AM, Uri Boness wrote: Hi, Where can I find general statistics about the Solr project. The only thing I found is statistics about the Lucene project at: http://people.apache.org/~vgritsenko/stats/projects/lucene.html#Downloads-N1008F Now the question is whether these number include all lucene's sub-projects (including Solr). If that's the case, then is there a way find out Solr's part in these numbers, otherwise are there any other publicly available statistics about Solr? Those are pretty much it. It is further complicated by the fact that the ASF has a really large mirroring system, which complicates the downloads picture quite a bit. Nor does it account for the many people using distributions from Ubuntu, etc.
Re: extended documentation on analyzers
If you have a specific need, ask on this list. That worked for me. I don't think I would have recognized KeywordAnalyzer as the one I wanted. wunder On Aug 27, 2009, at 11:32 AM, Joe Calderon wrote: is there an online resource or a book that contains a thorough list of tokenizers and filters available and their functionality? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters is very helpful but i would like to go through additional filters to make sure im not reinventing the wheel adding my own --joe
Re: facets: case and accent insensitive sort
Hi Sébastien, I've experienced the same issue, but when using range queries; maybe this might help you too. I was trying to filter a query using a range like [B TO F], case and accent insensitive, while still getting back the case and accents in the results. The solution has been to NOT TOKENIZE the field: emit a SINGLE token, as if it were a STRING field, and index it without case and accents. The KeywordTokenizer did the job; at query time the indexed value (without accents and case insensitive) is used, but the stored value is returned in the response. As far as I know facets use the indexed value during processing, but I'm not sure which of the two (indexed or stored) is returned. KeywordTokenizer is not clearly described in the Solr docs. See what Lucene says: KeywordTokenizer - Emits the entire input as a single token. <fieldType name="text_insensitive" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> </analyzer> <analyzer type="query"> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> </analyzer> </fieldType> Cheers, Michel Bottan On Mon, Jun 29, 2009 at 10:17 AM, Sébastien Lamy lamys...@free.fr wrote: Thanks for your reply. I will have a look at this. Peter Wolanin wrote: Seems like this might be approached using a Lucene payload? For example where the original string is stored as the payload and available in the returned facets for display purposes? Payloads are byte arrays stored with Terms on Fields.
See https://issues.apache.org/jira/browse/LUCENE-755 Solr seems to have support for a few example payloads already, like NumericPayloadTokenFilter. Almost any way you approach this, it seems like there are potential problems, since you might have multiple combinations of case and accent mapping to the same case-less, accent-less value that you want to use for sorting (and I assume for counting) your facets? -Peter On Fri, Jun 26, 2009 at 9:02 AM, Sébastien Lamy lamys...@free.fr wrote: Shalin Shekhar Mangar wrote: On Fri, Jun 26, 2009 at 6:02 PM, Sébastien Lamy lamys...@free.fr wrote: If I use a copyField to store into a string type, and facet on that, my problem remains: the facets are sorted case and accent sensitive, and I want an *insensitive* sort. If I use a copyField to store into a type with no accents and case (e.g. alphaOnlySort), then Solr returns me facet values with no accents and no case, and I want the facet values returned by Solr to *have accents and case*. Ah, of course you are right. There is no way to do this right now except on the client side. Thank you for your response. Would it be easy to modify Solr to behave like I want? Where should I start to investigate?
tag cloud with solr 1.3
Hi all, How would I go about implementing a 'tag cloud' with Solr 1.3? All I want to do is display a list of the most frequently occurring terms in the corpus. Is there an easy way to do that in 1.3? I saw a couple of postings about implementing it with TermVectorComponent, but that's in 1.4. I'd really appreciate some help. Thanks in advance. Paul
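A 1.3-friendly sketch is to facet on the main tokenized text field and treat the top facet counts as the cloud (field name and limits are illustrative, and faceting on a high-cardinality text field can be memory-hungry):

```
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=text&facet.limit=50&facet.mincount=2
```

The response's facet counts give term/frequency pairs that can be scaled into font sizes on the client side.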
Re: Return 2 fields per facet.. name and id, for example? / facet value search
Hi, I have a similar requirement to Matthew (from his post 2 years ago). Is this still the way to go for storing both the ID and name/value of facet values? I'm planning to use the id#name format if this is still the case, and doing a prefix query. I believe this is a common requirement, so I'd appreciate it if any of you can share the best way to do it. Also, I'm indexing the facet values for text search as well. Should the field declaration below satisfy that requirement? <field name="category" type="text" indexed="true" stored="true" required="true" multiValued="true"/> Thanks, R Re: Return 2 fields per facet.. name and id, for example? Matthew Runo Fri, 07 Sep 2007 13:15:12 -0700 Ahh... sneaky. I'll probably do the combined-name#id method. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Sep 7, 2007, at 12:38 PM, Yonik Seeley wrote: On 9/7/07, Matthew Runo [EMAIL PROTECTED] wrote: I've found something which is either already in SOLR, or should be (as I can see it being very helpful). I couldn't figure out how to do it though.. Let's say I'm trying to print out a page of products, and I want to provide a list of brands to filter by. It would be great if in my facets I could get this sort of xml... <int name="adidas" id="145"></int> That way, I'd be able to know the brand id of adidas without having to run a second query somewhere for each facet to look it up. If you can get the name from the id in your webapp, then index the id to begin with (instead of the name). <int name="145"></int> Or, if you need both the name and the id, index them both together, separated by a special character that you can strip out on the webapp side... <int name="adidas#145"></int> -Yonik
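On the client side, splitting the combined value back apart is trivial; a sketch in Python (the id#name convention here is just the one suggested in this thread, not anything Solr-specific):

```python
def parse_facet_value(token):
    """Split a combined 'id#name' facet value back into (id, name).

    Assumes the id itself never contains '#'; pick a separator that
    cannot occur in either part.
    """
    id_part, name = token.split("#", 1)
    return id_part, name

print(parse_facet_value("145#adidas"))  # ('145', 'adidas')
```

Putting the id first also makes the facet prefix query (facet.prefix on the id) straightforward.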
Case insensitive search and original string
Hi, Totally a Solr newbie here. The docs and list have been helpful but I have a question on lowercase / case insensitive search. Do you really need to have another field (copied or not) to retain the original casing of a field? So let's say I have a field with a type that is lowercased during index and query time, where can I pull out the original string (non-lowercased) from the response? Should copyfield be used? Thanks, R
Re: Optimal Cache Settings, complicated by regular commits
Andrew, Which version of Solr are you using? There's an open issue to fix caching filters at the segment level, which will not clear the caches on each commit, you can vote to indicate your interest. http://issues.apache.org/jira/browse/SOLR-1308 -J On Thu, Aug 27, 2009 at 7:06 AM, Andrew Ingrama...@andrewingram.net wrote: Hi all, I'm trying to work out the optimum cache settings for our Solr server, I'll begin by outlining our usage. Number of documents: approximately 25,000 Commit frequency: sometimes we do massive amounts of sequential commits, most of the time its less frequent but still several times an hour We make heavy use of faceting and sorting, and the number of possible facets led to choosing a filterCache size of about 50,000 The problem we have is that the default cache settings resulting in very low hit rates (less than 30% for documents, less than 1% for filterCache), so we upped the cache size up gradually until the hit rates were in the 80s-90s, now we have the issue of commits being very slow (more than 5 seconds for a document), to the point where it causes a timeout elsewhere in our systems. This is made worse by the fact that committing seems to empty the cache, given that it takes about an hour to get the cache to a good state this is obviously very problematic. Is there a way for commits to selectively empty the cache? Any advice regarding the config would be appreciated. The server load is relatively low, ideally we're looking to minimize the response time rather than aim for CPU or memory efficiency. Regards, Andrew Ingram
Re: Updating a solr record
Hi Eric, I think I understand what you are saying but I'm not sure how it would work. I think you are saying to have two different indexes, each one has the same documents, but one has the hard-to-get fields and the other has the easy-to-get fields. Then I would make the same query twice, once to each index. So, let's say I'm looking for all documents that contain the word poem and I want to initially display the the 10 most relevant matches. I think I'd have to ask each index for its 10 most relevant matches, then merge them myself, and display the appropriate ones. Well, the same document could appear in both lists so I'd have to get rid of duplicates. Also, wouldn't the relevancy of the duplicate doc go up? But I wouldn't know by how much. That's the first problem, but then what if the user wants to see page 2? I certainly wouldn't query for documents #10-19 on each server. Eric Pugh wrote: Right... You know, if some of your data needs to updated frequently, but other is updated once per year, and is really massive dataset, then maybe splitting it up into separate cores? Since you mentioned that you can't get the raw data again, you could just duplicate your existing index by doing a filesytem copy. Leave that alone so you don't update it and lose your data, and start a new core that you can update and ignore the fact is has all the website data in it. And tie the two cores data sets together outside of Solr. Eric On Thu, Aug 27, 2009 at 1:46 PM, Paul Tomblinptomb...@xcski.com wrote: On Thu, Aug 27, 2009 at 1:27 PM, Eric Pughep...@opensourceconnections.com wrote: You can just query Solr, find the records that you want (including all the website data). Update them, and then send the entire record back. Correct me if I'm wrong, but I think you'd end up losing the fields that are indexed but not stored. -- http://www.linkedin.com/in/paultomblin
Alfresco has internal index - integrating into Solr
I am currently prototyping the use of Alfresco Document Management that has an internal Lucene to index all the documents managed by Alfresco. What would I need to understand in order to integrate that Lucene Index into a separate Solr installation? I am new to Solr and am trying to use Solr to index WCM produced files on a file system and then federate (integrate) the Alfresco Lucene Index. So I want to understand how I should do this from Solr and what I need to get from Alfresco. Thanks...jay blanton -- View this message in context: http://www.nabble.com/Alfresco-has-internal-index---integrating-into-Solr-tp25179342p25179342.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr Replication
We have multiple solr webapps all running from the same WAR file. Each webapp is running under the same Tomcat container and I consider each webapp the same thing as a slice (or instance). I've configured the Tomcat container to enable JMX and when I connect using JConsole I only see the replication handler for one of the webapps in the server. I was under the impression each webapp gets its own replication handler. Is this not true? It would be nice to be able to have a JMX MBean for each replication handler in the container so we can get all the same replication information using JMX as in using the replication admin page for each web app. Thanks. From: noble.p...@corp.aol.com Date: Thu, 27 Aug 2009 13:04:38 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org when you say a slice you mean one instance of solr? So your JMX console is connecting to only one solr? On Thu, Aug 27, 2009 at 3:19 AM, J Gskinny_joe...@hotmail.com wrote: Thanks for the response. It's interesting because when I run jconsole all I can see is one ReplicationHandler jmx mbean. It looks like it is defaulting to the first slice it finds on its path. Is there anyway to have multiple replication handlers or at least obtain replication on a per slice/instance via JMX like how you can see attributes for each slice/instance via each replication admin jsp page? Thanks again. From: noble.p...@corp.aol.com Date: Wed, 26 Aug 2009 11:05:34 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org The ReplicationHandler is not enforced as a singleton , but for all practical purposes it is a singleton for one core. 
If an instance (a slice, as you say) is set up as a repeater, it can act as both a master and a slave. In the repeater setup the configuration should be as follows:

MASTER
|_ SLAVE (I am a slave of MASTER)
|_ REPEATER (I am a slave of MASTER and master to my slaves)
   |_ REPEATER_SLAVE (of REPEATER)

The point is that REPEATER will have a slave section with a masterUrl which points to MASTER, and REPEATER_SLAVE will have a slave section with a masterUrl pointing to REPEATER. On Wed, Aug 26, 2009 at 12:40 AM, J G skinny_joe...@hotmail.com wrote: Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler mbean to obtain some information about the master/slave configuration for replication. Is the replication handler mbean a singleton? I only see one mbean for the entire server and it's picking an arbitrary slice to report on. So I'm curious if every slice gets its own replication handler mbean? This is important because I have no way of knowing in this specific server any information about the other slices, in particular, information about the master/slave value for the other slices. Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be a master and a slave, i.e. a repeater. I'm wondering how repeaters work because let's say I have a slice named 'A' and the master is on server 1 and the slave is on server 2, then how are these two servers communicating to replicate? Looking at the jmx information I have in the MBean, both isSlave and isMaster are set to true for my repeater, so how does this solr slice know if it's the master or slave? I'm a bit confused. Thanks. -- - Noble Paul | Principal Engineer | AOL | http://aol.com
-- - Noble Paul | Principal Engineer | AOL | http://aol.com
Re: How to reduce the Solr index size..
2009/8/27 Fuad Efendi f...@efendi.ca: stored=true means that this piece of info will be stored in a filesystem. So that your index will contain 1Mb of pure log PLUS some info related to indexing itself: terms, etc. Search speed is more important than index size... Not if you run out of space for the index. :-) And note this: message field contains actual log, stored=true, so that only this field will make 1Mb if not indexed -Original Message- From: Silent Surfer [mailto:silentsurfe...@yahoo.com] Sent: August-20-09 11:01 AM To: Solr User Subject: How to reduce the Solr index size.. Hi, I am newbie to Solr. We recently started using Solr. We are using Solr to process the server logs. We are creating the indexes for each line of the logs, so that users would be able to do a fine grain search upto second/ms. Now what we are observing is , the index size that is being created is almost double the size of the actual log size. i.e if the logs size is say 1 MB, the actual index size is around 2 MB. Could anyone let us know what can be done to reduce the index size. Do we need to change any configurations/delete any files which are created during the indexing processes, but not required for searching.. Our schema is as follows: field name=pkey type=string indexed=true stored=true required=false / field name=date type=date indexed=true stored=true omitNorms=true/ field name=level type=string indexed=true stored=true/ field name=app type=string indexed=true stored=true/ field name=server type=string indexed=true stored=true/ field name=port type=string indexed=true stored=true/ field name=class type=string indexed=true stored=true/ field name=method type=string indexed=true stored=true/ field name=filename type=string indexed=true stored=true/ field name=linenumber type=string indexed=true stored=true/ field name=message type=text indexed=true stored=true/ message field holds the actual logtext. Thanks, sS -- -
RE: Updating a solr record
I haven't read all messages in this thread yet, but I probably have an answer to some questions...

1. You want to change schema.xml and to reindex, but you don't have access to source documents (stored somewhere on the Internet). But you probably use stored=true in your schema. Then, use SOLR as your storage device, use id:[* TO *] to retrieve documents from SOLR and reindex it in another SOLR schema...

2. If you don't use stored=true you can still get access to term vectors, which you can probably reuse to create a fake field with the same term vector in an updated document... just an idea, may be I am wrong...

-Original Message-
From: Paul Rosen [mailto:p...@performantsoftware.com]
Sent: August-27-09 1:22 PM
To: solr-user@lucene.apache.org
Subject: Updating a solr record

I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
Re: Case insensitive search and original string
--- On Thu, 8/27/09, Rihaed Tan tanrihae...@gmail.com wrote:

From: Rihaed Tan tanrihae...@gmail.com
Subject: Case insensitive search and original string
To: solr-user@lucene.apache.org
Date: Thursday, August 27, 2009, 10:10 PM

Hi, Totally a Solr newbie here. The docs and list have been helpful but I have a question on lowercase / case insensitive search. Do you really need to have another field (copied or not) to retain the original casing of a field? So let's say I have a field with a type that is lowercased during index and query time, where can I pull out the original string (non-lowercased) from the response? Should copyfield be used? Thanks, R

Are you asking for display purposes? If yes, by default Solr gives you the original string of a field in the response. Stemming, lowercasing, etc. do not affect this behaviour. You can always display the original documents to the users. If you want to capture the original words -that matched the query terms- from the original documents, then use highlighting (hl=true&hl.fragsize=0). You will find those words between <em></em> tags in the response.
RE: Alfresco has internal index - integrating into Solr
Check also Liferay trunk and WIKI pages, it had similar problem - and they have plugin for SOLR now, just a matter of configuration change - and search implementation is SOLR... They use SolrJ to do this task, and generic wrappers around search implementation (which could be anything)... -Fuad http://www.linkedin.com/in/liferay -Original Message- From: jaybytez [mailto:jayby...@gmail.com] Sent: August-27-09 4:27 PM To: solr-user@lucene.apache.org Subject: Alfresco has internal index - integrating into Solr I am currently prototyping the use of Alfresco Document Management that has an internal Lucene to index all the documents managed by Alfresco. What would I need to understand in order to integrate that Lucene Index into a separate Solr installation? I am new to Solr and am trying to use Solr to index WCM produced files on a file system and then federate (integrate) the Alfresco Lucene Index. So I want to understand how I should do this from Solr and what I need to get from Alfresco. Thanks...jay blanton -- View this message in context: http://www.nabble.com/Alfresco-has-internal-index---integrating-into-Solr-tp 25179342p25179342.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: tag cloud with solr 1.3
Hi all, How would I go about implementing a 'tag cloud' with Solr 1.3? All I want to do is to display a list of the most occurring terms in the corpus. Is there an easy way to go about that in 1.3?

Yes: http://localhost:8983/solr/admin/luke?fl=text&numTerms=100 will give you the top 100 most occurring terms from the field named text.
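The Luke request above returns a topTerms list per requested field. An illustrative (abbreviated) shape of the response for a field named text, with invented term values and counts, looks roughly like:

```xml
<lst name="text">
  <!-- ...field metadata omitted... -->
  <lst name="topTerms">
    <!-- each entry is term -> document frequency; values here are made up -->
    <int name="solr">1024</int>
    <int name="lucene">987</int>
    <int name="search">950</int>
  </lst>
</lst>
```

Parsing the topTerms entries gives both the labels and the relative weights needed to size the tag cloud.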
SnowballPorterFilterFactory stemming word question
I have a field defined in my schema.xml file:

<fieldtype name="stemField" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldtype>

If I analyse this field type in analysis.jsp, the following are the results: if I give running, it stems to run, which is fine. If I give machine, why does it stem to machin, and where does that word come from? If I give revolutionary, it stems to revolutionari; I thought it should stem to revolution. How does stemming work? Does it reduce an adverb to a verb etc., or do we have to customize it? Please let me know. Thanks -- View this message in context: http://www.nabble.com/SnowballPorterFilterFactory-stemming-word-question-tp25180310p25180310.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: encoding problem
Hi Shalin, strangely, things still aren't working. I've set the JAVA_OPTS through either the GUI or to startup.bat, but absolutely no impact. Have tried reindexing also, but still no impact - results such as - “My Universe is Here� bern -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:50 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat? SOLR is part of the repository software that we are running. Tomcat respects an environment variable called JAVA_OPTS through which you can pass any jvm argument (e.g. heap size, file encoding). Set JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the following to startup.bat: set JAVA_OPTS=-Dfile.encoding=UTF-8 -- Regards, Shalin Shekhar Mangar.
Re: encoding problem
Have you determined if the problem is on the indexing side or the query side? I don't see any reason you should have to set/change any encoding in the JVM. -Yonik http://www.lucidimagination.com On Thu, Aug 27, 2009 at 7:03 PM, Bernadette Houghtonbernadette.hough...@deakin.edu.au wrote: Hi Shalin, strangely, things still aren't working. I've set the JAVA_OPTS through either the GUI or to startup.bat, but absolutely no impact. Have tried reindexing also, but still no impact - results such as - “My Universe is Here� bern -Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Wednesday, 26 August 2009 5:50 PM To: solr-user@lucene.apache.org Subject: Re: encoding problem On Wed, Aug 26, 2009 at 12:52 PM, Bernadette Houghton bernadette.hough...@deakin.edu.au wrote: Thanks for your quick reply, Shalin. Tomcat is running on my Windows machine, but does not appear in Windows Services (as I was expecting it should ... am I wrong?). I'm running it from a startup.bat on my desktop - see below. Do I add the Dfile line to the startup.bat? SOLR is part of the repository software that we are running. Tomcat respects an environment variable called JAVA_OPTS through which you can pass any jvm argument (e.g. heap size, file encoding). Set JAVA_OPTS=-Dfile.encoding=UTF-8 either through the GUI or by adding the following to startup.bat: set JAVA_OPTS=-Dfile.encoding=UTF-8 -- Regards, Shalin Shekhar Mangar.
RE: Searching and Displaying Different Logical Entities
Funtick wrote: then 2) get all P's by ID, including facet counts, etc. The problem I face with this solution is that I can have many matching P's (10,000+), so my second query will have many (10,000+) constraints. SOLR can automatically provide you P's with Counts, and it will be _unique_... I assume you mean to facet by P in the C index. My next problem is to sort those P's based on some attribute of P (as opposed to alphabetically or by occurrence in C). Funtick wrote: Even if cardinality of P is 10,000+ SOLR is very fast now (expect few seconds response time for initial request). You need single query with faceting... Is there a practical limit for maxBooleanClauses? The default is 1024, but I need at least 10,000. Funtick wrote: (!) You do not need P's ID. Single document will have unique ID, and fields such as P, C (with possible attributes). Do not think in terms of RDBMS... Lucene does all 'normalization' behind the scenes, and SOLR will give you Ps with Cs... If I put both P's and C's into a single index, then I agree, I don't need P's ID. If I have P and C in separate indices then I still need to maintain the logical relationship between P and C. It wasn't clear to me if you suggested I continue with either of my 2 proposed solutions. Can you clarify? Thanks, Wojtek -- View this message in context: http://www.nabble.com/Searching-and-Displaying-Different-Logical-Entities-tp25156301p25181664.html Sent from the Solr - User mailing list archive at Nabble.com.
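On the maxBooleanClauses question above: the limit is configurable in solrconfig.xml, so a query with 10,000 clauses is possible, at the cost of extra memory and scoring work per clause. A sketch (placement follows the stock example config):

```xml
<query>
  <!-- Default is 1024; raise it to allow very large OR queries.
       Each clause adds memory use and query time. -->
  <maxBooleanClauses>10240</maxBooleanClauses>
</query>
```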
RE: encoding problem
Shalin, the XML from solr admin for the relevant field is displaying as - str name=citation_ta title=Browse by Author Name for Moncrieff, Joan href=/fez/list/author/Moncrieff%2C+Joan/Moncrieff, Joan/a, a title=Browse by Author Name for Macauley, Peter href=/fez/list/author/Macauley%2C+Peter/Macauley, Peter/a and a title=Browse by Author Name for Epps, Janine href=/fez/list/author/Epps%2C+Janine/Epps, Janine/a a title=Browse by Year 2006 href=/fez/list/year/2006/2006/a, a title=Click to view Journal, Media Article: ldquo;My Universe is Hererdquo;: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers href=/fez/view/changeme:156“My Universe is Here�: Implications For the Future of Academic Libraries From the Results of a Survey of Researchers/ai/i, vol. 38, no. 2, pp. 71-83./str The weird thing is that the title displays OK in one place, but not in the href bit. bern
Can solr do the equivalent of select distinct(field)?
Can I get all the distinct values from the Solr database, or do I have to select everything and aggregate it myself? -- http://www.linkedin.com/in/paultomblin
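One common way to get the equivalent of SELECT DISTINCT(field) is faceting, which returns every distinct indexed value of a field together with its count. A sketch (the field name myfield is hypothetical):

```text
http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=myfield&facet.limit=-1
```

Note that faceting returns indexed (analyzed) values, so for verbatim distinct values the field should be a string type or otherwise unanalyzed.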
Re: Updating a solr record
I guess if you have stored=true then there is no problem. 2. If you don't use stored=true you can still get access to term vectors, which you can probably reuse to create fake field with same term vector in an updated document... just an idea, may be I am wrong... Reconstructing a the field value from a term enum might work... of course the value won't be as the original value, but when indexed, if you don't have any really special filters (e.g. shingle filter), most likely the tokens will be re-indexed as they are (that is, it is most likely that the filters will not have any effect). just make sure to take the position increments in account! for example, if you have synonym filter set up, then you'll need to choose only one term in a single position (otherwise the term frequency of the document will increase on every update). Uri Fuad Efendi wrote: I haven't read all messages in this thread yet, but I probably have an answer to some questions... 1. You want to change schema.xml and to reindex, but you don't have access to source documents (stored somewhere on Internet). But you probably use stored=true in your schema. Then, use SOLR as your storage device, use id:[* TO *] to retrieve documents from SOLR and reindex it in another SOLR schema... 2. If you don't use stored=true you can still get access to term vectors, which you can probably reuse to create fake field with same term vector in an updated document... just an idea, may be I am wrong... -Original Message- From: Paul Rosen [mailto:p...@performantsoftware.com] Sent: August-27-09 1:22 PM To: solr-user@lucene.apache.org Subject: Updating a solr record I realize there is no way to update particular fields in a solr record. I know the recommendation is to delete the record from the index and re-add it, but in my case, it is difficult to completely reindex, so that creates problems with my work flow. 
That is, the info that I use to create a solr doc comes from two places: a local file that contains most of the info, and a URL in that file that points to a web page that contains the rest of the info. To completely reindex, we have to hit every website again, which is problematic for a number of reasons. (Plus, those websites don't change much, so it is just wasted effort.) (Once in a while we do reindex, and it is a huge production to do so.) But that means that if I want to make a small change to either schema.xml or the local files that I'm indexing, I can't. I can't even fix minor bugs until our yearly reindexing. So, the question is: Is there any way to get the info that is already in the solr index for a document, so that I can use that as a starting place? I would just tweak that record and add it again. Thanks, Paul
Re: Can Apache Solr have more than one schema?
Not in the same core. You can define multiple cores where each core is a separate solr instance except they all run within one container. each core has its own index, schema and configuration. If you want to compare it to databases, then I guess a core is to Solr Server what a database is to its RDBMS. Khai Doan wrote: Hello, My name is Khai. I am new to Apache Solr. My question is: Can we have more than one schema / table? Thanks! Khai
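Multi-core setups like Uri describes are enabled by a solr.xml file at the Solr home. A minimal sketch (core names are hypothetical):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- Each core has its own instanceDir containing conf/schema.xml
         and conf/solrconfig.xml, plus its own index -->
    <core name="core0" instanceDir="core0"/>
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```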
Re: Case insensitive search and original string
Hi Ahmet, Yes, for display purpose. Okay, so I don't have to copy fields then. Thank you very much. R On Fri, Aug 28, 2009 at 4:57 AM, AHMET ARSLAN iori...@yahoo.com wrote: --- On Thu, 8/27/09, Rihaed Tan tanrihae...@gmail.com wrote: From: Rihaed Tan tanrihae...@gmail.com Subject: Case insensitive search and original string To: solr-user@lucene.apache.org Date: Thursday, August 27, 2009, 10:10 PM Hi, Totally a Solr newbie here. The docs and list have been helpful but I have a question on lowercase / case insensitive search. Do you really need to have another field (copied or not) to retain the original casing of a field? So let's say I have a field with a type that is lowercased during index and query time, where can I pull out the original string (non-lowercased) from the response? Should copyfield be used? Thanks, R Are you asking for displaying purpose? If yes by default Solr gives you original string of a field in the response. Stemming, lowercasing, etc do not effect this behaviour. You can allways display original documents to the users. If you want to capture original words -that matched the query terms- from original documents, then use highlighting. ( hl=truehl.fragsize=0 ) You will find those words between em /em tags in the response.
Re: Can Apache Solr have more than one schema?
Thanks Uri, Now my question is: how can I specify which schema to query against? Thanks! Khai On Thu, Aug 27, 2009 at 5:43 PM, Uri Boness ubon...@gmail.com wrote: Not in the same core. You can define multiple cores where each core is a separate solr instance except they all run within one container. each core has its own index, schema and configuration. If you want to compare it to databases, then I guess a core is to Solr Server what a database is to its RDBMS. Khai Doan wrote: Hello, My name is Khai. I am new to Apache Solr. My question is: Can we have more than one schema / table? Thanks! Khai
Ok, why isn't this working?
I've loaded some data into my solr using the embedded server, and I can see the data using Luke. I start up the web app, and it says cwd=/Users/ptomblin/apache-tomcat-6.0.20 SolrHome=/Users/ptomblin/src/lucidity/solr/ I hit the schema button and it shows the correct schema. However, if I type anything into the query window, it never returns anything. I've tried things that I know for sure are in the default search field, but all I get back is:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">scientist</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
</response>

How can I figure out why I'm not getting any results back? Any log files I can look at? -- http://www.linkedin.com/in/paultomblin
Re: Can Apache Solr have more than one schema?
If you have configured multi-core, then all you need to do is use the following url pattern: http://hostname:port/solr/core_name/select?q=... where core_name is the name of the core you wish to query. Uri Khai Doan wrote: Thanks Uri, Now my question is: how can I specify which schema to query against? Thanks! Khai On Thu, Aug 27, 2009 at 5:43 PM, Uri Boness ubon...@gmail.com wrote: Not in the same core. You can define multiple cores where each core is a separate solr instance except they all run within one container. each core has its own index, schema and configuration. If you want to compare it to databases, then I guess a core is to Solr Server what a database is to its RDBMS. Khai Doan wrote: Hello, My name is Khai. I am new to Apache Solr. My question is: Can we have more than one schema / table? Thanks! Khai
UpdateRequestProcessor config location
I've read through the wiki for this and it explains most everything except where in the solrconfig.xml the updateRequestProcessorChain goes. I tried it at the top level but that doesn't seem to do anything. http://wiki.apache.org/solr/UpdateRequestProcessor
Count of records
Hi, We have integrated Solr index with Carrot2 Search Engine and are able to get search results. In my search results page, by default the total number of records matched for the particular query is not getting displayed.

http://localhost:8089/carrot2-webapp-3.0.1/search?source=Solr&view=tree&skin=simple&query=java&results=100&algorithm=lingo&SolrDocumentSource.solrTitleFieldName=title&SolrDocumentSource.solrSummaryFieldName=description&SolrDocumentSource.solrUrlFieldName=url

Currently I am getting: Results 1 - 100 of about 100 for java. Consider I searched for Java; in my Solr index, the total number of matches found is 1000. I am interested to display only the top 100 results, but I should also get the total match count for the search query. Display should be similar to: Results 1 - 100 of about 1000 for java. Regards Bhaskar
Re: Ok, why isn't this working?
On Thu, Aug 27, 2009 at 9:24 PM, Paul Tomblin ptomb...@xcski.com wrote: cwd=/Users/ptomblin/apache-tomcat-6.0.20 SolrHome=/Users/ptomblin/src/lucidity/solr/ Ok, I've spotted the problem - while SolrHome is in the right place, it's still looking for the data in /Users/ptomblin/apache-tomcat-6.0.20/solr/data/ How can I change that? -- http://www.linkedin.com/in/paultomblin
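The index location comes from the dataDir setting in solrconfig.xml; when it is removed, Solr falls back to ./solr/data relative to the current working directory, which is why the index is being looked for under the Tomcat directory above. A sketch (the absolute path shown is an assumption based on the SolrHome printed earlier):

```xml
<!-- Point the index at an absolute path instead of the cwd-relative ./solr/data -->
<dataDir>/Users/ptomblin/src/lucidity/solr/data</dataDir>
```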
Re: Pattern matching in Solr
Hi, In Schema.xml file,I am not able ot find splitOnCaseChange=1. I am not looking for case sensitive search. Let me know what file you are refering to?. I am looking for exact match search only Moreover for scenario 2 the KeywordTokenizerFactory and EdgeNGramFilterFactory refers which link in Solr wiki. Regards Bhaskar --- On Thu, 8/27/09, Avlesh Singh avl...@gmail.com wrote: From: Avlesh Singh avl...@gmail.com Subject: Re: Pattern matching in Solr To: solr-user@lucene.apache.org Date: Thursday, August 27, 2009, 2:10 AM In Schema.xml file,I am not able ot find splitOnCaseChange=1. Unless you have modified the stock field type definition of text field in your core's schema.xml you should be able to find this property set for the WordDelimiterFilterFactory. Read more here - http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-1c9b83870ca7890cd73b193cefed83c283339089 Moreover for scenario 2 the KeywordTokenizerFactory and EdgeNGramFilterFactory refers which link in Solr wiki. Google for these two. Cheers Avlesh On Thu, Aug 27, 2009 at 12:21 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, In Schema.xml file,I am not able ot find splitOnCaseChange=1. I am not looking for case sensitive search. Let me know what file you are refering to?. I am looking for exact match search only Moreover for scenario 2 the KeywordTokenizerFactory and EdgeNGramFilterFactory refers which link in Solr wiki. Regards Bhaskar --- On Wed, 8/26/09, Avlesh Singh avl...@gmail.com wrote: From: Avlesh Singh avl...@gmail.com Subject: Re: Pattern matching in Solr To: solr-user@lucene.apache.org Date: Wednesday, August 26, 2009, 11:31 AM You could have used your previous thread itself ( http://www.lucidimagination.com/search/document/31c1ebcedd4442b/exact_pattern_search_in_solr ), Bhaskar. In your scenario one, you need an exact token match, right? You are getting expected results if your field type is text. 
Look for the WordDelimiterFilterFactory in your field type definition for the text field inside schema.xml. You'll find an attribute splitOnCaseChange=1. Because of this, ChandarBhaskar is converted into two tokens Chandra and Bhaskar and hence the matches. You may choose to remove this attribute if the behaviour is not desired. For your scenario two, you may want to look at the KeywordTokenizerFactory and EdgeNGramFilterFactory on Solr wiki. Generally, for all such use cases people create multiple fields in their schema storing the same data analyzed in different ways. Cheers Avlesh On Wed, Aug 26, 2009 at 10:58 PM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote: Hi, Can any one help me with the below scenario?. Scenario 1: Assume that I give Google as input string i am using Carrot with Solr Carrot is for front end display purpose the issue is Assuming i give BHASKAR as input string It should give me search results pertaining to BHASKAR only. Select * from MASTER where name =Bhaskar; Example:It should not display search results as ChandarBhaskar or BhaskarC. Should display Bhaskar only. Scenario 2: Select * from MASTER where name like %BHASKAR%; It should display records containing the word BHASKAR Ex: Bhaskar ChandarBhaskar BhaskarC Bhaskarabc How to achieve Scenario 1 in Solr ?. Regards Bhaskar __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com
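For Bhaskar's scenario one (match the whole value exactly, like name = 'Bhaskar'), a common approach is a field type that keeps the entire field value as a single token. A sketch using stock Solr factories (the type name exactString is made up):

```xml
<fieldType name="exactString" class="solr.TextField">
  <analyzer>
    <!-- Emits the whole field value as one token, so a query for "Bhaskar"
         will not match "ChandarBhaskar" or "BhaskarC" -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- Optional: remove this filter if matching must be case sensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Scenario two (LIKE '%BHASKAR%') is the one where an NGram-style filter on a second copy of the field would come in.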
Why isn't this working?
Yesterday or the day before, I asked specifically if I would need to restart the Solr server if somebody else loaded data into the Solr index using the EmbeddedServer, and I was told confidently that no, the Solr server would see the new data as soon as it was committed. So today I fired up the Solr server (and after making apache-tomcat-6.0.20/solr/data a symlink to where the Solr data really lives and restarting the web server), and did some queries. Then I ran a program that loaded a bunch of data and committed it. Then I did the queries again. And the new data is NOT showing. Using Luke, I can see 10022 documents in the index, but the Solr statistics page (http://localhost:8080/solrChunk/admin/stats.jsp) is still showing 8677, which is how many there were before I reloaded the data. So am I doing something wrong, or was the assurance I got yesterday that this is possible wrong? -- http://www.linkedin.com/in/paultomblin
Single Configuration for Master/Slave Replication - SOLR-1355
Hello, I noticed that the documentation around Solr Replication in the wiki has recently changed to take Paul's patch into account (SOLR-1355). I now see that with the current trunk of SOLR 1.4 it is possible to use a single solrconfig.xml to define both master and slave configurations, with environment variables determining which mode is selected. Can settings outside of the replication handler be set differently based on which mode is enabled as well? For example, settings such as cache sizes might differ between a master and a slave configuration (ie autowarming, cache sizes, etc). Can those similarly be wrapped in a lst tag with a name of master or slave set? Thanks, Ilan -- Ilan Rabinovitch i...@fonz.net --- SCALE 8x: 2010 Southern California Linux Expo Feb 19-21, 2010 Los Angeles, CA http://www.socallinuxexpo.org
Re: Why isn't this working?
On Aug 27, 2009, at 10:35 PM, Paul Tomblin wrote: Yesterday or the day before, I asked specifically if I would need to restart the Solr server if somebody else loaded data into the Solr index using the EmbeddedServer, and I was told confidently that no, the Solr server would see the new data as soon as it was committed. So today I fired up the Solr server (and after making apache-tomcat-6.0.20/solr/data a symlink to where the Solr data really lives and restarting the web server), and did some queries. Then I ran a program that loaded a bunch of data and committed it. Then I did the queries again. And the new data is NOT showing. Using Luke, I can see 10022 documents in the index, but the Solr statistics page (http://localhost:8080/solrChunk/admin/stats.jsp) is still showing 8677, which is how many there were before I reloaded the data. So am I doing something wrong, or was the assurance I got yesterday that this is possible wrong?

I did not follow the advice from yesterday... but... the commit word can be a bit misleading; it could also be called reload. Say you have an embedded solr server and an http solr server pointed to the same location.

1. make sure one is read only! otherwise you can make a mess.
2. calling commit on the embedded solr instance will not have any effect on the http instance UNTIL you call commit (reload) on the http instance.

ryan
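Following Ryan's point, after the embedded writer commits, the HTTP instance still has to be told to reopen its searcher by receiving a commit of its own. One way to do that (the URL reuses the webapp path from the message above) is to POST this body to the update handler:

```xml
<!-- POST to http://localhost:8080/solrChunk/update with Content-Type: text/xml,
     e.g. via curl -d '<commit/>'. This makes the HTTP instance reopen its
     searcher and see the segments the embedded server wrote. -->
<commit/>
```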
Re: Single Configuration for Master/Slave Replication - SOLR-1355
any attribute specified in solrcore.properties can be referenced in solrconfig.xml/schema.xml. this has nothing specific with replication. On Fri, Aug 28, 2009 at 8:19 AM, Ilan Rabinovitchi...@fonz.net wrote: Hello, I noticed the the documentation around Solr Replication in the wiki has recently changed to take Paul's patch into account (SOLR-1355). I now see that with the current trunk of SOLR 1.4 it is possible to use a single solrconfig.xml to define both master and slave configurations, with environment variables determining which mode is selected. Can settings outside of the replication handler be set different based on which mode is enabled as well? For example, settings such as cache sizes might differ between a master and a slave configuration (ie autowarming, cache sizes, etc). Can those similarly be wrapped in a lst tag with a name of master or slave set? Thanks, Ilan -- Ilan Rabinovitch i...@fonz.net --- SCALE 8x: 2010 Southern California Linux Expo Feb 19-21, 2010 Los Angeles, CA http://www.socallinuxexpo.org -- - Noble Paul | Principal Engineer| AOL | http://aol.com
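Concretely, a property defined in solrcore.properties can parameterize any part of solrconfig.xml, not just the replication handler, so cache sizes really can differ per role. A sketch (the property names are made up):

```xml
<!-- solrcore.properties on this core might contain:
       filterCache.size=512
       enable.master=true
     Values after the colon are defaults used when the property is unset. -->
<filterCache class="solr.LRUCache"
             size="${filterCache.size:1024}"
             initialSize="512"
             autowarmCount="128"/>
```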
Re: UpdateRequestProcessor config location
could you provide more details on what exactly is that you have done? On Fri, Aug 28, 2009 at 7:08 AM, Erik Earleerikea...@yahoo.com wrote: I've read through the wiki for this and it explains most everything except where in the solrconfig.xml theupdateRequestProcessorChain goes. I tried it at the top level but that doesn't seem to do anything. http://wiki.apache.org/solr/UpdateRequestProcessor -- - Noble Paul | Principal Engineer| AOL | http://aol.com
Re: Solr Replication
Each instance has its own ReplicationHandler instance/MBean. I guess the problem is with the jmx implementation. both MBeans may be registered with the same name On Fri, Aug 28, 2009 at 2:04 AM, J Gskinny_joe...@hotmail.com wrote: We have multiple solr webapps all running from the same WAR file. Each webapp is running under the same Tomcat container and I consider each webapp the same thing as a slice (or instance). I've configured the Tomcat container to enable JMX and when I connect using JConsole I only see the replication handler for one of the webapps in the server. I was under the impression each webapp gets its own replication handler. Is this not true? It would be nice to be able to have a JMX MBean for each replication handler in the container so we can get all the same replication information using JMX as in using the replication admin page for each web app. Thanks. From: noble.p...@corp.aol.com Date: Thu, 27 Aug 2009 13:04:38 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org when you say a slice you mean one instance of solr? So your JMX console is connecting to only one solr? On Thu, Aug 27, 2009 at 3:19 AM, J Gskinny_joe...@hotmail.com wrote: Thanks for the response. It's interesting because when I run jconsole all I can see is one ReplicationHandler jmx mbean. It looks like it is defaulting to the first slice it finds on its path. Is there anyway to have multiple replication handlers or at least obtain replication on a per slice/instance via JMX like how you can see attributes for each slice/instance via each replication admin jsp page? Thanks again. From: noble.p...@corp.aol.com Date: Wed, 26 Aug 2009 11:05:34 +0530 Subject: Re: Solr Replication To: solr-user@lucene.apache.org The ReplicationHandler is not enforced as a singleton , but for all practical purposes it is a singleton for one core. 
If an instance (a slice as you say) is set up as a repeater, it can act as both a master and a slave. In the repeater setup the configuration should be as follows:

MASTER
 |_ SLAVE (I am a slave of MASTER)
 |_ REPEATER (I am a slave of MASTER and master to my slaves)
     |_ REPEATER_SLAVE (of REPEATER)

The point is that REPEATER will have a slave section with a masterUrl which points to MASTER, and REPEATER_SLAVE will have a slave section with a masterUrl pointing to the repeater.

On Wed, Aug 26, 2009 at 12:40 AM, J G skinny_joe...@hotmail.com wrote: Hello, We are running multiple slices in our environment. I have enabled JMX and I am inspecting the replication handler mbean to obtain some information about the master/slave configuration for replication. Is the replication handler mbean a singleton? I only see one mbean for the entire server and it's picking an arbitrary slice to report on. So I'm curious if every slice gets its own replication handler mbean? This is important because I have no way of knowing in this specific server any information about the other slices, in particular, information about the master/slave value for the other slices. Reading through the Solr 1.4 replication strategy, I saw that a slice can be configured to be a master and a slave, i.e. a repeater. I'm wondering how repeaters work because let's say I have a slice named 'A' and the master is on server 1 and the slave is on server 2, then how are these two servers communicating to replicate? Looking at the jmx information I have in the MBean, both isSlave and isMaster are set to true for my repeater, so how does this solr slice know if it's the master or slave? I'm a bit confused. Thanks.
-- Noble Paul | Principal Engineer | AOL | http://aol.com
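The repeater layout Noble describes corresponds to a ReplicationHandler configured with both a master and a slave section. A sketch based on that pattern (host names are hypothetical):

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- The repeater serves its own slaves after each replicated commit -->
    <str name="replicateAfter">commit</str>
  </lst>
  <lst name="slave">
    <!-- The repeater pulls from the real master; REPEATER_SLAVE in turn
         sets its masterUrl to point at this repeater -->
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```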
Re: SnowballPorterFilterFactory stemming word question
If I analyse this field type in analysis.jsp: running stems to run, which is fine; machine stems to machin (where does this word come from?); and revolutionary stems to revolutionari, though I thought it should stem to revolution.

Stemmers used in Information Retrieval are not for human consumption. Reducing revolutionary to revolutionari does not change the fact that the query revolutionary will return documents containing revolutionary.

How does stemming work? Does it reduce an adverb to a verb etc., or do we have to customize it?

Stemmers aim to remove inflectional suffixes from words. Snowball stemmers are rule-based stemmers: rules and endings are defined, e.g. if a word ends in s, remove it (apples - apple). It will be difficult to customize existing snowball stemmers, I guess. If you are looking for a less aggressive stemmer then you can use KStem.
Re: UpdateRequestProcessor config location
I've implemented a fairly simple UpdateRequestProcessor much like the example here: http://wiki.apache.org/solr/UpdateRequestProcessor I attempted the below configuration in solrconfig.xml (like the above link shows) but nothing happens, no errors... nothing. Is this configuration supposed to be under the config tag?

<config>
  <updateRequestProcessorChain>
    <processor class="com.erik.earle.MyUpdateRequestProcessor">
      <lst name="default">
        <str name="param">list, of, comma, sep, values</str>
      </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
    <processor class="solr.LogUpdateProcessorFactory" />
  </updateRequestProcessorChain>
</config>

- Original Message From: Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com To: solr-user@lucene.apache.org Sent: Thursday, August 27, 2009 9:57:54 PM Subject: Re: UpdateRequestProcessor config location could you provide more details on what exactly is that you have done? On Fri, Aug 28, 2009 at 7:08 AM, Erik Earle erikea...@yahoo.com wrote: I've read through the wiki for this and it explains most everything except where in the solrconfig.xml the updateRequestProcessorChain goes. I tried it at the top level but that doesn't seem to do anything. http://wiki.apache.org/solr/UpdateRequestProcessor -- Noble Paul | Principal Engineer | AOL | http://aol.com
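A likely reason "nothing happens" is that the chain is anonymous and never referenced: in Solr 1.4 a chain is normally given a name and then selected via the update.processor request parameter, or made the default on the update handler; custom processors are also plugged in through an UpdateRequestProcessorFactory rather than the processor itself. A hedged sketch (the class and chain names here follow the message above and are assumptions):

```xml
<updateRequestProcessorChain name="mychain">
  <!-- A custom processor is registered via its factory class -->
  <processor class="com.erik.earle.MyUpdateRequestProcessorFactory">
    <lst name="default">
      <str name="param">list, of, comma, sep, values</str>
    </lst>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <!-- RunUpdateProcessorFactory performs the actual add/delete; without it
       documents are never written to the index -->
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<!-- Make the named chain the default for the XML update handler -->
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">mychain</str>
  </lst>
</requestHandler>
```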