Apache Solr for multiple searches

2009-11-28 Thread Bhuvi HN
Hi,
I have been using Apache Solr for my job portal, and it has worked well for
searching resumes by keyword. Now I need to use it for job search as well.
Can one single instance of Apache Solr serve both searches, i.e. job search
and resume search?
Regards
Bhuvan
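A common way to serve both searches from one Solr instance is a single core holding both document types, distinguished by a type field, with each search restricted by a filter query (fq). A rough sketch in Python; the field name "doctype", the core URL, and the type values are assumptions for illustration, not from this thread:

```python
# Sketch: one core holds both resumes and jobs; a hypothetical "doctype"
# field tells them apart, and each search adds a filter query (fq) so
# results come from only one document type.
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed default URL

def build_search_url(keywords, doctype):
    """Build a Solr select URL restricted to one document type."""
    params = urlencode({
        "q": keywords,
        "fq": "doctype:%s" % doctype,  # e.g. "resume" or "job"
        "wt": "json",
    })
    return "%s?%s" % (SOLR_SELECT, params)

resume_url = build_search_url("python developer", "resume")
job_url = build_search_url("python developer", "job")
```

The alternative is one core per document type (multi-core), which keeps the two schemas fully independent.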


Multi index

2009-11-28 Thread Jörg Agatz
Hello users,

At the moment I am testing multi-core Solr, but I can't search more than one
core directly.

Is there a way to use a multi-index, i.e. 3-5 indexes in one core, and search
all of them directly? Or only one at a time?

It is really important for my project.

Thanks

King
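One option, assuming Solr 1.4's distributed search support, is to keep one index per core and fan the query out over several cores with the shards parameter. A rough sketch of building such a request; the host and core names are illustrative:

```python
# Sketch: build a select URL whose "shards" parameter lists every core to
# query; Solr merges the per-core results into one response.
from urllib.parse import urlencode

def multi_core_search_url(host, cores, query):
    """Query several cores on `host` in one request via distributed search."""
    shards = ",".join("%s/solr/%s" % (host, core) for core in cores)
    params = urlencode({"q": query, "shards": shards})
    # Any core can receive the request; it coordinates the others.
    return "http://%s/solr/%s/select?%s" % (host, cores[0], params)

url = multi_core_search_url("localhost:8983", ["core1", "core2", "core3"], "solr")
```

Note that distributed search has caveats (e.g. document unique keys should not overlap across shards), so check the DistributedSearch wiki page before relying on it.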


Re: restore space between words by spell checker

2009-11-28 Thread Andrzej Bialecki

Otis Gospodnetic wrote:

I'm not sure if that can be easily done (other than going char by char and 
testing), because nothing indicates where the space might be, not even an 
upper-case letter.  I'd be curious to know if you find a better solution.

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



- Original Message 

From: Andrey Klochkov akloch...@griddynamics.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, November 27, 2009 6:09:08 AM
Subject: restore space between words by spell checker

Hi

If a user issued a misspelled query, forgetting to place space between
words, is it possible to fix it with a spell checker or by some other
mechanism?

For example, if we get the query "tommyhitfiger" and have the terms "tommy"
and "hitfiger" in the index, how do we fix the query?


The usual approach to solving this is to index compound words, i.e. when 
producing a spellchecker dictionary, add a record "tommyhitfiger" with a 
field that points to "tommy hitfiger". Details vary depending on which 
spellchecking implementation you use.
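The dictionary-building step described above might be sketched as follows; the record format is purely illustrative, since real spellchecker dictionaries differ per implementation:

```python
# Sketch: for each known two-word phrase, emit a record mapping the
# run-together compound to its spaced correction.
def compound_records(phrases):
    """Yield (compound, correction) pairs for two-word phrases."""
    for phrase in phrases:
        words = phrase.split()
        if len(words) == 2:
            yield "".join(words), " ".join(words)

records = dict(compound_records(["tommy hilfiger", "new york"]))
# A lookup of the run-together compound then returns the spaced form.
```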




--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Retrieving large num of docs

2009-11-28 Thread Raghuveer Kancherla
Hi Andrew,
I applied the patch you suggested. I am not finding any significant changes
in the response times.
I am wondering if I forgot some important configuration setting etc.
Here is what I did:

   1. Wrote a small program using SolrJ to use EmbeddedSolrServer (most of
   the code is from the Solr wiki), ran the server on an index of ~700k docs,
   and noted down the average response time.
   2. Applied the SOLR-797.patch to the Solr 1.4 source code.
   3. Compiled the source code and rebuilt the jar files.
   4. Reran step 1 using the new jar files.

Am I supposed to make any other config changes in order to see the performance
jump that you were able to achieve?

Thanks a lot,
Raghu


On Fri, Nov 27, 2009 at 3:16 PM, AHMET ARSLAN iori...@yahoo.com wrote:

  Hi Andrew,
  We are running solr using its http interface from python.
  From the resources
  I could find, EmbeddedSolrServer is possible only if I am
  using solr from a
  java program.  It will be useful to understand if a
  significant part of the
  performance increase is due to bypassing HTTP before going
  down this path.
 
  In the mean time I am trying my luck with the other
  suggestions. Can you
  share the patch that helps cache solr documents instead of
  lucene documents?

 Maybe these links can help:
 http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
 http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
 http://www.lucidimagination.com/Downloads/LucidGaze-for-Solr

 how often do you update your index?
 is your index optimized?
 configuring caching can also help:

 http://wiki.apache.org/solr/SolrCaching
 http://wiki.apache.org/solr/SolrPerformanceFactors
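For reference, Solr's caches are configured in solrconfig.xml; a minimal sketch, where the sizes below are placeholders to tune against your own index, not recommendations:

```xml
<!-- Sketch only: cache sizes are illustrative placeholders -->
<query>
  <filterCache      class="solr.LRUCache" size="512"  initialSize="512"  autowarmCount="128"/>
  <queryResultCache class="solr.LRUCache" size="512"  initialSize="512"  autowarmCount="128"/>
  <documentCache    class="solr.LRUCache" size="2048" initialSize="2048"/>
</query>
```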


Re: Retrieving large num of docs

2009-11-28 Thread Andrey Klochkov
Hi Raghu

Let me describe our use case in more details. Probably that will clarify
things.

The usual use case for Lucene/Solr is retrieving a small portion of the
result set (10-20 documents). In our case we need to read the whole result
set, and this creates a huge load on the Lucene index, meaning a lot of IO.
Keep in mind that we have a large number of stored fields in the index.

In our case there's one thing that makes things simpler: our index is small
enough that we can fit every document in cache. This means that even if we
retrieve all documents for every result set, we don't retrieve them from the
Lucene index, and then the performance should be OK. But here we've got 2
problems:

1. Solr caches Lucene's Document instances, and when retrieving the whole
result set it recreates the SolrDocument instances every time. This creates
load on the CPU, and in particular on the Java GC.
2. EmbeddedSolrServer converts the whole response into a byte array and then
restores it, converting Lucene's Document and DocList instances to Solr's
SolrDocument and SolrDocumentList instances. This creates additional load on
the CPU and GC.

We patched Solr to eliminate those things and that fixed our performance
problems.
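The idea behind that patch might be sketched as follows: cache the converted document (what the application consumes) rather than the raw index document, so repeated full-result-set reads skip re-conversion and spare the GC. The names here are illustrative, not Solr's internals:

```python
# Sketch: memoize the expensive raw-document -> application-document
# conversion so each document is converted at most once.
class ConvertedDocCache:
    def __init__(self, convert):
        self._convert = convert  # expensive conversion function
        self._cache = {}         # doc id -> converted document

    def get(self, doc_id, raw_doc):
        """Return the converted form of raw_doc, converting only on a miss."""
        if doc_id not in self._cache:
            self._cache[doc_id] = self._convert(raw_doc)
        return self._cache[doc_id]

cache = ConvertedDocCache(lambda raw: dict(raw))
doc = cache.get(1, [("title", "resume")])   # converts and caches
same = cache.get(1, None)                   # pure cache hit, no conversion
```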

I think that if you don't keep all your documents in caches, and/or you
don't use stored fields (retrieving the ID field only), then those
improvements probably won't help you.

I suggest you first find your bottlenecks: look at IO, memory usage, etc.
Using a profiler is the best approach. You could probably use some of the
tools from Lucid Imagination for profiling.

On Sat, Nov 28, 2009 at 4:47 PM, Raghuveer Kancherla 
raghuveer.kanche...@aplopio.com wrote:

 Hi Andrew,
 I applied the patch you suggested. I am not finding any significant changes
 in the response times.
 I am wondering if I forgot some important configuration setting etc.
 Here is what I did:

   1. Wrote a small program using solrj to use EmbeddedSolrServer (most of
   the code is from the solr wiki) and run the server on an index of ~700k
 docs
   and note down the avg response time
   2. Applied the SOLR-797.patch to the source code of Solr1.4
   3. Compiled the source code and rebuilt the jar files.
   4. Rerun step 1 using the new jar files.

 Am I supposed to do any other config changes in order to see the
 performance
 jump that you are able to achieve.

 Thanks a lot,
 Raghu


 On Fri, Nov 27, 2009 at 3:16 PM, AHMET ARSLAN iori...@yahoo.com wrote:

   Hi Andrew,
   We are running solr using its http interface from python.
   From the resources
   I could find, EmbeddedSolrServer is possible only if I am
   using solr from a
   java program.  It will be useful to understand if a
   significant part of the
   performance increase is due to bypassing HTTP before going
   down this path.
  
   In the mean time I am trying my luck with the other
   suggestions. Can you
   share the patch that helps cache solr documents instead of
   lucene documents?
 
  May be these links can help
  http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
  http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
  http://www.lucidimagination.com/Downloads/LucidGaze-for-Solr
 
  how often do you update your index?
  is your index optimized?
  configuring caching can also help:
 
  http://wiki.apache.org/solr/SolrCaching
  http://wiki.apache.org/solr/SolrPerformanceFactors
 


-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics


Re: restore space between words by spell checker

2009-11-28 Thread Andrey Klochkov


 For example, if we get query tommyhitfiger and have terms tommy and
 hitfiger in the index, how to fix the query?


 The usual approach to solving this is to index compound words, i.e. when
 producing a spellchecker dictionary add a record tommyhitfiger with a
 field that points to tommy hitfiger. Details vary depending on what
 spellchecking impl. you use.


I'm using Solr's default spell checker, which uses an n-gram index and
Levenshtein distance. Can it be customized to include compound words? What
alternative spell checkers for Lucene/Solr exist?

I experimented with the Lucene spell checker and noticed that, if configured
with a low accuracy, it can find the words "tommy" and "hilfiger" that form
the whole word. So I was able to create some logic which post-processes the
spell checker results and finds the correct query "tommy hilfiger". It just
iterates over all possible combinations of terms suggested by the spell
checker and compares the resulting query to the original using
DoubleMetaphone. I'm not sure this is the best solution though; it's probably
just not fast enough.
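A simpler variant of that post-processing, which only splits the run-together query against the index vocabulary (no phonetic matching), might look like this; it is a plain word-break search, not Solr's actual spellchecker logic:

```python
# Sketch: recursively split a run-together query into words that all occur
# in the index vocabulary; returns the first split found, or None.
def split_compound(query, vocabulary):
    if query in vocabulary:
        return [query]
    for i in range(1, len(query)):
        head, tail = query[:i], query[i:]
        if head in vocabulary:
            rest = split_compound(tail, vocabulary)
            if rest is not None:
                return [head] + rest
    return None

terms = {"tommy", "hilfiger"}
result = split_compound("tommyhilfiger", terms)
```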

-- 
Andrew Klochkov
Senior Software Engineer,
Grid Dynamics


is it possible to use XInclude in schema.xml?

2009-11-28 Thread Peter Wolanin
I'm trying to determine if it's possible to use XInclude to (for
example) have a base schema file and then substitute various pieces.

It seems that the schema fieldTypes throw exceptions if there is an
unexpected attribute?

SEVERE: java.lang.RuntimeException: schema fieldtype
text(org.apache.solr.schema.TextField) invalid
arguments:{xml:base=solr/core2/conf/text-analyzer.xml}

This is what I'm trying to do (details of the analyzer chain omitted -
nothing unusual) - so the error occurs when the external xml file is
actually included:

<xi:include href="solr/core2/conf/text-analyzer.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
...
      </analyzer>
      <analyzer type="query">
...
      </analyzer>
    </fieldType>
  </xi:fallback>
</xi:include>


Where (for testing) the text-analyzer.xml file just looks like the fallback:


<?xml version="1.0" encoding="UTF-8" ?>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
...
  </analyzer>
  <analyzer type="query">
...
  </analyzer>
</fieldType>


-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: is it possible to use XInclude in schema.xml?

2009-11-28 Thread Peter Wolanin
Follow-up: it seems the schema parser doesn't barf if you use
XInclude with a single analyzer element, but so far it seems
impossible for a field type.  So this seems to work:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <xi:include href="solr/core2/conf/text-analyzer.xml">
    <xi:fallback>
      <analyzer type="index">
...
      </analyzer>
    </xi:fallback>
  </xi:include>
  <analyzer type="query">
...
  </analyzer>
</fieldType>

On Sat, Nov 28, 2009 at 1:40 PM, Peter Wolanin peter.wola...@acquia.com wrote:
 I'm trying to determine if it's possible to use Xinclude to (for
 example) have a base schema file and then substitute various pieces.

 It seems that the schema fieldTypes throw exceptions if there is an
 unexpected attribute?

 SEVERE: java.lang.RuntimeException: schema fieldtype
 text(org.apache.solr.schema.TextField) invalid
 arguments:{xml:base=solr/core2/conf/text-analyzer.xml}

 This is what I'm trying to do (details of the analyzer chain omitted -
 nothing unusual) - so the error occurs when the external xml file is
 actually included:

 <xi:include href="solr/core2/conf/text-analyzer.xml"
     xmlns:xi="http://www.w3.org/2001/XInclude">
   <xi:fallback>
     <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
 ...
       </analyzer>
       <analyzer type="query">
 ...
       </analyzer>
     </fieldType>
   </xi:fallback>
 </xi:include>


 Where (for testing) the text-analyzer.xml file just looks like the fallback:


 <?xml version="1.0" encoding="UTF-8" ?>
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
 ...
   </analyzer>
   <analyzer type="query">
 ...
   </analyzer>
 </fieldType>


 --
 Peter M. Wolanin, Ph.D.
 Momentum Specialist,  Acquia. Inc.
 peter.wola...@acquia.com




-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


Re: ExternalFileField is broken in Solr 1.4?

2009-11-28 Thread Yonik Seeley
Are you sure?
TestFunctionQuery.testExternalField() has a test for reloading on a commit.

Are you putting the file in the data directory?

-Yonik
http://www.lucidimagination.com


2009/11/28 Koji Sekiguchi k...@r.email.ne.jp:
 It seems that ExternalFileField doesn't work in 1.4.
 In 1.4, I need to restart Solr to reflect the external_[fieldname] file.
 Only a <commit/> was needed in 1.3...

 Koji
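For context, ExternalFileField reads its values from a file named external_&lt;fieldname&gt; in the index data directory, keyed on a document field; a minimal sketch of the setup (field names are illustrative):

```xml
<!-- schema.xml sketch: field values come from <dataDir>/external_popularity,
     one "key=value" line per document, e.g.  doc1=1.5  -->
<fieldType name="pRankFile" class="solr.ExternalFileField"
           keyField="id" defVal="0" valType="pfloat"/>
<field name="popularity" type="pRankFile"/>
```

The question in this thread is whether a plain commit still triggers a reload of that file in 1.4, as it did in 1.3.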


Re: is it possible to use XInclude in schema.xml?

2009-11-28 Thread David Stuart
Yeah, I tried it as well; it doesn't seem to implement XPointer properly,
so you can't add multiple fields or field types.


David

On 28 Nov 2009, at 18:49, Peter Wolanin peter.wola...@acquia.com  
wrote:



Follow-up:  it seems the schema parser doesn't barf if you use
xinclude with a single analyzer element, but so far seems like it's
impossible for a field type.  So this seems to work:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <xi:include href="solr/core2/conf/text-analyzer.xml">
    <xi:fallback>
      <analyzer type="index">
...
      </analyzer>
    </xi:fallback>
  </xi:include>
  <analyzer type="query">
...
  </analyzer>
</fieldType>

On Sat, Nov 28, 2009 at 1:40 PM, Peter Wolanin peter.wola...@acquia.com 
 wrote:

I'm trying to determine if it's possible to use Xinclude to (for
example) have a base schema file and then substitute various pieces.

It seems that the schema fieldTypes throw exceptions if there is an
unexpected attribute?

SEVERE: java.lang.RuntimeException: schema fieldtype
text(org.apache.solr.schema.TextField) invalid
arguments:{xml:base=solr/core2/conf/text-analyzer.xml}

This is what I'm trying to do (details of the analyzer chain omitted -
nothing unusual) - so the error occurs when the external XML file is
actually included:

<xi:include href="solr/core2/conf/text-analyzer.xml"
    xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
...
      </analyzer>
      <analyzer type="query">
...
      </analyzer>
    </fieldType>
  </xi:fallback>
</xi:include>


Where (for testing) the text-analyzer.xml file just looks like the fallback:



<?xml version="1.0" encoding="UTF-8" ?>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
...
  </analyzer>
  <analyzer type="query">
...
  </analyzer>
</fieldType>


--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com





--
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com


RE: Trouble Configuring WordDelimiterFilterFactory

2009-11-28 Thread Steven A Rowe
Hi Rahul,

On 11/26/2009 at 12:53 AM, Rahul R wrote:
 Is there a way by which I can prevent the WordDelimiterFilterFactory
 from totally acting on numerical data ?

"Prevent ... from totally acting on" is pretty vague, and nowhere AFAICT do you 
say precisely what it is you want.

It would help if you could give example text and the terms you think should 
result from analysis of that text.  If you want different index-time and 
query-time behavior, please provide this info for both.

Steve
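For what it's worth, WordDelimiterFilterFactory's handling of numeric sub-tokens is controlled by a few attributes; a sketch of a configuration that suppresses separate number parts while keeping digit runs catenated (whether this matches the desired behavior depends on the actual examples):

```xml
<!-- Sketch: don't emit standalone number parts; keep digit runs joined.
     Values here are illustrative, not a recommendation. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1"
        generateNumberParts="0"
        catenateWords="0"
        catenateNumbers="1"
        splitOnCaseChange="1"/>
```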



Re: ExternalFileField is broken in Solr 1.4?

2009-11-28 Thread Koji Sekiguchi

Hmm: if I set reopenReaders to false, the SolrIndexReader objects before and
after a commit are different, and ExternalFileField works as expected.
If I set reopenReaders to true, it doesn't work. ... I don't like this
dependency.


What I don't understand is that if I set reopenReaders to true in
solrconfig-functionquery.xml, TestFunctionQuery.testExternalField()
still completes successfully.

Koji

--
http://www.rondhuit.com/en/


Yonik Seeley wrote:

Are you sure?
TestFunctionQuery.testExternalField() has a test for reloading on a commit.

Are you putting the file in the data directory?

-Yonik
http://www.lucidimagination.com


2009/11/28 Koji Sekiguchi k...@r.email.ne.jp:
  

It seems that ExternalFileField doesn't work in 1.4.
In 1.4, I need to restart Solr to reflect the external_[fieldname] file.
Only a <commit/> was needed in 1.3...

Koji



  





Re: ExternalFileField is broken in Solr 1.4?

2009-11-28 Thread Koji Sekiguchi

Yonik Seeley wrote:

Go ahead and open a bug.  One idea is to use a different key for the
weak map (something that changes every commit).

-Yonik
http://www.lucidimagination.com

  

Yonik,

Thank you. I opened SOLR-1607.
Do you have any ideas for a candidate key?

Koji

--
http://www.rondhuit.com/en/



Multi Index

2009-11-28 Thread Bhuvi HN
Hi all
I am in need of using single Solr instance for multi indexing. Please let me
know if this is possble to do.
Regards
Bhuvi