Re: search query text field with Comma

2014-10-06 Thread Sven Maurmann
Dear Ravi,

this is most likely a consequence of the analyzer configuration: if you
tokenize your text without removing the commas (and other punctuation), the
comma right after the word "Series" will be part of the resulting token. You
should check the configuration and make sure you use an appropriate analyzer
(e.g. the standard analyzer).
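
For illustration, a minimal field type sketch that tokenizes on word boundaries
so that trailing punctuation is dropped (the type and field names below are just
placeholders, not taken from your schema):

  <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <!-- StandardTokenizerFactory splits on word boundaries and drops
           punctuation such as a trailing comma -->
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  <field name="description" type="text_general" indexed="true" stored="true"/>

With such a type, "Truck Series," is indexed as the tokens "truck" and "series",
so a query for "Truck Series" matches.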

Best,

Sven





On 06.10.2014 at 20:56, EXTERNAL Taminidi Ravi (ETI,
Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote:

 Hi users, this may be a basic question, but I am facing some trouble.
 
 The scenario is: I have a text field containing "Truck Series, 12V and 15V". If the user
 searches for "Truck Series", it does not return the row, but "Truck Series," (with the
 comma) works. How can I get a search for "Truck Series" to match?
 
 Thanks
 
 Ravi
 



Re: solr finds always all documents

2012-08-20 Thread Sven Maurmann
Dear Robert,

could you give me a little more information about your setting? For example the 
complete solrconfig.xml and
the complete schema.xml would definitely help.

Best,

Sven

-- 
kippdata informationstechnologie GmbH
Sven Maurmann   Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn
sven.maurm...@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann




On 20.08.2012 at 16:39, robert rottermann wrote:

 Hi there,
 I am new to Solr and, besides, a Java noob.
 
 What I am doing:
 I want to do full-text retrieval on office documents. The metadata of these
 documents is maintained in PostgreSQL.
 So the only information I need to get out of Solr is a document ID.
 
 My problem now is that my index seems to be built badly:
 (nearly) whatever I look up returns all documents.
 
 I would be very glad if somebody could give me an idea what I should change.
 
 thanks
 Robert
 
 
 What I am using is the sample configuration that comes with solr 3.6.
 I removed all the fields and added the following:
 
 <fields>
   <field name="docid" type="string" indexed="true" stored="true" required="true"/>
   <field name="docnum" type="text" indexed="true" stored="true" required="false"/>
   <field name="titel" type="text" indexed="true" stored="true" required="false"/>
   <field name="fsname" type="text" indexed="true" stored="true" required="false"/>
   <field name="directory" type="text" indexed="true" stored="true" required="false"/>
   <field name="fulltext" type="text" indexed="true" stored="false" required="false"/>
   <dynamicField name="*" type="ignored"/>
 </fields>
 <!-- Field to use to determine and enforce document uniqueness.
      Unless this field is marked with required="false", it will be a required field
 -->
 <uniqueKey>docid</uniqueKey>
 
 
 



Re: Language analyzers

2012-05-16 Thread Sven Maurmann
Hi!

Could you explain this in a little more detail?

Thanks,
Sven
On 16.05.2012 at 16:17, anarchos78 wrote:

 Hello,
 
 Is it possible to use two language analyzers for one fieldType? Let's say
 Greek and English (for indexing and querying).
 
 Thanks
 



Re: analyzers in schema

2012-05-07 Thread Sven Maurmann
Dear Gary,

yes, you are right.
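
As a minimal sketch (the type name is only an example): an analyzer element
without a type attribute is applied both at index time and at query time.

  <fieldType name="text_simple" class="solr.TextField">
    <!-- no type="index"/"query" attribute: this chain is used for both -->
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>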

Best,
   Sven

On 07.05.2012 at 17:08, G.Long wrote:

 Hi :)
 
 In the schema.xml file, if an analyzer is specified for a fieldType but
 without the attribute type="index" or type="query", does it mean the analyzer
 is used by default for both cases?
 
 Gary



Re: solr connection question

2010-07-08 Thread Sven Maurmann

Hi,

Solr runs as a Web application. The requests you most probably mean
are just HTTP requests to the underlying container. Internally each
request is processed against the Lucene index, which is usually
file-based. Therefore there are no connections like in a database
application, where you have a pool of connections to your remote
database server.
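
For example, a search is just a stateless HTTP GET against the container
(host, port and path below are only the defaults of the example setup):

  http://localhost:8983/solr/select?q=*:*&rows=10

so connection handling (keep-alive, pooling) is a matter of the HTTP client
and the servlet container, not of Solr itself.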

Best,
  Sven

--On Thursday, 8 July 2010 15:46 +0300 ZAROGKIKAS,GIORGOS 
g.zarogki...@multirama.gr wrote:



Hi solr users

I need to know how Solr manages the connections when we make a
request (select, update, commit). Is there any connection pooling, or an
article to learn about its connection management? How can I log the Solr
server's connections to a file?

I have set up my Solr 1.4 with Tomcat.

Thanks in advance


Re: Configuring RequestHandler in solrconfig.xml OR in the Servlet code using SolrJ

2010-06-22 Thread Sven Maurmann

Hi,

there are reasons for both options. Usually it is a good idea to put the
default configuration into the solrconfig.xml (and even fix some of the
configuration) in order to have simple client-side code.

But sometimes it is necessary to have some flexibility for the actual query.
In this situation one would use the client-side approach. If done right, this
does not mean putting the parameters in the servlet code.
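
As a rough sketch of the first option (the handler name and parameter values
are only placeholders), the defaults live in solrconfig.xml and the client
simply picks the handler:

  <requestHandler name="/products" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <str name="rows">10</str>
      <str name="fl">id,title,score</str>
    </lst>
    <!-- an "invariants" list instead of "defaults" would fix a parameter
         so that clients cannot override it -->
  </requestHandler>

Anything under "defaults" can still be overridden per request (e.g. from SolrJ),
which is the flexibility Jan describes below.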

Cheers,
Sven

--On Tuesday, 22 June 2010 17:52 +0200 Jan Høydahl / Cominvent 
jan@cominvent.com wrote:



Hi,

Sometimes I do both. I put the defaults in solrconfig.xml and thus have
one place to define all kind of low-level default settings.

But then I make a possibility in the application space to add/override
any parameters as well. This gives you great flexibility to let server
administrators (with access to solrconfig.xml) tune low level stuff, but
also gives programmers a middle layer to put domain-space config instead
of locking it down on the search node or up in the web interfaces.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 21 June 2010, at 22:29, Saïd Radhouani wrote:


I completely agree. Thanks a lot!

-S

On Jun 21, 2010, at 9:08 PM, Abdelhamid ABID wrote:


Why would someone port the Solr config into servlet code?
IMO the first option would be the best choice; one obvious reason is
that when you alter the Solr config you only need to restart the server,
whereas changing the source forces you to redeploy your app and
restart the server.



On 6/21/10, Saïd Radhouani r.steve@gmail.com wrote:


Hello,

I'm developing a Web application that communicates with Solr using
SolrJ. I have three search interfaces, and I'm facing two options:

1- Configuring one SearchHandler per search interface in solrconfig.xml

Or

2- Writing the configuration in the Java servlet code that uses SolrJ

Is there any significant difference between these two options? If yes,
what's the best choice?

Thanks,

-Saïd





--
Abdelhamid ABID
Software Engineer- J2EE / WEB


Re: Build query programmatically with lucene, but issue to solr?

2010-05-28 Thread Sven Maurmann

Hi Phillip,

could you give me some more information about your environment? A first idea
that comes to my mind is to use SearchComponents for the solution of
your problem. You could either replace the whole QueryComponent (not
recommended) or write a (probably small) SearchComponent that creates the
Lucene query and puts it into the appropriate place in the ResponseBuilder.
If you add such a component to first-components in your handler definition,
you will execute the query.
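
A minimal registration sketch in solrconfig.xml (the component name, handler
name and Java class are invented for illustration):

  <searchComponent name="programmaticQuery"
                   class="com.example.ProgrammaticQueryComponent"/>

  <requestHandler name="/progsearch" class="solr.SearchHandler">
    <arr name="first-components">
      <str>programmaticQuery</str>
    </arr>
  </requestHandler>

The component's prepare() method would build the Lucene Query object and set it
on the ResponseBuilder before the standard QueryComponent runs.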

Regards,

Sven

--On Friday, 28 May 2010 12:23 -0400 Phillip Rhodes 
rhodebumpl...@gmail.com wrote:



Hi.
I am building up a query with quite a bit of logic such as parentheses,
plus signs, etc... and it's a little tedious dealing with it all at a
string level.  I was wondering if anyone has any thoughts on constructing
the query in lucene and using the string representation of the query to
send to solr.

Thanks,
Phillip


Re: is solr ignoring my filters?

2010-04-19 Thread Sven Maurmann

Hi,

could you provide at least some information? Usually you
can be 100% sure that Solr uses the configuration it is
provided with.

Cheers,
Sven

--On Monday, 19 April 2010 05:53 -0800 stockii st...@shopgate.com wrote:



hey.

sry for this ... stupid question ;)

When I perform an import of my data I use some filters. How can I
really be sure that Solr used my configured filters and analyzers?

When I search in Solr, the result looks 100% like it did before the import.

thx =)


Re: HTMLStripCharFilterFactory configuration problem

2010-04-14 Thread Sven Maurmann

Hi,

please note that what you get back in the search results is the stored value
of the field, not the indexed one. The HTMLStripCharFilter only affects the
token stream that is indexed; the stored value is the original input, so the
tags are still there.

Cheers,
   Sven

--On Wednesday, April 14, 2010 02:54:52 PM +0530 Ranveer Kumar 
ranveer.s...@gmail.com wrote:



Hi all,

I am facing a problem configuring HTMLStripCharFilterFactory.
The following is the schema:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/><!-- escapedTags="lt;,gt;" -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>

    <!-- <filter class="solr.LengthFilterFactory" min="2" max="50"/> -->
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory"
            language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

When I check with analysis.jsp it gives the expected result, but in
my query results I am still getting HTML tags.
I am using the SolrJ client.

Please help me.




--
kippdata informationstechnologie GmbH
Sven Maurmann   Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn
sven.maurm...@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann


Re: Need a bit of help, Solr 1.4: type text.

2010-02-11 Thread Sven Maurmann

Hi,

the parameter for WordDelimiterFilterFactory is catenateAll;
you should set it to 1.
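
For example, in the analyzer of your text field type (the other attribute
values here are just the common defaults from the example schema, not
necessarily yours):

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1"
          catenateAll="1" splitOnCaseChange="1"/>

With catenateAll="1", "13th" yields the parts "13" and "th" plus the catenated
token "13th", so the phrase matches again.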

Cheers,
Sven

--On Wednesday, 10 February 2010 16:37 -0800 Yu-Shan Fung 
ambivale...@gmail.com wrote:



Check out the configuration of WordDelimiterFilterFactory in your
schema.xml.

Depending on your settings, it's probably tokenizing "13th" into "13" and
"th". You can also have them concatenated back into a single token, but I
can't remember the exact parameter. I think it could be catenateAll.



On Wed, Feb 10, 2010 at 4:32 PM, Dickey, Dan dan.dic...@savvis.net
wrote:


I'm using the standard text type for a field, and part of the data
being indexed is "13th", as in "Friday the 13th".

I can't seem to get it to match when I'm querying for "Friday the 13th",
either quoted or not.

One thing that does match is "13 th", if I send the search query with a
space in between...

Any suggestions?

I know this is short on detail, but it's been a long day... time to get
outta here.

Thanks for any and all help.

   -Dan









--
When nothing seems to help, I go look at a stonecutter hammering away
at his rock perhaps a hundred times without as much as a crack showing in
it. Yet at the hundred and first blow it will split in two, and I know it
was not that blow that did it, but all that had gone before. — Jacob
Riis


Re: dismax and multi-language corpus

2010-02-11 Thread Sven Maurmann

Hi,

this is correct. Usually one does not know how a stemmer - or
other language-specific filters - behave in the context of a
foreign language.

But there is an exception that sometimes comes to the rescue:
if one has a stable dictionary of terms in all the languages
of interest, then one might put these terms in a synonym list
and also into a list of protected words for the stemmers. Then
a search for one of those terms in any language will return the
documents regardless of their own language.

Of course this does not solve the general problem of cross-language
search, but it helps in certain circumstances.
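
As a sketch (the terms are invented for illustration), synonyms.txt would map
the language variants onto each other and protwords.txt keeps the stemmers away
from them:

  # synonyms.txt
  colour, farbe, colore

  # protwords.txt
  colour
  farbe
  colore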

Cheers,
   Sven

--On Thursday, 11 February 2010 13:45 -0800 Otis Gospodnetic 
otis_gospodne...@yahoo.com wrote:



Claudio,

Ah, through multilingual indexing/search work (with
http://www.sematext.com/products/multilingual-indexer/index.html ) I
learned that cross-language search often doesn't really make sense,
unless the search involves universal terms (e.g. Fiat, BMW, Mercedes,
Olivetti, Tomi de Paola, Alberto Tomba...).  If the search involves
natural language-specific terms, then searching in the foreign language
doesn't work so well and doesn't make a ton of sense.  Imagine a search for "ciao
ragazzi".  I have no idea what the Italian stemmer does with that, but
say it turns it into "cia raga" (it doesn't, but just imagine).  If this
was done with Italian docs at index time, you will find the matching
docs.  But what happens if "ciao ragazzi" was analyzed by some German
analyzer?  Different tokens will be created and indexed, so a "ciao
ragazzi" search won't work.  And which analyzer would you use to analyze
that query anyway?  Italian or German?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Hadoop ecosystem search :: http://search-hadoop.com/



- Original Message 

From: Claudio Martella claudio.marte...@tis.bz.it
To: solr-user@lucene.apache.org
Sent: Thu, February 11, 2010 3:21:32 AM
Subject: Re: dismax and multi-language corpus

I'll try removing the '-'. I do need to search it now. The other option
would be to ask the user which language to query, but in my region we
use Italian and German in equal measure, so it would end up
querying both languages all the time. Or did you mean a more performant
solution for querying both languages all the time? :)


Otis Gospodnetic wrote:
 Claudio - fields with '-' in them can be problematic.

 Side comment: do you really want to search across all languages at
 once?  If
not, maybe 3 different dismax configs would make your searches better.

  Otis
 
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
 Hadoop ecosystem search :: http://search-hadoop.com/



 - Original Message 

 From: Claudio Martella
 To: solr-user@lucene.apache.org
 Sent: Wed, February 10, 2010 3:15:40 PM
 Subject: dismax and multi-language corpus

 Hello list,

 I have a corpus with 3 languages, so I set up a text content field
 (with no stemming) and 3 text-[en|it|de] fields with specific
 Snowball stemmers. I copyField the text to my language-aware fields.
 So I set up this dismax searchHandler:



   dismax
   title^1.2 content-en^0.8 content-it^0.8
 content-de^0.8
   title^1.2 content-en^0.8 content-it^0.8
 content-de^0.8
   title^1.2 content-en^0.8 content-it^0.8
 content-de^0.8
   0.1




 but i get this error:

 HTTP Status 400 - org.apache.lucene.queryParser.ParseException:
 Expected ',' at position 7 in 'content-en'

 type Status report

 message org.apache.lucene.queryParser.ParseException: Expected ',' at
 position 7 in 'content-en'

 description The request sent by the client was syntactically incorrect
 (org.apache.lucene.queryParser.ParseException: Expected ',' at
 position 7 in 'content-en').

 Any idea?

 TIA

 Claudio

 --
 Claudio Martella
 Digital Technologies
 Unit Research  Development - Analyst

 TIS innovation park
 Via Siemens 19 | Siemensstr. 19
 39100 Bolzano | 39100 Bozen
 Tel. +39 0471 068 123
 Fax  +39 0471 068 129
 claudio.marte...@tis.bz.it http://www.tis.bz.it


Re: How to reindex data without restarting server

2010-02-11 Thread Sven Maurmann

Hi,

restarting the Solr server wouldn't help. If you want to re-index
your data you have to pipe it through the whole process again.

In your case it might be a good idea to consider having several
cores holding the different schema definitions. This will not save
you from getting the original data and doing the analysis once
again, but at least you do not end up with a schema that is inconsistent
with the data in the index.
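
As a rough sketch of such a multi-core setup (core names and paths are only
placeholders), solr.xml in the Solr home could look like this:

  <solr persistent="true">
    <cores adminPath="/admin/cores">
      <core name="schema_v1" instanceDir="schema_v1"/>
      <core name="schema_v2" instanceDir="schema_v2"/>
    </cores>
  </solr>

Each instanceDir has its own conf/schema.xml, so you can index into the new
core with the changed schema while the old core keeps serving queries.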

If you have a way to find and access the original data from the
unique id in your index, you may create a small program that reads
the data belonging to the id and sends it to the new core for
indexing (just rough thoughts depending on the nature of your
situation).

Cheers,
Sven

--On Friday, 12 February 2010 03:40 +0500 Emad Mushtaq 
emad.mush...@sigmatec.com.pk wrote:



Hi,

I would like to know if there is a way of reindexing data without
restarting the server. Let's say I make a change in the schema file. That
would require me to reindex the data. Is there a solution to this?

--
Muhammad Emad Mushtaq
http://www.emadmushtaq.com/


Re: Embedded Solr problem

2010-02-08 Thread Sven Maurmann

Hi Ranveer,

I assume that you have enough knowledge in Java. You should essentially run
your code for instantiating the server only once (depending on what you intend to
do, this may be done in a separate class or in a method of the class doing
the queries). Then you use this instance to handle all the queries, using
for example the query method of SolrServer.

For further information you may want to consult either the API documentation
or the URL http://wiki.apache.org/solr/Solrj from the wiki.

Cheers,
   Sven

--On Monday, 8 February 2010 08:53 +0530 Ranveer Kumar 
ranveer.s...@gmail.com wrote:



Hi Sven,
thanks for the reply.

Yes, I noticed that a new instance of the Solr server is created on every
request.
Could you please guide me on how to do the same (initialization to create an
instance of SolrServer once, during the first request)?


On Mon, Feb 8, 2010 at 2:11 AM, Sven Maurmann
sven.maurm...@kippdata.de wrote:


Hi,

would it be possible that you instantiate a new instance of your
SolrServer every time you do a query?

You should use the code you quoted in your mail once during
initialization to create an instance of SolrServer (the interface being
implemented by EmbeddedSolrServer) and subsequently use the query method
of SolrServer to do the query.

Cheers,
   Sven


--On Sunday, 7 February 2010 21:54 +0530 Ranveer Kumar 
ranveer.s...@gmail.com wrote:

 Hi All,


I am still very new to solr.
Currently I am facing a problem using EmbeddedSolrServer.
The following is my code:

File home = new File("D:/ranveer/java/solr_home/solr/first");
CoreContainer coreContainer = new CoreContainer();
SolrConfig config = null;
config = new SolrConfig(home + "/core1", "solrconfig.xml", null);
CoreDescriptor descriptor = new CoreDescriptor(coreContainer, "core1", home + "/core1");
SolrCore core = new SolrCore("core1", home + "/core1/data", config,
    new IndexSchema(config, "schema.xml", null), descriptor);
coreContainer.register(core.getName(), core, true);
final EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

Now my problem is that every time I make a search request, SolrCore
initializes the core again.
I want it so that if the core is already started, the previously started
core is reused.
Due to this problem searching is currently taking too much time.
I tried closing the core after the search, but it's the same thing: when a fresh
search is made, Solr starts from scratch again.

please help..
thanks





Re: Indexing / querying multiple data types

2010-02-08 Thread Sven Maurmann

Hi,

could you be a little more precise about your configuration?
It may be much easier to answer your question then.

Cheers,
Sven

--On Monday, 8 February 2010 17:39 + stefan.ma...@bt.com wrote:


OK - so I've now got my data-config.xml sorted so that I'm pulling in the
expected number of indexed documents for my two data sets.

So I've defined two entities (name1 & name2) and they both make use of
the same fields  --  I'm not sure if this is a good thing to have done.

When I run a query I include qt=name1 (or qt=name2) and am expecting to
only get the number of results from the appropriate data set  --  in fact
I'm getting the sum total from both.

Does the entity name="name1" equate to the query qt=name1?

In my solrconfig.xml I have defined two requestHandlers (name1 & name2)
using the common set of fields.

So how do I ensure that my query
http://localhost:7001/solr/select/?q=food&qt=name1
or
http://localhost:7001/solr/select/?q=food&qt=name2

will operate on the correct data set as loaded via the data import  --
entity name="name1" or entity name="name2"?




Thanks
Stefan Maric
BT Innovate & Design | Collaboration Platform - Customer Innovation
Solutions


Re: Use of solr.ASCIIFoldingFilterFactory

2010-02-07 Thread Sven Maurmann

Hi,

you might have run into an encoding problem. If you use Tomcat as
the container for Solr you should probably consult the following

  http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
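
(The usual fix described there is to set the URI encoding on Tomcat's HTTP
connector in server.xml; a sketch, with the port only as an example:

  <Connector port="8080" protocol="HTTP/1.1"
             connectionTimeout="20000"
             URIEncoding="UTF-8"/>

so that accented characters in the query string are decoded as UTF-8.)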

Cheers,
Sven


--On Friday, 5 February 2010 15:41 +0100 Yann PICHOT ypic...@gmail.com 
wrote:



Hi,

I have defined this type in my schema.xml file:

<fieldType name="text" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Fields definition:

  <fields>
    <field name="id" type="string" indexed="true" stored="true"
           required="true"/>
    <field name="idProd" type="string" indexed="false" stored="false"
           required="false"/>
    <field name="description" type="text" indexed="true" stored="true"
           required="false"/>
    <field name="artiste" type="text" indexed="true" stored="true"
           required="false"/>
    <field name="collection" type="text" indexed="true" stored="true"
           required="false"/>
    <field name="titre" type="text" indexed="true" stored="true"
           required="false"/>
    <field name="all" type="text" indexed="true" stored="true"
           required="false"/>
  </fields>

  <copyField source="description" dest="all"/>
  <copyField source="collection" dest="all"/>
  <copyField source="artiste" dest="all"/>
  <copyField source="titre" dest="all"/>

I have imported my documents with DataImportHandler (my original documents
are in an RDBMS).

I tested this query string on the Solr web application: all:chateau.
Results (content of the field "all"):
  CHATEAU D'AMBOISE
  [CHATEAU EN FRANCE, BABELON]
  ope dvd rene chateau
  CHATEAU DE LA LOIRE
  DE CHATEAU EN CHATEAU ENTRE LA LOIRE ET LE CHER
  [LE CHATEAU AMBULANT, HAYAO MIYAZAKI]
  [Chambres d'hôtes au château, Moreau]
  [ARCHIMEDE, LA VIE DE CHATEAU, KRAHENBUHL]
  [NEUF, NAISSANCE D UN CHATEAU FORT, MACAULAY]
  [ARCHIMEDE, LA VIE DE CHATEAU, KRAHENBUHL]

Now I try this query string: all:château.
No result :(

I don't understand. I thought the second query would return the same results
as the first query, but it is not the case.

I use SOLR 1.4 (Solr Implementation Version: 1.4.0 833479 -
grantingersoll - 2009-11-06 12:33:40).
Java 32 bits : Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
OS : Windows Seven 64 bits

Regards,
--
Yann


Re: Embedded Solr problem

2010-02-07 Thread Sven Maurmann

Hi,

would it be possible that you instantiate a new instance of your SolrServer
every time you do a query?

You should use the code you quoted in your mail once during initialization
to create an instance of SolrServer (the interface being implemented by
EmbeddedSolrServer) and subsequently use the query method of SolrServer to
do the query.

Cheers,
Sven

--On Sunday, 7 February 2010 21:54 +0530 Ranveer Kumar 
ranveer.s...@gmail.com wrote:



Hi All,

I am still very new to solr.
Currently I am facing a problem using EmbeddedSolrServer.
The following is my code:

File home = new File("D:/ranveer/java/solr_home/solr/first");
CoreContainer coreContainer = new CoreContainer();
SolrConfig config = null;
config = new SolrConfig(home + "/core1", "solrconfig.xml", null);
CoreDescriptor descriptor = new CoreDescriptor(coreContainer, "core1", home + "/core1");
SolrCore core = new SolrCore("core1", home + "/core1/data", config,
    new IndexSchema(config, "schema.xml", null), descriptor);
coreContainer.register(core.getName(), core, true);
final EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core1");

Now my problem is that every time I make a search request, SolrCore
initializes the core again.
I want it so that if the core is already started, the previously started
core is reused.
Due to this problem searching is currently taking too much time.
I tried closing the core after the search, but it's the same thing: when a fresh
search is made, Solr starts from scratch again.

please help..
thanks


Re: Basic questions about Solr cost in programming time

2010-01-29 Thread Sven Maurmann

Hi!

Of course the answer depends (as usual) very much on the features
you want to realize. But Solr can be set up very fast. When we created
our first prototype, it took us about a week to get it running with
phonetic search, spell checking, faceting - and even collapsing
(using the famous SOLR-236 patch).

It is definitely very nice that you can do a lot of things using the
available components and only configuring them inside solrconfig.xml
and schema.xml.

And you may well start with the standard distribution.

Cheers,
   Sven

--On Tuesday, 26 January 2010 12:00 -0800 Jeff Crump 
jcr...@hq.mercycorps.org wrote:



Hi,
I hope this message is OK for this list.

I'm looking into search solutions for an intranet site built with Drupal.
Eventually we'd like to scale to enterprise search, which would include
the Drupal site, a document repository, and Jive SBS (collaboration
software). I'm interested in Lucene/Solr because of its scalability,
faceted search and optimization features, and because it is free. Our
problem is that we are a non-profit organization with only three very
busy programmers/sys admins supporting our employees around the world.

To help me argue for Solr in terms of total cost, I'm hoping that members
of this list can share their insights about the following:

* About how many hours of programming did it take you to set up your
instance of Lucene/Solr (not counting time spent on optimization)?

* Are there any disadvantages of going with a certified distribution
rather than the standard distribution?


Thanks and best regards,
Jeff

Jeff Crump
jcr...@hq.mercycorps.org


Re: Solr wiki link broken

2010-01-26 Thread Sven Maurmann

Hi,

you might want to try the link called Frontpage on the generic
wiki page. But well, this seems to be kind of broken for some
locales.

Regards,
 Sven

--On Tuesday, 26 January 2010 01:23 -0500 Teruhiko Kurosaka 
k...@basistech.com wrote:



In
http://lucene.apache.org/solr/
the "wiki" tab and the "Docs (wiki)" hypertext in the side bar (after
expansion) link to http://wiki.apache.org/solr

But the wiki site seems to be broken.  The above link took me to a
generic help page of the Wiki system.

What's going on? Did I just hit the site in a maintenance time?

Kuro



Re: Solr wiki link broken

2010-01-26 Thread Sven Maurmann

Hi Erik,

one observation from me, who is using the wiki from a browser
living in a non-US locale: I usually get the standard wiki
front page (in German) and not (!) the Solr FrontPage that I get
if I use a US locale (or click on the link FrontPage).

By the way, I know that this does not strictly belong on this list.

Cheers,
Sven


--On Tuesday, 26 January 2010 04:05 -0500 Erik Hatcher 
erik.hatc...@gmail.com wrote:



All seems well now.  The wiki does have its flakey moments though.

Erik

On Jan 26, 2010, at 1:23 AM, Teruhiko Kurosaka wrote:


In
http://lucene.apache.org/solr/
the "wiki" tab and the "Docs (wiki)" hypertext in the side bar (after
expansion) link to
http://wiki.apache.org/solr

But the wiki site seems to be broken.  The above link took me to a
generic help page of the Wiki system.

What's going on? Did I just hit the site in a maintenance time?

Kuro




Re: Index gets deleted after commit?

2010-01-25 Thread Sven Maurmann

DIH is the DataImportHandler. Please consult the two URLs

  http://wiki.apache.org/solr/DataImportHandler

and

  http://wiki.apache.org/solr/DataImportHandlerFaq

for further information.
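
(For example, assuming the handler is registered at /dataimport as in the
example configuration, a full import that keeps the existing documents would
be triggered roughly like this:

  http://localhost:8983/solr/dataimport?command=full-import&clean=false

where clean=false is the setting Amit refers to below.)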

Cheers,
Sven

--On Monday, January 25, 2010 11:33:59 AM +0200 Bogdan Vatkov 
bogdan.vat...@gmail.com wrote:



Hi Amit,

What is DIH? (I am a Solr newbie.)
In the meantime I resolved my issue - it was a very stupid one - one
of the files in my folder of XMLs (that I send to Solr with the
SimplePostTool), and actually the latest created one (so it got
executed last each time I ran the folder), contained
delete*:* :)

Best regards,
Bogdan

On Sun, Jan 24, 2010 at 6:25 AM, Amit Nithian anith...@gmail.com
wrote:


Are you using the DIH? If so, did you try setting clean=false in
the URL line? That prevents wiping out the index on load.

On Jan 23, 2010 4:06 PM, Bogdan Vatkov bogdan.vat...@gmail.com
wrote:

After mass upload of docs in Solr I get some REMOVING ALL
DOCUMENTS FROM INDEX without any explanation.

I was running indexing w/ Solr for several weeks now and
everything was ok -
I indexed 22K+ docs using the SimplePostTool
I was first launching

 <delete><query>*:*</query></delete>
 <commit waitFlush="true" waitSearcher="true"/>

then some 22K+ Add...
with a finishing
 <commit waitFlush="true" waitSearcher="true"/>

But you can see from the log - right after the last commit I get
this strange REMOVING ALL...
I do not remember what I changed last but now I have this issue
that after the mass upload of docs the index gets completely
deleted.

why is this happening?


log after the last commit:

INFO: start

commit(optimize=false,waitFlush=true,waitSearcher=true,expungeDele
tes=false) Jan 24, 2010 1:48:24 AM
org.apache.solr.core.SolrDeletionPolicy onCommit INFO:
SolrDeletionPolicy.onCommit: commits:num=2

commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/ind
ex,segFN=segments_fr,version=1260734716752,generation=567,filename
s=[segments_fr]

commit{dir=/store/dev/inst/apache-solr-1.4.0/example/solr/data/ind
ex,segFN=segments_fs,version=1260734716753,generation=568,filename
s=[_gv.nrm, segments_fs, _gv.fdx, _gw.nrm, _gv.tii, _gv.prx,
_gv.tvf, _gv.tis, _gv.tvd, _gv.fdt, _gw.fnm, _gw.tis, _gw.frq,
_gv.fnm, _gw.prx, _gv.tvx, _gw.tii, _gv.frq]
Jan 24, 2010 1:48:24 AM org.apache.solr.core.SolrDeletionPolicy
updateCommits
INFO: newest commit = 1260734716753
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
init INFO: Opening searc...@de26e52 main
Jan 24, 2010 1:48:24 AM
org.apache.solr.update.DirectUpdateHandler2 commit INFO:
end_commit_flush
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming searc...@de26e52 main from
searc...@4e8deb8a main

fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions
=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumu
lative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming result for searc...@de26e52 main

fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions
=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumu
lative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming searc...@de26e52 main from
searc...@4e8deb8a main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,s
ize=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulati
ve_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Jan
24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for searc...@de26e52 main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,s
ize=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulati
ve_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0} Jan
24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming searc...@de26e52 main from searc...@4e8deb8a main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,eviction
s=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cum
ulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming result for searc...@de26e52 main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,eviction
s=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cum
ulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming searc...@de26e52 main from
searc...@4e8deb8a main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0
,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumula
tive_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Jan 24, 2010 1:48:24 AM org.apache.solr.search.SolrIndexSearcher
warm INFO: autowarming result for 

Re: multi field search

2010-01-18 Thread Sven Maurmann

Hi,

you might want to use the Dismax-Handler.
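
A rough sketch in solrconfig.xml (the handler name, fields and boosts are only
placeholders based on the fields you mention):

  <requestHandler name="/multisearch" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">dismax</str>
      <!-- one user query is matched against all of these fields;
           each field keeps its own analyzer -->
      <str name="qf">name^2.0 street^1.0</str>
    </lst>
  </requestHandler>

A query q=foo is then run against every field listed in qf, so you do not need
a catch-all copyField and each field keeps its custom analysis.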

Sven

--On Monday, January 18, 2010 02:58:09 PM +0100 Lukas Kahwe Smith 
m...@pooteeweet.org wrote:



Hi,

I realize that I can copy all fields together into one multiValue
field and set that as the defaultSearchField. However in that case
I cannot leverage the various custom analyzers I want to apply to
the fields separately (name should use double metaphone, street
should use the word splitter etc.). I can of course also do an OR
query as well. But it would be nice to be able to do:

q=*:foo

and that would simply search all fields against the query foo.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org


Re: Fundamental questions of how to build up solr for huge portals

2010-01-16 Thread Sven Maurmann

Hi!

Your question is quite general in nature, therefore here are only a few
initial remarks on how to get started:

If you want to have a global search over all of your portals it might be
best to start with one Solr instance and access it from all the portals.
If you plan to build collections that are specific to one or another portal
you can do so at index time: just mark the indexed object in a dedicated
field of the index.

If you provide query handlers for each of the portals you can control the
behaviour of the search based on the respective portal. You may then use
filter queries to filter results based on the portal.
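
For example, assuming a dedicated field called "portal" that is filled at index
time (the field name is only an illustration), the handler for the first portal
could add

  fq=portal:portal1

as a default to every request, so that each portal only sees its own documents
while the index itself stays shared.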

So much for the server side. For your question about which client (language)
to use:

Since Solr is able to generate responses for a number of client platforms
you may want to consult http://wiki.apache.org/solr/IntegratingSolr for
additional information. I like to use a very lightweight solution using
JavaScript, with the query responses from Solr being delivered via JSON.
Since you can do this also for PHP clients, you might want to give it a
try.

Regards,

Sven


--On Saturday, 16 January 2010 15:16 +0100 Peter zarato...@gmx.net wrote:


Hello!

Our team wants to use Solr for a community portal built up out of 3 or
more sub-portals. We are unsure in which way we should build up the whole
architecture, because we have more than one portal and we want to make
them all connected and searchable by Solr. Could some experts help us with
these questions?

- What's the best way to use Solr to get the best performance for a huge
portal with 5000 users that might expand quickly?
- Which client should we use (Java, PHP, ...)? Right now the portal is almost
entirely PHP/MySQL based. But we want to make Solr as good as it can be in
all ways (performance, accessibility, good programming practice, using the
whole feature set of Lucene - like tagging, faceting and so on...)


We are thankful for every suggestion :)

Thanks,
Peter




--
kippdata informationstechnologie GmbH
Sven Maurmann   Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn
sven.maurm...@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann



Re: Problem with text field in Solr

2010-01-15 Thread Sven Maurmann

Hi,

from a first glance at your configuration it appears that you run into the
following:

You use a wildcard query to query a stemmed term ("aviation" becomes "aviat"
in the index). Now if you provide a wildcard query with the trailing
asterisk as the only wildcard, this wildcard query is rewritten as a
prefix query, which is not (!) stemmed.

Therefore everything seems to be fine for your first two examples (as "avia"
and "aviat" are both prefixes of the stemmed form "aviat"), but the remaining
three queries try to match the prefixes "aviati", "aviatio" and "aviation"
against the stem "aviat" of "aviation" - and fail.

You may want to consult either the Lucene documentation (on the QueryParser
for example) or the appropriate chapters in the excellent book "Lucene in
Action" (LIA) by Hatcher and Gospodnetic.

Hope that helps.

Sven



--On Friday, January 15, 2010 04:15:40 PM +0530 deepak agrawal 
dk.a...@gmail.com wrote:



Hi,

I am using Solr, in which I have a BODY field of type text.
But when I am searching the BODY for a word like "aviation":

when I am searching BODY:avia*     (aviation is coming)
when I am searching BODY:aviat*    (aviation is coming)
when I am searching BODY:aviati*   (aviation is not coming)
when I am searching BODY:aviatio*  (aviation is not coming)
when I am searching BODY:aviation* (aviation is not coming)

Please help me: how can I search these kinds of words
(aviati*, aviatio*, aviation*)?

Below is the detail of how we are using BODY with text.

<field name="BODY" type="text" indexed="true" stored="true"
       multiValued="true" termVectors="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory"
            synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         enablePositionIncrements=true ensures that a 'gap' is
         left to allow for accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

--
DEEPAK AGRAWAL
+91-9379433455
GOOD LUCK.




--
kippdata informationstechnologie GmbH
Sven Maurmann   Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn
sven.maurm...@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann


Re: Need help Migrating to Solr

2010-01-14 Thread Sven Maurmann

Hi,

since we did some kind of migration in a similar situation in the recent
past, I might add some (hopefully helpful) remarks:

If you use a Lucene-based application right now, you might already have
an idea of which fields you want to store in Solr. Since you already do
analysis of fields, it should be easy to identify the necessary analyzers
and filter chains to be configured in the field-type part of the schema.

Once you have the basic definition of the schema, you can start loading
content into Solr. You can inspect the results using the admin web GUI.
I've found the ad hoc query interface and the analysis facility very
helpful for getting an idea of the inner workings.

Of course that is only the very beginning. You should realize that Solr
offers a very powerful mechanism to configure the way queries are
handled (using query handlers ...). The book "Solr 1.4 Enterprise Search
Server" is a very good first step to understanding what you can do with
Solr (refer to Solr's home page for the complete citation).

Sven

--On Thursday, January 14, 2010 08:38:12 AM -0500 Grant Ingersoll 
gsing...@apache.org wrote:



I've done a fair number of migrations, but it's kind of hard to
give generic advice on it.  Specific questions as you dig in would
be best.   I'd probably, at least, just start with a simple schema
that models most of your data and get Solr up and ingesting it.
Then run some queries against it in your browser (no need for
writing client side code yet) then go from there.

-Grant

On Jan 12, 2010, at 11:42 PM, Abin Mathew wrote:


Hi

I am new to the solr technology. We have been using lucene for
handling searching in our web application www.toostep.com which is
a knowledge sharing platform developed in java using Spring MVC
architecture and iBatis as the persistance framework. Now that the
application is getting very complex we have decided to implement
Solr technology over lucene. Anyone having expertise in this area
please give me some guidelines on where to start off and how to
form the schema for Solr.

Thanks and Regards
Abin Mathew


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene:
http://www.lucidimagination.com/search






--
kippdata informationstechnologie GmbH
Sven Maurmann   Tel: 0228 98549 -12
Bornheimer Str. 33a Fax: 0228 98549 -50
D-53111 Bonn
sven.maurm...@kippdata.de

HRB 8018 Amtsgericht Bonn / USt.-IdNr. DE 196 457 417
Geschäftsführer: Dr. Thomas Höfer, Rainer Jung, Sven Maurmann


RE: Problem comitting on 40GB index

2010-01-13 Thread Sven Maurmann

Hi!

Garbage collection is an issue of the underlying JVM. You may use
-XX:+PrintGCDetails as an argument to your JVM in order to collect
details of the garbage collection. If you also use the parameter
-XX:+PrintGCTimeStamps you get the time stamps of the garbage
collection.
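
For example, the JVM running your servlet container could be started roughly
like this (the heap size and the Jetty start.jar are only placeholders for
your actual setup):

  java -Xmx2g -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar start.jar

The GC output then appears in the container's stdout/log, so long pauses can be
correlated with the moments when Solr appears to hang.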

For further information you may want to refer to the paper

http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf

which points you to a few other utilities related to GC.

Best,

Sven Maurmann

--On Wednesday, 13 January 2010 18:03 + Frederico Azeiteiro 
frederico.azeite...@cision.com wrote:



The hanging hasn't happened again since yesterday. I never ran out of space
again. This is still a dev environment, so the number of searches is very
low. Maybe I'm just lucky...

Where can I see the garbage collection info?

-Original Message-
From: Marc Des Garets [mailto:marc.desgar...@192.com]
Sent: Wednesday, 13 January 2010 17:20
To: solr-user@lucene.apache.org
Subject: RE: Problem comitting on 40GB index

Just curious, have you checked if the hanging you are experiencing is not
garbage collection related?

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 January 2010 13:33
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

That's my understanding.. But fortunately disk space is cheap G


On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro 
frederico.azeite...@cision.com wrote:


Sorry, my bad... I replied to a current mailing list message only
changing the subject... Didn't know about this  Hijacking problem.
Will not happen again.

Just to close this issue, if I understand correctly, for an index of
40G I will need, for running an optimize:
- 40G if all activity on the index is stopped
- 80G if the index is being searched
- 120G if the index is being searched and a commit is performed.

Is this correct?

Thanks.
Frederico
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, 12 January 2010 19:18
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

Huh?

On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 : Subject: Problem comitting on 40GB index
 : In-Reply-To: 
 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98a...@mail.gmail.com

 http://people.apache.org/~hossman/#threadhijack
 Thread Hijacking on Mailing Lists

 When starting a new discussion on a mailing list, please do not reply
 to an existing message, instead start a fresh email.  Even if you
 change the subject line of your email, other mail headers still track
 which thread you replied to and your question is hidden in that
 thread and gets less attention.   It makes following discussions in
 the mailing list archives particularly difficult.
 See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



 -Hoss



