Re: date slider
Hi,

I'm not sure if this applies to your use case, but when I was building our faceted search (see http://www.mysecondhome.co.uk/search.html) I at first wanted to do the same: retrieve the minimum and maximum values. But when I did, the few values that were a lot higher than the others made it almost impossible to select a reasonable range. That's why I switched to a fixed range of reasonable values, with the last option being "anything higher". This way the result set is spread out pretty evenly over the length of the slider. If the values over which you want to do range selection don't vary a lot, I think this is the best option; otherwise I guess you'll have to use another solution. Maybe if the values do change a lot, but not very often, you could generate new fixed range values after updating Solr. If you think something like what I've made is useful to you, I'll be happy to answer any questions about how I implemented it.

Regards,
gwk

On 5/16/2010 10:07 PM, Lukas Kahwe Smith wrote:
> On 16.05.2010, at 21:01, Ahmet Arslan iori...@yahoo.com wrote:
>> http://wiki.apache.org/solr/StatsComponent can give you min and max values.
>> Sorry, my bad, I just tested StatsComponent with a tdate field. It is not working for date-typed fields; the wiki says it is for numeric fields.
> ok thx for checking. is my use case really so unusual? i guess i could store a unix timestamp or i just do a fixed range. hmm, if i use facets with a really large gap, will it always give me at least the min and max maybe? will try it out when i get home.
> regards
> Lukas
Re: date slider
Maybe you would like something like this:

lowest value: http://localhost:8983/solr/select?q=*:*&rows=1&fl=date&sort=date%20asc
highest value: http://localhost:8983/solr/select?q=*:*&rows=1&fl=date&sort=date%20desc

Hope this helps,
Péter

----- Original Message ----- From: gwk g...@eyefi.nl To: solr-user@lucene.apache.org Sent: Monday, May 17, 2010 11:04 AM Subject: Re: date slider
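The fixed-range approach gwk describes (evenly spread buckets, with the last one open-ended) can be sketched roughly like this. The bucket boundaries here are invented for illustration; in practice you would derive them from your data and regenerate them after updating Solr:

```python
# Sketch of the fixed-range slider idea: map a value to one of a fixed
# list of buckets, where the last bucket means "anything higher".
# Boundary values are made up for this example.
def bucket_for(value, boundaries):
    """Return the index of the slider bucket that `value` falls into."""
    for i, upper in enumerate(boundaries):
        if value < upper:
            return i
    return len(boundaries)  # open-ended "anything higher" bucket

boundaries = [100, 250, 500, 1000]   # 5 buckets: <100, <250, <500, <1000, 1000+
print(bucket_for(75, boundaries))    # 0
print(bucket_for(5000, boundaries))  # 4
```

The point of the design is that a handful of outliers no longer stretch the slider; they all land in the final bucket.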
Wildcards in Queries
Hi, I'm new to Solr. Can I use wildcards like '*' in my queries? Thanx, Robert
Re: Wildcards in Queries
Yes; you can also use '?' for a single-character wildcard. On Mon, May 17, 2010 at 11:21 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: Hi, i'm new to solr. Can I use wilcard like '*' in my queries? Thanx, Robert
Re: Wildcards in Queries
How can I do that? In that distributed example I can't use wildcards ;-( 2010/5/17 Leonardo Menezes leonardo.menez...@googlemail.com: Yes, also you can use '?' for a single character wild card. On Mon, May 17, 2010 at 11:21 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: Hi, i'm new to solr. Can I use wilcard like '*' in my queries? Thanx, Robert
Re: Wildcards in Queries
http://wiki.apache.org/solr/SolrQuerySyntax On Mon, May 17, 2010 at 11:44 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: How I can do that? I that distribute example I'cant use wildcards ;-( 2010/5/17 Leonardo Menezes leonardo.menez...@googlemail.com: Yes, also you can use '?' for a single character wild card. On Mon, May 17, 2010 at 11:21 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: Hi, i'm new to solr. Can I use wilcard like '*' in my queries? Thanx, Robert
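As a quick illustration of the wildcard semantics the wiki page above describes ('*' matches zero or more characters, '?' matches exactly one), here is a rough Python sketch using the standard-library fnmatch module, which happens to share those two conventions. This is not Solr code, just a demonstration of the matching rules:

```python
from fnmatch import fnmatchcase

# '*' matches zero or more characters, '?' matches exactly one character,
# the same conventions Lucene/Solr wildcard terms use.
print(fnmatchcase("search", "se*"))  # True
print(fnmatchcase("solr", "so?r"))   # True
print(fnmatchcase("soar", "so?r"))   # True
print(fnmatchcase("sor", "so?r"))    # False -- '?' needs exactly one char
```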
Direct hits using Solr
Hi, Is there a way to have Solr return a URL that is not part of the index? We have a need that the search engine return a specific URL for a specific search term, and that result is supposed to be the first result (per Biz) among the result set. The URL is an external URL and there is no intent to index the contents of that site. Any help towards the feasibility of this is greatly appreciated. Thanks, Sai Thumuluri
DIH. behavior after an import. Log, delete table !?
Hello. For my delta-import, I get the IDs which should be updated from an extra table in my database. ... When DIH has finished the delta-import, the table with the IDs needs to be cleared. Can I put a SQL query in the DIH for that? This query should only be sent to the database when the import was successful ... any suggestions ?? thxxx -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-behavior-after-a-import-Log-delete-table-tp823232p823232.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Direct hits using Solr
> We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set.

This part seems like http://wiki.apache.org/solr/QueryElevationComponent

> The URL is an external URL and there is no intent to index contents of that site.

Can you explain in more detail? Even if you don't index the content of that site, you may have to index that URL.
Re: disable caches in real time
Any suggestions? I have thought of having two configurations per server and reloading each one with the appropriate config file, but I would prefer another solution if possible. Thanks, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/5/14 Marco Martinez mmarti...@paradigmatecnologico.com Hi, I want to know if there is any approach to disable caches in a specific core from a multicore server. My situation is the next: I have a multicore server where the core0 will be listen to the queries and other core (core1) that will be replicated from a master server. Once the replication has been done, i will swap the cores. My point is that i want to disable the caches in the core that is in charge of the replication to save memory in the machine. Any suggestions will be appreciated. Thanks in advance, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42
Re: DIH. behavior after an import. Log, delete table !?
> for my Delta-Import, i get the Id's which are should be updatet from an extra table in my database. ... when dih finished the delta-import it's necessary, that the table with the ID's is to delete. can i put a sql query in the DIH for that issue ?

deletedPkQuery (a SQL query) is used in delta-import to delete documents from the Solr index. Is this what you mean?
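For reference, deletedPkQuery is expected to be a SELECT that returns the primary keys of documents deleted since the last import; DIH then removes the matching documents from the index. It is not a place to run DELETE statements against the database. A hypothetical sketch (table and column names invented):

```xml
<!-- Hypothetical DIH entity: deletedPkQuery SELECTs the primary keys of
     rows removed since the last import so DIH can delete the matching
     Solr documents. It does not execute DELETEs against the database. -->
<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT id FROM item_deletions
                        WHERE deleted_at &gt; '${dataimporter.last_index_time}'"/>
```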
Re: DIH. behavior after an import. Log, delete table !?
hm .. no =( I want to delete from a MySQL database, not from my Solr index
Re: DIH. behavior after an import. Log, delete table !?
hm, I think I can use deletedPkQuery, but it doesn't work for me; maybe you can help me. Here is my config:

<entity name="item" pk="id" transformer="script:BoostDoc"
        query="select i.id, i.shop_id, i.is_active, i.shop ..."
        deltaImportQuery="select i.id, i.s ... WHERE ... AND i.id='${dataimporter.delta.update_id}'"
        deltaQuery="SELECT update_id FROM solr_imports"
        deletedPkQuery="DELETE FROM solr_imports WHERE solr_imports.update_id='${dataimporter.item.update_id}'"/>

So, I only want to delete those IDs which were updated. This is my exception:

SCHWERWIEGEND: Delta Import Failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: DELETE FROM solr_imports WHERE solr_imports.update_id='' Processing Document # 1

So the deletedPkQuery gets no IDs =(
Re: DIH. behavior after an import. Log, delete table !?
> hm i think i can use deletedPkQuery but it dont works for me [...]
> deletedPkQuery =DELETE FROM solr_imports WHERE solr_imports.update_id='${dataimporter.item.update_id}'
> [...]
> so the deletedPkQuery get no ID's =(

I am not sure what will happen with this kind of deletedPkQuery. Probably you won't be able to use the ${dataimporter.delta.update_id} variable, but I am also curious what will happen. Can you try this:

deletedPkQuery="DELETE FROM solr_imports WHERE solr_imports.update_id='${dataimporter.delta.update_id}'"

Since your deltaQuery does not contain a WHERE clause, why not delete (with another program or script) the solr_imports table after the delta-import?
Re: DIH. behavior after an import. Log, delete table !?
That's what I try! :D I don't want to do this with another script, because I never know when a delta-import is finished, and when it has completed, I don't know with which result: complete, fail, ?!?!? So I thought DIH could delete the updated IDs in my database =( I also tried to empty the table like this: TRUNCATE TABLE solr_imports. This works, but I get a new exception...
CFP for Lucene Revolution Conference, Boston, MA, October 7-8, 2010
Lucene Revolution Call For Participation - Boston, Massachusetts, October 7-8, 2010

The first US conference dedicated to Apache Lucene and Solr is coming to Boston, October 7-8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercial co-sponsors. The audience will include those experienced in Solr and Lucene application development, along with those experienced in other enterprise search technologies who are interested in becoming more familiar with Solr and Lucene and the opportunities they present. We are soliciting 45-minute presentations for the conference.

Key Dates:
May 12, 2010: Call For Participation opens
June 23, 2010: Call For Participation closes
June 28, 2010: Speaker acceptance/rejection notification
October 5-6, 2010: Lucene and Solr pre-conference training sessions
October 7-8, 2010: Conference sessions

Topics of interest include:
- Lucene and Solr in the Enterprise (case studies, implementation, return on investment, etc.)
- "How We Did It" development case studies
- Spatial/geo search
- Lucene and Solr in the Cloud (deployment cases as well as tutorials)
- Scalability and performance tuning
- Large-scale search
- Real-time search
- Data integration/data management
- Lucene and Solr for mobile applications

All accepted speakers will qualify for discounted conference admission. Financial assistance is available for speakers who qualify. To submit a 45-minute presentation proposal, please send an email to c...@lucenerevolution.org with a Subject line containing your name and your session title, and with the following information in plain text. If you have more than one topic to propose, send a separate email for each. Do not attach Word or other text file documents.
Return all fields completed as follows:

1. Your full name, title, and organization
2. Contact information, including your address, email, and phone number
3. The name of your proposed session (keep your title simple, interesting, and relevant to the topic)
4. A 75-200 word overview of your presentation; in addition to the topic, describe whether your presentation is intended as a tutorial, a description of an implementation, a theoretical/academic discussion, etc.
5. A 100-200 word speaker bio that includes prior conference speaking or related experience

To be considered, proposals must be received by 12 midnight PDT, Wednesday, June 23, 2010. Please email any general questions regarding the conference to i...@lucenerevolution.org. To be added to the conference mailing list, please email sig...@lucenerevolution.org. If your organization is interested in sponsorship opportunities, email spon...@lucenerevolution.org. We look forward to seeing you in Boston!
RE: Direct hits using Solr
How do I index a URL without indexing the content? Basically our requirement is that we have certain search terms for which there needs to be a URL that comes right on top. I tried to use the elevate option within Solr, but from what I know, I need to have an id of the indexed content to elevate a particular URL. Sai Thumuluri -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, May 17, 2010 6:12 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set. This part seems like http://wiki.apache.org/solr/QueryElevationComponent The URL is an external URL and there is no intent to index contents of that site. Can you explain in more detail? Even if you don't index content of that site, you may have to index that URL.
Customized Solr DataImporter
Hi, I want to map my Solr fields using a customized DataImportHandler. For example, I have fields like:

<field column="NAME" name="field1"/>
<field column="NO" name="field2"/>

Actually my column names come dynamically from another table; they vary from client to client. Instead of giving the mapped DB column as 'NAME', I want to configure this dynamically using a customized import handler. Can I use my own DataImportHandler to implement this? Please help me. Thanks in advance
Issues with clustering in multicore
Hi, I was trying out a clustering example, which worked as mentioned in the documentation. Now I want to use the clustering feature in my multicore setup, where I have my core indexes saved. So I edited the solrconfig.xml in that file to add the clustering information (I did make sure that the lib declaration points to the correct location). But when I restart the Solr server for multicore, I get the following exception:

May 17, 2010 7:17:41 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
    at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:551)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.clustering.ClusteringComponent
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:592)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
    ... 35 more

Any pointers?

Regards,
Raakhi
Re: Direct hits using Solr
Sai - this seems to be best built into your application tier above Solr, such that you have a database of special terms and URL mappings and simply present them above the results returned from Solr. Erik http://www.lucidimagination.com On May 17, 2010, at 3:11 PM, sai.thumul...@verizonwireless.com wrote: How do I index an URL without indexing the content? Basically our requirement is that - we have certain search terms for which there need to be a URL that should come right on top. I tried to use elevate option within Solr - but from what I know - I need to have an id of the indexed content for me to elevate a particular URL. Sai Thumuluri -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, May 17, 2010 6:12 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set. This part seems like http://wiki.apache.org/solr/QueryElevationComponent The URL is an external URL and there is no intent to index contents of that site. Can you explain in more detail? Even if you don't index content of that site, you may have to index that URL.
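Erik's application-tier suggestion could look roughly like this. The mapping table and result shapes below are invented for illustration; in practice the term-to-URL table would live in your database:

```python
# Sketch of the application-tier idea: keep special term -> URL mappings
# outside Solr and prepend a matching "direct hit" above Solr's results.
# All names and URLs here are made up.
DIRECT_HITS = {
    "billing": "http://example.com/billing-portal",
    "careers": "http://example.com/jobs",
}

def search_with_direct_hits(query, solr_results):
    """Prepend a direct-hit entry if the query matches a special term."""
    hit = DIRECT_HITS.get(query.strip().lower())
    return ([{"url": hit, "direct": True}] if hit else []) + solr_results

results = search_with_direct_hits("Billing", [{"url": "http://example.com/doc1"}])
print(results[0]["url"])  # http://example.com/billing-portal
```

Because the lookup happens before the Solr results are rendered, the external URL never needs to be indexed at all.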
Re: Solr Search problem; cannot search the existing word in the index content
A couple of things:

1) Try searching with &debugQuery=on attached to your URL; that'll give you some clues.

2) It's really worthwhile to explore the admin pages for a while; they'll also give you a world of information. It takes a while to understand what the various pages are telling you, but you'll come to rely on them.

3) Are you really searching with leading and trailing wildcards, or is that just the mail client changing bolding? Because this is tricky, very tricky. Search the mail archives for "leading wildcard" to see lots of discussion of this topic. You might back off a bit and try building up to wildcards, if that's what you're doing.

HTH
Erick

On Mon, May 17, 2010 at 1:11 AM, Mint o_O! mint@gmail.com wrote: Hi, I'm working on an index/search project recently and I found Solr, which is very fascinating to me. I followed the tutorial page successfully: starting up Jetty and adding new xml (user:~/solr/example/exampledocs$ java -jar post.jar *.xml); so far so good at this stage. Now I have created my own testing westpac.xml file with real data I intend to implement, put it in exampledocs, and again ran the command (user:~/solr/example/exampledocs$ java -jar post.jar westpac.xml). Everything went on very well; however, when I searched for "rhode", which is in the content, the index returned nothing. Could anyone guide me on what I did wrong, and why I couldn't search for that word even though it is in my index content? thanks, Mint
Re: Related terms/combined terms
So, is it possible to search with the TermsComponent and shingles for things like: "Driver Callaway"? The same suggestions should come up as when I search for "Callaway Dri.."
Re: Autosuggest
I have also thought about an autosuggest for our intranet search. One other solution could be: put all the searched queries into a database and do the lookup not on the terms indexed by Solr, but rather on what has been searched in the past. We have written a small script that takes the Solr log, extracts the query and hit count, puts everything into a MySQL database, and then has the autosuggest search these database entries.

markus

-----Original Message-----
From: Blargy [mailto:zman...@hotmail.com]
Sent: Saturday, May 15, 2010 17:45
To: solr-user@lucene.apache.org
Subject: Re: Autosuggest

Maybe I should have phrased it as: Is this ready to be used with Solr 1.4? Also, as Grang asked in the thread, what is the actual status of that patch? Thanks again!
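The log-mining script markus describes could be sketched roughly like this. The log line format and regex here are guesses (adjust them to your container's actual request log), and the MySQL step is replaced by an in-memory counter for illustration:

```python
import re
from collections import Counter

# Rough sketch: pull the q= parameter and hit count out of Solr
# request-log lines and tally popular queries for autosuggest.
# The log format below is invented; real logs will differ.
LOG_LINES = [
    "INFO: [] webapp=/solr path=/select params={q=ipod&rows=10} hits=42 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=ipod&rows=10} hits=42 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={q=camera&rows=10} hits=7 status=0 QTime=5",
]

pattern = re.compile(r"[{&]q=([^&}]+).*?hits=(\d+)")
counts = Counter()
for line in LOG_LINES:
    m = pattern.search(line)
    if m and int(m.group(2)) > 0:  # only suggest queries that had hits
        counts[m.group(1)] += 1

print(counts.most_common(1))  # [('ipod', 2)]
```

In the real setup these counts would be written to the MySQL table that the autosuggest box queries.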
Targeting two fields with the same query or one field gathering contents from both ?
Hey,

let's say I have:
- a field named A with specific contents
- a field named B with specific contents
- a field named C which gathers contents only from A and B, added with copyField

Are these queries equivalent in terms of performance:
- A:(the lazy fox) AND B:(the lazy fox)
- C:(the lazy fox)
??

Thanks,
Xavier
RE: Direct hits using Solr
> How do I index an URL without indexing the content? Basically our requirement is that - we have certain search terms for which there need to be a URL that should come right on top. I tried to use elevate option within Solr - but from what I know - I need to have an id of the indexed content for me to elevate a particular URL.

What does your current schema.xml look like? How many URLs do you have? You can add a new field (let's say named URL) to schema.xml and insert those URLs with some special uniqueKey. Then you can list those uniqueKeys and keywords in elevate.xml.
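A hypothetical sketch of what the elevate.xml entries Ahmet describes could look like. The uniqueKey value and query text here are invented; each external URL would be indexed as a small document with its own uniqueKey, then pinned to the top for the given search term:

```xml
<!-- Hypothetical elevate.xml: the document with uniqueKey "url-1"
     (a small indexed document carrying the external URL) is forced
     to the top of results for the query "annual report". -->
<elevate>
  <query text="annual report">
    <doc id="url-1"/>
  </query>
</elevate>
```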
Re: Customized Solr DataImporter
> Actually my column-names comes dynamically from another table it varies from client to client. instead of giving the Mapped-Db-columns as 'NAME' i want to configure this dynamically using the Customized Import Handler. can i use My Own DataImportHandler To implement this.

Sounds like you can do what you want using dynamic fields, with or without a custom transformer:

http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
http://wiki.apache.org/solr/DIHCustomTransformer
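A minimal sketch of the dynamic-field idea (the pattern name and type are invented): a dynamicField in schema.xml catches columns whose names are not known in advance, and a DIH transformer can rename the per-client columns to match the pattern at import time:

```xml
<!-- Hypothetical schema.xml snippet: any incoming field whose name
     matches client_* is accepted without being declared explicitly,
     so per-client column names can be mapped onto it at import time. -->
<dynamicField name="client_*" type="text" indexed="true" stored="true"/>
```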
Re: Targeting two fields with the same query or one field gathering contents from both ?
On 17/05/2010 16:57, Xavier Schepler wrote:
> Hey, let's say I have : - a field named A with specific contents - a field named B with specific contents - a field named C witch contents only from A and B added with copyField. Are those queries equivalents in terms of performance : - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ?? Thanks, Xavier

I made some tests and it appears that the second query is much faster than the first ...
Re: Targeting two fields with the same query or one field gathering contents from both ?
No, the equivalent for this would be:
- A:(the lazy fox) *OR* B:(the lazy fox)
- C:(the lazy fox)

Imagine the situation where you don't have 'the lazy fox' in B: with AND you get 0 results, although you have 'the lazy fox' in A and C.

Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/5/17 Xavier Schepler xavier.schep...@sciences-po.fr Hey, let's say I have : - a field named A with specific contents - a field named B with specific contents - a field named C witch contents only from A and B added with copyField. Are those queries equivalents in terms of performance : - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ?? Thanks, Xavier
Re: Targeting two fields with the same query or one field gathering contents from both ?
On 17/05/2010 17:49, Marco Martinez wrote:
> No, the equivalent for this will be: - A: (the lazy fox) *OR* B: (the lazy fox) - C: (the lazy fox) Imagine the situation that you dont have in B 'the lazy fox', with the AND you get 0 results although you have 'the lazy fox' in A and C Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/5/17 Xavier Schepler xavier.schep...@sciences-po.fr Hey, let's say I have : - a field named A with specific contents - a field named B with specific contents - a field named C witch contents only from A and B added with copyField. Are those queries equivalents in terms of performance : - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ?? Thanks, Xavier

Yes, you're right, I figured it out after posting.
Re: DIH. behavior after an import. Log, delete table !?
> thats what i try ! :D i dont want to do this with another script, because i never know when a delta-import is finished, and when he is completed, i dont know with which result. complete, fail, ?!?!?

If you are updating your index *only* with DIH, a commit and optimize occur by default after every full/delta import. And you can auto-run your special code after every optimize/commit:

http://wiki.apache.org/solr/SolrConfigXml#A.22Update.22_Related_Event_Listeners

solrconfig.xml:

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/test</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
</listener>

test file:

#!/bin/bash
java -jar /home/search/junk.jar
Re: How to tell which field matched?
In our case, we had specific matching that we needed to return, so I can't really contribute this to the code base, but we did get this working. Basically, we have a custom request handler. After it receives the search results, we send them to our matcher algorithm. We then go through each document in the doc list. Based on the field type we are looking at, we send our input data through the correct analyzer and come up with a TokenStream. And then for each document, we also send each value in the field (for multivalued) through that field's analyzer to also produce a TokenStream. Each TokenStream is also put into a multi-valued HashMap with starting position as the key. We then step through each position to find matches. We use some other hash lists as well to make it more efficient, so that we are only analyzing the same data once. In our case, we were just looking for a score of how similar the index and input data were, as well as some other information that was specific to our application. So, it is not necessarily how Solr/Lucene determined a match, but it provided what we needed for our case; in fact, we did not want exactly how the search results were created. We then return a NamedList similar to how the highlighter or debug works. One warning is that this is a very doable problem, but it is definitely not trivial to implement, depending on your specific requirements. From: Jon Baer jonb...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, May 15, 2010 8:56:57 AM Subject: Re: How to tell which field matched? Sorry, my response wasn't to actually use debugQuery on for production; it was more wondering if it (the component) gave you the insight data you were looking for. On a side note, I'm also interested in this type of component, because there are a number of projects I have worked on recently where it seems people outside of tuning the index want to know "why did my query match these results?" in some sort of plain-English explanation.
I have the feeling what you want is possible; it's just not finding its way into the result set yet (guess), or needs a plugin. - Jon On May 15, 2010, at 11:16 AM, Tim Garton wrote: Additionally, I don't think this gets us what we want with multiValued fields. It tells if a multiValued field matched, but not which value out of the multiple values matched. I am beginning to suspect that this information can't be returned and we may have to restructure our schema. -Tim On Sat, May 15, 2010 at 7:12 AM, Sascha Szott sz...@zib.de wrote: Hi, I'm not sure if debugQuery=on is a feasible solution in a production environment, as generating such extra information requires a reasonable amount of computation. -Sascha Jon Baer wrote: Does the standard debug component (?debugQuery=on) give you what you need? http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22 - Jon On May 14, 2010, at 4:03 PM, Tim Garton wrote: All, I've searched around for help with something we are trying to do and haven't come across much. We are running Solr 1.4. Here is a summary of the issue we are facing. A simplified example of our schema is something like this:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text" indexed="true" stored="true" required="true"/>
<field name="date_posted" type="tdate" indexed="true" stored="true"/>
<field name="supplement_title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="supplement_pdf_url" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="supplement_pdf_text" type="text" indexed="true" stored="true" multiValued="true"/>

When someone does a search, we search across the title, supplement_title, and supplement_pdf_text fields. When we get our results, we would like to be able to tell which field the search matched and, if it's a multiValued field, which of the multiple values matched.
This is so that we can display results similar to:

Example Title
  Example Supplement Title
  Example Supplement Title 2 (your search matched this document)
  Example Supplement Title 3
Example Title 2
  Example Supplement Title 4
  Example Supplement Title 5
  Example Supplement Title 6 (your search matched this document)

etc. How would you recommend doing this? Is there some way to get Solr to tell us which field matched, including multiValued fields? As a workaround we have been using highlighting to tell which field matched, but it doesn't get us what we want for multiValued fields and there is a significant cost to enabling the highlighting. Should we design our schema in some other fashion to achieve these results? Thanks. -Tim
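The client-side re-analysis approach described earlier in the thread can be sketched roughly as follows. This is a toy Python illustration, not the poster's actual Java handler: the regex tokenizer stands in for a real analyzer chain, and the sample field values are invented.

```python
import re

def tokenize(text):
    # Stand-in for a field's analyzer chain: lowercase + word split.
    return set(re.findall(r"\w+", text.lower()))

def matching_values(query, field_values):
    """Return indices of the multivalued entries sharing a term with the query."""
    q_terms = tokenize(query)
    return [i for i, v in enumerate(field_values)
            if q_terms & tokenize(v)]

supplement_titles = [
    "Example Supplement Title",
    "Quarterly Wings Report",      # the only value matching the query "wings"
    "Example Supplement Title 3",
]
print(matching_values("wings", supplement_titles))  # → [1]
```

The real work in a Solr component would be re-running each stored value through the field type's analyzer, but the bookkeeping (which value index matched) is the same idea.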
RE: Wildcards in Queries
Yes, you can. But be careful with queries like *ababa (they might blow up). Also, it depends on how you are analysing the fields. Ankit -Original Message- From: Robert Naczinski [mailto:robert.naczin...@googlemail.com] Sent: Monday, May 17, 2010 5:22 AM To: solr-user@lucene.apache.org Subject: Wildcards in Queries Hi, I'm new to Solr. Can I use wildcards like '*' in my queries? Thanks, Robert
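The reason a leading wildcard like *ababa can blow up is that the term dictionary is sorted by prefix: a trailing wildcard can seek straight to its prefix and read forward, while a leading wildcard must scan every term. A rough Python sketch (the term list is invented; real Lucene term dictionaries work on the same principle, just at much larger scale):

```python
import bisect

terms = sorted(["aba", "ababa", "abacus", "banana", "cab", "rhode", "rhodes"])

def prefix_matches(prefix):
    # Trailing wildcard (aba*): binary-search to the prefix, then read forward.
    i = bisect.bisect_left(terms, prefix)
    out = []
    while i < len(terms) and terms[i].startswith(prefix):
        out.append(terms[i])
        i += 1
    return out

def suffix_matches(suffix):
    # Leading wildcard (*ababa): sort order doesn't help; every term is checked.
    return [t for t in terms if t.endswith(suffix)]

print(prefix_matches("aba"))    # → ['aba', 'ababa', 'abacus']
print(suffix_matches("ababa"))  # → ['ababa']
```

With millions of terms, the full scan in `suffix_matches` is what makes leading wildcards expensive.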
RE: Direct hits using Solr
Thank you Erik, I will follow this route Sai Thumuluri -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Monday, May 17, 2010 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr Sai - this seems to be best built into your application tier above Solr, such that you have a database of special terms and URL mappings and simply present them above the results returned from Solr. Erik http://www.lucidimagination.com On May 17, 2010, at 3:11 PM, sai.thumul...@verizonwireless.com wrote: How do I index an URL without indexing the content? Basically our requirement is that - we have certain search terms for which there need to be a URL that should come right on top. I tried to use elevate option within Solr - but from what I know - I need to have an id of the indexed content for me to elevate a particular URL. Sai Thumuluri -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, May 17, 2010 6:12 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set. This part seems like http://wiki.apache.org/solr/QueryElevationComponent The URL is an external URL and there is no intent to index contents of that site. Can you explain in more detail? Even if you don't index content of that site, you may have to index that URL.
Date faceting and memory leaks
I have been running load testing using JMeter against a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterate over 100 query terms a number of times against the index. The GC does not seem to reclaim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as follows (as part of the appends section in the request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played with the filterCache settings but they do not have any effect, as the date field cache seems to be managed by the Lucene FieldCache. Please help, as I could be struggling with this for days. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
What garbage collection settings are you running at the command line when starting Solr? On May 17, 2010, at 2:41 PM, Yao wrote: I have been running load testing using JMeter on a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterator 100 query terms a number of times against the index. The GC does not seems to claim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as following (as part of append section in request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played for filterCache setting but does not have any effects as the date field cache seems be managed by Lucene FieldCahce. Please help as I can be struggling with this for days. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com. --- Antonio Lobato Symplicity Corporation www.symplicity.com (703) 351-0200 x 8101 alob...@symplicity.com
RE: Date faceting and memory leaks
I do not have any GC-specific settings on the command line. I had tried to force a GC collection via JConsole at the end of the run but it didn't seem to do anything to the heap size. -Yao -Original Message- From: Antonio Lobato [mailto:alob...@symplicity.com] Sent: Monday, May 17, 2010 2:44 PM To: solr-user@lucene.apache.org Subject: Re: Date faceting and memory leaks What garbage collection settings are you running at the command line when starting Solr? On May 17, 2010, at 2:41 PM, Yao wrote: I have been running load testing using JMeter on a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterator 100 query terms a number of times against the index. The GC does not seems to claim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as following (as part of append section in request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played for filterCache setting but does not have any effects as the date field cache seems be managed by Lucene FieldCahce. Please help as I can be struggling with this for days. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com.
--- Antonio Lobato Symplicity Corporation www.symplicity.com (703) 351-0200 x 8101 alob...@symplicity.com
Re: Date faceting and memory leaks
I have ~50 million docs, and use the follow lines without any issues: -XX:MaxNewSize=24m -XX:NewSize=24m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC Perhaps try them out? On May 17, 2010, at 2:47 PM, Ge, Yao (Y.) wrote: I do not have any GC specific setting in command line. I had tried to force GC collection via Jconsole at the end of the run but it didn't seems to do anything the heap size. -Yao -Original Message- From: Antonio Lobato [mailto:alob...@symplicity.com] Sent: Monday, May 17, 2010 2:44 PM To: solr-user@lucene.apache.org Subject: Re: Date faceting and memory leaks What garbage collection settings are you running at the command line when starting Solr? On May 17, 2010, at 2:41 PM, Yao wrote: I have been running load testing using JMeter on a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterator 100 query terms a number of times against the index. The GC does not seems to claim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as following (as part of append section in request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played for filterCache setting but does not have any effects as the date field cache seems be managed by Lucene FieldCahce. Please help as I can be struggling with this for days. Thanks in advance. 
-- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com. --- Antonio Lobato Symplicity Corporation www.symplicity.com (703) 351-0200 x 8101 alob...@symplicity.com
Re: DIH. behavior after a import. Log, delete table !?
oh, nice. So I can make a jar file with the query I need, and in solrconfig.xml I need to define this. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-behavior-after-a-import-Log-delete-table-tp823232p824484.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: StackOverflowError during Delta-Import
Is there any more information I can post so someone can give me a clue on what's happening? -- View this message in context: http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-tp811053p824516.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
No, I still have the OOM issue with repeated facet query requests on the date field. I forgot to mention that I am running a 64-bit IBM 1.5 JVM. I also tried the Sun 1.6 JVM with and without your GC arguments. The GC pattern is different but the heap size does not drop as the test goes on. I tested with a single thread from JMeter just to make sure there is ample room for the GC to clean house. JMeter fires requests one after another without pause, but I assume that should not affect GC. It is clear to me that the date facet query has some major impact here, as I can run the load test with other field facets with no problem (the JVM heap size stabilizes at a certain level over time). -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824577.html Sent from the Solr - User mailing list archive at Nabble.com.
shards design/customization coding question
We have a large index, separated into multiple shards, that consists of records exported from a database. One requirement is to support near real-time synchronization with the database. To accomplish this we are considering creating a daily shard where created and updated documents (records never get deleted) will be posted; at the end of the day, we would empty the daily shard into the other shards and start afresh the next day. The problem with this approach is that when an existing database record is updated into the daily shard, the daily shard contains an updated document whose id duplicates one in another shard. It is my understanding that in the case of duplicate document ids returned from multiple shards, the document returned first is kept in the search results and the other duplicates are discarded. My question is: where can I customize the Solr code to specify that documents from a particular shard should be given precedence in the search results? Any pointers would be very much appreciated.
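Independent of where the merge code lives in Solr, the dedup-with-precedence logic being asked for is itself simple. A hedged Python sketch, where the shard names, document shapes, and the `priority` parameter are all invented for illustration (Solr's actual distributed merge works differently):

```python
def merge_with_precedence(shard_results, priority):
    """Deduplicate docs by id, preferring the shard listed earliest in `priority`."""
    rank = {name: i for i, name in enumerate(priority)}
    best = {}
    for shard, docs in shard_results.items():
        for doc in docs:
            cur = best.get(doc["id"])
            # Keep this doc if we haven't seen the id, or this shard outranks
            # the shard the current copy came from.
            if cur is None or rank[shard] < rank[cur["_shard"]]:
                best[doc["id"]] = {**doc, "_shard": shard}
    return list(best.values())

results = {
    "daily":   [{"id": "42", "title": "updated"}],
    "static1": [{"id": "42", "title": "stale"}, {"id": "7", "title": "other"}],
}
merged = merge_with_precedence(results, priority=["daily", "static1"])
print(sorted((d["id"], d["title"]) for d in merged))
# → [('42', 'updated'), ('7', 'other')]
```

The daily shard wins the duplicate, which is the behavior the question is after.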
SOLR-788
I am looking at SOLR-788, trying to apply it to latest trunk. It looks like that's going to require some rework, because the included constant PURPOSE_GET_MLT_RESULTS conflicts with something added later, PURPOSE_GET_TERMS. How hard would it be to rework this to apply correctly to trunk? Is it simply a matter of advancing the constant to the next bit in the mask? There's been no discussion on the issue as to whether the original patch or the alternate one is better. Does anyone know? Thanks, Shawn
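If the conflict is just two purposes sharing the same bit, then advancing the patch's constant to the next free bit should in principle resolve it. A toy Python sketch with invented flag values (the real constants live in Solr's shard-request code and may differ):

```python
# Hypothetical purpose flags, mirroring how shard-request purposes are
# bitmask constants; the actual values in Solr are assumptions here.
PURPOSE_GET_FIELDS      = 0x1
PURPOSE_GET_HIGHLIGHTS  = 0x2
PURPOSE_GET_TERMS       = 0x4   # added to trunk after the patch was written

PURPOSE_GET_MLT_RESULTS = 0x4   # the patch's constant now collides
assert PURPOSE_GET_MLT_RESULTS == PURPOSE_GET_TERMS  # the conflict

# Rebasing the patch means advancing to the next unused bit:
PURPOSE_GET_MLT_RESULTS = 0x8
purpose = PURPOSE_GET_FIELDS | PURPOSE_GET_MLT_RESULTS
print(bool(purpose & PURPOSE_GET_TERMS))        # → False: flags no longer overlap
print(bool(purpose & PURPOSE_GET_MLT_RESULTS))  # → True
```

Whether the patch needs more than this one-bit shift depends on what else in trunk moved, so treat this as the optimistic case.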
Re: StackOverflowError during Delta-Import
I just found out that if I remove my deletedPkQuery then the import will work. Is it possible that there is some conflict between my delta indexing and my delta deleting? Any suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-tp811053p824780.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
: Subject: Date faceting and memory leaks First off, just to be clear: you don't seem to be using the date faceting feature, you are using the facet query feature; your queries just happen to be on a date field. Second: to help people help you, you need to provide all the details. You've shown us the appends section of your request handler config, but you haven't given us any other details about the queries -- what does the *full* configuration look like for this handler? What do all the test URLs look like? etc... You also haven't given us any other details about your Solr setup. In particular, knowing what your cache configurations look like is crucial. : I have been running load testing using JMeter on a Solr 1.4 index with ~4 : million docs. I notice a steady JVM heap size increase as I iterator 100 : query terms a number of times against the index. The GC does not seems to : claim the heap after the test run is completed. It will run into OutOfMemory Third: how *exactly* are you measuring/monitoring heap size? ... you won't necessarily see the heap decrease in size, even after GC. Fourth: what do your cache sizes (and cache hit rates) look like before/during/after your test run? I ask about this specifically because the queries you have configured don't do any date rounding, which means Solr will attempt to cache a different range query for each of your hard-coded facet.query ranges every millisecond that it receives a request... : str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO : *]/str ...so you might want to consider changing those to things like... str name=facet.query{!ex=last_modified}last_modified:[NOW/DAY-90DAY TO NOW/DAY-30DAY]/str ...if what you care about is day precision. Presumably in your requests you have an fq that is tagged with the name last_modified? (See what I mean about needing all the details; I'm just guessing here based on what I know) ... you'll want that to round down to the start of the day as well.
These unique queries for every millisecond could easily explain getting an OOM if your filterCache is very large (since I don't know how big your filterCache is, or what kind of cache hit rates you are getting, I can only guess). : I have played for filterCache setting but does not have any effects as the : date field cache seems be managed by Lucene FieldCahce. No. A fieldCache is created for each field as needed (mainly for sorting, and in some cases for field term faceting), but for facet.querys like these (and for the corresponding fqs) an entry in the filterCache is created for each unique query. -Hoss
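The rounding advice above can be mechanized: generate the facet.query strings with NOW/DAY so the query text, and hence the filterCache key, stays identical for a whole day instead of changing every millisecond. A Python sketch, with the bucket edges and field name taken from this thread but the helper itself invented:

```python
def facet_queries(field, day_edges, round_to_day=True):
    """Build {!ex}-tagged facet.query range strings for buckets whose
    boundaries sit `day_edges` days in the past (newest edge first)."""
    now = "NOW/DAY" if round_to_day else "NOW"   # NOW/DAY is stable all day
    edges = [f"{now}-{d}DAY" for d in day_edges]
    queries = [f"{{!ex={field}}}{field}:[{edges[0]} TO *]"]
    for newer, older in zip(edges, edges[1:]):
        queries.append(f"{{!ex={field}}}{field}:[{older} TO {newer}]")
    queries.append(f"{{!ex={field}}}{field}:[* TO {edges[-1]}]")
    return queries

for q in facet_queries("last_modified", [30, 90, 180, 365, 730]):
    print(q)
# first line printed: {!ex=last_modified}last_modified:[NOW/DAY-30DAY TO *]
```

With `round_to_day=False` the strings reproduce the original, cache-hostile config from the thread.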
Re: grouping in fq
: Wait. If the default op is OR, I thought this query: : : (+category:xyz +price:[100 TO *]) -category:xyz : : meant with xyz and range, OR without xyz because without a plus or Nope. Regardless of the default op, you've got a BooleanQuery with two clauses, one of which is negative. The other clause is either mandatory because the default op says it should be, or it's mandatory because it's the only SHOULD clause and there are no MUST clauses. Consider it written out a little more simply... (+A +B) -A ...the parens around the A and B clauses make them a BooleanQuery, which we can call X... X -A ...and now hopefully it's clear: A is prohibited, and since there aren't any mandatory clauses and there is only one optional clause, that optional clause (X) is now mandatory ... Since X = (+A +B), that means (+A +B) is mandatory. So we get no matches, because we can't match A and -A at the same time. : minus, OR really means SHOULD (which, bizarrely, is not a keyword). (Yeah, it annoys me that there is no prefix markup for SHOULD ... it wouldn't be so bad except that if you change the default op to MUST there is no way of expressing whole families of queries .. that's why I never recommend making the default op MUST) -Hoss
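The reduction above can be checked mechanically. A small Python sketch treating documents as sets of matched terms (a deliberate simplification of Lucene's BooleanQuery semantics, ignoring scoring):

```python
def x_clause(doc_terms):
    # X = (+A +B): both A and B are required inside the sub-query.
    return "A" in doc_terms and "B" in doc_terms

def top_level(doc_terms):
    # Query: X -A. X is the only positive clause, so it is effectively
    # mandatory; -A prohibits any doc containing A.
    return x_clause(doc_terms) and "A" not in doc_terms

# X requires A while the top level prohibits A, so nothing can ever match:
print(top_level({"A", "B"}), top_level({"B"}))  # → False False
```

Every document either lacks A (and fails X) or has A (and trips the prohibition), which is exactly the "no matches" conclusion.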
Which Solr to use?
I've been investigating Solr on and off as a (or even the) search solution for my employer's content management solution. One of the biggest questions in my mind at this point is which version to go with. In general, 1.4 would seem the obvious choice, as it's the only released version on that list. There's a commercially supported distro from Lucid, and things should presumably be pretty stable. What led me down the rabbit hole is that a) we generally have quite a lot of business documents to index (Word and PDF, mostly), and b) the pull approach implemented in the DataImportHandler is much more attractive in our architecture than the push model we'd otherwise have to construct. Unfortunately, the TikaEntityProcessor and the binary data sources on which it depends were added after 1.4 was released. Back in early March, I was able to get things up and running with a 1.5 nightly (and Tika 0.7-snapshot), but since then the course of Solr development has... changed significantly. The 1.5 branch has been abandoned, and (to my uninformed eye) it seems that there's a lot of upheaval in the trunk as things merge with Lucene. And it also appears that the released Tika 0.7 might not be compatible with Solr? (Judging by SOLR-1902, that is.) What I'm looking for is some advice on which course to pursue: - Plunge ahead with the trunk, and hope that things stabilize by a few months from now, when we'd be hoping to go live on one of our biggest client sites. - Go with the last 1.5 code, knowing that the features we want are in there, and hope we don't run into anything majorly broken. - Stick with 1.4, and just accept the necessity of needing to push content to the HTTP interface. I don't expect a definitive answer, of course, but I'd like to be better informed about the risks and benefits. Also: does anyone have a sense whether it'd be possible to back-port the TikaEntityProcessor stuff to 1.4? Sixten
Re: synonyms not working with copyfield
: fields during indexing. However, my search interface is just a text : box like Google and I need to take the query and return only those : documents that match ALL terms in the query and if I am going to take as mentioned previously in this thread: this is exactly what the dismax QParser was designed for. -Hoss
Re: Date faceting and memory leaks
Chris, Thanks for the detailed response. No I am not using Date Facet but Facet Query as for facet display. Here is the full configuration of my dismax query handler: requestHandler name=dismax class=solr.SearchHandler lst name=defaults str name=defTypedismax/str str name=echoParamsexplicit/str float name=tie0.01/float str name=qf title text^0.5 domain^0.1 nature^0.1 author /str str name=pf title text /str str name=bf recip(ms(NOW,last_modified),3.16e-11,1,1) /str str name=fl url,title,domain,nature,src,last_modified,text,sz /str str name=mm 2lt;-1 5lt;-2 6lt;90% /str int name=ps100/int str name=q.alt*:*/str !-- example highlighter config, enable per-query with hl=true -- str name=hlon/str str name=hl.fltitle,text/str !-- for this field, we want no fragmenting, just highlighting -- str name=f.title.hl.fragsize0/str str name=f.text.hl.snippets3/str !-- instructs Solr to return the field itself if no query terms are found -- str name=f.text.hl.alternateFieldtext/str str name=f.text.h1.maxAlternateFieldLength400/str str name=f.text.hl.fragmenterregex/str !-- defined below -- /lst lst name=appends str name=facet.field{!ex=src}src/str str name=facet.field{!ex=domain}domain/str str name=facet.field{!ex=nature}nature/str str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst /requestHandler Cache settings: filterCache class=solr.LRUCache size=1512000 initialSize=1512000 autowarmCount=1280/ queryResultCache class=solr.LRUCache size=512 initialSize=512 autowarmCount=32/ documentCache class=solr.LRUCache size=512 initialSize=512 autowarmCount=0/ I am 
monitoring the Solr JVM heap memory usage via a remote JConsole; the image below shows how the heap size keeps increasing as more facet query requests are sent to Solr via JMeter: http://n3.nabble.com/file/n825038/memory-1.jpg The following is the request URL pattern: select?rows=0&facet=true&facet.mincount=1&facet.method=enum&q=${query}&qt=dismax where ${query} is selected randomly from a list of 100 query terms. The date rounding suggestion is a very good one; I will need to rerun the test and report back on the cache settings. I remember my filterCache hit ratio is around 0.7. I did use the tagged results for multi-select display of facet values, but in this case there is no fq in the load test request URL. Thanks again, and I will report back on the re-run with date rounding. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825038.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
: Cache settings: : filterCache class=solr.LRUCache size=1512000 initialSize=1512000 : autowarmCount=1280/ That's a monster filterCache ... I can easily imagine it causing an OOM if your heap is only 5G. : The date rounding suggest is a very good one, I will need to rerun the test : and report back on the cache setting. I remember my filterCache hit ratio is : around 0.7. I did use the tagged results for multi-select display of facet A hit ratio of 0.7, or a 0.7% hit rate? ... With that many unique facet queries, I can't imagine you were getting a 70% hit rate. I'm betting that if you monitor that filterCache size and hit rate as you run your test you'll see it just grow and grow until the OOM, and if you analyze the heap dumps you'll probably see the cache hanging on to a ton of DocSets that will never be used again. : values but in this case there is no fq in the load test request URL. I've never tested this, so I can't say for sure, but if it turns out that the filterCache is not your problem, then perhaps there is something wonky with the filter query exclusion code in cases like this -- where you explicitly exclude a tagged fq but that fq doesn't exist. The way to rule it out would be to remove the exclusion from your configs and test it that way to see if the behavior is the same. -Hoss
Re: Date faceting and memory leaks
Chris, Just completed the re-run, and your date rounding tip saved my day. I now realize that NOW as a timestamp is a very bad idea for query caching, as it is never the same value twice. NOW/DAY at least makes a set of facet query cache entries reusable for a period of time. It turns out you were able to help with just the little fraction of information provided. Thanks again! -Yao -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825059.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Date faceting and memory leaks
Just to close the loop. I was fooling around with all the cache settings trying to figure out my problem, so the filterCache was set that way as part of the experiments. It did not cause any memory issue in this case. After the date rounding adjustment, I re-ran the query with 15 threads and 6000 requests and got 1,500/minute throughput while using only a little more than 0.5 GB of heap memory. The hit ratio reported on the Solr admin statistics page shows the filterCache has a hitratio of 0.99: with 103800 lookups and 103773 hits, I assume it is 99%. Have a nice day. -Yao From: Chris Hostetter-3 [via Lucene] [mailto:ml-node+825052-1711725506-201...@n3.nabble.com] Sent: Monday, May 17, 2010 9:04 PM To: Ge, Yao (Y.) Subject: Re: Date faceting and memory leaks : Cache settings: : filterCache class=solr.LRUCache size=1512000 initialSize=1512000 : autowarmCount=1280/ That's a monster filterCache ... I can easily imagine it causing an OOM if your heap is only 5G. : The date rounding suggest is a very good one, I will need to rerun the test : and report back on the cache setting. I remember my filterCache hit ratio is : around 0.7. I did use the tagged results for multi-select display of facet A hit ratio of 0.7, or a 0.7% hit rate? ... With that many unique facet queries, I can't imagine you were getting a 70% hit rate. I'm betting that if you monitor that filterCache size and hit rate as you run your test you'll see it just grow and grow until the OOM, and if you analyze the heap dumps you'll probably see the cache hanging on to a ton of DocSets that will never be used again. : values but in this case there is no fq in the load test request URL. I've never tested this, so I can't say for sure, but if it turns out that the filterCache is not your problem, then perhaps there is something wonky with the filter query exclusion code in cases like this -- where you explicitly exclude a tagged fq but that fq doesn't exist.
The way to rule it out would be to remove the exclusion from your configs and test it that way to see if the behavior is the same. -Hoss View message @ http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825052.html To unsubscribe from Re: Date faceting and memory leaks, click here (link removed). -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825086.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: shards design/customization coding question
On 5/17/2010 2:40 PM, D C wrote: We have a large index, separated into multiple shards, that consists of records exported from a database. One requirement is to support near real-time synchronization with the database. To accomplish this we are considering creating a daily shard where create and update documents (records never get deleted) will be posted and at the end of the day, empty the daily shard into the other shards and start afresh the next day. snip My question is where can I customize the solr code to specify that documents from a particular shard should be given precedence in the search results. Any pointers would be very much appreciated. Quick answer: SOLR-1537. https://issues.apache.org/jira/browse/SOLR-1537 Long answer begins with this: You probably don't need it. This is exactly how we've got our system arranged, which has only been in production for a few weeks now. There are six static shards that contain all but the newest content. Another shard, which we call the incremental, holds the most recent data, currently three weeks. The incremental shard gets updated every two minutes and optimized once an hour. Deletes are run against all of the shards every ten minutes. To avoid unnecessary cache warming, the delete script checks for the presence of the deleted data before actually running the update. Once a night, the incremental index is trimmed to three weeks, with that data being distributed among the other shards, and one static shard gets optimized. We have two unique identifiers in the database for each document. One is an autoincrement field we call did, for document ID. This is the primary key in the database table, but is used only behind the scenes. The other is tag_id, which is the field that a user sees and is the uniqueKey in Solr. When a document is updated, its did will change, but its tag_id will not. 
Deletes from Solr's perspective are handled by did, not tag_id, and when a document is updated, we treat the old did like any other delete. The new document gets added to our incremental shard very quickly, and a little bit later, the old one is deleted from the static shard that contains it. The incremental shard is much smaller than the others, so it responds a lot faster. This means that there's a significant likelihood that it will always take precedence. For reliability reasons in the event of a hardware problem, we did incorporate the patch from SOLR-1537 into our system, which in addition to keeping the index up when a shard goes away, makes the deduplication order explicit. If you go the route you are planning, it is unlikely you'll need this. I have since added load balancing to my setup, so when we upgrade SOLR, this patch will no longer be used. In the absence of a second identifier and SOLR-1537, you could get more deterministic behavior by using the delete mechanism in a slightly different way from mine - add it to your daily/incremental index, then find it in the other shards and delete it. It will mean a cache rewarm when the delete is committed, and I don't know if that will cause problems for your setup. Thanks, Shawn
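The check-before-delete trick described in this reply (query for the doc's presence before issuing the delete, so an untouched shard's caches are not invalidated for nothing) can be sketched as follows. Everything here is invented for illustration; in a real script, `count_fn` would issue a rows=0 Solr query and `delete_fn` would post a delete-by-query and commit.

```python
def delete_if_present(dids, count_fn, delete_fn):
    """Issue a delete only when the docs actually exist, so shards that never
    held them keep their warm caches (no needless commit/invalidation)."""
    query = " OR ".join(f"did:{d}" for d in dids)
    if count_fn(query) > 0:
        delete_fn(query)
        return True
    return False

# Toy stand-ins for a shard holding only did 5:
index = {5}
calls = []
count = lambda q: sum(1 for d in index if f"did:{d}" in q)
delete = lambda q: calls.append(q)

print(delete_if_present([5, 9], count, delete))  # → True: did 5 exists, delete issued
print(delete_if_present([7], count, delete))     # → False: nothing there, cache untouched
```

The payoff is on the shards where `count_fn` returns 0: they never see the delete, so their searchers stay warm.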
Re: Solr Search problem; cannot search the existing word in the index content
Escaping the asterisk with a backslash, i.e. \*rhode, may work. On Mon, May 17, 2010 at 7:23 AM, Erick Erickson erickerick...@gmail.com wrote: A couple of things: 1 Try searching with debugQuery=on attached to your URL; that'll give you some clues. 2 It's really worthwhile exploring the admin pages for a while; they'll also give you a world of information. It takes a while to understand what the various pages are telling you, but you'll come to rely on them. 3 Are you really searching with leading and trailing wildcards, or is that just the mail client's bolding? Because this is tricky, very tricky. Search the mail archives for leading wildcard to see lots of discussion of this topic. You might back off a bit and try building up to wildcards, if that's what you're doing. HTH Erick On Mon, May 17, 2010 at 1:11 AM, Mint o_O! mint@gmail.com wrote: Hi, I've been working on an index/search project recently and I found Solr, which is very fascinating to me. I followed the tutorial successfully: starting up Jetty and adding new xml files (user:~/solr/example/exampledocs$ java -jar post.jar *.xml); so far so good at this stage. Now I have created my own test westpac.xml file with the real data I intend to use, put it in exampledocs, and again ran the command (user:~/solr/example/exampledocs$ java -jar post.jar westpac.xml). Everything went very well; however, when I searched for *rhode*, which is in the content, the index returned nothing. Could anyone guide me as to what I did wrong, and why I couldn't search for that word even though it is in my index content? thanks, Mint -- Lance Norskog goks...@gmail.com
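When the goal is to search for such characters literally rather than as wildcards, the backslash-escaping can be generalized. A hedged Python sketch; the character set below aims to cover the Lucene query syntax specials, similar in spirit to what SolrJ's ClientUtils.escapeQueryChars handles, but verify it against your Solr version:

```python
import re

# Characters with special meaning in the Lucene query syntax
# (assumed set -- check your Solr/Lucene version's query parser docs).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/|&;])')

def escape_query(term):
    """Backslash-escape query syntax characters so they are searched literally."""
    return _SPECIAL.sub(r"\\\1", term)

print(escape_query("*rhode"))  # → \*rhode
print(escape_query("AT&T (UK)"))
```

Note that escaping makes the character literal; if the poster actually wants wildcard behavior, the analysis chain for the field is the thing to investigate instead.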
Re: Customized Solr DataImporter
Thanks for the reply. I don't know in what pattern the user will configure the columns in the separate table; I have to read this table to map the Solr fields to these columns, so I can't use dynamic fields either, and Transformers also seem to be of no use in this case. Please suggest any other solution. -- View this message in context: http://lucene.472066.n3.nabble.com/Customized-Solr-DataImporter-tp823556p825428.html Sent from the Solr - User mailing list archive at Nabble.com.