Re: White space in facet values
You should try fq=Product:"Electric Guitar" (as a phrase). How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar? -- http://jetwick.com open twitter search
Different Results..
Hi All, I am getting different results when I use certain special characters, for example: 1) with the request http://localhost:8080/solr/select?q=erlang!ericson the result obtained is result name=response numFound=1934 start=0 2) with the request http://localhost:8080/solr/select?q=erlang/ericson the result is result name=response numFound=1 start=0 My question is: does Solr treat the two queries differently, and how does it handle !, /, and the other escape characters? Regards, satya
solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan
Re: Different Results..
We need more information about the analyzers and tokenizers of the default search field. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/12/22 satya swaroop satya.yada...@gmail.com Hi All, I am getting different results when I use certain special characters, for example: 1) with the request http://localhost:8080/solr/select?q=erlang!ericson the result obtained is result name=response numFound=1934 start=0 2) with the request http://localhost:8080/solr/select?q=erlang/ericson the result is result name=response numFound=1 start=0 My question is: does Solr treat the two queries differently, and how does it handle !, /, and the other escape characters? Regards, satya
Re: White space in facet values
Try copying the values (with copyField) to a string field. Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/12/22 Peter Karich peat...@yahoo.de You should try fq=Product:"Electric Guitar" How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar? -- http://jetwick.com open twitter search
Item precedence search problem
Hi all, I am using Solr in my web application for search purposes. However, I am having a problem with the default behaviour of Solr search. From my understanding, if I query for a keyword, let's say Laptop, preference is given to result rows having more occurrences of the search keyword Laptop in the field name. This, however, produces undesirable scenarios, for example: 1. I index an item A with name value Sony Laptop. 2. I index another item B with name value: Laptop bags for laptops. 3. I search for the keyword Laptop. According to the default behaviour, precedence would be given to item B since the keyword appears more times in the name field for that item. Also, we do not have anything in the category field with which we can categorize. Can anyone suggest a better approach to sort potential search results? Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Item-precedence-search-problem-tp2130419p2130419.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
Have you investigated 'field collapsing'? I believe it at least covers the 'DISTINCT' part. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: dan sutton danbsut...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Wed, December 22, 2010 1:29:23 AM Subject: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan
Whole RAM used by Solr during optimize
Hello. I have a RAM problem during optimize. When I start a delta or full import, Solr uses only the RAM I allocate to it, e.g. java -Xmx2g -jar start.jar. While Solr is fetching the rows from the database, the RAM usage is fine. But when Solr begins to optimize, it wants all of the available RAM. Why is that? The used RAM jumps sky-high and only 40 MB of the 8 GB is free! How can I limit this? -- View this message in context: http://lucene.472066.n3.nabble.com/hole-RAM-using-by-solr-during-Optimize-tp2130482p2130482.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Item precedence search problem
On Wed, Dec 22, 2010 at 3:09 PM, Hasnain hasn...@hotmail.com wrote: [...] From my understanding, if i query for a keyword, let's say Laptop, preference is given to result rows having more occurrences of the search keyword Laptop in the field name. This, however, is producing undesirable scenarios, for example: 1. I index an item A with name value Sony Laptop. 2. I index another item B with name value: Laptop bags for laptops. 3. I search for the keyword Laptop According to the default behaviour, precedence would be given to item B since the keyword appears more times in the name field for that item. Your question is not clear. How would you like the precedence to work? If you want to ignore term frequency you can override the default similarity class with a custom class, pointing the configuration at the new similarity class at the bottom of schema.xml. Also we do not have anything in the category field with which we can categorize. [...] Sorry, what category field are you talking about? Is this something specific to your schema? Regards, Gora
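To make the similarity override concrete, here is a sketch of the schema.xml wiring, assuming Solr 1.4; the class name com.example.NoTfSimilarity is hypothetical and stands for a class extending Lucene's DefaultSimilarity whose tf() returns 1.0f for any non-zero frequency, so repeated keywords no longer boost a document:

```
<!-- at the bottom of schema.xml, inside <schema>; the class name is hypothetical -->
<similarity class="com.example.NoTfSimilarity"/>
```

The class itself must be compiled and placed on Solr's classpath (e.g. in the core's lib directory) before the core is reloaded.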
Re: Whole RAM used by Solr during optimize
Maybe I set my cache in solrconfig.xml too high? Would that explain why I now see the cache so high on the server? -- View this message in context: http://lucene.472066.n3.nabble.com/hole-RAM-using-by-solr-during-Optimize-tp2130482p2130490.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Item precedence search problem
Hi, First of all, thanks for replying. Secondly, maybe I wasn't clear enough in my original post regarding what was required and what has been implemented. In my schema, I have another field by the name of Category and, for example's sake, let's assume that my application supports only two categories: Computers and Accessories. Now, what I require is a mechanism to assign correct categories to the items during indexing so that this field can be used to better filter the search results. Continuing from the example in my original post, item A would belong to the Computers category and item B would belong to the Accessories category. So then, searching for Laptop would only look for items in the Computers category and return item A only. I would like to point out here that setting the category field manually is not an option since the data might be in the vicinity of thousands of records. I am not asking for an in-depth algorithm; just a high-level design would be sufficient to set me in the right direction. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Item-precedence-search-problem-tp2130419p2130593.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
facet=true&facet.field=field // SELECT count(distinct(field)) fq=field:[* TO *] // WHERE length(field) > 0 q=other_criteriaA&fq=other_criteriaB // AND other_criteria Advantage: you can look into several fields at a time by adding another facet.field. Disadvantage: you get the counts split by the values of that field; fix this via field collapsing / result grouping http://wiki.apache.org/solr/FieldCollapsing or use deduplication: http://wiki.apache.org/solr/Deduplication Regards, Peter. Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan -- http://jetwick.com open twitter search
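If it helps to see the pieces assembled into one request, here is a sketch; the host, core path, and criteria values are placeholders, and facet.limit/rows are additions beyond the parameters discussed in the reply:

```python
from urllib.parse import urlencode

# Placeholder values: field name and criteria are illustrative, not from the thread.
params = [
    ("q", "other_criteriaA"),    # main query            -- AND other_criteria
    ("fq", "other_criteriaB"),   # extra filter          -- AND other_criteria
    ("fq", "field:[* TO *]"),    # field has a value     -- WHERE length(field) > 0
    ("facet", "true"),
    ("facet.field", "field"),    # one count per value   -- count(distinct(field))
    ("facet.limit", "-1"),       # return all distinct values, not just the top N
    ("rows", "0"),               # only counts are wanted, no documents
]
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With rows=0 the response carries only the facet block, so the distinct values can be counted without transferring any documents.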
Glob in fl parameter
Hi, Is there any support for glob in the 'fl' param. This would be very useful in case of retrieving dynamic fields. I have read the wiki for FieldAliasesAndGlobsInParams. Is there any related patch? Thanks for any pointers, Samarth
Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. I set the read, write right to every and each folder, from opt, dev...to the last one, index (just for sure ;) ) - lockType: - single/ simple: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) - native: Cannot create directory: /solr/data/index at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock - the Geronimo log: === 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:07,941 INFO [DirectoryMonitor] Hot deployer notified that an artifact was removed: default/solr2/1293005281314/war 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:14,139 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' 2010-12-22 15:13:18,795 WARN [TomcatModuleBuilder] Web application . does not contain a WEB-INF/geronimo-web.xml deployment plan. This may or may not be a problem, depending on whether you have things like resource references that need to be resolved. You can also give the deployer a separate deployment plan file on the command line. 
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Solr home set to '/opt/dev/config/solr/' 2010-12-22 15:13:19,051 INFO [SolrDispatchFilter] SolrDispatchFilter.init() 2010-12-22 15:13:19,462 INFO [IndexSchema] default search field is text 2010-12-22 15:13:19,463 INFO [IndexSchema] query parser default operator is OR 2010-12-22 15:13:19,464 INFO [IndexSchema] unique key field: id 2010-12-22 15:13:19,490 INFO [JmxMonitoredMap] JMX monitoring is enabled. Adding Solr mbeans to JMX Server: com.sun.jmx.mbeanserver.jmxmbeanser...@144752d 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[]} 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=solr rocks,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]} 2010-12-22 15:13:19,533 WARN [SolrCore] Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) at org.apache.solr.core.SolrCore.init(SolrCore.java:545) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) ... 
2010-12-22 15:13:19,601 INFO [SolrDispatchFilter] SolrDispatchFilter.init() done 2010-12-22 15:13:19,601 INFO [SolrServlet] SolrServlet.init() 2010-12-22 15:13:19,602 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,602 INFO [SolrServlet] SolrServlet.init() done 2010-12-22 15:13:19,606 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,606 INFO [SolrUpdateServlet] SolrUpdateServlet.init() done 2010-12-22 15:13:19,721 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' === With regards, Bac Hoang
Crontab for delta-import
I want to run delta-import from crontab but don't know how. I used a php file in crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The URL I run in the browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard
Re: Crontab for delta-import
Hi, you can use wget if available on your server, e.g. command wget --quiet 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:31, schrieb Ruixiang Zhang: I want to run delta-import in Crontab but don't know how. I used php file in Crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The url I run in browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
Re: Crontab for delta-import
Thanks for your quick reply. I couldn't find the wget on my server. Do you know where it should be located or how I can check if I have it on my server? If not, can I install one? Thanks On Wed, Dec 22, 2010 at 3:38 AM, Stefan Moises moi...@shoptimax.de wrote: Hi, you can use wget if available on your server, e.g. command wget --quiet ' http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:31, schrieb Ruixiang Zhang: I want to run delta-import in Crontab but don't know how. I used php file in Crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The url I run in browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
Re: Crontab for delta-import
Just call wget http://www.somedomain.com on the console to see if it is available... Depends on your distro where it is installed and how to install it... I have mine in /usr/bin/wget Alternatively, use lynx or curl as command, e.g. curl --silent 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:46, schrieb Ruixiang Zhang: Thanks for your quick reply. I couldn't find the wget on my server. Do you know where it should be located or how I can check if I have it on my server? If not, can I install one? Thanks On Wed, Dec 22, 2010 at 3:38 AM, Stefan Moises moi...@shoptimax.de mailto:moi...@shoptimax.de wrote: Hi, you can use wget if available on your server, e.g. command wget --quiet 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' Cheers, Stefan Am 22.12.2010 12:31, schrieb Ruixiang Zhang: I want to run delta-import in Crontab but don't know how. I used php file in Crontab before, like: command: php /home/user/public_html/auto.php I tried: command: /home/user/public_html/solr/apache-solr-1.4.1/example/example-DIH/solr/db/dataimport?command=delta-import It didn't work. The url I run in browser is: http://181.163.64.228:8983/solr/db/dataimport?command=delta-import Thanks Richard -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de mailto:moi...@shoptimax.de http://www.shoptimax.de *** -- *** Stefan Moises Senior Softwareentwickler shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 GF Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
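Put together as a crontab entry (a sketch; the schedule and the curl path are assumptions), the line invokes the URL over HTTP rather than treating it as a filesystem path:

```
# m h dom mon dow command -- run the delta-import every 15 minutes
*/15 * * * * /usr/bin/curl --silent 'http://181.163.64.228:8983/solr/db/dataimport?command=delta-import' > /dev/null 2>&1
```

Redirecting output to /dev/null keeps cron from mailing the DIH response on every run.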
Re: Query performance issue while using EdgeNGram
1) Thanks for this update. I have to use 'WhitespaceTokenizer'. 2) I have to suggest the whole query itself (say, name or title). 3) Could you please let me know if there is a way to find the evicted docs? 4) Yes, we are seeing improvement in the response time if we optimize. But still, for some queries QTime is more than 8 secs. It is a 'Blocker' for us. Could you please suggest anything to reduce the QTime to under 1 sec? -- View this message in context: http://lucene.472066.n3.nabble.com/Query-performance-issue-while-using-EdgeNGram-tp2097056p2130751.html Sent from the Solr - User mailing list archive at Nabble.com.
ZkSolrResourceLoader does not support getConfigDir
Hi, I've got a small problem with DIH on SolrCloud. I have specified my dataSource settings in a separate file, data-config.xml, in the conf folder (the same folder where schema.xml and solrconfig.xml are placed). When I try importing my data from a DB table for indexing I receive the following problem: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode org.apache.solr.common.cloud.ZooKeeperException: ZkSolrResourceLoader does not support getConfigDir() - likely, what you are trying to do is not supported in ZooKeeper mode at org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:97) at org.apache.solr.handler.dataimport.DataImportHandler.getSolrWriter(DataImportHandler.java:282) at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:198) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1329) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:343) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) Regards, Joanna
Re: solrj http client 4
Tried to check out lucene/solr and set up projects and classpath in Eclipse - there seems to be a circular dependency between modules - this is not possible/allowed in a Maven-built project and would require refactoring. Regards, Stevo. On Wed, Dec 8, 2010 at 1:42 PM, Stevo Slavić ssla...@gmail.com wrote: OK, thanks. Can't promise anything, but would love to contribute. First impression of the source code: ant is used as the build tool; wish it was maven. If it was maven then https://issues.apache.org/jira/browse/SOLR-1218 would be trivial or wouldn't exist in the first place. Regards, Stevo. On Wed, Dec 8, 2010 at 10:25 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: SOLR-2020 addresses upgrading to HttpComponents (from HttpClient). I have had no time to work more on it yet, though. I also don't have that much experience with the new version, so any help is much appreciated. Cheers, Chantal On Tue, 2010-12-07 at 18:35 +0100, Yonik Seeley wrote: On Tue, Dec 7, 2010 at 12:32 PM, Stevo Slavić ssla...@gmail.com wrote: Hello solr users and developers, Are there any plans to upgrade the http client dependency in solrj from 3.x to 4.x? I'd certainly be for moving to 4.x (and I think everyone else would too). The issue is that it's not a drop-in replacement, so someone needs to do the work. -Yonik http://www.lucidimagination.com Found this https://issues.apache.org/jira/browse/SOLR-861 ticket - judging by comments in it the upgrade might help fix the issue. I have a project in jar hell, getting different versions of http client as transitive dependencies... Regards, Stevo.
RE: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria
This won't actually give you the number of distinct facet values, but will give you the number of documents matching your conditions. It's more equivalent to SQL without the distinct. There is no way in Solr 1.4 to get the number of distinct facet values. I am not sure about the new features in trunk. From: Peter Karich [peat...@yahoo.de] Sent: Wednesday, December 22, 2010 6:10 AM To: solr-user@lucene.apache.org Subject: Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria facet=true&facet.field=field // SELECT count(distinct(field)) fq=field:[* TO *] // WHERE length(field) > 0 q=other_criteriaA&fq=other_criteriaB // AND other_criteria Advantage: you can look into several fields at a time by adding another facet.field. Disadvantage: you get the counts split by the values of that field; fix this via field collapsing / result grouping http://wiki.apache.org/solr/FieldCollapsing or use deduplication: http://wiki.apache.org/solr/Deduplication Regards, Peter. Hi, Is there a way with faceting or field collapsing to do the SQL equivalent of SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria i.e. I'm only interested in the total count, not the individual records and counts. Cheers, Dan -- http://jetwick.com open twitter search
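One workaround worth noting: if the full value/count list is requested (facet.limit=-1), the distinct count can be derived client-side from the facet response. A sketch, assuming a parsed wt=json response whose facet_fields entry is the flat [value, count, value, count, ...] list Solr returns; the field name and values are invented for illustration:

```python
# Hypothetical fragment of a parsed wt=json facet response.
facet_response = {
    "facet_counts": {
        "facet_fields": {
            # flat alternating list: value, count, value, count, ...
            "field": ["red", 10, "green", 3, "blue", 0],
        }
    }
}

def distinct_count(response, field):
    """Number of facet values with a non-zero count for `field`."""
    flat = response["facet_counts"]["facet_fields"][field]
    counts = flat[1::2]  # every second element is a count
    return sum(1 for c in counts if c > 0)

print(distinct_count(facet_response, "field"))  # -> 2
```

For very high-cardinality fields this means shipping every value over the wire, so it only scales so far.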
Re: Transparent redundancy in Solr
Well, SolrCloud is not yet fully specified for the indexing side - more work remains. But my point is that the architecture for this should be ZK-based. I added a new JIRA issue to flesh out a strategy for SolrCloud-controlled distributed indexing in SOLR-2293. Perhaps you should open a JIRA issue for indexer failover as well. The simplest model would be to promote one of the search slaves to master indexer, as each slave will have an (almost up-to-date) copy of the index. The client should then have a means of getting alerted about the failover and of learning from what timestamp it will need to re-feed content (based on the slave index date). In my opinion it is extremely hard to try to solve some kind of always-in-sync instant failover, and most will not need it either. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 19. des. 2010, at 19.29, Upayavira wrote: Jan, I'd appreciate a little more explanation here. I've explored SolrCloud somewhat, but there are some bits of this architecture I don't yet get. You say, next time an indexer slave pings ZK. What is an indexer slave? Is that the external entity that is posting indexing content? If it is the app that posts to Solr, do you imply it must check with ZK before it can do an HTTP post to Solr? Also, once you do this leader election to switch to an alternative master, are you implying that this new master was once a slave of the original master, and thus has a valid index? I find this interesting, but am still not quite sure how it works exactly. Upayavira On Fri, 17 Dec 2010 10:09 +0100, Jan Høydahl / Cominvent jan@cominvent.com wrote: Hi, I believe the way to go is through ZooKeeper[1], not property files or local hacks. We've already started on this route and it makes sense to let ZK do what it is designed for, such as leader election. When a node starts up, it asks ZK what role it should have and fetches the corresponding configuration. Then it polls ZK regularly to know if the world has changed.
So if a master indexer goes down, ZK will register that as a state-change condition, and next time one of the indexer slaves pings ZK, it may be elected as the new master, and the config in ZK is changed correspondingly, causing all adds to flow to the new master... Then, when the slaves cannot contact their old master, they ask ZK for an update, and retrieve a new value for the master URL. Note also that SolrCloud is implementing load-balancing and sharding as part of the architecture, so often we can skip dedicated LBs. [1] : http://wiki.apache.org/solr/SolrCloud -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 15. des. 2010, at 18.50, Tommaso Teofili wrote: Hi all, me, Upayavira and other guys at Sourcesense have collected some Solr architectural views in the presentation at [1]. For sure one can set up an architecture for failover and resiliency on the search side (search slaves with coordinators and distributed search), but I'd like to ask how you would achieve transparent redundancy in Solr on the indexing side. On slide 13 we put 2 slave backup masters, so if one of the main masters goes down you can switch the slaves' replication to the backup master. First question is: how could it be made automatic? In a previous thread [2] I talked about a possible solution: writing the master url of the slaves in a properties file, so when you have to switch you change that url to the backup master and reload the slave's core, but that is not automatic :-) Any more advanced ideas? Second question: when the main master comes back up, how can it automatically be considered the backup master (since hopefully the backup master has received some indexing requests in the meantime)? Also consider that its index should be wiped and replicated from the new master to ensure index integrity. Looking forward to your feedback, Cheers, Tommaso [1] : http://www.slideshare.net/sourcesense/sharded-solr-setup-with-master [2] : http://markmail.org/thread/vjj5jovbg6evpmpp
RE: White space in facet values
The phrase solution works, as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... actually a lot of characters need to be escaped like this (ampersands and parentheses come to mind)... I assume you already have this indexed as string, not text... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy [mailto:angelf...@yahoo.com] Sent: Wednesday, December 22, 2010 1:11 AM To: solr-user@lucene.apache.org Subject: White space in facet values How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar?
XInclude in multi core
Hi, In a test setup I have a master and slave in the same JVM but in different cores. Of course I'd like to replicate configuration files and include some via XInclude. The problem is the href path; it can't use properties and is relative to the servlet container. Here's the problem: I also replicate solrconfig.xml, so an include of solr/corename/conf/file.xml will not work in the cores I replicate it to, and I can't embed some corename property in the href to make it generic. Anyone know a trick here? Thanks! Cheers, -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: White space in facet values
On Wed, Dec 22, 2010 at 9:53 AM, Dyer, James james.d...@ingrambook.com wrote: The phrase solution works as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... actually a lot of characters need to be escaped like this (amperstands and parenthesis come to mind)... One way to avoid escaping is to use the raw or term query parsers: fq={!raw f=Product}Electric Guitar In 4.0-dev, use {!term} since that will work with field types that need to transform the external representation into the internal one (like numeric fields need to do). http://wiki.apache.org/solr/SolrQuerySyntax -Yonik http://www.lucidimagination.com I assume you already have this indexed as string, not text... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy [mailto:angelf...@yahoo.com] Sent: Wednesday, December 22, 2010 1:11 AM To: solr-user@lucene.apache.org Subject: White space in facet values How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar?
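For clients that build fq strings by hand, the backslash-escaping approach can be automated. A sketch modeled on SolrJ's ClientUtils.escapeQueryChars; the character set below is my reading of the Lucene query-syntax specials, so verify it against the Solr version in use:

```python
# Characters that are special in the Lucene/Solr query syntax
# (backslash itself first; the trailing space is intentional).
SPECIAL = set('\\+-!():^[]"{}~*?|&;/ ')

def escape_query_chars(value):
    """Backslash-escape query-syntax specials, e.g. for fq=Product:<value>."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in value)

print("fq=Product:" + escape_query_chars("Electric Guitar"))
# -> fq=Product:Electric\ Guitar
```

Note this escapes for the query parser only; if the value goes into a URL, it still needs URL-encoding on top.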
Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
What you want to ask? When this problem arises.? Is it when you try to index to solr? What are the commands that you are running? Which version of solr( 1.4.1?). On Wed, Dec 22, 2010 at 5:49 PM, Bac Hoang [via Lucene] ml-node+2130906-265633473-146...@n3.nabble.comml-node%2b2130906-265633473-146...@n3.nabble.com wrote: Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. I set the read, write right to every and each folder, from opt, dev...to the last one, index (just for sure ;) ) - lockType: - single/ simple: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) - native: Cannot create directory: /solr/data/index at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock - the Geronimo log: === 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:07,941 INFO [DirectoryMonitor] Hot deployer notified that an artifact was removed: default/solr2/1293005281314/war 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:14,139 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' 2010-12-22 15:13:18,795 WARN [TomcatModuleBuilder] Web application . 
does not contain a WEB-INF/geronimo-web.xml deployment plan. This may or may not be a problem, depending on whether you have things like resource references that need to be resolved. You can also give the deployer a separate deployment plan file on the command line.
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Solr home set to '/opt/dev/config/solr/'
2010-12-22 15:13:19,051 INFO [SolrDispatchFilter] SolrDispatchFilter.init()
2010-12-22 15:13:19,462 INFO [IndexSchema] default search field is text
2010-12-22 15:13:19,463 INFO [IndexSchema] query parser default operator is OR
2010-12-22 15:13:19,464 INFO [IndexSchema] unique key field: id
2010-12-22 15:13:19,490 INFO [JmxMonitoredMap] JMX monitoring is enabled. Adding Solr mbeans to JMX Server: com.sun.jmx.mbeanserver.jmxmbeanser...@144752d
2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[]}
2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=solr rocks,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]}
2010-12-22 15:13:19,533 WARN [SolrCore] Solr index directory '/solr/data/index' doesn't exist. Creating new index...
2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property
java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397)
at org.apache.solr.core.SolrCore.init(SolrCore.java:545)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83) ...
2010-12-22 15:13:19,601 INFO [SolrDispatchFilter] SolrDispatchFilter.init() done
2010-12-22 15:13:19,601 INFO [SolrServlet] SolrServlet.init()
2010-12-22 15:13:19,602 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,602 INFO [SolrServlet] SolrServlet.init() done
2010-12-22 15:13:19,606 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr
2010-12-22 15:13:19,606 INFO [SolrUpdateServlet] SolrUpdateServlet.init() done
2010-12-22 15:13:19,721 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0'
===
With regards, Bac Hoang -- View message @ http://lucene.472066.n3.nabble.com/Dismax-score-maximu-of-any-one-field-tp2119563p2130906.html
edismax inconsistency -- AND/OR
I'm using SOLR 1.4.1 with SOLR-1553 applied (edismax query parser). I'm experiencing inconsistent behavior with terms grouped in parentheses. Sometimes they are AND'ed and sometimes OR'ed together.
1. q=Title:(life)&defType=edismax -- 285 results
2. q=Title:(hope)&defType=edismax -- 34 results
3. q=Title:(life AND hope)&defType=edismax -- 1 result
4. q=Title:(life OR hope)&defType=edismax -- 318 results
5. q=Title:(life hope)&defType=edismax -- 1 result (life, hope are being AND'ed together)
6. q=Title:(life AND hope) AND Title:(life)&defType=edismax -- 1 result
7. q=Title:(life OR hope) AND Title:(life)&defType=edismax -- 285 results
8. q=Title:(life hope) AND Title:(life)&defType=edismax -- 285 results (life, hope are being OR'ed together)
See how in #5, the two terms get AND'ed, but by adding the additional (nonsense) clause in #8, the first two terms get OR'ed. Is this a feature or a bug? Am I likely doing something wrong? I've tried this both with ...defaultOperator=AND... and ...defaultOperator=OR... I've also tried the two settings with q.op. It seems as if edismax doesn't use these at all. When using the default query parser, I get consistent AND/OR logic as expected. That is, the defaultOperator (or q.op if specified) is always consistently applied. As a workaround, I think I can just always insert the operator (as in examples 6 and 7). However, this is an extra burden on our clients that I'd like to avoid if at all possible. See below for more configuration information. Any ideas are appreciated.
James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

Snippets from schema.xml:

<fieldType name="textStemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
...
<field name="Title" type="textStemmed" indexed="true" stored="true" multiValued="false" omitNorms="true" omitTermFreqAndPositions="false"/>
...
<solrQueryParser defaultOperator="AND"/>
Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
On Wed, Dec 22, 2010 at 4:55 PM, Bac Hoang bac.ho...@axonactive.vn wrote: Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. [...] Shouldn't the dataDir be /opt/dev/config/solr/data? Alternatively, try removing /opt/dev/config/solr/data (please first make sure that you have no critical data there), and restarting Solr. If dataDir is missing, Solr should create it. Regards, Gora
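One clue in the log supports Gora's suggestion: Solr is complaining about '/solr/data/index' (a path from the filesystem root), not the intended /opt/dev/config/solr/data/index. A quick sketch (my own, not from the thread) of a helper that reports where a configured dataDir actually resolves and whether it could be created there:

```python
import os

def diagnose_data_dir(data_dir):
    """Walk up from data_dir to the nearest existing ancestor and report
    whether the index directory exists or could plausibly be created."""
    data_dir = os.path.abspath(data_dir)
    ancestor = data_dir
    while not os.path.exists(ancestor):
        parent = os.path.dirname(ancestor)
        if parent == ancestor:  # reached the filesystem root
            break
        ancestor = parent
    return {
        "data_dir": data_dir,
        "exists": os.path.isdir(data_dir),
        "nearest_existing_ancestor": ancestor,
        "ancestor_writable": os.access(ancestor, os.W_OK),
    }

# On the poster's machine this would likely report '/' as the nearest
# existing ancestor -- i.e. the path is being resolved from the
# filesystem root, not relative to solr.home.
report = diagnose_data_dir("/solr/data/index")
```

If the nearest existing ancestor turns out to be '/', no amount of chmod on /opt/dev/config/solr will help; the dataDir setting itself (relative vs. absolute) is the thing to fix.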
Solr Spellcheker automatically tokenizes on period marks
Hello, My main (full text) index contains the terms www, sometest, com, which is intended and correct. My spellcheck index contains the term www.sometest.com, which is also intended and correct. However, when querying the spellchecker using the query www.sometest.com, I get the suggestion www.www.sometest.com.com, despite the fact that I'm not using a tokenizer that splits on . (period marks) as part of my spellcheck query analyzer. When running the Field Analyzer (in the Solr admin page), I can see that even after the last filter (see below), my term text remains www.sometest.com, which is untokenized, as expected. Any thoughts as to what may be causing this undesired tokenization?

To summarize:
Main index contains: www, sometest, com
Spellcheck index contains: www.sometest.com
Spellcheck query: www.sometest.com
Expected result: (no suggestion)
Actual result: www.www.sometest.com.com

Here is my spellcheck query analyzer:

<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

Thank you in advance; any suggestions are welcome! Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spellcheker-automatically-tokenizes-on-period-marks-tp2131844p2131844.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr query to get results based on the word length (letter count)
Hi, I have a Solr index that has thousands of records; the title is one of the Solr fields, and I would like to query for title values that are less than 50 characters long. Is there a way to construct the Solr query to provide results based on the character length? thank you very much!
Re: Solr Spellcheker automatically tokenizes on period marks
Check the analyzer of the field you defined for queryAnalyzerFieldType, which is configured in the search component. On Wednesday 22 December 2010 16:32:18 Sebastian M wrote: [...] -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
AW: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
Hello Anurag, The specific problem I faced when starting Solr in Geronimo (http://{server}:{port}/solr) is that /solr/data/index could not be found; Solr then tried to create that folder but failed, even though permission is granted. More detail from the log:

Solr index directory '/solr/data/index' doesn't exist. Creating new index...
2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property
java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index

You're right, the Solr I'm using is 1.4.1. Thanks indeed Bac Hoang

-----Original Message----- From: Anurag [mailto:anurag.it.jo...@gmail.com] Sent: Wed 12/22/2010 10:17 PM To: solr-user@lucene.apache.org Subject: Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo

What do you want to ask? When does this problem arise? Is it when you try to index to Solr? What commands are you running? Which version of Solr (1.4.1)? On Wed, Dec 22, 2010 at 5:49 PM, Bac Hoang [via Lucene] ml-node+2130906-265633473-146...@n3.nabble.com wrote: Hello Erick, Could you kindly give a hand on my problem. Any ideas, hints, suggestions are highly appreciated. Many thanks 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2. Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index.
[...]
Re: Duplicate values in multiValued field
In my experience, that should work fine. Facetting in 1.4 works fine on multi-valued fields, and a duplicate value in the multi-valued field shouldn't be a problem. On 12/22/2010 2:31 AM, Andy wrote: If I put duplicate values into a multiValued field, would that cause any issues? For example I have a multiValued field Color. Some of my documents have duplicate values for that field, such as: Green, Red, Blue, Green, Green. Would the above (having 3 duplicate Green) be the same as having the duplicated values of: Green, Red, Blue? Or do I need to clean my data and remove duplicate values before indexing? Thanks.
Re: Solr query to get results based on the word length (letter count)
On Wed, Dec 22, 2010 at 9:06 PM, Giri giriprak...@gmail.com wrote: Hi, I have a Solr index that has thousands of records; the title is one of the Solr fields, and I would like to query for title values that are less than 50 characters long. Is there a way to construct the Solr query to provide results based on the character length? [...] One could write a custom query parser, but if one needed that, would it not be easier to simply index the length of the title value as a separate field? Regards, Gora
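Gora's approach can be done entirely on the client side before documents are posted. A hedged sketch (the field name title_length is my own invention for illustration, and it would need to be declared as an integer field in schema.xml):

```python
def with_title_length(doc):
    """Add a title_length field (character count) to a document dict
    before it is sent to Solr for indexing."""
    enriched = dict(doc)
    enriched["title_length"] = len(doc.get("title", ""))
    return enriched

docs = [
    with_title_length({"id": "1", "title": "Electric Guitar"}),
    with_title_length({"id": "2", "title": "A very long title that goes on well past fifty characters in total"}),
]

# With title_length in the schema, the original question becomes a
# simple range query:
#   q=*:*&fq=title_length:[1 TO 49]
```

The range query then does the "less than 50 characters" filtering at search time with no custom parser.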
Re: Solr Spellcheker automatically tokenizes on period marks
Hi and thanks for your reply, My searchComponent is as such:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  ...
</searchComponent>

And then in my schema.xml, I have:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  ...
</fieldType>

Which is the analyzer I pasted in my original post. So this only confirms that the query term is going through these filters and tokenizer, but none of them splits on period marks. Do you see any possible problems with my setup? Thanks! Sebastian -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Spellcheker-automatically-tokenizes-on-period-marks-tp2131844p2131959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: White space in facet values
Another technique, which works great for facet fq's and avoids the need to worry about escaping, is using the field query parser instead: fq={!field f=Product}Electric Guitar Using the field query parser avoids the need for ANY escaping of your value at all, which is convenient in the faceting case -- you still need to URI-escape (ampersands for instance), but you shouldn't need to escape any Solr special characters like parens or double quotes or anything else, if you've made your string suitable for including in a URI. With the field query parser, there is a lot less to worry about. http://lucene.apache.org/solr/api/org/apache/solr/search/FieldQParserPlugin.html On 12/22/2010 9:53 AM, Dyer, James wrote: The phrase solution works, as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... actually a lot of characters need to be escaped like this (ampersands and parentheses come to mind)... I assume you already have this indexed as string, not text... James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311 -Original Message- From: Andy [mailto:angelf...@yahoo.com] Sent: Wednesday, December 22, 2010 1:11 AM To: solr-user@lucene.apache.org Subject: White space in facet values How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I apply the constraint fq=Product:Electric Guitar?
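If you do go the escaping route, it is easy to get wrong by hand. Here is a small helper (my own sketch, not from the thread) that backslash-escapes the Lucene query-syntax metacharacters James mentions, spaces included:

```python
import re

# Lucene/Solr query-syntax metacharacters ('/' included for safety);
# each occurrence gets a backslash prefix, as in
#   fq=Product:Electric\ Guitar
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_query_value(value):
    """Backslash-escape a raw field value for use inside a Solr query."""
    escaped = _SPECIAL.sub(r"\\\1", value)
    return escaped.replace(" ", "\\ ")

print(escape_query_value("Electric Guitar"))  # Electric\ Guitar
```

The {!field} parser described above sidesteps all of this: fq={!field f=Product}Electric Guitar needs no query-syntax escaping at all, only the normal URL encoding.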
Re: White space in facet values
Huh, does !term in 4.0 mean the same thing as !field in 1.4? What you describe as !term in 4.0-dev is what I understand !field in 1.4 as doing. On 12/22/2010 10:01 AM, Yonik Seeley wrote: On Wed, Dec 22, 2010 at 9:53 AM, Dyer, James james.d...@ingrambook.com wrote: The phrase solution works as does escaping the space with a backslash: fq=Product:Electric\ Guitar ... One way to avoid escaping is to use the raw or term query parsers: fq={!raw f=Product}Electric Guitar In 4.0-dev, use {!term} since that will work with field types that need to transform the external representation into the internal one (like numeric fields need to do). http://wiki.apache.org/solr/SolrQuerySyntax -Yonik http://www.lucidimagination.com [...]
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) 0 AND other_criteria
On Dec 22, 2010, at 09:21 , Jonathan Rochkind wrote: This won't actually give you the number of distinct facet values, but will give you the number of documents matching your conditions. It's more equivalent to SQL without the distinct. There is no way in Solr 1.4 to get the number of distinct facet values. That's not true - the total number of facet values is the distinct number of values in that field. You need to be sure you have facet.limit=-1 (default is 100) to see all values in the response rather than just a page of them though. Erik
Using two request handlers in the same query...
I have two request handlers set up something like this:

<requestHandler name="Keyword_SI" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="qf">Title^130 Features^110 Edition^100 CTBR_SEARCH^90 THEM_SEARCH^80 BSAC_SEARCH1^70</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

<requestHandler name="Title_SI" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <float name="tie">0.01</float>
    <str name="qf">Title^100 Edition^10 Series^1</str>
    <str name="q.alt">*:*</str>
  </lst>
</requestHandler>

Is there any way to use both of these handlers for different parts of the query? I have a case where a user can search by Title, then later search within their results by keyword. I was trying to see if I could do this with local params, but it doesn't seem that you can specify a qt= like this: q={!qt=Title_SI}life If this had worked (but it didn't), I was hoping I could solve my problem like this: qt=Title_SI&q=(life) AND ( _query_:"{!qt=Keyword_SI}faith" ), using the technique found at http://www.lucidimagination.com/blog/2009/03/31/nested-queries-in-solr/ Is there any way to do this? I'm using version 1.4.1 James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311
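As far as I know you indeed can't reference a request handler with qt= inside local params, but the nested-query technique from that blog post can still inline the second handler's parameters directly instead of naming the handler. A hedged sketch (the parameter names kwqf/kw are my own invention, and this assumes that local-param dereferencing with v=$param works with your patched 1.4.1 edismax):

```python
def nested_query_params(title_term, keyword_term):
    """Build request params that AND a title search with a keyword
    search, inlining each handler's qf instead of using qt=."""
    return {
        "defType": "edismax",
        # Title_SI's qf applies to the outer query
        "qf": "Title^100 Edition^10 Series^1",
        # the nested clause pulls its qf and query text from
        # dereferenced params kwqf and kw
        "q": '(%s) AND _query_:"{!edismax qf=$kwqf v=$kw}"' % title_term,
        "kwqf": "Title^130 Features^110 Edition^100 CTBR_SEARCH^90 THEM_SEARCH^80 BSAC_SEARCH1^70",
        "kw": keyword_term,
    }

params = nested_query_params("life", "faith")
```

The duplication of the qf strings on the client is ugly, but it keeps the AND semantics in one request rather than two round trips.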
Configuration option for disableReplication
Hi, I am looking into using a multi-core configuration to allow us to fully rebuild our index while still applying updates. I have two cores: main-core and rebuild-core. I push the whole dataset into the rebuild core, during which time I can happily keep pushing updates into the main-core. Once the rebuild is complete I swap the cores and delete *:* from the rebuild core. This works fine; however there are a couple of edge cases: On server restart Solr needs to remember which core has been swapped in to be the main core. This can be solved by adding the persistent=true attribute to the solr config, however this does require the solr.xml to be writeable. While deploying a new version of our application we overwrite the solr.xml, as the new version could potentially have legitimate changes to the solr.xml that need to be rolled out, again leaving the cores out of sync. My proposed solution is to have the indexing process do some sanity checking at the start of each run, and swap in the correct core if necessary. This works, however there is the potential for the slaves to start replicating the empty index before the correct index is swapped in. To get round this problem I would like to have replication disabled on start up. Removing replicateAfter=startup has this effect, but it would be more future-proof to be able to specify a default for the replicationEnabled field (see SOLR-1175) in the ReplicationHandler, stopping replication until I explicitly turn it on. The change looks fairly simple. What do you think? Francis
RE: solrj http client 4
Stevo, You may be interested in LUCENE-2657 https://issues.apache.org/jira/browse/LUCENE-2657, which provides full POMs for Lucene/Solr trunk. I don't use Eclipse, but I think it can use POMs to bootstrap project configuration. (I know IntelliJ can do this.) Steve -Original Message- From: Stevo Slavić [mailto:ssla...@gmail.com] Sent: Wednesday, December 22, 2010 9:17 AM To: solr-user@lucene.apache.org Subject: Re: solrj http client 4 Tried to check out lucene/solr and set up projects and classpath in Eclipse - there seems to be a circular dependency between modules - this is not possible/allowed in a maven-built project, and would require refactoring. Regards, Stevo. On Wed, Dec 8, 2010 at 1:42 PM, Stevo Slavić ssla...@gmail.com wrote: OK, thanks. Can't promise anything, but would love to contribute. First impression on the source code - ant is used as build tool, wish it was maven. If it was maven then https://issues.apache.org/jira/browse/SOLR-1218 would be trivial or wouldn't exist in the first place. Regards, Stevo. On Wed, Dec 8, 2010 at 10:25 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: SOLR-2020 addresses upgrading to HttpComponents (from HttpClient). I have had no time to work more on it yet, though. I also don't have that much experience with the new version, so any help is much appreciated. Cheers, Chantal On Tue, 2010-12-07 at 18:35 +0100, Yonik Seeley wrote: On Tue, Dec 7, 2010 at 12:32 PM, Stevo Slavić ssla...@gmail.com wrote: Hello solr users and developers, Are there any plans to upgrade the http client dependency in solrj from 3.x to 4.x? I'd certainly be for moving to 4.x (and I think everyone else would too). The issue is that it's not a drop-in replacement, so someone needs to do the work. -Yonik http://www.lucidimagination.com Found this https://issues.apache.org/jira/browse/SOLR-861 ticket - judging by comments in it upgrade might help fix the issue.
I have a project in jar hell, getting different versions of http client as transitive dependency... Regards, Stevo.
Re: Solr query to get results based on the word length (letter count)
No good way. At indexing time, I'd just store the number of chars in the title in a field of its own. You can possibly do that solely in schema.xml with clever use of analyzers and copyField. Solr isn't an rdbms. Best to de-normalize at index time so what you're going to want to query is in the index. On 12/22/2010 10:36 AM, Giri wrote: [...]
Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) 0 AND other_criteria
Well, that's true -- you can get the total number of facet values if you ALSO are willing to get back every facet value in the response. If you've got a hundred thousand or so unique facet values, and what you really want is just the _count_ without ALSO getting back a very large response (and waiting for Solr to construct the very large response), then you're out of luck. But if you're willing to get back all the values in the response too, that'll work, true. On 12/22/2010 11:23 AM, Erik Hatcher wrote: [...]
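To make Erik's and Jonathan's points concrete, here is a sketch (mine, not from the thread) of the client-side counting you would do after requesting facet.field=Product&facet.limit=-1 with Solr's flat JSON facet format; setting facet.mincount=1 as well shrinks the response, otherwise the count > 0 guard below does the same filtering:

```python
def count_distinct_values(facet_counts, field):
    """Count distinct facet values for a field from Solr's flat
    facet_fields response, which alternates value, count, value, count..."""
    pairs = facet_counts["facet_fields"][field]
    counts = pairs[1::2]
    return sum(1 for c in counts if c > 0)

# A response shaped like the one Solr returns (values abridged):
response = {"facet_fields": {"Product": ["Electric Guitar", 10, "Drums", 3, "Kazoo", 0]}}
print(count_distinct_values(response, "Product"))  # 2
```

As Jonathan notes, this still pays the cost of shipping every value over the wire; Solr 1.4 has no server-side distinct count.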
Re: full text search in multiple fields
Hi guys, There's one more thing to get this code to work as I need, I just found out... I'm now using: q=title_search:hort*&defType=lucene as iorixxx suggested. It works well BUT, this query doesn't find results if the title in the DB is Hortus supremus. I tried adding some tokenizers and filters to solve this, what I think is a casing issue, but no luck... below is my code... what am I missing here? Thanks again!

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_dutch.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title" type="text_ws" indexed="true" stored="true"/>
<field name="title_search" type="text" indexed="true" stored="true"/>
<copyField source="title" dest="title_search"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2132659.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: White space in facet values
: Huh, does !term in 4.0 mean the same thing as !field in 1.4? What you : describe as !term in 4.0 dev is what I understand as !field in 1.4 doing. There is a subtle distinction between {!field}, {!raw}, and {!term} which I attempted to explain on slides 26 and 43 in this presentation... http://people.apache.org/~hossman/apachecon2010/facets/ (you can use the HTML controls or print preview to view the notes I had when giving it) The nutshell explanation... when building filter queries from facet constraint values:
* {!field} works in a lot of situations, but if you are using an analyzer on your facet field, there are some edge cases where it won't do what you expect.
* {!raw} is truly raw terms, which works in almost all cases where you are likely using facet.field -- but it's too raw for some field types that use binary term values (like Trie).
* {!term} does exactly what you would expect/want in all cases when your input is a facet constraint. It builds a term query from the human-readable string representation (even if the internal representation is binary).
-Hoss
Re: Duplicate values in multiValued field
: If I put duplicate values into a multiValued field, would that cause any issues? : : For example I have a multiValued field Color. Some of my documents : have duplicate values for that field, such as: Green, Red, Blue, Green, : Green. : : Would the above (having 3 duplicate Green) be the same as having the : duplicated values of: Green, Red, Blue?

they won't be exactly the same: the doc with dup values will have a higher length, so its lengthNorm will be lower; and it will have a higher term frequency for the terms that are duplicated. in short, those documents won't score the same when searching the Color field for any color.

-Hoss
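If the duplicates carry no meaning, one option is to deduplicate the values client-side before indexing; a minimal sketch (the helper is hypothetical, not a Solr feature):

```python
def dedupe_preserve_order(values):
    # drop repeated multiValued entries while keeping first-seen order,
    # so lengthNorm and term frequency are unaffected by duplicates
    seen, out = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            out.append(v)
    return out

print(dedupe_preserve_order(["Green", "Red", "Blue", "Green", "Green"]))
# ['Green', 'Red', 'Blue']
```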
Re: full text search in multiple fields
Did you reindex after you changed your analyzers?

On 12/22/2010 12:57 PM, PeterKerk wrote: [...]
Re: Recap on derived objects in Solr Index, 'schema in a can'
No, one cannot ignore the schema. If you try to add a field not in the schema you get an error. One could, however, use any arbitrary subset of the fields defined in the schema for any particular *document* in the index. Say your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one doc, fields f6-f10 in another, and f1, f4, f9 in another, and so on. The only field(s) that *must* be in a document are the required=true fields. There's no real penalty for omitting fields from particular documents.

This allows you to store special documents that aren't part of normal searches. You could, for instance, use a document to store meta-information about your index that had whatever meaning you wanted in a field(s) that *no* other document had. Your app could then read that special document and make use of that info. Searches on normal documents wouldn't return that doc, etc. You could effectively have N indexes contained in one index where a document in each logical sub-index had fields disjoint from the other logical sub-indexes. Why you'd do something like that rather than use cores is a very good question, but you *could* do it that way... All this is much different from a database, where there are penalties for defining a large number of unused fields. Whether doing this is wise or not given the particular problem you're trying to solve is another discussion <G>..

Best
Erick

On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Based on more searches and manual consolidation, I've put together a summary below of the ideas already suggested for this. The last item in the summary seems to be an interesting, low-technical-cost way of doing it. Basically, it treats the index like a 'BigTable', a la NoSQL.

Erick Erickson pointed out: "...but there's absolutely no requirement that all documents in SOLR have the same fields..." I guess I don't have the right understanding of what goes into a Document in Solr. Is it just a set of fields, each with its own independent field type declaration/id, its name, and its content? So even though there's a schema for an index, one could ignore it and just throw any other named fields, types, and content at document addition time? So if I wanted to search on a base set, all documents having it, I could then additionally filter based on the (might be wrong use of this term) dynamic fields?

Original thread that I started: http://lucene.472066.n3.nabble.com/A-schema-inside-a-Solr-Schema-Schema-in-a-can-tt2103260.html

Repeat of the problem (not actual ratios/numbers, i.e. could be WORSE!):
1/ Base object of some kind, x number of fields
2/ Derived objects representing divisions in a company, different customer bases, etc., each having 2 additional, unique fields.
3/ Assume 1000 such derived object types
4/ A 'flattened' index would have the x base object fields, and 2000 additional fields

Solutions posited:
A/ First thought, multi-value columns as key pairs.
1/ Difficult to access individual items of more than one 'word' length for querying in multivalued fields.
2/ All sorts of statistical stuff probably wouldn't apply?
3/ (James Dyer said:) "There's also one gotcha we've experienced when searching across multi-valued fields: SOLR will match across field occurrences. In the example below, if you were to search q=contrib_name:(james AND smith), you will get this record back. It matches one name from one contributor and another name from a different contributor. This is not what our users want. As a work-around, I am converting these to phrase queries with slop: james smith~50 ... Just use a slop # smaller than your positionIncrementGap and bigger than the # of terms entered. This will prevent the cross-field matches yet allow the words to occur in any order." The problem with this approach is that Lucene doesn't support wildcards in phrases.
B/ Dynamic fields were suggested, but I am not sure exactly how they work, and the person who suggested it was not sure it would work, either.
C/ Different field naming conventions were suggested where field types were similar. I can't predict that.
D/ Found this old thread, and it had other suggestions:
1/ Use multiple cores, one for each record type/schema, and aggregate them during the query.
2/ Use a fixed number of additional fields X 2. Each additional field is actually a pair of fields. The first
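For option B, a sketch of how derived-object attributes could be mapped onto dynamic-field names by type suffix (this assumes the stock *_s/*_i/*_f/*_b dynamicField patterns from the example schema; the helper itself is hypothetical):

```python
# map Python types to the conventional Solr dynamic-field suffixes
SUFFIX = {str: "_s", int: "_i", float: "_f", bool: "_b"}

def to_dynamic_fields(base_doc, extra_attrs):
    # copy the fixed base-object fields, then add each derived-object
    # attribute under a dynamic-field name chosen by its value's type
    doc = dict(base_doc)
    for name, value in extra_attrs.items():
        doc[name + SUFFIX[type(value)]] = value
    return doc

print(to_dynamic_fields({"id": "1"}, {"division": "EMEA", "seats": 40}))
# {'id': '1', 'division_s': 'EMEA', 'seats_i': 40}
```

Documents built this way share the base fields but can each carry a different set of derived fields, without any schema change per derived type.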
Re: Consequences for using multivalued on all fields
PositionIncrementGap for multiValued fields is, perhaps, the most interesting difference. One of the drivers here is, say, indexing across some boundary that you don't want phrases or near clauses to match. For instance, say you have text with sentences, and your requirement is that phrases don't match across sentence boundaries. One way to handle that is to add successive sentences to a multivalued field and define that field with a large increment gap. But otherwise, as far as I know, there's no difference worth mentioning between indexing a bunch of stuff as one long string or breaking it up into multiple segments in a multivalued field with the increment gap set to 1, except for edge cases like the sorting thing Geert-Jan mentions.

Best
Erick

On Tue, Dec 21, 2010 at 12:49 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Thank you for the input. You might have seen my posts about doing a flexible schema for derived objects. Sounds like dynamic fields might be the ticket. We'll be ready to test the idea in about a month, maybe 3 weeks. I'll post a comment about it when it gets there. I don't know if I would gain anything, but I think that ALL booleans that were NOT in the base object but were in the derived objects could be put into one field as textually positioned key:pairs, at least for search purposes. Since the derived object would have its own additional methods, one of those methods could be to 'unserialize' the 'boolean column'. In fact, that could be a base object function - empty boolean column values just end up not populating any extra base object attributes.

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
- Original Message From: kenf_nc ken.fos...@realestate.com To: solr-user@lucene.apache.org Sent: Tue, December 21, 2010 6:07:51 AM Subject: Re: Consequences for using multivalued on all fields I have about 30 million documents and with the exception of the Unique ID, Type and a couple of date fields, every document is made of dynamic fields. Now, I only have maybe 1 in 5 being multi-value, but search and facet performance doesn't look appreciably different from a fixed schema solution. I don't do some of the fancier things, highlighting, spell check, etc. And I use a lot more string or lowercase field types than I do Text (so not as many fully tokenized fields), that probably helps with performance. The only disadvantage I know of is dealing with field names at runtime. Depending on your architecture, you don't really know what your document looks like until you have it in a result set. For what I'm doing, that isn't a problem. -- View this message in context: http://lucene.472066.n3.nabble.com/Consequences-for-using-multivalued-on-all-fields-tp2125867p2126120.html Sent from the Solr - User mailing list archive at Nabble.com.
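Erick's sentence-boundary example can be illustrated with a rough model of how token positions are assigned across multiValued entries (simplified: one position per whitespace token, with the gap applied at each value boundary):

```python
def token_positions(values, gap):
    # simplified model of Lucene position assignment: within a value each
    # token advances by 1; the first token of each later value instead
    # advances by the field's positionIncrementGap
    out, pos, first = [], 0, True
    for value in values:
        for j, tok in enumerate(value.split()):
            if first:
                first = False
            elif j == 0:
                pos += gap
            else:
                pos += 1
            out.append((tok, pos))
    return out

print(token_positions(["I saw red", "She ran fast"], 100))
# [('I', 0), ('saw', 1), ('red', 2), ('She', 102), ('ran', 103), ('fast', 104)]
```

With a gap of 100, a phrase or near query spanning "red She" would need a slop of at least 100 to match, which is why a large gap keeps phrases from matching across sentence boundaries.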
Re: Recap on derived objects in Solr Index, 'schema in a can'
I'm open to cores, if it's the faster (indexing/querying/keeping mentally straight) way to do things. But from what you say below, the eventual goal of the site would mean either 100 extra 'generic' fields, or 1,000-100,000's of cores. Probably cores are easier to administer for security and do more accurate querying? What is the relationship between dynamic fields and the schema?

Dennis Gearon

- Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, December 22, 2010 10:44:27 AM Subject: Re: Recap on derived objects in Solr Index, 'schema in a can' [...]
Re: full text search in multiple fields
Certainly did! Why, are you saying this code is correct as-is? -- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2133022.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Case Insensitive sorting while preserving case during faceted search
: Hoss, I think the use case being asked about is specifically doing a : facet.sort though, for cases where you actually do want to sort facet values : with facet.sort, not sort records -- while still presenting the facet values : with original case, but sorting them case insensitively.

Ah yes ... thank you, i did in fact misunderstand the question.

: Because I'm pretty sure there isn't really any good solution for this, Solr : just won't do that, just how it goes.

correct. the facet constraint values come from indexed terms, and the terms are what get sorted by facet.sort -- if you want to collapse some terms down so they are equivalent (ie: Foo and foo are treated identically) then that's what you get back. if your goal is just to have pretty values, you can use things like the CapitalizationFilter, but if you need a particularly complex analyzer for your values in order for them to sort a certain way, you can't then get back the original pre-analyzed values. One way people deal with this type of situation is to index identifiers for their facet constraints, and then their UI uses those ids to look up the display value (ie: index categoryId, display categoryName) ... this has the added benefit of allowing you to change category names w/o re-indexing.

-Hoss
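Hoss's id-to-label approach might look like this on the front end (the lookup table and helper are illustrative, not a Solr API):

```python
# hypothetical categoryId -> display-name lookup maintained by the app
CATEGORY_LABELS = {"cat1": "Rock", "cat2": "Jazz"}

def label_facet_counts(facet_counts, labels):
    # facet_counts: (indexed id, count) pairs as returned for a facet.field;
    # map each id to its display label, falling back to the raw id
    return [(labels.get(term, term), count) for term, count in facet_counts]

print(label_facet_counts([("cat1", 12), ("cat2", 7)], CATEGORY_LABELS))
# [('Rock', 12), ('Jazz', 7)]
```

Sorting and renaming then happen entirely in the UI layer, so the indexed terms never need to change.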
Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo
The problem may be that the index folder could not be created. Try checking the conf folder where solrconfig.xml and schema.xml reside. Also, you may try to index using $ java -jar post.jar *.xml. You may try a different version like 1.3.0 or 1.4.0 to test what is wrong. It sometimes happens that the downloaded Solr may have something missing.

On Wed, Dec 22, 2010 at 9:18 PM, Bac Hoang [via Lucene] ml-node+2131930-846132511-146...@n3.nabble.com wrote:

Hello Anurag, The specific problem I faced when starting Solr in Geronimo (http://{server}:{port}/solr) is that /solr/data/index could not be found; Solr then tried to create that folder but failed, even though permission is granted. More detail from the log:

Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2010-12-22 15:13:19,599 ERROR [SolrDispatchFilter] Could not start SOLR. Check solr/home property java.lang.RuntimeException: java.io.IOException: Cannot create directory: /solr/data/index

You're right, I'm using Solr 1.4.1. Thanks indeed. Bac Hoang

-Original Message- From: Anurag [hidden email] Sent: Wed 12/22/2010 10:17 PM To: [hidden email] Subject: Re: Solr index directory '/solr/data/index' doesn't exist. Creating new index... on Geronimo

What do you want to ask? When does this problem arise? Is it when you try to index to Solr? What are the commands that you are running? Which version of Solr (1.4.1?).

On Wed, Dec 22, 2010 at 5:49 PM, Bac Hoang [via Lucene] [hidden email] wrote:

Hello Erick, Could you kindly give a hand with my problem? Any ideas, hints, suggestions are highly appreciated. Many thanks. 1. The problem: Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2.
Some other info.: - use the solr example 1.4.1 - Geronimo 2.1.6 - solr home: /opt/dev/config/solr - dataDir: /opt/dev/config/solr/data/index. I set the read, write right to every and each folder, from opt, dev...to the last one, index (just for sure ;) ) - lockType: - single/ simple: Cannot create directory: /solr/data/index at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:397) - native: Cannot create directory: /solr/data/index at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock - the Geronimo log: === 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:03,001 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:07,941 INFO [DirectoryMonitor] Hot deployer notified that an artifact was removed: default/solr2/1293005281314/war 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:09,148 INFO [SupportedModesServiceImpl] Portlet mode 'help' not found for portletId: '/console-base.WARModules!874780194|0' 2010-12-22 15:13:14,139 INFO [SupportedModesServiceImpl] Portlet mode 'edit' not found for portletId: '/plugin.Deployment!227983155|0' 2010-12-22 15:13:18,795 WARN [TomcatModuleBuilder] Web application . does not contain a WEB-INF/geronimo-web.xml deployment plan. This may or may not be a problem, depending on whether you have things like resource references that need to be resolved. You can also give the deployer a separate deployment plan file on the command line. 
2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Using JNDI solr.home: /opt/dev/config/solr 2010-12-22 15:13:19,040 INFO [SolrResourceLoader] Solr home set to '/opt/dev/config/solr/' 2010-12-22 15:13:19,051 INFO [SolrDispatchFilter] SolrDispatchFilter.init() 2010-12-22 15:13:19,462 INFO [IndexSchema] default search field is text 2010-12-22 15:13:19,463 INFO [IndexSchema] query parser default operator is OR 2010-12-22 15:13:19,464 INFO [IndexSchema] unique key field: id 2010-12-22 15:13:19,490 INFO [JmxMonitoredMap] JMX monitoring is enabled. Adding Solr mbeans to JMX Server: com.sun.jmx.mbeanserver.jmxmbeanser...@144752d 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[]} 2010-12-22 15:13:19,525 INFO [SolrCore] Added SolrEventListener: org.apache.solr.core.QuerySenderListener{queries=[{q=solr rocks,start=0,rows=10}, {q=static firstSearcher warming query from solrconfig.xml}]} 2010-12-22 15:13:19,533 WARN [SolrCore] Solr index directory '/solr/data/index' doesn't exist. Creating new index... 2010-12-22 15:13:19,599
Re: Configuration option for disableReplication
I've just done a bit of playing here, because I've spent a lot of time reading the SolrReplication wiki page[1], and have often wondered how some features interact. Unfortunately, if you specify str name=enablefalse/str in your replication request handler for your master, you cannot re-enable it with a call to /solr/replication?command=enablereplication Therefore, it would seem your best bet is to call /solr/replication?command=disablepolling on all of your slaves prior to upgrading. Then, when you're sure everything is right, call /solr/replication?command=enablepolling on each slave, and you should be good to go. I tried this, watching the request log on my master, and the incoming replication requests did actually stop due to the disablepolling command, so you should be fine with this approach. Does this get you to where you want to be? Upayavira On Wed, 22 Dec 2010 17:10 +, Francis Rhys-Jones francis.rhys-jo...@guardian.co.uk wrote: Hi, I am looking into using a multi core configuration to allow us to fully rebuild our index while still applying updates. I have two cores main-core and rebuild-core. I push the whole dataset into the rebuild core, during which time I can happily keep pushing updates into the main-core. Once the rebuild is complete I swap the cores and delete *:* from the rebuild core. This works fine however there are a couple of edge cases: On server restart solr needs to remember which core has been swapped in to be the main core, this can be solved by adding the persistent=true attribute to the solr config, however this does require the solr.xml to be writeable. While deploying a new version of our application we overwrite the solr.xml, as the new version could potentially have legitimate changes to the solr.xml that need to be rolled out, again leaving the cores out of sync. My proposed solution is to have the indexing process do some sanity checking at the start of each run, and swap in the correct core if necessary. 
This works; however, there is the potential for the slaves to start replicating the empty index before the correct index is swapped in. To get round this problem I would like to have replication disabled on startup. Removing replicateAfter=startup has this effect, but it would be more future-proof to be able to specify a default for the replicationEnabled field (see SOLR-1175) in the ReplicationHandler, stopping replication until I explicitly turn it on. The change looks fairly simple.

--- Enterprise Search Consultant at Sourcesense UK, Making Sense of Open Source
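Upayavira's disablepolling/enablepolling sequence could be scripted; a rough sketch that only builds the ReplicationHandler command URLs (the helper and slave host names are made up; actually issuing the requests is left to urllib or curl):

```python
def replication_url(host, command):
    # ReplicationHandler accepts commands such as disablepolling / enablepolling
    return "http://%s/solr/replication?command=%s" % (host, command)

slaves = ["slave1:8983", "slave2:8983"]  # hypothetical slave hosts
for slave in slaves:
    print(replication_url(slave, "disablepolling"))
# ...upgrade the master here, then issue enablepolling the same way
```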
Re: Recap on derived objects in Solr Index, 'schema in a can'
A dynamic field just means that the schema allows any field with a name matching the wildcard. That's all. There is no support for referring to all of the existing fields in the wildcard. That is, there is no support for *_en:word as a field search. Nor is there any kind of grouping for facets. The feature for addressing a particular field in some of the parameters does not support wildcards. If you add wildcard fields, you have to remember what they are.

On Wed, Dec 22, 2010 at 11:04 AM, Dennis Gearon gear...@sbcglobal.net wrote: [...]
Re: edismax inconsistency -- AND/OR
On 12/22/2010 8:25 AM, Dyer, James wrote:

I'm using SOLR 1.4.1 with SOLR-1553 applied (edismax query parser). I'm experiencing inconsistent behavior with terms grouped in parentheses. Sometimes they are AND'ed and sometimes OR'ed together.

1. q=Title:(life)&defType=edismax -- 285 results
2. q=Title:(hope)&defType=edismax -- 34 results
3. q=Title:(life AND hope)&defType=edismax -- 1 result
4. q=Title:(life OR hope)&defType=edismax -- 318 results
5. q=Title:(life hope)&defType=edismax -- 1 result (life, hope are being AND'ed together)
6. q=Title:(life AND hope) AND Title:(life)&defType=edismax -- 1 result
7. q=Title:(life OR hope) AND Title:(life)&defType=edismax -- 285 results
8. q=Title:(life hope) AND Title:(life)&defType=edismax -- 285 results (life, hope are being OR'ed together)

See how in #5 the two terms get AND'ed, but by adding the additional (nonsense) clause in #8, the first two terms get OR'ed. Is this a feature or a bug? Am I likely doing something wrong?

The dismax parser doesn't pay any attention to the default query operator, and in the absence of these values in the actual query, edismax likely doesn't either. What matters is the value of the mm parameter, also known as minimum 'should' match. If your mm value is 50%, which is a common value to see in dismax examples, I believe it would behave exactly as you are seeing. This is a complex little beast. Just a couple of weeks ago, Chris Hostetter said that although he wrote the code and the syntax for mm, the explanation of the parameter in the Smiley and Pugh Solr book (pages 138-140) is the clearest he's ever seen. Here's some detailed documentation on it; I can't find my copy of the book right now, so I don't know if this is as good as what's in it: http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html

Hopefully this is applicable to you, and not something you already thought of!

Shawn
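A toy illustration of how an mm value collapses to a required-clause count (this handles only the simple integer and percentage forms; the real mm spec also supports negative and conditional expressions):

```python
def min_should_match(mm, num_clauses):
    # "2" -> at least 2 optional clauses must match;
    # "50%" -> at least half of them (rounded down)
    if mm.endswith("%"):
        return num_clauses * int(mm[:-1]) // 100
    return int(mm)

# with mm=100%, two optional clauses behave like AND; with mm=0%, like OR
print(min_should_match("100%", 2), min_should_match("0%", 2))
# 2 0
```

This is why adding an extra clause can flip apparent AND behavior to OR: a percentage mm requires a different absolute number of matches as the clause count changes.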
Re: hole RAM using by solr during Optimize
On 12/22/2010 2:56 AM, stockii wrote:

Hello. I have a RAM problem during an optimize. When I start a delta or full import, Solr uses only the RAM I allocate to it, e.g.: java -Xmx2g -jar start.jar. While Solr is fetching the rows from the database the RAM usage is okay. But when Solr begins to optimize, Solr wants all of the available RAM?! Why is that? The used RAM jumps into the sky and only 40 MB of RAM is free, out of 8 GB!!! How can I limit this?

Is it Solr that's using all the RAM, or the OS disk cache? I have found other messages from you that say you're on Linux, so going with that assumption, you can see everything if you run the 'top' command and press shift-M to sort it by memory usage. Solr (java) should be at the top of the list, and the RES (or maybe RSS, depending on flavor) column will tell you how much RAM it's using. Having only 40MB free memory is typical for a Linux system. Above the process list are a bunch of indicators that give you the overall RAM usage. The number on the bottom right is "cached". This refers to the OS disk cache, and it probably has the bulk of your usage. Below is what my screen looks like. Solr is using 1.4GB of RAM (out of 2.5GB possible), the disk cache is using 7.5GB, and I have less than 30MB free.

top - 15:20:04 up 34 days, 16 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 68 total, 2 running, 66 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni, 100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 9437184k total, 9407424k used, 29760k free, 165464k buffers
Swap: 1048568k total, 68k used, 1048500k free, 7527788k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
22928 ncindex  20  0 2574m 1.4g   9m S  0.0 15.1 432:36.91 java
21319 root     15  0 90136 3424 2668 R  0.0  0.0  0:00.01 sshd

If it's your disk cache that's using up most of your memory, it's perfectly normal. Solr is not to blame for it, and you do not want to change it.
If you're worried about memory usage because you have performance issues, I can try to narrow it down for you. That will require more information, starting with your 'top' output, total index size, and if you're using distributed search, how big each shard is. I am likely to ask for further information beyond that. Shawn
Re: full text search in multiple fields
Certainly did! Why, are you saying this code is correct as-is?

Yes, the query q=title_search:hort*&defType=lucene should return documents having "Hortus supremus" in their title field with the configuration you sent us. It should exist somewhere in the result set, if not in the top 10.

Try a few things to make sure your document is indexed:

q=title_search:"Hortus supremus"&defType=lucene&fl=title,title_search
q=title:"Hortus supremus"&defType=lucene&fl=title,title_search

Are they returning that document? Or find that document's unique id and query for it.
Re: DIH for taxonomy faceting in Lucid webcast
: 1) My categories are stored in database as coded numbers instead of
: fully spelled out names. For example I would have a category of 2/7
: and a lookup dictionary to convert 2/7 into NonFic/Science. How do I
: do such lookup in DIH?

My advice: don't. I thought i mentioned this in that webcast, but if you've already got unique identifiers for your category names, keep using them in your index/facets, and then have your front end application resolve them into pretty category names. it's usually just as easy to apply the labels at query time as at index time, and if you do it at query time you can tweak the labels w/o reindexing.

: 2) Once I have the fully spelled out category path such as
: NonFic/Science, how do I turn that into 0/NonFic and
: 1/NonFic/Science using the DIH?

I don't have any specific suggestions for you -- i've never tried it in DIH myself. the ScriptTransformer might be able to help you out, but i'm not sure.

: 3) Some of my categories are multi-words containing whitespaces, such as
: Computer Science and Functional Programming, so I'd have facet
: values such as 2/NonFic/Computer Science/Functional Programming. How
: do I handle whitespaces in this case? Would filtering by fq still work?

a) it should if you use the {!raw} qparser
b) if you follow my advice in #1, it won't matter.

-Hoss
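For reference, the depth-prefixed path encoding asked about in #2 (turning NonFic/Science into 0/NonFic and 1/NonFic/Science) takes only a few lines outside of DIH, e.g. in a ScriptTransformer or in the indexing client. A minimal sketch (the function name is illustrative, not from the webcast):

```python
def facet_paths(category_path, sep="/"):
    """Expand a category path into depth-prefixed facet tokens.

    "NonFic/Science" -> ["0/NonFic", "1/NonFic/Science"]
    """
    parts = category_path.split(sep)
    return ["%d%s%s" % (depth, sep, sep.join(parts[:depth + 1]))
            for depth in range(len(parts))]

print(facet_paths("NonFic/Science"))
# ['0/NonFic', '1/NonFic/Science']
```

The multi-word case from #3 falls out of the same function: facet_paths("NonFic/Computer Science/Functional Programming") yields tokens containing spaces, which is why the {!raw} qparser (or careful escaping) matters when filtering on them.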
Re: full text search in multiple fields
Ok, I was trying to hide the actual name of the location, because I don't want it to get indexed by search engines AND it's a bit of a weird name :p

The name of the location in the database is: Museumrestaurant De Pappegay

Anyway, here it is. I executed the queries you gave me, and this is the result:

DOC FOUND:
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title_search:%22pappegay%22&defType=lucene&fl=title,title_search
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title_search:%22Pappegay%22&defType=lucene&fl=title,title_search
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title:%22Pappegay%22&defType=lucene&fl=title,title_search

NO DOC FOUND:
http://localhost:8983/solr/db/select/?indent=on&facet=true&sort=membervalue%20desc&sort=location_rating%20desc&q=title:%22pappegay%22&defType=lucene&fl=title,title_search

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2133915.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Different Results..
--- On Wed, 12/22/10, satya swaroop satya.yada...@gmail.com wrote:

From: satya swaroop satya.yada...@gmail.com
Subject: Different Results..
To: solr-user@lucene.apache.org
Date: Wednesday, December 22, 2010, 10:44 AM

My query here is, does Solr consider both the queries differently, and what does it consider for !, / and all other escape characters?

First of all, ! has a special meaning: it means NOT. It is part of the query syntax and is equivalent to the minus (-) operator.

q=erlang!ericson is parsed into:

defaultSearchField:erlang -defaultSearchField:ericson

You can see this by appending debugQuery=on to your search URL. So you need to escape the ! in your case: q=erlang\!ericson will return the same result set as q=erlang/ericson.

You can see the complete list of special characters here:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping Special Characters
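Client-side escaping of those characters can be done with a small helper. A sketch (the character set follows the Lucene query parser syntax page; the two-character operators && and || are not handled here, so this is an approximation rather than a complete escaper):

```python
import re

# Single-character specials per the Lucene query parser syntax page:
# + - ! ( ) { } [ ] ^ " ~ * ? : \
_LUCENE_SPECIALS = re.compile(r'([+\-!(){}\[\]^"~*?:\\])')

def escape_query(term):
    """Backslash-escape Lucene query syntax characters in a user term."""
    return _LUCENE_SPECIALS.sub(r'\\\1', term)

print(escape_query("erlang!ericson"))  # erlang\!ericson
print(escape_query("(1+1):2"))         # \(1\+1\)\:2
```

So q=erlang!ericson becomes q=erlang\!ericson before the request is sent, and the ! is searched as a literal character instead of being parsed as NOT.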
Re: full text search in multiple fields
The name of the location in the database is: Museumrestaurant De Pappegay What was the wildcard query for this?
Sorting results on MULTIPLE fields, not showing expected order
I want to sort results as follows:

- highest membervalue (float) on top
- within those results, I want to sort the items that share the same position on the user rating (integer), once again highest rating on top
- and within those results, I want to sort the items that share the same position on whether they have a photo (bit)

Now I have this:

fq=themes:%22Boat%20and%20Water%22&sort=hasphoto%20desc&q=*:*&fl=id,title

I see the correct item on top. But when I have the full query:

fq=themes:%22Boat%20and%20Water%22&sort=membervalue%20desc&sort=location_rating%20desc&sort=hasphoto%20desc&q=*:*&fl=id,title

An item appears on top that has: membervalue=0.00, location_rating=0, hasphoto=false. There are other locations that have either a higher membervalue, a location_rating, or a photo. This location should NOT be on top. Why is this happening?

-- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-results-on-MULTIPLE-fields-not-showing-expected-order-tp2133959p2133959.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: edismax inconsistency -- AND/OR
Shawn,

Thank you for the reply. The URL you gave was helpful and Smiley & Pugh even more so. On Smiley & Pugh page 140, they indicate that mm=100% using dismax is analogous to the standard parser's q.op=AND. This is exactly what I need. However, testing with these queries and edismax, I get different numbers of results:

q=Title:(life hope) AND Title:(life)&q.op=AND (standard q.p.) - 1 result
q=Title:(life AND hope) AND Title:(life)&defType=edismax - 1 result
q=Title:(life hope) AND Title:(life)&defType=edismax&mm=100% - 285 results (uh-oh, looks like the first two get OR'ed)

The dismax parser seems to behave as documented:

q=life hope life&defType=dismax&rows=0&qf=Title&mm=0% - 285 results (results are OR'ed as expected)
q=life hope life&defType=dismax&rows=0&qf=Title&mm=100% - 1 result (results are AND'ed as expected)

Unfortunately I need to be able to combine the use of pf with key:value syntax, wildcards, etc., so I think I need to use edismax. With a quick glance at ExtendedDismaxQParserPlugin, I'm finding:

- mm is ignored if the query contains any of the operators OR, NOT, +, - ... but AND is ok (line 227)
- mm is ignored if the parse method did not return a BooleanQuery instance (line 244)
- mm is used after all regardless of operators in the query, so long as it's a BooleanQuery (line 286)
- the default mm value is 100% if not specified in the query parameters (lines 241, 283)

Given the apparent contradiction here, my very quick analysis is surely missing something! But if this is accurate, then the trick is to formulate the query in such a way that parse returns an instance of BooleanQuery, right?

Any more advice anyone can give is appreciated! For the client I'm responsible for, I'm just inserting explicit operators between all of the user's terms. But for the client I'm not responsible for, I would love to have a workaround for the other developers; I think they'd appreciate it...
James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org]
Sent: Wednesday, December 22, 2010 4:08 PM
To: solr-user@lucene.apache.org
Subject: Re: edismax inconsistency -- AND/OR
Any way to tie corresponding values together in different multiValued fields?
I have products, each has a specific Product ID. For certain products such as Shirts, there are also extra fields such as Size and Color. Right now I define both Size and Color as multiValued fields. And when I have a Shirt of Size M and Color white, I just put M in Size and white in Color. Now if I have another shirt with the same Product ID but Size L and Color blue, I add L to Size and blue to Color. This causes a problem during faceting. If a user filters on M for Size and blue for Color, he'd get a match. But in reality there isn't a shirt with Size M and Color blue. Is there any way to encode the data to tie Size M to Color white, and to tie Size L to Color blue so that the filtering would come out right? How should I handle this use case? Thanks.
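One encoding that is often used for this problem (an assumption on my part, not something confirmed in this thread) is to index each valid size/color pair as a single value in one combined multiValued field, and then facet and filter on that field rather than on Size and Color independently. A sketch with hypothetical field names:

```python
def combined_variants(variants, sep="_"):
    """Encode (size, color) pairs as single tokens for one multiValued field."""
    return ["%s%s%s" % (size, sep, color) for size, color in variants]

# A shirt available as M/white and L/blue:
doc = {
    "product_id": "SHIRT-1",  # hypothetical field names
    "size_color": combined_variants([("M", "white"), ("L", "blue")]),
}

# fq=size_color:M_white matches this document; fq=size_color:M_blue does
# not, because "M_blue" was never indexed as a valid pair.
print(doc["size_color"])  # ['M_white', 'L_blue']
```

The trade-off is that you can no longer facet on Size or Color alone from this field, so many setups keep the separate Size and Color fields for display and add the combined field only for pair-accurate filtering.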
Re: Sorting results on MULTIPLE fields, not showing expected order
--- On Thu, 12/23/10, PeterKerk vettepa...@hotmail.com wrote:

From: PeterKerk vettepa...@hotmail.com
Subject: Sorting results on MULTIPLE fields, not showing expected order
To: solr-user@lucene.apache.org
Date: Thursday, December 23, 2010, 1:01 AM

An item appears on top that has membervalue=0.00, location_rating=0, hasphoto=false ... Why is this happening?

Multiple sort orderings can be separated by a comma, i.e.:

sort=field name+direction[,field name+direction]... [1]

[1] http://wiki.apache.org/solr/CommonQueryParameters#sort
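So the three orderings from the question belong in one comma-separated sort parameter rather than three separate sort= parameters. A sketch of building such a URL (the host and core path are placeholders; the parameter shapes follow the wiki page above):

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "fq": 'themes:"Boat and Water"',
    # One sort parameter; later fields only break ties left by earlier ones.
    "sort": "membervalue desc,location_rating desc,hasphoto desc",
    "fl": "id,title",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

With repeated sort= parameters, only one ordering takes effect, which is why the item with membervalue=0.00 could float to the top in the original query.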
Re: full text search in multiple fields
Mmmm, this is strange. When I do:

q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title

nothing is found. But if I do:

q=title_search:Pappegay&defType=lucene&q=*:*&fl=id,title

the location IS found. I do need a wildcard though, since users may also search on parts of the title (as described earlier in this post). But this looks almost as if the location is not found if the wildcard is on the end and the searched string is no longer than the position of the wildcard (if that makes sense :)

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2133991.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting results on MULTIPLE fields, not showing expected order
Wow, you're fast :) But that indeed did the trick, thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Sorting-results-on-MULTIPLE-fields-not-showing-expected-order-tp2133959p2134000.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH for taxonomy faceting in Lucid webcast
--- On Wed, 12/22/10, Chris Hostetter hossman_luc...@fucit.org wrote: : 2) Once I have the fully spelled out category path such as : NonFic/Science, how do I turn that into 0/NonFic : 1/NonFic/Science using the DIH? I don't have any specific suggestions for you -- i've never tried it in DIH myself. the ScriptTransformer might be able to help you out, but i'm not sure. Thanks Chris. What did you use to generate those encodings if not DIH?
Re: full text search in multiple fields
When I do:

q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title

nothing is found, but if I do:

q=title_search:Pappegay&defType=lucene&q=*:*&fl=id,title

the location IS found. I do need a wildcard though, since users may also search on parts of the title (as described earlier in this post). But this looks almost as if the location is not found if the wildcard is on the end and the searched string is no longer than the position of the wildcard (if that makes sense :)

Why are you using two q parameters in your search URL?

q=*:*&q=title_search:Pappegay*
Re: full text search in multiple fields
When I do:

q=title_search:Pappegay*&defType=lucene&q=*:*&fl=id,title

nothing is found.

This is expected, since you have a lowercase filter in your index analyzer. Wildcard searches are not analyzed, so you need to lowercase your query on the client side:

q=title_search:pappegay*&defType=lucene&fl=id,title
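Since wildcard terms bypass the analyzer, the client has to reproduce whatever the index-time analyzer did to the stored tokens. A minimal sketch, assuming a lowercase filter is the only index-time transformation that matters (other filters, e.g. stemming or ASCII folding, would also need mirroring):

```python
def prefix_query(field, user_text):
    """Build a wildcard query, lowercased to match a lowercase-filtered index."""
    return "%s:%s*" % (field, user_text.lower())

print(prefix_query("title_search", "Pappegay"))  # title_search:pappegay*
```

This is why q=title_search:Pappegay* finds nothing while q=title_search:pappegay* matches: the indexed token is "pappegay", and the unanalyzed wildcard term "Pappegay*" never equals it.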
Re: full text search in multiple fields
Oops, sloppy, that was a copy-paste error. I now have:

WORKING:
http://localhost:8983/solr/db/select/?indent=on&q=title_search:Pappegay&defType=lucene&fl=id,title

NOT WORKING:
http://localhost:8983/solr/db/select/?indent=on&q=title_search:Pappegay*&defType=lucene&fl=id,title

-- View this message in context: http://lucene.472066.n3.nabble.com/full-text-search-in-multiple-fields-tp1888328p2134044.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Recap on derived objects in Solr Index, 'schema in a can'
I think my partner and I are just going to have to play with both cores and dynamic fields.

If multiple cores are queried, and the schemas match up in order and position for the base fields, do the 'extra' fields in the different cores just show up in the result set with their field names? The query against different cores, with 'base attributes' and 'extended attributes', has to be tailored for each core, right? I.e., not querying for fields that don't exist? (That could be handled by making the query a server-side language object with inheritance for the extended fields.)

Dennis Gearon

Signature Warning
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'
EARTH has a Right To Life, otherwise we all die.

----- Original Message -----
From: Lance Norskog goks...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, December 22, 2010 1:45:04 PM
Subject: Re: Recap on derived objects in Solr Index, 'schema in a can'

A dynamic field just means that the schema allows any field with a name matching the wildcard. That's all. There is no support for referring to all of the existing fields in the wildcard. That is, there is no support for *_en:word as a field search. Nor is there any kind of grouping for facets. The feature for addressing a particular field in some of the parameters does not support wildcards. If you add wildcard fields, you have to remember what they are.

On Wed, Dec 22, 2010 at 11:04 AM, Dennis Gearon gear...@sbcglobal.net wrote:

I'm open to cores, if it's the faster (indexing/querying/keeping mentally straight) way to do things. But from what you say below, the eventual goal of the site would mean either 100 extra 'generic' fields, or 1,000-100,000's of cores. Probably cores is easier to administer for security and does more accurate querying?
What is the relationship between dynamic fields and the schema?

Dennis Gearon

----- Original Message -----
From: Erick Erickson erickerick...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, December 22, 2010 10:44:27 AM
Subject: Re: Recap on derived objects in Solr Index, 'schema in a can'

No, one cannot ignore the schema. If you try to add a field not in the schema you get an error. One could, however, use any arbitrary subset of the fields defined in the schema for any particular *document* in the index. Say your schema had fields f1, f2, f3...f10. You could have fields f1-f5 in one doc, fields f6-f10 in another, and f1, f4, f9 in another, and so on. The only field(s) that *must* be in a document are the required=true fields. There's no real penalty for omitting fields from particular documents.

This allows you to store special documents that aren't part of normal searches. You could, for instance, use a document to store meta-information about your index that had whatever meaning you wanted, in a field(s) that *no* other document had. Your app could then read that special document and make use of that info. Searches on normal documents wouldn't return that doc, etc. You could effectively have N indexes contained in one index, where a document in each logical sub-index had fields disjoint from the other logical sub-indexes. Why you'd do something like that rather than use cores is a very good question, but you *could* do it that way...

All this is much different from a database, where there are penalties for defining a large number of unused fields. Whether doing this is wise or not given the particular problem you're trying to solve is another discussion <G>...
Best,
Erick

On Mon, Dec 20, 2010 at 11:03 PM, Dennis Gearon gear...@sbcglobal.net wrote:

Based on more searches and manual consolidation, I've put together a summary below of some of the ideas already suggested for this. The last item in the summary seems to be an interesting, low-technical-cost way of doing it. Basically, it treats the index like a 'BigTable', a la NoSQL.

Erick Erickson pointed out: "...but there's absolutely no requirement that all documents in SOLR have the same fields..."

I guess I don't have the right understanding of what goes into a Document in Solr. Is it just a set of fields, each with its own independent field type declaration/id, its name, and its content? So even though there's a schema for an index, one could ignore it and just throw any other named fields and types and content at document addition time? So if I wanted to search on a base set, all
Re: Query performance issue while using EdgeNGram
Hmmm, find evicted docs? If you mean find out how many docs are deleted, look on the admin schema browser page; the difference between maxDoc and numDocs is the number of deleted documents.

You say for some queries the QTime is more than 8 secs. What happens if you re-run that query a bit later? The reason I ask is that if you're not warming the cache that that particular query uses, you may be seeing cache loading time here. Look at the admin stats page, especially for evictions. It's also possible that your caches are being reclaimed for some queries and you're seeing response-time spikes when the caches are re-loaded.

Best,
Erick

On Wed, Dec 22, 2010 at 7:10 AM, Shanmugavel SRD srdshanmuga...@gmail.com wrote:

1) Thanks for this update. I have to use 'WhiteSpaceTokenizer'.
2) I have to suggest the whole query itself (say name or title).
3) Could you please let me know if there is a way to find the evicted docs?
4) Yes, we are seeing improvement in the response time if we optimize. But still, for some queries QTime is more than 8 secs. It is a 'Blocker' for us. Could you please suggest anything to reduce the QTime to under 1 sec?

-- View this message in context: http://lucene.472066.n3.nabble.com/Query-performance-issue-while-using-EdgeNGram-tp2097056p2130751.html
Sent from the Solr - User mailing list archive at Nabble.com.
Print highlighting descriptions
I want to print the highlighting descriptions:

{"responseHeader":{"status":0,"QTime":2,"params":{"hl.fl":"description","json.wrf":"jsonp1293069622009","wt":"json","q":"target","hl":"true"}},
 "response":{"numFound":7945,"start":0,"maxScore":6.9186745,"docs":[
   {"description":"target","url":"target","id":"269653","score":6.9186745},
   {"description":"Target The Woodlands","url":"Target_The_Woodlands","id":"37277","score":4.3241715},
   {"description":"Target Kent","url":"Target_Kent","id":"37275","score":4.3241715}]},
 "highlighting":{"269653":{"description":["<em>target</em> "]},"37277":{"description":["<em>Target</em> The Woodlands"]},"37275":{"description":["<em>Target</em> Kent"]}}}

I know the descriptions in docs are at response.response.docs[i].description, but I don't know how to print out the highlighting descriptions, such as <em>Target</em> Kent (no need to highlight, just print out).

Thanks
Ruixiang
Re: Print highlighting descriptions
(10/12/23 11:56), Ruixiang Zhang wrote:

I want to print the highlighting descriptions ... I don't know how to print out the highlighting descriptions, such as <em>Target</em> Kent (no need to highlight, just print out).

Ruixiang,

If you meant that you want to get "Target Kent" instead of "<em>Target</em> Kent", you can change the em tags to an empty string by using the hl.simple.pre/hl.simple.post parameters:

http://wiki.apache.org/solr/HighlightingParameters#hl.simple.pre.2BAC8-hl.simple.post

Koji
--
http://www.rondhuit.com/en/
Re: Print highlighting descriptions
Thanks Koji. Actually my question is: We can use response.response.docs[i].description to print the description in docs. What expression should we use to print the description in highlighting?
Re: Print highlighting descriptions
(10/12/23 14:10), Ruixiang Zhang wrote:

Thanks Koji. Actually my question is: we can use response.response.docs[i].description to print the description in docs. What expression should we use to print the description in highlighting?

Ruixiang,

I cannot understand your question. Is it a Solr question? :) You said "No need to highlight, just print out" in your previous mail, then asked the above??? What do you mean by "expression" and "print"?

Koji
--
http://www.rondhuit.com/en/
Re: Print highlighting descriptions
Hi Koji,

I figured it out. I can use

response.highlighting[response.response.docs[0].id].description[0]

to print the description in highlighting. (Actually, it's not a Solr question, sorry for that.)

Thanks
Ruixiang
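Put together, joining docs to their highlighting snippets by unique id looks like this (the response shape mirrors the JSON shown earlier in the thread; the thread's own client was JavaScript via json.wrf, so this Python version is just an illustration):

```python
import json

# A trimmed response of the shape shown earlier in the thread.
raw = '''{
  "response": {"numFound": 7945, "start": 0, "docs": [
    {"description": "Target Kent", "url": "Target_Kent", "id": "37275"}]},
  "highlighting": {"37275": {"description": ["<em>Target</em> Kent"]}}
}'''

resp = json.loads(raw)
for doc in resp["response"]["docs"]:
    # The highlighting section is keyed by each document's unique id,
    # so the doc list and the snippets have to be joined manually.
    snippet = resp["highlighting"][doc["id"]]["description"][0]
    print(snippet)  # <em>Target</em> Kent
```

If the plain text is wanted instead, setting hl.simple.pre/hl.simple.post to empty strings (as Koji suggested) avoids having to strip the em tags client-side.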