Re: DataImport

2008-06-11 Thread Shalin Shekhar Mangar
Hi Mihails,

The solr home is a directory which contains the conf/ and data/ folders. The
conf folder contains solrconfig.xml, schema.xml and other such configuration
files. The data/ folder contains the index files.

Other than adding the war file to Tomcat, you also need to designate a
certain folder as solr home, so that Solr knows where to load its
configuration from. By default, Solr looks for a folder named solr under the
current working directory (pwd) to use as home. There are other ways of
configuring it, as described on the Solr wiki. Hope that helps.
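For example, with Tomcat one common approach described on the SolrTomcat wiki page is a context fragment dropped into tomcat_home/conf/Catalina/localhost/solr.xml (the paths below are placeholders for your own layout):

  <Context docBase="/path/to/solr.war" debug="0" crossContext="true">
    <Environment name="solr/home" type="java.lang.String" value="/path/to/solr-home" override="true"/>
  </Context>

Alternatively, you can pass -Dsolr.solr.home as a JVM option, as described on the same page.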

2008/6/11 Mihails Agafonovs [EMAIL PROTECTED]:

 I've already done that, but cannot access solr via web, and apache log
 says something wrong with solr home directory.
 -
 Couldn't start SOLR. Check solr/home property.
 -
  Quoting Chakraborty, Kishore K. : Mihails,
  Put the solr.war into the webapps directory and restart tomcat, then
 follow up the console and you'll see messages saying solr.war is
 getting deployed.
  Use a recent nightly build as that has the dataimport related patch
 included.
  Regards
  Kishore.
  -Original Message-
  From: Mihails Agafonovs [mailto:[EMAIL PROTECTED]
  Sent: Wednesday, June 11, 2008 1:13 PM
  To: solr-user@lucene.apache.org
  Subject: Re: DataImport
  If I've copied the solr.war under tomcat/webapps directory, after
  restarting it the archive extracts itself and I get solr directory.
  Why do I need to set example-solr-home/solr, which is not in the
  /webapps directory, as home directory?
  Quoting Shalin Shekhar Mangar : No, the steps are as follows:
  1. Download the example-solr-home.jar from the DataImportHandler
  wiki page
  2. Extract it. You'll find a folder named example-solr-home and a
  solr.war
  file after extraction
  3. Copy the solr.war to tomcat_home/webapps. You don't need any
  other solr
  instance. This war is self-sufficient.
  4. You need to set the example-solr-home/solr folder as the solr
  home
  folder. For instructions on how to do that, look at
  http://wiki.apache.org/solr/SolrTomcat
  From the port number of the URL you are trying, it seems that you're
  using
  the Jetty supplied with Solr instead of Tomcat.
  2008/6/9 Mihails Agafonovs :
   I've placed the solr.war under the tomcat directory, restarted
  tomcat
   to deploy the solr.war. But still... there is no .jar, no folder
  named
   example-data-config, and hitting
   http://localhost:8983/solr/dataimport doesn't work.
   Do I need the original Solr instance to use this .war with?
Quoting Shalin Shekhar Mangar : 1. Correct, there is no jar. You
  can
   use the solr.war file. If you really
need a jar, you'll need to use the SOLR-469.patch at
http://issues.apache.org/jira/browse/SOLR-469 and build solr from
   source
after applying that patch.
2. The jar contains a folder named example-solr-home. Please
  check
   again.
Please let me know if you run into any problems.
2008/6/9 Mihails Agafonovs :
 Looked through the tutorial on data import, section Full
  Import
 Example.
 1) Where is this dataimport.jar? There is no such file in the
 extracted example-solr-home.jar.
 2) Use the solr folder inside example-data-config folder as
  your
 solr home. What does this mean? Anyway, there is no folder
 example-data-config.
  Ar cieņu, Mihails
--
Regards,
Shalin Shekhar Mangar.
Ar cieņu, Mihails
  
   Links:
   --
   [1] mailto:[EMAIL PROTECTED]
  
  --
  Regards,
  Shalin Shekhar Mangar.
  Ar cieņu, Mihails
  Links:
  --
  [1] mailto:[EMAIL PROTECTED]
  Ar cieņu, Mihails

 Links:
 --
 [1] mailto:[EMAIL PROTECTED]




-- 
Regards,
Shalin Shekhar Mangar.


Re: DataImport

2008-06-11 Thread Mihails Agafonovs
I'm stuck...

I now have /tomcat5.5/webapps/solr (exploded solr.war),
/tomcat5.5/webapps/solr/solr-example/.
I've run

export
JAVA_OPTS=$JAVA_OPTS-Dsolr.solr.home=/usr/share/tomcat5.5/webapps/solr/example/solr/
to make /example/solr/ the home directory.

What am I doing wrong?

 Quoting Shalin Shekhar Mangar : Hi Mihails,
 The solr home is a directory which contains the conf/ and data/
folders. The
 conf folder contains solrconfig.xml, schema.xml and other such
configuration
 files. The data/ folder contains the index files.
 Other than adding the war file to tomcat, you also need to designate
a
 certain folder as solr home, so that solr knows from where to load
it's
 configuration. By default, solr searches for a folder named solr
under the
 current working directory (pwd) to use as home. There are other ways
of
 configuring it as given in solr wiki. Hope that helpes.
 2008/6/11 Mihails Agafonovs :
  I've already done that, but cannot access solr via web, and apache
log
  says something wrong with solr home directory.
  -
  Couldn't start SOLR. Check solr/home property.
  -
   Quoting Chakraborty, Kishore K. : Mihails,
   Put the solr.war into the webapps directory and restart tomcat,
then
  follow up the console and you'll see messages saying solr.war is
  getting deployed.
   Use a recent nightly build as that has the dataimport related
patch
  included.
   Regards
   Kishore.
   -Original Message-
   From: Mihails Agafonovs [mailto:[EMAIL PROTECTED]
   Sent: Wednesday, June 11, 2008 1:13 PM
   To: solr-user@lucene.apache.org
   Subject: Re: DataImport
   If I've copied the solr.war under tomcat/webapps directory, after
   restarting it the archive extracts itself and I get solr
directory.
   Why do I need to set example-solr-home/solr, which is not in the
   /webapps directory, as home directory?
   Quoting Shalin Shekhar Mangar : No, the steps are as follows:
   1. Download the example-solr-home.jar from the DataImportHandler
   wiki page
   2. Extract it. You'll find a folder named example-solr-home and
a
   solr.war
   file after extraction
   3. Copy the solr.war to tomcat_home/webapps. You don't need any
   other solr
   instance. This war is self-sufficient.
   4. You need to set the example-solr-home/solr folder as the solr
   home
   folder. For instructions on how to do that, look at
   http://wiki.apache.org/solr/SolrTomcat
   From the port number of the URL you are trying, it seems that
you're
   using
   the Jetty supplied with Solr instead of Tomcat.
   2008/6/9 Mihails Agafonovs :
I've placed the solr.war under the tomcat directory, restarted
   tomcat
to deploy the solr.war. But still... there is no .jar, no
folder
   named
example-data-config, and hitting
http://localhost:8983/solr/dataimport doesn't work.
Do I need the original Solr instance to use this .war with?
 Quoting Shalin Shekhar Mangar : 1. Correct, there is no jar.
You
   can
use the solr.war file. If you really
 need a jar, you'll need to use the SOLR-469.patch at
 http://issues.apache.org/jira/browse/SOLR-469 and build solr
from
source
 after applying that patch.
 2. The jar contains a folder named example-solr-home. Please
   check
again.
 Please let me know if you run into any problems.
 2008/6/9 Mihails Agafonovs :
  Looked through the tutorial on data import, section Full
   Import
  Example.
  1) Where is this dataimport.jar? There is no such file in
the
  extracted example-solr-home.jar.
  2) Use the solr folder inside example-data-config folder as
   your
  solr home. What does this mean? Anyway, there is no folder
  example-data-config.
   Ar cieņu, Mihails
 --
 Regards,
 Shalin Shekhar Mangar.
 Ar cieņu, Mihails
   
Links:
--
[1] mailto:[EMAIL PROTECTED]
   
   --
   Regards,
   Shalin Shekhar Mangar.
   Ar cieņu, Mihails
   Links:
   --
   [1] mailto:[EMAIL PROTECTED]
   Ar cieņu, Mihails
 
  Links:
  --
  [1] mailto:[EMAIL PROTECTED]
 
 -- 
 Regards,
 Shalin Shekhar Mangar.
 Ar cieņu, Mihails

Links:
--
[1] mailto:[EMAIL PROTECTED]


Re: DataImportHandler : How to mix XPathEntityProcessor and TemplateTransformer

2008-06-11 Thread Nicolas Pastorino

Thanks a million for your time and help.
It indeed works smoothly now.

I also, by the way, had to apply the patch attached to the following message:
http://www.nabble.com/Re%3A-How-to-describe-2-entities-in-dataConfig-for-the-DataImporter--p17577610.html
so that the TemplateTransformer does not throw NullPointerExceptions :)


Cheers !
--
Nicolas Pastorino

On Jun 10, 2008, at 18:05 , Noble Paul നോബിള്‍  
नोब्ळ् wrote:



It is a bug, nice catch.
There needs to be a null check in that method.
Can you try replacing the method with the following?

private Node getMatchingChild(XMLStreamReader parser) {
  if (childNodes == null) return null;
  String localName = parser.getLocalName();
  for (Node n : childNodes) {
    if (n.name.equals(localName)) {
      if (n.attribAndValues == null)
        return n;
      if (checkForAttributes(parser, n.attribAndValues))
        return n;
    }
  }
  return null;
}

I tried with that code and it is working. We shall add it in the next patch.



--Noble
On Tue, Jun 10, 2008 at 9:11 PM, Nicolas Pastorino [EMAIL PROTECTED] wrote:
I just forgot to mention the error related to the description below. I get
the following when running a full-import (sorry for the noise):

SEVERE: Full Import failed
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:207)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:161)
    at org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:144)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:280)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:302)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:173)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:134)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:323)
    at org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:374)
    at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: java.lang.NullPointerException
    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.getMatchingChild(XPathRecordReader.java:198)
    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:171)
    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:174)
    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:174)
    at org.apache.solr.handler.dataimport.XPathRecordReader$Node.access$000(XPathRecordReader.java:89)
    at org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords
Re: DataImportHandler : How to mix XPathEntityProcessor and TemplateTransformer

2008-06-11 Thread Noble Paul നോബിള്‍ नोब्ळ्
We are cutting a patch which incorporates all the recent bug fixes,
so that you guys do not have to apply patches over patches.

--Noble

On Wed, Jun 11, 2008 at 3:49 PM, Nicolas Pastorino [EMAIL PROTECTED] wrote:
 Thanks a million for your time and help.
 It indeed works smoothly now.

 I also, by the way, had to apply the patch attached to the following
 message :
 http://www.nabble.com/Re%3A-How-to-describe-2-entities-in-dataConfig-for-the-DataImporter--p17577610.html
 in order to have the TemplateTransformer to not throw Null Pointer
 exceptions :)

 Cheers !
 --
 Nicolas Pastorino

 On Jun 10, 2008, at 18:05 , Noble Paul നോബിള്‍ नोब्ळ् wrote:

 It is a bug, nice catch
 there needs to be a null check there in the method
 can us just try replacing the method with the following?

 private Node getMatchingChild(XMLStreamReader parser) {
  if(childNodes == null) return null;
  String localName = parser.getLocalName();
  for (Node n : childNodes) {
if (n.name.equals(localName)) {
  if (n.attribAndValues == null)
return n;
  if (checkForAttributes(parser, n.attribAndValues))
return n;
}
  }
  return null;
}

 I tried with that code and it is working. We shall add it in the next
 patch


 --Noble
 On Tue, Jun 10, 2008 at 9:11 PM, Nicolas Pastorino [EMAIL PROTECTED] wrote:

 I just forgot to mention the error related to the description below. I
 get
 the following when running a full-import ( sorry for the noise .. ) :

 SEVERE: Full Import failed
 java.lang.RuntimeException: java.lang.NullPointerException
   at

 org.apache.solr.handler.dataimport.XPathRecordReader.streamRecords(XPathRecordReader.java:85)
   at

 org.apache.solr.handler.dataimport.XPathEntityProcessor.initQuery(XPathEntityProcessor.java:207)
   at

 org.apache.solr.handler.dataimport.XPathEntityProcessor.fetchNextRow(XPathEntityProcessor.java:161)
   at

 org.apache.solr.handler.dataimport.XPathEntityProcessor.nextRow(XPathEntityProcessor.java:144)
   at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:280)
   at

 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:302)
   at

 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:173)
   at

 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:134)
   at

 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:323)
   at

 org.apache.solr.handler.dataimport.DataImporter.rumCmd(DataImporter.java:374)
   at

 org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:179)
   at

 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:125)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:965)
   at

 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
   at

 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
   at

 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
   at
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
   at

 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
   at
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
   at
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
   at
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
   at

 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
   at

 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
   at
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
   at org.mortbay.jetty.Server.handle(Server.java:285)
   at
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
   at

 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
   at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
   at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
   at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
   at

 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
   at

 org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
 Caused by: java.lang.NullPointerException
   at

 org.apache.solr.handler.dataimport.XPathRecordReader$Node.getMatchingChild(XPathRecordReader.java:198)
   at

 org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:171)
   at

 org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:174)
   at

 org.apache.solr.handler.dataimport.XPathRecordReader$Node.parse(XPathRecordReader.java:174)
   at

 

Re: Problem with add a XML

2008-06-11 Thread Grant Ingersoll


On Jun 11, 2008, at 3:46 AM, Thomas Lauer wrote:


Now I want to add the files to Solr. I have started Solr on Windows
in the example directory with java -jar start.jar.


I have the following Error Message:

C:\test\output>java -jar post.jar *.xml
SimplePostTool: version 1.2
SimplePostTool: WARNING: Make sure your XML documents are encoded in  
UTF-8, other encodings are not currently supported



This is your issue right here.  You have to save that second file in  
UTF-8.
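
For example, something like the following should do it, assuming the file is currently in Windows-1252 (adjust -f to whatever encoding your editor actually used):

  iconv -f CP1252 -t UTF-8 2.xml > 2-utf8.xml

Also make sure any encoding= declaration in the XML prolog agrees with the bytes on disk.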




SimplePostTool: POSTing files to http://localhost:8983/solr/update..
SimplePostTool: POSTing file 1.xml
SimplePostTool: POSTing file 2.xml
SimplePostTool: FATAL: Connection error (is Solr running at
http://localhost:8983/solr/update ?): java.io.IOException: Server
returned HTTP response code: 400 for URL:
http://localhost:8983/solr/update

C:\test\output>

Regards Thomas Lauer







--
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Re: searching only within allowed documents

2008-06-11 Thread Geoffrey Young




Solr allows you to specify filters in separate parameters that are
applied to the main query, but cached separately.

q=the user query&fq=folder:f13&fq=folder:f24


I've been wanting more explanation around this for a while, so maybe now 
is a good time to ask :)


the cached separately verbiage here is the same as in the twiki, but I 
don't really understand what it means.  more precisely, I'm wondering 
what the real performance, caching, etc differences are between


  q=fielda:foo+fieldb:bar&mm=100%

and

  q=fielda:foo&fq=fieldb:bar

my situation is similar to the original poster's in that the set of documents
matching fielda is very large and common (say, theaters across the world)
while fieldb would narrow it considerably (first by country, then by
zipcode, etc.).
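
(For reference, I assume the cache involved here is the filterCache from solrconfig.xml; the stock example config is something like:

  <filterCache
    class="solr.LRUCache"
    size="512"
    initialSize="512"
    autowarmCount="256"/>

so presumably each fq clause gets its own entry there independent of the main query, but I'd like to confirm that.)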


thanks

--Geoff




Re: DataImport

2008-06-11 Thread Shalin Shekhar Mangar
Ok, let's start again from scratch with a clean Tomcat installation.

1. Download example-solr-home.jar from the wiki and extract it to a local
folder for example to /home/your_username/
2. You will now see a folder called example-solr-home where you extracted
the jar file in the above step
3. Copy /home/your_username/example-solr-home/solr.war to
/tomcat5.5/webapps/solr.war
4. export
JAVA_OPTS=-Dsolr.solr.home=/home/your_username/example-solr-home/solr
5. start tomcat from the same shell after exporting the above variable

Verify that tomcat starts without showing any exceptions in the logs. Now
you will be able to run the examples given in the DataImportHandler wiki.
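
A rough shell sketch of steps 3-5 (assuming the jar was extracted under /home/your_username, and that Tomcat is started with the usual bin/startup.sh -- adjust paths to your setup):

  cp /home/your_username/example-solr-home/solr.war /tomcat5.5/webapps/solr.war
  export JAVA_OPTS="-Dsolr.solr.home=/home/your_username/example-solr-home/solr"
  /tomcat5.5/bin/startup.sh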

2008/6/11 Mihails Agafonovs [EMAIL PROTECTED]:

 I'm stuck...

 I now have /tomcat5.5/webapps/solr (exploded solr.war),
 /tomcat5.5/webapps/solr/solr-example/.
 I've ran

 export

 JAVA_OPTS=$JAVA_OPTS-Dsolr.solr.home=/usr/share/tomcat5.5/webapps/solr/example/solr/
 to make /example/solr/ as a home directory.

 What am I doing wrong?

  Quoting Shalin Shekhar Mangar : Hi Mihails,
  The solr home is a directory which contains the conf/ and data/
 folders. The
  conf folder contains solrconfig.xml, schema.xml and other such
 configuration
  files. The data/ folder contains the index files.
  Other than adding the war file to tomcat, you also need to designate
 a
  certain folder as solr home, so that solr knows from where to load
 it's
  configuration. By default, solr searches for a folder named solr
 under the
  current working directory (pwd) to use as home. There are other ways
 of
  configuring it as given in solr wiki. Hope that helpes.
  2008/6/11 Mihails Agafonovs :
   I've already done that, but cannot access solr via web, and apache
 log
   says something wrong with solr home directory.
   -
   Couldn't start SOLR. Check solr/home property.
   -
Quoting Chakraborty, Kishore K. : Mihails,
Put the solr.war into the webapps directory and restart tomcat,
 then
   follow up the console and you'll see messages saying solr.war is
   getting deployed.
Use a recent nightly build as that has the dataimport related
 patch
   included.
Regards
Kishore.
-Original Message-
From: Mihails Agafonovs [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 11, 2008 1:13 PM
To: solr-user@lucene.apache.org
Subject: Re: DataImport
If I've copied the solr.war under tomcat/webapps directory, after
restarting it the archive extracts itself and I get solr
 directory.
Why do I need to set example-solr-home/solr, which is not in the
/webapps directory, as home directory?
Quoting Shalin Shekhar Mangar : No, the steps are as follows:
1. Download the example-solr-home.jar from the DataImportHandler
wiki page
2. Extract it. You'll find a folder named example-solr-home and
 a
solr.war
file after extraction
3. Copy the solr.war to tomcat_home/webapps. You don't need any
other solr
instance. This war is self-sufficient.
4. You need to set the example-solr-home/solr folder as the solr
home
folder. For instructions on how to do that, look at
http://wiki.apache.org/solr/SolrTomcat
From the port number of the URL you are trying, it seems that
 you're
using
the Jetty supplied with Solr instead of Tomcat.
2008/6/9 Mihails Agafonovs :
 I've placed the solr.war under the tomcat directory, restarted
tomcat
 to deploy the solr.war. But still... there is no .jar, no
 folder
named
 example-data-config, and hitting
 http://localhost:8983/solr/dataimport doesn't work.
 Do I need the original Solr instance to use this .war with?
  Quoting Shalin Shekhar Mangar : 1. Correct, there is no jar.
 You
can
 use the solr.war file. If you really
  need a jar, you'll need to use the SOLR-469.patch at
  http://issues.apache.org/jira/browse/SOLR-469 and build solr
 from
 source
  after applying that patch.
  2. The jar contains a folder named example-solr-home. Please
check
 again.
  Please let me know if you run into any problems.
  2008/6/9 Mihails Agafonovs :
   Looked through the tutorial on data import, section Full
Import
   Example.
   1) Where is this dataimport.jar? There is no such file in
 the
   extracted example-solr-home.jar.
   2) Use the solr folder inside example-data-config folder as
your
   solr home. What does this mean? Anyway, there is no folder
   example-data-config.
Ar cieņu, Mihails
  --
  Regards,
  Shalin Shekhar Mangar.
  Ar cieņu, Mihails

 Links:
 --
 [1] mailto:[EMAIL PROTECTED]

--
Regards,
Shalin Shekhar Mangar.
Ar cieņu, Mihails
Links:
--
[1] mailto:[EMAIL PROTECTED]
Ar cieņu, Mihails
  
   Links:
   --
   [1] mailto:[EMAIL PROTECTED]
  
  --
  Regards,
  Shalin Shekhar Mangar.
  Ar cieņu, Mihails

 Links:
 --
 [1] mailto:[EMAIL 

Re: DataImport

2008-06-11 Thread Mihails Agafonovs
Exception in Lucene Index Updater.

Anyway, for some reason I'm able to start Solr only using its own
Jetty. Everything else works fine on my Tomcat, except Solr.
 Quoting Shalin Shekhar Mangar : Ok, let's start again from scratch
with a clean Tomcat installation.
 1. Download example-solr-home.jar from the wiki and extract it to a
local
 folder for example to /home//
 2. You will now see a folder called example-solr-home where you
extracted
 the jar file in the above step
 3. Copy /home//example-solr-home/solr.war to
 /tomcat5.5/webapps/solr.war
 4. export
 JAVA_OPTS=-Dsolr.solr.home=/home//example-solr-home/solr
 5. start tomcat from the same shell after exporting the above
variable
 Verify that tomcat starts without showing any exceptions in the
logs. Now
 you will be able to run the examples given in the DataImportHandler
wiki.
 2008/6/11 Mihails Agafonovs :
  I'm stuck...
 
  I now have /tomcat5.5/webapps/solr (exploded solr.war),
  /tomcat5.5/webapps/solr/solr-example/.
  I've ran
 
  export
 
 
JAVA_OPTS=$JAVA_OPTS-Dsolr.solr.home=/usr/share/tomcat5.5/webapps/solr/example/solr/
  to make /example/solr/ as a home directory.
 
  What am I doing wrong?
 
   Quoting Shalin Shekhar Mangar : Hi Mihails,
   The solr home is a directory which contains the conf/ and data/
  folders. The
   conf folder contains solrconfig.xml, schema.xml and other such
  configuration
   files. The data/ folder contains the index files.
   Other than adding the war file to tomcat, you also need to
designate
  a
   certain folder as solr home, so that solr knows from where to
load
  it's
   configuration. By default, solr searches for a folder named
solr
  under the
   current working directory (pwd) to use as home. There are other
ways
  of
   configuring it as given in solr wiki. Hope that helpes.
   2008/6/11 Mihails Agafonovs :
I've already done that, but cannot access solr via web, and
apache
  log
says something wrong with solr home directory.
-
Couldn't start SOLR. Check solr/home property.
-
 Quoting Chakraborty, Kishore K. : Mihails,
 Put the solr.war into the webapps directory and restart
tomcat,
  then
follow up the console and you'll see messages saying solr.war
is
getting deployed.
 Use a recent nightly build as that has the dataimport related
  patch
included.
 Regards
 Kishore.
 -Original Message-
 From: Mihails Agafonovs [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, June 11, 2008 1:13 PM
 To: solr-user@lucene.apache.org
 Subject: Re: DataImport
 If I've copied the solr.war under tomcat/webapps directory,
after
 restarting it the archive extracts itself and I get solr
  directory.
 Why do I need to set example-solr-home/solr, which is not in
the
 /webapps directory, as home directory?
 Quoting Shalin Shekhar Mangar : No, the steps are as follows:
 1. Download the example-solr-home.jar from the
DataImportHandler
 wiki page
 2. Extract it. You'll find a folder named example-solr-home
and
  a
 solr.war
 file after extraction
 3. Copy the solr.war to tomcat_home/webapps. You don't need
any
 other solr
 instance. This war is self-sufficient.
 4. You need to set the example-solr-home/solr folder as the
solr
 home
 folder. For instructions on how to do that, look at
 http://wiki.apache.org/solr/SolrTomcat
 From the port number of the URL you are trying, it seems that
  you're
 using
 the Jetty supplied with Solr instead of Tomcat.
 2008/6/9 Mihails Agafonovs :
  I've placed the solr.war under the tomcat directory,
restarted
 tomcat
  to deploy the solr.war. But still... there is no .jar, no
  folder
 named
  example-data-config, and hitting
  http://localhost:8983/solr/dataimport doesn't work.
  Do I need the original Solr instance to use this .war with?
   Quoting Shalin Shekhar Mangar : 1. Correct, there is no
jar.
  You
 can
  use the solr.war file. If you really
   need a jar, you'll need to use the SOLR-469.patch at
   http://issues.apache.org/jira/browse/SOLR-469 and build
solr
  from
  source
   after applying that patch.
   2. The jar contains a folder named example-solr-home.
Please
 check
  again.
   Please let me know if you run into any problems.
   2008/6/9 Mihails Agafonovs :
Looked through the tutorial on data import, section Full
 Import
Example.
1) Where is this dataimport.jar? There is no such file in
  the
extracted example-solr-home.jar.
2) Use the solr folder inside example-data-config folder
as
 your
solr home. What does this mean? Anyway, there is no
folder
example-data-config.
 Ar cieņu, Mihails
   --
   Regards,
   Shalin Shekhar Mangar.
   Ar cieņu, Mihails
 
  Links:
  --
  [1] mailto:[EMAIL PROTECTED]
 
 --
 Regards,
 Shalin 

range query highlighting

2008-06-11 Thread Stefan Oestreicher
Hi,

I'm using solr built from trunk and highlighting for range queries doesn't
work.
If I search for 2008 everything works as expected but if I search for
[2000 TO 2008] nothing gets highlighted.
The field I'm searching on is a TextField and I've confirmed that the query
and index analyzers are working as expected. 
I didn't find anything in the issue tracker about this. 

Any ideas?

TIA,
 
Stefan Oestreicher
 
--
Dr. Maté GmbH
Stefan Oestreicher / Entwicklung
[EMAIL PROTECTED]
http://www.netdoktor.at
Tel Buero: + 43 1 405 55 75 24
Fax Buero: + 43 1 405 55 75 55
Alser Str. 4 1090 Wien Altes AKH Hof 1 1.6.6



RE: [jira] Updated: (SOLR-469) Data Import RequestHandler

2008-06-11 Thread Julio Castillo
 Shalin,
Thanks for consolidating the patch.

Any idea when the DB import request handler will be part of the nightly
build?

Thanks again

** julio

-Original Message-
From: Shalin Shekhar Mangar (JIRA) [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, June 11, 2008 8:43 AM
To: [EMAIL PROTECTED]
Subject: [jira] Updated: (SOLR-469) Data Import RequestHandler


 [
https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugi
n.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-469:
---

Attachment: SOLR-469.patch

A new patch file (SOLR-469.patch) consisting of some important bug fixes and
minor enhancements. The changes and the corresponding classes are given
below

*Changes*
* Set fetch size to Integer.MIN_VALUE if batchSize in configuration is -1,
as per Patrick's suggestion -- JdbcDataSource
* Transformers can add a boost to a document by adding a key/value pair
row.put("$docBoost", 2.0f) from any entity -- DocBuilder, SolrWriter and
DataImportHandler
* Fixes for infinite loop in SqlEntityProcessor when the delta query fails for
some reason and a NullPointerException is thrown in EntityProcessorBase --
EntityProcessorBase
* Fix for NullPointerException in TemplateTransformer and corresponding test
-- TemplateTransformer, TestTemplateTransformer
* Enhancement for specifying table.column syntax for the pk attribute in an
entity, as per the issue reported by Chris Moser and Olivier Poitrey --
SqlEntityProcessor, TestSqlEntityProcessor2
* Fix for NullPointerException in XPathRecordReader when an attribute specified
through xpath is null -- XPathRecordReader, TestXPathRecordReader
* Enhancement to the DataSource interface to provide a close method --
DataSource, FileDataSource, HttpDataSource, MockDataSource
* Context interface has a new method getDataSource(String entityName) for
getting a new DataSource instance for the given entity -- Context,
ContextImpl, DataImporter, DocBuilder
* FileListEntityProcessor implements olderThan and newerThan filtering
parameters -- FileListEntityProcessor, TestFileListEntityProcessor
* Debug mode can be disabled from solrconfig.xml with enableDebug="false" --
DataImporter, DataImportHandler
* Running statistics are exposed on the Solr statistics page in addition to
cumulative statistics -- DataImportHandler, DocBuilder
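
(For illustration, a hedged sketch of the $docBoost hook -- the class name, the "featured" column and the 2.0f value below are made up; only the row.put("$docBoost", ...) convention comes from the patch:)

  import java.util.Map;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  public class FeaturedBoostTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
      // hypothetical: boost documents whose "featured" column is "true"
      if ("true".equals(String.valueOf(row.get("featured")))) {
        row.put("$docBoost", 2.0f);
      }
      return row;
    }
  }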

 Data Import RequestHandler
 --

 Key: SOLR-469
 URL: https://issues.apache.org/jira/browse/SOLR-469
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 1.3
Reporter: Noble Paul
Assignee: Grant Ingersoll
 Fix For: 1.3

 Attachments: SOLR-469-contrib.patch, SOLR-469.patch, 
 SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, 
 SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch


 We need a RequestHandler which can import data from a DB or other
dataSources into the Solr index. Think of it as an advanced form of the
SqlUpload plugin (SOLR-103).
 The way it works is as follows.
 * Provide a configuration file (xml) to the Handler which takes in the
necessary SQL queries and mappings to a solr schema
   - It also takes in a properties file for the data source
configuration
 * Given the configuration it can also generate the solr schema.xml
 * It is registered as a RequestHandler which can take two commands:
do-full-import, do-delta-import
   - do-full-import - dumps all the data from the database into
the index (based on the SQL query in the configuration)
   - do-delta-import - dumps all the data that has changed since the
last import (we assume a modified-timestamp column in the tables)
 * It provides an admin page
   - where we can schedule it to be run automatically at regular
intervals
   - It shows the status of the Handler (idle, full-import,
delta-import)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Updated: (SOLR-469) Data Import RequestHandler

2008-06-11 Thread Shalin Shekhar Mangar
Hi Julio,

That was fast! I just uploaded a patch :)

Actually, it is waiting on SOLR-563 (
http://issues.apache.org/jira/browse/SOLR-563) which deals with modifying
the build scripts to create a contrib project area in Solr. I'm planning to
work on that this week. Once that is done, it would be up to a committer to
add it to the trunk.

On Wed, Jun 11, 2008 at 9:24 PM, Julio Castillo [EMAIL PROTECTED]
wrote:

  Shalin,
 Thanks for consolidating the patch.

 Any idea, when the dB Import request handler will be part of the nightly
 build?

 Thanks again

 ** julio

 -Original Message-
 From: Shalin Shekhar Mangar (JIRA) [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, June 11, 2008 8:43 AM
 To: [EMAIL PROTECTED]
 Subject: [jira] Updated: (SOLR-469) Data Import RequestHandler


 [

 https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugi
 n.system.issuetabpanels:all-tabpanel ]

 Shalin Shekhar Mangar updated SOLR-469:
 ---

Attachment: SOLR-469.patch

 A new patch file (SOLR-469.patch) consisting of some important bug fixes
 and
 minor enhancements. The changes and the corresponding classes are given
 below

 *Changes*
 * Set fetch size to Integer.MIN_VALUE if batchSize in configuration is -1
 as
 per Patrick's suggestion -- JdbcDataSource
 * Transformers can add a boost to a document by adding a key/value pair
 row.put($docBoost, 2.0f) from any entity -- DocBuilder,SolrWriter and
 DataImportHandler
 * Fixes for infinite loop in SqlEntityProcessor when delta query fails for
 some reason and NullPointerException is thrown in EntityProcessorBase --
 EntityProcessorBase
 * Fix for NullPointerException in TemplateTransformer and corresponding
 test
 -- TemplateTransformer, TestTemplateTransformer
 * Enhancement for specifying table.column syntax for pk attribute in entity
 as per issue reported by Chris Moser and Olivier Poitrey --
 SqlEntityProcessor,TestSqlEntityProcessor2
 * Fix for NullPointerException in XPathRecordReader when attribute
 specified
 through xpath is null -- XPathRecordReader, TestXPathRecordReader
 * Enhancement to DataSource interface to provide a close method --
 DataSource, FileDataSource, HttpDataSource, MockDataSource
 * Context interface has a new method getDataSource(String entityName) for
 getting a new DataSource instance for the given entity -- Context,
 ContextImpl, DataImporter, DocBuilder
 * FileListEntityProcessor implements olderThan and newerThan filtering
 parameters -- FileListEntityProcessor, TestFileListEntityProcessor
 * Debug Mode can be disabled from solrconfig.xml by enableDebug=false --
 DataImporter, DataImportHandler
 * Running statistics are exposed on the Solr Statistics page in addition to
 cumulative statictics -- DataImportHandler, DocBuilder

  Data Import RequestHandler
  --
 
  Key: SOLR-469
  URL: https://issues.apache.org/jira/browse/SOLR-469
  Project: Solr
   Issue Type: New Feature
   Components: update
 Affects Versions: 1.3
 Reporter: Noble Paul
 Assignee: Grant Ingersoll
  Fix For: 1.3
 
  Attachments: SOLR-469-contrib.patch, SOLR-469.patch,
  SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch,
  SOLR-469.patch, SOLR-469.patch, SOLR-469.patch, SOLR-469.patch
 
 
  We need a RequestHandler Which can import data from a DB or other
 dataSources into the Solr index .Think of it as an advanced form of
 SqlUpload Plugin (SOLR-103).
  The way it works is as follows.
  * Provide a configuration file (xml) to the Handler which takes in
 the
 necessary SQL queries and mappings to a solr schema
- It also takes in a properties file for the data source
 configuraution
  * Given the configuration it can also generate the solr schema.xml
  * It is registered as a RequestHandler which can take two commands
 do-full-import, do-delta-import
-  do-full-import - dumps all the data from the Database into
 the index (based on the SQL query in configuration)
- do-delta-import - dumps all the data that has changed since
 last import. (We assume a modified-timestamp column in tables)
  * It provides a admin page
- where we can schedule it to be run automatically at regular
 intervals
- It shows the status of the Handler (idle, full-import,
  delta-import)

 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.




-- 
Regards,
Shalin Shekhar Mangar.


Re: range query highlighting

2008-06-11 Thread Yonik Seeley
It's a known deficiency... ConstantScoreRangeQuery and
ConstantScorePrefixQuery, which Solr uses, rewrite to a
ConstantScoreQuery and don't expose the terms they match.
Performance-wise, exposing them seems like a bad idea if the number of
terms matched is large (especially when used in a MultiSearcher, or later in
global-idf for distributed search).

-Yonik

On Wed, Jun 11, 2008 at 11:09 AM, Stefan Oestreicher
[EMAIL PROTECTED] wrote:
 Hi,

 I'm using solr built from trunk and highlighting for range queries doesn't
 work.
 If I search for 2008 everything works as expected but if I search for
 [2000 TO 2008] nothing gets highlighted.
 The field I'm searching on is a TextField and I've confirmed that the query
 and index analyzers are working as expected.
 I didn't find anything in the issue tracker about this.

 Any ideas?

 TIA,

 Stefan Oestreicher

 --
 Dr. Maté GmbH
 Stefan Oestreicher / Entwicklung
 [EMAIL PROTECTED]
 http://www.netdoktor.at
 Tel Buero: + 43 1 405 55 75 24
 Fax Buero: + 43 1 405 55 75 55
 Alser Str. 4 1090 Wien Altes AKH Hof 1 1.6.6




CSV output

2008-06-11 Thread Marshall Weir

Hi,

Does SOLR have .csv output? I can find references to .csv input, but  
not output.


Thank you,
Marshall


Re: CSV output

2008-06-11 Thread Otis Gospodnetic
Hi Marshall,

I don't think there is a CSV Writer, but here are some pointers for writing one:

$ ff \*Writer\*java | grep -v Test | grep request
./src/java/org/apache/solr/request/PHPResponseWriter.java
./src/java/org/apache/solr/request/XSLTResponseWriter.java
./src/java/org/apache/solr/request/JSONResponseWriter.java
./src/java/org/apache/solr/request/PythonResponseWriter.java
./src/java/org/apache/solr/request/RawResponseWriter.java
./src/java/org/apache/solr/request/QueryResponseWriter.java
./src/java/org/apache/solr/request/PHPSerializedResponseWriter.java
./src/java/org/apache/solr/request/BinaryResponseWriter.java
./src/java/org/apache/solr/request/RubyResponseWriter.java
./src/java/org/apache/solr/request/TextResponseWriter.java
./src/java/org/apache/solr/request/XMLWriter.java
./src/java/org/apache/solr/request/BinaryQueryResponseWriter.java
./src/java/org/apache/solr/request/XMLResponseWriter.java

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Marshall Weir [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 11, 2008 12:52:50 PM
 Subject: CSV output
 
 Hi,
 
 Does SOLR have .csv output? I can find references to .csv input, but  
 not output.
 
 Thank you,
 Marshall



Re: CSV output

2008-06-11 Thread Walter Underwood
I recommend using the OpenCSV package. Works fine, Apache 2.0 license.

http://opencsv.sourceforge.net/

wunder

On 6/11/08 10:00 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:

 Hi Marshall,
 
 I don't think there is a CSV Writer, but here are some pointers for writing
 one:
 
 $ ff \*Writer\*java | grep -v Test | grep request
 ./src/java/org/apache/solr/request/PHPResponseWriter.java
 ./src/java/org/apache/solr/request/XSLTResponseWriter.java
 ./src/java/org/apache/solr/request/JSONResponseWriter.java
 ./src/java/org/apache/solr/request/PythonResponseWriter.java
 ./src/java/org/apache/solr/request/RawResponseWriter.java
 ./src/java/org/apache/solr/request/QueryResponseWriter.java
 ./src/java/org/apache/solr/request/PHPSerializedResponseWriter.java
 ./src/java/org/apache/solr/request/BinaryResponseWriter.java
 ./src/java/org/apache/solr/request/RubyResponseWriter.java
 ./src/java/org/apache/solr/request/TextResponseWriter.java
 ./src/java/org/apache/solr/request/XMLWriter.java
 ./src/java/org/apache/solr/request/BinaryQueryResponseWriter.java
 ./src/java/org/apache/solr/request/XMLResponseWriter.java
 
 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
 
 
 - Original Message 
 From: Marshall Weir [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 11, 2008 12:52:50 PM
 Subject: CSV output
 
 Hi,
 
 Does SOLR have .csv output? I can find references to .csv input, but
 not output.
 
 Thank you,
 Marshall
 



Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi,

I've just changed the stemming algorithm slightly and am running a few  
tests against the old stemmer versus the new stemmer. I did a query  
for 'hanger' and using the old stemmer I get the following scoring for  
a document with the title: Converter Hanger Assembly Replacement


6.4242806 = (MATCH) sum of:
  2.5697122 = (MATCH) max of:
0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.1963516 = queryWeight(markup_t:hanger), product of:
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)
  3.8545685 = (MATCH) max of:
0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.8320503 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
  2.248498 = (MATCH) max of:
0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.19635157 = queryWeight(markup_t:hanger), product of:
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)
  3.372747 = (MATCH) max of:
0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.83205026 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t
field is different in each of the explanations, i.e., the fieldNorm
using the old stemmer is 0.5 = fieldNorm(field=title_t, doc=3454),
while for the new stemmer it is 0.4375 = fieldNorm(field=title_t, doc=3454). I
ran the title through both stemmers and get the same number of tokens  
produced. I do no index time boosting on the title_t field. I am using  
DefaultSimilarity in both instances. So I figured the calculated  
fieldNorm would be:


field boost * lengthNorm = 1 * 1/sqrt(4) = 0.5

I wouldn't have thought that changing the stemmer would have any  
impact on the fieldNorm in this case. Any insight? Please kick me over  
to the lucene list if you feel this isn't appropriate 

Searching for words with accented characters.

2008-06-11 Thread Robert Haschart
We are using Solr as the search engine for our public access library 
catalog.  In testing I did a search for a French movie that I know is in 
the catalog named:  Kirikou et la sorcière  and nothing was returned.  
If I search for just the word Kirikou several results are returned,
and the problem becomes apparent.  The records contain Kirikou et la 
sorcie?re  where the accent is a unicode combining character following 
the e. 

After some research into Unicode normalization, I found and installed a 
Unicode normalization filter that is set to convert letters followed by 
combining codes into the precomposed form.  I also installed a 
solr.ISOLatin1AccentFilterFactory that will then convert these 
precomposed forms into the latin equivalent without any accent.   The 
following is the fieldType definition taken from the schema.xml file:


  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="schema.UnicodeNormalizationFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="1"
              catenateNumbers="1" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory"
              protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="schema.UnicodeNormalizationFilterFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.SynonymFilterFactory"
              synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory"
              generateWordParts="1" generateNumberParts="1" catenateWords="0"
              catenateNumbers="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory"
              protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

So it seems like this should work.
However, again searching for Kirikou et la sorcière or sorcière or
sorcie?re or just sorciere doesn't return the document in question.


I've tried looking at the results from solr/admin/analysis.jsp, entering
text from the record in the Field value (Index) box and sorciere in the
Field value (Query) box, and I get the following results, which seem to
indicate that there should be a match between the stemmed entry
sorcier in the record and the stemmed word sorcier from the query.


So clearly I am either doing something wrong or misinterpreting the 
analyzers, but I am at a loss as to how to figure out what is wrong.  
Any suggestions?



   org.apache.solr.analysis.WhitespaceTokenizerFactory {}

   [29 whitespace-separated tokens: Kirikou, et, la, sorcie?re (e followed by
   the combining accent), France, 3, Cinema, /, RTBF, (Te?le?vision, belge).,
   Grand, Prix, du, festival, d'Annecy, 1999, France, French, VHS, VIDEO,
   .VHS10969, 1, vide?ocassette, (1h10, min.), (VHS), Ocelot,, Michel]

   schema.UnicodeNormalizationFilterFactory {}

   [the same 29 tokens with the combining characters folded into precomposed
   form, e.g. sorcière, (Télévision, vidéocassette]

   org.apache.solr.analysis.ISOLatin1AccentFilterFactory {}

   [ISOLatin1AccentFilterFactory output truncated here]

Re: CSV output

2008-06-11 Thread Brendan Grainger
When I was asked for something similar I quickly cobbled together a  
stylesheet (I'm no xsl expert so it's probably pretty bad).


Invoked like this:

http://localhost:8982/solr/select?q=testing&fl=id,title_t,score&wt=xslt&tr=csv.xsl&rows=10

YMMV, but feel free to use it if it helps, I've attached it.

Brendan








On Jun 11, 2008, at 1:05 PM, Walter Underwood wrote:


I recommend using the OpenCSV package. Works fine, Apache 2.0 license.

http://opencsv.sourceforge.net/

wunder

On 6/11/08 10:00 AM, Otis Gospodnetic [EMAIL PROTECTED]  
wrote:



Hi Marshall,

I don't think there is a CSV Writer, but here are some pointers for  
writing

one:

$ ff \*Writer\*java | grep -v Test | grep request
./src/java/org/apache/solr/request/PHPResponseWriter.java
./src/java/org/apache/solr/request/XSLTResponseWriter.java
./src/java/org/apache/solr/request/JSONResponseWriter.java
./src/java/org/apache/solr/request/PythonResponseWriter.java
./src/java/org/apache/solr/request/RawResponseWriter.java
./src/java/org/apache/solr/request/QueryResponseWriter.java
./src/java/org/apache/solr/request/PHPSerializedResponseWriter.java
./src/java/org/apache/solr/request/BinaryResponseWriter.java
./src/java/org/apache/solr/request/RubyResponseWriter.java
./src/java/org/apache/solr/request/TextResponseWriter.java
./src/java/org/apache/solr/request/XMLWriter.java
./src/java/org/apache/solr/request/BinaryQueryResponseWriter.java
./src/java/org/apache/solr/request/XMLResponseWriter.java

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 

From: Marshall Weir [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Wednesday, June 11, 2008 12:52:50 PM
Subject: CSV output

Hi,

Does SOLR have .csv output? I can find references to .csv input, but
not output.

Thank you,
Marshall








Re: Question about fieldNorm

2008-06-11 Thread Yonik Seeley
That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:
 I've just changed the stemming algorithm slightly and am running a few tests
 against the old stemmer versus the new stemmer. I did a query for 'hanger'
 and using the old stemmer I get the following scoring for a document with
 the title: Converter Hanger Assembly Replacement

 6.4242806 = (MATCH) sum of:
  2.5697122 = (MATCH) max of:
0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.1963516 = queryWeight(markup_t:hanger), product of:
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)
  3.8545685 = (MATCH) max of:
0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
  0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.8320503 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)

 Using the new stemmer I get:

 5.621245 = (MATCH) sum of:
  2.248498 = (MATCH) max of:
0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.19635157 = queryWeight(markup_t:hanger), product of:
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)
  3.372747 = (MATCH) max of:
0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
  0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.83205026 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)

 The thing that is perplexing is that the fieldNorm for the title_t field is
 different in each of the explanations, ie: the fieldNorm using the old
 stemmer is: 0.5 = fieldNorm(field=title_t, doc=3454). For the new stemmer
  0.4375 = fieldNorm(field=title_t, doc=3454). I ran the title through both
 stemmers and get the same number of tokens produced. I do no index time
 boosting on the title_t field. I am using DefaultSimilarity in both
 instances. So I figured the calculated fieldNorm would be:

 field boost * lengthNorm = 1 * 

Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi Yonik,

Yes, I did rebuild the index, and they are the same document (I just  
verified). The only thing that changed was the stemmer, but that makes  
no sense to me. Also, if the equation for the fieldNorm is:


fieldBoost * lengthNorm = fieldBoost * 1/sqrt(numTermsForField)

Then that would mean numTermsForField would be: 5.22 when the norm is  
0.4375. Am I correct about how this is calculated?
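
For reference, a quick way to sanity-check that formula is to call
DefaultSimilarity directly. A minimal sketch, assuming the Lucene 2.x
lengthNorm(String fieldName, int numTerms) signature:

    import org.apache.lucene.search.DefaultSimilarity;

    public class LengthNormCheck {
        public static void main(String[] args) {
            DefaultSimilarity sim = new DefaultSimilarity();
            // 4 terms -> 0.5, 5 terms -> 0.4472136; neither is exactly 0.4375
            for (int numTerms = 3; numTerms <= 6; numTerms++) {
                System.out.println(numTerms + " terms -> "
                        + sim.lengthNorm("title_t", numTerms));
            }
        }
    }

Neither 4 terms (0.5) nor 5 terms (0.4472136) comes out as exactly 0.4375,
which is what has me confused.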


Thanks again
Brendan

On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:


That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:
I've just changed the stemming algorithm slightly and am running a  
few tests
against the old stemmer versus the new stemmer. I did a query for  
'hanger'
and using the old stemmer I get the following scoring for a  
document with

the title: Converter Hanger Assembly Replacement

6.4242806 = (MATCH) sum of:
2.5697122 = (MATCH) max of:
  0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.1963516 = queryWeight(markup_t:hanger), product of:
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)
3.8545685 = (MATCH) max of:
  0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.8320503 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
2.248498 = (MATCH) max of:
  0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.19635157 = queryWeight(markup_t:hanger), product of:
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)
3.372747 = (MATCH) max of:
  0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.83205026 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t  
field is
different in each of the explanations, ie: the fieldNorm using the  
old
stemmer is: 0.5 = 

Re: Ignore fields in XML response

2008-06-11 Thread Shalin Shekhar Mangar
Sure, use the fl parameter to specify the fields that you want
(comma-separated)

On Wed, Jun 11, 2008 at 11:31 PM, Yves Zoundi [EMAIL PROTECTED]
wrote:

 Hi guys,



Is it possible to remove some fields from the XML response?
 I have a field which can contain a huge amount of data and I would like
 it to be ignored in the XML response. Can it be achieved without
 writing a custom XMLResponseWriter?



 Thanks




-- 
Regards,
Shalin Shekhar Mangar.


Re: Ignore fields in XML response

2008-06-11 Thread Erik Hatcher
Yves - you can control which fields are returned from a search using  
the fl (field list) parameter.  fl=* provides all fields except  
score.  fl=id,title,score provides only those selected fields, etc.
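
For example, a minimal client-side sketch using plain java.net (the host,
port and field names below are just placeholders):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class FieldListExample {
        public static void main(String[] args) throws Exception {
            // Only id, title and score come back; the large field is simply omitted.
            URL url = new URL("http://localhost:8983/solr/select?q=*:*&fl=id,title,score");
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(url.openStream(), "UTF-8"));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
            in.close();
        }
    }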


Erik

On Jun 11, 2008, at 2:01 PM, Yves Zoundi wrote:


Hi guys,



   Is it possible to remove some fields from the XML response?
I have a field which can contain a huge amount of data and I would  
like

it to be ignored in the XML response. Can it be achieved without
writing a custom XMLResponseWriter?



Thanks





Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Hi Yonik,

I just realized that the stemmer does make a difference because of  
synonyms. So on indexing with the new stemmer, converter hanger  
assembly replacement gets expanded to converter hanger assembly  
assemble replacement, so there are 5 terms, which gives a length norm of  
0.4472136 instead of 0.5. I'm still unsure how it gets 0.4375 as  
the result for the field norm, though, unless I have a boost of 0.9783  
in there somewhere.


Brendan


On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:


That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:
I've just changed the stemming algorithm slightly and am running a  
few tests
against the old stemmer versus the new stemmer. I did a query for  
'hanger'
and using the old stemmer I get the following scoring for a  
document with

the title: Converter Hanger Assembly Replacement

6.4242806 = (MATCH) sum of:
2.5697122 = (MATCH) max of:
  0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.1963516 = queryWeight(markup_t:hanger), product of:
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)
3.8545685 = (MATCH) max of:
  0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.8320503 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
2.248498 = (MATCH) max of:
  0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.19635157 = queryWeight(markup_t:hanger), product of:
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)
3.372747 = (MATCH) max of:
  0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.83205026 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)

The thing that is perplexing is that the fieldNorm for the title_t  
field is
different in each of the explanations, ie: the fieldNorm using the  

RE: Ignore fields in XML response

2008-06-11 Thread Yves Zoundi
Thank you guys!

-Message d'origine-
De : Erik Hatcher [mailto:[EMAIL PROTECTED] 
Envoyé : 11 juin 2008 14:07
À : solr-user@lucene.apache.org
Objet : Re: Ignore fields in XML response

Yves - you can control which fields are returned from a search using  
the fl (field list) parameter.  fl=* provides all fields except  
score.  fl=id,title,score provides only those selected fields, etc.

Erik

On Jun 11, 2008, at 2:01 PM, Yves Zoundi wrote:

 Hi guys,



Is it possible to remove some fields from the XML response?
 I have a field which can contain a huge amount of data and I would  
 like
 it to be ignored in the XML response. Can it be achieved without
 writing a custom XMLResponseWriter?



 Thanks




Re: Question about fieldNorm

2008-06-11 Thread Yonik Seeley
Field norms have limited precision (they're encoded as an 8-bit float), so
you are probably seeing rounding.
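
For instance, a small sketch of that rounding, assuming Lucene 2.x's
Similarity.encodeNorm/decodeNorm helpers (the stored byte keeps only a few
bits of mantissa, so nearby values collapse onto the same stored value):

    import org.apache.lucene.search.Similarity;

    public class NormRounding {
        public static void main(String[] args) {
            float raw = 0.4472136f;                      // 1/sqrt(5), the raw length norm
            byte encoded = Similarity.encodeNorm(raw);   // what actually goes into the index
            float decoded = Similarity.decodeNorm(encoded);
            System.out.println(raw + " is stored as " + decoded);  // expect 0.4375
        }
    }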

-Yonik

On Wed, Jun 11, 2008 at 2:13 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:
 Hi Yonik,

 I just realized that the stemmer does make a difference because of synonyms.
 So on indexing with the new stemmer, converter hanger assembly replacement
 gets expanded to converter hanger assembly assemble replacement, so there
 are 5 terms, which gives a length norm of 0.4472136 instead of 0.5. I'm still
 unsure how it gets 0.4375 as the result for the field norm, though,
 unless I have a boost of 0.9783 in there somewhere.

 Brendan


 On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:

 That is strange... did you re-index or change the index?  If so, you
 might want to verify that docid=3454 still corresponds to the same
 document you queried earlier.

 -Yonik


 On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
 [EMAIL PROTECTED] wrote:

 I've just changed the stemming algorithm slightly and am running a few
 tests
 against the old stemmer versus the new stemmer. I did a query for
 'hanger'
 and using the old stemmer I get the following scoring for a document with
 the title: Converter Hanger Assembly Replacement

 6.4242806 = (MATCH) sum of:
 2.5697122 = (MATCH) max of:
  0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.1963516 = queryWeight(markup_t:hanger), product of:
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)
 3.8545685 = (MATCH) max of:
  0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.02993451 = queryNorm
1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.5593724 = idf(docFreq=6375, numDocs=1655591)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.8320503 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.02993451 = queryNorm
4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265229 = idf(docFreq=425, numDocs=1655591)
  0.5 = fieldNorm(field=title_t, doc=3454)

 Using the new stemmer I get:

 5.621245 = (MATCH) sum of:
 2.248498 = (MATCH) max of:
  0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
0.19635157 = queryWeight(markup_t:hanger), product of:
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
0.5547002 = queryWeight(title_t:hanger^2.0), product of:
  2.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = fieldNorm(field=title_t, doc=3454)
 3.372747 = (MATCH) max of:
  0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product of:
0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
  0.5 = boost
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.029934512 = queryNorm
1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454), product of:
  1.7320508 = tf(termFreq(markup_t:hanger)=3)
  6.559371 = idf(docFreq=6375, numDocs=1655589)
  0.109375 = fieldNorm(field=markup_t, doc=3454)
  3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
0.83205026 = queryWeight(title_t:hanger^3.0), product of:
  3.0 = boost
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.029934512 = queryNorm
4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454), product of:
  1.0 = tf(termFreq(title_t:hanger)=1)
  9.265228 = idf(docFreq=425, numDocs=1655589)
  0.4375 = 

Re: Question about fieldNorm

2008-06-11 Thread Brendan Grainger

Thanks so much, that explains it.

Brendan

On Jun 11, 2008, at 4:00 PM, Yonik Seeley wrote:


Field norms have limited precision (they're encoded as an 8-bit float), so
you are probably seeing rounding.

-Yonik

On Wed, Jun 11, 2008 at 2:13 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:

Hi Yonik,

I just realized that the stemmer does make a difference because of  
synonyms. So on indexing with the new stemmer, converter hanger assembly  
replacement gets expanded to converter hanger assembly assemble  
replacement, so there are 5 terms, which gives a length norm of 0.4472136  
instead of 0.5. I'm still unsure how it gets 0.4375 as the result for the  
field norm, though, unless I have a boost of 0.9783 in there somewhere.

Brendan


On Jun 11, 2008, at 1:37 PM, Yonik Seeley wrote:


That is strange... did you re-index or change the index?  If so, you
might want to verify that docid=3454 still corresponds to the same
document you queried earlier.

-Yonik


On Wed, Jun 11, 2008 at 1:09 PM, Brendan Grainger
[EMAIL PROTECTED] wrote:


I've just changed the stemming algorithm slightly and am running  
a few

tests
against the old stemmer versus the new stemmer. I did a query for
'hanger'
and using the old stemmer I get the following scoring for a  
document with

the title: Converter Hanger Assembly Replacement

6.4242806 = (MATCH) sum of:
2.5697122 = (MATCH) max of:
0.2439919 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.1963516 = queryWeight(markup_t:hanger), product of:
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.5697122 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)
3.8545685 = (MATCH) max of:
0.12199595 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.0981758 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.02993451 = queryNorm
  1.2426275 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.5593724 = idf(docFreq=6375, numDocs=1655591)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.8545685 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.8320503 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265229 = idf(docFreq=425, numDocs=1655591)
0.02993451 = queryNorm
  4.6326146 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265229 = idf(docFreq=425, numDocs=1655591)
0.5 = fieldNorm(field=title_t, doc=3454)

Using the new stemmer I get:

5.621245 = (MATCH) sum of:
2.248498 = (MATCH) max of:
0.24399184 = (MATCH) weight(markup_t:hanger in 3454), product of:
  0.19635157 = queryWeight(markup_t:hanger), product of:
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
2.248498 = (MATCH) weight(title_t:hanger^2.0 in 3454), product of:
  0.5547002 = queryWeight(title_t:hanger^2.0), product of:
2.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)
3.372747 = (MATCH) max of:
0.12199592 = (MATCH) weight(markup_t:hanger^0.5 in 3454), product  
of:

  0.09817579 = queryWeight(markup_t:hanger^0.5), product of:
0.5 = boost
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.029934512 = queryNorm
  1.2426274 = (MATCH) fieldWeight(markup_t:hanger in 3454),  
product of:

1.7320508 = tf(termFreq(markup_t:hanger)=3)
6.559371 = idf(docFreq=6375, numDocs=1655589)
0.109375 = fieldNorm(field=markup_t, doc=3454)
3.372747 = (MATCH) weight(title_t:hanger^3.0 in 3454), product of:
  0.83205026 = queryWeight(title_t:hanger^3.0), product of:
3.0 = boost
9.265228 = idf(docFreq=425, numDocs=1655589)
0.029934512 = queryNorm
  4.0535374 = (MATCH) fieldWeight(title_t:hanger in 3454),  
product of:

1.0 = tf(termFreq(title_t:hanger)=1)
9.265228 = idf(docFreq=425, numDocs=1655589)
0.4375 = fieldNorm(field=title_t, doc=3454)

The 

Re: Searching for words with accented characters.

2008-06-11 Thread solrtom

Hi  Robert,

Did you rebuild the index after changing your config? The index-time
analyzer is only applied when a document is indexed; changing it has no
effect on already-indexed documents.

Tom


Robert Haschart wrote:
 
 We are using Solr as the search engine for our public access library 
 catalog.  In testing I did a search for a French movie that I know is in 
 the catalog named:  Kirikou et la sorcière  and nothing was returned.  
 If I search for just the work Kirikou several results are returned, 
 and the problem becomes apparent.  The records contain Kirikou et la 
 sorcie?re  where the accent is a unicode combining character following 
 the e. 
 
 After some research into Unicode normalization, I found and installed a 
 Unicode normalization filter that is set to convert letters followed by 
 combining codes into the precomposed form.  I also installed a 
 solr.ISOLatin1AccentFilterFactory that will then convert these 
 precomposed forms into the latin equivalent without any accent.   The 
 following is the fieldType definition taken from the schema.xml file:
 
fieldType name=text class=solr.TextField
 positionIncrementGap=100
   analyzer type=index
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=schema.UnicodeNormalizationFilterFactory/
 filter class=solr.ISOLatin1AccentFilterFactory/
 filter class=solr.StopFilterFactory ignoreCase=true 
 words=stopwords.txt/
 filter class=solr.WordDelimiterFilterFactory 
 generateWordParts=1 generateNumberParts=1 catenateWords=1 
 catenateNumbers=1 catenateAll=0/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.EnglishPorterFilterFactory 
 protected=protwords.txt/
 filter class=solr.RemoveDuplicatesTokenFilterFactory/
   /analyzer
   analyzer type=query
 tokenizer class=solr.WhitespaceTokenizerFactory/
 filter class=schema.UnicodeNormalizationFilterFactory/
 filter class=solr.ISOLatin1AccentFilterFactory/
 filter class=solr.SynonymFilterFactory 
 synonyms=synonyms.txt ignoreCase=true expand=true/
 filter class=solr.StopFilterFactory ignoreCase=true 
 words=stopwords.txt/
 filter class=solr.WordDelimiterFilterFactory 
 generateWordParts=1 generateNumberParts=1 catenateWords=0 
 catenateNumbers=0 catenateAll=0/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.EnglishPorterFilterFactory 
 protected=protwords.txt/
 filter class=solr.RemoveDuplicatesTokenFilterFactory/
   /analyzer
 /fieldType
 
 So it seems like this should work. 
  However, searching again for Kirikou et la sorcière or sorcière or 
  sorcie?re or just sorciere  doesn't return the document in question.
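  
  One quick way to check, outside of Solr, whether the combining form is the 
  culprit is to normalize the raw string by hand. A small sketch, assuming 
  Java 6's java.text.Normalizer (the same job the 
  UnicodeNormalizationFilterFactory above is configured to do at analysis time):
  
      import java.text.Normalizer;
  
      public class CombiningCheck {
          public static void main(String[] args) {
              String combining   = "sorcie\u0300re";  // 'e' followed by U+0300 combining grave
              String precomposed = "sorci\u00e8re";   // single precomposed e-grave
              System.out.println(combining.equals(precomposed));           // false
              String nfc = Normalizer.normalize(combining, Normalizer.Form.NFC);
              System.out.println(nfc.equals(precomposed));                 // true
          }
      }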
 
 I've tried looking at the results from solr/admin/analysis.jsp  entering 
 in text from the record for the Field value (Index) and entering in 
  sorciere in the Field value (Query)  and I get the following results, which 
 seems to indicate that there should be a match between the stemmed entry 
 sorcier in the record and the stemmed word sorcier from the query.
 
 So clearly I am either doing something wrong or misinterpreting the 
 analyzers, but I am at a loss as to how to figure out what is wrong.  
 Any suggestions?
 
 
 org.apache.solr.analysis.WhitespaceTokenizerFactory {}
 
 term position 1   2   3   4   5   6   7   
 8   9   10  11  12  13  14  15  16 
 1718  19  20  21  22  23  24  25  26  
 27  28  29
 term text Kirikou et  la  sorcie?re   France  3   
 Cinema  /   RTBF 
 (Te?le?vision belge). Grand   Prixdu  festival
 d'Annecy1999 
 FranceFrench  VHS VIDEO   .VHS10969   1   vide?ocassette  
 (1h10   min.) 
 (VHS) Ocelot, Michel
 term type wordwordwordwordwordwordwordword
 word 
 word  wordwordwordwordwordwordwordwordword
 wordword 
 word  wordwordwordwordwordwordword
 source start,end  0,7 8,1011,13   14,23   25,31   32,33   34,40   
 41,42 
 43,47 48,61   62,69   72,77   78,82   83,85   86,94   95,103  104,108 
 110,116   117,123 124,127 129,134 135,144 
 147,148 149,163 164,169 
 170,175   176,181 183,190 191,197
 
 
 schema.UnicodeNormalizationFilterFactory {}
 
 term position 1   2   3   4   5   6   7   
 8   9   10  11  12  13  14  15  16 
 1718  19  20  21  22  23  24  25  26  
 27  28  29
 term text (Kirikou,0,7)   (et,8,10)   (la,11,13)  
 (sorcière,14,23) 
 (France,25,31)(3,32,33)   (Cinema,34,40)  (/,41,42)   
 

synonym token types and ranking

2008-06-11 Thread Uri Boness

Hi,

I've noticed that currently the SynonymFilter replaces the original 
token with the configured tokens list (which includes the original 
matched token) and each one of these tokens is of type word. Wouldn't 
it make more sense to only mark the original token as type word and 
the other tokens as synonym types? In addition, once payloads are 
integrated with Solr, it would be nice to be able to 
configure a payload for synonyms. One of the requirements we're 
currently facing in our project is that matches on synonyms should weigh 
less than exact matches.
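
To illustrate the kind of marking I mean, here is a rough sketch (my own, not 
existing Solr code), assuming the pre-attribute Token API and the usual 
convention that injected synonyms arrive with a position increment of 0:

    import java.io.IOException;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;

    /** Re-types tokens stacked at the same position (posIncr == 0) as "SYNONYM". */
    public class SynonymTypeFilter extends TokenFilter {
        public SynonymTypeFilter(TokenStream input) {
            super(input);
        }

        public Token next() throws IOException {
            Token t = input.next();
            // Caveat: other filters (e.g. WordDelimiterFilter) can also emit
            // position-increment-0 tokens, so a real implementation would set
            // the type inside the SynonymFilter itself.
            if (t != null && t.getPositionIncrement() == 0) {
                t.setType("SYNONYM");
            }
            return t;
        }
    }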


cheers,
Uri


Strategy for presenting fresh data

2008-06-11 Thread James Brady

Hi,
The product I'm working on requires new documents to be searchable  
very quickly (inside 60 seconds is my goal). The corpus is also going  
to grow very large, although it is perfectly partitionable by user.


The approach I tried first was to have write-only masters and read- 
only slaves with data being replicated from one to another postCommit  
and postOptimise.


This allowed new documents to be visible inside 5 minutes or so (until  
the indexes got so large that re-opening IndexSearchers took for ever,  
that is...), but still not good enough.


Now, I am considering cutting out the commit / replicate / re-open  
cycle by augmenting Solr with a RAMDirectory per core.


Your thoughts on the following approach would be much appreciated:

Searches would be forked to both the RAMDirectory and FSDirectory,  
while writes would go to the RAMDirectory only. The RAMDirectory would  
be flushed back to the FSDirectory regularly, using  
IndexWriter.addIndexes (or addIndexesNoOptimise).


Effectively, I'd be creating a searchable queue in front of a  
regularly committed and optimised conventional index.
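
A rough sketch of what I have in mind, in plain Lucene (2.x API names such as 
FSDirectory.getDirectory, flush, addIndexesNoOptimize and MultiSearcher 
assumed; this is only to illustrate the shape, not production code):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class SearchableQueue {
        private final Directory diskDir;
        private RAMDirectory ramDir = new RAMDirectory();
        private IndexWriter ramWriter;

        public SearchableQueue(String indexPath) throws Exception {
            diskDir = FSDirectory.getDirectory(indexPath);
            ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        }

        /** New documents go to the RAM segment only. */
        public IndexWriter getWriter() {
            return ramWriter;
        }

        /** Searches fan out over the fresh RAM segment and the main on-disk index. */
        public MultiSearcher openSearcher() throws Exception {
            ramWriter.flush();  // make buffered documents visible to new readers
            return new MultiSearcher(new Searchable[] {
                    new IndexSearcher(ramDir), new IndexSearcher(diskDir) });
        }

        /** Run on a schedule: fold the RAM index into the main index, start a fresh one. */
        public synchronized void flushToDisk() throws Exception {
            ramWriter.close();
            IndexWriter main = new IndexWriter(diskDir, new StandardAnalyzer(), false);
            main.addIndexesNoOptimize(new Directory[] { ramDir });
            main.close();
            ramDir = new RAMDirectory();
            ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        }
    }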


As this seems to be a useful pattern (and is mentioned tangentially in  
Lucene in Action), is there already support for this in Lucene?


Thanks,
James


Re: synonym token types and ranking

2008-06-11 Thread Otis Gospodnetic
Hi Uri,

Yes, I think that would make sense (word vs. synonym token types).  Custom 
boosting/weighting of original token vs. synonym token(s) also makes sense.  Is 
this something you can provide a patch for?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: Uri Boness [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 11, 2008 8:56:02 PM
 Subject: synonym token types and ranking
 
 Hi,
 
 I've noticed that currently the SynonymFilter replaces the original 
 token with the configured tokens list (which includes the original 
 matched token) and each one of these tokens is of type word. Wouldn't 
 it make more sense to only mark the original token as type word and 
 the other tokens as synonym types? In addition, once payloads are 
 integrated with Solr, it would be nice to be able to 
 configure a payload for synonyms. One of the requirements we're 
 currently facing in our project is that matches on synonyms should weigh 
 less than exact matches.
 
 cheers,
 Uri



Re: Strategy for presenting fresh data

2008-06-11 Thread Otis Gospodnetic
Hi James,

Yes, this makes sense.  I've recommended doing the same to others before.  It 
would be good to have this be a part of Solr.  There is one person (named 
Jason) working on adding more real-time search support to both Lucene and Solr.


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch


- Original Message 
 From: James Brady [EMAIL PROTECTED]
 To: solr-user@lucene.apache.org
 Sent: Wednesday, June 11, 2008 11:24:38 PM
 Subject: Strategy for presenting fresh data
 
 Hi,
 The product I'm working on requires new documents to be searchable  
 very quickly (inside 60 seconds is my goal). The corpus is also going  
 to grow very large, although it is perfectly partitionable by user.
 
 The approach I tried first was to have write-only masters and read- 
 only slaves with data being replicated from one to another postCommit  
 and postOptimise.
 
 This allowed new documents to be visible inside 5 minutes or so (until  
 the indexes got so large that re-opening IndexSearchers took for ever,  
 that is...), but still not good enough.
 
 Now, I am considering cutting out the commit / replicate / re-open  
 cycle by augmenting Solr with a RAMDirectory per core.
 
 Your thoughts on the following approach would be much appreciated:
 
 Searches would be forked to both the RAMDirectory and FSDirectory,  
 while writes would go to the RAMDirectory only. The RAMDirectory would  
 be flushed back to the FSDirectory regularly, using  
 IndexWriter.addIndexes (or addIndexesNoOptimise).
 
 Effectively, I'd be creating a searchable queue in front of a  
 regularly committed and optimised conventional index.
 
 As this seems to be a useful pattern (and is mentioned tangentially in  
 Lucene in Action), is there already support for this in Lucene?
 
 Thanks,
 James



Re: searching only within allowed documents

2008-06-11 Thread climbingrose
It depends on your query. The second query is better if you know that the
fieldb:bar filter query will be reused often, since it will be cached
separately from the main query. The first query occupies one cache entry while
the second one occupies two cache entries, one in the queryCache and one in the
filterCache. Therefore, if you're not going to reuse fieldb:bar, the
second query is better.
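
To make the distinction concrete, a small sketch of the two request shapes from
client code (the host, core and field names here are just placeholders):

    import java.io.InputStream;
    import java.net.URL;
    import java.net.URLEncoder;

    public class FilterQueryExample {
        public static void main(String[] args) throws Exception {
            String base = "http://localhost:8983/solr/select";

            // Everything inline: cached as a single queryCache entry.
            String inline = base + "?q="
                    + URLEncoder.encode("fielda:foo AND fieldb:bar", "UTF-8");

            // Restriction split out: q hits the queryCache, while fq=fieldb:bar goes
            // to the filterCache and can be reused by any query sending the same fq.
            String filtered = base + "?q=" + URLEncoder.encode("fielda:foo", "UTF-8")
                    + "&fq=" + URLEncoder.encode("fieldb:bar", "UTF-8");

            for (String u : new String[] { inline, filtered }) {
                InputStream in = new URL(u).openStream();  // fire the request
                in.close();
            }
        }
    }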

On Wed, Jun 11, 2008 at 10:53 PM, Geoffrey Young [EMAIL PROTECTED]
wrote:



  Solr allows you to specify filters in separate parameters that are
 applied to the main query, but cached separately.

 q=the user query&fq=folder:f13&fq=folder:f24


 I've been wanting more explanation around this for a while, so maybe now is
 a good time to ask :)

 the cached separately verbiage here is the same as in the twiki, but I
 don't really understand what it means.  More precisely, I'm wondering what
 the real performance, caching, etc. differences are between

  q=fielda:foo+fieldb:bar&mm=100%

 and

  q=fielda:foo&fq=fieldb:bar

 my situation is similar to the original poster's in that the set of documents matching
 fielda is very large and common (say, theaters across the world) while fieldb
 would narrow it considerably (one by country, then one by zipcode, etc.).

 thanks

 --Geoff





-- 
Regards,

Cuong Hoang


Re: searching only within allowed documents

2008-06-11 Thread climbingrose
Just to correct myself: in the last sentence, the first query is better if
fieldb:bar isn't reused often.

On Thu, Jun 12, 2008 at 2:02 PM, climbingrose [EMAIL PROTECTED]
wrote:

 It depends on your query. The second query is better if you know that the
 fieldb:bar filter query will be reused often, since it will be cached
 separately from the main query. The first query occupies one cache entry while
 the second one occupies two cache entries, one in the queryCache and one in the
 filterCache. Therefore, if you're not going to reuse fieldb:bar, the
 second query is better.


 On Wed, Jun 11, 2008 at 10:53 PM, Geoffrey Young 
 [EMAIL PROTECTED] wrote:



  Solr allows you to specify filters in separate parameters that are
 applied to the main query, but cached separately.

 q=the user query&fq=folder:f13&fq=folder:f24


 I've been wanting more explanation around this for a while, so maybe now
 is a good time to ask :)

 the cached separately verbiage here is the same as in the twiki, but I
 don't really understand what it means.  More precisely, I'm wondering what
 the real performance, caching, etc. differences are between

  q=fielda:foo+fieldb:bar&mm=100%

 and

  q=fielda:foo&fq=fieldb:bar

 my situation is similar to the original poster's in that the set of documents
 matching fielda is very large and common (say, theaters across the world)
 while fieldb would narrow it considerably (one by country, then one by
 zipcode, etc.).

 thanks

 --Geoff





 --
 Regards,

 Cuong Hoang




-- 
Regards,

Cuong Hoang


Re: Strategy for presenting fresh data

2008-06-11 Thread rohit arora



Hi,

I am new to Solr and Lucene. I have only one default core and am working on creating 
multiple cores. 
Can you help me with this?

with regards
Rohit Arora
--- On Thu, 6/12/08, James Brady <[EMAIL PROTECTED]> wrote:
From: James Brady <[EMAIL PROTECTED]>
Subject: Strategy for presenting fresh data
To: solr-user@lucene.apache.org
Date: Thursday, June 12, 2008, 8:54 AM

Hi,
The product I'm working on requires new documents to be searchable  
very quickly (inside 60 seconds is my goal). The corpus is also going  
to grow very large, although it is perfectly partitionable by user.

The approach I tried first was to have write-only masters and read- 
only slaves with data being replicated from one to another postCommit  
and postOptimise.

This allowed new documents to be visible inside 5 minutes or so (until  
the indexes got so large that re-opening IndexSearchers took for ever,  
that is...), but still not good enough.

Now, I am considering cutting out the commit / replicate / re-open  
cycle by augmenting Solr with a RAMDirectory per core.

Your thoughts on the following approach would be much appreciated:

Searches would be forked to both the RAMDirectory and FSDirectory,  
while writes would go to the RAMDirectory only. The RAMDirectory would  
be flushed back to the FSDirectory regularly, using  
IndexWriter.addIndexes (or addIndexesNoOptimise).

Effectively, I'd be creating a searchable queue in front of a  
regularly committed and optimised conventional index.

As this seems to be a useful pattern (and is mentioned tangentially in  
Lucene in Action), is there already support for this in Lucene?

Thanks,
James


  

DataImportHandler questions ..

2008-06-11 Thread Neville Burnell
Hi,

I'm playing with the Solr Data Import Handler, and everything looks
great so far! 

Hopefully we will be able to replace our homegrown ODBC indexing service
[using camping+ferret] with Solr!

The wiki page mentions scheduling full imports and delta imports, but I
couldn't find any further details. Is scheduling supported by the
current handler, or do I need to use an external trigger?
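
If an external trigger turns out to be the answer, here is a minimal sketch of
one; the URL, command name and interval are assumptions on my part rather than
anything the wiki promises:

    import java.io.InputStream;
    import java.net.URL;
    import java.util.Timer;
    import java.util.TimerTask;

    public class DeltaImportTrigger {
        public static void main(String[] args) {
            final String url =
                    "http://localhost:8983/solr/dataimport?command=delta-import";
            Timer timer = new Timer("delta-import", false);
            timer.schedule(new TimerTask() {
                public void run() {
                    try {
                        InputStream in = new URL(url).openStream();  // kick off a delta import
                        in.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }, 0, 5 * 60 * 1000L);  // every five minutes
        }
    }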

Also, any idea when the DataImportHandler [SOLR-469] might become part
of the nightlies? I read somewhere that it might happen RSN... fingers
crossed.

Thanks,

Neville