Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt

Yonik Seeley wrote:

On 9/5/07, Brian Carmalt [EMAIL PROTECTED] wrote:
  

I've been trying to index a 300MB file to Solr 1.2. I keep getting out of
memory heap errors.



300MB of what... a single 300MB document?  Or does that file represent
multiple documents in XML or CSV format?

-Yonik
  

Hello Yonik,

Thank you for your fast reply.  It is one large document. If it was made up
of smaller docs, I would split it up and index them separately.

Can Solr be made to handle such large docs?

Thanks, Brian


Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt

Hello again,

I run Solr on Tomcat under Windows and use the Tomcat monitor to start 
the service. I have set the minimum heap
size to 512MB and the maximum to 1024MB. The system has 2 GB of 
RAM. The error that I get after sending

approximately 300 MB is:

java.lang.OutOfMemoryError: Java heap space
   at org.xmlpull.mxp1.MXParser.fillBuf(MXParser.java:2947)
   at org.xmlpull.mxp1.MXParser.more(MXParser.java:3026)
   at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1384)
   at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
   at org.xmlpull.mxp1.MXParser.nextText(MXParser.java:1058)
   at org.apache.solr.handler.XmlUpdateRequestHandler.readDoc(XmlUpdateRequestHandler.java:332)
   at org.apache.solr.handler.XmlUpdateRequestHandler.update(XmlUpdateRequestHandler.java:162)
   at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:84)
   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:77)
   at org.apache.solr.core.SolrCore.execute(SolrCore.java:658)
   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:191)
   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:159)
   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:104)
   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:261)
   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:581)
   at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
   at java.lang.Thread.run(Thread.java:619)

After sleeping on the problem I see that it does not stem directly from 
Solr, but from the
module org.xmlpull.mxp1.MXParser. Hmmm. I'm open to suggestions and ideas.

First: is this doable?
If yes, will I have to modify the code to save the file to disk and then 
read it back
in order to index it in chunks?
Or can I get it working on a stock Solr install?

Thanks,

Brian

Norberto Meijome wrote:

On Wed, 05 Sep 2007 17:18:09 +0200
Brian Carmalt [EMAIL PROTECTED] wrote:

  
I've been trying to index a 300MB file to Solr 1.2. I keep getting out of 
memory heap errors.

Even on an empty index with one gig of VM memory it still won't work.



Hi Brian,

VM != heap memory.

VM = OS memory
heap memory = memory made available by the JVM to the Java process. Heap 
memory errors are hardly ever an issue of the app itself (other than, of course, 
with bad programming... but that doesn't seem to be the issue here so far)


[EMAIL PROTECTED] [Thu Sep  6 14:59:21 2007]
/usr/home/betom
$ java -X
[...]
-Xms<size>        set initial Java heap size
-Xmx<size>        set maximum Java heap size
-Xss<size>        set java thread stack size
[...]

For example, start solr as :
java  -Xms64m -Xmx512m   -jar start.jar

YMMV with respect to the actual values you use.

Good luck,
B
_
{Beto|Norberto|Numard} Meijome

Windows caters to everyone as though they are idiots. UNIX makes no such assumption. 
It assumes you know what you are doing, and presents the challenge of figuring it out for yourself if you don't.


I speak for myself, not my employer. Contents may be hot. Slippery when wet. 
Reading disclaimers makes you go blind. Writing them is worse. You have been 
Warned.

  




Re: Indexing very large files.

2007-09-06 Thread Thorsten Scherler
On Thu, 2007-09-06 at 08:55 +0200, Brian Carmalt wrote:
 Hello again,
 
 I run Solr on Tomcat under Windows and use the Tomcat monitor to start 
 the service. I have set the minimum heap
 size to 512MB and the maximum to 1024MB. The system has 2 GB of 
 RAM. The error that I get after sending
 approximately 300 MB is:
 
 java.lang.OutOfMemoryError: Java heap space
 [... stack trace snipped; identical to the trace quoted above ...]
 
 After sleeping on the problem I see that it does not stem directly from 
 Solr, but from the
 module org.xmlpull.mxp1.MXParser. Hmmm. I'm open to suggestions and ideas.

Which version of Solr are you using?

http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/handler/XmlUpdateRequestHandler.java?view=markup

The trunk version of the XmlUpdateRequestHandler is now based on StAX.
You may want to try whether that works better.

Please try and report back.

salu2
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



Tagging using SOLR

2007-09-06 Thread Doss
Dear all,

We are running an application built using Solr. Now we are trying to build
a tagging system using an existing Solr indexed field called
tag_keywords. This field has different keywords separated by commas; please
give suggestions on how we can build a tagging system using this field.

Thanks,
Mohandoss.


Re: Indexing very large files.

2007-09-06 Thread Brian Carmalt

Hello Thorsten,

I am using Solr 1.2.0. I'll try the svn version out and see if that helps.

Thanks,
Brian


Which version of Solr are you using?

http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/handler/XmlUpdateRequestHandler.java?view=markup

The trunk version of the XmlUpdateRequestHandler is now based on StAX.
You may want to try whether that works better.

Please try and report back.

salu2
  




solr.py problems with german Umlaute

2007-09-06 Thread Christian Klinger

Hi all,

I am trying to add/update documents with
the Python solr.py API.

Everything works fine so far,
but if I try to add documents which contain
German umlauts (ö, ä, ü, ...) I get errors.

Maybe someone has an idea how I could convert
my data?
Should I post this to JIRA?

Thanks for help.

Btw: I have no sitecustomize.py .

This is my script:
--
from solr import *
title = "Übersicht"
kw = {'id':'12','title':title,'system':'plone','url':'http://www.google.de'}
c = SolrConnection('http://192.168.2.13:8080/solr')
c.add_many([kw,])
c.commit()
--

This is the error:

  File "t.py", line 5, in ?
    c.add_many([kw,])
  File "/usr/local/lib/python2.4/site-packages/solr.py", line 596, in add_many
    self.__add(lst, doc)
  File "/usr/local/lib/python2.4/site-packages/solr.py", line 710, in __add
    lst.append('<field name="%s">%s</field>' % (
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
ordinal not in range(128)




Re: Indexing very large files.

2007-09-06 Thread Thorsten Scherler
On Thu, 2007-09-06 at 11:26 +0200, Brian Carmalt wrote:
 Hello again,
 
 I checked out the solr source and built the 1.3-dev version and then I 
 tried to index the same file to the new server.
 I do get a different exception trace, but the result is the same.
 
 java.lang.OutOfMemoryError: Java heap space
 at java.util.Arrays.copyOf(Arrays.java:2882)
 at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)

It seems that you are reaching the limits because of the StringBuilder.

Did you try to raise the mem to the max like:
java  -Xms1536m -Xmx1788m -jar start.jar

Anyway you will have to look into 
SolrInputDocument readDoc(XMLStreamReader parser) throws
XMLStreamException {
...
StringBuilder text = new StringBuilder();
...
case XMLStreamConstants.CHARACTERS:
  text.append( parser.getText() );
  break;
...

The problem is that the text object can grow bigger than the heap; 
maybe invoking garbage collection beforehand will help.

salu2
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions
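The streaming idea behind the StAX-based handler can be illustrated with a small Python sketch (this is only an analogy, not Solr code): instead of accumulating the whole document in one growing buffer, pull each field's text out as the parser reaches it and release the element immediately, so memory stays bounded. The `<add>/<doc>/<field>` layout below mirrors Solr's update format.

```python
import io
import xml.etree.ElementTree as ET

def field_texts(xml_stream):
    """Stream <field> text out of a Solr-style <add> document without
    holding the whole parsed tree in memory (the rough idea of StAX)."""
    texts = []
    for event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag == "field":
            texts.append((elem.get("name"), elem.text or ""))
        elem.clear()  # release the element so memory stays bounded
    return texts

doc = b"""<add><doc>
  <field name="id">42</field>
  <field name="body">a very large text field</field>
</doc></add>"""
print(field_texts(io.BytesIO(doc)))
```

With a real 300MB file the stream would come from `open(path, "rb")` rather than a BytesIO buffer.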



Re: Tagging using SOLR

2007-09-06 Thread Erik Hatcher


On Sep 6, 2007, at 3:29 AM, Doss wrote:
We are running an application built using Solr, now we are trying  
to build

a tagging system using the existing Solr indexed field called
tag_keywords, this field has different keywords separated by  
commas, please

give suggestions on how we can build a tagging system using this field?


There is also a wiki page on some brainstorming on how to implement  
tagging within Solr: http://wiki.apache.org/solr/UserTagDesign


It's easy enough to have a tag_keywords field, but updating a single  
tag_keywords field is not so straightforward without sending the  
entire document to Solr every time it is tagged.  See SOLR-139's  
extensive comments and patches to see what you're getting into.


Erik
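As a minimal sketch of the first step such a tagging system needs, the comma-separated tag_keywords value can be normalized into a list of distinct tags before it is indexed into a multiValued field. The function name and the normalization rules here are illustrative assumptions, not Solr API:

```python
def split_tags(tag_keywords):
    """Split a comma-separated tag_keywords value into clean,
    de-duplicated tags suitable for a multiValued field."""
    seen, tags = set(), []
    for raw in tag_keywords.split(","):
        tag = raw.strip().lower()
        if tag and tag not in seen:  # drop empties and duplicates
            seen.add(tag)
            tags.append(tag)
    return tags

print(split_tags("Solr, search,  tagging, search,"))
```

Each resulting tag can then be sent as its own `<field name="tag">...</field>` value, which makes faceting on tags straightforward.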



Re: Replication broken.. no helpful errors?

2007-09-06 Thread Bill Au
The snapinstaller script opens a new searcher by calling commit.  From the
attached debug output it looks like that actually worked:

+ /opt/solr/bin/commit
+ [[ 0 != 0 ]]
+ logExit ended 0

Try running the /opt/solr/bin/commit directly with the -V option.

Bill

On 9/5/07, Matthew Runo [EMAIL PROTECTED] wrote:

 If it helps anyone, this index is around a gig in size.

 ++
   | Matthew Runo
   | Zappos Development
   | [EMAIL PROTECTED]
   | 702-943-7833
 ++


 On Sep 5, 2007, at 3:14 PM, Matthew Runo wrote:

  It seems that the scripts cannot open new searchers at the end of
  the process, for some reason. Here's a message from cron, but I'm
  not sure what to make of it... It looks like the files properly
  copied over, but failed the install. I removed the temp* directory,
  but still SOLR could not launch a new searcher. I don't see any
  activity in catalina.out though...
 
 
  started by tomcat5
  command: /opt/solr/bin/snappuller -M search1 -P 18080 -D /opt/solr/
  data -S /opt/solr/logs -d /opt/solr/data -v
  pulling snapshot temp-snapshot.20070905150504
  receiving file list ... done
  deleting segments_1ine
  deleting _164h_1.del
  deleting _164h.tis
  deleting _164h.tii
  deleting _164h.prx
  deleting _164h.nrm
  deleting _164h.frq
  deleting _164h.fnm
  deleting _164h.fdx
  deleting _164h.fdt
  deleting _164g_1.del
  deleting _164g.tis
  deleting _164g.tii
  deleting _164g.prx
  deleting _164g.nrm
  deleting _164g.frq
  deleting _164g.fnm
  deleting _164g.fdx
  deleting _164g.fdt
  deleting _164f_1.del
  deleting _164f.tis
  deleting _164f.tii
  deleting _164f.prx
  deleting _164f.nrm
  deleting _164f.frq
  deleting _164f.fnm
  deleting _164f.fdx
  deleting _164f.fdt
  deleting _164e_1.del
  deleting _164e.tis
  deleting _164e.tii
  deleting _164e.prx
  deleting _164e.nrm
  deleting _164e.frq
  deleting _164e.fnm
  deleting _164e.fdx
  deleting _164e.fdt
  deleting _164d_1.del
  deleting _164d.tis
  deleting _164d.tii
  deleting _164d.prx
  deleting _164d.nrm
  deleting _164d.frq
  deleting _164d.fnm
  deleting _164d.fdx
  deleting _164d.fdt
  deleting _164c_1.del
  deleting _164c.tis
  deleting _164c.tii
  deleting _164c.prx
  deleting _164c.nrm
  deleting _164c.frq
  deleting _164c.fnm
  deleting _164c.fdx
  deleting _164c.fdt
  deleting _164b_1.del
  deleting _164b.tis
  deleting _164b.tii
  deleting _164b.prx
  deleting _164b.nrm
  deleting _164b.frq
  deleting _164b.fnm
  deleting _164b.fdx
  deleting _164b.fdt
  deleting _164a_1.del
  deleting _164a.tis
  deleting _164a.tii
  deleting _164a.prx
  deleting _164a.nrm
  deleting _164a.frq
  deleting _164a.fnm
  deleting _164a.fdx
  deleting _164a.fdt
  deleting _163z_3.del
  deleting _163z.tis
  deleting _163z.tii
  deleting _163z.prx
  deleting _163z.nrm
  deleting _163z.frq
  deleting _163z.fnm
  deleting _163z.fdx
  deleting _163z.fdt
  deleting _163o_3.del
  deleting _163o.tis
  deleting _163o.tii
  deleting _163o.prx
  deleting _163o.nrm
  deleting _163o.frq
  deleting _163o.fnm
  deleting _163o.fdx
  deleting _163o.fdt
  deleting _163d_4.del
  deleting _163d.tis
  deleting _163d.tii
  deleting _163d.prx
  deleting _163d.nrm
  deleting _163d.frq
  deleting _163d.fnm
  deleting _163d.fdx
  deleting _163d.fdt
  deleting _1632_6.del
  deleting _1632.tis
  deleting _1632.tii
  deleting _1632.prx
  deleting _1632.nrm
  deleting _1632.frq
  deleting _1632.fnm
  deleting _1632.fdx
  deleting _1632.fdt
  deleting _162r_7.del
  deleting _162r.tis
  deleting _162r.tii
  deleting _162r.prx
  deleting _162r.nrm
  deleting _162r.frq
  deleting _162r.fnm
  deleting _162r.fdx
  deleting _162r.fdt
  deleting _162g_d.del
  deleting _162g.tis
  deleting _162g.tii
  deleting _162g.prx
  deleting _162g.nrm
  deleting _162g.frq
  deleting _162g.fnm
  deleting _162g.fdx
  deleting _162g.fdt
  deleting _1625_m.del
  deleting _1625.tis
  deleting _1625.tii
  deleting _1625.prx
  deleting _1625.nrm
  deleting _1625.frq
  deleting _1625.fnm
  deleting _1625.fdx
  deleting _1625.fdt
  deleting _161u_w.del
  deleting _161u.tis
  deleting _161u.tii
  deleting _161u.prx
  deleting _161u.nrm
  deleting _161u.frq
  deleting _161u.fnm
  deleting _161u.fdx
  deleting _161u.fdt
  deleting _161j_16.del
  ./
  _161j_17.del
  _164m.fdt
  _164m.fdx
  _164m.fnm
  _164m.frq
  _164m.nrm
  _164m.prx
  _164m.tii
  _164m.tis
  _164m_1.del
  _164x.fdt
  _164x.fdx
  _164x.fnm
  _164x.frq
  _164x.nrm
  _164x.prx
  _164x.tii
  _164x.tis
  _164x_1.del
  segments.gen
  segments_1inv
 
  sent 516 bytes  received 105864302 bytes  30247090.86 bytes/sec
  total size is 966107226  speedup is 9.13
  + [[ -z search1 ]]
  + [[ -z /opt/solr/logs ]]
  + fixUser -M search1 -S /opt/solr/logs -d /opt/solr/data -V
  + [[ -z tomcat5 ]]
  ++ whoami
  + [[ tomcat5 != tomcat5 ]]
  ++ who -m
  ++ cut '-d ' -f1
  ++ sed '-es/^.*!//'
  + oldwhoami=
  + [[ '' == '' ]]
  +++ pgrep -g0 snapinstaller

RSS syndication Plugin

2007-09-06 Thread Thorsten Scherler
Hi all,

I am curious whether somebody has written an RSS plugin for Solr.

The idea is to provide an RSS syndication link for the current search. 

It should be really easy to implement, since it would just be a
transformation from Solr XML to RSS, which can easily be done with a simple
XSL.

Has somebody already done this?

salu2
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions
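For illustration, the Solr-XML-to-RSS transformation can be sketched in plain Python as well as in XSL. The field names `title` and `url` are assumptions about the schema, and the RSS produced here is deliberately minimal:

```python
import xml.etree.ElementTree as ET

def solr_to_rss(solr_xml, title_field="title", link_field="url"):
    """Turn a Solr <response> into a minimal RSS 2.0 document.
    Field names are schema assumptions; adjust for a real index."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results"
    for doc in ET.fromstring(solr_xml).iter("doc"):
        fields = {f.get("name"): f.text for f in doc.iter("str")}
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = fields.get(title_field, "")
        ET.SubElement(item, "link").text = fields.get(link_field, "")
    return ET.tostring(rss, encoding="unicode")

response = """<response><result numFound="1">
  <doc><str name="title">Hello</str><str name="url">http://example.com/1</str></doc>
</result></response>"""
print(solr_to_rss(response))
```

In practice the XSLT response writer shipped with Solr (as Ryan's reply points out) does this server-side, so no extra code is needed.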



Re: RSS syndication Plugin

2007-09-06 Thread Ryan McKinley


perhaps:
https://issues.apache.org/jira/browse/SOLR-208

in http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/xslt/

check:
example_atom.xsl
example_rss.xsl


Thorsten Scherler wrote:

Hi all,

I am curious whether somebody has written an RSS plugin for Solr.

The idea is to provide an RSS syndication link for the current search. 


It should be really easy to implement, since it would just be a
transformation from Solr XML to RSS, which can easily be done with a simple
XSL.

Has somebody already done this?

salu2




Re: Distribution Information?

2007-09-06 Thread Bill Au
That is very strange.  Even if there is something wrong with the config or
code, the static HTML contained in distributiondump.jsp should show up.

Are you using the latest version of the JSP?  There has been a recent fix:

http://issues.apache.org/jira/browse/SOLR-333

Bill

On 9/5/07, Matthew Runo [EMAIL PROTECTED] wrote:

 When I load the distributiondump.jsp, there is no output in my
 catalina.out file.

 ++
   | Matthew Runo
   | Zappos Development
   | [EMAIL PROTECTED]
   | 702-943-7833
 ++


 On Sep 5, 2007, at 1:55 PM, Matthew Runo wrote:

  Not that I've noticed. I'll do a more careful grep soon here - I
  just got back from a long weekend.
 
  ++
   | Matthew Runo
   | Zappos Development
   | [EMAIL PROTECTED]
   | 702-943-7833
  ++
 
 
  On Aug 31, 2007, at 6:12 PM, Bill Au wrote:
 
  Are there any error message in your appserver log files?
 
  Bill
 
  On 8/31/07, Matthew Runo [EMAIL PROTECTED] wrote:
  Hello!
 
  /solr/admin/distributiondump.jsp
 
  This server is set up as a master server, and other servers use the
  replication scripts to pull updates from it every few minutes. My
  distribution information screen is blank.. and I couldn't find any
  information on fixing this in the wiki.
 
  Any chance someone would be able to explain how to get this page
  working, or what I'm doing wrong?
 
  ++
| Matthew Runo
| Zappos Development
| [EMAIL PROTECTED]
| 702-943-7833
  ++
 
 
 
 
 




update servlet not working

2007-09-06 Thread Benjamin Li
Hi,

We have the example Solr installed with Jetty.

We are able to navigate to the solr/admin page, but when we try to
POST an XML document via the command line, there is a fatal error. It
seems that the solr/update servlet isn't running, giving an HTTP 400
error.

does anyone have any clue what is going on?

thanks in advance!

-- 
cheers,
ben


Re: update servlet not working

2007-09-06 Thread Chris Hostetter
: We are able to navigate to the solr/admin page, but when we try to
: POST an XML document via the command line, there is a fatal error. It
: seems that the solr/update servlet isn't running, giving an HTTP 400
: error.

a 400 could mean a lot of things ... what is the full HTTP response you 
get back from Solr?  what kinds of Stack traces show up in the Jetty log 
output?




-Hoss



Re: Distribution Information?

2007-09-06 Thread Matthew Runo

Well, I do get...

Distribution Info
Master Server

No distribution info present

...

But there appears to be no information filled in.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Sep 6, 2007, at 6:09 AM, Bill Au wrote:

That is very strange.  Even if there is something wrong with the  
config or
code, the static HTML contained in distributiondump.jsp should show  
up.


Are you using the latest version of the JSP?  There has been a  
recent fix:


http://issues.apache.org/jira/browse/SOLR-333

Bill

On 9/5/07, Matthew Runo [EMAIL PROTECTED] wrote:


When I load the distributiondump.jsp, there is no output in my
catalina.out file.

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++


On Sep 5, 2007, at 1:55 PM, Matthew Runo wrote:


Not that I've noticed. I'll do a more careful grep soon here - I
just got back from a long weekend.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Aug 31, 2007, at 6:12 PM, Bill Au wrote:


Are there any error message in your appserver log files?

Bill

On 8/31/07, Matthew Runo [EMAIL PROTECTED] wrote:

Hello!

/solr/admin/distributiondump.jsp

This server is set up as a master server, and other servers use  
the

replication scripts to pull updates from it every few minutes. My
distribution information screen is blank.. and I couldn't find any
information on fixing this in the wiki.

Any chance someone would be able to explain how to get this page
working, or what I'm doing wrong?

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++














RE: Indexing very large files.

2007-09-06 Thread Lance Norskog
Now I'm curious: what is the use case for documents this large?

Thanks,

Lance Norskog



Re: Replication broken.. no helpful errors?

2007-09-06 Thread Matthew Runo
The thing is that a new searcher is not opened if I look in the  
stats.jsp page. The index version never changes.


When I run..

sudo /opt/solr/bin/commit -V -u tomcat5

..I get a new searcher opened, but even though it (in theory)  
installed the new index, I see no docs in there. During the  
snapinstaller...


+ echo 2007/09/06 11:43:49 command: /opt/solr/bin/snapinstaller -M  
search1 -S /opt/solr/logs -d /opt/solr/data -V -u tomcat5

+ [[ -n '' ]]
++ ls /opt/solr/data
++ grep 'snapshot\.'
++ grep -v wip
++ sort -r
++ head -1
+ name=temp-snapshot.20070905150504
+ trap 'echo caught INT/TERM, exiting now but partial installation  
may have already occured;/bin/rm -rf ${data_dir/index.tmp$$;logExit  
aborted 13' INT TERM

+ [[ temp-snapshot.20070905150504 == '' ]]
+ name=/opt/solr/data/temp-snapshot.20070905150504
++ cat /opt/solr/logs/snapshot.current


...it would seem that snappuller might not be properly setting the  
directory name - or should it be temp-*?


I had replication working for a few weeks, and then it broke, and has  
been down since. We're going live with this project in about a week,  
and I really need to get this going before then =p


++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Sep 6, 2007, at 6:01 AM, Bill Au wrote:

The snapinstaller script opens a new searcher by calling commit.   
From the

attached debug output it looks like that actually worked:

+ /opt/solr/bin/commit
+ [[ 0 != 0 ]]
+ logExit ended 0

Try running the /opt/solr/bin/commit directly with the -V option.

Bill

On 9/5/07, Matthew Runo [EMAIL PROTECTED] wrote:


If it helps anyone, this index is around a gig in size.

++
  | Matthew Runo
  | Zappos Development
  | [EMAIL PROTECTED]
  | 702-943-7833
++


On Sep 5, 2007, at 3:14 PM, Matthew Runo wrote:


It seems that the scripts cannot open new searchers at the end of
the process, for some reason. Here's a message from cron, but I'm
not sure what to make of it... It looks like the files properly
copied over, but failed the install. I removed the temp* directory,
but still SOLR could not launch a new searcher. I don't see any
activity in catalina.out though...


started by tomcat5
command: /opt/solr/bin/snappuller -M search1 -P 18080 -D /opt/solr/
data -S /opt/solr/logs -d /opt/solr/data -v
pulling snapshot temp-snapshot.20070905150504
receiving file list ... done
deleting segments_1ine
[... long rsync file listing snipped; identical to the listing quoted above ...]

Re: updates on the server

2007-09-06 Thread Matthew Runo
On a related note, it'd be great if we could set up a series of  
transformations to be done on data when it comes into the index,  
before being indexed. I guess a custom tokenizer might be the best  
way to do this, though?


ie:

-Post
-Data is cleaned up, properly escaped, etc
-Then data is passed to whatever tokenizer we want to use.

++
 | Matthew Runo
 | Zappos Development
 | [EMAIL PROTECTED]
 | 702-943-7833
++


On Sep 3, 2007, at 7:10 AM, Erik Hatcher wrote:



On Sep 3, 2007, at 12:22 AM, James O'Rourke wrote:
Is there a way to pass the solr server a set of documents without  
all the fields present and only update the fields that are  
provided leaving the remaining document fields intact or do I need  
to pull those documents over the wire myself and do the update  
manual and then add them back to the index?


With Solr currently you cannot update a specific field, you have to  
re-send the entire document to replace the existing one.  However,  
preliminary support for such capability has been contributed here:  
http://issues.apache.org/jira/browse/SOLR-139 - this is not in its  
final form, so this is to use at your own risk given the caveats  
listed in that issue about concurrency.


I'm currently using the patch I posted to that issue in a  
production environment and its working fine thus far, but it will  
change in at least core ways and likely request parameter and  
formatting ways before making its debut in Solr's trunk.


Erik





RE: solr.py problems with german Umlaute

2007-09-06 Thread Lance Norskog
I researched this problem before. The problem I found is that Python strings
are not Unicode by default. You have to do something to make them Unicode.
Here are the links I found:

http://www.reportlab.com/i18n/python_unicode_tutorial.html
 
http://evanjones.ca/python-utf8.html
 
http://jjinux.blogspot.com/2006/04/python-protecting-utf-8-strings-from.html


We do the utf-8 encode on submit, and so our strings end up badly encoded and
stored. We are seeing the problem described by Marc-Andre Lemburg in the
reportlab.com link: an e with a forward accent becomes some Japanese character.

-Original Message-
From: news [mailto:[EMAIL PROTECTED] On Behalf Of Christian Klinger
Sent: Thursday, September 06, 2007 2:55 AM
To: solr-user@lucene.apache.org
Subject: solr.py problems with german Umlaute

Hi all,

I am trying to add/update documents with
the Python solr.py API.

Everything works fine so far,
but if I try to add documents which contain German umlauts (ö, ä, ü, ...) I
get errors.

Maybe someone has an idea how I could convert my data?
Should I post this to JIRA?

Thanks for help.

Btw: I have no sitecustomize.py .

This is my script:
--
from solr import *
title = "Übersicht"
kw = {'id':'12','title':title,'system':'plone','url':'http://www.google.de'}
c = SolrConnection('http://192.168.2.13:8080/solr')
c.add_many([kw,])
c.commit()
--

This is the error:

   File "t.py", line 5, in ?
     c.add_many([kw,])
   File "/usr/local/lib/python2.4/site-packages/solr.py", line 596, in add_many
     self.__add(lst, doc)
   File "/usr/local/lib/python2.4/site-packages/solr.py", line 710, in __add
     lst.append('<field name="%s">%s</field>' % (
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: 
ordinal not in range(128)



Re: solr.py problems with german Umlaute

2007-09-06 Thread Yonik Seeley
On 9/6/07, Brian Carmalt [EMAIL PROTECTED] wrote:
 Try it with title.encode('utf-8').
 As in: kw =
 {'id':'12','title':title.encode('utf-8'),'system':'plone','url':'http://www.google.de'}

It seems like the client library should be responsible for encoding,
not the user.
So try changing
title = "Übersicht"
  into a unicode string via
title = u"Übersicht"

And that should hopefully get your test program working.
If it doesn't it's probably a solr.py bug and should be fixed there.

-Yonik
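In Python 2 the error comes from mixing byte strings with unicode; the rule of thumb both replies point at is: keep text as unicode inside the program and encode to UTF-8 only at the transport boundary. A small sketch (written in Python 3 spelling, where `str` is already unicode):

```python
title = "Übersicht"  # unicode text (u"Übersicht" in Python 2)

# Encode only at the wire boundary, as the client library should:
payload = '<field name="title">{}</field>'.format(title).encode("utf-8")

# The UTF-8 bytes round-trip back to the same text:
roundtrip = payload.decode("utf-8")
print(roundtrip)
```

Decoding those bytes with the ASCII codec instead of UTF-8 is exactly what produces the `UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3` seen in the traceback.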


solr/home

2007-09-06 Thread Matt Mitchell

Hi,

I recently upgraded to Solr 1.2. I've set it up through Tomcat using  
context fragment files. I deploy using the Tomcat web manager. In the  
context fragment I set the environment variable solr/home. This used  
to work as expected: the solr/home value pointed to the directory  
where data, conf, etc. live. Now, this value doesn't get used and  
instead Tomcat creates a new directory called solr (and solr/data)  
in the same directory where the context fragment file is located.  
It's not really a problem in this particular instance. I like the  
idea of it defaulting to solr in the same location as the context  
fragment file, as long as I can depend on it always working like  
that. It is a little puzzling why the value in my environment  
setting doesn't work, though.


Has anyone else experienced this behavior?

Matt


Re: update servlet not working

2007-09-06 Thread Tom Hill
I don't use the Java client, but when I switched to 1.2, I'd get that
message when I forgot to add the content-type header, as described in
CHANGES.txt:

  9. The example solrconfig.xml maps /update to XmlUpdateRequestHandler using
the new request dispatcher (SOLR-104).  This requires posted content to
have a valid contentType: curl -H 'Content-type:text/xml; charset=utf-8'
The response format matches that of /select and returns standard error
codes.  To enable solr1.1 style /update, do not map /update to any
handler in solrconfig.xml (ryan)

But your request log shows a GET, should be a POST, I would think. I'd
double check the parameters on post.jar


On 9/6/07, Benjamin Li [EMAIL PROTECTED] wrote:
 oops, sorry, it says "missing content stream"

 as far as logs go: i have a request log, didn't find anything with
 stack traces though. where is it? we're using the example one packaged
 with solr.
 "GET /solr/update HTTP/1.1" 400 1401

 just to make sure, i typed java -jar post.jar solrfile.xml

 thanks!

 On 9/6/07, Chris Hostetter [EMAIL PROTECTED] wrote:
  : We are able to navigate to the solr/admin page, but when we try to
  : POST an xml document via the command line, there is a fatal error. It
  : seems that the solr/update servlet isnt running, giving a http 400
  : error.
 
  a 400 could mean a lot of things ... what is the full HTTP response you
  get back from Solr?  what kinds of Stack traces show up in the Jetty log
  output?
 
 
 
 
  -Hoss
 
 


 --
 cheers,
 ben



Re: Replication broken.. no helpful errors?

2007-09-06 Thread Yonik Seeley
On 9/6/07, Matthew Runo [EMAIL PROTECTED] wrote:
 The thing is that a new searcher is not opened if I look in the
 stats.jsp page. The index version never changes.

The index version is read from the index... hence if the lucene index
doesn't change (even if a new snapshot was taken), the version won't
change even if a new searcher was opened.

Is the problem on the master side now since it looks like the slave is
pulling a temp-snapshot?

-Yonik


Re: solr/home

2007-09-06 Thread Matt Mitchell

Here you go:

<Context docBase="/usr/local/lib/solr.war" debug="0" crossContext="true">
   <Environment name="solr/home" type="java.lang.String"
     value="/usr/local/projects/my_app/current/solr-home" />
</Context>

This is the same file I'm putting into the Tomcat manager XML  
Configuration file URL form input.


Matt

On Sep 6, 2007, at 3:25 PM, Tom Hill wrote:


It works for me. (fragments with solr 1.2 on tomcat 5.5.20)

Could you post your fragment file?

Tom


On 9/6/07, Matt Mitchell [EMAIL PROTECTED] wrote:

Hi,

I recently upgraded to Solr 1.2. I've set it up through Tomcat using
context fragment files. I deploy using the tomcat web manager. In the
context fragment I set the environment variable solr/home. This used
to work as expected. The solr/home value pointed to the directory
where data, conf etc. live. Now, this value doesn't get used and
instead, tomcat creates a new directory called "solr" and "solr/data"
in the same directory where the context fragment file is located.
It's not really a problem in this particular instance. I like the
idea of it defaulting to solr in the same location as the context
fragment file, but as long as I can depend on it always working like
that. It is a little puzzling as to why the value in my environment
setting doesn't work though?

Has anyone else experienced this behavior?

Matt





Re: solr.py problems with german Umlaute

2007-09-06 Thread Mike Klaas


On 6-Sep-07, at 12:13 PM, Yonik Seeley wrote:


On 9/6/07, Brian Carmalt [EMAIL PROTECTED] wrote:

Try it with title.encode('utf-8').
As in:
kw = {'id': '12', 'title': title.encode('utf-8'),
      'system': 'plone', 'url': 'http://www.google.de'}


It seems like the client library should be responsible for encoding,
not the user.
So try changing
title = "Übersicht"
  into a unicode string via
title = u"Übersicht"

And that should hopefully get your test program working.
If it doesn't it's probably a solr.py bug and should be fixed there.


It may or may not, depending on the vagaries of the encoding in his  
text editor.


What python gets when you enter u'é' is the byte sequence  
corresponding to the encoding of your editor.  For instance, my  
terminal is set to utf-8 and when I type in é it is equivalent to  
entering the bytes C3 A9:


In [5]: 'é'
Out[5]: '\xc3\xa9'

Prepending u does not work, because you are telling python that you  
want these two bytes as unicode characters.  Note that this could be  
fixed by setting python's default encoding to match.


In [1]: u'é'
Out[1]: u'\xc3\xa9'
In [11]: print u'é'
é

The proper thing to do is to interpret the byte sequence given the  
proper encoding:


In [3]: 'é'.decode('utf-8')
Out[3]: u'\xe9'

or enter the desired unicode character directly:

>>> u'\u00e9'
u'\xe9'
>>> print u'\u00e9'
é

This is less complicated in the usual case of reading data from a  
file, because the encoding should be known (terminal encoding issues  
are much trickier).  Use codecs.open() to get a unicode-output text  
stream.
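A small sketch of that last point (the file name is arbitrary; the same calls work in later Python versions): writing unicode through codecs.open encodes it on the way out, and reading it back yields unicode, not raw bytes.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Write a unicode string; codecs.open encodes it to UTF-8 transparently.
with codecs.open(path, 'w', encoding='utf-8') as f:
    f.write(u'\u00e9')  # é

# Reading back through codecs.open decodes, giving unicode again.
with codecs.open(path, 'r', encoding='utf-8') as f:
    text = f.read()

print(len(text))  # 1 -- one character, even though the file holds two bytes
```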


-Mike 

searching where a value is not null?

2007-09-06 Thread David Whalen
Hi all.

I'm trying to construct a query that in pseudo-code would read
like this:

field != ''

I'm finding it difficult to write this as a solr query, though.
Stuff like:

NOT field:()

doesn't seem to do the trick.

any ideas?

dw


Re: searching where a value is not null?

2007-09-06 Thread Yonik Seeley
On 9/6/07, David Whalen [EMAIL PROTECTED] wrote:
 Hi all.

 I'm trying to construct a query that in pseudo-code would read
 like this:

 field != ''

 I'm finding it difficult to write this as a solr query, though.
 Stuff like:

 NOT field:()

 doesn't seem to do the trick.

 any ideas?

perhaps field:[* TO *]

-Yonik
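A toy model (plain Python, not Lucene internals) of why the open-ended range works as a "not null" test: field:[* TO *] matches any document with at least one indexed value in the field, so its negation finds the rest.

```python
docs = [
    {'id': 1, 'field': 'hello'},
    {'id': 2},                    # field missing: the "null" case
    {'id': 3, 'field': 'world'},
]

def has_value(doc):
    # Mirrors field:[* TO *]: true iff the field exists with a value.
    return 'field' in doc

with_value = [d['id'] for d in docs if has_value(d)]
without = [d['id'] for d in docs if not has_value(d)]
print(with_value, without)  # [1, 3] [2]
```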


Slow response

2007-09-06 Thread Aaron Hammond
I am pretty new to Solr and this is my first post to this list so please
forgive me if I make any glaring errors. 

 

Here's my problem. When I do a search using the Solr admin interface for
a term that I know does not exist in my index the QTime is about 1ms.
However, if I add facets to the search the response takes more than 20
seconds (and sometimes longer) to return. Here is the slow URL - 

 

/select?qf=AUTHOR_t+SUBJECT_t+TITLE_t&wt=xml&f.AUTHOR_facet.facet.sort=true
&f.FORMAT_t.facet.limit=25&start=0&facet=true&facet.mincount=1&q=frak
&f.FORMAT_t.facet.mincount=1&f.ITYPE_facet.facet.mincount=1
&f.SUBJECT_facet.facet.limit=25&facet.field=AUTHOR_facet
&facet.field=FORMAT_t&facet.field=LANGUAGE_t&facet.field=PUBDATE_t
&facet.field=SUBJECT_facet&facet.field=AGENCY_facet&facet.field=ITYPE_facet
&f.AGENCY_facet.facet.sort=true&f.AGENCY_facet.facet.limit=-1&rows=10
&f.ITYPE_facet.facet.limit=-1&f.ITYPE_facet.facet.sort=true
&f.AUTHOR_facet.facet.limit=25&f.LANGUAGE_t.facet.sort=true
&f.PUBDATE_t.facet.limit=-1&f.AGENCY_facet.facet.mincount=1
&f.AUTHOR_facet.facet.mincount=1&fl=*&fl=score&qt=dismax&version=2.2
&f.SUBJECT_facet.facet.sort=true&f.SUBJECT_facet.facet.mincount=1
&f.PUBDATE_t.facet.sort=false&f.FORMAT_t.facet.sort=true
&f.LANGUAGE_t.facet.limit=25&f.LANGUAGE_t.facet.mincount=1

 

I am pretty sure I can't be the first to ask this question but I can't
seem to find anything online with the answer. Thanks for your help.

 

Aaron



Re: Slow response

2007-09-06 Thread Yonik Seeley
On 9/6/07, Aaron Hammond [EMAIL PROTECTED] wrote:
 I am pretty new to Solr and this is my first post to this list so please
 forgive me if I make any glaring errors.

 Here's my problem. When I do a search using the Solr admin interface for
 a term that I know does not exist in my index the QTime is about 1ms.
 However, if I add facets to the search the response takes more than 20
 seconds (and sometimes longer) to return. Here is the slow URL -

Faceting on multi-value fields is more a function of the number of
terms in the field (and their distribution) rather than the number of
hits for a query.  That said, perhaps faceting should be able to bail
out if there are no hits.

Is your question more about why faceting takes so long in general, or
why it takes so long if there are no results?  If you haven't, try
optimizing your index for faster faceting in general.  How many docs do
you have in your index?

As a side note, the way multi-valued faceting currently works, it's
actually normally faster if the query returns a large number of hits.

-Yonik


Non-HTTP Indexing

2007-09-06 Thread Renaud Waldura
Dear Solr Users:
 
Is it possible to index documents directly without going through any
XML/HTTP bridge?
I have a large collection (10^7 documents, some very large) and indexing
speed is a concern.
Thanks!
 
--Renaud
 


RE: Non-HTTP Indexing

2007-09-06 Thread Wu, Daniel
There are a couple of choices, see:

http://wiki.apache.org/solr/SolJava

- Daniel

 -Original Message-
 From: Renaud Waldura [mailto:[EMAIL PROTECTED] 
 Sent: Thursday, September 06, 2007 2:21 PM
 To: solr-user@lucene.apache.org
 Subject: Non-HTTP Indexing
 
 
 Dear Solr Users:
  
 Is it possible to index documents directly without going 
 through any XML/HTTP bridge? I have a large collection (10^7 
 documents, some very large) and indexing speed is a concern. Thanks!
  
 --Renaud
  
 


RE: Slow response

2007-09-06 Thread Aaron Hammond
Thank-you for your response, this does shed some light on the subject.
Our basic question was why were we seeing slower responses the smaller
our result set got. 

Currently we are searching about 1.2 million documents with the source
document about 2KB, but we do duplicate some of the data. I bumped up my
filterCache to 5 million and the 2nd search I did for a non-indexed
term came back in 2.1 seconds so that is much improved. I am a little
concerned about having this value so high but this is our problem and we
will play with it. 

I do have a few follow-up questions. First, in regard to the
filterCache: once a single search has been done and facets requested, as
long as no new facets are requested and the cache size is large enough,
the filters will remain in the cache, correct?

Also, you mention that faceting is more a function of the number of
terms in the field. The 2 fields causing our problems are
Authors and Subjects. If we divided up the data that made these facets
into more specific fields (Primary author, secondary author, etc.) would
this perform better? So the number of facet fields would increase but
the unique terms for a given facet should be less.

Thanks again for all your help.

Aaron


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Thursday, September 06, 2007 4:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Slow response

On 9/6/07, Aaron Hammond [EMAIL PROTECTED] wrote:
 I am pretty new to Solr and this is my first post to this list so
please
 forgive me if I make any glaring errors.

 Here's my problem. When I do a search using the Solr admin interface
for
 a term that I know does not exist in my index the QTime is about 1ms.
 However, if I add facets to the search the response takes more than 20
 seconds (and sometimes longer) to return. Here is the slow URL -

Faceting on multi-value fields is more a function of the number of
terms in the field (and their distribution) rather than the number of
hits for a query.  That said, perhaps faceting should be able to bail
out if there are no hits.

Is your question more about why faceting takes so long in general, or
why it takes so long if there are no results?  If you haven't, try
optimizing your index for faster faceting in general.  How many docs do
you have in your index?

As a side note, the way multi-valued faceting currently works, it's
actually normally faster if the query returns a large number of hits.

-Yonik


Re: Slow response

2007-09-06 Thread Mike Klaas

On 6-Sep-07, at 3:16 PM, Aaron Hammond wrote:


Thank-you for your response, this does shed some light on the subject.
Our basic question was why were we seeing slower responses the smaller
our result set got.

Currently we are searching about 1.2 million documents with the source
document about 2KB, but we do duplicate some of the data. I bumped up my
filterCache to 5 million and the 2nd search I did for a non-indexed
term came back in 2.1 seconds so that is much improved. I am a little
concerned about having this value so high but this is our problem and we
will play with it.

I do have a few follow-up questions. First, in regards to the
filterCache once a single search has been done and facets requested, as
long as new facets aren't requested and the size is large enough then
the filters will remain in the cache, correct?

Also, you mention that faceting is more a function of the number of
terms in the field. The 2 fields causing our problems are
Authors and Subjects. If we divided up the data that made these facets
into more specific fields (Primary author, secondary author, etc.) would
this perform better? So the number of facet fields would increase but
the unique terms for a given facet should be less.


There are essentially two facet computation strategies:

1. cached bitsets: a bitset for each term is generated and  
intersected with the query result bitset.  This is more general and  
performs well up to a few thousand terms.


2. field enumeration: cache the field contents, and generate counts  
using this data.  Relatively independent of #unique terms, but  
requires at most a single facet value per field per document.


So, if you factor author into Primary author/Secondary author, where  
each is guaranteed to only have one value per doc, this could greatly  
accelerate your faceting.  There are probably fewer unique subjects,  
so strategy 1 is likely fine.


To use strategy 2, just make sure that multiValued="false" is set for  
those fields in schema.xml


-Mike
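To make the two strategies concrete, here is a toy sketch (plain Python, not Solr's actual internals) counting facet values for a single-valued author field:

```python
# Miniature "index": three docs with a single-valued author field.
docs = [{'author': 'smith'}, {'author': 'jones'}, {'author': 'smith'}]
hits = {0, 2}  # doc ids matched by some query

# Strategy 1 (cached bitsets): one doc-id set per term, intersected
# with the query result set. Cost grows with the number of terms.
term_sets = {}
for i, d in enumerate(docs):
    term_sets.setdefault(d['author'], set()).add(i)
counts1 = {t: len(s & hits) for t, s in term_sets.items()}

# Strategy 2 (field enumeration): walk the hits and tally the cached
# field value of each; needs at most one value per field per doc.
counts2 = {}
for i in hits:
    counts2[docs[i]['author']] = counts2.get(docs[i]['author'], 0) + 1

print(counts1)  # {'smith': 2, 'jones': 0}
print(counts2)  # {'smith': 2}
```

Note how strategy 1 does work for every term even when it never occurs in the hits, which is why many unique terms make it slow.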


Re: Slow response

2007-09-06 Thread Mike Klaas


On 6-Sep-07, at 3:25 PM, Mike Klaas wrote:



There are essentially two facet computation strategies:

1. cached bitsets: a bitset for each term is generated and  
intersected with the query result bitset.  This is more general and  
performs well up to a few thousand terms.


2. field enumeration: cache the field contents, and generate counts  
using this data.  Relatively independent of #unique terms, but  
requires at most a single facet value per field per document.


So, if you factor author into Primary author/Secondary author,  
where each is guaranteed to only have one value per doc, this could  
greatly accelerate your faceting.  There are probably fewer unique  
subjects, so strategy 1 is likely fine.


To use strategy 2, just make sure that multiValued="false" is set  
for those fields in schema.xml


I forgot to mention that strategy 2 also requires a single token for  
each doc (see
http://wiki.apache.org/solr/FAQ#head-14f9f2d84fb2cd1ff389f97f19acdb6ca55e4cd3)


-Mike


caching query result

2007-09-06 Thread Jae Joo
HI,

I am wondering, is there any way of CACHING FACET SEARCH results?

I have 13 million documents and facet by states (50). If there is a mechanism to
cache them, I may get faster results back.

Thanks,

Jae


removing a field from the relevance calculation

2007-09-06 Thread Bart Smyth
Hi,

I'm having trouble getting a field of type SortableFloatField to not
weigh into the relevancy score returned for a document.

<fieldtype name="sfloat" class="solr.SortableFloatField"
sortMissingLast="true" omitNorms="true"/>

So far I've tried boosting the field to 0.0 at index time using this
field type - and also implemented a custom Similarity implementation
that overrode lengthNorm(String fieldname, int numTerms) after
converting the field to a text field.

Nothing I do seems to affect the behavior that when the value of the
field in question changes, the score of the document changes along with
it.  The field does need to be both indexed and stored.  There is a
requirement to be able to sort by that field, and it must be returned in
the document when searching.

Am I going about this the wrong way?

Regards,

Bart Smyth

 





Re: updates on the server

2007-09-06 Thread Erik Hatcher


On Sep 6, 2007, at 2:56 PM, Matthew Runo wrote:
On a related note, it'd be great if we could set up a series of  
transformations to be done on data when it comes into the index,  
before being indexed. I guess a custom tokenizer might be the best  
way to do this though..?


ie:

-Post
-Data is cleaned up, properly escaped, etc
-Then data is passed to whatever tokenizer we want to use.


Solr should do more work on the data indexing side, to allow clients  
to more easily hand documents to it and modify them.  XML isn't  
necessarily the prettiest way, and we see other formats being  
supported with the CSV and rich document indexing.


A custom tokenizer or token filter makes great sense for single-field  
data transformation, but parsing some request data  
into multiple fields must be done at a higher level.


Erik



Re: caching query result

2007-09-06 Thread Yonik Seeley
On 9/6/07, Jae Joo [EMAIL PROTECTED] wrote:
 I have 13 million documents and facet by states (50). If there is a mechanism to
 cache them, I may get faster results back.

How fast are you getting results back with standard field faceting
(facet.field=state)?


Question on use of wildcard to field name at query

2007-09-06 Thread Toru Matsuzawa
Hi all.

Dynamic fields make it possible to index fields whose names match a
wildcard pattern, but a wildcard cannot be used in the field name of a
query.

I want to use a wildcard in the field name at query time.

Any good ideas would be appreciated.

The following illustrates what I mean.

--document
<add>
  <doc>
    <field name="id">0</field>
    <field name="name00">hoge hoge</field>
    <field name="name01">hogesaru</field>
    <field name="name02">saru</field>
    <field name="name03">saru saru</field>
  </doc>
  <doc>
    <field name="id">1</field>
    <field name="name04">hage hage</field>
    <field name="name10">hagesaru</field>
    <field name="name12">hoge</field>
  </doc>
</add>

--schema.xml
  <dynamicField name="name*" type="text_ws" indexed="true" stored="true"/>

--result of query
/select/?q=name0?:hoge
 result:doc 0

/select/?q=name*:hoge
result:doc 0
   doc 1

/select/?q=name1?:hoge
result:doc 1

Thanks,

-- 
Toru Matsuzawa
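One client-side workaround sketch for the question above (plain Python; the list of concrete field names is an assumption, e.g. obtained from the schema): expand the field-name wildcard into an explicit OR query before sending it, since Solr allows wildcards only in terms, not in field names.

```python
import fnmatch

# Concrete names generated by the name* dynamic field (listed by hand here).
field_names = ['name00', 'name01', 'name02', 'name03',
               'name04', 'name10', 'name12']

def expand(pattern, term):
    # Turn a field-name wildcard like "name1?" into an OR over real fields.
    fields = fnmatch.filter(field_names, pattern)
    return ' OR '.join('%s:%s' % (f, term) for f in fields)

print(expand('name1?', 'hoge'))  # name10:hoge OR name12:hoge
```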