Still having indexing problems

2007-05-11 Thread Gary Browne
Hello,

I have tried indexing the example files using the Jetty method rather
than Tomcat, and it still didn't work. I would prefer to use my Tomcat
URL.

After starting Jetty, I issued

java -jar post.jar http://localhost:8983/solr/update solr.xml monitor.xml

as in the examples in the tutorial, but post.jar cannot be found...

Where is it? Is there a path variable I need to set up somewhere?

Any help greatly appreciated.

Regards,

Gary

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

 



RE: Solr concurrent commit not updated

2007-05-11 Thread David Xiao
I have kept the id field unique.
Actually, I found the problem is due to the following Python code:

P = subprocess.Popen(arguments, )
It seems that when the program ends, the sub-process started by that call
has not finished yet. I guess that's why the statistics show the commits
but not the added documents.

Has anyone had a similar issue?





-----Original Message-----
From: James liu [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 11, 2007 11:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr concurrent commit not updated

You should know that the id must be a unique number.

2007/5/11, David Xiao [EMAIL PROTECTED]:

 Hello all,



 I have tested using post.sh in the example directory to add XML documents
 to Solr. It works when I add them one by one.

 But when I have a lot of .xml files to post (say about 500-1000 files)
 and I write a shell script that calls post.sh for each one, I find those
 XML files are not searchable after posting.



 From the Solr admin page / statistics I can see that it records the
 commits, but numDocs is not updated.

 So why does posting one XML file with post.sh work fine, while calling
 post.sh 500 times, each time with one XML file, behaves differently?



 Regards,

 David




-- 
regards
jl



Crawler for solr

2007-05-11 Thread David Xiao
Hello,

 

I am using a crawler to index and search some intranet web pages that
require authorization. I wrote my own crawler for this need. But as the
requirements evolve, I need another crawler for external web pages (on the
internet) too, so I am looking for a generic crawler that can integrate with
Solr.

The crawler should be easy to configure and able to customize its XML output
according to schema.xml.

Does anyone have a good idea?

 

Regards,

David



Re: Requests per second/minute monitor?

2007-05-11 Thread Yonik Seeley

On 5/10/07, Ian Holsman [EMAIL PROTECTED] wrote:

> What I would like to know is (and excuse the newbieness of the question) how
> to enable solr to log a file with the following data.
>
> - time spent (ms) in the request.

currently logged

> - IP# of the incoming request

normally in the container access log?

> - what the request was (and what handler executed it)

currently logged

> - a status code to signal if the request failed for some reasons

currently logged

> - number of rows fetched

The number of documents that matched?  That's a higher level concept
rather specific to a request handler.  That info is returned in most
responses though.

> and
> - the number of rows actually returned

That's also in the response, but would be largely meaningless in general.
One could also determine this number from the input parameters and the
number of docs that matched.

A better number might be the size of the response (which is normally in
the container access log).
Fields could be very small, or very large, and faceting, highlighting,
or other data could dwarf the size/speed due to the main response
documents.
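
Since several of the requested values live in the container access log, a requests-per-minute count can be derived from that log after the fact. The following is only a minimal sketch: it assumes the container writes NCSA Common Log Format lines (an assumption, check your Jetty or Tomcat configuration), and the function name and sample lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Common Log Format puts the timestamp in square brackets,
# e.g. [11/May/2007:09:43:01 +0000]. This format is an assumption.
TIMESTAMP = re.compile(r"\[([^\]]+)\]")

def requests_per_minute(lines):
    """Count access-log lines per minute, keyed by 'YYYY-MM-DD HH:MM'."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue  # skip lines without a bracketed timestamp
        when = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        counts[when.strftime("%Y-%m-%d %H:%M")] += 1
    return counts

# Hypothetical sample lines, not taken from a real log.
sample = [
    '127.0.0.1 - - [11/May/2007:09:43:01 +0000] "GET /solr/select?q=a HTTP/1.1" 200 512',
    '127.0.0.1 - - [11/May/2007:09:43:40 +0000] "GET /solr/select?q=b HTTP/1.1" 200 734',
    '127.0.0.1 - - [11/May/2007:09:44:02 +0000] "POST /solr/update HTTP/1.1" 200 41',
]
print(dict(requests_per_minute(sample)))
```

The same loop could sum the response-size field instead of counting lines, if response size turns out to be the more useful number.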

-Yonik


Re: Crawler for solr

2007-05-11 Thread Brian Whitman

On May 11, 2007, at 7:32 AM, David Xiao wrote:

Hello,
I am using a crawler to index and search some intranet web pages that
require authorization. I wrote my own crawler for this need.
But as the requirements evolve, I need another crawler for
external web pages (on the internet) too, so I am looking for a generic
crawler that can integrate with Solr.


The crawler should be easy to configure and able to customize its XML
output according to schema.xml.



Nutch with the SolrIndexer and the solrj client is wonderful for this.





Re: Index Concurrency

2007-05-11 Thread Yonik Seeley

On 5/10/07, joestelmach [EMAIL PROTECTED] wrote:

 Yes, coordination between the main index searcher, the index writer,
 and the index reader needed to delete other documents.

Can you point me to any documentation/code that describes this
implementation?


Look at SolrCore.getSearcher() and DirectUpdateHandler2.

-Yonik


Re: Still having indexing problems

2007-05-11 Thread Yonik Seeley

On 5/11/07, Gary Browne [EMAIL PROTECTED] wrote:

Hello

I have tried indexing the example files using the Jetty method, rather
than Tomcat, which still didn't work. I would prefer to use my Tomcat
URL.

After starting Jetty, I issued

java -jar post.jar http://localhost:8983/solr/update solr.xml monitor.xml

as in the examples in the tutorial, but post.jar cannot be found...


Try using the latest nightly build?
If you are using 1.1, just use post.sh instead.

-Yonik


Re: New user - indexing problems

2007-05-11 Thread patrick o'leary




Hey Gary

Leave out the URL

just use ./post.sh *.xml

You're causing curl to attempt to make a GET request.


P


Gary Browne wrote:

  Hi

I'll probably be posting a bunch of stupid questions in the near future,
so bear with me. I'm finding the documentation a little confusing. For
starters, I've got Solr up and running under Tomcat on port 8080, and I
can pull up the admin page, no problems. I'm running on RHEL AS 4, with
curl installed.

I'm not sure how to get indexing started - I tried the following:

./post.sh http://localhost:8080/solr/update solr.xml monitor.xml (from
exampledocs directory)

and received this error message:

The specified HTTP method is not allowed for the requested resource
(HTTP method GET is not supported by this URL).

Any help with this would be much appreciated.

Regards

Gary

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946 

 

  


-- 
Patrick O'Leary

AOL Syndication Technologies
Phone: + 1 703 265 8763

Honesty is the best policy, but insanity is a better defense !






delete for multiple documents at once

2007-05-11 Thread Maximilian Hütter
Hi,

I'm trying to delete multiple documents at once, but it doesn't work.

I am sending this:

<?xml version="1.0" encoding="UTF-8"?>
<delete>
<id>1_3223_po_opc_2</id>
<id>1_2454_po_opc_4</id>
</delete>

<result status="0"></result><result
status="1">org.xmlpull.v1.XmlPullParserException: expected START_TAG or
END_TAG not TEXT (position: TEXT seen
...po_opc_2&lt;/id&gt;\n&lt;id&gt;1_2454_po_opc_4&lt;/... @4:50)
at org.xmlpull.mxp1.MXParser.nextTag(MXParser.java:1083)
at org.apache.solr.core.SolrCore.update(SolrCore.java:832)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:53)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:498)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:185)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:715)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:401)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:458)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:790)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:628)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:209)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:358)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:217)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
</result>

Isn't it possible to do deletes like that?

Thanks,

Max

-- 
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel:  (+49) 0711 - 45 10 17 578
Fax:  (+49) 0711 - 45 10 17 573
e-mail :  [EMAIL PROTECTED]
Sitz   :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich


Re: Alphabetical Facets

2007-05-11 Thread Kevin Osborn
I don't have any pointers, but I would love to have this feature.

----- Original Message -----
From: Ryan McKinley [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Friday, May 11, 2007 9:23:02 AM
Subject: Alphabetical Facets

Has anyone given any thought to alphabetical faceting?

I'd like to be able to display facets sorted alphabetically rather than
by count or index order.  For example, all the subjects for something
of type=a and in collection=b, sorted alphabetically.

Any pointers before I delve into it?

ryan







Re: Alphabetical Facets

2007-05-11 Thread Chris Hostetter

: Has anyone given any thought to alphabetical faceting?

if by alphabetical you mean the natural unicode ordering of terms for
facet.field type facets -- that's already supported.

It's the default sort if there is no facet limit (ie:  facet.limit=-1) but
even with a limit it can be explicitly turned on with facet.sort=false

http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1
http://localhost:8983/solr/select/?q=*%3A*&facet=true&facet.field=cat&rows=0&facet.limit=5&facet.sort=false
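
For anyone building that request from a script rather than typing the URL, here is a minimal sketch using only the parameters from the example URL above. The helper name facet_url is made up for illustration, and the base URL assumes the example Jetty install on port 8983.

```python
from urllib.parse import urlencode

def facet_url(base_url, field, limit):
    """Build a facet-only query URL sorted in natural term order."""
    params = [
        ("q", "*:*"),               # match every document
        ("facet", "true"),
        ("facet.field", field),
        ("rows", "0"),              # only facet counts are wanted, no documents
        ("facet.limit", str(limit)),
        ("facet.sort", "false"),    # term order instead of count order
    ]
    # urlencode percent-encodes the values and joins them with "&".
    return base_url + "?" + urlencode(params)

url = facet_url("http://localhost:8983/solr/select/", "cat", 5)
print(url)
```

Note that urlencode also percent-encodes the asterisks in the query, which the server decodes back to *:*.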


-Hoss



Re: Still having indexing problems

2007-05-11 Thread Chris Hostetter

: Java -jar post.jar http://localhost:8983/solr/update solr.xml
: monitor.xml

: as in the examples on the tutorial, but post.jar cannot be found...

the tutorial on the website is the most current tutorial for the most
current development builds ... please refer to the tutorial included with
the release of Solr you are using for the most accurate information.



-Hoss



Re: can i modifie date format

2007-05-11 Thread Chris Hostetter

James, there is already an active thread discussing the various issues
with Solr's date format, with a lot of details about the various places
formatting might be different and the issues involved with allowing more
configuration. You may want to catch up with that thread and reply there...

http://www.nabble.com/dates---times-tf3722932.html

The short answer is: at the moment, no, there is no mechanism for customizing
the Solr format, but the format is a very universal one, and i would be
extremely surprised if it were not possible to get MySQL to format dates in
that way.
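
If the database cannot be made to emit the format directly, the conversion can also be done in the indexing script. This is a minimal sketch, assuming the Solr date format is the ISO 8601 UTC form such as 1995-12-31T23:59:59Z, that the database hands back strings like '2007-05-11 09:43:00', and that those values are already in UTC. The function name is made up for illustration.

```python
from datetime import datetime

def to_solr_date(sql_datetime):
    """Convert 'YYYY-MM-DD HH:MM:SS' (assumed UTC) to 'YYYY-MM-DDTHH:MM:SSZ'."""
    parsed = datetime.strptime(sql_datetime, "%Y-%m-%d %H:%M:%S")
    # The trailing Z marks the value as UTC; no time zone conversion is done here.
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date("2007-05-11 09:43:00"))
```

Values stored in local time would need converting to UTC first, which this sketch deliberately leaves out.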

: MS SQL database have one date format
:
: solr have one date format
:
: web page show have one date format
:
: why not user config date format, solr read date format rule,
:
: maybe like this, http://cn2.php.net/manual/en/function.date.php
:
: now solr 1.1 date format is /MM/DD H:I:S?
:
: --
: regards
: jl
:



-Hoss



RE: Alphabetical Facets

2007-05-11 Thread Binkley, Peter
Would it be difficult to add support for other unicode collations, for
i18n purposes? 

peter

-----Original Message-----
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 11, 2007 11:38 AM
To: solr-user@lucene.apache.org
Subject: Re: Alphabetical Facets

Chris Hostetter wrote:
 : Has anyone given any thought to alphabetical faceting?
 
 if by alphabetical you mean the natural unicode ordering of terms for 
 facet.field type facets -- that's already supported.
 
 It's the default sort if there is no facet limit (ie:  facet.limit=-1)

 but even with a limit it can be explicitly turned on with 
 facet.sort=false
 
 http://wiki.apache.org/solr/SimpleFacetParameters#head-569f93fb24ec41b061e37c702203c99d8853d5f1
 http://localhost:8983/solr/select/?q=*%3A*&facet=true&facet.field=cat&rows=0&facet.limit=5&facet.sort=false
 

perfect!

I read that, but did not realize natural index order is alphabetical
in the ascii range.

thanks
ryan


RE: Alphabetical Facets

2007-05-11 Thread Chris Hostetter

: Would it be difficult to add support for other unicode collations, for
: i18n purposes?

Difficult? ... probably not, but it would require code. :)

The existing natural order sorting on the other hand is there because it
was free and easy ... it's the order in which terms are enumerated in the
index.


-Hoss



Re: New user - indexing problems

2007-05-11 Thread Chris Hostetter
: Leave out the URL
:
: just use ./post.sh *.xml

except that post.sh assumes you are using the example jetty install on
port 8983, so you'll need to edit it to use port 8080
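
As an alternative to editing post.sh, the update URL can simply be a parameter of a small script. The sketch below is not what post.sh does internally; it assumes the update handler accepts an XML body sent by HTTP POST with a text/xml content type, and the function names are made up for illustration. Only the request is built at import time; nothing is sent until post_file is called.

```python
import urllib.request

def build_update_request(update_url, xml_bytes):
    """Build (but do not send) an HTTP POST carrying an XML update message."""
    return urllib.request.Request(
        update_url,
        data=xml_bytes,  # supplying data makes urllib issue a POST, not a GET
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )

def post_file(update_url, path):
    """Send one XML file to the update URL and return the HTTP status."""
    with open(path, "rb") as handle:
        request = build_update_request(update_url, handle.read())
    with urllib.request.urlopen(request) as response:
        return response.status

# Tomcat on port 8080, as in Gary's setup.
request = build_update_request("http://localhost:8080/solr/update", b"<commit/>")
print(request.get_method(), request.full_url)
```

Calling post_file for each document and then posting a final <commit/> message would make the new documents visible to searches.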



-Hoss



Re: delete for multiple documents at once

2007-05-11 Thread Mike Klaas

On 11-May-07, at 9:43 AM, Maximilian Hütter wrote:


Hi,

I'm trying to delete multiple documents at once, but it doesn't work.

I am sending this:

<?xml version="1.0" encoding="UTF-8"?>
<delete>
<id>1_3223_po_opc_2</id>
<id>1_2454_po_opc_4</id>
</delete>




Isn't it possible to do deletes like that?


No it isn't, but you can do multi deletes using delete by query:

<query>docId:XXX OR docId:YYY OR docId:ZZZ ...</query>
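
A minimal sketch of building such a delete-by-query message from a list of ids follows. The field name "id" and the helper name are assumptions for illustration (use whatever your schema's unique key field is called), and each value is wrapped in double quotes so the query parser treats it as a single term.

```python
from xml.sax.saxutils import escape

def build_delete_by_query(field, ids):
    """Return a <delete><query>...</query></delete> message matching the ids."""
    if not ids:
        raise ValueError("need at least one id")
    clauses = []
    for doc_id in ids:
        # Escape backslashes and double quotes inside the quoted value.
        quoted = doc_id.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append('%s:"%s"' % (field, quoted))
    query = " OR ".join(clauses)
    # escape() protects &, < and > so the message stays well-formed XML.
    return "<delete><query>%s</query></delete>" % escape(query)

message = build_delete_by_query("id", ["1_3223_po_opc_2", "1_2454_po_opc_4"])
print(message)
```

The resulting string is posted to the update URL like any other update message, followed by a commit.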

-Mike

[acts_as_solr] Release v.0.8 is out

2007-05-11 Thread Thiago Jackiw


The new release v.0.8 of acts_as_solr is out and includes:

NEW - New video tutorial
NEW - Faceted search has been implemented and it's possible to 'drill-down' on 
the facets
NEW - New rake tasks you can use to start/stop the solr server in test, 
development and production environments: (thanks Matt Clark)
rake solr:start|stop RAILS_ENV=test|development|production (defaults to 
development if none given)

NEW - Changes to the plugin's test framework and it now supports Sqlite as well 
(thanks Matt Clark)
FIX - Patch applied (thanks Micah) that allows one to have multiple solr 
instances in the same servlet
FIX - Patch applied (thanks Micah) that allows indexing of STIs
FIX - Patch applied (thanks Gordon) that allows the plugin to use a table's 
primary key different than 'id'
FIX - Returning empty array instead of empty strings when no records are found
FIX - Problem with unit tests failing due to order of the tests and speed of 
the commits

== About ==
This plugin adds full text search capabilities and many other nifty features 
from Apache's Solr to any Rails model

== Installation ==
In your Rails root directory, just type
 script/plugin install http://opensvn.csie.org/acts_as_solr/trunk

== Very Basic Usage ==
Just include the line below to any of your ActiveRecord models:
 acts_as_solr

Or if you want, you can specify only the fields that should be indexed:
 acts_as_solr :fields => [:name, :author]

Then to find instances of your model, just do:
 Model.find_by_solr(query) or Model.find_id_by_solr(query)

Or if you want to specify the starting row and the number of rows per page:
 Model.find_by_solr(query, :start => 0, :rows => 10)


Get it while it's hot = http://acts-as-solr.rubyforge.org

--
Thiago Jackiw
acts_as_solr = http://acts-as-solr.rubyforge.org
Sitealizer = http://sitealizer.rubyforge.org



Re: Solr concurrent commit not updated

2007-05-11 Thread Mike Klaas


On 11-May-07, at 2:45 AM, David Xiao wrote:


I have kept the id field unique.
Actually, I found the problem is due to the following Python code:

P = subprocess.Popen(arguments, )
It seems that when the program ends, the sub-process started by
that call has not finished yet. I guess that's why the statistics show
the commits but not the added documents.


Has anyone had a similar issue?


When a unix process terminates, its child processes are also  
terminated (well, it depends on exactly how you created them).


Actually, I'm not sure about that on further thought.  However, it is  
best to wait for your processes to complete.  After spawning them  
all, you can use P.wait() to wait for the processes individually, or  
os.wait() to wait for any of them to complete.


Of course, since you are using Python anyway, it would be best to
open() the XML file and post it yourself (in threads if you want some
concurrency).
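
A minimal sketch of the "wait for your processes" advice, using Popen.wait() on each child. The helper name run_all is made up, and the demo commands are stand-ins that only start the Python interpreter; in David's case each command would be something like ["./post.sh", "doc1.xml"].

```python
import subprocess
import sys

def run_all(commands):
    """Start every command, then wait for each child; return the exit codes."""
    children = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until that child has exited, so no child is still
    # running when this function returns.
    return [child.wait() for child in children]

# Stand-in commands for illustration only.
demo = [[sys.executable, "-c", "pass"] for _ in range(3)]
print(run_all(demo))
```

With 500-1000 files it is probably kinder to the machine to start the children in small batches rather than all at once.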


regards,
-Mike



Re: delete for multiple documents at once

2007-05-11 Thread Yonik Seeley

On 5/11/07, Mike Klaas [EMAIL PROTECTED] wrote:

On 11-May-07, at 9:43 AM, Maximilian Hütter wrote:
 I'm trying to delete multiple documents at once, but it doesn't work.

 I am sending this:

 <?xml version="1.0" encoding="UTF-8"?>
 <delete>
 <id>1_3223_po_opc_2</id>
 <id>1_2454_po_opc_4</id>
 </delete>


 Isn't it possible to do deletes like that?

No it isn't, but you can do multi deletes using delete by query:


Sounds like it should be added though...

-Yonik


Re: Alphabetical Facets

2007-05-11 Thread Yonik Seeley

On 5/11/07, Binkley, Peter [EMAIL PROTECTED] wrote:

Would it be difficult to add support for other unicode collations, for
i18n purposes?


It would require collecting *all* of the facet terms/counts, which is
potentially very large, and then re-sorting.  Definitely much more
expensive to do.
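
To make the cost concrete: every term and count has to be in hand before a different ordering can be applied. The sketch below re-sorts a complete set of facet counts on the client side. The sort key is only a crude stand-in for a real collation (it strips accents and ignores case), and the function names and sample counts are made up for illustration.

```python
import unicodedata

def collation_key(term):
    """Crude stand-in for a collation key: strip accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

def resort_facets(facet_counts):
    """Re-sort ALL (term, count) pairs; nothing can be cut off before sorting."""
    return sorted(facet_counts.items(), key=lambda item: collation_key(item[0]))

# Hypothetical facet counts, as if collected with no facet limit.
counts = {"zebra": 4, "\u00c4pfel": 2, "apple": 7}
print(sorted(counts))         # natural code point order
print(resort_facets(counts))  # order under the stand-in collation
```

In code point order the accented term sorts after "zebra", while the stand-in collation places it next to "apple"; a limit applied before re-sorting could drop it entirely, which is why the whole list is needed.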

-Yonik