Re: How does HTMLStripWhitespaceTokenizerFactory work?

2007-06-12 Thread Thierry Collogne
Ok. Thanks for the clarification. We will do the stripping before the indexing. On 11/06/07, Chris Hostetter [EMAIL PROTECTED] wrote: : Ok. Is it possible to get back the content without the html tags? Solr never does anything to modify the stored value of a field, so you'd really need to

Re: LIUS/Fulltext indexing

2007-06-12 Thread Bertrand Delacretaz
On 6/12/07, Yonik Seeley [EMAIL PROTECTED] wrote: ... I think Tika will be the way forward (some of the code for Tika is coming from LIUS)... Work has indeed started to incorporate the Lius code into Tika, see https://issues.apache.org/jira/browse/TIKA-7 and

storing the document URI in the index

2007-06-12 Thread Ard Schrijvers
Hello, is it possible to configure solr to store the document URI in the lucene index (the URI is not an xml field, but just the document's location)? Or is everybody used to storing the contents of a document in the lucene index (doesn't this imply a much larger index though?), so instead of

Re: storing the document URI in the index

2007-06-12 Thread Erik Hatcher
On Jun 12, 2007, at 8:51 AM, Ard Schrijvers wrote: is it possible to configure solr to store the document URI in the lucene index (the URI is not an xml field, but just the document's location)? Yes. Set the field to be stored and non-indexed, field type string is what I use. Or is

RE: storing the document URI in the index

2007-06-12 Thread Ard Schrijvers
Hello Erik, thanks for the fast answer (sry for my mail not indenting but must use webmail :-( ), but the problem I am facing is that I do not see solr storing the location of the documents it indexed. So, I need to store the location of a document in a field, but I do not see where solr

Re: storing the document URI in the index

2007-06-12 Thread Otis Gospodnetic
Ard, You have to store the URI in a Field yourself. That means you need to define that field in the schema and you have to set its value when adding documents. Otis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Simpy -- http://www.simpy.com/ - Tag - Search - Share -
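What Otis describes, defining the field in the schema and setting its value when adding documents, could look roughly like this on the client side. This is only a sketch: it builds a Solr `<add>` update message without sending it, and the field names `id`, `uri` and `text` are assumptions that would have to match fields declared in your own schema.xml.

```python
import xml.etree.ElementTree as ET


def build_add_message(doc_id, uri, body):
    """Build a Solr <add> update message that carries the document's URI.

    The field names "id", "uri" and "text" are assumptions for this sketch;
    they must match fields defined in your own schema.xml.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in (("id", doc_id), ("uri", uri), ("text", body)):
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(add, encoding="unicode")


message = build_add_message("doc-1", "http://example.com/docs/doc-1.xml", "Hello Solr")
print(message)
```

The point of the sketch is that the client, not Solr, supplies the location: whatever code fetches the document already knows where it came from and can pass it along as one more field.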

RE: storing the document URI in the index

2007-06-12 Thread Ard Schrijvers
Hello Otis, thanks for the info. Would it be an improvement to be able to specify in the schema.xml whether or not the URI should be stored in a field whose name you can also specify in the schema? It might be very well possible that you do not own the xml documents you index over

indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Nathaniel A. Johnson
Hi all, Can anyone give me some advice on breaking a document up and indexing it by access control lists? What we have are xml documents that are transformed based on the user viewing it. Some users might see all of the document, while others may see a few fields, and yet others see nothing at

Re: storing the document URI in the index

2007-06-12 Thread Otis Gospodnetic
I'm afraid I don't understand your question. Perhaps somebody else does. Otis - Original Message From: Ard Schrijvers [EMAIL PROTECTED] To: solr-user@lucene.apache.org; solr-user@lucene.apache.org Sent: Tuesday, June 12, 2007 9:23:16 AM Subject: RE: storing the document URI in the

Re: storing the document URI in the index

2007-06-12 Thread Yonik Seeley
On 6/12/07, Ard Schrijvers [EMAIL PROTECTED] wrote: thanks for the info. Would it be an improvement to be able to specify in the schema.xml whether or not the URI should be stored in a field whose name you can also specify in the schema? It might be very well possible that you do not

Re: storing the document URI in the index

2007-06-12 Thread Walter Underwood
Solr doesn't have the URL of the document. The document is given to Solr in an HTTP POST. Solr is not a web spider, it is a search web service. wunder On 6/12/07 6:23 AM, Ard Schrijvers [EMAIL PROTECTED] wrote: Hello Otis, thanks for the info. Would it be an improvement to be able to

RE: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ard Schrijvers
Hello Nate, IMHO, you will not be able to do this in solr unless you accept pretty hard constraints on your ACLs (I will get back to this in a moment). IMO, it is not possible to index documents along with ACLs. ACLs can be very fine grained, and the thing you describe, ACL specific parts of a

RE: storing the document URI in the index

2007-06-12 Thread Ard Schrijvers
Thanks Yonik and Walter, putting it that way, it does make good sense to not store the transient xml file which it is in most of the use cases (I was thinking differently because I do have xml files on file system or over http, like from a webdav call) Anyway, thx for all answers, and again, sry

RE: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ard Schrijvers
Excuse me, I meant solr of course :-) For these reasons, I do not think you can achieve with solar

Tomcat: The requested resource (/solr/update) is not available.

2007-06-12 Thread Matt Mitchell
Hi, I've got an app using Cocoon and Solr, both running through Tomcat. The post.sh file has been modified to grab local files and send them to Cocoon (via http); the Solr-fied xml from Cocoon is then sent to the update url in Tomcat/Solr. Not sure any of that is relevant though! I'm running

RE: Multi-language indexing and searching

2007-06-12 Thread Teruhiko Kurosaka
Daniel, I was reading your email and responses to it with great interest. I was aware that Solr has an implicit assumption that a field is mono-lingual per system. But your mail and its correspondence made me wonder if this limitation is practical for multi-lingual search applications. For

Re: Multi-language indexing and searching

2007-06-12 Thread Yonik Seeley
On 6/12/07, Teruhiko Kurosaka [EMAIL PROTECTED] wrote: For bi-lingual or tri-lingual search, we can have parallel fields (title_en, title_fr, title_de, for example) but this wouldn't scale well. Due to search across multiple fields, or due to increased index size? Lucene and Solr requires

RE: storing the document URI in the index

2007-06-12 Thread Thorsten Scherler
On Tue, 2007-06-12 at 16:33 +0200, Ard Schrijvers wrote: Thanks Yonik and Walter, putting it that way, it does make good sense to not store the transient xml file which it is in most of the use cases (I was thinking differently because I do have xml files on file system or over http, like from

Re: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Ken Krugler
Hi all, Can anyone give me some advice on breaking a document up and indexing it by access control lists? What we have are xml documents that are transformed based on the user viewing it. Some users might see all of the document, while others may see a few fields, and yet others see nothing at

Re: indexing documents (or pieces of a document) by access controls

2007-06-12 Thread Daniel Alheiros
Hi And about the fields, if they are/aren't going to be present on the responses based on the user group, you can do it in many different ways (using XML transformation to remove the undesirable fields, implementing your own RequestHandler able to process your group information, filtering the

Re: LIUS/Fulltext indexing

2007-06-12 Thread Vish D.
Sounds interesting. I can't seem to find any clear dates on the project website. Do you know? ...V1 shipping date? Thanks! On 6/12/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote: On 6/12/07, Yonik Seeley [EMAIL PROTECTED] wrote: ... I think Tika will be the way forward (some of the code for

Re: LIUS/Fulltext indexing

2007-06-12 Thread Bertrand Delacretaz
On 6/12/07, Vish D. [EMAIL PROTECTED] wrote: ...Sounds interesting. I can't seem to find any clear dates on the project website. Do you know? ...V1 shipping date?... Not at the moment, Tika just entered incubation and it's impossible to predict what will happen. But help is welcome, of course

RE: Multi-language indexing and searching

2007-06-12 Thread Teruhiko Kurosaka
Hi Yonik, On 6/12/07, Teruhiko Kurosaka [EMAIL PROTECTED] wrote: For bi-lingual or tri-lingual search, we can have parallel fields (title_en, title_fr, title_de, for example) but this wouldn't scale well. Due to search across multiple fields, or due to increased index size? Due to the

RE: question about sorting

2007-06-12 Thread Xuesong Luo
Thanks, Yonik. Unfortunately we have users whose first names contain more than one word, so it seems copy field is my only option. Thanks Xuesong -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley Sent: Tuesday, June 12, 2007 10:35 AM To:

Re: question about sorting

2007-06-12 Thread Yonik Seeley
On 6/12/07, Xuesong Luo [EMAIL PROTECTED] wrote: Thanks, Yonik. Unfortunately we have users whose first names contain more than one word, so it seems copy field is my only option. Yes, if you need to be able to match on part of a first name, rather than just exact first name. -Yonik
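The distinction Yonik draws, matching on part of a first name versus working with the exact first name, is the reason for keeping two copies of the same value. The following is a plain Python illustration of that idea and not Solr code; the names `search_tokens` and `sort_key` are hypothetical stand-ins for a tokenized field and an untokenized copy of it.

```python
def index_name(first_name):
    """Return two views of one value, mimicking a source field plus its copy.

    "search_tokens" stands in for a tokenized field (matches on any word);
    "sort_key" stands in for an untokenized copy (one value per document).
    Both names are hypothetical and used only for this illustration.
    """
    return {
        "first_name": first_name,
        "search_tokens": first_name.lower().split(),
        "sort_key": first_name.lower(),
    }


docs = [index_name(n) for n in ["Mary Ann", "anna", "Jean Luc"]]

# Matching on part of a first name uses the tokenized view.
matches = [d["first_name"] for d in docs if "ann" in d["search_tokens"]]

# Sorting uses the single untokenized value per document.
ordered = [d["first_name"] for d in sorted(docs, key=lambda d: d["sort_key"])]

print(matches)
print(ordered)
```

With a single-word name either view would do; it is the multi-word names that make the second, untokenized copy necessary for a stable sort order.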

RE: Multi-language indexing and searching

2007-06-12 Thread Chris Hostetter
: Due to the proliferation of the number of fields. Say, we want : to have the field title to have the title of the book in : its original language. But because Solr has this implicit : assumption of one language per field, we would have to have : the artificial fields title_fr, title_de, title_en,

Re: To make sure XML is UTF-8

2007-06-12 Thread Ajanta Phatak
Hi Not sure if you've had a solution for your problem yet, but I had dealt with a similar issue that is mentioned below and hopefully it'll help you too. Of course, this assumes that your original data is in utf-8 format. The default charset encoding for mysql is Latin1 and our display

Re: LIUS/Fulltext indexing

2007-06-12 Thread Vish D.
Wonder if TOM could be useful to integrate? http://tom.library.upenn.edu/convert/sofar.html On 6/12/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote: On 6/12/07, Vish D. [EMAIL PROTECTED] wrote: ...Sounds interesting. I can't seem to find any clear dates on the project website. Do you know?

Re: To make sure XML is UTF-8

2007-06-12 Thread Tiong Jeffrey
Hi Ajanta, thanks! Since I used PHP, I managed to use the PHP decode function to change it to UTF-8. But just a question, even if we change mysql default char-set to UTF-8, and if the input originally is in another format, the mysql engine won't convert it to UTF-8, right? I think my

Compass vs Solr

2007-06-12 Thread Harini Raghavan
Hi Everyone, We have a web application with search functionality built using lucene. The search is across different types of data, so it does not scale well from the database. As lucene does not allow storing relational data, we decided to try out Compass since it provides an object relation