RE: indexing documents (or pieces of a document) by access controls
Hello,

When I had that kind of problem (less complex) with Lucene, the only idea was to filter from the front-end, according to the ACL policy. Lucene docs and fields weren't protected, but tagged. Searching was always applied with an audience field, with hierarchical values like public, reserved, protected, secret, so that a public document also carries the secret value, to be found with audience:secret, according to the rights of the user who searches. For the fields, the ones not allowed for some users were stripped.

Yes, I know this is a possibility... but we happen to want our authorisation facet-based. I am attacking the problem by keeping derived data from Lucene in memory, all translated into byte/int values. The hardest part is keeping the derived data in sync with Lucene *and* the different Jackrabbit users (some have changes in their session but have not yet saved their data). Anyway, I can do faceted authorisation + counting in less than 20 ms for 1.000.000 documents (normal PC), so hopefully I can succeed. I must admit OTOH that I did not find some sort of ingenious algorithm, but merely depend on the speed of the processor: doubling the number of documents means doubling the response time and the needed memory (though 1.000.000 docs fitted in 25 MB, so 40.000.000 in a GB... that is fine by me).

Maybe you can have a look at the XML database eXist? Its search engine, XQuery based, is not focused on the same goals as Lucene, but I can promise you that queries will never return results from documents you are not allowed to read. I did not look at it closely, but my feeling is that it is not fast enough.

Regards Ard

-- Frédéric Glorieux École nationale des chartes direction des nouvelles technologies et de l'informatique
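The in-memory faceted-authorisation idea described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the poster's actual code; every name in it (AUDIENCE_LEVELS, allowed_facet_counts, facet_of) is invented for the example:

```python
# Sketch: derive one small integer "audience level" per document from the
# index, then filter and count facets with plain integer comparisons.
from collections import Counter

# Hierarchical audience values: a user cleared for "secret" may also see
# everything below it, so a single <= comparison implements the hierarchy.
AUDIENCE_LEVELS = {"public": 0, "reserved": 1, "protected": 2, "secret": 3}

def allowed_facet_counts(doc_levels, facet_of, user_audience):
    """Count facet values over only the documents the user may read.

    doc_levels: per-document audience levels (one int per doc id)
    facet_of:   function mapping a doc id to its facet value
    """
    max_level = AUDIENCE_LEVELS[user_audience]
    counts = Counter()
    for doc_id, level in enumerate(doc_levels):
        if level <= max_level:          # one cheap int compare per document
            counts[facet_of(doc_id)] += 1
    return counts

# Example: 6 docs with alternating facets, user cleared up to "reserved",
# so only docs with level 0 or 1 are counted.
levels = [0, 1, 2, 3, 0, 1]
counts = allowed_facet_counts(
    levels, lambda d: "even" if d % 2 == 0 else "odd", "reserved")
```

The linear scan matches the behaviour Ard reports: response time and memory scale proportionally with the number of documents.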
Re: problems getting data into solr index
Hello Hoss

Thanks for replying. I tried what you suggested as the initial step of my troubleshooting and it outputs it fine. It was what I suspected initially as well, but thanks for the advice.

hossman_lucene wrote: : I'm running solr1.2 and Jetty, I'm having problems looping through a mysql : database with python and putting the data into the solr index. : : Here's the error : : UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 369: : ordinal not in range(128) I may be missing something here, but I don't think that error is coming from Solr ... UnicodeDecodeError appears to be a Python error message, so I suspect the problem is between MySQL and your Python script ... I bet if you change your script to comment out the lines where you talk to Solr, and just read the data from MySQL and throw it to /dev/null, you'd still see that message. http://wiki.wxpython.org/UnicodeDecodeError -Hoss

-- View this message in context: http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a5954 Sent from the Solr - User mailing list archive at Nabble.com.
Re: problems getting data into solr index
Hi Yonik

Here's the output from netcat:

POST /solr/update HTTP/1.1
Host: localhost:8983
Accept-Encoding: identity
Content-Length: 83
Content-Type: text/xml; charset=utf-8

that looks OK to me, but I am a bit twp you see. :-)

Yonik Seeley wrote: On 6/13/07, vanderkerkoff [EMAIL PROTECTED] wrote: I'm running solr1.2 and Jetty, I'm having problems looping through a mysql database with python and putting the data into the solr index. Here's the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 369: ordinal not in range(128) There are two issues... what char encoding you tell Solr to use, via Content-Type in the HTTP headers (Solr defaults to UTF-8), and then whether what you send matches that encoding. If you can get the complete message (including HTTP headers) that is being sent to Solr, that would help people debug the problem. One easy way is to use netcat to pretend to be Solr: 1) shut down Solr 2) start up netcat on Solr's port: nc -l -p 8983 3) send your update message from the client as you normally would -Yonik

-- View this message in context: http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a6020 Sent from the Solr - User mailing list archive at Nabble.com.
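The two issues Yonik names (the declared charset and whether the body actually matches it) can be sketched on the client side like this. A hypothetical Python illustration, not the thread's actual script; the function name and the db_charset default are assumptions:

```python
# Decode the raw bytes coming from MySQL into a unicode string *explicitly*,
# then encode the update message as UTF-8 so the body really matches the
# charset declared in the Content-Type header.
def build_update_request(raw_name_from_mysql: bytes, db_charset: str = "utf8"):
    # If you skip this decode, Python falls back to the 'ascii' codec the
    # moment the bytes meet a unicode string -- exactly the
    # "'ascii' codec can't decode byte 0xe2" error from the thread.
    name = raw_name_from_mysql.decode(db_charset)
    doc = '<add><doc><field name="name">%s</field></doc></add>' % name
    body = doc.encode("utf-8")
    headers = {
        "Content-Type": "text/xml; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body

# "café" arriving from the database as raw UTF-8 bytes:
headers, body = build_update_request(b"caf\xc3\xa9", "utf8")
```

With the headers and body built this way, what netcat shows on Solr's port is a declared charset that matches the bytes actually sent.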
who use time?
I wrote a script to measure run time so I can be sure about performance. I found a very interesting thing: I query 2 Solr boxes to get data, and the Solr responses all show a QTime of zero, but my script that fetches from both takes 0.046674966812134 s (it varies). The Solr boxes are on my PC and the index data is very small, so I don't know why it takes as much as 0.046674966812134 s. -- regards jl
Re: problems getting data into solr index
is it ok? 2007/6/14, vanderkerkoff [EMAIL PROTECTED]: Hi Yonik Here's the output from netcat POST /solr/update HTTP/1.1 Host: localhost:8983 Accept-Encoding: identity Content-Length: 83 Content-Type: text/xml; charset=utf-8 that looks Ok to me, but I am a bit twp you see. :-) Yonik Seeley wrote: On 6/13/07, vanderkerkoff [EMAIL PROTECTED] wrote: I'm running solr1.2 and Jetty, I'm having problems looping through a mysql database with python and putting the data into the solr index. Here's the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 369: ordinal not in range(128) There are two issues... what char encoding you tell solr to use, via Content-type in the HTTP headers (solr defaults to UTF-8), and then if what you send matches that coding. If you can get the complete message (including HTTP headers) that is being sent to Solr, that would help people debug the problem. One easy way is to use netcat to pretend to be solr: 1) shut down solr 2) start up netcat on solr's port nc -l -p 8983 3) send your update message from the client as you normally would -Yonik -- View this message in context: http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a6020 Sent from the Solr - User mailing list archive at Nabble.com. -- regards jl
Index time boost is not working
I am using Solr in my Rails application. When I create the document that needs to be stored in Solr, I can see the boost values being set on the fields as attributes. However, when I browse the indexes through Luke, I see a boost value of 1. What am I missing? Thanks for your input. -Madhan
Problem with surrogate characters in utf-8
Hi all,

I have a problem after updating to Solr 1.2. I'm using the bundled Jetty that comes with the latest Solr release. Some of the contents stored in my index contain characters from the Unicode private section above 0x10. (They are used by some proprietary software and the text extraction does not throw them out.)

In contrast to Solr 1.1, the current release returns these characters encoded as a sequence of two surrogate characters. Could this result from some UTF-16 conversion taking place somewhere in the system? In fact, a look into the index with Luke suggests that Lucene is storing its data in UTF-16 encoding. The code point 0x100058 is stored as the two surrogate characters 0xDBC0 and 0xDC58. This is the same behaviour in Solr 1.1 and 1.2. But while Solr 1.1 puts the character together to form one 4-byte UTF-8 character in the result, Solr 1.2 returns the UTF-8 codes for the two surrogate characters that I see using Luke. Unfortunately this results in invalid UTF-8 encoded text that (for example) cannot be displayed by Internet Explorer. A request like http://localhost:8983/solr/select?q=*:* results in an error message from the browser.

This is easy to reproduce if someone would like to debug it. I have attached a valid UTF-8 encoded XML document that contains the 4-byte encoded code point 0x100058. It can be indexed with post.jar. Sending this request via Internet Explorer now results in an error: http://localhost:8983/solr/select?q=*:*

I tried the new Solr 1.2 war file with the old example distribution (Solr 1.1 and Jetty 5.1). Surprisingly enough, this does not reveal the problem. So the whole story might even be a Jetty issue. Any ideas?

-- Christian

Attachment utf.xml:

<?xml version="1.0" encoding="UTF-8"?>
<add>
  <doc>
    <field name="id">UTF8TEST</field>
    <field name="name">abcdefgôhijklmnop</field>
  </doc>
</add>
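The surrogate arithmetic in the report can be checked quickly. A small Python illustration (not part of the original mail): UTF-16 represents code points beyond U+FFFF as a high/low surrogate pair, and the pair 0xDBC0/0xDC58 seen in Luke does combine back to U+100058; valid UTF-8 must encode that code point as one 4-byte sequence, never as the two surrogates separately.

```python
# Recombine a UTF-16 surrogate pair into the code point it represents.
def combine_surrogates(high: int, low: int) -> int:
    assert 0xD800 <= high <= 0xDBFF, "not a high surrogate"
    assert 0xDC00 <= low <= 0xDFFF, "not a low surrogate"
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

codepoint = combine_surrogates(0xDBC0, 0xDC58)   # the pair from Luke
utf8_bytes = chr(codepoint).encode("utf-8")      # one 4-byte UTF-8 sequence
```

Encoding each surrogate on its own, as Solr 1.2 apparently does here, yields two 3-byte sequences that no strict UTF-8 decoder (Internet Explorer included) will accept.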
Solr 1.2 HTTP Client for Java
Hi I've been using one Java client I got from a colleague but I don't know exactly its version or where to get any update for it. Base package is org.apache.solr.client (where there are some common packages) and the client main package is org.apache.solr.client.solrj. Is it available via Maven2 central repository? Regards, Daniel http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this.
Re: problems getting data into solr index
Hi Brian

I've now set the MySQL db to default charset utf8, and everything else is utf8: collation etc. I think I know what the problem is, and it's a really old one, and I feel foolish now for not realising it earlier. Our content people are copying and pasting sh*t from Word into the content. :-) Now that the database is utf8, I'd like to write something to change the crap from Word into a readable value before it gets into the database. Using Python, so I suppose this is more of a Python question than a Solr one. Anyone got any tips anyway?

Brian Whitman wrote: Post the line of code this is breaking on. Are you pulling the data from mysql as utf8? Are you setting the encoding of MySQLdb? Solr has no problems with proper utf8 and you don't need to do anything special to get it to work. Check out the newer solr.py in JIRA.

-- View this message in context: http://www.nabble.com/problems-getting-data-into-solr-index-tf3915542.html#a8400 Sent from the Solr - User mailing list archive at Nabble.com.
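One common approach to the copy-paste-from-Word problem: the offending characters are usually Windows smart punctuation (curly quotes, dashes, ellipsis) that slipped into text handled with the wrong encoding. A small translation table can downgrade them to plain ASCII before the text reaches the database. A sketch only; the table below is a hand-picked subset, not an exhaustive mapping:

```python
# Map common Word punctuation code points to ASCII equivalents.
WORD_PUNCTUATION = {
    0x2018: "'", 0x2019: "'",    # curly single quotes
    0x201C: '"', 0x201D: '"',    # curly double quotes
    0x2013: "-", 0x2014: "--",   # en dash, em dash
    0x2026: "...",               # ellipsis
    0x00A0: " ",                 # non-breaking space
}

def clean_word_text(text: str) -> str:
    # str.translate accepts a dict keyed by code-point ordinals.
    return text.translate(WORD_PUNCTUATION)

cleaned = clean_word_text("\u201cHello\u201d \u2013 it\u2019s fine\u2026")
```

Run this on each value read from MySQL (after decoding to unicode) and before inserting it, so only clean text ever reaches the index.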
Re: Index time boost is not working
Is your field defined with omitNorms=true by any chance? Otis -- Lucene Consulting -- http://lucene-consulting.com/ - Original Message From: Madhan Subhas [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 14, 2007 5:35:17 AM Subject: Index time boost is not working I am using solr in my rails application. When I create the document that need to be stored in Solr I can see the boost values being set on the fields as attributes. However when I browse the indexes through luke I see a boost value of 1. What am I missing. Thanks for your input. -Madhan
Re: Solr 1.2 HTTP Client for Java
On Thu, 2007-06-14 at 11:32 +0100, Daniel Alheiros wrote: Hi I've been using one Java client I got from a colleague but I don't know exactly its version or where to get any update for it. Base package is org.apache.solr.client (where there are some common packages) and the client main package is org.apache.solr.client.solrj. Is it available via Maven2 central repository? Have a look at the issue tracker, there's one with solr clients: http://issues.apache.org/jira/browse/SOLR-20 I've also used one of them, but to be honest, do not remember which one ;) Cheers, Martin Regards, Daniel
Re: Solr 1.2 HTTP Client for Java
Thanks Martin. I'm using one of them, but the optimize command doesn't work properly. Have you seen the same problem? Regards, Daniel On 14/6/07 13:07, Martin Grotzke [EMAIL PROTECTED] wrote: On Thu, 2007-06-14 at 11:32 +0100, Daniel Alheiros wrote: Hi I've been using one Java client I got from a colleague but I don't know exactly its version or where to get any update for it. Base package is org.apache.solr.client (where there are some common packages) and the client main package is org.apache.solr.client.solrj. Is it available via Maven2 central repository? Have a look at the issue tracker, there's one with solr clients: http://issues.apache.org/jira/browse/SOLR-20 I've also used one of them, but to be honest, do not remember which one ;) Cheers, Martin Regards, Daniel
Re: Delete entire index
I think this would be useful. The other day I hit this problem of fq= not working. It turned out that the schema was changed (some non-indexed fields were made indexed), the bulk upload was done, but that bulk upload left the old index files in place, so I ended up with a double index within the same index dir. Otis - Original Message From: Ryan McKinley [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Wednesday, June 13, 2007 3:40:52 PM Subject: Re: Delete entire index : : Actually, it's not quite equivalent if there was a schema change. : There are some sticky field properties that are per-segment global. : For example, if you added omitNorms=true to a field, then did Hmmm... I thought the optimize would take care of that? Oh yes, sorry, I was thinking about optimize after you reindexed. If you forget to do optimize, you get a different index though... definitely spooky stuff to someone not expecting it. Is there an easy way to check if the Lucene per-field properties are out of sync with the Solr schema? If so, maybe we should display it on the admin page. Are there other sticky field properties besides omitNorms? I know I have made changes to a production server where I:

1. change the field definition for a field
2. get the last indexed time for a document of that type
3. index all documents of that type
4. delete everything of that type not added since the start time
5. optimize

It appeared to work fine...
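The reindex-in-place procedure described in that message can be sketched as the sequence of update messages it posts to /solr/update. A rough illustration only; the "type" and "timestamp" field names and the timestamp format are assumptions, not from the original mail:

```python
# Build the tail of the reindex plan: after re-adding all documents of the
# type, delete anything of that type not re-added since the cutoff, then
# optimize so sticky per-segment settings (e.g. norms) are rewritten under
# the new schema in freshly merged segments.
def reindex_plan(doc_type: str, start_time: str):
    return [
        # steps 1-3: (re)index all documents of the type -- the <add>
        # messages themselves are omitted from this sketch
        '<delete><query>type:%s AND timestamp:[* TO %s]</query></delete>'
        % (doc_type, start_time),
        '<optimize/>',
    ]

plan = reindex_plan("article", "2007-06-14T00:00:00Z")
```

The final optimize is the step the thread flags as easy to forget: without it, old segments keep their old per-field properties and the index behaves inconsistently.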
Re: Index time boost is not working
How do you browse the indexes? On 14/06/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Is your field defined with omitNorms=true by any chance? Otis -- Lucene Consulting -- http://lucene-consulting.com/ - Original Message From: Madhan Subhas [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 14, 2007 5:35:17 AM Subject: Index time boost is not working I am using solr in my rails application. When I create the document that need to be stored in Solr I can see the boost values being set on the fields as attributes. However when I browse the indexes through luke I see a boost value of 1. What am I missing. Thanks for your input. -Madhan
Re: Index time boost is not working
Check your schema.xml, that's where you'll see how the field is defined. Otis - Original Message From: Thierry Collogne [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 14, 2007 8:27:54 AM Subject: Re: Index time boost is not working How do you browse the indexes? On 14/06/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Is your field defined with omitNorms=true by any chance? Otis -- Lucene Consulting -- http://lucene-consulting.com/ - Original Message From: Madhan Subhas [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 14, 2007 5:35:17 AM Subject: Index time boost is not working I am using solr in my rails application. When I create the document that need to be stored in Solr I can see the boost values being set on the fields as attributes. However when I browse the indexes through luke I see a boost value of 1. What am I missing. Thanks for your input. -Madhan
RE: Solr 1.2 HTTP Client for Java
The code in http://solrstuff.org/svn/solrj/ is very stable, works with almost all features for both searching and indexing, and will be moving into the main distribution soon as the standard Java client library. - will

-----Original Message----- From: Martin Grotzke [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 8:39 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.2 HTTP Client for Java On Thu, 2007-06-14 at 13:13 +0100, Daniel Alheiros wrote: Thanks Martin. I'm using one of them, but the optimize command doesn't work properly. Have you seen the same problem? Nope, I'm using the client only for queries - the xml generation / posting to solr is done by another module in our application, and not with java. Cheers, Martin Regards, Daniel On 14/6/07 13:07, Martin Grotzke [EMAIL PROTECTED] wrote: On Thu, 2007-06-14 at 11:32 +0100, Daniel Alheiros wrote: Hi I've been using one Java client I got from a colleague but I don't know exactly its version or where to get any update for it. Base package is org.apache.solr.client (where there are some common packages) and the client main package is org.apache.solr.client.solrj. Is it available via Maven2 central repository? Have a look at the issue tracker, there's one with solr clients: http://issues.apache.org/jira/browse/SOLR-20 I've also used one of them, but to be honest, do not remember which one ;) Cheers, Martin Regards, Daniel

-- Martin Grotzke http://www.javakaffee.de/blog/
Re: Solr 1.2 HTTP Client for Java
I tried using that client, but I didn't get any good results when searching for words with special characters. I have also searched for documentation for that client, but didn't find any. Does anyone know where to find documentation concerning the Java client?

On 14/06/07, Will Johnson [EMAIL PROTECTED] wrote: The code in http://solrstuff.org/svn/solrj/ is very stable, works with almost all features for both searching and indexing and will be moving into the main distribution soon as the standard java client library. - will -----Original Message----- From: Martin Grotzke [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 8:39 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.2 HTTP Client for Java On Thu, 2007-06-14 at 13:13 +0100, Daniel Alheiros wrote: Thanks Martin. I'm using one of them, but the optimize command doesn't work properly. Have you seen the same problem? Nope, I'm using the client only for queries - the xml generation / posting to solr is done by another module in our application, and not with java. Cheers, Martin Regards, Daniel On 14/6/07 13:07, Martin Grotzke [EMAIL PROTECTED] wrote: On Thu, 2007-06-14 at 11:32 +0100, Daniel Alheiros wrote: Hi I've been using one Java client I got from a colleague but I don't know exactly its version or where to get any update for it. Base package is org.apache.solr.client (where there are some common packages) and the client main package is org.apache.solr.client.solrj. Is it available via Maven2 central repository? Have a look at the issue tracker, there's one with solr clients: http://issues.apache.org/jira/browse/SOLR-20 I've also used one of them, but to be honest, do not remember which one ;) Cheers, Martin Regards, Daniel

-- Martin Grotzke http://www.javakaffee.de/blog/
Conceptual Question
Hey all,

I checked out Solr and I'm pretty amazed, since this could save us a lot of work. We are working on a document management system and currently change the document structure to be valid against predefined schemas. Each document will consist of several 'complex types', comparable to XML snippets (e.g. a two-column set of image and list, a two-column list of links and paragraph, ...). There are two main points I'm asking myself (and discussing with my colleagues) which will make the decision whether to spend more time on this solution or not. To be short, here they are:

* We will have contents that are nested similarly to a part of HTML. This means they will be much more nested than the examples given with Solr. I guess this point will be possible via the import schema. The important point (and first question) is that our schema is very likely to change. That means we will have 'revisions' of documents whereas each revision has its own, slightly different, schema. Structuring the documents itself wouldn't be a problem I guess, as we could define an 'id' and 'rev' as unique in combination. But how can we handle the revision-dependent schema? Is there a good way for such a thing?

* The second big problem is also related to the versioning and its changing schemas. Assume we have a lot of documents that are built of a document type 'Foo' rev 1. Now we decide that the schema of Foo changes, and the documents already stored become invalid somehow. To solve this, we will create some kind of update procedure that fits all documents to the new schema. Will there be a way to solve this problem without fetching all Foo:rev:1 documents and re-importing them as Foo:rev:2 documents? As I write this, it seems to me that this is a stupid question since there is no change interface. Nevertheless, do you see any problems here if ca. 1 documents will be affected at once?

I would be very happy about every single opinion on my questions.

Thank you very much, andi

-- Andreas Balke // Lead Developer Digiden GmbH • Agentur für Kommunikationslösungen In der Backfabrik • Saarbrückerstraße 37b • D-10405 Berlin Fon: +49 (30) 446 749 425 • Fax: +49 (30) 446 749 479 www.digiden.de HRB 96276 B • Geschäftsführer: Mike Petersen
Re: who use time?
On 6/14/07, James liu [EMAIL PROTECTED] wrote: i write script to get run time to sure how to performance. i find very intresting thing that i query 2 solr box to get data and solr response show me qtime all zero. but i find multi get data script use time is 0.046674966812134(it will change) If you are timing the complete script there is startup time to take into account. If you are only timing the request/response to solr, then that is a bit slow considering the query time itself is less than a millisecond. That does not include document retrieval and response writing. How many documents are you retrieving? If you re-execute the same exact query again, is it still slower? -Yonik
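The distinction Yonik draws (script start-up vs. the request itself vs. the reported QTime) is easy to isolate by timing only the call. A generic Python sketch, not from the thread; in James's setting the timed callable would be the HTTP request to Solr:

```python
import time

def timed(fn, *args):
    """Return (elapsed_seconds, result) for one call -- this measures only
    the call itself, excluding interpreter and client start-up time."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result

# In the thread's setting fn would be the query request, e.g.
#   elapsed, body = timed(urllib.request.urlopen,
#                         "http://localhost:8983/solr/select?q=*:*")
# and elapsed could then be compared against the QTime in the response,
# which covers query execution only, not response writing or transfer.
elapsed, result = timed(sum, range(1000))
```

Timing the whole script instead of just the request is the usual way a sub-millisecond QTime turns into tens of milliseconds of measured wall-clock time.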
Re: Conceptual Question
On 6/14/07, Andreas Balke [Digiden GmbH] [EMAIL PROTECTED] wrote: the important point (and first question) is that our schema is very likely to change. that means we will have 'revisions' of documents whereas each revision has its own, slightly different, schema. structuring the documents itself wouldn't be a problem i guess, as we could define an 'id' and 'rev' as unique in combination. but how can we handle the revision dependent schema? is there a good way for such thing? Make changes to the schema in a backward compatible way. You can easily add new fields to a schema without any impact to existing documents. However, if you change the type of an existing field, or how it's analyzed, then it doesn't make sense for documents before the change and after to coexist (both sets would not be searchable in a consistent manner). * the second big problem is also related to the versioning and its changing schemas. assume we will have a lot of documents that are built of a document type 'Foo' rev 1. now we decide that the schema of Foo changes, the documents, already stored, become invalid somehow. to solve this, we will create some kind of update procedure that fits all documents to the new schema. The easiest way is to simply delete and reindex all the docs that should change. will there be a way to solve this problem without fetching all Foo:rev:1 documents and re-importing them as Foo:rev:2 documents? as i write this, it seems to me that this is a stupid question since there is no change interface. There is a change interface in JIRA, as long as all of the fields originally sent are stored. -Yonik
RE: Solr 1.2 HTTP Client for Java
Can you provide some examples of the searches you were running and the errors you were getting? - will

-----Original Message----- From: Thierry Collogne [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 10:19 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.2 HTTP Client for Java I tried using that client, but I didn't get any good results when searching for words with special characters. I have also searched for documentation for that client, but didn't find any. Does anyone know where to find documentation concerning the Java client? On 14/06/07, Will Johnson [EMAIL PROTECTED] wrote: The code in http://solrstuff.org/svn/solrj/ is very stable, works with almost all features for both searching and indexing and will be moving into the main distribution soon as the standard java client library. - will -----Original Message----- From: Martin Grotzke [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 8:39 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.2 HTTP Client for Java On Thu, 2007-06-14 at 13:13 +0100, Daniel Alheiros wrote: Thanks Martin. I'm using one of them, but the optimize command doesn't work properly. Have you seen the same problem? Nope, I'm using the client only for queries - the xml generation / posting to solr is done by another module in our application, and not with java. Cheers, Martin Regards, Daniel On 14/6/07 13:07, Martin Grotzke [EMAIL PROTECTED] wrote: On Thu, 2007-06-14 at 11:32 +0100, Daniel Alheiros wrote: Hi I've been using one Java client I got from a colleague but I don't know exactly its version or where to get any update for it. Base package is org.apache.solr.client (where there are some common packages) and the client main package is org.apache.solr.client.solrj. Is it available via Maven2 central repository? Have a look at the issue tracker, there's one with solr clients: http://issues.apache.org/jira/browse/SOLR-20 I've also used one of them, but to be honest, do not remember which one ;) Cheers, Martin Regards, Daniel

-- Martin Grotzke http://www.javakaffee.de/blog/
Re: Problem with surrogate characters in utf-8
On 6/14/07, Burkamp, Christian [EMAIL PROTECTED] wrote: I tried the new solr 1.2 war file with the old example distribution (solr 1.1 and jetty 5.1). Suprisingly enough this does not reveal the problem. So the whole story might even be a jetty issue. That definitely points to it being a Jetty issue. -Yonik
RE: Index time boost is not working
Otis, here is the setting in the schema for the fields I use. omitNorms is not set to any value here. Should I explicitly set the value to false? Thanks Madhan

<fields>
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="default" type="text" indexed="true" stored="false" multiValued="true"/>
  <dynamicField name="*_i" type="integer" indexed="true" stored="true"/>
  <dynamicField name="*_t" type="text" indexed="true" stored="true"/>
  <dynamicField name="*_f" type="float" indexed="true" stored="true"/>
  <dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
  <dynamicField name="*_d" type="date" indexed="true" stored="true"/>
  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_ri" type="sint" indexed="true" stored="true"/>
  <dynamicField name="*_rf" type="sfloat" indexed="true" stored="true"/>
  <dynamicField name="*_facet" type="string" indexed="true" stored="true"/>
</fields>

-----Original Message----- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 5:32 AM To: solr-user@lucene.apache.org Subject: Re: Index time boost is not working Check your schema.xml, that's where you'll see how the field is defined. Otis - Original Message From: Thierry Collogne [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 14, 2007 8:27:54 AM Subject: Re: Index time boost is not working How do you browse the indexes? On 14/06/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Is your field defined with omitNorms=true by any chance? Otis -- Lucene Consulting -- http://lucene-consulting.com/ - Original Message From: Madhan Subhas [EMAIL PROTECTED] To: solr-user@lucene.apache.org Sent: Thursday, June 14, 2007 5:35:17 AM Subject: Index time boost is not working I am using solr in my rails application. When I create the document that need to be stored in Solr I can see the boost values being set on the fields as attributes. However when I browse the indexes through luke I see a boost value of 1. What am I missing. Thanks for your input. -Madhan
Re: Solr 1.2 HTTP Client for Java
Excellent. Any idea if you are going to make it distributable via the central Maven repo? It could make things easier for those using maven to build their projects... Like me :) Regards, Daniel On 14/6/07 17:09, Ryan McKinley [EMAIL PROTECTED] wrote: I'm working on integrating the solrj client into the official solr source tree right now. It should be ready to use (test!) later today... Once it is in /trunk, it will be easy for us to know what version of what we are talking about and can definitely help work through any issues. good good ryan Thierry Collogne wrote: I tried using that client, but I didn't get any good results while searching for worst with special characters. I have also searched for documentation for that client, but didn't find any. Does anyone know where to find documentation concerning the java client? On 14/06/07, Will Johnson [EMAIL PROTECTED] wrote: The code in http://solrstuff.org/svn/solrj/ is very stable, works with most all features for both searching and indexing and will be moving into the main distribution soon as the standard java client library. - will -Original Message- From: Martin Grotzke [mailto:[EMAIL PROTECTED] Sent: Thursday, June 14, 2007 8:39 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.2 HTTP Client for Java On Thu, 2007-06-14 at 13:13 +0100, Daniel Alheiros wrote: Thanks Martin. I'm using one of them which the optimize command doesn't work properly Have you seen the same problem? Nope, I'm using the client only for queries - the xml generation / posting to solr is done by another module in our application, and not with java. Cheers, Martin Regards, Daniel On 14/6/07 13:07, Martin Grotzke [EMAIL PROTECTED] wrote: On Thu, 2007-06-14 at 11:32 +0100, Daniel Alheiros wrote: Hi I've been using one Java client I got from a colleague but I don't know exactly its version or where to get any update for it. 
Base package is org.apache.solr.client (where there are some common packages) and the client main package is org.apache.solr.client.solrj. Is it available via Maven2 central repository? Have a look at the issue tracker, there's one with solr clients: http://issues.apache.org/jira/browse/SOLR-20 I've also used one of them, but to be honest, do not remember which one ;) Cheers, Martin Regards, Daniel http://www.bbc.co.uk/ This e-mail (and any attachments) is confidential and may contain personal views which are not the views of the BBC unless specifically stated. If you have received it in error, please delete it from your system. Do not use, copy or disclose the information in any way nor act in reliance on it and notify the sender immediately. Please note that the BBC monitors e-mails sent or received. Further communication will signify your consent to this. -- Martin Grotzke http://www.javakaffee.de/blog/
Re: Solr 1.2 HTTP Client for Java
Any idea if you are going to make it distributable via the central Maven repo? It will be included in the next official solr release. I don't use maven, but assume all official apache projects are included in their repo. If they do nightly snapshots, it will be there ryan
RE: question about highlight field
Hi, Chris, I rewrote the prefix wildcard query consult* to (consult consult?*), and it works with highlighting. Do you think that's a workable solution? Could you explain a little why putting a ? before * won't crash Solr if it matches a lot of terms? Thanks Xuesong In the trunk (soon to be Solr 1.2) Mike fixed that so the query is rewritten to its expanded form before highlighting is done ... this works great for true wildcard queries (ie: cons*t* or cons?lt*) but Solr has an optimization for prefix queries (ie: consult*) to reduce the likelihood of Solr crashing if the prefix matches a lot of terms ... unfortunately this breaks highlighting of prefix queries, and no one has implemented a solution yet... https://issues.apache.org/jira/browse/SOLR-195 -Hoss
Re: Solr 1.2 HTTP Client for Java
Ryan McKinley wrote: Any idea if you are going to make it distributable via the central Maven repo? It will be included in the next official solr release. I don't use maven, but assume all official apache projects are included in their repo. If they do nightly snapshots, it will be there Each project must actively push [1] released artifacts to maven repository, there is no other way of getting them there. [1] http://www.apache.org/dev/release-publishing.html#maven-repo -- Sami Siren
Keep having error on unknown field
Hi, When I tried to use Jetty to index my xml, I kept getting the following error even though I have defined the field properly in the schema.xml. The error is - SEVERE: org.apache.solr.core.SolrException: ERROR:unknown field 'name'... In my schema, it was defined like this - <filed name="name" type="string" indexed="true" stored="true" multiValued="true"/> The only difference between this field and the other fields is that this field has a lot of values per document, but I already set it multiValued=true. Other than that, I don't understand what else makes this different from the other fields. Thanks a lot! Jef
Re: Keep having error on unknown field
On 6/14/07, Tiong Jeffrey [EMAIL PROTECTED] wrote: The error is - SEVERE: org.apache.solr.core.SolrException: ERROR:unknown field 'name'... In my schema, it was defined like this - filed name=name type=string indexed=true stored=true multiValued=true/ filed = field -Yonik
Re: problem with schema.xml
: get fresh log messages when the server is started again. The new : schema.xml that shows the changes is via: : : http://localhost:8983/solr/admin/get-file.jsp?file=schema.xml : : Maybe there's some extra magic to getting the new field to show up at : all as null or something valuable? Is that the behaviour of the : default field tag? whoa ... wait a minute: i think there is a disconnect in what exactly should happen when you change your schema.xml ... adding a new field doesn't automatically create a value for that field in every document -- a new stored field won't magically appear as empty in every document returned when you do a search; adding a new field just means that you can now add documents with that field. so it seems like your new schema.xml probably is being loaded fine when you restart your server -- you can be certain if you try to index a document with that new field and it works. -Hoss
Re: DisMax request handler doesn't work with stopwords?
: I'm having the same issues. We are using Dismax, with a stopword list. : Currently we have customers typing in model ipod, so we added model to : the stopwords list and tested with the standard handler... works fine, but not : with dismax (mm = 3<-1 5<-2 6<90%). When I comment out mm, it : works. Do you have any recommendations on how to deal with this issue, : without doing away with mm (mm does help with a lot of phrase queries). are you sure your problem isn't the same as Casey's? that you are using dismax across a field which doesn't treat model as a stopword? can you provide the query toString info from debugQuery=true so we can see exactly how dismax is parsing your request? -Hoss
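For anyone puzzling over that mm spec: a rough Python model of the conditional minimum-should-match format (my reading of the docs, not Solr's actual code):

```python
def min_should_match(spec: str, num_clauses: int) -> int:
    """Rough model of a conditional mm spec like "3<-1 5<-2 6<90%".

    Each "n<v" segment means: for MORE than n optional clauses, apply v,
    where a negative v subtracts from the clause count and "p%" takes
    that percentage of the clause count (rounded down). At or below the
    smallest n, all clauses are required. Segments are assumed sorted
    ascending, so the last matching segment wins.
    """
    required = num_clauses  # default: every clause required
    for segment in spec.split():
        threshold, value = segment.split("<")
        if num_clauses > int(threshold):
            if value.endswith("%"):
                required = (num_clauses * int(value[:-1])) // 100
            else:
                v = int(value)
                required = num_clauses + v if v < 0 else v
    return max(required, 0)

# With "3<-1 5<-2 6<90%": 3 terms -> all 3 required; 4 terms -> 3;
# 6 terms -> 4; 10 terms -> 9.
```

Under this reading, a query of 3 terms where one is a stopword in some qf fields can easily fail mm, which matches the symptom described above.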
Re: Solr 1.2 HTTP Client for Java
Daniel Alheiros wrote: Excellent. I just added SOLR-20 to trunk. You will need to: 1. check out trunk 2. run ant dist 3. include: apache-solr-1.3-dev-common.jar apache-solr-1.3-dev-solrj.jar solrj-lib/*.jar Here is the basic interface: http://svn.apache.org/repos/asf/lucene/solr/trunk/client/java/solrj/src/org/apache/solr/client/solrj/SolrServer.java For example, setting up the two implementations: http://svn.apache.org/repos/asf/lucene/solr/trunk/client/java/solrj/test/org/apache/solr/client/solrj/embedded/ server = new CommonsHttpSolrServer( url ); server = new EmbeddedSolrServer( SolrCore.getSolrCore() ); Give it a go! - - - - As a side note, trunk has had a LOT of changes recently. Now (more than usual) I would recommend against using trunk for anything important. The API and structure is moving around (1.3 will be compatible with the 1.2 API, but there is a good chance something is broken now) ryan
facet query counts
I have a large subset (47640) of my total index. Most of them (45335) have a single field, which we will call Field1. Field1 is a sfloat. If my query restricts the resultset to my subset and I do a facet count on Field1, then the number of records returned is 47640. And if I sum up the facet counts, it adds to 45335. So far, so good. But, I really want to do range queries on Field1. So, I use facet.query to split Field1 into 5 ranges. So, facet.query=Field1%3A%5B40+TO+*%5D facet.query=Field1%3A%5B30+TO+39.9%5D facet.query=Field1%3A%5B20+TO+29.9%5D facet.query=Field1%3A%5B10+TO+19.9%5D facet.query=Field1%3A%5B*+TO+9.9%5D Now, if I sum up the counts, it adds to 54697. I can't find where this number comes from. If I have open-ended ranges on both my high and low end, shouldn't the sum of facet.query equal the sum of a normal facet count? And if a record never has more than one instance of Field1, how can the sum be greater than the total record set? And this problem seems to occur in most (if not all) of my range queries. Is there anything that I am doing wrong here?
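Decoded, those URL-escaped facet.query parameters read as follows (a quick check with Python's urllib, shown only to make the ranges readable):

```python
from urllib.parse import unquote_plus

params = [
    "Field1%3A%5B40+TO+*%5D",
    "Field1%3A%5B30+TO+39.9%5D",
    "Field1%3A%5B20+TO+29.9%5D",
    "Field1%3A%5B10+TO+19.9%5D",
    "Field1%3A%5B*+TO+9.9%5D",
]
# unquote_plus turns %3A/%5B/%5D back into ':'/'['/']' and '+' into spaces
decoded = [unquote_plus(p) for p in params]
for q in decoded:
    print(q)
# the first entry, for example, decodes to: Field1:[40 TO *]
```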
Re: Keep having error on unknown field
arh! i spent 6-7 hours on this error and didnt see this! thanks! On 6/15/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/14/07, Tiong Jeffrey [EMAIL PROTECTED] wrote: The error is - SEVERE: org.apache.solr.core.SolrException: ERROR:unknown field 'name'... In my schema, it was defined like this - filed name=name type=string indexed=true stored=true multiValued=true/ filed = field -Yonik
Re: Keep having error on unknown field
Do we have a bug filed on this? Solr really should have complained about the unknown element. --wunder On 6/14/07 4:54 PM, Tiong Jeffrey [EMAIL PROTECTED] wrote: arh! i spent 6-7 hours on this error and didnt see this! thanks! On 6/15/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/14/07, Tiong Jeffrey [EMAIL PROTECTED] wrote: The error is - SEVERE: org.apache.solr.core.SolrException: ERROR:unknown field 'name'... In my schema, it was defined like this - filed name=name type=string indexed=true stored=true multiValued=true/ filed = field -Yonik
Re: facet query counts
On 14-Jun-07, at 4:29 PM, Kevin Osborn wrote: I have a large subset (47640) of my total index. Most of them (45335) have a single field, which we will call Field1. Field1 is a sfloat. If my query restricts the resultset to my subset and I do a facet count on Field1, then the number of records returned is 47640. And if I sum up the facet counts, it adds to 45335. So far, so good. But, I really want to do range queries on Field1. So, I use facet.query to split Field1 into 5 ranges. So, facet.query=Field1%3A%5B40+TO+*%5D facet.query=Field1%3A%5B30+TO+39.9%5D facet.query=Field1%3A%5B20+TO+29.9%5D facet.query=Field1%3A%5B10+TO+19.9%5D facet.query=Field1%3A%5B*+TO+9.9%5D Now, if I sum up the counts, it adds to 54697. I can't find where this number comes from. If I have open-ended ranges on both my high and low end, shouldn't the sum of facet.query equal the sum of a normal facet count? And if a record never has more than one instance of Field1, how can the sum be greater than the total record set? My guess is precision issues. Those are mighty large values to be storing in a binary float--you're probably comparing mostly the exponent, which is not necessarily disjoint. Have you tried sdouble? And this problem seems to occur in most (if not all) of my range queries. Is there anything that I am doing wrong here? Is this true on other field types as well? -Mike
Re: facet query counts
A 32 bit float has about 7 decimal digits of precision, so your range queries actually do overlap since 40f is exactly the same as 39f -Yonik On 6/14/07, Kevin Osborn [EMAIL PROTECTED] wrote: I have a large subset (47640) of my total index. Most of them (45335) have a single field, which we will call Field1. Field1 is a sfloat. If my query restricts the resultset to my subset and I do a facet count on Field1, then the number of records returned is 47640. And if I sum up the facet counts, it adds to 45335. So far, so good. But, I really want to do range queries on Field1. So, I use facet.query to split Field1 into 5 ranges. So, facet.query=Field1%3A%5B40+TO+*%5D facet.query=Field1%3A%5B30+TO+39.9%5D facet.query=Field1%3A%5B20+TO+29.9%5D facet.query=Field1%3A%5B10+TO+19.9%5D facet.query=Field1%3A%5B*+TO+9.9%5D Now, if I sum up the counts, it adds to 54697. I can't find where this number comes from. If I have open-ended ranges on both my high and low end, shouldn't the sum of facet.query equal the sum of a normal facet count? And if a record never has more than one instance of Field1, how can the sum be greater than the total record set? And this problem seems to occur in most (if not all) of my range queries. Is there anything that I am doing wrong here?
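A quick way to see single-precision rounding for yourself (plain Python, just illustrating that boundary values like 39.9 are not exactly representable in 32 bits; whether this fully accounts for the overlapping counts depends on how sfloat encodes its terms):

```python
import struct

def as_float32(x: float) -> float:
    # Round-trip through a 4-byte IEEE 754 float, the precision a
    # 32-bit indexed value would have
    return struct.unpack("<f", struct.pack("<f", x))[0]

for boundary in (9.9, 19.9, 29.9, 39.9):
    print(boundary, "->", as_float32(boundary))  # stored value drifts slightly
```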
Re: Keep having error on unknown field
SOLR-133 includes this fix... it squawks if it hits an unknown tag. Walter Underwood wrote: Do we have a bug filed on this? Solr really should have complained about the unknown element. --wunder On 6/14/07 4:54 PM, Tiong Jeffrey [EMAIL PROTECTED] wrote: arh! i spent 6-7 hours on this error and didnt see this! thanks! On 6/15/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 6/14/07, Tiong Jeffrey [EMAIL PROTECTED] wrote: The error is - SEVERE: org.apache.solr.core.SolrException: ERROR:unknown field 'name'... In my schema, it was defined like this - filed name=name type=string indexed=true stored=true multiValued=true/ filed = field -Yonik
Where are the log files...
It happened twice in the past few days that the solr instance stopped responding (the admin page does not load) while the process was still running. I'd like to find out what's causing this. I notice that I can change logger level from admin page but I didn't figure out where the log files are. Search on solr wiki and internet didn't help... -- Best regards, Jack
Re: who use time?
2007/6/14, Yonik Seeley [EMAIL PROTECTED]: On 6/14/07, James liu [EMAIL PROTECTED] wrote: I wrote a script to measure run time so I can be sure of the performance. I found something interesting: I query 2 Solr boxes to get data, and the Solr responses show qtime all zero, but the script fetching from both takes 0.046674966812134 (it varies). If you are timing the complete script there is startup time to take into account. If you are only timing the request/response to solr, then that is a bit slow considering the query time itself is less than a millisecond. That does not include document retrieval and response writing. I am just timing my script's fetch from the 2 Solr boxes, not the complete script. It just queries the two boxes and returns id,score with rows=10. The response type is json, and their qtime is all zero. How many documents are you retrieving? one: numDocs : 1 maxDoc : 23000 the other: numDocs : 9000 maxDoc : 9000 3-4k per doc If you re-execute the same exact query again, is it still slower? It will be quick. time will be 0.0043279ms -Yonik -- regards jl
Re: Where are the log files...
what version of solr/container are you running? this sounds similar to what people have seen running solr 1.1 with the jetty included in that example... Jack L wrote: It happened twice in the past few days that the solr instance stopped responding (the admin page does not load) while the process was still running. I'd like to find out what's causing this. I notice that I can change logger level from admin page but I didn't figure out where the log files are. Search on solr wiki and internet didn't help...
Re: Where are the log files...
if you use Jetty, you should see Jetty's log. if you use Tomcat, you should see Tomcat's log. Solr is only a program that runs inside the container. 2007/6/15, Ryan McKinley [EMAIL PROTECTED]: what version of solr/container are you running? this sounds similar to what people running solr 1.1 with the jetty include in that example... Jack L wrote: It happened twice in the past few days that the solr instance stopped responding (the admin page does not load) while the process was still running. I'd like to find out what's causing this. I notice that I can change logger level from admin page but I didn't figure out where the log files are. Search on solr wiki and internet didn't help... -- regards jl
Re: problems getting data into solr index
On 14-Jun-07, at 4:30 AM, vanderkerkoff wrote: Hi Brian I've now set the mysqldb to be default charset utf8, and everything else is utf8. collation etc etc. I think I know what the problem is, and it's a really old one and I feel foolish now for not realising it earlier. Our content people are copying and pasting sh*t from word into the content. :-) Now that the database is utf8, I'd like to write something to change the crap from word into a readable value before it gets into the database. Using python, so I suppose this is more of a python question than a solr one. Anyone got any tips anyway? I've dealt with tons of issues with python and unicode, but I need more information before proceeding with tips. Specifically, what is the format of the shit being copied and pasted into your app, and what python datatype is handling it? I suspect it is encoded somehow, which could be problematic. Is it going through a web browser? How is it getting into mysql? -Mike
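One common pattern for this situation, sketched under the assumption that the Word paste reaches you as UTF-8 or cp1252 bytes (the helper name is made up; adapt to however your script reads from MySQL):

```python
def to_text(raw: bytes) -> str:
    """Decode bytes defensively: try UTF-8 first, then fall back to
    cp1252, which is where Word's smart quotes and dashes usually live,
    instead of letting an implicit ASCII decode crash the script."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace")

# Word's right single quote stored as UTF-8 bytes -- note the same
# 0xe2 lead byte as in the UnicodeDecodeError traceback:
print(to_text(b"\xe2\x80\x99"))
```

The payoff is that either byte flavor comes back as a proper Unicode string, which can then be encoded as UTF-8 when posting to Solr.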
Re[2]: Where are the log files...
Yeah, I'm running 1.1 with jetty. But I didn't find *.log in the whole solr directory. Is jetty putting the log files outside the directory? what version of solr/container are you running? this sounds similar to what people running solr 1.1 with the jetty include in that example... Jack L wrote: It happened twice in the past few days that the solr instance stopped responding (the admin page does not load) while the process was still running. I'd like to find out what's causing this. I notice that I can change logger level from admin page but I didn't figure out where the log files are. Search on solr wiki and internet didn't help...
Ping fails out of the box
I just downloaded version 1.2 and set it up on my Windows PC. Search works but Ping returns error 500: --- HTTP ERROR: 500 Internal Server Error RequestURI=/solr/admin/ping Powered by Jetty:// --- Is there any minimum setting for Ping to work? -- Best regards, Jack
Re: Ping fails out of the box
: I just downloaded version 1.2 and set it up on my Windows PC. : Search works but Ping returns error 500: : Is there any minimum setting for Ping to work? the ping url triggers a query which can be configured in the solrconfig.xml, it should work out of the box (even without indexing any data) as long as you have a valid solrconfig.xml and schema.xml ... check the log files, at minimum there will be a stack trace for that 500 error, but prior to that should be some errors from startup letting you know what's wrong. -Hoss
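For reference, the ping query is configured in the admin section of solrconfig.xml; a fragment along these lines (the exact query string is whatever your stock config ships with, this one is from memory):

```xml
<admin>
  <defaultQuery>solr</defaultQuery>
  <!-- the request the /admin/ping handler runs; a 500 here usually
       means this query, or the schema behind it, failed to load -->
  <pingQuery>
    q=solr&amp;version=2.0&amp;start=0&amp;rows=0
  </pingQuery>
</admin>
```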