Re: solr plugins

2007-05-25 Thread Chris Hostetter

:  I updated with a patch. Is it possible to get this in soon cuz I
: have a client waiting on this.

I've posted some comments about your patch.

at the moment, the committers have started focusing on getting 1.2
released.  Even if this was a really popular issue, it's a non-trivial
change that we probably would not want to rush before the release.



-Hoss



facet should add facet.analyzer

2007-05-25 Thread James liu

If facet.analyzer is true, analyze the facet values; if false, don't analyze.

Why do I suggest this? Chinese words are not split by spaces, so if the
values are analyzed, they will change.

For now I will use a map to work around it, until facet.analyzer exists.

--
regards
jl


Re: solr plugins

2007-05-25 Thread John Wang

Hi Yonik:

I updated with a patch. Is it possible to get this in soon cuz I
have a client waiting on this.

Thanks again

-John

On 5/22/07, John Wang <[EMAIL PROTECTED]> wrote:

Hi Yonik:

 Thank you again for your help!

 I created an improvement item in jira (SOLR-243) on this.

-John


On 5/19/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 5/19/07, John Wang < [EMAIL PROTECTED]> wrote:
> > Hi Yonik:
> >
> > Thanks for the info!
> >
> > This solves my problem, but not elegantly.
> >
> >  I have a custom implementation where I derived from the
> > IndexReader class to store some custom data. Now I am trying to write
> > a Solr plugin for my search implementation but I want to be able to
> > use my IndexReader implementation.
> >
> > Is there a way to override the IndexReader instantiation? e.g.
> > IndexReader newReader() etc.
>
> Not currently, but it might be a useful feature.
>
> -Yonik
>




Re: AW: Re[2]: add and delete docs at same time

2007-05-25 Thread Mike Klaas


On 25-May-07, at 2:49 AM, Burkamp, Christian wrote:


Thierry,

If you always start from scratch you could even reset the index  
completely (i.e. delete the index directory). Solr will create a  
new index automatically at startup.


This will also make indexing and optimizing much faster for any
non-trivial size index.


-Mike


RE: field display values

2007-05-25 Thread Chris Hostetter

This would require some storage when the index is built to map between the
internal field name and the "display name" ... since this is not a Lucene
concept it would have to be a higher level concept that Solr writes to disk
directly -- there are currently no concepts like this but that doesn't
mean there can't be.

the question becomes: "Is this the type of data that Solr *should* store?"
... in my opinion the answer is no.

I can't think of any value add in having Solr keep track of the fact that
"ds" means "Download Speed" vs having an external data mapping keep track
of that information.  Since direct access to that info inside of Solr
wouldn't typically make the performance of requests any faster or
reduce the size of the responses, it seems like the type of data that makes
more sense to maintain externally.

as to your specific situation...

: I would normally agree but the problem is that I'm making very heavy use
: of the dynamic fields and therefore don't really know what a record
: looks like.  Ie the only thing that knows about the data is the input
: data itself.  I've added logic to 'solrify' the input field names as
: they come to me in the "Download Speed" format but making the reverse
: happen is impossible from the client side because each record is
: different.

...if every document is truly different, then the "ds" field for one doc
may not be the same as the "ds" field for another doc ... which makes it
sound like the field display names themselves are document specific 'data'
that should be stored as field values.

I have a lot of personal experience with an app (the first Solr app
actually) where the dynamic fields a doc has depend on its category.
I actually put the info about the fields (including their display
names and info on how to facet on them) into stored fields of special
"metadata documents" which go into the index.  A custom request
handler first asks "what category am I interested in?" to find the
relevant metadata doc, and then uses the info found in that doc both to
query the index for the "real" results and to return the "display"
values for all of the important fields.

if you can partition your index in this way, then similar metadata docs
might make sense for you ... if you can't (because every doc truly is
different) then making the "real" documents also store the "metadata"
about field names can work just as well.


-Hoss
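The metadata-document pattern described above can be sketched client-side. A minimal sketch, assuming hypothetical category and field names (none of these are from the original app):

```python
# One metadata doc per category stores display names for that category's
# dynamic fields; a request handler (or client) looks the metadata doc up
# first, then uses it to label the "real" results.
# Category and field names here are hypothetical.

metadata_docs = {
    "broadband": {  # category -> {internal field name -> display name}
        "ds": "Download Speed (MB/sec)",
        "us": "Upload Speed (MB/sec)",
    },
}

def label_result(category, doc):
    """Replace internal field names with display names for one result doc."""
    mapping = metadata_docs.get(category, {})
    return {mapping.get(field, field): value for field, value in doc.items()}
```

For example, `label_result("broadband", {"ds": 5})` yields `{"Download Speed (MB/sec)": 5}`, while unknown categories pass field names through unchanged.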



Re: field display values

2007-05-25 Thread Kevin Osborn
I had a similar issue with a heavy use of dynamic fields. You first want to get 
those spaces out of there. Lucene does not like spaces in field names. So, I 
just replaced the space with a rarely used character (ASCII 8 or something like 
that). I did this in my indexing. And then I just translate between the Lucene 
encoded field name (without spaces) and my display field name (with spaces) 
when I go back and forth between Solr and my client.

So, your "Download Speed"=>"DownloadSpeed"=>"Download Speed". Spaces are 
the only characters that seemed to cause problems.

It seems to work just fine.
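The encoding described above can be sketched as a simple bidirectional mapping. The choice of ASCII 8 as the separator is an assumption, based only on the "ASCII 8 or something like that" remark:

```python
# Replace spaces in display names with a rarely used character before
# indexing, and reverse the substitution when showing results.
SEP = "\x08"  # ASCII 8 (backspace); an assumed choice of "rare" character

def to_lucene_name(display_name):
    """Encode a display name into a space-free Lucene field name."""
    return display_name.replace(" ", SEP)

def to_display_name(lucene_name):
    """Decode a Lucene field name back into its display name."""
    return lucene_name.replace(SEP, " ")
```

The round trip is lossless as long as the separator character never appears in a real display name.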

- Original Message 
From: Will Johnson <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, May 25, 2007 1:48:22 PM
Subject: RE: field display values

I would normally agree but the problem is that I'm making very heavy use
of the dynamic fields and therefore don't really know what a record
looks like.  I.e. the only thing that knows about the data is the input
data itself.  I've added logic to 'solrify' the input field names as
they come to me in the "Download Speed" format but making the reverse
happen is impossible from the client side because each record is
different.

- will

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 25, 2007 4:32 PM
To: solr-user@lucene.apache.org
Subject: Re: field display values


Will Johnson wrote:
> Has anyone done anything interesting to preserve display values for
> field names?  I.e. my users would like to see
> 
> Download Speed (MB/sec): 5
> 
> As opposed to:
> 
> ds:5
> 
>  


The general model has been to think of solr like SQL... it is only the 
database - display choices should be at the client side.  It seems easy 
enough to have a map on the client with:
   "ds" => "Download Speed (MB/sec)"

That said, something like the sql 'as' command would be useful:
   SELECT ds as `Download Speed (MB/sec)` FROM table...;

rather than define the field name at index time (as your example
suggests), it makes more sense to define it at query time (or as a
default in the RequestHandler config).  Maybe something like:

/select?fl=ds&display.ds=Download Speed (MB/sec)


Maybe this would be a way to specify date formatting?

/select?fl=timestamp&display.timestamp='Year'&display.format.timestamp=YYYY


just thoughts...








RE: field display values

2007-05-25 Thread Will Johnson
I would normally agree but the problem is that I'm making very heavy use
of the dynamic fields and therefore don't really know what a record
looks like.  I.e. the only thing that knows about the data is the input
data itself.  I've added logic to 'solrify' the input field names as
they come to me in the "Download Speed" format but making the reverse
happen is impossible from the client side because each record is
different.

- will

-Original Message-
From: Ryan McKinley [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 25, 2007 4:32 PM
To: solr-user@lucene.apache.org
Subject: Re: field display values


Will Johnson wrote:
> Has anyone done anything interesting to preserve display values for
> field names?  I.e. my users would like to see
> 
> Download Speed (MB/sec): 5
> 
> As opposed to:
> 
> ds:5
> 
>  


The general model has been to think of solr like SQL... it is only the 
database - display choices should be at the client side.  It seems easy 
enough to have a map on the client with:
   "ds" => "Download Speed (MB/sec)"

That said, something like the sql 'as' command would be useful:
   SELECT ds as `Download Speed (MB/sec)` FROM table...;

rather than define the field name at index time (as your example
suggests), it makes more sense to define it at query time (or as a
default in the RequestHandler config).  Maybe something like:

/select?fl=ds&display.ds=Download Speed (MB/sec)


Maybe this would be a way to specify date formatting?

/select?fl=timestamp&display.timestamp='Year'&display.format.timestamp=YYYY


just thoughts...




Re: field display values

2007-05-25 Thread Ryan McKinley


Will Johnson wrote:

Has anyone done anything interesting to preserve display values for
field names?  I.e. my users would like to see

Download Speed (MB/sec): 5

As opposed to:

ds:5

 



The general model has been to think of solr like SQL... it is only the 
database - display choices should be at the client side.  It seems easy 
enough to have a map on the client with:

  "ds" => "Download Speed (MB/sec)"

That said, something like the sql 'as' command would be useful:
  SELECT ds as `Download Speed (MB/sec)` FROM table...;

rather than define the field name at index time (as your example
suggests), it makes more sense to define it at query time (or as a
default in the RequestHandler config).  Maybe something like:


/select?fl=ds&display.ds=Download Speed (MB/sec)


Maybe this would be a way to specify date formatting?

/select?fl=timestamp&display.timestamp='Year'&display.format.timestamp=YYYY


just thoughts...
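The query-time display.&lt;field&gt; idea above is only a proposal, not a real Solr parameter; a client-side sketch of what it could do:

```python
# Sketch of the proposed query-time display-name idea: read hypothetical
# display.<field> parameters from the query string and relabel matching
# fields in a result document. This is not a real Solr feature.
from urllib.parse import parse_qs

def apply_display_params(query_string, doc):
    """Rename fields in doc according to display.<field> query parameters."""
    params = parse_qs(query_string)
    renames = {k[len("display."):]: v[0]
               for k, v in params.items() if k.startswith("display.")}
    return {renames.get(field, field): value for field, value in doc.items()}
```

With the example query from the message, `apply_display_params("fl=ds&display.ds=Download Speed (MB/sec)", {"ds": 5})` would relabel `ds` in the response.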




field display values

2007-05-25 Thread Will Johnson
Has anyone done anything interesting to preserve display values for
field names?  I.e. my users would like to see

 

Download Speed (MB/sec): 5

 

As opposed to:

 

ds:5

 

there are options for doing fancy encoding of field names but those seem
less than ideal.  What I'd really like to do is at add time:

 



  

hi

  



 

And then at result time:

 

hi

 

I've thought of having custom request handlers save this info away and
then add it back in with a customer response writer but this seemed like
it might be a more generally useful type of thing to have.  

 

Thoughts, ideas?

 

- will



Re: Problem with machine hostname and Solr/Tomcat

2007-05-25 Thread Chris Hostetter
: Anyone encounter a problem when changing their hostname?  (via
: /etc/conf.d/hostname or just the hostname command)  I'm getting this error
: when going to the admin screen, I have a feeling it's a simple fix.  It
: seems to work when it thinks the machine's name is just 'localhost'.

I don't think this is a Tomcat or Solr issue ... it looks like a basic
Java/DNS issue (that can most likely be reproduced with a 4 line
commandline Java app for testing).

Take a look at the InetAddress javadocs, specifically the info on caching. My
guess is either:

1) your reverse name lookup doesn't match the name you are using, which
causes the getLocalHost call to freak out because it can't do a DNS lookup
on the hostname it thinks it is.
2) you changed the name while the JVM was running, and the "forever" cache
is returning a name that no longer exists.



-Hoss
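The standalone check Hoss suggests can be sketched in Python instead of Java; `socket.getfqdn` triggers a rough equivalent of the reverse lookup that `InetAddress.getLocalHost()` depends on:

```python
# Compare the configured hostname with what a reverse lookup resolves it
# to. A mismatch here is the same condition that makes Java's
# InetAddress.getLocalHost() throw UnknownHostException.
import socket

def hostname_report():
    name = socket.gethostname()  # what the machine thinks it is called
    fqdn = socket.getfqdn(name)  # what DNS resolves that name to
    return name, fqdn
```

If the two values disagree (or `getfqdn` falls back to the input name), the machine's DNS configuration rather than Solr is the thing to fix.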



Re: read only indexes?

2007-05-25 Thread Jeff Rodenburg

We're controlling this with Tomcat configuration on our end.  I'm not a
servlet-container guru, but I would imagine similar capabilities exist on
Jetty, et al.

-- j

On 5/24/07, Ryan McKinley <[EMAIL PROTECTED]> wrote:


Is there a good way to force an index to be read-only?

I could configure a dummy handler to sit on top of /update and throw an
error, but i'd like a stronger assurance that nothing can call
UpdateHandler.addDoc()




Re: Difficulty posting unicode to solr index

2007-05-25 Thread Yonik Seeley

On 5/25/07, Ethan Gruber <[EMAIL PROTECTED]> wrote:

Posting utf8-example.xml is the first thing I tried when I ran into this
problem, and like the other files I had been working with, query results
return garbage characters where the Unicode should be.


After posting utf8-example.xml, try this query:

http://localhost:8983/solr/select?indent=on&q=id%3AUTF8TEST&fl=features&wt=python

The python writer uses unicode escapes to keep the output in the ascii
range, so it's an easy way to see exactly what Solr thinks those
characters are.
You should get

{
'responseHeader':{
 'status':0,
 'QTime':0,
 'params':{
'wt':'python',
'indent':'on',
'q':'id:UTF8TEST',
'fl':'features'}},
'response':{'numFound':1,'start':0,'docs':[
{
 'features':[
  'No accents here',
  u'This is an e acute: \u00e9',
  u'eaiou with circumflexes: \u00ea\u00e2\u00ee\u00f4\u00fb',
  u'eaiou with umlauts: \u00eb\u00e4\u00ef\u00f6\u00fc',
  'tag with escaped chars: ',
  'escaped ampersand: Bonnie & Clyde']}]
}}

If you do, that means that the problem is not getting the data into
solr, but the interpretation of what you get out.

-Yonik
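Because wt=python output is a Python literal, it can be parsed directly to inspect exactly which code points Solr returned. A small sketch, using an abbreviated form of the expected response above:

```python
# Parse wt=python output and pull out one escaped value to verify the
# exact code points Solr returned. The sample is abbreviated from the
# expected UTF8TEST response shown in the message.
import ast

raw = """{
 'responseHeader': {'status': 0},
 'response': {'numFound': 1, 'start': 0, 'docs': [
   {'features': [u'This is an e acute: \\u00e9']}]},
}"""

data = ast.literal_eval(raw)
accented = data['response']['docs'][0]['features'][0]
```

If `accented` ends with the real e-acute character, the data made it into Solr correctly and any garbage is an output-interpretation problem, as Yonik says.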


Re: read only indexes?

2007-05-25 Thread Otis Gospodnetic
Didn't somebody talk about providing Solr with a custom (subclass of) 
IndexReader here on the list the other day?  Perhaps then a ReadOnlyIndexWriter 
with appropriately overridden delete methods might be one approach to this.  
Or chmod -w? ;)

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
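The chmod -w aside can be sketched as follows: clear the write bits on the index directory tree so no process can create Lucene's write lock there. The path handling is generic, not Solr-specific:

```python
# Make an index directory tree read-only by clearing the write permission
# bits on the directory and everything under it.
import os
import stat

def make_read_only(index_dir):
    """Clear owner/group/other write bits on index_dir and its contents."""
    paths = [index_dir]
    for root, dirs, files in os.walk(index_dir):
        paths.extend(os.path.join(root, name) for name in dirs + files)
    for path in paths:
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode & ~0o222)  # drop all write bits
```

Note this only guards against processes running without sufficient privileges; a root-owned JVM could still write, so it is a belt-and-braces measure rather than a hard guarantee.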

- Original Message 
From: Ryan McKinley <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Thursday, May 24, 2007 2:35:44 PM
Subject: read only indexes?

Is there a good way to force an index to be read-only?

I could configure a dummy handler to sit on top of /update and throw an 
error, but i'd like a stronger assurance that nothing can call 
UpdateHandler.addDoc()






Re: Difficulty posting unicode to solr index

2007-05-25 Thread Ethan Gruber

Posting utf8-example.xml is the first thing I tried when I ran into this
problem, and like the other files I had been working with, query results
return garbage characters where the Unicode should be.

On 5/25/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 5/25/07, Ethan Gruber <[EMAIL PROTECTED]> wrote:
> Yes, it's definitely encoded in UTF-8.  I'm going to attempt either
today or
> Tuesday to post the files to a solr index that is online (as opposed to
> localhost as was my case a few days ago) using post.sh through SSH and
let
> you know how it turns out.  That should definitely indicate whether or
not
> the problem is with my files themselves or the post.jar file.

Why don't you try a file that we know is encoded in UTF-8,
the solr/example/exampledocs/utf8-example.xml

Try it first without modifying it (an editor can change the encoding a
file is stored in).

-Yonik



Re: Difficulty posting unicode to solr index

2007-05-25 Thread Yonik Seeley

On 5/25/07, Ethan Gruber <[EMAIL PROTECTED]> wrote:

Yes, it's definitely encoded in UTF-8.  I'm going to attempt either today or
Tuesday to post the files to a solr index that is online (as opposed to
localhost as was my case a few days ago) using post.sh through SSH and let
you know how it turns out.  That should definitely indicate whether or not
the problem is with my files themselves or the post.jar file.


Why don't you try a file that we know is encoded in UTF-8,
the solr/example/exampledocs/utf8-example.xml

Try it first without modifying it (an editor can change the encoding a
file is stored in).

-Yonik


Re: AW: Re[2]: add and delete docs at same time

2007-05-25 Thread Erik Hatcher
Just to be clear, [* TO *] does not necessarily return all  
documents.  It returns all documents that have a value in the  
specified (or default) field.  Be careful with that!   *:*, however,  
does match all documents.


Erik


On May 25, 2007, at 5:49 AM, Burkamp, Christian wrote:


Thierry,

If you always start from scratch you could even reset the index  
completely (i.e. delete the index directory). Solr will create a  
new index automatically at startup.
If you don't like to delete the files, another approach would be to  
use a query that returns all documents. You do not need a dummy  
field for this. The range query [* TO *] returns all documents. (In  
newer versions of Solr you can use *:*, which executes a bit  
faster.)


-- Christian

-Original Message-
From: Thierry Collogne [mailto:[EMAIL PROTECTED]
Sent: Friday, 25 May 2007 10:30
To: solr-user@lucene.apache.org; Jack L
Subject: Re: Re[2]: add and delete docs at same time

We always do a full delete before indexing, this is because for us  
that is the only way to be sure that there are no documents in the  
index that don't exist anymore.


So delete all, then add all.

To use the delete all, we did the following. We added a field  
called dummyDelete. This field always contains the value delete.

Like this
delete

Then to delete all documents we do a request containing:

 dummyDelete:delete

That way all documents are deleted where the field dummyDelete  
contains delete => all the documents


Hope this is clear. I am not sure if this is a good solution, but  
it does work. :)


Greet,

Thierry

On 25/05/07, Jack L <[EMAIL PROTECTED]> wrote:


Oh, is that the case? One document per request for delete?
I'm about to implement delete. Just want to confirm.

--
Best regards,
Jack

Thursday, May 24, 2007, 12:47:21 PM, you wrote:


currently no.



Right now you even need a new request for each delete...




Patrick Givisiez wrote:


can I add and delete docs at same post?

Some thing like this:

myDocs.xml
=

4 5 6  1
2 3
=

Thanks!











RE: index problem with write lock

2007-05-25 Thread Will Johnson
I think I had the same problem (the same error at least) and submitted a
patch.  The patch adds a new config option to use the NIO locking
facilities instead of the default Lucene locking.  In the ~week since, I
haven't seen the issue after applying the patch (YMMV).

https://issues.apache.org/jira/browse/SOLR-240

- will

-Original Message-
From: Chris Hostetter [mailto:[EMAIL PROTECTED] 
Sent: Friday, May 25, 2007 1:50 AM
To: solr-user@lucene.apache.org
Subject: Re: index problem with write lock


: i know  how to  fix it.
:
: but i just don't know why it happen.
:
: this solr error information:
:
: > Exception during commit/optimize:java.io.IOException: Lock obtain
timed
: > out: SimpleFSLock@/usr/solrapp/solr21/data/index/write.lock

that's the problem you see ... but in normal Solr operation there's no
reason why there should be any problem getting the write lock -- Solr only
ever makes one IndexWriter at a time.

which is why I asked about any other errors earlier in your log (possibly
much earlier) to indicate *abnormal* Solr operation.


-Hoss


Re: Doubt in using synonyms.txt

2007-05-25 Thread Doss

Thanks Yonik.

Regards,
Doss.

On 5/25/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:


On 5/24/07, Doss <[EMAIL PROTECTED]> wrote:
> Is it advisable to maintain a large amount of data in synonyms.txt file?

It's read into an in-memory map, so the only real impact is increased
RAM usage.  There really shouldn't be a performance impact.

-Yonik
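A minimal sketch of the in-memory map Yonik describes: lookup cost stays constant however large synonyms.txt grows; only RAM usage increases. The parsing here covers only the simple comma-separated and `=>` forms of the file:

```python
# Load synonym lines into an in-memory map: token -> list of synonyms.
# Handles the two common synonyms.txt forms: "a,b,c" (symmetric group)
# and "a => b,c" (explicit mapping). Comments and blank lines are skipped.

def load_synonyms(lines):
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if "=>" in line:
            left, right = line.split("=>", 1)
            targets = [t.strip() for t in right.split(",")]
            for token in left.split(","):
                table[token.strip()] = targets
        else:
            group = [t.strip() for t in line.split(",")]
            for token in group:
                table[token] = group
    return table
```

Once loaded, each token lookup is a single dict access, which is why file size affects memory but not query-time performance.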



Problem with machine hostname and Solr/Tomcat

2007-05-25 Thread Brian Lucas

Anyone encounter a problem when changing their hostname?  (via
/etc/conf.d/hostname or just the hostname command)  I'm getting this error
when going to the admin screen, I have a feeling it's a simple fix.  It
seems to work when it thinks the machine's name is just 'localhost'.

org.apache.jasper.JasperException: Exception in JSP: /admin/_info.jsp:43

40:   }
41: 
42:   String collectionName = schema!=null ? schema.getName():"unknown";
43:   InetAddress addr = InetAddress.getLocalHost();
44:   String hostname = addr.getCanonicalHostName();
45: 
46:   String defaultSearch = "";


Stacktrace:

org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:467)

org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:377)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:133)

root cause

java.net.UnknownHostException: app10: app10
java.net.InetAddress.getLocalHost(InetAddress.java:1308)
org.apache.jsp.admin.index_jsp._jspService(index_jsp.java:95)
org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:98)
javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:328)
org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:315)
org.apache.jasper.servlet.JspServlet.service(JspServlet.java:265)
javax.servlet.http.HttpServlet.service(HttpServlet.java:803)

org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:133)


-- 
View this message in context: 
http://www.nabble.com/Problem-with-machine-hostname-and-Solr-Tomcat-tf3816176.html#a10803121
Sent from the Solr - User mailing list archive at Nabble.com.



unsubcribe

2007-05-25 Thread Rafeek Raja

unsubcribe


Re: Difficulty posting unicode to solr index

2007-05-25 Thread Ethan Gruber

Yes, it's definitely encoded in UTF-8.  I'm going to attempt either today or
Tuesday to post the files to a solr index that is online (as opposed to
localhost as was my case a few days ago) using post.sh through SSH and let
you know how it turns out.  That should definitely indicate whether or not
the problem is with my files themselves or the post.jar file.

On 5/24/07, James liu <[EMAIL PROTECTED]> wrote:


How do you know your file is encoded in UTF-8?

2007/5/24, Ethan Gruber <[EMAIL PROTECTED]>:
>
> Hi,
>
> I am attempting to post some unicode XML documents to my solr
> index.  They
> are encoded in UTF-8.  When I attempt to query from the solr admin page,
> I'm
> basically getting gibberish garbage text in return.  I decided to try a
> file
> that I know is supposed to work, which is the utf8-example.xml found in
> the
> exampledocs folder.  This also did not return proper unicode
> results.  None
> of my other coworkers have run into this problem, but I believe there is
> one
> difference between their system and my system which could account for
> the
> error.  They're using Macs and thus posting with post.sh, and I am
> running
> Windows and posting with a post.jar file.  Could post.jar not support
> unicode?  Has anyone run into this problem before?
>
> Thanks,
> Ethan
>



--
regards
jl


The function of distinct of RDBMS

2007-05-25 Thread 薬袋 貴志
Hi, my name is Techan.

I want to add the equivalent of the RDBMS DISTINCT function to Solr,
and I want to be able to use it on any field.
However, I do not understand how to accomplish this in detail. Does
anyone know how?

(I'm sorry about my computing English.)


AW: Re[2]: add and delete docs at same time

2007-05-25 Thread Burkamp, Christian
Thierry,

If you always start from scratch you could even reset the index completely 
(i.e. delete the index directory). Solr will create a new index automatically 
at startup.
If you don't like to delete the files, another approach would be to use a query 
that returns all documents. You do not need a dummy field for this. The range 
query [* TO *] returns all documents. (In newer versions of Solr you can use 
*:*, which executes a bit faster.)

-- Christian

-Original Message-
From: Thierry Collogne [mailto:[EMAIL PROTECTED] 
Sent: Friday, 25 May 2007 10:30
To: solr-user@lucene.apache.org; Jack L
Subject: Re: Re[2]: add and delete docs at same time

We always do a full delete before indexing, this is because for us that is the 
only way to be sure that there are no documents in the index that don't exist 
anymore.

So delete all, then add all.

To use the delete all, we did the following. We added a field called 
dummyDelete. This field always contains the value delete.
Like this
delete

Then to delete all documents we do a request containing:

 dummyDelete:delete

That way all documents are deleted where the field dummyDelete contains delete 
=> all the documents

Hope this is clear. I am not sure if this is a good solution, but it does work. 
:)

Greet,

Thierry

On 25/05/07, Jack L <[EMAIL PROTECTED]> wrote:
>
> Oh, is that the case? One document per request for delete?
> I'm about to implement delete. Just want to confirm.
>
> --
> Best regards,
> Jack
>
> Thursday, May 24, 2007, 12:47:21 PM, you wrote:
>
> > currently no.
>
> > Right now you even need a new request for each delete...
>
>
> > Patrick Givisiez wrote:
> >>
> >> can I add and delete docs at same post?
> >>
> >> Some thing like this:
> >>
> >> myDocs.xml
> >> =
> >> 
> >> 4  >> name="mainId">5  >> name="mainId">6  1 
> >> 2 3 
> >> =
> >>
> >> Thanks!
> >>
> >>
> >>
> >>
>
>



Re: Re[2]: add and delete docs at same time

2007-05-25 Thread Thierry Collogne

We always do a full delete before indexing, this is because for us that is
the only way to be sure that there are no documents in the index that don't
exist anymore.

So delete all, then add all.

To use the delete all, we did the following. We added a field called
dummyDelete. This field always contains the value delete.
Like this
   delete

Then to delete all documents we do a request containing:

dummyDelete:delete

That way all documents are deleted where the field dummyDelete contains
delete => all the documents

Hope this is clear. I am not sure if this is a good solution, but it does
work. :)

Greet,

Thierry

On 25/05/07, Jack L <[EMAIL PROTECTED]> wrote:


Oh, is that the case? One document per request for delete?
I'm about to implement delete. Just want to confirm.

--
Best regards,
Jack

Thursday, May 24, 2007, 12:47:21 PM, you wrote:

> currently no.

> Right now you even need a new request for each delete...


> Patrick Givisiez wrote:
>>
>> can I add and delete docs at same post?
>>
>> Some thing like this:
>>
>> myDocs.xml
>> =
>> 
>> 4
>> 5
>> 6
>> 
>> 1
>> 2
>> 3
>> =
>>
>> Thanks!
>>
>>
>>
>>
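The delete request bodies discussed in this thread had their XML stripped by the archive; a rough reconstruction (the exact markup is an assumption based on the surviving text):

```python
# Rough reconstruction of the delete-by-query bodies discussed in this
# thread; the original XML was stripped by the archive, so the exact
# markup shown is an assumption.

def delete_by_query(query):
    """Build an update body that deletes every document matching query."""
    return "<delete><query>%s</query></delete>" % query

# The dummyDelete workaround described above:
dummy_field_delete = delete_by_query("dummyDelete:delete")
# All docs with a value in the default field:
range_delete = delete_by_query("[* TO *]")
# Truly all documents (newer Solr versions, per the thread):
match_all_delete = delete_by_query("*:*")
```

Each body would be POSTed to /update as a separate request, since (as noted above) adds and deletes could not be mixed in one post at the time.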